Virach Sornlertlamvanich

2016

Recurrent Neural Network with Word Embedding for Complaint Classification
Panuwat Assawinjaipetch | Kiyoaki Shirai | Virach Sornlertlamvanich | Sanparith Marukata
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)

Complaint classification aims to use information to deliver insights that enhance the user experience after the purchase of products or services. Categorized information can help us quickly collect emerging problems in order to provide the support needed. Indeed, responding to a complaint without delay gives users the highest satisfaction. In this paper, we aim to deliver a novel approach that classifies each complaint precisely into one of nine predefined classes, i.e., accessibility, company brand, competitors, facilities, process, product feature, staff quality, timing, and others. Given that a single word is often ambiguous and has to be interpreted by its context, the word embedding technique is used to provide word features, while deep learning techniques are applied to classify the type of complaint. The dataset we use contains 8,439 complaints from one company.

This paper presents the language resource management system for the development and dissemination of Asian WordNet (AWN) and its web service application. We develop the platform to establish a network for cross-language WordNet development. Each node of the network is designed to maintain the WordNet for one language. Via a table that maps each language's WordNet to the Princeton WordNet (PWN), the Asian WordNet is realized to visualize the cross-language WordNet across the Asian languages. We propose a language resource management system, called WordNet Management System (WNMS), as a distributed management system that allows the server to perform cross-language WordNet retrieval, including the fundamental web service applications for editing, visualizing and language processing. The WNMS is implemented on a web service protocol; therefore, each node can be maintained independently, and the service of each language WordNet can be called directly through the web service API. For cross-language implementation, the synset ID (or synset offset) defined by PWN is used to determine the linkage between the languages.

2009

Proceedings of the 7th Workshop on Asian Language Resources (ALR7)
Hammam Riza | Virach Sornlertlamvanich
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Corpus-based approaches and statistical approaches have been the mainstream of natural language processing research for the past two decades. Language resources play a key role in such approaches, but language resources are insufficient for many Asian languages. In this situation, standardisation of language resources would be of great help in developing resources in new languages. This paper presents the latest development efforts of our project, which aims at creating a common standard for Asian language resources that is compatible with an international standard. In particular, the paper focuses on i) lexical specification and data categories relevant for building multilingual lexical resources for Asian languages; ii) a core upper-layer ontology needed for ensuring multilingual interoperability; and iii) the evaluation platform used to test the entire architectural framework.

A Dependency Parser for Thai
Shisanu Tongchim | Randolf Altmeyer | Virach Sornlertlamvanich | Hitoshi Isahara
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents some preliminary results of our dependency parser for Thai. It is part of an ongoing project in developing a syntactically annotated Thai corpus. The parser has been trained and tested by using the complete part of the corpus. The parser achieves 83.64% as the root accuracy, 78.54% as the dependency accuracy and 53.90% as the complete sentence accuracy. The trained parser will be used as a preprocessing step in our corpus annotation workflow in order to accelerate the corpus development.

Constructing Taxonomy of Numerative Classifiers for Asian Languages
Kiyoaki Shirai | Takenobu Tokunaga | Chu-Ren Huang | Shu-Kai Hsieh | Tzu-Yi Kuo | Virach Sornlertlamvanich | Thatsanee Charoenporn
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Synset Assignment for Bi-lingual Dictionary with Limited Resource
Virach Sornlertlamvanich | Thatsanee Charoenporn | Chumpol Mokarat | Hitoshi Isahara | Hammam Riza | Purev Jaimai
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Invited Talk: Cross Language Resource Sharing
Virach Sornlertlamvanich
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

KUI: an ubiquitous tool for collective intelligence development
Thatsanee Charoenporn | Virach Sornlertlamvanich | Hitoshi Isahara | Kergrit Robkop
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

Enhanced Tools for Online Collaborative Language Resource Development
Virach Sornlertlamvanich | Thatsanee Charoenporn | Suphanut Thayaboon | Chumpol Mokarat | Hitoshi Isahara
Proceedings of the 6th Workshop on Asian Language Resources

Experiments in Base-NP Chunking and Its Role in Dependency Parsing for Thai
Shisanu Tongchim | Virach Sornlertlamvanich | Hitoshi Isahara
Coling 2008: Companion volume: Posters

2006

Blind Evaluation for Thai Search Engines
Shisanu Tongchim | Prapass Srichaivattana | Virach Sornlertlamvanich | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper compares the effectiveness of two different Thai search engines by using a blind evaluation. The probabilistic-based dictionary-less search engine is evaluated against the traditional word-based indexing method. Web documents from 12 Thai newspaper web sites, 83,453 documents in total, are used as the test collection. The relevance judgment is conducted on the first five returned results from each system. The evaluation process is completely blind. That is, the retrieved documents from both systems are shown to the judges without any information about the search techniques. Statistical testing shows that the dictionary-less approach is better than the word-based indexing approach in terms of the number of found documents and the number of relevant documents.

A Conditional Random Field Framework for Thai Morphological Analysis
Canasai Kruengkrai | Virach Sornlertlamvanich | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents a framework for Thai morphological analysis based on the theoretical background of conditional random fields. We formulate morphological analysis of an unsegmented language as a sequential supervised learning problem. Given a sequence of characters, all possible word/tag segmentations are generated, and then the optimal path is selected according to some criterion. We examine two different techniques: the Viterbi score and confidence estimation. Preliminary results are given to show the feasibility of our proposed framework.

Word Knowledge Acquisition for Computational Lexicon Construction
Thatsanee Charoenporn | Canasai Kruengkrai | Thanaruk Theeramunkong | Virach Sornlertlamvanich | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The growth of multilingual information processing technology has created a need for linguistic resources, especially lexical databases. Many attempts have been made to convert the traditional dictionary into a computational dictionary, widely known as a computational lexicon. TCL's Computational Lexicon (TCLLEX) is a recent development of a large-scale Thai lexicon, which aims to serve as a fundamental linguistic resource for natural language processing research. We design either terminology or ontology for structuring the lexicon based on the idea of computability and reusability.

2005

From Non-segmenting Language Processing to Web Language Engineering
Virach Sornlertlamvanich
Proceedings of the Australasian Language Technology Workshop 2005

Analysis of an Iterative Algorithm for Term-Based Ontology Alignment
Shisanu Tongchim | Canasai Kruengkrai | Virach Sornlertlamvanich | Prapass Srichaivattana | Hitoshi Isahara
Second International Joint Conference on Natural Language Processing: Full Papers

2004

Open Collaborative Development of the Thai Language Resources for Natural Language Processing
Thatsanee Charoenporn | Virach Sornlertlamvanich | Sawit Kasuriya | Chatchawarn Hansakunbuntheung | Hitoshi Isahara
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Enriching a Thai Lexical Database with Selectional Preferences
Canasai Kruengkrai | Thatsanee Charoenporn | Virach Sornlertlamvanich | Hitoshi Isahara
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

A Context-Sensitive Homograph Disambiguation in Thai Text-to-Speech Synthesis
Virongrong Tesprasit | Paisarn Charoenpornsawat | Virach Sornlertlamvanich
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

Improving Translation Quality of Rule-based Machine Translation
Paisarn Charoenpornsawat | Virach Sornlertlamvanich | Thatsanee Charoenporn
COLING-02: Machine Translation in Asia

A Cross System Machine Translation
Thepchai Supnithi | Virach Sornlertlamvanich | Thatsanee Charoenporn
COLING-02: Machine Translation in Asia

2001

Towards an Intelligent Multilingual Keyboard System
Tanapong Potipiti | Virach Sornlertlamvanich | Kanokwut Thanadkran
Proceedings of the First International Conference on Human Language Technology Research

2000

Automatic Corpus-Based Thai Word Extraction with the C4.5 Learning Algorithm
Virach Sornlertlamvanich | Tanapong Potipiti | Thatsanee Charoenporn
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Panel: The State of the Art in Thai Language Processing
Virach Sornlertlamvanich | Tanapong Potipiti | Chai Wutiwiwatchai | Pradit Mittrapiyanuruk
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1997

A New Formalization of Probabilistic GLR Parsing
Kentaro Inui | Virach Sornlertlamvanich | Hozumi Tanaka | Takenobu Tokunaga
Proceedings of the Fifth International Workshop on Parsing Technologies

This paper presents a new formalization of probabilistic GLR language modeling for statistical parsing. Our model inherits its essential features from Briscoe and Carroll’s generalized probabilistic LR model, which obtains context-sensitivity by assigning a probability to each LR parsing action according to its left and right context. Briscoe and Carroll’s model, however, has a drawback in that it is not formalized in any probabilistically well-founded way, which may degrade its parsing performance. Our formulation overcomes this drawback with a few significant refinements, while maintaining all the advantages of Briscoe and Carroll’s modeling.