Donghui Lin

2020

Designing Multilingual Interactive Agents using Small Dialogue Corpora
Donghui Lin | Masayuki Otani | Ryosuke Okuno | Toru Ishida
Proceedings of the Twelfth Language Resources and Evaluation Conference

Interactive dialogue agents like smart speakers have become more and more popular in recent years. These agents are being developed on machine learning technologies that use huge amounts of language resources. However, many entities in specialized fields are struggling to develop their own interactive agents due to a lack of language resources such as dialogue corpora, especially when the end users need interactive agents that offer multilingual support. Therefore, we aim to provide a general design framework for multilingual interactive agents in specialized domains that are assumed to have small or non-existent dialogue corpora. To achieve our goal, we first integrate and customize external language services to support the multilingual functions of interactive agents. Second, we realize context-aware dialogue generation when only small corpora are available. Third, we develop a gradual design process for acquiring dialogue corpora and improving the interactive agents. We implement a multilingual interactive agent in the field of healthcare and conduct experiments to illustrate the effectiveness of the implemented agent.

2018

A Framework for Multi-Language Service Design with the Language Grid
Donghui Lin | Yohei Murakami | Toru Ishida
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Towards a Language Service Infrastructure for Mobile Environments
Ngoc Nguyen | Donghui Lin | Takao Nakaguchi | Toru Ishida
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Since mobile devices have feature-rich configurations and provide diverse functions, the use of mobile devices combined with the language resources of cloud environments is highly promising for achieving wide-ranging communication that goes beyond the current language barrier. However, there are mismatches between using resources of mobile devices and services in the cloud, such as different communication protocols and different input and output methods. In this paper, we propose a language service infrastructure for mobile environments to combine these services. The proposed language service infrastructure allows users to use and mash up existing language resources on both cloud environments and their mobile devices. Furthermore, it allows users to flexibly use services in the cloud or services on mobile devices in their composite service without implementing several different composite services that have the same functionality. A case study of a Mobile Shopping Translation System using both a service in the cloud (translation service) and services on mobile devices (Bluetooth low energy (BLE) service and text-to-speech service) is introduced.

Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)
Yohei Murakami | Donghui Lin | Nancy Ide | James Pustejovsky
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)

An Ontology for Language Service Composability
Yohei Murakami | Takao Nakaguchi | Donghui Lin | Toru Ishida
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)

Fragmentation and recombination are key to creating customized language environments for supporting various intercultural activities. Fragmentation provides various language resource components for the customized language environments, and recombination builds each language environment according to the user’s request by combining these components. To realize this fragmentation and recombination process, existing language resources (both data and programs) should be shared as language services and combined despite mismatches between their service interfaces. To address this issue, standardization is inevitable: standardized interfaces are necessary for language services, as are standardized data formats for language resources. Therefore, we have constructed a hierarchy of language services based on inheritance of service interfaces, which is called the language service ontology. This ontology allows users to create a new customized language service that is compatible with existing ones. Moreover, we have developed a dynamic service binding technology that instantiates various executable customized services from an abstract workflow according to the user’s request. By using the ontology and service binding together, users can bind the instantiated language service to another abstract workflow to create a new customized one.

2014

Bilingual Dictionary Induction as an Optimization Problem
Wushouer Mairidan | Toru Ishida | Donghui Lin | Katsutoshi Hirayama
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Bilingual dictionaries are vital in many areas of natural language processing, but such resources are rarely available for lower-density language pairs, especially for those that are closely related. Pivot-based induction consists of using a third language to bridge a language pair. As an approach to creating new dictionaries, it can generate wrong translations due to polysemy and ambiguous words. In this paper we propose a constraint approach to pivot-based dictionary induction for the case of two closely related languages. In order to take word senses into account, we use an approach based on semantic distances, in which possibly missing translations are considered, and each instance of induction is encoded as an optimization problem to generate a new dictionary. Evaluations show that the proposal achieves 83.7% accuracy and approximately 70.5% recall, thus outperforming the baseline pivot-based method.

Crowdsourcing for Evaluating Machine Translation Quality
Shinsuke Goto | Donghui Lin | Toru Ishida
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The recent popularity of machine translation has increased the demand for the evaluation of translations. However, the traditional evaluation approach, manual checking by a bilingual professional, is too expensive and too slow. In this study, we confirm the feasibility of crowdsourcing by analyzing the accuracy of crowdsourced translation evaluations. We compare crowdsourcing scores to professional scores with regard to three metrics: translation-score, sentence-score, and system-score. A Chinese to English translation evaluation task was designed around the NTCIR-9 PATENT parallel corpus, with the goal of obtaining 5-range evaluations of adequacy and fluency. The experiment shows that the average score of crowdsourcing workers closely matches professional evaluation results. The system-score comparison strongly indicates that crowdsourcing can be used to find the best translation system given an input of 10 source sentences.

Integration of Workflow and Pipeline for Language Service Composition
Trang Mai Xuan | Yohei Murakami | Donghui Lin | Toru Ishida
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Integrating language resources and language services is a critical part of building natural language processing applications. Service workflows and processing pipelines are two approaches for sharing and combining language resources. Workflow languages focus on the expressive power needed to describe a variety of workflow patterns that meet users’ needs. Users can combine language services in service workflows to meet their requirements. The workflows can be accessed in a distributed manner and can be invoked independently of the platform. However, workflow languages lack the pipelined execution support needed to improve the performance of workflows. In contrast, the processing pipeline provides a straightforward way to create a sequence of linguistic processing steps to analyze large amounts of text data. It focuses on using pipelined execution and parallel execution to improve the throughput of pipelines. However, the resulting pipelines are standalone applications, i.e., software tools that are accessible only via a local machine and that can only be run with the processing pipeline platforms. In this paper we propose an integration framework of the two approaches so that each offsets the disadvantages of the other. We then present a case study wherein two representative frameworks, the Language Grid and UIMA, are integrated.

2013

Interoperability between Service Composition and Processing Pipeline: Case Study on the Language Grid and UIMA
Trang Mai Xuan | Yohei Murakami | Donghui Lin | Toru Ishida
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Two Phase Evaluation for Selecting Machine Translation Services
Chunqi Shi | Donghui Lin | Masahiko Shimada | Toru Ishida
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

An increasing number of machine translation services are now available. Unfortunately, none of them can provide adequate translation quality for all input sources. This forces users to select from among the services according to their needs. However, it is tedious and time-consuming to perform this manual selection. Our solution, proposed here, is an automatic mechanism that can select the most appropriate machine translation service. Although evaluation methods such as BLEU, NIST, and WER are available, their evaluation results do not agree consistently across translation sources. We propose a two-phase architecture for selecting translation services. The first phase uses a data-driven classification to allow the most appropriate evaluation method to be selected according to each translation source. The second phase selects the most appropriate machine translation result using the selected evaluation method. We describe the architecture, detail the algorithm, and construct a prototype. Tests show that the proposal yields better translation quality than employing just one machine translation service.

Service Composition Scenarios for Task-Oriented Translation
Chunqi Shi | Donghui Lin | Toru Ishida
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Due to its instant availability and low cost, machine translation is becoming popular. Machine translation mediated communication plays an increasingly important role in international collaboration. However, machine translators cannot guarantee high-quality translation. In a multilingual communication task, many in-domain resources, for example domain dictionaries, are needed to improve translation quality. This raises the problem of how to help communication task designers provide higher-quality translation systems that can take advantage of various in-domain resources. The Language Grid, a service-oriented collective intelligence platform, allows in-domain resources to be wrapped into language services. For task-oriented translation, we propose service composition scenarios for the composition of different language services, where various in-domain resources are utilized effectively. We design the architecture, provide a script language as the interface for the task designer, which makes it easy to describe the composition scenario, and conduct a case study of a Japanese-English campus orientation task. Based on the case study, we analyze the possible increase in translation quality and the usage of in-domain resources. The results demonstrate a clear improvement in translation accuracy when the in-domain resources are used.

2011

Open-Source Platform for Language Service Sharing
Yohei Murakami | Masahiro Tanaka | Donghui Lin | Toru Ishida
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

2010

Composing Human and Machine Translation Services: Language Grid for Improving Localization Processes
Donghui Lin | Yoshiaki Murakami | Toru Ishida | Yohei Murakami | Masahiro Tanaka
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

With the development of the Internet, more and more language services have become accessible to ordinary users. However, the gap between human translators and machine translators remains huge, especially in the domain of localization processes, which require high translation quality. Although efforts to combine human and machine translators for supporting multilingual communication have been reported in previous research, how to apply such approaches to improving localization processes is rarely discussed. In this paper, we aim to improve localization processes by composing human and machine translation services based on the Language Grid, which is a language service platform that we have developed. Further, we conduct experiments to compare the translation quality and translation cost using several translation processes, including absolute machine translation processes, absolute human translation processes, and translation processes that combine human and machine translation services. The experimental results show that composing monolingual roles and dictionary services improves the translation quality of machine translators, and that collaboration between human and machine translators can reduce the cost compared with absolute bilingual human translation. We also discuss the generality of the experimental results and further challenging issues of the proposed localization processes.

Language Service Management with the Language Grid
Yohei Murakami | Donghui Lin | Masahiro Tanaka | Takao Nakaguchi | Toru Ishida
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

As the number of language resources accessible on the Internet increases, many efforts have been made to combine language resources and language processing tools to create new services. However, existing language resource coordination frameworks cannot manage issues of intellectual property associated with language resources, which makes it difficult for most end users to get support for their intercultural collaborations because they always have to deal with the issues by themselves. In this paper, we aim to construct a new language service management architecture on the Language Grid, which enables language resource providers to control access to their resources in accordance with their own policies. Furthermore, we apply the proposed architecture to the operating Language Grid in order to validate the effectiveness of the architecture. As a result, several service management models utilizing the monitoring and access constraints are emerging to satisfy various requirements from language resource providers. These models can handle paid-for language resources as well as free language resources. Finally, we discuss further challenging issues of combining language resources under different policies.