Eiichiro Sumita

Also published as: Eiichiro SUMITA, Eiichro Sumita


2019

pdf bib
NICT’s Supervised Neural Machine Translation Systems for the WMT19 News Translation Task
Raj Dabre | Kehai Chen | Benjamin Marie | Rui Wang | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions. We focused on leveraging multilingual transfer learning and back-translation for the extremely low-resource language pairs: Kazakh↔English and Gujarati↔English translation. For the Chinese↔English translation, we used the provided parallel data augmented with a large quantity of back-translated monolingual data to train state-of-the-art NMT systems. We then employed techniques that have been proven to be most effective, such as back-translation, fine-tuning, and model ensembling, to generate the primary submissions of Chinese↔English. For English→Finnish, our submission from WMT18 remains a strong baseline despite the increase in parallel corpora for this year’s task.

pdf bib
NICT’s Unsupervised Neural and Statistical Machine Translation Systems for the WMT19 News Translation Task
Benjamin Marie | Haipeng Sun | Rui Wang | Kehai Chen | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper presents the NICT’s participation in the WMT19 unsupervised news translation task. We participated in the unsupervised translation direction: German-Czech. Our primary submission to the task is the result of a simple combination of our unsupervised neural and statistical machine translation systems. Our system is ranked first for the German-to-Czech translation task, using only the data provided by the organizers (“constraint’”), according to both BLEU-cased and human evaluation. We also performed contrastive experiments with other language pairs, namely, English-Gujarati and English-Kazakh, to better assess the effectiveness of unsupervised machine translation in for distant language pairs and in truly low-resource conditions.

pdf bib
NICT’s Supervised Neural Machine Translation Systems for the WMT19 Translation Robustness Task
Raj Dabre | Eiichiro Sumita
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this paper we describe our neural machine translation (NMT) systems for Japanese↔English translation which we submitted to the translation robustness task. We focused on leveraging transfer learning via fine tuning to improve translation quality. We used a fairly well established domain adaptation technique called Mixed Fine Tuning (MFT) (Chu et. al., 2017) to improve translation quality for Japanese↔English. We also trained bi-directional NMT models instead of uni-directional ones as the former are known to be quite robust, especially in low-resource scenarios. However, given the noisy nature of the in-domain training data, the improvements we obtained are rather modest.

pdf bib
Online Sentence Segmentation for Simultaneous Interpretation using Multi-Shifted Recurrent Neural Network
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

pdf bib
Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation
Junya Ono | Masao Utiyama | Eiichiro Sumita
Proceedings of The 8th Workshop on Patent and Scientific Literature Translation

pdf bib
SJTU-NICT at MRP 2019: Multi-Task Learning for End-to-End Uniform Semantic Graph Parsing
Zuchao Li | Hai Zhao | Zhuosheng Zhang | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes our SJTU-NICT’s system for participating in the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference for Computational Language Learning (CoNLL). Our system uses a graph-based approach to model a variety of semantic graph parsing tasks. Our main contributions in the submitted system are summarized as follows: 1. Our model is fully end-to-end and is capable of being trained only on the given training set which does not rely on any other extra training source including the companion data provided by the organizer; 2. We extend our graph pruning algorithm to a variety of semantic graphs, solving the problem of excessive semantic graph search space; 3. We introduce multi-task learning for multiple objectives within the same framework. The evaluation results show that our system achieved second place in the overall F1 score and achieved the best F1 score on the DM framework.

pdf bib
Recurrent Positional Embedding for Neural Machine Translation
Kehai Chen | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In the Transformer network architecture, positional embeddings are used to encode order dependencies into the input representation. However, this input representation only involves static order dependencies based on discrete numerical information, that is, are independent of word content. To address this issue, this work proposes a recurrent positional embedding approach based on word vector. In this approach, these recurrent positional embeddings are learned by a recurrent neural network, encoding word content-based order dependencies into the input representation. They are then integrated into the existing multi-head self-attention model as independent heads or part of each head. The experimental results revealed that the proposed approach improved translation performance over that of the state-of-the-art Transformer baseline in WMT’14 English-to-German and NIST Chinese-to-English translation tasks.

pdf bib
MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

MY-AKKHARA is a method used to input Burmese texts encoded in the Unicode standard, based on commonly accepted Latin transcription. By using this method, arbitrary Burmese strings can be accurately inputted with 26 lowercase Latin letters. Meanwhile, the 26 uppercase Latin letters are designed as shortcuts of lowercase letter sequences. The frequency of Burmese characters is considered in MY-AKKHARA to realize an efficient keystroke distribution on a QWERTY keyboard. Given that the Unicode standard has not been extensively used in digitization of Burmese, we hope that MY-AKKHARA can contribute to the widespread use of Unicode in Myanmar and can provide a platform for smart input methods for Burmese in the future. An implementation of MY-AKKHARA running in Windows is released at http://www2.nict.go.jp/astrec-att/member/ding/my-akkhara.html

pdf bib
Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English
Benjamin Marie | Hour Kaing | Aye Myat Mon | Chenchen Ding | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita
Proceedings of the 6th Workshop on Asian Translation

This paper presents the NICT’s supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks. For all the translation directions, we built state-of-the-art supervised neural (NMT) and statistical (SMT) machine translation systems, using monolingual data cleaned and normalized. Our combination of NMT and SMT performed among the best systems for the four translation directions. We also investigated the feasibility of unsupervised machine translation for low-resource and distant language pairs and confirmed observations of previous work showing that unsupervised MT is still largely unable to deal with them.

pdf bib
NICT’s participation to WAT 2019: Multilingualism and Multi-step Fine-Tuning for Low Resource NMT
Raj Dabre | Eiichiro Sumita
Proceedings of the 6th Workshop on Asian Translation

In this paper we describe our submissions to WAT 2019 for the following tasks: English–Tamil translation and Russian–Japanese translation. Our team,“NICT-5”, focused on multilingual domain adaptation and back-translation for Russian–Japanese translation and on simple fine-tuning for English–Tamil translation . We noted that multi-stage fine tuning is essential in leveraging the power of multilingualism for an extremely low-resource language like Russian–Japanese. Furthermore, we can improve the performance of such a low-resource language pair by exploiting a small but in-domain monolingual corpus via back-translation. We managed to obtain second rank in both tasks for all translation directions.

pdf bib
English-Myanmar Supervised and Unsupervised NMT: NICT’s Machine Translation Systems at WAT-2019
Rui Wang | Haipeng Sun | Kehai Chen | Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 6th Workshop on Asian Translation

This paper presents the NICT’s participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically Myanmar (Burmese) - English task in both translation directions. We built neural machine translation (NMT) systems for these tasks. Our NMT systems were trained with language model pretraining. Back-translation technology is adopted to NMT. Our NMT systems rank the third in English-to-Myanmar and the second in Myanmar-to-English according to BLEU score.

pdf bib
Long Warm-up and Self-Training: Training Strategies of NICT-2 NMT System at WAT-2019
Kenji Imamura | Eiichiro Sumita
Proceedings of the 6th Workshop on Asian Translation

This paper describes the NICT-2 neural machine translation system at the 6th Workshop on Asian Translation. This system employs the standard Transformer model but features the following two characteristics. One is the long warm-up strategy, which performs a longer warm-up of the learning rate at the start of the training than conventional approaches. Another is that the system introduces self-training approaches based on multiple back-translations generated by sampling. We participated in three tasks—ASPEC.en-ja, ASPEC.ja-en, and TDDC.ja-en—using this system.

pdf bib
Recycling a Pre-trained BERT Encoder for Neural Machine Translation
Kenji Imamura | Eiichiro Sumita
Proceedings of the 3rd Workshop on Neural Generation and Translation

In this paper, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model is applied to Transformer-based neural machine translation (NMT). In contrast to monolingual tasks, the number of unlearned model parameters in an NMT decoder is as huge as the number of learned parameters in the BERT model. To train all the models appropriately, we employ two-stage optimization, which first trains only the unlearned parameters by freezing the BERT model, and then fine-tunes all the sub-models. In our experiments, stable two-stage optimization was achieved, in contrast the BLEU scores of direct fine-tuning were extremely low. Consequently, the BLEU scores of the proposed method were better than those of the Transformer base model and the same model without pre-training. Additionally, we confirmed that NMT with the BERT encoder is more effective in low-resource settings.

pdf bib
Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation
Haipeng Sun | Rui Wang | Kehai Chen | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Unsupervised bilingual word embedding (UBWE), together with other technologies such as back-translation and denoising, has helped unsupervised neural machine translation (UNMT) achieve remarkable results in several language pairs. In previous methods, UBWE is first trained using non-parallel monolingual corpora and then this pre-trained UBWE is used to initialize the word embedding in the encoder and decoder of UNMT. That is, the training of UBWE and UNMT are separate. In this paper, we first empirically investigate the relationship between UBWE and UNMT. The empirical findings show that the performance of UNMT is significantly affected by the performance of UBWE. Thus, we propose two methods that train UNMT with UBWE agreement. Empirical results on several language pairs show that the proposed methods significantly outperform conventional UNMT.

pdf bib
Neural Machine Translation with Reordering Embeddings
Kehai Chen | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The reordering model plays an important role in phrase-based statistical machine translation. However, there are few works that exploit the reordering information in neural machine translation. In this paper, we propose a reordering mechanism to learn the reordering embedding of a word based on its contextual information. These learned reordering embeddings are stacked together with self-attention networks to learn sentence representation for machine translation. The reordering mechanism can be easily integrated into both the encoder and the decoder in the Transformer translation system. Experimental results on WMT’14 English-to-German, NIST Chinese-to-English, and WAT Japanese-to-English translation tasks demonstrate that the proposed methods can significantly improve the performance of the Transformer.

pdf bib
Sentence-Level Agreement for Neural Machine Translation
Mingming Yang | Rui Wang | Kehai Chen | Masao Utiyama | Eiichiro Sumita | Min Zhang | Tiejun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The training objective of neural machine translation (NMT) is to minimize the loss between the words in the translated sentences and those in the references. In NMT, there is a natural correspondence between the source sentence and the target sentence. However, this relationship has only been represented using the entire neural network and the training objective is computed in word-level. In this paper, we propose a sentence-level agreement module to directly minimize the difference between the representation of source and target sentence. The proposed agreement module can be integrated into NMT as an additional training objective function and can also be used to enhance the representation of the source sentences. Empirical results on the NIST Chinese-to-English and WMT English-to-German tasks show the proposed agreement module can significantly improve the NMT performance.

pdf bib
Improving Neural Machine Translation with Neural Syntactic Distance
Chunpeng Ma | Akihiro Tamura | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The explicit use of syntactic information has been proved useful for neural machine translation (NMT). However, previous methods resort to either tree-structured neural networks or long linearized sequences, both of which are inefficient. Neural syntactic distance (NSD) enables us to represent a constituent tree using a sequence whose length is identical to the number of words in the sentence. NSD has been used for constituent parsing, but not in machine translation. We propose five strategies to improve NMT with NSD. Experiments show that it is not trivial to improve NMT with NSD; however, the proposed strategies are shown to improve translation performance of the baseline model (+2.1 (En–Ja), +1.3 (Ja–En), +1.2 (En–Ch), and +1.0 (Ch–En) BLEU).

pdf bib
Incorporating Word Attention into Character-Based Word Segmentation
Shohei Higashiyama | Masao Utiyama | Eiichiro Sumita | Masao Ideuchi | Yoshiaki Oida | Yohei Sakamoto | Isaac Okada
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Neural network models have been actively applied to word segmentation, especially Chinese, because of the ability to minimize the effort in feature engineering. Typical segmentation models are categorized as character-based, for conducting exact inference, or word-based, for utilizing word-level information. We propose a character-based model utilizing word information to leverage the advantages of both types of models. Our model learns the importance of multiple candidate words for a character on the basis of an attention mechanism, and makes use of it for segmentation decisions. The experimental results show that our model achieves better performance than the state-of-the-art models on both Japanese and Chinese benchmark datasets.

2018

pdf bib
Multilingual Parallel Corpus for Global Communication Plan
Kenji Imamura | Eiichiro Sumita
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation
Kenji Imamura | Atsushi Fujita | Eiichiro Sumita
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

A large-scale parallel corpus is required to train encoder-decoder neural machine translation. The method of using synthetic parallel texts, in which target monolingual corpora are automatically translated into source sentences, is effective in improving the decoder, but is unreliable for enhancing the encoder. In this paper, we propose a method that enhances the encoder and attention using target monolingual corpora by generating multiple source sentences via sampling. By using multiple source sentences, diversity close to that of humans is achieved. Our experimental results show that the translation quality is improved by increasing the number of synthetic source sentences for each given target sentence, and quality close to that using a manually created parallel corpus was achieved.

pdf bib
NICT Self-Training Approach to Neural Machine Translation at NMT-2018
Kenji Imamura | Eiichiro Sumita
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

This paper describes the NICT neural machine translation system submitted at the NMT-2018 shared task. A characteristic of our approach is the introduction of self-training. Since our self-training does not change the model structure, it does not influence the efficiency of translation, such as the translation speed. The experimental results showed that the translation quality improved not only in the sequence-to-sequence (seq-to-seq) models but also in the transformer models.

pdf bib
NICT’s Neural and Statistical Machine Translation Systems for the WMT18 News Translation Task
Benjamin Marie | Rui Wang | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

pdf bib
NICT’s Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task
Rui Wang | Benjamin Marie | Masao Utiyama | Eiichiro Sumita
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

pdf bib
Exploring Recombination for Efficient Decoding of Neural Machine Translation
Zhisong Zhang | Rui Wang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations. This means that partial hypotheses with different prefixes will be regarded differently no matter how similar they are. However, this might be inefficient since some partial hypotheses can contain only local differences that will not influence future predictions. In this work, we introduce recombination in NMT decoding based on the concept of the “equivalence” of partial hypotheses. Heuristically, we use a simple n-gram suffix based equivalence function and adapt it into beam search decoding. Through experiments on large-scale Chinese-to-English and English-to-Germen translation tasks, we show that the proposed method can obtain similar translation quality with a smaller beam size, making NMT decoding more efficient.

pdf bib
CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

This paper presents an open-source neural machine translation toolkit named CytonMT. The toolkit is built from scratch only using C++ and NVIDIA’s GPU-accelerated libraries. The toolkit features training efficiency, code simplicity and translation quality. Benchmarks show that cytonMT accelerates the training speed by 64.5% to 110.8% on neural networks of various sizes, and achieves competitive translation quality.

pdf bib
NICT’s Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers
Raj Dabre | Anoop Kunchukuttan | Atsushi Fujita | Eiichiro Sumita
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

pdf bib
English-Myanmar NMT and SMT with Pre-ordering: NICT’s Machine Translation Systems at WAT-2018
Rui Wang | Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

pdf bib
Combination of Statistical and Neural Machine Translation for Myanmar-English
Benjamin Marie | Atsushi Fujita | Eiichiro Sumita
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

pdf bib
Guiding Neural Machine Translation with Retrieved Translation Pieces
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process. Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose source sides are similar with the input sentence, and then collect n-grams that are both in the retrieved target sentences and aligned with words that match in the source sentences, which we call “translation pieces”. We compute pseudo-probabilities for each retrieved sentence based on similarities between the input sentence and the retrieved source sentences, and use these to weight the retrieved translation pieces. Finally, an existing NMT model is used to translate the input sentence, with an additional bonus given to outputs that contain the collected translation pieces. We show our method improves NMT translation results up to 6 BLEU points on three narrow domain translation tasks where repetitiveness of the target sentences is particularly salient. It also causes little increase in the translation time, and compares favorably to another alternative retrieval-based method with respect to accuracy, speed, and simplicity of implementation.

pdf bib
Forest-Based Neural Machine Translation
Chunpeng Ma | Akihiro Tamura | Masao Utiyama | Tiejun Zhao | Eiichiro Sumita
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Tree-based neural machine translation (NMT) approaches, although achieved impressive performance, suffer from a major drawback: they only use the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors. For statistical machine translation (SMT), forest-based methods have been proven to be effective for solving this problem, while for NMT this kind of approach has not been attempted. This paper proposes a forest-based NMT method that translates a linearized packed forest under a simple sequence-to-sequence framework (i.e., a forest-to-sequence NMT model). The BLEU score of the proposed method is higher than that of the sequence-to-sequence NMT, tree-based NMT, and forest-based SMT systems.

pdf bib
Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation
Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Traditional Neural machine translation (NMT) involves a fixed training procedure where each sentence is sampled once during each epoch. In reality, some sentences are well-learned during the initial few epochs; however, using this approach, the well-learned sentences would continue to be trained along with those sentences that were not well learned for 10-30 epochs, which results in a wastage of time. Here, we propose an efficient method to dynamically sample the sentences in order to accelerate the NMT training. In this approach, a weight is assigned to each sentence based on the measured difference between the training costs of two iterations. Further, in each epoch, a certain percentage of sentences are dynamically sampled according to their weights. Empirical results based on the NIST Chinese-to-English and the WMT English-to-German tasks show that the proposed method can significantly accelerate the NMT training and improve the NMT performance.

pdf bib
Simplified Abugidas
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics. We investigate the feasibility of recovering the original text written in an abugida after omitting subordinate diacritics and merging consonant letters with similar phonetic values. This is crucial for developing more efficient input methods by reducing the complexity in abugidas. Four abugidas in the southern Brahmic family, i.e., Thai, Burmese, Khmer, and Lao, were studied using a newswire 20,000-sentence dataset. We compared the recovery performance of a support vector machine and an LSTM-based recurrent neural network, finding that the abugida graphemes could be recovered with 94% - 97% accuracy at the top-1 level and 98% - 99% at the top-4 level, even after omitting most diacritics (10 - 30 types) and merging the remaining 30 - 50 characters into 21 graphemes.

2017

pdf bib
NICT-NAIST System for WMT17 Multimodal Translation Task
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Second Conference on Machine Translation

pdf bib
Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing
Atsushi Fujita | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

Aiming at facilitating the research on quality estimation (QE) and automatic post-editing (APE) of machine translation (MT) outputs, especially for those among Asian languages, we have created new datasets for Japanese to English, Chinese, and Korean translations. As the source text, actual utterances in Japanese were extracted from the log data of our speech translation service. MT outputs were then given by phrase-based statistical MT systems. Finally, human evaluators were employed to grade the quality of MT outputs and to post-edit them. This paper describes the characteristics of the created datasets and reports on our benchmarking experiments on word-level QE, sentence-level QE, and APE conducted using the created datasets.

pdf bib
Ensemble and Reranking: Using Multiple Models in the NICT-2 Neural Machine Translation System at WAT2017
Kenji Imamura | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

In this paper, we describe the NICT-2 neural machine translation system evaluated at WAT2017. This system uses multiple models as an ensemble and combines models with opposite decoding directions by reranking (called bi-directional reranking). In our experimental results on small data sets, the translation quality improved when the number of models was increased to 32 in total and did not saturate. In the experiments on large data sets, improvements of 1.59-3.32 BLEU points were achieved when six-model ensembles were combined by the bi-directional reranking.

pdf bib
A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task
Yusuke Oda | Katsuhito Sudoh | Satoshi Nakamura | Masao Utiyama | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper describes the details about the NAIST-NICT machine translation system for WAT2017 English-Japanese Scientific Paper Translation Task. The system consists of a language-independent tokenizer and an attentional encoder-decoder style neural machine translation model. According to the official results, our system achieves higher translation accuracy than any systems submitted previous campaigns despite simple model architecture.

pdf bib
Context-Aware Smoothing for Neural Machine Translation
Kehai Chen | Rui Wang | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In Neural Machine Translation (NMT), each word is represented as a low-dimension, real-value vector for encoding its syntax and semantic information. This means that even if the word is in a different sentence context, it is represented as the fixed vector to learn source representation. Moreover, a large number of Out-Of-Vocabulary (OOV) words, which have different syntax and semantic information, are represented as the same vector representation of “unk”. To alleviate this problem, we propose a novel context-aware smoothing method to dynamically learn a sentence-specific vector for each word (including OOV words) depending on its local context words in a sentence. The learned context-aware representation is integrated into the NMT to improve the translation performance. Empirical results on NIST Chinese-to-English translation task show that the proposed approach achieves 1.78 BLEU improvements on average over a strong attentional NMT, and outperforms some existing systems.

pdf bib
Improving Neural Machine Translation through Phrase-based Forced Decoding
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using the phrase-based decoding cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of phrase-based SMT is limited by the phrase-based translation rule table. We propose a soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the forced decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.

pdf bib
Key-value Attention Mechanism for Neural Machine Translation
Hideya Mino | Masao Utiyama | Eiichiro Sumita | Takenobu Tokunaga
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this paper, we propose a neural machine translation (NMT) with a key-value attention mechanism on the source-side encoder. The key-value attention mechanism separates the source-side content vector into two types of memory known as the key and the value. The key is used for calculating the attention distribution, and the value is used for encoding the context representation. Experiments on three different tasks indicate that our model outperforms an NMT model with a conventional attention mechanism. Furthermore, we perform experiments with a conventional NMT framework, in which a part of the initial value of a weight matrix is set to zero so that the matrix is as the same initial-state as the key-value attention mechanism. As a result, we obtain comparable results with the key-value attention mechanism without changing the network structure.

pdf bib
Instance Weighting for Neural Machine Translation Domain Adaptation
Rui Wang | Masao Utiyama | Lemao Liu | Kehai Chen | Eiichiro Sumita
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Instance weighting has been widely applied to phrase-based machine translation domain adaptation. However, it is challenging to be applied to Neural Machine Translation (NMT) directly, because NMT is not a linear model. In this paper, two instance weighting technologies, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation. Empirical results on the IWSLT English-German/French tasks show that the proposed methods can substantially improve NMT performance by up to 2.7-6.7 BLEU points, outperforming the existing baselines by up to 1.6-3.6 BLEU points.

pdf bib
Neural Machine Translation with Source Dependency Representation
Kehai Chen | Rui Wang | Masao Utiyama | Lemao Liu | Akihiro Tamura | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Source dependency information has been successfully introduced into statistical machine translation. However, there are only a few preliminary attempts for Neural Machine Translation (NMT), such as concatenating representations of source word and its dependency label together. In this paper, we propose a novel NMT with source dependency representation to improve translation performance of NMT, especially long sentences. Empirical results on NIST Chinese-to-English translation task show that our method achieves 1.6 BLEU improvements on average over a strong NMT system.

pdf bib
Sentence Embedding for Neural Machine Translation Domain Adaptation
Rui Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance. Recently Neural Machine Translation (NMT) has become prominent in the field. However, most of the existing domain adaptation methods only focus on phrase-based machine translation. In this paper, we exploit the NMT’s internal embedding of the source sentence and use the sentence embedding similarity to select the sentences which are close to in-domain data. The empirical adaptation results on the IWSLT English-French and NIST Chinese-English tasks show that the proposed methods can substantially improve NMT performance by 2.4-9.0 BLEU points, outperforming the existing state-of-the-art baseline by 2.3-4.5 BLEU points.

2016

pdf bib
Unsupervised Word Alignment by Agreement Under ITG Constraint
Hidetaka Kamigaito | Akihiro Tamura | Hiroya Takamura | Manabu Okumura | Eiichiro Sumita
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Neural Machine Translation with Supervised Attention
Lemao Liu | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The attention mechanism is appealing for neural machine translation, since it is able to dynamically encode a source sentence by generating a alignment between a target word and source words. Unfortunately, it has been proved to be worse than conventional alignment models in alignment accuracy. In this paper, we analyze and explain this issue from the point view of reordering, and propose a supervised attention which is learned with guidance from conventional alignment models. Experiments on two Chinese-to-English translation tasks show that the supervised attention mechanism yields better alignments leading to substantial gains over the standard attention based NMT.

pdf bib
Connecting Phrase based Statistical Machine Translation Adaptation
Rui Wang | Hai Zhao | Bao-Liang Lu | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Although more additional corpora are now available for Statistical Machine Translation (SMT), only the ones which belong to the same or similar domains of the original corpus can indeed enhance SMT performance directly. A series of SMT adaptation methods have been proposed to select these similar-domain data, and most of them focus on sentence selection. In comparison, phrase is a smaller and more fine grained unit for data selection, therefore we propose a straightforward and efficient connecting phrase based adaptation method, which is applied to both bilingual phrase pair and monolingual n-gram adaptation. The proposed method is evaluated on IWSLT/NIST data sets, and the results show that phrase based SMT performances are significantly improved (up to +1.6 in comparison with phrase based SMT baseline system and +0.9 in comparison with existing methods).

pdf bib
A Prototype Automatic Simultaneous Interpretation System
Xiaolin Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Simultaneous interpretation allows people to communicate spontaneously across language boundaries, but such services are prohibitively expensive for the general public. This paper presents a fully automatic simultaneous interpretation system to address this problem. Though the development is still at an early stage, the system is capable of keeping up with the fastest of the TED speakers while at the same time delivering high-quality translations. We believe that the system will become an effective tool for facilitating cross-lingual communication in the future.

pdf bib
MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Rei Miyata | Anthony Hartley | Kyo Kageura | Cécile Paris | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system.

pdf bib
Bilingual Segmented Topic Model
Akihiro Tamura | Eiichiro Sumita
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Target-Bidirectional Neural Models for Machine Transliteration
Andrew Finch | Lemao Liu | Xiaolin Wang | Eiichiro Sumita
Proceedings of the Sixth Named Entity Workshop

pdf bib
Global Pre-ordering for Improving Sublanguage Translation
Masaru Fuji | Masao Utiyama | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

When translating formal documents, capturing the sentence structure specific to the sublanguage is extremely necessary to obtain high-quality translations. This paper proposes a novel global reordering method with particular focus on long-distance reordering for capturing the global sentence structure of a sublanguage. The proposed method learns global reordering models from a non-annotated parallel corpus and works in conjunction with conventional syntactic reordering. Experimental results on the patent abstract sublanguage show substantial gains of more than 25 points in the RIBES metric and comparable BLEU scores both for Japanese-to-English and English-to-Japanese translations.

pdf bib
NICT-2 Translation System for WAT2016: Applying Domain Adaptation to Phrase-based Statistical Machine Translation
Kenji Imamura | Eiichiro Sumita
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper describes the NICT-2 translation system for the 3rd Workshop on Asian Translation. The proposed system employs a domain adaptation method based on feature augmentation. We regarded the Japan Patent Office Corpus as a mixture of four domain corpora and improved the translation quality of each domain. In addition, we incorporated language models constructed from Google n-grams as external knowledge. Our domain adaptation method can naturally incorporate such external knowledge that contributes to translation quality.

pdf bib
An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation
Xiaolin Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

Simultaneous interpretation is a very challenging application of machine translation in which the input is a stream of words from a speech recognition engine. The key problem is how to segment the stream in an online manner into units suitable for translation. The segmentation process proceeds by calculating a confidence score for each word that indicates the soundness of placing a sentence boundary after it, and then heuristics are employed to determine the position of the boundaries. Multiple variants of the confidence scoring method and segmentation heuristics were studied. Experimental results show that the best performing strategy is not only efficient in terms of average latency per word, but also achieved end-to-end translation quality close to an offline baseline, and close to oracle segmentation.

pdf bib
Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation on raw parallel data from Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated on metrics of correspondence and order of tokens, based on several standard statistical machine translation techniques. The similarity shown in this study suggests a possibility on harmonious annotation and processing of the language pairs in future development.

pdf bib
Introducing the Asian Language Treebank (ALT)
Ye Kyaw Thu | Win Pa Pa | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces the ALT project initiated by the Advanced Speech Translation Research and Development Promotion Center (ASTREC), NICT, Kyoto, Japan. The aim of this project is to accelerate NLP research for Asian languages such as Indonesian, Japanese, Khmer, Laos, Malay, Myanmar, Philippine, Thai and Vietnamese. The original resource for this project was English articles that were randomly selected from Wikinews. The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future. A 20000-sentence corpus of Myanmar that has been manually translated from an English corpus has been word segmented, word aligned, part-of-speech tagged and constituency parsed by human annotators. In this paper, we present the implementation steps for creating the treebank in detail, including a description of the ALT web-based treebanking tool. Moreover, we report statistics on the annotation quality of the Myanmar treebank created so far.

pdf bib
ASPEC: Asian Scientific Paper Excerpt Corpus
Toshiaki Nakazawa | Manabu Yaguchi | Kiyotaka Uchimoto | Masao Utiyama | Eiichiro Sumita | Sadao Kurohashi | Hitoshi Isahara
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we describe the details of the ASPEC (Asian Scientific Paper Excerpt Corpus), which is the first large-size parallel corpus of scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).

pdf bib
Agreement on Target-bidirectional Neural Machine Translation
Lemao Liu | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Interlocking Phrases in Phrase-based Statistical Machine Translation
Ye Kyaw Thu | Andrew Finch | Eiichiro Sumita
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
Improving fast_align by Reordering
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Hierarchical Phrase-based Stream Decoding
Andrew Finch | Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model
Hidetaka Kamigaito | Taro Watanabe | Hiroya Takamura | Manabu Okumura | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Leave-one-out Word Alignment without Garbage Collector Effects
Xiaolin Wang | Masao Utiyama | Andrew Finch | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Binarized Neural Network Joint Model for Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Neural Network Transduction Models in Transliteration Generation
Andrew Finch | Lemao Liu | Xiaolin Wang | Eiichiro Sumita
Proceedings of the Fifth Named Entity Workshop

bib
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Graham Neubig | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

pdf bib
Overview of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Graham Neubig | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

pdf bib
NICT at WAT 2015
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

pdf bib
Transition-based Neural Constituent Parsing
Taro Watanabe | Eiichiro Sumita
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Hai Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
A Large-scale Study of Statistical Machine Translation Methods for Khmer Language
Ye Kyaw Thu | Vichet Chea | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

pdf bib
Integrating Dictionaries into an Unsupervised Model for Myanmar Word Segmentation
Ye Kyaw Thu | Andrew Finch | Eiichiro Sumita | Yoshinori Sagisaka
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Proceedings of the 1st Workshop on Asian Translation (WAT2014)
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Overview of the 1st Workshop on Asian Translation
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Word Order Does NOT Differ Significantly Between Chinese and Japanese
Chenchen Ding | Masao Utiyama | Eiichiro Sumita | Mikio Yamamoto
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Syntax-Augmented Machine Translation using Syntax-Label Clustering
Hideya Mino | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Learning Hierarchical Translation Spans
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Neural Network Based Bilingual Language Model Growing for Statistical Machine Translation
Rui Wang | Hai Zhao | Bao-Liang Lu | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation
Xiaolin Wang | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Recurrent Neural Networks for Word Alignment Model
Akihiro Tamura | Taro Watanabe | Eiichiro Sumita
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Dependency-based Pre-ordering for Chinese-English Machine Translation
Jingsheng Cai | Masao Utiyama | Eiichiro Sumita | Yujie Zhang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Empirical Study of Unsupervised Chinese Word Segmentation Methods for SMT on Large-scale Corpora
Xiaolin Wang | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Tuning SMT with a Large Number of Features via Online Feature Grouping
Lemao Liu | Tiejun Zhao | Taro Watanabe | Eiichiro Sumita
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
Rui Wang | Masao Utiyama | Isao Goto | Eiichro Sumita | Hai Zhao | Bao-Liang Lu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Distortion Model Considering Rich Context for Statistical Machine Translation
Isao Goto | Masao Utiyama | Eiichiro Sumita | Akihiro Tamura | Sadao Kurohashi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Additive Neural Networks for Statistical Machine Translation
Lemao Liu | Taro Watanabe | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Hierarchical Phrase Table Combination for Machine Translation
Conghui Zhu | Taro Watanabe | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
Akihiro Tamura | Taro Watanabe | Eiichiro Sumita | Hiroya Takamura | Manabu Okumura
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Post-ordering by Parsing for Japanese-English Statistical Machine Translation
Isao Goto | Masao Utiyama | Eiichiro Sumita
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Rescoring a Phrase-based Machine Transliteration System with Recurrent Neural Network Language Models
Andrew Finch | Paul Dixon | Eiichiro Sumita
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

pdf bib
Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
Akihiro Tamura | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
An Unsupervised Model for Joint Phrase Alignment and Extraction
Graham Neubig | Taro Watanabe | Eiichiro Sumita | Shinsuke Mori | Tatsuya Kawahara
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Machine Translation System Combination by Confusion Forest
Taro Watanabe | Eiichiro Sumita
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Reordering Constraint Based on Document-Level Context
Takashi Onishi | Masao Utiyama | Eiichiro Sumita
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Translation Quality Indicators for Pivot-based Statistical MT
Michael Paul | Eiichiro Sumita
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT
Michael Paul | Andrew Finch | Paul R. Dixon | Eiichiro Sumita
Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties

pdf bib
Integrating Models Derived from non-Parametric Bayesian Co-segmentation into a Statistical Machine Transliteration System
Andrew Finch | Paul Dixon | Eiichiro Sumita
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

pdf bib
Using Features from a Bilingual Alignment Model in Transliteration Mining
Takaaki Fukunishi | Andrew Finch | Seiichi Yamamoto | Eiichiro Sumita
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

2010

pdf bib
Community-based Construction of Draft and Final Translation Corpus Through a Translation Hosting Site Minna no Hon’yaku (MNH)
Takeshi Abekawa | Masao Utiyama | Eiichiro Sumita | Kyo Kageura
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf bib
Integration of Multiple Bilingually-Learned Segmentation Schemes into Statistical Machine Translation
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Transliteration Using a Phrase-Based Statistical Machine Translation System to Re-Score the Output of a Joint Multigram Model
Andrew Finch | Eiichiro Sumita
Proceedings of the 2010 Named Entities Workshop

pdf bib
Helping Volunteer Translators, Fostering Language Resources
Masao Utiyama | Takeshi Abekawa | Eiichiro Sumita | Kyo Kageura
Proceedings of the 2nd Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

pdf bib
Syntactic Constraints on Phrase Extraction for Phrase-Based Machine Translation
Hailong Cao | Andrew Finch | Eiichiro Sumita
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

pdf bib
Paraphrase Lattice for Statistical Machine Translation
Takashi Onishi | Masao Utiyama | Eiichiro Sumita
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Filtering Syntactic Constraints for Statistical Machine Translation
Hailong Cao | Eiichiro Sumita
Proceedings of the ACL 2010 Conference Short Papers

2009

pdf bib
Bidirectional Phrase-based Statistical Machine Translation
Andrew Finch | Eiichiro Sumita
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
NICT@WMT09: Model Adaptation and Transliteration for Spanish-English SMT
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation
Kei Hashimoto | Hirohumi Yamamoto | Hideo Okuma | Eiichiro Sumita | Keiichi Tokuda
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

pdf bib
Transliteration by Bidirectional Statistical Machine Translation
Andrew Finch | Eiichiro Sumita
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
On the Importance of Pivot Language Selection for Statistical Machine Translation
Michael Paul | Hirofumi Yamamoto | Eiichiro Sumita | Satoshi Nakamura
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2008

pdf bib
Chinese Unknown Word Translation by Subword Re-segmentation
Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Method of Selecting Training Data to Build a Compact and Efficient Translation Model
Keiji Yasuda | Ruiqiang Zhang | Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Achilles: NiCT/ATR Chinese Morphological Analyzer for the Fourth Sighan Bakeoff
Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

pdf bib
Phrase-based Machine Transliteration
Andrew Finch | Eiichiro Sumita
Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)

pdf bib
Dynamic Model Interpolation for Statistical Machine Translation
Andrew Finch | Eiichiro Sumita
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Improved Statistical Machine Translation by Multiple Chinese Word Segmentation
Ruiqiang Zhang | Keiji Yasuda | Eiichiro Sumita
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Imposing Constraints from the Source Tree on ITG Constraints for SMT
Hirofumi Yamamoto | Hideo Okuma | Eiichiro Sumita
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

pdf bib
Multilingual Mobile-Phone Translation Services for World Travelers
Michael Paul | Hideo Okuma | Hirofumi Yamamoto | Eiichiro Sumita | Shigeki Matsuda | Tohru Shimizu | Satoshi Nakamura
Coling 2008: Companion volume: Demonstrations

2007

pdf bib
NICT-ATR Speech-to-Speech Translation System
Eiichiro Sumita | Tohru Shimizu | Satoshi Nakamura
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation
Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Multilingual Spoken Language Corpus Development for Communication Research
Toshiyuki Takezawa | Genichiro Kikui | Masahide Mizushima | Eiichiro Sumita
International Journal of Computational Linguistics & Chinese Language Processing, Volume 12, Number 3, September 2007: Special Issue on Invited Papers from ISCSLP 2006

pdf bib
Bilingual Cluster Based Models for Statistical Machine Translation
Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
Exploiting Variant Corpora for Machine Translation
Michael Paul | Eiichiro Sumita
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Using the Web to Disambiguate Acronyms
Eiichiro Sumita | Fumiaki Sugaya
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Word Pronunciation Disambiguation using the Web
Eiichiro Sumita | Fumiaki Sugaya
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Subword-based Tagging by Conditional Random Fields for Chinese Word Segmentation
Ruiqiang Zhang | Genichiro Kikui | Eiichiro Sumita
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Using Lexical Dependency and Ontological Knowledge to Improve a Detailed Syntactic and Semantic Tagger of English
Andrew Finch | Ezra Black | Young-Sook Hwang | Eiichiro Sumita
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
Ruiqiang Zhang | Genichiro Kikui | Eiichiro Sumita
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

pdf bib
Measuring Non-native Speakers’ Proficiency of English by Using a Test with Automatically-Generated Fill-in-the-Blank Questions
Eiichiro Sumita | Fumiaki Sugaya | Seiichi Yamamoto
Proceedings of the Second Workshop on Building Educational Applications Using NLP

pdf bib
Acquiring Synonyms from Monolingual Comparable Texts
Mitsuo Shimohata | Eiichiro Sumita
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Using Machine Translation Evaluation Techniques to Determine Sentence-level Semantic Equivalence
Andrew Finch | Young-Sook Hwang | Eiichiro Sumita
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

2004

pdf bib
Example-based Rescoring of Statistical Machine Translation Output
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
Proceedings of HLT-NAACL 2004: Short Papers

pdf bib
Automatic Measuring of English Language Proficiency using MT Evaluation Technology
Keiji Yasuda | Fumiaki Sugaya | Eiichiro Sumita | Toshiyuki Takezawa | Genichiro Kikui | Seiichi Yamamoto
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning

pdf bib
Example-based Machine Translation Based on Syntactic Transfer with Statistical Models
Kenji Imamura | Hideo Okuma | Taro Watanabe | Eiichiro Sumita
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Splitting Input Sentence for Machine Translation Using Language Model with Sentence Similarity
Takao Doi | Eiichiro Sumita
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Reordering Constraints for Phrase-Based Statistical Machine Translation
Richard Zens | Hermann Ney | Taro Watanabe | Eiichiro Sumita
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Using a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs
Yasuhiro Akiba | Eiichiro Sumita | Hiromi Nakaiwa | Seiichi Yamamoto | Hiroshi G. Okuno
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Building a Paraphrase Corpus for Speech Translation
Mitsuo Shimohata | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Incremental Methods to Select Test Sentences for Evaluating Translation Ability
Yasuhiro Akiba | Eiichiro Sumita | Hiromi Nakaiwa | Seiichi Yamamoto | Hiroshi G. Okuno
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
How Does Automatic Machine Translation Evaluation Correlate with Human Scoring as the Number of Reference Translations Increases?
Andrew Finch | Yasuhiro Akiba | Eiichiro Sumita
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
Automatic Construction of Machine Translation Knowledge Using Translation Literalness
Kenji Imamura | Eiichiro Sumita | Yuji Matsumoto
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
A corpus-centered approach to spoken language translation
Eiichiro Sumita | Yasuhiro Akiba | Takao Doi | Andrew Finch | Kenji Imamura | Michael Paul | Mitsuo Shimohata | Taro Watanabe
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Adaptation Using Out-of-Domain Corpus within EBMT
Takao Doi | Eiichiro Sumita | Hirofumi Yamamoto
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib
Automatic Expansion of Equivalent Sentence Set Based on Syntactic Substitution
Kenji Imamura | Yasuhiro Akiba | Eiichiro Sumita
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib
Retrieving Meaning-equivalent Sentences for Example-based Rough Translation
Mitsuo Shimohata | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

pdf bib
Input Sentence Splitting and Translating
Takao Doi | Eiichiro Sumita
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

pdf bib
Chunk-Based Statistical Translation
Taro Watanabe | Eiichiro Sumita | Hiroshi G. Okuno
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Feedback Cleaning of Machine Translation Rules Using Automatic Evaluation
Kenji Imamura | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

2002

pdf bib
Automatic paraphrasing based on parallel corpus for normalization
Mitsuo Shimohata | Eiichiro Sumita
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Statistical Machine Translation on Paraphrased Corpora
Taro Watanabe | Mitsuo Shimohata | Eiichiro Sumita
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversations in the Real World
Toshiyuki Takezawa | Eiichiro Sumita | Fumiaki Sugaya | Hirofumi Yamamoto | Seiichi Yamamoto
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Corpus-Centered Computation
Eiichiro Sumita
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

pdf bib
Identifying Synonymous Expressions from a Bilingual Corpus for Example-Based Machine Translation
Mitsuo Shimohata | Eiichiro Sumita
COLING-02: Machine Translation in Asia

pdf bib
Corpus-based Generation of Numeral Classifier using Phrase Alignment
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Bidirectional Decoding for Statistical Machine Translation
Taro Watanabe | Eiichiro Sumita
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Using Language and Translation Models to Select the Best among Outputs from Multiple MT Systems
Yasuhiro Akiba | Taro Watanabe | Eiichiro Sumita
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
Example-based machine translation using DP-matching between work sequences
Eiichiro Sumita
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

pdf bib
Integration of Referential Scope Limitations into Japanese Pronoun Resolution
Michael Paul | Eiichiro Sumita
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue

2000

pdf bib
Translation using Information on Dialogue Participants
Setsuo Yamada | Eiichiro Sumita | Hideki Kashioka
Sixth Applied Natural Language Processing Conference

pdf bib
Lexical Transfer Using a Vector-Space Model
Eiichiro Sumita
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

pdf bib
Corpus-Based Anaphora Resolution Towards Antecedent Preference
Michael Paul | Kazuhide Yamamoto | Eiichiro Sumita
Coreference and Its Applications

1998

pdf bib
A Method for Correcting Errors in Speech Recognition using the Statistical Features of Character Co-occurrence
Satoshi Kaki | Eiichiro Sumita | Hitoshi Iida
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf bib
Feasibility Study for Ellipsis Resolution in Dialogues by Machine-Learning Technique
Kazuhide Yamamoto | Eiichiro Sumita
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
A Method for Correcting Errors in Speech Recognition Using the Statistical Features of Character Co-occurrence
Satoshi Kaki | Eiichiro Sumita | Hitoshi Iida
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

pdf bib
Feasibility Study for Ellipsis Resolution in Dialogues by Machine-Learning Technique
Kazuhide YAMAMOTO | Eiichiro SUMITA
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1996

pdf bib
Spoken-Language Translation Method Using Examples
Hitoshi Iida | Eiichiro Sumita | Osamu Furuse
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

pdf bib
Real-Time Spoken Language Translation Using Associative Processors
Kozo Oi | Eiichiro Sumita | Osamu Furuse | Hitoshi Iida | Tetsuya Higuchi
Fourth Conference on Applied Natural Language Processing

1991

pdf bib
EXPERIMENTS AND PROSPECTS OF EXAMPLE-BASED MACHINE TRANSLATION
Eiichiro SUMITA | Hitoshi HDA
29th Annual Meeting of the Association for Computational Linguistics

Search
Co-authors