Neural Machine Translation System using a Content-equivalently Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019

This paper describes the submission of NHK and NHK Engineering System (NHK-ES) to the newswire translation tasks of WAT 2019 in both directions, Japanese→English and English→Japanese. In addition to the JIJI Corpus officially provided by the task organizer, we developed a corpus of 0.22M sentence pairs by manually translating Japanese news sentences into English in a content-equivalent manner. The content-equivalent corpus was effective for improving translation quality, and our systems achieved the best human evaluation scores in the newswire translation tasks at WAT 2019.


Introduction
We participated in the newswire translation tasks with JIJI Corpus, one of the tasks in WAT 2019 (Nakazawa et al., 2019). JIJI Corpus, a Japanese-English news corpus, comes from Jiji Press news, which covers various categories including politics, economy, nation, business, markets, and sports. The official newswire tasks of WAT started in 2017, and some participants and the organizer had already submitted translation results before WAT 2019. Their quality, however, has not been comparable to that in other tasks, such as the scientific paper tasks and patent tasks. This is due not only to the small size (0.20M) of the JIJI Corpus but also to a significant amount of noise for neural machine translation (NMT) system training. The English news articles, which are written as news articles rather than as translations, are mainly targeted at native English speakers, so information is often omitted or added. Figure 1 shows an example from JIJI Corpus. The omitted and added phrases become noise for NMT training, and we consider this to be one of the reasons for the low translation quality. To solve this problem and improve the translation quality of an NMT system, we are making a corpus with content-equivalent English translations of Japanese Jiji Press news, i.e., translations that neither omit nor add information. We call this corpus the Equivalent-JIJI Corpus.

Figure 1: An example from JIJI Corpus. The original figure marks the omitted and added phrases.
Japanese sentence: ペットは機内では通常、貨物室で預かるが、「客室で一緒に過ごしたい」との声を受け同社の系列旅行会社が企画した。
English sentence: ANA Sales Co., a travel agency unit of ANA Holdings, organized the tour to meet requests from customers wanting to travel with their pets in the cabin.
Content-equivalent translation of Japanese sentence: Pets are usually kept in the cargo compartment in a plane. A travel agency unit of the company organized to meet requests from customers wanting to travel with their pets in the cabin.
In this system description paper, we focus on these two styles of news parallel data, the JIJI Corpus and the Equivalent-JIJI Corpus, and we call their styles the JIJI-style and the Equivalent-style, respectively. For WAT 2019, we submitted two translation results using translation systems adapted to the JIJI-style. In addition, to confirm the effectiveness of the content-equivalent translation, we submitted two more translation results using translation systems adapted to the Equivalent-style. The results showed that although our NMT systems adapted to the Equivalent-style scored lower than those adapted to the JIJI-style in the automatic evaluation, the ranking was reversed in the human evaluation.

Corpus Description
JIJI Corpus, which is extracted from Japanese and English Jiji Press news, is relatively small compared with the corpora used in other Japanese→English or English→Japanese tasks of WAT 2019. To alleviate this low-resource translation problem, Morishita et al. (2017) used other resources for pre-training and fine-tuned with JIJI Corpus. We also used external resources to improve the translation quality of the newswire tasks. For this purpose, we developed four types of corpora apart from JIJI Corpus. The first one was constructed through content-equivalent manual translation of Japanese Jiji Press news into English and is named Equivalent-JIJI Corpus. The second one was obtained through automatic sentence alignment between Japanese and English Jiji Press news using a sentence similarity score and is named Aligned-JIJI Corpus. The official JIJI Corpus is constructed in the same way. Both JIJI Corpus and Aligned-JIJI Corpus include noise when used as training data for an NMT system. The third corpus was constructed by back-translating monolingual English news sentences into Japanese (Sennrich et al., 2016b). This corpus is used for Japanese→English translation only. We named this parallel data BT-JIJI Corpus. For the back-translation, we used our best English→Japanese system adapted to the JIJI-style. Finally, we used another newspaper parallel corpus originating from the Yomiuri Shimbun, which we named Aligned-Yomiuri Corpus. Aligned-Yomiuri Corpus is built using a parallel sentence similarity score, as is the case with JIJI Corpus. Table 1 summarizes the details of each corpus.

Domain Adaptation Techniques
In this paper, we used a domain-adaptation technique to train a model adapted to the JIJI- and Equivalent-styles. The multi-domain method (Chu et al., 2017; Sennrich et al., 2016a) is one of the most effective approaches to leveraging out-of-domain data. Chu et al. (2017) proposed training an NMT system with multi-domain parallel corpora using domain tags such as "<domain-name>" attached to the respective corpora. We applied this domain adaptation with the names of the styles as domain tags. We used a "<JIJI-style>" tag for the JIJI, Aligned-JIJI, and BT-JIJI corpora and an "<Equivalent-style>" tag for Equivalent-JIJI Corpus. In addition, we used a "<YOMIURI-style>" tag for Aligned-Yomiuri Corpus because it comes from a newspaper other than Jiji Press news.
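The tagging scheme described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names and the dictionary layout are hypothetical, and it assumes that tags are prepended to already tokenized source sentences and that the subword segmentation step leaves each tag intact as a single token.

```python
from typing import Dict, List

# Style tag per corpus, following the assignment given in the text.
STYLE_TAGS: Dict[str, str] = {
    "JIJI": "<JIJI-style>",
    "Aligned-JIJI": "<JIJI-style>",
    "BT-JIJI": "<JIJI-style>",
    "Equivalent-JIJI": "<Equivalent-style>",
    "Aligned-Yomiuri": "<YOMIURI-style>",
}


def tag_source(sentence: str, corpus: str) -> str:
    """Prepend the style tag of `corpus` to one source sentence (hypothetical helper)."""
    return f"{STYLE_TAGS[corpus]} {sentence}"


def build_training_sources(corpora: Dict[str, List[str]]) -> List[str]:
    """Tag every source sentence of every corpus and concatenate the results."""
    return [
        tag_source(sentence, name)
        for name, sentences in corpora.items()
        for sentence in sentences
    ]
```

At translation time, the same helper would be used to select the output style, for example `tag_source(src, "Equivalent-JIJI")` to request an Equivalent-style translation.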

Experiments
In this study, we verified the effectiveness of the Equivalent-style translation through the following procedure. First, we trained multiple NMT models with different combinations of the five corpora shown in Table 1 and evaluated them on the official test-set, which contains 2,000 sentence pairs. Then, we evaluated these models on a further test-set of 1,764 sentence pairs extracted from Equivalent-JIJI Corpus, which is in the Equivalent-style, in contrast to the official test-set, which is extracted from JIJI Corpus and is in the JIJI-style. Finally, we evaluated the effectiveness of the Equivalent-style translation.

Data Processing and System Setup
All of the datasets were preprocessed as follows. We used the Moses toolkit to clean and tokenize the English data and used KyTea (Neubig et al., 2011) to tokenize the Japanese data. Then, we used a vocabulary of 32K units based on a joint source and target byte-pair encoding (BPE) (Sennrich et al., 2016c). For the translation model, we used the encoder and decoder of the transformer model (Vaswani et al., 2017), which is a state-of-the-art NMT model. The transformer model uses a multi-headed attention mechanism applied as self-attention and a position-wise fully connected feed-forward network. The encoder converts the received source language sentence into a sequence of continuous representations, and the decoder generates the target language sentence. We implemented our systems with the Sockeye toolkit (Hieber et al., 2018) and trained them on one Nvidia Tesla P100 GPU. While training our models, we used Adam (Kingma and Ba, 2015), a stochastic gradient-based optimizer, with a learning rate of 0.0002, multiplied by 0.7 after every eight checkpoints. We set the batch size to 5000 tokens and the maximum sentence length to 99 BPE units. For the other hyperparameters of our models, we used the default parameter values of Sockeye. We used early stopping with a patience of 32. In the experiments, decoding was performed with beam search with a beam size of 5, and we did not apply ensemble decoding with multiple models, although this could improve the translation quality. For the official submission, in contrast, we used beam search with a beam size of 30 and an ensemble of ten models. To evaluate translation quality, we used BLEU (Papineni et al., 2002), calculated using multi-bleu.perl. We report case-sensitive scores.
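To make the BLEU metric used here concrete, the following is a minimal sketch of corpus-level BLEU with one reference per sentence: clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty. It is not multi-bleu.perl itself; whitespace tokenization and the function names are assumptions made for illustration, and scores from the actual script may differ in details.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (0 to 100) with a single reference per hypothesis."""
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts = ngram_counts(h, n)
            r_counts = ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    # Without smoothing, a zero precision at any order gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Because the score depends only on n-gram overlap with the reference, an output that is faithful to the source can still score low when the reference itself omits or adds information.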

Results
Tables 2 and 3 show the experimental results. The Training corpus column shows the corpora used for training. The Style column shows the tag used for translation, i.e., the JIJI- or Equivalent-style.
The JIJI-style test-set is identical to the official test-set in the newswire task of WAT 2019.

Trained with Different Combinations of Five Corpora
The JIJI-style test-set column of Tables 2 and 3 shows the translation quality on the JIJI-style test-sets, measured with BLEU, for different combinations of the five corpora. For the models without domain adaptation, where the Domain adaptation column is "No," the BLEU scores are improved by adding the other domains' data into the JIJI Corpus. In the case of the Japanese→English task with JIJI Corpus and Equivalent-JIJI Corpus, the origin of the target-side English sentences differs between the two corpora (Jiji Press news and content-equivalent translation) even though the origin of the source side is the same (Jiji Press news), so the NMT system cannot decide which style, JIJI- or Equivalent-style, should be output. In contrast, no choice is necessary for the English→Japanese task because the target-side Japanese sentences have the same origin (Jiji Press news).
The Equivalent-style test-set column in Tables 2 and 3 shows the translation quality on the Equivalent-style test-sets. For the models without domain adaptation, the BLEU scores are not improved by adding the other domains' data into the Equivalent-JIJI Corpus in the case of the Japanese→English task. The domain adaptation using tags is extremely effective for the Japanese→English task. Although the amount of Equivalent-style data is much smaller than that of JIJI-style data, the BLEU scores for the Equivalent-style test-set are higher than those for the JIJI-style test-set. In particular, the BLEU scores on the Equivalent-style test-set for English→Japanese are over 43. It appears to be more difficult to improve the translation quality for the JIJI-style test-set than for the Equivalent-style test-set because the JIJI-style test-set includes noise for training the NMT system.

Translation with Different Types of Systems
Supposing that JIJI Corpus includes noise, the NMT system adapted to the Equivalent-style seems to be a better system for translating news in general. However, the BLEU scores on the JIJI-style test-set of the system trained with Equivalent-JIJI Corpus are 9.15 for Japanese→English and 17.92 for English→Japanese, which are lower than the scores of the system trained with JIJI Corpus, as shown in Tables 2 and 3. To determine whether the translation systems adapted to the Equivalent-style are better in human evaluation than those adapted to the JIJI-style, we submitted the translation results of both the systems adapted to the JIJI-style and those adapted to the Equivalent-style.

Official Results
For our WAT 2019 submission, we used the translation systems in the bottom rows of Tables 2 and 3. These systems can be adapted to each style by attaching a domain tag, "<JIJI-style>" for JIJI-style translation or "<Equivalent-style>" for Equivalent-style translation, at the beginning of the source sentence.
To improve the translation quality further, we submitted translation results obtained with ensemble decoding of ten models and beam search with a beam size of 30. Table 4 shows the official results of our submission to WAT 2019. Our systems adapted to the JIJI-style achieved the best BLEU and RIBES scores. In contrast, for the pairwise crowdsourcing evaluation and the JPO adequacy evaluation, our systems adapted to the Equivalent-style achieved the best scores. For AMFM, our system adapted to the JIJI-style achieved the best score for the Japanese→English task, whereas our system adapted to the Equivalent-style achieved the best score for the English→Japanese task. These human evaluation results suggest that the NMT systems adapted to the Equivalent-style are generally better systems for translating the news. The overview paper for WAT 2019 gives the details of our submission, including the other WAT participants' results.

Table 5: Examples of NMT outputs adapted to the JIJI- and Equivalent-styles in the official tasks, with sentence BLEU scores.

Example 1
Source: (Japanese sentence not preserved)
Content-equivalent translation: After the meeting, Akamatsu emphasized to the press, "We will hear opinions of the public in a fair and equitable manner," and Eda said, "We will accept the results of the objective investigation."
Reference (JIJI Corpus): After the meeting, Akamatsu told reporters, "We will seek the views of the public in a fair and equitable manner."
NMT output adapted to JIJI-style (BLEU 51.61): After the meeting, Akamatsu told reporters that he will listen to public opinions in a fair and equitable manner.
NMT output adapted to Equivalent-style (BLEU 22.70): After the meeting, Akamatsu emphasized to the press, "We will listen to the opinions of the people in a fair and fair manner," and Eda said, "We will accept the results of the investigation objectively."

Example 2
Source: (Japanese sentence not preserved)
Content-equivalent translation: It is expected to be passed and enacted at a plenary session of the House of Councilors in the afternoon of the same day.
Reference (JIJI Corpus): The House of Councillors, the upper chamber of the Diet, approved the spending program at a plenary meeting on Monday afternoon after the House of Representatives, the lower chamber, passed it earlier in the day.
NMT output adapted to JIJI-style (BLEU 26.33): The House of Councillors, the upper chamber, is expected to approve the bill at a plenary meeting later in the day.
NMT output adapted to Equivalent-style (BLEU 0.00): It is expected to be passed and enacted at the plenary session of the House of Councilors in the afternoon of the same day.

Further Human Evaluation
Apart from the official pairwise crowdsourcing evaluation and JPO adequacy evaluation, we also had our official submission evaluated independently by a translation company in order to analyze the official results in more depth. We randomly selected 300 and 50 sentences from the Japanese→English and English→Japanese official test-sets, respectively, and three evaluators counted the number of omitted and added words in the NMT outputs adapted to the JIJI- and Equivalent-styles. Table 6 shows the number of omitted and added words per 100 words, averaged over the three evaluators. These results indicate that the NMT systems adapted to the Equivalent-style can prevent the omission and addition of information. Table 5 shows examples of NMT outputs adapted to the JIJI- and Equivalent-styles in the official tasks. In the NMT output adapted to the JIJI-style, the first example shows omitted information and the second example shows added information. The references also include omitted and added information. The NMT output adapted to the Equivalent-style is translated without omitted or added information. The sentence BLEU scores of the outputs adapted to the JIJI-style are higher than those of the outputs adapted to the Equivalent-style. These results indicate that NMT outputs adapted to the JIJI-style often include omissions and additions of information, which leads to worse human evaluation scores. This seems to be a reason why our systems adapted to the Equivalent-style, which prevent the omission and addition of information, achieved the best human evaluation scores in spite of their lower BLEU scores.
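The per-100-words figures reported in Table 6 amount to a simple normalization followed by an average over evaluators. The sketch below illustrates the arithmetic only. It assumes that the denominator is the total number of words in the evaluated outputs, which the text does not state explicitly, and the function names and the counts in the usage note are hypothetical, not values from the paper.

```python
from typing import List


def rate_per_100_words(flagged_words: int, total_words: int) -> float:
    """Number of flagged (omitted or added) words per 100 words."""
    return 100.0 * flagged_words / total_words


def average_rate(counts_by_evaluator: List[int], total_words: int) -> float:
    """Average the per-100-words rate over all evaluators."""
    rates = [rate_per_100_words(c, total_words) for c in counts_by_evaluator]
    return sum(rates) / len(rates)
```

For instance, with made-up counts of 10, 20, and 30 omitted words from three evaluators over 1,000 words, `average_rate([10, 20, 30], 1000)` gives 2.0 omitted words per 100 words.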

Conclusions
In this description paper, we presented our NMT systems adapted to the JIJI- and Equivalent-styles. In addition to the JIJI Corpus in the JIJI-style that was officially provided by the WAT 2019 organizer, we developed a corpus of 0.22M sentence pairs in the Equivalent-style by manually translating Japanese news sentences into English in a content-equivalent manner. We obtained state-of-the-art results for the newswire tasks of WAT 2019. Among our four submissions, the translation models adapted to the JIJI-style achieved the best results in the BLEU evaluation. In contrast, the translation models adapted to the Equivalent-style achieved the best results in the pairwise crowdsourcing evaluation and the JPO adequacy evaluation. We showed that content-equivalently translated data is effective for general news translation from the perspective of human evaluation.