The 3rd Workshop on Asian Translation

Event Notification Type: 
Call for Participation
Abbreviated Title: 
WAT2016
Location: 
Osaka International Convention Center
Monday, 12 December 2016
Country: 
Japan
City: 
Osaka
Contact: 
Toshiaki Nakazawa
Submission Deadline: 
Friday, 19 August 2016

WAT 2016
(The 3rd Workshop on Asian Translation)
in conjunction with COLING 2016
http://lotus.kuee.kyoto-u.ac.jp/WAT/
December 12, 2016, Osaka, Japan

****************************** UPDATES *****************************
Japanese/Chinese <-> English newswire translation subtasks are cancelled.
********************************************************************

Following the success of the previous Workshops on Asian Translation
(WAT 2014 and WAT 2015), WAT 2016 will bring together machine
translation researchers and users to try, evaluate, share and discuss
brand-new ideas of machine translation. We are working toward the
practical use of machine translation among all Asian countries.

For WAT 2016, we adopt the new translation subtasks
"Hindi-to-English/Japanese mixed domain translations" as well as
"Indonesian-to-English newswire translation" in
addition to the subtasks that were conducted in the previous two
workshops. The workshop will also feature research papers on topics
related to machine translation, especially for Asian languages.

WAT 2016 also invites researchers to submit their original work on
machine translation of Asian languages. The scope covers studies and
reports on theories, techniques, and resources to improve the machine
translation of Asian languages. All submitted research papers will
undergo double-blind peer review to decide whether they will
appear at the workshop.

Topics of interest include, but are not limited to:
- Word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid machine translation
- Asian language processing
- Incorporating linguistic information into machine translation
- Decoding algorithms
- System combination
- Error analysis
- Manual and automatic machine translation evaluation
- Machine translation applications
- Quality estimation
- Domain adaptation
- Machine translation for low resource languages
- Language resources

************************* IMPORTANT NOTICE *************************
Participants of the previous workshop are also required to sign up to
WAT2016
********************************************************************

IMPORTANT DATES
---------------

August 19     Crowdsourcing evaluation due
September 25  System description draft and research paper (new!) due
October 16    System description draft review feedback
October 16    Research paper acceptance notification
October 30    System description and research paper camera-ready due
December 12   WAT 2016

TASK
----

The task is to improve the text translation quality for scientific
papers and patent documents. Participants choose any of the subtasks
in which they would like to participate and translate the test data
using their machine translation systems. The WAT organizers will
evaluate the results submitted using automatic evaluation and human
evaluation. We will also provide a baseline machine translation.

Subtasks:
Scientific Paper Subtasks:
English/Chinese <--> Japanese
Patent Subtasks:
English/Chinese/Korean <--> Japanese
Newswire Subtasks:
Indonesian <--> English
Mixed domain Subtasks:
Hindi <--> English/Japanese

Dataset:

* Scientific paper Subtasks:

WAT uses ASPEC for the dataset, including training, development,
development test and test data. Participants in the scientific paper
subtasks must obtain a copy of ASPEC themselves. ASPEC consists of
approximately 3 million Japanese-English parallel sentences from paper
abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
parallel sentences from paper excerpts (ASPEC-JC).

* Patent Subtasks:

WAT uses the JPO Patent Corpus, which is constructed by the Japan
Patent Office (JPO). This corpus consists of 1 million Chinese-Japanese
parallel sentences and 1 million Korean-Japanese parallel sentences
from patent descriptions in four categories. Participants in the patent
subtasks are required to obtain it from the WAT2016 site for the JPO
Patent Corpus.

* Newswire Subtasks (Indonesian <--> English):

WAT uses the BPPT Corpus, which is constructed by Badan Pengkajian dan
Penerapan Teknologi (BPPT). This corpus consists of 50,000
Indonesian-English parallel sentences from news descriptions in five
categories. Participants in the newswire subtasks are required to
obtain it from the WAT2016 site for the BPPT Corpus.

* Mixed domain Subtask:

- Hindi <--> English
WAT uses the IITB Corpus for the dataset, including training, development, development test and test data. The training corpus is mixed domain and contains around 1 million lines of sentences and phrases. To access the corpus, participants should sign the following agreement, scan it, and send it to the address mentioned in it. The development and test sets are from the News domain and are exactly the same as the ones in WMT 2014.

- Hindi <--> Japanese Pivot Language Task
For the first time, we are introducing a pivot language task. For this task, participants can use the following corpora:
A parallel corpus (created using openly available corpora), which is located here.
The Hindi-English (IITB) task corpus and the English-Japanese (ASPEC) task corpus for pivoting. For triangulation of the source-pivot and pivot-target phrase tables, participants can use the scripts provided by MultiMT.
The objective of this task is to compare the performance of a baseline system constructed only on a mixed domain parallel corpus with that of a system that uses an additional mixed domain corpus by means of pivoting.
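
For readers new to pivoting, the idea behind phrase table triangulation can be sketched in a few lines of Python. This is only a toy illustration, not the MultiMT scripts: the function name triangulate, the nested-dictionary table layout, and the romanized example phrases with their probabilities are all assumptions made up for this sketch. It estimates p(target | source) by summing p(pivot | source) * p(target | pivot) over the pivot phrases shared by both tables.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Build a source-target table from source-pivot and pivot-target tables.

    Each table maps a phrase to a dict of {translation: probability}.
    The result approximates p(tgt | src) as the sum over pivot phrases of
    p(pivot | src) * p(tgt | pivot).
    """
    src_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_tgt.get(pivot, {}).items():
                src_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in src_tgt.items()}


# Hypothetical toy tables (romanized, invented probabilities).
hi_en = {"paani": {"water": 0.9, "rain": 0.1}}
en_ja = {"water": {"mizu": 1.0}, "rain": {"ame": 0.8, "uten": 0.2}}

hi_ja = triangulate(hi_en, en_ja)
for tgt, prob in sorted(hi_ja["paani"].items(), key=lambda kv: -kv[1]):
    print(f"paani -> {tgt}: {prob:.2f}")
```

Real phrase tables carry several feature scores per phrase pair and are far too large to hold in nested dictionaries, so a practical system would stream sorted tables and prune low-probability entries; the sketch only shows the marginalization step.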

EVALUATION
----------

Automatic evaluation:
We provide an automatic evaluation server. It is free for
everyone, but you need to create an account to submit translations
for evaluation. Viewing the list of evaluation results does not
require an account.

Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/registration/index.html
Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/index.html

Human evaluation:
Both crowdsourcing evaluation (for all the submissions) and JPO
adequacy evaluation (for selected subtasks and selected submissions)
will be carried out. Participants can submit one translation result
for each subtask. The details of the selection will be announced
later.

INVITED TALK
------------

TBA

ORGANIZERS
----------

Toshiaki Nakazawa, Japan Science and Technology Agency (JST), Japan
Hideya Mino, National Institute of Information and Communications Technology (NICT), Japan
Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan
Isao Goto, Japan Broadcasting Corporation (NHK), Japan
Graham Neubig, Nara Institute of Science and Technology (NAIST), Japan
Sadao Kurohashi, Kyoto University, Japan
Ir. Hammam Riza, Agency for the Assessment and Application of Technology (BPPT), Indonesia
Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IIT), India

CONTACT
-------

wat@nlp.ist.i.kyoto-u.ac.jp