Patent Machine Translation Task (PatentMT) at NTCIR-9

Event Notification Type: 
Call for Participation
Abbreviated Title: 
PatentMT @ NTCIR-9
Location: 
National Institute of Informatics (NII)
Tuesday, 6 December 2011 to Friday, 9 December 2011
Country: 
Japan
City: 
Tokyo
Contact: 
Chinese-English: Benjamin K. Tsou (Hong Kong Institute of Education, China)
Japanese-English: Isao Goto (NICT, Japan)
Submission Deadline: 
Thursday, 20 January 2011

-------------------------------------------------------------------------

Patent Machine Translation Task (PatentMT) at NTCIR-9
Call for Participation

December 6-9, 2011, Tokyo, Japan
http://ntcir.nii.ac.jp/PatentMT/

- A Chinese to English subtask has been newly added
- Human evaluations will be carried out
- Parallel corpora provided: 1 million Chinese-English and
  3 million Japanese-English sentence pairs

-------------------------------------------------------------------------

Interested participants are invited to take part in the Patent Machine
Translation task (PatentMT) at NTCIR-9.

Patents are one of the most challenging domains for machine
translation because their sentences can be very long and structurally
complex. There is also a significant practical need for machine
translation of patent documents. We invite you to help advance this
challenging and practically important research field.
Participants can use a large-scale patent parallel corpus for research
and will benefit from reliable human evaluation of their MT quality.

PatentMT, while cast in a framework of friendly competition, has the
ultimate goal of fostering scientific cooperation. In this context, the
organizers propose a research task and an open experimental
infrastructure for the scientific community working on machine
translation research.

PatentMT at NTCIR-9 offers three subtasks:
- Chinese to English
- Japanese to English
- English to Japanese
Participants choose the subtasks in which they would like to
participate.

Parallel patent corpora, drawn mostly from patent descriptions, and
monolingual patent corpora will be provided for training. The training
resources are as follows:
- Chinese to English subtask:
  * 1 million Chinese-English parallel sentences
  * Large-scale monolingual patent documents in English
- Japanese to English subtask:
  * 3 million Japanese-English parallel sentences
  * Large-scale monolingual patent documents in English
- English to Japanese subtask:
  * 3 million Japanese-English parallel sentences
  * Large-scale monolingual patent documents in Japanese
Moreover, blind test sets of patent descriptions will be released.
Participants are requested to machine translate the test sets.
Use of the data will be governed by the NTCIR-9 agreement, which is
under preparation.

The submitted translation results will be evaluated by both human
evaluation and automatic evaluation, with human evaluation as the
primary evaluation. The MT systems can thus be assessed comprehensively
and from a human perspective.
Result feedback will be released by the organizers after the evaluation
of the run submissions.

Participants are requested to submit a paper describing their MT system,
the resources used, and their results on the provided test data,
and to present the paper at the workshop.

=== Important Dates

- Training data release: January 5, 2011
- Task registration due: January 20, 2011
- Test data release: May 9, 2011
- Translation results submission due: May 22, 2011
- Evaluation results release: August 19, 2011
- MT system description due: September 20, 2011
- Camera-ready due: November 4, 2011
- NTCIR-9 workshop: December 6-9, 2011

=== Organizers

Chinese-English side:
- Benjamin K. Tsou (Hong Kong Institute of Education/
  City University of Hong Kong, China)
- Kapo Chow (Hong Kong Institute of Education, China)
- Bin Lu (City University of Hong Kong, China)

Japanese-English side:
- Isao Goto (NICT, Japan)
- Eiichiro Sumita (NICT, Japan)

For more information, please visit the NTCIR-9 PatentMT web site:
http://ntcir.nii.ac.jp/PatentMT/

-------------------------------------------------------------------------