An Aligned French-Chinese corpus of 10K segments from university educational material

Ruslan Kalitvianski, Lingxiao Wang, Valérie Bellynck, Christian Boitet


Abstract
This paper describes a corpus of nearly 10K aligned French-Chinese segments, produced by native Chinese students who post-edited machine-translated computer science courseware. The corpus was built from 2013 to 2016 within the PROJECT_NAME project. As judged by native speakers, its quality is adequate for understanding (far better than reading only the original French) and for getting better marks. Each segment is annotated with a self-assessed quality score. The corpus has been used directly as supplemental training data to build a statistical machine translation system dedicated to this sublanguage, and it can also be used to extract the domain-specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.
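Because every segment carries a self-assessed quality score, one simple way to use the corpus as supplemental SMT training data is to keep only pairs above a chosen threshold. The following is a minimal sketch, not the authors' pipeline. It assumes a hypothetical tab-separated export with one French segment, its Chinese post-edition and an integer score per line. The file layout, the function names and the integer score scale are all illustrative assumptions, because the abstract does not specify them.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (French segment, Chinese post-edition, self-assessed score).
Segment = Tuple[str, str, int]


def parse_tsv_lines(lines: Iterable[str]) -> List[Segment]:
    """Parse hypothetical 'fr<TAB>zh<TAB>score' lines, skipping blank lines."""
    segments: List[Segment] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore empty lines between records
        fr, zh, score = line.split("\t")
        segments.append((fr, zh, int(score)))
    return segments


def select_training_pairs(segments: Iterable[Segment], min_score: int) -> List[Tuple[str, str]]:
    """Keep only aligned pairs whose self-assessed score is at least min_score."""
    return [(fr, zh) for fr, zh, score in segments if score >= min_score]


if __name__ == "__main__":
    sample = [
        "Un algorithme de tri.\t一个排序算法。\t4\n",
        "La pile est vide.\t栈是空的。\t2\n",
    ]
    pairs = select_training_pairs(parse_tsv_lines(sample), min_score=3)
    print(pairs)  # only the first pair passes the (assumed) threshold of 3
```

The threshold trades corpus size against translation quality; lowering it keeps more of the roughly 10K segments at the cost of admitting less confident post-editions.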
Anthology ID:
W16-4915
Volume:
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Hsin-Hsi Chen, Yuen-Hsien Tseng, Vincent Ng, Xiaofei Lu
Venue:
NLP-TEA
Publisher:
The COLING 2016 Organizing Committee
Pages:
117–121
URL:
https://aclanthology.org/W16-4915
Cite (ACL):
Ruslan Kalitvianski, Lingxiao Wang, Valérie Bellynck, and Christian Boitet. 2016. An Aligned French-Chinese corpus of 10K segments from university educational material. In Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016), pages 117–121, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
An Aligned French-Chinese corpus of 10K segments from university educational material (Kalitvianski et al., NLP-TEA 2016)
PDF:
https://aclanthology.org/W16-4915.pdf