Annotating dropped pronouns in Chinese newswire text

Elizabeth Baran, Yaqin Yang, Nianwen Xue


Abstract
We propose an annotation framework to explicitly identify dropped subject pronouns in Chinese. We acknowledge and specify 10 concrete pronouns that exist as words in Chinese and 4 abstract pronouns that do not correspond to Chinese words but that are recognized conceptually by native Chinese speakers. These abstract pronouns are identified as "unspecified", "pleonastic", "event", and "existential" and are argued to exist cross-linguistically. We trained two annotators, fluent in Chinese, and adjudicated their annotations to form a gold standard. We achieved an inter-annotator agreement kappa of 0.6 and an observed agreement of 0.7. We found that annotators had the most difficulty with the abstract pronouns, such as "unspecified" and "event", but we posit that further specification and training have the potential to significantly improve these results. We believe that this annotated data will help improve Machine Translation models that translate from Chinese to a non-pro-drop language, like English, that requires all subject pronouns to be explicit.
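The relationship between the two agreement figures reported in the abstract can be illustrated with a small sketch. It assumes the kappa in question is Cohen's kappa for two annotators, which the abstract does not state explicitly. The function names and the toy label sequences below are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(kappa, p_o):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (p_o - kappa) / (1 - kappa)


# Hypothetical annotations of six dropped-subject positions.
annotator_a = ["他", "unspecified", "event", "我们", "他", "pleonastic"]
annotator_b = ["他", "event", "event", "我们", "他", "pleonastic"]

print(observed_agreement(annotator_a, annotator_b))
print(cohen_kappa(annotator_a, annotator_b))
# Chance agreement implied by the reported kappa and observed agreement,
# valid only under the Cohen's kappa assumption stated above.
print(implied_chance_agreement(0.6, 0.7))
```

Under that assumption, a kappa of 0.6 with an observed agreement of 0.7 corresponds to a chance agreement of about 0.25.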
Anthology ID:
L12-1177
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2795–2799
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/361_Paper.pdf
Cite (ACL):
Elizabeth Baran, Yaqin Yang, and Nianwen Xue. 2012. Annotating dropped pronouns in Chinese newswire text. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2795–2799, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Annotating dropped pronouns in Chinese newswire text (Baran et al., LREC 2012)