Building a Corpus of Manually Revised Texts from Discourse Perspective

Ryu Iida, Takenobu Tokunaga


Abstract
This paper presents the construction of a corpus of manually revised texts that includes both pre-revision and post-revision information. To create such a corpus, we propose a procedure for revising a text from a discourse perspective. The procedure consists of dividing a text into discourse units, organising and reordering groups of discourse units, and finally modifying referring and connective expressions; each of these steps imposes limits on the freedom of revision. Following the procedure, six revisers with sufficient experience in either teaching Japanese or scoring Japanese essays revised 120 Japanese essays written by native speakers of Japanese. Comparing the original and revised texts, we found that some specific manual revisions occurred frequently, e.g. ‘thesis’ statements were frequently placed at the beginning of a text. We also evaluate text coherence using the original and revised texts on the task of pairwise information ordering, i.e. identifying the more coherent text of a pair. The experimental results using two text coherence models demonstrated that the two models did not outperform the random baseline.
Anthology ID:
L14-1173
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
936–941
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/155_Paper.pdf
Cite (ACL):
Ryu Iida and Takenobu Tokunaga. 2014. Building a Corpus of Manually Revised Texts from Discourse Perspective. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 936–941, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Building a Corpus of Manually Revised Texts from Discourse Perspective (Iida & Tokunaga, LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/155_Paper.pdf