The Kyutech corpus and topic segmentation using a combined method

Takashi Yamamura, Kazutaka Shimada, Shintaro Kawahara


Abstract
Summarization of multi-party conversation is an important task in natural language processing. In this paper, we describe a Japanese corpus and a topic segmentation task. To the best of our knowledge, it is the first Japanese corpus that is annotated for summarization tasks and freely available to anyone. We call it “the Kyutech corpus.” The conversations in the corpus are decision-making tasks with four participants, and the corpus contains utterances with time information, topic segmentation, and reference summaries. As a case study for the corpus, we describe a topic segmentation method that combines LCSeg and TopicTiling. We discuss the effectiveness and the problems of the combined method through an experiment with the Kyutech corpus.
Anthology ID:
W16-5412
Volume:
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Koiti Hasida, Kam-Fai Wong, Nicoletta Calzolari, Key-Sun Choi
Venue:
ALR
Publisher:
The COLING 2016 Organizing Committee
Pages:
95–104
URL:
https://aclanthology.org/W16-5412
Cite (ACL):
Takashi Yamamura, Kazutaka Shimada, and Shintaro Kawahara. 2016. The Kyutech corpus and topic segmentation using a combined method. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12), pages 95–104, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
The Kyutech corpus and topic segmentation using a combined method (Yamamura et al., ALR 2016)
PDF:
https://aclanthology.org/W16-5412.pdf