Chasing the Perfect Splitter: A Comparison of Different Compound Splitting Tools

Carla Parra Escartín


Abstract
This paper reports on the evaluation of two compound splitters for German. Compounding is a very frequent phenomenon in German, and natural language processing applications therefore need efficient ways of detecting and correctly splitting compound words. The paper presents different strategies for compound splitting, focusing on German, and describes four German compound splitters. Two of them were used in Statistical Machine Translation (SMT) experiments and obtained very similar qualitative scores in terms of BLEU and TER, so a thorough evaluation of both has been carried out.
Anthology ID:
L14-1694
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3340–3347
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/909_Paper.pdf
Cite (ACL):
Carla Parra Escartín. 2014. Chasing the Perfect Splitter: A Comparison of Different Compound Splitting Tools. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3340–3347, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Chasing the Perfect Splitter: A Comparison of Different Compound Splitting Tools (Escartín, LREC 2014)