Dictionary Look-up with Katakana Variant Recognition

Satoshi Sato


Abstract
Japanese has a rich variety and a large number of word variants. Since the 1980s, this richness has been recognized as an obstacle to natural language processing, but no complete solution has been presented. This paper proposes a method for recognizing Katakana variants, a major type of word variant in Japanese, during dictionary look-up. For a given set of variant generation rules, the method performs variant generation and entry retrieval simultaneously and efficiently. We have developed a seven-layered rule set (216 rules in total) based on the specification manual of UniDic-2.1.0 and other sources. One experiment shows that the spelling-variant generator, which uses the 102 rules in the first five layers, is almost perfect. Another experiment shows that the form-variant generator, which uses all 216 rules, is powerful: 77.7% of the multiple spellings of Katakana loanwords are unnecessary (i.e., can be removed). This result means that the proposed method can drastically reduce the number of variants that must be registered in a dictionary in advance.
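The idea of interleaving variant generation with entry retrieval can be illustrated with a small sketch. The abstract does not describe the actual algorithm or list the 216 rules, so everything below is an assumption made for illustration: the trie-based dictionary, the names `RULES`, `build_trie`, `walk`, and `lookup`, and the four rewrite rules (optional long-vowel mark, ヴァ/バ alternation) are hypothetical and are not taken from the paper's rule set. Each rule maps a substring on the query side to a substring on the dictionary side, and rules are applied while walking the trie, so variants that cannot lead to an entry are abandoned early instead of being generated in full.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rewrite rules (query side, dictionary side).
# Illustrative only; these are NOT the rules from the paper.
RULES: List[Tuple[str, str]] = [
    ("ー", ""),      # query has a long-vowel mark that the entry omits
    ("", "ー"),      # entry has a long-vowel mark that the query omits
    ("ヴァ", "バ"),  # query spells "va" as ヴァ, entry uses バ
    ("バ", "ヴァ"),  # the reverse direction
]

END = ""  # trie key marking the end of an entry; no single character equals ""


def build_trie(entries: List[str]) -> Dict:
    """Store dictionary entries in a nested-dict trie."""
    root: Dict = {}
    for word in entries:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def walk(node: Dict, text: str):
    """Follow `text` from `node`; return the reached node or None."""
    for ch in text:
        node = node.get(ch)
        if node is None:
            return None
    return node


def lookup(trie: Dict, query: str) -> Set[str]:
    """Return entries that match `query` exactly or through RULES."""
    found: Set[str] = set()
    seen: Set[Tuple[int, int]] = set()
    stack = [(0, trie)]  # (position in query, current trie node)
    while stack:
        pos, node = stack.pop()
        state = (pos, id(node))
        if state in seen:
            continue
        seen.add(state)
        if pos == len(query) and END in node:
            found.add(node[END])
        # Literal step: consume one query character unchanged.
        if pos < len(query):
            nxt = node.get(query[pos])
            if nxt is not None:
                stack.append((pos + 1, nxt))
        # Rule step: rewrite the query side into the dictionary side,
        # keeping the branch only if the trie contains that continuation.
        for q_side, d_side in RULES:
            if query.startswith(q_side, pos):
                nxt = walk(node, d_side)
                if nxt is not None:
                    stack.append((pos + len(q_side), nxt))
    return found


trie = build_trie(["コンピューター", "バイオリン", "データ"])
print(lookup(trie, "コンピュータ"))
print(lookup(trie, "ヴァイオリン"))
```

In this sketch only the canonical spelling of each word is registered, and the other spellings are recovered at look-up time, which mirrors the abstract's point that fewer variants need to be stored in advance. Unrestricted rules like these can over-generate; a layered rule set such as the one the paper describes would presumably constrain where each rule may apply.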
Anthology ID:
L12-1121
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
249–255
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/282_Paper.pdf
Cite (ACL):
Satoshi Sato. 2012. Dictionary Look-up with Katakana Variant Recognition. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 249–255, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Dictionary Look-up with Katakana Variant Recognition (Sato, LREC 2012)