Clustering of Terms from Translation Dictionaries and Synonyms Lists to Automatically Build more Structured Linguistic Resources

Maria Teresa Pazienza, Armando Stellato


Abstract
Building a Linguistic Resource (LR) is a task requiring a huge quantitative of means, human resources and funds. Though finalization of the development phase and assessment of the produced resource, necessarily require human involvement, a computer aided process for building the resource’s initial structure would greatly reduce the overall effort to be undertaken. We present here a novel approach for automatizing the process of building structured (possibly multilingual) LRs, starting from already available LRs and exploiting simple vocabularies of synonyms and/or translations for different languages. A simple algorithm for clustering terms, according to their shared senses, is presented in two versions, both for separating flat list of synonyms and flat lists of translations. The algorithm is then motivated against two possible exploitations: reducing the cost for producing new LRs, and linguistically enriching the content of existing semantic resources, like SW ontologies and knowledge bases. Empirical results are provided for two experimental setups: automatic term clustering for English synonyms list, and for Italian translations of English terms
Anthology ID:
L08-1607
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/629_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Maria Teresa Pazienza and Armando Stellato. 2008. Clustering of Terms from Translation Dictionaries and Synonyms Lists to Automatically Build more Structured Linguistic Resources. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Clustering of Terms from Translation Dictionaries and Synonyms Lists to Automatically Build more Structured Linguistic Resources (Pazienza & Stellato, LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/629_paper.pdf