Semantic Evaluation of Machine Translation

Billy Tak-Ming Wong


Abstract
It is recognized that many evaluation metrics of machine translation in use that focus on surface word level suffer from their lack of tolerance of linguistic variance, and the incorporation of linguistic features can improve their performance. To this end, WordNet is therefore widely utilized by recent evaluation metrics as a thesaurus for identifying synonym pairs. On this basis, word pairs in similar meaning, however, are still neglected. We investigate the significance of this particular word group to the performance of evaluation metrics. In our experiments we integrate eight different measures of lexical semantic similarity into an evaluation metric based on standard measures of unigram precision, recall and F-measure. It is found that a knowledge-based measure proposed by Wu and Palmer and a corpus-based measure, namely Latent Semantic Analysis, lead to an observable gain in correlation with human judgments of translation quality, in an extent to which better than the use of WordNet for synonyms.
Anthology ID:
L10-1579
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/837_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Billy Tak-Ming Wong. 2010. Semantic Evaluation of Machine Translation. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Semantic Evaluation of Machine Translation (Wong, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/837_Paper.pdf