ESL Synonym Questions (State of the art)
- ESL = English as a Second Language
- 50 multiple-choice synonym questions; 4 choices per question
- each question includes a sentence, providing context for the question
- ESL questions available from Peter Turney
- introduced in Turney (2001) as a way of evaluating algorithms for measuring degree of similarity between two words
- subsequently used by many other researchers
Sample question
Stem: "A rusty nail is not as strong as a clean, new one."
Choices: (a) corroded (b) black (c) dirty (d) painted
Solution: (a) corroded
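A similarity-based algorithm can answer a question like this by scoring each choice against the target word and picking the highest-scoring one. The following is a minimal sketch of that selection step, assuming the target word is "rusty"; the names `answer_question`, `toy_similarity`, and `TOY_SCORES` are hypothetical, and the scores are invented for illustration rather than taken from any of the algorithms in the table.

```python
from typing import Callable, Sequence


def answer_question(target: str, choices: Sequence[str],
                    similarity: Callable[[str, str], float]) -> str:
    """Return the choice that the similarity measure rates closest to the target."""
    return max(choices, key=lambda choice: similarity(target, choice))


# Hypothetical similarity scores, invented purely for illustration.
TOY_SCORES = {
    ("rusty", "corroded"): 0.81,
    ("rusty", "black"): 0.12,
    ("rusty", "dirty"): 0.47,
    ("rusty", "painted"): 0.09,
}


def toy_similarity(a: str, b: str) -> float:
    """Look up a made-up score; unknown pairs count as unrelated."""
    return TOY_SCORES.get((a, b), 0.0)


choices = ["corroded", "black", "dirty", "painted"]
guess = answer_question("rusty", choices, toy_similarity)
print(guess)
```

Any real measure (corpus-based, lexicon-based, or hybrid) could be passed in place of `toy_similarity`, as long as it maps a pair of words to a number. How ties between choices are handled is left open here; `max` simply keeps the first of the tied choices.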
Table of results
Algorithm | Reference for algorithm | Reference for experiment | Type | Correct | 95% confidence |
---|---|---|---|---|---|
Random | Random guessing | 1 / 4 = 25.00% | Random | 25.00% | |
RES | Resnik (1995) | Jarmasz and Szpakowicz (2003) | Hybrid | 32.66% | |
LC | Leacock and Chodorow (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 36.00% | |
LIN | Lin (1998) | Jarmasz and Szpakowicz (2003) | Hybrid | 36.00% | |
JC | Jiang and Conrath (1997) | Jarmasz and Szpakowicz (2003) | Hybrid | 36.00% | |
HSO | Hirst and St.-Onge (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 62.00% | |
PMI-IR | Turney (2001) | Turney (2001) | Corpus-based | 74.00% | |
JS | Jarmasz and Szpakowicz (2003) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 82.00% | |
Explanation of table
- Algorithm = name of algorithm
- Reference for algorithm = where to find out more about given algorithm
- Reference for experiment = where to find out more about evaluation of given algorithm with ESL questions
- Type = general type of algorithm: corpus-based, lexicon-based, hybrid
- Correct = percent of 50 questions that given algorithm answered correctly
- 95% confidence = confidence interval calculated using Binomial Exact Test
- table rows sorted in order of increasing percent correct
- several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
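The 95% confidence column can be illustrated with a short calculation. The sketch below assumes that "Binomial Exact Test" refers to the Clopper-Pearson exact interval, which is an assumption and not something the page states. The function names are hypothetical, and the bounds are found by simple bisection on the binomial distribution using only the standard library. The example uses 37 correct out of 50, which corresponds to a score of 74.00%.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))


def _root_of_decreasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Bisection for a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_interval(correct: int, total: int, alpha: float = 0.05):
    """Clopper-Pearson interval for the true proportion of correct answers."""
    if correct == 0:
        lower = 0.0
    else:
        # Lower bound: the p at which P(X >= correct) equals alpha / 2.
        lower = _root_of_decreasing(
            lambda p: alpha / 2.0 - (1.0 - binom_cdf(correct - 1, total, p)))
    if correct == total:
        upper = 1.0
    else:
        # Upper bound: the p at which P(X <= correct) equals alpha / 2.
        upper = _root_of_decreasing(
            lambda p: binom_cdf(correct, total, p) - alpha / 2.0)
    return lower, upper


low, high = exact_binomial_interval(37, 50)
print(f"{37 / 50:.2%} correct, 95% interval {low:.2%} to {high:.2%}")
```

With only 50 questions the interval is wide, so small differences in percent correct between two algorithms should be read with caution.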
Caveats
- the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
- the ESL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns
References
Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.
Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244-251.
Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.