Difference between revisions of "ESL Synonym Questions (State of the art)"

From ACL Wiki
Revision as of 10:55, 13 January 2013

  • ESL = English as a Second Language
  • 50 multiple-choice synonym questions; 4 choices per question
  • each question includes a sentence, providing context for the question
  • ESL questions available on request from Peter Turney
  • introduced in Turney (2001) as a way of evaluating algorithms for measuring degree of similarity between words
  • subsequently used by many other researchers


Sample question

Stem: "A rusty nail is not as strong as a clean, new one."
Choices: (a) corroded
(b) black
(c) dirty
(d) painted
Solution: (a) corroded
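
A question like this is typically answered by scoring each choice against the target word and picking the highest-scoring one. The sketch below does this with a simple co-occurrence ratio in the spirit of PMI-IR, hits(target AND choice) / hits(choice). It assumes the target word in the stem is "rusty", and all hit counts are invented for illustration; they are not real search-engine or corpus counts.

```python
def answer_question(target, choices, joint_hits, choice_hits):
    """Return the choice with the highest co-occurrence score for the target."""
    def score(choice):
        # Ratio of documents containing both words to documents containing the choice.
        return joint_hits[(target, choice)] / choice_hits[choice]
    return max(choices, key=score)


# Hypothetical hit counts, made up for this example only.
choice_hits = {"corroded": 1000, "black": 500000, "dirty": 80000, "painted": 60000}
joint_hits = {
    ("rusty", "corroded"): 120,
    ("rusty", "black"): 900,
    ("rusty", "dirty"): 400,
    ("rusty", "painted"): 350,
}

choices = ["corroded", "black", "dirty", "painted"]
best = answer_question("rusty", choices, joint_hits, choice_hits)
print(best)
```

Dividing by hits(choice) keeps very frequent words such as "black" from winning simply because they co-occur with almost everything.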


Table of results

Algorithm | Reference for algorithm | Reference for experiment | Type | Correct | 95% confidence
----------|-------------------------|--------------------------|------|---------|---------------
Random | Random guessing | 1 / 4 = 25.00% | Random | 25.00% | 14.63-40.34%
RES | Resnik (1995) | Jarmasz and Szpakowicz (2003) | Hybrid | 32.66% | 21.21-48.77%
LC | Leacock and Chodorow (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 36.00% | 22.92-50.81%
LIN | Lin (1998) | Jarmasz and Szpakowicz (2003) | Hybrid | 36.00% | 22.92-50.81%
JC | Jiang and Conrath (1997) | Jarmasz and Szpakowicz (2003) | Hybrid | 36.00% | 22.92-50.81%
HSO | Hirst and St-Onge (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 62.00% | 47.18-75.35%
PMI-IR | Turney (2001) | Turney (2001) | Corpus-based | 74.00% | 59.66-85.37%
PMI-IR | Terra and Clarke (2003) | Terra and Clarke (2003) | Corpus-based | 80.00% | 66.28-89.97%
JS | Jarmasz and Szpakowicz (2003) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 82.00% | 68.56-91.42%


Explanation of table

  • Algorithm = name of algorithm
  • Reference for algorithm = where to find out more about given algorithm
  • Reference for experiment = where to find out more about evaluation of given algorithm with ESL questions
  • Type = general type of algorithm: corpus-based, lexicon-based, hybrid
  • Correct = percent of 50 questions that given algorithm answered correctly
  • 95% confidence = confidence interval calculated using the Binomial Exact Test
  • table rows sorted in order of increasing percent correct
  • several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
  • PMI-IR = Pointwise Mutual Information - Information Retrieval
  • Terra and Clarke (2003) call the ESL Synonym Questions "TS1"
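
The 95% confidence column can be approximated with an exact binomial (Clopper-Pearson) interval. The sketch below assumes that the Binomial Exact Test behind the table corresponds to this interval, which the page does not state. The function names are illustrative, and the bounds are found by plain bisection rather than with a statistics library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, target, steps=100):
    """Find p in [0, 1] with f(p) == target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial interval for k successes in n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2, i.e. cdf(k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# 74.00% correct on 50 questions corresponds to 37 correct answers.
low, high = clopper_pearson(37, 50)
print(f"{100 * low:.2f}-{100 * high:.2f}%")
```

With only 50 questions the intervals are wide, so the result for 37 correct answers can be compared against the PMI-IR row of the table as a sanity check on the assumption above.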

Caveats

  • the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
  • the ESL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns


References

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244-251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.

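See also

  • Attributional and Relational Similarity (State of the art)
  • SAT Analogy Questions
  • TOEFL Synonym Questions
  • State of the art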