ACL Wiki - User contributions [en]

TOEFL Synonym Questions (State of the art)

2013-10-02T22:06:30Z

Wartena: /* References */ Added Karlgren and Sahlgren 2001

* TOEFL = Test of English as a Foreign Language
* 80 multiple-choice synonym questions; 4 choices per question
* the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder], the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado]
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
* subsequently used by many other researchers

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| levied
|-
! Choices:
| (a)
| imposed
|-
|
| (b)
| believed
|-
|
| (c)
| requested
|-
|
| (d)
| correlated
|-
! Solution:
| (a)
| imposed
|-
|}
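
A question of this form can be answered by any word-similarity measure: score each choice against the stem and pick the highest-scoring choice. The following is a minimal sketch of that evaluation loop, not the procedure of any particular system in the table. The names <code>answer</code>, <code>percent_correct</code>, and <code>toy_similarity</code> are hypothetical, and the similarity scores are invented for illustration only.

```python
def answer(stem, choices, similarity):
    """Return the choice that the similarity function rates closest to the stem."""
    return max(choices, key=lambda choice: similarity(stem, choice))


def percent_correct(questions, similarity):
    """Percentage of questions for which the top-rated choice is the solution."""
    right = sum(
        answer(q["stem"], q["choices"], similarity) == q["solution"]
        for q in questions
    )
    return 100.0 * right / len(questions)


# Invented scores for the sample question; a real system would compute these
# from a corpus, a lexicon, or both.
TOY_SCORES = {
    ("levied", "imposed"): 0.8,
    ("levied", "believed"): 0.1,
    ("levied", "requested"): 0.3,
    ("levied", "correlated"): 0.2,
}


def toy_similarity(a, b):
    """Look up a made-up similarity score; unknown pairs score 0.0."""
    return TOY_SCORES.get((a, b), 0.0)


questions = [
    {
        "stem": "levied",
        "choices": ["imposed", "believed", "requested", "correlated"],
        "solution": "imposed",
    }
]

print(answer("levied", questions[0]["choices"], toy_similarity))
print(percent_correct(questions, toy_similarity))
```

With the full set of 80 questions in place of the single sample, <code>percent_correct</code> would give a figure comparable to the '''Correct''' column of the results table.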

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| RES
| Resnik (1995)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 20.31%
| 12.89–31.83%
|-
| LC
| Leacock and Chodorow (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 21.88%
| 13.91–33.21%
|-
| LIN
| Lin (1998)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 24.06%
| 15.99–35.94%
|-
| Random
| Random guessing
| 1 / 4 = 25.00%
| Random
| 25.00%
| 15.99–35.94%
|-
| JC
| Jiang and Conrath (1997)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 25.00%
| 15.99–35.94%
|-
| LSA
| Landauer and Dumais (1997)
| Landauer and Dumais (1997)
| Corpus-based
| 64.38%
| 52.90–74.80%
|-
| Human
| Average US college applicant from a non-English-speaking country
| Landauer and Dumais (1997)
| Human
| 64.50%
| 53.01–74.88%
|-
| RI
| Karlgren and Sahlgren (2001)
| Karlgren and Sahlgren (2001)
| Corpus-based
| 72%
|
|-
| DS
| Pado and Lapata (2007)
| Pado and Lapata (2007)
| Corpus-based
| 73.00%
| 62.72–82.96%
|-
| PMI-IR
| Turney (2001)
| Turney (2001)
| Corpus-based
| 73.75%
| 62.72–82.96%
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 76.25%
| 65.42–85.06%
|-
| HSO
| Hirst and St-Onge (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 77.91%
| 68.17–87.11%
|-
| JS
| Jarmasz and Szpakowicz (2003)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 78.75%
| 68.17–87.11%
|-
| PMI-IR
| Terra and Clarke (2003)
| Terra and Clarke (2003)
| Corpus-based
| 81.25%
| 70.97–89.11%
|-
| CWO
| Ruiz-Casado et al. (2005)
| Ruiz-Casado et al. (2005)
| Web-based
| 82.55%
| 72.38–90.09%
|-
| PPMIC
| Bullinaria and Levy (2007)
| Bullinaria and Levy (2007)
| Corpus-based
| 85.00%
| 75.26–92.00%
|-
| GLSA
| Matveeva et al. (2005)
| Matveeva et al. (2005)
| Corpus-based
| 86.25%
| 76.73–92.93%
|-
| LSA
| Rapp (2003)
| Rapp (2003)
| Corpus-based
| 92.50%
| 84.39–97.20%
|-
| ADW
| Pilehvar et al. (2013)
| Pilehvar et al. (2013)
| WordNet graph-based (unsupervised)
| 96.25%
| 89.43–99.22%
|-
| PR
| Turney et al. (2003)
| Turney et al. (2003)
| Hybrid
| 97.50%
| 91.26–99.70%
|-
| PCCP
| Bullinaria and Levy (2012)
| Bullinaria and Levy (2012)
| Corpus-based
| 100.00%
| 96.32–100.00%
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
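* '''Type''' = general type of algorithm: corpus-based, lexicon-based, or hybrid; the table also uses Web-based and WordNet graph-based for individual systems, and Random and Human for the two baselines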
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using the [[Statistical calculators|Binomial Exact Test]]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* LSA = Latent Semantic Analysis
* PCCP = Principal Component vectors with Caron P
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* PR = Product Rule
* PPMIC = Positive Pointwise Mutual Information with Cosine
* GLSA = Generalized Latent Semantic Analysis
* CWO = Context Window Overlapping
* DS = Dependency Space
* RI = Random Indexing
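
The intervals in the '''95% confidence''' column can be approximated with a short script. The following is a minimal sketch, assuming the column holds a two-sided exact (Clopper-Pearson) binomial interval on the number of correct answers out of 80. The function names are illustrative and are not taken from the linked calculator.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for k successes in n trials."""

    def solve(k_max, target):
        # binom_cdf(k_max, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k_max, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


low, high = exact_interval(20, 80)  # 20 of 80 correct = 25.00%
print(f"{100 * low:.2f}-{100 * high:.2f}%")
```

If the assumption about the interval type holds, 20 correct answers out of 80 should give an interval close to the one listed for the 25.00% rows.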

== Notes ==

* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
* the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
* some of the algorithms may have been tuned on the TOEFL questions; read the references for details
* Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; they report a score of 52.5% when this penalty is applied; when the penalty is removed, their performance is 64.4% correct
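
The relationship between the corrected and uncorrected LSA scores can be checked with a few lines of arithmetic. This is a minimal sketch, assuming every question receives an answer, so that the fraction wrong is one minus the fraction correct; the function names are illustrative.

```python
def corrected_for_guessing(raw, n_choices=4):
    """Subtract a penalty of 1 / (n_choices - 1) for each incorrect answer.

    raw is the fraction of questions answered correctly (0.0 to 1.0).
    """
    wrong = 1.0 - raw
    return raw - wrong / (n_choices - 1)


def uncorrected(corrected, n_choices=4):
    """Invert corrected_for_guessing to recover the raw fraction correct."""
    return (corrected * (n_choices - 1) + 1.0) / n_choices


# Corrected score reported by Landauer and Dumais (1997): 52.5%.
print(f"{100 * uncorrected(0.525):.1f}%")
```

Removing the penalty from 52.5% gives about 64.4%, the figure used in the table. Under the same correction, random guessing at 25% scores zero.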

== References ==

Bullinaria, J.A., and Levy, J.P. (2007). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. ''Behavior Research Methods'', 39(3):510-526.

Bullinaria, J.A., and Levy, J.P. (2012). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.9582&rep=rep1&type=pdf Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD]. ''Behavior Research Methods'', 44(3):890-907.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Karlgren, J., and Sahlgren, M. (2001). [http://www.sics.se/~jussi/Artiklar/2001_RWIbook/KarlgrenSahlgren2001.pdf From Words to Understanding]. In Uesaka, Y., Kanerva, P., & Asoh, H. (Eds.), ''Foundations of Real-World Intelligence'', Stanford: CSLI Publications, pp. 294–308.

Landauer, T.K., and Dumais, S.T. (1997). [http://lsa.colorado.edu/papers/plato/plato.annote.html A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge]. ''Psychological Review'', 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). [http://books.google.ca/books?id=Rehu8OOzMIMC&lpg=PA265&ots=IpnaLkZUec&lr&pg=PA265#v=onepage&q&f=false Combining local context and WordNet similarity for word sense identification]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). [http://www.coli.uni-saarland.de/~pado/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2):161-199.

Pilehvar, M.T., Jurgens, D., and Navigli, R. (2013). [http://wwwusers.di.uniroma1.it/~navigli/pubs/ACL_2013_Pilehvar_Jurgens_Navigli.pdf Align, disambiguate and walk: A unified approach for measuring semantic similarity]. ''Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)'', Sofia, Bulgaria.

Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in Synonym Discovery and Ontology Extension]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[ESL Synonym Questions (State of the art)|ESL Synonym Questions]]
* [[SAT Analogy Questions]]
* [[State of the art]]

[[Category:State of the art]]

TOEFL Synonym Questions (State of the art)

2013-10-02T21:57:00Z

Wartena: /* Explanation of table */


TOEFL Synonym Questions (State of the art)

2013-10-02T21:56:03Z

Wartena: /* Table of results */ Added Random Indexing Result from Karlgren and Sahlgren
