ACL Wiki - User contributions [en]

TOEFL Synonym Questions (State of the art)

2013-07-03T15:25:30Z

David.jurgens:

* TOEFL = Test of English as a Foreign Language
* 80 multiple-choice synonym questions; 4 choices per question
* the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder], the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado]
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
* subsequently used by many other researchers

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| levied
|-
! Choices:
| (a)
| imposed
|-
|
| (b)
| believed
|-
|
| (c)
| requested
|-
|
| (d)
| correlated
|-
! Solution:
| (a)
| imposed
|-
|}
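
A question of this form is typically answered by scoring each choice against the stem and picking the highest-scoring one. The sketch below illustrates that procedure only; the function names and the toy similarity scores are hypothetical and do not come from any of the systems in the table.

```python
from typing import Callable, Sequence, Tuple

Similarity = Callable[[str, str], float]


def answer_question(stem: str, choices: Sequence[str], similarity: Similarity) -> str:
    """Return the choice that the similarity measure rates closest to the stem."""
    return max(choices, key=lambda choice: similarity(stem, choice))


def accuracy(questions: Sequence[Tuple[str, Sequence[str], str]],
             similarity: Similarity) -> float:
    """Fraction of (stem, choices, solution) questions answered correctly."""
    correct = sum(
        answer_question(stem, choices, similarity) == solution
        for stem, choices, solution in questions
    )
    return correct / len(questions)


# Hypothetical hand-picked scores, for illustration only (not real model output).
TOY_SCORES = {
    ("levied", "imposed"): 0.8,
    ("levied", "believed"): 0.1,
    ("levied", "requested"): 0.3,
    ("levied", "correlated"): 0.05,
}


def toy_similarity(a: str, b: str) -> float:
    """Look up a toy score; unknown pairs count as unrelated."""
    return TOY_SCORES.get((a, b), 0.0)


sample = ("levied", ["imposed", "believed", "requested", "correlated"], "imposed")
print(answer_question(sample[0], sample[1], toy_similarity))
```

Any real similarity measure (corpus-based, lexicon-based, or hybrid) could be passed in place of <code>toy_similarity</code>.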

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| RES
| Resnik (1995)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 20.31%
| 12.89–31.83%
|-
| LC
| Leacock and Chodorow (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 21.88%
| 13.91–33.21%
|-
| LIN
| Lin (1998)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 24.06%
| 15.99–35.94%
|-
| Random
| Random guessing
| 1 / 4 = 25.00%
| Random
| 25.00%
| 15.99–35.94%
|-
| JC
| Jiang and Conrath (1997)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 25.00%
| 15.99–35.94%
|-
| LSA
| Landauer and Dumais (1997)
| Landauer and Dumais (1997)
| Corpus-based
| 64.38%
| 52.90–74.80%
|-
| Human
| Average non-English-speaking US college applicant
| Landauer and Dumais (1997)
| Human
| 64.50%
| 53.01–74.88%
|-
| DS
| Pado and Lapata (2007)
| Pado and Lapata (2007)
| Corpus-based
| 73.00%
| 62.72–82.96%
|-
| PMI-IR
| Turney (2001)
| Turney (2001)
| Corpus-based
| 73.75%
| 62.72–82.96%
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 76.25%
| 65.42–85.06%
|-
| HSO
| Hirst and St-Onge (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 77.91%
| 68.17–87.11%
|-
| JS
| Jarmasz and Szpakowicz (2003)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 78.75%
| 68.17–87.11%
|-
| PMI-IR
| Terra and Clarke (2003)
| Terra and Clarke (2003)
| Corpus-based
| 81.25%
| 70.97–89.11%
|-
| CWO
| Ruiz-Casado et al. (2005)
| Ruiz-Casado et al. (2005)
| Web-based
| 82.55%
| 72.38–90.09%
|-
| PPMIC
| Bullinaria and Levy (2007)
| Bullinaria and Levy (2007)
| Corpus-based
| 85.00%
| 75.26–92.00%
|-
| GLSA
| Matveeva et al. (2005)
| Matveeva et al. (2005)
| Corpus-based
| 86.25%
| 76.73–92.93%
|-
| LSA
| Rapp (2003)
| Rapp (2003)
| Corpus-based
| 92.50%
| 84.39–97.20%
|-
| ADW
| Pilehvar et al. (2013)
| Pilehvar et al. (2013)
| WordNet graph-based (unsupervised)
| 96.25%
| 89.43–99.22%
|-
| PR
| Turney et al. (2003)
| Turney et al. (2003)
| Hybrid
| 97.50%
| 91.26–99.70%
|-
| PCCP
| Bullinaria and Levy (2012)
| Bullinaria and Levy (2012)
| Corpus-based
| 100.00%
| 96.32–100.00%
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
* '''Type''' = general type of algorithm, e.g. corpus-based, lexicon-based, web-based, or hybrid; the Random and Human rows are baselines
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using the [[Statistical calculators|Binomial Exact Test]]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* LSA = Latent Semantic Analysis
* PCCP = Principal Component vectors with Caron P
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* PR = Product Rule
* PPMIC = Positive Pointwise Mutual Information with Cosine
* GLSA = Generalized Latent Semantic Analysis
* CWO = Context Window Overlapping
* DS = Dependency Space
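
The '''95% confidence''' column can be approximated with an exact binomial (Clopper-Pearson) interval. The sketch below is an assumption about how such a calculator might work, not a copy of the linked one; the function names are hypothetical and only the Python standard library is used. For 80 of 80 correct, this two-sided version gives a lower bound of about 95.49%, whereas the table's 96.32% equals 0.05^(1/80), which suggests the calculator used for the table switches to a one-sided bound at the boundary.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Find p in [lo, hi] with f(p) = 0, assuming f is increasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Two-sided exact binomial interval for x successes out of n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


low, high = clopper_pearson(20, 80)  # 25.00% correct on 80 questions
print(f"{100 * low:.2f}-{100 * high:.2f}%")
```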

== Notes ==

* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
* the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
* some of the algorithms may have been tuned on the TOEFL questions; read the references for details
* Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; they report a score of 52.5% when this penalty is applied; when the penalty is removed, their performance is 64.4% correct
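
The arithmetic in the last note can be checked with a short sketch, assuming the correction is applied as (right - wrong/3) / total for four-choice questions. The function names are hypothetical.

```python
def corrected_score(raw: float, n_choices: int = 4) -> float:
    """Apply the guessing penalty: each wrong answer costs 1 / (n_choices - 1)."""
    wrong = 1.0 - raw
    return raw - wrong / (n_choices - 1)


def raw_score(corrected: float, n_choices: int = 4) -> float:
    """Invert the penalty to recover the uncorrected proportion correct."""
    k = n_choices - 1
    return (k * corrected + 1.0) / (k + 1)


# 52.5% with the penalty corresponds to 64.375% without it.
print(f"{100 * raw_score(0.525):.3f}%")
```

Under this correction, random guessing (25% raw) maps to a corrected score of zero.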

== References ==

Bullinaria, J.A., and Levy, J.P. (2007). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. ''Behavior Research Methods'', 39(3), 510-526.

Bullinaria, J.A., and Levy, J.P. (2012). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.9582&rep=rep1&type=pdf Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD]. ''Behavior Research Methods'', 44(3):890-907.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). [http://lsa.colorado.edu/papers/plato/plato.annote.html A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge]. ''Psychological Review'', 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). [http://www.coli.uni-saarland.de/~pado/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2), 161-199.

Pilehvar, M.T., Jurgens, D., and Navigli, R. (2013). Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. ''Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)'', Sofia, Bulgaria.

Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in Synonym Discovery and Ontology Extension]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[ESL Synonym Questions (State of the art)|ESL Synonym Questions]]
* [[SAT Analogy Questions]]
* [[State of the art]]

[[Category:State of the art]]

TOEFL Synonym Questions (State of the art)

2013-07-03T15:24:45Z

David.jurgens: Fixing non-conforming reference text

* TOEFL = Test of English as a Foreign Language
* 80 multiple-choice synonym questions; 4 choices per question
* the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder], the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado]
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
* subsequently used by many other researchers

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| levied
|-
! Choices:
| (a)
| imposed
|-
|
| (b)
| believed
|-
|
| (c)
| requested
|-
|
| (d)
| correlated
|-
! Solution:
| (a)
| imposed
|-
|}

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| RES
| Resnik (1995)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 20.31%
| 12.89–31.83%
|-
| LC
| Leacock and Chodorow (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 21.88%
| 13.91–33.21%
|-
| LIN
| Lin (1998)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 24.06%
| 15.99–35.94%
|-
| Random
| Random guessing
| 1 / 4 = 25.00%
| Random
| 25.00%
| 15.99–35.94%
|-
| JC
| Jiang and Conrath (1997)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 25.00%
| 15.99–35.94%
|-
| LSA
| Landauer and Dumais (1997)
| Landauer and Dumais (1997)
| Corpus-based
| 64.38%
| 52.90–74.80%
|-
| Human
| Average non-English-speaking US college applicant
| Landauer and Dumais (1997)
| Human
| 64.50%
| 53.01–74.88%
|-
| DS
| Pado and Lapata (2007)
| Pado and Lapata (2007)
| Corpus-based
| 73.00%
| 62.72–82.96%
|-
| PMI-IR
| Turney (2001)
| Turney (2001)
| Corpus-based
| 73.75%
| 62.72–82.96%
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 76.25%
| 65.42–85.06%
|-
| HSO
| Hirst and St-Onge (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 77.91%
| 68.17–87.11%
|-
| JS
| Jarmasz and Szpakowicz (2003)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 78.75%
| 68.17–87.11%
|-
| PMI-IR
| Terra and Clarke (2003)
| Terra and Clarke (2003)
| Corpus-based
| 81.25%
| 70.97–89.11%
|-
| CWO
| Ruiz-Casado et al. (2005)
| Ruiz-Casado et al. (2005)
| Web-based
| 82.55%
| 72.38–90.09%
|-
| PPMIC
| Bullinaria and Levy (2007)
| Bullinaria and Levy (2007)
| Corpus-based
| 85.00%
| 75.26–92.00%
|-
| GLSA
| Matveeva et al. (2005)
| Matveeva et al. (2005)
| Corpus-based
| 86.25%
| 76.73–92.93%
|-
| LSA
| Rapp (2003)
| Rapp (2003)
| Corpus-based
| 92.50%
| 84.39–97.20%
|-
| ADW
| Pilehvar et al. (2013)
| Pilehvar et al. (2013)
| WordNet graph-based (unsupervised)
| 96.25%
| 89.43–99.22%
|-
| PR
| Turney et al. (2003)
| Turney et al. (2003)
| Hybrid
| 97.50%
| 91.26–99.70%
|-
| PCCP
| Bullinaria and Levy (2012)
| Bullinaria and Levy (2012)
| Corpus-based
| 100.00%
| 96.32–100.00%
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
* '''Type''' = general type of algorithm, e.g. corpus-based, lexicon-based, web-based, or hybrid; the Random and Human rows are baselines
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using the [[Statistical calculators|Binomial Exact Test]]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* LSA = Latent Semantic Analysis
* PCCP = Principal Component vectors with Caron P
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* PR = Product Rule
* PPMIC = Positive Pointwise Mutual Information with Cosine
* GLSA = Generalized Latent Semantic Analysis
* CWO = Context Window Overlapping
* DS = Dependency Space

== Notes ==

* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
* the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
* some of the algorithms may have been tuned on the TOEFL questions; read the references for details
* Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; they report a score of 52.5% when this penalty is applied; when the penalty is removed, their performance is 64.4% correct

== References ==

Bullinaria, J.A., and Levy, J.P. (2007). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. ''Behavior Research Methods'', 39(3), 510-526.

Bullinaria, J.A., and Levy, J.P. (2012). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.9582&rep=rep1&type=pdf Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD]. ''Behavior Research Methods'', 44(3):890-907.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). [http://lsa.colorado.edu/papers/plato/plato.annote.html A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge]. ''Psychological Review'', 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). [http://www.coli.uni-saarland.de/~pado/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2), 161-199.

Pilehvar, M.T., Jurgens, D., and Navigli, R. (2013). Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. ''Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)'', Sofia, Bulgaria.

Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in Synonym Discovery and Ontology Extension]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[ESL Synonym Questions (State of the art)|ESL Synonym Questions]]
* [[SAT Analogy Questions]]
* [[State of the art]]

[[Category:State of the art]]

TOEFL Synonym Questions (State of the art)

2013-07-03T15:22:52Z

David.jurgens: Fixing typo in wiki markup

* TOEFL = Test of English as a Foreign Language
* 80 multiple-choice synonym questions; 4 choices per question
* the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder], the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado]
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
* subsequently used by many other researchers

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| levied
|-
! Choices:
| (a)
| imposed
|-
|
| (b)
| believed
|-
|
| (c)
| requested
|-
|
| (d)
| correlated
|-
! Solution:
| (a)
| imposed
|-
|}

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| RES
| Resnik (1995)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 20.31%
| 12.89–31.83%
|-
| LC
| Leacock and Chodorow (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 21.88%
| 13.91–33.21%
|-
| LIN
| Lin (1998)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 24.06%
| 15.99–35.94%
|-
| Random
| Random guessing
| 1 / 4 = 25.00%
| Random
| 25.00%
| 15.99–35.94%
|-
| JC
| Jiang and Conrath (1997)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 25.00%
| 15.99–35.94%
|-
| LSA
| Landauer and Dumais (1997)
| Landauer and Dumais (1997)
| Corpus-based
| 64.38%
| 52.90–74.80%
|-
| Human
| Average non-English-speaking US college applicant
| Landauer and Dumais (1997)
| Human
| 64.50%
| 53.01–74.88%
|-
| DS
| Pado and Lapata (2007)
| Pado and Lapata (2007)
| Corpus-based
| 73.00%
| 62.72–82.96%
|-
| PMI-IR
| Turney (2001)
| Turney (2001)
| Corpus-based
| 73.75%
| 62.72–82.96%
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 76.25%
| 65.42–85.06%
|-
| HSO
| Hirst and St-Onge (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 77.91%
| 68.17–87.11%
|-
| JS
| Jarmasz and Szpakowicz (2003)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 78.75%
| 68.17–87.11%
|-
| PMI-IR
| Terra and Clarke (2003)
| Terra and Clarke (2003)
| Corpus-based
| 81.25%
| 70.97–89.11%
|-
| CWO
| Ruiz-Casado et al. (2005)
| Ruiz-Casado et al. (2005)
| Web-based
| 82.55%
| 72.38–90.09%
|-
| PPMIC
| Bullinaria and Levy (2007)
| Bullinaria and Levy (2007)
| Corpus-based
| 85.00%
| 75.26–92.00%
|-
| GLSA
| Matveeva et al. (2005)
| Matveeva et al. (2005)
| Corpus-based
| 86.25%
| 76.73–92.93%
|-
| LSA
| Rapp (2003)
| Rapp (2003)
| Corpus-based
| 92.50%
| 84.39–97.20%
|-
| ADW
| Pilehvar et al. (2013)
| Pilehvar et al. (2013)
| WordNet graph-based (unsupervised)
| 96.25%
| 89.43–99.22%
|-
| PR
| Turney et al. (2003)
| Turney et al. (2003)
| Hybrid
| 97.50%
| 91.26–99.70%
|-
| PCCP
| Bullinaria and Levy (2012)
| Bullinaria and Levy (2012)
| Corpus-based
| 100.00%
| 96.32–100.00%
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
* '''Type''' = general type of algorithm, e.g. corpus-based, lexicon-based, web-based, or hybrid; the Random and Human rows are baselines
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using the [[Statistical calculators|Binomial Exact Test]]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* LSA = Latent Semantic Analysis
* PCCP = Principal Component vectors with Caron P
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* PR = Product Rule
* PPMIC = Positive Pointwise Mutual Information with Cosine
* GLSA = Generalized Latent Semantic Analysis
* CWO = Context Window Overlapping
* DS = Dependency Space

== Notes ==

* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
* the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
* some of the algorithms may have been tuned on the TOEFL questions; read the references for details
* Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; they report a score of 52.5% when this penalty is applied; when the penalty is removed, their performance is 64.4% correct

== References ==

Bullinaria, J.A., and Levy, J.P. (2007). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. ''Behavior Research Methods'', 39(3), 510-526.

Bullinaria, J.A., and Levy, J.P. (2012). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.9582&rep=rep1&type=pdf Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD]. ''Behavior Research Methods'', 44(3):890-907.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). [http://lsa.colorado.edu/papers/plato/plato.annote.html A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge]. ''Psychological Review'', 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). [http://www.coli.uni-saarland.de/~pado/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2), 161-199.

Pilehvar, M.T., Jurgens, D., and Navigli, R. (2013). Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. ''Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)'', Sofia, Bulgaria.

Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in Synonym Discovery and Ontology Extension]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[ESL Synonym Questions (State of the art)|ESL Synonym Questions]]
* [[SAT Analogy Questions]]
* [[State of the art]]

[[Category:State of the art]]

TOEFL Synonym Questions (State of the art)

2013-07-03T15:21:34Z

David.jurgens: Updated to add results from Pilehvar et al. (2013)

* TOEFL = Test of English as a Foreign Language
* 80 multiple-choice synonym questions; 4 choices per question
* the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder], the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado]
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
* subsequently used by many other researchers

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| levied
|-
! Choices:
| (a)
| imposed
|-
|
| (b)
| believed
|-
|
| (c)
| requested
|-
|
| (d)
| correlated
|-
! Solution:
| (a)
| imposed
|-
|}

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| RES
| Resnik (1995)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 20.31%
| 12.89–31.83%
|-
| LC
| Leacock and Chodorow (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 21.88%
| 13.91–33.21%
|-
| LIN
| Lin (1998)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 24.06%
| 15.99–35.94%
|-
| Random
| Random guessing
| 1 / 4 = 25.00%
| Random
| 25.00%
| 15.99–35.94%
|-
| JC
| Jiang and Conrath (1997)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 25.00%
| 15.99–35.94%
|-
| LSA
| Landauer and Dumais (1997)
| Landauer and Dumais (1997)
| Corpus-based
| 64.38%
| 52.90–74.80%
|-
| Human
| Average non-English-speaking US college applicant
| Landauer and Dumais (1997)
| Human
| 64.50%
| 53.01–74.88%
|-
| DS
| Pado and Lapata (2007)
| Pado and Lapata (2007)
| Corpus-based
| 73.00%
| 62.72–82.96%
|-
| PMI-IR
| Turney (2001)
| Turney (2001)
| Corpus-based
| 73.75%
| 62.72–82.96%
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 76.25%
| 65.42–85.06%
|-
| HSO
| Hirst and St-Onge (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 77.91%
| 68.17–87.11%
|-
| JS
| Jarmasz and Szpakowicz (2003)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 78.75%
| 68.17–87.11%
|-
| PMI-IR
| Terra and Clarke (2003)
| Terra and Clarke (2003)
| Corpus-based
| 81.25%
| 70.97–89.11%
|-
| CWO
| Ruiz-Casado et al. (2005)
| Ruiz-Casado et al. (2005)
| Web-based
| 82.55%
| 72.38–90.09%
|-
| PPMIC
| Bullinaria and Levy (2007)
| Bullinaria and Levy (2007)
| Corpus-based
| 85.00%
| 75.26-92.00%
|-
| GLSA
| Matveeva et al. (2005)
| Matveeva et al. (2005)
| Corpus-based
| 86.25%
| 76.73-92.93%
|-
| LSA
| Rapp (2003)
| Rapp (2003)
| Corpus-based
| 92.50%
| 84.39-97.20%
|-
| ADW
| Pilehvar et al. (2013)
| Pilehvar et al. (2013)
| WordNet graph-based (unsupervised)
| 96.25%
| 89.43-99.22%
|-
| PR
| Turney et al. (2003)
| Turney et al. (2003)
| Hybrid
| 97.50%
| 91.26–99.70%
|-
| PCCP
| Bullinaria and Levy (2012)
| Bullinaria and Levy (2012)
| Corpus-based
| 100.00%
| 96.32-100.00%
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
* '''Type''' = general type of algorithm, e.g., corpus-based, lexicon-based, web-based, or hybrid
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using the [[Statistical calculators|Binomial Exact Test]]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* LSA = Latent Semantic Analysis
* PCCP = Principal Component vectors with Caron P
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* PR = Product Rule
* PPMIC = Positive Pointwise Mutual Information with Cosine
* GLSA = Generalized Latent Semantic Analysis
* CWO = Context Window Overlapping
* DS = Dependency Space
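
As an illustration of the '''95% confidence''' column, the following is a minimal sketch of an exact binomial (Clopper-Pearson) interval, assuming that this is the method behind the linked calculator. The function names <code>binom_cdf</code> and <code>exact_interval</code> are hypothetical and are not taken from any cited tool. For interior scores such as 74 of 80 correct, the sketch gives bounds close to the table entries. For a perfect score, the table's lower bound of 96.32% appears consistent with a one-sided bound, 0.05^(1/80), so the original calculator may treat that boundary case differently from this two-sided sketch.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for k correct answers out of n.

    Illustrative helper (not from the cited calculator). Returns the
    bounds as fractions in [0, 1].
    """

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) == target.
        low, high = 0.0, 1.0
        for _ in range(100):
            mid = (low + high) / 2
            if f(mid) > target:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # 74 of 80 correct corresponds to the 92.50% row of the table.
    low, high = exact_interval(74, 80)
    print(f"{100 * low:.2f}-{100 * high:.2f}%")
```

Bisection over the binomial distribution function keeps the sketch free of third-party dependencies; a statistics library would normally compute the same bounds from beta-distribution quantiles.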

== Notes ==

* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
* the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
* some of the algorithms may have been tuned on the TOEFL questions; read the references for details
* Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; they report a score of 52.5% when this penalty is applied; when the penalty is removed, their performance is 64.4% correct
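
The correction for guessing in the last note can be checked with a short calculation. This is a sketch under two assumptions that the note does not state: every question was answered, and the 64.38% in the table corresponds to an average of 51.5 correct answers out of 80. The function name <code>corrected_score</code> is hypothetical.

```python
def corrected_score(num_correct, num_questions, num_choices=4):
    """Fraction correct after a guessing penalty.

    Subtracts 1 / (num_choices - 1) for each incorrect answer, i.e. 1/3
    for four-choice questions. Assumes every question was answered.
    """
    num_incorrect = num_questions - num_correct
    penalty = num_incorrect / (num_choices - 1)
    return (num_correct - penalty) / num_questions


if __name__ == "__main__":
    # Assumed raw count: 51.5 of 80 (about 64.4% correct).
    print(corrected_score(51.5, 80))  # 0.525
```

Under this penalty, random guessing on four-choice questions scores zero on average, which is why the corrected 52.5% is lower than the uncorrected 64.4%.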

== References ==

Bullinaria, J.A., and Levy, J.P. (2007). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. ''Behavior Research Methods'', 39(3), 510-526.

Bullinaria, J.A., and Levy, J.P. (2012). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.9582&rep=rep1&type=pdf Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD]. ''Behavior Research Methods'', 44(3):890-907.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). [http://lsa.colorado.edu/papers/plato/plato.annote.html A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge]. ''Psychological Review'', 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). [http://www.coli.uni-saarland.de/~pado/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2), 161-199.

Pilehvar, M.T., Jurgens, D., and Navigli, R. (2013). Align, Disambiguate and Walk: A unified approach for measuring semantic similarity. ''Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)'', Sofia, Bulgaria.

Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E. and Castells, P. (2005) [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in Synonym Discovery and Ontology Extension]. ''Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[ESL Synonym Questions (State of the art)|ESL Synonym Questions]]
* [[SAT Analogy Questions]]
* [[State of the art]]

[[Category:State of the art]]

SemEval Portal

2013-02-05T17:23:48Z

David.jurgens: Updated SemEval-2013 with location and collocated conferences

This page serves as a community portal for everything related to Semantic Evaluation ('''SemEval''').

Quick links:

* [[SemEval-2012]]
* [[SemEval-2013]]
* [http://www.clres.com/siglex.html SIGLEX]

==Semantic Evaluation Exercises==

'''SemEval''' (Semantic Evaluation) is an ongoing series of evaluations of [[Semantics|computational semantic analysis]] systems; it evolved from the Senseval [[Word sense disambiguation|Word sense]] evaluation series. The evaluations are intended to explore the nature of [[Semantics|meaning]] in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.

This series of evaluations provides a mechanism for characterizing, in more precise terms, exactly what is necessary to compute with meaning. As such, the evaluations provide an emergent mechanism to identify the problems and solutions for computations with meaning. These exercises have evolved to articulate more of the dimensions that are involved in our use of language. They began with apparently simple attempts to identify [[Word sense disambiguation|word senses]] computationally. They have evolved to investigate the interrelationships among the elements in a sentence (e.g., [[semantic role labeling]]), relations between sentences (e.g., [[coreference]]), and the nature of what we are saying ([[semantic relations]] and [[sentiment analysis]]).

The purpose of the SemEval exercises and [http://www.senseval.org SENSEVAL] is to evaluate semantic analysis systems. The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation, each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the 4th workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. This portal will be used to provide a comprehensive view of the issues involved in semantic evaluations.

==Upcoming and Past Events==

{| border="1" cellpadding="7" cellspacing="0"
|-
! Event
! Year
! Location
! Notes
|-
| [http://www.cs.york.ac.uk/semeval-2013/ SemEval-2013]
| align="center" | 2013
| Atlanta, USA
| - part of [http://clic2.cimec.unitn.it/starsem2013/ *SEM] and [http://naacl2013.naacl.org/ NAACL] <br /> - discussion at [http://groups.google.com/group/semeval3 SemEval 3 Group]
|-
| [http://www.cs.york.ac.uk/semeval-2012/ SemEval-2012]
| align="center" | 2012
| Montreal, Canada
| - part of [http://ixa2.si.ehu.es/starsem/ *SEM] <br /> - discussion at [http://groups.google.com/group/semeval3 SemEval 3 Group]
|-
| [http://semeval2.fbk.eu/semeval2.php SemEval-2010]
| align="center" | 2010
| Uppsala, Sweden
| - [http://aclweb.org/anthology-new/S/S10/ proceedings]
|-
| [http://nlp.cs.swarthmore.edu/semeval/index.php SemEval-2007]
| align="center" | 2007
| Prague, Czech Republic
| - [http://aclweb.org/anthology-new/S/S07/ proceedings] <br /> - copy of website at [http://web.archive.org/web/20080727062358/http://nlp.cs.swarthmore.edu/semeval/index.php Internet Archive]
|-
| [http://www.senseval.org/senseval3 SENSEVAL 3]
| align="center" | 2004
| Barcelona, Spain
| - [http://aclweb.org/anthology-new/W/W04/#0800 proceedings]
|-
| [http://www.just-the-word.com/senseval2/ SENSEVAL 2]
| align="center" | 2001
| Toulouse, France
| - main link provides links to results, data, system descriptions, task descriptions, and workshop program <br /> - copy of website at [http://web.archive.org/web/20050507011044/http://www.sle.sharp.co.uk/senseval2/ Internet Archive]
|-
| [http://www.itri.brighton.ac.uk/events/senseval/ARCHIVE/index.html SENSEVAL 1]
| align="center" | 1998
| East Sussex, UK
| - papers in [http://www.springerlink.com/content/0010-4817/34/1-2/ Computers and the Humanities], subscribers or pay per view
|-
|}

==Overview of Issues in Semantic Analysis==

The SemEval exercises provide a mechanism for examining issues in semantic analysis of texts. The topics of interest are concerned with identifying and characterizing the kinds of issues relevant to human understanding of language; the topics are generally different from the concerns of the logic-based approach of formal computational semantics. The primary goal is to replicate human processing by means of computer systems. The tasks (shown below) are developed by individuals and groups to deal with identifiable issues, as they take on some concrete form.

The first major area in semantic analysis is the identification of the intended meaning at the word level (taken to include idiomatic expressions). This is word-sense disambiguation (a concept that is evolving away from the notion that words have discrete senses and toward the view that words are characterized by the ways in which they are used, i.e., their contexts). The tasks in this area include lexical sample and all-word disambiguation, multi- and cross-lingual disambiguation, and lexical substitution. Given the difficulties of identifying word senses, other tasks relevant to this topic include word-sense induction, subcategorization acquisition, and evaluation of lexical resources. The tasks in this area may be characterized as dealing with dictionary issues.

The second major area in semantic analysis is the understanding of how different sentence and textual elements fit together. Tasks in this area include semantic role labeling, semantic relation analysis, and coreference resolution. Other tasks in this area look at more specialized issues of semantic analysis, such as temporal information processing, metonymy resolution, and sentiment analysis. The tasks in this area have many potential applications, such as information extraction, question answering, document summarization, machine translation, construction of thesauri and semantic networks, language modeling, paraphrasing, and recognizing textual entailment. In each of these potential applications, the contribution of the types of semantic analysis constitutes the most outstanding research issue.

==Tasks in Semantic Evaluation==

The major tasks in semantic evaluation include:
* '''[[Word sense disambiguation]]''': WSD, lexical sample and all-words, the process of identifying which [[Word sense disambiguation|sense]] of a word (i.e. [[Semantics|meaning]]) is used in a [[Sentence (linguistics)|sentence]], when the word has multiple meanings ([[polysemy]]). The WSD task has two variants: "[[lexical sample task|lexical sample]]" and "[[all-words task|all words]]" task. The former comprises disambiguating the occurrences of a small sample of target words which were previously selected, while in the latter all the words in a piece of running text need to be disambiguated. Tasks have been performed for many languages. Tasks have covered disambiguation of nouns, verbs, adjectives, and prepositions. A new task is evaluating phrasal semantics (compositionality and semantic similarity of phrases).
* '''Multi-lingual or cross-lingual word-sense disambiguation''': word senses are defined according to translation distinctions, e.g., a polysemous word in Japanese is translated differently in a given context. The WSD task provides texts with target words and requires identification of the appropriate translation. A related task is cross-language information retrieval, where participants disambiguate in one language (e.g., with WordNet synsets) and retrieve documents in another language; standard information retrieval metrics are used to assess the quality of the disambiguation. New tasks include cross-lingual content-based recommendation (where user profiles are built to recommend items of interest in another language), examining semantic textual similarity with a view toward evaluating modular semantic components, and linking noun phrases across Wikipedia articles in different languages.
* '''Word-sense induction''': comparison of sense-induction and discrimination systems. The task is to cluster corpus instances (word uses, rather than word senses) and to evaluate systems on how well they correspond to pre-existing sense inventories or to various sense mapping systems. New tasks are to provide an evaluation framework for web search result clustering, induction for graded or non-graded senses, and tags used in folksonomies.
* '''Lexical substitution or simplification''': find an alternative substitute word or phrase for a target word in context. The task involves both finding the synonyms and disambiguating the context. It allows the use of any kind of lexical resource or technique, including word sense disambiguation and word sense induction. A cross-lingual task was also defined. This topic also includes textual entailment and paraphrasing tasks.
* '''Evaluation of lexical resources''': the task evaluates the submitted lexical resources indirectly, running a simple WSD based on topic signatures (sets of words related to each target sense). A lexical sample tagged with English WordNet senses was used for evaluation.
* '''Subcategorization acquisition''': semantically similar verbs are similar in terms of subcategorization frames. The task is to use any available method for disambiguating verb senses, so that the results can then be fed into automatic methods used for acquiring subcategorization frames, with the hypothesis that the disambiguation will cluster the instances.
* '''Semantic role labeling''': identifying and labeling constituents of sentences with their semantic roles. The basic task began with attempts to replicate FrameNet data, specifically frame elements. This task has expanded to inferring and developing new frames and frame elements, in individual sentences and in full running texts, with identification of intersentential links and coreference chains. New tasks focus on extraction of spatial information from natural language (spatial role labeling) and the utility of semantic dependency parsing in semantic role labeling.
* '''[[Semantic relation identification]]''': examining relations between lexical items in a sentence. The task, given a sample of semantic relation types, is to identify and classify semantic relations between nominals (i.e., nouns and base noun phrases, excluding named entities); a main purpose of this task is to assess different classification methods. Another task is, given a sentence and two tagged nominals, to predict the relation between those nominals and the direction of the relation. New tasks seek to measure the relational similarity between pairs of words, to extract drug-drug interactions from biomedical texts, and to develop methods in causal reasoning.
* '''Metonymy resolution''': the figurative substitution of an attribute of a name for the thing specified. The task is a lexical sample task (1) to classify preselected expressions of a particular semantic class (such as country names) as having a literal or a metonymic reading, and, if metonymic, (2) to identify a further specification into prespecified metonymic patterns (such as place-for-event or company-for-stock) or, alternatively, recognition as an innovative reading. A second task is to identify when the arguments of a specified predicate do not satisfy selectional restrictions and, when they do not, to identify both the type mismatch and the type shift (coercion).
* '''Temporal information processing''': the temporal location and order of events in newspaper articles, narratives, and similar texts. The task is to identify the events described in a text and locate these in time, i.e., identification of temporal referring expressions, events and temporal relations within a text. A further task requires systems to recognize which of a fixed set of temporal relations holds between (a) events and time expressions within the same sentence, (b) events and the document creation time, (c) main events in consecutive sentences, and (d) two events where one syntactically dominates the other.
* '''Coreference resolution''': detection and resolution of coreferences. The task is to detect full coreference chains, composed of named entities, pronouns, and full noun phrases, and to resolve pronouns, i.e., to find their antecedents.
* '''Sentiment analysis''': emotion annotation, polarity orientation labeling. The task is to classify the titles of newspaper articles with the appropriate emotion label and/or with a valence indication (positive/negative), given a predefined set of six emotion labels (i.e., Anger, Disgust, Fear, Joy, Sadness, Surprise). A new task is to examine polarity in Twitter.
This list is expected to grow as the field progresses.

Some tasks are closely related to each other. For instance, word sense disambiguation (monolingual, multi-lingual and cross-lingual), word sense induction task, lexical substitution, subcategorization acquisition and evaluation of lexical resources are all related to word senses.

==Organization==

[http://www.clres.com/siglex.html SIGLEX, the ACL Special Interest Group on the Lexicon] is the umbrella organization for the SemEval semantic evaluations and the SENSEVAL word-sense evaluation exercises. [http://www.senseval.org/ SENSEVAL] is the home page for SENSEVAL 1-3. Each exercise is usually organized by two individuals, who make the call for tasks and handle the overall administration. Within the general guidelines, each task is then organized and run by individuals or groups.

==SemEval on Wikipedia==

On [http://en.wikipedia.org/wiki/Main_Page Wikipedia], a [http://en.wikipedia.org/wiki/SemEval SemEval] page has been created; contributions and suggestions are welcome on how to improve the page and to further the understanding of computational semantics.

==See also==

* [[Semantics]]
* [[Computational Semantics]]
* [[Statistical Semantics]]
* [[Semantics software for English]]

[[Category:SemEval Portal]]

SAT Analogy Questions (State of the art)

2010-04-29T22:48:52Z

David.jurgens: Updated with Bollegala et al. reference.

* SAT = Scholastic Aptitude Test
* 374 multiple-choice analogy questions; 5 choices per question
* SAT questions collected by [http://www.cs.rutgers.edu/~mlittman/ Michael Littman], available from [http://www.apperceptual.com/ Peter Turney]
* introduced in Turney et al. (2003) as a way of evaluating algorithms for measuring relational similarity

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| mason:stone
|-
! Choices:
| (a)
| teacher:chalk
|-
|
| (b)
| carpenter:wood
|-
|
| (c)
| soldier:gun
|-
|
| (d)
| photograph:camera
|-
|
| (e)
| book:word
|-
! Solution:
| (b)
| carpenter:wood
|}

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| Random
| Random guessing
| 1 / 5 = 20.0%
| Random
| 20.0%
| 16.1-24.5%
|-
| JC
| Jiang and Conrath (1997)
| Turney (2006b)
| Hybrid
| 27.3%
| 23.1-32.4%
|-
| LIN
| Lin (1998)
| Turney (2006b)
| Hybrid
| 27.3%
| 23.1-32.4%
|-
| LC
| Leacock and Chodorow (1998)
| Turney (2006b)
| Lexicon-based
| 31.3%
| 26.9-36.5%
|-
| HSO
| Hirst and St-Onge (1998)
| Turney (2006b)
| Lexicon-based
| 32.1%
| 27.6-37.4%
|-
| RES
| Resnik (1995)
| Turney (2006b)
| Hybrid
| 33.2%
| 28.7-38.5%
|-
| PMI-IR
| Turney (2001)
| Turney (2006b)
| Corpus-based
| 35.0%
| 30.2-40.1%
|-
| LSA+Predication
| Mangalath et al. (2004)
| Mangalath et al. (2004)
| Corpus-based
| 42.0%
| 37.2-47.4%
|-
| KNOW-BEST
| Veale (2004)
| Veale (2004)
| Lexicon-based
| 43.0%
| 38.0-48.2%
|-
| ''k''-means
| Bicici and Yuret (2006)
| Bicici and Yuret (2006)
| Corpus-based
| 44.0%
| 39.0-49.3%
|-
| BagPack
| Herdağdelen and Baroni (2009)
| Herdağdelen and Baroni (2009)
| Corpus-based
| 44.1%
| (not stated in paper)
|-
| VSM
| Turney and Littman (2005)
| Turney and Littman (2005)
| Corpus-based
| 47.1%
| 42.2-52.5%
|-
| BMI
| Bollegala et al. (2009)
| Bollegala et al. (2009)
| Corpus-based
| 51.1%
| (not stated in paper)
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 52.1%
| 46.9-57.3%
|-
| PERT
| Turney (2006a)
| Turney (2006a)
| Corpus-based
| 53.5%
| 48.5-58.9%
|-
| LRA
| Turney (2006b)
| Turney (2006b)
| Corpus-based
| 56.1%
| 51.0–61.2%
|-
| Human
| Average US college applicant
| Turney and Littman (2005)
| Human
| 57.0%
| 52.0-62.3%
|-
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with SAT questions
* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
* '''Correct''' = percent of 374 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using [http://www.quantitativeskills.com/sisa/statistics/onemean.htm Binomial Exact Test]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* KNOW-BEST = KNOWledge-Based Entertainment and Scholastic Testing
* VSM = Vector Space Model
* LRA = Latent Relational Analysis
* PERT = Pertinence
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* LSA+Predication = Latent Semantic Analysis + Predication
* BagPack = Bag of words representation of Paired concept knowledge

== References ==

Bicici, E., and Yuret, D. (2006). [http://www.denizyuret.com/pub/tainn-06/LAWSQ-LNCS.pdf Clustering word pairs to answer analogy questions]. ''Proceedings of the Fifteenth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN 2006)''.

Bollegala, D., Matsuo, Y., and Ishizuka, M. (2009). [http://www2009.org/proceedings/pdf/p651.pdf Measuring the similarity between implicit semantic relations from the web]. ''Proceedings of the 18th International Conference on World Wide Web'', ACM, pp. 651–660.

Herdağdelen, A., and Baroni, M. (2009). [http://clic.cimec.unitn.it/marco/publications/gems-09/herdagdelen-baroni-gems09.pdf BagPack: A general framework to represent semantic relations]. ''Proceedings of the EACL 2009 Geometrical Models for Natural Language Semantics (GEMS) Workshop'', East Stroudsburg PA: ACL, pp. 33-40.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, 305-332.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Mangalath, P., Quesada, J., and Kintsch, W. (2004). [http://www.josequesada.name/papers/Mangalath-Quesada-2004-analogyPredicationCogSciPoster1.pdf Analogy-making as predication using relational information and LSA vectors]. In K.D. Forbus, D. Gentner & T. Regier (Eds.), ''Proceedings of the 26th Annual Meeting of the Cognitive Science Society''. Chicago: Lawrence Erlbaum Associates.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D., and Littman, M.L. (2005). [http://arxiv.org/abs/cs.LG/0508103 Corpus-based learning of analogies and semantic relations]. ''Machine Learning'', 60 (1-3), 251-278.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D. (2006a). [http://arxiv.org/abs/cs.CL/0607120 Expressing implicit semantic relations without supervision]. ''Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (Coling/ACL-06)'', Sydney, Australia, pp. 313-320.

Turney, P.D. (2006b). [http://arxiv.org/abs/cs.CL/0608100 Similarity of semantic relations]. ''Computational Linguistics'', 32 (3), 379-416.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

Veale, T. (2004). [http://afflatus.ucd.ie/Papers/ecai2004.pdf WordNet sits the SAT: A knowledge-based approach to lexical analogy]. ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', pp. 606–612, Valencia, Spain.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[TOEFL Synonym Questions]]
* [[State of the art]]

[[Category:State of the art]]

SAT Analogy Questions (State of the art)

2010-04-29T22:46:23Z

David.jurgens: Updated table with Bollegala et al. result

* SAT = Scholastic Aptitude Test
* 374 multiple-choice analogy questions; 5 choices per question
* SAT questions collected by [http://www.cs.rutgers.edu/~mlittman/ Michael Littman], available from [http://www.apperceptual.com/ Peter Turney]
* introduced in Turney et al. (2003) as a way of evaluating algorithms for measuring relational similarity

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| mason:stone
|-
! Choices:
| (a)
| teacher:chalk
|-
|
| (b)
| carpenter:wood
|-
|
| (c)
| soldier:gun
|-
|
| (d)
| photograph:camera
|-
|
| (e)
| book:word
|-
! Solution:
| (b)
| carpenter:wood
|}

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| Random
| Random guessing
| 1 / 5 = 20.0%
| Random
| 20.0%
| 16.1-24.5%
|-
| JC
| Jiang and Conrath (1997)
| Turney (2006b)
| Hybrid
| 27.3%
| 23.1-32.4%
|-
| LIN
| Lin (1998)
| Turney (2006b)
| Hybrid
| 27.3%
| 23.1-32.4%
|-
| LC
| Leacock and Chodorow (1998)
| Turney (2006b)
| Lexicon-based
| 31.3%
| 26.9-36.5%
|-
| HSO
| Hirst and St-Onge (1998)
| Turney (2006b)
| Lexicon-based
| 32.1%
| 27.6-37.4%
|-
| RES
| Resnik (1995)
| Turney (2006b)
| Hybrid
| 33.2%
| 28.7-38.5%
|-
| PMI-IR
| Turney (2001)
| Turney (2006b)
| Corpus-based
| 35.0%
| 30.2-40.1%
|-
| LSA+Predication
| Mangalath et al. (2004)
| Mangalath et al. (2004)
| Corpus-based
| 42.0%
| 37.2-47.4%
|-
| KNOW-BEST
| Veale (2004)
| Veale (2004)
| Lexicon-based
| 43.0%
| 38.0-48.2%
|-
| ''k''-means
| Bicici and Yuret (2006)
| Bicici and Yuret (2006)
| Corpus-based
| 44.0%
| 39.0-49.3%
|-
| BagPack
| Herdağdelen and Baroni (2009)
| Herdağdelen and Baroni (2009)
| Corpus-based
| 44.1%
| (not stated in paper)
|-
| VSM
| Turney and Littman (2005)
| Turney and Littman (2005)
| Corpus-based
| 47.1%
| 42.2-52.5%
|-
| BMI
| Bollegala et al. (2009)
| Bollegala et al. (2009)
| Corpus-based
| 51.1%
| (not stated in paper)
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 52.1%
| 46.9-57.3%
|-
| PERT
| Turney (2006a)
| Turney (2006a)
| Corpus-based
| 53.5%
| 48.5-58.9%
|-
| LRA
| Turney (2006b)
| Turney (2006b)
| Corpus-based
| 56.1%
| 51.0-61.2%
|-
| Human
| Average US college applicant
| Turney and Littman (2005)
| Human
| 57.0%
| 52.0-62.3%
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with SAT questions
* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
* '''Correct''' = percent of 374 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using [http://www.quantitativeskills.com/sisa/statistics/onemean.htm Binomial Exact Test]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* KNOW-BEST = KNOWledge-Based Entertainment and Scholastic Testing
* VSM = Vector Space Model
* LRA = Latent Relational Analysis
* PERT = Pertinence
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* LSA+Predication = Latent Semantic Analysis + Predication
* BagPack = Bag of words representation of Paired concept knowledge
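
The '''95% confidence''' column can be approximated with a short script. The following is a minimal sketch, assuming that the "Binomial Exact Test" refers to the Clopper-Pearson exact interval and that the number of correct answers is the reported percentage of 374, rounded to the nearest whole question. The function names <code>binom_cdf</code> and <code>exact_interval</code> are illustrative and do not come from the linked calculator.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(k, n, alpha=0.05, tol=1e-10):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for the p where f(p) == target.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p for which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p for which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Random guessing: 20.0% of 374 questions, assumed to mean 75 correct answers.
n_questions = 374
n_correct = round(0.200 * n_questions)
low, high = exact_interval(n_correct, n_questions)
print(f"{100 * low:.1f}-{100 * high:.1f}%")
```

Under these assumptions the result for the random-guessing row should land close to the interval in the table. Small differences are possible if the original calculator uses a different rounding or interval method.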

== References ==

Bicici, E., and Yuret, D. (2006). [http://www.denizyuret.com/pub/tainn-06/LAWSQ-LNCS.pdf Clustering word pairs to answer analogy questions]. ''Proceedings of the Fifteenth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN 2006)''.

Herdağdelen, A., and Baroni, M. (2009). [http://clic.cimec.unitn.it/marco/publications/gems-09/herdagdelen-baroni-gems09.pdf BagPack: A general framework to represent semantic relations]. ''Proceedings of the EACL 2009 Geometrical Models for Natural Language Semantics (GEMS) Workshop'', East Stroudsburg PA: ACL, pp. 33-40.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Mangalath, P., Quesada, J., and Kintsch, W. (2004). [http://www.josequesada.name/papers/Mangalath-Quesada-2004-analogyPredicationCogSciPoster1.pdf Analogy-making as predication using relational information and LSA vectors]. In K.D. Forbus, D. Gentner & T. Regier (Eds.), ''Proceedings of the 26th Annual Meeting of the Cognitive Science Society''. Chicago: Lawrence Erlbaum Associates.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D., and Littman, M.L. (2005). [http://arxiv.org/abs/cs.LG/0508103 Corpus-based learning of analogies and semantic relations]. ''Machine Learning'', 60 (1-3), 251-278.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D. (2006a). [http://arxiv.org/abs/cs.CL/0607120 Expressing implicit semantic relations without supervision]. ''Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (Coling/ACL-06)'', Sydney, Australia, pp. 313-320.

Turney, P.D. (2006b). [http://arxiv.org/abs/cs.CL/0608100 Similarity of semantic relations]. ''Computational Linguistics'', 32 (3), 379-416.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

Veale, T. (2004). [http://afflatus.ucd.ie/Papers/ecai2004.pdf WordNet sits the SAT: A knowledge-based approach to lexical analogy]. ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', Valencia, Spain, pp. 606-612.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[TOEFL Synonym Questions]]
* [[State of the art]]

[[Category:State of the art]]

SAT Analogy Questions (State of the art)

2010-04-29T21:31:21Z

David.jurgens: Updated with Herdağdelen and Baroni's result

* SAT = Scholastic Aptitude Test
* 374 multiple-choice analogy questions; 5 choices per question
* SAT questions collected by [http://www.cs.rutgers.edu/~mlittman/ Michael Littman], available from [http://www.apperceptual.com/ Peter Turney]
* introduced in Turney et al. (2003) as a way of evaluating algorithms for measuring relational similarity

== Sample question ==

::{| border="0" cellpadding="1" cellspacing="1"
|-
! Stem:
|
| mason:stone
|-
! Choices:
| (a)
| teacher:chalk
|-
|
| (b)
| carpenter:wood
|-
|
| (c)
| soldier:gun
|-
|
| (d)
| photograph:camera
|-
|
| (e)
| book:word
|-
! Solution:
| (b)
| carpenter:wood
|}

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| Random
| Random guessing
| 1 / 5 = 20.0%
| Random
| 20.0%
| 16.1-24.5%
|-
| JC
| Jiang and Conrath (1997)
| Turney (2006b)
| Hybrid
| 27.3%
| 23.1-32.4%
|-
| LIN
| Lin (1998)
| Turney (2006b)
| Hybrid
| 27.3%
| 23.1-32.4%
|-
| LC
| Leacock and Chodorow (1998)
| Turney (2006b)
| Lexicon-based
| 31.3%
| 26.9-36.5%
|-
| HSO
| Hirst and St-Onge (1998)
| Turney (2006b)
| Lexicon-based
| 32.1%
| 27.6-37.4%
|-
| RES
| Resnik (1995)
| Turney (2006b)
| Hybrid
| 33.2%
| 28.7-38.5%
|-
| PMI-IR
| Turney (2001)
| Turney (2006b)
| Corpus-based
| 35.0%
| 30.2-40.1%
|-
| LSA+Predication
| Mangalath et al. (2004)
| Mangalath et al. (2004)
| Corpus-based
| 42.0%
| 37.2-47.4%
|-
| KNOW-BEST
| Veale (2004)
| Veale (2004)
| Lexicon-based
| 43.0%
| 38.0-48.2%
|-
| ''k''-means
| Bicici and Yuret (2006)
| Bicici and Yuret (2006)
| Corpus-based
| 44.0%
| 39.0-49.3%
|-
| BagPack
| Herdağdelen and Baroni (2009)
| Herdağdelen and Baroni (2009)
| Corpus-based
| 44.1%
| (not stated in paper)
|-
| VSM
| Turney and Littman (2005)
| Turney and Littman (2005)
| Corpus-based
| 47.1%
| 42.2-52.5%
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 52.1%
| 46.9-57.3%
|-
| PERT
| Turney (2006a)
| Turney (2006a)
| Corpus-based
| 53.5%
| 48.5-58.9%
|-
| LRA
| Turney (2006b)
| Turney (2006b)
| Corpus-based
| 56.1%
| 51.0-61.2%
|-
| Human
| Average US college applicant
| Turney and Littman (2005)
| Human
| 57.0%
| 52.0-62.3%
|}

== Explanation of table ==

* '''Algorithm''' = name of algorithm
* '''Reference for algorithm''' = where to find out more about given algorithm
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with SAT questions
* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
* '''Correct''' = percent of 374 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using [http://www.quantitativeskills.com/sisa/statistics/onemean.htm Binomial Exact Test]
* table rows sorted in order of increasing percent correct
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
* KNOW-BEST = KNOWledge-Based Entertainment and Scholastic Testing
* VSM = Vector Space Model
* LRA = Latent Relational Analysis
* PERT = Pertinence
* PMI-IR = Pointwise Mutual Information - Information Retrieval
* LSA+Predication = Latent Semantic Analysis + Predication
* BagPack = Bag of words representation of Paired concept knowledge

== References ==

Bicici, E., and Yuret, D. (2006). [http://www.denizyuret.com/pub/tainn-06/LAWSQ-LNCS.pdf Clustering word pairs to answer analogy questions]. ''Proceedings of the Fifteenth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN 2006)''.

Herdağdelen, A., and Baroni, M. (2009). [http://clic.cimec.unitn.it/marco/publications/gems-09/herdagdelen-baroni-gems09.pdf BagPack: A general framework to represent semantic relations]. ''Proceedings of the EACL 2009 Geometrical Models for Natural Language Semantics (GEMS) Workshop'', East Stroudsburg PA: ACL, pp. 33-40.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Mangalath, P., Quesada, J., and Kintsch, W. (2004). [http://www.josequesada.name/papers/Mangalath-Quesada-2004-analogyPredicationCogSciPoster1.pdf Analogy-making as predication using relational information and LSA vectors]. In K.D. Forbus, D. Gentner & T. Regier (Eds.), ''Proceedings of the 26th Annual Meeting of the Cognitive Science Society''. Chicago: Lawrence Erlbaum Associates.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D., and Littman, M.L. (2005). [http://arxiv.org/abs/cs.LG/0508103 Corpus-based learning of analogies and semantic relations]. ''Machine Learning'', 60 (1-3), 251-278.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D. (2006a). [http://arxiv.org/abs/cs.CL/0607120 Expressing implicit semantic relations without supervision]. ''Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (Coling/ACL-06)'', Sydney, Australia, pp. 313-320.

Turney, P.D. (2006b). [http://arxiv.org/abs/cs.CL/0608100 Similarity of semantic relations]. ''Computational Linguistics'', 32 (3), 379-416.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

Veale, T. (2004). [http://afflatus.ucd.ie/Papers/ecai2004.pdf WordNet sits the SAT: A knowledge-based approach to lexical analogy]. ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', Valencia, Spain, pp. 606-612.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[TOEFL Synonym Questions]]
* [[State of the art]]

[[Category:State of the art]]