Bigger analogy test set (State of the art)

From ACL Wiki
Revision as of 03:52, 6 January 2017 by Anna gladkova (talk | contribs) (This page lists published results on Bigger Analogy Test Set (BATS))


Dataset description

  • New dataset proposed by Gladkova et al. (2016) [1]
  • dataset balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics)
  • 10 relations of each type, 50 unique pairs per category
  • 99,200 questions in total
  • more challenging than the Google set because of more diverse relations
  • where applicable, more than one correct answer is supplied (e.g. both canine and animal are hypernyms of dog).
  • comes with a testing script that implements 5 methods of solving analogies (see Analogy (State of the art))
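Because some questions list more than one correct answer, scoring has to compare each prediction against a set of accepted answers instead of a single word. The sketch below shows one way this could be done. It is not the BATS testing script: the function name `accuracy` and the sample data are hypothetical, and the rule that a prediction counts as correct if it matches any accepted answer is an assumption.

```python
def accuracy(predictions, accepted_answers):
    """Percentage of predictions that match any accepted answer.

    predictions      -- list of predicted words, one per question
    accepted_answers -- list of sets; each set holds all answers
                        accepted for the corresponding question
    """
    hits = sum(
        prediction in answers
        for prediction, answers in zip(predictions, accepted_answers)
    )
    return 100.0 * hits / len(accepted_answers)


# Illustrative data only: the first item mirrors the "dog" example above,
# the second item is made up for this sketch.
gold = [{"canine", "animal"}, {"feline", "animal"}]
predicted = ["animal", "fruit"]
score = accuracy(predicted, gold)  # first prediction is accepted, second is not
```

Storing the accepted answers as sets keeps the membership check independent of the order in which alternatives are listed.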

This page reports results obtained with the "vanilla" 3CosAdd method, also known as vector offset [2].
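For readers unfamiliar with the method, 3CosAdd answers a question "a is to a* as b is to ?" by picking the vocabulary word whose vector has the highest cosine similarity to b - a + a*. The following is a minimal sketch with NumPy, not the code used to produce the results on this page. The function name `three_cos_add` and the toy vectors are hypothetical, and two common conventions are assumed here: vectors are normalised to unit length, and the three question words are excluded from the candidates.

```python
import numpy as np


def three_cos_add(vectors, a, a_star, b):
    """Return the word most cosine-similar to b - a + a_star.

    vectors -- dict mapping each vocabulary word to a 1-D sequence of floats
    """
    words = list(vectors)
    matrix = np.array([vectors[w] for w in words], dtype=float)
    # Assumption: normalise every word vector to unit length.
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = dict(zip(words, matrix))

    # Vector offset: move from a to a_star, starting at b.
    target = unit[b] - unit[a] + unit[a_star]
    target /= np.linalg.norm(target)

    # With unit-length rows, the dot product equals cosine similarity.
    scores = matrix @ target
    # Assumption: the question words themselves are not valid answers.
    for w in (a, a_star, b):
        scores[words.index(w)] = -np.inf
    return words[int(np.argmax(scores))]


# Toy 3-dimensional vectors, made up purely for illustration.
toy = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "prince": [1.0, 0.0, 0.5],
    "apple": [0.0, 0.0, -1.0],
}
answer = three_cos_add(toy, "man", "woman", "king")
```

On real embeddings the vocabulary matrix would normally be normalised once and reused across all questions, not rebuilt per call as in this sketch.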

Table of results

  • Listed in chronological order.
| Model | Reference | Inflectional morphology | Derivational morphology | Lexicographic semantics | Encyclopedic semantics | Corpus, window size, vector size |
|---|---|---|---|---|---|---|
| SVD | Drozd et al. (2016) [3] | 44.0 | 9.8 | 10.1 | 18.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 3, 1000 dimensions |
| GloVe | Drozd et al. (2016) [3] | 59.9 | 10.2 | 10.9 | 31.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |
| Skip-Gram | Drozd et al. (2016) [3] | 61.0 | 11.2 | 9.1 | 26.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |


References

  1. Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
  2. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
  3. Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf