Jmp8 at SemEval-2017 Task 2: A simple and general distributional approach to estimate word similarity

Josué Melka, Gilles Bernard


Abstract
We built a simple corpus-based system that estimates word similarity in multiple languages using a count-based approach. After training on Wikipedia corpora, the system was evaluated on the multilingual subtask of SemEval-2017 Task 2 and performed well despite its simplicity. Our results point to the power of the distributional approach in semantic similarity tasks, even without knowledge of the underlying language. We also show that dimensionality reduction has a considerable impact on the results.
Anthology ID:
S17-2035
Volume:
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Steven Bethard, Marine Carpuat, Marianna Apidianaki, Saif M. Mohammad, Daniel Cer, David Jurgens
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
230–234
URL:
https://aclanthology.org/S17-2035
DOI:
10.18653/v1/S17-2035
Cite (ACL):
Josué Melka and Gilles Bernard. 2017. Jmp8 at SemEval-2017 Task 2: A simple and general distributional approach to estimate word similarity. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 230–234, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Jmp8 at SemEval-2017 Task 2: A simple and general distributional approach to estimate word similarity (Melka & Bernard, SemEval 2017)
PDF:
https://aclanthology.org/S17-2035.pdf
Code:
yoch/jmp8