OPI-JSA at SemEval-2017 Task 1: Application of Ensemble learning for computing semantic textual similarity

Martyna Śpiewak, Piotr Sobecki, Daniel Karaś


Abstract
Semantic Textual Similarity (STS) evaluation assesses the degree to which two texts are similar in meaning. In this paper, we describe the three models we submitted to the SemEval-2017 STS task. Given two English texts, each of the proposed methods outputs an assessment of their semantic similarity. We propose an approach for computing monolingual semantic textual similarity based on an ensemble of three distinct methods. Our model combines recursive neural network (RNN) text auto-encoders with a supervised model of vectorized sentences that uses reduced part-of-speech (PoS) weighted word embeddings, as well as an unsupervised method based on word coverage (TakeLab). We further enrich the model with additional features that help the ensemble discriminate between its member methods based on their efficiency. We use a Multi-Layer Perceptron as the ensemble classifier, operating on the estimates of trained Gradient Boosting Regressors. Our results show that such an ensemble achieves higher accuracy because each member algorithm tends to specialize in particular types of sentences. The simple model based on PoS-weighted Word2Vec embeddings appears to improve the performance of the more complex RNN-based auto-encoders within the ensemble. In the monolingual English-English STS subtask, our ensemble-based model achieved a mean Pearson correlation of .785 with human annotations.
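Two of the ensemble members mentioned above, the PoS-weighted embedding model and the word-coverage method, plus the Pearson metric used for evaluation, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical reduced tag set (NOUN, VERB, ADJ, ADV, OTHER) with made-up weights, since the abstract gives neither the tags nor the values, and it expects toy vectors in place of trained Word2Vec embeddings. The overlap function is a unigram-only simplification loosely modeled on TakeLab's n-gram overlap feature. All function names are hypothetical, and the RNN auto-encoders and the Gradient Boosting/MLP ensemble are not shown.

```python
import math

# Hypothetical reduced PoS tag set and weights (illustrative values only;
# the abstract does not specify the tags or weights the authors used).
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "ADV": 0.4, "OTHER": 0.1}


def sentence_vector(tagged_tokens, embeddings, pos_weights=POS_WEIGHTS):
    """PoS-weighted average of word vectors.

    tagged_tokens: list of (word, reduced_tag) pairs.
    embeddings: non-empty dict mapping word -> list of floats of equal length.
    Out-of-vocabulary words are skipped; unknown tags get the "OTHER" weight.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, tag in tagged_tokens:
        vec = embeddings.get(word)
        if vec is None:
            continue
        w = pos_weights.get(tag, pos_weights["OTHER"])
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: zero vector
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_overlap(tokens1, tokens2):
    """Harmonic mean of the two directional unigram coverages."""
    w1, w2 = set(tokens1), set(tokens2)
    shared = len(w1 & w2)
    if shared == 0:
        return 0.0
    cov1 = shared / len(w1)  # share of sentence 1's word types found in sentence 2
    cov2 = shared / len(w2)  # share of sentence 2's word types found in sentence 1
    return 2 * cov1 * cov2 / (cov1 + cov2)


def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root sums of squared deviations; the 1/n factors cancel in the ratio.
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)
```

Weighting by PoS lets content words such as nouns and verbs dominate the sentence vector, while function words tagged OTHER contribute little to it.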
Anthology ID:
S17-2018
Volume:
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Steven Bethard, Marine Carpuat, Marianna Apidianaki, Saif M. Mohammad, Daniel Cer, David Jurgens
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
139–143
URL:
https://aclanthology.org/S17-2018
DOI:
10.18653/v1/S17-2018
Cite (ACL):
Martyna Śpiewak, Piotr Sobecki, and Daniel Karaś. 2017. OPI-JSA at SemEval-2017 Task 1: Application of Ensemble learning for computing semantic textual similarity. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 139–143, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
OPI-JSA at SemEval-2017 Task 1: Application of Ensemble learning for computing semantic textual similarity (Śpiewak et al., SemEval 2017)
PDF:
https://aclanthology.org/S17-2018.pdf