A stream computing approach towards scalable NLP

Xabier Artola, Zuhaitz Beloki, Aitor Soroa


Abstract
The demand for computational power has grown dramatically in recent years. This is also true of many language processing tasks, because overwhelming quantities of textual information must be processed within a reasonable time frame. This scenario has led to a paradigm shift in the computing architectures and large-scale data processing strategies used in the NLP field. In this paper we describe a series of experiments carried out in the context of the NewsReader project with the goal of analyzing the scaling capabilities of the language processing pipeline used in it. We explore the use of Storm in a new approach for scalable distributed language processing across multiple machines and evaluate its effectiveness and efficiency when processing documents on a medium and large scale. The experiments show that there is considerable room for improvement in language processing performance when adopting parallel architectures, and that even better results might be expected from large clusters with many processing nodes.
Anthology ID:
L14-1528
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
8–13
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/670_Paper.pdf
Cite (ACL):
Xabier Artola, Zuhaitz Beloki, and Aitor Soroa. 2014. A stream computing approach towards scalable NLP. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 8–13, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
A stream computing approach towards scalable NLP (Artola et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/670_Paper.pdf