Building a Node of the Accessible Language Technology Infrastructure

Bartosz Broda, Michał Marcińczuk, Maciej Piasecki


Abstract
A limited prototype of the CLARIN Language Technology Infrastructure (LTI) node is presented. The node prototype provides several types of web services for Polish. The functionality encompasses morpho-syntactic processing, shallow semantic processing of corpus on the basis of the SuperMatrix system and plWordNet browsing. We take the prototype as the starting point for the discussion on requirements that must be fulfilled by the LTI. Some possible solutions are proposed for less frequently discussed problems, e.g. streaming processing of language data on the remote processing node. We experimentally investigate how to tackle with several requirements from many discussed. Such aspects as processing large volumes of data, asynchronous mode of processing and scalability of the architecture to large number of users got especial attention in the constructed prototype of the Web Service for morpho-syntactic processing of Polish called TaKIPI-WS (http://plwordnet.pwr.wroc.pl/clarin/ws/takipi/). TaKIPI-WS is a distributed system with a three-layer architecture, an asynchronous model of request handling and multi-agent-based processing. TaKIPI-WS consists of three layers: WS Interface, Database and Daemons. The role of the Database is to store and exchange data between the Interface and the Daemons. The Daemons (i.e. taggers) are responsible for executing the requests queued in the database. Results of the performance tests are presented in the paper, too.
Anthology ID:
L10-1477
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/690_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Bartosz Broda, Michał Marcińczuk, and Maciej Piasecki. 2010. Building a Node of the Accessible Language Technology Infrastructure. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Building a Node of the Accessible Language Technology Infrastructure (Broda et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/690_Paper.pdf