Learning Based Java for Rapid Development of NLP Systems

Nick Rizzolo, Dan Roth


Abstract
Today's natural language processing systems are growing more complex with the need to incorporate a wider range of language resources and more sophisticated statistical methods. In many cases, it is necessary to learn a component with input that includes the predictions of other learned components or to assign simultaneously the values that would be assigned by multiple components with an expressive, data-dependent structure among them. As a result, the design of systems with multiple learning components is inevitably quite technically complex, and implementations of conceptually simple NLP systems can be time-consuming and prone to error. Our new modeling language, Learning Based Java (LBJ), facilitates the rapid development of systems that learn and perform inference. LBJ has already been used to build state-of-the-art NLP systems. In this paper, we first demonstrate that there exists a theoretical model that describes most NLP approaches adeptly. Second, we show how our improvements to the LBJ language enable the programmer to describe the theoretical model succinctly. Finally, we introduce the concept of data-driven compilation, a translation process in which the efficiency of the generated code benefits from the data given as input to the learning algorithms.
Anthology ID:
L10-1517
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/747_Paper.pdf
Cite (ACL):
Nick Rizzolo and Dan Roth. 2010. Learning Based Java for Rapid Development of NLP Systems. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Learning Based Java for Rapid Development of NLP Systems (Rizzolo & Roth, LREC 2010)