The Common Orthographic Vocabulary of the Portuguese Language: a set of open lexical resources for a pluricentric language

José Pedro Ferreira, Maarten Janssen, Gladis Barcellos de Oliveira, Margarita Correia, Gilvan Müller de Oliveira


Abstract
This paper outlines the design principles and choices, as well as the ongoing development process, of the Common Orthographic Vocabulary of the Portuguese Language (VOC), a large-scale electronic lexical database adopted by the Instituto Internacional da Língua Portuguesa of the Community of Portuguese-Speaking Countries (CPLP) to implement a spelling reform that is currently taking place. Given the different available resources and lexicographic traditions within the CPLP countries, a range of solutions was adopted for different countries and integrated into a common development framework. Although lexicographic resources have always been published to implement spelling reforms of Portuguese, VOC represents a paradigm change, switching from idiosyncratic, closed-source, paper-format official resources to standardized, open, free, web-accessible and reusable ones. We start by outlining the context that justifies the development of the resource and its requirements, then describe the methodology, workflow and tools used, showing how collaboration on a common web-based platform and administration interface makes such a long-sought and ambitious project possible.
Anthology ID:
L12-1616
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1071–1075
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1034_Paper.pdf
Cite (ACL):
José Pedro Ferreira, Maarten Janssen, Gladis Barcellos de Oliveira, Margarita Correia, and Gilvan Müller de Oliveira. 2012. The Common Orthographic Vocabulary of the Portuguese Language: a set of open lexical resources for a pluricentric language. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1071–1075, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
The Common Orthographic Vocabulary of the Portuguese Language: a set of open lexical resources for a pluricentric language (Ferreira et al., LREC 2012)