FreP: An electronic tool for extracting frequency information of phonological units from Portuguese written text

S. Frota, M. Vigário, F. Martins


Abstract
The importance of frequency for phonological phenomena has long been noted in the literature. However, the frequency information available for phonological units in Portuguese is scarce, non-replicable, corpus-dependent, and hard to obtain, because no free, publicly available tool exists. This paper describes FreP, a new electronic tool that provides frequency counts of phonological units at the word level and below from Portuguese written text: namely, major classes of segments, syllables and syllable types, phonological clitics, clitic types and size, prosodic words and their shape, word stress location, and syllable type by position within the word and/or status relative to word stress. Useful applications of FreP in general linguistics, phonology, language acquisition and development, and speech evaluation and therapy are also described. Forthcoming extensions of the tool include the ability to extract frequency information for different varieties of Portuguese, Brazilian Portuguese in particular, and the ability to provide a SAMPA output from the written text, together with the frequency of segmental features such as manner, place of articulation, and laryngeal features. Updated information on FreP can be found at http://www.fl.ul.pt/LaboratorioFonetica/FreP.
Anthology ID:
L06-1261
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/441_pdf.pdf
Cite (ACL):
S. Frota, M. Vigário, and F. Martins. 2006. FreP: An electronic tool for extracting frequency information of phonological units from Portuguese written text. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
FreP: An electronic tool for extracting frequency information of phonological units from Portuguese written text (Frota et al., LREC 2006)