Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners

Lianet Sepúlveda Torres, Magali Sanches Duran, Sandra Aluísio


Abstract
Portuguese is a less-resourced language as far as foreign language learning is concerned. To inform a module of a system designed to support the scientific writing of native Spanish speakers learning Portuguese, we developed an approach to automatically generate a lexicon of wrong words that reproduce language transfer errors made by such learners. Besides the wrong word, each item of the artificially generated lexicon contains the corresponding correct Spanish and Portuguese words. The wrong word is used to identify the interlanguage error, and the correct Spanish and Portuguese forms are used to generate suggestions. By keeping track of the correct word forms, we can provide corrections or, at least, useful suggestions to learners. We propose combining two automatic procedures to obtain the error correction: (i) a similarity measure and (ii) a translation algorithm based on an aligned parallel corpus. The similarity-based method achieved a precision of 52%, whereas the alignment-based method achieved a precision of 90%. In this paper we focus only on interlanguage errors involving suffixes that have different forms in the two languages. The approach, however, is very promising for tackling other types of errors, such as gender errors.
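The sketch below illustrates the lexicon structure described in the abstract. Each entry holds a wrong form together with its correct Spanish and Portuguese words. Entries are indexed by the wrong form so that a learner's token can be flagged and mapped to a suggestion. This is a hypothetical illustration, not the authors' implementation. The suffix pairs in `SUFFIX_PAIRS`, the generation rule in `generate_entry` (Portuguese stem plus Spanish suffix), and every function name are assumptions. The sketch does not reproduce the paper's similarity-based or alignment-based correction procedures.

```python
from dataclasses import dataclass

# Hypothetical Spanish -> Portuguese suffix correspondences, for illustration
# only; the paper's actual suffix inventory is not given in the abstract.
SUFFIX_PAIRS = [
    ("ción", "ção"),  # información / informação
    ("dad", "dade"),  # posibilidad / possibilidade
    ("aje", "agem"),  # mensaje / mensagem
]


@dataclass(frozen=True)
class LexiconEntry:
    wrong: str       # interlanguage form a learner might produce
    spanish: str     # correct Spanish word
    portuguese: str  # correct Portuguese word


def generate_entry(spanish, portuguese):
    """Hypothetical rule: attach the Spanish suffix to the Portuguese stem.

    Returns None when the word pair matches no suffix correspondence.
    """
    for es_suffix, pt_suffix in SUFFIX_PAIRS:
        if spanish.endswith(es_suffix) and portuguese.endswith(pt_suffix):
            pt_stem = portuguese[: -len(pt_suffix)]
            wrong = pt_stem + es_suffix
            if wrong != portuguese:
                return LexiconEntry(wrong, spanish, portuguese)
    return None


def build_lexicon(pairs):
    """Index generated entries by their wrong form for fast lookup."""
    lexicon = {}
    for spanish, portuguese in pairs:
        entry = generate_entry(spanish, portuguese)
        if entry is not None:
            lexicon[entry.wrong] = entry
    return lexicon


def check_text(tokens, lexicon):
    """Return (token, Portuguese suggestion, Spanish source) for each flagged token."""
    return [
        (tok, lexicon[tok].portuguese, lexicon[tok].spanish)
        for tok in tokens
        if tok in lexicon
    ]


if __name__ == "__main__":
    lex = build_lexicon([
        ("posibilidad", "possibilidade"),
        ("mensaje", "mensagem"),
        ("información", "informação"),
    ])
    sentence = "a possibilidad de enviar a mensagem".split()
    print(check_text(sentence, lex))
```

With this example input, only `possibilidad` is flagged, and the suggested correction is `possibilidade`. Because each entry keeps the correct Spanish word alongside the Portuguese one, a suggestion can also point back to the transfer source.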
Anthology ID: L14-1231
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 3952–3957
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/247_Paper.pdf
Cite (ACL): Lianet Sepúlveda Torres, Magali Sanches Duran, and Sandra Aluísio. 2014. Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3952–3957, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners (Sepúlveda Torres et al., LREC 2014)