Typing Race Games as a Method to Create Spelling Error Corpora

Paul Rodrigues, C. Anton Rytting


Abstract
This paper presents a method to elicit spelling error corpora using an online typing race game. After being tested for their native language, English-native participants were instructed to retype stimuli as quickly and as accurately as they could. The participants were informed that the system was keeping a score based on accuracy and speed, and that a high score would result in a position on a public scoreboard. Words were presented on the screen one at a time from a queue, and the queue was advanced by pressing the ENTER key following the stimulus. Responses were recorded and compared to the original stimuli. Responses that differed from the stimuli were treated as typographical or spelling errors and added to an error corpus. Collecting a corpus using a game offers several unique benefits. 1) A game quickly attracts engaged participants. 2) The web-based delivery reduces the cost, time, and effort of collecting the corpus. 3) Participants have fun. Spelling error corpora have been difficult and expensive to obtain for many languages, and this research was performed to fill this gap. To evaluate the methodology, we compare our game data against three existing spelling corpora for English.
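The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Trial` type, the `build_error_corpus` function, the sample words, and the use of exact string comparison are all assumptions made for illustration, and the scoring by accuracy and speed is left out because the abstract does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One stimulus shown to a participant and what they typed before pressing ENTER.

    Hypothetical record type; the abstract does not describe the logged format.
    """
    stimulus: str
    response: str


def build_error_corpus(trials):
    """Return (stimulus, response) pairs whose response differs from the stimulus."""
    errors = []
    for trial in trials:
        # Assumption: exact string comparison, with no case or whitespace normalization.
        if trial.response != trial.stimulus:
            errors.append((trial.stimulus, trial.response))
    return errors


# Invented sample data, used only to exercise the function.
trials = [
    Trial("necessary", "neccessary"),
    Trial("typing", "typing"),
    Trial("corpus", "corpsu"),
]
error_corpus = build_error_corpus(trials)
print(error_corpus)
```

Keeping the intended word next to each erroneous response is a design choice of this sketch; it lets each entry be used directly as a misspelling and correction pair.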
Anthology ID:
L12-1316
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3019–3024
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/559_Paper.pdf
Cite (ACL):
Paul Rodrigues and C. Anton Rytting. 2012. Typing Race Games as a Method to Create Spelling Error Corpora. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3019–3024, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Typing Race Games as a Method to Create Spelling Error Corpora (Rodrigues & Rytting, LREC 2012)