Call My Net 2: A New Resource for Speaker Recognition

Karen Jones, Stephanie Strassel, Kevin Walker, Jonathan Wright


Abstract
We introduce the Call My Net 2 (CMN2) Corpus, a new resource for speaker recognition featuring Tunisian Arabic conversations between friends and family, incorporating both traditional telephony and VoIP data. The corpus contains data from over 400 Tunisian Arabic speakers, collected via a custom-built platform deployed in Tunis. Each speaker made 10 or more calls, each lasting up to 10 minutes. Calls include speech in a variety of realistic, natural acoustic settings, both noisy and quiet. Speakers used a variety of handsets, including landline and mobile devices, and made VoIP calls from tablets or computers. All calls were subject to a series of manual and automatic quality checks covering speech duration, audio quality, language identity and speaker identity. The CMN2 corpus has been used in two NIST Speaker Recognition Evaluations (SRE18 and SRE19), and the SRE test sets as well as the full CMN2 corpus will be published in the Linguistic Data Consortium Catalog. We describe the CMN2 corpus requirements, the telephone collection platform, and the procedures for call collection. We review properties of the CMN2 dataset and discuss features of the corpus that distinguish it from prior SRE collection efforts, including some of the technical challenges encountered in collecting VoIP data.
Anthology ID:
2020.lrec-1.816
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6621–6626
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.816
Cite (ACL):
Karen Jones, Stephanie Strassel, Kevin Walker, and Jonathan Wright. 2020. Call My Net 2: A New Resource for Speaker Recognition. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6621–6626, Marseille, France. European Language Resources Association.
Cite (Informal):
Call My Net 2: A New Resource for Speaker Recognition (Jones et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.816.pdf