CALCS 2021 Second Call for Papers
Multilingual speakers will often mix languages when they communicate with other multilingual speakers in what is usually known as code-switching (CSW). CSW is typically present on the intersentential, intrasentential and even morphological levels. CSW presents serious challenges for language technologies such as Machine Translation (MT), Automatic Speech Recognition (ASR), language generation (LG), information retrieval (IR) and extraction (IE), and semantic processing. Traditional techniques trained for one language quickly break down when there is input mixed in from another. Recent work has shown that even powerful multilingual models, such as multilingual BERT, yield subpar performance on CSW data (cf. Aguilar and Solorio, 2020).
Considering the ubiquitous nature of CSW in informal text communication such as newsgroups, tweets, blogs, and other social media, and the number of multilingual speakers worldwide that use these platforms, addressing the challenge of processing CSW data continues to be of great practical value. This workshop aims to bring together researchers interested in technology for mixed language data, in either spoken or written form, and increase community awareness of the different efforts developed to date in this space.
Topics of interest
The workshop invites contributions from researchers working on NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop include but are not limited to:
Development of linguistic resources to support research on code-switched data;
NLP approaches for language identification, named entity recognition, sentiment analysis, machine translation, or language generation in code-switched data;
NLP techniques for the syntactic analysis of code-switched data;
Domain/dialect/genre adaptation techniques applied to code-switched data processing;
Language modeling for code-switched data;
Crowdsourcing approaches for the annotation of code-switched data;
Position papers discussing the challenges that code-switched data poses for NLP and speech technology;
Methods for improving ASR and TTS in code-switched data;
Dialogue systems for code-switched languages;
Code-switched spoken language understanding;
Survey papers of NLP research for code-switched data;
Sociolinguistic and/or sociopragmatic aspects of code-switching.
*NEW* Rising Stars Track *NEW*
We also invite non-archival one-page abstracts of recently published work highlighting CSW research by young researchers or early-career investigators. The goal is to help increase the visibility of PhD students, postdocs and early-career investigators (loosely defined) working in the space of language technology for CSW.
Submission Formats
We welcome long and short paper submissions, as well as one-page abstracts for the Rising Stars track mentioned above.
Important Dates:
Workshop submission deadline (long, short and special track): March 15th
Notification of acceptance: April 15th
Workshop date: June 11th
Organizing Committee:
Alan Black, Carnegie Mellon University
Mona Diab, Facebook and George Washington University
Sunayana Sitaram, Microsoft Research India
Thamar Solorio, University of Houston
Victor Soto, Amazon Alexa
Emre Yilmaz, SRI International
Contact email: calcsworkshops [at] gmail.com