The first ACL Workshop on Computation and Written Language (CAWL) will be held in conjunction with ACL 2023 in Toronto, Canada, on July 14, 2023. It will feature invited talks by Mark Aronoff (Stony Brook University) and Amalia Gnanadesikan (University of Maryland, College Park). We welcome submissions of scientific papers to be presented at the workshop and archived in the ACL Anthology. Please see the submission guidelines below.
Most work in NLP focuses on language in its canonical written form. This has often led researchers to ignore the differences between written and spoken language or, worse, to conflate the two. Instances of conflation are statements like “Chinese is a logographic language” or “Persian is a right-to-left language”, variants of which can be found frequently in the ACL Anthology. These statements confuse properties of the language with properties of its writing system. Ignoring differences between written and spoken language leads, among other things, to conflating different words that are spelled the same (e.g., English bass), or to treating words that have multiple spellings as different words (e.g., Japanese umai ‘tasty’, which can be written 旨い, うまい, ウマい, or 美味い).
Furthermore, methods for dealing with written language issues (e.g., various kinds of normalization or conversion) or for recognizing text input (e.g., OCR and handwriting recognition or text entry methods) are often regarded as precursors to NLP rather than as fundamental parts of the enterprise, despite the fact that most NLP methods rely centrally on representations derived from text rather than (spoken) language. This general lack of consideration of writing has led much of the research on such topics to appear outside of ACL venues, in conferences or journals of neighboring fields such as speech technology (e.g., text normalization) or human-computer interaction (e.g., text entry).
This workshop will bring together researchers who are interested in the relationship between written and spoken language, the properties of written language, the ways in which writing systems encode language, and applications specifically focused on characteristics of writing systems. Topics of interest include but are not limited to:
- Text entry
- Text tokenization
- Disambiguation of abbreviations and homographs
- Grapheme-to-phoneme conversion, transliteration, and diacritization
- Text normalization for speech and for processing "informal" genres of text
- Computational study of literary devices involving writing systems, such as eye dialect
- Information-theoretic and machine-learning approaches to decipherment
- Methods for specialized text genres, e.g., clinical notes
- Optical character (incl. handwriting) recognition and historical document processing
- Orthographic representation for unwritten languages
- Spelling error detection and correction
- Script normalization and encoding
- Writing system typology and its relevance to speech and language processing
We invite submissions on the relationship between written and spoken language, the properties of written language, the ways in which writing systems encode language, and applications specifically focused on characteristics of writing systems.
Important dates:
Paper submission deadline: April 24, 2023
Notification of acceptance: May 22, 2023
Camera-ready paper due: June 6, 2023
Workshop date: July 14, 2023
Submission guidelines:
Please submit short (4-page) or long (8-page) papers in PDF format to https://softconf.com/acl2023/cawl/. Short and long papers will be reviewed in the same process. Authors should follow the formatting guidelines of ACL 2023 (https://2023.aclweb.org/calls/style_and_formatting/). We will follow the paper submission and reviewing policies detailed in the ACL call for papers (https://2023.aclweb.org/calls/main_conference/), although we do not require an explicit responsible NLP checklist. Note that, as with the main conference, reviewing is double-anonymous, i.e., reviewers will not know the authors' identities and vice versa. Hence no author information should be included in the papers, and self-references that identify the authors should be avoided or anonymized. Accepted papers will appear in the workshop proceedings in the ACL Anthology.
For questions about the submission guidelines, please contact the workshop organizers at cawl.workshop.2023@gmail.com.
Organizers:
- Kyle Gorman (Graduate Center, City University of New York, USA)
- Brian Roark (Google, USA)
- Richard Sproat (Google, Japan)
Program Committee:
- Manex Agirrezabal, University of Copenhagen, Denmark
- Sina Ahmadi, George Mason University, USA
- Cecilia Alm, Rochester Institute of Technology, USA
- Steven Bedrick, Oregon Health & Science University, USA
- Taylor Berg-Kirkpatrick, UC San Diego, USA
- Steven Bird, Charles Darwin University, Australia
- Dan Garrette, Google, USA
- Alexander Gutkin, Google, UK
- Nizar Habash, NYU Abu Dhabi, United Arab Emirates
- Yannis Haralambous, IMT Atlantique & CNRS Lab-STICC, France
- Cassandra Jacobs, University at Buffalo, USA
- George Kiraz, Princeton University, USA
- Christo Kirov, Google, USA
- Grzegorz Kondrak, University of Alberta, Canada
- Martin Jansche, Amazon, UK
- Yang Li, Northwestern Polytechnical University, China
- Zoey Liu, University of Florida, USA
- Gerald Penn, University of Toronto, Canada
- Yuval Pinter, Ben-Gurion University of the Negev, Israel
- William Poser, independent scholar, Canada
- Emily Prud’hommeaux, Boston College, USA
- Shruti Rijhwani, Google, USA
- Maria Ryskina, MIT, USA
- Lane Schwartz, University of Alaska, Fairbanks, USA
- Djamé Seddah, Sorbonne University & Inria, France
- Shuming Shi, Tencent, China
- David Smith, Northeastern University, USA
- Kumiko Tanaka-Ishii, University of Tokyo, Japan
- Annalu Waller, University of Dundee, UK
- Shumin Zhai, Google, USA