Building the Australian National Corpus: Data Sources and Tools

Event Notification Type: Call for Papers
Thursday, 3 December 2009
Submission Deadline: Monday, 28 September 2009

Call for Papers - HCSNet SummerFest Workshop on Building the Australian National Corpus: Data Sources and Tools

Following on from the successful Designing the Australian National Corpus workshop held during SummerFest 2008, HCSNet (the ARC Research Network in Human Communication) is organising a workshop on Building the Australian National Corpus to be held as part of SummerFest 2009.

This workshop focuses on current developments and emerging possibilities in language data gathering and tools that might complement existing collections of language data in building the Australian National Corpus. While sources like the World Wide Web have enormous potential, those wishing to draw linguistically relevant data from the Web face numerous challenges. In constructing the Australian National Corpus, then, a wider range of data sources needs to be drawn upon. The workshop therefore aims to bring together researchers with expertise in corpus and web linguistics, corpus building, and annotation in a single forum, in order to develop strategies for incorporating language data from existing and new collections into the Australian National Corpus in a principled manner.

Topics include but are not limited to:
*Using the web as a source of corpus data
*Bringing existing data collections into the AusNC
*Proposed models for collecting new data
*Legal issues for data collection and sharing
*Technical infrastructure and requirements
*Long term curation of the AusNC

Intended Audience:
The workshop is intended for researchers in linguistics and language technology who build and use corpora, and for experts in web and search technologies.

Submissions for presentation at the workshop are sought. Topics of interest include but are not limited to:
*corpus building
*corpus-based research
*web-based research
*web linguistics
*web for corpus
*computer-mediated communication
*search technologies
*linguistic and multimodal annotation
*corpus interrogation

For more information and submissions please visit:
Please contact Michael Haugh, Steve Cassidy, or Diego Molla-Aliod for further details.