7th Web as Corpus Workshop

Event Notification Type: Call for Papers
Event Date: Tuesday, 17 April 2012
Contact: Serge Sharoff
Submission Deadline: Sunday, 22 January 2012

To be held in association with WWW2012 in Lyon, France, on 17 April 2012.

Sponsored by ACL SIGWAC, http://www.sigwac.org.uk

More and more researchers are using Web data for linguistic and NLP research: the Web provides an easily accessible
source of linguistic data in a great variety of languages. However, a ‘crawl’ is not ready for exploration
in the same way a traditional ‘corpus’ is: we need to turn a crawl into a corpus. The workshop, the seventh
in an annual series, provides a venue for exploring what this process involves, how to carry it out, and what we find out when we do.

We invite submissions that:
* describe Web corpus collection projects, or modules for one part of the process (crawling, filtering, de-duplication, language identification, tokenising, indexing, ...)
* explore characteristics of Web data from a linguistics/NLP perspective, including registers, domains, frequency distributions, and comparisons between datasets
* use crawled Web data for NLP purposes (with the emphasis on the data rather than the application)

Previous WAC workshops were co-located with various conferences in computational linguistics. This time the workshop is co-located with WWW2012, the main international conference on Web technologies and their impact on society.

== Deadlines ==
* Submissions due by '''January 22, 2012''', via https://www.easychair.org/conferences/?conf=wac7
* Notification of acceptance by February 3, 2012
* Camera-ready copy due February 15, 2012

== Organising committee ==
* Adam Kilgarriff (Lexical Computing Ltd.)
* Jan Pomikalek (Masaryk University)
* Serge Sharoff (University of Leeds, Workshop Chair)