The 5th Web as Corpus Workshop

Event Notification Type: Call for Papers
Abbreviated Title: WAC5
Date: Monday, 7 September 2009
Country: Spain
City: San Sebastian
Submission Deadline: Friday, 17 April 2009

Full Title: The 5th Web as Corpus Workshop
Short Title: WAC5

Date: 07-Sep-2009 - 07-Sep-2009
Location: San Sebastian, Spain
Contact Person: Igor Leturia
Meeting Email: igor@elhuyar.com
Web Site: http://www.sigwac.org.uk/wiki/WAC5

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Call Deadline: 17-Apr-2009

Meeting Description:

The workshop will be held on 7 September, 2009, in San Sebastian, preceding
SEPLN, the Spanish NLP conference: http://ixa2.si.ehu.es/sepln2009/

Call for Papers

We invite papers on various topics concerning the use of Web resources for
corpus research and NLP applications, including (but not limited to) the
following:

- linguistic Web crawler technology and Web corpus collection projects
- applications of Web-derived corpora and other kinds of Web data
- how far does the "easy way" get you? (using search engines, or Google's
n-gram lists; we are particularly interested in a critical discussion of the
usefulness and limitations of such approaches)
- methods and tools for "cleaning" Web pages to turn them into a corpus
- automatic linguistic annotation of Web data: tokenisation, POS tagging,
lemmatisation, semantic tagging, etc. (established tools often perform very
poorly on Web data)
- search engine architectures for linguists: bringing linguistics to
commercial search engines, or high-performance search technology to
linguistics?
- search engine-related topics such as result ranking (e.g. how to identify
"typical" uses rather than returning 50 very similar matches on the first
page)
- duplicate detection, interactive query refinement, etc.
- reviews and clever uses of search engine APIs (Google, Yahoo, Altavista,
and in particular Microsoft's current generous Live Search API)

We particularly welcome submissions on languages other than English. One of
the bottlenecks in corpus linguistic research on a particular language is the
availability of corpora for that language: translation studies for, say,
Ukrainian or Vietnamese are limited by the scarcity of diverse corpora for
these languages. The Web offers an opportunity to alleviate this bottleneck,
as millions of Ukrainian or Vietnamese texts are available online, but we
still know little about what is there and how useful it is for translation,
language teaching, linguistics research, etc.

Submission Information:
Authors are invited to submit full papers on original, unpublished work in the
topic area of this workshop. Submissions should follow the format of ACL
proceedings and should not exceed eight (8) pages, including references. We
strongly recommend the use of ACL LaTeX or Microsoft Word style files tailored
for this year's conference
(http://www.acl-ijcnlp-2009.org/main/authors/stylefiles/).

Submissions are managed via EasyChair. To submit a paper, log in at
http://www.easychair.org/conferences/?conf=wac5 (or register an EasyChair
account if you don't have one yet), then click New Submission and fill in the
standard fields.

Important Dates:
- Submission deadline: 17 April, 2009
- Decisions sent by: 12 June, 2009
- Camera-ready submission deadline: 17 July, 2009
- Welcome party: 6 September, 2009
- Workshop: 7 September, 2009

Programme Committee:
- Silvia Bernardini, U of Bologna, Italy
- Massimiliano Ciaramita, Yahoo! Research Barcelona, Spain
- Jesse de Does, INL, Netherlands
- Katrien Depuydt, INL, Netherlands
- Stefan Evert, U of Osnabrück, Germany
- Cédrick Fairon, UCLouvain, Belgium
- William Fletcher, U.S. Naval Academy, USA
- Gregory Grefenstette, Commissariat à l'Énergie Atomique, France
- Péter Halácsy, Budapest U of Technology and Economics, Hungary
- Katja Hofmann, U of Amsterdam, Netherlands
- Adam Kilgarriff, Lexical Computing Ltd, UK
- Igor Leturia, Elhuyar Fundazioa, Basque Country, Spain
- Preslav Nakov, National U of Singapore
- Phil Resnik, U of Maryland, College Park, USA
- Kevin Scannell, Saint Louis U, USA
- Gilles-Maurice de Schryver, U Gent, Belgium
- Klaus Schulz, LMU München, Germany
- Serge Sharoff, U of Leeds, UK
- Eros Zanchetta, U of Bologna, Italy

Organising Committee:
- Stefan Evert, University of Osnabrück
- Igor Leturia, Elhuyar Fundazioa
- Serge Sharoff, University of Leeds