3rd Workshop on Analytics for Noisy Unstructured Text Data

Event Notification Type: Call for Papers
Abbreviated Title: AND 2009
Date: Thursday, 23 July 2009 to Friday, 24 July 2009
Country: Spain
City: Barcelona
Submission Deadline: Monday, 4 May 2009

*** NOTE: SUBMISSION DEADLINE EXTENDED TO MAY 4 ***

Full Title: 3rd Workshop on Analytics for Noisy Unstructured Text Data
Short Title: AND 2009

Date: 23-Jul-2009 - 24-Jul-2009
Location: Barcelona, Spain
Contact Person: Venkata Subramaniam
Meeting Email: lvsubram [at] in.ibm.com
Web Site: http://and2009workshop.googlepages.com

Linguistic Field(s): Computational Linguistics; Discourse Analysis; Morphology;
Text/Corpus Linguistics; Translation

Call Deadline: 04-May-2009 (extended from 20-Apr-2009)

Meeting Description:

AND 2009 is a workshop devoted to issues arising from the need to contend with
noisy inputs, the impact noise can have on downstream applications, and the
demands it places on document analysis. The Third Workshop on Analytics for
Noisy Unstructured Text Data will build on two previous successful AND
workshops held in 2007 (in conjunction with the 20th International Joint
Conference on Artificial Intelligence) and in 2008 (in conjunction with the
31st Annual International ACM SIGIR Conference).

Call for Papers

Noisy unstructured text data is ubiquitous in real-world communications. Text
produced by processing signals intended for human use, such as
printed/handwritten documents, spontaneous speech, and camera-captured scene
images, is a prime example. Telephonic conversations between call center
agents and customers often see 30-40% word error rates, even using
state-of-the-art ASR. OCR error rates for hardcopy documents can range widely,
from 2-3% for clean inputs to 50% or higher, depending on the quality of the
page image, the complexity of the layout, aspects of the typography, etc.
Individual variabilities in handwriting make this a particularly difficult
form of input, and error rates here are often substantially higher than for
machine-printed text. In spite of the tremendous challenges such data
presents, it is pervasive in applications of interest to corporations and
government organizations.

Recognition errors are not the sole source of noise; natural language and the
creative ways that humans use it can create problems for computational
techniques. Electronic text from the Internet (emails, message boards,
newsgroups, blogs, wikis, chat logs and web pages), contact centers (customer
complaints, emails, call transcriptions, message summaries), and mobile phones
(text messages) is often noisy, containing spelling errors, abbreviations,
non-standard words, false starts, repetitions, missing punctuation, missing
case information, and pause-filling words such as "um" and "uh" in the case of
spoken conversations.

Topics of Interest
- Noise induced by document analysis techniques and its impact on downstream
applications
- Formal models for noise, including characterization and classification of
noise
- Treatment of noisy data in specific application areas, including historical
texts, multilingual documents, blogs, chat / SMS logs, social network
analysis, patent search, and machine translation
- Data sets, benchmarks, and evaluation techniques for analysis of noisy text
- All other topics arising from noise and its effects on textual data

Submission Guidelines
Full papers may be submitted following the guidelines specified on the AND 2009
website: http://and2009workshop.googlepages.com/

Important Dates (tentative)
Paper Submission: May 4, 2009 (extended from April 20, 2009)
Notification of Acceptance: May 20, 2009
Camera-Ready papers due: June 20, 2009