SIGWAC report to ACL Board for year 2008-2009
Serge Sharoff
24 June 2009

SIGWAC had another successful year. The last workshop (WAC4) was held on 1 June 2008, co-located with LREC2008 in Marrakech (Morocco). Despite the shorter than usual gap since the previous workshop (WAC3), held in September 2007, we had a fairly large event with nine presentations and about 30 people attending. Two presentations were designated as 'Star Talks', i.e. high-quality papers important for the community (we had no funding for an invited speaker).

The next workshop (WAC5) will take place on 7 September 2009 in San Sebastian, Spain, co-located with SEPLN, the Spanish NLP conference. We have obtained funding for an invited speaker and have invited Dekang Lin from Google to give a talk on "Unsupervised acquisition of lexical knowledge from the Web". We have also selected nine papers for the workshop.

The following workshop (WAC6) will probably be co-located with NAACL-HLT in Los Angeles in July 2010.

At WAC3 we also ran CleanEval, a competition on removing unwanted elements from webpages (such as navigation frames, standard headers, footers and counters, which can potentially bias the language model). After the competition some participants commented that the text-only output of the gold standard loses important information linking text chunks to their representation in the original webpage. At WAC6 we plan to run CleanEval2, using standoff annotation for the gold standard.