ACL2003 News Letter No.3

(March 4, 2003)

Hitoshi Isahara (Publicity Chair, CRL) and Masaki Murata (CRL)

Venue:

Sapporo Convention Center, Sapporo, JAPAN

Dates:

Tutorials and Pre-conference Workshops: July 7, 2003

Main Conference: July 8-10, 2003

Post-conference Workshops: July 11-12, 2003

This news letter includes:

1) News from Program Committee of Main Conference

2) Extended Deadline of Student Research Workshop

3) Life Time Achievement Award

4) Abstracts of Tutorials

4-1) Finite State Language Processing

4-2) Maximum Entropy Models, Conditional Estimation, and Optimization without the Magic

4-3) Knowledge Discovery from Text

4-4) Spoken Language Processing: Separating Science Fact from Science Fiction

5) Deadlines and Web Sites

5-1) Student Research Workshop

5-2) Interactive Poster/Demo Sessions

5-3) Associated Conferences (EMNLP2003 and IRAL2003)

5-4) ACL Workshops

5-5) Exhibits and Sponsorship

6) Important Announcements from Several Associated Conferences and Workshops

1) News from Program Committee of Main Conference

376 papers were submitted to the main conference. This is far more than we expected. Thank you for your interest in ACL2003.

2) Extended Deadline of Student Research Workshop

The paper submission deadline for the Student Research Workshop has been extended:

Paper submission deadline: March 15, 2003 (extended)
(Note that we NO LONGER require early registration of papers.)
Web site: http://tangra.si.umich.edu/clair/acl03-student/

We would appreciate it if you could inform your students that the deadline has been extended.

3) Life Time Achievement Award

A ceremony for the second Life Time Achievement Award will be held during ACL 2003. The LTA was established at the 40th anniversary conference of ACL last year. The first winner of the LTA was Prof. Aravind Joshi of the University of Pennsylvania.

4) Abstracts of Tutorials

There will be four tutorials, given by leading experts in language and speech processing. The tutorials will take place on July 7. The abstracts of the tutorials and the profiles of the speakers will be posted on the ACL-03 web site. For details, see http://www.ec-inc.co.jp/ACL2003/tutorials.html.

4-1) Finite State Language Processing

Gertjan van Noord (University of Groningen, the Netherlands)

Finite state automata are well-understood, and inherently compact and efficient models of simple languages. In addition, finite state automata can be combined in various interesting ways, with the guarantee that the result again is a finite state automaton.
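As an illustration of the closure property mentioned above, the following is a minimal sketch of the textbook product construction, which intersects two deterministic automata and yields another deterministic automaton. It is not taken from the tutorial or from the Fsa Utilities toolkit; the DFA class, the intersect function, and the two example automata are hypothetical names chosen for this sketch, and both automata are assumed to be complete and to share one alphabet.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A complete deterministic finite automaton (hypothetical helper class)."""
    alphabet: frozenset
    delta: dict        # maps (state, symbol) -> next state
    start: object
    finals: frozenset

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.finals


def intersect(a, b):
    """Product construction: the result accepts exactly the words both accept."""
    assert a.alphabet == b.alphabet, "sketch assumes a shared alphabet"
    start = (a.start, b.start)
    delta, finals = {}, set()
    todo, seen = [start], {start}
    while todo:
        state = todo.pop()
        p, q = state
        if p in a.finals and q in b.finals:
            finals.add(state)
        for symbol in a.alphabet:
            nxt = (a.delta[(p, symbol)], b.delta[(q, symbol)])
            delta[(state, symbol)] = nxt
            if nxt not in seen:      # only reachable pair states are built
                seen.add(nxt)
                todo.append(nxt)
    return DFA(a.alphabet, delta, start, frozenset(finals))


AB = frozenset("ab")
# Words with an even number of 'a' symbols.
even_a = DFA(AB, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, frozenset({0}))
# Words whose last symbol is 'b'.
ends_b = DFA(AB, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, frozenset({1}))

both = intersect(even_a, ends_b)
```

Here both.accepts("aab") is true, while both.accepts("ab") and both.accepts("aba") are false. The result is itself a DFA object, so it can be combined again in the same way.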

In the introductory part of the tutorial, finite state acceptors and finite state transducers (both weighted and unweighted) are introduced, and we briefly review their formal and computational properties.

In the second part of the tutorial, we illustrate the use of finite state methods in dictionary construction. In particular, we present an application of perfect hash automata in tuple dictionaries. Tuple dictionaries provide a very compact representation of huge language models of the kind typically used in NLP applications (including Ngram language models).

In the third part of the tutorial we focus on regular expressions for NLP. The type of regular expressions used in modern NLP applications has evolved dramatically from the regular expressions found in standard Computer Science textbooks. In recent years, various high level regular expression operators have been introduced (such as contexted replacement operators). The availability of more and more abstract operators makes the regular expression notation more and more attractive. The tutorial provides an introduction to the regular expression calculus. The examples use the notation of the Fsa Utilities toolkit: a freely available implementation of the regular expression calculus. We introduce various regular expression operators for acceptors and transducers. We then continue to show how new regular expression operators can be defined.

In the last part of the tutorial, we focus in more detail on regular expression operators that turned out to be useful for the description of certain aspects of phonology using ideas from Optimality Theory. This part of the tutorial describes the lenient composition operator of Karttunen, and the optimality operator of Gerdemann and van Noord, as well as a number of alternatives (Eisner, Jaeger).

4-2) Maximum Entropy Models, Conditional Estimation, and Optimization without the Magic

Dan Klein and Christopher D. Manning (Stanford University, U.S.A.)

This tutorial presents the foundations of maximum entropy models, optimization methods to learn them, and various issues in the use of graphical models more complex than simple naive-Bayes (NB) or HMM models. The focus is on intuition and understanding, using visual illustrations and simple examples rather than detailed derivations whenever possible.

Maximum Entropy Models: What maximum entropy models are, from first principles, what they can and cannot do, and how they behave. Lots of examples. The equivalence of maxent models and maximum-likelihood exponential models. The relationship between maxent models and other classifiers. Smoothing methods for maxent models.

Basic Optimization: Unconstrained optimization: convexity, gradient methods (both simple descent and more practical conjugate methods). Constrained optimization: Lagrange multipliers and several ways of turning them into a concrete optimization system. Other fun things to do with optimization. Specialized iterative scaling methods vs. general optimization.

Model Structures: Conditional independence in graphical models (focusing on NB, HMMs, and PCFGs). Practical ramifications of various independence assumptions. Label and observation biases in conditional structures. Survey of sequence models (HMMs, MEMMs, CRFs, and dependency networks).

Prerequisites: Familiarity with basic calculus and a working knowledge of NB and HMMs are required. Existent but possibly vague knowledge of general Bayes' nets or basic information theory is a plus. Most importantly: a low tolerance for conceptual black boxes labeled "magic here".
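For readers unfamiliar with the exponential form referred to under "Maximum Entropy Models" above, the following is a minimal sketch of how such a model assigns conditional probabilities: each label is scored by a weighted sum of feature values, and the scores are exponentiated and normalized. It is not taken from the tutorial materials; the function maxent_probs, the two feature functions, and the weights are hypothetical and chosen only for illustration.

```python
import math


def maxent_probs(weights, features, x, labels):
    """Return P(y | x), proportional to exp(sum_i w_i * f_i(x, y))."""
    scores = {
        y: sum(w * f(x, y) for w, f in zip(weights, features))
        for y in labels
    }
    top = max(scores.values())              # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())                  # normalizing constant
    return {y: e / z for y, e in exps.items()}


# Hypothetical indicator features for a toy word-tagging decision.
def ends_in_ing_and_verb(word, tag):
    return 1.0 if word.endswith("ing") and tag == "VERB" else 0.0


def capitalized_and_noun(word, tag):
    return 1.0 if word[:1].isupper() and tag == "NOUN" else 0.0


LABELS = ["NOUN", "VERB"]
FEATURES = [ends_in_ing_and_verb, capitalized_and_noun]
WEIGHTS = [1.0, 2.0]                        # assumed values, not learned from data

probs = maxent_probs(WEIGHTS, FEATURES, "running", LABELS)
```

With all weights set to zero the distribution is uniform over the labels. In practice the weights would be estimated from data by the optimization methods the tutorial covers.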

4-3) Knowledge Discovery from Text

Dan Moldovan (University of Texas at Dallas, U.S.A.)
Roxana Girju (Baylor University, U.S.A.)

Knowledge Discovery is a fast growing area of research and commercial interest. While knowledge may be discovered from many sources of information, this tutorial focuses on the discovery of knowledge from open texts, the largest source of knowledge. The problem of Knowledge Discovery from Text (KDT) is to extract explicit and implicit concepts and semantic relations between concepts using Natural Language Processing techniques. The discovery process is guided by the notion of context specified either by seed concepts or in some other more formal way.

KDT, while deeply rooted in NLP, actually draws on methods from statistics, machine learning, reasoning, information extraction, knowledge management, cognitive science and others for its discovery process. The emphasis here is on the automatic discovery of new concepts and on the large number of semantic relations that link them. This tutorial presents recent results from KDT research and system implementations.

Since the goal of KDT is to get insights into large quantities of text data and bring to bear text semantics, it plays an increasingly significant role in emerging applications, such as Question Answering, Summarization, Text Understanding and Ontology Development.

This tutorial is aimed at researchers, practitioners, educators, and research planners who want to keep in sync with the newly emerging KDT technology.

4-4) Spoken Language Processing: Separating Science Fact from Science Fiction

Roger K. Moore (20/20 Speech Ltd, U.K.)

The advent of talking and listening machines has long been hailed as "the next big thing" in human-machine interaction. Indeed only recently, the IEEE Spectrum magazine (September 2002) named speech as one of five technologies likely to reap big market rewards in the next five years. Certainly, the frequency with which members of the general public come across speech-enabled applications in their everyday lives does seem to be on the increase, and the marketplace is currently able to support a number of sizeable commercial companies who are supplying speech-based products and services - as well as a growing academic community of speech scientists and engineers. This apparent progress has been fuelled by a number of key developments: the relentless increase in available computing power, the introduction of 'data-driven' techniques for speech pattern modelling, and the institution of public system evaluations.

This tutorial will chart the main advances that have been made in spoken language processing algorithms and applications over the past few years. The key enabling technologies of 'automatic speech recognition', 'text-to-speech synthesis' and 'spoken language dialogue' will be explained in some detail, with emphasis being placed on how the technology works and, perhaps more importantly, why it sometimes doesn't. Insight will also be given into the linguistic/paralinguistic properties of speech signals and human spoken language, and comparisons will be drawn between the capabilities of 'automatic' and 'natural' spoken language processing systems.

The tutorial is aimed at both specialists and non-specialists in the language processing field, and will be of great interest to anyone who is keen to develop a greater understanding of the main issues involved in spoken language processing. Prof. Moore will cover theoretical and practical aspects of the inner workings of state-of-the-art spoken language systems, as well as providing a balanced overview of their capabilities in relation to other modes of human-machine interaction.

The tutorial will incorporate question-and-answer opportunities, and will conclude with a survey of open research issues and some predictions for the future.

5) Deadlines and Web Sites

The student research workshop, the interactive poster/demo sessions, the associated conferences (EMNLP2003 and IRAL2003) and the workshops have their own submission deadlines and sites. Please see the web sites for the details.

5-1) Student Research Workshop

Paper submission deadline: March 15, 2003 (extended)
Web site: http://tangra.si.umich.edu/clair/acl03-student/

5-2) Interactive Poster/Demo Sessions

Paper submission deadline: May 1, 2003
Web site: http://cl.aist-nara.ac.jp/staff/matsu/poster.html

5-3) Associated Conferences (EMNLP2003 and IRAL2003)

AC1 The Eighth Conference on Empirical Methods in Natural Language Processing (EMNLP2003)

Submission deadline: April 4, 2003
Conference date: July 11-12, 2003
Web site: http://www.ai.mit.edu/people/mcollins/emnlp03.html

AC2 The Sixth International Workshop on Information Retrieval with Asian Languages (IRAL2003)

Submission deadline: April 15, 2003
Conference date: July 7, 2003
Web site: http://research.nii.ac.jp/IRAL2003/

5-4) ACL Workshops

WS1 Multilingual Summarization and Question Answering - Machine Learning and Beyond

Submission deadline: April 21, 2003
Workshop date: July 11-12, 2003
Web site: http://www.isi.edu/~cyl/msqa-ml-acl2003/

WS2 Natural Language Processing in Biomedicine

Submission deadline: April 10, 2003
Workshop date: July 11, 2003
Web site: http://www-tsujii.is.s.u-tokyo.ac.jp/ACL03/bionlp.htm

WS3 The Lexicon and Figurative Language

Submission deadline: April 13, 2003
Workshop date: July 11, 2003
Web site: http://www.cs.bham.ac.uk/~amw/ACLWorkshop.html

WS4 Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models

Submission deadline: April 4, 2003
Workshop date: July 12, 2003
Web site: http://research.microsoft.com/conferences/mulner-acl03/

WS5 The Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications

Submission deadline: April 21, 2003
Workshop date: July 11, 2003
Web site: http://nlp.nagaokaut.ac.jp/IWP2003/

WS6 Second SIGHAN Workshop on Chinese Language Processing

Workshop submission deadline: March 10, 2003
Word segmentation bakeoff deadline: April 22-25, 2003
Workshop date: July 11-12, 2003
Workshop web site: http://www.sighan.org/swclp2/
Bakeoff web site: http://www.sighan.org/bakeoff2003/

WS7 Multiword Expressions: Analysis, Acquisition and Treatment

Submission deadline: April 5, 2003
Workshop date: July 12, 2003
Web site: http://www.cl.cam.ac.uk/users/alk23/mwe/mwe.html

WS8 Linguistic Annotation: Getting the Model Right

Submission deadline: April 5, 2003
Workshop date: July 11, 2003
Web site: http://www.cs.vassar.edu/~ide/events/ACL2003-LR/

WS9 Workshop on Patent Corpus Processing

Submission deadline: April 10, 2003
Workshop date: July 12, 2003
Web site: http://www.slis.tsukuba.ac.jp/~fujii/acl2003ws.html

WS10 Towards a Resources Information Infrastructure

Submission deadline: April 13, 2003
Workshop date: July 11-12, 2003
Web site: http://www.elsnet.org/acl2003-workshop/

5-5) Exhibits and Sponsorship

Application Deadline for both: April 1, 2003

For details, see Exhibits and Sponsorship at http://www.ec-inc.co.jp/ACL2003/.

6) Important Announcements from Several Associated Conferences and Workshops

AC1 The Eighth Conference on Empirical Methods in Natural Language Processing (EMNLP2003)

Abstract: SIGDAT, the Association for Computational Linguistics' special interest group on linguistic data and corpus-based approaches to NLP, invites submissions to EMNLP 2003. The conference will be held on July 11-12 in Sapporo, Japan, immediately following the 41st meeting of the ACL (ACL 2003).

URL: http://www.ai.mit.edu/people/mcollins/emnlp03
Deadline: April 4, 2003

WS1 Multilingual Summarization and Question Answering - Machine Learning and Beyond

Abstract: Automatic summarization and question answering (QA) aim at producing a concise representation of the key information content. Rule-based and statistical approaches to summarization and QA systems have shown promising results; it is, however, very difficult to find good evaluation functions or rules that work well across domains. In consequence, various machine learning (ML) techniques have recently been applied to summarization and QA systems. The purpose of this workshop is to provide a forum for exploring the commonality underlying this diversity of problem domains and approaches.

Deadline: April 21, 2003

WS2 Natural Language Processing in Biomedicine

Invited speaker: Prof. Carol Friedman, CUNY/Columbia University
'Opportunities and Challenges for NLP in Biomedicine'

The aim of this workshop is to bring together NLP researchers in biomedicine and to discuss recent advances in the computational analysis of text, which go beyond traditional keyword-based indexing methods and begin to offer content-based analysis. Knowledge discovery in the rapidly growing area of biomedicine is of paramount importance. Processing biomedical texts is a challenge, especially in the areas of terminology, ontology building, information extraction, annotation tools, sharing and integration of knowledge from factual and textual databases, and evaluation of biomedical applications, among others. One of the aims of the workshop is to create SIGs in areas of common interest such as annotation standards in biology, evaluation metrics, and standardisation of terminological resources.

Submission deadline: April 10, 2003
Workshop date: July 11, 2003
Web site: http://www-tsujii.is.s.u-tokyo.ac.jp/ACL03/bionlp.htm

WS3 The Lexicon and Figurative Language

Abstract: The lexicon has variously been treated as a list of word senses, a list of hierarchically related senses (e.g. WordNet), and as a structured entity containing rich lexical representations and means to generate novel uses of words. Figurative language poses problems for all these approaches, and a common claim is that metaphor is a cognitive, not a linguistic, phenomenon; instead, word senses are related in terms of their underlying conceptual domains. The major theme of this SIGLEX-endorsed workshop is to explore and attempt to reconcile these different approaches to figurative language and the lexicon, although papers exploring other aspects of figurative language will also be welcome.

Deadline: April 13, 2003
Web site: http://www.cs.bham.ac.uk/~amw/ACLWorkshop.html

WS4 Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models

Invited speaker: David Yarowsky

Named Entity (NE) Recognition systems vary widely, from high-speed bulk methods optimized for indexing, to deep semantic parsers tuned for specific domains. Optimal ways to combine statistical and symbolic models also vary, depending on applications and tasks. Is it possible to:

- maximize use of knowledge-rich resources (e.g. lexicons, NE grammars, parsing) while permitting corpus-based training for domain or language?

- acquire and share resources (including lexicons and grammars) across languages?

- balance performance speed with reasonable accuracy?

- use specific language patterns while permitting rapid transfer to another language?

- minimize variability in results across language types?

We welcome research on combined models, in which these tradeoffs are calculated in particular ways. Demonstrations of implemented NE systems are also welcome.

Submit papers electronically by April 4 in Word, PDF or PostScript format. Assign a filename based on the paper's title, transfer the file to ftp://ftp.research.microsoft.com/incoming/josephp, then email an identification page with title, author(s), contact details, and filename to molsen@microsoft.com.

URL: http://research.microsoft.com/conferences/mulner-acl03/

WS5 Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications

Abstract: Paraphrases, variant ways of conveying the same information, are of interest because they present challenges for many NLP tasks, such as MT, IR, QA, etc. This workshop is open to investigation of all aspects of paraphrase, with a particular focus on the automatic acquisition of paraphrases from corpora, and on the development of a standardized paraphrase framework or resource for use in applications.

URL: http://nlp.nagaokaut.ac.jp/IWP2003/
Deadline: April 21, 2003

WS6 Second SIGHAN Workshop on Chinese Language Processing

Abstract: As more resources for Chinese NLP have recently become available to the public, it is crucial to set up a platform that allows easy comparison of different approaches to various NLP tasks. SIGHAN is conducting a word-segmentation bakeoff before the workshop. Researchers all over the world are welcome to participate. As a part of this SIGHAN workshop, we are going to release the bakeoff results, followed by presentations by bakeoff participants and a general discussion of future evaluations. A second part of the workshop will consist of presentations of papers on all aspects of Chinese language processing.

Workshop web site: http://www.sighan.org/swclp2/
Bakeoff web site: http://www.sighan.org/bakeoff2003/
Workshop date: July 11-12, 2003
Workshop submission deadline: March 10, 2003
Word segmentation bakeoff deadline: April 22-25, 2003

WS7 Multiword Expressions: Analysis, Acquisition and Treatment

The workshop will concentrate on the analysis, acquisition and treatment of multiword expressions (MWEs), such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "radar footprint"), and institutionalized phrases (e.g. "salt and pepper"). In particular we focus on addressing the problems that MWEs pose for natural language processing applications.

URL: http://www.cl.cam.ac.uk/users/alk23/mwe/mwe.html
Submission deadline: April 5, 2003

WS9 Workshop on Patent Corpus Processing

Abstract: The goal of this workshop is to foster research and development of the technology for patent corpus processing, by providing a forum in which researchers and practitioners can exchange and share their ideas, approaches, perspectives, and experiences from their work in progress. We invite both research papers and project papers associated with, but not limited to, the rudiments of patent corpus processing. We also invite papers addressing applications and user studies.

Deadline: April 10, 2003