ADVANCES IN WORD SENSE DISAMBIGUATION
Ted Pedersen and Rada Mihalcea
Word Sense Disambiguation is the problem of identifying the intended meaning (or sense) of a word, based on the context in which it occurs. This is a central problem in natural language processing, and improved approaches have the potential to advance the state of the art in machine translation, information retrieval, and many other language-related problems. This tutorial will introduce the full range of techniques that have been applied to this problem. These include knowledge-intensive methods that take advantage of dictionaries and other manually crafted resources, supervised techniques that learn classifiers from training examples, minimally supervised approaches that bootstrap off small amounts of labeled data, and unsupervised approaches that identify word senses in raw unannotated text. In addition, the tutorial will provide an overview of resources that are available to those who might wish to conduct research in this area, or incorporate word sense disambiguation techniques in their existing systems.
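To make the task concrete, the following is a minimal sketch of one dictionary-based idea: choose the sense whose gloss shares the most words with the surrounding context (a simplified, Lesk-style overlap). It is an illustration only, not the method presented in the tutorial. The sense inventory `SENSES`, the stopword list, and the function names are hypothetical and were written for this example; they do not come from any real dictionary or library.

```python
# Toy sense inventory. The glosses are hypothetical, written for this
# illustration, and are not taken from any real dictionary.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and makes loans",
        "bank/river": "sloping land beside a river or other body of water",
    }
}

# A tiny, ad hoc stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "is",
             "of", "on", "or", "that", "the", "to", "was"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def disambiguate(word, context):
    """Return the sense of `word` whose gloss overlaps most with `context`.

    Ties are broken in favor of the sense listed first in SENSES.
    """
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(disambiguate("bank", "She sat on the bank of the river and watched the water"))
    print(disambiguate("bank", "He went to the bank to deposit money and ask about loans"))
```

Exact word matching is deliberately crude here: "deposit" in the context does not match "deposits" in the gloss, which hints at why practical systems rely on richer resources and learned models.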
Tutorial attendees will come away with a firm understanding of all the major approaches to word sense disambiguation that are currently under investigation in the Computational Linguistics community. Our objective is that attendees will have sufficient understanding to make informed decisions about including word sense disambiguation techniques in their text processing applications in the future, and to see where there might be opportunities to advance the state of the art in word sense disambiguation by the application of novel techniques from their own areas of expertise.
This tutorial is intended for NLP researchers and practitioners who seek a general understanding of Word Sense Disambiguation. It is introductory in nature; no special knowledge or background is required.
TUTORIAL OUTLINE
- Introduction (Pedersen)
- Word Sense Disambiguation Defined and Illustrated
- Historical Overview
- Practical Applications
- Methodology (Mihalcea)
- All Words Disambiguation
- Targeted Words (Lexical Sample) Disambiguation
- Word Sense Discrimination and Sense Discovery
- Evaluation (granularity and scoring)
- Knowledge Intensive Methods (Mihalcea)
- Machine Readable Dictionaries
- Selectional Restrictions
- Measures of Semantic Similarity
- Heuristic-based Methods
- Supervised Learning Methods (Pedersen)
- Introduction to Classifier Induction
- Support Vector Machines in WSD
- Ensemble Methods
- Minimally Supervised Methods (Mihalcea)
- Introduction to Bootstrapping and Co-Training
- Yarowsky's Algorithm
- Using the Web
- Unsupervised Methods (Pedersen)
- Discrimination as First Step of Disambiguation
- Clustering Senses from Unannotated Corpora
- Sense Discrimination Using Parallel Texts
- How to Get Started in WSD Research (Mihalcea)
- Software
- Lexicons and Thesauruses
- Sense Tagged Text
- Senseval exercises
- Conclusions (Pedersen)
- The Web and WSD
- Multilingual WSD
- The Next Five Years
TED PEDERSEN is an Associate Professor of Computer Science at the
University of Minnesota, Duluth. He has been actively engaged in word
sense disambiguation research since 1995. His work includes supervised
machine learning approaches to word sense disambiguation, unsupervised
clustering approaches to word sense discrimination, and disambiguation via
the use of measures of semantic similarity and relatedness. He is the
recipient of an NSF Faculty Early Career Development (CAREER) Award.
