T1: Monday morning session 23/4 (9:00-12:30)
T4: Tuesday afternoon session 24/4 (14:00-17:30)
PRESENTER: Shuly Wintner
ABSTRACT: Language acquisition is one of Nature's greatest puzzles. Human languages are extremely complex systems, yet (most) children acquire them naturally, quickly and with little effort. Research in language acquisition attempts to study the mechanisms of this puzzle and to shed light on the very nature of language itself: the primary cognitive capacity which makes us human. In recent years, research in psycholinguistics in general and language acquisition in particular has become more aware of state-of-the-art results in computational linguistics. Methodologies and techniques that are regularly used in computational linguistics are employed in psycholinguistics, resulting in insights that shed new light on language acquisition processes. This tutorial will survey some of these recent results, focusing on areas in which computational linguists can contribute to psycholinguistic research. The main goal of the tutorial is to survey the current state of the art, acquaint computational linguists with the kind of problems in psycholinguistics that can benefit from their training and expertise, and identify directions for future cross-disciplinary research. Topics will include: a quick survey of language acquisition processes and the main psycholinguistic theories that study them; the use of corpora in language acquisition research, focusing on the CHILDES project, a large multilingual annotated corpus containing transcripts of spoken interactions between children and adults; the emergence of part-of-speech categories; the emergence of grammar; the innateness debate; computational language learning and its relevance for child language acquisition; etc. By the end of the tutorial, participants are expected to have a clear view of the problems that are the focus of contemporary research in language acquisition, and a good idea of how computational linguistics can be instrumental in approaching these problems.
- Introduction
- The Language Learning Task
- General characteristics
- Patterns of human language acquisition
- Formal characteristics
- Computational resources
- Frameworks
- Evaluation
- Models of Word Learning
- Models of Morphological Acquisition
- Models of Syntactic Acquisition
- Computational grammar induction
- Cognitively-motivated models
- Consolidating the two approaches
- Case Study: The Traceback Method
- Evaluation
- Directions for Future Research
PREREQUISITES: The tutorial is aimed at a general computer science audience with little or no background in psycholinguistics or cognitive science. The goal is to introduce basic research questions and fundamental research methodologies of psycholinguistics, and in particular child language acquisition, to a computational crowd, in order to facilitate future collaboration between the disciplines.
INSTRUCTOR: Shuly Wintner
Department of Computer Science University of Haifa 31905 Haifa Israel
Shuly Wintner is an associate professor at the Department of Computer Science, University of Haifa, Israel. His research spans various areas in computational linguistics, including formal grammars, morphology, syntax, development of resources and machine translation. Recently, he was involved in several projects focusing on language acquisition from a computational perspective. He has published over 80 scientific papers in computational linguistics. He is a regular reviewer for ACL and its chapters, was the program co-chair of EACL-2006 and the editor-in-chief of the journal Research in Language and Computation. He has extensive teaching experience, including tutorials at NAACL-2004, MT-Summit 2003 and COLING-2000; four ESSLLI courses; three courses at the International PhD School in Formal Languages and Applications; and two at the Erasmus Mundus Master course in Language and Communication Technology.
PREVIOUS PRESENTATIONS: None.
PRESENTER: Marco Baroni
ABSTRACT: Distributional semantic models (DSMs) approximate the meaning of words with vectors that summarize their distribution in large text corpora (Turney and Pantel, 2010). Given the empirical success of DSMs in capturing lexical semantics on a large scale with knowledge-light techniques, it is natural to ask if similar techniques can be extended to handle the meaning of phrases and sentences. Consequently, the last few years have seen a number of proposals on how to incorporate compositionality into DSMs, in order to construct vectorial representations for linguistic constituents above the word. After a brief general introduction to DSMs, the tutorial will introduce various representative approaches to distributional composition (Mitchell and Lapata, 2010; Baroni and Zamparelli, 2010; Grefenstette and Sadrzadeh, 2011; Socher et al., 2011), discussing models, evaluation methodology and data sets, as well as strengths and weaknesses. The conclusion will outline some of the most important next steps in this blooming field, in terms of modeling, evaluation and potential applications such as automated paraphrase detection.
- General introduction and motivation
- Brief introduction to DSMs
- State-of-the-art composition methods and evaluation
- Current and future challenges for compositional DSMs
PREREQUISITES: The tutorial assumes basic familiarity with elementary linear algebra (vectors, dot product and cosine, matrices and matrix multiplication, etc.).
INSTRUCTOR: Marco Baroni
Center for Mind/Brain Sciences (University of Trento) Palazzo Fedrigotti, C.so Bettini 31 38068 Rovereto (TN), Italy
Marco Baroni obtained a PhD in linguistics from UCLA in 2000. Since 2006 he has been a tenured researcher at the Center for Mind/Brain Sciences of the University of Trento. Distributional semantics is the central theme of his recent research, and he is currently focusing on multimodal and compositional DSMs (in July 2011, he was awarded a 5-year ERC Starting Grant to develop and evaluate compositional DSMs).
PREVIOUS PRESENTATIONS: Marco has taught various mini-courses in distributional semantics (most recently, at the ADT-TM winter school in Rome in 2011), co-coordinated the ESSLLI 2008 Distributional Lexical Semantics Workshop, and dedicates several lectures of the Text Processing class he regularly teaches at the University of Trento to distributional semantics. This will be his first tutorial on compositional distributional semantics.
PRESENTER: Bing Liu
ABSTRACT: Sentiment analysis and opinion mining is the computational study of people's opinions, appraisals, and emotions toward entities, individuals, topics and their attributes expressed in text. Opinions are important because they are key influencers of our behaviors. Our beliefs and perceptions of reality are to a considerable degree conditioned on how others see the world. For this reason, when we need to make a decision we often seek out the opinions of others. With the explosive growth of social media and opinionated text on the Web, sentiment analysis and opinion mining has emerged as a major research area in NLP due to many challenging research problems and a wide range of applications. It has also spread from computer science to social sciences and management sciences. From a natural language understanding perspective, sentiment and opinion represent an important aspect of semantic meaning of text, which touches every area of NLP but is highly restricted. It is not necessary to understand the full text in order to extract sentiments and opinions. Although extensive research has been done in the past decade, the real progress has been modest. There is still no accurate algorithm for solving any of its sub-problems. In this tutorial, I will first define the problem, describe its main tasks, and then present the current state-of-the-art techniques. A key feature of the tutorial is that I will not only describe seminal research ideas and techniques but will also look at the technology from an application point of view.
- Introduction and Motivation
- The Problem of Sentiment Analysis and Opinion Mining
- Document Level Sentiment Classification
- Sentence Level Subjectivity and Sentiment Classification
- Aspect-based Opinion Mining and Summarization
- Aspect Extraction and Grouping
- Opinion Lexicon Expansion
- Joint Modeling of Aspects and Opinions
- Opinion Spam Detection
- Utilities of Online Reviews
PREREQUISITES: Basic NLP and machine learning knowledge.
INSTRUCTOR: Bing Liu
Department of Computer Science University of Illinois at Chicago 851 S. Morgan (M/C 152) Chicago, IL 60607-7053
Bing Liu is a full professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC, he was with the National University of Singapore. His current research interests include sentiment analysis and opinion mining, text and Web mining, data mining, and machine learning. He has published extensively in these fields, and has given more than 20 invited and keynote talks on sentiment analysis and opinion mining. He has also written a textbook titled "Web Data Mining: Exploring Hyperlinks, Contents and Usage Data", published by Springer. Bing Liu is also an expert in data mining.
PREVIOUS PRESENTATIONS: Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), 2011. The content has been updated with results published in 2011.
PRESENTER: Marius Pasca
ABSTRACT: This tutorial provides an overview of extraction methods developed in the area of Web-based open-domain information extraction, whose purpose is the acquisition of open-domain classes, instances and relations from Web text. The extraction methods operate over unstructured or semi-structured text. They take advantage of weak supervision provided in the form of seed examples or small amounts of annotated data, or draw upon knowledge already encoded within resources created strictly by experts or collaboratively by users. The tutorial teaches the audience about existing resources that include instances and relations; details of methods for extracting such data from unstructured and semi-structured text available on the Web; and strengths and limitations of resources extracted from text as part of recent literature, with applications in knowledge discovery and information retrieval.
- P1. Introduction
- (a) Overview of information extraction
- (b) Information extraction as an aid to Web search
- (c) Goals of open-domain information extraction
- P2. Resources for open-domain information extraction
- (a) Resources of open-domain knowledge
- (i) Formal, expert resources
- (ii) Collaborative resources
- (iii) Hybrid resources
- (b) Web-based textual data sources
- (i) Unstructured text
- (ii) Semi-structured text
- (iii) Web search queries
- P3. Methods for Web open-domain information extraction
- (a) Challenges of extracting open-domain information
- (i) Extraction from large document collections
- (ii) Beyond coarse-grained classes of instances
- (iii) Redundancy as a proxy for trustworthiness
- (b) Extraction methods
- (i) Extraction of instances and classes of instances
- (ii) Extraction of facts and relations among instances and among classes
- (iii) Extraction of common-sense knowledge
- P4. Discussion
- (a) Analysis of the extracted resources
- (i) Accuracy
- (ii) Coverage
- (b) Applications
- (i) Applications in information retrieval
- (ii) Applications in knowledge management
PREREQUISITES: The tutorial targets conference participants interested in the areas of knowledge acquisition and use in information retrieval. No prior knowledge of these topics is required for participants to take full advantage of the tutorial.
INSTRUCTOR: Marius Pasca
Google Inc. Mountain View, California
Marius Pasca is a research scientist at Google. He graduated with a Ph.D. degree in Computer Science from Southern Methodist University, Dallas, Texas and an M.Sc. degree in Computer Science from Joseph Fourier University, Grenoble, France. Current research interests include factual information extraction from unstructured text and natural-language matching functions for information retrieval.
PREVIOUS PRESENTATIONS: Previous versions of the tutorial have been presented at WWW 2011 and CIKM 2011.