Lillian Lee


2019

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel | Lillian Lee | David Mimno
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present. We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. We experiment on seven datasets of varying difficulty, ranging from documents consisting of groups of images captioned post hoc by crowdworkers to naturally-occurring user-generated multimodal documents. We find that a structured training objective based on identifying whether collections of images and sentences co-occur in documents can suffice to predict links between specific sentences and specific images within the same document at test time.

Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features
Jack Hessel | Lillian Lee
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Controversial posts are those that split the preferences of a community, receiving both significant positive and significant negative feedback. Our inclusion of the word “community” here is deliberate: what is controversial to some audiences may not be so to others. Using data from several different communities on reddit.com, we predict the ultimate controversiality of posts, leveraging features drawn from both the textual content and the tree structure of the early comments that initiate the discussion. We find that even when only a handful of comments are available, e.g., the first 5 comments made within 15 minutes of the original post, discussion features often add predictive capacity to strong content-and-rate only baselines. Additional experiments on domain transfer suggest that conversation-structure features often generalize to other communities better than conversation-content features do.

2018

Valency-Augmented Dependency Parsing
Tianze Shi | Lillian Lee
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a complete, automated, and efficient approach for utilizing valency analysis in making dependency parsing decisions. It includes extraction of valency patterns, a probabilistic model for tagging these patterns, and a joint decoding process that explicitly considers the number and types of each token’s syntactic dependents. On 53 treebanks representing 41 languages in the Universal Dependencies data, we find that incorporating valency information yields higher precision and F1 scores on the core arguments (subjects and complements) and functional relations (e.g., auxiliaries) that we employ for valency analysis. Precision on core arguments improves from 80.87 to 85.43. We further show that our approach can be applied to an ostensibly different formalism and dataset, Tree Adjoining Grammar as extracted from the Penn Treebank; there, we outperform the previous state-of-the-art labeled attachment score by 0.7. Finally, we explore the potential of extending valency patterns beyond their traditional domain by confirming their helpfulness in improving PP attachment decisions.

Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets
Jack Hessel | David Mimno | Lillian Lee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.

Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers
Tianze Shi | Carlos Gómez-Rodríguez | Lillian Lee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We generalize Cohen, Gómez-Rodríguez, and Satta’s (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to O(n^6), improving over the known bounds in exact inference for non-projective transition-based parsing. We hope that this piece of theoretical work inspires design of novel transition systems with better coverage and better run-time guarantees.

Global Transition-based Non-projective Dependency Parsing
Carlos Gómez-Rodríguez | Tianze Shi | Lillian Lee
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Shi, Huang, and Lee (2017a) obtained state-of-the-art results for English and Chinese dependency parsing by combining dynamic-programming implementations of transition-based dependency parsers with a minimal set of bidirectional LSTM features. However, their results were limited to projective parsing. In this paper, we extend their approach to support non-projectivity by providing the first practical implementation of the MH₄ algorithm, an O(n^4) mildly nonprojective dynamic-programming parser with very high coverage on non-projective treebanks. To make MH₄ compatible with minimal transition-based feature sets, we introduce a transition-based interpretation of it in which parser items are mapped to sequences of transitions. We thus obtain the first implementation of global decoding for non-projective transition-based parsing, and demonstrate empirically that it is more effective than its projective counterpart in parsing a number of highly non-projective languages.

2017

Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set
Tianze Shi | Liang Huang | Lillian Lee
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework of Huang and Sagae (2010) and Kuhlmann et al. (2011) to produce the first implementation of worst-case O(n^3) exact decoders for arc-hybrid and arc-eager transition systems. With our minimal features, we also present O(n^3) global training methods. Finally, using ensembles including our new parsers, we achieve the best unlabeled attachment score reported (to our knowledge) on the Chinese Treebank and the “second-best-in-class” result on the English Penn Treebank.

2014

Is It All in the Phrasing? Computational Explorations in How We Say What We Say, and Why It Matters
Lillian Lee
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

Keynote: Language Adaptation
Lillian Lee
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter
Chenhao Tan | Lillian Lee | Bo Pang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication
Chenhao Tan | Lillian Lee
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

You Had Me at Hello: How Phrasing Affects Memorability
Cristian Danescu-Niculescu-Mizil | Justin Cheng | Jon Kleinberg | Lillian Lee
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hedge Detection as a Lens on Framing in the GMO Debates: A Position Paper
Eunsol Choi | Chenhao Tan | Lillian Lee | Cristian Danescu-Niculescu-Mizil | Jennifer Spindel
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

2011

Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs
Cristian Danescu-Niculescu-Mizil | Lillian Lee
Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics

2010

(Invited Talk) Clueless: Explorations in Unsupervised, Knowledge-Lean Extraction of Lexical-Semantic Information
Lillian Lee
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia
Mark Yatskar | Bo Pang | Cristian Danescu-Niculescu-Mizil | Lillian Lee
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Don’t ‘Have a Clue’? Unsupervised Co-Learning of Downward-Entailing Operators.
Cristian Danescu-Niculescu-Mizil | Lillian Lee
Proceedings of the ACL 2010 Conference Short Papers

2009

Without a ‘doubt’? Unsupervised Discovery of Downward-Entailing Operators
Cristian Danescu-Niculescu-Mizil | Lillian Lee | Richard Ducott
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

The Power of Negative Thinking: Exploiting Label Disagreement in the Min-cut Classification Framework
Mohit Bansal | Claire Cardie | Lillian Lee
Coling 2008: Companion volume: Posters

Using Very Simple Statistics for Review Search: An Exploration
Bo Pang | Lillian Lee
Coling 2008: Companion volume: Posters

2006

Get out the vote: Determining support or opposition from Congressional floor-debate transcripts
Matt Thomas | Bo Pang | Lillian Lee
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
Bo Pang | Lillian Lee
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
Regina Barzilay | Lillian Lee
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
Bo Pang | Lillian Lee
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Regina Barzilay | Lillian Lee
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

2002

A non-programming introduction to computer science via NLP, IR, and AI
Lillian Lee
Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics

Thumbs up? Sentiment Classification using Machine Learning Techniques
Bo Pang | Lillian Lee | Shivakumar Vaithyanathan
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

Bootstrapping Lexical Choice via Multiple-Sequence Alignment
Regina Barzilay | Lillian Lee
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2000

Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji
Rie Kubota Ando | Lillian Lee
1st Meeting of the North American Chapter of the Association for Computational Linguistics

Book Reviews: Foundations of Statistical Natural Language Processing
Lillian Lee
Computational Linguistics, Volume 26, Number 2, June 2000

1999

Measures of Distributional Similarity
Lillian Lee
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

Distributional Similarity Models: Clustering vs. Nearest Neighbors
Lillian Lee
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1997

Fast Context-Free Parsing Requires Fast Boolean Matrix Multiplication
Lillian Lee
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

Similarity-Based Methods for Word Sense Disambiguation
Ido Dagan | Lillian Lee | Fernando Pereira
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1994

Similarity-Based Estimation of Word Cooccurrence Probabilities
Ido Dagan | Fernando Pereira | Lillian Lee
32nd Annual Meeting of the Association for Computational Linguistics

1993

Distributional Clustering of English Words
Fernando Pereira | Naftali Tishby | Lillian Lee
31st Annual Meeting of the Association for Computational Linguistics