Anna Rumshisky


2019

pdf bib
Proceedings of the 2nd Clinical Natural Language Processing Workshop
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the 2nd Clinical Natural Language Processing Workshop

pdf bib
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Anna Rogers | Aleksandr Drozd | Anna Rumshisky | Yoav Goldberg
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

pdf bib
Revealing the Dark Secrets of BERT
Olga Kovaleva | Alexey Romanov | Anna Rogers | Anna Rumshisky
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to its success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodology and carry out a qualitative and quantitative analysis of the information encoded by the individual BERT’s heads. Our findings suggest that there is a limited set of attention patterns that are repeated across different heads, indicating the overall model overparametrization. While different heads consistently use the same attention patterns, they have varying impact on performance across different tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.

pdf bib
Calls to Action on Social Media: Detection, Social Impact, and Censorship Potential
Anna Rogers | Olga Kovaleva | Anna Rumshisky
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Calls to action on social media are known to be effective means of mobilization in social movements, and a frequent target of censorship. We investigate the possibility of their automatic detection and their potential for predicting real-world protest events, on historical data of Bolotnaya protests in Russia (2011-2013). We find that political calls to action can be annotated and detected with relatively high accuracy, and that in our sample their volume has a moderate positive correlation with rally attendance.

pdf bib
Adversarial Decomposition of Text Representation
Alexey Romanov | Anna Rumshisky | Anna Rogers | David Donahue
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we present a method for adversarial decomposition of text representation. This method can be used to decompose a representation of an input sentence into several independent vectors, each of them responsible for a specific aspect of the input sentence. We evaluate the proposed method on two case studies: the conversion between different social registers and diachronic language change. We show that the proposed method is capable of fine-grained controlled change of these aspects of the input sentence. It is also learning a continuous (rather than categorical) representation of the style of the sentence, which is more linguistically realistic. The model uses adversarial-motivational training and includes a special motivational loss, which acts opposite to the discriminator and encourages a better decomposition. Furthermore, we evaluate the obtained meaning embeddings on a downstream task of paraphrase detection and show that they significantly outperform the embeddings of a regular autoencoder.

pdf bib
What’s in a Name? Reducing Bias in Bios without Access to Protected Attributes
Alexey Romanov | Maria De-Arteaga | Hanna Wallach | Jennifer Chayes | Christian Borgs | Alexandra Chouldechova | Sahin Geyik | Krishnaram Kenthapadi | Anna Rumshisky | Adam Kalai
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protected attributes, as well as their intersections. In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual’s true occupation and a word embedding of their name. This method leverages the societal biases that are encoded in word embeddings, eliminating the need for access to protected attributes. Crucially, it only requires access to individuals’ names at training time and not at deployment time. We evaluate two variations of our proposed method using a large-scale dataset of online biographies. We find that both variations simultaneously reduce race and gender biases, with almost no reduction in the classifier’s overall true positive rate.

2018

pdf bib
Automatic Labeling of Problem-Solving Dialogues for Computational Microgenetic Learning Analytics
Yuanliang Meng | Anna Rumshisky | Florence Sullivan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Similarity-Based Reconstruction Loss for Meaning Representation
Olga Kovaleva | Anna Rumshisky | Alexey Romanov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper addresses the problem of representation learning. Using an autoencoder framework, we propose and evaluate several loss functions that can be used as an alternative to the commonly used cross-entropy reconstruction loss. The proposed loss functions use similarities between words in the embedding space, and can be used to train any neural model for text generation. We show that the introduced loss functions amplify semantic diversity of reconstructed sentences, while preserving the original meaning of the input. We test the derived autoencoder-generated representations on paraphrase detection and language inference tasks and demonstrate performance improvement compared to the traditional cross-entropy loss.

pdf bib
Triad-based Neural Network for Coreference Resolution
Yuanliang Meng | Anna Rumshisky
Proceedings of the 27th International Conference on Computational Linguistics

We propose a triad-based neural network system that generates affinity scores between entity mentions for coreference resolution. The system simultaneously accepts three mentions as input, taking mutual dependency and logical constraints of all three mentions into account, and thus makes more accurate predictions than the traditional pairwise approach. Depending on system choices, the affinity scores can be further used in clustering or mention ranking. Our experiments show that a standard hierarchical clustering using the scores produces state-of-art results with MUC and B 3 metrics on the English portion of CoNLL 2012 Shared Task. The model does not rely on many handcrafted features and is easy to train and use. The triads can also be easily extended to polyads of higher orders. To our knowledge, this is the first neural network system to model mutual dependency of more than two members at mention level.

pdf bib
RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian
Anna Rogers | Alexey Romanov | Anna Rumshisky | Svitlana Volkova | Mikhail Gronas | Alex Gribov
Proceedings of the 27th International Conference on Computational Linguistics

This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with Fleiss’ kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts.

pdf bib
What’s in Your Embedding, And How It Predicts Task Performance
Anna Rogers | Shashwath Hosur Ananthakrishna | Anna Rumshisky
Proceedings of the 27th International Conference on Computational Linguistics

Attempts to find a single technique for general-purpose intrinsic evaluation of word embeddings have so far not been successful. We present a new approach based on scaled-up qualitative analysis of word vector neighborhoods that quantifies interpretable characteristics of a given model (e.g. its preference for synonyms or shared morphological forms as nearest neighbors). We analyze 21 such factors and show how they correlate with performance on 14 extrinsic and intrinsic task datasets (and also explain the lack of correlation between some of them). Our approach enables multi-faceted evaluation, parameter search, and generally – a more principled, hypothesis-driven approach to development of distributional semantic representations.

pdf bib
Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the Second Workshop on Stylistic Variation

Language generation tasks that seek to mimic human ability to use language creatively are difficult to evaluate, since one must consider creativity, style, and other non-trivial aspects of the generated text. The goal of this paper is to develop evaluations methods for one such task, ghostwriting of rap lyrics, and to provide an explicit, quantifiable foundation for the goals and future directions for this task. Ghostwriting must produce text that is similar in style to the emulated artist, yet distinct in content. We develop a novel evaluation methodology that addresses several complementary aspects of this task, and illustrate how such evaluation can be used to meaning fully analyze system performance. We provide a corpus of lyrics for 13 rap artists, annotated for stylistic similarity, which allows us to assess the feasibility of manual evaluation for generated verse.

pdf bib
Context-Aware Neural Model for Temporal Information Extraction
Yuanliang Meng | Anna Rumshisky
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a context-aware neural network model for temporal information extraction. This model has a uniform architecture for event-event, event-timex and timex-timex pairs. A Global Context Layer (GCL), inspired by Neural Turing Machine (NTM), stores processed temporal relations in narrative order, and retrieves them for use when relevant entities come in. Relations are then classified in context. The GCL model has long-term memory and attention mechanisms to resolve irregular long-distance dependencies that regular RNNs such as LSTM cannot recognize. It does not require any new input features, while outperforming the existing models in literature. To our knowledge it is also the first model to use NTM-like architecture to process the information from global context in discourse-scale natural text processing. We are going to release the source code in the future.

2017

pdf bib
SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes a new shared task for humor understanding that attempts to eschew the ubiquitous binary approach to humor detection and focus on comparative humor ranking instead. The task is based on a new dataset of funny tweets posted in response to shared hashtags, collected from the ‘Hashtag Wars’ segment of the TV show @midnight. The results are evaluated in two subtasks that require the participants to generate either the correct pairwise comparisons of tweets (subtask A), or the correct ranking of the tweets (subtask B) in terms of how funny they are. 7 teams participated in subtask A, and 5 teams participated in subtask B. The best accuracy in subtask A was 0.675. The best (lowest) rank edit distance for subtask B was 0.872.

pdf bib
HumorHawk at SemEval-2017 Task 6: Mixing Meaning and Sound for Humor Recognition
David Donahue | Alexey Romanov | Anna Rumshisky
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the winning system for SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor. Humor detection has up until now been predominantly addressed using feature-based approaches. Our system utilizes recurrent deep learning methods with dense embeddings to predict humorous tweets from the @midnight show #HashtagWars. In order to include both meaning and sound in the analysis, GloVe embeddings are combined with a novel phonetic representation to serve as input to an LSTM component. The output is combined with a character-based CNN model, and an XGBoost component in an ensemble model which achieves 0.675 accuracy on the evaluation data.

pdf bib
Tracking Bias in News Sources Using Social Media: the Russia-Ukraine Maidan Crisis of 2013–2014
Peter Potash | Alexey Romanov | Mikhail Gronas | Anna Rumshisky | Mikhail Gronas
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

This paper addresses the task of identifying the bias in news articles published during a political or social conflict. We create a silver-standard corpus based on the actions of users in social media. Specifically, we reconceptualize bias in terms of how likely a given article is to be shared or liked by each of the opposing sides. We apply our methodology to a dataset of links collected in relation to the Russia-Ukraine Maidan crisis from 2013-2014. We show that on the task of predicting which side is likely to prefer a given article, a Naive Bayes classifier can record 90.3% accuracy looking only at domain names of the news sources. The best accuracy of 93.5% is achieved by a feed forward neural network. We also apply our methodology to gold-labeled set of articles annotated for bias, where the aforementioned Naive Bayes classifier records 82.6% accuracy and a feed-forward neural networks records 85.6% accuracy.

pdf bib
Temporal Information Extraction for Question Answering Using Syntactic Dependencies in an LSTM-based Architecture
Yuanliang Meng | Anna Rumshisky | Alexey Romanov
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose to use a set of simple, uniform in architecture LSTM-based models to recover different kinds of temporal relations from text. Using the shortest dependency path between entities as input, the same architecture is used to extract intra-sentence, cross-sentence, and document creation time relations. A “double-checking” technique reverses entity pairs in classification, boosting the recall of positive cases and reducing misclassifications between opposite classes. An efficient pruning algorithm resolves conflicts globally. Evaluated on QA-TempEval (SemEval2015 Task 5), our proposed technique outperforms state-of-the-art methods by a large margin. We also conduct intrinsic evaluation and post state-of-the-art results on Timebank-Dense.

pdf bib
Here’s My Point: Joint Pointer Architecture for Argument Mining
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In order to determine argument structure in text, one must understand how individual components of the overall argument are linked. This work presents the first neural network-based approach to link extraction in argument mining. Specifically, we propose a novel architecture that applies Pointer Network sequence-to-sequence attention modeling to structural prediction in discourse parsing tasks. We then develop a joint model that extends this architecture to simultaneously address the link extraction task and the classification of argument components. The proposed joint model achieves state-of-the-art results on two separate evaluation corpora, showing far superior performance than the previously proposed corpus-specific and heavily feature-engineered models. Furthermore, our results demonstrate that jointly optimizing for both tasks is crucial for high performance.

pdf bib
Towards Debate Automation: a Recurrent Model for Predicting Debate Winners
Peter Potash | Anna Rumshisky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we introduce a practical first step towards the creation of an automated debate agent: a state-of-the-art recurrent predictive model for predicting debate winners. By having an accurate predictive model, we are able to objectively rate the quality of a statement made at a specific turn in a debate. The model is based on a recurrent neural network architecture with attention, which allows the model to effectively account for the entire debate when making its prediction. Our model achieves state-of-the-art accuracy on a dataset of debate transcripts annotated with audience favorability of the debate teams. Finally, we discuss how future work can leverage our proposed model for the creation of an automated debate agent. We accomplish this by determining the model input that will maximize audience favorability toward a given side of a debate at an arbitrary turn.

pdf bib
Length, Interchangeability, and External Knowledge: Observations from Predicting Argument Convincingness
Peter Potash | Robin Bhattacharya | Anna Rumshisky
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this work, we provide insight into three key aspects related to predicting argument convincingness. First, we explicitly display the power that text length possesses for predicting convincingness in an unsupervised setting. Second, we show that a bag-of-words embedding model posts state-of-the-art on a dataset of arguments annotated for convincingness, outperforming an SVM with numerous hand-crafted features as well as recurrent neural network models that attempt to capture semantic composition. Finally, we assess the feasibility of integrating external knowledge when predicting convincingness, as arguments are often more convincing when they contain abundant information and facts. We finish by analyzing the correlations between the various models we propose.

2016

pdf bib
SimiHawk at SemEval-2016 Task 1: A Deep Ensemble System for Semantic Textual Similarity
Peter Potash | William Boag | Alexey Romanov | Vasili Ramanishka | Anna Rumshisky
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

pdf bib
MUTT: Metric Unit TesTing for Language Generation Tasks
William Boag | Renan Campos | Kate Saenko | Anna Rumshisky
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Catching the Red Priest: Using Historical Editions of Encyclopaedia Britannica to Track the Evolution of Reputations
Yen-Fu Luo | Anna Rumshisky | Mikhail Gronas
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

pdf bib
GhostWriter: Using an LSTM for Automatic Rap Lyric Generation
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
TwitterHawk: A Feature Bucket Based Approach to Sentiment Analysis
William Boag | Peter Potash | Anna Rumshisky
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2013

pdf bib
Mixing in Some Knowledge: Enriched Context Patterns for Bayesian Word Sense Induction
Rachel Chasin | Anna Rumshisky
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

2012

pdf bib
Word Sense Inventories by Non-Experts.
Anna Rumshisky | Nick Botchan | Sophie Kushkuley | James Pustejovsky
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

2011

pdf bib
Crowdsourcing Word Sense Definition
Anna Rumshisky
Proceedings of the 5th Linguistic Annotation Workshop

2010

pdf bib
SemEval-2010 Task 7: Argument Selection and Coercion
James Pustejovsky | Anna Rumshisky | Alex Plotnick | Elisabetta Jezek | Olga Batiukova | Valeria Quochi
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

pdf bib
SemEval-2010 Task 7: Argument Selection and Coercion
James Pustejovsky | Anna Rumshisky
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

pdf bib
GLML: Annotating Argument Selection and Coercion
James Pustejovsky | Jessica Moszkowicz | Olga Batiukova | Anna Rumshisky
Proceedings of the Eight International Conference on Computational Semantics

2008

pdf bib
Polysemy in Verbs: Systematic Relations between Senses and their Effect on Annotation
Anna Rumshisky | Olga Batiukova
Coling 2008: Proceedings of the workshop on Human Judgements in Computational Linguistics

2006

pdf bib
Towards a Generative Lexical Resource: The Brandeis Semantic Ontology
James Pustejovsky | Catherine Havasi | Jessica Littman | Anna Rumshisky | Marc Verhagen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

pdf bib
Inducing Sense-Discriminating Context Patterns from Sense-Tagged Corpora
Anna Rumshisky | James Pustejovsky
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

pdf bib
Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources
Ben Wellner | James Pustejovsky | Catherine Havasi | Anna Rumshisky | Roser Saurí
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

2005

pdf bib
Automating Temporal Annotation with TARSQI
Marc Verhagen | Inderjeet Mani | Roser Sauri | Jessica Littman | Robert Knippen | Seok B. Jang | Anna Rumshisky | John Phillips | James Pustejovsky
Proceedings of the ACL Interactive Poster and Demonstration Sessions

2004

pdf bib
Automated Induction of Sense in Context
James Pustejovsky | Patrick Hanks | Anna Rumshisky
Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora

pdf bib
Automated Induction of Sense in Context
James Pustejovsky | Patrick Hanks | Anna Rumshisky
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics