Andrew McCallum


2019

pdf bib
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications
Vivi Nastase | Benjamin Roth | Laura Dietz | Andrew McCallum
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

pdf bib
The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures
Sheshera Mysore | Zachary Jensen | Edward Kim | Kevin Huang | Haw-Shiuan Chang | Emma Strubell | Jeffrey Flanigan | Andrew McCallum | Elsa Olivetti
Proceedings of the 13th Linguistic Annotation Workshop

Materials science literature contains millions of materials synthesis procedures described in unstructured natural language text. Large-scale analysis of these synthesis procedures would facilitate deeper scientific understanding of materials synthesis and enable automated synthesis planning. Such analysis requires extracting structured representations of synthesis procedures from the raw text as a first step. To facilitate the training and evaluation of synthesis extraction models, we introduce a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences. The nodes in this graph are synthesis operations and their typed arguments, and labeled edges specify relations between the nodes. We describe this new resource in detail and highlight some specific challenges to annotating scientific text with shallow semantic structure. We make the corpus available to the community to promote further research and development of scientific information extraction systems.

pdf bib
Roll Call Vote Prediction with Knowledge Augmented Models
Pallavi Patil | Kriti Myer | Ronak Zala | Arpit Singh | Sheshera Mysore | Andrew McCallum | Adrian Benton | Amanda Stent
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

The official voting records of United States congresspeople are preserved as roll call votes. Prediction of voting behavior of politicians for whom no voting record exists, such as individuals running for office, is important for forecasting key political decisions. Prior work has relied on past votes cast to predict future votes, and thus fails to predict voting patterns for politicians without voting records. We address this by augmenting a prior state-of-the-art model with multiple sources of external knowledge so as to enable prediction on unseen politicians. The sources of knowledge we use are news text and Freebase, a manually curated knowledge base. We propose augmentations based on unigram features for news text, and a knowledge base embedding method followed by a neural network composition for relations from Freebase. Empirical evaluation of these approaches indicates that the proposed models outperform the prior system for politicians with complete historical voting records by 1.0 percentage point of accuracy (8.7% error reduction) and for politicians without voting records by 33.4 percentage points of accuracy (66.7% error reduction). We also show that the knowledge base augmented approach outperforms the news text augmented approach by 4.2 percentage points of accuracy.

pdf bib
Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders
Andrew Drozdov | Patrick Verga | Yi-Pei Chen | Mohit Iyyer | Andrew McCallum
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Understanding text often requires identifying meaningful constituent spans such as noun phrases and verb phrases. In this work, we show that we can effectively recover these types of labels using the learned phrase vectors from deep inside-outside recursive autoencoders (DIORA). Specifically, we cluster span representations to induce span labels. Additionally, we improve the model’s labeling accuracy by integrating latent code learning into the training procedure. We evaluate this approach empirically through unsupervised labeled constituency parsing. Our method outperforms ELMo and BERT on two versions of the Wall Street Journal (WSJ) dataset and is competitive with prior work that requires additional human annotations, improving over a previous state-of-the-art system that depends on ground-truth part-of-speech tags by 5 absolute F1 points (19% relative error reduction).
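
To make the label-induction step concrete, here is a minimal NumPy sketch (not the authors' code) of clustering span vectors with k-means and reading each cluster as an induced span label; the vectors and the assumed number of induced labels are illustrative stand-ins for DIORA phrase vectors.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each span vector to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        # recompute centers (keep the old center if a cluster empties)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

span_vectors = np.random.default_rng(1).normal(size=(1000, 64))  # stand-in span embeddings
n_labels = 6                                                     # assumed number of induced labels
induced_labels = kmeans(span_vectors, n_labels)
print(np.bincount(induced_labels))   # sizes of the induced label clusters
```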

pdf bib
Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference
Rajarshi Das | Ameya Godbole | Manzil Zaheer | Shehzaad Dhuliawala | Andrew McCallum
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

This paper describes our submission to the shared task on “Multi-hop Inference Explanation Regeneration” in the TextGraphs workshop at EMNLP 2019 (Jansen and Ustalov, 2019). Our system identifies chains of facts relevant to explaining an answer to an elementary science examination question. To counter the problem of ‘spurious chains’ leading to ‘semantic drift’, we train a ranker that uses contextualized representations of facts to score their relevance for explaining an answer to a question. Our system was ranked first w.r.t. the mean average precision (MAP) metric, outperforming the second-best system by 14.95 points.

pdf bib
Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering
Rajarshi Das | Ameya Godbole | Dilip Kavarthapu | Zhiyu Gong | Abhishek Singhal | Mo Yu | Xiaoxiao Guo | Tian Gao | Hamed Zamani | Manzil Zaheer | Andrew McCallum
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Multi-hop question answering (QA) requires an information retrieval (IR) system that can find the multiple pieces of supporting evidence needed to answer the question, making the retrieval process very challenging. This paper introduces an IR technique that uses information about entities present in the initially retrieved evidence to learn to ‘hop’ to other relevant evidence. In a setting with more than 5 million Wikipedia paragraphs, our approach leads to a significant boost in retrieval performance. The retrieved evidence also increases the performance of an existing QA model (without any training) on the benchmark by 10.59 F1.

pdf bib
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell | Ananya Ganesh | Andrew McCallum
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

pdf bib
A2N: Attending to Neighbors for Knowledge Graph Inference
Trapit Bansal | Da-Cheng Juan | Sujith Ravi | Andrew McCallum
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

State-of-the-art models for knowledge graph completion aim at learning a fixed embedding representation of entities in a multi-relational graph which can generalize to infer unseen entity relationships at test time. This can be sub-optimal as it requires memorizing and generalizing to all possible entity relationships using these fixed representations. We thus propose a novel attention-based method to learn query-dependent representation of entities which adaptively combines the relevant graph neighborhood of an entity leading to more accurate KG completion. The proposed method is evaluated on two benchmark datasets for knowledge graph completion, and experimental results show that the proposed model performs competitively or better than existing state-of-the-art, including recent methods for explicit multi-hop reasoning. Qualitative probing offers insight into how the model can reason about facts involving multiple hops in the knowledge graph, through the use of neighborhood attention.
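
As a rough illustration of the query-dependent neighborhood attention described above, here is a minimal NumPy sketch (an assumption-laden stand-in, not the paper's exact scoring function): the query relation attends over an entity's (relation, neighbor) pairs, and the weighted sum gives a query-specific entity representation.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

d = 32
rng = np.random.default_rng(0)
query_rel = rng.normal(size=d)            # embedding of the query relation
neighbor_rels = rng.normal(size=(5, d))   # relations linking the entity to 5 neighbors
neighbor_ents = rng.normal(size=(5, d))   # embeddings of those neighbor entities

# score each neighbor by how relevant its relation is to the query relation
scores = neighbor_rels @ query_rel
weights = softmax(scores)
entity_repr = weights @ neighbor_ents     # query-dependent entity representation
print(weights.round(3), entity_repr.shape)
```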

pdf bib
Optimal Transport-based Alignment of Learned Character Representations for String Similarity
Derek Tam | Nicholas Monath | Ari Kobren | Aaron Traylor | Rajarshi Das | Andrew McCallum
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE’s ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE’s ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B^3 F1 over the previous state-of-the-art approach.
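
To make the alignment step concrete, here is a minimal NumPy sketch of Sinkhorn iteration as described in the abstract: a similarity matrix between two strings' character encodings is turned into a soft alignment by alternating row and column normalization. The character "encodings" below are random stand-ins, not STANCE's learned encoder.

```python
import numpy as np

def sinkhorn(sim, n_iters=20, temperature=1.0):
    K = np.exp(sim / temperature)              # positive kernel from similarities
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)   # normalize rows
        K = K / K.sum(axis=0, keepdims=True)   # normalize columns
    return K                                   # approximately doubly stochastic alignment

rng = np.random.default_rng(0)
enc_a = rng.normal(size=(7, 16))   # encodings of the 7 characters of string A
enc_b = rng.normal(size=(9, 16))   # encodings of the 9 characters of string B
alignment = sinkhorn(enc_a @ enc_b.T)
print(alignment.shape, alignment.sum(axis=0).round(2))  # columns sum to ~1
```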

pdf bib
OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference
Dongxu Zhang | Subhabrata Mukherjee | Colin Lockard | Luna Dong | Andrew McCallum
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall into two extremes: either they perform instance-level inference relying on embeddings for (subject, object) pairs, and thus cannot handle pairs absent from any existing triples; or they perform predicate-level mapping and completely ignore background evidence from individual entities, and thus cannot achieve satisfactory quality. We propose OpenKI to handle the sparsity of OpenIE extractions by performing instance-level inference: for each entity, we encode the rich information in its neighborhood in both the KB and OpenIE extractions, and leverage this information in relation inference by exploring different methods of aggregation and attention. In order to handle unseen entities, our model is designed without creating entity-specific parameters. Extensive experiments show that this method not only significantly improves the state of the art for conventional OpenIE extractions like ReVerb, but also boosts performance on OpenIE from semi-structured data, where new entity pairs are abundant and data are fairly sparse.

pdf bib
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders
Andrew Drozdov | Patrick Verga | Mohit Yadav | Mohit Iyyer | Andrew McCallum
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce the deep inside-outside recursive autoencoder (DIORA), a fully-unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence. During training we use dynamic programming to consider all possible binary trees over the sentence, and for inference we use the CKY algorithm to extract the highest scoring parse. DIORA outperforms previously reported results for unsupervised binary constituency parsing on the benchmark WSJ dataset.
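
The abstract's inference step (extracting the highest-scoring parse with CKY) can be sketched in a few lines of NumPy; the per-span scores below are random stand-ins for DIORA's learned scores, and the chart recursion is the standard best-split dynamic program.

```python
import numpy as np

def cky_best_tree(span_score, n):
    best, back = {}, {}
    for i in range(n):
        best[(i, i + 1)] = span_score[i][i + 1]
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # choose the split point k that maximizes the two children's scores
            k_best = max(range(i + 1, j), key=lambda k: best[(i, k)] + best[(k, j)])
            best[(i, j)] = span_score[i][j] + best[(i, k_best)] + best[(k_best, j)]
            back[(i, j)] = k_best

    def build(i, j):
        if j - i == 1:
            return (i, j)
        k = back[(i, j)]
        return ((i, j), build(i, k), build(k, j))

    return build(0, n), best[(0, n)]

n = 5
rng = np.random.default_rng(0)
scores = {i: {j: float(rng.normal()) for j in range(i + 1, n + 1)} for i in range(n)}
tree, total = cky_best_tree(scores, n)
print(tree)   # nested spans of the best binary bracketing
```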

2018

pdf bib
Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings
Haw-Shiuan Chang | Amol Agrawal | Ananya Ganesh | Anirudha Desai | Vinayak Mathur | Alfred Hough | Andrew McCallum
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)

Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguities for downstream NLP tasks and also makes word representations more interpretable. This paper proposes an accurate and efficient graph-based method for WSI that builds a global non-negative vector embedding basis (which are interpretable like topics) and clusters the basis indexes in the ego network of each polysemous word. By adopting distributional inclusion vector embeddings as our basis formation model, we avoid the expensive step of nearest neighbor search that plagues other graph-based methods without sacrificing the quality of sense clusters. Experiments on three datasets show that our proposed method produces similar or better sense clusters and embeddings compared with previous state-of-the-art methods while being significantly more efficient.

pdf bib
A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset
Michael Boratko | Harshit Padigela | Divyendra Mikkilineni | Pritish Yuvraj | Rajarshi Das | Andrew McCallum | Maria Chang | Achille Fokoue-Nkoutche | Pavan Kapanipathi | Nicholas Mattei | Ryan Musa | Kartik Talamadupula | Michael Witbrock
Proceedings of the Workshop on Machine Reading for Question Answering

The recent work of Clark et al. (2018) introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into easy and challenge sets. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the challenge set and statistics related to them. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the (ARC) corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.

pdf bib
Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?
Emma Strubell | Andrew McCallum
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

Do unsupervised methods for learning rich, contextualized token representations obviate the need for explicit modeling of linguistic structure in neural network models for semantic role labeling (SRL)? We address this question by incorporating the massively successful ELMo embeddings (Peters et al., 2018) into LISA (Strubell and McCallum, 2018), a strong, linguistically-informed neural network architecture for SRL. In experiments on the CoNLL-2005 shared task we find that though ELMo out-performs typical word embeddings, beginning to close the gap in F1 between LISA with predicted and gold syntactic parses, syntactically-informed models still out-perform syntax-free models when both use ELMo, especially on out-of-domain data. Our results suggest that linguistic structures are indeed still relevant in this golden age of deep learning for NLP.

pdf bib
Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets
Nathan Greenberg | Trapit Bansal | Patrick Verga | Andrew McCallum
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Extracting typed entity mentions from text is a fundamental component to language understanding and reasoning. While there exist substantial labeled text datasets for multiple subsets of biomedical entity types—such as genes and proteins, or chemicals and diseases—it is rare to find large labeled datasets containing labels for all desired entity types together. This paper presents a method for training a single CRF extractor from multiple datasets with disjoint or partially overlapping sets of entity types. Our approach employs marginal likelihood training to insist on labels that are present in the data, while filling in “missing labels”. This allows us to leverage all the available data within a single model. In experimental results on the Biocreative V CDR (chemicals/diseases), Biocreative VI ChemProt (chemicals/proteins) and MedMentions (19 entity types) datasets, we show that joint training on multiple datasets improves NER F1 over training in isolation, and our methods achieve state-of-the-art results.
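
A minimal sketch (not the authors' implementation) of the marginal-likelihood idea for a linear-chain model: tokens tagged "O" in a chemicals-only corpus are allowed to be O or a Disease, and we sum over every label sequence consistent with those constraints, normalized by the full partition function. Label names and sizes are illustrative.

```python
import numpy as np

LABELS = ["O", "Chem", "Dis"]

def log_forward(unary, transition, allowed):
    """Log-sum over all label sequences whose label at step t lies in allowed[t]."""
    n, L = unary.shape
    mask = np.full((n, L), -np.inf)
    for t, ok in enumerate(allowed):
        mask[t, list(ok)] = 0.0
    alpha = unary[0] + mask[0]
    for t in range(1, n):
        scores = alpha[:, None] + transition + unary[t][None, :] + mask[t][None, :]
        alpha = np.logaddexp.reduce(scores, axis=0)
    return np.logaddexp.reduce(alpha)

rng = np.random.default_rng(0)
unary = rng.normal(size=(4, 3))        # per-token label scores (stand-in for a BiLSTM)
transition = rng.normal(size=(3, 3))   # label-to-label transition scores

# Gold from a chemicals-only dataset: "Chem" is trusted, "O" might hide a Disease.
observed = ["O", "Chem", "O", "O"]
allowed = [{LABELS.index(y)} if y == "Chem" else {LABELS.index("O"), LABELS.index("Dis")}
           for y in observed]
full = [set(range(3))] * 4

marginal_log_lik = log_forward(unary, transition, allowed) - log_forward(unary, transition, full)
print(marginal_log_lik)
```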

pdf bib
Linguistically-Informed Self-Attention for Semantic Role Labeling
Emma Strubell | Patrick Verga | Daniel Andor | David Weiss | Andrew McCallum
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no explicit linguistic features. However, prior work has shown that gold syntax trees can dramatically improve SRL decoding, suggesting the possibility of increased accuracy from explicit modeling of syntax. In this work, we present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. Unlike previous models which require significant pre-processing to prepare linguistic features, LISA can incorporate syntax using merely raw tokens as input, encoding the sequence only once to simultaneously perform parsing, predicate detection and role labeling for all predicates. Syntax is incorporated by training one attention head to attend to syntactic parents for each token. Moreover, if a high-quality syntactic parse is already available, it can be beneficially injected at test time without re-training our SRL model. In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art performance for a model using predicted predicates and standard word embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in error. On CoNLL-2012 English SRL we also show an improvement of more than 2.5 F1. LISA also out-performs the state-of-the-art with contextually-encoded (ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on out-of-domain text.

pdf bib
An Interface for Annotating Science Questions
Michael Boratko | Harshit Padigela | Divyendra Mikkilineni | Pritish Yuvraj | Rajarshi Das | Andrew McCallum | Maria Chang | Achille Fokoue | Pavan Kapanipathi | Nicholas Mattei | Ryan Musa | Kartik Talamadupula | Michael Witbrock
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Recent work introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set. That work includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them. However, it does not include clear definitions of these types, nor does it offer information about the quality of the labels or the annotation process used. In this paper, we introduce a novel interface for human annotation of science question-answer pairs with their respective knowledge and reasoning types, in order that the classification of new questions may be improved. We build on the classification schema proposed by prior work on the ARC dataset, and evaluate the effectiveness of our interface with a preliminary study involving 10 participants.

pdf bib
Embedded-State Latent Conditional Random Fields for Sequence Labeling
Dung Thai | Sree Harsha Ramesh | Shikhar Murty | Luke Vilnis | Andrew McCallum
Proceedings of the 22nd Conference on Computational Natural Language Learning

Complex textual information extraction tasks are often posed as sequence labeling or shallow parsing, where fields are extracted using local labels made consistent through probabilistic inference in a graphical model with constrained transitions. Recently, it has become common to locally parametrize these models using rich features extracted by recurrent neural networks (such as LSTM), while enforcing consistent outputs through a simple linear-chain model, representing Markovian dependencies between successive labels. However, the simple graphical model structure belies the often complex non-local constraints between output labels. For example, many fields, such as a first name, can only occur a fixed number of times, or in the presence of other fields. While RNNs have provided increasingly powerful context-aware local features for sequence tagging, they have yet to be integrated with a global graphical model of similar expressivity in the output distribution. Our model goes beyond the linear chain CRF to incorporate multiple hidden states per output label, but parametrizes them parsimoniously with low-rank log-potential scoring matrices, effectively learning an embedding space for hidden states. This augmented latent space of inference variables complements the rich feature representation of the RNN, and allows exact global inference obeying complex, learned non-local output constraints. We experiment with several datasets and show that the model outperforms baseline CRF+RNN models when global output constraints are necessary at inference-time, and explore the interpretable latent structure.
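
As a rough sketch of the parametrization described above (sizes are illustrative, not the paper's): each output label gets several latent states, and transition potentials between all latent states are scored by a low-rank product, effectively embedding the hidden states.

```python
import numpy as np

n_labels, states_per_label, rank = 4, 3, 5
H = n_labels * states_per_label                 # total number of latent states
rng = np.random.default_rng(0)
U = rng.normal(size=(H, rank))                  # "from-state" embeddings
V = rng.normal(size=(H, rank))                  # "to-state" embeddings
log_potentials = U @ V.T                        # H x H transition scores, rank at most 5

# latent state h belongs to output label h // states_per_label; collapsing the
# latent states recovers an ordinary label-to-label transition table
label_of = np.arange(H) // states_per_label
print(log_potentials.shape, label_of)
```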

pdf bib
Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection
Haw-Shiuan Chang | Ziyun Wang | Luke Vilnis | Andrew McCallum
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Modeling hypernymy, such as poodle is-a dog, is an important generalization aid to many NLP tasks, such as entailment, relation extraction, and question answering. Supervised learning from labeled hypernym sources, such as WordNet, limits the coverage of these models, which can be addressed by learning hypernyms from unlabeled text. Existing unsupervised methods either do not scale to large vocabularies or yield unacceptably poor accuracy. This paper introduces distributional inclusion vector embedding (DIVE), a simple-to-implement unsupervised method of hypernym discovery via per-word non-negative vector embeddings which preserve the inclusion property of word contexts. In experimental evaluations more comprehensive than any previous literature of which we are aware—evaluating on 11 datasets using multiple existing as well as newly proposed scoring functions—we find that our method provides up to double the precision of previous unsupervised methods, and the highest average performance, using a much more compact word representation, and yielding many new state-of-the-art results.
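
A minimal, illustrative inclusion score follows (the paper evaluates several scoring functions; this captures only the inclusion intuition, not DIVE's exact formula): with non-negative embeddings, a hypernym's vector should largely "contain" its hyponym's, so we measure how much of the hyponym falls inside the hypernym.

```python
import numpy as np

def inclusion_score(hypo, hyper):
    return np.minimum(hypo, hyper).sum() / hypo.sum()

rng = np.random.default_rng(0)
dog    = rng.random(50)                    # non-negative stand-in embedding
poodle = np.minimum(dog, rng.random(50))   # roughly contained in "dog"
print(inclusion_score(poodle, dog))        # high: "poodle is-a dog" is plausible
print(inclusion_score(dog, poodle))        # lower in the reverse direction
```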

pdf bib
Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction
Patrick Verga | Emma Strubell | Andrew McCallum
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Most work in relation extraction forms a prediction by looking at a short span of text within a single sentence containing a single entity pair mention. This approach often does not consider interactions across mentions, requires redundant computation for each mention pair, and ignores relationships expressed across sentence boundaries. These problems are exacerbated by the document- (rather than sentence-) level annotation common in biological text. In response, we propose a model which simultaneously predicts relationships between all mention pairs in a document. We form pairwise predictions over entire paper abstracts using an efficient self-attention encoder. All-pairs mention scores allow us to perform multi-instance learning by aggregating over mentions to form entity pair representations. We further adapt to settings without mention-level annotation by jointly training to predict named entities and adding a corpus of weakly labeled data. In experiments on two Biocreative benchmark datasets, we achieve state-of-the-art performance on the Biocreative V Chemical Disease Relation dataset for models without external KB resources. We also introduce a new dataset that is an order of magnitude larger than existing human-annotated biological information extraction datasets and more accurate than distantly supervised alternatives.

pdf bib
Training Structured Prediction Energy Networks with Indirect Supervision
Amirmohammad Rooshenas | Aishwarya Kamath | Andrew McCallum
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

This paper introduces rank-based training of structured prediction energy networks (SPENs). Our method samples from output structures using gradient descent and minimizes the ranking violation of the sampled structures with respect to a scalar scoring function defined with domain knowledge. We have successfully trained SPEN for citation field extraction without any labeled data instances, where the only source of supervision is a simple human-written scoring function. Such scoring functions are often easy to provide; the SPEN then furnishes an efficient structured prediction inference procedure.

pdf bib
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking
Shikhar Murty | Patrick Verga | Luke Vilnis | Irena Radovanovic | Andrew McCallum
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Extraction from raw text to a knowledge base of entities and fine-grained types is often cast as prediction into a flat set of entity and type labels, neglecting the rich hierarchies over types and entities contained in curated ontologies. Previous attempts to incorporate hierarchical structure have yielded little benefit and are restricted to shallow ontologies. This paper presents new methods using real and complex bilinear mappings for integrating hierarchical information, yielding substantial improvement over flat predictions in entity linking and fine-grained entity typing, and achieving new state-of-the-art results for end-to-end models on the benchmark FIGER dataset. We also present two new human-annotated datasets containing wide and deep hierarchies which we will release to the community to encourage further research in this direction: MedMentions, a collection of PubMed abstracts in which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet, which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k entity types. In experiments on all three datasets we show substantial gains from hierarchy-aware training.

pdf bib
Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures
Luke Vilnis | Xiang Li | Shikhar Murty | Andrew McCallum
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE), are a natural way to model transitive relational data (e.g. entailment graphs). However, OE learns a deterministic knowledge base, limiting expressiveness of queries and the ability to use uncertainty for both prediction and learning (e.g. learning from expectations). Probabilistic extensions of OE have provided the ability to somewhat calibrate these denotational probabilities while retaining the consistency and inductive bias of ordered models, but lack the ability to model the negative correlations found in real-world knowledge. In this work we show that a broad class of models that assign probability measures to OE can never capture negative correlation, which motivates our construction of a novel box lattice and accompanying probability measure to capture anti-correlation and even disjoint concepts, while still providing the benefits of probabilistic modeling, such as the ability to perform rich joint and conditional queries over arbitrary sets of concepts, and both learning from and predicting calibrated uncertainty. We show improvements over previous approaches in modeling the Flickr and WordNet entailment graphs, and investigate the power of the model.
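
A minimal sketch of the box-lattice idea follows: each concept is an axis-aligned box in the unit cube, its probability is the box volume, and joint probabilities come from box intersections, so nested, disjoint, and anti-correlated concepts are all representable. The boxes below are hand-picked toy values, not learned parameters.

```python
import numpy as np

def volume(lo, hi):
    return np.prod(np.clip(hi - lo, 0.0, None))

def joint(box_a, box_b):
    lo = np.maximum(box_a[0], box_b[0])
    hi = np.minimum(box_a[1], box_b[1])
    return volume(lo, hi)

animal = (np.array([0.0, 0.0]), np.array([0.8, 0.9]))
dog    = (np.array([0.1, 0.1]), np.array([0.4, 0.5]))   # nested inside "animal"
rock   = (np.array([0.85, 0.0]), np.array([1.0, 0.9]))  # disjoint from "animal"

p_dog = volume(*dog)
p_dog_and_animal = joint(dog, animal)
print(p_dog_and_animal / p_dog)   # P(animal | dog) = 1.0: "dog" entails "animal"
print(joint(dog, rock))           # 0.0: disjoint concepts
```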

2017

pdf bib
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein | Mrinal Das | Sebastian Riedel | Lakshmi Vikraman | Andrew McCallum
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.

pdf bib
Dependency Parsing with Dilated Iterated Graph CNNs
Emma Strubell | Andrew McCallum
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing

Dependency parses are an effective way to inject linguistic knowledge into many downstream tasks, and many practitioners wish to efficiently parse sentences at scale. While recent advances in GPU hardware have enabled neural networks to achieve significant gains over the previous best models, these models still fail to leverage GPUs’ capability for massive parallelism due to their requirement of sequential processing of the sentence. In response, we propose Dilated Iterated Graph Convolutional Neural Networks (DIG-CNNs) for graph-based dependency parsing, a graph convolutional architecture that allows for efficient end-to-end GPU parsing. In experiments on the English Penn TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best neural network parsers.

pdf bib
Fast and Accurate Entity Recognition with Iterated Dilated Convolutions
Emma Strubell | Patrick Verga | David Belanger | Andrew McCallum
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Today when many practitioners run basic NLP on the entire web and large-volume traffic, faster methods are paramount to saving time and energy costs. Recent advances in GPU hardware have led to the emergence of bi-directional LSTMs as a standard method for obtaining per-token vector representations serving as input to labeling tasks such as NER (often followed by prediction in a linear-chain CRF). Though expressive and accurate, these models fail to fully exploit GPU parallelism, limiting their computational efficiency. This paper proposes a faster alternative to Bi-LSTMs for NER: Iterated Dilated Convolutional Neural Networks (ID-CNNs), which have better capacity than traditional CNNs for large context and structured prediction. Unlike LSTMs whose sequential processing on sentences of length N requires O(N) time even in the face of parallelism, ID-CNNs permit fixed-depth convolutions to run in parallel across entire documents. We describe a distinct combination of network structure, parameter sharing and training procedures that enable dramatic 14-20x test-time speedups while retaining accuracy comparable to the Bi-LSTM-CRF. Moreover, ID-CNNs trained to aggregate context from the entire document are more accurate than Bi-LSTM-CRFs while attaining 8x faster test time speeds.
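
To illustrate the mechanism (not the paper's trained model), here is a minimal NumPy sketch of one dilated convolution block: each layer looks at tokens `dilation` positions apart, so a fixed-depth stack covers an exponentially wide context while every token is processed in parallel; padding keeps the sequence length fixed.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """x: (seq_len, d_in); w: (3, d_in, d_out) for a width-3 filter."""
    n, _ = x.shape
    pad = dilation
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((n, w.shape[2]))
    for t in range(n):
        taps = xp[[t, t + dilation, t + 2 * dilation]]   # left, center, right taps
        out[t] = np.einsum("kd,kde->e", taps, w)
    return np.maximum(out, 0.0)                          # ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 8))                             # 20 tokens, 8 features each
for dilation in (1, 2, 4):                               # receptive field grows exponentially
    w = rng.normal(size=(3, 8, 8)) * 0.1
    x = dilated_conv1d(x, w, dilation)
print(x.shape)
```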

pdf bib
Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks
Rajarshi Das | Manzil Zaheer | Siva Reddy | Andrew McCallum
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Existing question answering methods infer answers either from a knowledge base or from raw text. While knowledge base (KB) methods are good at answering compositional questions, their performance is often affected by the incompleteness of the KB. In contrast, web text contains millions of facts that are absent from the KB, albeit in unstructured form. Universal schema can support reasoning on the union of both structured KBs and unstructured text by aligning them in a common embedded space. In this paper we extend universal schema to natural language question answering, employing memory networks to attend to the large body of facts in the combination of text and KB. Our models can be trained in an end-to-end fashion on question-answer pairs. Evaluation results on the Spades fill-in-the-blank question answering dataset show that exploiting universal schema for question answering is better than using either a KB or text alone. This model also outperforms the current state-of-the-art by 8.5 F1 points.

pdf bib
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks
Rajarshi Das | Arvind Neelakantan | David Belanger | Andrew McCallum
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Our goal is to combine the rich multi-step inference of symbolic logical reasoning with the generalization capabilities of neural networks. We are particularly interested in complex reasoning about entities and relations in text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs to compose the distributed semantics of multi-hop paths in KBs; however for multiple reasons, the approach lacks accuracy and practicality. This paper proposes three significant modeling advances: (1) we learn to jointly reason about relations, entities, and entity-types; (2) we use neural attention modeling to incorporate multiple paths; (3) we learn to share strength in a single RNN that represents logical composition across all relations. On a large-scale Freebase+ClueWeb prediction task, we achieve 25% error reduction, and a 53% error reduction on sparse relations due to shared strength. On chains of reasoning in WordNet we reduce error in mean quantile by 84% versus previous state-of-the-art.
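
As a rough sketch of path composition with a single shared recurrent cell (shapes and parameters are random stand-ins, not the trained model): relation embeddings along a multi-hop KB path are fed step by step into the RNN, and the final state is scored against the target relation.

```python
import numpy as np

d = 16
rng = np.random.default_rng(0)
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1

def compose_path(relation_embs):
    h = np.zeros(d)
    for r in relation_embs:                     # one RNN step per edge in the path
        h = np.tanh(W_h @ h + W_x @ r)
    return h

# path: e.g. "president-of" followed by "nationality-of", as stand-in relation vectors
path = [rng.normal(size=d), rng.normal(size=d)]
target_relation = rng.normal(size=d)            # relation we want to infer
score = compose_path(path) @ target_relation    # higher = path supports the relation
print(score)
```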

pdf bib
Generalizing to Unseen Entities and Entity Pairs with Row-less Universal Schema
Patrick Verga | Arvind Neelakantan | Andrew McCallum
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Universal schema predicts the types of entities and relations in a knowledge base (KB) by jointly embedding the union of all available schema types—not only types from multiple structured databases (such as Freebase or Wikipedia infoboxes), but also types expressed as textual patterns from raw text. This prediction is typically modeled as a matrix completion problem, with one type per column, and either one or two entities per row (in the case of entity types or binary relation types, respectively). Factorizing this sparsely observed matrix yields a learned vector embedding for each row and each column. In this paper we explore the problem of making predictions for entities or entity-pairs unseen at training time (and hence without a pre-learned row embedding). We propose an approach having no per-row parameters at all; rather we produce a row vector on the fly using a learned aggregation function of the vectors of the observed columns for that row. We experiment with various aggregation functions, including neural network attention models. Our approach can be understood as a natural language database, in that questions about KB entities are answered by attending to textual or database evidence. In experiments predicting both relations and entity types, we demonstrate that despite having an order of magnitude fewer parameters than traditional universal schema, we can match the accuracy of the traditional model, and more importantly, we can now make predictions about unseen rows with nearly the same accuracy as rows available at training time.
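
A minimal sketch of the "row-less" idea follows: an unseen entity pair has no learned row vector, so one is built on the fly by aggregating the column vectors of the relations and textual patterns observed for that pair; query-attention aggregation, shown here, is one of the variants the abstract mentions. All vectors are random stand-ins.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 24
rng = np.random.default_rng(0)
observed_columns = rng.normal(size=(4, d))   # columns observed with this entity pair
query_column = rng.normal(size=d)            # relation we want to predict

weights = softmax(observed_columns @ query_column)
row_vector = weights @ observed_columns      # on-the-fly row embedding
prediction_score = row_vector @ query_column
print(weights.round(3), prediction_score)
```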

2016

pdf bib
Incorporating Selectional Preferences in Multi-hop Relation Extraction
Rajarshi Das | Arvind Neelakantan | David Belanger | Andrew McCallum
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

pdf bib
Row-less Universal Schema
Patrick Verga | Andrew McCallum
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

pdf bib
Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks
Teresa Martin | Fiete Botschen | Ajay Nagesh | Andrew McCallum
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

pdf bib
Multilingual Relation Extraction using Compositional Universal Schema
Patrick Verga | David Belanger | Emma Strubell | Benjamin Roth | Andrew McCallum
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
Learning Dynamic Feature Selection for Fast Sequential Prediction
Emma Strubell | Luke Vilnis | Kate Silverstein | Andrew McCallum
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Compositional Vector Space Models for Knowledge Base Completion
Arvind Neelakantan | Benjamin Roth | Andrew McCallum
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Lexicon Infused Phrase Embeddings for Named Entity Resolution
Alexandre Passos | Vineet Kumar | Andrew McCallum
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf bib
Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
Arvind Neelakantan | Jeevan Shankar | Alexandre Passos | Andrew McCallum
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Learning Soft Linear Constraints with Application to Citation Field Extraction
Sam Anzaroot | Alexandre Passos | David Belanger | Andrew McCallum
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Dynamic Knowledge-Base Alignment for Coreference Resolution
Jiaping Zheng | Luke Vilnis | Sameer Singh | Jinho D. Choi | Andrew McCallum
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

pdf bib
Relation Extraction with Matrix Factorization and Universal Schemas
Sebastian Riedel | Limin Yao | Andrew McCallum | Benjamin M. Marlin
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Transition-based Dependency Parsing with Selectional Branching
Jinho D. Choi | Andrew McCallum
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
A Discriminative Hierarchical Model for Fast Coreference at Large Scale
Michael Wick | Sameer Singh | Andrew McCallum
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Unsupervised Relation Discovery with Sense Disambiguation
Limin Yao | Sebastian Riedel | Andrew McCallum
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Human-Machine Cooperation: Supporting User Corrections to Automatically Constructed KBs
Michael Wick | Karl Schultz | Andrew McCallum
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

pdf bib
Monte Carlo MCMC: Efficient Inference by Sampling Factors
Sameer Singh | Michael Wick | Andrew McCallum
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

pdf bib
Probabilistic Databases of Universal Schema
Limin Yao | Sebastian Riedel | Andrew McCallum
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

pdf bib
Parse, Price and Cut—Delayed Column and Row Generation for Graph Based Parsers
Sebastian Riedel | David Smith | Andrew McCallum
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Monte Carlo MCMC: Efficient Inference by Approximate Sampling
Sameer Singh | Michael Wick | Andrew McCallum
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Sameer Singh | Amarnag Subramanya | Fernando Pereira | Andrew McCallum
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Fast and Robust Joint Models for Biomedical Event Extraction
Sebastian Riedel | Andrew McCallum
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Optimizing Semantic Coherence in Topic Models
David Mimno | Hanna Wallach | Edmund Talley | Miriam Leenders | Andrew McCallum
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Structured Relation Discovery using Generative Models
Limin Yao | Aria Haghighi | Sebastian Riedel | Andrew McCallum
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Robust Biomedical Event Extraction with Dual Decomposition and Minimal Domain Adaptation
Sebastian Riedel | Andrew McCallum
Proceedings of BioNLP Shared Task 2011 Workshop

pdf bib
Model Combination for Event Extraction in BioNLP 2011
Sebastian Riedel | David McClosky | Mihai Surdeanu | Andrew McCallum | Christopher D. Manning
Proceedings of BioNLP Shared Task 2011 Workshop

2010

pdf bib
Collective Cross-Document Relation Extraction Without Labelled Data
Limin Yao | Sebastian Riedel | Andrew McCallum
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Constraint-Driven Rank-Based Learning for Information Extraction
Sameer Singh | Limin Yao | Sebastian Riedel | Andrew McCallum
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

pdf bib
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria
Gregory Druck | Gideon Mann | Andrew McCallum
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Active Learning by Labeling Features
Gregory Druck | Burr Settles | Andrew McCallum
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment
Kedar Bellare | Andrew McCallum
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Polylingual Topic Models
David Mimno | Hanna M. Wallach | Jason Naradowsky | David A. Smith | Andrew McCallum
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Joint Inference for Natural Language Processing
Andrew McCallum
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

2008

pdf bib
Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields
Gideon S. Mann | Andrew McCallum
Proceedings of ACL-08: HLT

2007

pdf bib
First-Order Probabilistic Models for Coreference Resolution
Aron Culotta | Michael Wick | Andrew McCallum
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Efficient Computation of Entropy Gradient for Semi-Supervised Conditional Random Fields
Gideon Mann | Andrew McCallum
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

2006

pdf bib
Reducing Weight Undertraining in Structured Discriminative Learning
Charles Sutton | Michael Sindelar | Andrew McCallum
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text
Aron Culotta | Andrew McCallum | Jonathan Betz
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Learning Field Compatibilities to Extract Database Records from Unstructured Text
Michael Wick | Aron Culotta | Andrew McCallum
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing
Ryan McDonald | Charles Sutton | Hal Daumé III | Andrew McCallum | Fernando Pereira | Jeff Bilmes
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

pdf bib
Practical Markov Logic Containing First-Order Quantifiers with Application to Identity Uncertainty
Aron Culotta | Andrew McCallum
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

2005

pdf bib
Joint Parsing and Semantic Role Labeling
Charles Sutton | Andrew McCallum
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

pdf bib
Composition of Conditional Random Fields for Transfer Learning
Charles Sutton | Andrew McCallum
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf bib
Accurate Information Extraction from Research Papers using Conditional Random Fields
Fuchun Peng | Andrew McCallum
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

pdf bib
Confidence Estimation for Information Extraction
Aron Culotta | Andrew McCallum
Proceedings of HLT-NAACL 2004: Short Papers

pdf bib
Chinese Segmentation and New Word Detection using Conditional Random Fields
Fuchun Peng | Fangfang Feng | Andrew McCallum
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons
Andrew McCallum | Wei Li
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

1999

pdf bib
Text Classification by Bootstrapping with Keywords, EM and Shrinkage
Andrew McCallum | Kamal Nigam
Unsupervised Learning in Natural Language Processing
