Karl Stratos


2019

pdf bib
EntEval: A Holistic Evaluation Benchmark for Entity Representations
Mingda Chen | Zewei Chu | Yang Chen | Karl Stratos | Kevin Gimpel
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Rich entity representations are useful for a wide class of problems involving entities. Despite their importance, there is no standardized benchmark that evaluates the overall quality of entity representations. In this work, we propose EntEval: a test suite of diverse tasks that require nontrivial understanding of entities including entity typing, entity similarity, entity relation prediction, and entity disambiguation. In addition, we develop training techniques for learning better entity representations by using natural hyperlink annotations in Wikipedia. We identify effective objectives for incorporating the contextual information in hyperlinks into state-of-the-art pretrained language models (Peters et al., 2018) and show that they improve strong baselines on multiple EntEval tasks.

pdf bib
Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction
Karl Stratos
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We address part-of-speech (POS) induction by maximizing the mutual information between the induced label and its context. We focus on two training objectives that are amenable to stochastic gradient descent (SGD): a novel generalization of the classical Brown clustering objective and a recently proposed variational lower bound. While both objectives are subject to noise in gradient updates, we show through analysis and experiments that the variational lower bound is robust whereas the generalized Brown objective is vulnerable. We obtain strong performance on a multitude of datasets and languages with a simple architecture that encodes morphology and context.

pdf bib
Label-Agnostic Sequence Labeling by Copying Nearest Neighbors
Sam Wiseman | Karl Stratos
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Retrieve-and-edit based approaches to structured prediction, where structures associated with retrieved neighbors are edited to form new structures, have recently attracted increased interest. However, much recent work merely conditions on retrieved structures (e.g., in a sequence-to-sequence framework), rather than explicitly manipulating them. We show we can perform accurate sequence labeling by explicitly (and only) copying labels from retrieved neighbors. Moreover, because this copying is label-agnostic, we can achieve impressive performance in zero-shot sequence-labeling tasks. We additionally consider a dynamic programming approach to sequence labeling in the presence of retrieved neighbors, which allows for controlling the number of distinct (copied) segments used to form a prediction, and leads to both more interpretable and accurate predictions.

2018

pdf bib
Compositional Morpheme Embeddings with Affixes as Functions and Stems as Arguments
Daniel Edmiston | Karl Stratos
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

This work introduces a novel, linguistically motivated architecture for composing morphemes to derive word embeddings. The principal novelty in the work is to treat stems as vectors and affixes as functions over vectors. In this way, our model’s architecture more closely resembles the compositionality of morphemes in natural language. Such a model stands in opposition to models which treat morphemes uniformly, making no distinction between stem and affix. We run this new architecture on a dependency parsing task in Korean—a language rich in derivational morphology—and compare it against a lexical baseline,along with other sub-word architectures. StAffNet, the name of our architecture, shows competitive performance with the state-of-the-art on this task.

2017

pdf bib
Domain Attention with an Ensemble of Experts
Young-Bum Kim | Karl Stratos | Dongchan Kim
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

An important problem in domain adaptation is to quickly generalize to a new domain with limited supervision given K existing domains. One approach is to retrain a global model across all K + 1 domains using standard techniques, for instance Daumé III (2009). However, it is desirable to adapt without having to re-estimate a global model from scratch each time a new domain with potentially new intents and slots is added. We describe a solution based on attending an ensemble of domain experts. We assume K domain specific intent and slot models trained on respective domains. When given domain K + 1, our model uses a weighted combination of the K domain experts’ feedback along with its own opinion to make predictions on the new domain. In experiments, the model significantly outperforms baselines that do not use domain adaptation and also performs better than the full retraining approach.

pdf bib
Adversarial Adaptation of Synthetic or Stale Data
Young-Bum Kim | Karl Stratos | Dongchan Kim
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Two types of data shift common in practice are 1. transferring from synthetic data to live user data (a deployment shift), and 2. transferring from stale data to current data (a temporal shift). Both cause a distribution mismatch between training and evaluation, leading to a model that overfits the flawed training data and performs poorly on the test data. We propose a solution to this mismatch problem by framing it as domain adaptation, treating the flawed training dataset as a source domain and the evaluation dataset as a target domain. To this end, we use and build on several recent advances in neural domain adaptation such as adversarial training (Ganinet al., 2016) and domain separation network (Bousmalis et al., 2016), proposing a new effective adversarial training scheme. In both supervised and unsupervised adaptation scenarios, our approach yields clear improvement over strong baselines.

pdf bib
Reconstruction of Word Embeddings from Sub-Word Parameters
Karl Stratos
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Pre-trained word embeddings improve the performance of a neural model at the cost of increasing the model size. We propose to benefit from this resource without paying the cost by operating strictly at the sub-lexical level. Our approach is quite simple: before task-specific training, we first optimize sub-word parameters to reconstruct pre-trained word embeddings using various distance measures. We report interesting results on a variety of tasks: word similarity, word analogy, and part-of-speech tagging.

pdf bib
Entity Identification as Multitasking
Karl Stratos
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing

Standard approaches in entity identification hard-code boundary detection and type prediction into labels and perform Viterbi. This has two disadvantages: 1. the runtime complexity grows quadratically in the number of types, and 2. there is no natural segment-level representation. In this paper, we propose a neural architecture that addresses these disadvantages. We frame the problem as multitasking, separating boundary detection and type prediction but optimizing them jointly. Despite its simplicity, this architecture performs competitively with fully structured models such as BiLSTM-CRFs while scaling linearly in the number of types. Furthermore, by construction, the model induces type-disambiguating embeddings of predicted mentions.

pdf bib
A Sub-Character Architecture for Korean Language Processing
Karl Stratos
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We introduce a novel sub-character architecture that exploits a unique compositional structure of the Korean language. Our method decomposes each character into a small set of primitive phonetic units called jamo letters from which character- and word-level representations are induced. The jamo letters divulge syntactic and semantic information that is difficult to access with conventional character-level units. They greatly alleviate the data sparsity problem, reducing the observation space to 1.6% of the original while increasing accuracy in our experiments. We apply our architecture to dependency parsing and achieve dramatic improvement over strong lexical baselines.

2016

pdf bib
Scalable Semi-Supervised Query Classification Using Matrix Sketching
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
Karl Stratos | Michael Collins | Daniel Hsu
Transactions of the Association for Computational Linguistics, Volume 4

We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem. These HMMs, which we call anchor HMMs, assume that each tag is associated with at least one word that can have no other tag, which is a relatively benign condition for POS tagging (e.g., “the” is a word that appears only under the determiner tag). We exploit this assumption and extend the non-negative matrix factorization framework of Arora et al. (2013) to design a consistent estimator for anchor HMMs. In experiments, our algorithm is competitive with strong baselines such as the clustering method of Brown et al. (1992) and the log-linear model of Berg-Kirkpatrick et al. (2010). Furthermore, it produces an interpretable model in which hidden states are automatically lexicalized by words.

pdf bib
Frustratingly Easy Neural Domain Adaptation
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Popular techniques for domain adaptation such as the feature augmentation method of Daumé III (2009) have mostly been considered for sparse binary-valued features, but not for dense real-valued features such as those used in neural networks. In this paper, we describe simple neural extensions of these techniques. First, we propose a natural generalization of the feature augmentation method that uses K + 1 LSTMs where one model captures global patterns across all K domains and the remaining K models capture domain-specific information. Second, we propose a novel application of the framework for learning shared structures by Ando and Zhang (2005) to domain adaptation, and also provide a neural extension of their approach. In experiments on slot tagging over 17 domains, our methods give clear performance improvement over Daumé III (2009) applied on feature-rich CRFs.

pdf bib
Domainless Adaptation by Constrained Decoding on a Schema Lattice
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In many applications such as personal digital assistants, there is a constant need for new domains to increase the system’s coverage of user queries. A conventional approach is to learn a separate model every time a new domain is introduced. This approach is slow, inefficient, and a bottleneck for scaling to a large number of domains. In this paper, we introduce a framework that allows us to have a single model that can handle all domains: including unknown domains that may be created in the future as long as they are covered in the master schema. The key idea is to remove the need for distinguishing domains by explicitly predicting the schema of queries. Given permitted schema of a query, we perform constrained decoding on a lattice of slot sequences allowed under the schema. The proposed model achieves competitive and often superior performance over the conventional model trained separately per domain.

2015

pdf bib
New Transfer Learning Techniques for Disparate Label Sets
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya | Minwoo Jeong
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Model-based Word Embeddings from Decompositions of Count Matrices
Karl Stratos | Michael Collins | Daniel Hsu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Pre-training of Hidden-Unit CRFs
Young-Bum Kim | Karl Stratos | Ruhi Sarikaya
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Compact Lexicon Selection with Spectral Methods
Young-Bum Kim | Karl Stratos | Xiaohu Liu | Ruhi Sarikaya
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Weakly Supervised Slot Tagging with Partially Labeled Sequences from Web Search Click Logs
Young-Bum Kim | Minwoo Jeong | Karl Stratos | Ruhi Sarikaya
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Simple Semi-Supervised POS Tagging
Karl Stratos | Michael Collins
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

2013

pdf bib
Experiments with Spectral Learning of Latent-Variable PCFGs
Shay B. Cohen | Karl Stratos | Michael Collins | Dean P. Foster | Lyle Ungar
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Spectral Learning Algorithms for Natural Language Processing
Shay Cohen | Michael Collins | Dean Foster | Karl Stratos | Lyle Ungar
NAACL HLT 2013 Tutorial Abstracts

pdf bib
Spectral Learning of Refinement HMMs
Karl Stratos | Alexander Rush | Shay B. Cohen | Michael Collins
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

2012

pdf bib
Spectral Learning of Latent-Variable PCFGs
Shay B. Cohen | Karl Stratos | Michael Collins | Dean P. Foster | Lyle Ungar
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Detecting Visual Text
Jesse Dodge | Amit Goyal | Xufeng Han | Alyssa Mensch | Margaret Mitchell | Karl Stratos | Kota Yamaguchi | Yejin Choi | Hal Daumé III | Alex Berg | Tamara Berg
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Midge: Generating Image Descriptions From Computer Vision Detections
Margaret Mitchell | Jesse Dodge | Amit Goyal | Kota Yamaguchi | Karl Stratos | Xufeng Han | Alyssa Mensch | Alex Berg | Tamara Berg | Hal Daumé III
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics