Marine Carpuat


2019

The University of Maryland’s Kazakh-English Neural Machine Translation System at WMT19
Eleftheria Briakou | Marine Carpuat
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the University of Maryland’s submission to the WMT 2019 Kazakh-English news translation task. We study the impact of transfer learning from another low-resource but related language. We experiment with different ways of encoding lexical units to maximize lexical overlap between the two language pairs, as well as back-translation and ensembling. The submitted system improves over a Kazakh-only baseline by +5.45 BLEU on newstest2019.

Identifying Fluently Inadequate Output in Neural and Statistical Machine Translation
Marianna Martindale | Marine Carpuat | Kevin Duh | Paul McNamee
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

Controlling Text Complexity in Neural Machine Translation
Sweta Agrawal | Marine Carpuat
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency. We collect a high-quality dataset of news articles available in English and Spanish, written for diverse grade levels, and propose a method to align segments across comparable bilingual articles. The resulting dataset makes it possible to train multi-task sequence-to-sequence models that can translate and simplify text jointly. We show that these multi-task models outperform pipeline approaches that translate and simplify text independently.

Weakly Supervised Cross-lingual Semantic Relation Classification via Knowledge Distillation
Yogarshi Vyas | Marine Carpuat
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Words in different languages rarely cover the exact same semantic space. This work characterizes differences in meaning between words across languages using semantic relations that have been used to relate the meaning of English words. However, because of translation ambiguity, semantic relations are not always preserved by translation. We introduce a cross-lingual relation classifier trained only with English examples and a bilingual dictionary. Our classifier relies on a novel attention-based distillation approach to account for translation ambiguity when transferring knowledge from English to cross-lingual settings. On new English-Chinese and English-Hindi test sets, the resulting models largely outperform baselines that more naively rely on bilingual embeddings or dictionaries for cross-lingual transfer, and approach the performance of fully supervised systems on English tasks.

Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation
Xing Niu | Weijia Xu | Marine Carpuat
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We aim to better exploit the limited amounts of parallel text available in low-resource settings by introducing a differentiable reconstruction loss for neural machine translation (NMT). This loss compares original inputs to reconstructed inputs, obtained by back-translating translation hypotheses into the input language. We leverage differentiable sampling and bi-directional NMT to train models end-to-end, without introducing additional parameters. This approach achieves small but consistent BLEU improvements on four language pairs in both translation directions, and outperforms an alternative differentiable reconstruction strategy based on hidden states.

Curriculum Learning for Domain Adaptation in Neural Machine Translation
Xuan Zhang | Pamela Shapiro | Gaurav Kumar | Paul McNamee | Marine Carpuat | Kevin Duh
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. Samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule. This approach is simple to implement on top of any neural framework or architecture, and consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.

Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation
Weijia Xu | Xing Niu | Marine Carpuat
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Despite some empirical success at correcting exposure bias in machine translation, scheduled sampling algorithms suffer from a major drawback: they incorrectly assume that words in the reference translations and in sampled sequences are aligned at each time step. Our new differentiable sampling algorithm addresses this issue by optimizing the probability that the reference can be aligned with the sampled output, based on a soft alignment predicted by the model itself. As a result, the output distribution at each time step is evaluated with respect to the whole predicted sequence. Experiments on IWSLT translation tasks show that our approach improves BLEU compared to maximum likelihood and scheduled sampling baselines. In addition, our approach is simpler to train, with no need for a sampling schedule, and yields models that achieve larger improvements with smaller beam sizes.

2018

Multi-Task Neural Models for Translating Between Styles Within and Across Languages
Xing Niu | Sudha Rao | Marine Carpuat
Proceedings of the 27th International Conference on Computational Linguistics

Generating natural language requires conveying content in an appropriate style. We explore two related tasks on generating text of varying formality: monolingual formality transfer and formality-sensitive machine translation. We propose to solve these tasks jointly using multi-task learning, and show that our models achieve state-of-the-art performance for formality transfer and are able to perform formality-sensitive translation without being explicitly trained on style-annotated translation examples.

Fluency Over Adequacy: A Pilot Study in Measuring User Trust in Imperfect MT
Marianna Martindale | Marine Carpuat
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers)

Bi-Directional Neural Machine Translation with Synthetic Parallel Data
Xing Niu | Michael Denkowski | Marine Carpuat
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Despite impressive progress in high-resource settings, Neural Machine Translation (NMT) still struggles in low-resource and out-of-domain scenarios, often failing to match the quality of phrase-based translation. We propose a novel technique that combines back-translation and multilingual NMT to improve performance in these difficult cases. Our technique trains a single model for both directions of a language pair, allowing us to back-translate source or target monolingual data without requiring an auxiliary model. We then continue training on the augmented parallel data, enabling a cycle of improvement for a single model that can incorporate any source, target, or parallel data to improve both translation directions. As a byproduct, these models can reduce training and deployment costs significantly compared to uni-directional models. Extensive experiments show that our technique outperforms standard back-translation in low-resource scenarios, improves quality on cross-domain tasks, and effectively reduces costs across the board.

The University of Maryland’s Chinese-English Neural Machine Translation Systems at WMT18
Weijia Xu | Marine Carpuat
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Robust Cross-Lingual Hypernymy Detection Using Dependency Context
Shyam Upadhyay | Yogarshi Vyas | Marine Carpuat | Dan Roth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Cross-lingual Hypernymy Detection involves determining if a word in one language (“fruit”) is a hypernym of a word in another language (“pomme” i.e. apple in French). The ability to detect hypernymy cross-lingually can aid in solving cross-lingual versions of tasks such as textual entailment and event coreference. We propose BiSparse-Dep, a family of unsupervised approaches for cross-lingual hypernymy detection, which learns sparse, bilingual word embeddings based on dependency contexts. We show that BiSparse-Dep can significantly improve performance on this task, compared to approaches based only on lexical context. Our approach is also robust, showing promise for low-resource settings: our dependency-based embeddings can be learned using a parser trained on related languages, with negligible loss in performance. We also crowd-source a challenging dataset for this task on four languages – Russian, French, Arabic, and Chinese. Our embeddings and datasets are publicly available.

Identifying Semantic Divergences in Parallel Text without Annotations
Yogarshi Vyas | Xing Niu | Marine Carpuat
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.

Proceedings of The 12th International Workshop on Semantic Evaluation
Marianna Apidianaki | Saif M. Mohammad | Jonathan May | Ekaterina Shutova | Steven Bethard | Marine Carpuat
Proceedings of The 12th International Workshop on Semantic Evaluation

UMD at SemEval-2018 Task 10: Can Word Embeddings Capture Discriminative Attributes?
Alexander Zhang | Marine Carpuat
Proceedings of The 12th International Workshop on Semantic Evaluation

We describe the University of Maryland’s submission to SemEval-2018 Task 10, “Capturing Discriminative Attributes”: given word triples (w1, w2, d), the goal is to determine whether d is a discriminating attribute belonging to w1 but not w2. Our study aims to determine whether word embeddings can address this challenging task. Our submission casts this problem as supervised binary classification using only word embedding features. Using a Gaussian SVM model trained only on validation data results in an F-score of 60%. We also show that cosine similarity features are more effective, both in unsupervised systems (F-score of 65%) and supervised systems (F-score of 67%).

2017

Detecting Asymmetric Semantic Relations in Context: A Case-Study on Hypernymy Detection
Yogarshi Vyas | Marine Carpuat
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

We introduce WHiC, a challenging testbed for detecting hypernymy, an asymmetric relation between words. While previous work has focused on detecting hypernymy between word types, we ground the meaning of words in specific contexts drawn from WordNet examples, and require predictions to be sensitive to changes in contexts. WHiC lets us analyze complementary properties of two approaches to inducing vector representations of word meaning in context. We show that such contextualized word representations also improve detection of a wider range of semantic relations in context.

Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Steven Bethard | Marine Carpuat | Marianna Apidianaki | Saif M. Mohammad | Daniel Cer | David Jurgens
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation
Marine Carpuat | Yogarshi Vyas | Xing Niu
Proceedings of the First Workshop on Neural Machine Translation

Parallel corpora are often not as parallel as one might assume: non-literal translations and noisy translations abound, even in curated corpora routinely used for training and evaluation. We use a cross-lingual textual entailment system to distinguish sentence pairs that are parallel in meaning from those that are not, and show that filtering out divergent examples from training improves translation quality.

Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases
Xing Niu | Marine Carpuat
Proceedings of the Workshop on Stylistic Variation

Detecting and analyzing stylistic variation in language is relevant to diverse Natural Language Processing applications. In this work, we investigate whether salient dimensions of style variations are embedded in standard distributional vector spaces of word meaning. We hypothesize that distances between embeddings of lexical paraphrases can help isolate style from meaning variations and help identify latent style dimensions. We conduct a qualitative analysis of latent style dimensions, and show the effectiveness of identified style subspaces on a lexical formality prediction task.

A Study of Style in Machine Translation: Controlling the Formality of Machine Translation Output
Xing Niu | Marianna Martindale | Marine Carpuat
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Stylistic variations of language, such as formality, carry speakers’ intention beyond literal meaning and should be conveyed adequately in translation. We propose to use lexical formality models to control the formality level of machine translation output. We demonstrate the effectiveness of our approach in empirical evaluations, as measured by automatic metrics and human assessments.

Proceedings of ACL 2017, Student Research Workshop
Allyson Ettinger | Spandana Gella | Matthieu Labeau | Cecilia Ovesdotter Alm | Marine Carpuat | Mark Dredze
Proceedings of ACL 2017, Student Research Workshop

2016

Learning Monolingual Compositional Representations via Bilingual Supervision
Ahmed Elgohary | Marine Carpuat
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
Steven Bethard | Marine Carpuat | Daniel Cer | David Jurgens | Preslav Nakov | Torsten Zesch
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)
Nathan Schneider | Dirk Hovy | Anders Johannsen | Marine Carpuat
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Sparse Bilingual Word Representations for Cross-lingual Lexical Entailment
Yogarshi Vyas | Marine Carpuat
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Retrofitting Sense-Specific Word Vectors Using Parallel Text
Allyson Ettinger | Philip Resnik | Marine Carpuat
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marine Carpuat | Eneko Agirre | Nora Aranberri
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

Proceedings of the Second Workshop on Discourse in Machine Translation
Bonnie Webber | Marine Carpuat | Andrei Popescu-Belis | Christian Hardmeier
Proceedings of the Second Workshop on Discourse in Machine Translation

Connotation in Translation
Marine Carpuat
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2014

Linear Mixture Models for Robust Machine Translation
Marine Carpuat | Cyril Goutte | George Foster
Proceedings of the Ninth Workshop on Statistical Machine Translation

Mixed Language and Code-Switching in the Canadian Hansard
Marine Carpuat
Proceedings of the First Workshop on Computational Approaches to Code Switching

Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marine Carpuat | Xavier Carreras | Eva Maria Vecchi
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

The NRC System for Discriminating Similar Languages
Cyril Goutte | Serge Léger | Marine Carpuat
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

Cross-lingual Discourse Relation Analysis: A corpus study and a semi-supervised classification system
Junyi Jessy Li | Marine Carpuat | Ani Nenkova
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Assessing the Discourse Factors that Influence the Quality of Machine Translation
Junyi Jessy Li | Marine Carpuat | Ani Nenkova
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

CNRC-TMT: Second Language Writing Assistant System Description
Cyril Goutte | Michel Simard | Marine Carpuat
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

Measuring Machine Translation Errors in New Domains
Ann Irvine | John Morgan | Marine Carpuat | Hal Daumé III | Dragos Munteanu
Transactions of the Association for Computational Linguistics, Volume 1

We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.

NRC: A Machine Translation Approach to Cross-Lingual Word Sense Disambiguation (SemEval-2013 Task 10)
Marine Carpuat
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation
Marine Carpuat | Lucia Specia | Dekai Wu
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

A Semantic Evaluation of Machine Translation Lexical Choice
Marine Carpuat
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

Feature Space Selection and Combination for Native Language Identification
Cyril Goutte | Serge Léger | Marine Carpuat
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

SenseSpotting: Never let your parallel data tie you to an old domain
Marine Carpuat | Hal Daumé III | Katharine Henry | Ann Irvine | Jagadeesh Jagarlamudi | Rachel Rudinger
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

The Trouble with SMT Consistency
Marine Carpuat | Michel Simard
Proceedings of the Seventh Workshop on Statistical Machine Translation

Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Marine Carpuat | Lucia Specia | Dekai Wu
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

2011

Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marianna Apidianaki | Marine Carpuat | Lucia Specia
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

2010

Task-based Evaluation of Multiword Expressions: a Pilot Study in Statistical Machine Translation
Marine Carpuat | Mona Diab
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
Marine Carpuat | Yuval Marton | Nizar Habash
Proceedings of the ACL 2010 Conference Short Papers

2009

Toward Using Morphology in French-English Phrase-Based SMT
Marine Carpuat
Proceedings of the Fourth Workshop on Statistical Machine Translation

One Translation Per Discourse
Marine Carpuat
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

2008

Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
Marine Carpuat | Dekai Wu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

2007

Improving Statistical Machine Translation Using Word Sense Disambiguation
Marine Carpuat | Dekai Wu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Proceedings of the COLING/ACL 2006 Student Research Workshop
Marine Carpuat | Kevin Duh | Rebecca Hwa
Proceedings of the COLING/ACL 2006 Student Research Workshop

Boosting for Chinese Named Entity Recognition
Xiaofeng Yu | Marine Carpuat | Dekai Wu
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Word Sense Disambiguation vs. Statistical Machine Translation
Marine Carpuat | Dekai Wu
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation
Marine Carpuat | Dekai Wu
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

Using N-best lists for Named Entity Recognition from Chinese Speech
Lufeng Zhai | Pascale Fung | Richard Schwartz | Marine Carpuat | Dekai Wu
Proceedings of HLT-NAACL 2004: Short Papers

A Kernel PCA Method for Superior Word Sense Disambiguation
Dekai Wu | Weifeng Su | Marine Carpuat
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Augmenting ensemble classification for Word Sense Disambiguation with a kernel PCA model
Marine Carpuat | Weifeng Su | Dekai Wu
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Semantic role labeling with Boosting, SVMs, Maximum Entropy, SNOW, and Decision Lists
Grace Ngai | Dekai Wu | Marine Carpuat | Chi-Shing Wang | Chi-Yung Wang
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Joining forces to resolve lexical ambiguity: East meets West in Barcelona
Richard Wicentowski | Grace Ngai | Dekai Wu | Marine Carpuat | Emily Thomforde | Adrian Packel
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Why Nitpicking Works: Evidence for Occam’s Razor in Error Correctors
Dekai Wu | Grace Ngai | Marine Carpuat
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Semi-supervised training of a Kernel PCA-Based Model for Word Sense Disambiguation
Weifeng Su | Marine Carpuat | Dekai Wu
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Raising the Bar: Stacked Conservative Error Correction Beyond Boosting
Dekai Wu | Grace Ngai | Marine Carpuat
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

A Stacked, Voted, Stacked Model for Named Entity Recognition
Dekai Wu | Grace Ngai | Marine Carpuat
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

2002

Boosting for Named Entity Recognition
Dekai Wu | Grace Ngai | Marine Carpuat | Jeppe Larsen | Yongsheng Yang
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

Identifying Concepts Across Languages: A First Step towards a Corpus-based Approach to Automatic Ontology Alignment
Grace Ngai | Marine Carpuat | Pascale Fung
COLING 2002: The 19th International Conference on Computational Linguistics