Declan Groves


2022

pdf bib
Leveraging Pre-trained Language Models for Gender Debiasing
Nishtha Jain | Declan Groves | Lucia Specia | Maja Popović
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Studying and mitigating gender and other biases in natural language have become important areas of research from both algorithmic and data perspectives. This paper explores the idea of reducing gender bias in a language generation context by generating gender variants of sentences. Previous work in this field has either been rule-based or required large amounts of gender balanced training data. These approaches are however not scalable across multiple languages, as creating data or rules for each language is costly and time-consuming. This work explores a light-weight method to generate gender variants for a given text using pre-trained language models as the resource, without any task-specific labelled data. The approach is designed to work on multiple languages with minimal changes in the form of heuristics. To showcase that, we have tested it on a high-resourced language, namely Spanish, and a low-resourced language from a different family, namely Serbian. The approach proved to work very well on Spanish, and while the results were less positive for Serbian, it showed potential even for languages where pre-trained models are less effective.

2021

pdf bib
Generating Gender Augmented Data for NLP
Nishtha Jain | Maja Popović | Declan Groves | Eva Vanmassenhove
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

Gender bias is a frequent occurrence in NLP-based applications, especially pronounced in gender-inflected languages. Bias can appear through associations of certain adjectives and animate nouns with the natural gender of referents, but also due to unbalanced grammatical gender frequencies of inflected words. This type of bias becomes more evident in generating conversational utterances where gender is not specified within the sentence, because most current NLP applications still work on a sentence-level context. As a step towards more inclusive NLP, this paper proposes an automatic and generalisable re-writing approach for short conversational sentences. The rewriting method can be applied to sentences that, without extra-sentential context, have multiple equivalent alternatives in terms of gender. The method can be applied both for creating gender balanced outputs as well as for creating gender balanced training data. The proposed approach is based on a neural machine translation system trained to ‘translate’ from one gender alternative to another. Both the automatic and manual analysis of the approach show promising results with respect to the automatic generation of gender alternatives for conversational sentences in Spanish.

2019

pdf bib
Automatic Translation for Software with Safe Velocity
Dag Schmidtke | Declan Groves
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

pdf bib
When less is more in Neural Quality Estimation of Machine Translation. An industry case study
Dimitar Shterionov | Félix Do Carmo | Joss Moorkens | Eric Paquin | Dag Schmidtke | Declan Groves | Andy Way
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

2017

pdf bib
IJCNLP-2017 Task 4: Customer Feedback Analysis
Chao-Hong Liu | Yasufumi Moriya | Alberto Poncelas | Declan Groves
Proceedings of the IJCNLP 2017, Shared Tasks

This document introduces the IJCNLP 2017 Shared Task on Customer Feedback Analysis. In this shared task we have prepared corpora of customer feedback in four languages, i.e. English, French, Spanish and Japanese. They were annotated in a common meanings categorization, which was improved from an ADAPT-Microsoft pivot study on customer feedback. Twenty teams participated in the shared task and twelve of them have submitted prediction results. The results show that performance of prediction meanings of customer feedback is reasonable well in four languages. Nine system description papers are archived in the shared tasks proceeding.

2014

pdf bib
Perception vs. reality: measuring machine translation post-editing productivity
Federico Gaspari | Antonio Toral | Sudip Kumar Naskar | Declan Groves | Andy Way
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas

This paper presents a study of user-perceived vs real machine translation (MT) post-editing effort and productivity gains, focusing on two bidirectional language pairs: English—German and English—Dutch. Twenty experienced media professionals post-edited statistical MT output and also manually translated comparative texts within a production environment. The paper compares the actual post-editing time against the users’ perception of the effort and time required to post-edit the MT output to achieve publishable quality, thus measuring real (vs perceived) productivity gains. Although for all the language pairs users perceived MT post-editing to be slower, in fact it proved to be a faster option than manual translation for two translation directions out of four, i.e. for Dutch to English, and (marginally) for English to German. For further objective scrutiny, the paper also checks the correlation of three state-of-the-art automatic MT evaluation metrics (BLEU, METEOR and TER) with the actual post-editing time.

pdf bib
MTWatch: A Tool for the Analysis of Noisy Parallel Data
Sandipan Dandapat | Declan Groves
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

State-of-the-art statistical machine translation (SMT) technique requires a good quality parallel data to build a translation model. The availability of large parallel corpora has rapidly increased over the past decade. However, often these newly developed parallel data contains contain significant noise. In this paper, we describe our approach for classifying good quality parallel sentence pairs from noisy parallel data. We use 10 different features within a Support Vector Machine (SVM)-based model for our classification task. We report a reasonably good classification accuracy and its positive effect on overall MT accuracy.

2013

pdf bib
Meta-Evaluation of a Diagnostic Quality Metric for Machine Translation
Sudip Kumar Naskar | Antonio Toral | Federico Gaspari | Declan Groves
Proceedings of Machine Translation Summit XIV: Papers

pdf bib
QTLaunchpad
Stephen Doherty | Declan Groves | Josef van Genabith | Arle Lommel | Aljoscha Burchardt | Hans Uszkoreit | Lucia Specia | Stelios Piperidis
Proceedings of Machine Translation Summit XIV: European projects

pdf bib
TMTprime: A Recommender System for MT and TM Integration
Aswarth Abhilash Dara | Sandipan Dandapat | Declan Groves | Josef van Genabith
Proceedings of the 2013 NAACL HLT Demonstration Session

pdf bib
A Web Application for the Diagnostic Evaluation of Machine Translation over Specific Linguistic Phenomena
Antonio Toral | Sudip Kumar Naskar | Joris Vreeke | Federico Gaspari | Declan Groves
Proceedings of the 2013 NAACL HLT Demonstration Session

2009

pdf bib
Identification and Analysis of Post-Editing Patterns for MT
Declan Groves | Dag Schmidtke
Proceedings of Machine Translation Summit XII: Commercial MT User Program

2008

pdf bib
Bringing humans into the loop: Localization with MT at Traslán
Declan Groves
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Keynote Presentations

2006

pdf bib
Contextual Bitext-Derived Paraphrases in Automatic MT Evaluation
Karolina Owczarzak | Declan Groves | Josef Van Genabith | Andy Way
Proceedings on the Workshop on Statistical Machine Translation

pdf bib
Wrapper Syntax for Example-based Machine Translation
Karolina Owczarzak | Bart Mellebeek | Declan Groves | Josef Van Genabith | Andy Way
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

TransBooster is a wrapper technology designed to improve the performance of wide-coverage machine translation systems. Using linguistically motivated syntactic information, it automatically decomposes source language sentences into shorter and syntactically simpler chunks, and recomposes their translation to form target language sentences. This generally improves both the word order and lexical selection of the translation. To date, TransBooster has been successfully applied to rule-based MT, statistical MT, and multi-engine MT. This paper presents the application of TransBooster to Example-Based Machine Translation. In an experiment conducted on test sets extracted from Europarl and the Penn II Treebank we show that our method can raise the BLEU score up to 3.8% relative to the EBMT baseline. We also conduct a manual evaluation, showing that TransBooster-enhanced EBMT produces a better output in terms of fluency than the baseline EBMT in 55% of the cases and in terms of accuracy in 53% of the cases.

pdf bib
Example-Based Machine Translation of the Basque Language
Nicolas Stroppa | Declan Groves | Andy Way | Kepa Sarasola
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Basque is both a minority and a highly inflected language with free order of sentence constituents. Machine Translation of Basque is thus both a real need and a test bed for MT techniques. In this paper, we present a modular Data-Driven MT system which includes different chunkers as well as chunk aligners which can deal with the free order of sentence constituents of Basque. We conducted Basque to English translation experiments, evaluated on a large corpus (270,000 sentence pairs). The experimental results show that our system significantly outperforms state-of-the-art approaches according to several common automatic evaluation metrics.

pdf bib
Hybridity in MT. Experiments on the Europarl Corpus
Declan Groves | Andy Way
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

pdf bib
A Syntactic Skeleton for Statistical Machine Translation
Bart Mellebeek | Karolina Owczarzak | Declan Groves | Josef Van Genabith | Andy Way
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2005

pdf bib
Hybrid Example-Based SMT: the Best of Both Worlds?
Declan Groves | Andy Way
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

pdf bib
Robust Sub-Sentential Alignment of Phrase-Structure Trees
Declan Groves | Mary Hearne | Andy Way
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics