Philip Williams


2021

ELITR Multilingual Live Subtitling: Demo and Strategy
Ondřej Bojar | Dominik Macháček | Sangeet Sagar | Otakar Smrž | Jonáš Kratochvíl | Peter Polák | Ebrahim Ansari | Mohammad Mahmoudi | Rishu Kumar | Dario Franceschini | Chiara Canton | Ivan Simonini | Thai-Son Nguyen | Felix Schneider | Sebastian Stüker | Alex Waibel | Barry Haddow | Rico Sennrich | Philip Williams
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

This paper presents an automatic speech translation system aimed at live subtitling of conference presentations. We describe the overall architecture and key processing components. More importantly, we explain our strategy for building a complex system for end-users from numerous individual components, each of which has been tested only in laboratory conditions. The system is a working prototype that is routinely tested in recognizing English, Czech, and German speech and presenting it translated simultaneously into 42 target languages.

2020

ELITR Non-Native Speech Translation at IWSLT 2020
Dominik Macháček | Jonáš Kratochvíl | Sangeet Sagar | Matúš Žilinec | Ondřej Bojar | Thai-Son Nguyen | Felix Schneider | Philip Williams | Yuekun Yao
Proceedings of the 17th International Conference on Spoken Language Translation

This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-to-end general ASR system, and a hybrid ASR trained on non-native speech. The small validation set provided prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.

Removing European Language Barriers with Innovative Machine Translation Technology
Dario Franceschini | Chiara Canton | Ivan Simonini | Armin Schweinfurth | Adelheid Glott | Sebastian Stüker | Thai-Son Nguyen | Felix Schneider | Thanh-Le Ha | Alex Waibel | Barry Haddow | Philip Williams | Rico Sennrich | Ondřej Bojar | Sangeet Sagar | Dominik Macháček | Otakar Smrž
Proceedings of the 1st International Workshop on Language Technology Platforms

This paper presents our progress towards deploying a versatile communication platform for highly multilingual live speech translation, targeting live subtitling of conferences and remote meetings. The platform has been designed with a focus on very low latency and high flexibility while allowing research prototypes of speech and text processing tools to be easily connected, regardless of where they physically run. We outline our architecture solution and also briefly compare it with the ELG platform. Technical details are provided on the most important components and we summarize the test deployment events we have run so far.

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
Biao Zhang | Philip Williams | Ivan Titov | Rico Sennrich
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ~10 BLEU, approaching conventional pivot-based methods.

ELITR: European Live Translator
Ondřej Bojar | Dominik Macháček | Sangeet Sagar | Otakar Smrž | Jonáš Kratochvíl | Ebrahim Ansari | Dario Franceschini | Chiara Canton | Ivan Simonini | Thai-Son Nguyen | Felix Schneider | Sebastian Stüker | Alex Waibel | Barry Haddow | Rico Sennrich | Philip Williams
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The ELITR (European Live Translator) project aims to create a speech translation system for simultaneous subtitling of conferences and online meetings, targeting up to 43 languages. The technology is tested by the Supreme Audit Office of the Czech Republic and by alfaview®, a German online conferencing system. Other project goals are to advance document-level and multilingual machine translation, automatic speech recognition, and automatic minuting.

The University of Edinburgh’s English-Tamil and English-Inuktitut Submissions to the WMT20 News Translation Task
Rachel Bawden | Alexandra Birch | Radina Dobreva | Arturo Oncevay | Antonio Valerio Miceli Barone | Philip Williams
Proceedings of the Fifth Conference on Machine Translation

We describe the University of Edinburgh’s submissions to the WMT20 news translation shared task for the low-resource language pair English-Tamil and the mid-resource language pair English-Inuktitut. We use the neural machine translation transformer architecture for all submissions and explore a variety of techniques to improve translation quality and compensate for the lack of parallel training data. For the very low-resource English-Tamil, this involves exploring pretraining, using both language model objectives and translation using an unrelated high-resource language pair (German-English), and iterative backtranslation. For English-Inuktitut, we explore the use of multilingual systems, which, despite not being part of the primary submission, would have achieved the best results on the test set.

2019

Samsung and University of Edinburgh’s System for the IWSLT 2019
Joanna Wetesko | Marcin Chochowski | Pawel Przybysz | Philip Williams | Roman Grundkiewicz | Rico Sennrich | Barry Haddow | Antonio Valerio Miceli Barone | Alexandra Birch
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes the joint submission to the IWSLT 2019 English to Czech task by Samsung R&D Institute, Poland, and the University of Edinburgh. Our submission was ultimately produced by combining four Transformer systems through a mixture of ensembling and reranking.

2018

Samsung and University of Edinburgh’s System for the IWSLT 2018 Low Resource MT Task
Philip Williams | Marcin Chochowski | Pawel Przybysz | Rico Sennrich | Barry Haddow | Alexandra Birch
Proceedings of the 15th International Conference on Spoken Language Translation

This paper describes the joint submission to the IWSLT 2018 Low Resource MT task by Samsung R&D Institute, Poland, and the University of Edinburgh. We focused on supplementing the very limited in-domain Basque-English training data with out-of-domain data, with synthetic data, and with data for other language pairs. We also experimented with a variety of model architectures and features, which included the development of extensions to the Nematus toolkit. Our submission was ultimately produced by a system combination in which we reranked translations from our strongest individual system using multiple weaker systems.

2017

The QT21 Combined Machine Translation System for English to Latvian
Jan-Thorsten Peter | Hermann Ney | Ondřej Bojar | Ngoc-Quan Pham | Jan Niehues | Alex Waibel | Franck Burlot | François Yvon | Mārcis Pinnis | Valters Šics | Jasmijn Bastings | Miguel Rios | Wilker Aziz | Philip Williams | Frédéric Blain | Lucia Specia
Proceedings of the Second Conference on Machine Translation

The University of Edinburgh’s Neural MT Systems for WMT17
Rico Sennrich | Alexandra Birch | Anna Currey | Ulrich Germann | Barry Haddow | Kenneth Heafield | Antonio Valerio Miceli Barone | Philip Williams
Proceedings of the Second Conference on Machine Translation

2016

Edinburgh’s Statistical Machine Translation Systems for WMT16
Philip Williams | Rico Sennrich | Maria Nădejde | Matthias Huck | Barry Haddow | Ondřej Bojar
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

Edinburgh’s Syntax-Based Systems at WMT 2015
Philip Williams | Rico Sennrich | Maria Nadejde | Matthias Huck | Philipp Koehn
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Using Feature Structures to Improve Verb Translation in English-to-German Statistical MT
Philip Williams | Philipp Koehn
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

EU-BRIDGE MT: Combined Machine Translation
Markus Freitag | Stephan Peitz | Joern Wuebker | Hermann Ney | Matthias Huck | Rico Sennrich | Nadir Durrani | Maria Nadejde | Philip Williams | Philipp Koehn | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

Edinburgh’s Syntax-Based Systems at WMT 2014
Philip Williams | Rico Sennrich | Maria Nadejde | Matthias Huck | Eva Hasler | Philipp Koehn
Proceedings of the Ninth Workshop on Statistical Machine Translation

Syntax-Based Statistical Machine Translation
Philip Williams | Philipp Koehn
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

The tutorial explains in detail syntax-based statistical machine translation with synchronous context-free grammars (SCFG). It is aimed at researchers who have little background in this area, and gives a comprehensive overview of the main models and methods. While syntax-based models in statistical machine translation have a long history, spanning back almost 20 years, they have only recently shown superior translation quality over the more commonly used phrase-based models, and are now considered state of the art for some language pairs, such as Chinese-English (since ISI's submission to NIST 2006), and English-German (since Edinburgh's submission to WMT 2012). While the field is very dynamic, there is a core set of methods that have become dominant. Such SCFG models are implemented in the open source machine translation toolkit Moses, and the tutors draw from the practical experience of its development. The tutorial focuses on explaining core established concepts in SCFG-based approaches, which are the most popular in this area. The main goal of the tutorial is for the audience to understand how these systems work end-to-end. We review as much relevant literature as necessary, but the tutorial is not primarily a research survey. The tutorial is rounded off with open problems and advanced topics, such as computational challenges, different formalisms for syntax-based models and inclusion of semantics.

2013

Edinburgh’s Syntax-Based Machine Translation Systems
Maria Nadejde | Philip Williams | Philipp Koehn
Proceedings of the Eighth Workshop on Statistical Machine Translation

Learning to Prune: Context-Sensitive Pruning for Syntactic MT
Wenduan Xu | Yue Zhang | Philip Williams | Philipp Koehn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

GHKM Rule Extraction and Scope-3 Parsing in Moses
Philip Williams | Philipp Koehn
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

Agreement Constraints for Statistical Machine Translation into German
Philip Williams | Philipp Koehn
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

More Linguistic Annotation for Statistical Machine Translation
Philipp Koehn | Barry Haddow | Philip Williams | Hieu Hoang
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR