Sandeep Subramanian


2022

pdf bib
NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022
Oleksii Hrinchuk | Vahid Noroozi | Abhinav Khattar | Anton Peganov | Sandeep Subramanian | Somshubra Majumdar | Oleksii Kuchaiev
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper provides an overview of NVIDIA NeMo’s speech translation systems for the IWSLT 2022 Offline Speech Translation Task. Our cascade system consists of 1) Conformer RNN-T automatic speech recognition model, 2) punctuation-capitalization model based on pre-trained T5 encoder, 3) ensemble of Transformer neural machine translation models fine-tuned on TED talks. Our end-to-end model has less parameters and consists of Conformer encoder and Transformer decoder. It relies on the cascade system by re-using its pre-trained ASR encoder and training on synthetic translations generated with the ensemble of NMT models. Our En->De cascade and end-to-end systems achieve 29.7 and 26.2 BLEU on the 2020 test set correspondingly, both outperforming the previous year’s best of 26 BLEU.

2021

pdf bib
NVIDIA NeMo’s Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21
Sandeep Subramanian | Oleksii Hrinchuk | Virginia Adams | Oleksii Kuchaiev
Proceedings of the Sixth Conference on Machine Translation

This paper provides an overview of NVIDIA NeMo’s neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks. Our news task submissions for English-German (En-De) and English-Russian (En-Ru) are built on top of a baseline transformer-based sequence-to-sequence model (CITATION). Specifically, we use a combination of 1) checkpoint averaging 2) model scaling 3) data augmentation with backtranslation and knowledge distillation from right-to-left factorized models 4) finetuning on test sets from previous years 5) model ensembling 6) shallow fusion decoding with transformer language models and 7) noisy channel re-ranking. Additionally, our biomedical task submission for English Russian uses a biomedically biased vocabulary and is trained from scratch on news task data, medically relevant text curated from the news task dataset, and biomedical data provided by the shared task. Our news system achieves a sacreBLEU score of 39.5 on the WMT’20 En-De test set outperforming the best submission from last year’s task of 38.8. Our biomedical task Ru-En and En-Ru systems reach BLEU scores of 43.8 and 40.3 respectively on the WMT’20 Biomedical Task Test set, outperforming the previous year’s best submissions.

2020

pdf bib
On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
Jonathan Pilault | Raymond Li | Sandeep Subramanian | Chris Pal
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarization. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher ROUGE scores. We provide extensive comparisons with strong baseline methods, prior state of the art work as well as multiple variants of our approach including those using only transformers, only extractive techniques and combinations of the two. We examine these models using four different summarization tasks and datasets: arXiv papers, PubMed papers, the Newsroom and BigPatent datasets. We find that transformer based methods produce summaries with fewer n-gram copies, leading to n-gram copying statistics that are more similar to human generated abstracts. We include a human evaluation, finding that transformers are ranked highly for coherence and fluency, but purely extractive methods score higher for informativeness and relevance. We hope that these architectures and experiments may serve as strong points of comparison for future work. Note: The abstract above was collaboratively written by the authors and one of the models presented in this paper based on an earlier draft of this paper.

2019

pdf bib
Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study
Chinnadhurai Sankar | Sandeep Subramanian | Chris Pal | Sarath Chandar | Yoshua Bengio
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural generative models have been become increasingly popular when building conversational agents. They offer flexibility, can be easily adapted to new domains, and require minimal domain engineering. A common criticism of these systems is that they seldom understand or use the available dialog history effectively. In this paper, we take an empirical approach to understanding how these models use the available dialog history by studying the sensitivity of the models to artificially introduced unnatural changes or perturbations to their context at test time. We experiment with 10 different types of perturbations on 4 multi-turn dialog datasets and find that commonly used neural dialog architectures like recurrent and transformer-based seq2seq models are rarely sensitive to most perturbations such as missing or reordering utterances, shuffling words, etc. Also, by open-sourcing our code, we believe that it will serve as a useful diagnostic tool for evaluating dialog systems in the future.

2018

pdf bib
Neural Models for Key Phrase Extraction and Question Generation
Sandeep Subramanian | Tong Wang | Xingdi Yuan | Saizheng Zhang | Adam Trischler | Yoshua Bengio
Proceedings of the Workshop on Machine Reading for Question Answering

We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings.

2017

pdf bib
Machine Comprehension by Text-to-Text Neural Question Generation
Xingdi Yuan | Tong Wang | Caglar Gulcehre | Alessandro Sordoni | Philip Bachman | Saizheng Zhang | Sandeep Subramanian | Adam Trischler
Proceedings of the 2nd Workshop on Representation Learning for NLP

We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we fine-tune the model using policy gradient techniques to maximize several rewards that measure question quality. Most notably, one of these rewards is the performance of a question-answering system. We motivate question generation as a means to improve the performance of question answering systems. Our model is trained and evaluated on the recent question-answering dataset SQuAD.

pdf bib
Adversarial Generation of Natural Language
Sandeep Subramanian | Sai Rajeswar | Francis Dutil | Chris Pal | Aaron Courville
Proceedings of the 2nd Workshop on Representation Learning for NLP

Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

2016

pdf bib
Neural Architectures for Named Entity Recognition
Guillaume Lample | Miguel Ballesteros | Sandeep Subramanian | Kazuya Kawakami | Chris Dyer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies