Yin-Wen Chang


2021

A Simple and Effective Positional Encoding for Transformers
Pu-Chin Chen | Henry Tsai | Srinadh Bhojanapalli | Hyung Won Chung | Yin-Wen Chang | Chun-Sung Ferng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent work has proposed variations of positional encodings, with relative position encodings achieving better performance. Our analysis shows that the gain actually comes from moving positional information from the input to the attention layer. Motivated by this, we introduce Decoupled Positional Attention for Transformers (DIET), a simple yet effective mechanism to encode position and segment information into Transformer models. The proposed method has faster training and inference time, while achieving competitive performance on the GLUE, XTREME and WMT benchmarks. We further generalize our method to long-range transformers and show performance gains.

2017

Source-Side Left-to-Right or Target-Side Left-to-Right? An Empirical Comparison of Two Phrase-Based Decoding Algorithms
Yin-Wen Chang | Michael Collins
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper describes an empirical study of the phrase-based decoding algorithm proposed by Chang and Collins (2017). The algorithm produces a translation by processing the source-language sentence in strictly left-to-right order, differing from commonly used approaches that build the target-language sentence in left-to-right order. Our results show that the new algorithm is competitive with Moses (Koehn et al., 2007) in terms of both speed and BLEU scores.

A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit
Yin-Wen Chang | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 5

Decoding of phrase-based translation models in the general case is known to be NP-complete, by a reduction from the traveling salesman problem (Knight, 1999). In practice, phrase-based systems often impose a hard distortion limit that limits the movement of phrases during translation. However, the impact on complexity after imposing such a constraint is not well studied. In this paper, we describe a dynamic programming algorithm for phrase-based decoding with a fixed distortion limit. The runtime of the algorithm is $O(n \, d! \, l \, h^{d+1})$ where n is the sentence length, d is the distortion limit, l is a bound on the number of phrases starting at any position in the sentence, and h is related to the maximum number of target language translations for any source word. The algorithm makes use of a novel representation that gives a new perspective on decoding of phrase-based models.

2014

A Constrained Viterbi Relaxation for Bidirectional Word Alignment
Yin-Wen Chang | Alexander M. Rush | John DeNero | Michael Collins
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Optimal Beam Search for Machine Translation
Alexander Rush | Yin-Wen Chang | Michael Collins
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2011

Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Yin-Wen Chang | Michael Collins
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing