Minwoo Lee


2023

pdf bib
Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation
Minwoo Lee | Hyukhun Koh | Kang-il Lee | Dongdong Zhang | Minsung Kim | Kyomin Jung
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques. However, most works focus on debiasing bilingual models without much consideration for multilingual systems. In this paper, we specifically target the gender bias issue of multilingual machine translation models for unambiguous cases where there is a single correct translation, and propose a bias mitigation method based on a novel approach. Specifically, we propose Gender-Aware Contrastive Learning, GACL, which encodes contextual gender information into the representations of non-explicit gender words. Our method is target language-agnostic and is applicable to pre-trained multilingual machine translation models via fine-tuning. Through multilingual evaluation, we show that our approach improves gender accuracy by a wide margin without hampering translation performance. We also observe that incorporated gender information transfers and benefits other target languages regarding gender accuracy. Finally, we demonstrate that our method is applicable and beneficial to models of various sizes.

pdf bib
Asking Clarification Questions to Handle Ambiguity in Open-Domain QA
Dongryeol Lee | Segwang Kim | Minwoo Lee | Hwanhee Lee | Joonsuk Park | Sang-Woo Lee | Kyomin Jung
Findings of the Association for Computational Linguistics: EMNLP 2023

Ambiguous questions persist in open-domain question answering, because formulating a precise question with a unique answer is often challenging. Previous works have tackled this issue by asking disambiguated questions for all possible interpretations of the ambiguous question. Instead, we propose to ask a clarification question, where the user’s response will help identify the interpretation that best aligns with the user’s intention. We first present CAmbigNQ, a dataset consisting of 5,653 ambiguous questions, each with relevant passages, possible answers, and a clarification question. The clarification questions were efficiently created by generating them using InstructGPT and manually revising them as necessary. We then define a pipeline of three tasks—(1) ambiguity detection, (2) clarification question generation, and (3) clarification-based QA. In the process, we adopt or design appropriate evaluation metrics to facilitate sound research. Lastly, we achieve F1 of 61.3, 25.1, and 40.5 on the three tasks, demonstrating the need for further improvements while providing competitive baselines for future work.

2019

pdf bib
Topological Data Analysis for Discourse Semantics?
Ketki Savle | Wlodek Zadrozny | Minwoo Lee
Proceedings of the 13th International Conference on Computational Semantics - Student Papers

In this paper we present new results on applying topological data analysis to discourse structures. We show that topological information, extracted from the relationships between sentences can be used in inference, namely it can be applied to the very difficult legal entailment given in the COLIEE 2018 data set. Previous results of Doshi and Zadrozny (2018) and Gholizadeh et al. (2018) show that topological features are useful for classification. The applications of computational topology to entailment are novel in our view provide a new set of tools for discourse semantics: computational topology can perhaps provide a bridge between the brittleness of logic and the regression of neural networks. We discuss the advantages and disadvantages of using topological information, and some open problems such as explainability of the classifier decisions.