Hui Wang


2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization
Luyao Cheng | Siqi Zheng | Zhang Qinglin | Hui Wang | Yafeng Chen | Qian Chen
Findings of the Association for Computational Linguistics: ACL 2023

Speaker diarization is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in performance degradation in adverse acoustic environments. In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization. We introduce two sub-tasks, Dialogue Detection and Speaker-Turn Detection, in which we effectively extract speaker information from conversational semantics. We also propose a simple yet effective algorithm to jointly model acoustic and semantic information and obtain speaker-identified texts. Experiments on both AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.

Enabling Unsupervised Neural Machine Translation with Word-level Visual Representations
Chengpeng Fu | Xiaocheng Feng | Yichong Huang | Wenshuai Huo | Hui Wang | Bing Qin | Ting Liu
Findings of the Association for Computational Linguistics: EMNLP 2023

Unsupervised neural machine translation has recently made remarkable strides, achieving impressive results with the exclusive use of monolingual corpora. Nonetheless, these methods still exhibit fundamental flaws, such as confusing similar words. A straightforward remedy for this drawback is to employ bilingual dictionaries; however, high-quality bilingual dictionaries can be costly to obtain. To overcome this limitation, we propose a method that incorporates images at the word level to augment the lexical mappings. Specifically, our method inserts visual representations into the model, modifying the corresponding embedding layer information. In addition, a visible matrix is adopted to isolate the impact of images on other unrelated words. Experiments on the Multi30k dataset with over 300,000 self-collected images validate the method's effectiveness in generating more accurate word translations, achieving an improvement of up to +2.81 BLEU score, which is comparable or even superior to using bilingual dictionaries.

FEDLEGAL: The First Real-World Federated Learning Benchmark for Legal NLP
Zhuo Zhang | Xiangjing Hu | Jingyuan Zhang | Yating Zhang | Hui Wang | Lizhen Qu | Zenglin Xu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The unavoidable presence of private information in legal data requires legal artificial intelligence to study privacy-preserving and decentralized learning methods. Federated learning (FL) has emerged as a promising technique for multiple participants to collaboratively train a shared model while efficiently protecting the sensitive data of participants. However, to the best of our knowledge, there is no work on applying FL to legal NLP. To fill this gap, this paper presents the first real-world FL benchmark for legal NLP, coined FEDLEGAL, which comprises five legal NLP tasks and one privacy task based on data from Chinese courts. Based on extensive experiments on these datasets, our results show that FL faces new challenges in terms of real-world non-IID data. The benchmark also encourages researchers to investigate privacy protection using real-world data in the FL setting, as well as model deployment in resource-constrained scenarios. The code and datasets of FEDLEGAL are available here.

2022

CLLE: A Benchmark for Continual Language Learning Evaluation in Multilingual Machine Translation
Han Zhang | Sheng Zhang | Yang Xiang | Bin Liang | Jinsong Su | Zhongjian Miao | Hui Wang | Ruifeng Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Continual Language Learning (CLL) in multilingual translation is inevitable when new languages are required to be translated. Due to the lack of unified and generalized benchmarks, the evaluation of existing methods is greatly influenced by experimental design, which usually differs substantially from industrial demands. In this work, we propose the first Continual Language Learning Evaluation benchmark, CLLE, in multilingual translation. CLLE consists of a Chinese-centric corpus, CN-25, and two CLL tasks: the close-distance language continual learning task and the language family continual learning task, designed for real and disparate demands. Different from existing translation benchmarks, CLLE considers several restrictions for CLL, including domain distribution alignment, content overlap, language diversity, and the balance of the corpus. Furthermore, we propose a novel framework, COMETA, based on Constrained Optimization and META-learning to alleviate catastrophic forgetting and dependency on historical training data by using a meta-model to retain the important parameters for old languages. Our experiments show that CLLE is a challenging CLL benchmark and that our proposed method is effective when compared with other strong baselines. Because the construction of the corpus, the task design, and the evaluation method are independent of the centric language, we also construct and release the English-centric corpus EN-25 to facilitate academic research.

2017

FuRongWang at SemEval-2017 Task 3: Deep Neural Networks for Selecting Relevant Answers in Community Question Answering
Sheng Zhang | Jiajun Cheng | Hui Wang | Xin Zhang | Pei Li | Zhaoyun Ding
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we describe deep neural network frameworks that address the community question answering (cQA) ranking task (SemEval-2017 Task 3). Convolutional neural networks and bi-directional long short-term memory networks are applied in our methods to extract semantic information from questions and answers (comments). In addition, to take full advantage of question-comment semantic relevance, we deploy an interaction layer and augmented features before calculating the similarity. The results show that our methods are highly effective for both subtask A and subtask C.

2012

Identification of Social Acts in Dialogue
David Bracewell | Marc Tomlinson | Hui Wang
Proceedings of COLING 2012

2011

An Exploration into the Use of Contextual Document Clustering for Cluster Sentiment Analysis
Niall Rooney | Hui Wang | Fiona Browne | Fergal Monaghan | Jann Müller | Alan Sergeant | Zhiwei Lin | Philip Taylor | Vladimir Dobrynin
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Lexical Semantics-Syntactic Model for Defining and Subcategorizing Attribute Noun Class
Xiaopeng Bai | Hui Wang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2005

從構式語法理論看漢語詞義研究 (A Construction-Based Approach to Chinese Lexical Semantics) [In Chinese]
Hui Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

2003

The semantic Knowledge-base of Contemporary Chinese and Its Applications in WSD
Hui Wang | Shiwen Yu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

A Large-scale Lexical Semantic Knowledge-base of Chinese
Hui Wang | Shiwen Yu
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

2002

基於組合特徵的漢語名詞詞義消歧 (A Study on Noun Sense Disambiguation Based on Syntagmatic Features) [In Chinese]
Hui Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 7, Number 2, August 2002: Special Issue on Computational Chinese Lexical Semantics