Xuemin Zhao


2022

pdf bib
DialogUSR: Complex Dialogue Utterance Splitting and Reformulation for Multiple Intent Detection
Haoran Meng | Zheng Xin | Tianyu Liu | Zizhen Wang | He Feng | Binghuai Lin | Xuemin Zhao | Yunbo Cao | Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2022

While interacting with chatbots, users may elicit multiple intents in a single dialogue utterance. Instead of training a dedicated multi-intent detection model, we propose DialogUSR, a dialogue utterance splitting and reformulation task that first splits multi-intent user query into several single-intent sub-queries and then recovers all the coreferred and omitted information in the sub-queries. DialogUSR can serve as a plug-in and domain-agnostic module that empowers the multi-intent detection for the deployed chatbots with minimal efforts. We collect a high-quality naturally occurring dataset that covers 23 domains with a multi-step crowd-souring procedure. To benchmark the proposed dataset, we propose multiple action-based generative models that involve end-to-end and two-stage training, and conduct in-depth analyses on the pros and cons of the proposed baselines.

2018

pdf bib
Discriminating between Similar Languages on Imbalanced Conversational Texts
Junqing He | Xian Huang | Xuemin Zhao | Yan Zhang | Yonghong Yan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
HCCL at SemEval-2018 Task 8: An End-to-End System for Sequence Labeling from Cybersecurity Reports
Mingming Fu | Xuemin Zhao | Yonghong Yan
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes HCCL team systems that participated in SemEval 2018 Task 8: SecureNLP (Semantic Extraction from cybersecurity reports using NLP). To solve the problem, our team applied a neural network architecture that benefits from both word and character level representaions automatically, by using combination of Bi-directional LSTM, CNN and CRF (Ma and Hovy, 2016). Our system is truly end-to-end, requiring no feature engineering or data preprocessing, and we ranked 4th in the subtask 1, 7th in the subtask2 and 3rd in the SubTask2-relaxed.

2017

pdf bib
HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity
Junqing He | Long Wu | Xuemin Zhao | Yonghong Yan
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we introduce an approach to combining word embeddings and machine translation for multilingual semantic word similarity, the task2 of SemEval-2017. Thanks to the unsupervised transliteration model, our cross-lingual word embeddings encounter decreased sums of OOVs. Our results are produced using only monolingual Wikipedia corpora and a limited amount of sentence-aligned data. Although relatively little resources are utilized, our system ranked 3rd in the monolingual subtask and can be the 6th in the cross-lingual subtask.