Qi Su


2023

Alleviating Exposure Bias via Multi-level Contrastive Learning and Deviation Simulation in Abstractive Summarization
Jiawen Xie | Qi Su | Shaoting Zhang | Xiaofan Zhang
Findings of the Association for Computational Linguistics: ACL 2023

Most Transformer-based abstractive summarization systems have a severe mismatch between training and inference, i.e., exposure bias. From diverse perspectives, we introduce a simple multi-level contrastive learning framework for abstractive summarization (SimMCS) and a tailored sparse decoder self-attention pattern (SDSA) to bridge the gap between training and inference and improve model performance. Compared with previous contrastive objectives focusing only on the relative order of probability mass assigned to non-gold summaries, SimMCS additionally takes their absolute positions into account, which guarantees that the relatively high-quality (positive) summaries among them are properly assigned high probability mass, and further enhances the capability of discriminating summary quality beyond exploiting potential artifacts of specific metrics. SDSA simulates the possible inference scenarios of deviation in the training phase to get closer to the ideal paradigm. Our approaches outperform the previous state-of-the-art results on two summarization datasets while adding fairly low overhead. Further empirical analysis shows our model preserves the advantages of prior contrastive methods and possesses strong few-shot learning ability.

DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models
Shengguang Wu | Mei Yuan | Qi Su
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent advances in image and video creation, especially AI-based image synthesis, have led to the production of numerous visual scenes that exhibit a high level of abstractness and diversity. Consequently, Visual Storytelling (VST), a task that involves generating meaningful and coherent narratives from a collection of images, has become even more challenging and is increasingly desired beyond real-world imagery. While existing VST techniques, which typically use autoregressive decoders, have made significant progress, they suffer from low inference speed and are not well-suited for synthetic scenes. To this end, we propose a novel diffusion-based system DiffuVST, which models the generation of a series of visual descriptions as a single conditional denoising process. The stochastic and non-autoregressive nature of DiffuVST at inference time allows it to generate highly diverse narratives more efficiently. In addition, DiffuVST features a unique design with bi-directional text history guidance and multimodal adapter modules, which effectively improve inter-sentence coherence and image-to-text fidelity. Extensive experiments on the story generation task covering four fictional visual-story datasets demonstrate the superiority of DiffuVST over traditional autoregressive models in terms of both text quality and inference speed.

CCL23-Eval任务1总结报告:古籍命名实体识别(GuNER2023)(Overview of CCL23-Eval Task 1: Named Entity Recognition in Ancient Chinese Books)
Qi Su (苏祺) | Yingying Wang (王莹莹) | Zekun Deng (邓泽琨) | Hao Yang (杨浩) | Jun Wang (王军)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

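The 23rd Chinese National Conference on Computational Linguistics (CCL) proposed 10 evaluation tasks in Chinese information processing. Task 1, named entity recognition in ancient Chinese books, was organized by the Research Center for Digital Humanities and the Institute for Artificial Intelligence at Peking University. The main goal of the task is to automatically identify the important entities that form the basic elements of events in ancient texts, providing a foundation for analyzing and processing Classical Chinese text. The evaluation released the "Twenty-Four Histories" dataset, which covers multiple dynasties and domains and contains over 150,000 characters and more than 10,000 entities of three types: person names, book titles, and official titles. A closed track and an open track were set up, focusing on the applicability of pre-trained models of different sizes. A total of 127 teams registered for the task. On the closed track, the best system reached an F1 score of 96.15% on the test set; on the open track, the best performance reached an F1 score of 95.48%.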

2022

Unsupervised Chinese Word Segmentation with BERT Oriented Probing and Transformation
Wei Li | Yuhan Song | Qi Su | Yanqiu Shao
Findings of the Association for Computational Linguistics: ACL 2022

Word segmentation is a fundamental step in understanding the Chinese language. Previous neural approaches to unsupervised Chinese Word Segmentation (CWS) exploit only shallow semantic information, which can miss important context. Large-scale pre-trained language models (PLMs) have achieved great success in many areas because of their ability to capture deep contextual semantic relations. In this paper, we propose to take advantage of the deep semantic information embedded in a PLM (e.g., BERT) in a self-training manner, which iteratively probes and transforms the semantic information in the PLM into explicit word segmentation ability. Extensive experimental results show that our proposed approach achieves state-of-the-art F1 scores on two CWS benchmark datasets.

Dim-Krum: Backdoor-Resistant Federated Learning for NLP with Dimension-wise Krum-Based Aggregation
Zhiyuan Zhang | Qi Su | Xu Sun
Findings of the Association for Computational Linguistics: EMNLP 2022

Despite the potential of federated learning, it is known to be vulnerable to backdoor attacks. Many robust federated aggregation methods have been proposed to reduce the potential backdoor risk. However, they are mainly validated in the CV field. In this paper, we find that NLP backdoors are harder to defend against than CV backdoors, and we provide a theoretical analysis showing that the malicious update detection error probabilities are determined by the relative backdoor strengths. NLP attacks tend to have small relative backdoor strengths, which may result in the failure of robust federated aggregation methods for NLP attacks. Inspired by the theoretical results, we can choose some dimensions with higher backdoor strengths to settle this issue. We propose a novel federated aggregation algorithm, Dim-Krum, for NLP tasks, and experimental results validate its effectiveness.

That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory
Xuemei Tang | Qi Su
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The evolution of language follows the rule of gradual change. Grammar, vocabulary, and lexical semantic shifts take place over time, resulting in a diachronic linguistic gap. As such, a considerable number of texts are written in languages of different eras, which creates obstacles for natural language processing tasks, such as word segmentation and machine translation. Although the Chinese language has a long history, previous Chinese natural language processing research has primarily focused on tasks within a specific era. Therefore, we propose a cross-era learning framework for Chinese word segmentation (CWS), CROSSWISE, which uses the Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show that performance on each corpus improves significantly. Further analyses also demonstrate that the SM can effectively integrate the knowledge of the eras into the neural network.

GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Zhiyuan Zhang | Ruixuan Luo | Qi Su | Xu Sun
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recently, the Sharpness-Aware Minimization (SAM) algorithm has shown state-of-the-art generalization abilities in vision tasks. It demonstrates that flat minima tend to imply better generalization abilities. However, it is difficult to apply SAM to some natural language tasks, especially to models with drastic gradient changes, such as RNNs. In this work, we analyze the relation between the flatness of the local minimum and its generalization ability from a novel and straightforward theoretical perspective. We propose that the shift between the training and test distributions can be equivalently seen as a virtual parameter corruption or perturbation, which can explain why flat minima that are robust against parameter corruptions or perturbations have better generalization performance. On this basis, we propose a Gradient-Strength based Adaptive Sharpness-Aware Minimization (GA-SAM) algorithm to help learning algorithms find flat minima that generalize better. Results on various language benchmarks validate the effectiveness of the proposed GA-SAM algorithm on natural language tasks.

2021

基于预训练语言模型的繁体古文自动句读研究(Automatic Traditional Ancient Chinese Texts Segmentation and Punctuation Based on Pre-training Language Model)
Xuemei Tang (唐雪梅) | Qi Su (苏祺) | Jun Wang (王军) | Yuhang Chen (陈雨航) | Hao Yang (杨浩)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

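Unprocessed ancient texts contain no punctuation, which does not match the reading habits of modern readers; segmenting and punctuating them aids reading, research, and publication. This paper proposes a framework for automatic sentence segmentation and punctuation of traditional-character Classical Chinese based on a pre-trained language model. We compiled a corpus of about 1 billion characters of traditional-character Classical Chinese and used it for incremental training of a language model, on top of which we implement automatic sentence segmentation and punctuation. Experiments show that the language model incrementally trained on this large-scale corpus represents the semantics of Classical Chinese better and helps improve automatic sentence segmentation and punctuation of traditional-character texts. With the incrementally trained model, sentence segmentation reaches an F1 score of 95.03% and punctuation an F1 score of 80.18%, improvements of 1.83% and 2.21% respectively over the language model without incremental training. To address the low efficiency of existing document-level segmentation schemes, we improve the serial sliding-window approach of prior work, which raises efficiency to some extent, and propose a new parallel sliding-window approach that segments long texts efficiently and accurately.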

A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models
Kaiyuan Liao | Yi Zhang | Xuancheng Ren | Qi Su | Xu Sun | Bin He
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The early exit mechanism aims to accelerate the inference of large-scale pre-trained language models. The essential idea is to exit early without passing through all the inference layers at the inference stage. To make accurate predictions for downstream tasks, the hierarchical linguistic information embedded in all layers should be jointly considered. However, much of the research up to now has been limited to using local representations of the exit layer. Such treatment inevitably loses information of the unused past layers as well as the high-level features embedded in future layers, leading to sub-optimal performance. To address this issue, we propose a novel Past-Future method to make comprehensive predictions from a global perspective. We first take into consideration all the linguistic information embedded in the past layers and then take a further step to engage the future information which is originally inaccessible for predictions. Extensive experiments demonstrate that our method outperforms previous early exit methods by a large margin, yielding better and more robust performance.

Neural Network Surgery: Injecting Data Patterns into Pre-trained Models with Minimal Instance-wise Side Effects
Zhiyuan Zhang | Xuancheng Ren | Qi Su | Xu Sun | Bin He
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Side effects during neural network tuning are typically measured by overall accuracy changes. However, we find that even with similar overall accuracy, existing tuning methods result in non-negligible instance-wise side effects. Motivated by neuroscientific evidence and theoretical results, we demonstrate that side effects can be controlled by the number of changed parameters and thus, we propose to conduct neural network surgery by only modifying a limited number of parameters. Neural network surgery can be realized using diverse techniques and we investigate three lines of methods. Experimental results on representative tuning problems validate the effectiveness of the surgery approach. The dynamic selecting method achieves the best overall performance that not only satisfies the tuning goal but also induces fewer instance-wise side effects by changing only 10^-5 of the parameters.

2020

Using Conceptual Norms for Metaphor Detection
Mingyu Wan | Kathleen Ahrens | Emmanuele Chersoni | Menghan Jiang | Qi Su | Rong Xiang | Chu-Ren Huang
Proceedings of the Second Workshop on Figurative Language Processing

This paper reports a linguistically-enriched method of detecting token-level metaphors for the second shared task on Metaphor Detection. We participate in all four phases of competition with both datasets, i.e. Verbs and AllPOS on the VUA and the TOEFL datasets. We use the modality exclusivity and embodiment norms for constructing a conceptual representation of the nodes and the context. Our system obtains an F-score of 0.652 for the VUA Verbs track, which is 5% higher than the strong baselines. The experimental results across models and datasets indicate the salient contribution of using modality exclusivity and modality shift information for predicting metaphoricity.

Sensorimotor Enhanced Neural Network for Metaphor Detection
Mingyu Wan | Baixi Xing | Qi Su | Pengyuan Liu | Chu-Ren Huang
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Sina Mandarin Alphabetical Words: A Web-driven Code-mixing Lexical Resource
Rong Xiang | Mingyu Wan | Qi Su | Chu-Ren Huang | Qin Lu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

The Mandarin Alphabetical Word (MAW) is an indispensable component of Modern Chinese that demonstrates unique code-mixing idiosyncrasies influenced by language exchanges. Yet, this interesting phenomenon has not been properly addressed and is mostly excluded from the Chinese language system. This paper addresses the core problem of MAW identification and proposes to construct a large collection of MAWs from Sina Weibo (SMAW) using an automatic web-based technique which includes rule-based identification, informatics-based extraction, as well as Baidu search engine validation. A collection of 16,207 qualified SMAWs is obtained using this technique, along with an annotated corpus of more than 200,000 sentences for linguistic research and applicable inquiries.

metaCAT: A Metadata-based Task-oriented Chatbot Annotation Tool
Ximing Liu | Wei Xue | Qi Su | Weiran Nie | Wei Peng
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations

Creating high-quality annotated dialogue corpora is challenging. It is essential to develop practical annotation tools to support humans in this time-consuming and error-prone task. We present metaCAT, which is an open-source web-based annotation tool designed specifically for developing task-oriented dialogue data. To the best of our knowledge, metaCAT is the first annotation tool that provides comprehensive metadata annotation coverage to the domain, intent, and span information. The data annotation quality is enhanced by a real-time annotation constraint-checking mechanism. An Automatic Speech Recognition (ASR) function is implemented to allow users to paraphrase and create more diversified annotated utterances. metaCAT is publicly available for the community.

Pretrain-KGE: Learning Knowledge Representation from Pretrained Language Models
Zhiyuan Zhang | Xiaoqian Liu | Yi Zhang | Qi Su | Xu Sun | Bin He
Findings of the Association for Computational Linguistics: EMNLP 2020

Conventional knowledge graph embedding (KGE) often suffers from limited knowledge representation, leading to performance degradation especially on the low-resource problem. To remedy this, we propose to enrich knowledge representation via pretrained language models by leveraging world knowledge from pretrained models. Specifically, we present a universal training framework named Pretrain-KGE consisting of three phases: semantic-based fine-tuning phase, knowledge extracting phase and KGE training phase. Extensive experiments show that our proposed Pretrain-KGE can improve results over KGE models, especially on solving the low-resource problem.

汉语竞争类多人游戏语言中疑问句的形式与功能(The Form and Function of Interrogatives in Multi-party Chinese Competitive Game Conversation)
Wenxian Zhang (张文贤) | Qi Su (苏琪)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

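Based on a self-built corpus of multi-party competitive game conversations, this paper examines the forms and functions of Chinese interrogatives. Building on previous research, we first divide interrogatives into five major types and then examine the positions and functions of the different types in conversation. The study shows that yes-no questions (including A-not-A questions) and wh-questions are the most common types, while alternative questions are used least frequently. Most interrogatives trigger turn-taking and serve an inquiring function; in addition, negation and pointing out facts are also major functions of interrogatives. The negating function of wh-questions and the fact-indicating function of tag questions are particularly prominent.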

2019

Specificity-Driven Cascading Approach for Unsupervised Sentiment Modification
Pengcheng Yang | Junyang Lin | Jingjing Xu | Jun Xie | Qi Su | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The task of unsupervised sentiment modification aims to reverse the sentiment polarity of the input text while preserving its semantic content without any parallel data. Most previous work follows a two-step process. They first separate the content from the original sentiment, and then directly generate text with the target sentiment only based on the content produced by the first step. However, the second step bears both the target sentiment addition and content reconstruction, thus resulting in a lack of specific information like proper nouns in the generated text. To remedy this, we propose a specificity-driven cascading approach in this work, which can effectively increase the specificity of the generated text and further improve content preservation. In addition, we propose a more reasonable metric to evaluate sentiment modification. The experiments show that our approach outperforms competitive baselines by a large margin, which achieves 11% and 38% relative improvements of the overall metric on the Yelp and Amazon datasets, respectively.

Group, Extract and Aggregate: Summarizing a Large Amount of Finance News for Forex Movement Prediction
Deli Chen | Shuming Ma | Keiko Harimoto | Ruihan Bao | Qi Su | Xu Sun
Proceedings of the Second Workshop on Economics and Natural Language Processing

Incorporating related text information has proven successful in stock market prediction. However, it is a huge challenge to utilize texts in the enormous forex (foreign currency exchange) market because the associated texts are too redundant. In this work, we propose a BERT-based Hierarchical Aggregation Model to summarize a large amount of finance news to predict forex movement. We firstly group news from different aspects: time, topic and category. Then we extract the most crucial news in each group by the SOTA extractive summarization method. Finally, we conduct interaction between the news and the trade data with attention to predict the forex movement. The experimental results show that the category based method performs best among three grouping methods and outperforms all the baselines. Besides, we study the influence of essential news attributes (category and region) by statistical analysis and summarize the influence patterns for different currency pairs.

2018

Deconvolution-Based Global Decoding for Neural Machine Translation
Junyang Lin | Xu Sun | Xuancheng Ren | Shuming Ma | Jinsong Su | Qi Su
Proceedings of the 27th International Conference on Computational Linguistics

A large proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt a Recurrent Neural Network (RNN) to generate the translation word by word in sequential order. As linguistic studies have shown that language is not a linear word sequence but a sequence with complex structure, translation at each step should be conditioned on the whole target-side context. To tackle the problem, we propose a new NMT model that decodes the sequence with the guidance of its structural prediction of the context of the target sequence. Our model generates the translation based on the structural prediction of the target-side context so that the translation can be freed from the constraint of sequential order. Experimental results demonstrate that our model is more competitive than the state-of-the-art methods, and the analysis shows that our model is also robust when translating sentences of different lengths and that it reduces repetition, guided by the target-side context during decoding.

Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
Junyang Lin | Xu Sun | Xuancheng Ren | Muyu Li | Qi Su
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Most Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats the decoding at each time step equally with the same matrix, which is problematic since the softness of the attention for different types of words (e.g. content words and function words) should differ. Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. Experimental results on Chinese-English translation and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and the case study show that our model can attend to the most relevant elements in the source-side contexts and generate high-quality translations.

Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification
Junyang Lin | Qi Su | Pengcheng Yang | Shuming Ma | Xu Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a novel model for multi-label text classification, which is based on sequence-to-sequence learning. The model generates higher-level semantic unit representations with multi-level dilated convolution as well as a corresponding hybrid attention mechanism that extracts information at both the word level and the semantic-unit level. Our designed dilated convolution effectively reduces dimension and supports an exponential expansion of receptive fields without loss of local information, and the attention-over-attention mechanism is able to capture more summary-relevant information from the source context. Results of our experiments show that the proposed model has significant advantages over the baseline models on the RCV1-V2 and Ren-CECps datasets, and our analysis demonstrates that our model is competitive with the deterministic hierarchical models and is more robust when classifying low-frequency labels.

Structure Regularized Neural Network for Entity Relation Classification for Chinese Literature Text
Ji Wen | Xu Sun | Xuancheng Ren | Qi Su
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Relation classification is an important semantic processing task in the field of natural language processing. In this paper, we propose the task of relation classification for Chinese literature text. A new dataset of Chinese literature text is constructed to facilitate the study in this task. We present a novel model, named Structure Regularized Bidirectional Recurrent Convolutional Neural Network (SR-BRCNN), to identify the relation between entities. The proposed model learns relation representations along the shortest dependency path (SDP) extracted from the structure regularized dependency tree, which has the benefits of reducing the complexity of the whole model. Experimental results show that the proposed method significantly improves the F1 score by 10.3, and outperforms the state-of-the-art approaches on Chinese literature text.

Global Encoding for Abstractive Summarization
Junyang Lin | Xu Sun | Shuming Ma | Qi Su
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In neural abstractive summarization, the conventional sequence-to-sequence (seq2seq) model often suffers from repetition and semantic irrelevance. To tackle the problem, we propose a global encoding framework, which controls the information flow from the encoder to the decoder based on the global information of the source context. It consists of a convolutional gated unit that performs global encoding to improve the representations of the source-side information. Evaluations on LCSTS and the English Gigaword both demonstrate that our model outperforms the baseline models, and the analysis shows that our model is capable of generating summaries of higher quality and reducing repetition.

2017

Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization
Shuming Ma | Xu Sun | Jingjing Xu | Houfeng Wang | Wenjie Li | Qi Su
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Current Chinese social media text summarization models are based on an encoder-decoder framework. Although their generated summaries are literally similar to the source texts, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. In addition, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms baseline systems on a social media corpus.

2014

Guo1 and Guo2 in Chinese Temporal System
Zhuang Qiu | Qi Su
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2010

Evidentiality for Text Trustworthiness Detection
Qi Su | Chu-Ren Huang | Kai-yun Chen
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

Incorporate Credibility into Context for the Best Social Media Answers
Qi Su | Helen Kai-yun Chen | Chu-Ren Huang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation