Text Graph Transformer for Document Classification

Haopeng Zhang, Jiawei Zhang


Abstract
Text classification is a fundamental problem in natural language processing. Recent studies have applied graph neural network (GNN) techniques to capture global word co-occurrence in a corpus. However, previous works do not scale to large corpora and ignore the heterogeneity of the text graph. To address these problems, we introduce a novel Transformer-based heterogeneous graph neural network, namely Text Graph Transformer (TG-Transformer). Our model learns effective node representations by capturing structure and heterogeneity from the text graph. We propose a mini-batch text graph sampling method that significantly reduces computing and memory costs, enabling the model to handle large corpora. Extensive experiments have been conducted on several benchmark datasets, and the results demonstrate that TG-Transformer outperforms state-of-the-art approaches on the text classification task.
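The abstract's key scalability idea, sampling a bounded subgraph of the heterogeneous word–document graph per mini-batch instead of propagating over the full graph, can be illustrated with a small sketch. This is a hypothetical simplification (the graph construction, the neighbor budget `num_neighbors`, and the helper names are illustrative assumptions, not the authors' exact algorithm):

```python
import random
from collections import defaultdict

# Hypothetical sketch of mini-batch sampling on a heterogeneous text graph
# with word and document node types; not the paper's exact sampling method.

def build_text_graph(docs):
    """Link each document node to its word nodes (heterogeneous edges)."""
    adj = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            adj[("doc", doc_id)].add(("word", word))
            adj[("word", word)].add(("doc", doc_id))
    return adj

def sample_subgraph(adj, batch_doc_ids, num_neighbors, seed=0):
    """For each document in the batch, keep at most `num_neighbors`
    neighboring word nodes, bounding the subgraph size per batch."""
    rng = random.Random(seed)
    nodes = set()
    for doc_id in batch_doc_ids:
        node = ("doc", doc_id)
        nodes.add(node)
        neighbors = sorted(adj[node])
        if len(neighbors) > num_neighbors:
            neighbors = rng.sample(neighbors, num_neighbors)
        nodes.update(neighbors)
    return nodes

docs = ["graph neural networks", "text classification with graph models"]
adj = build_text_graph(docs)
sub = sample_subgraph(adj, batch_doc_ids=[0], num_neighbors=2)
# sub contains the batch document node plus at most 2 sampled word nodes
```

Because each batch only materializes the sampled neighborhood, memory grows with the batch and neighbor budget rather than with corpus size, which is the property the abstract attributes to the sampling method.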
Anthology ID:
2020.emnlp-main.668
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8322–8327
URL:
https://aclanthology.org/2020.emnlp-main.668
DOI:
10.18653/v1/2020.emnlp-main.668
Cite (ACL):
Haopeng Zhang and Jiawei Zhang. 2020. Text Graph Transformer for Document Classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8322–8327, Online. Association for Computational Linguistics.
Cite (Informal):
Text Graph Transformer for Document Classification (Zhang & Zhang, EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.668.pdf
Video:
https://slideslive.com/38938916