Graph-based Text Representations: Boosting Text Mining, NLP and Information Retrieval with Graphs

Fragkiskos D. Malliaros, Michalis Vazirgiannis


Abstract
Graphs or networks have been widely used as modeling tools in Natural Language Processing (NLP), Text Mining (TM) and Information Retrieval (IR). Traditionally, the unigram bag-of-words representation is applied; that way, a document is represented as a multiset of its terms, disregarding dependencies between the terms. Although several variants and extensions of this modeling approach have been proposed (e.g., the n-gram model), the main weakness comes from the underlying term independence assumption. The order of the terms within a document is completely disregarded and no relationship between terms is taken into account in the final task (e.g., text categorization). Nevertheless, as the heterogeneity of text collections is increasing (especially with respect to document length and vocabulary), the research community has started exploring different document representations that aim to capture more fine-grained contexts of co-occurrence between different terms, challenging the well-established unigram bag-of-words model. In this direction, graphs constitute a well-developed model that has been adopted for text representation. The goal of this tutorial is to offer a comprehensive presentation of recent methods that rely on graph-based text representations to deal with various tasks in NLP and IR. We will describe basic as well as novel graph-theoretic concepts and we will examine how they can be applied in a wide range of text-related application domains. All the material associated with the tutorial will be available at: http://fragkiskosm.github.io/projects/graph_text_tutorial
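The contrast drawn in the abstract between a multiset of terms and a graph of co-occurring terms can be illustrated with a small sketch. This is not taken from the tutorial material: the function names, the sliding-window construction, the undirected edges and the count-based edge weights are all assumptions chosen for illustration, and other graph-based text representations may be built differently.

```python
from collections import Counter


def bag_of_words(tokens):
    """Unigram bag-of-words: a multiset of terms, with term order discarded."""
    return Counter(tokens)


def graph_of_words(tokens, window=2):
    """Hypothetical co-occurrence graph (illustrative assumption).

    Two distinct terms are linked by an undirected edge when they appear
    within `window` consecutive tokens; the edge weight counts how often
    that happens. Edges are stored as sorted (term, term) pairs.
    """
    edges = Counter()
    for i, term in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens.
        for other in tokens[i + 1 : i + window]:
            if other != term:
                edges[tuple(sorted((term, other)))] += 1
    return edges


# Two toy documents with the same terms in a different order.
doc_a = "graphs capture term order".split()
doc_b = "term graphs order capture".split()

same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)
same_graph = graph_of_words(doc_a) == graph_of_words(doc_b)
print("identical bag-of-words:", same_bag)
print("identical graph-of-words:", same_graph)
```

In this toy case the two documents are indistinguishable as bags of words, while their co-occurrence graphs differ because different terms end up adjacent. A larger `window` links terms that are further apart, at the cost of a denser graph.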
Anthology ID:
D17-3003
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Alexandra Birch, Nathan Schneider
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/D17-3003
Cite (ACL):
Fragkiskos D. Malliaros and Michalis Vazirgiannis. 2017. Graph-based Text Representations: Boosting Text Mining, NLP and Information Retrieval with Graphs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Graph-based Text Representations: Boosting Text Mining, NLP and Information Retrieval with Graphs (Malliaros & Vazirgiannis, EMNLP 2017)