Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph

Zheng Zhang, Pierre Zweigenbaum, Ruiqing Yin


Abstract
Corpus2graph is an open-source, NLP-application-oriented tool that generates a word co-occurrence network from a large corpus. It not only provides built-in methods to preprocess words, analyze sentences, extract word pairs and define edge weights, but also supports user-defined functions. Using parallelization techniques, it can generate a large word co-occurrence network from the whole English Wikipedia within hours. Thanks to its three-level progressive calculation design (nodes, then edges, then weights), rebuilding a network with a different configuration is even faster, because it does not need to start from scratch. The tool can also serve as a front end to other graph libraries such as igraph, NetworkX and graph-tool, supplying them with data to speed up network generation.
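The abstract describes a pipeline of word preprocessing, window-based word-pair extraction and edge weighting, computed in three progressive levels. The Python sketch below illustrates that idea under stated assumptions: every function name (`preprocess`, `build_nodes`, `build_edges`, `build_weights`, `to_edge_list`) is hypothetical and is not the corpus2graph API, the tokenizer is a whitespace stand-in, and an edge weight is assumed to be the number of co-occurrences within a chosen window. Caching pair counts separately for each distance is what lets a network with a narrower window be rebuilt without rereading the corpus.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

Pair = Tuple[int, int]


def preprocess(sentence: str) -> List[str]:
    """Assumed word processing: lowercase and split on whitespace.
    (A stand-in; the real tool offers its own preprocessing options.)"""
    return sentence.lower().split()


def build_nodes(sentences: List[List[str]]) -> Dict[str, int]:
    """Level 1 (nodes): assign each distinct word an integer id."""
    vocab: Dict[str, int] = {}
    for tokens in sentences:
        for word in tokens:
            vocab.setdefault(word, len(vocab))
    return vocab


def build_edges(sentences: List[List[str]], vocab: Dict[str, int],
                max_window: int) -> DefaultDict[int, Counter]:
    """Level 2 (edges): count undirected word pairs separately for each
    distance 1..max_window, so any smaller window can be derived later."""
    edges: DefaultDict[int, Counter] = defaultdict(Counter)
    for tokens in sentences:
        ids = [vocab[w] for w in tokens]
        for i, a in enumerate(ids):
            # Look ahead at most max_window tokens within the same sentence.
            for d in range(1, max_window + 1):
                j = i + d
                if j >= len(ids):
                    break
                b = ids[j]
                if a == b:
                    continue  # assumed design choice: no self-loops
                edges[d][(min(a, b), max(a, b))] += 1
    return edges


def build_weights(edges: DefaultDict[int, Counter], window: int) -> Dict[Pair, int]:
    """Level 3 (weights): weight = co-occurrences at distance <= window,
    summed from the cached level-2 counts without rereading the corpus.
    Distances beyond the cached max_window simply contribute nothing."""
    weights: Counter = Counter()
    for d in range(1, window + 1):
        weights.update(edges.get(d, Counter()))
    return dict(weights)


def to_edge_list(weights: Dict[Pair, int]) -> List[str]:
    """Serialize as sorted 'source target weight' lines, a plain format
    that graph libraries can load (e.g. NetworkX read_weighted_edgelist)."""
    return [f"{a} {b} {w}" for (a, b), w in sorted(weights.items())]


if __name__ == "__main__":
    corpus = ["The cat sat on the mat", "The dog sat"]
    sentences = [preprocess(s) for s in corpus]
    vocab = build_nodes(sentences)
    edges = build_edges(sentences, vocab, max_window=3)  # one corpus pass
    narrow = build_weights(edges, window=1)              # no corpus rescan
    wide = build_weights(edges, window=3)
    print(len(narrow), len(wide))  # prints: 7 11
```

Because `build_edges` keeps counts per distance, trying several window sizes costs only a pass over the cached counts. Parallelizing the corpus scan (for example over corpus shards), which the abstract credits for its Wikipedia-scale speed, is deliberately left out of this sketch.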
Anthology ID:
W18-1702
Volume:
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana, USA
Editors:
Goran Glavaš, Swapna Somasundaran, Martin Riedl, Eduard Hovy
Venue:
TextGraphs
Publisher:
Association for Computational Linguistics
Pages:
7–11
URL:
https://aclanthology.org/W18-1702
DOI:
10.18653/v1/W18-1702
Cite (ACL):
Zheng Zhang, Pierre Zweigenbaum, and Ruiqing Yin. 2018. Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph. In Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12), pages 7–11, New Orleans, Louisiana, USA. Association for Computational Linguistics.
Cite (Informal):
Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph (Zhang et al., TextGraphs 2018)
PDF:
https://aclanthology.org/W18-1702.pdf
Code:
zzcoolj/corpus2graph