Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation

Ilias Chalkidis, Emmanouil Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos


Abstract
We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, the European Union’s public document database, annotated with concepts from EUROVOC, a multidisciplinary thesaurus. The dataset is substantially larger than previous EURLEX datasets and suitable for XMTC, few-shot and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with self-attention outperform the current multi-label state-of-the-art methods, which employ label-wise attention. Replacing CNNs with BIGRUs in label-wise attention networks leads to the best overall performance.
Anthology ID:
W19-2209
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2019
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venues:
NAACL | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
78–87
URL:
https://www.aclweb.org/anthology/W19-2209
DOI:
10.18653/v1/W19-2209
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://www.aclweb.org/anthology/W19-2209.pdf