Text Zoning and Classification for Job Advertisements in German, French and English

Ann-Sophie Gnehm, Simon Clematide


Abstract
We present experiments to structure job ads into text zones and classify them into professions, industries and management functions, thereby facilitating social science analyses on labor market demand. Our main contributions are empirical findings on the benefits of contextualized embeddings and the potential of multi-task models for this purpose. With contextualized in-domain embeddings in BiLSTM-CRF models, we reach an accuracy of 91% for token-level text zoning and outperform previous approaches. A multi-tasking BERT model performs well for our classification tasks. We further compare transfer approaches for our multilingual data.
Anthology ID:
2020.nlpcss-1.10
Volume:
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Month:
November
Year:
2020
Address:
Online
Editors:
David Bamman, Dirk Hovy, David Jurgens, Brendan O'Connor, Svitlana Volkova
Venue:
NLP+CSS
Publisher:
Association for Computational Linguistics
Pages:
83–93
URL:
https://aclanthology.org/2020.nlpcss-1.10
DOI:
10.18653/v1/2020.nlpcss-1.10
Cite (ACL):
Ann-Sophie Gnehm and Simon Clematide. 2020. Text Zoning and Classification for Job Advertisements in German, French and English. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pages 83–93, Online. Association for Computational Linguistics.
Cite (Informal):
Text Zoning and Classification for Job Advertisements in German, French and English (Gnehm & Clematide, NLP+CSS 2020)
PDF:
https://aclanthology.org/2020.nlpcss-1.10.pdf
Optional supplementary material:
2020.nlpcss-1.10.OptionalSupplementaryMaterial.zip
Video:
https://slideslive.com/38940604