Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework

Akshay Bhola, Kishaloy Halder, Animesh Prasad, Min-Yen Kan


Abstract
We introduce a deep learning model to learn the set of enumerated job skills associated with a job description. In our analysis of the large-scale government job portal mycareersfuture.sg, we observe that as many as 65% of job descriptions fail to mention a significant number of relevant skills. Our model treats this task as an extreme multi-label classification (XMLC) problem, where descriptions are the evidence for the binary relevance of thousands of individual skills. Building upon current state-of-the-art language modeling approaches such as BERT, we show that our XMLC method improves on an existing baseline solution by over 9% and 7% absolute in terms of recall and normalized discounted cumulative gain, respectively. We further show that our approach effectively addresses the missing skills problem and helps recover relevant skills that were left out of the job postings, by ensuring that the BERT-XMLC model accounts for the structured semantic representation of skills and their co-occurrences through a Correlation Aware Bootstrapping process. To facilitate future research and replication of our work, we have made the dataset and the implementation of our model publicly available.
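The two evaluation measures named in the abstract, recall and normalized discounted cumulative gain, can be illustrated with a minimal sketch for a single job description. This is not the authors' evaluation code: the function names, the cut-off `k`, and the skill scores are hypothetical, and the paper's exact metric definitions may differ. The sketch assumes binary relevance, so each skill is either in the gold set or not.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the gold skills that appear among the top-k ranked skills."""
    if not relevant:
        return 0.0
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A gold skill at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, label in enumerate(ranked[:k])
        if label in relevant
    )
    # The ideal ranking places all gold skills first.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical per-skill scores for one job description (illustrative only).
scores = {"python": 0.9, "sql": 0.2, "communication": 0.7, "java": 0.4}
gold = {"python", "sql"}

# Rank skills by descending score, as a multi-label classifier would.
ranked = sorted(scores, key=scores.get, reverse=True)

print(recall_at_k(ranked, gold, k=2))
print(ndcg_at_k(ranked, gold, k=2))
```

With these made-up scores, only one of the two gold skills is ranked in the top 2, so both measures fall below 1; a ranking that puts both gold skills first would score 1 on each.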
Anthology ID:
2020.coling-main.513
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5832–5842
URL:
https://aclanthology.org/2020.coling-main.513
DOI:
10.18653/v1/2020.coling-main.513
Cite (ACL):
Akshay Bhola, Kishaloy Halder, Animesh Prasad, and Min-Yen Kan. 2020. Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5832–5842, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework (Bhola et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.513.pdf
Code:
wing-nus/jd2skills-bert-xmlc