Damir Korenčić

Also published as: Damir Korencic


2023

pdf bib
Definitions Matter: Guiding GPT for Multi-label Classification
Youri Peskine | Damir Korenčić | Ivan Grubisic | Paolo Papotti | Raphael Troncy | Paolo Rosso
Findings of the Association for Computational Linguistics: EMNLP 2023

Large language models have recently risen in popularity due to their ability to perform many natural language tasks without requiring any fine-tuning. In this work, we focus on two novel ideas: (1) generating definitions from examples and using them for zero-shot classification, and (2) investigating how an LLM makes use of the definitions. We thoroughly analyze the performance of GPT-3 model for fine-grained multi-label conspiracy theory classification of tweets using zero-shot labeling. In doing so, we asses how to improve the labeling by providing minimal but meaningful context in the form of the definitions of the labels. We compare descriptive noun phrases, human-crafted definitions, introduce a new method to help the model generate definitions from examples, and propose a method to evaluate GPT-3’s understanding of the definitions. We demonstrate that improving definitions of class labels has a direct consequence on the downstream classification results.

2022

pdf bib
IRB-NLP at SemEval-2022 Task 1: Exploring the Relationship Between Words and Their Semantic Representations
Damir Korenčić | Ivan Grubisic
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

What is the relation between a word and its description, or a word and its embedding? Both descriptions and embeddings are semantic representations of words. But, what information from the original word remains in these representations? Or more importantly, which information about a word do these two representations share? Definition Modeling and Reverse Dictionary are two opposite learning tasks that address these questions. The goal of the Definition Modeling task is to investigate the power of information laying inside a word embedding to express the meaning of the word in a humanly understandable way – as a dictionary definition. Conversely, the Reverse Dictionary task explores the ability to predict word embeddings directly from its definition. In this paper, by tackling these two tasks, we are exploring the relationship between words and their semantic representations. We present our findings based on the descriptive, exploratory, and predictive data analysis conducted on the CODWOE dataset. We give a detailed overview of the systems that we designed for Definition Modeling and Reverse Dictionary tasks, and that achieved top scores on SemEval-2022 CODWOE challenge in several subtasks. We hope that our experimental results concerning the predictive models and the data analyses we provide will prove useful in future explorations of word representations and their relationships.

2021

pdf bib
To Block or not to Block: Experiments with Machine Learning for News Comment Moderation
Damir Korencic | Ipek Baris | Eugenia Fernandez | Katarina Leuschel | Eva Sánchez Salido
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Today, news media organizations regularly engage with readers by enabling them to comment on news articles. This creates the need for comment moderation and removal of disallowed comments – a time-consuming task often performed by human moderators. In this paper we approach the problem of automatic news comment moderation as classification of comments into blocked and not blocked categories. We construct a novel dataset of annotated English comments, experiment with cross-lingual transfer of comment labels and evaluate several machine learning models on datasets of Croatian and Estonian news comments. Team name: SuperAdmin; Challenge: Detection of blocked comments; Tools/models: CroSloEn BERT, FinEst BERT, 24Sata comment dataset, Ekspress comment dataset.

2013

pdf bib
Aspect-Oriented Opinion Mining from User Reviews in Croatian
Goran Glavaš | Damir Korenčić | Jan Šnajder
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing