Ivandré Paraboni

Also published as: Ivandre Paraboni

2024

A Bag-of-Users approach to mental health prediction from social media data
Rafael Oliveira | Ivandré Paraboni
Proceedings of the 16th International Conference on Computational Processing of Portuguese

Semi-automatic corpus expansion: the case of stance prediction
Camila Pereira | Ivandré Paraboni
Proceedings of the 16th International Conference on Computational Processing of Portuguese

Sequence-to-sequence and transformer approaches to Portuguese text style transfer
Pablo Costa | Ivandré Paraboni
Proceedings of the 16th International Conference on Computational Processing of Portuguese

2023

BERTabaporu: Assessing a Genre-Specific Language Model for Portuguese NLP
Pablo Botton Costa | Matheus Camasmie Pavan | Wesley Ramos Santos | Samuel Caetano Silva | Ivandré Paraboni
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Transformer-based language models such as Bidirectional Encoder Representations from Transformers (BERT) are now mainstream in the NLP field, but extensions to languages other than English, to new domains and/or to more specific text genres are still in demand. In this paper we introduce BERTabaporu, a BERT language model that has been pre-trained on Twitter data in the Brazilian Portuguese language. The model is shown to outperform the best-known general-purpose model for this language in three Twitter-related NLP tasks, making it a potentially useful resource for Portuguese NLP in general.

Stance Prediction from Multimodal Social Media Data
Lais Carraro Leme Cavalheiro | Matheus Camasmie Pavan | Ivandré Paraboni
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Stance prediction - the computational task of inferring attitudes towards a given target topic of interest - relies heavily on text data provided by social media or similar sources, but it may also benefit from non-text information such as demographics (e.g., users’ gender, age, etc.), network structure (e.g., friends, followers, etc.), interactions (e.g., mentions, replies, etc.) and other non-text properties (e.g., time information, etc.). However, so-called hybrid (or in some cases multimodal) approaches to stance prediction have only been developed for a small set of target languages, and these often make use of count-based text models (e.g., bag-of-words) and time-honoured classification methods (e.g., support vector machines). As a means to further research in the field, in this work we introduce a number of text- and non-text models for stance prediction in the Portuguese language, which make use of more recent methods based on BERT and an ensemble architecture, and ask whether a BERT stance classifier may be enhanced with different kinds of network-related information.

2020

Cross-domain Author Gender Classification in Brazilian Portuguese
Rafael Dias | Ivandré Paraboni
Proceedings of the Twelfth Language Resources and Evaluation Conference

Author profiling models predict demographic characteristics of a target author based on the text that they have written. Systems of this kind will often follow a single-domain approach, in which the model is trained from a corpus of labelled texts in a given domain, and it is subsequently validated against a test corpus built from precisely the same domain. Although single-domain settings are arguably ideal, this strategy gives rise to the question of how to proceed when no suitable training corpus (i.e., a corpus that matches the test domain) is available. To shed light on this issue, this paper discusses a cross-domain gender classification task based on four domains (Facebook, crowd-sourced opinions, blogs and e-gov requests) in the Brazilian Portuguese language. A number of simple gender classification models using word- and psycholinguistics-based features alike are introduced, and their results are compared in two kinds of cross-domain setting: first, by making use of a single text source as training data for each task, and subsequently by combining multiple sources. Results confirm previous findings related to the effects of corpus size and domain similarity in English, and pave the way for further studies in the field.

Searching Brazilian Twitter for Signs of Mental Health Issues
Wesley Santos | Amanda Funabashi | Ivandré Paraboni
Proceedings of the Twelfth Language Resources and Evaluation Conference

Depression and related mental health issues are often reflected in the language employed by the individuals who suffer from these conditions and, accordingly, research in Natural Language Processing (NLP) and related fields has produced an increasing number of studies devoted to their recognition in social media text. Some of these studies have also attempted to go beyond recognition by focusing on the early signs of these illnesses, and by analysing the users’ publication history over time to potentially prevent further harm. The two kinds of study are of course overlapping, and often make use of supervised machine learning methods based on annotated corpora. However, as in many other fields, existing resources are largely devoted to English NLP, and there is little support for these studies in under-resourced languages. To bridge this gap, in this paper we describe the initial steps towards building a novel resource of this kind - a corpus intended to support both the recognition of mental health issues and the temporal analysis of these illnesses - in the Brazilian Portuguese language, and report initial results of a number of experiments in text classification addressing both tasks.

2019

Personality-dependent Neural Text Summarization
Pablo Costa | Ivandré Paraboni
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In Natural Language Generation systems, personalization strategies - i.e., the use of information about a target author to generate text that (more) closely resembles human-produced language - have long been applied to improve results. The present work addresses one such strategy - namely, the use of Big Five personality information about the target author - applied to the case of abstractive text summarization using neural sequence-to-sequence models. Initial results suggest that having access to personality information does lead to more accurate (or human-like) text summaries, and pave the way for more robust systems of this kind.

Moral Stance Recognition and Polarity Classification from Twitter and Elicited Text
Wesley Santos | Ivandré Paraboni
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

We introduce a labelled corpus of stances about moral issues for the Brazilian Portuguese language, and present reference results for both the stance recognition and polarity classification tasks. The corpus is built from Twitter and further expanded with data elicited through crowdsourcing and labelled by its own authors. Taken together, the corpus and reference results are intended as a baseline for further studies in the field of stance recognition and polarity classification from text.

2018

Building a Corpus for Personality-dependent Natural Language Understanding and Generation
Ricelli Ramos | Georges Neto | Barbara Silva | Danielle Monteiro | Ivandré Paraboni | Rafael Dias
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Author Profiling from Facebook Corpora
Fernando Hsieh | Rafael Dias | Ivandré Paraboni
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Reference production in human-computer interaction: Issues for Corpus-based Referring Expression Generation
Danillo Rocha | Ivandré Paraboni
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Definite Description Lexical Choice: taking Speaker’s Personality into account
Alex Lan | Ivandré Paraboni
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Referring Expression Generation in time-constrained communication
André Mariotti | Ivandré Paraboni
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Improving the generation of personalised descriptions
Thiago Castro Ferreira | Ivandré Paraboni
Proceedings of the 10th International Conference on Natural Language Generation

Referring expression generation (REG) models that use speaker-dependent information require a considerable amount of training data produced by every individual speaker, or may otherwise perform poorly. In this work we propose a simple personalised method for this task, in which speakers are grouped into profiles according to their referential behaviour. Intrinsic evaluation shows that the use of speaker’s profiles generally outperforms the personalised method found in previous work.

Squib: Effects of Cognitive Effort on the Resolution of Overspecified Descriptions
Ivandré Paraboni | Alex Gwo Jen Lan | Matheus Mendes de Sant’Ana | Flávio Luiz Coutinho
Computational Linguistics, Volume 43, Issue 2 - June 2017

Studies in referring expression generation (REG) have shown different effects of referential overspecification on the resolution of certain descriptions. To further investigate effects of this kind, this article reports two eye-tracking experiments that measure the time required to recognize target objects based on different kinds of information. Results suggest that referential overspecification may be either helpful or detrimental to identification depending on the kind of information that is actually overspecified, an insight that may be useful for the design of more informed hearer-oriented REG algorithms.

In Natural Language Generation, the task of attribute selection (AS) consists of determining the appropriate attribute-value pairs (or semantic properties) that represent the contents of a referring expression. Existing work on AS includes a wide range of algorithmic solutions to the problem, but the recent availability of corpora annotated with referring expressions data suggests that corpus-based AS strategies become possible as well. In this work we tentatively discuss a number of AS strategies using both semantic and surface information obtained from a corpus of this kind. Relying on semantic information, we attempt to learn both global and individual AS strategies that could be applied to a standard AS algorithm in order to generate descriptions found in the corpus. As an alternative, and perhaps less traditional approach, we also use surface information to build statistical language models of the referring expressions that are most likely to occur in the corpus, and let the model probabilities guide attribute selection.

Portuguese Text Generation from Large Corpora
Eder Novais | Ivandré Paraboni | Douglas Silva
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In the implementation of a surface realisation engine, many of the computational techniques seen in other AI fields have been widely applied. Among these, the use of statistical methods has been particularly successful, as in the so-called 'generate-and-select', or two-stage, architectures. Systems of this kind produce output strings from possibly underspecified input data by over-generating a large number of alternative realisations (often including ungrammatical candidate sentences). These are subsequently ranked with the aid of a statistical language model, and the most likely candidate is selected as the output string. Statistical approaches may however face a number of difficulties. Among these, there is the issue of data sparseness, a problem that is particularly evident in cases such as our target language - Brazilian Portuguese - which is not only morphologically rich, but relatively poor in NLP resources such as large, publicly available corpora. In this work we describe a first implementation of a shallow surface realisation system for this language that deals with the issue of data sparseness by making use of factored language models built from a (relatively) large corpus of Brazilian newspaper articles.
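
The generate-and-select idea described in this abstract can be illustrated with a minimal sketch. This is not the system reported in the paper: it assumes a plain bigram model with add-one smoothing rather than factored language models, and the four-sentence corpus, the slot format, and the names `train_bigram`, `log_prob` and `generate_and_select` are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

# Toy training corpus (hypothetical; stands in for a large newspaper corpus).
CORPUS = [
    "o menino comeu a fruta",
    "a menina comeu o bolo",
    "os meninos comeram o bolo",
    "as meninas comeram a fruta",
]


def train_bigram(sentences):
    """Count unigrams and bigrams, adding sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def generate_and_select(slots, unigrams, bigrams):
    """Stage 1: over-generate every combination of slot fillers.
    Stage 2: rank candidates with the language model and keep the best."""
    candidates = [" ".join(words) for words in product(*slots)]
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


UNIGRAMS, BIGRAMS = train_bigram(CORPUS)

# Underspecified input: each slot lists alternative surface forms,
# including combinations that violate number agreement.
SLOTS = [["o", "os", "a"], ["meninos"], ["comeu", "comeram"], ["o"], ["bolo"]]
BEST = generate_and_select(SLOTS, UNIGRAMS, BIGRAMS)
print(BEST)
```

On this toy data the ungrammatical candidates (e.g., "o meninos comeu o bolo") score lower because their bigrams were never observed. With a realistic vocabulary most bigrams are unseen, which is the data sparseness problem the abstract refers to.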

2010

Text Generation for Brazilian Portuguese: the Surface Realization Task
Eder Novais | Thiago Tadeu | Ivandré Paraboni
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

SINotas: the Evaluation of a NLG Application
Roberto P. A. Araujo | Rafael L. de Oliveira | Eder M. de Novais | Thiago D. Tadeu | Daniel B. Pereira | Ivandré Paraboni
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

SINotas is a data-to-text NLG application intended to produce short textual reports on students' academic performance from a database conveying their grades, weekly attendance rates and related academic information. Although developed primarily as a testbed for Portuguese Natural Language Generation, SINotas generates reports of interest both to students keen to learn how their professors would describe their efforts, and to the professors themselves, who may benefit from an at-a-glance view of the students' performance. In a traditional machine learning approach, SINotas uses a data-text aligned corpus as training data for decision-tree induction. The current system comprises a series of classifiers that implement major Document Planning subtasks (namely, data interpretation, content selection, within- and between-sentence structuring), and a small surface realisation grammar of Brazilian Portuguese. In this paper we focus on the evaluation work of the system, applying a number of intrinsic and user-based evaluation metrics to a collection of text reports generated from real application data.

Extracting Surface Realisation Templates from Corpora
Thiago D. Tadeu | Eder M. de Novais | Ivandré Paraboni
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In Natural Language Generation (NLG), template-based surface realisation is an effective solution to the problem of producing surface strings from a given semantic representation, but many applications may not be able to provide the input knowledge at the required level of detail, which in turn may limit the use of the available NLG resources. However, if we know in advance what the most likely output sentences are (e.g., because a corpus on the relevant application domain happens to be available), then corpus knowledge may be used to quickly deploy a surface realisation engine for small-scale applications, for which it may be sufficient to select a sentence (in natural language) that resembles the desired output, and then modify some or all of its constituents accordingly. In other words, the application may simply 'point to' an existing sentence in the corpus and specify only the changes that need to take place to obtain the desired surface string. In this paper we describe one such approach to surface realisation, in which we extract syntactically structured templates from a target corpus, and use these templates to produce existing and modified versions of the target sentences by a combination of canned text and basic dependency-tree operations.