Silvio Amir


2023

RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media
Somin Wadhwa | Vivek Khetan | Silvio Amir | Byron Wallace
Findings of the Association for Computational Linguistics: EACL 2023

We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets within these claims that describe patient Populations, Interventions, and Outcomes (PIO elements). Using this corpus, we introduce the task of retrieving trustworthy evidence relevant to a given claim made on social media. We propose a new method to automatically derive (noisy) supervision for this task, which we use to train a dense retrieval model; this outperforms baseline models. Manual evaluation of retrieval results performed by medical doctors indicates that while our system performance is promising, there is considerable room for improvement. We release all annotations collected (and scripts to assemble the dataset), and all code necessary to reproduce the results in this paper at: https://sominw.com/redhot.

SemEval-2023 Task 8: Causal Medical Claim Identification and Related PIO Frame Extraction from Social Media Posts
Vivek Khetan | Somin Wadhwa | Byron Wallace | Silvio Amir
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Identification of medical claims from user-generated text data is an onerous but essential step for various tasks, including content moderation and hypothesis generation. SemEval-2023 Task 8 is an effort towards building those capabilities and motivating further research in this direction. This paper summarizes the details and results of shared task 8 at SemEval-2023, which involved identifying causal medical claims and extracting related Populations, Interventions, and Outcomes (“PIO”) frames from social media (Reddit) text. This shared task comprised two subtasks: (1) Causal claim identification; and (2) PIO frame extraction. In total, seven teams participated in the task. Of the seven, six provided system descriptions, which we summarize here. For the first subtask, the best approach yielded a macro-averaged F-1 score of 78.40, and for the second subtask, the best approach achieved token-level F-1 scores of 40.55 for Populations, 49.71 for Interventions, and 30.08 for Outcome frames.
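The abstract reports a macro-averaged F1 for subtask 1 and a token-level F1 per PIO frame for subtask 2. Below is a minimal sketch of how such scores can be computed, assuming one gold and one predicted label per post (subtask 1) and one tag per token, with "O" marking tokens outside any frame (subtask 2). The label names, tag names, and functions are hypothetical, and the official SemEval-2023 Task 8 scorer may differ in tokenization and label inventory.

```python
from typing import List


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def class_f1(gold: List[str], pred: List[str], label: str) -> float:
    """F1 for a single label, treating every other label as negative."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    return f1_from_counts(tp, fp, fn)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Subtask-1 style: unweighted mean of per-class F1 over post labels."""
    labels = sorted(set(gold) | set(pred))
    return sum(class_f1(gold, pred, lab) for lab in labels) / len(labels)


def token_f1(gold_tags: List[str], pred_tags: List[str], frame: str) -> float:
    """Subtask-2 style: token-level F1 for one PIO frame (e.g. "Population")."""
    return class_f1(gold_tags, pred_tags, frame)


if __name__ == "__main__":
    # Toy, hypothetical labels for illustration only.
    gold = ["claim", "question", "claim", "experience"]
    pred = ["claim", "claim", "claim", "experience"]
    print(round(100 * macro_f1(gold, pred), 2))  # 60.0

    gold_tags = ["O", "Population", "Population", "O", "Intervention"]
    pred_tags = ["O", "Population", "O", "Intervention", "Intervention"]
    print(round(100 * token_f1(gold_tags, pred_tags, "Population"), 2))  # 66.67
```

Scores are computed in [0, 1]; multiplying by 100 gives the percentage scale used in the abstract. Macro averaging weights every class equally, so a rare label that is never predicted correctly pulls the score down as much as a frequent one.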

Revisiting Relation Extraction in the era of Large Language Models
Somin Wadhwa | Silvio Amir | Byron Wallace
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) Few-shot prompting with GPT-3 achieves near SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.
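To illustrate the sequence-to-sequence framing described above, here is a minimal sketch of linearizing relation triples into a single target string and parsing generated text back into triples. The bracket-and-pipe format and the function names are hypothetical choices for illustration, not the format used in the paper. The last function scores output by exact string matching, which shows why that evaluation is brittle: a generation that names the same entity differently scores zero.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation type, tail entity)


def linearize(triples: List[Triple]) -> str:
    """Render relation triples as one target string for a seq2seq model."""
    return " ; ".join(f"[{h} | {r} | {t}]" for h, r, t in triples)


def parse(text: str) -> Set[Triple]:
    """Recover triples from generated text, skipping malformed chunks.

    Assumes entity and relation strings never contain ";" or "|".
    """
    triples: Set[Triple] = set()
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not (chunk.startswith("[") and chunk.endswith("]")):
            continue  # not a well-formed triple
        parts = [part.strip() for part in chunk[1:-1].split("|")]
        if len(parts) == 3:
            triples.add((parts[0], parts[1], parts[2]))
    return triples


def exact_match_f1(gold: List[Triple], generated: str) -> float:
    """F1 where a predicted triple counts only if all three strings match exactly."""
    gold_set, pred_set = set(gold), parse(generated)
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("Marie Curie", "born_in", "Warsaw")]
    print(linearize(gold))  # [Marie Curie | born_in | Warsaw]
    # A plausible generation that exact matching marks as wrong:
    print(exact_match_f1(gold, "[Curie | born_in | Warsaw]"))  # 0.0
```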

2021

On the Impact of Random Seeds on the Fairness of Clinical Classifiers
Silvio Amir | Jan-Willem van de Meent | Byron Wallace
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III, the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yield similar overall performance, although there is no evidence of a trade-off between overall and subgroup performance. However, we also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities. Further, we find that jointly optimizing for high overall performance and low disparities does not yield statistically significant improvements. Our results suggest that fairness work using MIMIC-III should carefully account for variations in apparent differences that may arise from stochasticity and small sample sizes.
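A minimal sketch of the kind of analysis described above: given predictions from several fine-tuning runs that differ only in random seed, compute overall and per-subgroup performance for each seed, then the spread of each subgroup's score across seeds. Accuracy is used here only for simplicity (an assumption; the paper's metrics may differ), the function names are hypothetical, and the labels, groups, and predictions in the example are toy values, not MIMIC-III data.

```python
from typing import Dict, List, Tuple


def accuracy(labels: List[int], preds: List[int]) -> float:
    """Fraction of examples where the prediction equals the label."""
    return sum(y == y_hat for y, y_hat in zip(labels, preds)) / len(labels)


def group_accuracy(labels: List[int], preds: List[int],
                   groups: List[str]) -> Dict[str, float]:
    """Accuracy computed separately within each demographic subgroup."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = accuracy([labels[i] for i in idx], [preds[i] for i in idx])
    return result


def seed_spread(labels: List[int], groups: List[str],
                preds_by_seed: Dict[int, List[int]]
                ) -> Tuple[Dict[int, float], Dict[str, float]]:
    """Overall accuracy per seed, and the max-minus-min range of each
    subgroup's accuracy across seeds."""
    overall = {s: accuracy(labels, p) for s, p in preds_by_seed.items()}
    per_seed = [group_accuracy(labels, p, groups) for p in preds_by_seed.values()]
    group_range = {
        g: max(acc[g] for acc in per_seed) - min(acc[g] for acc in per_seed)
        for g in sorted(set(groups))
    }
    return overall, group_range


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    groups = ["A", "A", "B", "B"]
    preds_by_seed = {0: [1, 1, 0, 1], 1: [1, 0, 0, 0]}  # toy predictions
    overall, group_range = seed_spread(labels, groups, preds_by_seed)
    print(overall)      # {0: 0.75, 1: 0.75}
    print(group_range)  # {'A': 0.5, 'B': 0.5}
```

In the toy example both seeds reach the same overall accuracy, yet each subgroup's accuracy swings by 0.5 between seeds: with only two examples per group, a single error moves the group score by half. This mirrors the small-sample concern raised in the abstract.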

2019

Mental Health Surveillance over Social Media with Digital Cohorts
Silvio Amir | Mark Dredze | John W. Ayers
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

The ability to track mental health conditions via social media has opened the door to large-scale, automated mental health surveillance. However, inferring accurate population-level trends requires representative samples of the underlying population, which can be challenging given the biases inherent in social media data. While previous work has adjusted samples based on demographic estimates, the populations were selected based on specific outcomes, e.g., specific mental health conditions. We depart from these methods by conducting analyses over demographically representative digital cohorts of social media users. To validate this approach, we constructed a cohort of US-based Twitter users to measure the prevalence of depression and PTSD, and investigated how these illnesses manifest across demographic subpopulations. The analysis demonstrates that cohort-based studies can help control for sampling biases, contextualize outcomes, and provide deeper insights into the data.

2016

Modelling Context with User Embeddings for Sarcasm Detection in Social Media
Silvio Amir | Byron C. Wallace | Hao Lyu | Paula Carvalho | Mário J. Silva
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

INESC-ID at SemEval-2016 Task 4-A: Reducing the Problem of Out-of-Embedding Words
Silvio Amir | Ramon F. Astudillo | Wang Ling | Mário J. Silva | Isabel Trancoso
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Not All Contexts Are Created Equal: Better Word Representations with Variable Attention
Wang Ling | Yulia Tsvetkov | Silvio Amir | Ramón Fermandez | Chris Dyer | Alan W Black | Isabel Trancoso | Chu-Cheng Lin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Wang Ling | Chris Dyer | Alan W Black | Isabel Trancoso | Ramón Fermandez | Silvio Amir | Luís Marujo | Tiago Luís
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

INESC-ID: A Regression Model for Large Scale Twitter Sentiment Lexicon Induction
Silvio Amir | Ramon F. Astudillo | Wang Ling | Bruno Martins | Mario J. Silva | Isabel Trancoso
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

INESC-ID: Sentiment Analysis without Hand-Coded Features or Linguistic Resources using Embedding Subspaces
Ramon F. Astudillo | Silvio Amir | Wang Ling | Bruno Martins | Mario J. Silva | Isabel Trancoso
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Learning Word Representations from Scarce and Noisy Data with Embedding Subspaces
Ramon F. Astudillo | Silvio Amir | Wang Ling | Mário Silva | Isabel Trancoso
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

TUGAS: Exploiting unlabelled data for Twitter sentiment analysis
Silvio Amir | Miguel B. Almeida | Bruno Martins | João Filgueiras | Mário J. Silva
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)