Samhaa R. El-Beltagy

Also published as: Samhaa El-Beltagy

2023

2022

NGU CNLP at WANLP 2022 Shared Task: Propaganda Detection in Arabic
Ahmed Samir Hussein | Abu Bakr Soliman Mohammad | Mohamed Ibrahim | Laila Hesham Afify | Samhaa R. El-Beltagy
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

This paper presents the system developed by the NGU_CNLP team for the shared task on Propaganda Detection in Arabic at WANLP 2022. The team participated in both of the shared task's subtasks: 1) propaganda technique identification in text and 2) propaganda technique span identification. In the first subtask, the goal is to detect all propaganda techniques employed in a given piece of text, out of 17 possible techniques, or to detect that no propaganda technique is used in that text. As such, the first subtask is a multi-label classification problem with a pool of 18 possible labels. Subtask 2 extends subtask 1 by requiring identification of the exact text span in which a propaganda technique was employed, making it a sequence labeling problem. For subtask 1, our classification model combined a data augmentation strategy with an enabled transformer-based model. This classification model ranked first among the 14 systems participating in this subtask. For subtask 2, a transfer learning model was adopted. The system ranked third among the three models that participated in this subtask.
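As an illustration of the 18-label formulation described in the abstract above, the following is a minimal sketch of how one piece of text's labels could be encoded as a multi-hot vector. The label names are hypothetical placeholders (the abstract gives only the counts: 17 techniques plus one "no technique" label), and the rule that "no technique" excludes all others is an assumption drawn from the task description, not the team's actual code.

```python
from typing import List, Sequence

# Hypothetical label names: only the counts (17 techniques + 1 "none")
# come from the abstract.
TECHNIQUES: List[str] = [f"technique_{i}" for i in range(1, 18)]
NO_TECHNIQUE = "no_technique"
LABELS: List[str] = TECHNIQUES + [NO_TECHNIQUE]


def encode_labels(found: Sequence[str]) -> List[int]:
    """Return an 18-dimensional multi-hot vector for one piece of text.

    An empty `found` is treated as "no propaganda technique used".
    """
    active = set(found) or {NO_TECHNIQUE}
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Assumption: the "none" label cannot co-occur with a real technique.
    if NO_TECHNIQUE in active and len(active) > 1:
        raise ValueError("'no_technique' cannot co-occur with a technique")
    return [1 if label in active else 0 for label in LABELS]
```

A multi-label classifier would then predict one independent yes/no decision per position of this vector, rather than a single class.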

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Samhaa R. El-Beltagy | Xipeng Qiu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

2020

ASU_OPTO at OSACT4 - Offensive Language Detection for Arabic text
Amr Keleg | Samhaa R. El-Beltagy | Mahmoud Khalil
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

In recent years, toxic comments and offensive speech have been polluting the internet, and manual inspection of these comments is becoming a tiresome task to manage. A machine-learning-based model that is able to filter offensive Arabic content is much needed. In this paper, we describe the model that was submitted to the Shared Task on Offensive Language Detection organized by the 4th Workshop on Open-Source Arabic Corpora and Processing Tools. Our model makes use of a transformer-based model (BERT) to detect offensive content. We placed fourth in subtask A (detecting Offensive Speech) and third in subtask B (detecting Hate Speech).

2017

NileTMRG at SemEval-2017 Task 8: Determining Rumour and Veracity Support for Rumours on Twitter
Omar Enayet | Samhaa R. El-Beltagy
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Final submission for NileTMRG on RumourEval 2017.

NileTMRG at SemEval-2017 Task 4: Arabic Sentiment Analysis
Samhaa R. El-Beltagy | Mona El Kalamawy | Abu Bakr Soliman
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes two systems that were used by NileTMRG for addressing Arabic Sentiment Analysis as part of SemEval-2017 Task 4. NileTMRG participated in three Arabic-related subtasks: Subtask A (Message Polarity Classification), Subtask B (Topic-Based Message Polarity Classification) and Subtask D (Tweet Quantification). For subtask A, we made use of NU's sentiment analyzer, which we augmented with a scored lexicon. For subtasks B and D, we used an ensemble of three different classifiers. The first classifier was a convolutional neural network that used trained (word2vec) word embeddings. The second classifier was a multilayer perceptron, while the third was a logistic regression model that takes the same input as the second classifier. Voting between the three classifiers was used to determine the final outcome. The team ranked first in all three Arabic-related subtasks in which it participated.
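The voting step mentioned in the abstract above can be sketched as a simple majority vote over the labels produced by the three classifiers. The abstract does not say whether voting was over hard labels or probabilities, nor how ties were broken, so both choices below (hard voting, ties resolved in favour of the earliest classifier in the list) are assumptions, and `majority_vote` is a hypothetical helper name.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label among the classifiers' predictions.

    Assumption: on a tie, the label predicted by the earliest classifier
    in `predictions` wins.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in classifier order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


# Example: predictions from a CNN, an MLP and a logistic regression model.
final_label = majority_vote(["positive", "negative", "positive"])
```

With three classifiers and three polarity classes, a three-way disagreement is possible, which is why some explicit tie-breaking rule is needed.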

2016

NileTMRG at SemEval-2016 Task 5: Deep Convolutional Neural Networks for Aspect Category and Sentiment Extraction
Talaat Khalil | Samhaa R. El-Beltagy
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

NileTMRG at SemEval-2016 Task 7: Deriving Prior Polarities for Arabic Sentiment Terms
Samhaa R. El-Beltagy
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Bilingual Embeddings and Word Alignments for Translation Quality Estimation
Amal Abdelsalam | Ondřej Bojar | Samhaa El-Beltagy
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
Samhaa R. El-Beltagy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents NileULex, an Arabic sentiment lexicon containing close to six thousand Arabic words and compound phrases. Forty-five percent of the terms and expressions in the lexicon are Egyptian or colloquial, while fifty-five percent are Modern Standard Arabic. While the collection of many of the terms included in the lexicon was done automatically, the actual addition of any term was done manually. One of the important criteria for adding terms to the lexicon was that they be as unambiguous as possible. The result is a lexicon of much higher quality than any translated or automatically constructed one. To demonstrate that a lexicon such as this can directly impact the task of sentiment analysis, a very basic machine-learning-based sentiment analyser that uses unigrams, bigrams, and lexicon-based features was applied to two different Twitter datasets. The obtained results were compared to a baseline system that uses only unigrams and bigrams. The same lexicon-based features were also generated using a publicly available translation of a popular sentiment lexicon. The experiments show that use of the developed lexicon improves the results over both the baseline and the publicly available lexicon.
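To make the feature setup in the abstract above concrete, here is a minimal sketch of extracting unigram, bigram, and lexicon-based features from a tokenized tweet. The abstract does not specify which lexicon-based features were used, so counting matched positive and negative entries is an assumption. The toy lexicon, its English entries, and the feature names are hypothetical stand-ins (NileULex itself contains Arabic words and compound phrases).

```python
from collections import Counter
from typing import Dict, List

# Toy stand-in lexicon with hypothetical entries; it maps a word or
# compound phrase to its polarity.
TOY_LEXICON: Dict[str, str] = {
    "great": "positive",
    "well done": "positive",
    "awful": "negative",
}


def extract_features(tokens: List[str], lexicon: Dict[str, str]) -> Counter:
    """Count unigrams, bigrams, and lexicon hits per polarity."""
    unigrams = list(tokens)
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    features: Counter = Counter()
    for gram in unigrams:
        features[f"uni={gram}"] += 1
    for gram in bigrams:
        features[f"bi={gram}"] += 1
    # Lexicon-based features: words and two-word phrases are both looked up.
    for gram in unigrams + bigrams:
        polarity = lexicon.get(gram)
        if polarity is not None:
            features[f"lex_{polarity}"] += 1
    return features


features = extract_features("well done great team".split(), TOY_LEXICON)
```

Dropping the lexicon lookup loop yields the unigram-and-bigram baseline that the abstract compares against.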