Nizar Habash


2019

Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling
Nasser Zalmout | Nizar Habash
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Morphological tagging is challenging for morphologically rich languages due to the large target space and the need for more training data to minimize model sparsity. Dialectal variants of morphologically rich languages suffer more as they tend to be more noisy and have less resources. In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variations in the context of full morphological tagging. We use multitask learning for joint morphological modeling for the features within two dialects, and as a knowledge-transfer scheme for cross-dialectal modeling. We use adversarial training to learn dialect invariant features that can help the knowledge-transfer scheme from the high to low-resource variants. We work with two dialectal variants: Modern Standard Arabic (high-resource “dialect”) and Egyptian Arabic (low-resource dialect) as a case study. Our models achieve state-of-the-art results for both. Furthermore, adversarial training provides more significant improvement when using smaller training datasets in particular.

The Effectiveness of Simple Hybrid Systems for Hypernym Discovery
William Held | Nizar Habash
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Hypernymy modeling has largely been separated according to two paradigms, pattern-based methods and distributional methods. However, recent works utilizing a mix of these strategies have yielded state-of-the-art results. This paper evaluates the contribution of both paradigms to hybrid success by evaluating the benefits of hybrid treatment of baseline models from each paradigm. Even with a simple methodology for each individual system, utilizing a hybrid approach establishes new state-of-the-art results on two domain-specific English hypernym discovery tasks and outperforms all non-hybrid approaches in a general English hypernym discovery task.

Automatic Gender Identification and Reinflection in Arabic
Nizar Habash | Houda Bouamor | Christine Chung
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

The impressive progress in many Natural Language Processing (NLP) applications has increased the awareness of some of the biases these NLP systems have with regard to gender identities. In this paper, we propose an approach to extend biased single-output gender-blind NLP systems with gender-specific alternative reinflections. We focus on Arabic, a gender-marking morphologically rich language, in the context of machine translation (MT) from English, and for first-person-singular constructions only. Our contributions are the development of a system-independent gender-awareness wrapper, and the building of a corpus for training and evaluating first-person-singular gender identification and reinflection in Arabic. Our results successfully demonstrate the viability of this approach with 8% relative increase in BLEU score for first-person-singular feminine, and 5.3% comparable increase for first-person-singular masculine on top of a state-of-the-art gender-blind MT system on a held-out test set.

A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance
Alexander Erdmann | Salam Khalifa | Mai Oudah | Nizar Habash | Houda Bouamor
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input. Our technique involves creating a small grammar of closed-class affixes which can be written in a few hours. The grammar overgenerates analyses for word forms attested in a raw corpus which are disambiguated based on features of the linguistic base proposed for each form. Extending the grammar to cover orthographic, morpho-syntactic or lexical variation is simple, making it an ideal solution for challenging corpora with noisy, dialect-inconsistent, or otherwise non-standard content. In two evaluations, we consistently outperform competitive unsupervised baselines and approach the performance of state-of-the-art supervised models trained on large amounts of data, providing evidence for the value of linguistic input during preprocessing.

Morphologically Annotated Corpora for Seven Arabic Dialects: Taizi, Sanaani, Najdi, Jordanian, Syrian, Iraqi and Moroccan
Faisal Alshargi | Shahd Dibas | Sakhar Alkhereyf | Reem Faraj | Basmah Abdulkareem | Sane Yagi | Ouafaa Kacha | Nizar Habash | Owen Rambow
Proceedings of the Fourth Arabic Natural Language Processing Workshop

We present a collection of morphologically annotated corpora for seven Arabic dialects: Taizi Yemeni, Sanaani Yemeni, Najdi, Jordanian, Syrian, Iraqi and Moroccan Arabic. The corpora collectively cover over 200,000 words, and are all manually annotated in a common set of standards for orthography, diacritized lemmas, tokenization, morphological units and English glosses. These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis and disambiguation.

The MADAR Shared Task on Arabic Fine-Grained Dialect Identification
Houda Bouamor | Sabit Hassan | Nizar Habash
Proceedings of the Fourth Arabic Natural Language Processing Workshop

In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. This shared task was organized as part of The Fourth Arabic Natural Language Processing Workshop, co-located with ACL 2019. The shared task includes two subtasks: the MADAR Travel Domain Dialect Identification subtask (Subtask 1) and the MADAR Twitter User Dialect Identification subtask (Subtask 2). This shared task is the first to target a large set of dialect labels at the city and country levels. The data for the shared task was created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project. A total of 21 teams from 15 countries participated in the shared task.

The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation
Mai Oudah | Amjad Almahairi | Nizar Habash
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

ADIDA: Automatic Dialect Identification for Arabic
Ossama Obeid | Mohammad Salameh | Houda Bouamor | Nizar Habash
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text. The system distinguishes among the dialects of 25 Arab cities (from Rabat to Muscat) in addition to Modern Standard Arabic. The results are presented with either a point map or a heat map visualizing the automatic identification probabilities over a geographical map of the Arab World.

2018

Fine-Grained Arabic Dialect Identification
Mohammad Salameh | Houda Bouamor | Nizar Habash
Proceedings of the 27th International Conference on Computational Linguistics

Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification). This paper presents the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World, in addition to Standard Arabic – a very challenging task. We build several classification systems and explore a large space of features. Our results show that we can identify the exact city of a speaker at an accuracy of 67.9% for sentences with an average length of 7 words (a 9% relative error reduction over the state-of-the-art technique for Arabic dialect identification) and reach more than 90% when we consider 16 words. We also report on additional insights from a data analysis of similarity and difference across Arabic dialects.

A Cross-lingual Messenger with Keyword Searchable Phrases for the Travel Domain
Shehroze Khan | Jihyun Kim | Tarik Zulfikarpasic | Peter Chen | Nizar Habash
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present Qutr (Query Translator), a smart cross-lingual communication application for the travel domain. Qutr is a real-time messaging app that automatically translates conversations while supporting keyword-to-sentence matching. Qutr relies on querying a database that holds commonly used pre-translated travel-domain phrases and phrase templates in different languages with the use of keywords. The query matching supports paraphrases, incomplete keywords and some input spelling errors. The application addresses common cross-lingual communication issues such as translation accuracy, speed, privacy, and personalization.

Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models
Daniel Watson | Nasser Zalmout | Nizar Habash
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Text normalization is an important enabling technology for several NLP tasks. Recently, neural-network-based approaches have outperformed well-established models in this task. However, in languages other than English, there has been little exploration in this direction. Both the scarcity of annotated data and the complexity of the language increase the difficulty of the problem. To address these challenges, we use a sequence-to-sequence model with character-based attention, which in addition to its self-learned character embeddings, uses word embeddings pre-trained with an approach that also models subword information. This provides the neural model with access to more linguistic information especially suitable for text normalization, without large parallel corpora. We show that providing the model with word-level features bridges the gap for the neural network approach to achieve a state-of-the-art F1 score on a standard Arabic language correction shared task dataset.

Feature Optimization for Predicting Readability of Arabic L1 and L2
Hind Saddiki | Nizar Habash | Violetta Cavalli-Sforza | Muhamed Al Khalil
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Advances in automatic readability assessment can impact the way people consume information in a number of domains. Arabic, being a low-resource and morphologically complex language, presents numerous challenges to the task of automatic readability assessment. In this paper, we present the largest and most in-depth computational readability study for Arabic to date. We study a large set of features with varying depths, from shallow words to syntactic trees, for both L1 and L2 readability tasks. Our best L1 readability accuracy result is 94.8% (75% error reduction from a commonly used baseline). The comparable results for L2 are 72.4% (45% error reduction). We also demonstrate the added value of leveraging L1 features for L2 readability prediction.

Improving Domain Independent Question Parsing with Synthetic Treebanks
Halim-Antoine Boukaram | Nizar Habash | Micheline Ziadee | Majd Sakr
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks. The near absence of question constructions is due to the dominance of the news domain in treebanking efforts. In this paper, we compare two synthetic low-cost question treebank creation methods with a conventional manual high-cost annotation method in the context of three domains (news questions, political talk shows, and chatbots) for Modern Standard Arabic, a language with relatively low resources and rich morphology. Our results show that synthetic methods can be effective at significantly reducing parsing errors for a target domain without having to invest large resources on manual annotation; and the combination of manual and synthetic methods is our best domain-independent performer.

A Bilingual Interactive Human Avatar Dialogue System
Dana Abu Ali | Muaz Ahmad | Hayat Al Hassan | Paula Dozsa | Ming Hu | Jose Varias | Nizar Habash
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

This demonstration paper presents a bilingual (Arabic-English) interactive human avatar dialogue system. The system is named TOIA (time-offset interaction application), as it simulates face-to-face conversations between humans using digital human avatars recorded in the past. TOIA is a conversational agent, similar to a chat bot, except that it is based on an actual human being and can be used to preserve and tell stories. The system is designed to allow anybody, simply using a laptop, to create an avatar of themselves, thus facilitating cross-cultural and cross-generational sharing of narratives to wider audiences. The system currently supports monolingual and cross-lingual dialogues in Arabic and English, but can be extended to other languages.

Complementary Strategies for Low Resourced Morphological Modeling
Alexander Erdmann | Nizar Habash
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Morphologically rich languages are challenging for natural language processing tasks due to data sparsity. This can be addressed either by introducing out-of-context morphological knowledge, or by developing machine learning architectures that specifically target data sparsity and/or morphological information. We find these approaches to complement each other in a morphological paradigm modeling task in Modern Standard Arabic, which, in addition to being morphologically complex, features ubiquitous ambiguity, exacerbating sparsity with noise. Given a small number of out-of-context rules describing closed class morphology, we combine them with word embeddings leveraging subword strings and noise reduction techniques. The combination outperforms both approaches individually by about 20% absolute. While morphological resources already exist for Modern Standard Arabic, our results inform how comparable resources might be constructed for non-standard dialects or any morphologically rich, low resourced language, given scarcity of time and funding.

An Arabic Morphological Analyzer and Generator with Copious Features
Dima Taji | Salam Khalifa | Ossama Obeid | Fadhl Eryani | Nizar Habash
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

We introduce CALIMA-Star, a very rich Arabic morphological analyzer and generator that provides functional and form-based morphological features as well as built-in tokenization, phonological representation, lexical rationality and much more. This tool includes a fast engine that can be easily integrated into other systems, as well as an easy-to-use API and a web interface. CALIMA-Star also supports morphological reinflection. We evaluate CALIMA-Star against four commonly used analyzers for Arabic in terms of speed and morphological content.

A Parallel Corpus of Arabic-Japanese News Articles
Go Inoue | Nizar Habash | Yuji Matsumoto | Hiroyuki Aoyama
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Palmyra: A Platform Independent Dependency Annotation Tool for Morphologically Rich Languages
Talha Javed | Nizar Habash | Dima Taji
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Leveled Reading Corpus of Modern Standard Arabic
Muhamed Al Khalil | Hind Saddiki | Nizar Habash | Latifa Alfalasi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction
Ossama Obeid | Salam Khalifa | Nizar Habash | Houda Bouamor | Wajdi Zaghouani | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The MADAR Arabic Dialect Corpus and Lexicon
Houda Bouamor | Nizar Habash | Mohammad Salameh | Wajdi Zaghouani | Owen Rambow | Dana Abdulrahim | Ossama Obeid | Salam Khalifa | Fadhl Eryani | Alexander Erdmann | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Unified Guidelines and Resources for Arabic Dialect Orthography
Nizar Habash | Fadhl Eryani | Salam Khalifa | Owen Rambow | Dana Abdulrahim | Alexander Erdmann | Reem Faraj | Wajdi Zaghouani | Houda Bouamor | Nasser Zalmout | Sara Hassan | Faisal Al-Shargi | Sakhar Alkhereyf | Basma Abdulkareem | Ramy Eskander | Mohammad Salameh | Hind Saddiki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Morphologically Annotated Corpus of Emirati Arabic
Salam Khalifa | Nizar Habash | Fadhl Eryani | Ossama Obeid | Dana Abdulrahim | Meera Al Kaabi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing
Amir More | Özlem Çetinoğlu | Çağrı Çöltekin | Nizar Habash | Benoît Sagot | Djamé Seddah | Dima Taji | Reut Tsarfaty
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Addressing Noise in Multidialectal Word Embeddings
Alexander Erdmann | Nasser Zalmout | Nizar Habash
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Word embeddings are crucial to many natural language processing tasks. The quality of embeddings relies on large non-noisy corpora. Arabic dialects lack large corpora and are noisy, being linguistically disparate with no standardized spelling. We make three contributions to address this noise. First, we describe simple but effective adaptations to word embedding tools to maximize the informative content leveraged in each training sentence. Second, we analyze methods for representing disparate dialects in one embedding space, either by mapping individual dialects into a shared space or learning a joint model of all dialects. Finally, we evaluate via dictionary induction, showing that two metrics not typically reported in the task enable us to analyze our contributions’ effects on low and high frequency words. In addition to boosting performance between 2-53%, we specifically improve on noisy, low frequency forms without compromising accuracy on high frequency forms.

Noise-Robust Morphological Disambiguation for Dialectal Arabic
Nasser Zalmout | Alexander Erdmann | Nizar Habash
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

User-generated text tends to be noisy with many lexical and orthographic inconsistencies, making natural language processing (NLP) tasks more challenging. The challenging nature of noisy text processing is exacerbated for dialectal content, where in addition to spelling and lexical differences, dialectal text is characterized by morpho-syntactic and phonetic variations. These issues increase sparsity in NLP models and reduce accuracy. We present a neural morphological tagging and disambiguation model for Egyptian Arabic, with various extensions to handle noisy and inconsistent content. Our models achieve about 5% relative error reduction (1.1% absolute improvement) for full morphological analysis, and around 22% relative error reduction (1.8% absolute improvement) for part-of-speech tagging, over a state-of-the-art baseline.

2017

Proceedings of the Third Arabic Natural Language Processing Workshop
Nizar Habash | Mona Diab | Kareem Darwish | Wassim El-Hajj | Hend Al-Khalifa | Houda Bouamor | Nadi Tomeh | Mahmoud El-Haj
Proceedings of the Third Arabic Natural Language Processing Workshop

A Morphological Analyzer for Gulf Arabic Verbs
Salam Khalifa | Sara Hassan | Nizar Habash
Proceedings of the Third Arabic Natural Language Processing Workshop

We present CALIMA-GLF, a Gulf Arabic morphological analyzer currently covering over 2,600 verbal lemmas. We describe in detail the process of building the analyzer starting from phonetic dictionary entries to fully inflected orthographic paradigms and associated lexicon and orthographic variants. We evaluate the coverage of CALIMA-GLF against Modern Standard Arabic and Egyptian Arabic analyzers on part of a Gulf Arabic novel. CALIMA-GLF verb analysis token recall for identifying correct POS tag outperforms both the Modern Standard Arabic and Egyptian Arabic analyzers by over 27.4% and 16.9% absolute, respectively.

A Characterization Study of Arabic Twitter Data with a Benchmarking for State-of-the-Art Opinion Mining Models
Ramy Baly | Gilbert Badaro | Georges El-Khoury | Rawan Moukalled | Rita Aoun | Hazem Hajj | Wassim El-Hajj | Nizar Habash | Khaled Shaban
Proceedings of the Third Arabic Natural Language Processing Workshop

Opinion mining in Arabic is a challenging task given the rich morphology of the language. The task becomes more challenging when it is applied to Twitter data, which contains additional sources of noise, such as the use of unstandardized dialectal variations, the non-conformity to grammatical rules, the use of Arabizi and code-switching, and the use of non-text objects such as images and URLs to express opinion. In this paper, we perform an analytical study to observe how such linguistic phenomena vary across different Arab regions. This study of Arabic Twitter characterization aims at providing better understanding of Arabic Tweets, and fostering advanced research on the topic. Furthermore, we explore the performance of the two schools of machine learning on Arabic Twitter, namely the feature engineering approach and the deep learning approach. We consider models that have achieved state-of-the-art performance for opinion mining in English. Results highlight the advantages of using deep learning-based models, and confirm the importance of using morphological abstractions to address Arabic’s complex morphology.

Robust Dictionary Lookup in Multiple Noisy Orthographies
Lingliang Zhang | Nizar Habash | Godfried Toussaint
Proceedings of the Third Arabic Natural Language Processing Workshop

We present the MultiScript Phonetic Search algorithm to address the problem of language learners looking up unfamiliar words that they heard. We apply it to Arabic dictionary lookup with noisy queries done using both the Arabic and Roman scripts. Our algorithm is based on a computational phonetic distance metric that can be optionally machine learned. To benchmark our performance, we created the ArabScribe dataset, containing 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate’s “did you mean” feature, as well as the Yamli smart Arabic keyboard.

Universal Dependencies for Arabic
Dima Taji | Nizar Habash | Daniel Zeman
Proceedings of the Third Arabic Natural Language Processing Workshop

We describe the process of creating NUDAR, a Universal Dependency treebank for Arabic. We present the conversion from the Penn Arabic Treebank to the Universal Dependency syntactic representation through an intermediate dependency representation. We discuss the challenges faced in the conversion of the trees, the decisions we made to solve them, and the validation of our conversion. We also present initial parsing results on NUDAR.

Don’t Throw Those Morphological Analyzers Away Just Yet: Neural Morphological Disambiguation for Arabic
Nasser Zalmout | Nizar Habash
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper presents a model for Arabic morphological disambiguation based on Recurrent Neural Networks (RNN). We train Long Short-Term Memory (LSTM) cells in several configurations and embedding levels to model the various morphological features. Our experiments show that these models outperform state-of-the-art systems without explicit use of feature engineering. However, adding learning features from a morphological analyzer to model the space of possible analyses provides additional improvement. We make use of the resulting morphological models for scoring and ranking the analyses of the morphological analyzer for morphological disambiguation. The results show significant gains in accuracy across several evaluation metrics. Our system results in 4.4% absolute increase over the state-of-the-art in full morphological analysis accuracy (30.6% relative error reduction), and 10.6% (31.5% relative error reduction) for out-of-vocabulary words.
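Several abstracts in this list report gains both as absolute points and as relative error reduction. The two are linked by relative error reduction = (baseline error − new error) / baseline error, so an absolute gain divided by the relative reduction gives the baseline error rate. The sketch below (function names are illustrative, not from any paper) applies this to the figures quoted just above; because the reported percentages are rounded, the implied values are approximate and are not numbers stated by the authors.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors removed by the new system (accuracies in %)."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_error(abs_gain: float, rer: float) -> float:
    """Baseline error rate (%) implied by an absolute gain (points) and a relative reduction (fraction)."""
    return abs_gain / rer


# Figures quoted above: +4.4 points absolute, 30.6% relative error reduction.
base_err = implied_baseline_error(4.4, 0.306)
print(f"implied baseline error: {base_err:.1f}%")           # about 14.4%
print(f"implied baseline accuracy: {100 - base_err:.1f}%")  # about 85.6%
```

The same arithmetic applies to the other entries that pair an absolute gain with a relative error reduction.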

A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages
Nizar Habash | Nasser Zalmout | Dima Taji | Hieu Hoang | Maverick Alzate
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

OMAM at SemEval-2017 Task 4: Evaluation of English State-of-the-Art Sentiment Analysis Models for Arabic and a New Topic-based Model
Ramy Baly | Gilbert Badaro | Ali Hamdi | Rawan Moukalled | Rita Aoun | Georges El-Khoury | Ahmad Al Sallab | Hazem Hajj | Nizar Habash | Khaled Shaban | Wassim El-Hajj
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

While sentiment analysis in English has achieved significant progress, it remains a challenging task in Arabic given the rich morphology of the language. It becomes more challenging when applied to Twitter data that comes with additional sources of noise including dialects, misspellings, grammatical mistakes, code switching and the use of non-textual objects to express sentiments. This paper describes the “OMAM” systems that we developed as part of SemEval-2017 task 4. We evaluate English state-of-the-art methods on Arabic tweets for subtask A. As for the remaining subtasks, we introduce a topic-based approach that accounts for topic specificities by predicting topics or domains of upcoming tweets, and then using this information to predict their sentiment. Results indicate that applying the English state-of-the-art method to Arabic has achieved solid results without significant enhancements. Furthermore, the topic-based method ranked 1st in subtasks C and E, and 2nd in subtask D.

OMAM at SemEval-2017 Task 4: English Sentiment Analysis with Conditional Random Fields
Chukwuyem Onyibe | Nizar Habash
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe a supervised system that uses optimized Conditional Random Fields and lexical features to predict the sentiment of a tweet. The system was submitted to the English version of all subtasks in SemEval-2017 Task 4.

2016

The Columbia University - New York University Abu Dhabi SIGMORPHON 2016 Morphological Reinflection Shared Task Submission
Dima Taji | Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Analysis of Foreign Language Teaching Methods: An Automatic Readability Approach
Nasser Zalmout | Hind Saddiki | Nizar Habash
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

Much research in education has been done on the study of different language teaching methods. However, there has been little investigation using computational analysis to compare such methods in terms of readability or complexity progression. In this paper, we make use of existing readability scoring techniques and our own classifiers to analyze the textbooks used in two very different teaching methods for English as a Second Language: the grammar-based and the communicative methods. Our analysis indicates that the grammar-based curriculum shows a more coherent readability progression compared to the communicative curriculum. This finding corroborates the expectations about the differences between these two methods and validates our approach’s value in comparing different teaching methods quantitatively.

DALILA: The Dialectal Arabic Linguistic Learning Assistant
Salam Khalifa | Houda Bouamor | Nizar Habash
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP). The number and sophistication of tools and datasets in DA are very limited in comparison to Modern Standard Arabic (MSA) and other languages. MSA tools do not effectively model DA which makes the direct use of MSA NLP tools for handling dialects impractical. This is particularly a challenge for the creation of tools to support learning Arabic as a living language on the web, where authentic material can be found in both MSA and DA. In this paper, we present the Dialectal Arabic Linguistic Learning Assistant (DALILA), a Chrome extension that utilizes cutting-edge Arabic dialect NLP research to assist learners and non-native speakers in understanding text written in either MSA or DA. DALILA provides dialectal word analysis and English gloss corresponding to each word.

pdf bib
Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
Faisal Al-Shargi | Aidan Kaplan | Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present new language resources for Moroccan and Sanaani Yemeni Arabic. The resources include corpora for each dialect which have been morphologically annotated, and morphological analyzers for each dialect which are derived from these corpora. These are the first sets of resources for Moroccan and Yemeni Arabic. The resources will be made available to the public.

pdf bib
Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
Wajdi Zaghouani | Nizar Habash | Ossama Obeid | Behrang Mohit | Houda Bouamor | Kemal Oflazer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present our guidelines and annotation procedure for creating a human-corrected, post-edited corpus of machine-translated Modern Standard Arabic text. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can help accelerate the human revision of translated texts. The creation of any manually annotated corpus usually presents many challenges. To address these challenges, we created comprehensive and simplified annotation guidelines, which were used by a team of five annotators and one lead annotator. To ensure high agreement between the annotators, multiple training sessions were held and inter-annotator agreement was measured regularly to check annotation quality. The resulting corpus of manually post-edited English-to-Arabic article translations is the largest to date for this language pair.

pdf bib
Applying the Cognitive Machine Translation Evaluation Approach to Arabic
Irina Temnikova | Wajdi Zaghouani | Stephan Vogel | Nizar Habash
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The goal of the cognitive machine translation (MT) evaluation approach is to build classifiers that assign post-editing effort scores to new texts. The approach helps estimate fair compensation for post-editors in the translation industry by evaluating the cognitive difficulty of post-editing MT output. It counts the number of errors in different categories, classified by how much cognitive effort they require to be corrected. In this paper, we present the results of applying an existing cognitive evaluation approach to Modern Standard Arabic (MSA). We compare the number and categories of errors in three MSA texts of different MT quality (without any language-specific adaptation), and we compare MSA texts with texts from three Indo-European languages (Russian, Spanish, and Bulgarian) taken from a previous experiment. The results show how the error distributions change when moving from MSA texts of worse MT quality to MSA texts of better MT quality, as well as a similarity across all four languages in distinguishing the texts of better MT quality.

pdf bib
SPLIT: Smart Preprocessing (Quasi) Language Independent Tool
Mohamed Al-Badrashiny | Arfath Pasha | Mona Diab | Nizar Habash | Owen Rambow | Wael Salloum | Ramy Eskander
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Text preprocessing is an important and necessary task for all NLP applications. A simple variation in any preprocessing step may drastically affect the final results. Moreover, replicability and comparability, as far as feasible, are among the goals of our scientific enterprise; thus, building systems that ensure consistency across our various pipelines would contribute significantly to these goals. The problem has become quite pronounced as more and more NLP tools become available, with differing levels of specification. In this paper, we present a dynamic unified preprocessing framework and tool, SPLIT, that is highly configurable based on user requirements and serves as a preprocessing tool for several tools at once. SPLIT aims to standardize the implementations of the most important preprocessing steps by providing a unified API that can be shared across researchers to ensure complete transparency in replication. Users can select the required preprocessing tasks from a long list of preprocessing steps and specify their order of execution, which in turn affects the final preprocessing output.

pdf bib
A Large Scale Corpus of Gulf Arabic
Salam Khalifa | Nizar Habash | Dana Abdulrahim | Sara Hassan
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Most Arabic natural language processing tools and resources are developed to serve Modern Standard Arabic (MSA), which is the official written language in the Arab World. Some Dialectal Arabic varieties, notably Egyptian Arabic, have received some attention lately and have a growing collection of resources that include annotated corpora and morphological analyzers and taggers. Gulf Arabic, however, lags behind in that respect. In this paper, we present the Gumar Corpus, a large-scale corpus of Gulf Arabic consisting of 110 million words from 1,200 forum novels. We annotate the corpus for sub-dialect information at the document level. We also present results of a preliminary study in the morphological annotation of Gulf Arabic which includes developing guidelines for a conventional orthography. The text of the corpus is publicly browsable through a web interface we developed for it.

pdf bib
Exploiting Arabic Diacritization for High Quality Automatic Annotation
Nizar Habash | Anas Shahrour | Muhamed Al-Khalil
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a novel technique for Arabic morphological annotation. The technique utilizes diacritization to produce morphological annotations of quality comparable to human annotators. Although Arabic text is generally written without diacritics, diacritization is already available for large corpora of Arabic text in several genres. Furthermore, diacritization can be generated at a low cost for new text as it does not require specialized training beyond what educated Arabic typists know. The basic approach is to enrich the input to a state-of-the-art Arabic morphological analyzer with word diacritics (full or partial) to enhance its performance. When applied to fully diacritized text, our approach produces annotations with an accuracy of over 97% on lemma, part-of-speech, and tokenization combined.

pdf bib
Arabic Corpora for Credibility Analysis
Ayman Al Zaatari | Rim El Ballouli | Shady ELbassouni | Wassim El-Hajj | Hazem Hajj | Khaled Shaban | Nizar Habash | Emad Yahya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

A significant portion of the data generated on blogging and microblogging websites is non-credible, as shown in many recent studies. To filter out such non-credible information, machine learning can be used to build automatic credibility classifiers. However, as is the case with most supervised machine learning approaches, a sufficiently large and accurate training dataset must be available. In this paper, we focus on building a public Arabic corpus of blogs and microblogs that can be used for credibility classification. We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and the lack of any such public corpora in Arabic. We discuss our data acquisition approach and annotation process, provide a rigorous analysis of the annotated data, and finally report results on the effectiveness of our data for credibility classification.

pdf bib
Machine Translation Evaluation for Arabic using Morphologically-enriched Embeddings
Francisco Guzmán | Houda Bouamor | Ramy Baly | Nizar Habash
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Evaluation of machine translation (MT) into morphologically rich languages (MRL) has not been well studied despite posing many challenges. In this paper, we explore the use of embeddings obtained from different levels of lexical and morpho-syntactic linguistic analysis and show that they improve MT evaluation into an MRL. Specifically, we report on Arabic, a language with complex and rich morphology. Our results show that a neural network model with different input representations clearly outperforms the state of the art for MT evaluation into Arabic, with an increase of roughly 75% in correlation with human judgments on a pairwise MT evaluation quality task. More importantly, we demonstrate the usefulness of morpho-syntactic representations for modeling sentence similarity in MT evaluation and for addressing complex linguistic phenomena of Arabic.

pdf bib
Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine
Ramy Eskander | Nizar Habash | Owen Rambow | Arfath Pasha
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Arabic dialects present a special problem for natural language processing because they have few resources, have no standard orthography, and have not been studied much. However, as more and more written dialectal Arabic is found in social media, NLP for Arabic dialects becomes an important goal. We present a methodology for creating a morphological analyzer and a morphological tagger for dialectal Arabic, and we illustrate it on Egyptian and Levantine Arabic. To our knowledge, these are the first analyzer and tagger for Levantine.

pdf bib
Botta: An Arabic Dialect Chatbot
Dana Abu Ali | Nizar Habash
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents BOTTA, the first Arabic dialect chatbot. We explore the challenges of creating a conversational agent that aims to simulate friendly conversations using the Egyptian Arabic dialect. We present a number of solutions and describe the different components of the BOTTA chatbot. The BOTTA database files are publicly available for researchers working on Arabic chatbot technologies. The BOTTA chatbot is also publicly available for any users who want to chat with it online.

pdf bib
YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer
Salam Khalifa | Nasser Zalmout | Nizar Habash
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator. Our system is almost five times faster than the state-of-the-art MADAMIRA system, with slightly lower quality. In addition to speed, YAMAMA outputs a rich representation that allows for a wider spectrum of uses. In this regard, YAMAMA transcends other systems, such as FARASA, which is faster but provides specific outputs catering to specific applications.

pdf bib
CamelParser: A system for Arabic Syntactic Analysis and Morphological Disambiguation
Anas Shahrour | Salam Khalifa | Dima Taji | Nizar Habash
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

In this paper, we present CamelParser, a state-of-the-art system for Arabic syntactic dependency analysis aligned with contextually disambiguated morphological features. CamelParser uses a state-of-the-art morphological disambiguator and improves its results using syntactically driven features. The system offers a number of output formats that include basic dependency with morphological features, two tree visualization modes, and traditional Arabic grammatical analysis.

2015

pdf bib
Predicting the Structure of Cooking Recipes
Jermsak Jermsurawong | Nizar Habash
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improving Arabic Diacritization through Syntactic Analysis
Anas Shahrour | Salam Khalifa | Nizar Habash
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Correction Annotation for Non-Native Arabic Texts: Guidelines and Corpus
Wajdi Zaghouani | Nizar Habash | Houda Bouamor | Alla Rozovskaya | Behrang Mohit | Abeer Heider | Kemal Oflazer
Proceedings of The 9th Linguistic Annotation Workshop

pdf bib
Proceedings of the Second Workshop on Arabic Natural Language Processing
Nizar Habash | Stephan Vogel | Kareem Darwish
Proceedings of the Second Workshop on Arabic Natural Language Processing

pdf bib
The Second QALB Shared Task on Automatic Text Correction for Arabic
Alla Rozovskaya | Houda Bouamor | Nizar Habash | Wajdi Zaghouani | Ossama Obeid | Behrang Mohit
Proceedings of the Second Workshop on Arabic Natural Language Processing

pdf bib
POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools
Ahmed Hamdi | Alexis Nasr | Nizar Habash | Núria Gala
Proceedings of the Second Workshop on Arabic Natural Language Processing

pdf bib
A Conventional Orthography for Algerian Arabic
Houda Saadane | Nizar Habash
Proceedings of the Second Workshop on Arabic Natural Language Processing

pdf bib
Annotating Targets of Opinions in Arabic using Crowdsourcing
Noura Farra | Kathy McKeown | Nizar Habash
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

pdf bib
Unsupervised Morphology-Based Vocabulary Expansion
Mohammad Sadegh Rasooli | Thomas Lippincott | Nizar Habash | Owen Rambow
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Generalized Character-Level Spelling Error Correction
Noura Farra | Nadi Tomeh | Alla Rozovskaya | Nizar Habash
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Sentence Level Dialect Identification for Machine Translation System Selection
Wael Salloum | Heba Elfardy | Linda Alamir-Salloum | Nizar Habash | Mona Diab
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Developing an Egyptian Arabic Treebank: Impact of Dialectal Morphology on Annotation and Tool Development
Mohamed Maamouri | Ann Bies | Seth Kulick | Michael Ciul | Nizar Habash | Ramy Eskander
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon
Mona Diab | Mohamed Al-Badrashiny | Maryam Aminian | Mohammed Attia | Heba Elfardy | Nizar Habash | Abdelati Hawwari | Wael Salloum | Pradeep Dasigi | Ramy Eskander
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
A Conventional Orthography for Tunisian Arabic
Inès Zribi | Rahma Boujelbane | Abir Masmoudi | Mariem Ellouze | Lamia Belguith | Nizar Habash
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition
Abir Masmoudi | Mariem Ellouze Khmekhem | Yannick Estève | Lamia Hadrich Belguith | Nizar Habash
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
A Multidialectal Parallel Corpus of Arabic
Houda Bouamor | Nizar Habash | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic
Arfath Pasha | Mohamed Al-Badrashiny | Mona Diab | Ahmed El Kholy | Ramy Eskander | Nizar Habash | Manoj Pooleery | Owen Rambow | Ryan Roth
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
Large Scale Arabic Error Annotation: Guidelines and Framework
Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Ossama Obeid | Nadi Tomeh | Alla Rozovskaya | Noura Farra | Sarah Alkuhlani | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

bib
Natural Language Processing of Arabic and its Dialects
Mona Diab | Nizar Habash
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

This tutorial introduces the different challenges and current solutions to the automatic processing of Arabic and its dialects. The tutorial has two parts: First, we present a discussion of generic issues relevant to Arabic NLP and detail dialectal linguistic issues and the challenges they pose for NLP. In the second part, we review the state-of-the-art in Arabic processing covering several enabling technologies and applications, e.g., dialect identification, morphological processing (analysis, disambiguation, tokenization, POS tagging), parsing, and machine translation.

pdf bib
Automatic Transliteration of Romanized Dialectal Arabic
Mohamed Al-Badrashiny | Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf bib
The Illinois-Columbia System in the CoNLL-2014 Shared Task
Alla Rozovskaya | Kai-Wei Chang | Mark Sammons | Dan Roth | Nizar Habash
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)
Nizar Habash | Stephan Vogel
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
Building a Corpus for Palestinian Arabic: a Preliminary Study
Mustafa Jarrar | Nizar Habash | Diyam Akra | Nasser Zalmout
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
The First QALB Shared Task on Automatic Text Correction for Arabic
Behrang Mohit | Alla Rozovskaya | Nizar Habash | Wajdi Zaghouani | Ossama Obeid
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus
Ann Bies | Zhiyi Song | Mohamed Maamouri | Stephen Grimes | Haejoong Lee | Jonathan Wright | Stephanie Strassel | Nizar Habash | Ramy Eskander | Owen Rambow
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
A Pipeline Approach to Supervised Error Correction for the QALB-2014 Shared Task
Nadi Tomeh | Nizar Habash | Ramy Eskander | Joseph Le Roux
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
The Columbia System in the QALB-2014 Shared Task on Arabic Error Correction
Alla Rozovskaya | Nizar Habash | Ramy Eskander | Noura Farra | Wael Salloum
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
A Large Scale Arabic Sentiment Lexicon for Arabic Opinion Mining
Gilbert Badaro | Ramy Baly | Hazem Hajj | Nizar Habash | Wassim El-Hajj
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic
Serena Jeblee | Weston Feely | Houda Bouamor | Alon Lavie | Nizar Habash | Kemal Oflazer
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

pdf bib
Foreign Words and the Automatic Processing of Arabic Social Media Text Written in Roman Script
Ramy Eskander | Mohamed Al-Badrashiny | Nizar Habash | Owen Rambow
Proceedings of the First Workshop on Computational Approaches to Code Switching

pdf bib
INVITED TALK 1: Computational Processing of Arabic Dialects
Nizar Habash
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants

2013

pdf bib
Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
Ramy Eskander | Nizar Habash | Owen Rambow
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
Ahmed El Kholy | Nizar Habash | Gregor Leusch | Evgeny Matusov | Hassan Sawaf
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Reranking with Linguistic and Semantic Features for Arabic Optical Character Recognition
Nadi Tomeh | Nizar Habash | Ryan Roth | Noura Farra | Pradeep Dasigi | Mona Diab
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Translating verbs between MSA and Arabic dialects through deep morphological analysis (Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde) [in French]
Ahmed Hamdi | Rahma Boujelbane | Nizar Habash | Alexis Nasr
Proceedings of TALN 2013 (Volume 1: Long Papers)

pdf bib
Dialectal Arabic to English Machine Translation: Pivoting through Modern Standard Arabic
Wael Salloum | Nizar Habash
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Morphological Analysis and Disambiguation for Dialectal Arabic
Nizar Habash | Ryan Roth | Owen Rambow | Ramy Eskander | Nadi Tomeh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Automatic Morphological Enrichment of a Morphologically Underspecified Treebank
Sarah Alkuhlani | Nizar Habash | Ryan Roth
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Processing Spontaneous Orthography
Ramy Eskander | Nizar Habash | Owen Rambow | Nadi Tomeh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Automatic Correction and Extension of Morphological Annotations
Ramy Eskander | Nizar Habash | Ann Bies | Seth Kulick | Mohamed Maamouri
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
SPMRL’13 Shared Task System: The CADIM Arabic Dependency Parser
Yuval Marton | Nizar Habash | Owen Rambow | Sarah Alkhulani
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

pdf bib
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Djamé Seddah | Reut Tsarfaty | Sandra Kübler | Marie Candito | Jinho D. Choi | Richárd Farkas | Jennifer Foster | Iakes Goenaga | Koldo Gojenola Galletebeitia | Yoav Goldberg | Spence Green | Nizar Habash | Marco Kuhlmann | Wolfgang Maier | Joakim Nivre | Adam Przepiórkowski | Ryan Roth | Wolfgang Seeker | Yannick Versley | Veronika Vincze | Marcin Woliński | Alina Wróblewska | Eric Villemonte de la Clergerie
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

pdf bib
Orthographic and Morphological Processing for Persian-to-English Statistical Machine Translation
Mohammad Sadegh Rasooli | Ahmed El Kholy | Nizar Habash
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Selective Combination of Pivot and Direct Statistical Machine Translation Models
Ahmed El Kholy | Nizar Habash | Gregor Leusch | Evgeny Matusov | Hassan Sawaf
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
A Web-based Annotation Framework For Large-Scale Text Correction
Ossama Obeid | Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Kemal Oflazer | Nadi Tomeh
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

pdf bib
DIRA: Dialectal Arabic Information Retrieval Assistant
Arfath Pasha | Mohammad Al-Badrashiny | Mohamed Altantawy | Nizar Habash | Manoj Pooleery | Owen Rambow | Ryan M. Roth | Mona Diab
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

pdf bib
Dependency Parsing of Modern Standard Arabic with Lexical and Inflectional Features
Yuval Marton | Nizar Habash | Owen Rambow
Computational Linguistics, Volume 39, Issue 1 - March 2013

2012

pdf bib
Conventional Orthography for Dialectal Arabic
Nizar Habash | Mona Diab | Owen Rambow
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

pdf bib
Elissa: A Dialectal to Standard Arabic Machine Translation System
Wael Salloum | Nizar Habash
Proceedings of COLING 2012: Demonstration Papers

pdf bib
Rich Morphology Generation Using Statistical Machine Translation
Ahmed El Kholy | Nizar Habash
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

pdf bib
A Morphological Analyzer for Egyptian Arabic
Nizar Habash | Ramy Eskander | Abdelati Hawwari
Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology

pdf bib
Identifying Broken Plurals, Irregular Gender, and Rationality in Arabic Text
Sarah Alkuhlani | Nizar Habash
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Arabic Dialect Processing Tutorial
Mona Diab | Nizar Habash
Tutorial Abstracts at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

pdf bib
Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
Nizar Habash | Ryan Roth
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
Yuval Marton | Nizar Habash | Owen Rambow
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality
Sarah Alkuhlani | Nizar Habash
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Fuzzy Syntactic Reordering for Phrase-based Statistical Machine Translation
Jacob Andreas | Nizar Habash | Owen Rambow
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Filtering Antonymous, Trend-Contrasting, and Polarity-Dissimilar Distributional Paraphrases for Improving Statistical Machine Translation
Yuval Marton | Ahmed El Kholy | Nizar Habash
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation
Wael Salloum | Nizar Habash
Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties

pdf bib
One-Step Statistical Parsing of Hybrid Dependency-Constituency Syntactic Representations
Kais Dukes | Nizar Habash
Proceedings of the 12th International Conference on Parsing Technologies

pdf bib
Fast Yet Rich Morphological Analysis
Mohamed Altantawy | Nizar Habash | Owen Rambow
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing

2010

pdf bib
Improving Arabic Dependency Parsing with Lexical and Inflectional Morphological Features
Yuval Marton | Nizar Habash | Owen Rambow
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

pdf bib
Morphological Annotation of Quranic Arabic
Kais Dukes | Nizar Habash
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf bib
Morphological Analysis and Generation of Arabic Nouns: A Morphemic Functional Approach
Mohamed Altantawy | Nizar Habash | Owen Rambow | Ibrahim Saleh
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf bib
Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
Marine Carpuat | Yuval Marton | Nizar Habash
Proceedings of the ACL 2010 Conference Short Papers

2009

pdf bib
Improving the Arabic Pronunciation Dictionary for Phone and Word Recognition with Linguistically-Based Pronunciation Rules
Fadi Biadsy | Nizar Habash | Julia Hirschberg
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
CATiB: The Columbia Arabic Treebank
Nizar Habash | Ryan Roth
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Improving Arabic-Chinese Statistical Machine Translation using English as Pivot Language
Nizar Habash | Jun Hu
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Spoken Arabic Dialect Identification Using Phonotactic Modeling
Fadi Biadsy | Julia Hirschberg | Nizar Habash
Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages

pdf bib
Syntactic Reordering for English-Arabic Phrase-Based Machine Translation
Jakob Elming | Nizar Habash
Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages

2008

pdf bib
Improving NER in Arabic Using a Morphological Tagger
Benjamin Farber | Dayne Freitag | Nizar Habash | Owen Rambow
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

pdf bib
Identification of Naturally Occurring Numerical Expressions in Arabic
Nizar Habash | Ryan Roth
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

pdf bib
Using Shallow Syntax Information to Improve Word Alignment and Reordering for SMT
Josep M. Crego | Nizar Habash
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation
Nizar Habash
Proceedings of ACL-08: HLT, Short Papers

pdf bib
Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking
Ryan Roth | Owen Rambow | Nizar Habash | Mona Diab | Cynthia Rudin
Proceedings of ACL-08: HLT, Short Papers

2007

pdf bib
Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features
Nizar Habash | Ryan Gabbard | Owen Rambow | Seth Kulick | Mitch Marcus
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Combination of Statistical Word Alignments Based on Multiple Preprocessing Schemes
Jakob Elming | Nizar Habash
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf bib
Arabic Diacritization through Full Morphological Tagging
Nizar Habash | Owen Rambow
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf bib
Arabic Dialect Processing Tutorial
Mona Diab | Nizar Habash
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2006

pdf bib
Arabic Preprocessing Schemes for Statistical Machine Translation
Nizar Habash | Fatiha Sadat
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Developing and Using a Pilot Dialectal Arabic Treebank
Mohamed Maamouri | Ann Bies | Tim Buckwalter | Mona Diab | Nizar Habash | Owen Rambow | Dalila Tabessi
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Inter-annotator Agreement on a Multilingual Semantic Annotation Task
Rebecca Passonneau | Nizar Habash | Owen Rambow
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Parallel Syntactic Annotation of Multiple Languages
Owen Rambow | Bonnie Dorr | David Farwell | Rebecca Green | Nizar Habash | Stephen Helmreich | Eduard Hovy | Lori Levin | Keith J. Miller | Teruko Mitamura | Florence Reeder | Advaith Siddharthan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Design, Construction and Validation of an Arabic-English Conceptual Interlingua for Cross-lingual Information Retrieval
Nizar Habash | Clinton Mah | Sabiha Imran | Randy Calistri-Yeh | Páraic Sheridan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Parsing Arabic Dialects
David Chiang | Mona Diab | Nizar Habash | Owen Rambow | Safiullah Shareef
11th Conference of the European Chapter of the Association for Computational Linguistics

Combination of Arabic Preprocessing Schemes for Statistical Machine Translation
Fatiha Sadat | Nizar Habash
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects
Nizar Habash | Owen Rambow
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Kareem Darwish | Mona Diab | Nizar Habash
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

Morphological Analysis and Generation for Arabic Dialects
Nizar Habash | Owen Rambow | George Kiraz
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop
Nizar Habash | Owen Rambow
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Interlingual Annotation of Multilingual Text Corpora
Stephen Helmreich | David Farwell | Bonnie Dorr | Nizar Habash | Lori Levin | Teruko Mitamura | Florence Reeder | Keith Miller | Eduard Hovy | Owen Rambow | Advaith Siddharthan
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004

2003

A Categorial Variation Database for English
Nizar Habash | Bonnie Dorr
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

2002

Generation-Heavy Hybrid Machine Translation
Nizar Habash
Proceedings of the International Natural Language Generation Conference

2000

Generation from Lexical Conceptual Structures
David Traum | Nizar Habash
NAACL-ANLP 2000 Workshop: Applied Interlinguas: Practical Applications of Interlingual Approaches to NLP
