Difference between revisions of "Resources for Arabic"

From ACL Wiki
 
(4 intermediate revisions by 4 users not shown)
Line 4: Line 4:
 
*[https://sourceforge.net/projects/aramorph/ AraMorph - Perl] - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
 
 
*[http://www.nongnu.org/aramorph/ AraMorph - Java] - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for [http://lucene.apache.org/ Lucene]
 
 +
*[http://sourceforge.net/projects/aracomlex/ AraComLex] - An open-source finite-state morphology for Modern Standard Arabic. The source files can be compiled with either the open-source compiler foma or Xerox xfst.
 +
* [https://github.com/mikahama/uralicNLP UralicNLP] - A Python library that provides morphological tagging, generation, lemmatization and disambiguation for many languages, including Arabic
  
 
===Proprietary===
 
 
*[http://www.arabic-morphology.com Xerox Arabic Morphological Analyzer and Generator]
 
 +
 +
==WordNets==
 +
 +
===Free software===
 +
* http://compling.hss.ntu.edu.sg/omw/ Arabic Wordnet with links to all the other Open Multilingual Wordnets
 +
 +
===Proprietary===
 +
* [http://babelnet.org/ BabelNet] (available for download for non-commercial use)
  
 
==Parsers==
 
Line 18: Line 28:
 
===Proprietary===
 
 
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1], 76 million tokens, annotation: paragraphs
 
 +
 +
==Diacritization==
 +
===Free software===
 +
*[https://github.com/mikahama/haracat hAraCat] - A free tool for predicting vowels and other diacritics.
  
 
===Free/open licence===
 
Line 24: Line 38:
 
* [http://www1.ccls.columbia.edu/~ybenajiba/downloads.html Arabic NER corpora] by [http://www1.ccls.columbia.edu/~ybenajiba/ Yassine Benajiba], 150,000+ words.
 
 
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 
 +
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style.
  
 
==Bibliography==
 

Latest revision as of 05:36, 29 June 2020

Morphology

Free software

  • AraMorph - Perl - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
  • AraMorph - Java - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for Lucene
  • AraComLex - An open-source finite-state morphology for Modern Standard Arabic. The source files can be compiled with either the open-source compiler foma or Xerox xfst.
  • UralicNLP - A Python library that provides morphological tagging, generation, lemmatization and disambiguation for many languages, including Arabic

Proprietary

WordNets

Free software

Proprietary

Parsers

Free software

Corpora

Proprietary

Diacritization

Free software

  • hAraCat - A free tool for predicting vowels and other diacritics.

Free/open licence

Bibliography

External links