Difference between revisions of "Resources for Persian"
(20 intermediate revisions by 8 users not shown) | |||
Line 1: | Line 1: | ||
− | == | + | == Corpora == |
+ | ===Free=== | ||
+ | *[http://www.ling.ohio-state.edu/~jonsafari/corpora VOA Persian Corpus 2003-2008] (public domain) | ||
+ | *[https://www.clarin.si/repository/xmlui/handle/11356/1042 Orwell's 1984 Corpus in MULTEXT-EAST] (public domain) | ||
+ | |||
+ | ===Proprietary=== | ||
+ | <!-- Please keep this list in alphabetical order --> | ||
+ | *[http://ece.ut.ac.ir/DBRG/Bijankhan/ Bijankhan corpus] (gratis for research/non-commercial purposes) | ||
+ | *[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S50 CALLFRIEND Farsi (speech)], LDC | ||
+ | *[http://ece.ut.ac.ir/dbrg/hamshahri/ Hamshahri corpus] (gratis for research/non-commercial purposes) | ||
+ | *[http://www.elda.org/catalogue/en/speech/S0112.html Persian speech database Farsdat], ELRA | ||
− | ===Free | + | == Online Concordance Tools == |
+ | *[http://pars.ie/lr/corpora/run.cgi/corp_info?corpname=multext_east_farsi Orwell's 1984 Corpus] (public domain) | ||
+ | ==Lexical resources== | ||
+ | ===Free=== | ||
+ | *[http://www.ling.ohio-state.edu/~jonsafari/corpora/wikipedia_fa-en_20120217.txt.xz Persian - English dictionary], derived from Wikipedia article names. Retains Wikipedia's CC-BY-SA 3.0 license. | ||
+ | |||
+ | ===Proprietary=== | ||
+ | *[http://pwn.ir Persian WordNet] | ||
+ | |||
+ | *[http://catalog.elra.info/product_info.php?products_id=1126 ELRA Persian Lexicon, ISLRN : 547-614-436-004-7] | ||
+ | |||
+ | ==Machine translation== | ||
+ | ===Free=== | ||
+ | *[http://ece.ut.ac.ir/node/100869?destination=node%2F100869 Tehran English-Persian Parallel Corpus] by Mohammad Taher Pilevar, NLP Lab, University of Tehran. For research or non-commercial use. | ||
===Proprietary=== | ===Proprietary=== | ||
Line 7: | Line 30: | ||
==Morphology tools== | ==Morphology tools== | ||
− | ===Free | + | ===Free=== |
*[http://sourceforge.net/projects/perstem Perstem] - Persian stemmer, light morphological analyzer, and character set converter. | *[http://sourceforge.net/projects/perstem Perstem] - Persian stemmer, light morphological analyzer, and character set converter. | ||
− | *[http://apertium.svn.sourceforge.net/svnroot/apertium/ | + | *[http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tg-fa/apertium-tg-fa.fa.dix Morphological dictionary] — compiled using [[lttoolbox]]. |
+ | *[http://stp.lingfil.uu.se/~mojgan/ BLARK by Mojgan Seraji] – normaliser, tokeniser, segmentation, hunpos model for PoS-tagging and (java) dependency parser, all GPL | ||
− | == | + | ==Parsing== |
− | ===Free | + | ===Free=== |
− | *[http:// | + | * [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style. |
+ | * [http://www.ling.ohio-state.edu/~jonsafari/persianlg/ Persian dictionaries] for the [http://www.abisource.com/projects/link-grammar/ Link-Grammar parser]. By [http://www.ling.ohio-state.edu/~jonsafari/ Jon Dehdari]. These require the Perstem stemming package, above. | ||
+ | * [http://stp.lingfil.uu.se/~mojgan/UPDT.html Uppsala Persian Dependency Treebank], Creative Commons Attribution 3.0 License | ||
===Proprietary=== | ===Proprietary=== | ||
− | + | *[http://dadegan.ir/en/persiandependencytreebank Dadegan Dependency Treebank] for research purposes only. | |
− | *[http:// | + | *[http://hpsg.fu-berlin.de/~ghayoomi/PTB.html HPSG Persian Treebank (PerTreeBank)] for academic research purposes only. |
− | *[http:// | ||
− | |||
− | |||
− | |||
− | |||
− | |||
==Bibliography== | ==Bibliography== | ||
* Dehdari, Jon, and Deryle Lonsdale. 2008. [http://www.ling.ohio-state.edu/~jonsafari/papers/dehdari_lonsdale_2005.pdf A link grammar parser for Persian]. In Karimi, S., Samiian, V., and Stilo, D., editors, ''Aspects of Iranian Linguistics'', volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3 ([http://www.ling.ohio-state.edu/~jonsafari/bib/dehdarilonsdale2005.bib.txt BIB]) | * Dehdari, Jon, and Deryle Lonsdale. 2008. [http://www.ling.ohio-state.edu/~jonsafari/papers/dehdari_lonsdale_2005.pdf A link grammar parser for Persian]. In Karimi, S., Samiian, V., and Stilo, D., editors, ''Aspects of Iranian Linguistics'', volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3 ([http://www.ling.ohio-state.edu/~jonsafari/bib/dehdarilonsdale2005.bib.txt BIB]) | ||
+ | |||
+ | * QasemiZadeh, Behrang and Rahimi Saeed. Persian in MULTEXT-East Framework, FinTAL, 2006, pp 541-551 ([http://pars.ie/publications/papers/pre-prints/persian-in-multext-east.pdf]). | ||
* Feili, H. and G. Ghassem-Sani (2004) "[http://sharif.edu/~sani/papers/Feili_SaniE2.pdf An Application of Lexicalized Grammars in English-Persian Translation]". ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', 24-27 Aug. 2004, Universidad Politecnica de Valencia, Valencia, Spain, pp. 596-600. | * Feili, H. and G. Ghassem-Sani (2004) "[http://sharif.edu/~sani/papers/Feili_SaniE2.pdf An Application of Lexicalized Grammars in English-Persian Translation]". ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', 24-27 Aug. 2004, Universidad Politecnica de Valencia, Valencia, Spain, pp. 596-600. | ||
* Megerdoomian, K. (2000) "[http://crl.nmsu.edu/Research/Projects/shiraz/publications/papers/Cicling.pdf Unification-Based Persian Morphology]". ''Proceedings of CICLing 2000'', Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico, 2000. | * Megerdoomian, K. (2000) "[http://crl.nmsu.edu/Research/Projects/shiraz/publications/papers/Cicling.pdf Unification-Based Persian Morphology]". ''Proceedings of CICLing 2000'', Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico, 2000. | ||
* Megerdoomian, K. (2004) "[http://acl.ldc.upenn.edu/coling2004/W5/pdf/W5-7.pdf Finite-State Morphological Analysis of Persian]". ''COLING 2004 Computational Approaches to Arabic Script-based Languages''. Ali Farghaly and Karine Megerdoomian editors, Geneva, Switzerland, 2004, pgs. 35-41. | * Megerdoomian, K. (2004) "[http://acl.ldc.upenn.edu/coling2004/W5/pdf/W5-7.pdf Finite-State Morphological Analysis of Persian]". ''COLING 2004 Computational Approaches to Arabic Script-based Languages''. Ali Farghaly and Karine Megerdoomian editors, Geneva, Switzerland, 2004, pgs. 35-41. | ||
+ | * Mohammad Amin Farajian (2011). [http://world-comp.org/p2011/ICA4953.pdf PEN: Parallel English-Persian News Corpus]. Proceedings of 2011 International Conference on Artificial Intelligence (ICAI'11), Nevada, USA. | ||
==See also== | ==See also== | ||
+ | *[[Resources for Kurdish]] | ||
*[[Resources for Tajik]] | *[[Resources for Tajik]] | ||
==External links== | ==External links== | ||
− | *[http:// | + | *https://wiki.iranianlinguistics.org/wiki/Main_Page: NLP Resources for Persian] |
+ | *[http://www.ling.ohio-state.edu/~jonsafari/persian_nlp.html the Jon safari] (link parser, small lexicon, stemmer, morphological analysis tools) | ||
[[Category:Resources by language|Persian]] | [[Category:Resources by language|Persian]] |
Latest revision as of 09:58, 23 February 2016
Corpora
Free
- VOA Persian Corpus 2003-2008 (public domain)
- Orwell's 1984 Corpus in MULTEXT-EAST (public domain)
Proprietary
- Bijankhan corpus (gratis for research/non-commercial purposes)
- CALLFRIEND Farsi (speech), LDC
- Hamshahri corpus (gratis for research/non-commercial purposes)
- Persian speech database Farsdat, ELRA
Online Concordance Tools
- Orwell's 1984 Corpus (public domain)
Lexical resources
Free
- Persian - English dictionary, derived from Wikipedia article names. Retains Wikipedia's CC-BY-SA 3.0 license.
Proprietary
- Persian WordNet
- ELRA Persian Lexicon, ISLRN: 547-614-436-004-7
Machine translation
Free
- Tehran English-Persian Parallel Corpus by Mohammad Taher Pilevar, NLP Lab, University of Tehran. For research or non-commercial use.
Proprietary
- The Shiraz project (Persian -> English)
Morphology tools
Free
- Perstem - Persian stemmer, light morphological analyzer, and character set converter.
- Morphological dictionary — compiled using lttoolbox.
- BLARK by Mojgan Seraji – normaliser, tokeniser, segmentation, hunpos model for PoS-tagging and (java) dependency parser, all GPL
Parsing
Free
- HamleDT, harmonized dependency treebanks of many languages, common annotation style.
- Persian dictionaries for the Link-Grammar parser. By Jon Dehdari. These require the Perstem stemming package, above.
- Uppsala Persian Dependency Treebank, Creative Commons Attribution 3.0 License
Proprietary
- Dadegan Dependency Treebank for research purposes only.
- HPSG Persian Treebank (PerTreeBank) for academic research purposes only.
Bibliography
- Dehdari, Jon, and Deryle Lonsdale. 2008. A link grammar parser for Persian. In Karimi, S., Samiian, V., and Stilo, D., editors, Aspects of Iranian Linguistics, volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3 (BIB)
- QasemiZadeh, Behrang and Rahimi Saeed. Persian in MULTEXT-East Framework, FinTAL, 2006, pp. 541-551 (http://pars.ie/publications/papers/pre-prints/persian-in-multext-east.pdf).
- Feili, H. and G. Ghassem-Sani (2004) "An Application of Lexicalized Grammars in English-Persian Translation". Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), 24-27 Aug. 2004, Universidad Politecnica de Valencia, Valencia, Spain, pp. 596-600.
- Megerdoomian, K. (2000) "Unification-Based Persian Morphology". Proceedings of CICLing 2000, Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico, 2000.
- Megerdoomian, K. (2004) "Finite-State Morphological Analysis of Persian". COLING 2004 Computational Approaches to Arabic Script-based Languages. Ali Farghaly and Karine Megerdoomian, editors, Geneva, Switzerland, 2004, pp. 35-41.
- Mohammad Amin Farajian (2011). PEN: Parallel English-Persian News Corpus. Proceedings of 2011 International Conference on Artificial Intelligence (ICAI'11), Nevada, USA.
See also
- Resources for Kurdish
- Resources for Tajik
External links
- NLP Resources for Persian (https://wiki.iranianlinguistics.org/wiki/Main_Page)
- the Jon safari (link parser, small lexicon, stemmer, morphological analysis tools)