Difference between revisions of "Resources for Bulgarian"

Latest revision as of 09:36, 26 May 2014

Machine translation systems

Free software

apertium-mk-bg: RBMT system between Macedonian and Bulgarian

Proprietary

WebTrance

Lexical resources

Morphological analysis

Free software

Morphological analyser: 8,581 lemmata, ~88% coverage over SETimes

Proprietary

Grammars

Proprietary

BulNet WordNet (21,444 synonym sets)
KPML generation grammar

Corpora

Free

Southeast European Times, sentence-aligned corpus of Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian and Turkish (approximately 4.5 million words per language)
Europarl corpus, sentence-aligned with English
HamleDT, harmonized dependency treebanks of many languages in a common annotation style

Proprietary

Corpus of spoken Bulgarian

Bibliography

External links

@@ Line 2: / Line 2: @@
 ===Free software===
+* [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-mk-bg apertium-mk-bg] RBMT system between Macedonian and Bulgarian
 ===Proprietary===
@@ Line 8: / Line 10: @@
 ==Lexical resources==
+===Morphological analysis===
+====Free software====
+* [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-mk-bg/apertium-mk-bg.bg.dix Morphological analyser] 8,581 lemmata, ~88% coverage over SETimes
+====Proprietary====
 == Grammars ==
-* [[Generation grammars|KPML generation grammar]]
 ===Proprietary===
 * [http://dcl.bas.bg/BulNet/general_en.html BulNet WordNet] (21,444 synonym sets)
+* [[Generation grammars|KPML generation grammar]]
 ==Corpora==
 ===Free===
-* [http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
-* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; approximately 4.5 million words per language)
+* [http://www.statmt.org/setimes/ Southeast European Times], sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; approximately 4.5 million words per language
+* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
+* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
 ===Proprietary===
+* [http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
 ==Bibliography==
