Difference between revisions of "Resources for Polish"
Bilbao451f (talk | contribs) |
Line 1:
==Corpora==
* [http://korpus.pl/en/ IPI PAN Corpus] - The IPI PAN Corpus is a large (currently over 250 million segments), morphosyntactically annotated, publicly available corpus of Polish, developed by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS)
+ * [http://korpus.pwn.pl/ PWN Corpus] - PWN has prepared and made available an online version of the Corpus of Polish consisting of 40 million words. The samples were taken from 386 books, 977 editions selected from 185 different press publications, 84 transcribed spoken texts, 207 web sites and several hundred advertising leaflets and other ephemera. The full version of the corpus is available on payment for access, while a demonstration version of over 7.5 million words is available free of charge.
==Parsers==
Revision as of 07:26, 7 December 2008
Corpora
- IPI PAN Corpus - The IPI PAN Corpus is a large (currently over 250 million segments), morphosyntactically annotated, publicly available corpus of Polish, developed by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS).
- PWN Corpus - PWN offers an online version of its Corpus of Polish, which consists of 40 million words. The samples were taken from 386 books, 977 editions selected from 185 different press publications, 84 transcribed spoken texts, 207 web sites and several hundred advertising leaflets and other ephemera. Access to the full corpus requires payment, while a demonstration version of over 7.5 million words is available free of charge.
Parsers
- Spejd - Shallow Parsing and Disambiguation Engine
- Świgra - a DCG Parser of Polish
- Dawid Weiss - lemmatizer for Polish
Lexical resources
Bibliography
External links
- Polish linguistics mailing list - mainly in Polish