Resources for Finnish
From ACL Wiki
- Europarl corpus, sentence-aligned with English
- WMT News Crawl monolingual corpus. Currently 14M tokens.
- Finnish plain text and co-occurrences at LCC (Leipzig Corpora Collection)
- HamleDT, a collection of dependency treebanks for many languages, harmonized into a common annotation style.
- Araneum Finnicum, Gigaword Finnish web corpus
- Kielipankki, the Language Bank at the CSC Scientific Computing Centre, including some 200 million word tokens of Finnish text.
- Omorfi, an open morphology for Finnish, developed in association with the Voikko speller project (LGPL/GPL). For installation with HFST, see https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion.