Resources for Croatian
General
- IHJJ - Institute of Croatian Language and Linguistics
- Croatian Language Technologies Portal - exhaustive lists of corpora, dictionaries, tools, associations, institutions and projects in language technology (LT). Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.
Corpora
- Croatian National Corpus - standard reference corpus of Croatian; synchronic, with texts from 1990 onwards, totalling 101.2 million tokens. Lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using the hybrid tagger CroTag and a lemmatiser. Under development at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb since 1998.
- Croatian Language Corpus - continuously growing corpus of Croatian (currently approx. 100 million tokens) covering various genres and time periods; uses PhiloLogic for online search.
Free
- Southeast European Times - sentence-aligned parallel corpus in Albanian, Bosnian, Bulgarian, Croatian, English, Greek, Macedonian, Romanian, Serbian and Turkish; approximately 4.5 million words per language.
Lexicons
Free
Proprietary
- Croatian Morphological Lexicon - Croatian inflectional lexicon comprising more than 110,000 lemmas, which yield more than 3.8 million word forms; freely searchable. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.