Zygmunt Vetulani


2020

Polish Lexicon-Grammar Development Methodology as an Example for Application to other Languages
Zygmunt Vetulani | Grażyna Vetulani
Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation

In this paper we present our methodology and propose it as a reference for creating lexicon-grammars. We share our long-term experience gained during past and ongoing research projects concerning the description of Polish using this approach. The methodology, which links semantics and syntax, has proved useful for various IT applications. Among others, we address this paper to researchers working on “less” or “middle-resourced” Indo-European languages as a proposal for long-term academic cooperation in the field. We believe that confronting our lexicon-grammar methodology with other languages (Indo-European, but also non-Indo-European languages of India, and Finno-Ugric or Turkic languages in Eurasia) will allow for a better understanding of how versatile our approach is and, last but not least, will create opportunities to intensify comparative studies. We present some of our work on language resources at the WILDRE workshop not only to take up the challenge set out in the CFP of this workshop, “To provide opportunity for researchers from India to collaborate with researchers from other parts of the world”, but also to generalize this challenge to other languages.

2016

Recent Advances in Development of a Lexicon-Grammar of Polish: PolNet 3.0
Zygmunt Vetulani | Grażyna Vetulani | Bartłomiej Kochanowski
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The granularity of PolNet (Polish Wordnet) is the main theoretical issue discussed in the paper. We describe the latest extension of PolNet, which adds valency information for simple verbs and noun-verb collocations using manual and machine-assisted methods. Valency is defined to include both semantic and syntactic selectional restrictions. We assume the valency structure of a verb to be an index of meaning. Consequently, we consider it an attribute of a synset. Strict application of this principle results in fine granularity of the verb section of the wordnet. Treating valency as a distinctive feature of synsets was an essential step in transforming the initial PolNet (first intended as a lexical ontology) into a lexicon-grammar. For the present refinement of PolNet we assume that the category of language register is a part of meaning. All PolNet 2.0 synsets are being revised in order to split those that contain words of different registers into register-uniform sub-synsets. We completed this operation for synsets that were used as values of semantic roles. The operation increased the number of considered synsets by 29%. In the paper we also report an extension of the class of collocation-based verb synsets.
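The register-based split described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure or PolNet's data format: the `Word` class, the register labels and the toy synset are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    lemma: str
    register: str  # hypothetical labels such as "neutral" or "colloquial"


def split_by_register(synset):
    """Split one synset (a list of Word) into register-uniform sub-synsets."""
    groups = defaultdict(list)
    for word in synset:
        groups[word.register].append(word)
    # Dicts keep insertion order, so sub-synsets appear in first-seen order.
    return list(groups.values())


# Toy data, not taken from PolNet.
synset = [
    Word("samochód", "neutral"),
    Word("auto", "neutral"),
    Word("bryka", "colloquial"),
]
sub_synsets = split_by_register(synset)
```

Under this assumption each sub-synset contains words of a single register, so a mixed-register synset yields more than one sub-synset, which is how such a split increases the synset count.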

2014

“PolNet - Polish WordNet” project: PolNet 2.0 - a short description of the release
Zygmunt Vetulani | Bartłomiej Kochanowski
Proceedings of the Seventh Global Wordnet Conference

2012

Wordnet Based Lexicon Grammar for Polish
Zygmunt Vetulani
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In the paper we present a long-term, ongoing project to build a lexicon-grammar of Polish. It is based on our earlier research focusing mainly on morphological dictionaries, text understanding and related tools. By Lexicon Grammars we mean grammatical formalisms which are based on the idea that the sentence is the fundamental unit of meaning and that grammatical information should be closely related to words. Organizing grammatical knowledge into a lexicon results in a powerful NLP tool, particularly well suited to support heuristic parsing. The project is inspired by the achievements of Maurice Gross, Kazimierz Polanski and George Miller. We present the current state of the wordnet-like lexical network PolNet, with particular emphasis on its verbal component, which is now being converted into the kernel of a lexicon grammar for Polish. We present various aspects of PolNet development and validation within the POLINT-112-SMS project, and report in detail on the current stage of the project.

2010

PolNet — Polish WordNet: Data and Tools
Zygmunt Vetulani | Marek Kubis | Tomasz Obrębski
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the PolNet-Polish WordNet project, which aims at building a linguistically oriented ontology for Polish compatible with other WordNet projects such as Princeton WordNet, EuroWordNet and other similarly organized ontologies. The main idea behind this kind of ontology is to use words related by synonymy to construct formal representations of concepts. In the paper we sketch the PolNet project methodology and implementation. We present the data obtained so far, as well as the WQuery tool for querying and maintaining PolNet. WQuery is a query language that makes use of data types based on synsets, word senses and various semantic relations which occur in wordnet-like lexical databases. The tool is particularly useful for complex querying tasks such as searching for cycles in semantic relations, finding isolated synsets or computing overall statistics. Both the data and the tools presented in this paper have been applied within POLINT-112-SMS, an advanced AI system with emulated natural language competence, where they are used in the understanding subsystem.
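The two maintenance tasks named in this abstract, finding isolated synsets and searching for cycles in a semantic relation, can be sketched in plain Python. This is not WQuery syntax and not the authors' implementation: the function names, the edge-list representation and the toy hypernymy data are assumptions made for illustration.

```python
def isolated_synsets(synsets, relations):
    """Return synsets that take part in no relation at all."""
    linked = set()
    for source, target in relations:
        linked.add(source)
        linked.add(target)
    return sorted(s for s in synsets if s not in linked)


def has_cycle(synsets, relations):
    """Detect a directed cycle with an iterative three-colour depth-first search."""
    successors = {s: [] for s in synsets}
    for source, target in relations:
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {s: WHITE for s in successors}
    for start in successors:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(successors[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if colour[child] == GREY:
                    return True  # back edge to a node on the current path
                if colour[child] == WHITE:
                    colour[child] = GREY
                    stack.append((child, iter(successors[child])))
                    break
            else:
                # All children explored: the node is finished.
                colour[node] = BLACK
                stack.pop()
    return False


# Toy hypernymy edges (hyponym, hypernym); not taken from PolNet.
synsets = ["pies", "ssak", "zwierzę", "kamień"]
hypernymy = [("pies", "ssak"), ("ssak", "zwierzę")]

isolated = isolated_synsets(synsets, hypernymy)
acyclic_ok = not has_cycle(synsets, hypernymy)
broken = has_cycle(synsets, hypernymy + [("zwierzę", "pies")])
```

A hypernymy relation is expected to be acyclic, so a check of this kind flags editing errors such as the extra edge added in the last line.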

2008

Verb-Noun Collocation SyntLex Dictionary: Corpus-Based Approach
Grażyna Vetulani | Zygmunt Vetulani | Tomasz Obrębski
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The project presented here is part of a long-term research program aiming at a full lexicon grammar for Polish (SyntLex). The main concern of this project is the computer-assisted acquisition and morpho-syntactic description of verb-noun collocations in Polish. We present the methodology and the resources obtained in three main project phases: dictionary-based acquisition of the collocation lexicon, a feasibility study for corpus-based lexicon enlargement, and corpus-based lexicon enlargement and collocation description. In this paper we focus on the results of the third phase. The corpus-based approach presented here allowed us to triple the size of the verb-noun collocation dictionary for Polish. We describe the SyntLex Dictionary of Collocations and announce future research intended as a separate continuation of the project.

2006

Syntactic Lexicon of Polish Predicative Nouns
Grażyna Vetulani | Zygmunt Vetulani | Tomasz Obrębski
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In the paper we report on the realization of the SyntLex project, which aims at constructing a full lexicon grammar for Polish. The lexicon-grammar paradigm in computational linguistics is derived from predicate logic and attributes a central role to predicative constructions. An important class of syntactic constructions in many languages (French, English, Polish and other Slavonic languages in particular) consists of those based on verbo-nominal collocations, in which the verb plays a support role with respect to the noun, which is considered to carry the predicative information. In this paper we refer to earlier research by one of the authors aiming at a full description of verbo-nominal predicative constructions for Polish in the form of an electronic resource for LI applications. We describe procedures to complete and corpus-validate the resource obtained so far.

2004

An Environment for Dialogue Corpora Collection (ENDIACC)
Zygmunt Vetulani
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2000

Electronic Language Resources for Polish: POLEX, CEGLEX and GRAMLEX
Zygmunt Vetulani
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)