Discontinuous VP in Bulgarian

This paper presents Bulgarian discontinuous constituents. Bulgarian is claimed to be a language of relatively free word order. Although discontinuous constituents are a typical manifestation of free word order, they have not been studied in Bulgarian so far. The paper discusses and analyzes the freedom of Bulgarian word order and describes how discontinuity is treated within BulTreeBank. We present the results of our linguistic analysis of discontinuous VPs and summarize the extent of word order freedom and the word order constraints within VP.


Introduction
It is well known that discontinuous constituents are a typical manifestation of free word order. It is also claimed that discontinuity is characteristic of languages with rich morphology. Bulgarian shares both of these features. Scholars working on word order in Bulgarian debate whether Bulgarian is a configurational or a nonconfigurational language. Most of them hold that Bulgarian is a configurational language with some nonconfigurational features, such as the free permutation of the elements within VP (Penchev, 1991).
By exploring discontinuity in VP we aim to show the extent of word order freedom in Bulgarian and the semantic restrictions on this freedom.
The paper is structured as follows: Section 2 discusses discontinuous constituents and the theories of "free" word order in Bulgarian; Section 3 presents discontinuous constituents within BulTreeBank; Section 4 deals with the types of discontinuity in VP; and Section 5 concludes the paper.

Discontinuous constituents and the theories of "free" word order in Bulgarian
Researchers on Bulgarian word order have noted that there is greater word order freedom within the verb phrase than within other phrases (Rudin, 1986; Penchev, 1991). They show that the variety of word order patterns is usually due to the influence of discourse on word ordering. In the tradition of Bulgarian word order research, especially in the first half of the 20th century, information packaging was taken as one of the most prominent manifestations of discourse. Much of the research on word order was therefore devoted to the connection between word order and information packaging. Sv. Ivanchev (Ivanchev, 1975) was the first scholar to spread the ideas of the Prague linguistic school in Bulgaria. The relation between word order and information packaging has been investigated by a number of researchers (Georgieva, 1974; Brezinski, 1995; Avgustinova, 1997; Tisheva, 2003; Tisheva and Djonova, 2002; Tisheva and Djonova, 2004a; Tisheva and Djonova, 2004b; Tisheva, 2013).
The interrelation between intonation, word order and information packaging is the research topic of another Bulgarian linguist, Jordan Penchev (Penchev, 1980). J. Penchev aims to describe the main intonation types of Bulgarian sentences, and to this end he draws on the relationship between semantics and information packaging. Several researchers investigate particular word order constructions, but overall they all agree that a number of structural (syntactic), discourse and prosodic factors affect word order patterns in Bulgarian. The different combinations of these factors give rise to a larger number of word order variants within some Bulgarian phrases (especially the VP) than within others (NP, AP), which is why researchers have discussed whether Bulgarian is a nonconfigurational language. Scholars reject this hypothesis, showing that the word order freedom in some phrases is an isolated phenomenon that cannot be taken as a sign of nonconfigurationality. They claim that the structure of Bulgarian combines configurational and nonconfigurational features.
Based on the above assumptions, all researchers agree that Bulgarian has a free or relatively free word order, but no one has specified precisely the extent of this freedom and the restrictions on it. Nor has anyone so far studied discontinuity in Bulgarian, so our survey is the first attempt to analyze one of the most frequent types of discontinuous VPs and the factors causing discontinuity.

Representation of discontinuous constituents within BulTreeBank
Investigating discontinuity in a corpus is a good way to study the extent of word order freedom and the constraints on it. For this purpose we use BulTreeBank, a corpus of syntactic trees of Bulgarian sentences. Constituency in the treebank is represented via graphs defined on the basis of the mother-daughter relation; graphs were chosen because they are close to the context-free tree representation (Simov and Osenova, 2004). The syntactic trees preserve the original word order, and discontinuous elements are introduced where necessary (Simov and Osenova, 2004). In the examples from the treebank, VPC stands for a verb-complement phrase, VPS for a verb-subject phrase, VPA for a head-adjunct phrase, and VPF for a head-filler phrase, which has an extracted element realized outside the phrase. There are three types of discontinuous constituents in the treebank.

Functional element DiscA
DiscA is used when a higher dependent is realized between the head and its lower dependent(s). To preserve the word order, the higher element is marked with the functional element DiscA (discontinuous adjunct) and is annotated at a higher position in the tree with the functional element nid (non-immediate dominance). DiscA and nid are then linked by the same index, seen as a line in the tree below (Simov and Osenova, 2004).
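The coindexation of DiscA and nid can be made concrete with a small sketch. It assumes a hypothetical in-memory tree encoding (the `Node` class and the helper functions below are illustrative, not the actual BulTreeBank format): the DiscA element stays at its linear position, an empty nid slot marks where it is attached, and a constituent counts as discontinuous when its yield, read with the adjunct at its nid slot, has a gap. The toy sentence is a simplified version of the paper's example with a time adjunct between the subject and the verb.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                                   # category, e.g. "VPS", "V", "Adv"
    children: list[Node] = field(default_factory=list)
    word: str = ""                               # surface form, terminals only
    pos: Optional[int] = None                    # linear position, terminals only
    func: Optional[str] = None                   # "DiscA" or "nid" (hypothetical encoding)
    index: Optional[int] = None                  # coindex shared by DiscA and its nid slot


def surface_yield(node: Node) -> list[int]:
    """Positions of all terminals under a node, in tree order."""
    if node.pos is not None:
        return [node.pos]
    return [p for c in node.children for p in surface_yield(c)]


def collect_disca(node: Node, table: dict[int, Node]) -> dict[int, Node]:
    """Map each coindex to the DiscA element that carries it."""
    if node.func == "DiscA" and node.index is not None:
        table[node.index] = node
    for child in node.children:
        collect_disca(child, table)
    return table


def logical_yield(node: Node, disca: dict[int, Node]) -> list[int]:
    """Positions a node covers when each DiscA element is counted at its nid slot."""
    if node.func == "nid":
        return surface_yield(disca[node.index])
    if node.pos is not None:
        return [node.pos]
    return [p for c in node.children if c.func != "DiscA"
            for p in logical_yield(c, disca)]


def is_discontinuous(node: Node, disca: dict[int, Node]) -> bool:
    """True if the node's logical yield has a gap."""
    positions = sorted(logical_yield(node, disca))
    return bool(positions) and positions[-1] - positions[0] + 1 != len(positions)


# Simplified example: chovek(0) [prez celia si zhivot](1) pazi(2) [svetaia svetih](3)
adjunct = Node("Adv", word="prez celia si zhivot", pos=1, func="DiscA", index=1)
vps = Node("VPS", [Node("N", word="chovek", pos=0), adjunct,
                   Node("VPC", [Node("V", word="pazi", pos=2),
                                Node("N", word="svetaia svetih", pos=3)])])
vpa = Node("VPA", [Node("Adv", func="nid", index=1), vps])

disca = collect_disca(vpa, {})
print(is_discontinuous(vps, disca))  # True: VPS covers positions 0, 2, 3
print(is_discontinuous(vpa, disca))  # False: VPA covers positions 0-3
```

Counting the adjunct at its nid slot rather than at its surface position is what makes the VPS, and not the VPA, come out as the discontinuous phrase.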

Functional element DiscM
DiscM stands for discontinuous mixture, i.e. a mixture of two constituents: the elements of the two constituents are interleaved, and neither of them governs the other (Simov and Osenova, 2004). This type of discontinuity is very rare, with only two or three occurrences in the treebank.

Functional element DiscE
DiscE marks the external realization of an inner constituent, i.e. extraction (Simov and Osenova, 2004). As with DiscA, the extracted element is marked DiscE (discontinuous extraction) and is coindexed, via nid, with the phrase from which it has been extracted.

Types of discontinuity in VP
In this section we present our investigation of discontinuity within VP. The VP was chosen as the most prominent example of free word order in Bulgarian (see Section 2). To carry out the task, we first extracted from the corpus all sentences containing discontinuous VPs. The corpus contains 4160 sentences with discontinuity, about 35% of all sentences in the treebank. We then classified the types of discontinuity within VP and found two main groups: i) discontinuity caused by an element that is part of the syntactic structure of the sentence (in the treebank this element is marked with DiscA) and ii) discontinuity caused by an element that is not part of the syntactic structure of the sentence (in the treebank this element is tagged Pragmatic element). This paper does not deal with discontinuities of the second type; we focus only on discontinuities caused by elements that are part of the syntactic structure of the tree. These elements are adjuncts, extracted complements of the head verb, and the subject. Here we focus only on discontinuity caused by adjuncts.

Discontinuity, caused by adjuncts
Within BulTreeBank, discontinuities caused by adjuncts form the largest group (67% of all sentences with discontinuity in VP). The sentences in the treebank are annotated along the lines of HPSG (Pollard and Sag, 1994). Thus, in the theoretical framework we use, adjuncts are attached as sisters of the saturated VP, i.e. after the verb has realized its dependents: its complement(s) and its subject (if there is an overt subject, since Bulgarian is a pro-drop language). Only after the verb has taken its dependents and formed either a VPC (verb-complement phrase) or a VPS (verb-subject phrase) is the adjunct attached to this VPC or VPS, forming a VPA phrase. This is the usual case, in which adjuncts are realized linearly without causing discontinuity, and the adjunct semantically modifies the saturated VP. In the cases of discontinuity, by contrast, the adjunct is realized linearly before some of the verb's dependents (subject and/or complements). In such sentences the VPA phrase is projected higher up in the tree, and the crossing is seen as a line in the graph (see Section 3.1). There are two cases of VP discontinuity caused by an adjunct: i) the adjunct is realized between the subject and the head verb; ii) the adjunct is realized between the head verb and the complement.
Before starting the linguistic analysis, we faced one problem: the adjuncts were not classified by type in the treebank. We therefore first needed a classification of adjuncts, and then manually annotated all adjuncts in the sentences with discontinuity according to this classification. Only then could we extract the sentences with discontinuity by adjunct type.
The classification of adjuncts we used is based on GSBKE (GSBKE, 1983) and comprises the following types: adjuncts of time, of manner, of quantity and degree, of place, of second predication, of condition, of reason, and of aim. We manually annotated every adjunct causing discontinuity according to this classification. This made it possible to extract the groups of discontinuities by adjunct type and to draw conclusions about the factors that cause discontinuous linear realization of the elements of the VP.
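The grouping step and the percentage distributions reported below can be sketched as follows, assuming one discontinuity-causing adjunct per sentence and a hypothetical list of (sentence id, adjunct type) pairs produced by the manual annotation. The function names and the sample data are illustrative only; the sample is not drawn from BulTreeBank.

```python
from collections import Counter, defaultdict

# Adjunct classes used in the paper (after GSBKE, 1983).
ADJUNCT_TYPES = ("time", "manner", "quantity and degree", "place",
                 "second predication", "condition", "reason", "aim")


def group_by_type(annotations: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect sentence ids under the type of the adjunct causing the discontinuity."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sentence_id, adjunct_type in annotations:
        if adjunct_type not in ADJUNCT_TYPES:
            raise ValueError(f"unknown adjunct type: {adjunct_type}")
        groups[adjunct_type].append(sentence_id)
    return dict(groups)


def distribution(annotations: list[tuple[str, str]]) -> dict[str, float]:
    """Percentage of sentences per adjunct type, largest share first."""
    counts = Counter({t: len(ids) for t, ids in group_by_type(annotations).items()})
    total = sum(counts.values())
    return {t: 100 * n / total for t, n in counts.most_common()}


# Toy annotations for illustration only; these are not the treebank figures.
sample = [("s1", "time"), ("s2", "manner"), ("s3", "time"), ("s4", "place")]
print(distribution(sample))  # {'time': 50.0, 'manner': 25.0, 'place': 25.0}
```

Validating the labels against a fixed inventory catches annotation typos before they silently create a spurious adjunct class in the counts.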

Discontinuity between the subject and the head verb
This is the largest group of sentences with discontinuity caused by adjuncts.
Here is the proportional distribution of sentences with adjuncts causing discontinuity between the subject and the head verb in VP: adjuncts of time: 45%; adjuncts of manner: 31%; adjuncts of quantity and degree: 11%; adjuncts of place: 5%; adjuncts of second predication: 3%; adjuncts of condition: 2%; adjuncts of reason: 1%; adjuncts of aim: less than 1%.
For the analysis of information packaging we use the methodology of Engdahl and Vallduvi (Engdahl and Vallduvi, 1994; Engdahl and Vallduvi, 1996), where the Focus is the actual information of the sentence and the Ground is what is presupposed by the information at the output. A sentence has a Ground only if the context provides one. The Link is the particular place in the sentence where new information is introduced, and the Tail indicates that this part of the discourse requires an information update. The information packaging in this type of sentence follows two main patterns:
1) The adjunct is part of the Ground.
Example: Ground [link [V tzivilizovania sviat] tail [chovek prez celia si zhivot (adjunct of time)]] Focus [pazi "svetaia svetih" na svoiata reputacia - svoeto kreditno dosie]. (In the civilized world one, during his whole life, keeps the most precious part of his reputation: his credit history.)
In sentences with this kind of discontinuity and communicatively marked word order, the adjunct can take the information value of either a tail or a link of the tail. In sentences with communicatively unmarked word order, the adjunct can only be a tail.
2) The adjunct is part of the Focus.
Example: Ground [Vseki opit za ocenka na organiziranata prestapnost v izmereniata na nacionalnata sigurnost] Focus [zadalzhitelno (adjunct of manner) predpolaga predvaritelno da se utochniat obhvatyt i sadarzhanieto na samoto poniatie]. (Any attempt to assess organized crime in terms of national security obligatorily presupposes that the scope and content of the notion itself be clarified beforehand.)

Discontinuity between the head verb and the complement
Here is the proportional distribution of sentences with adjuncts causing discontinuity between the head verb and the complement in VP: adjuncts of manner: 40%; adjuncts of time: 30%; adjuncts of quantity and degree: 11%; adjuncts of place: 9%; adjuncts of second predication: 4%; adjuncts of aim: 2%; adjuncts of condition: less than 1%; adjuncts of reason: less than 1%. In sentences with adjuncts between the head verb and the complement, the adjunct becomes part of the Focus.

Conclusion about the word order models with discontinuity, caused by adjuncts
From our linguistic analysis we conclude that the realization of adjuncts within VP is governed by two factors: 1) the information packaging of the sentence, which in turn depends on 2) the semantics of the adjuncts. Based on the semantics of the adjunct and on the part of the sentence to which it is syntactically attached, we distinguish two types of adjuncts: i) sentential adjuncts, which semantically modify the whole sentence, and ii) phrasal adjuncts, which semantically modify a particular element of the VP.
Most adjuncts in Bulgarian semantically modify the whole sentence (the adjuncts of time, of place, of condition, of reason and of aim). Syntactically, these adjuncts are realized as sisters of the saturated VP. Thus their linear position within VP results solely from the particular information packaging the speaker chooses for the utterance.
The word order of phrasal adjuncts (adjuncts of manner, of quantity and degree, and of second predication) is restricted by semantic constraints. The semantic scope of these adjuncts, i.e. over a particular element of the VP, requires them to be realized in contact with the element they semantically modify, either immediately before or immediately after it.
Since discontinuous constituents are a typical manifestation of free word order, we conclude that word order freedom within the Bulgarian VP results from differences in information packaging, while the constraints on word order come from semantics. Whenever adjuncts with narrow semantic scope are realized within VP, their semantics restricts word order, since such an adjunct must be realized in contact with the element of the VP it semantically modifies. Realizing the adjunct in contact with that element (before or after it) results in syntactic discontinuity of the phrase.
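The semantic constraint described above can be stated as a simple check. This is a minimal sketch under stated assumptions: the two sets mirror the paper's split into sentential and phrasal adjuncts, constituents are given as hypothetical inclusive token spans, and the function names are illustrative. Sentential adjuncts always pass, because their placement is left to information packaging; phrasal adjuncts pass only when they are in contact with the element they modify.

```python
# Adjunct classes grouped by semantic scope, following the paper's conclusion.
SENTENTIAL = {"time", "place", "condition", "reason", "aim"}
PHRASAL = {"manner", "quantity and degree", "second predication"}

Span = tuple[int, int]  # inclusive (start, end) token positions


def adjacent(a: Span, b: Span) -> bool:
    """True if span a ends right before span b starts, or vice versa."""
    return a[1] + 1 == b[0] or b[1] + 1 == a[0]


def satisfies_scope_constraint(adjunct_type: str, adjunct: Span, modified: Span) -> bool:
    """Check the semantic word order constraint on an adjunct inside the VP."""
    if adjunct_type in SENTENTIAL:
        # Wide scope: any position is allowed; information packaging decides.
        return True
    if adjunct_type in PHRASAL:
        # Narrow scope: must be in contact (pre- or postposed) with what it modifies.
        return adjacent(adjunct, modified)
    raise ValueError(f"unknown adjunct type: {adjunct_type}")


# Simplified example: [Vseki opit ...](0) zadalzhitelno(1) predpolaga(2) ...
print(satisfies_scope_constraint("manner", (1, 1), (2, 2)))  # True: adjacent to the verb
print(satisfies_scope_constraint("manner", (1, 1), (3, 3)))  # False: not in contact
```

Modelling the modified element as a span rather than a single word allows multi-word targets, such as a complement noun phrase modified by an adjunct of degree.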

Conclusion
In this paper we have reviewed the theories of Bulgarian word order in the light of discontinuous constituents. We have shown how discontinuous constituents are represented within BulTreeBank, pointed out the types of discontinuous constituents, and presented our linguistic analysis of the discontinuities caused by adjuncts. We have described the reasons for the linear realization of adjuncts within VP and summarized the factors that license word order freedom and impose word order constraints on the elements of VP, thus pointing out the precise extent of word order freedom in Bulgarian, which had not been studied thoroughly so far.