Full valency and the position of enclitics in Old Czech

This paper analyses the relationship between the full valency of the predicate and the position of enclitics in the clause. The analysis is based on some of the oldest Old Czech prose texts. We set up and test the hypothesis that the higher the full valency of the predicate, the lower the probability that the enclitic occurs after the initial phrase of the clause. The hypothesis was corroborated only for narrative texts; for poetic texts, it was rejected.


Introduction
Enclitics are language units with a variety of specific grammatical characteristics that have attracted linguists for decades. Although a large number of methods and theoretical approaches have been applied to the study of this phenomenon, some fundamental questions remain open. Among others, an empirical diachronic description and an explanation of the historical development of enclitics, based on a larger amount of language material and interpreted from the point of view of quantitative linguistics, remain a rather unexplored field. The reasons are obvious: language material is difficult to access (in the majority of cases it must be transcribed from a manuscript); annotation must be performed manually; for the oldest periods, only a limited number of texts is available, etc. However, an analysis of the historical development of any language property (or unit) often brings knowledge that substantially enhances the understanding of the phenomena under study. Therefore, in a recent series of papers, several properties of enclitics in Old Czech and their historical development were explored (cf. Kosek et al., 2018a, 2018b, 2018c, 2018d), with the aim of obtaining a diachronic perspective on their characteristics. This paper represents a further step in this endeavour. Specifically, we analyse the relationship between the position of the pronominal enclitic in the clause and the so-called full valency (Čech et al., 2010) of the clause predicate. We assume that full valency (for details, see Section 4) is one of the factors which significantly influence the position of the enclitic. Therefore, we set up the following hypothesis: The higher the full valency of the predicate, the lower the probability of the occurrence of the enclitic after the initial phrase of the clause.
The position of the enclitic immediately after the initial phrase (hereafter the postinitial position, abbreviated as 2P) is considered, according to Wackernagel's Law (Wackernagel, 1892), the basic position of this unit in the clause; cf. the position of the reflexive "sě" in sentence (1).
(1) [Co] sě tobě vidí, Šimone?
    what.NOM REFL.ACC see.3.PS.SG.PRAES
    'What is thy opinion, Simon?'
    Bible olomoucká (BiblOl), Matthew 17,24

In previous studies (Kosek et al., 2018c) it was shown that several factors move the enclitic to a position other than 2P: for instance, the length of the initial phrase, the style, and the impact of the original Latin pretext. All these factors decrease the probability of the occurrence of the enclitic in the 2P position. This study is based on the assumption that full valency is another factor which should lead to a similar result. The reasons for this assumption are summarized in Section 4, where the notion of full valency is introduced in detail. The statistical methods applied in this paper require a certain amount of data to be reliable; therefore, only the most frequent pronominal enclitic in Old Czech, i.e. the enclitic "sě" (accusative reflexive), is analysed. Data from the Olomouc Bible (Bible olomoucká, BiblOl) and the Litoměřice-Třeboň Bible (Bible litoměřicko-třeboňská, BiblLitTřeb), which represent the oldest complete Czech Bible translation, are used.

Word order of enclitics in Old Czech
For the purpose of this study, we distinguish two positions of the enclitic (E) in Old Czech: a) the postinitial position (2P), schematically [I] E []*, where the symbol [I] represents the initial phrase of the clause and the symbol []* represents any subsequent syntactic unit(s) of the clause (including the empty unit, i.e. the clause can end with the enclitic); b) any other position (non-2P). The initial phrase can consist of one or more words, cf. sentences (1) and (2).

For the analysis, some books from the Olomouc Bible (Bible olomoucká, BiblOl) and one book (Acts) from the Litoměřice-Třeboň Bible (Bible litoměřicko-třeboňská, BiblLitTřeb) were used. These Bibles originate from the beginning of the 15th century (however, they are considered to have been copied from a lost older translation from 1360, cf. Kyas, 1997; Vintr, 2008) and rank among the oldest Old Czech prose texts (older texts, from the first half of the 14th century, are poetic and cannot be used to observe word order characteristics). Since our long-term aim is an analysis of the historical development of the word order characteristics of enclitics, the use of one of the oldest texts seems to be an appropriate choice: the results of this study can afterwards be compared with results based on later Czech Bible translations.
All the phenomena under study must be annotated manually; therefore, only eight books of the Bible were analysed. Specifically, four books from the Old Testament and four books from the New Testament were chosen: Genesis (Gen), Isaiah (Is), Job (Job), Ecclesiastes (Ecc), Gospel of St. Matthew (Mt), Gospel of St. Luke (Lk), Acts (Act), and Revelation (Rev).
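The distinction between the postinitial position (2P) and all other positions, introduced at the start of this section, can be illustrated with a minimal Python sketch. It assumes that a clause has already been segmented manually into phrases, with the first phrase being the initial phrase [I]; the function name `enclitic_position` and the list-of-lists representation are hypothetical and are not part of the annotation procedure described in the paper.

```python
from typing import List


def enclitic_position(phrases: List[List[str]], enclitic: str = "sě") -> str:
    """Return '2P' if the enclitic directly follows the initial phrase, else 'non-2P'.

    `phrases` is a manually segmented clause; phrases[0] is the initial phrase [I].
    (Hypothetical representation, assumed for illustration only.)
    """
    if len(phrases) < 2:
        raise ValueError("clause needs an initial phrase followed by the enclitic")
    # Flatten everything that follows the initial phrase into one word sequence.
    rest = [word for phrase in phrases[1:] for word in phrase]
    if enclitic not in rest:
        raise ValueError("enclitic not found after the initial phrase")
    return "2P" if rest[0] == enclitic else "non-2P"


# Sentence (1): [Co] sě tobě vidí, Šimone?
print(enclitic_position([["Co"], ["sě"], ["tobě"], ["vidí"], ["Šimone"]]))  # 2P
```

A multi-word initial phrase is handled in the same way, since only the first word after `phrases[0]` is inspected.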

Full valency and word order of enclitics
The notion of full valency (FV) was introduced to linguistics by Čech et al. (2010) and was elaborated by Vincze (2014) and Čech et al. (2015). The FV approach is a reaction to the absence of reliable criteria for distinguishing obligatory arguments (complements) and non-obligatory arguments (optional adjuncts). FV does not distinguish between obligatory arguments and non-obligatory ones. Thus, all directly dependent units of the predicate which occur in the actual language usage comprise its full valency frame.
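As a rough illustration of this definition, FV can be read off a dependency tree by counting every direct dependent of the predicate, without separating arguments from adjuncts. The sketch below assumes that the clause is given as a list of (head, dependent) token-index pairs; this representation and the function name `full_valency` are hypothetical, and whether the enclitic itself is counted among the dependents depends on the annotation scheme, which is not specified here.

```python
from typing import Iterable, Tuple


def full_valency(edges: Iterable[Tuple[int, int]], predicate_id: int) -> int:
    """Count all direct dependents of the predicate, arguments and adjuncts alike."""
    return sum(1 for head, _dependent in edges if head == predicate_id)


# Hypothetical toy clause: token 3 is the predicate; tokens 1, 2 and 5 depend
# on it directly, while token 4 depends on token 5 and is therefore not counted.
edges = [(3, 1), (3, 2), (3, 5), (5, 4)]
print(full_valency(edges, predicate_id=3))  # 3
```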
A higher FV of the predicate means a higher complexity of the clause (at least at this level of the syntactic tree, i.e. at the root of the clause and its direct dependents). We assume that higher complexity increases the probability that Wackernagel's Law is "violated", for the following reason. The occurrence of the enclitic in the 2P position often means that the enclitic is not adjacent to its syntactically superior word. Further, a more complex clause structure is more difficult to process cognitively, especially when it contains distant dependency relations. Consequently, the tendency to put the enclitic next to its syntactically superior word instead of in the 2P position should be positively correlated with the complexity of the clause. Of course, the complexity of the clause could, in the ideal case, be determined from the properties of the entire clause structure. However, the character of the language material, which must be annotated manually, forced us to focus exclusively on the FV as the measure of clause complexity (i.e., only the highest levels of syntactic trees are taken into account). Admittedly, this approach has its limitations, and more comprehensive characteristics of syntactic complexity will have to be applied in future research, but the results achieved indicate that the positions of enclitics are indeed also influenced by syntactic properties of the clause of which they are part.

Results
The relationship between the FV and the proportion of enclitics in the 2P position is presented in Table 1 and Figure 1. Here, all data are merged, i.e. these results characterize the corpus comprising all eight Biblical books (see Section 2). Since some FV sizes do not contain enough instances for a proper evaluation of the data (e.g., there are only four clauses with enclitics for which the FV attains the value of seven), we decided to pool adjacent bins so that each bin contains at least ten instances. In Tables 2 and 3, the FV size expresses the weighted arithmetic mean, with frequencies being the weights.

It is obvious (see Table 1 and Figure 1) that there is no tendency corresponding to our prediction from Section 1. On the contrary, clauses with the lowest FV have the lowest proportion of enclitics in the 2P position, which directly contradicts the hypothesis. However, it is known that the distribution of particular positions of enclitics is significantly influenced by style (Kosek et al., 2018c). As Biblical books differ fundamentally with respect to style, we also studied the relationship between the FV and the position of the enclitic separately in individual books. The results are presented in Table 2 and Figure 2.
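The pooling of sparse FV bins can be sketched as follows. This is a minimal illustration, not the procedure actually used for the tables: it assumes a greedy left-to-right merge of adjacent FV values and attaches any undersized tail to the preceding bin, whereas the text above only states that adjacent bins were pooled to reach at least ten instances. The function name `pool_bins` and the counts in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def pool_bins(counts: Dict[int, Tuple[int, int]],
              min_n: int = 10) -> List[Tuple[float, int, int, float]]:
    """Pool adjacent FV bins until each holds at least `min_n` clauses.

    `counts` maps FV -> (n_2P, n_non2P). Each returned row is
    (weighted mean FV, n_2P, n_non2P, proportion of 2P),
    where the mean FV is weighted by clause frequencies.
    """
    groups: List[List[int]] = []  # each group: [sum(FV * n), n_2P, n_non2P]
    current = [0, 0, 0]
    for fv in sorted(counts):
        n_2p, n_non = counts[fv]
        current[0] += fv * (n_2p + n_non)
        current[1] += n_2p
        current[2] += n_non
        if current[1] + current[2] >= min_n:
            groups.append(current)
            current = [0, 0, 0]
    if current[1] + current[2] > 0:  # undersized tail left over
        if groups:
            groups[-1] = [a + b for a, b in zip(groups[-1], current)]
        else:
            groups.append(current)
    rows = []
    for weighted_sum, n_2p, n_non in groups:
        n = n_2p + n_non
        rows.append((weighted_sum / n, n_2p, n_non, n_2p / n))
    return rows


# Hypothetical counts (not taken from the paper): the bins FV = 6 and FV = 7
# are too small on their own and are pooled into one bin with mean FV 6.4.
for row in pool_bins({1: (6, 6), 2: (8, 4), 6: (3, 3), 7: (2, 2)}):
    print(row)
```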
The analysis of individual books yields a rather different picture. The results from Act, Mt, Lk, and Gen corroborate the hypothesis, while the results from Job and Ecc falsify it. For Is and Rev, there are not enough data for a conclusion. At first sight, it seems that there are differences between narrative texts (i.e., Act, Mt, Lk, and Gen) and poetic texts (i.e., Job and Ecc). In the case of poetic texts, their specific character may be the reason why the hypothesis is rejected: the author must fulfil certain conditions to fit the rules of poetry, which can influence (or violate) the mechanism underlying the hypothesis.
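The book-by-book breakdown amounts to computing the proportion of 2P enclitics per FV value separately for each text instead of over the merged corpus. A minimal sketch, assuming each annotated clause is available as a (book, FV, is-2P) record; the record layout and the function name `proportions_by_book` are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, bool]  # (book, FV, enclitic is in 2P); hypothetical layout


def proportions_by_book(records: Iterable[Record]) -> Dict[str, Dict[int, float]]:
    """Proportion of 2P enclitics per FV value, computed separately for each book."""
    # book -> FV -> [n_2P, n_total]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for book, fv, is_2p in records:
        cell = tallies[book][fv]
        cell[0] += int(is_2p)
        cell[1] += 1
    return {
        book: {fv: n_2p / n for fv, (n_2p, n) in sorted(by_fv.items())}
        for book, by_fv in tallies.items()
    }
```

Keeping the books apart in this way makes it possible to see opposite tendencies in different texts, which a single merged table can hide.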

[Table 2; column headers for Act: FV, 2P, non-2P, proportion of 2P]

The study brings some important findings. First, even though the hypothesis was falsified both for the poetic texts and for the merged corpus of eight Biblical books, we do not reject the hypothesis in general. We assume that the poetic character of texts can be interpreted as a boundary condition which restricts the validity of the hypothesis. Further, it was revealed that mixing texts is another factor that can significantly influence the outcome of hypothesis testing. A mixture of different texts (e.g. with respect to their genre or style) means that particular mechanisms can "fight" each other and, as a consequence, their influence can be weakened (or can even disappear). Finally, it must be emphasized that this paper is the first attempt to test this hypothesis; further research in this field is necessary.