An explanation of the decisive role of function words in driving syntactic development

The early mastery of function words (FWs) predicts children's concurrent and subsequent syntactic development better than their acquisition of content words (CWs). To understand why early mastery of a FW vocabulary confers this advantage, we tested the hypothesis that learning FWs involves learning their syntax to a higher degree than is the case for CWs. English-language speech samples of parents (N=506) and young children (N=350) were taken from the CHILDES archive. We mapped the use of words of different form-classes in parental speech, comparing the words' occurrence as single-word utterances and as the heads of two-word syntactically structured sentences. The distributions showed a dramatic effect of form-class: the four FW categories (subordinators, determiners, prepositions and auxiliary verbs) are used by parents almost exclusively in multiword utterances. By contrast, words in the four CW categories (verbs, nouns, adjectives and adverbs) appear both as single-word utterances and as roots of two-word sentences. Analysis of children's talk yielded similar results, with the proportions correlating very highly with those of the parents. Acquisition of FWs predicts syntactic development because they must be learned as combining words, whereas CWs can be learned as stand-alone lexemes, without mastering their syntax.

The research question


FWs predict syntactic development better than CWs
Grammatical words such as determiners, auxiliary verbs and prepositions had long been considered marginal for the early stages of syntactic development. Many authorities such as Radford (1990) believed that such 'function words' (FWs) are acquired late by typically-developing children, and that, at the early stages of acquisition, syntactic development relies on 'content words' (CWs) or 'lexical words' (nouns, verbs, adjectives and adverbs), which carry semantic relations that can be expressed as patterned speech (Brown, 1973).
In the last few years the trend has turned, as recent developmental studies have offered new evidence for the importance of the early mastery of FWs for syntactic development. Several studies found that, in children acquiring various languages, the early mastery of FWs such as subordinators, auxiliary verbs, prepositions and determiners strongly predicts children's concurrent and especially subsequent syntactic development. Kedar et al. (2006) found that 18- and 24-month-old infants acquiring English oriented faster and more accurately to a visual target following sentences in which the referential expression included determiners. They concluded that by 18 months of age, infants use their knowledge of determiners when they process sentences and establish reference. Le Normand et al. (2013) examined the speech of French-speaking children aged 2-4 years and correlated the diversity of word types in various form-classes with the children's mean length of utterance in words (MLU). They found that the diversity of word types in FWs was the best predictor of MLU, much surpassing CW categories. Szagun and Schramm (2016) studied young deaf children acquiring German who received cochlear implants at an early age. They found that the early type and token frequencies of determiners predict MLU two years later more strongly than the early frequency of lexical words. In addition, Ninio (2019) established that FWs are indeed participants in the sentence's syntactic structuring. Taking the determiner-noun relation as an exemplar of the FW-CW relationship, Ninio demonstrated that the syntactic relation of FWs to the CWs they are associated with is probably identical to the syntactic relation of Head-Dependent complementation (or Merge) that constitutes the major building process of syntax.
The determiner-noun combination appears to be a type of complementation, the same syntactic operation that underlies, for example, the combination of a verb with its direct object. Possibly, some generalized knowledge of how words relate to one another is learned when producing early multiword utterances headed by FWs, and, through transfer and facilitation, it drives the construction of grammar. It seems that children learn syntactic principles through specific constructions; in the case of determiner-noun combinations, they apparently learn the principle of the head-complement relation, which can then be transferred to other syntactic constructions employing the same basic combinatory operation.
Although the correlational results are a good foundation for theorizing, it is still unclear what explains the higher predictive power of learning FWs for syntactic development. At the early stages of syntax, children learn a vocabulary of both FWs and CWs, and they learn to produce verb-object combinations early, not only determiner-noun ones. Nevertheless, it is the acquisition of determiners as vocabulary items, and not of verbs, that best predicts a child's syntactic development. In the present study, our goal is to explain the advantage FWs have over CWs in providing a learning environment for principles of syntax while being learned as vocabulary items.
Our starting point is the most significant difference between CWs and FWs, which is that CWs have rich semantic content whereas FWs are grammatical words that have little or none. This implies that CWs require a learning format where words are matched to their meaning in some non-linguistic context (Macnamara, 1972), but may not necessarily need a syntactic context to acquire their meaning (Ninio, 2016). By contrast, FWs cannot be learned from the non-linguistic context; they are words whose whole purpose is to participate in some combinatory process with another word. It follows that learning them requires multiword input that makes it possible to observe their syntactic behaviour.
Thus, our hypothesis is that the acquisition of FWs is more closely connected to syntactic development than the learning of CWs because learning FWs necessarily involves multiword input and hence requires the mastery of syntactic principles. CWs, as words possessing semantic content, can be learned from single-word utterances, as their meaning is learned from the non-linguistic context. We therefore predict that CWs are more likely than FWs to be learned from single-word input; the latter are more likely to be learned from multiword input sentences.
To make the comparison between single-word and multiword input as equitable as possible, we took as multiword input sentences the shortest possible multiword combinations, namely two-word sentences possessing syntactic connectivity.
The hypotheses were tested by two different comparisons. First, we compared the parental input of content words (CWs) and of function words (FWs) as either single-word utterances or as heads of two-word syntactic combinations. Second, we looked at words occurring in children's single-word sentences and compared them with the heads of their two-word syntactic combinations. The hypothesis was that FWs are presented, and learned, as strongly syntactic elements, while CWs are less so. We expected a significant difference between FWs and CWs in the proportion of tokens in single-word and multiword uses, both in the parental input and in children's own spontaneous productions.

Method

Participants
For English-language parental and child samples we systematically sampled the English transcripts in the CHILDES (Child Language Data Exchange System) archive (MacWhinney, 2000). CHILDES is a public domain database for corpora on first and second language acquisition. The publicly available, shared archive contains documentation of the speech of more than 500 English-speaking parents addressed to their young children. The archive stores the transcribed observations collected in different research projects. In building our corpora, we closely followed the principles established in linguistics for constructing systematically assembled large corpora (Francis and Kučera, 1979).
We selected projects from those available using the following criteria: the observations were of normally developing young children with no diagnosed hearing or speech problems and of their parents, who were native speakers of English, and the speech was produced in the context of naturalistic, dyadic parent-child interaction. We restricted the child's age during the observed period to three years and six months. This process resulted in the selection of parents and children from 33 research projects in the CHILDES archive: the British projects Belfast, Howe, Korman, Manchester, and Wells, and the American projects Bates, Bernstein-Ratner, Bliss, Bloom 1970 and 1973, Brent, Brown, Clark, Cornell, Demetras, Feldman, Gleason, Harvard Home-School, Higginson, Kuczaj, MacWhinney, McMillan, Morisset, New England, Peters-Wilson, Post, Rollins, Sachs, Suppes, Tardif, Valian, Van Houten, and Warren-Leubecker (MacWhinney, 2000). From these projects, we selected 471 observational studies involving a target child of the desired young age range, namely, below three years and six months.

Parents' corpus
We built a corpus of parental utterances containing single-word and two-word sentences. Each parent was selected individually, so that from the same research project involving the same target child, we included in the study either the mother, or the father, or both parents as separate speakers, as long as either or both passed the criteria for inclusion. In 35 of the 471 studies there were two active parents interacting with the target child, resulting in a parental sample of 506 different parents.
To avoid severely unequal contributions to the pooled corpus, the number of utterances included from each parent was restricted to a maximum of 3,000, counting from the beginning of observations. We excluded the speech of parents addressed to other adults present in the observational session or on the telephone, as this speech may be ignored by young children because of unfamiliar subjects. Contextual comments were checked in order to ascertain that we included only spontaneous utterances from target parent to target child. The resultant parental corpus contains almost 1.5 million (1,470,811) running words of transcribed speech based on naturalistic observations of interaction between parents and their young children, representing several hundred hours of transcribed speech. Most of the children addressed were under three years of age, and 93% of the parents in the sample talked to a child between one year and two and a half years of age in all or the majority of the observations we included in the corpus. The mean age of the children addressed was 2.25 years.
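The per-parent cap described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `cap_per_speaker`, the `(speaker_id, text)` tuple layout, and the toy speaker labels are hypothetical and do not come from the CHILDES tools or from the study's own procedure.

```python
from collections import defaultdict

MAX_UTTERANCES_PER_PARENT = 3000  # cap stated in the text


def cap_per_speaker(utterances, limit=MAX_UTTERANCES_PER_PARENT):
    """Keep at most `limit` utterances per speaker, preserving order.

    `utterances` is an iterable of (speaker_id, text) pairs, assumed to be
    ordered chronologically from the first observation onward.
    """
    kept = []
    counts = defaultdict(int)
    for speaker_id, text in utterances:
        if counts[speaker_id] < limit:
            kept.append((speaker_id, text))
            counts[speaker_id] += 1
    return kept


# Toy data (invented) with a limit of 2 so the effect is visible.
sample = [
    ("MOT_a", "look"),
    ("MOT_a", "a ball"),
    ("MOT_a", "more juice"),
    ("FAT_b", "here"),
]
capped = cap_per_speaker(sample, limit=2)
```

Counting from the start of the observations, rather than sampling at random, keeps each parent's earliest recorded speech to the child.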
The corpora of English-language parental child-directed speech represent the linguistic input that young children receive when acquiring syntax. Although each separate study is by necessity limited in its coverage of the phenomenon, the different studies pooled together can provide the requisite solid database for generalization. The use of pooled corpora of unrelated parents as a representation of the linguistic input is a relatively conventional move in child language research (e.g., Goodman et al., 2008). Multiple speakers of child-directed speech may provide a good estimate of the total linguistic input to which children are exposed, which includes, besides the speech of the individual mother or father, also the speech of grandparents, aunts and uncles, older siblings and other family members, neighbours, care professionals, and so forth, represented in our corpus by the speech of mothers and fathers unrelated to the individual child. The pooled database represents the language behaviour exhibited by the community as a whole when addressing young children.
The analyses of a study using corpus data do not attempt to demonstrate that particular children learned particular patterns of use from their own parents. When working with corpora that pool the speech of a large number of parents and of children, respectively, the aim is, rather, to create a data set of typical child-directed speech and use it to make predictions about children's contrastive mastery of different patterns, thus finding out which of a possible set of factors are most predictive of development. Thus, the variability exploited for statistical testing is not individual differences but, rather, contrasts between the effects of different potential sources of input.
As our analytic plan was to find what kind of input, single-word or multiword sentences, was more likely to serve as a model for young children learning words of the various form-classes, we manually checked the transcribed dialogue and the action and other contextual comments in the CHILDES archive in order to ascertain that we included only spontaneous utterances from target parent to target child. This means we excluded parental utterances if they imitated the child's previous utterance, whether to ask a clarification question or to provide feedback. We wanted parental utterances that can serve as models for the child's learning; if the parent imitates the child, the utterance cannot be considered a model for new learning. This was particularly important for single-word sentences by parents, which were likely to be verbatim repetitions of children's single-word sentences. The exclusion of such utterances, alongside children's imitations, ensured that we did not arrive at a positive correlation between parental and child frequencies for particular types of words because of mutual imitation by the participating speakers.
We focused on parents' single-word utterances and their two-word sentences possessing syntactic structure, excluding utterances consisting of vocatives or interjections, or where one of the two words of an utterance was a vocative or an interjection. Unfinished or cut-off utterances, and utterances containing words not transcribed in the original, were also excluded. Besides these exclusions, we used the original transcripts' separation into sentences as our criterion for identifying single-word and two-word utterances. This corpus represents the shortest linguistic input that young children receive when acquiring syntax. Parents produced 25,694 single-word utterances and 23,141 two-word sentences. The total number of sentences by parents processed in this study was 48,834.
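The mechanical part of these exclusion rules can be sketched in a few lines. The sketch assumes each utterance arrives as a list of `(word, tag)` pairs with a completeness flag; the tag labels `"vocative"` and `"interjection"` and the function name `keep_utterance` are hypothetical. It does not model the manual judgment of whether a two-word utterance is syntactically connected.

```python
EXCLUDED_TAGS = {"vocative", "interjection"}  # hypothetical tag labels


def keep_utterance(tokens, complete=True):
    """Return True if an utterance passes the length and content filters.

    tokens:   list of (word, tag) pairs for one transcribed utterance
    complete: False for unfinished, cut-off, or partly untranscribed lines
    """
    if not complete:
        return False
    if len(tokens) not in (1, 2):
        return False
    # Reject the utterance if any word is a vocative or an interjection.
    return all(tag not in EXCLUDED_TAGS for _, tag in tokens)


# Invented examples.
examples = [
    [("ball", "noun")],                          # kept: single word
    [("eat", "verb"), ("it", "pronoun")],        # kept: two words
    [("mommy", "vocative"), ("look", "verb")],   # dropped: contains a vocative
    [("oh", "interjection")],                    # dropped: interjection only
    [("the", "det"), ("big", "adj"), ("dog", "noun")],  # dropped: three words
]
kept_flags = [keep_utterance(u) for u in examples]
```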

Children's corpus
Samples of one-word and two-word utterances from 471 children were taken from the same English transcripts in the CHILDES archive from which we took the parents' speech. We restricted the contribution of each individual child to 300 multiword sentences, starting from the first observation in which they produced multiword utterances. Children's utterances were included only if they were spontaneous, namely, not immediate imitations of preceding adult utterances. For each utterance marked in the original transcriptions as one uttered by the child, we checked the context to make certain that the line was indeed child speech (and not, for example, an action description or parental sentence erroneously marked as child speech). The size of the resulting pooled child corpus is 194,359 running words. It contains 101,064 utterances; this makes the mean length of utterance (MLU) in words 1.92. As with the group of parents, we treat young children acquiring English as their first language as a homogeneous group, as far as the important characteristics of their syntax are concerned. In this, we follow the tradition of researchers who examine pooled corpora of child speech for various characteristics thought to reflect on the relevant class of child speakers (Radford, 1990; Serratrice et al., 2003).
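The MLU figure for the pooled child corpus follows directly from the two counts reported above, as this short check shows (the variable names are illustrative only):

```python
# Counts reported in the text for the pooled child corpus.
running_words = 194_359
utterances = 101_064

# Mean length of utterance in words = running words / utterances.
mlu = running_words / utterances
print(round(mlu, 2))
```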
For this study, we selected the children who had mastered two-word speech well. Our criterion was that they not only produced some two-word sentences but had already started to combine three words in syntactic combinations. Of the total sample of 471 children, 350 children produced at least three sentences of three words in syntactic combination, that is, excluding vocatives, interjections, or syntactically unrelated words, during the period of observation sampled in the study. The mean age of the children was 2 years and 18 days (SD = 4 months 8 days; range 14-42 months). They produced 24,429 single-word utterances and 11,642 two-word syntactically structured sentences. The total number of sentences by children processed in this study was 36,070.

Lemmatizing verbs and nouns
We lemmatized all verbs in the corpus into their respective stem-groups. Lemmatization is the grouping of related verb forms that share the same stem and differ only in inflection or spelling. For example, eat, eats, ate, eaten, and eating all belong to the stem-group or lemma of eat. In the case of irregular verbs that change their shape when inflected, such as had and has of the verb have, these forms were also included in the lemma of the relevant stem. This process neutralizes differences in morphological shape irrelevant for the syntactic behaviour of verbs, such as differences of tense, aspect, and person. This analysis assumes that young children ignore the differences in morphological form between verbs belonging to the same lemma, so that they treat an inflected form such as eats as equivalent to an uninflected form such as eat. Similarly, we collapsed singular and plural nouns into a single noun-stem category.
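A minimal sketch of this kind of grouping is a lookup from inflected forms to a stem. The tables below are tiny, hand-written, and hypothetical; they only illustrate the idea with the examples mentioned in the text, and are not the resource or procedure used in the study.

```python
# Hypothetical lookup tables: inflected form -> lemma (stem-group).
VERB_LEMMAS = {
    "eat": "eat", "eats": "eat", "ate": "eat", "eaten": "eat", "eating": "eat",
    "have": "have", "has": "have", "had": "have", "having": "have",
}
NOUN_STEMS = {
    "ball": "ball", "balls": "ball",
    "foot": "foot", "feet": "foot",
}


def lemmatize(word, form_class):
    """Map a word to its lemma; unknown words are returned lowercased."""
    w = word.lower()
    if form_class == "verb":
        return VERB_LEMMAS.get(w, w)
    if form_class == "noun":
        return NOUN_STEMS.get(w, w)
    return w


# Regular and irregular forms end up in the same stem-group.
verb_lemmas = {lemmatize(w, "verb") for w in ["eat", "eats", "ate", "eaten", "eating"]}
```

An explicit table is used instead of suffix stripping so that irregular forms such as ate or had are handled the same way as regular ones.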

Syntactic annotation for grammatical relations
Sentences were parsed manually for syntactic structure. We based our dependency analyses on the detailed descriptions of Hudson's English Word Grammar (Hudson, 1990) with its online update (Hudson, 2014). We also consulted descriptive grammars of English, and in particular Quirk, Greenbaum, Leech, and Svartvik (1985).
For each sentence, the root (namely, the highest element syntactically) of the dependency structure was identified and subsequently tagged for form class membership (see below).
Syntactic annotation of the sentences was done by graduate students at the Hebrew University with training in linguistics. It relied on extensive coding instructions and a very large collection of annotated exemplars. We checked for reliability by having three pairs of coders blindly recode 1,900 utterances produced by four different parents and two children. A check of all reliability codes showed that the agreement of each coder with the others was above 95%, based on codes actually given by the relevant pairs of coders. Throughout coding, all problem cases were discussed and resolved. Ultimately, each coded utterance was double-checked by another coder.

Classifying roots for form-class
The root-words were classified according to categories of form-classes.

Results and discussion
First, we compared parents' use of words belonging to each form-class as single-word utterances or as the roots of two-word syntactically connected utterances. Figure 1 presents the proportion of single-word and of two-word utterances with root-words belonging to each form-class. The comparison revealed that in four categories of function words, namely subordinator, determiner, preposition and auxiliary verb, almost no single-word utterances were produced. The two remaining closed-class categories, particle and pronoun, were also extreme in their distribution but in the other direction, as almost all tokens were single-word utterances. The four open classes occurred in a mixture of single-word and multiword sentences, with at least 20% in the minority category.
Next, we carried out the same analysis of children's talk. The results, presented in Figure 2, were very similar.
To estimate the similarity of the two distributions, we computed the Pearson correlation coefficient between children's and parents' proportions of single-word tokens, out of the total tokens of single-word and two-word sentences in which a word of a given form-class served as head-word. The correlation is very high, with a coefficient of 0.99. This means that children closely follow parental models in their patterns of using words of various types as single words or as roots of two-word sentences.
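The computation behind this comparison can be sketched as follows: for each form-class, take the share of single-word tokens out of all single-word plus two-word root tokens, then correlate the parents' shares with the children's. The counts below are invented for illustration and are NOT the study's data; the function names are hypothetical.

```python
from math import sqrt


def single_word_proportion(single, two_word):
    """Share of single-word tokens among single-word + two-word root tokens."""
    total = single + two_word
    return single / total if total else 0.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented (single-word, two-word root) counts per form-class.
parent_counts = {"determiner": (2, 98), "noun": (60, 40),
                 "verb": (30, 70), "pronoun": (95, 5)}
child_counts = {"determiner": (1, 99), "noun": (70, 30),
                "verb": (35, 65), "pronoun": (90, 10)}

classes = sorted(parent_counts)  # same form-class order for both speakers
parent_props = [single_word_proportion(*parent_counts[c]) for c in classes]
child_props = [single_word_proportion(*child_counts[c]) for c in classes]
r = pearson_r(parent_props, child_props)
```

Each form-class contributes one data point, so the coefficient measures how closely the children's profile across form-classes tracks the parents' profile, not agreement between individual parent-child pairs.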

Conclusions
Our results offer a simple explanation for the high correlation found in many studies between the learning of such FWs as subordinators, determiners, prepositions and auxiliary verbs, and syntactic development in languages in which such FWs are used in many contexts. These languages are mostly analytic languages such as English, but also include relatively more synthetic languages such as French and German that nevertheless use FWs quite extensively. We have shown that there is a strong relation between the acquisition of FWs and the development of syntax because such words require multiword input to be learned; hence mastering syntax is a condition for their acquisition. FWs must be learned from multiword sentences because all of their content is concerned with connections between words; namely, they must be learned as combining words. By contrast, CWs are often learned from single-word utterances, as they have semantic content that can be learned from the non-linguistic context by semantic matching of the word to the world.
The studies finding a stronger correlation of FWs with syntactic development than CWs (e.g. Le Normand et al., 2013) used as predictors words that children have learned into their active vocabulary. For this measure, the precise form of employment of the words is irrelevant and is not given in the publications. Our study provides the missing information, which is that children not only learn FWs from multiword rather than single-word input sentences, but also use them almost exclusively in syntactically connected multiword sentences. Thus, we have shown that when a child has learned a FW, he or she has also learned its syntax, while the same is not necessarily true for CWs. Apparently such learning facilitates syntactic development in general in the relevant languages.
These results also explain a paradoxical finding in the field, according to which the learning of CWs such as verbs does not correlate significantly with the mastery of syntax. The lack of correlation cannot be explained under a well-accepted theory, the syntactic bootstrapping hypothesis, according to which the meaning of verbs cannot be gleaned from the interactive context but needs the syntactic context to be learned (Gleitman, 1990). If this theory were correct, we would expect the learning of lexical verbs and other content words to be closely correlated with syntactic development, which does not happen according to the relevant studies. Our findings account for this lack of correlation by showing that CWs occur as single-word utterances in large numbers both in parental input and in children's productions, and hence the syntactic context is not crucial for learning their use (see also Ninio, 2016). The opposite is the case for those FWs that do not function as single-word utterances, namely determiners, prepositions, subordinators, and auxiliary verbs; such words, and not CWs, are the ones that require syntactic bootstrapping for their acquisition.
Although the preceding explanation is sufficient to account for the finding that FWs are more strongly correlated with syntactic development than CWs, this cannot be the whole story. FWs (and not CWs) are also crucially connected with language loss, not only with language development. In particular, it has repeatedly been found that in patients suffering from Broca's aphasia there is a strong correlation between the loss of syntax and the loss of the FW vocabulary (e.g., Bock, 1989; Garrett, 1982). The connection with both the development of syntax and its loss suggests that FWs must have a central role in the syntactic structuring of sentences.
Our findings provide an opening for a model of this role. We have shown that FWs are distinctive in lacking semantic meaning and in being defined solely by their kind of connection to other words. The promising possibility is that they work as interface elements, serving communication with other units of the total system, while the specific semantic content of their complement content-word is "encapsulated". This is the role filled by elements that possess behavioral content but lack semantic content in, for example, object-oriented programming languages (The Java Tutorials, 2017). In our ongoing research, we are currently testing this intriguing possibility.