Modeling Incremental Language Comprehension in the Brain with Combinatory Categorial Grammar

Hierarchical sentence structure plays a role in word-by-word human sentence comprehension, but it remains unclear how best to characterize this structure and unknown how exactly it would be recognized in a step-by-step process model. With a view towards sharpening this picture, we model the time course of hemodynamic activity within the brain during an extended episode of naturalistic language comprehension using Combinatory Categorial Grammar (CCG). CCG has well-defined incremental parsing algorithms, surface compositional semantics, and can explain long-range dependencies as well as complicated cases of coordination. We find that CCG-derived predictors improve a regression model of fMRI time course in six language-relevant brain regions, over and above predictors derived from context-free phrase structure. Adding a special Revealing operator to CCG parsing, one designed to handle right-adjunction, improves the fit in three of these regions. This evidence for CCG from neuroimaging bolsters the more general case for mildly context-sensitive grammars in the cognitive science of language.


Introduction
The mechanism of human sentence comprehension remains elusive; the scientific community has not come to an agreement about the sorts of abstract steps or cognitive operations that would best explain people's evident ability to understand sentences as they are spoken word-by-word. One way of approaching this question begins with a competence grammar that is well-supported on linguistic grounds, then adds other theoretical claims about how that grammar is deployed in real-time processing. The combined theory is then evaluated against observations from actual human language processing. This approach has been successful in accounting for eye-tracking data, for instance starting from Tree-Adjoining Grammar and adding a special Verification operation (Demberg et al., 2013).

* Correspondence to m.stanojevic@ed.ac.uk
In this spirit, the current paper models the hemodynamics of language comprehension in the brain using complexity metrics from psychologically plausible parsing algorithms. We start from a mildly context-sensitive grammar that supports incremental interpretation,1 Combinatory Categorial Grammar (CCG; for a review see Steedman and Baldridge, 2011). We find that CCG offers an improved account of fMRI blood-oxygen-level-dependent time courses in "language network" brain regions, and that a special Revealing parser operation, which allows CCG to handle optional postmodifiers in a more human-like way, improves fit yet further (Stanojević and Steedman, 2019). These results underline the consensus that an expressive grammar, one that goes a little beyond context-free power, will indeed be required in an adequate model of human comprehension (Joshi, 1985; Stabler, 2013).

A Focus on the Algorithmic Level
A step-by-step process model for human sentence parsing would be a proposal at Marr's (1982) middle level, the algorithmic level (for a textbook introduction to these levels, see Bermúdez, 2020, §2.3). While this is a widely shared research goal, a large proportion of prior work linking behavioral and neural data with parsing models has relied upon the surprisal linking hypothesis, which is not an algorithm. In fact surprisal wraps an abstraction barrier around an algorithmic model, deriving predictions solely from the probability distribution on that model's outputs (for a review see Hale, 2016). This abstraction is useful because it allows for the even-handed comparison of sequence-oriented models such as n-grams or recurrent neural networks against hierarchical, syntax-aware models. And indeed in eye-tracking, this approach confirms that some sort of hierarchical structure is needed (see e.g. Fossum and Levy, 2012; van Schijndel and Schuler, 2015). This same conclusion seems to be borne out by fMRI data (Henderson et al., 2016; Brennan et al., 2016; Willems et al., 2016; Shain et al., 2020). But precisely because of the abstraction barrier that it sets up, surprisal is ill-suited to the task of distinguishing ordered steps in a processing mechanism. We therefore put surprisal aside in this paper, focusing instead on complexity metrics that are nearer to algorithms; the ones introduced below in §5.3 all map directly on to tree traversals. By counting derivation tree nodes, these metrics track work that the parser does, rather than the rarity of particular words or ambiguity of particular constructions.2

Previous research at the algorithmic level has been limited in various ways. Brennan et al. (2016) used an expressive grammar, but it was not broad coverage, and the step counts were based on derived X-bar trees rather than the derivation trees that would need to be handled by a provably correct parsing algorithm (Stanojević and Stabler, 2018). Other work used a full-throated parser but employed Penn Treebank phrase structure without explicit regard for long-distance dependency. Figure 2 shows an example of one of these dependencies.

1 This work presupposes that sentence interpretation for the most part reflects compositional semantics, and that comprehension proceeds by and large incrementally. This perspective does not exclude the possibility that highly frequent or idiosyncratic patterns might map directly to interpretations in a noncompositional way (see Ferreira and Patson, 2007; Blache, 2018 as well as Slattery et al., 2013; Paolazzi et al., 2019 and discussion of Bever's classic 1970 proposal by Phillips 2013). de Lhoneux et al. (2019) show how to accommodate these cases as multi-word expressions in a CCG parser. [Figure: brain regions implicated in these two theorized routes of human sentence processing.]

[Figure 1(a): right-branching CCG derivation of "Mary reads papers", combining NP, (S\NP)/NP, and NP by application to yield S with semantics reads′(mary′, papers′).]

Why CCG?
CCG presents an opportunity to remedy the limitations identified above in section 2. As already mentioned, CCG is appropriately expressive (Vijay-Shanker and Weir, 1994). And it has special characteristics that are particularly attractive for incremental parsing. CCG can extract filler-gap dependencies such as those in the object relative clause in Figure 2, synchronously and incrementally building surface compositional semantics (cf. Demberg 2012).3 CCG also affords many different ways of deriving the same sentence (see Figure 1). These alternative derivations all have the same semantics, so from the point of view of comprehension they are all equally useful. Steedman (2000, §9.2) argues that this flexible constituency is the key to achieving human-like incremental interpretation without unduly complicating the relationship between grammar and processor. Incremental interpretation here amounts to delivering updated meaning representations at each new word of the sentence.

2 Counting derivation-tree nodes dissociates from surprisal. Prior work addresses the choice of linking hypothesis empirically by deriving both step-counting and surprisal predictors from the same parser. The former but not the latter predictor significantly improves a regression model of fMRI time course in posterior temporal lobe, even in the presence of a co-predictor derived from a sequence-oriented language model.

3 The derivations in Figures 1 and 2 use type-raising as a parser operation. In the definition of CCG from Steedman (2000), type-raising is not a syntactic but a lexical operation. We use it as a parser operation because that is the way it is defined in CCGbank (Hockenmaier and Steedman, 2007) and because it is implemented as such in all broad-coverage parsers. Type-raising contributes to the complexity metric described in Section 5.3.
Such early delivery would seem to be necessary to explain the high degree of incrementality that has been demonstrated in laboratory experiments (Marslen-Wilson, 1973;Altmann and Steedman, 1988;Tanenhaus et al., 1995).
Other types of grammar rely upon special parsing strategies to achieve incrementality. Eager left-corner parsing (LC) is often chosen because it uses a finite amount of memory for processing left- and right-branching structures (Abney and Johnson, 1991). Resnik (1992) was the first to notice a similarity between eager left-corner CFG parsing and shift-reduce parsing of CCG left-branching derivations. In short, forward type-raising >T is like LC prediction, while forward function composition >B is like LC completion (both of these combinators are used in Figure 2). However, CCG has other combinators that make it even more incremental. For instance, in a level-one center embedding such as "Mary gave John a book", a left-corner parser cannot establish a connection between Mary and gave before it sees John. CCG includes a generalized forward function composition >B2 that can combine type-raised Mary S/(S\NP) and gave ((S\NP)/NP)/NP into (S/NP)/NP.
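These combinator schemata can be written down directly. The sketch below is our illustration, not part of any parser described in the paper; the helper names `fapply`, `fcompose`, and `fcompose2` are hypothetical. It represents CCG categories as nested tuples and checks the "Mary gave" example:

```python
# Illustrative CCG combinators over categories encoded as nested tuples:
# an atomic category is a string; a complex category is (result, slash, argument).

def fapply(x, y):
    """Forward application >: X/Y combined with Y yields X."""
    result, slash, arg = x
    assert slash == "/" and arg == y
    return result

def fcompose(x, y):
    """Forward composition >B: X/Y combined with Y/Z yields X/Z."""
    f, s1, a1 = x
    g, s2, a2 = y
    assert s1 == "/" and s2 == "/" and a1 == g
    return (f, "/", a2)

def fcompose2(x, y):
    """Generalized forward composition >B2: X/Y with (Y/Z)/W yields (X/Z)/W."""
    f, s1, a1 = x
    (g, s2, a2), s3, a3 = y
    assert s1 == "/" and s2 == "/" and s3 == "/" and a1 == g
    return ((f, "/", a2), "/", a3)

S, NP = "S", "NP"
mary = (S, "/", (S, "\\", NP))              # NP after forward type-raising >T
gave = (((S, "\\", NP), "/", NP), "/", NP)  # ditransitive verb category
print(fcompose2(mary, gave))                # (('S', '/', 'NP'), '/', 'NP')
```

The result (S/NP)/NP is exactly the category that lets the parser connect Mary and gave before John arrives.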
To our knowledge, the present study is the first to validate the human-like processing characteristics of CCG by quantifying their fit to human neural signals.

The Challenge of Right Adjunction for Incremental Parsing
A particular grammatical analysis may be viewed as imposing ordering requirements on left-to-right incremental parser operations; it obligates certain operations to wait until others have finished. A case in point is right adjunction in sentences such as "Mary reads papers daily." (see Figure 3a). Here the parser has built "Mary reads papers" eagerly, as is to be expected of any parser with human-like behavior, but it then encounters the adjunct "daily". This adjunct is an optional postmodifier of the verb phrase "reads papers". It could be analyzed using the rule VP → VP AdvP, where "daily" is a one-word adverbial phrase adjunct of VP. With this rule, a context-free phrase structure parser will be forced either (i) to backtrack upon seeing "daily" or (ii) to leave the VP open for postmodification (Hale, 2014, pages 31-33 opts for the latter). Neither of these alternatives is particularly appealing from the perspective of cognitive modeling, and indeed Sturt and Lombardo (2005) report a pattern of eye-tracking data that appears to be inconsistent with CCG. They suggest that CCG's account of conjunction, itself analyzable as adjunction, imposes an ordering requirement that cannot be satisfied in a psycholinguistically realistic way.
Sturt and Lombardo's 2005 finding is an important challenge for theories of incremental interpretation, including neurolinguistic models based on LC parsing (Brennan and Pylkkänen, 2017;Nelson et al., 2017). Stanojević and Steedman (2019) offer a crucial part of a solution to this problem.
First, they relax the notion of attachment of a right adjunct: an adjunct does not have to attach to the top category of the tree but can attach to any node on the right spine of the derivation, as long as the attachment respects the node's syntactic type. In Figure 3a the right spine is highlighted in blue. However, none of the constituents on the right spine can be modified by "daily" because the constituent that needs to be modified, "reads papers", was never built; it is not part of the left-branching derivation. To address this, the Stanojević and Steedman parser includes a second innovation: it applies a special tree-rotation operation that transforms left-branching derivations into semantically equivalent right-branching ones. In Figure 3b this operation produces a new right spine, revealing a node of type S\NP, which is the type assigned to English verb phrases in CCG. In Figure 3c the adjunct "daily" is properly attached to this boxed node via Application, a CCG rule that is used quite generally across many different constructions.
The idea of attaching right adjuncts to a node of an already-built tree has appeared several times before (Pareschi and Steedman, 1987; Niv, 1994; Ambati et al., 2015; Stanojević and Steedman, 2019), and in all cases it crucially leverages CCG's flexible constituency as shown in Figure 1. These works also give more detailed treatment of Sturt and Lombardo's construction using predictive completion. The present study examines whether or not the addition of the Revealing operation increases the fidelity of CCG-derived parsing predictions to human fMRI time course data.

Methods
We follow Brennan et al. (2012) and Willems et al. (2016) in using a spoken narrative as a stimulus in the fMRI study. Participants hear the story over headphones while they are in the scanner. The neuroimages collected during their session serve as data for regression modeling with word-by-word predictors derived from the text of the story.

The Little Prince fMRI Dataset
The English audio stimulus was Antoine de Saint-Exupéry's The Little Prince, translated by David Wilkinson and read by Karen Savage. It constitutes a fairly lengthy exposure to naturalistic language, comprising 19,171 tokens, 15,388 words, and 1,388 sentences, and lasting over an hour and a half. This is the fMRI version of the EEG corpus described in Stehwien et al. (2020). It has been used before to investigate a variety of brain-language questions unrelated to CCG parsing (e.g. Li et al., 2018). Prior to parsing, number expressions were spelled out, e.g. 42 as "forty two", and all punctuation was removed.

Participants
Participants comprised fifty-one volunteers (32 women and 19 men, 18-37 years old) with no history of psychiatric, neurological, or other medical illness, or of drug or alcohol abuse, that might compromise cognitive function. All strictly qualified as right-handed on the Edinburgh handedness inventory (Oldfield, 1971). All self-identified as native English speakers and gave their written informed consent prior to participation, in accordance with the university's IRB guidelines. Participants were compensated for their time, consistent with typical practice for studies of this kind; they were paid $65 at the end of the session. Data from three of the 51 participants were excluded because those participants did not complete the entire session or moved their heads excessively.

Presentation
After giving their informed consent, participants were familiarized with the MRI facility and assumed a supine position on the scanner gurney. The presentation script was written in PsychoPy (Peirce, 2007). Auditory stimuli were delivered through MRI-safe, high-fidelity headphones (Confon HP-VS01, MR Confon, Magdeburg, Germany) inside the head coil. Using a spoken recitation of the US Constitution, an experimenter increased the volume until participants reported that they could hear clearly. Participants then listened passively to the audio storybook for 1 hour 38 minutes. The story had nine chapters and at the end of each chapter the participants were presented with a multiple-choice questionnaire with four questions (36 questions in total), concerning events and situations described in the story. These questions served to assess participants' comprehension. The entire session lasted around 2.5 hours.

Data Collection
Imaging was performed using a 3T MRI scanner (Discovery MR750, GE Healthcare, Milwaukee, WI) with a 32-channel head coil at the Cornell MRI Facility. Blood Oxygen Level Dependent (BOLD) signals were collected using a T2*-weighted echo-planar imaging sequence with isotropic voxels. Anatomical images were collected with a high-resolution T1-weighted (1 × 1 × 1 mm³ voxel) Magnetization-Prepared RApid Gradient-Echo (MP-RAGE) pulse sequence.

Preprocessing
Preprocessing allows us to make adjustments that improve the signal-to-noise ratio. Primary preprocessing steps were carried out in AFNI version 16 (Cox, 1996) and included motion correction, coregistration, and normalization to standard MNI space. After these steps were completed, ME-ICA (Kundu et al., 2012) was used to further preprocess the data. ME-ICA is a denoising method that uses Independent Components Analysis to split the T2*-signal into BOLD and non-BOLD components. Removing the non-BOLD components mitigates noise due to motion, physiology, and scanner artifacts (Kundu et al., 2017).

Grammatical Annotations
We annotated each sentence in The Little Prince with phrase structure parses from the benepar constituency parser (Kitaev and Klein, 2018). Previous studies have used the Stanford CoreNLP parser, but benepar is much closer to the current state-of-the-art in constituency parsing. To find CCG derivations we used RotatingCCG (Stanojević and Steedman, 2019, 2020).

Complexity Metric
The complexity metric used in this study is the number of nodes visited in between leaf nodes on a given traversal of a derivation tree. This corresponds to the number of parsing actions that would be taken, per word, in a mechanistic model of human comprehension (see e.g. Kaplan, 1972; Frazier, 1985). These numbers (written below the leaves of the trees in Figure 4) are intended as predictions about sentence processing effort, which may be reflected in the fMRI BOLD signal (see discussion of convolution with the hemodynamic response function in §6.2).
For constituency parses we examine bottom-up (a.k.a. shift-reduce), top-down, and left-corner parsing. Figure 4 shows all of these parsing strategies on an example constituency tree. This figure highlights three points: (a) the complexity metrics correspond to visited nodes of the tree, (b) they are incremental metrics, computed word by word, and (c) alternative parsing strategies lead to different predictions.
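A bottom-up node count of the kind just described can be sketched in a few lines. This is our illustration, not the evaluation code; the tree encoding and the name `bottom_up_counts` are assumptions. Each internal node is counted at the word that completes its yield:

```python
# Per-word bottom-up node counts: an internal node is "visited" when the
# last word of its yield has been shifted, i.e. when it can be reduced.

def bottom_up_counts(tree, counts=None, last=None):
    """tree is a leaf (str) or a tuple (label, child1, child2, ...)."""
    if counts is None:
        counts, last = [], [0]
    if isinstance(tree, str):        # a leaf corresponds to a word
        counts.append(0)
        last[0] = len(counts) - 1    # remember position of most recent word
        return counts
    for child in tree[1:]:
        bottom_up_counts(child, counts, last)
    counts[last[0]] += 1             # this node completes at that word
    return counts

tree = ("S", ("NP", "Mary"), ("VP", ("V", "reads"), ("NP", "papers")))
print(bottom_up_counts(tree))        # [1, 1, 3]
```

The count at "papers" is 3 because the object NP, the VP, and the S node all complete there; top-down and left-corner traversals would distribute the same total differently across the words.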
In CCG all natural parsing strategies are bottom-up. The main difference among them is what kind of derivation they deliver. We evaluate right-branching derivations, left-branching derivations, and revealing derivations; the latter are simply left-branching derivations with the addition of the Revealing operation. To compute these, we take the best derivation from a CCG parser and then convert it to the three different kinds of semantically equivalent derivations using the tree-rotation operation (Niv, 1994; Stanojević and Steedman, 2019).
In the case of revealing derivations we count only the nodes that are constructed with reduce and right-adjunction operations; we do not count the nodes constructed with tree-rotation. This is because tree-rotation does not introduce anything new into the interpretation: it only helps the right-adjunction operation reveal the constituent that needs to be modified.
All parsing strategies build the same total number of nodes; they differ only in the abstract timing of those nodes' construction. In general, left-branching derivations construct nodes earlier than do the corresponding right-branching derivations. However, in the case of right adjunction both left- and right-branching derivations delay construction of many nodes until the right adjunct is consumed. This is not the case with the revealing derivations, which are specifically designed to allow flexibility with right adjuncts.
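The tree-rotation that Revealing relies on can be illustrated on a bare derivation skeleton. This is a simplification of ours that ignores CCG category bookkeeping, and `rotate_right` is a hypothetical name:

```python
# Rotate a left-branching binary skeleton into the semantically equivalent
# right-branching one: ((a, b), c) becomes (a, (b, c)), recursively.

def rotate_right(tree):
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    if isinstance(left, tuple):
        a, b = left
        return rotate_right((a, (b, right)))   # rotate along the left spine
    return (left, rotate_right(right))         # recurse down the right spine

# Left-branching "((Mary reads) papers)" becomes "(Mary (reads papers))",
# exposing the verb-phrase node "reads papers" on the right spine, where a
# right adjunct like "daily" could then attach.
print(rotate_right((("Mary", "reads"), "papers")))
# ('Mary', ('reads', 'papers'))
```

In the real parser the rotation must also recombine the categories and semantics of the rotated nodes, which is why only reduce and right-adjunction steps, not rotations, contribute to the complexity metric.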

Hypotheses
Using the formalism-specific and parsing-strategy-specific complexity metrics defined above in §5.3, we evaluate three hypotheses.

Hypothesis 1 (H1): CCG improves a model of fMRI BOLD time courses above and beyond context-free phrase structure grammar.
Mildly context-sensitive grammars like CCG capture properties of sentence structure that are only very inelegantly covered by context-free phrase structure grammars. For instance, the recovery of filler-gap dependency in Figure 2 follows directly from the definition of the combinators. This hypothesis supposes that the brain indeed does work to recover these dependencies, and that that work shows up in the BOLD signal.
Hypothesis 2 (H2): The Revealing parser operation explains unique variability in the BOLD signal, variability not accounted for by other CCG derivational steps.
As described above in §4, Revealing allows a CCG parser to handle right-adjunction gracefully. This hypothesis in effect proposes that this enhanced psychological realism should extend to fMRI.

Hypothesis 3 (H3): Left-branching CCG derivations improve BOLD activity prediction over right-branching ones.
Left-branching derivations provide maximally incremental CCG analyses. If human processing is maximally incremental, and if this incrementality is manifested in fMRI time courses, then complexity metrics based on left-branching CCG derivations should improve model fit over and above right-branching ones.

Data Analysis

Regions of Interest
We consider six regions of interest in the left hemisphere: the pars opercularis (IFG_oper), the pars triangularis (IFG_tri), the pars orbitalis (IFG_orb), the superior temporal gyrus (STG), the superior temporal pole (sATL), and the middle temporal pole (mATL). These regions are implicated in current neurocognitive models of language (Hagoort, 2016; Friederici, 2017; Matchin and Hickok, 2020). However, evidence suggests that particular sentence-processing operations could be localized to different specific regions within this set (Lopopolo et al., 2021; Li and Hale, 2019). We use the parcellation provided by the automated anatomical labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002) for SPM12. For each subject, extracting the average blood-oxygenation-level-dependent (BOLD) signal from each region yields 2,816 data points per region of interest (ROI). These data served as dependent measures in the statistical analyses described below in §6.3.

Predictors
The predictors of theoretical interest are the parser-derived complexity metrics described above in section 5.3. To these we add covariates that are known to influence human sentence processing. The first of these is Word Rate, which has the value 1 at the offset of each word and zero elsewhere. The second is (unigram) word Frequency, a log-transformed attestation count of the given word type in a corpus of movie subtitles (Brysbaert and New, 2009). The third is the root-mean-squared (RMS) intensity of the audio. Finally we include the fundamental frequency f0 of the narrator's voice as recovered by the RAPT pitch tracker (Talkin, 1995). These control predictors serve to rule out effects that could be explained by general properties of speech perception (cf. Goodkind and Bicknell 2021; Bullmore et al. 1999; Lund et al. 2006). The word-by-word complexity metrics are given timestamps according to the offsets of the words with which they correspond.
In order to use these predictors to model the BOLD signal, we convolve the time-aligned vectors with the SPM canonical hemodynamic response function, which consists of a linear combination of two gamma functions and links neural activity to the estimated BOLD signal (see e.g. Henson and Friston, 2007). After convolution, each of the word-by-word metrics of interest is orthogonalized against convolved word rate to remove correlations attributable to their common timing. Figure 7 in the Appendix reports correlations between these predictors.
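The convolution step can be sketched as follows. This is our illustration, not the SPM code: the canonical HRF is commonly parameterized as a difference of two gamma densities with shape parameters 6 and 16 and a 1:6 undershoot ratio, and `convolve_predictor` is a hypothetical helper:

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds."""
    t = np.arange(0.0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def convolve_predictor(values, onsets, tr, n_scans):
    """Place word-by-word values at their offset times, then convolve with the HRF."""
    signal = np.zeros(n_scans)
    for v, t in zip(values, onsets):
        signal[int(round(t / tr))] += v   # bin each word into the nearest scan
    return np.convolve(signal, canonical_hrf(tr))[:n_scans]

# e.g. three words with node counts 1, 1, 3 at word-offset times (in seconds):
pred = convolve_predictor([1, 1, 3], [0.4, 0.9, 1.5], tr=2.0, n_scans=10)
```

The resulting regressor smears each word-level count forward over the next several scans, which is what licenses comparing word-timed predictions against a slowly sampled BOLD signal.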

Statistical Analyses
Data were analyzed using linear mixed-effects regression. 4 All models included random intercepts for subjects. Random slopes for the predictors were not retained either because of convergence failures or because they did not alter the pattern of results.
A theory-guided, model comparison framework was used to contrast alternative hypotheses (articulated in §5.4). The Likelihood Ratio test was used to compare the fit of competing regression models (for an introduction, see Bliese and Ployhart, 2002). Effects were considered statistically significant with α = 0.008 (0.05/6 regions, following the Bonferroni procedure). 5 As a quantitative comparison between ROIs was not directly relevant for the research questions at issue, statistical analyses were carried out by region. This approach, as compared to examining the effects of, and the interactions between, all ROIs and predictors in the same analysis, reduced the complexity of the models and facilitated parameter estimation.
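The per-ROI comparison amounts to a chi-square test on twice the log-likelihood difference between nested fits. A minimal sketch of ours follows; the actual models are lme4 fits in R, and the log-likelihood values below are made up:

```python
from scipy.stats import chi2

def lr_test(ll_base, ll_full, df_diff):
    """Likelihood Ratio test for nested models:
    2 * (ll_full - ll_base) is chi-square distributed with df_diff
    degrees of freedom (the number of extra parameters)."""
    stat = 2.0 * (ll_full - ll_base)
    return stat, chi2.sf(stat, df_diff)

# Hypothetical fits: adding three CCG predictors raises the log-likelihood.
stat, p = lr_test(ll_base=-1000.0, ll_full=-992.0, df_diff=3)
print(stat)        # 16.0
print(p < 0.008)   # True at the Bonferroni-corrected alpha
```

Because each hypothesis is decided by one such test per ROI, the Bonferroni threshold divides 0.05 by the six regions, not by the number of predictors inside the models.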
Hypothesis H1 was tested by examining the overall predictive power of the CCG-derived predictors over and above a baseline model that included word rate, word frequency, sound power, fundamental frequency, and word-by-word node counts derived from all three phrase structure parsing strategies:

(I) BOLD ∼ word_rate + word_freq + RMS + f0 + bottom-up + top-down + left-corner {CCG-left + CCG-right + CCG-revealing}
To test H2, we examined whether node counts incorporating the Reveal operation explained BOLD signal variability over and above a baseline model that included, in addition to the variables in (I), node counts from left-branching and right-branching CCG derivations:

(II) BOLD ∼ word_rate + word_freq + RMS + f0 + bottom-up + top-down + left-corner + CCG-left + CCG-right {CCG-revealing}

4 Regression analyses used the lme4 R package (version 1.1-26; Bates et al., 2015).
5 A Bonferroni correction of 0.05/6 reflects the fact that each of the three hypotheses was tested with a single Likelihood Ratio test per ROI, irrespective of the number of variables in the models compared.
Last, for H3 in section 5.4, we tested whether word-by-word traversals of left branching CCG derivations accounted for any significant amount of BOLD signal variability over and above right branching. This amounts to asking whether CCG processing is maximally eager or maximally delayed.

Results

H1: CCG-specific effects
The first question that we investigated was whether CCG derivations would account for any significant amount of BOLD activity over and above bottom-up, top-down, and left-corner phrase structure parsing strategies, in addition to the baseline covariates (as introduced above in §5.3 and depicted in Figure 4). The overall predictive power of the three CCG derivations significantly improved model fit in all six regions examined, thus providing strong support for H1. For all analyses, the complete tables of results are provided in the Appendix (Tables 1 to 6).
To better understand the source of those effects, we followed up with an additional set of analyses in which we contrasted one CCG parsing strategy at a time against the same baseline model. These CCG parsing strategies exhibit a region-specific pattern of fits, which is summarized in Figure 5.

H2: The Revealing operation

The CCG-revealing predictor significantly improved model fit in three regions (among them sATL) and was only marginally significant in two others after Bonferroni correction (IFG_orb and mATL). The positive sign of the statistically significant coefficients in Figure 6 indicates that, as expected, increased processing cost, as derived from the CCG-revealing parser, was associated with increased BOLD activity.

[Figure 6: Effects of the CCG-revealing predictor by ROI. Coefficient point estimates ± SE. Filled points indicate that the predictor significantly improved model fit. Note that for IFG_orb and mATL, the effects became only marginally significant after Bonferroni correction.]

H3: Left- versus Right-branching
In the last set of analyses, we investigated whether left-branching CCG derivations improve BOLD activity predictions over right-branching derivations (H3 in section 5.4). The CCG-left predictor significantly improved model fit in IFG_tri, IFG_orb, STG, and mATL, and was only marginally significant in IFG_oper after Bonferroni correction. Overall, these findings indicate that left-branching CCG derivations account for a unique amount of BOLD activity during language processing.

Discussion
The improvement that CCG brings to modeling fMRI time courses, over and above predictors derived from well-known context-free parsing strategies, confirms that mildly context-sensitive grammars capture real aspects of human sentence processing, as suggested earlier by Brennan et al. (2016). We interpret the additional improvement due to the Revealing operation as neurolinguistic evidence in support of that particular way of achieving heightened incrementality in a parser. While it is possible that other incremental parsing techniques might adequately address the challenge of right adjunction (see §4 above), we are at present unaware of any that are supported by evidence from human neural signals.

The patterning of fits across regions aligns with the suggestion that different kinds of processing, some more eager and others less so, may be happening across the brain (cf. Just and Varma 2007). For instance, the explanatory success of predictors derived from left-branching and Revealing derivations in the middle temporal pole (mATL) supports the idea that this region tracks tightly time-locked, incremental language combinatorics, while other regions such as the inferior frontal gyrus (IFG) hang back, waiting to process linguistic relationships until the word at which they would be integrated into a right-branching CCG derivation (roughly consistent with Friederici, 2017; Pylkkänen, 2019).
In the superior temporal gyrus (STG) the sign of the effect changes for CCG-derived predictors. This is the unique region where Lopopolo et al. (2021) observe an effect of phrase structure processing, as opposed to dependency grammar processing. This could be because our CCG is lexicalized. Of course, the CCGbank grammar captures many other aspects of sentence structure besides lexical dependencies (see Hockenmaier and Steedman 2007).

Shain et al. (2020) use a different, non-combinatory categorial grammar to model fMRI time courses. Whereas this earlier publication employs the surprisal linking hypothesis to study predictive processing, the present study considers instead the parsing steps that would be needed to recover grammatical descriptions assigned by CCG. This distinction can be cast as the difference between Marr's computational and algorithmic levels of analysis, as suggested above in §2. But besides the choice of vantage point, there are conceptual differences that lead to different modeling at both levels. For instance, the generalized categorial grammar of Shain et al. (2020) is quite expressive and may go far beyond context-free power. But in that study it was first flattened into a probabilistic context-free grammar (PCFG) to derive surprisal predictions. The present study avoids this step by deriving processing complexity predictions directly from CCG derivations using node counts. This directness is important when reasoning from human data, such as neural signals, to mathematical properties of formal systems, such as grammars (see discussion of Competence hypotheses in Steedman, 1989). This prior work by Shain et al. (2020) includes a telling observation: that surprisal from a 5-gram language model improves fit to brain data, over and above a PCFG. Shain et al. hypothesize that this additional contribution is possible expressly because of PCFGs' context-freeness, and that a (mildly) context-sensitive grammar would do better.
The results reported here are consistent with this suggestion.
Conclusion and Future Work

CCG, a mildly context-sensitive grammar, helps explain the time course of word-by-word language comprehension in the brain over and above Penn Treebank-style context-free phrase structure grammars, regardless of whether the latter are parsed left-corner, top-down, or bottom-up. This special contribution from CCG is likely attributable to its more realistic analysis of "movement" constructions (e.g. Figure 2), which would not be assigned by naive context-free grammars. CCG's flexible approach to constituency may offer a way to understand both immediate and delayed subprocesses of language comprehension from the perspective of a single grammar. The Revealing operation, designed to facilitate more human-like CCG parsing, indeed leads to increased neurolinguistic fidelity in a subset of brain regions that have been previously implicated in language comprehension.
We look ahead in future work to quantifying the effect of individual complexity metrics across brain regions using alternative metrics related to surprise and memory (e.g. Graf et al., 2017). This future work also includes investigation of syntactic ambiguity, for instance via beam search along the lines of Crabbé et al. (2019) using an incremental neural CCG model.