CCGweb: a New Annotation Tool and a First Quadrilingual CCG Treebank

We present the first open-source graphical annotation tool for combinatory categorial grammar (CCG), and the first set of detailed guidelines for syntactic annotation with CCG, for four languages: English, German, Italian, and Dutch. We also release a parallel pilot CCG treebank based on these guidelines, with 4x100 adjudicated sentences, 10K single-annotator fully corrected sentences, and 82K single-annotator partially corrected sentences.


Introduction
Combinatory Categorial Grammar (CCG; Steedman, 2000) is a grammar formalism distinguished by its transparent syntax-semantics interface and its elegant handling of coordination. It is a popular tool in semantic parsing, and treebank creation efforts have been made for Turkish (Çakıcı, 2005), German (Hockenmaier, 2006), English (Hockenmaier and Steedman, 2007), Italian (Bos et al., 2009), Chinese (Tse and Curran, 2010), Arabic (Boxwell and Brew, 2010), Japanese (Uematsu et al., 2013), and Hindi (Ambati et al., 2018). However, none of these treebanks was annotated directly according to the CCG formalism; all were automatically converted from phrase structure or dependency treebanks, which is an error-prone process. Direct annotation in CCG has so far mostly been limited to small datasets for seeding or testing semantic parsers (e.g., Artzi et al., 2015), and no graphical annotation interface is available to support such efforts, making the annotation process difficult to scale. The only exceptions we are aware of are the Groningen Meaning Bank and the Parallel Meaning Bank (Abzianidze et al., 2017), two annotation efforts which use a graphical user interface for annotating sentences with CCG derivations and other annotation layers, and which have produced CCG treebanks for English, German, Italian, and Dutch. However, these efforts are focused on semantics and have not released explicit guidelines for syntactic annotation. Their annotation tool is limited in that annotators only have control over lexical categories, not larger constituents. Even though CCG is a lexicalized formalism, in which most decisions can be made at the lexical level, lexical categories alone do not give full control over attachment phenomena. Moreover, these annotation tools are not open-source and cannot easily be deployed to support other annotation efforts.
In this paper, we present an open-source, lightweight, easy-to-use graphical annotation tool that employs a statistical parser to create initial CCG derivations for sentences, and allows annotators to correct these annotations via lexical category constraints and span constraints. Together, these constraints make it possible to effect (almost) all annotation decisions consistent with the principles of CCG. We also present a pilot study for multilingual CCG annotation, in which a parallel corpus of 4x100 sentences (in English, German, Italian, and Dutch) was annotated by two annotators per sentence, a detailed annotation manual was created, and adjudication was performed to create a final version. We publicly release the manual, the annotation tool, and the adjudicated data. Our release also includes more than 10K additional derivations, each manually corrected by a single annotator, and more than 82K additional sentences, each partially corrected by a single annotator.

An Annotation Tool for CCG
Our annotation tool CCGweb is Web-based, implemented in Python, PHP, and JavaScript, and should be easy to deploy on any recent Linux distribution. It has two main views. The home page shows the list of sentences an annotator is assigned to annotate; those already done are labeled "marked correct". Clicking on a sentence takes the annotator to the sentence view. Annotators can also enter arbitrary sentences to annotate, e.g., for experimenting or for producing illustrations.
Dynamic Annotation. Annotation follows an approach called dynamic annotation (Oepen et al., 2002) or human-aided machine annotation, in which sentences are automatically analyzed, annotators impose constraints to rule out undesired analyses, sentences are then reanalyzed subject to the constraints, and the process is repeated until only the desired analysis remains. The current system is backed by the EasyCCG parser (Lewis and Steedman, 2014), slightly modified to allow for incorporating constraints, and other CCG parsers could be plugged in with similar modifications.
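The reanalysis loop can be pictured with a small Python sketch. It is only an illustration under strong assumptions: the names `Analysis`, `Constraints`, `satisfies`, and `reparse` are hypothetical and are not part of CCGweb or EasyCCG, and the constrained parser is replaced by a filter over a fixed, ranked list of candidate analyses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Analysis:
    categories: tuple   # one lexical category per token
    spans: frozenset    # constituents as half-open (start, end) token spans


@dataclass
class Constraints:
    categories: dict = field(default_factory=dict)  # token index -> required category
    spans: set = field(default_factory=set)         # spans that must be constituents


def satisfies(analysis, constraints):
    """True if the analysis respects every category and span constraint."""
    cats_ok = all(analysis.categories[i] == cat
                  for i, cat in constraints.categories.items())
    return cats_ok and constraints.spans <= analysis.spans


def reparse(ranked_analyses, constraints):
    """Stand-in for a constrained parser: best-ranked analysis that satisfies the constraints."""
    for analysis in ranked_analyses:
        if satisfies(analysis, constraints):
            return analysis
    return None


# Toy candidates for the two tokens "go there", best-scoring first (hypothetical ranking).
all_spans = frozenset({(0, 1), (1, 2), (0, 2)})
candidates = [
    Analysis((r"S[b]\NP", r"(S\NP)\(S\NP)"), all_spans),   # "there" as adjunct
    Analysis((r"(S[b]\NP)/PP", "PP"), all_spans),          # "there" as argument
]

constraints = Constraints()
first = reparse(candidates, constraints)       # initial automatic analysis
constraints.categories[0] = r"(S[b]\NP)/PP"    # annotator constrains the category of "go"
second = reparse(candidates, constraints)      # reanalysis subject to the constraint
```

In the sketch, a single category constraint on the verb is enough for the second analysis to be selected; a real parser would search the space of derivations rather than a fixed list.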
What You See Is What You Get. Derivations are rendered in the same graphical format that is used in the literature, representing nodes as horizontal lines placed underneath their children. Annotators directly interact with this graphical representation when annotating, following the WYSIWYG (what you see is what you get) principle.
Lexical Category Constraints. As an example of editing, consider Figure 1. Suppose that the parser has analyzed there as an adjunct with category (S\NP)\(S\NP), but we wish to analyze it as an argument to the verb go with category PP. As a result, the category of the verb also has to change, viz. from S[b]\NP to (S[b]\NP)/PP. To do this, the annotator clicks on the category and changes it, as shown in the figure. When they hit enter or click somewhere else, the sentence is automatically parsed again in the background, this time with the lexical category constraint that go has category (S[b]\NP)/PP. In many cases, the parser will directly find the desired parse, with there being a PP, and the annotator only has to check it, not make another edit.
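To see why changing the verb's category forces there to be a PP, it helps to spell out function application on category strings. The following Python sketch is a simplified illustration, not the tool's implementation: the helper names are hypothetical, and arguments are matched by exact string equality, whereas real CCG parsers unify features (e.g., S with S[b]).

```python
def strip_parens(cat):
    """Remove one pair of outer parentheses if they enclose the whole category."""
    if cat.startswith("(") and cat.endswith(")"):
        depth = 0
        for i, ch in enumerate(cat):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i < len(cat) - 1:
                    return cat  # the first parenthesis closes early, e.g. (S\NP)\(S\NP)
        return cat[1:-1]
    return cat


def split_category(cat):
    """Split at the outermost slash into (result, slash, argument); None for atomic categories."""
    depth = 0
    # Scan right to left: slashes associate to the left, so the rightmost
    # top-level slash is the outermost one.
    for i in range(len(cat) - 1, -1, -1):
        ch = cat[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return strip_parens(cat[:i]), ch, strip_parens(cat[i + 1:])
    return None


def forward_apply(left, right):
    """X/Y  Y  =>  X (exact match on Y; a simplification)."""
    parts = split_category(left)
    if parts and parts[1] == "/" and parts[2] == right:
        return parts[0]
    return None


def backward_apply(left, right):
    """Y  X\\Y  =>  X (exact match on Y; a simplification)."""
    parts = split_category(right)
    if parts and parts[1] == "\\" and parts[2] == left:
        return parts[0]
    return None


verb_phrase = forward_apply(r"(S[b]\NP)/PP", "PP")   # go + there
```

With the constrained category, go can only combine with a PP to its right, so the adjunct reading of there is no longer available to the parser.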
Span Constraints. Although constraining lexical categories is often enough to determine the entire CCG derivation (cf. Bangalore and Joshi, 1999; Lewis and Steedman, 2014), this is not always the case. For example, consider the sentence I want to be a millionaire like my dad. Assuming that like my dad is a verb phrase modifier (category (S\NP)\(S\NP)), it could attach to either to be or want, giving very different meanings (cf. Zimmer, 2013). We therefore implemented one other type of edit operation/constraint: span constraints. By simply clicking and dragging across a span of tokens as shown in Figure 2, annotators can constrain this span to be a constituent in the resulting parse.
Additional Features. Our tool offers annotators some additional convenient features. When unsure about some annotation decision, they can click the "report issue" button to open a discussion thread in an external forum, such as a GitHub issue tracker. To erase all constraints and restart annotation from the parser's original analysis, an annotator can click the "reset" button. And the buttons "HTML" and "LaTeX" provide code that can be copied and pasted to use the current derivation as an illustration on a web page or in a paper.
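The effect of a span constraint on the millionaire example can be sketched in a few lines of Python. This is an illustrative simplification: derivations are reduced to nested tuples of tokens (categories omitted), spans are half-open token ranges, and the function `spans_of` and the two bracketings are assumptions made for this sketch rather than output of the tool.

```python
def spans_of(tree, start=0):
    """Return (end index, set of (start, end) spans) for a nested-tuple tree.

    Leaves are token strings; every subtree contributes one half-open span.
    """
    if isinstance(tree, str):
        return start + 1, {(start, start + 1)}
    spans = set()
    pos = start
    for child in tree:
        pos, child_spans = spans_of(child, pos)
        spans |= child_spans
    spans.add((start, pos))
    return pos, spans


# Token indices: I=0 want=1 to=2 be=3 a=4 millionaire=5 like=6 my=7 dad=8
modifier = ("like", ("my", "dad"))
infinitive = ("to", ("be", ("a", "millionaire")))

# Low attachment: "like my dad" modifies "to be a millionaire".
low = ("I", ("want", (infinitive, modifier)))
# High attachment: "like my dad" modifies "want to be a millionaire".
high = ("I", (("want", infinitive), modifier))

_, low_spans = spans_of(low)
_, high_spans = spans_of(high)

# Dragging across "want to be a millionaire" corresponds to the span (1, 6).
constraint = (1, 6)
surviving = [name for name, spans in (("low", low_spans), ("high", high_spans))
             if constraint in spans]
```

Both bracketings use the same lexical categories in the scenario described above, so only the span constraint tells them apart: requiring (1, 6) keeps the high attachment, while requiring (2, 9) would keep the low one.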
Adjudication Support. Once two or more annotators have annotated a sentence, disagreements need to be discovered, and a final, authoritative version has to be created. Our tool supports this adjudication process through the special user account judge. This user can see the derivations of other annotators in a tabbed interface as shown in Figure 3. In order to enable the judge to easily spot disagreements, categories that annotators disagree on are struck through, and constituents that annotators disagree on are dashed.
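One way to compute what should be highlighted for the judge is to compare lexical categories token by token and constituents span by span. The Python sketch below assumes a hypothetical data layout (each annotation is a pair of a category list and a set of half-open spans) and a hypothetical function name; it is not taken from the CCGweb code base.

```python
def find_disagreements(annotations):
    """Compare annotations of one sentence.

    annotations maps an annotator name to (categories, spans), where
    categories is a list with one category per token and spans is a set of
    half-open (start, end) constituents. Returns the token indices whose
    categories differ and the spans not shared by all annotators.
    """
    category_lists = [cats for cats, _ in annotations.values()]
    span_sets = [set(spans) for _, spans in annotations.values()]

    n_tokens = len(category_lists[0])
    category_conflicts = [
        i for i in range(n_tokens)
        if len({cats[i] for cats in category_lists}) > 1
    ]
    shared = set.intersection(*span_sets)
    span_conflicts = sorted(set.union(*span_sets) - shared)
    return category_conflicts, span_conflicts


# Toy example: two annotators, three tokens, differing in one category
# and in how the tokens are bracketed.
annotations = {
    "annotator_a": (["NP", r"(S\NP)/NP", "NP"], {(0, 3), (1, 3)}),
    "annotator_b": (["NP", r"(S\NP)/PP", "NP"], {(0, 3), (0, 2)}),
}
category_conflicts, span_conflicts = find_disagreements(annotations)
```

In this toy case the category of the second token would be struck through, and the two competing two-token constituents would be drawn dashed.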

A Quadrilingual Pilot CCG Treebank
To test the viability of creating multilingual CCG treebanks by direct annotation, we conducted an annotation experiment on 110 short sentences from the Tatoeba corpus (Tatoeba, 2019), each in four translations (English, German, Italian, and Dutch). The main annotation guideline was to copy the annotation style of CCGrebank (Honnibal et al., 2010), a CCG treebank adapted from CCGbank (Hockenmaier and Steedman, 2007), which is in turn based on the Penn Treebank (Marcus et al., 1993). Since CCGrebank only covers English and lacks some constructions observed in our corpus, an annotation manual with more specific instructions was needed. We initially annotated ten sentences in four languages and discussed disagreements. The results were recorded in an initial annotation manual, and the initial annotations were discarded. Each of the remaining 4x100 sentences was then annotated independently by at least two of the authors.
Table 1 (upper part) shows the number of non-overlapping category and span constraints that each annotator created on average per sentence before marking the sentence as correct. The first author manually classified the annotated sentences into four classes: (0) sentences without any disagreements; (1) sentences with only trivial violations of the annotation guidelines (e.g., concerning attachment of punctuation or underspecifying modifier features); (2) sentences with only apparent oversights, such as giving a determiner a pronoun category; and (3) sentences with more intricate disagreements, which required additional guidelines to resolve. Table 1 (upper part) shows the distribution of disagreement classes, and Table 2 shows examples of class (3). The first author adjudicated all disagreements and updated the annotation manual accordingly. We release the manual and the full adjudicated dataset. To make the resource more useful (e.g., for training parsers), we also include in the release the syntactic CCG derivations created so far in the Parallel Meaning Bank (Abzianidze et al., 2017). Because of their focus on semantics, these do not follow the annotation guidelines in detail, and they have not been adjudicated; instead, each was corrected by a single annotator. However, they are much greater in number. For an even greater number, we also release partially corrected derivations, meaning that the annotator made at least one change to the automatically created derivation.

[...] lexical label constraints and span constraints, adjudication support, and various conveniences. We have used this tool to create the first published CCG resource that comes with an explicit annotation manual for syntax and has been created by direct annotation, rather than conversion from a non-CCG treebank. It is multilingual, currently including English, German, Italian, and Dutch, and aims for cross-lingually consistent annotation guidelines.
For future work, we envision more extensive direct annotation of multilingual data with CCG derivations, and putting them to use for evaluating unsupervised and distantly supervised CCG parsers. We would also like to investigate the use of our tool as an interactive aid in teaching CCG.