A Type-coherent, Expressive Representation as an Initial Step to Language Understanding

A growing interest in tasks involving language understanding by the NLP community has led to the need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs for language understanding: adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULF) for Episodic Logic (EL), which is an initial form for a semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved; they provide a starting point for further resolution into EL, and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize that a divide-and-conquer approach to semantic parsing starting with derivation of ULFs will lead to semantic analyses that do justice to subtle aspects of linguistic meaning, and will enable construction of more accurate semantic parsers.

Figure 1: The semantic interpretation process, with the ULF step in the fore. Structurally dependent steps in the interpretation process are connected by solid black arrows and structurally independent information flow is represented with dashed blue arrows. The components that changed from the previous structural step are highlighted in yellow. Backward information arrows indicate that arriving at the optimal choice at a particular step may depend on "later" (structurally dependent) steps.
Our working hypothesis in designing ULF is that a divide-and-conquer approach starting with preliminary surface-like LFs is a practical way to generate fully resolved interpretations of natural language in EL. Figure 1 shows a diagram of our divide-and-conquer approach, which is elaborated upon in Section 3.3. We also outline a framework for quickly and reliably collecting ULF annotations for a corpus in a multi-pronged approach. Our evaluation of the annotation framework shows that we achieve annotation speeds and agreement comparable to those for the abstract meaning representation (AMR) project, which has successfully built a large enough corpus to drive research into corpus-based parsing (Banarescu et al., 2013). Further resources relating to this project, including a more in-depth description of ULFs, the annotation guidelines, and related code are available from the project website http://cs.rochester.edu/u/gkim21/ulf/.

Episodic Logic
EL is a semantic representation that extends FOL to more closely match the expressivity of natural languages. It echoes both the surface form of language, and more crucially, the semantic types that are found in all languages. Some semantic theorists view the fact that noun phrases denoting both concrete and abstract entities can appear as predicate arguments (Aristotle, everyone, the fact that there is water on Mars) as grounds for treating all noun phrases as being of higher types (e.g., second-order predicates). EL instead uses a small number of reification operators to map predicate and sentence intensions to individuals. As a result, quantification remains first-order (but allows quantified phrases such as most people who smoke, or hardly any errors). Another distinctive feature of EL is that it treats the relation between sentences and episodes (including events, situations, and processes) as a characterizing relation, written '**'. This coincides with the Davidsonian treatment of events as extra variables of predicates (Davidson, 1967) when we restrict ourselves to positive, atomic predications. However, '**' also allows for logically complex characterizations of episodes, such as not eating anything all day, or each superpower menacing the other with its nuclear arsenal (Schubert, 2000).
EL defines a hierarchical ontology over the domain of individuals, $D$. $D$ includes simple individuals (e.g., John), possible situations, $S$, possible worlds, $W \subset S$, various numerical types, propositions, $P$, and kinds, $K$, as well as others that are not important for the purposes of this document. A complete description of the ontology is provided by Schubert and Hwang (2000). The types of some predicates are further restricted by these categories. For example, the predicate claim.v, as in "I claim that grass is red.", has the type $P \to (D \to (S \to 2))$, since its first argument is a proposition and the second argument is a simple individual (in the semantics of EL the agent argument is supplied last, though it precedes the predicate in the surface syntax).
The semantic types in EL are defined by recursive functions over individuals, $D$, and truth values, $\{0, 1\}$, written as $2$. Semantic values of predicates applied to their surface arguments can yield a value in $2$ at a given (possible) situation, or be undefined there (indicating irrelevance of the predication in the given situation). Most predicates in EL are of type $D^n \to (S \to 2)$ (where $D^2 \to 2$ abbreviates $D \to (D \to 2)$, $D^3 \to 2$ abbreviates $D \to (D \to (D \to 2))$, and so on). That is, they are first-order intensional predicates. Monadic predicates play a particularly important role in EL as well as ULF, and we will abbreviate their type $D \to (S \to 2)$ as $N$. In EL syntax, square brackets indicate infixed operators (i.e. $[\tau_n\ \pi\ \tau_1 \ldots \tau_{n-1}]$ where $\pi$ is the operator) and parentheses indicate prefixed operators (i.e. $(\pi\ \tau_1 \ldots \tau_n)$ where $\pi$ is the operator). Predicative formulas such as [|Aristotle| famous.a] or [|Romeo| love.v |Juliet|] are regarded as temporal and must be evaluated with respect to a situation via an episode-relating operator (e.g. '**') to supply the episode and thus produce an atemporal formula.
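The abbreviation $D^n \to (S \to 2)$ can be unfolded mechanically. The following is a minimal sketch, not part of the EL tooling; the function name `pred_type` and the ASCII arrow notation are assumptions made for illustration.

```python
def pred_type(n):
    """Unfold the abbreviation D^n -> (S -> 2) into its curried form.

    Hypothetical helper for illustration; types are rendered as plain strings.
    """
    if n < 1:
        raise ValueError("a predicate takes at least one individual argument")
    result = "S -> 2"  # a sentence intension: situations to truth values
    for _ in range(n):
        # Each individual argument adds one more level of currying.
        result = f"D -> ({result})"
    return result


# The monadic case is the type abbreviated as N in the text.
print(pred_type(1))
print(pred_type(2))
```

Under this rendering, `pred_type(1)` corresponds to the monadic type $N$, and each additional argument wraps the previous type in one more `D -> (...)`.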
There are also a limited number of type-shifting operators in EL to map between some of these types.
The kind operator, 'k', shifts a monadic predicate into a kind, $(D \to (S \to 2)) \to K$, and the operator 'that' forms propositions from sentence intensions, $(S \to 2) \to P$. "That grass is red", a segment of an earlier example, is formulated as (that [(k grass.n) red.a]) in EL, using both of these operators.

Unscoped logical form
ULFs are type-coherent initial LFs which provide a stepping stone to capturing full sentential EL meanings. They enable interesting classes of structural inferences that are of broader scope than those enabled by Natural Logic (NLog) (Sánchez Valencia, 1995), and unlike NLog inferences do not depend on prior knowledge of the propositions to be confirmed or refuted. ULF captures the full predicate argument structure of EL while leaving word sense, scope, and anaphora unresolved. Therefore, ULFs can be analyzed using the formal EL type system while taking the scopal ambiguities into account. There is not enough space here to exhaustively discuss how ULF handles various phenomena, so the discussion will be restricted to the broad framework of ULF and the most crucial aspects of the semantics. Please refer to http://cs.rochester.edu/u/gkim21/ulf/ for complete information on ULF.

ULF Syntax
All atoms in ULF, with the exception of certain logical functions and syntactic macros, are marked with an atomic syntactic type. The atomic syntactic types are written with suffixed tags: .v,.n,.a,.p, .pro,.d,.aux-v,.aux-s,.adv-a,.adv-e,.adv-s,.adv-f,.cc,.ps,.pq,.mod-n, or .mod-a, except for names, which use wrapped bars, e.g. |John|. These are intended to echo the part-of-speech origins of the constituents, such as verb, noun, adjective, preposition, pronoun, determiner, etc., respectively; some of them contain further specifications as relevant to their entailments, e.g., .adv-e for locative or temporal adverbs (implying properties of events). The distinctions among predicates of sorts .v,.n,.a,.p, corresponding to English parts of speech, are often suppressed in other LFs for language, but are semantically important. For example, "Bob danced" can refer to a brief episode while "Jill was a dancer" generally cannot (and may suggest Jill is no longer alive); this is related to the fact that verbal predicates are typically "stage-level" (episodic) while nominal predicates are generally "individual-level" (enduring). Whereas in EL the bracket type specifies whether prefix or infix notation is being used, in ULF this distinction is inferred from the semantic types of the constituents and only parentheses are used.
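Because ULF uses only parentheses and marks atoms with suffixed tags, a ULF string can be read with a very small s-expression reader. The sketch below is an illustration under stated assumptions, not the project's parser: the names `parse_ulf`, `atom_tag`, and `tagged_atoms` are hypothetical, and atoms are assumed to contain no whitespace.

```python
def parse_ulf(text):
    """Parse a parenthesized ULF string into nested lists of atom strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok  # an atom
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing bracket
        return items

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first expression")
    return tree


def atom_tag(atom):
    """Return the suffix tag of an atom, 'name' for |...| atoms, else None."""
    if atom.startswith("|") and atom.endswith("|"):
        return "name"
    if "." in atom:
        return atom.rsplit(".", 1)[1]
    return None  # untagged atoms such as pres, k, or n+preds


def tagged_atoms(tree):
    """List (atom, tag) pairs in left-to-right order."""
    if isinstance(tree, str):
        return [(tree, atom_tag(tree))]
    pairs = []
    for child in tree:
        pairs.extend(tagged_atoms(child))
    return pairs


tree = parse_ulf("(|Bob| ((pres own.v) (a.d dog.n)))")
print(tagged_atoms(tree))
```

Untagged atoms come back with `None`, which matches the text's remark that certain logical functions and macros carry no atomic syntactic type.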
Figure 2 (examples):
(2) ((if.ps (i.pro ((cf were.v) (= you.pro)))) (i.pro ((cf will.aux-s) (be.v (able.a (to succeed.v))))))
(3) Flowers are weak creatures
((k (plur flower.n)) ((pres be.v) (weak.a (plur creature.n))))
(4) Very few people still debate the fact that the earth is heating up
(((fquan (very.mod-a few.a)) (plur person.n)) (still.adv-s (debate.v (the.d (n+preds fact.n (= (that ((the.d |Earth|.n) ((pres prog) heat_up.v))))))))
Atoms that are implicit in the sentence or elided and thus supplied by the annotator are wrapped in curly brackets, such as {ref}.pro in example (1) of Figure 2.
For practical purposes we distinguish raw ULF from postprocessed ULF. In raw ULF we allow certain argument-taking constituents to be dislocated from their "proper" place, so as to adhere more closely to linguistic surface structure and thereby facilitate annotation. For example, sentence-level operators (of type adv-s) appearing mid-sentence may be left "floating" (e.g., (|Alice| certainly.adv-s ((pres know.v) |Bob|))), since they can be automatically lifted to the sentence-level; and verb-level adverbs (of type adv-a) can be interleaved with arguments (e.g., ((past speak.v) sternly.adv-a (to.p-arg |Bob|))), even though semantically they operate on the whole verb phrase. Kim and Schubert (2017) presented this method of dislocated annotation for sentence-level operators. In postprocessed ULF, we can understand all atoms and subexpressions of well-formed formulas (wffs) as being one of the following ULF constituent types (modulo some following remarks): entity, predicate, determiner, monadic predicate modifier, sentence, sentence modifier, connective, lambda abstract, or one of a limited number of type-shifting operators, where the predicates and operators that act on predicates are subcategorized by whether the predicate is derived from a noun, verb, adjective, or preposition. These constituent types uniquely map to particular semantic types, i.e. are aliases for the formal types.
Clausal constituents are combined according to their bracketing and semantic types.
A qualification of the above general claim is that unscoped tense operators, determiners, and coordinators remain in their surface position even in postprocessed ULF. For example, in (|Bob| ((pres own.v) (a.d dog.n))), pres is actually an unscoped sentence-level operator (which, in conversion to EL, is deindexed to yield a characterization of an episode by the sentence, and a temporal predication about that episode). We also retain coordinated expressions such as ((in.p |Rome|) and.cc happy.a), where this will ultimately lead to a sentential conjunction in EL. Similarly, (a.d dog.n) is kept in argument position as if it were of semantic type $D$ (thus, as if the determiner were of semantic type $N \to D$). Such unscoped constituents do not disrupt type coherence, because the possible conversions to type-coherent EL are well-defined.
Finally, both raw ULFs and postprocessed ULFs can contain macros. For example, the macro operator n+preds is used for postmodified nominal predicates such as (n+preds dog.n (on.p (a.d leash.n))) (see also example (4) in Figure 2); this avoids immediate introduction of a λ-abstracted conjunction of predicates, simplifying the annotation task. Appendix C discusses macros further, including their formal definitions. Section 4 will ground the high-level discussions in this and the following section with a concrete discussion of modifiers.
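As an illustration of what the n+preds macro abbreviates, the sketch below rewrites it into a λ-abstracted conjunction, following the one expansion shown in this paper for "The buildings in the city". It is not the formal definition from Appendix C: the helper names are hypothetical, the fixed variable name `x` ignores variable capture in nested macros, and the treatment of more than one postmodifier (chaining with and.cc) is an assumption.

```python
def parse_ulf(text):
    """Read a ULF string into nested lists (assumes balanced brackets)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        return items, pos + 1

    tree, _ = read(0)
    return tree


def to_str(tree):
    """Print nested lists back in ULF bracket notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(to_str(child) for child in tree) + ")"


def expand_n_preds(tree, var="x"):
    """Rewrite (n+preds N P1 ...) as (λx ((x N) and.cc (x P1) ...))."""
    if isinstance(tree, str):
        return tree
    tree = [expand_n_preds(child, var) for child in tree]
    if tree and tree[0] == "n+preds":
        body = []
        for i, pred in enumerate(tree[1:]):
            if i > 0:
                body.append("and.cc")
            body.append([var, pred])  # predication of the bound variable
        return ["λ" + var, body]
    return tree


raw = parse_ulf("(the.d (n+preds (plur building.n) (in.p (the.d city.n))))")
print(to_str(expand_n_preds(raw)))
```

Keeping the macro in the annotation and expanding it afterwards means annotators never write the λ-expression by hand.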

ULF Type Structure
The type-shifting operators mentioned in the previous section are crucial for type coherence in ULFs. In example (1) the phrase "for me" is coded as (adv-a (for.p me.pro)), rather than simply (for.p me.pro) because it is functioning as a predicate modifier, semantically operating on the verbal predicate (dial.v {ref1}.pro) (dial a certain thing). Let $N_{ADJ}$, $N_N$, and $N_V$ be the sortal refinements of the monadic predicate type $N$ corresponding to adjectives, nouns, and verbs, respectively. (adv-a (for.p me.pro)) has type $N_V \to N_V$. Without the adv-a operator the prepositional phrase is just a 1-place predicate. Its use as a predicate is apparent in contexts like "This puppy is for me". Note that semantically the 1-place predicate (for.p me.pro) is formed by applying the 2-place predicate for.p to the (individual-denoting) term me.pro. If we apply (for.p me.pro) to another argument, such as |Snoopy| (the name of a puppy), we obtain a sentence intension. So semantically, adv-a is a type-shifting operator of type $N \to (N_V \to N_V)$.
This brings up the issue of intensionality, which is preserved in ULF. Example (2) is a counterfactual conditional, and the consequent clause "I would be able to succeed" is not evaluated in the actual world, but in a possible world where the (patently false) antecedent is imagined to be true. ULF captures this with the 'cf' operator in place of the tense and the EL formulas derived from it are evaluated with respect to possible situations (episodes), whose maxima are possible worlds. The type of 'cf' is $(S \to 2) \to (S \to 2)$ after operator scoping to the sentence-level, but like tense operators it is kept with the verb in raw ULF, essentially functioning as a predicate-level identity function, $(\lambda X.X)$, there.
'to' in (2), 'k' in (3), and 'that' in (4) are all operators that reify different semantic categories, shifting them to abstract individuals. 'to' (synonym: ka) shifts a verbal predicate to a kind (type) of action or attribute, $N_V \to K_A$; 'k' shifts a nominal predicate to a kind of thing, $N_N \to K$ (so the subject in example (3) is the abstract kind, flowers, whose instances consist of sets of flowers); and 'that' produces a reified proposition, $(S \to 2) \to P$ (again an abstract individual) from a sentence meaning.
Using these type shifts, EL and ULF are able to maintain a simple, classical view of predication, while allowing greater expressivity than the most widely employed LFs.

Role of ULF in Comprehensive Semantic Interpretation
ULFs are underspecified, but their surface-like form and the type structure they encode make them well-suited to reducing underspecification by using well-established linguistic principles and exploiting the distributional properties of language. Figure 1 shows the interpretation process for EL formulas and the role of ULFs in providing the first step into it. Due to the structural dependencies between the components in the interpretation process, the optimal choice at any given component depends on the overall coherence of the final interpretation; hence the backward arrows in the figure. Word sense disambiguation (WSD) and anaphora have no structural dependencies in the interpretation process so they are separated from and fully connected to the post-ULF components. These resolutions are depicted in the last step in the figure.
WSD & Anaphora: While (weak.a (plur creature.n)) in example (3) does not specify which of the dozen WordNet senses of weak or three senses of creature is intended here, the type structure is perfectly clear: A predicate modifier is being applied to a nominal predicate. ULF also does not assume unique adicity of word-derived predicates such as run.v, since such predicates can have intransitive, simple transitive and other variants, but the adicity of a predicate in ULF is always clear from its structural context: we know that it has all its arguments in place when an argument (the "subject") is placed on its left, as in English.
Linguistic constraints (e.g. binding constraints) exist for coreference resolution. For example, in "John said that he was robbed", he can refer to John; but this is not possible in "He said that John was robbed", because in the latter, he C-commands John, i.e., in the phrase structure of the sentence, it is a sibling of an ancestor of John. ULF preserves this structure, allowing use of such constraints. While ULF constrains the word senses and coreferences through adicity and syntactic structure, WSD and anaphora resolution should not be applied to isolated sentences since word sense patterns and coreference chains often span multiple sentences.
Scoping: Unscoped constituents (determiners, tense operators, and coordinators) can generally "float" to more than one possible position. Following a view of scope ambiguity developed by Schubert and Pelletier (1982) and elaborated on by Hurum and Schubert (1986), these constituents always float to presentential positions, and determiner phrases leave behind a variable that is then bound at the sentential level. The accessible positions are constrained by linguistic restrictions, such as scope island constraints in subordinate clauses (Ruys and Winter, 2010). Beyond this, many factors influence preferred scoping possibilities, with surface form playing a prominent role (Manshadi et al., 2013). The proximity of ULF to surface syntax enables the use of these constraints.
Deindexing and Canonicalization: Much of the past work relating to EL has been concerned with the principles of deindexing (Hwang, 1992; Hwang and Schubert, 1994; Schubert and Hwang, 2000). Deindexing corresponds to the introduction of event variables for explicitly characterizing the sentence it is linked to via the '**' operator (this variable becomes |E|.sk in Figure 1 after Skolemization). Hwang and Schubert's approach to tense-aspect processing, constructing tense trees for temporally relating event variables, is only possible if the LF being processed reflects the original clausal structure, as ULF indeed does. Canonicalization is the mapping of an LF into "minimal", distinct propositions, with top-level Skolemization. The CLF step in Figure 1 contains two separate formulas as a result of this process.
Episodic Logical Forms (ELF): When episodes have been made explicit and all anaphoric and word ambiguities are resolved, the result is a set of episodic logical forms. These can be used in the EPILOG inference engine for reasoning that combines linguistic semantic content with world knowledge. A variety of complex EPILOG inferences are reported by Schubert (2013), and Morbini and Schubert (2011) give examples of self-aware metareasoning. EPILOG also reasoned about snippets from the Little Red Riding Hood story, for example using knowledge about the world and goal-oriented behavior to understand why the presence of nearby woodcutters prevented the wolf from attacking Little Red Riding Hood when he first saw her (Hwang, 1992; Schubert and Hwang, 2000).

Inference with ULFs
An important insight of NLog research is that language can be used directly for inference, requiring only phrase structure analysis and upward/downward entailment marking (polarity) of phrasal contexts. This means that NLog inferences are situated inferences, i.e., their meaning is just as dependent on the utterance setting and discourse state as the linguistic "input" that drives them. This insight carries over to ULFs, and provides a separate justification for computing ULFs, apart from their utility in the process of deriving EL interpretations from language. The semantic type structure encoded by ULFs provides a more reliable and general basis for situated inference than mere phrase structure. Here, briefly, are some kinds of inferences we can expect ULFs to support with minimal additional knowledge due to their structural nature:
• NLog inferences based on generalizations/specializations. For example, "Every NATO member sent troops to Afghanistan", together with the knowledge that France is a NATO member and that Afghanistan is a country entails that France sent troops to Afghanistan and that France sent troops to a country.
• Inferences based on implicatives. For example, "She managed to quit smoking" entails that she quit smoking (and the negation of the premise leads to the opposite conclusion). Inferences of this sort have been demonstrated for headlines using ELFs by Stratos et al. (2011).
• Inferences based on attitudinal and communicative verbs. For example, "John denounced Bill as a charlatan" entails that John probably believes that Bill is a charlatan, that John asserted to his listeners (or readers) that Bill is a charlatan, and that John wanted his listeners (or readers) to believe that Bill is a charlatan. These inferences would be hard to capture within NLog, since they are partially probabilistic, require structural elaboration, and depend on constituent types.
• Inferences based on counterfactuals. For example, "If I were rich, I would pay off your debt" and "I wish I were rich" both implicate that the speaker is not rich. This depends on recognition of the counterfactual form, which is distinguished in ULF.
• Inferences from questions and requests. For example, "When are you getting married?" enables the inferences that the addressee will get married (in the foreseeable future), that the questioner wants to know the expected date of the event, and that the addressee probably knows the answer and will supply it. Similarly an apparent request such as "Could you close the door?" implies that the speaker wants the addressee to close the door, and expects that he or she will do so.
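The counterfactual case in the list above depends only on recognizing the counterfactual form, which ULF marks with the cf operator. The sketch below shows that recognition step alone, assuming a nested-list reading of ULF; the helper names are hypothetical, the annotation of "I were rich" in the usage lines is an illustrative guess rather than a corpus annotation, and deriving the actual conclusion (that the speaker is not rich) is left out.

```python
def parse_ulf(text):
    """Read a ULF string into nested lists (assumes balanced brackets)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        return items, pos + 1

    tree, _ = read(0)
    return tree


def has_counterfactual(tree):
    """True if some subexpression is headed by the cf operator."""
    if isinstance(tree, str):
        return False
    if tree and tree[0] == "cf":
        return True
    return any(has_counterfactual(child) for child in tree)


# Hypothetical annotation of "I were rich", for illustration only.
print(has_counterfactual(parse_ulf("(i.pro ((cf were.v) rich.a))")))
print(has_counterfactual(parse_ulf("(|Bob| ((pres own.v) (a.d dog.n)))")))
```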

Predicate and Sentence Modification in Depth
Here we ground the general description of ULF given so far with an in-depth discussion of how ULF handles modification. This is done with the purpose of demonstrating how the core syntax of ULF, its syntactic looseness, and semantic types fit together in practice. EL semantic types represent predicate modifiers as functions from monadic intensional predicates to monadic intensional predicates, i.e., $N \to N$, which enables handling of intersective, subsective, and intensional modifiers such as in the examples ((mod-n wooden.a) shoe.n), ((mod-n ice.n) pick.n), (fake.mod-n ruby.n), ((mod-a worldly.a) wise.a), (very.mod-a fit.a), (slyly.adv-a grin.v). Modifier extensions .mod-n and .mod-a respectively reflect the linguistic categories of noun-premodifying (attributive) adjectives and adjective-premodifying adverbs; correspondingly, operators mod-n and mod-a type-shift prenominal predicates to modifiers applicable to predicates of sorts .n and .a respectively. Modifier extension .adv-a reflects the linguistic category of VP adverbials, and operator adv-a creates such modifiers from predicates. Thus, "walk with Bob" is represented in raw and postprocessed ULF respectively as (walk.v (adv-a (with.p |Bob|))) and ((adv-a (with.p |Bob|)) walk.v). Adverbial modifiers of the sort .adv-a intuitively modify actions, experiences, or attributes, as distinct from events. Thus "He lifted the child easily" refers to an action that was easy for the agent, rather than to an easy event. Actions, experiences, and attributes in EL are individuals comprised of agent-episode pairs, and this allows modifiers of the sort .adv-a to express a constraint on both the agent and the episode it characterizes. As such, actions are not explicitly represented in ULF and are derived during deindexing when event variables are introduced.
A formula or nonatomic verbal predicate in ULF may contain sentential modifiers of type $(S \to 2) \to (S \to 2)$: .adv-s, .adv-e, and .adv-f. Again there are type-shifting operators that create these sorts of modifiers from monadic predicates. Ones of the sort .adv-s are usually modal (and thus opaque), e.g., perhaps.adv-s, (adv-s (without.p (a.d doubt.n))); however, negation is transparent in the usual sense: the truth value of a negated sentence depends only on the truth value of the unnegated sentence. Modifiers of sort .adv-e are transparent, typically implying temporal or locative constraints, e.g., today.adv-e, (adv-e (during.p (the.d drought.n))), (adv-e (in.p |Rome|)); these constraints are ultimately cashed out as predications about episodes characterized by the sentence being modified. (This is also true for the past and pres tense operators.) Similarly any modifier of sort .adv-f is transparent and implies the existence of a multi-episode (characterized by the sentence as a whole) whose temporally disjoint parts each have the same characterization (Hwang and Schubert, 1994); e.g., regularly.adv-f, (adv-f (at.p (three.d (plur time.n)))).
The earlier walk with Bob example shows how in ULF the operator and operand can be inferred from the constituent types. Consider the types for play.v and (adv-a (with.p (the.d dog.n))). Since they have types $N_V$ and $N_V \to N_V$, respectively, we can be certain that (adv-a (with.p (the.d dog.n))) is the operator while play.v is the operand.
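The operator/operand decision described for play.v and (adv-a (with.p (the.d dog.n))) can be sketched as a small type-directed check. This is an illustration, not the ULF type system: the `Fn` class and the `combine` function are hypothetical, types are compared by exact equality (no sortal subtyping), and if both orders happen to type-check the left constituent is taken as the operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fn:
    """A function type arg -> res; atomic types are plain strings."""
    arg: object
    res: object


def combine(left, right):
    """Decide which of two adjacent constituents is the operator.

    Returns (operator_side, result_type), or None when neither order
    type-checks, i.e. the composition is type-inconsistent.
    """
    if isinstance(left, Fn) and left.arg == right:
        return ("left", left.res)
    if isinstance(right, Fn) and right.arg == left:
        return ("right", right.res)
    return None


N_V = "N_V"                # verbal monadic predicate, e.g. play.v
ADV_A_PHRASE = Fn(N_V, N_V)  # e.g. (adv-a (with.p (the.d dog.n)))

# Raw ULF order: the predicate comes first, the modifier second.
print(combine(N_V, ADV_A_PHRASE))
# Postprocessed order: the modifier is in operator position.
print(combine(ADV_A_PHRASE, N_V))
```

In both orders the modifier is identified as the operator and the result is again of type `N_V`, which is why surface order can be preserved in raw ULF.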
In practice, we're able to drop the mod-a, mod-n, and nnp type-shifters during annotation since we can post-process them with the appropriate type-shifter to make the composition valid. We assume in these cases that the prefixed predicate is intended as the operator, which reflects a common pattern in English. Thus, "burning hot melting pot" would be hand-annotated as ((burning.a hot.a) (melting.n pot.n)), which would be post-processed to ((mod-n ((mod-a burning.a) hot.a)) ((mod-n melting.n) pot.n)). While the prefixed predicate modification allows us to formally model non-intersective modification, there are modification patterns in English that force an intersective interpretation, e.g., post-nominal modification and appositives, and we annotate them accordingly. "The buildings in the city" is annotated (the.d (n+preds (plur building.n) (in.p (the.d city.n)))) which is equivalent (via the n+preds macro) to (the.d (λx ((x (plur building.n)) and.cc (x (in.p (the.d city.n)))))).
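The post-processing of dropped mod-n and mod-a type-shifters can be sketched as follows. This is a simplified illustration rather than the project's post-processor: the helper names are hypothetical, only two-element expressions whose parts are nominal (.n) or adjectival (.a) predicates are handled, the nnp shifter is not covered, and wrappers such as plur are not looked through.

```python
def parse_ulf(text):
    """Read a ULF string into nested lists (assumes balanced brackets)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        return items, pos + 1

    tree, _ = read(0)
    return tree


def to_str(tree):
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(to_str(child) for child in tree) + ")"


def is_modifier(e):
    """True for atoms tagged .mod-n/.mod-a and for (mod-n X)/(mod-a X)."""
    if isinstance(e, str):
        return e.endswith((".mod-n", ".mod-a"))
    return len(e) == 2 and e[0] in ("mod-n", "mod-a")


def category(e):
    """Return 'n' or 'a' for nominal/adjectival predicates, else None."""
    if isinstance(e, str):
        tag = e.rsplit(".", 1)[1] if "." in e else None
        return tag if tag in ("n", "a") else None
    if len(e) == 2 and is_modifier(e[0]):
        return category(e[1])  # a modified predicate keeps its head's sort
    return None


def insert_shifters(e):
    """Wrap a prefixed .n/.a predicate in mod-n or mod-a, bottom-up."""
    if isinstance(e, str):
        return e
    e = [insert_shifters(child) for child in e]
    if len(e) == 2 and category(e[0]) and category(e[1]):
        # The shifter is chosen by the sort of the predicate being modified.
        return [["mod-" + category(e[1]), e[0]], e[1]]
    return e


raw = parse_ulf("((burning.a hot.a) (melting.n pot.n))")
print(to_str(insert_shifters(raw)))
```

Expressions that already contain explicit shifters, such as ((mod-n wooden.a) shoe.n), pass through unchanged, so the rewrite can be applied repeatedly without effect.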

Annotating a ULF Corpus
The syntactic relaxations in ULF and the annotation environment work hand-in-hand to enable quick and consistent annotations. ULF syntax relaxations are designed to: (1) Preserve surface word order and (2) Make the annotations match linguistic intuitions more closely. As a result, annotating a sentence with its ULF interpretation boils down to marking the words with their semantic types, bracketing the sentence according to the operator-operand relations, then introducing macros and logical operators as necessary to make the ULF type-consistent. The annotation environment is designed to assist in this process by improving the readability of long ULFs and catching mistakes that are easy to miss. The environment is shared across annotators with certainty marking so that more experienced annotators can correct and give feedback to trainees. This streamlines the training process and minimizes the mistakes entering into the corpus. Here are the core annotator features.
1. Syntax and bracket highlighting. Highlights the cursor location and the closing bracket, unmatched brackets and quotes, operator keywords, and badly placed operators.
2. Sanity checker. Alerts the annotator to invalid type compositions and suggests corrections for common mistakes.
3. Certainty marking. Annotators can mark whether they are certain of an annotation's correctness so that partial progress can be made while preserving the integrity of the corpus.
4. Sentence-specific comments. Annotators can record their thoughts on partially complete annotations so that others can pick up where they left off.
The ULF type system makes it possible to build a robust sanity checker for the annotator. The type system severely restricts the space of valid ULF formulas and usually when an annotator makes an error in annotation, it leads to a type inconsistency.
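One of the simplest checks such an environment needs is locating unmatched brackets. The sketch below illustrates that single check; it is not the project's annotation tool, and the function name `unmatched_brackets` is hypothetical. Type-composition checking, which the sanity checker also performs, is not shown.

```python
def unmatched_brackets(text):
    """Return the character indices of unmatched '(' and ')' in a ULF string."""
    open_positions = []  # stack of indices of '(' still waiting for a ')'
    unmatched = []
    for i, ch in enumerate(text):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if open_positions:
                open_positions.pop()
            else:
                unmatched.append(i)  # a ')' with no opener
    # Any '(' left on the stack was never closed.
    return sorted(unmatched + open_positions)


print(unmatched_brackets("(|Bob| ((pres own.v) (a.d dog.n))"))
```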

Experimental Results and Current Progress
We ran a timing study and an interannotator agreement (IA) study to quantify the efficacy of the presented annotation framework. We timed 80 annotations of the Tatoeba dataset and found the average annotation speed to be 8 min/sent with 4 min/sent among the two experts and 11 min/sent among the three trainees that participated. AMRs reportedly took on average 10 min/sent (Hermjakob, 2013). In the IA study five annotators each annotated between 18 and 23 sentences from the same set of 23 sentences, marking their certainty of the annotations as they normally would. The sentences were sampled from the four datasets listed in Table 1. The mean and standard deviation of sentence length were 15.3 words and 10.8 words, respectively. We computed a similarity score between two annotations using EL-smatch (Kim and Schubert, 2016), a generalization of smatch which handles non-atomic operators. The document-level EL-smatch score between all annotated sentence pairs was 0.70. When we restricted the analysis to just annotations that were marked certain, the agreement rose to 0.78. The complete pairwise scores are shown in Table 2. Notice that annotators 1, 2, and 3 had very high agreement with each other. If we restrict the agreement to just those three annotators, the full and certain-subset scores are 0.79 and 0.88, respectively. Out of all the annotations, less than a third were marked as uncertain or incomplete. AMR annotations reportedly have annotator vs consensus IA of 0.83 for newswire and 0.79 for web text (Tsialos, 2015). This study also demonstrates that the certainty marking indeed reflects the quality of the annotation, thus performing the role we intended. Also, based on the high agreement between annotators 1, 2, and 3, we can conclude that consistent ULF annotations across multiple annotators are possible.
However, the lower scores of annotators 4 and 5, even in annotations marked as certain, indicate room for improvement in the annotation guidelines and training of some annotators.
We have so far collected 927 certain annotations and have 1,580 in total. The full annotation breakdown is in Table 1. We started with the English portion of the Tatoeba dataset (https://tatoeba.org/eng/), a crowd-sourced translation dataset. This source tends to have shorter sentences, but they are more varied in topic and form. We then added text from Project Gutenberg (http://gutenberg.org), the UIUC Question Classification dataset (Li and Roth, 2002), and the Discourse Graphbank (Wolf, 2005). Preliminary parsing experiments on a small dataset (900 sentences) show promising results and we expect to be able to build an accurate parser with a moderately-sized dataset and representation-specific engineering (Kim, 2019).

Related Work
A notable development in general representations of semantic content has been the design of AMR (Banarescu et al., 2013) followed by numerous research studies on generating AMR from English and on using it for downstream tasks. AMR is intended as a kind of intuitive normal form for the relational context of English sentences in order to assist in machine translation. Given this goal, AMR deliberately neglected issues such as articles, tense, the distinction between real and hypothetical entities, and nonintersective modification. In the context of inference, this risks making false conclusions such as that a "big ant" is bigger than a "small elephant".
Still, this development was an inspiration to us in terms of both the quest for broad coverage and methods of learning and evaluating semantic parsers. There has also been much activity in developing semantic parsers that derive logical representations, raising the possibility of making inferences with those representations (Artzi et al., 2015; Artzi and Zettlemoyer, 2013; Howard et al., 2014; Kate and Mooney, 2006; Konstas et al., 2017; Kwiatkowski et al., 2011; Liang et al., 2011; Poon, 2013; Popescu et al., 2004; Tellex et al., 2011). The techniques and formalisms employed are interesting (e.g., learning of CCG grammars that generate λ-calculus expressions), but the targeted tasks have generally been question-answering in domains consisting of numerous monadic and dyadic ground facts ("triples"), or simple robotic or human action descriptions. Noteworthy examples of formal logic-based approaches not targeting specific applications are Bos' (2008) and Draiccio et al.'s (2013), whose hand-built semantic parsers respectively generate FOL formulas and OWL-DL expressions. But these representations preclude generalized quantifiers, modification, reification, attitudes, etc. We are not aware of any work on inference generation of the type ULF targets, based on these projects. A couple of yet-unmentioned but notable semantic annotation projects are the Groningen Meaning Bank (Bos et al., 2017), with discourse representation structure (DRS) annotations (Kamp, 1981), and the Redwoods treebank (Flickinger et al., 2012; Oepen et al., 2002), with Minimal Recursion Semantics (MRS) (Copestake et al., 2005) annotations. DRSs have the same representational limitations as Bos' (2008) system. MRS is descriptively powerful and linguistically motivated, with significant resources including a hand-built grammar, multiple parsers, and a large annotated dataset (Bub et al., 1997; Callmeier, 2001).
Given that MRS is an object-language agnostic, meta-level semantic representation, an inference system cannot be built directly for MRS based on model-theoretic notions of interpretation, truth, satisfaction, and entailment. However, the lack of an object-language in MRS leaves open the possibility of forming a correspondence between MRS and ULF that fully respects both formalisms. Finally, the use of unscoped LFs in a rule-to-rule framework was first introduced by Schubert and Pelletier (1982) and a similar approach to scope ambiguity was taken by the Core Language Engine (Alshawi and van Eijck, 1989).

Conclusion & Future Work
ULF, the underspecified initial representation for EL described in this document, captures a subset of the semantic information of EL that allows it to be annotated reliably, participate in the complete resolution to EL, and form the basis for structural inferences that are important for language understanding tasks. We will continue this work by expanding the corpus of ULF annotations and training a statistical parser over that corpus. Automatic ULF parses could then be used as the backbone for a complete EL parser or as the core representation for NLP tasks that require sentence-level formal semantic information or structural inferences.