Bridging Resolution: A Survey of the State of the Art

Bridging reference resolution is an anaphora resolution task that is arguably more challenging and less studied than entity coreference resolution. Given that significant progress has been made on coreference resolution in recent years, we believe that bridging resolution will receive increasing attention in the NLP community. Nevertheless, progress on bridging resolution is currently hampered in part by the scarcity of large annotated corpora for model training as well as the lack of standardized evaluation protocols. This paper presents a survey of the current state of research on bridging reference resolution and discusses future research directions.


Introduction
Bridging resolution is an anaphora resolution task that involves identifying and resolving bridging/associative anaphors, which are anaphoric references to non-identical associated antecedents. To better understand the difficulty of the task, consider the following sentence:
Even if baseball triggers losses at CBS -- and he doesn't think it will -- "I'd rather see the games on our air than on NBC and ABC," he says.
In this example, a bridging link exists between the anaphor the games and its antecedent baseball, as the definite description cannot be interpreted correctly unless it is associated with baseball.
Bridging anaphora resolution is arguably more difficult than entity coreference resolution, the task of determining which entity mentions in a text refer to the same real-world entity. For entity coreference resolution, there are well-defined linguistic constraints at the grammatical (e.g., gender and number agreement), syntactic (e.g., binding theory), semantic (e.g., semantic class agreement), and discourse (e.g., centering) levels. Oftentimes, the antecedent of an anaphor can be identified by comparing its lexical similarity with the anaphor. In contrast, there are typically no clear syntactic or other surface clues for identifying the antecedent of a bridging anaphor, and resolution often requires the use of context as well as commonsense inference. Furthermore, while antecedents in entity coreference are noun phrases (NPs), antecedents in bridging can also be non-NPs such as verb phrases (VPs) or clauses, which considerably increases the number of candidate antecedents for each anaphor.
Bridging resolution is comparatively less studied than entity coreference resolution. Progress on bridging resolution is currently hampered in part by the scarcity of large annotated corpora for model training as well as the lack of standardized evaluation protocols. More specifically, while a few bridging corpora are used more extensively than the others for evaluation purposes, many of them do not have standard train-test partitions. Moreover, these corpora were annotated with somewhat different definitions of bridging, so good performance on one corpus does not necessarily translate to good performance on another. Worse still, resolvers have been evaluated under different settings. For instance, different researchers employ different, sometimes undocumented strategies for filtering bridging anaphors and candidate antecedents, while others employ gold annotations (e.g., syntactic parses, coreference) for feature computation. Above all, many of the implementations of bridging resolvers have not been made publicly available. The lack of a standard evaluation protocol has made it somewhat difficult to track research progress on this task. To some extent, this is reminiscent of the state of affairs with entity coreference research prior to the CoNLL 2011 and 2012 shared tasks on entity coreference resolution. As significant advances have been made on entity coreference resolution, we believe that bridging reference resolution will gain increasing attention in the years to come. Our goal in this paper is to provide a timely survey of the current state of research on bridging anaphora resolution.
This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.

Task Definition: Some Historical Perspectives
The definition of bridging has evolved over the years, particularly with respect to (1) the types of relations bridging should cover, and (2) the types of linguistic expressions that can serve as bridging anaphors. In this section, we take a closer look at these two issues.
First, what types of relations should bridging cover? As a linguistic phenomenon, bridging has been studied extensively by linguists (e.g., Clark (1975), Prince (1981), Gundel et al. (1993)). Clark (1975), who started this area of research, introduced a broad concept of bridging that includes coreference (i.e., the identity relation). Coreference, however, has gradually been excluded from bridging over time. For instance, while some early studies still included as bridging the difficult cases of coreference where two coreferent mentions do not share the same head (Poesio and Vieira, 1998;Vieira and Poesio, 2000;Bunescu, 2003), most of the recent studies focus on non-identity cases of bridging, which is the closest to Hawkins's (1978) concept of associative anaphora. Among the non-identity relations, bridging covers various types of semantic relations. While early studies typically restrict themselves to predefined relations such as part-of, subset, set membership, and possession relations (Poesio and Vieira, 1998;Poesio et al., 2004b), recent studies claim that bridging is a diverse phenomenon that cannot be simply captured with a limited set of predefined relations (Markert et al., 2012;Rösiger, 2018a).
Second, what types of linguistic expressions can serve as bridging anaphors? Many traditional studies (Hawkins, 1978;Poesio and Vieira, 1998;Lassalle and Denis, 2011;Rösiger, 2016) limited bridging anaphors to definite expressions, excluding indefinite expressions since they generally introduce new information that can be interpreted without the discourse context. However, Löbner (1998) claimed that bridging anaphors can also be indefinite because these indefinite expressions can have semantic relations with preceding expressions. Recent studies therefore allow both definite and indefinite expressions to serve as bridging anaphors (Poesio and Artstein, 2008;Markert et al., 2012;Rösiger, 2018a).

Corpora
This section provides an overview of existing corpora used for bridging research, with a focus on four widely-used English corpora, namely ISNotes (composed of 50 WSJ articles in OntoNotes) (Markert et al., 2012), BASHI (The Bridging Anaphors Hand-annotated Inventory, composed of another 50 WSJ articles in OntoNotes) (Rösiger, 2018a), ARRAU (composed of articles from four domains, RST, GNOME, PEAR, and TRAINS) (Poesio and Artstein, 2008;Uryupina et al., 2020), and SciCorp (The Scientific Corpus, composed of scientific articles from computational linguistics and genetics) (Rösiger, 2016). Table 1 compares these corpora along five dimensions: (1) the domain type, (2) the size (in terms of the number of documents, tokens, and mentions), (3) the number of bridging anaphors, (4) the types of anaphor, and (5) the types of antecedent. While early corpora limited anaphors to definite NPs and predefined relations (Poesio and Vieira, 1998;Poesio et al., 2004b), many of these newer corpora do not. For instance, ISNotes and BASHI include both definite and indefinite expressions as anaphors and both entity and event mentions as antecedents; moreover, they do not restrict bridging relations to predefined relations. Also, all of these corpora contain coreference in addition to bridging annotations. In addition to the differences shown in Table 1, there are several notable differences among these corpora:
Referential vs. lexical bridging. The notions of referential bridging and lexical bridging have been introduced as a way to explain a key difference between ARRAU and the other corpora. Referential bridging refers to the cases in which the bridging anaphor cannot be interpreted without the antecedent (e.g., the window in Tim walked into the room. The window was broken), whereas lexical bridging refers to the cases where the reference can be interpreted independently of the antecedent (e.g., Tokyo in The capital of Japan is Tokyo). While ISNotes, BASHI and SciCorp are composed of referential bridging references, ARRAU contains both referential and lexical bridging references, with lexical bridging references being the majority.
Information status. Information status (IS), a linguistic notion that is related to bridging, describes the extent to which a discourse entity is available to the hearer/reader. At a coarse level, a discourse entity's IS is (1) OLD to the hearer if it is known to the hearer and has previously been referred to; (2) NEW if it is unknown to her and has not been previously referred to; and (3) MEDIATED if it is newly mentioned but the hearer can infer its identity from a previously-mentioned entity or world knowledge. By definition, bridging is a subcategory of MEDIATED. While BASHI does not contain IS annotations, ISNotes has eight IS classes ("new", "old", and six subclasses of mediated (one of them is bridging)), ARRAU has three ("new", "old", and non-referring), and SciCorp has eight (one of them is bridging).
Predefined relations. Some corpora provide the semantic relation type of each bridging link. In ISNotes, a link is labeled with one of the following relation types: part-of/attribute-of, set, and other (including encyclopedic and frame relations). The RST domain of ARRAU also has annotations of predefined relations, which include possessive, subset, element, comparative (labeled as "other"), everything else (labeled as "underspecified"), as well as the inverse of each of these relation types.
Comparative anaphora. A comparative anaphor is a non-identity anaphor that is compared to another mention (Modjeska, 2003). In ISNotes, comparative anaphors are excluded from the bridging category because such anaphors often have surface indicators, containing modifiers such as "other" and "another" (Markert et al., 2012). In contrast, BASHI and ARRAU consider them as a subcategory of bridging.
Parallel bridging corpora are also available. For instance, Copenhagen Dependency Treebank is a parallel corpus involving Danish, English, Italian, German, and Spanish (Korzen and Buch-kromann, 2011), and CorefPro is a parallel corpus involving German, English, and Russian (Grishina, 2016).
While not widely used, GUM is an ever-expanding English corpus annotated with bridging links by students at Georgetown University (Zeldes, 2017).

Evaluation Issues
As mentioned before, an issue surrounding bridging resolution research concerns the lack of a standardized evaluation protocol. In this section, we take a look at current evaluation practices.
Evaluation settings. Bridging resolvers operate in one of three settings.
In end-to-end bridging resolution, a system is given a raw document as input. The goal is to identify the bridging anaphors (a subtask known as bridging recognition) and resolve each of them to its antecedent. Since this setting is considered very challenging, none of the existing bridging resolvers are evaluated in an end-to-end fashion.
In full bridging resolution, a system is given as input not only a document but also the gold (i.e., hand-annotated) mentions in the document. The goal is to identify the subset of the gold mentions that are bridging anaphors and resolve them to their antecedents, which are also chosen from the gold mentions. In principle, gold mentions are mentions that can participate in a bridging relation. In practice, the set of gold mentions is typically much smaller than the set of possible mentions. Since full bridging resolution constrains the selection of anaphors and antecedents to those that are gold mentions, it is less challenging than the end-to-end setting. Gold mentions are defined slightly differently in different corpora. In ISNotes, gold mentions include NPs, possessive nouns/pronouns, premodifiers, and verbs. In BASHI, gold mentions are assumed to be all and only those NPs that can be extracted from gold parse trees. In ARRAU, gold mentions include all NPs, possessive pronouns, and a subset of premodifiers. In SciCorp, gold mentions are definite NPs, which include definite descriptions, named entities, and pronouns.
Finally, in bridging resolution, a system is given as input not only a document and the gold mentions it contains, but also the gold anaphors. The goal is to resolve each gold anaphor to its antecedent, which is chosen from the given set of gold mentions. This setting is the least challenging of the three, as it focuses solely on resolution and does not require bridging anaphors to be identified.
Evaluation metrics. For full bridging resolution, results are reported for both recognition and resolution in terms of precision, recall, and F-score. For recognition, recall is defined as the fraction of gold anaphors that are correctly identified, whereas precision is defined as the fraction of anaphors identified by the system that are correct. For resolution, recall and precision can be defined in a similar fashion. For bridging resolution, since gold anaphors are given, results are reported in terms of resolution accuracy, which is the fraction of gold anaphors that are correctly resolved.
Entity- vs. mention-based evaluation. A resolver needs to resolve an anaphor to an antecedent chosen from a set of candidate antecedents. For (full) bridging resolution, the candidate antecedents can simply be taken to be the set of gold mentions that appear in all of the previous sentences or a fixed sentence window (Poesio et al., 2004a). Slightly more sophisticated candidate selection strategies have been employed. For instance, the window size can be tuned for each rule in rule-based systems (Hou et al., 2014;Rösiger, 2018b). The top k salient mentions can be used in addition to those from the fixed window (Hou et al., 2013b;Hou, 2018b;Hou, 2018a). Moreover, Hou et al. (2013b) and Hou et al. (2018) have proposed an entity-based evaluation method where an anaphor is resolved to a preceding entity rather than a preceding mention. The idea is to first use gold coreference information to group the candidate antecedents of an anaphor into coreference clusters, and then extract cluster/entity-level features for encoding each of the resulting clusters/entities. The goal of the resolver is to resolve the anaphor to one of these clusters/entities based on the extracted features. Note that the resolution task is simplified when an anaphor is resolved to a cluster/entity as opposed to a candidate antecedent, because the number of clusters/entities is smaller than the number of candidate antecedents.
Moreover, the use of gold coreference chains to produce entities and extract cluster-level features also makes it unfair to compare these entity-based evaluation results against other results.
Anaphor filtering. Several kinds of bridging anaphors are excluded from evaluation. One filtering rule says that any bridging anaphor whose closest antecedent is coreferent with it should be excluded from evaluation (Hou et al., 2014). This is understandable as these anaphors should be resolved by a coreference resolver instead. Another rule excludes a bridging anaphor from consideration as long as one of its antecedents is coreferential with it (Rösiger, 2018b). We believe that this rule is rather unmotivated, and may remove bridging links that cannot otherwise be recovered from other mentions.
Some rules filter bridging anaphors that are "problematic". Rösiger (2018b) enumerates exactly what is being filtered (i.e., anaphors with multiple antecedents, antecedents spanning more than one sentence, empty antecedents and discontinuous markables). In contrast, Hou (2018a) simply says that "a few problematic cases on each corpus" are filtered out without even mentioning why they are problematic. The lack of such details may make it difficult to replicate her results.
Antecedent filtering. Besides anaphor filtering, there have also been attempts to filter candidate antecedents prior to resolution in order to improve resolution performance. For instance, Hou et al. (2013b) exclude candidate antecedents that are coreferent with a bridging anaphor. The motivation is that by definition, these candidate antecedents cannot serve as the antecedents of a bridging anaphor. However, to ensure a fair comparison between systems that employ filtering and those that do not, we believe that predicted, rather than gold, coreference information should be used in the filtering process.
Data splits. While ARRAU has a standard train-dev-test split, the other corpora do not. In the absence of a standard data split, resolvers are evaluated via k-fold cross validation, which makes a head-to-head comparison of their results difficult.
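To make the metrics described in this section concrete, the following is a minimal sketch of how they could be computed. It assumes that each system output and each gold annotation is a mapping from an anaphor identifier to a single antecedent identifier, and that resolution precision and recall count a link as correct only when both the anaphor and its antecedent match the gold link; the function names and data layout are illustrative assumptions, not taken from any of the surveyed systems.

```python
from typing import Dict, Hashable, Tuple

# Assumed layout: anaphor id -> antecedent id (one antecedent per anaphor).
Links = Dict[Hashable, Hashable]


def prf(num_correct: int, num_predicted: int, num_gold: int) -> Tuple[float, float, float]:
    """Precision, recall, and F-score from raw counts (0.0 when undefined)."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def full_bridging_scores(gold: Links, system: Links) -> Dict[str, Tuple[float, float, float]]:
    """Recognition and resolution scores for the full bridging resolution setting."""
    # Recognition: a predicted anaphor is correct if it is a gold anaphor.
    recognized = sum(1 for anaphor in system if anaphor in gold)
    # Resolution (assumed definition): the antecedent must match as well.
    resolved = sum(
        1 for anaphor, antecedent in system.items()
        if anaphor in gold and gold[anaphor] == antecedent
    )
    return {
        "recognition": prf(recognized, len(system), len(gold)),
        "resolution": prf(resolved, len(system), len(gold)),
    }


def resolution_accuracy(gold: Links, system: Links) -> float:
    """Fraction of gold anaphors resolved correctly (gold anaphors are given)."""
    if not gold:
        return 0.0
    correct = sum(
        1 for anaphor, antecedent in gold.items()
        if anaphor in system and system[anaphor] == antecedent
    )
    return correct / len(gold)
```

Under entity-based evaluation, the antecedent identifiers would be cluster identifiers rather than mention identifiers, which is one reason why scores from the two evaluation styles should not be compared directly.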

Rule-based Approaches
Virtually all early rule-based resolvers operate in the least challenging setting, i.e., bridging resolution. One early resolver uses a heuristic to resolve bridging anaphors based on synonymy, hyponymy, and meronymy relations from WordNet 1.6. Poesio et al. (1997) improve this system by limiting the use of some WordNet relations and improving the antecedent search strategy. For further improvement of this system, Poesio et al. (2002) complement WordNet coverage with another lexical resource of meronymy relations, which is acquired by querying syntactic patterns such as NP of NP and NP's NP in the British National Corpus. To obtain a larger resource for acquiring semantic relations, the Web has also been used to extract meronymy and hyponymy relations.
Following an early rule-based bridging system (Vieira and Poesio, 2000), all recently-developed rule-based bridging systems are composed of rules that perform recognition and resolution at the same time. For ARRAU, one ruleset keeps the rules that still capture common patterns appearing in both ISNotes and ARRAU, and adds eight rules that are designed specifically for ARRAU. One disadvantage of rule-based bridging resolvers, which is also true for rule-based systems in general, is that new rules may need to be designed for a new corpus annotated with a different scheme. Table 2 shows rules designed for full bridging resolution in ISNotes, most of which are due to Hou et al. (2014). The rules are sorted by precision and should be applied in the order in which they are presented in the table. Each rule is composed of two conditions: one on the anaphor and the other on the antecedent. If the two mentions satisfy these conditions, the rule will posit a bridging link between them. In the table, each rule is expressed in terms of its name, the condition on the anaphor, the condition on the antecedent, the motivation behind its design, and its recognition and resolution recall and precision on ISNotes (I), BASHI (B), and ARRAU RST (A). As we can see, these are mostly low-recall rules: many bridging anaphors cannot be recognized or resolved using these rules. Moreover, each rule performs differently (in terms of recognition and resolution) on different corpora, meaning that these rules, which are designed for ISNotes, do not generalize across corpora.
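The way such a ruleset operates can be illustrated with a small sketch. It assumes that mentions are listed in document order, that rules are already sorted by precision, that each rule searches backwards within its own sentence window, and that an anaphor linked by an earlier rule is not revisited. The single rule shown is a toy example built on made-up word lists; it is not one of the rules in Table 2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    mention_id: str
    head: str            # lower-cased head noun
    sentence: int        # index of the sentence containing the mention
    definite: bool = False


@dataclass(frozen=True)
class Rule:
    name: str
    anaphor_ok: Callable[[Mention], bool]      # condition on the anaphor
    antecedent_ok: Callable[[Mention], bool]   # condition on the antecedent
    window: int                                # max sentence distance (per-rule)


def apply_rules(mentions: List[Mention], rules: List[Rule]) -> Dict[str, str]:
    """Apply rules in the given order; return anaphor id -> antecedent id."""
    links: Dict[str, str] = {}
    for rule in rules:  # assumed to be sorted by precision, highest first
        for i, anaphor in enumerate(mentions):
            if anaphor.mention_id in links or not rule.anaphor_ok(anaphor):
                continue
            # Scan preceding mentions from nearest to farthest.
            for candidate in reversed(mentions[:i]):
                if anaphor.sentence - candidate.sentence > rule.window:
                    break
                if rule.antecedent_ok(candidate):
                    links[anaphor.mention_id] = candidate.mention_id
                    break
    return links


# Toy rule (hypothetical word lists): a definite "part" noun is linked to
# the nearest preceding "building" noun within two sentences.
PARTS = {"window", "door", "roof"}
BUILDINGS = {"room", "house", "building"}
toy_rule = Rule(
    name="toy-building-part",
    anaphor_ok=lambda m: m.definite and m.head in PARTS,
    antecedent_ok=lambda m: m.head in BUILDINGS,
    window=2,
)

# "Tim walked into the room. The window was broken."
example_mentions = [
    Mention("m1", "tim", 0),
    Mention("m2", "room", 0, definite=True),
    Mention("m3", "window", 1, definite=True),
]
example_links = apply_rules(example_mentions, [toy_rule])
```

Because a rule fires only when both conditions hold, recognition and resolution happen in the same step: a mention becomes a bridging anaphor exactly when some rule finds an antecedent for it.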

Learning-based Approaches
We divide existing learning-based approaches into three categories.
Feature-based approaches. In these approaches, a pairwise classifier, known as the mention-pair model in the coreference resolution literature (Soon et al., 2001;Ng and Cardie, 2002), is trained to determine whether two mentions have a bridging relation. Each training instance therefore corresponds to two mentions, one of which is a bridging anaphor and the other of which is a candidate antecedent. If the candidate antecedent is the correct antecedent of the anaphor, the instance is labeled as POSITIVE; otherwise, it is labeled as NEGATIVE. Table 3 shows the list of features that have been used to train the mention-pair model.
Table 2: Rules for resolving bridging anaphors in ISNotes. The first eight rules are proposed by Hou et al. (2014); the last rule was proposed separately. 'I', 'B', and 'A' refer to ISNotes, BASHI, and ARRAU RST respectively.
The mention-pair model works well if a resolver is given gold anaphors as input. To perform full bridging resolution, in which gold mentions are given, we need to first train a "recognition" classifier to identify the bridging anaphors from the gold mentions and then pass the resulting anaphors to the mention-pair model for resolution. While in principle a binary classifier can be trained to determine whether a gold mention is an anaphor or not, previous work has trained classifiers for determining the IS of a mention and assumed that those mentions that are classified as "bridging" are bridging anaphors. Table 4 enumerates the features that have been proposed to train a classifier for determining the IS of a mention (Nissim, 2006;Rahman and Ng, 2011;Cahill and Riester, 2012;Markert et al., 2012;Rahman and Ng, 2012a;Hou et al., 2013a;Hou, 2016;Hou et al., 2018).
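The two-stage setup described above can be sketched as follows. The sketch assumes that the IS classifier and the mention-pair scorer are supplied as plain functions and that the highest-scoring candidate is taken as the antecedent; these names and the selection strategy are illustrative assumptions, since the surveyed systems differ in how they pick an antecedent from the pairwise decisions.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Mention = Hashable


def make_training_instances(
    gold_links: Dict[Mention, Mention],
    candidates: Callable[[Mention], Iterable[Mention]],
) -> List[Tuple[Mention, Mention, str]]:
    """One instance per (anaphor, candidate antecedent) pair."""
    instances = []
    for anaphor, gold_antecedent in gold_links.items():
        for candidate in candidates(anaphor):
            label = "POSITIVE" if candidate == gold_antecedent else "NEGATIVE"
            instances.append((anaphor, candidate, label))
    return instances


def resolve_full(
    mentions: Iterable[Mention],
    predict_is: Callable[[Mention], str],            # hypothetical IS classifier
    pair_score: Callable[[Mention, Mention], float],  # hypothetical pair scorer
    candidates: Callable[[Mention], Iterable[Mention]],
) -> Dict[Mention, Mention]:
    """Recognition via IS classification, then resolution via pair scores."""
    links: Dict[Mention, Mention] = {}
    for mention in mentions:
        # Stage 1: only mentions whose predicted IS is "bridging" are anaphors.
        if predict_is(mention) != "bridging":
            continue
        # Stage 2: choose the candidate with the highest pair score (assumption).
        pool = list(candidates(mention))
        if pool:
            links[mention] = max(pool, key=lambda c: pair_score(mention, c))
    return links
```

A consequence of this design is that recognition errors propagate: an anaphor missed by the IS classifier can never be resolved, no matter how good the mention-pair model is.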

Table 3: Features used to train the mention-pair model.

Syntactic features
Co-argument: whether m_i and m_j are the subject and the object of the same verb, respectively (Hou et al., 2013b)
Parallel structure: whether m_i has the same syntactic role and is in the same sentence (but not the same clause) as m_j (Hou et al., 2018)
Closest modifier: whether m_i's syntactic head is a modifier of one or more of the occurrences of the lemma of m_j's head in the associated text (Hou et al., 2018)

Semantic features
WordNet query: whether m_i and m_j have a "part-of" relation in WordNet (Hou et al., 2013b)
Google distance: number of hits of the query "the X of the Y" returned by Google, where X is m_j's head and Y is m_i's head (Poesio et al., 2004a)
WordNet distance: (the inverse value of) the shortest path length between m_i's head and m_j's head among all synset combinations (Poesio et al., 2004a;Hou et al., 2018)
Verb pattern (relative): the semantic compatibility (expressed in PMI) between m_i and m_j's governing verb (Hou et al., 2018)
Verb pattern (top): whether m_i is the candidate antecedent that has the highest semantic compatibility with m_j's governing verb (Hou et al., 2013b)
Preposition pattern (relative): hit count (converted into the Dunning Root Log Likelihood association measure) obtained by querying the pattern X prep Y, where X is m_j, Y is m_i, and prep is one of the three prepositions most frequently associated with X (Hou et al., 2013b)
Preposition pattern (top): whether m_i is the candidate antecedent that has the highest Dunning Root Log Likelihood association measure with m_j using the aforementioned preposition patterns

Embedding approaches. Hou (2018b) observes that commonly used word representations such as GloVe (Pennington et al., 2014) capture "genuine" similarity and relatedness, but bridging resolution requires lexical association knowledge instead of semantic similarity information between synonyms or hypernyms. This motivates her to train task-specific embeddings for bridging resolution. To do so, she first observes that the prepositional (i.e., X of Y) and possessive structures (i.e., Y's X) of NPs encode a variety of bridging relations between anaphors and their antecedents. For example, the window of the room implies a part-of relation between the window and the room, and in Japan's prime minister, there is a bridging relation between Japan and prime minister. Then she extracts noun pairs involved in these syntactic structures from the parsed Gigaword corpus and uses them as distant supervision signals to train an embedding model, embeddings_PP. The resulting embeddings can be used to select an antecedent for a bridging anaphor by calculating the vector similarity between the anaphor's head and a candidate antecedent's head. Moreover, she combines embeddings_PP, which covers only a subset of nouns, and the GloVe embeddings so that both non-nouns and additional nouns are covered (Hou, 2018a).
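The antecedent selection step of the embedding approach can be illustrated with a short sketch. It assumes cosine similarity as the vector similarity measure and uses tiny made-up vectors in place of trained embeddings; the function names and the handling of out-of-vocabulary heads are illustrative assumptions rather than details of the published system.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_antecedent(
    anaphor_head: str,
    candidate_heads: List[str],
    embeddings: Dict[str, Vector],
) -> Optional[str]:
    """Return the candidate head most similar to the anaphor head, if any."""
    if anaphor_head not in embeddings:
        return None
    # Heads without a vector are skipped (assumed behaviour).
    scored = [
        (cosine(embeddings[anaphor_head], embeddings[head]), head)
        for head in candidate_heads
        if head in embeddings
    ]
    return max(scored)[1] if scored else None


# Made-up 3-dimensional vectors, for illustration only.
toy_embeddings = {
    "window": [0.9, 0.1, 0.0],
    "room": [0.8, 0.2, 0.1],
    "tim": [0.0, 0.1, 0.9],
}
best = select_antecedent("window", ["tim", "room"], toy_embeddings)
```

The out-of-vocabulary case handled here is the kind of coverage gap that motivates combining the task-specific embeddings with GloVe.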
Neural models. Yu and Poesio (2020) propose the first neural model for full bridging resolution, leveraging a span-based neural model originally developed for entity coreference resolution by Kantor and Globerson (2019). Kantor and Globerson's span-based model is a mention-ranking model (Denis and Baldridge, 2008), meaning that it is trained to rank the candidate antecedents of an anaphor so that the correct antecedent has the highest rank. Key to the success of this and other span-based coreference models is their ability to learn text spans corresponding to entity mentions as well as their representations. Yu and Poesio make two modifications to Kantor and Globerson's model. First, they provide gold mentions as input to the model, meaning that the model needs to learn the span representations but not the span boundaries. Second, and more importantly, they propose to train the model to perform coreference and bridging in a multi-task learning (MTL) framework. In this framework, the span representation layer is shared by the two tasks, so that information learned from one task can be utilized when learning the other task. Unlike feature-based approaches, where feature engineering plays a critical role in performance, this neural model employs only two features, the length of a mention and mention-pair distance.
More recently, Hou (2020) has proposed a neural approach to bridging resolution based on question answering (QA). Given a gold anaphor, the idea is to (1) create a question from the anaphor in the form of "anaphor of what?", (2) create candidate answers from the candidate antecedents (i.e., the preceding mentions that appear in a fixed sentence window), and (3) use a BERT-based QA system pretrained on the SQuAD corpus (Joshi et al., 2020) to choose the most probable answer (i.e., the antecedent). An appealing aspect of this approach is that it does not require any gold or system mention information as the antecedent candidates.
Hou further hypothesizes that the results could be improved if the model were pretrained on a bridging corpus rather than a QA corpus. However, as mentioned before, all existing bridging corpora are too small to train an effective neural model. To address this problem, Hou employs a distant supervision method (see the embedding approaches above) to generate an automatically labeled bridging corpus, and subsequently shows that the model pretrained on this bridging corpus offers better performance than the one pretrained on SQuAD.
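As a rough illustration of the QA formulation, the sketch below builds the question and the context for one anaphor. It assumes that the document is already split into sentences, that the context consists of the anaphor's sentence plus a fixed number of preceding sentences, and that the question follows the "anaphor of what?" template; the function name and the default window size are hypothetical, and the QA model itself is left out.

```python
from typing import List, Tuple


def build_qa_input(
    sentences: List[str],
    anaphor_sentence: int,
    anaphor_text: str,
    window: int = 2,
) -> Tuple[str, str]:
    """Return (question, context) for a single bridging anaphor."""
    # Context: the anaphor's sentence and up to `window` preceding sentences.
    start = max(0, anaphor_sentence - window)
    context = " ".join(sentences[start:anaphor_sentence + 1])
    # Question template taken from the description above.
    question = f"{anaphor_text} of what?"
    return question, context


example_doc = ["Tim walked into the room.", "The window was broken."]
example_question, example_context = build_qa_input(example_doc, 1, "The window")
```

The resulting pair could then be passed to any extractive QA model, which would be expected to return a span such as "the room" from the context.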

The State of the Art
To better understand the state of the art, we show the best results achieved on bridging resolution and full bridging resolution by different approaches on three commonly used datasets (ISNotes, BASHI, and ARRAU RST) in Tables 5 and 6 respectively. Note that (1) these results are taken verbatim from the respective papers, and (2) not all of them are directly comparable, as some rely on gold coreference chains to compute cluster-level features or perform anaphor filtering.
As we can see in Table 5, solid progress is being made for bridging resolution, even though the best accuracy is only around 50%. Full bridging resolution results are shown in Table 6. Note that Hou et al.'s (2018) resolver cannot be directly compared with the other two because it uses gold coreference information in a different way. Comparing Rösiger et al.'s (2018b) rule-based system with Yu and Poesio's (2020) MTL model, we see that the neural model has achieved solid improvements in both recognition and resolution on ARRAU and ISNotes. Additional experiments are needed to understand why similar improvements are not observed on BASHI. Nevertheless, the best resolution F-score is below 25%. Overall, these results suggest that both bridging resolution and full bridging resolution are far from being solved.

Concluding Remarks
We have presented a survey of the current state of research on bridging resolution, a task that is far from being solved. Given that significant advances have been made on coreference resolution recently, we believe that bridging resolution will be the anaphora resolution task that will receive increasing attention in the near future. We conclude this paper with several recommendations on future research directions.
Resources and evaluation. The CoNLL 2011 and 2012 shared tasks (Pradhan et al., 2011;Pradhan et al., 2012) have played a crucial role in the accelerated progress on entity coreference resolution in the past few years by providing a standardized evaluation protocol (e.g., standard evaluation corpus and metrics) that facilitates performance comparisons of different resolvers. Progress on current bridging resolution research, which is reminiscent of that of entity coreference research in the pre-CoNLL era, is hindered in part by the lack of such standardization. As we move forward, it is imperative that a common evaluation framework be established for bridging research. Part of this effort should include the development of an annotated corpus that is much larger than those currently available. While the use of distant supervision (to produce automatically labeled data) and pretrained models (to transfer knowledge from other tasks) may have reduced the amount of annotated data needed for model training, we believe having a large annotated corpus will not only stimulate interest in the task among researchers in other areas (e.g., by allowing them to develop complex models) but also provide task-specific linguistic insights. Corpora that contain other discourse-level annotations (e.g., the discourse relations in the Penn Discourse Treebank (Prasad et al., 2008)) would be ideal choices, as they can facilitate the development of joint models that enable the study of the potential interactions between bridging and other discourse phenomena.
Cross-area collaboration. While bridging has been studied primarily by discourse researchers, the task is so broad that it covers many semantic relations studied by researchers in lexical semantics and information extraction, such as meronymy (Hearst, 1998;Berland and Charniak, 1999;Girju et al., 2006), hyponymy (Hearst, 1992;Cederberg and Widdows, 2003;Pantel and Pennacchiotti, 2006), and set-membership. While space limitations have precluded a detailed discussion of this line of related work, it is important to note that some of the ideas we have seen in this paper have also been explored in research on extracting these specific relations. For instance, as in Hou (2018b), lexico-syntactic patterns have been explored for automatically harvesting hyponyms (Hearst, 1992) and antecedents of other-anaphora. As another example, Girju et al.'s (2003) decision tree approach to part-whole relation extraction has employed many of the features that are also used in feature-based approaches to bridging resolution. Rather than reinventing the wheel, we encourage researchers in these different areas to work together on bridging resolution. In particular, it is conceivable that the best approach to bridging resolution may involve first decomposing the task into different types of relations and then having researchers from different areas address each type of relation.
Models. Existing bridging resolvers have all assumed as input either gold anaphors or gold mentions, making them unusable in practice where such gold annotations are not available. Consequently, we believe that the time is ripe for end-to-end bridging resolution. Though it is a very challenging evaluation setting, we believe that researchers should examine whether the successes of end-to-end neural models developed for various NLP tasks can be transferred to bridging resolution. In addition, researchers may consider hybrid rule-based and learning-based models for bridging resolution.
So far, rule-based and learning-based approaches have been viewed as distinct approaches in bridging research, but it is worth investigating whether they have complementary strengths. Finally, while much of the work on bridging has been conducted for English, we believe it is time to examine multilingual bridging resolution. Since we have bridging-annotated data for multiple languages including parallel corpora, it would be interesting to see if multilingual bridging resolution can be addressed using projection-based techniques (Yarowsky et al., 2001;Postolache et al., 2006;Rahman and Ng, 2012b;Grishina and Stede, 2015;Martins, 2015).