Argument Mining for Scholarly Document Processing: Taking Stock and Looking Ahead

Argument mining targets structures in natural language related to interpretation and persuasion, which are central to scientific communication. Most scholarly discourse involves interpreting experimental evidence and attempting to persuade other scientists to adopt the same conclusions. While various argument mining studies have addressed student essays and news articles, those that target scientific discourse are still scarce. This paper surveys existing work in argument mining of scholarly discourse and provides an overview of current models, data, tasks, and applications. We identify a number of key challenges confronting argument mining in the scientific domain, and suggest some possible solutions and future directions.


Introduction
Scientific papers aim to present verifiable evidence for a series of stated claims, anchoring these claims in experiments, data, and references. However, the interpretation of such objective sources of evidence is often ambiguous and subjective. Thus, much of scientific communication is essentially persuasive and uses an argumentative structure to establish the relevance, validity, and novelty of an author's main claims and conclusions (Pelclova and Weilun, 2018). This argumentation takes the form of a dialogue between the author and her readers, in which new knowledge is proposed and an attempt is made to persuade the readers to accept and follow particular claims (Fahy, 2008; Hyland, 2014). However, most current research on automatic document processing ignores this argumentative context and treats statements that are persuasive, tentative, or speculative as factual. This risks overstating the certainty of claims and hypotheses, and bypasses the rhetorical aspect of scientific discourse (see, e.g., Gross and Chesley, 2012).
* These authors contributed equally.
Computational argumentation is a recent and growing field of research concerned with the computational analysis and generation of natural language arguments and argumentative discourse. Over the past decade, this area has attracted researchers seeking to tackle different tasks, including argument mining, argument quality assessment, and argument generation (for an overview, see, e.g., Stede et al., 2018). The most studied task is argument mining, i.e., the identification of argumentative units, argument components (e.g., conclusion and premise), and argument structures in text documents. However, despite a wealth of Natural Language Processing (NLP) research on extracting information from scientific literature, including entity extraction (Augenstein et al., 2017; Hou et al., 2019), relation identification (Luan et al., 2018), question answering (Demner-Fushman and Lin, 2007), and summarization (Erera et al., 2019), relatively few attempts have been made to model argumentative structures in science. This paper argues for an increased focus of the NLP community on argument mining in scientific documents. To encourage work at the intersection of Scholarly Discourse Processing and Argument Mining, we provide a brief overview of current work in this field and discuss the most used models, data, methods, and applications. We also discuss a number of challenges in mining the argumentative structure of scientific documents and propose some promising future directions.

Argumentation in Scientific Discourses
To support future efforts on argument mining of scientific documents, we present a survey of the literature from 2000 to the present, summarized in Table 1 in the Appendix. To create a reasonably comprehensive overview, we concentrated on papers published by the NLP community. To obtain this list, we used Google Scholar (https://scholar.google.com/) to find papers on "Argumentation Mining on Scientific Papers", "Argumentation Mining on Research Papers", and "Argumentative Zoning on Scientific Papers". We also traced the references of some pivotal papers from the proceedings of the Argument Mining workshops.
For each paper, we identified the Domain of study (i.e., a specific scientific domain, full-text or abstracts), the Objectives of the work, and the Methods used. Furthermore, the papers can be categorized under four areas of study, discussed, in turn, below.

Corpus Creation and New Annotation Schemes
A number of studies propose an annotation scheme for mining argumentative discourse in the science domain. Many of these studies follow the well-known argumentation model of Toulmin (1958). Toulmin's model targets the structure of an argument, modelling it as a claim that is supported by data following some warrants, which can in turn be supported by backing. The model also has two optional components: qualifiers and rebuttals.
Examples of studies that adopt Toulmin's model are Green (2014) and Lauscher et al. (2018b). The former proposes a scheme of premise (i.e., data and warrant) and conclusion. The latter's scheme includes background claim, own claim, and data, and is used to annotate 40 publications from computer graphics.
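As a rough illustration of how these components fit together, the following Python sketch represents one argument in Toulmin's layout. The class name, field names, and the example sentences are our own illustrative assumptions and are not taken from any of the annotation schemes above; the `premise` method merely mirrors the idea of grouping data and warrant into a single premise.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToulminArgument:
    """One argument in Toulmin's layout (field names are illustrative)."""
    claim: str
    data: List[str]                                      # evidence offered for the claim
    warrants: List[str] = field(default_factory=list)    # why the data supports the claim
    backing: List[str] = field(default_factory=list)     # support for the warrants
    qualifier: Optional[str] = None                      # optional: strength of the claim
    rebuttals: List[str] = field(default_factory=list)   # optional: exceptions to the claim

    def premise(self) -> List[str]:
        """Group data and warrants into one premise list (premise/conclusion view)."""
        return self.data + self.warrants


# Invented example sentences, for illustration only.
arg = ToulminArgument(
    claim="Variant X is associated with the disease.",
    data=["Variant X was found in 18 of 20 patients."],
    warrants=["A variant enriched in patients is likely disease-related."],
    qualifier="probably",
)
print(arg.premise())
```

Keeping the optional components as empty defaults lets the same structure hold arguments in which warrants, backing, or rebuttals are never stated.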
Another model that is often used is that of argumentation schemes (Walton et al., 2008). Argumentation schemes target the structure of an argument, where the argument is modeled as a set of propositions, i.e., a conclusion and one or more premises, with a pattern that manifests the logical inference between the conclusion and its premise. Walton et al. (2008) proposed around 60 different schemes including 'argument from cause to effect' and 'argument from example', among others. An example of this approach is Green (2015a), where ten schemes were selected and annotated in a corpus of biomedical genetics articles.
Other studies focus on identifying argumentative discourse roles, especially argumentative zones (Teufel and Moens, 2002), assigning roles such as 'aim' and 'background' to large text spans (usually paragraphs). Following this approach, several corpora have been constructed for biomedical papers (Guo et al., 2011), as well as papers in chemistry, computational linguistics (Yang and Li, 2018), and agriculture (Teufel, 2014).
Inspired by the theory of Freeman (2011), some studies annotate the argumentative relations between arguments. For instance, Lauscher et al. (2018a) consider the relations of 'support', 'contradicts', and 'same claim'. In another study, Kirschner et al. (2015) consider the relations of 'support', 'attack', 'detail', and 'sequence', which were annotated in 24 articles from the educational and developmental domain.

Automatic Argument Unit Identification
Much work in argument mining focuses on identifying Argumentative Discourse Units (ADUs). An ADU is a text span that plays a specific role in an argument. In this way, argument unit identification resembles named entity recognition or discourse segment type identification. Green (2017b) extracted argumentative units from biomedical and biological articles using a semantic rule-based approach. Lauscher et al. (2018a) and Lauscher et al. (2018c) proposed several neural multi-task learning models based on BiLSTM to identify premises and conclusions. Other papers propose different approaches to identify argumentative zones, including supervised and weakly-supervised approaches with a rich set of linguistic features (e.g., Guo et al., 2011). Identifying the 'claim' unit is tackled in several papers, such as Achakulvisut et al. (2019), which employs transfer learning on top of a discourse tagging model using a pre-trained BiLSTM-CRF to identify claims in biomedical abstracts. Extracting 'evidence' has been tackled in other studies; e.g., Li et al. (2019) extracted evidence in biomedical publications with sentence-level sequential labelling, using BiLSTM-CRF and attention.
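To make the analogy with named entity recognition concrete, the sketch below decodes token-level BIO tags into labelled ADU spans, as one would after running a sequence labeller. The BIO encoding, the label names CLAIM and PREMISE, the function `decode_bio`, and the example sentence are assumptions made for illustration; the studies cited above differ in their label sets and in whether they tag tokens or sentences.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn token-level BIO tags into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    label = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close any open span, then open a new one
            # (this also tolerates an I- tag with no preceding B- tag).
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)          # continue the open span
        else:
            # An "O" tag closes any open span.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Invented example sentence and tags.
tokens = "We find that X improves accuracy because Y reduces noise .".split()
tags = ["O", "O", "O", "B-CLAIM", "I-CLAIM", "I-CLAIM",
        "O", "B-PREMISE", "I-PREMISE", "I-PREMISE", "O"]
print(decode_bio(tokens, tags))
# [('CLAIM', 'X improves accuracy'), ('PREMISE', 'Y reduces noise')]
```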

Automatic Argument Structure Identification
If unit identification resembles entity recognition, argument structure identification is akin to relation extraction: this work aims to find typed relationships between ADUs. This more challenging task has been addressed by relatively few studies: Accuosto and Saggion (2020) extend existing discourse parsing models to identify the argumentative structure of computational linguistics abstracts using lexical and ELMo embeddings, while Song et al. (2019) analyze the argument structure of information science and biomedical science articles through sequential pattern mining.
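The relation-extraction view can be sketched as a small graph of ADUs connected by typed, directed edges. The following Python example is a minimal illustration under our own assumptions: the `Relation` class, the `incoming` helper, the ADU identifiers, and the example sentences are hypothetical, and only the relation labels 'support' and 'attack' are borrowed from the schemes mentioned earlier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    source: str   # id of the ADU that supports or attacks
    target: str   # id of the ADU being supported or attacked
    label: str    # relation type, e.g. "support" or "attack"


# Invented ADUs, keyed by a hypothetical identifier.
adus: Dict[str, str] = {
    "a1": "Method M outperforms the baseline.",
    "a2": "M scores higher on all three test sets.",
    "a3": "The test sets overlap with the training data.",
}

relations: List[Relation] = [
    Relation("a2", "a1", "support"),
    Relation("a3", "a2", "attack"),
]


def incoming(target: str, label: str) -> List[str]:
    """Return ids of ADUs linked to `target` by a relation of type `label`."""
    return [r.source for r in relations if r.target == target and r.label == label]


print(incoming("a1", "support"))  # ['a2']
print(incoming("a2", "attack"))   # ['a3']
```

A structure identification model would predict the `relations` list; the units themselves come from the preceding identification step.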

Applications
To date, much of the application-oriented work has focused on scientific article summarization. An exception is Feltrim and Teufel (2004), which had the goal of developing tools for scientific writing in the computer science domain. Other efforts aim to identify claims and evidence in order to enable claim-evidence based representations of collections of documents, such as de Waard et al. (2009), Groza et al. (2011), and Li et al. (2021). The goal here is to allow the reader to traverse the reasoning behind a scientific claim to either experimental evidence in the paper itself, or to reasoning for data provided in cited papers. Recently, Yu et al. (2020) studied the problem of correlation-to-causation exaggeration in press releases by comparing claims made in news articles with those in the corresponding scientific papers.

Challenges
In this section, we describe a few challenges that are relevant to argument mining in the scientific literature. Although they are not specific to the scientific domain alone, these are hurdles that future research must overcome for progress to be made.

Argumentation Modeling
As described above, various argument models have been proposed (Stede et al., 2018). The selection of which model fits scientific documents is a crucial and challenging research question.
Most previous studies in argument mining of scientific documents utilize either Toulmin's model or argumentation schemes. However, neither of these models seems to be a perfect fit: Toulmin's warrants and rebuttals are not common in scholarly argumentation, and none of the other argument schemes takes the specific nature of scholarly argumentation into account. Adapting these models seems to be an essential step towards feasible annotation and identification of argument structures in scholarly discourse.

Domain Knowledge
Science communication encompasses a variety of domains, topics, and methodologies organized into research communities, each following its own standards regarding the structuring of documents and the arguments they contain (Weinstein, 1990). These community conventions present a barrier to understanding for nonspecialists and computational models alike. An important open question, therefore, is whether argument mining techniques must be tailored to individual scientific communities, or whether a unified model can be adapted to address domain-specific features of scientific argumentation.

Scientific Document Type
Scientific communication involves a variety of document types, including reviews, methods papers, and experimental reports, among others. Each type concentrates on specific aspects of the discussed topic and usually provides particular types of evidence.
Analogous to the previous point, an open question is whether different document types require different models, or whether they can be accommodated by a single representation and modeling approach tailored to different argument structures.

Enthymemes
An enthymeme is the implicit (unstated) premise or conclusion in an argument. Because enthymemes are supposedly known by the target audience (or easily constructed using common knowledge), they are rarely a problem for humans. However, to the extent that they require shared knowledge that is not found in the document, they pose a challenge for argument mining techniques.
As an example, Green (2014) conducted a manual inspection of several arguments in the biomedical genetics research literature, found that arguments with enthymemes are common there, and suggested explicitly providing domain knowledge for reconstructing enthymemes.

Subjective Interpretation
A common dilemma in argument mining is that an argumentative text may have multiple valid interpretations of its structure. This is a concern for scientific documents, where the connection between a claim and its evidence can be implicit, i.e., the author leaves this connection to the readers' interpretations.
In particular, experimental papers can follow a line of reasoning that makes, e.g., 'biological sense', i.e., where a specific experiment follows another experiment to address a potential alternate interpretation of the previous experiment. For a nonbiologist, this reasoning is unclear, and the reasons for these subsequent results are generally never explicitly stated in the text.

Context-Dependence
Context plays a key role in text mining in general and argument mining in particular. Scientific documents are at least as complex as other argumentative genres, such as persuasive essays, because they must fulfil both a persuasive role and the presentation of objectivity that scientific writing demands (Vazquez Orta and Giner, 2009-11). More specifically, selecting the optimal boundaries of argumentative units in scientific documents is known to be challenging (Green, 2014; Stab et al., 2014). For instance, the distance between a claim and its premise may be particularly wide in scientific discourse, e.g., a claim stated in one section can be supported by a premise in a different section.

Discussion
In summary, we have provided a brief overview of current work and a summary of issues that need to be addressed to make headway in automated argument mining for scholarly documents. We hope to have shown that more research is needed in this field to enable better representation of the persuasive aspects of scholarly communication. This can help provide a more realistic representation of how scientific knowledge is obtained, and how authors aim to persuade readers of the validity of claims. In particular, seeing scholarly discourse as a pragmatic discourse, i.e., one that humans undertake with interpersonal as well as informative goals, can allow richer representations of the knowledge structures underlying scientific progress.
As noted, applications of argument mining in scientific discourse, such as summarization and aids to technical writing, to date have been limited to those that are relatively robust to errors, a partial consequence of the immaturity of the field. In particular, these applications are mostly insensitive to the factual content of scientific arguments. Meanwhile, a relatively mature community continues to expand models and methods for information extraction in various scientific domains, usually with no attention to the argumentative context in which the target facts are presented. Because a correct understanding and use of facts is critical to scientific understanding and progress, we see an opportunity for many innovative applications at the intersection of fact and argument. For example, models capable of determining the salience of individual facts in a domain could provide the basis for highly precise forms of scientific information retrieval, or even offer forms of automation that assist scientists in maximizing the pertinence of their experiments.
To achieve this vision at scale, the argument mining community must grapple with the problem of increasing scientific domain specialization. It is crucial that we separate the invariant features of scientific argumentation from those that vary with field and specialization, and that we investigate effective methods of cross-domain transfer. To this end, the field should seek consensus regarding how scientific argumentation should be formalized and strive for broad-coverage reference corpora annotated under guidelines optimized for high inter-annotator agreement.
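Inter-annotator agreement can be quantified in several ways. As a minimal sketch, the Python function below computes Cohen's kappa for two annotators who each assign one label per unit. The choice of Cohen's kappa is our assumption (the discussion above does not name a specific measure), and the function name, label names, and annotations are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same units."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of units with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: from each annotator's own label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented annotations of four units by two annotators.
ann1 = ["claim", "claim", "data", "data"]
ann2 = ["claim", "data", "data", "data"]
print(cohen_kappa(ann1, ann2))  # 0.5
```

In this example the annotators agree on three of four units (0.75), chance agreement is 0.5, and kappa is therefore (0.75 - 0.5) / (1 - 0.5) = 0.5.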
To support these efforts, we suggest greater collaboration between participants of the scholarly document processing and argument mining domains, with a particular focus on creating shared models and shared, accessible corpora to spur on research. We hope such conversations can commence at this workshop and others, to inspire and unite members of both communities in using natural language processing to improve the sharing and the outputs of science and scholarship.

Conclusion
This paper endeavors to promote collaboration between the communities of scholarly discourse processing and computational argumentation, arguing for the importance of more extensive research on argument mining in scientific documents. In particular, we review current contributions on argument mining for scientific documents by surveying about 40 papers that approach different aspects and tasks, such as proposing annotation schemes, creating corpora, and identifying argumentative discourse units as well as argumentative relations in scientific documents. Furthermore, we describe various challenges for mining argumentative structures of scientific documents and suggest some strategic directions that could benefit a wide range of downstream applications, such as scientific writing assistance, scientific article summarization, and quality assessment.

A Appendix
Biomedical papers
Extraction of connections, or "higher order relations", between biomedical relations (relationships between biomedical entities). The higher order relation conveys a causal sense, which indicates that the latter relation causes the earlier one.
In the first stage, the authors use a discourse relation parser to extract the explicit discourse relations from text. In the second stage, the authors analyze each extracted explicit discourse relation to determine whether it can produce a higher order relation.

Argument unit identification and relation extraction
Explore two transfer learning approaches in which discourse parsing is used as an auxiliary task when training argument mining models.
Propose a new annotation schema and use it to augment a corpus of computational linguistics abstracts that had previously been annotated with discourse units and relations.

Song et al. (2019)
Information Science and Biomedical articles
Apply sequential pattern mining to analyse the common argument structure in two scientific domains (information science and biomedical science).