PaperRobot: Incremental Draft Generation of Scientific Ideas

We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from the future work to generate a title for a follow-on paper. Turing tests, where a biomedical domain expert is asked to compare a system output and a human-authored string, show that PaperRobot-generated abstracts, conclusion and future work sections, and new titles are chosen over human-written ones up to 30%, 24%, and 12% of the time, respectively.


Introduction
Our ambitious goal is to speed up scientific discovery and production by building a PaperRobot, who addresses three main tasks as follows.
Read Existing Papers. Scientists now find it difficult to keep up with the overwhelming number of papers. For example, in the biomedical domain, on average more than 500K papers are published every year, and more than 1.2 million new papers were published in 2016 alone, bringing the total number of papers to over 26 million (Van Noorden, 2014). However, human reading capacity has remained almost constant over the years. In 2012, US scientists estimated that they read, on average, only 264 papers per year (1 out of 5,000 available papers), which is statistically no different from what they reported in an identical survey last conducted in 2005. PaperRobot automatically reads existing papers to build background knowledge graphs (KGs), in which nodes are entities/concepts and edges are the relations between these entities (Section 2.2).

Create New Ideas. Scientific discovery can be considered as creating new nodes or links in the knowledge graphs. Creating new nodes usually means discovering new entities (e.g., new proteins) through a series of real laboratory experiments, which is probably too difficult for PaperRobot. In contrast, creating new edges is easier to automate using the background knowledge graph as the starting point. Foster et al. (2015) show that more than 60% of 6.4 million papers in biomedicine and chemistry are incremental work. This inspires us to automate the incremental creation of new ideas and hypotheses by predicting new links in the background KGs. In fact, when more data is available, we can construct larger and richer background KGs for more reliable link prediction. Recent work (Ji et al., 2015b) successfully mines strong relevance between drugs and diseases from biomedical papers based on KGs constructed from weighted co-occurrence. We propose a new entity representation that combines KG structure and unstructured contextual text for link prediction (Section 2.3).

Figure 2: PaperRobot Architecture Overview
Write a New Paper about New Ideas. The final step is to communicate the new ideas to the reader clearly, which is a very difficult thing to do; many scientists are, in fact, bad writers (Pinker, 2014). Using a novel memory-attention network architecture, PaperRobot automatically writes a new paper abstract about an input title along with predicted related entities, then further writes conclusion and future work based on the abstract, and finally predicts a new title for a future follow-on paper, as shown in Figure 1 (Section 2.4).
We choose biomedical science as our target domain due to the sheer volume of available papers. Turing tests show that PaperRobot-generated output strings are sometimes chosen over human-written ones, and most paper abstracts only require minimal edits from domain experts to become highly informative and coherent.

Overview
The overall framework of PaperRobot is illustrated in Figure 2. A walk-through example produced from this whole process is shown in Table 1. In the following subsections, we will elaborate on the algorithms for each step.

Background Knowledge Extraction
From a massive collection of existing biomedical papers, we extract entities and their relations to construct background knowledge graphs (KGs). We apply an entity mention extraction and linking system (Wei et al., 2013) to extract mentions of three entity types (Disease, Chemical and Gene) which are the core data categories in the Comparative Toxicogenomics Database (CTD) (Davis et al., 2016), and obtain a Medical Subject Headings (MeSH) Unique ID for each mention. Based on the MeSH Unique IDs, we further link all entities to the CTD and extract 133 subtypes of relations such as Marker/Mechanism, Therapeutic, and Increase Expression. Figure 3 shows an example.
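The extraction step above can be sketched as a small adjacency map built from (head, relation, tail) tuples. The entity and relation names below are illustrative stand-ins, not actual CTD identifiers or MeSH IDs:

```python
from collections import defaultdict

def build_background_kg(tuples):
    """Index (head, relation, tail) tuples into a neighbor map.

    Each entity maps to the set of (relation, neighbor) pairs it
    participates in, mirroring the one-hop neighborhoods used later
    for link prediction.
    """
    neighbors = defaultdict(set)
    for head, rel, tail in tuples:
        neighbors[head].add((rel, tail))
        neighbors[tail].add((rel, head))  # index both directions for neighbor lookup
    return neighbors

# Hypothetical extracted tuples in the style of CTD relation subtypes.
kg = build_background_kg([
    ("Zinc", "increases_expression", "CD14 molecule"),
    ("Zinc", "increases_expression", "neuropilin 2"),
    ("Calcium", "marker/mechanism", "caspase-3"),
])
```

In the full system each node also carries its MeSH Unique ID and entity type (Disease, Chemical, or Gene), which this sketch omits.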

Link Prediction
After constructing the initial KGs from existing papers, we perform link prediction to enrich them. Both contextual text information and graph structure are important to represent an entity, thus we combine them to generate a rich representation for each entity. Based on the entity representations, we determine whether any two entities are semantically similar, and if so, we propagate the neighbors of one entity to the other. For example, in Figure 3, because Calcium and Zinc are similar in terms of contextual text information and graph structure, we predict two new neighbors for Calcium: CD14 molecule and neuropilin 2 which are neighbors of Zinc in the initial KGs.
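The neighbor-propagation idea can be sketched as follows, assuming a precomputed entity-similarity score in place of the learned representations described below (all names are illustrative):

```python
def propagate_neighbors(kg, similarity, threshold=0.8):
    """Predict new links: if two entities are similar enough, each
    inherits the neighbors of the other that it does not yet have."""
    predicted = []
    for (a, b), score in similarity.items():
        if score < threshold:
            continue
        for src, dst in ((a, b), (b, a)):
            for rel, nb in kg.get(src, set()) - kg.get(dst, set()):
                predicted.append((dst, rel, nb))
    return predicted

# Toy KG mirroring the Calcium/Zinc example from Figure 3.
kg = {
    "Zinc": {("increases_expression", "CD14 molecule"),
             ("increases_expression", "neuropilin 2")},
    "Calcium": set(),
}
new_links = propagate_neighbors(kg, {("Calcium", "Zinc"): 0.9})
```

With Calcium and Zinc judged similar, both of Zinc's neighbors are proposed as new neighbors for Calcium, matching the example in the text.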
We formulate the initial KGs as a list of tuples numbered from 0 to $\kappa$. Each tuple $(e_i^h, r_i, e_i^t)$ is composed of a head entity $e_i^h$, a tail entity $e_i^t$, and their relation $r_i$. Each entity $e_i$ may be involved in multiple tuples, and its one-hop connected neighbors are denoted as $N_{e_i} = [n_{i1}, n_{i2}, \ldots]$. $e_i$ is also associated with a context description $s_i$, which is randomly selected from the sentences where $e_i$ occurs. We randomly initialize vector representations $e_i$ and $r_i$ for $e_i$ and $r_i$, respectively.

Figure 3: Example background KG around the Chemical entities Calcium and Zinc and their Gene neighbors, with the contextual sentence: "So, Ca2+ possibly promoted caspases activation upstream of cytochrome c release, but inactivated caspase activity by calpain and/or fast depletion of ATP; whereas Zn2+ blocked the activation of procaspase-3 with no visible change in the level of cytochrome c, and the block possibly resulted from its direct inhibition on caspase-3 enzyme."

Graph Structure Encoder. To capture the importance of each neighbor's features to $e_i$, we perform self-attention (Veličković et al., 2018) and compute a weight distribution over $N_{e_i}$:

$c_{ij} = \underset{j}{\mathrm{Softmax}}\big(W_f (W_e e_i \oplus W_e n_{ij})\big)$

where $W_e$ is a linear transformation matrix applied to each entity, $W_f$ is the parameter of a single-layer feedforward network, and $\oplus$ denotes the concatenation operation. Then we use $c_i$ and $N_{e_i}$ to compute a structure-based context representation

$\tilde{e}_i = \sigma\Big(\sum_j c_{ij}\, n_{ij}\Big)$

where $n_{ij} \in N_{e_i}$ and $\sigma$ is the Sigmoid function.

In order to capture various types of relations between $e_i$ and its neighbors, we further perform multi-head attention on each entity, based on multiple linear transformation matrices. Finally, we get a structure-based context representation

$\tilde{e}_i = \tilde{e}_i^1 \oplus \cdots \oplus \tilde{e}_i^M$

where $\tilde{e}_i^m$ refers to the context representation obtained with the $m$-th head, and $\tilde{e}_i$ is the concatenated representation based on the attention of all $M$ heads.

Contextual Text Encoder. Each entity $e$ is also associated with a context sentence $[w_1, \ldots, w_l]$.
To incorporate the local context information, we first apply a bi-directional long short-term memory (LSTM) network (Graves and Schmidhuber, 2005) to get the encoder hidden states $H_s = [h_1, \ldots, h_l]$, where $h_i$ represents the hidden state of $w_i$. Then we compute a bilinear attention weight for each word $w_i$:

$\mu_i = e^\top W_s h_i, \quad \hat{\mu} = \mathrm{Softmax}(\mu)$

where $W_s$ is a bilinear term. We finally get the context representation $\hat{e} = \sum_i \hat{\mu}_i h_i$.

Gated Combination. To combine the graph-based representation $\tilde{e}$ and the local context based representation $\hat{e}$, we design a gate function to balance these two types of information:

$g_e = \sigma(\tilde{g}_e), \quad e = g_e \odot \tilde{e} + (1 - g_e) \odot \hat{e}$

where $g_e$ is an entity-dependent gate of which each element is in $[0, 1]$, $\tilde{g}_e$ is a learnable parameter for each entity $e$, $\sigma$ is the Sigmoid function, and $\odot$ is element-wise multiplication.

Training and Prediction. To optimize both entity and relation representations, following TransE (Bordes et al., 2013), we assume the relation between two entities can be interpreted as a translation operating on their representations. We use a marginal loss to train the model:

$L = \sum \max\big(0,\ \gamma + \|e^h + r - e^t\| - \|\bar{e}^h + \bar{r} - \bar{e}^t\|\big)$

where $(e^h, r, e^t)$ is a positive tuple, $(\bar{e}^h, \bar{r}, \bar{e}^t)$ is a negative tuple, and $\gamma$ is a margin. The negative tuples are generated by replacing either the head or the tail entity of a positive tuple with a randomly chosen different entity. After training, for each pair of indirectly connected entities $e_i, e_j$ and a relation type $r$, we compute a score $y$ to indicate the probability that $(e_i, r, e_j)$ holds, and obtain an enriched knowledge graph $K = [(e_{\kappa+1}^h, r_{\kappa+1}, e_{\kappa+1}^t, y_{\kappa+1}), \ldots]$.

Table 1: Walk-through example of the PaperRobot pipeline.

Abstract (Human Written): Background: Maspin, a putative tumor suppressor that is down-regulated in breast and prostate cancer, has been associated with decreased cell motility. Snail transcription factor is a zinc finger protein that is increased in breast cancer and is associated with increased tumor motility and invasion by induction of epithelial-mesenchymal transition (EMT). We investigated the molecular mechanisms by which Snail increases tumor motility and invasion utilizing prostate cancer cells. Methods: Expression levels were analyzed by RT-PCR and western blot analyses. Cell motility and invasion assays were performed, while Snail regulation and binding to maspin promoter was analyzed by luciferase reporter and chromatin immunoprecipitation (ChIP) assays. Results: Snail protein expression was higher in different prostate cancer cell lines as compared to normal prostate epithelial cells.

Abstract (System Generated): Background: Snail is a multifunctional protein that plays an important role in the pathogenesis of prostate cancer. However, it has been shown to be associated with poor prognosis. The purpose of this study was to investigate the effect of negatively on the expression of maspin in human nasopharyngeal carcinoma cell lines. Methods: Quantitative real-time PCR and western blot analysis were used to determine whether the demethylating agent was investigated by quantitative RT-PCR (qRT-PCR) and Western blotting. Results showed that the binding protein plays a significant role in the regulation of tumor growth and progression.

Abstract (Post-edited by Human): Background: Snail is a multifunctional protein that plays an important role in the pathogenesis of prostate cancer. It has been shown associated with poor prognosis. The purpose of this study is to investigate the negative effect of on the expression of Maspin in human nasopharyngeal carcinoma cell lines. Methods: Quantitative RT-PCR (qRT-PCR) and western blot analyses were used to determine correlation of the two proteins expressions. Results showed that the binding protein plays a significant role in the regulation of tumor growth and progression.

Conclusion and Future Work (Human Written): Collectively, our results indicate for the first time that Snail can negatively regulate maspin through direct promoter repression resulting in increased migration and invasion in prostate cancer cells. This study reveals a novel mechanism of how Snail may function and shows the importance of therapeutic targeting of Snail signaling in future.

Conclusion and Future Work (System Generated): In summary, our study demonstrates that Snail negatively inhibited the expression of Maspin in human nasopharyngeal carcinoma cell lines and in vitro. Our results indicate that the combination of the demethylating agent might be a potential therapeutic target for the treatment of prostate cancer.

Conclusion and Future Work (Post-edited by Human): In summary, our study in vitro demonstrates that Snail negatively inhibits the expression of Maspin in human nasopharyngeal carcinoma cell lines. Our results further indicate that Maspin might be a potential therapeutic target for the treatment of prostate cancer.

New Title (Human Written): Role of maspin in cancer (Berardi et al., 2013)

New Title (System Generated): The role of nasopharyngeal carcinoma in the rat model of prostate cancer cells

New Title (Post-edited by Human): The role of Maspin in the rat model of nasopharyngeal carcinoma cells
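The TransE-style objective used for link prediction can be sketched with a toy example (plain Python lists stand in for the learned gated entity representations):

```python
import math

def transe_distance(h, r, t):
    """||h + r - t||_2: small when relation r translates head h to tail t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(positive, negative, gamma=1.0):
    """Hinge loss pushing positive tuples at least gamma closer than
    corrupted (negative) tuples."""
    return max(0.0, gamma + transe_distance(*positive) - transe_distance(*negative))

pos = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])   # h + r == t, distance 0
neg = ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])   # corrupted tail, distance 2
loss = margin_loss(pos, neg)
```

Here the positive tuple is already more than the margin closer than the corruption, so the loss is zero; swapping the two tuples yields a positive loss that training would push down.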

New Paper Writing
In this section, we use title-to-abstract generation as a case study to describe the details of our paper writing approach. The other tasks (abstract-to-conclusion-and-future-work, and conclusion-and-future-work-to-title) follow the same architecture. Given a reference title $\tau = [w_1, \ldots, w_l]$, we apply the knowledge extractor (Section 2.2) to extract entities from $\tau$. For each entity, we retrieve a set of related entities from the enriched knowledge graph $K$ after link prediction. We rank all the related entities by confidence scores and select up to 10 most related entities $E_\tau = [e_1^\tau, \ldots, e_v^\tau]$. Then we feed $\tau$ and $E_\tau$ together into the paper generation framework as shown in Figure 2. The framework is based on a hybrid approach of a Mem2seq model (Madotto et al., 2018) and a pointer generator (Gu et al., 2016; See et al., 2017). It allows us to balance three types of sources for each time step during decoding: the probability of generating a token from the entire word vocabulary based on the language model, the probability of copying a word from the reference title, such as regulates in Table 1, and the probability of incorporating a related entity, such as Snail in Table 1. The output is a paragraph $Y = [y_1, \ldots, y_o]$.

Reference Encoder. For each word in the reference title, we randomly embed it into a vector and obtain $\tau = [w_1, \ldots, w_l]$. Then, we apply a bi-directional Gated Recurrent Unit (GRU) encoder (Cho et al., 2014) on $\tau$ to produce the encoder hidden states $H = [h_1, \ldots, h_l]$.

Decoder Hidden State Initialization. Not all predicted entities are equally relevant to the title.
For example, for the title in Table 1, we predict multiple related entities, including nasopharyngeal carcinoma and diallyl disulfide. nasopharyngeal carcinoma is more related because it is also a cancer related to the snail transcription factor, while diallyl disulfide is less related because its anticancer mechanism is not closely related to the maspin tumor suppressor. We propose to apply memory-attention networks to further filter out the irrelevant ones. Recent approaches (Sukhbaatar et al., 2015; Madotto et al., 2018) show that, compared with soft attention, memory-based multi-hop attention is able to refine the attention weight of each memory cell with respect to the query multiple times, drawing better correlations. Therefore, we apply a multi-hop attention mechanism to generate the initial decoder hidden state.
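A toy sketch of memory-based multi-hop attention for initializing the decoder state; dot-product scoring and a residual query update are assumed here as one simple instantiation:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def multihop_query(q0, memories, hops=2):
    """Refine the query against the entity memories `hops` times; the
    final query serves as the initial decoder hidden state."""
    q = list(q0)
    for _ in range(hops):
        p = softmax([sum(qd * md for qd, md in zip(q, m)) for m in memories])
        read = [sum(pj * m[d] for pj, m in zip(p, memories)) for d in range(len(q))]
        q = [qd + rd for qd, rd in zip(q, read)]  # residual update of the query
    return q

# q0 plays the role of the last reference-encoder hidden state h_l.
q = multihop_query([1.0, 0.0], memories=[[1.0, 0.0], [0.0, 1.0]], hops=1)
```

Each hop re-weights the entity memories against the current query, so entities aligned with the title accumulate influence over the hops.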
Given the set of related entities $E = [e_1, \ldots, e_v]$, we randomly initialize their vector representations $E = [e_1, \ldots, e_v]$ and store them in memories. Then we use the last hidden state of the reference encoder $h_l$ as the first query vector $q_0$, and iteratively compute the attention distribution over all memories and update the query vector:

$p_{kj} = \underset{j}{\mathrm{Softmax}}\big(q_{k-1}^\top e_j\big), \quad q_k = \sum_j p_{kj} e_j + q_{k-1}$

where $k$ denotes the $k$-th hop among $\varphi$ hops in total. After $\varphi$ hops, we obtain $q_\varphi$ and take it as the initial hidden state of the GRU decoder.

Memory Network. To better capture the contribution of each entity $e_j$ to each decoding output, at each decoding step $i$ we compute an attention weight for each entity and apply a memory network to refine the weights multiple times. We take the hidden state $\tilde{h}_i$ as the initial query $\tilde{q}_0 = \tilde{h}_i$ and iteratively update it:

$\tilde{p}_{kj} = \nu_k \tanh\big(W_q^k \tilde{q}_{k-1} + U_e^k e_j + W_{\hat{c}}\, \hat{c}_{ij} + b_k\big), \quad u_{ik} = \sum_j \tilde{p}_{kj} e_j, \quad \tilde{q}_k = u_{ik} + \tilde{q}_{k-1}$

where $\hat{c}_{ij} = \sum_{m=0}^{i-1} \beta_{mj}$ is an entity coverage vector, $\beta_i$ is the attention distribution of the last hop, $\beta_i = \tilde{p}_\psi$, and $\psi$ is the total number of hops. We then obtain a final memory-based context vector for the set of related entities, $\chi_i = u_{i\psi}$.

Reference Attention. Our reference attention is similar to (Bahdanau et al., 2015; See et al., 2017) and aims to capture the contribution of each word in the reference title to the decoding output. At each time step $i$, the decoder receives the previous word embedding and generates the decoder state $\tilde{h}_i$; the attention weight of each reference token is computed as:

$\alpha_{ij} = \nu^\top \tanh\big(W_h \tilde{h}_i + U_h h_j + W_c c_{ij} + b\big), \quad \alpha_i = \mathrm{Softmax}(\alpha_i)$

where $c_{ij} = \sum_{m=0}^{i-1} \alpha_{mj}$ is a reference coverage vector, which is the sum of attention distributions over all previous decoder time steps to reduce repetition (See et al., 2017). $\phi_i = \sum_j \alpha_{ij} h_j$ is the reference context vector.

Generator. A particular word $w$ may occur multiple times in the reference title or in multiple related entities.
Therefore, at each decoding step $i$, for each word $w$, we aggregate its attention weights from the reference attention and memory attention distributions: $P_\tau^i(w) = \sum_{m: w_m = w} \alpha_{im}$ and $P_e^i(w) = \sum_{m: w \in e_m} \beta_{im}$, respectively. In addition, at each decoding step $i$, each word in the vocabulary may also be generated with a probability according to the language model. This probability is computed from the decoder state $\tilde{h}_i$, the reference context vector $\phi_i$, and the memory context vector $\chi_i$:

$P_{gen} = \mathrm{Softmax}\big(W_{gen}[\tilde{h}_i; \phi_i; \chi_i] + b_{gen}\big)$

where $W_{gen}$ and $b_{gen}$ are learnable parameters. To combine $P_\tau$, $P_e$, and $P_{gen}$, we compute a gate $g_p$ as a soft switch between generating a word from the vocabulary and copying words from the reference title $\tau$ or the related entities $E$:

$g_p = \sigma\big(W_p \tilde{h}_i + W_z z_{i-1} + b_p\big)$

where $z_{i-1}$ is the embedding of the token generated at step $i-1$, $W_p$, $W_z$, and $b_p$ are learnable parameters, and $\sigma$ is the Sigmoid function. We also compute a gate $\tilde{g}_p$ as a soft switch between copying words from the reference text and from the related entities:

$\tilde{g}_p = \sigma\big(\tilde{W}_p \tilde{h}_i + \tilde{W}_z z_{i-1} + \tilde{b}_p\big)$

where $\tilde{W}_p$, $\tilde{W}_z$, and $\tilde{b}_p$ are learnable parameters.
The final probability of generating token $z$ at decoding step $i$ is:

$P(z_i) = g_p P_{gen} + (1 - g_p)\big(\tilde{g}_p P_\tau + (1 - \tilde{g}_p) P_e\big)$

The loss function, combined with the coverage loss (See et al., 2017) for both the reference attention and the memory attention distributions, is:

$L = \sum_i \Big(-\log P(z_i) + \lambda \sum_j \min(\alpha_{ij}, c_{ij}) + \lambda \sum_j \min(\beta_{ij}, \hat{c}_{ij})\Big)$

where $P(z_i)$ is the prediction probability of the ground-truth token $z_i$, and $\lambda$ is a hyperparameter.

Repetition Removal. As in many other long text generation tasks (Suzuki and Nagata, 2017), repetition remains a major challenge (Foster and White, 2007; Xie, 2017). In fact, 11% of the sentences in human-written abstracts include repeated entities, which may mislead the language model. Following the coverage mechanism proposed by (Tu et al., 2016; See et al., 2017), we use a coverage loss to avoid any entity in the reference input text or any related entity receiving attention multiple times. We further design a new and simple masking method to remove repetition at test time: we apply beam search with beam size 4 to generate each output, and if a word is not a stop word or punctuation and has already been generated in the previous context, we do not choose it again in the same output.
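The gated mixture of the three distributions and the test-time repetition mask can be sketched as follows, with distributions as token-to-probability dicts and an illustrative stop-word list:

```python
def combine(p_gen, p_ref, p_ent, g_p, g_tilde):
    """P(z) = g_p*P_gen + (1-g_p)*(g~_p*P_ref + (1-g~_p)*P_ent)."""
    vocab = set(p_gen) | set(p_ref) | set(p_ent)
    return {w: g_p * p_gen.get(w, 0.0)
               + (1.0 - g_p) * (g_tilde * p_ref.get(w, 0.0)
                                + (1.0 - g_tilde) * p_ent.get(w, 0.0))
            for w in vocab}

def mask_repeats(dist, generated, stopwords=frozenset({"the", "of", "and", "in"})):
    """Zero out any non-stop-word token already emitted earlier in the output."""
    return {w: 0.0 if (w in generated and w not in stopwords) else p
            for w, p in dist.items()}

dist = combine(p_gen={"snail": 0.6, "the": 0.4},
               p_ref={"maspin": 1.0},
               p_ent={"snail": 1.0},
               g_p=0.5, g_tilde=0.5)
masked = mask_repeats(dist, generated={"snail", "the"})
```

Because each component distribution sums to one and the gates are convex weights, the combined distribution also sums to one; the mask then blocks "snail" from being emitted twice while letting the stop word "the" recur.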

Data
We collect biomedical papers from the PMC Open Access Subset. To construct ground truth for new title prediction, if a human-written paper A cites a paper B, we assume the title of A is generated from B's conclusion and future work section. We construct background knowledge graphs from 1,687,060 papers, which include 30,483 entities and 875,698 relations. Table 2 shows the detailed data statistics. The hyperparameters of our model are presented in the Appendix.
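The citation-based ground-truth construction can be sketched as a simple pairing step (paper IDs and texts below are hypothetical):

```python
def build_title_pairs(citations, conclusions, titles):
    """Ground-truth pairs for follow-on title prediction: if paper A
    cites paper B, pair B's conclusion-and-future-work text with A's
    title as a (source, target) training example."""
    return [(conclusions[b], titles[a])
            for a, cited in citations.items()
            for b in cited
            if b in conclusions and a in titles]

pairs = build_title_pairs(
    citations={"A": ["B"]},
    conclusions={"B": "In future work we will study maspin regulation."},
    titles={"A": "Maspin regulation in prostate cancer"},
)
```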

Automatic Evaluation
Previous work (Lowe et al., 2015) has shown that automatically evaluating long text generation is a major challenge. Following the story generation work of Fan et al. (2018), we use METEOR (Denkowski and Lavie, 2014) to measure topic relevance towards the given titles, and use perplexity to further evaluate the quality of the language model. The perplexity scores of our model are based on a language model learned on other PubMed papers (500,000 titles, 50,000 abstracts, 50,000 conclusion and future work sections) which are not used for training or testing in our experiments. The results are shown in Table 3. We can see that our framework outperforms all previous approaches.
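For reference, perplexity is the exponentiated mean negative log-likelihood that the language model assigns to the reference tokens; a minimal sketch:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability over reference tokens;
    lower is better. `token_probs` are the model's probabilities for each
    ground-truth token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that assigns probability 1/4 to every token has perplexity 4.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```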

Turing Test
Similar to Wang et al. (2018b), we conduct Turing tests with a biomedical expert (non-native English speaker) and a non-expert (native English speaker). Each human judge is asked to compare a system output and a human-authored string and select the better one. Table 4 shows the results on 50 pairs in each setting. We can see that PaperRobot-generated abstracts are chosen over human-written ones by the expert up to 30% of the time, conclusion and future work sections up to 24% of the time, and new titles up to 12% of the time. We do not observe the domain expert performing significantly better than the non-expert, because they tend to focus on different aspects: the expert focuses on content (entities, topics, etc.) while the non-expert focuses on the language.

Human Post-Editing
In order to measure the effectiveness of PaperRobot as a writing assistant, we randomly select 50 paper abstracts generated by the system during the first iteration and ask the domain expert to edit them until he considers them informative and coherent. The BLEU (Papineni et al., 2002), ROUGE (Lin, 2004) and TER (Snover et al., 2006) scores comparing the abstracts before and after human editing are presented in Table 5. It took the expert about 40 minutes to finish editing the 50 abstracts. Table 1 includes a post-edited example. We can see that most edits are stylistic changes.

Analysis and Discussions
To better justify the function of each component, we conduct ablation studies by removing memory networks, link prediction, and repetition removal respectively. The results are shown in Table 6. We can see that the approach without memory networks tends to diverge from the main topic, especially for generating long texts such as abstracts (the detailed length statistics are shown in Table 8). From Table 6 we can see the later parts of the abstract (Methods and Results) include topically irrelevant entities such as "imipramine" which is used to treat depression instead of human prostate cancer.
Link prediction successfully introduces new and topically related ideas, such as "RT-PCR" and "western blot" which are two methods for analyzing the expression level of Snail protein, as also mentioned in the human written abstract in Table 1. Table 7 shows more examples of entities which are related to the entities in input titles based on link prediction. We can see that the predicted entities are often genes or proteins which cause the disease mentioned in a given title, or other diseases from the same family.
Our simple beam search based masking method successfully removes some repeated words and phrases and thus produces more informative output. The plagiarism check in Table 9 shows our model is creative, because it's not simply copying from the human input.

Remaining Challenges
Our generation model still depends largely on the language model and extracted facts, and thus it lacks knowledge reasoning. It generates a few incorrect abbreviations such as "Organophosphates (BA)", "chronic kidney disease (UC)" and "Fibrosis (DC)" because they appear rarely in the training data and thus their contextual representations are not reliable. It also generates some incorrect numbers (e.g., "The patients were divided into four groups: Group 1, Group B...") and pronouns (e.g., "A 63-year-old man was referred to our hospital ... she was treated with the use of the descending coronary artery").

Table 6: Ablation study examples (outputs with one component removed).

Abstract (Without Memory Networks): Background: Snail has been reported to exhibit a variety of biological functions. In this study, we investigated the effect of negatively on maspin demethylation in human prostate cancer cells. Methods: Quantitative real-time PCR and western blot analysis were used to investigate the effects of the demethylating agent on the expression of the protein kinase (TF) gene promoter. Results: The results showed that the presence of a single dose of 50 µM in a dose-dependent manner, whereas the level of the BMP imipramine was significantly higher than that of the control group.

Abstract (Without Link Prediction): Background: Snail has been shown to be associated with poor prognosis. In this study, we investigated the effect of negatively on the expression of maspin in human prostate cancer cells. Methods: Cells were treated with a single dose of radiotherapy for 24 h, and was used to investigate the significance of a quantitative factor for the treatment of the disease. Results: The remaining controls showed a significant increase in the G2/M phase of the tumor suppressor protein (p<0.05).

Abstract (Without Repetition Removal): Background: Snail is a major health problem in human malignancies. However, the role of Snail on the expression of maspin in human prostate cancer cells is not well understood. The aim of this study was to investigate the effect of Snail on the expression of maspin in human prostate cancer cells. Methods: The expression of the expression of Snail and maspin was investigated using quantitative RT-PCR and western blot analysis. Results: The remaining overall survival (OS) and overall survival (OS) were analyzed.

Conclusion and Future Work (Without Memory Networks): In summary, our study demonstrated that negatively inhibited the expression of the BMP imipramine in human prostate cancer cells. Our findings suggest that the inhibition of maspin may be a promising therapeutic strategy for the treatment.

Conclusion and Future Work (Without Link Prediction): In summary, our results demonstrate that negatively inhibited the expression of maspin in human prostate cancer cells. Our findings suggest that the combination of radiotherapy may be a potential therapeutic target for the treatment of disease.

Conclusion and Future Work (Without Repetition Removal): In summary, our results demonstrate that snail inhibited the expression of maspin in human prostatic cells. The expression of snail in PC-3 cells by snail, and the expression of maspin was observed in the presence of the expression of maspin.

New Title (Without Memory Networks): Protective effects of homolog on human breast cancer cells by inhibiting the Endoplasmic Reticulum Stress

New Title (Without Link Prediction): The role of prostate cancer in human breast cancer cells

New Title (Without Repetition Removal): The role of maspin and maspin in human breast cancer cells

Table 9: Plagiarism Check: Percentage (%) of n-grams in human input which appear in system-generated output for test data.
All of the system-generated titles are declarative sentences, while human-generated titles are often more engaging (e.g., "Does HPV play any role in the initiation or prognosis of endometrial adenocarcinomas?"). Human-generated titles often include more concrete and detailed ideas such as "etumorType, An Algorithm of Discriminating Cancer Types for Circulating Tumor Cells or Cell-free DNAs in Blood", and even create new entity abbreviations such as etumorType in this example.

Requirements to Make PaperRobot Work: Case Study on NLP Domain
When a cool Natural Language Processing (NLP) system like PaperRobot is built, it's natural to ask whether she can benefit the NLP community itself. We re-build the system based on 23,594 NLP papers from the new ACL Anthology Network (Radev et al., 2013). For knowledge extraction we apply our previous system trained for the NLP domain (Luan et al., 2018). But the results are much less satisfactory compared to the biomedical domain. Due to the small size of data, the language model is not able to effectively copy out-of-vocabulary words and thus the output is often too generic. For example, given a title "Statistics based hybrid approach to Chinese base phrase identification", PaperRobot generates a fluent but uninformative abstract "This paper describes a novel approach to the task of Chinese-base-phrase identification. We first utilize the solid foundation for the Chinese parser, and we show that our tool can be easily extended to meet the needs of the sentence structure.". Moreover, compared to the biomedical domain, the types of entities and relations in the NLP domain are rather coarse-grained, which often leads to inaccurate prediction of related entities. For example, for an NLP paper title "Extracting molecular binding relationships from biomedical text", PaperRobot mistakenly extracts "prolog" as a related entity and generates an abstract "In this paper, we present a novel approach to the problem of extracting relationships among the prolog program. We present a system that uses a macromolecular binding relationships to extract the relationships between the abstracts of the entry. The results show that the system is able to extract the most important concepts in the prolog program.".

Related Work

Link Prediction. Translation-based approaches (Nickel et al., 2011; Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015; Ji et al., 2015a) have been widely exploited for link prediction. Compared with these studies, we are the first to incorporate multi-head graph attention (Sukhbaatar et al., 2015; Madotto et al., 2018; Veličković et al., 2018) to encourage the model to capture multi-aspect relevance among nodes. Similar to (Wang and Li, 2016; Xu et al., 2017), we enrich entity representation by combining the contextual sentences that include the target entity and its neighbors from the graph structure. This is the first work to incorporate new idea creation via link prediction into automatic paper writing.

Knowledge-driven Generation. Deep neural networks have been applied to generate natural language to describe structured knowledge bases (Duma and Klein, 2013; Konstas and Lapata, 2013; Flanigan et al., 2016; Hardy and Vlachos, 2018; Pourdamghani et al., 2016; Trisedya et al., 2018; Xu et al., 2018; Madotto et al., 2018; Nie et al., 2018), biographies based on attributes (Lebret et al., 2016; Chisholm et al., 2017; Kaffee et al., 2018; Wang et al., 2018a; Wiseman et al., 2018), and image/video captions based on background entities and events (Krishnamoorthy et al., 2013; Lu et al., 2018). To handle unknown words, we design an architecture similar to pointer-generator networks (See et al., 2017) and the copy mechanism (Gu et al., 2016). Some interesting applications include generating abstracts based on titles for the natural language processing domain (Wang et al., 2018b), and generating a poster (Qiang et al., 2016) or a science news blog title (Vadapalli et al., 2018) about a published paper. This is the first work on automatic writing of key paper elements for the biomedical domain, especially conclusion and future work, and follow-on paper titles.

Conclusions and Future Work
We build a PaperRobot who can predict related entities for an input title and write some key elements of a new paper (abstract, conclusion and future work) and predict a new title. Automatic evaluations and human Turing tests both demonstrate her promising performance. PaperRobot is merely an assistant to help scientists speed up scientific discovery and production. Conducting experiments is beyond her scope, and each of her current components still requires human intervention: constructed knowledge graphs cannot cover all technical details, predicted new links need to be verified, and paper drafts need further editing. In the future, we plan to develop techniques for extracting entities of more fine-grained entity types, and extend PaperRobot to write related work, predict authors, their affiliations and publication venues.