Will this Idea Spread Beyond Academia? Understanding Knowledge Transfer of Scientific Concepts across Text Corpora

What kind of basic research ideas are more likely to get applied in practice? There is a long line of research investigating patterns of knowledge transfer, but it generally focuses on documents as the unit of analysis and follows their transfer into practice for a specific scientific domain. Here we study translational research at the level of scientific concepts for all scientific fields. We do this through text mining and predictive modeling using three corpora: 38.6 million paper abstracts, 4 million patent documents, and 0.28 million clinical trials. We extract scientific concepts (i.e., phrases) from the corpora as instantiations of "research ideas", create concept-level features motivated by the literature, and then follow the trajectories of over 450,000 new concepts (which emerged from 1995 to 2014) to identify the factors that lead only a small proportion of these ideas to be used in inventions and drug trials. Results from our analysis suggest several mechanisms that distinguish which scientific concepts will be adopted in practice and which will not. We also demonstrate that our derived features can be used to explain and predict knowledge transfer with high accuracy. Our work provides a greater understanding of knowledge transfer for researchers, practitioners, and government agencies interested in encouraging translational research.


Introduction
Science generates a myriad of new ideas, only some of which find value in practical uses (Backer, 1991; Lane and Bertuzzi, 2011). Large government agencies (e.g., NSF, NIH) pour billions of dollars into basic research in the hopes that it will span the research-practice divide so as to generate private sector advances in technologies (Narin and Noma, 1985), social policies (McDonald and Mair, 2010), and pharmaceuticals (Berwick, 2003). To this end, these agencies increasingly seek to nurture "translational research" that succeeds at extending, bridging and transforming basic research so it finds greater applied value (Li et al., 2017). Surrounding this effort has arisen a line of research that tries to identify when, where, and how academic research influences science and technological invention (Backer, 1991; Li et al., 2017).
Figure 1: An illustration of a scientific concept's "knowledge transfer" from basic research to practical use: we analyze each concept's time-varying features (e.g., popularity) and relative positions with respect to other concepts (i.e., co-occurrence) to understand the key mechanisms behind knowledge transfer, using Web of Science research papers, USPTO patents and clinical trial documents.
* Equal contribution
However, prior research efforts are limited in their ability to understand and facilitate the translation of research ideas. This is partially due to a shortage of data, a biased focus on successful examples, and specialized modeling paradigms. In practice, only a small proportion of knowledge outputs are successfully translated into inventive outputs (∼2.7% of concepts from WoS to patents, and ∼11.3% of concepts from WoS to clinical trials, according to our data analysis). Previous studies conduct post-hoc analyses of successful scientific-technological linkages, but are unable to explain why the majority of scientific innovations do not transfer into technological inventions. Additionally, prior work mostly looks at document-level linkages across research and applied domains, i.e., citations from patents to research papers or inventors shared across them, rather than diving into the document content where ideas are discussed (Narin and Noma, 1985; Ahmadpoor and Jones, 2017). Documents entail many ideas, and linkages across them only loosely capture which intellectual innovation is in focus and being transferred. By contrast, we conceptualize knowledge transfer in terms of scientific concepts, rather than documents associated with particular desirable outcomes, and demonstrate the importance of our derived features for knowledge transfer through machine learning models on a large-scale original dataset.
In this paper, we focus on studying patterns behind knowledge transfer from academia to practice. We use "knowledge transfer from academia to practice" in our study to mean a "concept's transfer from research papers to patent documents/clinical trials", that is, a concept that first appears in academia and is later used with a non-trivial frequency (decided by a pre-defined threshold) in practical outlets (patents, clinical trials).¹ In scientific writing, a scientific concept is a term or set of terms that has semantically coherent usage and reflects scientific entities (e.g., curricula, tools, programs, ideas, theories, substances, methods, processes, and propositions), which are argued to be the basic units of scientific discovery and advance (Toulmin). We use the titles and abstracts of 38.6 million academic publications from the Web of Science (WoS) to identify 0.45 million new scientific concepts emerging between 1995 and 2014 through state-of-the-art phrase mining techniques (AutoPhrase), and follow their trajectories in 4 million patent documents of the United States Patent and Trademark Office (USPTO) and 0.28 million clinical trials from the U.S. National Library of Medicine.
In our analysis, we compare the properties of new scientific concepts that successfully transfer into patents with those that do not. We find that (a) the intrinsic properties of ideas and their temporal behavior, and (b) the relative position of the ideas are the two mechanisms that determine whether an idea can transfer successfully. In particular, we find that new engineering-focused scientific concepts situated in emotionally positive contexts are more likely to transfer than other concepts. Furthermore, increased scientific hype and adoption across scientists, as well as usage in interdisciplinary venues over time, are early signs of impending knowledge transfer into technological inventions. Finally, we find that new concepts positioned close to concepts that have already transferred into patents are far more likely to transfer than their counterparts. Based on the derived features, we further built a model to predict the likelihood of knowledge transfer from papers to patents/clinical trials at the individual concept level, and demonstrated that our derived features achieve strong performance, indicating that our proposed features can explain the majority of knowledge transfer cases.
¹ We use knowledge transfer, concept transfer and idea transfer interchangeably throughout the paper.
Table 1: Samples of transferred and non-transferred concepts extracted from WoS and USPTO: Internet, world wide web, ethnographic exploration, interactive visualization, web server, immersive virtual reality, gpu, recombinant protein production, european maize, hcci engine, cloud service, institutional demand, artificial magnetic conductor, automatic imitation, multifunctional enzym, network reorganization, tissue remodeling, human capital, single photon detector, amercian theatre.
Contributions Our main contributions are summarized as follows: (1) To the best of our knowledge, we present the first ever research that aims at understanding knowledge transfer at a large scale, using multiple corpora.
(2) We are the first to leverage text mining techniques to understand transfer at the level of scientific concepts, rather than at the document level.
(3) We systematically analyzed the differences between transferable and non-transferable concepts, and identified the key mechanisms behind knowledge transfer. We showed our derived insights can help explain and predict knowledge transfer with high accuracy.

Data Preparation and Processing
In this section we introduce the dataset used in our study (Sec. 2.1), and present the concept extraction process (Sec. 2.2). More details of the datasets are given in Appendix A. Note that our study inevitably suffers from data bias. For instance, not all practitioners will patent their ideas or file clinical trials, and some clinical trials and patents are unused; thus our approach will produce some false positives and false negatives among the 'transferred' labels. Yet so far patents and clinical trials have been demonstrated to be the best proxies for studying translational science from research to practice (Ahmadpoor and Jones, 2017). Moreover, we have tried to mitigate such bias by investigating transfer patterns in both patent-heavy and patent-light fields, where we found very similar patterns emerge.

Scientific Concept Extraction
Using titles and abstracts of articles, patents and clinical trials, we employ the phrase detection technique AutoPhrase (Shang et al., 2018) to identify key concepts in these corpora and trace their emergence and transfer across domains over time. Phrase detection identifies 1,471,168 concepts for research papers, 316,442 concepts for patents, and 112,389 concepts for clinical trials. Some samples of transferred and non-transferred concepts extracted from WoS and USPTO by phrase detection are shown in Table 1. We observe that phrase detection results in high-quality concepts (92% are labelled as high quality in our evaluation) that are suitable for investigating knowledge transfer across domains. Details of the phrase detection techniques, cleaning and evaluation are further discussed in Appendix B.
New Concept Identification. The focus of this study is on new concepts and their careers. However, our sample of 1.5 million distinct concepts can occur at any time in the corpus: some emerged long ago and others more recently. To avoid the left-censoring issue (certain concepts appear before the start time of the recorded data, so we do not fully observe their behavior) and to identify 'real' new concepts, we aggregate (or "burn in") the set of concepts over time, and count the number of new concepts that arrive each year. Early papers (starting in 1900) introduce many new concepts, but this quickly decelerates by around 1995, after which the vocabulary grows linearly (see Fig. 2 (a)). To identify that point, we aggregate the set of concepts every year with prior years until the rate at which new concepts are introduced is approximately linear and stable. This point occurs after 1995, when 0.45 million scientific concepts are left. We then follow knowledge transfer via these new scientific concepts, and find that only ∼2.7% of all concepts get transferred to patents, and only ∼11.3% of bio & health concepts get transferred to clinical trials across years. The number of transferred concepts each year from WoS to USPTO is illustrated in Fig. 2
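The burn-in step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(year, concepts)` document layout, the function names, and the hand-picked burn-in boundary are hypothetical and not taken from the paper's pipeline.

```python
from collections import Counter

def first_appearance_years(docs):
    """Map each concept to the earliest year in which it appears.

    `docs` is an iterable of (year, concepts) pairs (hypothetical layout).
    """
    first_year = {}
    for year, concepts in docs:
        for concept in concepts:
            if concept not in first_year or year < first_year[concept]:
                first_year[concept] = year
    return first_year

def new_concepts_per_year(first_year):
    """Count how many concepts are seen for the first time in each year."""
    return Counter(first_year.values())

def select_new_concepts(first_year, burn_in_end, last_year):
    """Keep concepts whose first appearance falls in [burn_in_end, last_year]."""
    return {c for c, y in first_year.items() if burn_in_end <= y <= last_year}

# Toy corpus: concepts first seen before the burn-in boundary are dropped.
docs = [
    (1990, ["gene expression", "neural network"]),
    (1996, ["neural network", "web server"]),
    (2001, ["cloud service", "web server"]),
]
first_year = first_appearance_years(docs)
per_year = new_concepts_per_year(first_year)
new_concepts = select_new_concepts(first_year, 1995, 2014)
print(sorted(new_concepts))
```

In practice the boundary would be chosen by inspecting the yearly counts for the point where growth becomes roughly linear, rather than being fixed in advance as in this toy example.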

Feature Creation and Analysis
Based on the concepts extracted from research papers, patents and clinical trial documents, we first create concept-level features motivated by prior literature on knowledge diffusion, and then present a large-scale data analysis of transferred and non-transferred concepts to better understand the properties that facilitate the knowledge transfer process. Here we present transfer patterns from research papers to patents and omit clinical trials due to the page limit.

Intrinsic Properties of Concepts
Motivated by previous work on knowledge diffusion and transfer, we extracted intrinsic concept features that are most likely to facilitate a scientific idea's transfer into technological inventions. They can be classified into four categories: 1) hype features (Latour, 1987; Rossiter, 1993), 2) bridge positioning features (Shi et al., 2010; Kim et al., 2017), 3) ideational conditions (Berger and Heath, 2005), and 4) technological resonance (Narin and Noma, 1985). The four sets of features characterize individual concepts from diverse angles and, as we will show, capture the differences between transferred and non-transferred concepts, both in mean value (Appendix E) and in temporal behavior.
To illustrate concepts' temporal behavior, we plot the feature curves of transferred and non-transferred concepts over concept age. Details can be found in Figs. 3-8.
Hype. This group of features draws on prior work concerning concept hype (Latour, 1987; Acharya et al., 2014; Larivière et al., 2014). We include two features: the adopter size of the concept, and the degree to which authors repeatedly use the concept. We measure adopter size as the total number of authors who employ a concept in a particular year, and author repeated usage as the total number of previous authors who continue to use the concept. We found that transferred concepts generally have more adopters and more repeated usage. Furthermore, transferred concepts attract adopters at a faster rate than non-transferred concepts. We also found that transferred concepts are reused much more often by previous authors when controlling for concept age. Moreover, we observe that the gap in 'hype' features between transferred and non-transferred concepts widens over time, possibly due to the preferential attachment effect (Newman, 2001).
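As a rough illustration of the two hype features, the sketch below counts distinct authors per concept and year (adopter size) and how many of them had already used the concept in an earlier year (repeated usage). The `(concept, year, author)` record layout and the function name are assumptions made for this example.

```python
from collections import defaultdict

def hype_features(usages):
    """Return {(concept, year): (adopter_size, repeated_usage)}.

    `usages` is an iterable of (concept, year, author) records (hypothetical layout).
    adopter_size   = number of distinct authors using the concept in that year
    repeated_usage = number of those authors who also used it in an earlier year
    """
    authors_by_year = defaultdict(set)
    for concept, year, author in usages:
        authors_by_year[(concept, year)].add(author)

    features = {}
    earlier_authors = defaultdict(set)  # concept -> authors seen in previous years
    # Sorting the (concept, year) keys visits each concept's years in ascending order.
    for concept, year in sorted(authors_by_year):
        current = authors_by_year[(concept, year)]
        repeated = current & earlier_authors[concept]
        features[(concept, year)] = (len(current), len(repeated))
        earlier_authors[concept] |= current
    return features

usages = [
    ("gpu", 2000, "alice"), ("gpu", 2000, "bob"),
    ("gpu", 2001, "alice"), ("gpu", 2001, "carol"), ("gpu", 2001, "carol"),
]
print(hype_features(usages))
```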
Bridge Positioning. This group of features identifies the disciplinary placement of concepts. Previous work argues that knowledge transfer is facilitated when ideas are placed at the boundary of fields and in fields especially relevant to technological invention (Shi et al., 2010). Here we include two features in this group: discipline diversity and engineering relation. Discipline diversity is computed as a concept's average entropy across NRC discipline subject codes (sociology, math, economics, etc.), and engineering relation is computed as the proportion of engineering fields among all the fields using the concept.
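A minimal sketch of both features, assuming discipline diversity is the Shannon entropy of a concept's usage distribution over discipline codes (the averaging step mentioned above is not modeled here) and that the set of engineering fields is supplied by the caller. The field names are illustrative only.

```python
import math
from collections import Counter

def discipline_entropy(field_counts):
    """Shannon entropy (natural log) of a concept's usage across disciplines."""
    total = sum(field_counts.values())
    if total == 0:
        return 0.0
    probs = [n / total for n in field_counts.values() if n > 0]
    return -sum(p * math.log(p) for p in probs)

def engineering_relation(field_counts, engineering_fields):
    """Share of engineering fields among all fields that use the concept."""
    used = [f for f, n in field_counts.items() if n > 0]
    if not used:
        return 0.0
    return sum(f in engineering_fields for f in used) / len(used)

counts = Counter({"sociology": 5, "mechanical engineering": 5})
print(discipline_entropy(counts))  # equals log(2) for two equally used fields
print(engineering_relation(counts, {"mechanical engineering"}))
```

A concept used in a single discipline has entropy 0, while a concept spread evenly over many disciplines approaches the maximum, log of the number of disciplines.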
We found transferred concepts are more likely to be used in interdisciplinary and engineering venues. Moreover, transferred concepts gained greater interdisciplinary attention over time compared to non-transferred concepts, as shown in Fig. 4. The finding is consistent with the assumption that transferred concepts are likely to reach a more diverse audience than non-transferred concepts. Engineering-focused concepts also achieved a higher knowledge transfer rate, which supports our hypothesis that knowledge transfer is facilitated when ideas are placed at the boundary of fields especially relevant to technological invention, such as engineering (e.g., mechanical engineering). Once again, we observed that the difference in 'bridge positioning' feature values between transferred and non-transferred concepts increases over time.
Ideational Conditions. This group of features represents the semantic context and expression of a concept. How the concept is related to other concepts and the style in which the concept is expressed can both influence the diffusion and transfer process (Hamilton et al., 2016). Here we select emotionality and accessibility in this group, and calculate them through LIWC and the Dale Chall metric (details in Appendix C).
We found transferred concepts are embedded in more emotional contexts and described in more difficult language, compared to their non-transferred counterparts. In a similar way, we plot ideational condition features over time for transferred and non-transferred concepts in Fig. 5. We found that transferred concepts were consistently placed in increasingly positive contexts and conveyed in more difficult language over time, compared to non-transferred concepts, although the accessibility gap decreases over time.
Technological Resonance. This group of features quantifies the extent to which a concept is established within an environment conducive to linking scientific publications with patents and other outcomes (Narin and Noma, 1985; Tijssen, 2001). We measure this as journal linkage and university-industry relationship in our study. Journal linkage is computed as the percentage of journals in which the concept is situated that have been cited by patents before. University-industry relationship is calculated as the proportion of industry-affiliated authors out of all the authors employing the term each year. If a scientific concept sits in a high bridging space like these, it will more likely transfer. Transferred concepts are more likely to be mentioned in journals that have been cited by patents, and this relationship strengthens over time. We also find that if a concept is associated with more industry-affiliated authors, the concept has a higher potential to transfer. While the gap in the percentage of industry-affiliated authors between transferred and non-transferred concepts remains relatively stable, the gap in journal linkage grows over time.
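The two technological resonance features reduce to simple proportions. The sketch below assumes the inputs are already available as plain Python collections (a list of journals per concept, a set of journals previously cited by patents, and a per-author industry flag); these data structures and names are hypothetical.

```python
def journal_linkage(concept_journals, patent_cited_journals):
    """Share of the concept's distinct journals that patents have cited before."""
    journals = set(concept_journals)
    if not journals:
        return 0.0
    return len(journals & set(patent_cited_journals)) / len(journals)

def industry_share(author_is_industry):
    """Share of industry-affiliated authors among a concept's authors in one year.

    `author_is_industry` maps author id -> True if industry-affiliated.
    """
    if not author_is_industry:
        return 0.0
    return sum(author_is_industry.values()) / len(author_is_industry)

print(journal_linkage(["J1", "J2", "J2", "J3"], {"J2", "J9"}))
print(industry_share({"a": True, "b": False, "c": False, "d": True}))
```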

Relative Position in Concept Co-occurrence Graph
In addition to the above features, we investigate the same data with a relational approach (Hofstra et al., 2019). Intuitively, how a concept is positioned and co-used with other concepts may be associated with knowledge transfer. As a motivating example, we plot the local co-occurrence network of the concept search engine in Fig. 7, where the orange nodes denote the transferred neighbor concepts and the blue nodes denote the non-transferred ones. Search engine first emerged in WoS in 1992 and entered USPTO in 1998. Notably, the percentage of its transferred neighbors increased rapidly right before 1998, which suggests that the neighboring concepts co-used with a concept may carry useful signals that explain concept transfer. This consistency between co-occurrence network and transfer status is also common for other concepts.
Figure 7: Illustration of the dynamic graphs that capture interactions between search engine and its co-occurring concepts. The orange circles denote transferred concepts while the blue circles denote non-transferred ones; the circle size represents the node degree.
To facilitate analysis, we construct a dynamic graph G for concept co-occurrence. Each node in the graph denotes a concept that has occurred in the corpus. Each edge between two nodes indicates that the two concepts co-occur in at least one document in the corpus, and we define the edge weight as the number of documents in which the two concepts co-occur. We sort all documents by year and construct a graph at each time-stamp, which yields a set of graphs $\{G\} = \{G^{(1)}, \cdots, G^{(t)}\}$ as the dynamic concept co-occurrence graph. This set of graphs reflects how concepts' neighbors change over time and provides us with extra temporal information on local graph structures. Based on the dynamic concept co-occurrence network, we derived two graph features: weighted degree and weighted percentage of transferred neighbors, as specified in Appendix D.
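The graph construction above can be sketched as follows. The example assumes each yearly graph is a snapshot built only from that year's documents (whether the original graphs are snapshots or cumulative is not stated here), and the `(year, concepts)` input layout is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_dynamic_graphs(docs):
    """Return {year: {(u, v): weight}} with u < v, one graph per year.

    `docs` is an iterable of (year, concepts) pairs (hypothetical layout).
    The weight of edge (u, v) is the number of documents in that year
    in which both concepts occur.
    """
    graphs = defaultdict(lambda: defaultdict(int))
    for year, concepts in docs:
        # Deduplicate within a document so each document counts once per pair.
        for u, v in combinations(sorted(set(concepts)), 2):
            graphs[year][(u, v)] += 1
    return {year: dict(edges) for year, edges in sorted(graphs.items())}

docs = [
    (2000, ["search engine", "web server", "internet"]),
    (2000, ["web server", "search engine"]),
    (2001, ["search engine", "internet"]),
]
for year, edges in build_dynamic_graphs(docs).items():
    print(year, edges)
```

Storing each undirected edge once under a sorted pair keeps the weights symmetric without needing a graph library.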
The curves of the two features over time are shown in Fig. 8(a) and Fig. 8(b). We find that transferred concepts indeed have higher weighted degrees and higher weighted percentages of transferred neighbors than non-transferred ones, which indicates the importance of utilizing concept co-occurrence for knowledge transfer prediction.

Field Comparison & Feature Correlation
We further carried out analysis on feature correlation, and comparison across fields, which is discussed in detail in Appendix F and Appendix G.
Summary. Results of our data analysis support the conclusion that knowledge transfer does not happen by chance but follows specific patterns. Whether a concept will transfer from research to practice in the immediate future depends largely on its (a) individual properties over time, and (b) relative position with respect to other concepts.

Predictive Analysis of Features
So far we have systematically analyzed the potential factors that reflect the process of knowledge transfer from research to practice. But how well can these features explain and predict knowledge transfer in practice? In this section, we seek to shed light on this question through predictive analysis.

Prediction Task Formulation
Will a scientific concept transfer from academic papers to patent documents in the next X years? Here we consider the predictive task of predicting a concept's transfer status given all observed historical data. As there are only two potential outcomes (either the concept transfers or it does not), the proposed prediction task is essentially a binary classification problem. We label a concept as transferred if it first originates in research papers and is later used at least 5 times in practical outlets (patents, clinical trials) within X years after the concept's birth in research papers.
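The labeling rule can be written as a small function. This is a sketch under assumptions: each use in a practical outlet is represented by the year of the document, the window is inclusive on both ends, and the argument names are hypothetical.

```python
def transfer_label(birth_year, practice_use_years, window, min_uses=5):
    """Return 1 if the concept counts as transferred, else 0.

    birth_year         : year the concept first appears in research papers
    practice_use_years : one entry per use in patents or clinical trials (year of use)
    window             : the X-year horizon after birth (inclusive bounds assumed)
    min_uses           : usage threshold; 5 in the task definition above
    """
    uses = sum(1 for y in practice_use_years if birth_year <= y <= birth_year + window)
    return int(uses >= min_uses)

print(transfer_label(2000, [2001, 2001, 2002, 2003, 2003], window=3))
print(transfer_label(2000, [2001, 2001, 2002, 2003, 2003], window=2))
```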
We denote all $N$ concepts' time-series attributes at one particular time-stamp as $X \in \mathbb{R}^{N \times N_x}$, where $N_x$ is the dimension of attributes. As shown in Fig. 9, the goal of the transfer prediction problem is to construct a function $f(\cdot)$ mapping historical time-series attributes to the future transfer probability of each concept, $P(y^{(t)} \mid X^{(t-k)}, \dots, X^{(t-1)}) = f(X^{(t-k)}, \dots, X^{(t-1)})$, where $P(\cdot)$ is the conditional probability and $k$ is the input history length. $y^{(t)}_i$ denotes the transfer status of concept $i$ in the next $T$ years, i.e., the ground truth label $y^{(t)}_i$ is 1 if the concept transfers in years $t$ to $t + T - 1$ and 0 otherwise, where $T$ denotes the prediction window length. We refer to $t$ as the cutoff year: the model takes the attributes prior to this time-stamp as input and predicts the future transfer probability. For simplicity, we write the predicted probability for concept $i$ as $P(y^{(t)}_i)$. Accordingly, if the true transfer status is $y^{(t)}_i$, the loss function for cutoff year $t$ is
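For a binary classifier that outputs a probability, the usual training objective is the binary cross-entropy. Assuming that is the loss intended here (an assumption, since the expression is not reproduced in this text), and writing $p^{(t)}_i$ as shorthand introduced only for this sketch, the loss for cutoff year $t$ would read:

```latex
\begin{aligned}
p^{(t)}_i &= P\left(y^{(t)}_i = 1 \mid X^{(t-k)}, \dots, X^{(t-1)}\right), \\
\mathcal{L}^{(t)} &= -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y^{(t)}_i \log p^{(t)}_i
       + \left(1 - y^{(t)}_i\right) \log\left(1 - p^{(t)}_i\right) \right].
\end{aligned}
```

The $1/N$ normalization is a common convention and is likewise an assumption; dropping it rescales the loss without changing its minimizer.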

Prediction Models
Feature based Model. We use logistic regression (LR) as an interpretable model. To better validate our findings, we also run a mixed effects logistic regression (detailed in Appendix I), a form of Generalized Linear Mixed Model (GLMM), to help explain variance both within and across concepts. The results from the mixed effects logistic regression are nearly identical to our findings from the vanilla logistic regression, except for slight changes in the magnitude of coefficients, so we only report the performance of LR in our analysis.
Deep Sequence Model. To model a concept's temporal features, i.e., time-series attributes, we further propose RNN sequence models. According to Sec. 3, some time-series features are strongly related to potential transfer; therefore, we adopt Recurrent Neural Network (RNN) models (e.g., LSTM and GRU), which are built to capture temporal dependencies (details in Appendix H).

Experiments
Experiment Set-up. We apply Z-score normalization to the time-series attributes and divide the dataset into training and test sets as Fig. 9 shows. Given a test cutoff year t, we first ensure that the prediction intervals (red line in Fig. 9) of the training and test sets do not overlap, to avoid data leakage, and then use the latest three cutoff years as training cutoff years.
Evaluation Metric. We adopt area-under-curve (AUC) as the evaluation metric, which is not affected by class imbalance in the test set.
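Both preprocessing and evaluation steps are easy to illustrate with the standard library. The sketch below assumes AUC means the area under the ROC curve, computed through its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative); the function names are illustrative.

```python
import statistics

def z_score(values):
    """Standardize a feature to zero mean and unit (population) variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

print(z_score([1.0, 2.0, 3.0]))
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

Because the pairwise form depends only on how positives rank against negatives, adding more negatives with the same score distribution leaves the value essentially unchanged, which is why the metric is robust to class imbalance. The quadratic pair loop is fine for illustration but too slow for corpora of this size, where a sort-based implementation would be preferable.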

Results
We first compared the performance of all aforementioned models for cutoff year 2008 on two datasets: from WoS to patents, and from WoS bio & health science papers to clinical trials.⁵ For each cutoff year, we ran two sets of experiments with training history lengths of 3 and 5 years and repeated each experiment 3 times. The performance (mean AUC) is summarized in Table 2.
Patent vs. Clinical Trial. As a robustness check, we tested our model on knowledge transfer both from WoS to patents and from WoS to clinical trials. We obtained consistent main attribute importance results on the clinical trial data.
As can be observed from Table 2, our derived features achieve good results, i.e., an AUC of 0.80, in predicting knowledge transfer, demonstrating that knowledge transfer can be largely explained by our proposed mechanisms.
Study of Feature Importance. In Fig. 10, we further plot the standardized coefficients of each temporal feature from the logistic regression to understand how a specific attribute contributes to knowledge transfer. We observed that author repeated usage, adopter size, and weighted graph degree are the three most important factors influencing knowledge transfer.
⁵ Note that for knowledge transfer to clinical trials, we excluded bridge positioning features since we focused on bio & health science only.
Next, we studied feature importance in our proposed models by running both models with different sets of features on knowledge transfer from WoS to patents. The results are summarized in Table 3. As reflected by the experimental results with the RNN model, graph features achieve the best prediction results compared to other feature sets, followed by "bridge positioning" features, "ideational conditions", and "hype" features, suggesting that the relative position of the concept in the semantic network is the single most useful feature set for explaining concept transfer.

Study on Field Difference
We studied the prediction performance of the proposed model in different fields. We partitioned the concepts used in Web of Science by field, and trained and tested models separately using 5 years of historical data as training inputs, with training cutoff year 2003 and test cutoff year 2008, for next-3-year prediction. We observed that it is easiest to predict knowledge transfer from academia to practice in the humanities (AUC 0.973), followed by physical & math science (AUC 0.791), bio & health science (AUC 0.783), engineering (AUC 0.782), social science (AUC 0.706) and agriculture (AUC 0.633), which indicates that our proposed mechanisms can explain knowledge transfer quite well in most fields other than agriculture.

Sensitivity Analysis
Finally, we tested our proposed models under different settings on WoS to patent. We investigated whether our proposed transfer model is influenced by 1) varying the length of historical observations, 2) varying the prediction time window, and 3) varying the cutoff year.
1. Length of Observation History. Fig. 11 demonstrates the effect of historical observation length on performance, where we selected 1 year, 2 years, 3 years, 5 years and 7 years of observation before cutoff year 2008 as training sets. We found that the longer the observation history, the better the transfer prediction results, which can be explained by the fact that longer observation better captures knowledge transfer patterns. We also note that performance starts to plateau as the observation length grows, indicating that longer training histories provide only limited additional signal. All this indicates that knowledge transfer is most influenced by the behavior of concepts in the most recent few years.
Fig. 12 further illustrates the knowledge transfer prediction performance with prediction windows of 1 year, 3 years and 5 years, representing the cases of predicting whether a concept will transfer in the next 1 year, 3 years or 5 years, respectively. To compare them fairly, we fix both training and testing cutoff years to keep the time interval from training set to test set unchanged, which is different from the setting in previous experiments. As can be observed, prediction performance is consistently best when the prediction window is 1 year, indicating that it becomes increasingly difficult for our proposed mechanisms to capture long-term temporal patterns of knowledge transfer.

Cutoff Year.
We also tested our model with different cutoff years (i.e., 2008, 2009 and 2010), representing knowledge transfer prediction with different training and testing sets. As illustrated by Fig. 13, our model achieves consistent results, which further supports the generalizability of our proposed knowledge transfer mechanisms.

Related Work
Knowledge Diffusion and Transfer. Extensive studies have been dedicated to the diffusion of knowledge (Kuhn, 1962; Rogers Everett, 1995; Hallett et al., 2019), and to the transfer of knowledge from science to more applied domains like technology (Narin and Noma, 1985; Tijssen, 2001). The majority of these studies focus on identifying contributing factors to knowledge diffusion and transfer (Rossiter, 1993; Azoulay et al., 2010; Shi et al., 2010; Kim et al., 2017). However, this line of work falls short in that (a) it focuses primarily on successful / post-hoc knowledge diffusion and transfer, with little comparison of successful and unsuccessful transfer, and (b) it poorly specifies what idea is being transferred because it focuses entirely on the document / invention level. In contrast, we contribute by empirically investigating properties of knowledge transfer through large-scale data analysis at the concept level using text mining approaches, through which we not only verified existing findings, but also revealed the significance of knowledge co-occurrence and ideational context in shaping knowledge transfer.
Temporal Sequence Modelling. As one fundamental task in behavior modelling and NLP, numerous techniques for modelling and predicting temporal sequence have been proposed (Kurashima et al., 2018;Pierson et al., 2018). In recent years, leveraging recurrent neural network (RNN) (Mikolov et al., 2010)

Conclusion
In this paper, we systematically studied the process and properties of knowledge transfer from research to practice. Specifically, we used a sample of 38.6 million research papers, 4 million patents and 280 thousand clinical trials, where we leveraged AutoPhrase to extract concepts from text and focused on the applied career of nearly 450,000 new scientific concepts that emerged from 1995 to 2014. Through extensive analysis, we found that 'transferable' ideas distinguish themselves from 'non-transferable' ideas by (a) their intrinsic properties and temporal behavior, and (b) their relative position to other concepts. Through predictive analysis, we showed that our proposed features can explain the majority of transfer cases. Our research not only provides significant implications for researchers, practitioners, and government agencies as a whole, but also introduces a novel research question of real-world impact for computer scientists.

B Phrase Detection Techniques, and Evaluations
The phrase detection technique we adopted is AutoPhrase (Shang et al., 2018), a widely used method that extracts frequent and meaningful phrases through weak supervision. AutoPhrase first extracts single-word and multi-word expressions (i.e., phrases) from the text corpus as candidate concepts, and then applies salient concept selection functions to pick the most representative concepts for each document. Given a word sequence (e.g., a sentence in an abstract), phrase segmentation can partition the word sequence into non-overlapping segments, each representing a cohesive semantic unit as illustrated in the first step in. We used the default parameters suggested by Shang et al. (2018) in our study. We further cleaned the output of AutoPhrase to ensure the quality of the analyzed concepts. Specifically, we filtered out general phrases used in scientific writing (e.g., 'significantly important') and publisher names (e.g., 'Elsevier').
To quantitatively evaluate AutoPhrase for concept extraction, we randomly sampled 200 outputs and asked three experts to manually label whether they are good-quality concepts or not, where 184 (92%) are labelled as good-quality by all three experts.

C Calculations of Emotionality and Accessibility
Emotionality is computed as the percentage of words that were classified as either positive or negative where a concept is used. The number of positive and negative words in each article is counted by the Linguistic Inquiry and Word Count computer program (LIWC), which adopts a list of words classified as positive or negative by human readers beforehand (Pennebaker et al., 2015). We quantify accessibility through a variation of Dale Chall readability (Powers et al., 1958) by substituting the 'easy term list' with college student vocabulary. This widely used index variable essentially measures the difficulty or appropriateness of the writing for each article. We then weighted the average Dale Chall readability score of all the documents associated with a concept.
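To make the emotionality definition concrete, the sketch below computes the percentage of tokens that fall in a positive or negative word list, and, as one ingredient of a Dale Chall style score, the percentage of tokens outside an 'easy' vocabulary. The tiny word lists are hypothetical stand-ins (the LIWC dictionaries and the college vocabulary list are not reproduced here), and the tokenizer is deliberately simplistic.

```python
import re

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def emotionality(text, positive_words, negative_words):
    """Percentage of tokens classified as either positive or negative."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    emotional = sum(t in positive_words or t in negative_words for t in tokens)
    return 100.0 * emotional / len(tokens)

def difficult_word_share(text, easy_words):
    """Percentage of tokens that are not in the 'easy' vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 100.0 * sum(t not in easy_words for t in tokens) / len(tokens)

# Hypothetical stand-in lexicons for illustration only.
positive = {"promising"}
negative = {"harmful"}
easy = {"a", "method", "that", "side", "effects"}
text = "A promising method that avoids harmful side effects"
print(emotionality(text, positive, negative))
print(difficult_word_share(text, easy))
```

A full readability score would combine the difficult-word share with average sentence length; that combination is omitted here because the exact variant used is described only at a high level.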

D Calculations of Graph Features
Given co-occurrence graph G = {V, E, s, W } defined in subsection 3.2, the weighted degree d i and weighted percentage of transferred neighbors p i are calculated as follows.
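A plausible reading of the two features, stated here as an assumption because the formulas are not reproduced in this text, is $d_i = \sum_j w_{ij}$ and $p_i = \sum_j w_{ij} s_j / \sum_j w_{ij}$, where $w_{ij}$ is the co-occurrence weight and $s_j = 1$ if neighbor $j$ has already transferred. The sketch below implements that reading on an edge dictionary; the data layout is hypothetical.

```python
def graph_features(edges, transferred, node):
    """Return (weighted_degree, weighted_share_of_transferred_neighbors) for `node`.

    edges       : {(u, v): weight} for an undirected co-occurrence graph
    transferred : set of concepts that have already transferred
    """
    total_weight = 0
    transferred_weight = 0
    for (u, v), weight in edges.items():
        if node == u:
            neighbor = v
        elif node == v:
            neighbor = u
        else:
            continue  # edge does not touch `node`
        total_weight += weight
        if neighbor in transferred:
            transferred_weight += weight
    share = transferred_weight / total_weight if total_weight else 0.0
    return total_weight, share

edges = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2}
print(graph_features(edges, {"b"}, "a"))
```

In this toy graph, concept "a" has one transferred neighbor out of two, but that neighbor carries three of its four co-occurrences, so the weighted share is higher than the unweighted one.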
Different from unweighted features, weighted degree and weighted percentage use co-occurrence weights to stress the influence of high-frequency correlations. The edge weights are necessary especially when the central concept co-occurs with a large number of non-transferred concepts.
Table 4 illustrates the mean value of each attribute for transferred and non-transferred concepts, where we observe a statistically significant gap between the two groups.

F Field Comparison.
We studied transfer patterns in different fields. We identified the field of each concept as one of six disciplines (biology & health sciences, physical & math sciences, the humanities, engineering, agriculture, and the social sciences), based on the maximum TF-IDF value component of its field use frequency distribution. While different fields demonstrate distinct transfer rates from research to patent (engineering 7.5%, physical & math sciences 1.9%, the social sciences 1.1%, bio & health sciences 0.96%, agriculture 0.83% and the humanities 0.39%; in addition, 11.3% of concepts in bio & health sciences transferred to clinical trials), we found that the aforementioned features show consistent patterns across fields.

G Feature Correlation
We further studied the correlation between the extracted features. As illustrated in Fig.14

H Details of Temporal Feature Model
The RNN model maintains hidden states $h_x$ of the attributes. Assuming the concept transfer status is Markovian, the transfer probability $P(y)$ is computed from the latest hidden state. Here we adopt a GRU as the RNN and one fully connected layer with sigmoid activation as the classifier $g(\cdot)$.
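For reference, one standard formulation of a GRU followed by a sigmoid classifier is sketched below. The notation is introduced only for this illustration and is not taken from the paper: $x_\tau$ is a concept's attribute vector at time-stamp $\tau$ (with $\tau = t-k, \dots, t-1$), $h_\tau$ is the hidden state, and $W$, $U$, $b$, $w$ are learned parameters.

```latex
\begin{aligned}
z_\tau &= \sigma\left(W_z x_\tau + U_z h_{\tau-1} + b_z\right) && \text{(update gate)} \\
r_\tau &= \sigma\left(W_r x_\tau + U_r h_{\tau-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_\tau &= \tanh\left(W_h x_\tau + U_h \left(r_\tau \odot h_{\tau-1}\right) + b_h\right) && \text{(candidate state)} \\
h_\tau &= \left(1 - z_\tau\right) \odot h_{\tau-1} + z_\tau \odot \tilde{h}_\tau && \text{(new hidden state)} \\
P(y = 1) &= g\left(h_{t-1}\right) = \sigma\left(w^\top h_{t-1} + b\right) && \text{(classifier on the last state)}
\end{aligned}
```

Some libraries swap the roles of $z_\tau$ and $1 - z_\tau$ in the state update; the two conventions are equivalent up to relabeling the gate.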

I Details on Mixed Effect Logistic Regression
We ran a mixed effects logistic regression as a robustness check of the logistic regression. Mixed effects logistic regression is a form of Generalized Linear Mixed Model (GLMM). It accounts for both within-concept variation (how concept use changes) and between-concept variation (how concept use differs on average), whereas the single measure of residual variance in the vanilla logistic regression cannot account for both.

J Model Training and Hyperparameters.
To deal with the data imbalance problem (the positive samples, i.e., concepts that will transfer in the future, are far fewer than the negative ones), we over-sample positive samples in the training set until they equal the negative ones in number, while keeping the original distribution in the test set. The hidden state size of the RNN is set to 32. We experimented with different state sizes, and 32 achieved the best performance on the test set.
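Random over-sampling with replacement is one simple way to realize the balancing step described above. The sketch below is an illustration under that assumption (the exact resampling scheme is not specified); the function name and the fixed seed are choices made for this example.

```python
import random

def oversample_positives(samples, labels, seed=0):
    """Duplicate random positive samples until positives match negatives in number.

    Only the training set should be passed in; the test set keeps its original
    class distribution.
    """
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    if not positives or len(positives) >= len(negatives):
        return list(samples), list(labels)
    extra = [rng.choice(positives) for _ in range(len(negatives) - len(positives))]
    return list(samples) + extra, list(labels) + [1] * len(extra)

train_x = ["c1", "c2", "c3", "c4", "c5"]
train_y = [1, 0, 0, 0, 0]
balanced_x, balanced_y = oversample_positives(train_x, train_y)
print(balanced_y.count(1), balanced_y.count(0))
```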