Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text

Conditions are essential in scientific statements. Without the precisely specified conditions (e.g., equipment, environment), the facts (e.g., observations) in the statements may no longer be valid. Existing ScienceIE methods, which aim at extracting factual tuples from scientific text, do not consider the conditions. In this work, we propose a new sequence labeling framework (as well as a new tag schema) to jointly extract the fact and condition tuples from statement sentences. The framework has (1) a multi-output module to generate one or multiple tuples and (2) a multi-input module to feed in multiple types of signals as sequences. It yields a relative F1 improvement of 4.2% on BioNLP2013 and of 6.2% on a new bio-text dataset for tuple extraction.


Introduction
Conditions such as environment and equipment provide validation support for facts, while facts focus on scientific observations and hypotheses in the scientific literature (Miller, 1947). Existing ScienceIE methods, which extract (subject, relational phrase, object)-tuples from scientific text, do not distinguish the roles of fact and condition. Simply adding a tuple classification module has two weak points: (1) one tuple may have different roles in different sentences; (2) the tuples in one sentence are highly dependent on each other. For example, given a statement sentence in a biochemistry paper (Tomilin et al., 2016): "We observed that ... alkaline pH increases the activity of TRPV5/V6 channels in Jurkat T cells.", an existing system (Stanovsky et al., 2018) would return one tuple: (alkaline pH, increases, activity of TRPV5/V6 channels in Jurkat T cells), where (a) the object should just be the channel's activity and (b) the condition tuple (TRPV5/V6 channels, in, Jurkat T cells) was not found. Note that the term "TRPV5/V6 channels" is not only the concept in the fact tuple's object but also the condition tuple's subject.
Footnote 1: This work was done when the first author was visiting the University of Notre Dame.
In this work, we define the joint tuple extraction task as a multi-output sequence labeling problem. First, we create a new tag schema: non-"O" tags are formatted as "B/I-XYZ", where
• X ∈ {f: fact; c: condition};
• Y ∈ {1: subject; 2: relation; 3: object};
• Z ∈ {c: concept; a: attribute; p: relational phrase}.
Note that if Y="2" then Z="p". So, the number of non-"O" tags is 20. Now each fact/condition tuple can be represented as a tag sequence. Moreover, this is the first work in sequence labeling that separates concepts and attributes. The fact tuple in the example will ideally be: (alkaline pH, increases, {TRPV5/V6 channels : activity}).
Figure 1 shows our framework. Multiple tag sequences are generated after the LSTMd decoder, each of which represents a fact or condition tuple. This multi-output module has two layers: one is a relation name tagging layer that predicts the tags of relational phrases and determines the number of output sequences; the other is a tuple completion tagging layer that generates the tag sequences for completing the fact and condition tuples.
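As an illustration, the tag schema can be enumerated programmatically. The following Python sketch is not the authors' code: the names `build_tag_set` and `parse_tag` are hypothetical, and it assumes that subject and object slots (Y = 1 or 3) take either a concept or an attribute, which is what the stated count of 20 non-"O" tags implies.

```python
from itertools import product

ROLES = {"f": "fact", "c": "condition"}                        # X
SLOTS = {"1": "subject", "2": "relation", "3": "object"}       # Y
TYPES = {"c": "concept", "a": "attribute", "p": "relational phrase"}  # Z


def build_tag_set():
    """Enumerate every tag allowed by the schema, with "O" first."""
    tags = ["O"]
    for prefix, x, y in product("BI", ROLES, SLOTS):
        # A relation slot (Y = 2) only takes a relational phrase (Z = p).
        # Assumption: subject/object slots take a concept or an attribute.
        z_options = ["p"] if y == "2" else ["c", "a"]
        tags.extend(f"{prefix}-{x}{y}{z}" for z in z_options)
    return tags


def parse_tag(tag):
    """Split a tag such as "B-f2p" into readable components (None for "O")."""
    if tag == "O":
        return None
    prefix, xyz = tag.split("-")
    x, y, z = xyz
    return {"prefix": prefix, "role": ROLES[x], "slot": SLOTS[y], "type": TYPES[z]}


TAGS = build_tag_set()
NUM_NON_O = len(TAGS) - 1  # tags other than "O"
```

Counting the result gives 2 prefixes × 2 roles × 5 slot/type combinations, so `NUM_NON_O` matches the 20 non-"O" tags stated in the text, and `parse_tag("B-f2p")` describes the first token of a fact's relational phrase.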
To address the challenge of modeling the complex tag schema, besides a language model, we incorporate as much information as possible from upstream tools such as Part-of-Speech tagging (POS) and Concept detection, Attribute name extraction, and Phrase mining (CAP). We transform their outputs into tag sequences that serve as model input. We observe strong dependencies between a token's POS/CAP tags and its target tags. The high accuracy of existing techniques makes the multi-input sequences available for new datasets.
The multi-input multi-output sequence labeling framework is named MIMO. Experiments demonstrate that it improves the F1 score relatively by 6.2% over state-of-the-art models for tuple extraction on a new bio-text dataset that we introduce in the next section. When transferred to the BioNLP2013 dataset without additional training, it improves the F1 score relatively by 4.2%. We apply MIMO to a large set of 15.5M MEDLINE papers and construct a knowledge graph; an example can be found in Figure 4.

A New Dataset
We built a system with a GUI (Figure 2) to collect a new dataset for joint tuple extraction, named Biomedical Conditional Fact Extraction (BioCFE). Three participants (experts in the biomedical domain) manually annotated the fact and condition tuples in statement sentences from 31 paper abstracts in the MEDLINE database. The annotation procedure took over 30 minutes on average for each paper.
[Figure 2: the annotation GUI. Recoverable caption steps: (2) make slots for a new tuple; (3) drag spans into the slots; (4) save annotations.]
Here is a brief guide to the system. First, the users merged token(s) into spans. Second, they created a proper number of fact and/or condition tuple(s), where the number is not fixed but depends on the sentence. Each tuple has five slots (subject's concept, subject's attribute, relation phrase, object's concept, and object's attribute). Third, they dragged the spans into the slots. If the three annotations were inconsistent, we filtered out the case. Eventually we have 756 fact tuples and 654 condition tuples from 336 annotated sentences. It is common for one sentence to have multiple facts and/or conditions: 61% and 52% of the statement sentences have more than one fact tuple and more than one condition tuple, respectively.
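The five-slot tuple and the consistency filter can be sketched as a small data structure. This is a hedged illustration only: `SlotTuple` and `keep_sentence` are hypothetical names, treating "alkaline pH" as a single concept span is an assumption, and "inconsistent" is assumed here to mean that the annotators' tuple sets are not identical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlotTuple:
    """One annotated tuple with the five slots described in the text."""
    role: str                          # "fact" or "condition"
    subject_concept: Optional[str]
    subject_attribute: Optional[str]
    relation: str
    object_concept: Optional[str]
    object_attribute: Optional[str]


def keep_sentence(annotations):
    """Keep a sentence only if all annotators produced the same set of tuples.

    Assumption: exact set equality is the consistency criterion.
    """
    first = frozenset(annotations[0])
    return all(frozenset(other) == first for other in annotations[1:])


# The running example from the introduction.
fact = SlotTuple("fact", "alkaline pH", None, "increases",
                 "TRPV5/V6 channels", "activity")
condition = SlotTuple("condition", "TRPV5/V6 channels", None, "in",
                      "Jurkat T cells", None)

agree = keep_sentence([[fact, condition], [condition, fact], [fact, condition]])
disagree = keep_sentence([[fact], [fact, condition], [fact]])
```

Because the dataclass is frozen, tuples are hashable and can be compared as sets, so the order in which annotators created them does not matter.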

The Proposed Approach
Our approach has two modules: (1) a multi-input module that harnesses recent NLP development to process the text for input sequences from multiple tasks and feeds them into a multi-head encoder-decoder model with multi-input gates; (2) a multi-output module that generates multiple tuple tag sequences for fact and condition tuples, which consists of a relation name tagging layer and a tuple completion tagging layer, as shown in Figure 1.

The Multi-Input Module
Preprocessing for input sequences: The following fundamental NLP techniques have achieved high accuracy and require no additional training with labeled data: Language Model (LM) (Howard and Ruder, 2018), POS (Labeau et al., 2015), and CAP (Luan et al., 2018; Jiang et al., 2017; Shang et al., 2018; Wang et al., 2018a). For any given input sentence, we tokenize it and represent each token by its word embedding (a pre-trained GloVe vector in this paper). We then derive three more input sequences from the sentence using the three techniques above.
(1) A pre-trained LSTM-based language model takes the sentence as input and returns a semantic embedding sequence, in which the dependencies between a token and its predecessors in distant contexts are preserved.
(2) We employ the NLTK toolkit to generate the POS tag sequence for the given sentence. The POS tag sequence indicates syntactic patterns of the words in a sentence, which expose dependencies between POS tags and output tags, such as between verbs (e.g., VBD) and predicates (e.g., B-f2p).
(3) Multiple complementary IE techniques are used to detect concepts, attributes, and phrases in the given sentence; their outputs are merged into a CAP sequence. We make tags in the format of "B/I-c/a/p" for the tokens of concepts, attributes, and phrases.
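Step (3) can be sketched as a conversion from detected spans to a "B/I-c/a/p" tag sequence. This is a minimal sketch under stated assumptions: the function name `spans_to_cap_tags` is hypothetical, the spans below are made-up detections for the running example, and the merge policy (longer spans win when detections overlap) is an assumption, since the text does not specify how the tool outputs are merged.

```python
def spans_to_cap_tags(num_tokens, spans):
    """Turn (start, end_exclusive, kind) spans into a CAP tag sequence.

    kind is "c" (concept), "a" (attribute) or "p" (phrase).
    Assumed policy: longer spans are written first and block overlapping ones.
    """
    tags = ["O"] * num_tokens
    for start, end, kind in sorted(spans, key=lambda s: s[0] - s[1]):
        if any(tag != "O" for tag in tags[start:end]):
            continue  # overlaps a span that was already written
        tags[start] = f"B-{kind}"
        for i in range(start + 1, end):
            tags[i] = f"I-{kind}"
    return tags


tokens = ["alkaline", "pH", "increases", "the", "activity", "of",
          "TRPV5/V6", "channels", "in", "Jurkat", "T", "cells"]
# Hypothetical detections from upstream concept/attribute tools.
detected = [(0, 2, "c"), (4, 5, "a"), (6, 8, "c"), (9, 12, "c")]
cap_tags = spans_to_cap_tags(len(tokens), detected)
```

The resulting sequence has one tag per token and can be embedded and fed to the model alongside the word, LM, and POS sequences.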
Each sequence encodes a specific type of dependency. A combination of multiple types of dependencies captures the complicated dependencies on the 21 tuple tags better than any single type. LM learns the dependencies between a token and its predecessors in distant contexts, which helps predict the positions of subject, relation, and object. POS encodes the syntactic features of words, so dependencies between the POS tag and the tuple tag (e.g., "VBD" and "B-f2p") can be modeled. We also observe strong dependencies between the CAP tag and the tuple tag. For example, tokens with "B/I-c" (concept) and "B/I-a" (attribute) tags have a high probability of being labeled as "B/I-XYc" and "B/I-XYa" in the output sequences, respectively.
Multi-head Encoder-Decoder: We investigate two neural models as the encoder: one is a bidirectional LSTM (BiLSTM), the other is the well-known Bidirectional Encoder Representations from Transformers (BERT). We adopt an LSTM structure as the decoding layer (LSTMd) (Zheng et al., 2017). We observe that the input sequences may have different tag predictability on different sentences. For short sentences, POS and CAP are more useful (modeling local dependencies); for long sentences, LM is more effective (modeling distant dependencies). To secure the model's robustness on massive data, we apply a multi-head mechanism to the encoder-decoder model. Each head of the encoder-decoder is fed with one type of input sequence, and the heads are combined at the end of the decoder layer. Thus, the tag prediction becomes more stable than with a simple encoder-decoder without the multi-head mechanism.
Multi-input gates: We adopt the multi-input gates in ResNet (He et al., 2016) to make the most of the multi-input sequences. We add the gates to the input of the BiLSTM or BERT encoder, the input of the LSTMd decoder, and the multi-output module.

Multi-Output Module
We propose to generate multiple output sequences. Since annotating multiple tuples from one sentence is common, a token may have different expected tags in different tuples. On BioCFE, we observe that 93.8% of the statement sentences yield multiple tuples: 21.7% of the sentences have at least one token that appears in at least one fact tuple and at least one condition tuple, expecting tags "B/I-fYZ" and "B/I-cYZ"; 18.1% of the sentences have at least one token that appears in one condition tuple as a part of the subject and in another condition tuple as a part of the object, expecting tags "B/I-c1Z" and "B/I-c3Z". Therefore, we extend the typical one-output sequence labeling to a multi-output design.
Then what is the number of output sequences? We find that relation names play a significant role in forming tuples. Once the relation names are tagged, the module generates one output sequence per relation name: a fact sequence for each relation name whose tags begin with "B-f2p" and a condition sequence for each relation name whose tags begin with "B-c2p". Then we extract all possible tuples, whose relation has been specified, from every output sequence. Two observations on the annotated data support this idea. We transform each of the 1,410 tuples into a tag sequence. For the same sentence, if the tuples' relation names are the same, we merge their tag sequences into one and then use the matching function in (Stanovsky et al., 2018) to recover the tuples. First, no token has conflicting tags in any of the 240 merged sequences. Second, the recovery yields no missing or wrong tuples. So, generating one output sequence and completing the tuples per relation name is practical.
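The merge-and-check procedure behind these two observations can be sketched as follows. This is an illustrative sketch, not the authors' script: `merge_tag_sequences` is a hypothetical name, the "A and B increase C" sentence is invented for illustration, and the second example uses the "treatment group" sentence from the BioNLP2013 results with an assumed token-level tagging.

```python
def merge_tag_sequences(sequences):
    """Merge equal-length tag sequences position by position.

    Returns the merged sequence and the indices where two different
    non-"O" tags compete for the same token (conflicts).
    """
    merged, conflicts = [], []
    for i, column in enumerate(zip(*sequences)):
        labels = sorted({tag for tag in column if tag != "O"})
        if len(labels) > 1:
            conflicts.append(i)
        merged.append(labels[0] if labels else "O")
    return merged, conflicts


# Hypothetical sentence "A and B increase C": both fact tuples share "increase".
same_relation = [
    ["B-f1c", "O", "O", "B-f2p", "B-f3c"],   # (A, increase, C)
    ["O", "O", "B-f1c", "B-f2p", "B-f3c"],   # (B, increase, C)
]
merged, conflicts = merge_tag_sequences(same_relation)

# Tokens: blood vessels in the treatment group versus the controls
different_relations = [
    ["B-c1c", "I-c1c", "B-c2p", "O", "B-c3c", "I-c3c", "O", "O", "O"],  # relation "in"
    ["O", "O", "O", "O", "B-c1c", "I-c1c", "B-c2p", "O", "B-c3c"],      # relation "versus"
]
_, clash = merge_tag_sequences(different_relations)
```

In the first case the merged sequence is conflict-free, which is the situation reported for the 240 merged sequences. In the second case, "treatment group" is an object under one relation name and a subject under the other, so merging across relation names produces conflicts, which is why one output sequence is generated per relation name.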
The multi-output module has two layers: a relation name tagging layer and a tuple completion tagging layer.
Relation name tagging (RNT) layer: It consists of feed-forward neural networks (FFNs) and softmax layers. Decoded vectors are fed into the FFNs, and the softmax layers predict the probability distributions of tags for fact and condition, respectively:
$$p^f_i = \mathrm{softmax}(\mathrm{FFN}^f(d_i)), \quad (1)$$
$$p^c_i = \mathrm{softmax}(\mathrm{FFN}^c(d_i)), \quad (2)$$
where f is for fact and c for condition, and $d_i$ denotes the i-th token's vector given by the LSTMd. Now we have two tag sequences, one for fact and the other for condition. As argued for the one-output design, extracting tuples from the "two-output" sequences cannot resolve the tag conflicts either. Here we extract only the relation names: $\{r^f_1, r^f_2, \dots, r^f_n\}$ denotes the n relation names (beginning with the "B-f2p" tag) in fact tuples and $\{r^c_1, r^c_2, \dots, r^c_m\}$ denotes the m relation names (beginning with the "B-c2p" tag) in condition tuples.
Tuple completion tagging (TCT) layer: This layer predicts n fact tag sequences and m condition tag sequences. Each sequence is generated by an FFN and a softmax layer. The FFN obtains the relation name from the RNT layer. The FFN's input also includes the token's vectors from the encoder-decoder model of the multi-input module.
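Extracting the relation names from an RNT output, and thereby the numbers n and m of sequences the TCT layer must produce, can be sketched as a simple span scan. The helper name `relation_spans` and the tag sequences below are hypothetical; they show what a prediction for the running example might look like, not actual model output.

```python
def relation_spans(tags, role):
    """Return (start, end_exclusive) spans of relation names for one role.

    role is "f" (fact) or "c" (condition); a span starts at "B-{role}2p"
    and extends over the following "I-{role}2p" tags.
    """
    begin, inside = f"B-{role}2p", f"I-{role}2p"
    spans, i = [], 0
    while i < len(tags):
        if tags[i] == begin:
            j = i + 1
            while j < len(tags) and tags[j] == inside:
                j += 1
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


# Tokens: alkaline pH increases the activity of TRPV5/V6 channels in Jurkat T cells
fact_tags = ["B-f1c", "I-f1c", "B-f2p", "O", "B-f3a", "O",
             "B-f3c", "I-f3c", "O", "O", "O", "O"]
cond_tags = ["O", "O", "O", "O", "O", "O",
             "B-c1c", "I-c1c", "B-c2p", "B-c3c", "I-c3c", "I-c3c"]

fact_relations = relation_spans(fact_tags, "f")   # n = len(fact_relations)
cond_relations = relation_spans(cond_tags, "c")   # m = len(cond_relations)
```

Only the relation-name spans are kept from these two sequences; the subject and object tags are re-predicted per relation name by the TCT layer.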
Here we take condition sequences as an example to describe the details of the method. When predicting the j-th tag sequence, we define the position embedding $v^c_{i,j}$ of the i-th token, which represents its relative position to the j-th relation name's tag "B-c2p". The tag probability distribution of the i-th token in the j-th condition tag sequence is then predicted by the FFN and softmax layer with this position embedding as part of the input. Similarly, the tag distribution of the i-th token in the j-th fact tag sequence uses $v^f_{i,j}$, the position embedding of the i-th token in the j-th fact sequence, which represents the relative position to the relation name's tag "B-f2p".
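One common way to realize such a relative position signal is to map each token's offset from the relation name to an index into an embedding table. The sketch below illustrates only that idea: the function name `position_ids`, the clipping distance `max_dist`, and the use of a lookup table are assumptions, not details given in the text.

```python
def position_ids(num_tokens, relation_start, max_dist=10):
    """Map each token to an embedding index based on its relative position.

    The offset i - relation_start is clipped to [-max_dist, max_dist]
    (an assumed design choice) and shifted so that indices start at 0.
    """
    ids = []
    for i in range(num_tokens):
        offset = max(-max_dist, min(max_dist, i - relation_start))
        ids.append(offset + max_dist)
    return ids


# A 5-token sentence whose relation name starts at token 2.
ids_a = position_ids(5, 2, max_dist=3)
# A 6-token sentence whose relation name starts at token 0.
ids_b = position_ids(6, 0, max_dist=2)
```

Each index would select a row of a trainable embedding table with `2 * max_dist + 1` rows, giving the per-token vector that is combined with the decoder output when tagging the j-th sequence.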
Finally, we apply the matching function in (Stanovsky et al., 2018) to complete and extract the tuples (i.e., the concepts and/or attributes in the subjects and objects) for each output sequence.

Loss Function and Training
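Given a sentence s, the loss function of the relation name tagging layer can be written as:
$$\ell_{RNT}(s) = -\sum_{i=1}^{N_s} \left( \log p^f_{i,y^f_i} + \log p^c_{i,y^c_i} \right), \quad (6)$$
where $p^f_{i,y}$ and $p^c_{i,y}$ are the probabilities of predicting y as the tag of the i-th token in the fact and condition tag sequences, respectively; $y^f_i$ and $y^c_i$ are the observed tags of the i-th token in the fact and condition tuple, respectively; and $N_s$ is the length of the sentence s. The loss function of the tuple completion tagging layer consists of two parts, the loss on fact tuples and the loss on condition tuples:
$$\ell_{TCT}(s) = -\sum_{j=1}^{n} \sum_{i=1}^{N_s} \log p^f_{i,j,y^f_{i,j}} - \sum_{j=1}^{m} \sum_{i=1}^{N_s} \log p^c_{i,j,y^c_{i,j}}, \quad (7)$$
where n and m are the numbers of fact and condition tag sequences for the sentence s, respectively, and the additional index j refers to the j-th output sequence. The overall loss function for optimization is:
$$\ell = \sum_{s \in S} \left( \ell_{RNT}(s) + \ell_{TCT}(s) \right), \quad (8)$$
where S is the set of statement sentences.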
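The structure of these losses can be illustrated numerically. The sketch below assumes that both layers use a standard token-level negative log-likelihood, which matches the symbol definitions above but is an assumption about the exact form; the function names and the toy probabilities are hypothetical.

```python
import math


def sequence_nll(probs, gold):
    """Negative log-likelihood of the gold tags for one tag sequence.

    probs: one dict per token, mapping tag -> predicted probability.
    gold:  the observed tag of each token.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, gold))


def rnt_loss(fact, cond):
    """RNT loss: one fact sequence plus one condition sequence."""
    return sequence_nll(*fact) + sequence_nll(*cond)


def tct_loss(fact_seqs, cond_seqs):
    """TCT loss: sum over the n fact and m condition output sequences."""
    return sum(sequence_nll(p, g) for p, g in fact_seqs + cond_seqs)


# Toy two-token sentence with made-up probabilities.
fact = ([{"O": 0.5, "B-f2p": 0.5}, {"O": 0.5, "B-f2p": 0.5}], ["O", "B-f2p"])
cond = ([{"O": 0.25, "B-c2p": 0.75}, {"O": 0.25, "B-c2p": 0.75}], ["O", "B-c2p"])

loss_rnt = rnt_loss(fact, cond)
loss_tct = tct_loss([fact], [cond, cond])   # n = 1 fact, m = 2 condition sequences
loss_sentence = loss_rnt + loss_tct         # summed over all sentences in training
```

Because the TCT term adds one sequence loss per relation name, sentences with more tuples contribute more terms to the objective.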
Training details: On one hand, Equations (6) and (7) show that the error signal can be propagated from the RNT/TCT layers to the encoder-decoder model. On the other hand, the RNT layer specifies the relation names, that is, the tokens that have tags "B/I-f2p" and "B/I-c2p", for each tag sequence in the TCT layer, so we cannot have smooth gradients for back propagation from the TCT layer to the RNT layer. Therefore, to learn effectively, the quality of relation name prediction must be secured beforehand. We pre-train the RNT layer with the multi-input module until the relation name's tag prediction achieves an F1 score higher than 0.8. Then we plug the TCT layer onto the RNT layer and train the entire framework to generate the multi-output tag sequences.

Experiments
We evaluate the performance of condition/fact tag prediction and tuple extraction by the proposed MIMO model, its variants, and state-of-the-art models on the newly annotated BioCFE dataset and, in a transfer setting, on the BioNLP2013 dataset.

Experimental Setup
Datasets: Statistics of BioCFE have been given in Section 2. Additionally, attribute-related tags account for 11.7% and 9.4% of the non-"O" tags in fact and condition tuples, respectively, so it is important to distinguish concepts from attributes. To the best of our knowledge, this is the first time that conditional information has been carefully annotated on biomedical literature. We use the system in Figure 2 to annotate a subset of the BioNLP2013 Cancer Genetics (CG) task dataset (Nédellec et al., 2013), which gives 197 fact tuples and 173 condition tuples. We use this BioNLP dataset as an extra test set for fact and condition tuple extraction; the model is not trained on it.
Validation: The ratio of training:validation:test is 60:8:32. For BioCFE, the evaluation set has 242 fact tuples and 209 condition tuples (on average from 108 sentences). We repeat the experiment five times, evaluate the performance, and report average results.
Evaluation metrics: For tag prediction, we use the standard metrics: precision, recall, and F1 score. We have similar observations on Micro F1 scores as on Macro F1 scores, so we report Macro F1 only. For evaluating tuple extraction, we use pair-wise comparison to match the extracted and ground-truth tuples. We evaluate the correctness on the tuple's five slots using the same metrics.
Baselines: We compare with statistical sequence labeling methods: Structured Support Vector Machine (SVM) (Tsochantaridis et al., 2005) and Conditional Random Field (CRF) (Lafferty et al., 2001). We compare with a neural sequence labeling method, BiLSTM-LSTMd (Zheng et al., 2017). We replace its encoder with BERT (Devlin et al., 2018) to make it a more competitive baseline. We also compare against two well-known OpenIE systems, Stanford OpenIE (Angeli et al., 2015) and AllenNLP OpenIE (Stanovsky et al., 2018), each followed by a condition/fact classification.
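For reference, the tag-level metrics can be computed as in the following sketch. It shows the standard definition of per-tag precision, recall, and F1 and their macro average; the function names are hypothetical, and excluding the "O" tag from the average is an assumption, since the text does not say whether "O" is counted.

```python
from collections import Counter


def per_tag_prf(gold, pred):
    """Per-tag (precision, recall, F1) for two aligned tag sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted tag was wrong
            fn[g] += 1   # gold tag was missed
    scores = {}
    for tag in set(gold) | set(pred):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = (prec, rec, f1)
    return scores


def macro_f1(gold, pred, ignore=("O",)):
    """Unweighted mean of per-tag F1, skipping tags in `ignore` (assumed)."""
    scores = per_tag_prf(gold, pred)
    f1s = [f1 for tag, (_, _, f1) in scores.items() if tag not in ignore]
    return sum(f1s) / len(f1s) if f1s else 0.0


gold = ["B-f1c", "B-f2p", "B-f3c", "O"]
pred = ["B-f1c", "B-f2p", "O", "O"]
score = macro_f1(gold, pred)   # two tags are perfect, one is missed entirely
```

In this toy case the macro F1 is the mean of 1, 1, and 0 over the three non-"O" tags, which shows how a single missed tag type lowers the macro average regardless of how frequent that tag is.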
We enhance the statistical sequence labeling models with multi-input signals for fairness, and train them for fact tuple and condition tuple extraction separately. In the neural baselines (BiLSTM-LSTMd and BERT-LSTMd), fact extraction and condition extraction share the encoder-decoder model and use different parameters in the linear-softmax layer.
Hyperparameters: The multi-input module has a BiLSTM/BERT encoder and an LSTM decoder. The word embeddings were obtained from GloVe (Pennington et al., 2014) with the dimension size $d_{WE} = 50$. The language model dimension size is $d_{LM} = 200$. The size of the POS tag embedding is $d_{POS} = 6$. The size of the CAP tag embedding is $d_{CAP} = 3$. The number of LSTM units in the encoding layer is 300. The number of transformer units in the BERT encoding layer is 768.

Results on BioCFE
In this section, we present overall performance, ablation study, error analysis, and efficiency. Table 1 shows that the proposed multi-input multi-output sequence labeling model with a BERT encoder consistently performs the best over all the baselines on tag prediction and tuple extraction. Compared to BiLSTM-LSTMd, BiLSTM-based MIMO improves the F1 score relatively by 7.1% on tag prediction and by 8.8% on tuple extraction; compared to BERT-LSTMd, BERT-based MIMO improves F1 by 4.7% and 6.2% on the two tasks, respectively. Evidently, the BERT encoder significantly improves the performance (by 16.9-17.2% on tag prediction and 7.7-10.3% on tuple extraction), and the MIMO design can further improve it. Neural sequence labeling models perform better than OpenIE systems and statistical methods, as they are more adaptive to learning structures with the new tag schema. An OpenIE method followed by a condition/fact classification is not effective.

Overall Performance
Compared to BERT-LSTMd, the BERT-based MIMO improves precision and recall relatively by 8.3% and 1.3% on tag prediction; and relatively by 3.1% and 9.3% on tuple extraction, respectively. When the tags were more precisely predicted, the tuple's five slots would be more accurately filled, and we would have more complete tuples.
We also observe that the improvements on the condition's tags/tuples are consistently bigger than the improvements on the fact's tags/tuples. This demonstrates that the MIMO design recognizes the role of conditions in the statement sentences better.
Table 2 compares variants of the proposed model to evaluate the effectiveness of the following components: (1) multi-input sequences, such as none, or one (in LM, POS, and CAP), double combination, or triple combination; (2) multi-input encoder model, BiLSTM or BERT; (3) multi-output module, with the RNT layer only (generating one fact tag sequence and one condition tag sequence) or a combination of RNT and TCT layers (generating multiple sequences for each tuple type).

Ablation Study
Multi-input sequences: When the choices of the encoder model and multi-output layers are specified, we observe that the triple combination of input sequences performs better than the double combinations, and the double combinations win over a single input. An additional sequence makes a relative F1 improvement of 1.0-2.4%. The triple combination improves F1 relatively by 3.2-4.1%. This demonstrates that the three types of input sequences encode complementary information for learning dependencies in the proposed tag schema. First, the language model learns the dependencies between a token and its predecessors in distant contexts. Having the LM sequence helps recognize subjects and objects relative to the relation names and reduces the false positives of "B/I-X1Z" and "B/I-X3Z". Second, the POS tag encodes the token's syntactic feature. Having the POS sequence improves the precision of tag prediction. For example, verbs and prepositions (e.g., "in", "during") often act as the relation names of facts and conditions, respectively; conjunction words (e.g., "that", "which") indicate subordinate clauses, so the noun phrase before the conjunction word is likely to be the subject of the tuple given by the clause. Third, the formerly detected concepts, attribute names, and phrases are useful for tagging the slots of subjects and objects. In other words, the tags "B/I-c" and "B/I-a" in the CAP sequence are strongly associated with the target tags "B/I-XYc" and "B/I-XYa", respectively.
Encoder in the multi-input module: Comparing the middle three columns (BiLSTM-based encoder) and the right-hand three columns (BERT-based encoder), one can easily tell the significant improvement brought by the BERT model.
Layers in the multi-output module: If the multi-output models have both RNT and TCT layers, the F1 score is relatively 1.4-5.0% higher than that of the models with the RNT layer only. Moreover, the recall is improved relatively by 1.5-9.0%. So the TCT layer, which generates multiple tag sequences for each type of tuple (i.e., fact and condition), plays a very important role in recognizing the multiple tuples from one statement sentence.
Table 3 presents the confusion matrices made by the BERT-based MIMO on predicting non-"O" tags for facts and conditions, respectively. The columns are predicted tags and the rows are actual ones. Perfect results would be diagonal matrices.

Error Analysis
We observe that the numbers on the diagonal are consistently bigger than the other numbers in the corresponding row and column. The accuracy scores are 0.905 for predicting fact tags and 0.908 for predicting condition tags. Of the 182 actual "B-f2p", the model predicted that 175 were "B-f2p"; of the 186 actual "B-c2p", it predicted that one was "I-c1c" and one was "I-c3c". This demonstrates the high accuracy (0.961 and 0.989) of extracting relation names for multi-output generation. The ovals in each confusion matrix mark the most significant type of error: for a small set of actual subjects, the model predicted them as objects, and vice versa, though the fact/condition role and concept role were correctly predicted.
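The two relation-name accuracies follow directly from the counts quoted above, as this short check shows. It assumes that the two listed mispredictions are the only errors among the 186 actual "B-c2p" tokens; the function name `row_accuracy` is hypothetical.

```python
def row_accuracy(correct, total):
    """Share of the actual instances of a tag that were predicted correctly."""
    return correct / total


# 175 of the 182 actual "B-f2p" tokens were predicted as "B-f2p".
fact_relation_acc = row_accuracy(175, 182)        # about 0.9615

# Assumption: the two quoted errors are the only ones for "B-c2p".
cond_relation_acc = row_accuracy(186 - 2, 186)    # about 0.9892
```

Both values agree with the reported 0.961 and 0.989 when truncated to three decimals.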
The dashed circles show the second most frequent type of error. Of the actual "I-f2p" tokens, the model predicted that 7 were "B-f2p"; of the actual "I-c2p" tokens, it predicted that 6 were "B-c2p". This was caused by missing the beginning word of the relational phrases. Of the actual "B-f3a" tokens, the model predicted that 6 were "I-f2p". Future work will aim at improving the prediction of the boundaries of long relational phrases.

Efficiency
All the experiments were conducted on 16 graphics cards (GeForce GTX 1080 Ti), where each individual model used only one GPU. Each model was trained for 1,000 epochs. For the BiLSTM-LSTMd MIMOs, pre-training took 2.4 hours and re-training (with the TCT layer) took 0.4 hours. For the best-performing BERT-LSTMd MIMOs, pre-training took 3.5 hours and re-training took 0.9 hours. It took 5.7 hours to extract fact and condition tuples from 141 million sentences in the MEDLINE text data. This is comparable with existing approaches in terms of scalability.

Results on BioNLP2013
As shown in Table 3, the BERT-LSTMd MIMO model achieves an F1 score of 0.790 on tuple extraction from BioNLP2013. Note that the model was trained on BioCFE, which has no sentences overlapping with BioNLP2013. This score is comparable with the testing F1 score on BioCFE (0.808), which demonstrates the effectiveness and reliability of the proposed model.
Our model improves the F1 score relatively by 4.2% over the best baseline, BERT-LSTMd. The improvement in recall is more substantial: 5.8% relatively. This is due to the design of the multi-output module: the TCT layer generates multiple tag sequences based on the relation names predicted by the RNT layer. A token in a statement sentence may have different roles in different tuples of the same type (fact or condition). For example, given the following statement sentence: "Immunohistochemical staining of the tumors demonstrated a decreased number of blood vessels in the treatment group versus the controls." The proposed model is able to find one fact tuple and two condition tuples precisely:
- Condition 1: (blood vessels, in, treatment group)
- Condition 2: (treatment group, versus, controls)
Note that the concept "treatment group" acts as the object of Condition Tuple 1 (having tags "B/I-c3c") and the subject of Condition Tuple 2 (having tags "B/I-c1c"). The multi-output design handles this case, while the other models cannot.
Compared with BioCFE: On BioCFE, the F1 score on condition tuple extraction is a bit higher than that on fact tuple extraction (81.64 vs 79.94). On BioNLP2013, we have the opposite observation (78.58 vs 79.42). The scores are still comparable, but if we look at the error cases, we find that most of the false predictions of condition tuples come from long sentences (having more than 30 words). Moreover, 35% of the sentences in BioNLP2013 are long sentences, while only 5% in BioCFE are long. Long dependency modeling is always challenging for IE, especially for condition extraction. We will study it in future work.

A Visualized Case Study
A scientific knowledge graph enables effective search and exploration. It is important to represent in the graph the conditions under which the corresponding fact is valid. As we have applied our model to the large MEDLINE dataset, Figure 4 visualizes the fact and condition tuples extracted from four statement sentences about "cell proliferation". On the left side, we find that (1) "VPA treatment" and the "incubation" of "HDLs" increased cell proliferation, while (2) "Chlorin e6-PDT" and the "inhibition" of "MiR-199a-5p" decreased cell proliferation. On the right, we see the conditions of the factual claims. They describe the methodology of the observation (e.g., "using", "in combination with") or the context (e.g., "in" a specific disease or "from" specific animals). In some other cases, we find that temperature and pH values are detected as the conditions of observations.

Related Work

Scientific Information Extraction
Information extraction in scientific literature, e.g., computer science, biology, and chemistry, has been receiving much attention in recent years. ScienceIE in computer science focuses on concept recognition and factual relation extraction (Luan et al., 2017; Gábor et al., 2018; Luan et al., 2018). ScienceIE in biological literature aims at identifying the relationships between biological concepts (i.e., proteins, diseases, drugs, and genes) (Kang et al., 2012; Xu et al., 2018). Rule-based approaches were used in early studies (Rindflesch and Fiszman, 2003; Kang et al., 2012). Recently, a wide range of neural network models have been proposed and have outperformed traditional methods (Wang et al., 2018b; Xu et al., 2018). Wang et al. (2018b) investigated different kinds of word embeddings on different NLP tasks in the biological domain. Attention-based neural networks have been employed to extract chemical-protein relations. Xu et al. (2018) used the BiLSTM model to recognize drug interactions. In our work, we extract biological relational facts as well as their conditions. The condition tuples are essential to interpreting the factual claims.

Open-Domain IE
Open IE refers to the extraction of (subject, relation, object)-triples from plain text (Angeli et al., 2015;Stanovsky et al., 2018;Saha et al., 2018;Wang et al., 2018a). The schema for the relations does not need to be specified in advance. Distant supervision has been widely used because the size of the benchmark data is often limited (Banko et al., 2007;Wu and Weld, 2010). Stanovsky et al. (2018) proposed supervised neural methods for OpenIE. The idea was to transform annotated tuples into tags and learn via sequence tagging. We create a new tag schema and propose a novel sequence labeling framework.

Sequence Labeling for IE
Statistical models have long been studied, including Hidden Markov Models (HMM), Support Vector Machines (SVM), and Conditional Random Fields (CRF) (Lafferty et al., 2001; Tsochantaridis et al., 2005; Passos et al., 2014; Luo et al., 2015; Li et al., 2018). However, these methods rely heavily on hand-crafted features. Neural network models have since become popular and obtain more promising performance than traditional statistical methods (Yang and Mitchell, 2017; Zheng et al., 2017; Wang et al., 2019; Yu et al., 2019). So, we use them as strong baselines.

Conclusions
We present a new problem: finding conditional information in scientific statements. We created a new tag schema for jointly extracting condition and fact tuples from scientific text. We proposed a multi-input multi-output sequence labeling model to utilize results from well-established related tasks and extract an uncertain number of fact(s)/condition(s). Our model yields improvement over all the baselines on a newly annotated dataset, BioCFE, and a public dataset, BioNLP2013. We argue that structured representations of knowledge, such as fact/condition tuples, for scientific statements will enable more intelligent downstream applications. In future work, we will explore the use of the structured tuples to bridge the gap between text content and knowledge-based applications, such as knowledge-based scientific literature search.