Scientific Information Extraction with Semi-supervised Neural Tagging

This paper addresses the problem of extracting keyphrases from scientific articles and categorizing them as corresponding to a task, process, or material. We cast the problem as sequence tagging and introduce semi-supervised methods to a neural tagging model, which builds on recent advances in named entity recognition. Since annotated training data is scarce in this domain, we introduce a graph-based semi-supervised algorithm together with a data selection scheme to leverage unannotated articles. Both inductive and transductive semi-supervised learning strategies outperform state-of-the-art information extraction performance on the 2017 SemEval Task 10 ScienceIE task.


Introduction
As a research community grows, more and more papers are published each year. As a result, there is increasing demand for improved methods for finding relevant papers and automatically understanding the key ideas in those papers. However, due to the large variety of domains and extremely limited annotated resources, there has been relatively little work on scientific information extraction. Previous research has focused on unsupervised approaches such as bootstrapping (Gupta and Manning, 2011; Tsai et al., 2013), where hand-designed templates are used to extract scientific keyphrases, and more templates are added through bootstrapping.
Very recently, a new challenge on Scientific Information Extraction (ScienceIE) (Augenstein et al., 2017) provided a dataset consisting of 500 scientific paragraphs with keyphrase annotations for three categories (TASK, PROCESS, MATERIAL) across three scientific domains: Computer Science (CS), Material Science (MS), and Physics (Phy), as in Figure 1. This dataset enables the use of more advanced approaches such as neural network (NN) models. To that end, we cast the keyphrase extraction task as a sequence tagging problem, and build on recent progress in another information extraction task: Named Entity Recognition (NER) (Lample et al., 2016; Peng and Dredze, 2015). Like named entities, keyphrases can be identified by their linguistic context, e.g., researchers "use" methods. In addition, keyphrases can be associated with different categories in different contexts. For example, 'semantic parsing' can be labeled as a TASK in one article and as a PROCESS in another. Scientific keyphrases differ from named entities in that they can include both noun phrases and verb phrases, and in that non-standard "words" (equations, chemical compounds, references) can provide important cues.
Since the scale of the data is still small for supervised training of neural systems, we introduce semi-supervised methods to the neural tagging model in order to take advantage of the large quantity of unlabeled scientific articles. This is particularly important because of the differences in keyphrases across domains. Our semi-supervised learning algorithm uses a graph-based label propagation scheme to estimate the posterior probabilities of unlabeled data. It additionally extends the training objective to leverage the confidence of the estimated posteriors. The new training objective treats low-confidence tokens as missing labels and computes the sentence-level score by marginalizing over them.
Our experiments show that our neural tagging model achieves state-of-the-art results in the SemEval ScienceIE task. We further show that both inductive and transductive semi-supervised strategies significantly improve the performance. Finally, we provide in-depth analysis of domain differences as well as analysis of failure cases.
The key contributions of our work include: i) achieving state-of-the-art results in scientific information extraction on SemEval Task 10 by extending recent advances in neural tagging models; ii) introducing a semi-supervised learning algorithm that uses graph-based label propagation and confidence-aware data selection; and iii) exploring different alternatives for taking advantage of large, multi-domain unannotated data, including both unsupervised embedding initialization and semi-supervised model training.

Related Work
There has been growing interest in research on automatic methods to help researchers search and extract useful information from scientific literature. Past research has addressed citation sentiment (Athar and Teufel, 2012b,a), citation networks (Kas, 2011; Gabor et al., 2016; Sim et al., 2012; Do et al., 2013; Jaidka et al., 2014), summarization (Abu-Jbara and Radev, 2011) and some analysis of research communities (Vogel and Jurafsky, 2012; Anderson et al., 2012; Luan et al., 2012, 2014b; Levow et al., 2014). However, due to scarce hand-annotated data resources, previous work on information extraction (IE) for scientific literature is very limited. Gupta and Manning (2011) first proposed a task that classifies scientific terms in 474 abstracts from the ACL Anthology (Bird et al., 2008) into three aspects (domain, technique and focus) and applied template-based bootstrapping to tackle the problem. Based on this study, Tsai et al. (2013) improve the performance by introducing hand-designed features from NER (Collins and Singer, 1999) to the bootstrapping framework. QasemiZadeh and Schumann (2012) compile a dataset of scientific terms in 7 fine-grained categories for 171 abstracts from the ACL Anthology. Similar to our work, Augenstein and Søgaard (2017) very recently also evaluated on the ScienceIE dataset, but they use multi-task learning to improve the performance of a supervised neural approach. Instead, we introduce a semi-supervised neural tagging approach that leverages unlabeled data.
Neural tagging models have recently been introduced to tagging problems such as NER. For example, Collobert et al. (2011) use a CNN over a sequence of word embeddings and apply a CRF layer on top. Huang et al. (2015) use hand-crafted features with LSTMs to improve performance. There is currently great interest in using character-based embeddings in neural models (Chiu and Nichols, 2016; Lample et al., 2016; Ballesteros et al., 2015; Ma and Hovy, 2016). Our approach also takes advantage of neural tagging models and character-based embeddings for IE in scientific articles.
Previous work on semi-supervised learning for neural models has mainly focused on transfer learning (Dai and Le, 2015; Luan et al., 2014a; Harsham et al., 2015) or initializing the model with pre-trained word embeddings (Mikolov et al., 2013; Pennington et al., 2014; Levy and Goldberg, 2014; Luan et al., 2015, 2016a,b). In our work, we use pre-training but also use more powerful methods, including graph-based semi-supervision (Subramanya and Bilmes, 2011; Liu and Kirchhoff, 2013, 2015, 2016a,b) and a method for leveraging partially labeled data (Kim et al., 2015). We show that the combination of these techniques gives better results than any one alone.

Problem Definition and Data
The purpose of this work is to extract phrases that can answer questions that researchers usually face when reading a paper: What TASK has the paper addressed? What PROCESS or method has the paper used or compared to? What MATERIALS has the paper utilized in experiments? While these fundamental concepts are important in a wide variety of scientific disciplines, the terms that are used in specific disciplines can be substantially different. For example, MATERIALS in computer science might be a text corpus, while they would be physical materials in physics or materials science.
Data We use the SemEval 2017 Task 10 ScienceIE dataset. Fig. 1 provides examples that illustrate the variation in domains, but also show that there are common cues such as "the task of", "using", "technique," etc. A challenge with this dataset is that the size of the training data is very small. It is built from ScienceDirect open access publications and consists of 500 journal articles, but only one paragraph of each article is manually labeled. Therefore, we use a large amount of external data to leverage the continuous-space representation of language in the neural network model. We explore the effect of pre-training word embeddings with two different external resources: i) a data set of Wikipedia articles as a general English resource, and ii) a data set of 50k Computer Science papers from ACM.
Tagging Problem Formulation The task requires detecting the exact span of a keyphrase. In order to be able to distinguish spans of two consecutive keyphrases of the same type, we assign labels to every word in a sentence, indicating position in the phrase and the type of phrase. We formulate the problem as an IOBES (Inside, Outside, Beginning, End and Singleton) tagging problem where every token is labeled as: B, if it is at the beginning of a keyphrase; E, if it ends the phrase; I, if it is inside a keyphrase but is not the first or last token; S, if it is a single-word keyphrase; or O, otherwise. For example, "named entity recognition" in the first sentence of Fig. 1 is labeled as "B-Task I-Task E-Task".
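The IOBES encoding described above can be illustrated with a minimal sketch. The function name `spans_to_iobes`, the inclusive-index span format, and the example sentence are assumptions made for illustration; they are not taken from the ScienceIE tooling.

```python
def spans_to_iobes(n_tokens, spans):
    """Convert keyphrase spans to IOBES tags.

    spans: list of (start, end, label) with inclusive token indices.
    Spans are assumed not to overlap (an assumption of this sketch).
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if start == end:
            # Single-token keyphrase
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
            tags[end] = f"E-{label}"
    return tags


# Hypothetical sentence, used only to show the encoding.
tokens = ["We", "study", "named", "entity", "recognition", "using", "CRFs"]
tags = spans_to_iobes(len(tokens), [(2, 4, "Task"), (6, 6, "Process")])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Because every keyphrase carries an explicit begin and end marker, two adjacent keyphrases of the same type remain distinguishable after decoding.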

Neural Architecture Model
We introduce an end-to-end model to categorize scientific keyphrases, building on a neural named entity recognition model (Lample et al., 2016) and adding a feature-based embedding.

Model
We develop a 3-layer hierarchical neural model to tag tokens of the documents (details of the tokenization are in Sec. 6). (1) The token representation layer concatenates three components for each token: a bi-directional character-based embedding, a word embedding, and an embedding associated with orthographic and part-of-speech features. (2) The token LSTM layer uses a bidirectional LSTM to incorporate contextual cues from surrounding tokens to derive intermediate token embeddings. (3) The CRF tagging layer models token-level tagging decisions jointly using a CRF objective function to incorporate dependencies between tags.
Character-Based Embedding. The embedding for a token is derived from its characters as the concatenation of forward and backward representations from a bidirectional LSTM. The character lookup table is initialized at random. The advantage of building a character-based embedding layer is that it can handle out-of-vocabulary words and equations, which are frequent in this data, all of which are mapped to "UNK" tokens in the Word Embedding Layer.
Word Embedding. Words from a fixed vocabulary (plus the unknown word token) are mapped to a vector space, initialized using word2vec pretraining with different combinations of corpora.
Feature Embedding. We map features to a vector space: capitalization (all capital, first capital, all lower, any capital but first letter) and part-of-speech tags. We randomly initialize the feature vectors and train them together with the other parameters.
Token LSTM Layer We apply a bidirectional LSTM at the token level, taking the concatenated character-word-feature embedding as input. The token representation obtained by stacking the forward and backward LSTM hidden states is passed as input to a linear layer that projects the dimension to the size of the label type space and is used as input to the CRF layer.
CRF Layer Keyphrase categorization is a task where there are strong dependencies across output labels (e.g., I-TASK cannot follow B-PROCESS). Therefore, instead of making independent tagging decisions for each output, we model them jointly using a conditional random field (Lafferty et al., 2001). For an input sentence $x = (x_1, x_2, x_3, \ldots, x_n)$, we consider $P$ to be the matrix of scores output by the bidirectional LSTM network. $P$ is of size $n \times m$, where $n$ is the number of tokens in a sentence, and $m$ is the number of distinct tags. $P_{t,i}$ corresponds to the score of the $i$-th tag of the $t$-th word in a sentence. We use a first-order Markov model and define a transition matrix $T$, where $T_{i,j}$ represents the score of a transition from tag $i$ to tag $j$. We also add $y_0$ and $y_{n+1}$ as the start and end tags of a sentence. Therefore $T$ becomes a square matrix of dimension $m + 2$. Given one possible output $y$ and neural network parameters $\theta$, we define the score as
$$\phi(y; x, \theta) = \sum_{t=0}^{n} T_{y_t, y_{t+1}} + \sum_{t=1}^{n} P_{t, y_t} \tag{1}$$
The probability of sequence $y$ is obtained by applying a softmax over all possible tag sequences:
$$p_\theta(y \mid x) = \frac{\exp(\phi(y; x, \theta))}{\sum_{y' \in Y} \exp(\phi(y'; x, \theta))} \tag{2}$$
where $Y$ denotes all possible tag sequences. The normalization term is efficiently computed using the forward algorithm.
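The CRF layer can be sketched in NumPy as follows, assuming the sequence score is the sum of the transition scores along the path (including the start and end tags) and the per-token tag scores from P. The helper names, the placement of the start and end tags at indices m and m + 1 of T, and the random scores are illustrative assumptions. The brute-force sum at the end is only there to check the forward recursion on a tiny example.

```python
import itertools

import numpy as np


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def sequence_score(P, T, y, start, end):
    """Score of one tag sequence: transition scores plus token scores."""
    path = [start] + list(y) + [end]
    trans = sum(T[a, b] for a, b in zip(path[:-1], path[1:]))
    emit = sum(P[t, tag] for t, tag in enumerate(y))
    return trans + emit


def log_partition(P, T, start, end):
    """Forward algorithm: log of the summed exp-scores of all tag sequences."""
    n, m = P.shape
    alpha = T[start, :m] + P[0]  # log-scores of all length-1 prefixes
    for t in range(1, n):
        # alpha[j] = logsumexp_i(alpha[i] + T[i, j]) + P[t, j]
        alpha = logsumexp(alpha[:, None] + T[:m, :m], axis=0) + P[t]
    return logsumexp(alpha + T[:m, end])


rng = np.random.default_rng(0)
n, m = 4, 3                           # 4 tokens, 3 tags (toy sizes)
P = rng.normal(size=(n, m))           # stand-in for BiLSTM output scores
T = rng.normal(size=(m + 2, m + 2))   # includes start/end tags
START, END = m, m + 1

y = [0, 2, 1, 1]
log_p = sequence_score(P, T, y, START, END) - log_partition(P, T, START, END)

# Brute-force check over all m**n sequences (feasible only for toy sizes).
brute = logsumexp(np.array([
    sequence_score(P, T, s, START, END)
    for s in itertools.product(range(m), repeat=n)
]))
print(log_p, brute - log_partition(P, T, START, END))
```

The forward recursion costs O(n m^2) operations, whereas the explicit enumeration grows as m^n, which is why the normalization term is never computed by enumeration in practice.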

Supervised Training
During training, we maximize the log-probability L(Y; X, θ) of the correct tag sequence given the corpus {X, Y}. Backpropagation is done based on a gradient computed using sentence-level scores.

Semi-supervised Learning
We develop a semi-supervised algorithm that extends self-training by estimating the labels of unlabeled data and then using those labels for retraining. Specifically, we use a graph-based algorithm to estimate the posterior probabilities of unlabeled data and develop a new CRF training objective that takes the uncertainty of the estimated labels into account.

Graph-based Posterior Estimates
Our semi-supervised algorithm uses the following steps to estimate the posterior. It first constructs a graph of tokens based on their semantic similarity, then uses the CRF marginal as a regularization term to do label propagation on the graph. The smoothed posterior is then either interpolated with the CRF marginal or used as an additional feature to the neural network.
Graph Construction Vertices in the graph correspond to tokens, and edge weights are distances between token features, which capture semantic similarity. The total size of the graph is equal to the number of tokens in both labeled data $V_l$ and unlabeled data $V_u$. The tokens are modelled with a concatenation of pre-trained word embeddings (with dimension $d$) of the 5-gram centered on the current token, the word embedding of the closest verb, and a set of discrete features including part-of-speech tags and capitalization (43- and 4-dimensional one-hot features). The resulting feature vector, with dimension $5d + d + 43 + 4$, is then projected down to 100 dimensions using PCA. We define the weight $w_{uv}$ of the edge between nodes $u$ and $v$ as follows:
$$w_{uv} = \begin{cases} d_e(u, v) & \text{if } v \in K(u) \text{ or } u \in K(v) \\ 0 & \text{otherwise} \end{cases}$$
where $K(u)$ is the set of $k$-nearest neighbors of $u$ and $d_e(u, v)$ is the Euclidean distance between any two nodes $u$ and $v$ in the graph. An example of our graph is in Fig. 2.
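A small-scale sketch of the graph construction follows. It assumes that an edge exists when either node is among the other's k nearest neighbours and that its weight is the Euclidean distance between the two nodes; it also swaps in an exact brute-force neighbour search and an SVD-based PCA, whereas a graph with millions of tokens would need an approximate search. The function names and the random feature vectors are hypothetical.

```python
import numpy as np


def pca_project(X, dim):
    """Project the rows of X onto the top `dim` principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T


def knn_graph(X, k):
    """Symmetric weight matrix of a k-nearest-neighbour graph.

    W[u, v] is the Euclidean distance between u and v if v is among the
    k nearest neighbours of u or u is among those of v, and 0 otherwise.
    """
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)              # a node is not its own neighbour
    nearest = np.argsort(D, axis=1)[:, :k]   # exact search, O(n^2) memory
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nearest.ravel()] = True
    mask |= mask.T                           # "v in K(u) or u in K(v)"
    return np.where(mask, D, 0.0)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))   # 20 toy token feature vectors
Z = pca_project(X, 3)          # the paper projects to 100 dimensions
W = knn_graph(Z, k=4)          # the paper uses k = 10
print(W.shape, int((W > 0).sum()))
```

Because the neighbourhood relation is symmetrized, a node can end up with more than k edges when it is a near neighbour of many other tokens.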
For every node $u$ in the graph, we compute the marginal probabilities $\{q_u\}$ using the forward-backward algorithm. Let $\theta_i$ denote the estimate of the CRF parameters after the $i$-th iteration; we compute the marginal probabilities $p_{(j,t)} = p(y_t^j \mid x; \theta_i)$ over IOBES tags for every token position $t$ in sentence $j$ in both labeled and unlabeled data.
Label Propagation We use prior-regularized measure propagation (Liu and Kirchhoff, 2014; Subramanya and Bilmes, 2011) to propagate labels from the annotated data to their neighbors in the graph. The algorithm aims for the label distributions of neighboring nodes to be as similar to each other as possible by optimizing an objective function that minimizes the Kullback-Leibler divergences between: i) the empirical distribution $r_u$ of labeled data and the predicted label distribution $q_u$ for all labeled nodes in the graph; ii) the distributions $q_u$ and $q_v$ for all nodes $u$ in the graph and their neighbors $v$; iii) the distributions $q_u$ and the CRF marginals $p_u$ for all nodes. The third term regularizes the predicted distribution toward the CRF prediction if the node is not connected to a labeled vertex, ensuring the algorithm performs at least as well as standard self-training.
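To make the three terms of the label propagation objective concrete, the sketch below evaluates such an objective for a given set of node distributions; it does not implement the optimization itself. Weighting the smoothness term by the edge weights and scaling the second and third terms by µ and ν are assumptions in the style of measure propagation, and the function names and toy values are hypothetical.

```python
import numpy as np


def kl(a, b, eps=1e-12):
    """KL divergence between two discrete distributions (clipped for safety)."""
    a = np.clip(a, eps, 1.0)
    b = np.clip(b, eps, 1.0)
    return float(np.sum(a * np.log(a / b)))


def propagation_objective(q, r, labeled, W, p, mu, nu):
    """Evaluate the three-term objective for node distributions q.

    q, p: (N, m) arrays of graph distributions and CRF marginals.
    r: (N, m) array; only rows of labeled nodes are used.
    labeled: boolean mask of length N. W: (N, N) edge weights.
    """
    # i) labeled nodes should match their empirical label distribution
    fit = sum(kl(r[u], q[u]) for u in np.flatnonzero(labeled))
    # ii) neighbouring nodes should have similar distributions
    smooth = sum(W[u, v] * kl(q[u], q[v]) for u, v in zip(*np.nonzero(W)))
    # iii) every node is regularized toward its CRF marginal
    prior = sum(kl(q[u], p[u]) for u in range(len(q)))
    return fit + mu * smooth + nu * prior


# Toy graph with three nodes and two labels; node 0 is labeled.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
r = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
labeled = np.array([True, False, False])
p = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])

# Cost of simply keeping the CRF marginals as the graph distributions.
cost_at_crf = propagation_objective(p, r, labeled, W, p, mu=1e-6, nu=1e-5)
print(cost_at_crf)
```

With q equal to the CRF marginals the third term vanishes, so for a node with no path to labeled data the prior term alone determines its distribution, which is the fallback to self-training described in the text.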
Posterior Estimates We develop two strategies to estimate the new posteriors p(y_t | x; θ), which can then be used in the CRF training.
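The first strategy (called GRAPHINTERP) is the commonly used approach (Subramanya et al., 2010; Aliannejadi et al., 2014) that interpolates the smoothed posterior $\{q\}$ with the CRF marginals $p$:
$$\hat{p}(y_t \mid x; \theta) = \alpha \, p(y_t \mid x; \theta) + (1 - \alpha) \, q(y_t) \tag{3}$$
where $\alpha$ is a mixing coefficient.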
A second strategy introduced here (called GRAPHFEAT) uses the smoothed posterior $\{q\}$ as features and learns their weights with the other parameters in the neural network. Given a sentence $\{x_1, \ldots, x_n\}$, let $Q = \{q_1, \ldots, q_n\}$ be the predicted label distribution from the graph. We then use $Q$ as a feature input to the neural network as $\hat{P} = P + MQ$, where $P$ is the $n \times m$ matrix output by the bidirectional LSTM network as in Eq. 1, and $M$ is an $m \times m$ matrix that is learned together with the other parameters of the neural network. We modify Eq. 1 by replacing $P_{t,y_t}$ with $\hat{P}_{t,y_t}$. Note that GRAPHFEAT can only be used in a transductive way since it requires the output $Q$ from the graph at test time.
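The two posterior strategies can be contrasted in a few lines of NumPy. This is a sketch under two stated assumptions: the interpolation is taken to be the convex combination α·p + (1 − α)·q, and the GRAPHFEAT update is read row-wise, adding M q_t to the t-th row of P. The function names and the toy distributions are hypothetical.

```python
import numpy as np


def graph_interp(p, q, alpha):
    """GRAPHINTERP-style mix of CRF marginals p and graph posteriors q.

    Assumes the convex form alpha * p + (1 - alpha) * q.
    """
    return alpha * p + (1.0 - alpha) * q


def graph_feat(P, Q, M):
    """GRAPHFEAT-style score update, read row-wise as P[t] + M @ Q[t].

    P: (n, m) tag scores, Q: (n, m) graph posteriors, M: (m, m) weights
    that would be learned with the rest of the network.
    """
    return P + Q @ M.T


p = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])   # toy CRF marginals
q = np.array([[0.5, 0.4, 0.1], [0.1, 0.1, 0.8]])   # toy graph posteriors
mixed = graph_interp(p, q, alpha=0.3)

rng = np.random.default_rng(0)
P = rng.normal(size=(2, 3))
M = rng.normal(size=(3, 3))
P_hat = graph_feat(P, q, M)
print(mixed)
print(P_hat.shape)
```

The interpolation has a single scalar to tune, while the learned matrix M can weight each graph label differently for each output tag, which is the extra flexibility the second strategy relies on.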

CRF training with Uncertain Labels
A standard approach to self-training is to make hard decisions for labeling tokens based on the estimated posteriors and retrain the model. However, the estimated posteriors in our task are noisy due to the difficulty and variety of the ScienceIE task. Instead, we extend the CRF training to leverage the confidence of the estimated posteriors. The new CRF training (called Uncertain Label Marginalizing (ULM)) treats low-confidence tokens as missing labels and computes the sentence-level score by marginalizing over them. A similar idea has been previously used in treating partially labeled data (Kim et al., 2015).
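Specifically, given a sentence $x$ we define a constrained lattice $\mathcal{Y}(x)$, where at each position $t$ the allowed label types $\mathcal{Y}(x_t)$ are:
$$\mathcal{Y}(x_t) = \begin{cases} \{\hat{y}_t\} & \text{if } p(\hat{y}_t \mid x; \theta) > \eta \\ \text{all label types} & \text{otherwise} \end{cases} \tag{4}$$
where $\eta$ is the confidence threshold, $\hat{y}_t$ is the prediction of posterior decoding and $p(\hat{y}_t \mid x; \theta)$ is its CRF token marginal. The new neural network parameters $\theta$ are estimated by maximizing the log-likelihood of $p_\theta(\mathcal{Y}(x^k) \mid x^k)$ for every input sentence $x^k$, where
$$p_\theta(\mathcal{Y}(x^k) \mid x^k) = \frac{\sum_{y^k \in \mathcal{Y}(x^k)} \exp(\phi(y^k; x^k, \theta))}{\sum_{y' \in Y} \exp(\phi(y'; x^k, \theta))} \tag{5}$$
where $y^k$ is an instance sequence of the lattice $\mathcal{Y}(x^k)$, and $k$ is the sentence index in the training set. There are two extreme cases: when all tokens are uncertain, the likelihood is equal to 1; when all tokens of a sequence are confident, the lattice contains only one possible sequence and the likelihood is equal to Eq. 2, as in Fig. 3.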
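The ULM likelihood can be sketched as a forward pass over the constrained lattice, normalized by a forward pass over the full lattice. The sketch assumes a strict "greater than η" confidence test, uses hand-written token marginals instead of running forward-backward, and places the start and end tags at indices m and m + 1 of T; all names are hypothetical.

```python
import numpy as np


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def constrained_lattice(marginals, eta):
    """allowed[t, i] is True if tag i may be used at position t."""
    n, m = marginals.shape
    best = marginals.argmax(axis=1)
    confident = np.flatnonzero(marginals.max(axis=1) > eta)
    allowed = np.ones((n, m), dtype=bool)   # uncertain tokens: all tags
    allowed[confident] = False
    allowed[confident, best[confident]] = True  # confident: top tag only
    return allowed


def lattice_log_sum(P, T, allowed, start, end):
    """Log of the summed exp-scores of all paths inside the lattice."""
    n, m = P.shape
    emit = np.where(allowed, P, -np.inf)    # mask out disallowed tags
    alpha = T[start, :m] + emit[0]
    for t in range(1, n):
        alpha = logsumexp(alpha[:, None] + T[:m, :m], axis=0) + emit[t]
    return logsumexp(alpha + T[:m, end])


def ulm_log_likelihood(P, T, marginals, eta, start, end):
    """Log-probability mass of the constrained lattice."""
    allowed = constrained_lattice(marginals, eta)
    full = np.ones_like(allowed)
    return (lattice_log_sum(P, T, allowed, start, end)
            - lattice_log_sum(P, T, full, start, end))


rng = np.random.default_rng(0)
n, m = 4, 3
P = rng.normal(size=(n, m))
T = rng.normal(size=(m + 2, m + 2))
START, END = m, m + 1
# Hand-written token marginals, standing in for forward-backward output.
marginals = np.array([[0.90, 0.05, 0.05],
                      [0.40, 0.35, 0.25],
                      [0.10, 0.80, 0.10],
                      [0.30, 0.30, 0.40]])
ll = ulm_log_likelihood(P, T, marginals, eta=0.4, start=START, end=END)
print(ll)
```

Raising the threshold marks more tokens as uncertain and moves the objective toward the trivial likelihood of 1, while lowering it approaches hard-decision self-training, so the threshold controls how much of the estimated labeling is trusted.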

Inductive and Transductive Learning
The semi-supervised training process is summarized as follows. It first computes marginals over the unlabeled data given a set of CRF parameters. It then uses the marginals as a regularization term for label propagation. The smoothed posteriors from the graph are then interpolated with the CRF marginal in GRAPHINTERP or used as an additional feature in GRAPHFEAT. It then uses the estimated labels for the unlabeled data combined with the labeled data to retrain the CRF using either the hard-decision CRF training objective as in Eq. 2 or the ULM data selection objective.
In the inductive setting, we only use the unlabeled data from the development set for the semi-supervision. In the transductive setting, we also use the unlabeled data of the test set to construct the graph. In both cases, the parameters are tuned only on the dev set.

Experimental Setup
Data The SemEval ScienceIE (SE) corpus consists of 500 journal articles; one paragraph of each article is randomly selected and annotated. The complete unlabeled articles and their metadata are provided together with the labeled data. The training data consists of 350 documents; 50 are kept for development and 100 for testing. The 500 articles come from 82 different journals evenly distributed in three domains. We manually assigned the 82 journal names in the dataset to the three domains and perform analysis based on the domain partitions. The 500 full articles contain 2M words, 30 times the size of the annotated data.
Additionally, we use two external resources for pretraining word embeddings: i) WIKI, a full Wikipedia dump from 2012 containing 46M words, and ii) ACM, a collection of CS papers containing 108M words.
Comparisons We compare our system with two template matching baselines and the state of the art on the SemEval ScienceIE task. The first baseline (Gupta and Manning, 2011) is an unsupervised method that extracts keyphrases by initially using seed patterns in a dependency tree, and then adding to the seed patterns through bootstrapping. The second baseline (Tsai et al., 2013) improves the work of Gupta and Manning (2011) by adding named entity features and using a different set of seed patterns.
Implementation details All parameters are tuned on dev set performance; the best parameters are selected and fixed for the model switching and semi-supervised systems. The word embedding dimension is 250; the token-level hidden dimension is 100; the character-level hidden dimension is 25; and the optimization algorithm is SGD with a learning rate of 0.05. For building the graph, the best pre-trained embeddings for the supervised system (Sec. 7.2) are used in each domain. Two special tokens, BOS and EOS, are added when pre-training, indicating the beginning and end of a sentence. The number of graph vertices is 2M in the transductive setting and 1.4M in the inductive setting. The ULM parameter η in Eq. 4 is tuned from 0.1 to 0.9; the best η is 0.4. The best parameters of label propagation are $\mu = 10^{-6}$ and $\nu = 10^{-5}$. The interpolation parameter α in Eq. 3 is tuned from 0.1 to 0.9; the best α is 0.3. We iterate semi-supervised learning until we obtain the best result on the dev set, which is mostly achieved in the second round.
We use the Stanford CoreNLP (Manning et al., 2014) tokenizer to tokenize words. The tokenizer is augmented with a few hand-designed rules to handle equations (e.g., "fs(B,t)=Spel(t)S" is a single token) and other non-standard word phenomena (Cu40Zn, 20MW/m2) in scientific literature. We use Approximate Nearest Neighbor Searching (ANN) to calculate the k-nearest neighbors. For all experiments in this paper, k = 10.
Setup We evaluate our system in both inductive and transductive settings. The systems with a * superscript in the table are transductive. The inductive setting uses the 400 full articles in the ScienceIE training and dev sets, while the transductive setting uses all 500 full articles, including the test set. In both settings, parameters are tuned on the dev set.
We evaluate our NN-CRF model in both supervised and semi-supervised settings. We also perform ablations and try different variants to best understand our model.

Best Case System Performance
Table 1 reports the results of our neural sequence tagging model NN-CRF in both supervised and semi-supervised learning (ULM and graph-based), and compares them with the baselines and the state-of-the-art (best SemEval System (Augenstein et al., 2017)).
Augenstein and Søgaard (2017) use a multi-task learning strategy to improve the performance of supervised keyphrase classification, but they only report dev set performance on SemEval Task 10; we also include their result here and refer to it as MULTITASK. We report results for both span identification (SemEval Subtask A) and span classification into TASK, PROCESS and MATERIAL (SemEval Subtask B). The results show that our neural sequence tagging models significantly outperform the state of the art and both baselines. This confirms that our neural tagging model outperforms other non-neural and neural models for the SemEval ScienceIE challenge. It further shows that our system achieves a significant boost from semi-supervised learning using unlabeled data. Table 5 shows the detailed analysis of the system across different categories.
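As an illustration of how the two subtasks differ, the sketch below scores predicted spans against gold spans with exact-match precision, recall and F1. Exact boundary matching and the tuple format are assumptions made for illustration; this is not the official SemEval scorer, and the spans are invented.

```python
def span_f1(gold, pred):
    """Exact-match precision, recall and F1 over sets of spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (start, end, label) spans for one paragraph.
gold = [(2, 4, "Task"), (6, 6, "Process"), (9, 10, "Material")]
pred = [(2, 4, "Task"), (6, 6, "Material"), (9, 10, "Material")]

# Classification: boundaries and label must both match.
classification = span_f1(gold, pred)

# Identification: only the boundaries must match, so labels are dropped.
identification = span_f1([s[:2] for s in gold], [s[:2] for s in pred])

print(classification, identification)
```

In this toy case every boundary is found, so identification is perfect, while one wrong label lowers the classification score; the gap between the two numbers isolates type confusions from span errors.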

Supervised Learning
Impact of Neural Model Components Table 2 provides the results of an ablation study on the dev set showing the impact of different components of our NN-CRF on the scientific IE task. For the basic model, the word embeddings are initialized by word2vec trained on the 350 full journal articles in the SE training set together with Wikipedia and ScienceIE data. The feature layer, character layer, and bi-LSTM word layers all improve the performance. Moreover, we observe a large improvement (20.6% relative) in the scientific IE task by adding the CRF layer.
Initialization Table 3 reports our NN-CRF performance when pretrained on different domains. We explore different word embedding pretraining with the ScienceIE training set alone (SE), and adding other external resources including Wikipedia (Wiki) and Computer Science articles (ACM). All alternatives use word2vec. Compared with using SE alone, the introduction of any external data source improves performance. Moreover, we observe that with the introduction of the ACM dataset, the performance on the CS domain is increased significantly in both the dev and test sets. Adding Wikipedia data benefits all three domains, with more significant improvement on the MS and Physics domains.
Based on these observations, we select the best model on each domain according to the dev set and use the combined result as our best supervised system (called NN-CRF(supervised)). The F1 score improves from 39.4 to 40.2 when applying the model switching strategy. The best model on the dev set is used for each domain: for the MS and Physics domains, we pretrain word embeddings with SE and Wiki, and for the CS domain, we pretrain with SE and ACM.
i) For the inductive setting, GRAPHINTERP only uses unannotated data from the dev set and uses the best model for decoding at test time. For the transductive setting, GRAPHINTERP* uses unannotated data from the test set to build the graph as well, and tunes the parameters on the dev set. ii) GRAPHFEAT uses the smoothed posterior from label propagation as an additional feature to the neural network and only has a transductive setting. As expected, the transductive approaches consistently outperform the inductive approaches on the test set. With around the same performance on the dev set, GRAPHINTERP* seems to generalize better on the test set, with 1.6% relative improvement over GRAPHINTERP. We observe higher improvement with GRAPHFEAT* compared to GRAPHINTERP. This is mainly because automatically learning the weight matrix M between neural network scores and graph outputs adds more flexibility compared to tuning an interpolation weight α. The performance is further improved by applying data selection through modifying the objective to ULM. The best inductive system is ULM+GRAPHINTERP, with 5.6% relative improvement over pure self-training that makes hard decisions, and the best transductive system is ULM+GRAPHFEAT*, with 8.6% relative improvement.

Category and Span Analysis
Table 5 details the performance of our method on the three categories at the span and token level. We observe significant improvement by using ULM+GRAPHINTERP and ULM+GRAPHFEAT over the best SemEval system and our best supervised system on all three categories at both token and span levels. We further observe that system performance on TASK classification is much lower than on PROCESS and MATERIAL. This is in part because TASK is much less frequent than the other types. In addition, TASK keyphrases often include verb phrases, while the other two categories mainly consist of noun phrases. An analysis of confusion patterns shows that the most frequent type confusions are between PROCESS and MATERIAL. However, we observe that ULM+GRAPHFEAT* can greatly reduce the confusion, with 3.5% relative improvement on PROCESS and 3.6% relative improvement on MATERIAL over ULM+GRAPHINTERP at the token level.

Error Analysis
We provide examples of typical errors that our system makes in Table 6. As described in the previous subsection, TASK is the hardest type to identify with our system. Row 1 shows a failure to detect the verb phrase following 'to' as part of the TASK, while 'enantiopure products' is detected as MATERIAL. The system prefers to predict PROCESS or MATERIAL since those classes have more samples than TASK. Row 2 illustrates the problem of identifying general terms as keyphrases due to similar context, such as 'receptors' and 'drug action'. A third common error involves incorrectly labeling adjectives, such as 'neighbouring' in Row 3, which leads to span errors. Another common cause of error is insufficient context: in the last example, a larger context is needed to determine whether 'SWE' is a PROCESS or MATERIAL.

Conclusion
This paper casts the scientific information extraction task as a sequence tagging problem and introduces a hierarchical LSTM-CRF neural tagging model for this task, building on recent results in NER. We introduced a semi-supervised learning algorithm that incorporates graph-based label propagation and confidence-aware data selection. We show that the introduction of semi-supervision significantly improves on the performance of the supervised LSTM-CRF tagging model. We additionally show that external resources are useful for initializing word embeddings. Both inductive and transductive semi-supervised strategies achieve state-of-the-art performance in the SemEval 2017 ScienceIE task. We also conducted a detailed analysis of the system and pointed out common error cases.
In our experiments, we observe that including only in-domain data for semi-supervised learning gives slightly better performance than using cross-domain data. Reducing the amount of in-domain data hurts performance. Therefore, adding more in-domain unlabeled data may help when combined with selection schemes such as the ULM algorithm proposed here. It would be useful to assess the impact of matched unlabeled data for the physics and material science domains. Other future work includes leveraging global context and citation network information.
Figure 1:
Physics: [Local field effects]Process on spontaneous emission rates within [nanostructure photonics material]Material for example are familiar, and have been well used.
Material Science: The [Kelvin probe force microscopy technique]Process allows [detection of local EWF]Task between an [atomic force microscopy]Material and [metal surface]Material.

Figure 2: Label propagation. Gray nodes indicate labeled data while white nodes are unlabeled. The word in bold font indicates the current token. The assumption is that if two instances are similar according to the graph, their output labels should be similar.

Figure 3: Lattice representation of ULM. The dashed box is the uncertain token, which is marginalized over. Arrows and grey nodes are paths summed over during training. When all tokens are confident, the score of only one path is calculated.

Table 1: Overall span-level F1 results for keyphrase identification (SemEval Subtask A) and classification (SemEval Subtask B). * indicates the transductive setting. + indicates not documented as either transductive or inductive. - indicates score not reported or not applicable.

Table 2: Ablation study showing the impact of neural network configurations of our NN-CRF(supervised) model on the dev set.

Table 3: F1 score on the dev and test sets when using different sources of data for pretraining.

Table 4: For the inductive setting, GRAPHINTERP only uses unannotated data from the dev set and uses the best model for decoding at test time. For the transductive setting, GRAPHINTERP* uses unannotated data from the test set to build the graph as well.

Table 5: F1 score results on the test set for different categories: T indicates TASK, P indicates PROCESS, M indicates MATERIAL, and K indicates keyword identification (Subtask A). * indicates a transductive model.
in aiming to [achieve [enantiopure products]Material]Task is therefore a means to [quantitate [the enantiometric excess]Process]Task.
General terms: Since the [receptors]Material in human biology mostly consist of [chiral molecules]Material, [drug action]Process mostly involves a specified enantiometric form.
Falsely predicted adjectives: It has been shown that the most efficient forms of energy transfer between the two occurs when there is a [neighbouring carotenoid species]Material.
Lack of context: Other models use [SWEs]Material Process but focus on the use of multi resolution grids or irregular mesh.

Table 6: Common errors, where blue marks gold labels that our system misses, red marks falsely predicted results, and green marks correctly predicted spans.