Logician and Orator: Learning from the Duality between Language and Knowledge in Open Domain

We propose the task of Open-Domain Information Narration (OIN) as the reverse task of Open Information Extraction (OIE), to implement the dual structure between language and knowledge in the open domain. We then develop an agent, called Orator, to accomplish the OIN task, and assemble the Orator and the recently proposed OIE agent, Logician, into a dual system that exploits the duality structure with a reinforcement learning paradigm. Experimental results reveal that the dual structure between the OIE and OIN tasks helps to build better OIE and OIN agents.


Introduction
The duality between language and knowledge is natural for human intelligence. Humans can extract knowledge from natural language to learn or remember, and then narrate the knowledge back into natural language to communicate. Information extraction (IE), the task that simulates the first part of the duality, has long been a hot spot of NLP research. Recently, the task that fulfills the second part of the duality, that is, assembling a set of relation instances/facts or database records into natural language sentences/documents, has also attracted much interest (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017; Yin et al., 2016). In the literature, this task has been referred to as "data to document generation" (Wiseman et al., 2017) or "knowledge-to-text" (Chisholm et al., 2017). In this paper, we name the task information narration (IN), to emphasize its reverse relationship to the information extraction (IE) task.
The duality between language and knowledge (and thus between the IE and IN tasks) can be examined in the closed domain or the open domain. For the closed-domain problem, the closed-domain IE (CIE) task is often referred to as "relation extraction" or "relation classification", which identifies instances of a fixed and finite set of relations from a natural language corpus, using supervised methods (Kambhatla, 2004; Zelenko et al., 2003; Miwa and Bansal, 2016; Zheng et al., 2017) or weakly supervised methods (Mintz et al., 2009; Lin et al., 2016). Meanwhile, the closed-domain IN (CIN) task (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017; Yin et al., 2016) transforms a set of facts with a pre-defined schema or relation types (such as facts from Freebase (Bollacker et al., 2008), DBpedia (Auer et al., 2007), or database tables) into natural language sentences/documents. Furthermore, the dual structure between the CIE and CIN tasks has been noticed and utilized in (Chisholm et al., 2017).
For the open-domain problem, the open-domain IE (OIE) task is to investigate how natural language sentences express facts, and then to use the learned knowledge to extract entity- and relation-level intermediate structures from open-domain sentences (Schmitz et al., 2012; Pal and Mausam, 2016). Although the OIE task has attracted much interest and found many applications (Christensen et al., 2013, 2014; Mausam, 2016; Stanovsky et al., 2015; Khot et al., 2017; Fader et al., 2014), neither the OIN task nor the duality between language and knowledge in the open domain has been stated.
              Open-Domain   Closed-Domain
Extraction    OIE           CIE
Narration     OIN           CIN
Table 1: Tasks involved in the duality between language and knowledge.

The tasks involved in the duality between language and knowledge are shown in Table 1, where the OIN task has not been stated. In this paper, we focus on the OIN task and the duality between the OIE and OIN tasks, for the following reasons: 1) The OIN task is an essential component of the open-domain information processing pipeline. For example, it is helpful for building natural and informative responses for open-domain KBQA systems (Khot et al., 2017; Fader et al., 2014). 2) As the results in this paper will illustrate, the duality between tasks can be valuable for building better agents for both tasks (Xia et al., 2016, 2017). A major historical obstacle to investigating the duality between the OIE and OIN tasks is the absence of a parallel corpus of natural language sentences and open-domain facts. Recently, the SAOKE dataset (Sun et al., 2018) was released, which contains more than forty thousand human-labeled open-domain sentence-facts pairs, and thus essentially eliminates the obstacle for our investigation.
The contributions of this paper are as follows:
• We propose the concept of the OIN task, which is potentially an important component of the open-domain information pipeline. We develop the Orator agent to fulfill the task;
• We build a multi-agent system with Logician and Orator to exploit the dual structure between language and knowledge in the open domain. Experimental results reveal that the dual information is beneficial for improving the performance of both agents.
The paper is organized as follows: Section 2 discusses the related work. Section 3 explains the Orator agent for the OIN task. Section 4 describes the multi-agent system with Logician and Orator and its algorithm to learn from the duality between language and knowledge. The experimental results of the fine-tuned agents are shown and discussed in Section 5. We conclude our work and discuss future directions in Section 6.
Related Work

Information Narration
The closed-domain information narration (CIN) task has been studied in (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017). These CIN agents address different problem domains, from people's biographies to basketball game records, but most of them follow the same sequence-to-sequence pattern. First, the algorithm encodes a sequence of facts into a set of annotations and then decodes the annotations into a natural language text. Mechanisms such as attention and copying (Gu et al., 2016) are employed in the decoder to improve the performance. Then, the models are trained on a supervised dataset with backpropagation.
In this work, we adopt a similar sequence-to-sequence architecture to build our baseline Orator agent, but with the following differences: 1) the Orator is proposed to narrate open-domain facts, where the encoder must encode words rather than the entities and relations of a closed domain; 2) the baseline Orator will be fine-tuned using the dual learning algorithm proposed in this paper.

Dual Learning Systems
For many natural language processing tasks, there exist corresponding reverse/dual tasks. One example of a pair of dual problems is question answering (QA) and question generation (QG). The duality between the QA and QG problems has been treated as a constraint that both problems must share the same joint probability; a loss function implementing the constraint is then added to the supervised learning procedures of both agents. Furthermore, researchers (Sachan and Xing, 2018) use both the question-answering agent and the question-generation agent to identify extra high-confidence question-answer pairs, which are further used to fine-tune the pre-trained agents.
Back-and-forth translation (or round-trip translation) is another example of duality, in the field of machine translation. It has been employed to evaluate the quality of machine translation systems (Van Zaanen and Zwarts, 2006), or to test the suitability of text for machine translation (Gaspari, 2006; Shigenobu, 2007). Recently, (Xia et al., 2016) implemented the duality in a neural-based dual learning system, in which the quality of each translation agent was improved on the unlabeled dataset through reinforcement learning, using the rewards provided by its corresponding dual agent.
Parsing-reconstruction is also a pattern of duality. (Konstas et al., 2017) considered the AMR (Abstract Meaning Representation) (Banarescu et al., 2013) parsing problem (text to AMR) and the AMR generation problem (AMR to text) in one system, in which the AMR parser generated extra text-AMR pair data to fine-tune the AMR generator. The AMR generator, however, does not contribute to the performance improvement of the AMR parser. The CIE agent and CIN agent in (Chisholm et al., 2017) also follow this pattern, where both agents help each other to improve by sharing weights. Nevertheless, the weight-sharing strategy cannot be applied to agents with different architectures, which is a typical situation in practice.
From these practices, it can be seen that the duality can be implemented with two major approaches: 1) by providing additional labeled samples via bootstrapping, and 2) by adding losses or rewards to the training procedure of the agents. In this paper, we follow the second approach. We design a set of rewards, among which some are related to the OIE and OIN tasks respectively, and some are related to the duality of the problems. Then we optimize both agents using the reinforcement learning technique. The learning algorithm is similar to the dual-NMT algorithm described in (Xia et al., 2016), but with adaptation for the OIE and OIN tasks, especially in the task-related rewards. Compared to the approach of applying a regularization that enforces the same joint probability, our approach directly optimizes the task objective by introducing task-related rewards. Furthermore, our approach is more adaptable than the weight-sharing approach adopted in (Chisholm et al., 2017).

SAOKE Dataset
Symbolic Aided Open Knowledge Expression (SAOKE) is proposed in (Sun et al., 2018) as a form for faithfully recording the facts that humans can extract from sentences when reading them. SAOKE uses a unified form, an n-ary tuple $(subject, predicate, object_1, \cdots, object_N)$, to express four categories of facts: 1) Relation: Verb/preposition-based n-ary relations between entity mentions; 2) Attribute: Nominal attributes for entity mentions; 3) Description: Descriptive phrases of entity mentions; 4) Concept: Hyponymy and synonymy relations among concepts and instances.
Using this SAOKE format, Sun et al. (2018) manually labeled the SAOKE dataset $D_{SAOKE}$ by crowdsourcing, which includes more than forty thousand sentence-facts pairs $\langle S, F \rangle$. The labeling procedure is under the supervision of the "Completeness" criterion (Sun et al., 2018), so the facts record as much of the information in the sentence as possible (only auxiliary information and relations between facts are omitted (Sun et al., 2018)). As a result, the SAOKE dataset is a valid open-domain sentence-facts parallel dataset for both the OIE and OIN tasks. Table 2 is an example from the SAOKE dataset for an easy understanding of the dual relationship between sentence and facts.

Model
The Orator is an agent O that assembles a set of open-domain facts F into a sentence S with probability $P_O(S|F, \Theta_O)$, where $\Theta_O$ is the set of parameters of O. For each pair $\langle S, F \rangle \in D_{SAOKE}$, the set of facts F is actually expressed as a sequence of facts, in the order in which the labeler wrote them. So the deep sequence-to-sequence paradigm is suitable for modeling the Orator. In this work, we build the base Orator model with the attention-based sequence-to-sequence model, together with the copy and coverage mechanisms, similarly to the implementation of the Logician in (Sun et al., 2018).

Attention based Sequence-to-sequence Learning
The attention-based sequence-to-sequence learning first encodes the input fact sequence F into a sequence of hidden states $H^F$. Then, when generating word $w_t$ of the target sentence, the decoder computes the probability $p(w_t \mid \{w_1, \cdots, w_{t-1}\}, c_t)$ of generating $w_t$ with the word generation model $g$, where $s_t$ is the hidden state of the GRU decoder and $c_t$ is the dynamic context vector, which focuses attention on specific locations $l$ in the input hidden states $H^F$.
For the Orator, we use the copy mechanism to implement the word generation model $g$ and use the coverage mechanism to compute the dynamic context vector $c_t$.

Copy Mechanism
In the SAOKE dataset, the words in the set of facts (excluding the external symbols) must appear in the corresponding sentence, so the problem is well suited to the copy mechanism (Gu et al., 2016). In the copy mechanism, when the decoder considers generating a word $w_t$, the word can either be copied from the source fact sequence F or selected from a vocabulary V; the generation probability combines $p_F$, the probability of copying from F, and $p_V$, the probability of selecting from V. The details can be found in (Gu et al., 2016).
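As a rough illustration of how copying and vocabulary selection can be combined, the following sketch normalizes generation scores over V and copy scores over source positions in a single softmax, and then sums the two contributions for each candidate word. The function name, the score inputs and the shared normalization are simplifying assumptions made for illustration only; the formulation actually used by the Orator is the one in (Gu et al., 2016).

```python
import math
from collections import defaultdict


def copy_or_generate(vocab_scores, copy_scores, source_words):
    """Mix vocabulary and copy scores into one word distribution.

    vocab_scores: dict word -> unnormalized score for selecting from V
    copy_scores:  list of unnormalized scores, one per source position
    source_words: list of words in the fact sequence F (same length)
    All three inputs are hypothetical stand-ins for decoder outputs.
    """
    # Shared normalizer over both modes (an assumption of this sketch).
    z = sum(math.exp(s) for s in vocab_scores.values())
    z += sum(math.exp(s) for s in copy_scores)

    probs = defaultdict(float)
    for word, score in vocab_scores.items():
        probs[word] += math.exp(score) / z  # contribution of p_V
    for word, score in zip(source_words, copy_scores):
        probs[word] += math.exp(score) / z  # contribution of p_F
    return dict(probs)


# Toy call: "port" can be both generated and copied, "timber" only copied.
dist = copy_or_generate(
    vocab_scores={"the": 0.0, "port": 0.0},
    copy_scores=[0.0, 0.0],
    source_words=["port", "timber"],
)
```

In this toy call, a word that occurs both in V and in F accumulates probability mass from both modes, while an out-of-vocabulary source word can still be produced through copying.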

Coverage Mechanism
To cope with the problem of information loss or redundancy in the generated sentence, the copy history of previously generated words should be remembered to guide future generation. This can be done through the coverage mechanism (Tu et al., 2016), in which a coverage vector $m^t_j$ is introduced for each word $w^F_j$ in F and updated at each step t as a gated function of $h^F_j$, $\alpha_{tj}$, $s_{t-1}$, and $m^{t-1}_j$. By this means, the coverage vectors remember the historical attention over the source sequence and can be incorporated in the alignment model to generate complete and non-redundant sentences. Detailed formulations can be found in (Tu et al., 2016) and (Sun et al., 2018).

Learning the Dual Structure between Knowledge and Natural Language

Dual Structure between Orator and Logician
In (Sun et al., 2018), an agent L, called Logician, was trained to convert a sentence S into a set of facts F with probability $P_L(F|S, \Theta_L)$, where $\Theta_L$ is the set of parameters of L: Logician L: $S \rightarrow P_L(F|S, \Theta_L)$.
Obviously, the Logician and Orator can cooperate to supervise each other. Given $\langle S, F \rangle \in D_{SAOKE}$, the Logician produces a predicted set of facts $F^*$ for the sentence S, and the Orator can calculate the probability $P_O(S|F^*, \Theta_O)$ of reconstructing S from $F^*$. Intuitively, if $F^*$ loses major information of S, faithfully reconstructing S from $F^*$ would be impossible, and thus the probability $P_O(S|F^*, \Theta_O)$ would be small. This probability is therefore a strong signal for evaluating the quality of $F^*$. Similarly, when the Orator produces a sentence $S^*$ for the set of facts F, the probability $P_L(F|S^*, \Theta_L)$ provided by the Logician is a strong signal for evaluating the quality of $S^*$. These signals help to overcome several problems of the original agents, including information loss, information redundancy, and non-fluency.
Note that the supervision signals $P_O(S|F^*, \Theta_O)$ and $P_L(F|S^*, \Theta_L)$ do not rely on any supervised parallel corpus. Thus, similar to the application of the dual learning paradigm to the NMT task (Xia et al., 2016), it is theoretically possible to use non-parallel sentences and sets of facts to compute these signals. However, unsupervised collections of fact groups that can be reasonably narrated in a sentence are not naturally available. Currently, the only available collection is the sets of facts provided by the SAOKE dataset, where the supervised information is available. As a result, we implement our dual learning system in a supervised approach, which uses the reinforcement learning algorithm to optimize the Orator and the Logician. The rewards involved are described in the next subsection, and the algorithm is detailed in the last subsection.

Rewards
Given $\langle S, F \rangle \in D_{SAOKE}$, we sample a set of facts $F^*$ from the distribution $P_L(\cdot|S, \Theta_L)$ and a sentence $S^*$ from the distribution $P_O(\cdot|F, \Theta_O)$. The following rewards are introduced into the proposed dual learning system, and the relationships between them are shown in Figure 1.

Reconstruction Rewards
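Following the idea described in the above subsection, we design the reconstruction reward for the Orator from the probability $P_L(F|S^*, \Theta_L)$ that the Logician recovers F from $S^*$, and that for the Logician from the probability $P_O(S|F^*, \Theta_O)$ that the Orator reconstructs S from $F^*$.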

Similarity Rewards
Since the SAOKE dataset has label information, the similarities between the predicted results and the ground truths can be used as rewards.
For the Orator, since $S^*$ can be viewed as a summarization of S, we evaluate the quality of $S^*$ with ROUGE-L (Lin, 2004), a measure widely used in the text summarization field, computed between $S^*$ and S. For the Logician, we use the following procedure to calculate the similarity between F and $F^*$. First, we compute the similarity SimFact between each predicted fact $f^* \in F^*$ and each ground-truth fact $f \in F$ from the element-wise string similarities $SimStr(f^*_i, f_i)$, where $f^*_i$ and $f_i$ denote the i-th elements of the tuples of facts $f^*$ and $f$, $SimStr(\cdot, \cdot)$ denotes the gestalt pattern matching (Ratcliff and Metzener, 1988) measure for two strings, and $|\cdot|$ is the cardinality function. Then, each predicted fact in $F^*$ is aligned to its corresponding ground-truth fact in F by solving a linear assignment problem (Wikipedia, 2017) to maximize the sum of similarities between the aligned facts. Finally, the similarity reward for the Logician is calculated from the similarities of the aligned pairs of facts $f^* \in F^*$, $f \in F$.
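The alignment procedure for the Logician's similarity reward can be sketched as follows. This is only an illustration under stated assumptions: string similarity is taken from Python's `difflib.SequenceMatcher`, a close relative of gestalt pattern matching; the normalization by the longer tuple and by the larger fact set is our guess rather than the exact formula; and the assignment problem is solved by brute force, which is practical only for a handful of facts. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def sim_str(a, b):
    """String similarity in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def sim_fact(pred, gold):
    """Element-wise similarity of two fact tuples.

    Normalizing by the longer tuple is an assumption of this sketch.
    """
    if not pred or not gold:
        return 0.0
    total = sum(sim_str(p, g) for p, g in zip(pred, gold))
    return total / max(len(pred), len(gold))


def similarity_reward(pred_facts, gold_facts):
    """Score of the best one-to-one alignment between two fact sets."""
    n_pred, n_gold = len(pred_facts), len(gold_facts)
    if n_pred == 0 or n_gold == 0:
        return 0.0
    scores = [[sim_fact(p, g) for g in gold_facts] for p in pred_facts]
    best = 0.0
    if n_pred <= n_gold:
        # Assign every predicted fact to a distinct gold fact.
        for perm in permutations(range(n_gold), n_pred):
            best = max(best, sum(scores[i][j] for i, j in enumerate(perm)))
    else:
        # Assign every gold fact to a distinct predicted fact.
        for perm in permutations(range(n_pred), n_gold):
            best = max(best, sum(scores[i][j] for j, i in enumerate(perm)))
    # Unmatched facts on the larger side lower the reward (assumption).
    return best / max(n_pred, n_gold)
```

For realistic set sizes, a polynomial-time solver for the linear assignment problem would replace the enumeration of permutations; the reward itself would be unchanged.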

Validity Rewards
For the Orator, the output is expected to be a valid natural language sentence, so the validity reward is defined through a language model $LM(\cdot)$ applied to $S^*$. For the Logician, the output should represent a valid collection of facts, which means: 1) the output can be parsed into a collection of facts; 2) there is no duplicated fact (identified by a SimFact value larger than 0.85) in the parsed collection. The validity reward for the Logician is defined according to these two conditions.

Algorithm 1 A simple dual-learning algorithm for facts extraction and expression
Require: A set of sentence-facts pairs $\{\langle S, F \rangle\}$; an initial Logician L and an initial Orator O; beam size K.
repeat
    Sample a sentence-facts pair $\langle S, F \rangle$;
    Logician produces K sets of facts $F_1, \cdots, F_K$ from S via beam search;
    for each set of facts $F_i$ do
        Compute the reward $r_{F_i}$ for $F_i$;
    end for
    Compute the total reward $r = \frac{1}{K} \sum_{i=1}^{K} r_{F_i}$;
    Compute the stochastic gradient of $\Theta_L$;
    Model updates;

    Sample a sentence-facts pair $\langle S, F \rangle$;
    Orator produces K sentences $S_1, \cdots, S_K$ from F via beam search;
    for each sentence $S_i$ do
        Compute the reward $r_{S_i}$ for $S_i$;
    end for
    Compute the total reward $r = \frac{1}{K} \sum_{i=1}^{K} r_{S_i}$;
    Compute the stochastic gradient of $\Theta_O$;
    Model updates;
until convergence

Algorithm
For each pair $\langle S, F \rangle \in D_{SAOKE}$, the following procedures are performed in turn (details are shown in Algorithm 1):

Learning from Sentence to Facts
We sample $F^*$ from the Logician $P_L(\cdot|S, \Theta_L)$ and calculate the total reward $r_L$ for $F^*$ as a combination of the rewards above, weighted by coefficients $\alpha_i$ with $\sum_i \alpha_i = 1$. The gradients of the expected reward $E[r_L]$ with respect to the parameters of the agents can be computed according to the policy gradient theorem (Sutton et al., 1999), where $D_{\Theta_L}(F, S) = \nabla_{\Theta_L} \log P_L(F|S, \Theta_L)$.

Learning from Facts to Sentence
We sample $S^*$ from the Orator $P_O(\cdot|F, \Theta_O)$ and define the total reward for $S^*$ analogously, with coefficients $\beta_i$. The gradients are computed in the same manner. In practice, we use beam search (Sutskever et al., 2014) to obtain high-quality samples as $F^*$ and $S^*$, and estimate the true gradient with the empirical average of gradients over these samples.
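To make the estimate concrete, the sketch below combines the three kinds of reward for each beam candidate and averages the reward-weighted gradients of the log-probabilities over the K candidates. It is a simplified illustration and not the exact update of Algorithm 1: the weighted-sum form of the total reward, the function names, and the representation of gradients as plain lists of floats are assumptions, and only the term for the sampling agent's own parameters is shown.

```python
def total_reward(rewards, weights):
    """Weighted sum of (reconstruction, similarity, validity) rewards.

    The weights play the role of the alpha_i (or beta_i) coefficients
    and are expected to sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return sum(w * r for w, r in zip(weights, rewards))


def beam_gradient_estimate(sample_rewards, sample_grads):
    """Average of reward-weighted log-probability gradients.

    sample_rewards: K scalar total rewards, one per beam candidate
    sample_grads:   K gradient vectors of log P(candidate | input),
                    given here as plain lists of floats
    """
    k = len(sample_rewards)
    dim = len(sample_grads[0])
    estimate = [0.0] * dim
    for reward, grad in zip(sample_rewards, sample_grads):
        for d in range(dim):
            estimate[d] += reward * grad[d] / k
    return estimate


# Toy usage with K = 2 candidates and a 2-dimensional parameter vector.
r1 = total_reward((0.2, 0.8, 1.0), (0.5, 0.25, 0.25))
r2 = total_reward((0.6, 0.4, 1.0), (0.5, 0.25, 0.25))
grad = beam_gradient_estimate([r1, r2], [[1.0, 0.0], [0.0, 1.0]])
```

Setting the first weight to zero removes the reconstruction term, so that each agent is driven only by its own similarity and validity rewards.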

Experimental Design
First, we evaluate the performance of each agent fine-tuned by the dual learning procedure on the SAOKE dataset. Then we evaluate the Orator on noisy facts, which accords with real OIN application scenarios. Last, we investigate the behavior of agents in the dual system.
In the experiments, the SAOKE dataset is split into the training set, validating set and testing set with ratios of 80%, 10%, and 10%, respectively. For each algorithm involved in the experiments, we perform grid search to find the optimal hyperparameters, and the model with the best performance on the validating set is chosen as the learnt model to be evaluated on the testing set.

Evaluation Metric
For the Orator, BLEU-4 and ROUGE-L are used to measure how well the output matches the ground truth sentence.
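For reference, ROUGE-L can be computed from the longest common subsequence (LCS) of the output and the ground-truth sentence. The sketch below operates on token lists and is a minimal illustration: the tokenization and the default `beta = 1.0` are assumptions, not settings reported in this paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-measure between a candidate and a reference."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

Because the LCS respects word order without requiring contiguity, the score rewards outputs that keep the reference's content in sequence even when extra words are interleaved.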
For the Logician, based on the fact-equivalence judgment proposed in (Sun et al., 2018), we compute the Precision (P), Recall (R) and F1-score over the testing set of the SAOKE dataset as the evaluation metrics.
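A minimal sketch of these metrics is given below. The fact-equivalence judgment of (Sun et al., 2018) is not reproduced here: it is passed in as a hypothetical predicate `is_equivalent`, and aggregating the counts over all test sentences (micro-averaging) is an assumption of this sketch.

```python
def precision_recall_f1(predicted, gold, is_equivalent):
    """Precision, recall and F1 over aligned lists of fact sets.

    predicted, gold: lists of fact collections, one per test sentence
    is_equivalent:   hypothetical predicate deciding whether a predicted
                     fact and a ground-truth fact match
    """
    n_pred = n_gold = pred_hits = gold_hits = 0
    for pred_facts, gold_facts in zip(predicted, gold):
        n_pred += len(pred_facts)
        n_gold += len(gold_facts)
        # Predicted facts that match some ground-truth fact.
        pred_hits += sum(
            any(is_equivalent(p, g) for g in gold_facts) for p in pred_facts
        )
        # Ground-truth facts that are recovered by some predicted fact.
        gold_hits += sum(
            any(is_equivalent(p, g) for p in pred_facts) for g in gold_facts
        )
    precision = pred_hits / n_pred if n_pred else 0.0
    recall = gold_hits / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With exact tuple equality as the predicate, two predicted facts of which one is correct, against three ground-truth facts, give P = 0.5, R = 1/3 and F1 = 0.4.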

Agent Implementation
For the Orator, we build a vocabulary V of size 72,591 by collecting all web pages from the Baidu Baike website (a Chinese alternative to Wikipedia) and identifying the words that occur in more than 100 web pages. The dimension of embedding vectors is set to $N_e = 256$, and the dimension of hidden states is set to $N_h = 256$. We use a three-layer bi-directional GRU with dimension 128 as the encoder. All dimensions of hidden states in the decoder are set to 256.
For the Logician, we implement the model described in (Sun et al., 2018), including the shallow tag information and the gated dependency attention mechanism.
Furthermore, to provide an intuitive comprehension of the OIN task, we implement a rule-based method for the OIN task. For each sequence of facts in the SAOKE dataset, the method first identifies the subsequences in which the facts share the same subject. Then it preserves the subject of the first fact in each subsequence and removes the subjects of the following facts (by replacing them with an empty string). This is necessary since the SAOKE dataset requires the shared subject to be repeated for completeness of the related facts. Finally, each fact is formatted into a string by filling the objects into the placeholders of the predicate, and these strings are concatenated with commas to form the final sentence.
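The rule-based method can be sketched in a few lines. The sketch makes several assumptions for illustration: facts are given as `(subject, predicate, object_1, ...)` tuples, the slot marker `{}` is a hypothetical placeholder symbol, objects without a slot are appended after the predicate, and words are joined with spaces for readability in English. The example facts are made up.

```python
PLACEHOLDER = "{}"  # hypothetical slot marker; the dataset's own symbol may differ


def format_fact(subject, predicate, objects):
    """Fill objects into the predicate's slots and prepend the subject."""
    remaining = list(objects)
    while PLACEHOLDER in predicate and remaining:
        predicate = predicate.replace(PLACEHOLDER, remaining.pop(0), 1)
    parts = [subject, predicate] + remaining  # leftover objects are appended
    return " ".join(p for p in parts if p)


def narrate(facts):
    """Rule-based narration of a sequence of (subject, predicate, *objects)."""
    clauses = []
    previous_subject = None
    for subject, predicate, *objects in facts:
        # Keep the subject only for the first fact of each same-subject run.
        shown = "" if subject == previous_subject else subject
        clauses.append(format_fact(shown, predicate, objects))
        previous_subject = subject
    return ", ".join(clauses)


# Made-up facts for illustration only.
facts = [
    ("the port", "is located in {}", "the north"),
    ("the port", "handles", "timber"),
    ("the city", "has", "a port"),
]
sentence = narrate(facts)
```

Here the second fact drops its repeated subject, which mirrors the subject-removal step described above.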

Reward Implementation
For the validity reward of the Orator, the language model is trained using an RNN-based method (Mikolov et al., 2010) with the same vocabulary V and the web pages from the Baidu Baike website.
For the reconstruction reward of the Orator, since the Logician needs the shallow tag and dependency information of S * as inputs, the information is extracted using the LTP tool-set (Che et al., 2010) and then fed to the Logician.

Training
When training the base model for each agent, the batch size is set to 20. When training the two agents in dual learning, the batch size is set to 12, and the beam size is set to 3. Both agents are trained using stochastic gradient descent (SGD) with the RMSProp strategy (Hinton et al., 2012) and an early-stopping strategy on the validating set. In dual learning, the hyperparameters, including $\alpha_i$ and $\beta_i$, are determined by grid search.

Evaluation of Agents on the SAOKE dataset
First, we evaluate the performance of the agents optimized by the dual learning method. To identify the contribution of the dual structure, we train another pair of agents with $\alpha_1 = 0$ and $\beta_1 = 0$ in Algorithm 1 to exclude the dual information. Without the dual information, these two agents are trained independently of each other with reinforcement learning on their own supervised information. We name these two agents R-Logician and R-Orator, where "R" means "Reinforced". In the experimental results of this paper, a superscript mark on a result means that the result is significantly different (with p = 0.05) from the corresponding result of the agent denoted by that mark.

The experimental results for the Logician agents are shown in Table 3, from which we can observe a significant performance improvement from Logician to R-Logician and also from R-Logician to Logician@Dual. The experimental results for the Orator agents are shown in Table 4. The neural-based Orator agents significantly outperform the rule-based agent. On both evaluation metrics, the R-Orator and Orator@Dual significantly outperform the original Orator. The Orator@Dual significantly outperforms the R-Orator on the BLEU-4 score, but is not significantly different on the ROUGE-L score.

By comparing the performance of the R-agents and the agents@Dual, we can observe that the agents@Dual generally achieve better performance on precision, but may recall less information, resulting in smaller advances in the balanced evaluation metrics (F1 and ROUGE-L). This may imply that the agents tend to provide easy input for each other for higher accuracy, by neglecting some difficult parts of the problem which they currently cannot handle properly. This interesting phenomenon is the subject of our future research.

Evaluation of Orator on Noisy Facts
Experiments in the previous subsection show the performance of the Orators in narrating a set of human-labeled facts. In practice, however, the input to the Orator might not be perfect human-labeled facts, but noisy facts automatically extracted by OIE algorithms. In this subsection, we make a collection of sets of noisy facts by feeding the sentences in the testing set of the SAOKE dataset to the base Logician model and collecting the outputs. Then we evaluate the series of Orator models on these noisy facts, and report their performance at

Evaluation of the Dual System
In this section, we investigate the behavior of agents in the dual system. We first examine the procedure $F \xrightarrow{\text{Orator}} S^{*} \xrightarrow{\text{Logician}} F^{**}$, that is, for each F in the testing set of the SAOKE dataset, we let the Orator narrate it into a sentence $S^*$, and then let the Logician extract facts $F^{**}$ from $S^*$. The quality of $F^{**}$ is then measured by comparing it with F. Then we examine $S \xrightarrow{\text{Logician}} F^{*} \xrightarrow{\text{Orator}} S^{**}$, which is the reverse procedure. The comparison is made between the family of base agents and that of the dual-trained agents. The results are shown in Table 6, and two instances of these two experiments are shown in Tables 7 and 8, respectively. From these results, we can observe large improvements in reconstruction quality in both directions.

Conclusion
In this paper, we investigate the OIN task and its duality to the OIE task. The proposed Orator has shown its ability to fulfill the OIN task, that is, assembling open-domain facts into high-quality sentences. Furthermore, our attempt to utilize the duality between the OIN and OIE tasks to improve the performance of both OIN and OIE agents achieves preliminary success.
Our work suggests at least three future research topics: Firstly, one can enrich the theoretical study of the duality between the OIE and OIN tasks. Secondly, one can investigate how to overcome the barrier of the absence of an extensive collection of reasonable sets of open-domain facts and incorporate unsupervised information into this Logician-Orator dual learning structure for further improvement. Lastly, one may also be interested in developing task-oriented rewards for adapting the agent to a specific task, for example, the answer generation task for open-domain KBQA systems.
Original sentence: The throughput of all integrated cargoes kept a double-digit growth. Among them, the throughput of iron ore exceeded 45 million tons and the throughput of timber exceeded 6 million cubic meters, all of which hit record highs, and became the country's largest port of timber imports.
Base Models: The throughput of all integrated cargoes kept double-digit growth, breaking 45 million tons, breaking 6 million cubic meters, a new high, became the country's largest port of timber imports.
Dual Models: The throughput of all integrated cargoes kept double-digit growth. The throughput of iron ore exceeded 45 million tons and the throughput of timber exceeded 6 million cubic meters, all hit a record high and became the country's largest port of timber imports.

$\rightarrow S^{*}$ in English
Business legal person is with secondary (high school) or above, has certain managerial and operational capabilities, and is with strong philosophy of service and teamwork spirit.
Business legal person is with secondary (high school) or above, business legal person has certain managerial and operational capabilities, and is with strong philosophy of service and teamwork spirit.