CHIME: Cross-passage Hierarchical Memory Network for Generative Review Question Answering

We introduce CHIME, a cross-passage hierarchical memory network for question answering (QA) via text generation. It extends XLNet by introducing an auxiliary memory module consisting of two components: the context memory, which collects cross-passage evidence, and the answer memory, which works as a buffer that continually refines the generated answers. Empirically, we show the efficacy of the proposed architecture in multi-passage generative QA: it outperforms state-of-the-art baselines, producing more syntactically well-formed answers and addressing the questions of the AmazonQA review dataset with increased precision. An additional qualitative analysis reveals the interpretability introduced by the memory module.


Introduction
With the development of large-scale pre-trained Language Models (LMs) such as BERT (Devlin et al., 2018), XLNet, and T5 (Raffel et al., 2019), tremendous progress has been made in Question Answering (QA). Fine-tuning pre-trained LMs on task-specific data has surpassed human performance on QA datasets such as SQuAD (Rajpurkar et al., 2016) and NewsQA (Trischler et al., 2016). Nevertheless, most existing QA systems deal largely with factoid questions and assume a simplified setup such as multiple-choice questions, retrieving spans of text from given documents, or filling in the blanks. In many more realistic situations, such as online communities, people tend to ask 'descriptive' questions (e.g., 'How to improve the sound quality of echo dot?'). Answering such questions requires identifying, linking, and integrating relevant information scattered over multiple long-form documents in order to generate free-form answers.
We are particularly interested in developing a QA system for questions from e-shopping communities using customer reviews. Compared to factoid QA systems, building a review QA system faces the following challenges: (1) as opposed to extractive QA where answers can be directly extracted from documents or multiple-choice QA where systems only need to make a selection over a set of pre-defined answers, review QA needs to gather evidence across multiple documents and generate answers in freeform text; (2) while factoid QA mostly centres on 'entities' and only needs to deal with limited types of questions, review QA systems are often presented with a wide variety of 'descriptive' questions; (3) customer reviews may contain contradictory opinions. Review QA systems need to automatically identify the most prominent opinion given a question for answer generation.
In this work, we focus on the AmazonQA dataset (Gupta et al., 2019), which contains a total of 923k questions, most of which are associated with 10 reviews and one or more answers. We propose a novel Cross-passage Hierarchical Memory Network named CHIME to address the aforementioned challenges. Regular neural QA models search for answers by interactively comparing the question and supporting text, which is in line with human cognition in solving factoid questions (Zheng et al., 2019; Guthrie and Mosenthal, 1987). For opinion questions, however, the cognition process is deeper: reading larger-scale and more complex texts, building cross-text comprehension, continually refining the opinions, and finally forming the answers (Zheng et al., 2019). Therefore, CHIME is designed to maintain hierarchical dual memories that closely simulate this cognition process. In this model, a context memory dynamically collects cross-passage evidence, and an answer memory stores and continually refines the answers generated as CHIME reads supporting text in a sequential manner. Figure 1 illustrates the setup of our task and an example output generated by CHIME. The top box shows a question extracted from our test set, while the left panel and the right upper panel show the 10 related reviews and the 4 paired actual answers. We can observe that the question can be decomposed into complex sub-questions and that both reviews and answers contain contradictory information. However, CHIME can deal with such information effectively and generate appropriate answers, as shown in the right-bottom box.
Figure 1: Illustration of the review QA task and the general idea of CHIME. The example question (the top box) is paired with 10 reviews (left panel) and one or more answers (right upper panel). CHIME is trained on (Question, Review, Answer) triplets. During testing, CHIME is presented with a question and 10 related reviews and generates an answer (right bottom box). Both reviews and answers in this example contain contradictory information, as highlighted by colors, while the question contains complex sub-questions. CHIME is able to identify relevant evidence and generate clear answers.
In summary, we have made the following contributions:
• We propose a novel Cross-passage HIerarchical MEmory Network (CHIME) for review QA. Compared with many multi-passage QA models, CHIME does not rely on explicit helpfulness ranking information of supporting reviews, but can capture cross-passage contextual information and effectively identify the most prominent opinion in reviews.
• CHIME reads reviews sequentially, overcoming the input length limitation affecting most existing transformer-based systems, and brings some interpretability to these "black box" models.
• Experimental results on the AmazonQA dataset show that CHIME outperforms a number of competitive baselines in terms of the quality of the answers generated.

Related Work
Our work is related to the following three lines of research:
Opinion/Review Question-Answering. In Opinion or Review QA, questions may concern subjective personal experiences or opinions of certain products and services. The Amazon QA dataset was first released by McAuley and Yang (2016) and contains 1.4 million questions (and answers) and 13 million reviews on 191 thousand products collected from Amazon product pages. They developed a Mixture of Experts (MoE) model which automatically detects whether a review of a product is relevant to a given query. In their subsequent work, Wan and McAuley (2016) noticed that users tend to ask for subjective information and that answers might also be highly subjective and possibly contradictory. They therefore built a new dataset with 800 thousand questions and over 3 million answers from Amazon, in which each question is paired with multiple answers, and extended their previous MoE model by incorporating subjective information such as review rating scores and reviewer bias. However, they found that subjective information is only effective in predicting 'yes/no' answers to binary questions and does not help in distinguishing 'true' answers from alternatives in open-ended 'descriptive' questions. More recently, Yu and Lam (2018) focused only on the yes/no questions in the Amazon QA dataset (McAuley and Yang, 2016) and trained a binary answer prediction model by leveraging latent aspect-specific representations of both questions and reviews learned by an autoencoder. Other work focused on factual QA in e-commerce and proposed a Product-Aware Answer Generator that combines reviews and product attributes for answer generation, and uses a discriminator to determine whether the generated answer contains question-related facts. Xu et al. (2019a) proposed an extractive review-based QA task, manually creating just over 2,500 questions and annotating the corresponding answer spans in fewer than 1,000 reviews relating to laptops and restaurants from the review data of SemEval 2016 Task 5. They first jointly fine-tuned BERT for answer span detection, aspect extraction and aspect sentiment classification on the SemEval 2016 Task 5 data, and then post-trained BERT on over 3 million unlabelled Amazon and Yelp reviews in order to fuse domain knowledge, and also on SQuAD 1.1 (Rajpurkar et al., 2016) in order to gain task-relevant but out-of-domain knowledge. Gupta et al. (2019) created a subset of the Amazon QA product review dataset (McAuley and Yang, 2016), consisting of 923k questions with 3.6M answers and 14M reviews on 156k Amazon products. They trained an answerability classifier on 3,297 question-context pairs labeled via Mechanical Turk and used it to classify answerability for the whole dataset. They then converted the dataset into a span-based format by heuristically creating, from the reviews, the answer span that best answers a question according to users' actual answers, and trained R-Net (Wang et al., 2017), which uses a gated self-attention mechanism and pointer networks, to predict answer boundaries. Few studies use generative models to deal with opinion/review-based QA.
Multi-passage QA. There are mainly two types of methods for multi-passage QA. The first uses retrieval-based methods to identify the text passages most likely to contain answer information, and then performs QA on the extracted text passages, which are essentially treated as a single passage. The second runs single-passage QA separately over each passage, obtaining multiple answer candidates, and then determines the best answer through mutual verification among the answers. Examples of the first type include S-NET (Tan et al., 2018), Multi-passage BERT, and Masque (Nishida et al., 2019). These models require supporting text passages to be explicitly annotated. S-NET (Tan et al., 2018) follows an extraction-then-synthesis framework. First, relevant passages are extracted from the context using a variant of R-NET (Wang et al., 2017), which learns to rank passages and extract the most likely evidence span from the selected passage; then, the evidence-annotated selected passage is passed to a GRU decoder that synthesizes the answer. In Multi-passage BERT, two independent BERT models were used to perform multi-passage QA. One takes the question and a text passage as input and uses the hidden states of the CLS token to train a classifier that determines whether the text passage is relevant to the given question. The other is used for extracting candidate answers from relevant text passages. The Masque model (Nishida et al., 2019) is a generative reading comprehension approach based on multi-source abstractive summarization. Masque uses a joint-learning framework comprising a question answerability classifier, a passage ranker, and an answer generator. At each step of answer generation, the decoder chooses a word from the mixture of three distributions derived from a vocabulary, from the question, and from the associated multiple passages. A representative example of the second type of method is V-Net (Wang et al., 2018).
The main assumption of V-Net is that correct answers often appear in multiple documents with high frequency and similarity, and wrong answers are usually different from each other. Therefore, V-Net builds a mutual verification mechanism between all answer candidates, which are separately extracted from different passages, to select the best final answer.
Most existing approaches require explicit annotations of supporting text passages in order to train multi-passage QA models in a supervised way. In our setup, the supporting review passages for a question are ranked by BM25 without supervision, which may introduce noise into QA model training and poses a more significant challenge.
Memory Network. Memory networks were first proposed to model the relation between a story and a query for QA systems (Weston et al., 2014; Sukhbaatar et al., 2015). Apart from their application in QA, memory networks have also achieved great success in other NLP tasks, such as machine translation (Maruf and Haffari, 2017), sentiment analysis (Fan et al., 2018), visual question answering (Xiong et al., 2016), social networks (Fu et al., 2020), and summarization (Kim et al., 2018). The main idea of memory networks is to use the attention mechanism to assign different weights to text passages so as to identify the most relevant passages for answer generation (Weston et al., 2014). Kumar et al. (2016) proposed a gated memory network that represents facts in different iterations during the learning process to verify the potentially related passages used to generate an answer. Gui et al. (2017) used a convolutional architecture to capture attention signals in memory networks. Xu et al. (2019b) leveraged the memory network as an information retrieval system to search for possible entities in knowledge bases for complex questions. Chen et al. (2019) used the memory network to verify items in knowledge bases as passages and then generate answers. Generally speaking, existing memory-network-based QA methods mainly focus on using memory networks to weigh and derive representations of question-aware text passages and knowledge entities for answer generation. We instead explore a novel hierarchical memory network structure composed of both context and answer memories, for better capturing review context and generating more appropriate answers.

Cross-passage Hierarchical Memory Network (CHIME)
In this section, we first define the review QA task and then present our proposed Cross-passage Hierarchical Memory Network (CHIME).

Task Formulation
We focus on generative QA with multiple reviews and develop our model based on the AmazonQA dataset (Gupta et al., 2019), in which most of the questions are paired with multiple answers and with the top 10 most relevant text snippets extracted from the associated reviews by BM25 (Robertson and Zaragoza, 2009) as supporting passages. In addition, each question is annotated with whether it is answerable based on the top 10 review snippets, and each answer is accompanied by response votes. The review QA task can be defined as follows: given an answerable question $x^q = \{x^q_1, x^q_2, \cdots, x^q_{N_q}\}$ and $K$ supporting reviews, with the $k$-th review represented as $x^{r_k} = \{x^{r_k}_1, x^{r_k}_2, \cdots, x^{r_k}_{N_r}\}$, a model is asked to generate an answer $\hat{y} = \{\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_{N_a}\}$, where $N_q$, $N_r$ and $N_a$ denote the length of a question, a review and an answer, respectively. In the training phase, $L$ answers, with the $l$-th answer represented as $y^{a_l} = \{y^{a_l}_1, y^{a_l}_2, \cdots, y^{a_l}_{N_a}\}$, and the corresponding response votes $(v^{a_l}_{+}, v^{a_l})$ are provided, where $v^{a_l}_{+}$ denotes the number of positive votes, $v^{a_l}$ denotes the number of all votes, and $0 \le v^{a_l}_{+} \le v^{a_l}$.

CHIME
In this paper, we propose a Cross-passage HIerarchical MEmory Network (CHIME) for review question answering. As shown by Petroni et al. (2019), pre-trained LMs can be used as implicit knowledge bases, making them suitable for language generation. Hence, we leverage XLNet, which combines the advantages of autoregressive and autoencoder models. Based on our task formulation, CHIME is designed to maximize the probability $p(\hat{y} \mid x^q, x^{r_1}, \cdots, x^{r_K})$ of generating an answer given a question and its associated $K$ reviews in multi-passage review QA. The overall architecture of CHIME is shown in Figure 2. Given a question paired with $K$ text passages, we create $K$ training instances, each consisting of the question, a text passage, and the best answer chosen by the helpfulness votes assigned by users. Each training instance is fed into an XLNet encoder to derive hidden representations, which are used to update two memories. In particular, the context memory is updated as more text passages are seen, and the answer memory is continuously refined with the answer generated from each (question, text passage) pair. CHIME has the following characteristics: (1) the use of a pre-trained XLNet as the encoder instead of traditional recurrent neural networks, as pre-trained LMs capture rich background knowledge and are more suitable for encoding the semantic meanings of questions and review documents; (2) a cross-passage context memory mechanism that reads review passages in a sequential manner to deal with multiple text passages more effectively, which avoids the massive memory costs required to read all supporting passages in one go; (3) the use of the answer memory to gradually refine the generated answer for a question after reading more text passages. As Figure 2 shows, CHIME consists of three key components: the XLNet encoder for encoding a question, a review, and an answer; the cross-passage hierarchical memory mechanism; and the decoder for answer generation.
Figure 2: The architecture of the Cross-passage Hierarchical Memory Network (CHIME). The model reads multiple reviews in a sequential manner. When reading instance k, which consists of the question, the k-th review, and the gold answer, the model first derives the hidden states of instance k from the XLNet encoder, then uses the hidden states of the context part to update the context memory (the left part of Memory k). With the newly updated context memory, CHIME is then able to use the hidden states of the answer part to update the answer memory (the middle part of Memory k). After the last review has been read, the answer memory is input to the decoder to produce the final answer (the top dotted frame).
XLNet Encoder. The XLNet encoder in CHIME is a vanilla XLNet encoder with the special Seq2Seq masks introduced in UniLM (Dong et al., 2019), which is essentially a concatenation of a standard pre-trained LM encoder and a pre-trained LM decoder. With the Seq2Seq masks, we are able to train an encoder for an encoder-decoder task. Specifically, for each question paired with $K$ text passages, we create $K$ training instances, each consisting of the triple (question, passage, answer), with the question and passage forming Part 1 and the answer forming Part 2. The Seq2Seq masks are designed such that all tokens in Part 1 attend to each other, while tokens in Part 2 attend to all tokens in Part 1 but only to preceding tokens in Part 2. Let $y^{a_g}$ be the gold-standard answer selected for the current training instance; $x^{r_k*}$ be the whole input sequence of instance $k$; $N_x$ be the length of $x^{r_k*}$, which is kept the same across all text passages; $d$ be the hidden size; and $H^{r_k} \in \mathbb{R}^{N_x \times d}$ be the contextual hidden states of the encoder, computed from the embeddings of $x^{r_k*}$, where $E_t(\cdot)$, $E_s(\cdot)$ and $E_p(\cdot)$ denote token embeddings, segment embeddings and position embeddings respectively. Here we use an interval segment embedding $[E^{A}_{t}, E^{B}_{t}, E^{A}_{t}]$ to distinguish question, passage and answer, rather than the usual two-segment embedding of regular XLNet. As answers are only available during the training phase, training XLNet for the encoder-decoder task can be considered as fine-tuning pre-trained XLNet on our corpus in order to learn a better XLNet encoder.
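The attention pattern described for the Seq2Seq masks can be illustrated with a small NumPy sketch. The function name `seq2seq_mask` and the toy lengths are hypothetical, and the sketch assumes that each Part 2 token may also attend to itself; it is meant only to show the shape of the mask, not the masking code used in CHIME.

```python
import numpy as np

def seq2seq_mask(n_part1: int, n_part2: int) -> np.ndarray:
    """Boolean (n, n) matrix; mask[i, j] is True if token i may attend to token j."""
    n = n_part1 + n_part2
    mask = np.zeros((n, n), dtype=bool)
    # Part 1 (question + passage): all tokens attend to each other.
    mask[:n_part1, :n_part1] = True
    # Part 2 (answer): every token attends to all of Part 1 ...
    mask[n_part1:, :n_part1] = True
    # ... and to Part 2 tokens up to its own position (self included by assumption).
    mask[n_part1:, n_part1:] = np.tril(np.ones((n_part2, n_part2), dtype=bool))
    return mask

# Toy example: 3 context tokens followed by 2 answer tokens.
mask = seq2seq_mask(n_part1=3, n_part2=2)
print(mask.astype(int))
```

Part 1 rows never see Part 2 columns, so the context encoding cannot leak answer tokens, while the lower-triangular block keeps answer generation left-to-right.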
Cross-passage hierarchical memory mechanism. The hidden states of Part 1 and Part 2 are used to initialize and update the context memory and the answer memory, respectively. From this stage onwards, the last [SEP] token in Part 1 is removed and added as the start token of Part 2 for language generation purposes. A memory update is accomplished by taking a weighted aggregation of the previously retained memory and the current hidden state using a forget gate. The gate is obtained by using an MLP layer with a memory-specific Transformer encoder (Vaswani et al., 2017), which is composed of a multi-head scaled dot-product attention sublayer and a position-wise fully connected feed-forward network sublayer. When receiving the hidden states derived from the XLNet encoder, CHIME first uses the states of Part 1 to update the context memory, then hierarchically uses the newly updated context memory together with the states of Part 2 to update the answer memory. Let $N_{S_1}$ and $N_{S_2}$ be the lengths of Part 1 and Part 2, respectively, which are kept the same across different text passages; $H^{r_k}_c \in \mathbb{R}^{N_{S_1} \times d}$ be the hidden states of the context part, which refers to the question and a text passage; $H^{r_k}_a \in \mathbb{R}^{N_{S_2} \times d}$ be the hidden states of the answer part; and $M^{r_k}_c \in \mathbb{R}^{N_{S_1} \times d}$ and $M^{r_k}_a \in \mathbb{R}^{N_{S_2} \times d}$ be the updated context memory and answer memory, respectively, after reading the $k$-th passage. Further, $Z^{r_k}_c \in \mathbb{R}^{N_{S_1} \times d}$ and $Z^{r_k}_a \in \mathbb{R}^{N_{S_2} \times d}$ denote the normalized attention outputs from the Transformer encoders, which feed the forget gates. $W^{r_k}_{mc} \in \mathbb{R}^{N_{S_1} \times d}$, $W^{r_k}_{zc} \in \mathbb{R}^{N_{S_1} \times d}$, $b^{r_k}_c \in \mathbb{R}^{N_{S_1}}$, $W^{r_k}_{ha} \in \mathbb{R}^{N_{S_2} \times d}$, $W^{r_k}_{za} \in \mathbb{R}^{N_{S_2} \times d}$ and $b^{r_k}_a \in \mathbb{R}^{N_{S_2}}$ are all trainable parameters. The two memories are initialized with the hidden states obtained after reading the first review passage of a question: $M^{r_1}_c = H^{r_1}_c$, $M^{r_1}_a = H^{r_1}_a$.
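To make the hierarchical update order concrete, the following NumPy sketch walks through one step after initialization. It rests on several assumptions that are not confirmed by the text: the forget gate is taken to be a per-position sigmoid over row-wise products with the weight matrices, the update is a convex combination of old memory and new hidden states, and a parameter-free dot-product attention (`attend`) stands in for the memory-specific Transformer encoder. All function names are hypothetical.

```python
import numpy as np

def attend(query, keys):
    """Parameter-free scaled dot-product attention (stand-in for the Transformer encoder)."""
    scores = query @ keys.T / np.sqrt(query.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ keys

def forget_gate(x, z, w_x, w_z, b):
    """Assumed gate: one sigmoid value per position, shape (N,)."""
    logits = (w_x * x).sum(axis=-1) + (w_z * z).sum(axis=-1) + b
    return 1.0 / (1.0 + np.exp(-logits))

def update_memory(memory_prev, hidden, gate):
    """Assumed update: keep `gate` of the old memory, take the rest from `hidden`."""
    g = gate[:, None]
    return g * memory_prev + (1.0 - g) * hidden

rng = np.random.default_rng(0)
n1, n2, d = 6, 4, 8  # toy sizes for N_S1, N_S2 and the hidden size
h_c1, h_a1 = rng.normal(size=(n1, d)), rng.normal(size=(n2, d))  # passage 1
h_c2, h_a2 = rng.normal(size=(n1, d)), rng.normal(size=(n2, d))  # passage 2
w_mc, w_zc, b_c = rng.normal(size=(n1, d)), rng.normal(size=(n1, d)), np.zeros(n1)
w_ha, w_za, b_a = rng.normal(size=(n2, d)), rng.normal(size=(n2, d)), np.zeros(n2)

# Initialization with the hidden states of the first passage.
m_c, m_a = h_c1, h_a1

# Step for passage 2: context memory first ...
z_c = attend(h_c2, m_c)
g_c = forget_gate(m_c, z_c, w_mc, w_zc, b_c)
m_c = update_memory(m_c, h_c2, g_c)
# ... then the answer memory, conditioned on the freshly updated context memory.
z_a = attend(h_a2, m_c)
g_a = forget_gate(h_a2, z_a, w_ha, w_za, b_a)
m_a = update_memory(m_a, h_a2, g_a)
```

The point of the sketch is the ordering: the answer memory only sees the context through the already updated context memory, which is what makes the two memories hierarchical rather than parallel.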
Decoder and Loss Function. The answer probability $p(\hat{y})$ over all $V$ tokens of the whole vocabulary is generated by adding a softmax layer on top of the answer memory, where $W_{ma} \in \mathbb{R}^{d \times V}$ and $b_a \in \mathbb{R}^{V}$ are trainable. The training loss of each sample is the cross-entropy loss between the predicted answer $\hat{y}$ and the gold-standard answer $y$.
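A softmax layer over the answer memory followed by a cross-entropy loss can be sketched as below. The linear form `answer_memory @ w_ma + b_a` is inferred from the stated parameter shapes, and averaging the loss over answer positions is an assumption; the function names and toy sizes are hypothetical.

```python
import numpy as np

def answer_log_probs(answer_memory, w_ma, b_a):
    """Log-softmax over the vocabulary for every answer position: (N, d) -> (N, V)."""
    logits = answer_memory @ w_ma + b_a
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

def cross_entropy(log_probs, gold_ids):
    """Mean negative log-likelihood of the gold token ids (averaging is an assumption)."""
    picked = log_probs[np.arange(len(gold_ids)), gold_ids]
    return float(-picked.mean())

rng = np.random.default_rng(0)
n_answer, d, vocab = 4, 8, 10          # toy sizes
memory = rng.normal(size=(n_answer, d))
w_ma, b_a = rng.normal(size=(d, vocab)), np.zeros(vocab)
gold = np.array([1, 3, 5, 7])          # hypothetical gold token ids

log_p = answer_log_probs(memory, w_ma, b_a)
loss = cross_entropy(log_p, gold)

# With all-zero weights the distribution is uniform, so the loss equals log(V).
uniform_loss = cross_entropy(answer_log_probs(memory, np.zeros((d, vocab)), b_a), gold)
```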

Experiments
In this section, we first introduce the dataset used in our experiments, the baselines for comparison, and the evaluation metrics employed, followed by a discussion over the obtained results and a few examples generated using the different approaches presented.

Settings
Dataset. We built our dataset from AmazonQA (Gupta et al., 2019). We focused only on the more difficult 'descriptive' questions and filtered out non-answerable and 'yes/no' questions. We kept questions with 10 review snippets, sorted in descending order of relevance to the question. In the original dataset, 96% of the answerable 'descriptive' questions are paired with 10 reviews. For each question, we selected only the best answer, i.e., the one with the highest positive response rate. We further removed URL links from the question, review, and answer text. The filtered dataset contains 365k samples in the training set, 47k samples in the validation set and 48k samples in the testing set. We set the maximum tokenized lengths of questions, reviews, and answers to 40, 124, and 82, respectively, which cover 95% of our samples.
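The answer selection and length limits described above could look roughly like the following sketch. The tuple layout, the helper names, the treatment of answers without votes (rate 0) and the tie-breaking rule (first answer wins) are all assumptions made for illustration.

```python
# Maximum tokenized lengths reported above.
MAX_LEN = {"question": 40, "review": 124, "answer": 82}

def positive_rate(positive_votes: int, total_votes: int) -> float:
    """Share of positive votes; answers with no votes get rate 0 (assumption)."""
    return positive_votes / total_votes if total_votes > 0 else 0.0

def best_answer(answers):
    """answers: list of (text, positive_votes, total_votes); ties keep the first answer."""
    return max(answers, key=lambda a: positive_rate(a[1], a[2]))[0]

def truncate(tokens, field):
    """Cut a token list to the maximum length of its field."""
    return tokens[: MAX_LEN[field]]

# Hypothetical example with three candidate answers.
candidates = [("answer a", 1, 4), ("answer b", 3, 4), ("answer c", 0, 0)]
chosen = best_answer(candidates)
short_question = truncate(["tok"] * 50, "question")
```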

Parameters setup
The hidden size of both BERT-base and XLNet-base is 768. The corresponding vocabulary sizes are 28,996 and 32,000. For CHIME, the inner Transformer encoders are 1-block vanilla Transformers, each containing an 8-head multi-head attention and a feed-forward network with an inner state size of 2048. The optimizer of all neural baselines is AdamW (Loshchilov and Hutter, 2018) with β1 = 0.9, β2 = 0.999, and ε = 1e-6. Except for the bias and layer normalization parameters, all other trainable parameters are decayed with a rate of 0.95. The gradients of all parameters are clipped to a maximum norm of 1.0. The learning rate is increased linearly from 0 to 1e-5 over the first 20% of the total training steps and then linearly decreased to 0.
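The warm-up and decay schedule described above can be written as a small pure-Python function. The function name and the choice to round the warm-up length down to an integer number of steps are assumptions; the peak value and the 20% fraction are taken from the text.

```python
def learning_rate(step: int, total_steps: int,
                  peak_lr: float = 1e-5, warmup_fraction: float = 0.2) -> float:
    """Linear warm-up from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)

# Learning rate at a few points of a hypothetical 1000-step run.
schedule = [learning_rate(s, 1000) for s in (0, 100, 200, 600, 1000)]
```

In this example the rate reaches its peak at step 200, is halfway down at step 600, and returns to 0 at step 1000.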
Baselines. We developed two heuristic baselines as well as three neural baselines:
• Random Sentence. Given a question, select a random sentence from the paired reviews as the answer.
• Sentence Retrieval. First, convert each question and each sentence of its paired reviews into sentence embeddings using BERT, then retrieve the sentence with the highest cosine similarity to the question as the selected answer. The sentence length of both heuristic baselines is 120.
• BERT+summary. Directly using BERT (Devlin et al., 2018) for generative QA is difficult, since it is memory-demanding to deal with multiple reviews in one go. We instead first generate an extractive summary of the reviews using TextRank (Mihalcea and Tarau, 2004), then feed a question and its associated review summary into BERT for answer generation.
• XLNet+summary. Although XLNet is theoretically capable of dealing with text of unlimited length, as it adopts the segmentation mechanism from Transformer-XL, and could potentially process the concatenation of all the passages paired with a question at once, the computational requirements easily become prohibitive, and in practice it is often not feasible to deal with multiple long reviews simultaneously with limited computational resources. Therefore, we take a similar summary-then-QA approach for XLNet.
• XLNet+V-Net. We follow the mutual verification mechanism proposed in V-Net (Wang et al., 2018) for answer post-processing. In particular, after XLNet generates candidate answers given individual reviews, mutual verification is conducted by calculating the average attention value of the current candidate answer with all the other answers. The one with the highest value is the final answer.
Due to the limitations of our computing resources, we have to use the regular versions of large-scale pre-trained LMs and a subset of the original data. We use BERT-base and XLNet-base from Huggingface.
Both the neural baselines and our proposed CHIME are trained with 25% randomly selected data from our constructed dataset, which amounts to 92k samples, comparable to popular large-scale datasets such as MS Marco (100k) (Nguyen et al., 2016) and HotpotQA (113k). For all neural models, we train for 3 epochs and use beam search with size 3 over the best models to generate answers from the decoder probability distributions. In the testing phase, 1k samples are extracted randomly for answer generation and evaluation.
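The selection step of the XLNet+V-Net baseline, picking the candidate with the highest average score against all other candidates, can be sketched as follows. The sketch takes a precomputed pairwise score matrix as input; how the "attention value" between two answers is obtained is not specified above, so the numbers in the example are purely hypothetical.

```python
import numpy as np

def select_by_mutual_verification(pairwise):
    """Return the index of the candidate with the highest mean score
    against all *other* candidates, plus the per-candidate means.

    pairwise: (n, n) array with n >= 2; pairwise[i, j] scores answer i against answer j.
    """
    pairwise = np.asarray(pairwise, dtype=float)
    n = pairwise.shape[0]
    off_diagonal_sum = pairwise.sum(axis=1) - np.diag(pairwise)  # drop self-scores
    support = off_diagonal_sum / (n - 1)
    return int(np.argmax(support)), support

# Hypothetical scores for three candidate answers.
scores = [[1.0, 0.8, 0.1],
          [0.8, 1.0, 0.2],
          [0.1, 0.2, 1.0]]
best_index, support = select_by_mutual_verification(scores)
```

Excluding the diagonal keeps a candidate from winning merely because it agrees with itself.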
Metrics. We use ROUGE-L (Lin, 2004) and BLEU (Papineni et al., 2002) to evaluate the lexical similarity between the gold-standard and the model-generated answers. To measure semantic similarity, we use BertScore (Zhang et al., 2019), which first computes the pairwise cosine similarity among all the tokens in the candidate and reference answers, and then greedily matches them to obtain the highest similarity score for the sentence pair. BLEURT (Sellam et al., 2020) is a text generation evaluation framework that fine-tunes BERT with multi-task joint training on signals including BLEU, ROUGE, and BertScore. We use BLEURT as a comprehensive metric to evaluate both lexical and semantic similarity. A higher BLEURT score means that the generated sentence is both lexically and semantically closer to the ground truth. As each question is paired with multiple ground-truth answers, for BertScore and BLEURT we report the pair obtaining the maximum score.
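The greedy matching step behind BertScore, together with taking the maximum over several references, can be sketched with NumPy. The token vectors below are toy stand-ins for contextual embeddings, the function names are hypothetical, and the sketch omits refinements of the actual metric such as importance weighting and baseline rescaling.

```python
import numpy as np

def greedy_match_f1(cand_vecs, ref_vecs):
    """F1 from greedy cosine matching between candidate and reference token vectors."""
    cand = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    ref = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    sim = cand @ ref.T                      # pairwise cosine similarities
    precision = sim.max(axis=1).mean()      # best reference match per candidate token
    recall = sim.max(axis=0).mean()         # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)

def best_over_references(cand_vecs, list_of_ref_vecs):
    """Score against each reference and keep the maximum."""
    return max(greedy_match_f1(cand_vecs, r) for r in list_of_ref_vecs)

# Toy 2-d "embeddings": the candidate matches one of two reference tokens exactly.
candidate = np.array([[1.0, 0.0]])
reference_a = np.array([[1.0, 0.0], [0.0, 1.0]])
reference_b = np.array([[1.0, 0.0]])

f1_a = greedy_match_f1(candidate, reference_a)
f1_best = best_over_references(candidate, [reference_a, reference_b])
```

Against `reference_a` precision is 1 and recall is 0.5, so the F1 is 2/3; `reference_b` is matched perfectly, which is why the maximum over references is 1.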

Model | Bleu-1 | Bleu-2 | Rouge-L F1 | BertScore | BleuRT
Table 1: Evaluation results of the CHIME variants and baselines. The answers generated by the CHIME variants are superior in terms of lexical and semantic evaluations. CHIME-c removes the context memory and makes use of just the answer memory, which is updated not by the context memory but by the current context hidden states. In contrast, CHIME-a removes the answer memory and makes use of just the context memory: we remove the MLP sublayer for the answer memory and directly feed the output of the middle Transformer encoder to the final decoder, as shown in Figure 2.
Table 1 reports the evaluation on 1k selected samples from the testing set. The answers generated by CHIME exhibit an overall improved quality, reflected by lexical and semantic evaluations that outperform all baselines. This validates the efficacy of combining the context and the answer memory to generate coherent answers when processing multiple passages containing possibly contradictory opinions. CHIME-c is an ablated version of CHIME that only uses the answer memory, which is updated without the link from the context memory $M^{r_k}_c$, using the current context hidden states $H^{r_k}_c$ instead. The comparison of CHIME-c with CHIME demonstrates the importance of cross-passage evidence collection. Similarly, CHIME-a is another ablated version that uses only the context memory, in which we link $Z^{r_k}_a$ from the answer memory's encoder to the final decoding. The performance gap between CHIME-a and CHIME corroborates the relevance of gradual answer refinement.

Qualitative analysis
As a case study, we analyze the example reported in Figure 1. We first compare the quality of the answers generated by different models and then illustrate a breakdown of the CHIME's generative process when iteratively reading different reviews. The gradual generative process provides some explicit interpretability of cross-passage evidence collection and sequential answer refinement.
In Table 2 we compare a few answers generated by different models. Answers returned either by randomly selecting a sentence from the review passages or by retrieving the sentence most similar to the given question clearly do not address the question directly. The poor quality of the answer returned by the BERT+Summary model, which is off-topic and ungrammatical, shows the limitation of simply using out-of-the-box BERT for text generation. The XLNet-based models are able to generate some reliable answers, which are much better than those of BERT+Summary. However, the two XLNet models mistakenly use "30 mins" in place of the key term "30 seconds", which weakens the credibility of the answer. Compared with the XLNet models, CHIME generates syntactically well-formed answers with better coherence and fluency.

Question: Can I set it for 30 seconds on the memory function, and then can it will automatically reset to the 30 seconds continually?
Answer 1: Yes, you can reset the times you set and reuse them.
Answer 2: No...thats why it sucks...
Answer 3: Do not know
Answer 4: yes. the timer has 4 different timer settings. timer 1 is the regular one that gets used the most. but, timers 2, 3 and 4 can be programmed in memory so that you can recall the time every time you want to use it. see my other post, though. the first time this gets knocked off the counter, it's history.
Random Sentence: the only way to reset to the memorized time is to hit the "memory 2-3-4" button 4x to cycle through all 4 timers and get back to the one you are using.
Retrieval Sentence: but, and here's the major flaw, once the timer counts down to 0:00, beeps, and you hit the stop button, the time remains at 0:00.
BERT + Summary: i have not have a little good. i don't have a little light. i'm not sure, but you can.
XLNet + Summary: i have no problem with this. it works for 30 mins, but the timer will reset to 0, but not to a point where you can adjust your timer.
XLNet + V-Net: yes, it will reset to 30 seconds, but i don't know if it would work for 30 minutes or not? i have no idea about this issue but.
CHIME: yes you can set it for 30 seconds. i have set mine for 30 minutes and have had no problems with that timer going off and running again.
Table 2: Comparison of example answers. CHIME outperforms all five baselines in generating more reliable answers. The XLNet models provide readable but noisy, incomplete and hard-to-understand answers. All other baselines, including the BERT one, are not able to generate readable answers.
Figure 3: A breakdown of CHIME's generative process. The example question (the top box), the paired reviews (left panel), and the intermediate answers (right panel) after gradually reading the corresponding reviews. The major points of the question and reviews are highlighted with colors, and the underlined italic text is the content of most concern to the forget gate. Given new reviews, the very first generated simple answer becomes complicated and full of noise, but finally converges to the most prominent opinions and facts relevant to the question.
Figure 3 shows a breakdown of CHIME's generative process. The question-related content highlighted with colors is most likely the part that the forget gate considers worth memorising.
The intermediate answers reported show that CHIME has locked the answer to the first sub-question from the beginning. But for the second sub-question, as the 8th answer shows, CHIME was also misled by other unimportant information. The final answer is eventually a synthesis of the prominent opinions encountered, summarised in a few concise phrases.

Conclusions
In this paper, we have proposed CHIME, a cross-passage hierarchical memory network for multi-passage generative review QA. It is built on the XLNet generator by adding a memory module consisting of a context memory and an answer memory, which enables a more accurate refining process for cross-passage evidence collection and answer generation. The sequential process adopted in CHIME makes it possible to process longer text passages and offers some straightforward interpretability. Our experiments show a significant quality improvement across several state-of-the-art metrics that measure the lexical and semantic coherence of the generated text. We plan to further extend CHIME to model multiple ground-truth answers simultaneously and to leverage the available product attributes.
Appendix A More Example Outputs Generated by Various Models

(a) Question: does this come with the straps?
Answer 1: no it doesn't sadly.
Answer 2: no. you have to buy separate.
Answer 3: it comes with the white cotton rope ending in knots typical of mayan hammocks but no tree hugger straps or hammock frame.
Answer 4: no it does not come with the straps or rope. i advise you to purchase parachord with it.
XLNet + Summary: yes, it comes with the straps and the hammock is included with it! it comes in two pairs of harnesses that are attached to each other, so you can use them on your porch or your bed, or just use them on a bed or bed
XLNet + V-Net: yes it comes with the straps and the sizing is very good and very easy to do!
CHIME: yes it does. i love this hammock and love the quality of the straps

(b) Question: has anyone tried baking with this?
Answer 1: yes you can! designer whey also has recipes that you can make, like cookies, waffles or pancakes and muffins!
XLNet + Summary: i have never tried it with my own breads but i have had a few good ones and they were great!
XLNet + V-Net: i haven't tried baking with it, but i've been using this for a couple of weeks now and
CHIME: i haven't tried baking with this, but the flavor is very good and the flavor is very nice and good!

(c) Question: can you put coals on the lid, like you can with some lodge models?
Answer 1: yes you can,
Answer 2: yes. i've done it many times. make sure you get one of those lid-lifters though so you don't pour ash or coals into the pot.
Answer 3: yes, this dutch oven is specifically designed with a tall lip to cup the coals on top of the lid to allow for baking. this has worked great for me in making cobbler and biscuits.
Answer 4: never heard of coals on the lid, sorry. it makes a great chili over an open pit bbq. we hang it over a pit in the winter out back and make stews, chili's and pot luck. makes for a fun time especially in a snow storm!
Answer 5: shouldn't be a problem. lid fits with good seal so no coals or ash should spill inside.
XLNet + Summary: yes, you can. i don't know if it is possible to put coals on the top of the lid and then place them on top of that
XLNet + V-Net: yes, you can put coals on the lid, but i don't think you can put coals inside the top, so i don've had any problems with this oven and have not tried any of these models with any problem with mine
CHIME: yes, you can put coals on the lid, but i don't think you can put any coal in this one, as the lid is not designed for that type of use

(d) Question: in what stores are these sold?
Answer 1: target and target.com is where i've bought 3 pairs over time.
Answer 2: different sizes different stores.
XLNet + Summary: i don't know. i have never used them in my life, but they are very good for the foot, but it is not as good as the ones that are sold at stores
XLNet + V-Net: i don't know. i've been using them for a couple of years and it works great! they are very good and very good!
CHIME: amazon has them in their store. i have had them for over 2 years and have never had any problems with it working for my feet, especially when

Table A1: Example outputs comparing the quality of the answers generated by CHIME and by the XLNet-based baselines.
(e) Question: is this set compatible with the new canon powershot sx50 hs?
Answer 1: yes james, as long as you decide what mm size you want to use! i prefered the 58mm adapter ring, this info is available in the canon online operaters manual! just make sure you specify the camera number (sx50 hs) when ordering any after market devices!! ps: you may already have 52mm (or other) lenses from another camera,
Answer 2: yes
Answer 3: i don't know as i have the sx40. as far as the sx40 i can say that everything fits and works with the camera very nice. i've now used everything. the lens cover is much much better than the one that came with the camera. the little case that houses the lenses is very nice. once you screw the adapter ring onto the end
Answer 4: i bought it for my canon sx40 and it works perfectly, but i'm not sure if the sx50 lens diameter is larger. this set does come with an adapter ring that fit snugly on my canon sx40 which allows placement of the filters and you can also stack the filters and/or add the lens hood. i mainly just use the lens hood
Answer 5: i just recieved this in the mail and the adapter does not fit my sx50??
Answer 6: hi everyone! the sx40 and sx50 have very similar dimensions and many people were able to make this ring fit on the sx50, however please know that there is an adapter ring designed specifically for use with the sx50. you can purchase this exact kit with the sx50 ring by following this link essential accessory kit for canon powershot sx
XLNet + Summary: yes, it is compatible with the new canon powershot and the new canon power shot camera, but it is compatible with all other cameras that have a different camera and
XLNet + V-Net: yes it is compatible with the sx40 sx20
CHIME: yes, it is compatible with the new canon sx50

(f) Question: does it work on an iphone 4 or less?
Answer 1: yes, it will work. it will not work with any higher versions only lower.
Answer 2: yes as the plugs are the same.. but to tell you the truth i wouldn't bother buying one if i had my time over. not long after i ordered my dock sonos released an uprade, "play from this phone" which basically meant you didn't have to have your computer on the listen to music..i only bought the dock for this very purpose
XLNet + Summary: yes, it does. i have a iphone 4 and it works fine with my 4 year old
XLNet + V-Net: yes it works on my iphone 4 and it works great! i have a sonos and have had no problems with the device running with my sono 5, but the cord is not very long, so if you are looking for an older model you should be able to find one that will do that
CHIME: yes it works on my iphone 4 and yes, works with my sonos

(g) Question: how is this mounted on a deck railing that is 1 1/4" wide?
Answer 1: the hanger just sits over the railing but it also has the side piece that a lot of other planters don't have. that extra piece rests against the railing at a ninety degree angle and keeps the planter from tipping forward. it's great...i love mine!
Answer 2: it has adjustable brackets that attach on each end
Answer 3: it is adjustable
Answer 4: it has adjustable brackets. it works on my pull fence quite well.
XLNet + Summary: it's a 2" wide. it's a 2 1/4" long, but i would say that this is the best way to mount the rack on any railing, so if you are looking for an adjustable bracket for your deck or balcony or window
XLNet + V-Net: i've used mine on a deck with no problems. i've had no problems on my deck railing, but the rail is not sturdy enough to hold up to that height of my house
CHIME: i have a deck railing that is 1 1/4" wide. i would recommend using a bracket to hold the box in place, but if you are going to hang your deck railing over

Table A2: Example outputs comparing the quality of the answers generated by CHIME and by the XLNet-based baselines.