Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions

In this paper, we propose to study the problem of court view generation from the fact description in a criminal case. The task aims to improve the interpretability of charge prediction systems and to assist automatic legal document generation. We formulate the task as a text-to-text natural language generation (NLG) problem. Sequence-to-sequence (Seq2Seq) models have achieved cutting-edge performance in many NLG tasks. However, because fact descriptions are often non-distinct across similar charges, it is hard for a plain Seq2Seq model to generate charge-discriminative court views. In this work, we exploit charge labels to tackle this issue: we propose a label-conditioned Seq2Seq model with attention that decodes court views conditioned on encoded charge labels. Experimental results show the effectiveness of our method.


Introduction
Previous work has produced legal assistant systems with various functions, such as finding relevant cases for a given query (Chen et al., 2013) and providing applicable law articles for a given case (Liu and Liao, 2005), which have substantially improved working efficiency. Among legal assistant systems, charge prediction systems aim to determine appropriate charges, such as homicide or assault, for varied criminal cases by analyzing the textual fact descriptions of those cases (Luo et al., 2017), but they do not provide interpretations for the charge determination.
A court view is the written explanation from judges that interprets the charge decision for a criminal case and is also the core part of a legal document. It consists of rationales and a charge, where the charge is supported by the rationales, as shown in Fig. 1. In this work, we propose to study the problem of COURT-VIEW-GEN, generating the court view from the fact description in a case, and we formulate it as a text-to-text natural language generation (NLG) problem (Gatt and Krahmer, 2017). The input is the fact description in a case and the output is the corresponding court view. We focus only on generating the rationales, because the charge can be decided by judges or by charge prediction systems that also analyze the fact description (Luo et al., 2017; Lin et al., 2012). COURT-VIEW-GEN has two beneficial functions: (1) it improves the interpretability of charge prediction systems by generating rationales that support the predicted charges; the justification for a charge decision is as important as deciding the charge itself (Hendricks et al., 2016; Lei et al., 2016); and (2) it benefits automatic legal document generation by generating court views from fact descriptions, relieving much human labor, especially for simple but numerous cases, where fact descriptions can be obtained from legal professionals or via techniques such as information extraction (Cowie and Lehnert, 1996).
COURT-VIEW-GEN is not a trivial task. High-quality rationales in court views should contain the important fact details, such as the degree of injury for the charge of intentional injury, because such details are the basis for charge determination. In this respect, fact details resemble a summary of the fact description, similar to the task of DOCument SUMmarization (DOC-SUM) (Yao et al., 2017). However, rationales are not a simple summary containing only fact details: to support charges, they must be charge-discriminative and contain deduced information that does not appear in the fact description. The fact descriptions for the charge of negligent homicide usually only describe someone being killed, without any direct statement about the motive for the killing; DOC-SUM would only summarize the fact that someone was killed, but the rationales must further contain the killing intention in order to be discriminative from the rationales for other charges such as intentional homicide.

Figure 1: An example of fact description and court view from a legal document for a case (original in Chinese; English translation shown).

FACT DESCRIPTION: ... After hearing, our court identified that at about 23:00 on July 10, 2009, the defendant Chen, together with eight or nine other young men, stopped Lee, who was riding a motorcycle on a street near the Jianliao road crossing in Xinliao town, Xuwen County; the defendant Chen and the others then beat Lee with a steel pipe and a knife. According to forensic identification, Lee suffered minor injuries. ...

COURT VIEW: Our court holds that the defendant Chen ignored the state law and, together with others and with weapons, intentionally injured another person, causing one person minor injury [rationales]. His acts constituted the crime of intentional injury [charge]. ...
Two main challenges make this a difficult task. Firstly, it is hard to maintain the discriminativeness of generated court views when the input fact descriptions are non-distinct among charges with subtle differences. For example, the charges of intentional homicide and negligent homicide are similar, and the corresponding fact descriptions are expressed in similar ways: both describe the defendant killing someone, but neither directly states whether the defendant acted with intent or in neglect, which makes it hard to generate charge-discriminative court views. Secondly, high-quality court views should contain the fact details in the fact descriptions, such as the degree of injury for the charge of intentional injury, because fact details are the important basis for charge determination.
Traditional natural language generation (NLG) methods require much human labor to design rules and templates. To overcome the difficulties of COURT-VIEW-GEN mentioned above and the shortcomings of traditional NLG methods, we propose a novel label-conditioned sequence-to-sequence model with attention for COURT-VIEW-GEN that directly maps fact descriptions to court views. The architecture of our model is shown in Fig. 2. Fact descriptions are encoded into context vectors by an encoder, and a decoder then generates court views from these vectors. To generate more charge-discriminative court views from non-distinct fact descriptions, we encode the charge as a label for the corresponding fact description and decode the court view conditioned on the charge label. The intuition is that charge labels provide extra information to distinguish non-discriminative fact descriptions and make the decoder learn to select charge-related words. To preserve fact details from the fact descriptions, such as the degree of injury for the charge of intentional injury, we further apply the widely used attention mechanism (Luong et al., 2015) in the Seq2Seq model, so that at each step the context vector carries the most important information from the fact description for the decoder. Experimental results show that our model performs strongly on COURT-VIEW-GEN and that exploiting charge labels significantly improves the charge-discriminativeness of generated court views, especially for charges with subtle differences.

Recently, sequence-to-sequence models with the encoder-decoder paradigm (Sutskever et al., 2014) have achieved cutting-edge results in many NLG tasks, such as paraphrasing (Mallinson et al., 2017), code generation (Ling et al., 2016) and question generation (Du et al., 2017). Seq2Seq models have also exhibited state-of-the-art performance on DOC-SUM (Chopra et al., 2016; Tan et al., 2017). However, the non-distinctness of fact descriptions makes it hard for a Seq2Seq model to generate charge-discriminative rationales. In this paper, we exploit the charge labels of the corresponding fact descriptions, which can easily be decided by humans or by charge prediction systems, to help generate charge-discriminative rationales: charge labels provide extra information to distinguish non-discriminative fact descriptions. We propose a label-conditioned Seq2Seq model with attention for our task, in which fact descriptions are encoded into context vectors by an encoder and a decoder generates rationales from these vectors. We further encode the charge as a label and decode the rationales conditioned on it, leading the decoder to learn to select words related to the gold charge. The widely used attention mechanism (Luong et al., 2015) is fused into the Seq2Seq model to learn to align target words to fact details in the fact descriptions. Similar to Luo et al. (2017), we evaluate our model on Chinese criminal cases, constructing a dataset from a Chinese government website.
Our contributions in this paper can be summarized as follows: • We propose the task of court view generation and release a real-world dataset for this task.
• We formulate the task as a text-to-text NLG problem, utilize charge labels to benefit the generation of charge-discriminative court views, and propose a label-conditioned sequence-to-sequence model with attention for this task.
• Extensive experiments are conducted on a real-world dataset. The results show the effectiveness of our model and of exploiting charge labels to improve charge-discriminativeness.

Related Work
Our work is firstly related to previous studies on legal assistant systems. Previous work treats charge prediction as a text classification problem (Luo et al., 2017; Liu et al., 2004; Liu and Hsieh, 2006; Lin et al., 2012). Recently, Luo et al. (2017) investigated deep learning methods for this task. Besides, there are also works on identifying applicable articles for a given case (Liu and Liao, 2005; Liu and Hsieh, 2006; Liu et al., 2015), answering legal questions as a consulting system (Kim et al., 2014; Carvalho et al., 2015) and searching relevant cases for a given query (Raghav et al., 2016; Chen et al., 2013). As a legal assistant system, COURT-VIEW-GEN can benefit automatic legal document generation if legal documents are generated step by step: it generates the court view from the fact description obtained in the previous phase, through legal professionals or techniques such as information extraction (Cowie and Lehnert, 1996) from the raw documents in a case.
Our work is also related to recent studies on model interpretation (Ribeiro et al., 2016; Lipton, 2016; Ling et al., 2017). Much recent work has paid attention to giving textual explanations for classification decisions. Hendricks et al. (2016) generate visual explanations for image classification, and Lei et al. (2016) propose to learn to select the most supportive snippets from raw texts for text classification. COURT-VIEW-GEN can improve the interpretability of charge prediction systems by generating textual court views when predicting charges.
Our label-conditioned Seq2Seq model stems from the encoder-decoder paradigm (Sutskever et al., 2014), which has been widely used in machine translation (Bahdanau et al., 2014; Luong et al., 2015), summarization (Tan et al., 2017; Nallapati et al., 2016; Chopra et al., 2016; Cheng and Lapata, 2016), semantic parsing (Dong and Lapata, 2016) and paraphrasing (Mallinson et al., 2017), as well as other NLG problems such as product review generation (Dong et al., 2017) and code generation (Yin and Neubig, 2017; Ling et al., 2016). Hendricks et al. (2016) propose to encode image labels in visual-language models to generate justification texts for image classification. We likewise introduce charge labels into a Seq2Seq model to improve the charge-discriminativeness of generated rationales. The widely used attention mechanism (Luong et al., 2015; Xu et al., 2015) is applied to generate fact details more accurately.

COURT-VIEW-GEN Problem
Court View is the judicial explanation interpreting the reasons why the court decided on a certain charge for a case; it consists of the rationales and the charge supported by the rationales, as shown in Fig. 1. In this work, we focus only on generating the rationales in court views. Charge prediction can be performed by humans or by charge prediction systems (Luo et al., 2017), and the final court view can easily be constructed by combining the generated rationales with the pre-decided charge. Fact Description is the set of identified facts in a case (the relevant events that happened), such as the criminal acts (e.g., the degree of injury).
The input of our model is the word sequence of the fact description in a case and the output is the word sequence of the court view (the rationales part). We define the fact description as x = (x_1, x_2, ..., x_{|x|}) and the corresponding rationales as y = (y_1, y_2, ..., y_{|y|}). The charge for the case is denoted as v and will be exploited for COURT-VIEW-GEN. The task of COURT-VIEW-GEN is to find ŷ given x conditioned on the charge label v:

$$\hat{y} = \arg\max_{y} P(y \mid x, v), \quad (1)$$

where P(y | x, v) is the likelihood of the predicted rationales in the court view.
Our Model

Sequence-to-Sequence Model with Attention
Similar to Luong et al. (2015), our Seq2Seq model consists of an encoder and a decoder, as shown in Fig. 2. Given a pair of fact description and rationales in the court view (x, y), the encoder reads the word sequence of x and the decoder learns to predict the rationales y. The probability of the predicted y is given as follows:

$$P(y \mid x) = \prod_{t=1}^{|y|} P(y_t \mid y_{<t}, x), \quad (2)$$

where y_{<t} = (y_1, ..., y_{t-1}). We use a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) as the encoder and another LSTM as the decoder, similar to Du et al. (2017).
Decoder. On the decoder side, at time t, the probability of predicting y_t is computed as

$$P(y_t \mid y_{<t}, x) = \mathrm{softmax}(W_0 \tanh(W_1 [s_t; c_t])),$$

where W_0 and W_1 are learnable parameters, s_t is the hidden state of the decoder at time t, and c_t is the context vector generated from the encoder side containing the information of x at time t; biases are omitted for simplicity. The hidden state s_t is computed as

$$s_t = \mathrm{LSTM}(s_{t-1}, y_{t-1}),$$

where y_{t-1} is the word embedding vector of the previous target word at time t-1. The initial decoder state is initialized with the last state of the encoder. The context vector c_t is computed by summing up the hidden states {h_k}_{k=1}^{|x|} generated by the encoder with the attention mechanism; we adopt global attention (Luong et al., 2015) in our work.

Encoder with Attention. We adopt a one-layer bidirectional LSTM to encode the fact descriptions. The hidden state h_j at time j is the concatenation of the forward hidden state and the backward hidden state:

$$h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j], \quad \overrightarrow{h}_j = \mathrm{LSTM}(\overrightarrow{h}_{j-1}, x_j), \quad \overleftarrow{h}_j = \mathrm{LSTM}(\overleftarrow{h}_{j+1}, x_j).$$

The hidden outputs {h_k}_{k=1}^{|x|} are used to compute the context vectors for the decoder.
On the decoder side, applying the attention mechanism at time i, the context vector c_i is generated as

$$c_i = \sum_{j=1}^{|x|} \alpha_{ij} h_j, \quad (3)$$

where α_{ij} is the attention weight, computed as

$$\alpha_{ij} = \frac{\exp(s_i^{\top} h_j)}{\sum_{k=1}^{|x|} \exp(s_i^{\top} h_k)}, \quad (4)$$

where s_i is the hidden output state at time i on the decoder side.
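As a concrete illustration, the global attention step of Eqs. (3)-(4) can be sketched in plain Python; the dot-product score and the toy vector sizes here are illustrative assumptions, not details taken from the paper:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_context(s_i, H):
    """Global (dot-product) attention: score decoder state s_i against every
    encoder state h_j in H, normalize the scores into weights, and return
    the weighted context vector c_i together with the weights."""
    scores = [sum(a * b for a, b in zip(s_i, h)) for h in H]  # s_i . h_j
    alphas = softmax(scores)                                  # attention weights
    dim = len(H[0])
    c_i = [sum(alphas[j] * H[j][k] for j in range(len(H))) for k in range(dim)]
    return c_i, alphas
```

With two orthogonal encoder states and a decoder state pointing strongly at the first, almost all attention mass lands on that state, so the context vector essentially copies it.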

Label-conditioned Sequence-to-Sequence Model with Attention
Given a tuple of fact description, rationales and charge label (x, y, v), the probability of predicting y is computed as

$$P(y \mid x, v) = \prod_{t=1}^{|y|} P(y_t \mid y_{<t}, x, v). \quad (5)$$

Compared with Eq. (2), encoding the charge label provides an extra constraint that restricts the target-word search space from the whole space to the gold-charge-related space for rationales generation, so the model can generate more charge-distinct rationales. Charge labels are trainable parameters denoted by E_v: every charge has a trainable vector in E_v, which is updated during model training.
As shown in Fig. 2, on the decoder side, at time t, y_t is predicted with probability

$$P(y_t \mid y_{<t}, x, v) = \mathrm{softmax}(W_0 \tanh(W_1 [s_t; c_t; e_v])), \quad (6)$$

where e_v is the embedding vector of v obtained from E_v. In this formula, we connect the charge label v to s_t and c_t to influence the word selection process. We expect the model to learn latent connections between the charge label v and the words of the rationales in this way, so as to decode charge-discriminative words.
As shown in Fig. 2, we further embed the charge label v into the computation of the hidden state s_t at time t, merging it as

$$s_t \leftarrow \tanh(W_v [s_t; e_v] + b_v), \quad (7)$$

where W_v and b_v are learnable parameters. In this way, the information of the charge label is embedded into s_t. Since the attention weights and context vector c_t in Eqs. (3) and (4) are computed from s_t, encoding the charge label v into the hidden states makes the model concentrate more on charge-related information in the fact descriptions and helps generate more accurate fact details.
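The hidden-state merging step (Eq. (7) in the paper's numbering) can be sketched as follows; the exact parameterization — concatenation followed by a tanh-activated linear layer — is our reading of the description, and the helper names are hypothetical:

```python
import math

def matvec(W, v):
    # Plain matrix-vector product over nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def merge_label_into_state(s_t, e_v, W_v, b_v):
    """Fuse the charge-label embedding e_v into the decoder hidden state:
    s_t <- tanh(W_v [s_t; e_v] + b_v), so every subsequent attention and
    softmax computation sees label-conditioned state."""
    concat = s_t + e_v                      # [s_t; e_v]
    z = matvec(W_v, concat)
    return [math.tanh(zi + bi) for zi, bi in zip(z, b_v)]
```

With zero weights the output reduces to tanh(b_v), which makes the parameterization easy to sanity-check.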

Model Training and Inference
Suppose we are given training data {(x^{(i)}, y^{(i)}, v^{(i)})}_{i=1}^{N}. We aim to maximize the log-likelihood of the generated rationales given the fact descriptions and charge labels, so the loss function is

$$\mathcal{L} = -\sum_{i=1}^{N} \sum_{t=1}^{|y^{(i)}|} \log P(y_t^{(i)} \mid y_{<t}^{(i)}, x^{(i)}, v^{(i)}). \quad (8)$$

We split the training data into batches of size 64 and adopt Adam (Kingma and Ba, 2014) to update the parameters on every batch. At inference time, we encode the fact descriptions and charge labels into vectors and use the decoder to generate the rationales based on Eq. (1). We adopt beam search with a beam size of 5 to generate rationales. To make the generation process stoppable, an indicator tag "</s>" is appended to the end of each rationales sequence, and the inference process terminates when "</s>" is generated. The generated word-sequence paths are ranked and the one with the largest score is selected as the final rationales in the court view.
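The beam-search inference described above can be sketched generically; `step_fn` is a hypothetical stand-in for the decoder's next-token distribution, and sequences are ranked by total log-probability as in the paper:

```python
import math

def beam_search(step_fn, start, beam_size=5, max_len=50, eos="</s>"):
    """Generic beam search: expand each partial sequence with step_fn,
    keep the beam_size highest-scoring candidates, and stop a path when
    the end-of-sequence tag is generated."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                finished.append((seq, score))
                continue
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos)
    if not finished:          # no path reached eos within max_len
        finished = beams
    return max(finished, key=lambda c: c[1])[0]
```

A toy step function with two first tokens of probability 0.6 and 0.4 lets the search pick the higher-probability path.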

Data Preparation
Following Luo et al. (2017), we construct our dataset from the legal documents published on China Judgements Online. We extract the fact descriptions, the rationales in court views and the charge labels using regular expressions: the paragraph starting with "经审理查明" ("our court identified that") is taken as the fact description, and the part between "本院认为" ("our court holds that") and the charge is taken as the rationales. Nearly all samples in the dataset match this extraction pattern. A length threshold of 256 is set, and fact descriptions longer than that are discarded, leaving overlong facts for future study. We use the tokens "<name>", "<num>" and "<date>" to replace the names, numbers and dates appearing in the corpus, and we tokenize the Chinese texts with the open-source tool HanLP. For charge labels, we select the top 50 charges ranked by frequency and merge the remaining charges into an "others" class. Details of our dataset are shown in Table 1.
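The extraction and normalization steps can be sketched as follows; the concrete regular expressions are illustrative assumptions (the paper does not publish its patterns), and replacing names with "<name>" would additionally need an entity-recognition step omitted here:

```python
import re

# Hypothetical extraction patterns following the markers described above:
# the fact description follows "经审理查明" ("our court identified that") and
# the rationales sit between "本院认为" ("our court holds that") and the
# charge statement ("其行为已构成...", "his acts constituted...").
FACT_RE = re.compile(r"经审理查明[,,::]?(.*?)本院认为", re.S)
VIEW_RE = re.compile(r"本院认为[,,::]?(.*?)(?=其行为已构成|$)", re.S)

def normalize(text):
    """Replace dates and numbers with placeholder tokens."""
    text = re.sub(r"\d+年\d+月\d+日", "<date>", text)
    text = re.sub(r"\d+", "<num>", text)
    return text
```

Applied to a toy document, `FACT_RE` yields the fact span and `VIEW_RE` the rationales span, after which `normalize` abstracts away dates and numbers.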
For cases with multiple charges or multiple defendants, the fact descriptions and court views could be separated according to the charges or the defendants. In this work, we focus only on cases with one defendant and one charge, leaving complex cases for future study; this lets us collect sufficiently large data from the published legal documents without human annotation.

Experimental Settings
Word embeddings are randomly initialized and updated during training, with size 512 tuned from {256, 512, 1024}. Charge label vectors are also randomly initialized with size 512. The maximal vocabulary size is 100K words for the encoder and 50K for the decoder, discarding words beyond these bounds. The maximal source length is 256 and the maximal target length is 50. The hidden size of the LSTM is 1024, tuned from {256, 512, 1024}. We choose perplexity as the validation metric and apply early stopping. The initial learning rate is 0.0003 with a reduce factor of 0.5. Model performance is checked on the validation set after every 1,000 training batches, and we keep the parameters with the lowest perplexity. Training is terminated if performance does not improve for 8 successive checks.
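The checkpointing and early-stopping schedule above can be sketched as follows; the assumption that the learning rate is halved on each non-improving check is ours, since the paper states the reduce factor but not when it is applied:

```python
def train_with_early_stopping(eval_ppl, max_checks=1000, patience=8,
                              lr=0.0003, reduce_factor=0.5):
    """Evaluate validation perplexity at every check (one check per 1,000
    batches in the paper), keep the best parameters, and stop after
    `patience` checks without improvement."""
    best, best_step, stale = float("inf"), -1, 0
    for step in range(max_checks):
        ppl = eval_ppl(step, lr)          # run 1,000 batches, then validate
        if ppl < best:
            best, best_step, stale = ppl, step, 0
        else:
            stale += 1
            lr *= reduce_factor           # anneal on plateau (assumed schedule)
            if stale >= patience:
                break
    return best, best_step
```

With a toy perplexity curve that bottoms out early, the loop returns the best value and the check at which it occurred.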

Comparisons with Baselines
Evaluation Metrics. We adopt both automatic evaluation and human judgement. For automatic evaluation we use the BLEU-4 score (Papineni et al., 2002) and variants of the Rouge score (Lin, 2004), which are widely used in many NLG tasks. For human judgement we set up two evaluation dimensions: 1) how fluent the generated rationales are; and 2) how accurate the rationales are, i.e., how many fact details are accurately expressed in the generated rationales. We adopt a 5-point scale for both fluency and accuracy (5 is best). Three annotators who are familiar with our task conduct the human judgement on 100 randomly selected generated rationales per evaluated method. As a comprehensive evaluation, the three raters are also asked to judge whether each rationale could be adopted for use, and we record the number of adoptable rationales for every evaluated method.

Baselines.
• Rand randomly selects rationales from the training set (Rand_all); we also randomly choose rationales from the pool with the same charge label (Rand_charge). The Rand methods indicate the lower-bound performance for COURT-VIEW-GEN.
• BM25 is a retrieval baseline that retrieves the training fact description with the highest BM25 score (Robertson and Walker, 1994) against the input fact description and uses its rationales as the result (BM25_f2f), on the assumption that similar fact descriptions have similar rationales. We also retrieve only from fact descriptions with the same charge (BM25_f2f+charge), to see how much improvement adding charge labels can bring.
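The BM25 baseline can be sketched with the standard Okapi scoring function; the tokenized toy documents below are illustrative, not data from the paper:

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 (Robertson and Walker, 1994) between a tokenized query
    and one tokenized document from the collection `docs`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)   # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def retrieve(query, docs):
    # Index of the training fact description with the highest BM25 score.
    return max(range(len(docs)), key=lambda i: bm25_score(query, docs[i], docs))
```

Restricting `docs` to fact descriptions sharing the input's charge label gives the BM25_f2f+charge variant.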
• MOSES+ (Koehn et al., 2007) is a phrase-based statistical machine translation system mapping fact descriptions to rationales. KenLM (Heafield et al., 2013) is used to train a trigram language model on the target side of the training set, tuned on the validation set with MERT.
• NN-S2S is the basic Seq2Seq model without attention from Sutskever et al. (2014) for machine translation, with one LSTM layer for encoding and another for decoding. We use perplexity as the training metric and select the model with the lowest perplexity on the validation set.
• RAS† is an attention-based abstractive summarization model from Chopra et al. (2016). To handle the much longer fact descriptions, we replace the simple convolutional encoder with a more advanced bidirectional LSTM; another LSTM serves as the decoder, consistent with Chopra et al. (2016).

Experimental Results. In the automatic evaluation in Table 2, the scores are relatively high even for Rand_charge, which indicates that rationales with the same charge label are expressed similarly, with many overlapping n-grams; for example, rationales for the crime of theft usually begin with "以非法占有为目的" ("in the intention of illegal possession"). Accurately generating fact details such as the degree of injury or the number of thefts is more difficult. The retrieval method with charge labels is a strong baseline, even better than the basic Seq2Seq model. Adding the attention mechanism improves performance, as shown by RAS†, which is superior to the retrieval methods. By exploiting charge labels, our full model achieves the best performance. The statistical machine translation model performs poorly, since it requires the lengths of the parallel texts to be similar.
In the human evaluation, we can see that the retrieval methods cannot accurately express fact details, because it is hard to retrieve rationales whose details all match the input fact description, whereas our system learns to generate fact details by analyzing fact descriptions. Dropping the attention mechanism has a negative effect on model performance. RAS† performs worse on ACC., the main reason likely being that it cannot generate charge-discriminative rationales with deduced information, which demonstrates that our task is not a simple DOC-SUM task. On fluency, the generation models are very close to the retrieval methods, whose rationales were written by humans, which reflects that the generation models produce highly natural rationales.

Further Analysis
Impact of Exploiting Charge Labels.
• Charge-to-Charge Analysis. We first analyze the effect of exploiting charge labels charge by charge, by removing the charge encoding from our full model. From the results shown in Fig. 3, exploiting charge labels improves the results considerably for nearly all charges. This also indicates that non-distinct fact descriptions are common across charges, reflecting the difficulty of this task, and that utilizing charge labels alleviates the problem.
• Charge-Discriminativeness Analysis. We further evaluate the effect of charge labels on charge-discriminativeness for specific charges with non-distinct fact descriptions: intentional homicide, negligent homicide, duty embezzlement and corruption. For every charge, two participants count the number of rationales relevant to the charge among 20 randomly selected candidates. From Fig. 4, the number of charge-discriminative rationales improves considerably for every charge when charge information is utilized, which demonstrates that charge labels provide much extra charge-related information for handling latent information in fact descriptions. For the homicide crimes, the motives for killing are latent in the descriptions of the killing without direct statement, but our system learns to align the motives in the rationales to the charge labels, which are strong indicators distinguishing the two motives.

Ablation Study. We also ablate our full model to reveal how the different components of charge-label encoding contribute to performance. As shown in Table 3, "/ softmax comp.", which removes the label term in Eq. (6), yields worse performance than our full model but better than "/ charge comp.", which drops the charge encoding entirely; the same holds for "/ hidden comp.", which removes the label term in Eq. (7). Our full model remains the best. This shows that both ways of exploiting charge labels improve performance and that stacking them achieves better results.

Attention Mechanism Analysis. The heat map in Fig. 5 illustrates the attention mechanism: "slight injury" is aligned between source and target, and "responsibility" and "run" are well aligned to "away", which demonstrates that the attention mechanism learns meaningful alignments between rationales and fact details.

Correlation Analysis. From the results in Fig. 7, we find that the automatic evaluations track the human judgement well, with high correlation coefficients. This demonstrates that BLEU-4 and the Rouge variants are adoptable for COURT-VIEW-GEN evaluation, providing a basis for future studies on this task.
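The correlation between human judgement and automatic metrics can be computed with a standard Pearson coefficient; the score lists below are illustrative, not the paper's data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists, e.g. human
    ACC. judgements vs. BLEU-4 over the evaluated systems."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)
```

A coefficient near 1 over the evaluated systems would support using the automatic metrics as a proxy for human judgement.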
Error Analysis. Our model has difficulty generating latent fact details, which appear in rationales but are not explicitly expressed in fact descriptions. For example, for the number of thefts in the charge of larceny, the term "多次" ("several times") appears in rationales but may not be expressed directly in the fact description, which only describes the individual thefts without an explicit term for this detail, so it is hard for the attention mechanism to learn to align "多次" in the rationales to latent information in the fact description. In the rationales generated on the test set, we find that only 42.4% of samples accurately produce the term "多次". Handling such details may require designed rules, e.g., counting the thefts in the description and generating "多次" in the rationales when the count exceeds one.
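The rule suggested in the error analysis — count the theft events and emit "多次" ("several times") when there is more than one — can be sketched as follows; the trigger verbs and the clause splitting are illustrative assumptions:

```python
import re

THEFT_VERBS = ("盗窃", "窃取", "偷走")  # assumed trigger verbs for "steal"

def theft_event_count(fact):
    """Crudely count theft events: split the fact description into clauses
    and count those that mention a theft verb."""
    clauses = re.split(r"[。;;]", fact)
    return sum(1 for c in clauses if any(v in c for v in THEFT_VERBS))

def add_duoci(rationales, fact):
    # Post-hoc rule: if the fact description narrates more than one theft
    # event, the rationales should contain "多次" ("several times").
    if theft_event_count(fact) > 1 and "多次" not in rationales:
        rationales = rationales.replace("盗窃", "多次盗窃", 1)
    return rationales
```

Such a rule could post-process generated rationales to recover latent details the attention mechanism misses.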

Analysis through Cases
Fake Charge Label Conditioned Study. What will the generated rationales look like if they are conditioned on fake charge labels? We select one fact description whose gold charge is intentional injury, then generate rationales conditioned on the fake charges of the crime of defiance and affray, intentional homicide and negligent homicide. From Fig. 8, the rationales conditioned on fake charges are partly relevant to the fake charge labels while still maintaining fact details from the input fact description of the gold charge. For the fake charge of intentional homicide, the fact details should be "caused someone's death", but the generated rationales instead express "causing someone slight injury", which is relevant to the charge of intentional injury. For charge prediction systems, such discrepancies between fact details and charges can help remind people that the prediction results may be unreliable.
Case Study. Examples of generated rationales in court views are shown in Fig. 8. Generally speaking, our full label-conditioned model generates fact details more accurately than the baseline models. For the charges of the traffic accident crime and negligent homicide, all fact details are generated. The extra information from charge labels helps the model capture more important fact details by forcing it to pay more attention to charge-related information in the fact descriptions.
As for the charge-discriminativeness analysis, from the rationales for negligent homicide we can infer that its fact description likely relates to a traffic accident, which is non-distinct from that of the traffic accident crime. Without encoding charge labels, Ours / c wrongly generates rationales coherent with the traffic accident crime, because traffic accidents are a strong indicator of traffic crimes; the charge label, however, provides extra bias towards the homicide crime, so our full model generates highly discriminative rationales. Utilizing charge labels, the retrieval method can easily retrieve charge-related rationales, but it struggles to retrieve rationales with accurate fact details. For the charge of larceny, our full model extracts nearly all fact details but misses the fact of "多次" ("several times"), reflecting its weakness in handling latent details.

Conclusion and Future Work
In this paper, we propose the novel task of court view generation and formulate it as a text-to-text NLG problem. We utilize charge labels to benefit the generation of charge-discriminative rationales in court views and propose a label-conditioned Seq2Seq model with attention for this task. Extensive experiments show the effectiveness of our model and of exploiting charge labels.
In the future: 1) more advanced techniques such as reinforcement learning (Sutton and Barto, 1998) can be introduced to generate latent fact details, such as the number of thefts, more accurately; 2) in this work we only generate the rationales in court views, omitting charge prediction, and it would be interesting to see whether jointly generating the two parts benefits both tasks; 3) studying a verification mechanism to judge whether generated court views can really be adopted is important for COURT-VIEW-GEN in practice; 4) more complex cases with multiple charges and multiple defendants will be considered; and 5) a copy mechanism could be applied to improve the diversity and specificity of generated court views.


Figure 1: An example of fact description and court view from a legal document in a case.

Figure 3: Results of the impact of exploiting charge labels, evaluated charge by charge with BLEU-4 (similar results hold for the other three metrics but are omitted to save space).

Figure 4: Numbers of charge-discriminative rationales with and without charge labels.

Figure 5: Heat map for the attention mechanism analysis. Columns are the source and rows are the target.

Figure 6:

Figure 7: ACC. and ADOPT. from human judgement plotted against automatic evaluation scores.

Table 1: Statistics of our dataset.

Table 2: Results of the automatic evaluation and human judgement, with BLEU-4 and full-length F1 scores of the Rouge variants. Best results are in boldface. Statistical significance compared with our full model is indicated with ** (p < 0.01) and * (p < 0.05).