How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence

Legal Artificial Intelligence (LegalAI) focuses on applying artificial intelligence technology, especially natural language processing, to tasks in the legal domain. In recent years, LegalAI has rapidly drawn increasing attention from both AI researchers and legal professionals, as it can liberate legal professionals from a maze of paperwork. Legal professionals often approach tasks with rule-based and symbol-based methods, while NLP researchers concentrate more on data-driven and embedding methods. In this paper, we introduce the history, the current state, and the future directions of research in LegalAI. We illustrate the tasks from the perspectives of both legal professionals and NLP researchers and show several representative applications of LegalAI. We conduct experiments and provide an in-depth analysis of the advantages and disadvantages of existing works to explore possible future directions. You can find the implementation of our work at https://github.com/thunlp/CLAIM.


Introduction
Legal Artificial Intelligence (LegalAI) mainly focuses on applying artificial intelligence technology to help legal tasks. The majority of the resources in this field are presented in text forms, such as judgment documents, contracts, and legal opinions. Therefore, most LegalAI tasks are based on Natural Language Processing (NLP) technologies.
LegalAI plays a significant role in the legal domain, as it can reduce heavy and redundant work for legal professionals. Many tasks in the legal domain require the expertise of legal practitioners and a thorough understanding of various legal documents. Retrieving and understanding legal documents takes a great deal of time, even for legal professionals. Therefore, a qualified LegalAI system should reduce the time consumed by these tedious jobs and benefit the legal system. Besides, LegalAI can also provide a reliable reference to those who are not familiar with the legal domain, serving as an affordable form of legal aid.
In order to promote the development of LegalAI, many researchers have devoted considerable effort over the past few decades. Early works (Kort, 1957; Ulmer, 1963; Nagel, 1963; Segal, 1984; Gardner, 1984) mostly use hand-crafted rules or features due to the computational limitations of the time. In recent years, with rapid developments in deep learning, researchers have begun to apply deep learning techniques to LegalAI. Several new LegalAI datasets have been proposed (Kano et al., 2018; Duan et al., 2019; Chalkidis et al., 2019b,a), which serve as benchmarks for research in the field. Based on these datasets, researchers have explored NLP-based solutions to a variety of LegalAI tasks, such as Legal Judgment Prediction (Aletras et al., 2016; Luo et al., 2017; Chen et al., 2019), Court View Generation (Ye et al., 2018), Legal Entity Recognition and Classification (Cardellino et al., 2017; Angelidis et al., 2018), Legal Question Answering (Monroy et al., 2009; Taniguchi and Kano, 2016; Kim and Goebel, 2017), and Legal Summarization (Hachey and Grover, 2006; Bhattacharya et al., 2019).
As previously mentioned, researchers' efforts over the years have led to tremendous advances in LegalAI. To summarize, some efforts concentrate on symbol-based methods, which apply interpretable hand-crafted symbols to legal tasks (Ashley, 2017; Surden, 2018), while other efforts with embedding-based methods aim at designing efficient neural models to achieve better performance (Chalkidis and Kampas, 2019). More specifically, symbol-based methods concentrate on utilizing interpretable legal knowledge to reason over symbols in legal documents, such as events and relationships, while embedding-based methods try to learn latent features for prediction from large-scale data. The differences between these two approaches have caused problems in existing works of LegalAI: interpretable symbolic models are often less effective, while embedding methods with better performance usually cannot be interpreted, which may bring ethical issues such as gender bias and racial discrimination into the legal system. These shortcomings make it difficult to apply existing methods to real-world legal systems.
We summarize three primary challenges for both embedding-based and symbol-based methods in LegalAI: (1) Knowledge Modelling. Legal texts are well formalized, and there is a great deal of domain knowledge and many domain-specific concepts in LegalAI. How to utilize this legal knowledge is of great significance. (2) Legal Reasoning. Although most tasks in NLP require reasoning, LegalAI tasks are somewhat different, as legal reasoning must strictly follow the rules well defined in law. Thus, combining predefined rules with AI technology is essential to legal reasoning. Besides, complex case scenarios and complex legal provisions may require more sophisticated reasoning for analysis. (3) Interpretability. Decisions made in LegalAI usually should be interpretable in order to be applied to the real legal system; otherwise, fairness risks being compromised. Interpretability is as important as performance in LegalAI.
The main contributions of this work are summarized as follows: (1) We describe existing works from the perspectives of both NLP researchers and legal professionals. Moreover, we illustrate several embedding-based and symbol-based methods and explore the future directions of LegalAI. (2) We describe three typical applications, including judgment prediction, similar case matching, and legal question answering, in detail to emphasize why these two kinds of methods are essential to LegalAI. (3) We conduct exhaustive experiments on multiple datasets to explore how to utilize NLP technology and legal knowledge to overcome the challenges in LegalAI. The implementation can be found on GitHub. (4) We summarize LegalAI datasets, which can be regarded as benchmarks for related tasks. The details of these datasets can be found on GitHub, together with several legal papers worth reading.

Embedding-based Methods
First, we describe embedding-based methods in LegalAI, also known as representation learning. Embedding-based methods focus on representing legal facts and knowledge in an embedding space, so that deep learning methods can be applied to the corresponding tasks.

Character, Word, Concept Embeddings
Character and word embeddings play a significant role in NLP, as they embed discrete text into a continuous vector space. Many embedding methods have proven effective (Mikolov et al., 2013; Joulin et al., 2016; Pennington et al., 2014; Peters et al., 2018; Yang et al., 2014; Bordes et al., 2013; Lin et al., 2015), and they are crucial for the effectiveness of downstream tasks.
In LegalAI, embedding methods are also essential, as they bridge the gap between texts and vectors. However, it is nearly impossible to learn the meaning of a professional term directly from legal fact descriptions alone. Existing works (Chalkidis and Kampas, 2019; Nay, 2016) mainly revolve around applying existing embedding methods like Word2Vec to legal domain corpora. To overcome the difficulty of learning representations of professional vocabulary, we can try to capture both grammatical information and legal knowledge in word embeddings for the corresponding tasks. Knowledge modelling is significant to LegalAI, as many results should be decided according to legal rules and knowledge.
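To make the distributional intuition behind legal word embeddings concrete, the following is a minimal count-based sketch built from a toy corpus of our own invention (the sentences, window size, and vocabulary are all illustrative assumptions, not part of any cited method); real systems would instead train Word2Vec or similar models on millions of judgment documents.

```python
from collections import Counter
from math import sqrt

def build_cooccurrence(corpus, window=2):
    """Map each word to a Counter of context words within the window."""
    vectors = {}
    for sentence in corpus:
        for i, word in enumerate(sentence):
            ctx = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(ctx)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# A toy "legal corpus"; real corpora would be large collections of case documents.
corpus = [
    "the defendant committed theft and stole property".split(),
    "the defendant committed theft of property".split(),
    "the parties signed a contract agreement".split(),
]
vectors = build_cooccurrence(corpus)
# Words sharing legal contexts end up closer than unrelated terms.
print(cosine(vectors["theft"], vectors["stole"]))     # positive
print(cosine(vectors["theft"], vectors["contract"]))  # 0.0 in this toy corpus
```

Even this crude sketch shows why domain corpora matter: "theft" and "stole" become similar only because they co-occur with the same legal context words.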
Although knowledge graph methods in the legal domain are promising, two major challenges stand before their practical usage. Firstly, the construction of a knowledge graph in LegalAI is complicated. In most scenarios, there are no ready-made legal knowledge graphs available, so researchers need to build them from scratch. In addition, different legal concepts have different representations and meanings under the legal systems of different countries, which also makes it challenging to construct a general legal knowledge graph. Some researchers have tried to embed legal dictionaries (Cvrček et al., 2012), which can be regarded as an alternative approach. Secondly, a generalized legal knowledge graph differs in form from those commonly used in NLP: existing knowledge graphs concern the relationships between entities and concepts, while LegalAI focuses more on the explanation of legal concepts. These two challenges make knowledge modelling via embeddings in LegalAI non-trivial, and researchers can try to overcome them in the future.

Pretrained Language Models
Pretrained language models (PLMs) such as BERT (Devlin et al., 2019) have been the recent focus of many fields in NLP (Radford et al., 2019; Yang et al., 2019). Given the success of PLMs, using them in LegalAI is also a reasonable and direct choice. However, there are differences between the text that existing PLMs are trained on and legal text, which lead to unsatisfactory performance when directly applying PLMs to legal tasks. The differences stem from the terminology and knowledge involved in legal texts. To address this issue, Zhong et al. (2019b) propose a language model pretrained on Chinese legal documents, including civil and criminal case documents. Legal domain-specific PLMs provide a more qualified baseline for LegalAI tasks. We will show several experiments comparing different BERT models on LegalAI tasks.
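As a rough illustration of the data side of such domain-specific pretraining, the sketch below masks tokens of a hypothetical legal sentence in the style of BERT's masked language modelling objective. The whitespace tokenization, masking rate, and example sentence are simplifications we introduce for illustration, not the actual procedure of Zhong et al. (2019b).

```python
import random

def mask_for_mlm(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Randomly replace tokens with [MASK]; labels keep the original token
    at masked positions (the prediction target) and None elsewhere."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Hypothetical legal sentence, whitespace-tokenized for simplicity.
sentence = "the court sentenced the defendant to six months in prison".split()
masked, labels = mask_for_mlm(sentence, mask_prob=0.3)
print(list(zip(masked, labels)))
```

Pretraining on masked legal text of this kind is what lets a model pick up legal terminology that general-domain corpora rarely contain.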
For the future exploration of PLMs in LegalAI, researchers can aim more at integrating knowledge into PLMs. Integrating knowledge into pretrained models can improve their ability to reason over legal concepts. Much work has been done on integrating knowledge from the general domain into models (Peters et al., 2019; Hayashi et al., 2019). Such techniques can also be considered for future application in LegalAI.

Symbol-based Methods
In this section, we describe symbol-based methods, also known as structured prediction methods. Symbol-based methods utilize legal domain symbols and knowledge for LegalAI tasks. Symbolic legal knowledge, such as events and relationships, can provide interpretability, while deep learning methods can be employed within symbol-based methods for better performance.
Information Extraction
Information extraction (IE) in LegalAI has also attracted the interest of many researchers. To make better use of the particularity of legal texts, researchers try to use ontologies (Bruckschen et al., 2010; Cardellino et al., 2017; Lenci et al., 2009) or global consistency (Yin et al., 2018) for named entity recognition in LegalAI. To extract relationships and events from legal documents, researchers attempt to apply different NLP technologies, including hand-crafted rules (Bartolini et al., 2004; Truyens and Eecke, 2014), CRFs (Vacek and Schilder, 2017), joint models like SVM, CNN, and GRU (Vacek et al., 2019), and scale-free identifier networks (Yan et al., 2017), with promising results.
Existing works have made great efforts to improve IE performance, but we also need to pay attention to the benefits of the extracted information. The extracted symbols have a legal basis and can provide interpretability to legal applications, so we should not aim only at the raw performance of methods. Here, we show two examples of utilizing extracted symbols for the interpretability of LegalAI.
Relation Extraction and Inheritance Dispute. An inheritance dispute is a type of case in Civil Law that focuses on the distribution of inheritance rights. Identifying the relationships between the parties is vital, as those who have the closest relationship with the deceased can receive more of the assets. Towards this goal, relation extraction in inheritance dispute cases can provide the reason for judgment results and improve performance.
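As a minimal illustration of this idea, the snippet below extracts kinship relations from English-style case text with a single hand-crafted pattern. The pattern, relation set, and sentences are hypothetical stand-ins for the much richer rule sets or learned extractors used in practice.

```python
import re

# Hand-crafted pattern for a few kinship relations; real systems would use
# CRFs, neural extractors, or large rule sets over actual case documents.
KINSHIP = re.compile(r"(\w+) is the (son|daughter|spouse|brother|sister) of (\w+)")

def extract_relations(text):
    """Return (person, relation, person) triples found in the text."""
    return [(m.group(1), m.group(2), m.group(3)) for m in KINSHIP.finditer(text)]

fact = ("Bob is the son of Alice. Carol is the spouse of Bob. "
        "The estate of Alice is in dispute.")
print(extract_relations(fact))
# [('Bob', 'son', 'Alice'), ('Carol', 'spouse', 'Bob')]
```

Triples like these can then ground the judgment: parties with closer extracted relations to the deceased receive larger shares, and the extracted relations double as an explanation of the result.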
Event Timeline Extraction and Judgment Prediction of Criminal Case. In criminal cases, multiple parties are often involved in group crimes. To decide who should be primarily responsible for the crime, we need to determine what everyone has done throughout the case, and the order of these events is also essential. For example, in the case of crowd fighting, the person who fights first should bear the primary responsibility. As a result, a qualified event timeline extraction model is required for judgment prediction of criminal cases.
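The ordering step of event timeline extraction can be sketched as follows, on a hypothetical fact description with explicit timestamps. Real fact descriptions rarely contain such clean time expressions (they use relative and narrative time), so this only illustrates why recovering event order matters for assigning responsibility.

```python
import re

# Toy pattern: "At HH:MM, <actor> <action>." — an illustrative simplification.
EVENT = re.compile(r"At (\d{1,2}):(\d{2}), (\w+) ([^.]*)\.")

def extract_timeline(text):
    """Return events sorted chronologically as (minutes, actor, action)."""
    events = [(int(h) * 60 + int(m), actor, action)
              for h, m, actor, action in EVENT.findall(text)]
    return sorted(events)

fact = ("At 21:30, Carl joined the fight. At 21:10, Bob struck the first blow. "
        "At 21:15, Alice called the police.")
timeline = extract_timeline(fact)
print([(actor, action) for _, actor, action in timeline])
# Bob acted first, so Bob may bear the primary responsibility.
```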
In future research, we need to be more concerned with applying the extracted information to the tasks of LegalAI. The utilization of such information depends on the requirements of specific tasks, and the information can provide more interpretability.

Legal Element Extraction
In addition to the common symbols of general NLP, LegalAI also has its own exclusive symbols, called legal elements. The extraction of legal elements focuses on extracting crucial elements, such as whether someone was killed or something was stolen. These elements are the constitutive elements of a crime, and offenders can be convicted directly based on the results of these elements. Utilizing these elements can not only provide intermediate supervision for the judgment prediction task but also make the prediction results of the model more interpretable.
Fact Description: One day, Bob used a fake marriage-related reason to borrow RMB 2,000 from Alice. After being arrested, Bob paid the money back to Alice.
Did Bob sell something? ×
Did Bob fabricate facts?
Did Bob illegally possess the property of others?
Judgment Result: Fraud.

Towards a more in-depth analysis of element-based symbols, Shu et al. (2019) propose a dataset for extracting elements from three different kinds of cases: divorce disputes, labor disputes, and loan disputes. The dataset requires detecting whether the related elements are satisfied or not, formalizing the task as a multi-label classification problem. To show the performance of existing methods on element extraction, we have conducted experiments on the dataset, and the results can be found in Table 2. We have implemented several classical encoding models in NLP for element extraction, including TextCNN (Kim, 2014), DPCNN (Johnson and Zhang, 2017), LSTM (Hochreiter and Schmidhuber, 1997), BiDAF (Seo et al., 2016), and BERT (Devlin et al., 2019). We have tried two different versions of pretrained parameters for BERT: the original parameters (BERT) and parameters pretrained on Chinese legal documents (BERT-MS) (Zhong et al., 2019b). From the results, we can see that the language model pretrained on the general domain performs worse than the domain-specific PLM, which proves the necessity of domain-specific PLMs in LegalAI. For the following parts of the paper, we use BERT pretrained on legal documents for better performance.

From the results of element extraction, we find that existing methods can reach promising performance on element extraction but are still not sufficient for the corresponding applications. These elements can be regarded as pre-defined legal knowledge and can help downstream tasks. How to improve the performance of element extraction is valuable for further research.
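To make the multi-label formalization of element extraction concrete, here is a deliberately naive keyword-based sketch over the fraud example. The element names and trigger words are our own illustrative assumptions; real systems such as the BERT-based classifiers above learn these cues from annotated data instead of matching keywords.

```python
# Hypothetical elements with hand-picked trigger words, for illustration only.
ELEMENT_TRIGGERS = {
    "sold_something": ["sell", "sold"],
    "fabricated_facts": ["fake", "fictional", "fabricated"],
    "illegal_possession": ["borrow", "possess", "took"],
}

def detect_elements(fact, triggers=ELEMENT_TRIGGERS):
    """Multi-label output: one boolean per legal element."""
    text = fact.lower()
    return {element: any(t in text for t in kws)
            for element, kws in triggers.items()}

fact = ("One day, Bob used a fake marriage-related reason to borrow RMB 2,000 "
        "from Alice. After being arrested, Bob paid the money back to Alice.")
labels = detect_elements(fact)
print(labels)
# {'sold_something': False, 'fabricated_facts': True, 'illegal_possession': True}
```

The element pattern "fabricated facts + illegal possession, no sale" is exactly what supports the Fraud verdict, which is why such labels serve as interpretable intermediate supervision.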

Applications of LegalAI
In this section, we describe several typical applications of LegalAI, including Legal Judgment Prediction, Similar Case Matching, and Legal Question Answering. Legal Judgment Prediction and Similar Case Matching can be regarded as the core of judgment in the Civil Law and Common Law systems respectively, while Legal Question Answering can provide consultancy for those who are unfamiliar with the legal domain. Therefore, exploring these three tasks covers most aspects of LegalAI.

Legal Judgment Prediction
Legal Judgment Prediction (LJP) is one of the most critical tasks in LegalAI, especially in the Civil Law system. In the Civil Law system, judgment results are decided according to the facts and the statutory articles: one receives legal sanctions only after violating the prohibited acts prescribed by law. LJP mainly concerns how to predict the judgment results from both the fact description of a case and the contents of the statutory articles.
As a result, LJP is an essential and representative task in countries with a Civil Law system, such as France, Germany, Japan, and China. Besides, LJP has drawn much attention from both artificial intelligence researchers and legal professionals. In the following parts, we describe the research progress and explore the future directions of LJP.

Related Work
LJP has a long history. Early works revolve around analyzing existing legal cases in specific circumstances using mathematical or statistical methods (Kort, 1957;Ulmer, 1963;Nagel, 1963;Keown, 1980;Segal, 1984;Lauderdale and Clark, 2012). The combination of mathematical methods and legal rules makes the predicted results interpretable.
Fact Description: One day, the defendant Bob stole 8,500 yuan in cash as well as T-shirts, jackets, pants, shoes, and hats (with a total identified value of 574.2 yuan) from a Li-Ning store in Beijing.

Relevant Articles: Article 264 of the Criminal Law.
Applicable Charges: Theft.
Term of Penalty: 6 months.

Table 3: An example of legal judgment prediction. In this example, the judgment results include the relevant articles, applicable charges, and the term of penalty.
To promote the progress of LJP, researchers have proposed a large-scale Chinese criminal judgment prediction dataset, C-LJP. The dataset contains over 2.68 million legal documents published by the Chinese government, making C-LJP a qualified benchmark for LJP. C-LJP contains three subtasks: predicting relevant articles, applicable charges, and the term of penalty. The first two can be formalized as multi-label classification tasks, while the last can be treated as a regression task. English LJP datasets also exist (Chalkidis et al., 2019a), but their size is limited.
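The formalization of C-LJP's three subtasks can be sketched by showing how raw model scores might be decoded into a structured judgment: thresholded per-label scores for the two multi-label subtasks and a real-valued output for the term of penalty. The label names, threshold, and score values below are illustrative assumptions, not actual model outputs.

```python
def decode_judgment(article_scores, charge_scores, penalty_months, threshold=0.5):
    """Decode per-label scores (assumed in [0, 1]) and a regression output
    into the three C-LJP predictions."""
    return {
        "relevant_articles": sorted(a for a, s in article_scores.items() if s >= threshold),
        "applicable_charges": sorted(c for c, s in charge_scores.items() if s >= threshold),
        "term_of_penalty_months": round(penalty_months),
    }

# Hypothetical scores for a theft case like the one in Table 3.
judgment = decode_judgment(
    article_scores={"Article 264": 0.93, "Article 266": 0.08},
    charge_scores={"theft": 0.88, "fraud": 0.12},
    penalty_months=6.3,
)
print(judgment)
```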
With the development of neural networks, many researchers have begun to explore LJP using deep learning technology (Li et al., 2019b; Li et al., 2019a; Kang et al., 2019). These works follow two primary directions. The first is to use novel models to improve performance: Chen et al. (2019) use a gating mechanism to enhance the prediction of the term of penalty, and Pan et al. (2019) propose multi-scale attention to handle cases with multiple defendants. The second is to utilize legal knowledge or the properties of LJP: Luo et al. (2017) use an attention mechanism between facts and law articles to help predict applicable charges, other researchers present a topological graph to utilize the relationships between the different tasks of LJP, and still others incorporate ten discriminative legal attributes to help predict low-frequency charges.

Experiments and Analysis
To better understand recent advances in LJP, we have conducted a series of experiments on C-LJP. Firstly, we implement several classical text classification models, including TextCNN (Kim, 2014), DPCNN (Johnson and Zhang, 2017), LSTM (Hochreiter and Schmidhuber, 1997), and BERT (Devlin et al., 2019). For the parameters of BERT, we use the parameters pretrained on Chinese criminal cases (Zhong et al., 2019b). Secondly, we implement several models specially designed for LJP, including FactLaw (Luo et al., 2017), TopJudge, and the Gating Network (Chen et al., 2019). The results can be found in Table 4.
From the results, we can see that most models reach promising performance in predicting high-frequency charges or articles. However, the models do not perform well on low-frequency labels, as shown by the gap between micro-F1 and macro-F1. Researchers have explored few-shot learning for LJP, but their model requires additional attribute information labelled manually, which is time-consuming and makes it hard to employ the model on other datasets. Besides, the performance of BERT is not satisfactory, as it does not improve much over models with far fewer parameters. The main reason is that legal texts are very long, while the maximum length that BERT can handle is 512 tokens. According to our statistics, the maximum document length is 56,694, and 15% of documents are longer than 512. Document understanding and reasoning techniques are thus required for LJP.
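The micro/macro-F1 gap mentioned above can be reproduced with a small worked example: when a frequent label is predicted perfectly but a rare label is always missed, micro-F1 stays high while macro-F1 collapses. The toy labels and predictions below are our own construction.

```python
def micro_macro_f1(y_true, y_pred):
    """y_true / y_pred: per-sample sets of labels (multi-label setting)."""
    labels = set().union(*y_true, *y_pred)
    tp = {l: 0 for l in labels}
    fp = dict(tp)
    fn = dict(tp)
    for t, p in zip(y_true, y_pred):
        for l in labels:
            tp[l] += (l in t and l in p)
            fp[l] += (l not in t and l in p)
            fn[l] += (l in t and l not in p)
    def f1(t, f_p, f_n):
        return 2 * t / (2 * t + f_p + f_n) if (2 * t + f_p + f_n) else 0.0
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro

# Nine samples with a frequent charge predicted correctly,
# one sample with a rare charge the model never predicts.
y_true = [{"theft"}] * 9 + [{"smuggling"}]
y_pred = [{"theft"}] * 9 + [set()]
micro, macro = micro_macro_f1(y_true, y_pred)
print(micro, macro)  # micro ≈ 0.947, macro = 0.5
```

Micro-F1 weights every prediction equally and so is dominated by frequent charges; macro-F1 averages per label and exposes the failure on rare ones.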
Although embedding-based methods can achieve promising performance, we still need to consider combining symbol-based with embedding-based methods in LJP. Take TopJudge as an example: this model formalizes a topological order among the tasks of LJP (the symbol-based part) and uses TextCNN to encode the fact description. By combining symbol-based and embedding-based methods, TopJudge achieves promising results on LJP. Comparing the results of TextCNN and TopJudge, we find that merely integrating the order of judgments into the model leads to improvements, which proves the necessity of combining embedding-based and symbol-based methods.
For better LJP performance, some challenges require the future efforts of researchers: (1) Document understanding and reasoning. Techniques are required to obtain global information from extremely long legal texts. (2) Few-shot learning. Even low-frequency charges should not be ignored, as they are part of legal integrity. Therefore, handling infrequent labels is essential to LJP. (3) Interpretability. If we want to apply methods to real legal systems, we must understand how they make predictions. However, existing embedding-based methods work as black boxes; what factors affect their predictions remains unknown, and this may introduce unfairness and ethical issues like gender bias into legal systems. Introducing the legal symbols and knowledge mentioned above will benefit the interpretability of LJP.

Similar Case Matching
In countries with a Common Law system, such as the United States, Canada, and India, judicial decisions are made according to similar and representative past cases. As a result, how to identify the most similar case is the primary concern in judgment under the Common Law system.
In order to better predict judgment results in the Common Law system, Similar Case Matching (SCM) has become an essential topic in LegalAI. SCM concentrates on finding pairs of similar cases, where the definition of similarity can vary. SCM requires modelling the relationship between cases using information of different granularities, such as the fact level, event level, and element level. In other words, SCM is a particular form of semantic matching (Xiao et al., 2019), which can benefit legal information retrieval.

Related Work
Traditional methods of information retrieval (IR) focus on term-level similarity with statistical models, including TF-IDF (Salton and Buckley, 1988) and BM25 (Robertson and Walker, 1994), which are widely applied in current search systems. In addition to these term-matching methods, other researchers try to utilize meta-information (Medin, 2000; Gao et al., 2011; Wu et al., 2013) to capture semantic similarity. Many machine learning methods have also been applied to IR, such as SVD (Xu et al., 2010) and factorization models (Rendle, 2010; Kabbur et al., 2013). With the rapid development of deep learning and NLP, many researchers have applied neural models, including multi-layer perceptrons (Huang et al., 2013), CNNs (Shen et al., 2014; Hu et al., 2014; Qiu and Huang, 2015), and RNNs (Palangi et al., 2016), to IR.
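For reference, the classic BM25 scoring function can be implemented compactly (with the common defaults k1 = 1.5, b = 0.75) and applied to a toy legal corpus of our own; production search systems use inverted indexes rather than this direct loop.

```python
from math import log

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each whitespace-tokenized document against the query."""
    docs = [d.split() for d in docs]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for doc in docs:
        score = 0.0
        for term in query.split():
            df = sum(term in d for d in docs)                      # document frequency
            idf = log((n - df + 0.5) / (df + 0.5) + 1)             # smoothed IDF
            tf = doc.count(term)                                   # term frequency
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [
    "the defendant committed theft of property",          # relevant
    "the parties signed a sales contract",                # irrelevant
    "theft of property is punished under criminal law",   # relevant
]
scores = bm25_scores("theft property", docs)
print(scores)  # the theft documents outscore the contract document
```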
There are several LegalIR datasets, including COLIEE (Kano et al., 2018), CaseLaw (Locke and Zuccon, 2018), and CM (Xiao et al., 2019). COLIEE and CaseLaw both involve retrieving the most relevant documents from a large corpus, while each example in CM gives three legal documents for calculating similarity. These datasets provide benchmarks for the study of LegalIR. Many researchers focus on building easy-to-use legal search engines (Barmakian, 2000; Turtle, 1995). Others explore utilizing more information, including citations (Monroy et al., 2013; Geist, 2009; Raghav et al., 2016) and legal concepts (Maxwell and Schafer, 2008; Van Opijnen and Santos, 2017). Towards the goal of calculating similarity at the semantic level, deep learning methods have also been applied to LegalIR. Tran et al. (2019) propose a CNN-based model with document- and sentence-level pooling which achieves state-of-the-art results on COLIEE, while other researchers explore employing better embedding methods for LegalIR (Landthaler et al., 2016; Sugathadasa et al., 2018).

Experiments and Analysis
To get a better view of the current progress of LegalIR, we select CM (Xiao et al., 2019) for experiments. CM contains 8,964 triples, where each triple contains three legal documents (A, B, C). The task defined in CM is to determine whether B or C is more similar to A. We have implemented several types of baselines: (1) term-matching methods, i.e., TF-IDF (Salton and Buckley, 1988); (2) Siamese networks with two parameter-shared encoders, including TextCNN (Kim, 2014), BiDAF (Seo et al., 2016), and BERT (Devlin et al., 2019), together with a distance function; (3) semantic matching models at the sentence level, ABCNN (Yin et al., 2016), and at the document level, SMASH-RNN. The results can be found in Table 5. From the results, we observe that existing neural models capable of capturing semantic information outperform TF-IDF, but the performance is still not sufficient for SCM. As Xiao et al. (2019) state, the main reason is that legal professionals believe the elements in this dataset define the similarity of legal cases: legal professionals emphasize whether two cases have similar elements. Considering only term-level and semantic-level similarity is insufficient for the task.
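The (A, B, C) decision in CM can be sketched with the simplest possible baseline, cosine similarity over bag-of-words counts; the documents below are hypothetical English stand-ins for the Chinese case documents in CM, and real baselines would at least use TF-IDF weighting or learned encoders.

```python
from collections import Counter
from math import sqrt

def cos_sim(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(c * c for c in a.values()))
    nb = sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def more_similar(a, b, c):
    """Return 'B' if document b is closer to a than c is, else 'C'."""
    va, vb, vc = (Counter(d.split()) for d in (a, b, c))
    return "B" if cos_sim(va, vb) >= cos_sim(va, vc) else "C"

A = "the defendant stole property from the store"
B = "the defendant stole cash from the shop"
C = "the parties disputed the terms of a lease contract"
print(more_similar(A, B, C))  # 'B'
```

This term-level baseline succeeds here only because the cases share surface vocabulary; as discussed above, element-level similarity is what legal professionals actually compare, which such baselines cannot capture.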
For the further study of SCM, two directions need future effort: (1) Element-based representation. Researchers can focus more on the symbols of legal documents, as the similarity of legal cases is related to symbols like elements. (2) Knowledge incorporation. As semantic-level matching is insufficient for SCM, we need to consider incorporating legal knowledge into models to improve performance and provide interpretability.

Legal Question-Answering
Another typical application of LegalAI is Legal Question Answering (LQA), which aims at answering questions in the legal domain. One of the most important parts of legal professionals' work is to provide reliable and high-quality legal consulting services for non-professionals. However, due to the insufficient number of legal professionals, it is often challenging to ensure that non-professionals can get enough high-quality consulting services, and LQA is expected to address this issue. In LQA, the form of questions varies: some questions emphasize the explanation of legal concepts, while others concern the analysis of specific cases. Besides, questions can be expressed very differently by professionals and non-professionals, especially when describing domain-specific terms. These problems bring considerable challenges to LQA, and we conduct experiments in the following parts to better demonstrate these difficulties.

Question: Which crimes did Alice and Bob commit if they transported more than 1.5 million yuan of counterfeit currency from abroad to China?
Direct Evidence: P1: Transportation of counterfeit money: · · · The defendants are sentenced to three years in prison. P2: Smuggling counterfeit money: · · · The defendants are sentenced to seven years in prison.
Extra Evidence: P3: Motivational concurrence: the criminal carries out one act but commits several crimes. P4: For motivational concurrence, the criminal should be convicted of the more serious crime.
Comparison: seven years > three years.
Answer: Smuggling counterfeit money.

Table 6: Experimental results on JEC-QA. The evaluation metric is accuracy. The performance of unskilled and skilled humans is collected from the original paper.
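The final comparison step in the example above (convict of the crime with the heavier statutory penalty, per the motivational-concurrence rule) can be sketched as follows; the passage texts and the word-to-number mapping are simplified assumptions, and real LQA systems must first retrieve and chain the evidence passages, which is the hard multi-hop part.

```python
import re

WORD_TO_NUM = {"three": 3, "five": 5, "seven": 7, "ten": 10}

def penalty_years(passage):
    """Extract the first 'sentenced to N years' span from an evidence passage."""
    m = re.search(r"sentenced to (\w+) years", passage)
    word = m.group(1)
    return WORD_TO_NUM.get(word, int(word) if word.isdigit() else 0)

def convict_by_concurrence(evidence):
    """Per the concurrence rule, convict of the crime with the heavier penalty."""
    return max(evidence, key=lambda crime: penalty_years(evidence[crime]))

evidence = {
    "Transportation of counterfeit money":
        "The defendants are sentenced to three years in prison.",
    "Smuggling counterfeit money":
        "The defendants are sentenced to seven years in prison.",
}
print(convict_by_concurrence(evidence))  # 'Smuggling counterfeit money'
```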

Related Work
In LegalAI, there are many question answering datasets. Duan et al. (2019) propose CJRC, a legal reading comprehension dataset with the same format as SQuAD 2.0 (Rajpurkar et al., 2018), which includes span extraction, yes/no questions, and unanswerable questions. Besides, COLIEE (Kano et al., 2018) contains about 500 yes/no questions. Moreover, as the bar exam is a professional qualification examination for lawyers, bar exam datasets (Fawei et al., 2016; Zhong et al., 2019a) can be quite hard, as they require professional legal knowledge and skills.
In addition to these datasets, researchers have also worked on many methods for LQA. Rule-based systems (Buscaldi et al., 2010; Kim et al., 2013; Kim and Goebel, 2017) were prevalent in early research. To reach better performance, researchers utilize more information, such as the explanation of concepts (Taniguchi and Kano, 2016; Fawei et al., 2015), or formalize relevant documents as graphs to help reasoning (Monroy et al., 2008, 2009; Tran et al., 2013). Machine learning and deep learning methods like CRFs (Bach et al., 2017), SVMs (Do et al., 2017), and CNNs (Kim et al., 2015) have also been applied to LQA. However, most existing methods conduct experiments on small datasets, which makes them not necessarily applicable to massive datasets and real scenarios.

Experiments and Analysis
We select JEC-QA (Zhong et al., 2019a) as the dataset for our experiments, as it is the largest dataset collected from the bar exam, which guarantees its difficulty. JEC-QA contains 28,641 multiple-choice and multiple-answer questions, together with 79,433 relevant articles to help answer the questions. JEC-QA classifies questions into knowledge-driven questions (KD-Questions) and case-analysis questions (CA-Questions) and reports human performance. We implemented several representative question answering models, including BiDAF (Seo et al., 2016), BERT (Devlin et al., 2019), Co-matching, and HAF (Zhu et al., 2018). The experimental results can be found in Table 6.
From the experimental results, we can see that the models cannot answer legal questions well compared with their promising results in open-domain question answering, and there is still a huge gap between existing models and humans in LQA. For more qualified LQA methods, several significant difficulties remain: (1) Legal multi-hop reasoning. As Zhong et al. (2019a) state, existing models can perform single-step inference but not multi-hop reasoning. However, legal cases are very complicated and cannot be handled by single-step reasoning. (2) Legal concept understanding. Almost all models are better at case analysis than knowledge understanding, which shows that knowledge modelling is still challenging for existing methods. How to model legal knowledge for LQA is essential, as legal knowledge is the foundation of LQA.

Conclusion
In this paper, we describe the development status of various LegalAI tasks and discuss future directions. In addition to the applications and tasks we have mentioned, there are many other tasks in LegalAI, such as legal text summarization and information extraction from legal contracts. Nevertheless, no matter what kind of application it is, we can apply embedding-based methods for better performance, together with symbol-based methods for more interpretability.
Besides, the three main challenges of legal tasks remain to be solved. Knowledge modelling, legal reasoning, and interpretability are the foundations on which LegalAI can reliably serve the legal domain. Some existing methods are trying to solve these problems, but there is still a long way for researchers to go.
In the future, for these existing tasks, researchers can focus on solving the three most pressing challenges of LegalAI by combining embedding-based and symbol-based methods. For tasks that do not yet have datasets, or whose datasets are not large enough, we can try to build large-scale, high-quality datasets or use few-shot or zero-shot methods to solve these problems.
Furthermore, we need to take the ethical issues of LegalAI seriously. Applying LegalAI technology directly to the legal system may bring ethical issues like gender bias and racial discrimination, and the results given by these methods cannot yet fully convince people. To address this issue, we must note that the goal of LegalAI is not to replace legal professionals but to help their work. As a result, we should regard the results of the models only as references; otherwise, the legal system will no longer be reliable. For example, professionals can spend more time on complex cases and leave simple cases to the model, but for safety, these simple cases must still be reviewed. In general, LegalAI should play a supporting role in the legal system.