A Survey of Unsupervised Dependency Parsing

Syntactic dependency parsing is an important task in natural language processing. Unsupervised dependency parsing aims to learn a dependency parser from sentences that have no annotation of their correct parse trees. Despite its difficulty, unsupervised parsing is an interesting research direction because of its capability of utilizing almost unlimited unannotated text data. It also serves as the basis for other research in low-resource parsing. In this paper, we survey existing approaches to unsupervised dependency parsing, identify two major classes of approaches, and discuss recent trends. We hope that our survey can provide insights for researchers and facilitate future research on this topic.


Introduction
Dependency parsing is an important task in natural language processing that aims to capture syntactic information in sentences in the form of dependency relations between words. It finds applications in semantic parsing, machine translation, relation extraction, and many other tasks.
Supervised learning is the main technique used to automatically learn a dependency parser from data. It requires the training sentences to be manually annotated with their correct parse trees. Such a training dataset is called a treebank. A major challenge faced by supervised learning is that treebanks are not always available for new languages or new domains and building a high-quality treebank is very expensive and time-consuming.
There are multiple research directions that try to learn dependency parsers with few or even no syntactically annotated training sentences, including transfer learning, unsupervised learning, and semi-supervised learning. Among these directions, unsupervised learning of dependency parsers (a.k.a. unsupervised dependency parsing and dependency grammar induction) is the most challenging: it aims to obtain a dependency parser without using any annotated sentences. Despite its difficulty, unsupervised parsing is an interesting research direction, not only because it would reveal ways to utilize almost unlimited text data without the need for human annotation, but also because it can serve as the basis for studies of transfer and semi-supervised learning of parsers. The techniques developed for unsupervised dependency parsing could also be utilized for other NLP tasks, such as unsupervised discourse parsing (Nishida and Nakayama, 2020). In addition, research in unsupervised parsing inspires and verifies cognitive research on human language acquisition.
In this paper, we conduct a survey of unsupervised dependency parsing research. We first introduce the definition and evaluation metrics of unsupervised dependency parsing, and discuss research areas related to it. Then we present in detail two major classes of approaches to unsupervised dependency parsing: generative approaches and discriminative approaches. Finally, we discuss important new techniques and setups of unsupervised dependency parsing that have appeared in recent years.

Problem Definition
Dependency parsing aims at discovering the syntactic dependency tree z of an input sentence x, where x is a sequence of words $x_1, \ldots, x_n$ with length n. A dummy root word $x_0$ is typically added at the beginning of the sentence. A dependency tree z is a set of directed edges between words that form a directed tree structure rooted at $x_0$. Each edge points from a parent word (also called a head word) to a child word.
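As a minimal sketch of this definition, a dependency tree can be stored as a list of head positions, one per word, and checked for being a directed tree rooted at $x_0$. The list encoding and the function name `is_valid_dependency_tree` are illustrative assumptions, not notation from the surveyed work.

```python
from typing import List


def is_valid_dependency_tree(heads: List[int]) -> bool:
    """Check that `heads` encodes a directed tree rooted at the dummy root.

    heads[i - 1] is the position of the head of word x_i (1-based);
    position 0 denotes the dummy root x_0. This encoding is an assumption
    made for illustration.
    """
    n = len(heads)
    # Every head must be a position in 0..n and no word may head itself.
    for child, head in enumerate(heads, start=1):
        if head < 0 or head > n or head == child:
            return False
    # Following head links from any word must reach the root without a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# "x_2" is attached to the root and heads both "x_1" and "x_3".
example_heads = [2, 0, 2]
example_is_tree = is_valid_dependency_tree(example_heads)
```

Because each word stores exactly one head, the single-parent property holds by construction and only acyclicity has to be checked.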
In unsupervised dependency parsing, the goal is to obtain a dependency parser without using annotated sentences. Some work requires no training data and derives dependency trees from centrality or saliency information (Søgaard, 2012). We focus on learning a dependency parser from an unannotated dataset that consists of a set of sentences without any parse tree annotation. In many cases, part-of-speech (POS) tags of the words in the training sentences are assumed to be available during training.
Two evaluation metrics are widely used in previous work on unsupervised dependency parsing (Klein and Manning, 2004): directed dependency accuracy (DDA) and undirected dependency accuracy (UDA). DDA denotes the percentage of correctly predicted dependency edges, while UDA is similar to DDA but disregards the directions of edges when evaluating their correctness.
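The two metrics can be sketched as follows, assuming that each tree is given as a list of head positions (0 for the dummy root) and that an edge counts as undirected-correct when the gold tree contains it in either direction. The function name and data layout are illustrative assumptions.

```python
def dependency_accuracy(gold_trees, pred_trees):
    """Return (DDA, UDA) in percent over a corpus.

    Each tree is a list where tree[i - 1] is the head position of word i
    and 0 denotes the dummy root (an assumed encoding).
    """
    directed = undirected = total = 0
    for gold, pred in zip(gold_trees, pred_trees):
        gold_edges = {(head, child) for child, head in enumerate(gold, start=1)}
        for child, head in enumerate(pred, start=1):
            total += 1
            if (head, child) in gold_edges:
                # Correct edge with correct direction counts for both metrics.
                directed += 1
                undirected += 1
            elif (child, head) in gold_edges:
                # Same pair of words, reversed direction: UDA only.
                undirected += 1
    return 100.0 * directed / total, 100.0 * undirected / total


gold = [[2, 0, 2]]
pred = [[0, 1, 2]]
dda, uda = dependency_accuracy(gold, pred)
```

In this toy example one of three predicted edges is fully correct and one more links the right pair of words in the wrong direction, so UDA is higher than DDA.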

Related Areas
Supervised Dependency Parsing Supervised dependency parsing aims to train a dependency parser from training sentences that are manually annotated with their dependency parse trees. Generally, supervised dependency parsing approaches can be divided into graph-based approaches and transition-based approaches. A graph-based dependency parser searches for the best spanning tree of the graph that is formed by connecting all pairs of words in the input sentence. In the simplest form, a graph-based parser makes the first-order assumption that the score of a dependency tree is the summation of scores of its edges (McDonald et al., 2005). A transition-based dependency parser searches for a sequence of actions that incrementally constructs the parse tree, typically from left to right. While current state-of-the-art approaches have achieved strong results in supervised dependency parsing, their usefulness is limited to resource-rich languages and domains with many annotated datasets.
Cross-Domain and Cross-Lingual Parsing One useful approach to handling the lack of treebank resources in the target domain or language is to adapt a learned parser from a resource-rich source domain or language (Yu et al., 2015; McDonald et al., 2011; Ma and Xia, 2014; Duong et al., 2015). This is closely related to unsupervised parsing, as neither approach relies on treebanks in the target domain or language. However, unsupervised parsing is more challenging because it does not have access to any source treebank either.
Unsupervised Constituency Parsing Constituency parsing aims to discover a constituency tree of the input sentence in which the leaf nodes are words and the non-leaf nodes (nonterminal nodes) represent phrases. Unsupervised constituency parsing is often considered more difficult than unsupervised dependency parsing because it has to induce not only edges but also nodes of a tree. Consequently, there have been far more papers in unsupervised dependency parsing than in unsupervised constituency parsing over the past decade. More recently, however, there has been a surge of interest in unsupervised constituency parsing and several novel approaches have been proposed in the past two years. While we focus on unsupervised dependency parsing in this paper, most of our discussions on the classification of approaches and recent trends apply to unsupervised constituency parsing as well.
Latent Tree Models with Downstream Tasks Latent tree models treat the parse tree as a latent variable that is used in downstream tasks such as sentiment classification. While no treebank is used in training, these models rely on the performance of the downstream tasks to guide the learning of the latent parse trees. To enable end-to-end learning, the REINFORCE algorithm and the Gumbel-softmax trick (Jang et al., 2017) can be utilized (Yogatama et al., 2016; Choi et al., 2018). There also exists previous work on latent dependency tree models that utilizes structured attention mechanisms (Kim et al., 2017) for applications. Latent tree models differ from unsupervised parsing in that they utilize training signals from downstream tasks and that they aim to improve performance of downstream tasks instead of syntactic parsing.

Models
A generative approach models the joint probability of the sentence and the corresponding parse tree. Traditional generative models are mostly based on probabilistic grammars. To enable efficient inference, they typically make one or more relatively strict conditional independence assumptions. The simplest assumption (a.k.a. the context-free assumption) states that the generation of a token is only dependent on its head token and is independent of anything else. Such assumptions make it possible to decompose the joint probability into a product of component probabilities or scores, leading to tractable inference. However, they also lead to unavailability of useful information (e.g., context and generation history) in generating each token.
Based on their respective independence assumptions, different generative models specify different generation processes of the sentence and parse tree. Paskin (2002) and Carroll and Charniak (1992) choose to first uniformly sample a dependency tree skeleton and then populate the tokens (words) conditioned on the dependency tree in a recursive root-to-leaf manner. The generation of a child token is conditioned on the head token and the dependency direction. In contrast, Klein and Manning (2004) propose the Dependency Model with Valence (DMV) that generates the sentence and the parse tree simultaneously. Without knowing the dependency tree structure, each head token has to sample a decision (conditioned on the head token and the dependency direction) of whether to generate a child token or not before actually generating the child token. Besides, the generation of a child token in DMV is additionally conditioned on the valence, defined as the number of the child tokens already generated from a head token. Headden III et al. (2009) propose to also introduce the valence into the condition of decision sampling. Spitkovsky et al. (2012) additionally condition decision and child token generation on sibling words, sentence completeness, and punctuation context. Yang et al. (2020) propose a second-order extension of DMV that incorporates grandparent-child or sibling information. In addition to these generative dependency models, other grammar formalisms have also been used for unsupervised dependency parsing, such as tree substitution grammars (Blunsom and Cohn, 2010) and combinatory categorial grammars (Bisk and Hockenmaier, 2012; Bisk and Hockenmaier, 2013).
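The DMV-style generation process described above can be sketched as a function that scores a given sentence and tree. This is a simplified illustration that follows the description in this section (decisions conditioned on head and direction, child generation conditioned on head, direction, and a capped valence); it is not claimed to match the exact parameterization of Klein and Manning (2004). All parameter tables, their key layouts, and the cap `max_valence` are assumptions.

```python
import math


def dmv_log_prob(tokens, heads, p_root, p_decision, p_child, max_valence=1):
    """Log joint probability of a sentence and its tree under a DMV-like model.

    tokens: tokens x_1..x_n (e.g. POS tags)
    heads: heads[i - 1] is the head position of x_i, 0 for the dummy root
    p_root[token]: probability that the root generates this token
    p_decision[(head, direction)]: probability of generating one more child;
        stopping has probability 1 minus this value
    p_child[(head, direction, valence)][child]: child generation probability
    """
    logp = 0.0
    # Children of the dummy root.
    for pos, head_pos in enumerate(heads, start=1):
        if head_pos == 0:
            logp += math.log(p_root[tokens[pos - 1]])
    # Each word generates its left and right children, then stops.
    for head_pos, head in enumerate(tokens, start=1):
        for direction in ("left", "right"):
            children = [c for c, h in enumerate(heads, start=1)
                        if h == head_pos and (c < head_pos) == (direction == "left")]
            if direction == "left":
                children.reverse()  # generate the nearest child first
            p_continue = p_decision[(head, direction)]
            for k, child_pos in enumerate(children):
                valence = min(k, max_valence)  # children generated so far, capped
                logp += math.log(p_continue)
                logp += math.log(p_child[(head, direction, valence)][tokens[child_pos - 1]])
            logp += math.log(1.0 - p_continue)
    return logp


# Toy parameters (hypothetical values) for the tag sequence N V N.
p_root = {"V": 0.8, "N": 0.2}
p_decision = {("V", "left"): 0.6, ("V", "right"): 0.5,
              ("N", "left"): 0.1, ("N", "right"): 0.2}
p_child = {("V", "left", 0): {"N": 0.9, "V": 0.1},
           ("V", "right", 0): {"N": 0.7, "V": 0.3}}
logp = dmv_log_prob(["N", "V", "N"], [2, 0, 2], p_root, p_decision, p_child)
```

The joint probability is a product of root, decision, and child factors, which is the decomposition that makes exact inference and expected-count computation tractable.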
Similar tokens may have similar syntactic behaviors in a grammar. For example, all the verbs are very likely to generate a noun to the left as the subject. One way to capture this prior knowledge is to compute generation probabilities from a set of features that conveys syntactic similarity. Earlier work uses a log-linear model based on manually-designed local morpho-syntactic features (e.g., whether a word is a noun), and Jiang et al. (2016) employ a neural network to automatically learn such features. Both approaches are based on DMV.

Inference
Given a model parameterized by Θ and a sentence x, the model predicts the parse $z^*$ with the highest probability:

$$z^* = \arg\max_{z \in Z(x)} P(x, z; \Theta)$$

where Z(x) is the set of all valid dependency trees of the sentence x. Due to the independence assumptions made by generative models, the inference problem can be efficiently solved exactly in most cases. For example, chart parsing can be used for DMV.
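To make the search over Z(x) concrete, the sketch below enumerates every valid tree of a very short sentence and returns the highest-scoring one. It is a brute-force stand-in with exponential cost, not the chart parsing used in practice; the arc-factored toy scores and all names are assumptions for illustration.

```python
from itertools import product


def is_tree(heads):
    """True if following head links from every word reaches the root (0)."""
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def brute_force_parse(n, log_prob):
    """Return (best_heads, best_score) over all dependency trees of n words.

    log_prob maps a tuple of head positions to a log probability or score.
    """
    best_tree, best_score = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        score = log_prob(heads)
        if score > best_score:
            best_tree, best_score = heads, score
    return best_tree, best_score


# Hypothetical arc scores keyed by (head, child). Picking the best head for
# each word independently would give the cycle 1 -> 2 -> 1, which is invalid.
arc_score = {(0, 1): 1.0, (2, 1): 5.0, (0, 2): 2.0, (1, 2): 5.0}


def tree_score(heads):
    return sum(arc_score[(h, c)] for c, h in enumerate(heads, start=1))


best_tree, best_score = brute_force_parse(2, tree_score)
```

The example shows why inference has to search over whole trees: the two highest-scoring arcs are mutually incompatible, so the best valid tree uses only one of them.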

Learning Objective
Log marginal likelihood is typically employed as the objective function for learning generative models. It is defined on N training sentences $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$:

$$L(\Theta) = \sum_{i=1}^{N} \log P(x^{(i)}; \Theta)$$

where the model parameters are denoted by Θ. The likelihood of each sentence x is as follows:

$$P(x; \Theta) = \sum_{z \in Z(x)} P(x, z; \Theta)$$

where Z(x) is the set of all valid dependency trees of sentence x. As we mentioned earlier, the joint probability of a sentence and its dependency tree can be decomposed into the product of the probabilities of the components in the dependency tree. Apart from the vanilla marginal likelihood, priors and regularization terms are often added into the objective function to incorporate various inductive biases. Smith and Eisner (2006) (2012) introduce an entropy term to prevent the model from becoming too ambiguous. Mareček and Žabokrtský (2012) insert a term that prefers reducible subtrees (i.e., their removal does not break the grammaticality of the sentence) in the parse tree. The same reducibility principle is used by Mareček and Straka (2013) to bias the decision probabilities in DMV. Noji et al. (2016) place a hard constraint in the objective that limits the degree of center-embedding of the parse tree.

Learning Algorithm
The Expectation-Maximization (EM) algorithm is typically used to optimize log marginal likelihood. For each sentence, the EM algorithm aims to maximize the following lower bound of the objective function and alternates between the E-step and M-step:

$$F(Q, \Theta) = \log P(x; \Theta) - \mathrm{KL}\big(Q(z) \,\|\, P(z \mid x; \Theta)\big)$$

where Q(z) is an auxiliary distribution with regard to z. In the E-step, Θ is fixed and Q(z) is set to P(z|x; Θ), which makes the KL term zero. A set of so-called expected counts can be derived from Q(z) to facilitate the subsequent M-step and they are typically calculated using the inside-outside algorithm. In the M-step, Θ is optimized based on the expected counts with Q(z) fixed.
There are a few variants of the EM algorithm. If Q(z) represents a point-estimation (i.e., the best dependency tree has a probability of 1), the algorithm becomes hard-EM or Viterbi EM, which is found to outperform standard EM in unsupervised dependency parsing (Spitkovsky et al., 2010b). Softmax-EM (Tu and Honavar, 2012) falls between EM (considering all possible dependency trees) and hard-EM (only considering the best dependency tree), applying a softmax-like transformation to Q(z). During the EM iterations, an annealing schedule (Tu and Honavar, 2012) can be used to gradually shift from hard-EM to softmax-EM and finally to the EM algorithm, which leads to better performance than sticking to a single algorithm. Lateen EM (Spitkovsky et al., 2011c) repeatedly alternates between EM and hard-EM, which is also found to produce better results than both EM and hard-EM.
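The relationship between these variants can be illustrated by how each one reshapes the posterior over candidate trees before the M-step. The sketch below assumes the posterior is given explicitly as a list over a handful of trees and uses an exponent of the form 1/(1 - sigma) as the "softmax-like" transformation; this specific form and the name `sharpen_posterior` are assumptions made for illustration and should be checked against Tu and Honavar (2012).

```python
def sharpen_posterior(posterior, sigma):
    """Interpolate between standard EM (sigma=0) and hard-EM (sigma=1).

    posterior: probabilities P(z | x) over an explicit list of candidate trees
    sigma: sharpening strength in [0, 1] (hypothetical parameterization)
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    if sigma == 1.0:
        # Hard-EM: all mass on the single best tree.
        best = max(range(len(posterior)), key=posterior.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(posterior))]
    power = 1.0 / (1.0 - sigma)
    weights = [p ** power for p in posterior]
    total = sum(weights)
    return [w / total for w in weights]


posterior = [0.5, 0.3, 0.2]
q_em = sharpen_posterior(posterior, 0.0)    # unchanged posterior
q_soft = sharpen_posterior(posterior, 0.5)  # sharper than the posterior
q_hard = sharpen_posterior(posterior, 1.0)  # one-hot on the best tree
```

An annealing schedule in this picture corresponds to changing `sigma` across iterations rather than fixing it.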
Approaches with more complicated objectives often require more advanced learning algorithms, but many of the algorithms can still be seen as extensions of the EM algorithm that revise either the E-step (e.g., to update Q(z) based on posterior regularization terms) or the M-step (e.g., to optimize the posterior probability that incorporates parameter priors).

| Type | Model | Intermediate Representation | Encoder | Decoder |
|---|---|---|---|---|
| Autoencoder | CRFAE (Cai et al., 2017) | Z | P(z|x) | P(x̂|z) |
| Autoencoder | D-NDMV (Han et al., 2019a), Deterministic Variant | S | P(s|x) | P(z, x̂|s) |
| Variational Autoencoder | (Li et al., 2019) | Z | P(z|x) | P(z, x) |
| Variational Autoencoder | D-NDMV (Han et al., 2019a), Variational Variant | S | P(s|x) | P(z, x|s) |
| Variational Autoencoder | (Corro and Titov, 2018) | Z | P(z|x) | P(x|z) |

Table 1: Major approaches based on autoencoders and variational autoencoders for unsupervised dependency parsing. Z: dependency tree. S: continuous sentence representation. x̂ is a copy of x representing the reconstructed sentence. z is the dependency tree. s is the continuous representation of sentence x.
In addition to the EM algorithm, the learning objective can also be optimized with gradient descent. Yang et al. (2020) recently observe that gradient descent can sometimes significantly outperform EM when learning neural DMV.
Better learning results can also be achieved by manipulating the training data. Spitkovsky et al. (2010a) apply curriculum learning to DMV training, which starts with only the shortest sentences and then progresses to increasingly longer sentences. Tu and Honavar (2011) provide a theoretical analysis on the utility of curriculum learning in unsupervised dependency parsing. Spitkovsky et al. (2013) propose to treat different learning algorithms and configurations as modules and connect them to form a network. Some approaches discussed above, such as Lateen EM and curriculum learning, can be seen as special cases of this approach.
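The curriculum idea of starting from the shortest sentences can be sketched as a generator of training stages with a growing length limit. The function name and the one-word increment per stage are illustrative assumptions rather than the exact schedule of Spitkovsky et al. (2010a).

```python
def curriculum_stages(sentences, start_len=1, max_len=None):
    """Yield (length_limit, training_subset) pairs with a growing limit.

    sentences: list of token lists
    Each stage contains every sentence whose length is at most the limit,
    so later stages are supersets of earlier ones.
    """
    if not sentences:
        return
    if max_len is None:
        max_len = max(len(s) for s in sentences)
    for limit in range(start_len, max_len + 1):
        stage = [s for s in sentences if len(s) <= limit]
        if stage:
            yield limit, stage


corpus = [["a"], ["a", "b"], ["a", "b", "c"], ["d"]]
stages = list(curriculum_stages(corpus))
```

A training loop would typically initialize each stage with the parameters learned in the previous one, so that the model trained on short sentences guides learning on longer ones.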

Pros and Cons
It is often straightforward to incorporate various inductive biases and manually-designed local features into generative approaches. Moreover, generative models can be easily trained via the EM algorithm and its extensions. On the other hand, generative models often have limited expressive power because of the independence assumptions they make.

Discriminative Approaches
Because of the limitation of generative approaches, more recently, researchers have paid more attention to discriminative approaches. Discriminative approaches model the conditional probability or score of the dependency tree given the sentence. By conditioning on the whole sentence, discriminative approaches are capable of utilizing not only local features (i.e., features related to the current dependency) but also global features (i.e., contextual features from the whole sentence) in scoring a dependency tree.

Autoencoder-Based Approaches
Autoencoder-based approaches aim to map a sentence to an intermediate representation (encoding) and then reconstruct the observed sentence from the intermediate representation (decoding). In the two existing autoencoder approaches (summarized in Table 1), the intermediate representation is the dependency tree and a continuous sentence vector respectively.
The reconstruction loss is typically employed as the learning objective function for autoencoder models. For a training dataset including N sentences $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$, the objective function is as follows:

$$L(\Theta) = \sum_{i=1}^{N} \log P(\hat{x}^{(i)} \mid x^{(i)}; \Theta)$$

where Θ is the model parameter and $\hat{x}^{(i)}$ is a copy of $x^{(i)}$ representing the reconstructed sentence. Minimizing the reconstruction loss corresponds to maximizing this quantity. In some cases, there is an additional regularization term (e.g., L1) of Θ.
The first autoencoder model for unsupervised dependency parsing, proposed by Cai et al. (2017), is based on the conditional random field autoencoder framework (CRFAE). The encoder is a first-order graph-based discriminative dependency parser mapping an input sentence to the space of dependency trees. The decoder independently generates each token of the reconstructed sentence conditioned on the head of the token specified by the dependency tree. Both the encoder and the decoder are arc-factored, meaning that the encoding and decoding probabilities can be factorized by dependency arcs. Coordinate descent is applied to minimize the reconstruction loss and alternately updates the encoder parameters and the decoder parameters.
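The arc-factored structure described above can be written out as follows. This is a sketch derived from the description in this section; the arc score function $\phi$ and the decoder table $\theta$ are symbols assumed here for illustration and may differ from the notation of Cai et al. (2017).

```latex
% Joint probability of the reconstruction and the latent tree
P(\hat{x}, z \mid x; \Theta) = P(z \mid x; \Theta)\, P(\hat{x} \mid z; \Theta)

% Encoder: a globally normalized, arc-factored distribution over trees
P(z \mid x; \Theta) =
  \frac{\exp\big(\sum_{(h, c) \in z} \phi(x, h, c)\big)}
       {\sum_{z' \in Z(x)} \exp\big(\sum_{(h, c) \in z'} \phi(x, h, c)\big)}

% Decoder: each reconstructed token depends only on its head in z
P(\hat{x} \mid z; \Theta) = \prod_{(h, c) \in z} \theta_{\hat{x}_c \mid x_h}
```

Because both factors decompose over arcs, the product again decomposes over arcs, which is what keeps the sum over $z \in Z(x)$ in the reconstruction objective tractable.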
D-NDMV (Han et al., 2019a) (the deterministic variant) is the second autoencoder model proposed for unsupervised dependency parsing, in which the intermediate representation is a continuous vector representing the input sentence. The encoder is an LSTM summarizing the sentence with a continuous vector s, while the decoder models the joint probability of the sentence and the dependency tree. More specifically, the decoder is a generative neural DMV that generates the sentence and its parse simultaneously, and its parameters are computed based on the continuous vector s. The reconstruction loss is optimized using the EM algorithm. In the E-step, Θ is fixed and Q(z) is set to P (z|x, s; Θ). After we compute all the grammar rule probabilities given Θ, the inside-outside algorithm can be used to calculate the expected counts. In the M-step, Θ is optimized based on the expected counts with Q(z) fixed.

Variational Autoencoder-Based Approaches
As mentioned in Section 3.1, the training objective of a generative model is typically the probability of the training sentence, with the dependency tree marginalized as a hidden variable. However, the marginalized probability usually cannot be computed exactly for more complex models that do not make strict independence assumptions. Instead, a variational autoencoder maximizes the Evidence Lower Bound (ELBO), a lower bound of the marginalized probability. Since the intermediate representation follows a distribution, different sampling approaches are used to optimize the objective function (i.e., likelihood) according to different model schema.
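The lower-bound property can be checked numerically on a toy model in which the latent variable takes only a few values, so that the exact marginal is available for comparison. The dictionary layout and the function names are assumptions made for illustration; real models replace the explicit sum by sampling.

```python
import math


def log_marginal(joint):
    """log P(x) = log sum_z P(x, z), with joint[z] = P(x, z)."""
    return math.log(sum(joint.values()))


def elbo(joint, q):
    """ELBO = sum_z q(z) * (log P(x, z) - log q(z)) for a discrete latent z."""
    return sum(qz * (math.log(joint[z]) - math.log(qz))
               for z, qz in q.items() if qz > 0.0)


# Toy joint probabilities P(x, z) for three candidate trees (hypothetical).
joint = {"tree_a": 0.10, "tree_b": 0.05, "tree_c": 0.05}
total = sum(joint.values())

uniform_q = {z: 1.0 / len(joint) for z in joint}
posterior_q = {z: p / total for z, p in joint.items()}

bound_uniform = elbo(joint, uniform_q)
bound_posterior = elbo(joint, posterior_q)
exact = log_marginal(joint)
```

The bound is tight exactly when the variational distribution equals the true posterior, which is why the E-step of EM sets Q(z) to the posterior whenever that is tractable.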
Three unsupervised dependency parsing models were proposed in recent years based on variational autoencoders (shown in Table 1). There are three probabilities involved in ELBO: the prior probability of the syntactic structure, the probability of generating the sentence from the syntactic structure (the decoder), and the variational posterior (the encoder) from the sentence to the syntactic structure.
Recurrent Neural Network Grammars (RNNG) is a transition-based constituent parser, with a discriminative and a generative variant. Discriminative RNNG incrementally constructs the constituency tree of the input sentence through three kinds of operations: generating a non-terminal token, shifting, and reducing. Generative RNNG replaces the shifting operation with a word generation operation and incrementally generates a constituency tree and its corresponding sentence. The probability of each operation is calculated by a neural network. Li et al. (2019) modify RNNG for dependency parsing and use discriminative RNNG and generative RNNG as the encoder and decoder of a variational autoencoder respectively. However, because RNNG has a strong expressive power, it is prone to overfitting in the unsupervised setting. Li et al. (2019) propose to use posterior regularization to introduce linguistic knowledge as a constraint in learning, thereby mitigating this problem to a certain extent.
The model proposed by Corro and Titov (2018) is also based on a variational autoencoder. It is designed for semi-supervised dependency parsing, but in principle it can also be applied to unsupervised dependency parsing. The encoder of this model is a conditional random field model while the decoder generates a sentence based on a graph convolutional neural network whose structure is specified by the dependency tree. Since the variational autoencoder needs Monte Carlo sampling to approximate the gradient and the complexity of sampling a dependency tree is very high, Corro and Titov (2018) use Gumbel random perturbation (Jang et al., 2017) and differentiable dynamic programming to design an efficient approximate sampling algorithm.
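The basic idea behind Gumbel perturbation can be illustrated on a plain categorical distribution: adding independent Gumbel noise to log-weights and taking the argmax yields an exact sample. The sketch below shows only this Gumbel-max trick; it is not the tree sampler of Corro and Titov (2018), which perturbs arc weights and relaxes the argmax with differentiable dynamic programming. The function name is an assumption.

```python
import math
import random


def gumbel_max_sample(log_weights, rng):
    """Sample an index with probability proportional to exp(log_weights[k])."""
    best_index, best_value = None, float("-inf")
    for k, log_w in enumerate(log_weights):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))
        value = log_w + gumbel_noise
        if value > best_value:
            best_index, best_value = k, value
    return best_index


rng = random.Random(0)
probs = [0.7, 0.2, 0.1]
log_weights = [math.log(p) for p in probs]
num_samples = 20000
counts = [0, 0, 0]
for _ in range(num_samples):
    counts[gumbel_max_sample(log_weights, rng)] += 1
frequencies = [c / num_samples for c in counts]
```

The empirical frequencies should be close to `probs`. Turning sampling into "perturb, then maximize" is what allows an efficient maximization routine to be reused as a sampler.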
The variational variant of D-NDMV (Han et al., 2019a) has the same structure as the deterministic variant described in Section 3.2.1, except that the variational variant probabilistically models the intermediate continuous vector conditioned on the input sentence using a Gaussian distribution. It also specifies a Gaussian prior over the intermediate continuous vector.

Other Discriminative Approaches
Apart from the approaches based on autoencoders and variational autoencoders, there are also a few other discriminative approaches based on discriminative clustering (Grave and Elhadad, 2015), self-training (Le and Zuidema, 2015), or searching (Daumé III, 2009). Because of space limitations, below we only introduce the approach based on discriminative clustering called Convex MST (Grave and Elhadad, 2015).
Convex MST employs a first-order graph-based discriminative parser. It searches for the parses of all the training sentences and learns the parser simultaneously, with a learning objective that the searched parses are close to the predicted parses by the parser. In other words, the parses should be easily predictable by the parser. The objective function can be relaxed to become convex and then can be optimized exactly.

Pros and Cons
Discriminative models are capable of accessing global features from the whole input sentence and are typically more expressive than generative models. On the other hand, discriminative approaches are often more complicated and do not admit tractable exact inference.

Combined Approaches
Generative approaches and discriminative approaches have different pros and cons. Therefore, a natural idea is to combine the strengths of the two types of approaches to achieve better performance. One proposed approach is to jointly train two state-of-the-art models of unsupervised dependency parsing, the generative LC-DMV (Noji et al., 2016) and the discriminative Convex MST, with the dual decomposition technique that encourages the two models to gradually influence each other during training.

Neural Parameterization
Traditional generative approaches either directly learn or use manually-designed features to compute dependency rule probabilities. Following the recent rise of deep learning in the field of NLP, Jiang et al. (2016) propose to predict dependency rule probabilities using a neural network that takes as input the vector representations of the rule components such as the head and child tokens. The neural network can automatically learn features that capture correlations between tokens and rules. Han et al. (2019a) extend this generative approach to a discriminative approach by further introducing sentence information into the neural network in order to compute sentence-specific rule probabilities. Compared with generative approaches, it is more natural for discriminative approaches to use neural networks to score dependencies or parsing actions, so recent discriminative approaches all make use of neural networks (Li et al., 2019;Corro and Titov, 2018).

Lexicalization
In the most common setting of unsupervised dependency parsing, the parser is unlexicalized with POS tags being the tokens in the sentences. The POS tags are either human annotated or induced from the training corpus (Spitkovsky et al., 2011a; He et al., 2018). However, words with the same POS tag may have very different syntactic behavior and hence it should be beneficial to introduce lexical information into unsupervised parsers. Headden III et al. (2009) and Blunsom and Cohn (2010) use partial lexicalization in which infrequent words are replaced by special symbols or their POS tags. Yuret (1998), Seginer (2007), Pate and Johnson (2016), and Spitkovsky et al. (2013) experiment with full lexicalization. However, because the number of words is huge, a major problem with full lexicalization is that the grammar becomes much larger and thus learning requires more data. To mitigate the negative impact of data scarcity, smoothing techniques can be used. For instance, neural networks can be used to predict dependency probabilities that are automatically smoothed.
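Partial lexicalization as described above can be sketched as a preprocessing step that keeps frequent words and backs off to the POS tag for infrequent ones. The frequency threshold `min_count`, its default value, and the (word, tag) input format are assumptions for illustration.

```python
from collections import Counter


def partially_lexicalize(corpus, min_count=100):
    """Replace infrequent words by their POS tags.

    corpus: list of sentences, each a list of (word, pos_tag) pairs
    Returns a list of token lists in which a word is kept only if it occurs
    at least `min_count` times in the corpus.
    """
    counts = Counter(word for sentence in corpus for word, _ in sentence)
    return [[word if counts[word] >= min_count else tag
             for word, tag in sentence]
            for sentence in corpus]


toy_corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
tokens = partially_lexicalize(toy_corpus, min_count=2)
```

Raising `min_count` moves the parser toward the unlexicalized setting, while lowering it moves toward full lexicalization and the larger grammar that comes with it.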
In principle, lexicalized approaches could also benefit from pretrained word embeddings, which capture syntactic and semantic similarities between words. Recently proposed contextual word embeddings (Devlin et al., 2019) are even more informative, capturing contextual information. However, word embeddings have not been widely used in unsupervised dependency parsing. One concern is that word embeddings are too informative and may make unsupervised models more prone to overfitting. One exception is He et al. (2018), who propose to use invertible neural projections to map word embeddings into a latent space that is more amenable to unsupervised parsing.

Table 2: Reported directed dependency accuracies on section 23 of the WSJ corpus, evaluated on sentences of length ≤ 10 and all lengths. *: without gold POS tags. †: with more training data in addition to WSJ.

Big Data
Although unsupervised parsing does not require syntactically annotated training corpora and can theoretically use almost unlimited raw texts for training, most of the previous work conducts experiments on the WSJ10 corpus (the Wall Street Journal corpus with sentences no longer than 10 words) containing no more than 6,000 training sentences. There are a few papers that try to go beyond such a small training corpus. Pate and Johnson (2016) use two large corpora containing more than 700k sentences. Mareček and Straka (2013) utilize a very large corpus based on Wikipedia in learning an unlexicalized dependency grammar. A subset of the BLLIP corpus that contains around 180k sentences has also been used. With the advancement of computing power and deep neural models, we expect to see more future work on training with big data.

Unsupervised Multilingual Parsing
To tackle the lack of supervision in unsupervised dependency parsing, some previous work considers learning models of multiple languages simultaneously (Liu et al., 2013; Jiang et al., 2019; Han et al., 2019b). Ideally, these models can learn from each other by identifying shared syntactic behaviors of different languages, especially those in the same language family. For example, one approach utilizes the similarity of different languages defined by a phylogenetic tree and learns several dependency parsers jointly. Han et al. (2019b) propose to learn a unified multilingual parser with language embeddings as input. Jiang et al. (2019) propose to guide the learning process of an unsupervised dependency parser with the knowledge of another language by using three types of regularization to encourage similarity between model parameters, dependency edge scores, and parse trees respectively.

Benchmarking on the WSJ Corpus
Most papers on unsupervised dependency parsing report the accuracy of their approaches on the test set of the Wall Street Journal (WSJ) corpus. We list the reported accuracy on WSJ in Table 2. It must be emphasized that the approaches listed in this table may use different training sets and different external knowledge in their experiments, and one should check the corresponding papers to understand such differences before comparing these accuracies. While the accuracy of unsupervised dependency parsing has increased by over thirty points in the last fifteen years, it is still well below that of supervised models, which leaves much room for improvement and challenges for future research.

Future Directions

Utilization of Syntactic Information in Pretrained Language Modeling
Pretrained language modeling (Peters et al., 2018;Devlin et al., 2019;Radford et al., 2019), as a new NLP paradigm, has been utilized in various areas including question answering, machine translation, grammatical error correction, and so on. Pretrained language models leverage a large-scale corpus for pretraining and then small data sets of specific tasks for finetuning, reducing the difficulty of downstream tasks and boosting their performance. Current state-of-the-art approaches on supervised dependency parsing, such as Zhou and Zhao (2019), adopt the new paradigm and benefit from pretrained language modeling. However, pretrained language models have not been widely used in unsupervised dependency parsing. One major concern is that pretrained language models are too informative and may make unsupervised models more prone to overfitting. Besides, massive syntactic and semantic information is encoded in pretrained language models and how to extract the syntactic part from them is a challenging task.

Inspiration for Other Tasks
Unsupervised dependency parsing is a classic unsupervised learning task. Many techniques developed for unsupervised dependency parsing can serve as the inspiration for studies of other unsupervised tasks, especially unsupervised structured prediction tasks. A recent example is Nishida and Nakayama (2020), who study unsupervised discourse parsing (inducing discourse structures for a given text) by borrowing techniques from unsupervised parsing such as Viterbi EM and heuristically designed initialization.
Unsupervised dependency parsing techniques can also be used as building blocks for transfer learning of parsers. Some of the approaches discussed in this paper have already been applied to cross-lingual parsing (He et al., 2019), and more such endeavors are expected in the future.

Interpretability
One prominent problem of deep neural networks is that they act as black boxes and are generally not interpretable. How to improve the interpretability of neural networks is a research topic that has gained much attention recently. For natural language texts, their linguistic structures reveal important information about the texts and at the same time can be easily understood by humans. It is therefore an interesting direction to integrate techniques of unsupervised parsing into various neural models of NLP tasks, such that the neural models can build their task-specific predictions on intermediate linguistic structures of the input text, which improves the interpretability of the predictions.

Conclusion
In this paper, we present a survey on the current advances of unsupervised dependency parsing. We first motivate the importance of the unsupervised dependency parsing task and discuss several related research areas. We split existing approaches into two main categories, and explain each category in detail. Besides, we discuss several recent trends in this research area. While there is a growing body of work that improves unsupervised dependency parsing, its performance is still below that of supervised dependency parsing by a large margin. This suggests that more investigation and research are needed to make unsupervised parsers useful for real applications. We hope that our survey can promote further development in this research direction.