Dependency Relations for Sanskrit Parsing and Treebank

Dependency relations are needed for the development of a dependency treebank and a dependency parser. The guidelines for the development of a treebank for Sanskrit proposed a set of dependency relations. The use of these relations for the development of a sentence generator and a dependency parser for Sanskrit revealed the need for both an enhancement and a revision of these relations. In this paper, we present the revised version of these relations and discuss the cases where multiple tagging is possible, either due to the ellipsis of certain arguments or due to alternative derivational morphological analyses. This led us to specific instructions for handling such cases during tagging. A treebank with around 4,000 sentences has been developed following these guidelines. Finally, we evaluate a grammar-based dependency parser for Sanskrit on this treebank and report its performance.


Introduction
Sanskrit is one of the oldest languages in the world and has a literature at least a hundred times the size of that of Greek and Latin together. This literature ranges from scientific disciplines such as Mathematics, Āyurveda, texts dealing with Language Sciences, Ontology, Logic, Metallurgy, Physics, Polity, and Law to Philosophical texts, Epics and several texts of lasting artistic merit. India's contributions to the development of Language Sciences, dealing with branches such as phonetics, phonology, morphology, syntax, semantics, discourse analysis and logic, are found to be relevant for Language Technology. Among these, Pāṇini's grammar and the theories of verbal cognition deserve special mention from the Natural Language Processing (NLP) perspective. While Pāṇini's grammar provides an almost complete grammar for generation, the theories of verbal cognition provide a systematic approach to analyse any text objectively. In this approach attention is paid to the information encoded in a linguistic expression. The division of a word into morphemes, the role of some morphemes in connecting other morphemes, and the determination of the meaning of the morphemes are some of the topics discussed in these theories. Pāṇinian grammar provides a detailed description of how semantic relations are realised through various morphological features, word order, and other means of information encoding. The theories of verbal cognition use these clues of information encoding together with other factors such as expectancy, mutual congruency of word meanings, and proximity of the arguments to decide the relations between the words.
The semantic relations used by Pāṇini to describe various relations thus provide a basic set for developing a dependency parser and also for the development of a treebank. This set of relations was enhanced over a period of 2-3 millennia by the grammarians and theoreticians working in the field of verbal cognition. A list of all such relations was compiled by Ramakrishnamacaryulu (2009) and presented as dependency relations for Sanskrit for both inter-sentential and intra-sentential tagging. These dependency relations were used as a starting point, and the consortium for the Sanskrit-Hindi Machine Translation (SHMT) system arrived at a set suitable for the development of a Sanskrit treebank. This resulted in the first version of the tagging guidelines for the Sanskrit treebank. While developing a dependency parser, and also a sentential generator for Sanskrit, it was noticed that this set of dependency relations has some limitations and needs further enhancement as well as modification. In this paper we discuss the revised version of this set. This set of relations is also used to develop a Sanskrit treebank. We present the cases of ambiguity in tagging encountered while developing the treebank. This treebank is also used for the evaluation of the Sanskrit parser. We present the performance of this parser and discuss the limitations of both the parser and the dependency relations.
The paper is structured as follows. In the next section, we survey the state of the art in dependency relations and treebanks for parsing. This is followed by a discussion of the modifications to the earlier Sanskrit dependency relations and the enhancements thereupon, justifying their necessity. In the fourth section we describe the Sanskrit treebank, followed by the evaluation of a grammar-based parser on this treebank. This is followed by the conclusion.

Brief survey
The last two decades have established the suitability of a dependency parse over a constituency parse, even in the case of positional languages, for a wide range of NLP tasks such as Machine Translation, question answering, and information extraction. This led to the development of dependency treebanks for various languages. Most of the languages followed the easier path of converting existing constituency treebanks into dependency treebanks. Therefore the dependency relations used by these treebanks are also more syntactic in nature. At the same time, several efforts were underway towards developing a dependency parser for English. For example, Link grammar, which is closely related to a dependency grammar, proposed a set of around 106 relations which were not directional (Sleator and Temperley, 1993). Minipar had 59 relations (Lin, 2003). Carroll et al. (1999) and King et al. (2003) proposed sets of dependency relations which were used by Marneffe et al. (2006) to convert phrase-structure treebanks into dependency treebanks. This effort also led to some modifications to these relations, largely based on practical considerations. The number of relations proposed by them was 47. Most of these relations were syntactic rather than semantic in nature. These relations were incorporated in the Stanford parser. Thus we see that there was a huge variation in the number of relations used by various research groups, and naturally their semantic content also differed.
For most of the morphologically rich languages like Czech, Hindi, and Finnish, manually annotated dependency treebanks were developed. The Prague Dependency Treebank (PDT) is one of the oldest dependency treebanks (Bejček et al., 2013). This treebank is annotated at both the syntactic and the semantic (tectogrammatical) level (Böhmová et al., 2003). AnnCorra, a set of guidelines for annotating dependency relations based on Pāṇinian grammar, was developed for Indian languages, and treebanks for the major Indian languages were developed following these guidelines (Bharati et al., 2002).
The major effort towards bringing in a standard among dependency relations is by Nivre et al. (2016), who proposed the Universal Dependencies. The Universal Dependencies aim at a common annotation scheme for all languages so that cross-linguistic consistency among the treebanks of several languages is achieved. The Universal Dependencies evolved from the Stanford dependencies (Marneffe and Manning, 2008). Though most of the relations in the Universal Dependencies are syntactic in nature, the nsubj relation together with the newly proposed nsubj:pass relation makes this pair equivalent to the concept of abhihita in the Pāṇinian dependencies (Bharati and Kulkarni, 2011). Around 90 languages of the world, including the three Classical languages viz. Greek, Latin and Sanskrit, have dependency treebanks following the Universal Dependencies.
Among the classical languages, both Ancient Greek and Latin have dependency treebanks following their own grammars. The Ancient Greek dependency treebank consists of 21,170 sentences (309,096 words) from ancient Greek texts (Bamman and Crane, 2011). The Latin dependency treebank (v. 1.5) consists of 3,473 annotated sentences (53,143 words) from eight texts. The Latin tagset (v. 1.3) consists of 20 main categories, which are further elaborated into various types. In this tagset, the authors have explained, with examples, how to annotate specific constructions involving relative clauses, gerunds, direct speech, comparison etc.
All these dependency relations are mostly syntactic in nature. A strong need has also been felt for semantic annotation. Levin and Rappaport (2005) discuss the problems in thematic-level annotation. This led to other models for semantic-level tagging, Propbank (Palmer et al., 2005) and FrameNet (Fillmore et al., 2003) being the two prominent ones among them.
Pāṇini's scheme for the annotation of relations is syntactico-semantic (Kulkarni and Sharma, 2019). Unlike the semantics dealt with in the Propbank or FrameNet annotations, in Pāṇini's scheme the level of semantics is precisely the one that can be extracted from the linguistic expression alone (Bharati and Kulkarni, 2010).

Saṁsādhanī Dependency Relations
Manually annotated data at various levels has now become an essential resource for the computational analysis of texts. Such a resource is not only useful for machine learning but also comes in handy as test data for grammar-based systems. To extract various kinds of relations between words in a sentence, it is necessary to have a corpus tagged at the level of relations between the words. Pāṇini's grammar provides semantic definitions of various relations between words and also provides rules that tell us how these relations are realised morphologically. The noun-verb relations are called the kāraka relations, which refer to six different types of participants in an action viz. kartā (roughly an agent), karma (roughly a goal or a patient), karaṇam (instrument), sampradānam (recipient), apādānam (source) and adhikaraṇam (location). The Indian grammarians further sub-classified and enhanced these relations by introducing a few more relations that were deemed necessary from the analysis point of view. In addition, two other relations viz. prayojanam (purpose) and hetuḥ (cause) also involve a noun-verb relationship. The list of all these relations, with around 100 entries, was collected and classified by Ramakrishnamacaryulu (2009). This list was the starting point in framing the tagging guidelines for building treebanks. It was noticed that these relations were very fine-grained, and were suitable neither for a human annotator nor for parsing by a computer with high accuracy. Taking into consideration both aspects, viz. manual tagging as well as automatic parsing, around 31 relations were chosen from this set (Kulkarni and Ramakrishnamacharyulu, 2013). A treebank of around 3,000 sentences was developed following these guidelines.
When these dependency relations were examined from the sentence generation point of view, it was noticed that this set has several relations that were not semantic in nature, but rather referred to a morphological requirement or were syntactic in nature. This forced us to look at these relations afresh.

Enhancements and Modifications
In Sanskrit, there are certain words in the presence of which a noun gets a specific nominal suffix. This is a morphological requirement, and in Pāṇini's grammar no semantics associated with such morphological requirements is discussed. As an example of such a requirement, let us consider the following sentence.
In this sentence, the verb 'be' is not a copula, but indicates existence. The word paritaḥ (surrounding) refers to the location and has an expectancy of a reference point, and the word denoting this reference point gets an accusative case marker. Figure 1 shows both the old and the new versions. In the old version, the label was upapadasambandhaḥ (literally 'a relation due to an adjacent word'), which was a morphosyntactic label. In the new version this has been replaced by a semantic label sandarbha-binduḥ (reference point).

Another pair of relations that needed modification was anuyogī and pratiyogī. These were the relations used to connect two sentences through a connective. The two words anuyogī and pratiyogī are from Indian logic, where they are used to refer to the two relata of a relation. In the old annotation scheme, some of the relations were not analysed semantically, and hence a general scheme of naming them relata1 (anuyogī) and relata2 (pratiyogī) was followed. We illustrate this with an example. Consider the sentence (2) Skt: Gloss: In this sentence the relation of the particle iti (thus) with gacchāmi (goes) and avadat (said) was marked as pratiyogī and anuyogī in the earlier version. The embedded sentence being a sentential argument, we propose a vākyakarma (literally 'sentential object') relation between the heads of the main and the embedded sentence. The particle iti serves as a marker for this relation, and hence it is marked as vākyakarmadyotakaḥ (literally 'indicator of a sentential argument'). In the earlier version the relations were as shown in Fig. 4. The two relations anuyogī (relata1) and pratiyogī (relata2) and the relation sambandhaḥ (literally 'relation') do not provide any semantics other than that the two words yadā (when) and tadā (then) are related to each other and that they in turn are related to the finite verbs of the respective sentences. What the relation between them is, however, is not specified.
In the revised scheme, these relations are changed as shown in Fig. 5. The modified version clearly marks the relation between the co-relatives (when-then), and also marks the semantic relation of each co-relative with its verb as a time-location. The revised scheme thus provides better semantics than the previous one.

Finally, the third major modification was with regard to co-ordinating conjuncts. In the earlier set of relations the conjunctive particle (samuccaya-dyotakaḥ) was marked as the head, connecting the co-ordinated conjuncts by a relation samuccitam, as shown in Fig. 6. This was modified as shown in Fig. 7. Let us look at the following sentence with a conjunct.
(4) Skt: Gloss: In Sanskrit, it is observed that the last conjunct shows concord with the verb (Panchal and Kulkarni, 2019). The conjunctive particle acts as a marker, similar to a case suffix, marking the relation between the two conjuncts. Hence in the modified analysis, the last conjunct in the phrase is marked as the head, with which the other conjunct is related by a samuccitam (conjunct) relation, and the conjunctive particle is related to this head by the relation samuccaya-dyotakaḥ (literally 'a marker for conjunction').

The predicate-argument relations are known as kāraka relations in Pāṇinian terminology. These are six in number, with sub-classifications of some of them. The six major relations are kartā (roughly agent), karma (roughly goal or patient), karaṇam (instrument), sampradānam (recipient), apādānam (source) and adhikaraṇam (location). If the activity involved is a causative one, then the agent of the basic activity is called the prayojya kartā and the causative agent is called the prayojaka kartā. To account for the arguments of ditransitive verbs, we have introduced two sub-categories of karma viz. mukhyakarma (primary object) and gauṇakarma (secondary object). These are similar to, but not semantically equivalent to, the direct and indirect object. As discussed in the previous section, a new tag vākyakarma is also introduced to mark a sentential argument of a verb.
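The restructuring of conjuncts described earlier in this section can be sketched as a transformation over labelled edges. This is only an illustration: the (head, dependent, label) representation, the function name, and the example words are our own, not part of the Saṁsādhanī tools.

```python
def restructure_conjunct(edges):
    """Convert the old conjunct analysis (particle as head, both
    conjuncts attached via 'samuccitam') into the revised one: the
    last conjunct becomes the head, the other conjuncts become its
    'samuccitam' dependents, and the particle attaches to the new
    head via 'samuccaya-dyotakah'."""
    particles = {h for (h, d, l) in edges if l == "samuccitam"}
    new_edges = [(h, d, l) for (h, d, l) in edges if l != "samuccitam"]
    for p in particles:
        # conjuncts in sentence order; the last one shows concord with the verb
        deps = [d for (h, d, l) in edges if h == p and l == "samuccitam"]
        head = deps[-1]
        for d in deps[:-1]:
            new_edges.append((head, d, "samuccitam"))
        new_edges.append((head, p, "samuccaya-dyotakah"))
    return new_edges

# Old analysis: the particle 'ca' heads both conjuncts.
old = [("ca", "Rama", "samuccitam"), ("ca", "Lakshmana", "samuccitam")]
print(restructure_conjunct(old))
# [('Lakshmana', 'Rama', 'samuccitam'), ('Lakshmana', 'ca', 'samuccaya-dyotakah')]
```

The transformation preserves the information of the old analysis while making the last conjunct the head, mirroring Fig. 6 versus Fig. 7.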

Under the non-predicative arguments, the relations are categorised into three sub-categories. The relation of a finite verb with a non-finite verb, marking precedence, simultaneity, etc., forms the first category. The relation of a verb with a noun, marking the cause or the purpose etc., constitutes the second sub-category. The genitive relation between two nouns, the adjectival relation, and the relation due to reduplication are some examples of the relations in the third sub-category. The relations in this category convey only a broad semantics. For example, the genitive relation covers various semantic relations such as the part-whole relation, kinship relations, the possessive relation, and many more. Similarly, reduplication may mark universal quantification, intensity, etc. The exact semantics depends on the context.
The third category of relations is the set of relations due to certain special words called upapadas. These words govern the case suffix of the nouns they are in proximity with. Pāṇini has not discussed the semantics of these relations. We found that most of these words are related to the nouns whose case suffix they govern, and that they indicate either a reference point or a comparison point. Then there are the relations due to conjuncts and disjuncts and a few miscellaneous relations. The detailed treatment of conjuncts is summarised in Panchal and Kulkarni (2019), and we do not discuss them here further. Finally there are relations between sentences. These are typically relations between two full sentences, marked by certain indeclinable words such as if-then (yadi-tarhi), because of (tataḥ), hence (ataḥ) etc. The relations between them are classified under miscellaneous since, in the current guidelines, we mark them as either relata1 and relata2, or simply as a relation. The terms relata1, relata2 and relation do not provide any semantics. In Ramakrishnamacaryulu (2009), a semantic classification of inter-sentential relations is provided. The current guidelines need further enhancement to incorporate inter-sentential relations. This is out of the scope of this paper and hence is not discussed.

Saṁsādhanī Parser
During the last decade there has been an upsurge in the use of Machine Learning approaches for the development of dependency parsers. Dependency parsers for several languages, including Classical languages such as Latin and Greek, are available. Most of these parsers follow data-driven approaches. The first parser for Sanskrit was built by Bhattacharyya (1986) using integer programming. Huet (2007) has a shallow parser that uses the minimal information of the transitivity of a verb as a sub-categorisation frame and models parsing as a graph-matching algorithm. The main purpose of this shallow parser is to filter out nonsensical segmentations. Hellwig et al. (2020) describe a syntactic labeler for manual annotation. This syntactic labeler expects a human being to select a pair of words, for which it then suggests a label. This is a first stage towards developing a fully automatic syntactic parser.
The first full-fledged parser for Sanskrit is described in Kulkarni (2019). This parser follows Pāṇinian grammar and the theories of verbal cognition described in the Indian Sanskrit literature. The theories of verbal cognition describe three conditions necessary for verbal cognition: ākāṅkṣā (expectancy), yogyatā (meaning congruity) and sannidhi (proximity). Kulkarni (2019) has discussed the computational models of these three factors and describes the design of a parser following the theories of verbal cognition. This parser, which is a part of the Saṁsādhanī platform, is implemented as an edge-centric binary join that builds a dependency tree in a bottom-up fashion, with local and global constraints on the edges and the edge labels. It uses the dependency relations provided in Appendix A. It differs from the state-of-the-art parsers in the following aspects.
• It is a grammar-based parser and follows the Indian theories of verbal cognition for parsing, while the current trend is to follow data-driven approaches.
• It produces all possible parses, while a typical parser produces only one. There are two reasons for allowing multiple parses. The first is that in Sanskrit we come across texts that have multiple readings. These multiple readings may be intended by the author or may be due to different philosophical interpretations. We would like to present all these readings to the reader. The second reason, purely a limitation of the implementation, is that the mutual congruency (semantic restrictions) between the word meanings is not checked while establishing the relations between words. This leads to over-generation and false positives. It is left to the reader to choose the correct parse from among the possible solutions.
• The solutions are ranked by a cost function, defined as the sum, over all edges, of the product of the cost associated with the relation and the distance between the two relata.
• The parser comes with an intelligent user interface that helps the user select the correct parse if the first parse is not the correct one.
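The cost-based ranking in the list above can be sketched as follows. This is a minimal illustration: the relation labels and the per-relation costs are invented for the example and are not the actual Saṁsādhanī weights.

```python
# A parse is a list of edges (head_pos, dep_pos, relation), where the
# positions are word indices in the sentence.

# Illustrative per-relation costs (lower = preferred); not the real weights.
RELATION_COST = {"karta": 1.0, "karma": 1.0, "viseshanam": 1.5}

def parse_cost(edges):
    """Cost of a parse: the sum over edges of the relation cost times
    the linear distance between the two relata."""
    return sum(RELATION_COST.get(rel, 2.0) * abs(h - d)
               for (h, d, rel) in edges)

def rank_parses(parses):
    """Return the candidate parses sorted by increasing cost."""
    return sorted(parses, key=parse_cost)

# Two candidate parses of a 4-word sentence: the adjective (word 0)
# attached to a far substantive vs. to the adjacent one.
p_far = [(3, 0, "viseshanam"), (3, 1, "karta"), (3, 2, "karma")]
p_near = [(1, 0, "viseshanam"), (3, 1, "karta"), (3, 2, "karma")]
print(rank_parses([p_far, p_near])[0] == p_near)
```

The distance term makes the ranking prefer attachments between nearby words, which is how proximity (sannidhi) enters the scoring.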

Treebank
The first treebank of dependency analysis for Sanskrit was developed by the consortium (SHMT-Consortium) executing the project entitled 'Development of Sanskrit Computational Tools and Sanskrit-Hindi Machine Translation System' sponsored by the TDIL Programme, Ministry of Information Technology, Government of India, 2008-12. This treebank has 3,000 sentences, mostly taken from modern stories. However, this treebank is not available in the public domain; it is available only through the TDIL, for research. The second treebank was developed following the Universal Dependencies for a tiny corpus of 230 sentences from a Pañcatantra story (Dwivedi and Guha, 2017). The third treebank is a treebank of Vedic Sanskrit with 4,004 sentences, consisting of both prose and verse, developed by Hellwig et al. (2020). This treebank also follows the Universal Dependencies.
We decided to develop a treebank separate from those described above. Firstly, since the dependency relations used by our parser are different from the Universal Dependency relations, the second and the third treebanks were not useful for evaluating our parser. Secondly, we wanted to make the treebank thus developed open. The Saṁsādhanī platform contains three manually annotated texts. The first one is the Saṅkṣepa-Rāmāyaṇam, which has 100 verses. All these verses were tagged manually following the guidelines developed for the SHMT consortium project. Shukla et al. (2013) reported GOLD data for the Śrīmad-Bhagavad-Gītā (BhG), a philosophical text in verse form consisting of 700 verses. This text was tagged at various levels: metrical, segmentation, morphological and dependency (Patel, 2018). For the dependency-level tagging, the guidelines of the SHMT project were followed. The third manually annotated text consists of the first 10 cantos of the poem Śiśupālavadhaṁ, which were tagged following the same guidelines.
While these three tagged texts were available under the Saṁsādhanī platform, we noticed that, since these treebanks were created by individuals and were not cross-checked, there are a few inconsistencies. Meanwhile, the development of the parser also prompted us to improve upon the dependency relations. So these treebanks need to be modified as per the new guidelines and need to be cross-checked for consistency in tagging as well. During the development of the parser, a need was also felt for controlled texts for testing. This led us to develop a new treebank. The sentences for this new treebank are chosen from four different sources. One set is from grammar books, to ensure that the treebank covers the various types of constructions and special cases discussed in the grammar books, covering various cases of subcategorization etc. The second set contains 284 sentences from a Sanskrit textbook for the 9th grade by NCERT (National Council of Educational Research and Training). These sentences are not isolated ones; they constitute complete meaningful paragraphs or stories. The third set of sentences is from various books on Sanskrit learning. These are independent sentences covering a wide vocabulary and syntactic constructions for beginners. The fourth set of sentences is from the modern stories of a story book, which is being cross-checked by the annotators. The annotation for the Śrīmad-Bhagavad-Gītā is also being checked and corrected following the new guidelines. The treebank also contains a few verses from the first chapter of this poem. This treebank is available at http://sanskrit.uohyd.ac.in/scl/GOLD_DATA under a Creative Commons license.

Ambiguities during annotation
The annotation of all these four sets was checked by two or more of the authors independently. There were a few cases where there was a difference of opinion among the annotators. We discuss here an example of each type of difference.
There were certain constructions involving non-finite verbs where two different annotations were possible. Here is an example. Here the word ṛṣīṇām is in the genitive and hence can be related to the following word vacanam by a genitive relation. However, the word vacanam itself is a gerund of the verb vac (to speak). Hence the relation of ṛṣīṇām with vacanam may be considered to be that of a kartā (agent), according to Pāṇini's grammar. In such cases we noticed that the annotators were not consistent in tagging. This difference in tagging is probably not so important from the translation point of view, but it is important for tasks such as information extraction, question answering etc. As far as the parser is concerned, it marks the relation as genitive if the gerund analysis is not available. If the gerund analysis is available, then it produces both the genitive and the agent relation, giving priority to the agent relation. So the performance of the parser depends on the performance of the morphological analyser. Marking the relation as a genitive leads to loss of information. On the other hand, if the relation is marked as kartā, then one can always downgrade it to a genitive for translation purposes. A conscious effort on the part of the annotator is needed to mark such relations, and a good-coverage morphological analyser producing analyses of derived stems is needed to get a correct parse.

Here the word avaruddhāḥ is a past participle of the verb rudh with the prefix ava. Now this sentence can be analysed in two different ways, as follows. The verb bhū may mean either 'to happen' or 'to become' and also 'to be'. Accordingly, we have two different interpretations. Both these analyses are correct. In the first one, the verb acts as a copula. The second one shows the analysis with the verbal meaning 'to happen', and 'being blocked' as its kartā. The mārgāḥ (roads) is then the object of blocking.
As in the previous case, the first one is good enough for translation, while the second one is better for deeper semantic analysis. In both the above cases, we propose that the manually tagged corpus should carry the analysis that uses the derivational information.
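The parser's genitive-versus-agent decision in the first case above can be sketched as a simple rule. The function and label names here are illustrative stand-ins, not the actual Saṁsādhanī tag names.

```python
def genitive_or_karta(analyses):
    """Relations proposed between a genitive noun and the following
    derived word, given the set of morphological analyses available
    for that word.  With a gerund (derivational) analysis, both
    readings are produced, agent first; otherwise only the genitive."""
    if "gerund" in analyses:
        return ["karta", "genitive"]   # agent relation gets priority
    return ["genitive"]

print(genitive_or_karta({"noun"}))            # ['genitive']
print(genitive_or_karta({"noun", "gerund"}))  # ['karta', 'genitive']
```

This makes explicit why the parser's output depends on the coverage of the morphological analyser: without the gerund analysis, the deeper kartā reading is simply never proposed.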
Another observation regarding tagging concerns elliptical sentences. Since Sanskrit is a highly inflectional language, there is no specific position (such as the subject position in positional languages) that is sacrosanct. This allows Sanskrit to be a pro-drop language as well. Further, even mandatory arguments such as the kartā and karma may be dropped. For example, in answer to the question 'rāmaḥ kutra agacchat' (Where did Rama go?), a simple answer such as 'vanam agacchat' (went to a forest) is possible, where the subject is elided. Here the word vanam is ambiguous between a nominative and an accusative analysis of the same stem vana. This leads to two parses, one with vana as an agent and another with vana as a goal. In the absence of any module to deal with meaning congruity between the verb and a noun, the parser fails to select one parse out of the two. The human annotator, however, marks the correct parse, since he knows the meanings of the words. However, there are cases where the sentence is ambiguous even for a human being, due to multiple morphological analyses. For example, the causative form of the verb katha (to tell) is the same as its non-causative form. Thus the word kathayanti may mean either tell or make somebody tell. So a simple sentence such as (7) Skt: Gloss: tell{pres,pl,3p,[causative]} Eng: Friends tell / (They) tell friends / Friends make (somebody) tell / (They) make (somebody) tell friends.
is ambiguous between four readings: friends is the agent, friends is the karma, friends is the causative agent, and finally friends is the karma (object) of the causative verb. This ambiguity exists for a human reader as well, since all four interpretations are meaning-wise compatible. In such cases the annotators are advised to mark all possible readings.

We present a last example, where the arguments are shared. Consider an example with one verb in the absolutive and the other in a finite form, as follows. Here both the kartā and the karma, viz. Rāma and book, are shared between the two verbs purchase and read. Pāṇini has provided a rule for the sharing of the kartā, and accordingly we relate Rāma by the relation of kartā with the finite verb read. But for the sharing of the karma there is no rule in the grammar. Here we fall back on the default word order in prose for deciding which role to mark. If the verb in the absolutive were intransitive, then the karma would always be after this absolutive verb and before the final verb, in the default prose word order. Similarly, if the karmas of the two verbs are different, then the karma of the finite verb would be just before it, and that of the one in the absolutive would be before it. Taking clues from this, we mark the shared argument as an argument of the verb in the absolutive, and then, using the rule for the sharing of arguments, we share it with the final verb. But if an annotator marks the relation the other way, we do not want to penalise them. In other words, we provide both possible answers in such cases.
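One way to make the sharing convention above explicit is to propagate the shared relations across the two verbs. The edge representation, the English stand-ins for the purchase-and-read example, and the label for the non-finite precedence relation are all illustrative assumptions, not the Saṁsādhanī internals.

```python
# Edges are (head, dependent, relation) triples.

def share_arguments(edges, absolutive, finite):
    """Sketch of the sharing convention: the karta, attached to the
    finite verb by Panini's rule, is propagated to the absolutive;
    the karma, marked on the absolutive per the word-order convention,
    is propagated to the finite verb."""
    shared = list(edges)
    for (h, d, rel) in edges:
        if rel == "karta" and h == finite:
            shared.append((absolutive, d, "karta"))
        elif rel == "karma" and h == absolutive:
            shared.append((finite, d, "karma"))
    return shared

# 'Rama, having bought a book, reads (it).'
edges = [("reads", "Rama", "karta"),
         ("buying", "book", "karma"),
         ("reads", "buying", "purvakala")]
print(share_arguments(edges, absolutive="buying", finite="reads"))
# adds ('buying', 'Rama', 'karta') and ('reads', 'book', 'karma')
```

Emitting both the marked and the propagated edges is what lets the evaluation accept either annotation choice without penalising the annotator.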

Evaluation
The sentences in the Saṁsādhanī treebank were run through the Saṁsādhanī parser. Table 1 shows the statistics of the treebank and the performance of the parser in terms of the following parameters: a) exact match, b) totally failed sentences, c) partially correct output, d) Labelled Attachment Score (LAS), and e) Unlabelled Attachment Score (UAS). Totally failed sentences are the ones which the parser fails to parse, either due to out-of-vocabulary words or because some word fails to get connected to any other word in the sentence. Partially correct outputs are the parses where at least one relation, but not all, is wrong.
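For reference, the two attachment scores can be computed from parallel gold and predicted (head, label) pairs as follows. This is a generic sketch of the standard metrics, not the evaluation script used for Table 1.

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) over parallel lists of (head, label) pairs,
    one pair per word.  UAS counts words whose predicted head is
    correct; LAS additionally requires the correct relation label."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return uas / n, las / n

# Toy example: 4 words; one wrong label, one wrong head.
gold = [(2, "karta"), (2, "karma"), (0, "root"), (2, "adhikaranam")]
pred = [(2, "karta"), (2, "karta"), (0, "root"), (3, "adhikaranam")]
print(attachment_scores(gold, pred))  # (0.75, 0.5)
```

By construction LAS can never exceed UAS, which is consistent with the figures reported below.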

Thus we see that the performance of this parser is reasonably good. The percentage of failure is very small. The average LAS is 85.5% and the UAS is 91.5%. We notice that the performance on verse is not good. This is mainly due to relations such as the genitive and the adjectival, whose relata can move around freely.
The confusion matrix for some of the frequently occurring relations is shown in Table 2. The maximum confusion is with respect to the relation kartā (roughly agent). There are two major reasons for the confusion of one relation with another. The first is that the relations share the same case marker. For example, both the cause and the instrument always take the instrumental case marker, and in the passive voice the kartā also takes the instrumental case marker. Therefore we see confusion between a cause, an instrument and the kartā. Similarly, the adjective of any of the predicate-argument relations always takes the case of its head noun. Since the relative word order of the adjective and the head noun is not fixed, in the absence of any semantic information about the adjective there is confusion about which of the two substantives is the head and which one is the adjective. The confusion between a kartā and a predicative adjective is essentially for the same reason. The second reason for confusion is multiple morphological analyses of a word. For example, in the neuter gender, the accusative and nominative word forms are the same. This results in confusion between a kartā and a karma (roughly goal).
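A confusion matrix like Table 2 can be tabulated directly from paired gold and predicted labels. The label values below are only illustrative of the two confusion sources just discussed.

```python
from collections import Counter

def confusion_matrix(gold_labels, pred_labels):
    """Count (gold, predicted) label pairs, i.e. how often each gold
    relation is predicted as each label."""
    return Counter(zip(gold_labels, pred_labels))

# Illustrative labels reflecting the two sources of confusion:
# nominative/accusative syncretism, and shared instrumental marking.
gold = ["karta", "karta", "karma", "hetu", "karana"]
pred = ["karta", "karma", "karma", "karana", "karana"]
cm = confusion_matrix(gold, pred)
print(cm[("karta", "karma")])   # 1  (neuter nominative = accusative)
print(cm[("hetu", "karana")])   # 1  (both take the instrumental case)
```

The diagonal entries of such a matrix give the per-relation accuracy, and the off-diagonal mass localises exactly the case-marker and morphology ambiguities described above.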

Conclusion
In this paper we have discussed the first publicly available Sanskrit treebank developed following dependency relations based on the Indian grammatical tradition. The presence of derivational analyses allows deeper semantic analysis. At the same time, it also introduces inconsistency in tagging, since, for frequently used derived words such as vacanam (speech), the annotator may often treat them as underived and provide dependency relations which do not reflect the deeper analysis. Such deeper analysis is useful for tasks such as question answering and information retrieval, though it might be irrelevant for machine translation purposes.
We have also discussed the improved version of the dependency relations based on the Indian grammatical tradition. Three major improvements, related to the treatment of the complementiser, conjunct and co-relative constructions, were discussed. The modified version reflects the associated semantics.
Finally, we have tested the dependency parser for Sanskrit on the treebank and noted that its performance is reasonably good. The confusion matrix conforms with the grammatical sources of ambiguity. Proper modelling of mutual congruency would help improve the performance of the parser further.