Pre-release of Hindi Dependency Treebank

Event Notification Type: 
Other
Abbreviated Title: 
Hindi DTB
Location: 
http://ltrc.iiit.ac.in/treebank_H2014/
Sunday, 6 July 2014 to Wednesday, 6 July 2016
Contact: 
Vishnu S G
Martha Palmer

We are making a 425K-word Hindi Dependency Treebank available to researchers. This project was funded by NSF CISE-CRI CNS 0751202/0709167: Collaborative Research: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu. The grant investigators include Martha Palmer, Dipti Sharma, Rajesh Bhatt, Owen Rambow and Fei Xia. All annotation of the Hindi Treebank being released now was done at IIIT-Hyderabad under the leadership of Dipti Sharma. The goal has been to develop multi-representational and multi-layered treebanks for Hindi and Urdu that include both dependency and phrase structure as syntactic representations, and both Paninian and PropBank-style semantic role labels as semantic representations. The guidelines for the dependency structure annotation have been synchronized with the phrase structure guidelines to facilitate automatic conversion. The PropBank guidelines have been extended to include elements that help guide the conversion to phrase structure. The Urdu data with its annotations, and the additional layers and representations for Hindi, will also be released when they are completed.

The pre-release version of the Hindi Dependency Treebank is available for download at:

http://ltrc.iiit.ac.in/treebank_H2014/