Pabitra Mitra


2018

An LSTM-CRF Based Approach to Token-Level Metaphor Detection
Malay Pramanick | Ashim Gupta | Pabitra Mitra
Proceedings of the Workshop on Figurative Language Processing

Automatic processing of figurative language is gaining popularity in the NLP community because of its ubiquity and increasing volume. In this era of Web 2.0, automatic analysis of sarcasm and metaphors is important given their extensive usage. Metaphors are a part of figurative language that compares different concepts, often on a cognitive level. Many approaches have been proposed for automatic detection of metaphors, including sequential models and neural networks. In this paper, we propose a method for detecting metaphors at the token level using a hybrid model of a Bidirectional LSTM and a CRF. We use fewer features than the previous state-of-the-art sequential model. In experiments on VUAMC, our method obtained an F-score of 0.674.

Unsupervised Detection of Metaphorical Adjective-Noun Pairs
Malay Pramanick | Pabitra Mitra
Proceedings of the Workshop on Figurative Language Processing

Metaphor is a popular figure of speech. The popularity of metaphors calls for their automatic identification and interpretation. Most unsupervised methods for metaphor detection use some hand-coded knowledge. We propose an unsupervised framework for metaphor detection that does not require any hand-coded knowledge. We applied clustering to features derived from adjective-noun pairs to classify them into two disjoint classes. We experimented with adjective-noun pairs from a popular dataset annotated for metaphors and obtained an accuracy of 72.87% with the k-means clustering algorithm.
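
The clustering step described in this abstract can be illustrated with a minimal two-cluster k-means sketch in plain Python. This is not the authors' implementation: the abstract does not say which features are derived from the pairs, so the two-dimensional vectors below are invented toy values, and the function name `kmeans`, the first-k-points initialization, and the example pairs in the comments are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=50):
    """Plain Lloyd's k-means; returns (labels, centroids).

    Assumption: centroids start at the first k points, which keeps the
    sketch deterministic. Real systems usually use random restarts.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(dim) / len(members) for dim in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical feature vectors for six adjective-noun pairs (toy values).
features = [
    [0.1, 0.2], [0.0, 0.1], [0.2, 0.0],  # e.g. literal pairs such as "sweet candy"
    [0.9, 1.0], [1.0, 0.8], [0.8, 0.9],  # e.g. metaphorical pairs such as "sweet dream"
]
labels, centroids = kmeans(features, k=2)
```

Cluster ids produced this way are arbitrary, so an evaluation against annotated data still needs a step that maps each cluster to the literal or metaphorical class, for example by majority label.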

2017

Sentence Alignment using Unfolding Recursive Autoencoders
Jeenu Grover | Pabitra Mitra
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

In this paper, we propose a novel two-step algorithm for sentence alignment in monolingual corpora using Unfolding Recursive Autoencoders. First, we use unfolding recursive autoencoders (RAE) to learn feature vectors for phrases in the syntactic tree of each sentence. To compare two sentences, we use a similarity matrix whose dimensions are proportional to the sizes of the two sentences. Since this matrix varies in dimension with sentence length, a dynamic pooling layer is used to map it to a matrix of fixed dimension. The resulting matrix is used to calculate the similarity score between the two sentences. The second step of the algorithm captures the contexts in which the sentences occur in the document by using a dynamic programming algorithm for global alignment.
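
The dynamic pooling step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `dynamic_pool`, the use of max as the pooling function, and the rule of repeating indices when a sentence is shorter than the output grid are assumptions, and the similarity values are invented.

```python
def dynamic_pool(sim, out_size, pool_fn=max):
    """Map a variable-size matrix onto a fixed out_size x out_size grid.

    Rows and columns are split into out_size contiguous, near-equal
    chunks, and pool_fn is applied over each resulting rectangle.
    """
    n_rows, n_cols = len(sim), len(sim[0])

    def spans(n):
        # Split range(n) into out_size chunks. When n < out_size, some
        # indices are reused so that no chunk is ever empty.
        result = []
        for k in range(out_size):
            start = k * n // out_size
            stop = max((k + 1) * n // out_size, start + 1)
            result.append(range(start, stop))
        return result

    row_spans, col_spans = spans(n_rows), spans(n_cols)
    return [
        [pool_fn(sim[i][j] for i in rs for j in cs) for cs in col_spans]
        for rs in row_spans
    ]


# Toy similarity matrix for a 3-node sentence against a 4-node sentence.
sim = [
    [0.9, 0.1, 0.2, 0.0],
    [0.3, 0.8, 0.1, 0.2],
    [0.0, 0.2, 0.7, 0.6],
]
pooled = dynamic_pool(sim, 2)
# pooled == [[0.9, 0.2], [0.8, 0.7]]
```

Whatever the lengths of the two sentences, the pooled matrix has the same shape, so it can be passed to a fixed-size scoring function. For a distance matrix rather than a similarity matrix, `pool_fn=min` would be the natural choice.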

Bilingual Word Embeddings with Bucketed CNN for Parallel Sentence Extraction
Jeenu Grover | Pabitra Mitra
Proceedings of ACL 2017, Student Research Workshop

2015

Mining HEXACO personality traits from Enterprise Social Media
Priyanka Sinha | Lipika Dey | Pabitra Mitra | Anupam Basu
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2010

Determining Reliability of Subjective and Multi-label Emotion Annotation through Novel Fuzzy Agreement Measure
Plaban Kr. Bhowmick | Anupam Basu | Pabitra Mitra
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The paper presents a new fuzzy agreement measure $\gamma_f$ for determining agreement in multi-label and subjective annotation tasks. In this annotation framework, a data item may belong to a category or class with a belief value denoting the annotator's degree of confidence in assigning the item to that category. We provide a notion of disagreement based on the belief values that the annotators assign with respect to a category. The fuzzy agreement measure $\gamma_f$ is defined through different fuzzy agreement sets, based on the distribution of differences between the belief values provided by the annotators. The fuzzy agreement is computed as the average agreement over all data items and annotators. Finally, we elaborate on the computation of the $\gamma_f$ measure with a case study on emotion text data, where a data item (sentence) may belong to more than one emotion category with varying belief values.

2008

A Hybrid Feature Set based Maximum Entropy Hindi Named Entity Recognition
Sujan Kumar Saha | Sudeshna Sarkar | Pabitra Mitra
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

A Hybrid Named Entity Recognition System for South and South East Asian Languages
Sujan Kumar Saha | Sanjay Chatterji | Sandipan Dandapat | Sudeshna Sarkar | Pabitra Mitra
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

Gazetteer Preparation for Named Entity Recognition in Indian Languages
Sujan Kumar Saha | Sudeshna Sarkar | Pabitra Mitra
Proceedings of the 6th Workshop on Asian Language Resources

Word Clustering and Word Selection Based Feature Reduction for MaxEnt Based Hindi NER
Sujan Kumar Saha | Pabitra Mitra | Sudeshna Sarkar
Proceedings of ACL-08: HLT

An Agreement Measure for Determining Inter-Annotator Reliability of Human Judgements on Affective Text
Plaban Kumar Bhowmick | Anupam Basu | Pabitra Mitra
Coling 2008: Proceedings of the workshop on Human Judgements in Computational Linguistics