Dwijen Rudrapal


2023

Cross-Lingual Speaker Identification for Indian Languages
Amaan Rizvi | Anupam Jamatia | Dwijen Rudrapal | Kunal Chakma | Björn Gambäck
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

The paper introduces a cross-lingual speaker identification system for Indian languages, utilising a Long Short-Term Memory dense neural network (LSTM-DNN). The system was trained on audio recordings in English and evaluated on data from Hindi, Kannada, Malayalam, Tamil, and Telugu, with a view to how factors such as phonetic similarity and native accent affect performance. The model was fed MFCC (mel-frequency cepstral coefficient) features extracted from the audio files. For comparison, the corresponding mel-spectrogram images were also used as input to a ResNet-50 model, while the raw audio was used to train a Siamese network. The LSTM-DNN model outperformed the other two models as well as two more traditional baseline speaker identification models, showing that deep learning models are superior to probabilistic models for capturing low-level speech features and learning speaker characteristics.

2017

Measuring the Limit of Semantic Divergence for English Tweets
Dwijen Rudrapal | Amitava Das
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In human language, the same expression can be conveyed in many ways by different people. Even the same person may express the same sentence quite differently when addressing different audiences, using different modalities, different syntactic variations, or a different vocabulary. The possibility of such endless surface forms of a text, while its meaning remains almost the same, poses many challenges for Natural Language Processing (NLP) systems such as question answering, machine translation, and text summarization. This paper is an endeavor to understand the characteristics of such endless semantic divergence. In this work, we develop a corpus of 1525 semantically divergent sentences for 200 English tweets.

2015

Measuring Semantic Similarity for Bengali Tweets Using WordNet
Dwijen Rudrapal | Amitava Das | Baby Bhattacharya
Proceedings of the International Conference Recent Advances in Natural Language Processing

Sentence Boundary Detection for Social Media Text
Dwijen Rudrapal | Anupam Jamatia | Kunal Chakma | Amitava Das | Björn Gambäck
Proceedings of the 12th International Conference on Natural Language Processing