Nobal Bikram Niraula

Also published as: Nobal Niraula

2018

A Novel Approach to Part Name Discovery in Noisy Text
Nobal Bikram Niraula | Daniel Whyatt | Anne Kao
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

As a specialized example of information extraction, part name extraction is an area that presents unique challenges. Part names are typically multi-word terms longer than two words. There is little consistency in how terms are described in noisy free text, with variations spawned by typos, ad hoc abbreviations, acronyms, and incomplete names. This makes searching for and analyzing parts in these data extremely challenging. In this paper, we present our algorithm, PANDA (Part Name Discovery Analytics), based on a unique method that exploits statistical, linguistic, and machine learning techniques to discover part names in noisy text such as that in manufacturing quality documentation, supply chain management records, service communication logs, and maintenance reports. Experiments show that PANDA is scalable and significantly outperforms existing techniques.

2016

SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
Nabin Maharjan | Rajendra Banjade | Nobal Bikram Niraula | Vasile Rus
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces a rule-based method and software tool, called SemAligner, for aligning chunks across a given pair of short English texts. The tool, based on the top-performing method at the Interpretable Short Text Similarity shared task at SemEval 2015, where it was used with human-annotated (gold) chunks, can now additionally process plain text pairs using two powerful chunkers we developed, e.g., using Conditional Random Fields. Besides aligning chunks, the tool automatically assigns semantic relations to the aligned chunks (such as EQUI for equivalent and OPPO for opposite) and semantic similarity scores that measure the strength of the semantic relation between the aligned chunks. Experiments show that SemAligner performs competitively on system-generated chunks and that these results are comparable to results obtained on gold chunks. SemAligner has other capabilities, such as handling various input formats and chunkers and extending lookup resources.

Evaluation Dataset (DT-Grade) and Word Weighting Approach towards Constructed Short Answers Assessment in Tutorial Dialogue Context
Rajendra Banjade | Nabin Maharjan | Nobal Bikram Niraula | Dipesh Gautam | Borhan Samei | Vasile Rus
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

DTSim at SemEval-2016 Task 2: Interpreting Similarity of Texts Based on Automated Chunking, Chunk Alignment and Semantic Relation Prediction
Rajendra Banjade | Nabin Maharjan | Nobal Bikram Niraula | Vasile Rus
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Judging the Quality of Automatically Generated Gap-fill Question using Active Learning
Nobal Bikram Niraula | Vasile Rus
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

2014

The DARE Corpus: A Resource for Anaphora Resolution in Dialogue Based Intelligent Tutoring Systems
Nobal Niraula | Vasile Rus | Rajendra Banjade | Dan Stefanescu | William Baggett | Brent Morgan
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We describe the DARE corpus, an annotated data set focusing on pronoun resolution in tutorial dialogue. Although data sets for general-purpose anaphora resolution exist, they are not suitable for dialogue-based intelligent tutoring systems. To the best of our knowledge, no data set is currently available for pronoun resolution in dialogue-based intelligent tutoring systems. The DARE corpus consists of 1,000 annotated pronoun instances collected from conversations between high-school students and the intelligent tutoring system DeepTutor. The data set is publicly available.