Jinho D. Choi

Also published as: Jinho Choi


2019

Meta-Semantic Representation for Early Detection of Alzheimer’s Disease
Jinho D. Choi | Mengmei Li | Felicia Goldstein | Ihab Hajjar
Proceedings of the First International Workshop on Designing Meaning Representations

This paper presents a new task-oriented meaning representation called meta-semantics, that is designed to detect patients with early symptoms of Alzheimer’s disease by analyzing their language beyond a syntactic or semantic level. Meta-semantic representation consists of three parts, entities, predicate argument structures, and discourse attributes, that derive rich knowledge graphs. For this study, 50 controls and 50 patients with mild cognitive impairment (MCI) are selected, and meta-semantic representation is annotated on their speeches transcribed in text. Inter-annotator agreement scores of 88%, 82%, and 89% are achieved for the three types of annotation, respectively. Five analyses are made using this annotation, depicting clear distinctions between the control and MCI groups. Finally, a neural model is trained on features extracted from those analyses to classify MCI patients from normal controls, showing a high accuracy of 82% that is very promising.

FriendsQA: Open-Domain Question Answering on TV Show Transcripts
Zhengzhe Yang | Jinho D. Choi
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

This paper presents FriendsQA, a challenging question answering dataset that contains 1,222 dialogues and 10,610 open-domain questions, to tackle machine comprehension on everyday conversations. Each dialogue, involving multiple speakers, is annotated with several types of questions regarding the dialogue contexts, and the answers are annotated with certain spans in the dialogue. A series of crowdsourcing tasks are conducted to ensure good annotation quality, resulting in a high inter-annotator agreement of 81.82%. Comprehensive annotation analytics are provided for a deeper understanding of this dataset. Three state-of-the-art QA systems, R-Net, QANet, and BERT, are evaluated on this dataset. BERT in particular shows promising results, an accuracy of 74.2% for answer utterance selection and an F1-score of 64.2% for answer span selection, suggesting that the FriendsQA task is hard yet has great potential for elevating QA research on multiparty dialogue to another level.

2018

SemEval 2018 Task 4: Character Identification on Multiparty Dialogues
Jinho D. Choi | Henry Y. Chen
Proceedings of The 12th International Workshop on Semantic Evaluation

Character identification is a task of entity linking that finds the global entity of each personal mention in multiparty dialogue. For this task, the first two seasons of the popular TV show Friends are annotated, comprising a total of 448 dialogues, 15,709 mentions, and 401 entities. The personal mentions are detected from nominals referring to certain characters in the show, and the entities are collected from the list of all characters in those two seasons of the show. This task is challenging because it requires the identification of characters that are mentioned but may not be active during the conversation. Of the 90+ participants, four submitted their system outputs and showed strengths in different aspects of the task. Thorough analyses of the distributed datasets, system outputs, and comparative studies are also provided. To facilitate the momentum, we create an open-source project for this task and publicly release a larger and cleaner dataset, hoping to support researchers for more enhanced modeling.

They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking
Ethan Zhou | Jinho D. Choi
Proceedings of the 27th International Conference on Computational Linguistics

This paper analyzes arguably the most challenging yet under-explored aspect of resolution tasks such as coreference resolution and entity linking, that is the resolution of plural mentions. Unlike singular mentions each of which represents one entity, plural mentions stand for multiple entities. To tackle this aspect, we take the character identification corpus from the SemEval 2018 shared task that consists of entity annotation for singular mentions, and expand it by adding annotation for plural mentions. We then introduce a novel coreference resolution algorithm that selectively creates clusters to handle both singular and plural mentions, and also a deep learning-based entity linking model that jointly handles both types of mentions through multi-task learning. Adjusted evaluation metrics are proposed for these tasks as well to handle the uniqueness of plural mentions. Our experiments show that the new coreference resolution and entity linking models significantly outperform traditional models designed only for singular mentions. To the best of our knowledge, this is the first time that plural mentions are thoroughly analyzed for these two resolution tasks.

Coordinate Structures in Universal Dependencies for Head-final Languages
Hiroshi Kanayama | Na-Rae Han | Masayuki Asahara | Jena D. Hwang | Yusuke Miyao | Jinho D. Choi | Yuji Matsumoto
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

This paper discusses the representation of coordinate structures in the Universal Dependencies framework for two head-final languages, Japanese and Korean. UD applies a strict principle that makes the head of coordination the left-most conjunct. However, the guideline may produce syntactic trees which are difficult to accept in head-final languages. This paper describes the status in the current Japanese and Korean corpora and proposes alternative designs suitable for these languages.

Building Universal Dependency Treebanks in Korean
Jayeol Chun | Na-Rae Han | Jena D. Hwang | Jinho D. Choi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Challenging Reading Comprehension on Daily Conversation: Passage Completion on Multiparty Dialog
Kaixin Ma | Tomasz Jurczyk | Jinho D. Choi
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

This paper presents a new corpus and a robust deep learning architecture for a task in reading comprehension, passage completion, on multiparty dialog. Given a dialog in text and a passage containing factual descriptions about the dialog where mentions of the characters are replaced by blanks, the task is to fill the blanks with the most appropriate character names that reflect the contexts in the dialog. Since there is no dataset that challenges the task of passage completion in this genre, we create a corpus by selecting transcripts from a TV show that comprise 1,681 dialogs, generating passages for each dialog through crowdsourcing, and annotating mentions of characters in both the dialog and the passages. Given this dataset, we build a deep neural model that integrates rich feature extraction from convolutional neural networks into sequence modeling in recurrent neural networks, optimized by utterance and dialog level attentions. Our model outperforms the previous state-of-the-art model on this task in a different genre using bidirectional LSTM, showing a 13.0+% improvement for longer dialogs. Our analysis shows the effectiveness of the attention mechanisms and suggests a direction to machine comprehension on multiparty dialog.

2017

Improving Document Clustering by Removing Unnatural Language
Myungha Jang | Jinho D. Choi | James Allan
Proceedings of the 3rd Workshop on Noisy User-generated Text

Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can be an important factor of confusing existing NLP tools. This paper presents an effective method of distinguishing unnatural language from natural language, and evaluates the impact of unnatural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model identifying unnatural language components into four categories. First, we create a new annotated corpus by collecting slides and papers in various formats, PPT, PDF, and HTML, where unnatural language components are annotated into four categories. We then explore features available from plain text to build a statistical model that can handle any format as long as it is converted into plain text. Our experiments show that removing unnatural language components gives an absolute improvement in document clustering by up to 15%. Our corpus and tool are publicly available.

Lexicon Integrated CNN Models with Attention for Sentiment Analysis
Bonggun Shin | Timothy Lee | Jinho D. Choi
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

With the advent of word embeddings, lexicons are no longer fully utilized for sentiment analysis although they still provide important features in the traditional setting. This paper introduces a novel approach to sentiment analysis that integrates lexicon embeddings and an attention mechanism into Convolutional Neural Networks. Our approach performs separate convolutions for word and lexicon embeddings and provides a global view of the document using attention. Our models are experimented on both the SemEval’16 Task 4 dataset and the Stanford Sentiment Treebank and show comparative or better results against the existing state-of-the-art systems. Our analysis shows that lexicon embeddings allow building high-performing models with much smaller word embeddings, and the attention mechanism effectively dims out noisy words for sentiment analysis.

Cross-genre Document Retrieval: Matching between Conversational and Formal Writings
Tomasz Jurczyk | Jinho D. Choi
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper tackles a cross-genre document retrieval task, where the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either a summary or a plot of an episode in a TV show, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ the current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach to improve the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when the structure reranking is applied, which is very promising.

Robust Coreference Resolution and Entity Linking on Dialogues: Character Identification on TV Show Transcripts
Henry Y. Chen | Ethan Zhou | Jinho D. Choi
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper presents a novel approach to character identification, that is an entity linking task that maps mentions to characters in dialogues from TV show transcripts. We first augment and correct several cases of annotation errors in an existing corpus so the corpus is clearer and cleaner for statistical learning. We also introduce the agglomerative convolutional neural network that takes groups of features and learns mention and mention-pair embeddings for coreference resolution. We then propose another neural model that employs the embeddings learned and creates cluster embeddings for entity linking. Our coreference resolution model shows comparable results to other state-of-the-art systems. Our entity linking model significantly outperforms the previous work, showing the F1 score of 86.76% and the accuracy of 95.30% for character identification.

Text-based Speaker Identification on Multiparty Dialogues Using Multi-document Convolutional Neural Networks
Kaixin Ma | Catherine Xiao | Jinho D. Choi
Proceedings of ACL 2017, Student Research Workshop

2016

QA-It: Classifying Non-Referential It for Question Answer Pairs
Timothy Lee | Alex Lutz | Jinho D. Choi
Proceedings of the ACL 2016 Student Research Workshop

Dynamic Feature Induction: The Last Gist to the State-of-the-Art
Jinho D. Choi
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Character Identification on Multiparty Conversation: Identifying Mentions of Characters in TV Shows
Yu-Hsin Chen | Jinho D. Choi
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2015

Semantics-based Graph Approach to Complex Question-Answering
Tomasz Jurczyk | Jinho D. Choi
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Computational Exploration to Linguistic Structures of Future: Classification and Categorization
Aiming Ni | Jinho D. Choi | Jason Shepard | Phillip Wolff
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool
Jinho D. Choi | Joel Tetreault | Amanda Stent
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2013

Transition-based Dependency Parsing with Selectional Branching
Jinho D. Choi | Andrew McCallum
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dynamic Knowledge-Base Alignment for Coreference Resolution
Jiaping Zheng | Luke Vilnis | Sameer Singh | Jinho D. Choi | Andrew McCallum
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Djamé Seddah | Reut Tsarfaty | Sandra Kübler | Marie Candito | Jinho D. Choi | Richárd Farkas | Jennifer Foster | Iakes Goenaga | Koldo Gojenola Galletebeitia | Yoav Goldberg | Spence Green | Nizar Habash | Marco Kuhlmann | Wolfgang Maier | Joakim Nivre | Adam Przepiórkowski | Ryan Roth | Wolfgang Seeker | Yannick Versley | Veronika Vincze | Marcin Woliński | Alina Wróblewska | Eric Villemonte de la Clergerie
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
Jinho D. Choi | Martha Palmer
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Empty Argument Insertion in the Hindi PropBank
Ashwini Vaidya | Jinho D. Choi | Martha Palmer | Bhuvana Narasimhan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

2011

Getting the Most out of Transition-based Dependency Parsing
Jinho D. Choi | Martha Palmer
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Analysis of the Hindi Proposition Bank using Dependency Structure
Ashwini Vaidya | Jinho Choi | Martha Palmer | Bhuvana Narasimhan
Proceedings of the 5th Linguistic Annotation Workshop

Transition-based Semantic Role Labeling Using Predicate Argument Clustering
Jinho D. Choi | Martha Palmer
Proceedings of the ACL 2011 Workshop on Relational Models of Semantics

Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing
Jinho D. Choi | Martha Palmer
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages

2010

Retrieving Correct Semantic Boundaries in Dependency Structure
Jinho Choi | Martha Palmer
Proceedings of the Fourth Linguistic Annotation Workshop

Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone
Jinho D. Choi | Claire Bonial | Martha Palmer
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Propbank Instance Annotation Guidelines Using a Dedicated Editor, Jubilee
Jinho D. Choi | Claire Bonial | Martha Palmer
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Multilingual Propbank Annotation Tools: Cornerstone and Jubilee
Jinho Choi | Claire Bonial | Martha Palmer
Proceedings of the NAACL HLT 2010 Demonstration Session

2009

Using Parallel Propbanks to enhance Word-alignments
Jinho Choi | Martha Palmer | Nianwen Xue
Proceedings of the Third Linguistic Annotation Workshop (LAW III)