Octavian Popescu


2022

Addressing Limitations of Encoder-Decoder Based Approach to Text-to-SQL
Octavian Popescu | Irene Manotas | Ngoc Phuoc An Vo | Hangu Yeo | Elahe Khorashani | Vadim Sheinin
Proceedings of the 29th International Conference on Computational Linguistics

Most encoder-decoder approaches to the Text-to-SQL task suffer a dramatic decline in performance on new databases. On the popular Spider dataset, models that achieve 70% accuracy on its development or test sets fall below 20% accuracy on unseen databases. The root causes of this problem are complex and cannot easily be fixed by adding more manually created training data. In this paper we address the problem and propose a hybrid system that uses an automated training-data augmentation technique. Our system consists of a rule-based component and a deep learning component that interact to understand the crucial information in a given query and produce the correct SQL. It achieves a double-digit percentage improvement on databases that are not part of the Spider corpus.

Tackling Temporal Questions in Natural Language Interface to Databases
Ngoc Phuoc An Vo | Octavian Popescu | Irene Manotas | Vadim Sheinin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

The temporal aspect is one of the most challenging areas in Natural Language Interface to Databases (NLIDB). This paper examines how temporal questions are studied and supported by the research community at two levels: popular annotated datasets (e.g. Spider) and recent advanced models. We present a new dataset, with accompanying databases, that supports temporal questions in NLIDB. We experiment with two SOTA models (Picard and ValueNet) to investigate how our new dataset helps these models learn and improve their performance on temporal questions.

2021

Recognizing and Splitting Conditional Sentences for Automation of Business Processes Management
Ngoc Phuoc An Vo | Irene Manotas | Octavian Popescu | Algimantas Černiauskas | Vadim Sheinin
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Business Process Management (BPM) is the discipline responsible for discovering, analyzing, redesigning, monitoring, and controlling business processes. One of the most crucial tasks of BPM is discovering and modelling business processes from text documents. In this paper, we present a system that solves an end-to-end problem consisting of 1) recognizing conditional sentences in technical documents, 2) finding boundaries to extract the conditional and resultant clauses from each conditional sentence, and 3) categorizing each resultant clause as an Action or a Consequence, which later helps to generate new steps in our business process model automatically. We created a new dataset and three models to solve this problem. Our best model achieved a Precision of 83.82, a Recall of 87.84, and an F1 of 85.75 for extracting Condition, Action, and Consequence clauses under the Exact Match metric.

2020

Identifying Motion Entities in Natural Language and A Case Study for Named Entity Recognition
Ngoc Phuoc An Vo | Irene Manotas | Vadim Sheinin | Octavian Popescu
Proceedings of the 28th International Conference on Computational Linguistics

Motion recognition is one of the basic cognitive capabilities of many life forms; however, detecting and understanding motion in text is not a trivial task. In addition, identifying motion entities in natural language is not only challenging but also beneficial for better natural language understanding. In this paper, we present a Motion Entity Tagging (MET) model that identifies entities in motion in text, using the Literal-Motion-in-Text (LiMiT) dataset for training and evaluating the model. We then propose a new method to split clauses and phrases from long, complex motion sentences to improve the performance of our MET model. We also present results showing that motion features, in particular entities in motion, benefit the Named-Entity Recognition (NER) task. Finally, we present an analysis of the special co-occurrence relation between the person category in NER and animate entities in motion, which significantly improves the classification performance for the person category in NER.

2018

A Large Resource of Patterns for Verbal Paraphrases
Octavian Popescu | Ngoc Phuoc An Vo | Vadim Sheinin
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

QUEST: A Natural Language Interface to Relational Databases
Vadim Sheinin | Elahe Khorashani | Hangu Yeo | Kun Xu | Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Octavian Popescu | Carlo Strapparava
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

2016

Corpora for Learning the Mutual Relationship between Semantic Relatedness and Textual Entailment
Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present the creation of a corpus annotated with both semantic relatedness (SR) scores and textual entailment (TE) judgments. In building this corpus we aimed to discover the relationship, if any, between these two tasks, so that either one could be resolved with the help of insights gained from the other. We took corpora already annotated with TE judgments and manually annotated them with SR scores; the RTE 1-4 corpora used in the PASCAL competition fit our needs. The annotators worked independently of each other and did not have access to the TE judgments during annotation. This experiment strongly supported the intuition that the two annotations are correlated, and this finding led to a system that uses this information to revise the initial estimates of SR scores. As semantic relatedness is one of the most general and difficult tasks in natural language processing, we expect that future systems will combine different sources of information in order to solve it. Our work suggests that textual entailment plays a quantifiable role in addressing it.

2015

Learning the Impact of Machine Translation Evaluation Metrics for Semantic Textual Similarity
Simone Magnolini | Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the International Conference Recent Advances in Natural Language Processing

Learning the Impact and Behavior of Syntactic Structure: A Case Study in Semantic Textual Similarity
Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the International Conference Recent Advances in Natural Language Processing

FBK-HLT: An Effective System for Paraphrase Identification and Semantic Similarity in Twitter
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

FBK-HLT: A New Framework for Semantic Textual Similarity
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

FBK-HLT: An Application of Semantic Textual Similarity for Answer Selection in Community Question Answering
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

SemEval-2015 Task 15: A CPA dictionary-entry-building task
Vít Baisa | Jane Bradbury | Silvie Cinková | Ismaïl El Maarouf | Adam Kilgarriff | Octavian Popescu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

SemEval 2015, Task 7: Diachronic Text Evaluation
Octavian Popescu | Carlo Strapparava
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Corpus Patterns for Semantic Processing
Octavian Popescu | Patrick Hanks | Elisabetta Jezek | Daisuke Kawahara
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing: Tutorial Abstracts

A Preliminary Evaluation of the Impact of Syntactic Structure in Semantic Textual Similarity and Semantic Relatedness Tasks
Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Paraphrase Identification and Semantic Similarity in Twitter with Simple Features
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the third International Workshop on Natural Language Processing for Social Media

2014

Inducing Example-based Semantic Frames from a Massive Amount of Verb Uses
Daisuke Kawahara | Daniel Peterson | Octavian Popescu | Martha Palmer
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

FBK-TR: Applying SVM with Multiple Linguistic Features for Cross-Level Semantic Similarity
Ngoc Phuoc An Vo | Tommaso Caselli | Octavian Popescu
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

FBK-TR: SVM for Semantic Relatedness and Corpus Patterns for RTE
Ngoc Phuoc An Vo | Octavian Popescu | Tommaso Caselli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

T-PAS: A resource of Typed Predicate Argument Structures for linguistic analysis and semantic processing
Elisabetta Jezek | Bernardo Magnini | Anna Feltracco | Alessia Bianchini | Octavian Popescu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The goal of this paper is to introduce T-PAS, a resource of typed predicate argument structures for Italian, acquired from corpora by manual clustering of distributional information about Italian verbs, to be used for linguistic analysis and semantic processing tasks. T-PAS is the first resource for Italian in which the semantic selection properties and sense-in-context distinctions of verbs are characterized fully on empirical grounds. In the paper, we first describe the process of pattern acquisition and corpus annotation (section 2) and its ongoing evaluation (section 3). We then demonstrate the benefits of pattern tagging for NLP purposes (section 4) and discuss current efforts to improve the annotation of the corpus (section 5). We conclude by reporting on ongoing experiments using semi-automatic techniques for extending coverage (section 6).

Mapping CPA Patterns onto OntoNotes Senses
Octavian Popescu | Martha Palmer | Patrick Hanks
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present an alignment experiment between patterns of verb use discovered by Corpus Pattern Analysis (CPA; Hanks 2004, 2008, 2012) and verb senses in OntoNotes (ON; Hovy et al. 2006, Weischedel et al. 2011). We present a probabilistic approach for mapping one resource onto the other. First, we introduce a basic model, based on conditional probabilities, that determines the best CPA pattern match for any given sentence. On the basis of this model, we propose a joint source channel model (JSCM) that computes the probability of compatibility of semantic types between a verb phrase and a pattern, irrespective of whether the verb phrase is a norm or an exploitation. We evaluate the accuracy of the proposed mapping using cluster similarity metrics based on entropy.

Fast and Accurate Misspelling Correction in Large Corpora
Octavian Popescu | Ngoc Phuoc An Vo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Behind the Times: Detecting Epoch Changes using Large Corpora
Octavian Popescu | Carlo Strapparava
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Learning Corpus Patterns Using Finite State Automata
Octavian Popescu
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora
Octavian Popescu | Alberto Lavelli
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

Regular Patterns - Probably Approximately Correct Language Model
Octavian Popescu
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

Determining is-a relationships for Textual Entailment
Vlad Niculae | Octavian Popescu
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

2012

Building a Resource of Patterns Using Semantic Types
Octavian Popescu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

While a word in isolation has a high potential to express various senses, in certain phrases this potential is restricted to the point where only one sense is possible. A phrase is called sense stable if none of the words composing it changes its sense, irrespective of the context that could be added to its left or to its right. By comparing sense stable phrases we can extract corpus patterns. These patterns have slots that are filled by semantic types capturing the information relevant for disambiguation. The relationship between slots is such that a chain-like disambiguation process is possible. Annotating a corpus with these kinds of patterns is beneficial for NLP, because problems such as data sparseness, noise, and learning complexity are alleviated. We evaluate inter-annotator agreement on examples from the BNC.

2010

Dynamic Parameters for Cross Document Coreference
Octavian Popescu
Coling 2010: Posters

2009

Person Cross Document Coreference with Name Perplexity Estimates
Octavian Popescu
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Name Perplexity
Octavian Popescu
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2007

IRST-BP: Preposition Disambiguation based on Chain Clarifying Relationships Contexts
Octavian Popescu | Sara Tonelli | Emanuele Pianta
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

IRST-BP: Web People Search Using Name Entities
Octavian Popescu | Bernardo Magnini
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Ontology Population from Textual Mentions: Task Definition and Benchmark
Bernardo Magnini | Emanuele Pianta | Octavian Popescu | Manuela Speranza
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge

2004

Cross-Language Acquisition of Semantic Models for Verbal Predicates
Jordi Atserias | Bernardo Magnini | Octavian Popescu | Eneko Agirre | Aitziber Atutxa | German Rigau | John Carroll | Rob Koeling
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)