DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching

Kasra Hosseini, Federico Nanni, Mariona Coll Ardanuy


Abstract
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach is especially useful where only limited training examples are available. The learned DeezyMatch models can be used to generate rich vector representations from string inputs. The candidate ranker component in DeezyMatch uses these vector representations to find, for a given query, the best matching candidates in a knowledge base. It uses an adaptive searching algorithm applicable to large knowledge bases and query sets. We describe DeezyMatch’s functionality, design and implementation, accompanied by a use case in toponym matching and candidate ranking in realistic noisy datasets.
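The candidate-ranking idea in the abstract can be illustrated with a small, self-contained sketch. It is not DeezyMatch's implementation or API. The function names (`adaptive_rank`, `l2`, `cosine_sim`), the choice of cosine similarity as the stricter score, and the doubling schedule for the search window are all assumptions made for illustration. The sketch shows the general pattern: a coarse nearest-neighbour retrieval over precomputed string vectors, then a stricter filter, with the search window widened only when too few candidates pass.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(u: Vector, v: Vector) -> float:
    """Euclidean distance, used here as the cheap coarse-retrieval metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_sim(u: Vector, v: Vector) -> float:
    """Cosine similarity, used here as the (assumed) stricter score."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def adaptive_rank(
    query: Vector,
    kb: Dict[str, Vector],
    num_candidates: int,
    threshold: float,
    search_size: int = 1,
) -> List[Tuple[str, float]]:
    """Hypothetical adaptive candidate ranker over precomputed vectors.

    Retrieve the `search_size` nearest KB entries by L2 distance, keep those
    whose cosine similarity to the query is at least `threshold`, and double
    `search_size` until enough candidates pass or the whole KB was searched.
    """
    search_size = max(1, search_size)  # guard against a non-growing window
    while True:
        # Coarse step: nearest entries by L2 distance to the query.
        nearest = heapq.nsmallest(
            search_size, kb.items(), key=lambda item: l2(query, item[1])
        )
        # Fine step: score each retrieved entry and apply the threshold.
        scored = [(name, cosine_sim(query, vec)) for name, vec in nearest]
        passed = [(name, s) for name, s in scored if s >= threshold]
        if len(passed) >= num_candidates or search_size >= len(kb):
            break
        search_size *= 2  # widen the search window and try again
    # Best-scoring candidates first, truncated to the requested number.
    passed.sort(key=lambda pair: pair[1], reverse=True)
    return passed[:num_candidates]


if __name__ == "__main__":
    # Toy 2-d vectors invented for illustration, not real DeezyMatch embeddings.
    kb = {
        "London": [1.0, 0.0],
        "Londres": [0.9, 0.1],
        "Lundy": [0.7, 0.3],
        "Paris": [0.0, 1.0],
    }
    hits = adaptive_rank([1.0, 0.05], kb, num_candidates=3, threshold=0.9)
    print([name for name, _ in hits])  # ['London', 'Londres', 'Lundy']
```

Recomputing the nearest neighbours from scratch on each round keeps the sketch short; a version meant for large knowledge bases would more likely query an approximate nearest-neighbour index and reuse the results of earlier rounds.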
Anthology ID: 2020.emnlp-demos.9
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: October
Year: 2020
Address: Online
Editors: Qun Liu, David Schlangen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 62–69
URL: https://aclanthology.org/2020.emnlp-demos.9
DOI: 10.18653/v1/2020.emnlp-demos.9
Cite (ACL): Kasra Hosseini, Federico Nanni, and Mariona Coll Ardanuy. 2020. DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 62–69, Online. Association for Computational Linguistics.
Cite (Informal): DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching (Hosseini et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-demos.9.pdf
Code: Living-with-machines/DeezyMatch