Andrew Chisholm


2018

Extracting structured data from invoices
Xavier Holt | Andrew Chisholm
Proceedings of the Australasian Language Technology Association Workshop 2018

Business documents encode a wealth of information in a format tailored to human consumption, i.e. aesthetically dispersed natural language text, graphics and tables. We address the task of extracting key fields (e.g. the amount due on an invoice) from a wide variety of potentially unseen document formats. In contrast to traditional template-driven extraction systems, we introduce a content-driven machine-learning approach which is both robust to noise and generalises to unseen document formats. In a comparison of our approach with alternative invoice extraction systems, we observe an absolute accuracy gain of 20% across compared fields, and a 25%–94% reduction in extraction latency.

2017

Learning to generate one-sentence biographies from Wikidata
Andrew Chisholm | Will Radford | Ben Hachey
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We investigate the generation of one-sentence Wikipedia biographies from facts derived from Wikidata slot-value pairs. We train a recurrent neural network sequence-to-sequence model with attention to select facts and generate textual summaries. Our model incorporates a novel secondary objective that helps ensure it generates sentences that contain the input facts. The model achieves a BLEU score of 41, improving significantly upon the vanilla sequence-to-sequence model and scoring roughly twice that of a simple template baseline. Human preference evaluation suggests the model is nearly as good as the Wikipedia reference. Manual analysis explores content selection, suggesting the model can trade the ability to infer knowledge against the risk of hallucinating incorrect information.

2016

Overview of the 2016 ALTA Shared Task: Cross-KB Coreference
Andrew Chisholm | Ben Hachey | Diego Mollá
Proceedings of the Australasian Language Technology Association Workshop 2016

Discovering Entity Knowledge Bases on the Web
Andrew Chisholm | Will Radford | Ben Hachey
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

2015

Entity Disambiguation with Web Links
Andrew Chisholm | Ben Hachey
Transactions of the Association for Computational Linguistics, Volume 3

Entity disambiguation with Wikipedia relies on structured information from redirect pages, article text, inter-article links, and categories. We explore whether web links can replace a curated encyclopaedia, obtaining entity prior, name, context, and coherence models from a corpus of web pages with links to Wikipedia. Experiments compare web link models to Wikipedia models on the well-known CoNLL and TAC data sets. Results show that using 34 million web links approaches Wikipedia performance. Combining web link and Wikipedia models produces the best-known disambiguation accuracy of 88.7 on standard newswire test data.