Julian Brooke


2018

Proceedings of the Second Workshop on Stylistic Variation
Julian Brooke | Lucie Flekova | Moshe Koppel | Thamar Solorio
Proceedings of the Second Workshop on Stylistic Variation

Cross-corpus Native Language Identification via Statistical Embedding
Francisco Rangel | Paolo Rosso | Julian Brooke | Alexandra Uitdenbogerd
Proceedings of the Second Workshop on Stylistic Variation

In this paper, we approach the task of native language identification in a realistic cross-corpus scenario, where a model is trained on available data and has to predict the native language of texts from a different corpus. The motivation behind this study is to investigate native language identification in the Australian academic context, where a majority of students come from China, Indonesia, and Arabic-speaking nations. We propose a statistical embedding representation that yields a significant improvement over common single-layer state-of-the-art approaches when identifying Chinese, Arabic, and Indonesian in a cross-corpus scenario. The proposed approach was shown to be competitive even when the data is scarce and imbalanced.

Deep-speare: A joint neural model of poetic language, meter and rhyme
Jey Han Lau | Trevor Cohn | Timothy Baldwin | Julian Brooke | Adam Hammond
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling. We assess the quality of generated poems using crowd and expert judgements. The stress and rhyme models perform very well, as generated poems are largely indistinguishable from human-written poems. Expert evaluation, however, reveals that a vanilla language model captures meter implicitly, and that machine-generated poems still underperform in terms of readability and emotion. Our research shows the importance of expert evaluation for poetry generation, and that future research should look beyond rhyme/meter and focus on poetic language.

2017

Semi-Automated Resolution of Inconsistency for a Harmonized Multiword Expression and Dependency Parse Annotation
King Chan | Julian Brooke | Timothy Baldwin
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations. Candidates for correction are identified using a variety of heuristics, including an entirely novel one which identifies violations of MWE constituency in the dependency tree, and resolved by arbitration with minimal human intervention. Using this technique, we identified and corrected several hundred errors across both parse and MWE annotations, representing changes to a significant percentage (well over 10%) of the MWE instances in the joint corpus.

Sub-character Neural Language Modelling in Japanese
Viet Nguyen | Julian Brooke | Timothy Baldwin
Proceedings of the First Workshop on Subword and Character Level Models in NLP

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using sub-characters for language modelling in Japanese. This is achieved by decomposing characters according to a range of character decomposition datasets, and training a neural language model over variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of sub-characters, though this result depends on a good choice of decomposition dataset and the appropriate granularity of decomposition.

Proceedings of the Workshop on Stylistic Variation
Julian Brooke | Thamar Solorio | Moshe Koppel
Proceedings of the Workshop on Stylistic Variation

Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice
Julian Brooke | Jan Šnajder | Timothy Baldwin
Transactions of the Association for Computational Linguistics, Volume 5

We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates. In contrast to the standard approach of simple ranking by association measure, in our model n-grams are arranged in a lattice structure based on subsumption and overlap relationships, with nodes inhibiting other nodes in their vicinity when they are selected as a lexical item. We show how the configuration of such a lattice can be optimized tractably, and demonstrate using annotations of sampled n-grams that our method consistently outperforms alternatives by at least 0.05 F-score across several corpora and languages.

Joint Sentence-Document Model for Manifesto Text Analysis
Shivashankar Subramanian | Trevor Cohn | Timothy Baldwin | Julian Brooke
Proceedings of the Australasian Language Technology Association Workshop 2017

2016

Melbourne at SemEval 2016 Task 11: Classifying Type-level Word Complexity using Random Forests with Corpus and Word List Features
Julian Brooke | Alexandra Uitdenbogerd | Timothy Baldwin
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Bootstrapped Text-level Named Entity Recognition for Literature
Julian Brooke | Adam Hammond | Timothy Baldwin
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Distinguishing Voices in The Waste Land using Computational Stylistics
Julian Brooke | Adam Hammond | Graeme Hirst
Linguistic Issues in Language Technology, Volume 12, 2015 - Literature Lifts up Computational Linguistics

T. S. Eliot’s poem The Waste Land is a notoriously challenging example of modernist poetry, mixing the independent viewpoints of over ten distinct characters without any clear demarcation of which voice is speaking when. In this work, we apply unsupervised techniques in computational stylistics to distinguish the particular styles of these voices, offering a computer’s perspective on longstanding debates in literary analysis. Our work includes a model for stylistic segmentation that looks for points of maximum stylistic variation, a k-means clustering model for detecting non-contiguous speech from the same voice, and a stylistic profiling approach which makes use of lexical resources built from a much larger collection of literary texts. Evaluating using an expert interpretation, we show clear progress in distinguishing the voices of The Waste Land as compared to appropriate baselines, and we also offer quantitative evidence both for and against that particular interpretation.

GutenTag: an NLP-driven Tool for Digital Humanities Research in the Project Gutenberg Corpus
Julian Brooke | Adam Hammond | Graeme Hirst
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

Building a Lexicon of Formulaic Language for Language Learners
Julian Brooke | Adam Hammond | David Jacob | Vivian Tsang | Graeme Hirst | Fraser Shein
Proceedings of the 11th Workshop on Multiword Expressions

2014

Unsupervised Multiword Segmentation of Large Corpora using Prediction-Driven Decomposition of n-grams
Julian Brooke | Vivian Tsang | Graeme Hirst | Fraser Shein
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Supervised Ranking of Co-occurrence Profiles for Acquisition of Continuous Lexical Attributes
Julian Brooke | Graeme Hirst
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

A Multi-Dimensional Bayesian Approach to Lexical Style
Julian Brooke | Graeme Hirst
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Hybrid Models for Lexical Acquisition of Correlated Styles
Julian Brooke | Graeme Hirst
Proceedings of the Sixth International Joint Conference on Natural Language Processing

A Tale of Two Cultures: Bringing Literary Analysis and Computational Linguistics Together
Adam Hammond | Julian Brooke | Graeme Hirst
Proceedings of the Workshop on Computational Linguistics for Literature

Clustering Voices in The Waste Land
Julian Brooke | Graeme Hirst | Adam Hammond
Proceedings of the Workshop on Computational Linguistics for Literature

Using Other Learner Corpora in the 2013 NLI Shared Task
Julian Brooke | Graeme Hirst
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

2012

Robust, Lexicalized Native Language Identification
Julian Brooke | Graeme Hirst
Proceedings of COLING 2012

Building Readability Lexicons with Unannotated Corpora
Julian Brooke | Vivian Tsang | David Jacob | Fraser Shein | Graeme Hirst
Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations

Unsupervised Stylistic Segmentation of Poetry with Change Curves and Extrinsic Features
Julian Brooke | Adam Hammond | Graeme Hirst
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature

Measuring Interlanguage: Native Language Identification with L1-influence Metrics
Julian Brooke | Graeme Hirst
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The task of native language (L1) identification suffers from a relative paucity of useful training corpora, and standard within-corpus evaluation is often problematic due to topic bias. In this paper, we introduce a method for L1 identification in second language (L2) texts that relies only on much more plentiful L1 data, rather than the L2 texts that are traditionally used for training. In particular, we do word-by-word translation of large L1 blog corpora to create a mapping to L2 forms that are a possible result of language transfer, and then use that information for unsupervised classification. We show this method is effective in several different learner corpora, with bigram features being particularly useful.

2011

Predicting Word Clipping with Latent Semantic Analysis
Julian Brooke | Tong Wang | Graeme Hirst
Proceedings of 5th International Joint Conference on Natural Language Processing

Lexicon-Based Methods for Sentiment Analysis
Maite Taboada | Julian Brooke | Milan Tofiloski | Kimberly Voll | Manfred Stede
Computational Linguistics, Volume 37, Issue 2 - June 2011

2010

Automatic Acquisition of Lexical Formality
Julian Brooke | Tong Wang | Graeme Hirst
Coling 2010: Posters

2009

Genre-Based Paragraph Classification for Sentiment Analysis
Maite Taboada | Julian Brooke | Manfred Stede
Proceedings of the SIGDIAL 2009 Conference

Cross-Linguistic Sentiment Analysis: From English to Spanish
Julian Brooke | Milan Tofiloski | Maite Taboada
Proceedings of the International Conference RANLP-2009

A Syntactic and Lexical-Based Discourse Segmenter
Milan Tofiloski | Julian Brooke | Maite Taboada
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers