Reducing semantic drift in biomedical lexicon bootstrapping

Tuesday, 2 June 2009
North Ryde
Submission Deadline: 
Wednesday, 20 May 2009

H.A.I.L. Seminar series

Title: Reducing semantic drift in biomedical lexicon bootstrapping

Speaker: Tara McIntosh
School of Information Technologies
University of Sydney

Date: Tuesday 2nd June 2009 at 11am

Location: CSIRO ICT Centre,
Building E6B, Macquarie University.

See for details.

Video: We usually stream live video of seminars.

At the seminar time (see above), point your browser at:



Extracting biomedical semantic lexicons from raw text is critical for overcoming the knowledge bottleneck in many bio-NLP tasks. In this talk, I will present the Weighted Mutual Exclusion Bootstrapping (WMEB) algorithm for simultaneously extracting precise biomedical semantic lexicons and patterns for multiple categories. WMEB is capable of extracting larger lexicons with higher precision than previous techniques, successfully reducing semantic drift by incorporating new weighting functions and a cumulative pattern pool.
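As a rough illustration of the mutual exclusion idea described above, the following Python sketch bootstraps lexicons for several categories at once, discarding any pattern or candidate term that more than one category claims. It is a toy under stated assumptions, not the WMEB algorithm itself: the input format, the function name mutual_exclusion_bootstrap, the example data, and the simple count-based scoring are all hypothetical, and the weighting functions mentioned in the abstract are not specified in this announcement.

```python
from collections import defaultdict


def mutual_exclusion_bootstrap(pairs, seeds, iterations=2, top_k=2):
    """Toy multi-category bootstrapping with mutual exclusion.

    pairs: iterable of (term, pattern) co-occurrences (hypothetical format).
    seeds: dict mapping category name -> set of seed terms.
    """
    terms_of = defaultdict(set)     # pattern -> terms it extracts
    patterns_of = defaultdict(set)  # term -> patterns it occurs with
    for term, pattern in pairs:
        terms_of[pattern].add(term)
        patterns_of[term].add(pattern)

    lexicons = {cat: set(s) for cat, s in seeds.items()}
    pools = {cat: set() for cat in seeds}  # cumulative pattern pool per category

    for _ in range(iterations):
        # 1. Each category proposes the patterns matching its current terms.
        proposed = {cat: {p for t in lex for p in patterns_of[t]}
                    for cat, lex in lexicons.items()}

        # 2. Mutual exclusion on patterns: drop any proposed by another category.
        for cat, pats in proposed.items():
            others = set().union(*(q for c, q in proposed.items() if c != cat))
            pools[cat] |= pats - others

        # 3. Score unseen terms by how many pooled patterns extract them
        #    (an assumed stand-in for a real weighting function).
        known = set().union(*lexicons.values())
        candidates = {}
        for cat, pool in pools.items():
            scores = defaultdict(int)
            for p in pool:
                for t in terms_of[p] - known:
                    scores[t] += 1
            candidates[cat] = scores

        # 4. Mutual exclusion on terms, then keep the top_k per category.
        for cat, scores in candidates.items():
            contested = set().union(
                *(s.keys() for c, s in candidates.items() if c != cat))
            ranked = sorted((t for t in scores if t not in contested),
                            key=lambda t: (-scores[t], t))
            lexicons[cat].update(ranked[:top_k])
    return lexicons


# Hypothetical toy data: "X was detected" matches both categories' seeds,
# so it is excluded and the unrelated term "water" is never extracted.
pairs = [
    ("p53", "expression of X"), ("BRCA1", "expression of X"),
    ("MYC", "expression of X"),
    ("aspirin", "treated with X"), ("ibuprofen", "treated with X"),
    ("tamoxifen", "treated with X"),
    ("p53", "X was detected"), ("aspirin", "X was detected"),
    ("water", "X was detected"),
]
seeds = {"gene": {"p53"}, "drug": {"aspirin"}}
result = mutual_exclusion_bootstrap(pairs, seeds)
```

In this toy run the ambiguous pattern is shared by both seed terms, so neither category may use it; this is the sense in which competing categories constrain one another.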

Unfortunately, semantic drift still dominates in later iterations, as erroneous terms eventually shift a category's direction. We present two novel approaches for reducing semantic drift further in WMEB: unsupervised bagging, and using distributional similarity to detect and censor potential semantic drift.
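One way a distributional-similarity drift check could look is sketched below in Python. The announcement does not give the details of the talk's method, so everything here is an assumption made for illustration: the cosine measure over sparse context counts, the comparison of a candidate against the earliest versus the most recently added lexicon terms, the threshold value, and the toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_similarity(candidate, terms, vectors):
    """Average cosine similarity between the candidate and a list of terms."""
    return sum(cosine(vectors[candidate], vectors[t]) for t in terms) / len(terms)


def is_drifting(candidate, lexicon, vectors, n_first=2, m_last=2, threshold=0.5):
    """Flag a candidate that resembles recent additions far more than early ones.

    lexicon: list of terms in the order they were added (seeds first).
    threshold: assumed cut-off on the early/late similarity ratio.
    """
    early = mean_similarity(candidate, lexicon[:n_first], vectors)
    late = mean_similarity(candidate, lexicon[-m_last:], vectors)
    # With no similarity to recent terms there is no evidence of drift here.
    return late > 0 and early / late < threshold


# Hypothetical context counts; the last two lexicon entries have drifted.
vectors = {
    "p53": {"expression": 3, "mutation": 2},
    "BRCA1": {"expression": 2, "mutation": 3},
    "kinase": {"expression": 1, "activity": 3},
    "enzyme": {"activity": 3, "inhibitor": 1},
    "MYC": {"expression": 2, "mutation": 1},
    "protease": {"activity": 2, "inhibitor": 2},
}
lexicon = ["p53", "BRCA1", "kinase", "enzyme"]

myc_drifts = is_drifting("MYC", lexicon, vectors)
protease_drifts = is_drifting("protease", lexicon, vectors)
```

A bootstrapper using a check of this kind would censor flagged candidates before adding them to the lexicon, so that late errors are less able to pull the category away from its seeds.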

Short resume

Tara McIntosh is a PhD student at the School of IT, University of Sydney. Her research focusses on the development of unsupervised techniques for extracting biomedical knowledge and linguistic resources from raw text. Her earlier research included the development of data mining algorithms for detecting outliers in insurance records, and for extracting gene networks from microarray data.