H.A.I.L. Seminar series
CSIRO ICT Centre
http://www.ict.csiro.au/HAIL/
Title: Reducing semantic drift in biomedical lexicon bootstrapping
Speaker: Tara McIntosh
School of Information Technologies
University of Sydney
Date: Tuesday 2nd June 2009 at 11am
Location: CSIRO ICT Centre,
Building E6B, Macquarie University.
Video: We usually stream live video of seminars.
At the seminar time (see above), point your browser at:
URL: http://www.ict.csiro.au/HAIL/Abstracts/2009/TaraMcIntosh.htm
Abstract
Extracting biomedical semantic lexicons from raw text is critical for overcoming the knowledge bottleneck in many bio-NLP tasks. In this talk, I will present the Weighted Mutual Exclusion Bootstrapping (WMEB) algorithm for simultaneously extracting precise biomedical semantic lexicons and patterns for multiple categories. WMEB is capable of extracting larger lexicons with higher precision than previous techniques, successfully reducing semantic drift by incorporating new weighting functions and a cumulative pattern pool.
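For readers new to bootstrapping, the sketch below shows plain mutual exclusion bootstrapping over a toy corpus. It is a hypothetical illustration, not the WMEB algorithm from the talk. The function name `bootstrap`, its parameters, the pattern weighting (the share of a pattern's matched terms already in the lexicon) and the alphabetical tie-breaking are all assumptions. Each category keeps a cumulative pattern pool. A pattern or term claimed by one category is excluded from the others.

```python
from collections import defaultdict


def bootstrap(corpus, seeds, iterations=2, n_patterns=1, n_terms=1):
    """Toy mutual exclusion bootstrapping (hypothetical sketch, not WMEB itself).

    corpus: dict mapping each term to the set of patterns (contexts) it occurs in.
    seeds:  dict mapping each category name to its list of seed terms.
    Returns a dict mapping each category to its grown lexicon.
    """
    # Invert the corpus: which terms does each pattern match?
    pattern_terms = defaultdict(set)
    for term, patterns in corpus.items():
        for p in patterns:
            pattern_terms[p].add(term)

    lexicons = {cat: list(terms) for cat, terms in seeds.items()}
    pools = {cat: set() for cat in seeds}  # cumulative pattern pool per category

    for _ in range(iterations):
        owner = {t: cat for cat, terms in lexicons.items() for t in terms}

        # Step 1: add the best-scoring unclaimed pattern(s) to each category's pool.
        for cat, lex in lexicons.items():
            scores = {}
            for p, terms in pattern_terms.items():
                if any(p in pool for pool in pools.values()):
                    continue  # a pattern belongs to at most one category
                if any(owner.get(t, cat) != cat for t in terms):
                    continue  # mutual exclusion: pattern matches a rival's term
                hits = len(terms & set(lex))
                if hits:
                    # Assumed weighting: share of the pattern's terms already in the lexicon.
                    scores[p] = hits / len(terms)
            best = sorted(scores, key=lambda p: (-scores[p], p))[:n_patterns]
            pools[cat].update(best)

        # Step 2: add the unclaimed term(s) matched by the most pool patterns.
        for cat, lex in lexicons.items():
            support = defaultdict(int)
            for p in pools[cat]:
                for t in pattern_terms[p]:
                    if t not in owner:  # mutual exclusion: skip terms already claimed
                        support[t] += 1
            new = sorted(support, key=lambda t: (-support[t], t))[:n_terms]
            lex.extend(new)
            owner.update({t: cat for t in new})

    return lexicons


# Made-up toy corpus: term -> patterns it occurs in ("X" marks the term slot).
corpus = {
    "p53": {"X gene", "X protein"},
    "BRCA1": {"X gene", "X mutation"},
    "EGFR": {"X protein", "X receptor"},
    "asthma": {"patients with X", "X symptoms"},
    "diabetes": {"patients with X", "X onset"},
    "cancer": {"X mutation", "patients with X"},
}
seeds = {"GENE": ["p53"], "DISEASE": ["asthma"]}
print(bootstrap(corpus, seeds))
# {'GENE': ['p53', 'BRCA1', 'cancer'], 'DISEASE': ['asthma', 'diabetes']}
```

With these toy counts, the ambiguous pattern `X mutation` pulls `cancer` into the GENE lexicon in the second iteration. This is a small instance of the semantic drift the talk sets out to reduce.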
Unfortunately, semantic drift still dominates in later iterations, as erroneous terms eventually shift a category's direction. We present two novel approaches for further reducing semantic drift in WMEB: unsupervised bagging, and utilising distributional similarity to detect and censor potential semantic drift.
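The distributional similarity idea can be illustrated with a hypothetical sketch. Each term is represented by a vector of context-word counts. A candidate is flagged as likely drift when it resembles the most recently added lexicon terms more than the seeds and earliest additions. The following are assumptions for illustration, not the exact formulation presented in the talk: the ratio test, the cut-offs `n` and `m`, the `threshold`, the function names and the toy vectors.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def mean_sim(term, group, vectors):
    """Average cosine similarity between `term` and each term in `group`."""
    return sum(cosine(vectors[term], vectors[g]) for g in group) / len(group)


def drifts(candidate, lexicon, vectors, n=2, m=2, threshold=1.0):
    """Flag `candidate` if it is closer to the last m lexicon terms than to the first n.

    n, m and threshold are hypothetical settings chosen for illustration.
    """
    early = mean_sim(candidate, lexicon[:n], vectors)    # seeds / earliest additions
    recent = mean_sim(candidate, lexicon[-m:], vectors)  # most recent additions
    if recent == 0:
        return False  # no resemblance to the recent terms, so no sign of drift
    return early / recent < threshold


# Toy context vectors (context word -> count); entirely made up.
vectors = {
    "p53": Counter({"expression": 1, "binds": 1}),
    "BRCA1": Counter({"expression": 1, "mutant": 1}),
    "cancer": Counter({"patients": 1, "risk": 1}),
    "tumour": Counter({"patients": 1, "mutant": 1}),
    "EGFR": Counter({"expression": 1, "receptor": 1}),
    "KRAS": Counter({"expression": 1, "mutant": 1}),
    "melanoma": Counter({"patients": 1, "risk": 1}),
}
# A GENE lexicon that has already drifted towards diseases.
lexicon = ["p53", "BRCA1", "cancer", "tumour"]
for term in ["EGFR", "KRAS", "melanoma"]:
    print(term, drifts(term, lexicon, vectors))
# EGFR False
# KRAS False
# melanoma True
```

`EGFR` and `KRAS` resemble the early gene terms and pass. `melanoma` matches only the drifted disease terms, so a filter like this would censor it before it is added.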
Short resume
Tara McIntosh is a PhD student at the School of IT, University of Sydney. Her research focusses on the development of unsupervised techniques for extracting biomedical knowledge and linguistic resources from raw text. Her earlier research included the development of data mining algorithms for detecting outliers in insurance records, and for extracting gene networks from microarray data.