Detection of Peculiar Examples using LOF and One Class SVM

Hiroyuki Shinnou, Minoru Sasaki


Abstract
This paper proposes a method to detect peculiar examples of a target word in a corpus. We regard the following examples as peculiar: (1) examples in which the target word has a new meaning, and (2) examples in which a compound word containing the target word is new or highly technical. A peculiar example can be regarded as an outlier in the given example set, so many outlier detection methods proposed in the data mining domain can be applied to our task. We propose a method that combines two representative outlier detection methods from the data mining domain: the density-based Local Outlier Factor (LOF) and One Class SVM. In the experiment, we use the Whitepaper texts in BCCWJ as the corpus and 10 nouns as target words. Our method improved on the precision and recall of LOF and One Class SVM used individually. We also show, using the noun `midori (green)', that our method can detect new meanings. The main cause of both missed detections and false detections is that the similarity measure between two examples is inadequate; improving it is left for future work.
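To make the density-based half of the approach concrete, here is a minimal pure-Python sketch of the standard LOF score (Breunig et al., 2000). It assumes each example of the target word has already been turned into a numeric feature vector; the vectorization, the Euclidean distance and the names `lof_scores` and `euclidean` are illustrative assumptions, not the authors' code. The `dist` parameter is where a better similarity measure between examples would plug in. One Class SVM and the paper's rule for combining the two detectors are not specified in the abstract and are not shown here.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; an illustrative stand-in for a task-specific measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lof_scores(points: List[Vector], k: int,
               dist: Callable[[Vector, Vector], float] = euclidean) -> List[float]:
    """Local Outlier Factor of every point (hypothetical helper, not the paper's code).

    Assumption: examples are already numeric feature vectors.
    Scores near 1 mean a point is about as dense as its neighbours;
    scores clearly above 1 mark candidate outliers (peculiar examples).
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    # Full pairwise distance matrix: O(n^2), acceptable for a sketch.
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    k_dist: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((d[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        # N_k(p): all points within the k-distance (ties can give more than k).
        neighbours.append([j for dj, j in others if dj <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[j], d[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points

    # LOF: average ratio of each neighbour's density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


# Usage: four clustered points and one isolated point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(data, k=2)
# first four scores are 1.0; the isolated point (5, 5) scores about 6.15
```

Computing all pairwise distances keeps the sketch short and transparent; for a realistic number of corpus examples one would use a nearest-neighbour index or an existing library implementation instead.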
Anthology ID: L10-1108
Volume: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month: May
Year: 2010
Address: Valletta, Malta
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/167_Paper.pdf
Cite (ACL): Hiroyuki Shinnou and Minoru Sasaki. 2010. Detection of Peculiar Examples using LOF and One Class SVM. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal): Detection of Peculiar Examples using LOF and One Class SVM (Shinnou & Sasaki, LREC 2010)