Minoru Sasaki


2022

Text Classification Using a Graph Based on Relationships Between Documents
Hiromu Nakajima | Minoru Sasaki
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Reputation Analysis Using Key Phrases and Sentiment Scores Extracted from Reviews
Yipu Huang | Minoru Sasaki | Kanako Komiya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Effectiveness Analysis of Word Sense Disambiguation Using Example of Word Senses from WordNet
Hiroshi Sekiya | Minoru Sasaki
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Effective Use of Japanese Dictionary Definition Sentences in Learning Hierarchical Embedding of Dictionaries
Yuki Ishii | Minoru Sasaki
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Budget Argument Mining Dataset Using Japanese Minutes from the National Diet and Local Assemblies
Yasutomo Kimura | Hokuto Ototake | Minoru Sasaki
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Budget argument mining attempts to identify argumentative components related to a budget item and then classify them, given budget information and minutes. We describe the construction of the dataset for budget argument mining, a subtask of QA Lab-PoliInfo-3 in NTCIR-16. Budget argument mining analyses the argument structure of the minutes, focusing on monetary expressions (amounts of money). In this task, given sufficient budget information (budget item, budget amount, etc.), relevant argumentative components in the minutes are identified and argument labels (claim, premise, and other) are assigned to these components. In this paper, we describe the design of the data format, the annotation procedure, and release information for the budget argument mining dataset, which links budget information to minutes.

2020

Semi-supervised Word Sense Disambiguation Using Example Similarity Graph
Rie Yatabe | Minoru Sasaki
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

Word Sense Disambiguation (WSD) is a well-known problem in natural language processing. In recent years, there has been increasing interest in applying neural networks and machine learning techniques to solve WSD problems. However, these previous supervised approaches often suffer from the lack of manually sense-tagged examples. In this paper, to solve this problem, we propose a semi-supervised WSD method that uses a learning method based on graph embeddings in order to make effective use of labeled and unlabeled examples. The results of the experiments show that the proposed method performs better than the previous semi-supervised WSD method. Moreover, the graph structure between examples is effective for WSD, and it is effective to utilize a graph structure obtained by fine-tuning BERT in the proposed method.

2018

Fine-tuning for Named Entity Recognition Using Part-of-Speech Tagging
Masaya Suzuki | Kanako Komiya | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

All-words Word Sense Disambiguation Using Concept Embeddings
Rui Suzuki | Kanako Komiya | Masayuki Asahara | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Japanese all-words WSD system using the Kyoto Text Analysis ToolKit
Hiroyuki Shinnou | Kanako Komiya | Minoru Sasaki | Shinsuke Mori
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

Comparison of Annotating Methods for Named Entity Corpora
Kanako Komiya | Masaya Suzuki | Tomoya Iwakura | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

Supervised Word Sense Disambiguation with Sentences Similarities from Context Word Embeddings
Shoma Yamaki | Hiroyuki Shinnou | Kanako Komiya | Minoru Sasaki
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

2015

Domain Adaptation with Filtering for Named Entity Extraction of Japanese Anime-Related Words
Kanako Komiya | Daichi Edamura | Ryuta Tamura | Minoru Sasaki | Hiroyuki Shinnou | Yoshiyuki Kotani
Proceedings of the International Conference Recent Advances in Natural Language Processing

Surrounding Word Sense Model for Japanese All-words Word Sense Disambiguation
Kanako Komiya | Yuto Sasaki | Hajime Morita | Minoru Sasaki | Hiroyuki Shinnou | Yoshiyuki Kotani
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

Hybrid Method of Semi-supervised Learning and Feature Weighted Learning for Domain Adaptation of Document Classification
Hiroyuki Shinnou | Liying Xiao | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

Learning under Covariate Shift for Domain Adaptation for Word Sense Disambiguation
Hiroyuki Shinnou | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

Unsupervised Domain Adaptation for Word Sense Disambiguation using Stacked Denoising Autoencoder
Kazuhei Kouno | Hiroyuki Shinnou | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

2012

Detection of Peculiar Word Sense by Distance Metric Learning with Labeled Examples
Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

For natural language processing on machines, resolving such peculiar usages would be particularly useful in constructing a dictionary and dataset for word sense disambiguation. Hence, it is necessary to develop a method to detect such peculiar examples of a target word from a corpus. Note that, hereinafter, we define a peculiar example as an instance in which the target word or phrase has a new meaning. In this paper, we propose a new peculiar example detection method using distance metric learning from labeled example pairs. In this method, first, distance metric learning is performed by large margin nearest neighbor classification on the training data, and new training data points are generated using the distance metric in the original space. Then, peculiar examples are extracted from the updated training and test data using the local outlier factor, which is a density-based outlier detection method. The efficiency of the proposed method was evaluated on an artificial dataset and the SemEval-2010 Japanese WSD task dataset. The results showed that the proposed method has the highest number of properly detected instances and the highest F-measure value. This shows that the label information of training data is effective for density-based peculiar example detection. Moreover, an experiment on outlier detection using a classification method such as SVM showed that it is difficult to apply the classification method to outlier detection.

2010

Detection of Peculiar Examples using LOF and One Class SVM
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a method to detect peculiar examples of a target word in a corpus. We regard the following as peculiar examples: (1) examples in which the meaning of the target word is new, and (2) examples in which a compound word containing the target word is new or highly technical. A peculiar example can be regarded as an outlier in the given example set, so many methods proposed in the data mining domain can be applied to our task. We propose a method that combines the density-based Local Outlier Factor (LOF) with One Class SVM, both of which are representative outlier detection methods in the data mining domain. In the experiment, we use the Whitepaper text in BCCWJ as the corpus and 10 nouns as target words. Our method improved on the precision and recall of both LOF and One Class SVM. We also show, using the noun `midori (green)', that our method can detect new meanings. The main cause of missed and wrong detections is that the similarity measure between two examples is inadequate; improving it is future work.
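
As a rough illustration of the density-based component named in this abstract, the sketch below computes Local Outlier Factor scores in plain Python. It is not the authors' implementation: the function name `lof_scores`, the Euclidean distance, the toy points, and the choice `k=2` are all assumptions, ties at the k-th neighbor are ignored for brevity, and the One Class SVM step and the combination of the two detectors are not shown.

```python
import math

def lof_scores(points, k):
    """Return a Local Outlier Factor score per point (simplified: no tie handling)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []  # indices of the k nearest neighbors of each point
    k_dist = []     # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    def local_reachability_density(i):
        # Reachability distance of i from neighbor j is max(k_dist[j], dist[i][j]).
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    density = [local_reachability_density(i) for i in range(n)]
    # LOF is the mean neighbor density divided by the point's own density.
    return [
        sum(density[j] for j in neighbors[i]) / (len(neighbors[i]) * density[i])
        for i in range(n)
    ]

# Toy data (hypothetical): four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = lof_scores(points, k=2)
outlier_index = scores.index(max(scores))
```

Scores close to 1 indicate a point whose local density matches that of its neighbors; a score well above 1 marks a candidate outlier, which in the setting of the abstract would be a candidate peculiar example.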

2008

Ping-pong Document Clustering using NMF and Linkage-Based Refinement
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper proposes a ping-pong document clustering method that alternates between NMF and linkage-based refinement in order to improve the clustering result of NMF. Using NMF in the ping-pong strategy can be expected to be effective for document clustering. However, NMF in the ping-pong strategy often worsens performance, because NMF often fails to improve the clustering result given as its initial values. Our method handles this problem through the stop condition of the ping-pong process. In the experiment, we compared our method with k-means and NMF on 16 document data sets. Our method improved the clustering result of NMF significantly.

Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Spectral clustering is a powerful clustering method for document data sets. However, it requires solving an eigenvalue problem for a matrix derived from the similarity matrix of the data set, so it is not practical for a large data set. To overcome this problem, we propose a method to reduce the size of the similarity matrix. First, using k-means, we obtain a clustering result for the given data set. From each cluster, we pick some data points that are near the center of the cluster and treat them as a single data point. We call this set of points a “committee”. Data points outside the committees each remain a single data point. We then construct the similarity matrix for these data. The resulting similarity matrix is small enough that we can perform spectral clustering on it.
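
The reduction step described in this abstract can be sketched as follows, under several assumptions that the abstract does not confirm: cluster labels are taken as given (standing in for a k-means result), each committee is represented by the mean of its members, and similarity is a Gaussian kernel on Euclidean distance. The names `reduce_with_committees`, `similarity_matrix`, `committee_size`, and `sigma` are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def reduce_with_committees(points, labels, committee_size):
    """Merge the points nearest each cluster center into one committee point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    reduced = []
    for label in sorted(clusters):
        members = clusters[label]
        center = centroid([points[i] for i in members])
        ranked = sorted(members, key=lambda i: math.dist(points[i], center))
        committee, rest = ranked[:committee_size], ranked[committee_size:]
        # Assumption: the committee is represented by the mean of its members.
        reduced.append(centroid([points[i] for i in committee]))
        # Points outside the committee are kept as individual data points.
        reduced.extend(list(points[i]) for i in rest)
    return reduced

def similarity_matrix(items, sigma=1.0):
    """Gaussian similarity between every pair of items (an assumed kernel)."""
    return [
        [math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2)) for b in items]
        for a in items
    ]

# Toy data (hypothetical): two clusters of four points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (6.0, 6.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
reduced = reduce_with_committees(points, labels, committee_size=3)
matrix = similarity_matrix(reduced)
```

Here eight points shrink to four rows (one committee plus one remaining point per cluster), so the eigenvalue problem is solved on a 4 x 4 matrix rather than an 8 x 8 one.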

Division of Example Sentences Based on the Meaning of a Target Word Using Semi-Supervised Clustering
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe a system that divides example sentences (data set) into clusters, based on the meaning of the target word, using a semi-supervised clustering technique. In this task, the estimation of the cluster number (the number of meanings) is critical. Our system primarily concentrates on this aspect. First, a user assigns the system an initial cluster number for the target word. The system then performs general clustering on the data set to obtain small clusters. Next, using constraints given by the user, the system integrates these clusters to obtain the final clustering result. Our system performs this entire procedure with high precision while requiring only a few constraints. In the experiment, we tested the system on 12 Japanese nouns used in the SENSEVAL2 Japanese dictionary task. The experiment demonstrated the effectiveness of our system. In the future, we will improve sentence similarity measurements.

2007

Ensemble document clustering using weighted hypergraph generated by NMF
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Refinement of Document Clustering by Using NMF
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

2004

Semi-supervised Learning by Fuzzy Clustering and Ensemble Learning
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Information Retrieval System Using Latent Contextual Relevance
Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

When relevance feedback, one of the most popular information retrieval models, is used in an information retrieval system, related words are extracted based on the first retrieval result. These words are then added to the original query, and retrieval is performed again using the updated query. In general, retrieval performance with such query expansion falls below the performance with the original query. One cause is that the thesaurus contains few synonyms, and even when some synonyms are added to the query, the same documents are retrieved as a result. In this paper, to solve this problem with related words, we propose latent context relevance, which considers the relevance between the query and each index word in the document set.
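
The baseline relevance feedback loop that this abstract starts from (retrieve, extract related words from the first result, expand the query, retrieve again) can be sketched as below. This illustrates only the baseline, not the proposed latent context relevance. The term-count scoring, the function names, the toy documents, and the parameters `top_k` and `n_new_terms` are all assumptions made for illustration.

```python
from collections import Counter

def score(query_terms, doc_terms):
    """Score a document by how often the query terms occur in it."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)

def retrieve(query_terms, docs, top_k):
    """Return indices of the top_k documents, best first (ties keep input order)."""
    ranked = sorted(range(len(docs)),
                    key=lambda i: score(query_terms, docs[i]),
                    reverse=True)
    return ranked[:top_k]

def expand_query(query_terms, docs, top_k=2, n_new_terms=1):
    """Add the most frequent non-query terms from the first retrieval result."""
    first_pass = retrieve(query_terms, docs, top_k)
    counts = Counter(t for i in first_pass for t in docs[i]
                     if t not in query_terms)
    new_terms = [t for t, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms

# Toy collection (hypothetical), already tokenized.
docs = [
    ["word", "sense", "disambiguation", "sense"],
    ["document", "clustering", "nmf"],
    ["word", "sense", "corpus", "sense"],
]
expanded = expand_query(["word"], docs)
second_pass = retrieve(expanded, docs, top_k=2)
```

In this toy run the expanded query retrieves the same two documents as the original query, which is the kind of outcome the abstract identifies as a weakness of plain query expansion.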

2003

Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the EM algorithm
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003