Hiroyuki Shinnou

Recently, domain shift, which degrades accuracy because of differences in data between the source and target domains, has become a serious issue when using machine learning methods to solve natural language processing tasks. With additional pretraining and fine-tuning using a target domain corpus, pretraining models such as BERT (Bidirectional Encoder Representations from Transformers) can address this issue. However, the additional pretraining of the BERT model is difficult because it requires significant computing resources. The ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) pretraining model replaces the BERT pretraining method’s masked language modeling with a method called replaced token detection, which improves the computational efficiency and allows the additional pretraining of the model to a practical extent. Herein, we propose a method for addressing the computational efficiency of pretraining models in domain shift by constructing an ELECTRA pretraining model on a Japanese dataset and additionally pretraining this model in a downstream task using a corpus from the target domain. We constructed a pretraining model for ELECTRA in Japanese and conducted experiments on a document classification task using data from Japanese news articles. Results show that even a model smaller than the pretrained model performs equally well.

Application of Mix-Up Method in Document Classification Task Using BERT
Naoki Kikuta | Hiroyuki Shinnou
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The mix-up method (Zhang et al., 2017), one of the methods for data augmentation, is known to be easy to implement and highly effective. Although the mix-up method is intended for image identification, it can also be applied to natural language processing. In this paper, we attempt to apply the mix-up method to a document classification task using bidirectional encoder representations from transformers (BERT) (Devlin et al., 2018). Since BERT allows for two-sentence input, we concatenated word sequences from two documents with different labels and used the multi-class output as the supervised data with a one-hot vector. In an experiment using the livedoor news corpus, which is in Japanese, we compared the accuracy of document classification using two methods for selecting documents to be concatenated with that of ordinary document classification. As a result, we found that the proposed method outperforms normal classification when documents whose labels are in short supply are mixed preferentially. This indicates that the choice of documents for mix-up has a significant impact on the results.
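The concatenation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mix_documents`, the `[SEP]` separator token, and the equal 0.5/0.5 target weights are all assumptions made for illustration, since the abstract does not give the exact form of the target vector.

```python
from typing import List, Tuple


def mix_documents(
    tokens_a: List[str],
    label_a: int,
    tokens_b: List[str],
    label_b: int,
    num_classes: int,
    sep: str = "[SEP]",  # assumed separator for a two-segment input
) -> Tuple[List[str], List[float]]:
    """Concatenate two tokenized documents and build a mixed target vector."""
    if label_a == label_b:
        # The abstract mixes documents with different labels only.
        raise ValueError("documents must have different labels")
    mixed_tokens = tokens_a + [sep] + tokens_b
    # Assumption: each source label receives half of the target mass.
    target = [0.0] * num_classes
    target[label_a] = 0.5
    target[label_b] = 0.5
    return mixed_tokens, target


tokens, target = mix_documents(["sports", "news"], 0, ["movie", "review"], 2, 3)
print(tokens)
print(target)
```

How the pairs are chosen is left outside this sketch; the abstract reports that this choice strongly affects the results.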

Construction and Evaluation of Japanese Sentence-BERT Models
Naoki Shibayama | Hiroyuki Shinnou
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

2020

Automatic Creation of Correspondence Table of Meaning Tags from Two Dictionaries in One Language Using Bilingual Word Embedding
Teruo Hirabayashi | Kanako Komiya | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

In this paper, we show how to use bilingual word embeddings (BWE) to automatically create a correspondence table of meaning tags from two dictionaries in one language and examine the effectiveness of the method. A problem here is that the meaning tags do not always correspond one-to-one because the granularities of the word senses and the concepts differ from each other. Therefore, we regarded the concept tag that corresponds most closely to a word sense as the correct concept tag corresponding to the word sense. We used two BWE methods, a linear transformation matrix and VecMap. We evaluated the most frequent sense (MFS) method and the corpus concatenation method for comparison. The accuracies of the proposed methods were higher than the accuracy of the random baseline but lower than those of the MFS and corpus concatenation methods. However, because our method utilized the embedding vectors of the word senses, the relations of the sense tags corresponding to concept tags could be examined by mapping the sense embeddings to the vector space of the concept tags. Also, our methods can be applied when only concept or word sense embeddings are available, whereas the MFS method requires a parallel corpus and the corpus concatenation method needs two tagged corpora.

Generation and Evaluation of Concept Embeddings Via Fine-Tuning Using Automatically Tagged Corpus
Kanako Komiya | Daiki Yaginuma | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Evaluation of Pretrained BERT Model by Using Sentence Clustering
Naoki Shibayama | Rui Cao | Jing Bai | Wen Ma | Hiroyuki Shinnou
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Composing Word Vectors for Japanese Compound Words Using Bilingual Word Embeddings
Teruo Hirabayashi | Kanako Komiya | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

2018

Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus
Kanako Komiya | Hiroyuki Shinnou
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

Fine-tuning is a popular method to achieve better performance when only a small target corpus is available. However, it requires tuning of a number of metaparameters and thus carries a risk of adverse effects when inappropriate metaparameters are used. Therefore, we investigate effective parameters for fine-tuning when only a small target corpus is available. In the current study, we aim to improve Japanese word embeddings created from a huge corpus. First, we demonstrate that even the word embeddings created from the huge corpus are affected by domain shift. After that, we investigate effective parameters for fine-tuning of the word embeddings using a small target corpus. We used the perplexity of a language model obtained from a Long Short-Term Memory network to assess the word embeddings input into the network. The experiments revealed that fine-tuning sometimes has an adverse effect when only a small target corpus is used and that batch size is the most important parameter for fine-tuning. In addition, we confirmed that the effect of fine-tuning is higher when the target corpus is larger.

Domain Adaptation for Sentiment Analysis using Keywords in the Target Domain as the Learning Weight
Jing Bai | Hiroyuki Shinnou | Kanako Komiya
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

Domain Adaptation Using a Combination of Multiple Embeddings for Sentiment Analysis
Hiroyuki Shinnou | Xinyu Zhao | Kanako Komiya
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

Fine-tuning for Named Entity Recognition Using Part-of-Speech Tagging
Masaya Suzuki | Kanako Komiya | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

All-words Word Sense Disambiguation Using Concept Embeddings
Rui Suzuki | Kanako Komiya | Masayuki Asahara | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Japanese all-words WSD system using the Kyoto Text Analysis ToolKit
Hiroyuki Shinnou | Kanako Komiya | Minoru Sasaki | Shinsuke Mori
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

Supervised Word Sense Disambiguation with Sentences Similarities from Context Word Embeddings
Shoma Yamaki | Hiroyuki Shinnou | Kanako Komiya | Minoru Sasaki
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

Comparison of Annotating Methods for Named Entity Corpora
Kanako Komiya | Masaya Suzuki | Tomoya Iwakura | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

2015

Surrounding Word Sense Model for Japanese All-words Word Sense Disambiguation
Kanako Komiya | Yuto Sasaki | Hajime Morita | Minoru Sasaki | Hiroyuki Shinnou | Yoshiyuki Kotani
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

Hybrid Method of Semi-supervised Learning and Feature Weighted Learning for Domain Adaptation of Document Classification
Hiroyuki Shinnou | Liying Xiao | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

Learning under Covariate Shift for Domain Adaptation for Word Sense Disambiguation
Hiroyuki Shinnou | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

Unsupervised Domain Adaptation for Word Sense Disambiguation using Stacked Denoising Autoencoder
Kazuhei Kouno | Hiroyuki Shinnou | Minoru Sasaki | Kanako Komiya
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

Domain Adaptation with Filtering for Named Entity Extraction of Japanese Anime-Related Words
Kanako Komiya | Daichi Edamura | Ryuta Tamura | Minoru Sasaki | Hiroyuki Shinnou | Yoshiyuki Kotani
Proceedings of the International Conference Recent Advances in Natural Language Processing

2013

Use of Combined Topic Models in Unsupervised Domain Adaptation for Word Sense Disambiguation
Shinya Kunii | Hiroyuki Shinnou
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

2012

Detection of Peculiar Word Sense by Distance Metric Learning with Labeled Examples
Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

For natural language processing on machines, resolving such peculiar usages would be particularly useful in constructing a dictionary and dataset for word sense disambiguation. Hence, it is necessary to develop a method to detect such peculiar examples of a target word from a corpus. Note that, hereinafter, we define a peculiar example as an instance in which the target word or phrase has a new meaning. In this paper, we proposed a new peculiar example detection method using distance metric learning from labeled example pairs. In this method, first, distance metric learning is performed by large margin nearest neighbor classification for the training data, and new training data points are generated using the distance metric in the original space. Then, peculiar examples are extracted using the local outlier factor, which is a density-based outlier detection method, from the updated training and test data. The efficiency of the proposed method was evaluated on an artificial dataset and the Semeval-2010 Japanese WSD task dataset. The results showed that the proposed method has the highest number of properly detected instances and the highest F-measure value. This shows that the label information of training data is effective for density-based peculiar example detection. Moreover, an experiment on outlier detection using a classification method such as SVM showed that it is difficult to apply the classification method to outlier detection.

2010

Detection of Peculiar Examples using LOF and One Class SVM
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a method to detect peculiar examples of a target word from a corpus. In this paper, we regard the following examples as peculiar: (1) the meaning of the target word in the example is new, (2) a compound word containing the target word in the example is new or very technical. A peculiar example is regarded as an outlier in the given example set. Therefore, we can apply many methods proposed in the data mining domain to our task. In this paper, we propose a method that combines the density-based method Local Outlier Factor (LOF) and One Class SVM, which are representative outlier detection methods in the data mining domain. In the experiment, we use the Whitepaper text in BCCWJ as the corpus and 10 nouns as target words. Our method improved on the precision and recall of LOF and One Class SVM. We also show that our method can detect new meanings by using the noun 'midori (green)'. The main cause of missed and wrong detections is that the similarity measure between two examples is inadequate. In the future, we must improve it.

2008

Ping-pong Document Clustering using NMF and Linkage-Based Refinement
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper proposes a ping-pong document clustering method that alternates NMF and linkage-based refinement in order to improve the clustering result of NMF. The use of NMF in the ping-pong strategy can be expected to be effective for document clustering. However, NMF in the ping-pong strategy often worsens performance because NMF often fails to improve the clustering result given as the initial values. Our method handles this problem with the stop condition of the ping-pong process. In the experiment, we compared our method with k-means and NMF by using 16 document data sets. Our method improved the clustering result of NMF significantly.

Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Spectral clustering is a powerful clustering method for document data sets. However, spectral clustering needs to solve an eigenvalue problem of the matrix converted from the similarity matrix corresponding to the data set. Therefore, it is not practical to use spectral clustering for a large data set. To overcome this problem, we propose a method to reduce the similarity matrix size. First, using k-means, we obtain a clustering result for the given data set. From each cluster, we pick some data points that are near the center of the cluster. We treat these data points as a single data point and call this set a committee. Data points outside the committees remain individual data points. For these data, we construct the similarity matrix. The size of this similarity matrix is reduced enough that we can perform spectral clustering using the reduced similarity matrix.
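The committee construction described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the cluster labels are taken as given (for example from a prior k-means run), the function names, the `committee_size` parameter, the use of a committee's mean as its representative, and the Gaussian similarity are all hypothetical choices.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def build_committees(
    points: Sequence[Point], labels: Sequence[int], committee_size: int
) -> List[List[int]]:
    """Group point indices: per cluster, the points nearest the centroid
    form one committee; every remaining point stays a singleton group."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    groups: List[List[int]] = []
    for lab in sorted(clusters):
        members = clusters[lab]
        center = centroid([points[i] for i in members])
        ranked = sorted(members, key=lambda i: math.dist(points[i], center))
        groups.append(ranked[:committee_size])
        groups.extend([i] for i in ranked[committee_size:])
    return groups


def reduced_similarity(
    points: Sequence[Point], groups: Sequence[Sequence[int]]
) -> List[List[float]]:
    """Similarity matrix over groups (assumed Gaussian kernel on group means)."""
    reps = [centroid([points[i] for i in g]) for g in groups]
    return [[math.exp(-math.dist(a, b) ** 2) for b in reps] for a in reps]


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (5.0, 5.0), (5.1, 5.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]  # assumed output of a prior k-means run
groups = build_committees(points, labels, committee_size=2)
sim = reduced_similarity(points, groups)
print(len(points), "points reduced to", len(sim), "rows")
```

The reduced matrix can then be passed to any spectral clustering routine that accepts a precomputed similarity matrix, and each committee member inherits the cluster assigned to its committee.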

Division of Example Sentences Based on the Meaning of a Target Word Using Semi-Supervised Clustering
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe a system that divides example sentences (data set) into clusters, based on the meaning of the target word, using a semi-supervised clustering technique. In this task, the estimation of the cluster number (the number of meanings) is critical. Our system primarily concentrates on this aspect. First, a user assigns the system an initial cluster number for the target word. The system then performs general clustering on the data set to obtain small clusters. Next, using constraints given by the user, the system integrates these clusters to obtain the final clustering result. Our system performs this entire procedure with high precision while requiring only a few constraints. In the experiment, we tested the system on 12 Japanese nouns used in the SENSEVAL2 Japanese dictionary task. The experiment demonstrated the effectiveness of our system. In the future, we will improve sentence similarity measurements.

2007

Ensemble document clustering using weighted hypergraph generated by NMF
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Refinement of Document Clustering by Using NMF
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

2004

Semi-supervised Learning by Fuzzy Clustering and Ensemble Learning
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Information Retrieval System Using Latent Contextual Relevance
Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

When relevance feedback, one of the most popular information retrieval models, is used in an information retrieval system, related words are extracted based on the first retrieval result. These words are then added to the original query, and retrieval is performed again using the updated query. Generally, with such a query expansion technique, retrieval performance falls in comparison with the performance using the original query. The cause is that there are few synonyms in the thesaurus, and although some synonyms are added to the query, the same documents are retrieved as a result. In this paper, to solve this problem with related words, we propose latent context relevance, which considers the relevance between the query and each index word in the document set.

2003

Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the EM algorithm
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

2002

Learning of word sense disambiguation rules by Co-training, checking co-occurrence of features
Hiroyuki Shinnou
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Extraction of Unknown Words Using the Probability of Accepting the Kanji Character Sequence as One Word
Hiroyuki Shinnou | Masanori Ikeya
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

Detection of Japanese Homophone Errors by a Decision List Including a Written Word as a Default Evidence
Hiroyuki Shinnou
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

Revision of morphological analysis errors through the person name construction model
Hiroyuki Shinnou
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

In this paper, we present a method to automatically revise morphological analysis errors caused by unregistered person names. In order to detect and revise these errors, we propose the Person Name Construction Model for kanji characters composing Japanese names. Our method has the advantage of not using context information, such as a suffix, to recognize person names, which makes it widely applicable. Through an experiment, we show that our proposed model is effective.

A Decision Tree Method for Finding and Classifying Names in Japanese Texts
Satoshi Sekine | Ralph Grishman | Hiroyuki Shinnou
Sixth Workshop on Very Large Corpora