Shinsuke Mori


2019

Procedural Text Generation from a Photo Sequence
Taichi Nishimura | Atsushi Hashimoto | Shinsuke Mori
Proceedings of the 12th International Conference on Natural Language Generation

Multimedia procedural texts, such as instructions and manuals with pictures, help people share how-to knowledge. In this paper, we propose a method for generating a procedural text from a photo sequence, allowing users to obtain a multimedia procedural text. We propose a single embedding space for both images and text, which enables us to interconnect them and to select appropriate words to describe a photo. We implemented our method and tested it on cooking instructions, i.e., recipes. Various experimental results showed that our method outperforms standard baselines.
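
As an illustration of how a shared embedding space can be used to pick words for a photo, here is a minimal sketch that ranks vocabulary words by cosine similarity to a photo embedding. It assumes that image and word vectors already live in one common space; the vocabulary, the vector values, and the names cosine and select_words are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical word vectors, assumed to already share one space with
# image embeddings. The values are made up for illustration only.
WORD_VECS = {
    "cut":   [0.9, 0.1, 0.0],
    "boil":  [0.1, 0.9, 0.1],
    "onion": [0.8, 0.2, 0.3],
    "water": [0.0, 0.8, 0.4],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_words(photo_vec, k=2):
    """Return the k vocabulary words closest to a photo embedding."""
    ranked = sorted(WORD_VECS,
                    key=lambda w: cosine(photo_vec, WORD_VECS[w]),
                    reverse=True)
    return ranked[:k]

# A hypothetical photo embedding from some image encoder.
print(select_words([0.85, 0.15, 0.1]))  # ['cut', 'onion']
```

Nearest-neighbour ranking is only the simplest way to read candidate words off a shared space; by itself it yields a set of words, not a fluent procedural sentence.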

2018

Universal Dependencies Version 2 for Japanese
Masayuki Asahara | Hiroshi Kanayama | Takaaki Tanaka | Yusuke Miyao | Sumire Uematsu | Shinsuke Mori | Yuji Matsumoto | Mai Omura | Yugo Murawaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus
Suguru Matsuyoshi | Hirotaka Kameko | Yugo Murawaki | Shinsuke Mori
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Procedural Text Generation from an Execution Video
Atsushi Ushiku | Hayato Hashimoto | Atsushi Hashimoto | Shinsuke Mori
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In recent years, there has been a surge of interest in automatically describing images or videos in natural language. These descriptions are useful for applications such as image and video search. In this paper, we focus on procedure execution videos, in which a human makes or repairs something, and propose a method for generating procedural texts from them. Since the available video/text pairs are limited in number, directly applying end-to-end deep learning is not feasible. We therefore propose to train a Faster R-CNN network for object recognition and an LSTM for text generation, and to combine them at run time. We took pairs of recipes and cooking videos, generated a recipe from each video, and compared it with the original recipe. The experimental results showed that our method can produce recipes as accurate as state-of-the-art scene descriptions.

Japanese all-words WSD system using the Kyoto Text Analysis ToolKit
Hiroyuki Shinnou | Kanako Komiya | Minoru Sasaki | Shinsuke Mori
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

Domain Specific Named Entity Recognition Referring to the Real World by Deep Neural Networks
Suzushi Tomori | Takashi Ninomiya | Shinsuke Mori
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Language Resource Addition Strategies for Raw Text Parsing
Atsushi Ushiku | Tetsuro Sasada | Shinsuke Mori
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We focus on improving the accuracy of raw text parsing from the viewpoint of language resource addition. In Japanese, raw text parsing is divided into three steps: word segmentation, part-of-speech tagging, and dependency parsing. We investigate how adding language resources at each of the three steps contributes to accuracy improvements on corpora from two domains. The experimental results show that this improvement depends on the target domain. For example, when handling well-written texts with a limited vocabulary, such as white papers, a corpus of word-POS pair sequences is an effective language resource for improving parsing accuracy. We therefore conclude that it is important to examine the characteristics of the target domain and to choose a suitable language resource addition strategy for improving parsing accuracy.

Wikification for Scriptio Continua
Yugo Murawaki | Shinsuke Mori
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The fact that Japanese employs scriptio continua, a writing system without spaces, complicates the first step of an NLP pipeline. Word segmentation is widely used in Japanese language processing, and lexical knowledge is crucial for reliably identifying words in text. Although external lexical resources like Wikipedia are potentially useful, segmentation mismatch prevents them from being straightforwardly incorporated into the word segmentation task. If we intentionally violate segmentation standards by incorporating them directly, quantitative evaluation will no longer be feasible. To address this problem, we propose to define a separate task that directly links given texts to an external resource, that is, wikification in the case of Wikipedia. By doing so, we can circumvent segmentation mismatches that may not be important for downstream applications. As a first step toward realizing this idea, we design the task of Japanese wikification and construct wikification corpora. We annotated subsets of the Balanced Corpus of Contemporary Written Japanese as well as short Twitter messages. We also implement a simple wikifier and investigate its performance on these corpora.
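
The abstract mentions a simple wikifier without describing it. As a hypothetical sketch of why linking can sidestep segmentation standards, the function below scans unsegmented text and greedily links the longest substring found in a set of article titles, returning character spans rather than words. The title set and the name wikify are assumptions for illustration, not the authors' wikifier; a realistic system would also need to decide which mentions to link and to disambiguate among candidate articles.

```python
def wikify(text, titles, max_len=20):
    """Greedy left-to-right longest match of article titles in unsegmented text.

    Returns a list of (start, end, title) character spans. Characters not
    covered by any title are skipped one at a time.
    """
    links = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so that "京都大学" wins over "京都".
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in titles:
                match = j
                break
        if match is None:
            i += 1
        else:
            links.append((i, match, text[i:match]))
            i = match
    return links

# Hypothetical set of Wikipedia article titles.
TITLES = {"京都", "京都大学", "自然言語処理", "言語"}
print(wikify("京都大学で自然言語処理を学ぶ", TITLES))
# [(0, 4, '京都大学'), (5, 11, '自然言語処理')]
```

Because the output is a set of character spans tied to titles, it can be evaluated against annotated links without ever committing to a particular word segmentation standard.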

A Japanese Chess Commentary Corpus
Shinsuke Mori | John Richardson | Atsushi Ushiku | Tetsuro Sasada | Hirotaka Kameko | Yoshimasa Tsuruoka
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In recent years, there has been a surge of interest in natural language processing related to the real world, such as symbol grounding, language generation, and nonlinguistic data search by natural language queries. In order to concentrate on language ambiguities, we propose to use a well-defined “real world,” that is, game states. We built a corpus consisting of pairs of sentences and game states. The game we focus on is shogi (Japanese chess). We collected 742,286 commentary sentences in Japanese. Unlike the natural language annotations provided by human workers on Amazon Mechanical Turk for many image datasets, these sentences were generated spontaneously. We defined domain-specific named entities, manually segmented 2,508 sentences into words, and annotated each word with a named entity tag. We describe a detailed definition of the named entities and show some statistics of our game commentary corpus. We also show the results of word segmentation and named entity recognition experiments. The accuracies are as high as those on general-domain texts, indicating that we are ready to tackle various new problems related to the real world.

Universal Dependencies for Japanese
Takaaki Tanaka | Yusuke Miyao | Masayuki Asahara | Sumire Uematsu | Hiroshi Kanayama | Shinsuke Mori | Yuji Matsumoto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present an attempt to port the international syntactic annotation scheme, Universal Dependencies, to the Japanese language. Since Japanese syntactic structure is usually annotated on the basis of unique chunk-based dependencies, we first introduce word-based dependencies by using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon UniDic. Porting is done by mapping the part-of-speech tagset in UniDic to the universal part-of-speech tagset and converting a constituent-based treebank to a typed dependency tree. The conversion is not straightforward, and we discuss the problems that arose in the conversion and the current solutions. A treebank consisting of 10,000 sentences was built by converting existing resources and is currently available to the public.
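
The first porting step, mapping UniDic part-of-speech tags to Universal POS tags, can be pictured as a lookup on the hierarchical UniDic category. The sketch below uses a deliberately simplified, partial table that is an assumption for illustration, not the mapping released with the treebank. It also assumes UniDic POS strings are written with their levels joined by hyphens, and picks the entry for the longest matching prefix. In practice some categories may depend on context and cannot be decided from the tag alone.

```python
# Simplified, partial mapping from UniDic POS prefixes to Universal POS tags.
# This table is illustrative only; a real conversion covers more categories
# and uses context-dependent rules.
UNIDIC_TO_UPOS = {
    "名詞": "NOUN",
    "名詞-固有名詞": "PROPN",
    "代名詞": "PRON",
    "動詞": "VERB",
    "形容詞": "ADJ",
    "副詞": "ADV",
    "助動詞": "AUX",
    "助詞-格助詞": "ADP",
    "助詞-接続助詞": "SCONJ",
    "助詞-終助詞": "PART",
    "接続詞": "CCONJ",
    "感動詞": "INTJ",
    "補助記号": "PUNCT",
}

def to_upos(unidic_pos):
    """Map a hyphen-joined UniDic POS string using its longest known prefix."""
    parts = unidic_pos.split("-")
    for n in range(len(parts), 0, -1):
        key = "-".join(parts[:n])
        if key in UNIDIC_TO_UPOS:
            return UNIDIC_TO_UPOS[key]
    return "X"  # fallback for categories this sketch does not cover

print(to_upos("名詞-固有名詞-人名-姓"))  # PROPN
```

For example, 名詞-固有名詞-人名-姓 falls back to the 名詞-固有名詞 entry and yields PROPN, 名詞-普通名詞-一般 falls back to 名詞 and yields NOUN, and an uncovered tag such as 接頭辞 yields X.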

Parallel Speech Corpora of Japanese Dialects
Koichiro Yoshino | Naoki Hirayama | Shinsuke Mori | Fumihiko Takahashi | Katsutoshi Itoyama | Hiroshi G. Okuno
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

2015

Keyboard Logs as Natural Annotations for Word Segmentation
Fumihiko Takahasi | Shinsuke Mori
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Can Symbol Grounding Improve Low-Level NLP? Word Segmentation as a Case Study
Hirotaka Kameko | Shinsuke Mori | Yoshimasa Tsuruoka
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Combining Active Learning and Partial Annotation for Domain Adaptation of a Japanese Dependency Parser
Daniel Flannery | Shinsuke Mori
Proceedings of the 14th International Conference on Parsing Technologies

A Framework for Procedural Text Understanding
Hirokuni Maeta | Tetsuro Sasada | Shinsuke Mori
Proceedings of the 14th International Conference on Parsing Technologies

2014

A Japanese Word Dependency Corpus
Shinsuke Mori | Hideki Ogura | Tetsuro Sasada
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Language Resource Addition: Dictionary or Corpus?
Shinsuke Mori | Graham Neubig
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Flow Graph Corpus from Recipe Texts
Shinsuke Mori | Hirokuni Maeta | Yoko Yamakata | Tetsuro Sasada
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

FlowGraph2Text: Automatic Sentence Skeleton Compilation for Procedural Text Generation
Shinsuke Mori | Hirokuni Maeta | Tetsuro Sasada | Koichiro Yoshino | Atsushi Hashimoto | Takuya Funatomi | Yoko Yamakata
Proceedings of the 8th International Natural Language Generation Conference (INLG)

2013

Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora
Katsuhito Sudoh | Shinsuke Mori | Masaaki Nagata
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

A Framework and Tool for Collaborative Extraction of Reliable Information
Graham Neubig | Shinsuke Mori | Masahiro Mizukami
Proceedings of the Workshop on Language Processing and Crisis Information 2013

Predicate Argument Structure Analysis using Partially Annotated Corpora
Koichiro Yoshino | Shinsuke Mori | Tatsuya Kawahara
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Machine Translation without Words through Substring Alignment
Graham Neubig | Taro Watanabe | Shinsuke Mori | Tatsuya Kawahara
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Statistical Method of Building Dialect Language Models for ASR Systems
Naoki Hirayama | Shinsuke Mori | Hiroshi G. Okuno
Proceedings of COLING 2012

Language Modeling for Spoken Dialogue System based on Filtering using Predicate-Argument Structures
Koichiro Yoshino | Shinsuke Mori | Tatsuya Kawahara
Proceedings of COLING 2012

Statistical Input Method based on a Phrase Class n-gram Model
Hirokuni Maeta | Shinsuke Mori
Proceedings of the Second Workshop on Advances in Text Input Methods

An Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method
Yoh Okuno | Shinsuke Mori
Proceedings of the Second Workshop on Advances in Text Input Methods

Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Graham Neubig | Taro Watanabe | Shinsuke Mori
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

An Unsupervised Model for Joint Phrase Alignment and Extraction
Graham Neubig | Taro Watanabe | Eiichiro Sumita | Shinsuke Mori | Tatsuya Kawahara
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis
Graham Neubig | Yosuke Nakata | Shinsuke Mori
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Spoken Dialogue System based on Information Extraction using Similarity of Predicate Argument Structures
Koichiro Yoshino | Shinsuke Mori | Tatsuya Kawahara
Proceedings of the SIGDIAL 2011 Conference

Discriminative Method for Japanese Kana-Kanji Input Method
Hiroyuki Tokunaga | Daisuke Okanohara | Shinsuke Mori
Proceedings of the Workshop on Advances in Text Input Methods (WTIM 2011)

Training Dependency Parsers from Partially Annotated Corpora
Daniel Flannery | Yusuke Miyao | Graham Neubig | Shinsuke Mori
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Word-based Partial Annotation for Efficient Corpus Construction
Graham Neubig | Shinsuke Mori
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

2008

Training Conditional Random Fields Using Incomplete Annotations
Yuta Tsuboi | Hisashi Kashima | Shinsuke Mori | Hiroki Oda | Yuji Matsumoto
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2006

Phoneme-to-Text Transcription System with an Infinite Vocabulary
Shinsuke Mori | Daisuke Takuma | Gakuto Kurata
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2002

A Stochastic Parser Based on an SLM with Arboreal Context Trees
Shinsuke Mori
COLING 2002: The 19th International Conference on Computational Linguistics

2000

A Stochastic Parser Based on a Structural Word Prediction Model
Shinsuke Mori | Masafumi Nishimura | Nobuyasu Itoh | Shiho Ogino | Hideo Watanabe
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1998

A Stochastic Language Model using Dependency and its Improvement by Word Clustering
Shinsuke Mori | Makoto Nagao
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

A Stochastic Language Model using Dependency and Its Improvement by Word Clustering
Shinsuke Mori | Makoto Nagao
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1996

Word Extraction from Corpora and Its Part-of-Speech Estimation Using Distributional Analysis
Shinsuke Mori | Makoto Nagao
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

A New Method of N-gram Statistics for Large Number of n and Automatic Extraction of Words and Phrases from Large Text Data of Japanese
Makoto Nagao | Shinsuke Mori
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics