ACL-03 EXHIBITS


Microsoft

IBM Japan

NTT

OKI

FUJITSU

NEC

TOSHIBA

HITACHI

LanA Consulting

NII

CRL

The University of Tokyo

ATR

Corpus of Spontaneous Japanese

Hiroshima City University

Hokkaido University

JAIST

Keio University

Kyoto University

NAIST

Tohoku University

Tokushima University

Tokyo Institute of Technology

Tottori University

Yokohama National University




Microsoft

Machine Translation of a Very Large User Support Database

We demonstrate an example-based machine translation system deployed to translate English-language articles in the Microsoft Product Support Services Knowledge Base. The MSR-MT system is architected to permit simultaneous development among multiple languages, and can be rapidly trained to a client's domain using existing translated documents.

English Writing Wizard

We demonstrate English Writing Wizard, which assists users in writing English. To help users polish English sentences, it dynamically recommends appropriate sentences through NLP-based information retrieval as they write. It accepts queries in both English and Chinese. There are three key technologies: mining collocations from unannotated corpora, mining synonymous expressions from corpora, and translating Chinese collocations into English collocations.

 

IBM Japan

IBM TAKMI: Text Analysis and Knowledge Mining for Business Intelligence and Life Sciences

We will demonstrate two text mining systems, IBM TAKMI for business intelligence (TAKMI, for short) and IBM TAKMI for life sciences (MedTAKMI, for short). By applying shallow parsing, synonym/semantic dictionary lookup, and information extraction, our systems can analyze millions of documents and extract useful information for specific domains. TAKMI has been applied to the CRM (Customer Relationship Management) domain for the analysis of customer contact records, and MedTAKMI has been applied to a medical domain for the analysis of MEDLINE documents.

 

NTT

Nippon Telegraph and Telephone Corp.

NTT Communication Science Laboratories provides a large lexical database (Nihongo-no Goitokusei; Lexical Properties of Japanese) containing over 80,000 Japanese words and characters, and a comprehensive thesaurus (Nihongo Goi-taikei; A Japanese Lexicon) including 300,000 Japanese words classified into 3,000 semantic categories. Goi-taikei provides a base ontology for SAIQA, NTT's Open-Domain Question Answering systems. SAIQA's Named Entity Recognizer employs Support Vector Machines (SVMs), a high-performance machine-learning method. Since off-the-shelf SVM classifiers were too inefficient, we developed a new algorithm that is orders of magnitude faster.

 

OKI

We demonstrate a web-based machine translation environment, 'Yakushite.Net', whose accuracy and scope can be improved through online collaboration by users. The environment leverages the cooperative efforts of online users to create highly accurate dictionaries, enabling people with deep knowledge of a particular subject to collaborate in enhancing specialized dictionaries for online machine translation. We also present future plans for the environment.

 

FUJITSU

Cliche : Integrating MT and TM

Seiji Okura, Tatsuo Yamashita, Masaru Fuji, Akira Ushioda
Fujitsu Laboratories Ltd.

We demonstrate a machine-aided translation system, Cliche, in which machine translation (MT) and translation memory (TM) technologies are integrated. Cliche's TM module handles translation examples analyzed by the MT module, which makes advanced search possible. The system enables high-speed, high-quality online translation, as well as collaborative translation by a group of translators at remote locations connected by a network.

 

NEC

NEC's Natural Language Processing Technologies

NEC's continuing research and development of natural language processing technologies has created many versatile products. In this exhibition, we will demonstrate TABITSU, TopicScope, and Reputation Search Engine.
TABITSU is a speech-to-speech translation system for notebook PCs that facilitates oral communication between Japanese and English speakers in various situations when traveling abroad. TopicScope is a text mining tool based on a combination of natural language processing and data mining techniques. Reputation Search Engine is a specialized search engine that extracts people's opinions on user-specified topics from a large number of Web documents.

 

TOSHIBA

GroupScribe: A Rule-based Group Communication Management System

Sougo Tsuboi and Hideo Umeki
Corporate R&D Center, Toshiba Corporation

Communication and documentation can be complementary processes in various group-work situations. We have developed a group communication support system, GroupScribe, that provides a mechanism for extracting relevant pieces of information from e-mail messages posted to each community and reorganizing them into auto-updating documents. Each document created in GroupScribe has a consolidation rule, which includes the range, the type, and the layout of extracts. These documents can also be edited interactively and shared as appropriate with other people. GroupScribe can therefore help users create documents based on the contents of communication and manage both communication and documentation efficiently.

 

HITACHI

Recent Results of NLP Research at Hitachi's Central Research Laboratory

1

Diagnostic Image Searching according to the Similarity of Accompanying Diagnostic Reports.
We have developed a prototype system that searches for similar radiological images according to the similarity of the accompanying diagnostic reports. By referring to closely related past cases, the system helps doctors avoid missing important reporting points.

2

An Information Retrieval and Filtering System Based on a Word Sense Associative Network
Word Sense Navigator helps users to select a particular word sense for each query term, and Document Filter selects relevant documents, in which terms are used in the intended senses.

3

An Annotated Japanese Sign Language Corpus on the Web
A portion of the JSL corpus is available for download on the Internet for research purposes. The contents include 100 examples, each of which consists of three synchronized video images (upper body, face, and side view of face) and annotations of manual as well as non-manual signs. (http://koigakubo.hitachi.co.jp/index-e.html, News release 2002/9/13)

4

Extracting Biomolecular Interaction from Large Biomedical Literature
We explore a system for extracting biomolecule interactions. The system uses a semi-hand-coded protein and gene name dictionary of 200,000 entries. The system displays extracted interactions as a network.

LanA Consulting

LanA Consulting is a software company specializing in IT applications including multilingual natural language processing.

Products:
AutoPatGen - patent claim generator with "Beginners and Professional" modes.
AutoKnowledge - tools for NLP knowledge acquisition (flexible depth lexicons and grammar).
AutoKnowledgeAdvanced - tools for NLP knowledge acquisition linked to analyzer and generator with analysis/generation trace visualization.
NLPtrain - tools for teaching linguistic aspects of NLP for computational linguistics students.

Demos:
AutoPatRead - application for improving readability of patent claims that automatically decomposes a complex sentence of a patent claim into a set of simple sentences.
AutoPatMT - application for machine translation of patent claims.

 

NII

NTCIR: Large-scale test collections for IR, QA and Summarization

The goal of the NTCIR Project is to provide infrastructure for the large-scale evaluation of information access technologies, which improve access to information in huge document collections using language analysis and information retrieval. The targets have been cross-language information retrieval of Chinese, Korean, Japanese and English, term recognition, text summarization, question answering, web retrieval, patent retrieval, and so on.

 

CRL

The Computational Linguistics Group of the Communications Research Laboratory (CRL) conducts research on natural language processing (NLP). Our work ranges from basic research on language to applications using NLP technologies. In this exhibition, we will demonstrate 1) Japanese-English Bilingual Corpora and their applications, 2) Japanese Learners' Corpus of English and 3) Medical Speech Translator.

 

The University of Tokyo

Innovation of Natural Language Processing at the University of Tokyo

Computational Linguistics and Natural Language Processing Lab, Department of Computer Science

Our group is concerned with a framework for representing, embedding and retrieving intelligent knowledge in texts to integrate text processing and knowledge processing, including:

  • HPSG Parser
  • XML-based Text Retrieval
  • Natural Language Resources in Biomedical Domain

Language Informatics Laboratory (LIL, Information Technology Center)

LIL is a research group of the Information Technology Center. Working closely with the university library resources, we conduct the following projects to support the actual text processing activities performed by university students and faculty members:

  • Web based NL tools
  • User adaptive systems
  • Multi-lingual methods

Language and Knowledge Engineering Laboratory, Department of Information and Communication Engineering

Our goal is to develop intelligent media technology that facilitates human-to-human communication and to explore seeds for social contribution including risk communication, using:

  • Dialog-based QA system
  • Automatic acquisition of case frame dictionary
  • EgoChat system
  • Conversational interface agent

 

ATR

A Statistical-Information-Based Selector of the Best among Multiple MT Outputs

Yasuhiro Akiba, Eiichiro Sumita, Hiromi Nakaiwa, and Seiichi Yamamoto
ATR Spoken Language Translation Research Laboratories

The authors demonstrate a system for automatically selecting the best among outputs from three machine translation (MT) systems: D-cube, HPAT, and SAT, which are, respectively, an example-based MT, a pattern-based MT, and an SMT. The selection system assigns scores to each MT output by using statistical models of the target language and the translation, compares their scores statistically by using a non-parametric multiple comparison test, and selects the best. The selection system and MT systems are subsystems of ATR's speech-to-speech translation system and are automatically constructed using a bilingual corpus, ATR's BTEC (Basic Travel Expression Corpus).
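
The score-and-select step described above can be illustrated with a minimal sketch. It is not the ATR implementation: the unigram language model, the placeholder translation-model score, and the example sentences are all assumptions made for illustration, and the non-parametric multiple comparison test is replaced here by simply taking the highest combined score.

```python
import math
from collections import Counter


def train_unigram_lm(sentences):
    """Return a function giving the average add-one-smoothed log-probability
    per word under a toy unigram model (a stand-in for a real target LM)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def logprob(sentence):
        words = sentence.split()
        if not words:
            return float("-inf")
        return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words) / len(words)

    return logprob


def select_best(outputs, lm_logprob, tm_logprob):
    """Score every MT output and return (name of best system, all scores).

    outputs:    dict mapping system name -> translated sentence
    lm_logprob: function(sentence) -> language-model score
    tm_logprob: function(system name, sentence) -> translation-model score
    """
    scores = {name: lm_logprob(text) + tm_logprob(name, text)
              for name, text in outputs.items()}
    best = max(scores, key=scores.get)  # simplification: no significance test
    return best, scores


# Hypothetical target-language data and MT outputs (not taken from BTEC).
corpus = ["where is the station", "where is the hotel", "the station is near"]
outputs = {
    "D-cube": "where is the station",
    "HPAT": "station where is it",
    "SAT": "where station the is be",
}
lm = train_unigram_lm(corpus)
best, scores = select_best(outputs, lm, lambda name, text: 0.0)
print(best)
```

In a fuller version, the scores would be compared over many samples with a statistical test before one output is declared better than the others, as the abstract describes.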

 

Corpus of Spontaneous Japanese

In an attempt to build a firm basis for the processing technology of spontaneous speech, we have been compiling a large annotated corpus of spontaneous Japanese since 1999, aiming at a final public release in the spring of 2004. The Corpus of Spontaneous Japanese, or CSJ, contains more than 650 hours of spontaneous Standard Japanese produced by more than 1400 speakers. Speech signal, two-way transcription, and two-way POS annotation are provided for the whole corpus. In addition, phonetic labels (both segment and intonation), clause-boundary labels, dependency-structure labels, and discourse-structure labels are to be provided for a subset of the CSJ covering about 500,000 words, or 44 hours. Details of the corpus will be shown using real examples and the results of preliminary linguistic analyses.

 

Hiroshima City University

Zero Detector as a Japanese Language Teaching Aid

Mitsuko Yamura-Takei, Graduate School of Information Sciences, Hiroshima City University
Teruaki Aizawa, Faculty of Information Sciences, Hiroshima City University.

Zero Detector (ZD) is a linguistic analysis tool for Japanese language teachers. This program was developed to promote effective instruction of zero pronouns (zeros) by making invisible zeros visible. ZD takes Japanese narrative texts as input and performs morphological and syntactic analyses followed by zero detection and insertion. It then provides various information on each input clause, including the positions of zeros and the valency pattern of the predicate, depending on teachers' needs. ZD helps teachers: (1) predict the difficulties with zeros that students might encounter, and (2) provide students with systematic instruction on zeros, both in their interpretation and production of discourses containing zeros.

 

Hokkaido University

Applications of Natural Language Processing Method Using Inductive Learning - Araki Laboratory, Hokkaido University -

Kenji Araki, Hokkaido University, Sapporo, JAPAN.
Hiroshi Echizen-ya, Hokkai-Gakuen University, Sapporo, JAPAN
Masafumi Matsuhara, Iwate Prefectural University, Iwate, JAPAN
Yasutomo Kimura, Hokkaido University, Sapporo, JAPAN

1 Introduction
We have developed a natural language processing method using inductive learning. Our demonstrations include a Kana-Kanji input method for mobile phones, machine translation, and spoken dialogue processing.

2 Demonstrations

Our demonstrations are as follows:
Number-Kanji Translation System Using Inductive Learning
Machine Translation System Using Inductive Learning with Genetic Algorithms
Spoken Dialogue System Using Inductive Learning with Genetic Algorithms

 

JAIST

Research Activities in Japan Advanced Institute of Science and Technology

Akira Shimazu, Satoshi Tojo, Kiyoaki Shirai and Kentaro Torisawa
School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Tatsunokuchimachi, Nomi-gun, Ishikawa, Japan 923-1292

In this exhibition, we will show the research activities being conducted by the following four faculty members, who are studying computational linguistics and related research fields.

Prof. Akira Shimazu: Our main research topics are, first, to model information, linguistic structure, and the relations between them from the viewpoint of communication, and second, to develop a computational model for natural dialogues as seen in daily life. We will introduce some examples from our research.

Prof. Satoshi Tojo: Music scores are a form of language in that they are grammatical sequences of symbols, in terms of both rhythmic structure and cadence. According to music theory, salient notes dominate other notes and important chords dominate other chords, and thus the notion of 'head' plays a major role. To support this notion, we will present the Generative Theory of Tonal Music and show our approach based on Head-driven Phrase Structure Grammar.

Assoc. Prof. Kiyoaki Shirai: Our main research topic is corpus-based natural language processing so as to achieve word sense disambiguation, statistical parsing, etc. We will introduce a word sense disambiguation, or WSD, system using two heterogeneous language resources as a supporting technology for a document reading assistant system.

Assoc. Prof. Kentaro Torisawa: Our research interests include automatic knowledge acquisition from large-scale corpora, high-level grammar formalisms for natural languages, and parsing algorithms. In the exhibition, we will show word clusters induced by EM-based clustering algorithms and related research.

 

Keio University

We will present a large-scale associative concept dictionary and its implementation as a brain memory model on a pulsed neural network architecture. As an application system using the model, we will demonstrate a computational system for metaphor understanding. The system obtains the meaning of a metaphorical expression, "A is B," in a way that resembles intuitive human understanding. The dictionary was built using large-scale associative data obtained from human association experiments. Distances between stimulus words and their associated words are calculated with a linear programming method using response parameters from the association experiments.

 

Kyoto University

Our exhibition introduces recent research activities at the Language Media Laboratory of Kyoto University, focusing on knowledge acquisition from the Web. The Web can be viewed as the largest corpus available: NLP technologies enable automatic or semi-automatic acquisition of various kinds of knowledge from it. We will demonstrate two systems. The first automatically collects related terms from seed words and can be used as a tool for compiling a glossary for a given domain. The second acquires a bilingual lexicon from comparable corpora, which helps in compiling a Japanese-English dictionary.

 

NAIST

Computational Linguistics Laboratory of Nara Institute of Science and Technology will demonstrate basic natural language tools, and integrated tools and environment for paraphrasing.

Our natural language tools include part-of-speech taggers (ChaSen and MeCab) for Japanese, Chinese and English, phrase and NE chunkers, and a syntactic dependency analyzer (CaboCha) for Japanese, all based on machine learning techniques such as HMM and SVM.

The integrated environment for paraphrasing is named KURA, and we will demonstrate its facilities and a question answering system, KURA-QA.

 

Tohoku University

Our exhibition introduces the Tohoku University 21st Century Center of Excellence (COE) program in humanities entitled "A Strategic Research and Education Center for an Integrated Approach to Language and Cognition". The principal disciplines involved in this project include Linguistics, Brain-Functional Studies, Cognitive Psychology, and Robotics/Artificial Intelligence. The goals of this project are: (i) Creation of a new field of "Integrated linguistics Science", which sheds light on studies of language learning, language acquisition, language disorder, age-related language loss, and robot language; (ii) Mutual interaction between theoretical and experimental studies: theoretical linguistic studies can receive feedback from experimental sciences, e.g., brain-functional studies and cognitive psychology, and vice versa.

 

Tokushima University

The explosive growth of the Internet and the increased availability of electronic media in many languages have promoted the development of multilingual systems capable of running in several monolingual modes. The global implications of recent changes in society and scientific communities necessitate multi-national collaboration and thus a shift of emphasis toward the development of multilingual information retrieval systems that cross language boundaries.

Most classification research concerns term weighting based on the vector model, and some methods have been proposed for estimating term relevance. Several approaches may be used to address the particular problems of cross-language information retrieval (CLIR), including dictionary-based, corpus-based, and machine-translation-based methods.
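
As a minimal illustration of term weighting in the vector model, the sketch below computes TF-IDF weights and cosine similarity for a few toy documents. TF-IDF is used here only as a familiar example of such a weighting scheme; the function names and sample documents are assumptions, not the method or data of the exhibited system.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse vector (dict: term -> weight) per document,
    using relative term frequency times log inverse document frequency."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical documents for illustration only.
docs = [
    "machine translation system",
    "machine translation dictionary",
    "speech corpus annotation",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents sharing weighted terms score higher than documents with no terms in common; a cross-language setting would additionally need a dictionary, parallel corpus, or MT step to map terms between languages.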

 

Tokyo Institute of Technology

The exhibition shows a prototype system, K2, in which a user interacts with animated agents in a virtual world. Through speech input, the user can command the agents to manipulate objects. The agents' behavior and the subsequent changes in the virtual world are presented to the user as a three-dimensional animation. Through the prototype system, the project aims to explore natural language understanding situated in a real or virtual world and the relation between language understanding and action.

We demonstrate a Web-based system, Asunaro, which helps users read sentences in Japanese. Asunaro provides users with information such as meanings of words, structures of sentences, and explanations of syntax and idiomatic expressions in English, Thai, Indonesian, Malay and Chinese. Asunaro does this by using morphological and syntactic analyses.

Integrating multiple databases of research papers with the data on the WWW

Hidetsugu Nanba, Hiroshima City University
Manabu Okumura, Tokyo Institute of Technology
Suguru Saito, Tokyo Institute of Technology
Takeshi Abekawa, Tokyo Institute of Technology

We have developed a system that makes it possible to retrieve papers from multiple databases at once. Our system can visually show the citation relationships between papers together with the reasons for the citations. Using our system, researchers can grasp the outline of a domain at a glance.

 

Tottori University

We are developing a new machine translation method called "Analogical Mapping Method for MT based on Semantic Typology". Our new method is built on two theories. One is the Semantic Typology Theory, which suggests that human understanding of the world is accompanied by an epistemological framework shaped by one's mother tongue. The other is the Analogical Mapping Theory advocated by Kikuya Ichikawa. To test the applicability of these theories, we are compiling a sentence pattern database and a semantic pattern database.

For the demonstration, we will show the sentence pattern database we have compiled and our method for matching Japanese sentences against it.

 

Yokohama National University

Recent Research at Mori Laboratory, Yokohama National University

Tatsunori Mori

We will introduce our recent research at Mori laboratory, Yokohama National University, Japan. It includes question-answering systems, navigation systems for IR results, summarizers for multiple documents, and so on.

We will demonstrate the following experimental systems:

  • A Japanese question-answering system, which does not require any preprocessing of target documents (indexing for IR only).
  • A navigation system for Japanese IR results with a summarization feature.