Dr. Kazuhide Yamamoto: publications

This page lists the international conference papers of Kazuhide Yamamoto. The abstracts shown here are the same as those in the papers. All papers in this list can be downloaded in PDF format and read with Adobe Acrobat Reader. Please let me (kazu_yamamoto@mcn.ne.jp) know if you have any comments.



  1. Satoshi Shirai, Kazuhide Yamamoto, Francis Bond and Hozumi Tanaka.
    Towards a Thesaurus of Predicates.
    accepted to LREC2002 (2002.5) [PDF]
    (The abstract will be made available right after the conference.)

  2. Setsuo Yamada, Kenji Imamura and Kazuhide Yamamoto.
    Corpus-Assisted Expansion of Manual MT Knowledge.
    Proc. of TMI 2002 pp.199-208 (2002.3) [PDF]
    Since the expansion of MT knowledge is currently being performed by humans, it is taking too long and is too expensive. This paper proposes a new procedure that expands MT knowledge efficiently by supporting human judgements with information automatically collected from any number of corpora. The new procedure uses the source knowledge present in an MT system as the key to retrieve source language information from corpora. It also uses the partial translations provided by the MT to acquire target language information. These two techniques can reduce time and labor costs. Experimental results confirm both benefits.

  3. Kazuhide Yamamoto.
    Paraphrasing Spoken Japanese for Untangling Bilingual Transfer.
    Proc. of NLPRS2001, pp.203-210 (2001.11) [PDF] [presentation]
    One of the problems in spoken language translation is the enormous variety of expressions not found in text translation. This volume can lead to a sparse translation coverage. In order to tackle this problem, we take the practical approach of untangling slight variations in the source language before transferring a source expression to its target. We therefore discuss how effective paraphrasing is in the sense of reducing varieties in a spoken language, with a focus on how many source language patterns are reduced by paraphrasing. We also discuss the characteristics of the spoken Japanese by the paraphrasing patterns we obtain.

  4. Yujie Zhang, Kazuhide Yamamoto and Masashi Sakamoto.
    Paraphrasing Utterances by Reordering Words Using Semi-Automatically Acquired Patterns.
    Proc. of NLPRS2001, pp.195-202 (2001.11) [PDF]
    How to deal with unrestricted expressions in spontaneous utterances is one of the issues in spoken language translation. One method is to automatically paraphrase utterances prior to transfer. In the spoken Chinese language, there are a large number of variants due to different word orders. In this paper, we focus on how to paraphrase utterances by reordering words and propose a pattern-based approach. We also describe a method of automatically learning paraphrasing patterns from a paraphrase corpus, and an efficient description tool of integrating human experiences into patterns. Experimental results are also reported.

  5. Chengqing Zong, Yujie Zhang, Kazuhide Yamamoto, Masashi Sakamoto and Satoshi Shirai.
    Approach to Spoken Chinese Paraphrasing Based on Feature Extraction.
    Proc. of NLPRS2001, pp.551-556 (2001.11) [PDF]
    This paper presents an approach to spoken Chinese language paraphrasing based on feature extraction and techniques of language generation. In this approach, an input utterance is first analyzed in terms of phrase structure, dependency of chunks, etc., by using multiple methods. Then, the main features of the input utterance are extracted, and the extraction results are represented by a frame. Finally, other possible expressions of the input are generated based on the analysis results by different methods. Preliminary results are shown in the paper.

  6. Kiyonori Ohtake and Kazuhide Yamamoto.
    Paraphrasing Honorifics.
    Proc. of NLPRS2001 Workshop on Automatic Paraphrasing: Theories and Applications, pp.13-20 (2001.11) [PDF] [presentation]
    This paper reports on a paraphrasing method for Japanese honorifics. Japanese honorific expressions, as seen in real-world dialogs, have many forms with identical meanings. This paper discusses a paraphrasing method that simplifies each utterance by removing honorifics. To simplify an utterance, we take a practical approach: we investigate a corpus and construct paraphrasing rules that eliminate honorifics. We discuss how effective the constructed paraphrasing rules are for the simplification of each utterance, and a disambiguation method for some honorific verbs that require disambiguation in order to be paraphrased.

  7. Satoshi Shirai, Kazuhide Yamamoto and Francis Bond.
    Japanese-English Paraphrase Corpus.
    Proc. of NLPRS2001 Workshop on Language Resources in Asia, pp.23-30 (2001.11) [PDF]
    This paper introduces an attempt at collecting a corpus of various usages of Japanese predicates and synonymous expressions in English. We have learned that an effective consideration to exhaustively collect such various usages is to continue to create new sentences until no more sentences can be conceived within one language. We have found that an effective way of collecting synonymous expressions of predicates in Japanese-English or English-Japanese translation, is to create translations of the synonymous expressions and expand them to example sets of multiple pairs. An example of the corpus is given below: (J0):Kare-no kikaku-ga atatta. (J1):Kare-no kikaku-ga seikou-shita. (E0):His plan was a success. (E1):His plan succeeded. (E2):His plan was successful. Here, the two Japanese sentences and three English sentences have basically the same meaning, and give rise to a bilingual corpus of six pairs (J0-E0, J0-E1, J0-E2, J1-E0, J1-E1, J1-E2). The sentences can also be used as examples of mono-lingual paraphrases. Sentence creation becomes problematic when sentences that are collected are arbitrary. However, we can reduce the possibility of collecting only arbitrary sentences by writing down all of the sentences that one can think of, or by having multiple checkers mutually perform a check. In other words, we can have the same objectivity as elicitation experiments carried out in linguistics. We have created example sets of multiple pairs (28,000 Japanese sentences and 27,000 English sentences) for 6,000 Japanese predicates. At present, we are working to expand the sets in order to cover the main predicates of the Japanese language.
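
    The expansion of one example set into multiple bilingual pairs, as described in the abstract above, can be illustrated with a minimal sketch. It assumes the expansion is a plain cross product of the two sentence sets; the helper name expand_pairs is hypothetical and is not taken from the paper.

```python
from itertools import product

# Example sentences quoted in the abstract above.
japanese = {
    "J0": "Kare-no kikaku-ga atatta.",
    "J1": "Kare-no kikaku-ga seikou-shita.",
}
english = {
    "E0": "His plan was a success.",
    "E1": "His plan succeeded.",
    "E2": "His plan was successful.",
}


def expand_pairs(source, target):
    """Pair every source sentence ID with every target sentence ID.

    Hypothetical helper: assumes all sentences in one example set share
    the same meaning, so every combination is a valid translation pair.
    """
    return list(product(source, target))


pairs = expand_pairs(japanese, english)
print(len(pairs))
print(pairs)
```

    Two Japanese sentences and three English sentences yield 2 x 3 = 6 pairs, which matches the count given in the abstract.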

  8. Chengqing Zong, Yujie Zhang, Kazuhide Yamamoto, Masashi Sakamoto and Satoshi Shirai.
    Paraphrasing Chinese Utterances in Spoken Language Translation System.
    Proc. of ICCC2001 (International Conference on Chinese Computing) pp.395-401 (2001.11) [PDF] (written in Chinese)
    In a spoken language translation system, when the input utterance can't be correctly parsed and translated, if the system can recognize the other possible expressions of the input, it will be very helpful for improving the performance of the translation system. In this paper, we introduce the basic ideas for paraphrasing Chinese utterances and present the preliminary results. In our approach, the key features of an input utterance, including the expression type, tense and syntactic components etc., will be extracted first by using parsing and chunk dependency analysis techniques. The long complex utterances will be segmented. Based on the analysis results, the possible expressions are generated by using language generating techniques.

  9. Satoshi Shirai, Kazuhide Yamamoto and Kyonghee Paik.
    Overlapping Constraints of Two Step Selection to Generate a Transfer Dictionary.
    Proc. of ICSP2001 (International Conference on Speech Processing), pp.731-736 (2001.8) [PDF] [HTML]
    Any machine translation system requires a transfer dictionary between the source and target languages. Typically, since the construction of such a dictionary is done by hand, a lot of time is taken and the cost is enormous. Considering this, we attempted the construction of a bilingual dictionary through the re-generation of already-existing language resources. Aiming at the generation of a Korean-Japanese dictionary, we extracted candidates of Korean and Japanese equivalent pairs by a two-step process of searching through a Korean-English dictionary first and then searching through an English-Japanese dictionary. We also attempted the narrowing down of Korean-Japanese equivalent pairs by the overlapping of obtained Japanese translations. According to a trial experiment using 100 Korean words randomly taken, 61 correct Japanese translations were obtained. Among the correct translations, we took 25 translations for which a search of the English-Japanese dictionary successfully produced two or more translations for the English words obtained in the search results of the Korean-English dictionary. Of the 25 translations, 21 (84%) could be automatically narrowed down by taking the overlapped words from the Japanese translation sets for the individual English words. With the above two-step dictionary extraction, moreover, nine cases out of ten were correct when only one Japanese translation was obtained. These results show the possibility that Korean-Japanese translation pairs can be generated at an expected correctness rate of 44 out of 100 words when using the already proposed method that combines a Korean-English dictionary and a Japanese-English dictionary.
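
    The two-step lookup and the narrowing by overlap described in the abstract above might be sketched as follows. This is only an illustration under stated assumptions: the dictionaries are tiny toy data invented here (romanized for readability), the function names are hypothetical, and keeping all candidates when there is no overlap is a choice made for this sketch, not something stated in the paper.

```python
# Toy dictionaries invented for illustration only (not data from the paper).
KO_EN = {"chaek": ["book", "volume"]}
EN_JA = {
    "book": ["hon", "yoyaku-suru"],
    "volume": ["hon", "onryou", "taiseki"],
}


def pivot_candidates(word, ko_en, en_ja):
    """Step 1 and 2: Korean -> English, then English -> Japanese.

    Returns one set of Japanese candidates per English translation.
    """
    return [set(en_ja.get(en, [])) for en in ko_en.get(word, [])]


def narrow_by_overlap(candidate_sets):
    """Keep only the Japanese words shared by every English pivot word.

    Assumption of this sketch: if nothing overlaps, all candidates are kept.
    """
    non_empty = [s for s in candidate_sets if s]
    if not non_empty:
        return set()
    overlap = set.intersection(*non_empty)
    return overlap if overlap else set.union(*non_empty)


result = narrow_by_overlap(pivot_candidates("chaek", KO_EN, EN_JA))
print(result)
```

    With this toy data, only "hon" is reachable through both English pivot words, so the other candidates are filtered out.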

  10. Satoshi Shirai, Kazuhide Yamamoto and Kazutaka Takao.
    Construction of a Dictionary for Translating Japanese Phrases into One English Word.
    Proc. of ICCPOL 2001 (International Conference on Computer Processing of Oriental Languages), pp.3-8 (2001.5) [PDF] [HTML]
    In translation between languages that have different linguistic characteristics like Japanese and English, there are many cases in which contents are not correctly transmitted in the substitution from word to word. A method known to be effective as a measure for this is to determine the translations of verbs and nouns by using valency pattern pairs, which describe the semantic co-occurrences of verbs and nouns as valency patterns, and to pair them in the source language and the target language. However, this does not eliminate the problem of expressions regarded as ungrammatical (from the viewpoint of translation) being translated literally. In this research, we carried out analyses on expressions of compound Japanese nouns and verbs in correspondence with English words, by focusing on Japanese equivalent phrases to English words described in an English-Japanese dictionary. Consequently, there is hope that many cases of expressions created from one Japanese case element and verb corresponding to one English word can be obtained.

  11. Kazuhide Yamamoto, Satoshi Shirai, Masashi Sakamoto and Yujie Zhang.
    Sandglass: Twin Paraphrasing Spoken Language Translation.
    Proc. of ICCPOL 2001, pp.154-159 (2001.5) [PDF] [HTML]
    This paper proposes a new machine translation design that is the core architecture in an on-going project named Sandglass. The Sandglass system places special emphasis on monolingual processing and is designed to effectively deal with spoken languages. The system has good portability from modularity provided by a natural language protocol and monolingual processing reinforcement. This paper clarifies some advantages of the system by discussing several aspects in conventional translation approaches. Currently, Sandglass is being applied to bidirectional Chinese and Japanese spoken language translation involving travel conversation dialogs.

  12. Satoshi Shirai and Kazuhide Yamamoto.
    Linking English Words in Two Bilingual Dictionaries to Generate Another English Pair Dictionary.
    Proc. of ICCPOL 2001, pp.174-179 (2001.5) [PDF] [HTML]
    In developing a machine translation system, one of the difficult tasks is how to build a transfer dictionary. It has been built by human labor from scratch in most cases. This approach, however, is very ineffective from the viewpoint of cost and time. To avoid this problem, we generate a Korean to Japanese dictionary as a sample, taking advantage of existing linguistic resources, which consist of a Japanese to English dictionary and a Korean to English dictionary for the present goal. First, we extract some sets of English words corresponding to Korean words from a Korean to English dictionary. Second, we search for Japanese words having English equivalents that are similar to Korean counterparts in meaning. Finally, we link the Korean words to Japanese ones. The degree of similarity is determined according to how many translated words are shared between Korean and Japanese. We test 1,000 Korean words extracted at random and get 365 appropriate Japanese words. The result shows that 72% are accurate for a degree of similarity of 0.8 and above.

  13. Yujie Zhang and Kazuhide Yamamoto.
    Analysis of Chinese Spoken Language for Automatic Paraphrasing.
    Proc. of ICCPOL 2001, pp.290-293 (2001.5) [PDF]
    In this paper, we propose a paraphrasing approach to spoken language processing and introduce our preliminary investigation on phenomena of the Chinese spoken language. In spoken language processing, many problems have still not been resolved satisfactorily, such as ungrammatical expressions due to spontaneous utterances and speech recognition errors due to noisy environments. One of the important issues in this field is how to achieve robustness against these phenomena. We propose transforming various expressions of a spoken language into formal expressions of a written language with the same meanings, i.e. paraphrasing. For this purpose, we design three types of paraphrasing processes, i.e. (1) to correct speech recognition errors (2) to provide formal and simple expressions, and (3) to add informative expressions for disambiguation. In order to automatically paraphrase the Chinese spoken language, we carry out an investigation into phenomena of Chinese spontaneous utterances in the ATR travel conversation corpus and LDC CallHome Mandarin transcript corpus. The investigation results point out the direction of future research.

  14. Kazuhide Yamamoto and Eiichiro Sumita.
    Multiple Decision-Tree Strategy for Input-Error Robustness: A Simulation of Tree Combination.
    Proc. of ICSLP 2000, Vol.I, pp.489-492 (2000.10) [PDF]
    This paper illustrates the characteristics of the multiple decision-tree (MDT) model, which we proposed in a previous work. MDT is an extension of the decision-tree model and is proposed for its robustness against input uncertainty. We present simulation results to show that the MDT model is task-independent and outperforms both the conventional decision-tree model and the majority model against noisy inputs.

  15. Kazuhide Yamamoto and Eiichiro Sumita.
    Multiple Decision-Tree Strategy for Error-Tolerant Ellipsis Resolution.
    Proc. of NLPRS'99, pp. 292-297 (1999.11) [PDF]
    A new approach to robust ellipsis resolution for spoken-language translation is proposed. The strategy consists of a multiple decision-tree (MDT) model and a preference strategy. The proposed MDT model is an extension of the decision-tree model; it is therefore flexible, being both language-independent and task-independent. The preference strategy is a simple but strong preference. We will show that it can maintain its performance with minimal drops against any kind of error. It is also important to note that the model outperforms our conventional model on error-free inputs.

  16. Akira Kataoka, Shigeru Masuyama and Kazuhide Yamamoto.
    Summarization by Shortening a Japanese Noun Modifier into Expression "A no B".
    Proc. of NLPRS'99, pp. 409-414 (1999.11) [PDF] [presentation]
    We propose a method of paraphrasing a Japanese noun modifier into a noun phrase in the form of "A no B." The semantic structure of "A no B" is sometimes recognized by supplementing an abbreviated predicate. We define these abbreviated verbs as "deletable verbs" in two ways: (1) we choose verbs that match the semantic relations of "A no B" by using a thesaurus; (2) we choose verbs associated with specific nouns. If a verb frequently co-occurs with a noun in newspaper articles, we conclude that the verb is associated with the noun. By defining "deletable verbs" and utilizing the variety of semantic structures of "A no B," we accomplish this paraphrasing by using surface linguistic characteristics.

  17. Kiyonori Ohtake, Masahiko Nedu, Shigeru Masuyama and Kazuhide Yamamoto.
    Automated Acquisition of Case Frame with Case Order.
    Proc. of NLPRS'99, pp. 503-506 (1999.11) [PDF] [presentation]
    This paper proposes a case transition network model to provide a framework for representing case order information in addition to a Japanese case frame. The model is regarded as an extension of the bi-gram model employing a case element as a unit. A preliminary investigation of the model leads us to the conclusion that the transition network has sufficient capacity to acquire case frames with case order.

  18. Iram Shahzad, Kiyonori Ohtake, Shigeru Masuyama and Kazuhide Yamamoto.
    Identifying Translations of Compound Nouns Using Non-aligned Corpora.
    Proc. of NLPRS'99 Workshop on Multilingual Information Processing and Asian Language Processing (MAL'99), pp. 108-113 (1999.11) [PDF] [presentation]
    A compound noun and its translation do not always correspond to each other on a part-by-part basis. Therefore, there are cases where utilizing the translations of the constituent words for extracting the translation of the compound noun is ineffective. We propose a method which copes with this defect. First, it detects the parts of the target-language corpus which are likely to contain the translation, by using the context of the compound noun. Then, it extracts the translation using some heuristics.

  19. Eiichiro Sumita, Setsuo Yamada, Kazuhide Yamamoto, Michael Paul, Hideki Kashioka, Kai Ishikawa and Satoshi Shirai.
    Solutions to Problems Inherent in Spoken-Language Translation: The ATR-MATRIX Approach.
    Proc. of MT Summit VII, pp.229-235 (1999.9) [PDF]
    ATR has built a multi-language speech translation system called ATR-MATRIX. It consists of a spoken-language translation subsystem, which is the focus of this paper, together with a highly accurate speech recognition subsystem and a high-definition speech synthesis subsystem. This paper gives a road map of solutions to the problems inherent in spoken-language translation. Spoken-language translation systems need to tackle difficult problems such as ungrammaticality, contextual phenomena, speech recognition errors, and the high speed required for real-time use. We have made great strides towards solving these problems in recent years. Our approach mainly uses an example-based translation model called TDMT. We have added the use of extra-linguistic information, a decision tree learning mechanism, and methods dealing with recognition errors.

  20. Michael Paul, Kazuhide Yamamoto and Eiichiro Sumita.
    Corpus-Based Anaphora Resolution Towards Antecedent Preference.
    Proc. of the 37th ACL Workshop on Coreference and Its Applications, pp.47-52 (1999.6) [PDF]
    In this paper we propose a corpus-based approach to anaphora resolution combining a machine learning method and statistical information. First, a decision tree trained on an annotated corpus determines the coreference relation of a given anaphor and antecedent candidates and is utilized as a filter in order to reduce the number of potential candidates. In the second step, preference selection is achieved by taking into account the frequency information of coreferential and non-referential pairs tagged in the training corpus as well as distance features within the current discourse. Preliminary experiments concerning the resolution of Japanese pronouns in spoken-language dialogs result in a success rate of 80.6%.

  21. Kazuhide Yamamoto.
    Proofreading Generated Outputs: Automated Rule Acquisition and Application to Japanese-Chinese Machine Translation.
    Proc. of ICCPOL-99, pp.87-92 (1999.3) [PDF]
    Automated proofreading, or the rewriting of generated outputs, is discussed in this paper. We propose a new method of proofreading, which consists of an automatic rule acquisition module and its application module. Proofreading rules are described based on n-grams. In the rule acquisition module, provisional rules are collected and then filtered by a "timid" policy. We utilize four kinds of screening processes. Our method has been implemented in a generation module of our Japanese-Chinese MT system (TDMT-JC). In a preliminary experiment, we showed that our proposed method can acquire rules in a practical time and can improve naturalness in rule application sessions.

  22. Kazuhide Yamamoto and Eiichiro Sumita.
    Feasibility Study for Ellipsis Resolution in Dialogues by Machine-Learning Technique.
    Proc. of COLING-ACL'98, pp.1428-1435 (1998.8) [PDF]
    A method for resolving the ellipses that appear in Japanese dialogues is proposed. This method resolves not only the subject ellipsis, but also those in object and other grammatical cases. In this approach, a machine-learning algorithm is used to select the attributes necessary for a resolution. A decision tree is built, and used as the actual ellipsis resolver. The results of blind tests have shown that the proposed method was able to provide a resolution accuracy of 91.7% for indirect objects, and 78.7% for subjects with a verb predicate. By investigating the decision tree we found that topic-dependent attributes are necessary to obtain high performance resolution, and that indispensable attributes vary according to the grammatical case. The problem of data size relative to decision-tree training is also discussed.

  23. Osamu Furuse, Setsuo Yamada and Kazuhide Yamamoto.
    Splitting Long or Ill-formed Input for Robust Spoken-language Translation.
    Proc. of COLING-ACL'98, pp.421-427 (1998.8) [PDF]
    This paper proposes an input-splitting method for translating spoken-language which includes many long or ill-formed expressions. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing, and does not degrade translation efficiency. The complete translation result is formed by concatenating the partial translation results of each split unit. The proposed method can be incorporated into frameworks like TDMT, which utilize left-to-right parsing and a score for a substructure. Experimental results show that the proposed method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.

  24. Kazuhide Yamamoto, Eiichiro Sumita, Osamu Furuse and Hitoshi Iida.
    Ellipsis Resolution in Dialogues via Decision-Tree Learning.
    Proc. of NLPRS'97 pp.423-428 (1997.12) [PDF]
    As various elements should be considered in the resolution of subject ellipsis in Japanese dialogues, it is difficult to determine the contribution of each. Automatically building a decision tree from tagged training sets, however, assigns a weight of importance to each element. The results of window tests have shown that the proposed method could provide a resolution accuracy of over 90% in the total average rate of identification.

  25. Kazuhide Yamamoto, Shigeru Masuyama and Shozo Naito.
    An Empirical Study on Summarizing Multiple Texts of Japanese Newspaper Articles.
    Proc. of NLPRS'95, pp.461-466 (1995.12) [PDF]
    In this paper, we attempt to summarize multiple Japanese articles into one document with a non-parsing approach. We discuss the deletion of overlapping parts among the input texts. Japanese grammar is considerably free in word order and allows high abridgment. This research aims at coping with these phenomena. This paper focuses on the following three points for summarization: identical clauses, noun modifiers, and changes in wording. We have implemented a prototype summarization system.

  26. Kazuhide Yamamoto, Shigeru Masuyama and Shozo Naito.
    Automatic Text Classification Method with Simple Class-Weighting Approach.
    Proc. of NLPRS'95, pp.498-503 (1995.12) [PDF]
    This paper proposes an automatic text classification method using a class-weighting, or group-of-term-weighting, approach as an extension of the term-weighting approach to account for changes in wording. As a measure of importance, we introduce the product of the class frequency and the inverse document frequency (of classes). We use a thesaurus to group all the terms by meaning. This paper reports the results of open-test experiments on Japanese columns, which clarify the efficacy of this classification method.
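
    The importance measure mentioned in the abstract above (class frequency multiplied by the inverse document frequency of classes) might look like the following minimal sketch. The abstract does not give the exact formula, so the logarithmic form log(N / df) is an assumption borrowed from common term-weighting practice; the thesaurus, the documents, and all function names are hypothetical toy examples.

```python
import math
from collections import Counter

# Hypothetical thesaurus mapping each term to a semantic class.
THESAURUS = {
    "car": "VEHICLE", "automobile": "VEHICLE", "bus": "VEHICLE",
    "bank": "FINANCE", "loan": "FINANCE",
    "rain": "WEATHER",
}


def class_counts(tokens, thesaurus):
    """Count thesaurus classes in one document; unknown terms are skipped."""
    return Counter(thesaurus[t] for t in tokens if t in thesaurus)


def class_weights(docs, thesaurus):
    """Weight each class in each document by cf * log(N / df).

    cf is the class frequency in the document, df is the number of
    documents containing the class, N is the number of documents.
    The log form is an assumption of this sketch.
    """
    counts = [class_counts(doc, thesaurus) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(cls for cnt in counts for cls in cnt)
    return [
        {cls: cf * math.log(n_docs / doc_freq[cls]) for cls, cf in cnt.items()}
        for cnt in counts
    ]


docs = [
    ["car", "automobile", "loan"],
    ["bank", "loan", "rain"],
    ["bus", "car", "bank"],
]
weights = class_weights(docs, THESAURUS)
for w in weights:
    print(w)
```

    Because "car" and "automobile" fall into the same class, a document using either word receives the same class weight, which is how grouping by meaning absorbs changes in wording in this sketch. A class that appears in every document (FINANCE here) receives a weight of zero.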


kazu_yamamoto@mcn.ne.jp