JPH0348366A

JPH0348366A - Morpheme analysis, syntax analysis and morpheme forming system

Info

Publication number: JPH0348366A
Application number: JP2048717A
Authority: JP
Inventors: Norikazu Ito; 則和伊藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-04-04
Filing date: 1990-02-28
Publication date: 1991-03-01

Abstract

PURPOSE: To process compound words flexibly by registering the whole of a compound word formed with a specific character, when it is not found in the dictionary, as an unknown word in a temporary translation dictionary used for translation processing such as syntax analysis. CONSTITUTION: When the text to be analyzed by the morpheme analyzing part of the translation body part (translation part) 7 contains a compound word formed with a specific character, such as a hyphen (-), that connects plural words into one compound word, the compound word is looked up in the dictionary 8 included in the system. When the compound word is not in the dictionary, the whole compound word is registered as an unknown word in the temporary translation dictionary used for translation processing such as syntax analysis. Because the dictionary lookup performed by the morpheme analyzing part for such compound words is improved, appropriate processing linked with the subsequent syntax analyzing part is carried out whether the syntax analysis of the compound word succeeds or fails. Consequently, compound words formed with a specific character such as a hyphen can be processed flexibly.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a morphological analysis method, a syntactic analysis method, and a morpheme generation method in natural language analysis systems such as machine translation systems.

As prior art related to the present invention, there is Japanese Patent Application Laid-Open No. 63-20568. That publication relates to a machine translation device that, when an entry is registered in the dictionary, checks for special characters such as the hyphen "-" that join words together into one compound word, converts those special characters to spaces, and registers the result, so that a character string to be translated can still be translated when it contains the special characters that form a compound word.

Handling unknown words in natural language analysis is a surprisingly troublesome problem. No dictionary can contain every word and expression in the world, so words and expressions that are not in the dictionary will inevitably appear. When a word is not in the dictionary, the system has to guess what the word is and what it means.

There are several ways to do this. Briefly, first there are unknown words that begin with a capital letter. These are presumed to be proper nouns. Their meaning cannot be estimated, so if the source text is English, for example, the translation is left in the alphabet. For an unknown word that begins with a lowercase letter, a specific part of speech is assigned if it can be inferred from the form of the word ending. If there is no clue at all, the word is assigned the parts of speech that are most frequent among unknown words: noun, adjective, and verb. The meaning of these words is not known either, so the output takes a form such as "source text" + 「する」 or "source text" + 「の」.

Besides these, there are unknown words whose meaning can be estimated: combinations of an affix and a known word, such as "prefix" + "known word" or "known word" + "suffix", and compound words joined by a special character such as a hyphen, as in "known word"-"known word". The full spelling of such a word is often missing from the dictionary, but its individual components are in it. A translation is therefore composed part by part to estimate the translation of the whole. Composing the translation is relatively easy for an affix plus a known word, but for a compound word joined by special characters such as hyphens it becomes harder as the word gets longer. Because such a compound word is made up of a combination of its components, estimating its meaning requires syntactic analysis among those components. When this works, the result is good; when it does not, the result is incomprehensible and in turn harms the syntactic analysis of the sentence that contains the compound word. On balance, therefore, it is better to analyze an unknown compound word formed with special characters such as hyphens and, when the analysis fails, to treat the whole word as an unknown word.
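The guessing procedure described above can be sketched roughly as follows. This is only an illustration: the suffix table is hypothetical (the text says that the part of speech is inferred from the ending but names no endings), and `guess_pos` is a made-up name.

```python
# Hypothetical suffix table: the source only says the part of speech is
# inferred "from the form of the ending" and gives no concrete endings.
SUFFIX_POS = {"tion": "noun", "ness": "noun", "ous": "adjective", "ize": "verb"}

# Fallback named in the text for words that offer no clue.
DEFAULT_POS = ["noun", "adjective", "verb"]


def guess_pos(word):
    """Return candidate parts of speech for a word missing from the dictionary."""
    if word[:1].isupper():
        return ["proper noun"]      # capitalised: presumed to be a proper noun
    for suffix, pos in SUFFIX_POS.items():
        if word.endswith(suffix):
            return [pos]            # the ending suggests one part of speech
    return list(DEFAULT_POS)        # no clue: noun, adjective, verb
```

A real system would presumably draw the ending rules from its grammar data rather than from a fixed table like this one.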

The present invention was made in view of the circumstances described above. Its object is to provide a morphological analysis method, a syntactic analysis method, and a morpheme generation method that improve the dictionary lookup performed in the morphological analysis unit for compound words joined by special characters such as hyphens and, in conjunction with the processing of the subsequent syntactic analysis unit, apply appropriate processing whether the syntactic analysis of such a compound word succeeds or fails, so that compound words formed with special characters such as hyphens can be processed flexibly.

The present invention was made in view of the circumstances described above and is characterized by the following.
(1) In a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, when the text to be analyzed by the morphological analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, the compound word is looked up in the dictionary provided in the system, and when it is not in the dictionary, the whole compound word is registered as an unknown word in a temporary translation dictionary used for translation processing such as syntactic analysis.
(2) The unknown compound word registered in the temporary translation dictionary is presumed to be a noun, an adjective, or a verb.
(3) When the unknown compound word registered in the temporary translation dictionary is presumed to be a noun, an adjective, or a verb, these parts of speech are given priority in the order noun, adjective, verb.
(4) In a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, when the text to be analyzed by the morphological analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, the compound word is looked up in the dictionary provided in the system, and when it is not in the dictionary, the individual components of the compound word are looked up and their dictionary information can be used for translation processing such as syntactic analysis.
(5) The features of (1), (2), (3), and (4) above are combined.
(6) In a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, when the text to be analyzed by the syntactic analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, and the compound word is not in the system's dictionary, the compound word is parsed as the sequence of its component words, and when the parse fails, the whole compound word is processed as an unknown word and syntactic analysis proceeds.
(7) In a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, when the text to be analyzed by the syntactic analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, and the compound word is not in the system's dictionary, the compound word is parsed as the sequence of its component words, and when the parse succeeds, the whole compound word is presumed to be a noun, an adjective, or a verb and syntactic analysis proceeds.
(8) When the components of the compound word formed with special characters such as hyphens are analyzed successfully and the whole compound word is presumed to be a noun, an adjective, or a verb, these parts of speech are given priority in the order noun, adjective, verb, and syntactic analysis proceeds.
(9) In a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, when the text to be analyzed by the syntactic analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, and the compound word is not in the system's dictionary, the special characters such as hyphens are replaced with spaces and syntactic analysis proceeds with the compound word treated as a plain sequence of words with no special meaning.
(10) In a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, when a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word has been looked up by its individual components and parsed, the morpheme generation unit, when translating the individual components, inserts the special character between them in the generated output.
The invention is described below on the basis of embodiments.

FIG. 1 is a configuration diagram for explaining an embodiment of a translation device provided with the morphological analysis method, syntactic analysis method, and morpheme generation method according to the present invention. In the figure, 1 is a CRT, 2 is a keyboard, 3 is an OCR, 4 is an input document, 5 is a spell check unit, 6 is a pre-editing unit, 7 is a translation main unit, 8 is a system dictionary, 9 is the grammar rules, 10 is a post-editing unit, 11 is an output document, and 12 is a printer.

An input sentence obtained by file input, keyboard input, or OCR input is preprocessed by the spell check unit 5 and the pre-editing unit 6. The output sentence produced by the translation main unit 7 can be edited in the post-editing unit 10 using the translation information. Input and output sentences can be printed on the printer 12.

The translation main unit (translation unit) 7 consists broadly of four processes. FIG. 2 shows the processing flow of the translation main unit. The morphological analysis unit looks up the input text in the dictionary. The syntactic analysis unit takes the information on the individual words, parses them according to the grammar rules, and builds a tree structure from the result. The transfer unit transforms the tree structure of the input language into a tree structure of the output language. The generation unit translates the resulting tree structure node by node. The present invention concerns the morphological analysis unit, the syntactic analysis unit, and the morpheme generation unit.

Here the input text is assumed to be English. The morphological analysis unit looks up the input text in the dictionary, and the lookup results are passed on to the syntactic analysis unit.

Examples of such special characters include the hyphen '-' and the slash '/'. Below, the processing of unknown compound words is explained concretely, using hyphenated words (words joined by hyphens) as the example.

First, the processing in the morphological analysis unit is explained.

(1) Registering in the temporary dictionary
Take the word light-meal as an example. It is looked up in the system dictionary. At this point it may also be looked up in its unhyphenated form, that is, as light meal.

If the lookup finds an entry, that information is used for the rest of the translation process. If not, the word is an unknown word and some kind of processing must be applied to it.

Ordinarily, a dictionary holds at least the following information for a given headword: 1) part of speech, 2) inflected forms, 3) priority of the part of speech, 4) translation, and 5) part of speech and conjugation form of the translation.

This information is normally recorded in the system dictionary for translation, but an unknown word has none of it. A temporary dictionary is therefore created at translation time and the information for the unknown word is written into it. An unknown hyphenated word is a noun in most cases, but adjective and verb readings may also be given at lower priority. The translation is left in the alphabet ('light-meal'). A noun needs no special ending; 「の」 is appended for an adjective and 「する」 for a verb. This information is stored in the temporary dictionary and is then looked up and held in memory, or it may be written out to a file. It is then passed to the next processing stage, syntactic analysis, where it is used.
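As a minimal sketch of the temporary dictionary described above, the following registers an unknown hyphenated word with noun, adjective, and verb readings in that priority order. The `Entry` record, its field names, and `register_unknown` are illustrative assumptions; the source does not specify a data layout.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    pos: str
    priority: int      # 1 = highest priority
    translation: str   # source spelling plus any ending


# Ending appended to the untranslated spelling for each part of speech,
# as described in the text: none for nouns, の for adjectives, する for verbs.
ENDINGS = {"noun": "", "adjective": "の", "verb": "する"}


def register_unknown(temp_dict, word):
    """Register `word` as an unknown word with noun > adjective > verb priority."""
    temp_dict[word] = [
        Entry(word, pos, rank, word + ENDINGS[pos])
        for rank, pos in enumerate(["noun", "adjective", "verb"], start=1)
    ]
    return temp_dict[word]
```

For example, `register_unknown({}, "light-meal")` yields three entries whose translations are `light-meal`, `light-mealの`, and `light-mealする`.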

(2) Looking up the individual components
When light-meal as a whole is not in the dictionary, 'light' and 'meal' are looked up separately, giving, for example, the following results.

light    軽い, 明るい            adjective
         光, ライト              noun
         火をつける, 火がつく    verb
meal     食事                    noun

This dictionary information is held in memory, or it may be written out to a file. It is then passed to the next processing stage, syntactic analysis, where it is used. It is also possible that an individual component is itself missing from the dictionary.

In that case, unknown word processing like that of (1) above is applied to that component. For light-meal, the first translation obtained is 「軽い食事」 ("light meal"), but the morpheme generation unit may carry over the hyphen of the source text and output 「軽い-食事」. This hyphen may be half-width or full-width.
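The generation step just described can be sketched as below: the first translation of each component is taken and the parts are joined, optionally with the source hyphen. `generate_compound`, its parameters, and the small `lexicon` are illustrative assumptions. Falling back to the original spelling for a missing component is also an assumption, modelled on the unknown-word handling in (1).

```python
def generate_compound(word, lexicon, keep_hyphen=False, hyphen="-"):
    """Join the first translation of each component of a hyphenated word."""
    parts = []
    for component in word.split("-"):
        # Assumed fallback: a component without an entry keeps its spelling.
        translations = lexicon.get(component, [component])
        parts.append(translations[0])
    separator = hyphen if keep_hyphen else ""
    return separator.join(parts)


# Toy lexicon holding the translations listed in the text.
lexicon = {"light": ["軽い", "明るい"], "meal": ["食事"]}
```

Passing `hyphen="－"` produces the full-width variant that the text also allows.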

(3) Holding both kinds of information
The results of both lookups, (1) and (2) above, are retained. This dictionary information is held in memory, or it may be written out to a file. It is then passed to the next processing stage, syntactic analysis, where it is used.

Next, the processing in the syntactic analysis unit is explained.

The processing in the syntactic analysis unit is explained on the premise of (3) above, that is, that both kinds of dictionary information are held.

(1) A case where analysis of the hyphenated word fails (I-think-I-can)
The company markets its services with a plucky, I-think-I-can spirit.
When the hyphenated word is analyzed and the result is used as it stands, the translation is as follows.

「会社はそのサービスを元気のいくで，私は私がアルコールをかん詰にすると思う市場へ出荷する。」 (roughly: "The company ships its services, pluckily, to a market where I think I put alcohol in cans.")
When the hyphenated word is treated as an unknown word, the translation is as follows.

「会社は、元気のいいI-think-I-can精神でそのサービスを市場へ出荷する。」 (roughly: "The company ships its services to market with a plucky I-think-I-can spirit.")
(2) A case where analysis of the hyphenated word succeeds (I-think-I-can-do-it)
The company markets its services with a plucky, I-think-I-can-do-it spirit.
「会社は、元気のいい私が私がそれをすることができると思う精神でそのサービスを市場へ出荷する。」 (roughly: "The company ships its services to market with a plucky spirit of thinking that I can do it.")
How the hyphenated word is actually analyzed is explained below using a few simple rules.

A. Part-of-speech classification codes
prn (nominative pronoun), pro (objective pronoun), aux (auxiliary verb), it1 (transitive verb taking a phrase as its object), it2 (transitive verb taking a that-clause as its object).

B. Grammar codes
SN (nominative noun phrase), ON (object noun phrase), VI (non-finite predicate), VC (finite predicate), OC (that-clause; that may be omitted), SG (sentence without the sentence-final punctuation mark), TL (title or noun phrase).

C. Grammar rules
1. SG -> SN VC
2. OC -> SG
3. VI -> it1 ON
4. VI -> it2 OC
5. VC -> aux VI
6. VC -> VI
7. SN -> prn
8. ON -> pro
The number at the start of each line is the rule number.

First, hyphenated word a below is analyzed starting from the end of the word. For simplicity, each word is assumed to have only its most typical part-of-speech classification. In this form, I-think-I-can is an unknown word that is not registered in the dictionary.

With the given rules and part-of-speech sequence, this hyphenated word does not reduce to a single grammar code, and the analysis fails.
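The eight grammar rules above are small enough to check mechanically. The sketch below is an illustration under several assumptions and is not the parser described in the source. It uses a generic chart (CYK-style) parser rather than the right-to-left procedure the text mentions. The codes `it1` and `it2` are a normalized reading of the garbled transitive-verb codes. The part-of-speech assignments in `POS` are assumed "most typical" tags. Success is taken to mean that the whole word reduces to SG.

```python
# Rules 1-8 from the text, in rule-number order, as (left side, right side).
RULES = [
    ("SG", ("SN", "VC")),
    ("OC", ("SG",)),
    ("VI", ("it1", "ON")),
    ("VI", ("it2", "OC")),
    ("VC", ("aux", "VI")),
    ("VC", ("VI",)),
    ("SN", ("prn",)),
    ("ON", ("pro",)),
]

# Assumed single part of speech per word (not given explicitly in the text).
POS = {"I": "prn", "think": "it2", "can": "aux", "do": "it1", "it": "pro"}


def close_unary(symbols):
    """Add every symbol reachable through the one-symbol rules (2, 6, 7, 8)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if len(rhs) == 1 and rhs[0] in symbols and lhs not in symbols:
                symbols.add(lhs)
                changed = True
    return symbols


def parse(hyphen_word):
    """Return the set of codes that cover the whole hyphenated word."""
    tags = [POS[w] for w in hyphen_word.split("-")]
    n = len(tags)
    chart = {(i, i + 1): close_unary({t}) for i, t in enumerate(tags)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):
                for lhs, rhs in RULES:
                    if (len(rhs) == 2 and rhs[0] in chart[(i, k)]
                            and rhs[1] in chart[(k, j)]):
                        cell.add(lhs)
            chart[(i, j)] = close_unary(cell)
    return chart[(0, n)]
```

Under these assumptions `parse("I-think-I-can")` returns an empty set, because `aux` has no following VI, while `parse("I-think-I-can-do-it")` contains `SG`. This matches the failure and success cases discussed in the text.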

If this result is used as it stands for the syntactic analysis of the whole sentence, the example sentence below yields a poor translation such as the following.

The company markets its services with a plucky, I-think-I-can spirit.
「会社はそのサービスを元気のいくで、私は私がアルコールをかん詰にすると思う市場へ出荷する。」
Analyzing the inside of the hyphenated word has served no purpose here, so when the analysis fails the word is treated as an unknown word. That is, I-think-I-can is treated as an unknown word. In this sentence it becomes a noun-modifying component attached to spirit. The following translation is then obtained.

「会社は、元気のいいI-think-I-can精神でそのサービスを市場へ出荷する。」
Next, for comparison, hyphenated word b below is analyzed starting from the end of the word. For simplicity, each word is again assumed to have only its most typical part-of-speech classification.

In this form, I-think-I-can-do-it is an unknown word that is not registered in the dictionary.

(Analysis table for I-think-I-can-do-it; not recoverable from the source.)
In this case the word reduces to SG. I-think-I-can-do-it is therefore translated as 「私が私がそれをすることができると思う」 ("I think I can do it") and attaches to spirit as a noun-modifying component.

The company markets its services with a plucky, I-think-I-can-do-it spirit.
「会社は，元気のいい私が私がそれをすることができると思う精神でそのサービスを市場へ出荷する。」
Here the syntactic analysis unit presumes the I-think-I-can-do-it part to be a noun, an adjective, or a verb, and analyzes it assuming priority in that order.

The start and end positions of the hyphenated word passed from syntactic analysis are also retained, so hyphens can be inserted when the Japanese output is generated.

FIG. 3 is a flowchart showing the processing in the morphological analysis unit for an unknown compound word formed with special characters such as hyphens.

First, the word is looked up in the dictionary, and it is determined whether it is in the dictionary. If it is not, the whole word is processed as an unknown word. Next, the individual components are looked up. Any component that is itself unknown is processed as an unknown word. The dictionary information is then stored in memory or in a file.
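The FIG. 3 steps can be sketched as a single function. The names are illustrative, and `unknown_word_processing` stands in for the FIG. 4 routine as a caller-supplied callback. Returning early when the whole word is found is an assumption, since the text does not say whether the components are still looked up in that case.

```python
def analyze_hyphen_word(word, system_dict, unknown_word_processing):
    """Collect dictionary information for a hyphenated word (FIG. 3 steps)."""
    info = {}
    if word in system_dict:
        # Whole word found: use its entry (assumed to end the procedure).
        info[word] = system_dict[word]
        return info
    # Whole word missing: process it as an unknown word.
    info[word] = unknown_word_processing(word)
    # Then look up each component, handling unknown components the same way.
    for part in word.split("-"):
        if part in system_dict:
            info[part] = system_dict[part]
        else:
            info[part] = unknown_word_processing(part)
    return info  # the caller may keep this in memory or write it to a file
```

The returned mapping holds both the whole-word entry and the component entries, which corresponds to case (3) in the description.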

FIG. 4 is a flowchart showing unknown word processing. First, a temporary dictionary is created. Next, the translation information is set, then the part-of-speech information, and then the priority information. The word is then looked up in the temporary dictionary.

FIG. 5 is a flowchart showing the processing in the syntactic analysis unit for an unknown compound word formed with special characters such as hyphens.

First, the analysis start point is set at the end of the sentence and syntactic analysis is performed. If the analysis position has reached the beginning of the sentence, the process ends; otherwise it advances one word. Next, it is determined whether the word is an unknown hyphenated word, and if so the hyphenated word is analyzed. If that analysis fails, the hyphenated word is treated as an unknown word.
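The FIG. 5 loop can be sketched as follows. It walks the sentence from its last word to its first and falls back to unknown-word treatment when the inner analysis of a hyphenated word fails. `scan_sentence` and the `parse_hyphen_word` callback are hypothetical names, and the rest of the sentence-level parsing is left out.

```python
def scan_sentence(words, system_dict, parse_hyphen_word):
    """Decide how each unknown hyphenated word in a sentence is handled."""
    decisions = {}
    for word in reversed(words):        # start at the end of the sentence
        if "-" in word and word not in system_dict:
            # Unknown hyphenated word: try to analyse its inner structure.
            if parse_hyphen_word(word):
                decisions[word] = "parsed"
            else:
                decisions[word] = "unknown word"   # fallback on failure
    return decisions
```

In a fuller version the `"parsed"` branch would hand the word on as a noun, adjective, or verb in that priority order, as stated in the description.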

As is clear from the above description, according to the present invention, unknown word processing (morphological analysis), syntactic analysis, and translation generation for compound words formed with special characters such as hyphens are performed more smoothly, and more understandable results can be obtained.

[Brief explanation of drawings]

FIG. 1 is a configuration diagram for explaining an embodiment of a translation device provided with the morphological analysis method, syntactic analysis method, and morpheme generation method according to the present invention. FIG. 2 is a flowchart showing the processing of the translation main unit of the translation device. FIG. 3 is a flowchart showing the processing in the morphological analysis unit for an unknown compound word formed with special characters such as hyphens. FIG. 4 is a flowchart showing unknown word processing. FIG. 5 is a flowchart showing the processing in the syntactic analysis unit for an unknown compound word formed with special characters such as hyphens.
1: CRT, 2: keyboard, 3: OCR, 4: input document, 5: spell check unit, 6: pre-editing unit, 7: translation main unit, 8: system dictionary, 9: grammar rules, 10: post-editing unit, 11: output document, 12: printer.

Claims

[Claims]
1. A morphological analysis method in a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, characterized in that, when the text to be analyzed by the morphological analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, the compound word is looked up in the dictionary provided in the system, and when it is not in the dictionary, the whole compound word is registered as an unknown word in a temporary translation dictionary used for translation processing such as syntactic analysis.
2. A morphological analysis method in a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, characterized in that, when the text to be analyzed by the morphological analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, the compound word is looked up in the dictionary provided in the system, and when it is not in the dictionary, the individual components of the compound word are looked up and their dictionary information can be used for translation processing such as syntactic analysis.
3. A syntactic analysis method in a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, characterized in that, when the text to be analyzed by the syntactic analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, and the compound word is not in the system's dictionary, the compound word is parsed as the sequence of its component words, and when the parse fails, the whole compound word is processed as an unknown word and syntactic analysis proceeds.
4. A syntactic analysis method in a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, characterized in that, when the text to be analyzed by the syntactic analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, and the compound word is not in the system's dictionary, the compound word is parsed as the sequence of its component words, and when the parse succeeds, the whole compound word is presumed to be a noun, an adjective, or a verb and syntactic analysis proceeds.
5. A syntactic analysis method in a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, characterized in that, when the text to be analyzed by the syntactic analysis unit contains a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word, and the compound word is not in the system's dictionary, the special characters such as hyphens are replaced with spaces and syntactic analysis proceeds with the compound word treated as a plain sequence of words with no special meaning.
6. A morpheme generation method in a natural language analysis system such as a machine translation system that has at least a system dictionary for translation, characterized in that, when a compound word formed with a special character, such as the hyphen "-", that joins words together into one compound word has been looked up in the dictionary by its individual components and parsed, the morpheme generation unit, when translating the individual components, inserts the special character between them in the generated output.