JP5245291B2 - Document analysis apparatus, document analysis method, and computer program

Info

Publication number: JP5245291B2
Application number: JP2007138379A
Authority: JP
Inventors: 智子大熊; 博増市
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2007-05-24
Filing date: 2007-05-24
Publication date: 2013-07-24
Anticipated expiration: 2027-05-24
Also published as: JP2008293295A

Description

The present invention relates to a document analysis apparatus, a document analysis method, and a computer program. More specifically, it relates to a document analysis apparatus, a document analysis method, and a computer program that take Japanese document data as input and correctly analyze, among other things, the meaning of each word.

Specifically, the present invention is applicable to, for example, an automatic translation system. It relates to a document analysis apparatus, a document analysis method, and a computer program that perform document analysis in which the meaning of the character strings making up a Japanese sentence is grasped accurately, so that the sentence can be translated correctly.

For example, an automatic translation system that translates Japanese into a foreign language must analyze the input Japanese data, correctly recognize the meaning of each character string (a word or the like), and translate it into the corresponding foreign language, such as English.

However, Japanese contains ambiguous expressions that can be interpreted in more than one way from the character string alone. Some Japanese nouns also function as ordinary nominals (体言), and as a result ambiguity arises in dependency relations. For example, formal nouns (形式名詞) are a group of nouns that can stand on their own by attaching to another word or phrase; examples include 「こと」 (koto), 「の」 (no), 「際」 (sai), 「みぎり」 (migiri), and 「ところ」 (tokoro).

As a concrete example, consider sentences that use 「ところ」 (tokoro).
(1) その建物は彼が調査したところに欠陥があった。 (The building had a defect in the place he investigated.)
(2) 彼が調査したところ、その建物には欠陥があった。 (When he investigated, the building turned out to have a defect.)
In (1), 「ところ」 appears as the antecedent (先行詞) of 「彼が調査した」 ("he investigated"). In (2), 「ところ」 attaches to 「彼が調査した」, bundles the phrase together, and appears as a formative (形成子) that forms an adverbial modifier equivalent to an adverb phrase. This dependency relation carries over directly into a difference in word sense: 「ところ」 in (1) means a place, whereas 「ところ」 in (2) means a situation. Therefore, when sentences such as these are interpreted in, for example, automatic translation, a correct translation cannot be produced unless this ambiguity is resolved.

Prior art disclosing processing for resolving ambiguity in Japanese word sense and syntax includes, for example, the following documents. Patent Document 1 (JP-A-5-298349) discloses a configuration for resolving dependency ambiguity. Statistical information on co-occurrence cases (the frequency of each co-occurrence relation, the frequency of each word, and the number of distinct co-occurring words for each word) and weights for co-occurrence cases (the strength of the co-occurrence relation, the degree of freedom of a noun as a dependent, and the degree of freedom of a predicate as a head) are stored in a co-occurrence dictionary; the weights of co-occurrence cases are calculated from these; and, in dependency analysis, the plausibility of each dependency candidate that matches a co-occurrence case is evaluated on the basis of those weights.

Patent Document 2 (JP-A-6-301716) discloses a likelihood-assignment technique for morphological analysis, in which an appropriate solution is selected from multiple solutions using likelihoods assigned according to a probabilistic model. By factoring into the likelihood the conditional probability of each solution given the input character string, it enables fine-grained likelihood assignment and reduces ambiguity.

Furthermore, Patent Document 3 (JP-A-2000-330987) discloses a configuration that takes a phrase structure analysis result as input, extracts from it the reliable dependency relations between pairs of nouns, and on that basis stores, for each noun, frequency values for its occurrences as modifier and as modified word. For the remaining dependency relations, the modifier and the modified word are estimated from these frequency values, which reduces ambiguity.

However, the ambiguity in dependency relations that arises with formal nouns such as 「こと」, 「の」, 「際」, 「みぎり」, and 「ところ」 described above cannot be resolved by occurrence frequencies, probabilistic models, or the like. This is because, in the two sentences given earlier, namely
(1) その建物は彼が調査したところに欠陥があった。
(2) 彼が調査したところ、その建物には欠陥があった。
the words that appear are exactly the same, so techniques such as the occurrence frequencies and probabilistic models disclosed in the prior art have difficulty ranking or narrowing down the interpretations.
Patent Document 1: JP-A-5-298349
Patent Document 2: JP-A-6-301716
Patent Document 3: JP-A-2000-330987
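The point that both example sentences contain the same words can be checked with a small sketch. The token lists below are segmented by hand for illustration (an assumption; a real system would obtain them from a morphological analyzer), and punctuation is ignored.

```python
from collections import Counter

# Hand-segmented tokens for the two example sentences (illustrative assumption).
SENTENCE_1 = ["その", "建物", "は", "彼", "が", "調査", "し", "た",
              "ところ", "に", "欠陥", "が", "あっ", "た", "。"]
SENTENCE_2 = ["彼", "が", "調査", "し", "た", "ところ", "、",
              "その", "建物", "に", "は", "欠陥", "が", "あっ", "た", "。"]
PUNCTUATION = {"、", "。"}


def word_counts(tokens):
    """Count each word, ignoring punctuation and word order."""
    return Counter(t for t in tokens if t not in PUNCTUATION)


# The sentences differ as sequences, but their word frequencies are identical,
# so a purely frequency-based score sees no difference between them.
same_bag = word_counts(SENTENCE_1) == word_counts(SENTENCE_2)
print(same_bag)
```

Because the word counts coincide, any score computed only from those counts assigns the two sentences the same value, which is the difficulty the passage describes.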

The present invention has been made in view of the problems described above. Its object is to provide a document analysis apparatus, a document analysis method, and a computer program that, for the group of nouns called formal nouns, such as 「こと」, 「の」, 「際」, 「みぎり」, and 「ところ」, whose dependency relations are ambiguous, apply predetermined classification rules and processing rules to reduce the ambiguity of syntactic-semantic analysis results and grasp the meaning accurately.

A first aspect of the present invention is a document analysis apparatus comprising:
a syntactic-semantic analysis unit that executes syntactic-semantic analysis of an input sentence;
a role determination unit that determines whether the role of a formal noun included in the syntactic-semantic analysis result of the input sentence is that of an antecedent or that of a formative; and
a determination result output unit that receives the determination result of the role determination unit and outputs a syntactic-semantic analysis result in accordance with the determination result.

Further, in one embodiment of the document analysis apparatus of the present invention, the role determination unit determines whether the role of the formal noun included in the syntactic-semantic analysis result is that of an antecedent or a formative on the basis of a formal noun dictionary that holds classification information classifying each formal noun according to whether it behaves more strongly as an independent word or as a function word.

Further, in one embodiment of the document analysis apparatus of the present invention, the role determination unit determines that a formal noun is a formative when the formal noun is registered in a Japanese-language dictionary in association with a sense definition containing any of the words 「場所」 (place), 「空間」 (space), 「部分」 (part), 「方向」 (direction), or 「方角」 (bearing), or when the formal noun by itself stands in a case relation with its dependency target in the syntactic-semantic analysis result, and determines that it is an antecedent otherwise.

Further, in one embodiment of the document analysis apparatus of the present invention, the document analysis apparatus further comprises a dictionary creation unit that creates a formal noun dictionary holding classification information classifying each formal noun according to whether it behaves more strongly as an independent word or as a function word. The dictionary creation unit classifies formal nouns on the basis of the part-of-speech information and the sense definitions registered for them in a Japanese-language dictionary, and creates the formal noun dictionary.

Further, in one embodiment of the document analysis apparatus of the present invention, the document analysis apparatus further comprises a translation processing unit that receives the determination result of the role determination unit and executes translation in accordance with the determination result.

Further, a second aspect of the present invention is a document analysis method for a document analysis apparatus, comprising:
a syntactic-semantic analysis step in which a syntactic-semantic analysis unit executes syntactic-semantic analysis of an input sentence;
a role determination step in which a role determination unit determines whether the role of a formal noun included in the syntactic-semantic analysis result of the input sentence is that of an antecedent or that of a formative; and
a determination result output step in which a determination result output unit receives the determination result of the role determination unit and outputs a syntactic-semantic analysis result in accordance with the determination result.

Further, in one embodiment of the document analysis method of the present invention, the role determination step determines whether the role of the formal noun included in the syntactic-semantic analysis result is that of an antecedent or a formative on the basis of a formal noun dictionary that holds classification information classifying each formal noun according to whether it behaves more strongly as an independent word or as a function word.

Further, in one embodiment of the document analysis method of the present invention, the role determination step determines that a formal noun is a formative when the formal noun is registered in a Japanese-language dictionary in association with a sense definition containing any of the words 「場所」 (place), 「空間」 (space), 「部分」 (part), 「方向」 (direction), or 「方角」 (bearing), or when the formal noun by itself stands in a case relation with its dependency target in the syntactic-semantic analysis result, and determines that it is an antecedent otherwise.

Further, in one embodiment of the document analysis method of the present invention, the document analysis method further comprises a dictionary creation step in which a dictionary creation unit creates a formal noun dictionary holding classification information classifying each formal noun according to whether it behaves more strongly as an independent word or as a function word. The dictionary creation step classifies formal nouns on the basis of the part-of-speech information and the sense definitions registered for them in a Japanese-language dictionary, and creates the formal noun dictionary.

Further, in one embodiment of the document analysis method of the present invention, the document analysis method further comprises a translation processing step in which a translation processing unit receives the determination result of the role determination unit and executes translation in accordance with the determination result.

Further, a third aspect of the present invention is a computer program that causes a document analysis apparatus to execute document analysis processing, comprising:
a syntactic-semantic analysis step of causing a syntactic-semantic analysis unit to execute syntactic-semantic analysis of an input sentence;
a role determination step of causing a role determination unit to determine whether the role of a formal noun included in the syntactic-semantic analysis result of the input sentence is that of an antecedent or that of a formative; and
a determination result output step of causing a determination result output unit to receive the determination result of the role determination unit and output a syntactic-semantic analysis result in accordance with the determination result.

The computer program of the present invention can be provided, for example, by a storage medium that supplies it in computer-readable form to a general-purpose computer system capable of executing various program codes. Providing the program in computer-readable form allows processing corresponding to the program to be carried out on the computer system.

Further objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments described later and the accompanying drawings. In this specification, a system is a logical collection of multiple devices, and the constituent devices are not necessarily housed in the same enclosure.

According to the configuration of the present invention, syntactic-semantic analysis of an input sentence is executed, it is determined whether the role of a formal noun included in the analysis result is that of an antecedent or a formative, and a syntactic-semantic analysis result in accordance with the determination is output. The determination is made on the basis of a formal noun dictionary that holds classification information classifying each formal noun according to whether it behaves more strongly as an independent word or as a function word. Specifically, a formal noun is determined to be a formative when it is registered in a Japanese-language dictionary in association with a sense definition containing any of the words 「場所」 (place), 「空間」 (space), 「部分」 (part), 「方向」 (direction), or 「方角」 (bearing), or when it by itself stands in a case relation with its dependency target in the syntactic-semantic analysis result; otherwise it is determined to be an antecedent. On the basis of this determination, a more accurate syntactic-semantic analysis result, or a more accurate translation based on that result, can be obtained.
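The decision rule stated in this paragraph can be written as a small function. This is only a sketch of the rule as worded here: the function name, the boolean flag, and the second sense definition used in the example are hypothetical, and the patent does not specify how the case-relation condition is computed.

```python
# Words looked for in the sense definition, as listed in the text.
PLACE_WORDS = ("場所", "空間", "部分", "方向", "方角")


def determine_role(sense_definition: str, alone_in_case_relation: bool) -> str:
    """Return "formative" or "antecedent" following the rule as stated above.

    sense_definition: dictionary gloss registered for the formal noun.
    alone_in_case_relation: True when the formal noun by itself stands in a
        case relation with its dependency target (assumed to be supplied by
        the syntactic-semantic analysis; how it is derived is not shown here).
    """
    has_place_word = any(word in sense_definition for word in PLACE_WORDS)
    if has_place_word or alone_in_case_relation:
        return "formative"
    return "antecedent"


# Gloss of 「場合」 quoted later in the description: no listed word appears in it.
print(determine_role("物事が行われているときの、事情や状況。", False))
# Hypothetical gloss containing 「場所」, used only to exercise the other branch.
print(determine_role("ものが存在する場所。", False))
```

Keeping the two conditions as separate inputs makes it easy to see that either one alone is enough to trigger the first branch.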

The document analysis apparatus, document analysis method, and computer program according to embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows the configuration of a document analysis apparatus according to one embodiment of the present invention. As shown in FIG. 1, the document analysis apparatus 100 has a sentence input unit 101, a syntactic-semantic analysis unit 102, a dictionary creation unit 103, a Japanese-language dictionary storage unit 104, a formal noun dictionary storage unit 105, a role determination unit 106, and a determination result output unit 107. The document analysis apparatus 100 outputs a correct syntactic-semantic analysis result even when the input sentence contains an ambiguous expression that could be interpreted in several different ways. The details of each unit and the processing it executes are described below.
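As a rough sketch of how the units in FIG. 1 might be wired together, the following passes a sentence through an analysis function, a role determination function, and an output step. All names, the dictionary-based candidate format, and the idea that the output step filters candidates by role are assumptions made for illustration; the text only states that the output follows the determination result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Analysis = Dict[str, str]  # stand-in for one candidate analysis result


@dataclass
class DocumentAnalyzer:
    parse: Callable[[str], List[Analysis]]           # role of unit 102
    determine_role: Callable[[List[Analysis]], str]  # role of unit 106

    def analyze(self, sentence: str) -> List[Analysis]:
        candidates = self.parse(sentence)
        role = self.determine_role(candidates)
        # Role of unit 107 (assumed behavior): keep the candidates that
        # agree with the determined role of the formal noun.
        return [c for c in candidates if c["role"] == role]


def stub_parse(sentence: str) -> List[Analysis]:
    """Stub parser returning two competing readings of the formal noun."""
    return [{"formal_noun": "ところ", "role": "antecedent"},
            {"formal_noun": "ところ", "role": "formative"}]


def stub_determine_role(candidates: List[Analysis]) -> str:
    """Stub decision; a real unit would consult the formal noun dictionary."""
    return "antecedent"


analyzer = DocumentAnalyzer(parse=stub_parse, determine_role=stub_determine_role)
result = analyzer.analyze("その建物は彼が調査したところに欠陥があった。")
print(result)
```

The stubs stand in for the real units only so that the data flow can be run end to end.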

[Sentence input unit]
The sentence input unit 101 receives a sentence to be subjected to syntactic-semantic analysis. The input can take various forms, such as a sentence recorded in a database or a sentence entered by a user. In the following, a concrete processing example is described on the assumption that the sentence below has been input.
(Input sentence) 「その建物は彼が調査したところに欠陥があった。」

The sentence above contains an ambiguous expression that can be interpreted in more than one way from the character string alone. That is, 「ところ」 is a formal noun that can stand on its own by attaching to another word or phrase, and it may mean either a place or a situation.

[Syntactic-semantic analysis unit]
Next, the syntactic-semantic analysis unit 102 performs syntactic-semantic analysis of the input sentence. The analysis is described below. Natural language written in Japanese, English, or any other language is inherently abstract and highly ambiguous, but it can be processed by computer when sentences are handled mathematically. As a result, automated processing makes possible a variety of natural language applications and services, such as machine translation, dialogue systems, search systems, and document analysis apparatuses. Such natural language processing is generally divided into the phases of morphological analysis, syntactic analysis, semantic analysis, and context analysis.

In morphological analysis, a sentence is segmented into morphemes, the smallest meaningful units, and a part of speech is assigned to each. In syntactic analysis, the structure of the sentence, such as its phrase structure, is analyzed on the basis of grammar rules and the like. Because grammar rules are tree-structured, the result of syntactic analysis is generally a tree in which individual morphemes are joined on the basis of dependency relations and the like. In semantic analysis, a semantic structure expressing the meaning the sentence conveys is obtained and composed on the basis of the senses (concepts) of the words in the sentence and the semantic relations between words. In context analysis, a text (discourse), which is a sequence of sentences, is treated as the basic unit of analysis, and a discourse structure is built by identifying semantic groupings across sentences.

Syntactic analysis and semantic analysis are regarded as indispensable techniques in the field of natural language processing for realizing applications such as dialogue systems, machine translation, proofreading support, and document summarization.

Syntactic analysis takes a natural language sentence and determines the dependency relations between words (bunsetsu phrases) on the basis of grammar rules. The result can be expressed as a tree structure called a dependency structure (dependency tree). Semantic analysis can determine the case relations in the sentence on the basis of the dependency relations between words (bunsetsu phrases). A case relation here means the grammatical role, such as subject (SUBJ) or object (OBJ), played by each element of the sentence. Semantic analysis may also include determining the tense, modality, mode of speech, and so on of the sentence.

One example of a syntactic-semantic analysis system is the LFG system described in detail in the non-patent document: Masuichi and Okuma, "Construction of a practical Japanese analysis system based on Lexical Functional Grammar", Natural Language Processing (自然言語処理), Vol. 10, No. 2, pp. 79-109 (2003). The syntactic-semantic analysis unit 102 of the document analysis apparatus 100 can use, for example, a natural language processing system based on this LFG.

FIG. 2 shows the configuration of a syntactic-semantic analysis system 300 that executes natural language processing based on Lexical Functional Grammar (LFG). The morphological analysis unit 302 has morphological rules 302A and a morpheme dictionary 302B for a specific language such as Japanese; it segments the input sentence into morphemes, the smallest meaningful units, and assigns parts of speech. For example, when the sentence 「私の娘は英語を話します。」 ("My daughter speaks English.") is input, the morphological analysis result 「私{Noun}の{up}娘{Noun}は{up}英語{Noun}を{up}話す{Verb1}{tr}ます{jp}。{pt}」 is output.
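The tagged output format in this example can be reproduced from a list of (surface form, tags) pairs. The sketch below is not a morphological analyzer: the segmentation and tags are copied from the example in the text, the braces are written in ASCII, and the function name is hypothetical.

```python
# Segmentation and tags taken from the example output in the text.
TAGGED_TOKENS = [
    ("私", ["Noun"]), ("の", ["up"]), ("娘", ["Noun"]), ("は", ["up"]),
    ("英語", ["Noun"]), ("を", ["up"]), ("話す", ["Verb1", "tr"]),
    ("ます", ["jp"]), ("。", ["pt"]),
]


def render_tagged(tokens):
    """Join each surface form with its tags, each tag wrapped in braces."""
    parts = []
    for surface, tags in tokens:
        parts.append(surface + "".join("{" + tag + "}" for tag in tags))
    return "".join(parts)


rendered = render_tagged(TAGGED_TOKENS)
print(rendered)
```

Note that one morpheme may carry more than one tag, as 「話す」 does in the example.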

This morphological analysis result is then input to the syntactic/semantic analysis unit 303. The unit 303 has grammar rules 303A and dictionaries such as a valence dictionary 303B. It analyzes phrase structure on the basis of the grammar rules and the like, and analyzes the semantic structure that expresses the meaning the sentence conveys, on the basis of the senses of the words in the sentence and the semantic relations between words. (The valence dictionary describes the relations between a verb and the other constituents of the sentence, such as the subject, and makes it possible to extract the semantic relation between a predicate and the words that depend on it.) The unit outputs two structures: the "c-structure (constituent structure)", which is the syntactic analysis result and represents the phrase structure of the sentence, made up of words, morphemes, and so on, as a tree; and the "f-structure (functional structure)", which is the result of analyzing the input sentence semantically and functionally, for example as interrogative, past tense, or polite, on the basis of case structure such as subject and object.

That is, the c-structure represents the structure of a natural language sentence as a tree by grouping the morphemes of the sentence into higher-level phrases, and the f-structure represents semantic information such as the case structure, tense, modality, and mode of speech of the sentence as an attribute-value matrix, based on the concept of grammatical functions.

For example, in this example, the sentence input through the sentence input unit 101 is
(Input sentence) 「その建物は彼が調査したところに欠陥があった。」
FIGS. 3 and 4 show f-structures obtained as the syntactic-semantic analysis result of this input sentence. An f-structure expresses grammatical functions explicitly and consists of grammatical function names, semantic forms, and feature symbols. By referring to the f-structure, one can obtain a semantic understanding in terms of subject, object, complement, and adjunct. An f-structure is a set of features attached to the nodes of the c-structure, which is shown as a tree, and is expressed as an attribute-value matrix as shown in FIGS. 3 and 4. That is, within each pair of brackets [ ], the left-hand side is the name of a feature (attribute) and the right-hand side is its value (attribute value).

(Input sentence) 「その建物は彼が調査したところに欠陥があった。」
The f-structure obtained as the syntactic-semantic analysis result for this input sentence has the form shown in FIG. 3 or FIG. 4.

FIG. 3 and FIG. 4 are both f-structures obtained as syntactic-semantic analysis results for the same input sentence,
(Input sentence) 「その建物は彼が調査したところに欠陥があった。」
but they show the results of analyzing the formal noun 「ところ」 differently.
In the analysis result of FIG. 3, 「ところ」 is an antecedent, so it is interpreted as the object of the verb 「調査する」 ("investigate") in the embedded clause.
In the analysis result of FIG. 4, 「ところ」 is a formative, so the object of the verb 「調査する」 in the embedded clause is treated as omitted from the sentence.
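The difference between the two readings can be pictured with attribute-value structures written as nested dictionaries. The figures themselves are not reproduced here, so the structures below are illustrative only: they cover just the embedded clause, the attribute names PRED, SUBJ, and OBJ follow common LFG usage, and the placeholder "pro" for an omitted object is an assumption.

```python
# Embedded clause 「彼が調査した」 under the antecedent reading (cf. FIG. 3):
# 「ところ」 fills the object slot of 「調査する」.
reading_antecedent = {
    "PRED": "調査する",
    "SUBJ": {"PRED": "彼"},
    "OBJ": {"PRED": "ところ"},
}

# Same clause under the formative reading (cf. FIG. 4):
# the object of 「調査する」 is left unexpressed.
reading_formative = {
    "PRED": "調査する",
    "SUBJ": {"PRED": "彼"},
    "OBJ": {"PRED": "pro"},  # placeholder for the omitted object (assumed)
}


def embedded_object(f_structure):
    """Return the predicate filling the OBJ attribute of the clause."""
    return f_structure["OBJ"]["PRED"]


print(embedded_object(reading_antecedent))
print(embedded_object(reading_formative))
```

Both structures share the same predicate and subject; only the value of the object attribute distinguishes the two analyses.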

[Dictionary creation unit]
The dictionary creation unit 103 classifies the nouns called formal nouns and stores them in the formal noun dictionary storage unit 105. Formal nouns are generally said to behave as something in between function words, such as particles and auxiliary verbs, which play a grammatical role, and independent words, which serve to refer to something. The dictionary creation unit 103 of the document analysis apparatus 100 classifies each formal noun according to whether it behaves more strongly as an independent word or as a function word.

Here, the set of formal nouns is assumed to be the following words, listed on p. 104 of the non-patent document: Mizutani, "稿本国文法体系", Tokyo Woman's Christian University, Department of Japanese Literature (1991).
「場合」 (baai), 「時」 (toki), 「挙げ句」 (ageku), 「際」 (sai), 「みぎり」 (migiri), 「限り」 (kagiri), 「あいだ」 (aida), 「ところ」 (tokoro), 「うち」 (uchi), 「うえ」 (ue)
However, words not included in this list can also be added to the formal noun dictionary using the present technique.

Taking 「場合」 (baai) as an example, the word classification procedure is described with reference to the flowchart in FIG. 5 and the Japanese-language dictionary shown in FIG. 6. First, in step S101, the Japanese-language dictionary storage unit 104, which stores a Japanese-language dictionary such as that shown in FIG. 6, is queried for the target word, here 「場合」. As shown in FIG. 6, the dictionary lists various words as headwords and records, for each headword, part-of-speech information, sense definitions, usage examples, and so on. The dictionary creation unit 103 obtains the part-of-speech information and the sense definition from the dictionary. In step S101, for example, the part-of-speech information and the sense definition for 「場合」 are obtained.

Next, in step S102, it is checked whether the part-of-speech field contains "demonstrative" (指示詞) or "pronoun" (代名詞). The only part of speech of 「場合」 is "noun", so the condition of step S102 is not satisfied and the process proceeds to step S103.

In step S103, it is checked whether the definition text contains a word such as 「場所」 (place), 「空間」 (space), 「部分」 (part), 「方向」 (direction), or 「方角」 (compass direction). The definition text of 「場合」 is 「物事が行われているときの、事情や状況。」 ("the circumstances or situation when something is taking place"), which contains none of these words. The determination in step S103 is therefore No, and 「場合」 is classified into group A.

Next, the word classification procedure is described using 「あいだ」 (aida) as an example, with reference to the flowchart shown in FIG. 5 and the Japanese dictionary shown in FIG. 6. First, in step S101, the Japanese dictionary storage unit 104, which stores the Japanese dictionary shown in FIG. 6, is queried for the target word 「あいだ」. In step S101, the dictionary creation unit 103 acquires the part-of-speech information and definition text for 「あいだ」.

Next, in step S102, it is checked whether the part-of-speech field contains "demonstrative" or "pronoun". 「あいだ」 is neither a demonstrative nor a pronoun, so the condition is not satisfied and the process proceeds to step S103. In step S103, it is checked whether the definition text contains a word such as 「場所」 (place), 「空間」 (space), 「部分」 (part), 「方向」 (direction), or 「方角」 (compass direction).

In step S103, the definition text of 「あいだ」 is examined. The definition text of 「あいだ」 is 「二つのものにはさまれた、あいている部分」 ("the open part lying between two things"), which contains the word 「部分」 (part). The determination in step S103 is therefore Yes, and 「あいだ」 is classified into group B.

Next, the word classification procedure is described using 「ほか」 (hoka, "other") as an example, with reference to the flowchart shown in FIG. 5 and the Japanese dictionary shown in FIG. 6. First, in step S101, the Japanese dictionary storage unit 104, which stores the Japanese dictionary shown in FIG. 6, is queried for the target word 「ほか」. In step S101, the dictionary creation unit 103 acquires the part-of-speech information and definition text for 「ほか」.

In step S102, it is checked whether the part-of-speech field contains "demonstrative" or "pronoun". The parts of speech of 「ほか」 are "noun" and "pronoun", so the condition of step S102 is satisfied, the determination is Yes, and 「ほか」 is classified into group C. All words are classified by this process, yielding the formal noun dictionary shown in FIG. 7. As shown in FIG. 7, the formal noun dictionary classifies formal nouns into three categories (groups A to C). The classification criterion is whether a formal noun behaves more strongly as an independent word or as a function word; specifically, each formal noun is classified, in accordance with the flow shown in FIG. 5, on the basis of the part-of-speech information and definition text registered for it in the Japanese dictionary.

In this way, the dictionary creation unit 103 classifies the nouns known as formal nouns, which sit midway between function words such as particles and auxiliary verbs, which play a grammatical role, and independent words, which refer to something, into three categories (groups A to C), generates the formal noun dictionary, and stores it in the formal noun dictionary storage unit 105. The dictionary creation unit 103 of the document analysis apparatus 100 of the present invention classifies each formal noun according to whether it behaves more strongly as an independent word or as a function word.
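The classification flow of FIG. 5 (steps S101 to S103) can be sketched roughly as follows. This is only an illustration under stated assumptions: the entry layout, the function names, and the toy dictionary are hypothetical, the part of speech given for 「あいだ」 is assumed (the text says only that it is neither a demonstrative nor a pronoun), and the definition text of 「ほか」 is left empty because the text does not quote it.

```python
# Minimal sketch of the FIG. 5 classification flow (steps S101 to S103).
# The entry layout and all identifiers here are hypothetical.

PRONOUN_LIKE_POS = ("指示詞", "代名詞")  # demonstrative, pronoun (step S102)
SPATIAL_WORDS = ("場所", "空間", "部分", "方向", "方角")  # checked in step S103

# Toy stand-in for the Japanese dictionary of FIG. 6:
# headword -> (parts of speech, definition text).
JAPANESE_DICTIONARY = {
    "場合": (["名詞"], "物事が行われているときの、事情や状況。"),
    # Assumption: the part of speech of 「あいだ」 is taken to be "noun".
    "あいだ": (["名詞"], "二つのものにはさまれた、あいている部分"),
    # The definition text of 「ほか」 is not quoted in the text, so it is empty here.
    "ほか": (["名詞", "代名詞"], ""),
}


def classify_formal_noun(headword, dictionary=JAPANESE_DICTIONARY):
    """Return "A", "B" or "C" for a headword, following steps S101 to S103."""
    parts_of_speech, definition = dictionary[headword]  # S101: look up the entry
    if any(pos in PRONOUN_LIKE_POS for pos in parts_of_speech):  # S102
        return "C"
    if any(word in definition for word in SPATIAL_WORDS):  # S103
        return "B"
    return "A"


def build_formal_noun_dictionary(headwords, dictionary=JAPANESE_DICTIONARY):
    """Classify every headword and return a headword -> group mapping (cf. FIG. 7)."""
    return {h: classify_formal_noun(h, dictionary) for h in headwords}


formal_noun_dictionary = build_formal_noun_dictionary(["場合", "あいだ", "ほか"])
print(formal_noun_dictionary)
```

With the three words walked through above, the mapping comes out as group A for 「場合」, group B for 「あいだ」, and group C for 「ほか」, matching the text.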

[Role determination unit]
Next, the role determination unit 106 receives the syntactic-semantic analysis results (FIGS. 3 and 4) that the syntactic-semantic analysis unit 102 generated from the input sentence, and determines whether the formal noun 「ところ」 (tokoro) included in those results is an antecedent (先行詞) or a formative (形成子).

The processing sequence of the role determination unit 106 is described with reference to the flowchart of FIG. 8. First, in step S201, the role determination unit 106 refers to the formal noun dictionary held in the formal noun dictionary storage unit 105 (see FIG. 7) and checks whether the noun of interest belongs to group A. As shown in the formal noun dictionary of FIG. 7, the formal noun 「ところ」 belongs to group B.

The determination in step S201 is therefore No, and the process proceeds to step S202. In step S202, it is checked whether the formal noun of interest stands in a case relation with the verb. In the f-structures shown in FIGS. 3 and 4, which are the syntactic-semantic analysis results, 「ところ」 is the OBL (obligatory case) of the verb 「ある」 (aru, "to be"), so the condition of step S202 is met. The determination in step S202 is therefore Yes, 「ところ」 is judged to be an antecedent, the analysis result of FIG. 4 is discarded, and the analysis result of FIG. 3 is finally adopted.

In other words, for
(Input sentence) 「その建物は彼が調査したところに欠陥があった。」 ("The building had a defect in the place he investigated.")
the f-structures of FIGS. 3 and 4 are both syntactic-semantic analysis results for this same input sentence. As described earlier, the analysis result of FIG. 3 treats 「ところ」 as an antecedent and interprets it as the object of the verb 「調査する」 (chōsa suru, "to investigate") in the embedded clause, whereas FIG. 4 treats 「ところ」 as a formative and interprets the object of 「調査する」 in the embedded clause as omitted from the sentence. In the role determination sequence of FIG. 8 executed by the role determination unit 106, the role of 「ところ」 is determined to be that of an antecedent, so the analysis result of FIG. 4 is discarded and the analysis result of FIG. 3 is finally adopted.

Note that the role determination unit 106 determines, in accordance with the flow shown in FIG. 8, whether a formal noun included in the syntactic-semantic analysis result is an antecedent or a formative. As can be understood from the flows shown in FIG. 8 and FIG. 5, for a formal noun included in the syntactic-semantic analysis result, the role determination unit 106 judges the formal noun to be a formative when a definition text containing any of the words place, space, part, direction, or compass direction is registered for it in the Japanese dictionary, or when the formal noun is not registered as such and does not, on its own, stand in a case relation with the word it depends on; in all other cases it judges the formal noun to be an antecedent.
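The two checks of FIG. 8, as they are walked through for 「ところ」 above, can be sketched as a small function. This is a hedged illustration rather than the apparatus itself: the function name and argument names are hypothetical, and the outcome for a group A noun is an assumption, since this section does not walk through the Yes branch of step S201.

```python
def determine_role(group, has_case_relation):
    """Return "antecedent" or "formative" for a formal noun (sketch of FIG. 8).

    group: the noun's group ("A", "B" or "C") in the formal noun dictionary.
    has_case_relation: True if the noun stands in a case relation
        (for example OBL) with the verb it depends on.
    """
    if group == "A":  # step S201
        # ASSUMPTION: the text does not walk through this branch; a group A
        # noun is treated here as a formative purely for illustration.
        return "formative"
    if has_case_relation:  # step S202
        return "antecedent"
    return "formative"


# 「ところ」 is in group B and is the OBL of 「ある」 in the input sentence above.
print(determine_role("B", True))
```

For the input sentence above this yields the antecedent reading, which corresponds to keeping the analysis of FIG. 3.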

(Processing example 2)
Next, the processing performed when the following sentence is input at the sentence input unit 101 is described.
(Input sentence) 「彼が調査したところ、その建物には欠陥があった。」 ("When he investigated, the building turned out to have a defect.")
As in the preceding processing example, this input sentence contains 「ところ」, a formal noun that can stand on its own by attaching to another word or phrase; 「ところ」 may denote either a place or a situation.

[Syntactic-semantic analysis unit]
The syntactic-semantic analysis unit 102 performs syntactic-semantic analysis of the input sentence. The analysis is executed by applying the syntactic-semantic analysis system 300, described earlier with reference to FIG. 2, which performs natural language processing based on Lexical Functional Grammar (LFG).

For
(Input sentence) 「彼が調査したところ、その建物には欠陥があった。」
the f-structure obtained as the syntactic-semantic analysis result takes the form shown in FIG. 9 or FIG. 10.
In the analysis result of FIG. 9, 「ところ」 is an antecedent and is therefore interpreted as the object of the verb 「調査する」 in the embedded clause.
In the analysis result of FIG. 10, 「ところ」 is a formative, and the object of the verb 「調査する」 in the embedded clause is therefore treated as omitted from the sentence.

[Role determination unit]
Next, the role determination unit 106 receives the syntactic-semantic analysis results (FIGS. 9 and 10) that the syntactic-semantic analysis unit 102 generated from the input sentence, and determines whether the formal noun 「ところ」 included in those results is an antecedent or a formative.

As in the first processing example, the formal noun 「ところ」 belongs to group B in the formal noun dictionary (see FIG. 7). The determination in step S201 is therefore No, and the process proceeds to step S202. In step S202, it is checked whether the formal noun of interest stands in a case relation with the verb. In the f-structures shown in FIGS. 9 and 10, which are the syntactic-semantic analysis results, 「ところ」 is an ADJUNCT (an adverbial modifier) of the verb 「ある」 (aru, "to be"), so the condition of step S202 is not met.

The determination in step S202 is therefore No, 「ところ」 is judged to be a formative, the analysis result of FIG. 9 is discarded, and the analysis result of FIG. 10 is finally adopted.

In other words, for
(Input sentence) 「彼が調査したところ、その建物には欠陥があった。」 ("When he investigated, the building turned out to have a defect.")
the f-structures of FIGS. 9 and 10 are both syntactic-semantic analysis results for this same input sentence. As described earlier, the analysis result of FIG. 9 treats 「ところ」 as an antecedent and interprets it as the object of the verb 「調査する」 in the embedded clause, whereas FIG. 10 treats 「ところ」 as a formative and interprets the object of 「調査する」 in the embedded clause as omitted from the sentence. In the role determination sequence of FIG. 8 executed by the role determination unit 106, the role of 「ところ」 is determined to be that of a formative, so the analysis result of FIG. 9 is discarded and the analysis result of FIG. 10 is finally adopted.
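The selection between candidate analyses in the two processing examples can be sketched as follows. The nested dictionaries are hypothetical, heavily simplified stand-ins for f-structures and do not reproduce the figures; the `READING` tag, the function names, and the `candidates_for` helper are invented for illustration. The sketch also assumes step S201 has already answered No (the noun is not in group A), and it treats only OBL as a case relation because that is the only label the text names as one.

```python
# Hypothetical, simplified stand-ins for f-structures; not reproductions of the figures.
CASE_FUNCTIONS = {"OBL"}  # the text treats OBL as a case relation and ADJUNCT as not one


def attachment_function(f_structure, noun):
    """Return the label (e.g. "OBL", "ADJUNCT") under which `noun` attaches to the main verb."""
    for label, value in f_structure.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict) and item.get("PRED") == noun:
                return label
    return None


def select_analysis(candidates, noun):
    """Keep the candidate whose reading of `noun` matches the outcome of step S202.

    Assumes step S201 has already answered No for `noun` (it is not in group A).
    """
    label = attachment_function(candidates[0], noun)
    role = "antecedent" if label in CASE_FUNCTIONS else "formative"
    return next(c for c in candidates if c["READING"] == role)


def candidates_for(label):
    """Build two toy candidates that differ only in the hypothetical READING tag."""
    tokoro = {"PRED": "ところ"}
    attachment = [tokoro] if label == "ADJUNCT" else tokoro
    return [
        {"PRED": "ある", "SUBJ": {"PRED": "欠陥"}, label: attachment, "READING": reading}
        for reading in ("antecedent", "formative")
    ]


example_1 = candidates_for("OBL")      # 「その建物は彼が調査したところに欠陥があった。」
example_2 = candidates_for("ADJUNCT")  # 「彼が調査したところ、その建物には欠陥があった。」
print(select_analysis(example_1, "ところ")["READING"])  # antecedent
print(select_analysis(example_2, "ところ")["READING"])  # formative
```

The same noun thus receives different readings depending only on how it attaches to the main verb, which is the point the two processing examples illustrate.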

[Example of a document analysis apparatus that executes translation processing]
An automatic translation system that translates Japanese into a foreign language, for example, must analyze the input Japanese data, correctly recognize the meaning of each character string (word, etc.), and translate it into the corresponding foreign language, such as English. As described earlier, however, Japanese contains ambiguous expressions that can be interpreted in more than one way from the character string alone. Examples are the formal nouns mentioned above, such as 「こと」 (koto), 「の」 (no), 「際」 (sai), 「みぎり」 (migiri), and 「ところ」 (tokoro).

By performing the analysis processing of the document analysis apparatus described with reference to FIGS. 1 to 10, it is possible to determine whether a formal noun is a formative or an antecedent, and translating on the basis of this analysis result enables correct translation. The configuration and processing of a document analysis apparatus that executes translation processing based on the analysis result obtained by the document analysis processing described above are explained below.

FIG. 11 shows the configuration of a document analysis apparatus that executes translation processing. As shown in FIG. 11, the document analysis apparatus 400 of this embodiment includes a sentence input unit 401, a syntactic-semantic analysis unit 402, a role determination unit 403, a formal noun dictionary storage unit 404, a translation processing unit 405, a bilingual dictionary storage unit 406, and a translation result output unit 407. Even when it receives a sentence containing an ambiguous expression that could be interpreted in several different ways, the document analysis apparatus 400 of this embodiment outputs a correct translation result based on the correct syntactic-semantic analysis result.

In the configuration shown in FIG. 11, the sentence input unit 401, the syntactic-semantic analysis unit 402, the role determination unit 403, and the formal noun dictionary storage unit 404 are configured in the same way as the corresponding units shown in FIG. 1 and execute the same processing.

That is, the sentence input unit 401 receives the sentence to be analyzed and translated, and the syntactic-semantic analysis unit 402 performs syntactic-semantic analysis of the input sentence. The analysis is executed by applying the syntactic-semantic analysis system 300, described earlier with reference to FIG. 2, which performs natural language processing based on Lexical Functional Grammar (LFG). The role determination unit 403 receives the syntactic-semantic analysis result that the syntactic-semantic analysis unit 402 generated from the input sentence and determines whether a formal noun included in that result is an antecedent or a formative. The following describes the processing for the case where the input sentence is
(Input sentence) 「その建物は彼が調査したところに欠陥があった。」 ("The building had a defect in the place he investigated.")
and covers the configuration and processing of the translation processing unit 405 and the subsequent units.

[Translation processing unit]
The translation processing unit 405 receives the input sentence
(Input sentence) 「その建物は彼が調査したところに欠陥があった。」
together with, from the role determination unit 403, the result of determining whether the formal noun 「ところ」 included in the syntactic-semantic analysis results (FIGS. 3 and 4) that the syntactic-semantic analysis unit 402 generated from the input sentence is an antecedent or a formative. That is, as described earlier, it receives the determination result that the formal noun 「ところ」 in this input sentence is an antecedent.

The translation processing unit 405 refers to the bilingual dictionary stored in the bilingual dictionary storage unit 406 and generates a translation of the input sentence. FIG. 12 shows an example of the bilingual dictionary. As shown in FIG. 12, the bilingual dictionary registers, for each noun, the following translations (English translations in this example):
(a) the translation used when the noun is an antecedent;
(b) the translation used when the noun is a formative.

According to the dictionary, the following translations are registered:
when 「ところ」 is an antecedent, the translation is "place";
when 「ところ」 is a formative, the translation is "when".

The translation processing unit 405 has received from the role determination unit 403 the determination result that the formal noun 「ところ」 in
(Input sentence) 「その建物は彼が調査したところに欠陥があった。」
is an antecedent. In accordance with this determination result, it treats 「ところ」 as an antecedent and obtains "place" from the bilingual dictionary as the translation of 「ところ」. The translation result produced in this way is output via the translation result output unit 407.
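The role-dependent lookup in the bilingual dictionary can be sketched as a two-level mapping. The dictionary layout and the function name below are hypothetical; only the two translations of 「ところ」 are taken from the text.

```python
# Hypothetical layout for the bilingual dictionary of FIG. 12:
# noun -> {role -> translation}. Only the 「ところ」 entry comes from the text.
BILINGUAL_DICTIONARY = {
    "ところ": {"antecedent": "place", "formative": "when"},
}


def translate_formal_noun(noun, role, dictionary=BILINGUAL_DICTIONARY):
    """Return the translation registered for `noun` under the determined role."""
    return dictionary[noun][role]


# The role determination unit judged 「ところ」 to be an antecedent for
# 「その建物は彼が調査したところに欠陥があった。」
print(translate_formal_noun("ところ", "antecedent"))  # place
```

Keying the entry on the role keeps the disambiguation result from the role determination step as the only input the lookup needs.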

The document analysis apparatus 400 of this embodiment analyzes formal nouns that are ambiguous and open to more than one interpretation, determines whether each formal noun is an antecedent or a formative, and translates on the basis of that determination. Correct translation therefore becomes possible, and a high-precision, high-quality translation result can be output.

Finally, an example of the hardware configuration of an information processing apparatus that constitutes the document analysis apparatus executing the processing described above is explained with reference to FIG. 13. A CPU (Central Processing Unit) 501 executes processing for the OS (Operating System) as well as the processing described in the embodiments above: syntactic-semantic analysis based on the input sentence, creation of the formal noun dictionary, role determination (determining whether a formal noun is an antecedent or a formative), translation, and so on. These processes are executed in accordance with computer programs stored in a data storage unit of each information processing apparatus, such as a ROM or a hard disk.

A ROM (Read Only Memory) 502 stores programs, operation parameters, and the like used by the CPU 501. A RAM (Random Access Memory) 503 stores programs used in execution by the CPU 501, parameters that change as appropriate during that execution, and the like. These components are interconnected by a host bus 504 composed of a CPU bus or the like.

The host bus 504 is connected via a bridge 505 to an external bus 506 such as a PCI (Peripheral Component Interconnect/Interface) bus.

A keyboard 508 and a pointing device 509 are input devices operated by the user. A display 510 consists of a liquid crystal display, a CRT (Cathode Ray Tube), or the like, and displays various kinds of information as text and images.

An HDD (Hard Disk Drive) 511 contains a hard disk, drives the hard disk, and records or reproduces programs executed by the CPU 501 and other information. The hard disk is used, for example, to store the various dictionary data, such as the Japanese dictionary, the formal noun dictionary, and the bilingual dictionary, and also stores various computer programs such as data processing programs.

A drive 512 reads data or programs recorded on a mounted removable recording medium 521, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and supplies the data or programs to the RAM 503, which is connected via an interface 507, the external bus 506, the bridge 505, and the host bus 504.

A connection port 514 is a port for connecting an externally connected device 522 and has a connector such as USB or IEEE 1394. The connection port 514 is connected to the CPU 501 and other components via the interface 507, the external bus 506, the bridge 505, the host bus 504, and so on. A communication unit 515 is connected to a network and communicates with clients and network-connected servers.

Note that the hardware configuration of the information processing apparatus shown in FIG. 13 is one example of an apparatus built on a PC. The document analysis apparatus of the present invention is not limited to the configuration shown in FIG. 13; any configuration capable of executing the processing described in the embodiments above may be used.

The present invention has been described in detail above with reference to specific embodiments. It is self-evident, however, that those skilled in the art can modify or substitute the embodiments without departing from the gist of the present invention. In other words, the present invention has been disclosed by way of example and should not be interpreted restrictively. To determine the gist of the present invention, the claims should be taken into consideration.

The series of processes described in this specification can be executed by hardware, by software, or by a combination of both. When the processing is executed by software, a program recording the processing sequence can be installed in the memory of a computer built into dedicated hardware and executed there, or the program can be installed and executed on a general-purpose computer capable of executing various kinds of processing. For example, the program can be recorded on a recording medium in advance. Besides being installed on a computer from a recording medium, the program can be received over a network such as a LAN (Local Area Network) or the Internet and installed on a recording medium such as a built-in hard disk.

The various processes described in this specification need not be executed in the chronological order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus executing them or as required. In this specification, a system is a logical collection of a plurality of apparatuses, and the constituent apparatuses are not necessarily housed in the same enclosure.

As described above, according to the configuration of the present invention, syntactic-semantic analysis of an input sentence is executed, it is determined whether the role of a formal noun included in the syntactic-semantic analysis result is that of an antecedent or a formative, and a syntactic-semantic analysis result in accordance with that determination is output. The determination of whether a formal noun is an antecedent or a formative is made on the basis of a formal noun dictionary holding classification information that classifies formal nouns according to whether they behave more strongly as independent words or as function words. Specifically, the formal noun is judged to be a formative when a definition text containing any of the words place, space, part, direction, or compass direction is registered for it in the Japanese dictionary, or when the formal noun on its own stands in a case relation with the word it depends on in the syntactic-semantic analysis result; in all other cases it is judged to be an antecedent. On the basis of this judgment, a more accurate syntactic-semantic analysis result, or a more accurate translation result based on that analysis result, can be obtained.

FIG. 1 is a diagram illustrating a configuration example of a document analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a processing example and a configuration example of the syntactic-semantic analysis unit in the document analysis apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of an f-structure generated as a syntactic-semantic analysis result in the document analysis apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of an f-structure generated as a syntactic-semantic analysis result in the document analysis apparatus according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating the processing sequence of the dictionary creation unit in the document analysis apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a configuration example of the Japanese dictionary used in the processing of the dictionary creation unit in the document analysis apparatus according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a configuration example of the formal noun dictionary generated by the processing of the dictionary creation unit in the document analysis apparatus according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating the processing sequence of the role determination unit in the document analysis apparatus according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating an example of an f-structure generated as a syntactic-semantic analysis result in the document analysis apparatus according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating an example of an f-structure generated as a syntactic-semantic analysis result in the document analysis apparatus according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a configuration example of a document analysis apparatus according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating a configuration example of the bilingual dictionary used in the processing of the translation processing unit in the document analysis apparatus according to an embodiment of the present invention.
FIG. 13 is a diagram illustrating a hardware configuration example of the document analysis apparatus according to an embodiment of the present invention.

Explanation of symbols

100 Document analysis apparatus
101 Sentence input unit
102 Syntactic-semantic analysis unit
103 Dictionary creation unit
104 Japanese-language dictionary storage unit
105 Formal noun dictionary storage unit
106 Role discrimination unit
107 Discrimination result output unit
300 Syntactic-semantic analysis system
302 Morphological analysis unit
302A Morphological rules
302B Morphological dictionary
303 Syntactic-semantic analysis unit
303A Grammar rules
303B Valence dictionary
400 Document analysis apparatus
401 Sentence input unit
402 Syntactic-semantic analysis unit
403 Role discrimination unit
404 Formal noun dictionary storage unit
405 Translation processing unit
406 Bilingual dictionary storage unit
407 Translation result output unit
501 CPU (Central Processing Unit)
502 ROM (Read-Only Memory)
503 RAM (Random Access Memory)
504 Host bus
505 Bridge
506 External bus
507 Interface
508 Keyboard
509 Pointing device
510 Display
511 HDD (Hard Disk Drive)
512 Drive
514 Connection port
515 Communication unit
521 Removable recording medium
522 Externally connected device

Claims

1. A document analysis apparatus comprising:
a syntactic-semantic analysis unit that executes syntactic-semantic analysis processing of an input sentence;
a role discrimination unit that executes processing for discriminating whether the role of a formal noun included in the syntactic-semantic analysis result of the input sentence is that of a complementizer or that of an antecedent; and
a discrimination result output unit that receives the discrimination result of the role discrimination unit and outputs a syntactic-semantic analysis result according to the discrimination result,
wherein the role discrimination unit refers to a formal noun dictionary in which a group of formal nouns that have strong properties as function words and are to be discriminated as complementizers as they are is registered as formal nouns of a first group, distinguishably from other formal nouns; discriminates a formal noun belonging to the first group as a complementizer; and, for a formal noun not belonging to the first group, discriminates the formal noun as a complementizer when it is not in a case relation with its dependency target and as an antecedent when it is in a case relation with its dependency target.
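The role discrimination recited above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the apparatus: the names `Role`, `FormalNounOccurrence`, `has_case_relation`, and `discriminate_role` are hypothetical, the case-relation flag is assumed to have been derived beforehand from the syntactic-semantic analysis result, and the membership of the first group used in the usage lines is purely illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class Role(Enum):
    COMPLEMENTIZER = "complementizer"
    ANTECEDENT = "antecedent"


@dataclass(frozen=True)
class FormalNounOccurrence:
    """A formal noun found in the analysis result (hypothetical structure)."""
    surface: str             # the formal noun as written in the input sentence
    has_case_relation: bool  # assumed to be read off the analysis result


def discriminate_role(noun: FormalNounOccurrence,
                      first_group: AbstractSet[str]) -> Role:
    """Decide the role of a formal noun following the two-stage rule."""
    # Stage 1: first-group formal nouns are treated as complementizers as they are.
    if noun.surface in first_group:
        return Role.COMPLEMENTIZER
    # Stage 2: other formal nouns depend on whether a case relation holds.
    if not noun.has_case_relation:
        return Role.COMPLEMENTIZER
    return Role.ANTECEDENT


# Illustrative usage; which nouns belong to the first group is an assumption.
first_group = {"こと"}
print(discriminate_role(FormalNounOccurrence("こと", True), first_group))
print(discriminate_role(FormalNounOccurrence("ところ", True), first_group))
print(discriminate_role(FormalNounOccurrence("ところ", False), first_group))
```

The dictionary lookup is placed first so that first-group nouns never reach the case-relation test, which mirrors the order in which the two conditions are stated.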

2. The document analysis apparatus according to claim 1, further comprising:
a dictionary creation unit that creates the formal noun dictionary,
wherein the dictionary creation unit
registers in the formal noun dictionary, as a formal noun of the first group, a formal noun that is registered in a Japanese-language dictionary without pronoun or demonstrative as a part-of-speech classification and without a paraphrase sentence containing any of the words for place, space, part, direction, or bearing.
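One way to read the dictionary creation rule is as a filter over Japanese-language dictionary entries. The sketch below is a hedged illustration only: `DictionaryEntry`, `EXCLUDED_POS`, `SPATIAL_WORDS`, and `build_formal_noun_dictionary` are hypothetical names, the keywords are English glosses standing in for the Japanese words an actual dictionary would contain, the sample entries are invented, and the reading that both conditions must be absent for first-group registration is itself an assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

# Assumed part-of-speech labels that exclude a noun from the first group.
EXCLUDED_POS = frozenset({"pronoun", "demonstrative"})
# English glosses used as stand-ins for the dictionary's own keywords.
SPATIAL_WORDS = ("place", "space", "part", "direction", "bearing")


@dataclass(frozen=True)
class DictionaryEntry:
    """One formal noun entry of a Japanese-language dictionary (hypothetical)."""
    headword: str
    pos_tags: FrozenSet[str]
    definition: str  # the paraphrase sentence for the headword


def build_formal_noun_dictionary(
    entries: Iterable[DictionaryEntry],
    spatial_words: Tuple[str, ...] = SPATIAL_WORDS,
) -> Dict[str, str]:
    """Label each formal noun as 'first_group' or 'other'."""
    result: Dict[str, str] = {}
    for entry in entries:
        # Condition 1: the entry carries a pronoun or demonstrative classification.
        is_pronominal = bool(entry.pos_tags & EXCLUDED_POS)
        # Condition 2: the paraphrase sentence mentions a spatial keyword.
        is_spatial = any(word in entry.definition for word in spatial_words)
        result[entry.headword] = (
            "other" if (is_pronominal or is_spatial) else "first_group"
        )
    return result


# Invented sample entries, for illustration only.
sample = [
    DictionaryEntry("koto", frozenset({"noun"}), "a matter or fact"),
    DictionaryEntry("tokoro", frozenset({"noun"}), "a place or location"),
    DictionaryEntry("kore", frozenset({"pronoun"}), "this thing"),
]
print(build_formal_noun_dictionary(sample))
```

Substring matching is used here for brevity; a real system would more plausibly match on morphemes so that a keyword embedded in an unrelated word does not trigger the spatial condition.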

3. The document analysis apparatus according to claim 1, further comprising a translation processing unit that receives the discrimination result of the role discrimination unit and executes translation processing according to the discrimination result.

4. A document analysis method executed in a document analysis apparatus, the method comprising:
a syntactic-semantic analysis step in which a syntactic-semantic analysis unit executes syntactic-semantic analysis processing of an input sentence;
a role discrimination step in which a role discrimination unit executes processing for discriminating whether the role of a formal noun included in the syntactic-semantic analysis result of the input sentence is that of a complementizer or that of an antecedent; and
a discrimination result output step in which a discrimination result output unit receives the discrimination result of the role discrimination unit and outputs a syntactic-semantic analysis result according to the discrimination result,
wherein the role discrimination step refers to a formal noun dictionary in which a group of formal nouns that have strong properties as function words and are to be discriminated as complementizers as they are is registered as formal nouns of a first group, distinguishably from other formal nouns; discriminates a formal noun belonging to the first group as a complementizer; and, for a formal noun not belonging to the first group, discriminates the formal noun as a complementizer when it is not in a case relation with its dependency target and as an antecedent when it is in a case relation with its dependency target.

5. The document analysis method according to claim 4, further comprising:
a dictionary creation step in which a dictionary creation unit creates the formal noun dictionary,
wherein, in the dictionary creation step, the dictionary creation unit
registers in the formal noun dictionary, as a formal noun of the first group, a formal noun that is registered in a Japanese-language dictionary without pronoun or demonstrative as a part-of-speech classification and without a paraphrase sentence containing any of the words for place, space, part, direction, or bearing.

6. The document analysis method according to claim 4, further comprising a translation processing step in which a translation processing unit receives the discrimination result of the role discrimination unit and executes translation processing according to the discrimination result.

7. A computer program that causes a document analysis apparatus to execute document analysis processing, the program causing execution of:
a syntactic-semantic analysis step of causing a syntactic-semantic analysis unit to execute syntactic-semantic analysis processing of an input sentence;
a role discrimination step of causing a role discrimination unit to execute processing for discriminating whether the role of a formal noun included in the syntactic-semantic analysis result of the input sentence is that of a complementizer or that of an antecedent; and
a discrimination result output step of causing a discrimination result output unit to receive the discrimination result of the role discrimination unit and output a syntactic-semantic analysis result according to the discrimination result,
wherein the role discrimination step refers to a formal noun dictionary in which a group of formal nouns that have strong properties as function words and are to be discriminated as complementizers as they are is registered as formal nouns of a first group, distinguishably from other formal nouns; discriminates a formal noun belonging to the first group as a complementizer; and, for a formal noun not belonging to the first group, discriminates the formal noun as a complementizer when it is not in a case relation with its dependency target and as an antecedent when it is in a case relation with its dependency target.