JP4789814B2

JP4789814B2 - Cluster generation apparatus and cluster generation method

Info

Publication number: JP4789814B2
Application number: JP2007014194A
Authority: JP
Inventors: プラムディオノイコ; 京士飯塚; 宏之佐藤; 健治大友; 隆彦村山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-01-24
Filing date: 2007-01-24
Publication date: 2011-10-12
Anticipated expiration: 2027-01-24
Also published as: JP2008181333A

Description

The present invention relates to a cluster generation device and a cluster generation method for generating clusters of a graph.

A cluster is a part obtained by dividing a graph, or a subgraph thereof, in which nodes having instances are connected by arcs. Clusters are generated so that node instances within the same cluster are similar to each other and node instances in different clusters are dissimilar.

Prior art document information related to the invention of this application includes the following.

Non-Patent Document 1 discloses a technique for searching for a subgraph. The graph is expressed in RDF, and a query is used for the search.

Non-Patent Document 2 discloses a technique for generating clusters. Differences in edge density are used for cluster generation.

Non-Patent Document 3 discloses a technique for searching for a subgraph. Here, the graph is a weighted graph, and the authority value and hub value of each node are calculated by the HITS algorithm. The technique then searches for a path that lies between nodes each having one of a plurality of pieces of information and for which the values calculated for the nodes constituting the path are equal to or greater than a threshold.

In Non-Patent Document 4, a specified seed web page and its surrounding web pages are collected, and a graph of them is generated. The authority value and hub value of each node of the graph are then calculated by the HITS algorithm, and the web pages with the highest values are taken to be the central nodes (cores) of the graph.

Non-Patent Document 1: "SPARQL Query Language for RDF", [online], Internet <URL: http://www.w3.org/TR/rdf-sparql-query/>
Non-Patent Document 2: M. E. J. Newman, "Detecting community structure in networks", European Physical Journal B, 38:321-330, 2004
Non-Patent Document 3: S. Mukherjea, B. Bamba, "BioPatentMiner: an information retrieval system for biomedical patents", VLDB, 2004
Non-Patent Document 4: D. Gibson, J. Kleinberg, and P. Raghavan, "Inferring web communities from link topology", ACM Conf. on Hypertext and Hypermedia, 1998

The technique described in Non-Patent Document 1 only searches for a single subgraph, and therefore cannot show the relevance among a plurality of pieces of information. It also requires knowledge of the graph structure.

The technique described in Non-Patent Document 2 can generate clusters but cannot show the relevance among a plurality of pieces of information. In addition, the object of interest is not taken into account.

With the technique described in Non-Patent Document 3, what can be retrieved is a path, so the relevance among a plurality of pieces of information cannot be shown sufficiently.

With the technique described in Non-Patent Document 4, only one graph (cluster) centered on the seed is generated, so the relevance among a plurality of pieces of information cannot be shown. In addition, the object of interest is not taken into account.

The present invention has been made in view of the above problems, and its object is to provide a cluster generation device and a cluster generation method capable of generating, from a graph, clusters that can show the relevance between its nodes.

To solve the above problems, the present invention sets an arc score corresponding to a specified degree of interest on the arcs of a graph, or a subgraph thereof, in which nodes having instances are connected by arcs having labels; calculates node scores for the nodes of the graph or subgraph based on the arc scores; and generates clusters of the graph or subgraph based on the calculated node scores.

According to the present invention, node scores calculated from the arc scores of the arcs between nodes are used, so clusters that can show the relevance between nodes can be generated.

Embodiments of the present invention are described below with reference to the drawings. The same components are denoted by the same reference numerals, and redundant description is omitted.

[First Embodiment]
FIG. 1 is a configuration diagram of a cluster generation device 1A according to the first embodiment.

The cluster generation device 1A is connected to a user terminal 2, and a display device 3 is connected to the user terminal 2.

The cluster generation device 1A includes: a graph database 11 that stores a data group for displaying a graph G, which is a directed graph; an input interface generation unit 12 that generates an input interface for performing input operations on the user terminal 2; a label storage unit 13 that stores, in association with each item assumed to be an object of interest of the user of the user terminal 2 (referred to as an interest item), labels related to that interest item; an arc score setting unit 14 that sets scores (referred to as arc scores) on the arcs of the graph G; a node score calculation unit 15 that calculates scores (referred to as node scores) for the nodes of the graph G; a cluster generation unit 16 that generates clusters from the graph G; a database interface 17; and an output interface generation unit 18 that generates an output interface for conveying processing results.

A cluster is identical to one of the parts obtained by dividing a portion of the graph G. Because clusters are obtained by division, they do not overlap one another. Also, since a subgraph has no disconnected parts, the same holds within a cluster. Clusters are generated so that node instances within the same cluster are similar to each other and node instances in different clusters are dissimilar.

The cluster generation device 1A only needs to allow data to be transmitted and received (passed) between its units (including the database). That is, the units may be placed on a single computer or distributed over a plurality of computers. A computer program that causes these computers to operate as the cluster generation device may be transmitted and received via a communication line. The computer program may also be recorded on a recording medium such as a semiconductor memory, a magnetic disk, an optical disk, a magneto-optical disk, or a magnetic tape, and the recording medium may be distributed. The same applies to the other embodiments.

FIG. 2 is a diagram illustrating part of the graph G that can be displayed using all of the data group stored in the graph database 11.

Using all of the data group stored in the graph database 11, it is possible to display the graph G partially illustrated in FIG. 2, that is, a graph G in which nodes having mutually different instances are connected by arcs having labels and in which the classes of the instances are defined. Conversely, the graph database 11 stores exactly the data group needed to display the graph G, no more and no less. Hereinafter, this data group is referred to as the graph G for convenience. Likewise, a data group that is exactly sufficient to display some graph, subgraph (a graph itself or a graph contained in it), path (a graph with no branches or closed loops), or the like, including classes, is referred to for convenience as a graph, subgraph, path, and so on.

In the graph G, for example, nodes having instances such as "XML" and "B project" are connected by arcs having labels such as "rm:technical keyword" and "rm:author". In addition, a class such as "Person:person", which is the concept of an instance such as "Person:Taro Yamada", is defined for each node of the graph G.

FIG. 3 is a diagram illustrating part of the data group stored in the graph database 11.

For example, the first datum, "<org:D project> <rm:technical keyword> "XML"", indicates that the arc from the node having the instance "org:D project" to the node having the instance "XML" has the label "rm:technical keyword". Likewise, the last datum, "<person:Hanako Suzuki> <rdf:type> <Person:person>", indicates that the class of the node having the instance "person:Hanako Suzuki" is "person:person".
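
The way such data separates into labeled arcs and class definitions can be sketched as follows. This is only an illustration: the tuple layout, the function name `split_triples`, and the variable names are assumptions and do not come from the specification, and the English renderings of the identifiers are used in place of the originals.

```python
# Illustrative data modeled on the two FIG. 3 examples described above.
# Each datum is assumed to be a (subject, label, object) tuple.
TRIPLES = [
    ("org:D project", "rm:technical keyword", "XML"),
    ("person:Hanako Suzuki", "rdf:type", "Person:person"),
]

def split_triples(triples):
    """Separate labeled arcs from class definitions (rdf:type data)."""
    arcs = []      # (source instance, label, target instance)
    classes = {}   # node instance -> class
    for subject, label, obj in triples:
        if label == "rdf:type":
            classes[subject] = obj
        else:
            arcs.append((subject, label, obj))
    return arcs, classes

arcs, classes = split_triples(TRIPLES)
```

Here `arcs` holds the single labeled arc toward "XML", and `classes` records the class of the node "person:Hanako Suzuki".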

Each datum is actually expressed in RDF (described in the following documents).

"Resource Description Framework (RDF) Model and Syntax Specification", Ora Lassila, Ralph R. Swick, eds., [online], Internet <URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/>
"RDF Vocabulary Description Language 1.0: RDF Schema", Dan Brickley, R. V. Guha, eds., [online], Internet <URL: http://www.w3.org/TR/rdf-schema/>

FIG. 4 is a diagram showing the contents stored in the label storage unit 13.

For example, the interest item "affiliation" is associated with the labels "person:affiliation", "rm:academic society KW", and "rm:member".

The interest item "academic paper" is associated with the labels "rm:author" and "rm:author KW".

The interest item "service development" is associated with the labels "ms:KW", "ms:person in charge", "ms:reference", and "ms:author".

These labels have been selected from the labels of the graph G as being related to the corresponding interest item.

(Operation of the first embodiment)
FIG. 5 is a sequence diagram of the first embodiment.

In the cluster generation device 1A, the input interface generation unit 12 generates an input interface and transmits it to the user terminal 2 (S1), where it is displayed as shown in FIG. 6 (S3). The user terminal 2 then transmits to the cluster generation device 1A the following, as specified by operations on this interface: one or more keywords KW, one or more classes CL, or both; the combination of degrees of interest (large, medium, or small) for the interest items; and a threshold TH (S5). The threshold TH is used, for example, by setting it low so that clusters containing many nodes are displayed. These keywords KW and other inputs may instead be stored in the cluster generation device 1A and read out for use as needed.

(Arc score setting)
In the cluster generation device 1A, the arc score setting unit 14 sets arc scores on the arcs of the graph G (S11). Specifically, it transmits a query including the arc scores to the database interface 17 (S111) and thereby acquires the graph G with the arc scores set (S112).

For example, an arc score corresponding to the degree of interest in the interest item "affiliation" is set for the labels "person:affiliation", "rm:academic society KW", and "rm:member", which the label storage unit 13 associates with that interest item.

Similarly, an arc score corresponding to the degree of interest in the interest item "academic paper" is set for the labels "rm:author" and "rm:author KW", which the label storage unit 13 associates with that interest item.

Similarly, an arc score corresponding to the degree of interest in the interest item "service development" is set for the labels "ms:KW", "ms:person in charge", "ms:reference", and "ms:author", which the label storage unit 13 associates with that interest item.

For example, an arc score of 1.0 is set when the degree of interest is "large", 0.1 when it is "medium", and 0.01 when it is "small". Alternatively, the degree of interest may be a numerical value, and an arc score correlated with that value may be set.
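
The arc score setting described above can be sketched as a lookup from label to interest item to score. This is a minimal sketch under assumptions: the names `LABEL_STORE`, `LEVEL_TO_SCORE`, and `set_arc_scores` are hypothetical, the example arcs are invented for illustration, and leaving unregistered labels without a score is an assumption, since the text does not say how such arcs are treated.

```python
# Label store modeled on FIG. 4 (interest item -> related arc labels).
LABEL_STORE = {
    "affiliation": ["person:affiliation", "rm:academic society KW", "rm:member"],
    "academic paper": ["rm:author", "rm:author KW"],
    "service development": ["ms:KW", "ms:person in charge",
                            "ms:reference", "ms:author"],
}

# Example mapping from degree of interest to arc score given in the text.
LEVEL_TO_SCORE = {"large": 1.0, "medium": 0.1, "small": 0.01}

def set_arc_scores(arcs, interest_levels):
    """Return {(source, label, target): arc score} for registered labels.

    Arcs whose label belongs to no interest item get no score (assumption).
    """
    label_score = {}
    for item, level in interest_levels.items():
        for label in LABEL_STORE[item]:
            label_score[label] = LEVEL_TO_SCORE[level]
    return {arc: label_score[arc[1]] for arc in arcs if arc[1] in label_score}

# Invented example arcs, for illustration only.
example_arcs = [
    ("person:Taro Yamada", "person:affiliation", "org:H3G"),
    ("paper:1", "rm:author", "person:Taro Yamada"),
]
example_scores = set_arc_scores(
    example_arcs,
    {"affiliation": "large", "academic paper": "small",
     "service development": "medium"},
)
```

With these inputs the affiliation arc receives 1.0 and the authorship arc receives 0.01, mirroring the "large" and "small" settings.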

When the instance of each node of the graph G is replaced with the class of that node, only one node is kept among nodes having the same instance (class), and the above arc scores are applied to the resulting graph, part of the result looks, for example, like FIG. 7.

(Node score calculation)
Next, in the cluster generation device 1A, the node score calculation unit 15 calculates scores for the nodes of the graph G (referred to as node scores) (S13).

FIG. 8 is a flowchart of the node score calculation. This calculation is based on the arc scores and is also called link analysis.

The node score calculation unit 15 first sets, on each node of the graph G, an initial node score value of 1; the node score will come to indicate the importance of the node (S101).

Next, for each node, a node score is calculated by the equation of FIG. 9, which uses, for example, the PageRank algorithm (S103).

The initial node score values set in the graph G are then updated with these node scores (S105). This calculation and update is repeated recursively until the node scores converge. As a result, the node scores come to indicate importance.
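
The equation of FIG. 9 is not reproduced in the text, so the following is only a sketch of one common arc-weighted PageRank-style iteration, not the equation of the specification. The damping factor 0.85, the convergence tolerance, the normalization by outgoing arc score, and the function name `node_scores` are all assumptions.

```python
def node_scores(arc_scores, damping=0.85, tol=1e-9, max_iter=1000):
    """Iterate weighted PageRank-style scores until they converge.

    arc_scores: {(source, target): arc score}. Returns {node: node score}.
    """
    nodes = {n for arc in arc_scores for n in arc}
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in arc_scores.items():
        out_weight[src] += w

    scores = {n: 1.0 for n in nodes}          # S101: initial value 1
    for _ in range(max_iter):
        new = {n: 1.0 - damping for n in nodes}
        for (src, dst), w in arc_scores.items():
            if out_weight[src] > 0:           # S103: share score by arc score
                new[dst] += damping * scores[src] * w / out_weight[src]
        delta = max(abs(new[n] - scores[n]) for n in nodes)
        scores = new                          # S105: update the node scores
        if delta < tol:                       # stop once the scores converge
            break
    return scores

# Invented four-arc example: "a" points strongly to "b" and weakly to "c".
demo = node_scores({("a", "b"): 1.0, ("a", "c"): 0.1,
                    ("b", "a"): 1.0, ("c", "a"): 1.0})
```

In this example the node reached through the arc with score 1.0 ends up with a higher node score than the node reached through the arc with score 0.1, which is the effect the degree of interest is meant to have.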

If part of the graph obtained by applying the arc scores to the graph G is, for example, as shown in FIG. 10, then, as shown in that figure, the node having the instance "XML" has a node score of 0.4, the node having the instance "Person:Denden Saburo" has a node score of 0.9, and the node having the instance "org:H3G" has a node score of 0.6.

Next, in the cluster generation device 1A, the cluster generation unit 16 generates clusters from the graph G (S15). Here, for each node of the graph G that has an instance equal to a keyword KW or for which a class CL is defined, a cluster including that node (referred to as a key node) is generated. As many clusters are generated as there are key nodes.

FIG. 11 is a flowchart for generating one cluster; cluster generation according to this flowchart is performed simultaneously for each key node.

First, the cluster generation unit 16 excludes from the graph G nodes that do not satisfy a predetermined condition (S201). Here, with the key node as the origin, nodes that are connected to the origin by a single arc and that have a node score equal to or less than the threshold are excluded (S201). Next, each node that was not excluded is treated as a candidate, and the candidates are processed one at a time to either include them in the cluster or exclude them.

It is first determined whether one unprocessed candidate is also a candidate in another cluster generation (S203). If NO, the candidate is included in the cluster of this cluster generation (S205). If YES, the number of arcs between the candidate and the key node of this cluster generation and the number of arcs between the candidate and the key node of each other cluster generation are calculated, and it is determined whether the number of arcs for this cluster generation is the maximum among them (S207). If NO, the candidate is excluded from the candidates of this cluster generation (S209).

If YES, it is determined whether any of the arc counts between the candidate and the key nodes of the other cluster generations equals the arc count between the candidate and the key node of this cluster generation (S211); in other words, whether another cluster generation also has the maximum arc count (S211). If NO, the candidate is included in the cluster of this cluster generation (S205). If YES, the difference between the node score of the candidate and the node score of the origin for that candidate in this cluster generation, and the difference between the node score of the candidate and the node score of the origin for that candidate in each other cluster generation, are calculated, and it is determined whether the difference for this cluster generation is the smallest (S213). If NO, the candidate is excluded from the candidates of this cluster generation (S209). If YES, the candidate is included in the cluster of this cluster generation (S205).

After step S205 or S209 has been performed, it is determined whether any unprocessed candidate remains (S215). If YES, the process returns to step S203. If NO, one of the nodes included in the cluster is set as a new origin (S217), and the process returns to step S201. In that step S201, nodes that are connected to the new origin by a single arc and have a node score equal to or less than the threshold are excluded (S201).

Note that when a plurality of nodes connected to a given origin by a single arc have been included in the cluster (referred to for convenience as nodes n11 and n12), one of them (say node n11) becomes a new origin, and a node connected to that origin by a single arc (referred to as node n111) is then included in the cluster, node n12 becomes an origin before node n111 does.
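
The origin ordering described above amounts to a breadth-first expansion. The following sketch grows a single cluster from one key node and deliberately ignores contention with other clusters (steps S203 to S213). The function name `grow_cluster`, the `neighbors` mapping (nodes one arc away, with arc direction ignored), and the example values are assumptions made for illustration.

```python
from collections import deque

def grow_cluster(key_node, neighbors, score, threshold):
    """Grow one cluster outward from key_node in breadth-first order.

    neighbors: {node: iterable of nodes connected to it by a single arc}
    score:     {node: node score}
    """
    cluster = {key_node}
    origins = deque([key_node])   # FIFO, so n12 is an origin before n111
    while origins:
        origin = origins.popleft()
        for node in neighbors.get(origin, ()):
            if node in cluster:
                continue
            if score[node] <= threshold:   # S201: drop nodes at or below TH
                continue
            cluster.add(node)              # S205: include the candidate
            origins.append(node)           # S217: becomes an origin later
    return cluster

# Invented example using the node names from the note above.
demo_neighbors = {"k": ["n11", "n12"], "n11": ["n111"], "n111": ["far"]}
demo_score = {"n11": 0.5, "n12": 0.6, "n111": 0.4, "far": 0.1}
demo_cluster = grow_cluster("k", demo_neighbors, demo_score, threshold=0.2)
```

Lowering `threshold` lets more nodes through step S201, which matches the remark that a low threshold TH yields clusters containing many nodes.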

FIG. 12 is a diagram showing two generated clusters; arc arrows and labels and node instances are omitted from the figure.

As shown in the figure, the number of arcs between node I1 and key node KI1 is 3, and the number of arcs between node I1 and key node KI2 is 2; therefore node I1 is included not in cluster C1, which contains key node KI1, but in cluster C2, which contains key node KI2.

Also as shown in the figure, the number of arcs between node I2 and key node KI1 is 2, and the number of arcs between node I2 and key node KI2 is also 2. The difference between the node score 0.6 of node I2 and the node score 0.9 of the origin I3 for cluster C1 is 0.3, and the difference between the node score 0.6 of node I2 and the node score 0.8 of the origin I4 for cluster C2 is 0.2; therefore node I2 is included in cluster C2.
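
The decision for a contested node in the FIG. 12 example can be sketched as below. The flowchart description speaks of the "maximum" arc count, whereas in the FIG. 12 example the node goes to the cluster whose key node is fewer arcs away; this sketch follows the example, which is an assumption. The function name `choose_cluster` and its argument layout are hypothetical, and the behavior when the score differences also tie is not specified in the text.

```python
def choose_cluster(arc_count, candidate_score, origin_score):
    """Pick the cluster that receives a contested candidate node.

    arc_count:    {cluster: arcs between the candidate and its key node}
    origin_score: {cluster: node score of the origin that reached the node}
    """
    fewest = min(arc_count.values())
    tied = [c for c, n in arc_count.items() if n == fewest]
    if len(tied) == 1:
        return tied[0]
    # Equal arc counts: the smallest node score difference from the origin wins.
    return min(tied, key=lambda c: abs(candidate_score - origin_score[c]))

# Node I1: 3 arcs to KI1, 2 arcs to KI2. Its own score is not given in the
# text and is not consulted here, so 0.5 is only a placeholder.
i1 = choose_cluster({"C1": 3, "C2": 2}, 0.5, {"C1": 0.9, "C2": 0.8})

# Node I2: 2 arcs to each key node, score 0.6, origin scores 0.9 and 0.8.
i2 = choose_cluster({"C1": 2, "C2": 2}, 0.6, {"C1": 0.9, "C2": 0.8})
```

Both calls select cluster C2, in line with the two cases described for FIG. 12.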

Next, returning to FIG. 5, the output interface generation unit 18 generates an output interface for the generated clusters and transmits it to the user terminal 2 (S19), which displays the clusters (S21).

The first embodiment includes: arc score setting means (the arc score setting unit 14) that sets an arc score corresponding to a specified degree of interest on the arcs of the graph G, in which nodes having instances are connected by arcs having labels; node score calculation means (the node score calculation unit 15) that calculates node scores for the nodes of the graph G based on the arc scores; and cluster generation means (the cluster generation unit 16) that generates, based on the calculated node scores, clusters that are subgraphs of the graph G and do not overlap with the remaining part. Because node scores calculated from the arc scores of the arcs between nodes are used, clusters that can show the relevance between nodes can be generated.

In addition, because the node scores of nodes other than the key nodes (intermediate nodes) are used, clusters that can show the relevance between nodes can be generated.

In addition, because steps S207, S211, and S213 of FIG. 11 are performed, clusters that can show the relevance between nodes can be generated.

In addition, because arc scores corresponding to the degrees of interest are set, clusters suited to the degrees of interest can be generated.

In addition, the label storage unit 13 stores labels in advance in association with the interest items, and an arc score corresponding to a specified degree of interest is set on the arcs having the labels associated with the interest item to which that degree of interest applies, so clusters that reflect the degrees of interest can be generated.

[Second Embodiment]
FIG. 13 is a configuration diagram of a cluster generation device 1B according to the second embodiment.

The cluster generation device 1B is connected to the user terminal 2, and the display device 3 is connected to the user terminal 2.

The cluster generation device 1B includes the graph database 11, the input interface generation unit 12, a node score storage unit 13A, a node score setting unit 14A, the cluster generation unit 16, the database interface 17, and the output interface generation unit 18.

In this embodiment, node scores such as those shown in FIG. 10 are calculated in advance for each combination of degrees of interest, and the node score storage unit 13A stores the node scores in association with each combination.

(Operation of the second embodiment)
FIG. 14 is a sequence diagram of the second embodiment.

The steps up to step S5 are the same as in the first embodiment.

Next, the node score setting unit 14A selects the node scores corresponding to the combination of degrees of interest transmitted from the user terminal 2 (S11A) and sets them on the nodes of the graph G (S11B). Specifically, it transmits a query including the node scores to the database interface 17 (S115) and thereby acquires the graph with the node scores set (S116).
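
The selection in step S11A can be pictured as a table lookup keyed by the combination of degrees of interest. In this sketch the names `NODE_SCORE_STORE` and `select_node_scores`, the key ordering, and every score value are placeholders chosen for illustration; the specification does not state which combination yields which scores.

```python
# Hypothetical precomputed table, one entry per combination of degrees of
# interest, ordered as (affiliation, academic paper, service development).
# All score values below are placeholders, not data from the specification.
NODE_SCORE_STORE = {
    ("large", "medium", "small"): {"XML": 0.4, "org:H3G": 0.6},
    ("small", "small", "large"): {"XML": 0.7, "org:H3G": 0.3},
}

def select_node_scores(interest_levels):
    """S11A: return the node scores stored for this combination."""
    key = (
        interest_levels["affiliation"],
        interest_levels["academic paper"],
        interest_levels["service development"],
    )
    return NODE_SCORE_STORE[key]

selected = select_node_scores({
    "affiliation": "large",
    "academic paper": "medium",
    "service development": "small",
})
```

With three interest items and three levels each, such a table would hold at most 27 entries, so the per-request link analysis is replaced by a single lookup.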

Then, as in the first embodiment, clusters are generated from the graph on which the node scores have been set (S15). The subsequent steps are also the same as in the first embodiment.

The second embodiment includes: node score storage means (the node score storage unit 13A) that stores in advance, in association with the corresponding degree of interest, the node scores obtained when an arc score corresponding to a specified degree of interest is set on the arcs of the graph G and node scores for the nodes of the graph G are calculated based on the arc scores; node score setting means (the node score setting unit 14A) that sets on the graph G the node scores stored in the node score storage means (13A) in association with the specified degree of interest; and cluster generation means (16) that generates clusters of the graph G based on the node scores thus set. Because node scores calculated from the arc scores of the arcs between nodes are used, clusters that can show the relevance between nodes can be generated.

In addition, because the node scores corresponding to the combination of degrees of interest are selected, clusters suited to that combination can be generated.

[Third Embodiment]
FIG. 15 is a configuration diagram of a cluster generation device 1C according to the third embodiment.

The cluster generation device 1C is connected to the user terminal 2, and the display device 3 is connected to the user terminal 2.

The cluster generation device 1C includes the graph database 11, the input interface generation unit 12, the label storage unit 13, the arc score setting unit 14, the node score calculation unit 15, the cluster generation unit 16, the database interface 17, the output interface generation unit 18, and a graph processing unit 19.

FIG. 16 is a sequence diagram of the third embodiment.

A threshold THS is set in the cluster generation device 1C in advance. The threshold THS is set, based on experience, to be lower than the threshold TH.

The steps up to step S13 are the same as in the first embodiment.

Next, the graph processing unit 19 deletes part of the graph (S14A). Here, it detects the nodes that are connected to any key node by at least one arc and have a node score greater than the threshold THS, and deletes all other nodes together with the arcs connected to them.
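
The deletion in step S14A can be sketched as follows. Several points are assumptions: "connected by at least one arc" is read as reachable from a key node over one or more arcs with direction ignored, key nodes themselves are always kept, and the function name `prune_graph` and the example values are invented for illustration.

```python
from collections import deque

def prune_graph(arcs, key_nodes, score, ths):
    """Keep key nodes and connected nodes scoring above ths; drop the rest.

    arcs: list of (source, target) pairs. Returns the surviving arcs.
    """
    neighbors = {}
    for src, dst in arcs:                       # arc direction is ignored
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    reachable, queue = set(key_nodes), deque(key_nodes)
    while queue:                                # nodes connected to a key node
        for nxt in neighbors.get(queue.popleft(), ()):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)

    keep = set(key_nodes) | {n for n in reachable if score.get(n, 0.0) > ths}
    # An arc survives only if both of its end nodes survive.
    return [(s, d) for s, d in arcs if s in keep and d in keep]

# Invented example: "b" falls below THS, and "x"-"y" touches no key node.
pruned = prune_graph(
    [("k", "a"), ("a", "b"), ("b", "c"), ("x", "y")],
    key_nodes=["k"],
    score={"a": 0.5, "b": 0.05, "c": 0.5, "x": 0.9, "y": 0.9},
    ths=0.1,
)
```

Only the arc between "k" and "a" survives here; the node scores would then be recalculated on this smaller graph.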

Then, the same node score calculation as in step S13 is performed on the resulting graph (S14B). The subsequent processing is the same as in the first embodiment.

According to the third embodiment, the device includes: arc score setting means (14) that sets, on the arcs of the graph G in which nodes having instances are connected by arcs having labels, arc scores corresponding to a designated degree of interest; first node score calculation means (corresponding to step S13) that calculates node scores for the nodes of the graph G on the basis of the arc scores; graph processing means (graph processing unit 19) that deletes a part of the graph G on the basis of the node scores; second node score calculation means (corresponding to step S14B) that calculates new node scores for the nodes of the partially deleted graph on the basis of the arc scores set on its arcs; and cluster generation means (16) that generates clusters of the graph G on the basis of the new node scores. Because node scores calculated from the arc scores of the arcs between nodes are used, clusters that show the relationships between nodes can be generated.

In addition, because the node scores are calculated on the graph that remains after everything other than the part containing the key nodes has been deleted, clusters that show the relationships between nodes within the part containing the key nodes can be generated.

[Fourth Embodiment]
FIG. 17 is a configuration diagram of the cluster generation device 1D according to the fourth embodiment.

The cluster generation device 1D is connected to a user terminal 2, and a display device 3 is connected to the user terminal 2.

The cluster generation device 1D includes a graph database 11, an input interface generation unit 12, a label storage unit 13, a node score storage unit 13A, an arc score setting unit 14, a node score setting unit 14A, a node score calculation unit 15, a cluster generation unit 16, a database interface 17, an output interface generation unit 18, and a graph processing unit 19.

(Operation of the fourth embodiment)
In the fourth embodiment, the threshold value THS is set in the cluster generation device 1D in advance so as to be lower than the threshold value TH.

FIG. 18 is a sequence diagram of the fourth embodiment.

The processing up to step S11B is the same as in the second embodiment.

Subsequently, the graph processing unit 19 deletes a part of the graph by a determination using the threshold value THS, as in the third embodiment (S14A). Arc scores are then set on the partially deleted graph in the same way as in step S11 of the first embodiment (S12), and node scores are calculated for that graph in the same way as in step S13 of the first embodiment (S14B). The subsequent processing is the same as in the first embodiment.

According to the fourth embodiment, the device includes: node score storage means (13A) that stores in advance, in association with the corresponding degree of interest, the node scores calculated for the nodes of the graph G on the basis of arc scores set on the arcs of the graph G according to a specified degree of interest; node score setting means (14A) that sets on the graph G the node scores stored in the node score storage means (13A) in association with the designated degree of interest; graph processing means (19) that deletes a part of the graph on the basis of the set node scores; arc score setting means (14) that sets, on the arcs of the partially deleted graph, arc scores corresponding to the designated degree of interest; node score calculation means (15) that calculates new node scores for the nodes of the partially deleted graph on the basis of the arc scores set on its arcs; and cluster generation means (16) that generates clusters of the graph G on the basis of the new node scores. Because node scores calculated from the arc scores of the arcs between nodes are used, clusters that show the relationships between nodes can be generated.

[Fifth Embodiment]
FIG. 19 is a configuration diagram of the cluster generation device 1E according to the fifth embodiment.

The cluster generation device 1E is connected to a user terminal 2, and a display device 3 is connected to the user terminal 2.

The cluster generation device 1E includes a graph database 11, a pattern database 11A, an input interface generation unit 12, a label storage unit 13, an arc score setting unit 14, a pattern search unit 14B, a node score calculation unit 15, a cluster generation unit 16, a database interface 17, and an output interface generation unit 18.

The pattern database 11A stores patterns for searching for subgraphs in the graph G.

(Description of patterns)
Patterns are described here.

A pattern is similar to a data group that forms a part of the data group (graph G) stored in the graph database 11. Because it can be drawn as a graph, as in FIG. 20, it can be called a graph for convenience; however, a pattern is not itself displayed but is used to search for the graphs that are displayed. A path pattern is a pattern for searching for a path, and a subgraph pattern is a pattern for searching for a subgraph.

A path pattern is, for example, a graph (path) such as the one shown in FIG. 20.

Like a graph, a pattern is actually a data group, but because describing the data item by item would be tedious, patterns are described below in their graph form.

In general, some of the nodes and arcs in a pattern have instances or labels, and the rest do not. Variables are assigned to the nodes and arcs that have no instance or label. As shown in FIG. 20, a variable consists of "?" followed by a word.

Also, where necessary, a class is defined in advance for some of the nodes to which variables are assigned. The class indicates a superordinate concept of the instance held by the node at the same position in a graph retrieved by the pattern.

In the path pattern of FIG. 20, a node whose instance is the keyword KW "semantic web" is at one end, and the class CL "Person" is defined for the node at the other end.

A subgraph retrieved from a graph by such a pattern satisfies the following conditions.

That is, what is retrieved (1) is the graph or a subgraph of it, (2) has exactly the structure of the pattern, (3) has exactly the instances and labels of the pattern, meaning that each node or arc located at the same position as a node or arc that has an instance or label in the pattern has an instance or label equal to it, and (4) where applicable, has, at the same position as each node for which a class is defined in the pattern, a node whose instance is a subordinate concept of that class.

Searching for a subgraph by a pattern in this way is referred to as searching for a subgraph that matches the pattern.
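Conditions (2) to (4) can be illustrated with a small backtracking matcher. The sketch below rests on several assumptions that are not in the specification: arcs are represented as (node, label, node) triples, the class check is simplified to a direct lookup in a hypothetical `instance_class` table instead of a full concept hierarchy, the labels and instances in the usage note are invented, and the matcher does not enforce that distinct pattern arcs map to distinct graph arcs.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (node instance, arc label, node instance)
Binding = Dict[str, str]       # variable name -> matched instance or label


def is_var(term: str) -> bool:
    """A variable is '?' followed by a word, as in the path pattern."""
    return term.startswith("?")


def unify(term: str, value: str, binding: Binding,
          classes: Dict[str, str], instance_class: Dict[str, str]) -> Optional[Binding]:
    """Match one pattern term against one graph value, extending the binding."""
    if not is_var(term):
        return binding if term == value else None      # condition (3)
    if term in binding:
        return binding if binding[term] == value else None
    wanted = classes.get(term)
    if wanted is not None and instance_class.get(value) != wanted:
        return None                                    # condition (4), simplified
    extended = dict(binding)
    extended[term] = value
    return extended


def match(pattern: List[Triple], graph: List[Triple],
          classes: Dict[str, str], instance_class: Dict[str, str],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding under which all pattern arcs occur in the graph."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    (ps, pl, po), rest = pattern[0], pattern[1:]
    for gs, gl, go in graph:
        b = unify(ps, gs, binding, classes, instance_class)
        if b is not None:
            b = unify(pl, gl, b, classes, instance_class)
        if b is not None:
            b = unify(po, go, b, classes, instance_class)
        if b is not None:
            yield from match(rest, graph, classes, instance_class, b)
```

For example, with the hypothetical arcs `("semantic web", "topic_of", "paper1")`, `("paper1", "author", "alice")`, and `("paper1", "author", "acme")`, the pattern `[("semantic web", "?r", "?doc"), ("?doc", "author", "?who")]` with class `Person` defined for `?who` matches only the path ending at `alice`, provided `instance_class` maps `alice` to `Person` and `acme` to another class.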

FIG. 21 is a sequence diagram of the fifth embodiment.

Next, a search is made for patterns that have a node whose instance equals the keyword KW, patterns that have a node whose instance equals the instance of a node for which the class CL is defined in the graph G, and patterns that have both (S11C).

Next, the arc score setting unit 14 sets arc scores on the arcs of the subgraph of the graph G that matches the pattern, in the same way as in step S11 of the first embodiment (S12). Specifically, it sends a query containing the contents of the pattern and the arc scores to the database interface 17 (S12A), and thereby obtains a graph that is a subgraph of the graph G matching the pattern and on which the arc scores are set (S12B).

Then, the same node score calculation as in the first embodiment is performed on that graph (S13). The subsequent processing is the same as in the first embodiment.

According to the fifth embodiment, the device includes: a pattern database 11A that stores patterns used to search for subgraphs of the graph G, in which nodes having instances are connected by arcs having labels; pattern search means (pattern search unit 14B) that searches for a pattern matching a specified condition; arc score setting means (14) that sets, on the arcs of the subgraph matching the retrieved pattern, arc scores corresponding to a designated degree of interest; node score calculation means (15) that calculates node scores for the nodes of the subgraph on the basis of the arc scores; and cluster generation means (16) that generates clusters of the subgraph on the basis of the node scores. Because node scores calculated from the arc scores of the arcs between nodes are used, clusters that show the relationships between nodes can be generated.

FIG. 1 is a configuration diagram of the cluster generation device 1A according to the first embodiment.
FIG. 2 is a diagram illustrating a part of the graph G.
FIG. 3 is a diagram illustrating a part of the data group for displaying the graph G.
FIG. 4 is a diagram showing the storage contents of the label storage unit 13.
FIG. 5 is a sequence diagram of the first embodiment.
FIG. 6 is a diagram illustrating the displayed input interface.
FIG. 7 is a diagram in which a part of the arc scores is applied to the graph.
FIG. 8 is a flowchart of the node score calculation.
FIG. 9 is a diagram showing the formula for calculating the node score.
FIG. 10 is a diagram showing the graph used as an example of the node score calculation.
FIG. 11 is a flowchart for generating one cluster.
FIG. 12 is a diagram showing a state in which two clusters have been generated.
FIG. 13 is a configuration diagram of the cluster generation device 1B according to the second embodiment.
FIG. 14 is a sequence diagram of the second embodiment.
FIG. 15 is a configuration diagram of the cluster generation device 1C according to the third embodiment.
FIG. 16 is a sequence diagram of the third embodiment.
FIG. 17 is a configuration diagram of the cluster generation device 1D according to the fourth embodiment.
FIG. 18 is a sequence diagram of the fourth embodiment.
FIG. 19 is a configuration diagram of the cluster generation device 1E according to the fifth embodiment.
FIG. 20 is a diagram illustrating a path pattern.
FIG. 21 is a sequence diagram of the fifth embodiment.

Explanation of symbols

1A, 1B, 1C, 1D, 1E: Cluster generation device
11: Graph database
11A: Pattern database
13: Label storage unit
13A: Node score storage unit
14: Arc score setting unit
14A: Node score setting unit
15: Node score calculation unit
16: Cluster generation unit
19: Graph processing unit

Claims

A cluster generation device comprising:
arc score setting means for setting, on the arcs of a graph in which nodes having instances are connected by arcs having labels, arc scores corresponding to a specified degree of interest;
node score calculation means for calculating node scores for the nodes of the graph on the basis of the arc scores; and
cluster generation means for generating, on the basis of the calculated node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation device comprising:
node score storage means in which node scores are stored in advance in association with the corresponding degree of interest, the node scores having been calculated for the nodes of a graph, in which nodes having instances are connected by arcs having labels, on the basis of arc scores set on the arcs of the graph according to a specified degree of interest;
node score setting means for setting on the graph the node scores stored in the node score storage means in association with the designated degree of interest; and
cluster generation means for generating, on the basis of the set node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation device comprising:
arc score setting means for setting, on the arcs of a graph in which nodes having instances are connected by arcs having labels, arc scores corresponding to a specified degree of interest;
first node score calculation means for calculating node scores for the nodes of the graph on the basis of the arc scores;
graph processing means for deleting a part of the graph on the basis of the node scores;
second node score calculation means for calculating new node scores for the nodes of the partially deleted graph on the basis of the arc scores set on its arcs; and
cluster generation means for generating, on the basis of the new node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation device comprising:
node score storage means in which node scores are stored in advance in association with the corresponding degree of interest, the node scores having been calculated for the nodes of a graph, in which nodes having instances are connected by arcs having labels, on the basis of arc scores set on the arcs of the graph according to a specified degree of interest;
node score setting means for setting on the graph the node scores stored in the node score storage means in association with the designated degree of interest;
graph processing means for deleting a part of the graph on the basis of the set node scores;
arc score setting means for setting, on the arcs of the partially deleted graph, arc scores corresponding to the designated degree of interest;
node score calculation means for calculating new node scores for the nodes of the partially deleted graph on the basis of the arc scores set on its arcs; and
cluster generation means for generating, on the basis of the new node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation device comprising:
a pattern database that stores patterns used to search for subgraphs of a graph in which nodes having instances are connected by arcs having labels;
pattern search means for searching for a pattern that matches a specified condition;
arc score setting means for setting, on the arcs of the subgraph that matches the retrieved pattern, arc scores corresponding to a specified degree of interest;
node score calculation means for calculating node scores for the nodes of the subgraph on the basis of the arc scores; and
cluster generation means for generating, on the basis of the node scores, clusters that are subgraphs of the subgraph and do not overlap with the remaining part,
wherein the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

The cluster generation device according to any one of claims 1, 3, 4, and 5, further comprising label storage means for storing labels in advance in association with the item of interest to which the specified degree of interest corresponds,
wherein the arc score setting means sets the arc score corresponding to the designated degree of interest on each arc having a label that the label storage means associates with the item of interest corresponding to that degree of interest.

A cluster generation method in which:
the arc score setting means of the cluster generation device according to claim 1 sets, on the arcs of the graph, arc scores corresponding to a designated degree of interest;
the node score calculation means of the cluster generation device calculates node scores for the nodes of the graph on the basis of the arc scores; and
the cluster generation means of the cluster generation device generates, on the basis of the calculated node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein, when generating the clusters, the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation method in which:
the node score setting means of the cluster generation device according to claim 2 sets on the graph the node scores stored in the node score storage means in association with a designated degree of interest; and
the cluster generation means of the cluster generation device generates, on the basis of the set node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein, when generating the clusters, the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation method in which:
the arc score setting means of the cluster generation device according to claim 3 sets, on the arcs of the graph, arc scores corresponding to a designated degree of interest;
the first node score calculation means of the cluster generation device calculates node scores for the nodes of the graph on the basis of the arc scores;
the graph processing means of the cluster generation device deletes a part of the graph on the basis of the node scores;
the second node score calculation means of the cluster generation device calculates new node scores for the nodes of the partially deleted graph on the basis of the arc scores set on its arcs; and
the cluster generation means of the cluster generation device generates, on the basis of the new node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein, when generating the clusters, the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation method in which:
the node score setting means of the cluster generation device according to claim 4 sets on the graph the node scores stored in the node score storage means in association with a designated degree of interest;
the graph processing means of the cluster generation device deletes a part of the graph on the basis of the set node scores;
the arc score setting means of the cluster generation device sets, on the arcs of the partially deleted graph, arc scores corresponding to the designated degree of interest;
the node score calculation means of the cluster generation device calculates new node scores for the nodes of the partially deleted graph on the basis of the arc scores set on its arcs; and
the cluster generation means of the cluster generation device generates, on the basis of the new node scores, clusters that are subgraphs of the graph and do not overlap with the remaining part,
wherein, when generating the clusters, the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.

A cluster generation method in which:
the pattern search means of the cluster generation device according to claim 5 searches the pattern database for a pattern that matches a specified condition;
the arc score setting means of the cluster generation device sets, on the arcs of the subgraph that matches the retrieved pattern, arc scores corresponding to a specified degree of interest;
the node score calculation means of the cluster generation device calculates node scores for the nodes of the subgraph on the basis of the arc scores; and
the cluster generation means of the cluster generation device generates, on the basis of the node scores, clusters that are subgraphs of the subgraph and do not overlap with the remaining part,
wherein, when generating the clusters, the cluster generation means
calculates the number of arcs between each starting node, preset to be included in a respective one of a plurality of clusters, and one candidate node for inclusion in the clusters, and, if exactly one cluster corresponds to the largest number of arcs, includes the candidate node in that cluster, and,
if a plurality of clusters correspond to the largest number of arcs, calculates the difference between the node score of the starting node of each of those clusters and the node score of the candidate node, and, if exactly one cluster corresponds to the smallest difference, includes the candidate node in that cluster.