JP3652913B2

JP3652913B2 - Correlation analyzer and recording medium

Info

Publication number: JP3652913B2
Application number: JP6593499A
Authority: JP
Inventors: 直輝赤星; 一隆荻原; 理一郎武
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-03-12
Filing date: 1999-03-12
Publication date: 2005-05-25
Anticipated expiration: 2019-03-12
Also published as: JP2000259612A

Description

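[0001]
[Technical Field of the Invention]
The present invention relates to the processing of large volumes of data recorded in a database or the like. More particularly, it relates to a correlation analyzer and a recording medium that perform data mining, which generates association rules between the attributes contained in the data, together with statistical processing.
[0002]
[Prior Art]
A conventional example is described below.
§1: Data mining and correlation analysis
Data mining is a technique for discovering useful information in large amounts of data stored in a database. A representative example of data mining is correlation analysis. Correlation analysis exhaustively examines the correlations among the attribute values of the data and extracts regularities between attributes as association rules.

An efficient method for this has been proposed in two documents:
- Reference 1 (R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules").
- The corresponding Japanese Patent Application Laid-Open No. 8-263346 ("System and method for mining sequential patterns in a large-scale database").

Details of Reference 1 are as follows.
[0003]
Reference 1: R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proceedings of the 20th International Conference on Very Large Data Bases, Morgan Kaufmann, September 1994, pp. 487-499, ISBN 1-55860-153-8.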
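[0004]
§2: Association rules (see Fig. 8A)
Fig. 8 is an explanatory diagram of the conventional example (part 1), and Fig. 8A illustrates items and transactions. According to Reference 1, association rules are defined as follows. Let the set of items be I = {i1, i2, ..., im}, and let the set of transactions be D = {t1, t2, ..., tn}.
[0005]
Consider retail sales data, for example. As shown in Fig. 8A, each receipt (any one of receipts 1, 2, ..., n) corresponds to a transaction t (t1, t2, ..., tn). Each purchased product on the receipt (sales items A, B, C, ..., M, etc.) corresponds to an item i. For a given item combination X, if s% of the transactions contain X, the support of X is defined as s%.
[0006]
An association rule is written X → Y and has two values: the confidence of the rule and the support of the rule.
- The support of the rule is the support of X ∪ Y, that is, the fraction of transactions that contain both X and Y.
- The confidence of the rule is defined as (support of X ∪ Y) / (support of X).

For example, the association rule "bread → butter, confidence 30%, support 8%" means the following: 30% of the customers who bought bread also bought butter, and customers who bought both bread and butter account for 8% of the total.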
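The two rule measures defined above can be illustrated with a minimal Python sketch. It assumes each transaction is modeled as a set of item names. The function names `support` and `confidence` and the receipts are hypothetical illustrations; neither Reference 1 nor this patent prescribes any code.

```python
def support(itemset: set[str], transactions: list[set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: set[str], y: set[str], transactions: list[set[str]]) -> float:
    """Confidence of the rule X -> Y, i.e. support(X ∪ Y) / support(X)."""
    sx = support(x, transactions)
    # Guard: a rule whose antecedent never occurs gets confidence 0.
    return support(x | y, transactions) / sx if sx else 0.0


# Hypothetical receipts, one set of purchased items per transaction.
receipts = [
    {"bread", "butter"},
    {"bread"},
    {"bread", "milk"},
    {"milk"},
]
print(support({"bread", "butter"}, receipts))       # 0.25
print(confidence({"bread"}, {"butter"}, receipts))  # 1 of the 3 bread buyers also bought butter
```

In this sketch, `support(X ∪ Y)` counts only transactions containing both sides of the rule, which matches the definition of rule support given above.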
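[0007]
Trying to obtain every association rule contained in the transactions is known to invite combinatorial explosion. The processing then requires much time and large amounts of computer resources such as memory and disk. For this reason, the usual approach is for the user to specify a minimum support and a minimum confidence, and to obtain only the association rules that satisfy these conditions.
[0008]
In other words, correlation analysis is the process of obtaining, from a set of transactions, all association rules that satisfy the support and confidence specified by the user. It consists of two processes:
- A "counting process," which obtains from the original data the occurrence counts of all item combinations that satisfy the given minimum support.
- A "rule generation process," which uses the obtained item combinations and their occurrence counts to generate all association rules that satisfy the given minimum confidence.
[0009]
The former counting process is known to be time-consuming. To avoid generating unnecessary combinations as far as possible, Reference 1 proceeds as follows:
- Step 1: count the item combinations of length 1 and keep those that satisfy the given minimum support.
- Step 2: based on that result, count the item combinations of length 2 that satisfy the minimum support.
- Step 3: based on that result, count the item combinations of length 3 that satisfy the minimum support.
- Step k: increase the length k and repeat until no item combination satisfies the minimum support.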
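The level-wise counting described in Reference 1 can be sketched in Python as follows. This is a minimal, non-authoritative version of the Apriori-style candidate generation (join, then prune on subsets). The function name `frequent_itemsets` and the toy transactions are illustrative assumptions, not taken from the patent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise counting in the style of Reference 1.

    `transactions` is a list of item sets and `min_support` a fraction in
    [0, 1]. Returns {frozenset(itemset): count} for every itemset whose
    support meets the minimum.
    """
    min_count = min_support * len(transactions)

    # Step 1: count every length-1 itemset in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)

    k = 2
    while level:
        # Candidates of length k: unions of two frequent (k-1)-itemsets whose
        # (k-1)-subsets are all frequent (nothing else can meet min_support).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Step k: one more scan of the data counts every candidate.
        counts = {cand: 0 for cand in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        result.update(level)
        k += 1
    return result


toy = [{"A", "B"}, {"A", "D"}, {"A", "B"}, {"C"}]
print(frequent_itemsets(toy, 0.5))
```

Each iteration of the `while` loop is one "step k" of the text, and each requires a full scan of the transactions.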
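[0010]
Here, the set of all item combinations that satisfy the given minimum support is denoted L. For each such item combination, L holds the pair "(item combination, count)," where the count is the number of occurrences.
[0011]
As shown in Reference 1, once L has been obtained, association rules are easily generated by computing conditional probabilities according to the definition above. For example, if L contains "AB," the rules A → B and B → A are obtained. It then suffices to compute the confidence of each rule and check whether it satisfies the minimum specified by the user.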
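Rule generation from L can be sketched as follows, assuming L is a dictionary from itemsets (frozensets) to occurrence counts. The name `generate_rules` is hypothetical. Confidence is computed as the conditional probability count(X ∪ Y) / count(X) described above.

```python
from itertools import combinations


def generate_rules(L, n_transactions, min_confidence):
    """Generate the rules X -> Y implied by the frequent itemsets in L.

    `L` maps frozenset itemsets to occurrence counts. Because every subset
    of a frequent itemset is also frequent, the count of each antecedent X
    is already in L, so no further data scan is needed.
    Returns (X, Y, support, confidence) tuples.
    """
    rules = []
    for itemset, count in L.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / L[lhs]
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs),
                                  count / n_transactions, conf))
    return rules


L = {frozenset({"A"}): 3, frozenset({"B"}): 2, frozenset({"A", "B"}): 2}
for rule in generate_rules(L, n_transactions=4, min_confidence=0.5):
    print(rule)
```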
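[0012]
However, the correlation analysis methods proposed so far consider only the calculation of association rules. They do not consider performing statistical processing together with that calculation.
[0013]
§3: Correlation analysis by a specific example (see Figs. 8B and 9)
Fig. 8B shows example data. Fig. 9 is an explanatory diagram of the conventional example (part 2), and Fig. 9A illustrates the apparatus. Suppose that, for a rule obtained by correlation analysis, statistics on another attribute are required over all transactions that contain the items of the rule. The conventional processing for this is inefficient, as the following specific example shows.
[0014]
Suppose, for example, that the data shown in Fig. 8B are given. In this data:
- "Date" is the date on which the products were sold.
- "Temperature" is the maximum temperature [°C] on that date.
- "Sales time" (24-hour clock) is the time slot in which the item was sold.
[0015]
If the user specifies a minimum support of 50% and a minimum confidence of 50% for this data, the conventional method generates the following association rules.
[0016]
A → B: support = 50%, confidence = 66.7%
B → A: support = 50%, confidence = 100%
Next, the average temperature at the time the items in these rules were sold is obtained as follows:
1. "AB" is extracted from the generated rules.
2. The original database is read again to check whether each transaction contains "AB."
3. "AB" is found in date 1 and date 3, so their temperatures, 20 °C and 30 °C, are retrieved.
4. Their average, (20 + 30) / 2 = 25, is computed and output together with the association rules as follows.
[0017]
A → B: support = 50%, confidence = 66.7%, average temperature = 25 °C
B → A: support = 50%, confidence = 100%, average temperature = 25 °C
A conventional correlation analyzer is, for example, the apparatus shown in Fig. 9. It is implemented on any of various computers, such as a personal computer or a workstation. It comprises:
- a database 1;
- a correlation analysis processing unit 2 that performs correlation analysis;
- a work memory 3;
- a result file 4;
- a statistical processing unit 5 that performs statistical processing.
[0018]
When correlation analysis is performed with this apparatus, processing proceeds in two stages.
- First, data are read from the database 1 and sent to the correlation analysis processing unit 2. This unit performs the correlation analysis and stores the resulting association rules in the work memory 3. The contents of the work memory 3 are then stored in the result file 4.
- Second, the statistical processing unit 5 uses the association rules stored in the result file 4 to retrieve the necessary data from the database 1. It performs statistical processing and stores the results in the result file 4.

In this way, the result file 4 ends up holding both the association rules and the statistics.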
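The conventional second pass can be sketched in Python to make its cost visible. After mining, every rule triggers a rescan of every transaction. This is an illustrative model only: the function name and data layout are assumptions, and the temperatures of dates 2 and 4 are not stated in the text, so `None` is used as a placeholder that is never read.

```python
from statistics import mean


def conventional_rule_statistics(rules, transactions, attribute):
    """Prior-art second pass (Fig. 9): rescan the data once per rule.

    `rules` is a list of (lhs, rhs) item sets and `transactions` a list of
    (items, attributes) pairs. For each rule, the mean of `attribute` over
    the transactions containing all of the rule's items is returned. With
    m transactions and n rules this performs m * n containment checks.
    """
    results = []
    for lhs, rhs in rules:                          # n rules
        needed = lhs | rhs
        values = [attrs[attribute]
                  for items, attrs in transactions  # m transactions each
                  if needed <= items]
        results.append(mean(values) if values else None)
    return results


# Temperatures of dates 2 and 4 are not stated in the text; None marks them.
db = [
    ({"A", "B"}, {"temperature": 20}),    # date 1
    ({"A", "D"}, {"temperature": None}),  # date 2
    ({"A", "B"}, {"temperature": 30}),    # date 3
    ({"C"}, {"temperature": None}),       # date 4
]
rules = [({"A"}, {"B"}), ({"B"}, {"A"})]
print(conventional_rule_statistics(rules, db, "temperature"))  # [25, 25]
```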
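[0019]
[Problems to be Solved by the Invention]
The conventional apparatus described above has the following problem.
[0020]
In conventional correlation analysis, the correlation analysis processing unit 2 generates association rules, which are stored in the result file 4. Every transaction containing the items of such a rule is retrieved from the database 1. The statistical processing unit 5 then performs statistical processing on the relevant attribute of those transactions, and the result is output (for example, displayed) together with the association rules.
[0021]
For example, if there are m transactions and n rules in total, m × n combinations must be matched before the relevant attribute can be processed statistically. The statistical processing unit 5 must therefore scan the data in the database 1 a second time, in addition to the scan made by the correlation analysis processing unit 2. This makes the processing inefficient.
[0022]
An object of the present invention is to solve this conventional problem. Specifically, it aims to generate statistics on attribute values efficiently for the transactions that contain the items of the generated rules. A further object is to allow association rules to be narrowed down by statistics on attribute values, in addition to support and confidence.
[0023]
[Means for Solving the Problems]
Fig. 1 illustrates the principle of the present invention. To achieve the above objects, the present invention is configured as follows.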
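[0024]
(1): A correlation analyzer that applies correlation analysis to given data (for example, the data in the database 1) to generate association rules, performs statistical processing, and outputs the association rules together with the results of the statistical processing. The analyzer comprises:
- a storage unit that stores attribute data on a per-item basis; and
- counting/statistical processing means 6.

When item occurrences are counted in the correlation analysis, the counting/statistical processing means 6 performs a counting process. Each time an item appears, the occurrence count of the item is incremented, and attribute data corresponding to the item are accumulated in the storage unit.

After the counting process is complete, the means 6:
- extracts as association rules the items that satisfy a predetermined condition, based on the occurrence count obtained for each item;
- extracts from the storage unit the attribute data corresponding to the items forming those association rules;
- outputs statistical information based on the extracted attribute data.
[0025]
(2): In the correlation analyzer of (1), the counting/statistical processing means 6 can perform statistical processing on a plurality of attribute values simultaneously. It then obtains the specified statistical values together with the association rules.
[0026]
(3): In the correlation analyzer of (1), the counting/statistical processing means 6 obtains statistics together with the association rules. It does so for the rules whose support and confidence satisfy the minimum support and minimum confidence specified by the user.
[0027]
(4): In the correlation analyzer of (1), the counting/statistical processing means 6 obtains statistics together with the association rules. It does so for the rules whose attribute statistics satisfy a condition specified by the user (a narrowing-down, or filtering, function).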
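[0029]
(Operation)
The operation of the present invention based on the above configuration is described with reference to Fig. 1.
(a): The counting/statistical processing means 6 searches the database 1 and performs correlation analysis, counting the occurrences of items. While doing so, it retains attribute values from the original data (the data retrieved from the database 1) together with the occurrence counts of the item combinations. In a product-sales setting, such attributes include the maximum temperature or the sales time when the product was sold. Statistics (for example, product sales quantities) are thereby obtained together with the association rules.
[0030]
Correlation analysis and statistical processing are thus performed together. The association rules are generated, and the rules and the statistics are stored in the result file 4. Statistical processing of attribute values can therefore be performed efficiently at the same time as the association rules are generated.
[0031]
(b): The counting/statistical processing means 6 performs statistical processing on a plurality of attribute values simultaneously. It obtains the specified statistical values together with the association rules. Statistics on several attributes can thus be computed at the same time as the rules are generated, which improves processing efficiency.
[0032]
(c): The counting/statistical processing means 6 obtains statistics together with the association rules, for the rules whose support and confidence satisfy the minimum support and minimum confidence specified by the user. Statistics on attribute values are thus obtained during correlation analysis, at the same time as the rules meeting the user's minimum support and minimum confidence are generated. This improves processing efficiency.
[0033]
(d): The counting/statistical processing means 6 obtains statistics together with the association rules, for the rules whose attribute statistics satisfy a condition specified by the user. During correlation analysis, only the rules whose statistics on the user-specified attributes meet that condition are obtained, together with those statistics (narrowing-down or filtering). This improves processing efficiency.
[0034]
(e): A program stored on a recording medium is read and executed (for example, by the CPU in the correlation analyzer). The program executes a procedure in which item occurrences are counted in correlation analysis while the attribute values in the original data are retained together with the occurrence counts of the item combinations. Statistics are then obtained together with the association rules. Statistical processing of attribute values can thus be performed efficiently at the same time as the association rules are generated.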
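[0035]
[Embodiment of the Invention]
An embodiment of the invention is described in detail below with reference to the drawings.
[0036]
§1: Configuration of the correlation analyzer and overview of processing (see Fig. 2)
Fig. 2 illustrates the apparatus. The apparatus shown in Fig. 2 is one example of a correlation analyzer for performing correlation analysis, and it can be implemented on any computer, such as a personal computer or a workstation. It comprises an apparatus main body, together with a database 1 and a result file 4 connected to the main body. The main body includes:
- a combination generation unit 11;
- a counting/statistical processing unit 12;
- a work memory 3;
- an association rule generation unit 13;
- other components.
[0037]
The database 1 and the result file 4 may be stored in separate storage means or in the same storage means, for example on the disk medium of a hard disk drive. The combination generation unit 11, the counting/statistical processing unit 12, and the association rule generation unit 13 are implemented as a program executed by a CPU (not shown) in the main body.
[0038]
In this apparatus, data flow as follows:
1. The combination generation unit 11 generates combinations from the data in the database 1.
2. The counting/statistical processing unit 12 counts these combinations, performs statistical processing, and stores the resulting data in the work memory 3.
3. The association rule generation unit 13 reads the data from the work memory 3, generates association rules, and stores them in the result file 4.
4. The contents of the result file 4 (association rules plus statistics) are output, for example on a display.

The result file 4 thus holds both the association rules and the statistics.
[0039]
This correlation analyzer applies correlation analysis to the data in the database 1 and, while generating the association rules, also computes statistics on the specified attributes. Unlike conventional correlation analysis, it performs statistical processing of attribute values in addition to counting item occurrences. The second read of the database 1 and the matching step that were previously required therefore become unnecessary.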
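[0040]
§2: Description of processing with flowcharts
(1): Overall processing (see Fig. 3)
Fig. 3 is a flowchart of the overall processing, described below. S1 to S8 denote the processing steps. The processing is indexed by the length k of the item combinations. Starting from k = 1, the large itemsets of length k are counted. If at least k + 1 results are generated, processing continues with k = k + 1; otherwise processing ends (termination condition). Specifically:
[0041]
First, the parameter k is set to 1 (S1), and combinations of length k are generated and statistically processed (S2). Here, the combination generation unit 11 searches the data in the database 1 and generates combinations of length k, and the counting/statistical processing unit 12 performs the statistical processing.
[0042]
The counting/statistical processing unit 12 then determines whether the number of counting results is at least k + 1 (S3). If it is, k is set to k + 1 (S4) and processing repeats from S2. Once the number of counting results falls below k + 1, the counting/statistical processing unit 12 stores the resulting data in the work memory 3. The association rule generation unit 13 then performs the association rule generation process based on the data in the work memory 3 (S5).
[0043]
The association rule generation unit 13 then determines whether the association rule generation process has finished (S6). If it has not, the unit determines whether the rule satisfies the confidence condition given by the user (S7). If the condition is not satisfied, processing repeats from S5.
[0044]
If the confidence condition is satisfied in S7, the association rule generation unit 13 outputs the generated rule to the result file 4 (S8) and repeats from S5. If the association rule generation process has finished in S6, all processing ends.
[0045]
Through this processing, the association rules and statistics are stored in the result file 4. The contents of the result file 4 can then be shown, for example, on a display device (not shown) to inform the operator.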
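The counting loop of Fig. 3 (S1 to S4) can be sketched in Python, assuming the Fig. 4 counting is inlined into step S2. The names are hypothetical, and rule generation (S5 to S8) is omitted. As in the worked examples later in the text, length-k candidates are formed only from items that survived the previous level. The termination test in S3 relies on the fact that a length-(k+1) itemset has k + 1 subsets of length k, all of which must be frequent.

```python
from itertools import combinations


def mine_with_statistics(transactions, min_count, attr):
    """Sketch of the Fig. 3 loop (S1-S4) combined with the Fig. 4 counting.

    `transactions` is a list of (items, attributes) pairs and `min_count`
    the minimum support expressed as a number of transactions. Returns
    {itemset: (count, attribute_values)}; values are kept only for k >= 2.
    """
    result = {}
    usable_items = None          # None means "all items" for k = 1
    k = 1                        # S1
    while True:
        # S2: generate and count length-k combinations, keeping attributes.
        table = {}
        for items, attrs in transactions:
            pool = items if usable_items is None else items & usable_items
            for combo in combinations(sorted(pool), k):
                entry = table.setdefault(frozenset(combo), [0, []])
                entry[0] += 1
                if k >= 2:
                    entry[1].append(attrs[attr])
        level = {s: (c, v) for s, (c, v) in table.items() if c >= min_count}
        result.update(level)
        # S3: a length-(k+1) itemset needs k + 1 frequent length-k subsets.
        if len(level) < k + 1:
            return result
        usable_items = set().union(*level)   # items that can still extend
        k += 1                               # S4


db = [
    ({"A", "B"}, {"temperature": 20}),    # date 1
    ({"A", "D"}, {"temperature": None}),  # date 2 (value not given in the text)
    ({"A", "B"}, {"temperature": 30}),    # date 3
    ({"C"}, {"temperature": None}),       # date 4 (value not given in the text)
]
found = mine_with_statistics(db, min_count=2, attr="temperature")
print(found)
```

A minimum support of 50% over 4 transactions corresponds to `min_count=2`.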
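[0046]
(2): Generation of combinations of length k and statistical processing (see Fig. 4)
Fig. 4 is a flowchart of generating combinations of length k and the associated statistical processing. This processing (step S2 above) is described in detail below with reference to Fig. 4. S10 to S16 denote the processing steps.
[0047]
In this processing, candidate item combinations are generated from the transactions and counted. After the item combinations of all transactions have been counted, each combination is checked against the given minimum support. For the item combinations that satisfy it, statistical processing is performed on the specified attribute values. Specifically:
[0048]
First, the combination generation unit 11 searches the database 1, reads the first transaction (S10), and generates combinations from it. The counting/statistical processing unit 12 then counts the generated combinations. If k ≥ 2 (k is 2 or more), it also registers the statistical information as a list in the work memory 3 (S11). When k is less than 2, that is, k = 1, no statistics are required, so statistical processing is omitted.
[0049]
Steps S10 and S11 are repeated until the transaction list is empty (S12). When the list is empty, the counting/statistical processing unit 12 takes out a counted combination (S13) and determines whether it satisfies the minimum support (S14). For a combination that does, statistics are computed from the list and output to the work memory 3 together with the combination (S15).
[0050]
The counting/statistical processing unit 12 then determines whether any combination remains unprocessed (S16). If one does, processing repeats from S13. If the minimum support is not satisfied in S14, S15 is skipped and processing proceeds to S16. Combinations of length k are generated and statistically processed in this way.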
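Steps S13 to S16 can be sketched as a function that runs after all transactions have been counted. It assumes the work memory is a dictionary from itemsets to `(count, attribute_values)` and that the mean is the requested statistic. The function name and the `{"X", "Y"}` entry are hypothetical.

```python
from statistics import mean


def finalize_level(table, n_transactions, min_support, stat=mean):
    """S13-S16 of Fig. 4, run after every transaction has been counted.

    `table` maps each counted combination (a frozenset) to
    (count, attribute_values). Combinations meeting `min_support` are
    returned as (itemset, count, statistic) triples, i.e. L' entries.
    The statistic is None for k = 1, where no attribute list is kept.
    """
    l_prime = []
    for itemset, (count, values) in table.items():   # S13; the loop is S16
        if count / n_transactions < min_support:      # S14
            continue
        statistic = stat(values) if values else None  # S15
        l_prime.append((itemset, count, statistic))
    return l_prime


# Hypothetical work-memory contents after counting length-2 combinations.
table = {
    frozenset({"A", "B"}): (2, [20, 30]),
    frozenset({"X", "Y"}): (1, [25]),   # below the minimum support
}
print(finalize_level(table, n_transactions=4, min_support=0.5))
```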
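[0051]
(3): Supplementary explanation of the processing
The steps that count item combinations of length 2 or more (k ≥ 2) are S10 to S12. In these steps, the work memory 3 holds, in list form:
- (i) the item combination (as in the conventional method);
- (ii) the count of the item combination (as in the conventional method);
- (iii) the values of another attribute of the transactions that contain the item combination (new in this invention).

The count is the number of times the item combination has appeared in the transactions. The attribute list holds the values of the specified attribute.
[0052]
Once steps S10 to S12 have finished and the item combinations of length k have been counted, each counted item combination is checked against the user-specified minimum support (see S14). For the item combinations that satisfy it, the specified statistical processing is applied to the attribute values stored in the work memory 3 (see S15). The resulting "(item combination, occurrence count, statistical value)" entries are output to the work memory 3 as L′.
[0053]
In later processing, the association rule generation unit 13 generates association rules from L′. Each rule satisfying the user-specified minimum confidence is output together with its statistical value.
[0054]
Statistical processing is not limited to a single attribute. Statistics on several attributes can be computed together with the correlation analysis. In that case, the work memory 3 holds the conventional (i) item combination and (ii) count of the item combination. It also holds (iii) the values of each of the other attributes of the transactions that contain the item combination.
[0055]
The processing is the same as for a single attribute. When the steps have finished and the item combinations of length k have been counted, each counted combination is checked against the user-specified minimum support (see S14).
[0056]
For the item combinations that satisfy the minimum support, the specified statistical processing is applied to the several attribute values stored for k ≥ 2 (see S15). A different statistical operation can be applied to each attribute. The result, "(item combination, occurrence count, statistical value 1, statistical value 2, ...)," is output as L″.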
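Building L″ with a separate statistical operation per attribute can be sketched as follows. The dictionary layout, the function name, and the choice of `mean` for temperature and `max` for sales time are illustrative assumptions; the patent says only that different operations may be applied. The `{"X", "Y"}` entry is hypothetical.

```python
from statistics import mean


def finalize_multi(table, n_transactions, min_support, stats):
    """Build L'' entries (itemset, count, statistic 1, statistic 2, ...).

    `table` maps each itemset to (count, {attribute: [values]}); `stats`
    maps attribute names to the function applied to them, so different
    attributes can receive different statistical operations.
    """
    l_double_prime = []
    for itemset, (count, columns) in table.items():
        if count / n_transactions >= min_support:
            values = tuple(fn(columns[name]) for name, fn in stats.items())
            l_double_prime.append((itemset, count) + values)
    return l_double_prime


# Hypothetical work memory holding two attributes per combination.
table = {
    frozenset({"A", "B"}): (2, {"max_temp": [20, 30], "sales_time": [10, 15]}),
    frozenset({"X", "Y"}): (1, {"max_temp": [25], "sales_time": [12]}),
}
stats = {"max_temp": mean, "sales_time": max}   # e.g. mean temperature, latest hour
print(finalize_multi(table, 4, 0.5, stats))
```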
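[0057]
(4): Narrowing down by statistics (see Fig. 5)
Fig. 5 is a flowchart of the narrowing-down process based on statistics, described below. S21 to S28 denote the processing steps.
[0058]
Conventionally, association rules are narrowed down by support and confidence. In addition to this, an association rule is generated only when the statistic computed for one or more attributes falls within a specified range. This narrowing-down process is described with reference to the flowchart in Fig. 5.
[0059]
First, the combination generation unit 11 searches the database 1, reads the first transaction (S21), and generates combinations from it. The counting/statistical processing unit 12 then counts the generated combinations. If k ≥ 2, it also registers the statistical information as a list in the work memory 3 (S22).
[0060]
Steps S21 and S22 are repeated until the transaction list is empty (S23). When the list is empty, the counting/statistical processing unit 12 takes a counted combination out of the work memory 3 (S24) and determines whether it satisfies the minimum support (S25). For a combination that does, statistics are computed from the list, and the unit determines whether the statistics satisfy the specified condition (S26).
[0061]
If the statistics satisfy the condition, the association rule and the statistics are output together to the work memory 3 (S27). The counting/statistical processing unit 12 then determines whether any combination remains unprocessed (S28). If one does, processing repeats from S24. If the minimum support is not satisfied in S25, or the statistics do not satisfy the condition in S26, processing proceeds directly to S28. Narrowing down by statistics is performed in this way.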
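The Fig. 5 variant differs from Fig. 4 only by the extra test in S26. A minimal sketch, assuming the user's condition is passed as a predicate `accept` (a hypothetical parameter) and that the table holds only k ≥ 2 entries, so every attribute list is non-empty:

```python
from statistics import mean


def finalize_with_filter(table, n_transactions, min_support, accept, stat=mean):
    """S24-S28 of Fig. 5: like Fig. 4, but an entry is kept only when its
    statistic also satisfies the user's condition `accept` (S26)."""
    kept = []
    for itemset, (count, values) in table.items():   # S24; the loop is S28
        if count / n_transactions < min_support:      # S25
            continue
        statistic = stat(values)
        if accept(statistic):                         # S26
            kept.append((itemset, count, statistic))  # S27
    return kept


table = {frozenset({"A", "B"}): (2, [20, 30])}   # hypothetical work memory
in_range = finalize_with_filter(table, 4, 0.5, accept=lambda v: 20 <= v <= 30)
too_strict = finalize_with_filter(table, 4, 0.5, accept=lambda v: v >= 30)
print(len(in_range), len(too_strict))  # 1 0
```

Passing the condition as a function lets a range check, a threshold, or a test over several statistics share the same loop.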
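[0062]
(5): Other remarks
Suppose statistics are to be computed on another attribute of the transactions that contain the items of association rules generated as above, and that the association rules have already been generated. Because the items contained in the rules are then known, the process of obtaining the occurrence count of each item is unnecessary.
[0063]
The maximum number of items in an association rule is also easy to determine. It therefore suffices to count and statistically process combinations only up to that maximum length, and only for the transactions in which the items of the rules appear. Moreover, instead of proceeding length by length, the combinations of all lengths from 2 up to that maximum can in some cases be counted and statistically processed in a single pass, which speeds up processing.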
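The single-pass shortcut for already-known rules can be sketched as follows. It assumes the rule itemsets are given as a non-empty list of item sets and the transactions as `(items, attributes)` pairs. The function name is hypothetical, and enumerating every combination of the relevant items is a simple illustrative choice, not the patent's prescribed method.

```python
from itertools import combinations


def one_pass_rule_statistics(rule_itemsets, transactions, attr):
    """Sketch of [0062]-[0063]: when the rules are already known, item
    frequencies need not be recounted. A single scan collects, for every
    rule itemset, the attribute values of the transactions containing it,
    enumerating lengths 2..max_len at once instead of level by level."""
    targets = {frozenset(s) for s in rule_itemsets}
    rule_items = set().union(*targets)
    max_len = max(len(s) for s in targets)
    collected = {s: [] for s in targets}
    for items, attrs in transactions:                 # one scan of the data
        relevant = sorted(items & rule_items)         # ignore non-rule items
        for k in range(2, min(max_len, len(relevant)) + 1):
            for combo in combinations(relevant, k):
                key = frozenset(combo)
                if key in collected:
                    collected[key].append(attrs[attr])
    return collected


db = [
    ({"A", "B"}, {"temperature": 20}),    # date 1
    ({"A", "D"}, {"temperature": None}),  # date 2 (value not given in the text)
    ({"A", "B"}, {"temperature": 30}),    # date 3
    ({"C"}, {"temperature": None}),       # date 4 (value not given in the text)
]
print(one_pass_rule_statistics([{"A", "B"}], db, "temperature"))
```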
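[0064]
As described above, statistical processing (statistical operations) is applied at the same time as the association rules are generated. This has three benefits:
- No separate data scan is needed for the statistical processing.
- The cost of the matching process is reduced; with m data records and n rules, that process otherwise requires m × n comparisons.
- Association rules can be narrowed down based on attribute values, in addition to support and confidence.
[0065]
§3: Specific processing examples (see Fig. 6)
Fig. 6 illustrates the processing. Fig. 6A shows an example of counting results. Fig. 6B shows a first example of counting that includes attribute values, and Fig. 6C a second such example. Specific processing examples are described below with reference to Fig. 6.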
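[0066]
(1): Example 1
Example 1 uses the conventional example data shown in Fig. 8B. The user specifies a minimum support of 50% and a minimum confidence of 50%, and also instructs that the average maximum temperature be obtained.
[0067]
(Process 1)
First, the counting/statistical processing unit 12 obtains the occurrence count of each item, in the same way as the conventional method. From the data for date 1, sales items A and B are counted. The occurrence count (count) of A is held in the work memory 3 as 1, and likewise the count of B is held as 1.
[0068]
From the data for date 2, items A and D are counted: the count of A is incremented to 2, and the count of D is set to 1. Dates 3 and 4 are counted in the same way. As a result, A appears 3 times, B twice, C once, and D once. The counting results of (Process 1) can be held in the work memory 3 as shown in Fig. 6A.
[0069]
That is, the work memory 3 holds the counting results item A: 3, item B: 2, item C: 1, item D: 1. Since the user's minimum support is 50%, only the items that appear in at least 2 of the 4 transactions need be kept. Only A and B satisfy this condition.
[0070]
(Process 2)
Next, the counting/statistical processing unit 12 counts item combinations of length 2. Since (Process 1) yielded only sales items A and B, only A and B are used to generate combinations of length 2. For date 1, the combination AB is generated, its occurrence count is set to 1, and the maximum temperature 20 is held in the work memory 3.
[0071]
For date 2, A is the only item from the result of (Process 1), so no combination of length 2 can be generated. For date 3, AB is generated and its count is incremented to 2; in addition, 30 is appended to the maximum temperatures, giving the list 20, 30. For date 4, no item from the result of (Process 1) is present, so no combination of length 2 can be generated.
[0072]
In the counting of (Process 2), as shown in Fig. 6B, the specified attribute values are held in the work memory 3 together with the item combinations of length 2 and their counts. In this case, the work memory 3 holds, for itemset AB, an occurrence count of 2 and the maximum temperatures 20 and 30.
[0073]
As in (Process 1), the user-specified minimum support is 50%, so only the combinations that appear in at least 2 of the 4 transactions are kept as results. AB satisfies this condition, so the average of the two stored maximum temperatures is computed: (20 + 30) / 2 = 25.
[0074]
(Process 3)
No combination of length 3 can be generated for dates 1 to 4, so counting ends. The resulting set of item combinations L′ is "A," "B," and "AB," and association rules can be obtained from L′. The statistics of the specified attribute were already obtained while the item combinations of length 2 or more were counted, so they are output together with the rules. The following association rules are obtained.
[0075]
A → B: support = 50%, confidence = 66.7%, average maximum temperature = 25
B → A: support = 50%, confidence = 100%, average maximum temperature = 25
Support of A → B = (occurrences of "AB") / (total transactions) = 2/4 = 50%
Confidence of A → B = (occurrences of "AB") / (occurrences of "A") = 2/3 ≈ 66.7%
Support of B → A = (occurrences of "AB") / (total transactions) = 2/4 = 50%
Confidence of B → A = (occurrences of "AB") / (occurrences of "B") = 2/2 = 100%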
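Example 1 can be reproduced end to end with a short Python sketch. The transactions are reconstructed from the counts stated in the text: date 1 = {A, B}, date 2 = {A, D}, date 3 = {A, B}, date 4 = {C}. Fig. 8B itself is not reproduced here, so this reconstruction is an assumption. The sketch handles combinations up to length 2 only, which suffices because (Process 3) finds no length-3 candidates.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Data reconstructed from the text of Example 1 (Fig. 8B itself is not
# reproduced): items sold and maximum temperature for each date. The
# temperatures of dates 2 and 4 are not stated, so None marks them; they
# are never read because those dates yield no pair of frequent items.
data = [
    ({"A", "B"}, 20),    # date 1
    ({"A", "D"}, None),  # date 2
    ({"A", "B"}, 30),    # date 3
    ({"C"}, None),       # date 4
]
n = len(data)
min_support = min_confidence = 0.5

# (Process 1): count single items; keep those with support >= 50%.
item_counts = Counter(item for items, _ in data for item in items)
frequent = {i for i, c in item_counts.items() if c / n >= min_support}

# (Process 2): count pairs of frequent items, keeping the temperature
# list next to each count (the work-memory layout of Fig. 6B).
pair_counts, pair_temps = Counter(), {}
for items, temp in data:
    for pair in combinations(sorted(items & frequent), 2):
        pair_counts[pair] += 1
        pair_temps.setdefault(pair, []).append(temp)

# (Process 3) finds no length-3 candidates here, so it is skipped.
# Rule generation: each rule carries the statistic of its itemset.
rules = []
for (a, b), count in pair_counts.items():
    if count / n < min_support:
        continue
    avg = mean(pair_temps[(a, b)])
    for lhs, rhs in ((a, b), (b, a)):
        conf = count / item_counts[lhs]
        if conf >= min_confidence:
            rules.append(f"{lhs} -> {rhs}: support={count / n:.0%}, "
                         f"confidence={conf:.1%}, average max temperature={avg}")
print("\n".join(rules))
```

The average temperature is computed once per itemset and attached to both rules derived from it, mirroring how the statistic in L′ belongs to the combination rather than to an individual rule.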
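[0076]
(2): Example 2
Example 2 also uses the conventional example data shown in Fig. 8B. The user specifies a minimum support of 50% and a minimum confidence of 50%, and instructs that both the average maximum temperature and the average sales time be obtained.
[0077]
(Process 1)
First, the counting/statistical processing unit 12 obtains the occurrence counts of items. From the data for date 1, sales items A and B are counted: the count of A is set to 1, and likewise the count of B. From the data for date 2, A and D are counted: the count of A is incremented to 2, and the count of D is set to 1. Dates 3 and 4 are counted in the same way.
[0078]
As a result, A appears 3 times, B twice, C once, and D once. Since the user-specified minimum support is 50%, only the items that appear in at least 2 of the 4 transactions need be kept. Only A and B satisfy this condition.
[0079]
(Process 2)
Next, the counting/statistical processing unit 12 counts item combinations of length 2. Since (Process 1) yielded only sales items A and B, only A and B are used to generate combinations of length 2. For date 1, AB is generated with an occurrence count of 1; 20 is added to the maximum temperatures and 10 to the sales times. For date 2, A is the only item from the result of (Process 1), so no combination of length 2 can be generated.
[0080]
For date 3, AB is generated and its count is incremented to 2; 30 is added to the maximum temperatures and 15 to the sales times. For date 4, no item from the result of (Process 1) is present, so no combination of length 2 can be generated. The counting results for length 2 are held in the work memory 3 as shown in Fig. 6C (second example of counting that includes attribute values). That is, the work memory 3 holds, for itemset AB, an occurrence count of 2, maximum temperatures 20 and 30, and sales times 10 and 15.
[0081]
As in (Process 1), the user's minimum support is 50%, so only the combinations that appear in at least 2 of the 4 transactions are kept as results. AB satisfies this condition. The average of the maximum-temperature list is therefore computed as (20 + 30) / 2 = 25, and the average sales time as (10 + 15) / 2 = 12.5.
[0082]
(Process 3)
No combination of length 3 can be generated for dates 1 to 4, so counting ends. The resulting set of item combinations L′ is "A," "B," and "AB." The association rules obtained from L′ are as follows.
[0083]
A → B: support = 50%, confidence = 66.7%, average maximum temperature = 25, average sales time = 12.5
B → A: support = 50%, confidence = 100%, average maximum temperature = 25, average sales time = 12.5
Support of A → B = (occurrences of "AB") / (total transactions) = 2/4 = 50%
Confidence of A → B = (occurrences of "AB") / (occurrences of "A") = 2/3 ≈ 66.7%
Support of B → A = (occurrences of "AB") / (total transactions) = 2/4 = 50%
Confidence of B → A = (occurrences of "AB") / (occurrences of "B") = 2/2 = 100%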
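The two-attribute work memory of Example 2 (Fig. 6C) can be sketched as one list entry per pair: `[count, maximum temperatures, sales times]`. The items per date are reconstructed from the text as in Example 1, which is an assumption. Values the text does not give are `None` and are never read.

```python
from itertools import combinations
from statistics import mean

# Example 2 data as reconstructed from the text: (items, maximum
# temperature, sales time) per date. Values the text does not give are None.
data = [
    ({"A", "B"}, 20, 10),        # date 1
    ({"A", "D"}, None, None),    # date 2
    ({"A", "B"}, 30, 15),        # date 3
    ({"C"}, None, None),         # date 4
]
frequent = {"A", "B"}   # result of (Process 1): support >= 50%

# (Process 2): one work-memory entry per pair, laid out like Fig. 6C:
# [occurrence count, maximum-temperature list, sales-time list].
memory = {}
for items, temp, sales_time in data:
    for pair in combinations(sorted(items & frequent), 2):
        entry = memory.setdefault(pair, [0, [], []])
        entry[0] += 1
        entry[1].append(temp)
        entry[2].append(sales_time)

count, temps, times = memory[("A", "B")]
print(count, temps, times)         # 2 [20, 30] [10, 15]
print(mean(temps), mean(times))    # 25 12.5
```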
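[0084]
(3): Example 3
Example 3 also uses the conventional example data shown in Fig. 8B. The user specifies a minimum support of 50% and a minimum confidence of 50%, and requests only the association rules whose average maximum temperature is 30 or more.
[0085]
Processing in the same way as Example 1 yields the following association rules.
[0086]
A → B: support = 50%, confidence = 66.7%, average maximum temperature = 25
B → A: support = 50%, confidence = 100%, average maximum temperature = 25
The average maximum temperature of these rules does not meet the specified condition of 30 or more, so neither rule is output. In other words, rules can be narrowed down (filtered) by attribute values during the correlation analysis itself.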
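The effect of the extra condition in Example 3 can be sketched by reusing the AB entry from Example 1 (count 2, maximum temperatures 20 and 30). The predicate name `condition` is hypothetical. Because the statistic is checked before any rule is emitted, a failing itemset produces no rules at all.

```python
from statistics import mean

# Example 3 sketch: the AB entry from Example 1 is checked against the
# user's extra condition "average maximum temperature >= 30" (S26) before
# the rules A -> B and B -> A are generated.
n_transactions = 4
item_counts = {"A": 3, "B": 2}
ab_count, ab_temps = 2, [20, 30]


def condition(avg_temp):
    """User-specified filter on the statistic (hypothetical name)."""
    return avg_temp >= 30


rules = []
avg = mean(ab_temps)
if ab_count / n_transactions >= 0.5 and condition(avg):
    for lhs, rhs in (("A", "B"), ("B", "A")):
        conf = ab_count / item_counts[lhs]
        if conf >= 0.5:
            rules.append((lhs, rhs, conf, avg))
print(avg, rules)  # 25 []
```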
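[0087]
§4: Specific apparatus example and recording medium (see Fig. 7)
Fig. 7 shows a specific example of the apparatus. The correlation analyzer shown in Fig. 2 can be implemented, for example, by the apparatus shown in Fig. 7. This apparatus can be implemented on any computer, such as a personal computer or a workstation. It comprises a computer main body 21 and, connected to it:
- a display device 22;
- a keyboard 23;
- a flexible disk drive (floppy disk drive; hereinafter "FDD") 24;
- a CD-ROM drive 25;
- a hard disk drive (hereinafter "HDD") 26;
- other devices.
[0088]
The computer main body 21 includes:
- a CPU (central processing unit) 27, which performs the various kinds of control within the apparatus;
- a ROM (nonvolatile memory) 28, which stores programs and data such as various parameters;
- a memory 29, used by the CPU 27 as working memory;
- an interface control unit 30, which controls the interfaces with external I/O devices;
- a communication control unit 31, which controls communication with external devices;
- other components.
[0089]
The correlation analyzer shown in Fig. 2 performs correlation analysis processing (counting/statistical processing, association rule generation, and so on) as follows. A program is stored (recorded) in advance on the hard disk of the HDD 26 (a recording or storage medium) or in the ROM 28. Under the control of the CPU 27, this program is read out, and the CPU 27 executes it.
[0090]
However, the present invention is not limited to this example. The program may, for example, be stored on the hard disk of the HDD 26 in any of the following ways, and the CPU 27 may then execute it to perform the correlation analysis processing.
[0091]
(i): A program stored on a flexible disk (floppy disk) created by another apparatus (program data created by that apparatus) is read by the FDD 24 and stored on the recording medium (hard disk) of the HDD 26.
[0092]
(ii): Data stored on a CD-ROM are read by the CD-ROM drive 25 and stored on the recording medium (hard disk) of the HDD 26.
[0093]
(iii): Data such as a program transmitted from another apparatus via a communication line such as a LAN are received via the communication control unit 31 and stored on the recording medium (hard disk) of the HDD 26.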
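[0094]
[Effects of the Invention]
As described above, the present invention provides the following effects.
(1): Statistics on attribute values can be generated efficiently for the transactions that contain the items of the generated rules. In addition, association rules can be narrowed down by statistics on attribute values as well as by support and confidence, which improves processing efficiency.
[0095]
(2): The counting/statistical processing means performs correlation analysis on the given data. When counting item occurrences, it retains the attribute values in the original data together with the occurrence counts of the item combinations, and obtains statistics together with the association rules. Statistical processing of attribute values can thus be performed efficiently at the same time as the association rules are generated.
[0096]
(3): The counting/statistical processing means performs statistical processing on a plurality of attribute values simultaneously and obtains the specified statistical values together with the association rules. Statistics on several attributes can thus be computed at the same time as the rules are generated, which improves processing efficiency.
[0097]
(4): The counting/statistical processing means obtains statistics together with the association rules, for the rules whose support and confidence satisfy the minimum support and minimum confidence specified by the user. Statistics on attribute values are thus obtained during correlation analysis, at the same time as the rules meeting the user's minimum support and minimum confidence are generated. This improves processing efficiency.
[0098]
(5): The counting/statistical processing means obtains statistics together with the association rules, for the rules whose attribute statistics satisfy a condition specified by the user. During correlation analysis, only the rules whose statistics on the user-specified attributes meet that condition are obtained, together with those statistics. This improves processing efficiency.
[0099]
(6): A program stored on a recording medium is read and executed (for example, by the CPU in the correlation analyzer). The program executes a procedure in which item occurrences are counted in correlation analysis while the attribute values in the original data are retained together with the occurrence counts of the item combinations. Statistics are then obtained together with the association rules. Statistical processing of attribute values can thus be performed efficiently at the same time as the association rules are generated.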
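[Brief Description of the Drawings]
[Fig. 1] A diagram illustrating the principle of the present invention.
[Fig. 2] An explanatory diagram of the apparatus according to the embodiment of the present invention.
[Fig. 3] A flowchart of the overall processing according to the embodiment of the present invention.
[Fig. 4] A flowchart of generating combinations of length k and statistical processing according to the embodiment of the present invention.
[Fig. 5] A flowchart of the narrowing-down process based on statistics according to the embodiment of the present invention.
[Fig. 6] An explanatory diagram of the processing according to the embodiment of the present invention.
[Fig. 7] A specific example of the apparatus according to the embodiment of the present invention.
[Fig. 8] An explanatory diagram of the conventional example (part 1).
[Fig. 9] An explanatory diagram of the conventional example (part 2).
[Description of Reference Numerals]
1 Database
2 Correlation analysis processing unit
3 Work memory
4 Result file
5 Statistical processing unit
6 Counting/statistical processing means
11 Combination generation unit
12 Counting/statistical processing unit
13 Association rule generation unit
21 Computer main body
22 Display device
23 Keyboard
24 Flexible disk drive (FDD)
25 CD-ROM drive
26 Hard disk drive (HDD)
27 CPU (central processing unit)
28 ROM (read-only memory)
29 Memory
30 Interface control unit
31 Communication control unit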
BACKGROUND OF THE INVENTION
The present invention relates to processing of a large amount of data recorded in a database or the like, and relates to a correlation analysis apparatus and a recording medium for performing data mining and statistical processing for generating a correlation rule between attributes included in the data.
[0002]
[Prior art]
A conventional example will be described below.
§1: Explanation of data mining and correlation analysis
Data mining is a technique for finding useful information from a large amount of data stored in a database. A typical example of data mining is correlation analysis. Correlation analysis is a method that comprehensively examines all correlations of attribute values of data and finds the regularity existing between attributes as a correlation rule. Reference 1 (“R. Agrawal and R. Srikant, ^ Fast Algorithm for An efficient method described in "Mining Association Rules""and the corresponding Japanese Patent Application Laid-Open No. 8-263346 (system and method for mining sequential patterns in a large-scale database) has been proposed. Details of 1 are as follows.
[0003]
Reference 1: “Publication Title: Proceedings of the 20th International Conference on Very Large Data Bases, Publication Publisher: Morgan Kaufman Publishing, ISBN: 1-55860-153-8, Publication Date: 1994, 9, Publication Page: 487 -499, Paper title: Fast Algorithms for Mining Association Rules, Paper author: R. Agrawal and R. Srikant, "
[0004]
§2: Explanation of association rules: See Fig. 8A
FIG. 8 is an explanatory diagram of a conventional example (part 1), and FIG. 8A is an explanatory diagram of items and transactions. According to the document 1, the association rule is defined as follows. First, let the set I of items be I = {i1, i2,... Im}. Further, the transaction set D is D = {t1, t2,... Tn}.
[0005]
For example, when corresponding to sales data at a retail store, as shown in FIG. 8A, one receipt (any one of receipts 1, 2,... N) is converted into a transaction t (t1, t corresponding to t2,... tn), each purchased product (sales items A, B, C... M, etc.) in the receipt corresponds to item i. Here, for a certain item combination X, if s% transactions include X, then X support is defined to be s%.
[0006]
The association rule is defined as X → Y, and has two values, rule certainty and rule support. The rule support is support of X サポート Y, and the certainty of the rule is defined by (support of X∩Y) / (support of X). For example, an association rule such as “bread → butter confidence 30% support 8%” is “30% of customers who bought bread also bought butter, and customers who bought both bread and butter together It means “8% of the total”.
[0007]
If it is attempted to obtain all the association rules included in the transaction, it is known that a combination explosion is likely to occur, and the processing requires a lot of time and a large amount of computer resources such as a memory and a disk. For this reason, a method is used in which a user gives a minimum value of support and a minimum value of certainty and obtains only an association rule that satisfies the given conditions.
[0008]
That is, the correlation analysis is a process for obtaining all correlation rules satisfying the support and certainty given by the user from a set of transactions. The correlation analysis process is a “counting process” that obtains the number of occurrences of all the combinations of items that satisfy the minimum support value given from the original data, and the confidence level given from the combination of the obtained items and the number of occurrences. It consists of two processes of “rule generation process” for generating all correlation rules that satisfy the minimum value.
[0009]
It is known that the former “counting process” takes time. As a method for avoiding generation of unnecessary combinations as much as possible, in the above-mentioned document 1, the items satisfying the minimum support value given for the combination of items of length 1 are counted (step 1) and given based on the result. The combination of items of length 2 satisfying the minimum support value is counted (step 2), and the combination of items of length 3 satisfying the minimum support value based on the result is counted (step 3). The method is repeated (step k) by increasing the length k until there are no more combinations of items that satisfy the minimum support value.
[0010]
Here, a set of all combinations of items satisfying a given minimum support value is denoted as L. The set L holds “a combination of items, a count that is the number of appearances” as one set for a combination of items that satisfy the minimum support value.
[0011]
As shown in the above-mentioned document 1, once L is obtained, it is possible to easily generate a correlation rule by calculating a conditional probability according to the definition of the correlation rule. For example, if L is “AB”, the correlation rules A → B and B → A are obtained, so the certainty factor given for these rules can be calculated to check whether the minimum value given by the user is satisfied. It ’s fine.
[0012]
However, conventionally proposed correlation analysis methods consider only the calculation of correlation rules, and do not consider performing statistical processing together with the calculation of correlation rules.
[0013]
§3: Explanation of correlation analysis by specific example: see Fig. 8B and Fig. 9
FIG. 8B is an example of data. FIG. 9 is an explanatory diagram of the conventional example (part 2), and FIG. 9A is an explanatory diagram of the apparatus. Regarding the rule obtained by the correlation analysis, the processing efficiency when obtaining a statistical value related to another attribute value for all transactions including the item group in the rule is poor. Hereinafter, this point will be described with a specific example.
[0014]
For example, assume that there is data shown in FIG. 8B. In this data example, the date indicates the sale date of the product, the temperature indicates the maximum temperature [° C.] on the sale date of the product, and the sales time (24-hour notation) indicates the time zone for the time when the item (item) is sold. Yes.
[0015]
If the user instructs the minimum support value to be 50% and the certainty value is 50% for this data, the following rule is generated when the association rule is generated by the conventional method.
[0016]
A → B support = 50%, certainty = 60%
B → A Support = 50%, certainty = 100%
Furthermore, the average of the temperature at the time of selling the item contained in these rules is calculated | required. First, “AB” is extracted from the generated rule. The original database is then read again to see if “AB” is included in each transaction. Since it was found that “AB” was included for Date 1 and Date 3, the temperatures of Date 1 and Date 3 were taken as 20 degrees and 30 degrees, respectively, and the average value of the two temperatures = (20 + 30) / 2 = 25 Is output together with the association rule as follows.
[0017]
A → B Support = 50%, confidence = 60%, average temperature = 25 ° C
B → A Support = 50%, confidence = 100%, average temperature = 25 ° C
As a conventional correlation analyzer, for example, the apparatus shown in FIG. 9 is used. This apparatus is an apparatus realized by various computers such as a personal computer and a workstation, and performs a database 1, a correlation analysis processing unit 2 that performs correlation analysis processing, a work memory 3, a result file 4, and statistical processing. A statistical processing unit 5 is provided.
[0018]
When correlation analysis processing is performed by this apparatus, first, data in the database 1 is extracted and sent to the correlation analysis processing unit 2, and correlation analysis is performed by this correlation analysis processing unit 2. The data of the work memory 3 is stored in the result file 4. The statistical processing unit 5 extracts necessary data from the database 1 using the correlation rules stored in the result file 4, performs statistical processing, and stores the data in the result file 4. In this way, the correlation rule and the statistical value are obtained in the result file 4.
[0019]
[Problems to be solved by the invention]
The conventional apparatus as described above has the following problems.
[0020]
That is, in the conventional correlation analysis, all transactions including the item group in the correlation rules (rules stored in the result file 4) generated by the processing of the correlation analysis processing unit 2 are extracted from the database 1, and the corresponding ones are included therein. The statistical processing unit 5 performs statistical processing on the attribute to be output and outputs (for example, displays) together with the association rule.
[0021]
For example, if the total number of transactions is m and the total number of rules is n, it is necessary to match mn combinations and perform statistical processing on the corresponding attributes. In this case, in addition to the correlation analysis processing by the correlation analysis processing unit 2, the statistical processing unit 5 scans the data in the database 1 again to perform the statistical processing, so that the processing efficiency is low.
[0022]
An object of the present invention is to solve such a conventional problem and efficiently generate a statistical value regarding an attribute value for a transaction including an item group included in a generated rule. Another object of the present invention is to make it possible to narrow down a statistical value related to an attribute value in addition to support and certainty when obtaining an association rule.
[0023]
[Means for Solving the Problems]
FIG. 1 is a diagram illustrating the principle of the present invention. In order to achieve the above object, the present invention is configured as follows.
[0024]
(1): Apply correlation analysis to given data (for example, data in database 1) to generate a correlation rule, perform statistical processing, and output the correlation rule and the statistical processing result together In the correlation analysis device, the storage unit that stores attribute data in units of items, and when counting up the number of appearances of an item in the correlation analysis, when an item appears, the number of appearances of the item is counted, Accumulated as attribute data corresponding to the item in the storage unit Count After the completion of the counting process, based on the number of appearances of each item in the counting process, an item indicating a predetermined condition is extracted as a correlation rule, and attribute data corresponding to the item set as the correlation rule is Counting / statistical processing means 6 is provided for extracting from the storage unit and outputting statistical information based on the extracted attribute data.
[0025]
(2): In the correlation analysis apparatus of (1), the counting / statistical processing means 6 has a function of performing statistical processing on a plurality of attribute values at the same time and obtaining a specified statistical processing value together with the correlation rule. Yes.
[0026]
(3): In the correlation analyzer of (1), the counting / statistical processing means 6 is such that the minimum value condition and the minimum certainty factor of the correlation rule support satisfy the conditions given by the user. With respect to, it has a function to obtain statistical values together with association rules.
[0027]
(4): In the correlation analysis apparatus according to (1), the counting / statistical processing means 6 is a function for obtaining a statistical value together with a correlation rule for a case where the statistical value of the attribute satisfies a condition given by a user. (Narrowing processing or filtering processing function).
[0029]
(Function)
The operation of the present invention based on the above configuration will be described with reference to FIG.
(a): The counting / statistical processing means 6 searches the database 1 and performs correlation analysis. In this correlation analysis, the number of occurrences of an item is counted. At this time, the attribute value in the original data (data retrieved from the database 1) together with the number of appearances of the combination of items (for example, if a product is sold) The maximum temperature at that time, the sales time, etc.) are also retained and processed, and a statistical value (for example, the sales quantity of the product) is obtained together with the correlation rule.
[0030]
As described above, correlation analysis and statistical processing are performed together to generate a correlation rule, and the correlation rule and statistical value are stored in the result file 4. And if it does in this way, statistical processing about an attribute value can be efficiently performed with generating an association rule.
[0031]
(b): The counting / statistical processing means 6 performs statistical processing on a plurality of attribute values at the same time, and obtains a statistical processing value designated in accordance with the association rule. In this way, statistical processing for a plurality of attribute values can be performed simultaneously with the generation of the association rule, and the processing efficiency can be improved.
[0032]
(c): The enumeration / statistical processing means 6 obtains a statistical value together with the correlation rule with respect to the condition that the minimum value of the support of the correlation rule and the minimum value of the certainty satisfy the condition given by the user. In this way, when performing a correlation analysis, it is possible to obtain a statistical value for the attribute value while generating a correlation rule that satisfies the minimum support value and the maximum certainty value specified by the user. , Processing efficiency can be improved.
[0033]
(d): The enumeration / statistical processing means 6 obtains a statistical value together with the association rule regarding the attribute whose statistical value satisfies the condition given by the user. In this way, when performing the correlation analysis, among the statistical values for the attribute values specified by the user, those that satisfy the specified conditions are obtained together with the correlation rule (restriction processing or filtering processing is performed). And the processing efficiency can be improved.
[0034]
(e): A program recorded on the recording medium is read and executed (for example, by the CPU in the correlation analysis device). The program executes a procedure that, when the numbers of occurrences of items are counted in correlation analysis, also retains and processes the attribute values in the original data together with the number of occurrences of each combination of items, and obtains statistical values together with the association rules. In this way, statistical processing of attribute values can be performed efficiently while the association rules are being generated.
[0035]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described in detail below with reference to the drawings.
[0036]
§1: Configuration of the correlation analyzer and overview of its processing (see FIG. 2)
FIG. 2 is an explanatory diagram of the apparatus. The apparatus shown in FIG. 2 is an example of a correlation analysis apparatus for performing correlation analysis, and can be realized by any computer such as a personal computer or a workstation. It includes an apparatus main body, a database 1 connected to the apparatus main body, and a result file 4. The apparatus main body includes a combination generation unit 11, a counting / statistical processing unit 12, a work memory 3, an association rule generation unit 13, and the like.
[0037]
In this case, the database 1 and the result file 4 may be stored in separate storage means or may be stored in the same storage means, for example, stored in a disk medium of a hard disk device. The combination generation unit 11, the counting / statistical processing unit 12, and the correlation rule generation unit 13 are realized by a program executed by a CPU (not shown) provided in the apparatus main body.
[0038]
In this apparatus, the combination generation unit 11 generates item combinations from the data in the database 1; on this basis, the counting / statistical processing unit 12 performs counting and statistical processing and stores the resulting data in the work memory 3. The association rule generation unit 13 then reads the data in the work memory 3, generates association rules, and stores them in the result file 4, after which the contents of the result file 4 (association rules + statistical values) are output, for example on a display. In this way, the result file 4 stores the association rules together with the statistical values.
[0039]
This correlation analysis apparatus applies correlation analysis to the data in the database 1 and, when generating association rules, also obtains statistical values for the specified attributes. Unlike conventional correlation analysis, it performs statistical processing of attribute values together with counting the appearance frequency of items, so the additional reading of data from the database 1 and the matching against the rules that were previously required become unnecessary.
[0040]
§2: Explanation of processing by flowchart
(1): Overall processing (see FIG. 3)
FIG. 3 is a flowchart of the overall processing, which is described below with reference to that figure; S1 to S8 denote the processing steps. The processing uses the length k of the item combinations as an index. It starts from k = 1, and the large itemsets of length k are counted. If the number of large itemsets obtained is equal to or greater than k + 1, processing continues with k = k + 1; otherwise, processing ends (end condition), since a combination of length k + 1 can be large only if all k + 1 of its subsets of length k are large. Specifically, it proceeds as follows.
[0041]
First, the parameter k is set to k = 1 (S1), and a combination of length k is generated and statistical processing is performed (S2). In this case, the combination generation unit 11 searches the data in the database 1 to generate a combination of length k, and the counting / statistical processing unit 12 performs statistical processing.
[0042]
Thereafter, the counting / statistical processing unit 12 determines whether the number of counted results is equal to or greater than k + 1 (S3); if so, k is set to k + 1 (S4) and processing is repeated from S2. When the number of counted results falls below k + 1, the counting / statistical processing unit 12 stores the processing result data in the work memory 3. Then, the association rule generation unit 13 performs association rule generation based on the data in the work memory 3 (S5).
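The level-wise loop of S1 to S4, including the k + 1 end condition, can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the apparatus's program: the function name `large_itemsets`, the representation of transactions as frozensets, and the Apriori-style candidate pruning (every length-k subset of a candidate must itself be large, as in the Agrawal and Srikant method cited earlier) are assumptions, and the attribute statistics are omitted so that the loop stays visible.

```python
from itertools import combinations


def large_itemsets(transactions, min_count):
    """Level-wise counting (S1-S4): return {itemset: count} for every itemset
    contained in at least min_count transactions (hypothetical helper)."""
    k = 1  # S1
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    result = {}
    while candidates:
        # S2: count each length-k candidate over all transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {c: n for c, n in counts.items() if n >= min_count}
        result.update(large)
        # S3: a length-(k+1) itemset needs k+1 large length-k subsets,
        # so stop when fewer than k+1 large itemsets were found
        if len(large) < k + 1:
            break
        k += 1  # S4
        items = sorted({i for c in large for i in c})
        # Keep only candidates whose every length-(k-1) subset is large
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(s) in large for s in combinations(c, k - 1))
        ]
    return result
```

With four transactions A B, A D, A B, C and a minimum count of 2, the loop keeps A, B, and AB, then stops because one large pair is fewer than the three needed for a triple.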
[0043]
Thereafter, the association rule generation unit 13 determines whether the association rule generation process has been completed (S6). If it has not, the unit determines whether the confidence condition given by the user is satisfied (S7). If the condition is not satisfied, processing is repeated from S5.
[0044]
If the confidence condition given by the user is satisfied in S7, the association rule generation unit 13 outputs the generated association rule to the result file 4 (S8) and repeats the processing from S5. When the association rule generation process is found to be complete in S6, all processing ends.
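Steps S5 to S8 can likewise be sketched as follows, assuming the large itemsets and their counts (L′) are held in a dictionary and that any statistic is stored per itemset; `generate_rules` and its parameters are hypothetical names. Each split X → Y of a large itemset is tested against the user's minimum confidence (S7), and passing rules are output with their statistic (S8).

```python
from itertools import combinations


def generate_rules(large, n_transactions, min_conf, stats=None):
    """S5-S8: derive rules X -> Y from the large itemsets in `large`
    ({itemset: count}), keeping those with confidence >= min_conf and
    attaching the statistic stored for the itemset (None if there is none)."""
    stats = stats or {}
    rules = []
    for itemset, count in large.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                y = itemset - x
                support = count / n_transactions
                # Every subset of a large itemset is large, so large[x] exists
                confidence = count / large[x]
                if confidence >= min_conf:  # S7
                    rules.append((x, y, support, confidence, stats.get(itemset)))  # S8
    return rules
```

Carrying the statistic inside the rule tuple mirrors the result file 4, which stores each rule together with its statistical value.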
[0045]
As a result of this processing, the result file 4 stores the association rules and the statistical values. The data in the result file 4 can then be presented to the operator, for example on a display device (not shown).
[0046]
(2): Length-k combination generation and statistical processing (see FIG. 4)
FIG. 4 is a flowchart of the length-k combination generation and statistical processing. This processing (step S2 in FIG. 3) is described in detail below with reference to FIG. 4; S10 to S16 denote the processing steps.
[0047]
In this process, item combination candidates are generated from the transactions and counted. After the item combinations have been counted over all transactions, each combination is checked against the minimum support. For the combinations that satisfy it, statistical processing is performed on the specified attribute values. Specifically, it proceeds as follows.
[0048]
First, the combination generation unit 11 searches the database 1, reads the transaction at the head of the list (S10), and generates item combinations from that transaction. Next, the counting / statistical processing unit 12 counts up the generated combinations and, if k ≧ 2 (k is 2 or more), also registers the statistical information in the work memory 3 as a list (S11). If k is less than 2, that is, k = 1, no statistics are required, because an association rule X → Y involves at least two items; statistical processing is therefore not performed.
[0049]
The processes of S10 and S11 are repeated until the transaction list becomes empty (S12). When it becomes empty, the counting / statistical processing unit 12 takes out a counted combination (S13) and determines whether it satisfies the minimum support (S14); for a combination that does, a statistical value is calculated from the list and output to the work memory 3 (S15).
[0050]
Thereafter, the counting / statistical processing unit 12 determines whether any combination remains unprocessed (S16) and, if so, repeats the processing from S13. If the minimum support is not satisfied in S14, S15 is skipped and processing proceeds to S16. The length-k combination generation and statistical processing are performed in this way.
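A compact sketch of one pass of S10 to S16 follows. It assumes each transaction is an (items, attributes) pair, that the attribute the user asked about is looked up by a hypothetical key `attr`, and that only the items kept from the previous pass are combined (as in (Process 2) of Example 1 below); the function name and the default statistic (the mean) are illustrative choices, not taken from the document.

```python
from itertools import combinations
from statistics import mean


def count_with_statistics(transactions, k, allowed_items, min_count, attr, stat=mean):
    """One pass of S10-S16 for combinations of length k.

    transactions: list of (items, attributes) pairs; attributes is a dict
    allowed_items: items kept from the previous pass; only these are combined
    attr: key of the attribute to summarise (hypothetical)
    Returns {combination: (count, statistic)}; the statistic is None when k == 1.
    """
    counts, values = {}, {}
    for items, attrs in transactions:  # S10, repeated until the list is empty (S12)
        usable = sorted(set(items) & set(allowed_items))
        for combo in combinations(usable, k):
            key = frozenset(combo)
            counts[key] = counts.get(key, 0) + 1  # S11: count up
            if k >= 2:  # S11: also keep the attribute value in a list
                values.setdefault(key, []).append(attrs[attr])
    result = {}
    for key, n in counts.items():  # S13, repeated while combinations remain (S16)
        if n >= min_count:  # S14: minimum support
            result[key] = (n, stat(values[key]) if k >= 2 else None)  # S15
    return result
```

Keeping a plain list of values per combination, rather than a running sum, lets any statistic (mean, median, maximum) be chosen at S15.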
[0051]
(3): Supplementary explanation of the above process
In the above processing, in the steps (S10 to S12) that count combinations of items of length 2 or more (k ≧ 2), the work memory 3 holds, in addition to the conventional (1) item combination and (2) item combination count, (3) the values of another attribute in the transactions that contain the item combination, stored in list form. The count is the number of transactions in which the item combination appears, and the attribute entry holds the values of the specified attribute.
[0052]
When the steps S10 to S12 are finished and the counting of the item combinations of length k is complete, each counted item combination is checked against the minimum support given by the user (see S14). For each item combination that satisfies the minimum support, the specified statistical processing is performed on the attribute values stored in the work memory 3 (see S15), and “item combination, number of appearances, statistically processed value” is output to the work memory 3 as L′.
[0053]
In the subsequent processing, when the association rule generation unit 13 generates association rules from L′, it outputs the statistically processed value together with each rule that satisfies the minimum confidence given by the user.
[0054]
In addition to statistical calculation for a single attribute value, statistical processing of a plurality of attribute values can be performed together with correlation analysis. In that case, in addition to the conventional (1) item combination and (2) item combination count, (3) the values of a plurality of other attributes in the transactions containing the item combination are held in the work memory 3.
[0055]
This case is processed in the same way as for a single attribute value: when each step is finished and the counting of item combinations of length k is complete, each counted combination is checked against the minimum support given by the user (see S14).
[0056]
For each item combination that satisfies the minimum support, the specified statistical processing is performed on the plurality of attribute values stored while k ≧ 2 (see S15). Different statistical calculations may be applied to different attribute values. As a result, “item combination, number of appearances, statistically calculated value 1, statistically calculated value 2, ...” is output as L″.
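The layout of an L″ row can be illustrated with a small sketch in which each attribute has its own statistic. The attribute names `max_temp` and `sales_time`, their values, and the choice of mean and maximum are hypothetical examples, not data from the document.

```python
from statistics import mean


def multi_statistics(value_lists, spec):
    """S15 for several attributes: apply spec[name] to value_lists[name]
    and return the results in the order of spec (value 1, value 2, ...)."""
    return tuple(func(value_lists[name]) for name, func in spec.items())


# Hypothetical work-memory entry for one combination: its count plus one
# list of values per attribute, gathered while k >= 2.
entry = {"count": 3, "values": {"max_temp": [18, 22, 26], "sales_time": [9, 14, 11]}}
spec = {"max_temp": mean, "sales_time": max}  # a different statistic per attribute

# One row of L'': (item combination, number of appearances, value 1, value 2)
row = (frozenset("AB"), entry["count"]) + multi_statistics(entry["values"], spec)
```

Because Python dictionaries preserve insertion order, the order of `spec` fixes which statistic becomes value 1, value 2, and so on.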
[0057]
(4): Narrowing-down process using statistical values (see FIG. 5)
FIG. 5 is a flowchart of the narrowing-down process using statistical values, which is described below with reference to that figure; S21 to S28 denote the processing steps.
[0058]
In addition to the conventional narrowing-down of association rules by support and confidence, an association rule can be generated only when the statistically processed value for one attribute value or for a plurality of attribute values falls within a designated range. This narrowing-down process proceeds according to the flowchart of FIG. 5 as follows.
[0059]
First, the combination generation unit 11 searches the database 1, reads the transaction at the head of the list (S21), and generates item combinations from it. Next, the counting / statistical processing unit 12 counts up the generated combinations and, if k ≧ 2, also registers the statistical information in the work memory 3 as a list (S22).
[0060]
The processes of S21 and S22 are repeated until the transaction list becomes empty (S23). When it becomes empty, the counting / statistical processing unit 12 takes out a counted combination from the work memory 3 (S24) and determines whether it satisfies the minimum support (S25). For a combination that does, a statistical value is calculated from the list, and it is determined whether the statistical value satisfies the condition (S26).
[0061]
If the statistical value satisfies the condition, the association rule and the statistical value are output together to the work memory 3 (S27). Thereafter, the counting / statistical processing unit 12 determines whether any combination remains unprocessed (S28) and, if so, repeats the processing from S24. If the minimum support is not satisfied in S25, or if the statistical value does not satisfy the condition in S26, processing proceeds directly to S28. The narrowing-down process based on statistical values is performed in this way.
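The narrowing-down of S24 to S28 adds one test after the support check. In this sketch (hypothetical names; the user's condition is assumed to be a closed range whose bounds may each be omitted), a combination is kept only if it passes both S25 and S26, and the statistic is computed only for combinations that already meet the minimum support.

```python
from statistics import mean


def narrow_by_statistic(counted, value_lists, min_count, stat=mean, lower=None, upper=None):
    """S24-S28: keep a combination only if it meets the minimum support (S25)
    and its statistic lies within [lower, upper] (S26); None means no bound."""
    kept = {}
    for combo, n in counted.items():  # S24, repeated while combinations remain (S28)
        if n < min_count:
            continue  # S25 not satisfied: go to S28
        value = stat(value_lists[combo])
        if (lower is not None and value < lower) or (upper is not None and value > upper):
            continue  # S26 not satisfied: go to S28
        kept[combo] = (n, value)  # S27: output together with its statistic
    return kept
```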
[0062]
(5): Additional remarks
Suppose that association rules have already been generated as described above, and statistical processing is then to be performed on another attribute value in the transactions containing the items of those rules. In this case, since the items included in the association rules are already known, the process of obtaining the number of appearances of each item is unnecessary.
[0063]
In addition, since the maximum number of items contained in any association rule is easily obtained, it suffices to count combinations only up to that maximum length and to perform statistical processing only for the transactions in which items of the association rules appear. Whereas the processing described earlier proceeds one length at a time, in this case the combinations of all lengths from 2 up to that maximum can be generated at once and counted and statistically processed together, which can increase the processing speed.
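The single-pass variant described in (5) can be sketched as follows. It assumes the item sets of the already-generated rules are given as a non-empty collection of frozensets and each transaction as an (items, attributes) pair; the function and parameter names are hypothetical. Each transaction is read once, and combinations of every length from 2 up to the longest rule item set are generated from it in that same pass.

```python
from itertools import combinations


def statistics_for_known_rules(transactions, rule_itemsets, attr):
    """Collect attribute values for item sets of already-generated rules in a
    single scan, generating combinations of every length from 2 up to the
    longest rule item set at once (no per-item counting is needed).
    rule_itemsets is assumed to be non-empty."""
    wanted = set(rule_itemsets)
    rule_items = frozenset().union(*wanted)  # items occurring in some rule
    max_len = max(len(s) for s in wanted)    # longest rule item set
    lists = {s: [] for s in wanted}
    for items, attrs in transactions:        # one scan of the data
        usable = sorted(set(items) & rule_items)
        for k in range(2, min(max_len, len(usable)) + 1):  # all lengths together
            for combo in combinations(usable, k):
                key = frozenset(combo)
                if key in wanted:
                    lists[key].append(attrs[attr])
    return lists
```

Items that occur in no rule are discarded before combinations are formed, which keeps the number of generated combinations small.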
[0064]
As described above, by applying statistical processing (statistical calculation) at the same time as association rules are generated, it is no longer necessary to scan the data again for statistical processing. Moreover, the cost of the matching process, which requires m × n matching operations for m data records and n rules, can be reduced. Further, in addition to narrowing down association rules by support and confidence, narrowing down by attribute values becomes possible.
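For comparison, the conventional after-the-fact approach can be sketched as a nested loop: every one of the m records is tested against every one of the n rule item sets, so the number of containment checks is m × n. The names and input shapes here are hypothetical.

```python
def conventional_statistics(records, rules, attr):
    """Conventional approach: after the rules exist, rescan all m records and
    test each against each of the n rule item sets (m * n checks)."""
    lists = {r: [] for r in rules}
    checks = 0
    for items, attrs in records:  # a second full scan of the data
        for r in rules:
            checks += 1
            if r <= set(items):  # does this record contain the rule's items?
                lists[r].append(attrs[attr])
    return lists, checks
```

With 4 records and 2 rules this performs 8 checks; the combined approach instead gathers the same attribute lists during the counting pass that is needed anyway.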
[0065]
§3: Specific processing examples (see FIG. 6)
FIG. 6 is an explanatory diagram of the processing: FIG. 6A shows an example of counting results, FIG. 6B shows counting example 1 including attribute values, and FIG. 6C shows counting example 2 including attribute values. Specific processing examples are described below with reference to FIG. 6.
[0066]
(1): Example 1
Example 1 below uses the data of the conventional example shown in FIG. 8B. The minimum support given by the user is 50%, and the minimum confidence is 50%. The user also instructs that the average of the maximum temperatures be obtained together with the rules.
[0067]
(Process 1)
First, the counting / statistical processing unit 12 calculates the number of appearances of each item; in (Process 1) this is done in the same way as in the conventional method. From the data for date 1, the sales items A and B are counted: the number of appearances (count) of A is set to 1 and held in the work memory 3, and B is likewise stored in the work memory 3 with a count of 1.
[0068]
From the data for date 2, the sales items A and D are counted: the count for A is increased by 1 to 2, and the count for D is set to 1. Dates 3 and 4 are counted in the same way. As a result, the number of appearances is 3 for A, 2 for B, 1 for C, and 1 for D. The counting results of (Process 1) can be held in the work memory 3 as shown in FIG. 6A.
[0069]
That is, the work memory 3 holds the counting results: item A appears 3 times, item B twice, item C once, and item D once. Since the user's minimum support is 50%, only items that appear in at least 2 of the 4 transactions need to be kept. Therefore, only A and B satisfy the condition.
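Process 1 can be checked with a few lines of Python. FIG. 8B is not reproduced in this section, so the item lists below are an assumed reconstruction: date 1 = A, B and date 2 = A, D are stated in the text, while date 3 = A, B and date 4 = C are the placement consistent with the stated counts and with the later remarks that date 3 yields AB and date 4 contains neither A nor B.

```python
from collections import Counter

# Assumed reconstruction of the FIG. 8B item lists (dates 3 and 4 inferred from the counts).
transactions = {1: {"A", "B"}, 2: {"A", "D"}, 3: {"A", "B"}, 4: {"C"}}
min_support = 0.50  # user-specified minimum support

counts = Counter(item for items in transactions.values() for item in items)
# Keep items whose support (count / number of transactions) reaches the minimum.
large_1 = sorted(i for i, n in counts.items() if n / len(transactions) >= min_support)
```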
[0070]
(Process 2)
Subsequently, the counting / statistical processing unit 12 counts the combinations of items having a length of 2. As a result of the above (Processing 1), since only sales items A and B are obtained, a combination of length 2 is generated using only A and B. On date 1, a combination AB is generated, the number of appearances is set to 1, and the maximum temperature = 20 is held in the work memory 3.
[0071]
On date 2, only item A is among the results of (Process 1), so no combination of length 2 can be generated. On date 3, the combination AB is generated and its number of appearances is increased by 1 to 2; in addition, 30 is appended to the maximum-temperature list, giving the list 20, 30. On date 4, none of the items is among the results of (Process 1), so no combination of length 2 can be generated.
[0072]
In the counting of (Process 2), as shown in FIG. 6B, the designated attribute values are held in the work memory 3 in addition to each combination of items of length 2 and its count. Here, the work memory 3 holds, for the item combination AB, an appearance count of 2 and maximum temperatures of 20 and 30.
[0073]
As in (Process 1), since the minimum support specified by the user is 50%, only combinations appearing in at least 2 of the 4 transactions are kept as results. AB satisfies this condition, so the average of the two maximum temperatures held for it is calculated: (20 + 30) / 2 = 25.
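Process 2 can be checked the same way. The item lists are an assumed reconstruction of FIG. 8B consistent with the counts in (Process 1) (date 1 = A, B; date 2 = A, D; date 3 = A, B; date 4 = C), and only the maximum temperatures stated in the text (20 on date 1, 30 on date 3) are filled in; dates 2 and 4 never produce the pair AB, so their temperatures are left as None.

```python
from itertools import combinations
from statistics import mean

# Assumed reconstruction of FIG. 8B: (items, maximum temperature) per date.
# Only the temperatures of dates 1 and 3 are stated in the text; the others are None.
data = {1: ({"A", "B"}, 20), 2: ({"A", "D"}, None), 3: ({"A", "B"}, 30), 4: ({"C"}, None)}
large_1 = {"A", "B"}  # result of (Process 1)
min_support = 0.50

pair_count, temp_lists = {}, {}
for items, max_temp in data.values():
    # Length-2 combinations built only from the items kept in (Process 1)
    for pair in combinations(sorted(items & large_1), 2):
        key = "".join(pair)
        pair_count[key] = pair_count.get(key, 0) + 1
        temp_lists.setdefault(key, []).append(max_temp)

# Keep pairs meeting the minimum support, then average their held temperatures.
result = {k: (n, mean(temp_lists[k])) for k, n in pair_count.items()
          if n / len(data) >= min_support}
```

The temperature list for AB ends up as 20, 30, matching FIG. 6B, and the result reproduces the average of 25.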
[0074]
(Process 3)
Since no combination of length 3 is generated for dates 1 to 4, the counting ends. The item combination group L′ thus obtained is “A”, “B”, and “AB”, and association rules can be obtained from L′. Because the statistics of the specified attribute value were already obtained while counting the item combinations of length 2 or more, they are output together with the rules. As a result, the following association rules are obtained.
[0075]
A → B: Support = 50%, certainty factor = 66%, average value of maximum temperature = 25
B → A: Support = 50%, certainty factor = 100%, average value of maximum temperature = 25
Support for A → B is the number of occurrences of “AB” / number of all transactions = 2/4 = 50%
The certainty about A → B is the number of appearances of “AB” / number of appearances of “A” = 2/3 = 66.6%
Support for B → A is the number of occurrences of “AB” / number of all transactions = 2/4 = 50%
The certainty factor for B → A is the number of appearances of “AB” / number of appearances of “B” = 2/2 = 100%
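The support and confidence values above follow directly from the counts, and exact fractions make the arithmetic explicit. The counts are those obtained in (Process 1) and (Process 2); the helper `rule_metrics` is a hypothetical name.

```python
from fractions import Fraction

n_transactions = 4
count = {"A": 3, "B": 2, "AB": 2}  # counts from (Process 1) and (Process 2)


def rule_metrics(lhs, itemset="AB"):
    """Support and confidence of the rule lhs -> (itemset minus lhs), as exact fractions."""
    support = Fraction(count[itemset], n_transactions)
    confidence = Fraction(count[itemset], count[lhs])
    return support, confidence


a_to_b = rule_metrics("A")  # support 1/2, confidence 2/3
b_to_a = rule_metrics("B")  # support 1/2, confidence 1
```

The exact confidence of A → B is 2/3, about 66.7%; the figures 66% and 66.6% in the listing are truncations of this value.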
[0076]
(2): Example 2
Example 2 described below uses the data of the conventional example shown in FIG. 8B. The minimum support value specified by the user is set to 50%, and the minimum confidence value is set to 50%. Further, it is assumed that the user gives an instruction so as to obtain the average value of the maximum temperature and the average value of the sales time together.
[0077]
(Process 1)
First, the counting / statistical processing unit 12 obtains the number of appearances of an item. From the date 1 data, the sales items A and B are counted. The number of appearances (count) for A is 1, and the number of appearances for B is also 1. From date 2 data, A and D are counted. First, the count for A is increased by 1 to 2, and the count for D is set to 1. The date 3 and date 4 are counted in the same manner.
[0078]
As a result, the number of appearances is 3 for A, 2 for B, 1 for C, and 1 for D. Since the minimum support specified by the user is 50%, only items appearing in at least 2 of the 4 transactions need to be kept. Therefore, only A and B satisfy the condition.
[0079]
(Process 2)
Next, the counting / statistical processing unit 12 counts the combinations of items of length 2. Since only sales items A and B were obtained in (Process 1), combinations of length 2 are generated from A and B only. On date 1, the combination AB is generated, its number of appearances is set to 1, and 20 is recorded in the maximum-temperature list and 10 in the sales-time list. On date 2, only item A is among the results of (Process 1), so no combination of length 2 can be generated.
[0080]
On date 3, the combination AB is generated and its number of appearances is increased by 1 to 2; in addition, 30 is appended to the maximum-temperature list and 15 to the sales-time list. On date 4, none of the items is among the results of (Process 1), so no combination of length 2 can be generated. For the combinations of length 2, the counting result is stored in the work memory 3 as shown in FIG. 6C (counting example 2 including attribute values): the number of appearances of AB = 2, maximum temperatures = 20, 30, and sales times = 10, 15.
[0081]
As in (Process 1), since the user's minimum support is 50%, only combinations appearing in at least 2 of the 4 transactions are kept as results. AB satisfies this condition, so the average maximum temperature (20 + 30) / 2 = 25 and the average sales time (10 + 15) / 2 = 12.5 are calculated from the respective lists.
[0082]
(Process 3)
Next, since a combination of length 3 cannot be generated for date 1 to date 4, the counting ends. The item combination group L ′ obtained as described above is “A”, “B”, and “AB”. Then, an association rule can be obtained based on L ′ as follows.
[0083]
A → B: Support = 50%, certainty factor = 66%, average value of maximum temperature = 25, average value of sales time = 12.5
B → A: Support = 50%, certainty factor = 100%, average value of maximum temperature = 25, average value of sales time = 12.5
Support for A → B is the number of occurrences of “AB” / number of all transactions = 2/4 = 50%
The certainty about A → B is the number of appearances of “AB” / number of appearances of “A” = 2/3 = 66.6%
Support for B → A is the number of occurrences of “AB” / number of all transactions = 2/4 = 50%
The certainty factor for B → A is the number of appearances of “AB” / number of appearances of “B” = 2/2 = 100%
[0084]
(3): Example 3
Example 3 below uses the data of the conventional example shown in FIG. 8B. The minimum support specified by the user is 50%, the minimum confidence is 50%, and only association rules whose average maximum temperature is 30 or more are to be obtained.
[0085]
Processing is performed in the same manner as in Example 1, giving the following association rules.
[0086]
A → B: Support = 50%, certainty factor = 66%, average value of maximum temperature = 25
B → A: Support = 50%, certainty factor = 100%, average value of maximum temperature = 25
Since the average maximum temperature of these rules does not satisfy the specified condition (30 or more), neither rule is output as a result. In this way, narrowing-down (filtering) by attribute values can be performed during correlation analysis.
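The filtering in Example 3 amounts to one extra condition on each rule. This sketch uses the support, confidence, and average maximum temperature listed for Example 1 (the confidence of A → B taken as exactly 2/3); the `Rule` type and its field names are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: str
    rhs: str
    support: float
    confidence: float
    avg_max_temp: float


# Rules and values as listed for Example 1.
rules = [Rule("A", "B", 0.50, 2 / 3, 25.0), Rule("B", "A", 0.50, 1.0, 25.0)]
min_support, min_conf, min_avg_temp = 0.50, 0.50, 30

# A rule is output only if it passes support, confidence, and the attribute condition.
output = [r for r in rules
          if r.support >= min_support and r.confidence >= min_conf
          and r.avg_max_temp >= min_avg_temp]
```

Both rules pass the support and confidence thresholds but fail the temperature condition, so the output is empty, as stated above.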
[0087]
§4: Specific device example and recording medium (see FIG. 7)
FIG. 7 shows a specific example of the apparatus. The correlation analysis apparatus shown in FIG. 2 can be realized, for example, by the apparatus shown in FIG. 7, which can be any computer such as a personal computer or a workstation. It comprises a computer main body 21 and, connected to it, a display device 22, a keyboard 23, a flexible disk drive (floppy disk drive; hereinafter "FDD") 24, a CD-ROM drive 25, a hard disk device (hereinafter "HDD") 26, and the like.
[0088]
The computer main body 21 includes a CPU (central processing unit) 27 that performs various controls in the apparatus, a ROM (nonvolatile memory) 28 that stores data such as programs and various parameters, a memory 29 used by the CPU 27 as working memory, an interface control unit 30 that controls the interface with external I/O devices, a communication control unit 31 that controls communication with the outside, and the like.
[0089]
The program for the correlation analysis processing performed by the correlation analyzer shown in FIG. 2 (counting / statistical processing, association rule generation, and so on) is stored (recorded) in advance on the hard disk (recording medium or storage medium) of the HDD 26 or in the ROM 28; the stored program is read out under the control of the CPU 27 and executed by the CPU 27.
[0090]
However, the present invention is not limited to this example. For instance, the correlation analysis processing can also be performed by storing the program on the hard disk of the HDD 26 in one of the following ways and executing it with the CPU 27.
[0091]
(1): A program (program data created by another device) stored on a flexible disk (floppy disk) is read by the FDD 24 and stored on the recording medium (hard disk) of the HDD 26.
[0092]
(2): The data stored in the CD-ROM is read by the CD-ROM drive 25 and stored in the recording medium (hard disk) of the HDD 26.
[0093]
(3): Data such as a program transmitted from another device via a communication line such as a LAN is received via the communication control unit 31 and stored on the recording medium (hard disk) of the HDD 26.
[0094]
[Effects of the Invention]
As described above, the present invention has the following effects.
(1): Statistical values of attributes can be generated efficiently for the transactions that contain the item groups of the generated rules. In addition, when association rules are obtained, they can be narrowed down by statistical values of attribute values as well as by support and confidence, which improves processing efficiency.
[0095]
(2): The counting / statistical processing means performs correlation analysis on the given data. In this correlation analysis, when the numbers of occurrences of items are counted, the attribute values in the original data are retained and processed together with the number of occurrences of each item combination, and statistical values are obtained together with the association rules. In this way, statistical processing of attribute values can be performed efficiently while the association rules are being generated.
[0096]
(3): The counting / statistical processing means performs statistical processing on a plurality of attribute values at the same time and obtains the designated statistical values together with the association rules. In this way, statistical processing of a plurality of attribute values can be performed simultaneously with the generation of the association rules, and processing efficiency can be improved.
[0097]
(4): The counting / statistical processing means obtains statistical values together with the association rules for those rules whose support and confidence satisfy the minimum values given by the user. In this way, when performing correlation analysis, statistical values for the attribute values can be obtained while generating the association rules that satisfy the minimum support and minimum confidence specified by the user, and processing efficiency can be improved.
[0098]
(5): The counting / statistical processing means obtains statistical values together with the association rules for attributes whose statistical values satisfy a condition given by the user. In this way, when performing correlation analysis, only those statistical values for the user-specified attribute values that satisfy the specified condition are obtained together with the association rules, and processing efficiency can be improved.
[0099]
(6): A program recorded on the recording medium is read and executed (for example, by the CPU in the correlation analysis device). The program executes a procedure that, when the numbers of occurrences of items are counted in correlation analysis, also retains and processes the attribute values in the original data together with the number of occurrences of each item combination, and obtains statistical values together with the association rules. In this way, statistical processing of attribute values can be performed efficiently while the association rules are being generated.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating the principle of the present invention.
FIG. 2 is an explanatory diagram of an apparatus according to an embodiment of the present invention.
FIG. 3 is an overall processing flowchart according to the embodiment of the present invention.
FIG. 4 is a flowchart of length k combination generation and statistical processing in the embodiment of the present invention.
FIG. 5 is a flowchart of a narrowing process using statistical values according to the embodiment of the present invention.
FIG. 6 is an explanatory diagram of processing according to the embodiment of the present invention.
FIG. 7 is a specific apparatus example according to the embodiment of the present invention.
FIG. 8 is an explanatory diagram (part 1) of a conventional example.
FIG. 9 is an explanatory diagram (part 2) of a conventional example.
[Explanation of symbols]
1 Database
2 Correlation analysis processor
3 Work memory
4 Result file
5 Statistical processing unit
6 Counting / statistical processing means
11 Combination generator
12 Counting / statistical processing unit
13 Association rule generator
21 Computer body
22 Display device
23 Keyboard
24 Flexible disk drive (FDD)
25 CD-ROM drive
26 Hard Disk Drive (HDD)
27 CPU (Central Processing Unit)
28 ROM (read only memory)
29 Memory
30 Interface control unit
31 Communication control unit

Claims

A correlation analysis apparatus that applies correlation analysis to given data to generate association rules, performs statistical processing, and outputs the association rules together with the statistical processing results, the apparatus comprising:
a storage unit that stores attribute data in units of items; and
counting / statistical processing means that, when counting the number of appearances of items in the correlation analysis, performs a counting process in which, each time an item appears, the number of appearances of the item is counted and the attribute data corresponding to the item are accumulated in the storage unit, and that, after completion of the counting process, extracts as an association rule the items satisfying a predetermined condition on the basis of the number of appearances of each item obtained by the counting process, extracts from the storage unit the attribute data corresponding to the items determined as the association rule, and outputs statistical information based on the extracted attribute data.

The correlation analysis apparatus according to claim 1, wherein the counting / statistical processing means further has a function of performing statistical processing on a plurality of attribute values simultaneously and obtaining the designated statistical processing values together with the association rule.

The correlation analysis apparatus according to claim 1, wherein the counting / statistical processing means further has a function of obtaining statistical values together with the association rules for association rules whose support and confidence satisfy the minimum values given by a user.

The correlation analysis apparatus according to claim 1, wherein the counting / statistical processing means further has a function of obtaining statistical values together with the association rules for attributes whose statistical values satisfy a condition given by a user.

A computer-readable recording medium recording a program for causing a correlation analysis apparatus, which has a storage unit that stores attribute data in units of items and which applies correlation analysis to given data to generate association rules, performs statistical processing, and outputs the association rules together with the statistical processing results, to realize the function of counting / statistical processing means that, when counting the number of appearances of items in the correlation analysis, performs a counting process in which, each time an item appears, the number of appearances of the item is counted and the attribute data corresponding to the item are accumulated in the storage unit, and that, after completion of the counting process, extracts as an association rule the items satisfying a predetermined condition on the basis of the number of appearances of each item obtained by the counting process, extracts from the storage unit the attribute data corresponding to the items determined as the association rule, and outputs statistical information based on the extracted attribute data.