CN105320756A

CN105320756A - Method for mining database association rules based on an improved Apriori algorithm

Info

Publication number: CN105320756A
Application number: CN201510666724.4A
Authority: CN
Inventors: 赵学健; 袁源; 孙知信; 乔爱锋
Original assignee: Jiangsu Posts and Telecommunications Planning and Designing Institute Co Ltd
Current assignee: Jiangsu Posts and Telecommunications Planning and Designing Institute Co Ltd
Priority date: 2015-10-15
Filing date: 2015-10-15
Publication date: 2016-02-10
Anticipated expiration: 2035-10-15
Also published as: CN105320756B

Abstract

The present invention proposes a method for mining database association rules based on an improved Apriori algorithm. The method converts the transaction database into a relational matrix; because this matrix is sparse, it is stored as an orthogonal linked list. Generating frequent itemsets then reduces to operations on the node sets of the singly linked lists that correspond to items in the relational matrix. The database needs to be scanned only once, which overcomes the drawbacks of Apriori and related algorithms (a large number of candidate sets and repeated database scans) and reduces the time spent on frequent I/O operations. In addition, generating and finding frequent 2-itemsets requires only an intersection of node sets, which takes little time, and the singly linked list constructed for each generated frequent k-itemset is recorded, which simplifies the generation of frequent (k+1)-itemsets and avoids the complex pruning process of the Apriori algorithm.

Description

A database association rule mining method based on an improved Apriori algorithm

Technical field

The invention discloses a database association rule mining method based on an improved Apriori algorithm. It focuses on transforming and optimizing the frequent itemset generation process of the Apriori algorithm, on the basis of representing the transaction database as a matrix stored in an orthogonal linked list, and belongs to the technical field of computer data mining and information processing.

Background art

With big data technology developing rapidly, people have gradually come to recognize that data is wealth, and the analysis of business data in particular has great practical value. Association rule analysis is one of the main means of data mining and an indispensable part of data mining technology. It is mainly used to discover valuable, interesting relationships and rules hidden in large transaction databases. Research on association rule algorithms is therefore of great significance.

As early as 1993, R. Agrawal, a computer scientist at IBM, and colleagues studied the purchasing patterns of customers in a customer transaction database and proposed a model of correlation between transactions, which was the original association rule. An association rule is usually simple but highly practical. Through association rule analysis, the relationships between transaction items and itemsets can be mined. The most typical application of association rule analysis is market basket analysis, for example the classic {beer} → {diaper} rule. Beyond market basket data, association rule analysis is also widely used in other fields, such as personalized recommendation in e-commerce, financial services, advertising planning, bioinformatics and scientific data analysis. In personalized e-commerce recommendation, for example, association rules help a website recommend products that customers with similar purchasing behavior may be interested in, which improves the user experience and increases net income.

There are many association rule analysis algorithms, of which the most classical and practical are the Apriori algorithm and its improved variants. The Apriori algorithm [1], proposed in 1994 by Agrawal and Srikant, was the first association rule algorithm and is widely used; it generates frequent itemsets by repeatedly performing join and prune operations, and then derives association rules. Building on Apriori, Yang et al. proposed the Apriori-TFP algorithm [2], which preprocesses the raw data during association rule mining and stores it in a partial support tree, from which the association rules are finally generated. Effective preprocessing reduces the mining time, but the number of database scans remains high. Zhang et al. proposed the GP-Apriori algorithm [3], which uses a graphics processing unit (GPU) to parallelize support counting and stores the vertical transaction lists as linear ordered arrays. The GPU traverses these arrays, computes support by bitwise intersection, and copies the results back to main memory. Compared with the traditional Apriori algorithm running on a CPU, GP-Apriori runs faster because it uses a GPU, but its complexity increases. Delighta et al. also proposed an improved Apriori algorithm (the Apriori Mend algorithm) [4]. It uses a hash function to generate itemsets, and the user must specify a minimum support to delete unwanted itemsets. It is more efficient than the traditional Apriori algorithm, but its execution time increases somewhat. Ning et al. parallelized the Apriori algorithm on the MapReduce framework [5]. The algorithm scales well and is efficient on massive data sets, but it requires substantial computing and storage capacity and normally runs in a cluster environment. Sulianta et al. [6] applied the Apriori algorithm to multidimensional data analysis and explored more concrete and effective methods of generating association rules from multidimensional data. Sheila et al. [7] improved the Apriori algorithm by introducing the concepts of transaction size and transaction scale to eliminate the influence of non-critical items. Feng et al. [8] proposed a matrix-based Apriori algorithm, which represents the operations on the database effectively with a matrix and obtains the maximal frequent itemsets with matrix-based AND operations. Hu et al. [9] applied ideas from relational theory, introduced item discernibility vectors and their AND operation, and devised a fast mining algorithm, SLIG (Single-level Large Itemsets Generation), which converts the generation of frequent itemsets into vector operations on the relational matrix of the itemsets. This algorithm overcomes the drawbacks of Apriori and related algorithms, which generate a large number of candidate sets and scan the database many times, but it requires more storage space.

References cited

[1] R. Agrawal, R. Srikant, et al. Fast algorithms for mining association rules. Proc. 20th Int. Conf. Very Large Data Bases (VLDB), vol. 1215, pp. 487-499, September 1994.

[2] Z. Yang, W. Tang, A. Shintemirov, and Q. Wu. Association rule mining-based dissolved gas analysis for fault diagnosis of power transformers. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 39, no. 6, pp. 597-610, 2009.

[3] F. Zhang, Y. Zhang, and J. D. Bakos. GPApriori: GPU-accelerated frequent itemset mining. In CLUSTER. IEEE, 2011, pp. 590-594.

[4] I. S. P. J. D. Magdalene Delighta Angeline. Association rule generation using Apriori mend algorithm for student's placement. vol. 2, no. 1, 2012, pp. 78-86.

[5] N. Li, L. Zeng, Q. He, and Z. Shi. Parallel implementation of Apriori algorithm based on MapReduce. In Software Engineering, Artificial Intelligence, Networking and Parallel Distributed Computing (SNPD), 2012 13th ACIS International Conference on, 2012, pp. 236-241.

[6] F. Sulianta, T. H. Liong, and I. Atastina. Mining food industry's multidimensional data to produce association rules using Apriori algorithm as a basis of business strategy. In Information and Communication Technology (ICoICT), 2013 International Conference of, 2013, pp. 176-181.

[7] S. A. Abaya. Association rule mining based on Apriori algorithm in minimizing candidate generation. International Journal of Scientific and Engineering Research, vol. 3, no. 7, pp. 1-4, July 2012.

[8] Wang Feng, Li Yong-hua. An Improved Apriori Algorithm Based on the Matrix. FBIE, pp. 152-155, 2008 International Seminar on Future BioMedical Information Engineering, 2008.

[9] Hu Huirong, Wang Zhoujing. A fast algorithm for mining association rules based on a relational matrix. Computer Applications, 2005, 25(7): 1577-1579.

Summary of the invention

The present invention proposes a database association rule mining method based on an improved Apriori algorithm, which comprises OLA (Orthogonal List Apriori), an improved Apriori algorithm based on a transaction matrix stored in an orthogonal linked list.

The present invention comprises the following steps:

Step 1: scan the transaction database D to obtain the relational matrix M_a;

Step 2: store the relational matrix M_a of step 1 in computer memory as an orthogonal linked list. The orthogonal linked list contains three types of node: M nodes, H nodes and E nodes. The M node is the head node of the whole orthogonal linked list; an H node is a row or column head node, that is, the head node of a row list or a column list in the orthogonal linked list; an E node corresponds to a nonzero element of the relational matrix. All three node types contain four fields: Tag, Element, Right and Down. The Tag field is a marker that distinguishes the three node types. The Element field holds a two-tuple: for the head node of the orthogonal linked list, the tuple stores the number of rows and columns of the sparse relational matrix, i.e. the number of transactions and the number of items in transaction database D; for a row head node, it stores the row index and the number of nonzero elements in that row; for a column head node, it stores the item or itemset and the number of nonzero elements in that column; for a nonzero element node, it stores the item or itemset and the number of the transaction that contains it. The Right field is a pointer: for the head node of the orthogonal linked list it points to the first column head node; for a row head node, to the first nonzero element node of that row; for a column head node, to the next column head node; for a nonzero element node, to the next nonzero element node in that row. The Down field is also a pointer: for the head node of the orthogonal linked list it points to the first row head node; for a row head node, to the next row head node; for a column head node, to the first nonzero element node of that column; for a nonzero element node, to the next nonzero element node in that column;

Step 3: from the orthogonal linked list of step 2, compute the set of frequent 1-itemsets L_1 and the orthogonal linked list corresponding to L_1;

Step 4: join the set of frequent (k-1)-itemsets L_{k-1} with itself to produce the set of candidate k-itemsets C_k, where k is a natural number in the interval [2, ∞);

Step 5: prune the candidate set C_k using the Apriori property (every nonempty subset of a frequent itemset must also be frequent; if any nonempty subset of a candidate is not frequent, the candidate cannot be frequent);

Step 6: traverse the orthogonal linked lists corresponding to the set of frequent (k-1)-itemsets and the set of frequent 1-itemsets to obtain the set T(c_{i_1}) of transactions containing itemset c_{i_1}, and compute the support of each member c_{i_1} of C_k, where i_1 is a natural number in the interval [1, N_k] and N_k is the number of members of C_k;

Step 7: compare the support of each member of C_k with the minimum support min_support and delete the members whose support is less than min_support, which yields the set of frequent k-itemsets L_k; construct the orthogonal linked list corresponding to L_k from the transaction sets obtained in step 6;

Step 8: repeat steps 4 to 7 until no larger frequent itemset can be found;

Step 9: let F be the set of frequent itemsets finally obtained by the OLA algorithm; the association rules are then produced as:

R = {A → B}, where A is a nonempty subset of any member of F and B is the complement of A within that member, so that A ∪ B ∈ F; the members of F are indexed by i_2, a natural number in the interval [1, N_f], where N_f is the number of members of F.
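The node layout described in step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Node`, `build_orthogonal_list` and `column_transactions` are hypothetical, the relational matrix is assumed to be given as a list of 0/1 rows, and transactions are numbered from 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(eq=False)
class Node:
    """One node of the orthogonal linked list (field names follow step 2)."""
    tag: str                          # "M", "H" or "E"
    element: Tuple                    # the two-tuple described in step 2
    right: Optional["Node"] = None
    down: Optional["Node"] = None


def build_orthogonal_list(matrix, items):
    """Build the list from a 0/1 matrix (rows = transactions, columns = items)."""
    n_rows, n_cols = len(matrix), len(items)
    root = Node("M", (n_rows, n_cols))
    # Column heads: (item, number of nonzero elements in the column).
    col_heads = [Node("H", (items[j], sum(row[j] for row in matrix)))
                 for j in range(n_cols)]
    # Row heads: (row index, number of nonzero elements in the row).
    row_heads = [Node("H", (i + 1, sum(matrix[i]))) for i in range(n_rows)]
    for a, b in zip(col_heads, col_heads[1:]):
        a.right = b                   # column head -> next column head
    for a, b in zip(row_heads, row_heads[1:]):
        a.down = b                    # row head -> next row head
    root.right = col_heads[0] if col_heads else None
    root.down = row_heads[0] if row_heads else None
    row_tail, col_tail = list(row_heads), list(col_heads)
    for i in range(n_rows):
        for j in range(n_cols):
            if matrix[i][j]:
                e = Node("E", (items[j], i + 1))   # (item, transaction number)
                row_tail[i].right = e              # link along the row
                col_tail[j].down = e               # link along the column
                row_tail[i], col_tail[j] = e, e
    return root


def column_transactions(root, item):
    """Walk one column and return the transaction numbers that contain item."""
    head = root.right
    while head is not None and head.element[0] != item:
        head = head.right
    tids = set()
    e = head.down if head is not None else None
    while e is not None:
        tids.add(e.element[1])
        e = e.down
    return tids
```

Walking a column with `column_transactions` is the operation that the later steps rely on to obtain a transaction set T(·) without rescanning the database.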

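In step 1, the relational matrix M_a is as follows:

$$
M_a = (d_{ij})_{|D| \times |I|} =
\begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1|I|} \\
d_{21} & d_{22} & \cdots & d_{2|I|} \\
\vdots & \vdots & \ddots & \vdots \\
d_{|D|1} & d_{|D|2} & \cdots & d_{|D||I|}
\end{pmatrix}
$$

and:

$$
d_{ij} =
\begin{cases}
1, & I_j \in T_i \\
0, & I_j \notin T_i
\end{cases}
$$

where I = {I_1, I_2, ..., I_{|I|}} is the set of items contained in transaction database D, |I| is the number of items in D, |D| is the number of transactions in D, T_i is the i-th transaction of D, d_ij is an element of the matrix, i is a natural number in the interval [1, |D|], and j is a natural number in the interval [1, |I|].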
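As a rough sketch of step 1, the single scan that produces the relational matrix could look like the following in Python. The function name `relational_matrix` and the representation of each transaction as a set of item labels are assumptions made for illustration only.

```python
def relational_matrix(transactions, items):
    """Scan the transactions once and return the 0/1 relational matrix.

    transactions: list of sets of item labels (row i is transaction i + 1).
    items: ordered list of all item labels (column j is items[j]).
    """
    column = {item: j for j, item in enumerate(items)}
    matrix = []
    for transaction in transactions:          # the only pass over the database
        row = [0] * len(items)
        for item in transaction:
            row[column[item]] = 1             # d_ij = 1 when the item is present
        matrix.append(row)
    return matrix


# Two toy transactions over three items.
print(relational_matrix([{"A", "C"}, {"B"}], ["A", "B", "C"]))
# [[1, 0, 1], [0, 1, 0]]
```

Because most rows contain only a few ones, a dense list of lists like this wastes space on zeros, which is the motivation the text gives for the orthogonal linked list.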

In step 3, the computation proceeds as follows: traverse the column head nodes of the orthogonal linked list obtained in step 2 to obtain the number of transactions in which each item of the item set I = {I_1, I_2, ..., I_{|I|}} occurs, denoted N_1, N_2, ..., N_{|I|}; the set of transactions containing item I_j is T({I_j}). The support sup(I_j) of each item is computed as:

$$ \mathrm{sup}(I_j) = N_j / |D|, \quad j \in [1, |I|] $$

The support of each item is compared with the preset minimum support min_support, and the items whose support is less than the minimum support are deleted, which yields the set of frequent 1-itemsets L_1. The orthogonal linked list corresponding to L_1 is then obtained from the transaction sets T({I_j}) of the elements of L_1.
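The computation of step 3 can be illustrated with a short Python sketch. As a simplifying assumption, each column of the orthogonal linked list is represented here by a plain dictionary entry mapping an item to its set of transaction numbers; the function name `frequent_1_itemsets` is hypothetical.

```python
def frequent_1_itemsets(columns, n_transactions, min_support):
    """Keep the items whose support N_j / |D| is at least min_support.

    columns: dict mapping item -> set of transaction numbers T({I_j}).
    Returns a dict mapping 1-itemsets (tuples) to their transaction sets.
    """
    frequent = {}
    for item, tids in columns.items():
        support = len(tids) / n_transactions   # sup(I_j) = N_j / |D|
        if support >= min_support:             # items below the threshold are dropped
            frequent[(item,)] = set(tids)
    return frequent


# Transaction sets for items A, B and F as quoted in Embodiment 1 (|D| = 10).
columns = {
    "A": {1, 5, 6, 8, 10},
    "B": {2, 4, 6, 7},
    "F": {1, 4, 5, 6, 8, 10},
}
print(sorted(frequent_1_itemsets(columns, 10, 0.3)))  # [('A',), ('B',), ('F',)]
print(sorted(frequent_1_itemsets(columns, 10, 0.5)))  # [('A',), ('F',)]
```

Keeping the transaction sets alongside the surviving itemsets mirrors the text's requirement that an orthogonal linked list be built for L_1.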

In step 4, following reference [1] cited in the background, the join proceeds as follows. Let m_1 and m_2 be any two members of the set of frequent (k-1)-itemsets L_{k-1}, with the items in each member sorted in lexicographic order, i.e. for member m_{i_3}, m_{i_3}[1] < m_{i_3}[2] < ... < m_{i_3}[k-1], where m_{i_3}[i_4] denotes the i_4-th item of member m_{i_3}, i_3 ∈ {1, 2} and i_4 ∈ {1, 2, ..., k-1}. If the first k-2 items of m_1 and m_2 are identical and the (k-1)-th item of m_1 is less than the (k-1)-th item of m_2, i.e. (m_1[1] = m_2[1]) && (m_1[2] = m_2[2]) && ... && (m_1[k-2] = m_2[k-2]) && (m_1[k-1] < m_2[k-1]), then m_1 and m_2 are joinable, and joining m_1 and m_2 yields {m_1[1], m_1[2], ..., m_1[k-1], m_2[k-1]}.
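The join condition above can be sketched in Python as follows. Itemsets are assumed to be tuples of items sorted in lexicographic order, and the function name `join` is hypothetical.

```python
def join(frequent_prev):
    """Join frequent (k-1)-itemsets that share their first k-2 items.

    frequent_prev: iterable of sorted tuples, all of length k-1.
    Returns the candidate k-itemsets as sorted tuples.
    """
    prev = sorted(frequent_prev)
    candidates = []
    for i, m1 in enumerate(prev):
        for m2 in prev[i + 1:]:
            # First k-2 items equal, last item of m1 smaller than last item of m2.
            if m1[:-1] == m2[:-1] and m1[-1] < m2[-1]:
                candidates.append(m1 + (m2[-1],))
    return candidates


# L_2 from Embodiment 1 yields the single candidate {A, E, F}.
print(join([("A", "E"), ("A", "F"), ("B", "C"), ("E", "F")]))
# [('A', 'E', 'F')]
```

For k = 2 the shared prefix is empty, so every pair of frequent 1-itemsets is joined.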

In step 5, the pruning operation proceeds as follows: traverse the column head nodes of the orthogonal linked list corresponding to the set of frequent (k-1)-itemsets. For each member c_{i_1} of the candidate k-itemset set C_k, if all of its subsets containing k-1 elements appear among the column head nodes, c_{i_1} is retained in C_k; otherwise it is deleted from C_k.
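A minimal Python sketch of this pruning check is given below. It assumes the column head nodes of the frequent (k-1)-itemset list are available as a collection of sorted tuples; the function name `prune` is hypothetical.

```python
from itertools import combinations


def prune(candidates, frequent_prev):
    """Keep a candidate only if every (k-1)-subset is a frequent (k-1)-itemset.

    candidates: list of sorted tuples of length k.
    frequent_prev: iterable of sorted tuples of length k-1 (the column heads).
    """
    heads = set(frequent_prev)
    return [c for c in candidates
            if all(subset in heads for subset in combinations(c, len(c) - 1))]


l2 = [("A", "E"), ("A", "F"), ("B", "C"), ("E", "F")]
print(prune([("A", "E", "F")], l2))   # [('A', 'E', 'F')]
print(prune([("A", "B", "C")], l2))   # []
```

`itertools.combinations` preserves the order of its input, so the subsets of a sorted candidate are themselves sorted and can be looked up directly.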

In step 6, the support is computed as follows:

For a member of the candidate k-itemset set C_k,

$$ c_{i_1} = \{I_1, I_2, \ldots, I_k\} = \{I_1, I_2, \ldots, I_{k-1}\} \cup \{I_k\}, $$

the transaction set is T(c_{i_1}) = T(I_1, I_2, ..., I_{k-1}, I_k) = T(I_1, I_2, ..., I_{k-1}) ∩ T(I_k). That is, the set T(c_{i_1}) of transactions containing itemset c_{i_1} is the intersection of T(I_1, I_2, ..., I_{k-1}), the set of transactions containing itemset {I_1, I_2, ..., I_{k-1}}, and T(I_k), the set of transactions containing itemset {I_k}. Traversing the column for itemset {I_1, I_2, ..., I_{k-1}} in the orthogonal linked list corresponding to the set of frequent (k-1)-itemsets yields T(I_1, I_2, ..., I_{k-1}); traversing the column for itemset {I_k} in the orthogonal linked list corresponding to the set of frequent 1-itemsets yields T(I_k). The support of the k-itemset is then computed as:

$$ \mathrm{sup}(\{I_1, I_2, \ldots, I_k\}) = N(T(I_1, I_2, \ldots, I_{k-1}) \cap T(I_k)) / |D|, \quad k \in [1, n] $$

where N(T(I_1, I_2, ..., I_{k-1}) ∩ T(I_k)) is the number of transactions in the intersection of transaction set T(I_1, I_2, ..., I_{k-1}) and transaction set T(I_k).
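The support formula can be checked with a small Python sketch. The transaction sets below are the ones quoted in Embodiment 1; the function name `candidate_support` and the use of Python sets in place of linked-list columns are assumptions for illustration.

```python
def candidate_support(tids_prefix, tids_last, n_transactions):
    """Return T(c) = T(prefix) ∩ T(last item) and sup(c) = |T(c)| / |D|."""
    tids = tids_prefix & tids_last
    return tids, len(tids) / n_transactions


t_a = {1, 5, 6, 8, 10}        # T(A)
t_b = {2, 4, 6, 7}            # T(B)
t_ae = {1, 6, 10}             # T(AE)
t_f = {1, 4, 5, 6, 8, 10}     # T(F)

print(candidate_support(t_a, t_b, 10))    # ({6}, 0.1)
tids_aef, sup_aef = candidate_support(t_ae, t_f, 10)
print(sorted(tids_aef), sup_aef)          # [1, 6, 10] 0.3
```

No database access occurs here: both operands come from columns that were already built, which is why the text counts only one database scan.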

In the present invention, the members of a set of itemsets are called itemsets, and the members of an itemset are called items.

Beneficial effects: the present invention converts the transaction database into a relational matrix. Because each transaction usually contains only a few items, the resulting relational matrix is sparse, and to reduce the space complexity the algorithm stores it as an orthogonal linked list. Generating frequent itemsets is converted into operations on the node sets of the singly linked lists that correspond to items in the relational matrix. First, the algorithm scans the database only once, which overcomes the drawbacks of Apriori and related algorithms (a large number of candidate sets and repeated database scans) and reduces the time spent on frequent I/O operations. Second, generating and finding frequent 2-itemsets requires only an intersection of node sets, which takes little time, and the singly linked list constructed for each generated frequent k-itemset is recorded, which simplifies the generation of frequent (k+1)-itemsets and avoids the complex pruning process of the Apriori algorithm. Finally, the orthogonal linked list storage structure greatly reduces the storage space required.

Brief description of the drawings:

Fig. 1 shows the transaction database D of the present invention.

Fig. 2 shows the relational matrix of the present invention.

Fig. 3 shows the orthogonal linked list of the relational matrix of the present invention.

Fig. 4 shows the support of each member of the candidate 2-itemsets of the present invention.

Fig. 5 shows the orthogonal linked list of the frequent 2-itemsets of the present invention.

Fig. 6 shows the orthogonal linked list of the frequent 3-itemsets of the present invention.

Fig. 7 shows the association rules of the present invention.

Detailed description of the embodiments:

The present invention proposes a database association rule mining method based on an improved Apriori algorithm, comprising steps 1 to 9 as set out in the Summary of the invention, beginning with:

Step 1: scan the transaction database D to obtain the relational matrix M_a;

In step 1, the relational matrix M_a is the |D| × |I| matrix defined in the Summary of the invention, whose element d_ij is nonzero when the i-th transaction contains item I_j.

In step 3, the support of each item is computed as:

$$ \mathrm{sup}(I_j) = N_j / |D|, \quad j \in [1, |I|] $$

The support of each item is compared with the preset minimum support min_support to obtain the set of frequent 1-itemsets L_1, and the orthogonal linked list corresponding to L_1 is obtained from the transaction sets T({I_j}) of the elements of L_1.

In step 6, the support is computed as described in the Summary of the invention.

Embodiment 1

The steps of the OLA algorithm are described below using the simple transaction database D shown in Fig. 1, followed by a brief analysis of its performance. The minimum support is set to min_support = 30%.

1) Following the OLA algorithm, the transaction database D shown in Fig. 1 is scanned first. D contains 10 transactions T_1 to T_10 and 6 items I_1 to I_6. Scanning D yields the relational matrix A shown in Fig. 2: row i of A corresponds to transaction T_i of D, i ∈ [1, 10]; column j corresponds to item I_j of D, j ∈ [1, 6]; and a nonzero element a_ij indicates that item I_j is contained in transaction T_i. The relational matrix A is represented as an orthogonal linked list, as shown in Fig. 3, in which the M-type node is the head node of the orthogonal linked list, the H-type nodes are its row and column head nodes, and the E-type nodes correspond to the nonzero elements of the relational matrix.

2) Next, traversing each column of the orthogonal linked list gives the number of transactions in which each item of the item set I = {A, B, C, D, E, F} occurs: 5, 4, 5, 3, 5 and 6 respectively. Using the formula sup(I_j) = N_j / |D|, j ∈ [1, 6], the supports of the items are 0.5, 0.4, 0.5, 0.3, 0.5 and 0.6, all greater than or equal to the preset minimum support min_support = 0.3, so the set of frequent 1-itemsets is L_1 = {{A}, {B}, {C}, {D}, {E}, {F}}. Because every item is a frequent 1-itemset, the orthogonal linked list corresponding to the frequent 1-itemsets is the orthogonal linked list of relational matrix A.

4) Joining the set of frequent 1-itemsets L_1 with itself produces the set of candidate 2-itemsets C_2, as shown in Fig. 4: C_2 = {{AB}, {AC}, {AD}, {AE}, {AF}, {BC}, {BD}, {BE}, {BF}, {CD}, {CE}, {CF}, {DE}, {DF}, {EF}}. Because every item I_j, j ∈ [1, 6], is a frequent 1-itemset, the candidate 2-itemsets need no pruning by the Apriori property.

5) The support of every member of the candidate set C_2 is computed. For the candidate 2-itemset {AB}, for example, traversing the columns for items A and B in the orthogonal linked list of the frequent 1-itemsets gives the set of transactions containing A, T(A) = {1, 5, 6, 8, 10}, and the set of transactions containing B, T(B) = {2, 4, 6, 7}. The set of transactions containing itemset {AB} is then T(AB) = T(A) ∩ T(B) = {6}, so the support of {AB} is sup({AB}) = 0.1. The supports of all other members of C_2 are computed in the same way and are shown in Fig. 4.

6) The support of every member of C_2 is compared with the minimum support min_support = 0.3, and the members whose support is less than min_support are deleted, which yields the set of frequent 2-itemsets L_2 = {{AE}, {AF}, {BC}, {EF}}. The orthogonal linked list corresponding to the frequent 2-itemsets is constructed as shown in Fig. 5.

7) Joining the set of frequent 2-itemsets L_2 with itself produces the set of candidate 3-itemsets C_3 = {{AEF}}. Traversing the column for itemset {AE} in the orthogonal linked list of the frequent 2-itemsets gives the set of transactions containing {AE}, T(AE) = {1, 6, 10}; traversing the column for itemset {F} in the orthogonal linked list of the frequent 1-itemsets gives the set of transactions containing {F}, T(F) = {1, 4, 5, 6, 8, 10}. Then T(AEF) = T(AE) ∩ T(F) = {1, 6, 10}. By the OLA algorithm, the support of itemset {AEF} is 0.3, which is greater than or equal to the minimum support. Therefore the set of frequent 3-itemsets is L_3 = {{AEF}}, and the orthogonal linked list corresponding to the frequent 3-itemsets is constructed as shown in Fig. 6. Because L_3 has only one member, that member is a maximal frequent itemset, and the generation of frequent itemsets ends.

8) The association rules generated by the OLA algorithm are shown in Fig. 7.
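The rule form R = {A → B} of step 9 can be illustrated by enumerating the candidate rules of one frequent itemset. This is a sketch only: the function name `rules_from_itemset` is hypothetical, A is assumed to be a proper subset so that B is nonempty, and no confidence threshold is applied because the text does not specify one. The output is not claimed to match Fig. 7.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """List every rule A -> B with A nonempty, B nonempty and A ∪ B = itemset."""
    items = tuple(sorted(itemset))
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


# The maximal frequent itemset {A, E, F} from step 7 gives six candidate rules.
for antecedent, consequent in rules_from_itemset({"A", "E", "F"}):
    print(antecedent, "->", consequent)
```

A practical miner would normally filter these candidates by a confidence measure computed from the stored supports, but that filter is outside what the text describes.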

Performance: in this example, the OLA algorithm obtains the same frequent itemsets as the Apriori algorithm. The Apriori algorithm needs to scan the transaction database 21 times, whereas the OLA algorithm scans it only once, which greatly reduces the running time of the algorithm and improves efficiency.

The present invention provides a database association rule mining method based on an improved Apriori algorithm. There are many ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art can make further improvements and refinements without departing from the principle of the invention, and these improvements and refinements shall also fall within the scope of protection of the invention. Any component not specified in this embodiment can be implemented with existing technology.

Claims

1. A database association rule mining method based on an improved Apriori algorithm, characterized by comprising the following steps:

Step 1: scan the transaction database D to obtain the relational matrix M_a;

Step 2: store the relational matrix M_a of step 1 in computer memory as an orthogonal linked list, which contains three types of node: M nodes, H nodes and E nodes; the M node is the head node of the orthogonal linked list; an H node is a row or column head node, i.e. the head node of a row list or a column list in the orthogonal linked list; an E node corresponds to a nonzero element of the relational matrix;

Step 5: prune the set of candidate k-itemsets C_k using the Apriori property;

Step 9: let F be the set of frequent itemsets finally obtained by the OLA algorithm; the association rules are then produced:

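2. The database association rule mining method based on an improved Apriori algorithm as claimed in claim 1, characterized in that, in step 1, the relational matrix M_a is as follows:

$$
M_a = (d_{ij})_{|D| \times |I|} =
\begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1|I|} \\
d_{21} & d_{22} & \cdots & d_{2|I|} \\
\vdots & \vdots & \ddots & \vdots \\
d_{|D|1} & d_{|D|2} & \cdots & d_{|D||I|}
\end{pmatrix}
$$

and:

$$
d_{ij} =
\begin{cases}
1, & I_j \in T_i \\
0, & I_j \notin T_i
\end{cases}
$$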

3. The database association rule mining method based on an improved Apriori algorithm as claimed in claim 2, characterized in that, in step 3, the computation proceeds as follows:

Traverse the column head nodes of the orthogonal linked list obtained in step 2 to obtain the number of transactions in which each item of the item set I = {I_1, I_2, ..., I_{|I|}} occurs, denoted N_1, N_2, ..., N_{|I|}; the set of transactions containing item I_j is T({I_j}); the support sup(I_j) of each item is computed as:

$$ \mathrm{sup}(I_j) = N_j / |D|, \quad j \in [1, |I|] $$

4. The database association rule mining method based on an improved Apriori algorithm as claimed in claim 3, characterized in that, in step 4, the join proceeds as follows: let m_1 and m_2 be any two members of the set of frequent (k-1)-itemsets L_{k-1}, with the items in each member sorted in lexicographic order, i.e. for member m_{i_3}, m_{i_3}[1] < m_{i_3}[2] < ... < m_{i_3}[k-1], where m_{i_3}[i_4] denotes the i_4-th item of member m_{i_3}, i_3 ∈ {1, 2} and i_4 ∈ {1, 2, ..., k-1}; if the first k-2 items of m_1 and m_2 are identical and the (k-1)-th item of m_1 is less than the (k-1)-th item of m_2, i.e. (m_1[1] = m_2[1]) && (m_1[2] = m_2[2]) && ... && (m_1[k-2] = m_2[k-2]) && (m_1[k-1] < m_2[k-1]), then m_1 and m_2 are joinable, and joining m_1 and m_2 yields {m_1[1], m_1[2], ..., m_1[k-1], m_2[k-1]}.

5. The database association rule mining method based on an improved Apriori algorithm as claimed in claim 4, characterized in that, in step 5, the pruning operation proceeds as follows: traverse the column head nodes of the orthogonal linked list corresponding to the set of frequent (k-1)-itemsets; for each member c_{i_1} of the candidate k-itemset set C_k, if all of its subsets containing k-1 elements appear among the column head nodes, c_{i_1} is retained in C_k; otherwise it is deleted from C_k.

6. The database association rule mining method based on an improved Apriori algorithm as claimed in claim 5, characterized in that, in step 6, the support is computed as follows:

For a member of the candidate k-itemset set C_k,

$$ c_{i_1} = \{I_1, I_2, \ldots, I_k\} = \{I_1, I_2, \ldots, I_{k-1}\} \cup \{I_k\}, $$

the transaction set is T(c_{i_1}) = T(I_1, I_2, ..., I_{k-1}, I_k) = T(I_1, I_2, ..., I_{k-1}) ∩ T(I_k); that is, the set T(c_{i_1}) of transactions containing itemset c_{i_1} is the intersection of T(I_1, I_2, ..., I_{k-1}), the set of transactions containing itemset {I_1, I_2, ..., I_{k-1}}, and T(I_k), the set of transactions containing itemset {I_k}; traversing the column for itemset {I_1, I_2, ..., I_{k-1}} in the orthogonal linked list corresponding to the set of frequent (k-1)-itemsets yields T(I_1, I_2, ..., I_{k-1}); traversing the column for itemset {I_k} in the orthogonal linked list corresponding to the set of frequent 1-itemsets yields T(I_k); the support sup({I_1, I_2, ..., I_k}) of the k-itemset is then computed as: