CN105320756A - Improved Apriori algorithm based method for mining database association rule - Google Patents
Improved Apriori algorithm based method for mining database association rule Download PDFInfo
- Publication number
- CN105320756A CN105320756A CN201510666724.4A CN201510666724A CN105320756A CN 105320756 A CN105320756 A CN 105320756A CN 201510666724 A CN201510666724 A CN 201510666724A CN 105320756 A CN105320756 A CN 105320756A
- Authority
- CN
- China
- Prior art keywords
- frequent
- item
- node
- affairs
- linked list
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention proposes an improved Apriori algorithm based method for mining a database association rule. According to the method, a transaction database is converted into a relational matrix, the converted relational matrix is a sparse matrix, and the relational matrix is stored with an orthogonal link list. A generation process of a frequent item set is converted into an operation process of a single link list node set corresponding to items in the corresponding relational matrix. According to the method, a database only needs to be scanned once, so that the shortcomings that Apriori and a related algorithm therefor generate a large amount of candidate sets and need to scan the database for multiple times are overcome, and the time of frequently performing I/O operations is shortened; then, when a frequent 2-item set is generated and found, only an intersection operation of a node set needs to be performed, so that less time is consumed; and a single link list constructed by a generated frequent k-item set is recorded, so that a generation process of a frequent K+1-item set is simplified, and a complex pruning process of the Apriori algorithm is avoided.
Description
Technical field
The invention discloses a kind of database association rule digging method based on improving Apriori algorithm, emphasis relates to and is representing on the basis of transaction database with orthogonal linked list storage matrix, transformation and optimization is carried out to the Frequent Item Sets generative process of Apriori algorithm, belongs to computer data and excavate and technical field of information processing.
Background technology
Develop today like a raging fire at large data technique, people recognize that namely data are wealth gradually, especially have more huge practical value to the analysis of business data.Association Rule Analysis, as one of the Main Means of data mining, is an important component part indispensable in data mining technology, is mainly used in finding valuable interesting contact implicit in large-scale transaction database and rule.Therefore, the research of association rule algorithm is had very important significance.
As far back as 1993, the people such as the computer scientist R.Agrawal of IBM found the purchase rule of client when buying commodity in customer transaction DB, propose the correlative model between affairs, namely initial correlation rule.Normally a kind of uncomplicated but rule that practicality is very high of correlation rule.By Association Rule Analysis, we can by the relation excavation between transaction itemset and item collection out.The most typical application of Association Rule Analysis is market basket data analysis, such as classical { beer } → { diaper } rule.Except being applied to except market basket data, the application of Association Rule Analysis in other field is also very extensive, as personalized recommendations in E-business, and financial service, advertisement plan, bioinformatics and science data analysis etc.Such as in personalized recommendations in E-business, correlation rule can help e-commerce website to carry out the interested commercial product recommending of some their possibilities to the client with similar consumer behavior, contribute to e-commerce website like this and promote Consumer's Experience, net income increase etc.
Association Rule Analysis algorithm is more, and wherein the most classical practicality is it is preferred that Apriori algorithm and innovatory algorithm thereof.Apriori algorithm [1] is first association rule algorithm proposed in 1994 by Agrawal and Swami, is widely used, and this algorithm performs connection by repetitive cycling, beta pruning generates Frequent Item Sets, thus the rule that is associated.Based on Apriori algorithm, the people such as Yang propose Apriori-TFP algorithm [2], and raw data, in association rule mining process, is carried out pre-service and be stored in local supporting, in tree, finally to generate correlation rule by this algorithm.This algorithm, by effective pre-service, reduces the time of association rule mining, but needs the number of times of scan database still more.The people such as Zhang propose GP-Apriori algorithm [3], GP-Apriori algorithm adopts graphic process unit (GraphicalProcessingUnit, GPU) carry out the support counting of parallelization, and row of vertically concluding the business are stored as linear oldered array.GPU by this oldered array of traversal, and performs step-by-step and intersects and realize support and calculate, and result is copied back internal memory.Compared with the Apriori algorithm that traditional C PU runs, GP-Apriori algorithm improves operating rate owing to have employed advanced GPU, but complicacy increases on the contrary to some extent.The people such as Delighta it is also proposed innovatory algorithm (AprioriMendAlgorithm) [4] of Apriori.This algorithm uses hash function to generate Item Sets, and user must specify minimum support to delete unwanted item collection.This algorithm has efficiency more better than traditional Apriori algorithm, but the execution time increases to some extent.Ning etc. achieve the parallelization [5] of happy Apriori algorithm based on MapReduce framework.This algorithm is with good expansibility and efficiency when processing massive data sets, but this counts the powerful calculating of needs and storage capacity supports, and is normally operated in cluster environment.The people such as Sulianta attempt in the document [6] Apriori algorithm to be applied to multidimensional data analysis, have inquired into the more concrete effective method of rule that to be associated in multidimensional data.The people such as Sheila improve Apriori algorithm in document [7], introduce the concept of affairs size and affairs scale to eliminate the impact of non-critical item object.The people such as Feng propose a kind of Apriori algorithm based on matrix in document [8], and this algorithm effectively represents the various operations of database by matrix, and obtain maximum Frequent Item Sets with the AND operation based on matrix.The people such as Hu are application relational theory thought in document [9], introduce project reparable subspace and AND operation thereof, devise a kind of fast algorithm for mining-SLIG (Single-levelLargeItemsetsGeneration) algorithm, the production process of Frequent Item Sets is converted into vector operation process in the relational matrix of Item Sets.This algorithm overcomes Apriori and related algorithm produces a large amount of Candidate Set and needs the shortcoming of Multiple-Scan database, but the storage space needed is larger.
Reference citation
[1] R.Agrawal, R.Srikantetal..Fastalgorithmsforminingassociationrules (fast algorithm of Mining Association Rules), Proc.20thInt.Conf.VeryLargeDataBases, VLDB, vol.1215, pp.487-499, September1994.
[2] Z.Yang, W.Tang, A.Shintemirov, andQ.Wu.Associationrulemining-baseddissolvedgasanalysisf orfaultdiagnosisofpowertransformers (the diagnosing fault of power transformer dissolved gas analysis based on association rule mining), Systems, Man, andCybernetics, PartC:ApplicationsandReviews, IEEETransactionson, vol.39, no.6, pp.597-610,2009.
[3] F.Zhang, Y.Zhang, andJ.D.Bakos.Gpapriori:Gpu-acceleratedfrequentitemsetmin ing (frequent item set mining that graphic based processor accelerates), inCLUSTER.IEEE, 2011, pp.590-594.
[4] I.S.P.J.D.MagdaleneDelightaAngeline.Associationrulegener ationusingApriorimendalgorithmforstudent'splacement (Association Rules Generating Algorithm based on improving Apriori algorithm), vol.2, no.1,2012, pp.78-86.
[5] N.Li, L.Zeng, Q.He, andZ.Shi.Parallelimplementationofapriorialgorithmbasedon MapReduce (Parallel Implementation based on the Apriori algorithm of MapReduce), inSoftwareEngineering, ArtificialIntelligence, NetworkingandParallelDistributedComputing (SNPD), 201213thACISInternationalConferenceon, 2012, pp.236-241.
[6] F.Sulianta, T.H.Liong, andI.Atastina.Miningfoodindustry'smultidimensionaldatato produceassociationrulesusingApriorialgorithmasabasisofbu sinessstrategy (the corporate strategy association rules mining algorithm towards food industry multidimensional data based on Apriori algorithm), inInformationandCommunicationTechnology (ICoICT), 2013InternationalConferenceof, 2013, pp.176-181.
[7] S.A.Abaya.AssociationruleminingbasedonApriorialgorithmin minimizingcandidategeneration (the minimum generation candidate association rule mining algorithms based on Apriori algorithm), InternationalJournalofScientificandEngineeringResearch, vol.3, no.7, pp.1-4, July2012.
[8] WangFeng, LiYong-hua.AnImprovedAprioriAlgorithmBasedontheMatrix (a kind of improvement Apriori algorithm based on matrix), fbie, pp.152-155,2008InternationalSeminaronFutureBioMedicalInformationEng ineering, 2008.
[9] Hu Huirong, Wang Zhoujing. a kind of Fast algorithm for mining association rules based on relational matrix, computer utility, 2005,25 (7): 1577-1579.
Summary of the invention
The present invention proposes a kind of database association rule digging method based on improving Apriori algorithm, comprising improvement Apriori algorithm-OLA (OrthogonalListApriori) algorithm based on orthogonal linked list affairs storage matrix.
The present invention includes following steps:
Step 1, scanning transaction database D, obtains relational matrix M
a;
Step 2, uses relational matrix M described in orthogonal linked list storing step 1 in computer-internal
a, this orthogonal linked list comprises the node of three types, is respectively M node, H node and E node, and M node is the gauge outfit node of orthogonal linked list; H node is row/column gauge outfit node, is the gauge outfit node of row chained list or row chained list in orthogonal linked list; E node is the node that in relational matrix, nonzero element is corresponding; Three kinds of nodes all comprise four territories, Tag territory, Element territory, Right territory and Down territory.Tag territory is mark domain, in order to distinguish three kinds of dissimilar nodes.Element territory is element fields, and concerning orthogonal linked list gauge outfit node, what two tuples in element fields stored is line number and the columns of corresponding sparse relational matrix, i.e. the number of transactions that comprises of transaction database D and item number; Concerning row gauge outfit node, the nonzero element number comprised in which row of two element group representations in element fields and this row; Concerning list head node, the nonzero element number in two element group representation projects in element fields or Item Sets and this row; Concerning nonzero element node, two element group representation projects in element fields or Item Sets and comprise the affairs numbering of this project or Item Sets.Right territory is pointer field, and concerning orthogonal linked list gauge outfit node, it points to first list head node; Concerning row gauge outfit node, it points to this row first nonzero element node; Concerning list head node, it points to next list head node; Concerning nonzero element node, it points to the next nonzero element node of this row.Down territory is also pointer field, and concerning orthogonal linked list gauge outfit node, it points to first row gauge outfit node; Concerning row gauge outfit node, it points to next list head node; Concerning list head node, it points to these row first nonzero element node; Concerning nonzero element node, it points to the next nonzero element node of these row;
Step 3, calculates frequent 1 collection set L according to the orthogonal linked list of step 2
1and frequent 1 collection set L
1corresponding orthogonal linked list;
Step 4, by frequent k-1 item collection set L
k-1be connected with self and produce candidate's frequent k item collection set C
k, k be interval [2, ∞) in natural number;
Step 5, utilizes Apriori character (all nonvoid subsets of frequent item set also must be frequently, if the nonvoid subset of certain candidate is not frequently, so this candidate certainly not frequently) to candidate's frequent item set set C
kcarry out beta pruning;
Step 6, travels through the orthogonal linked list of frequent k-1 item collection set and frequent 1 collection set correspondence, obtains comprising item collection
affairs set
and calculated candidate frequent k item collection set C
kmiddle member
support, wherein i
1for interval [1, N
k] in natural number, N
krepresent candidate's frequent k item collection set C
kthe number of members comprised;
Step 7, by frequent for candidate k item collection set C
kmiddle member
support and minimum support min_support compare, delete support and be less than the member of minimum support min_support, obtain frequent k item collection set L
k, and according to gained affairs set in step 6
construct frequent k item collection set L
kcorresponding orthogonal linked list;
Step 8, repeated execution of steps 4 ~ step 7, until can not find larger Frequent Item Sets;
Step 9, is F according to the frequent item set set that OLA algorithm finally obtains, then can produces correlation rule:
R={A->B}, A are any member in frequent item set set F
nonvoid subset, B is the supplementary set of A, namely
∈ F, i
2for interval [1, N
f] in natural number, N
frepresent the number of members that frequent item set set F comprises.
In step 1, described relational matrix M
aas follows:
And have:
Wherein, I={I
1, I
2..., I
| I|the project set that transaction database D comprises, | the item number that I| comprises for database D, | D| is the number of transactions that transaction database comprises, d
ijthe element of representing matrix, i is the natural number in interval [1, | D|], and j be the natural number in interval [1, | I|].
In step 3, described computation process is as follows: the list head node of the orthogonal linked list that traversal step 2 obtains, obtains project set I={I
1, I
2..., I
| I|in the number of times that occurs in all affairs of projects member, be respectively N
1, N
2..., N
| I|, comprise project set I
jaffairs set be T ({ I
j), obtain projects support sup (I according to following formulae discovery
j):
sup(I
j)=N
j/|D|,j∈[1,|I|],
Projects support and set minimum support min_support are compared, and deletes the program member that support is less than minimum support, obtain frequent 1 collection set L
1, based on frequent 1 collection set L
1in affairs set T ({ I corresponding to each element
j), obtain frequent 1 collection set L
1corresponding orthogonal linked list.
In step 4, with reference to citing document 1 in background technology, connection procedure is as follows: establish m
1and m
2frequent k-1 item collection set L
k-1any two members, the project in member by dictionary order sequence, namely for member
have
wherein
represent member
in i-th
4individual project, wherein i
3∈ { 0,1}, i
4∈ 1,2 ..., k-1}, if member is m
1and m
2in before k-2 project all identical, member m
1kth-2 projects be less than member m
2kth-2 projects, i.e. (m
1[1]=m
2[1]) & & (m
1[2]=m
2[2]) & & ... & & (m
1[k-2]=m
2[k-2]) & & (m
1[k-1] <m
2[k-1]), then judge m
1and m
2be attachable, connect m
1and m
2the result produced is { m
1[1], m
1[2] ..., m
1[k-1], m
2[k-1] }.
In step 5, described cut operator process is as follows: the list head node traveling through orthogonal linked list corresponding to frequent k-1 item collection set, to candidate k item collection set C
kmember
if its all subsets comprising k-1 element are all in list head node, then by member
be retained in candidate's frequent item set set C
kin, otherwise by it from C
kmiddle deletion.
In step 6, the step calculating described support comprises:
Candidate's frequent k item collection set C
kmiddle member
then affairs set T (c
i)=T (I
1, I
2..., I
k-1, I
k)=T (I
1, I
2..., I
k-1) ∩ T (I
k), namely comprise the set of item collection
affairs set T (c
i) be comprise item collection set { I
1, I
2..., I
k-1affairs set T (I
1, I
2..., I
k-1) and comprise item collection set { I
kaffairs set T (I
k) common factor, travel through the orthogonal linked list middle term collection { I that the set of frequent k-1 item collection is corresponding
1, I
2..., I
k-1corresponding row, affairs set T (I can be obtained
1, I
2..., I
k-1), travel through the orthogonal linked list middle term collection set { I that frequent 1 collection set is corresponding
kcorresponding row, obtain affairs set T (I
k), then by the set of following formulae discovery k item collection
support:
sup({I
1,I
2,...,I
k})=N(T(I
1,I
2,...,I
k-1)∩T(I
k))/|D|,k∈[1,n],
Wherein, N (T (I
1, I
2..., I
k-1) ∩ T (I
k)) represent affairs set T (I
1, I
2..., I
k-1) and affairs set T (I
k) the number of transactions that comprises of common factor.
In the present invention, the member of item set is as item collection, and the member that item integrates is as project.
Beneficial effect: transaction database is converted into relational matrix by the present invention, owing to only comprising a small amount of project in usual each affairs, therefore the relational matrix after transforming is sparse matrix, and in order to reduce the space complexity of algorithm, this algorithm uses orthogonal linked list to store relational matrix.The production process of Frequent Item Sets is converted into the calculating process of the node set of single linked list corresponding to project in corresponding relation matrix.This algorithm only needs scan database one time, overcomes the shortcoming that Apriori and related algorithm thereof produce a large amount of Candidate Set and need Multiple-Scan database, decreases the time of frequently carrying out I/O operation; Secondly, generate and only need to carry out node set when finding frequent 2-item collection ship calculation, expend time in less, and record carried out to the frequent k-item collection structure single linked list generated, simplify the generative process of frequent k+1-item collection, avoid the beta pruning process of Apriori algorithm complexity.Finally, algorithm adopts orthogonal linked list storage organization, greatly reduces the demand to storage space.
Accompanying drawing illustrates:
Fig. 1 is transaction database D of the present invention.
Fig. 2 is relational matrix of the present invention.
Fig. 3 is relational matrix orthogonal linked list of the present invention.
Fig. 4 is each member's support of candidate of the present invention 2 collection.
Fig. 5 is frequent 2 the collection orthogonal linked lists of the present invention.
Fig. 6 is frequent 3 the collection orthogonal linked lists of the present invention.
Fig. 7 is correlation rule of the present invention.
Specific embodiments:
The present invention proposes a kind of database association rule digging method based on improving Apriori algorithm, comprising the following steps:
Step 1, scanning transaction database D, obtains relational matrix M
a;
Step 2, uses relational matrix M described in orthogonal linked list storing step 1 in computer-internal
a, this orthogonal linked list comprises the node of three types, is respectively M node, H node and E node, and M node is the gauge outfit node of orthogonal linked list; H node is row/column gauge outfit node, is the gauge outfit node of row chained list or row chained list in orthogonal linked list; E node is the node that in relational matrix, nonzero element is corresponding; Three kinds of nodes all comprise four territories, Tag territory, Element territory, Right territory and Down territory.Tag territory is mark domain, in order to distinguish three kinds of dissimilar nodes.Element territory is element fields, and concerning orthogonal linked list gauge outfit node, what two tuples in element fields stored is line number and the columns of corresponding sparse relational matrix, i.e. the number of transactions that comprises of transaction database D and item number; Concerning row gauge outfit node, the nonzero element number comprised in which row of two element group representations in element fields and this row; Concerning list head node, the nonzero element number in two element group representation projects in element fields or Item Sets and this row; Concerning nonzero element node, two element group representation projects in element fields or Item Sets and comprise the affairs numbering of this project or Item Sets.Right territory is pointer field, and concerning orthogonal linked list gauge outfit node, it points to first list head node; Concerning row gauge outfit node, it points to this row first nonzero element node; Concerning list head node, it points to next list head node; Concerning nonzero element node, it points to the next nonzero element node of this row.Down territory is also pointer field, and concerning orthogonal linked list gauge outfit node, it points to first row gauge outfit node; Concerning row gauge outfit node, it points to next list head node; Concerning list head node, it points to these row first nonzero element node; Concerning nonzero element node, it points to the next nonzero element node of these row;
Step 3, calculates frequent 1 collection set L according to the orthogonal linked list of step 2
1and frequent 1 collection set L
1corresponding orthogonal linked list;
Step 4, by frequent k-1 item collection set L
k-1be connected with self and produce candidate's frequent k item collection set C
k, k be interval [2, ∞) in natural number;
Step 5, utilizes Apriori character (all nonvoid subsets of frequent item set also must be frequently, if the nonvoid subset of certain candidate is not frequently, so this candidate certainly not frequently) to candidate's frequent item set set C
kcarry out beta pruning;
Step 6, travels through the orthogonal linked list of frequent k-1 item collection set and frequent 1 collection set correspondence, obtains comprising item collection
affairs set
and calculated candidate frequent k item collection set C
kmiddle member
support, wherein i
1for interval [1, N
k] in natural number, N
krepresent candidate's frequent k item collection set C
kthe number of members comprised;
Step 7, by frequent for candidate k item collection set C
kmiddle member
support and minimum support min_support compare, delete support and be less than the member of minimum support min_support, obtain frequent k item collection set L
k, and according to gained affairs set in step 6
construct frequent k item collection set L
kcorresponding orthogonal linked list;
Step 8, repeated execution of steps 4 ~ step 7, until can not find larger Frequent Item Sets;
Step 9, is F according to the frequent item set set that OLA algorithm finally obtains, then can produces correlation rule:
R={A->B}, A are any member in frequent item set set F
nonvoid subset, B is the supplementary set of A, namely
∈ F, i
2for interval [1, N
f] in natural number, N
frepresent the number of members that frequent item set set F comprises.
In step 1, described relational matrix M
aas follows:
And have:
Wherein, I={I
1, I
2..., I
| I|the project set that transaction database D comprises, | the item number that I| comprises for database D, | D| is the number of transactions that transaction database comprises, d
ijthe element of representing matrix, i is the natural number in interval [1, | D|], and j be the natural number in interval [1, | I|].
In step 3, described computation process is as follows: the list head node of the orthogonal linked list that traversal step 2 obtains, obtains project set I={I
1, I
2..., I
| I|in the number of times that occurs in all affairs of projects member, be respectively N
1, N
2..., N
| I|, comprise project set I
jaffairs set be T ({ I
j), obtain projects support sup (I according to following formulae discovery
j):
sup(I
j)=N
j/|D|,j∈[1,|I|],
Projects support and set minimum support min_support are compared, obtains frequent 1 collection set L
1, based on frequent 1 collection set L
1in affairs set T ({ I corresponding to each element
j), obtain frequent 1 collection set L
1corresponding orthogonal linked list.
In step 4, with reference to citing document 1 in background technology, connection procedure is as follows: establish m
1and m
2frequent k-1 item collection set L
k-1any two members, the project in member by dictionary order sequence, namely for member
have
wherein
represent member
in i-th
4individual project, wherein i
3∈ { 0,1}, i
4∈ 1,2 ..., k-1}, if member is m
1and m
2in before k-2 project all identical, member m
1kth-2 projects be less than member m
2kth-2 projects, i.e. (m
1[1]=m
2[1]) & & (m
1[2]=m
2[2]) & & ... & & (m
1[k-2]=m
2[k-2]) & & (m
1[k-1] <m
2[k-1]), then judge m
1and m
2be attachable, connect m
1and m
2the result produced is { m
1[1], m
1[2] ..., m
1[k-1], m
2[k-1] }.
In step 5, described cut operator process is as follows: the list head node traveling through orthogonal linked list corresponding to frequent k-1 item collection set, to candidate k item collection set C
kmember
if its all subsets comprising k-1 element are all in list head node, then by member
be retained in candidate's frequent item set set C
kin, otherwise by it from C
kmiddle deletion.
In step 6, the step calculating described support comprises:
Candidate's frequent k item collection set C
kmiddle member
then affairs set T (c
i)=T (I
1, I
2..., I
k-1, I
k)=T (I
1, I
2..., I
k-1) ∩ T (I
k), namely comprise the set of item collection
affairs set T (c
i) be comprise item collection set { I
1, I
2..., I
k-1affairs set T (I
1, I
2..., I
k-1) and comprise item collection set { I
kaffairs set T (I
k) common factor, travel through the orthogonal linked list middle term collection { I that the set of frequent k-1 item collection is corresponding
1, I
2..., I
k-1corresponding row, affairs set T (I can be obtained
1, I
2..., I
k-1), travel through the orthogonal linked list middle term collection set { I that frequent 1 collection set is corresponding
kcorresponding row, obtain affairs set T (I
k), then by the set of following formulae discovery k item collection
support:
sup({I
1,I
2,...,I
k})=N(T(I
1,I
2,...,I
k-1)∩T(I
k))/|D|,k∈[1,n],
Wherein, N (T (I
1, I
2..., I
k-1) ∩ T (I
k)) represent affairs set T (I
1, I
2..., I
k-1) and affairs set T (I
k) the number of transactions that comprises of common factor.
Embodiment 1
Be described by the step of simple transaction database D to OLA algorithm of as shown in Figure 1, and its performance simply analyzed, the minimum support min_support=30% of setting.
1) according to OLA algorithm, first the transaction database D shown in Fig. 1 is scanned, in transaction database D, comprise 10 affairs T altogether
1-T
10, 6 project I
1-I
6.Scanning transaction database D will obtain relational matrix A as shown in Figure 2, the affairs T of the i-th row correspondence database D of relational matrix A
i, i ∈ [1,10], the project Ij in jth row correspondence database D, j ∈ [1,6], the nonzero element a in relational matrix
ijexpression project I
jbe included in affairs T
iin.Represented by relational matrix A orthogonal linked list, as shown in Figure 3, in Fig. 3, the node of M type is the gauge outfit node of orthogonal linked list, and the node of H type represents the row/column gauge outfit node of orthogonal linked list, and E type node is the node that in relational matrix, nonzero element is corresponding.
2) following, each row of traversal orthogonal linked list, can obtain Item Sets I={A, the number of times that in B, C, D, E, F}, projects occur in all affairs is respectively 5, and 4,5,3,5,6.According to formula sup (I
j)=N
j/ | D|, j ∈ [1,6] calculates, and can obtain projects support and be respectively 0.5,0.4,0.5,0.3,0.5,0.6, all be greater than default minimum support min_support=0.3, then frequent 1 collection set L
1={ { A}, { B}, { C}, { D}, { E}, { F}}.Because all items all belongs to the member of frequent 1 collection, the orthogonal linked list that therefore frequent 1 set pair is answered is orthogonal linked list corresponding to relational matrix A.
4) by frequent 1 collection set L
1carry out producing frequent 2 the collection set C of candidate from connecting
2, as shown in Figure 4, C
2={ { AB}, { AC}, { AD}, { AE}, { AF}, { BC}, { BD}, { BE}, { BF}, { CD}, { CE}, { CF}, { DE}, { DF}, { EF}}.Because all project Ij, j ∈ [1,6] is the member of frequent 1 collection, therefore frequent 2 collection are gathered without the need to carrying out beta pruning according to Apriori character.
5) to frequent 2 the collection set C of candidate
2in all members, calculate its support.Such as candidate's frequent 2 collection AB}, travel through the row that in frequent 1 corresponding orthogonal linked list of collection, project A, B are corresponding respectively, the affairs set that can comprise project A is T (A)={ 1,5,6,8,10}, comprise the affairs set T (B)={ 2 of project B, 4,6,7}, then comprise Item Sets { affairs set T (AB)=T (A) ∩ T (B) of AB}={ 6}, then Item Sets { the support sup of AB} ({ AB})=0.1.In like manner can frequent 2 the collection set C of calculated candidate
2in other all member support as shown in Figure 4.
6) by frequent for candidate 2 collection set C
2in the support of all members and minimum support min_support=0.3 compare, delete the member that support is less than minimum support min_support, obtain frequent 2 collection set L
2={ { AE}, { AF}, { BC}, { EF}} constructs orthogonal linked list corresponding to frequent 2 collection set as shown in Figure 5.
7) by frequent 2 collection set L
2carry out producing frequent 3 the collection set C of candidate from connecting
3, C
3={ { AEF}}.{ row that AE} is corresponding, can comprise Item Sets { affairs set T (AE)={ 1,6 of AE} to travel through frequent 2 corresponding orthogonal linked list middle term collection of collection set, 10}, { row that F} is corresponding, can comprise Item Sets { the affairs set T (F)={ 1 of F} to travel through orthogonal linked list middle term collection corresponding to frequent 1 collection set, 4,5,6,8,10}, then T (AEF)=T (AE) ∩ T (F)={ 1,6,10}.{ support of AEF} is 0.3, is more than or equal to minimum support to can be calculated Item Sets according to OLA algorithm.Therefore, frequent 3 collection set L
3=AEF}}, and construct frequent 3 corresponding orthogonal linked lists of collection set as shown in Figure 6.Due to frequent 3 collection set L
3in only have a member, therefore it is very big Frequent Item Sets, generates Frequent Item Sets process and terminates.
8) according to OLA algorithm generation correlation rule as shown in Figure 7.
Performance: in this example, run OLA algorithm identical with the frequent item set that Apriori algorithm obtains, Apriori algorithm needs scanning transaction database 21 times, and OLA algorithm only needs scan database 1 time, greatly reduce the working time of algorithm, improve efficiency.
The invention provides a kind of database association rule digging method based on improving Apriori algorithm; the method and access of this technical scheme of specific implementation is a lot; the above is only the preferred embodiment of the present invention; should be understood that; for those skilled in the art; under the premise without departing from the principles of the invention, can also make some improvements and modifications, these improvements and modifications also should be considered as protection scope of the present invention.The all available prior art of each ingredient not clear and definite in the present embodiment is realized.
Claims (6)
1., based on the database association rule digging method improving Apriori algorithm, it is characterized in that: comprise the following steps:
Step 1, scanning transaction database D, obtains relational matrix M
a;
Step 2, uses relational matrix M described in orthogonal linked list storing step 1 in computer-internal
a, this orthogonal linked list comprises the node of three types, is respectively M node, H node and E node; M node is the gauge outfit node of orthogonal linked list; H node is row or column gauge outfit node, is the gauge outfit node of row chained list or row chained list in orthogonal linked list; E node is the node that in relational matrix, nonzero element is corresponding;
Step 3, calculates frequent 1 collection set L according to the orthogonal linked list of step 2
1and frequent 1 collection set L
1corresponding orthogonal linked list;
Step 4, by frequent k-1 item collection set L
k-1be connected with self and produce candidate's frequent k item collection set C
k, k be interval [2, ∞) in natural number;
Step 5, utilizes Apriori character to candidate's frequent k item collection set C
kcarry out beta pruning;
Step 6, travels through the orthogonal linked list of frequent k-1 item collection set and frequent 1 collection set correspondence, obtains comprising item collection
affairs set
and calculated candidate frequent k item collection set C
kmiddle member
support, wherein i
1for interval [1, N
k] in natural number, N
krepresent candidate's frequent k item collection set C
kthe number of members comprised;
Step 7, by frequent for candidate k item collection set C
kmiddle member
support and minimum support min_support compare, delete support and be less than the member of minimum support min_support, obtain frequent k item collection set L
k, and according to gained affairs set in step 6
construct frequent k item collection set L
kcorresponding orthogonal linked list;
Step 8, repeated execution of steps 4 ~ step 7, until can not find larger Frequent Item Sets;
Step 9, is F according to the frequent item set set that OLA algorithm finally obtains, then produces correlation rule:
R={A->B}, A are any member in frequent item set set F
nonvoid subset, B is the supplementary set of A, namely
∈ F, i
2for interval [1, N
f] in natural number, N
frepresent the number of members that frequent item set set F comprises.
2. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 1, it is characterized in that: in step 1, described relational matrix M
aas follows:
And have:
Wherein, I={I
1, I
2..., I
| I|the project set that transaction database D comprises, | the item number that I| comprises for database D, | D| is the number of transactions that transaction database comprises, d
ijthe element of representing matrix, i is the natural number in interval [1, | D|], and j be the natural number in interval [1, | I|].
3. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 2, it is characterized in that, in step 3, described computation process is as follows:
The list head node of the orthogonal linked list that traversal step 2 obtains, obtains project set I={I
1, I
2..., I
| I|in the number of times that occurs in all affairs of projects member, be respectively N
1, N
2..., N
| I|, comprise project set I
jaffairs set be T ({ I
j), obtain projects support sup (I according to following formulae discovery
j):
sup(I
j)=N
j/|D|,j∈[1,|I|],
Projects support and set minimum support min_support are compared, and deletes the program member that support is less than minimum support, obtain frequent 1 collection set L
1, based on frequent 1 collection set L
1in affairs set T ({ I corresponding to each element
j), obtain frequent 1 collection set L
1corresponding orthogonal linked list.
4. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 3, it is characterized in that, in step 4, connection procedure is as follows: establish m
1and m
2frequent k-1 item collection set L
k-1any two members, the project in member by dictionary order sequence, namely for member
have
wherein
represent member
in i-th
4individual project, wherein i
3∈ { 0,1}, i
4∈ 1,2 ..., k-1}, if member is m
1and m
2in before k-2 project all identical, member m
1kth-2 projects be less than member m
2kth-2 projects, i.e. (m
1[1]=m
2[1]) & & (m
1[2]=m
2[2]) & & ... & & (m
1[k-2]=m
2[k-2]) & & (m
1[k-1] <m
2[k-1]), then judge m
1and m
2be attachable, connect m
1and m
2the result produced is { m
1[1], m
1[2] ..., m
1[k-1], m
2[k-1] }.
5. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 4, it is characterized in that, in step 5, described cut operator process is as follows: the list head node traveling through orthogonal linked list corresponding to frequent k-1 item collection set, to candidate k item collection set C
kmember
if its all subsets comprising k-1 element are all in list head node, then by member
be retained in candidate's frequent item set set C
kin, otherwise by it from C
kmiddle deletion.
6. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 5, it is characterized in that, in step 6, the step calculating described support comprises:
Candidate's frequent k item collection set C
kmiddle member
Then affairs set T (c
i)=T (I
1, I
2..., I
k-1, I
k)=T (I
1, I
2..., I
k-1) ∩ T (I
k), namely comprise the set of item collection
affairs set T (c
i) be comprise item collection set { I
1, I
2..., I
k-1affairs set T (I
1, I
2..., I
k-1) and comprise item collection set { I
kaffairs set T (I
k) common factor, travel through the orthogonal linked list middle term collection set { I that the set of frequent k-1 item collection is corresponding
1, I
2..., I
k-1corresponding row, obtain affairs set T (I
1, I
2..., I
k-1), travel through the orthogonal linked list middle term collection set { I that frequent 1 collection set is corresponding
kcorresponding row, obtain affairs set T (I
k), then by the set of following formulae discovery k item collection
support sup ({ I
1, I
2..., I
k):
sup({I
1,I
2,...,I
k})=N(T(I
1,I
2,...,I
k-1)∩T(I
k))/|D|,k∈[1,n],
Wherein, N (T (I
1, I
2..., I
k-1) ∩ T (I
k)) represent affairs set T (I
1, I
2..., I
k-1) and affairs set T (I
k) the number of transactions that comprises of common factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510666724.4A CN105320756B (en) | 2015-10-15 | 2015-10-15 | A kind of database association rule digging method based on improvement Apriori algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510666724.4A CN105320756B (en) | 2015-10-15 | 2015-10-15 | A kind of database association rule digging method based on improvement Apriori algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105320756A true CN105320756A (en) | 2016-02-10 |
CN105320756B CN105320756B (en) | 2018-07-10 |
Family
ID=55248142
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510666724.4A Active CN105320756B (en) | 2015-10-15 | 2015-10-15 | A kind of database association rule digging method based on improvement Apriori algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105320756B (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105827603A (en) * | 2016-03-14 | 2016-08-03 | 中国人民解放军信息工程大学 | Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device |
CN106294617A (en) * | 2016-07-29 | 2017-01-04 | 浪潮软件集团有限公司 | Method for efficiently mining frequent item sets in association rule |
CN106407296A (en) * | 2016-08-30 | 2017-02-15 | 江苏省邮电规划设计院有限责任公司 | Local scan association rule computer data analysis method based on pre-judging screening |
CN106991141A (en) * | 2017-03-21 | 2017-07-28 | 北京邮电大学 | A kind of association rule mining method based on depth pruning strategy |
CN107066587A (en) * | 2017-04-17 | 2017-08-18 | 贵州大学 | A kind of efficient Mining Frequent Itemsets based on group chained list |
CN107220483A (en) * | 2017-05-09 | 2017-09-29 | 西北大学 | A kind of mode prediction method of polynary time series data |
CN107256213A (en) * | 2017-06-28 | 2017-10-17 | 中国石油大学(华东) | A kind of topic relation based on parallel association rule finds method and finds device |
CN107844514A (en) * | 2017-09-22 | 2018-03-27 | 深圳市易成自动驾驶技术有限公司 | Data digging method, device and computer-readable recording medium |
CN107870939A (en) * | 2016-09-27 | 2018-04-03 | 腾讯科技(深圳)有限公司 | A kind of mode excavation method and device |
CN107908776A (en) * | 2017-11-30 | 2018-04-13 | 国网浙江省电力公司经济技术研究院 | Frequent mode Web Mining algorithm and system based on affairs project incidence matrix |
CN107943946A (en) * | 2017-11-24 | 2018-04-20 | 重庆科技学院 | Relevance method for digging between test item bank knowledge point based on Apriori algorithm |
CN108108441A (en) * | 2017-12-21 | 2018-06-01 | 新博卓畅技术(北京)有限公司 | A kind of database table structure analysis method and system |
CN108334548A (en) * | 2017-12-26 | 2018-07-27 | 爱品克科技(武汉)股份有限公司 | A kind of data mining technology based on correlation rule |
CN108664642A (en) * | 2018-05-16 | 2018-10-16 | 句容市茂润苗木有限公司 | Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm |
CN109859852A (en) * | 2019-01-25 | 2019-06-07 | 青海大学 | Improved Apriori algorithm and its application in Tibetan medicine's association mining |
CN110489448A (en) * | 2019-07-24 | 2019-11-22 | 西安理工大学 | The method for digging of big data correlation rule based on Hadoop |
CN110489411A (en) * | 2019-07-11 | 2019-11-22 | 齐鲁工业大学 | A kind of association rule mining method based on virtual value storage and operation mode |
CN110992698A (en) * | 2019-12-26 | 2020-04-10 | 北京工业大学 | Method for calculating association degree between intersections based on Apriori support degree and driving distance in weighting manner |
CN111858662A (en) * | 2020-06-01 | 2020-10-30 | 广东恒睿科技有限公司 | Method, system and storage medium for identifying underlying network potential danger data |
CN112241420A (en) * | 2020-10-26 | 2021-01-19 | 浪潮云信息技术股份公司 | Government affair service item recommendation method based on association rule algorithm |
CN113010597A (en) * | 2021-04-06 | 2021-06-22 | 东北大学 | Parallel association rule mining method for ocean big data |
CN113568942A (en) * | 2021-05-26 | 2021-10-29 | 南京师范大学 | Data set frequent item set mining availability evaluation method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6061682A (en) * | 1997-08-12 | 2000-05-09 | International Business Machine Corporation | Method and apparatus for mining association rules having item constraints |
CN103150515A (en) * | 2012-12-29 | 2013-06-12 | 江苏大学 | Association rule mining method for privacy protection under distributed environment |
CN103593400A (en) * | 2013-12-13 | 2014-02-19 | 陕西省气象局 | Lightning activity data statistics method based on modified Apriori algorithm |
CN104217013A (en) * | 2014-09-22 | 2014-12-17 | 广西教育学院 | Course positive and negative mode excavation method and system based on item weighing and item set association degree |
-
2015
- 2015-10-15 CN CN201510666724.4A patent/CN105320756B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6061682A (en) * | 1997-08-12 | 2000-05-09 | International Business Machine Corporation | Method and apparatus for mining association rules having item constraints |
CN103150515A (en) * | 2012-12-29 | 2013-06-12 | 江苏大学 | Association rule mining method for privacy protection under distributed environment |
CN103593400A (en) * | 2013-12-13 | 2014-02-19 | 陕西省气象局 | Lightning activity data statistics method based on modified Apriori algorithm |
CN104217013A (en) * | 2014-09-22 | 2014-12-17 | 广西教育学院 | Course positive and negative mode excavation method and system based on item weighing and item set association degree |
Non-Patent Citations (3)
Title |
---|
MINAL G.ET AL.: "Association Rule Mining using Improved Apriori Algorithm", 《INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS》 * |
张力 等: "基于筛选压缩的类Apriori算法的研究", 《电脑知识与技术》 * |
韩天鹏 等: "基于候选项集剪枝的Apriori算法的研究", 《阜阳师范学院学报(自然科学版)》 * |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105827603A (en) * | 2016-03-14 | 2016-08-03 | 中国人民解放军信息工程大学 | Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device |
CN106294617A (en) * | 2016-07-29 | 2017-01-04 | 浪潮软件集团有限公司 | Method for efficiently mining frequent item sets in association rule |
CN106407296B (en) * | 2016-08-30 | 2019-06-25 | 中通服咨询设计研究院有限公司 | Partial sweep correlation rule computer data analysis method based on anticipation screening |
CN106407296A (en) * | 2016-08-30 | 2017-02-15 | 江苏省邮电规划设计院有限责任公司 | Local scan association rule computer data analysis method based on pre-judging screening |
CN107870939A (en) * | 2016-09-27 | 2018-04-03 | 腾讯科技(深圳)有限公司 | A kind of mode excavation method and device |
CN106991141A (en) * | 2017-03-21 | 2017-07-28 | 北京邮电大学 | A kind of association rule mining method based on depth pruning strategy |
CN106991141B (en) * | 2017-03-21 | 2020-12-11 | 北京邮电大学 | Association rule mining method based on deep pruning strategy |
CN107066587A (en) * | 2017-04-17 | 2017-08-18 | 贵州大学 | A kind of efficient Mining Frequent Itemsets based on group chained list |
CN107220483A (en) * | 2017-05-09 | 2017-09-29 | 西北大学 | A kind of mode prediction method of polynary time series data |
CN107256213A (en) * | 2017-06-28 | 2017-10-17 | 中国石油大学(华东) | A kind of topic relation based on parallel association rule finds method and finds device |
CN107844514A (en) * | 2017-09-22 | 2018-03-27 | 深圳市易成自动驾驶技术有限公司 | Data digging method, device and computer-readable recording medium |
CN107943946A (en) * | 2017-11-24 | 2018-04-20 | 重庆科技学院 | Relevance method for digging between test item bank knowledge point based on Apriori algorithm |
CN107943946B (en) * | 2017-11-24 | 2019-08-30 | 重庆科技学院 | Relevance method for digging between test item bank knowledge point based on Apriori algorithm |
CN107908776A (en) * | 2017-11-30 | 2018-04-13 | 国网浙江省电力公司经济技术研究院 | Frequent mode Web Mining algorithm and system based on affairs project incidence matrix |
CN108108441A (en) * | 2017-12-21 | 2018-06-01 | 新博卓畅技术(北京)有限公司 | A kind of database table structure analysis method and system |
CN108334548A (en) * | 2017-12-26 | 2018-07-27 | 爱品克科技(武汉)股份有限公司 | A kind of data mining technology based on correlation rule |
CN108664642A (en) * | 2018-05-16 | 2018-10-16 | 句容市茂润苗木有限公司 | Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm |
CN109859852A (en) * | 2019-01-25 | 2019-06-07 | 青海大学 | Improved Apriori algorithm and its application in Tibetan medicine's association mining |
CN110489411A (en) * | 2019-07-11 | 2019-11-22 | 齐鲁工业大学 | A kind of association rule mining method based on virtual value storage and operation mode |
CN110489411B (en) * | 2019-07-11 | 2023-08-22 | 齐鲁工业大学 | Association rule mining method based on effective value storage and operation mode |
CN110489448A (en) * | 2019-07-24 | 2019-11-22 | 西安理工大学 | The method for digging of big data correlation rule based on Hadoop |
CN110992698A (en) * | 2019-12-26 | 2020-04-10 | 北京工业大学 | Method for calculating association degree between intersections based on Apriori support degree and driving distance in weighting manner |
CN111858662A (en) * | 2020-06-01 | 2020-10-30 | 广东恒睿科技有限公司 | Method, system and storage medium for identifying underlying network potential danger data |
CN112241420A (en) * | 2020-10-26 | 2021-01-19 | 浪潮云信息技术股份公司 | Government affair service item recommendation method based on association rule algorithm |
CN113010597A (en) * | 2021-04-06 | 2021-06-22 | 东北大学 | Parallel association rule mining method for ocean big data |
CN113010597B (en) * | 2021-04-06 | 2023-08-01 | 东北大学 | Ocean big data-oriented parallel association rule mining method |
CN113568942A (en) * | 2021-05-26 | 2021-10-29 | 南京师范大学 | Data set frequent item set mining availability evaluation method |
CN113568942B (en) * | 2021-05-26 | 2024-07-16 | 南京师范大学 | Data set frequent item set mining availability evaluation method |
Also Published As
Publication number | Publication date |
---|---|
CN105320756B (en) | 2018-07-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105320756A (en) | Improved Apriori algorithm based method for mining database association rule | |
US7370033B1 (en) | Method for extracting association rules from transactions in a database | |
US7743058B2 (en) | Co-clustering objects of heterogeneous types | |
Lan et al. | An efficient approach for finding weighted sequential patterns from sequence databases | |
Lee et al. | An efficient algorithm for mining frequent inter-transaction patterns | |
CN105260387B (en) | A kind of Association Rule Analysis method towards magnanimity transaction database | |
Lin et al. | A frequent itemset mining algorithm based on the Principle of Inclusion–Exclusion and transaction mapping | |
Ma et al. | Score look-alike audiences | |
CN105912721B (en) | RDF data distributed semantic parallel inference method | |
Xenopoulos et al. | GALE: Globally assessing local explanations | |
Oguz et al. | Incremental itemset mining based on matrix apriori algorithm | |
Lin et al. | Mining of high average-utility patterns with item-level thresholds | |
Prasad | Optimized high-utility itemsets mining for effective association mining paper | |
Chawla et al. | JOTR: Join-optimistic triple reordering approach for SPARQL query optimization on big RDF data | |
Ashrafi Payaman et al. | Graph hybrid summarization | |
Gaikwad et al. | Evaluation of Apriori Algorithm on Retail Market Transactional Database to get Frequent Itemsets. | |
Nawaz et al. | Spore: shortest path overlapped regions and confined traversals towards graph clustering | |
Boutsinas et al. | Distributed mining of association rules based on reducing the support threshold | |
Zhang et al. | A twig-based algorithm for top-k subgraph matching in large-scale graph data | |
CN106407296B (en) | Partial sweep correlation rule computer data analysis method based on anticipation screening | |
Lin et al. | Mining high-utility sequential patterns in uncertain databases | |
Mahajan et al. | Performance analysis of sequential pattern mining algorithms on large dense datasets | |
Tseng et al. | A minimal perfect hashing scheme to mining association rules from frequently updated data | |
Huang et al. | NSPIS: Mining negative sequential patterns with individual support | |
Lan et al. | Fast discovery of high fuzzy utility itemsets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
CB02 | Change of applicant information |
Address after: 210000 No. 58 East Street, Nanxi River, Jianye District, Nanjing, Jiangsu Applicant after: Zhong Tong clothing consulting and Design Research Institute Co., Ltd. Address before: 210000 No. 58 East Street, Nanxi River, Jianye District, Nanjing, Jiangsu Applicant before: Jiangsu Posts & Telecommunications Planning and Designing Institute Co., Ltd. |
|
GR01 | Patent grant | ||
GR01 | Patent grant |