Nothing Special   »   [go: up one dir, main page]

CN105320756A - Improved Apriori algorithm based method for mining database association rule - Google Patents

Improved Apriori algorithm based method for mining database association rule Download PDF

Info

Publication number
CN105320756A
CN105320756A CN201510666724.4A CN201510666724A CN105320756A CN 105320756 A CN105320756 A CN 105320756A CN 201510666724 A CN201510666724 A CN 201510666724A CN 105320756 A CN105320756 A CN 105320756A
Authority
CN
China
Prior art keywords
frequent
item
node
affairs
linked list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510666724.4A
Other languages
Chinese (zh)
Other versions
CN105320756B (en
Inventor
赵学健
袁源
孙知信
乔爱锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Posts and Telecommunications Planning and Designing Institute Co Ltd
Original Assignee
Jiangsu Posts and Telecommunications Planning and Designing Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Posts and Telecommunications Planning and Designing Institute Co Ltd filed Critical Jiangsu Posts and Telecommunications Planning and Designing Institute Co Ltd
Priority to CN201510666724.4A priority Critical patent/CN105320756B/en
Publication of CN105320756A publication Critical patent/CN105320756A/en
Application granted granted Critical
Publication of CN105320756B publication Critical patent/CN105320756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention proposes an improved Apriori algorithm based method for mining a database association rule. According to the method, a transaction database is converted into a relational matrix, the converted relational matrix is a sparse matrix, and the relational matrix is stored with an orthogonal link list. A generation process of a frequent item set is converted into an operation process of a single link list node set corresponding to items in the corresponding relational matrix. According to the method, a database only needs to be scanned once, so that the shortcomings that Apriori and a related algorithm therefor generate a large amount of candidate sets and need to scan the database for multiple times are overcome, and the time of frequently performing I/O operations is shortened; then, when a frequent 2-item set is generated and found, only an intersection operation of a node set needs to be performed, so that less time is consumed; and a single link list constructed by a generated frequent k-item set is recorded, so that a generation process of a frequent K+1-item set is simplified, and a complex pruning process of the Apriori algorithm is avoided.

Description

A kind of database association rule digging method based on improving Apriori algorithm
Technical field
The invention discloses a kind of database association rule digging method based on improving Apriori algorithm, emphasis relates to and is representing on the basis of transaction database with orthogonal linked list storage matrix, transformation and optimization is carried out to the Frequent Item Sets generative process of Apriori algorithm, belongs to computer data and excavate and technical field of information processing.
Background technology
Develop today like a raging fire at large data technique, people recognize that namely data are wealth gradually, especially have more huge practical value to the analysis of business data.Association Rule Analysis, as one of the Main Means of data mining, is an important component part indispensable in data mining technology, is mainly used in finding valuable interesting contact implicit in large-scale transaction database and rule.Therefore, the research of association rule algorithm is had very important significance.
As far back as 1993, the people such as the computer scientist R.Agrawal of IBM found the purchase rule of client when buying commodity in customer transaction DB, propose the correlative model between affairs, namely initial correlation rule.Normally a kind of uncomplicated but rule that practicality is very high of correlation rule.By Association Rule Analysis, we can by the relation excavation between transaction itemset and item collection out.The most typical application of Association Rule Analysis is market basket data analysis, such as classical { beer } → { diaper } rule.Except being applied to except market basket data, the application of Association Rule Analysis in other field is also very extensive, as personalized recommendations in E-business, and financial service, advertisement plan, bioinformatics and science data analysis etc.Such as in personalized recommendations in E-business, correlation rule can help e-commerce website to carry out the interested commercial product recommending of some their possibilities to the client with similar consumer behavior, contribute to e-commerce website like this and promote Consumer's Experience, net income increase etc.
Association Rule Analysis algorithm is more, and wherein the most classical practicality is it is preferred that Apriori algorithm and innovatory algorithm thereof.Apriori algorithm [1] is first association rule algorithm proposed in 1994 by Agrawal and Swami, is widely used, and this algorithm performs connection by repetitive cycling, beta pruning generates Frequent Item Sets, thus the rule that is associated.Based on Apriori algorithm, the people such as Yang propose Apriori-TFP algorithm [2], and raw data, in association rule mining process, is carried out pre-service and be stored in local supporting, in tree, finally to generate correlation rule by this algorithm.This algorithm, by effective pre-service, reduces the time of association rule mining, but needs the number of times of scan database still more.The people such as Zhang propose GP-Apriori algorithm [3], GP-Apriori algorithm adopts graphic process unit (GraphicalProcessingUnit, GPU) carry out the support counting of parallelization, and row of vertically concluding the business are stored as linear oldered array.GPU by this oldered array of traversal, and performs step-by-step and intersects and realize support and calculate, and result is copied back internal memory.Compared with the Apriori algorithm that traditional C PU runs, GP-Apriori algorithm improves operating rate owing to have employed advanced GPU, but complicacy increases on the contrary to some extent.The people such as Delighta it is also proposed innovatory algorithm (AprioriMendAlgorithm) [4] of Apriori.This algorithm uses hash function to generate Item Sets, and user must specify minimum support to delete unwanted item collection.This algorithm has efficiency more better than traditional Apriori algorithm, but the execution time increases to some extent.Ning etc. achieve the parallelization [5] of happy Apriori algorithm based on MapReduce framework.This algorithm is with good expansibility and efficiency when processing massive data sets, but this counts the powerful calculating of needs and storage capacity supports, and is normally operated in cluster environment.The people such as Sulianta attempt in the document [6] Apriori algorithm to be applied to multidimensional data analysis, have inquired into the more concrete effective method of rule that to be associated in multidimensional data.The people such as Sheila improve Apriori algorithm in document [7], introduce the concept of affairs size and affairs scale to eliminate the impact of non-critical item object.The people such as Feng propose a kind of Apriori algorithm based on matrix in document [8], and this algorithm effectively represents the various operations of database by matrix, and obtain maximum Frequent Item Sets with the AND operation based on matrix.The people such as Hu are application relational theory thought in document [9], introduce project reparable subspace and AND operation thereof, devise a kind of fast algorithm for mining-SLIG (Single-levelLargeItemsetsGeneration) algorithm, the production process of Frequent Item Sets is converted into vector operation process in the relational matrix of Item Sets.This algorithm overcomes Apriori and related algorithm produces a large amount of Candidate Set and needs the shortcoming of Multiple-Scan database, but the storage space needed is larger.
Reference citation
[1] R.Agrawal, R.Srikantetal..Fastalgorithmsforminingassociationrules (fast algorithm of Mining Association Rules), Proc.20thInt.Conf.VeryLargeDataBases, VLDB, vol.1215, pp.487-499, September1994.
[2] Z.Yang, W.Tang, A.Shintemirov, andQ.Wu.Associationrulemining-baseddissolvedgasanalysisf orfaultdiagnosisofpowertransformers (the diagnosing fault of power transformer dissolved gas analysis based on association rule mining), Systems, Man, andCybernetics, PartC:ApplicationsandReviews, IEEETransactionson, vol.39, no.6, pp.597-610,2009.
[3] F.Zhang, Y.Zhang, andJ.D.Bakos.Gpapriori:Gpu-acceleratedfrequentitemsetmin ing (frequent item set mining that graphic based processor accelerates), inCLUSTER.IEEE, 2011, pp.590-594.
[4] I.S.P.J.D.MagdaleneDelightaAngeline.Associationrulegener ationusingApriorimendalgorithmforstudent'splacement (Association Rules Generating Algorithm based on improving Apriori algorithm), vol.2, no.1,2012, pp.78-86.
[5] N.Li, L.Zeng, Q.He, andZ.Shi.Parallelimplementationofapriorialgorithmbasedon MapReduce (Parallel Implementation based on the Apriori algorithm of MapReduce), inSoftwareEngineering, ArtificialIntelligence, NetworkingandParallelDistributedComputing (SNPD), 201213thACISInternationalConferenceon, 2012, pp.236-241.
[6] F.Sulianta, T.H.Liong, andI.Atastina.Miningfoodindustry'smultidimensionaldatato produceassociationrulesusingApriorialgorithmasabasisofbu sinessstrategy (the corporate strategy association rules mining algorithm towards food industry multidimensional data based on Apriori algorithm), inInformationandCommunicationTechnology (ICoICT), 2013InternationalConferenceof, 2013, pp.176-181.
[7] S.A.Abaya.AssociationruleminingbasedonApriorialgorithmin minimizingcandidategeneration (the minimum generation candidate association rule mining algorithms based on Apriori algorithm), InternationalJournalofScientificandEngineeringResearch, vol.3, no.7, pp.1-4, July2012.
[8] WangFeng, LiYong-hua.AnImprovedAprioriAlgorithmBasedontheMatrix (a kind of improvement Apriori algorithm based on matrix), fbie, pp.152-155,2008InternationalSeminaronFutureBioMedicalInformationEng ineering, 2008.
[9] Hu Huirong, Wang Zhoujing. a kind of Fast algorithm for mining association rules based on relational matrix, computer utility, 2005,25 (7): 1577-1579.
Summary of the invention
The present invention proposes a kind of database association rule digging method based on improving Apriori algorithm, comprising improvement Apriori algorithm-OLA (OrthogonalListApriori) algorithm based on orthogonal linked list affairs storage matrix.
The present invention includes following steps:
Step 1, scanning transaction database D, obtains relational matrix M a;
Step 2, uses relational matrix M described in orthogonal linked list storing step 1 in computer-internal a, this orthogonal linked list comprises the node of three types, is respectively M node, H node and E node, and M node is the gauge outfit node of orthogonal linked list; H node is row/column gauge outfit node, is the gauge outfit node of row chained list or row chained list in orthogonal linked list; E node is the node that in relational matrix, nonzero element is corresponding; Three kinds of nodes all comprise four territories, Tag territory, Element territory, Right territory and Down territory.Tag territory is mark domain, in order to distinguish three kinds of dissimilar nodes.Element territory is element fields, and concerning orthogonal linked list gauge outfit node, what two tuples in element fields stored is line number and the columns of corresponding sparse relational matrix, i.e. the number of transactions that comprises of transaction database D and item number; Concerning row gauge outfit node, the nonzero element number comprised in which row of two element group representations in element fields and this row; Concerning list head node, the nonzero element number in two element group representation projects in element fields or Item Sets and this row; Concerning nonzero element node, two element group representation projects in element fields or Item Sets and comprise the affairs numbering of this project or Item Sets.Right territory is pointer field, and concerning orthogonal linked list gauge outfit node, it points to first list head node; Concerning row gauge outfit node, it points to this row first nonzero element node; Concerning list head node, it points to next list head node; Concerning nonzero element node, it points to the next nonzero element node of this row.Down territory is also pointer field, and concerning orthogonal linked list gauge outfit node, it points to first row gauge outfit node; Concerning row gauge outfit node, it points to next list head node; Concerning list head node, it points to these row first nonzero element node; Concerning nonzero element node, it points to the next nonzero element node of these row;
Step 3, calculates frequent 1 collection set L according to the orthogonal linked list of step 2 1and frequent 1 collection set L 1corresponding orthogonal linked list;
Step 4, by frequent k-1 item collection set L k-1be connected with self and produce candidate's frequent k item collection set C k, k be interval [2, ∞) in natural number;
Step 5, utilizes Apriori character (all nonvoid subsets of frequent item set also must be frequently, if the nonvoid subset of certain candidate is not frequently, so this candidate certainly not frequently) to candidate's frequent item set set C kcarry out beta pruning;
Step 6, travels through the orthogonal linked list of frequent k-1 item collection set and frequent 1 collection set correspondence, obtains comprising item collection affairs set and calculated candidate frequent k item collection set C kmiddle member support, wherein i 1for interval [1, N k] in natural number, N krepresent candidate's frequent k item collection set C kthe number of members comprised;
Step 7, by frequent for candidate k item collection set C kmiddle member support and minimum support min_support compare, delete support and be less than the member of minimum support min_support, obtain frequent k item collection set L k, and according to gained affairs set in step 6 construct frequent k item collection set L kcorresponding orthogonal linked list;
Step 8, repeated execution of steps 4 ~ step 7, until can not find larger Frequent Item Sets;
Step 9, is F according to the frequent item set set that OLA algorithm finally obtains, then can produces correlation rule:
R={A->B}, A are any member in frequent item set set F nonvoid subset, B is the supplementary set of A, namely ∈ F, i 2for interval [1, N f] in natural number, N frepresent the number of members that frequent item set set F comprises.
In step 1, described relational matrix M aas follows:
And have:
Wherein, I={I 1, I 2..., I | I|the project set that transaction database D comprises, | the item number that I| comprises for database D, | D| is the number of transactions that transaction database comprises, d ijthe element of representing matrix, i is the natural number in interval [1, | D|], and j be the natural number in interval [1, | I|].
In step 3, described computation process is as follows: the list head node of the orthogonal linked list that traversal step 2 obtains, obtains project set I={I 1, I 2..., I | I|in the number of times that occurs in all affairs of projects member, be respectively N 1, N 2..., N | I|, comprise project set I jaffairs set be T ({ I j), obtain projects support sup (I according to following formulae discovery j):
sup(I j)=N j/|D|,j∈[1,|I|],
Projects support and set minimum support min_support are compared, and deletes the program member that support is less than minimum support, obtain frequent 1 collection set L 1, based on frequent 1 collection set L 1in affairs set T ({ I corresponding to each element j), obtain frequent 1 collection set L 1corresponding orthogonal linked list.
In step 4, with reference to citing document 1 in background technology, connection procedure is as follows: establish m 1and m 2frequent k-1 item collection set L k-1any two members, the project in member by dictionary order sequence, namely for member have wherein represent member in i-th 4individual project, wherein i 3∈ { 0,1}, i 4∈ 1,2 ..., k-1}, if member is m 1and m 2in before k-2 project all identical, member m 1kth-2 projects be less than member m 2kth-2 projects, i.e. (m 1[1]=m 2[1]) & & (m 1[2]=m 2[2]) & & ... & & (m 1[k-2]=m 2[k-2]) & & (m 1[k-1] <m 2[k-1]), then judge m 1and m 2be attachable, connect m 1and m 2the result produced is { m 1[1], m 1[2] ..., m 1[k-1], m 2[k-1] }.
In step 5, described cut operator process is as follows: the list head node traveling through orthogonal linked list corresponding to frequent k-1 item collection set, to candidate k item collection set C kmember if its all subsets comprising k-1 element are all in list head node, then by member be retained in candidate's frequent item set set C kin, otherwise by it from C kmiddle deletion.
In step 6, the step calculating described support comprises:
Candidate's frequent k item collection set C kmiddle member then affairs set T (c i)=T (I 1, I 2..., I k-1, I k)=T (I 1, I 2..., I k-1) ∩ T (I k), namely comprise the set of item collection affairs set T (c i) be comprise item collection set { I 1, I 2..., I k-1affairs set T (I 1, I 2..., I k-1) and comprise item collection set { I kaffairs set T (I k) common factor, travel through the orthogonal linked list middle term collection { I that the set of frequent k-1 item collection is corresponding 1, I 2..., I k-1corresponding row, affairs set T (I can be obtained 1, I 2..., I k-1), travel through the orthogonal linked list middle term collection set { I that frequent 1 collection set is corresponding kcorresponding row, obtain affairs set T (I k), then by the set of following formulae discovery k item collection support:
sup({I 1,I 2,...,I k})=N(T(I 1,I 2,...,I k-1)∩T(I k))/|D|,k∈[1,n],
Wherein, N (T (I 1, I 2..., I k-1) ∩ T (I k)) represent affairs set T (I 1, I 2..., I k-1) and affairs set T (I k) the number of transactions that comprises of common factor.
In the present invention, the member of item set is as item collection, and the member that item integrates is as project.
Beneficial effect: transaction database is converted into relational matrix by the present invention, owing to only comprising a small amount of project in usual each affairs, therefore the relational matrix after transforming is sparse matrix, and in order to reduce the space complexity of algorithm, this algorithm uses orthogonal linked list to store relational matrix.The production process of Frequent Item Sets is converted into the calculating process of the node set of single linked list corresponding to project in corresponding relation matrix.This algorithm only needs scan database one time, overcomes the shortcoming that Apriori and related algorithm thereof produce a large amount of Candidate Set and need Multiple-Scan database, decreases the time of frequently carrying out I/O operation; Secondly, generate and only need to carry out node set when finding frequent 2-item collection ship calculation, expend time in less, and record carried out to the frequent k-item collection structure single linked list generated, simplify the generative process of frequent k+1-item collection, avoid the beta pruning process of Apriori algorithm complexity.Finally, algorithm adopts orthogonal linked list storage organization, greatly reduces the demand to storage space.
Accompanying drawing illustrates:
Fig. 1 is transaction database D of the present invention.
Fig. 2 is relational matrix of the present invention.
Fig. 3 is relational matrix orthogonal linked list of the present invention.
Fig. 4 is each member's support of candidate of the present invention 2 collection.
Fig. 5 is frequent 2 the collection orthogonal linked lists of the present invention.
Fig. 6 is frequent 3 the collection orthogonal linked lists of the present invention.
Fig. 7 is correlation rule of the present invention.
Specific embodiments:
The present invention proposes a kind of database association rule digging method based on improving Apriori algorithm, comprising the following steps:
Step 1, scanning transaction database D, obtains relational matrix M a;
Step 2, uses relational matrix M described in orthogonal linked list storing step 1 in computer-internal a, this orthogonal linked list comprises the node of three types, is respectively M node, H node and E node, and M node is the gauge outfit node of orthogonal linked list; H node is row/column gauge outfit node, is the gauge outfit node of row chained list or row chained list in orthogonal linked list; E node is the node that in relational matrix, nonzero element is corresponding; Three kinds of nodes all comprise four territories, Tag territory, Element territory, Right territory and Down territory.Tag territory is mark domain, in order to distinguish three kinds of dissimilar nodes.Element territory is element fields, and concerning orthogonal linked list gauge outfit node, what two tuples in element fields stored is line number and the columns of corresponding sparse relational matrix, i.e. the number of transactions that comprises of transaction database D and item number; Concerning row gauge outfit node, the nonzero element number comprised in which row of two element group representations in element fields and this row; Concerning list head node, the nonzero element number in two element group representation projects in element fields or Item Sets and this row; Concerning nonzero element node, two element group representation projects in element fields or Item Sets and comprise the affairs numbering of this project or Item Sets.Right territory is pointer field, and concerning orthogonal linked list gauge outfit node, it points to first list head node; Concerning row gauge outfit node, it points to this row first nonzero element node; Concerning list head node, it points to next list head node; Concerning nonzero element node, it points to the next nonzero element node of this row.Down territory is also pointer field, and concerning orthogonal linked list gauge outfit node, it points to first row gauge outfit node; Concerning row gauge outfit node, it points to next list head node; Concerning list head node, it points to these row first nonzero element node; Concerning nonzero element node, it points to the next nonzero element node of these row;
Step 3, calculates frequent 1 collection set L according to the orthogonal linked list of step 2 1and frequent 1 collection set L 1corresponding orthogonal linked list;
Step 4, by frequent k-1 item collection set L k-1be connected with self and produce candidate's frequent k item collection set C k, k be interval [2, ∞) in natural number;
Step 5, utilizes Apriori character (all nonvoid subsets of frequent item set also must be frequently, if the nonvoid subset of certain candidate is not frequently, so this candidate certainly not frequently) to candidate's frequent item set set C kcarry out beta pruning;
Step 6, travels through the orthogonal linked list of frequent k-1 item collection set and frequent 1 collection set correspondence, obtains comprising item collection affairs set and calculated candidate frequent k item collection set C kmiddle member support, wherein i 1for interval [1, N k] in natural number, N krepresent candidate's frequent k item collection set C kthe number of members comprised;
Step 7, by frequent for candidate k item collection set C kmiddle member support and minimum support min_support compare, delete support and be less than the member of minimum support min_support, obtain frequent k item collection set L k, and according to gained affairs set in step 6 construct frequent k item collection set L kcorresponding orthogonal linked list;
Step 8, repeated execution of steps 4 ~ step 7, until can not find larger Frequent Item Sets;
Step 9, is F according to the frequent item set set that OLA algorithm finally obtains, then can produces correlation rule:
R={A->B}, A are any member in frequent item set set F nonvoid subset, B is the supplementary set of A, namely ∈ F, i 2for interval [1, N f] in natural number, N frepresent the number of members that frequent item set set F comprises.
In step 1, described relational matrix M aas follows:
And have:
Wherein, I={I 1, I 2..., I | I|the project set that transaction database D comprises, | the item number that I| comprises for database D, | D| is the number of transactions that transaction database comprises, d ijthe element of representing matrix, i is the natural number in interval [1, | D|], and j be the natural number in interval [1, | I|].
In step 3, described computation process is as follows: the list head node of the orthogonal linked list that traversal step 2 obtains, obtains project set I={I 1, I 2..., I | I|in the number of times that occurs in all affairs of projects member, be respectively N 1, N 2..., N | I|, comprise project set I jaffairs set be T ({ I j), obtain projects support sup (I according to following formulae discovery j):
sup(I j)=N j/|D|,j∈[1,|I|],
Projects support and set minimum support min_support are compared, obtains frequent 1 collection set L 1, based on frequent 1 collection set L 1in affairs set T ({ I corresponding to each element j), obtain frequent 1 collection set L 1corresponding orthogonal linked list.
In step 4, with reference to citing document 1 in background technology, connection procedure is as follows: establish m 1and m 2frequent k-1 item collection set L k-1any two members, the project in member by dictionary order sequence, namely for member have wherein represent member in i-th 4individual project, wherein i 3∈ { 0,1}, i 4∈ 1,2 ..., k-1}, if member is m 1and m 2in before k-2 project all identical, member m 1kth-2 projects be less than member m 2kth-2 projects, i.e. (m 1[1]=m 2[1]) & & (m 1[2]=m 2[2]) & & ... & & (m 1[k-2]=m 2[k-2]) & & (m 1[k-1] <m 2[k-1]), then judge m 1and m 2be attachable, connect m 1and m 2the result produced is { m 1[1], m 1[2] ..., m 1[k-1], m 2[k-1] }.
In step 5, described cut operator process is as follows: the list head node traveling through orthogonal linked list corresponding to frequent k-1 item collection set, to candidate k item collection set C kmember if its all subsets comprising k-1 element are all in list head node, then by member be retained in candidate's frequent item set set C kin, otherwise by it from C kmiddle deletion.
In step 6, the step calculating described support comprises:
Candidate's frequent k item collection set C kmiddle member then affairs set T (c i)=T (I 1, I 2..., I k-1, I k)=T (I 1, I 2..., I k-1) ∩ T (I k), namely comprise the set of item collection affairs set T (c i) be comprise item collection set { I 1, I 2..., I k-1affairs set T (I 1, I 2..., I k-1) and comprise item collection set { I kaffairs set T (I k) common factor, travel through the orthogonal linked list middle term collection { I that the set of frequent k-1 item collection is corresponding 1, I 2..., I k-1corresponding row, affairs set T (I can be obtained 1, I 2..., I k-1), travel through the orthogonal linked list middle term collection set { I that frequent 1 collection set is corresponding kcorresponding row, obtain affairs set T (I k), then by the set of following formulae discovery k item collection support:
sup({I 1,I 2,...,I k})=N(T(I 1,I 2,...,I k-1)∩T(I k))/|D|,k∈[1,n],
Wherein, N (T (I 1, I 2..., I k-1) ∩ T (I k)) represent affairs set T (I 1, I 2..., I k-1) and affairs set T (I k) the number of transactions that comprises of common factor.
Embodiment 1
Be described by the step of simple transaction database D to OLA algorithm of as shown in Figure 1, and its performance simply analyzed, the minimum support min_support=30% of setting.
1) according to OLA algorithm, first the transaction database D shown in Fig. 1 is scanned, in transaction database D, comprise 10 affairs T altogether 1-T 10, 6 project I 1-I 6.Scanning transaction database D will obtain relational matrix A as shown in Figure 2, the affairs T of the i-th row correspondence database D of relational matrix A i, i ∈ [1,10], the project Ij in jth row correspondence database D, j ∈ [1,6], the nonzero element a in relational matrix ijexpression project I jbe included in affairs T iin.Represented by relational matrix A orthogonal linked list, as shown in Figure 3, in Fig. 3, the node of M type is the gauge outfit node of orthogonal linked list, and the node of H type represents the row/column gauge outfit node of orthogonal linked list, and E type node is the node that in relational matrix, nonzero element is corresponding.
2) following, each row of traversal orthogonal linked list, can obtain Item Sets I={A, the number of times that in B, C, D, E, F}, projects occur in all affairs is respectively 5, and 4,5,3,5,6.According to formula sup (I j)=N j/ | D|, j ∈ [1,6] calculates, and can obtain projects support and be respectively 0.5,0.4,0.5,0.3,0.5,0.6, all be greater than default minimum support min_support=0.3, then frequent 1 collection set L 1={ { A}, { B}, { C}, { D}, { E}, { F}}.Because all items all belongs to the member of frequent 1 collection, the orthogonal linked list that therefore frequent 1 set pair is answered is orthogonal linked list corresponding to relational matrix A.
4) by frequent 1 collection set L 1carry out producing frequent 2 the collection set C of candidate from connecting 2, as shown in Figure 4, C 2={ { AB}, { AC}, { AD}, { AE}, { AF}, { BC}, { BD}, { BE}, { BF}, { CD}, { CE}, { CF}, { DE}, { DF}, { EF}}.Because all project Ij, j ∈ [1,6] is the member of frequent 1 collection, therefore frequent 2 collection are gathered without the need to carrying out beta pruning according to Apriori character.
5) to frequent 2 the collection set C of candidate 2in all members, calculate its support.Such as candidate's frequent 2 collection AB}, travel through the row that in frequent 1 corresponding orthogonal linked list of collection, project A, B are corresponding respectively, the affairs set that can comprise project A is T (A)={ 1,5,6,8,10}, comprise the affairs set T (B)={ 2 of project B, 4,6,7}, then comprise Item Sets { affairs set T (AB)=T (A) ∩ T (B) of AB}={ 6}, then Item Sets { the support sup of AB} ({ AB})=0.1.In like manner can frequent 2 the collection set C of calculated candidate 2in other all member support as shown in Figure 4.
6) by frequent for candidate 2 collection set C 2in the support of all members and minimum support min_support=0.3 compare, delete the member that support is less than minimum support min_support, obtain frequent 2 collection set L 2={ { AE}, { AF}, { BC}, { EF}} constructs orthogonal linked list corresponding to frequent 2 collection set as shown in Figure 5.
7) by frequent 2 collection set L 2carry out producing frequent 3 the collection set C of candidate from connecting 3, C 3={ { AEF}}.{ row that AE} is corresponding, can comprise Item Sets { affairs set T (AE)={ 1,6 of AE} to travel through frequent 2 corresponding orthogonal linked list middle term collection of collection set, 10}, { row that F} is corresponding, can comprise Item Sets { the affairs set T (F)={ 1 of F} to travel through orthogonal linked list middle term collection corresponding to frequent 1 collection set, 4,5,6,8,10}, then T (AEF)=T (AE) ∩ T (F)={ 1,6,10}.{ support of AEF} is 0.3, is more than or equal to minimum support to can be calculated Item Sets according to OLA algorithm.Therefore, frequent 3 collection set L 3=AEF}}, and construct frequent 3 corresponding orthogonal linked lists of collection set as shown in Figure 6.Due to frequent 3 collection set L 3in only have a member, therefore it is very big Frequent Item Sets, generates Frequent Item Sets process and terminates.
8) according to OLA algorithm generation correlation rule as shown in Figure 7.
Performance: in this example, run OLA algorithm identical with the frequent item set that Apriori algorithm obtains, Apriori algorithm needs scanning transaction database 21 times, and OLA algorithm only needs scan database 1 time, greatly reduce the working time of algorithm, improve efficiency.
The invention provides a kind of database association rule digging method based on improving Apriori algorithm; the method and access of this technical scheme of specific implementation is a lot; the above is only the preferred embodiment of the present invention; should be understood that; for those skilled in the art; under the premise without departing from the principles of the invention, can also make some improvements and modifications, these improvements and modifications also should be considered as protection scope of the present invention.The all available prior art of each ingredient not clear and definite in the present embodiment is realized.

Claims (6)

1., based on the database association rule digging method improving Apriori algorithm, it is characterized in that: comprise the following steps:
Step 1, scanning transaction database D, obtains relational matrix M a;
Step 2, uses relational matrix M described in orthogonal linked list storing step 1 in computer-internal a, this orthogonal linked list comprises the node of three types, is respectively M node, H node and E node; M node is the gauge outfit node of orthogonal linked list; H node is row or column gauge outfit node, is the gauge outfit node of row chained list or row chained list in orthogonal linked list; E node is the node that in relational matrix, nonzero element is corresponding;
Step 3, calculates frequent 1 collection set L according to the orthogonal linked list of step 2 1and frequent 1 collection set L 1corresponding orthogonal linked list;
Step 4, by frequent k-1 item collection set L k-1be connected with self and produce candidate's frequent k item collection set C k, k be interval [2, ∞) in natural number;
Step 5, utilizes Apriori character to candidate's frequent k item collection set C kcarry out beta pruning;
Step 6, travels through the orthogonal linked list of frequent k-1 item collection set and frequent 1 collection set correspondence, obtains comprising item collection affairs set and calculated candidate frequent k item collection set C kmiddle member support, wherein i 1for interval [1, N k] in natural number, N krepresent candidate's frequent k item collection set C kthe number of members comprised;
Step 7, by frequent for candidate k item collection set C kmiddle member support and minimum support min_support compare, delete support and be less than the member of minimum support min_support, obtain frequent k item collection set L k, and according to gained affairs set in step 6 construct frequent k item collection set L kcorresponding orthogonal linked list;
Step 8, repeated execution of steps 4 ~ step 7, until can not find larger Frequent Item Sets;
Step 9, is F according to the frequent item set set that OLA algorithm finally obtains, then produces correlation rule:
R={A->B}, A are any member in frequent item set set F nonvoid subset, B is the supplementary set of A, namely ∈ F, i 2for interval [1, N f] in natural number, N frepresent the number of members that frequent item set set F comprises.
2. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 1, it is characterized in that: in step 1, described relational matrix M aas follows:
And have:
Wherein, I={I 1, I 2..., I | I|the project set that transaction database D comprises, | the item number that I| comprises for database D, | D| is the number of transactions that transaction database comprises, d ijthe element of representing matrix, i is the natural number in interval [1, | D|], and j be the natural number in interval [1, | I|].
3. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 2, it is characterized in that, in step 3, described computation process is as follows:
The list head node of the orthogonal linked list that traversal step 2 obtains, obtains project set I={I 1, I 2..., I | I|in the number of times that occurs in all affairs of projects member, be respectively N 1, N 2..., N | I|, comprise project set I jaffairs set be T ({ I j), obtain projects support sup (I according to following formulae discovery j):
sup(I j)=N j/|D|,j∈[1,|I|],
Projects support and set minimum support min_support are compared, and deletes the program member that support is less than minimum support, obtain frequent 1 collection set L 1, based on frequent 1 collection set L 1in affairs set T ({ I corresponding to each element j), obtain frequent 1 collection set L 1corresponding orthogonal linked list.
4. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 3, it is characterized in that, in step 4, connection procedure is as follows: establish m 1and m 2frequent k-1 item collection set L k-1any two members, the project in member by dictionary order sequence, namely for member have wherein represent member in i-th 4individual project, wherein i 3∈ { 0,1}, i 4∈ 1,2 ..., k-1}, if member is m 1and m 2in before k-2 project all identical, member m 1kth-2 projects be less than member m 2kth-2 projects, i.e. (m 1[1]=m 2[1]) & & (m 1[2]=m 2[2]) & & ... & & (m 1[k-2]=m 2[k-2]) & & (m 1[k-1] <m 2[k-1]), then judge m 1and m 2be attachable, connect m 1and m 2the result produced is { m 1[1], m 1[2] ..., m 1[k-1], m 2[k-1] }.
5. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 4, it is characterized in that, in step 5, described cut operator process is as follows: the list head node traveling through orthogonal linked list corresponding to frequent k-1 item collection set, to candidate k item collection set C kmember if its all subsets comprising k-1 element are all in list head node, then by member be retained in candidate's frequent item set set C kin, otherwise by it from C kmiddle deletion.
6. a kind of based on improving the database association rule digging method of Apriori algorithm as claimed in claim 5, it is characterized in that, in step 6, the step calculating described support comprises:
Candidate's frequent k item collection set C kmiddle member c i 1 = { I 1 , I 2 , ... , I k } = { I 1 , I 2 , ... , I k - 1 } &cup; { I k } , Then affairs set T (c i)=T (I 1, I 2..., I k-1, I k)=T (I 1, I 2..., I k-1) ∩ T (I k), namely comprise the set of item collection affairs set T (c i) be comprise item collection set { I 1, I 2..., I k-1affairs set T (I 1, I 2..., I k-1) and comprise item collection set { I kaffairs set T (I k) common factor, travel through the orthogonal linked list middle term collection set { I that the set of frequent k-1 item collection is corresponding 1, I 2..., I k-1corresponding row, obtain affairs set T (I 1, I 2..., I k-1), travel through the orthogonal linked list middle term collection set { I that frequent 1 collection set is corresponding kcorresponding row, obtain affairs set T (I k), then by the set of following formulae discovery k item collection support sup ({ I 1, I 2..., I k):
sup({I 1,I 2,...,I k})=N(T(I 1,I 2,...,I k-1)∩T(I k))/|D|,k∈[1,n],
Wherein, N (T (I 1, I 2..., I k-1) ∩ T (I k)) represent affairs set T (I 1, I 2..., I k-1) and affairs set T (I k) the number of transactions that comprises of common factor.
CN201510666724.4A 2015-10-15 2015-10-15 A kind of database association rule digging method based on improvement Apriori algorithm Active CN105320756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510666724.4A CN105320756B (en) 2015-10-15 2015-10-15 A kind of database association rule digging method based on improvement Apriori algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510666724.4A CN105320756B (en) 2015-10-15 2015-10-15 A kind of database association rule digging method based on improvement Apriori algorithm

Publications (2)

Publication Number Publication Date
CN105320756A true CN105320756A (en) 2016-02-10
CN105320756B CN105320756B (en) 2018-07-10

Family

ID=55248142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510666724.4A Active CN105320756B (en) 2015-10-15 2015-10-15 A kind of database association rule digging method based on improvement Apriori algorithm

Country Status (1)

Country Link
CN (1) CN105320756B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827603A (en) * 2016-03-14 2016-08-03 中国人民解放军信息工程大学 Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device
CN106294617A (en) * 2016-07-29 2017-01-04 浪潮软件集团有限公司 Method for efficiently mining frequent item sets in association rule
CN106407296A (en) * 2016-08-30 2017-02-15 江苏省邮电规划设计院有限责任公司 Local scan association rule computer data analysis method based on pre-judging screening
CN106991141A (en) * 2017-03-21 2017-07-28 北京邮电大学 A kind of association rule mining method based on depth pruning strategy
CN107066587A (en) * 2017-04-17 2017-08-18 贵州大学 A kind of efficient Mining Frequent Itemsets based on group chained list
CN107220483A (en) * 2017-05-09 2017-09-29 西北大学 A kind of mode prediction method of polynary time series data
CN107256213A (en) * 2017-06-28 2017-10-17 中国石油大学(华东) A kind of topic relation based on parallel association rule finds method and finds device
CN107844514A (en) * 2017-09-22 2018-03-27 深圳市易成自动驾驶技术有限公司 Data digging method, device and computer-readable recording medium
CN107870939A (en) * 2016-09-27 2018-04-03 腾讯科技(深圳)有限公司 A kind of mode excavation method and device
CN107908776A (en) * 2017-11-30 2018-04-13 国网浙江省电力公司经济技术研究院 Frequent mode Web Mining algorithm and system based on affairs project incidence matrix
CN107943946A (en) * 2017-11-24 2018-04-20 重庆科技学院 Relevance method for digging between test item bank knowledge point based on Apriori algorithm
CN108108441A (en) * 2017-12-21 2018-06-01 新博卓畅技术(北京)有限公司 A kind of database table structure analysis method and system
CN108334548A (en) * 2017-12-26 2018-07-27 爱品克科技(武汉)股份有限公司 A kind of data mining technology based on correlation rule
CN108664642A (en) * 2018-05-16 2018-10-16 句容市茂润苗木有限公司 Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm
CN109859852A (en) * 2019-01-25 2019-06-07 青海大学 Improved Apriori algorithm and its application in Tibetan medicine's association mining
CN110489448A (en) * 2019-07-24 2019-11-22 西安理工大学 The method for digging of big data correlation rule based on Hadoop
CN110489411A (en) * 2019-07-11 2019-11-22 齐鲁工业大学 A kind of association rule mining method based on virtual value storage and operation mode
CN110992698A (en) * 2019-12-26 2020-04-10 北京工业大学 Method for calculating association degree between intersections based on Apriori support degree and driving distance in weighting manner
CN111858662A (en) * 2020-06-01 2020-10-30 广东恒睿科技有限公司 Method, system and storage medium for identifying underlying network potential danger data
CN112241420A (en) * 2020-10-26 2021-01-19 浪潮云信息技术股份公司 Government affair service item recommendation method based on association rule algorithm
CN113010597A (en) * 2021-04-06 2021-06-22 东北大学 Parallel association rule mining method for ocean big data
CN113568942A (en) * 2021-05-26 2021-10-29 南京师范大学 Data set frequent item set mining availability evaluation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061682A (en) * 1997-08-12 2000-05-09 International Business Machine Corporation Method and apparatus for mining association rules having item constraints
CN103150515A (en) * 2012-12-29 2013-06-12 江苏大学 Association rule mining method for privacy protection under distributed environment
CN103593400A (en) * 2013-12-13 2014-02-19 陕西省气象局 Lightning activity data statistics method based on modified Apriori algorithm
CN104217013A (en) * 2014-09-22 2014-12-17 广西教育学院 Course positive and negative mode excavation method and system based on item weighing and item set association degree

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061682A (en) * 1997-08-12 2000-05-09 International Business Machine Corporation Method and apparatus for mining association rules having item constraints
CN103150515A (en) * 2012-12-29 2013-06-12 江苏大学 Association rule mining method for privacy protection under distributed environment
CN103593400A (en) * 2013-12-13 2014-02-19 陕西省气象局 Lightning activity data statistics method based on modified Apriori algorithm
CN104217013A (en) * 2014-09-22 2014-12-17 广西教育学院 Course positive and negative mode excavation method and system based on item weighing and item set association degree

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MINAL G.ET AL.: "Association Rule Mining using Improved Apriori Algorithm", 《INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS》 *
张力 等: "基于筛选压缩的类Apriori算法的研究", 《电脑知识与技术》 *
韩天鹏 等: "基于候选项集剪枝的Apriori算法的研究", 《阜阳师范学院学报(自然科学版)》 *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827603A (en) * 2016-03-14 2016-08-03 中国人民解放军信息工程大学 Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device
CN106294617A (en) * 2016-07-29 2017-01-04 浪潮软件集团有限公司 Method for efficiently mining frequent item sets in association rule
CN106407296B (en) * 2016-08-30 2019-06-25 中通服咨询设计研究院有限公司 Partial sweep correlation rule computer data analysis method based on anticipation screening
CN106407296A (en) * 2016-08-30 2017-02-15 江苏省邮电规划设计院有限责任公司 Local scan association rule computer data analysis method based on pre-judging screening
CN107870939A (en) * 2016-09-27 2018-04-03 腾讯科技(深圳)有限公司 A kind of mode excavation method and device
CN106991141A (en) * 2017-03-21 2017-07-28 北京邮电大学 A kind of association rule mining method based on depth pruning strategy
CN106991141B (en) * 2017-03-21 2020-12-11 北京邮电大学 Association rule mining method based on deep pruning strategy
CN107066587A (en) * 2017-04-17 2017-08-18 贵州大学 A kind of efficient Mining Frequent Itemsets based on group chained list
CN107220483A (en) * 2017-05-09 2017-09-29 西北大学 A kind of mode prediction method of polynary time series data
CN107256213A (en) * 2017-06-28 2017-10-17 中国石油大学(华东) A kind of topic relation based on parallel association rule finds method and finds device
CN107844514A (en) * 2017-09-22 2018-03-27 深圳市易成自动驾驶技术有限公司 Data digging method, device and computer-readable recording medium
CN107943946A (en) * 2017-11-24 2018-04-20 重庆科技学院 Relevance method for digging between test item bank knowledge point based on Apriori algorithm
CN107943946B (en) * 2017-11-24 2019-08-30 重庆科技学院 Relevance method for digging between test item bank knowledge point based on Apriori algorithm
CN107908776A (en) * 2017-11-30 2018-04-13 国网浙江省电力公司经济技术研究院 Frequent mode Web Mining algorithm and system based on affairs project incidence matrix
CN108108441A (en) * 2017-12-21 2018-06-01 新博卓畅技术(北京)有限公司 A kind of database table structure analysis method and system
CN108334548A (en) * 2017-12-26 2018-07-27 爱品克科技(武汉)股份有限公司 A kind of data mining technology based on correlation rule
CN108664642A (en) * 2018-05-16 2018-10-16 句容市茂润苗木有限公司 Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm
CN109859852A (en) * 2019-01-25 2019-06-07 青海大学 Improved Apriori algorithm and its application in Tibetan medicine's association mining
CN110489411A (en) * 2019-07-11 2019-11-22 齐鲁工业大学 A kind of association rule mining method based on virtual value storage and operation mode
CN110489411B (en) * 2019-07-11 2023-08-22 齐鲁工业大学 Association rule mining method based on effective value storage and operation mode
CN110489448A (en) * 2019-07-24 2019-11-22 西安理工大学 The method for digging of big data correlation rule based on Hadoop
CN110992698A (en) * 2019-12-26 2020-04-10 北京工业大学 Method for calculating association degree between intersections based on Apriori support degree and driving distance in weighting manner
CN111858662A (en) * 2020-06-01 2020-10-30 广东恒睿科技有限公司 Method, system and storage medium for identifying underlying network potential danger data
CN112241420A (en) * 2020-10-26 2021-01-19 浪潮云信息技术股份公司 Government affair service item recommendation method based on association rule algorithm
CN113010597A (en) * 2021-04-06 2021-06-22 东北大学 Parallel association rule mining method for ocean big data
CN113010597B (en) * 2021-04-06 2023-08-01 东北大学 Ocean big data-oriented parallel association rule mining method
CN113568942A (en) * 2021-05-26 2021-10-29 南京师范大学 Data set frequent item set mining availability evaluation method
CN113568942B (en) * 2021-05-26 2024-07-16 南京师范大学 Data set frequent item set mining availability evaluation method

Also Published As

Publication number Publication date
CN105320756B (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN105320756A (en) Improved Apriori algorithm based method for mining database association rule
US7370033B1 (en) Method for extracting association rules from transactions in a database
US7743058B2 (en) Co-clustering objects of heterogeneous types
Lan et al. An efficient approach for finding weighted sequential patterns from sequence databases
Lee et al. An efficient algorithm for mining frequent inter-transaction patterns
CN105260387B (en) A kind of Association Rule Analysis method towards magnanimity transaction database
Lin et al. A frequent itemset mining algorithm based on the Principle of Inclusion–Exclusion and transaction mapping
Ma et al. Score look-alike audiences
CN105912721B (en) RDF data distributed semantic parallel inference method
Xenopoulos et al. GALE: Globally assessing local explanations
Oguz et al. Incremental itemset mining based on matrix apriori algorithm
Lin et al. Mining of high average-utility patterns with item-level thresholds
Prasad Optimized high-utility itemsets mining for effective association mining paper
Chawla et al. JOTR: Join-optimistic triple reordering approach for SPARQL query optimization on big RDF data
Ashrafi Payaman et al. Graph hybrid summarization
Gaikwad et al. Evaluation of Apriori Algorithm on Retail Market Transactional Database to get Frequent Itemsets.
Nawaz et al. Spore: shortest path overlapped regions and confined traversals towards graph clustering
Boutsinas et al. Distributed mining of association rules based on reducing the support threshold
Zhang et al. A twig-based algorithm for top-k subgraph matching in large-scale graph data
CN106407296B (en) Partial sweep correlation rule computer data analysis method based on anticipation screening
Lin et al. Mining high-utility sequential patterns in uncertain databases
Mahajan et al. Performance analysis of sequential pattern mining algorithms on large dense datasets
Tseng et al. A minimal perfect hashing scheme to mining association rules from frequently updated data
Huang et al. NSPIS: Mining negative sequential patterns with individual support
Lan et al. Fast discovery of high fuzzy utility itemsets

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 210000 No. 58 East Street, Nanxi River, Jianye District, Nanjing, Jiangsu

Applicant after: Zhong Tong clothing consulting and Design Research Institute Co., Ltd.

Address before: 210000 No. 58 East Street, Nanxi River, Jianye District, Nanjing, Jiangsu

Applicant before: Jiangsu Posts & Telecommunications Planning and Designing Institute Co., Ltd.

GR01 Patent grant
GR01 Patent grant