Abstract
In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was thus proposed to measure the utility values of purchased items for revealing high utility itemsets from a quantitative database. In the past, a two-phase high utility mining algorithm was thus proposed for efficiently discovering high utility itemsets from a quantitative database. In dynamic data mining, transactions may be inserted, deleted, or modified from a database. In this case, a batch mining procedure must rescan the whole updated database to maintain the up-to-date information. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining. In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on pre-large concepts. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Abbreviations
- I :
-
A set of m items, I={i 1,i 2,…,i j ,…,i m }, in which each item i j has its own profit value p j ;
- P :
-
The profit table, {p 1,p 2,…,p j ,…,p m }, in which p j is the profit value of item i j ;
- D :
-
The original quantitative database, D={T 1,T 2,…,T k ,…,T n }, in which each transaction contains several items with purchase quantities;
- d :
-
The new transactions, d={t 1,t 2,…,t k ,…,t n }, in which each transaction contains several items with purchase quantities;
- U :
-
The entire updated database, i.e., D∪d;
- TU D :
-
The total utility of the transactions in D;
- TU d :
-
The total utility of the transactions in d;
- TU U :
-
The total utility of the transactions in U;
- q kj :
-
The quantity of item i j in transaction t k ;
- u kj :
-
The utility of item i j in transaction t k , which is calculated as q kj ×p j ;
- tu k :
-
The transaction utility of currently processed transaction t k ;
- buf :
-
A buffer used to store the total utility of the last processed transactions for transaction insertion. It is set to 0 after the database is rescanned;
- X :
-
An itemset containing several items i j ;
- S u :
-
The upper utility threshold for large (high) transaction-weighted utilization and high utility itemsets. It is the same as the high utility threshold in traditional utility mining;
- S l :
-
The lower utility threshold for pre-large transaction-weighted utilization and pre-large itemsets, where S u >S l ;
- f :
-
The safety transaction utility bound for new transactions;
- C r :
-
The set of candidate r-itemsets;
- Rescan_Items :
-
The set of the itemsets that must be rescanned in original database;
- \(\mathit{HTWU}_{r}^{D}\) :
-
The set of large (high) transaction-weighted utilization r-itemsets in the original database;
- \(\mathit{PTWU}_{r}^{D}\) :
-
The set of pre-large transaction-weighted utilization r-itemsets in the original database;
- HTWU D :
-
The set of large (high) transaction-weighted utilization itemsets in the original database;
- PTWU D :
-
The set of pre-large transaction-weighted utilization itemsets in the original database;
- \(\mathit{HTWU}_{r}^{U}\) :
-
The set of large (high) transaction-weighted utilization r-itemsets in the updated database;
- \(\mathit{PTWU}_{r}^{U}\) :
-
The set of pre-large transaction-weighted utilization r-itemsets in the updated database;
- HTWU U :
-
The set of large (high) transaction-weighted utilization itemsets in the updated database;
- PTWU U :
-
The set of pre-large transaction-weighted utilization itemsets in the updated database;
- HU U :
-
The set of high-utility itemsets in the updated database;
- twu D(X):
-
The transaction-weighted utilization of itemset X in the original database;
- twu d(X):
-
The transaction-weighted utilization of itemset X in the new transactions;
- twu U(X):
-
The transaction-weighted utilization of itemset X in the updated database;
- au D(X):
-
The actual utility of itemset X in the original database;
- au d(X):
-
The actual utility of itemset X in the new transactions;
- au U(X):
-
The actual utility of itemset X in the updated database.
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: The 20th international conference on very large data bases, pp 487–499
Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: International conference on management of data, pp 207–216
Berzal F, Cubero JC, Marín N, Serrano JM (2001) Tbar: an efficient method for association rule mining in relational databases. Data Knowl Eng 37(1):47–64
Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. SIGMOD Rec 26(2):255–264
Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: IEEE international conference on data mining, pp 19–26
Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Cheung DW, Jiawei H, Ng VT, Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. In: The international conference on data engineering, pp 106–114
Hong TP, Wu CH (2011) An improved weighted clustering algorithm for determination of application nodes in heterogeneous sensor networks. J Inf Hiding Multimed Signal Process 2:173–184
Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5:111–129
Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4):2424–2435
Hong TP, Lin CW, Yang KT, Wang SL (2013) Using TF-IDF to hide sensitive itemsets. Appl Intell 38(4):502–510
Hu K, Lu Y, Zhou L, Shi C (1999) Integrating classification and association rule mining: a concept lattice framework. In: The international workshop on new directions in rough sets, data mining, and granular-soft computing, pp 443–447
IBM quest data mining project, Quest synthetic data generation code. http://www.almaden.ibm.com/cs/quest/syndata.html
Lent B, Swami A, Widom J (1997) Clustering association rules. In: The international conference on data engineering, pp 220–231
Li YC, Yeh JS, Chang CC (2005) Direct candidates generation: a novel algorithm for discovering complete share-frequent itemsets. In: Lecture notes in computer science, vol 3614, pp 551–560
Li YC, Yeh JS, Chang CC (2005) Efficient algorithms for mining share-frequent itemsets. In: The world congress of international fuzzy systems association, pp 539–543
Li YC, Yeh JS, Chang CC (2005) Fast algorithm for mining share-frequent itemsets. In: The Asia Pacific web conference, pp 417–428
Lin CW, Hong TP (2013) A survey of fuzzy web mining. Wiley Interdiscip Rev: Data Min Knowl Discov 3(3):190–199
Lin CW, Lan GC, Hong TP (2012) An incremental mining algorithm for high utility itemsets. Expert Syst Appl 39(8):7173–7180
Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Inf Hiding Multimed Signal Process 4(4):201–227
Liu YH (2013) Stream mining on univariate uncertain data. Appl Intell. doi:10.1007/s10489-012-0415-3
Liu Y, Liao W-k, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: The international workshop on utility-based data mining, pp 90–99
Liu Y, Liao W-k, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Lecture notes in computer science, pp 689–695
Microsoft Example database foodmart of Microsoft analysis services. http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx
Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. SIGMOD Rec 24(2):175–186
Park JS, Chen MS, Yu PS (1997) Using a hash-based method with transaction trimming for mining association rules. IEEE Trans Knowl Data Eng 9(5):813–825
Sarda NL, Srinivas NV (1998) An adaptive algorithm for incremental mining of association rules. In: The international workshop on database and expert systems applications, pp 240–245
Song W, Liu Y, Li J (2013) Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell. doi:10.1007/s10489-013-0443-7
Sucahyo Y, Gopalan R (2005) Building a more accurate classifier based on strong frequent patterns. In Lecture notes in computer science, pp 1036–1042
Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603–626
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: The SIAM international conference on data mining, pp 211–225
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: International conference on knowledge discovery and data mining, pp 283–286
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lin, CW., Hong, TP., Lan, GC. et al. Incrementally mining high utility patterns based on pre-large concept. Appl Intell 40, 343–357 (2014). https://doi.org/10.1007/s10489-013-0467-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-013-0467-z