Abstract
In this paper, we propose mining frequent patterns from univariate uncertain data streams, which have a quantitative interval for each attribute in a transaction and a probability density function indicating the possibilities that the values in the interval appear. Many data streams comprise flows of univariate uncertain data, for example, the records of atmospheric pollution sensors, and network monitoring records. We propose two algorithms to address this issue: the ExactU2Stream algorithm and the ApproxiU2Stream algorithm. The former incrementally stores the incoming transactions, and delays the mining process until it is requested. The latter mines the transactions immediately when they arrive, and stores the derived frequent patterns. Compared with the latter, the former returns results that are more accurate, but it also requires more response time. Both algorithms utilize the sliding window scheme, which decomposes the continuous data stream into discrete, overlapping chunks. The proposed algorithms outperform the compared methods in terms of runtime and memory usage. We have applied the two proposed algorithms to the data streams recording the air quality in Taiwan; the derived frequent patterns not only show the common air quality in Taiwan but also show the extremely bad air quality when a sand storm affects Taiwan.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Abd-Elmegid LA, El-Sharkawi ME, El-Fangary LM, Helmy YK (2010) Vertical mining of frequent patterns from uncertain data. Comput Inf Sci 3:171–179
Aggarwal CC, Han J, Yu PS (2004) On demand classification of data streams. In: Proc ACM SIGKDD int conf knowledge discovery and data mining, pp 503–508
Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data. In: Proc int conf knowledge discovery and data mining, pp 29–37
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc int conf very large data base, pp 487–499
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2011) HUC-Prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34:181–198
Chang JH, Lee WS (2004) A sliding window method for finding recently frequent itemsets over online data streams. J Inf Sci Eng 20:753–762
Charikar M, Chen K, Farach-Colton M (2002) Finding frequent items in data streams. In: Proc int conf automata, languages, and programming, pp 693–703
Chu CJ, Tseng VS, Liang T (2008) An efficient algorithm for mining temporal high utility itemsets from data streams. J Syst Softw 81:1105–1117
Chui C, Kao B (2008) A decremental approach for mining frequent itemsets from uncertain data. In: Proc Pacific-Asia conference on knowledge discovery and data mining, pp 64–75
Chui C, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Proc Pacific-Asia conference on knowledge discovery and data mining, pp 47–58
Chi Y, Wang H, Yu PS, Muntz RR (2004) Moment: maintaining closed frequent itemsets over a stream sliding window. In: Proc int conf data mining, pp 59–66
Cormode G, Muthukrishnan S (2003) What’s hot and what’s not: tracking most frequent items dynamically. In: Proc SIGMOD/PODS, pp 296–306
Gaber MM, Krishnaswamy S, Zaslavsky A (2005) Onboard mining of data streams in sensor networks. In: Maulik U (ed) Advanced methods of knowledge discovery from complex data. Springer, Berlin, pp 307–335
Giannella C, Han J, Pei J, Yan X, Yu PS (2003) Mining frequent patterns in data streams at multiple time granularities. In: Kargupta H (ed) Data mining: next generation challenges and future directions. AAAI Press/MIT Press, Melno Park/Cambridge, pp 191–210
Golab L, Dehaan D, Demaine ED, Lopez-Ortiz A, Munro JI (2003) Identifying frequent items in sliding windows over on-line packet streams. In: Proc internet measurement conference, pp 173–178
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proc ACM SIGMOD int conf management of data, pp 1–12
Hung CC, Peng WC (2011) A regression-based approach for mining user movement patterns from random sample data. Data Knowl Eng 70:1–20
Jiang N, Gruenwald L (2006) Research issues in data stream association rule mining. SIGMOD Rec 35:14–19
Jiang N, Gruenwald L (2006) CFI-stream: mining closed frequent itemsets in data streams. In: Proc int conf knowledge discovery and data mining, pp 592–597
Karp RM, Shenker S (2003) A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst 28:51–55
Lee CH (2007) IMSP: an information theoretic approach for multi-dimensional sequential pattern mining. Appl Intell 26:231–242
Leung CKS, Carmichael CL, Hao B (2007) Efficient mining of frequent patterns from uncertain data. In: Proc int conf data mining—workshops, pp 489–494
Leung CKS, Hao B (2009) Mining of frequent itemsets from streams of uncertain data. In: Proc int conf data engineering, pp 1663–1670
Leung CKS, Hao B, Jiang F (2010) Constrained frequent itemset mining from uncertain data streams. In: Proc int conf data engineering workshops, pp 120–127
Leung CKS, Khan QI (2006) DSTree: a tree structure for the mining of frequent sets from data streams. In: Proc int conf data mining, pp 928–933
Leung CKS, Mateo MAF, Brajczuk DA (2008) A tree-based approach for frequent pattern mining from uncertain data. In: Proc Pacific-Asia conference on knowledge discovery and data mining, pp 653–661
Li CW, Jea KF, Lin RP, Yen SF, Hsu CW (2012) Mining frequent patterns from dynamic data streams with data load management. J Syst Softw 85:1346–1362
Li HF, Lee SY (2009) Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst Appl 36:1466–1477
Li HF, Lee SY, Shan MK (2004) An efficient algorithm for mining frequent itemsets over the entire history of data streams. In: Proc int work knowledge discovery in data streams
Li HF, Lee SY, Shan MK (2005) Online mining (recently) maximal frequent itemsets over data streams. In: Proc int work research issues in data engineering: stream data mining and applications
Lin CH, Chiu DY, Wu YH (2005) Mining frequent itemsets from data streams with a time-sensitive sliding window. In: Proc SIAM int conf data mining
Liu YH (2012) Mining frequent patterns from univariate uncertain data. Data Knowl Eng 71:47–68
Liu YH, Wang CS (2012) Constrained frequent pattern mining on univariate uncertain data. J Syst Softw. doi:10.1016/j.jss.2012.11.020
Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proc int conf very large data bases, pp 346–357
Mao G, Wu X, Zhu X, Chen G, Liu C (2007) Mining maximal frequent itemsets from data streams. J Inf Sci 33:251–262
Purwanto EC, Logeswaran R (2012) An enhanced hybrid method for time series prediction using linear and neural network models. Appl Intell 37:511–519
Qiao S, Tang C, Jin H, Long T, Dai S, Ku Y, Chau M (2010) PutMode: prediction of uncertain trajectories in moving objects databases. Appl Intell 33:370–386
Silvestri C, Orlando S (2007) Approximate mining of frequent patterns on streams. Intell Data Anal 11:49–73
Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Proc ACM SIGKDD int conf knowledge discovery and data mining, pp 273–282
Wang YT, Cheng JT (2011) Mining periodic movement patterns of mobile phone users based on an efficient sampling approach. Appl Intell 35:32–40
Yang L, Sanver M (2004) Mining short association rules with one database scan. Proc information and knowledge engineering
Yu JX, Chong Z, Lu H, Zhang Z, Zhou A (2006) A false negative approach to mining frequent itemsets from high speed transactional data streams. Inf Sci 176:1986–2015
EPA website (2010). http://taqm.epa.gov.tw/taqm/zh-tw/default.aspx
Xu C, Wang Y, Gu Y, Lin S, Yu G (2012) Efficient fuzzy ranking queries in uncertain databases. Appl Intell 37:47–59
Zhao L, Wang L, Xu Q (2012) Data stream classification with artificial endocrine system. Appl Intell 37:390–404
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Liu, YH. Stream mining on univariate uncertain data. Appl Intell 39, 315–344 (2013). https://doi.org/10.1007/s10489-012-0415-3
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-012-0415-3