Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Mining frequent itemsets over uncertain databases

Published: 01 July 2012 Publication History

Abstract

In recent years, due to the wide applications of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In uncertain databases, the support of an itemset is a random variable instead of a fixed occurrence counting of this itemset. Thus, unlike the corresponding problem in deterministic databases where the frequent itemset has a unique definition, the frequent itemset under uncertain environments has two different definitions so far. The first definition, referred as the expected support-based frequent itemset, employs the expectation of the support of an itemset to measure whether this itemset is frequent. The second definition, referred as the probabilistic frequent itemset, uses the probability of the support of an itemset to measure its frequency. Thus, existing work on mining frequent itemsets over uncertain databases is divided into two different groups and no study is conducted to comprehensively compare the two different definitions. In addition, since no uniform experimental platform exists, current solutions for the same definition even generate inconsistent results. In this paper, we firstly aim to clarify the relationship between the two different definitions. Through extensive experiments, we verify that the two definitions have a tight connection and can be unified together when the size of data is large enough. Secondly, we provide baseline implementations of eight existing representative algorithms and test their performances with uniform measures fairly. Finally, according to the fair tests over many different benchmark data sets, we clarify several existing inconsistent conclusions and discuss some new findings.

References

[1]
Frequent itemset mining implementations repository. http://fimi.us.ac.be.
[2]
Wikipedia of poisson binomial distribution.
[3]
C. C. Aggarwal. Managing and Mining Uncertain Data. Kluwer Press, 2009.
[4]
C. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In KDD, pages 29--38, 2009.
[5]
C. C. Aggarwal and P. S. Yu. Outlier detection with uncertain data. In SDM, pages 483--493, 2008.
[6]
C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng., 21(5):609--623, 2009.
[7]
R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207--216, 1993.
[8]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, pages 487--499, 1994.
[9]
T. Bernecker, H.-P. Kriegel, M. Renz, F. Verhein, and A. Züfle. Probabilistic frequent itemset mining in uncertain databases. In KDD, pages 119--128, 2009.
[10]
T. Calders, C. Garboni, and B. Goethals. Approximation of frequentness probability of itemsets in uncertain data. In ICDM, pages 749--754, 2010.
[11]
T. Calders, C. Garboni, and B. Goethals. Efficient pattern mining of uncertain data with sampling. In PAKDD, pages 480--487, 2010.
[12]
L. L. Cam. An approximation theorem for the poisson binomial distribution. Pacific Journal of Mathematics, 10(4):1181--1197, 1960.
[13]
L. Chen and R. T. Ng. On the marriage of lp-norms and edit distance. In VLDB, pages 792--803, 2004.
[14]
L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD, pages 491--502, 2005.
[15]
R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng., 16(9):1112--1127, 2004.
[16]
H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist., 23(4):493--507, 1952.
[17]
C. K. Chui and B. Kao. A decremental approach for mining frequent itemsets from uncertain data. In PAKDD, pages 64--75, 2008.
[18]
C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, pages 47--58, 2007.
[19]
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD, pages 1--12, 2000.
[20]
B. Jiang and J. Pei. Outlier detection on uncertain data: Objects, instances, and inferences. In ICDE, pages 422--433, 2011.
[21]
B. Kao, S. D. Lee, F. K. F. Lee, D. W.-L. Cheung, and W.-S. Ho. Clustering uncertain data using voronoi diagrams and r-tree index. IEEE Trans. Knowl. Data Eng., 22(9), 2010.
[22]
C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In PAKDD, pages 653--661, 2008.
[23]
M. Li and Y. Liu. Underground coal mine monitoring with wireless sensor networks. TOSN, 5(2):10, 2009.
[24]
Y. Liu, K. Liu, and M. Li. Passive diagnosis for wireless sensor networks. IEEE/ACM Trans. Netw., 18(4):1132--1144, 2010.
[25]
M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized algorithm and probabilistic analysis. Cambridge University Press, 2005.
[26]
L. Mo, Y. He, Y. Liu, J. Zhao, S. Tang, X.-Y. Li, and G. Dai. Canopy closure estimates with greenorbs: sustainable sensing in the forest. In SenSys, pages 99--112, 2009.
[27]
J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang. H-mine: Hyper-structure mining of frequent patterns in large databases. In ICDM, pages 441--448, 2001.
[28]
L. Sun, R. Cheng, D. W. Cheung, and J. Cheng. Mining uncertain data with probabilistic guarantees. In KDD, pages 273--282, 2010.
[29]
S. Suthram, T. Shlomi, E. Ruppin, R. Sharan, and T. Ideker. A direct comparison of protein interaction confidence assignment schemes. BMC Bioinformatics, 7:360, 2006.
[30]
Y. Tong, L. Chen, and B. Ding. Discovering threshold-based frequent closed itemsets over probabilistic data. In ICDE, pages 270--281, 2012.
[31]
L. Wang, R. Cheng, S. D. Lee, and D. W.-L. Cheung. Accelerating probabilistic frequent itemset mining: a model-based approach. In CIKM, pages 429--438, 2010.
[32]
M. J. Zaki. Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng., 12(3):372--390, 2000.
[33]
Q. Zhang, F. Li, and K. Yi. Finding frequent items in probabilistic data. In SIGMOD, pages 819--832, 2008.

Cited By

View all
  • (2024)Efficient Maximal Motif-Clique Enumeration over Large Heterogeneous Information NetworksProceedings of the VLDB Endowment10.14778/3681954.368197517:11(2946-2959)Online publication date: 30-Aug-2024
  • (2023)OneSketch: A Generic and Accurate Sketch for Data StreamsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327802835:12(12887-12901)Online publication date: 19-May-2023
  • (2023)Depth-First Uncertain Frequent Itemsets Mining based on Ensembled Conditional Item-Wise Supports2023 International Conference on Intelligent Supercomputing and BioPharma (ISBP)10.1109/ISBP57705.2023.10061307(121-128)Online publication date: 6-Jan-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 5, Issue 11
July 2012
608 pages

Publisher

VLDB Endowment

Publication History

Published: 01 July 2012
Published in PVLDB Volume 5, Issue 11

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Maximal Motif-Clique Enumeration over Large Heterogeneous Information NetworksProceedings of the VLDB Endowment10.14778/3681954.368197517:11(2946-2959)Online publication date: 30-Aug-2024
  • (2023)OneSketch: A Generic and Accurate Sketch for Data StreamsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327802835:12(12887-12901)Online publication date: 19-May-2023
  • (2023)Depth-First Uncertain Frequent Itemsets Mining based on Ensembled Conditional Item-Wise Supports2023 International Conference on Intelligent Supercomputing and BioPharma (ISBP)10.1109/ISBP57705.2023.10061307(121-128)Online publication date: 6-Jan-2023
  • (2023)Finding Simplex Items in Data Streams2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00152(1953-1966)Online publication date: Apr-2023
  • (2023)Discovery of interesting frequent item sets in an uncertain database using ant colony optimizationInternational Journal of Computers and Applications10.1080/1206212X.2023.226368945:11(673-679)Online publication date: 9-Oct-2023
  • (2022)Mining Order-preserving Submatrices under Data Uncertainty: A Possible-world Approach and Efficient Approximation MethodsACM Transactions on Database Systems10.1145/352491547:2(1-57)Online publication date: 23-May-2022
  • (2022)BurstSketch: Finding Bursts in Data StreamsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.322368635:11(11126-11140)Online publication date: 21-Nov-2022
  • (2022)Cleaning Uncertain Data With Crowdsourcing - A General Model With Diverse Accuracy RatesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.302754534:8(3629-3642)Online publication date: 1-Aug-2022
  • (2022)Mining of High-Utility Sequence Patterns in Large-Scale Uncertain Databases2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech)10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927807(1-7)Online publication date: 12-Sep-2022
  • (2022)A review on big data based parallel and distributed approaches of pattern miningJournal of King Saud University - Computer and Information Sciences10.1016/j.jksuci.2019.09.00634:5(1639-1662)Online publication date: 1-May-2022
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media