DOI:10.1145/1871437.1871494
research-article

Accelerating probabilistic frequent itemset mining: a model-based approach

Published: 26 October 2010

Abstract

Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle a large amount of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose a novel method that models the support of an itemset as a Poisson binomial distribution. This model-based approach extracts frequent itemsets with a high degree of accuracy, and supports large databases. We apply our techniques to improve the performance of the algorithms for: (1) finding itemsets whose frequentness probabilities are larger than some threshold; and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate. Moreover, they are orders of magnitude faster than previous approaches.
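The idea of treating an itemset's support as a Poisson binomial random variable can be illustrated with a minimal sketch. It assumes a simplified tuple-uncertainty setting in which each transaction contains the itemset independently with a known probability; the function names (`support_pmf`, `frequentness_exact`, `frequentness_poisson`) and the sample probabilities are hypothetical and not taken from the paper. The exact frequentness probability is computed by dynamic programming, and a Poisson distribution with the same mean is used as one common approximation of a Poisson binomial; this is an illustration of the general technique, not necessarily the authors' exact procedure.

```python
import math


def support_pmf(probs):
    """Exact pmf of the support count (a Poisson binomial variable).

    probs[i] is the assumed probability that transaction i contains the
    itemset; transactions are assumed independent.
    """
    pmf = [1.0]  # with zero transactions, support is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # transaction lacks the itemset
            nxt[k + 1] += mass * p      # transaction contains the itemset
        pmf = nxt
    return pmf


def frequentness_exact(probs, minsup):
    """P(support >= minsup), computed from the exact pmf in O(n^2) time."""
    return sum(support_pmf(probs)[minsup:])


def frequentness_poisson(probs, minsup):
    """P(support >= minsup) under a Poisson law with mean sum(probs)."""
    mu = sum(probs)
    term = math.exp(-mu)  # Poisson mass at k = 0
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= mu / (k + 1)  # move from mass at k to mass at k + 1
    return 1.0 - cdf


if __name__ == "__main__":
    # Hypothetical per-transaction probabilities for a single itemset.
    probs = [0.01] * 100
    print("exact  :", frequentness_exact(probs, 2))
    print("poisson:", frequentness_poisson(probs, 2))
```

The approximation needs only the expected support, so it runs in time linear in the number of transactions, whereas the exact dynamic program is quadratic. The approximation is closest when the individual probabilities are small; with probabilities near 1 the two values can differ noticeably.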

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN:9781450300995
DOI:10.1145/1871437

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximation algorithm
  2. frequent itemset
  3. uncertain database

Qualifiers

  • Research-article

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 17 Dec 2024

Cited By

  • (2024) GMiner++: Boosting GPU-based frequent itemset mining by reducing redundant computations. Expert Systems with Applications, 250 (123928). DOI:10.1016/j.eswa.2024.123928. Online publication date: Sep-2024
  • (2023) Discovery of interesting frequent item sets in an uncertain database using ant colony optimization. International Journal of Computers and Applications, 45:11 (673-679). DOI:10.1080/1206212X.2023.2263689. Online publication date: 9-Oct-2023
  • (2022) Interactive Mining of Probabilistic Frequent Patterns in Uncertain Databases. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 30:02 (263-283). DOI:10.1142/S0218488522500118. Online publication date: 18-Apr-2022
  • (2022) A review on big data based parallel and distributed approaches of pattern mining. Journal of King Saud University - Computer and Information Sciences, 34:5 (1639-1662). DOI:10.1016/j.jksuci.2019.09.006. Online publication date: May-2022
  • (2022) UBDM: Utility-Based Potential Pattern Mining over Uncertain Data Using Spark Framework. Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT (623-631). DOI:10.1007/978-3-031-07012-9_52. Online publication date: 26-May-2022
  • (2021) Mining of High-Utility Patterns in Big IoT-based Databases. Mobile Networks and Applications, 26:1 (216-233). DOI:10.1007/s11036-020-01701-5. Online publication date: 4-Jan-2021
  • (2021) Analytics of high average-utility patterns in the industrial internet of things. Applied Intelligence, 52:6 (6450-6463). DOI:10.1007/s10489-021-02751-2. Online publication date: 11-Sep-2021
  • (2020) DoubleDeck: Decoupling Complex Control Logic of Network Protocols to Facilitate Efficient Hardware Implementation. Electronics, 9:10 (1647). DOI:10.3390/electronics9101647. Online publication date: 10-Oct-2020
  • (2020) Efficient weighted probabilistic frequent itemset mining in uncertain databases. Expert Systems, 38:5. DOI:10.1111/exsy.12551. Online publication date: 7-Apr-2020
  • (2020) Mining Robust Frequent Items in Data Streams. 2020 IEEE International Conference on Joint Cloud Computing (110-117). DOI:10.1109/JCC49151.2020.00026. Online publication date: Aug-2020
