article

Free access

Learning theory analysis for association rules and sequential event prediction

Editors: Kevin Murphy, Bernhard Schölkopf Authors:

Benjamin Letham,

David MadiganAuthors Info & Claims

The Journal of Machine Learning Research, Volume 14, Issue 1

Pages 3441 - 3492

Published: 01 January 2013 Publication History

PDF eReader Publisher Site

Abstract

We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called "sequential event prediction." In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer's online shopping cart, given his/her past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the "cold start" problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an "adjusted confidence" measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.

References

[1]

Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proc. 20th Int'l Conf. Very Large Data Bases, pages 487-499. Morgan Kaufmann, 1994.

[2]

Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, 1993.

[3]

Martin Anthony. Generalization error bounds for threshold decision lists. Journal of Machine Learning Research, 5:189-217, 2004.

[4]

Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick. Sequential PAttern Mining using a bitmap representation. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.

[5]

Kevin Bache and Moshe Lichman. UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/ml.

[6]

André Berchtold and Adrian E. Raftery. The mixture transition distribution model for high-order markov chains and non-gaussian time series. Statistical Science, 17(3):328-356, 2002.

[7]

Olivier Bousquet and André Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499-526, 2002.

[8]

John S. Breese, David Heckerman, and Carl Myers Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proc. Uncertainty in Artificial Intelligence, pages 43-52, 1998.

[9]

Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman, and Cheng Yang. Finding interesting associations without support pruning. IEEE Trans. Knowl. Data Eng., 13(1):64-78, 2001.

[10]

Michelle Keim Condliff, David D. Lewis, David Madigan, and Christian Posse. Bayesian mixed-effects models for recommender systems. In ACM SIGIR Workshop on Recommender Systems: Algorithms and Evaluation, 1999.

[11]

Luc P. Devroye and Terry J. Wagner. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 25(5):601-604, 1979.

[12]

William DuMouchel and Daryl Pregibon. Empirical bayes screening for multi-item associations. In Proc. ACM SIGKDD Int'l Conf. on Knowl. Discovery and Data Mining, pages 67-76, 2001.

[13]

Jerome H. Friedman and Bogdan E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916-954, 2008.

[14]

Liqiang Geng and Howard J. Hamilton. Choosing the right lens: Finding what is interesting in data mining. In Quality Measures in Data Mining, pages 3-24. Springer, 2007.

[15]

Xiangji Huang, Aijun An, Nick Cercone, and Gary Promhouse. Discovery of interesting association rules from Livelink web log data. In Proc. IEEE Int'l Conf. on Data Mining, 2002.

[16]

Peter Jackson. Introduction to Expert Systems (Third Edition). Addison-Wesley, 1998.

[17]

Stéphanie Jacquemont, François Jacquenet, and Marc Sebban. A lower bound on the sample size needed to perform a significant frequent pattern mining task. Pattern Recogn. Lett., 30:960-967, August 2009.

[18]

Xiang-Rong Jiang and Le Gruenwald. Microarray gene expression data association rules mining based on BSC-tree and FIS-tree. Data & Knowl. Eng., 53(1):3-29, 2005.

[19]

Norman Lloyd Johnson, Adrienne W. Kemp, and Samuel Kotz. Univariate Discrete Distributions. John Wiley & Sons, August 2005.

[20]

Yun Sing Koh. Mining non-coincidental rules without a user defined support threshold. In Advances in Knowl. Discovery and Data Mining, 12th Pacific-Asia Conf., pages 910-915, 2008.

[21]

Ron Kohavi, Llew Mason, Rajesh Parekh, and Zijian Zheng. Lessons and challenges from mining retail e-commerce data. Machine Learning, 57(1-2):83-113, 2004.

[22]

Richard D. Lawrence, George S. Almasi, Viveros Kotlyar, Marisa S. Viveros, and Sastry S. Duri. Personalization of supermarket product recommendations. Data Mining and Knowledge Discovery, 5(1-2):11-32, 2001.

[23]

Benjamin Letham. Similarity-weighted association rules for a name recommender system. In Proceedings of the ECML/PKDD Discovery Challenge Workshop, 2013.

[24]

Benjamin Letham, Cynthia Rudin, and Katherine A. Heller. Growing a list. Data Mining and Knowledge Discovery, 27(3):372-395, 2013a.

[25]

Benjamin Letham, Cynthia Rudin, and David Madigan. Sequential event prediction. Machine Learning, 93:357-380, 2013b.

[26]

Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. An interpretable stroke prediction model using rules and bayesian analysis. In Proceedings of AAAI Late Breaking Track, 2013c.

[27]

Jinyan Li, Xiuzhen Zhang, Guozho Dong, Kotagiri Ramamohanarao, and Qun Sun. Efficient mining of high confidence association rules without support thresholds. In Proc. Principles of Data Mining and Knowledge Discovery, pages 406-411, 1999.

[28]

Wenmin Li, Jiawei Han, and Jian Pei. CMAR: Accurate and efficient classification based on multiple class-association rules. IEEE International Conference on Data Mining, pages 369-376, 2001.

[29]

Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery, 6(1):83-105, 2002.

[30]

Bing Liu, Wynne Hsu, and Yiming Ma. Integrating classification and association rule mining. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, 1998.

[31]

David Madigan, Krzysztof Mosurski, and Russell G Almond. Graphical explanation in belief networks. Journal of Computational and Graphical Statistics, 6:160-181, 1997.

[32]

Mario Marchand and Marina Sokolova. Learning with decision lists of data-dependent features. Journal of Machine Learning Research, 2005.

[33]

Tyler H. McCormick, Cynthia Rudin, and David Madigan. Bayesian hierarchical modeling for predicting medical conditions. The Annals of Applied Statistics, 6(2):652-668, 2012.

[34]

Ken McGarry. A survey of interestingness measures for knowledge discovery. The Knowledge Engineering Review, 20:39-61, 2005.

[35]

Grzegorz A. Rempala. Asymptotic factorial powers expansions for binomial and negative binomial reciprocals. In Proceedings of the American Mathematical Society, volume 132, pages 261-272, 2003.

[36]

Ronald L. Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.

[37]

William H. Rogers and Terry J. Wagner. A finite sample distribution-free performance bound for local discrimination rules. Annals of Statistics, 6(3):506-514, 1978.

[38]

V. Romanovsky. Note on the moments of a binomial (p+q)ⁿ about its mean. Biometrika, 15: 410-412, 1923.

[39]

Cynthia Rudin, Benjamin Letham, Ansaf Salleb-Aouissi, Eugene Kogan, and David Madigan. A framework for supervised learning with association rules. In Proceedings of the 24th Annual Conference on Learning Theory, 2011.

[40]

Xiaotong Shen and Lifeng Wang. Generalization error for multi-class margin classification. Electronic Journal of Statistics, pages 307-330, 2007.

[41]

György J. Simon, Vipin Kumar, and Peter W. Li. A simple statistical model and association rule filtering for classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 823-831, 2011.

[42]

Pang N. Tan, Vipin Kumar, and Jaideep Srivastava. Selecting the right interestingness measure for association patterns. In Proc. ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, 2002.

[43]

Fadi Thabtah. A review of associative classification mining. The Knowledge Engineering Review, 22:37-65, March 2007.

[44]

Koen Vanhoof and Benoît Depaire. Structure of association rule classifiers: a review. In Proceedings of the International Conference on Intelligent Systems and Knowledge Engineering, pages 9-12, 2010.

[45]

Vladimir Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988-999, September 1999.

[46]

Ke Wang, Yu He, David Wai-Lok Cheung, and Francis Y. L. Chin. Mining confident rules without support requirement. In Proc. Conference on Information and Knowledge Management, pages 89-96, 2001.

[47]

Xiaoxin Yin and Jiawei Han. CPAR: Classification based on predictive association rules. In Proceedings of the 2003 SIAM International Conference on Data Mining, pages 331-335, 2003.

Cited By

Fan WHan ZXie MZhang G(2024)Discovering Top-k Relevant and Diversified RulesProceedings of the ACM on Management of Data10.1145/36771312:4(1-28)Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3677131
Fan WHan ZWang YXie M(2023)Discovering Top-k Rules using Subjective and Objective CriteriaProceedings of the ACM on Management of Data10.1145/35889241:1(1-29)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.1145/3588924
Li MYu LZhang YHuang XShi QCui QYang XLi LZhu WFang YZhou JAl Hasan MXiong L(2022)An Adaptive Framework for Confidence-constraint Rule Set Learning Algorithm in Large DatasetProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557088(3252-3261)Online publication date: 17-Oct-2022
https://dl.acm.org/doi/10.1145/3511808.3557088
Show More Cited By

Learning theory analysis for association rules and sequential event prediction
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
2. Information systems
  1. Information systems applications

Recommendations

Discovering association rules change from large databases
AICI'11: Proceedings of the Third international conference on Artificial intelligence and computational intelligence - Volume Part I

Discovering association rules and association rules change (ARC) from existing large databases is an important problem. This paper presents an approach based on multi-hash chain structures to mine association rules change from large database with ...
Mining changes in association rules: a fuzzy approach

Association rule mining is concerned with the discovery of interesting association relationships hidden in databases. Existing algorithms typically assume that data characteristics are stable over time. Their main focus is therefore to mine association ...
Mining Negatives Association Rules Using Constraints

Pattern discovery techniques, such as association rule discovery is one of the fundamental problem in data mining. Usually the task is limited to positive rules of the form X Y when X and Y are susbets of items. To enlarge the knowledge discovery from ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The Journal of Machine Learning Research

The Journal of Machine Learning Research Volume 14, Issue 1

January 2013

3717 pages

ISSN:1532-4435

EISSN:1533-7928

Editors:
Kevin Murphy
Google
,
Bernhard Schölkopf
MPI for Intelligent Systems

Issue’s Table of Contents

Publisher

JMLR.org

Publication History

Published: 01 January 2013

Published in JMLR Volume 14, Issue 1

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

11
Total Citations
View Citations
369
Total Downloads

Downloads (Last 12 months)43
Downloads (Last 6 weeks)9

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Fan WHan ZXie MZhang G(2024)Discovering Top-k Relevant and Diversified RulesProceedings of the ACM on Management of Data10.1145/36771312:4(1-28)Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3677131
Fan WHan ZWang YXie M(2023)Discovering Top-k Rules using Subjective and Objective CriteriaProceedings of the ACM on Management of Data10.1145/35889241:1(1-29)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.1145/3588924
Li MYu LZhang YHuang XShi QCui QYang XLi LZhu WFang YZhou JAl Hasan MXiong L(2022)An Adaptive Framework for Confidence-constraint Rule Set Learning Algorithm in Large DatasetProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557088(3252-3261)Online publication date: 17-Oct-2022
https://dl.acm.org/doi/10.1145/3511808.3557088
Geng YChen JYe ZYuan ZZhang WChen H(2021)Explainable zero-shot learning via attentive graph convolutional network and knowledge graphsSemantic Web10.3233/SW-21043512:5(741-765)Online publication date: 1-Jan-2021
https://dl.acm.org/doi/10.3233/SW-210435
Li MZhang YShi QYang XCui QLi LZhou J(2021)Constraint-Adaptive Rule Mining in Large DatabasesDatabase Systems for Advanced Applications10.1007/978-3-030-73200-4_41(579-591)Online publication date: 11-Apr-2021
https://dl.acm.org/doi/10.1007/978-3-030-73200-4_41
Anjomshoae SFrämling KNajjar A(2019)Explanations of Black-Box Model Predictions by Contextual Importance and UtilityExplainable, Transparent Autonomous Agents and Multi-Agent Systems10.1007/978-3-030-30391-4_6(95-109)Online publication date: 13-May-2019
https://dl.acm.org/doi/10.1007/978-3-030-30391-4_6
Abeyrathna KGranmo OZhang XGoodwin M(2019)A Scheme for Continuous Input to the Tsetlin Machine with Applications to Forecasting Disease OutbreaksAdvances and Trends in Artificial Intelligence. From Theory to Practice10.1007/978-3-030-22999-3_49(564-578)Online publication date: 9-Jul-2019
https://dl.acm.org/doi/10.1007/978-3-030-22999-3_49
Yang HRudin CSeltzer M(2017)Scalable Bayesian rule listsProceedings of the 34th International Conference on Machine Learning - Volume 7010.5555/3305890.3306086(3921-3930)Online publication date: 6-Aug-2017
https://dl.acm.org/doi/10.5555/3305890.3306086
Angelino ELarus-Stone NAlabi DSeltzer MRudin C(2017)Learning certifiably optimal rule lists for categorical dataThe Journal of Machine Learning Research10.5555/3122009.329041918:1(8753-8830)Online publication date: 1-Jan-2017
https://dl.acm.org/doi/10.5555/3122009.3290419
Wang TRudin CDoshi-Velez FLiu YKlampfl EMacNeille P(2017)A Bayesian framework for learning rule sets for interpretable classificationThe Journal of Machine Learning Research10.5555/3122009.317681418:1(2357-2393)Online publication date: 1-Jan-2017
https://dl.acm.org/doi/10.5555/3122009.3176814
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Figures

Tables

Media

View Issue’s Table of Contents