Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Learning theory analysis for association rules and sequential event prediction

Published: 01 January 2013 Publication History

Abstract

We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called "sequential event prediction." In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer's online shopping cart, given his/her past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the "cold start" problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an "adjusted confidence" measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.

References

[1]
Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proc. 20th Int'l Conf. Very Large Data Bases, pages 487-499. Morgan Kaufmann, 1994.
[2]
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, 1993.
[3]
Martin Anthony. Generalization error bounds for threshold decision lists. Journal of Machine Learning Research, 5:189-217, 2004.
[4]
Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick. Sequential PAttern Mining using a bitmap representation. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[5]
Kevin Bache and Moshe Lichman. UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/ml.
[6]
André Berchtold and Adrian E. Raftery. The mixture transition distribution model for high-order markov chains and non-gaussian time series. Statistical Science, 17(3):328-356, 2002.
[7]
Olivier Bousquet and André Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499-526, 2002.
[8]
John S. Breese, David Heckerman, and Carl Myers Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proc. Uncertainty in Artificial Intelligence, pages 43-52, 1998.
[9]
Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman, and Cheng Yang. Finding interesting associations without support pruning. IEEE Trans. Knowl. Data Eng., 13(1):64-78, 2001.
[10]
Michelle Keim Condliff, David D. Lewis, David Madigan, and Christian Posse. Bayesian mixed-effects models for recommender systems. In ACM SIGIR Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
[11]
Luc P. Devroye and Terry J. Wagner. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 25(5):601-604, 1979.
[12]
William DuMouchel and Daryl Pregibon. Empirical bayes screening for multi-item associations. In Proc. ACM SIGKDD Int'l Conf. on Knowl. Discovery and Data Mining, pages 67-76, 2001.
[13]
Jerome H. Friedman and Bogdan E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916-954, 2008.
[14]
Liqiang Geng and Howard J. Hamilton. Choosing the right lens: Finding what is interesting in data mining. In Quality Measures in Data Mining, pages 3-24. Springer, 2007.
[15]
Xiangji Huang, Aijun An, Nick Cercone, and Gary Promhouse. Discovery of interesting association rules from Livelink web log data. In Proc. IEEE Int'l Conf. on Data Mining, 2002.
[16]
Peter Jackson. Introduction to Expert Systems (Third Edition). Addison-Wesley, 1998.
[17]
Stéphanie Jacquemont, François Jacquenet, and Marc Sebban. A lower bound on the sample size needed to perform a significant frequent pattern mining task. Pattern Recogn. Lett., 30:960-967, August 2009.
[18]
Xiang-Rong Jiang and Le Gruenwald. Microarray gene expression data association rules mining based on BSC-tree and FIS-tree. Data & Knowl. Eng., 53(1):3-29, 2005.
[19]
Norman Lloyd Johnson, Adrienne W. Kemp, and Samuel Kotz. Univariate Discrete Distributions. John Wiley & Sons, August 2005.
[20]
Yun Sing Koh. Mining non-coincidental rules without a user defined support threshold. In Advances in Knowl. Discovery and Data Mining, 12th Pacific-Asia Conf., pages 910-915, 2008.
[21]
Ron Kohavi, Llew Mason, Rajesh Parekh, and Zijian Zheng. Lessons and challenges from mining retail e-commerce data. Machine Learning, 57(1-2):83-113, 2004.
[22]
Richard D. Lawrence, George S. Almasi, Viveros Kotlyar, Marisa S. Viveros, and Sastry S. Duri. Personalization of supermarket product recommendations. Data Mining and Knowledge Discovery, 5(1-2):11-32, 2001.
[23]
Benjamin Letham. Similarity-weighted association rules for a name recommender system. In Proceedings of the ECML/PKDD Discovery Challenge Workshop, 2013.
[24]
Benjamin Letham, Cynthia Rudin, and Katherine A. Heller. Growing a list. Data Mining and Knowledge Discovery, 27(3):372-395, 2013a.
[25]
Benjamin Letham, Cynthia Rudin, and David Madigan. Sequential event prediction. Machine Learning, 93:357-380, 2013b.
[26]
Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. An interpretable stroke prediction model using rules and bayesian analysis. In Proceedings of AAAI Late Breaking Track, 2013c.
[27]
Jinyan Li, Xiuzhen Zhang, Guozho Dong, Kotagiri Ramamohanarao, and Qun Sun. Efficient mining of high confidence association rules without support thresholds. In Proc. Principles of Data Mining and Knowledge Discovery, pages 406-411, 1999.
[28]
Wenmin Li, Jiawei Han, and Jian Pei. CMAR: Accurate and efficient classification based on multiple class-association rules. IEEE International Conference on Data Mining, pages 369-376, 2001.
[29]
Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery, 6(1):83-105, 2002.
[30]
Bing Liu, Wynne Hsu, and Yiming Ma. Integrating classification and association rule mining. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, 1998.
[31]
David Madigan, Krzysztof Mosurski, and Russell G Almond. Graphical explanation in belief networks. Journal of Computational and Graphical Statistics, 6:160-181, 1997.
[32]
Mario Marchand and Marina Sokolova. Learning with decision lists of data-dependent features. Journal of Machine Learning Research, 2005.
[33]
Tyler H. McCormick, Cynthia Rudin, and David Madigan. Bayesian hierarchical modeling for predicting medical conditions. The Annals of Applied Statistics, 6(2):652-668, 2012.
[34]
Ken McGarry. A survey of interestingness measures for knowledge discovery. The Knowledge Engineering Review, 20:39-61, 2005.
[35]
Grzegorz A. Rempala. Asymptotic factorial powers expansions for binomial and negative binomial reciprocals. In Proceedings of the American Mathematical Society, volume 132, pages 261-272, 2003.
[36]
Ronald L. Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
[37]
William H. Rogers and Terry J. Wagner. A finite sample distribution-free performance bound for local discrimination rules. Annals of Statistics, 6(3):506-514, 1978.
[38]
V. Romanovsky. Note on the moments of a binomial (p+q)n about its mean. Biometrika, 15: 410-412, 1923.
[39]
Cynthia Rudin, Benjamin Letham, Ansaf Salleb-Aouissi, Eugene Kogan, and David Madigan. A framework for supervised learning with association rules. In Proceedings of the 24th Annual Conference on Learning Theory, 2011.
[40]
Xiaotong Shen and Lifeng Wang. Generalization error for multi-class margin classification. Electronic Journal of Statistics, pages 307-330, 2007.
[41]
György J. Simon, Vipin Kumar, and Peter W. Li. A simple statistical model and association rule filtering for classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 823-831, 2011.
[42]
Pang N. Tan, Vipin Kumar, and Jaideep Srivastava. Selecting the right interestingness measure for association patterns. In Proc. ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, 2002.
[43]
Fadi Thabtah. A review of associative classification mining. The Knowledge Engineering Review, 22:37-65, March 2007.
[44]
Koen Vanhoof and Benoît Depaire. Structure of association rule classifiers: a review. In Proceedings of the International Conference on Intelligent Systems and Knowledge Engineering, pages 9-12, 2010.
[45]
Vladimir Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988-999, September 1999.
[46]
Ke Wang, Yu He, David Wai-Lok Cheung, and Francis Y. L. Chin. Mining confident rules without support requirement. In Proc. Conference on Information and Knowledge Management, pages 89-96, 2001.
[47]
Xiaoxin Yin and Jiawei Han. CPAR: Classification based on predictive association rules. In Proceedings of the 2003 SIAM International Conference on Data Mining, pages 331-335, 2003.

Cited By

View all
  • (2024)Discovering Top-k Relevant and Diversified RulesProceedings of the ACM on Management of Data10.1145/36771312:4(1-28)Online publication date: 30-Sep-2024
  • (2023)Discovering Top-k Rules using Subjective and Objective CriteriaProceedings of the ACM on Management of Data10.1145/35889241:1(1-29)Online publication date: 30-May-2023
  • (2022)An Adaptive Framework for Confidence-constraint Rule Set Learning Algorithm in Large DatasetProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557088(3252-3261)Online publication date: 17-Oct-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The Journal of Machine Learning Research
The Journal of Machine Learning Research  Volume 14, Issue 1
January 2013
3717 pages
ISSN:1532-4435
EISSN:1533-7928
Issue’s Table of Contents

Publisher

JMLR.org

Publication History

Published: 01 January 2013
Published in JMLR Volume 14, Issue 1

Author Tags

  1. algorithmic stability
  2. association rules
  3. associative classification
  4. sequence prediction
  5. statistical learning theory

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)30
  • Downloads (Last 6 weeks)8
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Discovering Top-k Relevant and Diversified RulesProceedings of the ACM on Management of Data10.1145/36771312:4(1-28)Online publication date: 30-Sep-2024
  • (2023)Discovering Top-k Rules using Subjective and Objective CriteriaProceedings of the ACM on Management of Data10.1145/35889241:1(1-29)Online publication date: 30-May-2023
  • (2022)An Adaptive Framework for Confidence-constraint Rule Set Learning Algorithm in Large DatasetProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557088(3252-3261)Online publication date: 17-Oct-2022
  • (2021)Explainable zero-shot learning via attentive graph convolutional network and knowledge graphsSemantic Web10.3233/SW-21043512:5(741-765)Online publication date: 1-Jan-2021
  • (2021)Constraint-Adaptive Rule Mining in Large DatabasesDatabase Systems for Advanced Applications10.1007/978-3-030-73200-4_41(579-591)Online publication date: 11-Apr-2021
  • (2019)Explanations of Black-Box Model Predictions by Contextual Importance and UtilityExplainable, Transparent Autonomous Agents and Multi-Agent Systems10.1007/978-3-030-30391-4_6(95-109)Online publication date: 13-May-2019
  • (2019)A Scheme for Continuous Input to the Tsetlin Machine with Applications to Forecasting Disease OutbreaksAdvances and Trends in Artificial Intelligence. From Theory to Practice10.1007/978-3-030-22999-3_49(564-578)Online publication date: 9-Jul-2019
  • (2017)Scalable Bayesian rule listsProceedings of the 34th International Conference on Machine Learning - Volume 7010.5555/3305890.3306086(3921-3930)Online publication date: 6-Aug-2017
  • (2017)Learning certifiably optimal rule lists for categorical dataThe Journal of Machine Learning Research10.5555/3122009.329041918:1(8753-8830)Online publication date: 1-Jan-2017
  • (2017)A Bayesian framework for learning rule sets for interpretable classificationThe Journal of Machine Learning Research10.5555/3122009.317681418:1(2357-2393)Online publication date: 1-Jan-2017
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media