
DOI: 10.1145/2939672.2939874 (research article, KDD conference proceedings)

Interpretable Decision Sets: A Joint Framework for Description and Prediction

Published: 13 August 2016

Abstract

One of the most important obstacles to deploying predictive models is that humans do not understand or trust them. Knowing which variables matter to a model's prediction, and how they are combined, can be very powerful in helping people understand and trust automated decision-making systems.
Here we propose interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are simple, concise, and easily interpretable. We formalize decision set learning through an objective function that simultaneously optimizes accuracy and interpretability of the rules. In particular, our approach learns short, accurate, and non-overlapping rules that cover the whole feature space and pay attention to small but important classes. Moreover, we prove that our objective is a non-monotone submodular function, which we efficiently optimize to find a near-optimal set of rules.
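Concretely, a decision set of the kind described above can be sketched as a classifier over independent if-then rules. This is an illustrative sketch only, not the authors' learned model: the example rules, the accuracy-based tie-breaking between overlapping rules, and the default label for uncovered points are all assumptions made here for illustration.

```python
# Minimal sketch of a decision set classifier (illustrative; not the
# authors' implementation). Each rule is an independent if-then pair:
# a predicate over a feature dict plus a class label. Assumptions:
# overlapping rules are tie-broken by a per-rule accuracy score, and
# uncovered points fall back to a default class.

class DecisionSet:
    def __init__(self, rules, default_label):
        # rules: list of (predicate, label, accuracy) triples
        self.rules = rules
        self.default_label = default_label

    def predict(self, x):
        # Every rule is applied independently; collect all that fire on x.
        matches = [(acc, label) for pred, label, acc in self.rules if pred(x)]
        if not matches:
            return self.default_label
        # Tie-break overlapping rules by accuracy (an assumption here).
        return max(matches)[1]

# Hypothetical rules for illustration.
rules = [
    (lambda x: x["age"] > 50 and x["bmi"] > 30, "high-risk", 0.85),
    (lambda x: x["age"] <= 30, "low-risk", 0.90),
]
model = DecisionSet(rules, default_label="low-risk")
print(model.predict({"age": 55, "bmi": 32}))  # -> high-risk
```

Because each rule stands alone, a prediction can be explained by showing just the single rule that fired, which is the source of the interpretability claim.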
Experiments show that interpretable decision sets are as accurate at classification as state-of-the-art machine learning techniques. They are also three times smaller on average than rule-based models learned by other methods. Finally, results of a user study show that people are able to answer multiple-choice questions about the decision boundaries of interpretable decision sets and write descriptions of classes based on them faster and more accurately than with other rule-based models that were designed for interpretability. Overall, our framework provides a new approach to interpretable machine learning that balances accuracy, interpretability, and computational efficiency.
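The near-optimality guarantee rests on maximizing a non-monotone submodular set function, for which add/delete local search of the kind analyzed by Feige et al. gives constant-factor approximations. The sketch below is a generic first-improvement local search; the toy objective (set coverage minus a size penalty, which is submodular and non-monotone) is an illustrative assumption, not the paper's actual objective over candidate rules.

```python
# Illustrative local search for maximizing a non-monotone submodular set
# function f over a finite ground set (in the paper's setting, the ground
# set would be candidate rules). Not the authors' exact procedure.

def local_search_maximize(f, ground_set, eps=1e-9):
    current = set()
    improved = True
    while improved:
        improved = False
        # Add any element that strictly improves the objective.
        for e in ground_set - current:
            if f(current | {e}) > f(current) + eps:
                current = current | {e}
                improved = True
        # Delete any element that strictly improves the objective.
        for e in set(current):
            if f(current - {e}) > f(current) + eps:
                current = current - {e}
                improved = True
    return current

# Toy non-monotone submodular objective: coverage minus a size penalty.
universe_of = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d"}}

def f(S):
    covered = set().union(*(universe_of[e] for e in S)) if S else set()
    return len(covered) - 0.6 * len(S)

best = local_search_maximize(f, set(universe_of))
print(best, f(best))
```

The size penalty makes the objective non-monotone (adding a redundant element hurts), which mirrors how the paper's objective trades accuracy against the number and length of rules.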




    Published In

    KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2016
    2176 pages
ISBN: 978-1-4503-4232-2
DOI: 10.1145/2939672
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. classification
    2. decision sets
    3. interpretable machine learning
    4. submodularity

    Qualifiers

    • Research-article

    Conference

    KDD '16

    Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Cited By

• Adversarial Examples on XAI-Enabled DT for Smart Healthcare Systems. Sensors, 24(21):6891, 27 Oct 2024. DOI: 10.3390/s24216891
• Interaction Difference Hypothesis Test for Prediction Models. Machine Learning and Knowledge Extraction, 6(2):1298-1322, 14 Jun 2024. DOI: 10.3390/make6020061
• A Meta Algorithm for Interpretable Ensemble Learning: The League of Experts. Machine Learning and Knowledge Extraction, 6(2):800-826, 9 Apr 2024. DOI: 10.3390/make6020038
• Why Do Tree Ensemble Approximators Not Outperform the Recursive-Rule eXtraction Algorithm? Machine Learning and Knowledge Extraction, 6(1):658-678, 16 Mar 2024. DOI: 10.3390/make6010031
• Explainable Artificial Intelligence in Quantifying Breast Cancer Factors: Saudi Arabia Context. Healthcare, 12(10):1025, 15 May 2024. DOI: 10.3390/healthcare12101025
• Online Supplemental Materials for HEX: Human-in-the-Loop Explainability via Deep Reinforcement Learning. SSRN Electronic Journal, 2024. DOI: 10.2139/ssrn.4945516
• Outlier Summarization via Human Interpretable Rules. Proceedings of the VLDB Endowment, 17(7):1591-1604, 30 May 2024. DOI: 10.14778/3654621.3654627
• Discovering Top-k Relevant and Diversified Rules. Proceedings of the ACM on Management of Data, 2(4):1-28, 30 Sep 2024. DOI: 10.1145/3677131
• DeciX: Explain Deep Learning Based Code Generation Applications. Proceedings of the ACM on Software Engineering, 1(FSE):2424-2446, 12 Jul 2024. DOI: 10.1145/3660814
• The AI-DEC: A Card-based Design Method for User-centered AI Explanations. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 1010-1028, 1 Jul 2024. DOI: 10.1145/3643834.3661576
