
DOI: 10.1145/2939672.2939874 (research article, KDD conference proceedings)

Interpretable Decision Sets: A Joint Framework for Description and Prediction

Published: 13 August 2016

Abstract

One of the most important obstacles to deploying predictive models is that humans do not understand or trust them. Knowing which variables matter to a model's prediction, and how they are combined, can be very powerful in helping people understand and trust automated decision-making systems.
Here we propose interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are simple, concise, and easily interpretable. We formalize decision set learning through an objective function that simultaneously optimizes accuracy and interpretability of the rules. In particular, our approach learns short, accurate, and non-overlapping rules that cover the whole feature space and pay attention to small but important classes. Moreover, we prove that our objective is a non-monotone submodular function, which we efficiently optimize to find a near-optimal set of rules.
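Concretely, a decision set of the kind described above can be sketched as a classifier over independent if-then rules. This is an illustrative sketch only, not the authors' learned model: the example rules, the accuracy-based tie-breaking between overlapping rules, and the default label for uncovered points are all assumptions made here for illustration.

```python
# Minimal sketch of a decision set classifier (illustrative; not the
# authors' implementation). Each rule is an independent if-then pair:
# a predicate over a feature dict plus a class label. Assumptions:
# overlapping rules are tie-broken by a per-rule accuracy score, and
# uncovered points fall back to a default class.

class DecisionSet:
    def __init__(self, rules, default_label):
        # rules: list of (predicate, label, accuracy) triples
        self.rules = rules
        self.default_label = default_label

    def predict(self, x):
        # Every rule is applied independently; collect all that fire on x.
        matches = [(acc, label) for pred, label, acc in self.rules if pred(x)]
        if not matches:
            return self.default_label
        # Tie-break overlapping rules by accuracy (an assumption here).
        return max(matches)[1]

# Hypothetical rules for illustration.
rules = [
    (lambda x: x["age"] > 50 and x["bmi"] > 30, "high-risk", 0.85),
    (lambda x: x["age"] <= 30, "low-risk", 0.90),
]
model = DecisionSet(rules, default_label="low-risk")
print(model.predict({"age": 55, "bmi": 32}))  # -> high-risk
```

Because each rule stands alone, a prediction can be explained by showing just the single rule that fired, which is the source of the interpretability claim.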
Experiments show that interpretable decision sets are as accurate at classification as state-of-the-art machine learning techniques. They are also three times smaller on average than rule-based models learned by other methods. Finally, results of a user study show that people are able to answer multiple-choice questions about the decision boundaries of interpretable decision sets and write descriptions of classes based on them faster and more accurately than with other rule-based models that were designed for interpretability. Overall, our framework provides a new approach to interpretable machine learning that balances accuracy, interpretability, and computational efficiency.
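The near-optimality guarantee rests on maximizing a non-monotone submodular set function, for which add/delete local search of the kind analyzed by Feige et al. gives constant-factor approximations. The sketch below is a generic first-improvement local search; the toy objective (set coverage minus a size penalty, which is submodular and non-monotone) is an illustrative assumption, not the paper's actual objective over candidate rules.

```python
# Illustrative local search for maximizing a non-monotone submodular set
# function f over a finite ground set (in the paper's setting, the ground
# set would be candidate rules). Not the authors' exact procedure.

def local_search_maximize(f, ground_set, eps=1e-9):
    current = set()
    improved = True
    while improved:
        improved = False
        # Add any element that strictly improves the objective.
        for e in ground_set - current:
            if f(current | {e}) > f(current) + eps:
                current = current | {e}
                improved = True
        # Delete any element that strictly improves the objective.
        for e in set(current):
            if f(current - {e}) > f(current) + eps:
                current = current - {e}
                improved = True
    return current

# Toy non-monotone submodular objective: coverage minus a size penalty.
universe_of = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d"}}

def f(S):
    covered = set().union(*(universe_of[e] for e in S)) if S else set()
    return len(covered) - 0.6 * len(S)

best = local_search_maximize(f, set(universe_of))
print(best, f(best))
```

The size penalty makes the objective non-monotone (adding a redundant element hurts), which mirrors how the paper's objective trades accuracy against the number and length of rules.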




    Published In

    KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2016
    2176 pages
ISBN: 978-1-4503-4232-2
DOI: 10.1145/2939672
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. classification
    2. decision sets
    3. interpretable machine learning
    4. submodularity

    Qualifiers

    • Research-article

    Conference

    KDD '16

    Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Cited By

• Adversarial Examples on XAI-Enabled DT for Smart Healthcare Systems. Sensors, 24(21):6891, 27 Oct 2024. DOI: 10.3390/s24216891
• Interaction Difference Hypothesis Test for Prediction Models. Machine Learning and Knowledge Extraction, 6(2):1298-1322, 14 Jun 2024. DOI: 10.3390/make6020061
• A Meta Algorithm for Interpretable Ensemble Learning: The League of Experts. Machine Learning and Knowledge Extraction, 6(2):800-826, 9 Apr 2024. DOI: 10.3390/make6020038
• Why Do Tree Ensemble Approximators Not Outperform the Recursive-Rule eXtraction Algorithm? Machine Learning and Knowledge Extraction, 6(1):658-678, 16 Mar 2024. DOI: 10.3390/make6010031
• Explainable Artificial Intelligence in Quantifying Breast Cancer Factors: Saudi Arabia Context. Healthcare, 12(10):1025, 15 May 2024. DOI: 10.3390/healthcare12101025
• Online Supplemental Materials for HEX: Human-in-the-Loop Explainability via Deep Reinforcement Learning. SSRN Electronic Journal, 2024. DOI: 10.2139/ssrn.4945516
• Outlier Summarization via Human Interpretable Rules. Proceedings of the VLDB Endowment, 17(7):1591-1604, 30 May 2024. DOI: 10.14778/3654621.3654627
• Discovering Top-k Relevant and Diversified Rules. Proceedings of the ACM on Management of Data, 2(4):1-28, 30 Sep 2024. DOI: 10.1145/3677131
• DeciX: Explain Deep Learning Based Code Generation Applications. Proceedings of the ACM on Software Engineering, 1(FSE):2424-2446, 12 Jul 2024. DOI: 10.1145/3660814
• The AI-DEC: A Card-based Design Method for User-centered AI Explanations. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 1010-1028, 1 Jul 2024. DOI: 10.1145/3643834.3661576
