Ranking interesting subgroups

Author:

Stefan Rueping

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

Pages 913 - 920

https://doi.org/10.1145/1553374.1553491

Published: 14 June 2009

Abstract

Subgroup discovery is the task of identifying the top k patterns in a database with the most significant deviation in the distribution of a target attribute Y. It is a popular approach for identifying interesting patterns in data because it combines statistical significance with an understandable representation of patterns as logical formulas. However, some subgroups, even if they are statistically highly significant, are often not interesting to the user. We present an approach, based on work on ranking Support Vector Machines, that ranks subgroups with respect to the user's concept of interestingness and thereby finds more interesting subgroups. This approach can significantly increase the quality of the subgroups.
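To make the task definition concrete, the following is a minimal sketch of top-k subgroup discovery on a toy table. It is not the paper's implementation: the quality function used here, weighted relative accuracy (WRAcc), is one commonly used measure chosen as an assumption, and the function names, the `max_depth` parameter, and the toy data are all hypothetical. Subgroups are represented as conjunctions of `attribute = value` conditions, and the search is a plain exhaustive enumeration.

```python
from itertools import combinations


def wracc(rows, target, conditions):
    """Weighted relative accuracy: coverage * (subgroup rate - overall rate).

    Assumed quality function for illustration; `target` must hold 0/1 values.
    """
    n = len(rows)
    covered = [r for r in rows if all(r[a] == v for a, v in conditions)]
    if n == 0 or not covered:
        return 0.0
    overall_rate = sum(r[target] for r in rows) / n
    subgroup_rate = sum(r[target] for r in covered) / len(covered)
    return (len(covered) / n) * (subgroup_rate - overall_rate)


def top_k_subgroups(rows, target, k=3, max_depth=2):
    """Score every conjunction of up to `max_depth` conditions, return the best k."""
    # All attribute=value selectors that occur in the data (target excluded).
    selectors = sorted(
        {(a, r[a]) for r in rows for a in r if a != target}, key=repr
    )
    scored = []
    for depth in range(1, max_depth + 1):
        for conds in combinations(selectors, depth):
            # Skip conjunctions that constrain the same attribute twice.
            if len({a for a, _ in conds}) < depth:
                continue
            scored.append((wracc(rows, target, conds), conds))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical toy data: the target attribute Y is "ill".
rows = [
    {"smoker": "yes", "age": "old", "ill": 1},
    {"smoker": "yes", "age": "young", "ill": 1},
    {"smoker": "yes", "age": "old", "ill": 1},
    {"smoker": "no", "age": "old", "ill": 0},
    {"smoker": "no", "age": "young", "ill": 0},
    {"smoker": "no", "age": "young", "ill": 0},
]

best_quality, best_conditions = top_k_subgroups(rows, "ill", k=3)[0]
print(best_conditions, best_quality)
```

On this toy table the best subgroup is `smoker = yes`, which covers half of the rows and raises the rate of the target from 0.5 to 1.0. A purely statistical ranking like this one is what the abstract argues can disagree with the user's own notion of interestingness.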

Index Terms

Ranking interesting subgroups
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
  2. Modeling and simulation
    1. Model development and analysis
      1. Model verification and validation
      2. Modeling methodologies
2. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Statistical graphics

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

June 2009

1331 pages

ISBN:9781605585161

DOI:10.1145/1553374

General Chair: Andrea Danyluk (Williams College)

Program Chairs: Léon Bottou (NEC Laboratories America), Michael Littman (Rutgers University)

Copyright © 2009 by the author(s)/owner(s).

Sponsors

NSF
Microsoft Research
MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Sixth Framework Programme

Conference

ICML '09

Sponsor:

Microsoft Research

ICML '09: The 26th Annual International Conference on Machine Learning held in conjunction with the 2007 International Conference on Inductive Logic Programming

June 14 - 18, 2009

Montreal, Quebec, Canada

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

