DOI: 10.1145/1553374.1553491
Research article

Ranking interesting subgroups

Published: 14 June 2009

Abstract

Subgroup discovery is the task of identifying the top k patterns in a database that show the most significant deviation in the distribution of a target attribute Y. It is a popular approach for identifying interesting patterns in data because it combines statistical significance with an understandable representation of patterns as logical formulas. However, some subgroups, even if they are statistically highly significant, are often not interesting to the user. We present an approach, based on work on ranking Support Vector Machines, that ranks subgroups with respect to the user's concept of interestingness and finds more interesting subgroups. This approach can significantly increase the quality of the subgroups.
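The abstract combines two stages: statistical subgroup discovery (the top k patterns by deviation in the distribution of Y) and re-ranking with a ranking SVM trained on the user's notion of interestingness. The Python sketch below illustrates both stages under stated assumptions. It uses a binomial-style quality q = sqrt(n) * (p - p0) as one common choice of deviation measure, exhaustive search over short conjunctions, and a plain linear ranking SVM trained by full-batch subgradient descent on pairwise preferences. All function names, the toy data, the subgroup feature vectors and the preference pairs are hypothetical; this page does not specify the paper's quality function, subgroup features or training procedure, so this is an illustration rather than the authors' implementation.

```python
import itertools
import math


def covers(description, row):
    """True if the row satisfies every (attribute, value) test in the conjunction."""
    return all(row[attr] == value for attr, value in description)


def binomial_quality(description, rows, target="Y"):
    """Assumed quality function q = sqrt(n) * (p - p0): n is the subgroup size,
    p the share of target = 1 inside the subgroup, p0 the share in all rows."""
    covered = [r for r in rows if covers(description, r)]
    n = len(covered)
    if n == 0:
        return 0.0
    p0 = sum(r[target] for r in rows) / len(rows)
    p = sum(r[target] for r in covered) / n
    return math.sqrt(n) * (p - p0)


def top_k_subgroups(rows, attributes, k, max_len=2, target="Y"):
    """Exhaustively score all conjunctions of up to max_len selectors; keep the best k."""
    selectors = sorted({(a, r[a]) for r in rows for a in attributes})
    candidates = []
    for length in range(1, max_len + 1):
        for combo in itertools.combinations(selectors, length):
            if len({a for a, _ in combo}) == length:  # at most one test per attribute
                candidates.append(combo)
    candidates.sort(key=lambda d: binomial_quality(d, rows, target), reverse=True)
    return candidates[:k]


def train_ranking_svm(features, preferences, c=1.0, lr=0.01, epochs=500):
    """Linear ranking SVM via full-batch subgradient descent on
    0.5 * ||w||^2 + c * sum over pairs (i, j) of max(0, 1 - w . (x_i - x_j)),
    where each pair (i, j) states that item i should rank above item j."""
    w = [0.0] * len(features[0])
    diffs = [[a - b for a, b in zip(features[i], features[j])] for i, j in preferences]
    for _ in range(epochs):
        grad = list(w)  # gradient of the regulariser 0.5 * ||w||^2
        for d in diffs:
            if sum(wk * dk for wk, dk in zip(w, d)) < 1.0:  # margin violated
                grad = [g - c * dk for g, dk in zip(grad, d)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w


def rank(features, w):
    """Item indices sorted by descending learned score w . x."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: two nominal attributes and a binary target Y.
    rows = [
        {"sex": "f", "age": "young", "Y": 1},
        {"sex": "f", "age": "young", "Y": 1},
        {"sex": "f", "age": "old", "Y": 1},
        {"sex": "m", "age": "young", "Y": 0},
        {"sex": "m", "age": "old", "Y": 0},
        {"sex": "m", "age": "old", "Y": 1},
    ]
    print(top_k_subgroups(rows, ["sex", "age"], k=2))
    # → [(('sex', 'f'),), (('age', 'young'), ('sex', 'f'))]

    # Hypothetical feature vectors for four subgroups and user preferences,
    # where (i, j) means "subgroup i is more interesting than subgroup j".
    features = [[3.0, 0.0], [2.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    prefs = [(0, 1), (1, 2), (2, 3), (0, 2), (0, 3), (1, 3)]
    w = train_ranking_svm(features, prefs)
    print(rank(features, w))
```

Exhaustive enumeration is only feasible for tiny attribute spaces; practical subgroup discovery systems prune the search, for example with optimistic estimates. The pairwise formulation turns feedback of the form "subgroup i is more interesting than subgroup j" into ordinary margin constraints on score differences, which is what lets a user's preferences reorder statistically ranked subgroups.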

Information

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICML '09

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Bibliometrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0

Reflects downloads up to 19 Nov 2024.

Cited By

  • (2024) WaveLSea: helping experts interactively explore pattern mining search spaces. Data Mining and Knowledge Discovery 38:4, pp. 2403-2439. DOI: 10.1007/s10618-024-01037-8. Online publication date: 26 May 2024.
  • (2024) Learning to Rank Based on Choquet Integral: Application to Association Rules. Advances in Knowledge Discovery and Data Mining, pp. 313-326. DOI: 10.1007/978-981-97-2242-6_25. Online publication date: 25 Apr 2024.
  • (2024) Discovering and Ranking Urban Social Clusters Out of Streaming Social Media Datasets. Concurrency and Computation: Practice and Experience. DOI: 10.1002/cpe.8314. Online publication date: 24 Oct 2024.
  • (2023) Robust and explainable identification of logical fallacies in natural language arguments. Knowledge-Based Systems 266:C. DOI: 10.1016/j.knosys.2023.110418. Online publication date: 22 Apr 2023.
  • (2022) IISD. Knowledge-Based Systems 240:C. DOI: 10.1016/j.knosys.2021.108080. Online publication date: 15 Mar 2022.
  • (2021) Subgroup Preference Neural Network. Sensors 21:18, 6104. DOI: 10.3390/s21186104. Online publication date: 12 Sep 2021.
  • (2020) Preservation of Anomalous Subgroups On Variational Autoencoder Transformed Data. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3627-3631. DOI: 10.1109/ICASSP40776.2020.9054495. Online publication date: May 2020.
  • (2019) User-driven geolocated event detection in social media. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2019.2931340. Online publication date: 2019.
  • (2017) Interactive Pattern Sampling for Characterizing Unlabeled Data. Advances in Intelligent Data Analysis XVI, pp. 99-111. DOI: 10.1007/978-3-319-68765-0_9. Online publication date: 4 Oct 2017.
  • (2017) Two Decades of Pattern Mining: Principles and Methods. Business Intelligence, pp. 59-78. DOI: 10.1007/978-3-319-61164-8_3. Online publication date: 4 Jul 2017.
