DOI: 10.5555/3625834.3625854

Efficient learning of minimax risk classifiers in high dimensions

Published: 31 July 2023

Abstract

High-dimensional data is common in multiple areas, such as health care and genomics, where the number of features can be in the tens of thousands. In such scenarios, the large number of features often leads to inefficient learning. Constraint generation methods have recently enabled efficient learning of L1-regularized support vector machines (SVMs). In this paper, we leverage such methods to obtain an efficient learning algorithm for the recently proposed minimax risk classifiers (MRCs). The proposed iterative algorithm also provides a sequence of worst-case error probabilities and performs feature selection. Experiments on multiple high-dimensional datasets show that the proposed algorithm is efficient in such scenarios. In addition, the worst-case error probability provides useful information about the classifier's performance, and the features selected by the algorithm are competitive with the state of the art.
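The core computational idea the abstract refers to, constraint generation, can be summarized as one loop: solve a reduced problem over a small active set of constraints, scan the full problem for constraints the current solution violates, add the most violated one, and repeat until none remain. The sketch below illustrates that generic loop for a linear program with many constraints; it is not the paper's MRC-specific algorithm, and the function name `constraint_generation`, the initial active-set size, the box bound, and the tolerance are all illustrative assumptions.

```python
# Minimal sketch of constraint generation for a linear program
#     minimize  c^T x   subject to  A x <= b
# with many constraints. This is a generic illustration of the scheme,
# NOT the paper's MRC-specific algorithm; parameter choices below are
# illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def constraint_generation(c, A, b, n_init=10, tol=1e-8, max_iter=100, box=1e3):
    m, n = A.shape
    active = list(range(min(n_init, m)))      # start from a few constraints
    bounds = [(-box, box)] * n                # box bound keeps reduced LPs bounded
    x = np.zeros(n)
    for _ in range(max_iter):
        # Solve the reduced problem restricted to the active constraints.
        res = linprog(c, A_ub=A[active], b_ub=b[active],
                      bounds=bounds, method="highs")
        x = res.x
        # Check the full constraint set and pick the most violated row.
        violation = A @ x - b
        worst = int(np.argmax(violation))
        if violation[worst] <= tol:           # feasible for the full problem,
            return x, active                  # hence optimal for it as well
        active.append(worst)                  # generate (add) that constraint
    return x, active

# Toy usage: 200 random constraints in 5 variables; the solution is
# typically certified using only a small fraction of them.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
b = rng.uniform(1.0, 2.0, size=200)
c = rng.normal(size=5)
x_opt, used = constraint_generation(c, A, b)
print(f"solved with {len(used)} of {A.shape[0]} constraints")
```

For L1-regularized SVMs and, per the abstract, for MRCs, the same scheme is applied to the linear-programming formulation of the learning problem, so each iteration only handles a small active set. A plausible reading of the abstract, hedged since the exact construction is not shown here, is that the features tied to the final active set are the ones the algorithm selects.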


Published In

UAI '23: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
July 2023
2617 pages

Publisher

JMLR.org

