DOI: 10.1145/1273496.1273600

Supervised feature selection via dependence estimation

Published: 20 June 2007

Abstract

We introduce a framework for filtering features that employs the Hilbert-Schmidt Independence Criterion (HSIC) as a measure of dependence between the features and the labels. The key idea is that good features should maximise such dependence. Feature selection for various supervised learning problems (including classification and regression) is unified under this framework, and the solutions can be approximated using a backward-elimination algorithm. We demonstrate the usefulness of our method on both artificial and real world datasets.
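
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used biased empirical HSIC estimate tr(KHLH)/(m-1)^2, a Gaussian kernel on the features, a linear kernel on the labels, and a simplified greedy loop that discards one feature per round. The names `rbf_kernel`, `hsic`, `backward_elimination`, `sigma`, and `n_keep` are hypothetical and chosen for illustration only.

```python
import numpy as np


def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (m - 1)^2, H the centering matrix."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return float(np.trace(K @ H @ L @ H)) / (m - 1) ** 2


def backward_elimination(X, y, n_keep, sigma=1.0):
    """Greedy sketch: repeatedly drop the feature whose removal leaves the
    highest HSIC between the remaining features and the labels."""
    L = np.outer(y, y)  # linear kernel on the labels (assumption)
    remaining = list(range(X.shape[1]))
    removed = []
    while len(remaining) > n_keep:
        scores = []
        for j in remaining:
            subset = [k for k in remaining if k != j]
            scores.append(hsic(rbf_kernel(X[:, subset], sigma), L))
        drop = remaining[int(np.argmax(scores))]
        remaining.remove(drop)
        removed.append(drop)
    return remaining, removed


if __name__ == "__main__":
    # Synthetic data: the label depends only on features 0 and 1.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = np.sign(X[:, 0] + X[:, 1])
    kept, dropped = backward_elimination(X, y, n_keep=2, sigma=2.0)
    print("kept:", kept, "dropped (in order):", dropped)
```

Each round costs one HSIC evaluation per remaining feature, so a practical variant would likely remove several features per round and re-tune the kernel width as the feature set shrinks.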

    Published In

    ICML '07: Proceedings of the 24th international conference on Machine learning
    June 2007
    1233 pages
    ISBN:9781595937933
    DOI:10.1145/1273496

    Sponsors

    • Machine Learning Journal

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Conference

    ICML '07 & ILP '07

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%
