
DOI: 10.1145/1835804.1835849

Feature selection for support vector regression using probabilistic prediction

Published: 25 July 2010

Abstract

This paper presents a novel wrapper-based feature selection method for Support Vector Regression (SVR) that uses its probabilistic predictions. The method computes the importance of a feature by aggregating, over the feature space, the difference between the conditional density functions of the SVR prediction with and without that feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure under these approximations is evaluated on both artificial and real-world problems, in comparison with several existing feature selection methods for SVR. Experimental results show that the proposed method performs at least as well as, and generally better than, the existing methods, with a notable advantage when the data set is sparse.
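
The abstract only outlines the importance measure, so the following is a minimal illustrative sketch of the general idea, not the paper's algorithm: an SVR model is given a Laplace-style conditional density over its predictions (in the spirit of Lin and Weng's probabilistic SVR outputs), and a feature's importance is approximated by how much that density changes when the feature is randomly permuted. The scikit-learn SVR, the Laplace noise model, the permutation step, and the function names laplace_density and svr_feature_importance are all assumptions made here for illustration; they are not the two approximations actually proposed in the paper.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def laplace_density(y, mu, b):
    # Laplace density of target y given predicted location mu and scale b.
    return np.exp(-np.abs(y - mu) / b) / (2.0 * b)

def svr_feature_importance(X, y, n_permutations=10, random_state=0):
    # Hypothetical permutation-based stand-in for the density-difference
    # importance measure described in the abstract.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(random_state)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_tr, y_tr)

    # Treat p(y | x) as Laplace(f(x), b), with b fitted to validation residuals.
    mu_val = model.predict(X_val)
    b = np.mean(np.abs(y_val - mu_val)) + 1e-12
    base_density = laplace_density(y_val, mu_val, b)

    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        diffs = []
        for _ in range(n_permutations):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            # Aggregate how much the conditional density of the prediction
            # changes when feature j's link to the target is destroyed.
            perm_density = laplace_density(y_val, model.predict(X_perm), b)
            diffs.append(np.mean(np.abs(base_density - perm_density)))
        importances[j] = np.mean(diffs)
    return importances  # larger score => feature ranked as more important

A call such as svr_feature_importance(X, y) yields one score per column of X, and features can then be ranked and the lowest-scoring ones discarded, which is the usual wrapper-style ranking workflow; again, this is only a hedged illustration of the idea, not the method evaluated in the paper.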

Supplementary Material

JPG File (kdd2010_yang_fssv_01.jpg)
MOV File (kdd2010_yang_fssv_01.mov)





Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN:9781450300551
DOI:10.1145/1835804
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature ranking
  2. feature selection
  3. probabilistic predictions
  4. random permutation
  5. support vector regression

Qualifiers

  • Research-article

Conference

KDD '10

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


Cited By

  • (2018) A feature selection approach based on sensitivity of RBFNNs. Neurocomputing, 275(C), 2200-2208. DOI: 10.1016/j.neucom.2017.10.055. Online publication date: 31-Jan-2018.
  • (2015) Computing confidence and prediction intervals of industrial equipment degradation by bootstrapped support vector regression. Reliability Engineering & System Safety, 137, 120-128. DOI: 10.1016/j.ress.2015.01.007. Online publication date: May-2015.
  • (2014) Multiple perceptual neighborhoods-based feature construction for pattern classification. Neurocomputing, 142, 499-507. DOI: 10.1016/j.neucom.2014.04.007. Online publication date: 1-Oct-2014.
  • (2013) Affective Recommendation of Movies Based on Selected Connotative Features. IEEE Transactions on Circuits and Systems for Video Technology, 23(4), 636-647. DOI: 10.1109/TCSVT.2012.2211935. Online publication date: 1-Apr-2013.
  • (2013) Automatic web services classification based on rough set theory. Journal of Central South University, 20(10), 2708-2714. DOI: 10.1007/s11771-013-1787-1. Online publication date: 22-Oct-2013.
