Abstract
In this paper, we propose an efficient and robust approach to semi-supervised feature selection based on the constrained Laplacian score. The main weakness of this score lies in the choice of the scant supervision information, represented by pairwise constraints: constraints have been shown to carry noise that can deteriorate learning performance. In this work, we mitigate the negative effects of the constraint set by varying its sources. This is achieved by an ensemble technique that combines resampling of the data (bagging) with a random subspace strategy. Experiments on high-dimensional datasets validate the proposed approach and compare it with other representative feature selection methods.
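The ensemble scheme described above can be sketched in a few lines: on each round, draw a bootstrap sample of instances (bagging) and a random subset of features (random subspace), score the features of that subspace, and average the per-feature scores across rounds. The sketch below is illustrative only: the `variance_score` function is a hypothetical stand-in for the paper's constrained Laplacian score, and the round count and subspace fraction are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def variance_score(X):
    """Placeholder per-feature relevance score (higher = more relevant).
    The paper's constrained Laplacian score would replace this."""
    return X.var(axis=0)

def ensemble_feature_scores(X, n_rounds=20, subspace_frac=0.5):
    """Score features by bagging over instances + random feature subspaces.

    Each round scores a random subspace on a bootstrap sample; a feature's
    final score is its average over the rounds in which it was drawn.
    """
    n, d = X.shape
    k = max(1, int(subspace_frac * d))
    sums = np.zeros(d)
    counts = np.zeros(d)
    for _ in range(n_rounds):
        rows = rng.integers(0, n, size=n)             # bootstrap resample (bagging)
        feats = rng.choice(d, size=k, replace=False)  # random subspace
        sums[feats] += variance_score(X[np.ix_(rows, feats)])
        counts[feats] += 1
    counts[counts == 0] = 1  # guard against features never drawn
    return sums / counts

# Toy usage: feature 0 varies strongly, feature 1 is nearly constant.
X = np.column_stack([
    rng.normal(0, 3.0, 200),
    rng.normal(0, 0.1, 200),
    rng.normal(0, 1.0, 200),
])
scores = ensemble_feature_scores(X)
```

Varying both the instance sample and the feature subspace per round is what diversifies the constraint/data "sources" each base scorer sees, which is the mechanism the abstract relies on to dampen the effect of noisy constraints.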
References
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511
Barkia H, Elghazel H, Aussem A (2011) Semi-supervised feature importance evaluation with ensemble learning. In: IEEE ICDM, pp 31–40
Benabdeslem K, Hindawi M (2011) Constrained Laplacian score for semi-supervised feature selection. In: Proceedings of ECML-PKDD conference, pp 204–218
Benabdeslem K, Hindawi M (2014) Efficient semi-supervised feature selection: constraint, relevance and redundancy. IEEE Trans Knowl Data Eng 26(5):1131–1143
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. McGraw-Hill Higher Education, New York
Davidson I, Wagstaff K, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: Proceedings of ECML/PKDD
Demsar J (2006) Statistical comparisons of classifiers over multiple datasets. J Mach Learn Res 7:1–30
Dietterich T (2000) Ensemble methods in machine learning. In: First international workshop on multiple classifier systems, pp 1–15
Duda RO, Hart PE, Stork DG (2000) Pattern classification. Wiley Interscience, New York
Dy J, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889
Elghazel H, Aussem A (2015) Unsupervised feature selection with ensemble learning. Mach Learn 98(1–2):157–180
Frank A, Asuncion A (2010) UCI machine learning repository. Available at http://archive.ics.uci.edu/ml
Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: 13th international conference on machine learning, pp 276–280
Golub T, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Guyon I, Gunn S, Nikravesh M, Zadeh LA (2006) Feature extraction: foundations and applications. Series studies in fuzziness and soft computing. Physica-Verlag, Springer, Berlin
Hady MFA, Schwenker F (2010) Combining committee-based semi-supervised learning and active learning. J Comput Sci Technol 25(4):681–698
He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Adv Neural Inf Process Syst 17:507–514
Hindawi M, Allab K, Benabdeslem K (2011) Constraint selection based semi-supervised feature selection. In: Proceedings of IEEE ICDM, pp 1080–1085
Hindawi M, Elghazel H, Benabdeslem K (2013) Efficient semi-supervised feature selection by an ensemble approach. In: COPEM@ECML/PKDD. International workshop on complex machine learning problems with ensemble methods, pp 41–55
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
Hong Y, Kwong S, Chang Y, Ren Q (2008) Consensus unsupervised feature ranking from multiple views. Pattern Recognit Lett 29(5):595–602
Hong Y, Kwong S, Chang Y, Ren Q (2008) Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognit 41(9):2742–2756
Kalakech M, Biela P, Macaire L, Hamad D (2011) Constraint scores for semi-supervised feature selection: a comparative study. Pattern Recognit Lett 32(5):656–665
Kohonen T (2001) Self organizing map. Springer, Berlin
Kuncheva LI (2007) A stability index for feature selection. In: Artificial intelligence and applications, pp 421–427
Li M, Zhou ZH (2007) Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans Syst Man Cybern 37(6):1088–1098
Saeys Y, Abeel T, de Peer YV (2008) Robust feature selection using ensemble feature selection techniques. In: ECML/PKDD (2), pp 313–325
Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D’Amico AV, Richie JP (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Sun D, Zhang D (2010) Bagging constraint score for feature selection with pairwise constraints. Pattern Recognit 43:2106–2118
Sun Y, Todorovic S, Goodison S (2010) Local learning based feature selection for high dimensional data analysis. IEEE Trans Pattern Anal Mach Intell 32(9):1610–1626
Topchy A, Jain A, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881
Yaslan Y, Cataltepe Z (2010) Co-training with relevant random subspaces. Neurocomputing 73(10–12):1652–1661
Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of international conference on machine learning, pp 856–863
Zhang D, Chen S, Zhou Z (2008) Constraint score: a new filter method for feature selection with pairwise constraints. Pattern Recognit 41(5):1440–1451
Zhao Z, Liu H (2007) Semi-supervised feature selection via spectral analysis. In: Proceedings of SIAM data mining (SDM), pp 641–646
Zhao Z, Morstatter F, Sharma S, Alelyani S, Anand A (2010) Advancing feature selection research—ASU feature selection repository. TR-10-007
Acknowledgments
We thank anonymous reviewers for their very useful comments and suggestions.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This work is an extension of our idea recently presented at the COPEM@ECML/PKDD'13 workshop [23].
Cite this article
Benabdeslem, K., Elghazel, H. & Hindawi, M. Ensemble constrained Laplacian score for efficient and robust semi-supervised feature selection. Knowl Inf Syst 49, 1161–1185 (2016). https://doi.org/10.1007/s10115-015-0901-0