Feature selection with dynamic mutual information

Published: 01 July 2009

Abstract

Feature selection plays an important role in data mining and pattern recognition, especially for large-scale data. Over the past years, various metrics have been proposed to measure the relevance between features. Since mutual information is nonlinear and can effectively capture dependencies between features, it is one of the most widely used measures in feature selection, and many promising feature selection algorithms based on mutual information with different parameters have been developed. In this paper, we first introduce a general criterion function for mutual information in feature selectors, which unifies most of the information measures used in previous algorithms. In traditional selectors, mutual information is estimated on the whole sample space; this, however, cannot exactly represent the relevance among features. To cope with this problem, the second purpose of this paper is to propose a new feature selection algorithm based on dynamic mutual information, which is estimated only on unlabeled instances. To verify the effectiveness of our method, several experiments are carried out on sixteen UCI datasets using four typical classifiers. The experimental results indicate that our algorithm achieves better results than other methods in most cases.
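The abstract's key move is that relevance should be re-estimated as selection proceeds: rather than computing the mutual information $I(F;C)=\sum_{f,c}p(f,c)\log\frac{p(f,c)}{p(f)p(c)}$ between a feature $F$ and the class $C$ once over the whole sample space, the dynamic variant recomputes it only on the instances that the already-selected features cannot yet discriminate. The following is a minimal Python sketch of that idea, not the paper's published pseudocode: the helper names and the rule used to decide when an instance counts as recognized (its selected-feature value pattern maps to a single class) are assumptions made for illustration.

```python
import numpy as np
from collections import Counter, defaultdict

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # I(X;Y) = sum_{a,b} p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def dynamic_mi_selection(X, y, k):
    """Greedy feature selection with dynamically re-estimated MI (a sketch).

    After each pick, instances whose values on the selected features already
    map to a single class are treated as recognized and dropped, so the next
    round of MI estimates uses only the still-ambiguous instances.
    """
    n, d = X.shape
    remaining = list(range(n))            # still-unrecognized instances
    selected, candidates = [], set(range(d))
    while len(selected) < k and remaining and candidates:
        sub_y = y[remaining]
        best = max(candidates,
                   key=lambda f: mutual_information(X[remaining, f], sub_y))
        selected.append(best)
        candidates.discard(best)
        # group the remaining instances by their selected-feature patterns
        classes_of = defaultdict(set)
        for i in remaining:
            classes_of[tuple(X[i, selected])].add(y[i])
        # keep only instances whose pattern still covers more than one class
        remaining = [i for i in remaining
                     if len(classes_of[tuple(X[i, selected])]) > 1]
    return selected

# Toy usage: feature 0 perfectly predicts y, so it is picked first; every
# instance is then recognized and selection stops after a single feature.
X = np.array([[0, 1, 0], [0, 1, 1], [1, 0, 0],
              [1, 0, 1], [0, 0, 0], [1, 1, 1]])
y = np.array([0, 0, 1, 1, 0, 1])
print(dynamic_mi_selection(X, y, k=2))    # -> [0]
```

The design point the abstract emphasizes is visible here: once an instance is fully discriminated by the selected features, it no longer influences the relevance estimates of the remaining candidates.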





Published In

Pattern Recognition, Volume 42, Issue 7
July, 2009
430 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Classification
  2. Feature selection
  3. Filter method
  4. Mutual information



Cited By

  • (2024) Gaussian Mutual Information Maximization for Efficient Graph Self-Supervised Learning: Bridging Contrastive-based to Decorrelation-based. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 1612-1621. DOI: 10.1145/3664647.3680682. Online publication date: 28-Oct-2024.
  • (2024) Attribute Reduction Based on Fuzzy Distinguishable Pair Metric Considering Redundancy Upper and Lower Bounds. IEEE Transactions on Fuzzy Systems 32(8), 4364-4375. DOI: 10.1109/TFUZZ.2024.3394709. Online publication date: 1-Aug-2024.
  • (2024) Ensemble effort estimation for novice agile teams. Information and Software Technology 170(C). DOI: 10.1016/j.infsof.2024.107447. Online publication date: 1-Jun-2024.
  • (2024) Feature selection using a sinusoidal sequence combined with mutual information. Engineering Applications of Artificial Intelligence 126(PD). DOI: 10.1016/j.engappai.2023.107168. Online publication date: 27-Feb-2024.
  • (2023) A novel filter feature selection method for text classification. Journal of Information Science 49(1), 59-78. DOI: 10.1177/0165551521991037. Online publication date: 1-Feb-2023.
  • (2022) Feature selection via uncorrelated discriminant sparse regression for multimedia analysis. Multimedia Tools and Applications 82(1), 619-647. DOI: 10.1007/s11042-022-13258-4. Online publication date: 9-Jun-2022.
  • (2022) Locality sensitive hashing with bit selection. Applied Intelligence 52(13), 14724-14738. DOI: 10.1007/s10489-022-03546-9. Online publication date: 1-Oct-2022.
  • (2021) Redundancy Coefficient Gradual Up-weighting-based Mutual Information Feature Selection technique for Crypto-ransomware early detection. Future Generation Computer Systems 115(C), 641-658. DOI: 10.1016/j.future.2020.10.002. Online publication date: 1-Feb-2021.
  • (2021) Solving feature selection problems by combining mutation and crossover operations with the monarch butterfly optimization algorithm. Applied Intelligence 51(6), 4058-4081. DOI: 10.1007/s10489-020-01981-0. Online publication date: 1-Jun-2021.
  • (2021) Novel self-adjusted particle swarm optimization algorithm for feature selection. Computing 103(8), 1569-1597. DOI: 10.1007/s00607-020-00891-w. Online publication date: 1-Aug-2021.
