DOI: 10.1145/2532443.2532461
research-article
Open access

Empirical studies on feature selection for software fault prediction

Published: 23 October 2013

Abstract

Classification-based software fault prediction methods aim to classify modules as either fault-prone or non-fault-prone. Feature selection is a preprocessing step used to improve data quality. However, most previous research focuses on feature relevance analysis, and little work addresses feature redundancy analysis. We therefore propose a two-stage feature selection framework to address this gap. In the feature relevance analysis phase, we adopt three different relevance measures to obtain the relevant feature subset. In the feature redundancy analysis phase, we then use a cluster-based method to eliminate redundant features. To verify the effectiveness of the proposed framework, we choose typical real-world software projects, including Eclipse projects and the NASA software project KC1. The empirical results show the effectiveness of the proposed framework.
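
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the three relevance measures nor the clustering method, so this sketch assumes absolute Pearson correlation as a stand-in for both relevance and redundancy, and a simple greedy grouping in place of the cluster-based step. The function names, thresholds, and toy module metrics are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def two_stage_select(features, labels, relevance_min=0.3, redundancy_min=0.9):
    """Return feature names kept after relevance and redundancy analysis.

    features: dict mapping feature name -> list of values, one per module.
    labels:   list of 0/1 fault labels, one per module.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Stage 1 (relevance): score each feature against the class label and
    # keep those above the threshold, most relevant first.
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    relevant = sorted(
        (name for name in relevance if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Stage 2 (redundancy): greedy grouping. A feature that is highly
    # correlated with an already kept feature joins that feature's group
    # and is dropped, so each group is represented by its most relevant member.
    selected = []
    for name in relevant:
        if all(
            abs(pearson(features[name], features[kept])) < redundancy_min
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy data: eight modules with four hypothetical static code metrics.
labels = [0, 0, 0, 1, 1, 1, 0, 1]
features = {
    "loc": [10, 20, 15, 80, 90, 70, 25, 85],
    "statements": [9, 19, 14, 78, 91, 69, 24, 83],  # near-duplicate of loc
    "complexity": [1, 5, 2, 4, 9, 3, 2, 8],
    "comment_ratio": [0.2, 0.3, 0.2, 0.3, 0.2, 0.3, 0.3, 0.2],  # unrelated to labels
}

print(two_stage_select(features, labels))
```

On this toy data, `comment_ratio` is discarded in the relevance stage, and `statements` is discarded in the redundancy stage because it is almost perfectly correlated with the more relevant `loc`, leaving `loc` and `complexity`.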




Information & Contributors

Information

Published In

Internetware '13: Proceedings of the 5th Asia-Pacific Symposium on Internetware
October 2013
211 pages
ISBN:9781450323697
DOI:10.1145/2532443
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • NJU: Nanjing University
  • CCF: China Computer Federation
  • Chinese Academy of Sciences

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. feature selection
  2. redundancy analysis
  3. relevance analysis
  4. software fault prediction

Qualifiers

  • Research-article

Funding Sources

  • University Natural Science Research Project of Jiangsu Province
  • Nantong Application Research Plan
  • National Natural Science Foundation of China
  • Open Project of State Key Laboratory for Novel Software Technology at Nanjing University

Conference

Internetware '13
Sponsor:
  • NJU
  • CCF

Acceptance Rates

Internetware '13 Paper Acceptance Rate 15 of 50 submissions, 30%;
Overall Acceptance Rate 55 of 111 submissions, 50%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 86
  • Downloads (Last 6 weeks): 15
Reflects downloads up to 21 Nov 2024


Cited By

  • (2022) Data quality issues in software fault prediction: a systematic literature review. Artificial Intelligence Review 56:8 (7839-7908). DOI: 10.1007/s10462-022-10371-6. Online publication date: 21-Dec-2022
  • (2019) A fuzzy-filtered neuro-fuzzy framework for software fault prediction for inter-version and inter-project evaluation. Applied Soft Computing 77 (696-713). DOI: 10.1016/j.asoc.2019.02.008. Online publication date: Apr-2019
  • (2018) Software Bug Prediction Prototype Using Bayesian Network Classifier: A Comprehensive Model. Procedia Computer Science 132 (1412-1421). DOI: 10.1016/j.procs.2018.05.071. Online publication date: 2018
  • (2018) An ANN Based Approach for Software Fault Prediction Using Object Oriented Metrics. Advanced Informatics for Computing Research (341-354). DOI: 10.1007/978-981-13-3140-4_31. Online publication date: 12-Dec-2018
  • (2016) Empirical Studies of a Two-Stage Data Preprocessing Approach for Software Fault Prediction. IEEE Transactions on Reliability 65:1 (38-53). DOI: 10.1109/TR.2015.2461676. Online publication date: Mar-2016
  • (2016) An Investigation of Essential Topics on Software Fault-Proneness Prediction. 2016 International Symposium on System and Software Reliability (ISSSR) (37-46). DOI: 10.1109/ISSSR.2016.016. Online publication date: Oct-2016
  • (2016) Iterative software fault prediction with a hybrid approach. Applied Soft Computing 49:C (1020-1033). DOI: 10.1016/j.asoc.2016.08.025. Online publication date: 1-Dec-2016
  • (2014) A Two-Stage Data Preprocessing Approach for Software Fault Prediction. 2014 Eighth International Conference on Software Security and Reliability (20-29). DOI: 10.1109/SERE.2014.15. Online publication date: Jun-2014
