DOI: 10.1145/3239576.3239607
research-article

Feature-Grouping-Based Two Steps Feature Selection Algorithm in Software Defect Prediction

Published: 16 June 2018

Abstract

Many algorithms, including feature selection methods, have been proposed to improve software defect prediction. This paper proposes a feature-grouping-based feature selection algorithm built on a hybrid Wrapper and Filter framework. The algorithm has two steps. In the first step, to remove redundant features, we group the features according to the redundancy between them. Symmetric uncertainty is used as the measure of correlation, and an FCBF-based grouping algorithm is used to group the features. In the second step, a subset of features is selected from each group to form the final feature subset. Many classical methods select a single representative feature from each group. We argue that when a group contains many features, one representative feature is not enough to reflect the information in that group. We therefore require only that at least one feature be selected from each group, and use the PSO algorithm to search randomly within each group. We tested the method on the open-source NASA and PROMISE data sets using three kinds of classifiers. Compared with the other methods tested in this article, our method improved predictive performance in 90% of the 30 sets of results obtained on 10 data sets. Compared with using no feature selection, the AUC values of this method with the Logistic regression, Naive Bayes, and K-nearest-neighbor classifiers improved by 5.94%, 4.69%, and 8.05%, respectively. The FCBF algorithm can itself be regarded as a method that first groups the features. Compared with the FCBF algorithm, the AUC values of this method improved by 4.78%, 6.41%, and 4.4% with Logistic regression, Naive Bayes, and K-nearest neighbor, respectively. These results also suggest that, for the FCBF-based grouping algorithm, choosing a subset of features from each group can be better than choosing a single representative one.
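
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes features are already discretized, and it assumes a grouping rule borrowed from the usual FCBF redundancy criterion: a feature joins the group of a higher-ranked leader when its symmetric uncertainty with that leader is at least its symmetric uncertainty with the class label. The function names (`symmetric_uncertainty`, `group_features`, `repair`) and the `threshold` parameter are hypothetical. The PSO search itself is omitted; `repair` only shows how a candidate subset could be forced to satisfy the "at least one feature per group" constraint.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    # I(X; Y) = H(X) + H(Y) - H(X, Y); clamp tiny negative rounding errors.
    mutual_info = max(0.0, hx + hy - entropy(list(zip(x, y))))
    return 2.0 * mutual_info / (hx + hy)


def group_features(features, labels, threshold=0.0):
    """FCBF-style grouping (an assumed reading of the abstract).

    features: dict mapping feature name -> list of discrete values.
    Features are ranked by SU with the label. Each leader then absorbs the
    remaining features whose SU with the leader is at least their SU with
    the label, i.e. features that are redundant with respect to the leader.
    """
    relevance = {f: symmetric_uncertainty(v, labels) for f, v in features.items()}
    remaining = sorted((f for f in features if relevance[f] >= threshold),
                       key=lambda f: relevance[f], reverse=True)
    groups = []
    while remaining:
        leader = remaining.pop(0)
        group = [leader]
        for f in remaining[:]:  # iterate over a copy while removing
            if symmetric_uncertainty(features[leader], features[f]) >= relevance[f]:
                group.append(f)
                remaining.remove(f)
        groups.append(group)
    return groups


def repair(selected, groups, rng=random):
    """Return a copy of `selected` with at least one feature from each group."""
    selected = set(selected)
    for group in groups:
        if not selected.intersection(group):
            selected.add(rng.choice(group))
    return selected


# Toy data (hypothetical): "b" is a relabeled copy of "a"; "c" is nearly
# independent of "a" but weakly related to the label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 0],
    "b": ["x", "x", "x", "x", "y", "y", "y", "x"],
    "c": [0, 1, 0, 1, 0, 1, 1, 1],
}
groups = group_features(features, labels)
print(groups)
print(repair({"a"}, groups))
```

In a wrapper search such as PSO, a step like `repair` would be applied to each particle's candidate subset before it is scored with a classifier, so that no group is left unrepresented while a group may still contribute several features.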




    Published In

    ICAIP '18: Proceedings of the 2nd International Conference on Advances in Image Processing
    June 2018
    261 pages
    ISBN:9781450364607
    DOI:10.1145/3239576

    In-Cooperation

    • University of Electronic Science and Technology of China
    • Southwest Jiaotong University

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. FCBF-based grouping algorithm
    2. Feature grouping
    3. Intra-group feature selection
    4. PSO
    5. Software defect prediction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICAIP '18


    Cited By

    • (2023) Analysis of Feature Selection Methods in Software Defect Prediction Models. IEEE Access 11, 145954-145974. DOI: 10.1109/ACCESS.2023.3343249. Online publication date: 2023
    • (2023) Software defect prediction model based on improved twin support vector machines. Soft Computing 27:21, 16101-16110. DOI: 10.1007/s00500-023-07984-6. Online publication date: 1-Apr-2023
    • (2022) Data quality issues in software fault prediction: a systematic literature review. Artificial Intelligence Review 56:8, 7839-7908. DOI: 10.1007/s10462-022-10371-6. Online publication date: 21-Dec-2022
    • (2021) Regression in Estimation of Software Attributes: A Systematic Literature Review. 2021 9th International Conference in Software Engineering Research and Innovation (CONISOFT), 54-60. DOI: 10.1109/CONISOFT52520.2021.00019. Online publication date: Oct-2021
    • (2019) A Study on Software Metric Selection for Software Fault Prediction. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), 1045-1050. DOI: 10.1109/ICMLA.2019.00176. Online publication date: Dec-2019
