A three-stage transfer learning framework for multi-source cross-project software defect prediction

Published: 01 October 2022

Highlights

We propose a three-stage transfer learning framework for multi-source cross-project software defect prediction.
The issues of source project selection and multi-source utilization are considered in our method.
Our method performs better overall than other multi-source and single-source CPDP methods.
Our method shows prediction performance comparable to a within-project defect prediction method.
Our method outperforms two baseline unsupervised methods from a comprehensive perspective.

Abstract

Context

Transfer learning techniques have proved effective in the field of cross-project defect prediction (CPDP). However, some questions remain. First, the conditional distribution difference between source and target projects has not been considered. Second, when multiple source projects are available, most studies rarely consider source selection and multi-source data utilization; instead, they use all available projects and merge the multi-source data into one final dataset.

Objective

To address these issues, in this paper we propose a three-stage weighting framework for multi-source transfer learning (3SW-MSTL) in CPDP. In stage 1, a source selection strategy selects a suitable number of source projects from all available projects. In stage 2, a transfer technique is applied to minimize marginal distribution differences. In stage 3, a multi-source data utilization scheme that exploits conditional distribution information guides the use of the multi-source transferred data.
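The three stages described above can be sketched in code. This is an illustrative toy, not the authors' implementation: a "project" is a list of (feature_vector, label) pairs, and the helper names (`mean_vector`, `select_sources`, `adapt_marginal`) as well as the mean-distance heuristic and mean-shift alignment are hypothetical stand-ins for the paper's actual strategies.

```python
# Illustrative sketch of stages 1 and 2, not the authors' implementation.
# A "project" is a list of (feature_vector, label) pairs.

def mean_vector(project):
    """Per-feature mean over a project's instances."""
    n, dim = len(project), len(project[0][0])
    return [sum(inst[0][i] for inst in project) / n for i in range(dim)]

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def select_sources(sources, target, k):
    """Stage 1: keep the k candidate projects whose feature means lie
    closest to the target's (a crude stand-in for source selection)."""
    t = mean_vector(target)
    return sorted(sources, key=lambda s: euclidean(mean_vector(s), t))[:k]

def adapt_marginal(source, target):
    """Stage 2: shift each source feature so its mean matches the target's,
    a toy stand-in for marginal-distribution alignment (e.g. TCA-style)."""
    shift = [t - s for s, t in zip(mean_vector(source), mean_vector(target))]
    return [([x + d for x, d in zip(feats, shift)], y) for feats, y in source]
```

Stage 3 would then train one classifier per adapted source and combine their predictions using conditional distribution information.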

Method

First, we designed five source selection strategies and four multi-source utilization schemes and, by comparing their influence on prediction performance, chose the best of each for stages 1 and 3 of 3SW-MSTL. Second, to validate the performance of 3SW-MSTL, we compared it with four multi-source and six single-source CPDP methods, a baseline within-project defect prediction (WPDP) method, and two unsupervised methods on data from 30 widely used open-source projects.

Results

Based on our experiments, the bellwether strategy and the weighted-vote scheme were chosen as the source selection strategy and the multi-source utilization scheme in 3SW-MSTL, respectively. Our results indicate that 3SW-MSTL outperforms the four multi-source, six single-source, and two unsupervised methods, and that it is comparable to the WPDP method.
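The weighted-vote scheme chosen for stage 3 can be illustrated with a minimal sketch. The function name and the half-of-total-weight decision threshold are assumptions for illustration, not the paper's exact combination rule; in 3SW-MSTL the weights would come from conditional distribution information about each source.

```python
# Hypothetical illustration of a weighted-vote scheme: each source-trained
# model casts a binary vote (1 = defective), weighted by its source's
# estimated relevance to the target project.

def weighted_vote(votes, weights):
    """Return 1 if the weighted vote mass for class 1 reaches half the
    total weight, else 0."""
    total = sum(weights)
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score >= total / 2 else 0
```

For example, with three source models, `weighted_vote([1, 0, 0], [0.6, 0.2, 0.2])` returns 1 because the single positive vote carries most of the weight, while `weighted_vote([1, 0, 0], [0.2, 0.4, 0.4])` returns 0.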

Conclusion

The proposed 3SW-MSTL framework is more effective because it explicitly addresses the two issues identified above.


Cited By

  • (2024) Improving transfer learning for software cross-project defect prediction. Applied Intelligence 54(7), 5593–5616, doi:10.1007/s10489-024-05459-1. Online publication date: 1-Apr-2024.
  • (2024) Cross-Project Software Defect Prediction Based on Feature Selection and Knowledge Distillation. Advanced Intelligent Computing Technology and Applications, 137–149, doi:10.1007/978-981-97-5594-3_12. Online publication date: 5-Aug-2024.
  • (2024) A novel defect prediction method based on semantic feature enhancement. Journal of Software: Evolution and Process 36(9), doi:10.1002/smr.2674. Online publication date: 16-Sep-2024.
  • (2023) Revisiting 'revisiting supervised methods for effort-aware cross-project defect prediction'. IET Software 17(4), 472–495, doi:10.1049/sfw2.12133. Online publication date: 27-Jun-2023.



    Published In

    Information and Software Technology  Volume 150, Issue C
    Oct 2022
    295 pages

    Publisher

    Butterworth-Heinemann

    United States


    Author Tags

    1. Transfer learning
    2. Cross-project defect prediction
    3. Source selection
    4. Multi-source utilization
    5. 3SW-MSTL

    Qualifiers

    • Research-article


