A three-stage transfer learning framework for multi-source cross-project software defect prediction

Published: 01 October 2022

Highlights

We propose a three-stage transfer learning framework for multi-source cross-project software defect prediction.
The issues of source project selection and multi-source utilization are considered in our method.
Our method performs better overall than other multi-source and single-source CPDP methods.
Our method shows prediction performance comparable to a within-project defect prediction method.
Our method outperforms two baseline unsupervised methods from a comprehensive perspective.

Abstract

Context

Transfer learning techniques have proved effective in the field of cross-project defect prediction (CPDP). However, some questions remain. First, the conditional distribution difference between source and target projects has not been considered. Second, when multiple source projects are available, most studies rarely consider source selection and multi-source data utilization; instead, they use all available projects and merge the multi-source data into one final dataset.

Objective

To address these issues, in this paper we propose a three-stage weighting framework for multi-source transfer learning (3SW-MSTL) in CPDP. In stage 1, a source selection strategy selects a suitable number of source projects from all available projects. In stage 2, a transfer technique is applied to minimize marginal distribution differences. In stage 3, a multi-source data utilization scheme that exploits conditional distribution information guides the use of the multi-source transferred data.
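The three stages described above can be sketched in code. This is an illustrative toy, not the authors' implementation: a "project" is a list of (feature_vector, label) pairs, and the helper names (`mean_vector`, `select_sources`, `adapt_marginal`) as well as the mean-distance heuristic and mean-shift alignment are hypothetical stand-ins for the paper's actual strategies.

```python
# Illustrative sketch of stages 1 and 2, not the authors' implementation.
# A "project" is a list of (feature_vector, label) pairs.

def mean_vector(project):
    """Per-feature mean over a project's instances."""
    n, dim = len(project), len(project[0][0])
    return [sum(inst[0][i] for inst in project) / n for i in range(dim)]

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def select_sources(sources, target, k):
    """Stage 1: keep the k candidate projects whose feature means lie
    closest to the target's (a crude stand-in for source selection)."""
    t = mean_vector(target)
    return sorted(sources, key=lambda s: euclidean(mean_vector(s), t))[:k]

def adapt_marginal(source, target):
    """Stage 2: shift each source feature so its mean matches the target's,
    a toy stand-in for marginal-distribution alignment (e.g. TCA-style)."""
    shift = [t - s for s, t in zip(mean_vector(source), mean_vector(target))]
    return [([x + d for x, d in zip(feats, shift)], y) for feats, y in source]
```

Stage 3 would then train one classifier per adapted source and combine their predictions using conditional distribution information.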

Method

First, we designed five source selection strategies and four multi-source utilization schemes and, by comparing their influence on prediction performance, chose the best of each for stages 1 and 3 of 3SW-MSTL. Second, to validate the performance of 3SW-MSTL, we compared it with four multi-source and six single-source CPDP methods, a baseline within-project defect prediction (WPDP) method, and two unsupervised methods on data from 30 widely used open-source projects.

Results

Based on our experiments, the bellwether strategy and the weighted-vote scheme were chosen as the source selection strategy and the multi-source utilization scheme in 3SW-MSTL, respectively. Our results indicate that 3SW-MSTL outperforms the four multi-source, six single-source, and two unsupervised methods, and that it is comparable to the WPDP method.
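The weighted-vote scheme chosen for stage 3 can be illustrated with a minimal sketch. The function name and the half-of-total-weight decision threshold are assumptions for illustration, not the paper's exact combination rule; in 3SW-MSTL the weights would come from conditional distribution information about each source.

```python
# Hypothetical illustration of a weighted-vote scheme: each source-trained
# model casts a binary vote (1 = defective), weighted by its source's
# estimated relevance to the target project.

def weighted_vote(votes, weights):
    """Return 1 if the weighted vote mass for class 1 reaches half the
    total weight, else 0."""
    total = sum(weights)
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score >= total / 2 else 0
```

For example, with three source models, `weighted_vote([1, 0, 0], [0.6, 0.2, 0.2])` returns 1 because the single positive vote carries most of the weight, while `weighted_vote([1, 0, 0], [0.2, 0.4, 0.4])` returns 0.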

Conclusion

The proposed 3SW-MSTL framework is more effective because it explicitly addresses the two issues identified above.


Cited By

  • (2024) Improving transfer learning for software cross-project defect prediction. Applied Intelligence 54(7), 5593–5616, doi:10.1007/s10489-024-05459-1. Online publication date: 1-Apr-2024.
  • (2024) Cross-Project Software Defect Prediction Based on Feature Selection and Knowledge Distillation. Advanced Intelligent Computing Technology and Applications, 137–149, doi:10.1007/978-981-97-5594-3_12. Online publication date: 5-Aug-2024.
  • (2024) A novel defect prediction method based on semantic feature enhancement. Journal of Software: Evolution and Process 36(9), doi:10.1002/smr.2674. Online publication date: 16-Sep-2024.
  • (2023) Revisiting 'revisiting supervised methods for effort-aware cross-project defect prediction'. IET Software 17(4), 472–495, doi:10.1049/sfw2.12133. Online publication date: 27-Jun-2023.



    Published In

    Information and Software Technology  Volume 150, Issue C
    Oct 2022
    295 pages

    Publisher

    Butterworth-Heinemann

    United States


    Author Tags

    1. Transfer learning
    2. Cross-project defect prediction
    3. Source selection
    4. Multi-source utilization
    5. 3SW-MSTL

    Qualifiers

    • Research-article


