
An empirical study on dependence clusters for effort-aware fault-proneness prediction

Published: 25 August 2016
DOI: 10.1145/2970276.2970353

Abstract

A dependence cluster is a set of mutually inter-dependent program elements. Prior studies have found that large dependence clusters are prevalent in software systems, and it has been suggested that they have potentially harmful effects on software quality; however, little empirical evidence has been provided to support this claim. The study presented in this paper investigates the relationship between dependence clusters and software quality at the function level, with a focus on effort-aware fault-proneness prediction. The investigation first analyzes whether larger dependence clusters tend to be more fault-prone. Second, it investigates whether the proportion of faulty functions inside dependence clusters differs significantly from the proportion of faulty functions outside them. Third, it examines whether functions inside dependence clusters that play more important roles than others are more fault-prone. Finally, based on these two groups of functions (i.e., functions inside and outside dependence clusters), the investigation considers a segmented fault-proneness prediction model. Our experimental results, based on five well-known open-source systems, show that (1) larger dependence clusters tend to be more fault-prone; (2) the proportion of faulty functions inside dependence clusters is significantly larger than the proportion outside; (3) functions inside dependence clusters that play more important roles are more fault-prone; and (4) our segmented prediction model can significantly improve the effectiveness of effort-aware fault-proneness prediction in both ranking and classification scenarios. These findings help us better understand how dependence clusters influence software quality.
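The definition above, a set of mutually inter-dependent program elements, corresponds to a maximal group of functions that all reach one another in a function-level dependence graph. As a minimal sketch of how such clusters can be identified, assuming a toy dependence graph and computing strongly connected components with networkx (the graph, the function names, and the size-greater-than-one threshold are illustrative assumptions, not the paper's tooling):

```python
import networkx as nx

# Toy function-level dependence graph: an edge u -> v means
# function u depends on function v. (Illustrative example only.)
G = nx.DiGraph([
    ("parse", "lex"), ("lex", "parse"),    # mutually dependent pair
    ("parse", "eval"),
    ("eval", "env"), ("env", "eval"),      # another mutual pair
    ("main", "parse"),
])

# In this sketch, a dependence cluster is a maximal set of mutually
# inter-dependent functions, i.e., a strongly connected component
# with more than one member.
clusters = [scc for scc in nx.strongly_connected_components(G) if len(scc) > 1]

for cluster in sorted(clusters, key=len, reverse=True):
    print(sorted(cluster))   # -> ['lex', 'parse'] and ['env', 'eval']
```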
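The segmented model mentioned in the abstract fits separate fault-proneness models for functions inside and outside dependence clusters and scores each function with the model matching its group. Below is a minimal sketch of that idea using scikit-learn logistic regression; the feature matrices, boolean membership masks, and the SLOC-based effort proxy are assumptions for illustration, not the authors' exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def segmented_scores(X, y, inside, X_new, inside_new):
    """Fit one fault-proneness model per segment (functions inside vs.
    outside dependence clusters) and score new functions with the model
    that matches their segment. `inside` / `inside_new` are boolean
    numpy masks marking cluster membership (assumed inputs)."""
    model_in = LogisticRegression(max_iter=1000).fit(X[inside], y[inside])
    model_out = LogisticRegression(max_iter=1000).fit(X[~inside], y[~inside])
    scores = np.empty(len(X_new))
    scores[inside_new] = model_in.predict_proba(X_new[inside_new])[:, 1]
    scores[~inside_new] = model_out.predict_proba(X_new[~inside_new])[:, 1]
    return scores

# Effort-aware ranking: inspect functions in decreasing order of
# predicted risk per unit of inspection effort (here SLOC is assumed
# as the effort proxy), so the top of the list maximizes faults found
# per line inspected.
# ranking = np.argsort(-(scores / sloc))
```

In the effort-aware ranking scenario, functions are then reviewed from the top of that ordering; in the classification scenario, a threshold on the effort-adjusted score flags the most fault-prone functions first.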

Information & Contributors

Information

Published In

cover image ACM Conferences
ASE '16: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
August 2016
899 pages
ISBN: 9781450338455
DOI: 10.1145/2970276
  • General Chair: David Lo
  • Program Chairs: Sven Apel, Sarfraz Khurshid

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Dependence clusters
  2. fault prediction
  3. fault-proneness
  4. network analysis

Qualifiers

  • Research-article

Conference

ASE '16

Acceptance Rates

Overall acceptance rate: 82 of 337 submissions, 24%.
