Finding Latent Code Errors via Machine Learning over Program Executions

Authors:

Michael D. Ernst

ICSE '04: Proceedings of the 26th International Conference on Software Engineering

Pages 480 - 490

Published: 23 May 2004

Abstract

This paper proposes a technique for identifying program properties that indicate errors. The technique generates machine learning models of program properties known to result from errors, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. Given a set of properties produced by the program analysis, the technique selects a subset of properties that are most likely to reveal an error. An implementation, the Fault Invariant Classifier, demonstrates the efficacy of the technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. In our experimental evaluation, the technique increases the relevance (the concentration of fault-revealing properties) by a factor of 50 on average for the C programs, and 4.8 for the Java programs. Preliminary experience suggests that most of the fault-revealing properties do lead a programmer to an error.
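As a rough illustration of the relevance measure mentioned in the abstract, the sketch below ranks a handful of program properties by a model score, keeps the top few, and compares the concentration of fault-revealing properties before and after selection. It assumes that "relevance" can be read as the fraction of fault-revealing properties in a set. The property strings, scores, and labels are invented for illustration, and the helper names `relevance` and `select_top` are hypothetical; none of this is taken from the Fault Invariant Classifier itself.

```python
from typing import List, Tuple

# Each property is (description, model_score, is_fault_revealing).
# All values are made-up illustration data, not results from the paper.
Property = Tuple[str, float, bool]


def relevance(props: List[Property]) -> float:
    """Fraction of properties in the set that are fault-revealing (assumed reading)."""
    if not props:
        return 0.0
    return sum(1 for _, _, fault in props if fault) / len(props)


def select_top(props: List[Property], k: int) -> List[Property]:
    """Rank by model score, highest first, and keep the top k properties."""
    return sorted(props, key=lambda p: p[1], reverse=True)[:k]


properties: List[Property] = [
    ("x >= 0", 0.10, False),
    ("size(buf) > i", 0.92, True),
    ("y == orig(y)", 0.15, False),
    ("p != null", 0.40, False),
    ("i <= size(buf)", 0.85, True),
    ("n is even", 0.05, False),
    ("a[i] >= a[i-1]", 0.30, False),
    ("count > 0", 0.60, False),
]

selected = select_top(properties, 3)
overall = relevance(properties)    # concentration in the full set
reported = relevance(selected)     # concentration in the selected subset
improvement = reported / overall   # factor by which selection raises relevance

print(f"overall={overall:.2f} reported={reported:.2f} improvement={improvement:.2f}x")
```

In this toy data the scores stand in for whatever a trained classifier would output; a real pipeline would obtain them from a model trained on properties of known-faulty code.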

Information

Published In

ICSE '04: Proceedings of the 26th International Conference on Software Engineering

May 2004

761 pages

ISBN: 0769521630

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

ICSE04

Sponsor:

SIGSOFT

ICSE04: 26th International Conference on Software Engineering

May 23 - 28, 2004

Acceptance Rates

ICSE '04 Paper Acceptance Rate 58 of 436 submissions, 13%;

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics

Total Citations: 54
Total Downloads: 734
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Feng S, Ye Y, Shi Q, Cheng Z, Xu X, Cheng S, Choi H, Zhang X, Filkov V, Ray B, Zhou M (2024) ROCAS: Root Cause Analysis of Autonomous Driving Accidents via Cyber-Physical Co-mutation. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 10.1145/3691620.3695530 (1620-1632). Online publication date: 27-Oct-2024
https://dl.acm.org/doi/10.1145/3691620.3695530

Dong D, Liang Y (2024) Grading Programming Assignments by Summarization. Proceedings of the ACM Turing Award Celebration Conference - China 2024, 10.1145/3674399.3674426 (53-58). Online publication date: 5-Jul-2024
https://dl.acm.org/doi/10.1145/3674399.3674426

Zhou X, Peng X, Xie T, Sun J, Ji C, Liu D, Xiang Q, He C, Dumas M, Pfahl D, Apel S, Russo A (2019) Latent error prediction and fault localization for microservice applications by learning from system trace logs. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 10.1145/3338906.3338961 (683-694). Online publication date: 12-Aug-2019
https://dl.acm.org/doi/10.1145/3338906.3338961

Amar A, Rigby P, Atlee J, Bultan T, Whittle J (2019) Mining historical test logs to predict bugs and localize faults in the test logs. Proceedings of the 41st International Conference on Software Engineering, 10.1109/ICSE.2019.00031 (140-151). Online publication date: 25-May-2019
https://dl.acm.org/doi/10.1109/ICSE.2019.00031

Shrestha S, Panda S, Csallner C, Tichy W, Minku L (2018) Complementing machine learning classifiers via dynamic symbolic execution. Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, 10.1145/3194104.3194111 (15-20). Online publication date: 28-May-2018
https://dl.acm.org/doi/10.1145/3194104.3194111

Tatsi K, Kontogiannis K, Mindel M, Lyons K, Wigglesworth J (2017) Assisting developers towards fault localization by analyzing failure reports. Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering, 10.5555/3172795.3172803 (56-65). Online publication date: 6-Nov-2017
https://dl.acm.org/doi/10.5555/3172795.3172803

Jiang B, Wu Y, Li T, Chan W, Rosu G, Di Penta M, Nguyen T (2017) SimplyDroid: efficient event sequence simplification for Android application. Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 10.5555/3155562.3155603 (297-307). Online publication date: 30-Oct-2017
https://dl.acm.org/doi/10.5555/3155562.3155603

Yan H, Sui Y, Chen S, Xue J (2017) Machine-Learning-Guided Typestate Analysis for Static Use-After-Free Detection. Proceedings of the 33rd Annual Computer Security Applications Conference, 10.1145/3134600.3134620 (42-54). Online publication date: 4-Dec-2017
https://dl.acm.org/doi/10.1145/3134600.3134620

Katz D, Bultan T, Sen K (2017) Understanding intended behavior using models of low-level signals. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, 10.1145/3092703.3098237 (424-427). Online publication date: 10-Jul-2017
https://dl.acm.org/doi/10.1145/3092703.3098237

Chen Y, Ying M, Liu D, Alim A, Chen F, Chen M, Bultan T, Sen K (2017) Effective online software anomaly detection. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, 10.1145/3092703.3092730 (136-146). Online publication date: 10-Jul-2017
https://dl.acm.org/doi/10.1145/3092703.3092730
