DOI: 10.5555/998675.999452
Article

Finding Latent Code Errors via Machine Learning over Program Executions

Published: 23 May 2004

Abstract

This paper proposes a technique for identifying program properties that indicate errors. The technique generates machine learning models of program properties known to result from errors, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. Given a set of properties produced by the program analysis, the technique selects a subset of properties that are most likely to reveal an error. An implementation, the Fault Invariant Classifier, demonstrates the efficacy of the technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. In our experimental evaluation, the technique increases the relevance (the concentration of fault-revealing properties) by a factor of 50 on average for the C programs and by a factor of 4.8 for the Java programs. Preliminary experience suggests that most of the fault-revealing properties do lead a programmer to an error.
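The select-and-measure step described in the abstract (score each candidate property with a model, rank, keep the most suspicious subset, then measure relevance as the fraction of fault-revealing properties) can be illustrated with a small Python sketch. It is not the Fault Invariant Classifier. The names `Property`, `score`, `select`, `relevance`, and `improvement` are hypothetical. The fixed linear weights stand in for the SVM or decision-tree model the tool actually learns. The properties and their feature vectors are invented. The improvement factor is assumed to be the ratio of relevance after selection to relevance before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A candidate program property (e.g. a detected invariant) with features.

    `fault_revealing` is ground truth, available only when evaluating the
    technique on programs whose errors are already known.
    """
    text: str
    features: tuple[float, ...]
    fault_revealing: bool


# Hypothetical hand-picked model: a fixed linear decision function w.x + b.
# The real tool learns its model with SVM or decision-tree tools instead.
WEIGHTS = (1.0, -0.5)
BIAS = 0.0


def score(prop: Property) -> float:
    """Return the model's score; higher means 'more likely fault-revealing'."""
    return sum(w * x for w, x in zip(WEIGHTS, prop.features)) + BIAS


def select(props: list[Property], k: int) -> list[Property]:
    """Rank properties by score, highest first, and keep the top k."""
    return sorted(props, key=score, reverse=True)[:k]


def relevance(props: list[Property]) -> float:
    """Concentration of fault-revealing properties in a set (0.0 if empty)."""
    if not props:
        return 0.0
    return sum(p.fault_revealing for p in props) / len(props)


def improvement(all_props: list[Property], selected: list[Property]) -> float:
    """Assumed improvement factor: relevance after selection over before."""
    base = relevance(all_props)
    return relevance(selected) / base if base else float("inf")


# Made-up properties with made-up 2-dimensional feature vectors.
props = [
    Property("x >= 0", (0.2, 0.9), False),
    Property("size == count", (0.9, 0.1), True),
    Property("y != null", (0.1, 0.8), False),
    Property("i < n", (0.7, 0.2), True),
    Property("z one of {1, 2}", (0.3, 0.3), False),
    Property("a[] sorted", (0.0, 1.0), False),
]

top = select(props, k=2)
print([p.text for p in top])  # prints ['size == count', 'i < n']
print(relevance(props), relevance(top), improvement(props, top))
```

With these invented numbers, keeping the top two of six properties raises relevance from 1/3 to 1, an improvement factor of about 3. The factors of 50 and 4.8 quoted above come from the paper's own experiments, not from this sketch. Ranking by a continuous score, rather than taking a hard yes/no classification, is what lets a user inspect only the few properties the model considers most suspicious.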


Cited By

  • (2024) ROCAS: Root Cause Analysis of Autonomous Driving Accidents via Cyber-Physical Co-mutation. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1620-1632. DOI 10.1145/3691620.3695530. Online publication date: 27-Oct-2024.
  • (2024) Grading Programming Assignments by Summarization. Proceedings of the ACM Turing Award Celebration Conference - China 2024, pages 53-58. DOI 10.1145/3674399.3674426. Online publication date: 5-Jul-2024.
  • (2019) Latent error prediction and fault localization for microservice applications by learning from system trace logs. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 683-694. DOI 10.1145/3338906.3338961. Online publication date: 12-Aug-2019.


Information

Published In

ICSE '04: Proceedings of the 26th International Conference on Software Engineering
May 2004
761 pages
ISBN:0769521630

Publisher

IEEE Computer Society

United States

Conference

ICSE04

Acceptance Rates

ICSE '04 Paper Acceptance Rate 58 of 436 submissions, 13%;
Overall Acceptance Rate 276 of 1,856 submissions, 15%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Dec 2024

Citations

  • (2019) Mining historical test logs to predict bugs and localize faults in the test logs. Proceedings of the 41st International Conference on Software Engineering, pages 140-151. DOI 10.1109/ICSE.2019.00031. Online publication date: 25-May-2019.
  • (2018) Complementing machine learning classifiers via dynamic symbolic execution. Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, pages 15-20. DOI 10.1145/3194104.3194111. Online publication date: 28-May-2018.
  • (2017) Assisting developers towards fault localization by analyzing failure reports. Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering, pages 56-65. DOI 10.5555/3172795.3172803. Online publication date: 6-Nov-2017.
  • (2017) SimplyDroid: efficient event sequence simplification for Android application. Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pages 297-307. DOI 10.5555/3155562.3155603. Online publication date: 30-Oct-2017.
  • (2017) Machine-Learning-Guided Typestate Analysis for Static Use-After-Free Detection. Proceedings of the 33rd Annual Computer Security Applications Conference, pages 42-54. DOI 10.1145/3134600.3134620. Online publication date: 4-Dec-2017.
  • (2017) Understanding intended behavior using models of low-level signals. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 424-427. DOI 10.1145/3092703.3098237. Online publication date: 10-Jul-2017.
  • (2017) Effective online software anomaly detection. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 136-146. DOI 10.1145/3092703.3092730. Online publication date: 10-Jul-2017.