Automatic construction of an effective training set for prioritizing static analysis warnings

Authors:

Guangtai Liang,

Qianxiang Wang,

Hong Mei

ASE '10: Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering

Pages 93 - 102

https://doi.org/10.1145/1858996.1859013

Published: 20 September 2010

Abstract

To improve the ineffective warning prioritization of static analysis tools, various approaches have been proposed to compute a ranking score for each warning. In these approaches, an effective training set is vital for exploring which factors impact the ranking score and how. Manual approaches to building a training set can achieve high effectiveness but suffer from low efficiency (i.e., high cost), while existing automatic approaches suffer from low effectiveness. In this paper, we propose an automatic approach for constructing an effective training set. In our approach, we select three categories of impact factors as input attributes of the training set, and propose a new heuristic for identifying actionable warnings to automatically label the training set. Our empirical evaluations show that, with the help of our constructed training set, the precision of the top 22 warnings for Lucene, the top 20 for ANT, and the top 6 for Spring reaches 100%.
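
The evaluation measure named in the abstract, the precision of the top N ranked warnings, can be illustrated with a minimal sketch. The function name `precision_at_k`, the `(score, is_actionable)` tuple layout, and the sample values are assumptions made for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' ranking model or labeling heuristic.

```python
from typing import List, Tuple


def precision_at_k(warnings: List[Tuple[float, bool]], k: int) -> float:
    """Fraction of actionable warnings among the k highest-scored warnings.

    Each warning is a hypothetical (ranking_score, is_actionable) pair.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank warnings by descending score and keep the first k.
    top = sorted(warnings, key=lambda w: w[0], reverse=True)[:k]
    if not top:
        return 0.0
    actionable = sum(1 for _, is_actionable in top if is_actionable)
    return actionable / len(top)


# Made-up sample data, not results from the paper.
sample = [(0.91, True), (0.85, True), (0.40, False), (0.77, True), (0.12, False)]
print(precision_at_k(sample, 3))
print(precision_at_k(sample, 4))
```

Under this reading, a precision of 100% for the top 22 warnings means that all 22 highest-ranked warnings were actionable.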

Cited By

Ge X, Fang C, Li X, Sun W, Wu D, Zhai J, Lin S, Zhao Z, Liu Y, Chen Z (2024). Machine Learning for Actionable Warning Identification: A Comprehensive Survey. ACM Computing Surveys 57:2 (1-35). Online publication date: 19-Sep-2024
https://dl.acm.org/doi/10.1145/3696352

Yang Y, Wen M, Gao X, Zhang Y, Sun H (2024). Reducing False Positives of Static Bug Detectors Through Code Representation Learning. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (681-692). Online publication date: 12-Mar-2024
https://doi.org/10.1109/SANER60148.2024.00075

Gigante D, Pecorelli F, Barletta V, Janes A, Lenarduzzi V, Taibi D, Baldassarre M (2023). Resolving Security Issues via Quality-Oriented Refactoring: A User Study. 2023 ACM/IEEE International Conference on Technical Debt (TechDebt) (82-91). Online publication date: May-2023
https://doi.org/10.1109/TechDebt59074.2023.00016

Information & Contributors

Information

Published In

ASE '10: Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering

September 2010

534 pages

ISBN:9781450301169

DOI:10.1145/1858996

General Chair: Charles Pecheur, Université catholique de Louvain, Belgium

Program Chairs: Jamie Andrews, University of Western Ontario, Canada; Elisabetta Di Nitto, Politecnico di Milano, Italy

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 September 2010

Qualifiers

Research-article

Conference

ASE10

Sponsor:

ASE10: IEEE/ACM International Conference on Automated Software Engineering

September 20 - 24, 2010

Antwerp, Belgium

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Article Metrics

Total Citations: 27

Total Downloads: 465

Downloads (Last 12 months): 15

Downloads (Last 6 weeks): 5

Reflects downloads up to 21 Nov 2024

Citations

Cited By

Guo Z, Tan T, Liu S, Liu X, Lai W, Yang Y, Li Y, Chen L, Dong W, Zhou Y (2023). Mitigating False Positive Static Analysis Warnings: Progress, Challenges, and Opportunities. IEEE Transactions on Software Engineering 49:12 (5154-5188). Online publication date: 1-Dec-2023
https://dl.acm.org/doi/10.1109/TSE.2023.3329667

Yedida R, Kang H, Tu H, Yang X, Lo D, Menzies T (2023). How to Find Actionable Static Analysis Warnings: A Case Study With FindBugs. IEEE Transactions on Software Engineering 49:4 (2856-2872). Online publication date: 1-Apr-2023
https://doi.org/10.1109/TSE.2023.3234206

Ge X, Fang C, Bai T, Liu J, Zhao Z (2023). An Empirical Study of Class Rebalancing Methods for Actionable Warning Identification. IEEE Transactions on Reliability 72:4 (1648-1662). Online publication date: Dec-2023
https://doi.org/10.1109/TR.2023.3234982

Jongeling R, Vallecillo A (2023). Uncertainty-aware consistency checking in industrial settings. 2023 ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MODELS) (73-83). Online publication date: 1-Oct-2023
https://doi.org/10.1109/MODELS58315.2023.00026

Motwani M, Brun Y, Liu A, Muccini H (2023). Understanding Why and Predicting When Developers Adhere to Code-Quality Standards. Proceedings of the 45th International Conference on Software Engineering: Software Engineering in Practice (432-444). Online publication date: 17-May-2023
https://dl.acm.org/doi/10.1109/ICSE-SEIP58684.2023.00045

Lenarduzzi V, Pecorelli F, Saarimaki N, Lujan S, Palomba F (2023). A critical comparison on six static analysis tools: Detection, agreement, and precision. Journal of Systems and Software 198 (111575). Online publication date: Apr-2023
https://doi.org/10.1016/j.jss.2022.111575

Ge X, Fang C, Liu J, Qing M, Li X, Zhao Z (2023). An unsupervised feature selection approach for actionable warning identification. Expert Systems with Applications 227 (120152). Online publication date: Oct-2023
https://doi.org/10.1016/j.eswa.2023.120152
