research-article

Locus: locating bugs from software changes

Authors:

Shing-Chi Cheung

ASE '16: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering

Pages 262 - 273

https://doi.org/10.1145/2970276.2970359

Published: 25 August 2016

Abstract

Various information retrieval (IR) based techniques have recently been proposed to locate bugs automatically at the file level. However, their usefulness is often compromised by the coarse granularity of files and the lack of contextual information. To address this, we propose to locate bugs using software changes, which offer finer granularity than files and provide important contextual clues for bug fixing. We observe that bug-inducing changes can facilitate the bug-fixing process. For example, they help triage the bug-fixing task to the developers who committed the bug-inducing changes, and they enable developers to fix bugs by reverting these changes. Our study further finds that change logs and the naturally small granularity of changes can help boost the performance of IR-based bug localization. Motivated by these observations, we propose Locus, an IR-based approach that locates bugs from software changes, and evaluate it on six large open source projects. The results show that Locus significantly outperforms existing techniques at source-file-level localization: MAP and MRR improve, on average, by 20.1% and 20.5%, respectively. Locus also ranks the bug-inducing changes within the top 5 for 41.0% of the bugs. Compared with existing techniques, Locus significantly reduces the number of lines that must be scanned to locate a bug.
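As a hedged illustration of the metrics quoted in the abstract, the following Python sketch computes MRR, MAP, and a Top-N hit rate from ranked lists. It uses the standard IR definitions, not code from the paper. The function names, the toy file names, and the choice to normalise average precision by the number of relevant items are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Query = Tuple[Sequence[Hashable], Set[Hashable]]


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision@k over the ranks k where a relevant item appears.

    Assumption: the sum is divided by the total number of relevant items,
    so relevant items that are never retrieved lower the score.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


def evaluate(queries: List[Query], top_n: int = 5) -> Dict[str, float]:
    """Aggregate MRR, MAP and Top-N hit rate over one ranked list per bug report."""
    n = len(queries)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / n
    mean_ap = sum(average_precision(r, rel) for r, rel in queries) / n
    top = sum(any(item in rel for item in r[:top_n]) for r, rel in queries) / n
    return {"MRR": mrr, "MAP": mean_ap, f"Top@{top_n}": top}


# Hypothetical toy data: each entry pairs a ranked list with the truly buggy items.
toy_queries: List[Query] = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["D.java", "E.java", "F.java", "G.java"], {"D.java", "G.java"}),
]

if __name__ == "__main__":
    print(evaluate(toy_queries))
```

On the toy data, the first query contributes a reciprocal rank of 0.5 and an average precision of 0.5, and the second contributes 1.0 and 0.75, so MRR is 0.75 and MAP is 0.625. The same functions apply whether the ranked items are source files or changes, since only the item identifiers differ.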

Cited By

Yoon D, Wang Y, Yu M, Huang E, Jones J, Kukkadapu A, Kocas O, Wiepert J, Goenka K, Chen S, Lin Y, Huang Z, Kong J, Chow M, Tang C, Witchel E, Arpaci-Dusseau A, Rossbach C, Keeton K (2024). FBDetect: Catching Tiny Performance Regressions at Hyperscale through In-Production Monitoring. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 522-540. DOI: 10.1145/3694715.3695977. Online publication date: 4-Nov-2024.
https://dl.acm.org/doi/10.1145/3694715.3695977
Wu Y, Wen M, Yu Z, Guo X, Jin H, Filkov V, Ray B, Zhou M (2024). Effective Vulnerable Function Identification based on CVE Description Empowered by Large Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 393-405. DOI: 10.1145/3691620.3695013. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695013
Gu K, Zhang Y, Cao J, Tan X, Yang M, d'Amorim M (2024). How Well Industry-Level Cause Bisection Works in Real-World: A Study on Linux Kernel. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 62-73. DOI: 10.1145/3663529.3663828. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663828

Index Terms

Locus: locating bugs from software changes
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software evolution
    2. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging

Information & Contributors

Information

Published In

ASE '16: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering

August 2016

899 pages

ISBN: 9781450338455

DOI: 10.1145/2970276

General Chair: David Lo, Singapore Management University, Singapore

Program Chairs: Sven Apel, University of Passau, Germany; Sarfraz Khurshid, University of Texas at Austin, USA

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence
SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASE'16

ASE'16: ACM/IEEE International Conference on Automated Software Engineering

September 3 - 7, 2016

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics & Citations

Article Metrics

Total Citations: 125
Total Downloads: 808
Downloads (Last 12 months): 133
Downloads (Last 6 weeks): 20

Reflects downloads up to 13 Nov 2024

Citations

Chakraborty P, Arumugam V, Nagappan M, Izadi M, Di Sorbo A, Panichella S (2024). Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization. Proceedings of the Third ACM/IEEE International Workshop on NL-based Software Engineering, pages 1-8. DOI: 10.1145/3643787.3648028. Online publication date: 20-Apr-2024.
https://dl.acm.org/doi/10.1145/3643787.3648028
Yu G, Chen P, He Z, Yan Q, Luo Y, Li F, Zheng Z (2024). ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems. Proceedings of the ACM on Software Engineering, 1:FSE, pages 24-46. DOI: 10.1145/3643728. Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3643728
Lyu Y, Kang H, Widyasari R, Lawall J, Lo D (2024). Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel. IEEE Transactions on Software Engineering, 50:9, pages 2219-2239. DOI: 10.1109/TSE.2024.3406718. Online publication date: 29-May-2024.
https://dl.acm.org/doi/10.1109/TSE.2024.3406718
Farzandway M, Ghassemi F (2024). SpecNLP: A Pre-trained Model Enhanced with Spectrum Profile for Bug Localization. 2024 IEEE International Conference on Artificial Intelligence Testing (AITest), pages 81-86. DOI: 10.1109/AITest62860.2024.00018. Online publication date: 15-Jul-2024.
https://doi.org/10.1109/AITest62860.2024.00018
Vacheret R, Pérez F, Ziadi T, Hillah L (2024). Boosting fault localization of statements by combining topic modeling and Ochiai. Information and Software Technology, 173, article 107499. DOI: 10.1016/j.infsof.2024.107499. Online publication date: Sep-2024.
https://doi.org/10.1016/j.infsof.2024.107499
Wang D, Galster M, Morales-Trujillo M (2024). A systematic mapping study of bug reproduction and localization. Information and Software Technology, 165:C. DOI: 10.1016/j.infsof.2023.107338. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1016/j.infsof.2023.107338
Song Y, Xie X, Xu B (2024). When debugging encounters artificial intelligence: state of the art and open challenges. Science China Information Sciences, 67:4. DOI: 10.1007/s11432-022-3803-9. Online publication date: 21-Feb-2024.
https://doi.org/10.1007/s11432-022-3803-9
