Abstract
Text Retrieval (TR)-based approaches for bug localization rely on formulating an initial query based on the full text of a bug report. When the query fails to retrieve the buggy code artifacts, developers can reformulate the query and retrieve more candidate code documents. Existing research on query reformulation focuses mostly on leveraging relevance feedback from the user or on expanding the original query with additional information. We hypothesize that the bug report title, the observed behavior, the expected behavior, the steps to reproduce, and the code snippets that users provide in bug descriptions contain the most relevant information for retrieving the buggy code artifacts, and that the other parts of the descriptions contain more irrelevant terms, which hinder retrieval. This paper proposes and evaluates a set of query reformulation strategies based on selecting existing information in bug descriptions and removing irrelevant parts from the original query. The results show that selecting the bug report title and the observed behavior is the strategy that performs best across various TR-based bug localization approaches and code granularities: it retrieves the buggy code artifacts within the top-N results for 25.6% more queries (on average) than without query reformulation. This strategy is highly applicable and consistent across different thresholds N. Selecting the steps to reproduce or the expected behavior (when provided in the bug reports) along with the bug title and the observed behavior leads to higher performance (i.e., between 31.4% and 41.7% more queries) and comparable consistency, yet it is applicable in fewer cases. These reformulation strategies are easy to use and are independent of the underlying retrieval technique.
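The selection-based reformulation described in the abstract can be illustrated with a minimal sketch. It assumes that the sentences of a bug description have already been tagged by part (for example, by the developer); the `BugReport` class, the part labels, and the sample report are hypothetical and are not taken from the paper or its replication package.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class BugReport:
    """Hypothetical container: a title plus description sentences grouped by part."""
    title: str
    # Maps an illustrative part label to the sentences tagged with that label.
    parts: Dict[str, List[str]] = field(default_factory=dict)


def reformulate(report: BugReport,
                selected: Sequence[str] = ("observed_behavior",)) -> str:
    """Build a reduced query from the title and the selected description parts.

    Sentences under any label that is not selected are dropped from the query.
    """
    chunks = [report.title]
    for label in selected:
        chunks.extend(report.parts.get(label, []))
    return " ".join(chunk.strip() for chunk in chunks if chunk.strip())


# Made-up bug report, used only to exercise the function.
report = BugReport(
    title="Crash when saving file",
    parts={
        "observed_behavior": ["The editor closes without warning."],
        "expected_behavior": ["The file should be saved."],
        "steps_to_reproduce": ["Open a file.", "Press save."],
        "other": ["Thanks for the great tool!"],
    },
)

title_and_ob = reformulate(report)
title_ob_s2r = reformulate(report, ("observed_behavior", "steps_to_reproduce"))
print(title_and_ob)
print(title_ob_s2r)
```

Because the function only selects text, its output can be passed as the query to any retrieval engine, which is consistent with the abstract's remark that the strategies are independent of the underlying retrieval technique.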
Notes
See Section 6 for details.
This data set is called BRT in our prior work (Chaparro et al. 2017a).
Code search is a task similar to, but more general than, TRBL.
See Table 11 and our replication package for more details.
We changed the notation in the table for space reasons.
HITS@N improvement cannot be measured for these two strategies because the HITS@N achieved by the initial queries (i.e., no reformulation) is zero, hence, the improvement is undefined (see Formula 3).
See our replication package for the detailed MRR/MAP results (Chaparro et al. 2018).
The # of queries for TITLE in Table 17 represents the avg. total # of queries for BRTracer.
The query is low-quality for the other three file-level TRBL techniques as well.
The position of the buggy file, after excluding the top-5 documents, would be 174.
The reformulation results in the query: “too large first step ... (Dormand-Prince 8(5,3) ...) For embedded Runge-Kutta type, this step size ... and fails to stop).”
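The note on HITS@N improvement above can be made concrete with a small sketch. Formula 3 is not reproduced in this section, so the code assumes it is the usual relative improvement, (reformulated - initial) / initial, which is undefined when the initial queries achieve a HITS@N of zero. HITS@N is taken here as the number of queries whose first buggy artifact is ranked within the top N; using a proportion of queries instead yields the same relative improvement. The function names and the ranks are illustrative only.

```python
from typing import Optional, Sequence


def hits_at_n(ranks: Sequence[Optional[int]], n: int) -> int:
    """Count queries whose first buggy artifact is ranked within the top n.

    A rank of None means the query did not retrieve any buggy artifact.
    """
    return sum(1 for rank in ranks if rank is not None and rank <= n)


def relative_improvement(initial_hits: int,
                         reformulated_hits: int) -> Optional[float]:
    """Relative HITS@N improvement (assumed form of Formula 3).

    Returns None when the initial HITS@N is zero, because the ratio
    would require a division by zero and is therefore undefined.
    """
    if initial_hits == 0:
        return None
    return (reformulated_hits - initial_hits) / initial_hits


# Illustrative ranks of the first buggy artifact for four queries.
initial_ranks = [3, 12, None, 40]
reformulated_ranks = [1, 4, 9, 20]

initial = hits_at_n(initial_ranks, 5)            # one query succeeds
reformulated = hits_at_n(reformulated_ranks, 5)  # two queries succeed
improvement = relative_improvement(initial, reformulated)

# A strategy whose initial queries never succeed has no defined improvement.
undefined_case = relative_improvement(hits_at_n([12, 40], 5), 2)
```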
References
Ali N, Sabane A, Gueheneuc Y-G, Antoniol G (2012) Improving bug location using binary class relationships. In: Proceedings of the international working conference on source code analysis and manipulation (SCAM’12), pp 174–183
Bajracharya SK, Lopes CV (2012) Analyzing and mining a code search engine usage log. Empir Softw Eng 17(4-5):424–466
Bassett BR, Kraft NA (2013) Structural information based term weighting in text retrieval for feature location. In: Proceedings of the international conference on program comprehension (ICPC’13), pp 133–141
Carpineto C, Romano G (2012) A survey of automatic query expansion in information retrieval. Comput Surv 44(1):1
Chaparro O, Marcus A (2016) On the reduction of verbose queries in text retrieval based software maintenance. In: Proceedings of the international conference on software engineering (ICSE’16), pp 716–718
Chaparro O, Florez JM, Marcus A (2017a) Using observed behavior to reformulate queries during text retrieval-based bug localization. In: Proceedings of the 33rd international conference on software maintenance and evolution (ICSME’17), pp 376–387
Chaparro O, Lu J, Zampetti F, Moreno L, Di Penta M, Marcus A, Bavota G, Ng V (2017b) Detecting missing information in bug descriptions. In: Proceedings of the joint meeting on foundations of software engineering (ESEC/FSE’17), pp 396–407
Chaparro O, Florez JM, Marcus A (2018) Replication package. https://tinyurl.com/y7bzqnwc
Damevski K, Shepherd D, Pollock L (2016) A field study of how developers locate features in source code. Empir Softw Eng 21(2):724–747
Dao T, Zhang L, Na M (2017) How does execution information help with information-retrieval based bug localization? In: Proceedings of the international conference on program comprehension (ICPC’17), pp 241–250
Davies S, Roper M, Wood M (2012) Using bug report similarity to enhance bug localisation. In: Proceedings of the working conference on reverse engineering (WCRE’12), pp 125–134
Davies S, Roper M (2014) What’s in a bug report? In: Proceedings of the international symposium on empirical software engineering and measurement (ESEM’14), pp 26:1–26:10
De Lucia A, Marcus A, Oliveto R, Poshyvanyk D (2012) Information retrieval methods for automated traceability recovery. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability. Springer, pp 71–98
Dietrich T, Cleland-Huang J, Shin Y (2013) Learning effective query transformations for enhanced requirements trace retrieval. In: Proceedings of the international conference on automated software engineering (ASE’13), pp 586–591
Dilshener T, Wermelinger M, Yu Y (2016) Locating bugs without looking back. In: Proceedings of the international conference on mining software repositories (MSR’16), pp 286–290
Dit B, Revelle M, Gethers M, Poshyvanyk D (2012) Feature location in source code: a taxonomy and survey. J Softw Evol Process 25(1):53–95
Eddy BP, Kraft NA, Gray J (2018) Impact of structural weighting on a latent dirichlet allocation–based feature location technique. J Softw Evol Process 30(1):e1892
Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in ir-based concept location. In: Proceedings of the international conference on software maintenance (ICSM’09), pp 351–360
Ge X, Shepherd DC, Damevski K, Murphy-Hill E (2017) Design and evaluation of a multi-recommendation system for local code search. J Vis Lang Comput 39:1–9
Gibiec M, Czauderna A, Cleland-Huang J (2010) Towards mining replacement queries for hard-to-retrieve traces. In: Proceedings of the international conference on automated software engineering (ASE’10), pp 245–254
Guo J, Gibiec M, Cleland-Huang J (2017) Tackling the term-mismatch problem in automated trace retrieval. Empir Softw Eng 22(3):1103–1142
Haiduc S, Bavota G, Marcus A, Oliveto R, De Lucia A, Menzies T (2013) Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the international conference on software engineering (ICSE’13), pp 842–851
Hatcher E, Gospodnetic O (2004) Lucene in action. Manning Publications
Hill E, Roldan-Vega M, Fails JA, Mallet G (2014) Nl-based query refinement and contextualized code search results: A user study. In: Proceedings of the conference on software maintenance, reengineering, and reverse engineering (CSMR-WCRE’14), pp 34–43
Hoang TV, Oentaryo RJ, Le TB, Lo D (2018) Network-clustered multi-modal bug localization. IEEE Transactions on Software Engineering. (to appear)
Hollander M, Wolfe DA, Chicken E (2013) Nonparametric statistical methods, vol 751. Wiley, New York
Just R, Jalali D, Ernst MD (2014) Defects4j: a database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the international symposium on software testing and analysis (ISSTA’14). ACM, pp 437–440
Kevic K, Fritz T (2014) Automatic search term identification for change tasks. In: Proceedings of the international conference on software engineering (ICSE’14), pp 468–471
Lemos OAL, de Paula AC, Sajnani H, Lopes CV (2015) Can the use of types and query expansion help improve large-scale code search? In: Proceedings of the international working conference on source code analysis and manipulation (SCAM’15), pp 41–50
Le T-DB, Thung F, Lo D (2014) Predicting effectiveness of ir-based bug localization techniques. In: Proceedings of the 25th international symposium on software reliability engineering (ISSRE’14), pp 335–345
Le T-DB, Oentaryo RJ, Lo D (2015) Information retrieval and spectrum based bug localization: better together. In: Proceedings of the joint meeting on foundations of software engineering (ESEC/FSE’15), pp 579–590
Lee J, Kim D, Bissyandé TF, Jung W, Le Traon Y (2018) Bench4bl: reproducibility study on the performance of ir-based bug localization. In: Proceedings of the 27th international symposium on software testing and analysis (ISSTA’18), pp 61–72
Li Z, Wang T, Zhang Y, Zhan Y, Yin G (2016) Query reformulation by leveraging crowd wisdom for scenario-based software search. In: Proceedings of the Asia-Pacific symposium on internetware (Internetware’16), pp 36–44
Lu XA, Keefer RB (1995) Query expansion/reduction and its impact on retrieval effectiveness. NIST Special Publication, pp 231–231
Lucene Apache (2017) https://lucene.apache.org/
Lv F, Zhang H, Lou J-G, Wang S, Zhang D, Zhao J (2015) Codehow: effective code search based on api understanding and extended boolean model. In: Proceedings of the international conference on automated software engineering (ASE’15), pp 260–270
Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D (2014) The stanford corenlp natural language processing toolkit. In: Proceedings of the annual meeting of the association for computational linguistics (ACL’14), pp 55–60
Marcus A, Sergeyev A, Rajlich V, Maletic JI (2004) An information retrieval approach to concept location in source code. In: Proceedings of the working conference on reverse engineering (WCRE’04), pp 214–223
Marcus A, Haiduc S (2013) Text retrieval approaches for concept location in source code. In: Software Engineering: International Summer Schools, ISSSE 2009-2011, Salerno, Italy. Revised Tutorial Lectures, volume 7171 of Lecture Notes in Computer Science. Springer, pp 126–158
Mills C, Bavota G, Haiduc S, Oliveto R, Marcus A, De Lucia A (2017) Predicting query quality for applications of text retrieval to software engineering tasks. Trans Softw Eng Methodol 26(1):3:1–3:45
Mills C, Pantiuchina J, Parra E, Bavota G, Haiduc S (2018) Are bug reports enough for text retrieval-based bug localization? In: Proceedings of the 34th IEEE international conference on software maintenance and evolution (ICSME’18), pp 410–421
Moreno L, Treadway JJ, Marcus A, Shen W (2014) On the use of stack traces to improve text retrieval-based bug localization. In: Proceedings of the conference on software maintenance and evolution (ICSME’14), pp 151–160
Nguyen AT, Nguyen TT, Al-Kofahi J, Nguyen HV, Nguyen TN (2011) A topic-based approach for narrowing the search space of buggy files from a bug report. In: Proceedings of the international conference on automated software engineering (ASE’11), pp 263–272
Nichols BD (2010) Augmented bug localization using past bug information. In: Proceedings of the annual southeast regional conference (ACMSE’10), pp 1–6
Nie L, He J, Ren Z, Sun Z, Li X (2016) Query expansion based on crowd knowledge for code search. IEEE Trans Serv Comput 9(5):771–783
Ponzanelli L, Mocci A, Lanza M (2015) Stormed: stack overflow ready made data. In: Proceedings of the 12th working conference on mining software repositories (MSR’15), pp 474–477
Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137
Rahman MM, Roy CK (2016) Quickar: automatic query reformulation for concept location using crowdsourced knowledge. In: Proceedings of the international conference on automated software engineering (ASE’16), pp 220–225
Rahman MM, Roy CK (2017a) Strict: information retrieval based search term identification for concept location. In: Proceedings of the conference on software analysis, evolution, and reengineering (SANER’17), pp 79–90
Rahman MM, Roy CK (2017b) Improved query reformulation for concept location using coderank and document structures. In: Proceedings of the international conference on automated software engineering (ASE’17). IEEE Press, pp 428–439
Rahman Md M, Barson J, Paul S, Kayani J, Lois FA, Quezada SF, Parnin C, Stolee KT, Ray B (2018a) Evaluating how developers use general-purpose web-search for code retrieval. In: Proceedings of the 15th international conference on mining software repositories (MSR’18), pp 465–475
Rahman MM, Roy CK (2018b) Improving ir-based bug localization with context-aware query reformulation. In: Proceedings of the 26th joint meeting on foundations of software engineering (ESEC/FSE’18). (to appear)
Rao S, Kak A (2011) Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In: Proceedings of the working conference on mining software repositories (MSR’11), pp 43–52
Rath M, Lo D, Mäder P (2018) Analyzing requirements and traceability information to improve bug localization. In: Proceedings of the working conference on mining software repositories (MSR’18). ACM
Roldan-Vega M, Mallet G, Hill E, Fails JA (2013) Conquer: a tool for nl-based query refinement and contextualizing code search results. In: Proceedings of the international conference on software maintenance (ICSM’13), pp 512–515
Saha RK, Lease M, Khurshid S, Perry DE (2013) Improving bug localization using structured information retrieval. In: Proceedings of the international conference on automated software engineering (ASE’13), pp 345–355
Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620
Seaman CB (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng 25(4):557–572
Shepherd D, Fry ZP, Hill E, Pollock L, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Proceedings of the international conference on aspect-oriented software development (AOSD’07), pp 212–224
Shi Z, Keung J, Bennin KE, Zhang X (2018) Comparing learning to rank techniques in hybrid bug localization. Appl Soft Comput 62:636–648
Sim SE, Umarji M, Ratanotayanon S, Lopes CV (2011) How well do search engines support code retrieval on the web? ACM Trans Softw Eng Methodol 21(1):4
Sisman B, Kak AC (2012) Incorporating version histories in information retrieval based bug localization. In: Proceedings of the working conference on mining software repositories (MSR’12), pp 50–59
Sisman B, Kak AC (2013) Assisting code search with automatic query reformulation for bug localization. In: Proceedings of the working conference on mining software repositories (MSR’13), pp 309–318
Sisman B, Akbar SA, Kak AC (2016) Exploiting spatial code proximity and order for improved source code retrieval for bug localization. J Softw Evol Process 29(1):e1805
Starke J, Luce C, Sillito J (2009) Searching and skimming: an exploratory study. In: Proceedings of the international conference on software maintenance (ICSM’09), pp 157–166
Takahashi A, Sae-Lim N, Hayashi S, Saeki M (2018) Preliminary study on using code smells to improve bug localization. In: Proceedings of the international conference on program comprehension (ICPC’18). ACM, p 4
Wang S, Lo D (2014) Version history, similar report, and structure: putting them together for improved bug localization. In: Proceedings of the 22nd international conference on program comprehension (ICPC’14), pp 53–63
Wang S, Lo D, Lawall J (2014a) Compositional vector space models for improved bug localization. In: Proceedings of the conference on software maintenance and evolution (ICSME’14), pp 171–180
Wang S, Lo D, Jiang L (2014b) Active code search: incorporating user feedback to improve code search relevance. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering (ASE’14), pp 677–682
Wang S, Lo D (2016) Amalgam+: composing rich information sources for accurate bug localization. J Softw Evol Process 28(10):921–942
Wen M, Wu R, Cheung S (2016) Locus: locating bugs from software changes. In: Proceedings of the 31st international conference on automated software engineering (ASE’16), pp 262–273
Wong C-P, Xiong Y, Zhang H, Hao D, Lu Z, Mei H (2014) Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: Proceedings of the conference on software maintenance and evolution (ICSME’14), pp 181–190
Xiao Y, Keung J, Bennin KE, Mi Q (2018) Improving bug localization with word embedding and enhanced convolutional neural networks. Information and Software Technology
Ye X, Bunescu R, Liu C (2016a) Mapping bug reports to relevant files: a ranking model, a fine-grained benchmark, and feature evaluation. IEEE Trans Softw Eng 42(4):379–402
Ye X, Shen H, Ma X, Bunescu R, Liu C (2016b) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the international conference on software engineering (ICSE’16), pp 404–415
Youm KC, Ahn J, Lee E (2017) Improved bug localization based on code change histories and bug reports. Inf Softw Technol 82:177–192
Zhang Y, Lo D, Xia X, Le TDB, Scanniello G, Sun J (2016) Inferring links between concerns and methods with multi-abstraction vector space model. In: Proceedings of the international conference on software maintenance and evolution (ICSME’16), pp 110–121
Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: Proceedings of the international conference on software engineering (ICSE’12), pp 14–24
Yu Z, Tong Y, Chen T, Han J (2017) Augmenting bug localization with part-of-speech and invocation. Int J Softw Eng Knowl Eng 27(6):925–949
Zimmermann T, Premraj R, Bettenburg N, Just S, Schröter A, Weiss C (2010) What makes a good bug report? IEEE Trans Softw Eng 36(5):618–643
Acknowledgments
This research was supported in part by the grants CCF-1848608 and CCF-1526118 from the US National Science Foundation.
Additional information
Communicated by: Lu Zhang, Thomas Zimmermann, Xin Peng and Hong Mei
About this article
Cite this article
Chaparro, O., Florez, J.M. & Marcus, A. Using bug descriptions to reformulate queries during text-retrieval-based bug localization. Empir Software Eng 24, 2947–3007 (2019). https://doi.org/10.1007/s10664-018-9672-z
DOI: https://doi.org/10.1007/s10664-018-9672-z