research-article

Supporting code search with context-aware, analytics-driven, effective query reformulation

Author:

Mohammad Masudur Rahman

ICSE '19: Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings

Pages 226 - 229

https://doi.org/10.1109/ICSE-Companion.2019.00088

Published: 25 May 2019

Abstract

Software developers often have difficulty preparing appropriate queries for code search. A recent finding suggests that developers fail to choose the right search keywords from an issue report 88% of the time. Thus, despite a number of earlier studies, automatic query reformulation for code search remains an open problem that warrants further investigation. In this dissertation work, we hypothesize that code search could be improved by adopting appropriate term weighting, context-awareness, and data analytics in query reformulation. We ask three research questions to evaluate the hypothesis, and then conduct six studies to answer these questions. Our proposed approaches improve code search by incorporating (1) novel, appropriate keyword selection algorithms, (2) context-awareness, (3) crowdsourced knowledge from Stack Overflow, and (4) large-scale data analytics into the query reformulation process.
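
To make the idea of term weighting for keyword selection concrete, the following is a minimal sketch that ranks the terms of an issue report by a plain TF-IDF score and keeps the top few as a search query. It is not the dissertation's algorithm, which the abstract does not describe; TF-IDF is swapped in here only as a familiar baseline, and the function names, the stop-word list, and the smoothing formula are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "in", "to", "of", "and",
              "when", "it", "on", "with", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def select_keywords(report, corpus, k=5):
    """Return the k terms of `report` with the highest TF-IDF score.

    `corpus` is a list of background documents (for example, other issue
    reports) used only to estimate how common each term is.
    """
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)
    term_freq = Counter(tokenize(report))

    scores = {}
    for term, freq in term_freq.items():
        doc_freq = sum(1 for d in docs if term in d)
        # Smoothed inverse document frequency (one common variant, assumed here).
        idf = math.log((n + 1) / (doc_freq + 1)) + 1.0
        scores[term] = freq * idf

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


if __name__ == "__main__":
    background = [
        "parser throws exception on empty file",
        "button click opens dialog",
        "dialog closes on escape",
    ]
    issue = ("Parser crashes with NullPointerException when the parser "
             "reads an empty config file")
    print(" ".join(select_keywords(issue, background, k=3)))
```

A term that is repeated in the report and rare in the background documents rises to the top, which is the basic intuition behind weighting terms before reformulating a query. The approaches summarized in the abstract additionally draw on context, Stack Overflow knowledge, and large-scale analytics, none of which this sketch attempts.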


Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings

May 2019, 369 pages

Conference Chair: Gunter Mussbacher, McGill University, Canada
General Chair: Joanne M. Atlee, University of Waterloo, Canada
Program Chair: Tevfik Bultan, University of California, Santa Barbara

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

IEEE Press

Conference

ICSE '19: 41st International Conference on Software Engineering

May 25 - 31, 2019

Montreal, Quebec, Canada

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
