DOI: 10.1109/ICSE-Companion.2019.00088
research-article

Supporting code search with context-aware, analytics-driven, effective query reformulation

Published: 25 May 2019

Abstract

Software developers often struggle to prepare appropriate queries for code search. A recent finding suggests that developers fail to choose the right search keywords from an issue report 88% of the time. Thus, despite a number of earlier studies, automatic query reformulation for code search remains an open problem that warrants further investigation. In this dissertation work, we hypothesize that code search can be improved by adopting appropriate term weighting, context-awareness, and data analytics in query reformulation. We ask three research questions to evaluate this hypothesis and conduct six studies to answer them. Our proposed approaches improve code search by incorporating (1) novel, appropriate keyword selection algorithms, (2) context-awareness, (3) crowdsourced knowledge from Stack Overflow, and (4) large-scale data analytics into the query reformulation process.


Cited By

  • (2023) Survey of Code Search Based on Deep Learning. ACM Transactions on Software Engineering and Methodology 33:2 (1-42). DOI: 10.1145/3628161. Online publication date: 23-Dec-2023

Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings
May 2019
369 pages

Publisher

IEEE Press

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
