Link analysis algorithms for static concept location: an empirical assessment

Published: 01 December 2015

Abstract

During software evolution, one of the most important comprehension activities is concept location in source code, as it identifies the places in the code where changes are to be made in response to a modification request. Change requests (such as bug fixes or new feature requests) are usually formulated in natural language, while the source code also includes large amounts of text. Consequently, many existing concept location techniques are based on text search or text retrieval. Such approaches reformulate concept location as a document retrieval problem. We refine and improve such solutions by leveraging dependencies between source code elements. Dependency information is used by a link analysis algorithm to rank the document space and to improve concept location based on text retrieval. We implemented our solution to concept location using the PageRank algorithm, which is also used in web document retrieval applications. The results of an empirical evaluation indicate that the new approach leads to better retrieval performance than baseline approaches that use text retrieval and clustering. In addition, we present the results of a controlled experiment and of a differentiated replication to assess whether the new technique supports users in identifying the places in the code where changes are to be made. The results of these experiments revealed that users exploiting our technique were significantly better supported in identifying the code to be changed in response to a bug-fixing request than users who did not use this technique.
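The general idea of ranking source code documents with a link analysis algorithm can be illustrated with a minimal sketch. The code below is not the authors' implementation: the edge direction (an element points to the elements it depends on), the damping factor, the linear score combination in the hypothetical `rerank` function, and the class names in the example are all assumptions made for illustration. It computes PageRank by power iteration over a small dependency graph and blends the result with text-retrieval similarity scores.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; graph maps each node to the nodes it depends on."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def rerank(text_scores: Dict[str, float], graph: Dict[str, List[str]],
           weight: float = 0.5) -> List[str]:
    """Blend text similarity with normalized PageRank (an assumed linear combination)."""
    ranks = pagerank(graph)
    top = max(ranks.values())
    combined = {
        doc: (1.0 - weight) * score + weight * ranks.get(doc, 0.0) / top
        for doc, score in text_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical dependency graph and query similarity scores.
dependencies = {
    "Compiler": ["Parser", "Ast"],
    "Parser": ["Lexer", "Ast"],
    "Lexer": [],
    "Ast": [],
}
similarities = {"Lexer": 0.60, "Ast": 0.55, "Compiler": 0.10}
ranking = rerank(similarities, dependencies)
```

In this toy example, `Ast` is depended on by two other elements, so its link-based score lifts it above `Lexer` even though `Lexer` has a slightly higher text similarity. How the two signals should be weighted in practice is an open design choice that this sketch does not settle.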

Published In

Empirical Software Engineering, Volume 20, Issue 6
December 2015
491 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Concept location
  2. Controlled experiments
  3. Empirical study
  4. Experiments
  5. Information retrieval

Cited By

  • (2023) A Systematic Review of Automated Query Reformulations in Source Code Search. ACM Transactions on Software Engineering and Methodology 32:6 (1-79). DOI: 10.1145/3607179. Online publication date: 4-Jul-2023
  • (2023) Information Retrieval-Based Fault Localization for Concurrent Programs. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (1467-1479). DOI: 10.1109/ASE56229.2023.00122. Online publication date: 11-Nov-2023
  • (2023) Applications of natural language processing in software traceability. Journal of Systems and Software 198:C. DOI: 10.1016/j.jss.2023.111616. Online publication date: 1-Apr-2023
  • (2023) BTLink: automatic link recovery between issues and commits based on pre-trained BERT model. Empirical Software Engineering 28:4. DOI: 10.1007/s10664-023-10342-7. Online publication date: 12-Jul-2023
  • (2022) Retrieving data constraint implementations using fine-grained code patterns. Proceedings of the 44th International Conference on Software Engineering (1893-1905). DOI: 10.1145/3510003.3510167. Online publication date: 21-May-2022
  • (2022) The Effect of Feature Characteristics on the Performance of Feature Location Techniques. IEEE Transactions on Software Engineering 48:6 (2066-2085). DOI: 10.1109/TSE.2021.3049735. Online publication date: 1-Jun-2022
  • (2022) A model-based approach for specifying changes in replications of empirical studies in computer Science. Computing 105:6 (1189-1213). DOI: 10.1007/s00607-022-01133-x. Online publication date: 3-Dec-2022
  • (2020) An empirical assessment of baseline feature location techniques. Empirical Software Engineering 25:1 (266-321). DOI: 10.1007/s10664-019-09734-5. Online publication date: 1-Jan-2020
  • (2018) The State of Empirical Evaluation in Static Feature Location. ACM Transactions on Software Engineering and Methodology 28:1 (1-58). DOI: 10.1145/3280988. Online publication date: 5-Dec-2018
  • (2018) Bug Localization with Semantic and Structural Features using Convolutional Neural Network and Cascade Forest. Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018 (101-111). DOI: 10.1145/3210459.3210469. Online publication date: 28-Jun-2018
