Nothing Special   »   [go: up one dir, main page]

skip to main content
article

An experimental investigation on the effects of context on source code identifiers splitting and expansion

Published: 01 December 2014 Publication History

Abstract

Recent and past studies indicate that source code lexicon plays an important role in program comprehension. Developers often compose source code identifiers with abbreviated words and acronyms, and do not always use consistent mechanisms and explicit separators when creating identifiers. Such choices and inconsistencies impede the work of developers that must understand identifiers by decomposing them into their component terms, and mapping them onto dictionary, application or domain words. When software documentation is scarce, outdated or simply not available, developers must therefore use the available contextual information to understand the source code. This paper aims at investigating how developers split and expand source code identifiers, and, specifically, the extent to which different kinds of contextual information could support such a task. In particular, we consider (i) an internal context consisting of the content of functions and source code files in which the identifiers are located, and (ii) an external context involving external documentation. We conducted a family of two experiments with 63 participants, including bachelor, master, Ph.D. students, and post-docs. We randomly sampled a set of 50 identifiers from a corpus of open source C programs and we asked participants to split and expand them with the availability (or not) of internal and external contexts. We report evidence on the usefulness of contextual information for identifier splitting and acronym/abbreviation expansion. We observe that the source code files are more helpful than just looking at function source code, and that the application-level contextual information does not help any further. The availability of external sources of information only helps in some circumstances. Also, in some cases, we observe that participants better expanded acronyms than abbreviations, although in most cases both exhibit the same level of accuracy. Finally, results indicated that the knowledge of English plays a significant effect in identifier splitting/expansion. The obtained results confirm the conjecture that contextual information is useful in program comprehension, including when developers split and expand identifiers to understand them. We hypothesize that the integration of identifier splitting and expansion tools with IDE could help to improve developers' productivity.

References

[1]
Anquetil N, Lethbridge T (1998) Assessing the relevance of identifier names in a legacy software system. In: Proceedings of CASCON, pp 213-222.
[2]
Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28:970-983.
[3]
Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley.
[4]
Baker RD (1995) Modern permutation test software. In: Edgington EG (ed) Randomization tests. Marcel Decker.
[5]
Basili V, Caldiera G, Rombach DH (1994) The goal question metric paradigm encyclopedia of software engineering. John Wiley and Sons.
[6]
Binkley D, Davis M, Lawrie D, Morrell C (2009) To camelcase or under_score. In: The 17th IEEE international conference on program comprehension, ICPC 2009. Vancouver, British Columbia, Canada, May 17-19, 2009. IEEE Computer Society, pp 158-167.
[7]
Binkley D, Davis M, Lawrie D, Maletic JI, Morrell C, Sharif B (2013) The impact of identifier style on effort and comprehension. Empir Software Eng 2(18):219-276.
[8]
Caprile B, Tonella P (1999) Nomen est omen: analyzing the language of function identifiers. In: Proc. of the working conference on reverse engineering (WCRE). Atlanta, Georgia, USA, pp 112-122.
[9]
Caprile B, Tonella P (2000) Restructuring program identifier names. In: Proc. of the International Conference on Software Maintenance (ICSM), pp 97-107.
[10]
Deißenböck F, Pizka M (2005) Concise and consistent naming. In: Proc. of the International Workshop on Program Comprehension (IWPC).
[11]
Dit B, Guerrouj L, Poshyvanyk D, Antoniol G (2011) Can better identifier splitting techniques help feature location? In: Proc. of the International Conference on Program Comprehension (ICPC). Kingston, pp 11-20.
[12]
Enslen E, Hill E, Pollock LL, Vijay-Shanker K (2009) Mining source code to automatically split identifiers for software analysis. In: Proceedings of the 6th international working conference on mining software repositories, MSR 2009. Vancouver, BC, Canada, May 16-17, pp 71-80.
[13]
Grissom RJ, Kim JJ (2005) Effect sizes for research: a broad practical approach, 2nd edn. Lawrence Earlbaum Associates.
[14]
Guerrouj L, Di Penta M, Antoniol G, Guéhéneuc YG (2013) TIDIER: an identifier splitting approach using speech recognition techniques. J Softw Evol Process 25(6):569-661.
[15]
Holm A (1979) A simple sequentially rejective Bonferroni test procedure. Scand J Stat 6:65-70.
[16]
Kersten M, Murphy GC (2006) Using task context to improve programmer productivity. In: SIGSOFT '06/FSE-14: proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering. ACM Press, Portland, Oregon, pp 1-11.
[17]
Lawrie D, Binkley D (2011) Expanding identifiers to normalize source code vocabulary. In: Proc. of the International Conference on Software Maintenance (ICSM), pp 113-122.
[18]
Lawrie D, Feild H, Binkley D (2006a) Syntactic identifier conciseness and consistency. In: 6th IEEE international workshop on source code analysis and manipulation. Philadelphia, Pennsylvania, USA, pp 139-148.
[19]
Lawrie D, Morrell C, Feild H, Binkley D (2006b) What's in a name? A study of identifiers. In: Proceedings of 14th IEEE international conference on program comprehension. IEEE CS Press, Athens, pp 3-12.
[20]
Lawrie D, Morrell C, Feild H, Binkley D (2007) Effective identifier names for comprehension and memory. Innov Syst Softw Eng 3(4):303-318.
[21]
Lawrie DJ, Binkley D, Morrell C (2010) Normalizing source code vocabulary. In: Proc. of the Working Conference on Reverse Engineering (WCRE), pp 112-122.
[22]
Liu D, Marcus A, Poshyvanyk D, Rajlich V (2007) Feature location via information retrieval based filtering of a single scenario execution trace. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering. ACM, New York, NY, pp 234-243.
[23]
Madani N, Guerrouj L, Di Penta M, Guéhéneuc Y-G, Antoniol G (2010) Recognizing words from source code identifiers using speech recognition techniques. In: Proceedings of the conference on software maintenance and reengineering. IEEE, pp 69-78.
[24]
Maletic JI, Marcus A (2001) Supporting program comprehension using semantic and structural information. In: Proc. of 23rd international conference on software engineering. Toronto, pp 103-112.
[25]
Marc E, Alfred A, Giuliano A, Guéhéneuc Y-G (2008) Cerberus: tracing requirements to source code using information retrieval dynamic analysis and program analysis. In: ICPC '08: Proceedings of the 2008 the 16th IEEE international conference on program comprehension. IEEE Computer Society, Washington DC pp 53-62.
[26]
Marcus A, Maletic JI, Sergeyev A (2005) Recovery of traceability links between software documentation and source code. Int J Softw Eng Knowl Eng 15(5):811-836.
[27]
Merlo E, McAdam I, De Mori R (2003) Feed-forward and recurrent neural networks for source code informal information analysis. J Softw Maint 15(4):205-244.
[28]
Ney H (1984) The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Trans Acoust Speech Signal Process 32(2):263-271.
[29]
Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans Software Eng 33(6):420-432.
[30]
R Core Team (2012) R: a language and environment for statistical computing. Vienna, Austria. ISBN 3-900051-07-0.
[31]
Ricca F, Di Penta M, Torchiano M, Tonella P, Ceccato M (2010) How developers' experience and ability influence web application comprehension tasks supported by uml stereotypes: a series of four experiments. IEEE Trans Softw Eng 36(1):96-118.
[32]
Robillard MP, Coelho W, Murphy GC (2004) How effective developers investigate source code: anexploratory study. IEEE Trans Softw Eng 30(12):889-903.
[33]
Sharif B, Maletic JI (2010) An eye tracking study on camelcase and under_score identifier styles. In: Proceedings of the international conference on program comprehension, pp 196-205.
[34]
Sheskin DJ (2007) Handbook of parametric and nonparametric statistical procedures, 4th edn. Chapman & Hall.
[35]
Sillito J, Murphy GC, De Volder K (2008) Asking and answering questions during a programming change task. IEEE Trans Softw Eng 34:434-451.
[36]
Soloway E, Bonar J, Ehrlich K (1983) Cognitive strategies and looping constructs: an empirical study. Commun ACM 26(11):853-860.
[37]
Storey MAD (1998) A cognitive framework for describing and evaluating software exploration tools. PhD thesis Simon Fraser University.
[38]
Takang A, Grubb PA, Macredie RD (1996) The effects of comments and identifier names on program comprehensibility: an experiential study. J Program Lang 4(3):143-167.
[39]
von Mayrhauser A, Vans AM (1995) Program comprehension during software maintenance and evolution. IEEE Comput 28(8):44-55.
[40]
Wohlin C, Runeson P, Host M, Ohlsson MC, Regnell B, Wesslen A (2000) Experimentation in software engineering--an introduction. Kluwer Academic Publishers.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Empirical Software Engineering
Empirical Software Engineering  Volume 19, Issue 6
December 2014
531 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 December 2014

Author Tags

  1. Identifier splitting and expansion
  2. Program understanding
  3. Task context

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 28 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Revisiting file context for source code summarizationAutomated Software Engineering10.1007/s10515-024-00460-x31:2Online publication date: 1-Nov-2024
  • (2023)40 Years of Designing Code Comprehension Experiments: A Systematic Mapping StudyACM Computing Surveys10.1145/362652256:4(1-42)Online publication date: 9-Nov-2023
  • (2022)Investigating replication challenges through multiple replications of an experimentInformation and Software Technology10.1016/j.infsof.2022.106870147:COnline publication date: 1-Jul-2022
  • (2017)AnalyzaProceedings of the 22nd International Conference on Intelligent User Interfaces10.1145/3025171.3025227(493-504)Online publication date: 7-Mar-2017
  • (2015)Could we infer unordered API usage patterns only using the library source code?Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension10.5555/2820282.2820294(71-81)Online publication date: 16-May-2015

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media