
An experimental investigation on the effects of context on source code identifiers splitting and expansion

Authors:

Latifa Guerrouj,

Massimiliano Penta,

Yann-Gaël Guéhéneuc,

Giuliano Antoniol

Empirical Software Engineering, Volume 19, Issue 6

Pages 1706 - 1753

https://doi.org/10.1007/s10664-013-9260-1

Published: 01 December 2014

Abstract

Recent and past studies indicate that the source code lexicon plays an important role in program comprehension. Developers often compose source code identifiers from abbreviated words and acronyms, and do not always use consistent mechanisms and explicit separators when creating identifiers. Such choices and inconsistencies impede the work of developers who must understand identifiers by decomposing them into their component terms and mapping those terms onto dictionary, application, or domain words. When software documentation is scarce, outdated, or simply not available, developers must therefore use the available contextual information to understand the source code. This paper investigates how developers split and expand source code identifiers and, specifically, the extent to which different kinds of contextual information can support such a task. In particular, we consider (i) an internal context consisting of the content of the functions and source code files in which the identifiers are located, and (ii) an external context involving external documentation. We conducted a family of two experiments with 63 participants, including bachelor's, master's, and Ph.D. students and post-docs. We randomly sampled a set of 50 identifiers from a corpus of open source C programs and asked participants to split and expand them with or without access to internal and external contexts. We report evidence on the usefulness of contextual information for identifier splitting and acronym/abbreviation expansion. We observe that source code files are more helpful than function source code alone, and that application-level contextual information does not help any further. The availability of external sources of information helps only in some circumstances. Also, in some cases, we observe that participants expanded acronyms better than abbreviations, although in most cases both exhibit the same level of accuracy. Finally, the results indicate that knowledge of English has a significant effect on identifier splitting and expansion. The obtained results confirm the conjecture that contextual information is useful in program comprehension, including when developers split and expand identifiers to understand them. We hypothesize that integrating identifier splitting and expansion tools with IDEs could help improve developers' productivity.

