Improving automated source code summarization via an eye-tracking study of programmers

Published: 31 May 2014


Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the process of selecting the subset: a high-quality selection would contain the same statements and keywords that a programmer would choose. Unfortunately, little evidence exists about the statements and keywords that programmers view as important when they summarize source code. In this paper, we present an eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods. We apply the findings to build a novel summarization tool. Then, we evaluate this tool and provide evidence to support the development of source code summarization systems.


Program comprehension is a current and practical software engineering problem. Software engineers might try to comprehend a program by reading all of its source code, which is usually a substantial task. This paper presents code summarization as an alternative to time-consuming and subjective comprehension of the entire program. Its focus is an automated code summarization tool that is applied during the summary-building process.

An interesting aspect of the authors' contribution is the inclusion of a grounded theory approach (via eye-tracking studies) that provides empirical evidence about the kinds of keywords programmers use to build summaries. Through eye movements and heat paths, the authors provide new evidence (for example, a method's signature is the most critical part), contradict previous beliefs regarding programmer behavior (for example, control flow is not as important as has been suggested in the past), and use the empirical evidence to develop an automated tool that builds program summaries the way professional programmers would. The tool extracts the keywords from the code and counts how many times each keyword occurs. These keywords are then weighted according to where in the code (for example, in the signature or in control flow) they occur. The weights mirror the way the programmers read these keywords and are validated through the eye-tracking study.

This work would interest researchers trying to further improve program comprehension, as well as industry professionals who can use the tool (and provide feedback) to improve source code summarization in their organizations. The work is relevant and generalizable because it uses behavioral research methods to understand programmer behavior during source code summarization, and then empirically validates the resulting tool through experimentation with professional software developers.
Online Computing Reviews Service
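
The location-based keyword weighting described in the review can be illustrated with a minimal sketch. This is not the authors' tool: the function name `rank_keywords`, the location labels, and the numeric weights are all hypothetical, chosen only to reflect the reported finding that signature terms matter most and control-flow terms matter less than previously assumed.

```python
from collections import defaultdict

# Hypothetical weights (assumption): the source gives no numeric values,
# only that signature keywords are the most important.
LOCATION_WEIGHTS = {"signature": 3.0, "control_flow": 1.0, "other": 1.0}


def rank_keywords(occurrences, weights=LOCATION_WEIGHTS, top_k=3):
    """Rank keywords by location-weighted frequency.

    occurrences: iterable of (keyword, location) pairs, one pair per
    occurrence of the keyword in the method being summarized.
    """
    scores = defaultdict(float)
    for keyword, location in occurrences:
        # Each occurrence contributes the weight of the place it appears in;
        # unknown locations fall back to a neutral weight of 1.0.
        scores[keyword.lower()] += weights.get(location, 1.0)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Given one occurrence of `read` and one of `file` in the signature, plus body occurrences of `file`, `line`, and `buffer` and a control-flow occurrence of `line`, the sketch would rank `file` first because it combines a signature hit with a body hit. A real summarizer would also need a keyword extractor for Java source, which is outside the scope of this sketch.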

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
May 2014
1139 pages
Author Tags

  1. program comprehension
  2. source code summaries


