Abstract
Although bug reports are frequently consulted project assets, they are communication logs, by-products of bug resolution, and not artifacts created with the intent of being easy to follow. To facilitate bug report digestion, we propose a new unsupervised bug report summarization approach that estimates the attention a user would hypothetically give to different sentences in a bug report when pressed for time. We pose three hypotheses on what makes a sentence relevant: discussing frequently discussed topics, being evaluated or assessed by other sentences, and keeping focused on the bug report’s title and description. Our results suggest that our hypotheses are valid, since the summaries improve by as much as 12% on standard summarization evaluation metrics compared to the previous approach. Our evaluation also asks developers to assess the quality and usefulness of the summaries created for bug reports they have worked on. Feedback from developers not only shows that the summaries are useful, but also points out important requirements for this and any other bug report summarization approach, and indicates directions for future work.
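To make the idea of estimating attention per sentence more concrete, the following is a minimal sketch, not the approach evaluated in the article. It assumes a PageRank-style random walk over sentences, with word overlap (Jaccard similarity) standing in for shared topics and a restart step biased towards sentences that overlap with the title. The hypothesis about sentences being evaluated by other sentences is not modelled here, and all function names and parameter values (`rank_sentences`, `summarize`, `damping`, `iterations`) are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word set; a crude stand-in for real preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(title, sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style random walk (illustrative only).

    Links between sentences are weighted by word overlap (shared topics).
    The restart distribution favours sentences that overlap with the title.
    """
    n = len(sentences)
    if n == 0:
        return []
    toks = [tokens(s) for s in sentences]
    title_toks = tokens(title)

    # Restart distribution: proportional to title overlap, uniform fallback.
    focus = [jaccard(t, title_toks) for t in toks]
    total = sum(focus)
    restart = [f / total for f in focus] if total > 0 else [1.0 / n] * n

    # Pairwise similarity weights between distinct sentences.
    weights = [
        [jaccard(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) * r for r in restart]
        for i in range(n):
            row_sum = sum(weights[i])
            for j in range(n):
                # A sentence with no links passes its score to the restart distribution.
                share = weights[i][j] / row_sum if row_sum > 0 else restart[j]
                new[j] += damping * scores[i] * share
        scores = new
    return scores


def summarize(title, sentences, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = rank_sentences(title, sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize("Crash when saving file", comments, k=2)` would return the two sentences in the hypothetical list `comments` that the walk visits most often. A faithful implementation would also need the evaluation links between sentences and a proper topic model, both of which this sketch leaves out.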
Additional information
Communicated by: Massimiliano Di Penta and Jonathan Maletic
About this article
Cite this article
Lotufo, R., Malik, Z. & Czarnecki, K. Modelling the ‘hurried’ bug report reading process to summarize bug reports. Empir Software Eng 20, 516–548 (2015). https://doi.org/10.1007/s10664-014-9311-2
DOI: https://doi.org/10.1007/s10664-014-9311-2