short-paper

On the similarity of software development documentation

Author:

Mathias Ellmann

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pages 1030 - 1033

https://doi.org/10.1145/3106237.3119875

Published: 21 August 2017

Abstract

Software developers spend 20% of their time seeking information on Stack Overflow, on YouTube, or in API reference documentation. They can search within Stack Overflow for duplicate or similar posts. They can also look at software development documentation that contains information similar to, or additional to, a Stack Overflow post or a development screencast in order to find new inspiration for solving their current development problem. Linking software development documentation of the same and of different types might save time when developing new software solutions and might increase developers' daily productivity. In this paper, we discuss our approach to gaining a broader understanding of the different similarity types (exact, similar, and maybe) within and between software documentation, as well as an understanding of how different software documentation can be extended.
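To make the three similarity types more concrete, the following minimal sketch assigns one of them to a pair of documentation texts. It is an illustration under stated assumptions, not the approach of the paper: the use of term-frequency cosine similarity, the thresholds 0.6 and 0.3, the extra label "unrelated", and all function names are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_type(text_a, text_b, similar_at=0.6, maybe_at=0.3):
    """Label a pair of texts as exact, similar, maybe, or unrelated.

    The thresholds and the "unrelated" label are assumptions for this
    sketch; they are not taken from the paper.
    """
    if tokenize(text_a) == tokenize(text_b):
        return "exact"  # same token sequence, ignoring case and punctuation
    score = cosine_similarity(text_a, text_b)
    if score >= similar_at:
        return "similar"
    if score >= maybe_at:
        return "maybe"
    return "unrelated"


if __name__ == "__main__":
    post = "parse json file java"
    for other in ("Parse JSON file, Java", "parse json string java",
                  "parse xml file python", "cooking pasta recipe"):
        print(other, "->", similarity_type(post, other))
```

A purely lexical measure like this cannot recognise documents that express the same content in different words, so in practice the labels would need to be validated against human judgement.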

Published In

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

August 2017

1073 pages

ISBN:9781450351058

DOI:10.1145/3106237

General Chairs: Eric Bodden (Paderborn University, Germany / Fraunhofer IEM, Germany), Wilhelm Schäfer (Paderborn University, Germany)

Program Chairs: Arie van Deursen (Delft University of Technology, Netherlands), Andrea Zisman (Open University, UK)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

ESEC/FSE'17

Sponsor:

SIGSOFT

ESEC/FSE'17: Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering

September 4 - 8, 2017

Paderborn, Germany

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
