research-article

Feature location using crowd-based screencasts

Authors:

Parisa Moslehi,

Juergen Rilling

MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories

Pages 192 - 202

https://doi.org/10.1145/3196398.3196439

Published: 28 May 2018

Abstract

Crowd-based multi-media documents such as screencasts have emerged as a source for documenting requirements of agile software projects. For example, screencasts can describe buggy scenarios of a software product or present new features in an upcoming release. Unfortunately, the binary format of videos makes traceability between the video content and other related software artifacts (e.g., source code, bug reports) difficult. In this paper, we propose an LDA-based feature location approach that takes as input a set of screencasts (i.e., their GUI text and/or spoken words) and establishes traceability links between the features described in the screencasts and the source code fragments implementing them. We report on a case study conducted on 10 WordPress screencasts to evaluate the applicability of our approach in linking these screencasts to their relevant source code artifacts. We find that the approach is able to pinpoint relevant source code files within the top 10 hits using speech and GUI text. We also find that term frequency rebalancing can reduce noise and yield more precise results.
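To make the idea of LDA-based feature location concrete, the following is a minimal sketch and not the pipeline used in the paper. It assumes that the screencast text (speech transcript and/or GUI text) is already available as a plain string, fits a small LDA model jointly over the source files and the screencast, and ranks files by the cosine similarity of their topic mixtures to that of the screencast. The file names, file contents, hyperparameters, and the functions `tokenize`, `fit_lda`, and `rank_files` are all hypothetical; term frequency rebalancing is not modeled here.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic mixture per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed per-document topic proportions (each row sums to 1).
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def cosine(a, b):
    """Cosine similarity between two topic mixtures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_files(screencast_text, files, num_topics=2, top_k=10):
    """Rank source files by topic similarity to the screencast text."""
    names = list(files)
    # Assumption: the screencast is modeled as one extra document in the corpus.
    docs = [tokenize(files[name]) for name in names] + [tokenize(screencast_text)]
    thetas = fit_lda(docs, num_topics)
    query = thetas[-1]
    scored = [(name, cosine(query, thetas[i])) for i, name in enumerate(names)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy corpus: hypothetical file names with invented identifier/comment text.
files = {
    "comments.php": "comment moderation approve spam trash comment reply",
    "media.php": "image upload media gallery thumbnail attachment image",
    "widgets.php": "widget sidebar register widget area drag",
}
screencast = "click upload to add an image to the media gallery"

ranking = rank_files(screencast, files)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

With a corpus this small the ranking mostly reflects shared vocabulary, and the result depends on the random seed. A realistic setting would need many more files, stop-word removal, identifier splitting, and a tuned number of topics.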

Index Terms

Feature location using crowd-based screencasts
1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation
    2. Software verification and validation
      1. Process validation
        Traceability

Published In

MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories

May 2018

627 pages

ISBN:9781450357166

DOI:10.1145/3196398

General Chair: Andy Zaidman (Delft University of Technology, Netherlands)

Program Chairs: Yasutaka Kamei (Kyushu University, Japan), Emily Hill (Drew University)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '18

Sponsor:

SIGSOFT
IEEE-CS

ICSE '18: 40th International Conference on Software Engineering

May 28 - 29, 2018

Gothenburg, Sweden

Bibliometrics

Total Citations: 14
Total Downloads: 161
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 21 Nov 2024

Cited By

Alahmadi M, Alshangiti M (2024). Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models. Mathematics 12:7 (1036). Online publication date: 30-Mar-2024
https://doi.org/10.3390/math12071036
Lin J, Sayagh M, Hassan A (2023). The Co-evolution of the WordPress Platform and Its Plugins. ACM Transactions on Software Engineering and Methodology 32:1 (1-24). Online publication date: 13-Feb-2023
https://dl.acm.org/doi/10.1145/3533700
Malkadi A, Tayeb A, Haiduc S (2023). Improving Code Extraction from Coding Screencasts Using a Code-Aware Encoder-Decoder Model. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (1492-1504). Online publication date: 11-Sep-2023
https://doi.org/10.1109/ASE56229.2023.00184
Nayebi M, Adams B (2023). Image-based communication on social coding platforms. Journal of Software: Evolution and Process 36:5. Online publication date: 28-Aug-2023
https://dl.acm.org/doi/10.1002/smr.2609
Alahmadi M (2022). VID2META: Complementing Android Programming Screencasts with Code Elements and GUIs. Mathematics 10:17 (3175). Online publication date: 3-Sep-2022
https://doi.org/10.3390/math10173175
Moslehi P, Rilling J, Adams B (2022). A user survey on the adoption of crowd-based software engineering instructional screencasts by the new generation of software developers. Journal of Systems and Software 185:C. Online publication date: 1-Mar-2022
https://dl.acm.org/doi/10.1016/j.jss.2021.111144
Silva C, Galster M, Gilson F (2021). Topic modeling in software engineering research. Empirical Software Engineering 26:6. Online publication date: 1-Nov-2021
https://dl.acm.org/doi/10.1007/s10664-021-10026-0
Bao L, Xing Z, Xia X, Lo D, Wu M, Yang X (2020). psc2code. ACM Transactions on Software Engineering and Methodology 29:3 (1-38). Online publication date: 1-Jun-2020
https://dl.acm.org/doi/10.1145/3392093
Alahmadi M, Malkadi A, Haiduc S (2020). UI Screens Identification and Extraction from Mobile Programming Screencasts. Proceedings of the 28th International Conference on Program Comprehension (319-330). Online publication date: 13-Jul-2020
https://dl.acm.org/doi/10.1145/3387904.3389265
Malkadi A, Alahmadi M, Haiduc S (2020). A Study on the Accuracy of OCR Engines for Source Code Transcription from Programming Screencasts. Proceedings of the 17th International Conference on Mining Software Repositories (65-75). Online publication date: 29-Jun-2020
https://dl.acm.org/doi/10.1145/3379597.3387468