DOI: 10.1145/3196398.3196439
research-article

Feature location using crowd-based screencasts

Published: 28 May 2018

Abstract

Crowd-based multi-media documents such as screencasts have emerged as a source for documenting requirements of agile software projects. For example, screencasts can describe buggy scenarios of a software product or present new features in an upcoming release. Unfortunately, the binary format of videos makes it difficult to trace video content to other related software artifacts (e.g., source code, bug reports). In this paper, we propose an LDA-based feature location approach that takes as input the text extracted from a set of screencasts (i.e., the GUI text and/or spoken words) and establishes traceability links between the features described in the screencasts and the source code fragments implementing them. We report on a case study of 10 WordPress screencasts that evaluates the applicability of our approach in linking these screencasts to their relevant source code artifacts. We find that the approach successfully pinpoints relevant source code files within the top 10 hits using speech and GUI text. We also find that term frequency rebalancing can reduce noise and yield more precise results.
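
To make the idea concrete, the following Python sketch shows one way an LDA-based ranking of source files against screencast text could look. It is an illustration under stated assumptions, not the approach evaluated in the paper: the collapsed Gibbs sampler, the hyperparameters ALPHA and BETA, the cosine-similarity ranking, all function names, and the toy file names and tokens are hypothetical. The abstract does not define term frequency rebalancing, so `rebalance` simply caps how often a term may repeat, which is only one plausible reading.

```python
import math
import random
from collections import Counter

ALPHA, BETA = 0.1, 0.01  # assumed Dirichlet priors, not taken from the paper

def rebalance(tokens, cap=3):
    """Hypothetical rebalancing: keep at most `cap` occurrences of each term."""
    seen, kept = Counter(), []
    for tok in tokens:
        if seen[tok] < cap:
            seen[tok] += 1
            kept.append(tok)
    return kept

def train_lda(docs, num_topics, iters=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    for d, doc in enumerate(docs):  # random initial topic per token
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]  # remove the token, then resample its topic
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [(doc_topic[d][k] + ALPHA) * (topic_word[k][w] + BETA)
                           / (topic_total[k] + vocab_size * BETA)
                           for k in range(num_topics)]
                z = rng.choices(range(num_topics), weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, topic_total, vocab_size

def infer_theta(tokens, topic_word, topic_total, vocab_size, num_topics, rounds=20):
    """Estimate a topic mixture for unseen text, holding the topics fixed."""
    theta = [1.0 / num_topics] * num_topics
    for _ in range(rounds):
        counts = [0.0] * num_topics
        for w in tokens:
            post = [theta[k] * (topic_word[k][w] + BETA)
                    / (topic_total[k] + vocab_size * BETA)
                    for k in range(num_topics)]
            norm = sum(post)
            for k in range(num_topics):
                counts[k] += post[k] / norm
        theta = [(counts[k] + ALPHA) / (len(tokens) + num_topics * ALPHA)
                 for k in range(num_topics)]
    return theta

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_files(query_tokens, files, num_topics=2, seed=0):
    """Rank source files by topic similarity to the screencast text."""
    names = list(files)
    docs = [files[n] for n in names]
    doc_topic, topic_word, topic_total, vocab_size = train_lda(docs, num_topics, seed=seed)
    query = infer_theta(query_tokens, topic_word, topic_total, vocab_size, num_topics)
    scored = []
    for name, doc, counts in zip(names, docs, doc_topic):
        theta = [(c + ALPHA) / (len(doc) + num_topics * ALPHA) for c in counts]
        scored.append((name, cosine(query, theta)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy corpus: hypothetical file names mapped to identifier tokens.
files = {
    "menu.php": ["menu", "nav", "item", "menu", "link"],
    "widget.php": ["widget", "sidebar", "register", "widget"],
    "post.php": ["post", "publish", "draft", "title", "post"],
}
# Toy screencast text (speech plus GUI words), rebalanced before querying.
screencast = rebalance(["menu", "menu", "menu", "menu", "nav", "link", "click"], cap=2)
for name, score in rank_files(screencast, files):
    print(f"{name}\t{score:.3f}")
```

With only a handful of tokens the ranking is not meaningful; the sketch is meant to show the data flow (extracted text, topic model over source files, similarity ranking), not to reproduce the reported results.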

Published In

MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
May 2018
627 pages
ISBN: 9781450357166
DOI: 10.1145/3196398
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. crowd-based documentation
  2. feature location
  3. information extraction
  4. mining video content
  5. software traceability

Qualifiers

  • Research-article

Conference

ICSE '18
Cited By

  • (2024) Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models. Mathematics 12:7 (1036). DOI: 10.3390/math12071036. Online publication date: 30-Mar-2024
  • (2023) The Co-evolution of the WordPress Platform and Its Plugins. ACM Transactions on Software Engineering and Methodology 32:1 (1-24). DOI: 10.1145/3533700. Online publication date: 13-Feb-2023
  • (2023) Improving Code Extraction from Coding Screencasts Using a Code-Aware Encoder-Decoder Model. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (1492-1504). DOI: 10.1109/ASE56229.2023.00184. Online publication date: 11-Sep-2023
  • (2023) Image-based communication on social coding platforms. Journal of Software: Evolution and Process 36:5. DOI: 10.1002/smr.2609. Online publication date: 28-Aug-2023
  • (2022) VID2META: Complementing Android Programming Screencasts with Code Elements and GUIs. Mathematics 10:17 (3175). DOI: 10.3390/math10173175. Online publication date: 3-Sep-2022
  • (2022) A user survey on the adoption of crowd-based software engineering instructional screencasts by the new generation of software developers. Journal of Systems and Software 185:C. DOI: 10.1016/j.jss.2021.111144. Online publication date: 1-Mar-2022
  • (2021) Topic modeling in software engineering research. Empirical Software Engineering 26:6. DOI: 10.1007/s10664-021-10026-0. Online publication date: 1-Nov-2021
  • (2020) psc2code. ACM Transactions on Software Engineering and Methodology 29:3 (1-38). DOI: 10.1145/3392093. Online publication date: 1-Jun-2020
  • (2020) UI Screens Identification and Extraction from Mobile Programming Screencasts. Proceedings of the 28th International Conference on Program Comprehension (319-330). DOI: 10.1145/3387904.3389265. Online publication date: 13-Jul-2020
  • (2020) A Study on the Accuracy of OCR Engines for Source Code Transcription from Programming Screencasts. Proceedings of the 17th International Conference on Mining Software Repositories (65-75). DOI: 10.1145/3379597.3387468. Online publication date: 29-Jun-2020
