DOI: 10.1145/3273934.3273935

Accurately Predicting the Location of Code Fragments in Programming Video Tutorials Using Deep Learning

Published: 10 October 2018

Abstract

Background: Video programming tutorials are becoming a popular resource for developers looking for quick answers to a specific programming problem or trying to learn a programming topic in more depth. Since source code is the most important source of information for developers in many such videos, it is important to extract this code accurately from the screen, so that developers can easily integrate it into their programs.

Aims: Our main goal is to facilitate the accurate and noise-free extraction of code appearing in programming video tutorials. In particular, in this paper we aim to accurately predict the location of source code in video frames. This allows a dramatic reduction of noise when using extraction techniques such as Optical Character Recognition (OCR), which could otherwise extract a large amount of irrelevant text (e.g., text found in menu items or the package hierarchy).

Method: We propose an approach using a deep Convolutional Neural Network (CNN) to predict the bounding box of fully visible code sections in video frames. To evaluate our approach, we collected a set of 150 Java programming tutorials containing more than 82K frames in total. A sample of 4,000 frames from these videos was then manually annotated with the code bounding box location and used as the ground truth in an experiment evaluating our approach.

Results: The results of the evaluation show that our approach is able to successfully predict the code bounding box in a given frame with 92% accuracy.

Conclusions: Our CNN-based approach is able to accurately predict the location of source code within the frames of programming video tutorials.
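
To make the role of the predicted bounding box concrete, the following is a minimal sketch, not the authors' implementation. It assumes a frame is a 2D grid of pixel values and a box is a `(left, top, right, bottom)` tuple with exclusive right and bottom edges; both conventions, and the names `crop_to_box` and `iou`, are hypothetical. Cropping to the box before OCR is what discards menu and package-explorer text. The abstract does not say how a prediction is scored against the annotated ground truth, so intersection over union (IoU) appears here only as one common way to compare two boxes.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (left, top, right, bottom) in pixel
# coordinates, with right and bottom exclusive.
Box = Tuple[int, int, int, int]


def crop_to_box(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Return only the pixels inside `box`, clamped to the frame bounds."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    left, top, right, bottom = box
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in frame[top:bottom]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In a real pipeline the cropped region, rather than the full frame, would be handed to the OCR engine, and a predicted box could be compared with its manual annotation via `iou(predicted, annotated)`.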

Information & Contributors

Information

Published In

PROMISE'18: Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering
October 2018
97 pages
ISBN: 9781450365932
DOI: 10.1145/3273934
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Univ. of Oulu: University of Oulu

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep learning
  2. Programming video tutorials
  3. Software documentation
  4. Source code
  5. Video mining

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PROMISE'18

Acceptance Rates

Overall Acceptance Rate 98 of 213 submissions, 46%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 95
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 10 Nov 2024

Citations

Cited By

  • (2024) SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers. Mathematics 12:13 (2128). DOI: 10.3390/math12132128. Online publication date: 7-Jul-2024
  • (2023) VID2XML: Automatic Extraction of a Complete XML Data From Mobile Programming Screencasts. IEEE Transactions on Software Engineering 49:4 (1726-1740). DOI: 10.1109/TSE.2022.3188898. Online publication date: 1-Apr-2023
  • (2023) Improving Code Extraction from Coding Screencasts Using a Code-Aware Encoder-Decoder Model. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (1492-1504). DOI: 10.1109/ASE56229.2023.00184. Online publication date: 11-Sep-2023
  • (2021) Summarizing Relevant Parts from Technical Videos. 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (434-445). DOI: 10.1109/SANER50967.2021.00047. Online publication date: Mar-2021
  • (2021) A Literature Review of Using Machine Learning in Software Development Life Cycle Stages. IEEE Access 9 (140896-140920). DOI: 10.1109/ACCESS.2021.3119746. Online publication date: 2021
  • (2020) Exploring the efficacy of transfer learning in mining image-based software artifacts. Journal of Big Data 7:1. DOI: 10.1186/s40537-020-00335-4. Online publication date: 8-Aug-2020
  • (2020) Technical Q&A Site Answer Recommendation via Question Boosting. ACM Transactions on Software Engineering and Methodology 30:1 (1-34). DOI: 10.1145/3412845. Online publication date: 31-Dec-2020
  • (2020) psc2code. ACM Transactions on Software Engineering and Methodology 29:3 (1-38). DOI: 10.1145/3392093. Online publication date: 1-Jun-2020
  • (2020) UI Screens Identification and Extraction from Mobile Programming Screencasts. Proceedings of the 28th International Conference on Program Comprehension (319-330). DOI: 10.1145/3387904.3389265. Online publication date: 13-Jul-2020
  • (2020) A Study on the Accuracy of OCR Engines for Source Code Transcription from Programming Screencasts. Proceedings of the 17th International Conference on Mining Software Repositories (65-75). DOI: 10.1145/3379597.3387468. Online publication date: 29-Jun-2020
