Accurately Predicting the Location of Code Fragments in Programming Video Tutorials Using Deep Learning

Authors:

Mohammad Alahmadi,

Jonathan Hassel,

Biswas Parajuli,

Piyush Kumar

PROMISE'18: Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering

Pages 2 - 11

https://doi.org/10.1145/3273934.3273935

Published: 10 October 2018

Abstract

Background: Video programming tutorials are becoming a popular resource for developers looking for quick answers to a specific programming problem or trying to learn a programming topic in more depth. Since the most important source of information for developers in many such videos is source code, it is important to be able to accurately extract this code from the screen, so that developers can easily integrate it into their programs.

Aims: Our main goal is to facilitate the accurate and noise-free extraction of code appearing in programming video tutorials. In particular, in this paper we aim to accurately predict the location of source code in video frames. This will allow for a dramatic reduction of noise when using extraction techniques such as Optical Character Recognition, which could otherwise extract a large amount of irrelevant text (e.g., text found in menu items or the package hierarchy).

Method: We propose an approach using a deep Convolutional Neural Network (CNN) to predict the bounding box of fully visible code sections in video frames. To evaluate our approach, we collected a set of 150 Java programming tutorials containing more than 82K frames in total. A sample of 4,000 frames from these videos was then manually annotated with the code bounding box location and used as the ground truth in an experiment evaluating our approach.

Results: The results of the evaluation show that our approach is able to successfully predict the code bounding box in a given frame with 92% accuracy.

Conclusions: Our CNN-based approach is able to accurately predict the location of source code within the frames of programming video tutorials.
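The abstract reports bounding-box prediction accuracy against manually annotated ground truth, but it does not state here how a prediction is judged correct. As an illustration only, the sketch below assumes a common convention from object detection: a predicted box counts as correct when its intersection over union (IoU) with the annotated box reaches a threshold. The `Box` type, the function names, and the 0.5 default threshold are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical representation)."""
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_accuracy(predicted: Sequence[Box], truth: Sequence[Box],
                 threshold: float = 0.5) -> float:
    """Fraction of frames whose predicted box matches the annotation.

    The IoU criterion and the threshold are assumptions for illustration,
    not the evaluation protocol of the paper.
    """
    if len(predicted) != len(truth):
        raise ValueError("need exactly one ground-truth box per prediction")
    if not predicted:
        return 0.0
    hits = sum(1 for p, t in zip(predicted, truth) if iou(p, t) >= threshold)
    return hits / len(predicted)
```

Under this assumed criterion, a score such as the reported 92% would mean that 92 of every 100 evaluated frames had a predicted box overlapping the annotation closely enough to pass the threshold.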

Index Terms

Accurately Predicting the Location of Code Fragments in Programming Video Tutorials Using Deep Learning
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation

Information & Contributors

Information

Published In

PROMISE'18: Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering

October 2018

97 pages

ISBN:9781450365932

DOI:10.1145/3273934

General Chair: Burak Turhan (Monash University)

Program Chairs: Ayse Tosun (Istanbul Technical University), Shane McIntosh (McGill University)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Oulu

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

PROMISE'18: The 14th International Conference on Predictive Models and Data Analytics in Software Engineering

October 10, 2018

Oulu, Finland

Acceptance Rates

Overall Acceptance Rate 98 of 213 submissions, 46%

Citations

Cited By

Alahmadi M, Alshangiti M, Alsubhi J. (2024). SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers. Mathematics 12:13 (2128). Online publication date: 7-Jul-2024.
https://doi.org/10.3390/math12132128
Alahmadi M. (2023). VID2XML: Automatic Extraction of a Complete XML Data From Mobile Programming Screencasts. IEEE Transactions on Software Engineering 49:4 (1726-1740). Online publication date: 1-Apr-2023.
https://dl.acm.org/doi/10.1109/TSE.2022.3188898
Malkadi A, Tayeb A, Haiduc S. (2023). Improving Code Extraction from Coding Screencasts Using a Code-Aware Encoder-Decoder Model. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (1492-1504). Online publication date: 11-Sep-2023.
https://doi.org/10.1109/ASE56229.2023.00184
Vahedi M, Rahman M, Khomh F, Uddin G, Antoniol G. (2021). Summarizing Relevant Parts from Technical Videos. 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (434-445). Online publication date: Mar-2021.
https://doi.org/10.1109/SANER50967.2021.00047
Shafiq S, Mashkoor A, Mayr-Dorn C, Egyed A. (2021). A Literature Review of Using Machine Learning in Software Development Life Cycle Stages. IEEE Access 9 (140896-140920). Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3119746
Best N, Ott J, Linstead E. (2020). Exploring the efficacy of transfer learning in mining image-based software artifacts. Journal of Big Data 7:1. Online publication date: 8-Aug-2020.
https://doi.org/10.1186/s40537-020-00335-4
Gao Z, Xia X, Lo D, Grundy J. (2020). Technical Q&A Site Answer Recommendation via Question Boosting. ACM Transactions on Software Engineering and Methodology 30:1 (1-34). Online publication date: 31-Dec-2020.
https://dl.acm.org/doi/10.1145/3412845
Bao L, Xing Z, Xia X, Lo D, Wu M, Yang X. (2020). psc2code. ACM Transactions on Software Engineering and Methodology 29:3 (1-38). Online publication date: 1-Jun-2020.
https://dl.acm.org/doi/10.1145/3392093
Alahmadi M, Malkadi A, Haiduc S. (2020). UI Screens Identification and Extraction from Mobile Programming Screencasts. Proceedings of the 28th International Conference on Program Comprehension (319-330). Online publication date: 13-Jul-2020.
https://dl.acm.org/doi/10.1145/3387904.3389265
Malkadi A, Alahmadi M, Haiduc S. (2020). A Study on the Accuracy of OCR Engines for Source Code Transcription from Programming Screencasts. Proceedings of the 17th International Conference on Mining Software Repositories (65-75). Online publication date: 29-Jun-2020.
https://dl.acm.org/doi/10.1145/3379597.3387468
