Accelerating OCR-Based Widget Localization for Test Automation of GUI Applications

Authors:

Lin Chen

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

Article No.: 6, Pages 1 - 13

https://doi.org/10.1145/3551349.3556966

Published: 05 January 2023

Abstract

Optical character recognition (OCR) algorithms often run slowly. They may take several seconds to recognize the text on a GUI screen, which makes OCR-based widget localization in test automation inconvenient to use, especially on GPU-free computers. This paper first identifies a common type of widget text to be located in GUI testing: label text, that is, the short text in widgets such as buttons, menu items, and window titles. We then investigate the characteristics of text on a GUI screen and introduce a fast, GPU-independent Label Text Screening (LTS) technique to accelerate the OCR process for label text localization. The technique opens the black box of OCR engines and combines simple methods to avoid as much unnecessary text analysis on a screen as possible. Experiments show that, on the subject datasets, LTS substantially reduces the average OCR-based label text localization time. On 4K-resolution GUI screens, it keeps the localization time below 0.5 seconds in more than about 60% of cases without GPU support on an ordinary laptop. In contrast, existing CPU-based approaches built on the popular OCR engines Tesseract, PaddleOCR, and EasyOCR usually need over 2 seconds to achieve the same goal on the same platform. Even with GPU acceleration, they can hardly keep the analysis time within 1 second. We believe the proposed approach will be helpful for implementing OCR-based test automation tools.
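To make the task concrete, the following Python sketch illustrates label text localization as a test script might use it: given the text boxes an OCR engine has already returned, find the widget whose short text matches a target label and return a click point. It is not the LTS technique itself, which works inside the OCR process and is not detailed in the abstract. The `OcrBox` type, the `locate_label` function, the length cutoff, and the similarity threshold are all hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


@dataclass
class OcrBox:
    """One recognized text region (hypothetical shape of an OCR result)."""
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def locate_label(
    boxes: Iterable[OcrBox],
    label: str,
    max_len_factor: int = 3,   # assumed cutoff, not taken from the paper
    min_score: float = 0.8,    # assumed threshold, not taken from the paper
) -> Optional[Tuple[int, int]]:
    """Return the center of the box that best matches `label`, or None."""
    best_box = None
    best_score = 0.0
    for box in boxes:
        # Label text is short, so skip regions far longer than the target.
        if len(box.text) > max_len_factor * len(label):
            continue
        score = similarity(box.text, label)
        if score >= min_score and score > best_score:
            best_box, best_score = box, score
    if best_box is None:
        return None
    return (best_box.x + best_box.w // 2, best_box.y + best_box.h // 2)


# Hard-coded stand-ins for OCR output on one screen.
screen = [
    OcrBox("File", 10, 5, 40, 20),
    OcrBox("Save As...", 100, 200, 80, 24),
    OcrBox("This is a long paragraph of body text that should be skipped",
           0, 300, 800, 20),
]
click_point = locate_label(screen, "Save As")
```

The fuzzy comparison tolerates small OCR deviations such as a trailing ellipsis, while the length check reflects the abstract's observation that label text is short.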

Cited By

Liu D, Jiang H, Guo S, Chen Y, Qiao L (2024). What's Wrong With Low-Code Development Platforms? An Empirical Study of Low-Code Development Platform Bugs. IEEE Transactions on Reliability 73:1 (695-709). Online publication date: Mar-2024. https://doi.org/10.1109/TR.2023.3295009

Ji R, Zhu T, Zhu X, Chen C, Pan M, Zhang T, Bissyandé T, Klein J, Bird C, Sarro F (2023). Vision-Based Widget Mapping for Test Migration across Mobile Platforms: Are We There Yet? Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (1416-1428). Online publication date: 11-Nov-2023. https://dl.acm.org/doi/10.1109/ASE56229.2023.00068

Ricca F, Marchetto A, Stocco A (2023). A Retrospective Analysis of Grey Literature for AI-Supported Test Automation. Quality of Information and Communications Technology (90-105). Online publication date: 13-Sep-2023. https://doi.org/10.1007/978-3-031-43703-8_7

Index Terms

Accelerating OCR-Based Widget Localization for Test Automation of GUI Applications
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published In

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

October 2022

2006 pages

ISBN:9781450394758

DOI:10.1145/3551349

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ASE '22

ASE '22: 37th IEEE/ACM International Conference on Automated Software Engineering

October 10 - 14, 2022

Rochester, MI, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
