DOI: 10.1145/3697355.3697362
research-article

An End-to-End Multi-modal-based Framework for Visual Identity Inspection System

Published: 12 December 2024

Abstract

Visual identity (VI) comprises the symbols and text elements that express the essence of a corporation. Misuse of the visual identity can significantly harm the corporate image and reputation, so it is important to detect misuse early and take appropriate measures. However, manual inspection is challenging, especially for large corporations. In this paper, we introduce an end-to-end framework for a deep-learning-based inspection system that automatically detects misuse of corporate VI in various forms of media. More precisely, based on the characteristics of corporate logos in VI, we propose a novel method to systematically generate a synthetic dataset, which is used to train the logo detection model. Furthermore, a robust rule-based engine that combines the common visual features of logos with an OCR algorithm is designed to automatically identify VI misuse in an input image. Overall, the case study shows a precision of 90% and a recall of 85%, and the total processing time per image can be less than 0.5 seconds.
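To make the reported figures concrete, the following is a minimal sketch of how precision and recall are computed from counts of correct and incorrect misuse flags. The counts are hypothetical, chosen only so that the arithmetic reproduces 90% and 85%; they are not taken from the paper's case study, and the names `DetectionCounts`, `precision`, and `recall` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Outcome counts for a misuse detector on a labelled test set."""
    true_positives: int   # misuse cases correctly flagged
    false_positives: int  # compliant images wrongly flagged as misuse
    false_negatives: int  # misuse cases the system missed


def precision(c: DetectionCounts) -> float:
    """Fraction of flagged images that are real misuse: TP / (TP + FP)."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: DetectionCounts) -> float:
    """Fraction of real misuse cases that were flagged: TP / (TP + FN)."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


# Hypothetical counts (an assumption for illustration, not the paper's data).
counts = DetectionCounts(true_positives=153, false_positives=17, false_negatives=27)
print(f"precision = {precision(counts):.2f}")  # 153 / 170
print(f"recall    = {recall(counts):.2f}")     # 153 / 180
```

Read this way, a precision of 90% means roughly one in ten flagged images would be a false alarm, while a recall of 85% means roughly 15 in 100 actual misuse cases would go unflagged.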


    Published In

    BDIOT '24: Proceedings of the 2024 8th International Conference on Big Data and Internet of Things
    September 2024
    412 pages
    ISBN:9798400717529
    DOI:10.1145/3697355

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Corporate visual identity
    2. Logo detection
    3. OCR
    4. End-to-end framework
    5. Synthetic data

    Qualifiers

    • Research-article

    Conference

    BDIOT 2024

    Acceptance Rates

    Overall Acceptance Rate 75 of 136 submissions, 55%
