
Unconstrained Text Detection in Manga: A New Dataset and Baseline

  • Conference paper
  • Published in: Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Abstract

The detection and recognition of unconstrained text remains an open research problem. Text in comic books has unusual styles that pose many challenges for text detection. This work aims to binarize text in a comic genre with highly sophisticated text styles: Japanese manga. To overcome the lack of a manga dataset with pixel-level text annotations, we create our own. To improve evaluation and guide the search for an optimal model, we implement additional task-specific metrics alongside the standard binarization metrics. Using these resources, we design and evaluate a deep network model that outperforms current methods for text binarization in manga on most metrics.
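To illustrate the kind of pixel-level evaluation the abstract refers to (a minimal sketch, not the authors' implementation), standard binarization metrics such as per-pixel precision, recall, and F-measure can be computed directly from a predicted mask and a ground-truth mask:

```python
import numpy as np

def pixel_f_measure(pred: np.ndarray, gt: np.ndarray):
    """Per-pixel precision, recall, and F-measure for binary text masks.

    pred, gt: boolean arrays of the same shape, True where a pixel is text.
    """
    tp = np.logical_and(pred, gt).sum()    # text pixels correctly detected
    fp = np.logical_and(pred, ~gt).sum()   # background marked as text
    fn = np.logical_and(~pred, gt).sum()   # text pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Toy 2x2 example: one true positive, one false positive, one false negative.
pred = np.array([[True, True], [False, False]])
gt = np.array([[True, False], [True, False]])
print(pixel_f_measure(pred, gt))  # (0.5, 0.5, 0.5)
```

In practice such per-pixel scores are complemented by text-specific measures (the "special metrics" the paper mentions), since a single missed stroke can make a character unreadable while barely moving the global pixel score.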




Author information

Corresponding author

Correspondence to Julián Del Gobbo.

Ethics declarations

Dataset and Code Availability

Our dataset and the code used in this study are available on GitHub at https://github.com/juvian/Manga-Text-Segmentation.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 62831 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Del Gobbo, J., Matuk Herrera, R. (2020). Unconstrained Text Detection in Manga: A New Dataset and Baseline. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12537. Springer, Cham. https://doi.org/10.1007/978-3-030-67070-2_38


  • DOI: https://doi.org/10.1007/978-3-030-67070-2_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67069-6

  • Online ISBN: 978-3-030-67070-2

  • eBook Packages: Computer Science, Computer Science (R0)
