Abstract
The detection and recognition of unconstrained text remain open research problems. Text in comic books exhibits unusual styles that pose many challenges for text detection. This work addresses text binarization in a comic genre with highly sophisticated text styles: Japanese manga. To overcome the lack of a manga dataset with pixel-level text annotations, we create our own. To improve model evaluation and selection, we complement the standard binarization metrics with additional task-specific metrics. Using these resources, we design and evaluate a deep network model that outperforms current methods for text binarization in manga on most metrics.
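For readers unfamiliar with how text binarization is scored, the sketch below computes pixel-level precision, recall, and F1 between a predicted and a ground-truth text mask; these are the standard binarization metrics the abstract refers to, while the additional task-specific metrics are defined in the paper itself. The function name and example arrays are illustrative and not taken from the paper or its code.

```python
import numpy as np

def binarization_scores(pred_mask, gt_mask):
    """Pixel-level precision, recall and F1 between a predicted and a
    ground-truth binary text mask (nonzero = text pixel)."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    tp = np.sum(pred & gt)    # text pixels correctly detected
    fp = np.sum(pred & ~gt)   # background predicted as text
    fn = np.sum(~pred & gt)   # text pixels missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: score a prediction against its ground-truth mask.
gt = np.array([[0, 1], [1, 1]])
pred = np.array([[0, 1], [0, 1]])
print(binarization_scores(pred, gt))  # (1.0, 0.666..., 0.8)
```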
Ethics declarations
Dataset and Code Availability
Our dataset and the code used in this study are available at the GitHub page https://github.com/juvian/Manga-Text-Segmentation.
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Del Gobbo, J., Matuk Herrera, R. (2020). Unconstrained Text Detection in Manga: A New Dataset and Baseline. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12537. Springer, Cham. https://doi.org/10.1007/978-3-030-67070-2_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67069-6
Online ISBN: 978-3-030-67070-2