Abstract
The detection and recognition of unconstrained text remain open research problems. Text in comic books exhibits unusual styles that pose many challenges for text detection. This work addresses text binarization in a comic genre with highly sophisticated text styles: Japanese manga. To overcome the lack of a manga dataset with pixel-level text annotations, we create our own. To improve model evaluation and selection, we complement the standard binarization metrics with additional task-specific metrics. Using these resources, we design and evaluate a deep network model that outperforms current methods for text binarization in manga on most metrics.
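For readers unfamiliar with how text binarization is scored, the sketch below computes pixel-level precision, recall, and F1 between a predicted and a ground-truth text mask; these are the standard binarization metrics the abstract refers to, while the additional task-specific metrics are defined in the paper itself. The function name and example arrays are illustrative and not taken from the paper or its code.

```python
import numpy as np

def binarization_scores(pred_mask, gt_mask):
    """Pixel-level precision, recall and F1 between a predicted and a
    ground-truth binary text mask (nonzero = text pixel)."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    tp = np.sum(pred & gt)    # text pixels correctly detected
    fp = np.sum(pred & ~gt)   # background predicted as text
    fn = np.sum(~pred & gt)   # text pixels missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: score a prediction against its ground-truth mask.
gt = np.array([[0, 1], [1, 1]])
pred = np.array([[0, 1], [0, 1]])
print(binarization_scores(pred, gt))  # (1.0, 0.666..., 0.8)
```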
Ethics declarations
Dataset and Code Availability
Our dataset and the code used in this study are available at the GitHub page https://github.com/juvian/Manga-Text-Segmentation.
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Del Gobbo, J., Matuk Herrera, R. (2020). Unconstrained Text Detection in Manga: A New Dataset and Baseline. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12537. Springer, Cham. https://doi.org/10.1007/978-3-030-67070-2_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67069-6
Online ISBN: 978-3-030-67070-2