Abstract
This paper investigates the suitable position and number of pooling layers in a Convolutional Neural Network (CNN) for script recognition from scene images. A common practice in CNNs for object recognition is to alternate convolutional layers with pooling layers, followed by a few fully connected layers. We re-evaluate this basic principle by examining the placement of a pooling layer after every convolutional layer, and by reducing and increasing the number of pooling layers. Experimental results on the MLe2e dataset for script recognition show that a CNN with fewer pooling layers and a non-overlapping pooling stride can reach excellent accuracy compared with a CNN that alternates each convolutional layer with a pooling layer.
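To make the architectural contrast concrete, the following minimal sketch traces how the spatial size of a feature map changes through two layer stacks: one that alternates convolution and pooling, and one that uses fewer pooling layers. The 32-pixel input, the 3x3 convolutions with "same" padding, the 2x2 pooling window, and the layer counts are all illustrative assumptions, not the configurations evaluated in the paper. Non-overlapping pooling is modelled here as a stride equal to the window size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window, stride):
    """Output width of a pooling layer (no padding) along one dimension."""
    return (size - window) // stride + 1


def trace(input_size, layers):
    """Return the feature-map width after the input and after each layer."""
    sizes = [input_size]
    size = input_size
    for kind, kernel, stride in layers:
        if kind == "conv":
            # Assumed "same"-style padding so convolution keeps the width.
            size = conv_out(size, kernel, stride, padding=kernel // 2)
        elif kind == "pool":
            size = pool_out(size, kernel, stride)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        sizes.append(size)
    return sizes


# Hypothetical layer specs: (kind, kernel/window, stride).
CONV = ("conv", 3, 1)
POOL = ("pool", 2, 2)  # stride == window, so pooling regions do not overlap

# Conventional pattern: every convolution is followed by pooling.
alternating = [CONV, POOL, CONV, POOL, CONV, POOL]
# Variant with fewer pooling layers: pool once after a run of convolutions.
fewer_pools = [CONV, CONV, CONV, POOL]

alt_sizes = trace(32, alternating)
few_sizes = trace(32, fewer_pools)
print("alternating:", alt_sizes)
print("fewer pools:", few_sizes)
```

Under these assumptions the alternating stack halves the width after every convolution, while the stack with fewer pooling layers keeps the full resolution through all of its convolutions before downsampling once. The sketch only tracks shapes; it says nothing about accuracy, which the paper measures experimentally.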
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ibrahim, Z., Isa, D., Idrus, Z., Kasiran, Z., Roslan, R. (2019). Evaluation of Pooling Layers in Convolutional Neural Network for Script Recognition. In: Berry, M., Yap, B., Mohamed, A., Köppen, M. (eds) Soft Computing in Data Science. SCDS 2019. Communications in Computer and Information Science, vol 1100. Springer, Singapore. https://doi.org/10.1007/978-981-15-0399-3_10
DOI: https://doi.org/10.1007/978-981-15-0399-3_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0398-6
Online ISBN: 978-981-15-0399-3
eBook Packages: Computer Science, Computer Science (R0)