Abstract
In this paper, we evaluated a set of potential improvements to the successful Attention-OCR architecture, which is designed to predict multiline text from unconstrained scenes in real-world images. We investigated the impact of several optimizations on the model's accuracy: dynamic RNNs (Recurrent Neural Networks), scheduled sampling, BiLSTM (Bidirectional Long Short-Term Memory), and a modified attention model. BiLSTM was found to slightly increase accuracy, while dynamic RNNs and a simpler attention model provided a significant reduction in training time with only a slight decline in accuracy.
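To make the two components named above concrete, the following is a minimal PyTorch sketch of a BiLSTM layer combined with a simplified dot-product attention step, of the kind an attention-based OCR decoder might use. This is an illustration only, not the Attention-OCR implementation evaluated in the paper; the class name, feature dimensions, and the choice of dot-product scoring are assumptions.

  # Hypothetical sketch: BiLSTM re-encoding of CNN feature columns plus a
  # simplified (dot-product) attention step. Not the authors' code.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class BiLSTMAttentionStep(nn.Module):
      def __init__(self, feat_dim=288, hidden=256):
          super().__init__()
          # BiLSTM runs over the sequence of CNN feature vectors.
          self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                                bidirectional=True)
          # Fold the two directions back down to `hidden` units.
          self.proj = nn.Linear(2 * hidden, hidden)

      def forward(self, cnn_features, decoder_state):
          # cnn_features: (batch, seq_len, feat_dim)
          # decoder_state: (batch, hidden) -- current decoder hidden state
          enc, _ = self.bilstm(cnn_features)            # (batch, seq_len, 2*hidden)
          enc = self.proj(enc)                          # (batch, seq_len, hidden)
          # Simplified attention: dot-product scores instead of an additive MLP.
          scores = torch.bmm(enc, decoder_state.unsqueeze(2)).squeeze(2)
          weights = F.softmax(scores, dim=1)            # (batch, seq_len)
          context = torch.bmm(weights.unsqueeze(1), enc).squeeze(1)
          return context, weights                       # (batch, hidden), (batch, seq_len)

  if __name__ == "__main__":
      step = BiLSTMAttentionStep()
      feats = torch.randn(2, 40, 288)   # 40 feature columns per image (illustrative)
      state = torch.randn(2, 256)
      ctx, attn = step(feats, state)
      print(ctx.shape, attn.shape)      # torch.Size([2, 256]) torch.Size([2, 40])

The sketch mirrors the trade-off discussed in the abstract: replacing an additive attention score with a single batched matrix product removes parameters and per-step computation, which is where the training-time savings of a simpler attention model come from.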
Acknowledgments
This work has been partially supported by the Statutory Funds of the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology.