Abstract
Multilingual optical character recognition (MOCR) is in great demand across web applications. Long Short-Term Memory (LSTM) networks have recently yielded excellent results on the recognition of printed Latin-script text. However, such networks are not flexible enough to cope with the challenges posed by web applications, where an OCR model for a given set of languages must be obtained quickly. This paper proposes a Hybrid Model Reuse (HMR) training approach for the multilingual OCR task, based on 1D bidirectional LSTM networks coupled with a model reuse scheme. Specifically, the Fixed Model Reuse (FMR) scheme is analyzed and incorporated into our approach, which implicitly extracts useful discriminative information from a fixed text-generating model. Moreover, LSTM layers from networks pre-trained on the unilingual OCR task are reused to initialize the weights of the target networks. Experimental results show that the proposed HMR approach, without the assistance of any post-processing techniques, effectively accelerates training and ultimately yields higher accuracy than traditional approaches.
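The weight-reuse step mentioned in the abstract (initializing a target network from the LSTM layers of a pre-trained unilingual network) can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names, layer sizes, and the helpers `init_model` and `reuse_lstm_layers` are hypothetical, a single unidirectional layer stands in for the 1D bidirectional LSTM, and the FMR component is not modeled.

```python
import copy
import random


def init_model(n_inputs, n_hidden, n_classes, seed):
    """Build a toy parameter dict: one LSTM layer plus an output layer.

    All names and sizes here are illustrative assumptions.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        # Four gates (input, forget, cell, output), each fed input + hidden state.
        "lstm.weights": matrix(4 * n_hidden, n_inputs + n_hidden),
        "lstm.bias": matrix(4 * n_hidden, 1),
        # The output layer size follows the character set, so it is language specific.
        "output.weights": matrix(n_classes, n_hidden),
    }


def shape(m):
    """Return (rows, cols) of a list-of-lists matrix."""
    return (len(m), len(m[0]))


def reuse_lstm_layers(pretrained, target, prefix="lstm."):
    """Return a copy of `target` whose LSTM parameters are taken from `pretrained`.

    Only parameters with a matching name and shape are copied; everything else
    (here, the output layer) keeps its fresh initialization.
    """
    result = copy.deepcopy(target)
    reused = []
    for name, value in pretrained.items():
        if (name.startswith(prefix) and name in result
                and shape(value) == shape(result[name])):
            result[name] = copy.deepcopy(value)
            reused.append(name)
    return result, reused


# Hypothetical sizes: a 60-class unilingual model and a 150-class multilingual one.
unilingual = init_model(n_inputs=32, n_hidden=8, n_classes=60, seed=0)
multilingual = init_model(n_inputs=32, n_hidden=8, n_classes=150, seed=1)

initialized, reused = reuse_lstm_layers(unilingual, multilingual)
print(reused)
```

In this sketch the recurrent parameters carry over because their shapes do not depend on the alphabet, while the output layer is left untouched because its size changes with the number of character classes. Training would then continue from `initialized` rather than from a random start.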
Supported by the National Social Science Foundation of China (Grant No. 15BGL048), the Hubei Province Science and Technology Support Project (Grant No. 2015BAA072), the National Natural Science Foundation of China (Grant No. 61672398), the Hubei Provincial Natural Science Foundation of China (Grant No. 2017CFA012), and the Fundamental Research Funds for the Central Universities (WUT: 2017II39GX).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Xie, Z., Li, L., Zhong, X., Zhong, L., Xie, Q., Xiang, J. (2018). A Hybrid Model Reuse Training Approach for Multilingual OCR. In: Hacid, H., Cellary, W., Wang, H., Paik, HY., Zhou, R. (eds) Web Information Systems Engineering – WISE 2018. WISE 2018. Lecture Notes in Computer Science, vol 11233. Springer, Cham. https://doi.org/10.1007/978-3-030-02922-7_34
DOI: https://doi.org/10.1007/978-3-030-02922-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02921-0
Online ISBN: 978-3-030-02922-7
eBook Packages: Computer Science, Computer Science (R0)