Abstract
In query-by-example keyword spotting, how word images are represented is a central issue. The out-of-vocabulary (OOV) problem also arises frequently in keyword spotting, which makes OOV keyword spotting a challenging task. In this paper, a hybrid representation of word images is presented for OOV keyword spotting. Specifically, a sequence-to-sequence model is used to generate representation vectors of word images, and a CNN model with the VGG16 architecture is used to obtain a second type of representation vector. A score fusion scheme then combines the two kinds of representation vectors. Experimental results demonstrate that the proposed hybrid representation of word images is especially suited to OOV keyword spotting.
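The abstract does not state the exact fusion rule, so the following is only a minimal sketch of one common form of score fusion: a weighted sum of two cosine similarities, one per representation. The function names, the `weight` parameter, and the choice of cosine similarity are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_score(query, candidate, weight=0.5):
    """Weighted sum of the two similarity scores.

    `query` and `candidate` are (seq2seq_vector, cnn_vector) pairs.
    `weight` is a hypothetical mixing parameter, not a value from the paper.
    """
    s_seq = cosine_similarity(query[0], candidate[0])
    s_cnn = cosine_similarity(query[1], candidate[1])
    return weight * s_seq + (1.0 - weight) * s_cnn


def rank_candidates(query, candidates, weight=0.5):
    """Return candidate indices sorted by fused score, best match first."""
    scored = [(fused_score(query, c, weight), i) for i, c in enumerate(candidates)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [i for _, i in scored]


# Toy example: each word image has a 2-D vector from each (assumed) model.
query = ([1.0, 0.0], [0.0, 1.0])
candidates = [
    ([0.0, 1.0], [1.0, 0.0]),  # disagrees with the query in both representations
    ([1.0, 0.0], [0.0, 1.0]),  # matches the query in both representations
    ([1.0, 0.0], [1.0, 0.0]),  # matches only in the first representation
]
ranking = rank_candidates(query, candidates)
```

Fusing at the score level keeps the two models independent: each representation can be trained and indexed separately, and only the final similarity scores are combined at query time.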
Acknowledgments
This study is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2019ZD14, the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05, and the Natural Science Foundation of China under Grants 61463038 and 61763034.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, H., Zhang, J., Liu, K. (2020). A Hybrid Representation of Word Images for Keyword Spotting. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_1
DOI: https://doi.org/10.1007/978-3-030-63820-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)