A Hybrid Representation of Word Images for Keyword Spotting

Hongxi Wei,
Jing Zhang &
Kexin Liu

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

In query-by-example keyword spotting, how word images are represented is a key issue. Out-of-vocabulary (OOV) words also occur frequently in keyword spotting, which makes OOV keyword spotting a challenging task. In this paper, a hybrid representation of word images is presented for OOV keyword spotting. Specifically, a sequence-to-sequence model is used to generate representation vectors of word images, and a CNN model with the VGG16 architecture is used to obtain a second type of representation vector. A score fusion scheme is then adopted to combine the two kinds of representation vectors. Experimental results demonstrate that the proposed hybrid representation is especially suited to OOV keyword spotting.
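The abstract does not say how the score fusion is computed, so the following is only a minimal sketch of one common way to fuse retrieval scores from two representations. It assumes cosine similarity as the per-representation score, min-max normalization to make the two score lists comparable, and a weighted sum with a hypothetical weight `alpha`; none of these choices, nor the function names, are taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_rank(query_a, query_b, cands_a, cands_b, alpha=0.5):
    """Rank candidates by a weighted sum of two normalized score lists.

    query_a / cands_a: vectors from the first representation
                       (e.g. a sequence-to-sequence encoder).
    query_b / cands_b: vectors from the second representation
                       (e.g. a CNN feature layer).
    alpha: hypothetical fusion weight (assumption, not from the paper).
    """
    scores_a = min_max_normalize(
        [cosine_similarity(query_a, c) for c in cands_a]
    )
    scores_b = min_max_normalize(
        [cosine_similarity(query_b, c) for c in cands_b]
    )
    fused = [alpha * a + (1.0 - alpha) * b
             for a, b in zip(scores_a, scores_b)]
    # Candidate indices ordered from best to worst fused score.
    ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return ranking, fused
```

Here each candidate word image is assumed to have one vector per representation, stored at the same index in `cands_a` and `cands_b`. Fusing at the score level, rather than concatenating the vectors, lets the two representations have different dimensions and scales.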



Acknowledgments

This study is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2019ZD14, the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05, and the Natural Science Foundation of China under Grants 61463038 and 61763034.

Author information

Authors and Affiliations

School of Computer Science, Inner Mongolia University, Hohhot, 010021, China
Hongxi Wei, Jing Zhang & Kexin Liu
Provincial Key Laboratory of Mongolian Information Processing Technology, Hohhot, China
Hongxi Wei, Jing Zhang & Kexin Liu
National and Local Joint Engineering Research Center of Mongolian Information Processing Technology, Hohhot, China
Hongxi Wei, Jing Zhang & Kexin Liu

Corresponding author

Correspondence to Hongxi Wei.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King


About this paper

Cite this paper

Wei, H., Zhang, J., Liu, K. (2020). A Hybrid Representation of Word Images for Keyword Spotting. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_1


DOI: https://doi.org/10.1007/978-3-030-63820-7_1
Published: 17 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science (R0)
