Handwritten Text Retrieval from Unlabeled Collections

Santhoshini Gongidi & C. V. Jawahar

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1568)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

Handwritten documents from domains such as cultural heritage, the judiciary, and modern journals remain largely unexplored even today. To a great extent, this is due to the lack of retrieval tools for such unlabeled document collections. This work considers such collections and presents a simple, robust retrieval framework for easy information access. We achieve retrieval on novel unlabeled collections through invariant features learned for handwritten text. These feature representations enable zero-shot retrieval for novel queries on unlabeled collections. We further improve the framework by supporting search via both text and exemplar queries. We evaluate our text retrieval framework on four new collections written in English, Malayalam, and Bengali. These collections comprise 2957 handwritten pages and over 300K words. We report promising results on these collections despite the zero-shot constraint and the large collection size. Our framework allows new collections to be added without any collection-specific finetuning or labeling. Finally, we also present a demonstration of the retrieval framework. [Project Page].
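
To make the retrieval idea in the abstract concrete, the following is a minimal sketch of ranking word images by feature similarity. It assumes that every word image in a collection, and every query (text or exemplar), has already been mapped to a fixed-length vector by some embedding model that is not shown here. The identifiers, the toy vectors, the function names, and the choice of cosine similarity are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def build_index(word_embeddings: Dict[str, Vector]) -> Dict[str, List[float]]:
    """Store unit-length embeddings keyed by a (hypothetical) word-image id."""
    return {word_id: l2_normalize(v) for word_id, v in word_embeddings.items()}


def retrieve(query: Vector, index: Dict[str, List[float]],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed word images by cosine similarity to the query embedding."""
    q = l2_normalize(query)
    scored = [
        (word_id, sum(a * b for a, b in zip(q, v)))
        for word_id, v in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings; real features would be much higher-dimensional.
index = build_index({
    "page1_word7": [0.9, 0.1, 0.0],
    "page2_word3": [0.0, 1.0, 0.0],
    "page5_word1": [0.8, 0.2, 0.1],
})

# The query vector stands in for an embedded text or exemplar query.
results = retrieve([1.0, 0.0, 0.0], index, top_k=2)
for word_id, score in results:
    print(word_id, round(score, 3))
```

Because the index holds only embeddings, adding a new collection in this sketch amounts to embedding its word images and inserting them, with no labels involved. For collections with hundreds of thousands of words, the exhaustive scan would typically be replaced by an approximate nearest-neighbor index.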

Notes

1. Demo links are available at the project page.

Acknowledgments

The authors acknowledge the funding support received through the IMPRINT project, Govt. of India, for this work.

Author information

Authors and Affiliations

Centre for Visual Information Technology, IIIT Hyderabad, Hyderabad, India
Santhoshini Gongidi & C. V. Jawahar

Corresponding author

Correspondence to Santhoshini Gongidi.

Editor information

Editors and Affiliations

Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Technology Ropar, Ropar, India
Subrahmanyam Murala
Jadavpur University, Kolkata, India
Ananda Chowdhury
Indian Institute of Technology Ropar, Ropar, India
Abhinav Dhall
Indian Institute of Technology Ropar, Ropar, India
Puneet Goyal

Cite this paper

Gongidi, S., Jawahar, C.V. (2022). Handwritten Text Retrieval from Unlabeled Collections. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1568. Springer, Cham. https://doi.org/10.1007/978-3-031-11349-9_1

DOI: https://doi.org/10.1007/978-3-031-11349-9_1
Published: 24 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11348-2
Online ISBN: 978-3-031-11349-9
eBook Packages: Computer Science, Computer Science (R0)
