DOI: 10.1145/3689094.3689471
Research article | Open access

Historical Postcards Retrieval through Vision Foundation Models

Published: 28 October 2024

Abstract

The analysis of historical documents poses challenges for automated processing, owing to degradation over time and the need to extract information from large data sources. This paper proposes a two-stage methodology: text extraction using Optical Character Recognition (OCR), followed by retrieval with Vision Foundation Models (VFMs) over a challenging collection of 4,294 historical postcards from the East of France. This approach allows users to effortlessly find postcards that match their interests and preferences. The VFMs, together with the textual information extracted from the postcards, play a key role in this process by providing a robust and efficient way to match user queries to relevant postcards in the dataset. Because VFMs are trained on large datasets, they reduce dependence on annotated data and improve model versatility. For the retrieval stage, we select two VFMs, CLIP and DINOv2, and evaluate their performance with quantitative metrics to identify the model yielding the best results.
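The retrieval stage described above boils down to comparing a query embedding against precomputed postcard embeddings and returning the closest matches. The sketch below is not the authors' code: the function name is hypothetical and random vectors stand in for real CLIP or DINOv2 features, which would be computed offline for the whole collection. It only illustrates the cosine-similarity top-k lookup such a pipeline relies on.

```python
import numpy as np

def top_k_postcards(query_emb, postcard_embs, k=5):
    """Return indices of the k gallery postcards whose embeddings are
    most similar (cosine similarity) to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    p = postcard_embs / np.linalg.norm(postcard_embs, axis=1, keepdims=True)
    sims = p @ q                    # cosine similarity of each postcard to the query
    return np.argsort(-sims)[:k]   # indices of the best matches, best first

# Toy demo: random vectors standing in for VFM features.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(10, 512))              # 10 postcards, 512-d embeddings
query = gallery[3] + 0.01 * rng.normal(size=512)  # a query very close to postcard 3
print(top_k_postcards(query, gallery, k=3))       # postcard 3 should rank first
```

At the scale of a few thousand postcards this brute-force matrix product is already fast; for much larger collections, an approximate nearest-neighbour index would replace the exhaustive scan.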


Published In

SUMAC '24: Proceedings of the 6th Workshop on the analySis, Understanding and proMotion of heritAge Contents
October 2024, 67 pages
ISBN: 9798400712050
DOI: 10.1145/3689094
Program Chairs: Valérie Gouet-Brunet, Ronak Kosti, Li Weng

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. historical postcards
  2. image retrieval
  3. information extraction
  4. natural language processing
  5. OCR
  6. vision foundation models


Funding Sources

  • Région Grand Est, France

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne, VIC, Australia

Acceptance Rates

Overall acceptance rate: 5 of 6 submissions (83%)

Article Metrics

  • Total citations: 0
  • Total downloads: 123
  • Downloads (last 12 months): 123
  • Downloads (last 6 weeks): 50

Reflects downloads up to 16 Feb 2025
