DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents

Authors:

Brian Pfitzmann,

Christoph Meinel

HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing

Pages 103 - 108

https://doi.org/10.1145/3604951.3605512

Published: 25 August 2023

Abstract

In this work, we propose DocLangID, a transfer learning approach to identify the language of unlabeled historical documents. First, we leverage labeled data from a different but related domain of historical documents. Second, we implement a distance-based few-shot learning approach to adapt a convolutional neural network to the new languages of the unlabeled dataset. By introducing a small number of manually labeled examples from the set of unlabeled images, our feature extractor adapts better to new and different data distributions of historical documents. We show that such a model can be effectively fine-tuned for the unlabeled set of images by reusing only the same few-shot examples. We evaluate our work across 10 languages that mostly use the Latin script. Our experiments on historical documents demonstrate that the combined approach improves language identification performance, achieving 74% recognition accuracy on the four unseen languages of the unlabeled dataset.
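
The distance-based few-shot step mentioned in the abstract can be illustrated with a nearest-prototype classifier. The following is a minimal sketch, not the authors' implementation. It assumes that embeddings have already been produced by some feature extractor (replaced here by hypothetical 2-D toy vectors), that Euclidean distance is the metric, and that `de` and `fr` are made-up language labels. All function names are illustrative.

```python
import math
from collections import defaultdict


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(query_embedding, prototypes):
    """Return the label whose prototype is closest to the query."""
    return min(
        prototypes,
        key=lambda label: euclidean(query_embedding, prototypes[label]),
    )


# Toy 2-D embeddings standing in for CNN features (hypothetical values).
support = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
labels = ["de", "de", "fr", "fr"]  # hypothetical language labels

prototypes = class_prototypes(support, labels)
prediction = predict([0.9, 1.0], prototypes)
print(prediction)
```

In a real pipeline the toy vectors would be replaced by CNN features of document images. The abstract does not state which distance or prototype scheme DocLangID uses, so these choices are placeholders.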

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
      2. Optical character recognition
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        1. Interest point and salient region detections
  2. Machine learning
    1. Learning paradigms
      1. Multi-task learning
        1. Transfer learning
      2. Supervised learning
        1. Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks

Published In

HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing

August 2023

117 pages

ISBN: 9798400708411

DOI: 10.1145/3604951

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

HIP '23: 7th International Workshop on Historical Document Imaging and Processing

August 25 - 26, 2023

San Jose, CA, USA

Acceptance Rates

Overall acceptance rate: 52 of 90 submissions (58%)
