
DOI: 10.1145/3604951.3605512
Research article, open access

DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents

Published: 25 August 2023

Abstract

In this work, we propose DocLangID, a transfer learning approach to identify the language of unlabeled historical documents. First, we leverage labeled data from a different but related domain of historical documents. Second, we implement a distance-based few-shot learning approach to adapt a convolutional neural network to the new languages of the unlabeled dataset. By introducing small numbers of manually labeled examples from the set of unlabeled images, our feature extractor adapts better to new and different data distributions of historical documents. We show that such a model can be fine-tuned effectively for the unlabeled images by reusing only the same few-shot examples. We evaluate our work on 10 languages, most of which use the Latin script. Our experiments on historical documents demonstrate that the combined approach improves language identification performance, achieving 74% recognition accuracy on the four unseen languages of the unlabeled dataset.
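The distance-based few-shot step can be illustrated with a nearest-prototype classifier in the style of Prototypical Networks. This is a minimal sketch, not the authors' implementation: it assumes each page image has already been mapped to an embedding vector by the CNN feature extractor, assumes squared Euclidean distance as the metric, and uses hypothetical two-dimensional embeddings and language labels purely for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def prototypes(support: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the embeddings of the few labeled examples (support set) per language."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for emb, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def squared_euclidean(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probabilities(query: Vector, protos: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative distances: nearer prototypes get higher probability."""
    logits = {label: -squared_euclidean(query, p) for label, p in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def classify(query: Vector, protos: Dict[str, Vector]) -> str:
    """Assign the query page to the language whose prototype is nearest."""
    return min(protos, key=lambda label: squared_euclidean(query, protos[label]))


if __name__ == "__main__":
    # Hypothetical embeddings standing in for CNN feature-extractor outputs.
    support = [
        ([1.0, 0.0], "latin"), ([0.9, 0.1], "latin"),
        ([0.0, 1.0], "finnish"), ([0.1, 0.9], "finnish"),
    ]
    protos = prototypes(support)
    print(classify([0.8, 0.2], protos))
    print(class_probabilities([0.8, 0.2], protos))
```

Because an unseen language only needs a new prototype computed from a handful of labeled examples, this kind of classifier does not require a new trained output layer for each language; the quality of the result then depends mainly on how well the feature extractor separates languages in embedding space.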



Published In

ACM Other conferences
HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing
August 2023
117 pages
ISBN:9798400708411
DOI:10.1145/3604951
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Convolutional Neural Networks
  2. Few-Shot Training
  3. Language Identification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HIP '23

Acceptance Rates

Overall Acceptance Rate 52 of 90 submissions, 58%


Article Metrics

  • Total citations: 0
  • Total downloads: 178
  • Downloads (last 12 months): 151
  • Downloads (last 6 weeks): 30

Reflects downloads up to 15 Feb 2025.
