Evaluation of Different Tagging Schemes for Named Entity Recognition in Handwritten Documents

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14189)

Included in the following conference series: International Conference on Document Analysis and Recognition

Abstract

Performing Named Entity Recognition on handwritten documents categorizes particular fragments of the automatic transcription, which can then be used in information extraction processes. Different corpora use different tagging notations to identify Named Entities, and the choice of notation may affect the performance of the trained model. In this work, we analyze three tagging notations on three databases of handwritten line-level images. In the experiments, we train the same Convolutional Recurrent Neural Network (CRNN) and character n-gram Language Model on the resulting data and observe that choosing the tagging notation best suited to the characteristics of each task leads to noticeable performance improvements.
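
As a rough illustration of what "different tagging notations" can mean in practice, the sketch below renders the same entity annotation in two widely used styles: per-token IOB2 tags and inline open/close markers embedded in the transcription. These two styles, the helper names `to_iob2` and `to_inline`, the labels `PER` and `LOC`, and the example sentence are assumptions made for illustration only; the abstract does not state which three notations the paper compares.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, entity label) -- hypothetical format
Span = Tuple[int, int, str]


def to_iob2(tokens: List[str], spans: List[Span]) -> List[str]:
    """One tag per token: B-X opens an entity, I-X continues it, O is outside."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def to_inline(tokens: List[str], spans: List[Span]) -> str:
    """Open/close markers placed around each entity inside the transcription."""
    opens = {start: label for start, _, label in spans}
    closes = {end - 1: label for _, end, label in spans}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in opens:
            out.append(f"<{opens[i]}>")
        out.append(tok)
        if i in closes:
            out.append(f"</{closes[i]}>")
    return " ".join(out)


# Illustrative sentence and spans (not taken from any of the paper's corpora).
tokens = ["George", "Washington", "visited", "Boston"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]

print(to_iob2(tokens, spans))
print(to_inline(tokens, spans))
```

Both outputs encode the same annotation, but they give a recognizer different target sequences: the first adds one tag per word, while the second adds extra marker tokens only at entity boundaries.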

This work was supported by Grant PID2020-116813RB-I00 funded by MCIN/AEI/10.13039/501100011033, by Grant ACIF/2021/436 funded by Generalitat Valenciana, and by Grant PID2021-124719OB-I00 funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU, "A way of making Europe".

Notes

1. The documentation for the employed Simplex implementation is available at: https://docs.scipy.org/doc/scipy/reference/optimize.linprog-simplex.html.

Author information

Authors and Affiliations

PRHLT Research Center, Universitat Politècnica de València, Camí de Vera, s/n, València, 46021, Spain
David Villanova-Aparisi, Carlos-D. Martínez-Hinarejos & Moisés Pastor-Gadea
Departament d’Informàtica, Universitat de València, València, 46010, Spain
Verónica Romero

Corresponding author

Correspondence to David Villanova-Aparisi.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

About this paper

Cite this paper

Villanova-Aparisi, D., Martínez-Hinarejos, CD., Romero, V., Pastor-Gadea, M. (2023). Evaluation of Different Tagging Schemes for Named Entity Recognition in Handwritten Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_1

DOI: https://doi.org/10.1007/978-3-031-41682-8_1
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41681-1
Online ISBN: 978-3-031-41682-8
eBook Packages: Computer Science, Computer Science (R0)
