Abstract
This paper introduces an agent-centric approach to handling novelty in the visual recognition domain of handwriting recognition (HWR). An ideal transcription agent would rival or surpass human perception, recognizing both known and new characters in an image and detecting any stylistic changes that occur within or across documents. A key confound is the presence of novelty, which continues to stymie even the best machine learning-based algorithms for these tasks. In handwritten documents, novelty can be a change in writer, character attributes, writing attributes, or overall document appearance, among other things. Instead of looking at each aspect independently, we suggest that an integrated agent that processes known characters and novelties simultaneously is a better strategy. This paper formalizes the domain of handwriting recognition with novelty, describes a baseline agent, introduces an evaluation protocol with benchmark data, and provides experiments to set the state of the art. Results show that the agent-centric approach is feasible, but more work is needed to approach human levels of reading ability. This gives the HWR community a formal basis to build upon as it works on this challenging problem.
This research was sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Army Research Office (ARO) under multiple contracts/agreements including HR001120C0055, W911NF-20-2-0005, W911NF-20-2-0004, HQ0034-19-D-0001, and W911NF2020009. The views contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA or ARO, or the U.S. Government.
Notes
1. The code for this paper will be made publicly available after publication at https://github.com/prijatelj/handwriting_recognition_with_novelty.
2. The Supplemental Material is publicly available at https://arxiv.org/abs/2105.06582.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Prijatelj, D.S., Grieggs, S., Yumoto, F., Robertson, E., Scheirer, W.J. (2021). Handwriting Recognition with Novelty. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_33
DOI: https://doi.org/10.1007/978-3-030-86337-1_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
eBook Packages: Computer Science, Computer Science (R0)