Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs
Together with critical editions and translations, commentaries are one of the main genres of publication in literary and textual scholarship, and have a century-long tradition. Yet, the exploitation of thousands of digitized historical commentaries was ...
Mixed Model OCR Training on Historical Latin Script for Out-of-the-Box Recognition and Finetuning
In order to apply Optical Character Recognition (OCR) to historical printings of Latin script fully automatically, we report on our efforts to construct a widely-applicable polyfont recognition model yielding text with a Character Error Rate (CER) ...
A survey of OCR evaluation tools and metrics
- Clemens Neudecker,
- Konstantin Baierer,
- Mike Gerber,
- Christian Clausner,
- Apostolos Antonacopoulos,
- Stefan Pletschacher
The millions of pages of historical documents that are digitized in libraries are increasingly used in contexts that have more specific requirements for OCR quality than keyword search. How to comprehensively, efficiently and reliably assess the ...
Segmentation of historical maps without annotated data
This paper presents the method we submitted to the ICDAR’21 competition on Historical Map Segmentation. The goal is to segment document images of Paris maps from the beginning of the 20th century: delineate the content of the map and locate ...
Text Detection and Recognition by using CNNs in the Austro-Hungarian Historical Military Mapping Survey
Historical maps include precious data about historical, geographical and economic perspectives of a period. However, several unique challenges and opportunities accompany historical maps compared to modern maps, such as low-quality images, degraded ...
Including Keyword Position in Image-based Models for Act Segmentation of Historical Registers
The segmentation of complex images into semantic regions has seen growing interest in recent years with the advent of Deep Learning. Until recently, most existing methods for Historical Document Analysis focused on the visual appearance of documents, ...
The BIR database – Identifying typographic emphasis in list-like historical documents
- Anna Scius-Bertrand,
- Simon Gabay,
- Juliette Janes,
- Ljudmila Petkovic,
- Caroline Corbieres,
- Thibault Clerice
Layout analysis and optical character recognition have become traditional tasks for processing historical prints, but are now insufficient. Additional information is found in typographic emphasis, such as bold and italic letters. They carry semantic ...
Digital Peter: New Dataset, Competition and Handwriting Recognition Methods
- Mark Potanin,
- Denis Dimitrov,
- Alex Shonenkov,
- Vladimir Bataev,
- Denis Karachev,
- Maxim Novopoltsev,
- Andrey Chertok
This paper presents a new dataset of Peter the Great’s manuscripts and describes a segmentation procedure that converts initial images of documents into lines. This new dataset may be useful for researchers to train handwriting text recognition models ...
GloSAT Historical Measurement Table Dataset: Enhanced Table Structure Recognition Annotation for Downstream Historical Data Rescue
Understanding and extracting tables from documents is a research problem that has been studied for decades. Table structure recognition is the labelling of components within a detected table, which can be detected automatically or manually provided. ...
Generalized Template Matching for Semi-structured Text
Conventional template matching for named entity recognition on book-length text strings is generalized by allowing search phrases to capture distant tokens. Combined with word-type tagging and format variants (alternative name/date formats), a few ...
BiblIA - a General Model for Medieval Hebrew Manuscripts and an Open Annotated Dataset
- Daniel Stoekl Ben Ezra,
- Bronson Brown-DeVost,
- Pawel Jablonski,
- Hayim Lapin,
- Benjamin Kiessling,
- Elena Lolli
The paper presents Open Source generalized models for recognition and page segmentation, intended for use on the eScriptorium platform or kraken OCR engine, of Medieval Hebrew manuscripts in square script that arrive at a character accuracy of more ...
Visual Analysis of Chapbooks Printed in Scotland
Chapbooks were short, cheap printed booklets produced in large quantities in Scotland, England, Ireland, North America and much of Europe between roughly the seventeenth and nineteenth centuries. A form of popular literature containing songs, stories, ...
Index Terms
- Proceedings of the 6th International Workshop on Historical Document Imaging and Processing