Abstract.
Digital preservation of newspaper archives aims both to safeguard endangered paper material and to create digital library services that allow all interested parties to make full use of the archives. In this paper, we address a series of issues pertaining to the retro-conversion of newspapers, i.e., the conversion of newspaper pages into digital resources. We present an integrated approach that provides solutions to problems in newspaper page image enhancement, segmentation of pages into their constituent items (titles, text, images, etc.), article identification and reconstruction, and, finally, recognition of the textual components. Emphasis is placed on the most difficult intermediate stages: page segmentation, and article identification and reconstruction. Detailed experimental results, obtained from a large testbed of old newspaper issues, demonstrate the applicability of our methodology to the successful retro-conversion of newspaper material.
Received: 21 December 1998 / Revised: 25 May 1999
Gatos, B., Mantzaris, S., Perantonis, S. et al. Automatic page analysis for the creation of a digital library from newspaper archives. Int J Digit Libr 3, 77–84 (2000). https://doi.org/10.1007/PL00021477