Nothing Special   »   [go: up one dir, main page]

Skip to main content

Automated Processing of Digitized Historical Newspapers beyond the Article Level: Sections and Regular Features

  • Conference paper
The Role of Digital Libraries in a Time of Global Change (ICADL 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6102))

Included in the following conference series:

Abstract

Millions of pages of historical newspapers have been digitized but in most cases access to these are supported by only basic search services. We are exploring interactive services for these collections which would be useful for supporting access, including automatic categorization of articles. Such categorization is difficult because of the uneven quality of the OCR text, but there are many clues which can be useful for improving the accuracy of the categorization. Here, we describe observations of several historical newspapers to determine the characteristics of sections. We then explore how to automatically identify those sections and how to detect serialized feature articles which are repeated across days and weeks. The goal is not the introduction of new algorithms but the development of practical and robust techniques. For both analyses we find substantial success for some categories and articles, but others prove very difficult.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 74.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Murray, R.L.: Toward a Metadata Standard for Digitized Historical Newspapers. In: Proceedings of IEEE/ACM JCDL, pp. 330–331 (2005)

    Google Scholar 

  2. Allen, R.B., Waldstein, I., Zhu, W.Z.: Automated Processing of Digitized Historical Newspapers: Identification of Segments and Genres. In: Buchanan, G., Masoodian, M., Cunningham, S.J. (eds.) ICADL 2008. LNCS, vol. 5362, pp. 380–387. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  3. Toms, E., Flora, N.: From Physical to Digital Humanities Library: Designing the Humanities Scholar’s Workbench. In: Siemens, R., Moorman, D. (eds.) Mind Technologies, Humanities Computing, and the Canadian Academic Community, pp. 91–115. U. Calgary Press, Calgary (2006)

    Google Scholar 

  4. Allen, R.B.: Improving Access to Digitized Historical Newspapers with Text Mining, Coordinated Models, and Formative User Interface Design. In: IFLA International Newspaper Conference: Digital Preservation and Access to News and Views, pp. 54–59 (2010)

    Google Scholar 

  5. Holley, R.: How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs. D-Lib Magazine 15(3/4) (March/April 2009)

    Google Scholar 

  6. Ihlström, C., Åkesson, M.: Genre Characteristics – A Front Page Analysis of 85 Swedish Online Newspapers. In: Proceedings of the Proceedings of the Hawaii International Conference on System Sciences (2004)

    Google Scholar 

  7. Foulger, D.: Medium as an Ecology of Genre: Integrating Media Theory and Genre Theory. Media Ecology Association (2006)

    Google Scholar 

  8. Allen, R.B., Nalluru, S.: Exploring History with Narrative Timelines. In: Smith, M.J., Salvendy, G. (eds.) HCII 2009. LNCS, vol. 5617, pp. 333–338. Springer, Heidelberg (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Allen, R.B., Hall, C. (2010). Automated Processing of Digitized Historical Newspapers beyond the Article Level: Sections and Regular Features. In: Chowdhury, G., Koo, C., Hunter, J. (eds) The Role of Digital Libraries in a Time of Global Change. ICADL 2010. Lecture Notes in Computer Science, vol 6102. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13654-2_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-13654-2_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13653-5

  • Online ISBN: 978-3-642-13654-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics