Abstract
Publishing and archiving mathematical literature presents its own set of problems. In reaching the goal of building a global digital mathematics library (DML), smaller DMLs play an indispensable role in collecting, validating, digitizing and checking data from smaller publishers.
In this paper, we give an overview of the technical challenges of building a machine-actionable set of modules that we have developed over almost a decade of evolution of the Czech Digital Mathematics Library (DML-CZ). First, we survey methods of effective automated data acquisition from content providers. Then we show the OCR processing of mathematical documents and the automated segmentation of plain-text references for metadata enhancement and effective DOI look-up. Finally, we describe the connection to the European Digital Mathematics Library (EuDML) project and the public interfaces of DML-CZ that provide the best visibility and accessibility.
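To make the idea of segmenting plain-text references more concrete, the following is a minimal sketch only, and not the method used in DML-CZ. It assumes references shaped like "Authors: Title. Remainder (Year)" and splits them with a regular expression instead of a trained sequence model. The names `REFERENCE_PATTERN` and `segment_reference`, and the returned field names, are hypothetical and chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern, assuming references of the form
# "Surname, I., Surname, I.: Title. Remainder of the entry (2011)".
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>[^:]+):\s+"                 # everything before the first colon
    r"(?P<title>.+?)\.\s+"                     # shortest text ending in a full stop
    r".*?\((?P<year>(?:19|20)\d{2})\)"         # first four-digit year in parentheses
)


def segment_reference(reference: str) -> Optional[dict]:
    """Split one plain-text reference into authors, title and year.

    Returns None when the string does not fit the assumed shape.
    """
    match = REFERENCE_PATTERN.match(reference.strip())
    if match is None:
        return None
    # Authors are separated by a comma that follows an initial's full stop,
    # e.g. "Sojka, P., Líška, M." -> ["Sojka, P.", "Líška, M."].
    authors = [
        author.strip()
        for author in re.split(r"(?<=\.),\s+", match.group("authors"))
    ]
    return {
        "authors": authors,
        "title": match.group("title"),
        "year": int(match.group("year")),
    }


example = segment_reference(
    "Sojka, P., Líška, M.: The Art of Mathematics Retrieval. "
    "In: Proceedings of the ACM Conference on Document Engineering, "
    "DocEng 2011, pp. 57-60. ACM, Mountain View (2011)"
)
```

Fields extracted this way could serve as the query terms for a DOI look-up. A rule-based pattern like this one breaks on references that deviate from the assumed shape, which is one reason a production system would more plausibly rely on a statistical parser.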
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Růžička, M., Sojka, P., Krejčíř, V. (2013). Towards Machine-Actionable Modules of a Digital Mathematics Library. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (eds) Intelligent Computer Mathematics. CICM 2013. Lecture Notes in Computer Science, vol 7961. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39320-4_17
DOI: https://doi.org/10.1007/978-3-642-39320-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39319-8
Online ISBN: 978-3-642-39320-4
eBook Packages: Computer Science (R0)