Abstract
The Alexandria Digital Library (ADL) project has been working on automating the processes of building ADL collections and gathering the collection statistics on which ADL’s discovery system is based. As part of this effort, we have created a language and supporting programmatic framework for expressing mappings from XML metadata schemas to the required ADL metadata views. This language, based on the Python scripting language, is largely declarative in nature, reflecting the fact that mappings can be largely, though not entirely, specified by crosswalk-type specifications. At the same time, the language allows mappings to be specified procedurally, which we argue is necessary to deal effectively with the realities of poor-quality, highly variable, and incomplete metadata. An additional key feature of the language is the ability to derive new mappings from existing mappings, which makes it easy to adapt generic mappings to the idiosyncrasies of particular metadata providers. We evaluate this language on three metadata standards (ADN, FGDC, and MARC) and three corresponding collections of metadata. We also note limitations, future research directions, and generalizations of this work.
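To make the three ideas in the abstract concrete (declarative crosswalk rules, procedural escape hatches, and derivation of one mapping from another), the following is a minimal sketch in plain Python. It is not the ADL mapping language itself: the class names (`Mapping`, `GenericMapping`, `ProviderMapping`), the `rules` table, the `fix_<field>` hook convention, and the sample record with its element names are all assumptions introduced here for illustration.

```python
import xml.etree.ElementTree as ET


class Mapping:
    """Hypothetical base class; not the ADL framework's actual API."""

    # Declarative part: target field name -> path into the source record.
    rules = {}

    def map(self, record_xml):
        root = ET.fromstring(record_xml)
        out = {}
        for field, path in self.rules.items():
            # Collect non-empty text values found at the declared path.
            values = [e.text.strip() for e in root.findall(path)
                      if e.text and e.text.strip()]
            # Procedural part: an optional method named fix_<field>
            # may clean up or discard the extracted values.
            hook = getattr(self, "fix_" + field, None)
            if hook is not None:
                values = hook(values)
            if values:
                out[field] = values
        return out


class GenericMapping(Mapping):
    # A crosswalk-style specification for an imagined generic schema.
    rules = {"title": "./title", "date": "./date"}


class ProviderMapping(GenericMapping):
    # Derived mapping: inherit the generic rules, override one of them.
    rules = dict(GenericMapping.rules, title="./altTitle")

    def fix_date(self, values):
        # Keep only values that start with a four-digit year.
        return [v[:4] for v in values if v[:4].isdigit()]


# Hypothetical source record with one unusable date value.
record = (
    "<record>"
    "<title>Ignored</title>"
    "<altTitle> Map of Goleta </altTitle>"
    "<date>1998-06</date>"
    "<date>unknown</date>"
    "</record>"
)

print(GenericMapping().map(record))
print(ProviderMapping().map(record))
```

In this sketch the derived mapping only states how the provider differs from the generic case, which is the kind of reuse the abstract attributes to mapping derivation; how the actual language expresses derivation and procedural rules is described in the paper, not here.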
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Janée, G., Frew, J. (2005). A Hybrid Declarative/Procedural Metadata Mapping Language Based on Python. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_27
DOI: https://doi.org/10.1007/11551362_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28767-4
Online ISBN: 978-3-540-31931-3