Abstract
In this paper, we present a method for on-demand construction of OLAP cubes for ROLAP systems. The method covers the steps from cube design to ETL, but focuses on ETL. The actual data analysis can then be done with the tools and methods of the OLAP software at hand. The method is based on RDF/OWL ontologies and design tools: the ontology serves as the basis for designing and creating the OLAP schema and its corresponding database tables, and finally for populating the database.
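The following sketch illustrates how an OLAP schema and its database tables might be derived from an ontology. It assumes that the schema ontology has already been parsed into plain Python objects: a cube with measures and dimensions. The classes `Cube` and `Dimension`, the function `star_schema_ddl`, and the Sales/Product/Store example are all hypothetical. A simple star schema is also assumed, and the paper's own algorithm may differ.

```python
# Sketch: derive star-schema DDL from a simplified, ontology-like cube
# description. Assumption: the OLAP schema ontology has already been parsed
# into these hypothetical Python objects.
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    attributes: list[str]  # e.g. the levels of a hierarchy


@dataclass
class Cube:
    name: str
    measures: list[str]
    dimensions: list[Dimension] = field(default_factory=list)


def star_schema_ddl(cube: Cube) -> list[str]:
    """Return CREATE TABLE statements: one per dimension, then the fact table."""
    statements = []
    for dim in cube.dimensions:
        cols = [f"{dim.name}_id INTEGER PRIMARY KEY"]
        cols += [f"{a} TEXT" for a in dim.attributes]
        statements.append(f"CREATE TABLE {dim.name} ({', '.join(cols)})")
    # The fact table holds a foreign key to every dimension plus numeric measures.
    fact_cols = [
        f"{d.name}_id INTEGER REFERENCES {d.name}({d.name}_id)"
        for d in cube.dimensions
    ]
    fact_cols += [f"{m} REAL" for m in cube.measures]
    statements.append(f"CREATE TABLE {cube.name} ({', '.join(fact_cols)})")
    return statements


# Hypothetical example cube: sales amounts by product and store.
sales = Cube(
    name="Sales",
    measures=["amount"],
    dimensions=[
        Dimension("Product", ["product_name", "category"]),
        Dimension("Store", ["city", "country"]),
    ],
)

for stmt in star_schema_ddl(sales):
    print(stmt)
```

A snowflake variant would instead give each hierarchy level its own table. The star layout is used here only for brevity.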
Our starting point is a set of heterogeneous, distributed data sources that are eventually used to populate the OLAP cubes. The source data are mapped to their OLAP form by first converting them to RDF using ontology maps. The data are then extracted from their RDF form by queries generated from the ontology of the OLAP schema. Finally, the extracted data are stored in the database tables and analysed using OLAP software. Algorithms and examples are provided for all these steps.
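The three ETL steps described above can be sketched as follows: mapping to RDF, generating extraction queries from the OLAP schema, and storing the results. This is a minimal illustration under stated assumptions, not the authors' implementation. Source records are Python dicts, and each "ontology map" is a hypothetical dict from source field to RDF property. The namespace `http://example.org/olap#` is made up. A naive in-memory matcher stands in for a real RDF store evaluating the generated SPARQL query, and SQLite stands in for the database server.

```python
# Sketch of the ETL steps under simplifying assumptions (hypothetical names,
# not the paper's code): dict records, dict-based ontology maps, and a naive
# in-memory matcher in place of an RDF store and SPARQL engine.
import sqlite3

EX = "http://example.org/olap#"  # hypothetical namespace

# Step 1: ontology maps from two differently shaped sources to shared properties.
ONTOLOGY_MAPS = {
    "shop_a": {"prod": EX + "product", "town": EX + "city", "eur": EX + "amount"},
    "shop_b": {"item": EX + "product", "location": EX + "city", "price": EX + "amount"},
}


def to_triples(source: str, records: list[dict]) -> list[tuple[str, str, object]]:
    """Convert each record into RDF-style (subject, predicate, object) triples."""
    mapping = ONTOLOGY_MAPS[source]
    triples = []
    for i, rec in enumerate(records):
        subject = f"{EX}{source}/sale{i}"
        triples.append((subject, "rdf:type", EX + "Sale"))
        for field_name, value in rec.items():
            if field_name in mapping:  # unmapped fields are dropped
                triples.append((subject, mapping[field_name], value))
    return triples


# Step 2: generate an extraction query from a (simplified) OLAP schema.
SCHEMA_COLUMNS = {"product": EX + "product", "city": EX + "city", "amount": EX + "amount"}


def sparql_for(columns: dict[str, str]) -> str:
    """Build a SPARQL SELECT with one triple pattern per schema column."""
    patterns = " .\n  ".join(f"?s <{prop}> ?{col}" for col, prop in columns.items())
    head = " ".join(f"?{col}" for col in columns)
    return f"SELECT {head} WHERE {{\n  ?s a <{EX}Sale> .\n  {patterns} .\n}}"


def extract(triples, columns: dict[str, str]) -> list[tuple]:
    """Naive equivalent of the generated query: keep Sale subjects with all properties."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    rows = []
    for props in by_subject.values():
        if props.get("rdf:type") == EX + "Sale" and all(p in props for p in columns.values()):
            rows.append(tuple(props[p] for p in columns.values()))
    return rows


# Step 3: store the extracted rows in a flattened fact table.
triples = to_triples("shop_a", [{"prod": "tea", "town": "Tampere", "eur": 3.5}])
triples += to_triples("shop_b", [{"item": "coffee", "location": "Geneva", "price": 4.0}])

print(sparql_for(SCHEMA_COLUMNS))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_flat (product TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales_flat VALUES (?, ?, ?)", extract(triples, SCHEMA_COLUMNS))
print(conn.execute(
    "SELECT city, SUM(amount) FROM sales_flat GROUP BY city ORDER BY city"
).fetchall())
```

Both sources are mapped onto the same properties, so a single generated query extracts rows from either of them. In this sketch, adding a new source only requires a new ontology map.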
In our tests, we used an open-source OLAP implementation and a database server. The performance of the system was found satisfactory when tested with a data source of 450 000 RDF statements. We also propose an ontology-based tool that will serve as a user interface to the system, from design to actual analysis.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Niinimäki, M., Niemi, T. (2009). An ETL Process for OLAP Using RDF/OWL Ontologies. In: Spaccapietra, S., Zimányi, E., Song, IY. (eds) Journal on Data Semantics XIII. Lecture Notes in Computer Science, vol 5530. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03098-7_4
DOI: https://doi.org/10.1007/978-3-642-03098-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03097-0
Online ISBN: 978-3-642-03098-7
eBook Packages: Computer Science, Computer Science (R0)