Conceptual modeling for ETL processes

Published: 08 November 2002

Abstract

Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for extracting data from several sources and for cleansing, customizing, and inserting it into a data warehouse. In this paper, we focus on the problem of defining ETL activities and provide formal foundations for their conceptual representation. The proposed conceptual model is (a) customized for tracing inter-attribute relationships and the respective ETL activities in the early stages of a data warehouse project; (b) enriched with a 'palette' of frequently used ETL activities, such as the assignment of surrogate keys and the check for null values; and (c) constructed in a customizable and extensible manner, so that designers can enrich it with their own recurring patterns for ETL activities.
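
For readers unfamiliar with the two activities the abstract names, the following is a minimal illustrative sketch of a null-value check and a surrogate-key assignment over rows held as dictionaries. It is not the paper's conceptual model or notation. The function names (`not_null`, `assign_surrogate_keys`), the attribute names (`pkey`, `skey`), and the in-memory lookup table are all assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def not_null(rows: List[Row], attribute: str) -> Tuple[List[Row], List[Row]]:
    """Null-value check: split rows into (accepted, rejected) on `attribute`."""
    accepted: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        if row.get(attribute) is not None:
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


def assign_surrogate_keys(
    rows: List[Row],
    source_key: str,
    lookup: Dict[Any, int],
    target_key: str = "skey",
) -> List[Row]:
    """Surrogate-key assignment: map each production key to a warehouse key.

    `lookup` is a hypothetical in-memory stand-in for a lookup table; unseen
    production keys receive the next unused integer.
    """
    result: List[Row] = []
    for row in rows:
        production_key = row[source_key]
        if production_key not in lookup:
            lookup[production_key] = max(lookup.values(), default=0) + 1
        # Copy the row so the source rows are left unmodified.
        result.append({**row, target_key: lookup[production_key]})
    return result


if __name__ == "__main__":
    source_rows = [{"pkey": "A"}, {"pkey": None}, {"pkey": "B"}, {"pkey": "A"}]
    clean, rejected = not_null(source_rows, "pkey")
    keyed = assign_surrogate_keys(clean, "pkey", lookup={})
    print(keyed)
    print(rejected)
```

Chaining the two functions mirrors the idea of composing activities from a palette: rows that fail the check are set aside, and the rest receive warehouse keys that are stable across repeated production keys.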

Published In

DOLAP '02: Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP
November 2002
88 pages
ISBN:1581135904
DOI:10.1145/583890

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ETL
  2. conceptual modeling
  3. data warehousing

Qualifiers

  • Article

Conference

CIKM02

Acceptance Rates

Overall Acceptance Rate 29 of 79 submissions, 37%
