DOI: 10.1145/3318464.3386128
research-article

A Framework for Emulating Database Operations in Cloud Data Warehouses

Published: 31 May 2020

Abstract

In recent years, interest in cloud-based data warehousing technologies has grown, with many enterprises moving away from on-premise data warehousing solutions. The incentives for adopting cloud data warehousing technologies are many: cost-cutting, on-demand pricing, offloading data centers, unlimited hardware resources, and built-in disaster recovery, to name a few. There are inherent differences in the language surface and feature sets of on-premise and cloud data warehousing solutions. These range from subtle syntactic and semantic differences, with potentially large impact on result correctness, to complete features that exist in one system but are missing in others. While there have been some efforts to help automate the migration of on-premise applications to new cloud environments, a major challenge that slows the pace of migration is the handling of features not yet supported, or only partially supported, by the cloud technologies. In this paper, we build on our earlier work in adaptive data virtualization and present novel techniques that allow applications using sophisticated database features to run on foreign query engines that lack native support for those features. In particular, we introduce a framework to manage discrepancies in metadata across heterogeneous query engines, and various mechanisms to emulate database application code in cloud environments without any need to rewrite or change the application code.
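
To make the idea of emulation concrete, the following is a minimal sketch and not the framework described in the paper. It assumes a hypothetical side catalog (`SHADOW_CATALOG`) that records a column property the target engine cannot store natively, here case-insensitive comparison, and a hypothetical function (`emulate_case_insensitive`) that rewrites an incoming query so the target engine produces the result the application expects. The table, the column names, and the regex-based rewrite are all illustrative assumptions; a real system would operate on a parsed query rather than on text.

```python
import re

# Hypothetical side catalog: per-column properties that the source system
# tracked but that the target engine cannot represent natively.
SHADOW_CATALOG = {
    ("customers", "name"): {"case_insensitive": True},
    ("customers", "id"): {"case_insensitive": False},
}

# Toy pattern for `column = 'literal'`; an assumption for illustration only.
EQUALITY = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def emulate_case_insensitive(table: str, sql: str) -> str:
    """Rewrite equality predicates on columns flagged case-insensitive.

    The application's query text is left untouched at the source; only the
    text sent to the target engine is changed.
    """
    def rewrite(match: re.Match) -> str:
        column, literal = match.group(1), match.group(2)
        props = SHADOW_CATALOG.get((table, column.lower()), {})
        if props.get("case_insensitive"):
            # Normalize both sides so the comparison ignores case.
            return f"UPPER({column}) = UPPER('{literal}')"
        # Columns without the flag pass through unchanged.
        return match.group(0)

    return EQUALITY.sub(rewrite, sql)


if __name__ == "__main__":
    original = "SELECT id FROM customers WHERE name = 'alice'"
    print(emulate_case_insensitive("customers", original))
```

The design point the sketch tries to capture is that the discrepancy lives in metadata kept outside the target engine, and the rewrite is applied in flight, so the application code itself does not change.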

Supplementary Material

MP4 File (3318464.3386128.mp4)
Presentation Video

Cited By

  • (2023) How Global Retailer ADEO Migrated to Google BigQuery with Database Virtualization. 2023 IEEE International Conference on Big Data (BigData), pp. 1895-1898. DOI: 10.1109/BigData59044.2023.10386640. Online publication date: 15-Dec-2023
  • (2023) A Study on Big Data Engineering Using Cloud Data Warehouse. Data Engineering and Data Science, pp. 49-69. DOI: 10.1002/9781119841999.ch3. Online publication date: 5-Sep-2023
  • (2022) Data Integration, Cleaning, and Deduplication: Research Versus Industrial Projects. Information Integration and Web Intelligence, pp. 3-17. DOI: 10.1007/978-3-031-21047-1_1. Online publication date: 20-Nov-2022


    Published In

    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN:9781450367356
    DOI:10.1145/3318464
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cloud data warehousing
    2. data warehousing
    3. database emulation
    4. database migration
    5. metadata management
    6. query processing
    7. query rewriting

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '20
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 51
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 29 Nov 2024
