Improving parallelism of federated query processing

Published: 01 March 2008

Abstract

Many large enterprises require access to distributed databases for business intelligence (BI) applications. Typically, the distributed databases are integrated into a centralized data warehouse, which is easier to maintain. However, this approach must overcome the complexity of data loading and job scheduling, as well as scalability issues. A fully federated system, on the other hand, may not be feasible for data-intensive BI applications. A hybrid approach based on intelligent data placement is more flexible and more widely applicable than either a centralized or a fully federated configuration. The current implementation of the hybrid approach aggregates selected data from various remote sources into materialized views and caches them at the federation server to improve the performance of complex BI query workloads. In this paper, we propose an improvement that, in conjunction with the current hybrid approach to data placement, also recommends Materialized Query Tables (MQTs) for the backend servers, which distributes load and eases maintenance of the aggregated data. Our approach considers the correlation between backend servers and recommends MQTs that are well coordinated among the backend servers and optimized for the workload. We also exploit parallelism among the backend servers so that the running time of our approach grows almost linearly (rather than exponentially) with the number of backend servers, without sacrificing recommendation quality. Experimental evaluations validate the effectiveness and efficiency of our approach.
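For readers unfamiliar with materialized views, the following minimal sketch illustrates the general idea of precomputing an aggregate and answering queries from the stored result. It is not the paper's recommendation method. SQLite has no native MQT support, so an ordinary summary table stands in for one; the `sales` table, the `sales_by_region` table, and the `build_summary` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def build_summary(conn: sqlite3.Connection) -> None:
    """Emulate a materialized view: compute an aggregate once and store it.

    Assumption: a plain table rebuilt on demand stands in for an MQT.
    """
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
        "FROM sales GROUP BY region"
    )


# Hypothetical base data living on one "backend" database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.0)],
)
build_summary(conn)

# The same question answered from the base table and from the stored summary.
direct = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
cached = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

Both queries return the same rows, but the second reads a small precomputed table instead of scanning and aggregating the base data. The trade-off is that the summary must be refreshed when the base table changes, which is the maintenance cost the abstract alludes to.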

Published In

Data & Knowledge Engineering, Volume 64, Issue 3
March, 2008
178 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Federated databases
  2. Materialized views
  3. Performance

Qualifiers

  • Article
