Nothing Special   »   [go: up one dir, main page]

Skip to main content

Data Warehouse Performance: Selected Techniques and Data Structures

  • Chapter
Business Intelligence (eBISS 2011)

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 96))

Included in the following conference series:

Summary

Data stored in a data warehouse (DW) are retrieved and analyzed by complex analytical applications, often expressed by means of star queries. Such queries often scan huge volumes of data and are computationally complex. For this reason, an acceptable (or good) DW performance is one of the important features that must be guaranteed for DW users. Good DW performance can be achieved in multiple components of a DW architecture, starting from hardware (e.g., parallel processing on multiple nodes, fast disks, huge main memory, fast multi-core processor), through physical storage schemes (e.g., row storage, column storage, multidimensional store, data and index compression algorithms), state of the art techniques of query optimization (e.g., cost models and size estimation techniques, parallel query optimization and execution, join algorithms), and additional data structures improving data searching efficiency (e.g., indexes, materialized views, clusters, partitions). In this chapter we aim at presenting only a narrow aspect of the aforementioned technologies. We discuss three types of data structures, namely indexes (bitmap, join, and bitmap join), materialized views, and partitioned tables. We show how they are being applied in the process of executing star queries in three commercial database/data warehouse management systems, i.e., Oracle, DB2, and SQL Server.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. d’Orazio, L., Bimonte, S.: Multidimensional Arrays for Warehousing Data on Clouds. In: Hameurlain, A., Morvan, F., Tjoa, A.M. (eds.) Globe 2010. LNCS, vol. 6265, pp. 26–37. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  2. Furtado, P.: A survey of parallel and distributed data warehouses. International Journal of Data Warehousing and Mining 5(2), 57–77 (2009)

    Article  Google Scholar 

  3. Jhingran, A., Jou, S., Lee, W., Pham, T., Saha, B.: IBM Business Analytics and Cloud Computing: Best Practices for Deploying Cognos Business Intelligence to the IBM Cloud. MC Press, LLC (2010)

    Google Scholar 

  4. Andrzejewski, W., Wrembel, R.: GPU-WAH: Applying Gpus to Compressing Bitmap Indexes with Word Aligned Hybrid. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds.) DEXA 2010. LNCS, vol. 6262, pp. 315–329. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  5. Fang, W., He, B., Luo, Q.: Database compression on graphics processors. Proc. VLDB Endow. 3, 670–680 (2010)

    Article  Google Scholar 

  6. Govindaraju, N., Gray, J., Kumar, R., Manocha, D.: Gputerasort: high performance graphics co-processor sorting for large database management. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 325–336 (2006)

    Google Scholar 

  7. Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., Manocha, D.: Fast computation of database operations using graphics processors. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 215–226 (2004)

    Google Scholar 

  8. Lauer, T., Datta, A., Khadikov, Z., Anselm, C.: Exploring graphics processing units as parallel coprocessors for online aggregation. In: DOLAP, pp. 77–84 (2010)

    Google Scholar 

  9. Kemper, A., Neumann, T.: Hyper: A hybrid oltp&olap main memory database system based on virtual memory snapshots. In: ICDE, pp. 195–206 (2011)

    Google Scholar 

  10. Qiao, L., Raman, V., Reiss, F., Haas, P.J., Lohman, G.M.: Main-memory scan sharing for multi-core cpus. Proc. VLDB Endow. 1, 610–621 (2008)

    Article  Google Scholar 

  11. Ross, K.A., Zaman, K.A.: Serving datacube tuples from main memory. In: Proc. of Int. Conf. on Scientific and Statistical Database Management (SSDBM), p. 182 (2000)

    Google Scholar 

  12. Abadi, D., Madden, S., Ferreira, M.: Integrating compression and execution in column-oriented database systems. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 671–682 (2006)

    Google Scholar 

  13. Abadi, D.J., Madden, S.R., Hachem, N.: Column-stores vs. row-stores: how different are they really? In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 967–980 (2008)

    Google Scholar 

  14. Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-store: a column-oriented dbms. In: Proc. of Int. Conf. on Very Large Data Bases (VLDB), pp. 553–564 (2005)

    Google Scholar 

  15. Abadi, D.J., Boncz, P.A., Harizopoulos, S.: Column-oriented database systems. Proc. VLDB Endow. 2, 1664–1665 (2009)

    Article  Google Scholar 

  16. Cuzzocrea, A.: Data cube compression techniques: A theoretical review. In: Encyclopedia of Data Warehousing and Mining. IGI Global (2009)

    Google Scholar 

  17. Hasan, K.M.A., Tsuji, T., Higuchi, K.: An Efficient Implementation for MOLAP Basic Data Structure and its Evaluation. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 288–299. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  18. Moerkotte, G.: Building query compilers, http://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf (retrieved October 6, 2011)

  19. Graefe, G.: A generalized join algorithm. In: BTW, pp. 267–286 (2011)

    Google Scholar 

  20. Graefe, G.: Parallel query execution algorithms. In: Encyclopedia of Database Systems, pp. 2030–2035. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  21. Zeller, H., Graefe, G.: Parallel query optimization. In: Encyclopedia of Database Systems, pp. 2035–2038. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  22. Chaudhuri, S.: Query optimizers: time to rethink the contract? In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 961–968 (2009)

    Google Scholar 

  23. Ioannidis, Y.E., Poosala, V.: Histogram-based approximation of set-valued query-answers. In: Proc. of Int. Conf. on Very Large Data Bases (VLDB), pp. 174–185 (1999)

    Google Scholar 

  24. Vitter, J.S., Wang, M.: Approximate computation of multidimensional aggregates of sparse data using wavelets. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 193–204 (1999)

    Google Scholar 

  25. Gramacki, A., Gramacki, J., Andrzejewski, W.: Probability density functions for calculating approximate aggregates. Foundations of Computing and Decision Sciences Journal 35 (2010)

    Google Scholar 

  26. Valduriez, P.: Join indices. ACM Transactions on Database Systems (TODS) 12(2), 218–246 (1987)

    Article  Google Scholar 

  27. O’Neil, P.: Model 204 architecture and performance. In: Gawlick, D., Reuter, A., Haynie, M. (eds.) HPTS 1987. LNCS, vol. 359, pp. 40–59. Springer, Heidelberg (1989)

    Google Scholar 

  28. Stockinger, K., Wu, K.: Bitmap indices for data warehouses. In: Wrembel, R., Koncilia, C. (eds.) Data Warehouses and OLAP: Concepts, Architectures and Solutions, pp. 157–178. Idea Group Inc. (2007); ISBN 1-59904-364-5

    Google Scholar 

  29. Bryla, B., Loney, K.: Oracle Database 11g DBA Handbook. McGraw-Hill Osborne Media (2007)

    Google Scholar 

  30. O’Neil, P., Graefe, G.: Multi-table joins through bitmapped join indices. SIGMOD Record 24(3), 8–11 (1995)

    Article  Google Scholar 

  31. Gupta, A., Mumick, I.S. (eds.): Materialized Views: Techniques, Implementations, and Applications. MIT Press (1999)

    Google Scholar 

  32. Furtado, P.: Workload-Based Placement and Join Processing in Node-Partitioned Data Warehouses. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2004. LNCS, vol. 3181, pp. 38–47. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  33. Rao, J., Zhang, C., Megiddo, N., Lohman, G.: Automating physical database design in a parallel database. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 558–569 (2002)

    Google Scholar 

  34. Stöhr, T., Rahm, E.: Warlock: A data allocation tool for parallel warehouses. In: Proc. of Int. Conf. on Very Large Data Bases (VLDB), pp. 721–722 (2001)

    Google Scholar 

  35. Deliège, F., Pedersen, T.B.: Position list word aligned hybrid: optimizing space and performance for compressed bitmaps. In: EDBT, pp. 228–239. ACM (2010)

    Google Scholar 

  36. Nourani, M., Tehranipour, M.H.: Rl-huffman encoding for test compression and power reduction in scan applications. ACM Trans. Design Autom. Electr. Syst. 10(1), 91–115 (2005)

    Article  Google Scholar 

  37. Stabno, M., Wrembel, R.: RLH: Bitmap compression technique based on run-length and Huffman encoding. Information Systems 34(4-5), 400–414 (2009)

    Article  Google Scholar 

  38. Aouiche, K., Darmont, J.: Data mining-based materialized view and index selection in data warehouses. J. Intell. Inf. Syst. 33, 65–93 (2009)

    Article  Google Scholar 

  39. Lawrence, M., Rau-Chaplin, A.: Dynamic view selection for olap. In: Strategic Advancements in Utilizing Data Mining and Warehousing Technologies, pp. 91–106. IGI Global (2010)

    Google Scholar 

  40. Theodoratos, D., Xu, W., Simitsis, A.: Materialized view selection for data warehouse design. In: Encyclopedia of Data Warehousing and Mining, pp. 1182–1187. IGI Global (2009)

    Google Scholar 

  41. Chen, Y., Dehne, F.K.H.A., Eavis, T., Rau-Chaplin, A.: Improved data partitioning for building large rolap data cubes in parallel. IJDWM 2(1), 1–26 (2006)

    Google Scholar 

  42. Furtado, P.: Large Relations in Node-Partitioned Data Warehouses. In: Zhou, L.-z., Ooi, B.-C., Meng, X. (eds.) DASFAA 2005. LNCS, vol. 3453, pp. 555–560. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  43. Binnig, C., Kossmann, D., Kraska, T., Loesing, S.: How is the weather tomorrow?: towards a benchmark for the cloud. In: Proc. of Int. Workshop on Testing Database Systems, DBTest (2009)

    Google Scholar 

  44. Funke, F., Kemper, A., Krompass, S., Neumann, T., Seibold, M., Kuno, H., Nica, A., Poess, M.: Metrics for measuring the performance of the mixed workload ch-benchmark. In: Proc. of Technology Conference on Performance Evaluation and Benchmarking, TPCTC (2011)

    Google Scholar 

  45. Kersten, M.L., Kemper, A., Markl, V., Nica, A., Poess, M., Sattler, K.U.: Tractor pulling on data warehouses. In: Proc. of Int. Workshop on Testing Database Systems, DBTest (2011)

    Google Scholar 

  46. O’Neil, P.E., O’Neil, E.J., Chen, X., Revilak, S.: The Star Schema Benchmark and Augmented Fact Table Indexing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2009. LNCS, vol. 5895, pp. 237–252. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  47. Gyssens, M., Lakshmanan, L.V.S.: A foundation for multi-dimensional databases. In: Proc. of Int. Conf. on Very Large Data Bases (VLDB), pp. 106–115 (1997)

    Google Scholar 

  48. Jarke, M., Lenzerini, M., Vassiliou, Y., Vassiliadis, P.: Fundamentals of Data Warehouses. Springer, Heidelberg (2003)

    Book  Google Scholar 

  49. Malinowski, E., Zimányi, E.: Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications. Springer Publishing Company, Inc., Heidelberg (2008)

    Google Scholar 

  50. Chaudhuri, S., Dayal, U.: An overview of data warehousing and OLAP technology. SIGMOD Record 26(1), 65–74 (1997)

    Article  Google Scholar 

  51. Li, Z., Ross, K.A.: Fast joins using join indices. VLDB Journal 8(1), 1–24 (1999)

    Article  Google Scholar 

  52. Davis, K.C., Gupta, A.: Indexing in data warehouses: Bitmaps and beyond. In: Wrembel, R., Koncilia, C. (eds.) Data Warehouses and OLAP: Concepts, Architectures and Solutions, pp. 179–202. Idea Group Inc. (2007); ISBN 1-59904-364-5

    Google Scholar 

  53. Wu, K., Otoo, E.J., Shoshani, A.: On the performance of bitmap indices for high cardinality attributes. In: Proc. of Int. Conf. on Very Large Data Bases (VLDB), pp. 24–35 (2004)

    Google Scholar 

  54. Scientific Data Management Research Group: FastBit: An efficient compressed bitmap index technology, http://sdm.lbl.gov/fastbit/ (retrieved October 24, 2011)

  55. Wu, M., Buchmann, A.: Encoded bitmap indexing for data warehouses. In: Proc. of Int. Conf. on Data Engineering (ICDE), pp. 220–230 (1998)

    Google Scholar 

  56. Wu, K., Yu, P.: Range-based bitmap indexing for high cardinality attributes with skew. In: Int. Computer Software and Applications Conference (COMPSAC), pp. 61–67 (1998)

    Google Scholar 

  57. Rotem, D., Stockinger, K., Wu, K.: Optimizing candidate check costs for bitmap indices. In: Proc. of ACM Conf. on Information and Knowledge Management (CIKM), pp. 648–655 (2005)

    Google Scholar 

  58. Rotem, D., Stockinger, K., Wu, K.: Optimizing I/O Costs of Multi-Dimensional Queries using Bitmap Indices. In: Andersen, K.V., Debenham, J., Wagner, R. (eds.) DEXA 2005. LNCS, vol. 3588, pp. 220–229. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  59. Stockinger, K., Wu, K., Shoshani, A.: Evaluation Strategies for Bitmap Indices with Binning. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 120–129. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  60. Koudas, N.: Space efficient bitmap indexing. In: Proc. of ACM Conf. on Information and Knowledge Management (CIKM), pp. 194–201 (2000)

    Google Scholar 

  61. Chan, C., Ioannidis, Y.: Bitmap index design and evaluation. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 355–366 (1998)

    Google Scholar 

  62. O’Neil, P., Quass, D.: Improved query performance with variant indexes. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 38–49 (1997)

    Google Scholar 

  63. Rinfret, D., O’Neil, P., O’Neil, E.: Bit-sliced index arithmetic. In: Proc. of ACM SIGMOD Int. Conference on Management of Data, pp. 47–57 (2001)

    Google Scholar 

  64. Antoshenkov, G., Ziauddin, M.: Query processing and optimization in Oracle RDB. VLDB Journal 5(4), 229–237 (1996)

    Article  Google Scholar 

  65. Wu, K., Otoo, E.J., Shoshani, A.: An efficient compression scheme for bitmap indices. Research report, Lawrence Berkeley National Laboratory (2004)

    Google Scholar 

  66. Wu, K., Otoo, E.J., Shoshani, A.: Optimizing bitmap indices with efficient compression. ACM Transactions on Database Systems (TODS) 31(1), 1–38 (2006)

    Article  Google Scholar 

  67. Deliège, F.: Concepts and Techniques for Flexible and Effective Music Data Management. PhD thesis, Aalborg University, Denmark (2009)

    Google Scholar 

  68. Stabno, M., Wrembel, R.: RLH: Bitmap compression technique based on run-length and Huffman encoding. In: Proc. of ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), pp. 41–48 (2007)

    Google Scholar 

  69. Huffman, D.A.: A method for the construction of minimum-redundancy codes. In: Proc. of the Institute of Radio Engineers, pp. 1098–1101 (1952)

    Google Scholar 

  70. O’Neil, E., O’Neil, P., Wu, K.: Bitmap index design choices and their performance implications. Research report, Lawrence Berkeley National Laboratory (2007)

    Google Scholar 

  71. Reiss, F., Stockinger, K., Wu, K., Shoshani, A., Hellerstein, J.M.: Efficient analysis of live and historical streaming data and its application to cybersecurity. Research report, Lawrence Berkeley National Laboratory (2006)

    Google Scholar 

  72. Oracle Corp.: Oracle Database Data Warehousing Guide, Rel. 11g, http://www.oracle.com/technetwork/database/enterprise-edition/documentation/database-093888.html (retrieved June 24, 2010)

  73. Bhattacharjee, B., Padmanabhan, S., Malkemus, T., Lai, T., Cranston, L., Huras, M.: Efficient query processing for multi-dimensionally clustered tables in db2. In: VLDB, pp. 963–974 (2003)

    Google Scholar 

  74. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)

    Article  Google Scholar 

  75. Microsoft Corp.: SQL Server (2008) R2, http://msdn.microsoft.com/enus/library/ms191267.aspx (retrieved June 24, 2010)

  76. Aouiche, K., Darmont, J., Boussaïd, O., Bentayeb, F.: Automatic Selection of Bitmap Join Indexes in Data Warehouse. In: Tjoa, A.M., Trujillo, J. (eds.) DaWaK 2005. LNCS, vol. 3589, pp. 64–73. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  77. Hobbs, L., Hillson, S., Lawande, S.: Oracle9iR2 Data Warehousing. Digital Press (2003)

    Google Scholar 

  78. de Sousa, M.F., Sampaio, M.C.: Efficient materialization and use of views in data warehouses. SIGMOD Record 28(1), 78–83 (1999)

    Article  Google Scholar 

  79. Gupta, H.: Selection of Views to Materialise in a Data Warehouse. In: Afrati, F.N., Kolaitis, P.G. (eds.) ICDT 1997. LNCS, vol. 1186, pp. 98–112. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  80. Lawrence, M., Rau-Chaplin, A.: Dynamic View Selection for OLAP. In: Tjoa, A.M., Trujillo, J. (eds.) DaWaK 2006. LNCS, vol. 4081, pp. 33–44. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  81. Theodoratos, D., Xu, W.: Constructing search space for materialized view selection. In: Proc. of ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), pp. 48–57 (2004)

    Google Scholar 

  82. Krueger, J., Tinnefeld, C., Grund, M., Zeier, A., Plattner, H.: A case for online mixed workload processing. In: Proc. of Int. Workshop on Testing Database Systems (DBTest). ACM (2010)

    Google Scholar 

  83. Castellanos, M., Dayal, U., Miller, R.J.: Enabling Real-Time Business Intelligence. LNBIP, vol. 41. Springer, Heidelberg (2010)

    Book  Google Scholar 

  84. Thiele, M., Fischer, U., Lehner, W.: Partition-based workload scheduling in living data warehouse environments. Information Systems 34(4-5), 382–399 (2009)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wrembel, R. (2012). Data Warehouse Performance: Selected Techniques and Data Structures. In: Aufaure, MA., Zimányi, E. (eds) Business Intelligence. eBISS 2011. Lecture Notes in Business Information Processing, vol 96. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27358-2_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-27358-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27357-5

  • Online ISBN: 978-3-642-27358-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics