DOI: 10.5555/1182635.1164161

CURE for cubes: cubing using a ROLAP engine

Published: 01 September 2006

Abstract

Data cube construction has been the focus of much research due to its importance in improving the efficiency of OLAP. A significant fraction of this work has been on ROLAP techniques, which are based on relational technology. Existing ROLAP cubing solutions mainly focus on "flat" datasets, which do not include hierarchies in their dimensions. Nevertheless, the nature of hierarchies introduces several complications into cube construction, making existing techniques essentially inapplicable in a significant number of real-world applications. In particular, hierarchies raise three main challenges: (a) The number of nodes in a cube lattice increases dramatically and its shape is more involved. This requires new forms of lattice traversal for efficient execution. (b) The number of unique values in the higher levels of a dimension hierarchy may be very small; hence, partitioning data into fragments that fit in memory and include all entries of a particular value may often be impossible. This requires new partitioning schemes. (c) The number of tuples that need to be materialized in the final cube increases dramatically. This requires new storage schemes that remove all forms of redundancy for efficient space utilization. In this paper, we propose CURE, a novel ROLAP cubing method that addresses these issues and constructs complete data cubes over very large datasets with arbitrary hierarchies. CURE contributes a novel lattice traversal scheme, an optimized partitioning method, and a suite of relational storage schemes for all forms of redundancy. We demonstrate the effectiveness of CURE through experiments on both real-world and synthetic datasets. Most notably, CURE is the first ROLAP technique to complete the construction of the cube of the highest-density dataset in the APB-1 benchmark (12 GB). CURE was in fact quite efficient on this dataset, which shows great promise for the potential of the technique overall.
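Challenges (a) and (b) can be made concrete with a little arithmetic. The sketch below is an illustration only and is not CURE's algorithm: it assumes each dimension has a simple linear hierarchy (the paper itself targets arbitrary hierarchies), and the dimension names, level counts, and row counts are hypothetical. Under that assumption a dimension with L levels offers L + 1 grouping choices (one per level, plus ALL), so the lattice size is the product of these counts; and if all rows sharing a value must stay in one partition, a level with only k distinct values can yield at most k partitions.

```python
from math import ceil, prod


def lattice_nodes(levels_per_dimension):
    """Count group-by nodes in a cube lattice.

    Assumes linear hierarchies: a dimension with L levels contributes
    L + 1 choices (any one of its levels, or ALL).
    """
    return prod(levels + 1 for levels in levels_per_dimension)


def min_largest_partition(total_rows, distinct_values):
    """Lower bound on the size of the largest partition.

    If all rows with the same value must land in the same partition,
    there are at most `distinct_values` partitions, so by the pigeonhole
    principle one of them holds at least ceil(total_rows / distinct_values).
    """
    return ceil(total_rows / distinct_values)


# Hypothetical schema: three dimensions and their hierarchy depths.
hierarchy_levels = {"time": 3, "product": 2, "store": 3}

flat = lattice_nodes([1] * len(hierarchy_levels))        # 2 * 2 * 2
hierarchical = lattice_nodes(hierarchy_levels.values())  # 4 * 3 * 4
print(f"flat lattice: {flat} nodes, hierarchical lattice: {hierarchical} nodes")

# Hypothetical data: one billion rows, partitioned on a top level
# that has only four distinct values.
print(f"largest partition holds at least {min_largest_partition(10**9, 4)} rows")
```

With these made-up numbers, the flat lattice has 8 nodes while the hierarchical one has 48, and no partitioning on the four-valued level can bring the largest fragment below 250 million rows, which is the kind of situation that motivates a different partitioning scheme.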

Published In

VLDB '06: Proceedings of the 32nd international conference on Very large data bases
September 2006
1269 pages

Sponsors

  • SIGMOD: ACM Special Interest Group on Management of Data
  • K.I.S.S. SIG on Databases
  • AJU Information Technology Co., Ltd
  • US Army ITC-PAC Asian Research Office
  • Google Inc.
  • The Database Society of Japan
  • Samsung SOS
  • Advanced Information Technology Research Center
  • Naver
  • Microsoft
  • Korea Info Sci Society: Korea Information Science Society
  • SK telecom
  • Systems Applications Products
  • ORACLE
  • International Business Management
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D
  • Kosef
  • Kaist
  • LG Electronics
  • CCF-DBS

Publisher

VLDB Endowment
