Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Range queries in OLAP data cubes

Published: 01 June 1997 Publication History

Abstract

A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing ranges of values for numeric dimensions. We present fast algorithms for range queries for two types of aggregation operations: SUM and MAX. These two operations cover techniques required for most popular aggregation operations, such as those supported by SQL.
For range-sum queries, the essential idea is to precompute some auxiliary information (prefix sums) that is used to answer ad hoc queries at run-time. By maintaining auxiliary information which is of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the sub-cube circumscribed by a query. Alternatively, one can keep auxiliary information which is 1/bd of the size of the d-dimensional data cube. Response to a range query may now require access to some cells of the data cube in addition to the access to the auxiliary information, but the overall time complexity is typically reduced significantly. We also discuss how the precomputed information is incrementally updated by batching updates to the data cube. Finally, we present algorithms for choosing the subset of the data cube dimensions for which the auxiliary information is computed and the blocking factor to use for each such subset.
Our approach to answering range-max queries is based on precomputed max over balanced hierarchical tree structures. We use a branch-and-bound-like procedure to speed up the finding of max in a region. We also show that with a branch-and-bound procedure, the average-case complexity is much smaller than the worst-case complexity.

References

[1]
S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. o:f the g~nd Int'l Conference on Very Large Databases, pages 506-521, Mumbai (Bombay), India, September 1996.]]
[2]
Rakesh Agrawal, Ashish Gupta, and Sunita Sarawagi. Modeling multidimensional databases. In Proc. of the 13th Int 'l Conference on Data Engineering, Birmingham, U.K., April 1997.]]
[3]
J.L. Bentley. Multidimensional divide and conquer. Comm. A CM, 23(4):214-229, 1980.]]
[4]
J.L. Bentley and J. H. Friedman. Data structures for range searching. Computing Surveys, ~1(4), 19~9.]]
[5]
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles, in Proc. of the A CM SIGMOD Conference on Management of Data, pages 322-331, Atlantic City, N J, May 1990.]]
[6]
M. Berger and I. Regoutsos. An algorithm point clustering and grid generation. IEEE Transactions on Systems, Man and Cybernetics, 21(5):1278-86, 1991.]]
[7]
Bernard Cha~elle. Lower bounds for orthogonal range searching: Ii. the arithmetic model. J. A CM, 37(3):439-463, July 1990.]]
[8]
G.D. Cohen, A.C. Lobstein, and N.j.A. Sloane. b-k~her results on the covering radius of codes. IEJEB Trans. Information Theory, IT-32(5):680- 694, September 1986.]]
[9]
M.C. Chen and L.P. McNamee. The data model and access method of summary data management. IEEE Transactions on Knowledge and Data Engineering, 1(4):519--29, 1989.]]
[10]
E.F. Codd. Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate. Technical report, E. F. Codd and Associates, 1993.]]
[11]
George Colliat. CLAP, relational, and multidimensional database systems.SIGMOD RECORD, September 1996.]]
[12]
D. Comer. The ubiquitous B-tree. A CM Computing Surveys, 11(2):121-138, June 1979.]]
[13]
Bernard Chazelle and Burton Rosenberg. Computing partial sums in multidimensional arrays. In Proc. of the ACM Syrup. on Computational Geometry, pages 131-139, 1989.]]
[14]
S. Chaudhuri and K. Shim. Including group-by in query optimization. In Proc. of the $Oth Int'l Conference on Very Large Databases, pages 354- 366, Santiago, Chile, September 1994.]]
[15]
J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tabs and subtotals. In Proc. of the l~th Int'l Conference on Data Engineering, pages 152-159, 1996.]]
[16]
A. Gupta, V. Harinarayan, and D. Quass. Aggregate-query processing in data warehousing environments. In Proceedings o.f the Eighth International Conference on Very Large Databases (VLDB), pages 358-369, Zurich, Switzerland, September 1995.]]
[17]
Himanshu Gupta, Venky Harinarayan, Anand Rajaraman, and Jeffrey D. UUman. Index selection for CLAP. In Proc. of the 13th Int'l Conference on Data Engineering, Birmingham, U.K., April 1997.]]
[18]
Ching-Tien Ho, Jehoshua Bruck, and Rakesh Agrawal. Partial-sum queries in CLAP data cubes using covering codes. In Proc. of the 16th A CM Symposium on Principles of Database Systems, May 1997.]]
[19]
V. Harinarayan, A. Rajaraman, and J.D. UU- man. Implementing data cubes efficiently. In Proc. of the A CM SIGMOD Conference on Management of Data, June 1996.]]
[20]
IBM. DB~ SQL Reference }or Common Servers Version ~, 1995.]]
[21]
A.K. Jain and R. C. Dubes. Algorithms }or Clustering Data. Prentice Hall, 1988.]]
[22]
T. Johnson and D. Shasha. Hierarchically sprit cube forests for decision support: description and tuned design, 1996. Working Paper.]]
[23]
D. Lomet, editor. Special Issue on Materialized Views and Data Warehousing. IEEE Data Engineering Bulletin, 18(2), June 1995.]]
[24]
Kurt Mehlhorn. Data Structure and Algorithm 3: Multi-dimensional Searching and Computational Geometry. Springer-Verlag, 1984.]]
[25]
Z. Michalewicz. Statistical and Scientific Databases. Ellis Horwood, 1992.]]
[26]
L. Mitten. Branch and bound methods: General formulation and properties. Operations Research, 18:24-34, 1970.]]
[27]
The CLAP Council. MD-API the OLAP Applzcation Program Interface Version 0.5 Specification, September 1996.]]
[28]
E.M. Reingold, J. Nievergelt, and N. Dec. Combinatorial Algorithms. Prentice-Hall, Englewood Cliffs. N J, 1977.]]
[29]
H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1989.]]
[30]
John Sharer, Rakesh Agrawal, and Mattish Mehta. SPRINT: A scalable parallel classifier for data mining. In Proc. of the $$nd Int'l Conference on Very Large Databases, Bombay, India, September 1996.]]
[31]
P. Schroeter and j. Bigun. Hierarchical image segmentation by multi-dimensional clustering and orientation-adaptive boundary refinement. Pattern Recognition, 25(5):695-709, May 1995.]]
[32]
A. Shukla, P.M. Deshpande, J.F. Naughton, and K. Ramasamy. Storage estimation for multidimensional aggregates in the presence of hierarchies. In Proc. of the 2~nd Int'l Conference on Very Large Databases, pages 522-531, Mumbai (Bombay), India, September 1996.]]
[33]
B. Salzberg and A. Reuter. Indexing for aggregation, 1996. Working Paper.]]
[34]
J. Srivastava, J.S.E. Tan, and V.Y. Lure. TB- SAM: An access method for efficient processing of statistical queries. IEEE Transactions on Knowledge and Data Engineering, 1(4), 1989.]]
[35]
P.M. Vaidya. Space-time tradeoffs for orthogonal range queries. In Proc. 17th Annual A CM Syrup. on Theory of Comput., pages 169-174, 1985.]]
[36]
D.E. Willard and G.S. Lueker. Adding range restriction capability to dynamic data structures. J. A CM, 32(3):597-617, 1985.]]
[37]
Andrew Yao. On the complexity of maintaining partial sums. SIAM J. Computing, 14(2):277- 288, May 1985.]]
[38]
W.P. Yah and P. Larson. Eager aggregation and lazy aggregation. In Proceedings of the Eighth International Conference on Very Large Databases (VLDB), pages 345-357, Zurich, Switzerland, September 1995.]]

Cited By

View all
  • (2024)Fast and Accurate PARAFAC2 Decomposition for Time Range Queries on Irregular TensorsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679735(962-972)Online publication date: 21-Oct-2024
  • (2023)Compressing Big OLAP Data Cubes in Big Data Analytics Systems: New Paradigms, a Reference Architecture, and Future Research PerspectivesE-Business and Telecommunications10.1007/978-3-031-45137-9_7(156-175)Online publication date: 30-Sep-2023
  • (2022)Collecting Geospatial Data Under Local Differential Privacy With Improving Frequency EstimationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3181049(1-12)Online publication date: 2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 26, Issue 2
June 1997
583 pages
ISSN:0163-5808
DOI:10.1145/253262
Issue’s Table of Contents
  • cover image ACM Conferences
    SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
    June 1997
    594 pages
    ISBN:0897919114
    DOI:10.1145/253260
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1997
Published in SIGMOD Volume 26, Issue 2

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)192
  • Downloads (Last 6 weeks)33
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Fast and Accurate PARAFAC2 Decomposition for Time Range Queries on Irregular TensorsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679735(962-972)Online publication date: 21-Oct-2024
  • (2023)Compressing Big OLAP Data Cubes in Big Data Analytics Systems: New Paradigms, a Reference Architecture, and Future Research PerspectivesE-Business and Telecommunications10.1007/978-3-031-45137-9_7(156-175)Online publication date: 30-Sep-2023
  • (2022)Collecting Geospatial Data Under Local Differential Privacy With Improving Frequency EstimationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3181049(1-12)Online publication date: 2022
  • (2022)Upilio: Leveraging the Serverless Paradigm for Building a Versatile IoT ApplicationService-Oriented and Cloud Computing10.1007/978-3-031-04718-3_8(121-136)Online publication date: 22-Mar-2022
  • (2021)SWARM: Adaptive Load Balancing in Distributed Streaming Systems for Big Spatial DataACM Transactions on Spatial Algorithms and Systems10.1145/34600137:3(1-43)Online publication date: 8-Jun-2021
  • (2021)MOSE: A Monotonic Selectivity Estimator Using Learned CDFIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2021.3112753(1-1)Online publication date: 2021
  • (2020)CoopStoreProceedings of the VLDB Endowment10.14778/3407790.340781713:12(2174-2187)Online publication date: 14-Sep-2020
  • (2020)Using Deep Learning for Big Spatial Data PartitioningACM Transactions on Spatial Algorithms and Systems10.1145/34021267:1(1-37)Online publication date: 12-Aug-2020
  • (2020)TrioStatProceedings of the 28th International Conference on Advances in Geographic Information Systems10.1145/3397536.3422220(78-86)Online publication date: 3-Nov-2020
  • (2019)Durable top-k queries on temporal dataProceedings of the VLDB Endowment10.14778/3275366.328496711:13(2223-2235)Online publication date: 17-Jan-2019
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media