Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

A Join-Like Operator to Combine Data Cubes and Answer Queries from Multiple Data Cubes

Published: 07 October 2014 Publication History

Abstract

In order to answer a “joint” query from multiple data cubes, Pourabass and Shoshani [2007] distinguish the data cube on the measure of interest (called the “primary” data cube) from the other data cubes (called “proxy” data cubes) that are used to involve the dimensions (in the query) not in the primary data cube. They demonstrate in study cases that, if the measures of the primary and proxy data cubes are correlated, then the answer to a joint query is an accurate estimate of its true value. Needless to say, for two or more proxy data cubes, the result depends upon the way the primary and proxy data cubes are combined together; however, for certain combination schemes Pourabass and Shoshani provide a sufficient condition, that they call proxy noncommonality, for the invariance of the result.
In this article, we introduce: (1) a merge operator combining the contents of a primary data cube with the contents of a proxy data cube, (2) merge expressions for general combination schemes, and (3) an equivalence relation between merge expressions having the same pattern. Then, we prove that proxy noncommonality characterizes patterns for which every two merge expressions are equivalent. Moreover, we provide an efficient procedure for answering joint queries in the special case of perfect merge expressions. Finally, we show that our results apply to data cubes in which measures are obtained from unaggregated data using the aggregate functions SUM, COUNT, MAX, and MIN, and a lot more.

References

[1]
S. M. Aji and R.-J. McEliece. 2000. The generalized distributive law. IEEE Trans. Inf. Theory 46, 325--343.
[2]
C. Beeri, R. Fagin, D. Maier, and M. Yannakakis. 1983. On the desirability of acyclic database systems. J. ACM 30, 479--513.
[3]
Y. M. Bishop, S. E. Fienberg, and P. W. Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.
[4]
P. Cheeseman. 1984. Learning of expert systems from data. In Proceedings of the IEEE Workshop on Principles of Knowledge-Based Systems (PKWBS'84). 115--122.
[5]
M. Dalkilic and E. Robertson. 2000. Information dependencies. In Proceedings of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'00). 245--253.
[6]
W. E. Deming and F. F. Stephan. 1940. On least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals Math. Statist. 11, 427--444.
[7]
M. Ghosh and J. N. K. Rao. 1994. Small area estimation: An appraisal. Statist. Sci. 9, 55--76.
[8]
C. Giannella and E. Robertson. 2003. A note on approximation measures for multi-valued dependencies in relational databases. Inf. Process. Lett. 85, 153--158.
[9]
S. A. Goldman and R. L. Rivest. 1988. Making maximum-entropy computations easier by adding extra constraints. In Maximum-Entropy and Bayesian Methods, Science and Engineering 2, G. J. Erikson and C. R. Smith, Eds., Kluwer Academic, 323--340.
[10]
J. Gray, A. Bosworth, A. Layman, and H. Priahesh. 1995. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In Proceedings of the International Conference on Data Engineering (ICDE'95). IEEE Computer Society Press, 152--159.
[11]
V. Harinarayan, A. Rajaraman, and J. D. Ullman. 1996. Implementing data cubes efficiently. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (SIGMOD'96). 205--216.
[12]
F. R. Kschinschang, B. J. Frey, and H.-A. Loeliger. 2001. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47, 498--519.
[13]
S. L. Lauritzen. 1996. Graphical Models. Oxford University Press, Oxford, UK.
[14]
W. W. Leontief and A. Strout. 1963. Multiregional input-output analysis. In Structural Interdependence and Economic Development, 119--169.
[15]
D. Maier and J. D. Ullman. 1984. Connections in acyclic hypergraphs. Theor. Comput. Sci. 32, 185--199.
[16]
F. M. Malvestuto. 1987. Answering queries in categorical databases. In Proceedings of the 6th ACM Symposium on Principles of Database Systems (PODS'87). 25--27.
[17]
F. M. Malvestuto. 1988. Existence of extensions and product extensions for discrete probability distributions. Discr. Math. 69, 61--77.
[18]
F. M. Malvestuto. 1989. A universal table model for categorical databases. Inf. Sci. 49, 203--223.
[19]
F. M. Malvestuto. 1993. A universal-scheme approach to statistical databases containing homogeneous summary tables. ACM Trans. Database Syst. 18, 678--708.
[20]
F. M. Malvestuto. 2013. The sum-product algorithm: Algebraic independence and computational aspects. Kybernetika 49, 4--22.
[21]
F. M. Malvestuto and E. Pourabbas. 2004. Customized answers to summary queries via aggregate views. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM'04). 193--202.
[22]
F. M. Malvestuto and E. Pourabbas. 2005. Local computation of answers to table queries on summary databases. In Proceedings of the 17th International Conference on Scientific and Statistical Database Management (SSDBM'05). 263--270.
[23]
F. Mosteller. 1948. On pooling data. J. Amer. Statist. Assoc. 43, 231--242.
[24]
W.-K. Ng and C. V. Ravishankar. 1995. Information synthesis in statistical databases. In Proceedings of the 4th Conference on Information and Knowledge Management (CIKM'95). ACM Press, New York, 335--361.
[25]
J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Fransisco.
[26]
E. Pourabbas and A. Shoshani. 2003. Answering joint queries from multiple aggregate olap databases. In Proceedings of the 5th Conference on Data Warehousing and Knowledge Discovery (DaWak'03). Y. Kambayashi, M. K. Mohania, and W. Woess, Eds., Lecture Notes in Computer Science, vol. 2737, Springer, 24--34.
[27]
E. Pourabbas and A. Shoshani. 2007. Efficient estimation of joint queries from multiple olap databases. ACM Trans. Database Syst. 32, 1.
[28]
E. Pourabbas and A. Shoshani. 2010. Improving estimation accuracy of aggregate queries on data cubes. Data Knowl. Engin. 69, 50--72.
[29]
J. N. K. Rao. 2003. Small Area Estimation. John Wiley and Sons.
[30]
W. L. Schaible, Ed. 1996. Indirect Estimators in U.S. Federal Programs. Lecture Notes in Statistics, vol. 108, Springer.
[31]
A. Shoshani. 1997. OLAP and statistical databases. In Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'97). ACM Press, New York, 185--196.
[32]
R. Stone and A. Brown. 1962. A Programme of Growth: A Computable Model for Economic Growth, Part I. Chapman and Hall, London.
[33]
R. E. Tarjan and M. Yannakakis. 1984. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J. Comput. 13, 566--579.
[34]
V. Verma, F. Gagliardi, and C. Ferretti. 2009. On pooling data and measures. http://www.econ-pol.unisi.it/dmq/pdf/DMQ_WP_84.pdf.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 39, Issue 3
September 2014
264 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2676651
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2014
Accepted: 01 June 2014
Revised: 01 May 2014
Received: 01 October 2013
Published in TODS Volume 39, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data cubes
  2. OLAP
  3. data integration

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 265
    Total Downloads
  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Dec 2024

Other Metrics

Citations

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media