Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Extended wavelets for multiple measures

Published: 01 June 2007 Publication History

Abstract

Several studies have demonstrated the effectiveness of the Haar wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries. Although originally designed for minimizing the overall mean-squared (i.e., L2-norm) error in the data approximation, recently proposed methods also enable the use of Haar wavelets in minimizing other error metrics, such as the relative error in data value reconstruction, which is arguably the most important for approximate query answers. Relatively little attention, however, has been paid to the problem of using wavelet synopses as an approximate query answering tool over complex tabular datasets containing multiple measures, such as those typically found in real-life OLAP applications. Existing decomposition approaches will either operate on each measure individually, or treat all measures as a vector of values and process them simultaneously. As we demonstrate in this article, these existing individual or combined storage approaches for the wavelet coefficients of different measures can easily lead to suboptimal storage utilization, resulting in drastically reduced accuracy for approximate query answers. To address this problem, in this work, we introduce the notion of an extended wavelet coefficient as a flexible, efficient storage method for wavelet coefficients over multimeasure data. We also propose novel algorithms for constructing effective (optimal or near-optimal) extended wavelet-coefficient synopses under a given storage constraint, for both sum-squared error and relative-error norms. Experimental results with both real-life and synthetic datasets validate our approach, demonstrating that our techniques consistently obtain significant gains in approximation accuracy compared to existing solutions.

Supplementary Material

Deligiannakis Appendix (a10-deligiannakis-apndx.pdf)
Online appendix to designing mediation for context-aware applications. The appendix supports the information on article 10.

References

[1]
Acharya, S., Gibbons, P. B., Poosala, V., and Ramaswamy, S. 1999. Join synopses for approximate query answering. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 275--286.
[2]
Chakrabarti, K., Garofalakis, M., Rastogi, R., and Shim, K. 2001. Approximate query processing using wavelets. VLDB J. 10, 2-3 (Sept.), 199--223.
[3]
Cisco. 1999. Netflow services and applications. White paper. http://www.cisco.com/.
[4]
Cochran, W. G. 1977. Sampling Techniques. John Wiley, New York.
[5]
Cormen, T., Leiserson, C., and Rivest, R. 1990. Introduction to Algorithms. MIT Press, Cambridge, MA.
[6]
Deligiannakis, A. and Roussopoulos, N. 2003a. Extended wavelets for multiple measures. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 229--240.
[7]
Deligiannakis, A. and Roussopoulos, N. 2003b. Extended wavelets for multiple measures. Tech. Rep. CS-TR-4462, University of Maryland.
[8]
Foley, J. D., van Dam, A., Feiner, S. K., and Hughes, J. F. 1990. Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA.
[9]
Ganguly, S., Hasan, W., and Krishnamurthy, R. 1992. Query optimization for parallel execution. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 9--18.
[10]
Garofalakis, M. and Gibbons, P. B. 2001. Approximate query processing: Taming the terabytes. Proceedings of the 27th International Conference on Very Large Data Bases.
[11]
Garofalakis, M. and Gibbons, P. B. 2004. Probabilistic wavelet synopses. ACM Trans. Database Syst. 29, 43--90.
[12]
Garofalakis, M. and Kumar, A. 2005. Wavelet synopses for general error metrics. ACM Trans. Database Syst. 30, 279--332.
[13]
Gilbert, A., Kotidis, Y., Muthukrishnan, S., and Strauss, M. 2001. Surfing wavelets on streams: One-Pass summaries for approximate aggregate queries. In Proceedings of the 27th International Conference on Very Large Data Bases. 79--88.
[14]
Guha, S. 2005. Space efficiency in synopsis construction algorithms. In Proceedings of the 31st International Conference on Very Large Data Bases. 409--420.
[15]
Guha, S., Kim, C., and Shim, K. 2004. XWAVE: Approximate extended wavelets for streaming data. In Proceedings of the 30th International Conference on Very Large Data Bases. 288--299.
[16]
Hellerstein, J., Haas, P., and Wang, H. 1997. Online aggregation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 171--182.
[17]
Jawerth, B. and Sweldens, W. 1994. An overview of wavelet based multiresolution analyses. SIAM Rev. 36, 377--412.
[18]
Karras, P. and Mamoulis, N. 2005. One-Pass wavelet synopses for maximum-error metrics. In Proceedings of the 31st International Conference on Very Large Data Bases.
[19]
Matias, Y. and Urieli, D. 2005. Optimal workload-based weighted wavelet synopses. In Proceedings of the 13th International Conference on Database Theory (ICDT). 368--382.
[20]
Matias, Y., Vitter, J., and Wang, M. 2000. Dynamic maintenance of wavelet-based histograms. In Proceedings of the 26th International Conference on Very Large Data Bases. 101--110.
[21]
Matias, Y., Vitter, J. S., and Wang, M. 1998. Wavelet-based histograms for selectivity estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 448--459.
[22]
Schmidt, R. R. and Shahabi, C. 2002. Propolyne: A fast wavelet-based algorithm for progressive evaluation of polynomial range-sum queries. In Proceedings of the 8th International Conference on Extending Database Technology (EDBT). 664--681.
[23]
Stollnitz, E. J., DeRose, T. D., and Salesin, D. H. 1996. Wavelets for Computer Graphics - Theory and Applications. Morgan Kaufmann, San Fransisco, CA.
[24]
Vazirani, V. V. 2001. Approximation Algorithms. Springer Verlag.
[25]
Vitter, J. and Wang, M. 1999. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 193--204.
[26]
Vitter, J. S. 1985. Random sampling with a reservoir. ACM Trans. Math. Softw. 11, 37--57.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 32, Issue 2
June 2007
267 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1242524
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2007
Published in TODS Volume 32, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Wavelets
  2. approximate query processing
  3. data synopses

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2018)Worst-case Optimal Join AlgorithmsJournal of the ACM10.1145/318014365:3(1-40)Online publication date: 13-Mar-2018
  • (2018)Discrete Wavelet Transform and Wavelet SynopsesEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_539(1132-1139)Online publication date: 7-Dec-2018
  • (2017)Discrete Wavelet Transform and Wavelet SynopsesEncyclopedia of Database Systems10.1007/978-1-4899-7993-3_539-2(1-8)Online publication date: 9-Feb-2017
  • (2016)Approximate Query Processing Using Wavelets in OLAP with Arbitrarily Sized Data and Bounded Errors2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)10.1109/PDP.2016.46(66-73)Online publication date: Feb-2016
  • (2015)Data Summarization Techniques for Big Data—A SurveyHandbook on Data Centers10.1007/978-1-4939-2092-1_38(1109-1152)Online publication date: 17-Mar-2015
  • (2014)A Decomposition Framework for Computing and Querying Multidimensional OLAP Data Cubes over Probabilistic Relational DataFundamenta Informaticae10.5555/2637688.2637693132:2(239-266)Online publication date: 1-Apr-2014
  • (2013)Data StreamApplied Data Mining10.1201/b15027-14(215-235)Online publication date: 12-Jun-2013
  • (2012)Synopses for Massive DataFoundations and Trends in Databases10.1561/19000000044:1–3(1-294)Online publication date: 1-Jan-2012
  • (2012)Worst-case optimal join algorithmsProceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems10.1145/2213556.2213565(37-48)Online publication date: 21-May-2012
  • (2012)Understanding cardinality estimation using entropy maximizationACM Transactions on Database Systems10.1145/2109196.210920237:1(1-31)Online publication date: 6-Mar-2012
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media