Abstract
The wavelet decomposition is a proven tool for constructing concise synopses of large data sets that can be used to obtain fast approximate answers. Existing research studies focus on selecting an optimal set of wavelet coefficients to store so as to minimize some error metric, without however seeking to reduce the size of the wavelet coefficients themselves. In many real data sets the existence of large spikes in the data values results in many large coefficient values lying on paths of a conceptual tree structure known as the error tree. To exploit this fact, we introduce in this paper a novel compression scheme for wavelet synopses, termed hierarchically compressed wavelet synopses, that fully exploits hierarchical relationships among coefficients in order to reduce their storage. Our proposed compression scheme allows for a larger number of coefficients to be stored for a given space constraint thus resulting in increased accuracy of the produced synopsis. We propose optimal, approximate and greedy algorithms for constructing hierarchically compressed wavelet synopses that minimize the sum squared error while not exceeding a given space budget. Extensive experimental results on both synthetic and real-world data sets validate our novel compression scheme and demonstrate the effectiveness of our algorithms against existing synopsis construction algorithms.
Similar content being viewed by others
References
Baraniuk, R., Jones, D.: A signal-dependent time-frequency representation: fast algorithm for optimal kernel design. ISP 42(1), 134– (1994)
Chakrabarti, K., Garofalakis, M.N., Rastogi, R., Shim, K.: Approximate query processing using wavelets. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 111–122 (2000)
Cormode, G., Garofalakis, M., Sacharidis, D.: Fast approximate wavelet tracking on streams. In: Proceedings of the International Conference on Extending Database Technology (EDBT) (2006)
Deligiannakis, A., Garofalakis, M., Roussopoulos, N.: A fast approximation scheme for probabilistic wavelet synopses. In: Proceedings of the 17th International Conference on Scientific and Statistical Database Management (SSDBM) (2005)
Deligiannakis, A., Garofalakis, M., Roussopoulos, N.: Extended wavelets for multiple measures. ACM Trans. Database Systems 32(2) (2007)
Deligiannakis, A., Roussopoulos, N.: Extended wavelets for multiple measures. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 229–240 (2003)
Deshpande, A., Guestrin, C., Madden, S., Hellerstein, J., Hong, W.: Model-driven data acquisition in sensor networks. In: VLDB (2004)
Garofalakis, M., Gibbons, P.B.: Wavelet synopses with error guarantees. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 476–487 (2002)
Garofalakis, M., Kumar, A.: Deterministic wavelet thresholding for maximum-error metrics. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS), pp. 166–176 (2004)
Garofalakis, M., Kumar, A.: Wavelet synopses for general error metrics. ACM Trans. Database Systems 30(4), 888– (2005)
Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In: Proceedings of the International Conference on Very Large Data Bases (VLDB) (2001)
Guha, S.: Space efficiency in synopsis construction algorithms. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 409–420 (2005)
Guha, S., Harb, B.: Wavelet synopsis for data streams: minimizing non-euclidean error. In: Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (KDD), pp. 88–97 (2005)
Guha, S., Harb, B.: Approximation algorithms for wavelet transform coding of data streams. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA) (2006)
Guha, S., Kim, C., Shim, K.: Xwave: Approximate extended wavelets for streaming data. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 288–299 (2004)
Jahangiri, M., Sacharidis, D., Shahabi, C.: Shift-Split: I/O efficient maintenance of wavelet-transformed multidimensional data. In: Proceedings of ACM International Conference on Management of Data (SIGMOD) (2005)
Jawerth, B., Sweldens, W.: An overview of wavelet based multiresolution analyses. SIAM Rev. 36(3), 377– (1994)
Karras, P., Mamoulis, N.: One-pass wavelet synopses for maximum-error metrics. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 421–432 (2005)
Mallat S. (1999) A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, Ney York
Matias, Y., Urieli, D.: Inner-product based wavelet synopses for range-sum queries. In: Proceedings of the 14th Annual European Symposium on Algorithms (ESA), pp. 504–515 (2006)
Matias, Y., Vitter, J.S., Wang, M.: Wavelet-based histograms for selectivity estimation. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 448–459 (1998)
Matias, Y., Vitter, J.S., Wang, M.: Dynamic maintenance of wavelet-based histograms. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 101–110 (2000)
Muthukrishnan, S.: Subquadratic algorithms for workload-aware haar wavelet synopses. In: Proceedings of the IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS) (2005)
Natse, A., Rastogi, R., Shim, K.: WALRUS: A similarity retrieval algorithm for image databases. In: Proceedings of ACM International Conference on Management of Data (SIGMOD) (1999)
Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: VLDB (1997)
Stollnitz, E.J., Derose, T.D., Salesin, D.H.: Wavelets for Computer Graphics: Theory and Applications. Morgan Kaufmann (1996)
Urieli, D., Matias, Y.: Optimal workload-based weighted wavelet synopses. In: Proceedings of International Conference on Database Theory (ICDT) (2005)
Vitter, J.S., Wang, M.: Approximate computation of multidimensional aggregates of sparse data using wavelets. In: Proceedings of ACM International Conference on Management of Data (SIGMOD, pp. 193–204. ACM Press (1999)
Vitter, J.S., Wang, M., Iyer, B.R.: Data cube approximation and histograms via wavelets. In: Proceedings of the International Conference on Information and Knowledge Management (CIKM), pp. 96–104 (1998)
Author information
Authors and Affiliations
Corresponding author
Additional information
This work has been funded by the project PENED 2003. The project is co-financed 75% of public expenditure through EC—European Social Fund, 25% of public expenditure through Ministry of Development—General Secretariat of Research and Technology and through private sector, under measure 8.3 of OPERATIONAL PROGRAMME “COMPETITIVENESS" in the 3rd Community Support Programme.
Rights and permissions
About this article
Cite this article
Sacharidis, D., Deligiannakis, A. & Sellis, T. Hierarchically compressed wavelet synopses. The VLDB Journal 18, 203–231 (2009). https://doi.org/10.1007/s00778-008-0096-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-008-0096-z