research-article

Distributed online aggregations

Authors:

Kian-Lee TanAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 2, Issue 1

Pages 443 - 454

https://doi.org/10.14778/1687627.1687678

Published: 01 August 2009 Publication History

Abstract

In many decision making applications, users typically issue aggregate queries. To evaluate these computationally expensive queries, online aggregation has been developed to provide approximate answers (with their respective confidence intervals) quickly, and to continuously refine the answers. In this paper, we extend the online aggregation technique to a distributed context where sites are maintained in a DHT (Distributed Hash Table) network. Our Distributed Online Aggregation (DoA) scheme iteratively and progressively produces approximate aggregate answers as follows: in each iteration, a small set of random samples are retrieved from the data sites and distributed to the processing sites; at each processing site, a local aggregate is computed based on the allocated samples; at a coordinator site, these local aggregates are combined into a global aggregate. DoA adaptively grows the number of processing nodes as the sample size increases. To further reduce the sampling overhead, the samples are retained as a precomputed synopsis over the network to be used for processing future queries. We also study how these synopsis can be maintained incrementally. We have conducted extensive experiments on PlanetLab. The results show that our DoA scheme reduces the initial waiting time significantly and provides high quality approximate answers with running confidence intervals progressively.

References

[1]

http://www.comp.nus.edu.sg/s3p2p/.

[2]

http://www.planet-lab.org.

[3]

http://www.tpc.org/tpch.

[4]

S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. SIGMOD Rec., 28(2), 1999.

Digital Library

[5]

M. O. Akinde, M. H. Böhlen, T. Johnson, L. V. S. Lakshmanan, and D. Srivastava. Efficient olap query processing in distributed data warehouses. Information Systems, 28(1--2):111--135, 2003.

Digital Library

[6]

J. Albrecht and W. Lehner. On-line analytical processing in distributed data warehouses. In IDEAS, 1998.

Digital Library

[7]

B. Arai, G. Das, D. Gunopulos, and V. Kalogeraki. Approximating aggregation queries in peer-to-peer networks. In ICDE, 2006.

Digital Library

[8]

B. Arai, S. Lin, and D. Gunopulos. Efficient data sampling in heterogeneous peer-to-peer networks. In ICDM, 2007.

Digital Library

[9]

A. Awan, R. A. Ferreira, S. Jagannathan, and A. Grama. Distributed uniform sampling in unstructured peer-to-peer networks. In HICSS, 2006.

Digital Library

[10]

B. A. Bash, J. W. Byers, and J. Considine. Approximately uniform random sampling in sensor networks. In DMNS, 2004.

Digital Library

[11]

S. Datta and H. Kargupta. Uniform data sampling from a peer-to-peer network. In ICDCS, 2007.

Digital Library

[12]

J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 2008.

Digital Library

[13]

O. F. Random sampling from databases. In PhD Thesis, University of California, 1993.

[14]

C. Gkantsidis, C. Gkantsidis, M. Mihail, and A. Saberi. Random walks in peer-to-peer networks: algorithms and evaluation. Perform. Eval., 63(3), 2006.

Digital Library

[15]

P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In SIGMOD, 1999.

Digital Library

[16]

J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In SIGMOD, 1997.

Digital Library

[17]

M. R. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. On near-uniform url sampling. In WWW, 2000.

Digital Library

[18]

R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker, and I. Stoica. Querying the internet with pier. In VLDB.

Digital Library

[19]

M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. SIGOPS, 41(3), 2007.

Digital Library

[20]

M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer sampling service: experimental evaluation of unstructured gossip-based implementations. In Middleware, 2004.

Digital Library

[21]

M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3), 2005.

Digital Library

[22]

C. Jermaine, A. Dobra, S. Arumugam, S. Joshi, and A. Pol. A disk-based join with probabilistic guarantees. In SIGMOD, 2005.

Digital Library

[23]

W. Litwin, M. anne Neimat, and D. A. Schneider. Lh*---a scalable, distributed data structure. ACM Transactions on Database Systems, 21, 1996.

Digital Library

[24]

F. Olken. Random sampling from databases. 1993.

[25]

F. Olken and D. Rotem. Maintenance of materialized views of sampling queries. In ICDE, 1992.

Digital Library

[26]

R. Nelson and A. Tantawi. Approximate analysis of fork/join synchronization in parallel queues. 37, 1988.

Digital Library

[27]

I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transaction Network, 11(1), 2003.

Digital Library

[28]

G. Swart. Spreading the load using consistent hashing: A preliminary report. In ISPDC, 2004.

Digital Library

[29]

K.-L. Tan, C. H. Goh, and B. C. Ooi. Online feedback for nested aggregate queries with multi-threading. In VLDB, 1999.

Digital Library

[30]

S. Wu, J. Li, B. C. Ooi, and K.-L. Tan. Just-in-time query retrieval over partially indexed data on structured p2p overlays. In SIGMOD Conference, 2008.

Digital Library

Cited By

Sheoran NChockchowwat SChheda AWang SVerma RPark Y(2023)A Step Toward Deep Online AggregationProceedings of the ACM on Management of Data10.1145/35892691:2(1-28)Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1145/3589269
Pansare NBorkar VJermaine CCondie T(2020)Online aggregation for large MapReduce jobsProceedings of the VLDB Endowment10.14778/3402707.34027484:11(1135-1145)Online publication date: 3-Jun-2020
https://dl.acm.org/doi/10.14778/3402707.3402748
Li FWu BYi KZhao Z(2019)Wander Join and XDBACM Transactions on Database Systems10.1145/328455144:1(1-41)Online publication date: 29-Jan-2019
https://dl.acm.org/doi/10.1145/3284551
Show More Cited By

Index Terms

Distributed online aggregations

Recommendations

Approximate Aggregations in Structured P2P Networks

In corporate networks, daily business data are generated in gigabytes or even terabytes. It is costly to process aggregate queries in those systems. In this paper, we propose PACA, a probably approximately correct aggregate query processing scheme, for ...
Dynamic optimization of generalized SQL queries with horizontal aggregations
SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

SQL presents limitations to return aggregations as tables with a horizontal layout. A user generally needs to write separate queries and data definition statements to combine transposition with aggregation. With that motivation in mind, we introduce ...
Vertical and horizontal percentage aggregations
SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Existing SQL aggregate functions present important limitations to compute percentages. This article proposes two SQL aggregate functions to compute percentages addressing such limitations. The first function returns one row for each percentage in ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 2, Issue 1

August 2009

1293 pages

ISSN:2150-8097

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 August 2009

Published in PVLDB Volume 2, Issue 1

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

24
Total Citations
View Citations
273
Total Downloads

Downloads (Last 12 months)10
Downloads (Last 6 weeks)2

Reflects downloads up to 28 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Sheoran NChockchowwat SChheda AWang SVerma RPark Y(2023)A Step Toward Deep Online AggregationProceedings of the ACM on Management of Data10.1145/35892691:2(1-28)Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1145/3589269
Pansare NBorkar VJermaine CCondie T(2020)Online aggregation for large MapReduce jobsProceedings of the VLDB Endowment10.14778/3402707.34027484:11(1135-1145)Online publication date: 3-Jun-2020
https://dl.acm.org/doi/10.14778/3402707.3402748
Li FWu BYi KZhao Z(2019)Wander Join and XDBACM Transactions on Database Systems10.1145/328455144:1(1-41)Online publication date: 29-Jan-2019
https://dl.acm.org/doi/10.1145/3284551
Zhao ZChristensen RLi FHu XYi KDas GJermaine CBernstein P(2018)Random Sampling over Joins RevisitedProceedings of the 2018 International Conference on Management of Data10.1145/3183713.3183739(1525-1539)Online publication date: 27-May-2018
https://dl.acm.org/doi/10.1145/3183713.3183739
Kamat NNandi A(2018)A Session-Based Approach to Fast-But-Approximate Interactive Data Cube ExplorationACM Transactions on Knowledge Discovery from Data10.1145/307064812:1(1-26)Online publication date: 13-Feb-2018
https://dl.acm.org/doi/10.1145/3070648
Han XWang BLi JGao H(2018)Efficiently processing deterministic approximate aggregation query on massive dataKnowledge and Information Systems10.1007/s10115-017-1136-z57:2(437-473)Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.1007/s10115-017-1136-z
Cheng YZhao WRusu FChoudhary AWu KDong B(2017)Bi-Level Online Aggregation on Raw DataProceedings of the 29th International Conference on Scientific and Statistical Database Management10.1145/3085504.3085514(1-12)Online publication date: 27-Jun-2017
https://dl.acm.org/doi/10.1145/3085504.3085514
Li FWu BYi KZhao ZÖzcan FKoutrika GMadden S(2016)Wander JoinProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2915235(615-629)Online publication date: 26-Jun-2016
https://dl.acm.org/doi/10.1145/2882903.2915235
Yu CBoyd J(2016)FB+-tree for Big Data ManagementBig Data Research10.1016/j.bdr.2015.11.0034:C(25-36)Online publication date: 1-Jun-2016
https://dl.acm.org/doi/10.1016/j.bdr.2015.11.003
Qin CRusu F(2015)Speculative Approximations for Terascale Distributed Gradient Descent OptimizationProceedings of the Fourth Workshop on Data analytics in the Cloud10.1145/2799562.2799563(1-10)Online publication date: 31-May-2015
https://dl.acm.org/doi/10.1145/2799562.2799563
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents