Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Distributed online aggregations

Published: 01 August 2009 Publication History

Abstract

In many decision making applications, users typically issue aggregate queries. To evaluate these computationally expensive queries, online aggregation has been developed to provide approximate answers (with their respective confidence intervals) quickly, and to continuously refine the answers. In this paper, we extend the online aggregation technique to a distributed context where sites are maintained in a DHT (Distributed Hash Table) network. Our Distributed Online Aggregation (DoA) scheme iteratively and progressively produces approximate aggregate answers as follows: in each iteration, a small set of random samples are retrieved from the data sites and distributed to the processing sites; at each processing site, a local aggregate is computed based on the allocated samples; at a coordinator site, these local aggregates are combined into a global aggregate. DoA adaptively grows the number of processing nodes as the sample size increases. To further reduce the sampling overhead, the samples are retained as a precomputed synopsis over the network to be used for processing future queries. We also study how these synopsis can be maintained incrementally. We have conducted extensive experiments on PlanetLab. The results show that our DoA scheme reduces the initial waiting time significantly and provides high quality approximate answers with running confidence intervals progressively.

References

[1]
http://www.comp.nus.edu.sg/s3p2p/.
[2]
http://www.planet-lab.org.
[3]
http://www.tpc.org/tpch.
[4]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. SIGMOD Rec., 28(2), 1999.
[5]
M. O. Akinde, M. H. Böhlen, T. Johnson, L. V. S. Lakshmanan, and D. Srivastava. Efficient olap query processing in distributed data warehouses. Information Systems, 28(1--2):111--135, 2003.
[6]
J. Albrecht and W. Lehner. On-line analytical processing in distributed data warehouses. In IDEAS, 1998.
[7]
B. Arai, G. Das, D. Gunopulos, and V. Kalogeraki. Approximating aggregation queries in peer-to-peer networks. In ICDE, 2006.
[8]
B. Arai, S. Lin, and D. Gunopulos. Efficient data sampling in heterogeneous peer-to-peer networks. In ICDM, 2007.
[9]
A. Awan, R. A. Ferreira, S. Jagannathan, and A. Grama. Distributed uniform sampling in unstructured peer-to-peer networks. In HICSS, 2006.
[10]
B. A. Bash, J. W. Byers, and J. Considine. Approximately uniform random sampling in sensor networks. In DMNS, 2004.
[11]
S. Datta and H. Kargupta. Uniform data sampling from a peer-to-peer network. In ICDCS, 2007.
[12]
J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 2008.
[13]
O. F. Random sampling from databases. In PhD Thesis, University of California, 1993.
[14]
C. Gkantsidis, C. Gkantsidis, M. Mihail, and A. Saberi. Random walks in peer-to-peer networks: algorithms and evaluation. Perform. Eval., 63(3), 2006.
[15]
P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In SIGMOD, 1999.
[16]
J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In SIGMOD, 1997.
[17]
M. R. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. On near-uniform url sampling. In WWW, 2000.
[18]
R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker, and I. Stoica. Querying the internet with pier. In VLDB.
[19]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. SIGOPS, 41(3), 2007.
[20]
M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer sampling service: experimental evaluation of unstructured gossip-based implementations. In Middleware, 2004.
[21]
M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3), 2005.
[22]
C. Jermaine, A. Dobra, S. Arumugam, S. Joshi, and A. Pol. A disk-based join with probabilistic guarantees. In SIGMOD, 2005.
[23]
W. Litwin, M. anne Neimat, and D. A. Schneider. Lh*---a scalable, distributed data structure. ACM Transactions on Database Systems, 21, 1996.
[24]
F. Olken. Random sampling from databases. 1993.
[25]
F. Olken and D. Rotem. Maintenance of materialized views of sampling queries. In ICDE, 1992.
[26]
R. Nelson and A. Tantawi. Approximate analysis of fork/join synchronization in parallel queues. 37, 1988.
[27]
I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transaction Network, 11(1), 2003.
[28]
G. Swart. Spreading the load using consistent hashing: A preliminary report. In ISPDC, 2004.
[29]
K.-L. Tan, C. H. Goh, and B. C. Ooi. Online feedback for nested aggregate queries with multi-threading. In VLDB, 1999.
[30]
S. Wu, J. Li, B. C. Ooi, and K.-L. Tan. Just-in-time query retrieval over partially indexed data on structured p2p overlays. In SIGMOD Conference, 2008.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 2, Issue 1
August 2009
1293 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2009
Published in PVLDB Volume 2, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)2
Reflects downloads up to 28 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)A Step Toward Deep Online AggregationProceedings of the ACM on Management of Data10.1145/35892691:2(1-28)Online publication date: 20-Jun-2023
  • (2020)Online aggregation for large MapReduce jobsProceedings of the VLDB Endowment10.14778/3402707.34027484:11(1135-1145)Online publication date: 3-Jun-2020
  • (2019)Wander Join and XDBACM Transactions on Database Systems10.1145/328455144:1(1-41)Online publication date: 29-Jan-2019
  • (2018)Random Sampling over Joins RevisitedProceedings of the 2018 International Conference on Management of Data10.1145/3183713.3183739(1525-1539)Online publication date: 27-May-2018
  • (2018)A Session-Based Approach to Fast-But-Approximate Interactive Data Cube ExplorationACM Transactions on Knowledge Discovery from Data10.1145/307064812:1(1-26)Online publication date: 13-Feb-2018
  • (2018)Efficiently processing deterministic approximate aggregation query on massive dataKnowledge and Information Systems10.1007/s10115-017-1136-z57:2(437-473)Online publication date: 1-Nov-2018
  • (2017)Bi-Level Online Aggregation on Raw DataProceedings of the 29th International Conference on Scientific and Statistical Database Management10.1145/3085504.3085514(1-12)Online publication date: 27-Jun-2017
  • (2016)Wander JoinProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2915235(615-629)Online publication date: 26-Jun-2016
  • (2016)FB+-tree for Big Data ManagementBig Data Research10.1016/j.bdr.2015.11.0034:C(25-36)Online publication date: 1-Jun-2016
  • (2015)Speculative Approximations for Terascale Distributed Gradient Descent OptimizationProceedings of the Fourth Workshop on Data analytics in the Cloud10.1145/2799562.2799563(1-10)Online publication date: 31-May-2015
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media