Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2882903.2915225acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Scalable Approximate Query Tracking over Highly Distributed Data Streams

Published: 14 June 2016 Publication History

Abstract

The recently-proposed Geometric Monitoring (GM) method has provided a general tool for the distributed monitoring of arbitrary non-linear queries over streaming data observed by a collection of remote sites, with numerous practical applications. Unfortunately, GM-based techniques can suffer from serious scalability issues with increasing numbers of remote sites. In this paper, we propose novel techniques that effectively tackle the aforementioned scalability problems by exploiting a carefully designed sample of the remote sites for efficient approximate query tracking. Our novel sampling-based scheme utilizes a sample of cardinality proportional to √N (compared to N for the original GM), where $N$ is the number of sites in the network, to perform the monitoring process. Our experimental evaluation over a variety of real-life data streams demonstrates that our sampling-based techniques can significantly reduce the communication cost during distributed monitoring with controllable, predefined accuracy guarantees.

References

[1]
S. Burdakis and A. Deligiannakis. Detecting outliers in sensor networks using the geometric approach. In Proc. of ICDE Conference, pages 1108--1119, 2012.
[2]
E. J. Candès and Y. Plan. A probabilistic and ripless theory of compressed sensing. CoRR, abs/1011.3854, 2010.
[3]
A. L. Cauchy. Cours d'analyse de l'École royale polytechnique. Paris, France, 1821.
[4]
G. Cormode. The continuous distributed monitoring model. SIGMOD Rec., 42(1):5--14, may 2013.
[5]
G. Cormode and N. Duffield. Sampling for big data: A tutorial. In Proc. of ACM SIGKDD Conference, pages 1975--1975, 2014.
[6]
G. Cormode and M. Garofalakis. Sketching streams through the net: Distributed approximate query tracking. In Proc. of VLDB Conference, pages 312--323, 2005.
[7]
G. Cormode and M. Garofalakis. Streaming in a connected world: querying and tracking distributed data streams. In Proc. of SIGMOD Conference, pages 1178--1181, 2007.
[8]
G. Cormode and M. Garofalakis. Approximate continuous querying over distributed streams. ACM Transactions on Database Systems, 33(2):9:1--9:39, 2008.
[9]
G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi. Holistic aggregates in a networked world: distributed tracking of approximate quantiles. In SIGMOD, pages 25--36, 2005.
[10]
G. Cormode, S. Muthukrishnan, and K. Yi. Algorithms for distributed functional monitoring. ACM Trans. Algorithms, 7(2):21:1--21:20, mar 2011.
[11]
C. Cranor, T. Johnson, O. Spataschek, and V. Shkapenyuk. Gigascope: A stream database for network applications. In ACM SIGMOD, pages 647--651, 2003.
[12]
M. Dilman and D. Raz. Efficient reactive monitoring. In INFOCOM, volume 2, pages 1012--1019, 2001.
[13]
M. Gabel, A. Schuster, and D. Keren. Communication-efficient distributed variance monitoring and outlier detection for multivariate time series. In IEEE IPDPS, pages 37--47, 2014.
[14]
M. Garofalakis, J. Gehrke, and R. Rastogi. Querying and mining data streams: You only get one look a tutorial. In Proc. of SIGMOD Conference, pages 635--635, 2002.
[15]
M. Garofalakis, D. Keren, and V. Samoladas. Sketch-based geometric monitoring of distributed stream queries. In VLDB, 6(10):937--948, 2013.
[16]
N. Giatrakos, A. Deligiannakis, M. Garofalakis, I. Sharfman, and A. Schuster. Prediction-based geometric monitoring over distributed data streams. In SIGMOD, pages 265--276, 2012.
[17]
N. Giatrakos, A. Deligiannakis, M. Garofalakis, I. Sharfman, and A. Schuster. Distributed geometric query monitoring using prediction models. ACM Trans. Database Syst., 39(2):16:1--16:42, may 2014.
[18]
K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Inf. Retr., 4(2):133--151, 2001.
[19]
Z. Huang, K. Yi, Y. Liu, and G. Chen. Optimal sampling algorithms for frequency estimation in distributed data. In INFOCOM, pages 1997--2005, 2011.
[20]
Z. Huang, K. Yi, and Q. Zhang. Randomized algorithms for tracking distributed count, frequencies, and ranks. In Proc. of PODS, pages 295--306, 2012.
[21]
S. R. Jeffery, M. Garofalakis, and M. J. Franklin. Adaptive cleaning for rfid data streams. In Proceedings of the 32Nd International Conference on Very Large Data Bases, VLDB '06, pages 163--174. VLDB Endowment, 2006.
[22]
M. Kamp, M. Boley, D. Keren, A. Schuster, and I. Sharfman. Communication efficient distributed online prediction by dynamic model synchronization. In ECML PKDD, pages 623--639, 2014.
[23]
R. Keralapura, G. Cormode, and J. Ramamirtham. Communication-efficient distributed monitoring of thresholded counts. In SIGMOD, pages 289--300, 2006.
[24]
D. Keren, G. Sagy, A. Abboud, D. Ben-David, A. Schuster, I. Sharfman, and A. Deligiannakis. Geometric monitoring of heterogeneous streams. Knowledge and Data Engineering, IEEE Transactions on, 26(8):1890--1903, 2014.
[25]
A. Lazerson, I. Sharfman, D. Keren, A. Schuster, M. Garofalakis, and V. Samoladas. Monitoring distributed streams using convex decompositions. Proc. VLDB Endow., 8(5):545--556, jan 2015.
[26]
D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5(Apr):361--397, 2004.
[27]
Z. Liu, B. Radunović, and M. Vojnović. Continuous distributed counting for non-monotonic streams. In Proc. of PODS, pages 307--318, 2012.
[28]
S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. Tinydb: an acquisitional query processing system for sensor networks. ACM Trans. Database Syst., 30:122--173, March 2005.
[29]
A. Manjhi, V. Shkapenyuk, K. Dhamdhere, and C. Olston. Finding (recently) frequent items in distributed data streams. In ICDE, pages 767--778, 2005.
[30]
O. Papapetrou and M. N. Garofalakis. Continuous fragmented skylines over distributed streams. In IEEE ICDE, pages 124--135, 2014.
[31]
Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover's distance as a metric for image retrieval. Int. J. Comput. Vision, 40(2):99--121, 2000.
[32]
G. Sagy, D. Keren, I. Sharfman, and A. Schuster. Distributed threshold querying of general functions by a difference of monotonic representation. In VLDB, 4:46--57, 2010.
[33]
G. Sagy, I. Sharfman, D. Keren, and A. Schuster. Top-k vectorial aggregation queries in a distributed environment. J. Parallel Distrib. Comput., 71(2):302--315, 2011.
[34]
C.-E. Sarndal, B. Swensson, and J. Wretman. "Model Assisted Survey Sampling". Springer-Verlag New York, Inc. (Springer Series in Statistics), 1992.
[35]
I. Sharfman, A. Schuster, and D. Keren. A geometric approach to monitoring threshold functions over distributed data streams. In SIGMOD, pages 301--312, 2006.
[36]
I. Sharfman, A. Schuster, and D. Keren. Shape sensitive geometric monitoring. In PODS, pages 301--310, 2008.
[37]
J. M. Steele. The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities, Chap.1,. Cambridge University Press, New York, NY, USA, 2004.
[38]
M. Tang, F. Li, and Y. Tao. Distributed online tracking. In Proc. of SIGMOD Conference, pages 2047--2061, New York, NY, USA, 2015.
[39]
Q. G. Zhao, M. Ogihara, H. Wang, and J. J. Xu. Finding global icebergs over distributed data sets. In PODS, 2006.

Cited By

View all
  • (2019)LDA classifier monitoring in distributed streaming systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2018.09.017123(156-167)Online publication date: Jan-2019
  • (2019)A survey on quality-assurance approximate stream processing and applicationsFuture Generation Computer Systems10.1016/j.future.2019.07.047101:C(1062-1080)Online publication date: 1-Dec-2019
  • (2019)Enhancing distributed functional monitoring with quantum protocolsQuantum Information Processing10.1007/s11128-019-2484-218:12(1-25)Online publication date: 1-Dec-2019

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data streams
  2. distributed monitoring
  3. sampling

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS'16
Sponsor:
SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
California, San Francisco, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)3
Reflects downloads up to 26 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)LDA classifier monitoring in distributed streaming systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2018.09.017123(156-167)Online publication date: Jan-2019
  • (2019)A survey on quality-assurance approximate stream processing and applicationsFuture Generation Computer Systems10.1016/j.future.2019.07.047101:C(1062-1080)Online publication date: 1-Dec-2019
  • (2019)Enhancing distributed functional monitoring with quantum protocolsQuantum Information Processing10.1007/s11128-019-2484-218:12(1-25)Online publication date: 1-Dec-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media