Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Time- and Space-Efficient Sliding Window Top-k Query Processing

Published: 25 March 2015 Publication History

Abstract

A sliding window top-k (top-k/w) query monitors incoming data stream objects within a sliding window of size w to identify the k highest-ranked objects with respect to a given scoring function over time. Processing of such queries is challenging because, even when an object is not a top-k/w object at the time when it enters the processing system, it might become one in the future. Thus a set of potential top-k/w objects has to be stored in memory while its size should be minimized to efficiently cope with high data streaming rates. Existing approaches typically store top-k/w and candidate sliding window objects in a k-skyband over a two-dimensional score-time space. However, due to continuous changes of the k-skyband, its maintenance is quite costly. Probabilistic k-skyband is a novel data structure storing data stream objects from a sliding window with significant probability to become top-k/w objects in future. Continuous probabilistic k-skyband maintenance offers considerably improved runtime performance compared to k-skyband maintenance, especially for large values of k, at the expense of a small and controllable error rate. We propose two possible probabilistic k-skyband usages: (i) When it is used to process all sliding window objects, the resulting top-k/w algorithm is approximate and adequate for processing random-order data streams. (ii) When probabilistic k-skyband is used to process only a subset of most recent sliding window objects, it can improve the runtime performance of continuous k-skyband maintenance, resulting in a novel exact top-k/w algorithm. Our experimental evaluation systematically compares different top-k/w processing algorithms and shows that while competing algorithms offer either time efficiency at the expanse of space efficiency or vice-versa, our algorithms based on the probabilistic k-skyband are both time and space efficient.

References

[1]
Christian Bohm, Beng Chin Ooi, Claudia Plant, and Ying Yan. 2007. Efficiently processing continuous kNN queries on data streams. In Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE'07). 156--165.
[2]
Amit Chakrabarti, Graham Cormode, and Andrew Mcgregor. 2008a. Robust lower bounds for communication and stream computation. In Proceedings of the 40th ACM Annual Symposium on Theory of Computing (STOC'08). ACM Press, New York, 641--650.
[3]
Amit Chakrabarti, T. S. Jayram, and Mihai Patrascu. 2008b. Tight lower bounds for selection in randomly ordered streams. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'08). 720--729.
[4]
Sharma Chakravarthy and Qingchun Jiang. 2009. Stream Data Processing: A Quality of Service Perspective Modeling, Scheduling, Load Shedding, and Complex Event Processing. 1st Ed. Springer.
[5]
Muhammad Aamir Cheema, Xuemin Lin, Ying Zhang, Wei Wang, and Wenjie Zhang. 2009. Lazy updates: An efficient technique to continuously monitoring reverse kNN. Proc. VLDB Endow. 2, 1, 1138--1149.
[6]
Gautam Das, Dimitrios Gunopulos, Nick Koudas, and Nikos Sarkas. 2007. Ad-hoc top-k query answering for data streams. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB'07). 183--194.
[7]
Lukasz Golab and M. Tamer Ozsu. 2003. Issues in data stream management. SIGMOD Rec. 32, 2, 5--14.
[8]
Sudipto Guha and Andrew Mcgregor. 2006. Approximate quantiles and the order of the stream. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'06). ACM Press, New York, 273--279.
[9]
Parisa Haghani, Sebastian Michel, and Karl Aberer. 2009. Evaluating top-k queries over incomplete data streams. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). ACM Press, New York, 877--886.
[10]
Parisa Haghani, Sebastian Michel, and Karl Aberer. 2010. The gist of everything new: Personalized top-k processing over Web 2.0 streams. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM'10). ACM Press, New York, 489--498.
[11]
Ihab F. Ilyas, George Beskales, and Mohamed A. Soliman. 2008. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40, 4.
[12]
Cheqing Jin, Ke Yi, Lei Chen, Jeffrey Xu Yu, and Xuemin Lin. 2010. Sliding-window top-k queries on uncertain streams. VLDB J. 19, 3, 411--435.
[13]
Nick Koudas, Beng Chin Ooi, Kian-Lee Tan, and Rui Zhang. 2004. Approximate NN queries on streams with guaranteed error/performance bounds. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04). 804--815.
[14]
Andrew McGregor. 2008. Recent results on processing random-order streams and space-efficient sampling. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing (ALLERTON'08). 206--208.
[15]
Kyriakos Mouratidis, Spiridon Bakiras, and Dimitris Papadias. 2006. Continuous monitoring of top-k queries over sliding windows. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). ACM Press, New York, 635--646.
[16]
Kyriakos Mouratidis and Hweehwa Pang. 2009. An incremental threshold method for continuous text search queries. In Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE'09). 1187--1190.
[17]
Kyriakos Mouratidis and Hweehwa Pang. 2011. Efficient evaluation of continuous text search queries. IEEE Trans. Knowl. Data Engin. 23, 10, 1469--1482.
[18]
Kyriakos Mouratidis and Dimitris Papadias. 2007. Continuous nearest neighbor queries over sliding windows. IEEE Trans. Knowl. Data Engin. 19, 6, 789--803.
[19]
J. Ian Munro and Mike S. Paterson. 1978. Selection and sorting with limited storage. In Proceedings of the 19th Annual Symposium on Foundations of Computer Science (SFCS'78). 253--258.
[20]
Sarana Nutanong, Rui Zhang, Egemen Tanin, and Lars Kulik. 2010. Analysis and evaluation of V*-kNN: An efficient algorithm for moving kNN queries. VLDB J. 19, 3, 307--332.
[21]
Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger. 2005. Progressive skyline computation in database systems. ACM Trans. Database Syst. 30, 1, 41--82.
[22]
Kresimir Pripuzic, Ivana Podnar Zarko, and Karl Aberer. 2011. Distributed processing of continuous sliding-window k-NN queries for data stream filtering. World Wide Web 14, 5--6, 465--494.
[23]
Kresimir Pripuzic, Ivana Podnar Zarko, and Karl Aberer. 2008. Top-k/w publish/subscribe: Finding k most relevant publications in sliding time window w. In Proceedings of the 2nd International Conference on Distributed Event-based Systems (DEBS'08). ACM Press, New York, 127--138.
[24]
Weixiong Rao, Lei Chen, Shudong Chen, and Sasu Tarkoma. 2014. Evaluating continuous top-k queries over document streams. World Wide Web 17, 1, 59--83.
[25]
Panayiotis Tsaparas, Themistoklis Palpanas, Yannis Kotidis, Nick Koudas, and Divesh Srivastava. 2003. Ranked join indices. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). 277--288.
[26]
Di Yang, Avani Shastri, Elke A. Rundensteiner, and Matthew O. Ward. 2011. An optimal strategy for monitoring top-k queries in streaming windows. In Proceedings of the 14th International Conference on Extending Database Technology (EDBT'11). ACM Press, New York, 57--68.
[27]
Ying Zhang. 2008. Computing order statistics over data streams. Ph.D. dissertation, University of New South Wales, Sydney, Australia.

Cited By

View all
  • (2022)Approximate Continuous Top-K Queries over Memory Limitation-Based Streaming DataDatabase Systems for Advanced Applications10.1007/978-3-031-00123-9_1(3-20)Online publication date: 8-Apr-2022
  • (2020)Continuous top-k approximated join of streaming and evolving distributed dataSemantic Web10.3233/SW-19036711:5(767-799)Online publication date: 1-Jan-2020
  • (2020)Sliding-Window Probabilistic Threshold Aggregate Queries on Uncertain Data StreamsInformation Sciences10.1016/j.ins.2020.02.029Online publication date: Feb-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 40, Issue 1
March 2015
260 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2751312
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2015
Accepted: 01 July 2014
Revised: 01 June 2014
Received: 01 July 2013
Published in TODS Volume 40, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data stream processing
  2. continuous top-k queries
  3. sliding windows

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Approximate Continuous Top-K Queries over Memory Limitation-Based Streaming DataDatabase Systems for Advanced Applications10.1007/978-3-031-00123-9_1(3-20)Online publication date: 8-Apr-2022
  • (2020)Continuous top-k approximated join of streaming and evolving distributed dataSemantic Web10.3233/SW-19036711:5(767-799)Online publication date: 1-Jan-2020
  • (2020)Sliding-Window Probabilistic Threshold Aggregate Queries on Uncertain Data StreamsInformation Sciences10.1016/j.ins.2020.02.029Online publication date: Feb-2020
  • (2020)ConclusionRelevant Query Answering over Streaming and Distributed Data10.1007/978-3-030-38339-8_7(115-119)Online publication date: 22-Jan-2020
  • (2020)Handling Top-k QueriesRelevant Query Answering over Streaming and Distributed Data10.1007/978-3-030-38339-8_6(75-114)Online publication date: 22-Jan-2020
  • (2020)BackgroundRelevant Query Answering over Streaming and Distributed Data10.1007/978-3-030-38339-8_2(9-25)Online publication date: 22-Jan-2020
  • (2019)Succinct Summing over Sliding WindowsAlgorithmica10.1007/s00453-018-0524-481:5(2072-2091)Online publication date: 17-May-2019
  • (2018)Energy-aware and quality-driven sensor management for green mobile crowd sensingJournal of Network and Computer Applications10.1016/j.jnca.2015.06.02359:C(95-108)Online publication date: 28-Dec-2018
  • (2018) Anomaly Detection and Big Data in IPTV Networks IPTV Delivery Networks10.1002/9781119397939.ch9(225-244)Online publication date: 10-Apr-2018
  • (2017)Approximate Continuous Top-k Query over Sliding WindowJournal of Computer Science and Technology10.1007/s11390-017-1708-032:1(93-109)Online publication date: 11-Jan-2017
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media