Abstract
Preference query processing is important for a wide range of applications involving distributed databases, such as network monitoring, web-based systems, and market analysis. In such applications, data objects are generated frequently and massively, which presents an important and challenging problem of continuous query processing over distributed data stream environments. A top-k dominating query, which has been receiving much research attention recently, returns the k data objects that dominate the highest number of data objects in a given dataset, and due to its dominance-based ranking function, we can easily obtain superior data objects. An emerging requirement in distributed stream environments is an efficient technique for continuously monitoring top-k dominating data objects. Despite of this fact, no study has addressed this problem. In this paper, therefore, we address the problem of continuous top-k dominating query processing over distributed data stream environments. We present two algorithms that monitor the exact top-k dominating data and efficiently eliminate unqualified data objects for the result, which reduces both communication and computation costs. In addition to these algorithms, we present an approximate algorithm that further reduces both communication and computation costs. Extensive experiments on both synthetic and real data have demonstrated the efficiency and scalability of our algorithms.
Similar content being viewed by others
References
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of PODS, pp. 1–16. Springer, New York (2002)
Babcock, B., Olston, C.: Distributed top-k monitoring. In: Proceedings of SIGMOD, pp. 28–39. Springer, San Diego (2003)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of ICDE, pp. 421–430. Springer, New York (2001)
Chan, C.Y., Jagadish, H., Tan, K.L., Tung, A.K., Zhang, Z.: Finding k-dominant skylines in high dimensional space. In: Proceedings of SIGMOD, pp. 503–514. Springer, New York (2006)
Cheema, M.A., Lin, X., Zhang, W., Zhang, Y.: A safe zone based approach for monitoring moving skyline queries. In: Proceedings of EDBT, pp. 275–286. IEEE Press, Geneva (2013)
Das, G., Gunopulos, D., Koudas, N., Sarkas, N.: Ad-hoc top-k query answering for data streams. In: Proceedings of VLDB, pp. 183–194. ACM, New York (2007)
Garofalakis, M., Keren, D., Samoladas, V.: Sketch-based geometric monitoring of distributed stream queries. PVLDB 6(10), 937–948 (2013)
Giatrakos, N., Deligiannakis, A., Garofalakis, M., Sharfman, I., Schuster, A.: Prediction-based geometric monitoring over distributed data streams. In: Proceedings of SIGMOD, pp. 265–276. ACM Press, Scottsdale (2012)
Han, X., Li, J., Gao, H.: Tdep: Efficiently Processing Top-k Dominating Query on Massive Data, pp. 1–30. Knowledge and Information Systems, Springer, Berlin (2014)
He, Z., Lo, E.: Answering why-not questions on top-k queries. In: Proceedings of ICDE, pp. 750–761. Springer, New York (2012)
Hose, K., Vlachou, A.: A survey of skyline processing in highly distributed environments. VLDB J. 21(3), 359–384 (2012)
Huang, Q., Lee, P.P.: Ld-sketch: A distributed sketching design for accurate and scalable anomaly detection in network data streams. In: Proceedings of INFOCOM, pp. 1420–1428. IEEE Press, Geneva (2014)
Huang, Z., Lu, H., Ooi, B.C., Tung, A.: Continuous skyline queries for moving objects. IEEE TKDE 18(12), 1645–1658 (2006)
Kontaki, M., Papadopoulos, A.N., Manolopoulos, Y.: Continuous top-k dominating queries in subspaces. In: Panhellenic Conference on Informatics, pp. 31–35. IEEE Press, New York (2008)
Kontaki, M., Papadopoulos, A.N., Manolopoulos, Y.: Continuous top-k dominating queries. IEEE TKDE 24(5), 840–853 (2012)
Lee, Y.W., Lee, K.Y., Kim, M.H.: Efficient processing of multiple continuous skyline queries over a data stream. Inf. Sci. 221, 316–337 (2013)
Lian, X., Chen, L.: Top-k dominating queries in uncertain databases. In: Proceedings of EDBT, pp. 660–671. IEEE Press, New York (2009)
Lin, X., Yuan, Y., Wang, W., Lu, H.: Stabbing the sky: efficient skyline computation over sliding windows. In: Proceedings of ICDE, pp. 502–513. IEEE Press, San Diego (2005)
Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: the k most representative skyline operator. In: Proceedings of SIGMOD, pp. 86–95. Springer, Heidelberg (2007)
Lu, H., Zhou, Y., Haustad, J.: Continuous skyline monitoring over distributed data streams. In: SSDBM, pp. 565–583. Springer, Heidelberg (2010)
Lu, H., Zhou, Y., Haustad, J.: Efficient and scalable continuous skyline monitoring in two-tier streaming settings. Inf. Syst. 38(1), 68–81 (2013)
Morse, M., Patel, J.M., Grosky, W.I.: Efficient continuous skyline computation. Inf. Sci. 177(17), 3411–3437 (2007)
Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: SIGMOD, pp. 635–646. ACM New York (2006)
Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005)
Papapetrou, O., Garofalakis, M.: Continuous fragmented skylines over distributed streams. In: Proceedings of ICDE, pp. 124–135. ACM Press, New York (2014)
Papapetrou, O., Garofalakis, M., Deligiannakis, A.: Sketch-based querying of distributed sliding-window data streams. PVLDB 5(10), 992–1003 (2012)
Santoso, B., Chiu, G.: Close dominance graph: an efficient framework for answering continuous top-k dominating queries. IEEE TKDE 26(8), 1853–1865 (2014)
Shen, Z., Cheema, M.A., Lin, X., Zhang, W., Wang, H.: Efficiently monitoring top-k pairs over sliding windows. In: Proceedings of ICDE, pp. 798–809. Springer, New York (2012)
Skoutas, D., Sacharidis, D., Simitsis, A., Kantere, V., Sellis, T.: Top-k dominant web services under multi-criteria matching. In: Proceedings of EDBT, pp. 898–909. Springer, New York (2009)
Sun, S., Huang, Z., Zhong, H., Dai, D., Liu, H., Li, J.: Efficient monitoring of skyline queries over distributed data streams. Knowl. Inf. Syst. 25(3), 575–606 (2010)
Tao, Y., Papadias, D.: Maintaining sliding window skylines on data streams. IEEE TKDE 18(3), 377–391 (2006)
Tiakas, E., Valkans, G., Papadopoulos, A.N., Manolopoulos, Y., Gunopoulos, D.: Metric-based top-k dominating queries. In: Proceedings of EDBT, pp. 415–426. IEEE Press, San Diego (2014)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: On efficient top-k query processing in highly distributed environments. In: Proceedings of SIGMOD, pp. 753–764. Springer, New York (2008)
Xie, X., Lu, H., Chen, J., Shang, S.: Top-k neighborhood dominating query. In: Proceedings of DASFAA, pp. 131–145. Springer, Berlin (2013)
Yang, D., Shastri, A., Rundensteiner, E.A., Ward, M.O.: An optimal strategy for monitoring top-k queries in streaming windows. In: Proceedings of EDBT, pp. 57–68. ACM, New York (2011)
Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of VLDB, pp. 483–494. Springer, New York (2007)
Yiu, M.L., Mamoulis, N.: Multi-dimensional top-k dominating queries. VLDB J. 18(3), 695–718 (2009)
Yu, A., Agarwal, P., Yang, J.: Processing a large number of continuous preference top-k queries. In: Proceedings of SIGMOD, pp. 397–408. ACM Press, New York (2012)
Zhang, W., Lin, X., Zhang, Y., Pei, J., Wang, W.: Threshold-based probabilistic top-k dominating queries. VLDB J. 19(2), 283–305 (2010)
Zhang, W., Lin, X., Zhang, Y., Pei, J., Wang, W.: Progressive processing of subspace dominating queries. VLDB J. 20(6), 921–948 (2011)
Zhang, Z., Cheng, R., Papadias, D., Tung, A.K.: Minimizing the communication cost for continuous skyline maintenance. In: Proceedings of SIGMOD, pp. 495–508. ACM Press, New York (2009)
Acknowledgments
This research is partially supported by the Grant-in-Aid for Scientific Research (A)(26240013) of MEXT, Japan, and JST, Strategic International Collaborative Research Program, SICORP.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Amagata, D., Hara, T. & Nishio, S. Sliding window top-k dominating query processing over distributed data streams. Distrib Parallel Databases 34, 535–566 (2016). https://doi.org/10.1007/s10619-015-7187-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10619-015-7187-9