
Distributed processing of continuous sliding-window k-NN queries for data stream filtering

World Wide Web

Abstract

A sliding-window k-NN query (k-NN/w query) continuously monitors the data stream objects arriving within a sliding window and identifies the k objects closest to the query. It enables effective filtering of data objects that stream in at high rates from potentially distributed sources, and it offers a means to control the rate at which objects are inserted into result streams. k-NN/w processing systems may therefore be regarded as one promising approach to the information overload problem in applications that require real-time processing of structured data, such as the Sensor Web. Existing k-NN/w processing systems are mainly centralized and cannot cope with multiple data streams whose sources are scattered across the Internet. In this paper, we propose a solution for distributed continuous k-NN/w processing of structured data from distributed streams. We define a k-NN/w processing model for this setting and design a distributed k-NN/w processing system on top of the Content-Addressable Network (CAN) overlay. An extensive evaluation using both real and synthetic data sets demonstrates that the proposed solution is feasible: it balances the load among the peers while keeping the messaging overhead within the P2P network reasonable. Moreover, our results show that the solution scales with an increasing number of queries and peers.
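To make the k-NN/w semantics described above concrete, here is a minimal, centralized sketch in Python. It assumes a time-based window of length `window` (an object is valid while its timestamp lies in `(now - window, now]`), Euclidean distance, and non-decreasing arrival timestamps. The class `SlidingWindowKNN` and its methods are hypothetical names for illustration; this is a naive baseline, not the paper's algorithm and not its distributed CAN-based design.

```python
import heapq
import math
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, ...]
Entry = Tuple[float, Point]  # (timestamp, point)


class SlidingWindowKNN:
    """Naive, centralized k-NN/w monitor for a single query (illustrative only).

    Keeps every object whose timestamp lies in the window (now - window, now]
    and recomputes the k nearest neighbours of the query from scratch.
    """

    def __init__(self, query: Point, k: int, window: float) -> None:
        self.query = query
        self.k = k
        self.window = window  # assumed > 0
        # Entries in arrival order; timestamps are assumed to be non-decreasing.
        self.objects: Deque[Entry] = deque()

    def _expire(self, now: float) -> None:
        # Evict entries that have slid out of the window.
        while self.objects and self.objects[0][0] <= now - self.window:
            self.objects.popleft()

    def _nearest(self) -> List[Entry]:
        # The k entries with the smallest Euclidean distance to the query;
        # ties are resolved in favour of earlier arrivals.
        return heapq.nsmallest(
            self.k, self.objects, key=lambda e: math.dist(self.query, e[1])
        )

    def insert(self, timestamp: float, point: Point) -> bool:
        """Add an object and return True if it enters the current k-NN result."""
        entry = (timestamp, point)
        self.objects.append(entry)
        self._expire(timestamp)
        return any(e is entry for e in self._nearest())

    def result(self, now: float) -> List[Point]:
        """Return the current k-NN result, ordered by distance to the query."""
        self._expire(now)
        return [point for _, point in self._nearest()]
```

For example, with `query=(0.0, 0.0)`, `k=2` and `window=10.0`, inserting `(3.0, 4.0)` at time 1 and `(1.0, 1.0)` at time 2 both return `True`, while `(6.0, 8.0)` at time 3 returns `False` because it is farther from the query than both current neighbours. At time 11.5 the first object has expired, so `result(11.5)` returns `[(1.0, 1.0), (6.0, 8.0)]`. The boolean returned by `insert` mirrors the idea that only objects entering the k-NN result are inserted into the result stream. Recomputing the result on every arrival keeps the sketch easy to verify but costs O(n log k) per update for n objects in the window; the system described in this paper instead distributes k-NN/w processing over the peers of a CAN overlay.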



Author information

Correspondence to Krešimir Pripužić.


About this article

Cite this article

Pripužić, K., Podnar Žarko, I. & Aberer, K. Distributed processing of continuous sliding-window k-NN queries for data stream filtering. World Wide Web 14, 465–494 (2011). https://doi.org/10.1007/s11280-011-0125-5



  • DOI: https://doi.org/10.1007/s11280-011-0125-5
