Abstract
We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability to be similar to the query. More specifically, NearBucket-LSH employs an LSH extension that searches in near buckets, and improves search quality but also significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, and harnessing their properties to our needs. We show that our NearBucket-LSH increases search quality for a given network cost compared to previous art. In many cases, the search quality increases by more than \(50\,\%\).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Note that in a general c-dimensional CAN of N nodes, the expected routing length is \(c/4\left( N^{1/c}\right) \) [25], which equals k/2 for \(c=k\) and \(N=2^k\).
- 2.
We transform cosine similarity into angular similarity and then apply the success probability formulas.
References
Adamic, L.A., Adar, E.: Friends and neighbors on the web. Soc. Netw. 25, 211–230 (2001)
Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
Anderson, A., Huttenlocher, D., Kleinberg, J., Leskovec, J.: Effects of user similarity in social media. WSDM 2012, pp. 703–712 (2012)
Bahmani, B., Goel, A., Shinde, R.: Efficient distributed locality sensitive hashing. In: CIKM 2012, pp. 2174–2178 (2012)
Batko, M., Novak, D., Falchi, F., Zezula, P.: Scalability comparison of peer-to-peer similarity search structures. Future Gener. Comp. Syst 24(8), 834–848 (2008)
Buchegger, S., Schiöberg, D., Vu, L.H., Datta, A.: PeerSoN: P2P social networking - early experiences and insights. In: SNS 2009, pp. 46–52, 31 March 2009
Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC 2002, pp. 380–388 (2002)
Chierichetti, F., Kumar, R.: LSH-preserving functions and their applications. In: SODA 2012, pp. 1078–1094 (2012)
Cutillo, L.A., Molva, R., Önen, M., Safebook: a distributed privacy preserving online social network. In: WOWMOM, pp. 1–3 (2011)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SCG 2004, pp. 253–262 (2004)
Falchi, F., Gennaro, C., Zezula, P.: A content–addressable network for similarity search in metric spaces. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005-2006. LNCS, vol. 4125, pp. 98–110. Springer, Heidelberg (2007). doi:10.1007/978-3-540-71661-7_9
Friendster. http://www.friendster.com/
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB 1999, pp. 518–529 (1999)
Haghani, P., Michel, S., Aberer, K.: Distributed similarity search in high dimensions using locality sensitive hashing. In EDBT 2009, pp. 744–755 (2009)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC 1998, pp. 604–613 (1998)
Livejournal. http://www.livejournal.com/
Lua, E.K., Crowcroft, J., Pias, M., Sharma, R., Lim, S.: A survey and comparison of peer-to-peer overlay network schemes. IEEE Commun. Surv. Tutorials 7, 72–93 (2005)
Lucene. http://lucene.apache.org/core/
Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB 2007, pp. 950–961 (2007)
Mani, M., Nguyen, A.-M., Crespi, N.: Scope: a prototype for spontaneous P2P social networking. In: PerCom Workshops, pp. 220–225 (2010)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27, 415–444 (2001)
Narendula, R., Papaioannou, T.G., Aberer, K.: Towards the realization of decentralized online social networks: an empirical study. In: ICDCS Workshops, pp. 155–162 (2012)
Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable network. In: SIGCOMM 2001, pp. 161–172, New York, NY, USA (2001)
Sundaram, N., Turmukhametova, A., Satish, N., Mostak, T., Indyk, P., Madden, S., Dubey, P.: Streaming similarity search over one billion tweets using parallel locality-sensitive hashing. Proc. VLDB Endow. 6(14), 1930–1941 (2013)
TarsosLSH. https://github.com/jorensix/tarsoslsh
Xiang, R., Neville, J., Rogati, M.: Modeling relationship strength in online social networks. In: WWW 2010, pp. 981–990 (2010)
Yang, J., Leskovec, J.: Defining, evaluating network communities based on ground-truth. In: MDS 2012, pp. 3: 1–3: 8 (2012)
Acknowledgments
Naama Kraus is grateful to the Hasso-Plattner-Institut (HPI) for the scholarship for doctoral studies.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kraus, N., Carmel, D., Keidar, I., Orenbach, M. (2016). NearBucket-LSH: Efficient Similarity Search in P2P Networks. In: Amsaleg, L., Houle, M., Schubert, E. (eds) Similarity Search and Applications. SISAP 2016. Lecture Notes in Computer Science(), vol 9939. Springer, Cham. https://doi.org/10.1007/978-3-319-46759-7_18
Download citation
DOI: https://doi.org/10.1007/978-3-319-46759-7_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46758-0
Online ISBN: 978-3-319-46759-7
eBook Packages: Computer ScienceComputer Science (R0)