research-article

KMV-peer: a robust and adaptive peer-selection algorithm

Authors:

Yehoshua Sagiv,

Michal Shmueli-ScheuerAuthors Info & Claims

WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining

Pages 157 - 166

https://doi.org/10.1145/1935826.1935860

Published: 09 February 2011 Publication History

Abstract

The problem of fully decentralized search over many collections is considered. The objective is to approximate the results of centralized search (namely, using a central index) while controlling the communication cost and involving only a small number of collections. The proposed solution is couched in a peer-to-peer (P2P) network, but can also be applied in other setups. Peers publish per-term summaries of their collections. Specifically, for each term, the range of document scores is divided into intervals; and for each interval, a KMV (K Minimal Values) synopsis of its documents is created. A new peer-selection algorithm uses the KMV synopses and two scoring functions in order to adaptively rank the peers, according to the relevance of their documents to a given query. The proposed method achieves high-quality results while meeting the above criteria of efficiency. In particular, experiments are done on two large, real-world datasets; one is blogs and the other is web data. These experiments show that the algorithm outperforms the state-of-the-art approaches and is robust over different collections, various scoring functions and multi-term queries.

Supplementary Material

JPG File (wsdm2011_mass_kmv_01.jpg)

Download
10.04 KB

MP4 File (wsdm2011_mass_kmv_01.mp4)

Download
197.12 MB

References

[1]

R. Baeza-Yates, A. Gionis, F. Junqueira, V. Plachouras, and L. Telloli. On the feasibility of multi-site web search engines. In CIKM, 2009.

Digital Library

[2]

Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM, pages 1--10, 2002.

Digital Library

[3]

L. A. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The google cluster architecture. In IEEE Micro, 23(2), 2003.

Digital Library

[4]

L. A. Barroso and U. Hölzle. The datacenter as a computer -- an introduction to the design of warehouse-scale machines. 2009.

Digital Library

[5]

M. Bender, S. Michel, G. Weikum, and C. Zimmer. The minerva project: Database selection in the context of p2p search. In BTW, pages 125--144, 2005.

[6]

K. S. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD, 2007.

Digital Library

[7]

J. Callan. Distributed information retrieval. In Advances in information retrieval, pages 127--150, 2000.

[8]

P. Cao and Z. Wang. Efficient top-k query calculations in distributed networks. In PODC, pages 206--215, 2004.

Digital Library

[9]

F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen. Planetp: using gossiping to build content addressable peer-to-peer information sharing communities. In HPDC, pages 236--246, 2003.

Digital Library

[10]

R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS '01, 2001.

Digital Library

[11]

K. Jarvelin and J. Kekalainen. Ir evaluation methods for retrieving highly relevant documents. In SIGIR, 2000.

Digital Library

[12]

Jie-Lu. Full-text federated search in peer-to-peer networks. In ACM SIGIR Forum, volume 41, 2007.

Digital Library

[13]

D. E. Knuth. Sorting and searching. The Art of Computer Programming, 3, 1973.

[14]

J. Lu and J. Callan. User modeling for full-text federated search in peer-to-peer networks. In SIGIR, pages 332--339, 2006.

Digital Library

[15]

Y. Mass, Y. Sagiv, and M. Shmueli--Scheuer. A scalable and effective full-text search in p2p networks. In CIKM, Nov. 2009.

Digital Library

[16]

Y. Mass, Y. Sagiv, and M. Shmueli-Scheuer. A peer-selection algorithm for information retrieval. In CIKM, Oct. 2010.

Digital Library

[17]

S. Michel, M. Bender, N. Ntarmos, P. Triantafillou, G. Weikum, and C. Zimmer. Discovering and exploiting keyword and attribute-value co-occurrences to improve p2p routing indices. In CIKM, pages 172--181, 2006.

Digital Library

[18]

L. T. Nguyen, W. G. Yee, and O. Frieder. Query workload driven summarization for p2p query routing. In P2P, 2008.

Digital Library

[19]

I. Podnar, M. Rajman, T. Luu, F. Klemm, and K. Aberer. Scalable peer-to-peer web retrieval with highly discriminative keys. In ICDE, pages 1096--1105, 2007.

[20]

S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at trec-3. In TREC, 1994.

[21]

A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Lecture Notes in Computer Science, pages 329--350, 2001.

Digital Library

[22]

M. Shokouhi. Central-rank-based collection selection in uncooperative distributed information retrieval. In ECIR, June 2007.

Digital Library

[23]

L. Si and J. Callan. Relevant document distribution estimation method for resource selection. In SIGIR, July 2003.

Digital Library

[24]

G. Skobeltsyn, T. Luu, I. Podnar arko, M. Rajman, and K. Aberer. Query-driven indexing for scalable peer-to-peer text retrieval. Future Gener. Comput. Syst., 25(1):89--99, 2009.

Digital Library

[25]

G. Skobeltsyn, T. Luu, I. P. Zarko, M. Rajman, and K. Aberer. Web text retrieval with a p2p query-driven index. In SIGIR, 2007.

Digital Library

[26]

I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. SIGCOMM, 2001.

Digital Library

[27]

W. G. Yee, L. T. Nguyen, D. Jia, and O. Frieder. Efficient query routing by improved peer description in p2p networks. In InfoScale, pages 1--10, 2008.

Digital Library

[28]

C. Zimmer, C. Tryfonopoulos, and G. Weikum. Exploiting correlated keywords to improve approximate information filtering. In SIGIR, July 2008.

Digital Library

Cited By

Lee GChen YLee C(2013)Supporting Similarity Range Queries Efficiently by Using Reference Points in Structured P2P OverlaysAdvances in Intelligent Systems and Applications - Volume 110.1007/978-3-642-35452-6_65(645-652)Online publication date: 2013
https://doi.org/10.1007/978-3-642-35452-6_65

Index Terms

KMV-peer: a robust and adaptive peer-selection algorithm
1. Information systems
  1. Information retrieval

Recommendations

A peer-selection algorithm for information retrieval
CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

A novel method for creating collection summaries is developed, and a fully decentralized peer-selection algorithm is described. This algorithm finds the most promising peers for answering a given query. Specifically, peers publish per-term synopses of ...
Spam characterization and detection in peer-to-peer file-sharing systems
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

Spam is highly pervasive in P2P file-sharing systems and is difficult to detect automatically before actually downloading a file due to the insufficient and biased description of a file returned to a client as a query result. To alleviate this problem, ...
A Novel Top-k Query Scheme in Unstructured P2P Networks
CIT '09: Proceedings of the 2009 Ninth IEEE International Conference on Computer and Information Technology - Volume 02

There're two major problems in the unstructured p2p systems, one is their heavy network traffic; the other is the problem of query effectiveness, which is caused mainly by high numbers of query answers, many of which are irrelevant for users. A ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining

February 2011

870 pages

ISBN:9781450304931

DOI:10.1145/1935826

General Chair:
Irwin King
CUHK, Hong Kong
,
Program Chairs:
Wolfgang Nejdl
L3S and University of Hannover, Germany
,
Hang Li
Microsoft Research Asia, China

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 February 2011

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

WSDM'11

Sponsor:

WSDM'11: Fourth ACM International Conference on Web Search and Data Mining

February 9 - 12, 2011

Hong Kong, China

Acceptance Rates

WSDM '11 Paper Acceptance Rate 83 of 372 submissions, 22%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
307
Total Downloads

Downloads (Last 12 months)2
Downloads (Last 6 weeks)1

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Lee GChen YLee C(2013)Supporting Similarity Range Queries Efficiently by Using Reference Points in Structured P2P OverlaysAdvances in Intelligent Systems and Applications - Volume 110.1007/978-3-642-35452-6_65(645-652)Online publication date: 2013
https://doi.org/10.1007/978-3-642-35452-6_65

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents