Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1935826.1935860acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article

KMV-peer: a robust and adaptive peer-selection algorithm

Published: 09 February 2011 Publication History

Abstract

The problem of fully decentralized search over many collections is considered. The objective is to approximate the results of centralized search (namely, using a central index) while controlling the communication cost and involving only a small number of collections. The proposed solution is couched in a peer-to-peer (P2P) network, but can also be applied in other setups. Peers publish per-term summaries of their collections. Specifically, for each term, the range of document scores is divided into intervals; and for each interval, a KMV (K Minimal Values) synopsis of its documents is created. A new peer-selection algorithm uses the KMV synopses and two scoring functions in order to adaptively rank the peers, according to the relevance of their documents to a given query. The proposed method achieves high-quality results while meeting the above criteria of efficiency. In particular, experiments are done on two large, real-world datasets; one is blogs and the other is web data. These experiments show that the algorithm outperforms the state-of-the-art approaches and is robust over different collections, various scoring functions and multi-term queries.

Supplementary Material

JPG File (wsdm2011_mass_kmv_01.jpg)
MP4 File (wsdm2011_mass_kmv_01.mp4)

References

[1]
R. Baeza-Yates, A. Gionis, F. Junqueira, V. Plachouras, and L. Telloli. On the feasibility of multi-site web search engines. In CIKM, 2009.
[2]
Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM, pages 1--10, 2002.
[3]
L. A. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The google cluster architecture. In IEEE Micro, 23(2), 2003.
[4]
L. A. Barroso and U. Hölzle. The datacenter as a computer -- an introduction to the design of warehouse-scale machines. 2009.
[5]
M. Bender, S. Michel, G. Weikum, and C. Zimmer. The minerva project: Database selection in the context of p2p search. In BTW, pages 125--144, 2005.
[6]
K. S. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD, 2007.
[7]
J. Callan. Distributed information retrieval. In Advances in information retrieval, pages 127--150, 2000.
[8]
P. Cao and Z. Wang. Efficient top-k query calculations in distributed networks. In PODC, pages 206--215, 2004.
[9]
F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen. Planetp: using gossiping to build content addressable peer-to-peer information sharing communities. In HPDC, pages 236--246, 2003.
[10]
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS '01, 2001.
[11]
K. Jarvelin and J. Kekalainen. Ir evaluation methods for retrieving highly relevant documents. In SIGIR, 2000.
[12]
Jie-Lu. Full-text federated search in peer-to-peer networks. In ACM SIGIR Forum, volume 41, 2007.
[13]
D. E. Knuth. Sorting and searching. The Art of Computer Programming, 3, 1973.
[14]
J. Lu and J. Callan. User modeling for full-text federated search in peer-to-peer networks. In SIGIR, pages 332--339, 2006.
[15]
Y. Mass, Y. Sagiv, and M. Shmueli--Scheuer. A scalable and effective full-text search in p2p networks. In CIKM, Nov. 2009.
[16]
Y. Mass, Y. Sagiv, and M. Shmueli-Scheuer. A peer-selection algorithm for information retrieval. In CIKM, Oct. 2010.
[17]
S. Michel, M. Bender, N. Ntarmos, P. Triantafillou, G. Weikum, and C. Zimmer. Discovering and exploiting keyword and attribute-value co-occurrences to improve p2p routing indices. In CIKM, pages 172--181, 2006.
[18]
L. T. Nguyen, W. G. Yee, and O. Frieder. Query workload driven summarization for p2p query routing. In P2P, 2008.
[19]
I. Podnar, M. Rajman, T. Luu, F. Klemm, and K. Aberer. Scalable peer-to-peer web retrieval with highly discriminative keys. In ICDE, pages 1096--1105, 2007.
[20]
S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at trec-3. In TREC, 1994.
[21]
A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Lecture Notes in Computer Science, pages 329--350, 2001.
[22]
M. Shokouhi. Central-rank-based collection selection in uncooperative distributed information retrieval. In ECIR, June 2007.
[23]
L. Si and J. Callan. Relevant document distribution estimation method for resource selection. In SIGIR, July 2003.
[24]
G. Skobeltsyn, T. Luu, I. Podnar arko, M. Rajman, and K. Aberer. Query-driven indexing for scalable peer-to-peer text retrieval. Future Gener. Comput. Syst., 25(1):89--99, 2009.
[25]
G. Skobeltsyn, T. Luu, I. P. Zarko, M. Rajman, and K. Aberer. Web text retrieval with a p2p query-driven index. In SIGIR, 2007.
[26]
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. SIGCOMM, 2001.
[27]
W. G. Yee, L. T. Nguyen, D. Jia, and O. Frieder. Efficient query routing by improved peer description in p2p networks. In InfoScale, pages 1--10, 2008.
[28]
C. Zimmer, C. Tryfonopoulos, and G. Weikum. Exploiting correlated keywords to improve approximate information filtering. In SIGIR, July 2008.

Cited By

View all
  • (2013)Supporting Similarity Range Queries Efficiently by Using Reference Points in Structured P2P OverlaysAdvances in Intelligent Systems and Applications - Volume 110.1007/978-3-642-35452-6_65(645-652)Online publication date: 2013

Index Terms

  1. KMV-peer: a robust and adaptive peer-selection algorithm

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
    February 2011
    870 pages
    ISBN:9781450304931
    DOI:10.1145/1935826
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 February 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. kmv
    2. p2p search
    3. top-k

    Qualifiers

    • Research-article

    Conference

    Acceptance Rates

    WSDM '11 Paper Acceptance Rate 83 of 372 submissions, 22%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)2
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2013)Supporting Similarity Range Queries Efficiently by Using Reference Points in Structured P2P OverlaysAdvances in Intelligent Systems and Applications - Volume 110.1007/978-3-642-35452-6_65(645-652)Online publication date: 2013

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media