DOI: 10.1145/860435.860491
Article

SETS: search enhanced by topic segmentation

Published: 28 July 2003

Abstract

We present SETS, an architecture for efficient search in peer-to-peer networks, building upon ideas drawn from machine learning and social network theory. The key idea is to arrange participating sites in a topic-segmented overlay topology in which most connections are short-distance, connecting pairs of sites with similar content. Topically focused sets of sites are then joined together into a single network by long-distance links. Queries are matched and routed to only the topically closest regions. We discuss a variety of design issues and tradeoffs that an implementor of SETS would face. We show that SETS is efficient in network traffic and query processing load.
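The routing idea in the abstract, sending a query only to the topically closest regions, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how topical closeness is measured, so the cosine similarity over term-frequency vectors, the `closest_segments` helper, and the three example segments are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_segments(query, centroids, k=1):
    """Return the names of the k segments whose centroids best match the query.

    Hypothetical helper: a real system would forward the query over the
    long-distance links to these segments and flood it only inside them.
    """
    query_vec = Counter(query.lower().split())
    ranked = sorted(
        centroids,
        key=lambda name: cosine(query_vec, centroids[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical topic segments, each summarized by a term-frequency centroid.
centroids = {
    "databases": Counter({"query": 3, "index": 2, "transaction": 4}),
    "networking": Counter({"routing": 4, "overlay": 3, "peer": 3}),
    "retrieval": Counter({"query": 2, "ranking": 4, "document": 3}),
}

print(closest_segments("peer overlay routing", centroids, k=1))
```

Restricting the query to the top `k` segments is what would keep network traffic and per-site processing load low in such a design; a larger `k` trades more traffic for a better chance of reaching every relevant site.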


Published In

SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
July 2003
490 pages
ISBN:1581136463
DOI:10.1145/860435
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed information retrieval
  2. peer-to-peer (P2P)
  3. small world networks
  4. topic segments
  5. topic-driven query routing

Qualifiers

  • Article

Conference

SIGIR '03

Acceptance Rates

SIGIR '03 Paper Acceptance Rate 46 of 266 submissions, 17%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%
