Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1390334.1390390acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Exploiting correlated keywords to improve approximate information filtering

Published: 20 July 2008 Publication History

Abstract

Information filtering, also referred to as publish/subscribe, complements one-time searching since users are able to subscribe to information sources and be notified whenever new documents of interest are published. In approximate information filtering only selected information sources, that are likely to publish documents relevant to the user interests in the future, are monitored. To achieve this functionality, a subscriber exploits statistical metadata to identify promising publishers and index its continuous query only in those publishers. The statistics are maintained in a directory, usually on a per-keyword basis, thus disregarding possible correlations among keywords. Using this coarse information, poor publisher selection may lead to poor filtering performance and thus loss of interesting documents.1
Based on the above observation, this work extends query routing techniques from the domain of distributed information retrieval in peer-to-peer (P2P) networks, and provides new algorithms for exploiting the correlation among keywords in a filtering setting. We develop and evaluate two algorithms based on single-key and multi-key statistics and utilize two different synopses (Hash Sketches and KMV synopses) to compactly represent publishers. Our experimental evaluation using two real-life corpora with web and blog data demonstrates the filtering effectiveness of both approaches and highlights the different tradeoffs.

References

[1]
K. Aberer. P-Grid: A Self-Organizing Access Structure for P2P Information Systems. In CoopIS, 2001.]]
[2]
I. Aekaterinidis and P. Triantafillou. Internet Scale String Attribute Publish/Subscribe Data Networks. In CIKM, 2005.]]
[3]
R. Agrawal, T. Imielinski, and A. N. Swami. Mining Association Rules between Sets of Items in Large Databases. In SIGMOD, 1993.]]
[4]
Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting Distinct Elements in a Data Stream. In RANDOM, 2002.]]
[5]
M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Improving Collection Selection with Overlap Awareness in P2P Search Engines. In SIGIR, 2005.]]
[6]
K. S. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for Distinct-Value Estimation Under Multiset Operations. In SIGMOD, 2007.]]
[7]
B. H. Bloom. Space/Time Trade-offs in Hash Coding with Allowable Errors. CACM, 1970.]]
[8]
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-Wise Independent Permutations. JCSS, 2000.]]
[9]
J. P. Callan, Z. Lu, and W. B. Croft. Searching Distributed Collections with Inference Networks. In SIGIR, 1995.]]
[10]
G. Cormode and M. N. Garofalakis. Sketching Streams Through the Net: Distributed Approximate Query Tracking. In VLDB, 2005.]]
[11]
M. Durand and P. Flajolet. Loglog Counting of Large Cardinalities (Extended Abstract). In ESA, 2003.]]
[12]
M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing Iceberg Queries Efficiently. In VLDB, 1998.]]
[13]
P. Flajolet and G. N. Martin. Probabilistic Counting Algorithms for Data Base Applications. JCSS, 1985.]]
[14]
N. Fuhr. A decision-theoretic approach to database selection in networked ir. ACM TOIS, 1999.]]
[15]
S. Idreos, M. Koubarakis, and C. Tryfonopoulos. P2P-DIET: An Extensible P2P Service that Unifies Ad-hoc and Continuous Querying in Super-Peer Networks. In SIGMOD, 2004.]]
[16]
V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently Estimating the Selectivity of Conjuncts of Predicates. In VLDB, 2005.]]
[17]
W. Meng, C. T. Yu, and K.-L. Liu. Building E±cient and Effective Metasearch Engines. ACM Computing Surveys, 2002.]]
[18]
S. Michel, M. Bender, N. Ntarmos, P. Triantafillou, G. Weikum, and C. Zimmer. Discovering and Exploiting Keyword and Attribute-Value Co-occurrences to Improve P2P Routing Indices. In CIKM, 2006.]]
[19]
T. Neumann, M. Bender, S. Michel, and G. Weikum. A Reproducible Benchmark for P2P Retrieval. In ExpDB, 2006.]]
[20]
P. R. Pietzuch and J. Bacon. Hermes: A Distributed Event-Based Middleware Architecture. In DEBS, 2002.]]
[21]
I. Podnar, M. Rajman, T. Luu, F. Klemm, and K. Aberer. Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys. In ICDE, 2007.]]
[22]
A. I. T. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel. SCRIBE: The Design of a Large-Scale Event Notification Infrastructure. In Networked Group Communication, 2001.]]
[23]
G. Skobeltsyn, T. Luu, I. P. Zarko, M. Rajman, and K. Aberer. Web Text Retrieval with a P2P Query-Driven Index. In SIGIR, 2007.]]
[24]
I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In SIGCOMM, 2001.]]
[25]
C. Tang and Z. Xu. pFilter: Global Information Filtering and Dissemination Using Structured Overlay Networks. In FTDCS, 2003.]]
[26]
C. Tryfonopoulos, S. Idreos, and M. Koubarakis. LibraRing: An Architecture for Distributed Digital Libraries Based on DHTs. In ECDL, 2005.]]
[27]
C. Tryfonopoulos, C. Zimmer, G. Weikum, and M. Koubarakis. Architectural Alternatives for Information Filtering in Structured Overlays. IEEE Internet Computing (IC), 2007.]]
[28]
T. Yan and H. Garcia-Molina. The SIFT Information Dissemination System. ACM TODS, 1999.]]
[29]
C. Zimmer, C. Tryfonopoulos, and G. Weikum. MinervaDL: An Architecture for Information Retrieval and Filtering in Distributed Digital Libraries. In ECDL, 2007.]]

Cited By

View all
  • (2022)A BTM-Based Adaptive Objectionable Short Text Filtering FrameworkWireless Communications and Mobile Computing10.1155/2022/66683442022(1-12)Online publication date: 22-Jan-2022
  • (2019)Query Expansion for Effective Retrieval Results of Hindi–English Cross-Lingual IRApplied Artificial Intelligence10.1080/08839514.2019.157701833:7(567-593)Online publication date: 8-Apr-2019
  • (2017)Query expansion based on term selection for Hindi – English cross lingual IRJournal of King Saud University - Computer and Information Sciences10.1016/j.jksuci.2017.09.002Online publication date: Sep-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN:9781605581644
DOI:10.1145/1390334
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 July 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Peer-to-Peer (P2P)
  2. approximate publish/subscribe
  3. distinct-value (DV) estimation
  4. distributed information filtering (IF)
  5. information systems

Qualifiers

  • Research-article

Conference

SIGIR '08
Sponsor:

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)A BTM-Based Adaptive Objectionable Short Text Filtering FrameworkWireless Communications and Mobile Computing10.1155/2022/66683442022(1-12)Online publication date: 22-Jan-2022
  • (2019)Query Expansion for Effective Retrieval Results of Hindi–English Cross-Lingual IRApplied Artificial Intelligence10.1080/08839514.2019.157701833:7(567-593)Online publication date: 8-Apr-2019
  • (2017)Query expansion based on term selection for Hindi – English cross lingual IRJournal of King Saud University - Computer and Information Sciences10.1016/j.jksuci.2017.09.002Online publication date: Sep-2017
  • (2014)Selectivity estimation on streaming spatio-textual data using local correlationsProceedings of the VLDB Endowment10.14778/2735471.27354728:2(101-112)Online publication date: 1-Oct-2014
  • (2012)A Survey of Automatic Query Expansion in Information RetrievalACM Computing Surveys10.1145/2071389.207139044:1(1-50)Online publication date: 1-Jan-2012
  • (2011)KMV-peerProceedings of the fourth ACM international conference on Web search and data mining10.1145/1935826.1935860(157-166)Online publication date: 9-Feb-2011
  • (2010)Peer-to-peer web searchFrom active data management to event-based systems and more10.5555/1985625.1985641(195-208)Online publication date: 1-Jan-2010
  • (2010)A peer-selection algorithm for information retrievalProceedings of the 19th ACM international conference on Information and knowledge management10.1145/1871437.1871682(1601-1604)Online publication date: 26-Oct-2010
  • (2010)Peer-to-Peer Web Search: Euphoria, Achievements, Disillusionment, and Future OpportunitiesFrom Active Data Management to Event-Based Systems and More10.1007/978-3-642-17226-7_12(195-208)Online publication date: 2010
  • (2009)Distinct-value synopses for multiset operationsCommunications of the ACM10.1145/1562764.156278752:10(87-95)Online publication date: 1-Oct-2009
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media