Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2537734.2537744acmotherconferencesArticle/Chapter ViewAbstractPublication PagesadcsConference Proceedingsconference-collections
research-article

Exploring the magic of WAND

Published: 05 December 2013 Publication History

Abstract

Web search services process thousands of queries per second, and filter their answers from collections containing very large amounts of data. Fast response to queries is a critical service expectation. The well-known WAND processing strategy is one way of reducing the amount of computation necessary when executing such a query. The value of WAND has now been validated in a wide range of studies, and has become one of the key baselines against which all new top-k processing algorithms are benchmarked. However, most previous implementations of WAND-based retrieval approaches have been in the context of the BM25 Okapi similarity scoring regime. Here we measure the performance of WAND in the context of the alternative Language Model similarity score computation, and find that the dramatic efficiency gains reported in previous studies are no longer achievable. That is, when the primary goal of a retrieval system is to maximize effectiveness, WAND is relatively unhelpful in terms of attaining the secondary objective of maximizing query throughput rates. However, the BM-WAND algorithm does in fact help reducing the percentage of postings to be scored, but with additional computational overhead. We explore a variety of tradeoffs between scoring metric and processing regime and present new insight into how score-safe algorithms interact with rank scoring.

References

[1]
V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In Proc. SIGIR, pages 372--379, 2006.
[2]
V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35--42, 2001.
[3]
T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. Improvements that don't add up: Ad-hoc retrieval results since 1998. In Proc. CIKM, pages 601--610, 2009.
[4]
A. Z. Broder, D. Carmel, H. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426--434, 2003.
[5]
A. Broschart and R. Schenkel. High-performance processing of text queries with tunable pruned term and term pair indexes. ACM Trans. Information Systems, 30(1): 1--32, 2012.
[6]
E. W. Brown. Fast evaluation of structured queries for information retrieval. In Proc. SIGIR, pages 30--38, 1995.
[7]
C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In Proc. SIGIR, pages 97--110, 1985.
[8]
S. Büttcher, C. L. A. Clarke, and G. V. Cormack. Information Retrieval: Implementing and evaluating search engines. MIT Press, Cambridge, Massachusetts, 2010.
[9]
C. Dimopoulos, S. Nepomnyachiy, and T. Suel. Optimizing top-k document retrieval strategies for block-max indexes. In Proc. WSDM, pages 113--122, 2013.
[10]
S. Ding and T. Suel. Faster top-k retrieval using block-max indexes. In Proc. SIGIR, pages 993--1002, 2011.
[11]
M. Fontoura, V. Josifovski, J. Liu, S. Venkatesan, X. Zhu, and J. Zien. Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. VLDB Endowment, 4(12): 1213--1224, 2011.
[12]
C. Kohlschütter, P. Fankhauser, and W. Nejdl. Boilerplate detection using shallow text features. In Proc. WSDM, pages 441--450, 2010.
[13]
R. Konow, G. Navarro, C. L. A. Clarke, and A. López-Ortíz. Faster and smaller inverted indices with treaps. In Proc. SIGIR, pages 193--202, 2013.
[14]
D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proc. SIGIR, pages 472--479, 2005.
[15]
A. Moffat and J. Zobel. Self indexing inverted files for fast text retrieval. ACM Trans. Information Systems, 14(4): 349--379, 1996.
[16]
M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency sorted indexes. JASIST, 47(10): 749--764, 1996.
[17]
S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proc. TREC-3, 1994.
[18]
T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In Proc. SIGIR, pages 219--225, 2005.
[19]
H. Turtle and J. Flood. Query evaluation: strategies and optimizations. Information Processing and Management, 31(6): 831--850, 1995.
[20]
E. M. Voorhees and D. K. Harman. Overview of the Eighth Text REtrieval Conference (TREC-8). In Proc. TREC-8, pages 1--24, 1999.
[21]
J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2): 6-1--6-56, 2006.

Cited By

View all
  • (2024)Faster Learned Sparse Retrieval with Block-Max PruningProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657906(2411-2415)Online publication date: 10-Jul-2024
  • (2024)Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825349(709-716)Online publication date: 15-Dec-2024
  • (2023)Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse RepresentationsACM Transactions on Information Systems10.1145/357692241:4(1-28)Online publication date: 22-Mar-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ADCS '13: Proceedings of the 18th Australasian Document Computing Symposium
December 2013
126 pages
ISBN:9781450325240
DOI:10.1145/2537734
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. information retrieval
  2. performance
  3. text indexing

Qualifiers

  • Research-article

Conference

ADCS '13
ADCS '13: The Australasian Document Computing Symposium
December 5 - 6, 2013
Queensland, Brisbane, Australia

Acceptance Rates

ADCS '13 Paper Acceptance Rate 12 of 23 submissions, 52%;
Overall Acceptance Rate 30 of 57 submissions, 53%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)20
  • Downloads (Last 6 weeks)3
Reflects downloads up to 28 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Faster Learned Sparse Retrieval with Block-Max PruningProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657906(2411-2415)Online publication date: 10-Jul-2024
  • (2024)Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825349(709-716)Online publication date: 15-Dec-2024
  • (2023)Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse RepresentationsACM Transactions on Information Systems10.1145/357692241:4(1-28)Online publication date: 22-Mar-2023
  • (2023)Profiling and Visualizing Dynamic Pruning AlgorithmsProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591806(3125-3129)Online publication date: 19-Jul-2023
  • (2022)Faster Learned Sparse Retrieval with Guided TraversalProceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3477495.3531774(1901-1905)Online publication date: 6-Jul-2022
  • (2021)Optimizing Scoring and Sorting Operations for Faster WAND ProcessingAdvanced Data Mining and Applications10.1007/978-3-030-65390-3_38(499-514)Online publication date: 6-Jan-2021
  • (2020)Finding the Best of Both WorldsProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3397271.3401076(1031-1040)Online publication date: 25-Jul-2020
  • (2020)Examining the Additivity of Top-k Query Processing InnovationsProceedings of the 29th ACM International Conference on Information & Knowledge Management10.1145/3340531.3412000(1085-1094)Online publication date: 19-Oct-2020
  • (2019)Accelerated Query Processing Via Similarity Score PredictionProceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3331184.3331207(485-494)Online publication date: 18-Jul-2019
  • (2018)Query Driven Algorithm Selection in Early Stage RetrievalProceedings of the Eleventh ACM International Conference on Web Search and Data Mining10.1145/3159652.3159676(396-404)Online publication date: 2-Feb-2018
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media