research-article

Exploring the magic of WAND

Authors:

Matthias Petri,

J. Shane Culpepper,

Alistair MoffatAuthors Info & Claims

ADCS '13: Proceedings of the 18th Australasian Document Computing Symposium

Pages 58 - 65

https://doi.org/10.1145/2537734.2537744

Published: 05 December 2013 Publication History

Abstract

Web search services process thousands of queries per second, and filter their answers from collections containing very large amounts of data. Fast response to queries is a critical service expectation. The well-known WAND processing strategy is one way of reducing the amount of computation necessary when executing such a query. The value of WAND has now been validated in a wide range of studies, and has become one of the key baselines against which all new top-k processing algorithms are benchmarked. However, most previous implementations of WAND-based retrieval approaches have been in the context of the BM25 Okapi similarity scoring regime. Here we measure the performance of WAND in the context of the alternative Language Model similarity score computation, and find that the dramatic efficiency gains reported in previous studies are no longer achievable. That is, when the primary goal of a retrieval system is to maximize effectiveness, WAND is relatively unhelpful in terms of attaining the secondary objective of maximizing query throughput rates. However, the BM-WAND algorithm does in fact help reducing the percentage of postings to be scored, but with additional computational overhead. We explore a variety of tradeoffs between scoring metric and processing regime and present new insight into how score-safe algorithms interact with rank scoring.

References

[1]

V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In Proc. SIGIR, pages 372--379, 2006.

Digital Library

[2]

V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35--42, 2001.

Digital Library

[3]

T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. Improvements that don't add up: Ad-hoc retrieval results since 1998. In Proc. CIKM, pages 601--610, 2009.

Digital Library

[4]

A. Z. Broder, D. Carmel, H. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426--434, 2003.

Digital Library

[5]

A. Broschart and R. Schenkel. High-performance processing of text queries with tunable pruned term and term pair indexes. ACM Trans. Information Systems, 30(1): 1--32, 2012.

Digital Library

[6]

E. W. Brown. Fast evaluation of structured queries for information retrieval. In Proc. SIGIR, pages 30--38, 1995.

Digital Library

[7]

C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In Proc. SIGIR, pages 97--110, 1985.

Digital Library

[8]

S. Büttcher, C. L. A. Clarke, and G. V. Cormack. Information Retrieval: Implementing and evaluating search engines. MIT Press, Cambridge, Massachusetts, 2010.

Digital Library

[9]

C. Dimopoulos, S. Nepomnyachiy, and T. Suel. Optimizing top-k document retrieval strategies for block-max indexes. In Proc. WSDM, pages 113--122, 2013.

Digital Library

[10]

S. Ding and T. Suel. Faster top-k retrieval using block-max indexes. In Proc. SIGIR, pages 993--1002, 2011.

Digital Library

[11]

M. Fontoura, V. Josifovski, J. Liu, S. Venkatesan, X. Zhu, and J. Zien. Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. VLDB Endowment, 4(12): 1213--1224, 2011.

Digital Library

[12]

C. Kohlschütter, P. Fankhauser, and W. Nejdl. Boilerplate detection using shallow text features. In Proc. WSDM, pages 441--450, 2010.

Digital Library

[13]

R. Konow, G. Navarro, C. L. A. Clarke, and A. López-Ortíz. Faster and smaller inverted indices with treaps. In Proc. SIGIR, pages 193--202, 2013.

Digital Library

[14]

D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proc. SIGIR, pages 472--479, 2005.

Digital Library

[15]

A. Moffat and J. Zobel. Self indexing inverted files for fast text retrieval. ACM Trans. Information Systems, 14(4): 349--379, 1996.

Digital Library

[16]

M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency sorted indexes. JASIST, 47(10): 749--764, 1996.

Digital Library

[17]

S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proc. TREC-3, 1994.

[18]

T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In Proc. SIGIR, pages 219--225, 2005.

Digital Library

[19]

H. Turtle and J. Flood. Query evaluation: strategies and optimizations. Information Processing and Management, 31(6): 831--850, 1995.

Digital Library

[20]

E. M. Voorhees and D. K. Harman. Overview of the Eighth Text REtrieval Conference (TREC-8). In Proc. TREC-8, pages 1--24, 1999.

[21]

J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2): 6-1--6-56, 2006.

Digital Library

Cited By

Mallia ASuel TTonellotto NHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)Faster Learned Sparse Retrieval with Block-Max PruningProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657906(2411-2415)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657906
Gou JLiu YShao MSuel T(2024)Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825349(709-716)Online publication date: 15-Dec-2024
https://doi.org/10.1109/BigData62323.2024.10825349
Mackenzie JTrotman ALin J(2023)Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse RepresentationsACM Transactions on Information Systems10.1145/357692241:4(1-28)Online publication date: 22-Mar-2023
https://dl.acm.org/doi/10.1145/3576922
Show More Cited By

Index Terms

Exploring the magic of WAND
1. Information systems
  1. Information retrieval

Recommendations

Spaces, Trees, and Colors: The algorithmic landscape of document retrieval on sequences

Document retrieval is one of the best-established information retrieval activities since the ’60s, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is ...
Thesaurus Performance with Information Retrieval: Schema Matching as a Case Study
SMC '13: Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics

Thesaurus is used with many Information Retrieval (IR) models such as data integration, data warehousing, semantic query processing and classifiers. Considering the existence of various thesauri for a particular domain of knowledge, output quality of an ...
Automatic performance evaluation of web search engines

Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of human relevance judgments involved. However, both for business enterprises and people it is important to know the most effective Web search engines, ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ADCS '13: Proceedings of the 18th Australasian Document Computing Symposium

December 2013

126 pages

ISBN:9781450325240

DOI:10.1145/2537734

Conference Chairs:
Shane Culpepper
RMIT University
,
Guido Zuccon
CSIRO
,
General Chair:
Laurianne Sitbon
Queensland University of Technology

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ADCS '13

ADCS '13: The Australasian Document Computing Symposium

December 5 - 6, 2013

Queensland, Brisbane, Australia

Acceptance Rates

ADCS '13 Paper Acceptance Rate 12 of 23 submissions, 52%;

Overall Acceptance Rate 30 of 57 submissions, 53%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

21
Total Citations
View Citations
304
Total Downloads

Downloads (Last 12 months)20
Downloads (Last 6 weeks)3

Reflects downloads up to 28 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Mallia ASuel TTonellotto NHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)Faster Learned Sparse Retrieval with Block-Max PruningProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657906(2411-2415)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657906
Gou JLiu YShao MSuel T(2024)Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825349(709-716)Online publication date: 15-Dec-2024
https://doi.org/10.1109/BigData62323.2024.10825349
Mackenzie JTrotman ALin J(2023)Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse RepresentationsACM Transactions on Information Systems10.1145/357692241:4(1-28)Online publication date: 22-Mar-2023
https://dl.acm.org/doi/10.1145/3576922
Li ZMackenzie JChen HDuh WHuang HKato MMothe JPoblete B(2023)Profiling and Visualizing Dynamic Pruning AlgorithmsProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591806(3125-3129)Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591806
Mallia AMackenzie JSuel TTonellotto NAmigo ECastells PGonzalo JCarterette BCulpepper JKazai G(2022)Faster Learned Sparse Retrieval with Guided TraversalProceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3477495.3531774(1901-1905)Online publication date: 6-Jul-2022
https://dl.acm.org/doi/10.1145/3477495.3531774
Jiang KZhu LSun Q(2021)Optimizing Scoring and Sorting Operations for Faster WAND ProcessingAdvanced Data Mining and Applications10.1007/978-3-030-65390-3_38(499-514)Online publication date: 6-Jan-2021
https://doi.org/10.1007/978-3-030-65390-3_38
Khattab OHammoud MElsayed THuang JChang YCheng XKamps JMurdock VWen JLiu Y(2020)Finding the Best of Both WorldsProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3397271.3401076(1031-1040)Online publication date: 25-Jul-2020
https://dl.acm.org/doi/10.1145/3397271.3401076
Mackenzie JMoffat Ad'Aquin MDietze SHauff CCurry ECudre Mauroux P(2020)Examining the Additivity of Top-k Query Processing InnovationsProceedings of the 29th ACM International Conference on Information & Knowledge Management10.1145/3340531.3412000(1085-1094)Online publication date: 19-Oct-2020
https://dl.acm.org/doi/10.1145/3340531.3412000
Petri MMoffat AMackenzie JCulpepper JBeck DPiwowarski BChevalier MGaussier EMaarek YNie JScholer F(2019)Accelerated Query Processing Via Similarity Score PredictionProceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3331184.3331207(485-494)Online publication date: 18-Jul-2019
https://dl.acm.org/doi/10.1145/3331184.3331207
Mackenzie JCulpepper JBlanco RCrane MClarke CLin JChang YZhai CLiu YMaarek Y(2018)Query Driven Algorithm Selection in Early Stage RetrievalProceedings of the Eleventh ACM International Conference on Web Search and Data Mining10.1145/3159652.3159676(396-404)Online publication date: 2-Feb-2018
https://dl.acm.org/doi/10.1145/3159652.3159676
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten