Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3539618.3591806acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
short-paper

Profiling and Visualizing Dynamic Pruning Algorithms

Published: 18 July 2023 Publication History

Abstract

Efficiently retrieving the top-k documents for a given query is a fundamental operation in many search applications. Dynamic pruning algorithms accelerate top-k retrieval over inverted indexes by skipping documents that are not able to enter the current set of results. However, the performance of these algorithms depends on a number of variables such as the ranking function, the order of documents within the index, and the number of documents to be retrieved. In this paper, we propose a diagnostic framework, Dyno, for profiling and visualizing the performance of dynamic pruning algorithms. Our framework captures processing traces during retrieval, allowing the operations of the index traversal algorithm to be visualized. These visualizations support both query-level and system-to-system comparisons, enabling performance characteristics to be readily understood for different systems. Dyno benefits both academics and practitioners by furthering our understanding of the behavior of dynamic pruning algorithms, allowing better design choices to be made during experimentation and deployment.

References

[1]
I. S. Altingovde, E. Demir, F. Can, and O. Ulusoy. Incremental cluster-based retrieval using compressed cluster-skipping inverted files. ACM Trans. Inf. Sys., 26(3), 2008.
[2]
V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35--42, 2001.
[3]
P. Bajaj, D. Campos, N. Craswell, L. Deng, J. Gao, X. Liu, R. Majumder, A. Mc- Namara, B. Mitra, T. Nguyen, M. Rosenberg, X. Song, A. Stoica, S. Tiwary, and T. Wang. MS MARCO: A human generated MAchine Reading COmprehension dataset. arXiv:1611.09268v3, 2018.
[4]
D. Blandford and G. Blelloch. Index compression through document reordering. In Proc. DCC, pages 342--352, 2002.
[5]
E. Bortnikov, D. Carmel, and G. Golan-Gueta. Top-k query processing with conditional skips. In Proc. WWW, pages 653--661, 2017.
[6]
A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426--434, 2003.
[7]
K. Chakrabarti, S. Chaudhuri, and V. Ganti. Interval-based pruning for top-?? processing over compressed lists. In Proc. ICDE, pages 709--720, 2011.
[8]
M. Crane, A. Trotman, and R. O'Keefe. Maintaining discriminatory power in quantized indexes. In Proc. CIKM, pages 1221--1224, 2013.
[9]
M. Crane, J. S. Culpepper, J. Lin, J. Mackenzie, and A. Trotman. A comparison of document-at-a-time and score-at-a-time query evaluation. In Proc. WSDM, pages 201--210, 2017.
[10]
N. Craswell, B. Mitra, E. Yilmaz, D. Campos, and E. M. Voorhees. Overview of the TREC 2019 deep learning track. In Proc. TREC, 2021.
[11]
C. M. Daoud, E. S. de Moura, D. Fernandes, A. S. da Silva, C. Rossi, and A. Carvalho. Waves: A fast multi-tier top-k query processing algorithm. Inf. Retr., 20(3):292--316, 2017.
[12]
L. L. S. de Carvalho, E. S. de Moura, C. M. Daoud, and A. S. da Silva. Heuristics to improve the BMW method and its variants. Journal of Information & Data Management, 6(3):178--191, 2015.
[13]
L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In Proc. KDD, pages 1535--1544, 2016.
[14]
C. Dimopoulos, S. Nepomnyachiy, and T. Suel. A candidate filtering mechanism for fast top-k query processing on modern CPUs. In Proc. SIGIR, pages 723--732, 2013.
[15]
S. Ding and T. Suel. Faster top-k document retrieval using block-max indexes. In Proc. SIGIR, pages 993--1002, 2011.
[16]
A. Grand, R. Muir, J. Ferenczi, and J. Lin. From MaxScore to Block-Max Wand: The story of how Lucene significantly improved query evaluation performance. In Proc. ECIR, pages 20--27, 2020.
[17]
F. Hafizoglu, E. C. Kucukoglu, and I. S. Altingovde. On the efficiency of selective search. In Proc. ECIR, pages 705--712, 2017.
[18]
A. Kane and F. W. Tompa. Split-lists and initial thresholds for WAND-based search. In Proc. SIGIR, pages 877--880, 2018.
[19]
O. Khattab, M. Hammoud, and T. Elsayed. Finding the best of both worlds: Faster and more robust top-k document retrieval. In Proc. SIGIR, pages 1031--1040, 2020.
[20]
Y. Kim, J. Callan, J. S. Culpepper, and A. Moffat. Does selective search benefit from WAND optimization? In Proc. ECIR, pages 145--158, 2016.
[21]
D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Soft. Prac. & Exp., 45(1):1--29, 2015.
[22]
X. Ma, R. Pradeep, R. Nogueira, and J. Lin. Document expansions and learned sparse lexical representations for MSMARCO V1 and V2. In Proc. SIGIR, 2022.
[23]
C. Macdonald, R. McCreadie, R. L. T. Santos, and I. Ounis. From puppy to maturity: Experiences in developing Terrier. In Proc. OSIR at SIGIR 2012, 2012.
[24]
C. Macdonald, N. Tonellotto, and I. Ounis. Learning to predict response times for online query scheduling. In Proc. SIGIR, pages 621--630, 2012.
[25]
J. Mackenzie and A. Moffat. Examining the additivity of top-k query processing innovations. In Proc. CIKM, pages 1085--1094, 2020.
[26]
J. Mackenzie, A. Mallia, M. Petri, J. S. Culpepper, and T. Suel. Compressing inverted indexes with recursive graph bisection: A reproducibility study. In Proc. ECIR, pages 339--352, 2019.
[27]
J. Mackenzie, Z. Dai, L. Gallagher, and J. Callan. Efficiency implications of term weighting for passage retrieval. In Proc. SIGIR, pages 1821--1824, 2020.
[28]
J. Mackenzie, A. Mallia, A. Moffat, and M. Petri. Accelerating learned sparse indexes via term impact decomposition. In Findings of the ACL: EMNLP 2022, pages 2830--2842, 2022.
[29]
J. Mackenzie, M. Petri, and A. Moffat. Tradeoff options for bipartite graph partitioning. IEEE Trans. Know. & Data Eng., 2022. To appear.
[30]
J. Mackenzie, M. Petri, and A. Moffat. Anytime ranking on document-ordered indexes. ACM Trans. Inf. Sys., 40(1):13.1--13.32, 2022.
[31]
J. Mackenzie, A. Trotman, and J. Lin. Efficient document-at-a-time and score-at- a-time query evaluation for learned sparse representations. ACM Trans. Inf. Sys., 41(4), 2023.
[32]
A. Mallia, G. Ottaviano, E. Porciani, N. Tonellotto, and R. Venturini. Faster BlockMax WAND with variable-sized blocks. In Proc. SIGIR, pages 625--634, 2017.
[33]
A. Mallia, M. Siedlaczek, J. Mackenzie, and T. Suel. PISA: Performant indexes and search for academia. In Proc. OSIRRC at SIGIR 2019, pages 50--56, 2019.
[34]
A. Mallia, M. Siedlaczek, and T. Suel. An experimental study of index compression and DAAT query processing methods. In Proc. ECIR, pages 353--368, 2019.
[35]
A. Mallia, M. Siedlaczek, M. Sun, and T. Suel. A comparison of top-?? threshold estimation techniques for disjunctive query processing. In Proc. CIKM, pages 2141--2144, 2020.
[36]
A. Mallia, O. Khattab, N. Tonellotto, and T. Suel. Learning passage impacts for inverted indexes. In Proc. SIGIR, pages 1723--1727, 2021.
[37]
R. Nogueira and J. Lin. From doc2query to docTTTTTquery, 2019. Unpublished technical report.
[38]
M. Petri, J. S. Culpepper, and A. Moffat. Exploring the magic of WAND. In Proc. Aust. Doc. Comp. Symp., pages 58--65, 2013.
[39]
M. Petri, A. Moffat, J. Mackenzie, J. S. Culpepper, and D. Beck. Accelerated query processing via similarity score prediction. In Proc. SIGIR, pages 485--494, 2019.
[40]
G. E. Pibiri and R. Venturini. Techniques for inverted index compression. ACM Comp. Surv., 53(6):125.1--125.36, 2021.
[41]
S. E. Robertson and H. Zaragoza. The probabilistic relevance framework: BM25 and beyond. Found. Trnd. Inf. Retr., 3:333--389, 2009.
[42]
D. Shan, S. Ding, J. He, H. Yan, and X. Li. Optimized top-k processing with global page scores on block-max indexes. In Proc. WSDM, pages 423--432, 2012.
[43]
W.-Y. Shieh, T.-F. Chen, J. J.-J. Shann, and C.-P. Chung. Inverted file compression through document identifier reassignment. Inf. Proc. & Man., 39(1):117--131, 2003.
[44]
F. Silvestri. Sorting out the document identifier assignment problem. In Proc. ECIR, pages 101--112, 2007.
[45]
N. Tonellotto, C. Macdonald, and I. Ounis. Effect of different docid orderings on dynamic pruning retrieval strategies. In Proc. SIGIR, pages 1179--1180, 2011.
[46]
N. Tonellotto, C. Macdonald, and I. Ounis. Efficient and effective retrieval using selective pruning. In Proc. WSDM, pages 63--72, 2013.
[47]
N. Tonellotto, C. Macdonald, and I. Ounis. Efficient query processing for scalable web search. Found. Trnd. Inf. Retr., 12(4--5):319--500, 2018.
[48]
H. R. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Inf. Proc. & Man., 31(6):831--850, 1995.
[49]
L. Wang, J. Lin, and D. Metzler. A cascade ranking model for efficient ranked retrieval. In Proc. SIGIR, pages 105--114, 2011.
[50]
E. Yafay and I. S. Altingovde. Caching scores for faster query processing with dynamic pruning in search engines. In Proc. CIKM, pages 2457--2460, 2019.
[51]
P. Yang, H. Fang, and J. Lin. Anserini: Reproducible ranking baselines using lucene. J. Data Inf. Qual., 10(4):16.1--17.20, 2018.
[52]
J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comp. Surv., 38(2):6.1--6.56, 2006

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. explainability
  2. profiling
  3. query processing
  4. visualization

Qualifiers

  • Short-paper

Conference

SIGIR '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 69
    Total Downloads
  • Downloads (Last 12 months)41
  • Downloads (Last 6 weeks)7
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media