From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance

Published: 14 April 2020

Abstract

The latest major release of Lucene (version 8) in March 2019 incorporates block-max indexes and exploits the block-max variant of Wand for query evaluation, which are innovations that originated from academia. This paper shares the story of how this came to be, which provides an interesting case study at the intersection of reproducibility and academic research achieving impact in the “real world”. We offer additional thoughts on the often idiosyncratic processes by which academic research makes its way into deployed solutions.
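To make the abstract's central idea concrete, here is a minimal, hypothetical sketch of block-max pruning: postings lists are split into blocks that each record their maximum score, so query evaluation can skip exact scoring of any document whose summed block maxima cannot beat the current top-k threshold. All names, the term-frequency-only scoring, and the tiny block size are illustrative assumptions for this sketch, not Lucene's actual implementation.

```python
# Hypothetical sketch of the block-max idea: per-block score maxima let the
# evaluator skip exact scoring when an upper bound cannot beat the threshold.
# Illustrative only; not Lucene's data structures or scoring model.
import heapq
from collections import defaultdict

BLOCK = 4  # postings per block; real systems use larger blocks (e.g. 128)

def build_index(docs):
    """Build term -> (blocks, block_maxima) from {doc_id: text}; score = tf."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        tf = defaultdict(int)
        for tok in docs[doc_id].lower().split():
            tf[tok] += 1
        for term, f in tf.items():
            postings[term].append((doc_id, float(f)))  # toy score: raw tf
    index = {}
    for term, plist in postings.items():
        blocks = [plist[i:i + BLOCK] for i in range(0, len(plist), BLOCK)]
        index[term] = (blocks, [max(s for _, s in b) for b in blocks])
    return index

def block_max_wand(index, terms, k=2):
    """Return top-k (score, doc_id), skipping docs via block-max bounds."""
    heap, threshold = [], 0.0  # min-heap of current top-k results
    terms = [t for t in terms if t in index]
    candidates = sorted({d for t in terms
                           for blk in index[t][0] for d, _ in blk})
    for doc in candidates:
        # Upper bound: sum of block maxima of the blocks covering this doc.
        ub = 0.0
        for t in terms:
            for blk, m in zip(*index[t]):
                if blk[0][0] <= doc <= blk[-1][0]:
                    ub += m
                    break
        if len(heap) == k and ub <= threshold:
            continue  # bound too low: skip exact scoring entirely
        score = sum(s for t in terms for blk in index[t][0]
                    for d, s in blk if d == doc)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            threshold = heap[0][0]  # tighten the pruning threshold
    return sorted(heap, reverse=True)
```

The pruning step is the essence of the block-max variant: the threshold tightens as better results enter the top-k heap, so more and more candidates are rejected from their block-level upper bounds alone, without ever touching their individual postings.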


Cited By

  • (2024) Two-Step SPLADE: Simple, Efficient and Effective Approximation of SPLADE. Advances in Information Retrieval, pp. 349–363. DOI: 10.1007/978-3-031-56060-6_23
  • (2023) Sparkly: A Simple yet Surprisingly Strong TF/IDF Blocker for Entity Matching. Proceedings of the VLDB Endowment 16(6), 1507–1519. DOI: 10.14778/3583140.3583163
  • (2023) Analyzing and Improving the Scalability of In-Memory Indices for Managed Search Engines. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, pp. 15–29. DOI: 10.1145/3591195.3595272
  • (2023) Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations. ACM Transactions on Information Systems 41(4), 1–28. DOI: 10.1145/3576922
  • (2023) Profiling and Visualizing Dynamic Pruning Algorithms. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3125–3129. DOI: 10.1145/3539618.3591806
  • (2020) Examining the Additivity of Top-k Query Processing Innovations. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1085–1094. DOI: 10.1145/3340531.3412000


Published In

Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II
April 2020, 708 pages
ISBN: 978-3-030-45441-8
DOI: 10.1007/978-3-030-45442-5

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. Open-source software
  2. Technology adoption
