Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Fast top-k preserving query processing using two-tier indexes

Published: 01 September 2016 Publication History

Highlights

We present a new query processing method for text search.
We extend the BMW-CS algorithm to now preserve the top-k results, proposing BMW-CSP.
We show through experiments that the method is competitive when compared to baselines.

Abstract

In this paper we propose and evaluate the Block Max WAND with Candidate Selection and Preserving Top-K Results algorithm, or BMW-CSP. It is an extension of BMW-CS, a method previously proposed by us. Although very efficient, BMW-CS does not guarantee preserving the top-k results for a given query. Algorithms that do not preserve the top results may reduce the quality of ranking results in search systems. BMW-CSP extends BMW-CS to ensure that the top-k results will have their rankings preserved. In the experiments we performed for computing the top-10 results, the final average time required for processing queries with BMW-CSP was lesser than the ones required by the baselines adopted. For instance, when computing top-10 results, the average time achieved by MBMW, the best multi-tier baseline we found in the literature, was 36.29 ms per query, while the average time achieved by BMW-CSP was 19.64 ms per query. The price paid by BMW-CSP is an extra memory required to store partial scores of documents. As we show in the experiments, this price is not prohibitive and, in cases where it is acceptable, BMW-CSP may constitute an excellent alternative query processing method.

References

[1]
Anh V., Moffat A., Pruned query evaluation using pre-computed impacts, Proceedings of ACM SIGIR, 2006, pp. 372–379.
[2]
Anh V.N., de Kretser O., Moffat A., Vector-space ranking with effective early termination, Proceedings of ACM SIGIR, 2001, pp. 35–42.
[3]
Baeza-Yates R., Ribeiro-Neto B., Modern information retrieval, 2nd ed., Addison-Wesley Publishing Company, USA, 2011.
[4]
Broder A.Z., Carmel D., Herscovici M., Soffer A., Zien J., Efficient query evaluation using a two-level retrieval process, Proceedings of ACM international conference on information and knowledge management, CIKM, 2003, pp. 426–434.
[5]
Carvalho A., Rossi C., de Moura E.S., Fernandes D., Silva A.S.d., LeprEF: Learn to pre-compute evidence fusion for efficient query evaluation, Journal of the Association for Information Science and Technology 55 (92) (2012) 1–28.
[6]
Chakrabarti K., Chaudhuri S., Ganti V., Interval-based pruning for top-k processing over compressed lists, Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11, IEEE Computer Society, Washington, DC, USA, 2011, pp. 709–720,.
[7]
Dimopoulos C., Nepomnyachiy S., Suel T., A candidate filtering mechanism for fast top-k query processing on modern cpus, Proceedings of ACM SIGIR, 2013, pp. 723–732.
[8]
Ding S., Suel T., Faster top-k document retrieval using block-max indexes, Proceedings of ACM SIGIR, 2011, pp. 993–1002.
[9]
Ellias P., Universal codeword sets and representations of the integers, IEEE Transactions on Information Theory 21 (2) (1975) 194–203.
[10]
Fontoura M., Josifovski V., Liu J., Venkatesan S., Zhu X., Zien J.Y., Evaluation strategies for top-k queries over memory-resident inverted indexes, Proceedings of VLDB vol. 4 (12) (2011) 1213–1224.
[11]
Moffat A., Zobel J., Self-indexing inverted files for fast text retrieval, ACM Transactions on Information Systems 14 (4) (1996) 349–379,.
[12]
de Moura E.S., Navarro G., Ziviani N., Baeza-Yates R., Fast and flexible word searching on compressed text, ACM Transactions on Information Systems 18 (2) (2000) 113–139,.
[13]
Ntoulas A., Cho J., Pruning policies for two-tiered inverted index with correctness guarantee, Proceedings of ACM SIGIR, 2007, pp. 191–198.
[14]
Risvik K., Aasheim Y., Lidal M., Multi-tier architecture for web search engines, Proceedings of first Latin American web congress, 2003, pp. 132–143.
[15]
Robertson S.E., Walker S., Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval, Proceedings of ACM SIGIR, 1994, pp. 232–241.
[16]
Rossi C., de Moura E.S., Carvalho A.L., Silva A.S.d., Fast document-at-a-time query processing using two-tier indexes, Proceedings of ACM SIGIR, 2013, pp. 183–192.
[17]
Salton G., Wong A., Yang C.S., A vector space model for automatic indexing, Communications of the ACM 18 (11) (1975) 613–620.
[18]
Shan D., Ding S., He J., Yan H., Li X., Optimized top-k processing with global page scores on block-max indexes, Proceedings of International Conference on Web Search and Data Mining, WSDM, 2012, pp. 423–432.
[19]
Skobeltsyn G., Junqueira F., Plachouras V., Baeza-Yates R., Resin: A combination of results caching and index pruning for high-performance web search engines, Proceedings of ACM SIGIR, 2008, pp. 131–138.
[20]
Strohman T., Croft W.B., Efficient document retrieval in main memory, Proceedings of ACM SIGIR, 2007, pp. 175–182.
[21]
Zukowski M., Heman S., Nes N., Boncz P., Super-scalar RAM-CPU cache compression, Proceedings of the 22nd international conference on data engineering, ICDE ’06, IEEE Computer Society, Washington, DC, USA, 2006, p. 59.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Information Processing and Management: an International Journal
Information Processing and Management: an International Journal  Volume 52, Issue 5
Sep 2016
269 pages

Publisher

Pergamon Press, Inc.

United States

Publication History

Published: 01 September 2016

Author Tags

  1. Information retrieval
  2. Query processing
  3. Inverted index
  4. Search system
  5. BMW

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Many are Better than OneInformation Processing and Management: an International Journal10.1016/j.ipm.2023.10335960:4Online publication date: 1-Jul-2023
  • (2023)Efficient immediate-access dynamic indexingInformation Processing and Management: an International Journal10.1016/j.ipm.2022.10324860:3Online publication date: 24-May-2023
  • (2022)An NVM SSD-Based High Performance Query Processing Framework for Search EnginesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.316055735:6(5612-5625)Online publication date: 18-Mar-2022
  • (2021)Window Navigation with Adaptive Probing for Executing BlockMax WANDProceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3404835.3463109(2323-2327)Online publication date: 11-Jul-2021
  • (2020)Finding the Best of Both WorldsProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3397271.3401076(1031-1040)Online publication date: 25-Jul-2020
  • (2020)Examining the Additivity of Top-k Query Processing InnovationsProceedings of the 29th ACM International Conference on Information & Knowledge Management10.1145/3340531.3412000(1085-1094)Online publication date: 19-Oct-2020
  • (2020)Topical result caching in web search enginesInformation Processing and Management: an International Journal10.1016/j.ipm.2019.10219357:3Online publication date: 1-May-2020
  • (2019)Caching Scores for Faster Query Processing with Dynamic Pruning in Search EnginesProceedings of the 28th ACM International Conference on Information and Knowledge Management10.1145/3357384.3358154(2457-2460)Online publication date: 3-Nov-2019
  • (2019)Faster BlockMax WAND with Longer SkippingAdvances in Information Retrieval10.1007/978-3-030-15712-8_52(771-778)Online publication date: 14-Apr-2019
  • (2018)Split-Lists and Initial Thresholds for WAND-based SearchThe 41st International ACM SIGIR Conference on Research & Development in Information Retrieval10.1145/3209978.3210066(877-880)Online publication date: 27-Jun-2018
  • Show More Cited By

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media