Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Generic Techniques for Building Top-k Structures

Published: 10 October 2022 Publication History

Abstract

A reporting query returns the objects satisfying a predicate q from an input set. In prioritized reporting, each object carries a real-valued weight (which can be query dependent), and a query returns the objects that satisfy q and have weights at least a threshold τ. A top-k query finds, among all the objects satisfying q, the k ones of the largest weights; a max query is a special instance with k = 1. We want to design data structures of small space to support queries (and possibly updates) efficiently.
Previous work has shown that a top-k structure can also support max and prioritized queries with no performance deterioration. This article explores the opposite direction: do prioritized queries, possibly combined with max queries, imply top-k search? Subject to mild conditions, we provide affirmative answers with two reduction techniques. The first converts a prioritized structure into a static top-k structure with the same space complexity and only a logarithmic blowup in query time. If a max structure is available in addition, our second reduction yields a top-k structure with no degradation in expected performance (this holds for the space, query, and update complexities). Our techniques significantly simplify the design of top-k structures because structures for max and prioritized queries are often easier to obtain. We demonstrate this by developing top-k structures for interval stabbing, 3D dominance, halfspace reporting, linear ranking, and L nearest neighbor search in the RAM and the external memory computation models.

References

[1]
Peyman Afshani. 2008. On dominance reporting in 3D. In Proceedings of European Symposium on Algorithms (ESA’08). 41–51.
[2]
Peyman Afshani, Lars Arge, and Kasper Green Larsen. 2012. Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In Proceedings of Symposium on Computational Geometry (SoCG’12). 323–332.
[3]
Peyman Afshani, Gerth Stolting Brodal, and Norbert Zeh. 2011. Ordered and unordered top-k range reporting in large data sets. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’11). 390–400.
[4]
Peyman Afshani and Timothy M. Chan. 2009. Optimal halfspace range reporting in three dimensions. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’09). 180–186.
[5]
Peyman Afshani, Chris H. Hamilton, and Norbert Zeh. 2010. A general approach for cache-oblivious range reporting and approximate range counting. Computational Geometry 43, 8 (2010), 700–712.
[6]
Pankaj K. Agarwal, Lars Arge, Jeff Erickson, Paolo Giulio Franciosa, and Jeffrey Scott Vitter. 2000. Efficient searching with linear constraints. Journal of Computer and System Sciences (JCSS) 61, 2 (2000), 194–216.
[7]
Pankaj K. Agarwal, Lars Arge, Haim Kaplan, Eyal Molad, Robert Endre Tarjan, and Ke Yi. 2012. An optimal dynamic data structure for stabbing-semigroup queries. SIAM Journal of Computing 41, 1 (2012), 104–127.
[8]
Pankaj K. Agarwal, Boris Aronov, Sariel Har-Peled, Jeff M. Phillips, Ke Yi, and Wuzhou Zhang. 2016. Nearest-neighbor searching under uncertainty II. ACM Transactions on Algorithms 13, 1 (2016), 3:1–3:25.
[9]
Pankaj K. Agarwal, Siu-Wing Cheng, Yufei Tao, and Ke Yi. 2009. Indexing uncertain data. In Proceedings of ACM Symposium on Principles of Database Systems (PODS’09). 137–146.
[10]
Pankaj K. Agarwal, Alon Efrat, and Micha Sharir. 1999. Vertical decomposition of shallow levels in 3-dimensional arrangements and its applications. SIAM Journal of Computing 29, 3 (1999), 912–953.
[11]
Pankaj K. Agarwal, Nirman Kumar, Stavros Sintos, and Subhash Suri. 2018. Range-max queries on uncertain data. Journal of Computer and System Sciences (JCSS) 94 (2018), 118–134.
[12]
Alok Aggarwal and Jeffrey Scott Vitter. 1988. The input/output complexity of sorting and related problems. Communications of the ACM (CACM) 31, 9 (1988), 1116–1127.
[13]
Lars Arge, Andrew Danner, and Sha-Mayn Teh. 2003. I/O-efficient point location using persistent B-trees. ACM Journal of Experimental Algorithmics 8 (2003).
[14]
Boris Aronov and Sariel Har-Peled. 2008. On approximating the depth and related problems. SIAM Journal of Computing 38, 3 (2008), 899–921.
[15]
Iwona Bialynicka-Birula and Roberto Grossi. 2005. Rank-sensitive data structures. In String Processing and Information Retrieval (SPIRE). 79–90.
[16]
Gerth Stolting Brodal. 2016. External memory three-sided range reporting and Top-k queries with sublogarithmic updates. In Proceedings of Symposium on Theoretical Aspects of Computer Science (STACS’16). 23:1–23:14.
[17]
Gerth Stolting Brodal, Rolf Fagerberg, Mark Greve, and Alejandro Lopez-Ortiz. 2009. Online sorted range reporting. In International Symposium on Algorithms and Computation (ISAAC’09). 173–182.
[18]
Timothy M. Chan. 2005. Low-dimensional linear programming with violations. SIAM Journal of Computing 34, 4 (2005), 879–893.
[19]
Timothy M. Chan, Yakov Nekrich, Saladi Rahul, and Konstantinos Tsakalidis. 2018. Orthogonal point location and rectangle stabbing queries in 3-d. In Proceedings of International Colloquium on Automata, Languages and Programming (ICALP’18). 31:1–31:14.
[20]
Bernard Chazelle and Herbert Edelsbrunner. 1987. Linear space data structures for two types of range search. Discrete & Computational Geometry 2 (1987), 113–126.
[21]
Bernard Chazelle, Leonidas J. Guibas, and D. T. Lee. 1985. The power of geometric duality. BIT Numerical Mathematics 25, 1 (1985), 76–90.
[22]
Kenneth L. Clarkson. 1987. New applications of random sampling in computational geometry. Discrete & Computational Geometry 2 (1987), 195–222.
[23]
Kenneth L. Clarkson and Peter W. Shor. 1989. Application of random sampling in computational geometry, II. Discrete & Computational Geometry 4 (1989), 387–421.
[24]
Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. 2008. Computational Geometry: Algorithms and Applications (3rd ed.). Springer-Verlag.
[25]
Torben Hagerup and Christine Rub. 1990. A guided tour of Chernoff bounds. Information Processing Letters (IPL) 33, 6 (1990), 305–308.
[26]
Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. 2014. Space-efficient frameworks for Top-k string retrieval. Journal of the ACM (JACM) 61, 2 (2014), 9:1–9:36.
[27]
Xiaocheng Hu, Cheng Sheng, and Yufei Tao. 2019. Building an optimal point-location structure in \({O}(sort(n))\) I/Os. Algorithmica 81, 5 (2019), 1921–1937.
[28]
Ihab F. Ilyas, George Beskales, and Mohamed A. Soliman. 2008. A survey of top-k query processing techniques in relational database systems. Computing Surveys 40, 4 (2008).
[29]
Marek Karpinski and Yakov Nekrich. 2011. Top-K color queries for document retrieval. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’11). 401–411.
[30]
D. T. Lee and C. K. Wong. 1980. Voronoi diagrams in \(L_1\) (\(L_\infty\)) metrics with 2-dimensional storage applications. SIAM Journal of Computing 9, 1 (1980), 200–211.
[31]
Moshe Lewenstein. 2013. Orthogonal range searching for text indexing. In Space-Efficient Data Structures, Streams, and Algorithms. 267–302.
[32]
Jiri Matousek. 1992. Reporting points in halfspaces. Computational Geometry 2 (1992), 169–186.
[33]
Edward M. McCreight. 1985. Priority search trees. SIAM Journal of Computing 14, 2 (1985), 257–276.
[34]
Rajeev Motwani and Prabhakar Raghavan. 1995. Randomized Algorithms. Cambridge University Press.
[35]
S. Muthukrishnan. 2002. Efficient algorithms for document retrieval problems. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02). 657–666.
[36]
Gonzalo Navarro and Yakov Nekrich. 2012. Top-k document retrieval in optimal time and linear space. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’12). 1066–1077.
[37]
Gonzalo Navarro and Yakov Nekrich. 2017. Time-optimal top-k document retrieval. SIAM Journal of Computing 46, 1 (2017), 80–113.
[38]
Manish Patil, Sharma V. Thankachan, Rahul Shah, Yakov Nekrich, and Jeffrey Scott Vitter. 2014. Categorical range maxima queries. In Proceedings of ACM Symposium on Principles of Database Systems (PODS’14). 266–277.
[39]
Saladi Rahul and Ravi Janardan. 2014. A general technique for top-k geometric intersection query problems. IEEE Transactions on Knowledge and Data Engineering (TKDE) 26, 12 (2014), 2859–2871.
[40]
Saladi Rahul and Yufei Tao. 2015. On top-k range reporting in 2D space. In Proceedings of ACM Symposium on Principles of Database Systems (PODS’15). 265–275.
[41]
Biswajit Sanyal, Prosenjit Gupta, and Subhashis Majumder. 2013. Colored top-K range-aggregate queries. Information Processing Letters (IPL) 113, 19–21 (2013), 777–784.
[42]
Neil Sarnak and Robert Endre Tarjan. 1986. Planar point location using persistent search trees. Communications of the ACM (CACM) 29, 7 (1986), 669–679.
[43]
Rahul Shah, Cheng Sheng, Sharma V. Thankachan, and Jeffrey Scott Vitter. 2013. Top-k document retrieval in external memory. In Proceedings of European Symposium on Algorithms (ESA’13). 803–814.
[44]
Cheng Sheng and Yufei Tao. 2012. Dynamic top-k range reporting in external memory. In Proceedings of ACM Symposium on Principles of Database Systems (PODS’12). 121–130.
[45]
Yufei Tao. 2012. Stabbing horizontal segments with rays. In Proceedings of Symposium on Computational Geometry (SoCG’12). 313–322.
[46]
Yufei Tao. 2014. A dynamic I/O-efficient structure for one-dimensional top-k range reporting. In Proceedings of ACM Symposium on Principles of Database Systems (PODS’14). 256–265.
[47]
Ke Yi, Feifei Li, George Kollios, and Divesh Srivastava. 2008. Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Transactions on Knowledge and Data Engineering (TKDE) 20, 12 (2008), 1669–1682.

Index Terms

  1. Generic Techniques for Building Top-k Structures

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Algorithms
    ACM Transactions on Algorithms  Volume 18, Issue 4
    October 2022
    305 pages
    ISSN:1549-6325
    EISSN:1549-6333
    DOI:10.1145/3561946
    • Editor:
    • Edith Cohen
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022
    Online AM: 29 June 2022
    Accepted: 23 June 2022
    Revised: 14 January 2022
    Received: 07 February 2021
    Published in TALG Volume 18, Issue 4

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Top-k
    2. reductions
    3. data structures
    4. algorithms

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Indian Institute of Science
    • GRF

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 173
      Total Downloads
    • Downloads (Last 12 months)40
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 23 Nov 2024

    Other Metrics

    Citations

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Full Text

    View this article in Full Text.

    Full Text

    HTML Format

    View this article in HTML Format.

    HTML Format

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media