Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

rkHit: Representative Query with Uncertain Preference

Published: 20 June 2023 Publication History

Abstract

A top-k query retrieves the k tuples with highest scores according to a user preference, defined as a scoring function. It is difficult for a user to precisely specify the scoring function. Instead, obtaining the distribution on scoring functions, i.e., the preference distribution, has been extensively explored in many fields.
Motivated by this, we introduce the uniform (r,k)-hit (UrkHit) problem. Given a preference distribution, UrkHit aims to select a representative set of r tuples to maximize the probability of containing a tuple attractive to the user. We say a tuple attracts a user, if it is a top-k tuple for the scoring function adopted by the user. Further, we generalize UrkHit and propose the (r,k)-hit (rkHit) problem with an additional penalty function to model the user satisfaction with the tuple ranked i-th. rkHit aims to maximize the expected user satisfaction with the representative set. In 2D space, we design an exact algorithm 2DH for rkHit, indicating rkHit is in P for d=2. We show that rkHit is NP-hard when d\ge3. In 3D space, assuming a uniform preference distribution, we propose a (1-1/e)-approximation algorithm 3DH based on space partitioning. In addition, we propose an approximate algorithm MDH suitable for any dimension and distribution, which creatively combines the ideas of sampling and clustering. It relaxes the approximation guarantee slightly. Comprehensive experiments demonstrate the efficiency and effectiveness of our algorithms.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023

References

[1]
Pankaj K. Agarwal, Nirman Kumar, Stavros Sintos, and Subhash Suri. 2017. Efficient Algorithms for k-Regret Minimizing Sets. In 16th International Symposium on Experimental Algorithms (SEA 2017) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 75). Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 7:1--7:23.
[2]
Pankaj K. Agarwal and Micha Sharir. 2000. Arrangements and Their Applications. In Handbook of Computational Geometry. North-Holland, Amsterdam, 49--119.
[3]
Reza Akbarinia, Esther Pacitti, and Patrick Valduriez. 2007. Best Position Algorithms for Top-k Queries. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria) (VLDB '07). VLDB Endowment, 495--506.
[4]
Abolfazl Asudeh, Gautam Das, H. V. Jagadish, Shangqi Lu, Azade Nazi, Yufei Tao, Nan Zhang, and Jianwen Zhao. 2022. On Finding Rank Regret Representatives. ACM Trans. Database Syst., Vol. 47, 3, Article 10 (aug 2022), 37 pages.
[5]
Abolfazl Asudeh, H. V. Jagadish, Gerome Miklau, and Julia Stoyanovich. 2018. On Obtaining Stable Rankings. Proc. VLDB Endow., Vol. 12, 3 (nov 2018), 237--250. https://doi.org/10.14778/3291264.3291269
[6]
Abolfazl Asudeh, H. V. Jagadish, Julia Stoyanovich, and Gautam Das. 2019a. Designing Fair Ranking Schemes. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). Association for Computing Machinery, New York, NY, USA, 1259--1276. https://doi.org/10.1145/3299869.3300079
[7]
Abolfazl Asudeh, Azade Nazi, Nan Zhang, and Gautam Das. 2017. Efficient Computation of Regret-Ratio Minimizing Set: A Compact Maxima Representative. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA) (SIGMOD '17). Association for Computing Machinery, New York, NY, USA, 821--834.
[8]
Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das, and H. V. Jagadish. 2019b. RRR: Rank-Regret Representative. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). Association for Computing Machinery, New York, NY, USA, 263--280.
[9]
Ashwinkumar Badanidiyuru, Robert Kleinberg, and Hooyeon Lee. 2012. Approximating Low-Dimensional Coverage Problems. In Proceedings of the Twenty-Eighth Annual Symposium on Computational Geometry (Chapel Hill, North Carolina, USA) (SoCG '12). Association for Computing Machinery, New York, NY, USA, 161--170. https://doi.org/10.1145/2261250.2261274
[10]
Jon L Bentley, Kenneth L Clarkson, and David B Levine. 1993. Fast linear expected-time algorithms for computing maxima and convex hulls. Algorithmica, Vol. 9, 2 (1993), 168--183.
[11]
Avrim Blum, Jeffrey Jackson, Tuomas Sandholm, and Martin Zinkevich. 2004. Preference elicitation and query learning. Journal of Machine Learning Research, Vol. 5, Jun (2004), 649--667.
[12]
Stephan Bö rzsö nyi, Donald Kossmann, and Konrad Stocker. 2001. The Skyline Operator. In Proceedings of the 17th International Conference on Data Engineering, April 2--6, 2001, Heidelberg, Germany. IEEE Computer Society, 421--430.
[13]
Nicolas Bruno, Luis Gravano, and Amélie Marian. 2002. Evaluating top-k queries over Web-accessible databases. In Proceedings 18th International Conference on Data Engineering. IEEE Computer Society, 369--380. https://doi.org/10.1109/ICDE.2002.994751
[14]
Christian Buchta. 1989. On the average number of maxima in a set of vectors. Inform. Process. Lett., Vol. 33, 2 (1989), 63--65.
[15]
Wei Cao, Jian Li, Haitao Wang, Kangning Wang, Ruosong Wang, Raymond Chi-Wing Wong, and Wei Zhan. 2017. k-regret minimizing set: Efficient algorithms and hardness. In 20th International Conference on Database Theory (ICDT 2017) (LIPIcs, Vol. 68). Schloss Dagstuhl - Leibniz-Zentrum fü r Informatik, 11:1--11:19.
[16]
Li Chen and Pearl Pu. 2004. Survey of Preference Elicitation Methods. Technical Report. http://infoscience.epfl.ch/record/52659
[17]
Sean Chester, Alex Thomo, S Venkatesh, and Sue Whitesides. 2014. Computing k-regret minimizing sets. Proceedings of the VLDB Endowment, Vol. 7, 5 (2014), 389--400.
[18]
Wei Chu and Zoubin Ghahramani. 2005. Preference Learning with Gaussian Processes. In Proceedings of the 22nd International Conference on Machine Learning (Bonn, Germany) (ICML '05). Association for Computing Machinery, New York, NY, USA, 137--144.
[19]
Paolo Ciaccia and Davide Martinenghi. 2017. Reconciling skyline and ranking queries. Proceedings of the VLDB Endowment, Vol. 10, 11 (2017), 1454--1465.
[20]
Paolo Ciaccia and Davide Martinenghi. 2018. Beyond Skyline and Ranking Queries: Restricted Skylines (Extended Abstract). In Proceedings of the 26th Italian Symposium on Advanced Database Systems, Castellaneta Marina (Taranto), Italy, June 24--27, 2018 (CEUR Workshop Proceedings, Vol. 2161). CEUR-WS.org.
[21]
Paolo Ciaccia and Davide Martinenghi. 2020. Flexible skylines: Dominance for arbitrary sets of monotone functions. ACM Transactions on Database Systems (TODS), Vol. 45, 4 (2020), 1--45.
[22]
Gautam Das, Dimitrios Gunopulos, Nick Koudas, and Nikos Sarkas. 2007. Ad-Hoc Top-k Query Answering for Data Streams. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria) (VLDB '07). VLDB Endowment, 183--194.
[23]
Gautam Das, Dimitrios Gunopulos, Nick Koudas, and Dimitris Tsirogiannis. 2006. Answering Top-k Queries Using Views. In Proceedings of the 32nd International Conference on Very Large Data Bases (Seoul, Korea) (VLDB '06). VLDB Endowment, 451--462.
[24]
Uriel Feige. 1998. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), Vol. 45, 4 (1998), 634--652.
[25]
Parke Godfrey. 2004. Skyline cardinality for relational processing. In International Symposium on Foundations of Information and Knowledge Systems. Springer, Berlin, Heidelberg, 78--97.
[26]
Dorit S Hochba. 1997. Approximation algorithms for NP-hard problems. ACM Sigact News, Vol. 28, 2 (1997), 40--52.
[27]
Leonard Kaufman and Peter Rousseeuw. 1987. Clustering by means of medoids. Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge Ed.
[28]
Xuemin Lin, Yidong Yuan, Qing Zhang, and Ying Zhang. 2007. Selecting Stars: The k Most Representative Skyline Operator. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15--20, 2007. IEEE Computer Society, 86--95.
[29]
Dongjing Miao, Zhipeng Cai, Jianzhong Li, Xiangyu Gao, and Xianmin Liu. 2020. The computation of optimal subset repairs. Proceedings of the VLDB Endowment, Vol. 13, 12 (2020), 2061--2074.
[30]
Michel Minoux. 1978. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization techniques. Springer, Berlin, Heidelberg, 234--243.
[31]
Kyriakos Mouratidis, Keming Li, and Bo Tang. 2021. Marrying Top-k with Skyline Queries: Relaxing the Preference Input While Producing Output of Controllable Size. In Proceedings of the 2021 International Conference on Management of Data (Virtual Event, China) (SIGMOD '21). Association for Computing Machinery, New York, NY, USA, 1317--1330. https://doi.org/10.1145/3448016.3457299
[32]
Kyriakos Mouratidis and Bo Tang. 2018. Exact processing of uncertain top-k queries in multi-criteria settings. Proceedings of the VLDB Endowment, Vol. 11, 8 (2018), 866--879.
[33]
Danupon Nanongkai, Atish Das Sarma, Ashwin Lall, Richard J. Lipton, and Jun Xu. 2010. Regret-Minimizing Representative Databases. Proc. VLDB Endow., Vol. 3, 1--2 (sep 2010), 1114--1124.
[34]
George L Nemhauser and Laurence A Wolsey. 1978. Best algorithms for approximating the maximum of a submodular set function. Mathematics of operations research, Vol. 3, 3 (1978), 177--188.
[35]
Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger. 2005. Progressive skyline computation in database systems. ACM Transactions on Database Systems (TODS), Vol. 30, 1 (2005), 41--82.
[36]
Peng Peng and Raymong Chi-Wing Wong. 2015. K-Hit Query: Top-k Query with Probabilistic Utility Function. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (Melbourne, Victoria, Australia) (SIGMOD '15). Association for Computing Machinery, New York, NY, USA, 577--592.
[37]
Jeff M. Phillips. 2012. Chernoff-Hoeffding Inequality and Applications. CoRR, Vol. abs/1209.6396 (2012). showeprint[arXiv]1209.6396 http://arxiv.org/abs/1209.6396
[38]
Li Qian, Jinyang Gao, and HV Jagadish. 2015. Learning user preferences by adaptive pairwise comparison. Proceedings of the VLDB Endowment, Vol. 8, 11 (2015), 1322--1333.
[39]
Paul Resnick and Hal R Varian. 1997. Recommender systems. Commun. ACM, Vol. 40, 3 (1997), 56--58.
[40]
Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to Recommender Systems Handbook. In Recommender Systems Handbook. Springer, 1--35. https://doi.org/10.1007/978-0--387--85820--3_1
[41]
Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, and Jun (Jim) Xu. 2011. Representative skylines using threshold-based preference distributions. In Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11--16, 2011, Hannover, Germany. IEEE Computer Society, 387--398.
[42]
Suraj Shetiya, Abolfazl Asudeh, Sadia Ahmed, and Gautam Das. 2019. A Unified Optimization Algorithm for Solving "Regret-Minimizing Representative" Problems. Proc. VLDB Endow., Vol. 13, 3 (nov 2019), 239--251.
[43]
Piotr Skowron and Piotr Faliszewski. 2017. Chamberlin--Courant Rule with Approval Ballots: Approximating the MaxCover Problem with Bounded Frequencies in FPT Time. Journal of Artificial Intelligence Research, Vol. 60 (2017), 687--716.
[44]
Mohamed A. Soliman, Ihab F. Ilyas, Davide Martinenghi, and Marco Tagliasacchi. 2011. Ranking with Uncertain Scoring Functions: Semantics and Sensitivity Measures. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (Athens, Greece) (SIGMOD '11). Association for Computing Machinery, New York, NY, USA, 805--816.
[45]
Maxim Sviridenko, Jan Vondrák, and Justin Ward. 2015. Optimal Approximation for Submodular and Supermodular Optimization with Bounded Curvature. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (San Diego, California) (SODA '15). Society for Industrial and Applied Mathematics, USA, 1134--1148.
[46]
Bo Tang, Kyriakos Mouratidis, and Mingji Han. 2021. On M-Impact Regions and Standing Top-k Influence Problems. In Proceedings of the 2021 International Conference on Management of Data (Virtual Event, China) (SIGMOD '21). Association for Computing Machinery, New York, NY, USA, 1784--1796.
[47]
Bo Tang, Kyriakos Mouratidis, Man Lung Yiu, and Zhenyu Chen. 2019. Creating Top Ranking Options in the Continuous Option and Preference Space. Proc. VLDB Endow., Vol. 12, 10 (jun 2019), 1181--1194.
[48]
Yufei Tao, Ling Ding, Xuemin Lin, and Jian Pei. 2009. Distance-Based Representative Skyline. In Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China. IEEE Computer Society, 892--903.
[49]
VN Vapnik and A Ya Chervonenkis. 1971. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability and its Applications, Vol. 16, 2 (1971), 264.
[50]
Weicheng Wang, Raymond Chi-Wing Wong, and Min Xie. 2021c. Interactive Search for One of the Top-k. In Proceedings of the 2021 International Conference on Management of Data (Virtual Event, China) (SIGMOD '21). Association for Computing Machinery, New York, NY, USA, 1920--1932.
[51]
Yanhao Wang, Yuchen Li, Raymond Chi-Wing Wong, and Kian-Lee Tan. 2021a. A Fully Dynamic Algorithm for k-Regret Minimizing Sets. In 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, April 19--22, 2021. IEEE, 1631--1642.
[52]
Yanhao Wang, Michael Mathioudakis, Yuchen Li, and Kian-Lee Tan. 2021b. Minimum Coresets for Maxima Representation of Multidimensional Data. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (Virtual Event, China) (PODS'21). Association for Computing Machinery, New York, NY, USA, 138--152.
[53]
Xingxing Xiao and Jianzhong Li. 2021. Sampling-based approximate skyline calculation on big data. Discrete Mathematics, Algorithms and Applications, Vol. 0, 0 (2021), 2250024. https://doi.org/10.1142/S1793830922500240
[54]
Xingxing Xiao and Jianzhong Li. 2022. Rank-Regret Minimization. In 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9--12, 2022. IEEE, 1848--1860.
[55]
Min Xie, Raymond Chi-Wing Wong, and Ashwin Lall. 2020a. An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J., Vol. 29, 1 (2020), 147--175.
[56]
Min Xie, Raymond Chi-Wing Wong, Peng Peng, and Vassilis J. Tsotras. 2020b. Being Happy with the Least: Achieving (α)-happiness with Minimum Number of Tuples. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20--24, 2020. IEEE, 1009--1020.
[57]
Min Xie, Raymond Chi-Wing Wong, and Ashwin Lall. 2019. Strongly Truthful Interactive Regret Minimization. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). Association for Computing Machinery, New York, NY, USA, 281--298.
[58]
Dong Xin, Chen Chen, and Jiawei Han. 2006. Towards Robust Indexing for Ranked Queries. In Proceedings of the 32nd International Conference on Very Large Data Bases (Seoul, Korea) (VLDB '06). VLDB Endowment, 235--246.
[59]
Guolei Yang and Ying Cai. 2017. Querying Improvement Strategies. In Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21--24, 2017. OpenProceedings.org, 294--305.

Cited By

View all
  • (2024)Ethical Games: Toward Evidence-Based Guidance for Safeguarding Players and DevelopersGames: Research and Practice10.1145/36852072:2(1-11)Online publication date: 19-Aug-2024

Index Terms

  1. rkHit: Representative Query with Uncertain Preference

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Management of Data
    Proceedings of the ACM on Management of Data  Volume 1, Issue 2
    PACMMOD
    June 2023
    2310 pages
    EISSN:2836-6573
    DOI:10.1145/3605748
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 June 2023
    Published in PACMMOD Volume 1, Issue 2

    Permissions

    Request permissions for this article.

    Author Tags

    1. multi-criteria decision-making
    2. skyline
    3. top-k

    Qualifiers

    • Research-article

    Funding Sources

    • the National Natural Science Foundation of China (NSFC)

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)193
    • Downloads (Last 6 weeks)27
    Reflects downloads up to 15 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Ethical Games: Toward Evidence-Based Guidance for Safeguarding Players and DevelopersGames: Research and Practice10.1145/36852072:2(1-11)Online publication date: 19-Aug-2024

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media