Abstract
Typically, a specific market (e.g., of hotels, restaurants, laptops, etc.) is represented as a multi-attribute dataset of the available products. The topic of identifying and shortlisting the products of most interest to a user has been well-explored. In contrast, in this work we focus on the dataset, and aim to assess its competitiveness with regard to different possible preferences. We define measures of competitiveness, and represent them in the form of a heat-map in the domain of preferences. Our work finds application in market analysis and in business development. These applications are further enhanced when the competitiveness heat-map is used in tandem with information on user preferences (which can be readily derived by existing methods). Interestingly, our study also finds side-applications with strong practical relevance in the area of multi-objective querying. We propose a suite of algorithms to efficiently produce the heat-map, and conduct case studies and an empirical evaluation to demonstrate the practicality of our work.
Similar content being viewed by others
Notes
For legibility, only a 2% random sample of the weight vectors is illustrated.
Without loss of generality, we assume that no pair of products have identical attributes in all dimensions.
r\(k\)-skyband computation by [47] follows the general branch-and-bound paradigm using the index on \({\mathcal {P}}\); n here refers to the number of products popped from its search heap and considered for inclusion into RKS(c).
If \({\varvec{p}}_i\) r-dominates \({\varvec{p}}_j\) for N, the same holds for every partition of N too.
The \(\#\textrm{DR}\) heat-map is very similar to \(|\textrm{RKS}|\), as we will elaborate shortly.
The Pearson correlation coefficient for n value pairs \((x_i,y_i)\) is defined as \(r_{xy}= \frac{ \sum _{i=1}^{n}(x_i-{\bar{x}})(y_i-{\bar{y}}) }{\sqrt{\sum _{i=1}^{n}(x_i-{\bar{x}})^2}\sqrt{\sum _{i=1}^{n}(y_i-{\bar{y}})^2}}\), where \({\bar{x}}\) is the mean of \(x_i\) values, and analogously for \({\bar{y}}\). It takes values between \(-1\) (perfect anti-correlation) and 1 (perfect correlation).
References
Achtert, E., Böhm, C., Kröger, P., Kunath, P., Pryakhin, A., Renz, M.: Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In SIGMOD Conference, pp. 515–526 (2006)
Agarwal, P.K., Arge, L., Erickson, J., Franciosa, P.G., Vitter, J.S.: Efficient searching with linear constraints. J. Comput. Syst. Sci. 61(2), 194–216 (2000)
Agarwal, P.K., Kumar, N., Sintos, S., Suri, S.: Efficient algorithms for k-regret minimizing sets. In: SEA, volume 75 of LIPIcs, pp. 7:1–7:23 (2017)
Asudeh, A., Nazi, A., Zhang, N., Das., G.: Efficient computation of regret-ratio minimizing set: A compact maxima representative. In: SIGMOD Conference, pp. 821–834 (2017)
Asudeh, A., Nazi, A., Zhang, N., Das, G., Jagadish, H.V.: RRR: rank-regret representative. In: SIGMOD Conference, pp. 263–280 (2019)
Basketball positions. https://en.wikipedia.org/wiki/Basketball_positions
Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: An efficient and robust access method for points and rectangles. In: SIGMOD Conference, pp. 322–331 (1990)
Berg, M.d., Cheong, O., Kreveld, M.v., Overmars, M.: Computational geometry: algorithms and applications. Springer-Verlag TELOS, 2008
Bernecker, T., Emrich, T., Kriegel, H., Renz, M., Zankl, S., Züfle, A.: Efficient probabilistic reverse nearest neighbor query processing on uncertain data. PVLDB 4(10), 669–680 (2011)
Bhattacharyya, A.: On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35, 99–109 (1943)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430 (2001)
Chang, Y.-C., Bergman, L., Castelli, V., Li, C.-S., Lo, M.-L., Smith, J.R.: The onion technique: Indexing for linear optimization queries. In: SIGMOD Conference, pp. 391–402 (2000)
Chester, S., Thomo, A., Venkatesh, S., Whitesides, S.: Computing k-regret minimizing sets. PVLDB 7(5), 389–400 (2014)
Church, J., Ware, R.: Industrial Organization: A Strategic Approach. Irwin McGraw Hill, New York (2000)
Ciaccia, P., Martinenghi, D.: Reconciling skyline and ranking queries. PVLDB 10(11), 1454–1465 (2017)
Ciaccia, P., Martinenghi, D.: FA + TA \(<\) fsa: Flexible score aggregation. In: CIKM, pp 57–66 (2018)
Das, G., Gunopulos, D., Koudas, N., Tsirogiannis, D.: Answering top-k queries using views. In: VLDB, pp. 451–462 (2006)
Duch, J., Waitzman, J.S., Amaral, L.A.N.: Quantifying the performance of individual players in a team activity. PLOS ONE 5(6), 1–7 (2010)
Dyer, J.S., Sarin, R.K.: Measurable multi attribute value functions. Oper. Res. 27(4), 810–822 (1979)
Edelsbrunner, H., Seidel, R., Sharir, M.: On the zone theorem for hyperplane arrangements. SIAM J. Comput. 22(2), 418–429 (1993)
Emrich, T., Kriegel, H., Kröger, P., Renz, M., Züfle.: Boosting spatial pruning: on optimal pruning of mbrs. In: SIGMOD Conference, pp. 9–50 (2010)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
Feller, W.: An Introduction to Probability Theory and Its Applications. Wiley, USA (1968)
Fürnkranz, J., Hüllermeier, E. (eds.): Preference Learning. Springer, Cham (2010)
Ge, S., U, L.H., Mamoulis, N., Cheung, D.W.: Efficient all top-\(k\) computation - A unified solution for all top-\(k\), reverse top-\(k\) and top-\(m\) influential queries. IEEE Trans. Knowl. Data Eng., 25(5):1015–1027 (2013)
Gerardi, K., Shapiro, A.: Does competition reduce price dispersion? new evidence from the airline industry. J. Polit. Econ. 117(1), 1–37 (2009)
Godfrey, P., Shipley, R., Gryz, J.: Algorithms and analyses for maximal vector computation. VLDB J. 16(1), 5–28 (2007)
Hausman, J., Leonard, G., Zona, J.D.: Competitive analysis with differentiated products. Anna. Econ. Stat. 34, 159–180 (1994)
Hotel dataset. https://github.com/RyanBalfanz/hotelsbase
House dataset. https://www.ipums.org
Hose, K., Vlachou, A.: A survey of skyline processing in highly distributed environments. VLDB J. 21(3), 359–384 (2012)
Hristidis, V., Koudas, N., Papakonstantinou, Y.: PREFER: A system for the efficient execution of multi-parametric ranked queries. In: SIGMOD Conference, pp. 259–270 (2001)
Hristidis, V., Papakonstantinou, Y.: Algorithms and applications for answering ranked queries using ranked views. VLDB J. 13(1), 49–70 (2004)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comp. Surv. 40(4), 11–58 (2008)
Kalyvas, C., Tzouramanis, T.: A survey of skyline query processing. CoRR, abs/1704.01788 (2017)
Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Trade-Offs. Wiley, USA (1976)
Kolahdouzan, M.R., Shahabi, C.: Voronoi-based K nearest neighbor search for spatial network databases. In: VLDB, pp. 840–851 (2004)
Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. In: SIGMOD Conference, pp. 201–212 (2000)
Kriegel, H., Renz, M., Schubert, M.: Route skyline queries: A multi-preference path planning approach. In: ICDE, pp. 261–272 (2010)
Li, C., Ooi, B.C., Tung, A.K.H., Wang, S.: DADA: a data cube for dominant relationship analysis. In: SIGMOD Conference, pp. 659–670 (2006)
Lin, J.: Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37(1), 145–151 (1991)
Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: The k most representative skyline operator. In: ICDE, pp. 86–95 (2007)
Liu, J., Xiong, L., Zhang, Q., Pei, J., Luo, J.: Eclipse: Generalizing kNN and skyline. In: ICDE, pp. 972–983. IEEE (2021)
Miah, M., Das, G., Hristidis, V., Mannila, H.: Standing out in a crowd: Selecting attributes for maximum visibility. In: ICDE, pp. 356–365 (2008)
Mouratidis, K., Li, K., Tang, B.: Marrying top-k with skyline queries: Relaxing the preference input while producing output of controllable size. In: SIGMOD Conference, pp. 1317–1330 (2021)
Mouratidis, K., Lin, Y., Yiu, M.L.: Preference queries in large multi-cost transportation networks. In: ICDE, pp. 533–544 (2010)
Mouratidis, K., Tang, B.: Exact processing of uncertain top-k queries in multi-criteria settings. PVLDB 11(8), 866–879 (2018)
Mouratidis, K., Zhang, J., Pang, H.: Maximum rank query. PVLDB 8(12), 1554–1565 (2015)
Mulmuley, K.: On levels in arrangements and voronoi diagrams. Discr. Comput. Geometry 6, 307–338 (1991)
Nanongkai, D., Sarma, A.D., Lall, A., Lipton, R.J., Xu, J.J.: Regret-minimizing representative databases. PVLDB 3(1), 1114–1124 (2010)
National Institute of Standards and Technology. NIST/SEMATECH e-handbook of statistical methods; 7.1.6. what are outliers in the data, 2013
NBA dataset. https://www.basketball-reference.com
Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005)
Pearson, K.: Note on Regression and Inheritance in the Case of Two Parents. Proc. R. Soc. Lond. Ser. I(58), 240–242 (1895)
Pearson, R.K.: Univariate outlier detection, chapter 3, pp. 69–91 (2005)
Peng, P., Wong, R.C.: k-hit query: Top-k query with probabilistic utility function. In: SIGMOD Conference, pp. 577–592 (2015)
Qian, L., Gao, J., Jagadish, H.V.: Learning user preferences by adaptive pairwise comparison. PVLDB 8(11), 1322–1333 (2015)
Rahul, S., Tao, Y.: Efficient top-k indexing via general reductions. In: PODS, pp. 277–288. ACM (2016)
Sarma, A.D., Lall, A., Nanongkai, D., Xu, J.J.: Randomized multi-pass streaming skyline algorithms. PVLDB 2(1), 85–96 (2009)
Sharifzadeh, M., Shahabi, C.: The spatial skyline queries. In: VLDB, pp. 751–762 (2006)
Sharifzadeh, M., Shahabi, C.: Vor-tree: R-trees with voronoi diagrams for efficient processing of spatial nearest neighbor queries. PVLDB 3(1), 1231–1242 (2010)
Sheng, C., Tao, Y.: Worst-case i/o-efficient skyline algorithms. ACM Trans. Database Syst. 37(4), 1–22 (2012)
Silberschatz, A., Korth, H.F., Sudarshan, S.: Database System Concepts, 7th edn. McGraw-Hill Book Company, USA (2020)
Soliman, M.A., Ilyas, I.F., Martinenghi, D., Tagliasacchi, M.: Ranking with uncertain scoring functions: semantics and sensitivity measures. In: SIGMOD Conference, pp. 805–816 (2011)
Stanoi, I., Agrawal, D., Abbadi, A.E.: Reverse nearest neighbor queries for dynamic databases. In: SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 44–53 (2000)
Sun, Y., Zhang, R., Xue, A.Y., Qi, J., Du, X.: Reverse nearest neighbor heat maps: A tool for influence exploration. In: ICDE, pp. 966–977 (2016)
Tang, B., Mouratidis, K., Han, M.: On m-impact regions and standing top-k influence problems. In: SIGMOD Conference, pp. 1784–1796 (2021)
Tang, B., Mouratidis, K., Yiu, M.L.: Determining the impact regions of competing options in preference space. In: SIGMOD Conference, pp. 805–820 (2017)
Tang, B., Mouratidis, K., Yiu, M.L., Chen, Z.: Creating top ranking options in the continuous option and preference space. PVLDB 12(10), 1181–1194 (2019)
Tao, Y., Ding, L., Lin, X., Pei, J.: Distance-based representative skyline. In: ICDE, pp. 892–903 (2009)
Tao, Y., Hristidis, V., Papadias, D., Papakonstantinou, Y.: Branch-and-bound processing of ranked queries. Inf. Syst. 32(3), 424–445 (2007)
Tao, Y., Papadias, D., Lian, X., Xiao, X.: Multidimensional reverse k NN search. VLDB J. 16(3), 293–316 (2007)
Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., Srivastava, D.: Ranked join indices. In: ICDE, pp. 277–288 (2003)
TripAdvisor dataset. https://www.cs.virginia.edu/~hw5x/dataset.html
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-k queries. In: ICDE, pp. 365–376 (2010)
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Monochromatic and bichromatic reverse top-k queries. IEEE Trans. Knowl. Data Eng. 23(8), 1215–1229 (2011)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis, Y.: Identifying the most influential data objects with reverse top-k queries. PVLDB 3(1), 364–372 (2010)
Wan, Q., Wong, R.C., Ilyas, I.F., Özsu, M.T., Peng, Y.: Creating competitive products. PVLDB 2(1), 898–909 (2009)
Wang, H., Lu, Y., Zhai, C.: Latent aspect rating analysis on review text data: a rating regression approach. In: KDD, pp. 783–792 (2010)
Wong, R.C., Özsu, M.T., Yu, P.S., Fu, A.W., Liu, L.: Efficient method for maximizing bichromatic reverse nearest neighbor. PVLDB 2(1), 1126–1137 (2009)
Wu, Y., Gao, J., Agarwal, P.K., Yang, J.: Finding diverse, high-value representatives on a surface of answers. PVLDB 10(7), 793–804 (2017)
Xia, T., Zhang, D., Tao, Y.: On skylining with flexible dominance relation. In: ICDE, pp. 1397–1399 (2008)
Xie, M., Wong, R.C., Lall, A.: An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J. 29(1), 147–175 (2020)
Xie, M., Wong, R.C., Li, J., Long, C., Lall, A.: Efficient k-regret query algorithm with restriction-free bound for any dimensionality. In: SIGMOD Conference, pp. 959–974 (2018)
Xin, D., Chen, C., Han, J.: Towards robust indexing for ranked queries. In: VLDB, pp. 235–246, (2006)
Yang, G., Cai, Y.: Querying improvement strategies. In: EDBT, pp. 294–305 (2017)
Yang, J., Zhang, Y., Zhang, W., Lin, X.: Influence based cost optimization on user preference. In: ICDE, pp. 709–720 (2016)
Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: SLICE: reviving regions-based pruning for reverse k nearest neighbors queries. In: ICDE, pp. 760–771 (2014)
Yiu, M.L., Mamoulis, N.: Multi-dimensional top-k dominating queries. VLDB J. 18(3), 695–718 (2009)
Yu, A., Agarwal, P.K., Yang, J.: Top-k preferences in high dimensions. IEEE Trans. Knowl. Data Eng. 28(2), 311–325 (2016)
Zhang, P., Cheng, R., Mamoulis, N., Renz, M., Züfle, A., Tang, Y., Emrich, T.: Voronoi-based nearest neighbor search for multi-dimensional uncertain databases. In: ICDE, pp. 158–169, (2013)
Zhang, Z., Jin, C., Kang, Q.: Reverse k-ranks query. PVLDB 7(10), 785–796 (2014)
Acknowledgements
This work was supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (Award No. MOE-T2EP20121-0002). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Ministry of Education, Singapore. This work was partially supported by Shenzhen Fundamental Research Program (Grant No. 20220815112848002).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mouratidis, K., Li, K. & Tang, B. Quantifying the competitiveness of a dataset in relation to general preferences. The VLDB Journal 33, 231–250 (2024). https://doi.org/10.1007/s00778-023-00804-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-023-00804-1