Abstract
We study fair clustering problems as proposed by Chierichetti et al. [CKLV17]. Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which can be used to significantly reduce the input data size. We prove that the coresets are composable [IMMM14] and show how to compute them in a streaming setting. This yields a streaming PTAS for fair k-means in the case of two colors (and exact balances). Furthermore, we extend techniques due to Chierichetti et al. [CKLV17] to obtain an approximation algorithm for k-means, which leads to a constant factor algorithm in the streaming model when combined with the coreset.
This work was done while the first author was at University of Bonn and the third author was at TU Dortmund. The second author acknowledges support by the ERC Advanced Grant 788893 AMDROMA.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Awasthi, P., Charikar, M., Krishnaswamy, R., Sinop, A.K.: The hardness of approximation of Euclidean k-means. In: 31st International Symposium on Computational Geometry (SoCG), pp. 754–767 (2015)
Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: Proceedings of the 22nd Neural Information Processing Systems (NIPS), pp. 10–18 (2009)
Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine bias: there’s software used across the country to predict future criminals and it’s biased against blacks. ProPublica, May 2016
Ackermann, M.R., Märtens, M., Raupach, C., Swierkot, K., Lammersen, C., Sohler, C.: StreamKM++: a clustering algorithm for data streams. ACM J. Exper. Algorithmics 17, 1–30 (2012). Article no. 2.4
Ahmadian, S., Norouzi-Fard, A., Svensson, O., Ward, J.: Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In: Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 61–72 (2017)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035 (2007)
Braverman, V., Feldman, D., Lang, H.: New frameworks for offline and streaming coreset constructions, arXiv preprint arXiv:1612.00889 (2016)
Bercea, I.O., et al.: On the cost of essentially fair clusterings, CoRR abs/1811.10319 (2018)
Berk, R., Heidari, H., Jabbari, S., Kearns, M., Roth, A.: Fairness in criminal justice risk assessments: the state of the art, arXiv:1703.09207 (2017)
Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering, CoRR abs/1902.03519 (2019)
Bhattacharya, A., Jaiswal, R., Kumar, A.: Faster algorithms for the constrained k-means problem. Theory Comput. Syst. 62(1), 93–115 (2018)
Boutsidis, C., Zouzias, A., Drineas, P.: Random projections for \(k\)-means clustering.. In: Proceedings of the 24th Neural Information Processing Systems (NIPS), pp. 298–306 (2010)
Boutsidis, C., Zouzias, A., Mahoney, M.W., Drineas, P.: Randomized dimensionality reduction for k-means clustering. IEEE Trans. Inf. Theory 61(2), 1045–1062 (2015)
Cohen, M.B., Elder, S., Musco, C., Musco, C., Persu, M.: Dimensionality reduction for k-means clustering and low rank approximation. In: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC), pp. 163–172 (2015)
Chouldechova, A.: Fair prediction with disparate impact: a study of bias in recidivism prediction instruments, arXiv: 1510.07524v1 (2016)
Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), pp. 5036–5044 (2017)
Cohen-Addad, V., Klein, P.N., Mathieu, C.: Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 353–364 (2016)
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2017)
Drineas, P., Frieze, A.M., Kannan, R., Vempala, S., Vinay, V.: Clustering large graphs via the singular value decomposition. Mach. Learn. 56(1–3), 9–33 (2004)
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: ITCS 2012, pp. 214–226 (2012)
Ding, H., Xu, J.: A unified framework for clustering constrained data without locality property. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, 4–6 January 2015, pp. 1471–1490 (2015)
Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 259–268 (2015)
Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO: BIRCH meets coresets for k-means clustering. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40450-4_41
Feldman, D., Langberg, M.: A unified framework for approximating and clustering data. In: Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC), pp. 569–578 (2011)
Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd Symposium on Computational Geometry, pp. 11–18 (2007)
Friggstad, Z., Rezapour, M., Salavatipour, M.R.: Local search yields a PTAS for k-means in doubling metrics. In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 365–374 (2016)
Frahling, G., Sohler, C.: Coresets in dynamic geometric data streams. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pp. 209–217 (2005)
Frahling, G., Sohler, C.: A fast k-means implementation using coresets. Int. J. Comput. Geom. Appl. 18, 605–625 (2008)
Feldman, D., Schmidt, M., Sohler, C.: Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1434–1453 (2013)
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pp. 291–300 (2004)
Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS) (2016)
Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proceedings of the Tenth Annual Symposium on Computational Geometry (SoCG), pp. 332–339 (1994)
Indyk, P., Mahabadi, S., Mahdian, M., Mirrokni, V.S.: Composable core-sets for diversity and coverage maximization. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pp. 100–108 (2014)
Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization, CoRR abs/1901.08628 (2019)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)
Kleinberg, J.M., Mullainathan, S., Raghavan, M.: Inherent trade-offs in the fair determination of risk scores. In: 8th Innovations in Theoretical Computer Science Conference (ITCS), pp. 43:1–43:23 (2017)
Kumar, A., Sabharwal, Y., Sen, S.: Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57(2), 51–532 (2010)
Lloyd, S.P.: Least squares quantization in PCM. Bell Laboratories Technical Memorandum (1957). Later published as [Llo82]
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Langberg, M., Schulman, L.J.: Universal epsilon-approximators for integrals. In: Proceedings of the 21st ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 598–607 (2010)
Lee, E., Schmidt, M., Wright, J.: Improved and simplified inapproximability for k-means. Inf. Process. Lett. 120, 40–43 (2017)
Munteanu, A., Schwiegelshohn, C.: Coresets-methods and history: a theoreticians design pattern for approximation and streaming algorithms. KI 32(1), 37–53 (2018)
Munoz, C., Smith, M., Patil, D.: Big data: a report on algorithmic systems, opportunity, and civil rights. Executive Office of the President. The White House (2016)
Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP) (2018)
Schmidt, M.: Coresets and streaming algorithms for the k-means problem and related clustering objectives. Ph. D. thesis, Universität Dortmund (2014)
Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair k-means clustering, CoRR abs/1812.10854 (2018)
Thanh, B.L., Ruggieri, S., Turini, F.: K-NN as an implementation of situation testing for discrimination discovery and prevention. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 502–510 (2011)
Zafar, M.B., Valera, I., Rodriguez, M.G., Gummadi, K.P.: Fairness beyond disparate treatment & disparate impact: learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web (WWW), pp. 1171–1180 (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Schmidt, M., Schwiegelshohn, C., Sohler, C. (2020). Fair Coresets and Streaming Algorithms for Fair k-means. In: Bampis, E., Megow, N. (eds) Approximation and Online Algorithms. WAOA 2019. Lecture Notes in Computer Science(), vol 11926. Springer, Cham. https://doi.org/10.1007/978-3-030-39479-0_16
Download citation
DOI: https://doi.org/10.1007/978-3-030-39479-0_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39478-3
Online ISBN: 978-3-030-39479-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)