Fair Coresets and Streaming Algorithms for Fair k-means

Melanie Schmidt¹⁰,
Chris Schwiegelshohn¹¹ &
Christian Sohler¹⁰

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11926))

Included in the following conference series:

International Workshop on Approximation and Online Algorithms

1027 Accesses

Abstract

We study fair clustering problems as proposed by Chierichetti et al. [CKLV17]. Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which can be used to significantly reduce the input data size. We prove that the coresets are composable [IMMM14] and show how to compute them in a streaming setting. This yields a streaming PTAS for fair k-means in the case of two colors (and exact balances). Furthermore, we extend techniques due to Chierichetti et al. [CKLV17] to obtain an approximation algorithm for k-means, which leads to a constant factor algorithm in the streaming model when combined with the coreset.

This work was done while the first author was at University of Bonn and the third author was at TU Dortmund. The second author acknowledges support by the ERC Advanced Grant 788893 AMDROMA.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 49.99; Price excludes VAT (USA)

Softcover Book: USD 64.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Distributed Fair k-Center Clustering Problems with Outliers

A technique for obtaining true approximations for k-center with covering constraints

Article Open access 05 April 2021

A Technique for Obtaining True Approximations for k-Center with Covering Constraints

References

Awasthi, P., Charikar, M., Krishnaswamy, R., Sinop, A.K.: The hardness of approximation of Euclidean k-means. In: 31st International Symposium on Computational Geometry (SoCG), pp. 754–767 (2015)
Google Scholar
Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: Proceedings of the 22nd Neural Information Processing Systems (NIPS), pp. 10–18 (2009)
Google Scholar
Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine bias: there’s software used across the country to predict future criminals and it’s biased against blacks. ProPublica, May 2016
Google Scholar
Ackermann, M.R., Märtens, M., Raupach, C., Swierkot, K., Lammersen, C., Sohler, C.: StreamKM++: a clustering algorithm for data streams. ACM J. Exper. Algorithmics 17, 1–30 (2012). Article no. 2.4
MathSciNet MATH Google Scholar
Ahmadian, S., Norouzi-Fard, A., Svensson, O., Ward, J.: Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In: Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 61–72 (2017)
Google Scholar
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035 (2007)
Google Scholar
Braverman, V., Feldman, D., Lang, H.: New frameworks for offline and streaming coreset constructions, arXiv preprint arXiv:1612.00889 (2016)
Bercea, I.O., et al.: On the cost of essentially fair clusterings, CoRR abs/1811.10319 (2018)
Google Scholar
Berk, R., Heidari, H., Jabbari, S., Kearns, M., Roth, A.: Fairness in criminal justice risk assessments: the state of the art, arXiv:1703.09207 (2017)
Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering, CoRR abs/1902.03519 (2019)
Google Scholar
Bhattacharya, A., Jaiswal, R., Kumar, A.: Faster algorithms for the constrained k-means problem. Theory Comput. Syst. 62(1), 93–115 (2018)
Article MathSciNet Google Scholar
Boutsidis, C., Zouzias, A., Drineas, P.: Random projections for $k$-means clustering.. In: Proceedings of the 24th Neural Information Processing Systems (NIPS), pp. 298–306 (2010)
Google Scholar
Boutsidis, C., Zouzias, A., Mahoney, M.W., Drineas, P.: Randomized dimensionality reduction for k-means clustering. IEEE Trans. Inf. Theory 61(2), 1045–1062 (2015)
Article MathSciNet Google Scholar
Cohen, M.B., Elder, S., Musco, C., Musco, C., Persu, M.: Dimensionality reduction for k-means clustering and low rank approximation. In: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC), pp. 163–172 (2015)
Google Scholar
Chouldechova, A.: Fair prediction with disparate impact: a study of bias in recidivism prediction instruments, arXiv: 1510.07524v1 (2016)
Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), pp. 5036–5044 (2017)
Google Scholar
Cohen-Addad, V., Klein, P.N., Mathieu, C.: Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 353–364 (2016)
Google Scholar
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2017)
Google Scholar
Drineas, P., Frieze, A.M., Kannan, R., Vempala, S., Vinay, V.: Clustering large graphs via the singular value decomposition. Mach. Learn. 56(1–3), 9–33 (2004)
Article Google Scholar
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: ITCS 2012, pp. 214–226 (2012)
Google Scholar
Ding, H., Xu, J.: A unified framework for clustering constrained data without locality property. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, 4–6 January 2015, pp. 1471–1490 (2015)
Google Scholar
Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 259–268 (2015)
Google Scholar
Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO: BIRCH meets coresets for k-means clustering. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40450-4_41
Chapter Google Scholar
Feldman, D., Langberg, M.: A unified framework for approximating and clustering data. In: Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC), pp. 569–578 (2011)
Google Scholar
Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd Symposium on Computational Geometry, pp. 11–18 (2007)
Google Scholar
Friggstad, Z., Rezapour, M., Salavatipour, M.R.: Local search yields a PTAS for k-means in doubling metrics. In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 365–374 (2016)
Google Scholar
Frahling, G., Sohler, C.: Coresets in dynamic geometric data streams. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pp. 209–217 (2005)
Google Scholar
Frahling, G., Sohler, C.: A fast k-means implementation using coresets. Int. J. Comput. Geom. Appl. 18, 605–625 (2008)
Article MathSciNet Google Scholar
Feldman, D., Schmidt, M., Sohler, C.: Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1434–1453 (2013)
Google Scholar
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pp. 291–300 (2004)
Google Scholar
Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS) (2016)
Google Scholar
Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proceedings of the Tenth Annual Symposium on Computational Geometry (SoCG), pp. 332–339 (1994)
Google Scholar
Indyk, P., Mahabadi, S., Mahdian, M., Mirrokni, V.S.: Composable core-sets for diversity and coverage maximization. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pp. 100–108 (2014)
Google Scholar
Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization, CoRR abs/1901.08628 (2019)
Google Scholar
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)
Article Google Scholar
Kleinberg, J.M., Mullainathan, S., Raghavan, M.: Inherent trade-offs in the fair determination of risk scores. In: 8th Innovations in Theoretical Computer Science Conference (ITCS), pp. 43:1–43:23 (2017)
Google Scholar
Kumar, A., Sabharwal, Y., Sen, S.: Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57(2), 51–532 (2010)
Article MathSciNet Google Scholar
Lloyd, S.P.: Least squares quantization in PCM. Bell Laboratories Technical Memorandum (1957). Later published as [Llo82]
Google Scholar
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Article MathSciNet Google Scholar
Langberg, M., Schulman, L.J.: Universal epsilon-approximators for integrals. In: Proceedings of the 21st ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 598–607 (2010)
Google Scholar
Lee, E., Schmidt, M., Wright, J.: Improved and simplified inapproximability for k-means. Inf. Process. Lett. 120, 40–43 (2017)
Article MathSciNet Google Scholar
Munteanu, A., Schwiegelshohn, C.: Coresets-methods and history: a theoreticians design pattern for approximation and streaming algorithms. KI 32(1), 37–53 (2018)
Google Scholar
Munoz, C., Smith, M., Patil, D.: Big data: a report on algorithmic systems, opportunity, and civil rights. Executive Office of the President. The White House (2016)
Google Scholar
Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP) (2018)
Google Scholar
Schmidt, M.: Coresets and streaming algorithms for the k-means problem and related clustering objectives. Ph. D. thesis, Universität Dortmund (2014)
Google Scholar
Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair k-means clustering, CoRR abs/1812.10854 (2018)
Google Scholar
Thanh, B.L., Ruggieri, S., Turini, F.: K-NN as an implementation of situation testing for discrimination discovery and prevention. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 502–510 (2011)
Google Scholar
Zafar, M.B., Valera, I., Rodriguez, M.G., Gummadi, K.P.: Fairness beyond disparate treatment & disparate impact: learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web (WWW), pp. 1171–1180 (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Cologne, 50923, Köln, Germany
Melanie Schmidt & Christian Sohler
Department of Computer, Control and Management Engineering Antonio Ruberti, Sapienza University of Rome, 00185, Roma, RM, Italy
Chris Schwiegelshohn

Authors

Melanie Schmidt
View author publications
You can also search for this author in PubMed Google Scholar
Chris Schwiegelshohn
View author publications
You can also search for this author in PubMed Google Scholar
Christian Sohler
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Melanie Schmidt .

Editor information

Editors and Affiliations

LIP6, Sorbonne University, Paris, France
Evripidis Bampis
Department for Mathematics and Computer Science, University of Bremen, Bremen, Germany
Nicole Megow

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Schmidt, M., Schwiegelshohn, C., Sohler, C. (2020). Fair Coresets and Streaming Algorithms for Fair k-means. In: Bampis, E., Megow, N. (eds) Approximation and Online Algorithms. WAOA 2019. Lecture Notes in Computer Science(), vol 11926. Springer, Cham. https://doi.org/10.1007/978-3-030-39479-0_16

Download citation

DOI: https://doi.org/10.1007/978-3-030-39479-0_16
Published: 25 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39478-3
Online ISBN: 978-3-030-39479-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Fair Coresets and Streaming Algorithms for Fair k-means

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Distributed Fair k-Center Clustering Problems with Outliers

A technique for obtaining true approximations for k-center with covering constraints

A Technique for Obtaining True Approximations for k-Center with Covering Constraints

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Fair Coresets and Streaming Algorithms for Fair k-means

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Distributed Fair k-Center Clustering Problems with Outliers

A technique for obtaining true approximations for k-center with covering constraints

A Technique for Obtaining True Approximations for k-Center with Covering Constraints

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation