Nothing Special   »   [go: up one dir, main page]

Skip to main content

Fair Coresets and Streaming Algorithms for Fair k-means

  • Conference paper
  • First Online:
Approximation and Online Algorithms (WAOA 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11926))

Included in the following conference series:

  • 1027 Accesses

Abstract

We study fair clustering problems as proposed by Chierichetti et al. [CKLV17]. Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which can be used to significantly reduce the input data size. We prove that the coresets are composable [IMMM14] and show how to compute them in a streaming setting. This yields a streaming PTAS for fair k-means in the case of two colors (and exact balances). Furthermore, we extend techniques due to Chierichetti et al. [CKLV17] to obtain an approximation algorithm for k-means, which leads to a constant factor algorithm in the streaming model when combined with the coreset.

This work was done while the first author was at University of Bonn and the third author was at TU Dortmund. The second author acknowledges support by the ERC Advanced Grant 788893 AMDROMA.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 49.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 64.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Awasthi, P., Charikar, M., Krishnaswamy, R., Sinop, A.K.: The hardness of approximation of Euclidean k-means. In: 31st International Symposium on Computational Geometry (SoCG), pp. 754–767 (2015)

    Google Scholar 

  2. Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: Proceedings of the 22nd Neural Information Processing Systems (NIPS), pp. 10–18 (2009)

    Google Scholar 

  3. Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine bias: there’s software used across the country to predict future criminals and it’s biased against blacks. ProPublica, May 2016

    Google Scholar 

  4. Ackermann, M.R., Märtens, M., Raupach, C., Swierkot, K., Lammersen, C., Sohler, C.: StreamKM++: a clustering algorithm for data streams. ACM J. Exper. Algorithmics 17, 1–30 (2012). Article no. 2.4

    MathSciNet  MATH  Google Scholar 

  5. Ahmadian, S., Norouzi-Fard, A., Svensson, O., Ward, J.: Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In: Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 61–72 (2017)

    Google Scholar 

  6. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035 (2007)

    Google Scholar 

  7. Braverman, V., Feldman, D., Lang, H.: New frameworks for offline and streaming coreset constructions, arXiv preprint arXiv:1612.00889 (2016)

  8. Bercea, I.O., et al.: On the cost of essentially fair clusterings, CoRR abs/1811.10319 (2018)

    Google Scholar 

  9. Berk, R., Heidari, H., Jabbari, S., Kearns, M., Roth, A.: Fairness in criminal justice risk assessments: the state of the art, arXiv:1703.09207 (2017)

  10. Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering, CoRR abs/1902.03519 (2019)

    Google Scholar 

  11. Bhattacharya, A., Jaiswal, R., Kumar, A.: Faster algorithms for the constrained k-means problem. Theory Comput. Syst. 62(1), 93–115 (2018)

    Article  MathSciNet  Google Scholar 

  12. Boutsidis, C., Zouzias, A., Drineas, P.: Random projections for \(k\)-means clustering.. In: Proceedings of the 24th Neural Information Processing Systems (NIPS), pp. 298–306 (2010)

    Google Scholar 

  13. Boutsidis, C., Zouzias, A., Mahoney, M.W., Drineas, P.: Randomized dimensionality reduction for k-means clustering. IEEE Trans. Inf. Theory 61(2), 1045–1062 (2015)

    Article  MathSciNet  Google Scholar 

  14. Cohen, M.B., Elder, S., Musco, C., Musco, C., Persu, M.: Dimensionality reduction for k-means clustering and low rank approximation. In: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC), pp. 163–172 (2015)

    Google Scholar 

  15. Chouldechova, A.: Fair prediction with disparate impact: a study of bias in recidivism prediction instruments, arXiv: 1510.07524v1 (2016)

  16. Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), pp. 5036–5044 (2017)

    Google Scholar 

  17. Cohen-Addad, V., Klein, P.N., Mathieu, C.: Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 353–364 (2016)

    Google Scholar 

  18. Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2017)

    Google Scholar 

  19. Drineas, P., Frieze, A.M., Kannan, R., Vempala, S., Vinay, V.: Clustering large graphs via the singular value decomposition. Mach. Learn. 56(1–3), 9–33 (2004)

    Article  Google Scholar 

  20. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: ITCS 2012, pp. 214–226 (2012)

    Google Scholar 

  21. Ding, H., Xu, J.: A unified framework for clustering constrained data without locality property. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, 4–6 January 2015, pp. 1471–1490 (2015)

    Google Scholar 

  22. Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 259–268 (2015)

    Google Scholar 

  23. Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO: BIRCH meets coresets for k-means clustering. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40450-4_41

    Chapter  Google Scholar 

  24. Feldman, D., Langberg, M.: A unified framework for approximating and clustering data. In: Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC), pp. 569–578 (2011)

    Google Scholar 

  25. Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd Symposium on Computational Geometry, pp. 11–18 (2007)

    Google Scholar 

  26. Friggstad, Z., Rezapour, M., Salavatipour, M.R.: Local search yields a PTAS for k-means in doubling metrics. In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 365–374 (2016)

    Google Scholar 

  27. Frahling, G., Sohler, C.: Coresets in dynamic geometric data streams. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pp. 209–217 (2005)

    Google Scholar 

  28. Frahling, G., Sohler, C.: A fast k-means implementation using coresets. Int. J. Comput. Geom. Appl. 18, 605–625 (2008)

    Article  MathSciNet  Google Scholar 

  29. Feldman, D., Schmidt, M., Sohler, C.: Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1434–1453 (2013)

    Google Scholar 

  30. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pp. 291–300 (2004)

    Google Scholar 

  31. Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS) (2016)

    Google Scholar 

  32. Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proceedings of the Tenth Annual Symposium on Computational Geometry (SoCG), pp. 332–339 (1994)

    Google Scholar 

  33. Indyk, P., Mahabadi, S., Mahdian, M., Mirrokni, V.S.: Composable core-sets for diversity and coverage maximization. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pp. 100–108 (2014)

    Google Scholar 

  34. Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization, CoRR abs/1901.08628 (2019)

    Google Scholar 

  35. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)

    Article  Google Scholar 

  36. Kleinberg, J.M., Mullainathan, S., Raghavan, M.: Inherent trade-offs in the fair determination of risk scores. In: 8th Innovations in Theoretical Computer Science Conference (ITCS), pp. 43:1–43:23 (2017)

    Google Scholar 

  37. Kumar, A., Sabharwal, Y., Sen, S.: Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57(2), 51–532 (2010)

    Article  MathSciNet  Google Scholar 

  38. Lloyd, S.P.: Least squares quantization in PCM. Bell Laboratories Technical Memorandum (1957). Later published as [Llo82]

    Google Scholar 

  39. Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)

    Article  MathSciNet  Google Scholar 

  40. Langberg, M., Schulman, L.J.: Universal epsilon-approximators for integrals. In: Proceedings of the 21st ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 598–607 (2010)

    Google Scholar 

  41. Lee, E., Schmidt, M., Wright, J.: Improved and simplified inapproximability for k-means. Inf. Process. Lett. 120, 40–43 (2017)

    Article  MathSciNet  Google Scholar 

  42. Munteanu, A., Schwiegelshohn, C.: Coresets-methods and history: a theoreticians design pattern for approximation and streaming algorithms. KI 32(1), 37–53 (2018)

    Google Scholar 

  43. Munoz, C., Smith, M., Patil, D.: Big data: a report on algorithmic systems, opportunity, and civil rights. Executive Office of the President. The White House (2016)

    Google Scholar 

  44. Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP) (2018)

    Google Scholar 

  45. Schmidt, M.: Coresets and streaming algorithms for the k-means problem and related clustering objectives. Ph. D. thesis, Universität Dortmund (2014)

    Google Scholar 

  46. Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair k-means clustering, CoRR abs/1812.10854 (2018)

    Google Scholar 

  47. Thanh, B.L., Ruggieri, S., Turini, F.: K-NN as an implementation of situation testing for discrimination discovery and prevention. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 502–510 (2011)

    Google Scholar 

  48. Zafar, M.B., Valera, I., Rodriguez, M.G., Gummadi, K.P.: Fairness beyond disparate treatment & disparate impact: learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web (WWW), pp. 1171–1180 (2017)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Melanie Schmidt .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Schmidt, M., Schwiegelshohn, C., Sohler, C. (2020). Fair Coresets and Streaming Algorithms for Fair k-means. In: Bampis, E., Megow, N. (eds) Approximation and Online Algorithms. WAOA 2019. Lecture Notes in Computer Science(), vol 11926. Springer, Cham. https://doi.org/10.1007/978-3-030-39479-0_16

Download citation

Publish with us

Policies and ethics