Nothing Special   »   [go: up one dir, main page]

Skip to main content

Abstract

Clustering is an unsupervised learning task that aims to partition data into a set of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts, playlists, TV channels) whose benefit can only be attained by groups when they reach a minimum level of representation (e.g., 50% to elect their desired candidate). In this paper, we study the k-means clustering problem with the additional constraint that each group (e.g., demographic group) must have a minumum level of representation in at least a given number of clusters. We formulate the problem through a mixed-integer optimization framework and present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints. While incorporating the fairness criteria leads to an NP-Hard assignment problem within the algorithm, we present computational approaches that make the algorithm practical even for large datasets. Numerical results show that this approach can produce fair clusters with practically no increase in the clustering cost across standard benchmark datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 109.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 139.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Abbasi, M., Bhaskara, A., Venkatasubramanian, S.: Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 504–514. ACM, New York (2021)

    Google Scholar 

  2. Ahmadian, S., et al.: Fair hierarchical clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 21050–21060 (2020)

    Google Scholar 

  3. Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 267–275. ACM, New York (2019)

    Google Scholar 

  4. Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: International Conference on Machine Learning, pp. 405–413. PMLR, Long Beach (2019)

    Google Scholar 

  5. Bandyapadhyay, S., Fomin, F.V., Simonov, K.: On coresets for fair clustering in metric and Euclidean spaces and their applications (2020). arXiv:2007.10137

  6. Bera, S., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., Vancouver (2019). https://proceedings.neurips.cc/paper/2019/file/fc192b0c0d270dbf41870a63a8c76c2f-Paper.pdf

  7. Bercea, I.O., et al.: On the cost of essentially fair clusterings (2018). arXiv:1811.10319

  8. Böhm, M., Fazzone, A., Leonardi, S., Schwiegelshohn, C.: Fair clustering with multiple colors (2020). arXiv:2002.07892

  9. Brownwell, C.: CRTC relaxes quotas on Canadian content for tv broadcasters (2015). https://financialpost.com/technology/crtc-relaxes-quotas-on-canadian-content-for-tv-broadcasters

  10. Buolamwini, J., Gebru, T.: Gender shades: intersectional accuracy disparities in commercial gender classification. In: Conference on Fairness, Accountability and Transparency, pp. 77–91. PMLR, New York City (2018)

    Google Scholar 

  11. Chhabra, A., Vashishth, V., Mohapatra, P.: Fair algorithms for hierarchical agglomerative clustering (2020). arXiv:2005.03197

  12. Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., Long Beach (2017). https://proceedings.neurips.cc/paper/2017/file/978fce5bcc4eccc88ad48ce3914124a2-Paper.pdf

  13. Chiplunkar, A., Kale, S., Ramamoorthy, S.N.: How to solve fair k-center in massive data models. In: International Conference on Machine Learning, pp. 1877–1886. PMLR, Remote (2020)

    Google Scholar 

  14. Daudpota, S.M., Muhammad, A., Baber, J.: Video genre identification using clustering-based shot detection algorithm. Signal Image Video Process. 13(7), 1413–1420 (2019)

    Article  Google Scholar 

  15. Dua, D., Graff, C., et al.: UCI machine learning repository (2017)

    Google Scholar 

  16. Esmaeili, S., Brubach, B., Srinivasan, A., Dickerson, J.: Fair clustering under a bounded cost. In: Advances in Neural Information Processing Systems, vol. 34, pp. 14345–14357 (2021)

    Google Scholar 

  17. Esmaeili, S., Brubach, B., Tsepenekas, L., Dickerson, J.: Probabilistic fair clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12743–12755 (2020)

    Google Scholar 

  18. Esmaeili, S.A., Duppala, S., Dickerson, J.P., Brubach, B.: Fair labeled clustering. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 327–335 (2022)

    Google Scholar 

  19. Ghadiri, M., Samadi, S., Vempala, S.: Socially fair k-means clustering. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 438–448. ACM, Online (2021)

    Google Scholar 

  20. Goyal, D., Jaiswal, R.: Tight FPT approximation for socially fair clustering (2021). arXiv:2106.06755

  21. Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2022). https://www.gurobi.com

  22. Harb, E., Lam, H.S.: Kfc: A scalable approximation algorithm for \( k \)-center fair clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 14509–14519 (2020)

    Google Scholar 

  23. Huang, L., Jiang, S., Vishnoi, N.: Coresets for clustering with fairness constraints. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

    Google Scholar 

  24. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. (CSUR) 31(3), 264–323 (1999)

    Article  Google Scholar 

  25. Jia, X., Sheth, K., Svensson, O.: Fair colorful k-center clustering. In: Bienstock, D., Zambelli, G. (eds.) IPCO 2020. LNCS, vol. 12125, pp. 209–222. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45771-6_17

    Chapter  Google Scholar 

  26. Jones, M., Nguyen, H., Nguyen, T.: Fair k-centers via maximum matching. In: International Conference on Machine Learning, pp. 4940–4949. PMLR, Virtual (2020)

    Google Scholar 

  27. Kansal, T., Bahuguna, S., Singh, V., Choudhury, T.: Customer segmentation using k-means clustering. In: 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), pp. 135–139. IEEE, Belgaum (2018)

    Google Scholar 

  28. Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization. In: International Conference on Machine Learning, pp. 3448–3457. PMLR, Long Beach (2019)

    Google Scholar 

  29. Kleindessner, M., Samadi, S., Awasthi, P., Morgenstern, J.: Guarantees for spectral clustering with fairness constraints. In: International Conference on Machine Learning, pp. 3458–3467. PMLR, Long Beach (2019)

    Google Scholar 

  30. Le Quy, T., Roy, A., Friege, G., Ntoutsi, E.: Fair-capacitated clustering. In: EDM. EDM, Paris (2021)

    Google Scholar 

  31. Liu, S., Vicente, L.N.: A stochastic alternating balance \( k \)-means algorithm for fair clustering (2021). arXiv:2105.14172

  32. Makarychev, Y., Vakilian, A.: Approximation algorithms for socially fair clustering. In: Conference on Learning Theory, pp. 3246–3264. PMLR, Boulder (2021)

    Google Scholar 

  33. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning (2019). arXiv:1908.09635

  34. NCSBE: North Carolina voter registration data (2020). https://www.ncsbe.gov/results-data/voter-registration-data

  35. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  Google Scholar 

  36. Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair k-means clustering (2018). arXiv:1812.10854

  37. Thejaswi, S., Ordozgoiti, B., Gionis, A.: Diversity-aware \( k \)-median: clustering with fair center representation (2021). arXiv:2106.11696

  38. Wang, Y., et al.: Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records. J. Biomed. Inform. 102, 103364 (2020)

    Article  Google Scholar 

  39. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

    Article  Google Scholar 

  40. Ziko, I.M., Yuan, J., Granger, E., Ayed, I.B.: Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11202–11209. AAAI, Virtual (2021)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Connor Lawless .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Lawless, C., Günlük, O. (2024). Fair Minimum Representation Clustering. In: Dilkina, B. (eds) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2024. Lecture Notes in Computer Science, vol 14743. Springer, Cham. https://doi.org/10.1007/978-3-031-60599-4_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-60599-4_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60601-4

  • Online ISBN: 978-3-031-60599-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics