Fair Minimum Representation Clustering

Connor Lawless²⁵ &
Oktay Günlük²⁵

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14743))

Included in the following conference series:

International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research

381 Accesses

Abstract

Clustering is an unsupervised learning task that aims to partition data into a set of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts, playlists, TV channels) whose benefit can only be attained by groups when they reach a minimum level of representation (e.g., 50% to elect their desired candidate). In this paper, we study the k-means clustering problem with the additional constraint that each group (e.g., demographic group) must have a minumum level of representation in at least a given number of clusters. We formulate the problem through a mixed-integer optimization framework and present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints. While incorporating the fairness criteria leads to an NP-Hard assignment problem within the algorithm, we present computational approaches that make the algorithm practical even for large datasets. Numerical results show that this approach can produce fair clusters with practically no increase in the clustering cost across standard benchmark datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 109.00; Price excludes VAT (USA)

Softcover Book: USD 139.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Group and Individual Fairness in Clustering Algorithms

Exploring Rawlsian Fairness for K-Means Clustering

A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering

References

Abbasi, M., Bhaskara, A., Venkatasubramanian, S.: Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 504–514. ACM, New York (2021)
Google Scholar
Ahmadian, S., et al.: Fair hierarchical clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 21050–21060 (2020)
Google Scholar
Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 267–275. ACM, New York (2019)
Google Scholar
Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: International Conference on Machine Learning, pp. 405–413. PMLR, Long Beach (2019)
Google Scholar
Bandyapadhyay, S., Fomin, F.V., Simonov, K.: On coresets for fair clustering in metric and Euclidean spaces and their applications (2020). arXiv:2007.10137
Bera, S., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., Vancouver (2019). https://proceedings.neurips.cc/paper/2019/file/fc192b0c0d270dbf41870a63a8c76c2f-Paper.pdf
Bercea, I.O., et al.: On the cost of essentially fair clusterings (2018). arXiv:1811.10319
Böhm, M., Fazzone, A., Leonardi, S., Schwiegelshohn, C.: Fair clustering with multiple colors (2020). arXiv:2002.07892
Brownwell, C.: CRTC relaxes quotas on Canadian content for tv broadcasters (2015). https://financialpost.com/technology/crtc-relaxes-quotas-on-canadian-content-for-tv-broadcasters
Buolamwini, J., Gebru, T.: Gender shades: intersectional accuracy disparities in commercial gender classification. In: Conference on Fairness, Accountability and Transparency, pp. 77–91. PMLR, New York City (2018)
Google Scholar
Chhabra, A., Vashishth, V., Mohapatra, P.: Fair algorithms for hierarchical agglomerative clustering (2020). arXiv:2005.03197
Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., Long Beach (2017). https://proceedings.neurips.cc/paper/2017/file/978fce5bcc4eccc88ad48ce3914124a2-Paper.pdf
Chiplunkar, A., Kale, S., Ramamoorthy, S.N.: How to solve fair k-center in massive data models. In: International Conference on Machine Learning, pp. 1877–1886. PMLR, Remote (2020)
Google Scholar
Daudpota, S.M., Muhammad, A., Baber, J.: Video genre identification using clustering-based shot detection algorithm. Signal Image Video Process. 13(7), 1413–1420 (2019)
Article Google Scholar
Dua, D., Graff, C., et al.: UCI machine learning repository (2017)
Google Scholar
Esmaeili, S., Brubach, B., Srinivasan, A., Dickerson, J.: Fair clustering under a bounded cost. In: Advances in Neural Information Processing Systems, vol. 34, pp. 14345–14357 (2021)
Google Scholar
Esmaeili, S., Brubach, B., Tsepenekas, L., Dickerson, J.: Probabilistic fair clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12743–12755 (2020)
Google Scholar
Esmaeili, S.A., Duppala, S., Dickerson, J.P., Brubach, B.: Fair labeled clustering. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 327–335 (2022)
Google Scholar
Ghadiri, M., Samadi, S., Vempala, S.: Socially fair k-means clustering. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 438–448. ACM, Online (2021)
Google Scholar
Goyal, D., Jaiswal, R.: Tight FPT approximation for socially fair clustering (2021). arXiv:2106.06755
Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2022). https://www.gurobi.com
Harb, E., Lam, H.S.: Kfc: A scalable approximation algorithm for $ k $-center fair clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 14509–14519 (2020)
Google Scholar
Huang, L., Jiang, S., Vishnoi, N.: Coresets for clustering with fairness constraints. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Google Scholar
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. (CSUR) 31(3), 264–323 (1999)
Article Google Scholar
Jia, X., Sheth, K., Svensson, O.: Fair colorful k-center clustering. In: Bienstock, D., Zambelli, G. (eds.) IPCO 2020. LNCS, vol. 12125, pp. 209–222. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45771-6_17
Chapter Google Scholar
Jones, M., Nguyen, H., Nguyen, T.: Fair k-centers via maximum matching. In: International Conference on Machine Learning, pp. 4940–4949. PMLR, Virtual (2020)
Google Scholar
Kansal, T., Bahuguna, S., Singh, V., Choudhury, T.: Customer segmentation using k-means clustering. In: 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), pp. 135–139. IEEE, Belgaum (2018)
Google Scholar
Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization. In: International Conference on Machine Learning, pp. 3448–3457. PMLR, Long Beach (2019)
Google Scholar
Kleindessner, M., Samadi, S., Awasthi, P., Morgenstern, J.: Guarantees for spectral clustering with fairness constraints. In: International Conference on Machine Learning, pp. 3458–3467. PMLR, Long Beach (2019)
Google Scholar
Le Quy, T., Roy, A., Friege, G., Ntoutsi, E.: Fair-capacitated clustering. In: EDM. EDM, Paris (2021)
Google Scholar
Liu, S., Vicente, L.N.: A stochastic alternating balance $ k $-means algorithm for fair clustering (2021). arXiv:2105.14172
Makarychev, Y., Vakilian, A.: Approximation algorithms for socially fair clustering. In: Conference on Learning Theory, pp. 3246–3264. PMLR, Boulder (2021)
Google Scholar
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning (2019). arXiv:1908.09635
NCSBE: North Carolina voter registration data (2020). https://www.ncsbe.gov/results-data/voter-registration-data
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
MathSciNet Google Scholar
Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair k-means clustering (2018). arXiv:1812.10854
Thejaswi, S., Ordozgoiti, B., Gionis, A.: Diversity-aware $ k $-median: clustering with fair center representation (2021). arXiv:2106.11696
Wang, Y., et al.: Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records. J. Biomed. Inform. 102, 103364 (2020)
Article Google Scholar
Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
Article Google Scholar
Ziko, I.M., Yuan, J., Granger, E., Ayed, I.B.: Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11202–11209. AAAI, Virtual (2021)
Google Scholar

Download references

Author information

Authors and Affiliations

Operations Research and Information Engineering, Cornell University, Ithaca, NY, USA
Connor Lawless & Oktay Günlük

Authors

Connor Lawless
View author publications
You can also search for this author in PubMed Google Scholar
Oktay Günlük
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Connor Lawless .

Editor information

Editors and Affiliations

University of Southern California, Los Angeles, CA, USA
Bistra Dilkina

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lawless, C., Günlük, O. (2024). Fair Minimum Representation Clustering. In: Dilkina, B. (eds) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2024. Lecture Notes in Computer Science, vol 14743. Springer, Cham. https://doi.org/10.1007/978-3-031-60599-4_2

Download citation

DOI: https://doi.org/10.1007/978-3-031-60599-4_2
Published: 25 May 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60601-4
Online ISBN: 978-3-031-60599-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Fair Minimum Representation Clustering

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Group and Individual Fairness in Clustering Algorithms

Exploring Rawlsian Fairness for K-Means Clustering

A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Fair Minimum Representation Clustering

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Group and Individual Fairness in Clustering Algorithms

Exploring Rawlsian Fairness for K-Means Clustering

A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation