Abstract
Clustering is an unsupervised learning task that aims to partition data into a set of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts, playlists, TV channels) whose benefit can only be attained by groups when they reach a minimum level of representation (e.g., 50% to elect their desired candidate). In this paper, we study the k-means clustering problem with the additional constraint that each group (e.g., demographic group) must have a minumum level of representation in at least a given number of clusters. We formulate the problem through a mixed-integer optimization framework and present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints. While incorporating the fairness criteria leads to an NP-Hard assignment problem within the algorithm, we present computational approaches that make the algorithm practical even for large datasets. Numerical results show that this approach can produce fair clusters with practically no increase in the clustering cost across standard benchmark datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abbasi, M., Bhaskara, A., Venkatasubramanian, S.: Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 504–514. ACM, New York (2021)
Ahmadian, S., et al.: Fair hierarchical clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 21050–21060 (2020)
Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 267–275. ACM, New York (2019)
Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: International Conference on Machine Learning, pp. 405–413. PMLR, Long Beach (2019)
Bandyapadhyay, S., Fomin, F.V., Simonov, K.: On coresets for fair clustering in metric and Euclidean spaces and their applications (2020). arXiv:2007.10137
Bera, S., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., Vancouver (2019). https://proceedings.neurips.cc/paper/2019/file/fc192b0c0d270dbf41870a63a8c76c2f-Paper.pdf
Bercea, I.O., et al.: On the cost of essentially fair clusterings (2018). arXiv:1811.10319
Böhm, M., Fazzone, A., Leonardi, S., Schwiegelshohn, C.: Fair clustering with multiple colors (2020). arXiv:2002.07892
Brownwell, C.: CRTC relaxes quotas on Canadian content for tv broadcasters (2015). https://financialpost.com/technology/crtc-relaxes-quotas-on-canadian-content-for-tv-broadcasters
Buolamwini, J., Gebru, T.: Gender shades: intersectional accuracy disparities in commercial gender classification. In: Conference on Fairness, Accountability and Transparency, pp. 77–91. PMLR, New York City (2018)
Chhabra, A., Vashishth, V., Mohapatra, P.: Fair algorithms for hierarchical agglomerative clustering (2020). arXiv:2005.03197
Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., Long Beach (2017). https://proceedings.neurips.cc/paper/2017/file/978fce5bcc4eccc88ad48ce3914124a2-Paper.pdf
Chiplunkar, A., Kale, S., Ramamoorthy, S.N.: How to solve fair k-center in massive data models. In: International Conference on Machine Learning, pp. 1877–1886. PMLR, Remote (2020)
Daudpota, S.M., Muhammad, A., Baber, J.: Video genre identification using clustering-based shot detection algorithm. Signal Image Video Process. 13(7), 1413–1420 (2019)
Dua, D., Graff, C., et al.: UCI machine learning repository (2017)
Esmaeili, S., Brubach, B., Srinivasan, A., Dickerson, J.: Fair clustering under a bounded cost. In: Advances in Neural Information Processing Systems, vol. 34, pp. 14345–14357 (2021)
Esmaeili, S., Brubach, B., Tsepenekas, L., Dickerson, J.: Probabilistic fair clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12743–12755 (2020)
Esmaeili, S.A., Duppala, S., Dickerson, J.P., Brubach, B.: Fair labeled clustering. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 327–335 (2022)
Ghadiri, M., Samadi, S., Vempala, S.: Socially fair k-means clustering. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 438–448. ACM, Online (2021)
Goyal, D., Jaiswal, R.: Tight FPT approximation for socially fair clustering (2021). arXiv:2106.06755
Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2022). https://www.gurobi.com
Harb, E., Lam, H.S.: Kfc: A scalable approximation algorithm for \( k \)-center fair clustering. In: Advances in Neural Information Processing Systems, vol. 33, pp. 14509–14519 (2020)
Huang, L., Jiang, S., Vishnoi, N.: Coresets for clustering with fairness constraints. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. (CSUR) 31(3), 264–323 (1999)
Jia, X., Sheth, K., Svensson, O.: Fair colorful k-center clustering. In: Bienstock, D., Zambelli, G. (eds.) IPCO 2020. LNCS, vol. 12125, pp. 209–222. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45771-6_17
Jones, M., Nguyen, H., Nguyen, T.: Fair k-centers via maximum matching. In: International Conference on Machine Learning, pp. 4940–4949. PMLR, Virtual (2020)
Kansal, T., Bahuguna, S., Singh, V., Choudhury, T.: Customer segmentation using k-means clustering. In: 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), pp. 135–139. IEEE, Belgaum (2018)
Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair k-center clustering for data summarization. In: International Conference on Machine Learning, pp. 3448–3457. PMLR, Long Beach (2019)
Kleindessner, M., Samadi, S., Awasthi, P., Morgenstern, J.: Guarantees for spectral clustering with fairness constraints. In: International Conference on Machine Learning, pp. 3458–3467. PMLR, Long Beach (2019)
Le Quy, T., Roy, A., Friege, G., Ntoutsi, E.: Fair-capacitated clustering. In: EDM. EDM, Paris (2021)
Liu, S., Vicente, L.N.: A stochastic alternating balance \( k \)-means algorithm for fair clustering (2021). arXiv:2105.14172
Makarychev, Y., Vakilian, A.: Approximation algorithms for socially fair clustering. In: Conference on Learning Theory, pp. 3246–3264. PMLR, Boulder (2021)
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning (2019). arXiv:1908.09635
NCSBE: North Carolina voter registration data (2020). https://www.ncsbe.gov/results-data/voter-registration-data
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair k-means clustering (2018). arXiv:1812.10854
Thejaswi, S., Ordozgoiti, B., Gionis, A.: Diversity-aware \( k \)-median: clustering with fair center representation (2021). arXiv:2106.11696
Wang, Y., et al.: Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records. J. Biomed. Inform. 102, 103364 (2020)
Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
Ziko, I.M., Yuan, J., Granger, E., Ayed, I.B.: Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11202–11209. AAAI, Virtual (2021)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lawless, C., Günlük, O. (2024). Fair Minimum Representation Clustering. In: Dilkina, B. (eds) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2024. Lecture Notes in Computer Science, vol 14743. Springer, Cham. https://doi.org/10.1007/978-3-031-60599-4_2
Download citation
DOI: https://doi.org/10.1007/978-3-031-60599-4_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60601-4
Online ISBN: 978-3-031-60599-4
eBook Packages: Computer ScienceComputer Science (R0)