Computer Science > Machine Learning

arXiv:2203.04462 (cs)

[Submitted on 9 Mar 2022]

Title: Downstream Fairness Caveats with Synthetic Healthcare Data

Authors: Karan Bhanot, Ioana Baldini, Dennis Wei, Jiaming Zeng, Kristin P. Bennett

Abstract: This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on utility and fairness. Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy. Although essential, these laws hinder research reproducibility. Synthetic data is a viable solution that can enable access to data similar to real healthcare data without privacy risks. Healthcare datasets may have biases in which certain protected groups might experience worse outcomes than others. With the real data having biases, the fairness of synthetically generated health data comes into question. In this paper, we evaluate the fairness of models trained on two healthcare datasets for gender and race biases. We generate synthetic versions of the datasets using a Generative Adversarial Network called HealthGAN, and compare the balanced accuracy and fairness scores of the models trained on real and on synthetic data. We find that synthetic data has different fairness properties compared to real data and that fairness mitigation techniques perform differently on it, highlighting that synthetic data is not bias-free.
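
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract names only "balanced accuracy and fairness scores", so the fairness metric used here (equal opportunity difference, the gap in true positive rate between an unprivileged and a privileged group) is an assumption, not necessarily the metric the authors report. The labels, predictions, and group names are made-up toy values; nothing here reproduces HealthGAN or the paper's datasets.

```python
from typing import Hashable, Sequence, Tuple


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, true negative rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    # Guard against empty classes; 0.0 is an arbitrary choice for this sketch.
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return tpr, tnr


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true positive rate and the true negative rate."""
    tpr, tnr = rates(y_true, y_pred)
    return (tpr + tnr) / 2


def equal_opportunity_difference(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[Hashable],
    privileged: Hashable,
) -> float:
    """TPR of the unprivileged group minus TPR of the privileged group.

    A value of 0 means both groups have the same true positive rate.
    """
    priv = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == privileged]
    unpriv = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g != privileged]
    tpr_priv, _ = rates([t for t, _ in priv], [p for _, p in priv])
    tpr_unpriv, _ = rates([t for t, _ in unpriv], [p for _, p in unpriv])
    return tpr_unpriv - tpr_priv


# Toy evaluation set (hypothetical values): true outcomes and a protected attribute.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Hypothetical predictions from a model trained on real data
# and from a model trained on synthetic data.
pred_real = [1, 1, 0, 0, 1, 0, 0, 0]
pred_synth = [1, 1, 0, 1, 0, 0, 0, 0]

results = {
    name: (
        balanced_accuracy(y_true, pred),
        equal_opportunity_difference(y_true, pred, group, privileged="a"),
    )
    for name, pred in [("real", pred_real), ("synthetic", pred_synth)]
}

for name, (bal_acc, eod) in results.items():
    print(f"{name}: balanced accuracy={bal_acc:.3f}, equal opportunity difference={eod:.3f}")
```

Scoring both models on the same held-out labels and the same group split keeps the two numbers comparable. In this toy case the two sets of predictions differ in both utility and fairness, which is the kind of gap the abstract reports between real and synthetic data.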

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2203.04462 [cs.LG]
	(or arXiv:2203.04462v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2203.04462

Submission history

From: Karan Bhanot
[v1] Wed, 9 Mar 2022 00:52:47 UTC (382 KB)
