Privacy-Preserving Energy-Based Generative Models for Marginal Distribution Protection

Robert E. Tillman, Tucker Balch, Manuela Veloso

Published: 21 Jun 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: We consider learning generative models for sensitive financial and healthcare data. While previous work incorporates Differential Privacy (DP) into GAN training to protect the privacy of individual training instances, we consider a different privacy context where the primary objective is protecting the privacy of sensitive marginal distributions of the true generative process. We propose and motivate a new notion of privacy: \emph{$\alpha$-Level Marginal Distribution Privacy} ($\alpha$-LMDP), which provides a statistical guarantee that the sensitive generative marginal distributions are different from the observed real data. We then propose \emph{Privacy-Preserving Energy Models (PPEMs)}, a novel energy-based generative model formulation where the representations for these attributes are isolated from other attributes. This structured formulation motivates a learning procedure where a penalty based on a statistical goodness of fit test, the \emph{Kernel Stein Discrepancy}, can be applied to only the attributes requiring privacy so that $\alpha$-LMDP may be satisfied without affecting the other attributes. We evaluate this approach using financial and healthcare datasets and demonstrate that the resulting learnt generative models produce high fidelity synthetic data while preserving privacy. We also show that PPEMs can incorporate both $\alpha$-LMDP \emph{and} DP in contexts where both forms of privacy are required.
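
To make the goodness-of-fit ingredient mentioned in the abstract concrete, the following is a minimal sketch of a Kernel Stein Discrepancy estimate (a V-statistic with a Gaussian RBF kernel). It is not the authors' implementation: the function name `ksd_v_statistic`, the fixed `bandwidth`, the choice of kernel, and the toy standard-normal reference density are all assumptions made for illustration. The sketch only shows how the statistic grows when samples come from a distribution other than the reference one, which is the property a penalty targeting the private attributes could rely on.

```python
import numpy as np


def ksd_v_statistic(samples, score_fn, bandwidth=1.0):
    """V-statistic estimate of the squared Kernel Stein Discrepancy.

    samples   : array of shape (n,) or (n, d), draws from the model under test
    score_fn  : callable returning grad_x log p(x) of the reference density p,
                evaluated row-wise, shape (n, d)
    bandwidth : RBF kernel bandwidth h (hypothetical default, not tuned)
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    s = score_fn(x)                              # (n, d) scores of p
    diff = x[:, None, :] - x[None, :, :]         # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)              # squared pairwise distances
    h2 = bandwidth ** 2
    k = np.exp(-sq / (2.0 * h2))                 # RBF kernel matrix

    # Stein kernel u_p(x_i, x_j), assembled term by term.
    term1 = (s @ s.T) * k                                   # s_i . s_j k
    term2 = np.einsum("id,ijd->ij", s, diff) / h2 * k       # s_i . grad_y k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2 * k      # s_j . grad_x k
    term4 = (d / h2 - sq / h2 ** 2) * k                     # trace(grad_x grad_y k)
    return float(np.mean(term1 + term2 + term3 + term4))


# Toy usage: the reference density is a standard normal, whose score is -x.
rng = np.random.default_rng(0)
standard_normal_score = lambda x: -x

near = ksd_v_statistic(rng.normal(0.0, 1.0, size=300), standard_normal_score)
far = ksd_v_statistic(rng.normal(3.0, 1.0, size=300), standard_normal_score)
```

In this toy setting `near` (samples drawn from the reference density) should be close to zero, while `far` (samples shifted by 3) should be clearly larger. A test at level $\alpha$ would compare such a statistic against a calibrated threshold; the calibration step is omitted here.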

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: Our revision includes the following changes in response to the reviewer feedback:

Requested by Reviewer nErC:
- We added the papers listed by the reviewer to the related work and discussed them in relation to our proposed privacy notion, particularly focusing on Zhang et al., 2022, which is the closest to our proposed method
- We added a broader impact section addressing the limitations of our privacy definition as discussed in our response to the reviewer

Requested by Reviewer KKXi:
- We clarified the relationship between GANs, energy models, and EnergyGAN by introducing EnergyGAN in section 3.1
- We added the commentary regarding the optimality proof and its relation to the proof from EnergyGAN to the main text
- We added our responses 1, 2 and 3 to the reviewer's questions to address the reviewer's points 2, 3 and 4
- We removed from the paper the statement that PPEMs are the first energy-based models with privacy-preserving properties
- We added a broader impact section and discussed the relation of our approach to fair generative modeling along with the point brought up by Reviewer mZ5y

Requested by Reviewer mZ5y:
- We added a limitations section to the paper including the points mentioned in our response to the reviewer
- We retitled the paper “Privacy-Preserving Energy-Based Generative Models for Marginal Distribution Protection”
- We added the definitions requested by the reviewer
- We added a broader impact section that includes the reviewer's point mentioned above in our response 4, along with Reviewer KKXi’s point relating our approach to fair generative modeling

Assigned Action Editor: ~Antti_Honkela1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 701