
MAPPING: debiasing graph neural networks for fair node classification with limited sensitive information leakage

Published in: World Wide Web

Abstract

Despite remarkable success in diverse web-based applications, Graph Neural Networks (GNNs) inherit and can further exacerbate historical discrimination and social stereotypes, which critically hinders their deployment in high-stakes domains such as online clinical diagnosis and financial crediting. Existing research on fair graph learning typically relies on pairwise constraints to achieve fairness, but these fail to overcome dimensional limitations and do not generalize to multiple sensitive attributes. Moreover, most studies focus on in-processing techniques to enforce and calibrate fairness; constructing a model-agnostic debiasing GNN framework at the pre-processing stage, which would prevent downstream misuse and improve training reliability, remains largely under-explored. Furthermore, previous work tends to enhance either fairness or privacy individually, and few studies probe how fairness issues trigger privacy concerns or whether such concerns can be alleviated by fairness interventions. In this paper, we propose MAPPING (Masking And Pruning and Message-Passing trainING), a novel model-agnostic debiasing framework for fair node classification. MAPPING adopts distance covariance (dCov)-based fairness constraints to simultaneously reduce feature and topology biases under multiple sensitive memberships, and combines them with adversarial debiasing to confine the risks of sensitive attribute inference. Experiments on real-world datasets with different GNN variants demonstrate the effectiveness and flexibility of MAPPING. Our results show that MAPPING achieves better trade-offs between utility and fairness and mitigates the privacy risks of sensitive information leakage. This work paves the way for a new direction in trustworthy GNNs by addressing fairness and privacy concerns simultaneously, rather than achieving fairness at the expense of privacy.



Data Availability

Data is provided within the manuscript.

Materials Availability

Yes.

Code Availability

Currently no.

Notes

  1. https://github.com/chirag126/nifty

References

  1. Fan, W., Ma, Y., Li, Q., He, Y., Zhao, E., Tang, J., Yin, D.: Graph Neural Networks for Social Recommendation, in The World Wide Web Conference. Association for Computing Machinery, New York, NY, USA, WWW ’19, p. 417–426 (2019). https://doi.org/10.1145/3308558.3313488

  2. Yang, Z., Pei, W., Chen, M., Yue, C.: WTAGRAPH: Web Tracking and Advertising Detection using Graph Neural Networks, In: 2022 IEEE Symposium on Security and Privacy (SP) , pp. 1540–1557 (2022). https://doi.org/10.1109/SP46214.2022.9833670

  3. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2020). https://arxiv.org/pdf/1901.00596.pdf

  4. Rahman, T., Surma, B., Backes, M., Zhang, Y.: Fairwalk: Towards Fair Graph Embedding, in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. Int. Joint Conf. Art. Intell. Org. pp. 3289–3295 (2019). https://doi.org/10.24963/ijcai.2019/456

  5. Suresh, H., Guttag, J.: A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle, In: Equity and Access in Algorithms, Mechanisms, and Optimization ACM, (2021). https://doi.org/10.1145/3465416.3483305

  6. Hamberg, K.: Gender bias in medicine. Women’s Health 4(3), 237–243 (2008). https://doi.org/10.2217/17455057.4.3.237


  7. Dai, E., Wang, S.: Say No to the Discrimination: Learning Fair Graph Neural Networks with Limited Sensitive Attribute Information, In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining , pp. 680–688 (2021)

  8. Agarwal, C., Lakkaraju, H., Zitnik, M.: Towards a unified framework for fair and stable graph representation learning, In: Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, Proceedings of Machine Learning Research, vol. 161 (PMLR, 2021), pp. 2114–2124. https://proceedings.mlr.press/v161/agarwal21b.html

  9. Oneto, L., Navarin, N., Donini, M.: Learning Deep Fair Graph Neural Networks, In: 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2020, Bruges, Belgium, October 2-4, 2020 , pp. 31–36 (2020). https://www.esann.org/sites/default/files/proceedings/2020/ES2020-75.pdf

  10. Jiang, Z., Han, X., Fan, C., Liu, Z., Zou, N., Mostafavi, A., Hu, X.: Fmp: Toward fair graph message passing against topology bias (2022)

  11. Spinelli, I., Scardapane, S., Hussain, A., Uncini, A.: Fairdrop: Biased edge dropout for enhancing fairness in graph representation learning. IEEE Transactions on Artificial Intelligence pp. 1–1 (2021). https://doi.org/10.1109/TAI.2021.3133818

  12. Kamiran, F., Calders, T.: Data pre-processing techniques for classification without discrimination. Knowl. Inf. Syst. 33 (2011). https://doi.org/10.1007/s10115-011-0463-8

  13. Wang, H., Ustun, B., Calmon, F.: Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions, In: Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, PMLR, 97 pp. 6618–6627 (2019) https://proceedings.mlr.press/v97/wang19l.html

  14. Dong, Y., Ma, J., Wang, S., Chen, C., Li, J.: Fairness in graph mining: A survey. IEEE Trans. Knowl. Data Eng. (2023). https://doi.org/10.1109/TKDE.2023.3265598

  15. Dong, Y., Liu, N., Jalaian, B., Li, J.: Edits: Modeling and mitigating data bias for graph neural networks. In: Proceedings of the ACM Web Conference 2022, pp. 1259–1269 (2022)


  16. Villani, C.: Topics in Optimal Transportation. American Mathematical Society, Providence, Rhode Island (2003)


  17. Belghazi, M.I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A., Hjelm, D.: Mutual Information Neural Estimation, In: Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, PMLR, 80, pp. 531–540 (2018). https://proceedings.mlr.press/v80/belghazi18a.html

  18. Song, J., Ermon, S.: Understanding the Limitations of Variational Mutual Information Estimators, In: International Conference on Learning Representations (2020)

  19. Staerman, G., Laforgue, P., Mozharovskyi, P., d’Alché Buc, F.: When OT meets MoM: Robust estimation of Wasserstein Distance, In: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, PMLR, 130, pp. 136–144 (2021). https://proceedings.mlr.press/v130/staerman21a.html

  20. Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. The Annals of Statistics 35(6), 2769–2794 (2007). https://doi.org/10.1214/009053607000000505


  21. Hou, J., Ye, X., Feng, W., Zhang, Q., Han, Y., Liu, Y., Li, Y., Wei, Y.: Distance correlation application to gene co-expression network analysis. BMC Bioinformatics 23 (2022). https://doi.org/10.1186/s12859-022-04609-x

  22. Zhang, H., Wu, B., Yuan, X., Pan, S., Tong, H., Pei, J.: Trustworthy graph neural networks: Aspects, methods, and trends. Proceedings of the IEEE 112(2), 97–139 (2024). https://doi.org/10.1109/JPROC.2024.3369017


  23. Duddu, V., Boutet, A., Shejwalkar, V.: Quantifying Privacy Leakage in Graph Embedding, In: MobiQuitous 2020 - 17th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. ACM (2020). https://doi.org/10.1145/3448891.3448939

  24. He, X., Jia, J., Backes, M., Gong, N.Z., Zhang, Y.: Stealing Links from Graph Neural Networks, In: 30th USENIX Security Symposium (USENIX Security 21) (USENIX Association, 2021), pp. 2669–2686 (2021). https://www.usenix.org/conference/usenixsecurity21/presentation/he-xinlei

  25. He, X., Wen, R., Wu, Y., Backes, M., Shen, Y., Zhang, Y.: Node-level membership inference attacks against graph neural networks (2021)

  26. Conti, M., Li, J., Picek, S., Xu, J.: Label-Only Membership Inference Attack against Node-Level Graph Neural Networks, In: Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security (Association for Computing Machinery, New York, NY, USA, 2022), AISec 22, p. 1–12 (2022) https://doi.org/10.1145/3560830.3563734

  27. Golle, P.: Revisiting the uniqueness of simple demographics in the US population, In: Proceedings of the 5th ACM Workshop on Privacy in Electronic Society (Association for Computing Machinery, New York, NY, USA, 2006), WPES 06, p. 77–80 (2006). https://doi.org/10.1145/1179601.1179615

  28. Zhang, H., Yuan, X., Pan, S.: Unraveling Privacy Risks of Individual Fairness in Graph Neural Networks, In: 2024 IEEE 40th International Conference on Data Engineering (ICDE) IEEE Comput. Soc. Los Alamitos, CA, USA, 2024), pp. 1712–1725 (2024). https://doi.org/10.1109/ICDE60146.2024.00139

  29. Dai, E., Wang, S.: Learning fair graph neural networks with limited and private sensitive attribute information. IEEE Trans. Knowl. Data. Eng. pp. 1–14 (2022). https://doi.org/10.1109/TKDE.2022.3197554

  30. Zhang, S., Yin, H., Chen, T., Huang, Z., Cui, L., Zhang, X.: Graph Embedding for Recommendation against Attribute Inference Attacks, In: Proceedings of the Web Conference 2021 (Associate. Comput. Machine., New York, NY, USA,) WWW 21, p. 3002–3014 (2021). https://doi.org/10.1145/3442381.3449813

  31. Sajadmanesh, S., Gatica-Perez, D.: Locally Private Graph Neural Networks, In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (Association for Computing Machinery, New York, NY, USA, ) CCS 21, p. 2130–2145 (2021). https://doi.org/10.1145/3460120.3484565

  32. Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. The Annals of Statistics 35(6), 2769–2794 (2007). http://www.jstor.org/stable/25464608

  33. Liu, J., Li, Z., Yao, Y., Xu, F., Ma, X., Xu, M., Tong, H.: Fair Representation Learning: An Alternative to Mutual Information, In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, New York, NY, USA, ) KDD 22, p. 1088–1097 (2022). https://doi.org/10.1145/3534678.3539302

  34. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through Awareness, In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (Association for Computing Machinery, New York, NY, USA,) ITCS 12, p. 214–226 (2012). https://doi.org/10.1145/2090236.2090255

  35. Hardt, M., Price, E., Srebro, N.: Equality of Opportunity in Supervised Learning, In: Proceedings of the 30th International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA,) NIPS’ 16, p. 3323–3331 (2016)

  36. Louizos, C., Swersky, K., Li, Y., Welling, M., Zemel, R.: The variational fair autoencoder (2017)

  37. Bose, A.J., Hamilton, W.: Compositional Fairness Constraints for Graph Embeddings, In: Proceedings of the Thirty-sixth International Conference on Machine Learning, Long Beach CA (2019)

  38. Wang, Y., Zhao, Y., Dong, Y., Chen, H., Li, J., Derr, T.: Improving Fairness in Graph Neural Networks via Mitigating Sensitive Attribute Leakage, in SIGKDD (2022)

  39. Skeem, J.L., Lowenkamp, C.T.: Risk, race, and recidivism: Predictive bias and disparate impact. Criminology 54(4), 680–712 (2016). https://doi.org/10.1111/1745-9125.12123

  40. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(86), 2579–2605 (2008). http://jmlr.org/papers/v9/vandermaaten08a.html

  41. Zeng, Z., Islam, R., Keya, K.N., Foulds, J., Song, Y., Pan, S.: Fair representation learning for heterogeneous information networks, In: Proceedings of the International AAAI Conference on Web and Social Media, 15, pp. 877–887 (2021)

  42. Yao, S., Huang, B.: Beyond parity: Fairness objectives for collaborative filtering. Adv. Neural. Inf. Process. Syst. 30 (2017)

  43. Jayaraman, B., Evans, D.: Are Attribute Inference Attacks Just Imputation?, In: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (Association for Computing Machinery, New York, NY, USA,) CCS 22, p. 1569–1582 (2022). https://doi.org/10.1145/3548606.3560663

  44. Beutel, A., Chen, J., Zhao, Z., Chi, E.H.: Data decisions and theoretical implications when adversarially learning fair representations (2017)

  45. Zhao, T., Dai, E., Shu, K., Wang, S.: Towards Fair Classifiers Without Sensitive Attributes: Exploring Biases in Related Features, In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (Association for Computing Machinery, New York, NY, USA,) WSDM 22, p. 1433–1442 (2022). https://doi.org/10.1145/3488560.3498493

  46. Zhu, H., Wang, S.: Learning fair models without sensitive attributes: A generative approach (2022)

  47. Zafar, M.B., Valera, I., Gomez Rodriguez, M., Gummadi, K.P.: Fairness Beyond Disparate Treatment and Disparate Impact: Learning Classification without Disparate Mistreatment, In: Proceedings of the 26th International Conference on World Wide Web (International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE,) WWW 17, p. 1171–1180 (2017). https://doi.org/10.1145/3038912.3052660

  48. Cho, J., Hwang, G., Suh, C.: A Fair Classifier Using Mutual Information, In: 2020 IEEE International Symposium on Information Theory (ISIT) (IEEE Press,) p. 2521–2526 (2020). https://doi.org/10.1109/ISIT44484.2020.9174293

  49. Roh, Y., Lee, K., Whang, S.E., Suh, C.: FR-Train: A Mutual Information-Based Approach to Fair and Robust Training, In: Proceedings of the 37th International Conference on Machine Learning (JMLR.org,) ICML20, (2020)

  50. Zhang, B.H., Lemoine, B., Mitchell, M.: Mitigating Unwanted Biases with Adversarial Learning, In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society (Association for Computing Machinery, New York, NY, USA,) AIES ’18, p. 335–340 (2018). https://doi.org/10.1145/3278721.3278779

  51. Hamilton, W.L., Ying, R., Leskovec, J.: Inductive Representation Learning on Large Graphs, In: NIPS (2017)

  52. Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How Powerful are Graph Neural Networks?, In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=ryGs6iA5Km

  53. Weisfeiler, B., Lehman, A.: A reduction of a graph to a canonical form and an algebra arising during this reduction., In: Nauchno-Technicheskaya Informatsia, 2(9) , pp. 12–16 (1968)

  54. Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A Next-Generation Hyperparameter Optimization Framework, In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, New York, NY, USA,) KDD 19, p. 2623–2631 ( 2019). https://doi.org/10.1145/3292500.3330701

  55. Huang, C., Huo, X.: A statistically and numerically efficient independence test based on random projections and distance covariance. Frontiers in Applied Mathematics and Statistics 7 (2022). https://doi.org/10.3389/fams.2021.779841

  56. Huo, X., Székely, G.J.: Fast computing for distance covariance. Technometrics 58(4), 435–447 (2016). https://doi.org/10.1080/00401706.2015.1054435


  57. Li, Y., Purcell, M., Rakotoarivelo, T., Smith, D., Ranbaduge, T., Ng, K.S.: Private graph data release: A survey (2022)

  58. Masrour, F., Wilson, T., Yan, H., Tan, P.N., Esfahanian, A.: Bursting the filter bubble: Fairness-aware network link prediction. Proceedings of the AAAI Conference on Artificial Intelligence 34(01), 841–848 (2020). https://doi.org/10.1609/aaai.v34i01.5429

  59. Li, K., Luo, G., Ye, Y., Li, W., Ji, S., Cai, Z.: Adversarial privacy-preserving graph embedding against inference attack. IEEE Internet. Things. J. 8, 6904–6915 (2020)


  60. Liao, P., Zhao, H., Xu, K., Jaakkola, T., Gordon, G.J., Jegelka, S., Salakhutdinov, R.: Information Obfuscation of Graph Neural Networks, In: Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, PMLR, 139, pp. 6600–6610 (2021). http://proceedings.mlr.press/v139/liao21a.html

  61. Hu, H., Cheng, L., Vap, J.P., Borowczak, M.: Learning Privacy-Preserving Graph Convolutional Network with Partially Observed Sensitive Attributes. In: Proceedings of the ACM Web Conference 2022, pp. 3552–3561 (2022)


  62. Wang, B., Guo, J., Li, A., Chen, Y., Li, H.: Privacy-Preserving Representation Learning on Graphs: A Mutual Information Perspective, In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, New York, NY, USA,) KDD 21, p. 1667–1676 (2021). https://doi.org/10.1145/3447548.3467273

  63. Chang, H., Shokri, R.: On the Privacy Risks of Algorithmic Fairness, In: 2021 IEEE European Symposium on Security and Privacy (EuroS &P) (IEEE Computer Society, Los Alamitos, CA, USA,) pp. 292–303 (2021). https://doi.org/10.1109/EuroSP51992.2021.00028

  64. Chen, C., Liang, Y., Xu, X., Xie, S., Hong, Y., Shu, K.: When Fairness Meets Privacy: Fair Classification with Semi-Private Sensitive Attributes, In: Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022 (2022)

  65. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating Noise to Sensitivity in Private Data Analysis, in Theory of Cryptography, pp. 265–284. Springer, Berlin Heidelberg, Berlin, Heidelberg (2006)


  66. Pujol, D., McKenna, R., Kuppam, S., Hay, M., Machanavajjhala, A., Miklau, G.: Fair Decision Making Using Privacy-Protected Data, In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, New York, NY, USA,) FAT 20, p. 189–199 (2020). https://doi.org/10.1145/3351095.3372872

  67. de Oliveira, A.S., Kaplan, C., Mallat, K., Chakraborty, T.: An empirical analysis of fairness notions under differential privacy (2023)

  68. Ding, J., Zhang, X., Li, X., Wang, J., Yu, R., Pan, M.: Differentially private and fair classification via calibrated functional mechanism, In: Proceedings of the AAAI Conference on Artificial Intelligence, 34 , pp. 622–629 (2020)

  69. Fey, M., Lenssen, J.E.: Fast Graph Representation Learning with PyTorch Geometric, In: ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)

  70. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization, In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conf. Track. Proceed. (2015). http://arxiv.org/abs/1412.6980

  71. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: An Imperative Style, High-Performance Deep Learning Library, In: Advances in Neural Information Processing Systems 32 (Curran Associates, Inc.,) pp. 8024–8035 (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf


Acknowledgements

This research was supported in part by the University of Pittsburgh Center for Research Computing through the resources provided. The first author acknowledges the support from the SCI fellowship.

Funding

This research was supported in part by the University of Pittsburgh Center for Research Computing through the resources provided.

Author information

Authors and Affiliations

Authors

Contributions

Balaji Palanisamy, as the advisor, reviewed and edited the manuscript. Ying Song completed all remaining work.

Corresponding author

Correspondence to Ying Song.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Ethics approval and consent to participate

Yes.

Consent for publication

Yes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Empirical analysis

1.1 A.1 Data synthesis

Fig. 7: Distributions of biased and unbiased graph data based on the minor sensitive attribute. The minor sensitive attribute is binary, where group 0 represents the minority and group 1 denotes the majority.

Note that \(S_m\) can be highly related to \(S_n\), or not related at all. For \(S_n\), we generate a \(2500 \times 3\) biased non-sensitive feature matrix from the multivariate normal distributions \(\mathcal {N} \left( \mu _0, \Sigma _0\right) \) and \(\mathcal {N} \left( \mu _1, \Sigma _1\right) \), where subgroup 0 represents the minority in reality, \(\mu _0=\left( -10,-2,-5\right) ^{T}\), \(\mu _1=\left( 10,2,5\right) ^{T}\), \(\Sigma _0=\Sigma _1\) are both identity matrices, and \(|S_0|=500\) and \(|S_1|=2000\). We interleave the samples as follows: the first 100 from \(\left( \mu _0,\Sigma _0\right) \) and 200 from \(\left( \mu _1,\Sigma _1\right) \); the next 200 from \(\left( \mu _0,\Sigma _0\right) \) and 600 from \(\left( \mu _1,\Sigma _1\right) \); the next 100 from \(\left( \mu _0,\Sigma _0\right) \) and 700 from \(\left( \mu _1,\Sigma _1\right) \); and the final 100 from \(\left( \mu _0,\Sigma _0\right) \) and 500 from \(\left( \mu _1,\Sigma _1\right) \). We then generate \(S_n\) from this combination.

Next, we create another \(2500 \times 3\) non-sensitive matrix attached to \(S_n\) from the multivariate normal distributions \(\mathcal {N} \left( \mu _2, \Sigma _2\right) \) and \(\mathcal {N} \left( \mu _3, \Sigma _3\right) \), where subgroup 2 represents the minority in reality, \(\mu _2=\left( -12,-8,-4\right) ^{T}\), \(\mu _3=\left( 12,8,4\right) ^{T}\), \(\Sigma _2=\Sigma _3\) are both identity matrices, and \(|S_2|=700\) and \(|S_3|=1800\). We combine the first 300 samples from \(\left( \mu _2,\Sigma _2\right) \) and 1200 from \(\left( \mu _3,\Sigma _3\right) \), followed by 400 from \(\left( \mu _2,\Sigma _2\right) \) and 600 from \(\left( \mu _3,\Sigma _3\right) \), and then generate \(S_m\) from this combination. Debiased features are sampled from a multivariate normal distribution with \(\mu _4=\left( 0,1,0,1,0,1\right) \) and the identity matrix as covariance.

The biased topology is formed via a stochastic block model in which the first block contains 500 nodes and the second contains 2000; the link probability is 5e-3 within blocks and 1e-7 between blocks. The debiased topology is built as a random geometric graph with radius 0.033.
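The synthesis above can be sketched in NumPy (a minimal illustration, not the paper's exact script: unit-variance normals stand in for the identity-covariance Gaussians, and the stochastic block model is sampled directly from Bernoulli trials):

```python
import numpy as np

rng = np.random.default_rng(0)

# Interleaved chunks from the text: (samples from subgroup 0, samples from subgroup 1).
pattern = [(100, 200), (200, 600), (100, 700), (100, 500)]
mu0, mu1 = np.array([-10.0, -2.0, -5.0]), np.array([10.0, 2.0, 5.0])

chunks, s_n = [], []
for n0, n1 in pattern:
    chunks.append(rng.normal(mu0, 1.0, size=(n0, 3)))  # minority subgroup 0
    chunks.append(rng.normal(mu1, 1.0, size=(n1, 3)))  # majority subgroup 1
    s_n += [0] * n0 + [1] * n1
X = np.vstack(chunks)          # (2500, 3) biased feature matrix
s_n = np.array(s_n)            # binary minor sensitive attribute

# Biased topology: two-block stochastic block model (500 vs. 2000 nodes).
sizes, p_in, p_out = (500, 2000), 5e-3, 1e-7
block = np.repeat([0, 1], sizes)
P = np.where(block[:, None] == block[None, :], p_in, p_out)
A = rng.random((2500, 2500)) < P
A = np.triu(A, 1)
A = A | A.T                    # symmetric adjacency, no self-loops
```

The second sensitive attribute \(S_m\) and the debiased features follow the same recipe with \(\mu_2\), \(\mu_3\), and \(\mu_4\).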

1.2 A.2 Implementation details

We built 1-layer GCNs with PyTorch Geometric [69] and implemented them in PyTorch [71], using the Adam optimizer [70] with learning rate 1e-3, dropout 0.2, weight decay 1e-5, and hidden layer size 16. All models are trained for 1,000 epochs. Experiments are conducted on a 64-bit machine with 4 Nvidia A100 GPUs, and we repeat each experiment 10 times with different seeds to report average results.
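For illustration, the propagation computed by a single GCN layer of this kind can be written out in NumPy. PyTorch Geometric's `GCNConv` performs this symmetrically normalized propagation internally, so the following is a sketch of the operation rather than the training code:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer: ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])      # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)     # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, 4))             # toy node features
W = rng.normal(size=(4, 16))            # hidden layer size 16, as above
H = gcn_layer(A, X, W)                  # (3, 16) node embeddings
```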

Algorithm 1: Pre-masking Strategies

Algorithm 2: MAPPING

Appendix B: Pseudo codes

1.1 B.1 Algorithm 1 - Pre-masking Strategies

1.2 B.2 Algorithm 2 - MAPPING

Appendix C: Experiments

1.1 C.1 Dataset description

In German, nodes represent bank clients and edges connect clients with similar credit accounts. To control credit risk, the bank needs to distinguish applicants with good credit from those with bad credit. In Recidivism, nodes denote defendants released on bail in U.S. state courts during 1990-2009, and edges link defendants with similar basic demographics and past criminal histories. The task is to predict whether a defendant will be bailed. In Credit, nodes are credit card applicants and edges are formed based on the similarity of applicants' spending and payment patterns. The goal is to predict whether an applicant will default.
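The exact similarity measures are defined by the original benchmark constructions; as a hypothetical illustration, similarity-based edge construction of this kind can be sketched by thresholding pairwise cosine similarity of node features (the threshold value here is made up for the example):

```python
import numpy as np

def similarity_graph(X, threshold=0.8):
    """Link node pairs whose cosine similarity exceeds a threshold."""
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = X_unit @ X_unit.T               # pairwise cosine similarity
    A = (S > threshold).astype(int)
    np.fill_diagonal(A, 0)              # drop self-loops
    return A

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
A = similarity_graph(X)
# nodes 0 and 1 have near-identical features and get linked; node 2 stays isolated
```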

1.2 C.2 Hyperparameter setting of SOTA

In this subsection, we detail the hyperparameters of the different fair models. To obtain the best relative performance, we leverage Optuna [54] to facilitate grid search.

FairGNN: dropout from \(\{0.1,0.2,0.3,0.4,0.5\}\), weight decay 1e-5, learning rate from \(\{0.001,0.005,0.01,0.05,0.1\}\), regularization coefficients \(\alpha =4\) and \(\beta =0.01\), sensitive number 200 and label number 500, hidden layer size from \(\{16,32,64,128\}\).

NIFTY: project hidden layer size 16, drop edge and feature rates are 0.001 and 0.1, dropout \(\{0.1,0.3,0.5\}\), weight decay 1e-5, learning rate \(\{0.0001,0.001,0.01\}\), regularization coefficient \(\{0.4,0.5,0.6,0.7,0.8\}\), hidden layer size 16.

EDITS: we directly use the debiased datasets in [15], dropout \(\{0.05,0.1,0.3,0.5\}\), weight decay {1e-4,1e-5,1e-6,1e-7}, learning rate \(\{0.001,0.005,0.01,0.05\}\), hidden layer size 16.
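The Optuna-driven search enumerates grids like those above; an equivalent plain grid search looks like the following sketch, where `validation_score` is a hypothetical placeholder for training a model and returning its validation metric (the NIFTY grids from the text are used as the search space):

```python
import itertools

# NIFTY search space from the text.
space = {
    "dropout": [0.1, 0.3, 0.5],
    "lr": [0.0001, 0.001, 0.01],
    "reg_coef": [0.4, 0.5, 0.6, 0.7, 0.8],
}

def validation_score(cfg):
    # Illustrative stand-in for a train/evaluate run:
    # peaks at dropout 0.3 and learning rate 0.001.
    return -abs(cfg["dropout"] - 0.3) - abs(cfg["lr"] - 0.001)

best_cfg, best_score = None, float("-inf")
for values in itertools.product(*space.values()):
    cfg = dict(zip(space.keys(), values))
    score = validation_score(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
# → best_cfg has dropout 0.3 and lr 0.001
```

Optuna's `GridSampler` automates exactly this enumeration while logging each trial.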

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Song, Y., Palanisamy, B. MAPPING: debiasing graph neural networks for fair node classification with limited sensitive information leakage. World Wide Web 27, 74 (2024). https://doi.org/10.1007/s11280-024-01312-0

