FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning

Published: 01 November 2024

Abstract

Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data or model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, a setting known as federated KD. However, existing federated KD methods often perform poorly when clients' local models are trained on heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the divergence of local model outputs across clients caused by data heterogeneity, the server acts as a discriminator that guides clients' local model training toward consensus model outputs through a min-max game between the clients and the discriminator. Moreover, catastrophic forgetting may occur during clients' local training and global knowledge transfer because of the clients' heterogeneous local data. To address this challenge, we design a less-forgetting regularization for both local training and global knowledge transfer, which preserves each client's ability to transfer knowledge to, and learn knowledge from, the other clients. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.
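
The method description above maps naturally onto a per-client training objective. Below is a minimal PyTorch-style sketch of one FedAL-like client update, written purely from the abstract's high-level description: the specific loss terms, their weights (beta_kd, beta_adv, beta_lf), the discriminator interface, and the helper name client_update are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a FedAL-style client objective (illustrative assumptions only):
#   supervised loss + distillation toward the clients' average output
#   + an adversarial term against the server-side discriminator
#   + a less-forgetting regularizer toward the pre-transfer local model.
import torch
import torch.nn as nn
import torch.nn.functional as F


def client_update(local_model: nn.Module,
                  old_model: nn.Module,           # snapshot taken before knowledge transfer
                  discriminator: nn.Module,       # server side: guesses which client made an output
                  optimizer: torch.optim.Optimizer,
                  x: torch.Tensor,
                  y: torch.Tensor,
                  ensemble_logits: torch.Tensor,  # average output of all clients on x
                  beta_kd: float = 1.0,
                  beta_adv: float = 0.1,
                  beta_lf: float = 0.1) -> float:
    """One local step combining the four loss terms sketched from the abstract."""
    optimizer.zero_grad()
    logits = local_model(x)

    # (1) Supervised cross-entropy on the client's private labels.
    loss_ce = F.cross_entropy(logits, y)

    # (2) Federated KD: pull the local output toward the ensemble (average) output.
    loss_kd = F.kl_div(F.log_softmax(logits, dim=1),
                       F.softmax(ensemble_logits, dim=1),
                       reduction="batchmean")

    # (3) Adversarial term of the min-max game: the client tries to make the
    #     discriminator unable to tell which client produced this output,
    #     approximated here by pushing the discriminator toward a uniform guess.
    d_out = discriminator(F.softmax(logits, dim=1))
    uniform = torch.full_like(d_out, 1.0 / d_out.shape[1])
    loss_adv = F.kl_div(F.log_softmax(d_out, dim=1), uniform, reduction="batchmean")

    # (4) Less-forgetting regularization: stay close to the outputs of the
    #     local model snapshot taken before this round's knowledge transfer.
    with torch.no_grad():
        old_probs = F.softmax(old_model(x), dim=1)
    loss_lf = F.mse_loss(F.softmax(logits, dim=1), old_probs)

    loss = loss_ce + beta_kd * loss_kd + beta_adv * loss_adv + beta_lf * loss_lf
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the server's side of the min-max game would train the discriminator to identify the originating client from its softmax output, while each client minimizes the adversarial term above, driving the clients toward consensus outputs; the exact losses, regularizers, and update schedule are given in the paper itself.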


Published In

IEEE Journal on Selected Areas in Communications, Volume 42, Issue 11, Nov. 2024, 334 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
