FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning

Published: 01 November 2024

Abstract

Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data or model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, a setting known as federated KD. However, existing federated KD methods often perform poorly when clients' local models are trained on heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the divergence of local model outputs across clients caused by data heterogeneity, the server acts as a discriminator that guides clients' local model training toward consensus model outputs through a min-max game between the clients and the discriminator. Moreover, catastrophic forgetting may occur during clients' local training and global knowledge transfer because of the clients' heterogeneous local data. To address this challenge, we design a less-forgetting regularization for both local training and global knowledge transfer, which preserves each client's ability to transfer knowledge to, and learn knowledge from, the other clients. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.
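
The method description above maps naturally onto a per-client training objective. Below is a minimal PyTorch-style sketch of one FedAL-like client update, written purely from the abstract's high-level description: the specific loss terms, their weights (beta_kd, beta_adv, beta_lf), the discriminator interface, and the helper name client_update are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a FedAL-style client objective (illustrative assumptions only):
#   supervised loss + distillation toward the clients' average output
#   + an adversarial term against the server-side discriminator
#   + a less-forgetting regularizer toward the pre-transfer local model.
import torch
import torch.nn as nn
import torch.nn.functional as F


def client_update(local_model: nn.Module,
                  old_model: nn.Module,           # snapshot taken before knowledge transfer
                  discriminator: nn.Module,       # server side: guesses which client made an output
                  optimizer: torch.optim.Optimizer,
                  x: torch.Tensor,
                  y: torch.Tensor,
                  ensemble_logits: torch.Tensor,  # average output of all clients on x
                  beta_kd: float = 1.0,
                  beta_adv: float = 0.1,
                  beta_lf: float = 0.1) -> float:
    """One local step combining the four loss terms sketched from the abstract."""
    optimizer.zero_grad()
    logits = local_model(x)

    # (1) Supervised cross-entropy on the client's private labels.
    loss_ce = F.cross_entropy(logits, y)

    # (2) Federated KD: pull the local output toward the ensemble (average) output.
    loss_kd = F.kl_div(F.log_softmax(logits, dim=1),
                       F.softmax(ensemble_logits, dim=1),
                       reduction="batchmean")

    # (3) Adversarial term of the min-max game: the client tries to make the
    #     discriminator unable to tell which client produced this output,
    #     approximated here by pushing the discriminator toward a uniform guess.
    d_out = discriminator(F.softmax(logits, dim=1))
    uniform = torch.full_like(d_out, 1.0 / d_out.shape[1])
    loss_adv = F.kl_div(F.log_softmax(d_out, dim=1), uniform, reduction="batchmean")

    # (4) Less-forgetting regularization: stay close to the outputs of the
    #     local model snapshot taken before this round's knowledge transfer.
    with torch.no_grad():
        old_probs = F.softmax(old_model(x), dim=1)
    loss_lf = F.mse_loss(F.softmax(logits, dim=1), old_probs)

    loss = loss_ce + beta_kd * loss_kd + beta_adv * loss_adv + beta_lf * loss_lf
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the server's side of the min-max game would train the discriminator to identify the originating client from its softmax output, while each client minimizes the adversarial term above, driving the clients toward consensus outputs; the exact losses, regularizers, and update schedule are given in the paper itself.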


Published In

IEEE Journal on Selected Areas in Communications, Volume 42, Issue 11, Nov. 2024, 334 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
