Abstract
Multimodal Federated Learning (MMFL) is an emerging machine learning paradigm that extends traditional Federated Learning (FL) to support the collaborative training of local models on data spanning multiple modalities. With vast amounts of multimodal data being generated and stored by the internet, sensors, and mobile devices, and with artificial intelligence models iterating rapidly, the demand for multimodal models is growing quickly. Although FL has been widely studied in recent years, most existing research has been conducted in unimodal settings. In the hope of inspiring further applications and research within the MMFL paradigm, we conduct a comprehensive review of the progress and challenges across various aspects of state-of-the-art MMFL. Specifically, we analyze the motivation for MMFL research, propose a new taxonomy of existing work, discuss available datasets and application scenarios, and offer perspectives on the opportunities and challenges facing MMFL.
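To make the training paradigm described above concrete, the following is a minimal sketch of one FedAvg-style communication round in a multimodal setting, where each client holds paired image and text features. The network architecture, dimensions, and concatenation-based fusion are illustrative assumptions for this sketch, not the method of any particular work surveyed here.

```python
# Minimal sketch: one FedAvg round over clients with paired image/text data.
# All model names, dimensions, and the late-fusion design are illustrative.
import copy
import torch
import torch.nn as nn

class MultimodalNet(nn.Module):
    """Per-modality encoders followed by late (concatenation) fusion."""
    def __init__(self, img_dim=64, txt_dim=32, hidden=16, n_classes=4):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, img, txt):
        fused = torch.cat([self.img_enc(img), self.txt_enc(txt)], dim=-1)
        return self.head(fused)

def local_update(global_model, data, epochs=1, lr=0.01):
    """Client side: train a copy of the global model on local multimodal data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    img, txt, y = data
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(img, txt), y).backward()
        opt.step()
    return model.state_dict(), len(y)

def fedavg(global_model, client_states):
    """Server side: average client weights, weighted by local sample count."""
    total = sum(n for _, n in client_states)
    avg = {k: sum(sd[k] * (n / total) for sd, n in client_states)
           for k in global_model.state_dict()}
    global_model.load_state_dict(avg)

# One communication round over three clients with synthetic paired data;
# raw samples never leave the clients, only model weights are exchanged.
torch.manual_seed(0)
server = MultimodalNet()
clients = [(torch.randn(20, 64), torch.randn(20, 32), torch.randint(0, 4, (20,)))
           for _ in range(3)]
fedavg(server, [local_update(server, d) for d in clients])
```

Much of the MMFL literature reviewed in this survey can be read as variations on this loop, e.g., replacing the fusion strategy, personalizing the per-modality encoders, or handling clients whose modalities are missing or heterogeneous.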
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62376151) and the Science and Technology Commission of Shanghai Municipality (Grant No. 22DZ2205600).
Author information
Contributions
H.P. wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest that are relevant to the content of this article.
Additional information
Communicated by Bing-kun Bao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pan, H., Zhao, X., He, L. et al. A survey of multimodal federated learning: background, applications, and perspectives. Multimedia Systems 30, 222 (2024). https://doi.org/10.1007/s00530-024-01422-9