Computer Science > Machine Learning

arXiv:2102.02079 (cs)

[Submitted on 3 Feb 2021 (v1), last revised 28 Oct 2021 (this version, v4)]

Title: Federated Learning on Non-IID Data Silos: An Experimental Study

Authors: Qinbin Li, Yiqun Diao, Quan Chen, Bingsheng He

Abstract: Due to increasing privacy concerns and data regulations, training data have become increasingly fragmented, forming distributed databases of multiple "data silos" (e.g., within different organizations and countries). To develop effective machine learning services, it is necessary to exploit data from such distributed databases without exchanging the raw data. Recently, federated learning (FL) has attracted growing interest as a solution, since it enables multiple parties to collaboratively train a machine learning model without exchanging their local data. A key and common challenge on distributed databases is the heterogeneity of the data distribution among the parties: the data of different parties are usually not independent and identically distributed (i.e., non-IID). Many FL algorithms have been proposed to address learning effectiveness under non-IID data settings. However, an experimental study that systematically examines their advantages and disadvantages is lacking, as previous studies use very rigid data partitioning strategies among parties, which are hardly representative or thorough. In this paper, to help researchers better understand and study the non-IID data setting in federated learning, we propose comprehensive data partitioning strategies to cover the typical non-IID data cases. Moreover, we conduct extensive experiments to evaluate state-of-the-art FL algorithms. We find that non-IID data does bring significant challenges to the learning accuracy of FL algorithms, and that none of the existing state-of-the-art FL algorithms outperforms the others in all cases. Our experiments provide insights for future studies addressing the challenges in "data silos".

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2102.02079 [cs.LG]
	(or arXiv:2102.02079v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.02079

Submission history

From: Qinbin Li
[v1] Wed, 3 Feb 2021 14:29:09 UTC (823 KB)
[v2] Thu, 4 Feb 2021 06:45:28 UTC (823 KB)
[v3] Thu, 22 Jul 2021 14:01:16 UTC (844 KB)
[v4] Thu, 28 Oct 2021 15:22:21 UTC (1,199 KB)
