Systematic Review of Privacy-Preserving Distributed Machine Learning From Federated Databases in Health Care
Big data for health care is one of the potential solutions to deal with the numerous challenges of health care, such as rising cost, aging population, precision medicine, universal health coverage, and the increase of noncommunicable diseases. However, data centralization for big data raises privacy and regulatory concerns.
Covered topics include (1) an introduction to privacy of patient data and distributed learning as a poten-
tial solution to preserving these data, a description of the legal context for patient data research, and
a definition of machine/deep learning concepts; (2) a presentation of the adopted review protocol; (3)
a presentation of the search results; and (4) a discussion of the findings, limitations of the review, and future
perspectives.
Distributed learning from federated databases makes data centralization unnecessary. Distributed algorithms
iteratively analyze separate databases, essentially sharing research questions and answers between databases
instead of sharing the data. In other words, one can learn from separate and isolated datasets without patient
data ever leaving the individual clinical institutes.
Distributed learning holds great potential to facilitate big data for medical applications, in particular for international consortia. Our purpose is to review the major implementations of distributed learning in health care.
JCO Clin Cancer Inform 4:184-200. © 2020 by American Society of Clinical Oncology
Licensed under the Creative Commons Attribution 4.0 License
Distributed Learning in Health Care
CONTEXT
Key Objective
Review the contribution of distributed learning to preserve data privacy in health care.
Knowledge Generated
Data in health care are greatly protected; therefore, accessing medical data is restricted by law and ethics. This restriction has
led to a change in research practice to adapt to new regulations. Distributed learning makes it possible to learn from medical
data without these data ever leaving the medical institutions.
Relevance
Distributed learning allows learning from medical data while guaranteeing preservation of patient privacy.
various decision trees. The results are then averaged. Each decision tree in the forest has access to a random set of the training data and chooses a class; the most selected class is then the predicted class.10

KNN: KNN can be used for regression problems; however, it is widely used for classification problems. In KNN, the assumption is that similar data elements are close to each other. Given K (a positive integer) and a test observation, KNN first groups the K elements closest to the test observation. Then, in the case of regression, it returns the mean of the K labels, or, in the case of classification, the mode of the K labels.10 Distributed version: yes.88,89

Abbreviations: KNN, K-nearest neighbors; MDP, Markov decision process; SVM, support vector machine.
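The KNN procedure summarized above can be sketched in a few lines. This is a minimal illustration with stdlib Python only; the function and variable names are ours, not taken from any of the reviewed studies:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k, task="classification"):
    """Predict for `query` from the K training points closest to it (Euclidean distance)."""
    # Rank all training points by distance to the test observation.
    neighbors = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))
    k_labels = [label for _, label in neighbors[:k]]
    if task == "regression":
        return sum(k_labels) / k                       # mean of the K labels
    return Counter(k_labels).most_common(1)[0][0]      # mode of the K labels

# Toy example: two well-separated clusters.
X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, (5.1, 5.1), k=3))  # "B"
```

The distributed variants cited above (refs 88, 89) differ mainly in where the neighbor search runs, not in this core prediction rule.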
222 healthy controls originating from four datasets were collected: two datasets from University Hospital Brno (Czech Republic), one from University Medical Center Utrecht (The Netherlands), and the last from the Prague Psychiatric Center and Psychiatric Hospital Bohnice. The data were segmented and standardized; the resulting performance was better than the performance of local models.
TABLE 2. Summary of Methods and Results of Distributed Machine Learning Studies Grouping More Than One Health Care Center (Continued)

Reference: Jochems63
Data and target: Clinical data from 698 patients with lung cancer, treated with curative intent with CRT or RT alone, were collected and stored in two medical institutes: MAASTRO (Netherlands) and Michigan University (United States). Target: prediction of NSCLC 2-year survival after radiation therapy.
Methods and distributed learning approach: Distributed learning for a Bayesian network using data from three hospitals. The model used the T category and N category, age, total tumor dose, and WHO performance for predictions.
Tools: Varian learning portal.
Accomplishments and results: AUC, 0.662. The discriminative performance of centralized and distributed models on the validation set was similar.

Reference: Brisimi65
Data and target: Electronic health records from Boston Medical Center of patients with at least one heart-related diagnosis between 2005 and 2010. The data are distributed between 10 hospitals.
Methods and distributed learning approach: Soft-margin l1-regularized sparse SVM classifier. Developed an iterative cPDS algorithm for solving the large-scale SVM problem in a decentralized manner.
Tools: Not provided.
Accomplishments and results: AUC, 0.56.

Abbreviations: AUC, area under the curve; BOA, Beyond Ontology Awareness; COBRA, Consortium for Brachytherapy Data Analysis; cPDS, cluster Primal Dual Splitting; CRT, chemoradiation; NSCLC, non–small-cell lung cancer; RT, radiotherapy; SVM, support vector machine.
FIG 2. Schematic representation of the processes in a transparent distributed learning network. (A) Data preparation steps. (B) Distributed learning network,
which is composed of three hospitals, each of which is equipped with a learning machine that can communicate with a master machine responsible for
sending model parameters and checking convergence criteria. (C) Flowchart of the distributed learning network described in B. (D) Example of an action that can be tracked by blockchain (designed and implemented according to needs agreed among network members), keeping all network participants aware of any new activity in the network. DB, database; FAIR, findable, accessible, interoperable, reusable.
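The master/hospital exchange described in Figure 2 can be illustrated with a minimal round-based averaging loop. This is a sketch under our own simplifying assumptions (a one-feature linear model, plain stochastic gradient descent, equal weighting of hospitals); all names are illustrative and do not come from the reviewed implementations:

```python
import random

def local_update(weights, data, lr=0.1, epochs=1):
    """One hospital: refine the master's parameters using local data only."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y      # linear-model residual on one record
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_round(master, hospital_datasets):
    """Master sends parameters out, then averages the returned local models."""
    locals_ = [local_update(master, d) for d in hospital_datasets]
    n = len(locals_)
    return (sum(w for w, _ in locals_) / n, sum(b for _, b in locals_) / n)

# Three hospitals hold disjoint samples of the same relation y = 2x + 1;
# the raw (x, y) records never leave the hospital that owns them.
random.seed(0)
hospitals = [[(x, 2 * x + 1) for x in (random.random() for _ in range(20))]
             for _ in range(3)]
master = (0.0, 0.0)
for _ in range(200):                   # iterate until a convergence criterion is met
    master = federated_round(master, hospitals)
print(master)                          # approaches (2.0, 1.0)
```

Only model parameters cross institutional boundaries in this loop, which is the essential privacy property the figure depicts; real networks add convergence checks, secure channels, and auditing on top.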
thickness estimation, and so on) of human brain magnetic resonance images.35 The results demonstrated performance improvement on the test datasets. Similar to the previous study, a brain tumor segmentation was successfully performed using distributed deep learning across 10 institutions (BraTS distribution).36

In the matter of distributed deep learning, the training weights are combined to train a final model, and the raw data are never exposed.35,37 In the case of sharing the local gradients,25 it might be possible to retrieve estimations of the original data from these gradients. Training the local models on batches may prevent retrieving all the data from the gradients, as these gradients correspond to single batches rather than all the local data.38 However, setting an optimal batch size needs to be considered25 to assure data safety and the model's ability to generalize.28,39,40

PRIVACY AND INTEGRATION OF DISTRIBUTED LEARNING NETWORKS

Privacy in a distributed learning network addresses three main areas: data privacy, the implemented model's privacy, and the model's output privacy. Data privacy is achieved by means of data anonymization and data never leaving the medical institutions. The distributed learning model can be secured by applying differential privacy techniques,41 preventing leakage of weights during the training, and cryptographic techniques.42 These cryptographic techniques provide a set of multiparty protocols that ensure security of the computations and communication. Once the model is ready, not only can the network participants use it to learn from their data, but this learning should be able to be performed locally and under highly private and secure conditions to protect the model's output.23

The users of a machine/deep learning model are not necessarily the model's developers. Hence, documentation and the integration of automated data eligibility tests have two important assets:
• The documentation ensures providing a clear view of what the model is designed for, a technical description of the model, and its use.
• The eligibility tests are important to ensure that correct input data are extracted and provided before executing the model. In euroCAT,23 a distributed learning expert installed quality control via data extraction pipelines at every participant point in the network. The pipeline automatically allowed data records fulfilling the model training eligibility criteria to be used in the training. The experts also tested the extraction pipeline thoroughly in addition to the machine learning testing. However, there were post-processing compensation methods to correct for the variations caused by using different local protocols.19

DISCUSSION

If one examines oncology, for instance, cancer is clearly one of the greatest challenges facing health care. More than 16 million new cancer cases were reported in 2017 alone.43 This number climbed to 18.1 million cases in 2018.44 This
increasing number of cancer incidences45 means that there are undoubtedly sufficient data worldwide to put machine/deep learning to meaningful work. However, as highlighted earlier, this requires access to the data and, as also highlighted earlier, distributed learning enables this in a manner that resolves legal and ethical concerns. Nonetheless, integration of distributed learning into health care is much slower compared with other fields, which raises the question of why this should be. Here, we summarize a set of methodologies to facilitate the adoption of distributed learning and provide future directions.

CURRENT STATE OF MEDICAL DATA STORAGE AND PREPROCESSING

Information Communication Technology

Every hospital has its own storage devices and architecture.38,39 In this case, the information communication technology preparation for distributed learning requires significant energy, time, and manpower, which can be costly. This same process (data acquisition and preprocessing) needs to be repeated for each participating hospital,46-48 and medical data standardization protocols subsequently need to be developed and adopted for this implementation process.

Make the Data Readable: Findable, Accessible, Interoperable, Reusable Data Principles

One way to enable a virtuous circle network effect is to embrace another community engaged in synergistic activities (joining a distributed learning network is worthwhile if it links to another large network). The Findable, Accessible, Interoperable, Reusable (FAIR) Guiding Principles for data management and stewardship have gained substantial interest, but delivering scientific protocols and workflows that are aligned with these principles remains a significant challenge.49 A description of FAIR principles is represented in Figure 3. Technological solutions are urgently needed that will enable researchers to explore, consume, and produce FAIR data in a reliable and efficient manner, to publish and reuse computational workflows, and to define and share scientific protocols as workflow templates.50 Such solutions will address emerging concerns about the nonreproducibility of scientific research, particularly in data science (eg, poorly published data, incomplete workflow descriptions, limited ability to perform meta-analyses, and an overall lack of reproducibility).51,52 Because workflows are fundamental to research activities, FAIR has broad applicability, which is vital in the context of distributed learning with medical data.

FIG 3. Description of findable, accessible, interoperable, reusable (FAIR) principles. Findable: descriptive metadata; persistent identifiers. Accessible: specify what to share; risk management; participant consent management; access status. Interoperable: XML standards, including data documentation. Reusable: rights and license management; usage standards definition (what can and cannot be used).

WHY NOT PUBLICLY SHARE MEDICAL DATA?

Some studies have tried to facilitate and secure data-sharing procedures to encourage researchers and organizations to publicly share their data and embrace transparency,53 by proposing data-sharing procedures and protocols aiming to harmonize regulatory frameworks and research governance.54,55 Despite the efforts made toward data-sharing globalization, the sociocultural issues surrounding data sharing remain pertinent.56 Large clinical trials also face limitations in data collection capabilities because of limited data storage capacities and manpower. To retrospectively perform additional analyses, all the participating centers need to be contacted again, which is time consuming and delays research.57

Furthermore, medical institutions prefer not to share patient data to ensure privacy protection.58 This is, of course, in no small part about ensuring the trust and confidence of patients, who display a wide range of sensitivities toward the use of their personal data.

ORGANIZATIONAL CHANGE MANAGEMENT

The adoption of distributed learning will require a change in organizational management (such as making use of the newest data standardization techniques and adapting the roles of employees to more technically oriented tasks, such as data retrieval). Provided knowledge and understanding of proper change management concepts, health care providers can implement the latter successfully.59 Change management principles, such as defining a global vision, networking, and continuous communication, could facilitate the integration of new technologies and build up clinical capabilities. However, this process of change management can be complicated, because it requires the involvement of multiple health care centers from different countries and continents. This diversity can trigger a fear of loss (one of the major factors of financial decision making), which stems from differences of opinion and regulation,60 and the absence of data standardization, making the processes of data acquisition and preprocessing harder. In addition, the lack of knowledge about the new technology leads to resistance to accepting change and innovation.60,61 Therefore, it is important to help health care organizations understand the need for distributed learning by explaining the context of the change in terms of traditional ways of learning to distributed learning and
monetizing clinical research and data (giving patients the choice to share), processing claims, detecting fraud, and managing prescriptions (replacing incorrect and outdated data). In addition to the above-mentioned uses of blockchain, it has also been used to maintain security and scalability of clinical data sharing,73 secure medical record sharing,74 prevent drug counterfeiting,75 and secure a patient's location.76

It is essential that the use of distributed machine/deep learning and blockchain be harmonized with the available security-preserving technologies (ie, continuous development and cybersecurity), which begins at the user level (using strong passwords, connecting only through trusted networks, and so on) and ends with more complex information technology infrastructures (such as data anonymization and user ID encryption).77 Cybersecurity is a key aspect in preserving privacy and ensuring safety and trust among patients and health care systems.78 The continuous development, or postmarketing surveillance, can be seen as the set of checks and integrations that should occur when a distributed learning network is launched. This practice should make it possible to identify any weak security measures in the network or non-up-to-date features that may require re-implementation.79,80

The distributed learning and blockchain technologies presented here show that there are emerging data science solutions that begin to meet the concerns and shortcomings of the law. The problems of re-identification are greatly reduced and managed through these technologies. Clearly, there are conceptual issues of understanding the impact of these technologies on privacy, and the relationship between privacy and confidentiality, but there are significant technical developments for the regulators to consider that could answer a number of their concerns.

SUMMARY

Currently, a combination of regulations and ethics makes it difficult to share data even for scientific research purposes. The issues relate to the legal basis for processing and anonymization. Specifically, there has been reluctance to move away from informed consent as the legal basis for processing toward processing in the public interest, and there are concerns about the re-identification of individuals where data are de-identified and then shared in aggregated environments. A solution could be to allow researchers to train their machine learning programs without the data ever having to leave the clinics, which in this paper we have established as distributed learning. This safe practice makes it possible to learn from medical data and can be applied across various medical disciplines. A limitation to its application, however, is that medical centers need to be convinced to participate in such practice, and regulators also need to know suitable safeguards have been established. Moreover, as can be seen in Table 2, even with the use of distributed learning, the size of the data pool learned from remains rather small. In the future, the integration of blockchain technology into distributed learning networks could be considered, as it ensures transparency and traceability while following FAIR data principles and can facilitate the implementation of distributed learning.
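The traceability that blockchain would add to a distributed learning network rests on chaining cryptographically hashed records of network events. A minimal, self-contained hash-chain sketch follows; it is illustrative only (event strings and function names are ours) and is not the design of any system cited in this review:

```python
import hashlib
import json

def add_block(chain, event):
    """Append an event; each block hashes the previous one, so tampering is evident."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"event": event, "prev": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return chain

def verify(chain):
    """Recompute every hash; any edited block breaks the links that follow it."""
    prev = "0" * 64
    for block in chain:
        body = {"event": block["event"], "prev": block["prev"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True

chain = []
add_block(chain, "Hospital H1 requests to join the network")
add_block(chain, "Network approves H1; training restarted with the extra data")
print(verify(chain))            # True
chain[0]["event"] = "tampered"  # rewriting history ...
print(verify(chain))            # ... is detected: False
```

Real deployments add distributed consensus and access control on top of this chaining, but the transparency and traceability properties discussed in the summary derive from exactly this linkage of hashes.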
REFERENCES
1. Mitchell TM: Machine Learning International ed., [Reprint.]. New York, NY, McGraw-Hill, 1997
2. Boyd S, Parikh N, Chu E, et al: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in
Machine Learning 3:1-122, 2010
3. Cardoso I, Almeida E, Allende-Cid H, et al: Analysis of machine learning algorithms for diagnosis of diffuse lung diseases. Methods Inf Med 57:272-279, 2018
4. Wang X, Peng Y, Lu L, et al: ChestX-Ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of
common thorax diseases. Presented at 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, July 21-26, 2017
5. Ding Y, Sohn JH, Kawczynski MG, et al: A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG PET of the brain. Radiology 290:456-464, 2019
6. Emmert-Streib F, Dehmer M: A machine learning perspective on personalized medicine: An automized, comprehensive knowledge base with ontology for
pattern recognition. Mach Learn Knowl Extr 1:149-156, 2018
7. Deist TM, Dankers FJWM, Valdes G, et al: Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys 45:3449-3459, 2018
8. Lambin P, van Stiphout RG, Starmans MH, et al: Predicting outcomes in radiation oncology multifactorial decision support systems. Nat Rev Clin Oncol
10:27-40, 2013
9. Wang S, Summers RM: Machine learning and radiology. Med Image Anal 16:933-951, 2012
10. James G, Witten D, Hastie T, et al: An introduction to statistical learning: With applications in R. New York, NY, Springer, 2017
11. Sutton RS, Barto AG: Reinforcement Learning: An Introduction. https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
12. Deng L: Deep learning: Methods and applications. Foundations and Trends in Signal Processing 7:197-387, 2014
13. LeCun Y, Bengio Y, Hinton G: Deep learning. Nature 521:436-444, 2015
14. Garling C: Andrew Ng: Why ‘deep learning’ is a mandate for humans, not just machines. Wired 2015. https://www.wired.com/brandlab/2015/05/andrew-ng-
deep-learning-mandate-humans-not-just-machines/
15. Pesapane F, Codari M, Sardanelli F: Artificial intelligence in medical imaging: Threat or opportunity? Radiologists again at the forefront of innovation in
medicine. Eur Radiol Exp 2:35, 2018
16. Liberati A, Altman DG, Tetzlaff J, et al: The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care
interventions: Explanation and elaboration. PLoS Med 6:e1000100, 2009
16a. Intersoft Consulting: General Data Protection Regulation: Recitals. https://gdpr-info.eu/recitals/no-26/
17. MAASTRO Clinic: euroCAT: Distributed learning. https://youtu.be/nQpqMIuHyOk
18. Rennock MJW, Cohn A, Butcher JR: Blockchain technology and regulatory investigations. https://www.steptoe.com/images/content/1/7/v2/171967/LIT-
FebMar18-Feature-Blockchain.pdf
19. Orlhac F, Frouin F, Nioche C, et al: Validation of a method to compensate multicenter effects affecting CT radiomics. Radiology 291:53-59, 2019
20. Goodfellow I, Bengio Y, Courville A: Deep Learning. https://www.deeplearningbook.org/
21. Lambin P, Roelofs E, Reymen B, et al: Rapid Learning health care in oncology - an approach towards decision support systems enabling customised
radiotherapy. Radiother Oncol 109:159-164, 2013
22. Lustberg T, van Soest J, Jochems A, et al: Big Data in radiation therapy: Challenges and opportunities. Br J Radiol 90:20160689, 2017
23. Deist TM, Jochems A, van Soest J, et al: Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care:
euroCAT. Clin Transl Radiat Oncol 4:24-31, 2017
24. Price G, van Herk M, Faivre-Finn C: Data mining in oncology: The ukCAT project and the practicalities of working with routine patient data. Clin Oncol (R Coll
Radiol) 29:814-817, 2017
25. Dean J, Corrado G, Monga R, et al: Large Scale Distributed Deep Networks. Advances in Neural Information Processing Systems 25, 2012, 1223-1231. https://
papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012
26. Cireşan D, Meier U, Schmidhuber J: Multi-column deep neural networks for image classification. http://arxiv.org/abs/1202.2745
27. Radiuk PM: Impact of training set batch size on the performance of convolutional neural networks for diverse datasets. Information Technology and
Management Science 20:20-24, 2017
28. Keskar NS, Mudigere D, Nocedal J, et al: On large-batch training for deep learning: generalization gap and sharp minima. http://arxiv.org/abs/1609.04836
29. Papernot N, Abadi M, Erlingsson Ú, et al: Semi-supervised knowledge transfer for deep learning from private training data. http://arxiv.org/abs/1610.05755
30. Shokri R, Shmatikov V: Privacy-preserving deep learning, in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security -
CCS ’15. Denver, Colorado, ACM Press, 2015, pp 1310-1321.
31. Predd JB, Kulkarni SB, Poor HV: Distributed learning in wireless sensor networks. IEEE Signal Process Mag 23:56-69, 2006
32. Ji X, Hou C, Hou Y, et al: A distributed learning method for l1-regularized kernel machine over wireless sensor networks. Sensors (Basel) 16:1021, 2016
33. Chang K, Balachandar N, Lam C, et al: Distributed deep learning networks among institutions for medical imaging. J Am Med Inform Assoc 25:945-954, 2018
34. McClure P, Zheng CY, Kaczmarzyk J, et al: Distributed Weight Consolidation: A Brain Segmentation Case Study. https://arxiv.org/abs/1805.10863
35. FreeSurferWiki: FreeSurfer. http://freesurfer.net/fswiki/FreeSurferWiki
36. Sheller MJ, Reina GA, Edwards B, et al: Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation.
http://arxiv.org/abs/1810.04304
37. Li W, Milletarı̀ F, Xu D, et al: Privacy-preserving federated brain tumour segmentation. http://arxiv.org/abs/1910.00962
38. Abadi M, Chu A, Goodfellow I, et al: Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Com-
munications Security – CCS’16. 308-318, 2016
39. Mishkin D, Sergievskiy N, Matas J: Systematic evaluation of convolution neural network advances on the Imagenet. Comput Vis Image Underst 161:11-19,
2017
40. Lin T, Stich SU, Patel KK, et al: Don’t use large mini-batches, use local SGD. http://arxiv.org/abs/1808.07217
41. Biryukov A, De Cannière C, Winkler WE, et al: Discretionary access control policies (DAC), in van Tilborg HCA, Jajodia S (eds): Encyclopedia of Cryptography
and Security. Boston, MA, Springer, 2011, pp 356-358
42. Pinkas B: Cryptographic techniques for privacy-preserving data mining. SIGKDD Explor 4:12-19, 2002
43. Siegel RL, Miller KD, Jemal A: Cancer statistics, 2017. CA Cancer J Clin 67:7-30, 2017
44. Bray F, Ferlay J, Soerjomataram I, et al: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185
countries. CA Cancer J Clin 68:394-424, 2018
45. Siegel R, DeSantis C, Virgo K, et al: Cancer treatment and survivorship statistics, 2012. CA Cancer J Clin 62:220-241, 2012
46. Shortliffe EH, Barnett GO: Medical data: Their acquisition, storage, and use, in Shortliffe EH, Perreault LE (eds): Medical Informatics. New York, NY, Springer,
2001, pp 41-75
47. Shabani M, Vears D, Borry P: Raw genomic data: Storage, access, and sharing. Trends Genet 34:8-10, 2018
48. Langer SG: Challenges for data storage in medical imaging research. J Digit Imaging 24:203-207, 2011
49. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018, 2016
50. Wilkinson MD, Sansone S-A, Schultes E, et al: A design framework and exemplar metrics for FAIRness. Sci Data 5:180118, 2018
51. Dumontier M, Gray AJG, Marshall MS, et al: The health care and life sciences community profile for dataset descriptions. PeerJ 4:e2331, 2016
52. Jagodnik KM, Koplev S, Jenkins SL, et al: Developing a framework for digital objects in the Big Data to Knowledge (BD2K) commons: Report from the
Commons Framework Pilots workshop. J Biomed Inform 71:49-57, 2017
53. Polanin JR, Terzian M: A data-sharing agreement helps to increase researchers’ willingness to share primary data: Results from a randomized controlled trial. J
Clin Epidemiol 106:60-69, 2018
54. Azzariti DR, Riggs ER, Niehaus A, et al: Points to consider for sharing variant-level information from clinical genetic testing with ClinVar. Cold Spring Harb Mol
Case Stud 4:a002345, 2018
55. Boué S, Byrne M, Hayes AW, et al: Embracing transparency through data sharing. Int J Toxicol 10.1177/1091581818803880
56. Poline J-B, Breeze JL, Ghosh S, et al: Data sharing in neuroimaging research. Front Neuroinform 6:9, 2012
57. Cutts FT, Enwere G, Zaman SMA, et al: Operational challenges in large clinical trials: Examples and lessons learned from the Gambia pneumococcal vaccine trial. PLoS Clin Trials 1:e16, 2006
58. Xia W, Wan Z, Yin Z, et al: It’s all in the timing: Calibrating temporal penalties for biomedical data sharing. J Am Med Inform Assoc 25:25-31, 2018
59. Fleishon H, Muroff LR, Patel SS: Change management for radiologists. J Am Coll Radiol 14:1229-1233, 2017
60. Delaney R, D’Agostino R: The challenges of integrating new technology into an organization. https://digitalcommons.lasalle.edu/cgi/viewcontent.cgi?
article=1024&context=mathcompcapstones
61. Agboola A, Salawu R: Managing deviant behavior and resistance to change. Int J Bus Manage 6:235, 2010
62. Jochems A, Deist TM, van Soest J, et al: Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the
hospital - A real life proof of concept. Radiother Oncol 121:459-467, 2016
63. Jochems A, Deist TM, El Naqa I, et al: Developing and validating a survival prediction model for NSCLC patients through distributed learning across 3 countries.
Int J Radiat Oncol Biol Phys 99:344-352, 2017
63a. Deist TM, Dankers FJWM, Ojha P, et al: Distributed learning on 20 000+ lung cancer patients - The Personal Health Train. Radiother Oncol 144:189-200,
2020
64. Tagliaferri L, Gobitti C, Colloca GF, et al: A new standardized data collection system for interdisciplinary thyroid cancer management: Thyroid COBRA. Eur
J Intern Med 53:73-78, 2018
65. Brisimi TS, Chen R, Mela T, et al: Federated learning of predictive models from federated Electronic Health Records. Int J Med Inform 112:59-67, 2018
66. Dluhoš P, Schwarz D, Cahn W, et al: Multi-center machine learning in imaging psychiatry: A meta-model approach. Neuroimage 155:10-24, 2017
67. Dhillon V, Metcalf D, Hooper M: Blockchain in health care, in Dhillon V, Metcalf D, Hooper M (eds): Blockchain Enabled Applications: Understand the
Blockchain Ecosystem and How to Make it Work for You. Berkeley, CA, Apress, 2017, pp 125-138
68. Lugan S, Desbordes P, Tormo LXR, et al: Secure architectures implementing trusted coalitions for blockchained distributed learning (TCLearn). http://arxiv.
org/abs/1906.07690
69. Nakamoto S: Bitcoin: A peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf
70. Gordon WJ, Catalini C: Blockchain technology for healthcare: Facilitating the transition to patient-driven interoperability. Comput Struct Biotechnol J
16:224-230, 2018
71. Kamel Boulos MN, Wilson JT, Clauson KA: Geospatial blockchain: Promises, challenges, and scenarios in health and healthcare. Int J Health Geogr 17:25, 2018
72. Pirtle C, Ehrenfeld J: Blockchain for healthcare: The next generation of medical records? J Med Syst 42:172, 2018
73. Zhang P, White J, Schmidt DC, et al: FHIRChain: Applying blockchain to securely and scalably share clinical data. Comput Struct Biotechnol J 16:267-278,
2018
74. Dubovitskaya A, Xu Z, Ryu S, et al: Secure and trustable electronic medical records sharing using blockchain. AMIA Annu Symp Proc 2017:650-659, 2018
75. Vruddhula S: Application of on-dose identification and blockchain to prevent drug counterfeiting. Pathog Glob Health 112:161, 2018
76. Ji Y, Zhang J, Ma J, et al: BMPLS: Blockchain-based multi-level privacy-preserving location sharing scheme for telecare medical information systems. J Med
Syst 42:147, 2018
77. Coventry L, Branley D: Cybersecurity in healthcare: A narrative review of trends, threats and ways forward. Maturitas 113:48-52, 2018
78. Jalali MS, Kaiser JP: Cybersecurity in hospitals: A systematic, organizational perspective. J Med Internet Res 20:e10059, 2018
79. Vlahović-Palčevski V, Mentzer D: Postmarketing surveillance, in Seyberth HW, Rane A, Schwab M (eds): Pediatric Clinical Pharmacology. Berlin, Springer,
2011, pp 339-351
80. Parkash R, Thibault B, Philippon F, et al: Canadian Registry of Implantable Electronic Device outcomes: Surveillance of high-voltage leads. Can J Cardiol
34:808-811, 2018
81. Ing EB, Ing R: The use of a nomogram to visually interpret a logistic regression prediction model for giant cell arteritis. Neuroophthalmology 42:284-286, 2018
82. Tirzīte M, Bukovskis M, Strazda G, et al: Detection of lung cancer with electronic nose and logistic regression analysis. J Breath Res 13: 016006, 2018
83. Ji Z, Jiang X, Wang S, et al: Differentially private distributed logistic regression using private and public data. BMC Med Genomics 7:S14, 2014 (suppl 1)
84. Jiang W, Li P, Wang S, et al: WebGLORE: A web service for Grid LOgistic REgression. Bioinformatics 29:3238-3240, 2013
85. Wang S, Jiang X, Wu Y, et al: EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed privacy-preserving online model learning. J Biomed
Inform 46:480-496, 2013
86. Desai A, Chaudhary S: Distributed decision tree. Proceedings of the Ninth Annual ACM India Conference, Gandhinagar, India, ACM Press, 2016, pp 43-50
87. Caragea D, Silvescu A, Honavar V: Decision tree induction from distributed heterogeneous autonomous data sources, in Abraham A, Franke K, Köppen M
(eds): Intelligent Systems Design and Applications. Berlin, Springer, 2003, pp 341-350
88. Plaku E, Kavraki LE: Distributed computation of the knn graph for large high-dimensional point sets. J Parallel Distrib Comput 67:346-359, 2007
89. Xiong L, Chitti S, Liu L: Mining multiple private databases using a kNN classifier, in Proceedings of the 2007 ACM symposium on Applied computing – SAC ’07.
Seoul, Korea, ACM Press, 2007, p 435
90. Huang Z: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2:283-304, 1998
91. Jagannathan G, Wright RN: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data, in Proceeding of the eleventh ACM SIGKDD
international conference on Knowledge discovery in data mining – KDD ’05. Chicago, Illinois, USA, ACM Press, 2005, p 593
92. Jin R, Goswami A, Agrawal G: Fast and exact out-of-core and distributed k-means clustering. Knowl Inf Syst 10:17-40, 2006
93. Jagannathan G, Pillaipakkamnatt K, Wright RN: A new privacy-preserving distributed k -clustering algorithm, in Proceedings of the 2006 SIAM International
Conference on Data Mining. Society for Industrial and Applied Mathematics, 2006, pp 494-498
94. Ye Y, Chiang C-C: A parallel apriori algorithm for frequent itemsets mining, in Fourth International Conference on Software Engineering Research, Management
and Applications (SERA’06). Seattle, WA, IEEE, 2006, pp 87-94
95. Cheung DW, Ng VT, Fu AW, et al: Efficient mining of association rules in distributed databases. IEEE Trans Knowl Data Eng 8:911-922, 1996
96. Bellman R: A Markovian decision process. Indiana Univ Math J 6:679-684, 1957
97. Puterman ML: Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY, John Wiley & Sons, 2014
98. Watkins CJCH, Dayan P: Q-learning. Mach Learn 8:279-292, 1992
99. Lauer M, Riedmiller M: An algorithm for distributed reinforcement learning in cooperative multi-agent systems, in Proceedings of the Seventeenth International
Conference on Machine Learning. Burlington, MA, Morgan Kaufmann, 2000, pp 535-542. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.772
APPENDIX
Records identified through database searching (n = 127); additional records identified through other sources (n = 0). Studies included in qualitative synthesis (n = 6); studies included in quantitative synthesis (meta-analysis) (n = 6).
FIG A1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 flow diagram.
Abbreviations: N/A, not applicable; PICOS, participants, interventions, comparisons, outcomes, and study design; PRISMA, Preferred
Reporting Items for Systematic Reviews and Meta-Analyses.