King’s College London and Nvidia researchers say they have developed a way of training a deep neural network that allows training data — including sensitive medical information — to be distributed across multiple locations.
The new technique, the two said, makes it possible for organisations to collaborate on a shared model, without needing to directly share any clinical data.
The federated learning system for medical image analysis lets each participating institution’s data be kept on premises and secure.
Federated learning involves creating a central global server that sends a training algorithm to each medical centre taking part in the project.
These institutions train the model on their private dataset, before sending it back to be aggregated by the central server.
The research collaboration has resulted in a paper that will be presented today at MICCAI, an international medical image computing conference.
The researchers see the model as a medical “breakthrough” in AI-informed healthcare, one that could open up the vast data stores held by medical institutions.
In that paper the researchers claim: “Federated learning allows collaborative and decentralized training of neural networks without sharing the patient data. Each node trains its own local model and, periodically, submits it to a parameter server.
“The server accumulates and aggregates the individual contributions to yield a global model, which is then shared with all nodes.”
The dataset used by King’s College for the paper was taken from BraTS 2018, which contains MRI scans of 285 patients with brain tumors.
Federated Learning
A key problem in creating an AI algorithm tasked with reaching medical diagnoses and improving healthcare outcomes is that medical datasets are often hard to obtain. Tight rules and regulations on sharing patient data have resulted in a highly siloed data landscape: the antithesis of what machine learning and AI need, since both require large, standardised datasets to train models on.
Under the federated model developed by King’s College and Nvidia, each medical centre gets a client that runs the training model on its own dataset. A central server sends a machine learning algorithm out in a container to each institution in the project, and each institution then trains the model on the siloed data it holds.
Once the model has been trained on all the locally available cases, it is sent back to the central server, where the algorithm combines all of the learnt parameters to reach a consensus. This process is repeated until the AI model is accurate enough to be considered reliable.
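The train-locally, aggregate-centrally loop described above can be illustrated with a minimal Python sketch. This is not the researchers' implementation: it assumes a toy linear model in place of a deep neural network, synthetic data in place of MRI scans, and a plain sample-weighted average as the aggregation step. All function names (`local_train`, `aggregate`, `federated_training`, `make_sites`) are hypothetical.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one site's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def aggregate(client_weights, client_sizes):
    """Average the client models, weighting each by its sample count (assumed rule)."""
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(np.stack(client_weights), axis=0, weights=sizes)

def federated_training(sites, n_features, rounds=50):
    """The server only ever sees model weights, never the sites' raw data."""
    global_w = np.zeros(n_features)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [local_train(global_w, X, y) for X, y in sites]
        # The server combines the returned weights into a new global model.
        global_w = aggregate(updates, [len(y) for _, y in sites])
    return global_w

def make_sites(true_w, sizes, seed=0):
    """Create synthetic private datasets, one (X, y) pair per hypothetical site."""
    rng = np.random.default_rng(seed)
    sites = []
    for n in sizes:
        X = rng.normal(size=(n, len(true_w)))
        y = X @ true_w + rng.normal(scale=0.01, size=n)
        sites.append((X, y))
    return sites

if __name__ == "__main__":
    true_w = np.array([2.0, -1.0, 0.5])
    sites = make_sites(true_w, sizes=[40, 60, 100])
    print(federated_training(sites, n_features=3))
```

In this sketch only the weight vectors cross the boundary between a site and the server, which is the property that keeps each institution's data on premises. A real deployment would also need secure transport, and possibly further privacy protections on the shared weights.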
This type of training method has been tested and used by Google to help train deep neural networks when the required data is spread across a massive number of clients or devices. Currently, a lot of machine learning is done in data centres: highly controlled environments that not only have large memory capacity but also run at high speeds with low latency. By contrast, training a model on data held on devices that are spread across multiple regions and have varying levels of connectivity would not suit a traditional learning model.
In a paper, Google researchers noted: “Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud.”