May 31, 2022 (updated 1 June 2022, 9:05am)

AI-generated brain scans highlight potential for synthetic healthcare data

NVIDIA and King's College London have released 100,000 AI-generated brain scans.

By Ryan Morrison

Researchers at King’s College London and chipmaker NVIDIA have released 100,000 synthetic 3D brain scans that were created using the Cambridge-1 supercomputer, as well as a framework for deep learning in healthcare imaging. The AI-generated images, which are intended for use by scientists researching neurological conditions, highlight the emerging use of synthetic data to bolster healthcare research.

Synthetic brain scan images
Researchers will be able to use the MONAI framework to generate synthetic medical images on demand (Photo: NVIDIA)

The synthetic MRI scans were developed using MONAI, a deep-learning framework for medical imaging that was launched by NVIDIA and King’s College London in 2020. The framework, built on the PyTorch deep learning library, provides tools for researchers to apply deep learning to medical images, from data labelling and analysis to application development.

The team used MONAI and the Cambridge-1 supercomputer to create an “AI factory for synthetic data”, running hundreds of experiments to find the best AI models with the most accurate data possible. Using this system, KCL researchers were able to generate 100,000 clinically accurate synthetic MRI scans in just 48 hours.

In the past, the application of AI to biomedical research has been hampered by a lack of suitable training data, said lead researcher Jorge Cardoso, reader in artificial medical intelligence at King’s College London and founding member of the MONAI consortium, in a statement. The MONAI project has created a “treasure trove” of images for researchers to use, he said.

An NVIDIA spokesperson told Tech Monitor that using the MONAI framework, Cambridge-1 will be able to create synthetic medical images “to order”. “Female brains, male brains, old ones, young ones, brains with or without disease – plug in what you need, and it creates them,” the spokesperson said. “Though they’re simulated, the images are highly useful because they preserve key biological characteristics, so they look and act like real brains would.”

The same technique could be used to simulate any kind of medical imaging, including CAT and PET scans, Cardoso said in a statement. “In fact, this technique can be applied to any volumetric image.”

“It can be a bit overwhelming,” he added. “There are so many different things we can start thinking about now.”


Synthetic data for healthcare research: pros and cons

As well as creating realistic training data for AI models, synthetic data can also help researchers skirt any privacy concerns about using patient records, explains Frederik Hvilshøj, machine learning engineer at computer vision data platform provider Encord.

“Due to data privacy and other restricting factors, the amount of available data within medicine has been scarce,” Hvilshøj told Tech Monitor. “As a consequence, model performance has also not been optimal.”

Synthetic data is also cheap and quick to produce, Hvilshøj added. “Imagine having to bring 100,000 people to an MRI to scan them one at a time. This would be super costly. With the synthetic generation of data, this can be done within a fraction of the time with a click of a button.”

There are downsides to using synthetic data, however, including the need to ensure it is realistic. This is a “hard problem to solve”, says Hvilshøj, and a costly one if a human needs to be kept in the loop to verify the data used to train models.

“This is not a trivial task. We also need to make sure that the data generation process has not overfitted to the training data, which would cause us to end up with close to identical samples as those in the training set.”
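The memorisation risk Hvilshøj describes can be checked mechanically: flag any synthetic sample that sits suspiciously close to a training sample. Below is a minimal sketch in plain Python, using toy feature vectors rather than real MRI volumes; the function names and the distance threshold are illustrative, not part of MONAI or the KCL pipeline.

```python
import math
import random

def min_distance_to_training(sample, training_set):
    """Smallest Euclidean distance from one synthetic sample to any training sample."""
    return min(math.dist(sample, real) for real in training_set)

def flag_memorised(synthetic, training_set, threshold):
    """Return indices of synthetic samples suspiciously close to the training data."""
    return [i for i, s in enumerate(synthetic)
            if min_distance_to_training(s, training_set) < threshold]

random.seed(0)
# Toy stand-ins for image feature vectors.
training = [[random.random() for _ in range(8)] for _ in range(50)]
# One near-copy of a training sample, plus five genuinely novel samples.
synthetic = [[v + random.gauss(0, 0.05) for v in training[0]]]
synthetic += [[random.random() for _ in range(8)] for _ in range(5)]

flagged = flag_memorised(synthetic, training, threshold=0.25)
print(flagged)  # index 0, the near-copy, is among the flagged samples
```

In practice the distance would be computed in a learned feature space rather than on raw voxels, but the gate is the same: near-duplicates of the training set are held back from release.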

Quality assurance is essential when using synthetic data to train AI systems, Hvilshøj explained, as it might otherwise lead to skewed results.

“The biggest risk is relying blindly on synthetic data. Without a proper quality assurance of the data, chances are that the downstream models trained on the data will be even more ‘rogue’.”
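A first line of the quality assurance Hvilshøj recommends can be as simple as comparing summary statistics of the synthetic set against the real data before any downstream model is trained on it. The helper below is a toy illustration of that gate, not a production metric; real imaging pipelines would use richer measures such as FID.

```python
import statistics

def distribution_check(real, synthetic, tolerance=0.1):
    """Crude QA gate: reject the synthetic set if its mean or spread
    drifts too far from the real data's."""
    drift_mean = abs(statistics.mean(real) - statistics.mean(synthetic))
    drift_std = abs(statistics.stdev(real) - statistics.stdev(synthetic))
    return drift_mean <= tolerance and drift_std <= tolerance

real = [0.0, 0.5, 1.0, 1.5, 2.0]
good = [0.1, 0.6, 0.9, 1.4, 2.1]   # tracks the real distribution
bad = [5.0, 5.1, 5.2, 5.3, 5.4]    # drifted: would skew downstream models

print(distribution_check(real, good))  # True
print(distribution_check(real, bad))   # False
```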

Synthetic data is being used to support AI development in a number of application areas. Last year, for example, the International Organisation for Migration (IOM) and Microsoft launched a synthetic dataset for human trafficking research.

Read more: Synthetic data may not be AI’s privacy silver bullet
