Researchers from Google and Stanford University have developed a machine learning model that shows huge potential to catch lung cancer cases early, outperforming the assessments of six radiologists – catching more cases and reducing false positives – after being trained on a dataset of 45,856 chest screens.
The model, detailed this week in a paper in the journal Nature Medicine, detected five percent more cancer cases while reducing false-positive exams by more than 11 percent compared to unassisted radiologists, and achieved an area under the curve (AUC) of 94.4 percent when tested on 6,716 cases. Google said it aims to commercialise the model via its Cloud Healthcare API, which it released in April.
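For readers unfamiliar with how figures like these are derived, the sketch below shows one common way to compute a detection rate (sensitivity), a false-positive rate, and an AUC from screening results. It is an illustration only: the class and function names are hypothetical, the counts and scores are invented, and none of it is taken from the paper or from Google's code.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Outcome counts for one reader (human or model) on a labelled test set."""
    true_positives: int   # cancers correctly flagged
    false_negatives: int  # cancers missed
    false_positives: int  # healthy scans wrongly flagged
    true_negatives: int   # healthy scans correctly cleared


def sensitivity(c: ScreeningCounts) -> float:
    """Share of actual cancer cases that were caught."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def false_positive_rate(c: ScreeningCounts) -> float:
    """Share of cancer-free cases that were wrongly flagged."""
    return c.false_positives / (c.false_positives + c.true_negatives)


def auc(cancer_scores: list[float], healthy_scores: list[float]) -> float:
    """Probability that a random cancer case gets a higher risk score than a
    random cancer-free case (ties count as half). Equivalent to the area
    under the ROC curve."""
    wins = 0.0
    for pos in cancer_scores:
        for neg in healthy_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(cancer_scores) * len(healthy_scores))


# Invented counts, chosen only to make the arithmetic easy to follow.
reader = ScreeningCounts(true_positives=80, false_negatives=20,
                         false_positives=150, true_negatives=850)
model = ScreeningCounts(true_positives=85, false_negatives=15,
                        false_positives=40, true_negatives=960)

print("reader sensitivity:", sensitivity(reader))
print("model sensitivity: ", sensitivity(model))
print("reader false-positive rate:", false_positive_rate(reader))
print("model false-positive rate: ", false_positive_rate(model))
```

Unlike sensitivity and false-positive rate, AUC does not depend on the threshold at which a scan is called suspicious, which is why it is often quoted as a single summary number for models of this kind.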
The research team, which also included participants from New York University-Langone Medical Center, Northwestern Medicine, and Palo Alto Veterans Affairs, trained the model using TensorFlow on three publicly available datasets and one proprietary dataset from Northwestern Medicine for the paper.
The result: a convolutional neural network model that can not only generate the overall lung cancer malignancy prediction, but also identify subtle malignant tissue in the lungs (lung nodules). The model can also factor in information from previous scans, which is useful in predicting lung cancer risk, as the growth rate of suspicious lung nodules can be indicative of malignancy, Google noted in an accompanying blog post.
Google technical lead Shravya Shetty said: “Radiologists typically look through hundreds of 2D images within a single CT scan and cancer can be minuscule and hard to spot. We created a model that can not only generate the overall lung cancer malignancy prediction (viewed in 3D volume) but also identify subtle malignant tissue in the lungs (lung nodules). The model can also factor in information from previous scans, useful in predicting lung cancer risk because the growth rate of suspicious lung nodules can be indicative of malignancy.”
She added: “These initial results are encouraging, but further studies will assess the impact and utility in clinical practice. We’re collaborating with Google Cloud Healthcare and Life Sciences team to serve this model through the Cloud Healthcare API and are in early conversations with partners around the world to continue additional clinical validation research and deployment.”
Google is not releasing the code used for training the models, saying it “has a large number of dependencies on internal tooling, infrastructure and hardware, and its release is therefore not feasible.”
The researchers added, however: “All experiments and implementation details are described in sufficient detail in the Methods section to allow independent replication with non-proprietary libraries.” Several major components of the work are available in open source repositories, they said, pointing to the public datasets, along with the TensorFlow machine learning framework and its Object Detection API.