Caution urged on medical AI

Published 28 June 2023

Associate Professor Deborah Apthorp from UNE’s School of Psychology, a specialist working to improve the diagnosis of Parkinson’s disease, is urging caution on the use of machine learning.

“AI has a lot of promise for use in medicine, but we need to be cautious about the premature adoption of AI models,” she said. “There is potential for AI to create a false sense of confidence and this may severely impact the health of patients.”

Assoc. Prof. Apthorp has been collaborating with colleagues from Canberra Hospital and the Our Health in Our Hands group at the Australian National University to test how effectively machine learning can detect Parkinson’s disease, the second most common neurological disease in Australia. Far from the “extraordinary” detection performance reported in some published studies, the researchers found that accuracy dropped by as much as 30% once inherent flaws in how the algorithms were trained and evaluated were accounted for.

“Any AI is only as good as the data it has been trained on; it’s a case of garbage in, garbage out,” Assoc. Prof. Apthorp said. “We have discovered inflationary effects in some of the medical data that AI is trained on, which means that it cannot generalise well for any given population. This can severely misrepresent the technology’s performance in the real world.”

In any research, hypotheses are tested on human samples that accurately represent relevant variables in the wider population – for example, gender, age, general health, smoking status and ethnicity. This allows researchers to make inferences about broader trends, a process known as generalisation.

In the case of Parkinson’s, which affects about 1% of the population over 60 years – some 80,000 Australians – the race is on to develop accurate and objective ways of diagnosing and tracking the severity of the disease. Clinicians currently rely largely on a patient’s neurological history and assessments of their motor skills, but misdiagnoses and delayed diagnoses are common.

In 90% of Parkinson’s patients, speech production is impaired, which makes voice recordings a useful way to chart the disease’s progress. So Assoc. Prof. Apthorp and her fellow researchers tested AI on two datasets of voice recordings, to determine how effectively the AI could distinguish between Parkinson’s patients and healthy people. The initial results were skewed by inflationary effects and therefore not indicative of real-world conditions.
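The article does not spell out exactly which flaw inflated the results, but one well-documented source of inflation in studies like this is splitting voice recordings record-wise rather than subject-wise: when a patient contributes several recordings and some land in the training set while others land in the test set, the model can score well simply by recognising that patient’s voice. The sketch below uses synthetic data and scikit-learn to illustrate the effect; the dataset, features and classifier are all illustrative assumptions, not the research team’s actual pipeline.

```python
# Illustrative sketch (synthetic data, not the study's pipeline): how
# record-wise cross-validation can inflate accuracy when each subject
# contributes several voice recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)

n_subjects = 40
recs_per_subject = 5

# Each subject has a stable "voice signature"; the genuine disease
# signal is deliberately weak compared with that signature.
subject_ids = np.repeat(np.arange(n_subjects), recs_per_subject)
labels = np.repeat(rng.integers(0, 2, n_subjects), recs_per_subject)
signatures = rng.normal(size=(n_subjects, 10))[subject_ids]
X = signatures + 0.3 * rng.normal(size=signatures.shape)
X[:, 0] += 0.2 * labels  # weak true disease effect

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Record-wise split: recordings from the same subject appear in both
# the training and test folds, so the model can "recognise the voice".
record_wise = cross_val_score(
    clf, X, labels, cv=KFold(5, shuffle=True, random_state=0)
).mean()

# Subject-wise split: all of a subject's recordings stay on one side
# of the split, so only the disease signal itself can help.
subject_wise = cross_val_score(
    clf, X, labels, cv=GroupKFold(5), groups=subject_ids
).mean()

print(f"record-wise accuracy:  {record_wise:.2f}")
print(f"subject-wise accuracy: {subject_wise:.2f}")
```

Under these assumptions the record-wise score is markedly higher than the subject-wise score, even though both evaluate the same model on the same data; only the subject-wise figure reflects performance on genuinely unseen patients.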

“This occurred because, in the study of Parkinson’s, AI can only rely on small amounts of published data, and that data is not truly representative of society,” Assoc. Prof. Apthorp said. “The composition of the datasets is not balanced. Ideally, the two samples being compared should be as similar as possible, except that one group has the disease and the other doesn’t. So while AI is an incredibly useful tool, we need to be careful about hyping its capabilities.”
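A simple way to see why an unbalanced comparison misleads a model is to check the demographics of the two groups before training. The figures below are invented for illustration: if patients are systematically older than controls, a classifier can score well by detecting age rather than disease.

```python
# Illustrative sketch (invented numbers, not the study's data): a quick
# demographic-balance check between patient and control samples.
from statistics import mean, stdev

patients_age = [71, 68, 74, 66, 70, 73, 69, 72]
controls_age = [52, 49, 55, 47, 58, 50, 54, 51]

def summarise(name, ages):
    # Report the mean and spread of ages for one group.
    print(f"{name}: mean age {mean(ages):.1f}, sd {stdev(ages):.1f}")

summarise("patients", patients_age)
summarise("controls", controls_age)

# A large gap between group means signals a confound: the model may
# learn "older voice" rather than "Parkinson's voice".
gap = mean(patients_age) - mean(controls_age)
print(f"mean age gap: {gap:.1f} years")
```

In a balanced dataset this gap would be close to zero for age and for every other variable that plausibly affects the voice, leaving disease status as the only systematic difference between the groups.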

Assoc. Prof. Apthorp believes AI could one day help to enhance human diagnostic abilities – in an earlier paper, the research team showed that it was possible to zoom in on aspects of voice recordings that were inaudible to an expert human ear – but there is still much that needs to happen before it is successfully deployed.

“More researchers need to publish their data, for one thing, to build a representative dataset,” she said. “Importantly, machine learning experts have to work with the medical experts who use the models and those who collect the data to ensure we minimise the inflationary effects. Then AI models must be robustly evaluated and randomised controlled trials conducted.

“I think AI has great potential to assist us in the early diagnosis of Parkinson’s and the development of early interventions. With carefully conceived data collection protocols developed in collaboration with both medical professionals and AI/machine learning specialists, we could achieve high-quality datasets that would lead to better algorithms and better translation to the real world.

“But, at the moment, it remains very blue-sky stuff. We still need skilled clinicians to inform how we are using machine learning and to critically evaluate its use.”
