Introduction

Parkinson’s disease (PD) is a common, chronic, progressive neurodegenerative movement disorder associated with the aggregation of abnormal α-synuclein in Lewy bodies and the loss of nigrostriatal dopaminergic neurons. The mechanism of neurodegeneration in PD is unclear, and there is currently no cure for PD. The most striking symptoms of PD are motor symptoms such as tremor, rigidity, bradykinesia, and postural instability. Patients with severe motor symptoms often have difficulty using their hands, standing, or walking because of tremor and stiff muscles, which severely affects their quality of life. In addition, non-motor symptoms, such as hyposmia/anosmia (reduced or lost sense of smell), autonomic dysfunction, and rapid eye movement (REM) sleep behavior disorder, usually emerge years before motor symptoms, but they may be mild and are often overlooked. The diagnosis of PD is challenging, e.g., in differentiating PD from essential tremor, drug-induced parkinsonism, and atypical parkinsonian disorders such as progressive supranuclear palsy (PSP), multiple system atrophy (MSA), and corticobasal degeneration (CBD). The error rate of a clinical diagnosis of PD is high. A meta-analysis reported that the error rate was 26.2% for nonexperts, and ranged from 16.1% (for initial diagnosis) to 20.4% (for follow-up diagnosis) for experts1. Using autopsy results to evaluate PD diagnoses, Hughes et al.2 found that the diagnostic error rate was around 24%. Further, using neuropathologic findings of PD as the gold standard, Adler et al.3 found that the accuracy of a clinical diagnosis of PD was only 26% in untreated or medication non-responsive subjects, 53% in medication-responsive early PD (duration shorter than 5 years), and >85% in medication-responsive and longer duration PD. The high error rate in PD diagnosis may be because: (1) clinical diagnoses of PD are mainly based on results of clinical tests and response to antiparkinsonian medication, and neuroimaging is used only as an aid in PD diagnosis, although the clinical utility of neuroimaging such as SPECT (single photon emission computed tomography) is high and results of dopamine transporter scan (DaTscan) lead to a modified diagnosis in one-third of the patients4; (2) currently, there are few reliable biomarkers for PD5; in particular, there is no in vivo imaging tool available to directly image the accumulation of α-synuclein aggregates or the spreading of Lewy bodies in the brain of a PD patient6.

Another challenge in the diagnosis of PD is early detection, because at early stages of PD, brain changes and symptoms are subtle. The brain regions that are most affected by PD are the basal ganglia and substantia nigra. Neurodegeneration of the basal ganglia and loss of dopaminergic neurons in the substantia nigra begin long before motor symptoms appear, and by the time motor symptoms emerge, 40–60% of nigral dopaminergic neurons and up to 80% of synaptic function are lost7,8. The period between the onset of neurodegeneration and the emergence of motor symptoms is called the prodromal (or pre-motor) stage, which might last from several years to decades9. Early neuroprotective treatment can slow down neurodegeneration progression and potentially prevent clinical PD symptoms from emerging9. Therefore, it is important to detect PD at early stages so that early neuroprotective treatment can be effective.

Clinical assessment and analysis of PD imaging are crucial for the diagnosis of PD. At early stages of PD, loss of neurons in the brain first occurs in the ventrolateral substantia nigra pars compacta, then extends to the posterior putamen, and then to more regions in the striatum. Progressive brain atrophy has been detected on structural MRI in PD, even at early PD stages10,11. In addition, due to the accumulation of abnormal α-synuclein aggregates in Lewy bodies and the spreading of Lewy bodies in the brain of PD patients over time (from the brain stem and olfactory system, to the substantia nigra and then to neocortical regions)12, PD may be viewed as a progressive brain network disruption13. Dopaminergic radiotracer imaging with SPECT or positron emission tomography (PET), a biomarker of early PD, can detect dopaminergic denervation in PD at the pre-motor stage9,14. However, dopaminergic denervation may elude visual analysis or semi-quantitative image analysis of SPECT or PET images. Consequently, there is variability in dopamine transporter SPECT imaging interpretation between radiologists, which leads to inconsistent diagnosis. Thus, computer-aided diagnosis (CAD) based on machine-learning methods has been developed to help detect dopaminergic denervation on SPECT15,16,17,18,19,20 and PET21,22,23,24 images, and to identify PD-related structural changes on MRI25,26,27,28 for early detection of PD. Moreover, resting-state functional MRI (rs-fMRI)29,30,31,32,33 and diffusion tensor imaging (DTI)34,35 have been used to identify abnormal functional and structural connectivity in PD. Further, machine-learning-based analysis of multi-modal data (including imaging and/or clinical data) has been found helpful in the detection of brain abnormalities in PD17,30,36,37.

Machine learning (ML), a group of multivariate analytic methods that learn from data, identify data patterns, and classify the data, is often used in data mining and artificial intelligence. Machine learning can be either supervised (using training data labeled by humans for data classification) or unsupervised (which does not use labeled training data, but identifies data patterns on its own). Supervised learning includes methods such as linear discriminant analysis (LDA)38, support vector machine (SVM)39, artificial neural networks (ANNs)40 and random forest41, while unsupervised learning includes approaches such as cluster analysis42. Among these methods, SVM and ANNs are frequently used machine-learning models. SVM creates a line (or a hyperplane) that best separates data into classes and provides a linear (or, with kernels, non-linear) mapping between inputs and outputs, while ANNs (consisting of multiple layers) work in a complex and non-linear way and do not provide a direct mapping between inputs and outputs. In addition, deep-learning techniques (deep neural networks), which build on ANNs, are a new set of powerful tools for data classification in PD18,24,43.
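
As a minimal illustration of the supervised setting described above, the following sketch trains a perceptron on a handful of labeled two-dimensional points and then classifies new points. A perceptron is a simpler linear classifier than an SVM: it also learns a separating line (hyperplane), but it does not maximize the margin, so it stands in for the general idea only. The feature values, the labels, and the function names are hypothetical toy choices and are not taken from any study reviewed here.

```python
# Toy supervised learning: a perceptron that learns a separating line.
# All numbers are made-up illustrative values, not patient data.

def train_perceptron(samples, labels, max_epochs=1000):
    """Learn weights w and bias b so that sign(w.x + b) matches the labels."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Two hypothetical features per subject; +1 and -1 are the two class labels.
SAMPLES = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5), (1.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(SAMPLES, LABELS)
print("weights:", w, "bias:", b)
print("new subject (2.5, 2.5) ->", predict(w, b, (2.5, 2.5)))
```

Because the toy classes are linearly separable, the training loop stops as soon as one full pass makes no mistakes. An unsupervised method would receive the same points without `LABELS` and would have to group them on its own.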

Since each PD case is unique, diagnosis and therapy need to be tailored for individual patients to achieve the best clinical outcome. Machine-learning techniques have the potential to identify complex data patterns, automate data analysis, and make inferences/classifications for data of individual patients, which may be useful for precision medicine in PD. In recent years, machine learning has been increasingly used in the diagnosis of PD. This paper reviews the studies that applied machine-learning methods to the diagnosis and early detection of PD in order to provide an overview of this field.

An overview of machine-learning-based studies for the diagnosis of PD

Studies that applied machine-learning-based approaches to the diagnosis of PD mainly fall into three categories: (1) discrimination between PD and healthy controls (HC); (2) differential diagnosis; and (3) early PD detection. Machine-learning-based imaging studies using SPECT, PET, structural MRI, and functional MRI (fMRI) are summarized in Table 1, Table 2, Table 3, and Table 4, respectively (Table 4 is a Supplementary Table).

Table 1 Machine-learning-based SPECT dopaminergic imaging studies for PD diagnosis and early detection.
Table 2 Machine-learning-based PET imaging studies for PD diagnosis and early detection.
Table 3 Machine-learning-based structural MRI studies for PD diagnosis and early detection.
Table 4 Machine-learning-based fMRI studies for PD diagnosis and early detection.

Machine-learning-based studies for the discrimination between PD and HC

Dopaminergic imaging

Reduced uptake of a dopamine transporter radiotracer in the striatum of a PD patient on dopaminergic imaging (SPECT or PET) indicates neuronal degeneration and dopaminergic deficit in PD. In particular, reduced uptake of [123I]FP-CIT ([123I]-ioflupane) (the most widely used dopamine transporter radiotracer in SPECT DaTscan imaging) in the striatum (putamen and caudate) helps confirm PD and exclude other disorders such as drug-induced parkinsonism and essential tremor. Measurements of dopamine transporter binding in the striatum and the distribution of the radiotracer uptake are important to characterize dopaminergic functional deficit in PD. Semi-quantitative analysis computes measurements of dopamine transporter binding in the striatum such as striatal uptake and striatal-binding ratios, but cannot capture the distribution of the radiotracer uptake (which is often perceived by experienced experts), while machine-learning methods such as artificial neural network (ANN) and support vector machine (SVM) can identify data patterns in the distribution of the radiotracer uptake on SPECT imaging.
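
For reference, a striatal-binding ratio of the kind used in semi-quantitative analysis is commonly computed as (target − reference) / reference. The sketch below assumes that mean counts have already been extracted from a striatal region and from a reference region with negligible dopamine transporter density (the occipital cortex is a common choice). The function name and the example numbers are hypothetical.

```python
# Semi-quantitative summary of a SPECT scan: one number per region.
# The counts below are illustrative values, not measurements.

def striatal_binding_ratio(target_mean_counts, reference_mean_counts):
    """Return (target - reference) / reference for one striatal region."""
    if reference_mean_counts <= 0:
        raise ValueError("reference region must have positive mean counts")
    return (target_mean_counts - reference_mean_counts) / reference_mean_counts


# Hypothetical mean counts: putamen region vs. reference region.
print(striatal_binding_ratio(300.0, 100.0))  # 2.0
```

The result is a single scalar per region, which is why such measures summarize how much tracer is bound but not how the uptake is distributed within the striatum.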

To test whether a machine-learning method can mimic expert pattern recognition skills, Acton and Newberg applied an ANN to striatum images obtained from dopaminergic SPECT imaging and obtained an overall diagnostic accuracy of 94.4% (n = 81)15. Illan et al.44 further developed an automatic computer-aided diagnostic system based on SVM (and other classifiers) for PD detection, and found that classification with SVM on striatum images performed the best (the area under the receiver-operating characteristic curve (AUC) was 0.968) (n = 108). In addition, Segovia et al.45 used partial least squares (PLS) for data dimension reduction of striatum images, classified the imaging features with SVM and obtained a classification accuracy of 94.7% (n = 95). Further, Palumbo et al.46 used SVM to classify the uptake values in the striatal regions (accuracy: 90.6–90.7%) and reported that uptake values in the putamen were the most discriminative predictor for PD diagnosis, and that adding patient age to data classification improved classification accuracy (95.6%) (n = 56).
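
The two performance measures quoted above can be made concrete with a short sketch. Accuracy is the fraction of subjects classified correctly, and the AUC equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case (ties count as one half). The function names, labels, and scores below are hypothetical and only illustrate the definitions.

```python
# Accuracy and AUC from their definitions, using made-up labels and scores.

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the reference labels."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one score per class")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


print(accuracy([1, 1, 0, 0], [1, 0, 0, 0]))    # 3 of 4 correct
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))   # 8 of 9 pairs ranked correctly
```

Unlike accuracy, the AUC does not depend on a particular decision threshold, which is one reason both figures appear in the literature.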

Comparative studies between machine-learning-based analysis and semi-quantitative analysis of SPECT images have shown that computer-aided diagnosis (CAD) based on machine-learning methods such as SVM and ANN has outperformed conventional semi-quantitative analysis, reduced interpretation variability of dopaminergic transporter SPECT imaging, and improved diagnostic accuracy of PD and consistency of radiologists15,19,20.

Further, new imaging features such as texture features have improved classification accuracy (e.g., 97.4%, n = 158)47, and recently developed machine-learning techniques such as deep-learning convolutional neural networks (CNNs) have begun to show promising results. For example, Choi et al.18 developed an automatic deep-learning system that applied CNNs to SPECT imaging analysis and obtained high detection rates of 96% (PPMI data, n = 431, early PD) and 98.8% (local data, n = 72, advanced PD), which were comparable to those of experts’ visual analysis and semi-quantitative analysis. The deep-learning system could also reclassify patients who were clinically diagnosed as PD, but had scans without evidence of dopaminergic deficit (SWEDD)18. Further, it has been reported that new classifiers such as an enhanced probabilistic neural network and a semi-supervised graph-based transductive learning classifier detected PD more accurately than SVM37,48. In addition, to overcome the limitations of institution-specific ML software implementations, Zhang and Kagen49 explored the widely available Google TensorFlow machine-learning software library and applied an artificial neural network to SPECT image classification for a large sample of PD patients (n = 1171), which yielded a classification accuracy of 93.8 ± 4.7%. Further, Glaab et al.21 performed voxel-based whole-brain analysis on FDOPA PET (n = 60 PD) and FDG PET (n = 44 PD) images, classified them with SVM and random forest models, and found that FDOPA PET had ~10% higher diagnostic performance than FDG PET. Using uptake features or texture features extracted from FDG PET data, classification yielded 70–91% accuracy21,22,23. Further research is warranted to validate and optimize these new data features and/or new machine-learning methods to improve classification accuracy and reliability.

Structural magnetic resonance imaging (MRI)

Morphometric measurements such as brain gray matter (GM) and white matter (WM) volumes, shapes, cortical thickness, and cortical surface area in regions of interest (ROIs) such as striatum have been used as imaging features to detect progressive brain atrophy in machine-learning-based MRI imaging analysis to aid in the diagnosis of PD.

Subcortical nuclei shape analysis has revealed volume differences in the putamen and shape differences in the striatum (putamen and caudate nucleus) between the PD patient group and the control group, and discriminant analysis using a combination of these imaging features discriminated individual patients from controls with an accuracy of 75–83% (n = 21)50. In another study, SVM was applied to combined MRI imaging features (GM, WM, cerebrospinal fluid (CSF) volumes, cortical thickness, cortical surface area, correlation index of cortical thickness of 78 ROIs), which distinguished patients from controls with an accuracy of 85.8% (n = 69)51. In addition, new MRI imaging features such as cerebellum shape index36 or GM density of the cerebellum52, together with appropriate classifiers, show promise for improving classification accuracy. For example, classifying GM density decrease in the Crus and Vermis of the cerebellum with SVM improved the classification accuracy to 97% (ref. 52), while classifying neuroimaging biomarkers such as cerebellum shape index, surface area, and volume of regions of interest (ROIs), as well as clinical data (e.g., UPDRS scores), with an AdaBoost classifier yielded a classification accuracy of up to 98.9% (ref. 36).

Functional MRI (fMRI)

Reduced functional connectivity (FC) and brain activity (measured by amplitude of low-frequency fluctuation (ALFF)) in the basal ganglia network (BGN) and sensorimotor network in PD have been reported53,54,55,56. These findings are consistent across multiple patient samples54,57, and robust to variations in image processing methods, which suggests that resting-state fMRI (rs-fMRI) might be a biomarker for PD.

However, there are some inconsistent findings across rs-fMRI studies in PD58,59,60,61,62. Machine-learning methods may help reveal the diagnostic value of rs-fMRI in PD and clarify some of the inconsistencies. rs-fMRI measurements such as FC, ALFF, and regional homogeneity (ReHo) have been used as imaging features for PD classification. For example, classifying FC, ALFF, and ReHo features with SVM yielded a classification accuracy of 74% (n = 19)30; using FC features from 12 brain networks, FC classification with SVM yielded 70% accuracy (n = 80)63, while using whole-brain FC features, classification with SVM achieved 93.6% accuracy (n = 21)64. Apart from variations across data samples, these results revealed the importance of feature selection and optimization in machine-learning-based rs-fMRI image analysis.

Multi-modal data

Since data from a single source or modality (e.g., SPECT) cannot fully capture all the key characteristics of the abnormalities of PD, multi-modal data (e.g., combined SPECT imaging and clinical data such as motor test scores) may help improve PD detection. Multi-modal data refers to data from different sources (such as imaging modalities: SPECT, PET, MRI, etc.; and/or clinical tests: motor test, cognitive test, etc.) measured on different scales. Clinical data that are often used for PD diagnosis include motor data and non-motor data of clinical examinations such as the Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale I, II, and III (MDS-UPDRS I, II, and III), Montreal Cognitive Assessment (MoCA), Scales for Outcomes in Parkinson’s Disease-Autonomic (SCOPA-AUT), and University of Pennsylvania Smell Identification Test. In addition, to identify biomarkers of PD progression, the Parkinson’s Progression Markers Initiative (PPMI)5, a comprehensive international multi-center study, collected multi-modal clinical data (motor data included MDS-UPDRS; non-motor data included cognitive testing such as MoCA, autonomic testing such as SCOPA-AUT total autonomic score, sleep disorder assessment, and olfactory assessment), imaging data (DaTscan and MRI), biospecimen data (blood, CSF, urine), and genetic data (DNA, RNA) of 400 PD patients and 200 healthy controls over 5 years, and made the data available online, which is a great data resource in the field5.
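
Because the modalities are measured on different scales, one simple way to combine them is to standardize each feature and then concatenate the per-subject feature vectors. The sketch below illustrates this idea only; it is not the fusion method of any particular study cited here, and the feature values and function names are hypothetical.

```python
# Feature-level fusion of two modalities measured on different scales.
# Toy numbers only: two imaging features and one clinical score per subject.
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) SD."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0 for value, (m, s) in zip(row, stats)]
        for row in rows
    ]


def fuse_modalities(*modalities):
    """Standardize every modality, then concatenate features per subject."""
    scaled = [zscore_columns(rows) for rows in modalities]
    return [sum(parts, []) for parts in zip(*scaled)]


imaging = [[2.1, 1.4], [0.9, 0.6], [1.5, 1.0]]   # hypothetical uptake ratios
clinical = [[5.0], [30.0], [19.0]]               # hypothetical motor scores

for fused_row in fuse_modalities(imaging, clinical):
    print([round(v, 3) for v in fused_row])
```

In a real analysis the mean and standard deviation would be estimated on the training subjects only and then applied to held-out subjects, so that no information leaks from the test data into the features.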

Studies have shown that the combination of imaging and clinical data has improved the detection of brain abnormalities in PD17,21,36,37. For instance, Hirschauer et al.37 found that the PD detection rate of a single-modal feature extracted from SPECT (Ioflupane (123I) striatal-binding ratios in the caudate and putamen) was 66–97%, but the combined multi-modal data features (SPECT + clinical data) yielded a detection rate of 98.6%. Glaab et al.21 also reported that combining imaging data features (PET data) with metabolomics data enhanced the discrimination power and diagnostic performance of the machine-learning systems in their study.

In addition, studies have shown that combining genetic data with clinical data (such as rapid eye movement (REM) sleep behavior disorder, olfactory loss, and CSF measurements) improved PD diagnosis and early detection17,36,37. Genetic data (e.g., having a sibling with PD with age of onset <50 years) and clinical data such as abnormal quantitative motor test results are biomarkers of early PD9. For a recent review on machine learning using genetic data in PD, see ref. 65.

Machine-learning-based studies for the differential diagnosis of PD

Dopaminergic imaging

The difference in striatal uptake or striatal uptake ratios between PD and other parkinsonism has been identified by machine-learning-based dopaminergic imaging analysis. To differentiate between PD and vascular parkinsonism (VP), Huertas-Fernández et al.66 developed diagnostic models to classify the [123I]FP-CIT uptake in the striatal region of interest (ROI) and in the whole brain, and reported that discrimination accuracy between VP and PD reached 90.3 ± 5.8% (using logistic regression for the ROI approach), and 90.4 ± 5.9% (using SVM for the voxel-based whole-brain approach) (n = 164). Further, to differentiate between PD and atypical parkinsonian syndromes such as MSA or PSP, SVM has been applied to [18F]DMFP PET image classification (with imaging features such as striatal uptake and uptake in the thalamus) and has yielded moderate (>70%) classification accuracy (n = 39)67,68,69. A recent study has shown that using a deep-learning method and saliency features (extracted from FDG-PET images) significantly improved the differentiation between PD, MSA, and PSP (n = 502)24.

In addition, attempts have been made to differentiate between PD and essential tremor with machine-learning-based dopaminergic imaging analysis. Using striatal uptake ratios as input data for an ANN, Hamilton et al.70 distinguished PD from essential tremor (n = 18) with 100% diagnostic accuracy. Further, Palumbo et al.71 classified striatal uptake ratios with a probabilistic neural network (PNN) (n = 261), and confirmed that the PNN achieved valid classification accuracy in differentiating between PD and essential tremor (accuracy: 81.9 ± 8.1% for early PD; 78.9 ± 8.1% for advanced PD; 96.6 ± 2.6% for essential tremor).

Structural MRI

Atrophy in the midbrain, basal ganglia, and cerebellar peduncles helps distinguish PD from atypical parkinsonian disorders such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA). PD shows subtle volume reduction in cerebral gray matter (GM) and the basal ganglia50,72,73, whereas the major brain atrophy in PSP is in the midbrain and superior cerebellar peduncles2, and the major abnormalities in MSA are in the pons, middle cerebellar peduncles, and cerebellum2,74. Although this is challenging, attempts have been made to use machine-learning approaches to differentiate between PD and other parkinsonian types based on these structural MRI imaging features.

To distinguish PD from atypical parkinsonian disorders (such as PSP or MSA), Duchesne et al.75 developed an automated computer classification system that extracted brain tissue composition and deformation features in the hindbrain region from MRI images, applied SVM to feature classification and obtained a classification accuracy of 91% (PD vs. non-PD (PSP or MSA)) (n = 16 PD). Further, to differentiate PD from atypical parkinsonian disorders, Focke et al.76 used GM and WM volumes obtained by voxel-based morphometry (VBM) and found that classification with SVM yielded up to 96.8% accuracy for differentiation between PD and PSP, and 71.9% between PD and MSA, but it failed to differentiate between PD and healthy controls (n = 21 PD). On the other hand, Salvatore et al.77 extracted MRI imaging features by principal component analysis, generated voxel-based pattern distribution maps of structural differences to identify voxel-based morphological biomarkers of PD and PSP, and obtained >90% accuracy in differentiating between PD and PSP, or between PSP and healthy controls (n = 28 PD). To further distinguish PD from PSP and MSA, Huppertz et al.74 developed an automated MRI analysis method that computed atlas-based volumetric measures and classified the imaging features with SVM, reported classification accuracies of >80% in the majority of comparisons, and found the largest atrophy in PD, PSP, and MSA (compared with controls) (n = 204 PD). To differentiate between PD and scans without evidence of dopaminergic deficit (SWEDD) or healthy controls, Singh et al.26,27 extracted discretized voxel intensity changes from MRI using unsupervised self-organizing maps, classified the imaging features with SVM (n = 408 PD) and achieved accurate classification performance (>90%).

In addition, abnormalities in the substantia nigra in PD (due to dopaminergic neuronal loss) revealed by T2-weighted MRI, neuromelanin-sensitive MRI or iron-sensitive MRI at high field strength (such as 7 T) or by 3 T susceptibility weighted imaging (SWI) can be used in the diagnosis of PD14. For example, Haller et al.78 examined PD patients with SWI and found that they had increased SWI in the bilateral thalamus and left substantia nigra, which had diagnostic value in differentiating between PD and other parkinsonism (classification accuracy for SVM: 86.92 ± 16.59%) (n = 20 PD).

Functional MRI (fMRI)

Machine-learning methods have also been applied to rs-fMRI analysis for differentiation of PD subtypes such as tremor-PD vs. non-tremor-PD79, and postural instability and gait difficulty subtype (PIGD) vs. non-PIGD80. Zhang et al.79 used regional network efficiencies measured with rs-fMRI as imaging features, and classified them with linear discriminant analysis, which yielded an accuracy of 92% in differentiating tremor-PD vs. non-tremor-PD. Further, to differentiate between PD with levodopa-induced dyskinesias (LID) and PD without LID, Herz et al.81 extracted seed-based FC in the cortico-striatal network from rs-fMRI images, classified them with SVM and achieved a differentiation accuracy of 95.8%.

Diffusion tensor imaging (DTI)

Reduced substantia nigra fractional anisotropy (FA) has been identified and regarded as a PD biomarker for over a decade, but recent meta-analyses have found that substantia nigra fractional anisotropy had a very large variation in results across studies82, had low pooled sensitivity and specificity, and was not a diagnostic biomarker of Parkinson’s disease83.

However, since DTI reflects the disruption of microstructure (e.g., neuron myelin) integrity, DTI has shown promise in differentiating PD from atypical parkinsonism. Haller et al.84 examined DTI images of PD patients and patients with other parkinsonism with tract-based spatial statistics (TBSS) analysis and found that, compared with other parkinsonism, PD patients had an increased FA and a decreased mean diffusivity (MD) in the right frontal white matter, and classification of DTI imaging features using SVM yielded an accuracy of 97.5 ± 7.54% (n = 17 PD vs. 23 other parkinsonism). Further, Cherubini et al.85 combined DTI and MRI voxel-based morphometry features to distinguish PD patients from PSP patients using SVM, which yielded an improved accuracy (100%) (n = 57 PD vs. 21 PSP). Combined with MRI voxel-based morphometry and rs-fMRI imaging features, DTI has also aided in the differentiation between PD subtypes (PIGD vs. non-PIGD)80. In addition, using combined DTI and apparent transverse relaxation rate (R2*) imaging, Du et al.86 found that MSA has a decreased FA and an increased R2* in the subthalamic nucleus, whereas PSP has an increased MD in the hippocampus. Classification of imaging features with the Elastic-Net machine-learning technique yielded high differentiation accuracy (>90%) (n = 35 PD vs. 16 MSA vs. 19 PSP)86.

Multi-modal data

Compared with single-modal data features, higher classification accuracy or detection rate using combined multi-modal data features has been achieved in differentiation between PD and atypical parkinsonian disorder. For instance, to differentiate between PD and PSP, Cherubini et al.85 used combined MRI and DTI imaging features, and achieved a classification accuracy of 100%, which was higher than using either MRI or DTI features alone85. In addition, to differentiate between PD and MSA or PSP, Du et al.86 reported high classification accuracy (98–99%) using DTI and apparent transverse relaxation rate R2* imaging features, higher than DTI or R2* features alone.

Machine-learning-based studies for the early detection of PD

Dopaminergic imaging

Machine learning has been found useful in dopaminergic imaging analysis for early PD detection. In a pioneering study, Prashanth et al.16 investigated the value of different SVM methods in classifying SPECT images for early PD detection. Using striatal-binding ratios in the striatal regions from data obtained from the PPMI database, they found that SVM was valuable in early PD detection and that SVM with a non-linear kernel achieved a higher detection rate (96.14 ± 1.89%) than SVM with a linear kernel. Oliveira et al.87 applied SVM and other classifiers to classification of the binding potential at each voxel in the striatum of SPECT images for early PD detection and reported that SVM achieved the highest detection rate (97.86%). Later, Prashanth et al.17 added non-motor clinical data features such as cerebrospinal fluid (CSF) measurements to further improve the detection rate of early PD. Prashanth et al.88 further found that shape and surface-fitting-based features showed higher importance than striatal-binding ratios for early PD detection, and feature classification with SVM yielded a classification accuracy of 97.29 ± 0.11% (n = 427). In addition, Oliveira et al.89 found that the length of the striatal region uptake (detection rate: 96.5%) performed better than uptake ratio-based features for early PD detection (n = 443). These findings have demonstrated the value of the machine-learning approach in dopaminergic image analysis for early detection of PD and the importance of imaging feature and classifier selection/optimization in machine-learning-based imaging analysis.
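
Detection rates reported as mean ± SD typically come from repeating the evaluation over several train/test splits. The validation protocols of the individual studies are not described in this section, so the sketch below simply assumes k-fold cross-validation as one common way to obtain such figures. The function names and the per-fold accuracies are hypothetical.

```python
# k-fold cross-validation splits and a mean +/- SD summary of fold accuracies.
from statistics import mean, stdev


def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs that partition the data."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    folds = []
    for i in range(k):
        test = indices[i::k]          # every k-th subject, starting at i
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        folds.append((train, test))
    return folds


def summarize(fold_accuracies):
    """Mean and sample standard deviation across folds."""
    return mean(fold_accuracies), stdev(fold_accuracies)


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)

m, s = summarize([0.9, 1.0, 0.8])     # made-up per-fold accuracies
print(f"accuracy: {100 * m:.1f} +/- {100 * s:.1f}%")
```

Each subject appears in exactly one test fold, so every accuracy is measured on data the classifier did not see during training for that fold.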

Structural MRI

Machine-learning methods have helped improve the diagnostic gain of MRI. Classification of combined MRI measurements (gray matter (GM) + white matter (WM) + cerebrospinal fluid (CSF) volumes) with SVM could detect early PD with an accuracy of 80% (n = 19)30. A robust linear discriminant analysis (LDA) classifier with an optimal set of imaging features (GM and WM volumes of 98 ROIs from MRI), applied to MRI images of early PD patients (99% in early stages) (n = 374), yielded a classification accuracy of 81.9%, which was higher than the 69.1% yielded by SVM90. In addition, Singh and Samavedham26 demonstrated that the structural changes of early PD could be detected by using MRI features alone with an unsupervised self-organizing map approach (n = 518), and high classification accuracy (>95%) was achieved. This approach was later applied to a larger dataset (n = 1316) from PPMI (Parkinson’s Progression Markers Initiative) and ADNI (Alzheimer’s Disease Neuroimaging Initiative), and yielded high classification performance (95.37 ± 0.02%) for distinguishing patients with PD (or Alzheimer’s disease) from healthy subjects27, which confirmed the value of the machine-learning approach in aiding the diagnosis of neurodegenerative disorders such as PD and Alzheimer’s disease. In a recent study, Amoroso et al.25 used an unsupervised approach for MRI image classification: they extracted structural regional connectivity features from MRI images (n = 374), applied SVM to imaging feature classification, and obtained a classification accuracy of 88 ± 6% (using MRI features alone) or 93 ± 4% (using MRI features and clinical data).

Functional MRI (fMRI)

Studies have shown that rs-fMRI can detect PD at early stages. Wu et al.29 used effective connectivity extracted from rs-fMRI to examine patients with early PD (n = 16) and reported that the substantia nigra pars compacta in early PD had decreased effective connectivity with regions such as the striatum, thalamus, supplementary motor area and cerebellum, which negatively correlated with the Unified Parkinson’s Disease Rating Scale (UPDRS) scores. However, the detection rate of early PD using rs-fMRI imaging features alone (classified by SVM) is not high (e.g., 74%; ref. 30) and needs to be improved. In addition, the findings of two rs-fMRI studies in asymptomatic LRRK2 mutation carriers suggested that functional connectivity disruptions precede the appearance of PD motor symptoms32,33. Further, using rs-fMRI, Rolinski et al.31 examined patients with rapid eye movement (REM) sleep behavior disorder (RBD) (n = 26), and PD patients (n = 10), and found that functional connectivity measures of basal ganglia network (BGN) dysfunction differentiated RBD and PD from HC with high sensitivity (96%) and specificity (74% for RBD, 78% for PD), suggesting that rs-fMRI may be a biomarker for identifying early functional connectivity changes in the BGN in subjects at high risk of PD and patients with PD. However, confirmatory studies are warranted.
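
Sensitivity and specificity, as quoted above, follow directly from the counts of a two-by-two confusion table. The sketch below shows the definitions; the function name and the counts are hypothetical and were chosen only so that the arithmetic is easy to follow, not to reproduce any study.

```python
# Sensitivity, specificity, and accuracy from confusion-table counts.

def diagnostic_metrics(tp, fn, tn, fp):
    """tp/fn: patients detected/missed; tn/fp: controls cleared/flagged."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient and one control")
    return {
        "sensitivity": tp / (tp + fn),          # patients correctly detected
        "specificity": tn / (tn + fp),          # controls correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts: 25 patients (24 detected), 50 controls (37 cleared).
print(diagnostic_metrics(tp=24, fn=1, tn=37, fp=13))
```

A test can therefore detect nearly every patient and still flag a sizeable share of healthy subjects, which is why the two figures are reported separately.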

Multi-modal data

Combining multi-modal imaging and/or clinical data has improved early PD detection. Long et al.30 found that combined multi-modal features improved early PD detection and that multi-modal imaging (MRI and rs-fMRI) with combined multi-modal features (GM + WM + CSF + ReHo + ALFF + FC) yielded higher classification accuracy (87%) than single-modal features (MRI: 80%; rs-fMRI: 74%). In addition, Oliveira et al.89 examined SPECT images of early PD patients and found that several data features had high classification accuracy, including the length of the striatal region (96.5%), the putaminal binding potential (95.4%) and the striatal-binding potential (93.9%), while the combined imaging features had the highest classification accuracy (97.9%). Furthermore, Prashanth et al.17 classified non-motor clinical data features, such as rapid eye movement (REM) sleep behavior disorder (RBD) and olfactory loss, and CSF measurements in addition to SPECT imaging markers (striatal-binding ratios) with classifiers such as SVM and random forests, and found that a combination of these data features with SVM classification performed the best in early PD detection (detection rate: 96.40 ± 1.08%) (n = 401).

Discussion

The studies reviewed in this paper have demonstrated that machine learning can automate data analysis, identify data patterns (e.g., in the distribution of the radiotracer uptake on SPECT images), and improve the accuracy of imaging quantification in the diagnosis of PD. A recent comprehensive review has confirmed the value of machine learning in assisting the diagnosis of PD, and has further pointed out the potential of these machine-learning applications to enhance clinical decision-making in PD diagnosis91. In particular, the review by Mei et al.91 provided statistical analysis for the machine-learning studies in PD diagnosis, and reported that (1) on average, the classification accuracy of the machine-learning applications was ~94% for SPECT imaging, ~86% for PET imaging, and ~87% for MRI (including fMRI) imaging; (2) SVM and NN (neural network) were the most frequently used methods in the imaging studies, and the usage of SVM (50%–70% for SPECT or PET imaging, ~60% for MRI imaging) was higher than that of NN (22%–53% for SPECT or PET imaging, ~23% for MRI imaging); and (3) SVM and NN had higher classification accuracy than other machine-learning methods in the imaging studies.

The value and role of machine learning in the diagnosis and early detection of PD

The value and potential of machine learning (ML) in PD diagnosis have been clearly demonstrated by studies that compared ML methods with conventional techniques such as semi-quantitative methods or visual analysis. For example, computer-aided diagnosis (CAD) systems based on machine-learning methods have outperformed semi-quantitative methods of SPECT image analysis15,19 and improved the PD diagnostic accuracy of radiologists20. Further, Choi et al.18 demonstrated the value of recently developed deep-learning techniques (convolutional neural networks) in SPECT image analysis, and obtained high classification performance that is comparable to experts’ visual analysis and semi-quantitative analysis. In addition, ML methods have been shown to be useful in differential diagnosis65,66,67,68,69,85,86 and early PD detection17,26,30,31.

Machine-learning applications can automatically analyze and classify imaging and clinical data in PD, but they are still in their infancy and subject to errors, pitfalls and biases. For example, clinician-dependent class/group labeling of the training data in machine-learning models may be prone to errors because of the high error rate in the clinical diagnosis of PD. As another example, newly developed deep-learning models may face new challenges such as overfitting, low generalizability and data insufficiency.

The role of machine-learning applications is not to replace clinicians, but to assist them in clinical decision-making, to relieve them of tedious data preprocessing, to save them from time-consuming manual inspection or processing of raw data (e.g., drawing regions of interest and performing measurements on images), and to help them focus on important clinical decision-making questions in order to reduce medical errors and improve the clinical diagnosis of PD. On the other hand, machine-learning applications are far from perfect and still need to be improved. Aware of the potential errors and problems in machine-learning applications in radiology, Geis et al.92 pointed out that clinicians who use the applications are ultimately responsible for clinical decision-making and patient care.

Current limitations and challenges in machine-learning applications

First, prone-to-error labeling of classes/groups in supervised learning

Due to the high error rate of a clinical diagnosis of PD, clinician-dependent labeling of the classes or groups (e.g., PD patients or healthy subjects) of the training data used in supervised machine-learning applications may be prone to error. To overcome this problem, training data labels in supervised learning need to be confirmed by pathological (biopsy or post-mortem) data. On the other hand, when the clinical diagnosis of a PD dataset is uncertain and pathological data are not available, unsupervised-learning approaches may be considered. Because it does not need labeled training data, unsupervised learning seeks to identify hidden data patterns, which may overcome the problem of mislabeled diagnostic categories in the training data of supervised learning. However, there are some technical challenges in applying unsupervised-learning methods to imaging analysis in PD diagnosis, e.g., unsupervised-learning methods are not good at accurately extracting imaging features43. Attempts have been made to overcome such difficulties, e.g., by using a semi-supervised-learning clustering method that combines a small amount of labeled data with a large amount of unlabeled data in the training dataset43. In addition, unsupervised learning has been applied to MRI feature selection in early PD detection25,26,27. Detecting early PD using supervised-learning methods on structural MRI features is challenging, but Singh and Samavedham have demonstrated that the structural changes of early PD could be detected by integrating a Kohonen unsupervised self-organizing map and a least-squares support vector machine (n = 518)26. This approach was later applied to a larger dataset (n = 1316)27, which confirmed the value and robustness of the method.
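The effect of error-prone labels on reported performance can be made concrete with a short calculation. The sketch below assumes a binary task (e.g., PD vs. healthy) and assumes that the classifier's errors are independent of the labeling errors; both assumptions, the function name and the input values are illustrative and are not taken from the reviewed studies.

```python
def observed_accuracy(true_accuracy, label_error_rate):
    """Expected agreement between a binary classifier and noisy reference labels.

    Assumes the classifier's errors and the labelling errors are independent.
    The classifier agrees with the reference label when both are right or
    both are wrong.
    """
    for name, p in (("true_accuracy", true_accuracy),
                    ("label_error_rate", label_error_rate)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    a, e = true_accuracy, label_error_rate
    return a * (1.0 - e) + (1.0 - a) * e


# Illustrative inputs only: a classifier that is 95% accurate against the
# (unknown) pathological truth, scored against labels with a 20% error rate.
print(round(observed_accuracy(0.95, 0.20), 3))  # 0.77
```

Under these assumptions, a classifier that is right more often than the clinical labels appears worse than it is when scored against those labels, which is one reason pathological confirmation of the labels matters.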

Second, machine-learning “black-box”

Since machine-learning applications identify data patterns (e.g., abnormal structural or functional changes in imaging data) that could be invisible (or unrecognizable) to humans, the mechanisms and results of machine-learning applications (especially neural network-based models) may be difficult to interpret due to the lack of direct “evidence” supporting classification results. This could run against the principle of evidence-based medicine and result in reluctance to accept machine-learning applications in clinical practice. For example, new deep-learning (or deep-neural network) models often have millions of parameters, which makes them a “black-box” that is incomprehensible to clinicians and makes it hard to interpret how the classification results have been obtained. However, just as a microscope allows people to see at the cellular level (which is invisible to the human eye), machine-learning methods identify abstract imaging features that reveal brain signal differences between groups/classes, such as distributions of radiotracer uptake, image voxel intensity changes, and texture features, and amplify these differences to a resolution at which machine-learning models can detect them and best separate the data into classes or groups. Clinicians do not have to understand the details of the inner workings (e.g., the parameters) of machine-learning methods in order to use these tools for PD diagnosis, but it is beneficial to have some basic knowledge of the mechanism of a machine-learning method and of statistical pitfalls in order to avoid errors.

Nevertheless, although the mechanism of machine-learning methods is abstract, machine-learning applications should follow the principle of evidence-based medicine, use the best evidence in the field of PD diagnosis (e.g., to guide feature selection or check classification results), and provide as much “evidence” as possible to support clinical decision-making. For instance, in addition to imaging feature classification, Singh et al.27 identified disease-specific biomarkers (i.e., significant brain regions affected by PD) with a machine-learning method, and these biomarkers, serving as “evidence”, could be used to decipher disease progression. Further, efforts have been made to interpret the classification results of deep-learning models. For example, Magesh et al.93 reported a newly developed CNN deep-learning model based on transfer learning that analyzed and classified SPECT DaTSCAN images and distinguished PD patients from healthy controls with an accuracy of 95.2%, and used Local Interpretable Model-Agnostic Explanations (LIME) to interpret the classification results.
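One simple way to obtain this kind of “evidence” from an otherwise opaque model is to measure how much its accuracy drops when a single input feature is scrambled. The sketch below uses permutation importance, which is a different and simpler model-agnostic technique than the one used in ref. 93; the stand-in classifier, the feature meanings and the synthetic data are all hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled across subjects."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained classifier: it looks only at feature 0
# (imagine a striatal uptake ratio) and ignores feature 1.
def toy_model(row):
    return 1 if row[0] < 1.5 else 0


# Synthetic data: feature 0 separates the two groups, feature 1 is noise.
data_rng = random.Random(1)
rows = [[1.0 + data_rng.random() * 0.4, data_rng.random()] for _ in range(20)] + \
       [[2.0 + data_rng.random() * 0.4, data_rng.random()] for _ in range(20)]
labels = [1] * 20 + [0] * 20

importances = permutation_importance(toy_model, rows, labels)
```

Here the importance of feature 1 is exactly zero because the stand-in model never reads it, while scrambling feature 0 degrades accuracy. Applied to a real model, such scores indicate which inputs a prediction relies on, but they do not by themselves establish clinical validity.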

Third, overfitting problem in machine learning

Overfitting often occurs in machine-learning applications: a machine-learning method or model performs very well on a training dataset, but not on a test dataset or other datasets94. This may happen because the model is over-trained on the training dataset, so that the noise in the training data is also modeled, which makes the model ungeneralizable to other datasets. A recent rs-fMRI study showed that PD-related functional connectivity changes were not reproducible across the 3 PD samples used in the study62. The classification performance (PD vs. HC) was low (50–60%) even in datasets drawn from a single data sample, and the lack of generalizability in these data samples may be mainly due to high PD heterogeneity62. In addition, there are new challenges of overfitting in machine-learning applications using deep-learning models95. To overcome the overfitting problem in machine-learning applications for PD diagnosis, it is necessary to improve data quality, reduce data heterogeneity, use large data samples and validate machine-learning models with proper validation methods (such as N-fold cross-validation) in order to make the models generalizable. Further, to avoid overfitting in deep-learning models, methods such as implicit regularization, proper initialization, adjusting learning rates and reducing model complexity may help the models generalize well95.
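The gap between training performance and held-out performance can be illustrated with a small experiment. The sketch below is a minimal illustration on synthetic data that contains no real signal: a 1-nearest-neighbour classifier (chosen here only because it memorises its training data) is scored on the data it was trained on and with 5-fold cross-validation. The data, the classifier and the fold assignment are assumptions for illustration.

```python
import random


def nearest_neighbour_predict(train_x, train_y, x):
    """Label of the training sample closest to x (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[distances.index(min(distances))]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test samples classified correctly by 1-nearest-neighbour."""
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def k_fold_accuracy(xs, ys, k=5):
    """Mean held-out accuracy over k folds (samples assigned by index modulo k)."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        scores.append(accuracy([xs[i] for i in train_idx], [ys[i] for i in train_idx],
                               [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / k


# Synthetic data with NO real signal: random features, random labels.
rng = random.Random(0)
xs = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
ys = [rng.randint(0, 1) for _ in range(100)]

train_acc = accuracy(xs, ys, xs, ys)   # evaluated on the data it memorised
cv_acc = k_fold_accuracy(xs, ys, k=5)  # evaluated on held-out folds
```

The training accuracy is perfect because every sample is its own nearest neighbour, whereas the cross-validated accuracy stays near chance level. Reporting only the former would suggest a useful model where none exists.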

Future directions

First, improve and validate the machine-learning applications

Despite the progress made in machine-learning applications in the diagnosis and early detection of PD, there is still much room for improvement. 1) More research is needed to address the problem of prone-to-error class labeling in supervised learning. 2) It is necessary to optimize multi-modal data features for an optimal feature set, and to choose and optimize machine-learning classifiers for an optimal classifier, in order to improve classification accuracy. This is because (i) several studies have shown that combined multi-modal data features (such as SPECT + clinical data) had a higher detection rate than single-modal features30,37, and (ii) comparative studies using different classifiers have demonstrated differences in classification accuracy between classifiers36,37,89. 3) Thorough validation is needed before the machine-learning applications can be used in clinical settings. In addition, newly developed deep-learning techniques have shown promising results18,93,96, but also face new challenges and obstacles such as overfitting, low generalizability and data insufficiency. For a recent review of deep-learning applications for the diagnosis of PD, see ref. 97. Research in explainable machine-learning models is needed to address the “black-box” problem in neural network models. To overcome the new challenges in deep-learning models, further research is needed to avoid overfitting, improve these new deep-neural network applications and make them more accurate, reliable, generalizable and explainable.

Second, improve the modeling of longitudinal multi-modal data

Since Parkinson’s disease is a progressive disorder, it is necessary to model multi-modal data over time in order to identify biomarkers for PD progression. Some efforts have been made to tackle this difficult problem in recent years98,99,100,101,102. Due to the complexity of longitudinal multi-modal (imaging and clinical) data, methods such as embedding learning and sparse regression have been proposed, and these have obtained promising results102. Further research is needed to improve the modeling of these longitudinal multi-modal data so that reliable biomarkers can be identified to enhance the diagnosis and management of PD.

Third, integrate ML-based applications into clinical decision support systems to aid in PD diagnosis

It has been demonstrated that machine-learning-based computer-aided diagnostic (CAD) systems generally outperformed semi-quantitative analysis of SPECT imaging in distinguishing PD patients from healthy controls12,16 and improved the PD diagnostic accuracy of radiologists17. More comparative and confirmatory studies are needed to further reveal the advantages and weaknesses of these machine-learning applications. Since semi-quantitative imaging analysis software is commercially available at clinics, such software may be upgraded to incorporate mature machine-learning algorithms to further assist clinicians in the diagnosis of PD. However, a framework that facilitates the development, deployment, validation and regulation of such machine-learning-based clinical applications is needed. For example, benchmark data and metrics (e.g., the PPMI database) need to be established to test the optimized and standardized applications. Further, before these ML-based clinical applications are deployed in clinical settings, it is necessary to run clinical trials to assess the diagnostic gain and clinical benefits of such applications over conventional semi-quantitative analysis (or visual analysis). Consequently, rules and regulations are needed to facilitate this process in order to make such ML-based systems available to clinics.

Conclusions

In summary, encouraging progress has been made in applying machine-learning techniques to the diagnosis and early detection of PD. Although machine-learning applications in PD diagnosis are still in their infancy, machine-learning methods have automated imaging data analysis, outperformed conventional semi-quantitative analysis and performed comparably to experts’ visual inspection in detecting PD-associated dopaminergic degeneration on SPECT imaging, reduced the interpretation variability of imaging, improved the PD diagnostic accuracy of radiologists and aided in differential diagnosis and early PD detection. Using combined multi-modal imaging and clinical data in these applications may further enhance the diagnosis and early detection of PD. To integrate these machine-learning applications into clinical systems, further validation and optimization are needed to make them accurate and reliable. Despite the challenges in translating machine-learning applications into clinical practice, machine-learning techniques show promise in assisting clinicians to improve the differential diagnosis of parkinsonism and the early diagnosis of PD, which may reduce the error rate of PD diagnosis and help detect PD at the pre-motor stage, so that early treatments (e.g., neuroprotective treatment) may be applied to slow down PD progression, prevent severe motor symptoms from emerging, and relieve patients’ suffering.