Multivariate prediction of dementia in Parkinson’s disease


Cognitive impairment in Parkinson’s disease (PD) is pervasive with potentially devastating effects. Identification of those at risk for cognitive decline is vital to identify and implement appropriate interventions. Robust multivariate approaches, including fixed-effect, mixed-effect, and multitask learning models, were used to study associations between biological, clinical, and cognitive factors and for predicting cognitive status longitudinally in a well-characterized prevalent PD cohort (n = 827). Age, disease duration, sex, and GBA status were the primary biological factors associated with cognitive status and progression to dementia. Specific cognitive tests were better predictors of subsequent cognitive status for cognitively unimpaired and dementia groups. However, these models could not accurately predict future mild cognitive impairment (PD-MCI). Data collected from a large PD cohort thus revealed the primary biological and cognitive factors associated with dementia, and provide clinicians with data to aid in the identification of risk for dementia. Sex differences and their potential relationship to genetic status are also discussed.


Cognitive impairment in Parkinson’s disease (PD) is pervasive with multiple negative effects1. The trajectory of cognitive decline in PD can vary considerably, however, with some individuals quickly developing cognitive symptoms that interfere with functional activities and others maintaining steady but mild symptoms over many years2. Because cognitive impairment can begin insidiously, such problems can go unrecognized and in the absence of appropriate behavioral, social, and medical interventions may interfere with patient safety and independence3. A current important question in PD research is thus whether those who are at risk for impending cognitive decline can be identified in order to implement appropriate interventions, optimize medical management, and enhance autonomy.

There is now abundant genetic and phenotypic data to support substantial clinical and biological heterogeneity in cognitive decline in people with PD, and this complexity challenges traditional methodological approaches2,4. There are thus many potential interactions between processes that underlie cognition and other biological systems among individuals that may introduce error. Conventional statistical approaches may thus result in poor reproducibility. Such methods are chosen by the researcher a priori and are used to test one or a few variables at a time, often with an overemphasis on P values and an inability to adequately address the potential impact of heterogeneity. Given these issues, the resulting conclusions may lack important clinical meaning and generalizability. In order to address the problems introduced by univariate statistical methods, multivariate models are used with increasing frequency in the study of cognitive diseases5.

Here, we utilized multivariate models, including fixed-effect, mixed-effect, and multitask learning models, to examine the interplay among cognition, genetics, and clinical features in the Pacific Udall Center (PUC), a large, deeply annotated cohort of participants with PD. Using the first two modeling methods, we sought to (i) identify cognitive diagnosis outcomes in this longitudinal prevalent PD cohort, (ii) determine biological factors related to cognitive diagnosis and dementia prediction, and (iii) establish any associations between genetic factors and specific cognitive test performance. Finally, using the multitask models, we sought to identify associations between cognitive test performance patterns and subsequent dementia.



Fixed-effect, mixed-effect, and multitask learning models were implemented to analyze detailed cognitive and biological data from 827 participants with PD (514 with longitudinal data) enrolled in the PUC. Age, education, sex, disease duration (time since initial onset of PD motor symptoms), total levodopa equivalent daily dose (LEDD; calculated as described by Tomlinson et al.6), the 15-item Geriatric Depression Scale (GDS-15)7, and site were the included covariates. To determine whether the inclusion of younger participants influenced the results, analyses were repeated both for the entire sample and excluding participants under 50. Given that there were not substantial differences noted in the results, the following results are presented using the entire study sample. Baseline cohort characteristics are provided in Table 1. Longitudinal change in cognitive status (no cognitive impairment [NCI], mild cognitive impairment [PD-MCI], dementia [PDD]) across visits is depicted in Fig. 1.

Table 1 Baseline characteristics of the Pacific Udall Center cohort.
Fig. 1: Changes in cognitive status across visits.

The number inside each node represents the number of people with the corresponding cognitive status, which is indicated by the node's color. Nodes with dashed outlines represent people with data from the first visit only. The links represent the groups of participants who continued to the next visit.

Effects of biological factors on cognitive status

In the entire sample, a mixed-effect model developed using only biological factors was found to have satisfactory prediction of cognitive status across all visits (average area under the receiver operating characteristic curve [AUC] = 0.71, Fig. 2a). Predictions of both PD-NCI and PDD (AUC = 0.76 and 0.77, respectively) were more accurate than PD-MCI (AUC = 0.61). Of note, this model using only biological factors performed worse than the model using only cognitive test performance (a major component in making a cognitive diagnosis) (average AUC = 0.9; Fig. 2b). In the final model, which included all covariates, all biological factors were significantly associated with cognitive status except for microtubule-associated protein tau (MAPT) and apolipoprotein E (APOE) genotype (Table 2). Notably, the increase in odds ratios of both being male and having a glucocerebrosidase gene (GBA) variant were approximately equivalent to an additional 15 years of PD duration in terms of PDD risk in this cohort.

Fig. 2: Biological factors satisfactorily predict cognitive status.

Cross-validated area under the receiver operating characteristic curve (AUC) of the mixed-effect model prediction based only on biological factors (a) compared to the AUC of the mixed-effect model prediction based solely on cognitive tests (b). Error bars represent standard deviations (sd).

Table 2 Association of biological factors with cognitive status in the full longitudinal PUC cohort.

In the longitudinal cohort (excluding participants with PDD at baseline), survival analyses showed a significantly shorter duration between PD symptom onset and diagnosis of PDD in GBA mutation carriers compared to non-mutation carriers (Fig. 3a). Faster progression to PDD was also observed in males compared to females (Fig. 3b). Male participants with a GBA variant were at markedly greater risk of developing PDD, and of developing it earlier, than female participants with no GBA variant (Fig. 3c). APOE ε4 did not exhibit a significant effect on time to PDD (Fig. 3d). The significance of these observations remained unchanged when the time scale was changed to age at visit or to months since the first visit (Supplementary Fig. 1).

Fig. 3: Survival analyses indicate significant longitudinal differences between participants of different sex and selected genes.

Survival analyses to an endpoint of PDD for participants categorized by GBA variant (a), sex (b), the combination of both (c), and APOE ε4 allele (d), plotted against the number of years since the diagnosis of PD. P values obtained from log-rank tests indicated significant effects of sex, GBA variant, and the combination of both.

In analyses that were restricted to participants with longitudinal data who were nondemented at their first visit but were diagnosed with PDD at any subsequent visit (n = 97), age at PD onset was also a significant factor in the rate of progression to PDD (Supplementary Table 1). Number of years from PD onset until PDD and age at PDD are shown in Supplementary Fig. 2; no correlation was noted (R 0.1, results not shown). Later PD symptom onset was associated with faster progression to PDD (Supplementary Fig. 2).

Effects of genetic factors on cognitive test performance

Analysis of the fitted mixed-effect model indicated that the strongest effect on individual cognitive tests was from GBA, which was significantly associated with all tests except phonemic verbal fluency and Hopkins Verbal Learning Test-Revised (HVLT-R) Delayed Recall after Bonferroni correction (Table 3). Neither APOE nor MAPT exhibited significant effects after correction. However, analysis using a sex-specific cohort (females only) suggested a significant association between APOE ε4 and lower performance on semantic verbal fluency (Supplementary Table 2). In addition, GBA effects on visuospatial and verbal learning tasks could be sex-specific (Supplementary Table 3). It should be noted that a generalizable predictive model could not be developed for this purpose due to large random effects between individuals (as evidenced by the relatively large standard errors of the random intercept for each test; Table 3).

Table 3 Association of APOE ε4 allele, GBA status, and MAPT haplotype with the cognitive performance in the full longitudinal PUC cohort.

Prediction of future cognitive diagnosis by cognitive test performance

Multitask models were employed for future cognitive status prediction, where each task predicted cognitive status of a specific year in the future based only on the data from the first visit (limited to five years since the first visit due to reduced numbers of visits beyond this point). The model could accurately separate PD-NCI from PDD up to four years into the future (Fig. 4a). However, the model could not accurately differentiate PD-MCI from other diagnoses in any year. Analysis of the model components indicated that cognitive tests are the most important features in the prediction of future cognitive status. Specifically, HVLT-R Total Recall and Digit Symbol scores were the most indicative of PD-NCI, whereas the Montreal Cognitive Assessment (MoCA), semantic verbal fluency, Digit Symbol, and Trailmaking Test B minus Trailmaking Test A (TMT B-A) were the most indicative of PDD (Fig. 4b). Other factors including sex, GBA status, and PD duration and severity also affected some tasks at a lower scale. This suggests that although biological factors are significant, cognitive test scores are stronger predictors of subsequent dementia. This is consistent with the mixed-effect analysis above which demonstrated that cognitive status is more strongly associated with combined cognitive test performance than the combination of biological factors at each visit (Fig. 2).

Fig. 4: Multitask model indicates current test performances could imply future cognitive status.

The area under the receiver operating characteristic curve (AUC) of the multitask model prediction on unseen data, with each task predicting the participants’ cognitive status n years after the first visit using only their first-visit and biological data (a). The center line, bounds, and whiskers of the box plots represent the median (Q2), the first and third quartiles (Q1 and Q3), and the minimum and maximum (Q1 − 1.5 IQR and Q3 + 1.5 IQR), respectively. The heatmap depicts the magnitude of components from the PD-NCI and PDD classification models, highlighting the importance of many of the cognitive tests in the prediction of future cognitive status. The positive components in each model are associated with a higher probability of that model’s diagnosis (b).


In the current study, we evaluated features related to patterns of cognitive progression in a large PD cohort. Age, disease duration, sex, and GBA status were the primary biological factors associated with cognitive status. Survival analyses demonstrated the importance of sex, GBA, and age of PD onset in the progression to PDD in this prevalent cohort. GBA carriers had worse performance across most cognitive measures, and potential sex-specific differences on specific cognitive tasks were noted in relation to APOE and GBA. Importantly, when all variables were included in the model, we found that although performance on specific cognitive tests best predicted subsequent cognitive status in the cohort for PD-NCI and PDD, this model could not accurately predict future PD-MCI.

The size of the PUC cohort, breadth of data collected, and longitudinal design permitted implementation of robust multivariate approaches to address important questions related to cognitive progression in people with PD. Increasingly, such methods are employed across disciplines to address shortcomings associated with traditional statistical approaches. While to date the use of machine learning approaches is limited in PD research, such methods have been used to predict disease progression in the Michael J Fox Foundation Parkinson’s Progression Markers Initiative (PPMI)8. Only one recent study included cognitive outcome in the PPMI cohort, and found that initial MoCA score, sleep symptoms, auditory working memory, and anxiety symptoms were the primary factors related to subsequent worsening global cognition. Unlike the current study, age, sex, and disease duration were not related to subsequent decline in global cognition9. However, PPMI enrolls participants with de novo PD, thus participants were only evaluated during the earliest stages of the disease when cognitive decline may be minimal. Further, the sample size was smaller and length of follow-up shorter than in the current analyses. Importantly, neuropsychological testing in this study included only the MoCA, compared to the depth and breadth of testing available in our cohort. Finally, genetic factors that may directly influence PD phenotypes were not included. For example, GBA variants have been associated with the above traits (anxiety10, auditory working memory11, sleep symptoms12). These phenotypic features may thus serve as a proxy for certain underlying biological traits in some participants. In the current study, we clearly demonstrate the important role of GBA in cognitive presentation and progression in PD, consistent with a previous longitudinal study by our group using traditional statistical methods13.

Although we, and now several others, have reported increased cross-sectional risk for dementia in people with PD who inherited an APOE ε4 allele14,15,16, our results here showed only a trend to an increased rate of progression to dementia in this group. These results mirror those for AD, where APOE ε4 is a strong and extensively replicated genetic risk factor; however, the impact of APOE ε4 on clinical progression to MCI or AD dementia in multivariate analyses is not clear. Indeed, some reported a significant impact of APOE ε4 on clinical progression to MCI or AD dementia, while others did not17,18,19,20. These studies show that the impact of APOE ε4 on clinical progression is complex, and several observed significant interactions with being female. Our results most closely match those from the Alzheimer’s Disease Neuroimaging Initiative, Australian Imaging, Biomarker and Lifestyle Study, and Harvard Aging Brain Study, which showed that APOE ε4 itself is not a major factor in clinical progression18. Although not a strong predictor of progression to PDD in our cohort, inheritance of an APOE ε4 allele was not benign; women with PD who had an APOE ε4 allele were at greater risk for decline in semantic verbal fluency. As we have previously shown, reduced semantic verbal fluency is associated with shortened time to PDD among females only21. In the AD literature, impaired semantic verbal fluency is associated with dementia diagnosis as well as with AD biomarkers in preclinical disease22,23, and there is some evidence that females with AD dementia may perform worse than males on semantic verbal fluency tasks24. Further, APOE ε4 may play a role in influencing semantic verbal fluency performance in amnestic mild cognitive impairment25. Taken together, these results tentatively suggest that APOE ε4 may have a greater impact on cognitive phenotype in females with PDD, although additional research is necessary. 
Finally, it is important to consider cohort characteristics among these many observational studies that may underlie some of the apparent discrepancies. Indeed, our cohort likely has under-sampled early PDD and this may undermine our ability to associate progression to PDD with APOE ε4. With this limitation in mind, our longitudinal results from people with PD align with most results from AD and highlight a possible but weak effect on the rate of progression to dementia, possible domain-specific effects, and potentially stronger impact on women.

Consistent with our previous cross-sectional reports26,27, we also found no association between the MAPT H1 haplotype and specific cognitive test performance, dementia diagnosis, or cognitive decline during follow-up. Previous reports on MAPT and cognition are mixed, with one group reporting faster decline in MMSE scores and greater dementia risk in PD patients with the H1 haplotype28 and another showing a greater association between the H1 haplotype and PD diagnosis among those with dementia29. However, many others have shown no association between cognitive test performance, cognitive diagnosis, or rate of cognitive decline and the H1 haplotype, and the current study provides additional evidence that the MAPT H1 haplotype may not play a primary role in cognitive decline in PD30,31,32.

The results from the current study extend our understanding of sex differences and cognitive decline in PD, particularly in association with genetic profile. As we and others have shown, male sex is associated with a higher likelihood of cognitive impairment and with faster progression of cognitive symptoms in PD21. Here, we demonstrate an additive relationship for GBA and sex in influencing the rate of progression to dementia, such that male GBA carriers progressed most quickly, while female GBA carriers had a similar rate of progression to that of male non-GBA carriers. Predictably, GBA carrier status was associated with worse performance in multiple domains for both males and females (global function, divided attention, working memory, and processing speed)11,33. However, while the previously reported association between GBA and lower visuospatial function in PD is replicated, in secondary analyses the association was only significant for males. Reduced visuospatial function has been implicated in conversion to dementia in PD34,35. Performance on the Judgment of Line Orientation task is most frequently correlated with lesions in the right posterior parietal-occipital regions36, areas where GBA carriers have demonstrated reduced synaptic activity and nigrostriatal DAT density37. Thus, the greater degree of cognitive decline in males with PD may be in part related to GBA-influenced lesions in these regions or in the pathways that serve these regions. Additional work in this area is needed to determine if GBA influences lesion location in brain differentially for males and females.

Overall, our multivariate approach showed that the prediction of placement into the cognitively unimpaired and PDD groups is quite high using all available variables, particularly specific cognitive measures. Our models could not, however, accurately predict PD-MCI. The identification of meaningful cognitive subtypes in PD-MCI has proven difficult given the heterogeneity of the disease38. Variability in PD-MCI is common, with a 24% average rate of reversion over 1–6 years of follow-up reported in a recent meta-analysis2. Medication effects, motor subtypes, anxiety, depression, fluctuations in attention, hallucination, delusions, and myriad other disease-related factors may impact cognitive function for those on the path to PDD, leading to diagnostic instability and difficulty predicting rate of cognitive decline2.

The primary limitation of the current study was that, due to enrollment of participants with prevalent PD, we were unable to follow the natural history of cognitive impairment from disease onset to dementia. As a result, those diagnosed with PDD early in the disease are likely under-sampled, leading to an inflated time to dementia when compared to what others have reported39,40,41. However, the goal of the current study was not to provide expected annual incidence rates of PDD, as these have been well-described previously. Rather, the goal was to identify important biological and cognitive factors that predict cognitive diagnosis; by enrolling a prevalent sample we were able to study the full cognitive diagnostic range even cross-sectionally at the initial visits, something that is not possible in an incident PD cohort42. Thus, although we provide survival analysis models to demonstrate the differences in time to PDD according to various biological factors, the absolute time values should not be taken to represent time to incident PDD in the entire PD population. Possible additional contributors to this finding of longer time to PDD in the cohort include (a) our measurement of disease onset from first motor symptoms vs. time of PD diagnosis, and (b) a substantially larger cohort than the previously mentioned studies, potentially leading to wider variability in PD phenotype. Future results from incident studies including larger samples will be informative. Further sampling limitations of the study include that our participants were generally highly educated, and thus may not be representative of the larger population with PD. Finally, due to the limitations of the data collected, we were not able to include potentially important variables in the analyses, such as the possible mediating effects of antidepressants and sedatives, vascular risk, and detailed sleep and anxiety features.

Cognitive impairment in PD is pervasive and distressing, and identification of factors associated with cognitive decline in PD may allow earlier intervention. Traditional statistical methods aimed at the identification of factors associated with cognitive progression may produce biased or spurious results. Our robust multivariate approaches to data collected from a large sample of participants with prevalent PD and varying levels of cognitive function reveal that the primary biological factors associated with PDD are male sex, GBA status, age, and disease duration, while performance on tasks measuring executive functions, semantic verbal fluency, and recall were the best predictors of subsequent PDD. PD-MCI was much more unstable and difficult to predict with either biological or cognitive variables. These results provide clinicians with data to aid in the identification of risk for PDD, and thus to implement important behavioral, social, and cognitive interventions to maximize quality of life in people with PD. Future work to better identify predictors of variability versus stability for those with PD-MCI will be important in the ongoing pursuit of optimally characterizing and introducing effective interventions for this sizable group of cognitively impaired individuals with PD.



Participants were enrolled in the PUC, a Morris K. Udall Center of Excellence in Parkinson’s Disease Research, which collects detailed longitudinal data from three sites: Stanford University, University of Washington/Veterans Affairs Puget Sound Health Care System, and Oregon Health Sciences University/Veterans Affairs Portland Health Care System. All participants met the United Kingdom Parkinson’s Disease Society Brain Bank diagnostic criteria for PD (UKPDBB); atypical parkinsonism syndromes were excluded. Participants were excluded from these analyses if they met UKPDBB criteria at their initial visit but did not meet criteria by their final visit and/or were determined to have parkinsonism related to other factors, or if there was not enough information to determine UKPDBB status (n = 19). Participants with an unknown/other cognitive diagnosis (n = 4) or those who were diagnosed with PDD but later reverted to PD-NCI or PD-MCI (n = 5; unexpected events likely due to factors such as anxiety, depression, illness, or medication effects) were excluded. There were no exclusions based on age at visit or age at symptom onset. Participants from all sites who completed at least one visit and who were assigned a cognitive diagnosis of PD-NCI, PD-MCI, or PDD were included (n = 827). Longitudinal analyses included participants with at least one follow-up examination (n = 514). Time between follow-up visits for most participants was 1–2 years; a smaller proportion had longer intervals (Supplementary Fig. 3).

Ethical compliance

The institutional review boards at Stanford University, University of Washington/Veterans Affairs Puget Sound Health Care System, and Oregon Health Sciences University/Veterans Affairs Portland Health Care System provided formal approval for the study procedures. All participants (or a legally authorized representative) provided written informed consent.

Consensus diagnosis

Participants were assigned motor and cognitive diagnoses during diagnostic consensus conferences attended by at least two movement disorders specialists and a neuropsychologist. Cognitive diagnoses were made according to published criteria43,44 as previously described45, and were based on data from neuropsychological testing (Supplementary Table 4) (comparing raw test scores to demographically corrected normative values), participant history, and clinical interview.

Cognitive variables

The core cognitive variables included in the current analyses are those common to all sites: (1) global (MoCA46); (2) learning & memory (HVLT-R47); (3) attention/working memory (Letter-Number Sequencing from the Wechsler Adult Intelligence Scale – III48, Digit Symbol subtest from the Wechsler Adult Intelligence Scale-Revised49, Trailmaking Test, parts A and B50); (4) verbal fluency (animals and letters F-A-S50); and (5) visuospatial (Benton Judgment of Line Orientation51). Trailmaking Test B - A scores were used to minimize the effects of motor disability. Participants completed additional neuropsychological tests at each site to permit cognitive diagnosis using Movement Disorders Society PD-MCI Level II criteria (Supplementary Table 4). Raw test scores were used for the purposes of the analyses. Analyses including z-scores based on comparison to demographically corrected normative values were run separately; given that these did not produce substantially different results as compared to the raw scores, the results are not shown.

Clinical variables and covariates

A movement disorder specialist assessed severity of motor symptoms using Part III of the Movement Disorder Society revision of the Unified Parkinson Disease Rating Scale (MDS-UPDRS)52 and the Modified Hoehn and Yahr scale53. Age, education, sex, disease duration (time since initial onset of PD motor symptoms), total LEDD, and GDS-15 were included as covariates. Site differences were seen at baseline with regard to education, motor severity, and cognitive severity/status (Supplementary Table 5), and thus site was also included as a covariate. Differences in time between visits across participants were accounted for by including age in all analytic models.

Genetic variables

Genomic DNA was extracted from peripheral blood or saliva samples using standard methods. Participants were genotyped for APOE rs429358 and rs7412 (which define the ε2, ε3, and ε4 alleles) and MAPT rs1800547 (which differentiates H1 and H2 haplotypes) using commercially available TaqMan assays (Applied Biosystems)27. APOE genotype was encoded as either having at least one ε4 allele or none. Sequencing of the entire GBA coding region was performed to detect the presence of all known pathogenic mutations and the E326K polymorphism (rs2230288). “Pathogenic” mutations were defined as previously described11. GBA mutations and the E326K polymorphism were combined as a single group in dominant model analyses given our previous demonstration that both are associated with a higher risk of dementia and specific cognitive impairments11,13.
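The binary genetic encodings described above can be sketched in a few lines. This is an illustrative sketch only, not the study's code: the function names are hypothetical, genotypes are assumed to be two-letter forward-strand strings such as "TC", and the ε4 count relies on the standard allele definitions (ε4 carries C at rs429358) together with the assumption that the very rare ε1 haplotype is absent.

```python
def apoe_e4_carrier(rs429358_genotype: str) -> int:
    """Return 1 if at least one APOE e4 allele is present, else 0.

    Assumption: e4 is the only common haplotype carrying the C allele at
    rs429358, so any C in the (unphased) genotype marks an e4 carrier.
    """
    return int("C" in rs429358_genotype.upper())


def gba_carrier(has_pathogenic_mutation: bool, has_e326k: bool) -> int:
    """Dominant coding: pathogenic mutations and E326K form a single group."""
    return int(has_pathogenic_mutation or has_e326k)


# Hypothetical participants: (rs429358 genotype, pathogenic GBA mutation, E326K)
participants = [("TT", False, False), ("TC", False, True), ("CC", True, False)]
encoded = [(apoe_e4_carrier(g), gba_carrier(m, k)) for g, m, k in participants]
```

Collapsing both loci to carrier/non-carrier indicators keeps one coefficient per gene in the regression models, at the cost of ignoring allele dose.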

Data preprocessing

Missing data points (2% of the total observed features) were imputed using a restricted Boltzmann machine.

Linear fixed-effect and mixed-effect models

Ordinal mixed-effect regression with a logit link54 was used to study the longitudinal association between biological factors and cognitive status. A linear mixed-effect regression55 was used to study the longitudinal association between biological factors and cognitive test performance. For both analyses, random intercepts were used to account for correlation within a participant. To examine the model performance in predicting cognitive status, the distribution of the reported AUC for each diagnosis (PD-NCI, PD-MCI, and PDD) was obtained from 100 iterations of two-layered cross-validation; in each iteration, 25% of the data were held out as unseen data for testing model performance, and the inner cross-validation layer used the rest of the data for model fitting and optimization. While the prediction performance was objectively evaluated via cross-validation, a final model was fit and interpreted based on the entire data set, with potential confounders included as covariates. The two-sided P values from Wald tests of the coefficients were reported. For analysis of the progression rate based only on biological factors (cross-sectional data), a simple linear fixed-effect regression model was used.
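The outer layer of this evaluation scheme (repeated 25% hold-out, with an AUC computed on each held-out set) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code: `repeated_holdout` and `auc` are hypothetical helper names, rows are split at random without grouping by participant (the text does not specify how the split was made), and the inner model-fitting layer is omitted.

```python
import numpy as np


def repeated_holdout(n_samples, n_iter=100, test_frac=0.25, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random hold-out splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * n_samples))
    for _ in range(n_iter):
        perm = rng.permutation(n_samples)
        yield perm[n_test:], perm[:n_test]


def auc(y_true, scores):
    """Rank-based AUC: probability that a positive outranks a negative."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count as half
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# One-vs-rest AUC for a single diagnosis on toy labels and scores.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
splits = list(repeated_holdout(n_samples=8, n_iter=3))
```

Collecting one AUC per iteration and per diagnosis gives the distribution from which a mean and standard deviation can be reported.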

Generalized multitask models

Multitask models were used to predict future cognitive status based on data from the year of the first visit, i.e., each of the tasks predicted cognitive status for n (0–5) years in the future. Multitask learning aims to improve the generalization performance by exploiting the intrinsic relatedness and learning multiple related tasks simultaneously. A specific type of multitask learning, temporal grouped LASSO (TGL)56, was employed. With logistic loss, the TGL cost function is shown below as Eq. (1)

$$\min \mathop {\sum }\limits_{i = 1}^t \mathop {\sum }\limits_{j = 1}^{n_i} {\log} \left( {1 + \exp \left( { - Y_{i,j}\left( {W_i^TX_{i,j} + c_i} \right)} \right)} \right) \,+\, {\theta_1}\Vert W\Vert_F^2 + {\theta _2}\Vert WH \Vert_F^2 + {\theta _3}\Vert W \Vert_{2,1}$$

where \(X_{i,j}\) denotes sample \(j\) of the \(i\)th task; \(Y_{i,j}\) is the corresponding ground truth of the sample; \(W_i\) and \(c_i\) are the model weights and bias for task \(i\); \(\theta_1\), \(\theta_2\), and \(\theta_3\) are regularization parameters controlling the \(\ell_2\)-norm penalty, temporal smoothness, and group sparsity for joint feature selection, respectively (optimized during cross-validation); \(H\) is the temporal smoothness prior matrix, where \(H \in \mathbb{R}^{t \times (t-1)}\) and \(H_{ij} = 1\) if \(i = j\), \(H_{ij} = -1\) if \(i = j + 1\), and \(H_{ij} = 0\) otherwise; \(\Vert \cdot \Vert_F\) is the Frobenius norm; and \(\Vert \cdot \Vert_{2,1}\) is \(\sum_{i=1}^{d} \sqrt{\sum_{j=1}^{t} (\cdot)_{ij}^2}\). Therefore, the first term measures the empirical error of the model, the second penalizes overfitting (by penalizing large weights), the third encourages temporally smooth transitions (by penalizing large weight differences between adjacent time points), i.e., assuming that most decline from PD-NCI to PDD transitions through PD-MCI, and the last term encourages the model to select a feature subset from all \(d\) features that is important over all \(t\) tasks (by penalizing features that are not strong in all tasks). Through this knowledge sharing between tasks, TGL has previously shown superior performance for prognosis prediction compared to traditional machine learning algorithms57.
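To make the four terms of Eq. (1) concrete, the objective can be evaluated directly with NumPy. This is a sketch under stated assumptions, not the MALSAR implementation: the function names are hypothetical, \(W\) is stored as a \(d \times t\) array whose column \(i\) is \(W_i\), and labels are coded as −1/+1, as the form of the logistic loss implies.

```python
import numpy as np


def smoothness_matrix(t):
    """Build H (t x (t-1)): H[i, j] = 1 if i == j, -1 if i == j + 1, else 0."""
    H = np.zeros((t, t - 1))
    idx = np.arange(t - 1)
    H[idx, idx] = 1.0
    H[idx + 1, idx] = -1.0
    return H


def tgl_objective(W, c, X, Y, theta1, theta2, theta3):
    """Evaluate the TGL cost of Eq. (1).

    W: (d, t) weights; c: (t,) biases; X: list of t arrays of shape (n_i, d);
    Y: list of t arrays of -1/+1 labels with shape (n_i,).
    """
    t = W.shape[1]
    loss = 0.0
    for i in range(t):
        margins = Y[i] * (X[i] @ W[:, i] + c[i])
        loss += np.logaddexp(0.0, -margins).sum()  # stable log(1 + exp(-m))
    ridge = theta1 * np.sum(W ** 2)                            # ||W||_F^2
    smooth = theta2 * np.sum((W @ smoothness_matrix(t)) ** 2)  # ||WH||_F^2
    group = theta3 * np.sum(np.sqrt(np.sum(W ** 2, axis=1)))   # ||W||_{2,1}
    return loss + ridge + smooth + group


# Toy check: 3 tasks, 2 features, 4 samples per task.
X = [np.ones((4, 2)) for _ in range(3)]
Y = [np.ones(4) for _ in range(3)]
cost_at_zero = tgl_objective(np.zeros((2, 3)), np.zeros(3), X, Y, 1.0, 1.0, 1.0)
```

Column \(j\) of \(WH\) equals \(W_j - W_{j+1}\), so weights that are identical across tasks incur no smoothness penalty, while the row-wise \(\ell_{2,1}\) norm pushes entire features to zero across all tasks at once.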

The TGL model was implemented in MATLAB using the MALSAR package58. The model, originally built for binary classification, was modified to handle ordinal classes according to a published protocol59. Specifically, two sparse regression models were built: one predicted the probability of PD-NCI and the other predicted the probability of PDD; the probability of PD-MCI was calculated as one minus the sum of these two predicted probabilities. The distribution of AUCs (PD-MCI vs. others; PDD vs. others) based on the predicted probabilities was obtained using a cross-validation scheme similar to that of the mixed-effect regression models. The final model was obtained by averaging, across cross-validation iterations, the weights of all models predicting whether or not a participant had a given cognitive status.
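The combination of the two binary models and the weight averaging described above can be sketched in a few lines of Python. This is a hedged illustration rather than the study's code: the function names are hypothetical, and clipping a negative PD-MCI probability to zero (possible because the two models are fitted separately) is an assumption made here that the text does not specify.

```python
def three_class_probabilities(p_nci, p_pdd):
    """Combine two binary model outputs into three class probabilities.

    p_nci: predicted probability of PD-NCI from the first model.
    p_pdd: predicted probability of PDD from the second model.
    """
    # PD-MCI receives the remaining probability mass. Clipping at zero is
    # an assumption of this sketch, not a step stated in the article.
    p_mci = max(0.0, 1.0 - p_nci - p_pdd)
    return {"PD-NCI": p_nci, "PD-MCI": p_mci, "PDD": p_pdd}


def average_weights(weight_vectors):
    """Average per-feature weights over cross-validation iterations."""
    n = len(weight_vectors)
    return [sum(column) / n for column in zip(*weight_vectors)]


# Toy usage with made-up numbers.
probs = three_class_probabilities(0.5, 0.25)
final_weights = average_weights([[1.0, 2.0], [3.0, 4.0]])
```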

Survival analyses

Diagnosis of PDD was used as the endpoint in survival analyses. A Cox proportional hazards (Cox PH) regression with frailty, a type of mixed-effect survival model, was used to study the association between baseline covariates and time to PDD. The model was clustered by participant to account for correlated groups of observations, and the log-rank test was performed to obtain the two-sided P value for each covariate. Survival curves for different subgroups were then generated using the fitted Cox PH model.
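For readers unfamiliar with the Cox PH model, the quantity it maximizes can be written out directly. The Python sketch below evaluates the partial log-likelihood in its Breslow form for a given coefficient vector. It is a simplified illustration only: it omits the frailty (random-effect) term and the clustering used in the study, which was fitted in R, and the function name and data layout are assumptions made for the example.

```python
import math


def cox_partial_log_likelihood(beta, times, events, covariates):
    """Breslow-form Cox partial log-likelihood (no frailty term).

    beta: list of regression coefficients.
    times: follow-up time for each participant.
    events: 1 if the endpoint (here, PDD) was observed, 0 if censored.
    covariates: covariates[i] is the covariate list for participant i.
    """
    # Linear predictor for every participant.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    log_lik = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored participants contribute only via risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[k]) for k in range(len(times)) if times[k] >= t_i)
        log_lik += eta[i] - math.log(risk)
    return log_lik


# Toy usage: three participants, one covariate, last one censored.
ll = cox_partial_log_likelihood([0.0], [1, 2, 3], [1, 1, 0], [[1], [0], [1]])
```

With all coefficients at zero, each event simply contributes minus the log of its risk-set size, which gives a quick sanity check on the implementation.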

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data that support the findings of this study and the scripts for data analysis are available from the corresponding author upon reasonable request. The data are not publicly available because they contain information that could compromise research participant privacy or consent. Data use agreements between the University of Washington/Dr. Zabetian and each outside investigator and their institution are required. Such agreements must be completed by the researcher (and their institution) and the University of Washington before the raw data are made available.

Code availability

The custom code used for analysis of the data was written in Python, R, and MATLAB. Python was used for data visualization, data cleaning and preprocessing, interfacing with MATLAB, and NoSQL result storage. Multitask learning was done in MATLAB. R was used for survival analysis and multilevel modeling. For survival analysis, the core packages were survival and survminer. For multilevel modeling, the core packages were lmerTest, vcrpart, and stats. The source code for reproducing the results shown in this study can be found at:


References

  1. Biundo, R., Weis, L. & Antonini, A. Cognitive decline in Parkinson’s disease: the complex picture. NPJ Parkinsons Dis. 2, 16018 (2016).
  2. Saredakis, D., Collins-Praino, L. E., Gutteridge, D. S., Stephan, B. C. M. & Keage, H. A. D. Conversion to MCI and dementia in Parkinson’s disease: a systematic review and meta-analysis. Parkinsonism Relat. Disord. 65, 20–31 (2019).
  3. Bjornestad, A., Tysnes, O. B., Larsen, J. P. & Alves, G. Loss of independence in early Parkinson disease: a 5-year population-based incident cohort study. Neurology 87, 1599–1606 (2016).
  4. Lewis, S. J. et al. Heterogeneity of Parkinson’s disease in the early clinical stages using a data driven approach. J. Neurol. Neurosurg. Psychiatry 76, 343–348 (2005).
  5. Bratic, B., Kurbalija, V., Ivanovic, M., Oder, I. & Bosnic, Z. Machine learning for predicting cognitive diseases: methods, data sources and risk factors. J. Med Syst. 42, 243 (2018).
  6. Tomlinson, C. L. et al. Systematic review of levodopa dose equivalency reporting in Parkinson’s disease. Mov. Disord. 25, 2649–2653 (2010).
  7. Yesavage, J. A. et al. Development and validation of a geriatric depression screening scale: a preliminary report. J. Psychiatr. Res. 17, 37–49 (1982).
  8. Latourelle, J. C. et al. Large-scale identification of clinical and genetic predictors of motor progression in patients with newly diagnosed Parkinson’s disease: a longitudinal cohort study and validation. Lancet Neurol. 16, 908–916 (2017).
  9. Salmanpour, M. R. et al. Optimized machine learning methods for prediction of cognitive outcome in Parkinson’s disease. Comput Biol. Med. 111, 103347 (2019).
  10. Swan, M. et al. Neuropsychiatric characteristics of GBA-associated Parkinson disease. J. Neurol. Sci. 370, 63–69 (2016).
  11. Mata, I. F. et al. GBA Variants are associated with a distinct pattern of cognitive deficits in Parkinson’s disease. Mov. Disord. 31, 95–102 (2016).
  12. Yahalom, G. et al. Carriers of both GBA and LRRK2 mutations, compared to carriers of either, in Parkinson’s disease: risk estimates and genotype-phenotype correlations. Parkinsonism Relat. Disord. 62, 179–184 (2019).
  13. Davis, M. Y. et al. Association of GBA mutations and the E326K polymorphism with motor and cognitive progression in Parkinson Disease. JAMA Neurol. 73, 1217–1224 (2016).
  14. Pang, S., Li, J., Zhang, Y. & Chen, J. Meta-analysis of the relationship between the APOE gene and the onset of Parkinson’s Disease dementia. Parkinsons Dis. 2018, 9497147 (2018).
  15. Rongve, A. et al. GBA and APOE epsilon4 associate with sporadic dementia with Lewy bodies in European genome wide association study. Sci. Rep. 9, 7013 (2019).
  16. Sun, R. et al. Polymorphisms and Parkinson Disease with or without dementia: a meta-analysis including 6453 participants. J. Geriatr. Psychiatry Neurol. 32, 3–15 (2019).
  17. Altmann, A., Tian, L., Henderson, V. W. & Greicius, M. D., Alzheimer’s Disease Neuroimaging Initiative, I. Sex modifies the APOE-related risk of developing Alzheimer disease. Ann. Neurol. 75, 563–573 (2014).
  18. Buckley, R. F. et al. Sex, amyloid, and APOE epsilon4 and risk of cognitive decline in preclinical Alzheimer’s disease: findings from three well-characterized cohorts. Alzheimers Dement 14, 1193–1203 (2018).
  19. Farrer, L. A. et al. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease. A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. JAMA 278, 1349–1356 (1997).
  20. Landau, S. M. et al. Comparing predictors of conversion and decline in mild cognitive impairment. Neurology 75, 230–238 (2010).
  21. Cholerton, B. et al. Sex differences in progression to mild cognitive impairment and dementia in Parkinson’s disease. Parkinsonism Relat. Disord. 50, 29–36 (2018).
  22. Henry, J. D., Crawford, J. R. & Phillips, L. H. Verbal fluency performance in dementia of the Alzheimer’s type: a meta-analysis. Neuropsychologia 42, 1212–1222 (2004).
  23. Ho, J. K., Nation, D. A. & Alzheimer’s Disease Neuroimaging, I. Neuropsychological profiles and trajectories in preclinical Alzheimer’s Disease. J. Int. Neuropsychol. Soc. 24, 693–702 (2018).
  24. Ryan, J. J., Glass Umfleet, L., Kreiner, D. S., Fuller, A. M. & Paolo, A. M. Neuropsychological differences between men and women with Alzheimer’s disease. Int J. Neurosci. 128, 342–348 (2018).
  25. Biundo, R. et al. Influence of APOE status on lexical-semantic skills in mild cognitive impairment. J. Int. Neuropsychol. Soc. 17, 423–430 (2011).
  26. Mata, I. F. et al. Large-scale exploratory genetic analysis of cognitive impairment in Parkinson’s disease. Neurobiol. Aging 56, 211.e1–211.e7 (2017).
  27. Mata, I. F. et al. APOE, MAPT, and SNCA genes and cognitive performance in Parkinson disease. JAMA Neurol. 71, 1405–1412 (2014).
  28. Williams-Gray, C. H. et al. The distinct cognitive syndromes of Parkinson’s disease: 5 year follow-up of the CamPaIGN cohort. Brain 132, 2958–2969 (2009).
  29. Seto-Salvia, N. et al. Dementia risk in Parkinson disease: disentangling the role of MAPT haplotypes. Arch. Neurol. 68, 359–364 (2011).
  30. Ezquerra, M. et al. Lack of association of APOE and tau polymorphisms with dementia in Parkinson’s disease. Neurosci. Lett. 448, 20–23 (2008).
  31. Irwin, D. J. et al. Neuropathologic substrates of Parkinson disease dementia. Ann. Neurol. 72, 587–598 (2012).
  32. Papapetropoulos, S. et al. Phenotypic associations of tau and ApoE in Parkinson’s disease. Neurosci. Lett. 414, 141–144 (2007).
  33. Alcalay, R. N. et al. Cognitive performance of GBA mutation carriers with early-onset PD: the CORE-PD study. Neurology 78, 1434–1440 (2012).
  34. Gasca-Salas, C. et al. Longitudinal assessment of the pattern of cognitive decline in non-demented patients with advanced Parkinson’s disease. J. Parkinsons Dis. 4, 677–686 (2014).
  35. Pal, A. et al. Deficit in specific cognitive domains associated with dementia in Parkinson’s disease. J. Clin. Neurosci. 57, 116–120 (2018).
  36. Tranel, D., Vianna, E., Manzel, K., Damasio, H. & Grabowski, T. Neuroanatomical correlates of the Benton Facial Recognition Test and Judgment of Line Orientation Test. J. Clin. Exp. Neuropsychol. 31, 219–233 (2009).
  37. Cilia, R. et al. Survival and dementia in GBA-associated Parkinson’s disease: the mutation matters. Ann. Neurol. 80, 662–673 (2016).
  38. Cholerton, B. A. et al. Evaluation of mild cognitive impairment subtypes in Parkinson’s disease. Mov. Disord. 29, 756–764 (2014).
  39. Buter, T. C. et al. Dementia and survival in Parkinson disease: a 12-year population study. Neurology 70, 1017–1022 (2008).
  40. Hely, M. A., Reid, W. G., Adena, M. A., Halliday, G. M. & Morris, J. G. The Sydney multicenter study of Parkinson’s disease: the inevitability of dementia at 20 years. Mov. Disord. 23, 837–844 (2008).
  41. Williams-Gray, C. H. et al. The CamPaIGN study of Parkinson’s disease: 10-year outlook in an incident population-based cohort. J. Neurol. Neurosurg. Psychiatry 84, 1258–1264 (2013).
  42. Jones, J. D., Kuhn, T. P. & Szymkowicz, S. M. Reverters from PD-MCI to cognitively intact are at risk for future cognitive impairment: analysis of the PPMI cohort. Parkinsonism Relat. Disord. 47, 3–7 (2018).
  43. Emre, M. et al. Clinical diagnostic criteria for dementia associated with Parkinson’s disease. Mov. Disord. 22, 1689–1707 (2007). quiz 1837.
  44. Litvan, I. et al. Diagnostic criteria for mild cognitive impairment in Parkinson’s disease: Movement Disorder Society Task Force guidelines. Mov. Disord. 27, 349–356 (2012).
  45. Cholerton, B. A. et al. Pacific Northwest Udall Center of excellence clinical consortium: study design and baseline cohort characteristics. J. Parkinsons Dis. 3, 205–214 (2013).
  46. Nasreddine, Z. S. et al. The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. J. Am. Geriatr. Soc. 53, 695–699 (2005).
  47. Benedict, R. H. B., Schretlen, D., Groninger, L. & Brandt, J. The Hopkins verbal learning test-revised: normative data and analysis of inter-form and inter-rater reliability. Clin. Neuropsychologist 12, 43–55 (1998).
  48. Wechsler, D. WAIS-III Administration and Scoring Manual (The Psychological Corporation, 1997).
  49. Wechsler, D. Wechsler Adult Intelligence Scale-Revised (The Psychological Corporation, 1987).
  50. Strauss, E., Sherman, E. M. S. & Spreen, O. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. 3rd edn. (Oxford University Press, 2006).
  51. Benton, A. L., Sivan, A. B., Hamsher, N. R., Varney, N. R. & Spreen, O. Contributions to Neuropsychological Assessment: A Clinical Manual (Oxford University Press, 1994).
  52. Goetz, C. G. et al. Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Mov. Disord. 23, 2129–2170 (2008).
  53. Goetz, C. G. et al. Movement Disorder Society Task Force report on the Hoehn and Yahr staging scale: status and recommendations. Mov. Disord. 19, 1020–1028 (2004).
  54. Burgin, R. vcrpart: Tree-Based Varying Coefficient Regression for Generalized Linear and Ordinal Mixed Models. R package version 0.3-3 (2015).
  55. Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. Package ‘lmerTest’. R package version 2 (2015).
  56. Zhou, J., Yuan, L., Liu, J. & Ye, J. A multitask learning formulation for predicting disease progression. In Apté, C. V. (ed) Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 814–822 (Association for Computing Machinery, San Diego, California, USA, 2011).
  57. Emrani, S., McGuirk, A. & Xiao, W. Prognosis and diagnosis of Parkinson’s disease using multi-task learning. In Matwin, S., Yu, S. (eds) Proc. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, Halifax, NS, Canada, 2017).
  58. Zhou, J., Chen, J. & Ye, J. Malsar: Multi-task Learning via Structural Regularization (Arizona State University, 2011).
  59. Frank, E. & Hall, M. A simple approach to ordinal classification. In de Raedt, L., Flach, P. (eds) European Conference on Machine Learning. 145–156 (Springer, Freiburg, Germany, 2001).


Acknowledgements

This work was supported by the National Institute of Neurological Disorders and Stroke [grant number P50 NS062684], the Department of Veterans Affairs [grant number 101 CX001702], and the Scully Initiative Fund. The funding sources did not provide scientific input for the study. This material is the result of work supported with resources and the use of facilities at the Veterans Affairs Puget Sound Health Care System. We sincerely thank our research subjects for their participation in this study.

Author information




All authors meet the journal criteria for authorship, including (1) substantial contributions to the conception or design of the work or the acquisition, analysis, or interpretation of the data, (2) drafting the work or revising it critically for important intellectual content, (3) final approval of the completed version, and (4) accountability for all aspects of the work. Specific contributions for each author are listed below. (1) Research project: A. Conception/Design, B. Organization, C. Acquisition/Execution. (2) Statistical Analyses: A. Design, B. Execution, C. Interpretation. (3) Manuscript: A. Writing of the first draft, B. Review, critique, and approval of the final draft. T.P.: 1C 2A 2B 3A 3B. B.C.: 1A 1B 1C 2C 3A 3B. I.F.M.: 1A 1C 2C 3B. C.P.Z.: 1A 1B 1C 2C 3B. K.L.P.: 1A 1B 1C 2C 3B. N.A.: 2C 3B. L.T.: 2C 3B. J.F.Q.: 1A 1B 1C 2C 3B. A.L.H.: 1C 2C 3B. K.A.C.: 1C 2C 3B. S.-C.H.: 1C 2C 3B. K.L.E.: 1A 1B 1C 2C 3B. T.J.M.: 1A 1B 1C 2C 3B.

Corresponding author

Correspondence to Brenna Cholerton.

Ethics declarations

Competing interests

The authors declare no competing interests related to the current manuscript. Full financial disclosure is provided below: T.P. is supported by grants from the NIH and the Scully Initiative Fund. B.C. is supported by grants from the NIH. I.F.M. is funded by grants from the Parkinson’s Foundation, the American Parkinson Disease Association, and the NIH. C.P.Z. is supported by grants from the American Parkinson Disease Association, the Department of Veterans Affairs, and the NIH, and a gift from the Dolsen Foundation. K.L.P. reports honoraria from invited scientific presentations to universities and professional societies not exceeding $5000 per year, is reimbursed by Sanofi, AstraZeneca, and Sangamo BioSciences for the conduct of clinical trials, has received consulting fees from Allergan and Curasen, and is funded by grants from the Michael J. Fox Foundation for Parkinson’s Research and the NIH. N.A. is supported by grants from the NIH, the American Heart Association, the Doris Duke Charitable Foundation, the Bill and Melinda Gates Foundation, the March of Dimes, the Food and Drug Administration, and the Burroughs Wellcome Fund. L.T. is supported by grants from the NIH. J.F.Q. is reimbursed by Prothena and Roche for the conduct of clinical trials and by vTv Pharmaceuticals for DSMB service. J.F.Q. is also supported by grants from the NIH and the Department of Veterans Affairs. K.A.C. is funded by a VA Merit Grant. A.L.H. is reimbursed by Theravance Inc. for conducting clinical trials and supported by grants from the NIH and the Huntington’s Disease Society of America. S.-C.H. is funded by grants from the NIH and the Michael J. Fox Foundation. K.L.E. is funded by grants from the NIH. T.J.M. reports honoraria from invited scientific presentations to universities and professional societies not exceeding $5000 per year and is funded by grants from the NIH, the Michael J. Fox Foundation, the Farmer Family Foundation, and the Scully Initiative Fund.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


Cite this article

Phongpreecha, T., Cholerton, B., Mata, I.F. et al. Multivariate prediction of dementia in Parkinson’s disease. npj Parkinsons Dis. 6, 20 (2020).
