Diagnostic accuracy of brain age prediction in a memory clinic population and comparison with clinically available volumetric measures

The aim of this study was to assess the diagnostic validity of a deep learning-based method estimating brain age based on magnetic resonance imaging (MRI) and to compare it with volumetrics obtained using NeuroQuant (NQ) in a clinical cohort. Brain age prediction was performed on minimally processed MRI data using deep convolutional neural networks and an independent training set. The brain age gap (BAG; the difference between predicted brain age and chronological age) was calculated, and volumetrics were performed in 110 patients with dementia (Alzheimer’s disease, frontotemporal dementia (FTD), and dementia with Lewy bodies) and 122 patients without dementia (subjective cognitive decline and mild cognitive impairment). Area-under-the-curve (AUC) based on receiver operating characteristics and logistic regression analyses were performed. The mean age was 67.1 (9.5) years and 48.7% (113) were females. The dementia versus non-dementia sensitivity and specificity of the volumetric measures exceeded 80% and yielded higher AUCs compared to BAG. The explained variance of the prediction of diagnostic stage increased when BAG was added to the volumetrics. Further, BAG separated patients with FTD from other dementia etiologies with >80% sensitivity and specificity. NQ volumetrics outperformed BAG in terms of diagnostic discriminatory power, but the two methods provided complementary information, and BAG discriminated FTD from other dementia etiologies.

The number of patients suffering from dementia is rapidly increasing worldwide as populations grow older. To date no curative treatment is available for any dementia disorder 1. However, new treatment possibilities targeted at Alzheimer's disease (AD) are evolving, prompting the need for tools to aid early diagnosis and predict future cognitive decline 2. This is of utmost importance as potentially disease-modifying treatment strategies will target early pathophysiological changes.
The use of artificial intelligence (AI) to improve and facilitate early diagnosis, planning, and follow-up of treatment has emerged rapidly in many medical fields, including the field of cognitive impairment and dementia 3. A recent report published by the Norwegian Directorate of Health encouraged increased use of AI in radiology to improve early diagnosis and provide decision support 4.
AI-based methods for the assessment of brain age based on magnetic resonance imaging (MRI) scans have recently evolved 5. The purpose of such data-driven methods is to train models to identify characteristics in the MRI data that are robustly associated with age (or any other key characteristics) in a training set, and then apply the resulting model on a different set of brain scans to estimate the age of individual participants or patients in a clinical context. The discrepancy between the predicted and the chronological age is sometimes referred to as the brain age gap (BAG).

Results
Brain age prediction. Across groups, age prediction accuracy was high, with a correlation between predicted and chronological age of 0.879 and a mean absolute error (MAE) of 4.29. Within groups the correlation was 0.913 (MAE 3.58) in non-dementia and 0.750 (MAE 5.09) in dementia. Across groups, the correlation between BAG and age was −0.046 (p = 0.483); it was 0.092 (p = 0.313) in non-dementia and −0.503 (p < 0.001) in dementia.
Diagnostic associations with brain MRI features. Group-wise summary statistics for the MRI features are presented in Table 1. Linear models adjusted for age and sex revealed that patients with dementia had higher BAG (t = 5.23, p < 0.001), smaller forebrain parenchyma volume (t = −6.67, p ≤ 0.001), and smaller hippocampi (t = −7.36, p < 0.001) compared to non-dementia patients (unadjusted Cohen's d of −0.59, 1.27, and 1.34, respectively).
Table 2 shows the results from the ROC analysis. AUCs were overall higher for the dementia versus SCD classification compared to the dementia versus non-dementia classification. The two NQ measures yielded higher AUCs for both stage classifications compared to BAG, with non-overlapping confidence intervals in the dementia versus non-dementia classification. For the dementia versus SCD classification, the NQ AUCs were both 0.89 (sensitivity 80%, specificity 86-88%) and the BAG AUC was 0.78 (sensitivity 80%, specificity 67%). For the dementia versus non-dementia classification, the NQ AUCs were 0.82-0.83 (sensitivity 80%, specificity 67-68%) and the BAG AUC was 0.68 (sensitivity 80%, specificity 48%).
Table 3 presents the results from the logistic regression analyses predicting disease stages. The model including hippocampus volume adjusted for demographic covariates gave the highest Nagelkerke R² (0.44 vs. 0.40 and 0.36) for dementia/non-dementia prediction, while the model with BAG adjusted for demographic covariates gave the highest Nagelkerke R² for dementia/SCD prediction (0.60 vs. 0.58 and 0.59). Adding white matter hypointensity volume (WMH) to model 3 of both diagnostic predictions did not change the Nagelkerke R². [...] (Table 3a). In the dementia/SCD prediction (Table 3b), the Nagelkerke R² increased from 0.59 to 0.69 when adding BAG to forebrain parenchyma and covariates and from 0.58 to 0.68 when adding BAG to hippocampus and covariates.
Table 4 summarizes the comparisons of AD, FTD and DLB. BAG and forebrain parenchyma volume, but not hippocampus volume, were significantly different between groups. Post hoc group comparisons of AD versus non-AD, FTD versus non-FTD, and DLB versus non-DLB showed the highest BAG in patients with FTD, and the largest forebrain parenchyma volume in patients with DLB, compared to the other etiologies (p = 0.005 and p = 0.012, respectively). The AUC of BAG separating FTD from non-FTD was 0.82 (95% CI 0.62-1.00, p = 0.009) with sensitivity 83% and specificity 82%, and the AUC of forebrain parenchyma volume separating DLB from non-DLB was 0.73 (95% CI 0.60-0.87, p = 0.009) with sensitivity 83% and specificity 57%.
FTD patients were younger than the other patients (64.7 vs. 72.0, p = 0.023). When adjusting for age and sex, the association between FTD and BAG was no longer statistically significant (t = 1.74, p = 0.086). However, a sensitivity subanalysis including only patients 70 years and below was performed. In this analysis, including five patients with FTD and 34 with non-FTD (median ages 64 and 66 in FTD and non-FTD, respectively; p = 0.117), the FTD patients had significantly higher BAG (median 11.3 vs. 5.1, p = 0.004), also when adjusting for age and sex (p = 0.007). The AUC of this subgroup analysis was 0.88 (p = 0.006).

Discussion
This study of the diagnostic properties of MRI-based brain age prediction in a memory clinic setting revealed that BAG was associated with disease stage, but its discriminatory power was outperformed by the hippocampus and forebrain parenchyma volumes. BAG was, however, found to discriminate FTD from other dementia etiologies.
A higher BAG was associated with more impaired disease stage, i.e. dementia versus non-dementia stages. This was as expected, as previous studies have found BAG to be associated with cognitive test results and to be higher in AD compared to healthy controls 5. The association between BAG and disease stage was not confounded by degree of vascular comorbidity, as measured with FreeSurfer WMH. Despite the association with disease stage, BAG performed poorly at discriminating dementia from non-dementia, while volumetrics using NQ did better. We suggest that including MCI patients in the non-dementia group might be the cause of the weak discriminating power of BAG, as the distinction between MCI and dementia is indefinite, and that excluding MCI patients should help distinguish the remaining groups. Accordingly, in analyses discriminating dementia from SCD, the discriminating power was higher for both MRI measures. Despite the increased AUC of both methods, BAG did not achieve the sensitivity and specificity levels that are generally expected from a clinical biomarker, while both NQ measures did 14.
In clinical practice the separation of dementia from non-dementia stages is based on clinical interviews and examinations and does not include biomarkers in the decision-making process. Therefore, the analyses on associations with disease stage were primarily performed to compare the results of the novel, and until now research-intended, brain age prediction with the clinically available hippocampus and forebrain parenchyma volumes. Hippocampal atrophy is a well-known marker of AD 15 and is often used as a supportive biomarker in the diagnostic workup. It is therefore not surprising that this measure reached clinically relevant discriminatory power, as most of the dementia patients had probable AD. The brain age prediction method is trained to capture general brain age, and an increased BAG has been associated with genetic, lifestyle, and psychiatric diseases, in addition to AD 7. Increased BAG is likely to be less specific to neurodegenerative diseases than hippocampal volume, which is supported by the current findings. It is also possible that cognitive and brain reserve play a greater role when BAG is applied to cognitively impaired patients. We adjusted for educational level, but this only accounts for one part of the complex concept of cognitive reserve 16. Further, brain age prediction integrating information across the whole brain is likely less sensitive to specific, small-region hippocampal atrophy than hippocampal volume itself. Therefore, it is conceivable that future work performing regional brain age prediction (e.g., Kaufmann et al. 7) may increase clinical sensitivity and specificity.
NQ volumetrics and BAG were also compared in logistic regression analyses. The adjusted model including hippocampus resulted in a higher Nagelkerke R² than the model with BAG in the dementia/non-dementia prediction, in line with the ROC results. In the dementia/SCD prediction, the adjusted model with BAG had the highest Nagelkerke R².
The clinical utility of a biomarker ultimately depends on its value for etiological diagnostic work-up. BAG was larger in patients with FTD and the discriminatory power for separating FTD from other etiologies was excellent, with sensitivity and specificity levels above 80%. Previous studies have reported increased brain age in severe mental disorders including schizophrenia, and a genome-wide association study found an association between brain ageing and the MAPT gene, which encodes the tau protein that is related to FTD 17. The present sample is relatively small, and the explorative design does not allow for decisive conclusions. Based on previous findings of BAG being associated with diseases involving frontal lobe pathologies 17,18, our findings encourage further studies on the association between BAG and FTD and other frontal lobe pathologies. However, the size and distribution of the affected brain regions are expected to influence brain age prediction and could introduce bias to the associations. Indeed, the frontal lobes occupy a relatively large proportion of the brain, accounting for two thirds of the total brain volume 19. Atrophy of this region could therefore possibly affect brain age estimates to a larger extent than focal atrophy of a smaller brain region, e.g. the hippocampus. Another possibility is that age prediction was biased by age, i.e. that the accuracy of the prediction model varies with age, a phenomenon commonly seen in brain age models 20. The younger age of the FTD patients may thus have influenced the results. Indeed, when adjusting for age the group differences were attenuated to the point where they no longer reached the threshold for statistical significance. Practically, it is difficult to correct for bias between groups with different age distributions since the true structure of the bias is unknown; correcting jointly in both groups based on independent data can have little to no effect, whereas an in-sample correction could reduce actual group differences. Thus, we performed a sensitivity analysis matching the groups on age, confirming higher BAG in FTD patients.
Hippocampus volume was not significantly smaller in patients with AD dementia compared to non-AD dementia, which might seem unexpected as hippocampal atrophy is known to be a marker of AD. This is, however, in line with a previous study based on a larger, yet partly overlapping, cohort where hippocampus volume reached an AUC of only 0.62 for discrimination of AD dementia versus non-AD dementia 12. Further, previous studies from our group concluded that as much as 53% of patients with AD dementia lack atrophy of the hippocampi, and that atypical atrophy patterns are common 21,22. Both these findings might explain why hippocampus was not able to separate patients with AD dementia from non-AD dementia at a clinically acceptable level, in that study.
There are limitations to the current study. The cross-sectional explorative design and the relatively low number of patients in the etiological comparisons and in the FTD sensitivity analysis limit the confidence and generalizability of the conclusions. Another limitation is that only clinical criteria without AD-specific molecular imaging (Aβ-PET) or biofluid (CSF, plasma, Aβ/p-tau) biomarkers were used as the gold standard for the etiological diagnoses. Although the clinical diagnoses were made using the NIA/AA criteria by two experienced physicians, and while the main goal of this study was to examine whether BAG could serve as an additional diagnostic marker in a naturalistic clinical setting, future studies on the diagnostic properties of BAG should include specific biomarkers to substantiate the results. Further, information on comorbidity was not available in the current data set. Although our analyses revealed no substantial confounding effects of white matter cerebrovascular pathology as indexed using WMH from FreeSurfer, various comorbid clinical conditions may influence MRI-based analyses and subsequent brain age prediction, and should be considered in future studies.

Conclusions
Brain age estimation using clinically available MRI scans adds an interesting perspective to the association between brain ageing and neurodegenerative diseases. While NQ volumetrics outperformed BAG in terms of discriminatory power for patients with dementia versus those without dementia, the two measures provided complementary information, and we did not find evidence to suggest that our findings were confounded by cerebrovascular comorbidity. The finding of increased brain age in FTD patients is of clinical interest as few biomarkers are available for this diagnosis. The causal direction of effects and prognostic properties remain to be further characterized, preferably in a longitudinal study. While automated tools for individual-level brain phenotyping based on machine learning, such as brain age prediction, have potential to support clinical diagnostics in a memory clinic setting, further development and validation of their etiology-discriminating and prognostic properties are needed to characterize their clinical potential.

Methods
Participants. All patients assessed for cognitive complaints at the memory clinic at Oslo University Hospital (OUH), Norway, between June 2015 and January 2019 who met the criteria of SCD, MCI, or dementia (see below), and who had been examined with brain MRI at the same scanner at OUH within ±6 months of the clinical assessment, were eligible for inclusion. Referral to the research MRI scanner at OUH, and not to another MRI scanner, was done at random when an MRI scan was indicated as part of the clinical routine and it was geographically convenient for the patient to have it performed at OUH. Among the 254 patients fulfilling these criteria, MRI scans of sufficient quality were available from 232 patients, who form the present study cohort.
All patients had consented to be part of a national quality and research register (The Norwegian registry of persons assessed for cognitive symptoms, NorCog). The inclusion and clinical assessments carried out at the memory clinic and data included in NorCog have been described previously 23.
Diagnoses and clinical assessments. All patients were diagnosed retrospectively by two experienced physicians (KP and THE), using all available information from the extensive clinical assessments including information from patients and proxies on symptoms, cognitive test results, function in activities of daily living, and physical and psychiatric examinations 23,24. The NIA/AA 2011 criteria were used to diagnose MCI and dementia 25, and the Jessen criteria were used to diagnose SCD 26. Among the patients with dementia, those fulfilling clinical criteria of AD according to NIA/AA 2011 criteria (probable AD and possible AD mixed with vascular pathology) 25, frontotemporal dementia (FTD) according to the Rascovsky and Gorno-Tempini 2011 criteria 27,28, and dementia with Lewy bodies (DLB) according to the 2017 McKeith criteria 29 were included in etiology-based validity analyses. Other diagnoses were excluded due to few cases or other mixed etiologies (i.e., two patients with vascular cognitive impairment, one patient with Parkinson dementia, ten patients with various mixed etiologies, and three unspecific dementia diagnoses). Clinical radiology reports including information on structural pathologies of both cortical and subcortical regions were used to exclude etiologies not related to dementia (i.e., intracranial bleedings or tumors). Further, signs of vascular pathology indicative of cerebrovascular disease and frontal atrophy were used according to criteria of vascular cognitive impairment and FTD. Information on regional structural changes based on the clinical report or from the NQ report was not included in the diagnostic criteria.
The Norwegian version of the Mini Mental State Examination (MMSE) and the Clinical Dementia Rating scale-sum of boxes (CDR-SB) were used as measures of global cognitive and functional performance for descriptive purposes. The MMSE gives a score between zero and 30, with higher scores indicating better global cognitive function 30,31, and the CDR-SB is a global measure of cognitive and functional impairment including six items scored from zero to 3 and summed to a score ranging from zero to 18, with higher scores indicating greater impairment 32. Two CDR-certified physicians (KP and THE) scored the CDR-SB post hoc, based on all available information from the patient records. The Consortium to Establish a Registry for Alzheimer's Disease (CERAD) 10-word delayed recall test, with scores from zero to 10 and higher scores indicating better learning and retrieval capacity 33, was included as a descriptive measure of memory function.
MRI acquisition and analysis. All patients were assessed with brain MRI according to the same research protocol using a GE Discovery MR750 3T scanner (GE Healthcare, Milwaukee, WI, US). Whole brain T1-weighted structural MRI data was acquired using an inversion recovery-fast spoiled gradient echo sequence (BRAVO) with the following parameters: TR = 8.16 ms, TE = 3.18 ms, TI = 450 ms, flip angle = 12°, field of view = 256 mm, acquisition matrix = 256 × 256, 188 sagittal slices, slice thickness = 1.0 mm, voxel size = 1 × 1 × 1 mm³. The scans were analyzed following a previously established minimal processing pipeline and brain age prediction model 11, and with NeuroQuant 3. version (NQ, CorTechs labs/University of California, San Diego, CA, USA) 13.
Brain age was computed using a state-of-the-art deep convolutional neural network, trained on a large population dataset (N = 53,542 from 21 publicly accessible datasets) with a wide age range from a multitude of scanners 11, not including the one used for the present study. The model is available online 34. As input, the model used minimally processed imaging data linearly registered with six degrees of freedom to MNI152 space. BAG was calculated by subtracting chronological age from predicted brain age, such that a positive BAG reflects higher predicted age compared to chronological age and vice versa.
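As an illustration of the quantities defined above, the following minimal sketch computes BAG, the mean absolute error, and the Pearson correlation between predicted and chronological age. The ages are hypothetical values chosen for the example, and the function names are not taken from the published model or pipeline.

```python
from math import sqrt

def brain_age_gap(predicted, chronological):
    """BAG = predicted brain age minus chronological age (in years)."""
    return [p - c for p, c in zip(predicted, chronological)]

def mean_absolute_error(predicted, chronological):
    """Mean of the absolute prediction errors (in years)."""
    errors = [abs(p - c) for p, c in zip(predicted, chronological)]
    return sum(errors) / len(errors)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical ages, for illustration only (not study data).
chronological = [60.0, 65.0, 70.0, 75.0]
predicted = [63.0, 64.0, 78.0, 77.0]

bag = brain_age_gap(predicted, chronological)        # [3.0, -1.0, 8.0, 2.0]
mae = mean_absolute_error(predicted, chronological)  # 3.5
r = pearson_r(predicted, chronological)
```

A positive entry in `bag` corresponds to a brain that is predicted to be older than the chronological age of the patient.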
NQ produces valid and reliable volumetric measures of several brain regions 13,35. NQ volumetry of the hippocampus correlates well with visual ratings of the medial temporal lobe using the Scheltens scale 36. In the present study, we included the volume of the hippocampus, as atrophy of the hippocampus is one of the best-established diagnostic imaging biomarkers for AD 15, which was the diagnosis of the majority of the patients with dementia in the present sample. Additionally, we included forebrain parenchyma volume, comprising all parenchymal brain volumes except the brainstem and cerebellum, as this volume was previously shown to have the best ability to discriminate between dementia and non-dementia 12, and to include a measure that would represent more than the AD-specific medial temporal region. Both structures were included as proportions of estimated intracranial volume, i.e. the sum of whole brain volume and CSF spaces, to adjust for head size.
In 229 of the 232 included patients, FreeSurfer data on white matter hypointensities (WMH) was available 37. Although based on T1-weighted MRI scans, this measure has been found to correlate well with both state-of-the-art T2/FLAIR white matter hyperintensities and the visual rating scale of Fazekas 38-40. To adjust for head size, WMH was divided by the FreeSurfer measure of total intracranial volume. WMH was included post hoc to evaluate if cerebrovascular comorbidity could confound the association between BAG and dementia.
Statistics. Data were analyzed using IBM SPSS Statistics for Windows (version 27, Armonk, NY, USA). The significance level was set at 0.05. Diagnostic groups were compared using independent samples t-test and ANOVA for continuous measures and χ² tests for categorical measures. Age- and sex-adjusted linear models were performed for group-wise comparisons of the MRI measures. Medians and the Mann-Whitney U test were used in a sensitivity analysis of the subgroup of patients 70 years of age and below.
To compare the validity of the two MRI methods, receiver operating characteristics (ROC) analyses were carried out for each method, calculating the area under the curve (AUC) as a measure of the performance of the classifiers to separate dementia from non-dementia and from SCD. The interpretation of the AUC depends on the clinical setting in which the test should be used, but generally an AUC of 0.5-0.7 is regarded as poor, 0.7-0.8 acceptable, 0.8-0.9 excellent, and > 0.9 outstanding 41. For a biomarker to be clinically useful, the sensitivity and specificity should be at least 80% 14. Thus, for each MRI measure, the sensitivity was set at 80% and the specificity was obtained from the ROC analysis.
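The procedure of fixing sensitivity and reading off specificity can be sketched as follows. This is a simplified illustration, not the SPSS procedure used in the study: the AUC is computed as the proportion of patient/non-patient pairs that are ranked correctly, the specificity is taken at the best observed threshold with sensitivity of at least 80%, and the scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a correctly ranked pair."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def specificity_at_sensitivity(scores_pos, scores_neg, target=0.80):
    """Highest specificity among observed thresholds whose sensitivity is at
    least `target`. A case is called positive when score >= threshold."""
    best = 0.0
    for thr in sorted(set(scores_pos) | set(scores_neg)):
        sens = sum(s >= thr for s in scores_pos) / len(scores_pos)
        spec = sum(s < thr for s in scores_neg) / len(scores_neg)
        if sens >= target:
            best = max(best, spec)
    return best

# Hypothetical BAG values (years), for illustration only.
dementia = [2, 5, 6, 8, 9]
non_dementia = [1, 3, 4, 5, 7]

example_auc = auc(dementia, non_dementia)                          # 0.74
example_spec = specificity_at_sensitivity(dementia, non_dementia)  # 0.6
```

For measures where lower values indicate disease, such as hippocampus volume, the scores would be negated before applying the same functions.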
Bivariate Pearson correlation of the three MRI measures (BAG, hippocampus volume and forebrain parenchyma volume) was performed to prepare the logistic regression analysis. Hippocampus volume and forebrain parenchyma volume were highly correlated (r = 0.703, p < 0.001). Thus, in the logistic regression analyses predicting dementia versus non-dementia and dementia versus SCD, the forebrain parenchyma volume, hippocampus volume, and BAG were included in separate models (models 1, 2, and 3), adjusting for demographic covariates (age, sex, and educational level). The Nagelkerke R² was used as an estimate of the explained variance to compare the models. In models 4 and 5, BAG was added to each of the two volumetric measures to assess its additive value for the prediction of disease stage. Finally, to explore if cerebrovascular comorbidity could confound the association between BAG and diagnosis, WMH was added to model 3 (not in table).
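For readers unfamiliar with the Nagelkerke R², the sketch below shows its standard definition: the Cox-Snell R² rescaled by its maximum attainable value, computed from the log-likelihoods of the fitted model and of an intercept-only model. The group sizes are those of the present cohort, whereas the fitted log-likelihood is a hypothetical value, since the study does not report log-likelihoods.

```python
from math import exp, log

def null_log_likelihood(n_pos, n_neg):
    """Log-likelihood of an intercept-only logistic model, which predicts
    the overall proportion of positive cases for every patient."""
    n = n_pos + n_neg
    p = n_pos / n
    return n_pos * log(p) + n_neg * log(1.0 - p)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R2 divided by its maximum attainable value."""
    cox_snell = 1.0 - exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# 110 patients with dementia and 122 without, as in the present cohort.
ll0 = null_log_likelihood(110, 122)

# Hypothetical log-likelihood of a fitted model, for illustration only.
ll1 = 0.7 * ll0
r2 = nagelkerke_r2(ll0, ll1, 232)
```

A model that does not improve on the intercept-only model gives a value of 0, and a model that fits the data perfectly (log-likelihood of 0) gives a value of 1.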

Ethics declarations.
All patients gave written informed consent to be included in NorCog.The Regional Committee of Medical Research Ethics of the South-East Norway approved the use of NorCog data in the present study (REC South-East number 29461).All methods and analyses were performed in accordance with the Declaration of Helsinki.

Table 1.
Patient characteristics, and comparisons between disease stages. All continuous variables expressed as mean (SD). *Three-group comparison using ANOVA/χ². **Dementia versus non-dementia (SCD-MCI) comparison using t-test/χ². SCD subjective cognitive decline, MCI mild cognitive impairment, SD standard deviation, MMSE mini mental status examination, CDR-SB clinical dementia rating scale sum of boxes, BAG brain age gap, ICV intracranial volume, WMH white matter hypointensities. Significant values are in bold.

Table 2.
Ability of MRI methods to distinguish dementia from non-dementia (a) and from SCD (b). SCD subjective cognitive decline, BAG brain age gap, ICV intracranial volume, AUC area under the receiver operating curve, CI confidence interval. Significant values are in bold.

Table 3.
Logistic regression analyses predicting dementia versus non-dementia (a) and dementia versus SCD (b). Columns: adjusted models 1 to 5, each reported as OR (95% CI) and p.
Finally, models including BAG and one of the NQ measures had the highest Nagelkerke R², indicating that NQ volumetrics and BAG provide complementary information for dementia prediction.

Table 4.
Characteristics of patients with various dementia etiologies, and comparisons of diagnostic groups. *Three-group comparison using ANOVA/χ². **t-test. AD Alzheimer's disease, FTD frontotemporal dementia, DLB dementia with Lewy bodies, BAG brain age gap, ICV intracranial volume, WMH white matter hypointensities. Significant values are in bold.