Multimodal deep learning for Alzheimer’s disease dementia assessment

Worldwide, there are nearly 10 million new cases of dementia annually, of which Alzheimer’s disease (AD) is the most common. New measures are needed to improve the diagnosis of individuals with cognitive impairment due to various etiologies. Here, we report a deep learning framework that accomplishes multiple diagnostic steps in successive fashion to identify persons with normal cognition (NC), mild cognitive impairment (MCI), AD, and non-AD dementias (nADD). We demonstrate a range of models capable of accepting flexible combinations of routinely collected clinical information, including demographics, medical history, neuropsychological testing, neuroimaging, and functional assessments. We then show that these frameworks compare favorably with the diagnostic accuracy of practicing neurologists and neuroradiologists. Lastly, we apply interpretability methods in computer vision to show that disease-specific patterns detected by our models track distinct patterns of degenerative changes throughout the brain and correspond closely with the presence of neuropathological lesions on autopsy. Our work demonstrates methodologies for validating computational predictions with established standards of medical diagnosis.

coronal and (c) sagittal sections for the COG and ADD tasks, respectively, after upsampling to the original MRI space. SHAP values for the COG task are illustrated in the left column and span the three possible diagnostic classes for this task. SHAP values for the ADD task are illustrated in the right column. On visual inspection, strongly positive SHAP values for dementia and AD diagnosis, respectively, appear to be concentrated within temporal/limbic structures and the parietal lobes. This is concordant with the known pathophysiology of AD.
We present the distribution of the regionally-averaged SHAP values from the subjects that were correctly predicted as AD or non-AD dementias. The x-axis lists the region name corresponding to each violin plot. The regions were defined based on the Hammersmith Adult brain atlas. Where the same structure appears in both the left and right hemispheres, we merged the two into one region; otherwise, the regions from the atlas were included directly in this plot. The order of the region names is the same as in Fig. 4c, which was originally determined by ranking the mean absolute SHAP values in descending order within each lobe. The regionally-averaged SHAP value was derived by overlaying the segmentation mask of a region on the 3D SHAP heatmap generated using the DeepSHAP method with the MRI model on the NACC testing set. Comparisons of the regionally-averaged SHAP distributions were made between AD and each non-AD dementia, including (a) frontotemporal dementia (top row), (b) Lewy body dementia (middle row), and (c) vascular dementia (bottom row).
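The regional averaging described in this caption can be illustrated with a minimal sketch. It assumes the SHAP heatmap and the atlas segmentation are already on the same voxel grid and that background voxels carry label 0. The function name `regional_mean_shap`, the `merge_map` argument, and the voxel-weighted handling of merged left/right regions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def regional_mean_shap(shap_map, atlas_labels, merge_map=None):
    """Average voxel-wise SHAP values inside each atlas region.

    shap_map     : 3D float array of SHAP values (hypothetical input).
    atlas_labels : 3D int array on the same grid; 0 is assumed to be background.
    merge_map    : optional dict mapping an atlas label to a merged region id,
                   e.g. left and right copies of a structure to one id.
    """
    if shap_map.shape != atlas_labels.shape:
        raise ValueError("SHAP map and atlas mask must share one voxel grid")
    merge_map = merge_map or {}
    sums, counts = {}, {}
    for label in np.unique(atlas_labels):
        if label == 0:
            continue  # skip background
        region = merge_map.get(int(label), int(label))
        voxels = shap_map[atlas_labels == label]
        # Assumption: merged regions are averaged over all their voxels.
        sums[region] = sums.get(region, 0.0) + float(voxels.sum())
        counts[region] = counts.get(region, 0) + int(voxels.size)
    return {region: sums[region] / counts[region] for region in sums}


# Toy 2x2x2 volume: labels 1 and 2 stand in for a left/right pair.
toy_shap = np.array([[[0.2, 0.4], [-0.1, 0.0]],
                     [[0.6, 0.8], [0.3, 0.5]]])
toy_atlas = np.array([[[1, 1], [0, 3]],
                      [[2, 2], [3, 3]]])
toy_means = regional_mean_shap(toy_shap, toy_atlas, merge_map={1: 1, 2: 1})
print(toy_means)
```

Collecting one such dictionary per correctly classified subject would give the per-region distributions that a violin plot summarizes.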

Supplementary Figure S8: Pairwise Tukey range test results for neuropathology.
To supplement the ANOVA analysis (Fig. 5c), we used the Tukey-Kramer test to assess mean pairwise differences in the CNN-derived DEMO scores on persons stratified by severity of neuropathologic findings. The figure shows the least square means, the matrix of adjusted p-values and the diffograms. The diffograms show multiple comparisons of the mean score level categories and indicate the score levels (0-3; ABC scores) that significantly differed in the DEMO score. The horizontal and vertical gray reference lines display the means of each score category, and the straight lines represent 95% confidence intervals. Blue lines represent pairs of means that are significantly different from each other at the p = 0.05 significance level. Red lines indicate pairs of means that are not significantly different from each other, i.e., those with confidence intervals that intersect the diagonal reference line. These results demonstrate an expected increase in the DEMO score with increased burden of neurological findings.

Supplementary Figure S10: MRI preprocessing and segmentation.
MRI scans from all datasets were preprocessed using a common pipeline implemented in FSL. Raw MRIs were first reoriented to a standard axis layout and then aligned to the MNI-152 template using a linear registration tool and an automatically identified region of interest. These aligned MRIs were then skull-stripped, and the resultant brains then underwent a second linear registration for fine-tuning of MNI alignments, as well as bias field correction for magnetic field inhomogeneities. Finally, specific brain regions were segmented by aligning the Hammersmith Adult brain atlas to registered brains using a nonlinear registration. All processed MRIs were inspected visually, and individual brain extraction parameters were adjusted as needed for cases with failed registration. All FSL commands for the above steps are listed within boxes in the accompanying figure.
Supplementary Figure S11: Non-imaging feature missing rate.
The proportion of missing data is shown for all non-imaging features across the eight cohorts. A value of 0.0 indicates that no data are missing, while a value of 1.0 indicates that all data for a particular feature were absent. We further stratify missingness by diagnostic label (NC, MCI, AD, and nADD) in order to demonstrate instances in which data availability and disease status may be correlated.

) correspond to metrics for a one-versus-rest classification task in which the goal was to individually delineate these three cognitive categories from all others within the overarching COG task. The "COG" column corresponds to the complete COG task of separating each NC/MCI/DE category (i.e., a 3-way classification). The "ADD" column corresponds to the task of classifying AD and nADD diagnosis given that a DE diagnosis has already been obtained from the COG task. Lastly, the "4-way" column corresponds to the complete classification workflow in which NC, MCI, AD, and nADD cases are delineated in a final four-way classification.

presented the network visualization of the inter-region correlation structure of the brain from the axial and sagittal views. Each node was defined to represent a particular region from the parcellated brain, and the edges between nodes represented the sign and degree of correlation between a pair of nodes. We parcellated the brain MRI into 95 regions using the segmentation mask provided by the Hammersmith Adult Brain Atlas. To project the 3D structure onto an axial and a sagittal plane, we redefined the nodes for the purpose of visualization. The "index" and "Adult brain atlas" columns show the complete set of 95 structures and their corresponding indexes from the Hammersmith Adult Brain Atlas. The "sagittal view" and "axial view" columns show how we merged and re-indexed regions; the node index number is the label given to each node in Figs. 4d & 4e, respectively. In the sagittal view, we focused on visualizing the correlation between the temporal lobe, frontal lobe, parietal lobe, occipital lobe, cerebellum and brainstem. Specifically, we merged the same structures from the left and right hemispheres into a single node in the sagittal projection, thus ending up with a total of 33 final nodes as defined in this table. In the axial view, we excluded some of the structures already shown in the sagittal view, for example, the insula, the third ventricle, etc. The focus of the axial view is to reveal the correlation between cerebrum structures from the left and right hemispheres. Our selection of the axial nodes yielded 57 regions.

We randomly sampled 100 subjects from the NACC dataset. For each of the selected subjects, we provided the MRI scan along with a set of non-imaging features, as specified in the supplementary material, to 17 neurologists, who reviewed the case and assigned one of the 4 possible categories, i.e., normal cognition (NC), mild cognitive impairment (MCI), Alzheimer's disease (AD), or non-AD dementia (nADD). To make a head-to-head comparison, we also tested our MRI model, non-imaging model and fusion model on the same 100 selected subjects. We reported performance metrics, including accuracy, F-1, sensitivity, specificity and Matthews correlation coefficient (MCC), for various tasks as indicated by column names, based on the predictions from (a) 17 neurologists, (b) the MRI model, (c) the non-imaging model and (d) the fusion model. The mean and standard deviation (std) in table (a) were calculated over all 17 neurologists, and the mean and std in the other tables were derived from 5-fold validation experiments. More specifically, the "COG" column represents the full classification of NC, MCI and DE cases. In addition, we reported the performance of binary classification of NC vs. non-NC ("COG NC" column), MCI vs. non-MCI ("COG MCI" column) and DE vs. non-DE ("COG DE" column). We also reported the model's performance in detecting AD among the demented subjects within the "ADD" column. Lastly, we reported the 4-way classification of NC, MCI, AD, nADD ("4-way" column).

We randomly sampled 50 subjects from the NACC dataset. For each of the selected subjects, we provided the MRI scan along with a set of non-imaging features, as specified in the supplementary material, to 7 neuroradiologists, who independently reviewed the case and assigned one of the 2 possible categories, i.e., Alzheimer's disease (AD) or non-AD dementia (nADD). We reported performance metrics, including accuracy, F-1, sensitivity, specificity and Matthews correlation coefficient (MCC), for this binary classification task by considering AD as positive samples.

• ADD task: Separation of persons with AD from those with nADD given an initial diagnosis of DE.

• 4-way task: Complete separation of NC, MCI, AD, and nADD cases. Accomplished by successive completion of the COG and the ADD tasks.
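
For any of the tasks above, the reported performance metrics (accuracy, F-1, sensitivity, specificity and MCC) can be computed in a one-versus-rest fashion from predicted and true labels. The following is a minimal sketch using only the standard library. The function name `one_vs_rest_metrics`, the toy labels, and the choice to return 0.0 when a denominator is zero are illustrative assumptions and are not taken from the authors' evaluation code.

```python
import math


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Binary metrics treating `positive` as the positive class and
    every other label as negative (one-versus-rest)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn

    def safe_div(num, den):
        # Assumption: report 0.0 when a metric is undefined.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, total),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Toy example (fabricated labels for illustration only).
truth = ["NC", "MCI", "AD", "nADD", "AD", "NC"]
preds = ["NC", "NC", "AD", "AD", "nADD", "NC"]
print(one_vs_rest_metrics(truth, preds, positive="AD"))
```

Calling the function with `positive="NC"`, `"MCI"` or `"DE"` on 3-way COG labels would give the corresponding one-versus-rest subtask metrics.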

Model-Derived Cognitive Metrics
• DEMO score: "DEmentia MOdel" score. A continuous measure for overall cognitive status ranging from 0 (NC) to 1 (MCI) to 2 (DE). DEMO score thresholding enables completion of the COG task and its subtasks.
• ALZ score: "ALZheimer's" score. A continuous measure from 0 (nADD) to 1 (AD) that corresponds with the probability that a person has Alzheimer's disease dementia. ALZ score thresholding enables completion of the ADD task.
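
The way score thresholding leads to the COG, ADD and 4-way labels can be sketched as follows. The cut-off values (0.5 and 1.5 for the DEMO score, 0.5 for the ALZ score) and all function names are hypothetical placeholders chosen for illustration; the text above does not state the thresholds actually used.

```python
def cog_label(demo_score, nc_mci_cut=0.5, mci_de_cut=1.5):
    """COG task: map a DEMO score on the 0 (NC) .. 1 (MCI) .. 2 (DE) scale
    to a class. Both cut-offs are hypothetical."""
    if demo_score < nc_mci_cut:
        return "NC"
    if demo_score < mci_de_cut:
        return "MCI"
    return "DE"


def add_label(alz_score, ad_cut=0.5):
    """ADD task: map an ALZ score in [0, 1] to AD or nADD.
    The cut-off is hypothetical."""
    return "AD" if alz_score >= ad_cut else "nADD"


def four_way_label(demo_score, alz_score):
    """4-way task: run the COG task first, then the ADD task only for
    cases labeled DE."""
    cog = cog_label(demo_score)
    if cog != "DE":
        return cog
    return add_label(alz_score)


# The ALZ score only matters once the DEMO score indicates dementia.
for demo, alz in [(0.2, 0.9), (1.0, 0.9), (1.8, 0.9), (1.8, 0.1)]:
    print(demo, alz, four_way_label(demo, alz))
```

Chaining the two decisions this way mirrors the successive completion of the COG and ADD tasks described for the 4-way task.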

Model Types:
• MRI-only model: A convolutional neural network (CNN) that uses MRI scans and no other information to complete the COG and the ADD tasks.
• Non-imaging model: A traditional machine learning classifier that uses demographics, past medical history, neuropsychological testing, and functional assessments to complete the COG and the ADD tasks.

Neurologist 1:
Demographic information was reviewed along with a summary of the subject's historical medical conditions, both active and inactive. Neuropsychological results featuring both raw and standardized scores (where available) were then reviewed, along with FAQ, GDS and NPI totals with sub-scores corresponding in time to the subject's cognitive assessments and stated age. Subjects with abnormal cognitive performance and elevated FAQ scores were potentially experiencing dementia. Functional ratings were considered in the context of potential progressive neurodegenerative disease as well as medical factors, substance abuse and overall burden of psychiatric symptoms. When specific cognitive domains appeared to be impaired out of proportion to others (such as episodic memory and low-frequency word-finding, etc.), the relative compatibility of the subject's cognitive profile with Alzheimer's disease was additionally considered. MRI was reviewed in the context of the above-described cognitive and functional impression. Particular attention was paid to the severity and distribution of any atrophy present, and to additional abnormalities that could alternatively explain or contribute to the subject's cognitive performance and functional status.

Neurologist 2:
The degree of cognitive impairment was first inferred from the raw cognitive assessment scores, as well as their associated Z-scores and percentiles. All subjects whose testing was within normal ranges received a label of NC. For all subjects, any cognitive domain score greater than 2 Z-scores below the mean or less than the 5th percentile was deemed to indicate cognitive impairment within that domain. If any cognitive impairment was present, the FAQ was used to distinguish between MCI and dementia: specifically, FAQ less than 5 qualified as MCI whereas FAQ greater than 5 was judged as dementia. For all cases of dementia, the subtype of dementia was inferred from the most prominent areas of brain atrophy, the most salient domains of cognitive impairment, evidence of vascular damage, and the size of the ventricles. Prominent features indicating AD included delayed memory impairment and the degree of medial temporal lobe atrophy and parietal atrophy. Prominent posterior parietal atrophy was also considered suggestive of AD.
Otherwise, notable indicators of non-AD dementias included prominent executive dysfunction, mental behavior abnormalities, language disorders, asymmetric temporal lobe atrophy, frontal lobe atrophy, prominent white matter hyperintensities, and additional ischemic lesions such as those of the thalamus and temporal lobe. There were several challenges worth noting. For instance, if FAQs are significantly reduced in cases of MCI, it is difficult to determine whether decreased FAQ is due to dementia or other physical, mental, or emotional factors. Broadly speaking, establishing a clear demarcation between MCI and dementia is challenging. Furthermore, when brain atrophy and cognitive decline are more global in nature, the task of subtyping the patient's dementia becomes more difficult.
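
The staging portion of this description (NC vs. MCI vs. dementia) reads as a simple decision rule, which the following minimal sketch illustrates. It covers only the stated cut-offs (a domain score more than 2 Z-scores below the mean or below the 5th percentile, and the FAQ threshold of 5). The function names, the input layout, and the "indeterminate" label for an FAQ of exactly 5 (a case the text does not specify) are assumptions. The sketch does not capture the clinical judgment described for dementia subtyping or for the challenging cases.

```python
def is_domain_impaired(z_score=None, percentile=None):
    """A domain counts as impaired if its score is more than 2 Z-scores
    below the mean or below the 5th percentile (either may be missing)."""
    if z_score is not None and z_score < -2.0:
        return True
    if percentile is not None and percentile < 5.0:
        return True
    return False


def stage_from_scores(domain_scores, faq_total):
    """Rule-of-thumb staging following the description above.

    domain_scores : dict mapping a domain name to a (z_score, percentile)
                    tuple; hypothetical input layout.
    faq_total     : total FAQ score.
    """
    impaired = [name for name, (z, pct) in domain_scores.items()
                if is_domain_impaired(z, pct)]
    if not impaired:
        return "NC"
    if faq_total < 5:
        return "MCI"
    if faq_total > 5:
        return "DE"
    # FAQ == 5 is not covered by the stated rule; flag it rather than guess.
    return "indeterminate"


example = {"delayed memory": (-2.4, 1.0), "language": (-0.3, 38.0)}
print(stage_from_scores(example, faq_total=2))
print(stage_from_scores(example, faq_total=12))
```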

Neurologist 3:
The participant's age, background history including family history, co-morbidities and cognitive assessments including MMSE were reviewed. This was followed by their functional assessment, GDS, and neuropsychiatric inventory. The degree and location of atrophy on MRI was then reviewed. If the participant had a normal cognitive and functional assessment with normal MRI, regardless of age, then this was labeled as normal. If the participant had a normal cognitive and functional assessment but was above 80 years with some mild generalized atrophy on MRI, then this was labeled as normal. If the patient had mildly impaired cognitive assessment with severe depression, and normal MRI, then this was also labeled normal. If the participant had mildly impaired cognitive assessment but no functional limitations with normal or mild atrophy on MRI, then this was labeled as mild cognitive impairment, the exception being if they had atrophy on MRI and were young (i.e., in their 50s). If the participant was noted to have impaired cognitive assessment with impaired functional assessment, with medial temporal or hippocampal atrophy out of proportion to the rest of the brain, then this was labeled AD. If the participant had impaired cognitive and functional assessment with significant vascular risk factors and small vessel disease on MRI, prominent frontal atrophy, or prominent neuropsychiatric inventory findings including delusions and hallucinations, this was labeled as non-ADD.

Neurologist 4:
The approach to these cases was akin to that which is taken with a patient presenting to the clinic. Though a personal history and physical evaluation were not available, the case sheets did provide most of the information necessary for the evaluation of a patient with underlying cognitive disorder. FAQ and NPI first gauged the severity of cognitive deficits and associated symptoms. Any patient who indicated dependency on others for functional activities solely due to cognitive difficulty was considered to have dementia. If the patient had mild cognitive symptoms, then they were diagnosed with MCI. If no symptoms were present and the patient was completely independent, then cognition was deemed to be age appropriate. Next, the GDS was used to assess for the presence of geriatric depression, which is a well-known confounder for cognitive disorder. In the case of depression, a patient's cognitive difficulty cannot be clearly resolved as occurring primarily or secondarily to another disease. After this, a full review of the patient's active medical problems was performed to assess alternative etiologies of cognitive decline such as stroke, TBI, seizures, thyroid disease, vitamin B12 deficiency, and substance abuse. If no such conditions were found, the cognitive disorder was considered to likely be secondary to Alzheimer's disease. Additional patient history, including age, gender, race, and family history, was also reviewed for its diagnostic potential. Cognitive assessments were reviewed for patterns of deficiencies using both their z-score and percentile ratings. Significant abnormalities of memory testing, in both immediate and delayed recall, would be highly suggestive of Alzheimer's disease in a symptomatic patient. Preserved memory function with either normal or abnormal language, processing, attention, and executive domains would be unusual for Alzheimer's disease. Lastly, the patient's brain MRI was reviewed for two reasons: 1) to rule out structural causes of cognitive decline such as stroke, and 2) to look for classic atrophy patterns which may be suggestive of AD vs. non-ADD.

Neurologist 5:
The patient's clinical data was first reviewed. Extremes in education (e.g., less than completion of 9th grade) or age (e.g., >90 years old) were particularly notable. Thereafter, cognitive test scores were examined and double-checked to ensure that any normal scores were not accompanied by significant impairment in function; specifically, no FAQ scores >6. For any given patient, if cognition was normal and function intact, then the diagnosis was NC. Similarly, if function was intact per FAQ testing, but there were mild declines in cognitive test scores, then a diagnosis of MCI was made. Finally, if cognitive test scores were significantly low (greater than two Z-scores below the mean in at least 2 domains) and functional impairment was present, then the patient was judged to have dementia. For those with dementia, the presence of a recent stroke or TBI was assessed. The scoring pattern on cognitive testing was also examined to judge its consistency with memory impairment. If there was evidence of dementia but no significant memory changes, then the other domains involved were considered, and a review of the NPI-Q for signs of hallucinations and disinhibition was completed. After checking this additional data, the brain MRI of all patients with dementia was reviewed. If imaging suggested an AD pattern of atrophy, then the diagnosis was AD. If the pattern was not suggestive of Alzheimer's, then the diagnosis was non-ADD. Naturally, there were certain exceptions to the above-outlined approach. Some patients demonstrated clear declines in cognition despite normal function per FAQ testing. Conversely, some patients exhibited only mild cognitive changes despite significantly elevated FAQ scores. For such patients with isolated functional impairment, additional efforts were made to look for signs of neuropsychiatric symptoms on the NPI or GDS, as these could feasibly explain functional impairment.
Thus, patients with isolated functional impairment and high GDS or NPI scores were classified as MCI as opposed to dementia. Of note, for those with at least 9 years of education and an MMSE < 24, a low FAQ score was typically ignored, and dementia was diagnosed. For patients with particularly low levels of education (e.g., 3-5 years), functional testing was given relatively greater weight than cognitive testing. Various caveats should be considered. To begin, many of the subjects classified as MCI likely have prodromal AD pathology, and thus their brain MRI may have reflected an AD pattern of atrophy. Given that this project focused on AD at the stage of full clinical dementia, the brain MRI of those with an MCI diagnosis was not reviewed. MCI was thus taken to be a purely clinical diagnosis. Future studies may consider including MCI subjects, sub-classifying them as amnestic vs. non-amnestic, and then having clinicians review brain MRIs to see whether they believe the subject may have prodromal AD. Additionally, it is quite common to have mixed pathology in the setting of dementia. For instance, recent studies suggest that approximately 50% of patients meeting criteria for Dementia with Lewy Bodies also have some AD pathology, and that pure Lewy Body disease was seen only in 39% of patients. Similarly, the co-occurrence of AD and vascular pathology is quite common. Future studies may want to include the possibility of classifying the subject as having co-occurring pathologies in the setting of AD.

Neurologist 6:
When approaching each case, demographic information, including age and education, was reviewed. Medical history and co-morbidities were also reviewed. The provided neuropsychological and psychiatric data were then reviewed along with the ADL scores. Significant functional impairment such as decreased ability to perform ADLs was useful in diagnosing dementia as well as distinguishing MCI from dementia. Neuropsychological scores were then reviewed, as well as the FAQ, GDS, and NPI. Raw scores, standardized scores, and sub-scores were reviewed for each patient. Performance on the different neuropsychological test batteries was also given significant consideration. For example, profound executive dysfunction with relatively preserved memory function and lack of amnestic features would be more suggestive of Lewy Body Dementia or other non-AD dementia, whereas predominantly amnestic features would be suggestive of AD. Patients with impaired performance on neuropsychological assessments were categorized as experiencing dementia versus MCI. As previously noted, ADL scores could be helpful in determining the degree of cognitive impairment. Imaging was then reviewed. Patterns of atrophy were noted to assess for neurodegenerative disease type and disease burden. For example, parietal atrophy was considered more suggestive of AD, frontal-temporal atrophy more suggestive of frontotemporal dementia, and subcortical atrophy suggestive of vascular dementia. The degree of volume loss/atrophy was also considered, with higher degrees of atrophy potentially correlating with greater clinical findings suggestive of cognitive impairment. The final diagnosis considered neuropsychological scores, imaging findings, and clinical history to assess if other medical comorbidities were contributing to cognitive impairment. There were some difficulties in making the diagnosis due to limitations in the presented data.
It is difficult to determine from the provided information if ADL limitations occur secondarily to motor or cognitive impairment. Particularly in the case of non-ADDs such as Parkinson's disease, early limitations in ADLs may be due to motor impairment, whereas those who have progressed to Parkinson's disease dementia may score low on ADL assessments due to cognitive or executive dysfunction. Patients with good cognitive reserve may also perform average on neuropsychological testing despite significant executive impairment, and this may make pairing ADL assessment with neuropsychological assessment inadequate when determining dementia diagnoses. It is also important to note that there is often overlap of dementia pathologies. Notably, multiple dementias may account for difficult clinical presentations. Also, pathologic changes suggestive of multiple comorbid conditions are frequently noted on autopsy, as in the case of a patient with mixed PD/AD who is found to have both alpha-synucleinopathy and beta amyloid plaques.

Neurologist 7:
The primary driver in deciding the ratings was the cognitive test scores, which were first reviewed. If all the scores were average or above average (based on age and education adjusted z-scores), then the diagnosis was assigned as normal. If any test scores were below average or impaired, then the functional assessment scores were reviewed to determine if there was sufficient evidence for a functional impairment due to cognitive impairment, which would indicate dementia. If there was no functional impairment, then the diagnosis was assigned as MCI. If there was functional impairment that seemed reasonable in relation to the cognitive test impairments, then the diagnosis was dementia. For all demented cases, other information was used to determine AD vs non-ADD including pattern of cognitive impairment, type of functional impairments, NPI results, and MRI evidence of hippocampal atrophy out of proportion to global atrophy. Difficulties in rating occurred when there were average or above average cognitive test scores, indicating normal cognitive performance, but abnormalities on the functional rating or the NPI. Other difficulties arose when the cognitive test scores were only below average on one test, but not impaired, or when there was a mismatch between the type of cognitive test impairment and the type of functional impairment. Finally, in severely demented patients when all test scores were severely impaired and there was severe functional impairment, distinguishing between AD vs non-AD was more difficult.

Neurologist 8:
A review of the subject's demographic, clinical and cognitive measures followed by MRI was performed. The subject's education, age and MMSE were particularly considered when making the diagnosis. If MMSE was below 25, then these subjects were generally diagnosed with dementia unless there were extenuating circumstances such as old age or poor level of education. Cognitive testing with poor recall, language involvement, slow processing speed and significantly low values on functional status were more indicative of MCI or AD. MRI was particularly useful in subjects whose depression and neuropsychiatric scores were elevated. If the MRI was normal and the subjects had depression and clinical testing was mildly affected, then the poor clinical scores were attributed to depression. If the subjects had mildly abnormal scores with some functional impairment and the MRI showed frontotemporal atrophy, then they were diagnosed with MCI. However, when the subjects were depressed and had poor scores on language and memory, then they were diagnosed with MCI. If the subject had disproportionately more language and behavior issues along with MMSE lower than 25, then the subjects were diagnosed with non-ADD. In some of these cases, MRI was confirmatory in providing evidence of frontotemporal atrophy. Also, if subjects had slow processing speed and impaired executive function with decreased fluency, hallucinations and delusions, then they were diagnosed with non-AD. MRI was not necessarily indicative of diagnosis in these cases because most of them had generalized global atrophy, and posterior atrophy in some cases. If the subjects had MMSE below 25, and vascular risk factors with mild white matter disease on the MRI, then they were also diagnosed with non-ADD. All other cases with MMSE values lower than 25 were diagnosed with AD. In general, MCI diagnosis was not straightforward in depressed subjects or subjects with low education level.

Neurologist 9:
Initially, a demographic overview of the patient, including age, past medical history and MMSE was conducted. In the case of a normal MMSE, the likely diagnosis is NC. In that case, each MRI was reviewed to ensure no abnormalities. In the event of an MMSE between 27-30, MCI is considered if there is no significant other disease and the patient's MRI is unremarkable. For MMSE between 24-27, additional consideration was given to the total number of years of education and overall psychiatric disease burden, both of which can affect MMSE. As discussed, it was also assessed whether any MRI changes could be explained by prior medical history. In sum, a decision on the nature of the dementing process was based on a review of medical history, with particular attention paid to depression. Specific psychiatric symptoms such as severe hallucinations support a diagnosis other than Alzheimer's disease. The Boston Naming test and various memory assessments were also used in order to determine the particular subtype of dementia. In the case of dementia, consideration was given to imaging features which might explain cognitive impairment. If the dementia was explained by a finding on MRI consistent with a non-Alzheimer's process, then a non-ADD diagnosis was chosen. If there is no alternative explanation clinically and the MRI was supportive of AD, then the diagnosis was Alzheimer's disease.

Neurologist 10:
Clinical history was reviewed to determine if the subject met the criteria for cognitive impairment. Specifically, MMSE, FAQ and NPI scores were reviewed along with any evidence of psychiatric disease.
If the MMSE scores were between 24-26, then other neuropsychological test scores were reviewed. If the Boston Naming test scores were between 14 and 20, then evidence of psychiatric risk factors that are typically related to depression were carefully reviewed. If clinical criteria for mild cognitive impairment or dementia were noted, MRI was reviewed to identify patterns of atrophy. If atrophy was observed predominantly in the temporal and parietal lobes, then the diagnosis was deemed as AD. When global atrophy was present, and an unclear temporo-parietal predominance was seen, clinical history was used to guide the ultimate diagnosis.

Neurologist 11:
The subject's age along with MMSE were first reviewed as a quick screen of global cognition. Subsequently, FAQ scores were reviewed. If MMSE scores were low and the FAQ scores were above six, then the subject was considered as potentially having dementia rather than just MCI. Neuropsychological testing scores were then reviewed. Logical memory immediate and delayed recall scores were reviewed to see if the subject's scores were very impaired. If the scores remained low (e.g., generally in single digits), this indicated verbal learning or memory deficit. Scores from other cognitive domains were also reviewed to observe if there was a similar deficit or if these were relatively preserved. If verbal learning/memory were impaired significantly out of proportion to the other domains, then the diagnosis was assigned as AD. If other domains were significantly affected, and if the subject had many neuropsychiatric symptoms, then the diagnosis was assigned as non-ADD. MRI scans were also reviewed for most subjects when cognitive impairment was observed from the non-imaging data. Hippocampal atrophy observed on MRI reinforced the prior determinations that were based on cognitive scores. Challenges included not having information on the duration of symptoms or the approximate onset of symptoms (years prior to the assessments provided to the raters) and lack of visuospatial function test data.

Neurologist 12:
The images for each patient were viewed before reviewing their charts. This was done to prevent knowledge of the clinical history from confounding the image review. Examination of the MRI images in all 3 planes was performed to assess for evidence of focal or diffuse atrophy suggestive of a particular dementia diagnosis. Images were also reviewed for evidence of other pathological findings such as prior ischemic strokes, vascular disease, hydrocephalus, TBI, or other conditions. Review of MRIs was conducted without any volumetric analysis; volume loss was estimated visually while accounting for age. Afterwards, the patient's chart was reviewed to assess cognitive status. Attention was paid to the patient's cognitive function and, where abnormal, an attempt was made to determine whether cognitive dysfunction was global or limited only to certain domains. Imaging and cognitive study findings were correlated to reach the final diagnosis. If MRI and cognitive studies were normal, the diagnosis was normal. If the MRI showed signs of mild atrophy (diffuse or focal) or the cognitive studies showed mild cognitive decline, the diagnosis was MCI. If the MRI showed prominent diffuse atrophy or cognitive studies showed signs of decline across multiple domains, the diagnosis was AD. If the MRI showed focal atrophy or signs of atrophy consistent with ischemic strokes or frontotemporal dementia along with cognitive studies showing decline in limited cognitive domains only, the diagnosis was non-AD. Some cases had mixed features of AD and non-ADD and were classified as AD. Some cases with MCI had mild cognitive deficits consistent with early AD but were classified as MCI because both the MRI atrophy and the cognitive deficits were mild.

Neurologist 13:
The clinical information was reviewed to find evidence of vascular risk factors which may contribute to non-ADD. Cognitive test results such as MMSE were used for initial screening, followed by review of performance in the language, memory, and executive function domains. When scores were consistently lower in these domains, AD was deemed the diagnosis. In cases with near normal MMSE scores and lower scores in other domains, the diagnosis was not trivial and was most often thought to be an MCI variant. Evidence of active psychiatric disorders and normal aging were also considered when diagnosing subjects with MCI. Finally, MRIs were helpful to confirm AD. However, in some cases, if the temporal lobe abnormalities were not present on MRI but other tests (e.g., neuropsychiatric inventory and ADLs) were supportive of dementia, AD was still deemed the primary diagnosis.

Neurologist 14:
Subjects were diagnosed with normal cognition if they had high scores on their MMSE (29 or 30) and were within one standard deviation (z-score) of the mean. Functionally, they had to be able to perform their ADLs and IADLs independently. Occasionally, this could be deceiving (e.g., a patient who was unable to perform ADLs but had very high scores). These subjects were still classified as normal with the assumption that they might be unable to complete their ADLs because of other non-cognitive limitations (e.g., poor vision). Often, the MRI data did not significantly influence the classification of normal subjects because the clinical picture was reassuring. Subjects were classified as having MCI if they scored in the 24-28 range on their MMSE. Years of education were considered, and the z-scores were in the range of 1-2. Mild atrophy or evidence of white matter disease on the MRI were also observed in the cases diagnosed with MCI. Subjects were classified as AD if they had low MMSE scores and results from the other cognitive tests were consistently poor. If the subjects were cognitively impaired without many vascular risk factors or other signs (e.g., urinary incontinence), then they were deemed to have AD. Likewise, subjects with multiple high risk vascular features (e.g., atrial fibrillation, hypertension) and evidence of small vessel disease on MRI were classified as non-ADD because vascular dementia was more likely. Evidence of delusions or hallucinations led to assigning a non-ADD diagnosis to the subjects.
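
The MMSE ranges mentioned in this account can be written down as a small lookup. The following is only an illustrative sketch: the function name `mmse_screen_range` and the returned labels are hypothetical, and the account gives no numeric cutoff for "low" MMSE scores, so scores under 24 are simply labeled as falling below the stated MCI range. The rater also weighed z-scores, functional status, education, and MRI, none of which are modeled here.

```python
def mmse_screen_range(mmse: int) -> str:
    """Map an MMSE total (0-30) to the score range named in the account.

    Labels are hypothetical and describe a score range, not a diagnosis.
    """
    if not 0 <= mmse <= 30:
        raise ValueError("MMSE total must be between 0 and 30")
    if mmse >= 29:
        return "NC range"          # 29 or 30, as stated in the account
    if mmse >= 24:
        return "MCI range"         # 24-28, as stated in the account
    # Assumption: the account only says "low"; no cutoff is given.
    return "below MCI range"
```

For example, `mmse_screen_range(26)` returns `"MCI range"`.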

Neurologist 15:
The diagnoses of normal cognition, MCI, and dementia were initially considered primarily based on the existence of objective NP impairment. If at least one NP test result was below 1.5 SD or if two NP tests were below 1 SD, then these subjects were considered to have some level of cognitive impairment. Functional independence was considered when no FAQ items were scored greater than 1. Additional information was then reviewed, including the MRI scans, to determine the specific diagnosis. For AD, impaired delayed recall and medial temporal lobe atrophy provided supporting evidence. Some demented cases that did not have the above typical AD characteristics but had atypical NP performance (e.g., more prominent non-memory impairment), parietal and/or occipital lobe atrophy, and no other contradictory evidence were still diagnosed as AD (atypical type). Non-ADD was diagnosed if the subjects showed atypical NP performance, asymmetrical frontal or temporal atrophy on MRI that indicated FTD and related subtypes, cerebrovascular disease, hydrocephalus, etc. In rare cases, impaired FAQ was not considered to be associated with cognitive decline, e.g., due to severe depression or psychiatric disorders. These cases were not assigned a dementia diagnosis. In other rare cases, subjects whose FAQ was normal but who had severe language problems and prominent temporal lobe atrophy were diagnosed as non-ADD, as they may have had primary progressive aphasia. The challenges during diagnosis included (1) lack of medical history (onset characteristics and course of disease) and data for neurological signs; (2) no NP data for visuospatial performance; and (3) no unanimously accepted operational definition for objective NP impairment and functional independence.
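
The two screening rules stated in this account (objective NP impairment and functional independence) can be sketched as follows. This is a minimal illustration, not the rater's actual procedure. It assumes NP results are expressed as z-scores and reads "below 1.5 SD" as a z-score less than -1.5; the function names are hypothetical.

```python
from typing import Sequence


def has_objective_np_impairment(z_scores: Sequence[float]) -> bool:
    """True if at least one NP z-score is below -1.5, or at least two are below -1.0.

    Assumption: "below 1.5 SD" means more than 1.5 SD under the normative mean.
    """
    n_below_1_5 = sum(z < -1.5 for z in z_scores)
    n_below_1_0 = sum(z < -1.0 for z in z_scores)
    return n_below_1_5 >= 1 or n_below_1_0 >= 2


def is_functionally_independent(faq_items: Sequence[int]) -> bool:
    """True if no FAQ item is scored greater than 1, as stated in the account."""
    return all(item <= 1 for item in faq_items)
```

Under these assumptions, a subject with z-scores of -1.2 and -1.1 on two tests would be flagged as impaired, whereas a single score of -1.2 would not.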

Neurologist 16:
The subject's history including age, gender, education level, past disease history, family history and other medical information was reviewed. This was followed by review of the cognitive assessments, FAQ, GDS and NPI-Q to determine if the subject had MCI or dementia. For extremely poor scores, dementia was deemed as the diagnosis. The AD cases were differentiated from non-ADD cases based on the nature of cognitive impairment and the condition of the combined neuropsychiatric symptoms. Additionally, if the MRI indicated an atrophy pattern within the medial temporal or parietal lobes, the subject was diagnosed with AD. If there was asymmetric frontotemporal lobe atrophy or significant cerebrovascular disease, then non-ADD was considered as the diagnosis.

Neurologist 17:
The subject was considered to have dementia if there were significant deficits on cognitive testing and functional impairments in more than one domain. Based on the age, the overall brain volume was examined on the MRI, followed by review of hippocampus/medial temporal lobe volume. The imaging was then assessed for WMD and other abnormalities. AD was selected as the diagnosis if there was a consistent pattern of atrophy and the history was negative for prominent delusions/hallucinations. Non-ADD was chosen as the diagnosis if there was a high burden of WMD and/or prominent delusions/hallucinations. If there was WMD and disproportionate hippocampal/medial atrophy, then cognitive test results were used to determine if the pattern correlated with AD vs non-ADD; AD was selected as the diagnosis if there was preserved immediate recall/registration and poor delayed recall, and non-ADD was selected if the registration was very poor. The subject was considered to have MCI if there were deficits on cognitive testing and functional impairment in one domain. If severe abnormalities were identified on imaging in terms of WMD or atrophy pattern, then the diagnosis was updated to be either AD or non-ADD based on prior descriptors. The subject was considered to be NC if cognitive testing showed results within the normal ranges and the subject was independent in all functional domains. If severe abnormalities were identified on imaging in terms of WMD or atrophy pattern, and if the cognitive testing results were lower than expected for the subject's level of education, then the diagnosis was updated to MCI.

Summary of the neurologist approach to the ratings
Collectively, these perspectives speak to the importance of an integrated approach to dementia diagnosis in which distinct modes of data are reconciled prior to an ultimate classification of disease status. Assessment of demographics and medical history was commonly employed to rule out confounding conditions leading to cognitive decline, or to suggest specific symptomologies consistent with dementia subtypes. Relatedly, neurocognitive and functional testing allowed clinicians not only to triage subjects by their degree of impairment, but also to delve into domain-specific declines in cognition that are important in differentiating various forms of dementia.
Most commonly, MRI was utilized as a confirmatory test of the initial clinical impression. Spatial patterns of atrophy were highly informative during neuroimaging reviews, with disproportionate volume loss in the medial temporal and parietal lobes often cited as suggesting AD. Asymmetric atrophy outside of these regions, as well as ischemic lesions and excessive ventricular enlargement, were typically judged to represent various non-ADDs.

V. Neuroradiologist Accounts of Diagnostic Approach

Neuroradiologist 1:
MRIs were initially screened to identify subjective evidence of any abnormality. Given that the task at hand was to discriminate between AD and non-ADD, careful attention was paid to potential patterns of volume loss, particularly within the bilateral hippocampi. If there was relatively symmetric severe hippocampal volume loss bilaterally, AD was typically chosen. Non-ADD was typically diagnosed if the hippocampi were relatively spared and any of the following conditions were met: 1) notable asymmetry to volume loss (this is often observed in frontotemporal dementia or corticobasal degeneration); 2) disproportionate volume loss involving structures outside of the temporal and parietal lobes (e.g., frontal lobe, occipital lobe, brainstem, or cerebellum); 3) ventricle enlargement clearly discordant with the degree of cerebral volume loss (thus indicating possible normal pressure hydrocephalus). Among the challenges encountered in this study was the difficulty of providing a diagnosis without having a consensus approach between radiologists. In addition, limiting reads to T1-weighted pre-contrast sequences somewhat limited the assessment of vascular dementia. Finally, some sequences were not uniformly aligned and required multiplanar reformats within Slicer to achieve uniform alignment.

Neuroradiologist 2:
A quick scan of the entire MRI was performed to check for motion or any other quality issues, and for the presence of infarcts and extensive small vessel disease to suggest vascular dementia, followed by an initial global assessment of atrophy. The approach started with observing the ventricles and the corpus callosum. It was relatively easier to assess sulcal enlargement than parenchymal atrophy. An assessment was also made for the presence of disproportionate ventriculomegaly to suggest normal pressure hydrocephalus. It was challenging to grade the amygdala, hippocampus, and parahippocampus separately, and they were largely graded identically. Grading for parietal lobe atrophy was difficult because the parietal lobes, especially the postcentral gyri and superior parietal lobules, often look smaller than the rest of the brain in older subjects. Therefore, unless severe, parietal lobe atrophy was downgraded by 1 level, i.e., an appearance that would be graded moderate in the frontal lobe would be graded mild in the parietal lobes, etc. AD dementia was called when there was predominant parietal and medial temporal atrophy. If there was severe frontal or occipital atrophy or global severe temporal atrophy, then those cases were diagnosed with non-AD dementia even if there was parietal and medial temporal atrophy. It is recognized that this method would likely miss the posterior cortical variant of AD.
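
The parietal downgrade rule described in this account can be illustrated with a short sketch. The four-level ordinal scale (`none`, `mild`, `moderate`, `severe`) and the function name are assumptions made for illustration; the account itself only states that parietal atrophy was downgraded by one level unless severe.

```python
# Assumed ordinal scale; the "none" level is hypothetical.
GRADES = ("none", "mild", "moderate", "severe")


def adjust_parietal_grade(apparent_grade: str) -> str:
    """Downgrade an apparent parietal atrophy grade by one level unless it is severe."""
    if apparent_grade not in GRADES:
        raise ValueError(f"unknown grade: {apparent_grade!r}")
    if apparent_grade == "severe":
        return apparent_grade  # severe findings are not downgraded
    # Move one step down the scale, never going below the lowest level.
    idx = max(GRADES.index(apparent_grade) - 1, 0)
    return GRADES[idx]
```

With this scale, an appearance that would be called moderate elsewhere maps to mild for the parietal lobes, matching the example given in the account.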

Neuroradiologist 3:
All MRI scans were reviewed to determine whether atrophy involved the hippocampus/medial temporal lobe, entorhinal cortex, and parietal lobes. If atrophy was present in these areas, the diagnosis was assigned as AD. Alternatively, non-ADD was diagnosed if extensive small vessel changes (suggesting vascular disease) were present, or if significant atrophy was observed in the frontal lobe or diffusely throughout the brain. The major challenges encountered included image quality or completeness. For instance, certain scans featured significant patient motion, or were otherwise missing certain sequences. Also, numerous cases had imaging features that overlapped between AD and non-ADD, thus making these difficult to definitively identify. For these patients, judgement between the two types of dementia was very subjective.

Neuroradiologist 4:
The medial temporal lobe on the coronal reformats was first reviewed for atrophy disproportionate to other brain regions. If present, then the axial, coronal, and sagittal reformats were reviewed to see if the atrophy also involved the anterior temporal lobes. In most instances, there was correlation between the degree of medial and anterior temporal lobe atrophy. Subsequently, axial reformats were reviewed to see if there was parietal lobe atrophy, the axial and coronal reformats to see if there was frontal lobe atrophy, and the coronal reformats to see if there was occipital lobe atrophy. When assessing for ventricular enlargement, the size of the ventricles was always compared to the sulci, looking for disproportionate ventricular enlargement. Corpus callosum atrophy was evaluated using the sagittal reformats. AD was selected as the diagnosis for cases with both parietal and temporal lobe atrophy or only parietal lobe atrophy. Non-ADD was selected as the diagnosis for the other cases. "Mild" was chosen for subtle findings, "severe" was chosen for extreme findings such as "knife-like" gyri, and "moderate" was chosen for those findings that fell in between. Difficulties encountered included identifying regional predominant volume loss in those patients with superimposed generalized atrophy, segmenting the temporal and frontal lobes, and downloading/uploading individual datasets into 3D Slicer.

Neuroradiologist 5:
The overall approach was to initially review the MRI scans to exclude the presence of multiple infarcts and observe easily identifiable atrophy patterns in the entire brain. For example, initial assessment focused on whether frontal and anterior temporal versus parietal and medial temporal volume loss was dominant and easily identifiable. The second stage of assessment involved a more detailed sub-analysis of each region and grading of severity. AD diagnosis was assigned when atrophy was observed predominantly within the parietal and medial temporal lobes or when the frontal lobe involvement was less than or commensurate with parietal and temporal lobes. Diagnosis of non-AD dementia was assigned in any pattern differing from this including frontal, anterior temporal, or occipital predominant involvement as well as enlarged ventricles or multiple infarcts. The size of the ventricles with respect to that of the sulci was compared when assessing ventricular size.

Neuroradiologist 6:
Regional atrophy was assessed by observing the size of the gyri and the degree of sulcal widening in different brain regions using the axial, sagittal, and coronal planes. Ventricular size was also assessed. Cases with disproportionate atrophy of the medial temporal lobes/hippocampus, in the absence of severe anterior temporal atrophy, were classified as AD. Cases with disproportionate parietal atrophy (with or without medial temporal atrophy) in patients younger than 65 years of age were also classified as early onset AD. Patterns consistent with posterior cortical atrophy/visual variant of Alzheimer's disease were also investigated by looking for parietal +/- parieto-occipital and posterior temporal lobe predominant atrophy. All other patterns of atrophy were classified as non-ADD, such as disproportionately severe anterior temporal atrophy. Cases with no-to-minimal atrophy and cases without clear regional predominance were challenging to classify. Cases with medial temporal atrophy along with some degree of atrophy elsewhere in the temporal and/or frontal lobes were also a challenge.

Neuroradiologist 7:
The hippocampus and medial temporal lobes were first evaluated. Specific attention was paid to volume loss in these structures and additional lobes of the brain, and to whether changes in the bilateral cerebral hemispheres were symmetrical. If the volume in each brain lobe decreased proportionally, with dominant volume reduction in the hippocampus and/or parietal lobes, then the subject was diagnosed with AD. If the brain lobes decreased in volume disproportionately, specifically within the temporal pole, frontal, and occipital lobes, or if bilateral asymmetry was obvious, then the subject was diagnosed with non-ADD. Ischemic changes were often observed. If cerebrovascular disease was obvious during the evaluation process, then vascular dementia was considered a possibility. In such cases, the lack of obvious volume loss or the presence of proportional volume reduction in each lobe led to the diagnosis of vascular dementia (i.e., non-ADD). However, if ischemic changes were noted in combination with prominent volume loss within the medial temporal or parietal lobes, a mixed AD/vascular dementia pathology was favored, and the subject was classified as AD overall.

Summary of the neuroradiologist approach to the ratings
The neuroradiologists' approach to dementia classification was predominantly informed by the distribution of atrophic changes relative to global brain atrophy. As expected, disproportionate volume loss within temporal lobe structures and the parietal lobe was judged to be consistent with AD, whereas such changes outside of these regions were generally deemed to represent non-ADD. Additional features commonly suggesting non-ADD included the presence of past infarcts as well as excessive ventricular enlargement.

VI. Data to Clinicians and Diagnostic Criteria
Section I: Data provided to the neurologists
In our expert-level validation, we gave our panel of neurologists a random subset of 100 NACC participants. We aimed to simulate the full span of assessment material available to a practicing physician. An example of a case presentation to a neurologist, including all the clinical non-imaging data features, is provided below.
Furthermore, we provided a description of the clinical data fields to the neurologists participating in our clinician vs. model comparison. The document below is what was provided to neurologists as an instructional form with further information on each of the clinical non-imaging data features.
After the neurologists reviewed the cases and the clinical data field descriptions, they were asked to give a diagnosis label for each case (either normal cognition, mild cognitive impairment, Alzheimer's disease dementia, or non-Alzheimer's disease dementia).

[...] talking to strangers as if he/she knows them, or saying things that may hurt people's feelings?
9. Irritability/lability: Is the patient impatient and cranky? Does he/she have difficulty coping with delays or waiting for planned activities?
10. Motor disturbance: Does the patient engage in repetitive activities such as pacing around the house, handling buttons, wrapping string, or doing other things repeatedly?
11. Nighttime behaviors: Does the patient awaken you during the night, rise too early in the morning, or take excessive naps during the day?
12. Appetite/eating: Has the patient lost or gained weight, or had a change in the type of food he/she likes?
• Scoring: 0 (No), 1 (Yes, mild), 2 (Yes, moderate), 3 (Yes, severe)
  ○ Total score out of 36
  ○ Yes: symptoms present in last month
  ○ No: otherwise
  ○ Mild: noticeable, but not a significant change
  ○ Moderate: significant, but not a dramatic change
  ○ Severe: very marked or prominent, a dramatic change
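
The scoring scheme above (12 items, each scored 0 to 3, for a total out of 36) can be sketched as a simple sum with input checks. This is an illustrative sketch only; the function name `npiq_total` and the constant name are hypothetical and do not correspond to any tool used in the study.

```python
from typing import Sequence

NPIQ_ITEM_COUNT = 12      # items are numbered up to 12 in the form
NPIQ_MAX_ITEM_SCORE = 3   # 0 (No), 1 (mild), 2 (moderate), 3 (severe)


def npiq_total(item_scores: Sequence[int]) -> int:
    """Sum the 12 item scores; the maximum possible total is 12 * 3 = 36."""
    if len(item_scores) != NPIQ_ITEM_COUNT:
        raise ValueError(f"expected {NPIQ_ITEM_COUNT} item scores")
    if any(s not in range(NPIQ_MAX_ITEM_SCORE + 1) for s in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(item_scores)
```

A subject with every symptom rated severe would receive the maximum total of 36, and a subject with no symptoms in the last month would receive 0.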

Section II: Data provided to the neuroradiologists
For our expert-level validation, we gave neuroradiologists 50 randomly sampled cases from the NACC dataset. Below, we provide a template questionnaire that was given to neuroradiologists. All neuroradiologists were provided with 1.5T T1-weighted MRIs for each subject, along with age and gender. The neuroradiologists filled out one online questionnaire for each case they reviewed and were asked to give a dementia diagnosis of either Alzheimer's disease dementia or non-Alzheimer's disease dementia based on their impression.