## Introduction

Alzheimer’s disease (AD) is characterized by the presence of insoluble fibrillar deposits of amyloid-beta in plaque structures and tau protein in neurofibrillary tangles. The development of positron emission tomography (PET) tracers [1], and cerebrospinal fluid (CSF) and blood assays [2, 3], for measuring in vivo levels of amyloid-beta pathology reconceptualized the diagnosis of AD by allowing, for the first time, diagnosis in vivo in patients with objective cognitive impairment and abnormal levels of amyloid-beta, at both the mild cognitive impairment (MCI) stage and the dementia stage (now called MCI or dementia due to AD, respectively) [4,5,6]. Longitudinal studies have, however, stressed that although the amyloid-beta biomarkers are highly sensitive to the presence of amyloid-beta pathology, they have limited specificity in determining which patients with MCI will decline cognitively over time and develop dementia of the Alzheimer’s type [7]. This finding, together with the high frequency of amyloid-beta-positive, cognitively normal elderly [8, 9], questions the deterministic role of a positive amyloid-beta biomarker result in predicting AD-related cognitive decline in a clinical setting.

Studies have indicated that tau PET can differentiate between cognitively normal elderly and patients with AD [10, 11], is strongly related to cognitive impairment [12,13,14,15,16,17], and can discriminate between different subtypes of AD [18]. However, the ability of tau PET to predict which patients will decline cognitively over time remains to be determined. In this study, we followed patients with AD who had undergone baseline imaging with the first-generation tau PET tracer [18F]THK5317 [10, 19, 20]. We wanted to determine the accuracy of predicting future cognitive decline from the extent of baseline tracer binding, and to compare this with the predictive accuracy of baseline [18F]FDG and [11C]PIB PET, and of clinical CSF, neuropsychological and structural atrophy markers.

## Materials and methods

### Study participants

Twenty patients with a clinical diagnosis of MCI or dementia due to AD who had previously participated in baseline multi-modal investigations were included in this longitudinal prognostic study. All patients had been referred to the Cognitive Clinic at Theme Aging, Karolinska University Hospital, Stockholm, Sweden, for memory assessment. The procedures for clinical assessment and patient recruitment are detailed elsewhere [10]. At baseline, nine patients were diagnosed with dementia due to AD (that is, probable AD and a positive [11C]PIB PET scan) and 11 with MCI due to AD (that is, amnestic multi-domain MCI and a positive [11C]PIB PET scan) [4, 6]. All patients were followed up clinically at regular intervals to determine the extent of clinically evident cognitive decline over time (as described in the Neuropsychological assessment section). For the purposes of this study we focused on the most recent follow-up assessments (those available as of 1 November 2019). The median follow-up for our patient group was 48.02 months (interquartile range = 32.04:56.33).

The study was approved by the Regional Human Ethics committee in Stockholm, and the Radiation Safety committee of Uppsala University Hospital, Sweden. All participants and their caregivers provided written informed consent prior to the investigation and all procedures were in accordance with the ethical standards of the Institutional and National Research Committee and with the 1964 Helsinki Declaration and its later amendments, or comparable ethical standards.

### Neuropsychological assessment

All patients underwent neuropsychological assessment at baseline. This included assessment of global cognition using the Mini-Mental State Examination (MMSE), and of episodic memory using, among others [12], the Rey Auditory-Verbal Learning (RAVL) encoding subtest, expressed as z-scores, in comparison with results from a reference group of healthy controls [21].

The patients were assessed again using MMSE at follow-up, with the exception of one patient with a diagnosis of AD dementia at baseline who could not complete the assessment because of severe cognitive decline over the follow-up interval. Cognitive decline over time was defined as a decrease in MMSE score of ≥1.5 MMSE units per year [15, 22, 23], coupled with evidence of cognitive deterioration in multiple domains over the follow-up interval based on the medical history from patients and caregivers. The participants were subsequently divided into two subgroups (those who were cognitively stable vs those with cognitive decline). The patient who could not complete follow-up MMSE assessment because of severe cognitive impairment was placed in the cognitive decline group. This patient, who became eligible for nursing-home care during the follow-up interval as a result of requiring extensive assistance with the activities of daily living, was not included in analyses employing MMSE measures as the dependent variable.
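The MMSE-based part of this criterion can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the conversion of months to years, and the example scores are hypothetical, and the second criterion (deterioration in multiple domains based on the medical history) is not modelled.

```python
def annualized_mmse_decline(baseline, follow_up, months):
    """Return MMSE points lost per year (positive values mean decline)."""
    if months <= 0:
        raise ValueError("follow-up interval must be positive")
    return (baseline - follow_up) / (months / 12.0)


def meets_mmse_decline_criterion(baseline, follow_up, months, threshold=1.5):
    """True if the annualized MMSE decrease is at least `threshold` units/year."""
    return annualized_mmse_decline(baseline, follow_up, months) >= threshold


# Hypothetical patient (not study data): 26 -> 18 points over 48 months.
rate = annualized_mmse_decline(26, 18, 48)
declined = meets_mmse_decline_criterion(26, 18, 48)
```

In this sketch a patient who cannot complete the follow-up MMSE has no rate at all, which mirrors why that patient was assigned to a group clinically and left out of the analyses that use MMSE as the dependent variable.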

### Image analysis

At baseline, all patients had undergone [18F]THK5317, [11C]PIB and [18F]FDG PET imaging, and a T1-MRI sequence, as detailed elsewhere [10]. Individual dynamic baseline [18F]THK5317 PET (0–60 min) images were co-registered onto the individual T1-MRI image, and kinetic modelling using the reference Logan graphical method was applied to create parametric distribution volume ratio (DVR) images, using the cerebellar grey matter as reference (PMOD v.3.5). In region-based analyses, to minimize the spill-over effect of atrophy on the signal, MRI-based partial volume correction using the Muller-Gartner method was applied to the dynamic [18F]THK5317 PET images, based on individual T1 scans, prior to kinetic modelling (PMOD v.3.5) [12]. For voxel-based analyses of the [18F]THK5317 data, non-partial-volume-corrected data were used, since applying the correction amplified the noise in extra-cerebral areas. Summation [11C]PIB (40–60 min) and [18F]FDG (30–45 min) PET images were co-registered onto the individual T1-MRI images (SPM8). Standardized uptake value ratio (SUVR) images were created using the cerebellar grey matter as the reference for [11C]PIB and the pons for [18F]FDG.
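The SUVR normalization described above amounts to dividing each voxel of the summation image by the mean uptake in the reference region. The sketch below illustrates that step only, assuming NumPy and a reference mask already resampled to the PET grid; it is not the SPM8/PMOD pipeline used in the study, and the function name and toy values are hypothetical.

```python
import numpy as np


def suvr_image(summed_pet, reference_mask):
    """Divide every voxel by the mean uptake inside the reference region."""
    if not reference_mask.any():
        raise ValueError("reference mask is empty")
    reference_mean = summed_pet[reference_mask].mean()
    if reference_mean <= 0:
        raise ValueError("reference-region mean must be positive")
    return summed_pet / reference_mean


# Toy 2x2x1 "image" (not study data): the second column plays the role of
# the reference region (e.g. cerebellar grey matter or pons).
pet = np.array([[[4.0], [2.0]], [[6.0], [2.0]]])
ref = np.array([[[False], [True]], [[False], [True]]])
suvr = suvr_image(pet, ref)
```

By construction the mean SUVR inside the reference region is 1, so values above 1 elsewhere indicate uptake higher than in the reference tissue.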

For regional quantification, we used regions of interest (ROIs) derived from the Harvard-Oxford probabilistic atlas (FSL), spatially warped in each patient’s native T1-MRI space, after application of an individual grey matter mask, as previously described [12]. Based on the atlas, bilateral composite ROIs were created for the quantification of the [18F]THK5317 and [18F]FDG parametric images: inferior temporal gyrus, middle temporal gyrus, superior temporal gyrus, lateral parietal lobe, occipital lobe, frontal lobe and precuneus. Binding was not assessed in ROIs with high MAO-B loads (e.g., striatum, thalamus, medial temporal lobe, cingulate gyrus), given the affinity of [18F]THK5317 for the MAO-B enzyme [24]. A composite neocortical ROI was created for quantifying [11C]PIB binding.

An experienced neuroradiologist rated the T1-MRI sequence for assessing the medial temporal lobe atrophy (MTA) score [25]. The left and right MTA scores were averaged.

### CSF measurements

CSF samples were obtained under non-fasting conditions via lumbar puncture from 17 participants at baseline. Levels of Aβ1-42, and p-tau181p were determined in 16 participants, and t-tau in all 17 participants using commercially available sandwich ELISAs (Innogenetics, Ghent, Belgium).

### Statistical analyses

Wilcoxon rank sum and chi-squared tests were used for comparing the clinical characteristics of patients who remained cognitively stable with those of patients who declined cognitively over the follow-up interval (uncorrected p < 0.05).

The areas under the receiver operating characteristic curves (AUC) were calculated to assess the accuracy of the baseline biomarker levels [e.g., regional tracer binding/uptake (DVR or SUVR), neuropsychological and CSF measurements, and atrophy ratings] in differentiating patients who remained cognitively stable from those who declined cognitively over time. The Youden index was used for determining optimal cut-off points at a regional level.
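As an illustration of these two quantities, the sketch below computes the AUC and the Youden-optimal cut-off in plain Python. It is a minimal sketch under stated assumptions, not the R code used in the study: the function names and the example values are made up, and higher values are assumed to indicate decline (as for tracer binding; for a marker such as [18F]FDG uptake, where lower values indicate decline, the scores would be negated first).

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each candidate cut-off.

    A case is called positive when score >= threshold; labels are 1 for
    patients who declined and 0 for patients who remained stable.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        points.append((threshold, tp / n_pos, tn / n_neg))
    return points


def auc(scores, labels):
    """AUC as the probability that a decliner outscores a stable patient.

    Ties between a decliner and a stable patient count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return the ROC point maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


# Made-up DVR-like values, not study data.
scores = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
labels = [0, 0, 0, 1, 0, 1, 1]
area = auc(scores, labels)
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
```

With a sample of this size the cut-off is chosen on the same patients it is evaluated on, so the resulting sensitivity and specificity are optimistic until checked in an independent cohort.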

Linear models were used for evaluating the association between baseline biomarker levels and the decrease in MMSE score over time (ΔMMSE; follow-up minus baseline), after adjusting for relevant covariates (Eq. (1)). Linear mixed-effects models were used for assessing the effect of the interaction between the baseline biomarker levels and time from baseline on the MMSE scores (baseline or follow-up), after adjusting for relevant covariates and accounting for repeated measurements (Eq. (2)). The follow-up assessments were assigned a time from baseline corresponding to the difference in months between baseline (time from baseline = 0) and the follow-up investigations. All models were also replicated after the addition of baseline diagnosis (MCI due to AD or dementia due to AD) as covariate, and the results are presented in Supplementary Figs. 1, 2. No collinearity was detected in the models.

Bonferroni-corrected alpha levels for all the models assessing the effects of regional tracer binding/uptake ([18F]THK5317 and [18F]FDG) were based on the number of assessed ROIs (n = 7; Bonferroni-corrected p < 0.05). The models assessing the effects of clinical biomarkers (cognitive, CSF and atrophy markers, and composite [11C]PIB binding) were not corrected for multiple comparisons (uncorrected p < 0.05).
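For seven ROIs, the Bonferroni correction can be expressed either as a stricter per-test threshold or, equivalently, as adjusted p-values compared against 0.05. The following sketch shows both views; the function names are hypothetical and the p-values in the example are arbitrary.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Equivalent view: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


per_roi_alpha = bonferroni_alpha(0.05, 7)  # seven ROIs, as in the text
adjusted = bonferroni_adjust([0.001, 0.5])  # arbitrary example p-values
```

An uncorrected regional p-value therefore has to fall below roughly 0.007 to be reported as Bonferroni-corrected p < 0.05.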

All regional analyses for [18F]THK5317 binding were replicated in the subgroup of patients with available CSF measures (n = 16) for comparison (Supplementary Fig. 3).

All the above-mentioned statistical analyses were carried out using R 3.6.0 software.

$$\begin{aligned}
\Delta\mathrm{MMSE}\ (\text{follow-up minus baseline}) ={}& \text{Baseline biomarker levels} \\
&+ \text{Time interval between MMSE investigations} + \text{Age}
\end{aligned} \tag{1}$$

$$\begin{aligned}
\mathrm{MMSE}\ (\text{baseline or follow-up}) ={}& \text{Baseline biomarker levels} + \text{Time from baseline} + \text{Age} \\
&+ \text{Baseline biomarker levels} \times \text{Time from baseline}\ (\text{interaction}) \\
&+ \text{Random intercept}\ (\text{participant ID})
\end{aligned} \tag{2}$$
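The linear model in Eq. (1) can be sketched with ordinary least squares as below. This is an illustration under assumptions, not the R code used in the study: NumPy is assumed to be available, an intercept is included (as is the default in most regression software, although it is not written out in Eq. (1)), and the data are synthetic values generated from known coefficients. The mixed-effects model in Eq. (2) additionally needs a random intercept per participant and is not covered here.

```python
import numpy as np


def fit_delta_mmse_model(biomarker, interval_months, age, delta_mmse):
    """Ordinary least squares for: dMMSE ~ biomarker + interval + age.

    Returns coefficients in the order (intercept, biomarker, interval, age).
    """
    X = np.column_stack([np.ones(len(biomarker)), biomarker, interval_months, age])
    coef, *_ = np.linalg.lstsq(X, np.asarray(delta_mmse, dtype=float), rcond=None)
    return coef


# Synthetic, noise-free data built from known coefficients (not study data).
rng = np.random.default_rng(0)
biomarker = rng.uniform(1.0, 2.0, size=19)
interval = rng.uniform(17.0, 60.0, size=19)
age = rng.uniform(55.0, 85.0, size=19)
delta = 2.0 - 6.0 * biomarker - 0.05 * interval + 0.01 * age
coef = fit_delta_mmse_model(biomarker, interval, age, delta)
```

Because ΔMMSE is defined as follow-up minus baseline, a negative biomarker coefficient corresponds to a larger drop in MMSE score with higher baseline biomarker levels.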

### Statistical analyses—voxel-based comparisons

Following the above-mentioned pre-processing steps, we performed spatial normalization of all voxel-based maps for [18F]THK5317, [18F]FDG and [11C]PIB into the MNI space using the individual transformation matrices from the T1-MRI segmentation step (SPM8). A Gaussian smoothing kernel (FWHM = 8 mm in all directions) was applied to the images. An explicit grey matter mask was used to restrict the voxel-based analyses to GM regions.
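Smoothing kernels are specified by their full width at half maximum, whereas most general-purpose Gaussian filters expect a standard deviation. The conversion is sketched below; the function name is hypothetical, and the 2 mm voxel size is an assumption used only to show the conversion to voxel units.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(8.0)  # about 3.40 mm for the 8 mm kernel
sigma_vox = sigma_mm / 2.0     # assuming hypothetical 2 mm isotropic voxels
```

The factor 2·sqrt(2·ln 2) is approximately 2.355, so an 8 mm FWHM kernel has a standard deviation of about 3.4 mm in each direction.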

The AUC were calculated at a voxel level to assess the accuracy of the baseline tracers’ binding (DVR or SUVR) in differentiating patients who remained cognitively stable from those who declined cognitively over time using VoxelStats 1.1 [26].

A multiple regression design was used for implementing correlation analyses at a voxel level. Briefly, the association between baseline tracer binding/uptake (DVR or SUVR) and the decrease in MMSE score (ΔMMSE) over time was evaluated after adjusting for the time interval between MMSE investigations (continuous variable) and age at baseline (continuous variable). The models were also evaluated after the addition of baseline diagnosis as covariate (dummy variable), and the results are presented in Supplementary Fig. 1. The relevant contrasts for ΔMMSE were evaluated (positive correlations for [18F]FDG SUVR, and negative for [18F]THK5317 DVR and [11C]PIB SUVR).

Separate multiple regression designs were used for evaluating the relationship between baseline tracers’ binding/uptake (DVR or SUVR) and (1) MMSE at baseline and (2) MMSE at follow-up, in two separate models. The relevant contrasts for MMSE were evaluated (positive for [18F]FDG SUVR, and negative for [18F]THK5317 DVR and [11C]PIB SUVR).

All the above-mentioned voxel-based statistical analyses involving regression models were carried out using SPM8 software. For significance testing, no correction for multiple comparisons was applied at the voxel level (p < 0.001). A correction for multiple comparisons was applied at a cluster level using the family-wise error rate (FWE-cluster-corrected, p < 0.05). The results of the voxel-based comparisons were projected onto group average cortical surfaces using BrainNet Viewer 1.61 software [27].

## Results

### Study participants

At follow-up, seven patients with a diagnosis of MCI due to AD and two with dementia due to AD remained clinically cognitively stable (cognitively stable group, n = 9; ΔMMSE <1.5 units/year). Four patients with a baseline diagnosis of MCI due to AD and seven patients with a baseline diagnosis of dementia due to AD experienced further cognitive decline over time (cognitive decline group, n = 11; ΔMMSE ≥1.5 units/year). In total, 4 of the 11 patients in the cognitive decline group were admitted to a nursing home over the follow-up interval.

The follow-up interval was more than 24 months (median interval = 48.02 months; interquartile range = 32.04:56.33) for all except one patient, who had a baseline diagnosis of AD dementia and was in the cognitively stable group (ΔMMSE = 1). This patient’s latest cognitive assessment was 17 months after baseline and the patient died shortly after from unrelated causes. The clinical characteristics of the patients who remained cognitively stable vs those who experienced cognitive decline over time are summarized in Table 1.

### Accuracy of the biomarker levels for predicting cognitive decline over time

At a voxel level, the accuracy of baseline [18F]THK5317 binding levels in bilateral temporoparietal areas for predicting subsequent cognitive decline in the patient sample (cognitively stable vs cognitive decline groups) was excellent (n = 20, AUC > 90%), while the accuracy of baseline [18F]FDG uptake was more moderate, with accurate predictions seen only in unilateral temporoparietal areas (n = 20) (Fig. 1a). Baseline [11C]PIB binding levels showed poor predictive accuracy (n = 20). The [18F]THK5317 DVR, [18F]FDG SUVR and [11C]PIB SUVR images from the patients with baseline diagnoses of MCI or dementia due to AD who remained cognitively stable vs those who experienced cognitive decline over time are shown in Fig. 2.

At a regional level, baseline [18F]THK5317 binding in the middle temporal gyrus was 100% accurate in predicting cognitive decline (n = 20, AUC: 1.00; sensitivity 100%; specificity 100%) while binding in the inferior temporal gyrus was marginally less accurate, followed by the precuneus (Fig. 1b). The predictive accuracy of baseline [18F]FDG uptake was highest in the precuneus (n = 20, AUC: 0.77; sensitivity 100%; specificity 64%; Fig. 1c). The predictions from all the clinical biomarkers at baseline were less accurate (n = 20 for MMSE, RAVL encoding, composite [11C]PIB binding and MTA rating; n = 16 for CSF Aβ1-42 and p-tau181p; n = 17 for CSF t-tau; Fig. 1d).

### Association of baseline biomarker levels with future decrease in MMSE score

At a voxel level, baseline [18F]THK5317 binding in temporal areas was significantly and negatively associated with cognitive decline (n = 19, FWE-cluster-corrected p < 0.05), after adjusting for relevant covariates (Fig. 3a). No statistically significant association was found between baseline levels of [18F]FDG uptake or [11C]PIB binding and cognitive decline (n = 19, FWE-cluster-corrected p > 0.05).

At a regional level, baseline [18F]THK5317 binding in temporoparietal areas was associated with a significant decrease in MMSE score, after adjusting for relevant covariates (n = 19, Bonferroni-corrected p < 0.05; Fig. 3b). No statistically significant association was found between baseline regional levels of [18F]FDG uptake and a decrease in MMSE score (n = 19, Bonferroni-corrected p > 0.05; Fig. 3c). Nor was a statistically significant association found between the baseline levels of the clinical biomarkers and a decrease in MMSE score (n = 19 for MMSE, RAVL encoding, composite [11C]PIB binding and MTA rating; n = 15 for CSF Aβ1-42 and p-tau181p; n = 16 for CSF t-tau; uncorrected p > 0.05 for all; Fig. 3d).

### Association of baseline biomarker levels with MMSE score longitudinally

At a voxel level, baseline [18F]THK5317 binding in temporoparietal areas was negatively associated with follow-up but not with baseline MMSE scores (n = 19, FWE-cluster-corrected p < 0.05) (Fig. 4a, b). Baseline [18F]FDG uptake in temporoparietal areas was significantly associated with baseline MMSE scores, and in more restricted parietal areas with follow-up MMSE scores (n = 19, FWE-cluster-corrected p < 0.05). Baseline [11C]PIB binding was not associated with baseline or follow-up MMSE scores at the same threshold (n = 19, FWE-cluster-corrected p < 0.05).

To assess whether the relationship between biomarker levels and MMSE score is moderated by the time point of MMSE evaluation, we assessed the interaction term [biomarker levels × time point] for the different biomarker modalities, after adjusting for relevant covariates, as described above. The interaction term was significant for baseline regional [18F]THK5317 binding in temporoparietal ROIs (n = 19, Bonferroni-corrected p < 0.05) and for baseline CSF levels of Aβ1-42/t-tau ratio, t-tau, and p-tau181p (n = 16 for CSF t-tau and n = 15 for CSF p-tau181p; uncorrected p < 0.05). More specifically, the interaction plots (Fig. 5a–c) indicated that baseline [18F]THK5317 binding in temporoparietal areas was more strongly negatively associated with MMSE scores at follow-up than at baseline. A similar effect was detected for CSF measures of tau; however, the modelled estimates of the effect were small. The interaction term was not significant for [18F]FDG uptake or for the other clinical biomarker levels.

## Discussion

Several reports from in vivo and post-mortem data have highlighted the close relationship between cross-sectional measures of tau pathology and cognition in AD [9, 12, 15,16,17, 28, 29]. Our data show that baseline [18F]THK5317 PET had higher prognostic accuracy in predicting a future decline in cognitive performance than other clinically available biomarkers. Furthermore, our results suggest the existence of a temporal dissociation between tau pathology and cognitive decline, with [18F]THK5317 binding relating more strongly to prospective than to cross-sectional cognitive performance.

In our study, baseline [18F]THK5317 binding predicted the subsequent stability of or decline in cognitive performance over time in patients with a clinical diagnosis in the AD spectrum (MCI or dementia due to AD) with excellent accuracy. Furthermore, there was a strong association between the extent of tracer binding at baseline and the extent of cognitive decline (ΔMMSE) over time, indicating that the extent of baseline tracer binding could give an indication of the rate of cognitive decline in patients with objective cognitive impairment and a positive amyloid-beta biomarker. It is therefore likely that the presence of tau pathology in neocortical association areas will have a negative impact on cognition over time. These results support the currently accepted hypothesis that tau pathology is a pathophysiological marker of AD and not simply a downstream marker that only correlates with cross-sectional measures of the stage of the disease [5]. Moreover, our work underlines the utility of tau PET imaging for early identification of patients at risk of cognitive decline in the AD spectrum, which would enable accurate prognosis in a clinical setting and inclusion of appropriate participants in anti-AD clinical trials.

In contrast, in our sample as in previous reports [30, 31], the accuracy of baseline [18F]FDG levels in predicting which patients would decline cognitively over time was only fair, and there was no relationship between the extent of tracer uptake and the rate of subsequent cognitive decline. These results, together with the current evidence for a strong relationship between [18F]FDG and global cognitive measures cross-sectionally in AD [12, 32], encourage its use mainly as a staging marker for the disease [5]. Figs. 1b and c and 2 demonstrate that [18F]FDG uptake levels, but not [18F]THK5317 binding levels, are better for discriminating between the stages of AD (MCI vs dementia due to AD) than for predicting cognitive decline. As expected from previous studies, the baseline clinical cognition markers, and the gross clinical atrophy markers, which are probably the most downstream markers of AD [33], were less accurate in predicting cognitive decline than the [18F]FDG levels [32].

Comparison of the predictive accuracy of [11C]PIB or CSF Aβ1-42 with that of [18F]THK5317 was complex in this study, given the known amyloid-beta positivity of all participants. From our analyses, however, it is evident that in the presence of a positive amyloid PET scan, further increases in the levels of amyloid tracer binding will not be predictive of future cognitive decline, at least at clinically relevant follow-up intervals. Interestingly, and in agreement with previous studies, only 11 of 20 (55%) patients with amyloid-positive PET scans experienced cognitive decline, and these were mainly those in the later stages of cognitive impairment (4 of 11 patients with MCI declined cognitively vs 7 of 9 patients with dementia) [34]. This observation adds to the cumulative evidence questioning the deterministic role of a positive amyloid-beta biomarker in patients with cognitive impairment [35], and highlights the need for other more specific pathophysiological markers, such as [18F]THK5317, for determining which individuals are likely to undergo AD-related cognitive decline. This study, however, was not designed to compare the predictive accuracy of amyloid-beta with that of tau biomarkers, nor to assess their combination in a clinical setting.

In our additional analyses, we noted that [18F]THK5317 binding and, to a lesser extent, tau markers in the CSF (lower estimates, but in a more restricted sample) were more strongly associated with the MMSE score at follow-up than with the baseline MMSE values. These results partly contradict previous reports of strong correlations between cross-sectional measures of tau pathology and cognitive performance [18], and highlight that the relationship of tau pathology with prospective cognitive performance is stronger, which supports the prognostic utility of tau PET imaging. Furthermore, these results underline the presence of a temporal dissociation between the development of tau deposition and cognitive deficits: tau pathology appears to precede the development of cognitive impairment, as has been previously suggested in hypothetical models of the AD cascade [33]. In contrast, [18F]FDG uptake showed more extensive correlations with cross-sectional than with prospective cognitive performance, which further stresses the staging role of this biomarker.

The most important limitation of this work lies in the fact that [18F]THK5317 interacts with the enzyme monoamine oxidase B (MAO-B); a molecular basis for this interaction has been proposed [19, 24, 36]. Similar unintended binding to MAO-B has been described for the other first-generation tau PET tracers, including [18F]Flortaucipir [24, 37, 38], and it is difficult, to date, to estimate the exact contributions of the different binding targets (e.g., tau deposits and MAO-B) to the in vivo tracer PET signal. However, earlier in vivo PET studies have shown that the regional pattern of MAO-B tracer (i.e., [3H]deuterium-L-deprenyl) binding is different from that of [18F]THK5317 binding, and that MAO-B tracer binding appears to be less extensive in the later stages of AD (i.e., the dementia stage), in contrast to what is seen with [18F]THK5317 [10, 39, 40]. This suggests that the contribution of the MAO-B component to [18F]THK5317 binding will be small in the neocortical areas that were evaluated, which is in agreement with the low load of MAO-B in these areas at autopsy [24, 41]. The small number of patients in the current sample and the inclusion of only a gross neuropsychological measure at follow-up (i.e., MMSE) stress the need for validating the results in larger patient groups with the use of different neuropsychological measures. Since the CSF biomarker assessment at baseline was limited to a sub-sample of the patient group, it was not possible to compare accurately the performance of CSF biomarkers with that of other biomarker measurements, which were available for all patients. Because they are limited to the current patient group (discovery sample), the sensitivity/specificity measures for [18F]THK5317 binding should be interpreted with caution until the biomarker’s accuracy has been assessed in a validation cohort. Cortical volume or thickness measures would be required for a fair comparison of atrophy measures with [18F]THK5317 binding, but the use of different field strengths for MRI acquisitions in our patient sample precluded such analyses in this study. Although the acquisition of the [18F]FDG data was based on the European Association of Nuclear Medicine Neuroimaging Committee’s guidelines [42], the use of a relatively early, static protocol for [18F]FDG, in contrast to the dynamic acquisition of the [18F]THK5317 data, could act as a limitation in the comparison of prognostic accuracy between imaging modalities.

The strength of this study lies in the longitudinal evaluation of a clinically well characterized sample of patients with a diagnosis of AD (MCI or dementia due to AD). Furthermore, the patient group was followed for a clinically relevant time: more than two years. The accuracy of baseline levels of [18F]THK5317 in predicting prospective cognitive decline in these patients was excellent, in comparison to the fair to poor accuracy of baseline amyloid-beta deposition levels, baseline neurodegeneration, baseline CSF measures and baseline neuropsychological assessment. Our results emphasize that the use of tau PET could improve the outcomes of both clinical routines and ongoing trials by allowing the early, accurate identification of patients with AD who will experience cognitive decline.