Dissociable age and memory relationships with hippocampal subfield volumes in vivo: Data from the Irish Longitudinal Study on Ageing (TILDA)

The heterogeneous specialisation of hippocampal subfields across memory functions has been widely shown in animal models. Yet, few in vivo studies in humans have explored correspondence between hippocampal subfield anatomy and memory performance in ageing. Here, we used a well-validated automated MR segmentation protocol to measure hippocampal subfield volumes in 436 non-demented adults aged 50+. We explored relationships between hippocampal subfield volume and verbal episodic memory, as indexed by word list recall at immediate presentation and following delay. In separate multilevel models for each task, we tested linearity and non-linearity of associations between recall performance and subfield volume. Fully-adjusted models revealed that immediate and delayed recall were both associated with cubic fits with respect to volume of subfields CA1, CA2/3, CA4, molecular layer, and granule cell layer of dentate gyrus; moreover, these effects were partly dissociable from quadratic age trends, observed for subiculum, molecular layer, hippocampal tail, and CA1. Furthermore, analyses of semantic fluency data revealed little evidence of robust associations with hippocampal subfield volumes. Our results show that specific hippocampal subfields manifest associations with memory encoding and retrieval performance in non-demented older adults; these effects are partly dissociable from age-related atrophy, and from retrieval of well-consolidated semantic categories.

As demonstrated above, across different years from the HRS/AHEAD prospective cohort studies of older adults, the IR and DR tasks revealed consistent factor structures, indicative of strong construct validity and good reliability. In utilising the same task wordlists and closely aligned procedure, correlations between the IR and DR tasks were found to be of similar magnitude in TILDA as in ELSA. Moreover, normative data for performance on IR and DR suggested similar means/medians and ranges of performance across the three studies, notably with TILDA median IR scores falling between the means of ELSA and HRS.
Animal naming task. The animal naming task entails self-generation of common semantic categories by participants, and is routinely used in the evaluation of clinical populations, including patients with Alzheimer's-type dementia [5], and in patients with neuropsychological conditions [6]. Adopting the procedure of ELSA, participants in TILDA were asked to name as many animals as possible within 60 seconds (discounting proper nouns and repetitions).
In a systematic review of the extant literature on verbal semantic fluency evaluation in large cohort studies, [6] reported the mean (±SD) scores from animal naming tasks conducted in 12 studies carried out between 1999 and 2015. The mean of the mean scores from the 12 studies was reported as 19.83 (5.76) animals named correctly ([6], their Table 1); we note that the 12 studies included one on the TILDA cohort (see [7]). Furthermore, examining verbal semantic fluency in the ELSA cohort (Wave 4, 2008-2009), [3] found that the mean (±SD) animal naming score was 20.3 (6.5) [N=5197], comparable to the means (±SDs) reported previously by [8] for Wave 1 (20.8±7.0) and Wave 2 (19.2±6.1) of TILDA [N=3417]. Indeed, the current MRI subcohort of TILDA (randomly sampled from the main cohort) had a median animal naming score of 19 (IQR: 15.5, 23), in line with the Wave 1 and Wave 2 observations for the sample of the main TILDA cohort analysed by [8], and with the means reported by [3].
Hence, given the close consistency of central tendency measurements for animal naming from a sample of large cohort studies [6], ELSA [3], and previous TILDA Waves [8], the results from the present report concur closely with expected score distributions.
Test-retest reliability: TILDA MRI cohort. Finally, to appraise the test-retest reliability of the IR, DR and AN tasks within the TILDA MRI cohort, we ran Spearman correlations across these tasks for participants within the MRI cohort who had provided data for all three tasks at Waves 1-3 of TILDA. To maximise the test-retest sample (N=513), we included in the correlations participants who were excluded from the mixed effects analyses in the present report based on MRI QC criteria (e.g., lesions, motion artifacts, poor reconstructions). Since the IR task in each wave allowed participants two attempts at recalling the word list, we entered each attempt of the IR task per wave into the correlational analyses, in order to appraise consistency of measurement within the same session (attempts referred to as IR1 and IR2). In sum, correlational analyses revealed strong within-wave test-retest reliability for assays of IR performance; DR showed similarly high consistency with within-wave IR scores. Across waves, test-retest reliability for IR performance was strongest between adjacent waves and was reduced between non-adjacent waves (likely due to the greater inter-assessment interval); DR performance showed good reliability across both adjacent and non-adjacent waves. AN performance across waves suggested good test-retest reliability, with a small increase in correlation for adjacent relative to non-adjacent waves. Coupled with the correlations between tasks observed in independent cohorts referenced above (see [2]), our findings point towards largely strong reliability of measurements within the TILDA MRI cohort.
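As an illustration of the kind of analysis described above, the following is a minimal sketch of pairwise Spearman correlations across task scores, written in plain Python. It is not the code used in the present report (the software used for the correlations is not stated here). The function names, the column labels (e.g., "IR1_w1") and the toy scores are all hypothetical and are not TILDA data. Spearman's rho is computed as the Pearson correlation of average ranks, which handles tied scores.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(scores):
    """Pairwise rho for a dict mapping measure name -> per-participant scores."""
    names = list(scores)
    return {(a, b): spearman(scores[a], scores[b]) for a in names for b in names}


# Hypothetical toy scores for six participants (NOT TILDA data).
toy_scores = {
    "IR1_w1": [5, 6, 4, 7, 8, 3],  # immediate recall, attempt 1, Wave 1
    "IR2_w1": [6, 7, 5, 8, 9, 4],  # immediate recall, attempt 2, Wave 1
    "DR_w1": [4, 6, 3, 6, 8, 2],   # delayed recall, Wave 1
}
rho = spearman_matrix(toy_scores)
for (a, b), value in rho.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: rho = {value:.3f}")
```

In such a sketch, only participants with complete data on every measure would be passed in, mirroring the complete-case restriction described above; rank-based correlation is a natural choice for bounded, discrete recall scores because it does not assume normally distributed data.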

Covariate Selection
As presented in Table 1, we measured depressive symptoms using the abbreviated form of the CES-D depression inventory. We found that the range of scores for depressive symptoms was truncated, with the 1st to 3rd quartiles (n=335) of the distribution reflecting scores less than 5. The 4th quartile (n=97) comprised scores of 5 or greater; however, in practice, we found that most of these participants had scores between 5 and 7, below the cutoff of 8 deemed clinically relevant for depression. Hence, the cohort overwhelmingly did not meet criteria for depression. Given the very restricted range of values for CES-D scores and the largely non-depressed nature of the cohort, we did not adjust for CES-D depression scores in statistical models.