Introduction

Hippocampus is among the most important brain structures involved in memory1,2,3, and is a critical site of pathogenesis in dementing illnesses such as Alzheimer’s Disease4,5,6,7. Decades of ex vivo human studies and in vivo animal studies have revealed anatomical and functional heterogeneity of the hippocampal subfields8,9,10,11,12. Yet, until recently, very few in vivo studies in humans had shown dissociable relationships between performance in different memory domains and hippocampal subfield anatomy or function13,14,15.

The role of the hippocampal subfields with respect to select domains of memory thus remains underexplored. Recent work has shown that errors during real-world spatial navigation are negatively associated with hippocampal tail volume in mild cognitive impairment (MCI), but with Cornu Ammonis (CA) 3 volume in healthy controls16. Nevertheless, there remains a limited understanding of the role of hippocampal subfields in other memory domains relied on in daily life, such as verbal episodic memory (e.g., recalling a grocery list) or semantic memory (e.g., retrieving familiar nouns) [but see17]. Verbal episodic and semantic memory have been shown to dissociate to hippocampal versus anterior temporal regions, respectively18,19,20,21,22. However, much of our understanding of these dissociations is based on small and heterogeneous patient cohorts19,20,21. Little is known about these relationships in the context of healthy ageing; less still is known about the effects of age with respect to relationships between hippocampal subfields and specific memory domains, despite evidence of age-related variation in subfield anatomy23,24,25.

Recent advances in computational methods and atlases available for anatomical MRI have improved the reliability and efficiency with which histologically-validated hippocampal parcellations may be applied to in vivo datasets9,26,27,28,29. Spatially refined subfield parcellations have now been used in studies of clinical disorders30,31, and in one study of MCI and educational attainment32. Nevertheless, a pressing concern is the quantification of relationships between hippocampal subfields, memory domains, and age, particularly given projected worldwide growth in dementia prevalence in older adults33, and the key role of hippocampus in modulating memory34.

Here, we explored the role of hippocampal subfield anatomy with respect to two domains of memory, in healthy ageing. In a large, non-demented sample of community-dwelling older adults, we aimed to dissociate memory domains based on their expected patterning with hippocampal subfield volumes. We predicted relationships would emerge between verbal episodic memory (list learning and retrieval) and volumes of subfields CA1, CA2/3, CA4, and granule cell layer of dentate gyrus (GC-DG) – regions heavily implicated in encoding and retrieval processes12,35. In contrast, we expected that fluency in semantic memory (retrieval of familiar category names) would show little if any relationship with subfield volumes19. We appraised these relationships in tandem with age-related differences in subfield volumes, by assessing the robustness of effects related to memory alongside fits for cross-sectional age.

Materials and Methods

Design and participants

Details of the design of The Irish Longitudinal Study on Ageing (TILDA) have been published previously36,37. Briefly, the study comprises a clustered stratified random sample of the population aged 50 and over living in the Republic of Ireland. At Wave 3 (2014), participants completed a computer assisted personal interview (CAPI) in their home (N = 6,618; 85% response rate), and a physical health assessment with a trained research nurse at a health centre, or at home (N = 5,364; 82% response rate). Wave 3 participants who completed a health assessment were later invited to complete the MRI protocol. The study received ethical approval from the Research Ethics Committee, School of Medicine, Trinity College Dublin. All experimental procedures were performed in line with Trinity College Dublin School of Medicine guidelines and regulations for ethics in research involving human participants.

MRI sample

Initial recruitment prioritised participants aged 65 and over in order to limit attrition amongst the oldest old within the sample, with later recruitment targeting those aged 50–64 years. Participants provided voluntary informed consent before their scan appointment.

In total, 578 participants attended for MRI. T1w datasets were acquired from 560 participants; 18 did not provide data (due to claustrophobia/nerves [n = 14], or MRI contraindication [n = 4]). Supplementary Fig. S1 indicates the exclusions made to the sample due to data quality issues (n = 51). We excluded 73 further participants based on physical and cognitive health criteria (Supplementary Fig. S1). A sample of 436 participants (see Table 1) was available for volumetric analyses (median age [IQR, age range]: 68 [65–73, 52–88]).
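The participant flow described above can be checked with simple arithmetic. The following is a minimal sketch using only counts reported in the text (the split of the 51 data-quality exclusions into motion artifact and lesions is taken from the data inspection section); the variable names are illustrative.

```python
# Participant flow, using counts reported in the text.
attended = 578
no_t1w_data = 14 + 4             # claustrophobia/nerves + MRI contraindication
t1w_acquired = attended - no_t1w_data

quality_exclusions = 33 + 18     # motion artifact + lesions
health_exclusions = 73           # physical and cognitive health criteria

analysis_sample = t1w_acquired - quality_exclusions - health_exclusions
print(t1w_acquired, analysis_sample)
```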

Table 1 Descriptive data for analysis sample.

Verbal episodic memory and semantic fluency assessment

Participants’ verbal episodic memory and semantic fluency performance were assessed by trained interviewers during the Wave 3 CAPI. Assessments comprised immediate and delayed verbal recall of word lists; semantic fluency was assessed by free naming of animals. All participants were fluent English speakers.

Immediate (IR) and delayed recall (DR)

IR and DR were tested during the cognitive module in the CAPI. One of four possible 10-item word lists was selected randomly by the CAPI computer (lists were the same as those validated by the Health and Retirement Study38; see Supplementary Methods). The list was then presented to the participant by audio recording, or by the interviewer reading aloud (in instances of difficulty hearing the recording; audibility was verified with a brief test recording played for the participant). Participants were instructed to listen to the entire list carefully (approx. rate: 1 word/2 s), and were then prompted to repeat as many of the presented words as possible within two minutes. The interviewer recorded the number of words recalled correctly. The test was then repeated at once using the same procedure. The participant’s IR score was calculated as the mean number of words recalled correctly across the first and second attempt. The CAPI then proceeded to the animal naming task (see below), followed by five further sections of questions (cardiac disease history; other chronic conditions; falls/fractures; pain; medical tests – duration ~25 mins). Following this, participants completed the DR test, which required them to repeat as many of the word list items from the IR test as possible; DR score was the number of words recalled correctly. IR and DR were weakly negatively correlated with age (IR: Spearman ρ = –0.27, p < 0.0001; DR: ρ = –0.25, p < 0.0001), and highly positively correlated with each other (ρ = 0.7, p < 0.0001).
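The scoring rules described above can be sketched as follows. This is an illustrative reading of the text rather than the CAPI software itself; the function names and the range check against the 10-item list are assumptions.

```python
LIST_LENGTH = 10  # each word list contained 10 items


def _check_count(words_recalled: int) -> None:
    """Guard against counts that are impossible for a 10-item list."""
    if not 0 <= words_recalled <= LIST_LENGTH:
        raise ValueError("recall count must lie between 0 and 10")


def immediate_recall_score(first_attempt: int, second_attempt: int) -> float:
    """IR: mean number of words recalled correctly across the two attempts."""
    _check_count(first_attempt)
    _check_count(second_attempt)
    return (first_attempt + second_attempt) / 2


def delayed_recall_score(words_recalled: int) -> int:
    """DR: number of words recalled correctly after the delay."""
    _check_count(words_recalled)
    return words_recalled
```

For example, a participant recalling 6 and then 7 words would receive an IR score of 6.5.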

Animal naming (AN)

Participants were instructed to name as many animals as possible in 60 seconds. Task timing was controlled by the CAPI computer. The interviewer recorded each word spoken by the participant, scoring as correct common nouns (including subordinate levels of categories, e.g., doe, stag), and as incorrect any repeated items or proper nouns. AN was weakly negatively correlated with age (ρ = –0.19, p < 0.0001), and weakly positively correlated with IR and DR (IR: ρ = 0.29, p < 0.0001; DR: ρ = 0.25, p < 0.0001).
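A minimal sketch of the AN scoring rule follows, assuming the interviewer supplies the list of responses and flags any proper nouns. The function name, the `proper_nouns` argument, and the case-insensitive matching of repeats are illustrative assumptions, not part of the CAPI protocol.

```python
def animal_naming_score(responses, proper_nouns=frozenset()):
    """Count correct responses: repeats and proper nouns score as incorrect."""
    flagged = {name.strip().lower() for name in proper_nouns}
    seen = set()
    score = 0
    for word in responses:
        key = word.strip().lower()
        if key in flagged or key in seen:
            continue  # proper noun or repeated item: not counted
        seen.add(key)
        score += 1
    return score
```

Subordinate category members such as "doe" and "stag" are distinct strings and therefore each count as correct under this sketch.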

MRI protocol & T1w acquisition

Scans were acquired at the National Centre for Advanced Medical Imaging (CAMI), St. James's Hospital, Dublin, on a 3 T Philips Achieva system with a 32-channel head coil. A 3D Magnetisation-prepared Rapid Gradient Echo (MP-RAGE) sequence was used, with the following parameters: FOV: 240 × 218 × 162 mm; 0.9 mm isotropic resolution; SENSE factor: 2; TR: 6.7 ms; TE: 3.1 ms; flip angle: 9°.

Data inspection and hippocampal subfield reconstruction

All volumes were inspected for evidence of image artifact and presence of grey and white matter lesions by a trained operator blind to participant identity. Data for 33 participants were excluded due to motion artifact; 18 further datasets had one or more lesions present and were excluded (Supplementary Fig. S1). All T1w image analyses were completed in FreeSurfer (v.6.0)39,40,41. We used the hippocampal subfields module within the FreeSurfer recon-all processing pipeline to segment hippocampus26,39. Details of these hippocampal segmentation routines have been published previously26. Briefly, the procedure employs a probabilistic atlas encoded in a tetrahedral mesh, and derived from manual segmentations using Bayesian techniques. Segmentation is posed as a Bayesian inference problem within a generative model, which spatially deforms atlas label prior probabilities; segmentations are achieved via Bayesian optimisation, based on the label prior probabilities and observed voxel intensities (see Supplementary Fig. S2). Recon-all procedures were run on a Linux computing cluster at the Trinity Centre for High Performance Computing. All hippocampal segmentations were inspected for error, overlaid on the intensity-normalised T1w volumes, by a trained operator blind to participant identity. All datasets had hippocampal segmentations that fell within expected tissue boundaries; none of the recon-all or subfield reconstruction procedures yielded any reports of error.

Data Analyses

Hippocampal subfield volumes

Hemisphere-wise volumetric data (mm³) for hippocampal subfields per participant were gathered using FreeSurfer routines (quantifyHippocampalSubfields). Data were analysed in STATA 14 (StataCorp, TX).

Statistical modelling

The tightly folded structure of hippocampus leads to high correlation of subfield volumes within and between hemispheres; in part, this may arise via limitations with in vivo scan spatial resolution and image contrast, giving reduced spatial accuracy of segmentations. Mixed effects linear regression models of subfield volumes allowed us to fit random effects at the levels of hemisphere and participant, accounting for the intraclass spatial correlation within hemispheres and individuals that arises from these issues. Hence, the random effects modelled subfields as nested within hemispheres, and hemispheres as nested within participants: $Y_{ijk} = \beta_0 + \beta_{nijk}\ldots + v_k + u_{jk} + e_{ijk}$; where $Y_{ijk}$ was the volume of hippocampal subfield $i$, in hemisphere $j$, within individual $k$; $\beta_0$ was the model intercept; $\beta_{nijk}\ldots$ a set of fixed-effect covariate terms; $v_k$ the participant-specific intercept; $u_{jk}$ the hemisphere intercept for participant $k$; and $e_{ijk}$ the subfield-specific residual term. All models included age, age², gender, total grey matter volume, highest level of education (primary, secondary, tertiary), smoker status (never, current-past), handedness, and cardiovascular disease (any history/none of abnormal rhythm, angina, cardiac arrest, or heart attack) as fixed effect covariates. Frailty variables were included with the other covariates in initial models, but were dropped due to lack of improvement in model fit when entered in isolation or together (all p > 0.3). Covariates were selected based on previous literature showing impacts of cardiovascular risk42, smoking43, education32, and frailty44 on tissue volumes.
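To make the nesting concrete, the sketch below simulates volumes from the random-intercept structure described above: a participant intercept, a hemisphere-within-participant intercept, and a subfield-specific residual added to a common intercept. It is an illustration only; fixed-effect covariates are omitted, and every numerical value (intercept, standard deviations, numbers of participants and subfields) is a hypothetical placeholder rather than an estimate from the study.

```python
import random


def simulate_subfield_volumes(n_participants=5, n_subfields=3, beta0=500.0,
                              sd_participant=30.0, sd_hemisphere=10.0,
                              sd_residual=5.0, seed=0):
    """Simulate volumes with subfields nested in hemispheres nested in participants."""
    rng = random.Random(seed)
    rows = []
    for k in range(n_participants):
        v_k = rng.gauss(0.0, sd_participant)        # participant-specific intercept
        for j in ("left", "right"):
            u_jk = rng.gauss(0.0, sd_hemisphere)    # hemisphere intercept for participant k
            for i in range(n_subfields):
                e_ijk = rng.gauss(0.0, sd_residual)  # subfield-specific residual
                rows.append({
                    "participant": k,
                    "hemisphere": j,
                    "subfield": i,
                    "volume": beta0 + v_k + u_jk + e_ijk,
                })
    return rows


rows = simulate_subfield_volumes()
print(len(rows))  # participants x hemispheres x subfields
```

Because all subfields from one participant share the participant intercept, and all subfields within one hemisphere additionally share the hemisphere intercept, simulated volumes are correlated within these clusters, which is the dependence the random effects are intended to absorb.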

In separate models, we tested effects of IR, DR, and AN as predictors of hippocampal subfield volumes. Tasks were modelled separately to avoid multicollinearity. Initial inspection of data suggested non-linear trends between recall performance and subfield volumes. We therefore modelled linear, quadratic, and cubic terms for recall tasks, appraising model fit relative to the next simplest alternative using likelihood ratio tests. Models specified subfield as a fixed term, in addition to subfield being nested in the hemisphere random effect. We hypothesised that subfields including CA1, CA2/3, CA4 and GC-DG would be critical to learning performance, and therefore specified fixed effect linear and non-linear interaction terms between IR/DR and subfield. To explore effects of age comprehensively, we further included fixed effect linear and quadratic interaction terms between age and subfield. Within-model statistical significance (α = 0.05) of all terms was evaluated using Wald tests. All non-linear terms were evaluated with respect to the significance of the related lower order terms; e.g., in the case of quadratic terms, we deemed as significant only those where the corresponding linear term was also significant (since the interpretation of a significant quadratic term in isolation was not meaningful within the present models).
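The likelihood ratio comparison of nested models can be sketched as below. This is a generic illustration, not the STATA routine used in the analyses: the statistic is twice the difference in log-likelihoods, referred to a chi-square distribution with degrees of freedom equal to the number of added terms. The helper names are assumptions, and the chi-square upper tail is computed with the closed-form expressions for integer degrees of freedom using only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i < (df-1)/2} y**(i + 1/2) / Gamma(i + 3/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5) * math.exp(-y)
    for i in range((df - 1) // 2):
        total += term
        term *= y / (i + 1.5)
    return total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return (LR statistic, p-value) for a reduced model nested in a full model."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df)
```

In the present design, the reduced model would be the next simplest polynomial specification (for example, quadratic when testing the cubic terms).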

To evaluate the stability of final mixed effect fits for recall tasks, we performed 10-fold cross validation. We used a random sampling procedure to divide the cohort into 10 folds of approximately equal size (6 folds n = 44, 4 folds n = 43); we then iteratively fitted the fully-adjusted mixed models to 90% of the data, holding the remaining 10% for validation. Initial models of 90% of the data were estimated using fixed and random effects; predictions for the held-out 10% sample used the fixed model terms only. Root-mean-square error (RMSE) values were calculated for each of the 10 sets of predictions.
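The fold construction and error metric can be sketched as follows. This is a minimal illustration assuming a simple random shuffle of participant indices; the function names and seed are hypothetical, and the model fitting step is not reproduced here.

```python
import math
import random


def make_folds(n_participants: int, k: int = 10, seed: int = 0):
    """Randomly assign participant indices to k folds of near-equal size."""
    indices = list(range(n_participants))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index spreads any remainder across the first folds.
    return [indices[fold::k] for fold in range(k)]


def rmse(observed, predicted) -> float:
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


folds = make_folds(436)
print(sorted(len(fold) for fold in folds))
```

With 436 participants and 10 folds, this yields six folds of 44 and four folds of 43, matching the fold sizes reported above. Splitting at the participant level keeps both hemispheres and all subfields from one individual in the same fold.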

Results

Immediate and delayed recall relate to volumes of specific hippocampal subfields

Using mixed effects linear regression, we analysed hippocampal subfield volumes with respect to recall performance. Figure 1 and Table 2 present marginal estimates of hippocampal subfield volumes from best-fitting immediate recall (IR) and delayed recall (DR) models, across IR and DR score ranges, and across age bands (note that recall and age estimates incorporate their respective linear and non-linear terms). Table 3 summarises coefficients for the non-linear recall x subfield and age x subfield interaction terms, from the same models (all models fitted main effects for each of the recall and age terms, in addition to their interactions with subfield; see Supplementary Tables S1 and S2 for full IR and DR output, respectively).

Figure 1

Marginal estimates of hippocampal subfield volumes (mm³) from immediate recall (IR) and delayed recall (DR) models, showing effects of IR, DR and age. (a) Upper panels present marginal estimates (blue line; light blue band ±95% CI) of subfield volumes as a function of IR score; overlaid cyan scatter presents observed participant-wise data. Lower panels present marginal estimates (±95% CI) of subfield volumes as a function of age (years), with participant-wise scatters. (b) Marginal estimates for DR model; all specifications as per (a). *IR/DR cubic term significant, p < 0.05; • age quadratic term significant, p < 0.05; grey shading denotes marginally significant trend (see Tables 2 and 3, and Results). Note differences in y-axis ranges across panel rows in (a,b); these were adjusted to accommodate differences in subfield volumes (see also Supplementary Fig. S2). All marginal estimates were calculated from fully-adjusted models, holding all covariates at their means.

Table 2 Marginal estimates (95% CI) of hippocampal subfield volumes across immediate recall (IR) and delayed recall (DR) scores, with age effects.
Table 3 Immediate recall (IR; left) and delayed recall (DR; right) coefficients for model interaction terms - non-linear recall and age effects.

We appraised whether inclusion of IR terms and their subfield interactions improved model fit. Iterative addition of IR interaction terms to the fully adjusted model revealed that the cubic IR terms and their interaction with subfield significantly improved the model fit, over quadratic (likelihood ratio [LR] test: χ2(11) = 555.6, p < 0.00001) and linear (LR: χ2(22) = 565.8, p < 0.00001) IR and their subfield interaction terms in the model. Cubic IR terms and their interaction with subfield significantly improved the fit of the fully adjusted model, relative to the fully adjusted model with no IR terms (LR: χ2(33) = 611.0, p < 0.00001; ΔAIC: 547). Similarly, in the fully adjusted DR model, the cubic DR and subfield interaction terms improved the fit significantly, compared to the quadratic (LR: χ2(11) = 552.0, p < 0.00001) and linear only (LR: χ2(22) = 576.95, p < 0.00001) DR and subfield interaction terms; the fully adjusted model with cubic DR terms significantly improved the fit relative to that model omitting all DR terms (LR: χ2(33) = 613.02, p < 0.00001; ΔAIC: 547).
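For nested models, the change in AIC follows directly from the likelihood ratio statistic. The sketch below assumes the usual definition AIC = 2k − 2 ln L, that ΔAIC is expressed as the reduced model's AIC minus the fuller model's AIC (so positive values favour the fuller model), and that the LR degrees of freedom equal the number of added parameters. The function name is illustrative.

```python
def delta_aic(lr_statistic: float, extra_parameters: int) -> float:
    """AIC(reduced) - AIC(full) for nested models.

    With AIC = 2k - 2*lnL, the difference reduces to LR - 2 * (k_full - k_reduced).
    """
    return lr_statistic - 2 * extra_parameters


# DR model with cubic terms versus the model omitting all DR terms (values from the text).
print(round(delta_aic(613.02, 33), 2))
```

Under these assumptions, the DR comparison gives 613.02 − 66 = 547.02, in line with the ΔAIC of 547 reported above.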

In the fully adjusted IR model, significant subfield x cubic IR interactions emerged for CA1 (p < 0.006), CA2/3 (p < 0.001), CA4 (p < 0.01), GC-DG (p < 0.007), and molecular layer (p < 0.008) (Fig. 1a, top; Tables 2 & 3, left). In the same model, subfield x quadratic age interactions were significant for subiculum (p < 0.0001), CA1 (p < 0.005), molecular layer (p < 0.0001), and hippocampal tail (p < 0.0001), with a marginal trend for GC-DG (lin. p < 0.08, quad. p < 0.005; see Fig. 1a, bottom; Tables 2 & 3, left). Thus, CA2/3 and CA4 showed significant cubic fits for IR but no significant age effects; in contrast, subiculum and hippocampal tail showed significant quadratic age fits, but no significant effects of IR. CA1, molecular layer, and to a degree, GC-DG, manifested effects of IR and age.

A similar pattern emerged for the DR model. Subfield x cubic DR interactions were significant for CA1 (p < 0.007), GC-DG (p < 0.023), and molecular layer (p < 0.016), with a marginally significant cubic term for CA4 (p < 0.065); the cubic term did not reach significance for CA2/3 (p > 0.1), but the quadratic term was significant (p < 0.029) (Fig. 1b, top; Tables 2 & 3, right). Further, subfield x quadratic age interactions were significant for subiculum (p < 0.0001), molecular layer (p < 0.001), and hippocampal tail (p < 0.0001), with a weak trend for CA1 (lin. p < 0.086, quad. p < 0.02; Fig. 1b, bottom; Tables 2 & 3, right). As in the IR model, CA2/3 and CA4 showed significant non-linear effects of DR but no significant age effects, whereas subiculum and hippocampal tail showed significant quadratic age effects but no significant DR effects. Both cubic DR and quadratic age effects manifested at molecular layer, while cubic DR effects were significant at CA1 but age effects were not.

Verbal fluency shows no robust relationship with hippocampal subfield volumes

We used mixed effects regression to analyse subfield volume with respect to animal naming (AN). As for the IR and DR models, we fitted the interaction between AN and subfield to the fully adjusted model. The subfield x AN linear interaction reached significance for presubiculum only (β = 0.56, p < 0.011, 95% CI: 0.13–0.99; other regions all p > 0.18). Evaluating the AN model against the fully adjusted model with no AN interaction terms showed no significant improvement in model fit with the addition of the AN terms (LR: χ2(11) = −489.0, p > 0.9; ΔAIC: −511). Quadratic age effects in the model including AN terms recapitulated the IR model, with age effects at presubiculum and fimbria also reaching significance; however, omitting the AN terms yielded age effects consistent with the IR model (see Supplementary Table S3).

Consistency of IR and DR model predictions

To appraise the stability of the IR and DR models, we employed 10-fold cross validation, using a leave-one-fold-out procedure in which each model was fitted to nine folds while the remaining fold was held out for validation. We used the fixed effects from the fully-adjusted IR and DR models, with cubic recall terms fitted, for training and prediction. Table 4 presents the RMSE values across folds for IR and DR predictions. Supporting the pattern of results found for the fully adjusted IR and DR models of the entire sample, cross-validated estimates of prediction error showed good consistency within and across models (IR RMSE range & SD: 36.33–42.13, 2.08; DR RMSE range & SD: 36.38–40.95, 1.51).

Table 4 RMSE values for model predictions from 10-fold internal cross-validation of IR and DR models.

Discussion

Here, we examined relationships between volumes of hippocampal subfields and performance in two domains of memory, in healthy ageing. We found that immediate recall (IR) performance during verbal list learning was associated non-linearly with volumes of CA1, CA2/3, CA4, GC-DG, and molecular layer, with cubic terms providing the best fits per subfield. Similarly, subsequent performance on list delayed recall (DR) was associated non-linearly with volumes of these subfields; again, cubic terms tended to afford the best fits (cf. CA2/3). In parallel, we observed age-related decline in the volumes of subiculum, molecular layer, and hippocampal tail in both models, with further declines noted for CA1 and GC-DG in the IR model; age-related declines were best fit by quadratic terms. Finally, we found that semantic fluency was not robustly associated with volumes of hippocampal subfields.

As predicted, our results revealed roles for CA1-CA4 and GC-DG, in addition to molecular layer, in the encoding and subsequent retrieval of novel verbal word lists. Several recent studies have provided piecemeal evidence of associations between subfield volumes and verbal episodic memory, implicating subiculum in immediate verbal recall45, and CA1 (ref. 46), subiculum46,47, and presubiculum45 in delayed verbal recall. Our current IR findings implicate each of the CA subfields and GC-DG, supporting their previously demonstrated roles in encoding of novel stimuli/environments12,34, and in verbal and visuo-spatial episodic memory17,48. Our DR results agree in part with the IR findings; although cubic fit robustness was reduced for CA2/3 and CA4, overall trends showed that DR performance fluctuated non-linearly with CA, GC-DG and molecular layer subfield volume (further to ref. 17). Augmenting these results, our observation that free recall of familiar semantic categories showed little relationship with subfield volumes agrees with accounts of semantic memory as dissociable to non-hippocampal medial temporal regions, and temporal pole21,22.

A major aim of our present analyses was to explore age-related differences in subfield volumes in tandem with memory performance. That age terms in both the IR and DR models revealed non-linear decline in the volumes of subiculum, molecular layer, and hippocampal tail, differs from existing in vivo23 and ex vivo24 results, which have shown age-related decline in CA1 (refs. 23,24) and dentate gyrus/CA4 volumes23. Although we observed some evidence of age-related decline in the IR model for CA1, we note that the GC-DG trend was weaker, and neither CA1 nor GC-DG showed robust decline with age in the DR model. Differences in the segmentation procedures (here, automated; cf. ref. 23, manual) and our larger sample size likely account for the divergent findings. A notable feature of our present results was the lack of age-related decline for CA2/3 or CA4, whereas volumes of both regions fluctuated with IR and DR performance. Animal models have shown critical roles for dentate gyrus and CA3 in pattern separation and pattern completion respectively, whereby many sources of cortical information are decorrelated in support of discretised memory representations (separation), and where various traces may be combined to allow recall based on multiple representations (completion)12,13,34,49. One implication of our findings may be that pattern completion processes focal to CA2/3 and perhaps CA4 are less susceptible to age-related atrophy in health, whereas regions including subiculum are more prone to manifesting grey matter loss50. Current clinical evidence suggests CA1-CA4 volumes and related recall performance appear to be most heavily impacted in the progression of MCI and Alzheimer’s disease51,52.

Our findings hold broader implications for memory performance and subfield atrophy in healthy ageing. The complexity of the trends observed in our data suggests that prediction of those at risk of eventual memory impairment requires a multiple-subfield view of hippocampus26. In particular, the cubic trends noted in our data suggest that poorest performers in IR and DR are likely to manifest subtle tissue loss in subfields including CA2/3, CA4 and GC-DG, compared to average performers. Moreover, it is notable that subfields including CA1 showed fluctuation in volume across IR and DR scores, yet at the upper and lower tails of the IR performance range, CA1 volumes were similar (see Table 2, left). Taken together, our results suggest that detecting those most at risk of subtle memory impairment may require memory assessment at multiple time points, and detailed assessment of CA2/3, CA4 & GC-DG anatomy, against well-characterised normative data for a range of ages. A limitation of the existing literature has been the relatively small sample sizes employed (typically N < 150), which may mask the complexity of performance-anatomy relationships; here, we were able to characterise these profiles in a large sample with broad ranges of both memory performance and age.

Avenues for future research may include the potential to combine detailed assays of hippocampal subfield volumes with advanced machine learning techniques, as a means of predicting cognitive performance based on subfield volumes. Recent machine learning approaches have tracked the progression of MCI towards Alzheimer’s disease (e.g., by training support vector machines to discriminate between Alzheimer’s patients and healthy controls, and then applying the trained model to MCI patient data53). However, such approaches are restricted to classification of categorical disease outcomes. More recent approaches have involved predicting continuous data (e.g., age54,55, pain ratings56) from MRI scans using machine learning methods (e.g., elastic net or Gaussian process regression). Advancing such techniques, recent approaches have trained artificial neural networks to predict cognitive performance based on hippocampal subfield volumes and cortical thickness data57. In future studies, such models could be used to generate predictions for an individual’s expected longitudinal cognitive performance; an observed discrepancy between the model prediction and an individual’s subsequent true performance could serve as a clinical indicator for MCI risk. Moreover, the potential to construct such models using a range of additional physiological measurements as training set features (e.g., serum markers for inflammatory cytokines, blood pressure, objective gait assays) could enhance prediction accuracy, by allowing for broader characterisation of both neural and physiological phenotypes that may precede MCI onset.

An important consideration for future studies will be scan spatial resolution, which impacts the accuracy of hippocampal subfield measurements; replication of the present results in a cohort with scans of < 0.6 mm isotropic resolution would be beneficial. Indeed, previous investigations have employed higher resolution scans than the present study27,58, within semi- and fully-automated hippocampal segmentation routines. A further issue concerns the image contrast employed in the segmentation protocol26,58,59. A number of automated pipelines (including that within FreeSurfer v.6.0) enable the specification of T1w and T2w input images, as aids to hippocampal atlas construction59 or subject-level hippocampal segmentation26. The present imaging protocol did not include a T2w acquisition that was suitable for combined use with the T1w image (owing to in-plane resolution differences); hence, only the T1w image could be used as input to the FreeSurfer segmentation procedure. This holds implications for the accuracy of segmentation of some subfields. As outlined in ref. 26, use of T1w images alone in the FreeSurfer parcellation scheme can lead to under-segmentation of the molecular layer, an issue that is largely resolved when both T1w and T2w images are used. Thus, future MRI investigations with the TILDA cohort would benefit from integration of high-resolution T1w and T2w scans in order to achieve optimal estimates of tissue volumes within the hippocampal subfields.

In sum, our results reveal that specific subfields of the hippocampus manifest non-linear associations with verbal memory encoding and retrieval performance in non-demented older adults. These effects are partly dissociable from age-related atrophy, and from naming of well-consolidated semantic categories. Our results may enable us to generate predictions for those at greatest risk of incident memory impairment in future TILDA waves.