Early-life stress biases responding to negative feedback and increases amygdala volume and vulnerability to later-life stress

Early-life stress (ELS) or adversity, particularly in the form of childhood neglect and abuse, is associated with poor mental and physical health outcomes in adulthood. However, whether these relationships are mediated by the consequences of ELS itself or by other exposures that frequently co-occur with ELS is unclear. To address this question, we carried out a longitudinal study in rats to isolate the effects of ELS on regional brain volumes and behavioral phenotypes relevant to anxiety and depression. We used the repeated maternal separation (RMS) model of chronic ELS, and conducted behavioral measurements throughout adulthood, including of probabilistic reversal learning (PRL), responding on a progressive ratio task, sucrose preference, novelty preference, novelty reactivity, and putative anxiety-like behavior on the elevated plus maze. Our behavioral assessment was combined with magnetic resonance imaging (MRI) for quantitation of regional brain volumes at three time points: immediately following RMS, young adulthood without further stress, and late adulthood with further stress. We found that RMS caused long-lasting, sexually dimorphic biased responding to negative feedback on the PRL task. RMS also slowed response time on the PRL task, but without this directly impacting task performance. RMS animals were also uniquely sensitive to a second stressor, which disproportionately impaired their performance and slowed their responding on the PRL task. MRI at the time of the adult stress revealed a larger amygdala volume in RMS animals compared with controls. These behavioral and neurobiological effects persisted well into adulthood despite a lack of effects on conventional tests of ‘depression-like’ and ‘anxiety-like’ behavior, and a lack of any evidence of anhedonia. Our findings indicate that ELS has long-lasting cognitive and neurobehavioral effects that interact with stress in adulthood and may have relevance for understanding the etiology of anxiety and depression in humans.


Experimental timeline
A timeline is provided in Figure S1 that describes the sequence and timing of all procedures that animals underwent. Further detail is provided in subsequent methods sections.

Food restriction
Using an online tool 1 , body weight data was extracted from published free-feeding growth curves of male 2 and female 3 Lister-Hooded rats. A linear model was fit for each sex to estimate weight based on animal age. Both models included two terms, one of which was the natural logarithm of animal age, and one of which was an inverse exponential of animal age. R 2 for both models was > 0.99. Model coefficients were then used in an equation that allowed prediction of an individual animal's weight at any age based on a free-feeding measurement collected at any other age. Food restriction involved once-daily administration of an amount of chow titrated daily to maintain each animal above 85% of its individually predicted free-feeding weight.

Adult footshock stress
The number of shocks delivered across the first 8 days for all animals was as follows: 2, 1, 2, 0, 1, 2, 0, 1. The distribution of shock delivery across the remaining 11 days varied between animals but was balanced across group and sex, and animals always received 2 shocks on the day immediately before: each animal's MRI scan day (any of stress days 9-13), the sucrose preference test day (any of stress days [16][17], and the sacrifice day (stress day 19). These varied schedules were necessary to circumvent the limited number of MRI scans possible to conduct on a given day. Animals never received shock sessions on their sacrifice day. The timing of the footshocks within each session was random, except that shocks were never delivered during the first five minutes or last ten minutes of the session. Shock intensity was 0.5 mA and duration was 0.5 seconds. The chambers used were conventional operant chambers (Med Associates, St Albans, VT) equipped only with a grid floor, a house light, high-contrast distinctive wallpaper behind the transparent plexiglass surfaces, and a small overhead camera. A distinctive scent cue was provided by applying 3 drops of eucalyptus globulus essential oil (Neal's Yard Ltd, Cambridge, UK) to a cotton ball and placing it in the sound-attenuating box that enclosed each chamber before each session. The grid floor was connected to a scrambled shock generator (Med Associates).

Arena-based behavioral testing
Three behavioral tasks involved monitoring animal performance in an arena or maze. For each of these tasks, the appropriate plexiglass arena was placed above a 1.1 m × 1.1 m infrared-illuminated platform. Rat movements were recorded from above using an infrared camera (FLIR Systems, Wilsonville, OR, USA), and analyzed using VideoTrack software (ViewPoint Behaviour Technology, Lyon, France). Rats were tested on each of these tasks only once. Testing always occurred during the dark phase of the light cycle, which occurred between 0900 and 2100.
Rats were tested on the elevated plus maze (EPM) between PND 66-69. This task exploits rats' natural fear of open and well-lit places in order to obtain a measure of anxiety 4  Novelty-induced locomotor reactivity was assessed between PND 70-73. In this task, rats were placed in a 50 cm × 50 cm × 50 cm arena with grey walls and a white floor, and their movements were recorded and tracked. Distance moved across the whole 2-hour session was determined. The arenas were brightly lit at 500-600 lux.
The novelty preference test was performed between PND 74-76. The arena for this task contained two large rectangular chambers (20 cm × 50 cm × 50 cm) divided by a small rectangular alleyway (10 cm × 50 cm × 50 cm). The two large chambers differed in color (white or black) and texture (smooth or dotted with small round pits). Rats were placed for 5 minutes in the central alleyway and allowed to habituate, before a door was opened to one of the other two chambers for 25 minutes. Finally, doors were opened to both chambers for 15 minutes, so that the animal could freely explore all three chambers. Novelty preference was defined as the time spent in the novel chamber (the second available chamber) as a percentage of the combined time spent in the familiar chamber (the first available chamber) and the novel chamber 5 . Illumination within all three chambers was kept very low at < 2 lux.

Sucrose preference test
Sucrose solutions were freshly prepared immediately before provision to animals, using foodgrade sucrose (MP Biomedicals, Solon, USA).
At each of the two testing timepoints, animals initially underwent three habituation steps. First, animals were provided with one bottle of 1% sucrose solution in their home cage for 24 hours, in addition to the water normally available via the cage rack water delivery system. After the 24 hours had concluded, animals were placed individually into large cages with clean bedding. One bottle containing 1% sucrose solution and one containing water were then provided. After 45 minutes, the bottles were removed and animals were placed back into their home cages. The next day, each animal was placed into the same cage as before for 45 minutes, but this time the location of the sucrose bottle and water bottle were swapped.
The following day, to conduct the test itself, each animal was again placed into the same cage, but was left there for four hours. The location of each bottle was reversed compared to the previous day, and these locations were again reversed two hours into the test session. At the pre-stress timepoint, tests of preference for the 0.5%, 1%, and 2% sucrose solutions were separated by a 44hour washout period in the home cage. For a given animal, all sessions (habituation or test) in the test cage commenced at the same time of day, and the daily food ration was always provided approximately 20 hours prior to that time.
The weight of the sucrose and water bottles was weighed before and after each test. For each bottle, the weight after was subtracted from the weight before, giving a measure of absolute sucrose solution and water consumption. Sucrose preference over the four hours was defined as the amount of sucrose solution consumed as a percentage of the total amount of sucrose solution and water consumed.

Apparatus
Chambers were contained within fan-ventilated, light-and sound-attenuating boxes. One wall of each chamber consisted of a 15-inch LCD touchscreen capable of infrared touch detection (Nexio Co. Ltd, Incheon, Korea), while the opposite wall contained a pellet receptacle with a head-entry detector and accompanying light. Chambers were controlled using the K-Limbic software (Conclusive Marketing Ltd., High Wych, UK).

Progressive ratio schedules of reinforcement
The stimulus was always located in the middle of the screen by width, roughly 2 cm above the metal bar floor. Where a stimulus touch or omission resulted in pellet delivery, a 1 s tone (2.9 kHz) was generated, the magazine light was turned on, and the stimulus was removed from the screen.
Upon head entry into the magazine, the magazine light was turned off, and a 5 s inter-trial interval was initiated, after which the stimulus was returned to the screen. Where a stimulus touch did not result in reward delivery, a 20 ms tone was generated and the stimulus was removed from the screen for 0.5 s. The house light remained off at all times. Animals underwent a maximum of one training or test session per day.
Animals progressed through three training stages: (1) for one session only, pellets were delivered even if animals did not touch the stimulus, (2) until animals earned 100 pellets in a given session, a pellet was delivered every time the animal touched the stimulus, and (3) until animals earned 100 pellets in each of 2 sessions, or 60 pellets in each of 5 sessions, a pellet was delivered every time the animal touched the stimulus 5 times. During the initial training session, the stimulus was presented for up to 30 s at a time; if animals touched the stimulus once, or the 30 s period elapsed without any stimulus touch (an omission), reward was delivered as described above. The number of sessions each animal required to progress through the training stages was recorded.

Probabilistic reversal learning
In the first stage of training (touch training A), at the start of each trial, the visual stimulus was randomly presented on either the left or right side of the touchscreen. The second stage of training (touch training B) was identical to the first, except that background touches were punished (with punishment defined in the main text). The third stage of training consisted of a deterministic reversal learning (DRL) task. This task was identical to the PRL task, except that a touch on the correct stimulus resulted in reward on 100% of trials, while a touch on the incorrect stimulus resulted in punishment on 100% of trials.
For each animal, only one chamber session was conducted per day. Animals commenced touch training A approximately 3-4 weeks in advance of the day they were scheduled to commence the adult stress. Animals were progressed to touch training B once they had completed one session of touch training A in which they earned at least 100 rewards. Animals were progressed to the DRL task once they had completed two consecutive sessions of touch training B in which they earned at least 100 rewards on both occasions. Animals were progressed to the PRL task once they had completed two consecutive sessions of the DRL task in which they achieved at least 4 reversals on both occasions. Once they began touch training, animals completed one session daily of the appropriate task up to and including the first day of the adult stress, when they were tested in the morning and commenced on the stress in the afternoon. After adult stress commencement, they were tested on stress days 3 and 6, and also day 11 where this did not conflict with an MRI scan.
Almost all animals completed at least seven sessions of the PRL task before the adult stress began, but many did not complete more sessions than this, so pre-stress analysis was limited to the first seven sessions.
Prior to calculation of the latency means per session, to minimize the impact of extreme intrasession outliers, latency data were winsorized within each session using a conservative threshold of 3.5 median absolute deviations [6][7][8] .
Several measures beyond those discussed in the main text were recorded or calculated. The number of reversals was recorded, as was the average number of perseverations per reversal.
Perseverations were defined as serial touches on the newly-incorrect target following reversal, not including the first post-reversal incorrect touch (if one occurred). Thus, if an animal touched the incorrect target twice immediately following reversal, one perseveration would have been scored for that reversal. The incorrect-win stay proportion and correct-loss shift proportion were calculated. For both of the touch training tasks and for the DRL task, the number of sessions each animal took to achieve criterion and thus progress to the next task was recorded.

MRI scanning
After induction using 3-5% isoflurane, 1-2% isoflurane was delivered in 100% oxygen at 1 L/min. During the scan, rats were monitored using a pulse oximeter, respiratory tracer pad, and rectal temperature probe (SA Instruments, Stony Brook, NY, USA). Vital signs (heart rate, respiratory rate, oxygen saturation, body temperature) were maintained within strain-specific norms via adjustment of anesthesia depth and the temperature of a heat pad secured atop the animal.

Image acquisition
A three-dimensional gradient echo sequence was used to acquire structural images. The field of view of 25.60 × 20.48 × 30.72 mm 3 was constrained within a matrix of 160 × 128 × 192 voxels, each with an isotropic resolution of 160 µm 3 . Three different image types were obtained, with each providing unique tissue contrast: magnetization transfer (MT) images, proton density (PD) images, and T1-weighted images. PD images were acquired using a relaxation time (TR) of 25 ms and an echo time (TE) of 2.41 ms, with a flip angle (FA) of 6°. MT images were acquired using the same parameters, but with the additional application of 4 ms radiofrequency (RF) pulses with a Gaussian shape (2 kHz frequency offset, bandwidth 685 Hz, 10 uT magnitude). T1 images were acquired using a TR of 18 ms, a TE of 2.41 ms, and an FA of 40°. Acquisition was accelerated by a factor of 1.55. Images were acquired every 2.1 ms until 6 MT, 6 T1, and 8 PD images were acquired, and then for each image type, a single average image was produced.

Image pre-processing
All MRI processing operations were performed using tools included in: the Advanced Normalization Tools (ANTs) toolbox v2.3. First, bias field correction was performed using N4BiasFieldCorrection (ANTs) 15 on MT, T1, and PD images. Depending on the image type, between 6-18 bias-corrected images were generated for each input image, each using a different b-spline mesh, and then corrected images were averaged without normalization to produce the final bias-corrected image.
Because MT, T1, and PD images from the same animal at a given timepoint were not perfectly aligned with each other, the rigid transformations from bias-corrected T1 and PD to MT space was found for each set of those three images using antsRegistration. This transformation was then converted to MRTrix3 format using c3d_affine_tool (convert3d) and transformconvert (MRTrix3), and then applied to the header of the bias-corrected T1 or PD image using mrtransform (MRTrix3) to avoid resampling.

Generation of study templates
Tri-modal template generation was performed using antsMultivariateTemplateConstruction2 (ANTs). At each timepoint (PND 20, 62, and 285), one animal's scan was selected to act as the initial reference space, and then sequentially for all other scanned animals, the three bias-corrected images (MT, T1, PD) were concurrently rigidly registered into the reference space. The transformed images were averaged with normalization to produce three modality-and studyspecific template images (referred to as "study templates"), and then each animal's original biascorrected images were non-linearly registered to the new templates. This average-and-register step was performed a total of four times in sequence: twice with a three-stage symmetric normalization (SyN) registration, and then twice with a four-stage SyN registration involving 30-50 iterations at the final stage 16 . In all multi-modal registrations, the MT, T1, and PD images were given relative weights of 2:1:1 in calculation of the image metric, due to the superior grey-white matter contrast in the MT images.
Extensive optimization of registration parameters, including whether and where to use brain masks, was performed. Ultimately, regarding masking and brain extraction, the following methods were used: For the PND 20 timepoint only, the template brain was manually masked, and the mask then inverse warped into subject space. Each subject brain mask was manually edited for accuracy, and then the entire template generation process was repeated using brain-extracted images. For the PND 62 and 285 timepoints, template generation was performed using whole-head images without supplying antsRegistration with any masks. A final, additional registration to the template was then calculated for use in analyses. This registration used whole-head images, but the affine part was performed by supplying antsRegistration a mask of the fixed image only, while the non-linear part used no masks.

ROI volume quantitation
Region of interest (ROI) masks were derived from the Tohoku University Rat Brain Atlas, which was kindly provided by Dr Akira Sumiyoshi of the National Institutes for Quantum and Radiological Science and Technology, Japan (personal correspondence, January 2021). The creation of this atlas is described in Liang et al. 17 and Valdés-Hernández et al. 18  First, all ROI masks were extracted from the atlas. Then, after unilateral masks were merged to form bilateral masks, the Cg1 and Cg2 masks dividing the cingulate cortex were merged, and the six masks dividing the insular cortex were merged. Non-linear transformations were identified from the Tohoku template image to the PND 285 MT study template image, and from the PND 285 template to the PND 20 and PND 62 templates. These transformations were used to warp select ROI masks into each of the three study template spaces. A mask of grey and white matter together was created for each study template, based initially on the range of voxel intensity values across these two tissue classes, then with further manual editing for accuracy. These intensity masks were applied to the ROI masks to remove any aberrant voxels in CSF, meningeal tissue, or air spaces. ROI visualizations were produced using MRIcroGL 23 .
Six ROIs were selected for volume quantitation, including four subcortical ROIs (amygdala, nucleus accumbens, dorsal striatum, and hippocampal formation), and two cortical ROIs (cingulate cortex and insula). The amygdala mask encompasses all amygdaloid nuclei delineated in Paxinos and Watson 21 , including the basolateral, central, medial, intercalated, lateral, posterolateral cortical, and posteromedial cortical amygdaloid nuclei 24 . The nucleus accumbens (NAc) mask encompasses both the core and the shell. Sub-segmentation of ROIs was not attempted because almost all of their internal boundaries are not visible using any of the three contrasts acquired here.
Non-linear registration depends on intensity boundaries or gradients and cannot attend to invisible boundaries, and thus volume quantitation of sub-regions would be unreliable due to the considerable or predominant influence of the affine part of the registration on the size of the final, subject-space ROI mask.
ROI masks were inverse-warped from each study template space back to subject space. The graywhite matter mask for each template space was also inverse-warped to subject space, to provide an index of total brain volume (TBV). All warping of ROI masks from the PND 285 template space onward was performed using linear interpolation, and ultimately ROI volumes were calculated by multiplying the mean intensity value (range 1 ≥ x > 0) of non-zero voxels by the total volume (in mm 3 ) occupied by non-zero voxels.

Blinding
Experimenters were not explicitly blinded to experimental condition.
Categorical predictors were handled using sum-to-zero coding. Where mixed-effects models were used: model fitting used restricted maximum likelihood estimation 28 , model fitting was performed using the lme4 package, v1.1-29 28 , models included a random intercept for each subject with no further random effects, and degrees of freedom were approximated using the Kenward-Roger method [29][30][31] .
After model fitting, model assumptions were checked using diagnostic plots. Normality of residuals was evaluated using quantile-quantile plots while homogeneity of variance and linearity were evaluated using plots of residuals vs fitted values. For linear mixed-effects models, diagnostic plots were generated using the redres package v0.0.0.9 32 , whereas for logistic mixed-effects models, diagnostic plots were generated using the dharma package v0.4.3 33 , which also confirmed no over-or under-dispersion of the model residuals. A data point was considered to be influential, i.e. to unduly influence the model, when it had a Cook's distance of ≥ 1. When any assumption appeared to be violated, or there was at least one influential point, nonparametric methods were used: permutation testing was used for hypothesis testing, and nonparametric bootstrapping was used for descriptive statistics [34][35][36] .
Type II F-statistics were calculated from mixed-effects models using the KRmodcomp function in the pbkrtest package v0.5.1 29 , and from fixed-effects models using the base R anova function. For post-hoc comparisons, two-sample t-(parametric) or z-(permutation test) statistics were calculated using the emmeans or emtrends functions from the package emmeans v1.7.3 31 . No multiple comparisons correction was performed.
For permutation testing, the Freedman-Lane method was used 37 . Specifically, for each predictor, a larger model was fitted that included the predictor but no higher-order terms, and a reduced model was fitted that excluded the predictor. An F-statistic was calculated which compared these models. Then, to generate a reference null distribution for this F-statistic, 10,000 permutations (complete, random samples without replacement) of the animal identifier were generated using the sample function in base R, in a data table consisting of the mapping from animal identifier to any between-subjects variables (e.g. group and sex). This permuted mapping was then merged with a data table consisting of the residuals from the reduced model and the corresponding animal identifier. These residuals retained the random but not fixed effects from the reduced model. The fuller model and reduced model were then both fitted to each permuted dataset, and compared via calculation of an F-statistic, resulting in a null distribution of 10,000 F-statistics. Finally, the pvalue was calculated as the proportion of permutation-derived F-statistics that were equal to or larger than the F-statistic derived from the unpermuted data. For post-hoc comparisons, an identical procedure was followed, using a larger model that included the interaction term to be followed up, and a reduced model that excluded it, except that for a given post-hoc comparison, a z-statistic was calculated from the larger model, and compared to a null distribution of z-statistics derived from a set of 10,000 larger models fit to permuted datasets.
Estimated marginal means (EMMs) were calculated using the emmeans R package 31 , and except in the case of nonparametric bootstrapping, standard errors were also derived using the same package. To obtain EMMs and associated standard errors via nonparametric bootstrapping, animal identifiers were first resampled (using sample in base R) within each group-sex condition 10,000 times with replacement, using original within-condition sample sizes. All outcome and predictor data corresponding to each resampled animal identifier were then pulled from the original data to create a resampled dataset, which the relevant model was then fit to. From each model, any appropriate EMMs were calculated. To obtain the final estimate of a given EMM, the mean of the 10,000 sample-derived EMMs was calculated, and the standard error was calculated by determining the quantiles corresponding to the probabilities 0.1587 and 0.8413 (z-score -1 and +1), after slight expansion of these probabilities to adjust for the moderately small sample sizes 38-40 , and dividing the difference in these quantiles by two.
Box plots were produced from appropriate partial residuals, which were calculated using ggemmeans from the ggeffects package v1.1.4.
Observations were only excluded from analysis in case of documented animal illness.
Code availability: analysis scripts are available upon request.

Body weight analysis
For analysis of pre-stress body weight, age was divided into bins of seven days each, starting from PND 20, and each animal's weight measurements were averaged for each bin. Because animals commenced adult stress at different ages, weights taken during adult stress were excluded from calculation of per-animal averages, and analysis extended only to the bin starting at PND 258, because large numbers of animals progressively commenced the adulthood stress shortly after that age. Because weight was not linear over time, age was treated as a categorical variable. For poststress analysis, no binning was performed, and stress day was treated as a continuous variable because animals' weights were linear over time. Post-stress analyses were adjusted for meancentered baseline, which consisted of animals' median weight over the thirteen days before the first stress day.

Behavioral data analysis
Model structures are reported in the statistics tables in the supplementary information. For adult stress analyses, models included baseline as a predictor [41][42][43] . For PRL analyses, a baseline for each metric was created for each animal by taking the median over its final five pre-stress PRL sessions.
For SPT analysis, each animal's total pre-stress sucrose preference was used as its baseline, defined as the total sucrose solution consumed over the three pre-stress tests as a percentage of the total amount of sucrose and water consumed in that time. Before inclusion in models, baseline variables were first mean-centered.
Body weight was measured at varying intervals from PND 20 through sacrifice. For analysis of pre-stress body weight, age was divided into bins of seven days each, starting from PND 20, and each animal's weight measurements were averaged for each bin. Because animals commenced adult stress at different ages, weights taken during adult stress were excluded from calculation of per-animal averages, and analysis extended only to the bin starting at PND 258, because large numbers of animals progressively commenced the adulthood stress shortly after that age. Because weight was not linear over time, age was treated as a categorical variable. For post-stress analysis, no binning was performed, and stress day was treated as a continuous variable because animals' weights were linear over time.
For PR and PRL analyses, all models treated session number as a continuous variable, to minimize type II error by conserving degrees of freedom and because no frankly non-linear relationships between session number and any response variables were apparent on exploratory data visualization. Where a two-term interaction involving group and session was significant with no significant higher-order interactions, this was followed up with a pairwise comparison between groups at both the first and last session. Where a three-term interaction involving group, sex, and session was significant, a new model was fit for each sex.

MRI data analysis
A separate linear mixed-effects model was constructed for each ROI, which included the full interaction of group, sex, and timepoint, as well as TBV and its interaction with timepoint. As there were only three timepoints and ROI volume was never linear across them, timepoint was treated as a categorical variable. TBV was centered around the mean TBV at each timepoint. For some ROIs, model residuals were modestly non-normal; for consistency, non-parametric statistics were calculated and presented in the main text for all models, but parametric statistics are similar and are also presented in the supplementary information.  Figure S1. Experimental timeline. The range of post-natal days (PND) for the start and end of each procedure underwent by repeated maternal separation (RMS) and control animals are provided.

Figure S2. Repeated maternal separation (RMS) did not affect sessions-to-criterion for either of the training stages for the progressive ratio (PR) task.
Before being tested on the three progressive ratio (PR) schedules of reinforcement, animals were trained on fixed ratio 1 and 5 schedules, in which they had to respond on the target one and five times respectively to earn a reward. There were no differences in sessions-to-criterion between RMS (n = 28) and control (n = 28) animals. Histogram bins are one session in width.