We welcome the ongoing interest in our work¹ and the opportunity to address² concerns with respect to: (i) controlling for sex/ethnicity; (ii) DNA extraction and (iii) telomere length (TL) measurement methodology.

  1. (i)

    It is well known that sex and ethnicity can affect TL, and both were considered in the analyses and discussed with the reviewers. They were not included in the manuscript for reasons of brevity, but we are happy to present them now: including sex in the model does not change the association between prenatal maternal stress and newborn TL (sex: β = −.15, p = .009; maternal perceived stress: β = −.13, p = .02). Analyses restricted to individuals with Caucasian parents yield results comparable to those obtained for the total sample, both for newborns (β = −.18, p = .003, n = 273; total sample β = −.14, p = .015) and for mothers (β = −.10, p = .096, n = 274; total sample β = −.11, p = .055). Thus, the reported associations are not confounded by sex or ethnicity.

  2. (ii)

    As explicitly stated in the manuscript, Qiagen kits rather than the standard Chemagen method were used for 14 samples because of a low volume of cord blood. A t-test for independent samples indicated that mean TL differed between the two extraction methods (t(13.38) = 2.45, p = .029). This difference was statistically controlled for in the analyses.

  3. (iii)

    Contrary to the statement of Esteves et al., quality thresholds for the measurements were presented: one concerning the cycle difference between technical duplicates for T and S, and one concerning the linear range of the assay. Our coefficient of variation (CV) was at the lower end of the range reported by other laboratories, and is comparable to that reported in our previous large-scale studies.³ As outlined in our paper, it is not possible to report an inter-run CV for all samples, as only 128 samples were measured on two occasions. Absolute quantification is not performed against a standard curve. Instead, the Rotor-Gene Q comparative quantification software is used to quantify T and S levels relative to K562. Amplification efficiency is calculated and used in the quantification rather than assuming 100% efficiency. Therefore, any minor differences in efficiency between runs are already corrected for within our measurement values. While we minimise the effect of technical variation between batches as much as possible (single reagent batches, consistent equipment and assay QC), our analysis revealed a small batch effect. Repeating the analyses without accounting for batch effects did not change the association between prenatal maternal stress and children’s TL (β = −.13, p = .027 compared with β = −.14, p = .015 as reported) or the association between maternal lifetime psychiatric disorder and maternal TL (β = −.12, p = .028 compared with β = −.11, p = .055 as reported), indicating that batch effects are of marginal importance.
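
The efficiency correction described in point (iii) can be illustrated with a minimal sketch. It is not the algorithm of the Rotor-Gene Q software; it assumes the generic efficiency-corrected form of relative quantification, in which the amount of a target relative to the calibrator is E^(C_calibrator − C_sample), where E is the per-cycle amplification factor (2.0 at 100% efficiency) and C is the cycle value of the reaction. The function names and all numerical values are hypothetical and serve only to show how a measured efficiency, rather than an assumed 100%, enters the T/S ratio.

```python
def relative_quantity(amplification, cycle_calibrator, cycle_sample):
    """Amount of a target in the sample relative to the calibrator.

    amplification is the per-cycle amplification factor
    (2.0 would correspond to 100% efficiency).
    """
    return amplification ** (cycle_calibrator - cycle_sample)


def ts_ratio(amp_t, cal_t, sample_t, amp_s, cal_s, sample_s):
    """Relative T/S ratio: telomere (T) signal over single-copy gene (S)
    signal, each expressed relative to the same calibrator DNA."""
    t_rel = relative_quantity(amp_t, cal_t, sample_t)
    s_rel = relative_quantity(amp_s, cal_s, sample_s)
    return t_rel / s_rel


# Hypothetical cycle values and efficiencies, not data from the study.
corrected = ts_ratio(1.90, 15.0, 16.0, 1.95, 20.0, 20.5)
assumed_100 = ts_ratio(2.0, 15.0, 16.0, 2.0, 20.0, 20.5)
print(round(corrected, 3), round(assumed_100, 3))
```

In this sketch the two results differ only because the measured amplification factors depart from 2.0, which is the run-to-run difference that an efficiency-corrected calculation absorbs.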

We remain confident that our methods and analyses are solid, and naturally share the call of Esteves et al. for larger samples. While our cohort is still the largest to date to show an association between maternal stress during pregnancy and TL in the offspring, larger studies and meta-analyses are needed for a more comprehensive exploration of the multiple factors that impact TL.
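
As a note on point (ii), the non-integer degrees of freedom in t(13.38) are consistent with a t-test that does not assume equal variances (Welch's test), which is a natural choice when 14 samples are compared with a much larger group. The sketch below shows how such a statistic and its Welch-Satterthwaite degrees of freedom arise. The helper name `welch_t` and the values are illustrative assumptions; the numbers are synthetic and are not data from the study.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    # Squared standard errors of the two group means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Synthetic T/S values for illustration only.
small_group = [1.10, 1.32, 0.95, 1.41, 1.20, 1.05]
large_group = [1.00, 1.02, 0.98, 1.05, 0.97, 1.01, 0.99, 1.03, 1.00, 0.96]
t, df = welch_t(small_group, large_group)
print(f"t({df:.2f}) = {t:.2f}")
```

When the smaller group is also the more variable one, the approximate degrees of freedom fall close to the size of the smaller group minus one, as they do in the value reported above.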