Preterm birth (<37 gestational weeks) predisposes to several long-term health problems, including cognitive defects and higher risk of mental health problems.1,2,3 Preterm children learn motor skills later and perform worse in mathematics and reading at school than their term peers, and they score lower in cognitive tests in childhood and also in young adulthood.2,4,5,6,7 However, most of the extensive research on cognitive ability has focused on the smallest and most premature: very or extremely preterm (<32 or <28 gestational weeks) or very or extremely low birth weight (VLBW; <1500 or <1000 g birth weight) children. Cognitive defects among these groups are well established7,8 and are accompanied by deficits in executive functioning,9,10,11,12 comprising active and intentional cognitive processes, including controlling thoughts, behavior, and emotions.13 Very preterm children and adults as a group display lower intelligence; in two separate meta-analyses, they achieved an approximately 0.8 to 0.9 SD lower intelligence quotient (IQ) than those born at term.7,8

Infants born late preterm (34 + 0–36 + 6 gestational weeks) constitute the majority of all preterm infants (72% in the USA and 85% worldwide).14,15 Among toddlers and school-age children, those born late preterm have a higher risk for developmental disabilities, a greater need for special education, and worse performance in cognitive tests than those born at term.16,17,18,19 Few studies have reported any cognitive test results for adults born at the more mature end of prematurity, as recently reviewed elsewhere.20 In two Scandinavian studies of male military conscripts, higher gestational age was correlated with better intellectual performance at 18–19 years of age.21,22 In our previous Arvo Ylppö Longitudinal Study cohort, young adults born late preterm did not differ from full-term controls in Full-Scale, Verbal, and Performance IQ tests after adjustments for perinatal and postnatal factors and socioeconomic position.23 Only those born both late preterm and small for gestational age (SGA) performed more poorly than full-term controls in the Full-Scale and Performance IQ tests.23 We also showed in the Helsinki Birth Cohort Study that those born late preterm performed worse in word list recognition in the Consortium to Establish a Registry for Alzheimer’s Disease Neuropsychological Battery (CERAD-NB) in late adulthood (mean age 68.1 years) and that those among them who had attained only basic or upper secondary education also performed worse in several other CERAD-NB subtests.24

Our previous work with the Helsinki Study of Very low Birth Weight Adults (HeSVA) showed that adults born VLBW had slower reaction times and impaired learning relative to full-term controls in the computer-based Cogstate test.5 Our aims in this study were to evaluate reaction times, learning, and executive functioning in adults born over the entire range of prematurity with an updated and broader version of the Cogstate test and to examine whether SGA birth weight (when combined with prematurity) is an additional risk factor for worse performance.

Methods

This study is part of the ESTER Preterm Birth Study in which we invited 1980 young adults from Northern Finland via the Northern Finland Birth Cohort 1986 (NFBC, born in 1985–1986; 49.8%) and the Finnish Medical Birth Register (FMBR, born in 1987–1989; 50.2%) to participate.25 In 2009–2011, 753 adults with verified durations of gestation participated in a clinical examination (Fig. 1).25 All participants provided written informed consent, and the study was approved by the Coordinating Ethics Committee of the Helsinki and Uusimaa Hospital District. We excluded 15 participants because of severe mental disability, cerebral palsy, or other severe physical disability. Another 16 participants were excluded from the analyses because they did not finish all tasks or had outliers in several tasks, indicating poor compliance. The final analysis therefore included 722 participants (48% men): 133 (47% men) early preterm (<34 gestational weeks) participants, 241 (49% men) late preterm (34 + 0–36 + 6 gestational weeks) participants, and 348 (48% men) full-term controls (≥37 gestational weeks) (Table 1). In one of the Cogstate tasks (Groton Maze Learning Recall Test), data were available only for a subset of 72 early preterm participants, 133 late preterm participants, and 186 full-term controls.

Fig. 1: Flow chart of the study participants.

Early preterm was defined as <34 gestational weeks, late preterm as 34 + 0–36 + 6 gestational weeks, and full term as ≥37 gestational weeks.

Table 1 Characteristics of the participants.

Perinatal data

For the NFBC, perinatal data were previously collected from medical records deposited in the cohort database.26 We obtained corresponding data for the FMBR participants from the medical records of hospitals and maternity clinics.25 Duration of gestation was confirmed from these data (ultrasonography before 20 gestational weeks had been performed on 62.7% and 53.1% of the preterm infants and full-term controls, respectively).25 Diagnosis of maternal hypertension (chronic or gestational), preeclampsia, and gestational diabetes was confirmed based on standard criteria.27,28 Participants whose SD score for birth weight was <−2.0, according to the Finnish infant growth standards from the year 1989,29 were classified as SGA, and all others were classified as appropriate for gestational age (AGA).
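The group definitions used in this study can be written out as a short sketch. The function names below are hypothetical and are not part of the study's analysis pipeline; only the cut-offs (<34 weeks, 34 + 0 to 36 + 6 weeks, ≥37 weeks, and a birth weight SD score below −2.0) come from the text.

```python
def gestational_group(weeks: int, days: int = 0) -> str:
    """Classify gestational age (completed weeks + days) using the study's cut-offs."""
    if not 0 <= days <= 6:
        raise ValueError("days must be between 0 and 6")
    total_days = weeks * 7 + days
    if total_days < 34 * 7:       # before 34 + 0 weeks
        return "early preterm"
    if total_days < 37 * 7:       # 34 + 0 to 36 + 6 weeks
        return "late preterm"
    return "full term"            # 37 + 0 weeks or later


def size_for_gestational_age(birth_weight_sd: float) -> str:
    """SGA if the birth weight SD score is below -2.0, otherwise AGA."""
    return "SGA" if birth_weight_sd < -2.0 else "AGA"


print(gestational_group(36, 6), size_for_gestational_age(-2.3))
```

Note that a score of exactly −2.0 is classified as AGA here, because the text defines SGA with a strict inequality.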

Clinical examination

Clinical examination was performed at a mean age of 23.3 (SD 1.2) years. Height and weight were measured, and body mass index (BMI) was calculated.25 Participants filled in a questionnaire on medical history, medication, socioeconomic position, and lifestyle. Childhood socioeconomic position was defined by the highest parental educational attainment at the time of clinical examination.25

Cognitive assessment: Cogstate test

Cogstate Research™ is a computer-based cognitive test battery previously used in several different settings to detect mild cognitive impairment.30,31 To evaluate reaction times and cognitive functioning in general, we chose the Continuous Paired Associate Learning Test, Detection Test, Groton Maze Learning (GML) Test, Identification Test, One Card Learning Test, One Back Test, and Social Emotional Cognition Test for the battery (the tasks are explained below). Before testing, a research nurse (blinded to the birth status) issued oral instructions, and all of the participants practiced at least once, as recommended; more than one training session rarely improves the results.32 During testing, the participants were alone at the computer and wore headphones so that they could hear the error beep for incorrect answers. Written instructions were provided on the screen before each task, and familiarization trials were performed with visual and auditory feedback for correct/incorrect answers before the real test. The whole test lasted 9.18–15.07 min. The test results were transferred as encrypted data files to the Cogstate server, and the data were extracted automatically.

Tasks in the Cogstate battery

Continuous Paired Associate Learning Test evaluates visual associative memory based on the number of errors. Different patterns are shown on the screen, and the participant is required to find an identical pair for the pattern shown. In the second round, the patterns are hidden and the participant needs to remember where they were located on the screen.

Detection Test evaluates psychomotor function based on a simple measurement of mean reaction time. The participant must press any key immediately when a playing card turns face up.

GML Test evaluates executive function based on the number of errors and as a secondary variable spatial memory efficiency based on the speed of correct movement.33 The participant searches for the correct route from one corner of a 10 × 10 grid of tiles to the opposite side. Only downwards or sideways movements are allowed, and only one step can be taken in each move. The same route is repeated five times. To test visual memory, executive function, and learning, the GML Recall Test (repetition of the GML Test) was performed after all the other tests and the number of errors was measured.

Identification Test evaluates visual attention based on the mean reaction time. Depending on whether they see a red or black playing card, the participant needs to press the red or black key as quickly as possible.

One Back Test evaluates working memory according to speed of correct movement. In this test, two cards are displayed and the participant presses “yes” or “no” depending on whether the card they have seen is the same as the previous one.

One Card Learning Test evaluates visual learning according to the accuracy of performance (the proportion of correct responses among all responses). The participant must press the “yes” or “no” key depending on whether they think the card shown was displayed before or not.

Social Emotional Cognition Test evaluates emotional cognition based on the accuracy of performance. Photographs of four faces are shown and the participant needs to recognize which of the faces is different from the other three.

Statistical analysis

First, we visually inspected the Cogstate data for outliers and inconsistencies. Multiple linear regression models were used to detect differences between those born early preterm, late preterm, and full term; p values < 0.05 were considered significant. The Cogstate program delivers the time taken to make a correct movement (Detection Test, Identification Test, and One Back Test) in milliseconds, log10-transformed. Owing to the skewness of the distributions, we also log-transformed the number of errors in the GML, GML Recall, and Continuous Paired Associate Learning Tests after adding one error for each participant (because some participants had zero errors). Accuracy of performance in the One Card Learning and Social Emotional Cognition Tests was delivered and analyzed as arcsine square root-transformed proportions, as per the Cogstate standard. All logarithmic and arcsine square root values were back-transformed after analysis, so the results for these variables are shown as percentage differences between the groups.
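The transformations described above can be illustrated with a minimal sketch. It assumes base-10 logarithms for the error counts (the text specifies base 10 only for the timing variables) and shows the back-transformation only for log-scale coefficients, since the exact procedure used for the arcsine square root values is not described. All function names are hypothetical.

```python
import math


def log10_errors(n_errors: int) -> float:
    """Log-transform an error count after adding one, so that zero errors is defined."""
    return math.log10(n_errors + 1)


def arcsine_sqrt(proportion: float) -> float:
    """Arcsine square root transform of an accuracy proportion in [0, 1]."""
    return math.asin(math.sqrt(proportion))


def percent_difference(beta_log10: float) -> float:
    """Back-transform a group difference on the log10 scale to a percentage difference."""
    return (10 ** beta_log10 - 1) * 100


# A log10-scale coefficient equal to log10(1.275) corresponds to roughly 27.5% more errors.
beta = math.log10(1.275)
print(round(percent_difference(beta), 1))
```

On the log scale a regression coefficient is a difference in log values, which back-transforms to a ratio. This is why the results for these variables are reported as percentage differences instead of absolute ones.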

Regression model 1 was adjusted for the source cohort (NFBC or FMBR), sex, and age. Model 2 was further adjusted for birth weight SD score, maternal smoking during pregnancy, maternal BMI before pregnancy, maternal hypertension or preeclampsia, maternal diabetes, mother’s age, being firstborn, and parental education (at the time of clinical examination). These variables for adjustment were selected based on theoretical grounds.5,34,35 We also analyzed whether adjustment for birth head circumference SD score, adult head circumference, or physical activity in adulthood would have an effect on the results; none of these affected the models, and thus they were omitted from the final models.

The analyses were also performed in the following pre-defined subgroups: early preterm and SGA (n = 22), late preterm and SGA (n = 31), early preterm and AGA (n = 111), late preterm and AGA (n = 210), and those born VLBW (n = 42), compared with those born full term and AGA (n = 341). The design of our study was based on performing the analyses in early preterm and late preterm subgroups, but to investigate the linear effect of gestational age, we also carried out the analyses with gestational age as a continuous variable (the analyses included preterm participants). IBM SPSS Statistics 24 was used to perform the analyses.

Results

The early and late preterm groups differed from the full-term controls in terms of their prenatal and postnatal factors, including birth size, being born as a twin, and maternal preeclampsia (Table 1). SGA participants made up 17% (n = 22) of the early preterm group, 13% (n = 31) of the late preterm group, and 2% (n = 7) of the full-term control group. The late preterm participants were more likely than the full-term controls to have been born to a mother with gestational diabetes: 11 (5%) of those born late preterm versus 6 (2%) of the full-term controls (p = 0.04). A detailed non-participant analysis has been presented previously.25 Of the 16 unimpaired adults who were excluded from the analyses because they did not complete the Cogstate test or had several outliers, 6 were born early preterm (1 of whom was also born SGA), 6 were born late preterm, and 4 were born full term. Because this excluded group was small, a full non-participant analysis of it is not shown.
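The participant counts reported in the Methods and in the paragraph above are mutually consistent, which can be verified with a short arithmetic check. The sketch below uses only numbers stated in the text; the variable and function names are hypothetical.

```python
# Counts as reported in the text.
examined = 753
excluded_disability = 15
excluded_incomplete = 16

groups = {
    "early preterm": {"SGA": 22, "AGA": 111, "total": 133},
    "late preterm": {"SGA": 31, "AGA": 210, "total": 241},
    "full term": {"SGA": 7, "AGA": 341, "total": 348},
}

analysed = examined - excluded_disability - excluded_incomplete


def sga_percent(group: str) -> int:
    """Share of SGA participants within a group, rounded to a whole percent."""
    g = groups[group]
    return round(100 * g["SGA"] / g["total"])


# SGA and AGA counts should add up to each group total,
# and the group totals should add up to the analysed sample.
consistent = (
    all(g["SGA"] + g["AGA"] == g["total"] for g in groups.values())
    and sum(g["total"] for g in groups.values()) == analysed
)

print(analysed, consistent, [sga_percent(name) for name in groups])
```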

Cogstate test results

The early and late preterm participants performed similarly to those born full term in most of the Cogstate tasks (Table 2): Continuous Paired Associate Learning Test, Detection Test (which measures psychomotor function), Identification Test (attention), One Card Learning Test (visual learning), One Back Test (working memory), and Social Emotional Cognition Test. The difference estimates between the early and late preterm groups and full-term controls were between 0% and 13% (with wide confidence intervals (CIs) that included zero) when adjusted for sex, age, and source cohort (Model 1, Table 2). The results remained similar after adjusting the models for several perinatal and postnatal factors and parental socioeconomic position (Model 2).

Table 2 Cogstate test outcomes in preterm groups compared to those born at full term.

The only test that showed any significant differences between the groups was the GML Test (Table 2). Those born early preterm were slower than those born at term; this indicates that the early preterm group had lower spatial memory efficiency.33 The difference remained both in Model 1 (adjusted for sex, age, and cohort) and after further adjustments for prenatal and postnatal factors and parental socioeconomic position in Model 2 (0.6 fewer correct moves/10 s, 95% CI: −1.0; −0.2, Model 2). However, the early preterm group had a similar number of errors as the full-term controls in the same test (measuring executive function; Table 2). The late preterm group was also slower than the full-term control group in the GML Test, with 0.3 fewer correct moves/10 s (95% CI: −0.6; −0.0) in Model 1. However, the difference became non-significant after further adjustments in Model 2. In the GML Recall Test (measuring visual memory and executive function), the early preterm group had 27.5% (95% CI: 5.6; 53.8) and the late preterm group 17.7% (95% CI: 0.9; 37.2) more errors than the full-term controls in Model 1, but the results were attenuated after further adjustments in Model 2.

In the analyses of predefined subgroups of early preterm and SGA, late preterm and SGA, early preterm and AGA, late preterm and AGA, and VLBW compared with those born full term and AGA, again few differences appeared (Supplementary Table S1 and Table 3). All groups performed similarly to the full-term and AGA controls in the Continuous Paired Associate Learning Test, Detection Test, Identification Test, One Card Learning Test, One Back Test, and Social Emotional Cognition Test (Supplementary Table S1). However, in the GML Test, although there were no differences in the number of errors (executive function), the speed of performance (spatial memory efficiency) was slower in the early preterm and AGA (0.7 fewer correct moves/10 s, 95% CI: −1.1; −0.2, Model 2), late preterm and SGA (1.3 fewer correct moves/10 s, 95% CI: −2.1; −0.4, Model 2), and VLBW groups (1.1 fewer correct moves/10 s, 95% CI: −1.9; −0.3, Model 2) than in the full-term and AGA controls (Table 3). The early preterm and AGA and VLBW groups also had more errors than the full-term and AGA group in the GML Recall Test (visual memory and executive function): 25.1% (95% CI: 1.1; 54.9) and 62.5% (95% CI: 10.0; 139.9, Model 2) more errors, respectively.

Table 3 Groton Maze Learning (GML) Test outcomes in subgroups according to gestational age and birth weight compared to full-term and appropriate for gestational age (AGA) controls.

In the analyses using gestational age as a continuous variable (including participants born preterm), the results were nearly similar to those obtained when the participants were divided into gestational age groups, and no significant associations were detected for the majority of the tasks (Supplementary Table S2).

Discussion

We showed that unimpaired adults born early and late preterm performed, on average, as well as full-term controls in almost all of the tested core cognitive abilities: paired associate learning, psychomotor function, attention, visual learning, working memory, and emotional cognition. These results did not support our hypothesis of milder cognitive weaknesses among adults born at the mature end of prematurity. The only exception was the speed of performance in the GML Test, which measures spatial memory efficiency;33 the early preterm participants had 0.6 fewer correct moves/10 s (95% CI: −1.0; −0.2) than full-term controls, translating to 0.3 SD effect size, and late preterm and SGA participants had 1.3 fewer correct moves/10 s (95% CI: −2.1; −0.4) than full-term and AGA controls, translating to 0.7 SD effect size.

Previous findings by us and others have extensively shown that even adults without major disabilities who were born very preterm or VLBW perform worse on traditional neurocognitive tests than those born full term, including lower IQ and lower scores in attention scales.2,3,7 One of the characteristics of the profile of VLBW/very preterm adults is difficulty in executive functioning,9,10,11,12 i.e., the functions necessary for the cognitive control of behavior and selecting and successfully monitoring behaviors that facilitate the attainment of chosen goals.13 Here we have hypothesized that these problems would be manifested, to some extent, also in those adults born closer to term. The number of errors in the GML Test represents mostly executive function (especially error monitoring), and speed in the same test represents spatial memory efficiency.33 In the present study, those born early and late preterm had a similar number of errors but were slower than full-term controls in the GML Test. For those born late preterm, the statistical significance was marginal and arose from the slower speed among those born late preterm and SGA. Those born with VLBW were also slower than full-term and AGA controls in the GML Test. The analyses further indicate that VLBW and early preterm and AGA participants made more errors in the GML Recall Test (repeat of the GML Test as a measure of visual memory and executive function). Overall, these test results are consistent with the spatial memory problems detected in adults born preterm, but among those born late preterm the weaknesses were only observed among those born SGA.

Spatial memory is used for remembering one’s environment and orientation of objects relative to each other, helping us to interact with them.36 Problem solving often requires spatial memory functions, and children with poorer spatial memory may face difficulties in problem-solving tasks at school.37 This could partly explain the challenges that preterm individuals have in mathematics.37,38 Although specific mechanisms behind the spatial memory problems of preterm children are poorly known,36 a Norwegian study found smaller hippocampi in VLBW young adults, which was related to poorer visual memory indices on the Wechsler Memory Scale.39 Children born late preterm appear to have lower intellectual ability and higher risk for school difficulties than those born at term.17,18,19 Intellectual ability in adulthood has also been assessed by register studies that used intellectual ability tests for military conscripts in Sweden and Norway during an era when military service was obligatory for all men.21,22 These studies show that lower intellectual ability in men born preterm is sustained in adulthood.21,22 However, in the Swedish study, those born at 35–36 and 33–34 gestational weeks showed only 0.06 SD and 0.09 SD lower intellectual ability, respectively, than those born at term in the adjusted models.22 In the Norwegian study, the corresponding differences were merely 0.05 SD for those born late preterm and 0.13 SD for those born at 30–33 gestational weeks.21 Such small differences would not have been detected in the present study in which the width of CIs corresponded to around 0.3–0.5 SD.

Our previous study with the Arvo Ylppö Longitudinal Study included late preterm subjects who needed medical care in the neonatal ward within 10 days of birth.23 Therefore, the participants were, in infancy, likely to be sicker than the late preterm group in the present study, which as a geographical cohort also included infants healthy enough to stay with their mother.23 In that study, the late preterm participants showed slight weaknesses in the Full-Scale, Verbal and Performance IQ tests; they had 0.25 SD (3.7 IQ points) lower full-scale IQ than those born at term.23 This difference was explained partly by socioeconomic and perinatal factors, but findings remained for those born both late preterm and SGA, among whom the full-scale IQ was 1.1 SD (11.8 IQ points) lower than in full-term controls.23 In the present study, the late preterm and full-term participants performed mostly similarly in the Cogstate test (with CIs of −0.1 to 0.4 SD), but the difference in speed in the GML Test of late preterm participants and full-term controls remained after adjustments for confounding factors in those born late preterm and SGA, with an effect size of 0.7 SD. These findings suggest SGA as an additional risk factor, in particular among those born late preterm with lower immaturity-associated risk.

Computer-based simple tasks measure partly different abilities (e.g., reaction time and accuracy as important measurements) than traditional neurocognitive tests;40 this might explain why we did not detect differences in most of the tests. Moreover, our previous findings in the HeSVA cohort showed that those born VLBW and preterm had slower reaction times in all of the five tasks performed (including three tasks similar to those in the present study: the Detection Test, the Identification Test, and the One Back Test called Simple reaction time, Choice reaction time, and One back working memory, respectively, in the previous publication).5 However, these findings were not replicated in the present study even in the VLBW group, although the speed of performance on the GML Test may correlate with reaction time and learning.33 The effect sizes for reaction times in the Detection Test, Identification Test, and One Back Test were 0.2–0.4 in the previous study.5 The point estimates for the same tests were 0.1–0.3 SD (with CIs including zero, when adjusted for sex, age, and cohort) in the present study for the comparison between VLBW and full-term and AGA adults. In the present study, the number of those born VLBW was small (42 versus 147), and they also had a slightly higher mean gestational age (30.4 versus 29.2 weeks) than the HeSVA cohort.5 Overall, even the early preterm group in this cohort had a higher mean gestational age (31.9 weeks, SD = 1.9) than pure very preterm cohorts (i.e., cohorts only including infants born VLBW or <32 gestational weeks), which may explain the mildness of the differences between those born early preterm and at term.

One of the main strengths of this study is the large number of participants (n = 722) born across a wide range of gestational ages. The cohort was selected from a geographical region, and thus it also includes healthy late preterm adults not needing treatment at the neonatal care unit. We have collected detailed neonatal data and information about pregnancy disorders, maternal smoking, and childhood socioeconomic position, allowing adjustments with multiple important covariates. The cognitive test used, the computer-based Cogstate, has both strengths and limitations. It is a standardized test battery that is independent of research personnel, but its retest reliability might be low.41 Further, the validity of the different tasks relative to traditional cognitive tests has been divergent, as interclass correlations ranging from weak to very high have been demonstrated.42,43 Combining the Cogstate test with traditional cognitive testing would have increased the reliability of our results. Limitations of our cohort include having no data on cognitive testing at earlier time points. Moreover, although we were able to adjust for a number of important covariates, we may not have accounted for all confounding factors as a result of the long follow-up period. Finally, we cannot completely rule out participation bias, although no relevant differences in the characteristics between participants and non-participants were detected, as previously presented in detail.25

Preterm birth has been associated with several long-term problems, including defects in cognitive functions at different ages.4,6,8 However, the majority of the findings have been presented in studies that include only those born most immature: very preterm or VLBW. We showed that, in a population-based cohort of unimpaired young adults born over the whole range of gestational ages, the cognitive performance of those born early and late preterm attained the level of those born full term. Although this study cannot exclude small differences, it appears that cognitive weaknesses established in childhood among those born at the mature end of prematurity may not persist to adulthood in those without comorbidities. However, SGA may be an additional risk factor for cognitive problems in adulthood among those born late preterm.