Introduction

Head circumference (HC) at birth is a proxy for fetal brain development1 and may help predict adverse perinatal and long-term outcomes. Small HC is associated with increased risk of infant mortality2 and delayed neurodevelopment in later childhood.3 Meanwhile, a large head size in fetuses or newborns is associated with adverse labor outcomes, such as operative delivery and unplanned cesarean.4,5,6,7,8 Measuring HC in infants is a simple, quick, and economical method for estimating head size and overall brain volume.9,10,11 Using an age- and sex-adjusted reference curve, clinicians can identify newborns with abnormal HC who may have microcephaly (HC more than 2 or 3 standard deviations (SDs) below the mean) and are at high risk of neurodevelopmental disorders.11,12

There are several published references for HC at birth from the United States, Canada, Finland, India, Indonesia, and Turkey.9,13,14,15,16,17 These curves differ significantly at the same centile for the same gestational age and sex, which suggests that racial or ethnic differences, as well as differences in exposure to prenatal risk factors across populations, may affect fetal development in utero. A reference describes how fetuses actually grew in a particular population, whereas a “standard” is prescriptive: it describes optimal growth in an ideal low-risk population and may therefore classify some normal pregnancies as abnormal.18,19,20 In 2014, the INTERGROWTH-21st project provided an international standard for assessing newborn HC, based on healthy, well-nourished women.21 However, racial or ethnic disparities in fetal growth challenge the use of international standards in specific populations, and little evidence exists on whether this global standard for newborn head size is suitable for the Chinese population.

The existing Chinese national reference for HC at birth was generated from newborns born in 15 cities between 1986 and 1987,22 and it does not distinguish between boys and girls. This reference may now be outdated, because rapid urbanization and economic growth in China over the past decades have brought enormous improvements in prenatal health care, including the nutritional status of pregnant women. For example, large-scale physical growth surveys in China showed a significant increase of 0.4 cm in the average HC of newborns between 1985 and 2005.23 A few studies in China have established population-specific HC references at birth,24,25 but these did not necessarily consider preterm birth or produce a detailed HC reference chart.

The objectives of the present study were to evaluate the validity of applying the INTERGROWTH-21st standard to a Southern Chinese population for identifying abnormal HC, and to compare it with a newly generated local HC reference by examining the associations between abnormal HC identified by each tool and adverse neonatal outcomes.

Methods

Study population

HC surveillance for newborns was conducted in four perinatal health care centers: the two campuses of Guangzhou Women and Children’s Medical Center (GWCMC), Guangzhou Huadu Women and Children Health Care Hospital, and Guangzhou Liwan Women and Children Health Care Hospital, located in the central, northern, and western areas of Guangzhou, China, respectively. We included all singletons born in these centers between 24+0 and 41+6 weeks of gestation from February 10, 2017 to May 31, 2018. To ensure the low-risk status of newborns and their mothers, we excluded mothers with any of the following: age ≥35 years or <18 years, height <150 cm, body mass index (BMI) ≥28 kg/m2 or <16 kg/m2, history of cardiovascular disease or drug abuse, two previous pregnancies ending in miscarriage, or a history of stillbirth, macrosomia, preterm birth, or congenital malformations in previous pregnancies. We also excluded newborns whose mothers used assisted reproductive technology or had any of the following conditions in the index pregnancy: gestational hypertension, gestational diabetes mellitus, intrahepatic cholestasis of pregnancy, heart disease, liver disease, kidney disease, type 2 diabetes mellitus, chronic hypertension, thyroid disorders, systemic lupus erythematosus, tuberculosis, cancer, psychiatric disorders, idiopathic thrombocytopenic purpura, hematologic disorders, severe anemia, and sexually transmitted diseases including gonorrhea, syphilis, genital warts or herpes, genital Chlamydia trachomatis or mycoplasma infection, and acquired immunodeficiency syndrome. Stillbirths and infants with congenital abnormalities were also excluded. In addition, because at least 50 observations per gestational week were required to construct new HC curves,21 newborns at early gestational ages with too few observations were excluded.

Collected HC data were entered into the Guangzhou Perinatal Health Care and Delivery Surveillance System (GPHCDSS), which covers >99% of deliveries in Guangzhou, as previously described.26,27 The GPHCDSS collects information on maternal demographic characteristics, medical history, and complications during pregnancy, as well as neonatal information including gestational age at birth and sex. Gestational age was routinely determined by ultrasound examination in the first or early second trimester, a practice rigorously followed in the four study centers; in rare cases where no ultrasound examination was available, it was calculated from the date of the last menstrual period. Data on 47,369 singletons were collected from the four centers during the study period. After excluding 22,168 newborns who did not meet the low-risk criteria, 103 stillbirths, 456 newborns with congenital abnormalities, 227 with missing or implausible HC measurements (HC values with residuals >+3.89 SD or <−3.89 SD in a robust regression fitted by iteratively reweighted least squares), and 158 born before 33 weeks of gestation (too few observations in these gestational weeks), 24,257 singleton live births remained for the final analysis (Supplemental Fig. S1 (online)). The study was approved by the Guangzhou Women and Children’s Medical Center Ethics Approval Board (No. 2016111865-2).
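As a consistency check on the participant flow just described, the exclusion counts can be subtracted in sequence. The Python sketch below only re-does the arithmetic on the numbers stated in the text; the variable and function names are illustrative.

```python
from typing import Mapping

# Counts copied from the participant-flow description above.
TOTAL_SINGLETONS = 47_369
EXCLUSIONS = {
    "not low-risk": 22_168,
    "stillbirth": 103,
    "congenital abnormality": 456,
    "missing or implausible HC": 227,
    "born before 33 weeks": 158,
}


def remaining_after_exclusions(total: int, exclusions: Mapping[str, int]) -> int:
    """Subtract each exclusion group in turn and report what is left."""
    remaining = total
    for reason, n in exclusions.items():
        remaining -= n
        print(f"after excluding {reason:<28} {remaining:>7,}")
    return remaining


final_n = remaining_after_exclusions(TOTAL_SINGLETONS, EXCLUSIONS)
print("final analytic sample:", final_n)
```

The final figure matches the 24,257 singleton live births reported above.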

HC measurement

HC of newborns was measured by midwives immediately after delivery using the “seca 212” head tape measure, which is made of non-stretch Teflon. All midwives in the four centers were trained in standardized measurement procedures before the study began. For each newborn, the midwife measured HC twice by wrapping the band around the broadest part of the forehead above the eyebrows, above the ears, and around the most prominent part of the back of the head. Each measurement was read to the nearest millimeter, and the average of the two measurements was recorded. The two measurements were allowed to differ by no more than 4 mm; otherwise, the charge midwife repeated the measurement using the same procedure.
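The duplicate-measurement rule above can be written as a small decision function. This is a minimal sketch: the function name is hypothetical, and the text does not say whether the average was rounded before recording, so the sketch assumes it is stored unrounded.

```python
from typing import Optional


def recorded_hc_cm(first_mm: int, second_mm: int,
                   max_diff_mm: int = 4) -> Optional[float]:
    """Return the recorded HC in cm, or None if the pair must be retaken.

    Both readings are taken to the nearest millimeter. If they differ by more
    than max_diff_mm, the whole measurement is repeated (None signals this).
    """
    if abs(first_mm - second_mm) > max_diff_mm:
        return None
    # Assumption: the average is kept unrounded (e.g. 339.5 mm -> 33.95 cm).
    return (first_mm + second_mm) / 2 / 10


print(recorded_hc_cm(338, 341))  # within 4 mm: averaged
print(recorded_hc_cm(338, 343))  # 5 mm apart: None, so the midwife retakes
```

A difference of exactly 4 mm is accepted, matching "no more than 4 mm".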

Composite adverse neonatal outcome

The composite adverse neonatal outcome was defined as one or more of the following: low Apgar score (<7) at 5 min after birth, neonatal asphyxia, or admission to the neonatal intensive care unit (NICU). Neonatal outcome data were extracted from the electronic medical records of GWCMC; medical records from the other two centers were not accessible.
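The composite outcome is a logical OR over its three components. In the sketch below, the record type and its field names are hypothetical stand-ins for whatever the electronic medical records contain; handling of missing components is not described in the text and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class NeonatalRecord:
    """Hypothetical per-newborn record with the three components."""
    apgar_5min: int        # Apgar score at 5 min after birth
    asphyxia: bool         # neonatal asphyxia recorded
    nicu_admission: bool   # admitted to the NICU


def composite_adverse_outcome(rec: NeonatalRecord) -> bool:
    """True if at least one component of the composite is present."""
    return rec.apgar_5min < 7 or rec.asphyxia or rec.nicu_admission


print(composite_adverse_outcome(NeonatalRecord(9, False, False)))  # False
print(composite_adverse_outcome(NeonatalRecord(6, False, False)))  # True
```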

Statistical analysis

Before constructing the HC reference, we used robust regression fitted by iteratively reweighted least squares to identify implausible HC values, which could otherwise bias the gestational age-specific HC distributions through misclassified gestational age or invalid HC measurements. HC values with residuals >+3.89 SD or <−3.89 SD were considered outliers (n = 34). When applying the INTERGROWTH-21st standard to the local population, we compared the proportions of newborns with HC more than 2 SD below the mean at each gestational age with the proportion expected under a Gaussian (normal) distribution. We used the generalized additive models for location, scale, and shape (GAMLSS) framework to construct gestational age-specific HC curves, smoothed with cubic splines. Three methods were explored within GAMLSS to account for skewness, platykurtosis, and leptokurtosis: (1) the LMS (lambda, mu, and sigma) method, which assumes a Box–Cox transformation; (2) the LMST (lambda, mu, sigma, Box–Cox t distribution) method, which assumes a shifted and scaled (truncated) t distribution; and (3) the LMSP (lambda, mu, sigma, Box–Cox power exponential distribution) method, which assumes a Box–Cox power exponential transformation. The best model was selected according to the Bayesian Information Criterion (BIC). When estimating the lambda (L), mu (M), and sigma (S) parameters, the link function for L and S was “log,” and the skewness and kurtosis values were kept constant. All models were fitted separately for boys and girls.
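The outlier screen described above can be sketched with NumPy. The text does not state the regression model or weight function, so this sketch assumes, for illustration only, a quadratic trend of HC on gestational age, Huber weights (tuning constant 1.345), and a residual scale from the rescaled median absolute deviation; only the ±3.89 SD cut-off comes from the text.

```python
import numpy as np


def flag_hc_outliers(ga_weeks, hc_cm, degree=2, cutoff=3.89, huber_c=1.345,
                     max_iter=50, tol=1e-8):
    """Flag HC values whose robust standardized residual exceeds +/- cutoff.

    Assumed model (not specified in the text): HC ~ polynomial in gestational
    age, fitted by iteratively reweighted least squares with Huber weights.
    """
    x = np.asarray(ga_weeks, dtype=float)
    y = np.asarray(hc_cm, dtype=float)
    # Design matrix: polynomial in centered gestational age.
    X = np.vander(x - x.mean(), degree + 1)
    w = np.ones_like(y)
    for _ in range(max_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ beta
        # Robust residual scale: MAD rescaled to estimate a normal SD.
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        if scale == 0:
            raise ValueError("residual scale is zero; cannot standardize")
        u = np.abs(resid) / (huber_c * scale)
        # Huber weights: full weight for small residuals, down-weight large ones.
        new_w = np.where(u <= 1.0, 1.0, 1.0 / np.maximum(u, 1e-12))
        converged = np.max(np.abs(new_w - w)) < tol
        w = new_w
        if converged:
            break
    return np.abs(resid / scale) > cutoff


# Synthetic data for illustration only (not study data).
rng = np.random.default_rng(0)
ga = np.repeat(np.arange(33, 42), 60)
hc = 33.0 + 0.4 * (ga - 37) + rng.normal(0.0, 1.2, ga.size)
hc[0] += 10.0  # inject one implausible value
flags = flag_hc_outliers(ga, hc)
print(int(flags.sum()), bool(flags[0]))
```

A dedicated robust-regression routine (for example one offering Huber or bisquare weights) could replace the hand-written loop; the loop is shown to make the reweighting step explicit.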

We produced figures and tables of the smoothed 3rd, 10th, 25th, 50th, 75th, 90th, and 97th centiles and SDs separately for boys and girls. We visually compared the new −3 SD, −2 SD, mean, 2 SD, and 3 SD curves with those of the INTERGROWTH-21st standard, and compared the classification of HC at birth by the two tools at the −3 SD, −2 SD, 2 SD, and 3 SD cut-offs.
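Given smoothed L, M, and S values for a gestational week, centiles and SD cut-offs follow from the LMS (Box–Cox normal) relations: the value at z-score z is M(1 + LSz)^(1/L), or M·exp(Sz) when L = 0, and a centile p uses z = Φ⁻¹(p). The sketch below covers only this LMS case; for LMST or LMSP fits, the normal quantile would be replaced by the corresponding t or power exponential quantile. The parameter values are placeholders, not values from Table 1.

```python
import math
from statistics import NormalDist


def lms_value(z, L, M, S):
    """HC (cm) at z-score z under the LMS (Box-Cox normal) model."""
    if abs(L) < 1e-12:  # limiting case L -> 0
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def lms_zscore(hc, L, M, S):
    """z-score of an observed HC value; the inverse of lms_value."""
    if abs(L) < 1e-12:
        return math.log(hc / M) / S
    return ((hc / M) ** L - 1.0) / (L * S)


def centile_table(L, M, S, centiles=(3, 10, 25, 50, 75, 90, 97)):
    """Convert each centile to a standard normal z, then to HC."""
    std_normal = NormalDist()
    return {p: lms_value(std_normal.inv_cdf(p / 100), L, M, S) for p in centiles}


def sd_cutoffs(L, M, S, zs=(-3, -2, 0, 2, 3)):
    """HC on the -3, -2, 0 (median M), +2, and +3 SD curves."""
    return {z: lms_value(z, L, M, S) for z in zs}


# Placeholder parameters for one gestational week (illustrative only).
L, M, S = 1.5, 34.0, 0.035
print({p: round(v, 2) for p, v in centile_table(L, M, S).items()})
print({z: round(v, 2) for z, v in sd_cutoffs(L, M, S).items()})
```

A newborn is then classified as <−2 SD under a given tool when `lms_zscore(hc, L, M, S) < -2` with that tool's parameters.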

Logistic regression was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for the associations between abnormal head size and the composite adverse neonatal outcome. Microcephaly and macrocephaly were defined as HC more than 2 SD below and above the mean, respectively, according to the INTERGROWTH-21st standard or the new reference. Newborns were jointly categorized by the two tools as follows: (1) for microcephaly, newborns were first divided into non-micro (microcephaly by neither tool) and micro-either (microcephaly by at least one tool), and the latter group was further divided into micro-IG only (<−2 SD by the INTERGROWTH-21st standard alone), micro-NR only (<−2 SD by the new reference alone), and micro-both (<−2 SD by both tools); (2) for macrocephaly, newborns were first divided into non-macro (macrocephaly by neither tool) and macro-either (macrocephaly by at least one tool), and the latter group was further divided into macro-NR only (>2 SD by the new reference alone) and macro-both (>2 SD by both tools). Because gestational age distributions differed among the microcephaly groups, we conducted analyses stratified by gestational age (≥38 vs <38 weeks), splitting at 38 weeks, where the −2 SD curves of the two tools intersect. Statistical significance was set at p < 0.05. All analyses were performed using R version 3.2.4 (R Foundation for Statistical Computing, Vienna, Austria) and SAS version 9.3 (SAS Institute, Cary, NC).
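The joint grouping above can be expressed as a lookup on each newborn's two z-scores. This is a sketch that assumes the z-scores under the INTERGROWTH-21st standard (IG) and the new reference (NR) have already been computed; the function names are hypothetical.

```python
def micro_group(z_ig: float, z_nr: float, cutoff: float = -2.0) -> str:
    """Joint microcephaly group from the IG and NR z-scores."""
    by_ig, by_nr = z_ig < cutoff, z_nr < cutoff
    if by_ig and by_nr:
        return "micro-both"
    if by_ig:
        return "micro-IG only"
    if by_nr:
        return "micro-NR only"
    return "non-micro"  # every other group together forms "micro-either"


def macro_group(z_ig: float, z_nr: float, cutoff: float = 2.0) -> str:
    """Joint macrocephaly group from the IG and NR z-scores."""
    by_ig, by_nr = z_ig > cutoff, z_nr > cutoff
    if by_ig and by_nr:
        return "macro-both"
    if by_nr:
        return "macro-NR only"
    if by_ig:
        return "macro-IG only"  # the text defines no such group
    return "non-macro"


def ga_stratum(ga_completed_weeks: int) -> str:
    """Stratum for the stratified analysis, split at 38 weeks."""
    return ">=38 weeks" if ga_completed_weeks >= 38 else "<38 weeks"


print(micro_group(-2.4, -1.7), ga_stratum(40))  # micro-IG only >=38 weeks
```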

Results

Application of the INTERGROWTH-21st standard to the local population

Figure 1 shows the proportions of newborns classified as microcephalic by the INTERGROWTH-21st standard at each gestational age in the local population. Overall, 1067 (4.40%) newborns had HC <−2 SD according to the INTERGROWTH-21st standard: 616 (4.77%) boys and 451 (3.98%) girls. At most gestational ages, the proportions with HC <−2 SD were higher than the 2.28% expected under a Gaussian (normal) distribution, especially among boys born at 40 and 41 weeks of gestation. This discrepancy between observed and expected proportions suggested that a new HC reference for the local population might be needed.
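The 2.28% benchmark is the share of a normal distribution lying more than 2 SD below the mean, Φ(−2). The short Python check below reproduces it and sets it against the overall observed share from the text; it adds no new analysis.

```python
from statistics import NormalDist

# Share of a standard normal distribution below -2 SD.
expected = NormalDist().cdf(-2.0)
# Observed share below -2 SD with the INTERGROWTH-21st standard (from the text).
observed = 1067 / 24257
print(f"expected {expected:.2%}, observed {observed:.2%}, "
      f"ratio {observed / expected:.2f}")
# -> expected 2.28%, observed 4.40%, ratio 1.93
```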

Fig. 1

Proportions of newborns identified with microcephaly by the INTERGROWTH-21st standard at different gestational ages. Data in the table are presented as the number of newborns with head circumference (HC) <−2 SD/total number of newborns. The expected proportion of HC <−2 SD (2.28%) assumes that HC is normally distributed

Construction of new HC curves and comparison with the INTERGROWTH-21st standard

Smoothed gestational age-specific HC curves for boys and girls at birth in the new reference are presented in Fig. 2. Table 1 shows the 3rd, 10th, 25th, 50th, 75th, 90th, and 97th percentiles, means, and SDs of HC, as well as the L, M, and S parameters, at each gestational week from 33 to 41 in the smoothed curves. The average absolute differences between observed and smoothed centiles were 0.12 cm for boys and 0.14 cm for girls. At all percentiles and gestational ages, HC was higher for boys than for girls, with differences ranging from 0.1 to 0.5 cm.

Fig. 2

Crude and smoothed curves for head circumference (HC) at birth of boys (a) and girls (b) in the new reference. Curves represent the 3rd, 10th, 25th, 50th, 75th, 90th, and 97th percentiles by gestational age. Gray circles represent actual observations. LMST: lambda, mu, sigma, Box–Cox t distribution method, which assumes a shifted and scaled (truncated) t distribution. LMSP: lambda, mu, sigma, Box–Cox power exponential distribution method, which assumes a Box–Cox power exponential transformation

Table 1 Head circumference (in cm) of singleton boys and girls at birth according to gestational age in the new reference

Fig. 3 compares the new HC curves with the INTERGROWTH-21st standard. For boys, after 38 weeks of gestation the −2 SD and −3 SD cut-offs were higher in the INTERGROWTH-21st standard than in the new reference, with a maximal difference of 0.8 cm, while the opposite pattern was found before 38 weeks. At all gestational weeks, the 2 SD and 3 SD values for boys were lower in the new reference than in the INTERGROWTH-21st standard, with a maximal difference of 0.9 cm. Similar differences between the two tools were observed for girls.

Fig. 3

Comparison of head circumference curves at birth of boys (a) and girls (b) between the INTERGROWTH-21st standard and the new reference. The new reference is represented as a solid line and the INTERGROWTH-21st standard is represented as a dashed line

Table 2 shows the classification of HC at birth according to the new reference and the INTERGROWTH-21st standard. According to the new reference, 688 (2.83%) and 40 (0.16%) newborns had HC <−2 SD and <−3 SD, respectively, compared with 1067 (4.40%) and 65 (0.27%) according to the INTERGROWTH-21st standard. Among newborns delivered at or after 38 weeks of gestation, 417 were classified as HC <−2 SD by the INTERGROWTH-21st standard only, whereas among those delivered before 38 weeks, 38 were classified as HC <−2 SD by the new reference only. The new reference classified 498 (2.05%) newborns as HC >2 SD, compared with 271 (1.11%) by the INTERGROWTH-21st standard.

Table 2 Classification of head circumference at birth according to the INTERGROWTH-21st standard and the new reference

HC categories and adverse neonatal outcomes

Table 3 shows the associations between microcephaly defined by the two tools and the composite adverse neonatal outcome. Compared with the non-micro group, newborns in the micro-either group had a higher risk of the composite adverse neonatal outcome (OR 1.56, 95% CI 1.26–1.91), and the ORs for the micro-NR only and micro-both groups were 7.53 (95% CI 2.97–19.12) and 2.00 (95% CI 1.57–2.56), respectively. The risk of the composite adverse neonatal outcome did not differ significantly between the micro-IG only and non-micro groups (OR 0.73, 95% CI 0.47–1.13). In stratified analyses, the micro-NR only group born before 38 weeks of gestation had a higher risk of the composite adverse outcome than the non-micro group born before 38 weeks (OR 2.87, 95% CI 1.12–7.33), whereas the micro-IG only group born at ≥38 weeks did not have a higher risk than the non-micro group born at ≥38 weeks (OR 0.88, 95% CI 0.57–1.36). In both gestational age strata, the micro-both group had a higher risk of the composite adverse outcome than the corresponding non-micro group, with ORs of 2.22 (95% CI 1.34–3.68) for <38 weeks and 1.84 (95% CI 1.38–2.46) for ≥38 weeks (Table 3). We did not find significant associations between macrocephaly identified by either tool and the composite adverse neonatal outcome (Supplemental Table S1 (online)).
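The ORs above come from logistic regression, and the text does not say whether any covariates were included. For the unadjusted case with a single categorical exposure, the fitted OR of each group against the reference group equals the 2×2 cross-product ratio, and its Wald CI uses SE(log OR) = √(1/a + 1/b + 1/c + 1/d). The sketch below shows that calculation; the counts are hypothetical, not the study's data.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude OR and Wald 95% CI from a 2x2 table.

    a: group with outcome        b: group without outcome
    c: reference with outcome    d: reference without outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Wald CI undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts for illustration only (not the study's data).
or_, lower, upper = odds_ratio_wald(a=30, b=270, c=500, d=7000)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With covariates in the model, the ORs would instead have to come from a fitted logistic regression rather than this closed form.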

Table 3 Associations between microcephaly identified by the INTERGROWTH-21st standard and the new reference and composite adverse neonatal outcome

Discussion

In the present study, 4.40% of newborns in a low-risk Chinese population had HC more than 2 SD below the mean according to the INTERGROWTH-21st standard; this discrepancy with the 2.28% expected under a normal distribution highlighted the need for a new local HC reference. We therefore constructed an up-to-date, sex-specific local HC reference for newborns from 33 to 41 weeks of gestation. Compared with the INTERGROWTH-21st standard, our new curves had substantially different −2 SD and −3 SD cut-offs at both earlier and later gestational ages. The proportions of newborns with HC more than 2 SD and more than 3 SD below the mean were over 1.5 times as high with the INTERGROWTH-21st standard as with the new reference. Among newborns delivered at or after 38 weeks of gestation, the INTERGROWTH-21st standard identified many more newborns with HC more than 2 SD below the mean than the new reference did; however, these additional “microcephalic” newborns had a risk of the composite adverse neonatal outcome similar to that of the non-micro group.

The INTERGROWTH-21st standard was developed among healthy pregnant women on the assumption that babies from different ethnic backgrounds would follow the same standard under ideal conditions.21,28 However, wide variation in fetal growth has been found even when mothers were in similarly optimal conditions.18 A fetal growth study found highly significant differences in ultrasound HC measurements at the same gestational age among low-risk Asian, White, Hispanic, and Black women.29 Our findings on the validity of applying the INTERGROWTH-21st standard to the local population add further support for disparities in fetal head growth among populations.

Our new HC reference shares some features with the INTERGROWTH-21st standard and differs in others. HC values at all percentiles and gestational ages were higher for boys than for girls, in line with the INTERGROWTH-21st standard.21 This sex difference in head size has also been observed in standards based on other populations.9,17 However, the differences between boys and girls in the new curves (≤0.5 cm) were smaller than those reported in other studies, including the INTERGROWTH-21st standard.13,21,30 In contrast to the INTERGROWTH-21st standard, our new curves showed flattened HC growth after 38 weeks of gestation, consistent with previously published HC growth curves.9,13,14,15 This finding is supported by previous studies showing that fetal growth, including HC growth, slows after 38 weeks of gestation in normal pregnancies,31,32 possibly because of declining uteroplacental function in late pregnancy.33

Compared with the INTERGROWTH-21st standard, our new reference had substantially different −2 SD and −3 SD curves at both earlier and later gestational ages. Specifically, the INTERGROWTH-21st −2 SD and −3 SD cut-offs tended to be higher than those of the new reference after 38 weeks of gestation, and newborns born after 38 weeks accounted for >60% of our population. When the INTERGROWTH-21st standard was used to screen for microcephaly, the proportions of newborns with HC <−2 SD and <−3 SD were about 1.5 to 1.6 times as high as with the new reference, yet the newborns identified as microcephalic by the INTERGROWTH-21st standard only were not at greater risk of the composite adverse neonatal outcome than those in the non-micro group. Few studies have reported associations between microcephaly and neonatal mortality and morbidity, including NICU admission and hospitalization.2 Further studies are needed to provide stronger evidence for these associations and to improve individualized care for newborns with abnormal head measurements.

Consistent with our findings, one study indicated that fetal smallness would be overdiagnosed in the Chinese population when using the INTERGROWTH-21st fetal growth standard, especially for HC.34 Given that maternal characteristics in this study were similar to those in the INTERGROWTH-21st study, the differences in performance between our new reference and the INTERGROWTH-21st standard may be partly attributable to genetic variation in fetal size among races or ethnicities. An international standard is valuable for comparisons across countries and for evaluating growth in populations without suitable local standards.35 However, it should be validated, and its cut-offs for identifying suboptimal infant growth tailored, before it is applied to a local population.36

The present study has several strengths. First, previously published population-based HC references were usually generated from routine registry data of relatively low quality, whereas HC in this study was measured according to standardized procedures, which may have reduced measurement error. Second, the analysis was restricted to mothers at low risk according to strict exclusion criteria, so our results can be compared with other growth references and standards that used similar inclusion and exclusion criteria, including the INTERGROWTH-21st standard. Third, maternal and neonatal data were collected through the GPHCDSS or the electronic medical records of the study center, and their accuracy and minimal missingness enabled a robust analysis.

This study also has limitations. First, we had few newborns at early gestational ages and were therefore unable to construct HC curves for those born before 33 weeks. Second, the small sample size at earlier gestational weeks may have led to relatively large differences between the observed and smoothed 3rd and 10th centile curves compared with later gestational weeks. Third, like the INTERGROWTH-21st standard and other commonly used HC references, our new reference did not exclude newborns with extensive head moulding, which may affect HC measurement. Previous studies have shown that head moulding can make HC at birth larger or smaller than it would be once the moulding has subsided;37,38 including newborns with extensive moulding may therefore lead to under- or overestimation of the extreme percentiles of the reference curves. Fourth, neonatal outcome data were unavailable from two of the study centers, which reduced the sample size for examining the association between HC and neonatal outcomes. Last, the neonatal outcomes we used (low Apgar score after birth, neonatal asphyxia, and NICU admission) were measured at birth and depend on gestational age, so they may not reflect the prognosis of abnormal HC well. We could not examine the associations between HC and neurodevelopmental outcomes, which are the outcomes most relevant to abnormal HC.

Conclusions

Compared with the INTERGROWTH-21st standard, the newly generated sex-specific reference for HC at birth may be more appropriate for the local population and could reduce the cost of screening and diagnosis for microcephaly, especially among newborns delivered at or after 38 weeks of gestation. Further studies are needed to examine the ability of the new reference to predict neurodevelopment in childhood.