Swedish intrauterine growth reference ranges of biometric measurements of fetal head, abdomen and femur

Ultrasonic assessment of fetal growth is an important part of obstetric care to prevent adverse pregnancy outcome. However, a lack of reliable reference ranges is a major barrier to accurate interpretation of the examinations. The aim of this study was to create updated Swedish national reference ranges for intrauterine size and growth of the fetal head, abdomen and femur from gestational week 12 to 42. This prospective longitudinal multicentre study included 583 healthy pregnant women with low risk of aberrant fetal growth. Each woman was examined up to five times with ultrasound from gestational week 12 + 3 to 41 + 6. The assessed intrauterine fetal biometric measurements were biparietal diameter (outer–inner), head circumference, mean abdominal diameter, abdominal circumference and femur length. A two-level hierarchical regression model was employed to account for the individual measurements of the fetus and the number of repeated visits for measurements while accounting for the random effect of the identified parameterization of gestational age. The expected median and variance, expressed in both standard deviations and percentiles, were calculated for each individual biometric measurement. The presented national reference ranges can be used for assessment of intrauterine size and growth of the fetal head, abdomen and femur in the second and third trimester of pregnancy.


Procedures. The ultrasound machines used were GE Voluson E10, GE Voluson E8 and GE Voluson E6 with abdominal transducers 2-6 MHz RM6C, 2-8 MHz C4-8-D, RAB 4-8-D and 2-9 MHz C2-9-D. BPD was used to calculate the GA, using the modified Selbing and Kjessler formula, 58.65 + 1.07*BPD + 0.0138*BPD², as recommended by the Swedish Society of Obstetrics and Gynecology 14 . Only fetuses with a BPD of at least 21 mm at the first study visit were included. At each ultrasound scan, five biometric variables were each measured three times: BPD, head circumference (HC), mean abdominal diameter (MAD), abdominal circumference (AC) and femur length (FL). All data were manually registered in a web-based study database.
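The dating formula quoted above can be illustrated with a minimal sketch. It assumes that the formula returns GA in days, which is consistent with the statement elsewhere in this article that a BPD of 21 mm corresponds to week 12 + 3. The function names and the truncation to completed days are illustrative assumptions and are not taken from the study protocol.

```python
from typing import Tuple


def ga_days_from_bpd(bpd_mm: float) -> float:
    """GA from BPD (mm) using the formula quoted in the text.

    Assumption: the result is in gestational days.
    """
    if bpd_mm < 21:
        # The study included only fetuses with BPD of at least 21 mm.
        raise ValueError("BPD below 21 mm is outside the range used here")
    return 58.65 + 1.07 * bpd_mm + 0.0138 * bpd_mm ** 2


def weeks_plus_days(ga_days: float) -> Tuple[int, int]:
    """Split GA in days into completed weeks and remaining days.

    Assumption: fractional days are truncated, not rounded.
    """
    return divmod(int(ga_days), 7)


weeks, days = weeks_plus_days(ga_days_from_bpd(21))
print(f"BPD 21 mm -> {weeks} + {days}")
```

Under these assumptions, a BPD of 21 mm gives about 87.2 days, i.e. 12 + 3 weeks.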
BPD and HC were measured in an axial section, at the level of the thalami, with the midline echo in a central position broken anteriorly by cavum septum pellucidum. Orbitae and cerebellum were non-visible. The callipers for BPD were placed on the outer margin of the proximal parietal bone, and the inner margin of the distal parietal bone. HC was measured by placing the callipers on the outer borders of the frontal and occipital edges of the bone, and the ellipse facility was used to follow the contour of the skull. MAD and AC were measured in cross-section (circular view of the abdomen), with the stomach visible, the umbilical vein in the anterior third of the abdomen and the aorta and inferior vena cava anterior to the spine. Further, the greater part of a rib should be seen but not the heart or kidneys. The callipers for MAD were placed on the outer skin borders both anteroposteriorly and perpendicular transversely. AC was measured using the ellipse facility to follow the outer contours of the skin. Lastly, FL was measured in a longitudinal section of the femur at a 45° to 90° angle of insonation, with the callipers placed on the outer borders of the femoral diaphysis. All measurements followed the national recommendations for biometric assessment and the practice guidelines from the International Society of Ultrasound in Obstetrics and Gynecology 14,15 .
Data management. Each biometric measurement was estimated three times and registered in the study database for all GAs, totalling 38,601 repeated measurements. Data were first examined graphically using scatter-plots of each biometric parameter against GA in order to identify deviant records and inspect some data assumptions. Outliers were identified and each outlier was inspected regarding GA and the value of the individual biometric parameters. GA was evaluated and corrected for incorrect data entry in the database.
Incorrect GA records were adjusted during the examination process according to estimated date of delivery and date of examination. Next, extreme or unreasonable measurements (such as HC equal to or smaller than BPD) were deleted or otherwise corrected, if original measurements were available in the woman's medical records (often available for women examined in Uppsala, unlike other study sites). Where original measurements were considered unreasonable or contradictory, the corresponding data was deleted.
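One of the plausibility rules mentioned above, an HC equal to or smaller than the BPD, can be sketched as a simple screening step. This is only an illustration: the record layout, the field names `hc_mm` and `bpd_mm`, and the example values are hypothetical, and the study database and its other checks are not described in enough detail to reproduce them.

```python
from typing import Dict, List, Optional


def flag_implausible(records: List[Dict[str, Optional[float]]]) -> List[int]:
    """Return indices of records where HC is equal to or smaller than BPD."""
    flagged = []
    for i, rec in enumerate(records):
        hc, bpd = rec.get("hc_mm"), rec.get("bpd_mm")
        # Records with a missing value are skipped, not flagged (assumption).
        if hc is not None and bpd is not None and hc <= bpd:
            flagged.append(i)
    return flagged


# Hypothetical example records, not study data.
example = [
    {"bpd_mm": 47.0, "hc_mm": 176.0},  # plausible
    {"bpd_mm": 47.0, "hc_mm": 46.0},   # HC smaller than BPD: flagged
    {"bpd_mm": 50.0, "hc_mm": None},   # missing HC: skipped
]
print(flag_implausible(example))
```

Flagged records would then be checked against the original measurements where available, as described in the text, rather than deleted automatically.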
In 22 of the 33 measurement records with incorrect GA, there was no information on GA. In the remaining eleven cases, GA was incorrectly calculated. A total of 267 measurements (0.68%) were outliers, with suspected incorrect entry of measurement values. Of the incorrect values, 166 (62%) were deleted. The remaining 101 incorrect values were corrected based on original measurement data. BPD was the measurement with the lowest rate of incorrect values, 0.35%, followed by MAD with 0.41% incorrect values. AC was the measurement with the highest rate of incorrect values, 1.19%, followed by HC with a rate of 0.84%. FL was incorrectly entered in 0.64% of the measurements.
study subjects were examined with repeated ultrasound scans during the same day. Six out of seven sonographers participated. Each study subject was examined two or three times by different sonographers, who assessed all five biometric measurements three times during each scan.
A linear mixed effect model was applied to estimate inter-observer variation. The chosen model accounts for the repeated measures and the differences in biometric measurements due to differences in GA. The model included fixed and random effects for each biometric measurement (BPD, HC, MAD, AC and FL), with a statistical marginal error of 5%.
In addition, the intraclass correlation coefficient (ICC) was assessed by applying a two-way mixed effects model to estimate intra-observer variation. We estimated absolute agreement, which includes systematic and random residual errors, for average measures.
Statistical analysis. Descriptive statistics were used for maternal characteristics at baseline as well as for delivery and neonatal characteristics. An independent samples t-test was performed to evaluate if dating discrepancy was different for girls and boys. The t-test was used after confirming that the data does not violate the test assumptions. A one-way ANOVA was employed to evaluate if dating discrepancy varied between the study sites. An independent samples Mann-Whitney U test was performed to compare median birthweights in subgroups of the cohort.
The biometric measurements were used to create reference ranges for the individual variables (BPD, HC, MAD, AC and FL). The log transformed fetal growth measurements were modelled using a multilevel approach, with fixed and random effects. First, a fractional polynomial regression was performed on the log transformed fetal measurements to identify the best fitting combination of fractional polynomials for the GA. For instance, for the fetal BPD, the powers 0.5 and 3 were identified as the best fitting fractional polynomial powers. The derived parameters were then included as fixed effects in a multilevel model to account for repeated measurements for each fetus. Following the approach used by Ohuma and Altman 4 and Johnsen et al. 16,17 , we used a two-level hierarchical model, considering the measurements (level 1) for each fetus (level 2) at each visit, with random effects for the identified fractional polynomial of GA and for the intercept, similar to the study by Johnsen et al. 17 . We used the models mentioned above to estimate the expected fetal measurements at each GA in weeks. Thereafter, equations similar to those in ref. 17 were used to compute the standard deviation (SD) and the percentiles while adjusting for maternal body mass index (BMI), height, parity, country of birth (Nordic or non-Nordic) and fetal sex.
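The structure of the fixed-effects part of such a model can be sketched as follows, using the powers 0.5 and 3 reported for BPD. This is a simplified illustration and not the fitted model: it assumes a natural logarithm, ignores the random effects and covariate adjustment, and treats the SD on the log scale as constant across GA. The coefficient values are arbitrary placeholders, not study estimates; the published equations are given in the supplementary tables.

```python
import math
from statistics import NormalDist


def fp_terms(ga_weeks, powers=(0.5, 3)):
    """Fractional polynomial terms of GA for the given powers."""
    return [ga_weeks ** p for p in powers]


def log_mean(ga_weeks, intercept, coefs, powers=(0.5, 3)):
    """Expected log measurement: intercept + sum of coef * GA**power."""
    terms = fp_terms(ga_weeks, powers)
    return intercept + sum(c * t for c, t in zip(coefs, terms))


def percentile_mm(ga_weeks, p, intercept, coefs, sd_log):
    """Back-transformed percentile, assuming normality on the log scale."""
    z = NormalDist().inv_cdf(p)
    return math.exp(log_mean(ga_weeks, intercept, coefs) + z * sd_log)


# Placeholder parameters for illustration only (NOT estimates from the study).
INTERCEPT, COEFS, SD_LOG = 1.0, (0.5, -0.00001), 0.05

median = percentile_mm(20, 0.5, INTERCEPT, COEFS, SD_LOG)
p025 = percentile_mm(20, 0.025, INTERCEPT, COEFS, SD_LOG)
p975 = percentile_mm(20, 0.975, INTERCEPT, COEFS, SD_LOG)
```

Because the percentiles are computed on the log scale and then back-transformed, they are symmetric around the median as ratios and not as differences in mm.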
In a sensitivity analysis, where women with abnormal BMI were excluded, we applied the same adjusted statistical models to a subset of the study cohort with BMI 18.5 to 29.9 kg/m² to estimate the expected fetal biometric measurements at each GA in weeks, and to compute the SD and percentiles. The reference ranges before and after exclusion of women with abnormal BMI were compared using an independent samples t-test for each biometric measurement, for all subjects as well as stratified according to offspring sex.
Statistical analyses were performed using IBM SPSS Statistics version 25 and Stata 15.0.

Results
Out of the 684 recruited women, 650 were eligible for the study. During pregnancy, 14 women (2.2%) developed hypertension or pre-eclampsia, and 11 (1.7%) developed diabetes and were hence excluded. Fetoplacental complications, such as placenta previa, placental abruption, single umbilical artery and preterm fetal growth restriction led to exclusion in six cases. One woman had a late miscarriage, one child was stillborn and 26 children were born preterm. Eight women were excluded due to fetal malformation or chromosomal aberration. Thus, the final cohort consisted of 583 women: 275 from Uppsala, 66 from Falun, 98 from Katrineholm, 50 from Västerås and 94 from Örebro. In total, 2590 ultrasound scans were performed during the study. The majority, 526 of the 583 included women, were scanned at least four times. In 187 women, all five planned ultrasound scans were performed. The ultrasound examinations following the dating scan were fairly equally distributed, see Fig. 1. There were peaks at weeks 18-19, corresponding to the routine ultrasound scan, and at weeks 37-39. The dating discrepancy, i.e. the difference between estimated date of delivery (EDD) according to BPD at first study visit and EDD according to last menstrual period, was within ±7 days, and thereby fulfilled the inclusion criteria. The mean discrepancy was −0.1 days (SD 2.8 days) and the median discrepancy was 0 days. The dating discrepancy was slightly, but not statistically significantly, larger for girls than boys (p = 0.174); mean −0.5 days for girls (SD 2.7 days) and 0.2 days for boys (SD 2.7 days), respectively. Further, there was a difference in dating discrepancy between the study sites (p < 0.001), with the lowest discrepancy in Katrineholm (mean 0.1 days, SD 2.4 days) and the largest in Västerås (mean −1.2 days, SD 2.8 days). The median age of the participating women was 29 years. BMI covered a range of 16.7-44.8 kg/m², with a median BMI of 23.5 kg/m².
The majority of the study population, 92%, were born in a Nordic country (Sweden, Norway, Denmark, Finland or Iceland), and 5.5% were of non-European origin. Nearly 43% of the women were nulliparous. The median pregnancy duration was 281 days. Data on neonatal characteristics, including sex, was available for 574 children (98.5%). The median birthweight was 3625 g and median birth length 51 cm. For children with a mother born in a Nordic country, the median birthweight was 3628 g, compared with 3600 g for children with a mother born in a non-Nordic country; a difference that was not statistically significant (p = 0.258). Likewise, the median birthweight was comparable in children with younger and older mothers; 3660 g for maternal age less than 35 years and 3624.5 g for maternal age 35 years or older, p = 0.908. Nulliparous women gave birth to children with lower median birthweight compared with parous women; 3540 g for nulliparous and 3714 g for parous women, p = 0.008. Maternal and neonatal characteristics are summarized in Table 1.
Mean and variance equation for BPD in males and females: Table 4 shows the median and variance for estimated HC by GA in SD, and Table 5 HC by GA in percentiles. Figure 2b shows the raw data with fitted percentiles for estimated HC by GA.
Mean and variance equation for HC in males and females: Table 6 shows the median and variance for estimated MAD by GA in SD, and Table 7 MAD by GA in percentiles. Figure 2c shows the raw data with fitted percentiles for estimated MAD by GA.
Mean and variance equation for MAD in males and females: Table 8 shows the median and variance for estimated AC by GA in SD, and Table 9 AC by GA in percentiles. Figure 2d shows the raw data with fitted percentiles for estimated AC by GA.
Mean and variance equation for AC in males and females:

Table 2. Estimated biparietal diameter (BPD) in mm by gestational age (GA) for males and females, standard deviations (SD). a GA expressed as completed gestational weeks, e.g. 12 weeks corresponds to 12 + 0 weeks or 84 gestational days.
Mean and variance equation for FL in males and females: Supplementary Tables 1-5 show the median and variance of the different biometric measurements (BPD, HC, MAD, AC and FL) for each gestational week for males and females separately. Supplementary Tables 6-10 show the median and variance of the different biometric measurements for each gestational day. The variance is expressed in standard deviations (+ 3 SD, + 2 SD, + 1 SD, median, − 1 SD, − 2 SD and − 3 SD) and in percentiles (2.5th, 5th, 10th, 25th, median, 75th, 90th, 95th and 97.5th). Moreover, all supplementary tables enclose the full equations of mean and variance for each biometric measurement.
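As an illustration of how the tabulated SD columns might be used, the sketch below locates a measurement within one row of an SD table by linear interpolation between adjacent columns. The row used is the HC row for gestational week 20 from Table 4 of this article. The interpolation itself is an assumption made for illustration; since the underlying model is fitted on the log scale, the full equations in the supplementary tables should be preferred for exact values.

```python
from typing import Sequence

SD_LEVELS = (-3, -2, -1, 0, 1, 2, 3)

# HC (mm) at gestational week 20, from -3 SD to +3 SD (Table 4).
HC_WEEK_20 = (161, 166, 171, 176, 181, 187, 192)


def approx_z(value_mm: float, row_mm: Sequence[float]) -> float:
    """Approximate SD score by linear interpolation within a table row."""
    if not row_mm[0] <= value_mm <= row_mm[-1]:
        raise ValueError("value outside the tabulated -3 SD to +3 SD range")
    for i in range(len(row_mm) - 1):
        lo, hi = row_mm[i], row_mm[i + 1]
        if lo <= value_mm <= hi:
            # Fractional position between two adjacent SD columns.
            return SD_LEVELS[i] + (value_mm - lo) / (hi - lo)
    raise ValueError("row must be sorted in increasing order")


print(approx_z(178.5, HC_WEEK_20))
```

An HC of 178.5 mm at week 20 lies halfway between the median (176 mm) and + 1 SD (181 mm), so the sketch returns 0.5.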
The sensitivity analysis of women with BMI 18.5 to 29.9 kg/m² showed no statistically significant differences between the reference ranges in the complete study population and the subset of women where underweight and obese women were excluded (p = 0.9906 to 0.999). Supplementary Tables 11-15 show the median and variance of

Table 3. Estimated biparietal diameter (BPD) in mm by gestational age (GA) for males and females, percentiles. a GA expressed as completed gestational weeks, e.g. 12 weeks corresponds to 12 + 0 weeks or 84 gestational days.

Discussion
In this cohort of prospectively enrolled, healthy women with low risk of aberrant fetal growth, we have constructed new Swedish reference ranges for normal size and growth of the fetal head, abdomen and femur. We have provided charts for five biometric measurements; BPD, HC, MAD, AC and FL, from gestational week 12 to 42.
Over the years, a large number of studies have presented regional and international charts for fetal size and growth. There is a large variability in study design and statistical modelling methods, as well as in reported percentiles 3,4,18 . The aim of a fetal growth chart is to describe how fetuses should grow under optimal conditions 3 . Hence, in concordance with large international studies of fetal size and growth, the present study has only included women with low risk of aberrant fetal growth 5,6 .
Reliable and population-representative size and growth charts are important in order to correctly evaluate both fetal size and growth, the latter as serial measurements. Altman and Chitty highlight differences in estimating size and growth, and how this affects the choice of appropriate study design 18 . A cross-sectional design is recommended for evaluating size, with a single measurement on each study subject. Longitudinal studies, on the other hand, comprise repeated measurements of each study subject. Compared with cross-sectional studies, longitudinal studies often use smaller study samples with measurements that are not independent of each other. Unless the repeated measurements are properly addressed, the variation may be underestimated using a longitudinal design. Since the publication of the intrauterine growth charts constructed by Maršál et al. 12 , which are presently used in Sweden, statistical methods have been developed and used that take both repeated measurements and increased variation with GA into account. These methods permit the use of a longitudinal design to produce growth charts of size as well as growth intended for clinical practice 4,18 . A strength of our study is the prospective longitudinal design and the use of modern statistical modelling methods. Hence, the growth charts can be used to evaluate ultrasonically derived fetal biometry, both regarding size and growth. However, these growth charts are not intended for dating of pregnancies, as dating standards require different statistical analyses 19 . We recommend the use of dating charts that are designed solely for that purpose. Another strength of our study is the relatively large cohort of healthy women with low-risk pregnancies recruited specifically for this study with an even spread of the examinations across the included GAs.
In order to increase the reproducibility and decrease the measurement error, a limited number of experienced sonographers conducted the ultrasound scans following the biometric measurement recommendations that are in use in Sweden. The use of triplicate measurements of each biometric assessment at each ultrasound scan further reduces measurement error. The reproducibility study showed a low grade of inter-observer variability. However, the low number of study subjects is a limitation. Accordingly, the reliability of the reproducibility study cannot be assessed as high. Lastly, strict criteria for exclusion due to increased risk of aberrant growth have been applied throughout the study.
Table 4. Estimated head circumference (HC) in mm by gestational age (GA) for males and females, standard deviations (SD). a GA expressed as completed gestational weeks, e.g. 12 weeks corresponds to 12 + 0 weeks or 84 gestational days.

GA (weeks a)  −3 SD  −2 SD  −1 SD  Median  +1 SD  +2 SD  +3 SD
12            66     69     71     74      76     79     82
13            78     81     83     86      89     92     95
14            90     93     96     99      102    105    108
15            102    105    108    112     115    118    122
16            114    118    121    125     128    132    136
17            126    130    134    138     142    146    150
18            138    142    146    151     155    160    164
19            150    154    159    164     168    173    179
20            161    166    171    176     181    187    192
21            172    178    183    188     194    200    206
22            183    189    195    201     207    213

A valid estimation of GA is considered crucial for developing reliable growth reference ranges 3 . The method used, including women with regular menstrual cycles in whom the estimated date of delivery (EDD) according to last menstrual period is consistent with first trimester ultrasound dating, provides a reliable dating method 3,4,20-22 . The median discrepancy of 0 days in our study indicates concordant dating between EDD according to BPD and EDD according to last menstrual period. The mean dating discrepancy was larger for girls than for boys. The dating discrepancy is in line with the findings of earlier studies that have examined discrepancy in dating using last menstrual period and ultrasound 23,24 . We used BPD in gestational week 12 + 3 to 13 + 6 to date the pregnancies. Swedish as well as international guidelines recommend dating with ultrasound during the first trimester, as this appears to be the most reliable method for pregnancy dating 14,25 . The Swedish guidelines recommend the use of crown rump length (CRL) in early pregnancy, and BPD from 21 mm (corresponding to week 12 + 3). Adherence to the recommendations of using CRL for dating in early pregnancy is however low in Sweden 26 . Since many Swedish sonographers are not experienced in measuring CRL, we chose to date all pregnancies with BPD in order to avoid different dating methods. Further, the equation used for dating with BPD was derived with CRL as reference for "true" GA, and the equation was later validated as performing well, with low systematic and random error 27,28 . However, first trimester dating with BPD predicts the GA and duration of pregnancy equally well as CRL, and the choice of future dating method should therefore not affect the applicability of our growth charts 27,29 . Variations in early growth might have an impact on the estimated GA when first trimester ultrasound dating is used rather than last menstrual period. This implies that there is a risk that a systematic bias caused by measurement error is introduced. The potential effect of inaccurate GA assessment due to natural variation in fetal growth should however be small, as the dating discrepancy was very small. A limitation of the study is the predominance (92.1%) of women born in Sweden or another Nordic country. This figure is high compared with the Swedish pregnant population, where 69.5% of all women giving birth in Sweden in 2018 were born in a Nordic country 13 . Some selection bias was unavoidable, as the written information to potential study subjects that was handed out during the recruitment process was solely available in Swedish, English and Arabic.
Efforts were made to recruit women of various ethnicities and social backgrounds, by recruiting women in primary care units with a high rate of immigrants as well as in units with mainly Swedish born women.
Table 5. Estimated head circumference (HC) in mm by gestational age (GA) for males and females, percentiles. a GA expressed as completed gestational weeks, e.g. 12 weeks corresponds to 12 + 0 weeks or 84 gestational days.

GA (weeks a)  2.5th  5th  10th  25th  Median  75th  90th  95th  97.5th
12            69     69   70    72    74      75    77    78    79
13            81     82   83    84    86      88    90    91    92
14            93     94   95    97    99      101   103   104   105
15            105    106  108   110   112     114   116   117   118
16            118    119  120   122   125     127   129   131   132
17            130    131  133   135   138     140   143   144   146
18            142    144  145   148   151     154   156   158   160
19            154    156  158   160   164     167   170   172   173
20            166    168  170   173   176     180   183   185   187
21            178    179  181   185   188     192   196   198   200
22            189    191  193   196   201     205   208   211   213
23            200    202  204   208   212     217   221   223

In order to achieve a study population representative of the Swedish pregnant population, women of low as well as high BMI were included in the cohort, despite the potential effect of abnormal BMI on intrauterine growth. Since only healthy women were included, the risk of poor intrauterine growth due to malnutrition should be low. Even though the median BMI was normal, the upper interquartile range included women with overweight, indicating that a substantial part of the study population was overweight. Women with obesity were not only screened with repeated random plasma glucose, but also with oral glucose tolerance test for gestational diabetes. All women who developed gestational diabetes were excluded, and solely women with normal plasma glucose and glucose tolerance completed the study. It cannot be ruled out that increased fetal growth due to factors other than gestational diabetes in obese women might affect the results towards an overestimation of normal fetal size. In the sensitivity analysis, where women with BMI 18.5 to 29.9 kg/m² were compared with the complete study population, only small differences were observed between the groups for all biometric measurements. These differences should not be of any clinical significance, nor were there any statistically significant differences between the reference ranges whether or not all women were included. Hence, including the subjects with extreme BMI values should not bias the results. Moreover, the aim of the study was to provide reference ranges in a study population of healthy women representative of the Swedish pregnant population. Maternal age and BMI in the study population were similar to the mean age (30.4 years) and BMI (25.2 kg/m²) of pregnant women in Sweden in 2017 13 .
Hence, the results of the complete study population can be regarded as generalizable for estimation of fetal size and growth in the current Swedish pregnant population. Compared with the growth charts presently used in Sweden, our new reference ranges are derived from an almost seven times larger study population 12 . Moreover, the study population in Maršál's study comprised 24% smokers. Considering the potential growth retarding effect of maternal smoking, their study population does not represent a low-risk population with expected normal fetal growth 30,31 . Methodological considerations, such as the nowadays outdated cross-sectional analytic methods applied to a longitudinal study in the Maršál study, and changes in the Swedish pregnant population, motivate a change to updated reference ranges for fetal size and growth. Moreover, the corresponding Norwegian growth charts, which are based on a methodology similar to ours, are not entirely applicable to the Swedish setting, partly due to differences in demographics and birthweights, but most importantly due to differences in recommendations for how to perform the ultrasonic BPD measurements. The Norwegian reference ranges for biometry are calculated using the callipers placed on the outer margins of both the proximal and the distal parietal bone 16 .

Table 6. Estimated mean abdominal diameter (MAD) in mm by gestational age (GA) for males and females, standard deviations (SD). a GA expressed as completed gestational weeks, e.g. 12 weeks corresponds to 12 + 0 weeks or 84 gestational days.

GA (weeks a)  −3 SD  −2 SD  −1 SD  Median  +1 SD  +2 SD  +3 SD
12            15     16     17     18      19     20     21
13            19     20     21     22      23     24     25
14            22     23     24     25      26     28     29
15            26     27     28     29      30     32     33
16            29     30     32     33      34     36     37
17            32     34     35     37      38     40
During the last few years, large international projects have produced growth standards intended for universal use, with the assumption that differences in fetal growth and birthweights are caused by suboptimal environment rather than inherent differences in the populations 5,6,32 . Others have found evidence supporting that physiological differences rather than pathology explain the differences in size and growth between populations 8,33-35 . Applying international standards would in such a case possibly misclassify a large proportion of fetuses as either SGA, AGA or LGA 10 . There is an ongoing debate concerning the need for national standards for fetal size and growth. It is interesting to note that even though the INTERGROWTH-21st project showed a high degree of similarity between study sites, the WHO Multicentre Growth Reference Study reported significant differences in fetal growth in different settings 5,6 . Even though both studies are of high quality with large study populations, recent studies have presented evidence that questions the use of a single international standard that represents ideal growth in all populations 8-10,35 . Similar conclusions were drawn by the authors of the WHO study, who recommend that if international charts are used, their performance should be tested in the local setting to assess if adjustments are needed 5 . Bearing these concerns in mind, we believe that there is a need for updated national reference ranges of fetal size and growth for everyday clinical practice. Moreover, further studies are needed to evaluate proper cutoffs for the updated reference ranges in order to identify fetuses with increased risk of adverse perinatal outcome.

Table 7. Estimated mean abdominal diameter (MAD) in mm by gestational age (GA) for males and females, percentiles. a GA expressed as completed gestational weeks, e.g. 12 weeks corresponds to 12 + 0 weeks or 84 gestational days.

Data availability
The datasets generated during and/or analysed during the current study are not publicly available due to the ethical and legal restrictions prohibiting the sharing of personal data, but are available from the corresponding author on reasonable request.

Table 11. Estimated femur length (FL) in mm by gestational age (GA) for males and females, percentiles. a GA expressed as completed gestational weeks, e.g. 12 weeks corresponds to 12 + 0 weeks or 84 gestational days.