Risk scores for predicting small for gestational age infants in Japan: The TMM BirThree cohort study

This study aimed to construct a prediction model for small-for-gestational-age (SGA) infants in Japan by creating a risk score during pregnancy. A total of 17,073 subjects were included in the Tohoku Medical Megabank Project Birth and Three-Generation Cohort Study, a prospective cohort study. A multiple logistic regression model was used to construct risk scores during early and mid-gestational periods (11–17 and 18–21 weeks of gestation, respectively). The risk score during early gestation comprised the maternal age, height, body mass index (BMI) during early gestation, parity, assisted reproductive technology (ART) with frozen-thawed embryo transfer (FET), smoking status, blood pressure (BP) during early gestation, and maternal birth weight. The risk score during mid-gestation also consisted of the maternal age, height, BMI during mid-gestation, weight gain, parity, ART with FET, smoking status, BP level during mid-gestation, maternal birth weight, and estimated fetal weight during mid-gestation. The C-statistics of the risk scores during early- and mid-gestation were 0.658 (95% confidence interval [CI]: 0.642–0.675) and 0.725 (95% CI: 0.710–0.740), respectively. In conclusion, the predictive ability of the risk scores during mid-gestation for SGA infants was acceptable and better than that of the risk score during early gestation.

Study subjects. Of the 23,425 pregnant women who consented to participate in the Tohoku Medical Megabank Project (TMM) Birth and Three-Generation Cohort Study (TMM BirThree Cohort Study), participants were excluded for the following reasons: withdrawn consent (N = 25); multiple pregnancies (N = 954); miscarriages (N = 158); delivery at ≥ 42 weeks of gestation (N = 27); missing data (N = 35) and improbable data (N = 1) on the delivery week; missing data on parity (N = 37); unknown infant sex (N = 6); missing data on infant birth weight (N = 9); improbable data on infant birth weight (i.e., 24 g) (N = 1); missing data on the gestational week in the questionnaire or when the data were collected (N = 4,345); and improbable data on the gestational week in the questionnaire (N = 1). Furthermore, seven women with improbable data on maternal height were excluded, followed by 747 women with missing data on at least one of the following variables: maternal height, pre-pregnancy body weight (BW), BW during early gestation, BW during mid-gestation, and history of delivery of low birth weight (LBW) infants in a previous pregnancy. Finally, 17,073 women remained and were analyzed. Women with "No answer" on questions regarding the maternal birth weight and smoking status were not excluded, considering the clinical use of risk scores to predict SGA infants. Table 1 shows the maternal and neonatal characteristics of the study subjects. The number of SGA infants was 1,126 (6.6%). Table S1 and the supplementary information present the results of the univariate logistic regression model. Supplementary Tables S2, S3, and S4 show the adjusted odds ratios, regression coefficients, and integer scores during the early- and mid-gestation periods based on the multiple logistic regression model. The selected explanatory variables in all risk scores were maternal height, parity, assisted reproductive technology (ART; conventional in vitro fertilization or intracytoplasmic sperm injection) with frozen-thawed embryo transfer (FET), continued smoking during pregnancy, and maternal birth weight. The maternal age during early gestation, maternal BMI during early gestation, and hypertension (grade 1 or higher) during early gestation were also selected for the risk score during early gestation. For model 1 during mid-gestation, the maternal age during mid-gestation, maternal BMI during mid-gestation, weight gain between the initial BW during early gestation and the BW during mid-gestation, and hypertension (grade 1 or higher) during mid-gestation were selected. For model 2 during mid-gestation, the initial standard deviation (SD) value of the estimated fetal weight (EFW) during mid-gestation was selected in addition to the parameters selected for model 1.

[Table fragment. Column: the numbers and proportion of SGA infants in each category, cases/number (%). Maternal characteristics: gestational week at prenatal checkup during early gestation, median (IQR), 12 ...]

www.nature.com/scientificreports/

Model performance and calibration plot of the risk scores for predicting SGA infants. The discrimination performance of each risk score is shown in Fig. 2. The risk score during early gestation showed a poor discrimination performance. Models 1 and 2 (i.e., risk scores during mid-gestation) showed poor and acceptable discrimination performances, respectively. The C-statistics and ten-fold cross-validated C-statistics of model 2 during mid-gestation were 0.725 (95% confidence interval [CI]: 0.710–0.740) and 0.726 (95% CI: 0.710–0.741), respectively. The results of the sensitivity analysis performed using multiple imputation by chained equations are described in the supplementary information. Table 2 shows the observed proportion and the predicted probability of SGA infants according to the quintiles of each risk score. As shown in Fig. 3, the possibility of miscalibration was low, because the calibration curve of each risk score was close to the diagonal line (i.e., the line of perfect calibration). Figure 4 shows the result of the decision curve analysis. The risk score during early gestation, model 1, and model 2 during mid-gestation had a higher net benefit (NB) than the strategies of considering either all or no subjects to be at a high risk of delivering SGA infants when the threshold probabilities were 0.029–0.183 (risk score = −3 to 10), 0.023–0.195 (risk score = −5 to 9), and 0.019–0.312 (risk score = −6 to 11), respectively. Table S5 shows the results of a comparison of the model performance between different risk scores for the prediction of SGA infants. In terms of discrimination and reclassification, model 2 during mid-gestation showed a better performance for predicting SGA infants than the risk score during early gestation or model 1 during mid-gestation. As shown in Fig. 4, the NB of model 2 during mid-gestation was higher than that of the risk score during early gestation and model 1 during mid-gestation.

Discrimination and clinical utility based on different cut-off values of the risk scores for prediction of SGA infants. Table 3 shows the discrimination performance and the NB when the lowest risk score of quintile 5 was used as the cut-off value for each risk score. In model 2 during mid-gestation, the lowest risk score of quintile 5 (i.e., risk score = 4) was equivalent to a threshold probability of 0.110 and had an NB of 0.011. Supplementary Table S6 shows the discrimination performance and NB when the risk score closest to each of several threshold probabilities was set as the cut-off value. Supplementary Table S7 shows the discrimination performance and NB when the risk score with the maximum Youden index was set as the cut-off value.
Model performance, calibration, and clinical utility of each risk score for predicting preterm and term SGA infants (Results of an additional analysis). The results of the model performance, calibration, and clinical utility of each risk score for predicting preterm and term SGA infants, an additional analysis performed in the present study, are described in the Supplementary Information.

Discussion
This is the first study in Japan to construct a prediction model for SGA infants based on risk scores. Since the predictive ability of the risk score including the EFW during mid-gestation was acceptable, it could be incorporated into prenatal checkup protocols for the early detection of pregnant women at a high risk of delivering SGA infants in Japan. The predictive ability of the risk score for SGA infants during early gestation was poor. However, by using a risk score, healthcare providers and pregnant women can jointly identify modifiable risk factors to improve outcomes, for instance, smoking during pregnancy, which can be addressed through smoking cessation 24,25 . Maternal smoking during pregnancy is associated with higher resistance within the umbilical artery flow 26 . A decrease in endothelial nitric oxide synthase (eNOS) activity in fetal umbilical and chorionic vessels, caused by maternal smoking, may be a possible mechanism 27 . In addition, eNOS activity in pregnant women who quit smoking during pregnancy was higher than in those who continued to smoke during pregnancy 27 . Therefore, smoking cessation may decrease the risk of SGA infants by increasing eNOS activity in fetal umbilical and chorionic vessels. Model 1 during mid-gestation had a higher ability to predict SGA infants than the risk score during early gestation, probably because it included the maternal weight gain and was assessed closer to the time of delivery. By using a risk score during mid-gestation, healthcare providers can recognize maternal weight gain, which may [...] 14 .

Table 2. Predicted probability and observed proportion of SGA infants according to the quintiles of each risk score. The risk score during early gestation comprised the maternal age, height, BMI during early gestation, parity, ART with FET, smoking status, BP during early gestation, and maternal birth weight. The risk score during mid-gestation (model 1) consisted of the maternal age, height, BMI during mid-gestation, weight gain, parity, ART with FET, smoking status, BP level during mid-gestation, and maternal birth weight. The risk score during mid-gestation (model 2) consisted of the maternal age, height, BMI during mid-gestation, weight gain, parity, ART with FET, smoking status, BP level during mid-gestation, maternal birth weight, and estimated fetal weight during mid-gestation. Abbreviations: ART, assisted reproductive technology; BMI, body mass index; BP, blood pressure; CI, confidence interval; exp, exponential; EFW, estimated fetal weight; FET, frozen-thawed embryo transfer; SGA, small for gestational age.
Considering maternal complications, SGA (as a surrogate of fetal growth restriction [FGR]) requires careful surveillance with modalities such as the EFW, fetal Doppler velocimetry, and cardiotocography at tertiary institutions 7 . If pregnant women are considered to be at a high risk of delivering SGA infants by the prediction model, detailed fetal ultrasonography will be needed. Fetal ultrasonography includes confirmation of congenital morphological abnormalities, measurement of the maternal uterine artery pulsatility index for the evaluation of placental dysfunction, and close follow-up of the EFW and fetal abdominal circumference for the evaluation of fetal growth velocity. If fetal growth deteriorates, evaluation by fetal Doppler velocimetry, including the umbilical artery pulsatility index, middle cerebral artery pulsatility index, and ductus venosus flow, in combination with cardiotocography or biophysical profile scoring will be needed. In Japan, pregnant women are commonly managed at midwife homes or at primary, secondary, and tertiary institutions 28 . Obstetric medical institutions are becoming increasingly centralized due to a shortage of obstetricians in Japan and a decrease in the number of medical institutions providing perinatal care 29 . Therefore, roles are increasingly divided between tertiary institutions and institutions that manage low-risk pregnancies; the latter need to determine when to transfer pregnant women at a high risk of delivering SGA infants to tertiary institutions. Here, our prediction model may provide an early opportunity for the recognition of such women, allowing sufficient time for decision-making on their transfer. However, the NB for each of the risk scores was low in this study. Therefore, it is necessary to create a prediction model for SGA infants with a higher NB than the current model.
Additionally, our model (in particular, model 2 during mid-gestation) for predicting preterm SGA infants should be updated through recalibration in the future. We propose the use of a risk score for predicting SGA infants as a supportive rather than a mandatory tool at prenatal checkups. The decision to use our risk score for predicting SGA infants should be made by the medical institution concerned.
Figure 4. Decision curve analysis of the risk scores for predicting SGA infants. "All" (solid grey line) indicates that all subjects are considered to be at a high risk of delivering SGA infants. "None" (dashed black line) indicates that no subjects are considered to be at a high risk of delivering SGA infants.

Table 3. Discrimination and the NB based on the lowest risk score of quintile 5 as the cut-off value in each risk score. Abbreviations: CI, confidence interval; LR, likelihood ratio; NB, net benefit; NPV, negative predictive value; PPV, positive predictive value; SGA, small for gestational age; TPR, true positive rate.

The strength of this study is that many of the variables used for creating the prediction model for SGA infants were collected prospectively in a large cohort study. Conversely, the limitations of the study are as follows. First, external validation of the prediction model was not performed. Other prediction models for SGA infants, which were constructed in other countries, have undergone external validation 30 . Although several characteristics in this study (including the maternal age; proportions of primipara, preterm births, and low-birth-weight infants; mean gestational age at delivery; and infant birth weight) were similar to those in the Japan Environment and Children's Study (a nationwide birth cohort study in Japan), we will perform an external validation in the near future 31 . Second, it is unknown whether low maternal birth weight was attributable to SGA or preterm births, because information on the gestational week at which the subjects were born was not collected. Third, data on other predictors of SGA infants, including prenatal ultrasonographic findings on abnormal cord insertion site and abnormal cord coiling, which lead to impaired cord blood flow, were not collected in this study 32,33 .
Furthermore, neither the maternal uterine artery pulsatility index nor biochemical markers such as pregnancy-associated plasma protein-A were measured in this study 34,35 . Moreover, because the placental growth factor data were available for only a small number of subjects in this study, this parameter could not be incorporated into the prediction model. Therefore, we could not evaluate the predictive performance of other prediction models of SGA infants for the subjects in this study, nor could we compare the predictive performance of the risk scores in this study with that of other prediction models. However, these parameters are not routinely measured in clinical practice in Japan. Therefore, our prediction model may be useful in environments where provisions for such skill-intensive techniques and special measurement systems are not available. Riskin-Mashiah et al. reported that fasting plasma glucose (FPG) in the first trimester was a predictor of infant birth weight 36 . However, 86.0% of the data on FPG in the first trimester were missing in this study. In addition, the percentage of missing data on family history of diabetes mellitus (DM), a risk factor for gestational diabetes mellitus, was 37.3%. Therefore, neither FPG in the first trimester nor family history of DM could be incorporated into the prediction model due to the high proportion of missing data in this study 37 . Although other parameters of fetal ultrasonography, including biparietal diameter (BPD), abdominal circumference (AC), femur length (FL), and fetal congenital anomalies, may improve the predictive ability of our prediction model for SGA infants, there was a high proportion of missing data for at least one of BPD, AC, and FL in this study. Furthermore, neither prenatal ultrasonographic findings on fetal congenital anomalies nor Doppler assessment of the umbilical artery were recorded in this study.
Therefore, we could not include BPD, AC, FL, fetal congenital anomalies, or Doppler assessment of the umbilical artery in the prediction model. As a sensitivity analysis, the discrimination performance of the prediction model for SGA infants without major congenital anomalies was evaluated. The discrimination performance for SGA infants without major congenital anomalies was similar to that for all SGA infants. The C-statistics and ten-fold cross-validated C-statistics of the risk score during early gestation were 0.659 (95% CI: 0.643–0.676) and 0.659 (95% CI: 0.642–0.677), respectively. In model 1 during mid-gestation, the C-statistics and ten-fold cross-validated C-statistics were 0.677 (95% CI: 0.661–0.694) and 0.673 (95% CI: 0.661–0.694), respectively. In model 2 during mid-gestation, the C-statistics and ten-fold cross-validated C-statistics were 0.723 (95% CI: 0.708–0.738) and 0.723 (95% CI: 0.707–0.739), respectively. The cause of hypertensive disorders of pregnancy (HDP), especially preeclampsia, is thought to be impaired uterine spiral artery remodeling followed by angiogenic imbalance. As a result, insufficient uteroplacental perfusion leads to SGA (as a surrogate of FGR) 38 . As low-dose aspirin treatment for pregnant women at a high risk of preeclampsia would decrease the risk of delivery of SGA infants, construction of a prediction model for SGA infants with HDP (i.e., preeclampsia) in early gestation will be needed in Japan in the future 39 .

In conclusion, our prediction model for SGA infants, particularly during mid-gestation, may aid in the detection of pregnant women at a high risk of delivering SGA infants in Japan. Further studies for its external validation and improvement of its predictive ability are necessary.

Methods
Study design and participants. This study was part of the TMM BirThree Cohort Study, an ongoing prospective cohort study. The TMM BirThree Cohort Study, one of the several cohort studies conducted by the TMM, aimed to 1) monitor the damage to health status due to the Great East Japan Earthquake, 2) study the early diagnosis and treatment of diseases, and 3) perform molecular-epidemiological studies to examine the associations between genetic and environmental factors and diseases 40 .
The TMM BirThree Cohort Study recruited pregnant women (and their family members) whose expected date of confinement was later than February 1, 2014, from obstetric clinics and hospitals in the Miyagi and Iwate Prefectures in Japan from July 19, 2013 to March 31, 2017. Consent to participate in the TMM BirThree Cohort Study could be given at any time from pregnancy to one month after delivery. Written informed consent was obtained from all participants. The study protocol was approved by the Institutional Review Board of the Tohoku University Graduate School of Medicine (approval number: 2013-1-103-1). The study was conducted in accordance with the Declaration of Helsinki, the Ethical Guidelines for Human Genome/Gene Analysis Research, and all other applicable guidelines 41,42 . The details and cohort profile of the TMM BirThree Cohort Study have been described previously 40,43,44 .
For easy use in clinical settings, we developed a prediction model based on risk scores to predict the delivery of SGA infants, and an internal validation was performed using the data of all the enrolled subjects. This study was reported in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement 45,46 . The official TRIPOD checklist is shown in the supplementary information 46 .
Candidate explanatory variables of the risk score for predicting SGA infants. The maternal age, height, BMI, weight gain, general medical history and medical history during previous pregnancy, conception method, smoking status, alcohol drinking, clinic blood pressure (BP), standard deviation (SD) of the EFW during mid-gestation, and maternal birth weight were considered as the candidate explanatory variables of the risk score 10,15–17,19,21,34,47–51 . Details on the data collection of the candidate explanatory variables are presented in the supplementary information.
There is a potential for extrauterine survival at ≥ 22 weeks of gestation; furthermore, reference ranges of birth weights are available for Japanese infants from 22 weeks of gestation onward 52 . Therefore, we considered that a prediction model for SGA infants applicable at < 22 weeks of gestation would be clinically significant for fetal surveillance at the prenatal checkup. Furthermore, the pregnancy period was divided into two periods separated at 18 weeks of gestation, because Japanese reference ranges for the EFW are unavailable for < 18 weeks of gestation 53 . Therefore, we constructed the risk scores to predict SGA infants in the following two gestational periods before 22 weeks: 1) early gestation: 11 weeks, 0 days to 17 weeks, 6 days and 2) mid-gestation: 18 weeks, 0 days to 21 weeks, 6 days.

Definition of SGA infants. There is a difference in the mean infant birth weight between different races 22 . Since 1980, the mean infant birth weight has decreased and the proportion of LBW infants has increased rapidly in Japan 54 . Additionally, because customized infant birth weight percentiles are widely used by health practitioners in clinical practice in Japan, we used them to define SGA infants in this study. Data on the infant birth weight, parity, delivery week, and sex were obtained from the medical records, because the infant birth weight percentile in Japan is customized based on these parameters 52,55 . SGA infants were defined as infants whose birth weight was below the 10th percentile.

Statistical analyses. Continuous variables of the maternal and neonatal characteristics in this study were expressed as means ± SDs or medians (interquartile ranges), as appropriate. Categorical variables were expressed as numbers (percentages). First, we explored the candidate explanatory variables that were associated with SGA infants using a univariate logistic regression model. When the risk score during early gestation was constructed, the maternal age, maternal height, maternal pre-pregnancy BMI, BMI during early gestation, maternal weight gain between pre-pregnancy BW and the initial BW during early gestation, parity, conception method, maternal birth weight, BP during early gestation, smoking status, alcohol consumption, and medical histories of diabetes mellitus (DM), systemic lupus erythematosus (SLE) and/or antiphospholipid syndrome (APS), chronic kidney diseases (CKD), hyperthyroidism, and hypothyroidism were included in the univariate logistic regression model. When the risk score during mid-gestation was constructed, the maternal BMI during mid-gestation, maternal weight gain between the initial BW during early gestation and the initial BW during mid-gestation, BP during mid-gestation, and SD value of the EFW during mid-gestation were included in the univariate logistic regression model in addition to the explanatory variables that were considered during early gestation. Furthermore, risk scores without the EFW during mid-gestation (model 1) and with the EFW during mid-gestation (model 2) were created. Explanatory variables that showed a two-sided P-value of < 0.20 in the univariate logistic regression model were included in a multiple logistic regression model. Furthermore, the maternal age during early or mid-gestation and height, which were parameters related to SGA infants in previous studies, were also included in the multiple logistic regression model, even when the two-sided P-value was ≥ 0.20 in the univariate logistic regression model 15,16 .
If the variance inflation factor was greater than 2.5, multicollinearity among several explanatory variables was suspected, and the explanatory variables were either combined or one variable was chosen to decrease the multicollinearity 56 . In the multiple logistic regression model, explanatory variables that contributed to the prediction of SGA infants were selected when each two-sided P-value was < 0.05. The sample size in this study satisfied the condition that the number of SGA infants per explanatory variable was 10 or more to avoid overfitting in the multiple logistic regression model 57 .
The regression coefficients, rather than the odds ratio, were divided by the smallest absolute value of the regression coefficients among the selected explanatory variables and then rounded to an integer score 58 . Next, the risk score for predicting SGA infants was calculated by summating the integer score of each explanatory variable.
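The conversion from regression coefficients to integer scores described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the variable names and coefficient values are hypothetical (the actual coefficients are reported in Supplementary Tables S2 to S4), and ties are assumed to be rounded away from zero because the rounding rule is not specified.

```python
import math


def round_half_away(x):
    """Round to the nearest integer; ties go away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def integer_scores(coefficients):
    """Divide each coefficient by the smallest absolute coefficient, then round."""
    base = min(abs(b) for b in coefficients.values())
    return {name: round_half_away(b / base) for name, b in coefficients.items()}


def risk_score(scores, present):
    """Sum the integer scores of the categories that apply to one woman."""
    return sum(scores[name] for name in present)


# Hypothetical coefficients for illustration only (not values from the study).
example_coefficients = {
    "continued_smoking": 0.62,
    "nulliparity": 0.31,
    "art_with_fet": -0.93,
    "taller_height_category": -0.20,
}
scores = integer_scores(example_coefficients)
total = risk_score(scores, ["continued_smoking", "nulliparity"])
print(scores, total)
```

Dividing by the smallest absolute coefficient gives the weakest selected predictor a score of ±1, so the remaining scores express each predictor's weight relative to it.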
To assess the discrimination performance of each risk score for the SGA infants, we created receiver operating characteristic (ROC) curves and calculated the C-statistics (also known as the area under the ROC curve). The C-statistics were interpreted as follows: 0.5 (no discrimination); > 0.5 and < 0.7 (poor discrimination); ≥ 0.7 and < 0.8 (acceptable discrimination); ≥ 0.8 and < 0.9 (excellent discrimination); and ≥ 0.9 (outstanding discrimination) 59 . For the internal validation of each risk score, ten-fold cross-validation was conducted. Furthermore, a calibration plot using a restricted cubic spline function with four knots was created to assess the calibration 60 . Calibration evaluates the concordance between the predicted probability and observed proportion of the SGA infants. To evaluate the clinical utility of the risk scores for SGA infant prediction in terms of the net benefit (NB), we conducted a decision curve analysis (DCA) 61–63 .
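As a rough illustration of the discrimination measure, the C-statistic can be computed as the proportion of event/nonevent pairs in which the woman who delivered an SGA infant has the higher risk score, with ties counted as one half. The Python sketch below uses invented toy data and hypothetical function names. Grouping values at or below 0.5 under "no discrimination" is an assumption, since the cited scale only names the value 0.5.

```python
def c_statistic(scores, outcomes):
    """Share of (event, nonevent) pairs ranked correctly; ties count as 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def interpret_c(c):
    """Map a C-statistic to the categories used in the text."""
    if c >= 0.9:
        return "outstanding"
    if c >= 0.8:
        return "excellent"
    if c >= 0.7:
        return "acceptable"
    if c > 0.5:
        return "poor"
    return "no discrimination"  # assumption: values at or below 0.5 are grouped here


# Toy data (invented): risk scores and SGA status (1 = SGA) for four women.
toy_c = c_statistic([3, 1, 2, 2], [1, 0, 1, 0])
print(toy_c, interpret_c(toy_c))
# The C-statistics reported in the abstract fall into these categories:
print(interpret_c(0.658), interpret_c(0.725))
```

The pairwise loop is quadratic in the number of subjects, so for a cohort of this size a rank-based or library implementation would be preferable; the loop is used here only because it mirrors the definition directly.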
To evaluate the differences in the predictive abilities of the different risk scores for SGA infants, we evaluated the differences in the C-statistics and reclassification. Differences in the C-statistics were determined using DeLong's test 64 . Reclassification was evaluated by calculating the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) 65–67 . Since there is no established meaningful risk category of SGA infants, we evaluated the continuous NRI (cNRI), rather than the category-based NRI. The overall cNRI evaluates the upward and downward changes in the predicted risk of SGA infants when the reference model is replaced with the new model. The overall cNRI was calculated as the sum of the event cNRI and nonevent cNRI. The event cNRI indicated the net proportion of subjects who delivered SGA infants and were correctly predicted to have a higher risk of delivering SGA infants. The nonevent cNRI indicated the net proportion of subjects who did not deliver SGA infants and were correctly predicted to have a lower risk of delivering SGA infants. The IDI was calculated by the following formula: IDI = (change in the average predicted probability of SGA infants between the two models for subjects who delivered SGA infants) − (change in the average predicted probability of SGA infants between the two models for subjects who did not deliver SGA infants). When differences in the C-statistics, cNRI, and IDI among the risk score during early gestation, model 1, and model 2 during mid-gestation were compared, a two-sided P-value of < 0.0167 (0.05/3) by the Bonferroni correction was considered statistically significant. In addition, we also compared the decision curves of the risk scores.
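The reclassification measures defined above can be made concrete with a short Python sketch. It is only an illustration of the definitions given in the text: the function names and the predicted probabilities are invented, and no confidence intervals or tests are computed.

```python
def continuous_nri(p_ref, p_new, outcomes):
    """Return (event cNRI, nonevent cNRI, overall cNRI) for two sets of predictions."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for ref, new, y in zip(p_ref, p_new, outcomes):
        count[y] += 1
        if new > ref:
            up[y] += 1
        elif new < ref:
            down[y] += 1
    event = (up[1] - down[1]) / count[1]        # events correctly moved upward
    nonevent = (down[0] - up[0]) / count[0]     # nonevents correctly moved downward
    return event, nonevent, event + nonevent


def idi(p_ref, p_new, outcomes):
    """Mean change in predicted probability among events minus that among nonevents."""
    def mean_change(label):
        changes = [new - ref for ref, new, y in zip(p_ref, p_new, outcomes) if y == label]
        return sum(changes) / len(changes)
    return mean_change(1) - mean_change(0)


# Invented predictions for five women (1 = delivered an SGA infant).
outcomes = [1, 1, 0, 0, 0]
p_reference = [0.10, 0.20, 0.10, 0.05, 0.08]
p_new_model = [0.30, 0.25, 0.05, 0.04, 0.10]

event_nri, nonevent_nri, overall_nri = continuous_nri(p_reference, p_new_model, outcomes)
idi_value = idi(p_reference, p_new_model, outcomes)
bonferroni_alpha = 0.05 / 3  # three pairwise comparisons, as in the text
print(event_nri, nonevent_nri, overall_nri, idi_value, round(bonferroni_alpha, 4))
```

In this toy example both women who delivered SGA infants receive a higher predicted probability from the new model, so the event cNRI is 1, while two of the three other women move downward and one moves upward, giving a nonevent cNRI of one third.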
Considering that it is unknown whether infants are SGA or not until they are born, we also explored cut-off values based on the predicted probability of SGA infants. Thus far, there is no established threshold probability to categorize pregnant women as having a high risk of delivering SGA infants, although another study reported several cut-off values according to the predicted probability of SGA infants 30 . Therefore, several cut-off values of the risk scores were considered to define a high risk of SGA infant delivery in this study. After the subjects were divided into quintiles based on the distribution of each risk score, the minimum risk score of the fifth quintile of each risk score was set as the cut-off value. Furthermore, the risk scores closest to threshold probabilities of 0.05, 0.10, 0.15, and 0.20 (i.e., 5%, 10%, 15%, and 20%) were set as the cut-off values for each risk score. The numbers that must be tested, which are the reciprocals of the threshold probabilities of 0.05, 0.10, 0.15, and 0.20, were 20, 10, 6.7, and 5, respectively. The risk scores which had the maximum Youden index (i.e., sensitivity + specificity − 1) were also set as the cut-off values 68 . Then, the true positive rate (i.e., sensitivity), specificity, positive predictive value, negative predictive value, positive likelihood ratio (LR), and negative LR were calculated at each cut-off of the risk score 69 . Furthermore, we calculated the NB for each cut-off risk score.
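The quantities calculated at each cut-off can be illustrated from a 2 × 2 table. The Python sketch below assumes the standard decision-curve definition of net benefit, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability; the text does not spell this formula out, so it is stated here as an assumption. The counts are invented and the function name is hypothetical.

```python
def cutoff_metrics(tp, fp, fn, tn, threshold_probability):
    """Discrimination measures and net benefit for one cut-off of a risk score."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Weight on false positives implied by the threshold probability (assumed formula).
    odds = threshold_probability / (1 - threshold_probability)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "positive_lr": sensitivity / (1 - specificity),
        "negative_lr": (1 - sensitivity) / specificity,
        "youden": sensitivity + specificity - 1,
        "net_benefit": tp / n - (fp / n) * odds,
    }


# Invented counts for 1,000 women: 66 SGA infants, 200 women above the cut-off.
metrics = cutoff_metrics(tp=40, fp=160, fn=26, tn=774, threshold_probability=0.10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this definition, a net benefit of 0.022 at a threshold probability of 0.10 would correspond to identifying about 22 additional true SGA cases per 1,000 women without any false positives, which is one way to read the small NB values reported for the risk scores.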
In the sensitivity analysis, we constructed a risk score after multiple imputation by chained equations (MICE), as described in the Supplementary Material 70 . As an additional analysis, we evaluated the model performance, calibration, and clinical utility of each risk score for predicting preterm and term SGA infants. We also compared the model performance between different risk scores for predicting preterm and term SGA infants.
The statistical software used in the statistical analysis is described in the supplementary information.

Data availability
The datasets analyzed in this study are available from the corresponding author on reasonable request.