Accurate prediction of birth implementing a statistical model through the determination of steroid hormones in saliva

Steroidal hormone interaction in pregnancy is crucial for adequate fetal evolution and preparation for childbirth and extrauterine life. Estrone sulphate, estriol, progesterone and cortisol play important roles in the initiation of labour mechanism at the start of contractions and cervical effacement. However, their interaction remains uncertain. Although several studies regarding the hormonal mechanism of labour have been reported, the prediction of date of birth remains a challenge. In this study, we present for the first time machine learning algorithms for the prediction of whether spontaneous labour will occur from week 37 onwards. Estrone sulphate, estriol, progesterone and cortisol were analysed in saliva samples collected from 106 pregnant women since week 34 by enzyme-immunoassay (EIA) techniques. We compared a random forest model with a traditional logistic regression over a dataset constructed with the values observed of these measures. We observed that the results, evaluated in terms of accuracy and area under the curve (AUC) metrics, are sensibly better in the random forest model. For this reason, we consider that machine learning methods contribute in an important way to the obstetric practice.

www.nature.com/scientificreports/ In comparison, P4 is essential for the maintenance of pregnancy as it inhibits the contractility of the myometrium by decreasing the production of prostaglandins and the genetic expression of contraction associated proteins (CAPs) 14 .
Regarding glucocorticoids, cortisol (C) is of vital importance in pregnancy, since it is responsible for fetal lung maturation 15 . The increase in C levels at the end of pregnancy can serve as a signal for the fetus to induce estrogen synthesis prior to the onset of delivery 16,17 .
The endocrine system is a complex regulatory system with multiple levels of organisation (central nervous system, tissues, adrenal glands, cells) and time cycles (circadian oscillations, rapid responses, ultradian rhythms). This system displays nonlinear feedback responses involved in the interaction between its components and other parts of the body 18 . Statistical models have assisted in understanding how feedback mechanisms are key to homeostatic modulation 18 , and also the behaviour of the complex pituitary-adrenal-pituitary axis complex 19 .
Traditional logistic regression models are generally used to face a problem of predicting premature births 20 . However, the use of machine learning models has aroused growing interest and have been widely employed in bioinformatics 21 , as well as in different fields of health sciences 22,23 . The possibility of its use is also mentioned in the diagnosis of premature births and emerging obstetric diseases 20 . In this way, machine learning models, such as tree-based algorithms and neural networks, can be used to address problems related to the prediction of labour. In this study, random forest models were used to contrast the contribution of such a machine learning technique, compared to traditional logistic regression models.
Therefore, the aim of our study was to determine P4, C, E3 and E1SO4 levels by a non-invasive method, in saliva samples from healthy pregnant women during the third trimester of pregnancy (from week 27 to the time of delivery), in order to develop a mathematical model to predict the probability of a spontaneous birth.

Results
Validation parameters. The sensitivity of the EIA technique was tested by means of low detection limit and calculated in 10 consecutive assays. Results were P4 = 0.25 nmol/L, C = 0.14 nmol/L, E3 = 0.13 nmol/L and E1SO4 = 0.31 nmol/L. The accuracy of the EIA was tested by determining the recovery rates of known amounts of P4, C, E3 and E1SO4 spiked into saliva samples. The average range of recovery rates was as follows (mean ± standard deviation): P4 = 94.29 ± 3.85 to 98.71 ± 5.37; C = 94.29 ± 3.85 to 97.15 ± 2.68; E3 = 90.75 ± 3.92 to 99.65 ± 2.01; and E1SO4 = 90.08 ± 3.29 to 93.27 ± 0.90. Precision of P4, C, E3 and E1SO4 was determined by calculating the intra-and inter-assay coefficients of variation (CV%). P4 = intra-assay: 3.2 ± 0.98%, interassay: 5.3 ± 0.89%; C = intra-assay: 4.8 ± 1.02%, inter-assay: 7.4 ± 1.23%; E3 = intra-assay: 6.7 ± 1.32%, inter-assay: 9.1 ± 2.14%; and E1SO4 = intra-assay: 2.9 ± 0.84%, inter-assay: 6.5 ± 1.36%. In order to determine the effects of saliva on standard curve, the standard curves with saliva samples were run in parallel with the standard doseresponse curve. There was a good degree of parallelism between both standard curves for the hormones studied: P4, R = 0.87; C, R = 0.84; E3, R = 0.89; and E1SO4, R = 0.83. Hormonal concentrations. Total means of each hormone together with a confidence interval based on one standard deviation were calculated from week 26-41 ( Fig. 1 and supplementary Table 1). In addition, the evolution of mean values distinguished by the week of delivery is also shown. Among all the hormones analysed, P4 is the only one that increased gradually throughout the weeks of pregnancy during the third trimester (Fig. 1A). C and E3 results showed that concentrations increased significantly (p < 0.05) from week 37 until giving birth (Fig. 1B,C). E1SO4 concentrations started increased exponentially from the 35 th week until giving birth (Fig. 1D).
Regarding repeated measures correlation coefficient (rmcorr) between hormones, results demonstrated that all hormones are positively correlated (Table 1), with C as the one presenting the strongest correlation with the rest of the hormones while P4 as the one with the weakest correlation.
Statistical model. We have adjusted the statistical models in order to predict the probability of spontaneous birth in the following week, from gestation week 37 (37th week included), based on the hormonal levels collected since the 34th week (Fig. 1). It was decided to adjust two types of statistical models. On the one hand, we fit a multivariate logistic regression model, since it is a reference model in the biosanitary field, mainly due to its simple interpretation. Such a model allows the measurement of the influence on the response variable of the independent variables by means of the odds ratio associated with each of the estimated parameters. On the other hand, a random forest model has been adjusted, which is a model that loses the interpretability of the results, but in return, is very competitive from the predictive point of view 25 .
Both models have been implemented using SAS Institute Enterprise Miner software, which internally invokes the DMREG procedure to perform the logistic regression and HPFOREST procedure to perform the random forest, proposing accuracy as the metric to be maximised.
In the logistic regression model, the logit function has been considered as the linkage function. Also, a stepwise variable selection process has been used, proposing a significance value of 0.05 for the sl-entry and sl-stay parameter. To avoid possible overfitting effects in the adjustment of the models, we have applied a cross-validation strategy. The value obtained for the accuracy metric in this process have been 77.4%. The assessment over the test dataset has provided an accuracy of 69.07% (Table 2). Table 3 shows the order in which the variables have been included in the model. No variables were removed in the adjustment process. Table 4 shows the estimates of the model coefficients, the corresponding p-values and the odds ratios associated with variables selected. www.nature.com/scientificreports/ We can observe that the positive and negative signs of the coefficients associated with the hormonal variables are consistent with the biological functions. We think that negative values for E1SO4 and progesterone are related to longer pregnancies, while positive signs for estriol and cortisol are related to earlier deliveries.
In the random forest model, a maximum of 100 decision trees were considered. Regarding the value of the parameters associated to the trees, following are those considered by default in Enterprise Miner:   www.nature.com/scientificreports/ In this case, to avoid possible overfitting, we used a modelling strategy used by Enterprise Miner called Out-Of-Bag (OOB) because it is the alternative this software proposed to the well-known cross-validation strategy. In the OOB strategy, the observations associated with each one of the adjusted trees are called the bagged observations, while those excluded are called OOB observations. These last observations were used in order to decide the number of trees to be ensembled to improve the capacity of generalisation of the model, similar to a cross-validation strategy.
The evaluation of the accuracy metric leads us to consider as an optimal model the ones that ensemble a total of 60 or 71 trees (Fig. 2), showing an OOB accuracy of 81.03%. According to the principle of parsimony (Occam's razor) recommended in the context of modelling 26 , we decided to use the simplest model, therefore, the one covering 60 trees was taken. The assessment over the test dataset provided an accuracy of 79.38% (see Table 2).
A list representing the relative importance of the variables in this model is also shown in Fig. 2. This relevance is established based on the degree of participation of these variables in each of the 60 trees that comprise the model. The fact that the same variable can participate in the same tree several times justifies that the participation of each variable can be greater than 60. Therefore, it can be seen that the E1SO4 value average over the four weeks of the last month (E1SO4_MONTH variable) is the most important effect to be considered. The same average values of Progesterone-PROGESTERONE_MONTH and the percentage variation of Estriol-ESTRIOL_ VAR_WEEK over the last two weeks were, respectively, the second and third most relevant variables, being the   www.nature.com/scientificreports/ only ones that participate in more than 100 trees. The first variable associated with cortisol hormone appeared in the seventh position.
In Table 2, the confusion matrices associated with both models are presented. The accuracy, the positive and the negative predictive value (PPV and NPV, respectively) showed that the random forest model provides better results than the logistic model. The area under the Receiver Operating Characteristics (ROC) curve, one of the most important evaluation metrics to verify the performance of any classification model 27 , showed that the random forest model works better also.
Moreover, two criteria were used as a baseline to compare the accuracy of the models. Firstly, the maximum randomness criterion, consisted of assigning the dominant class (birth = 0) to all cases. Although we considered this criterion to be very conservative, it was observed that the reference value it provides is 67/97 (69.07%), which is equal to the accuracy of the logistic regression, but significantly less than that of the random forest model (79.38%). Secondly, the proportional randomness criterion consisted of randomly assigning 0 and 1 values according to the actual proportion of the binary event analysed. Assuming that the predictions are made at random regardless of the real values, the expected probability of coincidences using this criterion is 0.5727 (57.27%).
being p = P(actualBirth = 1) and 1 − p = P(actualBirth = 0). The second criterion is considered reasonable to test the true goodness of fit of the model. Then, in the case of the multivariate logistic regression model, the reference precision improves by 11.80 percentage points (69.07-57.27%), while with the random forest model the improvement is 22.11 percentage points (79.38-57.27%) ( Table 5).
These results led us to propose the random forest model as appropriate to predict the week in which birth is most likely to occur, although it has less explanatory power than the logistic regression model.
Finally, the confusion matrices, detailed per week, are presented in Table 5. It was observed that the accuracy of the model is higher in the 37th week (85.71%), but this result must be interpreted carefully because in this week, the model tends to predict birth = 0, taking into account that it is the week in which fewer births occur. It should be considered that this prediction was provided by the model in 33 of the 35 cases, and hence in almost 94.30% of the cases. This allowed the highest NPV value (87.88%) but, this should not be considered as an important result because the maximum randomness criterion would provide a value of 85.71% (30/35). On the other hand, the number of positive cases predicted in each week was similar to the actual cases in the weeks numbered 38, 39, 40 (eight vs. 10, six vs, eight, eight vs. seven, respectively) and therefore, the corresponding results could be considered more interesting.

Discussion
In this study, the physiological variations of P4, C, E3 and E1SO4 during the third trimester are shown. These variations played an important role in pregnancy and in the preparation of labour mechanisms. It is convenient to explain that the activation of labour was not directly related to changes in the levels of estrogen and progesterone; however, this event could be affected by an alteration in the response of the myometrium to the expression of www.nature.com/scientificreports/ estrogen receptors (ER) and progesterone receptors (PR). The interaction between PR and ER of the myometrium prevents the action of estrogens during most of the pregnancy, while in preparation for delivery, there is a withdrawal of progesterone and estrogen activation 28 .
We implemented the analysis of these hormones in saliva samples, a useful non-invasive method to predict the week of spontaneous delivery. Our results revealed a positive hormonal correlation between all studied hormones, with C being the strongest correlation to the rest of hormones. Proper functioning and interaction of steroidogenic tissues, in addition to the enzymes involved in their synthesis and transport, are crucial since intrauterine exposure of the fetus to abnormal glucocorticoid levels 29 or sex hormones 30, 31 can negatively affect fetal development. Some authors state that there is a relationship between fetal sex with estrone and estradiol levels during pregnancy, however, they did not find statistically significant results 12 .
The role of the different sex hormones has been extensively studied, although there are hardly any articles that interrelate these levels of estrogens, progestogens and glucocorticoids and their variations throughout the third trimester of pregnancy. According to the results obtained in this study, the hormonal analysis of P4, E3, C and E1SO4 at specific times of the third trimester were good indicators of normal values in low-risk pregnancies, and their determination could provide additional information relevant to the control of pregnancy.
In addition, the exponential increase in P4, C, E3 and E1SO4 in third trimester, the period in which the fetus reaches maturity 32 , prepares the fetus and mother for the start of labour, as reported by other authors 8,[33][34][35] . Although its biological importance is ambiguous, E1SO4 seems to serve as an estrogenic reservoir linked to the endometrium transformation of the endometrium necessary for pregnancy maintenance 36 . The increase in E1SO4 at week 35 determines the increases in the other hormones studied. Therefore, it is critical to take this measurement last during the third trimester of pregnancy, in order to obtain a better understanding of the onset of labour processes.
The relevance of P4, C, E3 and E1SO4 in pregnancy and the onset of labour led us to develop a statistical model that could predict the week of birth. In recent years, the need to develop new prognostic methods in the field of obstetrics in order to improve clinical practice regarding the prediction of premature births, delivery methods, and development of pre-eclampsia, among others, has been emphasised 37 .
The knowledge of the week of spontaneous birth one week in advance has multiple benefits for both the pregnant woman and the health personnel, and has been very useful for reducing the number of inductions to labour for post term pregnancies. The development of a mathematical model with a positive predictive value of 70.83%, as well as a negative predictive value of 82.19%, was capable to relate from a woman the variations of the different hormones of only four saliva samples collected from week 34 and to predict childbirth. Therefore, this model is a novel and great advance in the field of obstetrics.
The application of the statistical model developed in low-risk pregnant women, routinely from week 34, contributed to improving the allocation of health resources based on the number of expected deliveries, as an increase in the number of midwives and obstetricians is associated with a decrease in the cesarean section ratio 38 . Also, its routine application contributed to reducing the number of inductions of labour (IOL) for post-term pregnancy, confirming that labour will begin the week following the application of the statistical model. This action could specify the maximum period to maintain a pregnancy, giving the opportunity of a spontaneous delivery and reducing the complications associated with IOL as postpartum haemorrhage, uterine rupture 39 , instrumental deliveries, cesarean sections and used of epidural anaesthesia 40, 41 , among others.
For future research, among other aspects, the inclusion of criteria such as pre-pregnancy body mass index, previous preterm birth or history of progesterone medication, would enhance the results of future randomised forest models 45 .
Therefore, due to the development of a statistical model that predicts the due date in low-risk women based on the levels of P4, SO4E1, E3 and C from week 34, fetal survival, the allocation of health resources based on the number of expected delivery, as well as the reduction of the number of inductions to post-term delivery, could be markedly improved.  Table 6), of which two of the initial participants dropped out before the beginning of the study. We therefore commenced with a total of 159 women, 53 of whom dropped out throughout the study, reaching a total of 106 who completed the study. The exclusion criteria were kidney disease, thyroid disease, autoimmune disease, cancer, pregestational diabetes, pre-gestational hypertension, in vitro fertilisation with heparin treatment, and current steroid treatment.
Eligible women underwent weekly collection of a saliva sample during the third trimester of pregnancy (from week 27 until week of delivery), with a Salivette (Sarstedt, Germany) collection tube. All samples were collected at an established time (10:00 am ± 1 h).
This study was approved by the Clinical Research Ethics Committee of Hospitales de Madrid, with the code CEIm HM hospitals: 16.06.0960-GHM, in accordance with the World Medical Association and the Declaration of Helsinki. Table 6. Data for all recruited women. All indicators were applied according to the SEGO recommendations. *Spontaneous rupture of membranes was considered spontaneous birth as between 60-80% of births occur within 24 h after SROM and approximately 90% occur by 48 h, however, due to risk of neonatal infection, all study participants were induced 12 h post SROM, as per hospital protocol. **Cesarean delivery without trial of labour was considered as those performed during latent phase of labour without reaching active labour. Elective cesarean section cases were discarded from the total samples. obtained was stored at a temperature of − 20 °C until further hormonal analysis. Polyclonal antisera raised in male rabbits against: 11α-hydroxyprogesterone 11α hemisuccinate BSA (P4), estrone-3-glucuronide: BSA (SO4E1), cortisol 3-CMO: BSA (cortisol) and estriol -6-CMO: BSA (Estriol) were used for assay development (Steraloids. Inc., Newport, RI). P4 (ab: C914), E1SO4 (ab: R522-2), E3 (ab: R4835) and C (ab: R4866) concentrations were assayed by enzyme-immunoassay (EIA). Briefly, microtiter plates (96-well flat-bottom polystyrene) were coated overnight at 4° C with the appropriate antibody dilutions. Then, plates were washed 3 times and conjugate working solutions (CWS) were prepared by diluting conjugate stocks in assay buffer. Standards and saliva samples were diluted in CWS and analyzed in duplicate. To achieve a competitive reaction, plates were incubated at room temperature for 2 h. Plates were washed 3 times with wash buffer and Enhance K-Blue TMB substrate (Neogen, Lexington, KY) was added to each well. To stop colorimetric reaction, stop solution was added to each well. Absorbance was read at 450 nm in an automatic microplate reader. Hormone concentrations were calculated by means of software developed for this technique (ELISA AID, Eurogenetics, Belgium). A standard dose-response curve was constructed by plotting the binding percent-age (B/B0 × 100) against steroid hormone standard concentrations. P4, E3, SO4E1 and C concentrations were expressed in ng/ml 42 .
The validation technique parameters' recovery rates, sensitivity, intra-and inter-assay coefficients of variation and parallelism were assayed as previously reported by Illera et al. 42 . The EIA techniques and antibodies used were developed and validated in the Endocrinology Laboratory of the Department of Animal Physiology (Faculty of Veterinary Medicine, University Complutense of Madrid, Spain).

Statistical analysis.
In order to achieve the desired precision for the estimates of the population mean hormonal values in each week of observation, the sample size was estimated with a confidence level of 95%, and an error of the estimate of ± 0.2 times the standard deviation, according to this consideration we obtained n ≥ 96.04.
To achieve the required precision and based on previous experience with other studies, an approximate loss between 30 and 40% of women has been estimated throughout the sample collection process (in the worst case, the dropout rate was 40%, leading us to specify n ≥ 161). Finally, after concluding the study, n = 106, which is above the initial consideration resulting in a dropout rate 34.2%.
Descriptive statistical analysis for mean and standard deviation of each hormone based on the week of delivery was performed. Repeated measures correlation coefficient (rmcorr) between hormonal values during the third trimester were estimated using PROC MIXED procedures by SAS 24 .
Validation technique parameters (recovery rates, sensitivity, intra-and inter-assay coefficients of variation) were calculated according to Andreasson et al. 43 . Parallelism was calculated using ANCOVA analysis 44 . Data were expressed as mean ± standard error. In all statistical comparisons, type I error set to maximum of 0.05.

Methodology for the construction of a childbirth prediction model. The methodology employed
for predicting the probability of spontaneous delivery in the week following from the latest hormonal levels observed is presented in this section. This methodology is based on two phases: processing data and adjustment of mathematical model.

Data processing.
A binary variable has been defined for each pair (Woman, Week), identifying whether a subject gives birth (1) or not (0) each week. This is the variable whose value is to be predicted, and its distribution by weeks is shown in Table 7. This table presents a total of 292 cases, distributed into 202 cases with a target equal to 0 and 90 cases with a target equal to 1. This table is divided into two tables (Table 8), one named "training dataset used to adjust the model" and another one named "test dataset used to check it". The proportions associated with one and another dataset are 2/3 and 1/3, respectively. The cases assigned to each dataset were selected at random, taking into account that the patients registered in one and the other table must be exclusive.  www.nature.com/scientificreports/ Once the target (variable to be predicted) was defined, it was necessary to define the explanatory inputs used to make the prediction. Basically, these variables measured mean values and percentage variations of the hormonal indicator during different time periods. The detail of the definition of these variables is shown in Table 9. Considering that the data was collected on Mondays of each week, the explanatory variables were measured in the four weeks prior to which the probability of delivery was estimated, including the forecast week itself. Therefore, the hormonal data collected at weeks 34, 35, 36, and 37 were used to predict whether the birth would occur at week 37, as well as weeks 38 and 39, and finally the data collected at weeks 37, 38, 39 and 40 were used to predict whether the birth would occur at week 40 or week 41. The resulting table, made up of the pair (Woman, Week), the binary target variable (birth or non-birth) and the 22 explanatory variables, served as input for different statistical models.