The amniotic fluid cell-free transcriptome in spontaneous preterm labor

Amniotic fluid (AF) cell-free RNA has been shown to reflect physiological and pathological processes in pregnancy, but its value in predicting spontaneous preterm delivery is unknown. Herein we profiled cell-free RNA in AF samples collected from women who underwent transabdominal amniocentesis after an episode of spontaneous preterm labor and subsequently delivered within 24 h (n = 10) or later (n = 28) in gestation. Expression of known placental single-cell RNA-Seq signatures was quantified in AF cell-free RNA and compared between the groups. Random forest models were applied to predict time-to-delivery after amniocentesis. There were 2385 genes differentially expressed in AF samples of women who delivered within 24 h of amniocentesis compared to gestational age-matched samples from women who delivered more than 24 h after amniocentesis. Genes with cell-free RNA changes were associated with immune and inflammatory processes related to the onset of labor, and the expression of placental single-cell RNA-Seq signatures of immune cells increased with imminent delivery. AF transcriptomic prediction models captured these effects and predicted delivery within 24 h of amniocentesis (AUROC = 0.81). These results may inform the development of biomarkers for spontaneous preterm birth.

The World Health Organization defines preterm birth as the delivery of a neonate between 20 and 37 weeks of gestation 1 . Globally, 14.84 million preterm births and 1.1 million prematurity-related deaths occur each year 2,3 . In the United States, the incidence of preterm birth rose steadily after 2014 and has plateaued at nearly 10% since 2018 4 . Preterm birth is the leading cause of neonatal mortality (death within 28 days of delivery) and morbidity (e.g., neonatal respiratory morbidity and NICU triage/admission), as well as of under-five mortality (death before 5 years of age) 3,5 . Prematurely born infants are at elevated risk of developing chronic diseases later in life, including neurological disorders (e.g., learning disabilities) and cardiovascular diseases 6,7 . Preterm birth also poses a substantial economic burden on society: approximately $26.2 billion is spent annually on the care of prematurely born infants 8 . Hence, there is a sustained effort to develop strategies to prevent preterm birth and reduce its impact. Although mortality rates have been successfully reduced, other adverse neonatal outcomes associated with prematurity remain consistently prevalent [9][10][11] .
Preterm birth is either iatrogenic (i.e., medically indicated because of a disease such as preeclampsia) or spontaneous, following preterm labor or preterm prelabor rupture of the membranes 12,13 . Spontaneous preterm labor accounts for most preterm births. It has multiple underlying etiologies, and these culminate in the activation of a common pathway of labor that leads to spontaneous preterm delivery 13,14 . There is evidence supporting a causal relationship between microbial-associated or sterile intra-amniotic inflammation and spontaneous preterm labor and birth [15][16][17][18][19][20][21][22][23][24] . Intra-amniotic inflammation, however, is responsible for only a subset of cases of spontaneous preterm labor and delivery, while most cases of preterm labor are considered idiopathic, i.e., of unknown etiology.
Scientific Reports | (2021) 11:13481 | https://doi.org/10.1038/s41598-021-92439-x www.nature.com/scientificreports/
To address the complex public health problem of preterm birth, biomarkers for preterm labor need to be developed [25][26][27][28] . Currently, risk assessment for preterm birth relies on maternal factors. These include a history of preterm birth 29 , late miscarriage 30 , or cervical excisional surgery 31 ; a sonographic short cervix 32 or a low customized cervical length percentile 33 during the current pregnancy; amniotic fluid sludge 34 ; and an abnormal cervical consistency index 35 . Biochemical markers have also been suggested as predictive of preterm birth, e.g., fetal fibronectin (fFN) 36 , phosphorylated insulin-like growth factor binding protein-1 (phIGFBP-1) 37 , placental alpha microglobulin-1 (PAMG-1) 38 , inflammatory cytokines 39 , and cervical acetate 40 . The vaginal microbiome 41 and the maternal blood transcriptome 42 have been proposed as well. Combinations of biomarkers have shown superior predictive performance compared to any single test 43 . However, given the syndromic nature of spontaneous preterm labor, prediction performance needs to improve beyond that of current biomarkers 43,44 .
Amniotic fluid (AF) surrounds the developing fetus and is in continuous exchange with fetal organs and gestational tissues [45][46][47] ; hence, this fluid is a rich source of potential biomarkers for preterm labor and birth 48 . The cell-free supernatant that remains after the AF's cellular components are removed reflects fetal well-being and pregnancy status 46,47,[49][50][51][52][53][54][55][56] . Indeed, studies have described changes in the AF cell-free transcriptome in fetal genetic disorders, such as trisomy 18 50 , trisomy 21 49 , and Turner syndrome 52 , as well as in fetal growth restriction 54 and preeclampsia 55 . Others have investigated the association between the AF cell-free transcriptome and maternal factors such as race, obesity, and smoking status 51,56 , as well as neonatal morbidity 53 . Yet the AF cell-free transcriptome in spontaneous preterm labor has not been assessed in prior investigations.
To address the current knowledge gap, we performed whole transcriptome profiling of AF in mothers who underwent transabdominal amniocentesis after an episode of preterm labor and then utilized machine learning to predict the time-to-delivery after amniocentesis. Given the recent development of cell-type-specific signatures based on single-cell genomic studies of the placenta 57-61 and the relevance of these signatures in identifying preeclampsia 58,62 and preterm parturition 61 , we have also assessed the perturbations of these signatures in AF cell-free RNA.

Results
Demographic characteristics of the study participants. We examined the AF cell-free transcriptome in samples collected from 38 women who had a transabdominal amniocentesis performed after an episode of preterm labor (Fig. 1a). Women were divided into two groups according to the interval from amniocentesis to delivery (Fig. 1b): (1) women who delivered within 24 h of amniocentesis (n = 10) and (2) women who delivered after 24 h from amniocentesis (n = 28). Table 1 presents a comparison of the clinical characteristics of women between the two groups.
Women who delivered within 24 h of amniocentesis had smaller babies than those who delivered after 24 h (median birth weight: 1907.5 g vs. 2142.5 g, p = 0.047); yet, there were no differences in birth weight percentiles. Three AF measures were higher in women who delivered within 24 h of amniocentesis than in women who delivered after 24 h: interleukin (IL)-6 concentrations, the frequency of glucose levels < 14 mg/dl, and the frequency of white blood cell counts ≥ 50 cells/mm³. Other fetal and maternal characteristics were similar between the two groups. Importantly, gestational age at amniocentesis did not differ significantly between women who delivered within 24 h of amniocentesis and women who delivered after 24 h (median gestational age at amniocentesis: 32.8 weeks vs. 31 weeks, p > 0.5).
Amniotic fluid WBC count, IL-6 concentration, and culture results inform clinical decisions in patients with preterm labor who undergo amniocentesis. These decisions may include administering tocolytic agents to inhibit myometrial contractions, administering antenatal corticosteroids to accelerate fetal lung maturity, or early delivery in cases of severe infection/inflammation. However, none of the 38 women in this study had preterm labor induction; only one woman had labor induced, at 40 weeks. Therefore, indicated early delivery was not a confounding factor in the analysis of cell-free RNA data. The clinical decision to administer tocolytic agents may affect the interval from amniocentesis to delivery. However, this is unlikely to explain the group differences here, since tocolytic agents were administered at similar rates in the two groups (delivery ≤ 24 h vs. > 24 h after amniocentesis: 40% vs. 46.4%, respectively) (Table 1).
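Table 1 compares categorical variables such as tocolysis with a two-sided Fisher's exact test, and that comparison can be illustrated in plain Python. This is a minimal sketch, not the software used in the study. The counts 4/10 and 13/28 are an assumption, back-calculated from the reported 40% and 46.4%.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # probability that the top-left cell equals x given the margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    # small relative tolerance so that floating-point ties are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: delivery <= 24 h, delivery > 24 h; columns: tocolysis yes, no.
# Counts back-calculated from 40% of 10 and 46.4% of 28 (assumption).
p_tocolysis = fisher_exact_two_sided(4, 6, 13, 15)
print(f"Fisher's exact test, tocolysis by group: p = {p_tocolysis:.3f}")
```

With these margins, the observed table is the most probable one under the hypergeometric distribution. The two-sided p-value is therefore essentially 1, in line with the statement that tocolysis rates were similar.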
Differential expression with imminent delivery after amniocentesis. Hierarchical clustering (Fig. 1c) based on the most variable genes and principal component analysis (Fig. 1d) showed an overall separation between women who delivered within 24 h of amniocentesis and those who delivered after 24 h. The first principal component, which captured 36% of the variation, was correlated with gestational age at amniocentesis (Pearson correlation = 0.36, p = 0.027).
The comparison of AF cell-free transcriptomes of women who delivered within 24 h of amniocentesis and those from women who delivered after 24 h showed differential expression (fold change > 1.25, q value < 0.1) of 2385 genes (1508 up-regulated and 877 down-regulated) (Table S1, Fig. 2).
The five most up-regulated genes were CCL4 (C-C motif chemokine ligand 4), IL1B (interleukin 1 beta), AQP9 (aquaporin 9), BCL2A1 (BCL2 related protein A1), and CXCL8 (C-X-C motif chemokine ligand 8). Functional analysis of the up-regulated differentially expressed (DE) genes showed an over-representation of 1918 biological processes, 171 cellular components, and 143 molecular functions (Table S2). The most enriched biological processes were related to inflammatory and immune responses to stimuli, e.g., pattern recognition receptor signaling pathway, leukocyte-mediated immunity, NIK/NF-kappaB signaling, and cytokine-mediated signaling pathway. Significantly over-represented cellular components included terms related to the extracellular region, membrane, cytoplasmic vesicle part, and I-kappaB/NF-kappaB complex. The over-represented molecular functions […] significantly increased in the AF cell-free transcriptome of women who delivered within 24 h of amniocentesis compared to women who delivered after 24 h (q < 0.05) (Fig. 3a). These included tissue- and cell-type-specific signatures of organs (e.g., fetal lung, liver, and olfactory bulb) and immune system-related signatures (bone marrow, lymph nodes, whole blood, T cells, B cells, monocytes, and natural killer cells).
Changes in placental scRNA-Seq signatures in AF with imminent delivery. Using the same type of analysis described above for tissues, we also analyzed changes in placental scRNA-Seq signatures. These signatures were derived from three placental compartments (basal plate, placental villi, and chorio-amniotic membranes) of women with term or preterm labor 61 . Average expression of the genes specific to the following cell types was significantly increased in women who delivered within 24 h of amniocentesis compared to those who delivered after 24 h: monocytes, myeloid progenitor cells, dendritic macrophages, activated T cells, B cells, natural killer cells, and extravillous trophoblasts (Fig. 3b).
Prediction of time-to-delivery. Based on the differential expression analysis, we concluded that the AF cell-free transcriptome of women with preterm labor reflects the inflammatory response that precedes delivery. We hypothesized that predictive models capturing these effects would predict the time-to-delivery after amniocentesis. Our modeling strategy first selected the RNAs most informative about the interval from amniocentesis to delivery in a multivariate evaluation. Random forest models were then fitted to the gene expression data to predict time-to-delivery as a continuous variable. The cross-validated prediction of time-to-delivery by the transcriptomic model was significant, with a Spearman's correlation coefficient of 0.49 (p < 0.001) and a root-mean-square error (RMSE) of 3.1 weeks (Fig. 4a). Delivery within 24 h of the procedure was also assessed as a binary outcome. In this case, prediction from the model's time-to-delivery estimates was likewise significant, as shown by receiver operating characteristic (ROC) curve analysis (Fig. 4b).
The areas under the ROC curve (AUROC) for prediction of delivery within 24 h, 1 week, and 2 weeks were 0.81, 0.74, and 0.72, respectively. We assessed the robustness and reproducibility of the genes selected as predictors during cross-validation. To do so, we calculated the average Jaccard similarity (0.82) and the average kappa coefficient (0.9) between all sets of predictor genes identified across leave-one-out iterations. Based on Kursa's 64 definition of significantly self-consistent selection, the 53 genes most predictive of time-to-delivery in AF samples are listed in Table 2. All 53 predictor genes were up-regulated in the AF cell-free RNA of women who delivered within 24 h of amniocentesis compared to those who delivered later. Of note, 23 genes were selected in all iterations of leave-one-out cross-validation; these included IL1B (interleukin 1 beta), CXCL8 (C-X-C motif chemokine ligand 8) […].
Figure 5 shows a highly connected protein-protein interaction network built from the corresponding 53 predictor genes. Several enriched Gene Ontology biological processes related to the immune and inflammatory responses are highlighted in this figure (q value < 0.05).
Table 1. Demographic characteristics of the women included in the transcriptomics study. Continuous variables were compared with Welch's t-test and are summarized as medians (interquartile range). Categorical variables are shown as number (%) and were compared using Fisher's exact test. (a) Composite neonatal morbidity was defined as the presence of any of the following complications: 5-min Apgar score < 7, bronchopulmonary dysplasia, pulmonary hypoplasia, respiratory distress syndrome, necrotizing enterocolitis, intraventricular hemorrhage, periventricular leukomalacia, retinopathy of prematurity, neonatal sepsis, or NICU admission.
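The two stability summaries can be computed as below. This is a minimal sketch under two assumptions, because the study does not state how kappa was computed. First, averages are taken over all pairs of leave-one-out gene sets. Second, kappa is Cohen's kappa on the binary selected/not-selected indicator across the whole gene universe. The function names and toy gene sets are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cohen_kappa_sets(a: set, b: set, n_genes: int) -> float:
    """Cohen's kappa between two selections, each gene being selected or not."""
    agree = len(a & b) + (n_genes - len(a | b))  # selected in both or in neither
    p_obs = agree / n_genes
    pa, pb = len(a) / n_genes, len(b) / n_genes
    p_exp = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0


def mean_pairwise(sets, metric):
    """Average a pairwise similarity over all unordered pairs of gene sets."""
    pairs = list(combinations(sets, 2))
    return sum(metric(x, y) for x, y in pairs) / len(pairs)


# Hypothetical predictor sets from three cross-validation folds
folds = [{"IL1B", "CXCL8", "CCL4"}, {"IL1B", "CXCL8", "CCL20"}, {"IL1B", "CXCL8", "CCL4"}]
n_genes = 32907  # size of the gene universe profiled in this study
print(mean_pairwise(folds, jaccard))
print(mean_pairwise(folds, lambda x, y: cohen_kappa_sets(x, y, n_genes)))
```

Jaccard ignores genes that neither run selected. Kappa credits agreement on non-selection but discounts agreement expected by chance. For that reason, the two summaries can differ when only a few genes out of many are selected.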

Discussion
Spontaneous preterm labor is a syndrome with many etiologies, which may include intra-amniotic inflammation with or without microbial infection, oxidative stress, and placental dysfunction 14 . Accurate prediction and mitigation of spontaneous preterm birth remain challenging. Identifying symptomatic patients at the greatest risk of impending delivery has several benefits: obstetricians can implement prophylactic interventions, arrange timely transfer to tertiary care centers, and guide antenatal therapy and postnatal care intended to reduce the risk of adverse neonatal outcomes [65][66][67][68] . Reassuring mothers that they are not at risk of imminent delivery also alleviates stress, thereby improving the likelihood of an uncomplicated pregnancy 69 . We hypothesized that the proximity of amniotic fluid to the fetus and gestational tissues makes it an ideal source for exploring the transcriptomic perturbations that precede delivery. Herein we have characterized, for the first time, AF cell-free transcriptomic changes and placental single-cell RNA signatures in women who present with an episode of preterm labor. These data could inform the development of biomarkers in subsequent studies based on minimally invasive samples, such as maternal blood.
The current study included women (n = 38) who had transabdominal amniocentesis performed after an episode of preterm labor, of whom 10 delivered within 24 h and 28 delivered later in gestation. All women who delivered within 24 h of amniocentesis had a preterm birth, whereas 57% (16/28) of women delivering after 24 h had a preterm birth. Intra-amniotic inflammation (IL-6 ≥ 2600 pg/ml) was diagnosed in 80% (8/10) of women who delivered within 24 h of amniocentesis and in 29% (8/28) of women who gave birth later in gestation. Consistent with this observation, comparison of AF cell-free transcriptomes between the two groups revealed up-regulation of genes involved in the innate and adaptive components of the immune system, including myeloid leukocyte activation, regulation of complement activation, toll-like receptor […]. Genes with lower expression close to delivery were enriched for intracellular biological processes, e.g., cellular macromolecule metabolic process, organelle organization, cytoskeleton organization, cell cycle, and embryonic development. Thus, as delivery approached, we observed a shift in the amniotic fluid milieu. It moved from a milieu that reflects fetal organ maturation and growth to a pro-inflammatory phenotype, consistent with stimulation of a maternal and/or fetal immune response. The activation of the immune response was accompanied by increased expression of genes coding for pro-inflammatory cytokines (e.g., IL-1α, IL-1β, and IL-6), chemokines (e.g., CCL20, CXCL5, CCL5, and CXCL8), matrix metalloproteinases (e.g., MMP1, MMP8, and MMP9), nuclear factor kappa B (NFKB1 and NFKB2), and prostaglandin-endoperoxide synthase 2. All of these have been implicated in the pathogenesis of the preterm parturition syndrome 18,74 .
Taken together, these results suggest that activation of fetal and maternal immune responses may trigger preterm parturition in women who delivered within 24 h of amniocentesis. This response can arise when microorganisms invade the amniotic cavity (intra-amniotic infection/inflammation) or even in the absence of detectable microbes (sterile intra-amniotic inflammation) 19 . We did not apply the molecular microbiological techniques that can reliably discriminate between these two conditions 75 . However, a previous report suggests that sterile intra-amniotic inflammation is more prevalent than intra-amniotic infection/inflammation in preterm deliveries 19 . Sterile intra-amniotic inflammation is thought to be initiated by danger signals, or alarmins, derived from cellular stress or from the necrotic release of intracellular contents into the intra-amniotic space 20,[76][77][78][79][80] . This process involves the activation of the NLRP3 inflammasome 22,23,[81][82][83][84][85][86] . Indeed, we observed increased expression of putative activators, components, and effectors of the inflammasome, such as IL-1α, baculoviral IAP repeat containing 3, guanylate binding protein 5, NLRP3, caspase-1, and IL-1β 84,87,88 .
In the current study, we also interrogated the AF cell-free transcriptome for two purposes: to identify predictors of time-to-delivery after amniocentesis, and to evaluate their performance in the risk stratification of women who present with symptoms of preterm labor. Previously, Ngo et al. 42 developed a transcriptomic model of time-to-delivery based on longitudinal maternal blood cell-free RNA. They reported that a model based on term deliveries did not predict gestational age at delivery in women with preterm birth (RMSE, 11.4 weeks). Herein, we used AF cell-free transcriptomic data from women with preterm labor to train machine-learning models that estimate the interval from amniocentesis to delivery. The cross-validated transcriptomic models showed significant prediction (Spearman's correlation 0.49, RMSE 3.1 weeks). We then translated the continuous time-to-delivery estimates into binary predictions of delivery within 24 h, 1 week, and 2 weeks. The risk of imminent delivery was predicted with AUROCs of 0.81, 0.74, and 0.72, respectively. These results point to the potential of the AF cell-free transcriptome to predict pregnancy duration after preterm labor.
A parsimonious set of 53 genes was reliably retained as predictors during cross-validation. These genes were up-regulated with imminent delivery, and functional analysis identified biological processes related to an immune/inflammatory response to a stimulus. Biomarkers targeting these processes have also been identified in predictive models based on a whole-blood transcriptome 42,89,90 . The pro-inflammatory cytokines IL-1β, CXCL8, CCL3, CCL4, and CCL20 were among the strongest predictors and were selected in all iterations of leave-one-out cross-validation. Several studies have shown increased amniotic fluid concentrations of the proteins encoded by these genes in women with preterm labor due to intra-amniotic inflammation/infection 20,[91][92][93][94] . A causal role in preterm birth has been established for IL-1β in animal models 15,95 . We have previously reported that the amniotic fluid concentrations of these cytokines can predict the risk of early preterm delivery 94 . Other strong predictors included pleckstrin (PLEK), B-cell lymphoma 2-related protein A1 (BCL2A1), solute carrier family 34 member 2 (SLC34A2), aquaporin 9 (AQP9), and TNF alpha induced protein 6 (TNFAIP6). Upon phosphorylation by protein kinase C, PLEK increases cytokine secretion in phagocytes and acts as an adaptor contributing to the microbicidal activity of neutrophils 96,97 . BCL2A1 is an anti-apoptotic protein shown to prolong chorio-decidual neutrophil survival in preterm rhesus macaques in an IL-1-dependent manner 98 . The SLC34A2 gene is expressed in alveolar type II cells of the fetal lung and may be involved in the cellular uptake of phosphate for surfactant production 99 . However, the expression of genes coding for surfactant proteins did not differ significantly between women delivering within 24 h of amniocentesis and women who delivered after 24 h. This suggests that fetal lung maturity may not be related to imminent delivery in this study.
Instead, the overexpression of SLC34A2 and AQP9 may reflect metabolic adaptations needed to sustain the immune response 100 . Of note, the current study has no immediate, direct consequences for the management of spontaneous preterm labor. However, it provides an RNA-level signature that could in the future be correlated with maternal blood omics data, allowing non-invasive approaches to risk assessment to be developed. As with amniotic fluid IL-6 [101][102][103] , patients identified as being at high risk of imminent preterm delivery could benefit from currently available interventions. These include antenatal corticosteroids to accelerate fetal lung maturity 66,104 and tocolytic treatment to inhibit myometrial contractions. Tocolysis can in turn allow in utero transfer to tertiary centers with neonatal intensive care units, which can provide better care for prematurely born neonates 67,68,104,105 .
Strengths and limitations. This study is the first to describe the cell-free transcriptome of AF in women with spontaneous preterm labor. The main limitation of this study is that no additional targeted validation studies were performed. However, our previously reported in-silico analysis 56 used the same sample collection, RNA extraction, and expression quantification in the same population. That analysis demonstrated that gestational age-specific effects in the transcriptome strongly correlate with independent reports. Those reports were based on samples from women who were not in labor and who had different ethnic backgrounds than the women studied herein, most of whom self-identified as African-American. Despite the moderate sample size, the cross-validation strategy enabled a robust evaluation of predictive analytic approaches and the identification of a parsimonious set of candidate cell-free mRNAs.

Conclusion
The AF cell-free transcriptome changed in women who delivered within 24 h of amniocentesis compared to those who delivered later in gestation. These changes indicated the establishment of an inflammatory milieu in the intra-amniotic space in response to a pathologic stimulus. Placental single-cell-specific gene signatures of critical immune cell types were also dysregulated in these women. These effects are key features captured by the transcriptomic models predicting the duration of gestation after transabdominal amniocentesis.

Materials and methods
Study design. Pregnant women were enrolled into a prospective longitudinal study at the Center for Advanced Obstetrical Care and Research of the Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health, U.S. Department of Health and Human Services, in the Detroit Medical Center and Wayne State University. We designed a retrospective cross-sectional study from this cohort to include women who underwent transabdominal amniocentesis after an episode of preterm labor. We excluded cases with multiple gestations and genetic anomalies. The final dataset included 38 AF samples from 12 women who went on to deliver at term and 26 women who delivered preterm. Of the 38 women, 37 delivered after the spontaneous onset of labor, with augmentation of labor required in 7 women. Labor was only induced in one term pregnancy (40.7 weeks) more than 8 weeks after amniocentesis. There were no cases of preterm prelabor rupture of the membranes. All women provided written consent for the use of biological specimens and metadata in research prior to sample collection. The Institutional Review Boards of Wayne State University and NICHD approved the study protocol.
Clinical definitions. Gestational age was determined based on the date of the last menstrual period and a first or early second-trimester ultrasound examination. Term labor was defined as the presence of regular uterine contractions with a frequency of at least one every 10 min and cervical changes occurring after 37 weeks of gestation 106 . Spontaneous preterm labor was defined as the spontaneous onset of labor with intact membranes before 37 weeks of gestation.
Amniotic fluid samples. Obstetricians used a 22-gauge needle to withdraw AF transabdominally under ultrasound guidance and antiseptic conditions. Amniotic fluid was immediately transported to the research laboratory in a capped sterile syringe and centrifuged at 350×g, and the supernatant (5 ml) was immediately stored at −80 °C 46 .
RNA extraction. RNA […]
Microarray data preprocessing. The Robust Multi-array Average 109 method implemented in the R/Bioconductor oligo package 110 was applied to the raw probe-level microarray data. It background-corrected, normalized, and summarized the data into gene-level expression summaries, based on the probe-to-gene assignment provided by a custom chip definition file 111,112 . Because samples were profiled in multiple batches, we used the removeBatchEffect function of the R/Bioconductor limma package 113 to correct batch effects. To assess sources of variability in the transcriptomic data, we conducted principal component analysis on the complete set of 32,907 genes.
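RMA combines three steps: background correction, quantile normalization, and median-polish summarization of probe intensities on the log2 scale. The sketch below illustrates only the quantile normalization step in plain Python. It is not the oligo implementation. Ties are broken by sort order as a simplification, and the function name is hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples so that all share the same value distribution.

    columns: list of samples, each a list of intensities for the same probes.
    Each value is replaced by the mean, across samples, of the values holding
    the same rank; ties are broken by sort order (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across samples
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization, every sample contains exactly the same set of values (here 1.5, 3.5, and 5.5). Only the within-sample ordering of probes is preserved.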

Differential expression analysis.
We compared the AF cell-free transcriptome between groups by fitting linear models implemented in the limma 113 package. Statistical significance required a fold change of at least 1.25 and a false discovery rate-adjusted p value (q value) < 0.1. We summarized the results of differential expression analysis with volcano plots and heatmaps. A hypergeometric test implemented in the GOstats package 114 was used to identify significantly enriched Gene Ontology 115 biological processes, molecular functions, and cellular components among the differentially expressed genes (q value < 0.05).
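The significance rule and the enrichment test can be sketched in plain Python as follows. The sketch assumes Benjamini-Hochberg adjustment for the q values; limma's topTable uses "BH" by default, but the text states only "false discovery rate adjusted". Over-representation is tested with a one-sided hypergeometric upper tail, the test GOstats applies for over-representation. The function names and the example numbers are hypothetical.

```python
from math import comb, log2


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(pvalues)
    # Walk from the largest p value down, keeping a running minimum of p * m / rank
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def is_differentially_expressed(log2_fc, q, min_fc=1.25, max_q=0.1):
    """Apply both thresholds: |fold change| > min_fc and q < max_q."""
    return abs(log2_fc) > log2(min_fc) and q < max_q


def hypergeom_upper_tail(k, n_universe, n_term, n_drawn):
    """P(X >= k): chance of at least k term genes among n_drawn DE genes."""
    total = comb(n_universe, n_drawn)
    top = min(n_term, n_drawn)
    hits = sum(comb(n_term, x) * comb(n_universe - n_term, n_drawn - x)
               for x in range(k, top + 1))
    return hits / total


pvals = [0.01, 0.04, 0.03, 0.2]    # hypothetical raw p values
log2_fcs = [1.2, -0.5, 0.2, 2.0]   # hypothetical log2 fold changes
qvals = bh_adjust(pvals)
de_flags = [is_differentially_expressed(fc, qv) for fc, qv in zip(log2_fcs, qvals)]
print(qvals, de_flags)
print(hypergeom_upper_tail(5, 1000, 50, 40))  # hypothetical GO term counts
```

The third gene fails the fold-change threshold, and the fourth fails the q value threshold. This shows that both criteria must hold at once.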
Tissue-specific and single-cell-specific expressions. Tissue-specific genes were defined as those whose median expression in a given tissue was 30 times higher than in all other tissues in the GNF Gene Expression Atlas 63 . Genes specific to a placental cell type were defined based on single-cell RNA-Seq analyses 61 . The log2-transformed expression values for each gene were standardized by subtracting the mean and dividing by the standard deviation, both calculated from the reference study group (term delivery) 56 . The standardized values, referred to as Z scores, were then averaged over the top 20 genes specific to a tissue or cell type. We compared the average Z scores between groups by fitting a linear model. A q value of less than 0.05 was considered significant.
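The per-sample signature score described above can be sketched as follows. The sketch makes two assumptions: expression is held as a nested dictionary of log2 values, and each signature lists genes from most to least specific. Both the data layout and the function name are hypothetical. Genes with zero standard deviation in the reference group are skipped to avoid division by zero, a choice the study does not specify.

```python
from statistics import mean, stdev


def signature_scores(expr, signature, reference_samples, top_n=20):
    """Average Z score of a cell-type or tissue signature in each sample.

    expr: dict gene -> dict sample -> log2 expression (hypothetical layout)
    signature: genes ordered from most to least specific
    reference_samples: samples defining the mean and SD used for Z scores
    """
    genes = [g for g in signature if g in expr][:top_n]
    samples = list(expr[genes[0]].keys())
    z = {}
    for g in genes:
        ref = [expr[g][s] for s in reference_samples]
        mu, sd = mean(ref), stdev(ref)
        if sd == 0:
            continue  # constant in the reference group; Z score undefined
        z[g] = {s: (expr[g][s] - mu) / sd for s in samples}
    return {s: mean(z[g][s] for g in z) for s in samples}


expr = {
    "G1": {"ref1": 1.0, "ref2": 3.0, "case": 5.0},
    "G2": {"ref1": 0.0, "ref2": 2.0, "case": 1.0},
}
scores = signature_scores(expr, ["G1", "G2"], ["ref1", "ref2"])
print(scores)
```

The resulting per-sample averages are what the linear model then compares between groups.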
Random forest prediction models. Random forest is a supervised ensemble machine-learning algorithm for classification and regression tasks 116 that is suitable for high-dimensional data 117,118 . Random forests are relatively robust to feature transformations and to the choice of tuning parameters, while being computationally efficient 119 . They consistently rank among the top performers in studies comparing different supervised machine-learning algorithms 120 . They also provide an internal measure of predictor variable importance based on out-of-bag samples (observations not used to fit individual decision trees) 116 . The R package randomForest 121 was used to fit random forest models with 1000 trees.
Feature selection for the random forest models. Before fitting the random forest models, a preliminary multi-variable feature selection procedure was applied based on the sparse Hilbert-Schmidt independence criterion (SHS) 122 . SHS relies on the kernel-based Hilbert-Schmidt independence criterion (HSIC) to measure the relationship between the gene expression data and the response. By introducing penalties for the number of selected features, SHS chooses a parsimonious set of genes that maximizes the dependence with the response variable while taking into account the correlation between genes. We used the algorithm implementation provided by the authors of SHS to perform this analysis, using MATLAB (version R2018a).
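SHS builds on the empirical HSIC, which the sketch below computes as the biased estimator trace(K H L H) / (n - 1)^2 with Gaussian kernels. It omits the sparsity penalty and the optimization that make SHS a feature selector, so it is not the SHS algorithm itself. The kernel bandwidths of 1.0 are hypothetical defaults; a median-distance heuristic is a common alternative.

```python
from math import exp


def gaussian_gram(points, sigma):
    """Gaussian (RBF) kernel matrix for a list of equal-length vectors."""
    return [[exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2 * sigma ** 2))
             for b in points] for a in points]


def center(gram):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    x, y: lists of vectors (one per sample); larger values mean stronger dependence.
    """
    n = len(x)
    kc = center(gaussian_gram(x, sigma_x))
    gram_y = gaussian_gram(y, sigma_y)
    # trace(HKH L) equals the elementwise sum of HKH * L because L is symmetric
    return sum(kc[i][j] * gram_y[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


# Hypothetical single-gene expression vs. time-to-delivery (weeks)
gene = [(0.0,), (1.0,), (2.0,), (3.0,)]
weeks_dependent = [(0.0,), (1.0,), (2.0,), (3.0,)]
weeks_shuffled = [(0.0,), (3.0,), (1.0,), (2.0,)]
print(hsic(gene, weeks_dependent), hsic(gene, weeks_shuffled))
```

Because HSIC works on kernel matrices, it can capture nonlinear dependence between a multi-gene expression vector and the response. This property is what SHS exploits when it scores candidate gene sets.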
Prediction performance assessment. We used leave-one-out cross-validation to evaluate the predictive performance of the transcriptomic models of time from amniocentesis to delivery. At each iteration of the cross-validation, both feature selection and random forest model fitting were performed. Prediction performance metrics included Spearman's correlation coefficient and root-mean-square error (RMSE). We also calculated the AUROC for delivery within 24 h, 1 week, and 2 weeks after amniocentesis, using the predicted time-to-delivery as an inverse surrogate of the risk of delivery.
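The three performance metrics can be computed from the leave-one-out predictions as sketched below. Spearman's correlation is the Pearson correlation of average ranks. The AUROC uses the Mann-Whitney formulation, with the negated predicted time as the risk score to reflect the inverse-surrogate rule described above. The data are hypothetical toy values in weeks, and 24 h is taken as 1/7 week.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(observed, predicted):
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def auroc(scores, labels):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, lab in zip(scores, labels) if lab]
    controls = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0 for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical leave-one-out predictions (weeks from amniocentesis to delivery)
observed = [0.1, 0.5, 2.0, 4.0, 6.0]
predicted = [0.5, 1.0, 1.5, 5.0, 4.5]
within_24h = [o <= 1 / 7 for o in observed]  # 24 h = 1/7 week
risk = [-p for p in predicted]  # shorter predicted time -> higher risk
print(spearman(observed, predicted), rmse(observed, predicted), auroc(risk, within_24h))
```

Changing the 1/7 cutoff to 1 or 2 weeks yields the other two binary endpoints from the same continuous predictions.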
To retain a final set of the genes most informative for predicting time-to-delivery, we employed the strategy of Kursa 64 , which is based on the frequency of gene selection across different partitions of the data. Significantly enriched Gene Ontology biological processes were identified among the most predictive genes by using the GOstats 114 package.
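Selection frequencies across leave-one-out folds can be tallied as below. The sketch is a simplified stand-in for a self-consistency test and not necessarily Kursa's exact procedure. It assumes a hypothetical null in which each fold picks its genes uniformly at random. Each gene's selection count is then tested with a binomial upper tail under a Bonferroni correction. All names and thresholds are hypothetical.

```python
from collections import Counter
from math import comb


def selection_counts(fold_selections):
    """Number of folds in which each gene was selected."""
    counts = Counter()
    for genes in fold_selections:
        counts.update(set(genes))
    return counts


def binomial_upper_tail(k, m, p):
    """P(X >= k) for X ~ Binomial(m, p)."""
    return sum(comb(m, i) * p ** i * (1 - p) ** (m - i) for i in range(k, m + 1))


def consistently_selected(fold_selections, n_genes, alpha=0.05):
    """Genes selected more often than random selection would explain.

    Hypothetical null: in every fold, each gene is picked with probability
    equal to the average fraction of genes selected per fold.
    Bonferroni correction over all n_genes candidates.
    """
    m = len(fold_selections)
    p0 = sum(len(set(s)) for s in fold_selections) / (m * n_genes)
    counts = selection_counts(fold_selections)
    return sorted(g for g, c in counts.items()
                  if binomial_upper_tail(c, m, p0) * n_genes < alpha)


# Hypothetical: 10 folds, each selecting IL1B, CXCL8, and one fold-specific gene
folds = [{"IL1B", "CXCL8", f"G{j}"} for j in range(10)]
print(consistently_selected(folds, n_genes=1000))
# Genes selected in every fold (cf. the 23 genes selected in all iterations)
print(sorted(g for g, c in selection_counts(folds).items() if c == len(folds)))
```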
The network of protein-protein interactions. To gain insight into the relations among genes predictive of time-to-delivery in AF samples, we constructed a protein-protein interaction network representation of these genes by using the stringApp 1.5.0 123 in Cytoscape 3.7.2 124 . An edge between two nodes (genes) was defined based on a protein-protein interaction confidence score > 0.4. The network's most inter-connected subnetwork was retained, and genes were annotated to significantly enriched biological processes.
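The edge-filtering step can be sketched as follows. The sketch assumes the network is given as (gene, gene, score) triples with STRING-style combined confidence scores between 0 and 1. It interprets "most inter-connected subnetwork" as the largest connected component after thresholding. That interpretation, the function name, and the example scores are hypothetical.

```python
from collections import defaultdict, deque


def largest_component(edges, min_score=0.4):
    """Nodes of the largest connected component after dropping weak edges.

    edges: iterable of (gene_a, gene_b, confidence) with confidence in [0, 1].
    """
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score > min_score:  # keep only interactions above the threshold
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, best = set(), set()
    for start in list(adjacency):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from start
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical confidence scores, for illustration only
edges = [
    ("IL1B", "CXCL8", 0.95), ("CXCL8", "CCL4", 0.90), ("CCL4", "CCL3", 0.85),
    ("PLEK", "BCL2A1", 0.50), ("SLC34A2", "AQP9", 0.30),
]
print(sorted(largest_component(edges)))
```

In this example, the AQP9 edge falls below the 0.4 threshold and is dropped. The PLEK pair survives but forms a smaller component, so only the four chemokine and cytokine genes are retained.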