Rewiring of gene expression in PWBC at artificial insemination is predictive of pregnancy 1 outcome in cattle ( Bos taurus ) 2 3

ABSTRACT Infertility is a disease that affects humans and cattle in similar ways. The resemblance includes complex genetic architecture, multiple etiology, low heritability of fertility related traits in females, and the frequency in the female population. Here, we used cattle as a biomedical model to test the hypothesis that gene expression profiles of protein-coding genes expressed in peripheral white blood cells (PWBCs), and circulating micro RNAs in plasma, are associated with female fertility, measured by pregnancy outcome. We drew blood samples from 17 female calves on the day of artificial insemination and analyzed transcript abundance for 10496 genes in PWBCs and 290 circulating micro RNAs. The females were later classified as pregnant to artificial insemination, pregnant to natural breeding or not pregnant. We identified 1860 genes producing significant differential coexpression (eFDR

day of artificial insemination and analyzed transcript abundance for 10496 genes in PWBCs and 23 290 circulating micro RNAs. The females were later classified as pregnant to artificial 24 insemination, pregnant to natural breeding or not pregnant. We identified 1860 genes producing 25 significant differential coexpression (eFDR<0.002) based on pregnancy outcome. Additionally, 26 237 micro RNAs and 2274 genes in PWBCs presented differential coexpression based on 27 pregnancy outcome. Furthermore, using a machine learning prediction algorithm we detected a 28 subset of genes whose abundance could be used for blind categorization of pregnancy outcome.

29
Our results provide strong evidence that bloodborne transcript abundance is highly associated 30 with fertility in females.

32
The Word Health Organization considers Infertility to be a "disease of the reproductive 33 system" 1 . Infertility is a common problem in humans and agriculturally important large mammals.

34
For instance, prevalence of infertility may reach 16% 2 and 15% 3 in humans and beef female 35 calves, respectively. Similar to what has been observed in humans 4 , the genetic architecture of 36 fertility in beef heifers is very complex 5,6,7,8,9,10,11,12 . In line with this complexity, female reproductive 37 traits have low heritability. The heritability of traits related to female reproductive fitness (i.e.: birth 38 rate and age at last reproduction) in humans ranges from 0.05 to 0.28 13 . Likewise, reproductive 39 related traits in cattle such as heifer pregnancy and first service conception range from 0.07 to 40 0.20 5,14,15,16,17,18,19 and 0.03 to 0.18 5,14,17 , respectively. Due to genetic and physiological 41 resemblance with human, cattle have been identified as an important biomedical model for 42 dissecting female fertility traits 20,21 .

43
Beyond the importance as a biomedical model, cattle production systems provide 44 approximately 28% 22 of the protein supply globally. Improving cattle production efficiency is 45 essential for farmers to attain sustainable production and support the growing demand for animal 46 protein 22 . Infertility is a major factor that hinders efficiency in cattle production, and it starts with 47 limited success of pregnancy in young female calves. First breeding success greatly influences 48 the lifetime efficiency of beef replacement heifers. Heifers that calve early in their first calving 49 season experience increased productivity and longevity than their later calving herd 50 mates 23,24,25,26 . Furthermore, the genetic correlation between yearling pregnancy rate and lifetime 51 pregnancy rate is high (0.92-0.97) 27,28 . Therefore, the ability to identify heifers that experience 52 optimal fertility during the first breeding is essential to the sustainability of beef cattle production 53 systems.

54
The examination of the genetic components of fertility in beef heifers have yielded several 55 genes potentially associated with fertility traits 5,6,7,8,9,10,11,12 , but the effect of these markers are

64
Supplementary Table 3, Supplementary Fig. 1). Of these 1860 genes, 963 genes formed 716 97 negative, 1053 genes formed 708 positive, four genes lost two positive, and 40 genes formed 23 98 inverted coexpression connections in not-preg heifers relative to both AI-preg and NB-preg 99 heifers. Among the 963 genes that formed new negative coexpression connections in not-preg 100 heifers, there was enrichment of the biological processes "proteolysis" (n=38 genes) and 101 "translation" (n=52 genes, FDR<0.1, see Fig. 2d for selected genes and Supplementary Table 4 102 for list of genes). Five of the 40 genes that formed inverted coexpression connection in not-preg 103 heifers relative to the pregnant counterparts were associated with "negative regulation of 104 transcription by RNA polymerase II" (FDR<0.1, see Fig. 2d for selected genes and Supplementary 105 Table 5 for a list of genes). These results indicate that physiological factors associated with fertility 106 or infertility act on the transcription protein-coding genes in PWBCs.

107
Differential miRNA:mRNA coexpression associated with pregnancy outcome 108 The generation of data from small RNA (from plasma) and mRNA (from PWBCs) from the 109 same subjects (n=17) permitted us to ask whether circulating miRNAs have correlated expression 110 with protein-coding genes expressed in the PWBCs. We identified 141 pairs of miRNA:mRNA 111 with |r|>0.85 (eFDR<0.0001, Supplementary Fig. 2) that were formed by 106 protein-coding genes 112 and 33 miRNAs (Supplementary table 6). Among the 106 protein-coding genes existing in high 113 correlation with miRNA, we identified three significantly enriched biological processes, namely: 114 "peptidyl-threonine phosphorylation", "cilium assembly" and "regulation of gene expression" (Fig.   115 3a, Supplementary table 7). Of notice, 30 of the 141 miRNA:mRNA pairs that presented negative 116 correlation between the expression levels have been previously identified as interacting pairs in 117 the miRWalk database (Fig. 3b). Additionally, bta-mir-744 and bta-mir-320a emerged as the two 118 most connected miRNAs (13 and 6 connections, respectively). It is evident that circulating 119 miRNAs have regulatory roles over transcripts of protein-coding genes expressed in the PWBCs.

120
We also interrogated miRNA:mRNA pairs for differential coexpression associated with 121 pregnancy outcome. We identified 2330 and 2738 miRNA:mRNA pairs with gain of negative and 122 positive correlated expression, respectively, (|r|>0.9, eFDR≤0.01, Supplementary Fig. 3) in not-123 pregnant heifers that were not identified compared to in pregnant heifers (both AI-preg and NB-124 preg, Fig. 3c). In addition, there were 51, two and three connections showing inverted, loss of 125 negative and loss of positive correlations in not-pregnant heifers relative to the pregnant (both AI-126 preg and NB-preg) counterparts (Fig. 3c, see Supplementary table 8 for genes). Among the 1220 127 protein-coding genes associated with gain of negative correlation with miRNAs, there were 128 biological processes significantly enriched, namely: "immunoglobulin mediated immune response", "negative regulation of DNA-binding transcription factor activity", "lymphocyte 130 chemotaxis" and "mitochondrial respiratory chain complex I assembly" (FDR<0.1, Supplementary 131 table 9). Among the 1288 genes associated with gain of positive correlation with miRNAs, the 132 following biological processes were significantly enriched: "regulation of DNA replication" and 133 "cellular response to BMP stimulus" (FDR<0.1, Supplementary  pregnancy outcomes with the exception of miR-11995 that was more abundant in AI-preg versus 150 not-pregnant heifers. These results indicate that quantitative differences exist in transcript 151 abundance of protein-coding genes expressed in PWBC relative to the pregnancy outcome.

152
Next, we asked if differential gene expression could be associated with differential 153 miRNA:mRNA coexpression. An intersection of the results identified 12 differentially expressed 154 genes (10 annotated genes) in not-pregnant heifers relative to the AI-preg and NB-preg groups 155 that also gained coexpression with 27 miRNAs (Fig. 4c). Notably, all genes that had higher 156 expression in not-pregnant heifers (CEACAM4, FCGR2, ISG20, JAML, LSP1, NCF1, NFE2, Motivated by the observations that several genes have their abundance altered in 162 relationship to the fertility potential pregnancy (Fig. 4a), we asked whether transcript abundances 163 could serve as predictor features across datasets. We generated gene expression values for a 164 similar dataset we produced previously (GSE103628 34 ), which also consisted of transcriptome 165 data from PWBCs collected from heifers at the time of AI and were classified according to their 166 pregnancy outcome as AI-preg (n=6) or not-pregnant (n=6). This dataset was particularly fitting 167 for this interrogation because the sampling occurred one year earlier on the same site. We

174
We used the transcript abundance values for these 198 genes obtained from samples 175 collected in year one as input (training set) for machine learning algorithms to derive models of 176 prediction to discriminate pregnancy outcome (AI-preg or not-pregnant, Fig. 5b). Next, we used

185
GATA3 were down-regulated in AI-preg heifers relative to the not-pregnant group (Fig. 5f).

188
Here, we report a multilayered transcriptome analysis of bloodborne transcripts sampled 189 on the day of AI. The main objective of our study was to understand gene regulatory networks 35 190 occurring in PWBCs, and rewiring that can occur based on the heifer's fertility potential. We 191 focused on the day of artificial insemination because it is the last day prior to the fertilization, when 192 the heifer has the first opportunity to become pregnant. We interrogated mRNA and miRNA 193 abundance levels obtained from the same individuals using the coexpression framework, which 194 is a valuable systems biology approach to gain insights into the biology of complex traits 35 .

195
Furthermore, using machine learning 36 , we detected consistent altered gene transcript abundance

226
Immuno-related cells composing the PWBC population can uptake circulating small 227 RNAs 51 . Although our data did not allow us to trace the origin of the tissue that produced the 228 miRNAs, we detected 33 circulating miRNAs whose abundance are associated with the 229 abundance of 106 protein-coding genes. As reported in humans 52 , we quantified positive and 230 negative correlated abundance between miRNA and protein-coding genes. Negative correlations 231 mostly occur due to the targeting of mRNA by a specific miRNA. Indeed, we found several pairs 232 of miRNA:mRNA negatively correlated that had been identified as interacting pairs that fit this

242
We also tested whether circulating miRNA and PWBC mRNA pairs have differential 243 coexpression relative to the pregnancy outcome. Following the same trend observed for 244 miRNA:mRNA coexpression in PWBCs, there were 1000-fold more miRNA:mRNA connections 245 that were present in heifers that did not become pregnant compared to the loss in coexpression.

246
Results from gene enrichment analysis support the notion that genes forming new miRNA:mRNA 247 coexpression connections in not-pregnant heifers function in the immune system, although our 248 data do not permit inferences on whether particular immune processes were more or less 249 functional in not-pregnant heifers.

250
Critical aspects of female fertility are tied to an appropriate regulation of the immune 251 system. Maternal immune tolerance to pregnancy starts with the exposure to proteins in the 252 semen 56 , and is carried throughout placentation and pregnancy establishment 57 . Notably, 24 out 253 of 26 differentially expressed genes between pregnant (AI-preg and NB-preg) and not-pregnant 254 heifers showed greater abundance in not-pregnant heifers relative to the other two groups. (AI-255 preg and NB-preg). While we did not detect differential abundance of miRNAs relative to 256 pregnancy outcome, the gain of coexpression between miRNAs and mRNAs from differentially 257 expressed genes functionally linked with immunology is in line with regulatory roles that certain 258 miRNAs exert on immune cells 58 . This result lead us to hypothesize that the heifers in the not-259 pregnant group have a heightened activity of specific cells or functions in the immune system 260 relative to the heifers that became pregnant. It remains unclear, however, if this possible higher 261 activity was stimulated by extrinsic factors or is within the naturally-occurring variation 59 .

262
The results of differential gene expression and differential coexpression fostered a 263 hypothesis that transcript abundance in the PWBCs can be used as predictors of pregnancy 264 outcome. We tested this hypothesis by applying machine learning algorithms on two groups of

271
Although we detected altered coexpression with high confidence, and in some instances, 272 with significant enrichment of genes in specific gene ontology terms, the consequence of the

278
In summary, analyses of mRNA from PWBCs and circulating extracellular miRNA by 279 multiple approaches (differential gene expression, differential gene coexpression and machine

291
Thirty-four days prior to AI, heifers were evaluated for reproductive tract score (scale of 1-5 66 ) and 292 body condition score (BCS; scale of 1-9 67 ) for the assertion that morphological and physiological 293 conditions were suitable for breeding. One animal was eliminated from the breeding herd at this 294 time.

295
Thirty one animals, 14 months of age on average, underwent estrous synchronization for 296 fixed-time artificial insemination utilizing the 7-Day Co-Synch + CIDR protocol 68 . An experienced 297 veterinarian inseminated all heifers with semen from sires of proven fertility. Fourteen days after 298 insemination, heifers were exposed to two fertile bulls for natural breeding for 60 days. An

360
Sequences aligning to multiple places on the genome, or with 5 or more mismatches, were filtered 361 out. The sequences were then marked for duplicates, and non-duplicated pairs of reads (mRNA)

404
We further interrogated the data for experimentally validated miRNA-mRNA interactions obtained

428
We focused the mRNA sequencing data from AI-pregnant and not-pregnant heifers from 429 this report and data we collected from the same site on the previous year and published 430 elsewhere 34,76 . The raw sequences from the previous dataset were realigned, and data were 431 treated as described above.

432
To remove the genes not informative to the machine learning algorithm, we identified 433 differentially expressed genes between the two groups (AI-pregnant and not-pregnant). We used 434 the same procedures and standards aforementioned with the exception that we added year as a 435 fixed effect into the model. There were 198 genes differentially expressed between these two 436 groups (AI-preg and not-pregnant). We performed a variance stabilizing transformation on the 437 counts for these 198 genes, and the transformed data were used as input.