Introduction

The latest data from the Food and Agriculture Organization show that in 2020 more than 46% of the daily protein supply in the world was from animal-based foods (FAO-STATS). Bovine meat and milk accounted for 12.8% of the total protein supply in the world in 2020 (FAO-STATS). These numbers underscore the importance of cattle production to sustain a growing demand for protein globally1. Infertility or subfertility is a critical barrier to sustainable cattle production2, including in heifers. For example, approximately 15%3 and 5%4 of beef and dairy heifers, respectively, do not calve at 24 months of age. Heifers that calve at an optimum age have greater productivity and longevity in the herd5,6,7,8,9,10. Therefore, identifying heifers with optimum fertility is a promising approach to improving sustainability in cattle production.

The heritability of breeding values for heifer fertility is often low for beef11,12,13,14,15,16,17 and dairy18,19,20,21,22,23 heifers, which indicate that there are multiple genetic factors impacting this complex trait beyond additive genetic effects. Another potential avenue for the understanding of infertility is the use of molecular phenotyping24. The pioneering efforts focused on genome-wide association studies (GWAS) to identify genetic markers associated with heifer fertility4,12,25,26,27,28,29,30,31,32,33,34,35,36,37,38, but only a few seem to be reproducible across populations37. More recent efforts have also focused on transcriptome39,40,41 and metabolome42 datasets characterizing these molecules in blood samples. Again, limited genes have been identified with differential transcript abundance across datasets39. Much research is needed for the identification of molecular features that can help explain fertility fitness.

Altogether, approximately 5% of heifers are infertile4,43, and this cohort is a great biological model for studying the genetic bases of infertility for several reasons. First, neither dairy nor beef heifers are under the challenging metabolic demand required for milk production44,45,46. Second, post-partum cows need to undergo a critical period of physiological and anatomical recovery before the next breeding47,48,49. Third, there are several postpartum diseases with negative consequences on reproduction success50,51,52. Reproductive problems in well-managed heifers are inherent to their physiology20, most of which are also under genetic control53, or directly related to mutations54 that impair female reproductive functions.

Angus and Holstein heifers have similar frequencies of infertility or subfertility4,43 despite the selection pressures directed at beef or dairy production, and thus have distant genetic background. Most studies involving the identification of biological features associated with fertility in heifers have used either one group of purebred or crossbreed animals4,12,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42. Here, we carried out a case-control55 experiment to test the hypothesis that differences in genetic variants, gene transcript, and protein abundance due to fertility fitness would be shared between heifers of different genetic background. Our objective was to contrast genetic variants, gene transcripts, and protein abundance between Fertile and Sub-Fertile heifers from Angus and Holstein genetic backgrounds. We show that both the independent analysis and multi-omics approach identified molecular signatures that capable of discriminating heifers of differing fertility potential, and thus with an underlying biology associated with fertility that is shared between both breeds.

Methods

All analytical procedures are presented in Additional file 1 and accessible at https://biase-lab.github.io/MultiOmics/.

Ethics statement

Animal handling for this experiment was approved by the Institutional Animal Care and Use Committee (IACUC) at Virginia Polytechnic Institute and State University.

Experimental design

We collected blood samples from purebred Angus heifers (n = 12), averaging 14 months in age, at the time of their first artificial insemination (AI) service. Heifers were subjected to a 7-Day Co-Synch + CIDR estrus synchronization protocol prior to breeding. Briefly, heifers were administered an intramuscular (IM) injection of gonadotrophin-releasing hormone (GnRH, 100 μg; Factrel®; Zoetis Inc.) on Day 0, followed by the insertion of a controlled internal drug release (CIDR, 1.38 g Progesterone; Eazi-Breed™ CIDR®; Zoetis Inc.). On Day 7, the CIDR was removed and an injection of prostaglandin F2 alpha (PGF2α, 25 μg; Lutalyse®; Zoetis Inc.) was delivered. Fixed-time AI was performed 54 ± 2 h following CIDR removal alongside a second injection of GnRH.

Additionally, we collected blood samples from purebred Holstein (n = 10) heifers, averaging 12 months in age, at the time of the first AI service. Heifers were enrolled in a 5-Day CIDR-Synch protocol before insemination. Briefly, an IM injection of GnRH was delivered on Day 0 with the insertion of a CIDR device. The CIDR device was removed on Day 5, followed by an IM injection of PGF2α. A second injection of PGF2α was administered 24 hours later. Then, timed AI was performed with a second GnRH injection on Day 8.

Heifers were identified as Fertile (Holstein, n = 5; Angus, n = 5) or Sub-Fertile (Holstein, n = 5; Angus, n = 7) based on their pregnancy outcome, following similar criteria used previously39,40. Fertile animals were identified as those who became pregnant and subsequently delivered a calf following the first insemination service. Angus heifers were categorized as Sub-Fertile after failing to achieve pregnancy following two insemination services and exposure to a bull for natural breeding. Holstein heifers were identified as Sub-Fertile after needing four or more artificial inseminations.

Heifers were synchronized with protocols that have been identified by prior research to have high success for a heifer to become pregnant to AI56,57. Hence the different protocols for beef and dairy heifers. The criteria for classification were different for each group due to differences in management that are inherent to beef and dairy replacement heifers. Most importantly, each heifer had multiple opportunities to become pregnant before being classified as sub-fertile.

The heifers utilized in this study were not part of a nutritional experiment, and thus nutrition was not accounted as a variable nor was it a factor in the selection of heifers. All dairy heifers were raised with equivalent exposure to feed. Similarly, all beef heifers were raised with equivalent exposure to feed.

Blood sample collection and white blood cell isolation

Fifty ml of blood were drawn from each animal by venipuncture of the jugular vein using 18 mg K2 EDTA vacutainers (Becton, Dickinson, and Company). The tubes were inverted for proper mixing with the anticoagulant and then immediately placed on ice until further processing.

We processed the blood samples following procedures described elsewhere39,40,58 within three hours of sampling58. Tubes containing whole blood samples were centrifuged for 25 minutes (min) at 4 °C and 2000×g to separate the buffy coat. The buffy coat was then aspirated and mixed with 14 ml of red blood cell lysis buffer (1.55 M ammonium chloride, 0.12 M sodium bicarbonate, 1 mM EDTA (Cold Spring Harbor Protocols)). Then, the solution was centrifuged for 10 min at 4 °C at 800×g and the supernatant was discarded. The remaining pellet was mixed with 200 µl TRIzol™ Reagent (Invitrogen™, Thermo Fisher Scientific, Waltham, MA) in a 2 ml cryotube (Corning Inc., Corning, NY) prior to snap-freezing with liquid nitrogen. Samples were then stored at – 80 °C until further processing.

Total RNA and DNA extraction

The buffy coat samples were thawed at room temperature in a total volume of 525 µl TRIzol™ Reagent. Then, total RNA was extracted from peripheral white blood cells using the Zymo Research Direct-zol™ DNA/RNA Miniprep kit (Zymo Research Corporation, Irvine, CA), according to the manufacturer’s protocol. Next, we assessed the quality of the RNA by quantifying the RNA integrity number (RIN) for each sample using the Agilent RNA 6000 Pico kit (Agilent, Santa Clara, CA) on the Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA).

Genotyping and data processing

We submitted 400 ng of DNA for each heifer to Neogen (Neogen Corporation, Lincoln, NE) for genotyping. The samples were genotyped using the Illumina BovineHD Beadchip (Illumina Inc., San Diego, CA) genotyping array (777K). We processed the data for quality control59 using PLINK60. First, we removed SNPs that were preferentially called in one of the groups in the case and control. This was followed by the removal of samples with more than 10% of the genotypes missing, and removal of SNPs with a minor allelic frequency less than 1%, a missing rate greater than 10%, or deviation from the Hardy-Weinberg equilibrium (P < 0.00001). Next, we carried out variant pruning. We considered a window size of 50 kilobases with five variants in each window at a correlation threshold of 0.2. After pruning, we calculated relatedness and inbreeding coefficients using the parameter ‘--make-rel’ in PLINK (Additional file 2). All reported SNP coordinates are relative to btau9 assembly converted with the LiftOver tool61.

Library preparation and sequencing

For sequencing library construction, 900 ng of total RNA was diluted into 25 µl of nuclease-free water, and RNA quantity was confirmed using the Qubit™ RNA High Sensitivity Assay kit (Invitrogen™, Thermo Fisher Scientific, Waltham, MA) on the Qubit™ 4 Fluorometer (Invitrogen™, Thermo Fisher Scientific, Waltham, MA). Libraries were prepared for next-generation sequencing using the Illumina Stranded mRNA Prep kit (Illumina, Inc., San Diego, CA) and the IDT® for Illumina RNA UD indexes (Illumina, Inc., San Diego, CA) according to the manufacturer’s instructions. Sequencing was conducted on the NovaSeq 6000 sequencing system (Illumina, Inc., San Diego, CA) using the NovaSeq 6000 SP Reagent kit v1.5 (Illumina, Inc., San Diego,CA) to produce paired-end reads 150 nucleotides in length. Sequencing was performed by the VANTAGE laboratory at Vanderbilt University Medical Center (Nashville, TN).

Sequence alignment and filtering

We aligned the sequences to the cattle reference genome (Bos_taurus.ARS-UCD1.2.105) in the Ensembl62 database with hisat263,64,65 using the -very-sensitive parameter. Then, we used Samtools66,67 to filter sequences and remove secondary alignments, duplicates, and unmapped reads. Next, we used biobambam268 to mark and remove duplicates.

Transcript quantification and gene filtering

The number of fragments that matched to the Ensembl62 cow gene annotation (Bos_taurus.ARS-UCD1.2.105) was quantified using featureCounts69, and we preserved genes annotated as protein-coding, pseudogenes, or long non-coding RNA. Genes were then retained for further analysis if counts per million (CPM) and fragments per kilobase per million (FPKM) were >1 in at least five samples.

Proteomics data and processing

One hundred μl of plasma per sample was submitted to the Virginia Tech Mass Spectrometry Incubator (VT-MSI) facility at the Fralin Life Sciences Institute, Virginia Tech, for protein extraction and data collection.

Plasma samples (100 μl) were acidified by the addition of 11.1 µl 12% (v/v) o-phosphoric acid (MilliporeSigma, St. Louis, MO), then proteins were precipitated by the addition of 725 µl LC/MS grade methanol and incubated at -80°C overnight. Precipitated protein was collected by centrifugation and solubilized in S-trap lysis buffer (10% (w/v) SDS in 100 mM triethylammonium bicarbonate ( MilliporeSigma, St. Louis, MO, pH 8.5)). Protein concentration was determined by measuring the absorbance at 280 nm, then 150 μg of protein for each sample was reduced using DTT (4.5 mM) then alkylated with iodoacetamide (10 mM, MilliporeSigma, St. Louis, MO). Unreacted I iodoacetamide was quenched with DTT (10 mM, MilliporeSigma, St. Louis, MO) and samples were acidified using o-phosphoric acid (MilliporeSigma, St. Louis, MO). Protein was again precipitated using methanol and incubated at -80ºC overnight as above. Precipitated protein was loaded onto a micro S-trap and washed with methanol then digested overnight with trypsin. Peptides were recovered and five μg, as determined by measuring the absorbance at 215 nm using a DS-11 FX+ spectrophotometer/fluorometer (DeNovix, Wilmington, DE), of each sample was analyzed twice (duplicates) using ESI-MS/MS Orbitrap Fusion Lumos (Thermo Fisher Scientific (Waltham, MA)).

Samples were first loaded onto a precolumn (Acclaim PepMap 100 (Thermo Scientific, Waltham, MA), 100 µm × 2 cm) after which flow was diverted to an analytical column (50 cm µPAC (PharmaFluidics, Woburn, MA). The UPLC/autosampler utilized was an Easy-nLC 1200 (Thermo Scientific, Waltham, MA). Flow rate was maintained at 150 nl/min and peptides were eluted utilizing a 2 to 45% gradient of solvent B in solvent A over 88 min. Spray voltage on the µPAC compatible Easy-Spray emitter (PharmaFluidics, Woburn, MA) was 1300 volts, the ion transfer tube was maintained at 275 °C, the RF lens was set to 30% and the default charge state was set to 3.

MS data for the m/z range of 400–1500 was collected using the orbitrap at 120,000 resolution in positive profile mode with an AGC target of 4.0e5 and a maximum injection time of 50 ms. Peaks were filtered for MS/MS analysis based on having isotopic peak distribution expected of a peptide with an intensity above 2.0e4 and a charge state of 2–5. Peaks were excluded dynamically for 15 s after 1 scan with the MS/MS set to be collected at 45% of a chromatographic peak width with an expected peak width (FWHM) of 15 s. MS/MS data starting at m/z of 150 was collected using the orbitrap at 15000 resolution in positive centroid mode with an AGC target of 1.0e5 and a maximum injection time of 200 ms. Activation type was HCD stepped from 27 to 33.

Data were analyzed utilizing Proteome Discoverer 2.5 (Thermo Scientific, Waltham, MA) combining a Sequest HT and Mascot 2.7 (Matrix Science, Boston, MA) search into one result summary for each sample. Both searches utilized the UniProt reference Bos taurus proteome database and a common protein contaminant database provided with the Proteome Discoverer (PD) software package70. Each search assumed trypsin-specific peptides with the possibility of 2 missed cleavages, a precursor mass tolerance of 10 ppm and a fragment mass tolerance of 0.1 Da. Sequest HT searches also included the PD software precursor detector node to identify MS/MS spectra containing peaks from more than one precursor. Sequest HT searches included a fixed modification of carbamidomethyl at Cys and the variable modifications of oxidation at Met and loss of Met at the N-terminus of a protein (required for using the INFERYS rescoring node). Peptide matches identified by Sequest HT were subjected to INFERYS rescoring to further optimize the number of peptides identified with high confidence.

Mascot searches included the following dynamic modifications in addition to the fixed modification of Cys alkylated by iodoacetamide (carbamidomethylated): oxidation of Met, acetylation of the protein N-terminus, cyclization of a peptide N-terminal Gln to pyro-Glu, and deamidation of Asn/Gln residues.

Protein identifications were reported at a 1% false discovery rate (high confidence) or at 5% false discovery rate (medium confidence) based on searches of decoy databases utilizing the same parameters as above. The software matched peptide peaks across all runs, and protein quantities are the sum of all peptide intensities associated with the protein.

Principal component analysis

We carried out principal component analysis for the genotypes after pruning using the parameter ‘--pca’ in PLINK. The eigenvectors were used for plotting. For the transcriptome data, first we obtained the variant stabilized data using the function ‘vst’ from the R package ‘DESeq2’. Next, we calculated the components using the function ‘plotPCA’ in R. For the protein data, we averaged the values for each technical duplicate and used these values as input for the function ‘prcomp’ in R.

Statistical analyses

SNP association analysis

After filtering, 575,053 genotypes from 22 animals were used for association analysis conducted in PLINK60 using Fisher’s exact test. We adjusted the nominal P values to correct for multiple hypothesis testing using the adaptative permutation procedure71 in PLINK60. Locus association was inferred at alpha = 1 × 10−5, as reported by The Wellcome Trust Case Control Consortium72 for case-control studies, as well as by previous GWAS analyses of reproductive traits in cows or heifers4,25,32, which corresponded to an adjusted P value <0.005.

Differential transcript abundance

We compared transcript abundance between samples from each breed and each fertility group. The R packages ‘edgeR’73,74, with the quasi-likelihood test, and ‘DEseq2’75, using the Wald’s and likelihood test, were utilized to conduct the analyses. We adjusted the raw P values for multiple hypothesis testing by calculating the empirical false discovery rate (eFDR76), with 10,000 permutations. Differences in transcript abundance were deemed statistically significant when eFDR <0.002 in the results obtained from the three tests.

Differential protein abundance

To identify differential protein abundance that is robust to the algorithm utilized, we analyzed the protein data using two different algorithms. First, we transformed the protein data using natural logarithm (Loge(x)). We analyzed the transformed data using a generalized mixed model77 using the R package ‘lme4’, which included the fertility group (Fertile or Sub-Fertile), breed (Angus or Holstein), and the random effect of the subject. Random effect was included in this analysis as samples were assayed twice to provide a more robust estimate of differential protein abundance. Then, we used the function ‘emmeans’, which tests the significance of the difference (H01 = μ2, H11≠μ2) with the Student’s t test78, to calculate the estimated differences in protein abundance between fertility groups within each breed. We also analyzed the log-transformed data using the R package ‘limma’79. We accounted for the same independent variables mentioned above (fertility group and breed), in addition to accounting for the correlation between the duplicated data for each individual with the function ‘duplicateCorrelation’. We tested for a differential abundance of the identified proteins using the empirical Bayes Statistics implemented in the function ‘eBayes’80,81. In both analyses, we adjusted the nominal P values using FDR82. Significance was assumed if FDR< 0.05 in both approaches.

Multi-omics factor analysis

We analyzed the multimodal multi-omics datasets (genome, transcriptome, and proteome) interactively using Multi-omics Factor Analysis approach83,84. We subset the genotypes, transcriptome, and proteome data to reduce the global profiling. We retained SNPs with a P value < 0.001 for the Fisher’s test, genes with a P value < 0.01 for all three statistical tests employed, and proteins with a P value < 0.05 in both statistical tests used. We conducted the analysis using the R package ‘MOFA2’83,84, accounting for the breed as a group.

Results

Overview of the data produced

We selected 22 Bos taurus heifers of Angus (n = 12) and Holstein (n = 10) breeds based on their fertility fitness (Fig. 1A). We isolated total RNA from circulating white blood cells, averaging 16.3 µg ± 4.0, and quality, measured by the RIN, averaging 9.4 ± 0.4. The extraction of genomic DNA yielded 1.1 µg ± 0.4. We produced RNA-sequencing data (Fig. 1B) and quantified the transcript abundance of 12,445 genes (12,105 protein-coding genes, 228 long non-coding RNAs, and 112 pseudogenes). We also analyzed 575,053 nucleotide positions across the bovine genome (Fig. 1B). Lastly, we produced untargeted proteomics data from plasma that resulted in the relative quantification of 213 proteins. As expected, the genotypic and proteomic data clustered the heifers of different genetic background separately (Fig. 1A). Conversely, there was no clustering of the samples based on the transcriptome data of the peripheral white blood cells (Fig. 1D,E).

Figure 1
figure 1

Overview of data produced. (A) Breeds and classification used in this study, including sample size. (B) Schematics of the data produced, and analysis undertaken. Principal component analysis of the genome-wide single nucleotide polymorphisms (C), transcriptome (D) and (E) proteome data. GWAS: genome-wide association analysis, DGE: differential gene expression, DPA: differential protein abundance.

GWAS identifies SNPs associated with fertility in Angus and Holsteins heifers

Our analysis identified two SNPs significantly associated with heifer fertility (rs110918927, chr12: 85648422, P = 6.7 × 10-7; and rs109366560, chr11:37666527, P = 2.6 × 10-5, Fig. 2A, Additional file 3). For the SNP rs110918927, all heifers that delivered a calf after one artificial insemination presented the genotype AA (f(A) = 1, f(G) = 0), whereas 11 out of 12 heifers classified as sub-fertile presented at least one copy of the allele G (f(A) = 0.29, f(G) = 0.71). This polymorphism sits in an intergenic region of the genome with the closest gene located > 73 kilobases downstream relative to the SNP. For the SNP rs109366560, none of the heifers classified as sub-fertile were homozygous for the allele G (f(G) = 0.12, f(A) = 0.88), and five out of the nine fertile heifers genotyped were homozygous GG (f(G) = 0.78, f(A) = 0.22). This SNP is located on intron 22 of the gene Echinoderm microtubule associated protein like 6 (EML6).

Figure 2
figure 2

Genome-wide association analysis of fertility in beef and dairy heifers. (A) Manhattan plot with the distribution of SNPs across their genome and their P values from Fisher’s exact association test. (B) Genetic frequencies of the two SNPs that are putatively linked to fertility in heifers.

Transcriptome analysis identifies differential transcript abundance between Fertile and Sub-Fertile heifers

Next, we sought to determine if there were differences in transcript abundance from circulating white blood cells between the Fertile and Sub-Fertile heifer groups, accounting for their genetic background. We identified two genes whose transcript abundance differed (eFDR ≤ 0.002) between the two groups (Fertile and Sub-Fertile), namely Adipocyte Plasma Membrane Associated Protein (APMAP, 1.16 greater abundance in the Fertile group) and Dynein Axonemal Intermediate Chain 7 (DNAI7, 1.23 greater abundance in the Sub-Fertile group) (Fig. 3A, Additional file 4).

Figure 3
figure 3

Differential transcript and protein abundance associated with fertility. (A) Transcript abundance. (B) Protein abundance. In each plot, within each breed, shapes indicate the same animal.

Proteomic analysis identifies differential protein abundance between Fertile and Sub-Fertile heifers

We also tested if there were differential abundance in proteins present in the plasma of heifers classified based on their fertility groups in both genetic backgrounds. The protein Alpha-ketoglutarate-dependent dioxygenase FTO was more abundant in the plasma collected from Fertile heifers relative to their Sub-Fertile counterparts (FDR < 0.05, Fig. 3B, Additional file 5).

Integrative multi-omics analysis identifies molecular features that classify heifers based on their fertility potential

When each data were evaluated independently, the quantification of 22 and 23 gene and protein relative abundances accounted for 44.1% and 16.6% of the variance associated with fertility classification, respectively, and the genotypic information of 59 SNPs explained 70.1% of the variance associated with fertility classification. Overall, there were four factors identified in the analysis with the potential to distinguish the samples based on their fertility status, out of which three were most representative with Factors one, two, and three being mostly dominated by genotype, transcript, and protein data, respectively (Fig. 4A). Factors one, two, and three separated most of the samples based on their fertility classification except two, three, and five samples, respectively (Fig. 4B).

Figure 4
figure 4

Multi-omics analysis of heifer fertility. (A) Percentage of variance explained by each factor within each data modality. (B) Relative separation of samples based on their phenotype for each factor plotted. (C) The relative weight of importance of the top ten features (SNP identifiers are from the array annotation, gene identifiers are from Ensembl, protein identifiers are from UniProt) in each data modality and factor plotted. (D) sample clustering using all three data modes.

Notably, the top nine SNPs that explained most of the variance related to Factor one are located in a window on chromosome 5 spanning from nucleotide 118332762 to 118345383. The tenth SNP was the top significant polymorphism identified on chromosome 12 nucleotide 85648422 according to our Fisher’s exact test contrasting heifers of different fertility potential (Fig. 4C). Among the genes whose transcript abundance explained the variance related to Factor two, we identified the following annotated genes: ArfGAP with SH3 domain, ankyrin repeat and PH domain 3 (ASAP3), ATP synthase membrane subunit c locus 1 (ATP5MC1), Centrosomal protein 170 (CEP170), Myeloid derived growth factor (MYDGF), Coiled-coil domain containing 34 (CCDC34), RAD51 associated protein 1 (RAD51AP1), and Ubiquinol-cytochrome c reductase complex III subunit VII (UQCRQ) (Fig. 4C). Among the proteins whose abundance explained the variance related to Factor three, the following were annotated to known genes: Apolipoprotein C-II (APOC2), Lymphocyte cytosolic protein 1 (LCP1), Vitamin K-dependent protein Z (PROZ), Albumin (ALB), Serotransferrin-like (LOC525947), Complement component C8 beta chain (C8B), Pigment epithelium-derived factor (SERPINF1), Phosphatidylinositol-glycan-specific phospholipase D (GPLD1), Alpha-ketoglutarate-dependent dioxygenase FTO (FTO) (Fig. 4C). Collectively, data from the genotypes, transcriptome, and proteome clustered 21 out of 22 heifers correctly based on their fertility status, with only one Fertile heifer clustering with the group of Sub-Fertile heifers.

Discussion

Reproduction is a multidimensional biological function in mammals that can be partitioned into multiple components or traits85, and as a consequence, infertility is a complex phenotype with multifactorial origins, including a strong genetic component53,54. Our study was not designed to identify molecular markers for future use in selection programs. Rather, our work addressed two critical questions regarding the underlying biology of infertility: (a) whether multiple layers of molecular information, present in the circulatory system, would differ based on female fertility fitness; and (b) whether the integrative analysis of multiple layers of molecular information would be a better predictor of the causes of infertility. Our analysis identified molecular signatures in the genome, transcriptome, and proteome that provide important insights about the root causes of infertility.

Neither one of the significant SNPs were located in a region previously associated with female reproductive traits86. These SNPs have also not been previously reported to be associated with fertility traits in previous investigations that focused on sire-centric models4,33,34,35,36,37, nor on studies that focused on genotyped heifers only32,38. However, it is notable that the polymorphism rs110918927 is in the gene EML6, which produces a protein that participates in the function of spindle microtubules in oocytes87. Knockdown of this protein in mice oocytes at the germinal vesicle stage impairs spindle morphology and increases aneuploidy87 in oocytes that progress to the metaphase II stage in the absence of EML688. The gene EML6 also produces transcripts in bovine oocytes89, and the significant SNP in this gene is a strong indication of a functional connection to reduced oocyte developmental competence in the Sub-Fertile group of heifers.

Genes differentially expressed in the peripheral white blood cells have been associated with fertility in heifers39,40,41. The protein APMAP exhibits arylesterase activity, which is known to protect lipoproteins from oxidation90. Importantly, the APMAP protein regulates adipose composition and metabolic health, and the disruption of the APMAP gene in mice leads to an increase in visceral adipose tissue expansion91. This protein was also shown to be less abundant in the omental tissue of women diagnosed with polycystic ovary syndrome92. Therefore, lower expression of APMAP in the peripheral white blood cells of Sub-Fertile heifers is possibly connected with a metabolic, hormonal or inflammatory disorder that disrupts fertility in heifers.

The Protein DNAI7 composes the axonemal dynein complex and participates in beta-tubulin binding activity and microtubule binding activity, and thus contributes to ciliary beating93. Variants that impair the function of DNAI7 are associated with Primary Ciliary Dyskinesia, with one potential consequence being the abnormal function of cilia and possible impaired transport of the cleaving embryos into the uterus94. DNAI7 may also function as a cell cycle regulator, and dysregulated transcript abundance of DNAI7 was associated with nasopharyngeal neoplasm in mice95 and lung adenocarcinoma in humans96. Since Sub-Fertile heifers have greater abundance of DNAI7 transcripts in their circulating white blood cells, it is possible that dysregulation in the cell cycle has a biological link with subfertility. Further research is required, however, to evaluate whether a dysregulation in the cell cycle linked to upregulation of DNAI7 is connected with increased inflammation91 associated with less transcripts from APMAP.

The protein Alpha-ketoglutarate-dependent dioxygenase FTO has oxidative demethylation activity of abundant N6-methyladenosine (m6A) residues in RNA97. The protein FTO preferentially demethylates N6,2′-O-dimethyladenosine (m6Am) rather than m6A and contributes to a reduced stability of m6Am mRNAs98. On a systemic level, genomic variants in FTO were associated with symptoms of metabolic disorders99, although the effects observed in humans, such elevated body mass index100,101, and mice102 may be contradictory. Also worth noting, a variant on the FTO gene was associated with polycystic ovary syndrome103. Interestingly, in mice, the FTO gene is downregulated due to a deficiency in essential amino-acids104, and deficiency in the FTO protein causes postnatal growth retardation and a significant reduction in adipose tissue and lean body mass105. Our observation of the FTO abundance in heifers of different fertility potential is an indication that Sub-Fertile heifers could be experiencing a metabolic imbalance, contributing to their lower fertility. We note that the heifers utilized in this experiment were not nutritionally challenged and thus, our observations are a consequence of their intrinsic biological system and how it may utilize nutrients.

The next step was to interrogate the data we produced in a comprehensive manner. Interestingly, the largest source of variability was observed in the genomic data. Nine of the top ten SNPs that were assigned to Factor one were located in an intron of the TAFA chemokine-like family member 5 (TAFA5) gene. These SNPs are within a quantitative trait loci for milk yield106, a trait negatively correlated with reproductive traits107, however, no relationship between genetic variants in this gene and female fertility has been reported previously. None of the top ten genes with transcript abundance relevant for the modeling of the variance were identified as differentially expressed when analyzed independently. This result is not surprising because the identification of significant features using standard statistical approaches for association analysis is not necessarily the best approach for identifying predictive genes associated with complex traits108,109. It was surprising that three out of nine annotated proteins, which composed the top ten proteins that explained most of the variance in factor three, were also identified in our analyses using general linear mixed models. The most interesting result, however, was that all three data modalities were able to separate 21 out of 22 heifers correctly based on their fertility potential. Our results show that molecular differences have strong signals linked to fertility fitness that surpasses their differing genetic background.

Conclusions

Our interrogation of multiple levels of biological information (genome, transcriptome, and proteome) at a systemic level in heifers highlighted the molecular complexity of female fertility. While the genomic data pointed to a disruption of oocyte developmental competence, the transcriptome and proteomic data point to metabolic dysregulation contributing to subfertility or infertility. Although the differences in molecular profiles identified in our study need to be further validated by mechanistic studies, our results, supported by the current literature, highlight differences in the molecular profile associated with female fertility that transcend the constraints of breed-specific genetic background.