Proteomic landscape of seminal plasma associated with dairy bull fertility

Male fertility is the ability of sperm to fertilize the egg and sustain embryo development. Several factors determine the fertilizing capacity of mammalian sperm, including those intrinsic to sperm and components of the seminal plasma. The present study analyzed the seminal fluid proteome of Bos taurus and potential associations between proteins and fertility scores. Mass spectrometry coupled with nano HPLC allowed the identification of 1,159 proteins in the dairy bull seminal plasma. There were 50 and 29 seminal proteins more abundant in high (HF) low fertility (LF) bulls, respectively. Based on multivariate analysis, C-type natriuretic peptide, TIMP-2, BSP5 and sulfhydryl oxidase indicated relationship with HF bulls. Clusterin, tissue factor pathway inhibitor 2, galectin-3-binding protein and 5′-nucleotidase were associated with LF bulls. Abundance of NAD(P)(+)-arginine ADP-ribosyltransferase, prosaposin and transmembrane protein 2 proteins had the highest positive correlations with fertility ranking. Quantities of vitamin D-binding protein, nucleotide exchange factor SIL1 and galectin-3-binding protein showed the highest negative correlations with fertility ranking. A fertility ranking score was calculated and the relationship with these proteins was significant (Spearman’s rho = 0.94). The present findings represent a major and novel contribution to the study of bovine seminal proteins. Indicators of fertility can be used to improve reproductive biotechnologies.

Also, several studies have used semen parameters to distinguish fertility phenotypic status in bulls 25 and men as well 20 . However, such parameters are not reliable predictors of male fertility when compared to in vivo assays 26 . Working with dairy bulls as a model for fertility studies allows us to have access to an accurate database of sire conception rates calculated from hundreds of artificial inseminations. Thus, the present study was conducted to analyze the seminal plasma proteome of dairy sires (Bos taurus) using a label-free shotgun proteomics approach. We also investigated potential associations between seminal plasma proteins and fertility phenotype of the bulls.

Results
Seminal plasma proteome of Holstein bulls. In the present study, 1,159 proteins were identified in the dairy bull seminal plasma, using mass spectrometry coupled with nano HPLC (Supplementary Table S1). Among all listed proteins, 765 were characterized according to UniProt database and 394 were still defined as non-characterized. Gene ontology terms related to biological process and molecular function of dairy bull seminal plasma proteins are presented in Fig. 1. The most important biological processes linked to the identified proteins were cellular process (24.3 and 24.5% in HF and LF bulls, respectively) followed by regulation (23.2 and 22.8% in HF and LF bulls, respectively) and interaction with cells and organisms (8.7 and 8.6% in HF and LF bulls, respectively). Molecular functions of bovine seminal proteins were mainly reported as binding (44.2 and 43.6% in HF and LF bulls, respectively) and catalytic activity (37 and 37.8% in HF and LF bulls, respectively).
Comparison between protein profiles of seminal plasma from high and low fertility bulls. Of the 1,159 proteins identified in dairy bull seminal plasma, 949 were found in seminal plasma from HF bulls and 771 in LF sires, with 561 proteins (48.4%) common to both HF and LF phenotypes (Supplementary Table S1; Fig. 2). Thus, 388 proteins were exclusive to HF, while 210 were found in only LF sires. Among the proteins present in the seminal plasma of all bulls (561), there were 79 with different (p < 0.05) quantities in the groups of high and low fertility sires. Then, 50 proteins were more abundant in HF bulls and 29 proteins, in LF bulls (Table 1).
According to the plot based on the principal component analysis, there was a clear distribution of proteins with different quantities in high and low fertility bulls (Fig. 3). Based on multivariate analysis, a score plot of the two components with the highest variability (49.5 and 37.3%, data not shown) was performed (Fig. 4a) and eight proteins had VIP score greater than 1.5 (Fig. 4b), indicating meaningful contributions for definition of the fertility phenotype. Proteins contributing to definition of high fertility are C-type natriuretic peptide (NPPC), metalloproteinase inhibitor 2 (TIMP2), seminal plasma protein −30 kDa (BSP5) and sulfhydryl oxidase (QSOX1). On the other hand, clusterin (CLU), tissue factor pathway inhibitor 2 (TFPI2), galectin-3-binding protein and 5′-nucleotidase (NT5E) are the ones with meaningful contributions to low the fertility phenotype. Protein-based fertility rank score. Given the 79 seminal plasma proteins with different abundances (p < 0.05) between the groups of high and low fertility bulls, we selected the six proteins showing the highest correlations with fertility ranking. NAD(P)(+)-arginine ADP-ribosyltransferase, prosaposin and transmembrane protein 2 had the highest positive correlations, while vitamin D-binding protein, nucleotide exchange factor SIL1 and galectin-3-binding protein showed the highest negative correlations with fertility of bulls. Using the normalized abundances of all these six proteins, it was possible to calculate a predictive fertility rank score with Spearman's rho = 0.91 and Pearson's correlation = 0.83 (Fig. 5).

Discussion
In the present study, we used a label-free mass spectrometry approach to characterize the seminal plasma proteome of adult Holstein bulls. This strategy allowed the identification of 1,159 proteins and represents a major contribution to the understanding of seminal plasma composition in the Bos taurus species. Moreover, there were specific seminal proteins with different expression profiles in sires with contrasting fertility phenotypes determined in vivo. The major biological processes (cellular process, regulation and interaction with cells) and molecular functions (binding and catalytic activity) of the seminal proteins related to their participation in events such as cell protection, sperm motility and capacitation, acrosome reaction, fertilization and embryonic development. Also, gene ontology terms related to biological process and molecular function were very similar in bulls of high and low fertility. The reason for this certainly relies on the fact that dairy bulls used in our study have been selected for decades by the AI companies and differences in fertility among them do not relate to general pattern of seminal plasma proteins, such as the pattern defined by gene ontology. Instead, differences in fertility phenotypes of those bulls are associated to very specific molecular aspects of seminal plasma, as we discuss below.
According to multivariate statistical analysis, proteins identified as more abundant in HF bulls were seminal plasma protein −30 kDa (BSP5), metalloproteinase inhibitor 2 (TIMP2), C-type natriuretic peptide (NPPC) and sulfhydryl oxidase (SQOX1). These proteins had VIP >1.5 and, thus, they are the best indicators of the high fertility phenotype of the bulls. BSP5 belongs to the Binder of Sperm Protein (BSP) family and, along with BSP1 and BSP3, represent around 60% of all proteins of bull seminal plasma 15,27,28 . BSPs are secreted by the accessory sex glands and, after ejaculation, bind to sperm 29 and induce cholesterol and phospholipid efflux from the sperm membrane, an essential step for capacitation 30 . BSPs also mediate sperm interaction with the oviduct epithelium 31 and BSP1 affects both fertilization and early development of bovine embryos in vitro 13 . In silico analysis points out to significant interactions between BSP5 and metalloproteinase inhibitors, such as TIMP-2, both with greater abundance in high fertility bulls. Furthermore, a link seems to exist between BSP5 and the oocyte-expressed  protein homolog, which is located in subcortical cytoplasm of early embryos and play roles in cell divisions 32 , thereby supporting a putative role of BSP5 during fertilization. Metalloproteinase inhibitors (TIMPs) modulate the activity of matrix metalloproteinases (MMPs), enzymes involved in remodeling of the extracellular matrix (ECM) 33 . The balance between MMPs and TIMPs is crucial during ECM remodeling 33 . Certain TIMPs have been suggested to play roles in sperm-egg fusion in the mouse 34 . TIMP-3 controls the degree of trophoblast implantation in the murine uterus 35 and TIMP-2 content in bovine seminal plasma has a negative correlation with post-thaw sperm morphology and membrane stability 36 . Moreover, treatment of bull sperm with heparin binding proteins, a fertility-associated antigen and TIMP-2, increased pregnancy rates after artificial insemination 37 . Not only does TIMP-2 interact with several types of metalloproteinases but also with HPX, a protein that transports hemoglobin to the liver for breakdown and iron recovery. According to in silico analysis, TIMP-2 interacts with several types of metalloproteinases and, in fact, some MMPs are involved in reproductive events, such as angiogenesis, implantation and embryogenesis 38,39 .
NPPC belongs to a family of small peptides that participates in natriuresis and diuresis through vasodilatation 40 . Authors have described higher amounts of NPPC in reproductive tissues of male pigs when compared to other 41 . NPR-B (a NPPC receptor) is present in the acrosome and tail of human sperm and, thus, it is plausible that NPPC from seminal plasma binds to its receptor thereby stimulating intracellular cGMP and sperm motility 42 . In silico analysis showed interactions between NPPC, its receptors and guanylate cyclase, an enzyme    Predictive fertility rank score based on protein score (Y) and bull fertility rank (X). Protein score was obtained using normalized abundances of the six proteins with highest correlation with fertility rank. Bull fertility rank is shown from the highest (1) to the lowest value (10), as defined in Table 2. A predictive fertility rank score was significant with Spearman's rho = 0.91 and Pearson's correlation = 0.83. The blue line represents the conception rate difference from average (%) and the orange line represents the protein fertility score. The dotted line represents the linear regression for the respective (blue or orange) curve, showing the correlations between both scores, the conception rate and protein fertility score. involved in cGMP biosynthesis 42 . Also, NPPC interacts with natriuretic peptide A, which promotes trophoblasts implantation and artery remodeling in uterus 43 . QSOX1 plays a role in reduction of oxygen molecule to hydrogen peroxide, forming disulfide bonds in proteins and peptides 44 . In the male reproductive tract, QSOX1 protects spermatozoa structure and function by oxidizing sulfhydryl groups that could cause damage to the cell 45 . Several authors suggest that QSOX is crucial for sperm physiology and its dysregulation is associated with failures in spermatogenesis in hamsters 46 and rats 47 . Based on in silico analysis, QSOX1 interacts with several types of glycoproteins present in the cellular membrane and with albumin, the most abundant protein of the cauda epididymal fluid in Holstein bulls 48 . Albumin protects sperm cells against harmful effects of lipid peroxides 49 and acts during sperm capacitation 50 and acrosome reaction 51 . QSOX1 also interacts with vascular endothelial growth factor A and with insulin growth factor 1, a protein that improves blastocyst rate formation 52 in the bovine species. Thus, seminal plasma proteins BSP5, TIMP2, NPPC and QSOX1 participate in important events related to reproduction, which explains, at least partially, their empirical associations with fertility. An earlier study described a quadratic relationship between BSP5 content in accessory sex gland fluid and fertility status of bulls 15 . This indicates that increasing amounts of BSP5 are beneficial but too much BSP5 in semen becomes detrimental to fertility. In fact, in vitro experiments confirm that BSPs are needed for proper sperm function but, when cells are exposed to high amounts of BSPs and for long periods of time, they excessively loose membrane cholesterol and phospholipids and become less viable 25,47 . Considering these facts, we suggest that the amount of BSP5 present in the bulls of our study was not sufficiently high to exert negative effects on fertility.
Seminal plasma clusterin (CLU), tissue factor pathway inhibitor 2 (TFPI2), galectin-3-binding protein and 5′-nucleotidase had VIP >1.5, indicating their significant contribution for definition of the low fertility phenotype of the dairy bulls. CLU is a chaperone and protects sperm against complement-mediated attack 53 and against the effects of protein precipitation 54 . Clusterin contributes to removal of defective spermatozoa and is an indicator of poor semen quality in bulls 55 , rams 56 , men 57 , stallions 58 and peccaries 59 . Also, seminal plasma CLU is inversely associated with the number of normal sperm in beef cattle 60 and a positive association exists between abnormal morphology of sperm head and clusterin expression after scrotal insulation of Holstein bulls 55 . Based on in silico analysis, CLU interacts with a diverse cohort of molecules, some of which found in the reproductive tract and germ cells of bulls, such as serpins, albumin, TIMP, alpha-2-HS-glycoprotein. CLU also interacts with galectin-3 binding protein, another protein found at high levels in the seminal plasma of low fertility bulls, discussed below. Thus, there is sufficient experimental evidence in support of the inverse association between seminal plasma CLU and fertility of bulls.
TFPI-2 is a serine protease inhibitor also known as matrix-associated serine protease inhibitor (MSPI) 61 . It has been postulated that TFPI-2 is in fact one of the products of PP5 (placental protein 5) degradation 62 . PP5 is a placental glycoprotein associated with the coagulation and fibrinolytic system 63 and PP5 plays a role in clotting and liquefaction mechanisms in human seminal plasma 64 . In fact, our in silico analysis indicates that TFPI-2 activity is linked to coagulation proteins, such as coagulation factors (F3, F7, F11 and F12), plasminogen precursor (PLG) and tissue-type plasminogen activator (PLAT). TFPI-2 also interacts with kallikreins, a group of proteins that convert kininogen into kinin, promoting increase in sperm motility 65 . However, further studies are still needed to confirm if TFPI-2 has a causal relation with low fertility in bulls. NT5E is a glycosylated enzyme already described in seminal plasma of bulls 66 and participates in hydrolysis of AMP, stimulating sperm motility and sperm capacitation 67 . In silico analysis showed interactions of TFPI-2 with deaminases (ADA, AMPD1 and AMP D2), phosphorylases (ENTPD1, ENPP1, ENPP3, PNP and UPP2) and inosine-5′-monophosphate dehydrogenase 2 (IMPDH2). These enzymes that interact with TFPI-2 act through different intracellular signaling events that may lead to activation of sperm motility and capacitation. Galectin-3-binding protein is a member of beta-galactoside binding lectins expressed in various cells and tissues 68 and this molecule has been previously found in epidydimal fluid of dairy bulls 48 . In humans, seminal galectin-3-binding protein plays multiple roles associated with semen liquefaction, sperm motility, angiogenesis in the female reproductive tract and as a pro-inflammatory agent 69 . In silico analysis detected an interaction between galectin-3 binding protein and clusterin, which is also over expressed in low fertility dairy bulls. Like clusterin, galectin-3-binding protein also interacts with albumin, a molecule involved in sperm capacitation and acrosome reaction, as we mentioned above. In addition, galectin-3 binding protein interacts with complement factor D (CFD), an activator of the immune system in the female reproductive tract 70 . As well known, seminal plasma components interact with the female reproductive tract, stimulating gene expression and the immune system, influencing fertility and embryo development 71 . Proteins from seminal plasma interact with endometrium epithelial cells, inducing or suppressing several mRNAs. This event causes synthesis of cytokines and chemokines that recruit immune cells from the In conclusion, the present study is a comprehensive overview of the proteome of bull seminal plasma. An approach based on DDA label-free mass spectrometry allowed the description of 1,159 proteins and this is, so far, the broadest inventory of the bovine seminal plasma proteome. At this point, we cannot precisely make inferences about the full protein composition of the seminal plasma of dairy bulls but it is certain that seminal fluid is a very complex milieu, containing components yet to be identified. Statistical analyses indicated eight proteins with significant contributions for definition of the fertility phenotype of dairy bulls. Moreover, we describe two distinct fertility indicators: a discriminator of high and low fertility bulls and a rank predictor. The entire approach used in our study and functional aspects of potential indicators of bull fertility are depicted in Fig. 7. Studies about the composition of seminal fluid set the foundations for the understanding of mechanisms regulating male fertility, which in turn has major economic impact for the industry and farmers. Also, definition of molecular indicators of bull fertility can be used to enhance reproductive biotechnologies for cattle.

Materials and Methods
Experimental design. Analysis of the seminal plasma proteome from Holstein bulls (Bos taurus) with contrasting in vivo fertility rates was conducted using high performance liquid chromatography combined with mass spectrometry. Computational biology as well as univariate and multivariate analyses were performed to compare the seminal proteome of high (HF) and low (LF) fertility dairy bulls and to detect molecular indicators for bull fertility. Then, we evaluated the correlation between normalized abundance of seminal proteins to create an equation of predictive fertility score.
In vivo bull fertility and semen samples. Semen samples from ten Holstein bulls with reliable fertility phenotypes ( Table 2) were provided by Alta Genetics (Watertown, WI, USA). Individual fertility scores of bulls used in the present study were calculated using Probit.F90 software, based on the average conception of at least 674 breeding outcomes per bull 6 . The population standard deviations were used as criteria to define bull fertility 74 and, for the present study, high and low fertility sires differed from the mean by at least 1.3 standard deviations. Factors that influence fertility performance of sires (breeding event, environmental factors and herd management) were adjusted to determine reliable fertility scores using threshold models 69 .

Figure 7.
A graphical abstract showing the main findings of this study. Using DDA label-free mass spectrometry, 1,159 proteins were identified in dairy bull seminal plasma. Of the 1,159 proteins, 949 were found in seminal plasma from high fertility (HF, n = 5) bulls and 771 proteins in low fertility (LF, n = 5) sires. while 561 proteins were common to both HF and LF phenotypes. There were 50 and 29 seminal proteins more abundant in HF and LF bulls, respectively. Based on multivariate analyses, there were eight proteins with VIP score greater than 1.5 (Fig. 4b), indicating meaningful contributions of such proteins for definition of the fertility phenotype. Among them, four proteins were more abundant in either HF or LF bulls. DDA: dependent data acquired; PLS: partial least square; VIP: variable influence in projection. Semen from the five high and five low fertility bulls were collected using an artificial vagina and treated with a protease inhibitor, as reported before 24,74 . Right after semen collection, seminal plasma was subjected to a 10-min. centrifugation at 700 × g (4 °C). Afterwards, the resulting supernatant (seminal plasma) was transferred to a new tube and centrifuged again at 10,000 × g for 60 min., at 4 °C 27  Ten stage tip C18 columns were manually made to perform peptide desalting using Empore TM SPE disks (Sigma-Aldrich, Darmstadt, Germany), as previously described 76 . Briefly, to prepare a stage tip C18 membranes 100% methanol was added to the column at centrifuged at 1,000 × g for 3 min. The same procedure was repeated twice: first with a solution containing 80% acetonitrile and 0.5% acetic acid and second with 5% acetic acid. Finally, tryptic-digested samples were added to columns and centrifuged at 900 × g during 5 min, followed by washing twice with 0.5% acetonitrile at 1,000 × g for 3 min. To elute peptides, the columns were centrifuged at 600 × g for 3 min with increasing concentrations of acetonitrile (25% to 80%) with 0.5% acetic acid. Then, samples were again subjected to peptide quantification prior to mass spectrometry analysis (Qubit TM ; Thermo Fisher, Waltham, MA, USA).
Label-free mass spectrometry. Three micrograms of tryptic digested peptides from each sample were individually applied to a Dionex Ultimate 3,000 liquid chromatographer (Thermo Scientific, Waltham, MA, USA) for reversed phase nano-chromatography. The peptides were injected into a 2 cm × 100 µm trap-column containing C18, 5 µm particles (Dr. Maisch GmbH, Germany). The peptides were eluted from this column to another analytical one (32 cm × 75 µm) containing C18, 3 µm particles (Dr. Maisch GmbH, Germany) and finally eluted to the spectrometer's ionization source. The elution gradient was composed of 0.1% formic acid in water (solvent A), and 0.1% formic acid in acetonitrile (solvent B), in a gradient of 2 to 35% solvent B for 170 min.
Samples were analyzed in positive DDA (data dependent acquisition) mode in a label-free mass spectrometric approach using an Orbitrap Elite instrument (Thermo Fisher, Waltham, MA, USA), as previously reported 77 . The eluted fractions generated MS1 spectra between 300-1,650 m/z with a resolution of 120,000 FWHM at 400 m/z. The twenty most abundant ions from MS1 with charges larger than two were automatically selected to fragmentation (MS2) by higher-energy collisional dissociation (HCD) with an automatic gain control (AGC) of 1 × 10 6 and dynamic exclusion of 10 ppm for 90 s. HCD isolation window was set for 2.0 m/z, with 5 × 10 4 AGC, normalized collision energy of 35% and threshold for detection of 3,000.
Data analyses. MS1 spectra found in the chromatograms were aligned and, according to integrated intensity area from the XIC peaks generated by the respective ion, quantified using Progenesis QI software Nonlinear Dynamics (Waters, Milford, MA, USA). The protein identification was performed using Peaks software, which deduces sequences from the fragmentation information and searches in UniProt database. Protein identification information was inserted again in Progenesis QI program and combined with quantitative data generated previously.
Multivariate statistical analysis was performed using Progenesis QI software to evaluate differences in protein abundance in bulls of high and low fertility. Normalized abundances of proteins were plotted against fertility scores of each bull. A first statistical analysis was performed before protein identification to filter the MS1 features presenting ANOVA p-values < 0.05. Peaks 7.0 software was used with the fragmentation spectra and searched the Bos taurus Uniprot database, downloaded on 01/nov/2016. Parameters were set as following: precursor ion mass error tolerance of 10 ppm, MS/MS mass tolerance of 0.5 Da, carbamidomethylation of cysteine residues (fixed modification), deamidation and methionine oxidation (variable modifications). Trypsin was selected as the digestion enzyme, and up to two missed cleavage sites per peptide were allowed. The identified proteins were filtered at a rate of 1% for false discovery rate (FDR), and a minimum of 1 unique peptide per protein was required for identification. The protein input from Peaks were imported into the Progenesis QI software to generate quantitative data at the protein level. Multivariate Principal Component Analysis (PCA) was performed in Progenesis QI to evaluate protein abundances as related to phenotypes. Proteins were considered differentially abundant when presented p ≤ 0.05 after the ANOVA test at the protein level.
An additional multivariate analysis was carried out using MetaboAnalyst 3.0 (http://www.metaboanalyst.ca) 78 considering the proteins significantly related to bull phenotypes. The protein dataset was normalized by sum, and Pareto-scaling was used to reduce relative importance of MS large values. Partial-Least Squares Discriminant Analysis (PLS-DA) was applied to differentiate classes in highly complex protein datasets, despite variability within each class. Variable Importance in Projection (VIP) based on the PLS-DA was used for the identification SCientiFiC REPORTS | (2018) 8:16323 | DOI:10.1038/s41598-018-34152-w of biologically relevant features to categorize indicators of fertility. Then, variables with VIP >1.5 were considered important for group separation (high vs low fertility). Normalized relative abundances of the regulated proteins were tested for correlation with the individual fertility scores using Spearman and Pearson correlation coefficient. The abundances of the three better positively correlated proteins and the three better negatively correlated proteins were used to calculate a rank score. Then, a protein-based fertility rank score was calculated based on a curve-fitting model of the protein abundances.
Functional clustering and networking of bull seminal plasma proteins. Gene ontology (GO) analysis was carried out using STRAP software, gathering information about biological process and molecular function from UniProtKB and EBI databases. Moreover, in silico analysis of protein-protein network was performed using STRING (http://string-db.org) version 9.0 database 27 . Interactions were evaluated for seminal plasma proteins associated with VIP score >1.5.