Introduction

Fatty acids (FAs) are energy substrates and important components of the cell membrane in the form of phospholipids. Their biological activities influence cellular and tissue metabolism and function, and signals responsiveness, consequently affecting health and disease risk1.

The nutritional and technological quality of milk is largely influenced by the quantity of milk fat and the fatty acid profile2. For instance, milk and dairy products are among the major sources of saturated fatty acids (SFAs) in the human diet. Epidemiological and clinical studies suggested that dietary SFAs are associated with a higher risk of cardiovascular disease, leading to a public health recommendation to decrease SFA intake3. By altering the milk fat composition, it may be possible to reduce SFA consumption without dietary intervention and changes in eating habits4. However, the quality of the evidence on which dietary recommendations are based has been recently challenged and it has been suggested that it would be preferable to consider the biological action of individual FAs rather than large FA groups5. Bovine milk contains, in fact, several bioactive FA such as butyric acid (4:0), palmitoleate (16:1 c9); oleic acid (18:1 c9), vaccenic acid (18:1 t11) and conjugated linoleic acid (CLA, 18:2 c9, t11)6.

Various factors can influence the milk FA composition, including stage of lactation, physiological state and feeding as well as the animal’s genetic background7. Several studies have shown the existence of heritable variations in the FA profile of bovine milk8,9, while genetic correlations among milk FA traits have been also estimated and biologically interpreted9,10. Furthermore, single nucleotide polymorphisms (SNPs) located in genes underlying quantitative trait loci for fat yield or content, such as SREBP1, SCD1, DGAT1 and ABCG2 11,12,13, and in other genes involved in lipid biosynthesis or metabolism have been shown to affect the milk FA profile in different bovine breeds14,15,16. Genome-wide association studies (GWAS) have also shown several genomic regions to be significantly associated to the milk FA profile17,18,19, supporting the finding that milk fat synthesis and secretion are coordinated by a complex network of interrelated genes. The availability of data on the genetic parameters of individual FAs and identification of polymorphisms affecting their contents provide useful information for developing breeding selection strategies aimed at obtaining a milk FA profile more beneficial to human health.

Studies are available which investigated gene expression changes occurring in the bovine mammary gland across lactation20,21, as well as focused on the expression of lipid-related genes22,23 and on the discovery of genetic variants functionally implicated in the regulation of milk fat24. Therefore, several loci associated with milk FA composition have been identified and the knowledge about the role of genes driving milk fat synthesis in the bovine mammary gland has been significantly advanced. Despite that, the functional consequences of variants, including the complex molecular interplay of signal transduction, transcriptional, post-transcriptional and metabolic events underlying the regulation of milk fat synthesis and secretion in the bovine mammary gland, are still far to be extensively elucidated. A greater understanding of how genes or gene variants associated with milk FA composition exert their molecular effects could improve our knowledge of the physiological and cellular adaptations required for the synthesis and secretion of milk fat in ruminants, advance our understanding of tissue function beyond the well-known biochemical pathways and can improve the quality of milk and dairy products destined for human consumption.

The integration of GWAS analysis with gene-set enrichment and network analyses turned out to be a valid means to extrapolate additional biological information and investigate the functional relationships among sets of genes, that individually explain only a relatively small part of phenotypic variation and therefore cannot be identified by GWAS due to the stringent significance threshold25. These approaches have been recently used to explore bovine reproductive26,27 and productive traits28,29, and milk fat composition (including 10 FA traits)19 and have provided new insights into the biological pathways, complex gene interactions and key regulators affecting the investigated traits. Recently, an approach named Association Weight Matrix and based on SNP co-associations has been also proposed as a tool to exploit the results of GWAS and, combined with network inference algorithms, build gene networks to infer regulatory and functional interactions among genes30. Therefore, our hypothesis is that the milk FA associated alleles exert their effects by influencing transcription of the closest genes through multiple mechanisms. The aims of this study, therefore, were i) to perform a GWAS analysis to identify genomic regions associated to 65 FA traits and the fat percentage in bovine milk; ii) to characterize the regulatory mechanisms of the loci identified to understand the molecular and biological functions involved in the regulation of milk fat content and composition through pathway analysis; iii) to use an AWM approach to build gene networks for milk FA composition and to identify key transcription factors (TFs) regulating the synthesis and secretion of milk fat in dairy cows.

Results

GWAS analysis

Summary statistics and genomic heritabilities for the 65 FA traits and the fat percentage in bovine milk are reported in Tables 1 and 2. Heritability estimates varied from low (<0.10; e.g., 14:0 and 18:1c9) to moderate (up to 0.35; e.g., 16:0) with some individual FAs having values close to 0 (13:0, 22:0, 16:1t9, 17:1c9, 18:1t4, 18:2t11,c15, 18:3c9,t11,c15).

Table 1 Descriptive statistics, additive genetic variance \(({\sigma }_{a}^{2})\) and genomic heritability \(({h}^{2})\) for fat percentage and individual milk fatty acids (n = 1,152, Brown Swiss cows).
Table 2 Descriptive statistics, additive genetic variance and genomic heritability for groups of fatty acids and unsaturation indices (n = 1,152, Brown Swiss cows).

The results of the GWAS analysis are summarized in Table 3 and Supplementary Table S1. A total of 175 significant SNPs (P < 5E-05) were identified across all Bos taurus autosomes (BTAs). Three SNPs had unknown positions on the genome. Fifty-seven FA traits and the milk fat percentage exhibited significant signals, some of which were shared among the traits. The P-values ranged from 4.94E-05 to 1.36E-09. Around 50% of significant SNPs were associated to more than 1 trait. The highest signals were detected on BTA26 (~22.98 Mbp) and on BTA8 (~3.66 Mbp). In particular, the marker ARS-BFGL-NGS-32553 located at 22,977,848 bp on BTA26 was significantly associated with the 14:1 index (P = 1.36E-09). Other strong signals associated with the 14:1 index were detected at 22,951,431 bp and at 22,446,047 bp, which corresponded to markers ARS-BFGL-NGS-39823 and BTB-00933928 (P = 1.95E-08 and P = 8.07E-08, respectively). These two markers are in high linkage disequilibrium with ARS-BFGL-NGS-32553 (r2 = 0.89 and 0.70, respectively). A total of 4 regions (i.e. windows of consecutive SNPs at ≤ 1 Mb distance interval), that were detected on BTA26, were associated with 8 traits. Region 26a corresponded to only 1 SNP (~9.87 Mbp), which was associated to both 10:1 and 14:1 indices (Table 3). Regions 26b, 26c and 26d contained multiple associated SNPs (Table 3 and Fig. 1). In particular, region 26b (~14.66–16.71 Mbp) included 4 SNPs associated to 14:1 index and 14:1c9. In the region 26c (~18.17–22.98 Mbp), signals were detected for 14:1 index, 10:1 index, 14:1c9, MUFAs, 18:1c9, 10:0, 12:0 and 14:0. Finally, 6 SNPs in region 26d (~25.09–31.58 Mbp) were associated to 14:1 index and 14:1c9. The highest signal on BTA8 (Hapmap40047-BTA-119117) was significantly associated to 18:1t16. The proportion of additive genetic variance explained by this SNP was 44.22% (Supplementary Table S1). We detected regions of consecutive SNPs (≥3) on BTA3, BTA5 and BTA13. In particular, a region containing 6 SNPs on BTA3 (~118.57–121.20 Mbp, region 3 g) was significantly associated with 17:0 iso. Two regions including 5 SNPs were detected on BTA5: region 5a (~10.37–10.79 Mbp) which showed significant associations with 11:0, 15:0 and odd-chain FAs (OCFAs); region 5e (~84.09–85.29 Mbp), which contained signals for 10:0, 12:0, SFAs, monounsaturated FAs (MUFAs), medium-chain FAs (MCFAs) and long-chain FAs (LCFAs). Five SNPs on BTA13 (region 13c, ~42.07–42.93 Mbp) were associated with short-chain FAs (SCFAs), 8:0, 10:1c9 and 15:0 ante.

Table 3 Summary results of the genome-wide association analysis for milk fatty acid traits.
Figure 1
figure 1

Manhattan plots of the genome-wide association studies on Bos taurus autosome 26 (BTA26). (a) 14:0; (b) 14:1c9; (c) 12:0; (d) 10:0; (e)18:1c9; (f) MUFA; (g) ID10:1; (h) ID14:1. The red horizontal lines indicate a −log10 (P-values) of 4.30 (corresponding to P-value = 5 × 10−5). Regions 26b, 26c and 26d are highlighted for each trait.

Pathway analyses

In the pathway analyses, for each trait 850 significant SNPs (P < 0.05) were on average assigned to 700 genes, which were mined using the Bioconductor package goseq31 to reveal the biological processes connected with the milk FA synthesis and metabolism in the bovine mammary gland. Significantly enriched GO terms and KEGG pathways (q < 0.05) were found for the profile of 20 out of the 65 FA traits in milk (Fig. 2, Supplementary Table S2). In particular, the genes associated with 12:0 presented the highest number of overrepresented categories/pathways, suggesting a large degree of genetic control for this FA (Fig. 2a). For instance, we observed a high association between the 12:0 content in milk and the regulation of mitogen-activated protein kinase (MAPK) activity, e.g., MAPK cascade (q = 7.73E-06), MAPK signalling pathway (q = 0.00046), positive regulation of extracellular signal-regulated kinase (ERK) 1 and ERK2 cascade (q = 0.00017), and positive regulation of protein phosphorylation (q = 8.70E-06; Fig. 2a). The intermediate filament together with the intermediate filament cytoskeleton appeared to be substantially enriched by 18:1c12 (q = 3.99E-06 and q = 8.34E-06, respectively), while vesicle-mediated transport was generally enhanced by 18:2t11,c15-related genes (q = 1.57E-05) (Fig. 2a).

Figure 2
figure 2

Distribution of the significantly enriched terms/pathways using genes associated to the fatty acid traits. The SNP (P < 0.05) were assigned to genes if they were located within the gene or in a flanking region of 15 kb up- and downstream of the gene using the biomaRt R package. For mapping, the Ensembl Bos taurus UMD3.1 assembly was used as reference. Gene-set enrichment analysis was carried out using the goseq R package. Only the traits showing significantly enriched terms are reported (FDR < 0.05). (a) GO terms; (b) KEGG-pathways. ID14:1: 14:1/(14:1+14:0); ID16:1: 16:1/(16:0+16:1); MUFA: monounsaturated fatty acids; PUFA: polyunsaturated fatty acids; SFA: saturated fatty acids; TFA: trans fatty acids.

Similar to GO terms, KEGG analysis indicated that the 12:0-related genes were involved in the GnRH signalling pathway, MAPK signalling, type I diabetes mellitus and inflammation processes such as graft-versus-host disease and systemic lupus erythematosus (Fig. 2b). The same analysis indicated that the systemic lupus erythematosus pathway was also enriched for 18:1t9, 18:1t10, TFA and TFA18:1. The oocyte meiosis pathway was ranked at the top of the KEGG list of the most impacted functions for SFA (q = 0.00017; Fig. 2b), whereas the ErbB signalling pathway and bacterial invasion of epithelial cells was clearly induced by 18:3c9,c12,c15-related genes. Basal transcription factors appeared to be greatly impacted by the 16:1 index-related genes (q = 0.00020; Fig. 2b).

AWM matrix construction and gene networks

We retained 15,277 annotated SNPs out of the 37,568 SNPs for AWM matrix construction analyses. After applying a series of filtering steps (see Material and Methods), 1,575 SNPs, corresponding to 1,575 unique genes, were used to build the AWM matrix. Hierarchical clustering of traits evidenced 2 clusters which best described the FA profiles according to similarities in their origins or in their metabolic processes, e.g., alpha-linolenic acid (18:3c9,c12,c15) clustered with the n3 FAs, and 6:0 with the SCFAs (Supplementary Fig. S1). Correlations across AWM rows were used to predict gene associations and build a network in which each node represented a gene and each edge a significant interaction. In total, 27,050 significant edges between 1,575 nodes were identified based on the Partial Correlation coefficient with Information Theory (PCIT) algorithm. The SNPs detected with the AWM approach explained 67% of the phenotypic variance for 12:0, which was significantly larger (P < 0.001) than the variance explained by the same number of randomly selected SNPs (10,000 replicates) (Fig. 3). After applying the gene network reduction strategy (filtering for sparse correlations ≥ |0.80|), we obtained a network with 1,628 significant interactions and 791 nodes (Fig. 4). The node degree followed an approximate power law distribution (R2 = 0.933; y = −1605x379), which suggests that the reduced network was scale-free and that a few gene nodes acted as hubs with a large number of links to other gene nodes. Analysis of other network topological parameters, e.g., node closeness, eccentricity and betweenness (network centrality indices), showed that some of the genes involved in lipid metabolism, such as INTS8, SACM1L, AGMO and PTGR1, might be relevant to the SNP co-association network. Table 4 summarizes the 10 most highly-connected nodes based on significant interactions (node degree) and Supplementary Table S3 shows their positions in close proximity to the QTLs related to milk fat or milk FAs deposited in the Cattle QTL database32. As the SNP marker Hapmap54104-rs29010930 was located ~ 2 kb up-stream of CNNM1 (at ~20.3 Mb on BTA26), we investigated whether this SNP affected the transcriptional regulation of CNNM1 by comparing genomic regions up-stream of CNNM1 across different species using mVISTA. The results revealed the presence of a cluster of conserved non-coding elements (CNEs) surrounding Hapmap54104-rs29010930 (Supplementary Fig. S2).

Figure 3
figure 3

Proportion of phenotypic variance explained by 1575 randomly selected SNP (10,000 replicates). The dashed green vertical line represents the proportion of phenotypic variance explained by the SNPs include in the AWM.

Figure 4
figure 4

Regulatory network of the genes significantly associated with fatty acid profiles in bovine milk. In the network, every node represents a gene, whereas every edge connecting two nodes represents a significant interaction (correlation value ≥|0.80|). The nodes shape indicates whether the node is a transcription factor (triangles), a miRNA (hexagon), a metabolite (round rectangle), a membrane receptor (rectangle), a transporter (parallelogram), or other type of genes (ellipses). Information of molecule type was obtained using Ingenuity Pathway Analysis (IPA; version 5.5, Ingenuity Systems, USA). The list of identified TF was updated with that reported by33. The node colour represents the biological function of the gene, according to IPA. The edge colour intensity indicates the level of the association: red = positive correlation - and blue = negative correlation between two nodes. The width of the edge indicates the value of the correlation; a wider edge corresponds to a higher correlation in absolute value.

Table 4 Description of the ten most connected nodes in the co-association network*.

We used IPA to investigate the co-enriched functions/pathways and other biological features within the network. Enriched pathways included MAPK related pathways (e.g. “ERK/MPK signaling”, P = 8.32E-03; “GnRH signaling”, P = 1.55E-06), “Phosphatidylglycerol Biosynthesis II” (P = 6.76E-03); pathways related to lipid metabolism such as “Fatty acid activation”(P = 9.12E-03), “Adipogenesis pathway”(P = 1.91E-02), “Triacylglycerol Biosynthesis” (P = 8.51E-03), “Triacylglycerol degradation” (P = 3.02E-02), “PPAR signalling” (P = 1.38E-03) and “CDP-diacylglycerol Biosynthesis I” (P = 4.47E-03); pathways related to hormone signalling, e.g., “Pregnenolone Biosynthesis” (P = 1.91E-02). The top IPA computed networks showed that the genes in the networks were associated with “Cell Morphology, Cellular Function and Maintenance, Carbohydrate Metabolism” and “Immunological Disease, Inflammatory Disease, Inflammatory Response”. The full list of enriched pathways and computed networks is reported in Supplementary Table S4.

In parallel, we have also generated a supplementary TF network by investigating the potential impact of the combination of one or more TFs on the expression of the other genes in the network with the lowest redundancy. Hence, we explored 383,306 possible combinations of TF trios among the 1,977 TFs identified by33. The analysis allowed us to identify BTB domain and CNC homolog 2 (BACH2), E2F transcription factor 3 (E2F)3, and lysine demethylase 5 A (KDM5A) TFs, which controlled the transcription of 877 unique genes in our network (~56% of the genes in the AWM matrix) (Fig. 5) and which might play a pivotal role in orchestrating adaptations of the FA metabolism in the mammary gland. Functional analyses of these 877 target genes by ClueGo revealed the metabolic pathways to be the most highly impacted, of which lipid and carbohydrate metabolism were the most important. Among lipid metabolism, glycerophospholipid metabolism and sphingolipid metabolism were the most highly induced, followed by the MAPK activity-related pathways (e.g., the MAPK signalling pathway) and the GnRH signalling pathway. We also found overrepresentation of pathways related to reproduction (e.g., progesterone-mediated oocyte maturation) as well as the glutamatergic synapse and oxytocin signalling pathway. The full list of significantly enriched pathways and GO terms is presented in Supplementary Table S5.

Figure 5
figure 5

Activators and repressors of the regulatory network of genes associated with the bovine milk fatty acid profile. In the network, the best trio of transcription factors is showed: E2F3, KDM5A and BACH2. The nodes shape indicates whether the node is a transcription factor (triangles), a miRNA (hexagon), a metabolite (round rectangle), amembrane receptor (rectangle), a transporter (parallelogram), or other type of genes (ellipses). The node colour represents the biological function of the gene according to Ingenuity Pathway Analysis (IPA) annotation. The list of identified TF was updated with that reported by33. The edge colour intensity indicates the level of the association: red = positive correlation - and blue = negative correlation between two nodes.

In addition, to analyze in depth the TFs potentially controlling the network, we examined the putative binding sites within the promoter region of these TFs using the LASAGNA tool. Interestingly, the promoter region of E2F3 and KDM5A was predicted to contain binding sites for several regulators of cholesterol and lipid metabolism such as ARNT:AhR, HNF4A, CREB1, SP1, MAFB and PPARG:RXRA (Supplementary Table S6).

Discussion

GWAS results

We have reported here GWAS results for the profiles of 65 FA traits and the fat percentage in Brown Swiss cows’ milk, including also lesser-studied FAs and/or those present in small concentrations. Genomic heritabilities exhibited a medium-high relationship with the genetic heritabilities estimated using a standard animal model (r = 0.70). Similarly, we found a moderate agreement between AWM column-wise correlations and the genetic correlations among FAs (r = 0.70)9. Several GWAS studies are available for bovine milk FA profiles in Holstein and Jersey populations17,18,19,34,35. However, only the main or most representative individual FAs or FA groups have generally been investigated. We found some correspondence with previous results, e.g., the presence of overlapping regions on BTA26 (~2.5–39.0 Mbp) for 14:1c9 and 14:1 index17,18,34, on BTA13 (~41.40–68.40 Mbp) for 8:0 and SCFAs17, and on BTA5 for 18:1 index (~25.4–28.0 Mbp) and unsaturated FA (UFA) (84.1–104.70 Mbp)35. In particular, the region on BTA26 covers the known candidate gene SCD1 (located at ~21.14 Mbp), confirming its notable influence on 14:1 index and consequently on 14:1c9. Indeed, although mammary SCD1 may act on several substrates (i.e. 14:0; 16:0, 17:0; 18:0; 18:1t11), 14:1 index has been considered the best proxy for SCD activity in the mammary gland, being 14:0 almost exclusively produced via de novo synthesis in the mammary gland36. Accordingly, the strongest association in the present study was found for 14:1 index (ARS-BFGL-NGS-32553, P = 1.36E-09) at 22.98 Mbp on BTA26, which is 1.84 Mb away from SCD1. Interestingly, we identified CNEs surrounding the marker Hapmap54104-rs29010930, which was located on BTA26 at 20.20 Mb, ~2 kb upstream of CNNM1, one of the top nodes in the reduced network. This marker was also detected by standard GWAS analysis and significantly associated to 14:0 (P = 2.16E-05; Supplementary Table S1). Because CNEs are now known to contribute 4.8% to 9.5% of the variability in the genome37, SNPs within this region may have important functional consequences on FA metabolism or synthesis in mammary gland. Indeed, recent findings suggest that CNEs are potentially involved in gene regulation, often encoding for enhancer elements, which may also be located far from their target gene38,39. Support for this hypothesis might be gathered by the results of GWAS analysis obtained after conditioning for Hapmap54104-rs29010930, which evidenced a large drop in h2 for 14:0 (from 0.075 to 0.017). The window of consecutive SNPs significantly associated with 14:1 index and 14:1c9 in region 26b is very close to SORBS1 (located at ~16.72–16.92 Mbp), which is involved in the regulation of insulin signalling in human adipose tissue40. Insulin acts as powerful regulator for the transcription of bovine SCD according to its pro-lipogenic role41. The ruminant mammary gland does not have a total requirement for insulin to activate lipogenesis but the lipogenic role of insulin signaling in the bovine mammary gland might acquire a biological meaning with the advancing of lactation and the increase in insulin sensitivity42. The SORBS1 gene has also been previously reported as a potential candidate gene associated with bovine milk FA profile35. The same SNP at 104.71 Mbp on BTA5 (ARS-BFGL-NGS-99256) significantly affected PUFAs and n6 FAs (present work) and UFAs35. This SNP is located 4.46 Mbp from the OLR1 gene, which can bind and degrade oxidized low-density lipoprotein and has already been identified as a candidate gene for milk fat composition18,43. Significant associations were detected on BTA2 (~129.05 Mbp), BTA25 (~32.65 Mbp) and BTA28 (~2.95 Mbp) for the CLA 18:2 c9,t11. A significant association for CLA on BTA2 was found also by44, although in a different position (98.2 Mbp). The SNP on BTA28 (ARS-BFGL-NGS-118662) was located 0.4 Mbp from GNPAT, which is involved in the lipid metabolic process and has previously been associated with Δ9-desaturase44. Other significant associations were discovered in close proximity of previously identified candidate genes34,35 such as GH (ARS-BFGL-NGS-102154), LIPJ and LIPK (ARS-BFGL-NGS-13746), PRLR (Hapmap26422-BTA-148751) and NFKB2 (BTB-00933928) (Supplementary Table S1).

On the other hand, we also found some of our results to be inconsistent with previous studies. The most important differences were found on BTA14, where DGAT1 (located at ~1,8Mbp) is recognised as influencing the profile of several FA traits, including RA, 14:1, 16:1, 16:0, 18:1c9, 14:1 index, 16:1 index, 18:1 index 18:2c9,c12, 18:3c9,c12,c15 18,25 and milk fat19,45. In the present study, we also found a significant association with the milk fat but at a different position (~26.00 Mbp). Other associations on BTA14 were with 18:1t4 (~6.91 Mbp), 11:0 (~17.38 Mbp), 14:0 iso (~40.79 and 57.15 Mbp) and 15:0 iso (~40.79 and 58.85 Mbp). These differences can be attributed to the fact that DGAT1 is nearly monomorphic in the Italian Brown Swiss12,46, therefore the observed associations might reflect the effect of other multiple genes. For instance, Hapmap25446-BTC-054694 was located ~0.3 Mbp from CYP7A1 which catalyses a rate-limiting step in cholesterol catabolism47. Furthermore, a polymorphism on this gene has been shown to influence plasma lipids in humans48.

Our GWAS study did not detected SNPs associated to some of the well-known candidate genes for milk FA profile (e.g. FASN, MGST1). On the other hand, some other previously reported candidate genes were confirmed (e.g. ORL1, GNPAT, GH). GWAS power depends on sample size, effect size, causal allele frequency, and marker allele frequency and its correlation with the causal variant49. Some divergences with previous results might be attributed to several factors such as differences in the populations studied (breed, number of animals, environment and management conditions, physiological and metabolic factors, structure of the linkage disequilibrium among the genetic markers), and differences in the statistical model used for GWAS analysis and/or in the marker densities.

Pathway and network analyses

Pathway and network analyses from GWAS data were performed to find both candidate regulatory sites and the most likely functional variants. We identified 12:0-related genes as uncovering the greatest number of pathways/GO terms, suggesting that 12:0 may play a key biological role in the control of milk fat composition. It is worth mentioning that the palm kernel oil, which is a rich source of 12:0, can be used as supplement in the feed industry. However, in Italy palm kernel oil is not regularly adopted as indirectly suggested by recent studies carried out in Northern Italy50,51. Concentrates might be integrated with lipids to increase the energy content of the diets but generally soybean, linseed and sunflower (as whole seeds, expellers or oil) are used.

The genes related to 12:0 were mostly involved in MAPK activity, ERK1 and ERK2 cascade and protein phosphorylation. The MAPK pathway has been found to be responsive to ApoA-1/ABCA1 activity in bovine mammary epithelial cells in vitro 52. Further, the presence of ABCA1 and ABCG1 has been detected in mammary epithelial cells and in the membrane of milk fat globules which suggest that these proteins might be involved in cholesterol exchange between mammary epithelial cells and milk53. In the ruminants, only a small part of cholesterol in milk seemed to be synthetised in the mammary gland while it is mainly derived from the blood uptake54. Genes related to the cholesterol synthesis are induced in the bovine liver after parturition55, suggesting that high levels of cholesterol are delivered by lipoproteins to the mammary gland56. Therefore, we might speculate that the significance of the association between 12:0 and alleles related to genes involved in MAPK pathway could be likely related to role of cholesterol in the fat globule membrane. Finally, capric and lauric acid have been also shown to have anti-bacterial and anti-inflammatory activities in vitro and in mouse model57,58. The anti-inflammatory effect seemed to be partially mediated by the inhibition of NF-κB activation and phosphorylation of MAPK58, suggesting a putative link between the immune signalling pathways and the mammary cells’ anti-inflammatory response. Activation of the GnRH pathways suggests a link between lauric acid, MAPK activity and cholesterol synthesis and release. The MAPK are also acknowledged as being involved in the transcriptional activation of a wide variety of genes regulating the biosynthesis and secretion of the gonadotropins, e.g., the luteinizing hormone (LH) and the follicle-stimulating hormone (FSH), both related to sex steroid hormone synthesis, follicle growth and oocyte maturation59. Cholesterol is also the precursor for the biosynthesis of steroid hormones, including sex steroids and corticosteroids60. The enrichment of pathways related to reproduction, e.g. oocyte meiosis, has been also previously associated to the milk profile of 12:0 in Danish Jersey19. All these novel putative biological roles for 12:0 in the bovine mammary gland are summarized in Fig. 6.

Figure 6
figure 6

Potential roles of lauric acid in the bovine mammary gland. The Figure summarize the proposed roles for lauric acid based in the bovine mammary gland based on the results of functional and network analyses from the GWAS data. FSH: Follicle-stimulating hormone; LH: Luteinizing hormone.

Pathway analyses of the set of genes included in the regulatory network further confirmed the relevance of alleles involved in MAPK signalling, cholesterol biosynthesis, hormonal signalling and reproduction. Overrepresentation of the “Oxytocin signalling pathway” may be explained by oxytocin (OXT) effects in the regulation of milk secretion from lactating mammary gland in mammals including ruminants61. Oxytocin was suggested to have direct effects on the mammary epithelium in mammals, in that it stimulates milk lipid secretion, although the mechanisms are still not clear62,63. Finally, OXT also plays an important role in many reproduction-related functions in female ruminant; accordingly, steroid hormones seemed to act as (indirect) regulators of the OXT-receptor signalling64.

Identification of transcription factors within the regulatory network

Mining for transcription factors and their putative target genes is an important component of the system biology approach and allows transcriptional networks, which may have important regulatory functions for FA metabolism and synthesis, to be uncovered. The AWM approach allowed inferring gene interactions and computing networks based on the co-association patterns across phenotypes, using the PCIT algorithm which is based on partial correlations and information criteria. On the other hand, IPA provided relevant networks and up-stream regulators analysis based on the information deposited in the expert-annotated Ingenuity Knowledge Base. Integrating these approaches allowed to formulate molecular mechanistic hypotheses and identify upstream regulators likely responsible for phenotypic or functional outcomes.

The TF network analyses revealed that the best trio of TFs (E2F3, KDM5A and BACH2) presented co-association with a large number of genes involved in processes related to lipid and energy metabolism described in mammary tissue of ruminants, e.g., fatty acid β-oxidation65, PPAR/RXR activation66 and triacylglycerol biosynthesis67. This is of interest because manipulation of those TFs networks either directly (e.g., inhibiting or activating one or several TFs in the network) or indirectly (e.g., through selection) can lead to changes in milk FA synthesis. E2F3, one of the key regulators, is a cell cycle-related factor whose function in non-ruminants is associated with the regulation of adipocyte differentiation68 and lipid metabolism69. This TF exhibited co-association with 487 target genes playing a role in regulating lipid metabolism/transport and glucose metabolism (e.g., DGKB, SCD5, ARNTL2, HMGCR, DAGLA, PLA2G4A, ACOXL and LPCAT2). For instance, the concentrations of trans-FA and CLA isomers in milk have been positively associated to SCD5 expression in bovine mammary gland70; however, its substrate affinity and role in the mammary tissue are still unclear71. The second predicted key TF was BACH2, which was co-associated with 250 genes. The BACH2 functions as transcriptional repressor at the immune system level in human72; however, this gene has also been associated with intramuscular fat content in cattle73. Predicted target genes included lipid-related genes, such as INSIG1, ABCG1, RORA, HMGA2, ESR1 and ACSL3. The insulin-induced gene 1 (INSIG1) regulates the responsiveness of SREBP1 and 2 processing via SCAP, thereby altering the rate of lipogenesis22. Indeed, changes in dietary lipid composition affect the expression of INSIG1 in bovine mammary gland74. Furthermore, SNPs on INSIG1 were significantly associated with differences in UFAs, SFAs and desaturation indices in bovine milk11. Acyl-CoA synthetase long-chain family member 3 (ACSL3) plays a role in the conversion of free LCFAs into fatty acyl-CoA esters controlling lipid biosynthesis and FA oxidation. Previous works have identified SNPs on ACSL3, that were associated with bovine milk FA composition75,76. The third predicted TF was KDM5A, which exhibited co-association with 269 target genes. The KDM5A functions as histone demethylases and contributes to co-repression of gene transcription. The KDM5A target genes include a large set of cell cycle regulators and genes involved in lipid metabolism, such as LPIN1, ACACA, CHPT1, DGKG, GNPAT, PLD1 and ADCY3. The LPIN1 functions as a nuclear transcriptional co-activator to regulate the channeling of FAs toward milk triglyceride synthesis in the bovine mammary gland77, whereas ACACA catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, which is the rate-limiting reaction in FA biosynthesis78. Polymorphisms on LPIN1 and ACACA have been recently associated to the bovine milk FA profile14,16.

These TF identified as being responsive to the milk FA profile might represent new targets for more detailed functional studies. However, it is worth mentioning that no direct evidence for E2F3, KDM5A and BACH2 genes expression in bovine mammary gland is currently available79; therefore, investigation of their expression level in bovine mammary epithelial cells is needed to gain support for their potential role as regulators of milk FA profile at molecular level. On the other hand, in cow these TFs are expressed in tissues related to nervous system, i.e. hypothalamus (E2F3, KDM5A and BACH3) and brain (E2F3)79; it is well known that molecular pathways controlling FA metabolism are highly interconnected and therefore we might assume a putative relationship with FA even if these genes are not expressed in the mammary gland.

Current limitations for a more complete systems biology approach to milk FA regulation in dairy cows include a lack of additional “omics” data (e.g., metabolomics, proteomics, microRNAomics, epigenomics) and the incomplete genome annotation. Further, functional pathways available in livestock animals are still limited and most of the information is derived from human and animal models studies28. Consequently, the identification of competitive functional pathways is often challenging. Despite these limitations, our results show that genetic variants in the E2F3, BACH2 and KDM5A genes probably play a key role in modulating bovine milk fat profile. Interestingly, these genes and many other lipid-related genes were not identified by standard GWAS approach, due to the stringent P-value threshold. The use of GWAS analysis coupled to network analysis allowed for a more holistic study of the multi-faceted control of fatty acid composition in milk. Even if some of our findings do not fully agree with previous GWAS studies in terms of (re)discovery of well-known candidate genes for bovine milk FA profile, the presented results are substantially coherent with biological processes and cellular functions implicated in the regulation of lipid metabolism in bovine mammary gland. Such information provides the basis for more detailed functional studies at the level of transcription factors and the subsequent effects on transcriptional networks. The new knowledge of the genetic control of milk FA composition could help in the development of selection strategies aimed at improving the quality of milk for human consumption. In particular, the information inferred from the present study might be incorporated as biological prior into prediction models such as BayesRC80 and GFBLUP81 to shed more light into the genetic basis of complex traits such as milk FA profile and to improve the accuracy of genomic prediction. On the other hand, the potential limitation of this approach relies on the need to underpin the actual role of the predicted gene regulators in the bovine mammary gland. Supporting evidences might be obtained by using co-expression analyses82, aiming to validate the predicted gene-gene interactions and to shed more light on the biological pathways driving variations in milk fat profile. It is likely that establishing a functional rationale underlying the importance of allelic variation and candidate genes for milk FA composition will become a major component of following up the genes emerging from GWAS.

Methods

Ethics statement

The cows included in this study belonged to commercial private herds and were not subjected to any invasive procedures. Milk and blood samples were previously collected during the routine milk recording coordinated by technicians working at the Breeder Association of Trento Province (Italy) and therefore authorized by a local authority.

Animals, phenotypes and genotypes

Milk samples from 1,264 Italian Brown Swiss cows belonging to 85 herds located in Trento province in north-eastern Italy were collected once during the evening milking, as described in83. Herds were selected to represent different environments and dairy farming systems including feeding regimens (e.g. different amount and type of forage, amount of compound feed or type of ration). Details about dairy farming systems including animals feeding conditions are reported in84. Immediately after milking, the milk samples (without preservative) were refrigerated at 4 °C and transferred to the laboratory.

Analysis of the milk fat composition has been previously described9. In short, fat percentage was determined in individual milk samples using a MilkoScan FT6000 (Foss, Hillerød, Denmark) and analysis of the FA profile was performed using gas chromatography (GC) with 65 milk FA traits included (47 individual FAs, 13 FA groups and 5 desaturation indices).

The Illumina BovineSNP50 v.2 BeadChip (Illumina Inc., San Diego, CA) was used to genotype 1,152 cows (blood samples were not available for all phenotyped animals). Quality control excluded markers that did not satisfy the following criteria: (1) call rate > 95%, (2) minor allele frequency > 0.5%, and (3) no extreme deviation from Hardy-Weinberg proportions (P > 0.001, Bonferroni corrected). After this filtering step, 1,011 cows and 37,568 SNP were retained.

Genome-wide association study

A single marker regression model was fitted for GWAS using the GenABEL package85 in the R environment and the GRAMMAR-GC (Genome wide Association using Mixed Model and Regression - Genomic Control) approach with the default function gamma 86. The GRAMMAR-GC procedure consists of 3 steps. Firstly, an additive polygenic model with a genomic relationship matrix is fitted. The polygenic model was:

$${\bf{y}}={\bf{X}}\beta +{\bf{a}}+{\bf{e}},$$
(1)

where y is a vector of milk FA traits; β is a vector with fixed effects of days in milk (classes of 30 days each), parity of the cow (classes 1, 2, 3, ≥4) and herd-date (n = 85); X is the incidence matrix that associates each observation to specific levels of the factors in β. The two random terms in the model were animal and the residuals, which were assumed to be normally distributed as \({\boldsymbol{a}}\, \sim N(0,{\bf{G}}{\sigma }_{g}^{2})\) and \({\boldsymbol{e}}\, \sim N(0,{\bf{I}}{\sigma }_{e}^{2})\), where G is the genomic relationship matrix, I is an identity matrix, and \({\sigma }_{g}^{2}\) and \({\sigma }_{e}^{2}\) are the additive genomic and residual variances, respectively. The G matrix was constructed in the GenABEL R package, where for a given pair of individuals i and j, the identical by state coefficients (\({f}_{i,j}\)) is calculated as:

$${f}_{i,j}=\frac{1}{N}{\sum }_{k}\frac{({x}_{i,k}-{p}_{k})\,\times \,({x}_{j,k}-{p}_{k})}{{p}_{k}\,\times (1-{p}_{k})}$$
(2)

where N is the number of markers used, \({x}_{i,k}\) is the genotype of the ith individual at the kth SNP (coded as 0, ½ and 1), \({p}_{k}\) is the frequency of the “+” allele and k = 1, …, N.

In the second step of GRAMMAR-GC, the residuals obtained in (1) are regressed on the SNP (single marker regression) to test for associations. In the last step, the Genomic Control (GC) approach corrects for conservativeness of the GRAMMAR procedure and estimates of the marker effects are obtained87. A P-value threshold of 5 × \({10}^{-5}\) was adopted to determine significant associations88. Manhattan plots were drawn using the R package qqman89. The variance explained by each SNP was calculated as 2pqa2, where p is the frequency of one allele, q = 1-p is the frequency of the second allele and a is the estimated additive genetic effect. A scan for genes around 1Mbp upstream-downstream from the significant SNP was performed using the Ensembl Bos taurus UMD3.1 database (http://www.ensembl.org/index.html).

Model (1) was also used to estimate variance components and the genomic heritability of the traits based on the genomic relationship matrix. Heritability was estimated as:

$${h}^{2}=\frac{{\sigma }_{g}^{2}}{{\sigma }_{g}^{2}+{\sigma }_{e}^{2}}$$
(3)

The proportion of the phenotypic variance explained by the SNPs included in the AWM was estimated using GenABEL and the previously described model. Firstly, a G matrix based only on the SNP s included in the AWM was constructed. Secondly, the same number of randomly selected SNPs was used to build 10,000 G matrices (10,000 replicates) and estimate the proportion of phenotypic variance explained by these randomly selected SNPs.

The r-squared statistic was chosen to predict the extent of LD using the R package LDheatmap90.

Conserved non-coding elements were explored using mVISTA, a web tool for comparative genomic analysis (http://genome.lbl.gov/vista/index.shtml), which allows sequences from multiple species to be compared and visualized with annotation information.

Pathway analyses

Pathway analyses were performed to identify the biological mechanisms contributing to the milk fat profile, as previously detailed29. Briefly, the SNPs were divided into two categories “non-relevant” and “relevant” based on a nominal P-values < 0.05. By using a less stringent significance threshold (respect to the GWAS study), we aimed to capture the effect of less significant SNP which however can contribute to explain the variability for the investigated traits, possibly as part of organized pathways and/or biological processes. Then, the relevant SNPs were assigned to a gene if they were located within the gene or within a flanking region 15 kb up- and downstream it91, using the BiomaRt R package92,93 and the Ensembl Bos taurus UMD3.1 assembly as reference94. For functional annotation, the Kyoto Encyclopaedia of Genes and Genomes (KEGG)95 and the Gene Ontology (GO)96 databases were used to define pathways and functional categories associated to the gene sets. Only GO and KEGG terms with >10 and <1000 genes were included in the analyses to avoid testing broad or narrow functional categories. For each functional category, a Fisher’s exact test was applied to test for overrepresentation of significant gene sets. A false discovery rate (FDR) correction was used to control for false positives with the cut-off for significant enrichments set at FDR < 0.05. The gene-set enrichment analysis was performed using the Bioconductor package goseq in the R environment31.

SNP co-association and network analyses

Along with the biological pathway analysis, SNP co-association and network analyses were carried out in order to detect key gene regulators, functional connections and networks of genes affecting the milk fat profile.

Given that many of the original annotations for the BovineSNP50 v.2 BeadChip (Illumina Inc., San Diego, CA) have been found to be incomplete, the Illumina BovineSNP50 v.3 Genotyping Beadchip annotation (available since June 2016 at http://www.illumina.com/products/by-type/microarray-kits/bovine-snp50.html) was used to re-assign the probes to new probe sets based on SNP position.

The AWM was built starting from the results of a GWAS analysis carried out without imposing a significance threshold. In particular, the AWM was constructed from two matrices that contained row-wise SNPs and column-wise phenotypes, as previously reported in detail30,97. Elements in the first matrix were equal to the P value of association for each SNP and phenotype, while in the second matrix corresponded to the SNP z-score standardised additive effect. Based on the results of the pathway analyses, which showed that the 12:0 FA was associated with the greatest number of overrepresented categories/pathways, the 12:0 FA was selected as the key phenotype and the associated SNPs (P ≤ 0.05) were included in the AWM. In the next step, dependency among phenotypes was explored by estimating the average number of other phenotypes (Ap) that were associated with these SNPs at the same P-value threshold (≤0.05) (Ap = 8). Subsequently, all SNPs that were associated with at least 8 phenotypes at P ≤ 0.05 were included in the AWM. In the next steps, the AWM was built following the procedure described by30, but only SNPs within genes or located close to intergenic SNPs (within 10 kb of the coding region) were selected. A distance of 10 kb was chosen because the probable size of the promoter region of a given gene is heterogeneous. In addition, to identify putative regulators, the TFs reported by33 and the microRNAs (miRNAs) that were mapped to the UMD3.1 bovine genome assembly (GenBank assembly accession: GCA_000003055.3) were also included in this analysis. The Pearson correlations obtained from pair-wise correlations of columns were computed and visualised as a clustering tree using the hclust R function98. The table-like structure of the AWM was used as input to the information theory (PCIT) algorithm, which uses a partial correlation in an information theory framework to ascertain significant gene–gene interactions (co-associations)99. To consider only the high-confidence gene co-associations determined by PCIT, those with correlation ≥|0.80| were retained, on the assumption that those genes have relevant biological significance for the key phenotype from which the AWM-PCIT was derived. The co-association network was automatically laid out using the organic layout algorithm in Cytoscape V2.7 (http://cytoscape.org). Network topological parameters and node centrality values were calculated using NetworkAnalyzer100 and CentiScape plugins101 to gain insights into the organisation and structure of the complex networks formed by the interacting molecules. In parallel, the list of co-associated genes was also fed into an Ingenuity Pathway Analysis (IPA, version 5.5; Ingenuity Systems, USA) to identify relevant categories of molecular functions, cellular components and biological processes. The IPA enabled us to identify (i) significantly overrepresented functional GO annotations, (ii) their over- or under-expression, and (iii) group-specific transcriptional networks. All listed or reconstructed cellular pathways were derived from the Ingenuity Knowledge Base which collects biological interactions and functional annotations derived from various experimental contexts and manually curated for accuracy from the literature and third-party databases. The IPA output a statistical assessment (based on a Fisher’s exact test) of the significance of representation for biological functions and signaling pathways (P-value < 0.05). The IPA computed networks and ranked them according to a statistical likelihood approach.

Once the TFs and their target genes to which they were potentially connected were identified in the AWM-derived network, an information lossless approach102 was used to identify the optimal subset of TFs spanning the majority of the network topology. Pathway and ontology analyses of the predicted target genes (co-associated with the best TF trio) were carried out using the Cytoscape plugin ClueGo103. The Benjamini & Hochberg correction for multiple testing was used with the cut-off for significant enrichment set at FDR < 0.05. The LASAGNA-Search 2.0 web tool104 was used to search for TFs binding site using matrices in the TRANSFAC public database and a P-value significance threshold of 0.001.