SNP co-association and network analyses identify E2F3, KDM5A and BACH2 as key regulators of the bovine milk fatty acid profile

The fatty acid (FA) profile has a considerable impact on the nutritional and technological quality of milk and dairy products. The molecular mechanism underlying the regulation of fat metabolism in bovine mammary gland have been not completely elucidated. We conducted genome-wide association studies (GWAS) across 65 milk FAs and fat percentage in 1,152 Brown Swiss cows. In total, we identified 175 significant single nucleotide polymorphism (SNPs) spanning all chromosomes. Pathway analyses revealed that 12:0 was associated with the greatest number of overrepresented categories/pathways (e.g. mitogen-activated protein kinase (MAPK) activity and protein phosphorylation), suggesting that it might play an important biological role in controlling milk fat composition. An Associated Weight Matrix approach based on SNP co-associations predicted a network of 791 genes related to the milk FA profile, which were involved in several connected molecular pathways (e.g., MAPK, lipid metabolism and hormone signalling) and undetectable through standard GWAS. Analysis of transcription factors and their putative target genes within the network identified BACH2, E2F3 and KDM5A as key regulators of milk FA metabolism. These findings contribute to increasing knowledge of FA metabolism and mammary gland functionality in dairy cows and may be useful in developing targeted breeding practices to improve milk quality.

genes involved in lipid biosynthesis or metabolism have been shown to affect the milk FA profile in different bovine breeds [14][15][16] . Genome-wide association studies (GWAS) have also shown several genomic regions to be significantly associated to the milk FA profile [17][18][19] , supporting the finding that milk fat synthesis and secretion are coordinated by a complex network of interrelated genes. The availability of data on the genetic parameters of individual FAs and identification of polymorphisms affecting their contents provide useful information for developing breeding selection strategies aimed at obtaining a milk FA profile more beneficial to human health.
Studies are available which investigated gene expression changes occurring in the bovine mammary gland across lactation 20,21 , as well as focused on the expression of lipid-related genes 22,23 and on the discovery of genetic variants functionally implicated in the regulation of milk fat 24 . Therefore, several loci associated with milk FA composition have been identified and the knowledge about the role of genes driving milk fat synthesis in the bovine mammary gland has been significantly advanced. Despite that, the functional consequences of variants, including the complex molecular interplay of signal transduction, transcriptional, post-transcriptional and metabolic events underlying the regulation of milk fat synthesis and secretion in the bovine mammary gland, are still far to be extensively elucidated. A greater understanding of how genes or gene variants associated with milk FA composition exert their molecular effects could improve our knowledge of the physiological and cellular adaptations required for the synthesis and secretion of milk fat in ruminants, advance our understanding of tissue function beyond the well-known biochemical pathways and can improve the quality of milk and dairy products destined for human consumption.
The integration of GWAS analysis with gene-set enrichment and network analyses turned out to be a valid means to extrapolate additional biological information and investigate the functional relationships among sets of genes, that individually explain only a relatively small part of phenotypic variation and therefore cannot be identified by GWAS due to the stringent significance threshold 25 . These approaches have been recently used to explore bovine reproductive 26,27 and productive traits 28,29 , and milk fat composition (including 10 FA traits) 19 and have provided new insights into the biological pathways, complex gene interactions and key regulators affecting the investigated traits. Recently, an approach named Association Weight Matrix and based on SNP co-associations has been also proposed as a tool to exploit the results of GWAS and, combined with network inference algorithms, build gene networks to infer regulatory and functional interactions among genes 30 . Therefore, our hypothesis is that the milk FA associated alleles exert their effects by influencing transcription of the closest genes through multiple mechanisms. The aims of this study, therefore, were i) to perform a GWAS analysis to identify genomic regions associated to 65 FA traits and the fat percentage in bovine milk; ii) to characterize the regulatory mechanisms of the loci identified to understand the molecular and biological functions involved in the regulation of milk fat content and composition through pathway analysis; iii) to use an AWM approach to build gene networks for milk FA composition and to identify key transcription factors (TFs) regulating the synthesis and secretion of milk fat in dairy cows.  Tables 1 and 2. Heritability estimates varied from low (<0.10; e.g., 14:0 and 18:1c9) to moderate (up to 0.35; e.g., 16:0) with some individual FAs having values close to 0 (13:0, 22:0, 16:1t9, 17:1c9, 18:1t4, 18:2t11,c15, 18:3c9,t11,c15).

Results
The results of the GWAS analysis are summarized in Table 3 and Supplementary Table S1. A total of 175 significant SNPs (P < 5E-05) were identified across all Bos taurus autosomes (BTAs). Three SNPs had unknown positions on the genome. Fifty-seven FA traits and the milk fat percentage exhibited significant signals, some of which were shared among the traits. The P-values ranged from 4.94E-05 to 1.36E-09. Around 50% of significant SNPs were associated to more than 1 trait. The highest signals were detected on BTA26 (~22.98 Mbp) and on BTA8 (~3.66 Mbp). In particular, the marker ARS-BFGL-NGS-32553 located at 22,977,848 bp on BTA26 was significantly associated with the 14:1 index (P = 1.36E-09). Other strong signals associated with the 14:1 index were detected at 22,951,431 bp and at 22,446,047 bp, which corresponded to markers ARS-BFGL-NGS-39823 and BTB-00933928 (P = 1.95E-08 and P = 8.07E-08, respectively). These two markers are in high linkage disequilibrium with ARS-BFGL-NGS-32553 (r2 = 0.89 and 0.70, respectively). A total of 4 regions (i.e. windows of consecutive SNPs at ≤ 1 Mb distance interval), that were detected on BTA26, were associated with 8 traits. Region 26a corresponded to only 1 SNP (~9.87 Mbp), which was associated to both 10:1 and 14:1 indices (Table 3). Regions 26b, 26c and 26d contained multiple associated SNPs (Table 3 and Fig. 1

Pathway analyses.
In the pathway analyses, for each trait 850 significant SNPs (P < 0.05) were on average assigned to 700 genes, which were mined using the Bioconductor package goseq 31 to reveal the biological processes connected with the milk FA synthesis and metabolism in the bovine mammary gland. Significantly enriched GO terms and KEGG pathways (q < 0.05) were found for the profile of 20 out of the 65 FA traits in milk (Fig. 2 Table S2). In particular, the genes associated with 12:0 presented the highest number of overrepresented categories/pathways, suggesting a large degree of genetic control for this FA (Fig. 2a). For instance, we observed a high association between the 12:0 content in milk and the regulation of mitogen-activated protein kinase (MAPK) activity, e.g., MAPK cascade (q = 7.73E-06), MAPK signalling pathway (q = 0.00046), positive regulation of extracellular signal-regulated kinase (ERK) 1 and ERK2 cascade (q = 0.00017), and positive regulation of protein phosphorylation (q = 8.70E-06; Fig. 2a). The intermediate filament together with the intermediate filament cytoskeleton appeared to be substantially enriched by 18:1c12 (q = 3.99E-06 and q = 8.34E-06, respectively), while vesicle-mediated transport was generally enhanced by 18:2t11,c15-related genes (q = 1.57E-05) (Fig. 2a).
Similar to GO terms, KEGG analysis indicated that the 12:0-related genes were involved in the GnRH signalling pathway, MAPK signalling, type I diabetes mellitus and inflammation processes such as graft-versus-host disease and systemic lupus erythematosus (Fig. 2b). The same analysis indicated that the systemic lupus erythematosus pathway was also enriched for 18:1t9, 18:1t10, TFA and TFA18:1. The oocyte meiosis pathway was ranked at the top of the KEGG list of the most impacted functions for SFA (q = 0.00017; Fig. 2b), whereas the ErbB signalling pathway and bacterial invasion of epithelial cells was clearly induced by 18:3c9,c12,c15-related genes. Basal transcription factors appeared to be greatly impacted by the 16:1 index-related genes (q = 0.00020; Fig. 2b).
AWM matrix construction and gene networks. We retained 15,277 annotated SNPs out of the 37,568 SNPs for AWM matrix construction analyses. After applying a series of filtering steps (see Material and Methods), 1,575 SNPs, corresponding to 1,575 unique genes, were used to build the AWM matrix. Hierarchical clustering of traits evidenced 2 clusters which best described the FA profiles according to similarities in their origins or in their metabolic processes, e.g., alpha-linolenic acid (18:3c9,c12,c15) clustered with the n3 FAs, and 6:0 with the SCFAs (Supplementary Fig. S1). Correlations across AWM rows were used to predict gene associations and build a network in which each node represented a gene and each edge a significant interaction. In total, 27,050 significant edges between 1,575 nodes were identified based on the Partial Correlation coefficient with Information Theory (PCIT) algorithm. The SNPs detected with the AWM approach explained 67% of the phenotypic variance for 12:0, which was significantly larger (P < 0.001) than the variance explained by the same number of randomly selected SNPs (10,000 replicates) (Fig. 3). After applying the gene network reduction strategy (filtering for sparse correlations ≥ |0.80|), we obtained a network with 1,628 significant interactions and 791 nodes (Fig. 4). The node degree followed an approximate power law distribution (R 2 = 0.933; y = −1605x379), which suggests that the reduced network was scale-free and that a few gene nodes acted as hubs with a large number of links to other gene nodes. Analysis of other network topological parameters, e.g., node closeness, eccentricity and    Table 4 summarizes the 10 most highly-connected nodes based on significant interactions (node degree) and Supplementary Table S3 shows their positions in close proximity to the QTLs related to milk fat or milk FAs deposited in the Cattle QTL database 32 . As the SNP marker Hapmap54104-rs29010930 was located ~ 2 kb up-stream of CNNM1 (at ~20.3 Mb on BTA26), we investigated whether this SNP affected the transcriptional regulation of CNNM1 by comparing genomic regions up-stream of CNNM1 across different species using mVISTA. The results revealed the presence of a cluster of conserved non-coding elements (CNEs) surrounding Hapmap54104-rs29010930 ( Supplementary  Fig. S2). We used IPA to investigate the co-enriched functions/pathways and other biological features within the network. Enriched pathways included MAPK related pathways (e.g. "ERK/MPK signaling", P = 8.32E-03; "GnRH signaling", P = 1.55E-06), "Phosphatidylglycerol Biosynthesis II" (P = 6.76E-03); pathways related to lipid metabolism such as "Fatty acid activation"(P = 9.12E-03), "Adipogenesis pathway"(P = 1.91E-02), "Triacylglycerol Biosynthesis" (P = 8.51E-03), "Triacylglycerol degradation" (P = 3.02E-02), "PPAR signalling" (P = 1.38E-03) and "CDP-diacylglycerol Biosynthesis I" (P = 4.47E-03); pathways related to hormone signalling, e.g., "Pregnenolone Biosynthesis" (P = 1.91E-02). The top IPA computed networks showed that the genes in the networks were associated with "Cell Morphology, Cellular Function and Maintenance, Carbohydrate Metabolism" and "Immunological Disease, Inflammatory Disease, Inflammatory Response". The full list of enriched pathways and computed networks is reported in Supplementary Table S4.
In parallel, we have also generated a supplementary TF network by investigating the potential impact of the combination of one or more TFs on the expression of the other genes in the network with the lowest redundancy. Hence, we explored 383,306 possible combinations of TF trios among the 1,977 TFs identified by 33 . The analysis allowed us to identify BTB domain and CNC homolog 2 (BACH2), E2F transcription factor 3 (E2F)3, and lysine demethylase 5 A (KDM5A) TFs, which controlled the transcription of 877 unique genes in our network (~56% of the genes in the AWM matrix) (Fig. 5) and which might play a pivotal role in orchestrating adaptations of the FA metabolism in the mammary gland. Functional analyses of these 877 target genes by ClueGo revealed the metabolic pathways to be the most highly impacted, of which lipid and carbohydrate metabolism were the most important. Among lipid metabolism, glycerophospholipid metabolism and sphingolipid metabolism were the most highly induced, followed by the MAPK activity-related pathways (e.g., the MAPK signalling pathway) and the GnRH signalling pathway. We also found overrepresentation of pathways related to reproduction (e.g., progesterone-mediated oocyte maturation) as well as the glutamatergic synapse and oxytocin signalling pathway. The full list of significantly enriched pathways and GO terms is presented in Supplementary Table S5.
In addition, to analyze in depth the TFs potentially controlling the network, we examined the putative binding sites within the promoter region of these TFs using the LASAGNA tool. Interestingly, the promoter region of  Table 3. Summary results of the genome-wide association analysis for milk fatty acid traits. 1 Table S6).

Discussion
GWAS results. We have reported here GWAS results for the profiles of 65 FA traits and the fat percentage in Brown Swiss cows' milk, including also lesser-studied FAs and/or those present in small concentrations. Genomic heritabilities exhibited a medium-high relationship with the genetic heritabilities estimated using a standard animal model (r = 0.70). Similarly, we found a moderate agreement between AWM column-wise correlations and the genetic correlations among FAs (r = 0.70) 9 . Several GWAS studies are available for bovine milk FA profiles in Holstein and Jersey populations [17][18][19]34,35 . However, only the main or most representative individual FAs or FA groups have generally been investigated. We found some correspondence with previous results, e.g., the presence  35 . In particular, the region on BTA26 covers the known candidate gene SCD1 (located at ~21.14 Mbp), confirming its notable influence on 14:1 index and consequently on 14:1c9. Indeed, although mammary SCD1 may act on several substrates (i.e. 14:0; 16:0, 17:0; 18:0; 18:1t11), 14:1 index has been considered the best proxy for SCD activity in the mammary gland, being 14:0 almost exclusively produced via de novo synthesis in the mammary gland 36 . Accordingly, the strongest association in the present study was found for 14:1 index (ARS-BFGL-NGS-32553, P = 1.36E-09) at 22.98 Mbp on BTA26, which is 1.84 Mb away from SCD1. Interestingly, we identified CNEs surrounding the marker Hapmap54104-rs29010930, which was located on BTA26 at 20.20 Mb, ~2 kb upstream of CNNM1, one of the top nodes in the reduced network. This marker was also detected by standard GWAS analysis and significantly associated to 14:0 (P = 2.16E-05; Supplementary Table S1). Because CNEs are now known to contribute 4.8% to 9.5% of the variability in the genome 37 , SNPs within this region may have important functional consequences on FA metabolism or synthesis in mammary gland. Indeed, recent   In the network, every node represents a gene, whereas every edge connecting two nodes represents a significant interaction (correlation value ≥|0.80|). The nodes shape indicates whether the node is a transcription factor (triangles), a miRNA (hexagon), a metabolite (round rectangle), a membrane receptor (rectangle), a transporter (parallelogram), or other type of genes (ellipses). Information of molecule type was obtained using Ingenuity Pathway Analysis (IPA; version 5.5, Ingenuity Systems, USA). The list of identified TF was updated with that reported by 33 . The node colour represents the biological function of the gene, according to IPA. The edge colour intensity indicates the level of the association: red = positive correlation -and blue = negative correlation between two nodes. The width of the edge indicates the value of the correlation; a wider edge corresponds to a higher correlation in absolute value. findings suggest that CNEs are potentially involved in gene regulation, often encoding for enhancer elements, which may also be located far from their target gene 38,39 . Support for this hypothesis might be gathered by the results of GWAS analysis obtained after conditioning for Hapmap54104-rs29010930, which evidenced a large drop in h2 for 14:0 (from 0.075 to 0.017). The window of consecutive SNPs significantly associated with 14:1 index and 14:1c9 in region 26b is very close to SORBS1 (located at ~16.72-16.92 Mbp), which is involved in the regulation of insulin signalling in human adipose tissue 40 Table 4. Description of the ten most connected nodes in the co-association network*. 1 Ap: associated phenotypes. *Network represented in Fig. 4 which was obtained after filtering the complete PCIT-gene network for sparse correlation ≥|0.80|.

Figure 5.
Activators and repressors of the regulatory network of genes associated with the bovine milk fatty acid profile. In the network, the best trio of transcription factors is showed: E2F3, KDM5A and BACH2. The nodes shape indicates whether the node is a transcription factor (triangles), a miRNA (hexagon), a metabolite (round rectangle), amembrane receptor (rectangle), a transporter (parallelogram), or other type of genes (ellipses). The node colour represents the biological function of the gene according to Ingenuity Pathway Analysis (IPA) annotation. The list of identified TF was updated with that reported by 33 . The edge colour intensity indicates the level of the association: red = positive correlation -and blue = negative correlation between two nodes.
of bovine SCD according to its pro-lipogenic role 41 . The ruminant mammary gland does not have a total requirement for insulin to activate lipogenesis but the lipogenic role of insulin signaling in the bovine mammary gland might acquire a biological meaning with the advancing of lactation and the increase in insulin sensitivity 42 . The SORBS1 gene has also been previously reported as a potential candidate gene associated with bovine milk FA profile 35 44 Table S1).
On the other hand, we also found some of our results to be inconsistent with previous studies. The most important differences were found on BTA14, where DGAT1 (located at ~1,8Mbp) is recognised as influencing the profile of several FA traits, including RA, 14:1, 16:1, 16:0, 18:1c9, 14:1 index, 16:1 index, 18:1 index 18:2c9,c12, 18:3c9,c12,c15 18,25 and milk fat 19,45 . In the present study, we also found a significant association with the milk fat but at a different position (~26.00 Mbp). Other associations on BTA14 were with 18:1t4 (~6.91 Mbp), 11:0 (~17.38 Mbp), 14:0 iso (~40.79 and 57.15 Mbp) and 15:0 iso (~40.79 and 58.85 Mbp). These differences can be attributed to the fact that DGAT1 is nearly monomorphic in the Italian Brown Swiss 12,46 , therefore the observed associations might reflect the effect of other multiple genes. For instance, Hapmap25446-BTC-054694 was located ~0.3 Mbp from CYP7A1 which catalyses a rate-limiting step in cholesterol catabolism 47 . Furthermore, a polymorphism on this gene has been shown to influence plasma lipids in humans 48 .
Our GWAS study did not detected SNPs associated to some of the well-known candidate genes for milk FA profile (e. g. FASN, MGST1). On the other hand, some other previously reported candidate genes were confirmed (e.g. ORL1, GNPAT, GH). GWAS power depends on sample size, effect size, causal allele frequency, and marker allele frequency and its correlation with the causal variant 49 . Some divergences with previous results might be attributed to several factors such as differences in the populations studied (breed, number of animals, environment and management conditions, physiological and metabolic factors, structure of the linkage disequilibrium among the genetic markers), and differences in the statistical model used for GWAS analysis and/or in the marker densities.
Pathway and network analyses. Pathway and network analyses from GWAS data were performed to find both candidate regulatory sites and the most likely functional variants. We identified 12:0-related genes as uncovering the greatest number of pathways/GO terms, suggesting that 12:0 may play a key biological role in the control of milk fat composition. It is worth mentioning that the palm kernel oil, which is a rich source of 12:0, can be used as supplement in the feed industry. However, in Italy palm kernel oil is not regularly adopted as indirectly suggested by recent studies carried out in Northern Italy 50,51 . Concentrates might be integrated with lipids to increase the energy content of the diets but generally soybean, linseed and sunflower (as whole seeds, expellers or oil) are used.
The genes related to 12:0 were mostly involved in MAPK activity, ERK1 and ERK2 cascade and protein phosphorylation. The MAPK pathway has been found to be responsive to ApoA-1/ABCA1 activity in bovine mammary epithelial cells in vitro 52 . Further, the presence of ABCA1 and ABCG1 has been detected in mammary epithelial cells and in the membrane of milk fat globules which suggest that these proteins might be involved in cholesterol exchange between mammary epithelial cells and milk 53 . In the ruminants, only a small part of cholesterol in milk seemed to be synthetised in the mammary gland while it is mainly derived from the blood uptake 54 . Genes related to the cholesterol synthesis are induced in the bovine liver after parturition 55 , suggesting that high levels of cholesterol are delivered by lipoproteins to the mammary gland 56 . Therefore, we might speculate that the significance of the association between 12:0 and alleles related to genes involved in MAPK pathway could be likely related to role of cholesterol in the fat globule membrane. Finally, capric and lauric acid have been also shown to have anti-bacterial and anti-inflammatory activities in vitro and in mouse model 57,58 . The anti-inflammatory effect seemed to be partially mediated by the inhibition of NF-κB activation and phosphorylation of MAPK 58 , suggesting a putative link between the immune signalling pathways and the mammary cells' anti-inflammatory response. Activation of the GnRH pathways suggests a link between lauric acid, MAPK activity and cholesterol synthesis and release. The MAPK are also acknowledged as being involved in the transcriptional activation of a wide variety of genes regulating the biosynthesis and secretion of the gonadotropins, e.g., the luteinizing hormone (LH) and the follicle-stimulating hormone (FSH), both related to sex steroid hormone synthesis, follicle growth and oocyte maturation 59 . Cholesterol is also the precursor for the biosynthesis of steroid hormones, including sex steroids and corticosteroids 60 . The enrichment of pathways related to reproduction, e.g. oocyte meiosis, has been also previously associated to the milk profile of 12:0 in Danish Jersey 19 . All these novel putative biological roles for 12:0 in the bovine mammary gland are summarized in Fig. 6.
Pathway analyses of the set of genes included in the regulatory network further confirmed the relevance of alleles involved in MAPK signalling, cholesterol biosynthesis, hormonal signalling and reproduction. Overrepresentation of the "Oxytocin signalling pathway" may be explained by oxytocin (OXT) effects in the regulation of milk secretion from lactating mammary gland in mammals including ruminants 61 . Oxytocin was suggested to have direct effects on the mammary epithelium in mammals, in that it stimulates milk lipid secretion, although the mechanisms are still not clear 62,63 . Finally, OXT also plays an important role in many reproduction-related functions in female ruminant; accordingly, steroid hormones seemed to act as (indirect) regulators of the OXT-receptor signalling 64 .

Identification of transcription factors within the regulatory network. Mining for transcription
factors and their putative target genes is an important component of the system biology approach and allows transcriptional networks, which may have important regulatory functions for FA metabolism and synthesis, to be uncovered. The AWM approach allowed inferring gene interactions and computing networks based on the co-association patterns across phenotypes, using the PCIT algorithm which is based on partial correlations and information criteria. On the other hand, IPA provided relevant networks and up-stream regulators analysis based on the information deposited in the expert-annotated Ingenuity Knowledge Base. Integrating these approaches allowed to formulate molecular mechanistic hypotheses and identify upstream regulators likely responsible for phenotypic or functional outcomes.
The TF network analyses revealed that the best trio of TFs (E2F3, KDM5A and BACH2) presented co-association with a large number of genes involved in processes related to lipid and energy metabolism described in mammary tissue of ruminants, e.g., fatty acid β-oxidation 65 , PPAR/RXR activation 66 and triacylglycerol biosynthesis 67 . This is of interest because manipulation of those TFs networks either directly (e.g., inhibiting or activating one or several TFs in the network) or indirectly (e.g., through selection) can lead to changes in milk FA synthesis. E2F3, one of the key regulators, is a cell cycle-related factor whose function in non-ruminants is associated with the regulation of adipocyte differentiation 68 and lipid metabolism 69 . This TF exhibited co-association with 487 target genes playing a role in regulating lipid metabolism/transport and glucose metabolism (e.g., DGKB, SCD5, ARNTL2, HMGCR, DAGLA, PLA2G4A, ACOXL and LPCAT2). For instance, the concentrations of trans-FA and CLA isomers in milk have been positively associated to SCD5 expression in bovine mammary gland 70 ; however, its substrate affinity and role in the mammary tissue are still unclear 71 . The second predicted key TF was BACH2, which was co-associated with 250 genes. The BACH2 functions as transcriptional repressor at the immune system level in human 72 ; however, this gene has also been associated with intramuscular fat content in cattle 73 . Predicted target genes included lipid-related genes, such as INSIG1, ABCG1, RORA, HMGA2, ESR1 and ACSL3. The insulin-induced gene 1 (INSIG1) regulates the responsiveness of SREBP1 and 2 processing via SCAP, thereby altering the rate of lipogenesis 22 . Indeed, changes in dietary lipid composition affect the expression of INSIG1 in bovine mammary gland 74 . Furthermore, SNPs on INSIG1 were significantly associated with differences in UFAs, SFAs and desaturation indices in bovine milk 11 . Acyl-CoA synthetase long-chain family member 3 (ACSL3) plays a role in the conversion of free LCFAs into fatty acyl-CoA esters controlling lipid biosynthesis and FA oxidation. Previous works have identified SNPs on ACSL3, that were associated with bovine milk FA composition 75,76 . The third predicted TF was KDM5A, which exhibited co-association with 269 target genes. The KDM5A functions as histone demethylases and contributes to co-repression of gene transcription. The KDM5A target genes include a large set of cell cycle regulators and genes involved in lipid metabolism, such as LPIN1, ACACA, CHPT1, DGKG, GNPAT, PLD1 and ADCY3. The LPIN1 functions as a nuclear transcriptional co-activator to regulate the channeling of FAs toward milk triglyceride synthesis in the bovine mammary gland 77 , whereas ACACA catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, which is the rate-limiting reaction in FA biosynthesis 78 . Polymorphisms on LPIN1 and ACACA have been recently associated to the bovine milk FA profile 14,16 .
These TF identified as being responsive to the milk FA profile might represent new targets for more detailed functional studies. However, it is worth mentioning that no direct evidence for E2F3, KDM5A and BACH2 genes expression in bovine mammary gland is currently available 79 ; therefore, investigation of their expression level in bovine mammary epithelial cells is needed to gain support for their potential role as regulators of milk FA profile at molecular level. On the other hand, in cow these TFs are expressed in tissues related to nervous system, i.e. hypothalamus (E2F3, KDM5A and BACH3) and brain (E2F3) 79 ; it is well known that molecular pathways controlling FA metabolism are highly interconnected and therefore we might assume a putative relationship with FA even if these genes are not expressed in the mammary gland.
Current limitations for a more complete systems biology approach to milk FA regulation in dairy cows include a lack of additional "omics" data (e.g., metabolomics, proteomics, microRNAomics, epigenomics) and the incomplete genome annotation. Further, functional pathways available in livestock animals are still limited and most of the information is derived from human and animal models studies 28 . Consequently, the identification of competitive functional pathways is often challenging. Despite these limitations, our results show that genetic variants in the E2F3, BACH2 and KDM5A genes probably play a key role in modulating bovine milk fat profile. Interestingly, these genes and many other lipid-related genes were not identified by standard GWAS approach, due to the stringent P-value threshold. The use of GWAS analysis coupled to network analysis allowed for a more holistic study of the multi-faceted control of fatty acid composition in milk. Even if some of our findings do not fully agree with previous GWAS studies in terms of (re)discovery of well-known candidate genes for bovine milk FA profile, the presented results are substantially coherent with biological processes and cellular functions implicated in the regulation of lipid metabolism in bovine mammary gland. Such information provides the basis for more detailed functional studies at the level of transcription factors and the subsequent effects on transcriptional networks. The new knowledge of the genetic control of milk FA composition could help in the development of selection strategies aimed at improving the quality of milk for human consumption. In particular, the information inferred from the present study might be incorporated as biological prior into prediction models such as BayesRC 80 and GFBLUP 81 to shed more light into the genetic basis of complex traits such as milk FA profile and to improve the accuracy of genomic prediction. On the other hand, the potential limitation of this approach relies on the need to underpin the actual role of the predicted gene regulators in the bovine mammary gland. Supporting evidences might be obtained by using co-expression analyses 82 , aiming to validate the predicted gene-gene interactions and to shed more light on the biological pathways driving variations in milk fat profile. It is likely that establishing a functional rationale underlying the importance of allelic variation and candidate genes for milk FA composition will become a major component of following up the genes emerging from GWAS.

Methods
Ethics statement. The cows included in this study belonged to commercial private herds and were not subjected to any invasive procedures. Milk and blood samples were previously collected during the routine milk recording coordinated by technicians working at the Breeder Association of Trento Province (Italy) and therefore authorized by a local authority.
Animals, phenotypes and genotypes. Milk samples from 1,264 Italian Brown Swiss cows belonging to 85 herds located in Trento province in north-eastern Italy were collected once during the evening milking, as described in 83 . Herds were selected to represent different environments and dairy farming systems including feeding regimens (e.g. different amount and type of forage, amount of compound feed or type of ration). Details about dairy farming systems including animals feeding conditions are reported in 84 . Immediately after milking, the milk samples (without preservative) were refrigerated at 4 °C and transferred to the laboratory.
Analysis of the milk fat composition has been previously described 9 . In short, fat percentage was determined in individual milk samples using a MilkoScan FT6000 (Foss, Hillerød, Denmark) and analysis of the FA profile was performed using gas chromatography (GC) with 65 milk FA traits included (47 individual FAs, 13 FA groups and 5 desaturation indices).

Genome-wide association study.
A single marker regression model was fitted for GWAS using the GenABEL package 85 in the R environment and the GRAMMAR-GC (Genome wide Association using Mixed Model and Regression -Genomic Control) approach with the default function gamma 86 . The GRAMMAR-GC procedure consists of 3 steps. Firstly, an additive polygenic model with a genomic relationship matrix is fitted. The polygenic model was: where y is a vector of milk FA traits; β is a vector with fixed effects of days in milk (classes of 30 days each), parity of the cow (classes 1, 2, 3, ≥4) and herd-date (n = 85); X is the incidence matrix that associates each observation to specific levels of the factors in β. The two random terms in the model were animal and the residuals, which were assumed to be normally distributed as a N G (0, ) , where G is the genomic relationship matrix, I is an identity matrix, and σ g 2 and σ e 2 are the additive genomic and residual variances, respectively. The G matrix was constructed in the GenABEL R package, where for a given pair of individuals i and j, the identical by state coefficients ( f i j , ) is calculated as: where N is the number of markers used, x i k , is the genotype of the i th individual at the k th SNP (coded as 0, ½ and 1), p k is the frequency of the "+" allele and k = 1, …, N.
In the second step of GRAMMAR-GC, the residuals obtained in (1) are regressed on the SNP (single marker regression) to test for associations. In the last step, the Genomic Control (GC) approach corrects for conservativeness of the GRAMMAR procedure and estimates of the marker effects are obtained 87 . A P-value threshold of 5 × − 10 5 was adopted to determine significant associations 88 . Manhattan plots were drawn using the R package qqman 89 . The variance explained by each SNP was calculated as 2pqa 2 , where p is the frequency of one allele, q = 1-p is the frequency of the second allele and a is the estimated additive genetic effect. A scan for genes around 1Mbp upstream-downstream from the significant SNP was performed using the Ensembl Bos taurus UMD3.1 database (http://www.ensembl.org/index.html).
Model (1) was also used to estimate variance components and the genomic heritability of the traits based on the genomic relationship matrix. Heritability was estimated as: The proportion of the phenotypic variance explained by the SNPs included in the AWM was estimated using GenABEL and the previously described model. Firstly, a G matrix based only on the SNP s included in the AWM was constructed. Secondly, the same number of randomly selected SNPs was used to build 10,000 G matrices (10,000 replicates) and estimate the proportion of phenotypic variance explained by these randomly selected SNPs.
The r-squared statistic was chosen to predict the extent of LD using the R package LDheatmap 90 . Conserved non-coding elements were explored using mVISTA, a web tool for comparative genomic analysis (http://genome.lbl.gov/vista/index.shtml), which allows sequences from multiple species to be compared and visualized with annotation information.
Pathway analyses. Pathway analyses were performed to identify the biological mechanisms contributing to the milk fat profile, as previously detailed 29 . Briefly, the SNPs were divided into two categories "non-relevant" and "relevant" based on a nominal P-values < 0.05. By using a less stringent significance threshold (respect to the GWAS study), we aimed to capture the effect of less significant SNP which however can contribute to explain the variability for the investigated traits, possibly as part of organized pathways and/or biological processes. Then, the relevant SNPs were assigned to a gene if they were located within the gene or within a flanking region 15 kb up-and downstream it 91 , using the BiomaRt R package 92,93 and the Ensembl Bos taurus UMD3.1 assembly as reference 94 . For functional annotation, the Kyoto Encyclopaedia of Genes and Genomes (KEGG) 95 and the Gene Ontology (GO) 96 databases were used to define pathways and functional categories associated to the gene sets. Only GO and KEGG terms with >10 and <1000 genes were included in the analyses to avoid testing broad or narrow functional categories. For each functional category, a Fisher's exact test was applied to test for overrepresentation of significant gene sets. A false discovery rate (FDR) correction was used to control for false positives with the cut-off for significant enrichments set at FDR < 0.05. The gene-set enrichment analysis was performed using the Bioconductor package goseq in the R environment 31 . SNP co-association and network analyses. Along with the biological pathway analysis, SNP co-association and network analyses were carried out in order to detect key gene regulators, functional connections and networks of genes affecting the milk fat profile.
Given that many of the original annotations for the BovineSNP50 v.2 BeadChip (Illumina Inc., San Diego, CA) have been found to be incomplete, the Illumina BovineSNP50 v.3 Genotyping Beadchip annotation (available since June 2016 at http://www.illumina.com/products/by-type/microarray-kits/bovine-snp50.html) was used to re-assign the probes to new probe sets based on SNP position.
The AWM was built starting from the results of a GWAS analysis carried out without imposing a significance threshold. In particular, the AWM was constructed from two matrices that contained row-wise SNPs and column-wise phenotypes, as previously reported in detail 30,97 . Elements in the first matrix were equal to the P value of association for each SNP and phenotype, while in the second matrix corresponded to the SNP z-score standardised additive effect. Based on the results of the pathway analyses, which showed that the 12:0 FA was associated with the greatest number of overrepresented categories/pathways, the 12:0 FA was selected as the key phenotype and the associated SNPs (P ≤ 0.05) were included in the AWM. In the next step, dependency among phenotypes was explored by estimating the average number of other phenotypes (Ap) that were associated with these SNPs at the same P-value threshold (≤0.05) (Ap = 8). Subsequently, all SNPs that were associated with at least 8 phenotypes at P ≤ 0.05 were included in the AWM. In the next steps, the AWM was built following the procedure described by 30 , but only SNPs within genes or located close to intergenic SNPs (within 10 kb of the coding region) were selected. A distance of 10 kb was chosen because the probable size of the promoter region of a given gene is heterogeneous. In addition, to identify putative regulators, the TFs reported by 33 and the microR-NAs (miRNAs) that were mapped to the UMD3.1 bovine genome assembly (GenBank assembly accession: GCA_000003055.3) were also included in this analysis. The Pearson correlations obtained from pair-wise correlations of columns were computed and visualised as a clustering tree using the hclust R function 98 . The table-like structure of the AWM was used as input to the information theory (PCIT) algorithm, which uses a partial correlation in an information theory framework to ascertain significant gene-gene interactions (co-associations) 99 . To consider only the high-confidence gene co-associations determined by PCIT, those with correlation ≥|0.80| were retained, on the assumption that those genes have relevant biological significance for the key phenotype from which the AWM-PCIT was derived. The co-association network was automatically laid out using the organic layout algorithm in Cytoscape V2.7 (http://cytoscape.org). Network topological parameters and node centrality values were calculated using NetworkAnalyzer 100 and CentiScape plugins 101 to gain insights into the organisation and structure of the complex networks formed by the interacting molecules. In parallel, the list of co-associated genes was also fed into an Ingenuity Pathway Analysis (IPA, version 5.5; Ingenuity Systems, USA) to identify relevant categories of molecular functions, cellular components and biological processes. The IPA enabled us to identify (i) significantly overrepresented functional GO annotations, (ii) their over-or under-expression, and (iii) group-specific transcriptional networks. All listed or reconstructed cellular pathways were derived from the Ingenuity Knowledge Base which collects biological interactions and functional annotations derived from various experimental contexts and manually curated for accuracy from the literature and third-party databases. The IPA output a statistical assessment (based on a Fisher's exact test) of the significance of representation for biological functions and signaling pathways (P-value < 0.05). The IPA computed networks and ranked them according to a statistical likelihood approach.
Once the TFs and their target genes to which they were potentially connected were identified in the AWM-derived network, an information lossless approach 102 was used to identify the optimal subset of TFs spanning the majority of the network topology. Pathway and ontology analyses of the predicted target genes (co-associated with the best TF trio) were carried out using the Cytoscape plugin ClueGo 103 . The Benjamini & Hochberg correction for multiple testing was used with the cut-off for significant enrichment set at FDR < 0.05. The LASAGNA-Search 2.0 web tool 104 was used to search for TFs binding site using matrices in the TRANSFAC public database and a P-value significance threshold of 0.001.