Introduction

Bovine milk is an important human dietary component, serving as an important delivery medium for proteins, minerals, vitamins and lipids including fatty acids and cholesterol (CHL). Milk fat is one of the principal contributors to daily dietary CHL intake for humans1. Milk CHL content is highly variable between species, breeds and herds and is influenced by many factors including genetics and nutrition2,3. Previously, we demonstrated that genetic factors contributed 10 to 18% of the total phenotypic variation in milk CHL content4.

High concentrations of total or low-density lipoprotein CHL (LDL-CHL) in human blood are linked to risk of cardiovascular diseases (CVD)5,6,7,8,9,10. Consequently, numerous genome wide association studies (GWAS) have been devoted to mapping genomic regions and variants affecting total CHL, LDL-CHL, high density lipoprotein CHL (HDL-CHL) and triglyceride11,12,13,14. In total, 126 GWAS have been performed on CHL related phenotypes in humans and animal model species (https://www.ebi.ac.uk/gwas/search?query=cholesterol, accessed on 09th January, 2018). Although mechanisms regulating CHL have been intensively studied in humans15,16,17,18, few studies have been devoted to the genetics of CHL in livestock species. In cows, several gene expression/proteomics studies have reported genes with potential involvement in milk CHL concentration/metabolism19,20,21,22,23,24,25,26,27 but their actual roles and associated SNPs with CHL content in milk have not been investigated. For instance, Mani et al.28 identified ATP-binding cassette sub-family A member 1 (ABCA1) and ATP-binding cassette sub-family G member 1 (ABCG1) proteins in milk fat globule membranes and suggested their potential involvement in CHL exchange between mammary epithelial cells and alveolar milk. Using cell culture studies, Ontsouka et al.21 indicated that CHL transport in mammary epithelial cells was mediated by APOA-1/ABCA1 and ABCG1/HDL dependent pathways. Studying the response of CHL metabolism to negative energy balance induced by feed restriction, Gross et al.27 observed that CHL metabolism was influenced by nutrient and energy deficiency according to stage of lactation in dairy cows. Together, these studies19,20,21,22,23,24,25,26,27 suggest modulatory roles of cow’s genetics, physiological stage and diet on the expression of genes involved in CHL synthesis. However, the specific roles of the various genes and their sequence variants in regulating CHL synthesis and content in bovine milk have not been studied and no GWAS has been performed for milk CHL content. This study aimed to identify associated single nucleotide polymorphisms (SNPs), candidate genes and biological pathways involved in the regulation of milk CHL content via GWAS and pathway enrichment. Moreover, mRNA sequence data of mammary gland tissues from 12 cows were used to verify that the candidate genes identified by GWAS are expressed in the mammary gland.

Results

SNPs associated with milk cholesterol

Two and 20 SNPs were significantly associated with CHL_fat and CHL_milk, respectively at the genome wide significant threshold p < 5E-05 (Table 1, Fig. 1); while 19 and 36 SNPs (7 in common) were suggestively associated (p < 5E-04) with CHL_fat and CHL_milk, respectably (Table S1). The quantile-quantile (q-q) plot showed no systematic deviation from the diagonal (Y = X) indicating that the data were corrected for population stratification (Fig. S1). BTB-01524761 (rs42640895) and ARS-BFGL-NGS-4939 (rs109421300) were the most significantly associated SNPs with CHL_fat (p = 2.61E-05) and CHL_milk (p = 6.70E-19), respectively. Two significant SNPs for CHL_fat are located in an intergenic region of bovine chromosome (BTA) 17. The majority of significant SNPs (16 out of 20) for CHL_milk are located within a region of 1.4 to 3.3 Mb of BTA 14. Four LD blocks were detected in this region (Fig. 2) and one of the LD blocks also contained the most significant SNP (ARS-BFGL-NGS-4939 [rs109421300]) for CHL_milk. Other significant SNPs for CHL_milk are located on BTA 6, 15, 17 and 18. Several of the significant SNPs for CHL_milk are located in gene regions (seven within introns and two within exons) (Table 1). Three genes (relaxin–insulin-like family peptide receptor 1 (RXFP1), transmembrane protein 144 (TMEM144) and family with sequence similarity 198, member B (FAM198B)) are located in 0.5 Mb flanking regions to significant SNPs for CHL_fat. Genes including diacylglycerol O-Acyltransferase 1 (DGAT1), rhophilin-1 (RHPN1), cysteine and histidine rich 1 (CYHR1), ENSBTAG00000003606, vacuolar protein sorting 28 (VPS28), two pore segment channel 1 (TPCN1), cadherin 13 (CDH13), ENSBTAG00000045727 and MAF1 homolog, negative regulator of RNA polymerase III (MAF1) contained significant SNPs for CHL_milk (Table 1).

Table 1 Genome-wide significant SNPs for milk cholesterol content.
Figure 1
figure 1

Manhattan plot of genome-wide significant (p < 5E-05) and suggestive (p < 5E-04) SNP associations for milk cholesterol content in Canadian Holstein cows. The most significant SNPs with their corresponding p-values are indicated. CHL_fat: mg of cholesterol in 100 gram of fat, CHL_milk: mg of cholesterol in 100 gram of milk.

Figure 2
figure 2

Linkage disequilibrium (LD) pattern on a 1.4–3.4 Mb region of BTA 14. LD blocks are marked with triangles; values in boxes are LD (squared correlation coefficient, r2) between SNP pairs; red boxes indicate LOD > 2 and D′ = 1 (LOD is the log of the likelihood odds ratio, a measure of confidence in the value of D′, where D′ is the ratio of the linkage disequilibrium coefficient D to its maximum possible).

Gene ontology, pathways and transcription factor enrichments of positional candidate genes

A total of 207 and 320 genes (positional candidate genes) (58 in common, Table S1) annotated at 0.5 Mb flanking regions of 21 and 56 SNPs (significant and suggestive) for CHL_fat and CHL_milk, respectively (Table S1), were used as input for GO and pathways enrichment. A total of 59 and 112 GO terms were enriched for CHL_fat and CHL_milk positional candidate genes, respectively (Table S2). For CHL_fat, negative regulation of cyclin-dependent protein kinase activity (p = 0.001), basolateral plasma membrane (p = 0.007) and cyclin-dependent protein kinase regulator activity (p = 1.10E-04) were the most significant biological processes, cellular component and molecular function GO terms, respectively, enriched for positional candidate genes (Table 2). Meanwhile, cardiac muscle tissue development (p = 1.10E-04), anchored to membrane (p = 0.001) and interleukin-2 receptor binding (p = 8.60E-05) were the most significant biological processes, cellular component and molecular function GO  terms, respectively, enriched for CHL_milk positional candidate genes (Table 3). In addition, 5 KEGG pathways (neuroactive ligand-receptor interaction, focal adhesion, leukocyte transendothelial migration, tight junction and basal cell carcinoma) and 2 (glycerolipid metabolism and Notch signaling) were enriched for CHL_fat and CHL_milk positional candidate genes, respectively (Tables 2 and 3). The potential interactions between the positional candidate genes for CHL_fat and CHL_milk are shown in Figs 3 and 4, respectively. PRL10, GHRH, CALCB and RXFP1 interacted highly with other genes for CHL_fat (Fig. 3) while MAPK15, FAM83H, ARHGAP39, HEATR7A, CYHR1 and CPSF1 were among highly interacting genes in the CHL_milk protein interaction network (Fig. 4). Moreover, a total of 20 and 16 transcription factors were enriched for positional candidate genes for CHL_fat and CHL_milk, respectively (Table 4). The most enriched transcription factors for CHL_fat were CREB1 (p = 0.002), PPARD (p = 0.004) and CEBPB (p = 0.005) and for CHL_milk were LXR (p = 1E-11), DACH1 (p = 1E-07) and SMC4 (p = 1.19E-07).

Table 2 Gene ontology and pathways enriched for positional candidate genes of CHL_fata.
Table 3 Gene ontology and pathways enriched for potential candidate genes of CHL_milka.
Figure 3
figure 3

Protein-protein interaction network created using the STRING database for CHL_fat positional candidate genes. Network analysis was set at medium confidence (STRING score = 0.4). The line widths represent the level of interactions (wider lines represent stronger evidence of interactions). CHL_fat: mg of cholesterol in 100 gram of fat.

Figure 4
figure 4

Protein-protein interaction network created using the STRING database for CHL_milk positional candidate genes. Network analysis was set at medium confidence (STRING score = 0.4). Line widths represent the level of interactions (wider lines represent stronger evidence of interactions). CHL_milk: mg of cholesterol in 100 gram of milk.

Table 4 Significantly enriched transcription factors for positional candidate genes for CHL_fat and CHL_milk.

Pearson correlation of candidate gene expression (read counts) in mammary gland tissues with milk cholesterol content

Examination of RNA sequence data (read counts) of mammary gland tissues from 12 cows at mid lactation (day 120–180) indicated that among 207 positional candidate genes for CHL_fat, 35 genes were not expressed, 25 genes were very lowly expressed (each with total read counts <10), while 12 genes (TMEM120B, INSIG1, FLNB, RPN2, RASAL1, ARF4, MYL9, GRN, ORAI1, PLBD2, AQP1 and RSRC2) were highly expressed (each with total read counts >10,000) (Table S4a,b). Out of 320 genes for CHL_milk, 70 genes were not expressed, 36 genes were lowly expressed (each with total read counts <10), while 19 genes were highly expressed (each with total read counts >10,000) (Table S4a,c). LGB, RPL8, RPS19, EEF1D, ITGB1 and HNRNPF were the most highly expressed genes among the CHL_milk positional candidate genes. Moreover, the expression of 45 out of 207 CHL_fat and 72 out of 320 CHL_milk positional candidate genes was significantly correlated with CHL_fat and CHL_milk, respectively (Tables 5 and 6). The expression of genes including EPB41L1, DET1, DTX1, ABHD6, RSRC2, ITGA2B, MLXIP, KCTD6 and DLGAP4 was strongly and significantly correlated (|cor| > 0.8 and p < 0.01) to CHL_fat (Table 5). Moreover, the expressions of 28 genes were strongly and significantly correlated (|cor| > 0.8 and p < 0.01) with CHL_milk (Table 6) including ENSBTAG00000048096 and TONSL, as the two most significantly correlated (|cor| > 0.9 and p < 0.001) to CHL_milk.

Table 5 Positional candidate genes for milk cholesterol which are expressed in mammary gland tissues and also significantly correlated to cholesterol concentration in milk fat (CHL_fat)a of the same cows.
Table 6 Positional candidate genes for milk cholesterol which are expressed in mammary gland tissues and also significantly correlated to cholesterol concentration in milk (CHL_milk)a of the same cows.

Discussion

It is known that most cow milk CHL (about 80%) is derived from blood whereas a small portion (about 20%) is derived through local synthesis in the mammary gland29. Therefore, the regulation of milk CHL content may require complex mechanisms and the involvement of many genes and pathways. Recently, we reported heritability estimates for CHL_fat (0.09) and CHL_milk (0.18) suggesting that genetics contributes a proportion of the total phenotypic variances in milk CHL content4.

More SNPs (20) were significantly associated with CHL_milk as compared to two for CHL_fat at the genome wide significant threshold (p < 5E-05). Furthermore, 36 and 19 SNPs including 7 in common were suggestively associated (p < 5E-04) with CHL_milk and CHL_fat, respectively. In fact, 58 genes are located in 0.5 Mb flanking regions of 7 suggestively (p < 5E-04) associated SNPs (ARS-BFGL-NGS-110646 [rs109154988], ARS-USMARC-Parent-DQ786763-rs29020472 [rs29020472], BTB-01524761 [rs42640895], BTB-01712106 [rs42829960], Hapmap40322-BTA-100742 [rs41600454], Hapmap43002-BTA-63541 [rs41586803], and Hapmap52830-rs29014800 [rs29014800]) for CHL_milk and CHL_fat. Some of the genes have been reported to have potential roles in CHL metabolism such as protein tyrosine phosphatase 1β (PTPN1), diacylglycerol kinase eta (DGKH) and serine dehydratase (SDS). PTPN1 is an important gene for plasma total and HDL-CHL30,31,32,33 while DGKH encodes an enzyme responsible for the recycling and degradation of diacylglycerol, known as important for CHL efflux from adipose cells34. SDS gene on the other hand is known to contain a susceptibility loci for low HDL-CHL levels35. The most important QTL region for CHL_fat at 41.9 Mb of BTA 17 contained two significant SNPs (Hapmap40322-BTA-100742 [rs41600454] and BTB-01524761 [rs42640895]) for the trait. Relaxin–insulin-like family peptide receptor 1 (RXFP1), transmembrane protein 144 (TMEM144) and family with sequence similarity 198, member B (FAM198B) genes are positional candidate genes for CHL_fat, however, none of them has been reported to have a direct role in the regulation of CHL metabolism. RXFP1, one of four relaxin receptors, is known to play a role in signal transduction between extracellular/intracellular domains36. The activation of RXFP1 receptor stimulates the phosphorylation of mitogen-activated protein kinases such as ERK1/236. In fact, the phosphorylation of ERK1/2 is important for the regulation of CHL efflux37. RXFP1 is also among genes with more levels of interactions with other CHL-fat candidate genes, as shown by the interaction network (Fig. 3). However, RXFP1 was very lowly expressed in mammary gland tissues (Table S4) so its involvement with CHL_fat concentration might be through its activities in other tissues. The involvement of FAM198B and TMEM144 genes in CHL metabolism might be via their roles in the membrane, since TMEM144 is a carbohydrate transmembrane transporter while FAM198B play roles in golgi membrane functions. In fact, FAM198B was expressed in mammary gland tissues and also significantly correlated to CHL_fat concentration (Tables 5 and S4b), so its role in CHL synthesis in the mammary gland warrants further investigation.

An intergenic region of BTA 17, position 63 Mb, is another interesting region harboring two suggestive SNPs (ARS-BFGL-NGS-64029 [rs110842600] (p = 1.91E-04) and Hapmap52830-rs29014800 [rs29014800] (p = 5.80E-05)) for CHL_fat and CHL_milk, respectively (Table S1a,b). Among many genes (PLBD2, SDS, RITA1, PTPN11, DTX1, RASAL1, LHX5, CFAP73, IQCD, DDX54, OAS2, TPCN1, SLC8B1, SDSL and RPH3A) located within 0.5 Mb flanking regions of these two SNPs, protein tyrosine phosphatase 1β (PTPN1) has been directly linked to CHL metabolism30,31,32,33 and it has been identified as a candidate gene for both CHL_fat and CHL_milk in this study. Variants of PTPN11 have been found to associate with serum CHL level in a sex-specific pattern in human30 while Lu et al.32 identified PTPN11 as a candidate gene for human plasma HDL-CHL. In the mammary gland, PTPN11 gene was moderately expressed and had tendency (p = 0.067) of being correlated to CHL_fat concentration (Table S4b), therefore more studies are required to validate its role in CHL metabolism.

The QTL region at 117.7 Mb of BTA 4 harboring suggestive SNP ARS-BFGL-NGS-20980 (rs110814823) (p = 4.26E-04) for CHL_fat also harbors several important genes of CHL metabolism such as 5-hydroxytryptamine (serotonin) receptor 5 A (HTR5A)38,39 and insulin induced gene 1 (INSIG1)40,41. INSIG1 was the second most highly expressed gene among CHL_fat positional candidate genes in the mammary gland (Table S4), whereas HTR5A was not expressed in the mammary gland. However, the expression of INSIG1 gene in the mammary gland was not significantly correlated to CHL_fat concentration. It was shown recently that downregulation of INSIG1 gene in mammary gland tissues of lactating dairy cows following dietary supplementation with 5% linseed oil was predicted by Ingenuity Pathways Analysis software (Invitrogen, Carlsbad, CA, USA) to activate CHL concentration in the mammary gland42. Two flanking genes (disintegrin and metalloproteinase domain-containing protein 11 [ADAM11] and hexamethylene bisacetamide inducible 1 [HEXIM1]) of suggestive SNP ARS-BFGL-NGS-24479 (rs41916457) (p = 3.90E-04) at 45.1 Mb region of Bta 19 (Table S1a) have been reported to be involved in CHL metabolism43,44,45. However, the expression of both ADAM11 and HEXIM1 genes was not significantly correlated to CHL_fat concentration in this study.

The enrichment analyses identified several GO terms with protein kinase regulator activities including negative regulation of cyclin-dependent protein kinase activity (p = 0.001, most significant biological process GO term) and cyclin-dependent protein kinase regulator activity (p = 1E-04, most significant molecular function GO term). In fact, cyclin-dependent protein kinase has been identified as a key regulator of eukaryotic cell cycle46, and it might be linked to CHL metabolism via its role in the regulation of energy status47,48 or lipid metabolism in the liver49. Regulation of CHL homeostasis and CHL metabolism is associated with plasma membrane activities50,51. Enrichment results suggest a potential role of the (basolateral) plasma membrane in the regulation of CHL_fat. The plasma membrane was the GO term enriched with the largest number of positional candidate genes for CHL_fat while basolateral plasma membrane was the most significantly enriched cell component GO term for CHL_fat candidate genes (Tables 2 and S2a). Meanwhile, cell-cell signaling (p = 0.001) and cell communication (p = 0.004) (Table 2) were among the most significant biological processes GO terms for CHL_fat suggesting that the regulation of CHL_fat probably requires the interaction and shared signaling activities between different cell types. Among the five KEGG pathways significantly enriched for CHL_fat positional candidate genes, the tight junction pathway has important roles in the transportation of milk constituents in mammary gland cells52,53, therefore it might also function in the transportation of CHL from the blood stream into the mammary gland or from mammary gland cells (de novo synthesized) into milk. Focal adhesion is an important pathway for immune functions in bovine mammary cells54, for lactation involution55 and for epigenetic regulation of milk production56. The focal adhesion kinase protein has been found in bovine milk fat globule membrane which is the major store of CHL in milk57, therefore focal adhesion pathway might be important for milk CHL via its role in the milk fat globule. Many significant transcription factors enriched for CHL_fat positional candidate genes have multiple functions. For example, c-Myc (MYC) is essential for the regulation of cell cycle progression, apoptosis and cellular transformation58,59 while peroxisome proliferator activated receptor delta (PPARD) is important for the regulation of the transcription of genes associated with proliferation, metabolism, inflammation, and immunity60. In fact, PPARD is an important transcription factor regulating CHL metabolism since it plays important roles in the reverse CHL transport61.

For CHL_milk, the most significant SNP (ARS-BFGL-NGS-4939 [rs109421300]) is located in an intronic region of diacylglycerol O-acyltransferase 1 (DGAT1) gene at 1,801,116 bp on BTA 14. This SNP has been reported to be in complete linkage disequilibrium with the K232A substitution within the DGAT1 gene in German cows62. This SNP is also important for milk fat62 and fatty acid components63. Moreover, we also reported high LD among SNPs within and around the DGAT1 gene region (Fig. 2). Another significantly associated SNP for CHL_milk (ARS-BFGL-NGS-18365 or rs110892754) has been found to be important for 305 day milk fat yield64. The DGAT1 gene and the centromeric region of BTA 14 is important for the regulation of milk traits (milk fat yield, fat%, protein yield and protein%)62,64,65,66,67,68,69. DGAT1 is a key enzyme in triacylglycerol biosynthesis and also play important roles in the regulation of CHL metabolism70,71,72. In ApoE gene knock-out mice, DGAT1 deficiency decreases CHL uptake and absorption71. Therefore, the significant SNPs detected for CHL content in this study suggests that the DGAT1 gene and the centromeric region of BTA 14 might be important in the regulation of milk CHL content. In fact, the expression of DGAT1 gene in mammary gland tissues was also significantly correlated to CHL_milk concentration (p = 0.011) (Table 6), suggesting that DAGT1 might contribute to the regulation of CHL_milk metabolism in the mammary gland.

A significant SNP (ARS-BFGL-NGS-41837 or rs110597360) for CHL_milk on BTA 6 is located in an intergenic region and the nearest gene to this SNP is ENSBTAG00000001751, an orthologue of human CXXC finger protein 4 (CXXC4) gene. CXXC4 encodes a CXXC-type zinc finger domain-containing protein that functions as an antagonist of the canonical wingless/integrated signaling pathway73,74. The role of this novel gene in CHL_milk is unknown. On BTA 15, Hapmap38637-BTA-88156 (rs41596665) was significantly associated with CHL_milk and its flanking gene, mastermind like transcriptional coactivator 2 (MAML2) encodes for a member of the mastermind-like family of proteins which play important roles in the Notch signaling pathway75. In fact, the Notch signaling pathway was one of the pathways enriched for CHL_milk positional candidate genes in this study and it has been shown to have important roles in mammary gland development76. The Notch signaling pathway is important in the regulation of cell fate, cell proliferation and cell death in development77; however, there is no report of its direct role in milk CHL metabolism. On BTA 17, Hapmap52830-rs29014800 (rs29014800) was significantly associated with CHL_milk (p = 1.58E-05) and also suggestively associated with CHL_fat (Tables 1 and S1a), therefore this SNP might be important in the regulation of milk CHL content. On BTA 18, Hapmap39330-BTA-42256 (rs41605812), located in an intronic region of cadherin 13 (CDH13) gene (Table 1), is important for CHL_milk. A SNP within CDH13 has been reported to be associated with plasma adiponectin levels in Japanese population78 and with triglyceride/high density lipoprotein ratio in Korean cardiovascular patients79. This gene is moderately expressed in the bovine mammary gland and also showed a trend (p = 0.075) to correlate to CHL_milk concentration (Table S4c). However, the role of this gene in milk CHL metabolism remains to be characterized.

The enrichment results for positional candidate genes showed several GO terms related to heart development (Table 3) which might reflect the fact that many candidate genes for CHL also play roles in cardiovascular disease development or heart diseases. An interesting molecular function GO term enriched was interleukin-2 receptor binding. It is known that interleukin-2 gene plays important roles in the activation of STAT5a gene in mammary gland development80. Glycerolipid metabolism, another enriched pathway has been implicated in the biosynthesis of CHL81,82. Therefore, interleukin-2 receptor binding (GO term) and glycerolipid metabolism pathway might also play important roles in bovine milk CHL metabolism. Interestingly, the most important transcription factor enriched for CHL_milk candidate genes was liver X receptor (LXR) (p = 1.00E-11) which is an important regulator of CHL, fatty acid, and glucose homeostasis83,84,85. There are two LXR subtypes (LXRα and LXRβ) and LXRα, the dominant subtype is highly expressed in the liver and other tissues (intestine, adipose, kidney, and adrenals)86 while LXRβ is widely expressed in different tissues86. In our mammary gland RNA expression data, LXRβ (or NR1H2 gene) was also expressed at a higher level when compared to LXRα (or NR1H3 gene). In the liver, LXRα expression was not significantly correlated to CHL_milk during transition and early lactation20. Another notable transcription factor enriched for CHL_milk positional candidate genes was notch homolog 1 (NOTCH1) (p = 0.028) (Table 4), which indicates the importance of NOCTH signaling pathway in milk CHL regulation. The functions of highly interacted genes (MAPK15, FAM83H, ARHGAP39, HEATR7A, CYHR1 and CPSF1) in CHL_milk protein interaction network (Fig. 4), as well as highly significantly correlated genes (ENSBTAG00000048096, TONSL and ITGB1) (Table 6) in CHL metabolism are unknown and warrant further investigation.

The genetic variants identified in this study may facilitate selection in commercial breeding schemes either by incorporation in marker-enhanced selection or via implementation of genomic prediction including these identified genetic variants in a customized SNP panel. However, it is also important to consider potential limitations of our study including the limited size of resource population for GWAS, the relaxed p-value threshold used to select SNPs for gene set enrichments, potential for false discovery errors for certain enriched gene ontologies and pathways with few enriched genes in the gene list. The results should be interpreted with caution since both the results of associations (GWAS) and correlations derived from RNA sequence data may not reflect actual causative relationships. As already mentioned above, most CHL in milk is derived from the diet (which is partly reflected as CHL concentration in the blood) while only a small proportion, about 20%, is synthesized de novo in the mammary gland. Therefore, association analysis considering data on both blood and milk CHL concentrations might enhance knowledge of the implicated candidate genes in the regulatory pathways of milk CHL concentration such as dietary CHL transport from blood to the mammary gland and de novo synthesis in the mammary gland. Moreover, integration of gene expression data from the mammary gland and other tissues like the liver could identify the link between the mechanisms regulating CHL in the mammary gland and other tissues, and how these connections influence de novo synthesis of CHL in the mammary gland and milk CHL concentration.

To the best of our knowledge, this is the first GWAS on bovine milk CHL. The strongest SNP associations with milk CHL were detected on BTA14 and BTA17. This study identified several candidate genes (DGAT1, PTPN1, INSIG1, HEXIM1, SDS, and HTR5A), also important for human plasma CHL and related traits, that might be important for bovine milk CHL. Novel candidate genes (RXFP1, FAM198B, TMEM144, CXXC4, MAML2 and CDH13) for milk CHL content were identified. Enrichment analyses suggested the involvement of important gene ontology terms ((basolateral) plasma membrane and cell-cell signaling processes), pathways (tight junction, focal adhesion, Notch signaling and glycerolipid metabolism pathways), and several transcription factors (PPARD, LXR and NOTCH1) in the regulation of bovine milk CHL content. The expression of some positional candidate genes in the mammary gland and their correlation with milk CHL content was supported with RNA sequencing data and milk CHL concentrations from the same animals. This study has therefore provided an insight into the genomics of bovine milk CHL and identified potential candidate genes and pathways that might be further studied to identify/confirm casual mutations that might help in the selection of cows with desired milk CHL content.

Materials and Methods

Animal Resource and Cholesterol Measure

Animal selection and milk sampling has been described in our previous study4. In brief, 100 ml of milk from each of 1,848 cows from 29 herds (minimum: 33 cows/herd and maximum: 172 cows/herd) were used. The concentration of CHL in milk fat was determined by direct saponification and capillary gas chromatography according to Fletouris et al.87. About 0.2 mg milk fat was saponified in capped tubes with 0.5 M methanolic KOH solution by heating for 15 minutes and the unsaponifiable fraction was extracted with toluene and analyzed by capillary gas chromatography using Agilent HP 6890 Series Gas Chromatography (GC) System (Agilent Technologies, California, USA). The concentration C (mg/100 g of fat) of CHL (CHL_fat) in analyzed samples was calculated based on computed mass (nanograms) of the analyte in the injected extract. The concentration of CHL was expressed in mg/100 g of fat (CHL_fat) or mg/100 g of milk (CHL_milk). After editing data for cow registration number, dam and sire information, test date, parity and age at calving, a total of 1,793 cows with complete records were retained for further analysis.

Genotyping and Genotype Quality Control

DNA was isolated from hair follicles of 1,200 (out of 1,848) cows and genotyped using the Illumina BovineSNP50K BeadChip following manufacturer’s instructions (Illumina Inc., San Diego, CA). Genotype quality control was implemented by discarding animals and SNPs with call rate <0.95 and SNPs deviating from Hardy Weinberg equilibrium (p < 0.0001). Missing genotypes were imputed with FImpute 2 software88 and subsequently SNPs with MAF <0.05 were excluded. After quality control, 40,196 SNPs and 1,183 animals were retained for the association analyses.

Association Analyses

The association analyses were performed using a univariate single SNP mixed linear model implemented in DMU package89. In summary, the model for each SNP (analyzed individually) was as follows (model 1):

$$y=1\mu +XB+Za+mg+e$$
(1)

where y is the vector of phenotype (CHL_fat, CHL_milk), 1 is a vector of 1s with length equal to number of observations, μ is the general mean, X is an incidence matrix relating phenotypes to the corresponding fixed effects, and B is the vector for fixed effects which includes interaction between herd and parity and days in milk (DIM), Z is an incidence matrix relating phenotypes to the corresponding random polygenic effect, a is a vector of the random polygenic effect N(0, Aσu2) (where A is the additive relationship matrix and σu2 is the polygenic variance), m is a vector with genotypic indicators 2, 1, or 0 for genotypes AA, AB and BB, respectively associating records to the marker effect, g is a scalar of the associated additive effect of the SNP, and e is a vector of random environmental deviates: N(0, σe2) (where σe2 is the general error variance). The parameters of the model σ2u and σ2e were estimated using restricted maximum likelihood (REML) for each SNP. To determine the significantly associated SNPs, an F-test was used to test the null hypothesis H0: β = 0. Distribution of test statistics was assessed by quantile-quantile (q-q) plot generated from association tests and the deviation from the null hypothesis of no SNP association with the trait. The markers with p nominal < 5E-05 were considered genome wide significant90 and markers with p nominal from 5E-05 to 5E-04 were considered suggestively genome wide significant to avoid many false negative results caused by stringent Bonferroni correction.

Detection of Linkage Disequilibrium Blocks

Since several significant SNPs may be clustered in the same region (QTL region), we performed Linkage Disequilibrium (LD) analysis to characterize Linkage Disequilibrium patterns (LD block) for these regions. The LD block was defined according to Gabriel et al.91 and was detected and visualized with Haploview software92. Gabriel et al.91 defined a LD block as a region within which 95% of SNP pairs show strong LD (strong LD is defined if the one-sided upper 95% confidence bound on D′ is >0.98 and the lower bound is above 0.7). Before constructing LD block, we excluded SNPs with call rate <0.95, SNPs deviating from Hardy Weinberg equilibrium (p < 0.0001) and SNPs with MAF <0.05 and Mendelian inheritance errors >1. During LD construction, pairwise comparisons of markers >500 kb apart were ignored according to default settings in the Haploview software.

Gene Mapping, Pathways and Transcription Factor Enrichment

We selected both significant and suggestive SNPs for pathway analyses because assignment of genes using only genome wise significant SNPs may ignore potentially important SNPs with lower significant levels, consequently missing out on key putative candidates and associated pathways. Nearby genes within a flanking distance of 0.5 Mb from significant and suggestive SNPs were queried from Ensemble database (Ensembl 83, Bos taurus UMD3.1), using bedtools93. Genes were submitted to the Database for Annotation, Visualization, and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/) for KEGG pathways and Gene Ontology (GO) enrichment analyses94 while STRING v10.595 database was used to assess protein-protein interactions. The human genome was selected as background for enrichment instead of the bovine genome in order to take advantage of a richer database of information on the genomics of human CHL. Annotated pathways and GO terms were tested for enrichment using Fisher exact test. Pathways/GO terms were declared significantly enriched if they did not appear by chance with p < 0.0594. For STRING95 enrichment, the default options were used with the network edge selected based on confidence level. The minimum confidence threshold was set-up at the medium level with score of 0.4. In addition, a comprehensive gene set enrichment analysis for transcriptional machinery using ChIP-X enrichment analysis (ChEA2015)96 was performed with Enrichr (http://amp.pharm.mssm.edu/Enrichr/)97. The transcription factors were declared significantly enriched at p < 0.05.

Evaluation of Expression of Positional Candidate Genes Using Mammary Gland RNA-Seq Data

The RNA-Seq expression data of 12 cows used is a subset of the data from our previous study42. Cows were in mid lactation (day 120–180) and fed the control ration (Table S4a). The expression of positional candidate genes for milk CHL as read count (reads per kilo base per million mapped reads (RPKM)) is shown in Table S4b. The CHL content in milk obtained from the 12 cows on the same day that mammary gland biopsies where obtained for RNA-Seq was determined using the same methods described above87. The Pearson correlations of CHL content with the RPKM values of positional candidate genes were calculated using cor() function in R program98. The candidate genes were considered significantly correlated to milk CHL content at p < 0.05.

The care of animals and use procedures were according to the Canadian Council on Animal Care99 and were approved by the Animal Care and Ethics Committee of Agriculture and Agri-Food Canada.