Genome wide association study identifies novel potential candidate genes for bovine milk cholesterol content

This study aimed to identify single nucleotide polymorphisms (SNPs) associated with milk cholesterol (CHL) content via a genome wide association study (GWAS). Milk CHL content was determined by gas chromatography and expressed as mg of CHL in 100 g of fat (CHL_fat) or in 100 mg of milk (CHL_milk). GWAS was performed with 1,183 cows and 40,196 SNPs using a univariate linear mixed model. Two and 20 SNPs were significantly associated with CHL_fat and CHL_milk, respectively. The important regions for CHL_fat and CHL_milk were at 41.9 Mb on chromosome (BTA) 17 and 1.6–3.2 Mb on BTA 14, respectively. DGAT1, PTPN1, INSIG1, HEXIM1, SDS, and HTR5A genes, also known to be associated with human plasma CHL phenotypes, were identified as potential candidate genes for bovine milk CHL. Additional new potential candidate genes for milk CHL were RXFP1, FAM198B, TMEM144, CXXC4, MAML2 and CDH13. Enrichment analyses suggested that identified candidate genes participated in cell-cell signaling processes and are key members in tight junction, focal adhesion, Notch signaling and glycerolipid metabolism pathways. Furthermore, identified transcription factors such as PPARD, LXR, and NOTCH1 might be important in the regulation of bovine milk CHL content. The expression of several positional candidate genes (such as DGAT1, INSIG1 and FAM198B) and their correlation with milk CHL content were further confirmed with RNA sequence data from mammary gland tissues. This is the first GWAS on bovine milk CHL. The identified markers and candidate genes need further validation in a larger cohort for use in the selection of cows with desired milk CHL content.


Discussion
It is known that most cow milk CHL (about 80%) is derived from blood whereas a small portion (about 20%) is derived through local synthesis in the mammary gland 29 . Therefore, the regulation of milk CHL content may require complex mechanisms and the involvement of many genes and pathways. Recently, we reported heritability estimates for CHL_fat (0.09) and CHL_milk (0.18) suggesting that genetics contributes a proportion of the total phenotypic variances in milk CHL content 4 . More SNPs (20) were significantly associated with CHL_milk as compared to two for CHL_fat at the genome wide significant threshold (p < 5E-05). Furthermore, 36 and 19 SNPs including 7 in common were suggestively associated (p < 5E-04) with CHL_milk and CHL_fat, respectively. In fact, 58 genes are located in 0. , and Hapm ap52830-rs29014800 [rs29014800]) for CHL_milk and CHL_fat. Some of the genes have been reported to have potential roles in CHL metabolism such as protein tyrosine phosphatase 1β (PTPN1), diacylglycerol kinase eta (DGKH) and serine dehydratase (SDS). PTPN1 is an important gene for plasma total and HDL-CHL 30-33 while DGKH encodes an enzyme responsible for the recycling and degradation of diacylglycerol, known as important for CHL efflux from adipose cells 34 . SDS gene on the other hand is known to contain a susceptibility loci for low HDL-CHL levels 35 . The most important QTL region for CHL_fat at 41.9 Mb of BTA 17 contained two significant SNPs (Hapmap40322-BTA-100742 [rs41600454] and BTB-01524761 [rs42640895]) for the trait. Relaxin-insulin-like family peptide receptor 1 (RXFP1), transmembrane protein 144 (TMEM144) and family with sequence similarity 198, member B (FAM198B) genes are positional candidate genes for CHL_fat, however, none of them has been reported to have a direct role in the regulation of CHL metabolism. RXFP1, one of four relaxin receptors, is known to play a role in signal transduction between extracellular/intracellular domains 36 . The activation of RXFP1 receptor stimulates the phosphorylation of mitogen-activated protein kinases such as ERK1/2 36 . In fact, the phosphorylation of ERK1/2 is important for the regulation of CHL efflux 37 . RXFP1 is also among genes with more levels of interactions with other CHL-fat candidate genes, as shown by the interaction network (Fig. 3). However, RXFP1 was very lowly expressed in mammary gland tissues (Table S4) so its involvement with CHL_fat concentration might be through its activities in other tissues. The involvement of FAM198B and TMEM144 genes in CHL metabolism might be via their roles in the membrane, since TMEM144 is a carbohydrate transmembrane transporter while FAM198B play roles in golgi membrane functions. In fact, FAM198B was expressed in mammary gland tissues and also significantly correlated to CHL_fat concentration (Tables 5 and  S4b), so its role in CHL synthesis in the mammary gland warrants further investigation.  (Table S1a,b). Among many genes (PLBD2, SDS, RITA1, PTPN11, DTX1, RASAL1, LHX5, CFAP73, IQCD, DDX54, OAS2, TPCN1, SLC8B1, SDSL and RPH3A) located within 0.5 Mb flanking regions of these two SNPs, protein tyrosine phosphatase 1β (PTPN1) has been directly linked to CHL metabolism [30][31][32][33] and it has been identified as a candidate gene for both CHL_fat and CHL_milk in this study. Variants of PTPN11 have been found to associate with serum CHL level in a sex-specific pattern in human 30 while Lu et al. 32 identified PTPN11 as a candidate gene for human plasma HDL-CHL. In the mammary gland, PTPN11 gene was moderately expressed and had tendency (p = 0.067) of being correlated to CHL_fat concentration (Table S4b), therefore more studies are required to validate its role in CHL metabolism.
The QTL region at 117.7 Mb of BTA 4 harboring suggestive SNP ARS-BFGL-NGS-20980 (rs110814823) (p = 4.26E-04) for CHL_fat also harbors several important genes of CHL metabolism such as 5-hydroxytryptamine (serotonin) receptor 5 A (HTR5A) 38,39 and insulin induced gene 1 (INSIG1) 40,41 . INSIG1 was the second most highly expressed gene among CHL_fat positional candidate genes in the mammary gland (Table S4), whereas HTR5A was not expressed in the mammary gland. However, the expression of INSIG1 gene in the mammary gland was not significantly correlated to CHL_fat concentration. It was shown recently that downregulation of INSIG1 gene in mammary gland tissues of lactating dairy cows following dietary supplementation with 5% linseed oil was predicted by Ingenuity Pathways Analysis software (Invitrogen, Carlsbad, CA, USA) to activate CHL concentration in the mammary gland 42 (Table S1a) have been reported to be involved in CHL metabolism [43][44][45] . However, the expression of both ADAM11 and HEXIM1 genes was not significantly correlated to CHL_fat concentration in this study.
The enrichment analyses identified several GO terms with protein kinase regulator activities including negative regulation of cyclin-dependent protein kinase activity (p = 0.001, most significant biological process GO term) and cyclin-dependent protein kinase regulator activity (p = 1E-04, most significant molecular function GO term). In fact, cyclin-dependent protein kinase has been identified as a key regulator of eukaryotic cell cycle 46 , and it might be linked to CHL metabolism via its role in the regulation of energy status 47,48 or lipid metabolism in the liver 49 . Regulation of CHL homeostasis and CHL metabolism is associated with plasma membrane activities 50,51 . Enrichment results suggest a potential role of the (basolateral) plasma membrane in the regulation of CHL_fat. The plasma membrane was the GO term enriched with the largest number of positional candidate genes for CHL_fat while basolateral plasma membrane was the most significantly enriched cell component GO term for CHL_fat candidate genes (Tables 2 and S2a). Meanwhile, cell-cell signaling (p = 0.001) and cell communication (p = 0.004) ( Table 2) were among the most significant biological processes GO terms for CHL_fat suggesting that the regulation of CHL_fat probably requires the interaction and shared signaling activities between different cell types. Among the five KEGG pathways significantly enriched for CHL_fat positional candidate genes, the tight junction pathway has important roles in the transportation of milk constituents in mammary gland cells 52,53 , therefore it might also function in the transportation of CHL from the blood stream into the mammary gland or from mammary gland cells (de novo synthesized) into milk. Focal adhesion is an important pathway for immune functions in bovine mammary cells 54 , for lactation involution 55 and for epigenetic regulation of milk production 56 . The focal adhesion kinase protein has been found in bovine milk fat globule membrane which is the major store of CHL in milk 57   regulation of the transcription of genes associated with proliferation, metabolism, inflammation, and immunity 60 . In fact, PPARD is an important transcription factor regulating CHL metabolism since it plays important roles in the reverse CHL transport 61 . For CHL_milk, the most significant SNP (ARS-BFGL-NGS-4939 [rs109421300]) is located in an intronic region of diacylglycerol O-acyltransferase 1 (DGAT1) gene at 1,801,116 bp on BTA 14. This SNP has been reported to be in complete linkage disequilibrium with the K232A substitution within the DGAT1 gene in German cows 62 . This SNP is also important for milk fat 62 and fatty acid components 63 . Moreover, we also reported high LD among SNPs within and around the DGAT1 gene region (Fig. 2). Another significantly associated SNP for CHL_milk (ARS-BFGL-NGS-18365 or rs110892754) has been found to be important for 305 day milk fat yield 64 . The DGAT1 gene and the centromeric region of BTA 14 is important for the regulation of milk traits (milk fat yield, fat%, protein yield and protein%) 62,64-69 . DGAT1 is a key enzyme in triacylglycerol biosynthesis and also play important roles in the regulation of CHL metabolism [70][71][72] . In ApoE gene knock-out mice, DGAT1 deficiency decreases CHL uptake and absorption 71 . Therefore, the significant SNPs detected for CHL content in this study suggests that the DGAT1 gene and the centromeric region of BTA 14 might be important in the regulation of milk CHL content. In fact, the expression of DGAT1 gene in mammary gland tissues was also significantly correlated to CHL_milk concentration (p = 0.011) ( Table 6), suggesting that DAGT1 might contribute to the regulation of CHL_milk metabolism in the mammary gland.
A significant SNP (ARS-BFGL-NGS-41837 or rs110597360) for CHL_milk on BTA 6 is located in an intergenic region and the nearest gene to this SNP is ENSBTAG00000001751, an orthologue of human CXXC finger protein 4 (CXXC4) gene. CXXC4 encodes a CXXC-type zinc finger domain-containing protein that functions as an antagonist of the canonical wingless/integrated signaling pathway 73,74 . The role of this novel gene in CHL_milk is unknown. On BTA 15, Hapmap38637-BTA-88156 (rs41596665) was significantly associated with CHL_milk  75 . In fact, the Notch signaling pathway was one of the pathways enriched for CHL_milk positional candidate genes in this study and it has been shown to have important roles in mammary gland development 76 . The Notch signaling pathway is important in the regulation of cell fate, cell proliferation and cell death in development 77 ; however, there is no report of its direct role in milk CHL metabolism. On BTA 17, Hapmap52830-rs29014800 (rs29014800) was significantly associated with CHL_milk (p = 1.58E-05) and also suggestively associated with CHL_fat (Tables 1 and S1a), therefore this SNP might be important in the regulation of milk CHL content. On BTA 18, Hapmap39330-BTA-42256 (rs41605812), located in an intronic region of cadherin 13 (CDH13) gene (Table 1), is important for CHL_milk. A SNP within CDH13 has been reported to be associated with plasma adiponectin levels in Japanese population 78 and with triglyceride/high density lipoprotein ratio in Korean cardiovascular patients 79 . This gene is moderately expressed in the bovine mammary gland and also showed a trend (p = 0.075) to correlate to CHL_milk concentration (Table S4c). However, the role of this gene in milk CHL metabolism remains to be characterized.
The enrichment results for positional candidate genes showed several GO terms related to heart development ( Table 3) which might reflect the fact that many candidate genes for CHL also play roles in cardiovascular disease development or heart diseases. An interesting molecular function GO term enriched was interleukin-2 receptor binding. It is known that interleukin-2 gene plays important roles in the activation of STAT5a gene in mammary gland development 80 . Glycerolipid metabolism, another enriched pathway has been implicated in the biosynthesis of CHL 81,82 . Therefore, interleukin-2 receptor binding (GO term) and glycerolipid metabolism pathway might also play important roles in bovine milk CHL metabolism. Interestingly, the most important transcription factor enriched for CHL_milk candidate genes was liver X receptor (LXR) (p = 1.00E-11) which is an important regulator of CHL, fatty acid, and glucose homeostasis [83][84][85] . There are two LXR subtypes (LXRα and LXRβ) and LXRα, the dominant subtype is highly expressed in the liver and other tissues (intestine, adipose, kidney, and adrenals) 86 while LXRβ is widely expressed in different tissues 86 . In our mammary gland RNA expression data, LXRβ (or NR1H2 gene) was also expressed at a higher level when compared to LXRα (or NR1H3 gene). In the liver, LXRα expression was not significantly correlated to CHL_milk during transition and early lactation 20 . Another notable transcription factor enriched for CHL_milk positional candidate genes was notch homolog 1 (NOTCH1) (p = 0.028) ( Table 4), which indicates the importance of NOCTH signaling pathway in milk CHL regulation. The functions of highly interacted genes (MAPK15, FAM83H, ARHGAP39, HEATR7A, CYHR1 and CPSF1) in CHL_ milk protein interaction network (Fig. 4), as well as highly significantly correlated genes (ENSBTAG00000048096, TONSL and ITGB1) ( Table 6) in CHL metabolism are unknown and warrant further investigation.
The genetic variants identified in this study may facilitate selection in commercial breeding schemes either by incorporation in marker-enhanced selection or via implementation of genomic prediction including these identified genetic variants in a customized SNP panel. However, it is also important to consider potential limitations of our study including the limited size of resource population for GWAS, the relaxed p-value threshold used to select SNPs for gene set enrichments, potential for false discovery errors for certain enriched gene ontologies and pathways with few enriched genes in the gene list. The results should be interpreted with caution since both the results of associations (GWAS) and correlations derived from RNA sequence data may not reflect actual causative relationships. As already mentioned above, most CHL in milk is derived from the diet (which is partly reflected as CHL concentration in the blood) while only a small proportion, about 20%, is synthesized de novo in the mammary gland. Therefore, association analysis considering data on both blood and milk CHL concentrations might enhance knowledge of the implicated candidate genes in the regulatory pathways of milk CHL concentration such as dietary CHL transport from blood to the mammary gland and de novo synthesis in the mammary gland. Moreover, integration of gene expression data from the mammary gland and other tissues like the liver could identify the link between the mechanisms regulating CHL in the mammary gland and other tissues, and how these connections influence de novo synthesis of CHL in the mammary gland and milk CHL concentration.
To the best of our knowledge, this is the first GWAS on bovine milk CHL. The strongest SNP associations with milk CHL were detected on BTA14 and BTA17. This study identified several candidate genes (DGAT1, PTPN1,  INSIG1, HEXIM1, SDS, and HTR5A), also important for human plasma CHL and related traits, that might be important for bovine milk CHL. Novel candidate genes (RXFP1, FAM198B, TMEM144, CXXC4, MAML2 and CDH13) for milk CHL content were identified. Enrichment analyses suggested the involvement of important gene ontology terms ((basolateral) plasma membrane and cell-cell signaling processes), pathways (tight junction, focal adhesion, Notch signaling and glycerolipid metabolism pathways), and several transcription factors (PPARD, LXR and NOTCH1) in the regulation of bovine milk CHL content. The expression of some positional candidate genes in the mammary gland and their correlation with milk CHL content was supported with RNA sequencing data and milk CHL concentrations from the same animals. This study has therefore provided an insight into the genomics of bovine milk CHL and identified potential candidate genes and pathways that might be further studied to identify/confirm casual mutations that might help in the selection of cows with desired milk CHL content.

Materials and Methods
Animal Resource and Cholesterol Measure. Animal selection and milk sampling has been described in our previous study 4 . In brief, 100 ml of milk from each of 1,848 cows from 29 herds (minimum: 33 cows/herd and maximum: 172 cows/herd) were used. The concentration of CHL in milk fat was determined by direct saponification and capillary gas chromatography according to Fletouris et al. 87 . About 0.2 mg milk fat was saponified in capped tubes with 0.5 M methanolic KOH solution by heating for 15 minutes and the unsaponifiable fraction was extracted with toluene and analyzed by capillary gas chromatography using Agilent HP 6890 Series Gas Association Analyses. The association analyses were performed using a univariate single SNP mixed linear model implemented in DMU package 89 . In summary, the model for each SNP (analyzed individually) was as follows (model 1): where y is the vector of phenotype (CHL_fat, CHL_milk), 1 is a vector of 1s with length equal to number of observations, μ is the general mean, X is an incidence matrix relating phenotypes to the corresponding fixed effects, and B is the vector for fixed effects which includes interaction between herd and parity and days in milk (DIM), Z is an incidence matrix relating phenotypes to the corresponding random polygenic effect, a is a vector of the random polygenic effect ∼N(0, Aσ u 2 ) (where A is the additive relationship matrix and σ u 2 is the polygenic variance), m is a vector with genotypic indicators 2, 1, or 0 for genotypes AA, AB and BB, respectively associating records to the marker effect, g is a scalar of the associated additive effect of the SNP, and e is a vector of random environmental deviates: N(0, σ e 2 ) (where σ e 2 is the general error variance). The parameters of the model σ 2 u and σ 2 e were estimated using restricted maximum likelihood (REML) for each SNP. To determine the significantly associated SNPs, an F-test was used to test the null hypothesis H 0 : β = 0. Distribution of test statistics was assessed by quantile-quantile (q-q) plot generated from association tests and the deviation from the null hypothesis of no SNP association with the trait. The markers with p nominal < 5E-05 were considered genome wide significant 90 and markers with p nominal from 5E-05 to 5E-04 were considered suggestively genome wide significant to avoid many false negative results caused by stringent Bonferroni correction.

Detection of Linkage Disequilibrium Blocks.
Since several significant SNPs may be clustered in the same region (QTL region), we performed Linkage Disequilibrium (LD) analysis to characterize Linkage Disequilibrium patterns (LD block) for these regions. The LD block was defined according to Gabriel et al. 91 and was detected and visualized with Haploview software 92 . Gabriel et al. 91   Mendelian inheritance errors >1. During LD construction, pairwise comparisons of markers >500 kb apart were ignored according to default settings in the Haploview software.
Gene Mapping, Pathways and Transcription Factor Enrichment. We selected both significant and suggestive SNPs for pathway analyses because assignment of genes using only genome wise significant SNPs may ignore potentially important SNPs with lower significant levels, consequently missing out on key putative candidates and associated pathways. Nearby genes within a flanking distance of 0.5 Mb from significant and suggestive SNPs were queried from Ensemble database (Ensembl 83, Bos taurus UMD3.1), using bedtools 93 . Genes were submitted to the Database for Annotation, Visualization, and Integrated Discovery (DAVID, http://david. abcc.ncifcrf.gov/) for KEGG pathways and Gene Ontology (GO) enrichment analyses 94 while STRING v10.5 95 database was used to assess protein-protein interactions. The human genome was selected as background for enrichment instead of the bovine genome in order to take advantage of a richer database of information on the genomics of human CHL. Annotated pathways and GO terms were tested for enrichment using Fisher exact test. Pathways/GO terms were declared significantly enriched if they did not appear by chance with p < 0.05 94 . For STRING 95 enrichment, the default options were used with the network edge selected based on confidence level. The minimum confidence threshold was set-up at the medium level with score of 0.4. In addition, a comprehensive gene set enrichment analysis for transcriptional machinery using ChIP-X enrichment analysis (ChEA2015) 96 was performed with Enrichr (http://amp.pharm.mssm.edu/Enrichr/) 97 . The transcription factors were declared significantly enriched at p < 0.05.

Evaluation of Expression of Positional Candidate Genes Using Mammary Gland RNA-Seq
Data. The RNA-Seq expression data of 12 cows used is a subset of the data from our previous study 42 . Cows were in mid lactation (day 120-180) and fed the control ration ( Table S4a). The expression of positional candidate genes for milk CHL as read count (reads per kilo base per million mapped reads (RPKM)) is shown in Table S4b. The CHL content in milk obtained from the 12 cows on the same day that mammary gland biopsies where obtained for RNA-Seq was determined using the same methods described above 87 . The Pearson correlations of CHL content with the RPKM values of positional candidate genes were calculated using cor() function in R program 98 . The candidate genes were considered significantly correlated to milk CHL content at p < 0.05. The care of animals and use procedures were according to the Canadian Council on Animal Care 99 and were approved by the Animal Care and Ethics Committee of Agriculture and Agri-Food Canada.