Inclusion of endophenotypes in a standard GWAS facilitate a detailed mechanistic understanding of genetic elements that control blood lipid levels

Dyslipidemia is the primary cause of cardiovascular disease, which is a serious human health problem in large parts of the world. Therefore, it is important to understand the genetic and molecular mechanisms that regulate blood levels of cholesterol and other lipids. Discovery of genetic elements in the regulatory machinery is often based on genome wide associations studies (GWAS) focused on end-point phenotypes such as total cholesterol level or a disease diagnosis. In the present study, we add endophenotypes, such as serum levels of intermediate metabolites in the cholesterol synthesis pathways, to a GWAS analysis and use the pig as an animal model. We do this to increase statistical power and to facilitate biological interpretation of results. Although the study population was limited to ~ 300 individuals, we identify two genome-wide significant associations and ten suggestive associations. Furthermore, we identify 28 tentative associations to loci previously associated with blood lipids or dyslipidemia associated diseases. The associations with endophenotypes may inspire future studies that can dissect the biological mechanisms underlying these previously identified associations and add a new level of understanding to previously identified associations.


Scientific Reports
| (2020) 10:18434 | https://doi.org/10.1038/s41598-020-75612-6 www.nature.com/scientificreports/ Six of the genome-wide or suggestively associated SNPs were located in introns of protein-coding genes and six were within intergenic regions. Strong functional candidate genes were found for seven of the associated SNPs and one locus appeared to have a pleiotropic effect.
The association between Des and a SNP on chromosome (Chr) 6, 157,486,305 bp was the most significant association identified (p = 1.06E−11) (Fig. 1a). The markers in strong linkage disequilibrium (LD) with this SNP position encompass a region of 4397 bp entailing exon 1 of DHCR24 (Fig. 2a). This gene encodes the enzyme 24-dehydrocholesterol reductase that converts desmosterol to cholesterol and different mutations in this gene has been shown to cause desmosterolosis (OMIM:602398; Refs. 32,33 ). Our finding shows that the limiting step for Des serum levels in the examined pigs are not synthesis of the molecule but rather conversion of the molecule to cholesterol by DHCR24.
The four phenotypes Lat, Des, Sint and TC were found to be suggestively associated with four different SNPs on Chr 13 in strong LD with each other. The Des-associated SNP was located in an intron of the HPS3 gene but the remaining three SNPs were located in intergenic regions. All of them were located close to the CP gene, which encodes ceruloplasmin. Ceruloplasmin concentrations has previously been shown to be strongly correlated with serum triglyceride and cholesterol levels 37 . Furthermore, ceruloplasmin administration has been shown to produce a partial correction of dyslipidemia, manifested by normalization of levels of TG, TC, LDL-C and HDL-C 38 . A mechanistic explanation for this may be found in the ability of ceruloplasmin to oxidize LDL particles (reviewed by Ref. 39 ). The oxidized particles are scavenged and degraded by macrophages (reviewed by Ref. 40 ) and thereby ceruloplasmin has a LDL-C lowering effect.
LDL-C level was furthermore suggestively associated with a SNP on Chr 2 in an intron of FCHO2. The FCH and mu domain containing endocytic adaptor 2 encoded by FCHO2 plays a direct role in clathrin-mediated endocytosis of LDL-C by organizing clathrin-coated structures for LDLR endocytosis 31 . Hence, this gene plays a role in clearance of LDL-C from the blood stream.
Sint and Lan were found to be suggestively associated with positions in introns of LINGO1 on Chr 7 and NYAP2 on Chr 15, respectively. To our knowledge, this is the first time these two genes has been linked to levels of cholesterol precursors in the blood, or blood lipid levels in general.
Three SNPs were associated with levels of different phytosterols. These results establish the first identified loci associated with serum levels of phytosterols. The three SNPs were located on Chr 13 but not close to each other. The Sphy associated SNP at 75.59 Mb on Chr 13 was genome-wide significant (p = 1.3E−08) (Fig. 1b) and the ANAPC13 gene, encoding anaphase promoting complex subunit 13, was located within a region in strong LD with the lead-SNP (Fig. 2b). This gene has not previously been associated with serum levels of phytosterols or other lipids. A SNP suggestively associated with Cste was located close to the genes FAIM and FOXL2. The two genes are not obvious candidate genes for phytosterol levels but they are involved in lipid metabolism. FAIM deficiency enhances SREBP signaling and promotes lipogenesis in liver 34 . FOXL2 represses expression of Star, a protein that controls cholesterol transport from the outer to the inner mitochondrial membranes 35,36 . www.nature.com/scientificreports/ The microbiota-derived sterol, Csta, was suggestively associated with a SNP on Chr 3. The SNP was located within an intron of the CALN1 gene. To our knowledge, it is the first time this locus has been associated with lipid absorption or lipid metabolism and the result needs to be confirmed by further studies.
We furthermore identified 52 SNPs with a p -value between 1E−05 and 4.8E−07. Associations at this significance level will inevitably contain many false positive results. Table 3 lists 28 of the 52 SNPs. These SNPs are located in loci, which previously have been linked to blood lipid levels and/or dyslipidemia associated comorbidities. We classify these SNPs as tentative associations. Our results suggest that the previously identified effects may be mediated via specific molecular mechanisms identified by the tentative associations with specific endophenotypes in our study. For example, the ARNT2 (Chr 7) and the TSPYL5 (Chr 4) genes have previously been associated with blood lipid levels, regulation of lipid metabolism and fatty acid synthesis [41][42][43] . In our results, these two loci are tentatively associated with Lan. Hence, our results point more precisely towards a function of Table 2. Genome-wide and suggestively significant GWAS results. Summary data of lead-SNPs surpassing the genome-wide (p < 4.8E−08) or suggestive (p < 4.8E−07) significance level. Results are listed in chromosomal and positional order. Chromosome number (Chr) and base-pair (BP) position refer to pig genome assembly Sscrofa 11.1. MAF = minor allele frequency. P = p value. B = regression coefficient. SE = standard error for B. r 2 = The squared correlation coefficient between pairs of SNPs as a measure for linkage disequilibrium (LD).   44 and PEX2 (Chr 4) to levels of HDL-C and TC 45 , whereas the tentative associations in the present study point more precisely towards a function of these loci in the KRPW for cholesterol biosynthesis because the two loci are associated specifically with Lat levels. On the other hand, the associations between Des levels and the two genes ARV1 (Chr 14) and ACSL1 (Chr 15) identify functions of these loci in the BPW of cholesterol biosynthesis. These loci were previously associated with levels of LDL-C, free cholesterol in circulation and accumulations of cholesterol in the liver 46,47 , whereas our tentative results point more precisely towards an effect on BPW cholesterol synthesis. More examples like these are described in Table 3. Overall, the tentative results in the present study suggest new hypotheses about the fundamental factors and mechanisms causing previously identified end-point phenotype associations. The tentative results of the present study may therefore form a foundation for future studies to clarify these mechanisms.
Correlations between phenotypes are listen in Supplementary Table 1. In general, serum levels of intermediate molecules in the cholesterol synthesis pathway were correlated with each other, phytosterol levels were correlated with each other and levels of microbiota-derived sterols were correlated with each other. Despite the close functional relationship between endophenotypes and end-point phenotypes, correlations between these phenotypes were generally weak and non-significant. However, a moderate correlation was observed between Des and TC. Additionally, we observed a moderate correlation between Lan and the two phytosterols Bsit and Stig. LDL-C was more strongly correlated with TC than HDL-C.

Discussion
Larger and larger cohorts has been the mantra for increasing power in GWAS for many years, and for a good reason of course. Results obtained repeatedly in large cohorts are more trustworthy than those, which are only supported by a few more or less coincidental observations in a small sample. However, in the incessant race for larger studies and big consortia, it seems the value of meticulous phenotyping has been comparatively neglected. The power of a study depends on the statistical significance criterion, the sample size and the magnitude of the effect that we want to detect. In a GWAS, the phenotypic effect of any nucleotide variant may be fixed but the accurate ascertainment of the phenotypic effect has consequences for the power of the study. At least two factors play a critical role for accurate measurement of the phenotypic effect. Firstly, any confounding environmental Table 3. Tentative associations in regions previously associated with blood lipid levels or dyslipidemia associated comorbidities. Tentatively significant results of the genome wide association analyses for loci, which has been associated with blood lipid levels and/or dyslipidemia associated comorbidities in previous studies and have a p value < 10E−5 in the present study.  www.nature.com/scientificreports/ factor must be minimized. Secondly, the direct effect must be measured rather than a secondary or a derived effect, which may be genetically more complex and hence biased by other loci in the genome. In the present study, we minimized environmental effects by using the pig as an animal model and by raising the animals in a highly controlled environment including a uniform diet for all animals. Furthermore, in order to detect a more direct effect of genetic variants, we measured a number of endophenotypes, for example intermediate metabolites in the cholesterol synthesis pathway. On the other hand, opposing the trend towards larger studies, we reduced the number of animals in the study to a number, which must be considered an absolute minimum for a study of this kind.
Despite the very small sample size, our approach resulted in detection of two genome-wide significant associations with p -values of 1.06E−11 and 1.30E−08. While p -values like those are remarkable for a GWAS with this sample size, the many suggestive and tentative associations also show that the number of animals was too restrictive and statistical power suffered from that. Nevertheless, strong functional candidate genes were found for a large fraction of the suggestively and tentatively associated loci. Furthermore, many of those demonstrated how inclusion of endophenotypes could be used to dissect genetic and molecular mechanisms underlying more complex end-point phenotypes. Many of the loci associated with endophenotypes in the present study have previously been associated with the more complex end-point phenotypes HDL-C, LDL-C or a dyslipidemiaassociated disease. In the previous studies, detection of the relatively weak effect of these loci on the end-point phenotypes was only possible because large cohorts were studied. The associations to end-point phenotypes were not confirmed for these loci in the present study, because it was underpowered to detect loci with weak effects on complex end-point phenotypes. Instead, by including endophenotypes we found these loci associated with some of the genetically less complex but biologically more fundamental mechanisms, which ultimately cause a change in levels of LDL-C, HDL-C or disease risk as documented by the previous large cohort studies. That is, we detected the same loci in a simpler setup and at a lower cost, and at the same time, we obtain a more detailed understanding of the biological mechanism by which the loci have an effect on the complex end-point phenotypes.
Inclusion of endophenotypes furthermore enabled identification of one pleiotropic locus on Chr 13 associated with Lat, Des, Sint, and TC. The pleiotropic effect indicates either a direct effect at several levels of cholesterol biosynthesis or an early and strong fundamental effect reflected in levels of intermediates in all later stages of cholesterol synthesis. Hence, due to inclusion of endophenotypes, this locus can be identified as a master regulator in cholesterol biosynthesis.
In addition to loci with effect on endophenotypes, the present study also identifies a number of loci with an effect on the end-point phenotypes, HDL-C, LDL-C, TC and TG. In several cases, these results point towards genes with a very central role in cholesterol synthesis or to loci with an effect on cholesterol clearance and/or excretion rather than synthesis. HDL-C level was tentatively associated with the CDK1 gene, the product of which stabilizes members of the SREBP family 77 . As mentioned above, SREBPs play a key role in the biosynthesis of cholesterol 79 . LDL-C level was suggestively associated with a key mechanism for LDL-C clearance from the blood, namely LDLR endocytosis, via an association with the FCHO2 gene. TC was tentatively associated with another transmembrane transport mechanism by its association with COL23A1. Full-length COL23A1 molecules are found in lipid rafts 56 , which are tightly packed microdomains of the cell membrane. Evidence suggests that these rafts are directly involved in reverse cholesterol transport 57 . Our results lend support to the cholesterol transporting capacity of COL23A1-containing lipid rafts and point to an important role in regulation of overall cholesterol levels. TG level was tentatively associated with TOR1AIP1, which plays an essential role in regulation of secretion of hepatic VLDL 71 , which is the principal carrier of TG in the blood. All these results are in agreement with the expectation that a study of this scale will only be able to detect the loci with strongest effect for the genetically more complex end-point phenotypes.
The serum levels of phytosterols and microbiota-derived sterols can be used as a proxy for the efficiency of sterol absorption from the gastrointestinal canal. Our results confirm previous observations of a role for the PACRG/PRKN locus in lipid absorption from the gut. Besides that, our results indicate that loci with a more general effect on lipid transport and biosynthesis also have a role to play in sterol absorption.
Overall, the present study corroborate many previous results from studies in human and mice and the great overlap between results affirm the quality of the pig as an excellent animal model for human blood lipid metabolism. Furthermore, we identify new loci associated with different blood lipid levels. These results must be further evaluated in future studies in humans and animal models. Most importantly, the study identifies suggestive and tentative associations between endophenotypes and genes, which previously have been associated with end-point phenotypes. These results suggest hypotheses about more fundamental molecular mechanisms underlying the previously identified associations with end-point phenotypes. We propose that these hypotheses should be evaluated in future studies in humans and in animal models. The study demonstrates how inclusion of endophenotypes has the power to detect biologically important loci even in a small-scale study. This is a cost-effective approach compared to larger GWAS with complex end-point phenotypes. At the same time, the results demonstrate how the inclusion of endophenotypes facilitates elucidation of specific details in biological mechanisms underlying variation in end-point phenotypes.

Material and methods
Animal material and sample collection. All animals used in the present study were a three-way cross between Duroc, Landrace and Yorkshire used in the Danish pig production system. The sows from crossings between Landrace and Yorkshire were inseminated with mixed semen from Duroc boars to produce the pigs used in the study. All parental animals were provided by Danbred (Herlev, Denmark). The pigs were produced in a production farm and raised under the conditions for production pigs in Denmark observing guidelines in the www.nature.com/scientificreports/ Danish "Animal Maintenance Act" (Act 432 dated 09/06/2004) and the "Order regarding animal experimentation" (BEK nr 12 af 07/01/2016) and approved by the Danish Veterinary and Food Administration. All pigs were ear-tagged with individual ID at weaning. Both female and male pigs were used in the study. All pigs were fed the same diet. None of the pigs was subjected to any treatment. At an age of approximately 6 month and a body weight of approximately 100 kg, animals were send to an approved commercial abattoir where they were slaughtered in the morning after overnight fasting. Blood and serum samples were collected immediately after exsanguination in BD K2E (EDTA) tubes and BD Vacutainer SST tubes, respectively, from Thermo Fisher Scientific.
DNA isolation, genotyping and imputation to whole genome sequence variants. High quality DNA was isolated from EDTA stabilized blood using a classic salting out procedure 80 . SNP genotyping was performed by Edinburg Genomics, Ashworth Laboratories (Edinburgh) using the 700 K Affymetrix Axiom PigHD chip. To establish marker positions in the newest assembly of the pig genome (Sscrofa11.1 81 ), sequences of the probes for Affymetrix Axiom PigHD SNPs were mapped to the new assembly using BWA 82 Table 1 were measured by Gas Chromatography-Mass Spectrometry (GC-MS) using a method adapted from Heuillet, et al. 85 and Quehenberger, et al. 86 . Briefly, serum were supplemented with deuterium-labelled internal standards and esterified sterols were saponified with 500 µl 0.5 N KOH for 20 min at 60 °C. Sterols were extracted twice with 450 µl water and 900 µl hexane and derivatized with 60 µl BSTFA:TMCS (90:10), 1 h at 80 °C. Samples were dried and resuspended in 60 µl cyclohexane containing 1% BSTFA for GC/MS injection. Samples were analyzed using a Trace 1310-ISQ LT GC-MS instrument (Thermo Fisher Scientific). Sterols were injected at 250 °C in split mode and separated on a 30 m × 0.25 mm, 0.25 µm DB-5MS column (Agilent). Sterols were ionized using electronic impact (EI) and analyzed in SIM mode. Outliers for each of the phenotypes were removed. A phenotypic value was considered an outlier if it was either below the first quantile − 1.5 IQR (interquartile range, which is the difference between third and first quantile), or above the third quantile + 1.5 IQR. The following linear mixed model was used to adjust the phenotypes for the fixed effects:

Phenotypes. Serum levels of all lipids and sterols listed in
where y is the vector of phenotypes; b is the vector of fixed effects (i.e. batch, pen, sex, group); g is the vector of random polygenic effect estimated using a genomic relationship matrix constructed using the markers; e is the random residual. It was assumed that g follows a normal distribution N 0, Gσ 2 g , in which G was the matrix of genomic relationship between individuals estimated using HD marker genotypes following VanRaden 87 , and σ 2 g was the genetic variance. For random residuals, it was assumed that e ∼ N 0, Iσ 2 e , where σ 2 e was the residual variance and I was an identity matrix. The corrected phenotypes were calculated as estimated genetic values plus residuals (y c = g + e) . These corrected phenotypes were used as the response variables in the association analysis.
Correlations between corrected phenotypes were calculated using the Hmisc R package (https ://www.R-proje ct.org). The False discovery rates (FDR) was calculated based on p -values using the p.adjust function of the R package.
Genome wide association analysis. The method for identifying possibly closely linked QTL has been described previously 88 . We estimated the genomic relationship matrix (GRM) using GCTA 89 with the 700 K HD marker set. We estimated the GRM for each chromosome by leaving this chromosome out. This GRM was used for the following GWAS analysis. SimpleM 90 was used to estimate the number of independent tests and set a genome wide significance threshold p-value of 4.8e−08 (− log 10 P = 7.32). A threshold for suggestive significant was set to 10 times the genome wide significance level (4.8E−07). Furthermore, SNPs with − log 10 (p value) < 5 in regions, which previously had been associated with blood lipid levels or dyslipidemia associated comorbidities, are reported.
The GWAS was performed in several rounds. In a first round, single SNP GWAS analysis using GCTA 89 was performed for each chromosome. Then all SNPs were ranked based on their − log 10 P value and the largest − log 10 P value within each chromosome was identified. If the − log 10 P value of a SNP exceeded five, and there were at least one SNP with − log 10 P > 4 within a 2 Mb region (1 Mb up and down stream of the lead SNP), this SNP was retained as lead-SNP. Then, for each lead-SNP we extracted the lead SNP's genotype dosage, fitted it as a covariate, and scanned the whole chromosome again in a second round of the GWAS. If the result of second round detected another SNP that fulfilled lead-SNP criteria, and if this SNP had been a significant (− log 10 P > 5) non-lead SNP in the first round, then this SNP was added to the lead-SNP list. We then extracted the allele dosage of this SNP, www.nature.com/scientificreports/ fixed it as another covariate, and scanned the chromosome in a third GWAS round. This procedure was iterated until no additional SNP remained significant. Genomic heritability (the proportion of phenotypic variance explained by all HD markers) was estimated for each phenotype using GCTA --reml function 89 and a GRM with all HD markers on autosomes. LD was calculated between each lead-SNP and all other SNPs with − log 10 P > 5 on the same chromosome as the lead-SNP. Pairwise LD (r 2 ) was calculated using Plink 91 . LocusZoom 92 was used to illustrate regional GWAS and LD results for genome-wide significant SNPs.
Identification of candidate genes and comparative analysis with previous studies. For each identified association, a 1 Mb region of the porcine genome (Sscrofa11.1) centered on associated SNP were analyzed. Even though strong LD in many cases extended considerably less than 0.5 Mb on each side of the associated SNP, a 1 Mb region were analyzed for all associated positions. All protein coding as well as noncoding genes in the analyzed regions were catalogued. Previous knowledge about each gene was ascertained by searches in the PubMed database, the Entrez-Gene database and the database of www.genec ards.org. Searches were performed with search terms combining each gene name with other relevant terms such as the specific endophenotype or end-point phenotype associated with the locus. For each gene, searches were also performed with a combination of gene name and the general search terms "sterols", "cholesterol" and "blood lipids".