Introduction

Blood lipid levels are routinely measured to evaluate the risk of cardiovascular disease (CVD), a devastating disease in humans. According to the WHO, 17.5 million people died from CVD in 2012, more than twice the number of deaths due to cancers. Furthermore, six million of those who died were under the age of 70, meaning that CVD caused a premature end of life1. This places CVD as the leading cause of death and early demise in the world. The culprit of CVD is dyslipidemia, which emphasizes why it is crucial to understand the mechanisms that regulate serum levels of blood lipids.

Whole body cholesterol content and serum levels of cholesterol are regulated by a balance in uptake of cholesterol from food, excretion of cholesterol via the bile, de novo biosynthesis of cholesterol in the body as well as microbial metabolism and reduction of cholesterol to coprostanol2,3. Several genes and molecules that facilitate and regulate uptake, excretion and fluxes of cholesterol between compartments and over cell membranes have been described4,5,6,7,8. De novo biosynthesis of cholesterol can take place in any cell in the body, but the great majority of the molecule is synthesized in the liver in a tightly regulated process9. The homeostatic mechanisms are however only partly understood.

Two separate pathways for cholesterol biosynthesis have been described: the Bloch pathway (BPW)10 and the Kandutsch-Russell pathway (KRPW)11. The initial steps in cholesterol synthesis are shared by both pathways. HMG-CoA reductase (the target of current cholesterol-lowering statin therapy) is central to the early steps of cholesterol synthesis, in which mevalonate is formed from acetyl-CoA. Mevalonate is subsequently converted to squalene (Sq), which in turn is converted to lanosterol (Lan), which has the cyclic structure of cholesterol. Lan is the starting point for both the BPW and the KRPW. In the BPW, Lan is converted to cholesterol via the intermediate desmosterol (Des), whereas in the KRPW cholesterol is synthesized from Lan via the intermediate lathosterol (Lat).

A fundamental regulatory mechanism for de novo synthesis involves sensing of intracellular cholesterol levels by the SREBP proteins (sterol regulatory element binding proteins 1 and 2). If cholesterol is present, SREBP binds to SCAP and INSIG-1. Otherwise, INSIG-1 dissociates from the complex and allows cleavage of SREBP, which is then free to migrate to the nucleus, where it acts as a transcription factor for a number of genes including the LDL-receptor gene (LDLR) and the HMG-CoA reductase gene (reviewed in Ref.12). Up-regulation of the LDL-receptor increases uptake of low-density lipoprotein cholesterol (LDL-C) from the blood, and up-regulation of the HMG-CoA reductase gene increases de novo biosynthesis of cholesterol. The SREBP proteins regulate the initial steps of cholesterol synthesis. Regulatory mechanisms later in cholesterol biosynthesis are not as well characterized. It has been observed in mice that the relative use of the BPW and the KRPW is tissue specific, but the mechanisms that regulate the relative use of the two pathways are unknown13.

The present genome-wide association study (GWAS) is designed to dig deeper into the uncharacterized regulatory mechanisms by the use of intermediate phenotypes—also known as endophenotypes.

Use of endophenotypes in genetic mapping has been advocated since the beginning of the century14. The rationale is that an intermediate phenotype is more proximal to the direct effect of a single gene. Hence, an endophenotype is expected to be genetically less complex than an end-point phenotype. This has implications both for the power of a GWAS and for the interpretation of GWAS results in terms of biological function. A GWAS detects the effect of a variation in the genome, most often a single nucleotide polymorphism (SNP). Associating a SNP with an appropriate endophenotype determines the direct effect of the given SNP, whereas associating the SNP with an end-point phenotype determines a derived effect of the SNP. The derived effect may be influenced by other genes and/or environmental factors, and the resulting noise causes a reduction in statistical power. The biological complexity of an end-point phenotype furthermore hampers the biological interpretation of an identified association, because the associated SNP, or genes near it, may affect any one of many phenotype-determining factors. For an endophenotype, on the other hand, the associated SNP, or genes nearby, can only affect one of a few endophenotype-determining factors.

There has been a general realization that endophenotypes are required in order to resolve genetic heterogeneity15 and to enable a functional characterization of newly discovered genetic variants16. Nevertheless, efforts to increase statistical power in GWAS have until now mainly been focused on increasing sample sizes17. This has also been the case for GWAS on blood lipid levels and CVD in humans, where sample sizes have increased to several hundred thousand individuals18,19,20. However, the measured phenotypes in these studies are still the crude end-point phenotypes: LDL-C, HDL-C, TG levels, or a CVD diagnosis.

In the present study, we investigate the power of including endophenotypes in the analyses instead of increasing sample size. Furthermore, we use the pig, which is an established model for the highly prevalent human diseases involving lipid metabolism (metabolic syndrome, diabetes, obesity, and cardiovascular diseases)21,22,23. The pig offers an attractive alternative to rodent models because of its anatomical and physiological similarities to humans and the availability of genomic, transcriptomic, metagenomic, proteomic, and metabolomic tools for analysis of this species24. Regarding blood lipids, pigs transport most of their cholesterol in LDL-C, as do humans, while rodents carry the majority of their cholesterol in high-density lipoprotein cholesterol (HDL-C) rather than LDL-C, making them less desirable as models25. Additionally, high-cholesterol diets can induce human-like changes in the plasma lipoprotein profile of pigs, with ~60% of plasma cholesterol distributed in LDL particles. Currently, there is no single gold standard animal model of atherosclerosis and CVD development, but the pig is probably the best way to recreate human plaque instability that can be linked to hypercholesterolemia, obesity, and diabetes, which contribute to accelerated atherosclerosis26,27.

The goal of the present study is, like in other GWAS studies, to identify loci with an ultimate effect on cholesterol levels, TG levels and CVD. However, unlike previous studies, we include levels of intermediate metabolites in the cholesterol synthesis pathways as endophenotypes in the analyses. This includes serum levels of Lan as well as the BPW and KRPW specific intermediate metabolites, Des and Lat. Furthermore, serum levels of phytosterols and microbiota-derived sterols are included in the analyses. These sterols are absorbed from the gut by the same mechanisms as cholesterol. Hence, serum levels of these lipids provide an estimate for the efficiency of cholesterol absorption from the gut28,29,30. Our goal is to dissect the cholesterol phenotypes and thereby identify loci controlling specific steps and pathways for biosynthesis, catabolism, and cholesterol transport mechanisms. Hereby we will add to a deeper understanding of biological and genetic mechanisms in cholesterol regulation.

Results

The examined endophenotypes and end-point phenotypes are listed in Table 1, and the results of the GWAS analysis are summarized in Table 2. We identified two genome-wide significant (p < 4.8E−08) and 10 suggestive (p < 4.8E−07) associations. As a general overview, six loci were found to be associated with levels of different precursor molecules in the cholesterol synthesis pathway (Lan, Des, Lat, and Sum of intermediates (Sint)). One of these was genome-wide significant. Two suggestive associations were identified for the end-point phenotypes LDL-C and total cholesterol (TC). Three associations (one genome-wide significant) were identified for levels of phytosterols: campesterol (Cste), betasitosterol (Bsit), and sum of phytosterols (Sphy). One suggestive association was identified for levels of the microbiota-derived sterol coprostanol (Csta).

Table 1 Phenotypes.
Table 2 Genome-wide and suggestively significant GWAS results.

Six of the genome-wide or suggestively associated SNPs were located in introns of protein-coding genes and six were within intergenic regions. Strong functional candidate genes were found for seven of the associated SNPs and one locus appeared to have a pleiotropic effect.

The association between Des and a SNP at position 157,486,305 bp on chromosome (Chr) 6 was the most significant association identified (p = 1.06E−11) (Fig. 1a). The markers in strong linkage disequilibrium (LD) with this SNP span a region of 4397 bp that contains exon 1 of DHCR24 (Fig. 2a). This gene encodes the enzyme 24-dehydrocholesterol reductase, which converts desmosterol to cholesterol, and different mutations in this gene have been shown to cause desmosterolosis (OMIM:602398; Refs.32,33). Our finding indicates that the limiting step for Des serum levels in the examined pigs is not synthesis of the molecule but rather its conversion to cholesterol by DHCR24.

Figure 1

Manhattan plots illustrating results of the genome-wide association study for the phenotypes (a) Desmosterol (Des) and (b) Sum of phytosterols (Sphy). Chromosome numbers are indicated on the X-axis and − log(p values) on the Y-axis. Horizontal red line: genome-wide significance threshold (− log(4.8E−08)). Horizontal dashed blue line: suggestive significance threshold (− log(4.8E−07)).

Figure 2

LocusZoom plots of regional genome-wide association results, linkage disequilibrium and gene annotation for the two genome-wide significant associations. (a) The region on chromosome 6 associated with serum levels of desmosterol. (b) The region on chromosome 13 associated with sum of phytosterols in serum. Genome positions are indicated on the X-axes. Annotated genes in the regions are indicated above the X-axes; exons and introns are indicated by boxes and lines, respectively. − log(p values) are indicated on the Y-axes. The squared correlation coefficients (r2), a measure of linkage disequilibrium between SNPs, are indicated by color. SNPs with r2 > 0.8 are considered to be in strong linkage disequilibrium.

The four phenotypes Lat, Des, Sint and TC were found to be suggestively associated with four different SNPs on Chr 13 in strong LD with each other. The Des-associated SNP was located in an intron of the HPS3 gene, whereas the remaining three SNPs were located in intergenic regions. All of them were located close to the CP gene, which encodes ceruloplasmin. Ceruloplasmin concentrations have previously been shown to be strongly correlated with serum triglyceride and cholesterol levels37. Furthermore, ceruloplasmin administration has been shown to produce a partial correction of dyslipidemia, manifested by normalization of levels of TG, TC, LDL-C and HDL-C38. A mechanistic explanation for this may be found in the ability of ceruloplasmin to oxidize LDL particles (reviewed by Ref.39). The oxidized particles are scavenged and degraded by macrophages (reviewed by Ref.40), and ceruloplasmin thereby has an LDL-C lowering effect.

LDL-C level was furthermore suggestively associated with a SNP on Chr 2 in an intron of FCHO2. The FCH and mu domain containing endocytic adaptor 2 encoded by FCHO2 plays a direct role in clathrin-mediated endocytosis of LDL-C by organizing clathrin-coated structures for LDLR endocytosis31. Hence, this gene plays a role in clearance of LDL-C from the blood stream.

Sint and Lan were found to be suggestively associated with positions in introns of LINGO1 on Chr 7 and NYAP2 on Chr 15, respectively. To our knowledge, this is the first time these two genes have been linked to levels of cholesterol precursors in the blood, or to blood lipid levels in general.

Three SNPs were associated with levels of different phytosterols. These results establish the first identified loci associated with serum levels of phytosterols. The three SNPs were located on Chr 13 but not close to each other. The Sphy associated SNP at 75.59 Mb on Chr 13 was genome-wide significant (p = 1.3E−08) (Fig. 1b) and the ANAPC13 gene, encoding anaphase promoting complex subunit 13, was located within a region in strong LD with the lead-SNP (Fig. 2b). This gene has not previously been associated with serum levels of phytosterols or other lipids. A SNP suggestively associated with Cste was located close to the genes FAIM and FOXL2. The two genes are not obvious candidate genes for phytosterol levels but they are involved in lipid metabolism. FAIM deficiency enhances SREBP signaling and promotes lipogenesis in liver34. FOXL2 represses expression of Star, a protein that controls cholesterol transport from the outer to the inner mitochondrial membranes35,36.

The microbiota-derived sterol, Csta, was suggestively associated with a SNP on Chr 3. The SNP was located within an intron of the CALN1 gene. To our knowledge, it is the first time this locus has been associated with lipid absorption or lipid metabolism and the result needs to be confirmed by further studies.

We furthermore identified 52 SNPs with a p-value between 1E−05 and 4.8E−07. Associations at this significance level will inevitably include many false positives. Table 3 lists 28 of the 52 SNPs. These SNPs are located in loci that have previously been linked to blood lipid levels and/or dyslipidemia-associated comorbidities. We classify these SNPs as tentative associations. Our results suggest that the previously identified effects may be mediated via specific molecular mechanisms identified by the tentative associations with specific endophenotypes in our study. For example, the ARNT2 (Chr 7) and TSPYL5 (Chr 4) genes have previously been associated with blood lipid levels, regulation of lipid metabolism and fatty acid synthesis41,42,43. In our results, these two loci are tentatively associated with Lan. Hence, our results point more precisely towards a function of ARNT2 and TSPYL5 in the early steps of cholesterol synthesis, where Lan is formed from Sq. Similarly, previous studies have linked PID1 (Chr 15) to HDL-C level44 and PEX2 (Chr 4) to levels of HDL-C and TC45, whereas the tentative associations in the present study point more precisely towards a function of these loci in the KRPW for cholesterol biosynthesis, because the two loci are associated specifically with Lat levels. On the other hand, the associations between Des levels and the two genes ARV1 (Chr 14) and ACSL1 (Chr 15) point to functions of these loci in the BPW of cholesterol biosynthesis. These loci were previously associated with levels of LDL-C, free cholesterol in circulation and accumulation of cholesterol in the liver46,47, whereas our tentative results point more precisely towards an effect on BPW cholesterol synthesis. More examples like these are described in Table 3. Overall, the tentative results in the present study suggest new hypotheses about the fundamental factors and mechanisms causing previously identified end-point phenotype associations. The tentative results of the present study may therefore form a foundation for future studies to clarify these mechanisms.

Table 3 Tentative associations in regions previously associated with blood lipid levels or dyslipidemia associated comorbidities.

Correlations between phenotypes are listed in Supplementary Table 1. In general, serum levels of intermediate molecules in the cholesterol synthesis pathway were correlated with each other, phytosterol levels were correlated with each other, and levels of microbiota-derived sterols were correlated with each other. Despite the close functional relationship between endophenotypes and end-point phenotypes, correlations between these phenotypes were generally weak and non-significant. However, a moderate correlation was observed between Des and TC. Additionally, we observed a moderate correlation between Lan and the two phytosterols Bsit and Stig. LDL-C was more strongly correlated with TC than HDL-C was.

Discussion

Larger and larger cohorts have been the mantra for increasing power in GWAS for many years, and for good reason. Results obtained repeatedly in large cohorts are more trustworthy than those supported only by a few more or less coincidental observations in a small sample. However, in the incessant race for larger studies and big consortia, the value of meticulous phenotyping seems to have been comparatively neglected. The power of a study depends on the statistical significance criterion, the sample size and the magnitude of the effect that we want to detect. In a GWAS, the phenotypic effect of any nucleotide variant may be fixed, but how accurately that effect is ascertained has consequences for the power of the study. At least two factors play a critical role in accurate measurement of the phenotypic effect. Firstly, any confounding environmental factor must be minimized. Secondly, the direct effect must be measured rather than a secondary or derived effect, which may be genetically more complex and hence biased by other loci in the genome. In the present study, we minimized environmental effects by using the pig as an animal model and by raising the animals in a highly controlled environment, including a uniform diet for all animals. Furthermore, in order to detect a more direct effect of genetic variants, we measured a number of endophenotypes, for example intermediate metabolites in the cholesterol synthesis pathway. On the other hand, opposing the trend towards larger studies, we reduced the number of animals to what must be considered an absolute minimum for a study of this kind.

Despite the very small sample size, our approach resulted in detection of two genome-wide significant associations with p-values of 1.06E−11 and 1.30E−08. While p-values like these are remarkable for a GWAS with this sample size, the many suggestive and tentative associations also show that the number of animals was too restrictive and that statistical power suffered as a result. Nevertheless, strong functional candidate genes were found for a large fraction of the suggestively and tentatively associated loci. Furthermore, many of those demonstrated how inclusion of endophenotypes could be used to dissect genetic and molecular mechanisms underlying more complex end-point phenotypes. Many of the loci associated with endophenotypes in the present study have previously been associated with the more complex end-point phenotypes HDL-C, LDL-C or a dyslipidemia-associated disease. In the previous studies, detection of the relatively weak effect of these loci on the end-point phenotypes was only possible because large cohorts were studied. The associations with end-point phenotypes were not confirmed for these loci in the present study, because it was underpowered to detect loci with weak effects on complex end-point phenotypes. Instead, by including endophenotypes we found these loci associated with some of the genetically less complex but biologically more fundamental mechanisms, which ultimately cause a change in levels of LDL-C, HDL-C or disease risk as documented by the previous large cohort studies. That is, we detected the same loci in a simpler setup and at a lower cost, and at the same time we obtained a more detailed understanding of the biological mechanism by which the loci affect the complex end-point phenotypes.

Inclusion of endophenotypes furthermore enabled identification of one pleiotropic locus on Chr 13 associated with Lat, Des, Sint, and TC. The pleiotropic effect indicates either a direct effect at several levels of cholesterol biosynthesis or an early and strong fundamental effect reflected in levels of intermediates in all later stages of cholesterol synthesis. Hence, due to inclusion of endophenotypes, this locus can be identified as a master regulator in cholesterol biosynthesis.

In addition to loci with effect on endophenotypes, the present study also identifies a number of loci with an effect on the end-point phenotypes, HDL-C, LDL-C, TC and TG. In several cases, these results point towards genes with a very central role in cholesterol synthesis or to loci with an effect on cholesterol clearance and/or excretion rather than synthesis. HDL-C level was tentatively associated with the CDK1 gene, the product of which stabilizes members of the SREBP family77. As mentioned above, SREBPs play a key role in the biosynthesis of cholesterol79. LDL-C level was suggestively associated with a key mechanism for LDL-C clearance from the blood, namely LDLR endocytosis, via an association with the FCHO2 gene. TC was tentatively associated with another transmembrane transport mechanism by its association with COL23A1. Full-length COL23A1 molecules are found in lipid rafts56, which are tightly packed microdomains of the cell membrane. Evidence suggests that these rafts are directly involved in reverse cholesterol transport57. Our results lend support to the cholesterol transporting capacity of COL23A1-containing lipid rafts and point to an important role in regulation of overall cholesterol levels. TG level was tentatively associated with TOR1AIP1, which plays an essential role in regulation of secretion of hepatic VLDL71, which is the principal carrier of TG in the blood. All these results are in agreement with the expectation that a study of this scale will only be able to detect the loci with strongest effect for the genetically more complex end-point phenotypes.

The serum levels of phytosterols and microbiota-derived sterols can be used as a proxy for the efficiency of sterol absorption from the gastrointestinal canal. Our results confirm previous observations of a role for the PACRG/PRKN locus in lipid absorption from the gut. Besides that, our results indicate that loci with a more general effect on lipid transport and biosynthesis also have a role to play in sterol absorption.

Overall, the present study corroborates many previous results from studies in humans and mice, and the great overlap between results affirms the quality of the pig as an animal model for human blood lipid metabolism. Furthermore, we identify new loci associated with different blood lipid levels. These results must be further evaluated in future studies in humans and animal models. Most importantly, the study identifies suggestive and tentative associations between endophenotypes and genes that have previously been associated with end-point phenotypes. These results suggest hypotheses about more fundamental molecular mechanisms underlying the previously identified associations with end-point phenotypes. We propose that these hypotheses should be evaluated in future studies in humans and in animal models. The study demonstrates how inclusion of endophenotypes has the power to detect biologically important loci even in a small-scale study. This is a cost-effective approach compared to larger GWAS with complex end-point phenotypes. At the same time, the results demonstrate how the inclusion of endophenotypes facilitates elucidation of specific details in biological mechanisms underlying variation in end-point phenotypes.

Material and methods

Animal material and sample collection

All animals used in the present study were a three-way cross between Duroc, Landrace and Yorkshire, as used in the Danish pig production system. Sows from crossings between Landrace and Yorkshire were inseminated with mixed semen from Duroc boars to produce the pigs used in the study. All parental animals were provided by Danbred (Herlev, Denmark). The pigs were produced on a production farm and raised under the conditions for production pigs in Denmark, observing the guidelines in the Danish “Animal Maintenance Act” (Act 432 dated 09/06/2004) and the “Order regarding animal experimentation” (BEK nr 12 af 07/01/2016) and approved by the Danish Veterinary and Food Administration. All pigs were ear-tagged with an individual ID at weaning. Both female and male pigs were used in the study. All pigs were fed the same diet. None of the pigs was subjected to any treatment. At an age of approximately 6 months and a body weight of approximately 100 kg, the animals were sent to an approved commercial abattoir, where they were slaughtered in the morning after overnight fasting. Blood and serum samples were collected immediately after exsanguination in BD K2E (EDTA) tubes and BD Vacutainer SST tubes, respectively, from Thermo Fisher Scientific.

DNA isolation, genotyping and imputation to whole genome sequence variants

High quality DNA was isolated from EDTA-stabilized blood using a classic salting-out procedure80. SNP genotyping was performed by Edinburgh Genomics, Ashworth Laboratories (Edinburgh) using the 700 K Affymetrix Axiom PigHD chip. To establish marker positions in the newest assembly of the pig genome (Sscrofa11.1; Ref.81), sequences of the probes for the Affymetrix Axiom PigHD SNPs were mapped to the new assembly using BWA82. Markers with a unique map position were retained for further analyses. Whole genome sequence (WGS) data from a reference population of 217 animals from three pig breeds (89 Duroc, 61 Landrace and 67 Yorkshire) were used to impute genotypes at the WGS variant level based on HD chip genotypes. Non-autosomal markers and indels with a position coinciding with HD chip markers were removed. To phase the haplotypes for the HD marker set and the WGS marker set, we used Eagle83 with default parameters. The WGS dataset comprised 26,581,741 bi-allelic markers on 18 autosomes. Finally, Minimac3 (Ref.84) was used to impute the HD marker set to WGS level. SNPs with a minor allele frequency below 2%, SNPs with a large deviation from Hardy–Weinberg proportions (P < 1.0E−4) and SNPs with an imputation accuracy below 0.4, as reported by Minimac3, were removed. After this, 14,763,710 markers were retained for association analyses.
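The three marker filters described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record and its field names are hypothetical, the per-marker statistics are assumed to have been computed beforehand, and markers lying exactly on a threshold are assumed to be kept.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the record layout below is hypothetical.
MIN_MAF = 0.02        # markers with minor allele frequency below 2% are removed
MIN_HWE_P = 1.0e-4    # markers with a Hardy-Weinberg test p-value below 1.0E-4 are removed
MIN_IMPUT_ACC = 0.4   # markers with imputation accuracy below 0.4 are removed


@dataclass
class Variant:
    name: str
    maf: float        # minor allele frequency
    hwe_p: float      # p-value of the Hardy-Weinberg test
    imput_acc: float  # imputation accuracy as reported by the imputation software


def passes_qc(v: Variant) -> bool:
    """Return True if the marker survives all three filters."""
    return (v.maf >= MIN_MAF
            and v.hwe_p >= MIN_HWE_P
            and v.imput_acc >= MIN_IMPUT_ACC)


def filter_variants(variants):
    """Keep only the markers that pass quality control."""
    return [v for v in variants if passes_qc(v)]
```

A marker is dropped as soon as it fails any single criterion, so the order in which the filters are applied does not affect the retained set.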

Phenotypes

Serum levels of all lipids and sterols listed in Table 1 were measured by gas chromatography-mass spectrometry (GC–MS) using a method adapted from Heuillet, et al.85 and Quehenberger, et al.86. Briefly, serum was supplemented with deuterium-labelled internal standards, and esterified sterols were saponified with 500 µl 0.5 N KOH for 20 min at 60 °C. Sterols were extracted twice with 450 µl water and 900 µl hexane and derivatized with 60 µl BSTFA:TMCS (90:10) for 1 h at 80 °C. Samples were dried and resuspended in 60 µl cyclohexane containing 1% BSTFA for GC–MS injection. Samples were analyzed using a Trace 1310-ISQ LT GC–MS instrument (Thermo Fisher Scientific). Sterols were injected at 250 °C in split mode and separated on a 30 m × 0.25 mm, 0.25 µm DB-5MS column (Agilent). Sterols were ionized using electron impact (EI) and analyzed in SIM mode.

Outliers for each of the phenotypes were removed. A phenotypic value was considered an outlier if it was either below the first quartile − 1.5 × IQR (interquartile range, the difference between the third and first quartiles) or above the third quartile + 1.5 × IQR. The following linear mixed model was used to adjust the phenotypes for the fixed effects:

$$\mathbf{y}=\mathbf{Z}\mathbf{b}+\mathbf{X}\mathbf{g}+\mathbf{e}$$

where y is the vector of phenotypes; b is the vector of fixed effects (i.e. batch, pen, sex, group) with design matrix Z; g is the vector of random polygenic effects with design matrix X, estimated using a genomic relationship matrix constructed from the markers; and e is the vector of random residuals. It was assumed that g follows a normal distribution \(N\left(0, G{\sigma }_{g}^{2}\right)\), in which G was the matrix of genomic relationships between individuals estimated using HD marker genotypes following VanRaden87, and \({\sigma }_{g}^{2}\) was the genetic variance. For the random residuals, it was assumed that \(e\sim N\left(0,I{\sigma }_{e}^{2}\right)\), where \({\sigma }_{e}^{2}\) was the residual variance and I was an identity matrix. The corrected phenotypes were calculated as estimated genetic values plus residuals \(({y}_{c}=\widehat{g}+\widehat{e})\). These corrected phenotypes were used as the response variables in the association analysis.
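The outlier rule applied before the model fit can be illustrated with a short Python sketch. The function name is hypothetical, and because the text does not state which quartile definition was used, the sketch assumes linear interpolation between order statistics (the `inclusive` method of Python's `statistics.quantiles`); other definitions can give slightly different fences.

```python
import statistics


def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: quartiles are computed with the 'inclusive' method
    (linear interpolation between order statistics).
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                    # interquartile range
    lower = q1 - k * iqr             # lower fence
    upper = q3 + k * iqr             # upper fence
    return [v for v in values if lower <= v <= upper]
```

Values lying exactly on a fence are kept here, which matches the wording "below" and "above" in the rule.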

Correlations between corrected phenotypes were calculated using the Hmisc package in R (https://www.R-project.org). The false discovery rate (FDR) was calculated from the p-values using the p.adjust function in R.
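For readers unfamiliar with FDR adjustment, the sketch below shows one common procedure, the Benjamini-Hochberg step-up adjustment. The text does not state which adjustment method was passed to p.adjust, so the choice of Benjamini-Hochberg is an assumption made for illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this is the FDR method of interest; the source text
    does not specify the method used.
    """
    m = len(pvalues)
    # Indices sorted from the largest to the smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = m - steps_from_top                 # 1-based rank in ascending order
        candidate = pvalues[idx] * m / rank       # raw BH value p * m / rank
        running_min = min(running_min, candidate) # enforce monotonicity
        adjusted[idx] = running_min
    return adjusted
```

The running minimum guarantees that a smaller raw p-value never receives a larger adjusted value than a larger raw p-value.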

Genome wide association analysis

The method for identifying possibly closely linked QTL has been described previously88. We estimated the genomic relationship matrix (GRM) using GCTA89 with the 700 K HD marker set. For each chromosome, a separate GRM was estimated from the markers on all other chromosomes (leaving the chromosome under analysis out), and this GRM was used in the GWAS of that chromosome. SimpleM90 was used to estimate the number of independent tests and to set a genome-wide significance threshold of p = 4.8E−08 (− log10P = 7.32). The threshold for suggestive significance was set to 10 times the genome-wide significance level (p = 4.8E−07). Furthermore, SNPs with − log10P > 5 in regions that had previously been associated with blood lipid levels or dyslipidemia-associated comorbidities are reported.
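The leave-one-chromosome-out construction of the GRM can be sketched as below. This is a conceptual illustration only: it uses the first estimator described by VanRaden (centred allele counts scaled by 2Σp(1−p)), whereas GCTA computes its GRM with its own estimator, so the numbers will not match GCTA output. Function and argument names are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts without missing values, and at least one retained marker must be polymorphic.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per marker.
    Uses G = Z Z' / (2 * sum_j p_j * (1 - p_j)) with Z_ij = M_ij - 2 * p_j.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    centred = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


def loco_grm(genotypes, marker_chromosomes, left_out):
    """GRM built from all markers except those on chromosome `left_out`."""
    keep = [j for j, chrom in enumerate(marker_chromosomes) if chrom != left_out]
    subset = [[ind[j] for j in keep] for ind in genotypes]
    return vanraden_grm(subset)
```

Excluding the tested chromosome keeps the candidate markers out of the polygenic term, which avoids fitting the same SNP effect twice when that chromosome is scanned.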

The GWAS was performed in several rounds. In the first round, a single-SNP GWAS analysis using GCTA89 was performed for each chromosome. All SNPs were then ranked based on their − log10P value, and the largest − log10P value within each chromosome was identified. If the − log10P value of this SNP exceeded five, and there was at least one SNP with − log10P > 4 within a 2 Mb region (1 Mb upstream and downstream of the SNP), the SNP was retained as lead-SNP. Then, for each lead-SNP, we extracted its genotype dosage, fitted it as a covariate, and scanned the whole chromosome again in a second round of the GWAS. If the second round detected another SNP that fulfilled the lead-SNP criteria, and if this SNP had been a significant (− log10P > 5) non-lead SNP in the first round, then this SNP was added to the lead-SNP list. We then extracted the allele dosage of this SNP, fitted it as an additional covariate, and scanned the chromosome in a third GWAS round. This procedure was iterated until no additional SNP remained significant.
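The lead-SNP criterion used within each round can be expressed as a small Python function. This sketch covers only the selection rule; the conditional re-scanning with lead-SNP dosages as covariates requires the mixed-model software and is not reproduced. The function name and the tuple layout are hypothetical, and the supporting SNP is assumed to be a SNP other than the candidate itself.

```python
def pick_lead_snp(snps, lead_min=5.0, support_min=4.0, flank_bp=1_000_000):
    """Return the name of the lead SNP on one chromosome, or None.

    snps: list of (name, position_bp, neg_log10_p) tuples for a single
    chromosome. The top SNP qualifies if its -log10(P) exceeds `lead_min`
    and at least one other SNP within `flank_bp` on either side has
    -log10(P) above `support_min`.
    """
    if not snps:
        return None
    name, pos, score = max(snps, key=lambda s: s[2])  # strongest signal
    if score <= lead_min:
        return None
    for other_name, other_pos, other_score in snps:
        if other_name == name:
            continue  # the candidate cannot support itself (assumption)
        if abs(other_pos - pos) <= flank_bp and other_score > support_min:
            return name
    return None
```

Requiring a supporting signal nearby guards against retaining an isolated single-SNP peak, which is more likely to reflect a genotyping or imputation artifact than a true association.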

Genomic heritability (the proportion of phenotypic variance explained by all HD markers) was estimated for each phenotype using the --reml function of GCTA89 and a GRM based on all HD markers on the autosomes. LD was calculated between each lead-SNP and all other SNPs with − log10P > 5 on the same chromosome as the lead-SNP. Pairwise LD (r2) was calculated using Plink91. LocusZoom92 was used to illustrate regional GWAS and LD results for genome-wide significant SNPs.
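As a reminder of what the pairwise LD statistic measures, the sketch below computes r2 as the squared Pearson correlation between the allele counts (or dosages) of two SNPs across individuals. This is an assumption about the genotype-based form of the statistic, not a reproduction of the software settings used in the study; the function names are hypothetical, and the 0.8 cut-off for strong LD is the one stated in the legend of Fig. 2.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two equally long dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # r2 is undefined for a monomorphic SNP
    return (sxy * sxy) / (sxx * syy)


def in_strong_ld(x, y, threshold=0.8):
    """True if the two SNPs exceed the strong-LD threshold."""
    return ld_r2(x, y) > threshold
```

Because r2 is a squared correlation, it does not depend on which allele is counted at either SNP.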

Identification of candidate genes and comparative analysis with previous studies

For each identified association, a 1 Mb region of the porcine genome (Sscrofa11.1) centered on the associated SNP was analyzed. Even though strong LD in many cases extended considerably less than 0.5 Mb on each side of the associated SNP, a 1 Mb region was analyzed for all associated positions. All protein-coding as well as non-coding genes in the analyzed regions were catalogued. Previous knowledge about each gene was ascertained by searches in the PubMed database, the Entrez-Gene database and the database at www.genecards.org. Searches were performed with search terms combining each gene name with other relevant terms, such as the specific endophenotype or end-point phenotype associated with the locus. For each gene, searches were also performed with a combination of the gene name and the general search terms “sterols”, “cholesterol” and “blood lipids”.
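Cataloguing the genes in the 1 Mb window amounts to an interval-overlap query against a gene annotation. The sketch below assumes, for illustration, that a gene counts as being in the window if any part of it overlaps the window; the function name, the tuple layout and the gene labels used in the usage note are hypothetical.

```python
def genes_in_window(snp_pos, genes, window_bp=1_000_000):
    """Names of genes overlapping a window centred on `snp_pos`.

    genes: list of (name, start_bp, end_bp) tuples on the same chromosome
    as the SNP. Partial overlap with the window is sufficient (assumption).
    """
    half = window_bp // 2
    win_start = snp_pos - half
    win_end = snp_pos + half
    return [
        name
        for name, start, end in genes
        if start <= win_end and end >= win_start  # standard interval overlap
    ]
```

For a SNP at 1,000,000 bp, a hypothetical gene spanning 1,450,000 to 1,550,000 bp would be included because it straddles the window boundary at 1,500,000 bp, whereas one starting at 1,600,000 bp would not.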