Understanding the genetic architecture of the plasma lipidome could provide better insights into lipid metabolism and its link to cardiovascular diseases (CVDs). Here, we perform genome-wide association analyses of 141 lipid species (n = 2,181 individuals), followed by phenome-wide scans with 25 CVD-related phenotypes (n = 511,700 individuals). We identify 35 lipid-species-associated loci (P < 5 × 10−8), 10 of which associate with CVD risk, including five new loci: COL5A1, GLTPD2, SPTLC3, MBOAT7 and GALNT16 (false discovery rate < 0.05). We identify loci for lipid species that have been shown to predict CVD, e.g., SPTLC3 for CER(d18:1/24:1). We show that lipoprotein lipase (LPL) may hydrolyze medium-length triacylglycerides (TAGs) more efficiently than others. Polyunsaturated lipids have the highest heritability and genetic correlations, suggesting considerable genetic regulation at the fatty acid level. We find low genetic correlations between traditional lipids and lipid species. Our results show that lipidomic profiles capture information beyond traditional lipids and identify genetic variants modifying lipid levels and risk of CVD.
Cardiovascular diseases (CVDs) encompass many pathological conditions of impaired heart function, vascular structure and circulatory system. CVDs are the leading cause of mortality and morbidity worldwide1, necessitating better preventive and predictive strategies. Plasma lipids, the well-established heritable risk factors for CVDs2, are routinely monitored to assess CVD risk. Standard lipid profiling measures traditional lipids (here referring to LDL-C, HDL-C, total triglycerides and total cholesterol), but does not capture the functionally and chemically diverse molecular components, the lipid species3. These molecular lipid species may independently and specifically affect different manifestations of CVD, such as ischaemic heart disease and stroke. Lipid species including cholesterol esters (CEs), lysophosphatidylcholines (LPCs), phosphatidylcholines (PCs), phosphatidylethanolamines (PEs), ceramides (CERs), sphingomyelins (SMs) and triacylglycerols (TAGs) potentially improve CVD risk assessment over traditional lipids4,5,6,7,8,9.
Understanding of the genetic architecture and genetic regulation of these lipid species could help guide tool development for CVD risk prediction and treatment. Genetic studies of traditional lipids have identified over 250 genomic loci and improved our understanding of CVD pathophysiology10,11. For the majority of the lipid loci, however, their effects on detailed lipidome beyond traditional lipids are unknown. Only a few studies have reported genetic associations for lipid species either through studies on subsets of the lipidome12,13 or GWASs on metabolome14,15,16,17,18,19,20.
In light of the limited information about the genetics of lipidomic profiles and their relationship with CVDs, we carried out a GWAS of lipidomic profiles of 2181 individuals using ~9.3 million genetic markers, followed by a PheWAS including 25 CVD-related phenotypes in up to 511,700 individuals (Fig. 1). We aimed to (1) determine the heritability of lipid species and their genetic correlations; (2) identify genetic variants influencing the plasma levels of lipid species; (3) test the relationship between identified lipid–species-associated variants and CVD manifestations and (4) gain mechanistic insights into established lipid variants. We find that lipid species are heritable, suggesting a considerable role of endogenous regulation in lipid metabolism. We report associations of new genomic loci with lipid species and CVD risk in humans. In addition to enhancing the current understanding of genetic regulation of circulating lipids, our study emphasises the need for lipidomic profiling in identifying additional variants influencing lipid metabolism.
Heritability of lipid species
First, we determined SNP-based heritability for each of the lipid species and traditional lipids using genetic relationship matrix for all the study participants. The demographic characteristics of the study participants are provided in Supplementary Table 1. SNP-based heritability estimates ranged from 0.10 to 0.54 (Fig. 2a; Supplementary Table 2), showing considerable variation across lipid classes (Fig. 2b), with similar trends as reported previously21,22. CERs showed the greatest estimated heritability (median = 0.39, range = 0.35–0.40), whereas phosphatidylinositols (PIs) showed the least heritability (median = 0.19, range = 0.11–0.31). Sphingolipids had higher heritability than glycerolipids ranging from 0.24 to 0.41 (Fig. 2b), which is similar to a previous study that reported higher heritability for sphingolipids ranging from 0.28 to 0.53 estimated based on pedigrees21. Lipids containing polyunsaturated fatty acids, particularly C20:4, C20:5 and C22:6, had significantly higher heritability compared with other lipid species (Fig. 2c). For instance, PC (17:0;0–20:4;0) and LPC (22:6;0) had the highest heritability (> 0.50), whereas PC (16:0;0–16:1;0) and PI (16:0;0–18:2;0) had the lowest heritability estimates (< 0.12) (Supplementary Table 2).
Genetic correlations between lipid species
Longer, polyunsaturated lipids (those with four or more double bonds) had stronger genetic correlations with each other than with other lipid species (Supplementary Fig. 1, Supplementary Data 1). This can be seen in the hierarchical clustering based on genetic correlations that segregate TAG subspecies into two clusters based on carbon content and degree of unsaturation (Fig. 2d). These patterns were not seen in phenotypic correlations that were estimated based on the plasma levels of lipid species (Supplementary Fig. 2).
We observed low phenotypic and genetic correlations between traditional lipids and molecular lipid species, except for strong positive genetic correlations of triglycerides with TAGs and DAGs (average r = 0.88) (Fig. 3). However, triglycerides had low genetic correlation with other lipid species (average (abs) r = 0.26). HDL-C and LDL-C levels had low genetic and phenotypic correlations with most of the lipid species (Fig. 3; Supplementary Data 1). Consistently, all of the known lipid variants explained 2–21% of the variance in plasma levels of various lipid species, with the least variance explained for LPCs (Fig. 3). To rule out the possibility that lipid-lowering medications caused the observed low genetic correlations between traditional lipids and lipid species, we also calculated the genetic correlations after excluding the individuals using lipid-lowering medications (N = 172). This re-analysis gave results similar to those of the primary analysis (Supplementary Fig. 3). It should be noted that this sample size might not provide sufficient power for heritability estimation in unrelated samples. However, our study also included family samples, which provide higher statistical power in heritability estimation than unrelated samples.
Lipid species associated variants
Next, we performed the genome-wide association analyses for 141 lipid species with ~9.3 million genetic markers. We identified 2817 associations between 518 variants located within 11 genomic loci (1 Mb blocks) and 42 lipid species from 10 lipid classes at study-wide significance (P < 1.5 × 10−9, accounting for 34 principal components that explain 90% of the variance in the lipidome) (Table 1; Supplementary Data 2, 3). These included three new loci (ROCK1, MAF and SYT1) that have not previously been reported for any lipid measure or related metabolite (Fig. 4). Among the new loci, the strongest association was at an intronic variant rs151223356 near ROCK1 with short acyl-chain LPC (14:0;0) (P = 1.9 × 10−10). ROCK1 encodes a serine/threonine kinase that plays a key role in glucose metabolism23. In line with our observation of higher heritability for lipids with C20:4, C20:5 or C22:6 acyl chains, we detected associations for 15 out of 21 lipids with these acyl chains.
We also replicated the previous associations of FADS2, SYNE2, LIPC, CERS4 and MBOAT7 with the same lipid species13,14,15,16,17,18,19,20. The previously reported associations at the known loci identified in previous metabolomics GWASs are provided in Supplementary Data 4. This information was obtained from the databases SNiPA (http://snipa.org), using block annotation, and PhenoScanner v2 (http://www.phenoscanner.medschl.cam.ac.uk/), and was manually curated to include associations from a literature search. In addition, we identified new locus–lipid species associations at previously reported lipid loci, including new associations of variants at ABCG5/8 with CE (20:2;0) (P = 3.9 × 10−10), MBOAT7 with PI (18:0;0–20:3;0) (P = 3.0 × 10−12) and GLTPD2 with SM (34:0;2) (P = 3.4 × 10−22) (Supplementary Data 2, 3).
Further, we systematically evaluated the associations of variants previously identified in metabolomics GWAS (126 variants from 46 loci available in our data set out of 132 reported) with 141 lipid species. Of these known variants, 76 variants from 12 loci showed association with 98 different lipid species with P < 3.2 × 10−5 (correcting for 46 loci and 34 PCs for lipid species) (Supplementary Data 5). Of the 134 previously reported variant–lipid species pair associations that could be examined in our data set, 94 of such associations were replicated with the same direction of effect with P < 3.7 × 10−4 (accounting for 134 comparisons) in our study (Supplementary Data 6).
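The significance thresholds quoted in this section are consistent with simple Bonferroni-style division by the number of effective tests. The short sketch below reproduces that arithmetic. The base levels (5 × 10−8 for the genome-wide scan and 0.05 for the look-ups) are assumptions inferred from the reported thresholds, not values stated in the text, and the function name is illustrative.

```python
# Reproduce the multiple-testing thresholds quoted in the text.
# Assumption: each threshold is a base alpha divided by the number of tests.

N_PCS = 34          # principal components explaining 90% of lipidome variance
GWAS_ALPHA = 5e-8   # assumed base level: conventional genome-wide significance
LOOKUP_ALPHA = 0.05 # assumed base level for look-ups of known variants


def bonferroni(alpha: float, n_tests: int) -> float:
    """Return the per-test threshold for a family-wise error rate alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Study-wide threshold for the discovery GWAS (reported as 1.5e-9).
study_wide = bonferroni(GWAS_ALPHA, N_PCS)
# Known-variant look-up: 46 loci times 34 PCs (reported as 3.2e-5).
known_loci = bonferroni(LOOKUP_ALPHA, 46 * N_PCS)
# Replication of 134 variant-lipid pairs (reported as 3.7e-4).
pair_replication = bonferroni(LOOKUP_ALPHA, 134)

for label, value in [
    ("study-wide", study_wide),
    ("known loci", known_loci),
    ("pair replication", pair_replication),
]:
    print(f"{label}: {value:.1e}")
```

Dividing by the number of principal components rather than by all 141 lipid species reflects the strong correlation among lipid species, which makes 141 independent tests an overly conservative count.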
In addition, 24 further loci were associated with at least one lipid species at the commonly used genome-wide significance level (1.5 × 10−9 < P < 5.0 × 10−8). Among these additional loci, 13 loci were located in genomic regions not previously reported for any lipid measure or related metabolite, and 8 loci were located near known loci for lipids but were independent of any previously reported variant (Table 1; Supplementary Data 3). The regional association plots for all 35 loci with P < 5.0 × 10−8 are presented in Supplementary Data 7, and the genotype–phenotype relationships for the lead variants in these 35 loci are provided in Supplementary Fig. 4.
Relationship between identified variants and risk of CVD
As many of the lipid species have previously been shown to predict CVD risk, we determined whether the variants associated with lipid species affect individuals’ susceptibility to CVD-related phenotypes in the FinnGen and UK Biobank cohorts. We identified 25 CVD-related phenotypes from the clinical outcomes derived from health registry data in FinnGen and UK Biobank (Supplementary Table 3). The follow-up PheWAS analyses included lead variants from all of the 35 independent loci that showed associations with P < 5.0 × 10−8 (Table 1). Overall, 10 of the 35 lipid–species variants (APOA5, ABCG5/8, BLK, LPL, FADS2, COL5A1, GALNT16, GLTPD2, MBOAT7 and SPTLC3) were associated with at least one of the CVD outcomes (FDR < 5%) (Fig. 5; Supplementary Data 8). These included novel associations of variants at COL5A1 with cerebrovascular disease (P = 4.6 × 10−4), GALNT16 with angina (P = 9.3 × 10−4), MBOAT7 with venous thromboembolism (P = 1.3 × 10−3), GLTPD2 with atherosclerosis (P = 5.3 × 10−4) and SPTLC3 with intracerebral haemorrhage (P = 1.0 × 10−3) (Fig. 5). FADS1-2-3 is a well-known lipid-modifying locus; however, like many other known lipid loci, its effects on CVD risk have been unclear. We found an association of FADS2 rs28456-G with peripheral artery disease (P = 2.2 × 10−4) and arterial embolism and thrombosis (P = 2.5 × 10−4). BLK (rs1478898-A) was also found to be associated with decreased risk of obesity (OR = 0.97, P = 5.6 × 10−8) and type 2 diabetes (OR = 0.96, P = 4.5 × 10−5).
Several studies have suggested a role for sphingolipids, including CERs and SMs, in the pathogenesis of CVDs. CER (d18:1/24:0) and CER (d18:1/24:1) have been reported to be associated with the increased risk of CVD events9. We found that the CER (d18:1/24:1) decreasing variant SPTLC3 rs364585-G was associated with decreased risk of intracerebral haemorrhage, while CER (d18:1/24:0) increasing variant ZNF385D rs13070110-C was nominally associated with increased risk of intracerebral haemorrhage. Furthermore, consistent with the observation that elevated plasma SMs levels are atherogenic24, we identified association of GLTPD2 rs79202680-T (associated with reduced levels of SMs) with reduced risk of atherosclerosis.
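The FDR < 5% criterion applied to the PheWAS results in this section is not tied to a named procedure in the text. A common choice is the Benjamini-Hochberg step-up method, which is assumed here purely for illustration. The sketch below computes adjusted P values for a small set of hypothetical inputs; none of the numbers are results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five variant-phenotype tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(example_p)
significant = [p for p, q in zip(example_p, q_values) if q < 0.05]
print(q_values)
print(significant)
```

In this toy input only the two smallest P values pass the 5% FDR cut-off, even though four of the five are nominally below 0.05.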
Mechanistic insights into lipid variants
Next, we determined whether the detailed lipidomic profiles could provide new mechanistic insights into the role of known lipid variants in lipid biology. We present two examples of well-established lipid variants here. The first is the fatty acid desaturase (FADS) gene cluster, which has been consistently reported to be associated with omega-3 and omega-6 fatty acid levels, with inverse effects on different polyunsaturated fatty acids (PUFAs). Its mechanism, however, has not been fully deciphered. Here, we found that FADS2 rs28456-G was associated with increased levels of lipids with a C20:3 acyl chain and decreased levels of lipids with C20:4, C20:5 and C22:6 acyl chains (Supplementary Fig. 5). The rs28456-G allele is also an eQTL that increases FADS2 expression while reducing the expression of FADS1 [GTEx v7]. Together, these data explain the inverse relationship of FADS2 variants with lipids containing different PUFAs (Fig. 6).
Another example is lipoprotein lipase (LPL). LPL codes for lipoprotein lipase that is the master lipolytic factor of TAGs in TAG-enriched chylomicrons and VLDL particles. We found that LPL rs11570891-T was associated with reduced levels of medium length TAGs (C50–C56), with strongest associations with TAG (52:3;0). This suggested that LPL enzyme might have different efficiency in hydrolysis of TAGs of different length. We explored this possibility by evaluating (1) the effect of LPL rs11570891-T on LPL enzymatic activity and (2) the relationship between LPL activity and plasma levels of TAGs of different length, using post-heparin LPL measured in the EUFAM cohort. We found that LPL rs11570891-T (an eQTL increasing LPL expression) was associated with increased LPL activity, which in turn was associated with TAG species with stronger effect on medium length TAGs than other TAGs (Fig. 6). Consistent with a previous report by Rhee et al.16, variant rs964184-C at APOA5, which codes for the activator that stimulates LPL-mediated lipolysis of TAG-rich lipoproteins and their remnants, also showed association with medium length TAGs (Fig. 6). These results provide first clues to the probable variable role of LPL and APOA5 in the hydrolysis of different TAG species.
Similarly, the association patterns of some of the newly mapped loci suggested their underlying functions. For example, SYNGR1 rs186680008-C showed strongest associations with decreased levels of lipid species with C20:3 acyl chain from different lipid classes, including CEs, PCs and PCOs (Supplementary Fig. 5), suggesting its role in PUFA metabolism (Fig. 6). PTPRN2 rs10281741-G and MIR100HG rs10790495-G showed associations with reduced levels of long polyunsaturated TAG species, suggesting their role in negative regulation of either elongation and desaturation of fatty acids or incorporation of long-chain unsaturated fatty acids during TAG biosynthesis.
Lipidomics provide higher statistical power
As intermediate phenotypes are known to provide more statistical power, we assessed whether the lipid species could help to detect genetic associations with greater power than traditional lipids, using variants previously identified for traditional lipids (number of variants = 557; Supplementary Data 9). We found that molecular lipid species have much stronger associations than traditional lipids with the same sample size, except for the well-known APOE and CETP loci (Fig. 7; Supplementary Data 10). The associations were several orders of magnitude stronger for the variants in or near genes involved in lipid metabolism, such as FADS1-2-3, LIPC, ABCG5/8, SGPP1 and SPTLC3. This shows that lipidomics provides a better chance of identifying lipid-modulating variants, particularly those with a direct role in lipid metabolism, with a much smaller sample size than traditional lipids.
We present findings from a large-scale study that integrates lipidome, genome and phenome, revealing a detailed description of the genetic regulation of the lipidome and its associations with CVD risk. In addition to enhancing the current understanding of genetic determinants of circulating lipids, our study highlights the potential of lipidomics in gene mapping for lipids and CVDs over traditional lipids. The study generates a publicly available knowledgebase of genetic associations of molecular lipid species and their relationships with thousands of clinical outcomes.
Despite the expected influence of dietary intake on the circulatory lipids, plasma levels of lipid species are found to be heritable, suggesting considerable role of endogenous regulation in lipid metabolism. Importantly, genetic mechanisms do not seem to regulate all lipid species in a lipid class in the same way, as also observed in recent mice lipidomics studies25,26. Longer and more unsaturated lipid species from different lipid classes clearly display stronger genetic correlations. These observations are consistent with a previous study based on family pedigrees21. Our finding is important in the light of the proposed role of lipids containing PUFAs in CVDs, diabetes and other disorders27,28,29. Identification of genetic factors regulating these particular lipids is important for understanding the subtleties of lipid metabolism and devising preventive strategies including dietary interventions. Our study provides multiple leads in this direction by identifying 11 genomic loci (KLHL17, APOA5, CD33, SHTN1, FADS2, LIPC, MBOAT7, MIR100HG, PTPRN2, PDHA2 and TMEM86B) associated with long, polyunsaturated lipids at genome-wide significance. Of these, FADS2, APOA5, LPL and MBOAT7 variants were also associated with risk of CVDs (Fig. 5).
Further, we mapped genetic variants for lipid species from several lipid classes, including CERs, CEs, TAGs, SMs and PCs, that have been shown to predict CVD risk4,5,6,7,8,9. Our PheWAS analyses also suggested a relationship between many of the mapped genetic variants and CVD outcomes. This knowledge can directly fuel studies on CVD prediction or drug target discovery. For instance, CERs and CEs have also been reported to associate with increased risk of CVD events5,6,7,8,9. Our study revealed three loci associated with CEs, including FADS2 and two novel loci, ABCG5/8 and SYNGR1, and two loci for CERs (SPTLC3 and ZNF385D). CER species, particularly CER (d18:1/24:0) and CER (d18:1/24:1), were recently reported to be associated with increased risk of CVD9. We identified two variants near SPTLC3 and ZNF385D that modulate the plasma levels of CER (d18:1/24:1) and CER (d18:1/24:0), respectively, and risk for intracerebral haemorrhage. This information could also guide future studies to establish the causal relationship between lipid species and CVD.
The detailed lipidomic profile also provided clues towards understanding the mechanisms by which well-established lipid loci like FADS2 and LPL affect lipid metabolism and CVD risk. We show how the inverse effects of FADS2 rs28456-G on the expression of two desaturases (FADS2 and FADS1) could explain its opposite effects on lipids with different PUFAs. The delta-6 desaturation by FADS2 generates gamma-linolenic acid and stearidonic acid, which by elongation yield dihomo-gamma-linolenic acid and eicosatetraenoic acid (Fig. 6)30. Further, delta-5 desaturation of dihomo-gamma-linolenic acid by FADS1 generates arachidonic acid and eicosapentaenoic acid. Thus, as depicted in Fig. 6, the inverse effects of FADS2 rs28456-G on FADS2 and FADS1 expression explain its opposite effects on different PUFAs. The association of FADS2 rs28456-G with reduced levels of lipids containing arachidonic acid may also explain its association with reduced risk of atherosclerotic CVD outcomes: peripheral artery disease (PAD) and arterial embolism and thrombosis.
LPL and APOA5 are the key players in TAG hydrolysis. Our integrated approach suggested that their activity could be different for different TAG species with higher efficiency for medium length TAGs (C50–C56). We show that an LPL variant increases the LPL activity resulting in decreased levels of medium length TAGs. The association of the LPL variant with reduced susceptibility to CVD and type 2 diabetes could be mediated through the decrease in medium length TAGs (Fig. 5). This is consistent with a previous report that showed a similar pattern of association of levels of TAG species with type 2 diabetes31.
Similarly, the association patterns of newly mapped loci also suggested their involvement in the regulation of lipid metabolism. For example, rs10281741-G near PTPRN2 and rs10790495-G near MIR100HG showed distinct association patterns with TAGs, with the strongest associations with long polyunsaturated TAGs. PTPRN2 codes for protein tyrosine phosphatase receptor N2, with a possible role in pancreatic insulin secretion and development of diabetes mellitus32, while MIR100HG rs10790495 is an eQTL for the heat-shock protein HSPA8, which has a role in cell proliferation33. However, it is not known whether PTPRN2 and MIR100HG or HSPA8 have any role in lipid metabolism.
Finally, we show that lipidomic profiles capture information beyond traditional lipids and provide an opportunity to identify additional genetic variants influencing lipid metabolism and disease risk. Previously, Petersen et al. showed that lipoprotein subfractions correlate with traditional lipids and strengthen genetic associations at known lipid loci, and that these loci explain more of the variance of lipoprotein subfractions than of serum lipids34. Similarly, our study demonstrates that molecular lipid species provide greater statistical power than traditional lipids at known lipid loci using the same sample size. However, in contrast to Petersen et al., we found that many of the lipid species, including LPCs and PCs that have previously been associated with incident coronary heart disease risk4,5,6, have low phenotypic and genetic correlations with traditional lipids. We also show that the known lipid variants for traditional lipids explain less of the variance of lipid species than of traditional lipids. Altogether, as expected, these results suggest that lipidomic profiles could provide novel information that could not be captured by traditional lipids and lipoprotein measurements.
Our study had some potential limitations. Though our study represents one of the largest genetic screens of lipidomic variation, larger cohorts are needed to understand it fully. Blood samples for the EUFAM cohort were drawn after an overnight fast, whereas the FINRISK cohort samples had varied fasting duration. This, however, does not seem to have a substantial effect on the results and their interpretation, as shown in Supplementary Data 11 and Supplementary Fig. 6. Moreover, a recent study by Rämö et al. also demonstrated similar lipidomic profiles for dyslipidemias from the EUFAM and FINRISK cohorts35. The UK Biobank cohort is reported to have a “healthy volunteer” effect36, which may affect the PheWAS results; however, given the large sample size, this is unlikely to have a substantial effect on genetic association analyses. Furthermore, lipidomic profiles were measured in whole plasma, which does not provide information at the level of individual lipoprotein subclasses and limits our ability to gain detailed mechanistic insights. We also excluded poorly detected lipid species to ensure high data quality, which narrowed the spectrum of lipidomic profiles. Further advances in lipidomics platforms might help to capture more comprehensive and complete lipidomic profiles, including the position of fatty acyl chains in the glycerol backbone of TAGs and glycerophospholipids and detection of sphingosine-1-P species and several other species, which would help to overcome these limitations.
In conclusion, our study demonstrates that lipidomics enables deeper insights into the genetic regulation of lipid metabolism than clinically used lipid measures, which in turn might help guide future biomarker and drug target discovery and disease prevention.
Subjects and clinical measurements
The study included participants from the following cohorts: EUFAM, FINRISK, FinnGen and UK Biobank. The EUFAM (The European Multicenter Study on Familial Dyslipidemias in Patients with Premature Coronary Heart Disease) study cohort is comprised of the Finnish familial combined hyperlipidemia families37. The families in EUFAM study were identified via probands admitted to Finnish university hospitals with a diagnosis of premature coronary heart disease. The probands had premature coronary heart disease and high levels of the total cholesterol, triglycerides, or both (≥ 90th Finnish age-specific and sex-specific population percentile), or low HDL-C levels (≤ 10th percentile). Invitation was extended to all the family members and spouses of the probands if at least one first-degree relative of the proband had high levels of the total cholesterol, triglycerides, or both. Venous blood samples were obtained from all participants after overnight fasting. Triglycerides and total cholesterol were measured by enzymatic methods using an automated Cobas Mira analyser (Hoffman-La Roche, Basel, Switzerland)37,38. HDL-C was quantified by phosphotungstic acid/magnesium chloride precipitation procedures, and LDL-C was calculated using the Friedewald formula39.
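For readers unfamiliar with the Friedewald calculation mentioned above, the following minimal sketch shows the standard form of the formula. The units used in the cohort are not stated in this section, so mmol/L is assumed as the default here; the function name, the handling of high triglycerides and the example values are illustrative rather than taken from the study.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, unit="mmol/L"):
    """Estimate LDL-C as TC - HDL-C - TG/k (k = 2.2 in mmol/L, 5 in mg/dL).

    Returns None when triglycerides exceed the range in which the
    formula is conventionally considered valid.
    """
    if unit == "mmol/L":
        divisor, tg_limit = 2.2, 4.5
    elif unit == "mg/dL":
        divisor, tg_limit = 5.0, 400.0
    else:
        raise ValueError("unit must be 'mmol/L' or 'mg/dL'")
    if triglycerides > tg_limit:
        return None  # formula not applicable at high triglyceride levels
    return total_chol - hdl - triglycerides / divisor


# Hypothetical fasting values, not data from the cohorts.
ldl_mmol = friedewald_ldl(5.2, 1.3, 1.1)                 # about 3.4 mmol/L
ldl_mgdl = friedewald_ldl(200.0, 50.0, 100.0, "mg/dL")   # 130 mg/dL
ldl_high_tg = friedewald_ldl(6.0, 1.0, 5.0)              # None: TG too high
print(ldl_mmol, ldl_mgdl, ldl_high_tg)
```

Because the formula estimates VLDL cholesterol from fasting triglycerides, it is normally applied to fasting samples, which matches the overnight fasting described for EUFAM.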
The Finnish National FINRISK study is a population-based survey conducted every 5 years since 1972, and thus far samples have been collected in 1992, 1997, 2002, 2007 and 201240. Collections from the 1992, 1997, 2002, 2007 and 2012 surveys are stored in the National Institute for Health and Welfare (THL) Biobank. Lipidomic profiling was performed for 1142 participants who were randomly selected from the FINRISK 2012 survey (Supplementary Table 1). The participants were advised to fast for at least 4 h before the examination and to avoid heavy meals earlier during the day. Venous blood samples were obtained from all the participants and sera were separated. HDL-C, triglycerides and total cholesterol were measured with enzymatic methods (Abbott Laboratories, Abbott Park, IL, USA) on an Abbott Architect c8000 clinical chemistry analyser40.
The FinnGen data release 2 is composed of 102,739 Finnish participants. The phenotypes were derived from ICD codes in Finnish national hospital registries and cause-of-death registry as a part of FinnGen project. The quality of the CVD diagnoses in these registers has been validated in previous studies41,42,43,44,45. The UK Biobank data is comprised of >500,000 participants based in UK and aged 40–69 years, annotated for over 2000 phenotypes46. The PheWAS analyses in this study included 408,961 samples from white British participants.
The study was conducted in accordance with the principles of the Helsinki declaration. Written informed consent was obtained from all the study participants. The study protocols were approved by the ethics committees of the participating centres (The Hospital District of Helsinki and Uusimaa Coordinating Ethics committees, approval No. 184/13/03/00/12). For the Finnish Institute of Health and Welfare (THL) driven FinnGen preparatory project (here called FinnGen), all patients and control subjects had provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, older cohorts were based on study specific consents and later transferred to the THL Biobank after approval by Valvira, the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Valvira. The Ethical Review Board of the Hospital District of Helsinki and Uusimaa approved the FinnGen study protocol Nr HUS/990/2017. The FinnGen preparatory project is approved by THL, approval numbers THL/2031/6.02.00/2017, amendments THL/341/6.02.00/2018, THL/2222/6.02.00/2018 and THL/283/6.02.00/2019. All DNA samples and data in this study were pseudonymized.
Mass spectrometry-based lipid analysis of 2181 participants was performed at Lipotype GmbH (Dresden, Germany) in three batches: 353 and 686 EUFAM participants in two batches and 1142 FINRISK participants in the third batch. Samples were analysed by direct infusion in a QExactive mass spectrometer (Thermo Scientific) equipped with a TriVersa NanoMate ion source (Advion Biosciences)47. The data were analysed using in-house developed lipid identification software based on LipidXplorer48,49. Post-processing and normalisation of data were performed using an in-house developed data management system. Only lipids with a signal-to-noise ratio >5 and amounts at least fivefold higher than in the corresponding blank samples were considered for further analyses. Reproducibility of the assay was assessed by the inclusion of reference plasma samples (eight reference samples for EUFAM and three reference samples for FINRISK) per 96-well plate. The median coefficient of variation was <10% across all batches. The data were corrected for batch and drift effects. Lipid species detected in <80% of the samples in any of the batches, and samples with low lipid content (N = 64), were excluded. Among the lipid species that passed quality control, a total of 141 lipid species from 13 lipid classes (Supplementary Table 2) were detected consistently in all three batches and were included in all analyses. The total amounts of lipid classes were calculated by summing the absolute concentrations of all lipid species belonging to each lipid class. The measured concentrations of the lipid species and the calculated class totals were transformed to a normal distribution by rank-based inverse normal transformation.
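The rank-based inverse normal transformation mentioned above can be sketched as follows. The exact variant used in the study is not specified, so the Blom offset (c = 3/8) and average ranks for ties are assumptions, and the function name and example concentrations are hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal scores via their ranks.

    Each value with (average) rank r among n observations is replaced by
    the standard normal quantile of (r - c) / (n - 2c + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block to cover all observations tied with this value.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # average 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Hypothetical, right-skewed concentrations with one tie.
concentrations = [3.2, 0.4, 10.5, 1.1, 1.1]
scores = rank_inverse_normal(concentrations)
print([round(s, 3) for s in scores])
```

The transformation keeps the ordering of the measurements but removes skew and outlier leverage, which is why it is commonly applied before linear-model-based association testing.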
It should be noted that the Lipotype platform used in the study detected many additional lipid species (N = 83) that were not captured previously by other platforms. The list of lipid species detected by different platforms and the overlaps across platforms are provided in Supplementary Data 12 and Supplementary Fig. 7.
Genotyping and imputation
Genotyping for both EUFAM and FINRISK cohorts was performed using the HumanCoreExome BeadChip (Illumina Inc., San Diego, CA, USA). The genotype calls were generated together with other available data sets using zCall at the Institute for Molecular Medicine Finland (FIMM). Genotype data underwent stringent quality control (QC) before imputation that included exclusion of samples with low call rate (<95%), sex discrepancies, excess heterozygosity and non-European ancestry. Variants with low call rate (<95%) and deviation from Hardy–Weinberg Equilibrium (HWE P < 1 × 10−6) were excluded. Imputation was performed using IMPUTE250, which used two population-specific reference panels of 2690 high-coverage whole-genome and 5093 high-coverage whole-exome sequence data. Variants with imputation info score <0.70 were filtered out. After QC on lipidomic profiles and imputed variants, all subsequent analyses included 2045 individuals and ~9.3 million variants with MAF >0.005 that were available in both cohorts.
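The variant-level filters described above can be summarised in a few lines of code. This is a sketch of the stated thresholds only (call rate, HWE P value, imputation info score and MAF); the record layout, field names and example variants are hypothetical, and the sample-level checks (sex discrepancies, heterozygosity, ancestry) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary used only for this illustration."""
    variant_id: str
    call_rate: float  # fraction of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    info: float       # imputation info score
    maf: float        # minor allele frequency


def passes_pre_imputation_qc(v: Variant) -> bool:
    # Exclude call rate < 95% and HWE P < 1e-6.
    return v.call_rate >= 0.95 and v.hwe_p >= 1e-6


def passes_post_imputation_qc(v: Variant) -> bool:
    # Exclude info score < 0.70; keep only MAF > 0.005.
    return v.info >= 0.70 and v.maf > 0.005


variants = [
    Variant("var_ok", 0.99, 0.40, 0.95, 0.120),
    Variant("var_low_call", 0.90, 0.40, 0.95, 0.120),
    Variant("var_hwe", 0.99, 1e-9, 0.95, 0.120),
    Variant("var_low_info", 0.99, 0.40, 0.55, 0.120),
    Variant("var_rare", 0.99, 0.40, 0.95, 0.001),
]
kept = [
    v.variant_id
    for v in variants
    if passes_pre_imputation_qc(v) and passes_post_imputation_qc(v)
]
print(kept)
```

In practice the two stages apply to different variant sets (genotyped variants before imputation, imputed variants afterwards); they are combined on one record here only to keep the example short.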
FinnGen samples were genotyped with Illumina and Affymetrix arrays (Thermo Fisher Scientific, Santa Clara, CA, USA). Genotype calls were made with the GenCall and zCall algorithms for Illumina and the AxiomGT1 algorithm for Affymetrix chip genotyping data. Genotyping data produced with previous chip platforms were lifted over to build version 38 (GRCh38/hg38) following the protocol described here: dx.doi.org/10.17504/protocols.io.nqtddwn. Samples with sex discrepancies, high genotype missingness (> 5%), excess heterozygosity (±4 SD) and non-Finnish ancestry were removed. Variants with high missingness (> 2%), deviation from HWE (P < 1 × 10−6) and low minor allele count (MAC < 3) were removed. Pre-phasing of genotyped data was performed with Eagle 2.3.5 (https://data.broadinstitute.org/alkesgroup/Eagle/) with the default parameters, except that the number of conditioning haplotypes was set to 20,000. Imputation was carried out using the population-specific SISu v3 imputation reference panel with Beagle 4.1 (version 08Jun17.d8b, https://faculty.washington.edu/browning/beagle/b4_1.html) as described in the following protocol: [dx.doi.org/10.17504/protocols.io.nmndc5e]. The SISu v3 imputation reference panel was developed using high-coverage (25–30x) whole-genome sequencing data generated at the Broad Institute of MIT and Harvard and at the McDonnell Genome Institute at Washington University, and jointly processed at the Broad Institute. The variant callset was produced with the GATK HaplotypeCaller algorithm by following GATK best practices for variant calling. Genotype-, sample- and variant-wise QC was applied in an iterative manner using the Hail framework v0.1 [https://github.com/hail-is/hail]. The resulting high-quality WGS data for 3775 individuals were phased with Eagle 2.3.5 as described above. Post-imputation quality control involved excluding variants with INFO score < 0.7.
Genotyping for the majority of the UK Biobank participants was done using the Affymetrix UK Biobank Axiom Array, while a subset of participants was genotyped using the Affymetrix UK BiLEVE Axiom Array. Details about the quality control and imputation of UK Biobank cohort are described by Bycroft et al.51.
Heritability estimates and genetic correlations
For heritability and genetic correlation estimation, rank-based inverse-transformed measures of lipid species, computed separately for the EUFAM and FINRISK cohorts, were combined to increase statistical power. The residuals of inverse-transformed measures after regressing for age, sex, first ten principal components (PCs) of genetic population structure, lipid medication, hormone replacement therapy, thyroid condition and type 2 diabetes were used as phenotypes. SNP-based heritability estimates were calculated by variance component analysis with a genetic relationship matrix (GRM) as implemented in biMM52. Only good-quality variants with missingness <10% and MAF >0.005 were used to generate the GRM. The GRM was generated using GCTA by setting the off-diagonal elements that are <0.05 to 0 as proposed by Zaitlen et al.53. This allows SNP-based heritability to be estimated in family data without removing closely related individuals. The heritability estimates of lipid species in different groups were compared using the Wilcoxon rank-sum test.
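Two of the steps above lend themselves to a short sketch: the rank-based inverse normal transformation and the zeroing of small off-diagonal GRM elements. Both functions are illustrative only. The rank offset of 0.5 and the average-rank handling of ties are assumptions (the text does not specify them), the covariate regression step is omitted, and the actual GRM was built with GCTA rather than with code like this.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard normal quantiles via their ranks.

    Ties receive the average rank; the (rank - offset) / n form is an
    assumed convention, not one stated in the text.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / n) for r in ranks]


def threshold_grm(grm, cutoff=0.05):
    """Set every off-diagonal GRM entry below the cutoff to 0, keeping the diagonal."""
    return [
        [g if (i == j or g >= cutoff) else 0.0 for j, g in enumerate(row)]
        for i, row in enumerate(grm)
    ]


# Toy inputs (hypothetical values).
z = rank_inverse_normal([3.2, 1.1, 5.0])
grm = [[1.00, 0.50, 0.01], [0.50, 1.02, -0.02], [0.01, -0.02, 0.98]]
sparse_grm = threshold_grm(grm)
print(z, sparse_grm)
```

Thresholding keeps only the relatedness of close relatives in the matrix, which is what lets family members stay in the analysis.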
The genetic correlation between each pair of lipid species and between each lipid species and traditional lipids was determined using the generated GRM with bivariate linear mixed model as implemented in biMM. The correlations based on the plasma levels (termed as phenotypic correlations) between all the pairs of the lipid species and traditional lipids were calculated using Pearson’s correlation coefficient. The heatmaps and hierarchical clustering based on genetic and phenotypic correlations were generated using heatmap.2 in R. As lipid-lowering medications could affect the plasma levels of lipid species, all analyses were adjusted for the usage of lipid-lowering medications, and separate analyses were also performed after excluding individuals using lipid-lowering medications (N = 172).
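The phenotypic correlations described above use Pearson's coefficient, which can be sketched in a few lines. The trait names and values below are hypothetical; the genetic correlations were estimated with the bivariate mixed model in biMM, which this sketch does not attempt to reproduce.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(traits):
    """Pairwise Pearson correlations for a dict of trait name -> measurements."""
    names = list(traits)
    return {a: {b: pearson(traits[a], traits[b]) for b in names} for a in names}


# Toy plasma levels for three hypothetical traits measured in five individuals.
levels = {
    "lipid_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lipid_b": [2.1, 3.9, 6.2, 8.1, 9.8],
    "lipid_c": [5.0, 4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(levels)
print(corr["lipid_a"]["lipid_b"], corr["lipid_a"]["lipid_c"])
```

A matrix of this form is the kind of input a heatmap and hierarchical clustering routine would take.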
We performed univariate association tests for 141 individual lipid species, 12 total lipid classes and 4 traditional lipid measures (HDL-C, LDL-C, total cholesterol and triglycerides), in all batches to control for possible batch effects and combined the summary statistics by meta-analysis. The association analyses for the EUFAM cohort were performed using linear mixed models, including the above-mentioned covariates as fixed effects and kinship matrix as random effect as implemented in MMM54. The kinship matrices for the GWAS analyses were computed separately for each chromosome to include the variants from the other chromosomes using directly genotyped variants with MAF >0.01 and missingness <2%. The FINRISK cohort was analysed with a linear regression model adjusting for age, sex, first ten PCs, lipid medication and diabetes using SNPTEST v2.5 (ref. 55). Meta-analyses were performed using the inverse variance weighted method for fixed effects adjusted for genomic inflation factor in METAL56. In addition, analyses adjusting for the traditional lipids (in addition to above-mentioned covariates) were also performed for the identified variants to determine the independent effect on lipid species.
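The inverse variance weighted fixed-effects method weights each batch estimate by the reciprocal of its squared standard error. The sketch below shows the standard formula for a single variant; the function name and the three toy batch estimates are hypothetical, and the actual analysis was run in METAL.

```python
from math import sqrt
from statistics import NormalDist


def ivw_fixed_effects(betas, ses):
    """Combine per-batch effect estimates with inverse-variance weights.

    Returns the pooled effect, its standard error, the z-score and the
    two-sided P-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = sqrt(1.0 / w_sum)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p


# Toy effect sizes and standard errors for one variant in three batches.
beta, se, z, p = ivw_fixed_effects([0.12, 0.10, 0.15], [0.05, 0.04, 0.06])
print(beta, se, z, p)
```

Because the weights depend only on precision, the most precisely estimated batch dominates the pooled estimate.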
Test statistics were adjusted for λ values if >1.0 before meta-analyses. Genomic inflation factor (λ) ranged from 0.98 to 1.19 across the batches whereas the final λ values for meta-analysis ranged from 0.998 to 1.045 (Supplementary Data 13). The P-values obtained from the meta-analysis were considered to determine the SNP–lipid species associations. To account for multiple tests, the study-wide P-value threshold was set at <1.5 × 10−9 after correcting for 34 principal components (PCs) that explain over 90% of the variance in lipidomic profiles. Only the associations consistent in effect direction in all three batches were considered significant. Variants were designated as new if not located within 1 Mb of any previously reported variants for lipids (any of the traditional lipids and molecular lipid species); and as independent signal in known locus if located within 1 Mb but r2 < 0.20 with the previous lead variants and confirmed by conditional analysis. The variant with the strongest association in each identified lipid species locus was designated the lead variant and, for the new loci, annotated to the nearest gene.
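The genomic inflation factor and its use can be sketched as follows. Here λ is computed in the usual way, as the median observed 1-df chi-square statistic divided by its expected median under the null, and the correction inflates standard errors by √λ only when λ > 1. The function names are hypothetical. The final constant shows that dividing the conventional genome-wide threshold of 5 × 10−8 by 34 gives about 1.47 × 10−9, which is consistent with the stated threshold, although the text does not spell out this arithmetic.

```python
from math import sqrt
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square statistic over its null median."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    """Inflate a standard error by sqrt(lambda), only when lambda exceeds 1."""
    return se * sqrt(lam) if lam > 1.0 else se


# Assumed derivation of the study-wide threshold: 5e-8 divided by 34 PCs.
STUDY_WIDE_ALPHA = 5e-8 / 34

print(CHI2_1DF_MEDIAN, STUDY_WIDE_ALPHA)
```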
We identified 25 CVD-related outcomes from the derived phenotypes in FinnGen and UK Biobank (Supplementary Table 3). Associations between the 35 lead variants from the identified loci and 25 selected CVD phenotypes in the FinnGen cohort were obtained from the ongoing analyses as a part of the FinnGen project. The associations were tested using the saddle point approximation method adjusting for age, sex and first 10 PCs as implemented in the SPAtest R package57. Associations between selected binary phenotypes and 35 lead variants in UK Biobank were obtained from Zhou et al., who tested them using a logistic mixed model in SAIGE with a saddle point approximation, adjusting for first four principal components, age and sex (https://www.leelabsg.org/resources)58. Data for four phenotypes were not available from Zhou et al. and hence were obtained from http://www.nealelab.is/uk-biobank/. Associations of quantitative traits were tested using linear regression models with the same covariates as mentioned above, both for Finnish and UK Biobank cohorts. Meta-analyses of both cohorts were performed using the inverse variance weighted method for fixed effects model in METAL. The P-values obtained from the meta-analyses of the two cohorts are reported for PheWAS associations. All the PheWAS associations with false discovery rate (FDR) <5%, evaluated using the Benjamini–Hochberg method, and consistent direction of effects were considered significant.
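The Benjamini–Hochberg step-up procedure named above can be sketched as follows. The function names and the toy P-values are hypothetical; `same_direction` is a simple stand-in for the requirement that effects agree in sign between the two cohorts.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking tests significant at the given FDR.

    Step-up rule: find the largest rank k with p_(k) <= k / m * fdr and
    declare the k smallest P-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


def same_direction(beta_a, beta_b):
    """True when two effect estimates share the same sign."""
    return beta_a * beta_b > 0


# Toy P-values for eight variant-phenotype pairs.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
flags = benjamini_hochberg(pvals)
print(flags)
```

In this toy set only the two smallest P-values fall under their rank-specific cut-offs, so only those two tests are flagged.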
To determine the variance explained by the known loci for traditional lipids, we included all the lead variants with MAF >0.005 in 250 genomic loci that have previously been associated with one or more of the four traditional lipids. Of the 636 reported variants, 557 variants with MAF >0.005 (including six proxies) were available in our QC passed imputed genotype data (Supplementary Data 10). A genetic relationship matrix (GRM) based on these 557 variants was generated using GCTA and used to estimate, by variance component analysis in biMM, the variance in plasma levels of all lipid species explained by the known variants.
The post-heparin lipoprotein lipase (LPL) after 15 min of heparin load was measured for 630 individuals in the EUFAM cohort using the ELISA method developed by Antikainen et al.59. The measured values were transformed using rank-based inverse normal transformation. Associations between the LPL activity and plasma levels of TAGs were determined using linear regression model adjusted for age, sex, lipid medication, hormone replacement therapy, thyroid condition and type 2 diabetes. Association between the LPL variant rs11570891 and LPL activity was tested using linear mixed model adjusted for age, sex, first ten PCs of genetic population structure, lipid medication, hormone replacement therapy, thyroid condition and type 2 diabetes as fixed effect and kinship matrix as random effect as implemented in MMM.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The full lipidomics GWAS summary level data are available on the web-based database [https://mqtl.fimm.fi]. Similarly, the PheWAS summary data can be obtained through [https://www.leelabsg.org/resources] and [http://www.nealelab.is/uk-biobank/]. The data presented in the figures and other summary level data are contained within the Supplementary Files and Supplementary Data. Other data are available through the Institute for Molecular Medicine Finland Data Access Committee on reasonable request after appropriate ethical approval.
Global Burden of Disease 2016 Causes of Death Collaborators. Global, regional, and national age-specific mortality for 264 causes of death, 1980-2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet 390, 1151–1210 (2017).
Ference, B. A. et al. Low-density lipoproteins cause atherosclerotic cardiovascular disease. 1. Evidence from genetic, epidemiologic, and clinical studies. A consensus statement from the European Atherosclerosis Society Consensus Panel. Eur. Heart J. 38, 2459–2472 (2017).
Quehenberger, O. & Dennis, E. A. The human plasma lipidome. N. Engl. J. Med. 365, 1812–1823 (2011).
Alshehry, Z. H. et al. Plasma lipidomic profiles improve on traditional risk factors for the prediction of cardiovascular events in type 2 diabetes mellitus. Circulation 134, 1637–1650 (2016).
Stegemann, C. et al. Lipidomics profiling and risk of cardiovascular disease in the prospective population-based Bruneck study. Circulation 129, 1821–1831 (2014).
Laaksonen, R. et al. Plasma ceramides predict cardiovascular death in patients with stable coronary artery disease and acute coronary syndromes beyond LDL-cholesterol. Eur. Heart J. 37, 1967–1976 (2016).
Havulinna, A. S. et al. Circulating ceramides predict cardiovascular outcomes in the population-based FINRISK 2002 cohort. Arterioscler. Thromb. Vasc. Biol. 36, 2424–2430 (2016).
Razquin, C. et al. Plasma lipidome patterns associated with cardiovascular risk in the PREDIMED trial: a case-cohort study. Int. J. Cardiol. 253, 126–132 (2018).
Wang, D. D. et al. Plasma ceramides, Mediterranean diet, and incident cardiovascular disease in the PREDIMED Trial (Prevención con Dieta Mediterránea). Circulation 135, 2028–2040 (2017).
Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).
Liu, D. J. et al. Exome-wide association study of plasma lipids in >300,000 individuals. Nat. Genet. 49, 1758–1766 (2017).
Demirkan, A. et al. Genome-wide association study identifies novel loci associated with circulating phospho- and sphingolipid concentrations. PLoS Genet. 8, e1002490 (2012).
Hicks, A. A. et al. Genetic determinants of circulating sphingolipid concentrations in European populations. PLoS Genet. 5, e1000672 (2009).
Gieger, C. et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. 4, e1000282 (2008).
Kettunen, J. et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nat. Commun. 7, 11122 (2016).
Rhee, E. P. et al. A genome-wide association study of the human metabolome in a community-based cohort. Cell Metab. 18, 130–143 (2013).
Draisma, H. H. M. et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nat. Commun. 6, 7208 (2015).
Illig, T. et al. A genome-wide perspective of genetic variation in human metabolism. Nat. Genet. 42, 137–141 (2010).
Shin, S. Y. et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 46, 543–550 (2014).
Long, T. et al. Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat. Genet. 49, 568–578 (2017).
Bellis, C. et al. Human plasma lipidome is pleiotropically associated with cardiovascular risk factors and death. Circ. Cardiovasc. Genet. 7, 854–863 (2014).
Frahnow, T. et al. Heritability and responses to high fat diet of plasma lipidomics in a twin study. Sci. Rep. 7, 3750 (2017).
Chun, K. H. et al. In vivo activation of ROCK1 by insulin is impaired in skeletal muscle of humans with type 2 diabetes. Am. J. Physiol. Endocrinol. Metab. 300, E536–E542 (2011).
Jiang, X. C. et al. Plasma sphingomyelin level as a risk factor for coronary artery disease. Arterioscler. Thromb. Vasc. Biol. 20, 2614–2618 (2000).
Jha, P. et al. Systems analyses reveal physiological roles and genetic regulators of liver lipid species. Cell Syst. 6, 722–733.e6 (2018).
Jha, P. et al. Genetic regulation of plasma lipid species and their association with metabolic phenotypes. Cell Syst. 6, 709–721.e6 (2018).
Ander, B. P., Dupasquier, C. M., Prociuk, M. A. & Pierce, G. N. Polyunsaturated fatty acids and their effects on cardiovascular disease. Exp. Clin. Cardiol. 8, 164–172 (2003).
Forouhi, N. G. et al. Association of plasma phospholipid n-3 and n-6 polyunsaturated fatty acids with type 2 diabetes: The EPIC-InterAct Case-Cohort Study. PLoS Med. 13, e1002094 (2016).
Dyall, S. C. Long-chain omega-3 fatty acids and the brain: a review of the independent and shared effects of EPA, DPA and DHA. Front. Aging Neurosci. 7, 52 (2015).
Saini, R. K. & Keum, Y. S. Omega-3 and omega-6 polyunsaturated fatty acids: dietary sources, metabolism, and significance—a review. Life Sci. 203, 255–267 (2018).
Rhee, E. P. et al. Lipid profiling identifies a triacylglycerol signature of insulin resistance and improves diabetes prediction in humans. J. Clin. Invest. 121, 1402–1411 (2011).
Doi, A. et al. IA-2beta, but not IA-2, is induced by ghrelin and inhibits glucose-stimulated insulin secretion. Proc. Natl Acad. Sci. USA 103, 885–890 (2006).
Rohde, M. et al. Members of the heat-shock protein 70 family promote cancer cell growth by distinct mechanisms. Genes Dev. 19, 570–582 (2005).
Petersen, A. K. et al. Genetic associations with lipoprotein subfractions provide information on their biological nature. Hum. Mol. Genet. 21, 1433–1443 (2012).
Ramo, T. J. et al. Coronary artery disease risk and lipidomic profiles are similar in hyperlipidemias with family history and population-ascertained hyperlipidemias. J. Am. Heart Assoc. 8, e012415 (2019).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Porkka, K. V. et al. Phenotype expression in familial combined hyperlipidemia. Atherosclerosis 133, 245–253 (1997).
Ripatti, P. et al. The contribution of GWAS loci in familial dyslipidemias. PLoS Genet. 12, e1006078 (2016).
Friedewald, W. T., Levy, R. I. & Fredrickson, D. S. Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin. Chem. 18, 499–502 (1972).
Borodulin, K. et al. Forty-year trends in cardiovascular risk factors in Finland. Eur. J. Public Health 25, 539–546 (2015).
Pajunen, P. et al. The validity of the Finnish hospital discharge register and causes of death register data on coronary heart disease. Eur. J. Cardiovasc. Prev. Rehabil. 12, 132–137 (2005).
Tolonen, H. et al. The validation of the Finnish hospital discharge register and causes of death register data on stroke diagnoses. Eur. J. Cardiovasc. Prev. Rehabil. 14, 380–385 (2007).
Mähönen, M. et al. The validity of heart failure diagnoses obtained from administrative registers. Eur. J. Prev. Cardiol. 20, 254–259 (2013).
Mähönen, M. et al. The validity of hospital discharge register data on coronary heart disease in Finland. Eur. J. Epidemiol. 13, 403–415 (1997).
Sund, R. Quality of the Finnish hospital discharge register: a systematic review. Scand. J. Public Health 40, 505–515 (2012).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Surma, M. A. et al. An automated shotgun lipidomics platform for high throughput, comprehensive, and quantitative analysis of blood plasma intact lipids. Eur. J. Lipid Sci. Technol. 117, 1540–1549 (2015).
Herzog, R. et al. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol. 12, R8 (2011).
Herzog, R. et al. LipidXplorer: a software for consensual cross-platform lipidomics. PLoS One 7, e29851 (2012).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Pirinen, M. et al. biMM: efficient estimation of genetic variances and covariances for cohorts with high-dimensional phenotype measurements. Bioinformatics 33, 2405–2407 (2017).
Zaitlen, N. et al. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet. 9, e1003520 (2013).
Pirinen, M., Donnelly, P. & Spencer, C. C. A. Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. Ann. Appl. Stat. 7, 369–390 (2013).
Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Dey, R., Schmidt, E. M., Abecasis, G. R. & Lee, S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am. J. Hum. Genet. 101, 37–49 (2017).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Antikainen, M., Suurinkeroinen, L., Jauhiainen, M., Ehnholm, C. & Taskinen, M. R. Development and evaluation of an ELISA method for the determination of lipoprotein lipase mass concentration-comparison with a commercial, one-step enzyme immunoassay. Eur. J. Clin. Chem. Clin. Biochem. 34, 547–553 (1996).
We would like to thank Sari Kivikko, Huei-Yi Shen and Ulla Tuomainen for management assistance. We thank all study participants for their participation. The FINRISK and FinnGen data used for the research were obtained from THL Biobank. We thank the THL DNA laboratory for its skillful work in producing the DNA samples used for genotyping in this study. Part of the genotyping was performed by the Institute for Molecular Medicine Finland FIMM Technology Centre, University of Helsinki. This research has been conducted using the UK Biobank Resource with application number 22627. This work was supported by National Institutes of Health [grant HL113315 to S.R., M.R.T., N.F. and A.P.]; Finnish Foundation for Cardiovascular Research [to S.R., V.S., M.R.T., M.J. and A.P.]; Academy of Finland Center of Excellence in Complex Disease Genetics [grant 312062 to S.R.]; Academy of Finland [285380 to S.R., 288509 to M.P.]; Jane and Aatos Erkko Foundation [to M.J.]; Sigrid Jusélius Foundation [to S.R. and M.R.T.]; Horizon 2020 Research and Innovation Programme [grant 692145 to S.R.]; EU-project RESOLVE (EU 7th Framework Programme) [grant 305707 to M.R.T.]; HiLIFE Fellowship [to S.R.]; Helsinki University Central Hospital Research Funds [to M.R.T.]; Magnus Ehrnrooth Foundation [to M.J.]; Leducq Foundation [to M.R.T.]; Ida Montin Foundation [to P.R.]; MD-PhD Programme of the Faculty of Medicine, University of Helsinki [to J.T.R.]; Doctoral Programme in Population Health, University of Helsinki [to J.T.R. and P.R.]; Finnish Medical Foundation [to J.T.R.]; Emil Aaltonen Foundation [to J.T.R. and P.R.]; Biomedicum Helsinki Foundation [to J.T.R.]; Paulo Foundation [to J.T.R.]; Idman Foundation [to J.T.R.]; Veritas Foundation [to J.T.R.]; FIMM-EMBL PhD Fellowship grant [to S.H.].
The FinnGen project is funded by two grants from Business Finland (HUS 4685/31/2016 and UH 4386/31/2016) and nine industry partners (AbbVie, AstraZeneca, Biogen, Celgene, Genentech, GSK, Merck, Pfizer and Sanofi). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.
V.S. has participated in a conference trip sponsored by Novo Nordisk and received an honorarium from the same source for participating in an advisory board meeting. M.J.G. is an employee of Lipotype GmbH, C.K. is a shareholder and employee of Lipotype GmbH, K.S. is a shareholder and CEO of Lipotype GmbH. M.A.S. is a shareholder of Lipotype GmbH and an employee of Łukasiewicz Research Network–PORT Polish Center for Technology Development. The remaining authors have no relevant competing interests.
Peer review information: Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.