Genetic architecture of human plasma lipidome and its link to cardiovascular disease.

Understanding the genetic architecture of the plasma lipidome could provide better insights into lipid metabolism and its link to cardiovascular diseases (CVDs). Here, we perform genome-wide association analyses of 141 lipid species (n = 2,181 individuals), followed by phenome-wide scans with 25 CVD-related phenotypes (n = 511,700 individuals). We identify 35 lipid-species-associated loci (P < 5×10⁻⁸), 10 of which associate with CVD risk, including five new loci: COL5A1, GLTPD2, SPTLC3, MBOAT7 and GALNT16 (false discovery rate < 0.05). We identify loci for lipid species that are shown to predict CVD, e.g. SPTLC3 for CER(d18:1/24:1). We show that lipoprotein lipase (LPL) may hydrolyze medium-length triacylglycerides (TAGs) more efficiently than others. Polyunsaturated lipids have the highest heritability and genetic correlations, suggesting considerable genetic regulation at the fatty acid level. We find low genetic correlations between traditional lipids and lipid species. Our results show that lipidomic profiles capture information beyond traditional lipids and identify genetic variants modifying lipid levels and risk of CVD.

Delete Y-axis legend "Height"
Figure 4B: X-axis legend is too small

Reviewer #2 (Remarks to the Author):
The authors report results of the hitherto largest genome-wide study of lipidomic profiles. More specifically, they studied 141 lipid species, followed by a phenome-wide scan with 26 CVD related phenotypes. They identified 35 loci, 15 of them not reported with lipid-related phenotypes, yet. 10 of the loci were also related with CVD endpoints. Furthermore, they also showed highly heterogeneous estimates of heritability and variance explained.
Altogether, this is a very interesting, timely manuscript, which has been very thoroughly performed with state-of-the-art methods. I only have one major point: With individual GWAS on 141 lipid species and 4 traditional lipid measures, defining a p-value of smaller than 5x10-5 as genome-wide significant is not really appropriate. Of course, the correlation between the traits has to be taken into account, e.g. by correcting for principal components explaining the majority of the correlation between traits, as has been done in recently published metabolomics GWAS.

Some additional minor points:
Please check the numbering of the Supplementary Tables in the text, e.g. on page 20, it should probably be Suppl. Table 6 instead of 4, page 5: Suppl. Table 2 instead of 1.

This paper describes a GWAS with a new commercial lipidomics platform (Lipotype GmbH, Dresden, Germany). The platform covers 141 lipid species from 13 lipid classes. The sample size is substantial for a lipidomics GWAS (N=2,181). Variants were declared significant if the association with a lipid reached genome-wide significance (p<5E-8) and if it showed consistent directionality in the three analyzed batches. At this significance level 35 lipid-species associations were detected. The authors discuss overlap with CVD risk. Overall this is an interesting study that provides new insights into the genetic architecture of lipid metabolism that merits publication. I have a few specific remarks that may help improve the manuscript (some are personal taste):

We thank the reviewer for the kind words and valuable suggestions that have helped to greatly improve the manuscript.

The authors state that this is

We regret the confusing statement. We acknowledge that there have been previous efforts to identify genetic variants for lipid species, and we have mentioned and cited these in the introduction of the manuscript. We meant to imply that this is the first study integrating lipidome, genome and phenome at this scale. We have rephrased the statement in the revised manuscript to avoid any confusion:

Page 10, Lines 24-25: "We present findings from a large-scale study that integrate lipidome, genome and phenome revealing detailed description of genetic regulation and genetic architecture of the lipidome, and their associations with CVD risk."

2. For instance, the claim that this study identified new locus-lipid species associations at previously reported lipid loci is not always correct. For instance, rs261290 near LIPC has been reported in association with 1-stearoyl-2-arachidonoyl-GPE (18:0/20:4) at a p-value of 1.68E-23 (see for instance http://snipa.org using the block annotation or http://www.phenoscanner.medschl.cam.ac.uk with rs261290). A more stringent comparison with previous mGWAS should be provided (the phenoscanner API could be useful here).
We thank the reviewer for pointing this out. We have now formally compared our results with previous metabolomics GWAS results. As suggested, we checked for previous associations with lipid-related metabolites at the 35 lipid-species-associated loci identified in our study, using the block annotation in SNiPA (http://snipa.org) and PhenoScanner v2 (http://www.phenoscanner.medschl.cam.ac.uk/), and manually curated the results to include associations from a literature search.
Out of the 35 identified lipid-species-associated loci, 11 loci were reported to associate with at least one of the lipid-related metabolites analyzed in any previous mGWAS at P < 5.0×10⁻⁸. Supplementary Table 5 has now been updated based on this information, and the detailed list of all previous associations is provided in a new supplementary table (Supplementary Table 6).
The text in the manuscript has been modified accordingly as: Page 7, Lines 3-11: "We also replicated the previous associations of FADS2, SYNE2, LIPC, LASS4 and MBOAT7 with the same lipid species [13][14][15][16][17][18][19][20]

We agree with the reviewer and believe that this would be important information that would allow comparison of the results across the studies. We have now provided the list of the lipid species detected by different platforms in the Supplementary Table 14  )] levels. Nevertheless, we identified the total lipid moieties captured by these three platforms by their species or subspecies (whichever available) and looked for the identical match in the three platforms. A total of 296 lipid metabolites were found, of which Metabolon captured 121, Lipotype captured 141 and Biocrates captured 107 lipid moieties.
The overlaps among the platforms at different levels are shown in the Venn diagrams below (included as Supplementary Figure 8 in the revised manuscript). As shown in panel A, over 40% of the lipid species in Metabolon and Lipotype overlap with each other. There is little overlap with Biocrates, as it does not inform about the acyl chain composition for most of the lipids. Panel B shows the overlap at the species level, i.e. when, for instance, PC(34:0;0) and PC(16:0;0-18:0;0) were considered a match. If we only consider the lipid species with acyl chain composition information in all three platforms (panel C), almost 50% of Biocrates lipid species could be detected by both Metabolon and Lipotype.
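To illustrate the species-level matching described above, the following is a minimal sketch, not the procedure actually used for the Venn diagrams. It assumes subspecies names follow the pattern seen in the example, with acyl chains written as carbons:double bonds;hydroxylations; the function names `to_species_level` and `species_overlap` are hypothetical.

```python
import re

# Assumed chain notation: carbons:double_bonds;hydroxylations, e.g. "16:0;0".
_CHAIN = re.compile(r"(\d+):(\d+);(\d+)")


def to_species_level(name: str) -> str:
    """Collapse a subspecies name to species level by summing its acyl chains."""
    lipid_class, _, rest = name.partition("(")
    chains = _CHAIN.findall(rest)
    if not chains:
        raise ValueError(f"unrecognised lipid name: {name!r}")
    carbons = sum(int(c) for c, _, _ in chains)
    double_bonds = sum(int(d) for _, d, _ in chains)
    hydroxyls = sum(int(h) for _, _, h in chains)
    return f"{lipid_class}({carbons}:{double_bonds};{hydroxyls})"


def species_overlap(platform_a, platform_b):
    """Return the species-level names shared by two platforms' lipid lists."""
    collapsed_a = {to_species_level(n) for n in platform_a}
    collapsed_b = {to_species_level(n) for n in platform_b}
    return collapsed_a & collapsed_b


print(to_species_level("PC(16:0;0-18:0;0)"))
```

Names already given at species level pass through unchanged, so a platform reporting PC(34:0;0) and one reporting PC(16:0;0-18:0;0) would count as overlapping under this rule.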

It would also be important to know how the Lipotype platform fares on identical lipids and identical SNPs discovered by the Biocrates and Metabolon GWAS.
We thank the reviewer for this valuable suggestion. We have now summarized the results for all the variants previously identified in mGWAS, and also the associations between the identical lipids and identical SNPs, in the revised manuscript. The text in the manuscript reads as: Page 7, Lines 12-18: "Further, we systematically evaluated the associations of variants previously identified in metabolomics GWAS (126 variants

For this comparison, we obtained the list of all the published GWAS for metabolomics from http://www.metabolomix.com/list-of-all-published-gwas-with-metabolomics/ and extracted the associations with lipid-related metabolites from the respective manuscripts. A total of 132 SNPs were found to be associated with at least one of the lipid-related metabolites in these studies. Of these 132 SNPs, 126 were available in our dataset, and a total of 134 identical SNP-lipid species pair associations were available in our dataset. Kindly note that the effect sizes are not comparable across the studies as they differ in their units; however, irrespective of that, we observed a strong correlation in the effect sizes between previous studies and our study.

5. For the novel loci, regional association plots should be provided.
We have now included regional association plots for all 35 loci as Supplementary Figure 4.

6. The term "genetic sharing" is not well defined - do you mean overlapping associations on a same SNP with several phenotypes? A more stringent definition of genetic signal colocalization should be used.
We determined the pairwise genetic correlation for all the lipid species. Genetic sharing here implies that the two lipid species with high genetic correlation would likely have overlapping genetic variants associated with them. In the present study, we wanted to determine the extent of genetic sharing between the lipid species. We did not intend to formally determine the genetic variants that colocalize or overlap between the lipid species as this would have been very intensive and out of the scope and focus of the study.
To avoid confusion, we have replaced the term "genetic sharing" by "genetic correlation" at appropriate places.

The term "traditional lipids" should be introduced as referring to HDL, LDL, TC, and bulk TGs.
The text has been modified as: Page 4, Lines 6-7: "Standard lipid profiling measures traditional lipids (referred to LDL-C, HDL-C, total triglycerides, and total cholesterol) but…"

We agree with the reviewer, and that is exactly what we want to formally demonstrate and emphasize with the data at hand. We intend to show that lipidomic profiles capture information beyond traditional lipids and provide an opportunity to identify additional genetic variants influencing lipid metabolism and disease risk. We have modified the section dealing with this in the revised manuscript. Moreover, as suggested, we have also discussed our results in the light of previous observations by Petersen et al.
The modified text in the manuscript now reads as:

I am personally not a big fan of heritability calculations (but others are, so I am not asking to remove them) - especially in the case of the FADS-associated lipids, the signal may come from a single variant. It would be interesting to see how the heritability changes when the FADS-signal is regressed out (but this is just my curiosity, not a request I expect the authors to follow).
As suggested, we calculated the heritability estimates after excluding the ±2.5 Mb region flanking the FADS2 variant. We observed only a slight decrease in the estimates for most of the lipid species, as shown in the table below.
Though the FADS region variants are consistently shown to be associated with lipids, these variants are quite common in the population (~40%) and hence are unlikely to have a large effect on the lipid levels. Moreover, as we mentioned in the manuscript, we identified 11 loci that were associated with polyunsaturated lipids. Hence, as expected, the genetic regulation of these lipid species seems to be quite polygenic in nature and a single locus does not contribute much to the total variance.

10. Page 6: I suggest deleting "Moreover, lipid species with genome-wide significant association had higher heritability estimates compared to the lipid species with no significant association", as this is evident, and at this point genome-wide significance has not been formally introduced.
We have removed this text from the manuscript as suggested.
11. The PheWeb database is a great addition to the paper -the link should be replaced by a domain name to be persistent after publication (not an IP address, which may change). Some of the links to the Oxford Big server do not work -the alleles are inverted.
We thank the reviewer for pointing this out. We have now updated the PheWeb browser after consistently aligning the alleles, and all the links now work. We agree that it would be preferable to set up a domain for the PheWeb browser. We are working on the technical details and will soon replace the current IP address (http://35.205.141.92) with the domain name.
12. The association of the ROCK1 variant with CVD appears stretched if there is no genetic signal in large CVD GWAS. The association of BLK with diabetes also appears quite stretched (P=4.5E-5) and should be substantiated … the same holds for GALNT16 and LPL … in this context would it be interesting to be more formal about how many tests with disease have been made.
In the PheWAS analyses, 25 CVD phenotypes and 34 lead variants (one lead variant was not available) were included. As the CVD phenotypes are correlated, we applied false discovery rate (FDR) correction for multiple testing, as Bonferroni correction would be too conservative in this case. PheWAS associations with FDR < 5%, evaluated using the Benjamini-Hochberg method, were therefore considered significant.
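For readers unfamiliar with the Benjamini-Hochberg step-up procedure named above, the following is a minimal sketch of it in plain Python. It is an illustration only, not the analysis code used in the study; the function name `benjamini_hochberg` and the example p-values are hypothetical. If every lead variant were tested against every phenotype, the input would hold 34 × 25 = 850 p-values, which is an assumption about the test count rather than a figure stated in the response.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where a test is significant at `fdr`."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    # Every test ranked at or below max_k is declared significant.
    significant = [False] * m
    for idx in order[:max_k]:
        significant[idx] = True
    return significant


# Hypothetical p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example))
```

Because the cutoff grows with the rank of each p-value, the procedure is less strict than dividing the significance level by the total number of tests, which is the sense in which Bonferroni is described as too conservative for correlated phenotypes.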
14. Figure 4B: X-axis legend is too small
We have taken care of the font sizes in all the figures in the revised manuscript.

Reviewer #2
The authors report results of the hitherto largest genome-wide study of lipidomic profiles. More specifically, they studied 141 lipid species, followed by a phenome-wide scan with 26 CVD related phenotypes. They identified 35 loci, 15 of them not reported with lipid-related phenotypes, yet. 10 of the loci were also related with CVD endpoints. Furthermore, they also showed highly heterogeneous estimates of heritability and variance explained.
Altogether, this is a very interesting, timely manuscript, which has been very thoroughly performed with state-of-the-art methods. I only have one major point:
We thank the reviewer for the kind words and valuable suggestions.

With individual GWAS on 141 lipid species and 4 traditional lipid measures, defining a p-value of smaller than 5x10-5 as genome-wide significant is not really appropriate. Of course, the correlation between the traits has to be taken into account, e.g. by correcting for principal components explaining the majority of the correlation between traits, as has been done in recently published metabolomics GWAS.
As suggested, we have now adjusted the p-value threshold for the number of principal components (PCs) that explain over 90% of the variance in the lipidomics data. After accounting for the 34 PCs that explain over 90% of the variance, we set the p-value threshold at < 1.5×10⁻⁹.
The method section now includes the following: Page 20, Lines 3-5: "To account for multiple tests, the study-wide P value threshold was set at <1.5×10⁻⁹ after correcting for 34 principal components (PCs) that explain over 90% of the total variances in lipidomic profiles." The association results have been modified based on this p-value cut-off (Pages 6 and 7).
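The threshold above is consistent with dividing the conventional genome-wide level of 5×10⁻⁸ by the 34 principal components, although the response does not spell out the formula, so that reading is an assumption. The sketch below shows the arithmetic under that assumption; the function names and the toy variance ratios are hypothetical, and in practice the ratios would come from a PCA of the lipidomic profiles.

```python
def n_components_for_variance(explained_ratios, target=0.90):
    """Count the leading PCs needed for cumulative explained variance to exceed `target`."""
    cumulative = 0.0
    for k, ratio in enumerate(sorted(explained_ratios, reverse=True), start=1):
        cumulative += ratio
        if cumulative > target:
            return k
    raise ValueError("explained ratios never exceed the target")


def study_wide_threshold(n_effective_tests, genome_wide=5e-8):
    """Assumed correction: genome-wide level divided by the number of effective tests."""
    return genome_wide / n_effective_tests


# Toy variance ratios, for illustration only.
print(n_components_for_variance([0.5, 0.3, 0.15, 0.05]))
# With the 34 PCs reported in the response:
print(f"{study_wide_threshold(34):.1e}")
```

Counting principal components rather than all 141 lipid species treats correlated traits as fewer effective tests, which is the adjustment the reviewer asked for.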

Some additional minor points: Please check the numbering of the Supplementary Tables in the text, e.g. on page 20, it should probably be Suppl. Table 6 instead of 4, page 5: Suppl. Table 2 instead of 1.
Thanks for pointing this out. We have taken care of this in the revised manuscript.

Figure 3: what is the rationale of ordering?

3.
In Figure 3, the lipid species are ordered based on hierarchical clustering using genetic and phenotypic correlations (lower panel of Figure 3), and their heritability estimates and proportion of variance explained are shown in the upper panel in the same order.
We have clarified this in the legend of Figure 3 in the revised manuscript.
Page 30
Thanks for pointing this out. We have taken care of the font sizes in all the figures in the revised manuscript and hope that they are readable in the current form.