Abstract
Plasma lipids are known heritable risk factors for cardiovascular disease, but increasing evidence also supports shared genetics with diseases of other organ systems. We devised a comprehensive three-phase framework to identify new lipid-associated genes and to study the relationships among lipids, genotypes, gene expression and hundreds of complex human diseases from the Electronic Medical Records and Genomics (eMERGE) network (347 traits) and the UK Biobank (UKB; 549 traits). In addition to 67 new lipid-associated genes with strong replication, we found evidence for pleiotropic SNPs and genes shared between lipids and diseases across the phenome. These include discordant pleiotropy in the HLA region between lipids and multiple sclerosis, and putative causal paths between triglycerides and gout, among others. Our findings give insights into the genetic basis of the relationship between plasma lipids and diseases on a phenome-wide scale and can provide context for future prevention and treatment strategies.
Relevant articles
Open Access articles citing this article.
- Implicating genes, pleiotropy, and sexual dimorphism at blood lipid loci through multi-ancestry meta-analysis. Genome Biology, Open Access, 27 December 2022
- Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Medical Research Methodology, Open Access, 05 November 2022
Data availability
This project corresponds to UKB application ID 32133 and eMERGE Network phase III (dbGaP study accession no. phs001584.v1.p1). Lipid GWAS summary statistics for GLGC 2013 (ref. 3) are publicly available for download (http://csg.sph.umich.edu/willer/public/lipids2013/). Lipid GWAS summary statistics for GERA (ref. 5) are available via dbGaP (accession no. phs000674.v2.p2). Expression prediction models with LD reference data using MASHR are available on Zenodo (https://zenodo.org/record/3518299/files/mashr_eqtl.tar?download=1). GTEx Analysis Release v8 (dbGaP accession no. phs000424.v8.p2) is available for download via the GTEx Portal (https://gtexportal.org/home/datasets/). Summary statistics for lipid GWAS, lipid TWAS, lipid-guided PheWAS and Xpress-PheWAS generated in this study are available on Figshare (https://figshare.com/s/d62961bbc6c45c8dc2b0).
Code availability
Code for identifying LD-contaminated genes and detecting secondary independent associations at a locus is shared on GitHub (https://github.com/RitchieLab/Gene-level-statistical-colocalization/).
References
Castelli, W. P. Cholesterol and lipids in the risk of coronary artery disease—the Framingham Heart Study. Can. J. Cardiol. 4, 5A–10A (1988).
Kannel, W. B., Dawber, T. R., Kagan, A., Revotskie, N. & Stokes, J. Factors of risk in the development of coronary heart disease—six year follow-up experience. The Framingham Study. Ann. Intern. Med. 55, 33–50 (1961).
Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
Hoffmann, T. J. et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat. Genet. 50, 401–413 (2018).
Klarin, D. et al. Genetics of blood lipids among ~300,000 multiethnic participants of the Million Veteran Program. Nat. Genet. 50, 1514–1523 (2018).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue-specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Gottesman, O. et al. The Electronic Medical Records and Genomics (eMERGE) network: past, present and future. Genet. Med. 15, 761–771 (2013).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
González-Gay, M. A. & González-Juanatey, C. Inflammation and lipid profile in rheumatoid arthritis: bridging an apparent paradox. Ann. Rheum. Dis. 73, 1281–1283 (2014).
Pietrzak, A., Michalak-Stoma, A., Chodorowska, G. & Szepietowski, J. C. Lipid disturbances in psoriasis: an update. Mediators Inflamm. 2010, 535612 (2010).
Ference, B. A., Graham, I., Tokgozoglu, L. & Catapano, A. L. Impact of lipids on cardiovascular health. J. Am. Coll. Cardiol. 72, 1141–1156 (2018).
Reale, M. & Sanchez-Ramon, S. Lipids at the cross-road of autoimmunity in multiple sclerosis. Curr. Med. Chem. 24, 176–192 (2017).
Di Paolo, G. & Kim, T.-W. Linking lipids to Alzheimer’s disease: cholesterol and beyond. Nat. Rev. Neurosci. 12, 284–296 (2011).
Chesmore, K., Bartlett, J. & Williams, S. M. The ubiquity of pleiotropy in human disease. Hum. Genet. 137, 39–44 (2018).
Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 51, 1339–1348 (2019).
Sivakumaran, S. et al. Abundant pleiotropy in human complex diseases and traits. Am. J. Hum. Genet. 89, 607–618 (2011).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Webb, T. R. et al. Systematic evaluation of pleiotropy identifies 6 further loci associated with coronary artery disease. J. Am. Coll. Cardiol. 69, 823–836 (2017).
Andreassen, O. A. et al. Abundant genetic overlap between blood lipids and immune-mediated diseases indicates shared molecular genetic mechanisms. PLoS ONE 10, e0123057 (2015).
Kim, Y. K. et al. Evaluation of pleiotropic effects among common genetic loci identified for cardio-metabolic traits in a Korean population. Cardiovasc. Diabetol. 15, 1–11 (2016).
Ligthart, C. et al. Bivariate genome-wide association study identifies novel pleiotropic loci for lipids and inflammation. BMC Genomics 17, 443 (2016).
Nikpay, M., Turner, A. W. & McPherson, R. Partitioning the pleiotropy between coronary artery disease and body mass index reveals the importance of low frequency variants and central nervous system-specific functional elements. Circ. Genom. Precis. Med. 11, e002050 (2018).
Zhang, X. et al. Detecting potential pleiotropy across cardiovascular and neurological diseases using univariate, bivariate and multivariate methods on 43,870 individuals from the eMERGE network. Pac. Symp. Biocomput. 24, 272–283 (2019).
Davey Smith, G. & Ebrahim, S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).
Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Butler, R. The ICD-10 General Equivalence Mappings. Bridging the translation gap from ICD-9. J. AHIMA 78, 84–85 (2007).
Xu, L. et al. An association study between genetic polymorphisms related to lipoprotein-associated phospholipase A2 and coronary heart disease. Exp. Ther. Med. 5, 742–750 (2013).
Wolpin, B. M. et al. Prospective study of ABO blood type and the risk of pulmonary embolism in two large cohort studies. Thromb. Haemost. 104, 962–971 (2010).
Hajizadeh, R., Kavandi, H., Nadiri, M. & Ghaffari, S. The association of ABO blood group with incidence and outcome of acute pulmonary embolism. Turk Kardiyol. Dern. Ars. 44, 397–403 (2016).
Zhang, J., Zhao, Z., Guo, X., Guo, B. & Wu, B. Powerful statistical method to detect disease-associated genes using publicly available genome-wide association studies summary data. Genet. Epidemiol. 43, 941–951 (2019).
Lumish, H. S., O’Reilly, M. P. & Reilly, M. P. Sex differences in genomic drivers of adipose distribution and related cardiometabolic disorders: opportunities for precision medicine. Arterioscler. Thromb. Vasc. Biol. 40, 45–60 (2020).
Reshef, Y. A. et al. Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk. Nat. Genet. 50, 1483–1493 (2018).
Cantuti-Castelvetri, L. et al. Defective cholesterol clearance limits remyelination in the aged central nervous system. Science 359, 684–688 (2018).
Fard, M. K. et al. BCAS1 expression defines a population of early myelinating oligodendrocytes in multiple sclerosis lesions. Sci. Transl. Med. 9, eaam7816 (2017).
Kung, J. T. Y., Colognori, D. & Lee, J. T. Long noncoding RNAs: past, present and future. Genetics 193, 651–669 (2013).
Ginn, L., Shi, L., La Montagna, M. & Garofalo, M. LncRNAs in non-small-cell lung cancer. Noncoding RNA 6, 25 (2020).
Zhong, R. et al. LINC01149 variant modulates MICA expression that facilitates hepatitis B virus spontaneous recovery but increases hepatocellular carcinoma risk. Oncogene 39, 1944–1956 (2020).
Feng, X. & Yang, S. Long noncoding RNA LINC00243 promotes proliferation and glycolysis in non-small-cell lung cancer cells by positively regulating PDK4 through sponging miR-507. Mol. Cell. Biochem. 463, 127–136 (2020).
Yu, X., Chen, H., Huang, S. & Zeng, P. Evaluation of the causal effects of blood lipid levels on gout with summary level GWAS data: two-sample Mendelian randomization and mediation analysis. J. Hum. Genet. 66, 465–473 (2021).
Marien, E. et al. Non-small-cell lung cancer is characterized by dramatic changes in phospholipid profiles. Int. J. Cancer 137, 1539–1548 (2015).
Eggers, L. F. et al. Lipidomes of lung cancer and tumour-free lung tissues reveal distinct molecular signatures for cancer differentiation, age, inflammation and pulmonary emphysema. Sci. Rep. 7, 11087 (2017).
Tiwary, S. et al. Metastatic brain tumors disrupt the blood–brain barrier and alter lipid metabolism by inhibiting expression of the endothelial cell fatty acid transporter Mfsd2a. Sci. Rep. 8, 8267 (2018).
Sun, H., Zhang, X., Shi, W. & Fang, B. Association of soft tissue infection in the extremity with glucose and lipid metabolism and inflammatory factors. Exp. Ther. Med. 17, 2535–2540 (2019).
Gao, S., Cui, X., Wang, X., Burg, M. B. & Dmitrieva, N. I. Cross-sectional positive association of serum lipids and blood pressure with serum sodium within the normal reference range of 135–145 mmol/l. Arterioscler. Thromb. Vasc. Biol. 37, 598–606 (2017).
Goldstein, I. et al. p53, a novel regulator of lipid metabolism pathways. J. Hepatol. 56, 656–662 (2012).
Mäkinen, N. et al. Exome sequencing of uterine leiomyosarcomas identifies frequent mutations in TP53, ATRX and MED12. PLoS Genet. 12, e1005850 (2016).
Parrales, A. & Iwakuma, T. p53 as a regulator of lipid metabolism in cancer. Int. J. Mol. Sci. 17, 2074 (2016).
Veturi, Y. & Ritchie, M. D. How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures? Pac. Symp. Biocomput. 23, 228–239 (2018).
Olafsdottir, T. et al. Genome-wide association identifies seven loci for pelvic organ prolapse in Iceland and the UK Biobank. Commun. Biol. 3, 129 (2020).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
Verma, S. S. et al. Imputation and quality-control steps for combining multiple genome-wide datasets. Front. Genet. 5, 370 (2014).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2015).
MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
Eicher, J. D. et al. GRASP v2.0: an update on the Genome-Wide Repository of Associations between SNPs and Phenotypes. Nucleic Acids Res. 43, 799–804 (2014).
Verbanck, M., Chen, C.-Y., Neale, B. & Do, R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 50, 693–698 (2018).
Yavorska, O. O. & Burgess, S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. Int. J. Epidemiol. 46, 1734–1739 (2017).
Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018).
anastasia-lucas/hudson. A Hudson Plot Package version 0.1.0. GitHub. https://rdrr.io/github/anastasia-lucas/hudson/. Accessed 5 March 2020.
Conway, J. R., Lex, A. & Gehlenborg, N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 33, 2938–2940 (2017).
Gu, Z. circlize R package. https://cran.r-project.org/web/packages/circlize/index.html (2019).
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
Acknowledgements
Phase III of the eMERGE Network was initiated and funded by the NHGRI through the following grants: U01HG8657 (Group Health Cooperative/University of Washington); U01HG8685 (Brigham and Women’s Hospital); U01HG8672 (Vanderbilt University Medical Center); U01HG8666 (Cincinnati Children’s Hospital Medical Center); U01HG6379 (Mayo Clinic); U01HG8679 (Geisinger Clinic); U01HG8680 (Columbia University Health Sciences); U01HG8684 (Children’s Hospital of Philadelphia); U01HG8673 (Northwestern University); U01HG8701 (Vanderbilt University Medical Center serving as the Coordinating Center); U01HG8676 (Partners Healthcare/Broad Institute); and U01HG8664 (Baylor College of Medicine). For the UKB, all data for this cohort pertained to project 32133: ‘Integration of multi-organ imaging phenotypes, clinical phenotypes and genomic data’. Y.V., R.M.K., T.J.H., N.R., M.W.M., E.T. and M.D.R. acknowledge funding from the National Institutes of Health (NIH GM115318: Pharmacogenomics of Statin Therapy (POST)). Y.V. and M.D.R. also acknowledge NIH AI077505 (Pharmacogenomics of HIV Therapy). J.E.M. acknowledges NHGRI T32HG009495–01; C.M.S. acknowledges R35GM131770 (Pharmacogenetics to improve Drug Therapy). B.F.V. acknowledges NIH DK101478, NIH HG010067 and a Linda Pechenik Montague Investigator Award for their time on this project.
Author information
Contributions
Y.V. and M.D.R. conceptualized and designed the study. Y.V. conducted all statistical analyses. Y.V. and D.H. conducted phase III analyses. Y.V., A.L. and S.D. performed data visualization. Y.V., Y.B. and A.L. conducted phenotype curation. Y.V., M.D.R. and A.V. performed data acquisition for the UKB. H.H., P.S., I.K., D.S., C.M.S., D.R.V.E., Q.F. and W.-Q.W. performed data acquisition for eMERGE. T.J.H., N.R., R.M.K., M.W.M. and E.T. performed data acquisition for GERA. Y.V. and B.F.V. conceptualized phase III of this study. Y.V. and J.E.M. performed overrepresentation analysis. D.J.R. provided guidance for phases I and II. Y.V. and M.D.R. wrote the manuscript. All authors provided interpretation of the results and critical feedback on the manuscript.
Ethics declarations
Competing interests
M.D.R. is on the scientific advisory board for Goldfinch Bio and Cipherome. D.J.R. serves on Scientific Advisory Boards for Alnylam, Novartis, Pfizer and Verve and is a founder of Staten Biotechnology. The other co-authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Case-control distribution for ICD codes.
Distribution of cases (blue) and controls (yellow) for the collapsed 3-digit ICD codes in eMERGE (top) and UKB (bottom). eMERGE has predominantly ICD-9 codes whereas UKB has predominantly ICD-10 codes.
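The caption refers to ICD codes collapsed to three digits, drawn from ICD-9 (mostly eMERGE) and ICD-10 (mostly UKB). The sketch below illustrates that collapsing step and a per-category case/control tally under a deliberately simple rule: a participant is a case if any code collapses to the category, and everyone else is a control. PheWAS pipelines often apply stricter control exclusions, and the helper names and toy records here are hypothetical, not taken from the study.

```python
from collections import defaultdict


def collapse_icd(code):
    """Collapse an ICD-9 or ICD-10 code to its first three characters.

    Simplification (assumption): ICD-9 E-codes (external causes) use
    four-character categories and would need special handling.
    """
    return code.replace(".", "").strip().upper()[:3]


def case_control_counts(patient_codes):
    """Return {category: (n_cases, n_controls)} for collapsed ICD categories.

    patient_codes maps a hypothetical participant ID to that participant's
    raw ICD codes. Anyone without a code in a category counts as a control.
    """
    cases = defaultdict(set)
    for pid, codes in patient_codes.items():
        for code in codes:
            cases[collapse_icd(code)].add(pid)  # a set ignores repeated codes
    n_total = len(patient_codes)
    return {cat: (len(ids), n_total - len(ids)) for cat, ids in cases.items()}


if __name__ == "__main__":
    toy = {
        "p1": ["E78.5", "I10"],  # ICD-10 style codes
        "p2": ["E78.0"],
        "p3": ["250.00"],        # ICD-9 style code
    }
    print(case_control_counts(toy))
    # {'E78': (2, 1), 'I10': (1, 2), '250': (1, 2)}
```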
Extended Data Fig. 2 Lipid GWAS in eMERGE.
Manhattan plots from GWAS (two-sided linear regression) of the four plasma lipid traits (HDL-C, LDL-C, TC, TG) in the eMERGE cohort. Each plot shows chromosomes 1 to 22 on the x axis and -log10(P) on the y axis.
Extended Data Fig. 3 Lipid GWAS in UKB.
Manhattan plots from GWAS (two-sided linear regression) of the four plasma lipid traits (HDL-C, LDL-C, TC, TG) in the UKB cohort. Each plot shows chromosomes 1 to 22 on the x axis and -log10(P) on the y axis.
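A Manhattan plot places autosomes 1 to 22 end to end on the x axis and plots -log10(P) on the y axis. The sketch below computes those coordinates from (chromosome, position, P) triples. It approximates each chromosome's length by its largest observed position, which is an assumption made only to keep the example self-contained; real plotting tools typically use reference chromosome lengths.

```python
import math


def manhattan_points(hits):
    """Convert (chromosome, position, P value) triples to plot coordinates.

    x: cumulative position with chromosomes 1-22 laid end to end
       (each chromosome's length approximated by its largest observed position).
    y: -log10(P). Non-autosomal rows (chromosome outside 1-22) are dropped.
    """
    autosomal = [h for h in hits if 1 <= h[0] <= 22]
    max_pos = {}
    for chrom, pos, _ in autosomal:
        max_pos[chrom] = max(max_pos.get(chrom, 0), pos)
    offsets, running = {}, 0
    for chrom in range(1, 23):
        offsets[chrom] = running          # start of this chromosome on the x axis
        running += max_pos.get(chrom, 0)  # advance by its (approximate) length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in autosomal]


if __name__ == "__main__":
    toy_hits = [(1, 100, 1e-8), (2, 50, 0.01), (1, 200, 0.1), (23, 10, 1e-3)]
    for x, y in manhattan_points(toy_hits):
        print(x, round(y, 2))
```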
Extended Data Fig. 4 Lipid TWAS P-values for novel lipid genes.
Synthesis-view plot showing -log10 P values for Bonferroni-significant ‘novel’ genes (two-sided gene-based tests: P < 5.57 × 10^−7) from lipid TWAS. These genes passed the coloc P[H3] < 0.5 filter in at least one cohort. The direction of each triangle indicates the direction of the gene effect from TWAS (left-facing, negative; right-facing, positive). Colors indicate the five selected tissues from GTEx v8 (adipose subcutaneous, adipose visceral omentum, liver, small intestine terminal ileum, whole blood).
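The caption combines two filters: a Bonferroni gene-based threshold (P < 5.57 × 10^−7) and a coloc P[H3] < 0.5 filter in at least one cohort, with triangle orientation given by the sign of the TWAS effect. The sketch below applies both thresholds as quoted. Whether the study applied them to the same cohort-tissue record is not stated here, so applying them jointly per record is an assumption, as are the record fields and toy gene names.

```python
from dataclasses import dataclass

BONFERRONI_P = 5.57e-7  # gene-based threshold quoted in the caption
MAX_PP_H3 = 0.5         # coloc P[H3] filter quoted in the caption


@dataclass
class TwasHit:
    gene: str
    cohort: str
    tissue: str
    z: float      # signed TWAS z-score; its sign gives the effect direction
    p: float      # two-sided gene-based P value
    pp_h3: float  # coloc posterior probability of distinct causal variants


def passing_cohorts(hits):
    """Map each gene to the cohorts where one record is Bonferroni-significant
    and also has P[H3] < 0.5 (joint per-record filtering is an assumption)."""
    passed = {}
    for h in hits:
        if h.p < BONFERRONI_P and h.pp_h3 < MAX_PP_H3:
            passed.setdefault(h.gene, set()).add(h.cohort)
    return passed


def triangle(h):
    """Triangle orientation as in the plot: left = negative, right = positive."""
    return "left" if h.z < 0 else "right"


if __name__ == "__main__":
    toy = [
        TwasHit("GENE_A", "UKB", "liver", -6.1, 1e-9, 0.10),       # passes both
        TwasHit("GENE_B", "eMERGE", "liver", 5.5, 1e-6, 0.10),     # fails P
        TwasHit("GENE_C", "UKB", "whole blood", 7.0, 1e-10, 0.9),  # fails P[H3]
    ]
    print(passing_cohorts(toy), [triangle(h) for h in toy])
```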
Extended Data Fig. 5 Colocalization probabilities of shared causal variant between lipids and gene expression for novel lipid genes.
Synthesis-view plot showing coloc P[H4] for Bonferroni-significant ‘novel’ genes (two-sided gene-based tests: P < 5.57 × 10^−7) obtained from lipid TWAS. These genes passed the coloc P[H3] < 0.5 filter in at least one cohort. The direction of each triangle indicates the direction of the gene effect from TWAS (left-facing, negative; right-facing, positive). Colors indicate the five selected tissues from GTEx v8 (adipose subcutaneous, adipose visceral omentum, liver, small intestine terminal ileum, whole blood). Coloc results are shown for all regions corresponding to each gene.
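P[H3] (distinct causal variants) and P[H4] (one shared causal variant) are posterior probabilities from the colocalization test of Giambartolomei et al. The sketch below reproduces the approximate-Bayes-factor form of that calculation from per-SNP effect estimates. The priors p1 = p2 = 1e-4, p12 = 1e-5 and prior_sd = 0.15 mirror defaults commonly used with the coloc R package. These settings are assumptions, not the study's stated parameters, and the toy inputs are illustrative only.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield approximate Bayes factor (natural-log scale) for one SNP.

    prior_sd is the assumed prior SD of the true effect (unit-SD trait scale).
    """
    v = se * se
    w = prior_sd * prior_sd
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(xs):
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference underflows."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits in one region.

    labf1, labf2: per-SNP log ABFs over the same ordered SNPs (at least two).
    """
    s1, s2 = logsumexp(labf1), logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                     # H0: no association
        math.log(p1) + s1,                                       # H1: trait 1 only
        math.log(p2) + s2,                                       # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),     # H3: distinct variants
        math.log(p12) + s12,                                     # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(x - total) for x in log_h]


if __name__ == "__main__":
    null, hit = log_abf(0.0, 0.05), log_abf(0.5, 0.05)
    lipid = [hit, null, null]       # toy lipid signal at SNP 0
    expr_same = [hit, null, null]   # toy eQTL at the same SNP
    expr_other = [null, hit, null]  # toy eQTL at a different SNP
    print([round(p, 3) for p in coloc_posteriors(lipid, expr_same)])
    print([round(p, 3) for p in coloc_posteriors(lipid, expr_other)])
```

With these toy inputs, a shared peak pushes the posterior toward H4, while peaks at different SNPs push it toward H3, which is the distinction the P[H3] < 0.5 filter relies on.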
Extended Data Fig. 6 Overlap of detected ICD codes between cohorts.
UpSet plot showing the overlap of diseases (ICD codes) with Bonferroni-significant genes between PheWAS and Xpress-PheWAS conducted in eMERGE and UKB.
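UpSet plots display, for each combination of sets, how many elements belong to exactly that combination and no other set. The sketch below computes those exclusive-intersection counts from named sets. The set names and element labels are hypothetical placeholders, not the study's results.

```python
def upset_counts(named_sets):
    """Count elements by their exact pattern of set membership (UpSet style).

    Returns {frozenset_of_set_names: number_of_elements_in_exactly_those_sets}.
    """
    membership = {}
    for name, items in named_sets.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    counts = {}
    for names in membership.values():
        key = frozenset(names)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    toy = {
        "PheWAS_eMERGE": {"d1", "d2", "d3"},
        "PheWAS_UKB": {"d1", "d2"},
        "XpressPheWAS_UKB": {"d1", "d4"},
    }
    for combo, n in sorted(upset_counts(toy).items(), key=lambda kv: -kv[1]):
        print(sorted(combo), n)
```

Counting by exact membership pattern, rather than by pairwise intersections, is what keeps each element in a single bar of the plot.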
Extended Data Fig. 7 Overlap of significant SNPs between lipid GWAS and lipid-guided PheWAS across cohorts.
UpSet plot showing the overlap of GWAS-significant SNPs (Bonferroni threshold) between each of the four plasma lipids (HDL-C, LDL-C, TC, TG), aggregated across the four cohorts considered (eMERGE, GERA, GLGC, UKB), and the lipid-guided PheWAS conducted in eMERGE and UKB.
Extended Data Fig. 8 Lipid-disease pleiotropy from lipid-guided PheWAS in either eMERGE or UKB.
Circos plot showing Bonferroni-significant SNPs from lipid-guided PheWAS (two-sided logistic regression) in either cohort (eMERGE or UKB). Outer track, number of SNPs detected in either cohort; inner track, significant ICD codes per disease category. Links, SNPs connecting lipids (salmon) to diseases (blue); link thickness, number of SNPs; link color, chromosome. Because of the large number of SNP associations involved, associations (links) in the HLA region (chromosome 6) are not shown.
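In this plot each link's thickness is the number of SNPs connecting a lipid to a disease category, its color is the chromosome, and HLA-region links are omitted. The sketch below aggregates association records into such link weights. The caption does not give HLA boundaries, so the coordinates below (the GRCh37 MHC span, chr6:28,477,797 to 33,448,354) are an assumption to be matched to the genome build actually used. The record layout and toy data are also hypothetical.

```python
from collections import Counter

# Assumed HLA/MHC bounds (GRCh37); not stated in the caption.
HLA_CHROM, HLA_START, HLA_END = 6, 28_477_797, 33_448_354


def in_hla(chrom, pos):
    return chrom == HLA_CHROM and HLA_START <= pos <= HLA_END


def circos_links(associations, drop_hla=True):
    """Aggregate (snp_id, chrom, pos, lipid, disease_category) records.

    Returns Counter {(lipid, disease_category, chrom): number of distinct SNPs},
    i.e. link thickness keyed by the endpoints plus the chromosome (link color).
    """
    seen = set()
    links = Counter()
    for snp, chrom, pos, lipid, category in associations:
        if drop_hla and in_hla(chrom, pos):
            continue  # HLA links are left out of the plot
        key = (lipid, category, chrom)
        if (snp, key) in seen:
            continue  # count each SNP once per link
        seen.add((snp, key))
        links[key] += 1
    return links


if __name__ == "__main__":
    toy = [
        ("rs1", 1, 100, "TG", "circulatory"),
        ("rs2", 1, 200, "TG", "circulatory"),
        ("rs1", 1, 100, "TG", "circulatory"),        # duplicate record
        ("rs3", 6, 30_000_000, "LDL-C", "nervous"),  # inside assumed HLA bounds
        ("rs4", 6, 50_000_000, "LDL-C", "nervous"),
    ]
    print(dict(circos_links(toy)))
```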
Extended Data Fig. 9 Overlap of significant genes between lipid TWAS and Xpress-PheWAS across cohorts.
UpSet plot showing the overlap of Bonferroni-significant genes between lipid TWAS and Xpress-PheWAS conducted in eMERGE and UKB. Lipid TWAS genes are split into two categories: (1) novel and (2) previously reported.
Extended Data Fig. 10 Effect sizes and confidence intervals from two-sample univariable Mendelian randomization analyses.
Mendelian randomization funnel plots depicting MR effect sizes (two-sided IVW and Egger approaches) across ICD codes detected as FDR-significant, excluding proof-of-concept diseases such as E78 (disorders of lipoprotein metabolism and other lipidemias) and I10 (essential (primary) hypertension); see Fig. 7 for a full list of FDR-significant diseases. Top five plots: exposure (lipid) dataset, GERA; outcome dataset, UKB. Remaining plots: exposure (lipid) dataset, UKB; outcome dataset, eMERGE.
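The caption names two two-sample MR estimators, IVW and MR-Egger, which combine SNP-exposure effects (here from a lipid GWAS) with SNP-outcome effects (from a disease GWAS). The sketch below implements their standard point estimates: fixed-effect IVW with first-order weights, and Egger as a weighted regression with an intercept after orienting SNPs to positive exposure effects. It is not the study's code. The reference list cites R tooling for MR (for example the MendelianRandomization package), which also offers random-effects IVW and Egger standard errors that are omitted here. The toy numbers are illustrative only.

```python
import math


def ivw_fixed(bx, by, se_y):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Equivalent to a weighted mean of per-SNP Wald ratios by/bx with weights
    bx^2 / se_y^2. Returns (estimate, standard error).
    """
    den = sum(x * x / (s * s) for x, s in zip(bx, se_y))
    num = sum(x * y / (s * s) for x, y, s in zip(bx, by, se_y))
    return num / den, math.sqrt(1.0 / den)


def mr_egger(bx, by, se_y):
    """MR-Egger point estimates (slope, intercept) by weighted least squares.

    SNPs are oriented so every SNP-exposure effect is positive; the intercept
    then captures average directional pleiotropy. Standard errors omitted.
    """
    oriented = [(abs(x), y if x >= 0 else -y) for x, y in zip(bx, by)]
    w = [1.0 / (s * s) for s in se_y]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, oriented)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, oriented)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, oriented))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, oriented))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    bx = [0.1, 0.2, -0.15, 0.3]       # toy SNP-lipid effects
    se_y = [0.01, 0.02, 0.015, 0.01]  # toy SEs of SNP-disease effects
    by = [0.13, 0.16, -0.145, 0.19]   # toy outcome effects: 0.3 * bx plus pleiotropy
    print(ivw_fixed(bx, by, se_y), mr_egger(bx, by, se_y))
```

In the toy data every SNP carries the same pleiotropic shift, so Egger recovers the slope and a nonzero intercept, while IVW (which forces the line through the origin) is pulled away from the slope.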
About this article
Cite this article
Veturi, Y., Lucas, A., Bradford, Y. et al. A unified framework identifies new links between plasma lipids and diseases from electronic medical records across large-scale cohorts. Nat Genet 53, 972–981 (2021). https://doi.org/10.1038/s41588-021-00879-y
DOI: https://doi.org/10.1038/s41588-021-00879-y