The power of genetic diversity in genome-wide association studies of lipids

Abstract

Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use1. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels2, heart disease remains the leading cause of death worldwide3. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups. These include differences in allele frequencies, effect sizes and linkage-disequilibrium patterns24. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine25, we anticipate that increased diversity of participants will lead to more accurate and equitable26 application of polygenic scores in clinical practice.

Fig. 1: Comparison of identified loci across ancestry groups.
Fig. 2: Inclusion of multiple ancestries drives improved fine-mapping.
Fig. 3: Multi-ancestry LDL-C PRS show similar performance across ancestry groups.

Data availability

The GWAS meta-analysis results (including both ancestry-specific and multi-ancestry analyses) and risk score weights are available at http://csg.sph.umich.edu/willer/public/glgc-lipids2021. The optimized multi-ancestry and single-ancestry PRS weights are deposited in the PGS Catalog (https://www.pgscatalog.org/) under accession numbers PGS000886–PGS000897 (all intervening numbers).
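As a rough illustration of how deposited PRS weights are typically applied, a polygenic score can be computed as a weighted sum of effect-allele dosages. The sketch below is not the authors' pipeline: the function name, the variant identifiers and the numeric weights are hypothetical, and it assumes that dosages are already aligned to the same effect allele as the weights.

```python
from typing import Dict


def polygenic_score(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Return the sum of weight * dosage over variants present in both inputs.

    Variants with a weight but no dosage (for example, not genotyped or
    imputed in this individual) are skipped rather than treated as zero
    by any explicit imputation step.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical per-variant weights (effect sizes); not taken from the PGS Catalog.
weights = {"var_a": 0.5, "var_b": -0.25, "var_c": 0.125}

# Hypothetical effect-allele dosages (0 to 2) for one individual; var_c is missing.
dosages = {"var_a": 2.0, "var_b": 1.0}

score = polygenic_score(weights, dosages)
print(score)
```

In practice, established tools handle allele alignment, strand flips and missing genotypes, which this sketch deliberately leaves out.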

Code availability

The code EasyQC is available at www.genepi-regensburg.de/easyqc, and Raremetal is available at https://github.com/SailajaVeda/raremetal.

References

  1. Taddei, C. et al. Repositioning of the global epicentre of non-optimal cholesterol. Nature 582, 73–77 (2020).

  2. Ference, B. A. et al. Low-density lipoproteins cause atherosclerotic cardiovascular disease. 1. Evidence from genetic, epidemiologic, and clinical studies. A consensus statement from the European Atherosclerosis Society Consensus Panel. Eur. Heart J. 38, 2459–2472 (2017).

  3. Roth, G. A. et al. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet 392, 1736–1788 (2018).

  4. Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).

  5. Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).

  6. Liu, D. J. et al. Exome-wide association study of plasma lipids in >300,000 individuals. Nat. Genet. 49, 1758–1766 (2017).

  7. Lu, X. et al. Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants that contribute to lipid levels and coronary artery disease. Nat. Genet. 49, 1722–1730 (2017).

  8. Kathiresan, S. et al. A genome-wide association study for blood lipid phenotypes in the Framingham Heart Study. BMC Med. Genet. 8, S17 (2007).

  9. Kathiresan, S. et al. Polymorphisms associated with cholesterol and risk of cardiovascular events. N. Engl. J. Med. 358, 1240–1249 (2008).

  10. Peloso, G. M. et al. Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks. Am. J. Hum. Genet. 94, 223–232 (2014).

  11. Hoffmann, T. J. et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat. Genet. 50, 401–413 (2018).

  12. Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).

  13. Klarin, D. et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nat. Genet. 50, 1514–1523 (2018).

  14. Holmen, O. L. et al. Systematic evaluation of coding variation identifies a candidate causal variant in TM6SF2 influencing total cholesterol and myocardial infarction risk. Nat. Genet. 46, 345–351 (2014).

  15. Asselbergs, F. W. et al. Large-scale gene-centric meta-analysis across 32 studies identifies multiple lipid loci. Am. J. Hum. Genet. 91, 823–838 (2012).

  16. Albrechtsen, A. et al. Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes. Diabetologia 56, 298–310 (2013).

  17. Saxena, R. et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316, 1331–1336 (2007).

  18. Iotchkova, V. et al. Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps. Nat. Genet. 48, 1303–1312 (2016).

  19. Tachmazidou, I. et al. A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates. Nat. Commun. 4, 2872 (2013).

  20. Tang, C. S. et al. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese. Nat. Commun. 6, 10206 (2015).

  21. van Leeuwen, E. M. et al. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels. Nat. Commun. 6, 6065 (2015).

  22. Spracklen, C. N. et al. Association analyses of East Asian individuals and trans-ancestry analyses with European individuals reveal new loci associated with cholesterol and triglyceride levels. Hum. Mol. Genet. 26, 1770–1784 (2017).

  23. Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).

  24. Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).

  25. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

  26. Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).

  27. Buniello, A. et al. The NHGRI–EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).

  28. Tishkoff, S. A. et al. The genetic structure and history of Africans and African Americans. Science 324, 1035–1044 (2009).

  29. Mägi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet. 26, 3639–3650 (2017).

  30. Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & Wray, N. R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542 (2012).

  31. Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).

  32. Guo, J. et al. Quantifying genetic heterogeneity between continental populations for human height and body mass index. Sci. Rep. 11, 5240 (2021).

  33. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).

  34. Majara, L. et al. Low generalizability of polygenic scores in African populations due to genetic and environmental diversity. Preprint at bioRxiv https://doi.org/10.1101/2021.01.12.426453 (2021).

  35. Lehmann, B. C. L., Mackintosh, M., McVean, G. & Holmes, C. C. High trait variability in optimal polygenic prediction strategy within multiple-ancestry cohorts. Preprint at bioRxiv https://doi.org/10.1101/2021.01.15.426781 (2021).

  36. Shi, H. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 12, 1098 (2021).

  37. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  38. Cavazos, T. B. & Witte, J. S. Inclusion of variants discovered from diverse populations improves polygenic risk score transferability. HGG Adv. 2, 100017 (2021).

  39. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).

  40. Bentley, A. R. et al. Multi-ancestry genome-wide gene–smoking interaction study of 387,272 individuals identifies new loci associated with serum lipids. Nat. Genet. 51, 636–648 (2019).

  41. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).

  42. Kowalski, M. H. et al. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet. 15, e1008500 (2019).

  43. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).

  44. Baigent, C. et al. Efficacy and safety of cholesterol-lowering treatment: prospective meta-analysis of data from 90 056 participants in 14 randomised trials of statins. Lancet 366, 1267–1278 (2005).

  45. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

  46. Zhou, W. et al. Efficiently controlling for case–control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).

  47. Winkler, T. W. et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–1212 (2014).

  48. Feng, S., Liu, D., Zhan, X., Wing, M. K. & Abecasis, G. R. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics 30, 2828–2829 (2014).

  49. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  50. Loh, P.-R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. 48, 811–816 (2016).

  51. Liu, X. et al. WGSA: an annotation pipeline for human genome sequencing studies. J. Med. Genet. 53, 111–112 (2016).

  52. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

  53. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  54. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  55. Liu, D. J. et al. Meta-analysis of gene-level tests for rare variant association. Nat. Genet. 46, 200–204 (2014).

  56. Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).

  57. Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).

  58. Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).

  59. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  60. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

  61. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  62. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  63. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  64. Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).

  65. Finer, S. et al. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. Int. J. Epidemiol. 49, 20–21i (2019).

  66. Moon, S. et al. The Korea Biobank Array: design and identification of coding variants associated with blood biochemical traits. Sci. Rep. 9, 1382 (2019).

  67. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

Acknowledgements

Funding for the Global Lipids Genetics Consortium was provided by the NIH (R01-HL127564). This research was conducted using the UK Biobank Resource under application number 24460. We acknowledge S. Caron for computing support and file management for the central meta-analysis. This research is based on data from the MVP, Office of Research and Development, Veterans Health Administration, and was supported by awards 2I01BX003362-03A1 and 1I01BX004821-01A1. This publication does not represent the views of the Department of Veterans Affairs or the United States Government. Study-specific acknowledgements are provided in the Supplementary Information.

Author information

Authors and Affiliations

Authors