Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry1,2,3. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific4,5,6,7,8,9,10. Additionally, effect sizes estimated in one population, and the risk prediction scores derived from them, may not accurately extrapolate to other populations11,12. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, and replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions13, the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease.
We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
Individual-level phenotype and genotype data are available through dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000356). Allele frequency data will be available for all genotyped sites on dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/) and the University of Chicago Geography of Genetic Variants Browser (http://popgen.uchicago.edu/ggv/). Clinically relevant variant frequency data are available through ClinGen (https://curation.clinicalgenome.org/). Summary statistics for the genome-wide association study results are available through the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics).
Need, A. C. & Goldstein, D. B. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 25, 489–494 (2009).
Bustamante, C. D., Burchard, E. G. & De La Vega, F. M. Genomics for the world. Nature 475, 163–165 (2011).
Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).
Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. USA 108, 11983–11988 (2011).
The SIGMA Type 2 Diabetes Consortium. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. J. Am. Med. Assoc. 311, 2305–2314 (2014).
Gudmundsson, J. et al. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat. Genet. 44, 1326–1329 (2012).
Moltke, I. et al. A common Greenlandic TBC1D4 variant confers muscle insulin resistance and type 2 diabetes. Nature 512, 190–193 (2014).
Kenny, E. E. et al. Melanesian blond hair is caused by an amino acid change in TYRP1. Science 336, 554 (2012).
Manning, A. et al. A low-frequency inactivating AKT2 variant enriched in the Finnish population is associated with fasting insulin levels and type 2 diabetes risk. Diabetes 66, 2019–2032 (2017).
Han, Y. et al. Prostate cancer susceptibility in men of African ancestry at 8q24. J. Natl Cancer Inst. 108, djv431 (2016).
Carlson, C. S. et al. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol. 11, e1001661 (2013).
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
Liao, Y. et al. Surveillance of health status in minority communities — racial and ethnic approaches to community health across the U.S. (REACH U.S.) risk factor survey, United States, 2009. MMWR Surveill. Summ. 60, 1–44 (2011).
Wojcik, G. L. et al. Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. G3 (Bethesda) 8, 3255–3267 (2018).
Rosenberg, N. A. et al. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. 1, e70 (2005).
Conomos, M. P. et al. Genetic diversity and association studies in US Hispanic/Latino populations: applications in the Hispanic community health study/study of Latinos. Am. J. Hum. Genet. 98, 165–184 (2016).
Conomos, M. P., Miller, M. B. & Thornton, T. A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 39, 276–293 (2015).
Conomos, M. P., Reiner, A. P., Weir, B. S. & Thornton, T. A. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 98, 127–148 (2016).
Lin, D.-Y. et al. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am. J. Hum. Genet. 95, 675–688 (2014).
Lin, D. Y. & Zeng, D. On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika 97, 321–332 (2010).
Fadista, J., Manning, A. K., Florez, J. C. & Groop, L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur. J. Hum. Genet. 24, 1202–1205 (2016).
MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
Shim, H. et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE 10, e0120758 (2015).
Bien, S. A. et al. Strategies for enriching variant coverage in candidate disease loci on a multiethnic genotyping array. PLoS ONE 11, e0167758 (2016).
Lacy, M. E. et al. Association of sickle cell trait with hemoglobin A1c in African Americans. J. Am. Med. Assoc. 317, 507–515 (2017).
Lin, C.-N. et al. Effects of hemoglobin C, D, E, and S traits on measurements of HbA1c by six methods. Clin. Chim. Acta 413, 819–821 (2012).
Mongia, S. K. et al. Effects of hemoglobin C and S traits on the results of 14 commercial glycated hemoglobin assays. Am. J. Clin. Pathol. 130, 136–140 (2008).
Roberts, W. L. et al. Effects of hemoglobin C and S traits on glycohemoglobin measurements by eleven methods. Clin. Chem. 51, 776–778 (2005).
Henn, B. M. et al. Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl Acad. Sci. USA 108, 5154–5162 (2011).
Baker, J. L., Shriner, D., Bentley, A. R. & Rotimi, C. N. Pharmacogenomic implications of the evolutionary history of infectious diseases in Africa. Pharmacogenomics J. 17, 112–120 (2017).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Collins, F. S. & Varmus, H. A new initiative on precision medicine. N. Engl. J. Med. 372, 793–795 (2015).
Colby, S. L. & Ortman, J. M. Projections of the Size and Composition of the U.S. Population: 2014 to 2060 (United States Census Bureau, 2015).
United Nations Population Fund. State of World Population 2016. http://www.unfpa.org/swop (2016).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2012).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
The Population Architecture Using Genomics and Epidemiology (PAGE) program is funded by the National Human Genome Research Institute (NHGRI) with co-funding from the National Institute on Minority Health and Health Disparities (NIMHD). The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health (NIH). The PAGE consortium thanks the staff and participants of all PAGE studies for their contributions. We thank R. Williams and M. Ginoza for providing assistance with program coordination. The complete list of PAGE members can be found at http://www.pagestudy.org. Assistance with data management, data integration, data dissemination, genotype imputation, ancestry deconvolution, population genetics, analysis pipelines and general study coordination was provided by the PAGE Coordinating Center (NIH U01HG007419). Genotyping services were provided by the Center for Inherited Disease Research (CIDR). The CIDR is fully funded through a federal contract from the NIH to The Johns Hopkins University, contract number HHSN268201200008I. Genotype data quality control and quality assurance services were provided by the Genetic Analysis Center in the Biostatistics Department of the University of Washington, through support provided by the CIDR contract. The data and materials included in this report result from collaboration between the following studies and organizations: BioMe Biobank, HCHS/SOL, MEC, PAGE Global Reference Panel and WHI. Their funding is listed below and additional acknowledgements can be found in Supplementary Information 12. The BioMe Biobank received funding for the PAGE IPM BioMe Biobank study through the National Human Genome Research Institute (NIH U01HG007417). Primary funding support to K.E.N., M.G., R.T., H.M.H., C.L.A., C.J.H., A.E.J., B.M.L., M.A.R., K.L.Y., E.B., L.F., M.F., G.H., D.L., C.L.W. and S.Y. (as part of HCHS/SOL) is provided by U01HG007416. 
Additional support was provided via R01DK101855 and 15GRNT25880008. The HCHS/SOL study was carried out as a collaborative study supported by contracts from the National Heart, Lung and Blood Institute (NHLBI) to the University of North Carolina (N01-HC65233), University of Miami (N01-HC65234), Albert Einstein College of Medicine (N01-HC65235), Northwestern University (N01-HC65236) and San Diego State University (N01-HC65237). The Multiethnic Cohort study (MEC) characterization of epidemiological architecture is funded through the NHGRI PAGE program (NIH U01 HG007397). The MEC study is funded through the National Cancer Institute U01 CA164973. The Stanford Global Reference Panel was created by Stanford-contributed samples and comprises multiple datasets from multiple researchers across the world designed to provide a resource for any researchers interested in diverse population data on the Multi-Ethnic Global Array (MEGA), funded by the NHGRI PAGE program (NIH U01HG007419). The authors thank the researchers and research participants who made this dataset available to the community. Funding support for the ‘Exonic variants and their relation to complex traits in minorities of the WHI’ study is provided through the NHGRI PAGE program (NIH U01HG007376). The WHI program is funded by the NHLBI, NIH, US Department of Health and Human Services through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C and HHSN271201100004C. K.K.N. was supported by the Cancer Prevention Training Grant in Nutrition, Exercise and Genetics R25CA094880 from the National Cancer Institute. C.R.G. was supported by NHGRI training grant T32 HG000044. H.M.H. was supported by NHLBI training grant T32 HL007055. A.E.J. was supported by NIH 5K99HL130580-02 and NIH L60 MD008384-02. K.L.Y. was supported by NCATS KL2TR001109. J.M.K. was supported by KL2TR000421. R.W.W. was supported by NIH 5T32HD049311-07. D.-Y.L. was supported by R01CA082659, R01GM047845 and P01CA142538. L.F.-R. was supported by NICHD training grant T32 HD007168 and P2C HD050924. T.A.T. was supported by P01GM099568.
Nature thanks André G. Uitterlinden and the other anonymous reviewer(s) for their contribution to the peer review of this work.
C.D.B. is a member of the scientific advisory boards for Liberty Biosecurity, Personalis, 23andMe Roots into the Future, Ancestry.com, IdentifyGenomics and Etalon, and is a founder of CDB Consulting. C.R.G. and B.M.H. own stock in 23andMe. E.E.K. and C.R.G. are members of the scientific advisory board for Encompass Bioscience. E.E.K. consults for Illumina.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Number of unique participants in the GWAS Catalog from 2006 to 2017 (inclusive).
Although the number of unique participants (in millions) in the GWAS Catalog has grown substantially over the past decade, the relative proportion of participants of non-European descent has remained constant, with most of the progress occurring within Asian populations.
a, The correlation (r²) for novel and residual loci, calculated by obtaining the individual-level data for all PAGE participants and correlating the SNP genotype with each of the ten PCs. The correlation between each locus and each of the ten PCs is plotted on the y axis; novel loci are plotted in grey and residual loci in yellow. We observed an especially high correlation between a novel locus and PC4, which represents Native Hawaiian/Pacific Islander ancestry. b, The individual-level data for all PAGE participants were obtained and plotted in a parallel coordinates plot, such that each PAGE individual is represented by a set of line segments connecting their eigenvalues. This allows us to see which race/ethnicity groups are differentiated at each PC. For example, we see predominantly green lines as outliers for PC4, which indicates that this vector represents a continuum of Native Hawaiian/Pacific Islander ancestry.
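The calculation described in panel a, correlating one SNP's genotypes with each of ten PCs, can be illustrated with a short sketch. This is not the PAGE analysis code: the function name `snp_pc_r2`, the simulated genotypes and the simulated PCs are all assumptions made for illustration, and r² is taken here to be the squared Pearson correlation.

```python
import numpy as np


def snp_pc_r2(genotypes, pcs):
    """Squared Pearson correlation between one SNP and each PC.

    genotypes: shape (n,), allele dosages (for example 0, 1 or 2).
    pcs:       shape (n, k), one column per principal component.
    Assumes the SNP and every PC have non-zero variance.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(pcs, dtype=float)
    g_centered = g - g.mean()
    p_centered = p - p.mean(axis=0)
    covariance = g_centered @ p_centered  # shape (k,)
    scale = np.sqrt((g_centered @ g_centered) * (p_centered ** 2).sum(axis=0))
    return (covariance / scale) ** 2


# Simulated data (hypothetical): allele frequency varies along the fourth PC.
rng = np.random.default_rng(0)
n_individuals = 500
pcs = rng.normal(size=(n_individuals, 10))
allele_freq = 1.0 / (1.0 + np.exp(-2.0 * pcs[:, 3]))
genotypes = rng.binomial(2, allele_freq)

r2 = snp_pc_r2(genotypes, pcs)
print("PC with the highest r2:", int(np.argmax(r2)) + 1)
```

A locus whose allele frequency tracks a single ancestry component will show a high r² with that PC and values near zero for the others, which is the pattern the caption reports for one novel locus and PC4.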
This file includes detailed descriptions of PAGE participating studies, phenotype harmonization, genotyping and imputation, population substructure characterization, the comparison of meta- and mega-analyses, extended statistical methods, and characterization of clinically-relevant variants, with 17 Supplementary Figures. Acknowledgements not included in the main text are also listed.
This file contains Supplementary Tables 1–7: Supplementary Table 1 Phenotypes in PAGE, both combined and stratified by self-identified race/ethnicity; Supplementary Table 2 Results from SUGEN and GENESIS of novel and secondary loci reaching genome-wide significance across all 26 traits; Supplementary Table 3 Results from SUGEN stratified by self-identified race/ethnicity and combined in fixed-effects meta-analysis for all novel and secondary loci across all 26 traits; Supplementary Table 4 Results from SUGEN and GENESIS for all previously reported loci in the combined sample (mega-analysis) for each continuous trait; Supplementary Table 5 All known variants with reference information for the indicated traits. This includes rsID, PubmedID, citation, sample descriptors (both discovery and replication), and reported gene; Supplementary Table 6 Bibliography and study descriptors for the largest published manuscript by trait in the NHGRI-EBI GWAS Catalog; Supplementary Table 7 Comparison of effect sizes (both as-published and standardized for sample size) of previously reported trait-loci associations between the NHGRI-EBI GWAS Catalog and PAGE GWAS results.
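Supplementary Table 3 refers to combining race/ethnicity-stratified results in a fixed-effects meta-analysis. As a rough illustration of that idea, the sketch below applies textbook inverse-variance weighting and adds Cochran's Q and I² as simple heterogeneity summaries. It is not the SUGEN or PAGE pipeline; the function name `fixed_effects_meta` and the example effect sizes are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-stratum effect estimates for one variant.
    ses:   matching standard errors (all > 0).
    Returns (pooled beta, pooled SE, Cochran's Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in ses]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation beyond what chance alone would give.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled_beta, pooled_se, q, i2


# Hypothetical per-stratum estimates for a single variant.
homogeneous = fixed_effects_meta([0.10, 0.10, 0.10], [0.02, 0.04, 0.04])
heterogeneous = fixed_effects_meta([0.20, 0.00], [0.10, 0.10])
print(homogeneous)
print(heterogeneous)
```

When the per-stratum estimates agree, Q stays near zero and the pooled estimate mainly gains precision; when they diverge, a large Q or I² flags the kind of effect-size heterogeneity across ancestries that the abstract describes.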