Abstract
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT in improving the trans-ancestry portability of genetic data.
Data availability
Data are available at: IMPACT Github repository: https://github.com/immunogenomics/IMPACT; IMPACT 707 annotations: https://github.com/immunogenomics/IMPACT/tree/master/IMPACT707. Data were obtained from the following resources: HOMER: http://homer.ucsd.edu/homer/motif/; S-LDSC: https://github.com/bulik/ldsc; 1000 Genomes: http://www.internationalgenome.org/; cell-type-specifically expressed gene set annotations and LD scores: https://data.broadinstitute.org/alkesgroup/LDSCORE/LDSC_SEG_ldscores/; cell-type-specific histone modification ChIP–seq datasets: https://data.broadinstitute.org/alkesgroup/LDSCORE/; Plink: https://www.cog-genomics.org/plink2; Riken website: http://jenger.riken.jp/en/; Price Lab GWAS summary statistics: https://data.broadinstitute.org/alkesgroup/sumstats_formatted/; Neale Lab GWAS summary statistics: http://www.nealelab.is/uk-biobank; GWAS catalog: https://www.ebi.ac.uk/gwas/; Deep Learning: https://data.broadinstitute.org/alkesgroup/LDSCORE/DeepLearning/.
Code availability
We have provided code to recreate our analyses at https://github.com/immunogenomics/IMPACT/tree/master/IMPACT707/AnalysisCode.
Acknowledgements
This work is supported in part by funding from the National Institutes of Health (grant nos. NHGRI T32 HG002295, UH2AR067677, 1U01HG009088, U01 HG009379 and 1R01AR063759).
Author information
Contributions
T.A., K.I. and S.R. conceived and designed the study. T.A., K.I., A.L.P. and S.R. conducted statistical genetic analysis. T.A. and S.R. conducted functional genomic data analysis. H.S., T.O. and E.K. performed TF ChIP–seq data collection and analysis. K.K.D., M.K. and A.L.P. performed deep learning analysis. K.I., K.M., Y.M. and C.T. managed and analyzed BBJ data. T.A., K.I. and S.R. wrote the initial draft of the manuscript. All co-authors contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Data collection.
a) TF ChIP-seq collection from NCBI: (left) cell type and TF diversity, where ‘Cell Deriv’ indicates the number of unique parental cell types (for example, GM12878 and GM10847 are both B cell lines); (right) diversity of tissue types. b) (left) Epigenomic and sequence features to be used in IMPACT models; (right) diversity of histone modification ChIP-seq in features. c) Diversity of European (EUR) and East Asian (EAS) GWAS summary statistics across phenotypic categories.
Extended Data Fig. 2 IMPACT annotation-trait associations.
Significant cell type-phenotype associations across 707 IMPACT regulatory annotations and 111 complex traits and diseases at 5% FDR for τ*. Color indicates the −log10 FDR-adjusted P value of τ*. Zoomed panels show particular cell type categories enriched for polygenic trait associations.
Extended Data Fig. 3 Proportion of heritability in the top 5% of SNPs.
a) Common SNP heritability captured by the top 5% of SNPs according to the lead cell type association for each EUR GWAS. The lead association is the one with the largest significantly positive τ* estimate. b) The same for each EAS GWAS. Gray bars indicate the standard error of the heritability estimate. Color represents the category of the complex trait or disease.
Extended Data Fig. 4 τ* comparison of IMPACT annotations versus cell-type-specific histone marks.
Comparison of how well two functional annotations, IMPACT and cell-type-specific histone marks, capture polygenic heritability, assessed by the per-SNP heritability estimate τ*. Circled are five representative traits used throughout the study: asthma, RA, PrCa, MCV, and height.
Extended Data Fig. 5 Common per-SNP heritability (τ*) estimate for sets of independent IMPACT cell type annotations across 29 traits.
Dotted line is the identity line, y=x. τ* values with their standard errors are colored green if significantly positive in EUR and not EAS, red if significantly positive in EAS but not in EUR, green if significantly positive in both EUR and EAS, and gray if not significantly positive in either population.
Extended Data Fig. 6 Population concordance of heterozygosity (2pq) among variants prioritized by IMPACT compared to standard P+T.
a) Heterozygosity of variants from genome-wide EUR and EAS PrCa summary statistics in the top 5% of the lead IMPACT annotation for EUR PrCa. b) Heterozygosity of variants from genome-wide EUR and EAS PrCa summary statistics using standard P+T. c) Heterozygosity of variants from genome-wide EUR and EAS PrCa summary statistics in the bottom 95% of the lead IMPACT annotation for PrCa; mutually exclusive with SNPs in a). d) Meta-analysis of heterozygosity correlations between populations across 21 traits shared between EUR and EAS cohorts over 17 GWAS P value thresholds (with reference to the EUR GWAS).
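As a rough illustration of the quantity plotted in this figure, the sketch below computes heterozygosity 2pq from allele frequencies and correlates it between two populations. The allele frequencies and the helper names (`heterozygosity`, `pearson`) are hypothetical and are not taken from the study's analysis code; the use of a Pearson correlation is also an assumption.

```python
import math

def heterozygosity(p):
    """Expected heterozygosity 2pq of a biallelic variant with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical allele frequencies for the same five variants in two populations.
eur_freqs = [0.10, 0.25, 0.40, 0.50, 0.05]
eas_freqs = [0.12, 0.30, 0.35, 0.45, 0.20]

eur_het = [heterozygosity(p) for p in eur_freqs]
eas_het = [heterozygosity(p) for p in eas_freqs]

# Concordance of heterozygosity between the two populations.
concordance = pearson(eur_het, eas_het)
print(f"heterozygosity correlation: {concordance:.3f}")
```

A higher correlation for a prioritized variant set would indicate that the selected variants have more similar heterozygosity in the two populations.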
Extended Data Fig. 7 Population divergence, measured by Fst, among variants prioritized by IMPACT compared to standard P+T.
Larger values indicate a reduction in heterozygosity. Meta-analysis of Fst between EUR and EAS populations across 21 traits shared between EUR and EAS cohorts over 17 GWAS P value thresholds (with reference to the EUR GWAS).
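To make the link between Fst and heterozygosity concrete, here is a minimal sketch of a textbook two-population Fst, defined as (H_T − H_S)/H_T with equal population weights. This is an assumption chosen for illustration; the caption does not state which Fst estimator the study used, and `fst_two_populations` is a hypothetical helper.

```python
def fst_two_populations(p1, p2):
    """Wright-style Fst for one biallelic variant, weighting both populations equally.

    H_S is the mean within-population heterozygosity and H_T is the
    heterozygosity expected in the pooled population.
    """
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Variant is monomorphic in both populations; no divergence to measure.
        return 0.0
    return (h_t - h_s) / h_t

# Identical frequencies give no divergence; opposite fixation gives the maximum.
print(fst_two_populations(0.5, 0.5))
print(fst_two_populations(0.0, 1.0))
print(round(fst_two_populations(0.2, 0.8), 2))
```

Under this definition, larger values mean that within-population heterozygosity is reduced relative to the pooled population.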
Extended Data Fig. 8 EUR PRS model evaluated on EAS individuals from BBJ.
For each trait, we evaluate the predictive value of standard PRS models (top 100% of IMPACT SNPs) and functionally informed PRS models (using a subset of SNPs prioritized by IMPACT). The top 100% of SNPs according to IMPACT represents the PRS model with no functional annotation information. Intervals represent the 95% CI around the R² estimate. For quantitative traits, R² represents the proportion of variance captured by the linear PRS model. For case–control traits, R² represents the liability scale R² from the logistic regression PRS model.
Extended Data Fig. 9 Trans-ethnic and within-population PRS models evaluated on the same 5,000 BBJ individuals.
a) Phenotypic variance (R²) in 5,000 BBJ individuals explained by IMPACT-informed PRS-EUR (light pink) and standard PRS-EUR (light blue). b) Phenotypic variance (R²) in 5,000 BBJ individuals explained by IMPACT-informed PRS-EAS (light pink) and standard PRS-EAS (light blue). Error bars indicate 95% CI calculated via 1,000 bootstraps.
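The bootstrap confidence intervals mentioned in this caption can be sketched as follows for a quantitative trait. This is a minimal percentile bootstrap on simulated data, not the study's implementation: the function names, the simulated scores and phenotypes, and the choice of the percentile method are all assumptions, and the liability-scale R² used for case–control traits is not covered.

```python
import random

def r_squared(scores, phenos):
    """Squared Pearson correlation, equal to R² of a one-predictor linear model."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_p = sum(phenos) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, phenos))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in phenos)
    if var_s == 0.0 or var_p == 0.0:
        return 0.0
    return cov * cov / (var_s * var_p)

def bootstrap_ci(scores, phenos, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for R², resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(r_squared([scores[i] for i in idx],
                                   [phenos[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated (hypothetical) polygenic scores and phenotypes for 200 individuals.
sim = random.Random(1)
scores = [sim.gauss(0, 1) for _ in range(200)]
phenos = [0.3 * s + sim.gauss(0, 1) for s in scores]

point = r_squared(scores, phenos)
ci_low, ci_high = bootstrap_ci(scores, phenos, n_boot=1000)
print(f"R² = {point:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```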
Extended Data Fig. 10 PRS accuracy is robust to loci of large effect.
We recomputed confidence intervals around the R² estimates (panels a and b) and around the relative improvements in R² estimates of IMPACT PRS over standard P+T PRS (panels c and d) via block jackknife across the genome, using 200 adjacent equally sized bins and iteratively removing variants within each bin and computing the R². a) Trans-ethnic analysis of EUR PRS to BBJ individuals. b) Within-population analysis of EAS PRS to BBJ individuals. Error bars indicate 95% confidence interval (CI) around the R² estimates. c) Trans-ethnic analysis of EUR PRS to BBJ individuals, relative improvement in R² estimates defined as (IMPACT R² - standard P+T R²)/standard P+T R². d) Within-population analysis of EAS PRS to BBJ individuals, relative improvement in R² estimates defined as (IMPACT R² - standard P+T R²)/standard P+T R².
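The block jackknife described in this caption can be sketched on toy data as below. The sketch assumes variants are already in genome order, uses the standard delete-one-block jackknife standard error, and builds a normal-approximation 95% CI; the caption does not spell out these details, so they are assumptions, as are all function names and the simulated data (10 blocks here rather than 200).

```python
import math
import random

def r_squared(xs, ys):
    """Squared Pearson correlation between a PRS and a quantitative phenotype."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0.0 or vy == 0.0 else cov * cov / (vx * vy)

def relative_improvement(r2_impact, r2_standard):
    """(IMPACT R² - standard P+T R²) / standard P+T R², as defined in the caption."""
    return (r2_impact - r2_standard) / r2_standard

def block_jackknife_r2(genotypes, weights, phenos, n_blocks):
    """Return (R² on all variants, jackknife standard error).

    genotypes[i][j] is the dosage of variant j in individual i; each block of
    adjacent variants is removed in turn and R² is recomputed.
    """
    n_var = len(weights)
    block_size = math.ceil(n_var / n_blocks)
    full_prs = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    full_r2 = r_squared(full_prs, phenos)
    leave_out = []
    for start in range(0, n_var, block_size):
        stop = min(start + block_size, n_var)
        # Subtract the block's contribution from each individual's score.
        prs = [full - sum(weights[j] * row[j] for j in range(start, stop))
               for full, row in zip(full_prs, genotypes)]
        leave_out.append(r_squared(prs, phenos))
    g = len(leave_out)
    mean_lo = sum(leave_out) / g
    se = math.sqrt((g - 1) / g * sum((x - mean_lo) ** 2 for x in leave_out))
    return full_r2, se

# Toy data: 100 individuals, 40 variants (hypothetical, for illustration only).
rng = random.Random(7)
n_ind, n_var = 100, 40
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_var)] for _ in range(n_ind)]
weights = [rng.gauss(0, 0.1) for _ in range(n_var)]
phenos = [sum(w * g for w, g in zip(weights, row)) + rng.gauss(0, 1)
          for row in genotypes]

full_r2, se = block_jackknife_r2(genotypes, weights, phenos, n_blocks=10)
ci_low, ci_high = full_r2 - 1.96 * se, full_r2 + 1.96 * se
print(f"R² = {full_r2:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

If a single locus of large effect drove the prediction, removing its block would change R² sharply and widen the interval, which is the sensitivity this analysis is checking for.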
Supplementary information
Supplementary Information
Supplementary Note and Figs. 1–23
Supplementary Tables
Supplementary Tables 1–22
About this article
Cite this article
Amariuta, T., Ishigaki, K., Sugishita, H. et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat Genet 52, 1346–1354 (2020). https://doi.org/10.1038/s41588-020-00740-8
DOI: https://doi.org/10.1038/s41588-020-00740-8