Abstract
Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n_total = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
Data availability
The genotype data of BBJ used in this study are available from the Japanese Genotype-phenotype Archive (JGA) with accession codes JGAS000114/JGAD000123 and JGAS000114/JGAD000220, which can be accessed through application at https://humandbs.biosciencedbc.jp/en/hum0014-latest. The UKB analysis was conducted via application number 47821. The genotype and phenotype data can be accessed through application at https://www.ukbiobank.ac.uk. This study used the FinnGen release 3 data. Summary results can be accessed through application at https://www.finngen.fi/en/access_results. We provide downloadable full GWAS summary statistics with an interactive visualization of Manhattan, LocusZoom and PheWAS plots at our PheWeb.jp website (https://pheweb.jp/). The summary statistics of GWASs in this study (BioBank Japan, European and cross-population meta-analyses) are also deposited at the National Bioscience Database Center (NBDC) Human Database (https://humandbs.biosciencedbc.jp/en/) with the accession code hum0197, and the GWAS Catalog (https://www.ebi.ac.uk/gwas/) with the study accession IDs from GCST90018563 (https://www.ebi.ac.uk/gwas/studies/GCST90018563) to GCST90019002 (https://www.ebi.ac.uk/gwas/studies/GCST90019002) (full IDs are described in the Supplementary Notes). The summary statistics of metabolite GWASs in the Japanese population (Tohoku Medical Megabank Organization) which we used for decomposition–projection analysis are available at https://jmorp.megabank.tohoku.ac.jp/202102/gwas/TGA000005. We used gnomAD database (https://gnomad.broadinstitute.org/) to refer to the allele frequencies.
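The GWAS Catalog accession range quoted above can be expanded programmatically. The following is a minimal sketch, assuming the study accessions are consecutive integers between the two endpoints (the Supplementary Notes list the full IDs and should be treated as authoritative) and that each study page follows the URL pattern shown in the text. The helper name `accession_range` is hypothetical.

```python
# Hypothetical helper: expand the GWAS Catalog accession range quoted in the text.
# Assumption: accessions are consecutive between the two endpoints.
BASE_URL = "https://www.ebi.ac.uk/gwas/studies/"
PREFIX = "GCST"


def accession_range(first: str, last: str) -> list[str]:
    """Return every accession from `first` to `last`, inclusive."""
    start = int(first[len(PREFIX):])
    stop = int(last[len(PREFIX):])
    if start > stop:
        raise ValueError("first accession must not exceed last accession")
    width = len(first) - len(PREFIX)  # keep the original zero padding
    return [f"{PREFIX}{n:0{width}d}" for n in range(start, stop + 1)]


accessions = accession_range("GCST90018563", "GCST90019002")
urls = [BASE_URL + acc for acc in accessions]
print(len(accessions), urls[0], urls[-1])
```

Under the consecutive-ID assumption the range spans 440 accessions.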
Code availability
We used publicly available software for the analyses. The software used is listed and described in the Methods section of our manuscript.
Acknowledgements
We thank all the participants of BioBank Japan, UK Biobank and FinnGen. We thank K. Watanabe for her input in the analysis of phenotypic correlations and pleiotropy. This research was supported by the Tailor-Made Medical Treatment program (the BioBank Japan Project) of the Ministry of Education, Culture, Sports, Science, and Technology (MEXT), the Japan Agency for Medical Research and Development (AMED). The FinnGen project is funded by two grants from Business Finland (grant nos. HUS 4685/31/2016 and UH 4386/31/2016) and nine industry partners (AbbVie, AstraZeneca, Biogen, Celgene, Genentech, GSK, MSD, Pfizer and Sanofi). The following biobanks are acknowledged for collecting the FinnGen project samples: Auria Biobank (https://www.auria.fi/biopankki/), THL Biobank (https://thl.fi/fi/web/thl-biopank), Helsinki Biobank (https://www.terveyskyla.fi/helsinginbiopankki/), Northern Finland Biobank Borealis (https://www.ppshp.fi/Tutkimus-ja-opetus/Biopankki), Finnish Clinical Biobank Tampere (https://www.tays.fi/biopankki), Biobank of Eastern Finland (https://ita-suomenbiopankki.fi), Central Finland Biobank (https://www.ksshp.fi/fi-FI/Potilaalle/Biopankki), Finnish Red Cross Blood Service Biobank (https://www.bloodservice.fi/Research%20Projects/biobanking) and Terveystalo Biobank Finland (https://www.terveystalo.com/fi/Yritystietoa/Terveystalo-Biopankki/Biopankki/). S.S. was in part supported by the Mochida Memorial Foundation for Medical and Pharmaceutical Research, Kanae Foundation for the Promotion of Medical Science, Astellas Foundation for Research on Metabolic Disorders and the JCR Grant for Promoting Basic Rheumatology. M. Kanai was supported by a Nakajima Foundation Fellowship and the Masason Foundation. Y. Tanigawa is in part supported by a Funai Overseas Scholarship from the Funai Foundation for Information Technology and the Stanford University School of Medicine. M.A.R. is in part supported by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) under award no. R01HG010140, and an NIH Center for Multi- and Cross-population Mapping of Mendelian and Complex Diseases grant (no. 5U01 HG009080). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Y.O. was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (grant nos. 19H01021, 20K21834) and AMED (grant nos. JP21km0405211, JP21ek0109413, JP21ek0410075, JP21gm4010006 and JP21km0405217), JST Moonshot R&D Grant Number JPMJMS2021, Takeda Science Foundation and the Bioinformatics Initiative of Osaka University Graduate School of Medicine, Osaka University.
Author information
Contributions
S.S., M. Kanai and Y.O. conceived the study. S.S., M. Kanai, Y. Tanigawa, M.A.R. and Y.O. wrote the manuscript. S.S., M. Kanai, J.K., M. Kurki, T. Konuma, Kenichi Yamamoto, M.A., K. Ishigaki, Kazuhiko Yamamoto, Y. Kamatani, A.P., M.J.D. and Y.O. conducted GWAS data analysis. S.S., Y. Tanigawa and M.A.R. conducted statistical decomposition analysis. S.S., S.K., A.N., G.T. and Y.O. conducted metabolome analysis. A.S., K.S., W.O., K. Yamaji, K.T., S.A., Y. Takahashi, T.S., N.S., H.Y., S. Minami, S. Murayama, K. Yoshimori, S.N., D.O., M.H., A.M., Y. Koretsune, K. Ito, C.T., T.Y., I.K., T. Kadowaki, M.Y., Y.N., M. Kubo, Y.M., Kazuhiko Yamamoto and K.M. collected and managed samples and data. A.P. and M.J.D. coordinated collaboration with FinnGen.
Ethics declarations
Competing interests
M.A.R. is on the SAB of 54Gene and the Computational Advisory Board for Goldfinch Bio and has advised BioMarin, Third Rock Ventures, MazeTx and Related Sciences. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The remaining authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks Caroline Hayward, Marylyn Ritchie, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overview of this study.
We performed 220 deep-phenotype GWASs in BioBank Japan, including 108 GWASs that were the first ever conducted in an East Asian population. We performed trans-biobank meta-analyses with UK Biobank and FinnGen (n_total = 628,000), resulting in the discovery of 5,343 novel loci. All summary statistics are openly shared through the pheweb.jp web portal. As downstream analyses, we performed (i) cross-population comparison of pleiotropy and genetic correlation, (ii) comprehensive HLA fine-mapping, and (iii) statistical decomposition of a matrix of summary statistics to gain insights into the biology underlying current disease classifications, by incorporating functional genomics, metabolomics, and biomarker data.
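To make the trans-biobank meta-analysis step concrete, here is a minimal sketch of how per-cohort effect estimates for a single variant might be combined. It assumes a fixed-effect, inverse-variance-weighted scheme, which this legend does not specify; the function name and the example numbers are purely illustrative and are not results from the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-cohort effect sizes (same effect allele assumed in every cohort)
    ses:   per-cohort standard errors
    Returns (combined beta, combined SE, Z score, two-sided normal P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]      # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided P from the normal tail
    return beta, se, z, p


# Illustrative (made-up) estimates for three cohorts.
beta, se, z, p = ivw_meta([0.10, 0.12, 0.08], [0.02, 0.03, 0.04])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

Cohorts with smaller standard errors dominate the combined estimate, which is why alleles must be harmonized across cohorts before weighting.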
Extended Data Fig. 2 Locus plots for representative loci.
(a) Regional association plots for pulmonary tuberculosis (PTB) in BBJ are shown. The lead variant (rs140780894) is colored in pink, and the colors of the other dots indicate the linkage disequilibrium measure r² with the lead variant. (b) Regional association plots for cholelithiasis in BBJ are shown. The lead variant (rs715) is colored in pink, and the colors of the other dots indicate the linkage disequilibrium measure r² with the lead variant. (c) Regional association plots for gastric diseases in BBJ at the PSCA locus in gastric ulcer, gastric cancer, and gastric polyp are shown. The variant rs2976397, which was the lead variant in gastric ulcer, is colored in pink, and the colors of the other dots indicate the linkage disequilibrium measure r² with the lead variant. (d) Regional association plots at the FUT3 locus in gall bladder polyp and cholelithiasis in BBJ are shown. The variant rs28362459, which was the lead variant in gall bladder polyp, is colored in pink, and the colors of the other dots indicate the linkage disequilibrium measure r² with the lead variant. (e) Regional association plots for urticaria in BBJ are shown. The lead variant (rs56043070) is colored in pink, and the colors of the other dots indicate the linkage disequilibrium measure r² with the lead variant. (f) Regional association plots for salicylic acids prescription in BBJ are shown. The lead variant (rs151193009) is colored in pink, and the colors of the other dots indicate the linkage disequilibrium measure r² with the lead variant.
Extended Data Fig. 3 The effect size correlation between BBJ GWAS and European GWAS.
The marginal effect sizes of genome-wide significant variants across traits in BBJ are compared with those in European GWAS. Each point represents a variant and is colored by its significance in European GWAS, as shown in the top-left legend. Pearson’s correlation r and the two-sided P value between BBJ GWAS and European GWAS are also shown in the legend.
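The effect-size comparison can be sketched as follows. This is a minimal, standard-library implementation of Pearson's r, assuming the two effect-size lists are already matched variant by variant and aligned to the same effect allele; the function name and toy values are hypothetical. The two-sided P value reported in the legend would additionally require a t distribution with n − 2 degrees of freedom, which is left to a statistics library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of effect sizes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy effect sizes (illustrative only): one entry per variant, same allele coding.
beta_bbj = [0.12, -0.05, 0.30, 0.08, -0.15]
beta_eur = [0.10, -0.02, 0.25, 0.11, -0.09]
print(round(pearson_r(beta_bbj, beta_eur), 3))
```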
Extended Data Fig. 4 Phenotypic correlation across 220 phenotypes in BBJ.
a. Heatmap of the pairwise phenotypic correlation matrix. The color of the cells indicates the value of the correlation r, as shown in the color scale at the bottom. The traits (rows and columns) were hierarchically clustered with the hclust function in R. b. Silhouette score for clustering of closely related phenotypes with different numbers of clusters (Supplementary Notes).
Extended Data Fig. 5 The degree of pleiotropy in BBJ after accounting for phenotypic or genetic correlations.
The Manhattan-like plots show the number of significant associations (P < 5.0 × 10⁻⁸) at each tested genetic variant in the Japanese GWASs. a. For all traits (n_trait = 220; as shown in Fig. 2a). b. After accounting for phenotypic correlations. c. After accounting for genetic correlations.
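The per-variant count plotted in panel a can be sketched as a simple tally. The example below assumes association results are available as (variant, trait, P value) records; the record layout, function name, and toy identifiers are hypothetical. It does not implement the correlation adjustments used for panels b and c.

```python
from collections import Counter

GWS_THRESHOLD = 5.0e-8  # genome-wide significance threshold quoted in the legend


def pleiotropy_counts(associations):
    """Count, for each variant, the number of distinct traits with P < threshold.

    associations: iterable of (variant_id, trait, p_value) tuples.
    """
    seen = set()
    counts = Counter()
    for variant, trait, p_value in associations:
        key = (variant, trait)
        if p_value < GWS_THRESHOLD and key not in seen:
            seen.add(key)          # count each variant-trait pair only once
            counts[variant] += 1
    return counts


# Toy records with made-up identifiers and P values.
records = [
    ("var1", "traitA", 1e-12),
    ("var1", "traitB", 3e-9),
    ("var1", "traitC", 0.02),
    ("var2", "traitA", 4e-8),
    ("var2", "traitA", 4e-8),   # duplicate record, ignored
    ("var3", "traitB", 1e-5),
]
counts = pleiotropy_counts(records)
print(dict(counts))
```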
Extended Data Fig. 6 Genetic correlation matrices across populations.
The matrices describe the pairwise genetic correlation r_g in Japanese GWAS (a; n = 5,565) and in European GWAS (b; n = 10,878), which was estimated by bivariate LD score regression. The color of the cells indicates the value of r_g, as shown in the color scale at the bottom. The traits (rows and columns) were hierarchically clustered with the hclust function in R, and trait domains are displayed as colored boxes (see Methods).
Extended Data Fig. 7 Network representation of the TSVD analysis.
Two-dimensional illustration of the interconnections among 159 diseases and 40 latent components. Points in blue indicate each trait’s statistics, and points in pink indicate the latent components derived by TSVD. White lines represent the contribution of each phenotype to each component. The width of the lines indicates the strength of the contribution based on the squared cosine score.
Extended Data Fig. 8 Heatmap representation of squared cosine scores of diseases to components.
The components (rows) are shown from 1 (top) to 40 (bottom), and the diseases (columns) are sorted based on the contribution of each component to the disease based on the squared cosine score (from component 1 to 40). Each cell is colored based on the squared cosine score of a given trait to a given component, as shown in a color scale at the bottom right.
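As an illustration of the squared cosine score, the sketch below computes it for one trait from a truncated decomposition. It assumes the definition commonly used with TSVD of summary-statistic matrices: the trait's score on component k is its left singular vector entry times the singular value, and the squared cosine is that score squared divided by the sum of squares over all retained components. The legend itself does not spell out the formula, so this definition, the function name, and the toy numbers are assumptions.

```python
def squared_cosine_scores(u_row, singular_values):
    """Squared cosine score of one trait to each retained component.

    u_row:           the trait's row of the left singular vector matrix U
    singular_values: the retained singular values, in the same component order
    The returned scores are non-negative and sum to 1 across components.
    """
    if len(u_row) != len(singular_values):
        raise ValueError("u_row and singular_values must have equal length")
    scores = [u * s for u, s in zip(u_row, singular_values)]  # trait component scores
    total = sum(f * f for f in scores)
    if total == 0:
        raise ValueError("trait has zero loading on every component")
    return [f * f / total for f in scores]


# Toy example with two components (illustrative values only).
cos2 = squared_cosine_scores([0.6, 0.8], [2.0, 1.0])
print([round(c, 4) for c in cos2])
```

Because the scores for a trait sum to 1, each heatmap cell can be read as the share of that trait's retained signal attributed to one component.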
Extended Data Fig. 9 Enrichment analyses of genes explaining each component with tissue specificity.
A heatmap representation of the enrichment analyses of genes explaining each component with tissue-specific genes defined by GTEx expression profiles (a) and regulatory vocabulary from ENCODE3 data (b). Each cell is colored based on P_enrichment from Fisher’s exact tests, which assess the enrichment of the genes comprising each component within each tissue-specific gene set, as shown in the color scale at the bottom right.
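The enrichment test behind each heatmap cell can be sketched with the hypergeometric tail that underlies a one-sided Fisher's exact test. The legend does not state whether the tests were one- or two-sided, so the one-sided (over-representation) form here is an assumption, as are the function name and the toy gene counts.

```python
from math import comb


def fisher_enrichment_p(overlap, n_component, n_tissue, n_background):
    """One-sided Fisher's exact P value for over-representation.

    overlap:      genes that are both in the component and tissue-specific
    n_component:  genes comprising the component
    n_tissue:     tissue-specific genes
    n_background: all genes considered
    Returns P(X >= overlap) under the hypergeometric null.
    """
    if not 0 <= overlap <= min(n_component, n_tissue):
        raise ValueError("overlap is inconsistent with the set sizes")
    if max(n_component, n_tissue) > n_background:
        raise ValueError("set sizes cannot exceed the background size")
    upper = min(n_component, n_tissue)
    tail = sum(
        comb(n_tissue, k) * comb(n_background - n_tissue, n_component - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(n_background, n_component)


# Toy counts (illustrative only): 8 of 50 component genes fall in a
# 200-gene tissue-specific set drawn from a 5,000-gene background.
p_enrichment = fisher_enrichment_p(8, 50, 200, 5000)
print(f"{p_enrichment:.3g}")
```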
Extended Data Fig. 10 Genetic variants analyzed in the three cohorts.
The Venn diagram showing the number of genetic variants analyzed in this study in each of the three cohorts (BBJ, UKB, and FinnGen) and overlapping variants across the cohorts.
Supplementary information
Supplementary Information
Supplementary Figs. 1 and 2, Supplementary Notes and Supplementary references
Supplementary Table 1
Supplementary Tables 1–12
About this article
Cite this article
Sakaue, S., Kanai, M., Tanigawa, Y. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53, 1415–1424 (2021). https://doi.org/10.1038/s41588-021-00931-x
DOI: https://doi.org/10.1038/s41588-021-00931-x