The overwhelming majority of participants in current genetic studies are of European ancestry. To elucidate disease biology in the East Asian population, we conducted a genome-wide association study (GWAS) with 212,453 Japanese individuals across 42 diseases. We detected 320 independent signals in 276 loci for 27 diseases, with 25 novel loci (P < 9.58 × 10⁻⁹). East Asian–specific missense variants were identified as candidate causal variants for three novel loci, and we successfully replicated two of them by analyzing independent Japanese cohorts: p.R220W of ATG16L2 (associated with coronary artery disease) and p.V326A of POT1 (associated with lung cancer). We further investigated enrichment of heritability within 2,868 annotations of genome-wide transcription factor occupancy, and identified 378 significant enrichments across nine diseases (false discovery rate < 0.05) (for example, NKX3-1 for prostate cancer). This large-scale GWAS in a Japanese population provides insights into the etiology of complex diseases and highlights the importance of performing GWAS in non-European populations.
GWAS summary statistics of the 42 diseases are publicly available from our website (JENGER; http://jenger.riken.jp/en/) and the National Bioscience Database Center Human Database (https://humandbs.biosciencedbc.jp/en/) (research ID: hum0014) without any access restrictions. GWAS genotype data for case samples were deposited at the National Bioscience Database Center Human Database (research ID: hum0014).
We acknowledge the staff of the BBJ for their outstanding assistance. We express our heartfelt gratitude to the Tohoku Medical Megabank Organization (Tohoku University), Iwate Tohoku Medical Megabank Organization (Iwate Medical University), Japan Public Health Center-based Prospective Study, Japan Multi-Institutional Collaborative Cohort Study and NCGG Biobank for their invaluable contributions to collecting control samples. We also express our gratitude to the OACIS for contributing to the replication study of CAD, and the National Cancer Center Hospital for contributing to the replication study of lung cancer. We also express our gratitude to E.K. and H.S. for kindly sharing their results of chromatin immunoprecipitation sequencing data analysis. We extend our appreciation to Y. Yukawa, Y. Yokoyama and other members of the Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences for great support. This research was supported by the Tailor-Made Medical Treatment Program (BBJ) of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), the Japan Agency for Medical Research and Development (AMED) under grant numbers JP17km0305002 (to M.Kubo) and JP17km0305001 (to M.M., S.Nagayama, Y.D., Y.Miki, T.Katagiri, O.O., W.O., H.I., T.Yoshida, I.I., T.Takahashi, J.I. and K.M.), JST KAKENHI grants (18H02932 to S.I.) and the Research Program on Hepatitis from AMED (JP19fk0310109 and JP19fk0210020 to K.C.). The Tohoku Medical Megabank Organization (Tohoku University) was supported in part by MEXT-JST and AMED, the most recent grant numbers being JP19km0105001 and JP19km0105002 (to M.Y.). The Iwate Tohoku Medical Megabank Organization (Iwate Medical University) was supported in part by MEXT-JST and AMED, the most recent grant numbers being JP19km0105003 and JP19km0105004. 
The Japan Public Health Center Prospective Study has been supported by the National Cancer Center Research and Development Fund since 2011 (latest grant number: 29-A-4, to S.T.) and was supported by a Grant-in-Aid for Cancer Research from the Ministry of Health, Labour and Welfare of Japan from 1989–2010. The Japan Multi-Institutional Collaborative Cohort Study was supported by Grants-in-Aid for Scientific Research for Priority Areas of Cancer (17015018 to Nobuyuki Hamajima) and Innovative Areas (221S0001 to Hideo Tanaka) and by Japan Society for the Promotion of Science KAKENHI grants (CoBiA and 16H06277) from MEXT (to Hideo Tanaka and K.W.). The NCGG study was partly supported by AMED under grant number JP18kk0205009 (S.Niida) and JP20dk0207045 (K.O.). The OACIS was supported by AMED (JP19ek0210081 to Yasuhiko Sakata). The lung cancer study at the National Cancer Center Hospital was supported by the National Cancer Center Research and Development Fund (NCC Biobank), AMED (JP16ck0106096 to T.Kohno) and the Ministry of Health, Labour and Welfare program (H29-Gantaisaku-Ippann-025 to T.Kohno). The study at Fujita Health University was supported by AMED under grant numbers JP20dm0107097 (M.Ikeda and N.I.), JP20km0405201 (N.I.) and JP20km0405208 (M.Ikeda).
The authors declare no competing interests.
a, Study designs in this GWAS. Study design 1 (top) was used in the main analysis. An example of study design 1 is provided: in the GWAS of disease 3, we included all other patients (except those who had related diseases) in the control group. The definition of related diseases is provided in Supplementary Table 1. Study design 2 (bottom) was used to discuss the appropriateness of the study design selection. b, Effect size estimates and S.E. at the 309 autosomal disease-associated variants detected in the sex-combined analysis (P < 5 × 10⁻⁸). We compared the effect size estimates in study design 1 with those in study design 2. Heterogeneity between the two study designs was tested using Cochran’s Q test. The identity line is shown in blue. The red dot (rs373205748, associated with arrhythmia) indicates a variant with significant heterogeneity in effect size estimates between the two study designs (P = 0.00012 < 0.05/309).
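The heterogeneity test in panel b can be illustrated with a minimal sketch of Cochran's Q for two effect size estimates, compared against the Bonferroni threshold 0.05/309 stated in the caption. The function name and the input values are hypothetical. The sketch also assumes the two estimates are independent, which may not hold exactly when two study designs share cases, so it shows the statistic rather than the authors' exact pipeline.

```python
import math

def cochran_q_two(beta1, se1, beta2, se2):
    """Cochran's Q heterogeneity test for two effect size estimates (1 d.f.)."""
    # Inverse-variance weights
    w1 = 1.0 / se1 ** 2
    w2 = 1.0 / se2 ** 2
    # Fixed-effect pooled estimate
    pooled = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    # Q is the weighted sum of squared deviations from the pooled estimate
    q = w1 * (beta1 - pooled) ** 2 + w2 * (beta2 - pooled) ** 2
    # Survival function of a chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p

N_VARIANTS = 309                 # number of variants tested, from the caption
THRESHOLD = 0.05 / N_VARIANTS    # Bonferroni-corrected significance level

# Hypothetical effect sizes and standard errors for one variant
q_stat, p_het = cochran_q_two(0.30, 0.05, 0.10, 0.05)
is_heterogeneous = p_het < THRESHOLD
print(f"Q = {q_stat:.2f}, P = {p_het:.4g}, significant = {is_heterogeneous}")
```

With this threshold (about 1.6 × 10⁻⁴), the P value of 0.00012 reported for rs373205748 falls just below the cutoff.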
We compared effect sizes reported in the previous GWAS with those in this GWAS. Effect size and S.E. are shown. The identity line is shown in blue. The sample size of each GWAS is provided in Table 1. We utilized a generalized linear mixed model in our GWAS.
We first compared effect sizes reported in the previous GWAS with those in our GWAS (Supplementary Table 3 and Extended Data Fig. 2); 1,219 out of 1,396 previously reported risk alleles were replicated with the same effect direction (177 alleles were not replicated). We compared the MAF of replicated variants (n = 1,219) with that of non-replicated variants (n = 177). The Mann-Whitney U test P value is provided (two-sided test).
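The comparison described here can be sketched as follows: the fraction of concordant alleles follows directly from the counts in the caption, and a two-sided Mann-Whitney U test compares MAF between the two groups. This is a minimal standard-library sketch using the normal approximation without tie or continuity correction, which is a simplification and not necessarily the authors' implementation. The MAF values are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Simplified: no tie correction of the variance and no continuity correction.
    Returns (U statistic for x, two-sided P value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p

# Counts taken from the caption
concordance = 1219 / 1396  # fraction of risk alleles with the same effect direction

# Hypothetical MAF values for illustration only
maf_replicated = [0.21, 0.35, 0.42, 0.18, 0.30, 0.27]
maf_not_replicated = [0.03, 0.08, 0.15, 0.05, 0.11]
u_stat, p_value = mann_whitney_u(maf_replicated, maf_not_replicated)
print(f"concordance = {concordance:.3f}, U = {u_stat}, P = {p_value:.3g}")
```

For real data with many ties or small groups, an exact or tie-corrected implementation from a statistics library would be preferable.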
Extended Data Fig. 4 Permutation test to estimate appropriate P value threshold to control type I errors.
Using 1,000 simulated binary phenotypes with down-sampled samples (n = 10,000), we conducted GWAS using the same strategy as in the main analysis. a, The distribution of the minimum P value for each phenotype (Pmin). The 95th percentile of Pmin was 2.87 × 10⁻⁸. The 95% confidence interval was estimated from 1,000 bootstraps. b, The distributions of Pmin using all samples (n = 198,137) and those using 10,000 samples. To increase computational efficiency, we restricted this analysis to imputed genotype data on chromosome 22. For the analysis in b, we used PLINK2.
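The logic of deriving an empirical threshold from a null distribution of Pmin, with a bootstrap confidence interval, can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the sketch takes the value that only 5% of null phenotypes fall below (the 5th percentile of Pmin, equivalently the 95th percentile of −log10 Pmin), and it replaces real GWAS output with a toy null in which Pmin is the minimum of `m` independent uniform P values. All names and the value of `m` are hypothetical.

```python
import random

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def empirical_threshold(p_min, alpha=0.05, n_boot=1000, seed=0):
    """Threshold that a fraction alpha of null Pmin values fall below, with a 95% bootstrap CI."""
    rng = random.Random(seed)
    point = percentile(p_min, 100.0 * alpha)
    boots = []
    for _ in range(n_boot):
        # Resample the Pmin values with replacement and recompute the percentile
        resample = [rng.choice(p_min) for _ in p_min]
        boots.append(percentile(resample, 100.0 * alpha))
    return point, (percentile(boots, 2.5), percentile(boots, 97.5))

# Toy null (assumption): the minimum of m independent uniform P values,
# drawn directly through its inverse CDF, for 1,000 simulated phenotypes.
rng = random.Random(1)
m = 1000
p_min_sim = [1.0 - (1.0 - rng.random()) ** (1.0 / m) for _ in range(1000)]

threshold, ci = empirical_threshold(p_min_sim, n_boot=200)
print(f"threshold = {threshold:.3g}, 95% CI = ({ci[0]:.3g}, {ci[1]:.3g})")
```

In the toy null the threshold lands near 0.05/m, as expected for independent tests; real genotype data have correlated variants, which is why a permutation-based estimate is useful.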
Extended Data Fig. 5 Allele frequency comparison between novel and known disease-associated variants.
MAF comparison of disease-associated variants at novel (n = 41) and known (n = 153) loci with suggestive significance (P < 5 × 10⁻⁸) (a, East Asian populations; b, European populations in 1KG phase3). For known loci, we restricted this analysis to loci where the closest reported variants were discovered by GWAS in European populations. The Mann-Whitney U test P value is provided (two-sided test).
Extended Data Fig. 6 A novel association which can be explained by an East Asian-specific missense variant.
A regional association plot for keloid (812 cases vs 211,641 controls) at the PHLDA3 region is provided. We utilized a generalized linear mixed model in our GWAS.
Effect size and S.E. are provided for neoplastic diseases (a) and non-neoplastic diseases (b). The sample size of each GWAS is provided in Table 1. We utilized a generalized linear mixed model in our GWAS.
Extended Data Fig. 8 Comparison of allelic directions between this GWAS and previous European GWAS at known loci.
a, Schematic explanation of how we compared statistics between BBJ-GWAS and GWAS conducted in European populations (EUR-GWAS). We applied two inclusion criteria for known loci: (i) EUR-GWAS has significant associations (P < 5 × 10⁻⁸) within 1 Mb of the BBJ-lead variant and (ii) the BBJ-lead variant is in LD with the lead variant in EUR-GWAS (r² > 0.4 in European samples in 1KG phase3). The first criterion was added to exclude loci where EUR-GWAS has insufficient power (112 known loci remained after applying the first criterion). The second criterion was added because EUR-GWAS statistics at the BBJ-lead variant do not represent those at the EUR-lead variant when the two are not in LD. b, Effect sizes of BBJ- and EUR-GWAS at the BBJ-lead variants. All variants that passed the first criterion were used (n = 112). Variants that passed the second criterion are shown in red (n = 65). Because two variants have extremely large effect sizes, we provide two plots at different scales. The three variants with opposite effect directions are marked by large dots, and their details are also provided. c, Regional association of T2D around rs12031188. Variants in LD (r² > 0.4) with the BBJ-lead variant (rs12031188) but not with the EUR-lead variant are shown in red; variants in LD (r² > 0.4) with both lead variants are shown in blue. East Asians and Europeans in 1KG phase3 were used for LD calculation of the BBJ- and the EUR-lead variants, respectively.
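The two inclusion criteria in panel a amount to a sequential filter, which the following minimal sketch illustrates. The `Locus` record, its field names, and the example values are hypothetical; only the thresholds (P < 5 × 10⁻⁸ within 1 Mb, r² > 0.4) come from the caption.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    bbj_lead: str                # identifier of the BBJ-lead variant
    min_eur_p_within_1mb: float  # smallest EUR-GWAS P value within 1 Mb of the BBJ-lead variant
    r2_with_eur_lead: float      # LD (r^2) with the EUR-lead variant in European samples

def passes_power_criterion(locus, p_threshold=5e-8):
    """Criterion (i): EUR-GWAS has a significant association near the BBJ-lead variant."""
    return locus.min_eur_p_within_1mb < p_threshold

def passes_ld_criterion(locus, r2_threshold=0.4):
    """Criterion (ii): the BBJ-lead variant is in LD with the EUR-lead variant."""
    return locus.r2_with_eur_lead > r2_threshold

# Hypothetical loci for illustration only
loci = [
    Locus("variant_A", 1e-12, 0.85),  # passes both criteria
    Locus("variant_B", 3e-9, 0.10),   # powered in EUR-GWAS, but lead variants not in LD
    Locus("variant_C", 2e-4, 0.90),   # EUR-GWAS lacks a significant signal nearby
]

powered = [l for l in loci if passes_power_criterion(l)]
comparable = [l for l in powered if passes_ld_criterion(l)]
print([l.bbj_lead for l in powered], [l.bbj_lead for l in comparable])
```

Applying the criteria in this order mirrors the caption, where 112 loci remain after the first criterion and 65 of those pass the second.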
a, Genetic correlations between male- and female-specific GWAS. Estimates of genetic correlation and standard errors are provided. *: genetic correlation was significantly different from one (two-sided t test P = 2.2 × 10⁻³ < 0.05/20). b, The results of S-LDSC analysis based on sex-specific GWAS of asthma using 220 cell-type-specific annotations. Annotations significant in either male or female asthma are shown (P < 0.05/220). Heterogeneity was tested by Cochran’s Q test, and its P values (Phet) are also provided. The black dashed line indicates P = 0.05/220; the grey dashed line indicates P = 0.05.
The results of S-LDSC are plotted in the UMAP space. Significant results (FDR < 0.05) are highlighted in cluster-specific colors (the same colors as used in Fig. 4). The names of the five most significant TFs are also shown on the plot. Results are shown for diseases with fewer than five significant TF binding site tracks.
Ishigaki, K., Akiyama, M., Kanai, M. et al. Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases. Nat Genet 52, 669–679 (2020). https://doi.org/10.1038/s41588-020-0640-3