Genome-wide association studies in the Japanese population identify seven novel loci for type 2 diabetes

Genome-wide association studies (GWAS) have identified more than 80 susceptibility loci for type 2 diabetes (T2D), but most of its heritability still remains to be elucidated. In this study, we conducted a meta-analysis of GWAS for T2D in the Japanese population. Combined data from discovery and subsequent validation analyses (23,399 T2D cases and 31,722 controls) identify 7 new loci with genome-wide significance (P<5 × 10−8), rs1116357 near CCDC85A, rs147538848 in FAM60A, rs1575972 near DMRTA1, rs9309245 near ASB3, rs67156297 near ATP8B2, rs7107784 near MIR4686 and rs67839313 near INAFM2. Of these, the association of 4 loci with T2D is replicated in multi-ethnic populations other than Japanese (up to 65,936 T2Ds and 158,030 controls, P<0.007). These results indicate that expansion of single ethnic GWAS is still useful to identify novel susceptibility loci to complex traits not only for ethnicity-specific loci but also for common loci across different ethnicities.

To date, more than 80 susceptibility loci for type 2 diabetes (T2D) have been identified through genome-wide association studies (GWAS) 1–18. However, the joint effects of these variants account for <10% of the heritability for T2D 10,19. GWAS for T2D have been extensively conducted in populations of European descent and, accordingly, the majority of established T2D susceptibility loci were originally identified by European GWAS 1,2,8–11,16. Cumulative evidence suggests that Asian populations may be more genetically susceptible to T2D than populations with European ancestry 20. In addition, there are significant interethnic differences in risk allele frequencies or effect sizes at several loci, which may affect the power to detect associations in these populations 2. On the other hand, both overlap in T2D susceptibility loci among different ancestry groups and coincident risk alleles at lead single-nucleotide polymorphisms (SNPs) across diverse populations have been reported, suggesting that causal variants at many of these loci are shared across different ancestry groups 12. Moreover, a recently published transethnic GWAS successfully identified seven novel T2D susceptibility loci by combining association data from European, South Asian, East Asian and Mexican/Latino GWAS 12. Therefore, it is valuable to perform GWAS for T2D in both non-European and European populations, to facilitate identification of both ethnicity-specific loci and susceptibility loci common to different ethnic groups.
Four T2D GWAS loci discovered earlier in a Japanese population have been shown to be significantly associated with T2D in the largest European GWAS meta-analysis 10: KCNQ1 (refs 3,4), UBE2E2 (ref. 5), C2CD4A-C2CD4B (ref. 5) and ANK1 (ref. 6), highlighting that there are common loci conferring susceptibility to T2D among the different ethnic groups studied. Three additional loci (MIR129-LEP, GPSM1 and SLC16A11-SLC16A13) have been identified by a large-scale Japanese GWAS (n = ~25,000) based on the imputation of genotypes using the 1000 Genomes Project data as a reference 7. One of these findings, the association at the SLC16A11-SLC16A13 locus, was also confirmed in a Mexican GWAS 15.
To identify novel loci for susceptibility to T2D, we expanded the Japanese GWAS data set by combining new Japanese GWAS data (9,817 T2D cases and 6,763 controls) with GWAS data from previously reported case-control individuals (5,646 T2D cases and 19,420 controls) 7, followed by a validation study using independent Japanese case-control individuals (7,936 T2D cases and 5,539 controls) and multi-ethnic replication studies (East Asians: 12,554 T2D cases and 17,383 controls; Europeans: 38,947 T2D cases and 121,903 controls; South Asians: 10,587 T2D cases and 14,378 controls; and Mexican/Latinos: 3,848 T2D cases and 4,366 controls). As a result, we identify seven novel loci for T2D, and the results indicate that expansion of single ethnic GWAS is still useful to identify novel susceptibility loci to complex traits.

Results
GWAS meta-analysis and validation in the Japanese population. Imputed genotype dosage data for 9,817 T2D cases and 6,763 controls for 7,521,072 autosomal SNPs (Stage-1, set-1) were obtained and combined with independent GWAS data from previously reported case-control individuals 7 (Stage-1, set-2: 5,646 T2D cases and 19,420 controls; 7,521,072 autosomal SNPs), as shown in Fig. 1a. There was no obvious inflation in the quantile-quantile plots for each study (Stage-1: λGC = 1.13 and λGC adjusted for 1,000 cases and controls (λGC-1000) 21 = 1.012; Stage-2: λGC = 1.082 and λGC-1000 = 1.009), as shown in Supplementary Fig. 1A,B. SNPs with a low imputation quality (r² < 0.7 in either set-1 or set-2) or with an inconsistent direction of effect between the studies were excluded from the analysis. We obtained 42 loci exhibiting a suggestive association with T2D (P < 1 × 10−6). The most significant association in this meta-analysis was at rs2237896, located in intron 15 of KCNQ1 (P = 2.81 × 10−70), which was previously identified in Japanese GWAS 3,4 (Supplementary Fig. 1C). Of the 42 loci, 25 were previously established T2D susceptibility loci (Supplementary Table 1) and the remaining 17 were further evaluated by de novo genotyping in an independent Japanese case-control study (Stage-2: 7,936 T2D cases and 5,539 controls, multi-centre; Supplementary Tables 2 and 3).
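The adjustment of the genomic-control inflation factor to a standard sample size can be illustrated with a short calculation. The sketch below assumes the commonly used rescaling, in which the excess inflation (λGC − 1) is scaled by the ratio of (1/cases + 1/controls) in the study to the same quantity for 1,000 cases and 1,000 controls; the text does not spell out the formula, so this is an assumption rather than the authors' stated procedure. The function name is illustrative. The example also assumes that the second pair of reported values (λGC = 1.082) corresponds to the set-2 sample sizes (5,646 cases and 19,420 controls).

```python
def lambda_gc_1000(lambda_gc: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic-control lambda to 1,000 cases and 1,000 controls.

    Assumed formula (not stated in the text):
        lambda_1000 = 1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("sample sizes must be positive")
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lambda_gc - 1.0) * scale


# Reported lambda of 1.082 with the set-2 sample sizes (assumed pairing).
adjusted = lambda_gc_1000(1.082, n_cases=5646, n_controls=19420)
print(round(adjusted, 3))  # 1.009
```

Under these assumptions the rescaled value agrees with the reported λGC-1000 of 1.009, which illustrates how much of a raw λGC above 1 in a large study reflects sample size alone.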

We also examined the association of these seven SNPs with glycaemic traits in Stage-2 control individuals, including fasting plasma glucose, homeostasis model assessment (HOMA) of β-cell function (HOMA-β) and HOMA of insulin resistance (IR). However, we did not detect any significant associations between the T2D risk alleles and these glycaemic traits (P ≥ 0.0024; Supplementary Table 7). We also searched the publicly available European GWAS data 11,25,26 (MAGIC, http://www.magicinvestigators.org) and found that the T2D risk alleles at the DMRTA1 locus (rs11791293-C; proxy for rs1575972-T, CEU r² = 1) and at the MIR4686 locus (rs7111341-T; proxy for rs7107784-G, CEU r² = 0.95) were associated with a decrease in fasting plasma insulin (FPI) (P = 0.0039; Supplementary Table 8) and with an increase in FPI (P = 0.0066; Supplementary Table 8), respectively, although these associations were not statistically significant (P ≥ 0.0008 = 0.05/63 (7 SNPs × 9 traits)).
Examination of seven novel loci in diverse ethnic groups. We analysed the association of these seven variants with disease susceptibility in populations other than Japanese. We obtained association data for four ethnic groups using de novo genotyping, in silico replication and by examining publicly available GWAS data 10 (Supplementary Table 9). Meta-analyses of the combined data from the four non-Japanese ethnicities indicated that four SNP loci, namely rs147538848 in FAM60A, rs1575972 near DMRTA1, rs7107784 near MIR4686 and rs67839313 near INAFM2, were associated with the disease after Bonferroni's correction (P < 0.00714 = 0.05/7; Table 2). The disease association of these four SNPs was further corroborated by combining the Japanese data with the multi-ethnic replication data sets (Supplementary Table 10). The rs67156297 locus in ATP8B2 was nominally associated with T2D in the combined meta-analysis for multi-ethnic groups other than the Japanese population. We did not detect any disease association for the remaining two SNP loci outside the Japanese population; however, the effect direction for each of the seven loci was consistent with that in the Japanese population.
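The Bonferroni thresholds quoted in this section follow directly from dividing the nominal significance level by the number of tests. A minimal sketch is given below; the function name is illustrative, and the numbers of tests (7 SNPs for replication, 7 SNPs × 9 traits for the glycaemic trait look-up) are taken from the text.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold for n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Replication of 7 lead SNPs in non-Japanese populations.
print(round(bonferroni_threshold(0.05, 7), 5))      # 0.00714
# Glycaemic trait look-up: 7 SNPs x 9 traits = 63 tests.
print(round(bonferroni_threshold(0.05, 7 * 9), 4))  # 0.0008
```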
Sex- and BMI-stratified analyses in the Japanese population. We performed BMI-stratified (BMI < 25 or ≥ 25) and sex-stratified analyses of the novel and established GWAS loci, to determine whether significant heterogeneity in allelic effects existed between non-obese and obese individuals or between males and females in the Japanese population. BMI-stratified analysis for 83 previously established loci revealed evidence of significant heterogeneity in the effect size between non-obese and obese individuals at KCNQ1 (P for heterogeneity = 8.89 × 10−5; Supplementary Table 11). The effect size of KCNQ1 was greater in the non-obese group than in the obese group (Supplementary Table 11). In sex-stratified analyses, individual established loci did not show significant heterogeneity in effect sizes between men and women (P > 6 × 10−4; Supplementary Table 12).
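Heterogeneity between two strata (for example, non-obese versus obese) can be sketched as Cochran's Q for two groups, which reduces to a 1-degree-of-freedom chi-square test on the difference between the two log odds ratios. The function below is an illustration under that assumption; the input values in the example are hypothetical and are not taken from the study.

```python
import math


def heterogeneity_two_strata(beta1: float, se1: float,
                             beta2: float, se2: float) -> tuple[float, float]:
    """Cochran's Q and P value for a difference in effect between two strata.

    beta1/beta2 are per-allele log odds ratios, se1/se2 their standard errors.
    With two groups, Q = (beta1 - beta2)^2 / (se1^2 + se2^2), 1 d.f.
    """
    q = (beta1 - beta2) ** 2 / (se1 ** 2 + se2 ** 2)
    # Survival function of a chi-square with 1 d.f.: erfc(sqrt(q / 2)).
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p


# Hypothetical stratum estimates, for illustration only.
q_stat, p_het = heterogeneity_two_strata(0.30, 0.05, 0.10, 0.05)
print(q_stat, p_het)
```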
For the seven novel T2D-associated loci identified in this study, no significant heterogeneity was detected in BMI-stratified or sex-stratified analyses (Supplementary Tables 13 and 14).
Fine mapping analyses for established T2D loci. We examined the association data of 83 previously identified T2D susceptibility loci in the Japanese GWAS meta-analysis data (Supplementary Data 1 and Supplementary Fig. 2). Variants at 19 loci were associated with T2D at a genome-wide level of significance and an additional 30 loci were significantly associated with T2D (P < 6.02 × 10−4 = 0.05/83). Of these 49 significant associations, ADCY5, HNF1A and PRC1 were not previously evaluated in the Japanese population, because the lead SNPs within these loci in the European GWAS (rs11708067 and rs11717195 at the ADCY5 locus 9,10, rs12427353 and rs7957197 at the HNF1A locus 8,10 and rs8042680 at the PRC1 locus 8) were monoallelic in the Japanese population. In this study, rs79223353 at the ADCY5 locus, rs55783344 at the HNF1A locus and rs79548680 at the PRC1 locus were significantly associated with T2D (P < 6.02 × 10−4; Supplementary Data 1 and Supplementary Fig. 3). Meta-analysis combining the GWAS data with de novo genotyping data for Stage-2 individuals revealed that the associations of rs79223353 within the ADCY5 locus and rs79548680 within the PRC1 locus reached genome-wide significance in the Japanese population (Supplementary Table 15). We did not detect any disease-associated SNPs within 16 loci (9 derived from European GWAS, 3 from East Asian, 2 from trans-ethnic, 1 from South Asian and 1 from African American GWAS; P ≥ 0.05) using fine mapping analyses (Supplementary Data 1 and Supplementary Fig. 2). We also identified a secondary association signal located at EXOC6 near the IDE-HHEX locus 10. The associations of rs78627331 and rs34773007 within the EXOC6 locus were significant after conditioning on rs1111875, a previously reported lead SNP within the IDE-HHEX locus (r² = 0.01 for rs78627331 and r² = 0.04 for rs34773007 in JPT; P = 1.49 × 10−8 for rs78627331 and P = 2.20 × 10−8 for rs34773007; Supplementary Table 16).
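The r² values reported here measure linkage disequilibrium (LD) between the secondary signals and the previously reported lead SNP; values near zero indicate that the signals are largely independent. As a reminder of how this quantity is defined, the sketch below computes r² from a two-locus haplotype frequency and the two allele frequencies. The function name and the example frequencies are hypothetical, and the study itself would have derived LD from reference panel genotypes rather than from hand-entered frequencies.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """LD r^2 between alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    D = p_ab - p_a * p_b;  r^2 = D^2 / (p_a (1 - p_a) p_b (1 - p_b))
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# Hypothetical frequencies: complete LD, then no LD.
print(ld_r2(0.5, 0.5, 0.5))   # 1.0
print(ld_r2(0.25, 0.5, 0.5))  # 0.0
```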
Drug targets search by a bioinformatics approach. We applied the genetic information from previously reported GWAS and the present GWAS to investigate potential drug targets for the treatment of T2D. First, we defined 286 potential T2D risk genes located in any of the 90 T2D risk loci (7 novel T2D loci identified in the present study and 83 previously identified T2D loci). In brief, we scored each of the 286 biological candidate genes by adopting the following six selection criteria and counting the number of criteria satisfied: (1) genes for which T2D risk SNPs or any of the SNPs in LD (r² ≥ 0.80) with them were annotated as missense variants; (2) cis-eQTL genes of T2D risk SNPs in any of lymphoblastoid cell lines, adipose tissue or liver tissue (P < 0.05 for lymphoblastoid cell lines and adipose tissue, and P < 0.004 for liver tissue); (3) monogenic diabetes genes; (4) genes for which at least three out of six associated phenotype labels (homeostasis/metabolism, liver/biliary system, endocrine/exocrine gland, growth/size/body, mortality/ageing and embryogenesis; P < 9.2 × 10−5) were observed in knockout mice 28; (5) genes prioritized by PubMed text mining using GRAIL 29 with gene-based P < 0.05; and (6) genes prioritized by protein-protein interaction (PPI) network analysis using DAPPLE 30 with gene-based P < 0.05. As these criteria exhibited weak correlations with each other (r² < 0.34; Supplementary Fig. 5), each gene was given a score based on the number of criteria that were met (scores ranged from 0 to 6). Genes with a score of 2 or higher were defined as biological T2D risk genes.
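The scoring scheme described above amounts to counting how many of the six criteria each candidate gene satisfies and keeping genes with a score of 2 or more. A minimal sketch follows; the criterion keys, the function names and the gene annotations (GENE_A to GENE_C) are hypothetical placeholders, not data from the study.

```python
# Illustrative keys for the six selection criteria described in the text.
CRITERIA = (
    "missense_variant",
    "cis_eqtl",
    "monogenic_diabetes",
    "knockout_mouse_phenotype",
    "pubmed_text_mining",
    "ppi_network",
)


def score_gene(flags: dict[str, bool]) -> int:
    """Count how many of the six criteria a gene satisfies (0 to 6)."""
    return sum(1 for criterion in CRITERIA if flags.get(criterion, False))


def biological_risk_genes(annotations: dict[str, dict[str, bool]],
                          min_score: int = 2) -> list[str]:
    """Return genes whose score is at least min_score, sorted by name."""
    return sorted(gene for gene, flags in annotations.items()
                  if score_gene(flags) >= min_score)


# Hypothetical annotations, for illustration only.
example = {
    "GENE_A": {"missense_variant": True, "cis_eqtl": True, "ppi_network": True},
    "GENE_B": {"pubmed_text_mining": True},
    "GENE_C": {"monogenic_diabetes": True, "knockout_mouse_phenotype": True},
}
print(biological_risk_genes(example))  # ['GENE_A', 'GENE_C']
```

Summing the criteria with equal weight mirrors the description in the text; a weighted score would be a different design and is not what is sketched here.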
We searched for overlapping genes between the 871 drug target genes corresponding to approved, in clinical trial or experimental drugs for various human diseases described in the previous report 27, and the 40 biological T2D risk genes plus 712 genes whose products are known to have direct PPI 30 with the biological T2D risk gene products. We identified a total of 83 overlapping genes (Supplementary Fig. 6 and Supplementary Table 21). Fourteen drug target genes with approved T2D treatments demonstrated significant overlap with the 40 biological T2D risk genes and 712 genes with direct PPI (4 genes overlapped, a 5.6-fold enrichment as determined using permutation analysis, P = 0.0042; Supplementary Table 22 and Supplementary Fig. 6). The overlap of all 871 drug target genes (83 genes) represented a 1.8-fold enrichment over that expected by chance, which is 3.1-fold lower than the enrichment observed for the targets of approved T2D drugs (Supplementary Fig. 6).
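The fold enrichment and permutation P value reported above can be illustrated with a generic permutation test for gene set overlap. The sketch below is not the authors' pipeline: it assumes that the null distribution is obtained by drawing random gene sets of the same size from a gene universe, and the universe, gene names and set sizes in the example are hypothetical.

```python
import random


def permutation_enrichment(universe, gene_set, target_set,
                           n_perm: int = 2000, seed: int = 0):
    """Permutation test for overlap between gene_set and target_set.

    universe: all genes that could have been selected (gene_set is assumed
              to be a subset of it)
    Returns (observed overlap, fold enrichment, one-sided empirical P).
    """
    rng = random.Random(seed)
    universe = sorted(set(universe))
    gene_set, target_set = set(gene_set), set(target_set)
    observed = len(gene_set & target_set)

    null_overlaps = []
    for _ in range(n_perm):
        # Draw a random gene set of the same size and count its overlap.
        sample = rng.sample(universe, len(gene_set))
        null_overlaps.append(sum(1 for g in sample if g in target_set))

    mean_null = sum(null_overlaps) / n_perm
    fold = observed / mean_null if mean_null > 0 else float("inf")
    # Add-one correction keeps the empirical P value strictly positive.
    n_extreme = sum(1 for o in null_overlaps if o >= observed)
    p_value = (1 + n_extreme) / (n_perm + 1)
    return observed, fold, p_value


# Hypothetical universe of 100 genes; 10 candidate genes, 5 drug targets.
genes = [f"g{i}" for i in range(100)]
print(permutation_enrichment(genes, genes[:10], genes[:5]))
```

Fold enrichment is taken here as the observed overlap divided by the mean overlap across permutations, which is one common convention; other definitions of the null (for example, matching on gene length or network degree) would give different values.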
Of the 83 overlapping genes, 5 were biological T2D risk genes (PPARG, KCNJ11, ABCC8, GCK and KIF11; Fig. 4). Three of these are targets of approved T2D drug treatments: PPARG (thiazolidinediones), KCNJ11 (sulfonylureas) and ABCC8 (sulfonylureas and glinides). GCK is the target of a GCK activator that was in clinical trials as of August 2014 (Supplementary Table 23). Of the remaining 78 genes, 2 exhibit PPI with 3 biological T2D risk gene products: GSK3B interacts with NOTCH1, NOTCH2 and CCND2, whereas JUN interacts with FBXW7, HHEX and CCND2. Eight genes interact with 2 biological T2D risk gene products and 68 genes interact with a single biological T2D risk gene product (Supplementary Table 21).

Discussion
In this study, we performed a GWAS meta-analysis in the Japanese population followed by validation using an independent Japanese sample. Integration of the results for ~55,000 Japanese individuals identified 7 novel loci associated with T2D that reached genome-wide significance. In a subsequent transethnic meta-analysis, four loci were confirmed and one locus was suggested as common susceptibility loci for T2D in populations other than the Japanese population. GWAS have been extensively performed in diverse ethnic groups, including populations of European, East Asian, South Asian and Mexican descent 1–18. To date, the sample size of GWAS for European populations has grown to over 100,000 (ref. 10) and these studies have identified nearly 50 loci associated with T2D. GWAS in populations of non-European origin and transethnic GWAS meta-analyses have identified more than 30 loci associated with T2D that were not detected in earlier European GWAS 3–7,12–15,18. Among these, several loci have been shown to be associated with T2D in larger European populations 10, which suggests that further expansion of GWAS for non-European populations could prove useful in identifying additional susceptibility loci associated with T2D.
Among the seven novel loci identified in this study, rs147538848 in FAM60A, rs1575972 near DMRTA1, rs7107784 near MIR4686 and rs67839313 near INAFM2 were shown to be common susceptibility loci for T2D across different ethnicities, although the significance of the association differed among individual ethnic groups for several loci.
rs147538848 is located in an intron of FAM60A, which encodes a subunit of the Sin3 deacetylase complex (Sin3/HDAC1) that has been shown to be important for the repression of genes encoding components of the transforming growth factor-β signalling pathway 31. Studies using a rat intrauterine growth retardation model have suggested that the Sin3/HDAC1 complex may negatively regulate the expression level of pancreatic and duodenal homeobox 1 (PDX1), which is known as an important transcription factor for the development of the pancreas and β-cell maturation 32, via histone modification of its proximal promoter 33. A T2D risk allele at the FAM60A locus might contribute to disease susceptibility by impairing the transcriptional regulation of genes that are important for glucose metabolism.
INAFM2 encodes InaF-motif containing 2 and has previously been known as Osteogenesis upregulated transcript 1 (OGU1) or long intergenic non-protein coding RNA 984 (LINC00984), a putative long non-coding RNA. Although the expression of OGU1 has been shown to be upregulated during osteogenesis 34, the function of the protein encoded by INAFM2 is still unknown. Around rs67839313, there are two plausible genes for susceptibility to T2D: PLCB2 and DISP2. PLCB2 encodes phospholipase C isoform β-2, and phospholipase C is a known regulator of insulin secretion through hydrolysis of islet phosphoinositide pools 35. Therefore, it is feasible that this locus is associated with impairment of the glucose-stimulated insulin secretion machinery. DISP2 encodes dispatched homologue 2, which is a cell surface marker on insulin-positive cells 36. Although the functional role of this molecule in glucose homeostasis is not well understood, it is potentially involved in the maturation of pancreatic β-cells or it might have a role in already matured pancreatic β-cells.
The effect size for the T2D association of rs1575972 near DMRTA1 was similar among all populations in this study, except (Table 2 and Supplementary Table 10). The risk allele of rs1575972 at the DMRTA1 locus was nominally correlated with a decrease in FPI (Supplementary Table 8), which suggests that this locus might contribute to T2D susceptibility by affecting insulin secretion in pancreatic β-cells. DMRTA1 encodes doublesex and mab-3-related transcription factor-like family A1, which has recently been reported to be involved in neuronal development by regulating the Pax6-Neurog2 transcriptional cascade 37. Although the relevance of DMRTA1 to pancreatic development has not been established, DMRTA1 might play a role in β-cell development, because Pax6 and Neurog3, another member of the neurogenin subfamily, are key transcriptional regulators of pancreatic endocrine cell differentiation 38.
INS, IGF-2 and TH are located in the vicinity of rs7107784 near the MIR4686 locus. IGF-2 plays a key role in embryonic growth and may also influence body weight in adulthood 39, and TH (tyrosine hydroxylase) has been shown to play a role in β-cell development 40. This locus is known to be associated with risk of type 1 diabetes (rs1004446-C, 45 kbp from rs7107784; r² = 0.003) 41. The risk allele rs7107784-G is nominally associated with an increase in FPI levels in the European population (MAGIC data; Supplementary Table 8) and with an increase in HOMA-IR in our Japanese data set (Supplementary Table 7). This suggests that the effects of rs7107784-G are probably not mediated by an impairment of insulin production or secretion, but rather by an impairment of insulin sensitivity.
rs67156297 near ATP8B2 was nominally associated with T2D in the transethnic replication meta-analysis (Table 2 and Supplementary Table 10). ATP8B2 encodes a member of the P4 family of ATPases (type 4 P-type ATPases), which are multispan transmembrane proteins that have been implicated in phospholipid translocation from the exoplasmic to the cytoplasmic membrane leaflet 42. The role of ATP8B2 in the pathogenesis of T2D has not been established. However, another member of the P4 ATPase family, atp10a, has been shown to be important for the biogenesis and/or membrane-directed trafficking of Glut4 receptors, and loss-of-function of atp10a induces IR and obesity in mice 42.
The remaining two loci, rs1116357 near CCDC85A and rs9309245 near ASB3, were not associated with T2D (P > 0.05) in the replication meta-analysis for non-Japanese populations, which suggests that the effect of these loci might be specific to the Japanese population. As heterogeneity in effect sizes was observed for rs1116357 or rs9309245 between Japanese and other ethnic groups, including European, South Asian and Mexican (Supplementary Tables 25 and 27), two possibilities might exist for the two SNP loci: (1) the LD between the causal alleles and the Japanese lead SNPs is consistent across the populations, but the risk alleles have effects only in the Japanese, and (2) the causal alleles are in LD with these SNPs only in the Japanese. In a systematic evaluation of effect sizes and LD within these loci, we did not identify any SNPs associated with T2D in European populations that are in LD with our lead SNPs in the Japanese but not in European populations (Supplementary Fig. 8 and Supplementary Tables 26 and 27). Therefore, the causal alleles at the two loci might have an effect only in Japanese populations; however, further evaluation is required to elucidate the precise mechanism by which these loci contribute to T2D susceptibility in the Japanese.
While searching for potential drug targets for T2D using a systematic bioinformatics approach, 83 overlapping genes were identified between 752 genes (40 biological T2D GWAS genes and 712 genes that encode products in direct PPI with the 40 biological T2D GWAS genes) and 871 drug target genes for various human diseases 27. Of these, 5 were T2D GWAS genes: PPARG, KCNJ11, ABCC8, GCK and KIF11. PPARG, KCNJ11 and ABCC8 have approved T2D treatment options. In addition, a GCK activator is currently undergoing clinical trials for the treatment of T2D. KIF11, which encodes kinesin family member 11 (also known as EG5), has been shown to be involved in regulating cell mitosis, and inhibitors targeting this gene product have been developed as chemotherapeutic agents in the treatment of cancer 43. Although the role of KIF11 in the regulation of glucose metabolism has not been well established, a recent study reported that knockdown of KIF11 using small interfering RNA resulted in increased glycogenesis in human primary hepatocytes 44. Thus, a KIF11 inhibitor might ameliorate glucose homeostasis by suppressing gluconeogenesis from the liver.
We identified two genes, GSK3B and JUN, which directly interact with multiple biological T2D susceptibility genes. GSK3B encodes glycogen synthase kinase 3β, which is a constitutively active multifunctional serine/threonine kinase and is involved in diverse physiological pathways, including metabolism, cell cycle regulation, gene expression, development, oncogenesis and neuroprotection 45. Several studies using Gsk3b-modified mouse models have suggested that inhibition of GSK3B function may have beneficial effects on glucose metabolism through pancreatic β-cell preservation or enhancement of insulin-stimulated glycogen synthase regulation and glycogen deposition 45–47. Currently, GSK3B inhibitors are in clinical trials for the treatment of cancers (Supplementary Table 23), but these compounds could also be potential treatments for T2D.
JUN encodes the proto-oncogene c-Jun, and the role of c-Jun in the pathogenesis of T2D is not well understood. However, c-Jun has been shown to decrease the expression of the human insulin gene by repressing insulin promoter activity 48. c-Jun is a transactivation component of the heterodimeric transcription factor AP-1 and is activated through phosphorylation of serines 63 and 73 by Jun N-terminal kinase 2 (ref. 49), and inhibition of JNK has been shown to ameliorate glucose intolerance in a mouse model for T2D 50. Currently, an AP-1 inhibitor is in clinical trials for the treatment of rheumatoid arthritis (Supplementary Table 23) and might also be a potential treatment for T2D.
Although these results suggest that these loci are potential therapeutic targets for treating T2D, the pipeline used to identify these genes has some limitations. As eQTL effects have often been observed for genes far from each locus, it is possible that some biological genes located outside the LD block at each locus were overlooked. In addition, the selection criteria for PubMed text mining or knockout mouse studies were based on known functions; therefore, T2D-associated genes whose functions have not been established may have been missed. The number of criteria met by each gene was simply summed for scoring, although the relative impact of the six criteria on biological significance may not be equal. We used the previously described scoring method 27 to prioritize genes in an objective manner; however, it would be worthwhile to refine the pipeline by modifying the selection criteria for genes in future studies. Finally, the potential therapeutic targets or treatments identified through the in silico pipeline have not yet been validated through an experimental approach. Furthermore, in vivo evaluation is essential to clarify the therapeutic effect of these potential T2D treatments.
In conclusion, we have identified seven novel T2D susceptibility loci using a large-scale Japanese GWAS meta-analysis. The T2D association for four of these was also observed in non-Japanese populations. In addition, we have proposed several new potential pharmacological targets for T2D treatment using a systematic bioinformatics approach. These results indicate that expansion of single ethnic GWAS is still useful to identify novel susceptibility loci to complex traits not only for ethnicity-specific but also for common loci across different ethnicities. Moreover, systematic approaches for integrating the findings of genetic, biological and pharmacological studies could be useful for developing new T2D treatments, although additional pipeline refinement would be required.

Methods
Subjects. Discovery stage (Stage-1). We selected T2D cases from individuals registered in BioBank Japan as having T2D (set-1 cases, n = 9,817). Control groups consisted of individuals registered in BioBank Japan as not having T2D but with diseases other than T2D (cerebral aneurysm, oesophageal cancer, endometrial cancer, chronic pulmonary emphysema or glaucoma) or volunteers from the Osaka-Midosuji Rotary Club and Pharma SNP consortium (set-1 controls, n = 6,763; Supplementary Table 24). We also used case and control individuals registered in BioBank Japan that were previously analysed and reported (set-2 cases, n = 5,646 and set-2 controls, n = 19,420) 7. There was no overlap between individuals in set-1 and set-2.
Validation analysis (Stage-2). We examined 7,936 T2D cases from the BioBank Japan that were not included in the discovery stage and from subjects with T2D, who visited outpatient clinics at  51 . We excluded individuals who were positive for antibodies against glutamic acid decarboxylase and those with diabetes due to liver dysfunction, steroids and other drugs that might raise glucose levels, malignancy or a monogenic disorder known to cause diabetes.
Clinical characteristics of Stage-1 and Stage-2 participants are shown in Supplementary Table 24. Genomic DNA was extracted from peripheral leukocytes using the standard procedure. All individuals provided written informed consent to participate in this study. The protocol of this study conformed to the provisions of the Declaration of Helsinki and was approved by the ethical committees at the RIKEN Yokohama Institute and all other institutions.
Genotyping and quality control in the discovery stage. Set-1 samples were genotyped using the Human Omni Express Exome Bead Chip. There were 535,686 autosomal SNPs that passed quality control (call rate ≥ 0.99, Hardy-Weinberg equilibrium test P ≥ 1 × 10−6 in controls and minor allele frequency (MAF) ≥ 0.01). Set-2 samples were genotyped using the Illumina Human 610K SNP array. There were 480,426 autosomal SNPs that passed quality control and were used for further analysis. For sample quality control, we evaluated cryptic relatedness for each sample using an identity-by-state method and removed samples that exhibited second-degree or closer relatedness. We further performed principal component analysis to select individuals within the major Japanese (Hondo) cluster as reported previously 5–7,52, and data for 16,580 individuals (9,817 T2D cases and 6,763 controls) in set-1 and 25,066 individuals (5,646 T2D cases and 19,420 controls) in set-2 were used in subsequent analyses. To evaluate the potential effect of population stratification, we used a quantile-quantile plot of the observed P-values (Supplementary Fig. 1A,B).
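The SNP-level quality control thresholds stated above (call rate ≥ 0.99, Hardy-Weinberg equilibrium P ≥ 1 × 10−6 in controls and MAF ≥ 0.01) can be expressed as a simple filter. The sketch below assumes that per-SNP summary statistics have already been computed by the genotyping pipeline; the class, field names and example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    call_rate: float        # fraction of samples with a genotype call
    hwe_p_controls: float   # Hardy-Weinberg test P value in controls
    maf: float              # minor allele frequency


def passes_qc(snp: SnpStats,
              min_call_rate: float = 0.99,
              min_hwe_p: float = 1e-6,
              min_maf: float = 0.01) -> bool:
    """Apply the three SNP-level thresholds described in the text."""
    return (snp.call_rate >= min_call_rate
            and snp.hwe_p_controls >= min_hwe_p
            and snp.maf >= min_maf)


# Hypothetical records, for illustration only.
snps = [
    SnpStats("snp_ok", 0.998, 0.40, 0.25),
    SnpStats("snp_low_call_rate", 0.97, 0.40, 0.25),
    SnpStats("snp_hwe_fail", 0.999, 1e-9, 0.25),
    SnpStats("snp_rare", 0.999, 0.40, 0.004),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)  # ['snp_ok']
```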
Imputation. We performed genotype imputation using MACH and Minimac 53,54 with individuals from the 1000 Genomes Project (phased JPT, CHB and Han Chinese South data, n = 275, March 2012) as reference populations 55. We selected SNPs with MAF ≥ 0.01 and a Minimac software quality score (r²) ≥ 0.7. Individual genotype dosage data were used for association studies using mach2dat 53,54.
Genotyping and quality control in the Stage-2 analysis. We genotyped 7,936 individuals with T2D and 5,539 controls using a multiplex PCR-Invader assay, as described previously 3,5–7. Genotyping results with success rates < 95% or concordance rates < 99.9% were excluded from further evaluation.
Follow-up analyses. We obtained follow-up analysis data (n = up to 223,966: 65,936 T2D cases and 158,030 controls) from multiple cohorts or a publicly available database, as described below.
East Asian populations. We obtained genotype data for up to 29,937 individuals (12,554 T2D cases and 17,383 controls), comprising de novo genotyping data from 2 cohorts and in silico replication data from 9 cohorts (Supplementary Table 9).
South Asian populations. We obtained in silico genotype data for a total of up to 24,965 individuals (10,587 T2Ds and 14,378 controls) from 6 cohorts (Supplementary Table 9).
European populations. We obtained genotype data for up to 160,850 individuals (38,947 T2D cases and 121,903 controls), comprising de novo genotyping data from the Danish case-control study and data from a publicly available database (DIAGRAM3, http://diagram-consortium.org/downloads.html) 10. The Danish case-control study consisted of individuals from the Inter99 cohort 56, the Health2006 cohort 57, the Vejle Biobank 58, T2D cases from the Danish ADDITION screening cohort 59 and a T2D case-control study obtained at Steno Diabetes Center (SDC). Two SNPs were genotyped by Illumina MetaboChip in 8,781 individuals from Inter99, Health2006 and SDC, whereas four SNPs were genotyped by LGC Genomics, UK, in individuals from Inter99, Vejle Biobank, ADDITION and SDC samples (Supplementary Table 9).
Mexican/Latino population. We obtained in silico genotype data for up to 8,214 individuals (3,848 T2Ds and 4,366 controls) from the SIGMA Type 2 Diabetes Consortium (Supplementary Table 9).
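As a consistency check, the per-population replication counts listed above can be tallied against the stated follow-up totals (65,936 T2D cases and 158,030 controls; up to 223,966 individuals). The sketch below uses only the numbers given in the text; the dictionary layout and variable names are illustrative.

```python
# (T2D cases, controls) per replication population, as stated in the text.
replication = {
    "East Asian": (12_554, 17_383),
    "South Asian": (10_587, 14_378),
    "European": (38_947, 121_903),
    "Mexican/Latino": (3_848, 4_366),
}

total_cases = sum(cases for cases, _ in replication.values())
total_controls = sum(controls for _, controls in replication.values())

print(total_cases, total_controls, total_cases + total_controls)
# 65936 158030 223966
```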
Ethnicity was self-reported by the enrolled individuals. For each study, approval was obtained from the institutional review boards of the participating institutions and written informed consent was obtained from all participants. We excluded association data derived from imputed genotype data with a low quality of imputation (r² < 0.7 or info < 0.7). Details of the study samples are described in Supplementary Table 9.
Statistical analysis. The association between each SNP and T2D was assessed using logistic regression under an additive model with or without adjusting for age, sex and log-transformed BMI. We combined data from each GWAS and our validation analyses using an inverse variance method and examined heterogeneity with Cochran's Q test using METAL 60. Regional association plots were generated using LocusZoom 61.
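The inverse variance method combines per-study log odds ratios by weighting each estimate by the reciprocal of its squared standard error. The sketch below shows the fixed-effects calculation together with Cochran's Q statistic; it is an illustration of the general technique and not a reproduction of METAL, and the input values are hypothetical.

```python
import math


def inverse_variance_meta(betas: list[float], ses: list[float]) -> dict:
    """Fixed-effects inverse-variance meta-analysis of per-study estimates.

    betas: per-allele log odds ratios from each study
    ses:   their standard errors
    Returns the pooled estimate, its SE, z score, two-sided P value,
    Cochran's Q and its degrees of freedom.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "df": len(betas) - 1}


# Hypothetical estimates from two studies, for illustration only.
result = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
print(result["beta"], result["se"], result["q"])
```

Under the null hypothesis of no heterogeneity, Q follows a chi-square distribution with the returned degrees of freedom (number of studies minus one).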
We also performed quantitative trait analysis for fasting plasma glucose, HOMA-β and HOMA-IR using multiple linear regression analysis in an additive association model with or without adjusting for age, sex and log-transformed BMI. The Japanese samples studied here show skewed distributions for BMI, HOMA-IR and HOMA-β; therefore, we analysed the quantitative traits using log-transformed BMI, HOMA-IR and HOMA-β.
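The HOMA indices used as quantitative traits are derived from fasting plasma glucose and fasting plasma insulin. The text does not state the formulas, so the sketch below assumes the widely used original HOMA equations (glucose in mmol/L, insulin in μU/mL) and a natural-log transform; the function names and example values are illustrative.

```python
import math


def homa_ir(fpg_mmol_l: float, fpi_uu_ml: float) -> float:
    """HOMA of insulin resistance: glucose (mmol/L) x insulin (uU/mL) / 22.5."""
    return fpg_mmol_l * fpi_uu_ml / 22.5


def homa_beta(fpg_mmol_l: float, fpi_uu_ml: float) -> float:
    """HOMA of beta-cell function: 20 x insulin (uU/mL) / (glucose (mmol/L) - 3.5)."""
    if fpg_mmol_l <= 3.5:
        raise ValueError("HOMA-beta is undefined for glucose <= 3.5 mmol/L")
    return 20.0 * fpi_uu_ml / (fpg_mmol_l - 3.5)


# Hypothetical fasting values, for illustration only.
ir = homa_ir(5.0, 10.0)
beta = homa_beta(5.0, 10.0)
# Log-transform (natural log assumed) before regression on genotype dosage.
print(math.log(ir), math.log(beta))
```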
Drug discovery. We performed a search for potential drug targets using genetic information of confirmed T2D susceptibility loci and publicly available bioinformatics tools 29,30 and databases 28,62-65 using a method that has been previously described by Okada et al. 27 (Supplementary Note).