AIS is a complex spinal deformity which is defined as a lateral spinal curvature with a Cobb angle of >10 degrees1. AIS is a common disease affecting ~2.5% of adolescents in Japan2 and millions of children worldwide are affected with a prevalence of 2–3%3,4. AIS is regarded as a multifactorial disease affected by genetic and environmental factor5. The importance of genetic factors in the etiology and pathogenesis of AIS has been demonstrated by many twin, family and population studies6,7.

Genome-wide association study (GWAS) is one of the most effective methods to identify genetic factors of complex traits, including common diseases. To date, several GWASs have been conducted for AIS8,9,10,11,12,13. We previously conducted two GWASs (GWAS1 and GWAS2) in Japanese populations and identified three loci significantly associated with AIS susceptibility (10q24.31, 6q24.1 and 9p22.2)8,9,10. A subsequent meta-analysis using multi-ethnic cohorts confirmed their robust associations14. A total of seven significantly associated loci have been identified in the meta-analysis; however, the proportion of the AIS heritability explained by the seven loci is estimated to be only ~3%. Therefore, it is indispensable to identify additional susceptibility loci for understanding the etiology and pathogenesis of AIS.

In the present study, we perform a large Japanese GWAS followed by a meta-analysis of three GWASs. We replicate the association of 6 previous reported susceptibility loci and identify 14 previously unreported susceptibility loci. By a sex-stratified analysis, we further identify three significant loci for female AIS. The integrative analyses indicate the significant genetic correlations of AIS with body mass index (BMI) and uric acid (UA), and show the significant heritability enrichment for AIS in multiple cell-type groups. In addition, we find 21 cis-expression quantitative trait loci (eQTL)-associated genes in 7 out of 14 previously unreported susceptibility loci. In vitro functional analyses suggest that one of these eQTLs, rs1978060, regulates the expression of TBX1, and the difference in FOXA2 binding causes difference in cis-acting transcriptional regulatory function between alleles. These findings provide insights into etiology and pathogenesis of AIS.


Association analysis

To identify the additional AIS susceptibility loci, we conducted a large GWAS (GWAS3: 3254 cases and 63,252 controls). In addition, we reanalyzed the previous GWASs (GWAS1 and GWAS2) by updating the reference panel (Methods). For each of the three GWASs, imputation analysis was performed separately. Subsequently, the meta-analysis combining the three GWASs was performed (a total of 5327 cases and 73,884 controls; Supplementary Fig. 1). The genomic control inflation factor (λGC = 1.16) showed an inflation in the GWAS; however, the linkage disequilibrium (LD) score regression analysis indicated that the inflation was mostly from polygenicity (85.5%) and biases have a small contribution (estimated mean χ2 = 1.29 and LD score intercept = 1.04). Compared to the previously reported GWAS, the bias was not large15; we therefore did not apply the GC correction. As a result of the meta-analysis, we identified 20 significant AIS loci including 14 previously unreported loci (Table 1, Supplementary Data 1, Fig. 1 and Supplementary Fig. 2). The most significant association was identified at rs11190870 (P= 2.01 × 10−82) as with our previous GWASs8,10. Including this locus, significant associations of the six previously reported loci (10q24.31, 6q24.1, 9p22.2, 20p11.22, 16q23.3 and 9q34.2)14 were replicated in the present meta-analysis (Table 1, Supplementary Data 1 and Fig. 1). Lead variants at the 14 previously unreported loci included both common genetic variants with smaller effect size and low-frequency genetic variants with relatively large effects sizes (Fig. 2 and Supplementary Table 1). Of the 14 variants, 3 variants had distinct minor allele frequency (MAF) spectra between Japanese (JPN) and Europeans (EUR). These three variants are rare (MAF < 0.01) or monomorphic in EUR, but low-frequency (MAF: 0.01–0.05) or common (MAF > 0.05) in JPN (Fig. 2 and Supplementary Table 1). We estimated the AIS heritability using LD score regression. The SNP heritability estimated on the liability scale by the LDSC software ( with the use of common variants in hapmap3 data is 42%. The lead variants of the 20 significant loci explained 4.6% of the phenotypic variance. We performed a stepwise conditional analysis to detect multiple independent signals within the 20 significant loci. We found five additional signals that reached the locus-wide significance (P < 5.0 × 10−6) (Supplementary Table 2). These five signals additionally explained 0.6% of the phenotypic variance (Supplementary Table 2 and Supplementary Fig. 3). These results were also confirmed by GCTA-COJO (Supplementary Table 3 and Supplementary Note 1).

Table 1 Association of the genome-wide significant loci
Fig. 1
figure 1

Manhattan plot showing the –log10 P value for the SNPs in the AIS GWAS. P values were derived from the meta-analysis by using the inverse-variance method under a fixed-effect model. The red line represents the genome-wide significance threshold (P = 5 × 10−8). The genetic loci that satisfied the genome-wide significance threshold in the meta-analysis of the three GWAS are labeled

Fig. 2
figure 2

Relationship between effect size and minor allele frequency. The meta-analysis effect size (y axis) and the minor allele frequency (x axis) for 20 significant SNPs. Red circles represent the SNPs at previously unreported loci (n = 14) and blue circles represent the SNPs at previously reported loci (n = 6). Effect sizes are measured as odds ratios, which give the odds of the outcome given exposure to one risk allele compared with those to no risk allele

Integrative analysis

AIS is a complex disease and several hypotheses including neuromuscular, biomechanical, genetic, developmental and growth-related have been proposed to explain the etiology and pathogenesis of AIS5,16. However, there is currently no hypothesis commonly supported. Therefore, the identification of AIS-related tissues and cell types is valuable for understanding of etiology and pathogenesis of AIS. We investigated the cell-type specificity of AIS based on the enrichment of heritability. We applied stratified LD score regression to our GWAS result using 220 cell-type-specific annotations of the four histone marks (H3K4me1, H3K4me3, H3K9ac and H3K27ac) constructed by the Roadmap Epigenomics Project17. First, to obtain an overview of AIS-related cell types, we assessed heritability enrichment of 10 major cell-type groups that are constructed reflecting system- or organ-level structures from the 220 individual cell-type-specific annotations18. The most significant enrichment was observed in the cardiovascular cell group (P = 1.07 × 10−5) (Supplementary Table 4). In addition, significant enrichment was also observed in other five cell-type groups: skeletal muscle, connective or bone, other, central nervous system (CNS) and gastrointestinal groups (P < 5.0 × 10−3) (Supplementary Fig. 4 and Supplementary Table 4). We further assessed the heritability enrichment of the 220 individual cell types and identified significant enrichments in six cell types (P < 2.3 × 10−4) (Supplementary Data 2). The most significant heritability enrichment was observed in H3K4me1 in fetal stomach cell (P = 4.82 × 10−6) and significant enrichment was also observed in H3K4me1 in other fetal organs such as lung, trunk muscle and leg muscle. Significant enrichment was also observed in H3K4me1 in stomach smooth muscle and H3K4me3 in penile foreskin fibroblast primary. These results suggest that the etiology and pathogenesis of AIS are heterogeneous and multifactorial.

To gain biological insights, we conducted a pathway analysis with the result of the meta-analysis using PASCAL19 ( Sixty-two pathways showed the nominal significance (P < 0.05); however, no specific pathway significantly associated with AIS (P < 4.6 × 10−5) was detected (Supplementary Data 3). To obtain insights into the genetic architecture of AIS, we explored the shared genetics between AIS and various traits. We calculated genetic correlation between this study and 67 complex human traits (61 quantitative traits and 6 diseases) in Japanese20,21 using bivariate LD score regression22 (Supplementary Data 4). Genetic correlations of AIS with BMI and UA have been suggested in our previous studies20,21 and their significant negative correlations were replicated in this study (BMI: rg = −0.15, P = 3.14 × 10−5; UA: rg = −0.15, P = 1.22 × 10−5).

Sex-stratified analysis

Since AIS has a clear sex difference of female predominance23 and the number of male case samples was limited, we conducted a female-specific meta-analysis of the GWASs (5004 cases and 33,679 controls). As a result, 15 loci showed the genome-wide significance (Supplementary Data 5). Three of the fifteen loci (3q13.2, 1q25.2 and 1q23.3) did not reach GWAS significance in the overall study. We found an evidence of sex-heterogeneity in the BOC region on chromosome 3 (Table 2).

Table 2 Association of the genome-wide significant loci for female AIS (only loci FEMALE-sig)

Functional annotation and candidate susceptibility genes

To identify candidate causal variants at each of the 14 previously unreported loci (Supplementary Fig. 2), we used several SNP annotation tools (HaploReg (, 3DSNP (, RegulomeDB (, etc.) and explored the biological role of these variants. For each locus, we searched all variants in high LD (r2 > 0.8 in East Asians (EAS) of 1KGP3) with the most associated variant (lead variant) and found some candidate variants with regulatory function (Supplementary Data 6). Furthermore, to identify susceptibility genes in the loci, we searched for eQTL using data from the Genotype-Tissue Expression (GTEx) project24 ( As demonstrated by the cell-type-specific analysis, AIS is a multifactorial heterogeneous disease and multiple tissues may be associated with AIS. Moreover, because the tissues used in the GTEx project are limited and there are no data on AIS-related tissues such as intervertebral disc, cartilage and bone, we searched data on all tissues currently available. We observed significant cis-eQTLs at 7 of the 14 previously unreported loci, and the expression levels of 21 genes in the 7 cis-eQTL loci were associated with some variants that were in high LD (r2 > 0.8 in EAS) with the lead variants (Supplementary Table 5). We also searched for candidate variants with regulatory functions and eQTL using GTEx data for three previously unreported loci that were significant in the female meta-analysis (Supplementary Data 7 and Supplementary Table 6). The expression levels of three genes in the chromosome 3 locus were associated with some variants that were in high LD (r2 > 0.8 in EAS) with the lead variants (Supplementary Table 6). These cis-eQTL-associated genes are promising candidates for AIS susceptibility genes, among which there are several interesting candidate genes (Supplementary Table 7).

Among these cis-eQTLs, we specifically focused on rs1978060, the lead variant at chromosome 22q11.21 corresponding to the lead cis-eQTL variant of TBX1. TBX1 is a member of the T-box gene family, which is a group of transcription factors involved in the regulation of developmental processes. Mutations of TBX1 are known to cause DiGeorge syndrome (OMIM #188400) and velocardiofacial syndrome (OMIM #192430), and scoliosis is one of the clinical features, which is highly prevalent (47–49%) in association with these syndromes25,26,27,28. There is substantial evidence that Tbx1 haploinsufficiency is responsible for the physical features of these syndromes, and it has also been reported that Tbx1 knockout mice show vertebral anomalies29,30. In addition, other T-box gene family member, TBX6 is known to cause congenital scoliosis31. We therefore selected the TBX1 locus for further functional analysis. To prioritize candidate variants at the locus, we used a simple scoring system. The candidate variants were scored when they possess the following functional information: (i) promoter histone mark, (ii) enhancer histone mark, (iii) DNase protein binding and (iv) motif change. Based on the system, we selected rs1978060 as a most likely causal variant at this locus. rs1978060 was also a promising candidate causal SNP in other scoring systems such as 3DSNP and RegulomeDB (Supplementary Data 6).

We conducted in vitro analysis for rs1978060. We constructed luciferase reporter vectors by cloning the TBX1 promoter region and inserting oligonucleotides that contained either the risk or non-risk allele of rs1978060. We evaluated the effect of rs1978060 on TBX1 promoter activity and revealed that the risk allele-G significantly reduced the reporter activity compared to the non-risk allele-A (Fig. 3a and Supplementary Fig. 5a). We then performed an electrophoretic mobility shift assay to examine the DNA–protein binding of rs1978060. As expected, different binding patterns of DNA–protein complexes were observed between the risk and non-risk alleles of rs1978060 (Fig. 3b and Supplementary Fig. 5b). We searched for possible transcription factors that have the differential binding effect on rs1978060 using annotation tools. FOXA2 was predicted by JASPER ( to have a higher binding score in the non-risk allele and binding to this genomic region has also been demonstrated by chromatin immunoprecipitation assays. Furthermore, an electrophoretic mobility shift assay using FOXA2 antibody showed a super-shift in the presence of the antibody (Fig. 3c and Supplementary Fig. 5b) and the effect of rs1978060 on the TBX1 promoter activity showed significant allelic difference when co-transfection with FOXA2 (Fig. 3a and Supplementary Fig. 5a). These results indicate that FOXA2 binds to rs1978060 and regulates the transcription of TBX1.

Fig. 3
figure 3

Allelic difference of functional variant, rs1978060 in Chr.22q11.21. a Reporter assays in MCF-7 cells. There was a significantly decreased transcriptional activity for the risk G-allele of rs1978060 compared to the non-risk A-allele. Error bars show standard deviation (S.D.) for each variant. Asterisks indicate statistically significant changes in paired comparison (t-test P < 0.01). n = 2 independent experiments. b Electrophoretic mobility shift assays with nuclear extracts from MCF-7 cells. There was specific bands for A-allele probe (lane 1, black arrow) and the G-allele probe (lane 2, white arrow) of rs1978060. Competition analyses were performed using an excess of the unlabeled A-allele probe (lane 3, lane 6) and G-allele probe (lane 4, lane 5) as competitors. c A super-shift assay using the FOXA2 antibody. The black arrow indicates a super-shifted FOXA2 complex in lane 3. Source data are provided as a Source Data file


In the present study, we conducted a large AIS GWAS in Japanese and performed a meta-analysis with two previous Japanese GWASs. With larger sample sizes and improved imputation, we identified 20 susceptibility loci including 14 previously unreported loci. The larger sample size yielded both common variants with smaller effect sizes and low-frequency variants with relatively large effects on the risk of AIS, suggesting contribution of both rare and common variants in the genetic architecture of AIS (Fig. 2). Of the lead variants at the 14 previously unreported loci, 11 were common variants (MAF > 0.05) in both JPN and EUR. Some of these common variants differed greatly in MAF between JPN and EUR (Supplementary Table 1). However, the association of common lead variants (MAF > 0.05) at the three susceptibility loci (10q24.31, 6q24.1 and 9p22.2) identified in our previous GWAS was replicated even if there is a large difference in MAF between JPN and EUR14,32,33. Therefore, it is expected that the association of common lead variants (MAF > 0.05) identified in the present study will also be replicated in a multi-ethnic meta-analysis. From this point of view, rs73235136, a common lead variant (MAF > 0.05) at a female-specific susceptibility locus, also has a large difference in MAF between JPN and EUR, but the association is expected to be replicated. On the other hand, the low-frequency variants (rs141903557 and rs188915802), which have relatively large effects, are rare or monomorphic in EUR (Supplementary Table 1). The contribution of these variants with relatively large effects to the genetic architecture of AIS can differ among ethnic groups, as no association can be found unless there are other causal variants.

Besides our GWASs, a few GWASs in Caucasian and in Chinese have been reported11,12,13. The association of 20p11.22 locus reported in Caucasian GWAS was replicated in this study; however, the association of other loci reported in Chinese GWAS was not replicated. These loci were also not significant in our previous multi-ethnic meta-analysis14. They seem to be Chinese population-specific signals. Recently, an exome-wide association study identified that rs13107325, a non-synonymous SNP encoding SLC39A8, is associated with AIS34. However, the SNP is monomorphic in Japanese and other East Asian populations, and no variants associated with AIS at this locus were identified.

Integrative analyses indicated that AIS is associated with several cell-type groups such as cardiovascular, connective bone, skeletal muscle, and CNS, which suggests heterogeneity in etiology and pathogenesis of AIS. The negative correlation between AIS and BMI confirmed by our genetic analysis is consistent with the clinical observations reported in many clinical studies35,36. In contrast, there have been few reports on the relation between AIS and UA. We could find only a few papers reporting a relation between renal and ureteral abnormalities and congenital scoliosis37,38. UA is also known to be associated with a risk for cardiovascular diseases, and our cell-type specificity analysis showed that AIS is associated with cardiovascular cell types. These findings provide starting points to clarify the complex etiology and pathogenesis of AIS.

In the cis-eQTL analysis, we searched data for all tissues included in the current GTEx project; however, the tissues examined in the project are limited. AIS-related tissues such as intervertebral disc, cartilage and bone were not included and in the first place, it is difficult to determine the tissue that contributes to the onset of AIS. In addition, because the sample size greatly affects eQTL mapping, current eQTL data would have missed a considerable number of actually associated tissues. However, based on the cis-eQTL analysis followed by functional annotations of the variants, we could show that the TBX1 expression is regulated through the binding of FOXA2 to rs1978060. Consistent with our results, previous mouse studies have shown that Foxa2 in the pharyngeal endoderm can bind and activate transcription through the critical cis-element upstream of Tbx1 (refs. 39,40).

Among the cis-eQTL-associated genes other than TBX1, there are several interesting candidate genes. One of them is DSE at chromosome 6q22.1. DSE encodes dermatan sulfate epimerase, an enzyme that is necessary for dermatan sulfate biosynthesis. Bi-allelic loss-of-function mutations in DSE are reported to cause Ehlers–Danlos syndrome (EDS) musculocontractural type 2 (OMIM #615539), which is characterized by progressive multi-system fragility-related manifestations. The patients of this type of EDS present spinal deformity including scoliosis and kyphoscoliosis36. Consistent with this, cis-eQTL analysis shows that risk allele of rs482012 is associated with decreased expression of DSE (Supplementary Table 5). In addition, FTO on chromosome 16q12.2 is a promising candidate susceptibility gene. AIS-associated variants including the lead SNP, rs12149832 are present in the intronic region of FTO and the risk allele is associated with decreased expression of FTO (Supplementary Table 5 and Supplementary Data 6). It is well known that FTO is associated with BMI and obesity41,42,43, and risk allele of rs12149832 is associated with lower BMI and decreased risk of obesity43. These findings are consistent with the recently reported negative correlation between BMI and AIS20.

Sex-stratified analyses identified three susceptibility loci that are significant only in the female meta-analysis. Among the loci, we found an evidence of sex-heterogeneity at the BOC locus. BOC is a co-receptor for the hedgehog pathway which is induced by 1,25-dihydroxyvitamin D3 (1,25(OH)2D3) in osteocytes44. Vitamin D is essential to maintain bone and mineral metabolism. Its status is greatly influenced by gender45 and its deficiency is well known to be associated with osteoporosis and autoimmune disease which show significant sexual dimorphism46. Thus, BOC might be specifically associated with female AIS through the vitamin D metabolism.

In this study, 14 previously unreported AIS susceptibility loci were identified in Japanese, and TBX1 was identified as one of the AIS susceptibility genes within the loci. Further studies are necessary to validate the association of these susceptibility loci in other ethnic groups. Identifying susceptibility SNPs within the loci and elucidating the function of candidate susceptibility would lead to understanding the etiology and pathogenesis of AIS. Although the mechanism behind sexual dimorphism in AIS remains unknown, the female-specific susceptibility locus we identified will be a key clue to understanding sexual dimorphism in AIS.



In GWAS3, as in GWAS1 and GWAS2, the case subjects with a Cobb angle of 10° or greater measured on standing spinal posteroanterior radiographs were recruited from collaborating hospitals8,9,10. All subjects were Japanese and were diagnosed with AIS between the ages of 10 and 18 years by expert scoliosis surgeons8,47. Congenital, juvenile, adult-onset scoliosis and scoliosis secondary to some other disorders were excluded. The control subjects were randomly selected from the subjects registered in the BioBank Japan project48 ( For the quality control of GWAS samples, we removed samples with a call rate <0.98. We removed related individuals with PI_HAT > 0.25. PI_HAT is an index of relatedness between two individuals based on identity by descent implemented in PLINK49 ( This filtering of relatedness was applied in each GWAS and combined data of the three GWAS separately and resulted in 102 cases and 806 controls excluded in total. To identify population stratification, we conducted principal component analysis (PCA) for genotype using FastPCA50. We excluded outliers from the East Asian cluster (distance from the mean of the cluster should be within 3 SD). There was no overlap of individuals in the three GWASs. Informed consents were obtained from all participants and from the parents of subjects who were minor. The ethical committees at all collaborating institutions and RIKEN approved the study.

Genotyping and imputation

GWAS3 subjects were genotyped by using the Illumina Human OmniExpress Genotyping BeadChip or a combination of Illumina HumanOmniExpress and HumanExome BeadChips. For quality control of variants, we applied the standard QC measures and excluded those with (i) SNP call rate < 0.99, (ii) MAF < 0.05 and (iii) Hardy–Weinberg equilibrium P value ≤ 1.0 × 10−5. We also excluded the variants whose allele frequencies had differences of >0.16 between the GWAS dataset and the Asian data in reference panel. For each of the three GWASs, we pre-phased the genotypes using EAGLE and imputed dosages with the 1000 Genomes Project Phase 3 reference panel (May 2013 release; with 1037 Japanese in-house reference panel using minimac3. For X chromosome, pre-phasing was performed in males and females together, and imputation was performed separately for males and females. Dosages of variants in X chromosomes for males were assigned between 0 and 2. Since variants in the pseudo-autosomal region (PAR) were not contained in the reference panel, we did not analyze PAR. We used variants with an imputation quality score Rsq ≥ 0.3 and MAF ≥ 0.005 for the subsequent association study. The quality control and imputation analysis for three GWASs were processed separately. The number of SNPs through analysis steps is illustrated in Supplementary Table 8.

GWASs and meta-analysis

Association analyses of autosomes of GWAS1-3 were performed independently in a logistic regression model with top 10 principal components as covariates. The three GWAS were meta-analyzed with inverse-variance method under fixed effect model. For X chromosome, an association analysis was conducted separately for male and female at each GWAS, and meta-analyzed the association results. We filtered variants showing strong heterogeneity (Cochran’s Q-test, P < 0.0001). This filtering excluded 484 variants in autosomes and 24 variants in X chromosomes. Regional association plots were produced by Locuszoom ( Adjacent genome-wide significant (P < 5.0 × 10−8) variants were grouped in one locus if they were located within 1 Mb apart from each other. To identify multiple independent signals within the 20 significant loci, we performed a stepwise conditional meta-analysis. We first performed conditional analyses of GWAS1-3 independently, and then combined the results using a fixed-effects model with the inverse-variance method. We repeated this process until the index variants fell below the locus-wide significance threshold of 5.0 × 10−6, based on the approximate average number of multiple tests in each locus. Association studies were conducted by plink2 or mach2dat software. We estimated the AIS heritability using the LDSC software. The variance explained by SNPs was calculated based on a liability threshold model by assuming prevalence of AIS as 2.5%. In this model, we assume that subjects have a continuous risk score and that subjects whose score exceed a certain threshold develop AIS.

Sex-specific GWAS

We separately extracted imputed genotypes of male and female subjects from the GWAS data and applied the same quality control criteria in each GWAS of males and females. We empirically estimated difference in effect sizes in males and females. We randomly generated 100,000 true effect sizes in males and females based on correlation coefficients and standard errors in sex-specific GWAS and compared effect sizes between males and females to compute P values. We also calculated statistical power to identify GWAS significant signals in males and females (Methods).

Power calculation of the current study

We conducted power calculation with use of GeneticsDesign package of R software ( to detect a signal with P value of 5 × 10−8 assuming disease prevalence of 3%. Calculation was done for entire dataset, males and females (Supplementary Table 9).

Estimation of confounding biases using LD score regression

To estimate confounding biases derived from population stratification and cryptic relatedness, we conducted LD score regression15. We used LD scores for the East Asian population provided by the LDSC software.

Functional annotation and eQTL analyses

To characterize associated variants, we used HaploReg v4.1, 3DSNP and RegulomeDB to gain functional annotation of variants in LD (r2 ≥ 0.8 in EAS of 1KGP3) with the 14 AIS lead variants. LD was calculated using LDlink51 (, a web-based application. We searched for the overlaps between these associated variants and promoter and enhancer marks using HaploReg v4.1 setting the source for epigenetic annotation as ChromHMM core 15-state model. We also searched for the overlaps between the associated variants and lead cis-eQTL variants in GTEx (release v7). We considered only the cis-eQTLs with FDR < 0.05, and listed the variants showing the most significant association for each gene in Supplementary Tables 5 and 6.

Cell-type specificity analysis

To assess the heritability enrichment in cell types for AIS, we performed stratified LD score regression by combining data from specific annotations of 10 cell-type groups and 220 cell types and four activating histone marks (H3K4me1, H3K4me3, H3K9ac and H3K27ac) from the Roadmap Epigenomics project17. The variants with low imputation quality score (Rsq < 0.3) and the variants within the major histocompatibility complex (MHC) region (chromosome 6: 25–34 Mb) were excluded from the regression analysis. We defined significant heritability enrichments as those with P < 0.05 after Bonferroni correction.

Pathway analysis

To investigate biological pathways associated with AIS, we performed PASCAL19. PASCAL computes gene scores (max or sum scores) by aggregating SNP P values from a GWAS meta-analysis and calculates pathway scores by combining the scores of genes belonging to the same pathways. We used sum statistics and predefined pathway libraries from KEGG, REACTOME and BIOCARTA with default parameters.

Genetic correlation analysis

To estimate genetic correlations across the 67 traits (61 quantitative traits and 6 diseases), we conducted bivariate LD score regression15 using the East Asian LD score and summary statistics of the current GWAS meta-analysis. We excluded SNPs found in the MHC region (chromosome 6: 25–34 Mb) from the analysis because of its complex LD structure. We defined significant genetic correlations as those with FDR < 0.05, calculated via the Benjamini–Hochberg method to correct multiple testing.

Luciferase assay

We constructed luciferase reporter vectors by cloning TBX1 promoter (−912~ +63; TBX1-promoter-F: 5′-GTTGGTACCCTCCTCAGTGCTTCCCTTTG-3′ and TBX1-promoter-R: 5′-ACTCTCGAGAGTGTTCCTCCCTCCCTCAC-3′) with oligonucleotides (sense: 5′-TCGATGTCTAATGTACRCACCAGCTCGGA-3′and antisense: 5′-TCGATCCGAGCTGGTGYGTACATTAGACA-3′) containing either risk or non-risk alleles of rs1978060 into the multicloning site of a promoterless pGL4.10[luc2] vector (Promega). We also cloned a cDNA of FOXA2 (FOXA2-F: 5′-GTTAAGCTTGCCACCATGCACTCGGCTTCCAGTAT-3′ and FOXA2-R: 5′-ACTGGATCCAGAGGAGTTCATAATGGGCC-3′) into the multicloning site of p3xFLAG-CMV-14 vector (Sigma-Aldrich) for protein expression in mammalian cell lines. We sequenced the inserts of all constructs. The phRL-SV40 vector (Promega) was used as an internal control to normalize the variation in transfection efficiency. MCF-7 and OUMS-27 cells are generally used for luciferase assay. MCF-7 cells (HTB-22, ATCC) were cultured at 37 °C under 5% CO2 in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (FBS), 0.01 mg ml−1 human recombinant insulin, 50 U ml−1 penicillin and 50 μg ml−1 streptomycin. OUMS-27 cells (JCRB Cell Bank) were cultured at 37 °C under 5% CO2 in DMEM supplemented with 10% FBS and 50 μg ml−1 kanamycin. All cell lines were authenticated by STR analysis and confirmed to be mycoplasma negative. MCF-7 and OUMS-27 cells were seeded in 24-well plates at a density of 5 × 104 and 1 × 105 cells per well, respectively. After 24 h, cells were transfected using TransIT-LT1 (Mirus Bio) according the manufacturer’s instructions. The luciferase activities were measured using the PicaGene Dual Sea Pansy Luminescence kit (Toyo Ink).

Electrophoretic mobility shift assays

We prepared probes for the risk (G) and non-risk (A) alleles of rs1978060 by annealing 25-bp complementary oligonucleotides (sense: 5′-TGTCTAATGTACRCACCAGCTCGGA-3′ and antisense: 5′-TCCGAGCTGGTGYGTACATTAGACA-3′) and labeling with digoxigenin (DIG)-11-ddUTP (Roche). The nuclear extracts were prepared from MCF-7 and FOXA2-overexpressing OUMS-27 cells. DNA–protein binding reactions were performed using a DIG gel shift kit according to the manufacturer’s instructions (Roche). For competition assays, nuclear extracts were pre-incubated with excess unlabeled probes prior to adding DIG-labeled probes. For a super-shift assay, 2 μg of FOXA2 antibody (Santa Cruz; sc-374376X) was added to the reaction mixture and incubated for 20 min at room temperature. DNA-protein complexes were resolved on 6% DNA retardation gels (Thermo Fisher Scientific), and the signal was detected using a chemiluminescent detection system according to the manufacturer’s instructions (Roche). Uncropped gel images can be found in the Source Data file.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.