Genome-wide association study identiﬁes new susceptibility loci for adolescent idiopathic scoliosis in Chinese girls

,

A dolescent idiopathic scoliosis (AIS) is a structural deformity of the spine that is estimated to affect millions of children, with a prevalence of 2-4% (refs 1,2). Commonly understood as a complex trait (polygenic disease), the genetic aetiology of AIS has been widely investigated by linkage analysis and candidate gene association analysis [3][4][5] . During the era of candidate study, several genes were reported to be associated with AIS, such as ERa (ref. 6), MATN1 (ref. 7), MTNR1B (ref. 8) and TGFB1 (ref. 9). However, few of them have been successfully replicated in different ethnicities [10][11][12][13] . The first genome-wide association study (GWAS) in a Caucasian population reported that rs1400180 of CHL1 was associated with AIS 14 , which however could not be replicated in a Chinese Han population 15 . Subsequently, a GWAS performed in a Japanese population revealed that LBX1 and GPR126 could play a role in the etiopathogenesis of AIS 16,17 . In a recent study that combined the GWAS data of AIS in Caucasian and Japanese populations, Sharma et al. 18 reported an enhancer locus of PAX1 was associated with the susceptibility of female AIS. As estimated by Kou et al. 17 , variants of LBX1 and GPR126 can only explain B1% of the trait variance in AIS. Herein additional genetic risk factors need to be discovered for a better understanding of the genetic susceptibility of AIS.
To map novel risk loci for AIS, here we report the results of a four-stage GWAS in a Chinese population. Overall, we identify three new susceptibility loci: 1p36.32 near AJAP1, encoding adherens junction associated protein 1 (rs241215, P combined ¼ 2.95 Â 10 À 9 ); 2q36.1 between PAX3 and EPHA4, encoding paired box 3 and EPH receptor A4, respectively (rs13398147, P combined ¼ 7.59 Â 10 À 13 ); and 18q21.33 near BCL-2, encoding B-cell CLL/lymphoma 2 (rs4940576, P combined ¼ 2.22 Â 10 À 12 ). In addition, we refine a previously reported region associated with AIS at 10q24.32 (rs678741, P combined ¼ 9.68 Â 10 À 37 ), which suggests LBX1AS1, encoding an antisense transcript of LBX1 (lady-bird like homeobox), might be the functional variant of AIS. This is the first GWAS to our knowledge investigating the genetic variants associated with AIS in a Chinese population, and the findings provide new insight into the multiple aetiological mechanisms of AIS.

Results
Association analyses in the discovery and replication. We performed a four-stage GWAS in cohorts of 4,317 patients and 6,016 controls, all of the Chinese Han population. In the discovery stage, 906,703 single nucleotide polymorphisms (SNPs) were genotyped in 1,001 patients presenting a main thoracic curve (major curve located in thoracic region) and 1,500 healthy controls. After quality control filtering of the genotyping data, 516,220 autosomal SNPs in 960 cases and 1,499 controls were retained for the subsequent Cochran-Armitage trend test. Analysis of population stratification with principal-component analysis (PCA) showed that all subjects of our study were clustered in Chinese population ( Supplementary Fig. 1). The genomic inflation factor (l) as shown by the quantile-quantile plot was 1.031, indicating that there was only a minimal amount of population stratification ( Supplementary Fig. 2). As shown by the Manhattan plot ( Fig. 1), 43 SNPs were observed to surpass the threshold of suggestive whole-genome significance (Po1.0 Â 10 À 5 ) (Supplementary Table 2 and 3).
Functional annotation. We evaluated the overlap of the novel associated SNPs with the Encyclopedia of DNA Elements (ENCODE)-annotated genomic elements 19 . Some signs of regulatory activity, including histone modifications at promoter or enhancer, DNase hypersensitivity sites, binding proteins and motifs changed, were observed for the four associated SNPs and their surrogates (Supplementary Tables 5-8).

Discussion
In this four-stage GWAS, we identified four SNPs that are significantly associated with risk of AIS in the Chinese population. The linkage disequilibrium block of the top signal SNP (rs678741) at 10q24.32 contains a widely validated risk variant of AIS (rs11190870) mapping to the 3 0 region of LBX1 (refs 16,20-22). Highly expressed in spinal cord and skeletal muscle, LBX1 was identified as an AIS susceptibility gene by a previous GWAS 16 . LBX1 may be implicated in the aetiology of scoliosis through abnormal somatosensory function 16 . In the current study, we further pinpointed a potentially regulatory SNP in this region. Data from ENCODE showed that rs678741 may be located in a strong enhancer region marked by peaks of several active histone methylation modifications. In addition, a search of HaploReg indicated that rs678741 maps to a DNase hypersensitivity site in multiple cell lines (Supplementary Table 5) 23 . These findings indicated that rs678741 has the potential to regulate the transcriptional activity of LBX1AS1.
Previous studies have shown that antisense long non-coding RNA can regulate the expression of target genes by directly binding to the transcript 24,25 . Therefore, it is possible that LBX1AS1 can be a biologically functional element that contributes to the risk of AIS. Further studies are necessary to clarify the role of LBX1AS1 in the pathogenesis of AIS. SNP rs4940576 is located in the intron of BCL-2 gene that encodes an integral outer mitochondrial membrane protein playing a key role in apoptosis 26,27 . Amling et al. 28 observed that BCL-2 is expressed at highest levels in late proliferative and maturing chondrocytes. In BCL-2 knockout mice, endochondral ossification was accelerated owing to the premature loss of terminal hypertrophic chondrocytes in the growth plate 29 . Thus, the product of BCL-2 appears to have an important role in controlling osteoblast activity and regulating the endochondralossification process. Interestingly, overgrowth of anterior spinal endplate is a classic hypothesis in the etiopathogenesis of AIS 30 . Proliferative and hypertrophic chondrocytes in the anterior column of AIS patients seemed more active than that of the posterior column 31 . Moreover, AIS patients were found to have different growth kinetics in bilateral growth plate of the vertebrae as indicated by differences in cellular activity between the convex and concave side of the curve 32 .
The third SNP rs13398147 is located between EPHA4 and PAX3. EPHA4 belongs to the EPH receptor subfamily of the protein-tyrosine kinase family 33 . EPH receptors are implicated in mediating developmental events, particularly in the nervous system 34 . The PAX3 subfamily regulates both myogenesis and neurogenesis in the neural tube [35][36][37] . The neural tube/notochord complex has a critical function during the development of vertebral muscle 38 . In addition, PAX3 mutation can lead to muscular and neural tube defects, as well as malformation of the vertebral column 39,40 . Abnormality of the paravertebral muscles has been proposed as the cause of AIS 30 . Many diseases that affect muscle function can present a secondary scoliosis 41,42 .The fourth SNP rs241215 maps near AJAP1, encoding a transmembrane protein that interacts with E-cadherin/b-catenin complexes and the adaptor protein complex AP-1B in polarized epithelial cells 43 . Recent findings suggested a potential role for AJAP1 in cell-cell and cell-extracellular matrix interactions that could be involved in cell adhesion, migration and invasion 44,45 . Regulation of cell adhesion has been found to be essential in the bone growth and osteoblast differentiation [46][47][48] .
To determine if there was any association between the novel associated SNPs and curve severity, we investigated a subgroup of 632 patients with a mean Cobb angle of 37.2 ± 9.4°( range 27-49°). Of these, 214 patients received fusion surgery, and all other 418 who did not require surgical correction had been observed until skeletal maturity. As shown in Supplementary Table 9, rs241215 might be associated with the curve severity. Patients with genotype AA had a higher curve magnitude (39.6 ± 7.3°) than those with genotype TA (36.9±7.7°) or TT (36.7±8.1°). As for the other three SNPs, no difference regarding the curve severity was found among the genotypes.
The primary limitation of our study lies in the lack of experiments supporting the functional role of the newly identified susceptibility genes. We believe that future functional analysis concerning the expression of these genes in tissues of AIS patients and related regulatory pathways can shed light on their relationships with the development of AIS. Moreover, future finemapping studies based on data from the resequencing of these regions may provide a clear understanding of the association at these loci.
In summary, our GWAS conducted in a Chinese Han population identifies three new susceptibility loci of AIS at 1p36.32, 2q36.1 and 18q21.33. This study also refines a region previously associated with AIS at 10q24.32, suggesting that LBX1AS1, an antisense transcript of LBX1, might be the functional variant of AIS. These findings provide additional insights into the genetic architecture of AIS. Further dissection of these loci is likely to expand our understanding of the aetiology of AIS.

Methods
Subjects. The current case-control analysis was composed of four stages, including the initial genome-wide discovery stage followed by three replication stages. The study has been approved by the following local Institutional Review Boards (IRB): IRB of Nanjing Drum Tower Hospital, IRB of The Chinese University of Hong Kong, IRB of The West China Hospital, IRB of The Second Affiliated Hospital of Zhejiang University and IRB of China-Japan Union Hospital of Jilin University. Patients who came to our joint scoliosis research clinics between 2000 and 2015 were reviewed for the eligibility to be included in this study. The inclusion criteria were as follows: (1) female; (2) diagnosed as AIS through clinical and radiologic examinations; (3) with Cobb angle 420°; and (4) having right major thoracic curve. Patients with scoliosis secondary to known aetiology, such as congenital scoliosis, neuromuscular scoliosis, scoliosis secondary to skeletal dysplasia and connective tissue abnormalities, were excluded from the study. Healthy subjects who came to our hospital for routine physical examinations were recruited as controls. Overall, there were 4,317 patients and 6,016 controls included in our study, all from ethnic Han Chinese population and collected in Nanjing of Jiangsu Province, Hangzhou of Zhejiang Province, Chengdu of Sichuan province, Changchun of Jilin Province and Hong Kong. The detailed profile of the subjects was as follows: in the discovery stage, 728 cases and 1,500 controls were collected in Nanjing, and the other 273 cases were collected in Hong Kong; in replication stage 1, all the 1,916 cases and 2,001 controls were collected in Nanjing; in replication stage 2, 307 cases and 1,735 controls were collected in Nanjing, and 600 cases and 180 controls were collected in Hong Kong, 15 cases were collected in Hangzhou, 10 cases were collected in Chengdu and 18 cases were collected in Changchun; in replications stage 3, all the 450 cases and 600 controls were collected in Nanjing. Under the approval of each participating centre's Ethical Committee, informed ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms9355 consent was obtained from all the participants and from the guardians of the adolescents. Clinical characteristics of the participants in the discovery stage were summarized in Supplementary Table 1.
Genotyping and quality control for GWAS. Genomic DNA were extracted from peripheral blood leukocytes using a commercial DNA extraction kit (QIAGEN). Genotype analysis in the discovery stage was conducted using the Affymetrix Genome-Wide Human SNP Array 6.0 according to the manufacturer's protocol. Systemic quality control on the raw genotyping data in the discovery stage was carried out using PLINK (v1.90). The exclusion of unqualified SNPs was performed on the basis of following criteria: (1) with call rates o95%; (2) with minor allele frequency o0.05; (3) showing deviation from Hardy-Weinberg equilibrium (Po1 Â 10 À 5 ); and (4) mapping on non-autosomal Chromosomes. For sample quality control, cryptic relatedness for each sample was evaluated with an identityby-state method, and subjects showing second-degree relatedness or closer were removed (n ¼ 35). In addition, seven individuals with more than 10% missing genotypes were removed. There was no individual showing gender discrepancies based on their X chromosome genotypes. To detect population outliers and stratification, we used PCA implemented in the software package EIGENSTRAT 49 . A total of 2,459 participants remaining after sample quality control were analysed together with 210 HapMap subjects (60 YRI, 60 CEU, 45 JPT and 45 CHB individuals). Using the top two associated principal components, we identified that all individuals were of Chinese ancestry. Subsequently, we performed PCA using only the genotype information of the case and control subjects, and the scatterplot indicated that all cases and controls were genetically matched with minimal evidence of population stratification ( Supplementary Fig. 1). After the quality control process, a total of 960 cases and 1,499 controls within Chinese Han population and 516,220 SNPs were used in the final analysis.
Selection of SNPs for the replication stage. SNPs surpassing a threshold of suggestive genome-wide significance (Po1 Â 10 À 5 ) in the discovery stage were potential candidates for replication. Among these SNPs, those with ambiguous genotype scatter plots were excluded from the replication stage. A total of 43 SNPs remained after filtering, from which we chose 16 representative SNPs for replication stage 1 and removed the other 27 SNPs due to high linkage disequilibrium with at least 1 of the 16 representative SNPs (Supplementary  Table 2 and 3). After replication stage 1, 4 SNPs were included in replication stage 2 and 3, including rs678741, rs241215, rs13398147 and rs4940576 (Table 1 and Fig. 1). The SNP genotyping in replication stage 1 was performed with Sequenom MassARRAY system (Sequenom Inc.). Sixteen SNPs were multiplexed in one assay, and the primers were designed by custom software to give unique mass ranges for each SNP. Samples of replication stage 2 and 3 were genotyped using TaqMan SNP Genotyping Assay, which was read with an ABI Step-One-Plus sequence detection system (Applied Biosystems, Foster City, CA). Approximately 20 ng of genomic DNA was used to genotype each sample. Several quality control measures were applied as listed below. Laboratory technicians performing the genotyping assay were kept blind of case or control samples. In all, 5% of the samples were randomly selected as blind duplicates, with a reproducibility rate of 100%. DNA samples with 410% missing genotyping were excluded.
Statistical analysis. In the discovery stage, a Cochran-Armitage trend test was used to calculate the association of each SNP with the disease. We calculated OR values and 95% CIs from a 2 Â 2 allele frequency table. Data from the discovery stage and three replication stages were combined using the Mantel-Haenszel method. The heterogeneity between studies was examined using the Q-statistic P value 50 . The inflation factor l was calculated by dividing the mean of the lower 90% of the test statistics by the mean of the lower 90% of the expected values from a w 2 distribution with 1 degree of freedom. We used PLINK 1.90 for general statistical analysis 51 . The Manhattan plot of À log10 P was generated using Haploview (v4.2) 52 . A quantile-quantile plot generated with R (v2.6) was used to evaluate the overall significance of the GWAS results and the potential impact of population stratification.
Imputation. We performed genotype imputation within 800 kb around the four significant regions using MaCH-Admix software 53 . We used the linkage disequilibrium and haplotype information from the 1000 Genomes Project (phased CHB and CHS data; March 2012) as the reference set to impute ungenotyped markers (Supplementary Table 4). SNPs with minor allele frequency r0.05 or with a low imputation quality score (R 2 r0.30) were excluded to ensure good imputation quality. Logistic regression analysis was used to determine the association of the SNPs with the AIS risk. The linkage disequilibrium pattern around the risk-associated SNPs was analysed to identify susceptibility genes that were covered by the linkage disequilibrium blocks. Linkage disequilibrium structures and haplotype block plots were generated with Haploview (v4.2) 52 . The four significant regions were plotted using the online tool LocusZoom 54 .
Functional annotation. To explore the regulatory properties of the association signals, we used chromatin state segmentation in LCL data generated by the ENCODE project 19 . HaploReg was used to examine whether any of the SNPs or their proxies (that is, r 2 4 0.8 in the 1000 Genomes CHB reference panel) annotate putative transcription factor binding or enhancer elements (Supplementary Table 5-8) 23 .
Subgroup analysis. To investigate the relationship between the novel associated SNPs and curve severity, two types of patients were included in the subgroup analysis: (1) being consistently observed until skeletal maturity; and (2) undergoing fusion surgery due to curve progression. Skeletal maturity was determined based on chronological age (Z16-years-old) or Risser sign (44). The curve severity was recorded at the latest visit for patients being consistently observed, or at the last preoperative visit for those undergoing surgery. One-way analysis of variance test was used to compare the curve severity among different genotypes. Statistical significance was set at a P value of o0.05.