Central corneal thickness (CCT) is a highly heritable trait associated with complex eye diseases such as keratoconus and glaucoma. We perform a genome-wide association meta-analysis of CCT and identify 19 novel regions. In addition to adding support for known connective tissue-related pathways, pathway analyses uncover previously unreported gene sets. Remarkably, >20% of the CCT-loci are near or within Mendelian disorder genes. These included FBN1, ADAMTS2 and TGFB2 which associate with connective tissue disorders (Marfan, Ehlers-Danlos and Loeys-Dietz syndromes), and the LUM-DCN-KERA gene complex involved in myopia, corneal dystrophies and cornea plana. Using index CCT-increasing variants, we find a significant inverse correlation in effect sizes between CCT and keratoconus (r = −0.62, P = 5.30 × 10−5) but not between CCT and primary open-angle glaucoma (r = −0.17, P = 0.2). Our findings provide evidence for shared genetic influences between CCT and keratoconus, and implicate candidate genes acting in collagen and extracellular matrix regulation.
Central corneal thickness (CCT) is a highly heritable quantitative trait, with heritability estimates ranging between 0.68 and 0.951,2,3,4. The corneal stroma, which accounts for 90% of the corneal thickness in humans, is composed of uniformly arranged type I collagen fibrils that are critical to the optical properties of the cornea. Corneal thinning is a common feature of rare Mendelian connective tissue disorders, such as Ehlers-Danlos syndrome (EDS), Marfan syndrome and osteogenesis imperfecta (OI), and extreme thinning is a clinical characteristic of brittle cornea syndrome (previously classified as EDS type VIB based on non-ocular features shared with EDS)5,6,7. Thinner CCT is also observed in more common ocular disorders such as keratoconus8, and has been associated with development and progression of primary open-angle glaucoma (POAG)9,10,11,12. Keratoconus is the leading cause of corneal transplants worldwide13 and its prevalence varies widely depending on the ethnicity, ranging from 0.02 in 100,000 to 229 in 100,00014,15. POAG accounts for around 74% of all cases of glaucoma, which is the most common cause of irreversible blindness worldwide16.
Genetic variants that affect the functions of genes responsible for maintaining the structural integrity of cornea are strong candidates for involvement in corneal thickness-associated diseases. We previously reported that six CCT-associated single nucleotide polymorphisms (SNPs) were also associated with keratoconus (using N = 874 cases and 6085 controls), including two strong associations (mean odds ratio and lower 95% confidence interval estimates greater than 1.2), at the FOXO1 and FNDC3B loci17. The latter was also shown to be associated with POAG (using N = 2979 cases and 7399 controls), although not in the direction that was expected (i.e., the CCT-decreasing allele was associated with decreased risk of POAG)17.
Over 26 loci have been associated with CCT to date, explaining around 8% of the CCT heritability17. Increased knowledge of the genetic basis of the variation in CCT in the general population promises to help in prioritising future research in corneal disease. To identify new CCT-associated loci, we performed a larger cross-ancestry genome-wide association study (GWAS) including over 25,000 individuals of European and Asian descent, with genotypes imputed to the 1000 genomes reference panel. Further, we assess the relevance of CCT influencing loci to the risk of keratoconus and POAG using slightly larger (keratoconus) and substantially larger (POAG) ocular disease datasets than those previously described17.
Meta-analysis of GWAS studies
The overall study design and main findings are depicted in Supplementary Fig. 1. In stage 1, we meta-analysed GWAS results from 14 studies comprising 17,803 individuals of European ancestry (see details in Supplementary Table 1). The inflation factor for European-specific meta-analysis was 1.075 (lambda scaled to n = 1000 is 1.004), which suggests the population stratification had a negligible effect on our meta-analysis. The European-specific meta-analysis identified 28 genome-wide significant CCT loci (P < 5 × 10−8) (Supplementary Table 2 and Supplementary Fig. 2a, b). Of these, seven were novel loci and map (as per closest gene) to LTBP1, STAG1, ARL4C, NDUFAF6, ADAMTS8, DCN and POLR2A. In stage 2, we examined the 28 lead SNPs from stage 1 in the Asian-specific meta-analysis (n = 8107) and found that 16, including the novel lead SNPs within or close to ADAMTS8 and DCN, were significant after Bonferroni correction (P ≤ 1.79 × 10−3, 0.05/28), further three other SNPs including the two novel close to STAG1 and NDUFAF6 were nominally significant (P < 0.05). The effect estimates of these 19 (16 + 3) loci were in the same direction and order of magnitude as in the European-specific meta-analysis (Tables 1 and 2 and Supplementary Table 2). Lead SNPs at four of the nine remaining loci, including at LTBP1, did not meet our filtering criteria in the Asian-specific meta-analysis (see Methods section). Lead SNPs at the remaining five loci showed the same direction but did not reach nominal significance, with SNPs at ARL4C and POLR2A displaying little effect in Asian populations. Meta-analysis of Asian-specific cohorts alone did not result in novel genome-wide significant findings (Supplementary Table 3 and Supplementary Fig. 3a, b). Because most loci had consistent effect directions in both European and Asian meta-analyses, we performed in stage 3 a cross-ancestry fixed-effect meta-analysis to detect additional loci associated with CCT (N = 25,910). This stage 3 meta-analysis identified 44 loci associated with CCT of which 19 were novel findings (Fig. 1 and Supplementary Figs. 4, 5). These 19 loci included five of the seven loci found in stage 1 (European-meta-analysis) and 14 novel ones with similar effect size and direction across-ancestries, see Tables 1 and 2. Two of the 44 loci are driven by low-frequency variants (i.e., 0.01 < minor allele frequency [MAF] < 0.05) identified in the European-specific meta-analysis (both are monomorphic in Asians), one novel in LTBP1 and one known in WNT10A18. The remaining 42 loci were all consistent across ancestries.
In our previous CCT GWAS we identified loci harbouring multiple independent variants17. To identify additional independently associated variants in European population, we performed conditional and joint multiple-SNP (CoJo) analysis implemented in the program GCTA. We used genotype data of 2582 unrelated Australians from the BMES cohort19. The CoJo analysis resulted in 16 independent SNPs, of which seven have not been previously associated with CCT (Table 3). Thus, in total, we identified 44 loci associated with CCT, harbouring 54 independent association signals (i.e., 28 previously published + 19 from the cross-ancestry meta-analysis + 7 from CoJo analyses).
Gene-based association analysis
To further identify loci not implicated in the single marker association tests, we performed gene-based tests using the software VEGAS220 using a ‘−10 kb’ window (see Methods section). We performed separate analyses for European-specific and Asian-specific meta-analyses results. In total 24,769 autosomal genes were analysed. Hence, we set our Bonferroni-corrected gene-based significant threshold as Pgene-based < 2.02 × 10−6 (0.05/24,769). In addition to genes implicated through the single marker association tests, we found significant association of the CDO1 gene with CCT (Pgene-based = 3.74 × 10−7, Supplementary Data 1). This gene showed strong association in the European gene-based study (Pgene-based = 2.00 × 10−7), with a top variant rs34869 (P = 7.88 × 10−8) driving the association.
Clinical relevance of CCT-associated loci
We first investigated whether the CCT-associated variants influence susceptibility to keratoconus and to POAG. Since keratoconus is characterized by progressive thinning of the cornea and reduced CCT is associated with POAG, we expected—if the underlying mechanisms are shared—that the CCT-reducing alleles would also increase the risks of keratoconus and of POAG. We aimed to test the association of all 54 independent CCT SNPs (or their proxies, r2 > 0.8) in the case-control studies. However, after quality control, only 36 SNPs were available in the keratoconus studies; all 54 SNPs were available in the POAG studies. We used a P-value of 5.56 × 10−4 (0.05/90) as Bonferroni-corrected significance threshold.
The keratoconus cohorts comprised 933 cases and 5946 controls of European ancestry. Overall, we found a significant negative correlation of effect sizes across CCT and keratoconus (r = −0.62, P = 5.30 × 10−5) (Fig. 2a, Supplementary Table 4), this correlation was largely unchanged if the known SNPs in ZNF469, FOXO1, COL5A1 and MPDZ/NFIB were removed from the analysis (r = −0.61, P = 2.04 × 10−4). Of the 36 CCT SNPs tested for association with disease risk in the keratoconus studies, three were significant and with the expected direction of effect (rs66720556 between MPDZ-NFIB, rs3132303 between RXRA-COL5A1, and rs2755238 close to FOXO1). Another 14 independent SNPs were associated at a nominal level of significance (P < 0.05). Of these, 12 showed the expected risk effect direction, including three tagging known CCT loci that had not reached nominal significance in our previous study (using a different proxy SNP), and four tagging novel CCT loci (DCN, LTBP1, STAG1, and THBS2). Of those, rs7308752 in DCN displayed the smallest P-value (P = 6.33 × 10−3, Supplementary Table 4).
Analyses in POAG cohorts included 5008 cases and 35,472 controls of European ancestry. None of the 54 available CCT-SNPs were significantly associated with POAG after correcting for multiple testing (Supplementary Table 5). Further, no correlation in effect sizes between CCT and POAG was found (r = −0.17, P = 0.2, Fig. 2b). Five variants were nominally associated (rs6445046 in FNDC3B, rs66720556 between MPDZ and NFIB, rs56009602 in ADAMTS8, rs10161679 between SNORD36 and BASP1P1, and rs2755238 close to FOXO1). Of these, association between rs6445046 in FNDC3B and POAG was the strongest (P = 3.18 × 10−3). However, as in our previous study17, the CCT-decreasing allele tagging FNDC3B was associated with a decreased risk of glaucoma rather than the expected increased risk. The next strongest association was for rs2755237 close to FOXO1 (P = 3.70 × 10−3); in this instance with the CCT-decreasing allele C being associated with an increased risk of glaucoma (rs2755237-C allele glaucoma OR = 1.15).
We then investigated whether CCT-associated loci are located in the vicinity (less than 1 Mb away) of rare Mendelian disorder genes (Supplementary Data 2). We identified that 20.5% (9/44) of the CCT loci are within 1 Mb of a Mendelian gene implicated in rare corneal or connective tissue diseases. In addition to the more immediate connections previously recognised (COL5A1—classical EDS, ZNF469—brittle cornea syndrome, COL8A2—Fuchs endothelial dystrophy), AGBL1—a Fuchs endothelial dystrophy gene21—is 784 kb away from rs4843040 on chr15q25.3, and SMAD3 is a Loeys-Dietz connective tissue syndrome gene. For the CCT loci identified here, DCN (12 kb from rs7308752) and KERA in the same locus (75 kb from rs7308752), are involved in congenital stromal corneal dystrophy (OMIM: 610048) and Cornea plana 2 (OMIM:217300), respectively. Finally, three connective tissues disease genes harboured lead SNPs for new CCT loci, ADAMTS2 (intronic lead SNP rs35028368) involved in EDS, type VIIC (OMIM: 225410), FBN1 (intronic lead SNP rs8030753) the Marfan syndrome major gene (OMIM: 154700), and TGFB2 (intronic lead SNP rs4846476) a Loeys-Dietz syndrome gene.
Regulatory potential of CCT-associated variants
We explored regulatory annotations within the 54 independent CCT lead SNPs and their proxies (r2 > 0.8) using different tools (see Methods section). In total, 974 variants (i.e., 54 lead SNPs + 920 SNPs in LD) were examined. Of these, 118 were prioritized including the 54 lead SNPs and another 64 SNPs which were selected based on their RegulomeDB score22 (i.e.,1a–1f, 2a–2c or 3a). SNPs with a score from 1a–1f to 2a–2c were classified as showing maximum evidence for being located in regulatory regions, while SNPs with a score of 3a were classified as showing medium evidence (Supplementary Data 3). In total, 63% (75/118) of the prioritized SNPs overlap with at least two regulatory elements of the ENCODE data (i.e., promoter or enhancer histone marks, DNase I hypersensitive sites, transcription factor or other protein-binding sites and eQTLs). Strong enrichment for histone modifications, particularly, H4K20me1 (which indicates transcriptional activation), was also found when results from the European-specific meta-analysis were assessed using GARFIELD (http://www.ebi.ac.uk/birney-srv/GARFIELD) (Supplementary Fig. 6). Additionally, we found 26 SNPs in eight loci showing a cis-eQTL effect in skin, which share the same embryonic origin as the cornea (Supplementary Data 3). Further, we tested if genes in associated CCT loci were highly expressed in any of the 209 Medical Subject Heading (MeSH) annotations used in DEPICT23. Tissue-enrichment analyses showed 33 FDR-associated (<0.05) tissues or cell type annotations. Of these, one annotation included the musculoskeletal system, five included tissues such as the muscle and connective tissue, and nine included cell types such as myocytes, osteoblast, chondrocytes, mesenchymal stem cells, stromal cells and fibroblasts (Supplementary Table 6). DEPICT prioritized 54 genes, of which 85% (46/54) are expressed in the human cornea. High expression levels (>200 PLIER) were observed for SMAD3, COL12A1 and DCN, LUM, KERA (Supplementary Table 7), the latter three being at the same “DCN” locus.
We tested enrichment of the genes defined by VEGAS2 in 9981 pathways or gene-sets derived from the Biosystem’s database. Using a 10 kb window in the VEGAS2 computation, we identified 23 pathways that were significantly enriched after correcting for multiple testing (Pgene-set < 5.01 × 10−6), Supplementary Data 4. The majority of these gene-sets are involved in the metabolic activities associated with collagen and extracellular matrix (ECM). We confirmed the previously identified significant association of the collagen trimer pathway (GO:0005581) with CCT17. Additional pathways involved in basement membrane (GO:0005604), TGF-β regulation (GO:0071636) and skeletal system development (GO:0001501), were also identified as associated with CCT. Similar pathways were observed using single variants with sub-threshold association P-values < 1 × 10−4 as input for the interval enrichment analysis (INRICH) method, Supplementary Table 8. Pathway analysis using a 200 kb window in VEGAS2, showed comparable pathways and additionally revealed the endoplasmic-reticulum-associated protein degradation (ERAD) pathway (GO:0036503) (Supplementary Data 5). The ERAD pathway also emerged as an overrepresented canonical pathway in the Ingenuity Pathway Analysis (IPA) (Supplementary Fig. 7), along with pathways related to connective tissue disorders and metabolism (Supplementary Tables 9, 10).
The FUMA platform (https://ctg.cncr.nl/software/fuma_gwas) highlighted that eight of the closest genes to the 44 CCT cross ethnic meta-analysis lead variants are amongst the 64 fibroblastic signature genes overexpressed in cancer cells that have undergone epithelial to mesenchymal transition24: THBS2, COL5A1, FBN1, LOXL2, DCN, LUM, COL6A2 and GLT8D2.
In this study, we identified 44 loci associated with CCT (42 across ancestry and two European specific—LTBP1 and WNT10A), 19 of which are novel findings. We also found that six of the 44 loci harbour multiple independent signals associated with CCT. Furthermore, we explored the relevance of CCT to complex eye diseases (i.e., keratoconus and POAG) and to Mendelian disorders. We found evidence of a strong inverse correlation in effect sizes between CCT and keratoconus, but not between CCT and POAG. Interestingly, 20.5% (9/44) of the CCT-associated loci are located close or within genes implicated in rare corneal or connective tissue disorders.
We confirmed all loci, except one (rs3749260 in GPR15), reported in our previous study by Lu et al.17 The variant in GPR15 found by Lu et al.17 in the European-specific analysis did not reach genome-wide significance in either our European-specific (P = 2.15 × 10−6) or cross-ancestry meta-analysis (P = 6.93 × 10−5); this could be due to the additional samples analysed or/and different quality of imputation. Additionally, in our study, we identified only one signal in the chr7q11.21 region, which in Lu et al.17 was reported as two independent loci, TMEM248 (also called as C7orf42) and VKORC1L1. Interestingly, the variant found in our study (rs34557764) lies in RABGEF1 with established eQTL effect in 90 tissues, influencing the expression of various genes including TMEM248 in testis, and VKORC1L1 in skin, blood and esophagus muscularis25,26, and further studies will be needed to ascertain the associated target gene(s). Overall, we report 44 loci harbouring 54 independent CCT-associated SNPs. These associations explain 8.5% and 7.2% of narrow sense CCT heritability in the European and Asian populations, respectively. Despite the small increase in the variance explained in the present study (~0.2%), the new loci greatly improved our understanding of potential underlying mechanisms.
At the newly identified CCT loci, if we select the nearest gene to the top SNP, we can putatively identify genes related to collagen and ECM (ADAMTS2, ADAMTS8, COL6A2, COL12A1, FBN1, LOXL2, LUM/DCN/KERA, THSB2), skeletal morphogenesis (RUNX2), embryonic development and cell growth (FGF1), TGF-β signalling (TGFB2, LTBP1), binding processes (ARVCF, STAG), coagulation and fibrinolysis systems (HABP2), endocytic machinery (STON2) and mitochondrial processes (NDUFAF6). It is important to stress that for several of these genes, the nearest gene may not be the relevant gene because the associated SNPs can have their primary effect on a more distant gene or genes. However, for a subset of the above nearest genes, additional information is available to support the noted gene. For example, knockout mouse models available for these genes have shown a variety of cornea-related phenotypes, including thin corneal stroma (FBN1, KERA, LUM, TGFB2)27,29,30,31,32,32, corneal opacity (LUM)30,32,32, absence of corneal endothelium (TGFB2)27, delayed corneal endothelium maturation and increased thickness (COL12A1)33. While in other mouse models, observed phenotypes included fragile skin (ADAMTS2, DCN, LUM)30,31,34,35 or bone abnormalities (RUNX2, COL12A1)33,36. Moreover, connections between ECM, skeletal and TGF-β signalling pathways give support for the implicated genes to influence CCT. For instance, it has been reported that fibrillin (encoded by FBN1) plays an important role in the ECM by controlling TGF-β signalling37. In addition, the latent transforming growth factor β–binding protein 1 (encoded by LTBP1) is an ECM protein thought to mediate the binding between fibrillins and TGF-β, influencing the growth factors availability in bone and connective tissues.
Another locus of interest harbouring genes related to ECM and collagen is the “DCN” locus (top SNP rs7308752) in which DCN, KERA and LUM genes are located. Decorin (encoded by DCN, the closest gene to rs7308752) is a leucine-rich proteoglycan that promotes the formation of collagen fibers but also binds to the various isoforms of TGF-β and fibronectins38. Mutations in DCN have been identified in congenital stromal corneal dystrophy35,39; which produces characteristic corneal opacity and increased corneal thickness35, making it an excellent candidate CCT gene. However, rs7308752 also shows a significant cis-eQTL effect (P = 3.1 × 10−6, in adipose tissue), modifying the expression of KERA, a keratan sulfate proteoglycan vital for maintaining corneal transparency. Mutations in KERA cause cornea plana-2 (OMIM:217300)40,41, a recessive corneal disorder characterised by flattening of the normally convex corneal surface. Additionally, LUM, another gene in the region (30 kb apart from rs7308752), is a member of the small interstitial proteoglycan gene family and has been implicated in high myopia32,42. In the ocular tissue database43, all three genes showed high expression levels in the cornea, with KERA showing the highest expression.
Our gene-based analysis identified association of the CDO1 gene with CCT. The CDO1 protein, is a cysteine dioxygenase type 1, involved in various metabolic pathways. Expression studies in mouse found that CDO1 is overexpressed in cornea compared with the lens; and based on its function, may play a role in protection against oxidative stress44. The top variant, rs34869, leading association of CDO1, is an established eQTL in transformed fibroblasts26, modifying the expression levels of CDO1 and it is encompassed within promoter histone marks and DNase I hypersensitive sites in at least 20 tissues.
Corneal thinning is one of the clinical features of keratoconus. We found in the keratoconus analysis a consistent direction of effect in 77% (28/36) of the CCT-associated SNPs (Fig. 2a). This finding suggests that the effect of variants on keratoconus is mediated through their effect on CCT. We did not observe the same trend in the POAG analyses (Fig. 2b), with our data providing no support for a role for CCT SNPs in determining POAG risk.
Interestingly, besides the “DCN” locus, three other loci harbour genes implicated in Mendelian diseases including rare connective tissue, inflammatory and eye disorders with corneal thinning as one of their clinical features, giving weight for them to be prioritized in follow-up studies. The cross-ancestry GWAS revealed an intronic variant (rs8030753) in the FBN1 gene. Mutations in FBN1 are the major cause of Marfan syndrome, an autosomal dominant disorder characterized by multiple manifestations in the ocular, skeletal, and cardiovascular systems. Patients with Marfan syndrome have flattened corneas with reduced stromal thickness45. Common genetic variants in FBN1 have also been associated with ocular refraction46. The rs8030753 shows a significant cis-eQTL effect modifying the expression of FBN1 in whole blood (P = 4.1 × 10−10)26. Furthermore, we identified an intronic variant in ADAMTS2, which encodes a metalloproteinase involved in collagen metabolism47; Mutations in ADAMTS2 have been found in patients with EDS, type VIIC48, a recessive inherited connective-tissue disorder. We also identified a common CCT variant (rs4846476) in TGFB2. It has been shown that TGFB2 is down-regulated in skin fibroblasts of brittle cornea syndrome patients carrying PRDM5 mutations49. Our analysis brings the number of CCT-associated loci implicated in Mendelian diseases to nine, representing 20.5% (9/44) of the CCT loci. Most of the Mendelian disorders genes (8/9) are located within a 200 kb window from the lead SNP (Supplementary Table 12) with the exception of AGBL, located −784 kb away from rs4843040 in the 15q25.3 CCT-locus. Studies correlating gene variation to gene expression have found that most of the enhancers are located within a 200 kb window50,51, supporting the hypothesis that lead CCT–associated SNPs might have an impact on the expression of genes that cause rare eye and connective tissue disorders. Our study reveals a considerable proportion of Mendelian genes as candidate genes involved in a quantitative trait.
Although findings of pathway analyses remain speculative, our exhaustive analyses suggest that the leading pathways implicated in CCT are related to the function and metabolism of connective tissue (e.g., collagen, ECM and basement membrane), as well as the regulation of TGF-β signalling, the development of skeletal system, and the ERAD pathway.
In conclusion, we have identified 19 novel loci associated with CCT and novel independent signals in six known loci. Together, CCT loci clearly point to genes implicated in collagen related pathways and ECM metabolism. The enrichment analyses highlighted gene-sets involved in collagen fibrils, ECM organisation, TGF-β signalling and fibroblastic determination, all fitting with a largely stromal contribution to CCT. Functional studies need to be performed to confirm which gene or genes are relevant at each locus and to assess the underlying mechanisms by which genetic variation influences CCT some of which promise to inform on the risk of complex diseases such as keratoconus.
Study design and sample description
We performed meta-analyses of 1000 genomes phase 1 (integrated variant set- March 2012 release) imputed GWASs on CCT and tested significance of associations in keratoconus and POAG cohorts for lead CCT SNPs. The overall study design and main findings are depicted in Supplementary Fig. 1. In total, 19 CCT cohorts (N = 25,910) from the International Glaucoma Genetics consortium (IGGC) participated in this study. In stage 1, we performed a meta-analysis of cohorts of European ancestry (14 cohorts, N = 17,803). In stage 2, genome-wide significant variants (P < 5 × 10−8) from stage 1 were tested in a meta-analysis of cohorts of Asian ancestry (5 cohorts, N = 8107). We then performed in stage 3, a joint meta-analysis of European-specific and Asian-specific results. The individual cohorts are as described in detail in previous publications18,52, with summary statistics and imputation details in Supplementary Table 1. To investigate the role of the identified CCT loci in keratoconus and POAG, we then tested the implicated CCT loci in disease case-control sets. The keratoconus datasets comprised cases and controls from Australia (711 keratoconus cases and 2622 controls from the Blue Mountains Eye Study) and the United States (222 keratoconus cases and 3324 controls). The POAG cases and controls were drawn from studies in Australia (1155 cases and 1992 controls) and the United States (3853 cases and 33480 controls). Detailed information of the keratoconus and POAG cohorts can be found in the Supplementary Note 1. The local research and medical ethics committees approved the individual studies. Written informed consent was obtained from all participants (or parents in case of minors) in accordance with the Declaration of Helsinki.
Ancestry-specific and cross-ancestry GWAS meta-analyses
All participating studies performed association testing under an additive model for the effect of the risk allele while adjusting for age, sex and at least the first five principle components for the population-based studies. In samples with related individuals association testing accounting for family structure was conducted using the –fastAssoc option in MERLIN53 or the –mmscore54 option implemented in GenABEL55. Before meta-analysis, we removed variants with MAF < 0.01, and with imputation quality scores less than 0.3.
Ancestry specific meta-analyses (European-specific and Asian-specific) and joint meta-analysis were performed using the inverse variance fixed effect scheme implemented in the software METAL56. The ‘genomic inflation’ correction option was used in METAL56 and applied to all input files. We also computed the test statistics for heterogeneity of effect among studies for each variant using Cochran’s Q-test. We removed variants with heterogeneity P-value < 0.001 from both European-specific and Asian-specific meta-analyses. Moreover, we focused on variants that were present in more than 25% of participating studies in the European-specific meta-analysis (at least four studies) and the Asian-specific meta-analysis (at least two studies). Finally, to detect additional loci associated with CCT, we performed a fixed-effect cross ancestry meta-analysis.
Selecting independent variants
We applied the conditional and joint (CoJo) analysis approach57 implemented in the software Genome-wide Complex Trait Analysis58 (GCTA) on European-specific meta-analysis results in order to identify potentially independent signals within the same genomic regions. For this CoJo analysis we used 1000 genomes phase 1 imputed data from Blue Mountain Eye Study (BMES) population cohort comprising 2582 individuals of European ancestry to calculate linkage disequilibrium (LD) patterns. We used the software GTOOL-v0.7.5 (http://www.well.ox.ac.uk/%7Ecfreeman/software/gwas/gtool.html) to convert BMES IMPUTE2 data (both SNPs and Indels) to the plink format. This conversion changes A/T/G/C/I/D/R based allele coding to 1 or 2 (first and second allele). We extracted variant IDs, hg19 genomic locations and converted A/T/G/C/I/D/R (from 1/2 based allele coding) for all 16,666,330 available variants (MAF > 0.01) from BMES data and merged it with the European-specific meta-analysis result file based on hg19 genomic location. Further quality checking was done by plotting the allele frequencies of the allele 1 of variants in chromosome 22 in the BMES cohort and European-specific meta-analysis summary file. The edited European-specific meta-analysis summary file with 1/2 allele coding was used as an input for the CoJo analysis. In the CoJo analysis we considered 5 × 10−8 as the genome-wide significant threshold. We did not perform CoJo analysis in the Asian studies because the various Asian sub-studies (Indian, Malay, Chinese) had differing ancestry within Asia and we did not have access to a suitably large (i.e., N > 2000) reference genotype data set for each Asian sub-population.
We performed gene-based association testing using the VEGAS2 software20. VEGAS2 is an extension of the VErsatile Gene-based Association Study (VEGAS) approach59 that uses 1000 genomes reference data to estimate LD between variants and provides a test using a more flexible gene boundary. For this analysis, we considered ‘−10 kbloc’ parameter, which assigns all variants in the gene or within 10 kb on either side of a gene’s transcription site to compute a gene-based P-value. We performed analysis using the default ‘-top 100’ test that uses all (100%) variants assigned to a gene to compute gene-based P-value. We used 1000 Genomes phase 1 European and Asian populations to compute LD between variants for European-specific and Asian-specific gene-based analysis respectively. Finally, we meta-analysed the European-specific and Asian-specific gene-based results using Fisher’s method for combining P-values.
Analysis of case-control cohorts
We tested the lead CCT-associated SNPs in two keratoconus datasets with 933 cases and 5946 controls and two POAG datasets with 5008 cases and 35,472 controls. Details of the disease cohorts can be found in the Supplementary Note 1. For both keratoconus and POAG we meta-analysed the association results for individual study samples using a fixed effect approach. The significance threshold for replication was established using the Bonferroni method for multiple testing correction.
CCT-associated loci and Mendelian diseases
We assessed whether the CCT-associated loci overlapped with candidate genes for rare Mendelian diseases. For this analysis we downloaded the most up-to-date annotations of genes to the Mendelian disease from the Online Mendelian Inheritance in Man (OMIM) portal60 on 26th July 2016. We converted the genomic locations of CCT-associated variants from hg19 (or GRCh37) to the GRCh38 human genome build using the software liftOver61,62 and extracted all gene transcription start sites that lie within the 1 mega-base (Mb) on either side of a given variant.
Identifying regulatory variants
Using the software HaploReg (version 4.1)25 and RegulomeDB v1.122, we investigated regulatory annotations for variants in LD (r2 > 0.8, 1000 genomes CEU) with the CCT-associated SNPs. To prioritize functional SNPs, we first used HaploRegv4.1 to extract all variants in LD with the 54 independent index SNPs and examined whether variants overlapped with regulatory elements of the ENCODE data, with the caveat that those do not include corneal tissue or cell lines data. We then used the RegulomeDB score to assess their potential functional consequence, as described previously63. Tissue-enrichment and gene prioritization analyses were performed with the DEPICT23 framework, using independent CCT genome-wide significant SNPs. We also investigated the expression of functionally relevant genes in associated loci using the Ocular Tissue Database, https://genome.uiowa.edu/otdb/, in which gene expression is indicated as Affymetrix Probe Logarithmic Intensity Error (PLIER) normalized value (with normalization in PLIER as described in Wagner et al.,43). Further, we used GARFIELD to assess enrichment of CCT association signals in regulatory features, using the 1005 features extracted from ENCODE, GENCODE and Roadmap Epigenomics projects provided by the software developers.
Pathways analysis based on VEGAS2 gene-based P-values
We adopted the resampling approach to perform pathway analyses using VEGAS2 derived gene-based P-values considering a ‘−10 kbloc’ and ‘−200 kbloc’ parameters respectively. The latter was performed to capture a larger number of nearby genes, in case the causal SNP or SNPs operate via long distance effects on genes in the wider region. The resampling approach performs a competitive test in which each gene-set is benchmarked against the ‘typical’ set of the same size. For individual gene-set, firstly we computed observed summed χ2 statistics by converting gene-based P-values of annotated genes to upper tail χ2 statistics with 1 degree of freedom. If two or more genes in a gene-set were located less than 500 kb of individual transcription sites, then only one gene was selected when computing the observed summed χ2 statistics and the other neighbouring genes were dropped out. This step might lead to loss of information but it ensures that the association of a gene-set is not driven by variants in LD. Following this, the same numbers of genes as present in a given gene set were repeatedly drawn at random from all set of genes used in the study and summed to generate the distribution of expected summed χ2 statistics. Finally, the empirical P-value of association of a gene-set is computed by comparing the observed summed χ2 statistics against the distribution of expected summed χ2 statistics using following formula:
where I() is an indicator function which denotes whether a summed χ2 statistics from a random draw (χ2*) was equal to or more than the observed summed χ2 statistics (χ2), and N is the total number of random draws performed to compute the distribution of expected summed χ2 statistics.
We performed separate European-specific and Asian-specific pathway analysis, and further combined the two ethnic-specific pathway P-values using the Fisher’s method. For pathway analysis, we considered the Biosystem’s pathways or gene-sets comprising minimum 10 and maximum 1000 genes64. In total 9981 gene-sets with 16,503 unique annotated genes were analysed. The processed Biosystems pathway/gene-set annotation file is available on the VEGAS2 webpage (https://vegas2.qimrberghofer.edu.au/).
Pathway analysis using INRICH
We also tested if single variants with association P-values less than 1 × 10−4 were enriched in the Biosystem’s pathways. For this analysis, we used the INRICH approach, which assumes a hypergeometric distribution for the null hypothesis that a pathway is not enriched with associated variants. To create LD-independent genomic regions to be tested for enrichment, we performed LD clumping with PLINK ( --clump-p1 1 × 10−4 --clump-p2 0.05 --clump-r2 0.5 --clump-range-border 20), using the 1000 Genomes European and Asian reference data, for European-specific and Asian-specific gene-set enrichment analyses respectively.
Pathway analysis using Ingenuity Pathway Analysis
To select the genes included in the IPA®, we extracted the genes +/− 200 kb of the lead SNP and further chose those that were expressed in the cornea (Supplementary Table 12). Corneal expression levels were retrieved from the Ocular Tissue Database43. A gene list including 162 genes was used to run the IPA “Core-Analysis”. Parameters of the analysis included (1) the Ingenuity Knowledge Base (Genes only) as the reference set, (2) including direct and indirect relationships, (3) experimentally observed, (4) from mammal species, (5) using all tissues and cell lines. Results were corrected for multiple testing using the Benjamini-Hochberg multiple testing correction as implemented in IPA.
GARFIELD software is available in a standalone version at http://www.ebi.ac.uk/birney-srv/GARFIELD/ and as a Bioconductor package at http://bioconductor.org/packages/release/bioc/html/garfield.html.
Summary association statistics results that support the findings of this study have been deposited in http://hdl.handle.net/10283/2976.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We gratefully thank the invaluable contributions of all study participants and staff at the recruitment centers. Complete funding information and acknowledgements for each individual study can be found in the Supplementary Note 2.
About this article
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.