## Introduction

Central corneal thickness (CCT) is an interesting morphological trait of cornea. Reduced CCT is a feature of keratoconus1, and, by some accounts, is associated with an increased risk of primary open-angle glaucoma (POAG)2,3,4. Epidemiological observational studies have implicated demographic and clinical risk factors that influence CCT, including, sex, glaucoma diagnosis, and ethnicity5,6,7,8. Individuals of African ancestry have thinner CCT but also increased POAG risk, and have worse visual field damage and disease progression compared to other populations5,6,7,9,10. It is, however, not clear whether the observed variation in CCT confounds, or directly contributes to the greater glaucoma burden in African ancestry populations. Because of the association between CCT and the common vision disorders noted above, the identification of genes which influence CCT may open new avenues for understanding the etiology of these disorders.

CCT has a strong genetic component, with heritability estimates ranging between 0.68 and 0.9511,12,13. Recently, Iglesias et al.14 reported 44 CCT-genomic regions in a cross-ancestry meta-analysis, including 19 novel loci awaiting independent replication. These 44 CCT-loci account for ~8.5% of the variance for this ocular trait.

Here, we present a large and ethnically diverse human genetic study of CCT, including, for the first time to our knowledge, African American and Hispanic/Latino individuals. Our study utilizes data from 44,039 individuals between the Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and the International Glaucoma Genetics Consortium (IGGC)14. We evaluate the effect of genetic ancestry on CCT and conduct multiethnic GWAS, identifying many novel loci. The associated loci provide candidate genes and relevant pathways and highlight differential expression of these CCT-associated genes in human ocular tissues, including cornea and lens. We also assess the effect of newly and previously identified CCT loci on the ancestry effects observed in the GERA African American ethnic group. To evaluate the clinical relevance of CCT, we examine associations of lead CCT-associated single nucleotide variations (SNVs) with keratoconus and POAG. Finally, we conduct a two-sample Mendelian Randomization analysis to clarify the nature of the relationship between CCT and POAG.

## Results

### GERA cohort and CCT

The GERA cohort is an unselected cohort of adult members of the Kaiser Permanente Northern California integrated health care delivery system, with ongoing longitudinal records from vision examinations. For this study, our GERA sample consisted of 18,129 individuals from four ethnic groups (79.9% non-Hispanic white, 7.5% Hispanic/Latino, 8.4% East Asian, and 4.2% African American) with a measured CCT (Table 1). In our GERA sample, African Americans had thinner CCTs on average than other groups, consistent with previous reports5,6,7.

### Variation in CCT by ethnicity and genetic ancestry

To examine how CCT varied within each ethnic group, we assessed the association between the first two ancestry principal components (PCs), representing geographic origin and calculated within each group separately15, and CCT. Within our African American sample, greater African (versus European) ancestry (PC1) was associated with thinner CCTs (P = 3.01 × 10−6) (Fig. 1 and Supplementary Data 1). We also observed a significant association of thicker CCT in northeastern versus northwestern European ancestry (PC2, P = 1.20 × 10−5) within our non-Hispanic white sample.

### GWAS of CCT in GERA

We conducted a GWAS meta-analysis of CCT in GERA, combining results from four individual ethnic groups (Supplementary Figs. 15), and replicated 34 (out of the 44 loci previously reported14) at a Bonferroni significance level (P ≤ 1.14 × 10−3, 0.05/44) (Supplementary Data 2). Further, five additional SNPs were nominally significant (P < 0.05). The effect estimates of these 39 (34 + 5) loci were in the same direction as in the IGGC cross-ancestry meta-analysis. Lead SNPs at the remaining five loci (i.e. COL8A2, ADAMTS2, SAMD9, LOXL2, and COL6A2) did not reach nominal significance in the GERA meta-analysis. The 28 loci that reached genome-wide significance in the GERA meta-analysis, 24 replicated at a Bonferroni level of significance (P ≤ 1.79 × 10−3, 0.05/28) in either the IGGC cross-ancestry meta-analysis or the IGGC European-specific meta-analysis (Supplementary Data 3). Further three additional SNPs were nominally significant (P < 0.05). Only the lead SNP rs112024264 at the novel locus ZNF680 did not reach nominal significance in IGGC and so was not validated.

### Multiethnic meta-analysis of GERA and IGGC

We then conducted a meta-analysis of CCT combining results from GERA and IGGC. This combined meta-analysis identified 74 loci associated with CCT (P < 5 × 10−8), of which 31 were novel (Fig. 2, Table 2, Supplementary Fig. 6, and Supplementary Data 4). The effect estimates of the 31 lead SNPs at novel loci were consistent across the 2 studies (Fig. 3), and no significant heterogeneity was observed between them (Table 2). Conducting a GWAS meta-analysis of European-specific cohorts (GERA + IGGC Europeans only) and a GWAS meta-analysis of Asian-specific cohorts (GERA + IGGC Asians only) did not result in the identification of additional novel genome-wide significant findings (Supplementary Fig. 7).

### Conditional analysis identified additional loci

Conditional and joint (CoJo) analysis in the combined (GERA + IGGC) meta-analysis (full description in Methods) revealed 24 additional independent SNPs within the identified genomic regions, including the SNP rs72755233 at ADAMTS17 on chromosome 15 which was recently identified in a GWAS of CCT conducted in the Icelandic deCODE health study8. Among those 24 independent SNPs, 10 have not been previously associated with CCT and are not proxy variants of previously reported SNPs (Supplementary Data 5), resulting in a total of 98 independent genome-wide significant signals. After Bonferroni correction (0.05/98 SNPs tested), we found that the betas were not significantly different between GERA non-Hispanic whites and GERA Hispanic/Latinos (Supplementary Data 6).

### Array heritability estimate for CCT and variance explained

We then estimated SNP-based heritability in the GERA non-Hispanic white ethnic group (the largest group of individuals from GERA) using GCTA16, and we found an SNP-based heritability estimate of 42.5% (SE = 3.3%). When included together as a genetic risk score in a linear-regression model, the 98 lead SNPs identified in the current study (74 from the combined meta-analysis + 24 from the COJO analysis) collectively explained up to 14.2% of the CCT variance in GERA non-Hispanic whites (Supplementary Data 7). To determine whether the 98 CCT-loci explain the observed association of genetic ancestry with CCT variation within GERA African Americans, we repeated the ancestry analysis, including a genetic risk score. After accounting for the effect of the 98 identified SNPs in the genetic ancestry model, the African (versus European) ancestry (PC1) association was no longer significant (P = 0.12) within the GERA African Americans. This suggests that the identified CCT-loci explain most of the ancestry effects in African Americans.

### SNP prioritization and annotations

To prioritize variants within the 74 genomic regions identified in the combined (GERA + IGGC) meta-analysis, we computed each variant’s ability to explain the observed signal and derived the smallest set of variants that included the causal variant with 95% probability17. In each of the 74 autosomal loci, the corresponding 74 credible sets contained from 1 to 143 variants (1554 total variants, Supplementary Data 8). Interestingly, among the 31 sets representing newly identified CCT genomic regions, three sets included a unique variant (i.e. CYP1B1 rs1800440, FBN2 rs154001, and NUP37 rs4611262 with 99.1%, 97.9%, and 100% posterior probability of being the causal variants, respectively), suggesting that those variants may be the true causal variants.

### Gene prioritization

To prioritize genes within the 74 GERA + IGGC genomic regions, we used the DEPICT18 integrative tool. DEPICT gene prioritization analysis detected 14 genes, of which 4 were within novel CCT-associated loci, to prioritize after false-discovery rate (FDR) correction (Supplementary Data 9). These included: C1orf129 (near PRRX1 and also known as MROH9) on chromosome 1, CYP1B1 on chromosome 2, LOX on chromosome 5, and LSP1 on chromosome 11.

### Gene expression in human ocular tissues

We then evaluated the ocular expression levels of genes at CCT loci that contained associated 95% credible set variants across various human eye tissues (i.e. choroid retinal pigment epithelium, ciliary body, cornea, iris, lens, optic nerve, optic nerve head, retina, sclera, and trabecular meshwork). Among the genes at novel CCT loci, PLEKHA1, which encodes the pleckstrin homology domain containing A1, was highly expressed (PLIER number>200) in the cornea, lens, and the trabecular meshwork according to the Ocular Tissue Database (OTDB)19 (Supplementary Data 10). Similarly, CRIM1, which encodes the cysteine rich transmembrane BMP regulator 1, was highly expressed in the cornea and the lens.

### Biological pathway annotations and prioritization

DEPICT tissue-enrichment analysis highlighted 17 significantly associated (FDR < 0.05) tissues or cell type annotations, consistent with recent findings14; 4 annotations pertained to fibroblast and collagen-rich tissues such as cartilage, joints, synovial membrane, and joint capsule (Supplementary Data 11). An additional 5 annotations involved granulation tissue, cicatrix, keloid, chorion, and extraembryonic membranes, and 5 annotations included cell types such as, osteoblasts, mesenchymal stem cells, chondrocytes, stromal cells, and fibroblasts. DEPICT gene-set enrichment analysis did not detect pathway to prioritize after FDR correction; however, nominal evidence was found for previously reported gene-sets14, including those involved in the regulation of epithelial to mesenchymal transition, extracellular matrix (ECM) organization, and collagen formation (Supplementary Data 12).

For comparison with previous works14,20, we also conducted a pathway analysis using VEGAS software21 to assess enrichment in 9732 pathways or gene-sets derived from the Biosystem’s database. Using a 10-kb window in the VEGAS2 computation, we found that 25 pathways/gene-sets were significantly enriched after correcting for multiple testing (P < 5.14 × 10−6) (Supplementary Data 13), compared to 23 pathways/gene-sets previously identified14. Similarly, most of these pathways/gene-sets contribute to the function of the extracellular matrix and collagen. In addition, we identified gene-sets related to head/face morphogenesis and development.

### Effect of the 98 CCT-associated loci on keratoconus

As previous studies have reported that CCT is significantly lower in eyes with keratoconus compared to normal eyes1,22, we evaluated the effect estimates of the 98 CCT-SNPs identified in the current study (74 from the combined meta-analysis + 24 from the COJO analysis) between CCT and keratoconus in GERA. Our GERA keratoconus cohort consisted of 207 cases and 97,375 controls (Table 1). We confirmed the negative correlation of effect estimates between CCT and keratoconus (R2 = −0.32, P = 1.30 × 10−3; Supplementary Fig. 8), as previously reported14. We then evaluated whether the 98 CCT-SNPs identified in the current study were also associated with keratoconus in a multiethnic meta-analysis combining the GERA keratoconus cohort, and an independent cohort from the UK of 2353 patients with keratoconus23 (see full description in the Methods). Among the 97 CCT-SNPs available, 20 were significantly associated with keratoconus after correction for multiple testing (P < 5.15 × 10−4, 0.05/97), including 12 at genome-wide level of significance (Supplementary Data 14). These included SNPs at ZNF469, FOXO1, MPDZ, SMAD3, and COL5A1, consistent with previous findings14 and with the expected direction of effect; but also SNPs at novel CCT-loci: LOX and FST. Further, an additional 18 were associated with keratoconus at a nominal level (P < 0.05).

### Effect of the 98 CCT-associated loci on POAG

As observational epidemiologic studies have reported that thinner CCT is associated with an increased risk of glaucoma1,3,4, we evaluated the effect estimates of the 98 CCT-SNPs identified in the current study between CCT and POAG in GERA. Our GERA POAG cohort consisted of 4986 cases and 58,426 controls (Table 1), and consistent with previous findings14, no correlation in effect estimates was found between CCT and POAG (R2 = −0.16, P = 0.12; Supplementary Fig. 9). We then investigated whether the 98 CCT-SNPs identified in the current study were also associated with POAG in GERA. Associations with glaucoma (subtype unspecified) were then confirmed in UKB24. Among the 98 lead CCT-associated SNPs identified in the current study, only SNP rs3740685 in RAPSN was associated with POAG in GERA after multiple testing correction (P = 1.9 × 10−4; Supplementary Data 15). Consistently, RAPSN rs3740685 was nominally associated with glaucoma (subtype unspecified) in UKB (P = 0.0017). Visual inspection of the association signals in the RAPSN region based on GERA results revealed that the lead SNPs for POAG (rs2167079) and CCT (rs3740685) are different (Supplementary Fig. 10). Further, those lead SNPs are relatively distant from one another (198.5 kb apart) and are only moderately correlated in European-ancestry populations (r2 = 0.40; D′ = 0.63), suggesting that they may represent different signals. When we repeated the POAG association analysis in GERA, conditioning on our most strongly associated CCT SNP (rs3740685) from our GERA meta-analysis, RAPSN rs2167079 remained significant (P = 0.0098), suggesting that our lead associated SNPs for CCT and POAG represent different signals at the RAPSN locus. This locus has previously demonstrated genome-wide significant association with intraocular pressure25,26,27, which is an important risk factor for developing POAG.

### Two-sample Mendelian randomization

In GERA, POAG was significantly associated with lower CCT after adjusting for age, sex, ethnic group, and CCT measurement type (β = −12.57 and P = 2.33 × 10−66; Supplementary Data 16). This significant association between POAG and lower CCT was true for all the GERA ethnic groups. Further, consistent with previous studies28,29,30, we observed that GERA POAG patients have, on average, thinner CCTs than controls (Mean ± SD: 537.0 ± 34.7 for POAG cases vs. 549.5 ± 34.4 for controls; P = 3.29 × 10−74). To clarify whether the observational association between CCT and POAG risk is consistent with a causal relationship, we conducted a two-sample Mendelian randomization analysis31. We built MR models using as instruments genetic effects previously SNP alleles associated with CCT in the IGGC European-specific meta-analysis14 (exposure) and associations with POAG32 (outcome of interest) observed in the GERA non-Hispanic whites (full description in Methods). The MR Egger intercept test suggested no horizontal pleiotropy (intercept = 0.027; SE = 0.020; P = 0.19) and we observed no notable heterogeneity across instrument SNP effects (Q = 32.81; P = 0.12). Our two-sample Mendelian randomization analysis did not detect any evidence of causal relationship between CCT and POAG (inverse variance–weighted (IVW) method: OR = 1.00, se = 1.00, P = 0.088), and the weighted median and MR Egger analyses yielded similar results (Supplementary Data 17 and Supplementary Figs. 1113).

## Discussion

Our analysis of 44,039 individuals identified 41 novel loci significantly associated with CCT inter-individual variation. These loci, along with those previously reported, account for 14.2% of phenotypic variation and largely explain the genetic ancestry association of thinner CCT observed in African Americans. We found that 20.6% (20/97) of the CCT-loci were significantly associated with keratoconus but only the RAPSN locus was associated with POAG after correction for multiple testing. It appears that the associations at the RAPSN locus may represent two independent signals. Finally, our MR analyses suggest, for the first time to our knowledge, that thinner CCT might not causally increase the risk of POAG.

Although GWAS-identified associations do not directly highlight a specific gene, our study revealed potential candidate genes, including LOX, FBN2, SPRY2, and CRIM1, which have all been linked to cornea and eye development. LOX encodes a member of the lysyl oxidase family of proteins, which is responsible for the cross-linking of collagens and elastin33. LOX is differentially expressed in keratoconus epithelium34, and LOX variants lead to increased susceptibility to keratoconus35,36, and are associated with intraocular pressure variation25,26,27. FBN2 encode the fibrillin 2 which has a crucial role in ocular morphogenesis in mice37 and is expressed in the corneal stroma but not in the corneal epithelium of mice heterozygous for the micropinna microphthalmia (Mp) mutation, a 660-kb inversion on chromosome 18 that disrupts the Fbn2 gene38,39. Fbn2 is involved in the corneal epithelial homeostasis of Mp/+ mice39, and rare and common variants in FBN2 have been shown to be associated with macular degeneration in human40,41. Our CCT study also implicated SPRY2 which encodes a protein belonging to the sprouty family. This SPRY2 gene is involved in regulating corneal epithelial cell proliferation and differentiation, enabling eyelid closure42,43. Our study also identified CRIM1, which encodes a transmembrane protein containing six cysteine-rich repeat domains and an insulin-like growth factor-binding domain. Importantly, CRIM1 is involved in eye development in human and mouse44,45 and in the corneal response to ultraviolet and pterygium development46, and is required for maintenance of the ocular lens epithelium47.

Similarly, our study revealed potential biological pathways and relevant tissues involved in CCT variation that are pertained to the function of fibroblast and collagen-rich tissues, as well as the regulation of epithelial to mesenchymal transition, and the organization of the extracellular matrix, consistent with previous works14. Functional follow-up experiments in cell lines or animal models may confirm the involvement of these genes and biological pathways in CCT variation and reveal the underlying mechanisms of CCT-related vision disorders.

Many of the CCT-associated loci identified in this study, are also associated with other eye conditions, particularly NPLOC4, CYP1B1, RBMS3, and PLEKHA1. NPLOC4 encodes the NPL4 homolog, ubiquitin recognition factor, and polymorphisms at this locus have been previously reported to be associated with myopia, age-related macular degeneration, eye color, and recently with corneal or refractive astigmatisms, strabismus and macular thickness48,49,50,51,52,53. Mutations in CYP1B1, which encodes a member of the cytochrome P450 superfamily of enzymes, involved in eye development54, are associated with primary congenital glaucoma55,56. Similarly, in the current CCT study we identified RBMS3 which encodes the RNA binding motif single stranded interacting protein 3, and a recent GWAS reported a strong association between RBMS3 locus and increased risk of exfoliation syndrome57. Our study also identified as a CCT locus PLEKHA1, which encodes a pleckstrin homology domain-containing adapter protein. Polymorphisms in PLEKHA1 are associated with age-related macular degeneration58,59.

We recognize several potential limitations of our study. First, the ‘non-cases’ of the keratoconus GERA study may include some potential cases with other ophthalmic conditions, which may result in underestimates of the effects of individual CCT-associated SNPs if those conditions are also associated with the risk of keratoconus. Second, glaucoma diagnoses in UKB were based on self-reported data, and the subtypes of glaucoma were unspecified, which may result in underestimates of the effects of individual SNPs due to phenotype misclassification. However, our glaucoma results were consistent across GERA and UKB. Third, to date, it has been difficult to discern whether associations between CCT and POAG are truly causal or biased due to confounding associated with traditional observational studies2,3,4. Our MR analysis failed to detect alteration in CCT as a causal risk factor for POAG. This result is unexpected, and it is possible that other factors (e.g. environmental, epigenetics) that have not been taken into consideration in the current study, might influence the relationship between CCT and POAG. Future studies will be needed to clarify further the relationship between CCT and POAG.

In summary, our study identified 98 independent loci associated with CCT, of which 41 were novel. In addition to doubling the number of CCT-associated loci reported, this study explains up to 14.2% of the variance of CCT. The loci also explain variation among African Americans due to genetic ancestry. Our Mendelian randomization analysis did not support the idea that a thinner CCT causally increase the risk of POAG. Altogether, this large study of CCT increases substantially our understanding of the etiology of CCT variation and may open new avenues of investigation into human ocular traits and their relationship to the risk of vision disorders.

## Methods

### GERA cohort

The Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort consists of 110,266 adults, 18 years and older, who are consented participants in the Research Program on Genes, Environment, and Health (RPGEH). The GERA participants are members of the Kaiser Permanente Northern California (KPNC) integrated health care delivery system, and most have ongoing longitudinal records from vision examinations. For the current study, 18,129 GERA participants from four ethnic groups who had at least one recorded CCT measurement on both eyes during the same visit between June 2014 and January 2018 were included (Table 1). All study procedures were approved by the Institutional Review Board of the Kaiser Permanente Northern California Institutional Review Board. Written informed consent was obtained from all participants.

### CCT measurement

CCT was measured in GERA using the DGH-550 or DGH-55 ultrasonic (contact) pachymeter (DGH Technology Inc.; Exton, PA)4, or a non-contact optical biometer (Lenstar LS900, Haag-Streit, Köniz, Switzerland), and recorded for both eyes in the electronic health records. Patients with single eye measurements were removed. We also excluded 1106 patients who had ocular conditions which may influence CCT, including patients with Fuchs dystrophy, keratoconus, history of corneal refractive surgery, corneal transplantation, or laser vision surgery. For patients with both types of measurement (i.e., ultrasonic pachymeter and non-contact biometer) available, we selected the non-contact biometer measurements. Because the means of the distributions of the two types of measurements slightly differed, we transformed all CCT values to the standard normal distribution scale, with µ = 0 and σ = 1, i.e. N(0,1). Then, the mean standardized CCT of both eyes and standard deviation (sd) were assessed for each patient. Outliers (N = 9) defined by large left-right differences (i.e., beyond 4 sd of the overall standardized distribution of left-right differences) were removed (Supplementary Fig. 14). Finally, for consistency of CCT scale between GERA and IGGC, we rescaled standardized CCT values, as follows:

$${\mathrm{CCT}}_{{\mathrm{Final}}} = {\mathrm{CCT}}_{{\mathrm{Standardized}}\,{\mathrm{value}}} \ast {\mathrm{sd}}({\mathrm{CCT}}_{{\mathrm{non}} - {\mathrm{contact}}}) + {\mathrm{mean}}({\mathrm{CCT}}_{{\mathrm{non}} - {\mathrm{contact}}})$$

### Genotyping and imputation

GERA DNA samples were genotyped at the Genomics Core Facility of the University of California, San Francisco (UCSF) on four ethnicity-specific Affymetrix Axiom arrays (Affymetrix, Santa Clara, CA, USA) optimized for individuals of European, Latino, East Asian, and African American ancestry60,61. Genotype QC (quality control) procedures were performed on an array-wise basis62, as follows: SNPs with initial genotyping call rate ≥97%, allele frequency difference (≤0.15) between males and females for autosomal markers, and genotype concordance rate (>0.75) across duplicate samples were included. About 94% of samples and more than 98% of genetic markers assayed passed QC procedures. In addition to those QC criteria, SNPs with a minor allele frequency <1% were removed.

Imputation was also conducted on an array-wise basis. After the pre-phasing of genotypes with Shape-IT v2.r727195863, variants were imputed from the cosmopolitan 1000 Genomes Project reference panel (phase I integrated release; http://1000genomes.org) using IMPUTE2 v2.3.05964. As a QC metric, we used the info r2 from IMPUTE2, which is an estimate of the correlation of the imputed genotype to the true genotype65. We excluded variants with an imputation r2 < 0.3, and restricted to SNPs that had a minor allele count ≥20.

### Principal components analysis

We used Eigenstrat66 v4.2 to calculate the principal components (PCs) on each of the four GERA ethnic groups15. The top 10 ancestry PCs were included as covariates for the non-Hispanic whites, while the top six ancestry PCs were included for the three other ethnic groups. The percentage of Ashkenazi (ASHK) ancestry was also used as a covariate for the non-Hispanic whites to adjust for genetic ancestry15. Association of each PCs and ASHK ancestry with CCT are reported in Supplementary Data 1.

### Genetic ancestry analysis

A full description of the ancestry analyses in GERA is provided in Banda et al.15. The CCT distribution by the ancestry PCs for each GERA ethnic groups is illustrated on Fig. 1. To create these plots, a smoothed distribution of each individual i’s CCT phenotype was created using a radial kernel density estimate weighted on the distance to each other jth individual, as follows: $$\sum j\phi \left( {\left\{ {\left. {d(i,j)/{\mathrm{max}}_{i\prime ,j\prime }[d(i\prime ,j\prime )] \ast k} \right)} \right\}} \right)$$, where $$\phi$$(.) is the standard normal density distribution, k is the smooth value (5 for non-Hispanic whites; and 15 for East Asians, Hispanic/Latinos, and African-Americans), and d(i′, j′) is the Euclidean distance based on the first two PCs (Fig. 1). For visual representation of different groups, we derived the ethnicity and/or nationality subgroup labels from GERA or the Human Genome Diversity Project15.

### Association analysis in GERA

Each of the four GERA ethnic groups (non-Hispanic whites, Hispanic/Latinos, East Asians, and African-Americans) were first analyzed individually. We performed a linear regression of CCT and each SNP using PLINK67 v1.9 (www.cog-genomics.org/plink/1.9/) with the following covariates: age, sex, ancestry PCs, and CCT measurement type (i.e., ultrasonic pachymeter or non-contact optical biometer). Data from each genetic variant were modeled using additive dosages to account for the uncertainty of imputation68. We then performed a GERA meta-analysis of CCT to combine the results of the four ethnic groups using the package “meta” of R (https://www.R-project.org).

### International Glaucoma Genetics Consortium

The IGGC study was a meta-analysis of 25,910 participants from 19 CCT cohorts of European (14 cohorts) and Asian descent (5 cohorts)14. A full description of the individual cohorts are provided in previous publications69,70. GWAS summary statistics from the study of Iglesias et al.14 were publicly accessible at http://hdl.handle.net/10283/2976.

### Meta-analysis

To combine the study results of Iglesias et al.14 with our GERA meta-analysis, we conducted a fixed-effect meta-analysis. Heterogeneity index, I2 (0–100%) and P-value for Cochrane’s Q statistic among studies were assessed. For each locus, the top genetic variant was defined as the most significant variant within a 2-Mb window, and novel loci were defined as those that were located over 1 Mb apart from any previously reported locus.

### Conditional and joint analysis

To potentially identify independent signals within the 74 identified genomic regions, we performed a multi-SNP-based conditional and joint association analysis (COJO)71, which is implemented in the Genome-wide Complex Trait Analysis (GCTA) integrative tool72. This COJO analysis was conducted on the combined (GERA+IGGC) meta-analysis results. We first conducted the COJO analysis on each of the individual ethnic groups (European-specific samples (GERA and IGGC Europeans), GERA Hispanic/Latinos, Asian-specific samples (GERA and IGGC Asians), and GERA African Americans). To calculate linkage disequilibrium (LD) patterns we used the following reference panels: 10,000 random samples from GERA non-Hispanic white ethnic group as a reference panel for European samples, 8565 samples from GERA Hispanic/Latino ethnic group for Hispanic/Latino samples, 7518 samples from GERA East Asian ethnic group for Asian samples, and 3161 samples from GERA African American ethnic group for African American samples. We then meta-analyzed the four ethnic groups COJO results by calculating fixed effects summary estimates for combining the P-values. For this COJO analysis we considered a P-value <5 × 10−8 as the significance threshold.

### SNP-based heritability and variance explained

GWAS heritability estimate was obtained for CCT in GERA non-Hispanic whites (the largest ethnic group of GERA) using the GCTA software16,72. To estimate the proportion of variance in CCT explained by the 98 identified CCT-associated SNPs, we performed a REML (restricted maximum likelihood) analysis (GREML) using GCTA73.

### Variants prioritization

To prioritize genetic variants, we used a Bayesian approach (CAVIARBF)17. For each of the 74 associated signals identified in the combined (GERA+IGGC) meta-analysis, we computed each variant’s capacity to explain the identified signal within a 2-Mb window (±1.0 Mb with respect to the original top variant) and derived the smallest set of variants that included the causal variant with 95% probability (95% credible set). A total of 1554 variants within 72 annotated genes were included in these 74 credible sets (Supplementary Data 8). For this CAVIARBF analysis, we used 10,000 random samples from GERA non-Hispanic white ethnic group as a reference panel to calculate LD patterns.

### Genes and biological pathways prioritization

To prioritize genes and biological pathways, and highlight gene-set and tissue/cell enrichments within the 74 CCT genomic regions identified in the combined (GERA + IGGC) meta-analysis, we used the following integrative tool: DEPICT18. All independent genome-wide significant genetic variants in the combined (GERA + IGGC) meta-analysis served as input, and as the reference panel, we used 10,000 random samples from GERA non-Hispanic white ethnic group. Genes, gene-sets, tissue/cell annotations that achieved a nominal significance level of 0.05 after false-discovery rate (FDR) correction were subsequently prioritized.

To compare our pathways analysis results with recent findings14,20, we also conducted a pathways analysis using the Versatile Gene-based Association Study - 2 version 2 (VEGAS2v02) web platform21. We first performed a gene-based association analysis on the combined (GERA + IGGC) meta-analysis results using the default ‘-top 100’ test that uses all (100%) variants assigned to a gene to compute gene-based P-value. Gene-based analyses were conducted on each of the individual ethnic groups (European-specific samples (GERA and IGGC Europeans), GERA Hispanic/Latinos, Asian-specific samples (GERA and IGGC Asians), and GERA African Americans) using the appropriate reference panel: 1000 Genomes phase 3 European population, 1000 Genomes phase 3 American population, 1000 Genomes phase 3 Asian population, and 1000 Genomes phase 3 African population, respectively. Second, we performed pathways analyses based on VEGAS2 gene-based P-values. We tested enrichment of the genes defined by VEGAS2 in 9732 pathways or gene-sets (with 17,701 unique genes) derived from the Biosystem’s database (https://vegas2.qimrberghofer.edu.au/biosystems20160324.vegas2pathSYM). We adopted the resampling approach to perform pathway analyses using VEGAS2 derived gene-based P-values considering the default ‘−10 kbloc’ parameter as previously decribed14. We then meta-analyzed the four ethnic groups gene-based results using Fisher’s method for combining the P-values.

### Ocular gene expression

Expression of the genes (N = 74) that contained associated 95% credible set variants was evaluated in human ocular tissues using two publicly available databases: the OTDB19 and EyeSAGE74,75 publicly available at https://genome.uiowa.edu/otdb/ and http://neibank.nei.nih.gov/EyeSAGE/index.shtml, respectively. The OTDB consists of gene expression data for 10 eye tissues from 20 normal human donors, and the gene expression is described as Affymetrix Probe Logarithmic Intensity Error (PLIER) normalized value19.

### Associations with keratoconus

To identify keratoconus cases (N = 207) in GERA, we selected all participants who had at least one diagnosis of keratoconus by a Kaiser Permanente ophthalmologist based on the following ICD-9 diagnosis codes: 371.60, 371.61, and 371.62, and conducted a chart review of each of those cases (by Dr. Melles, a KPNC ophthalmologist). Our GERA keratoconus control group (N = 97,375) included all the non-cases.

To evaluate whether the 98 CCT-SNPs identified in the current study were also associated with keratoconus, we conducted a multiethnic meta-analysis combining the GERA keratoconus cohort, and an independent cohort of patients with keratoconus, recruited from specialist clinics at Moorfields Eye Hospital, London, United Kingdom; the recruitment methodology is the same as that described for a previously published subset of European ancestry23. Briefly, each participant was examined using tomography (Pentacam; Oculus), and the presence of keratoconus was confirmed using established criteria based on corneal thinning and corneal distortion76. A previous bilateral keratoplasty for keratoconus was also accepted as confirmation of disease status. Patients with syndromic disease and keratoconus (e.g., Down syndrome, Ehlers Danlos syndrome) were excluded. Controls were extracted from a pool of 80,000 randomly selected participants in the UK Biobank cohort. Exclusions included any individual with any ICD9 or ICD10 code for any corneal disease. The cases and controls were ethnically matched. In total, 2353 keratoconus cases from the Moorfields Eye Hospital and 37,360 controls from UK Biobank were included (1371 cases and 25,166 controls of European, 661 cases and 8009 controls of South Asian, and 321 cases and 4185 controls of African ancestry). Keratoconus cases and controls were genotyped using the Affymetrix UK Biobank Axiom Array.

### Associations with POAG in GERA

Associations of CCT-associated SNPs with open-angle glaucoma (POAG) were also evaluated in the GERA cohort. POAG cases were diagnosed by a Kaiser Permanente ophthalmologist and were identified in the KPNC electronic health record system based on the International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes (i.e., ICD-9 codes 365.01, 365.1, 365.10, 365.11, 365.12, and 365.15) as previously reported32. In total, 4986 POAG cases were identified in GERA. Our POAG control group (N = 58,426) included all the non-cases, after excluding subjects who have one or more diagnosis of any type of glaucoma (ICD-9 code, 365.xx).

### Associations with glaucoma in UKB

We then evaluated associations between CCT-associated genetic variants and glaucoma (subtype unspecified) in the multiethnic UK Biobank (UKB)24,77. In UKB, the glaucoma phenotype was assessed through a touchscreen self-report questionnaire completed at the Assessment Centre, via the question “Has a doctor told you that you have any of the following problems with your eyes?”, and cases (N= 7,329) were defined as those reporting “glaucoma” (subtype unspecified). The control group included 169,561 individuals. For this confirmation analysis, as one phenotype and 89 genetic variants were tested (as only 89 out of the 98 CCT-SNPs were available in UKB), our P-value adjusted for Bonferroni correction was set as P < 5.62 × 10−4 (0.05/89).

### Mendelian randomization analyses for CCT and POAG

To assess the causal relationships between CCT and POAG, we conducted two-sample Mendelian Randomization analyses using the TwoSampleMR package31. For CTT exposure risk factor, we used the following set of genetic instrument drawn on summary statistics data from the published Iglesias et al. European-specific meta-analysis14: lead SNPs previously reported as genome-wide significant (P < 5 × 10−8); after clumping SNPs for independence, 26 representative SNPs were retained. We built MR models using our GWAS summary associations for POAG32 (outcome of interest) from GERA non-Hispanic whites. Here, we reported MR estimates using the inverse variance–weighted (IVW) method as MR estimates using weighted median and MR Egger methods yielded similar pattern of effects (Supplementary Data 17 and Supplementary Fig. 11). Further analyses were conducted, including horizontal pleiotropy, leave-one-SNP-out or single variant analyses (Supplementary Figs. 12 and 13).

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.