Meta-analysis of gene–environment-wide association scans accounting for education level identifies additional loci for refractive error

Myopia is the most common human eye disorder and it results from complex genetic and environmental causes. The rapidly increasing prevalence of myopia poses a major public health challenge. Here, the CREAM consortium performs a joint meta-analysis to test single-nucleotide polymorphism (SNP) main effects and SNP × education interaction effects on refractive error in 40,036 adults from 25 studies of European ancestry and 10,315 adults from 9 studies of Asian ancestry. In European ancestry individuals, we identify six novel loci (FAM150B-ACP1, LINC00340, FBN1, DIS3L-MAP2K1, ARID2-SNAT1 and SLC14A2) associated with refractive error. In Asian populations, three genome-wide significant loci AREG, GABRR1 and PDE10A also exhibit strong interactions with education (P<8.5 × 10−5), whereas the interactions are less evident in Europeans. The discovery of these loci represents an important advance in understanding how gene and environment interactions contribute to the heterogeneity of myopia.

Scatter plot of SNP × education interaction effects for spherical equivalent at 39 known GWAS loci. The interaction beta coefficient corresponds to the effect in diopters of one additional copy of the risk allele on spherical equivalent in the high versus low educational level. Thirty-nine index SNPs had larger SNP × education interaction effects on spherical equivalent in Asians than in Europeans (meta-regression P for fold changes < 0.001). For 20 SNPs with the same direction of the interaction effect, the magnitudes of the interaction effects were on average 4-fold larger in Asians than in Europeans (P = 0.003). The P-value for the difference in interaction effects between Asian and European samples was obtained from a meta-regression with the fold changes of the interaction beta coefficients in Asians relative to Europeans as the outcome.
The network was generated based on functional and biological connectivity, graphically represented by nodes (genes) and edges (connections), using the information provided by the GeneMANIA database 1 . The 9 novel genetic loci (12 genes; LINC00340 was not in the database and hence omitted) identified in this study and 13 overlapping genes from the CREAM and 23andMe GWAS of spherical equivalent and age at onset of myopia 2; 3 were included in the analysis. The network weighting was assigned based on query genes, the default method in GeneMANIA. The top three function categories are: ligand-gated channel activity (False Discovery Rate [FDR] = 0.175), neurotransmitter transport (FDR = 0.175) and extracellular matrix (FDR = 0.299). Connections are color-coded by interaction type: purple, co-expression (32.03%); blue, co-localization (14.94%); grey, shared standard pathway 4 (48.73%) or protein domain 5 (4.29%). Black circles represent query genes, and grey circles represent additional genes predicted by GeneMANIA for the network.
Supplementary Fig. 5 Regional association plots of SNP × education interactions at three loci for spherical equivalent (-log10(P_int)) in European (a, c, e) and Asian cohorts (b, d, f)

a) AREG rs1246413 b) AREG rs12511037 c) GABRR1 rs13215566 d) GABRR1 rs13215566 e) PDE10A rs12206610 f) PDE10A rs12206610
Note: SNP rs12511037 was absent from the European genotyped/imputed data; we therefore present the proxy SNP rs1246413 (T/G, frequency of risk allele T = 0.95), which is in LD with rs12511037 (r² = 1).
Logistic regression of education on the three SNPs was performed in the Asian studies (total n = 10,315): SCES-610K, SCES-OmniE, SiMES, SINDI, SP2-1M, SP2-610, STARS, BES and the Nagahama study, adjusted for age, gender and population stratification (SiMES and SINDI). The odds ratio (OR) was estimated from the meta-analysis of the results from these studies. Education level was coded as 1 = higher education, 0 = lower education. A1/A2: effect allele/reference allele.
ACP1 (acid phosphatase 1) belongs to the phosphoprotein tyrosine phosphatase family that dephosphorylates platelet-derived growth factor receptor (PDGFR) 12 . PDGFR is implicated in corneal proliferation 13 .
ARID2 (AT rich interactive domain 2) facilitates ligand-dependent transcriptional activation. SNP rs10880855 has a nominal association with ARID2 transcript level (P = 0.047) in skin tissues 18 . SNAT1 (alias: SLC38A1) supplies glutamine for neurotransmitter synthesis in glutamatergic and GABAergic neurons 19 . The variant rs12827763 is associated with trans-acting expression of SNAT1 (P = 1.3 × 10⁻⁸) in skeletal muscle tissue 20 , and is in low LD with the index SNP rs10880855 (r² = 0.15).
rs10853531 SLC14A2: SLC14A2 (alias: SETBP1) encodes solute carrier family 14 member 2. SLC14A2 has a role in the transport of glucose, organic acids, metal ions and amine compounds.
rs12511037 AREG: AREG encodes amphiregulin, a ligand of the epidermal growth factor receptor (EGFR). EGFR promotes the growth of normal epithelial cells and is implicated in myopia progression through the muscarinic system 21; 22 .
rs13215566 GABRR1: GABRR1 encodes gamma-aminobutyric acid (GABA) C receptor ρ1. GABA C ρ1 is involved in neurotransmission in the retina.
The variant rs13215029, in perfect LD (r² = 1) with the index SNP rs13215566, is associated with cis-acting expression of GABRR1 (P = 2.3 × 10⁻⁴) in skin tissues 18 . Another variant, rs6902106 (r² = 0.45), is associated with cis-acting expression of GABRR1 (P = 2.5 × 10⁻⁷) in tibial artery tissue 16 .
rs12206610 PDE10A: PDE10A encodes a phosphodiesterase that hydrolyzes both cAMP and cGMP to their monophosphates 23 . The levels of PDE10A protein display circadian rhythms in retinal photoreceptors 24 , suggesting a potential role in the visual cycle.
After exclusion of subjects operated for cataract and those with other eye procedures or diseases that could alter refraction, 618 subjects were available, of whom 529 were genotyped at the French national centre for genotyping (CNG) using the Illumina Human 610-Quad BeadChip. Among them, 509 individuals passed genotype QC (individuals of European ancestry, unrelated to other individuals, without discrepancy between clinical and genetic gender and with missingness < 5%) and had imputation data. In addition, 2 subjects had missing education data, leaving 507 subjects in the statistical analysis. Imputation was performed in two steps: pre-phasing with SHAPEIT2, followed by imputation with IMPUTE2 using 1000 Genomes (March 2012, MACGT1) as the reference panel. SNPs were used in the imputation process if call rate > 98%, HWE p-value > 1 × 10⁻⁶ and MAF > 1%. Analysis was performed using Quicktest, with adjustment for age, gender, education, PC1 and PC2 and modelling of the interaction between SNP and education, using robust variance estimates. No exclusion was applied to imputed SNPs.
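The SNP filter applied before imputation in the cohort above (call rate > 98%, HWE p-value > 1 × 10⁻⁶, MAF > 1%) can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the `SnpStats` record, the `passes_qc` function and the example SNP identifiers are hypothetical, and the per-SNP summary statistics are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    rsid: str
    call_rate: float  # fraction of samples with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    maf: float        # minor allele frequency


def passes_qc(snp, min_call_rate=0.98, min_hwe_p=1e-6, min_maf=0.01):
    """Keep a SNP only if it clears all three thresholds quoted in the text."""
    return (snp.call_rate > min_call_rate
            and snp.hwe_p > min_hwe_p
            and snp.maf > min_maf)


# Made-up values for illustration; these are not study data.
snps = [
    SnpStats("rs_example_1", call_rate=0.995, hwe_p=0.40, maf=0.12),
    SnpStats("rs_example_2", call_rate=0.970, hwe_p=0.40, maf=0.12),   # low call rate
    SnpStats("rs_example_3", call_rate=0.995, hwe_p=1e-9, maf=0.12),   # fails HWE
    SnpStats("rs_example_4", call_rate=0.995, hwe_p=0.40, maf=0.004),  # too rare
]
kept = [s.rsid for s in snps if passes_qc(s)]
```

The thresholds are keyword arguments because the other cohorts described here use different cut-offs for the same three criteria.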

Avon Longitudinal Study of Parents and Children (ALSPAC)
Details of ALSPAC cohorts have been published previously 8; 9 . The research adhered to the tenets of the Declaration of Helsinki. Ethical approval for the study was obtained from the ALSPAC Law and Ethics committee and three local research ethics committees. Pregnant women with an expected date of delivery between 1st April 1991 and 31st December 1992, resident in the former Avon health authority area in Southwest England, were eligible to participate in this birth cohort study. 13,761 women were recruited. Data collection has been via various methods including self-completion questionnaires sent to the mother, to her partner and after age 5 to the child; direct assessments and interviews in a research clinic. As well as investigating the health and wellbeing of the children in the birth cohort, the health of the mothers is also an important area of investigation. For mothers, DNA was extracted from blood samples collected as part of routine antenatal care, during attendance at ALSPAC research clinics, or from immortalized lymphoblastoid cell lines, for a total of 10,321 of the mothers. Non-cycloplegic autorefraction (Canon R50 instrument) was performed opportunistically when mothers accompanied their child to a research clinic visit, and/or by a researcher visiting their optician to obtain their spectacle prescription. The design of this study has been approved by the ALSPAC Ethics Committee and National Health Service Research Ethics Committee.
Non-cycloplegic autorefraction data were used in preference to subjective refraction data when available. DNA samples were available for 11,343 children, prepared from either blood samples or lymphoblastoid-transformed cell lines. Non-cycloplegic autorefraction (Canon R50 instrument) was performed during attendance at an ALSPAC research clinic visit when the children were approximately 15 years old. Genotyping was performed using Illumina 660 W-quad (mothers) or Illumina HumanHap 550 (children) bead arrays. Samples that did not cluster with HapMap CEU individuals on IBS plots, or that showed excessive missingness (>5%), minimal or excessive autosomal heterozygosity, cryptic relatedness (>10% IBD) or a sex-mismatch were excluded. SNPs with call rate <95%, minor allele frequency <1%, or Hardy-Weinberg P value < 10⁻⁷ were excluded. Genotypes were available for 8340 mothers and 8365 children. Imputation was carried out separately for mothers and children. For mothers, individual chromosomes were pre-phased with ShapeIt v2 using the b37 genetic map, and imputation was performed with minimac-omp using the GIANT phase1 release v3 (2010-11-23) 1000 Genomes reference panel. For children, phasing was carried out using MACH and imputation with minimac, against the same reference panel. Genotype and phenotype data were available for 1865 mothers and 3792 children. SNP × education interaction analysis was performed using ProbABEL for mothers. In children, tests for SNP main effect and SNP × near work interaction were carried out using R for the 3 SNPs that showed evidence of SNP × education interaction effects in the meta-analysis of Asian adults.

AREDS
The Age-Related Eye Disease Study (AREDS) was initially designed as a long-term multicenter, prospective study of the clinical course of age-related macular degeneration (AMD) and age-related cataract 27; 28 . In addition to collecting natural history data, AREDS included a randomized clinical trial of high-dose vitamin and mineral supplements for AMD and a clinical trial of high-dose vitamin supplements for cataract 27; 28; 29 . Written informed consent was obtained from all participants before enrollment in accordance with the Declaration of Helsinki. AREDS participants were 55 to 80 years of age at enrollment and had to be free of any illness or condition that would make long-term follow-up or compliance with study medications unlikely or difficult. On the basis of fundus photographs graded by a central reading center, best-corrected visual acuity and ophthalmologic evaluations, 4,757 participants were enrolled in one of several AMD categories, including persons with no AMD (control group). Visual acuity of all participants was measured with the standard procedure developed for the Early Treatment of Diabetic Retinopathy Study (ETDRS). A refraction measurement was performed for participants at the randomization visit and at each annual visit. For participants who experienced a decrease of 10 letters from baseline visual acuity, refractions were also conducted at non-annual visits. Blood samples were collected at baseline and longitudinally, and cell lines were established. DNA was extracted from cell lines according to standard protocols when the initial DNA supply had been depleted.
For the current analysis, 1865 participants were included from the AREDS 1c population. Refractive error which was measured by a refraction protocol at baseline enrollment into the AREDS study [27][28][29][30] was utilized for the definition of astigmatism. For AREDS 1c, genotyping of SNPs was performed using the Illumina HumanOmni2.5-4v1_B chip array and a genome-wide association study of astigmatism using the Illumina 2.5M chip was performed using a subset of the control group from the original AREDS study. These control individuals are all Caucasians, who do not have age-related macular degeneration (AMD) and were further screened to also exclude individuals with cataracts, retinitis pigmentosa or other retinal degenerations, color blindness, other congenital eye problems, LASIK, artificial lenses, and other eye surgery. For all studies, samples with low call rate (<98%), with low mean confidence scores over all non-missing genotypes, with chromosome anomalies, or with sex-mismatch were excluded. No samples exhibited excess heterozygosity rates (1.5 interquartile ranges above or below the upper/lower quartile ranges). Cryptic relatedness was detected by estimating IBD sharing and kinship coefficients among all possible pairs and one member of each pair exhibiting a first cousin or closer relationship was dropped from the analysis. SNPs were dropped from the analysis if they exhibited more than 1 blind duplicate error, more than 1 HapMap control error or more than 1 error in HapMap control trios, a genotype call rate < 99%, minor allele frequency < 0.01, or Hardy-Weinberg P-value < 1 x 10 -4 . Tests for batch effects were not significant. No sex-specific differences in allelic frequency (P > 0.2) or heterozygosity (P > 0.3) were detected. Imputation was performed with the IMPUTE version 2 software (imputed to plus strand of NCBI build 37, 1000 Genomes worldwide reference panel of 1,092 samples from phase I integrated variant set (v3, release March 2012)). 
For each imputed SNP, info, a measure of imputation quality, was calculated. Info typically ranges between 0 and 1, with 1 indicating no uncertainty in the imputed genotypes. Quicktest was used for the analyses, with age, sex and the first two principal components (to adjust for population stratification) as covariates. Genotype data from AREDS 1c are publicly available through the database of Genotype and Phenotype under the name of either the MMAP study or the AREDS study.

BATS
The Brisbane Adolescent Twins Study (BATS) is a part of the Australian Twin Eye Study 31 . Ethical approval was obtained from the QIMR Berghofer Medical Research Institute-Human Research Ethics Committee. In all subjects, post-cycloplegic (following instillation of tropicamide 1%) refraction for both eyes was measured using a Humphrey-598 automatic refractor (Carl Zeiss Meditec, Inc., Miami, Florida, USA). These measurements were used to determine the spherical equivalent trait analysed here. Education data in BATS were collected as part of the 19UP study, through either telephone interviews or online questionnaires. We restricted the analyses to participants aged 20 years or older.
DNA was extracted from blood leucocytes according to standard procedures. The Australian cohorts were genotyped on the Illumina Human Hap610 Quad array. A genotype success rate of 0.95 or above was required for inclusion of a SNP in further steps of the analysis. Only SNPs in Hardy-Weinberg equilibrium were processed: the HWE inclusion threshold was P > 10 × 10⁻⁶. The minimum minor allele frequency required for inclusion of individual SNPs was 0.01. Ancestral outliers were defined as having the first two principal components more than six standard deviations from the mean values of HapMap European samples, and were excluded from the analyses. Imputation was performed against version 3 of the November 23, 2010 public release of the 1000 Genomes Project genotype data, using MACH for phasing and minimac for imputation. We used the two-step score test for this analysis: the first step fitted a mixed model for spherical equivalent adjusted for age, sex and the kinship matrix using GenABEL, and the second step used the GWFGLS function in MixABEL, which fits a linear model for the residual of spherical equivalent from the first step and tests the main SNP effect and the SNP × education interaction term.
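The logic of the two-step analysis described above can be illustrated with a simplified sketch. It is not the GenABEL/MixABEL implementation: the kinship random effect is omitted, ordinary least squares stands in for the mixed model, placing the education main effect in the second step is an assumption, and the `ols` helper and the toy data are hypothetical.

```python
from itertools import product


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Balanced toy design (made-up values): centred age, sex, education, SNP dosage.
rows = list(product([-1.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0, 2.0]))
y = [1.0 + 0.1 * age + 0.5 * sex - 0.2 * edu - 0.3 * g - 0.4 * g * edu
     for age, sex, edu, g in rows]

# Step 1: adjust the phenotype for age and sex and keep the residuals.
X1 = [[1.0, age, sex] for age, sex, _, _ in rows]
b1 = ols(X1, y)
resid = [yi - sum(b * x for b, x in zip(b1, xi)) for xi, yi in zip(X1, y)]

# Step 2: regress the residuals on education, SNP dosage and their product.
X2 = [[1.0, edu, g, g * edu] for _, _, edu, g in rows]
_, beta_edu, beta_snp, beta_int = ols(X2, resid)
```

In this noise-free, balanced design the covariates of step 1 are uncorrelated with the genotype terms, so step 2 recovers the simulated SNP and interaction effects. In real data the two-step estimates are approximations of a joint fit.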

Blue Mountains Eye Study (BMES)
Imputed SNPs were retained only if the proper info statistic (a software-specific statistic from IMPUTE) was ≥ 0.5, and were excluded if the Hardy-Weinberg P-value was < 1 × 10⁻⁶. We did not filter out imputed SNPs with MAF < 0.01, so that rare SNPs were included in the association assessment.

CROATIA-Korčula Study
The CROATIA-Korčula study, Croatia, is a population-based, cross-sectional study that includes a total of 969 adult examinees, aged 18-98 (mean=56.3), from the Dalmatian island of Korčula, most of whom (N=930) underwent a complete eye examination 33 . The study received approval from the Ethics Committee of the Medical School, University of Split and NHS Lothian Board in Scotland and Croatia and followed the tenets of the Declaration of Helsinki. Non-cycloplegic autorefraction was measured on each eye using a NIDEK Ark30 hand-held autorefractometer. Measures on eyes with a history of trauma, intra-ocular surgery, LASIK operations or keratoconus were removed. Analysis was performed as per analysis plan, excluding individuals with a cylinder power >= 5D in either eye and individuals with a difference in cylinder power between right and left eyes beyond 4 standard deviations from the mean, and was restricted to individuals over 25 years old, as there were too few individuals in this study who were under 25 years of age. Genotypes were generated using dense Illumina SNP arrays, Illumina CNV370v1 and CNV370-Quadv3, following the manufacturer's standard recommendations. Genotypes were determined using the Illumina BeadStudio software. Samples with a call rate below 97%, potentially mixed samples with excess autosomal heterozygosity or gender discrepancy (based on the sex chromosome genotypes), and ethnic outliers (based on principal components analysis of genotypic data) were excluded from the analysis using the quality control algorithm implemented in the R package GenABEL. After exclusion of SNPs with MAF < 0.01, call rate < 98% and HWE deviation p < 10⁻⁶, samples were pre-phased using shapeit v2 34 . Imputation was carried out using impute v2 35 and the 1,000 genomes All ancestries phase1 integrated v3 reference panel.
The impute2mach GENABEL function was used to convert the impute2 outputs to the MACH format that is used in the ABEL suite (http://www.genabel.org/packages), and the regression analyses of Spherical Equivalent Refraction adjusted for age and sex on SNP allele dose, education and the interaction between SNP and education were performed using the MixABEL package. The variance-covariance matrix used in MixABEL to account for relatedness between individuals was generated using the polygenic functions of the GenABEL package. After phenotypic and genotypic quality control steps, 807 individuals were analysed.

CROATIA-Split Study
The CROATIA-Split study, Croatia, is a population-based, cross-sectional study in the Dalmatian city of Split that includes 1000 examinees aged 18-95. The study received approval from the Ethics Committee of the Medical School, University of Split and NHS Lothian Board in Scotland and Croatia and followed the tenets of the Declaration of Helsinki. Individuals were genotyped with either the 370CNV-Quadv3 (n=500) or the Illumina OmniExpress Exome-8v1_A beadchips (n=500). Alleles were called in BeadStudio/GenomeStudio using Illumina cluster files. Subjects were excluded if they fulfilled any of the following criteria: genotypic call rate <97%, mismatch between reported and genotypic sex, unexpectedly low genomic sharing with first degree relatives, excess autosomal heterozygosity, or outliers identified by IBS clustering analysis. We excluded SNPs on the basis of minor allele frequency (<0.01/monomorphism), HWE (P < 10⁻⁶) and call rate (<97%). The samples genotyped with the denser array (Illumina OmniExpress Exome) were first pre-phased and imputed as described for the CROATIA-Korčula study, and the output of this imputation was used as a secondary panel to complement the 1,000 genomes All ancestries phase1 integrated v3 reference panel for the imputation of the samples genotyped on the less dense array. Imputations for the two halves of the study were then combined to form a combined panel of ~37.5 M SNPs. The genome-wide scan for association was performed as described for the CROATIA-Korčula study.

Diabetes Control and Complications Trial (DCCT)
DCCT (1982-1993) was a multi-center randomized clinical trial that compared the effectiveness of intensive (≥3 daily insulin injections or insulin pump) and conventional (<3 daily insulin injections) diabetes treatments of the time in preventing the development and progression of microvascular complications of type 1 diabetes 36 . Ethical approval was obtained from the Research Ethics Board of The Hospital for Sick Children. Subjective refraction was measured following the standard protocols using a letter chart at 10 to 20 feet, at the baseline visit and annually thereafter during DCCT. Refraction measurement was attempted at 1 meter for subjects with poor visual acuity. In these cases the 4 meter refraction was estimated by subtracting +0.75 sphere from the 1 m measurement. In the current study the last available measurement for each individual was used. Genotyping was done using the Illumina Human1M BeadChip assay. Individuals showing gender mismatch with typed X-linked markers, call rate <0.95, genotyping mismatch with an earlier study, high autosomal heterozygosity or cryptic relatedness were excluded from the analysis. Analysis was restricted to individuals who self-identified as "white" and were 20 years or older. Ethnically admixed subjects, identified using population genetic approaches, were also excluded from further analysis. Details of genotyping quality control procedures are presented elsewhere 37 . Genotypes of untyped markers were imputed using 1000 Genomes Phase I integrated haplotypes as reference in IMPUTE v2.3.0.
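The two phenotype rules stated above (the distance correction for 1 m refractions and the use of the last available measurement) amount to simple arithmetic, sketched below. The function names and the `(year, sphere)` visit representation are hypothetical, and the example values are made up.

```python
def refraction_at_4m(sphere_1m):
    """Estimate the 4 m sphere by subtracting +0.75 D from the 1 m measurement."""
    return sphere_1m - 0.75


def last_available(visits):
    """Return the sphere from the most recent visit.

    `visits` is a list of (year, sphere_in_dioptres) pairs; this layout is an
    assumption made for the illustration.
    """
    return max(visits, key=lambda visit: visit[0])[1]


# Made-up example: a subject measured at baseline and at two later visits.
visits = [(0, -1.25), (1, -1.50), (3, -1.75)]
final_sphere = last_available(visits)
corrected = refraction_at_4m(-1.00)  # a 1 m reading of -1.00 D
```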

Estonian Genome Center, University of Tartu (EGCUT)
The Estonian cohort is from the population-based biobank of the Estonian Genome Project of University of Tartu (EGCUT

EPIC-Norfolk Eye Study (EPIC)
The European Prospective Investigation into Cancer (EPIC) study is a pan-European prospective cohort study designed to investigate the aetiology of major chronic diseases 38 . EPIC-Norfolk, one of the UK arms of EPIC, recruited and examined 25,639 participants aged 40-79 years between 1993 and 1997 for the baseline examination 39 . Recruitment was via general practices in the city of Norwich and the surrounding small towns and rural areas, and methods have been described in detail previously 40 . Since virtually all residents in the UK are registered with a general practitioner through the National Health Service, general practice lists serve as population registers. Ophthalmic assessment formed part of the third health examination and this has been termed the EPIC-Norfolk Eye Study 41 .
Genomes haplotypes reference panel were carried out on the entire FHS dataset using parameters estimated from step 1. Statistical analyses were conducted with the R statistical software (version 2.7) and the GenABEL (version 1.7-2) and MixABEL (version 0.1-1) packages for linear mixed model association analyses. Linear mixed models included age, sex, the first two eigenvectors from principal components analyses of genotype data, a binary coding of education (0, 1), and the additively-coded SNP dosage (0 to 2). For the original cohort, years of schooling was not reported but an ordinal variable ranging from 0 (no schooling) to 8 (postgraduate) was collected. An interaction (G × E) term was generated as the product of additively-coded SNPs with the binary education variable. The kinship matrix was estimated empirically from the data and included as a random effect in the statistical model.
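The construction of the fixed-effect covariates and the G × E product term described above can be sketched as follows. The function names are hypothetical, the cut-off used to dichotomise the ordinal education variable is not stated in the text and is therefore left as a required parameter, and the kinship random effect is not represented.

```python
def binary_education(ordinal_level, cutoff):
    """Dichotomise an ordinal education level (0 to 8).

    `cutoff` is a hypothetical threshold; the text does not state which
    value was used.
    """
    return 1 if ordinal_level >= cutoff else 0


def design_row(age, sex, pc1, pc2, edu, dosage):
    """Fixed-effect covariates for one person: intercept, age, sex, two
    principal components, education, SNP dosage and the G x E product."""
    if not 0.0 <= dosage <= 2.0:
        raise ValueError("additively-coded dosage must lie in [0, 2]")
    return [1.0, age, sex, pc1, pc2, float(edu), dosage, dosage * edu]


# Illustrative values only, not study data.
edu = binary_education(ordinal_level=6, cutoff=5)
row = design_row(age=52.0, sex=1.0, pc1=0.01, pc2=-0.02, edu=edu, dosage=1.5)
```

With this coding the coefficient on the last column is the difference in the per-allele effect between the higher and lower education groups, and the coefficient on the dosage column is the per-allele effect in the lower education group.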

Gutenberg Health Study (GHS1, GHS2)
The Gutenberg Health Study (GHS) is a population-based, prospective, observational cohort study in the Rhine-Main Region in midwestern Germany with a total of 15,010 participants and follow-up after five years. The study sample is recruited from subjects aged between 35 and 74 years at the time of the exam. The sample was drawn randomly from local governmental registry offices and stratified by gender, residence (urban and rural) and decade of age. Exclusion criteria were insufficient knowledge of the German language to understand explanations and instructions, and physical or psychic inability to participate in the examinations in the study center. Individuals were invited for a 5-hour baseline-examination to the study center where clinical examinations and collection of blood samples were performed. The interdisciplinary study design comprises an ophthalmological examination, general and especially cardiovascular examinations, psychosomatic evaluation, laboratory tests, and biobanking for proteomic and genetic analyses. All participants underwent an ophthalmological investigation of 25 minutes' duration taking place between 11:00 a.m. and 8:00 p.m. This examination was based on standard operating procedures and included a medical history of eye diseases, autorefraction and visual acuity testing (Humphrey ® Automated Refractor/Keratometer (HARK) 599™, Carl Zeiss Meditec AG, Jena, Germany), visual field screening using frequency doubling technology (Humphrey ® Matrix Perimeter, Carl Zeiss Meditec AG, Jena, Germany), central corneal thickness and keratometry measurement (Scheimpflug imaging with the Pachycam™, Oculus, Wetzlar, Germany), IOP measurement with a non-contact tonometer (Nidek NT-2000™, Nidek Co., Japan), slitlamp biomicroscopy with undilated pupils (Haag-Streit BM Jena, Germany), all administered by an ophthalmologist. The study was approved by the Local Ethics Committee of Rhineland-Palatinate, Germany (reference no. 837.020.07). 
According to the tenets of the Declaration of Helsinki, written informed consent was obtained from all participants prior to entering the study.
Within GHS, DNA was extracted from buffy-coats from EDTA blood samples as described in Zeller et al. 46 . Genetic analysis was conducted in the first 5,000 study participants. Of these, 3,463 individuals were genotyped in 2008 (GHS1) and a further 1,439 individuals in 2009 (GHS2). Genotyping was performed for GHS1 and GHS2 using the Affymetrix Genome-Wide Human SNP Array 6.0 (http://www.affymetrix.com), as described in the Affymetrix user manual. Genotypes were called using the Affymetrix Birdseed-V2 calling algorithm. Individuals with a low genotyping call rate, an excessively high level of heterozygosity (hetFDR > 0.01), sex-mismatches or non-European ancestry were excluded. After applying standard quality criteria (minor allele frequency >1%, genotype call rate >98% and P
Genome-wide genotyping using the Illumina 2.5M chip or the Illumina Omni Express chip was performed on a subset of individuals from the S3/F3. Samples with a low call rate (<98%), sex-mismatch, excess heterozygosity or evidence of non-Caucasian ancestry were excluded. SNPs were excluded before imputation if they had a low genotype call rate (<0.98), low minor allele frequency (<0.01) or Hardy-Weinberg P-value < 10⁻⁶. Phasing and imputation were performed with SHAPEIT v2 and IMPUTE2 v2.3.0 using the 1000 Genomes phase 1 integrated reference panel. Eyeglass prescriptions were measured in addition to an evaluation using the Nikon Retinomax, and subjects with age-related macular degeneration, cataracts, retinitis pigmentosa, color blindness, other congenital eye problems, LASIK, artificial lenses or other eye surgery were excluded. GxE analyses were done with QUICKTEST version 0.95. The genomic control inflation factor was 1.016 (after filtering SNPs for MAF >1% and imputation quality info >0.3).
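A genomic control inflation factor such as the one reported above is conventionally computed as the median of the observed test statistics divided by the median expected under the null. The sketch below assumes 1-degree-of-freedom chi-square statistics obtained from z-scores; the text does not state which test statistic was used, and the function name and input values are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """lambda_GC = median observed 1-df chi-square statistic / expected median.

    SNP-level filters (e.g. on MAF and imputation quality) are assumed to
    have been applied before calling this function.
    """
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN


# Made-up z-scores whose median squared value equals the null median.
example_lambda = genomic_inflation([0.5, CHI2_1DF_MEDIAN ** 0.5, 1.0])
```

A value close to 1 indicates little inflation of the test statistics, for example from residual population stratification.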

Ogliastra Genetic Park, Talana study (OGP Talana)
A cross-sectional ophthalmic study was performed in Talana Helsinki. Talana is an Ogliastran village situated at an altitude of 700 m above sea level in one of the most secluded areas of Sardinia; it has about 1200 inhabitants and, importantly, archival records are available from 1589 and genealogical trees have been reconstructed from 1640. 789 volunteers gave their written informed consent and were invited to the local medical centre, which was equipped with a complete set of ophthalmic instruments for this survey. All participants underwent a complete eye examination conducted according to a standardized protocol that included visual acuity measurement with Snellen charts at a distance of 5 m, autorefraction (RK-8100 Topcon, Tokyo, Japan) assessing sphere, cylinder and axis, slit lamp biomicroscopy (Model BQ900, Haag-Streit, Bern, Switzerland), contact tonometry and colour fundus photography (TRC-50IA,Topcon) and non-contact optical biometry (IOLMaster,Carl Zeiss, Italy) and Optical coherence tomography (OCT). Whole blood was obtained from all consenting family members of Talana village for DNA extraction.
Genotyping was carried out using the Affymetrix 500k chips following standard protocols. SNP quality control was performed using the GenABEL software package in R. Samples with an overall SNP call rate < 95%, showing excess heterozygosity, or classified as outliers by allelic identity-by-state (IBS) clustering analysis were excluded. After exclusion of SNPs with minor allele frequency < 0.05, Hardy-Weinberg P value < 10⁻⁴ or call rate < 95%, data were pre-phased with Shapeit and imputed with Impute2 using the GIANT phase 1 release v3 1000 Genomes reference panel. Genome-wide GxE association analysis was performed using MixABEL.

Orkney Complex Disease Study (ORCADES)
The Orkney Complex Disease Study (ORCADES) is a population-based, cross-sectional study in the Scottish archipelago of Orkney, including 1,285 individuals with eye measurements. The study received approval from the Orkney and North of Scotland Local Research Ethics Committees in Scotland and followed the tenets of the Declaration of Helsinki. Autorefractive measurements were obtained using a Kowa KW 2000 autorefractometer.
Measures on eyes with a history of trauma, intra-ocular surgery, LASIK operations or keratoconus were removed. Analysis was performed as per analysis plan, excluding individuals with a cylinder power >= 5D in either eye and individuals with a difference in cylinder power between right and left eyes beyond 4 standard deviations from the mean, and was restricted to individuals over 25 years old, as those under 25 years were too few.
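The cylinder-based exclusions described above can be sketched as follows. Several details are assumptions not given in the text: cylinder power is compared by magnitude, the mean and sample standard deviation of the inter-eye difference are computed over all individuals before exclusion, and the function name and toy data are hypothetical.

```python
from statistics import mean, stdev


def keep_for_analysis(people, max_cyl=5.0, n_sd=4.0):
    """Apply the two cylinder-based exclusions.

    `people` is a list of (id, cyl_right, cyl_left) with cylinder power in
    dioptres. Returns the ids that are retained.
    """
    diffs = [right - left for _, right, left in people]
    centre, spread = mean(diffs), stdev(diffs)
    kept = []
    for (pid, right, left), diff in zip(people, diffs):
        if abs(right) >= max_cyl or abs(left) >= max_cyl:
            continue  # cylinder power >= 5 D in either eye
        if abs(diff - centre) > n_sd * spread:
            continue  # inter-eye difference beyond 4 SD from the mean
        kept.append(pid)
    return kept


# Made-up data: 28 unremarkable subjects, one with high cylinder power in
# both eyes and one with a large difference between eyes.
people = [(f"p{i}", 1.0, 1.0) for i in range(28)]
people += [("high_cyl", 5.5, 5.5), ("asym", 4.0, 1.0)]
kept = keep_for_analysis(people)
```

Because a single observation cannot lie more than (n − 1)/√n sample standard deviations from the mean, the 4 SD rule can only exclude anyone when at least 18 individuals contribute to the calculation.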
Individuals were genotyped with either the Illumina HumanHap300v2 or 370CNV-Quad beadchips (n=890) or the Illumina Omni1 (n=304) or Illumina OmniExpress beadchips (n=1073). Alleles were called in BeadStudio/GenomeStudio (Hap300/Omni) using Illumina cluster files. Subjects were excluded if they fulfilled any of the following criteria: genotypic call rate <98%, mismatch between reported and genotypic sex, unexpectedly low genomic sharing with first degree relatives, excess autosomal heterozygosity, or outliers identified by IBS clustering analysis. We excluded SNPs on the basis of minor allele frequency (<0.01/monomorphism), HWE (P<10 -6 ), call rate (<97%). Given the very high overlap in SNPs between the two Omni chips, the intersection of QC'd SNPs was used to impute and phase individuals genotyped on the Omni arrays together, whilst the Hap300 individuals were phased and imputed separately. Samples were phased using shapeit v2. Imputation was carried out using impute2 and the 1,000 genomes All ancestries phase1 integrated v3 reference panel, with a secondary reference panel of local exome sequences, sequenced using the Agilent SureSelect All Exon Kit v2.0 and Illumina 100 bp paired end reads (average 30x depth), derived from 90 ORCADES subjects chosen to optimally represent the haplotypes present. Imputations for the Hap300 and Omni subjects were then combined to form a combined panel of 37.5 M SNPs for 2222 subjects 52 . The impute2mach GENABEL function was used to convert the impute2 outputs to the MACH format that is used in the ABEL suite (http://www.genabel.org/packages) and the regression analyses of Spherical Equivalent Refraction adjusted for age and sex on SNP allele dose, education and interaction between SNP and education performed using the MixABEL package. The variance covariance matrix used in MixABEL to account for relatedness was generated using the polygenic functions of the GenABEL package.

RAINE Eye Health Study (RAINE)
The Raine Eye Health Study (REHS) was conceived to determine the prevalence of and risk factors for eye disease in young adults, and to characterize ocular biometric parameters in a young adult cohort 53. The Western Australian Pregnancy Cohort (Raine) Study originated as a randomized-controlled trial of 2900 women recruited from the state's largest maternity hospital. The study design was approved by the Human Research Ethics Committee, University of Western Australia. Their offspring (N=2868) have been followed at birth and at 1, 2, 3, 5, 8, 10, 14, 17 and 20 years of age in a prospective cohort study. DNA was collected from participants for genome-wide association studies and genotyping was performed using the Illumina 660 Quad Array. Any pair of individuals who were related with a π > 0.1875 (midway between second- and third-degree relatives, e.g. between half-sibs and cousins) was investigated, and the individual with the higher proportion of missing data was excluded from the 'clean' dataset (68 individuals excluded). Individuals with low genotyping success (missing data > 3%) were excluded from the 'clean' dataset (16 individuals excluded). Individuals with high levels of heterozygosity were also excluded (heterozygosity < 0.30 excluded 3 individuals). SNPs that did not satisfy a Hardy-Weinberg equilibrium P-value > 5.7 × 10−7 (919 markers), a call rate > 95% (97,718 markers) or a minor allele frequency > 0.01 (119,246 markers, including CNVs) were excluded. To account for population stratification, the first five principal components were calculated using a subset of 42,888 SNPs that were not in LD with each other. Principal component analysis was conducted using the EIGENSTRAT program. The Raine Study genotypes were imputed against the 1000 Genomes Phase 1 Europeans (November 23, 2010 release) using MACH v 2.3.0 software. A minimum passing threshold of 0.3 on the Rsq metric and a MAF > 0.01 were applied to ~30 million imputed SNPs. At the 20-year follow-up, participants completed a comprehensive eye assessment that included visual acuity, orthoptic assessment and cycloplegic autorefraction, as well as several ocular biometric variables and multiple ophthalmic photographs of the anterior and posterior segments. Using the refractive error phenotypes from the 20-year follow-up examination, 348 Caucasian participants aged 20 years or older with high quality genotypes and known spherical equivalent refraction and educational level were included in the current analysis. ProbABEL 0.4.1 was used to perform G×E interaction analysis assuming an additive model with age, sex and the first two principal components fitted as covariates. Linear regression adjusting for age, sex and the first two principal components was performed using mach2qtl to estimate the association of each SNP with spherical equivalent refraction.
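The relatedness filter described above (pairs with π > 0.1875, dropping the member with more missing data) can be illustrated with a minimal sketch. The function name, the input layout and the tie-breaking rule are assumptions made for illustration; the study's actual pipeline is not described at this level of detail.

```python
# Midpoint between expected sharing for second-degree (0.25) and
# third-degree (0.125) relatives, as quoted in the text.
PI_HAT_THRESHOLD = (0.25 + 0.125) / 2  # 0.1875


def prune_related(pairs, missing_rate, threshold=PI_HAT_THRESHOLD):
    """Return the set of sample IDs to exclude.

    pairs: iterable of (id_a, id_b, pi_hat) tuples (hypothetical layout).
    missing_rate: dict mapping sample ID to its proportion of missing genotypes.
    For each related pair in which both members are still retained, the
    member with the higher missing rate is excluded (ties drop id_b).
    """
    excluded = set()
    for id_a, id_b, pi_hat in pairs:
        if pi_hat <= threshold:
            continue  # not related closely enough to act on
        if id_a in excluded or id_b in excluded:
            continue  # pair already broken by an earlier exclusion
        worse = id_a if missing_rate[id_a] > missing_rate[id_b] else id_b
        excluded.add(worse)
    return excluded


if __name__ == "__main__":
    pairs = [("A", "B", 0.50), ("B", "C", 0.26), ("C", "D", 0.10)]
    missing = {"A": 0.010, "B": 0.020, "C": 0.005, "D": 0.030}
    print(sorted(prune_related(pairs, missing)))
```

Processing pairs in order means an individual removed for one pair is not counted again for another, which avoids excluding more samples than necessary.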

Rotterdam Study (RS1, RS2, RS3)
The Rotterdam Study is a prospective population-based cohort study in the elderly living in Ommoord, a suburb of Rotterdam, the Netherlands. Details of the study are described elsewhere 54. In brief, the Rotterdam Study consists of 3 independent cohorts: RS1, RS2, and RS3. For the current analysis, 5,422 residents aged 55 years and older were included from RS1, 1,973 participants aged 55 and older from RS2, and 1,971 aged 45 and older from RS3. 99% of subjects were of Caucasian ancestry. Participants underwent multiple physical examinations at regular intervals from 1991 to present, including a non-dilated automated measurement of refractive error using a Topcon RM-A2000 autorefractor. All measurements in RS1-3 were conducted after the Medical Ethics Committee of the Erasmus University had approved the study protocols and all participants had given written informed consent in accordance with the Declaration of Helsinki.
DNA was extracted from blood leucocytes according to standard procedures. Genotyping of SNPs was performed using the Illumina Infinium II HumanHap550 chip v3.0 array (RS1); the HumanHap550 Duo Arrays and the Illumina Human610-Quad Arrays (RS2), and the Illumina Human610-Quad Arrays (RS3). Samples with low call rate (<97.5%), with excess autosomal heterozygosity (>0.336), or with sex-mismatch were excluded, as were outliers identified by the identity-by-state clustering analysis (outliers were defined as being >3 s.d. from population mean or having identity-by-state probabilities >97%). We applied genomic control and used the inverse variance method to combine the effect sizes estimated for autosomal SNPs, both genotyped and imputed, across cohorts. A set of genotyped input SNPs with call rate >98%, with minor allele frequency >0.01, and with Hardy-Weinberg P value >10−6 was used for imputation. We used Minimac to impute to 1000G (phase 1, March 2012). For each imputed SNP, a reliability of imputation was estimated as the ratio of the empirically observed dosage variance to the expected binomial dosage variance (O/E ratio). GWAS GxE analyses were performed using ProbABEL.
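The O/E ratio mentioned above can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, and the use of the population (rather than sample) variance is an assumption, since the text does not specify which estimator the imputation software uses.

```python
from statistics import fmean, pvariance


def oe_variance_ratio(dosages):
    """Observed / expected dosage variance for one imputed SNP.

    dosages: per-individual expected allele counts in [0, 2].
    The expected variance is the binomial variance 2p(1-p), where p is the
    allele frequency estimated from the mean dosage. Returns NaN for a
    monomorphic SNP, where the expected variance is zero.
    """
    values = [float(d) for d in dosages]
    p = fmean(values) / 2.0
    expected = 2.0 * p * (1.0 - p)
    if expected == 0.0:
        return float("nan")
    # Assumption: population variance is used as the observed variance.
    return pvariance(values) / expected


if __name__ == "__main__":
    # Confidently imputed genotypes at Hardy-Weinberg proportions: ratio near 1.
    print(oe_variance_ratio([0, 1, 2, 1]))
    # Dosages shrunk towards the mean (uncertain imputation): ratio well below 1.
    print(oe_variance_ratio([0.5, 1.0, 1.5, 1.0]))
```

A ratio close to 1 indicates that imputed dosages vary as much as true genotypes would, whereas values near 0 indicate dosages that have collapsed towards the population mean.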

TwinsUK
The TwinsUK adult twin registry based at St. Thomas' Hospital in London is a volunteer cohort of over 10,000 twins from the general population 55. Twins largely volunteered unaware of the eye studies, gave fully informed consent under a protocol reviewed by the St. Thomas' Hospital Local Research Ethics Committee and underwent non-cycloplegic autorefraction using an ARM-10 autorefractor (Takagi Ltd).
Genotyping of the TwinsUK dataset was done with a combination of Illumina arrays (HumanHap300, HumanHap610Q, 1M-Duo and 1.2MDuo 1M). Intensity data for each of the three arrays were pooled separately (with 1M-Duo and 1.2MDuo 1M pooled together) and genotypes were assigned using the Illuminus calling algorithm. We applied similar quality control criteria to each dataset and merged them. Pre-phasing was done with SHAPEIT software and imputation was performed using IMPUTE v2 with the 1000 Genomes Phase I integrated variant set release (v3) haplotypes in NCBI build 37 (hg19) coordinates. GWAS GxE analyses were performed using Quicktest, and only one twin from each pair was included in the analysis to avoid family structure issues.

Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR)
WESDR is an observational cohort study of diabetes complications 56. The study protocol has been approved by the Research Ethics Board of The Hospital for Sick Children. Subjective refraction was measured following standard protocols at each follow-up visit (roughly every 5 years). In the current study the first available refractive measurement after age 25 was used.
Subjects with type 1 diabetes from WESDR were genotyped using the Illumina HumanOmni1-Quad BeadChip assay.
Individuals showing gender mismatch with typed X-linked markers (n=8), cryptic relatedness (n=5), high autosomal heterozygosity (n=6), call rate <0.95 (n=30), as well as ethnicities other than "white" were not included in the analysis. Population genetic approaches based on multi-dimensional scaling implemented in PLINK v1.07 were used to identify and exclude ethnically admixed individuals. Imputation was performed in IMPUTE v2.3.0 using integrated haplotypes from 1000 Genomes Phase I as reference (IMPUTE2 chooses the best custom reference set for each individual internally). The GxE regression model accounted for age, gender and the first two principal components.

Young Finns Study (YFS)
The YFS cohort is a Finnish longitudinal population study sample on the evolution of cardiovascular risk factors from childhood to adulthood 57 59. Of them, we excluded 151 with cryptic relatedness during the sample QC procedure. An additional 259 individuals with cataract surgery or missing refraction data were also excluded. This left a total of 585 individuals for analysis. Linear regression analyses for SE were performed at each SNP using 585 individuals with age, sex, education, SNP × education, and the first two principal components (to adjust for population stratification) included in the model.
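The per-SNP model described above, with a SNP main effect, an education main effect, their product term and additional covariates, can be illustrated by a minimal ordinary least squares sketch. This is not the software used by any cohort; the function names, the 0/1 coding of education and the plain normal-equations solver are assumptions chosen to keep the example self-contained.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_gxe(se, dosage, education, covariates):
    """Fit SE = b0 + bG*G + bE*E + bGE*(G*E) + covariates by least squares.

    se: spherical equivalent per individual (diopters).
    dosage: SNP allele dose per individual (0..2).
    education: assumed coding, 0 = lower and 1 = higher education.
    covariates: one sequence per individual (e.g. age, sex, PCs).
    Returns a dict of coefficient estimates.
    """
    rows = [[1.0, g, e, g * e] + list(z)
            for g, e, z in zip(dosage, education, covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, se)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    names = ["intercept", "snp", "education", "snp_x_education"]
    names += [f"cov{i + 1}" for i in range(k - 4)]
    return dict(zip(names, beta))
```

The coefficient labelled `snp_x_education` corresponds to the interaction effect: the additional change in spherical equivalent per copy of the allele in the higher versus lower education group. A real analysis would also need standard errors and, for related samples, a model that accounts for family structure.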

Nagahama Prospective Genome Cohort for the Comprehensive Human Bioscience (Nagahama)
The Nagahama Prospective Genome Cohort for the Comprehensive Human Bioscience (the Nagahama Study) is a community-based prospective cohort study that aims to determine the prevalence and risk factors of various diseases in a community. The details of study design and methodology have been described elsewhere 60. In brief, residents of Nagahama City who satisfied the following criteria were recruited as participants and were examined between November 2008 and November 2010: 1) age between 30 and 74 years; 2) ability to participate on one's own; 3) no significant problems communicating in Japanese; 4) no current serious diseases/symptoms or health issues; and 5) voluntary decision to participate in this study. A total of 9,804 Japanese individuals participated in the Nagahama Study. All the participants in the Nagahama Study had their axial length (millimeter [mm]; IOL Master, Carl Zeiss Meditec, Dublin, CA, USA), spherical equivalent (diopter [D]; ARK-530A, Nidek, Aichi, Japan), and corneal curvature (mm; ARK-530A, Nidek) measured for both eyes. Color fundus photographs were also obtained from all participants (CR-DG10, Canon, Tokyo, Japan). Of the participants, 3,712 individuals were genome-scanned using HumanHap610K Quad Arrays, HumanOmni2.5M Arrays, and/or HumanExome Arrays (Illumina Inc., San Diego, California, USA). After our standard quality control, genomic imputation was performed on 192 participants' data that had been genotyped by every platform. The final dataset comprised 1,756,611 SNPs for 3,248 individuals. All study procedures were approved by the Ethics Committee of Kyoto University Graduate School of Medicine.

Singapore Chinese Eye Study (SCES)
Similar to SINDI, the Singapore Chinese Eye Study (SCES) is a population-based cross-sectional study of eye diseases in Chinese adults 40 years of age or older residing in the southwestern part of Singapore. The methodology of the SCES study has been described in detail previously. Between 2009 and 2011, 3353 (72.8%) of 4605 eligible individuals underwent a comprehensive ophthalmologic examination, using the same protocol as SINDI 61. Genome-wide genotyping was done in a subset of SCES participants using the Illumina Human610-Quad BeadChip 59 (SCES-610K, n=1952) and Illumina OmniExpress (SCES-OmniE, n=615). Samples were excluded if they showed evidence of admixture, cryptic relatedness, high heterogeneity or gender discrepancies. From a starting number of 1952 individuals, three samples had a per-sample call rate of <95% and were removed from analysis. A total of 21 individuals showed evidence of admixture and were consequently excluded. Biological relationship verification revealed a total of 29 sample pairs with cryptic relatedness. For these, the sample with the lower call rate was removed. In addition, a further 14 samples with impossible biological sharing or heterogeneity, probably because of contamination, were removed, as were two individuals with gender discrepancies. PC analysis of the remaining SCES individuals against the 1000 Genomes phase 1 cosmopolitan panel haplotypes (March 2012 release) did not show the cohort to be dissimilar in ancestry, and therefore no PCs were used to correct for underlying population substructure in the analysis. Individuals were excluded from the study if they had cataract surgery or missing refraction data. Linear regression analyses of SE with gene and education interaction were performed using 1710 individuals in SCES-610K and 543 in SCES-OmniE with age and sex included in the model as covariates.

Singapore Malay Eye Study (SIMES)
SiMES is a population-based prevalence survey of Malay adults aged 40 to 79 years living in Singapore. A total of 3072 DNA samples were genotyped using the Illumina Human 610 Quad Beadchips 62; 63. Using the same quality control criteria, we omitted a total of 530 individuals, including those with subpopulation structure (n=170), cryptic relatedness (n=279), excessive heterozygosity or high missingness rate > 5% (n=37), and gender discrepancy (n=44). A total of 2165 individuals were over age 25 and had high quality genotypes and phenotypes for astigmatism. After the removal of these samples, SNP QC was applied to a total of 579,999 autosomal SNPs for the 2542 post-QC samples. The same QC methods used for SCES were applied to the SiMES genotyping samples. Linear regression analyses of SE with gene and education interaction were performed using 2256 individuals with age, sex and the first two principal components (to adjust for population stratification) included in the model as covariates.

Singapore Indian Eye Study (SINDI)
SINDI is a population-based survey of major eye diseases 64 in ethnic Indians aged 40 to 80 years living in the South-Western part of Singapore and was conducted from August 2007 to December 2009. In brief, 4,497 Indian adults were eligible and 3400 participated. Genome-wide genotyping was performed in 2,953 individuals 63. Participants were excluded from the study if they had cataract surgery or missing refraction data. The Illumina Human610 Quad Beadchips were used for genotyping all DNA samples from SINDI (n=2593). We excluded 415 subjects from the total of 2953 genotyped samples based on: excessive heterozygosity or high missingness rate > 5% (n=34), cryptic relatedness (n=326), issues with population structure ascertainment (n=39) and gender discrepancies (n=16). This left a total of 2,538 individuals with 579,999 autosomal SNPs, and 2,088 of these individuals were also over age 20 and had phenotype data. During the SNP QC procedure, SNPs were excluded based on (i) high rates of missingness (> 5%); (ii) monomorphism or MAF < 1%; or (iii) genotype frequencies deviating from HWE (P < 1 × 10−6). Linear regression analyses of SE with gene and education interaction were performed using 2088 individuals with age, sex and the first two principal components (to adjust for population stratification) included in the model as covariates.
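The three SNP exclusion criteria listed above (missingness > 5%, monomorphism or MAF < 1%, and HWE P < 1 × 10−6) can be sketched as a single filter on genotype counts. The function names are hypothetical, and the Hardy-Weinberg test shown is the 1-degree-of-freedom Pearson chi-square approximation, used here as a stand-in because the text does not say which HWE test was applied.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Approximate HWE P-value from genotype counts (1-df chi-square test).

    Assumes the SNP is polymorphic, so all expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Return True if the SNP survives the three filters described in the text."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0 or n_missing / total > max_missing:
        return False  # (i) too many missing genotype calls
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    if maf < min_maf:
        return False  # (ii) monomorphic or rare
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p  # (iii) HWE


if __name__ == "__main__":
    print(passes_snp_qc(250, 500, 250, 0))   # in HWE, common, fully called
    print(passes_snp_qc(500, 0, 500, 0))     # no heterozygotes: fails HWE
```

Checking MAF before the HWE test keeps the chi-square calculation away from monomorphic SNPs, where the expected counts would be zero.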

Singapore Prospective Study Program (SP2-1M; SP2-610)
Samples of SP2 were from a revisit of two previously conducted population-based surveys carried out in Singapore between 1992 and 1998, including the National Health Survey 1992 and the National Health Survey 1998 65. These studies comprise random samplings of individuals stratified by ethnicity from the entire Singapore population. A total of 8266 subjects were invited in this follow-up survey and 6301 (76.1% response rate) subjects completed the questionnaire, of which 4056 (64.4% of those who completed the questionnaire) also attended the health examination and donated blood specimens. The present GWA genotyping for SP2 involved individuals of Chinese descent only (n=2867) 66.
Of the 2,867 blood-derived DNA samples, 1,459 samples were genotyped on the 610-Quad (SP2-610) and 1,016 samples on the 1M-Duov3 (SP2-1M). We excluded 443 individuals on the following conditions: sample call rates of less than 95%, excessive heterozygosity, cryptic relatedness by IBS, population structure ascertainment, and gender discrepancies, as listed in the main text. During the SNP QC procedure, we excluded SNPs that had low genotyping call rates (> 5% missingness), were monomorphic or had MAF < 1%, or showed significant deviation from HWE (P < 10−6). This yielded a post-QC set of 462,580 SNPs. We additionally assessed the SNPs that are present on different platforms for extreme variations in allele frequencies with a 2-degree of freedom chi-square test of proportions, removing 62 SNPs with P-values < 0.0001. A total of 811 individuals in SP2-1M and 854 individuals in SP2-610 had both high quality genotype data and SE data and were used in the linear regression analyses of SE with gene and education interaction, adjusting for age and sex.
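The cross-platform check described above can be illustrated with a minimal sketch. It assumes the 2-degree-of-freedom test was a Pearson chi-square on a 2 × 3 table of genotype counts (platform by genotype), which is one natural reading of the text but is not confirmed by it; the function name is hypothetical.

```python
import math


def platform_genotype_chi2(counts_a, counts_b):
    """Pearson chi-square test comparing genotype counts between two platforms.

    counts_a, counts_b: (n_AA, n_AB, n_BB) for the same SNP on each platform.
    Returns (chi2, p_value) with 2 degrees of freedom.
    Raises ValueError if any row or column total is zero, since the 2-df
    reference distribution would not apply.
    """
    table = [list(counts_a), list(counts_b)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    if len(col_totals) != 3 or min(col_totals) == 0 or min(row_totals) == 0:
        raise ValueError("need a 2 x 3 table with non-empty rows and columns")
    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = platform_genotype_chi2((400, 400, 200), (200, 400, 400))
    print(chi2, p, p < 0.0001)  # the SNP would be removed at the 0.0001 cut-off
```

For 2 degrees of freedom the chi-square tail probability has the closed form exp(−χ²/2), so no statistical library is needed for the P-value.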

Strabismus, Amblyopia and Refractive Error Study (STARS)
The Strabismus, Amblyopia and Refractive Error Study in Singaporean Chinese Preschoolers (STARS) Family study is a family-based study nested in a prevalence survey of Singaporean preschool children (n=3,009) conducted from March 2008 to March 2010 67. The biological parents of STARS probands were invited to enroll in the STARS Family study. A total of 1,451 samples from 440 nuclear families were genotyped using Illumina Human610 Quad Beadchips. The 741 parents who had phenotype data and who also had available, high quality GWAS genotypes were used in the current study. Linear regression analyses of SE including gene and education interaction were performed with age and sex included in the model as covariates.
All Singapore studies adhere to the Declaration of Helsinki. Ethics approvals have been obtained from the Institutional Review Boards of the Singapore Eye Research Institute, Singapore General Hospital, National University of Singapore and National Healthcare Group, Singapore. In all cohorts, participants provided written, informed consent at recruitment into the studies.