Introduction

Corneal dystrophies are a heterogeneous group of hereditary diseases characterized by progressive corneal opacities that lead to severe visual impairment. Clinical types of corneal dystrophy are classified according to the results of slit-lamp examination, the depth of the deposits, and their histological features. Most corneal dystrophies recognized to date are inherited in an autosomal dominant manner with a high degree of penetrance (Pieramici and Afshari, 2006). Stromal corneal dystrophies are classified according to both their biomicroscopic distribution and the appearance of the deposited material. Granular corneal dystrophy, type II (CGD2; OMIM #607541), the most commonly diagnosed stromal corneal dystrophy in Koreans, is an autosomal dominant corneal dystrophy characterized by progressive accumulation of both granular and lattice lesions. In addition, CGD2 is a bilateral disease that begins during the first or second decade of life and exhibits substantial intra- and interfamilial variation in clinical expressivity (Rodrigues and Krachmer, 1988; Klintworth, 1999; Weiss et al., 2008).

Munier et al. (1997) identified transforming growth factor beta-induced gene human clone 3 (TGFBI, BIGH3, a gene encoding keratoepithelin) on chromosome 5q31 as the causative gene for four stromal corneal dystrophies in Caucasians. Worldwide, the majority of TGFBI mutations linked with corneal dystrophies are missense mutations at codons 124 (R124H) and 555 (R555W) (Kannabiran and Klintworth, 2006). However, population-specific variations in the prevalence of different mutations have also been observed. For example, CGD2 accounts for approximately 12% of TGFBI-related corneal dystrophies in Europe (Munier et al., 2002). In Korean and Japanese populations, CGD2 is the most common corneal dystrophy, accounting for approximately 90% and 72% of TGFBI-related corneal dystrophies, respectively (Mashima et al., 2000; Kim et al., 2001).

Several studies have established a genotype-phenotype correlation for mutations of the TGFBI gene, in which R124H is associated with CGD2. However, Watanabe et al. (2001) reported two different clinical phenotypes in Japanese CGD2 patients with the same R124H mutation. The pathophysiology leading to CGD2 is still unknown. Genetic research on CGD2 is therefore important for detecting novel genes related to its pathophysiology, which would ultimately allow a more accurate clinical-molecular classification.

In the present study, we also observed phenotypic variation among Korean CGD2 patients with the same R124H mutation. We speculated that a genetic factor other than TGFBI is related to the different types of protein deposits in the cornea. Therefore, we sought to identify genetic loci that cosegregate with CGD2 in Korean families using SNP markers. This is the first genome-wide linkage study of CGD2 using a high-density SNP array.

Results

Genotyping

A total of 551 Illumina HumanCNV 370K-Duo BeadChips were processed. The average SNP call rate per array was 98.49%. Of the 370,404 SNPs, 136,584 were excluded based on the following three criteria: (1) a low genotyping call rate (< 95%); (2) a low minor allele frequency (< 0.1); or (3) impossible recombination. SNPs on the X chromosome were also excluded from the analysis. Of the 551 participants, six individuals with Mendelian inconsistencies and four offspring without parental genotypes were excluded from further study. Finally, 541 individuals from 59 families and 233,820 polymorphic SNP markers were used in the parametric linkage analysis. These SNP markers were distributed evenly throughout the genome, with a marker distance of 0.1 cM.
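The first two exclusion criteria can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotype coding (allele counts 0, 1, 2, with None for a missing call), the function names, and the example SNPs are all assumptions made for illustration. The recombination and Mendelian-consistency checks are not covered, and in pedigree data allele frequencies would normally be estimated from founders rather than from all individuals.

```python
# Hypothetical per-SNP quality control; thresholds follow the text (call rate 95%, MAF 0.1).
# Genotypes are coded as the count of one allele (0, 1, 2); None marks a missing call.

def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)

def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.1):
    """Keep a SNP only if it meets both thresholds."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)

# Made-up example SNPs, 20 individuals each.
snps = {
    "snp_ok": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1] * 2,
    "snp_low_call": [0, 1, None, None, 1, 2, 0, 1, 1, 0] * 2,
    "snp_rare": [0] * 19 + [1],
}
kept = [name for name, g in snps.items() if passes_qc(g)]
print(kept)
```

In this toy input only the first SNP survives: the second fails on call rate (16 of 20 genotypes called) and the third on minor allele frequency (1 of 40 alleles).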

Linkage analysis

To identify genomic loci co-segregating with CGD2, we performed a genome-wide linkage analysis with the 233,820 SNPs in the 59 families. Characteristics of the 59 families are summarized in Table 1. Based on the CGD2 inheritance pattern, a parametric linkage analysis was implemented, assuming autosomal dominant inheritance with 90% penetrance and a disease allele frequency of 0.001. Candidate regions were defined using the traditional thresholds of classical single-point linkage studies of Mendelian traits, which correspond to pointwise significance levels of 10⁻³ and 10⁻⁴ and LOD scores of 2.1 and 3.3 (Lander and Kruglyak, 1995). According to these criteria, we confirmed that chromosome 5, which includes the TGFBI gene, had the highest LOD score, in a 14-Mb genomic region on q31-q31.2 (maximum LOD score = 47.36; Figure 1A). In addition, we selected nine markers on five different chromosomes that showed a linkage score over 2 and were supported by at least three nearby SNPs (LOD > 1). The HLOD scores revealed one region of significant linkage at rs6445141 on chromosome 3q26.3 (HLOD = 5.83, α = 0.901) as well as five regions with evidence of suggestive linkage on chromosomes 7q22 (rs997447), 12q21 (rs10778948), 13q33 (rs9587583), 20p12.3 (rs6086473), and 20q13.2-.3 (rs6086473, rs6021231, rs2766648, and rs3761202) in a single-point analysis (Figure 2, Table 2). Regions showing HLOD scores > 2 without support from multiple markers, including a peak on chromosome 2, were excluded from further analysis.
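For readers less familiar with the HLOD statistic, it is conventionally defined under the admixture model, in which a proportion α of families is linked to the locus: HLOD = max over α of Σ log10(α·10^LOD_i + 1 − α), where LOD_i is the LOD score of family i. The sketch below evaluates this by a simple grid search. It is only an illustration of the definition, not MERLIN's implementation; the function name, the grid resolution, and the per-family LOD scores are made up.

```python
import math

def hlod(family_lods, steps=1000):
    """Return (max HLOD, alpha) under the admixture model via grid search over alpha."""
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for k in range(steps + 1):
        alpha = k / steps
        score = sum(math.log10(alpha * 10 ** lod + (1.0 - alpha))
                    for lod in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha

family_lods = [1.2, 0.9, 1.5, -0.8, -1.1, 0.7]  # made-up per-family LOD scores
total_lod = sum(family_lods)                     # homogeneity LOD (alpha fixed at 1)
score, alpha = hlod(family_lods)
print(f"LOD = {total_lod:.2f}, HLOD = {score:.2f}, alpha = {alpha:.3f}")
```

Because α = 1 is included in the grid, the HLOD can never fall below the ordinary summed LOD score, and because α = 0 is included it can never be negative. This is why a locus linked in only a subset of families can show a clear HLOD peak with α well below 1.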

Table 1. Characteristics of samples from 59 families with granular corneal dystrophy, type II. *Carrier of the R124H mutation in the TGFBI gene without disease phenotype.
Figure 1

Confirmation of 5q31 in Korean CGD2 families with parametric single-point (A) and multipoint (B) linkage analysis (gray: LOD, black: HLOD). Dotted line represents the suggestive linkage level (LOD (HLOD) = 2.1), and dashed line represents the significant linkage level (LOD (HLOD) = 3.3).

Figure 2

Parametric single-point linkage analysis in 59 CGD2 families. Heterogeneity LOD (HLOD) scores for all families are plotted for each chromosome. Dashed lines represent suggestive linkage level (HLOD = 2.1).

Table 2. Loci with single-point HLOD scores > 2.1 and simulation analysis in 59 families. The data were generated using MERLIN v.1.1.2, assuming an autosomal dominant model of inheritance with 90% penetrance. Alpha refers to the proportion of families linked to that locus. Genetic position is the relative position from pter to qter in cM. NCBI Build 36.3 was used to obtain all reference sequences. The significant HLOD score for chromosome 3 is shown in bold. (a) Chromosome-wide and (b) genome-wide empirical P values were obtained by gene-dropping simulation in MERLIN v.1.1.2.

In multipoint linkage analysis, the presence of linkage disequilibrium (LD) between two or more markers can falsely inflate the LOD score, and missing genotypes can increase this effect (Schaid et al., 2004). Therefore, to accommodate intermarker LD, the pairwise LD measures (r²) between consecutive pairs of SNP markers were calculated with MERLIN (Supplemental Data Figure S1). The parametric multipoint linkage analysis was performed by organizing closely spaced SNP markers into clusters on all chromosomes. Using an r² threshold of 0.4 to remove LD, we observed similar maximum HLOD scores at 5q31 and 20q13.2-.3 without false positive linkage signals, although the total numbers of SNP markers were reduced from 14,697 to 5,598 and from 5,857 to 2,224, respectively. The overall multipoint parametric linkage analysis of all 59 families without high-LD SNPs is shown in Figure 3. Parametric multipoint linkage analysis of chromosome 5 showed the highest LOD (HLOD) score in the q31-q31.2 region (maximum LOD score = 55.94; Figure 1B), and the maximum HLOD score for suggestive linkage was 2.24 (α = 0.19) in region 20q13.2-.3. The linked regions on chromosomes 5 and 20 were about 14 Mb and about 6 Mb in size, respectively.
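The idea of grouping consecutive markers by pairwise r² can be sketched as follows. This is a simplified illustration under stated assumptions, not MERLIN's procedure: r² is approximated here by the squared Pearson correlation of genotype dosages (0, 1, 2) rather than estimated from haplotype frequencies, and the function names and toy genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_x * var_y)

def cluster_by_ld(markers, threshold=0.4):
    """Group consecutive markers whose pairwise r^2 exceeds the threshold."""
    if not markers:
        return []
    clusters = [[0]]
    for i in range(1, len(markers)):
        if r_squared(markers[i - 1], markers[i]) > threshold:
            clusters[-1].append(i)   # in LD with the previous marker: same cluster
        else:
            clusters.append([i])     # start a new cluster
    return clusters

# Made-up dosages for four consecutive markers in eight individuals.
markers = [
    [0, 1, 2, 1, 0, 2, 1, 0],
    [0, 1, 2, 1, 0, 2, 1, 1],
    [2, 0, 1, 0, 1, 1, 2, 0],
    [2, 0, 1, 0, 1, 1, 2, 0],
]
print(cluster_by_ld(markers))
```

With the 0.4 threshold, the first two and the last two toy markers form two clusters. Treating each cluster as a single unit, or keeping one marker per cluster, reduces the marker count in the way described for chromosomes 5 and 20.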

Figure 3

Parametric multipoint linkage analysis in 59 CGD2 families. Heterogeneity LOD (HLOD) scores for all families are plotted for each chromosome. Apart from 5q31, only chromosome 20 showed a suggestive linkage signal (HLOD = 2.24). The dashed line represents the suggestive linkage level (HLOD = 2.1).

Simulation study

We used simulation analysis to investigate how many regions with similar linkage scores could be expected by chance, in order to examine the false positive rate in our data. We calculated empirical genome-wide (P genome) and chromosome-wide (P chromosome) significance levels for the linkage statistics after markers in high LD were removed, using 1,000 gene-dropping simulations in MERLIN. From the simulation study, the empirical limit for genome-wide significance was established at HLOD > 2.665; that is, the 50th highest linkage score among the 1,000 random replicates (a frequency of 1 in 20, P = 0.05) was 2.665. Therefore, HLOD scores larger than 2.665 were considered significant in this study.
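The rank-based threshold described here can be written as a short sketch. The null scores are random stand-ins for the per-replicate maximum HLOD values that gene-dropping would produce; they are not study data, and the function name and the choice of distribution are assumptions made purely for illustration.

```python
import random

def empirical_threshold(null_max_scores, alpha=0.05):
    """Score at rank round(alpha * n) when replicates are sorted from highest to lowest."""
    ranked = sorted(null_max_scores, reverse=True)
    rank = max(1, round(alpha * len(ranked)))  # 50 for 1,000 replicates at alpha = 0.05
    return ranked[rank - 1]

random.seed(1)
# Stand-in for the maximum HLOD of each of 1,000 simulated replicates (hypothetical values).
null_max_scores = [random.expovariate(2.5) for _ in range(1000)]

threshold = empirical_threshold(null_max_scores)
n_at_or_above = sum(s >= threshold for s in null_max_scores)
print(f"threshold = {threshold:.3f}; replicates at or above it: {n_at_or_above}")
```

By construction, exactly 50 of the 1,000 replicates reach the returned value, which is the sense in which the threshold corresponds to a frequency of 1 in 20.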

The linkage score of rs6445141 on 3q26.3 (HLOD = 5.83) reached a significant linkage level (P = 0.001). Apart from 3q26.3, the eight loci in five chromosomal regions (7q22, 12q21, 13q33, 20p12.3, and 20q13.2-.3) were excluded from the candidate regions by the simulation analyses (Table 2).

Subgroup analysis

The linkage analysis revealed linkage signals on chromosome 3 as well as chromosome 5 (Figure 2, Table 2). To eliminate the confounding effect of families without linkage to 3q26.3, we selected 153 individuals from the 17 families contributing to the HLOD peak on chromosome 3 (Group A). The parametric single-point linkage analysis of this subgroup showed LOD scores of 8.88 for the 3q26.3 region and 13.33 for the 5q31 region (Figure 4). To analyze the correlation between genetic locus and disease phenotype in this subgroup, we tried to classify and compare the phenotypic differences of CGD2 in patients with or without segregation of 3q26.3. We could not find a significant correlation between genotype and phenotype, owing to the lack of criteria for phenotype classification. We next examined whether the 3q26.3 region was associated with specific clinical data in Group A. CGD2 patients had lower triacylglycerol and insulin levels than controls in Group A, although the mean levels of both traits were within the normal range in both patients and controls (Table 3).

Figure 4

Parametric single-point linkage analyses in Group A. Heterogeneity LOD (HLOD) scores for the subgroup families are plotted for chromosomes 3 and 5 (gray: LOD, black: HLOD). The dotted and dashed lines represent the suggestive linkage level (2.1) and the significant linkage level (3.3), respectively.

Table 3. Comparison of clinical data between patients and controls in Group A. Values are expressed as mean ± s.d. Clinical phenotypes that differed significantly in the subgroup are shown in bold.

Discussion

Nearly all previous molecular genetic studies of CGD2 patients have reported the R124H mutation of the TGFBI gene. Studies of CGD2 have reported phenotypic heterogeneity, since CGD2 is a dominant Mendelian trait with age-dependent penetrance and variable expressivity (Rosenwasser et al., 1993; Okada et al., 1998; Meallet et al., 2004). In addition, phenotypic heterogeneity could suggest that mutations in several different genes affect common targets or pathways responsible for the disease. For example, expressivity varied markedly among affected individuals within and between families with the same R124H mutation in TGFBI. In Japanese patients, Watanabe et al. (2001) reported two different clinical phenotypes of CGD2 in homozygotes. They suggested that even though the R124H mutation in exon 4 of the TGFBI gene responsible for the disease was the same in all patients, other genes and factors could affect the maintenance of corneal transparency. Patients homozygous for the R124H mutation displayed a more severe phenotype than heterozygous patients; the expressivity of corneal deposits varied among individuals and had no correlation with patient age (Okada et al., 1998; Kim et al., 2008; Cao et al., 2009).

Sellick et al. (2004) reported that a high-density SNP array provides an efficient method of conducting genome-wide linkage analysis to identify susceptibility loci for Mendelian disorders. However, the effect of LD on LOD scores becomes an issue when high-density marker platforms are used. Our analyses concur with the findings of Schaid et al. (2004), who found that the presence of LD between SNPs can lead to inflated linkage statistics, since existing linkage software assumes linkage equilibrium between markers. The use of high-density SNP maps as markers is a relatively recent advance in linkage analysis. In this study, we first evaluated the performance of the high-density SNP genotyping platform in confirming the disease locus of CGD2 after excluding SNPs in high LD. Second, we applied this technology to identify susceptibility loci co-segregating with CGD2 in a series of pedigrees using parametric linkage analysis. Parametric single-point and multipoint linkage analyses using the SNP genotypes confirmed linkage to TGFBI and provided higher LOD scores than those obtained using microsatellites, as reported by Stone et al. (1994). In addition, we found a new locus on 3q26.3 (rs6445141) co-segregating with CGD2 in Korean families. Apart from the TGFBI locus, rs6445141 showed the highest LOD score (5.832) in the single-point linkage analysis. The subsequent simulation analysis supported this result with a significant empirical P value (Table 2). rs6445141 is located in an intron of the neuroligin 1 gene (NLGN1) on chromosome 3, and no previous report has related this region to CGD2. Neuroligin 1 is a member of a family of brain-specific membrane proteins, and its expression is known to increase continuously in the optic tectum and retina during development. NLGN family members have also been reported to be expressed at high levels by β-cells and to play a role in the insulin secretory mechanism (Suckow et al., 2008). For this reason, we compared the clinical traits related to glucose metabolism in Group A. The results showed meaningful differences between patients and controls in blood insulin and triacylglycerol levels. To elucidate the mechanisms behind these findings, further studies in genetics and molecular biology are needed.

In the initial genome-wide linkage analysis, we also found eight other candidate loci for CGD2 on chromosomes 7, 12, 13, and 20 that satisfied the criteria for suggestive linkage proposed by Lander and Kruglyak (1995). Although these results were not supported by the simulation analysis, it is interesting that some of the genes in these regions are expressed in the cornea or are related to TGF-β signaling pathways.

Reelin (RELN, 7q22) is an extracellular matrix protein that binds to integrin receptor (D'Arcangelo et al., 1995). In normal eyes, reelin is expressed only at very low levels in the ganglion cell layer of the retina and the endothelial cell layer of the cornea. In injured eyes, however, there is marked expression in the retina and cornea (Pulido et al., 2007). Increased reelin expression after injury may play a role in the healing phase (Dong et al., 2003). Bone morphogenetic protein-7 (BMP7, 20q13) is a multifunctional cytokine, a member of the transforming growth factor beta (TGF-β) superfamily of growth factors that function in regulation of cell growth, differentiation, and apoptosis; these growth factors are expressed in the human cornea, ciliary epithelium, lens epithelium, retina, and blood vessels (Reddi, 1994; Yamashita et al., 1997; Wordinger et al., 2002).

In the subgroup study, the marked increase in the linkage score on chromosome 3 indicated that genetic heterogeneity at this locus likely reduced the linkage signal in the initial single-point analysis of all samples.

To investigate the effect of 3q26.3 on CGD2, genetic and functional analyses are needed. Histopathological examination of patients is also needed to classify disease phenotypes more accurately. Accordingly, further work is required to evaluate the putative roles of the candidate loci by sequencing and by comparing allele frequencies between cases and controls. Further, a phenotype classification must be established to study pathophysiology and phenotypic variation more accurately. Nonetheless, these findings offer important insights into the genetics of CGD2, indicating that its genetic basis is more complex than originally believed.

Methods

Subjects

The study population comprised 551 individuals from 59 unrelated families, including 216 patients (8 homozygotes and 208 heterozygotes for the R124H mutation in the TGFBI gene), 11 heterozygous R124H carriers without disease phenotype, and 324 unaffected controls. A participant was diagnosed with CGD2 if he or she showed accumulation of both granular and lattice lesions of the cornea on slit-lamp examination and also carried the R124H mutation in the TGFBI gene. A carrier was defined as a participant with the R124H mutation of the TGFBI gene but without accumulation of lesions in the cornea. All family members over 4 years of age received a medical examination, since symptoms in homozygous patients manifest from 3 years of age. All subjects enrolled in this study were of Korean ethnicity. The study was approved by the institutional review boards of Yonsei Severance Hospital and the Center for Genome Science, National Institute of Health, and was performed following the tenets of the Declaration of Helsinki. Informed consent was obtained from all participants for the clinical and molecular genetic analyses.

Clinical data

Anthropometric characteristics (weight, height, blood pressure, waist circumference, hip circumference) were measured for each participant, and a blood sample was drawn for biochemical analysis. In addition, the following clinical tests were performed: bone density of the ankle (quantitative ultrasound), body fat composition (impedance body composition analyzer), slit-lamp biomicroscopy, and intraocular pressure (Supplemental Data Table S1). Participants also completed a questionnaire on physical activity, alcohol intake, smoking, exposure to second-hand smoke, medical history, and diet.

Genotyping

Genomic DNA was extracted from peripheral blood samples using DNA isolation kits according to the manufacturer's recommended protocols (Intron, Seoul, Korea). The Illumina HumanCNV 370K-Duo BeadChip, which includes 370,404 markers, was used to genotype the 551 individuals according to the manufacturer's recommended protocol (Illumina, San Diego, CA). The arrays were scanned with a two-color confocal BeadArray reader (Illumina). Image intensities were extracted and genotypes were called using Illumina's BeadStudio 3.2 software.

Linkage analysis

Potential genotyping errors, both Mendelian-inconsistent and Mendelian-consistent, were initially searched for using the pedstats and error options in the MERLIN ver. 1.1.2 software. All SNPs showing inconsistency in transmission were removed from further analyses. Single-point and multipoint linkage analyses were performed with MERLIN along the entire length of each chromosome. Parametric linkage in the presence of heterogeneity was assessed using heterogeneity LOD (HLOD) scores. A parametric pairwise LOD score of 2.1 or greater was required for further study of a potential locus. In the single-point linkage analysis, to eliminate false positives, markers with LOD > 2 that were supported by at least three nearby markers with LOD scores > 1 were selected as candidate markers for CGD2. Considering the age-dependent penetrance, we set the disease penetrance to 90% in the linkage analysis under an autosomal dominant model. In the multipoint linkage analysis, to avoid inflation of the false positive rate (Huang et al., 2004; Boyles et al., 2005), linkage disequilibrium (LD) between markers was calculated, and the impact of LD on linkage at the linked regions was assessed.
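The candidate-marker rule (a peak with LOD > 2 supported by at least three neighbouring markers with LOD > 1) might be sketched as follows. The text does not define how close a supporting marker must be, so the window of five markers on either side is purely an assumption, as are the function name and the example scores.

```python
def supported_candidates(lods, peak_min=2.0, support_min=1.0, n_support=3, window=5):
    """Indices of markers with LOD > peak_min backed by >= n_support neighbours above support_min.

    `window` (markers on each side counted as neighbours) is a hypothetical choice.
    """
    hits = []
    for i, lod in enumerate(lods):
        if lod <= peak_min:
            continue
        lo, hi = max(0, i - window), min(len(lods), i + window + 1)
        support = sum(1 for j in range(lo, hi) if j != i and lods[j] > support_min)
        if support >= n_support:
            hits.append(i)
    return hits

# Made-up single-point LOD scores along one chromosome.
lods = [0.1, 1.2, 1.4, 2.6, 1.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.2, 0.1, 2.4, 0.2, 0.1]
print(supported_candidates(lods))
```

In this toy series the peak at index 3 is retained because three neighbours exceed 1, whereas the isolated peak at index 12 is discarded, mirroring the exclusion of unsupported single-marker signals.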

To evaluate the robustness of our linkage results, genome-wide significance levels of observed statistics were defined as the frequency of equivalent or higher scores for 1,000 data replicates simulated using the hypothesis of no linkage. Simulated data sets were generated and analyzed using MERLIN.
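As an illustration of this definition, the empirical significance level is simply the fraction of null replicates whose best score equals or exceeds the observed one. The sketch below uses made-up replicate scores and a hypothetical function name. Some authors add one to both numerator and denominator so that the estimate is never exactly zero; that variant is not shown.

```python
def empirical_p_value(observed, null_max_scores):
    """Fraction of null replicates whose maximum score is >= the observed score."""
    exceed = sum(1 for s in null_max_scores if s >= observed)
    return exceed / len(null_max_scores)

# Made-up maxima for 1,000 replicates simulated under no linkage (not study data).
null_max_scores = [0.5] * 950 + [3.0] * 49 + [6.0]

print(empirical_p_value(4.0, null_max_scores))  # one replicate in 1,000 reaches 4.0
print(empirical_p_value(2.0, null_max_scores))  # fifty replicates in 1,000 reach 2.0
```

With 1,000 replicates the smallest non-zero value this estimator can return is 0.001, so the resolution of the empirical P value is bounded by the number of simulations.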

To investigate the effect of genetic heterogeneity between families, we classified the 59 families into two subgroups. The 17 families showing a positive HLOD score on 3q26.3 were classified into Group A, and the 42 families showing a negative score were classified into Group B.

Statistical methods

To compare clinical traits between cases and controls in Group A, a total of 34 individuals (19 cases and 15 controls) from the 17 families were selected by systematic random sampling. Age- and sex-adjusted characteristics were expressed as means and standard deviations and were compared using Student's t test. The reported P values were based on two-sided levels of significance. Statistical analyses were performed using SPSS software version 12.0.
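For reference, the pooled-variance Student's t statistic underlying such a comparison can be computed as in the sketch below. The analysis itself was run in SPSS; this is only an illustration with made-up values and a hypothetical function name. It omits the age and sex adjustment and does not compute the two-sided P value, which would require the t distribution with the returned degrees of freedom.

```python
import math
import statistics

def student_t(group_a, group_b):
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    t = mean_diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2

cases = [1.0, 2.0, 3.0, 4.0, 5.0]       # made-up trait values, not study data
controls = [2.0, 4.0, 6.0, 8.0, 10.0]   # made-up trait values, not study data
t_stat, dof = student_t(cases, controls)
print(f"t = {t_stat:.3f}, df = {dof}")
```

For the 19 cases and 15 controls compared in Group A, the same formula would give 32 degrees of freedom.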