Whole-exome Sequencing Analysis Identifies Mutations in the EYS Gene in Retinitis Pigmentosa in the Indian Population

Retinitis pigmentosa (RP) is a rare, genetically heterogeneous retinal dystrophy, and despite years of research, known mutations explain only approximately 60% of RP cases. We sought to identify the underlying genetic mutations in a cohort of fourteen Indian autosomal recessive retinitis pigmentosa (arRP) families and 100 Indian sporadic RP cases. Whole-exome sequencing (WES) was performed on the probands of the arRP families and on the sporadic RP patients, and direct Sanger sequencing was used to confirm the candidate causal mutations identified by WES. We found likely pathogenic EYS mutations in two arRP families and eight sporadic patients. Specifically, we found a novel pair of compound heterozygous mutations and a novel homozygous mutation in two separate arRP families, two novel pairs of heterozygous mutations in two sporadic RP patients, and six novel homozygous mutations in six sporadic RP patients. Of these, one was a frameshift mutation, two were stop-gain mutations, one was a splicing mutation, and the others were missense mutations. In conclusion, our findings expand the spectrum of EYS mutations in RP in the Indian population and provide further support for the role of EYS in the pathogenesis and clinical diagnosis of RP.


Retinitis pigmentosa (RP; OMIM 226800) is a highly heterogeneous genetic disease characterized by progressive visual loss caused by the impairment of retinal photoreceptors 1 . The worldwide prevalence of RP is approximately one in 3,500-5,000 2 , and it can be inherited as an autosomal recessive (50-60%), an autosomal dominant (30-40%) or an X-linked (5-15%) trait. Thus far, more than 70 genes and loci have been identified for RP (https://sph.uth.edu/RetNet/). However, these genes account for only approximately 60% of RP cases 3 . Therefore, unknown RP genes remain to be identified, and novel RP genes would provide valuable information for the diagnosis, prevention and treatment of RP.
The EYS gene (OMIM 612424, NM_001142800) corresponds to the RP25 locus and was identified as the gene causing autosomal recessive retinitis pigmentosa (arRP) in 2008, and it is mainly expressed in the retina 4 . The human EYS gene encodes a homologue of the Drosophila eye spacemaker (SPAM) protein and is essential for the development and morphology of photoreceptors 5 . Thus far, 15 mutations have been reported in EYS for RP patients, and the types of mutations include missense mutations, nonsense mutations, insertions, deletions and splice site mutations 2,6-10 .
However, traditional approaches to screening candidate genes for RP have had limited success because many techniques for positional cloning and gene identification are relatively time consuming, expensive and inefficient. Recently, whole-exome sequencing (WES) by next-generation sequencing (NGS) has become an efficient method for identifying genetic variants across the protein-coding regions of the genome. In several studies, NGS has provided a promising alternative approach for the molecular diagnosis and genetic identification of RP 11,12 . Because of relatively high levels of consanguinity and large numbers of offspring per family, hereditary-disease gene analysis is highly effective in the Indian population. In this study, which is part of an international collaborative project, we used WES to identify disease-causing genes for RP in the Indian population, and we identified novel mutations in the EYS gene in two consanguineous Indian families and eight sporadic Indian RP patients. Our findings expand the mutation spectrum of EYS within the Indian population and demonstrate that WES by NGS is a powerful tool for the genetic diagnosis of RP.

Whole-exome sequencing and data analysis. DNA samples from the probands of the arRP families and from 100 sporadic RP patients were subjected to WES at Axeq Technology Inc., Seoul, Korea. In brief, the WES workflow was as follows. First, the genomic DNA samples were fragmented into 150-200 bp fragments and then ligated to paired-end adaptors. Exome enrichment was performed according to the manufacturer's protocol with the Agilent SureSelect Human All Exon 50 MB kit V5 (Santa Clara, California, U.S.A.), which covers 20,965 genes and 334,378 exons in the Consensus Coding Sequence Region database. The captured libraries were subjected to a quality assessment with an Axeq 2100 Bioanalyzer and sequenced on an Illumina HiSeq 2000 sequencer.
The raw image files were processed by Illumina base calling software v.1.7 for base calling with default parameters, and the sequences of each individual were generated as 90-bp paired-end reads. High-quality sequencing reads of each sample were aligned to the reference human genome hg19 UCSC assembly (http://genome.ucsc.edu/) using the Burrows-Wheeler Aligner (BWA) program v.0.5.9-r16 (http://bio-bwa.sourceforge.net/) 13 . The SNPs and indels were detected by SAMTOOLS v.0.1.19 (http://samtools.sourceforge.net/) using the 'mpileup' command. Variants with a depth of less than 10 and variants located outside of the exome-capture regions were filtered out. For all of the samples, we used individual base-calling algorithms.

[...] supplementary Table S1. PCR amplification was performed, and the products were purified using the PCR purification kit from Qiagen following the manufacturer's instructions. The purified PCR products were then sequenced on an ABI3730 genetic analyser. The sequencing results were compared with the EYS gene reference sequence (NC_000006.12) to confirm the candidate nucleotide variants.
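The variant filter described in the data-analysis step (discard calls with depth below 10 or outside the exome-capture regions) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the `Variant` record, the function names and the example coordinates are all hypothetical, and a real workflow would read the variant calls and the capture-kit target intervals from files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a real VCF parser)."""
    chrom: str
    pos: int    # 1-based genomic position
    ref: str
    alt: str
    depth: int  # read depth at the site


# A capture region is (chrom, start, end), 1-based and inclusive (assumed convention).
Region = Tuple[str, int, int]


def in_capture(variant: Variant, regions: List[Region]) -> bool:
    """True if the variant lies inside at least one capture region."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in regions
    )


def filter_variants(variants: List[Variant], regions: List[Region],
                    min_depth: int = 10) -> List[Variant]:
    """Keep variants with depth >= min_depth that fall inside a capture region."""
    return [v for v in variants
            if v.depth >= min_depth and in_capture(v, regions)]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    regions = [("chr6", 100, 200)]
    calls = [
        Variant("chr6", 150, "G", "A", 25),  # kept
        Variant("chr6", 150, "G", "A", 9),   # dropped: depth below 10
        Variant("chr6", 500, "C", "T", 40),  # dropped: outside capture region
    ]
    for v in filter_variants(calls, regions):
        print(v)
```

The linear scan over regions keeps the sketch short; with hundreds of thousands of target exons, an interval index or sorted-interval search would be the more practical choice.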

Results
Clinical features of Indian families with arRP and sporadic RP. The detailed clinical data for two arRP families and eight sporadic RP patients are shown in Table 1. [...] Fig. 6A,B), and six novel likely pathogenic mutations in six patients (Table 2, Fig. 7). These mutations included two stop-gain mutations, one splicing mutation, six missense mutations, and one frameshift deletion (Table 2). These mutations either affect predicted functional regions of EYS, such as the Laminin G (LamG), EGF, EGF-like calcium-binding (EGF-CA), and EGF-like domains (Fig. 6A), or affect evolutionarily conserved amino acid residues (Fig. 6B).
Additional disease-causing mutations of other known RP genes or previously identified mutations of EYS were not found in any of these patients. According to the Exome Aggregation Consortium database (http://exac.broadinstitute.org/) and the results from our 1,000 controls, these mutations were not detected in the homozygous or compound heterozygous state in the normal control population.

[...] 14 and SMART were used to predict the effect of the identified amino acid substitutions on EYS protein function. Most mutations are located in the LamG or EGF/EGF-like/EGF-CA functional domains of the protein and may therefore affect its function (Table 2, Fig. 8A).

Discussion
EYS spans 2.0 Mb of genomic DNA and encodes a protein of 3,165 amino acids, and it is considered one of the largest genes expressed in the human eye. EYS is a multi-domain protein that starts with a signal peptide of 21 amino acids. It contains 28 epidermal growth factor (EGF) or EGF-like domains at the N-terminus and five LamG domains at the C-terminus and is highly expressed in retinal photoreceptors 9 . Previous studies have shown that the Drosophila homologue of EYS, Spacemaker (spam), is involved in luminal space formation and plays an essential role in the formation of the matrix-filled interrhabdomeral space 15-17 . However, the biological function of EYS in the human retina remains unknown. Further investigation of EYS function in the retina is essential for a deeper understanding of the pathological mechanism of retinal degeneration.
EYS is a major causative gene for RP. Thus far, the mutations identified in EYS include p.D904Qfs*17, p.S754Afs*6, p.T657Afs*5, p.W2640*, p.E1836* and others 5,9,18 . Most of these reported mutations result in a truncated EYS protein, which suggests that the C-terminus of EYS is essential for its function in the retina. In this study, we found a pair of compound heterozygous mutations, c.8422G>A (p.A2808T) and c.7868G>A (p.G2623E), in EYS in family ARRP-49. Both mutations affect a conserved amino acid residue. The mutation p.A2808T in exon [...] 18 . This French RP patient had another heterozygous mutation, c.3329C>G 18 . The novel mutation p.G2623E in exon 40 identified in our study lies in a region of disulphide bond modification (http://www.uniprot.org/uniprot/Q5T1H1), which may affect the formation of disulphide bonds, thereby impairing protein function. These two mutations together may account for the incidence of RP in family ARRP-49.

The novel homozygous mutation c.1871G>A (p.S624L) in EYS is likely the causative mutation for family ARRP-206. This missense mutation leads to the replacement of a small polar residue, serine, by a hydrophobic leucine. However, the affected residue is not highly conserved evolutionarily, and further studies are warranted to ascertain the impact of this mutation on EYS function.

The majority of RP cases show the sporadic form 19,20 , and the inheritance pattern of this form is difficult to ascertain. In our study, we found two pairs of putative compound heterozygous mutations in the sporadic RP patients RP:S-10 and RP:S-18. In patient RP:S-10, the compound heterozygous mutations c.C4060G (p.Q1536E) and c.A5038G (p.N1680D) were found, and these mutations are located in exon 26 and the unknown region of the protein, respectively. In patient RP:S-18, the compound heterozygous mutations c.G1418T (p.G473V) and c.C2971T (p.L991F) were found, and these mutations are located in the unknown region and the EGF domain of the protein, respectively.
However, this study has several limitations. Because blood samples from the relatives of the sporadic RP patients (parents, siblings or offspring) were difficult to collect, we could not assess the genotypes of these family members. Therefore, in these two sporadic patients, we could not determine whether the two variants lie on distinct alleles (in trans) or on the same allele (in cis).
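The phase ambiguity described above can be made concrete with a small Python sketch of how parental genotypes would resolve it. This is an illustration only, not an analysis performed in the study: the function name and the parental carrier sets are hypothetical, and the logic assumes that both variants are inherited (no de novo mutation) and that each parent's carrier status is known without error.

```python
from typing import Optional, Set


def infer_phase(var_a: str, var_b: str,
                mother: Optional[Set[str]],
                father: Optional[Set[str]]) -> str:
    """Classify two heterozygous proband variants as 'trans', 'cis' or 'unresolved'.

    mother / father are the sets of variants each parent carries,
    or None when that parent was not genotyped.
    """
    if mother is None or father is None:
        # Without both parents the phase cannot be inferred this way.
        return "unresolved"

    a_m, b_m = var_a in mother, var_b in mother
    a_f, b_f = var_a in father, var_b in father

    # One variant from each parent: distinct alleles (true compound heterozygote).
    if (a_m and b_f and not a_f and not b_m) or \
       (a_f and b_m and not a_m and not b_f):
        return "trans"

    # Both variants from the same parent and none from the other: same allele.
    if (a_m and b_m and not a_f and not b_f) or \
       (a_f and b_f and not a_m and not b_m):
        return "cis"

    return "unresolved"


if __name__ == "__main__":
    a, b = "p.G473V", "p.L991F"  # variant labels from patient RP:S-18
    # Parents not sampled, as in this study:
    print(infer_phase(a, b, None, None))
    # Hypothetical parental genotypes, for illustration only:
    print(infer_phase(a, b, {a}, {b}))
    print(infer_phase(a, b, {a, b}, set()))
```

Only the "trans" outcome is consistent with a recessive compound heterozygous cause, which is why the missing parental samples limit the interpretation of these two patients.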
In sporadic patients RP:S-22 and RP:S-48, we found two novel homozygous stop-gain mutations, c.8388C>A (p.Y2796X) and c.3024C>A (p.C1008X), respectively, and these mutations are predicted to truncate the protein in the fourth LamG domain at the C-terminus and the fifth EGF-CA domain at the N-terminus, respectively. The truncated proteins likely impair the function of EYS. These nonsense mutations could also cause the degradation of EYS mRNA via nonsense-mediated mRNA decay (NMD) 21 . We also found a novel frameshift deletion mutation in exon 43 and a splicing-site mutation in exon 15 in patients RP:S-2 and RP:S-40, respectively. Both mutations are expected to lead to abnormal proteins. In sporadic patients RP:S-14 and RP:S-34, we found missense mutations in the LamG and EGF domains, respectively, and the SIFT/PROVEAN predictions indicate that these mutations may be harmful to the protein. Iwanami et al. demonstrated that mutation types were related to the severity of the RP symptoms 22 . In our study, the correlation between genotype and phenotype is not readily apparent because of the small number of subjects. More studies of the correlation between genotype and phenotype in RP would give a better understanding of the diagnosis and prognosis of RP.
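As a consistency check on the two stop-gain mutations above, the following Python sketch maps a coding-sequence position to its codon number and position within the codon. It assumes the usual convention that c.1 is the A of the ATG start codon; the function names are hypothetical. The reference codons are not given in the text and are inferred here from the standard genetic code: tyrosine is encoded by TAT or TAC and cysteine by TGT or TGC, so a C>A change at the third codon position implies TAC and TGC, respectively.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_of(cdna_pos: int) -> Tuple[int, int]:
    """Return (codon_number, position_in_codon), both 1-based, for a coding position."""
    index = cdna_pos - 1
    return index // 3 + 1, index % 3 + 1


def substitute(codon: str, pos_in_codon: int, alt: str) -> str:
    """Return the codon with the base at the 1-based position replaced by alt."""
    return codon[:pos_in_codon - 1] + alt + codon[pos_in_codon:]


if __name__ == "__main__":
    # (cDNA position, inferred reference codon, alternate base)
    for cdna_pos, ref_codon, alt in [(8388, "TAC", "A"), (3024, "TGC", "A")]:
        number, offset = codon_of(cdna_pos)
        mutant = substitute(ref_codon, offset, alt)
        print(cdna_pos, number, offset, ref_codon, mutant, mutant in STOP_CODONS)
```

Under these assumptions, c.8388 falls on the third base of codon 2796 and c.3024 on the third base of codon 1008, and both substitutions yield stop codons (TAA and TGA), in agreement with p.Y2796X and p.C1008X.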
In summary, we identified three novel pairs of compound heterozygous mutations and seven novel homozygous mutations in EYS in Indian RP patients. Our study not only expands the spectrum of EYS mutations for arRP in the Indian population but also shows that WES can be an effective tool for identifying causative mutations in RP patients and diagnosing genetic diseases.