NGS-based Molecular diagnosis of 105 eyeGENE® probands with Retinitis Pigmentosa

The National Ophthalmic Disease Genotyping and Phenotyping Network (eyeGENE®) was established to facilitate basic and clinical research on human inherited eye disease. To provide eyeGENE®'s enrolled patients with high-quality genetic testing, which can aid clinical diagnosis and disease treatment, we carried out a pilot study and performed next-generation sequencing (NGS) based molecular diagnosis for 105 retinitis pigmentosa (RP) patients randomly selected from the network. A custom capture panel was designed that incorporated 195 known retinal disease genes, including 61 known RP genes. Disease-causing mutations were identified in 52 of the 105 probands (a solving rate of 49.5%). A total of 82 mutations were identified, 48 of which were novel. Interestingly, for three probands the molecular diagnosis was inconsistent with the initial clinical diagnosis, while for five probands the molecular information suggested an inheritance model different from that assigned by the physician. In conclusion, our study demonstrated that NGS target sequencing is efficient and sufficiently precise for molecular diagnosis of a highly heterogeneous patient cohort from eyeGENE®.

an important step towards better clinical diagnosis, prognosis, and identifying patients who may benefit from therapeutic interventions such as gene therapy 10,12,13 .
The National Ophthalmic Disease Genotyping and Phenotyping Network (eyeGENE®) is a multicenter genomic medicine initiative started by the National Eye Institute (NEI) at the National Institutes of Health (NIH) in 2006. eyeGENE® aims to promote studies of inherited eye diseases and their genetic causes. The program includes a CLIA-level DNA repository, a database linking genotype and phenotype data, and a patient registry. eyeGENE® not only expands patients' access to diagnostic testing, but also allows registered researchers to access the research database and samples for further studies such as genotype-phenotype correlations, prevalence of disease-causing mutations, and novel disease gene discovery 14,15. Additionally, eyeGENE® is able to contact specific patient populations from the registry for recruitment into additional clinical studies.
RP patients form the largest population in eyeGENE®, with over 2,000 patients in total, of which about 70% are simplex cases. Due to the high cost and complexity of molecular diagnosis of RP, the vast majority of the simplex RP cases and some of the multiplex cases had not been tested, representing one of the biggest challenges for fulfilling eyeGENE®'s mission. To address this issue, we conducted a pilot study and performed NGS-based mutation screening of 105 RP probands from eyeGENE® whose molecular diagnosis remained unknown. This cohort was tested using a custom-designed 195-gene panel, which included 61 known RP causative genes and 19 genes that cause syndromic RP such as Usher and Bardet-Biedl syndrome. Through NGS-based sequencing analysis we assigned causative mutations to 52 patients, achieving a solving rate of 49.5%. While 49 patients carried mutations in known RP genes, 3 patients were found to carry mutations in retinal disease genes that have not previously been associated with nonsyndromic RP. In addition, 5 of the 6 solved RP families initially labeled as autosomal dominant were found to carry compound heterozygous or homozygous mutations in known RP disease genes. Taken together, our results indicate that an NGS-based approach is effective in providing a diagnosis for the highly heterogeneous patient collection at eyeGENE®.

Materials and Methods
Clinical identification of RP patients. Patients with inherited eye disease were enrolled in the eyeGENE® program (protocol #06-EI-0236) by approved, certified eye care specialists. Clinical details and family history were provided by referring clinicians and entered into the eyeGENE® database (https://nationaleyegene.nei.nih.gov/eyeGENE). Clinical information and family history were further reviewed by members of the eyeGENE® Working Group to corroborate the patient's diagnosis of RP. For this study, 105 unrelated RP probands were randomly selected from the eyeGENE® database. Informed consent was obtained from tested individuals or from parents or guardians for individuals under age 18. All experimental protocols were approved by the Institutional Review Board of Baylor College of Medicine. This study adhered to the Declaration of Helsinki.
DNA extraction, library preparation and capture sequencing. For each patient enrolled in eyeGENE®, a blood sample was collected and shipped to the eyeGENE® Coordinating Center CLIA (Clinical Laboratory Improvement Amendments) laboratory on the NIH campus in Bethesda, MD. Genomic DNA was extracted from whole blood either manually or automatically using Gentra Puregene (Qiagen). DNA concentration was measured with a NanoDrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE) and samples were stored indefinitely in the eyeGENE® Biobank at NEI. A fraction of de-identified DNA was sent to Baylor College of Medicine for diagnostic research testing. Genomic DNA from each sample was mechanically sheared, end repaired, and ligated to molecularly bar-coded adaptors to generate sequencing libraries following the manufacturer's standard protocol (Illumina). Co-capture was performed on pooled DNA libraries in groups of up to 48 samples. Captured sample DNA was sequenced on an Illumina HiSeq 2000 according to the standard operating protocol.

Capture panel design.
A capture panel enriching for retinal disease genes was developed and assessed as described previously 16. The panel covers the coding exons and flanking splicing junctions of 195 retinal disease genes known at the time of design (Supplementary Table S1). A total of 61 nonsyndromic RP-associated genes were included in the panel: 18 adRP genes, 33 arRP genes, 3 X-linked RP genes, and 7 RP genes that can be either dominant or recessive (Supplementary Table S1).
Bioinformatics analysis. An automated pipeline described previously was used to process the sequencing data, including read mapping, recalibration, realignment, variant calling, variant filtering and annotation 17. Since RP is a rare Mendelian disease, recessive variants with an allele frequency > 0.5% or dominant variants with an allele frequency > 0.1% in the following databases were filtered out: the 1000 Genomes database 18, dbSNP135 (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/SNP/), the NHLBI Exome Sequencing database (http://evs.gs.washington.edu/EVS/), the NIEHS Exome Sequencing database (http://evs.gs.washington.edu/niehsExome/), as well as an internal control database of 997 exomes. The pathogenicity of the remaining rare variants was assessed based on three criteria. First, variants reported in the Human Gene Mutation Database (HGMD) 19 or the primary literature were identified. Second, variants that lead to severe loss-of-function mutations such as stopgain, stoploss, frameshift and splicing defects were identified. Third, missense variants that result in protein coding changes were evaluated with the in silico prediction resource dbNSFP 20, and only those predicted to be deleterious by 3 of the 6 algorithms (SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor, and PhyloP) were considered candidates.
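The frequency and pathogenicity filters described above can be sketched roughly as follows. This is a minimal illustration and not the pipeline of reference 17: the `Variant` record, its field names, and the consequence labels are hypothetical, the highest frequency across databases is assumed to be the value compared with the threshold, and "3 of the 6 algorithms" is read as "at least 3".

```python
from dataclasses import dataclass, field

# Thresholds from the text: recessive variants above 0.5% and dominant
# variants above 0.1% allele frequency are filtered out.
MAX_AF = {"recessive": 0.005, "dominant": 0.001}

# Hypothetical labels for the loss-of-function classes named in the text.
LOF = {"stopgain", "stoploss", "frameshift", "splicing"}

ALGORITHMS = {"SIFT", "Polyphen2", "LRT", "MutationTaster",
              "MutationAssessor", "PhyloP"}


@dataclass
class Variant:
    consequence: str                                      # e.g. "missense"
    model: str                                            # "recessive" or "dominant"
    allele_freqs: dict = field(default_factory=dict)      # database -> frequency
    reported: bool = False                                # in HGMD or the literature
    deleterious_calls: set = field(default_factory=set)   # algorithms calling it damaging


def is_rare(v: Variant) -> bool:
    """True if no database frequency exceeds the threshold for the model."""
    highest = max(v.allele_freqs.values(), default=0.0)
    return highest <= MAX_AF[v.model]


def is_candidate(v: Variant, min_votes: int = 3) -> bool:
    """Apply the three pathogenicity criteria to a variant that passed the frequency filter."""
    if not is_rare(v):
        return False
    if v.reported or v.consequence in LOF:
        return True
    if v.consequence == "missense":
        # Count only predictions from the six listed algorithms.
        return len(v.deleterious_calls & ALGORITHMS) >= min_votes
    return False
```

For instance, a missense variant at 0.4% frequency would pass the frequency filter under a recessive model but not under a dominant one, which is why the inheritance model is carried on the record in this sketch.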
Validation and Report. All putative causative mutations identified were validated by Sanger sequencing. A 500-bp flanking sequence on each side of the mutation was obtained from the UCSC genome browser. RepeatMasker was used to mask repetitive regions. Primer3 was used to design a pair of primers at least 50 bp upstream and downstream of the mutation. After PCR amplification, the amplicons were sequenced on an ABI 3730xl or 3500XL Genetic Analyzer. Reports of the high-confidence genetic testing results were sent back to eyeGENE®, and positive results were confirmed through direct sequencing by the CLIA-certified laboratories in the eyeGENE® Network. Confirmed results were then shared with the referring clinician.
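The coordinate bookkeeping behind the primer design step can be illustrated with a short sketch. It only computes intervals from the numbers given in the text (500-bp flanks, primers at least 50 bp from the mutation) and does not call Primer3 or RepeatMasker; the function name, the 1-based inclusive coordinate convention, and the optional `chrom_length` argument are assumptions made for illustration.

```python
def primer_design_regions(pos, flank=500, min_gap=50, chrom_length=None):
    """Return the template interval and the two zones where primers may be placed.

    Coordinates are 1-based and inclusive; `pos` is the mutated base.
    """
    start = max(1, pos - flank)
    end = pos + flank
    if chrom_length is not None:
        end = min(end, chrom_length)

    # Every primer base must lie at least `min_gap` bp away from the mutation.
    upstream_zone = (start, pos - min_gap)
    downstream_zone = (pos + min_gap, end)

    if upstream_zone[0] > upstream_zone[1] or downstream_zone[0] > downstream_zone[1]:
        raise ValueError("not enough sequence on one side of the mutation")
    return (start, end), upstream_zone, downstream_zone


template, upstream, downstream = primer_design_regions(1000)
print(template, upstream, downstream)
```

For a mutation at position 1000 this gives a template of 500 to 1500, with the forward primer confined to 500 to 950 and the reverse primer to 1050 to 1500.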

Results
The RP patient cohort. A total of 105 unrelated RP probands whose mutations remained unknown were randomly selected from the eyeGENE® database. Based on inheritance information documented in the database, most of the cases were simplex or unknown (67%), followed by autosomal recessive (20%) and autosomal dominant (13%). There was also one case of syndromic RP with hearing loss (Fig. 1a). The majority of the probands were Caucasian (65%), followed by unknown descent (19%), Asian (9%), African American (4%), and multiple races (2%; Table S2).

Identification of pathogenic mutations.
To identify the pathogenic mutations in the 105 RP patients, we performed NGS-based panel sequencing covering all coding exons and flanking splicing junctions of 195 known retinal disease genes. Ten- and twenty-fold average coverage was achieved for 97% and 95% of the targeted regions, respectively (Fig. 1b). Sequencing results were analyzed using the bioinformatics pipeline described in the Methods section. Known mutations were identified by searching public databases such as HGMD 19, while novel variants were annotated for their impact on protein coding. As a result, putative mutations were found in 52 cases, a solving rate of 49.5% (Table 1). The mutations are distributed across 21 retinal disease genes, with USH2A the most frequently mutated gene, accounting for 11 solved cases (22%). In addition, mutations were found in EYS (10%), CRB1 (10%), PDE6B (10%), RDH12 (8%), RPGR (6%), RHO (6%), CNGB1 (4%), and MERTK (4%), with the remaining cases spread over 12 genes at one case per gene (Fig. 2a). A total of 82 mutant alleles were identified in our study, all of which were confirmed by the CLIA-certified laboratories in the eyeGENE® Network. Among them, missense mutations account for 58%, followed by frameshift (18%), nonsense (17%), and splicing (7%) mutations (Fig. 2b). Interestingly, close to 60% of the mutant alleles have not been previously reported (48/82).

Identification of pathogenic mutations in dominant and recessive cases. As shown in Fig. 1 [...] (Table 1). These patients had been assigned a dominant inheritance model based on the initial diagnosis provided by the physician. Our molecular diagnosis results did not support this model except for patient 5A2+ H.62, so we contacted the clinicians for further information. Investigation of the pedigree information indicated that some of the families were indeed likely to have been misclassified (Fig. 3). A total of 21 patients were labeled as having recessive inheritance, and 9 of them were solved (Table 1). Consistent with the diagnosis of arRP, all genes found were known arRP genes, including CRB1, PDE6B, RDH12, RPE65, and USH2A. Of the solved arRP cases, one interesting case was the homozygous c.295C>A, p.(Ile2995Asn) missense mutation in RDH12 found in proband 3JY+ V.17, which is also the causal mutation for proband 3WP+ 3.68 (Table 1). This missense mutation leads to LCA 21 when combined with a nonsense mutation, while severe RP 22 is observed when it is combined with a second missense allele. Consistent with the idea that c.295C>A, p.(Ile2995Asn) is a hypomorphic allele, probands 3WP+ 3.68 and 3JY+ V.17 showed an RP phenotype at 11 and 3 years of age, respectively. Both cases therefore had a relatively early age of onset compared to typical RP patients and should be classified as juvenile RP. This is also consistent with a recent study showing that RDH12 was the most frequently mutated gene in the juvenile RP group of a large Spanish cohort 8.
Identification of pathogenic mutations in simplex/unknown cases. In this study, 70 (67%) of the RP cases were simplex or unknown, for which molecular diagnosis is most valuable. In the 70 simplex cases, causative mutations were identified in 37 samples. Specifically, we identified mutations in autosomal dominant retinal disease genes for 5 cases, in X-linked disease genes for 4 cases, and in recessive disease genes for 28 cases.
Dominant mutations in the RP genes PROM1, PRPH2, RHO, and CRX were found in 5 probands (Table 1). While one novel nonsense mutation in CRX was found in patient 3XM+ J.87, four previously reported mutations were found in PRPH2, PROM1, and RHO. For example, the p.(Arg172Trp) mutation in PRPH2 was assigned as the causative mutation for proband 8J+ Y.4. In a previous study, the p.(Arg172Trp) mutation in PRPH2 was found to segregate in two independent families whose affected members showed blurred central vision and photophobia, with no complaint of night blindness or restricted peripheral vision 23. A closer investigation of the clinical exam results for proband 8J+ Y.4 showed that this patient had both macular and peripheral retinal degeneration and that he experienced visual acuity loss (at 24 years) before night blindness (at 46 years). This is similar to the phenotype described in the previous study, supporting p.(Arg172Trp) in PRPH2 as the likely causative mutation. As another example, proband N6+ A.15 was assigned the p.(Arg373Cys) mutation in PROM1. This mutation has been reported in a four-generation Italian family with autosomal dominant RP, in which affected members first showed reduced central vision, with night blindness progressing over time 24. In proband N6+ A.15, however, night blindness and visual acuity loss occurred at the same time (at age 30 years). It is possible that genetic background or environmental factors influence the onset of night blindness in patients with the p.(Arg373Cys) mutation in PROM1. Finally, the p.(Ala164Val) and p.(Pro171Leu) mutations in RHO were found in patients S7+ G.76 and 5VY+ V.14, respectively. Both have been previously reported to be causal mutations that segregate in dominant RP families 25,26, and both have been shown by biochemical studies to affect folding of the rhodopsin protein 27.
X-linked RP is estimated to account for 10% to 20% of all RP, and affected males typically show an early age of onset and a rapid course of vision loss. RPGR and RP2, the genes most often associated with X-linked RP, explain more than 15% of isolated male RP cases 28. In this study, we identified hemizygous mutations in RPGR in 3 probands and in RP2 in 1 proband (Table 1). Two of the three RPGR mutations, p.(Glu749*) and p.(Glu1014Glyfs*64), were novel and located in the mutation hot spot RPGR ORF15. All three mutations are likely to be loss-of-function mutations that result either in truncated proteins or in no protein through nonsense-mediated decay. The p.(Leu240Tyrfs*14) mutation identified in RP2 was also novel and is predicted to produce a prematurely truncated protein.
For 28 of the simplex/unknown cases, mutations were identified in 10 arRP genes (Table 1). As expected, the most frequently mutated genes were USH2A and EYS, accounting for 8 and 4 cases, respectively. Additionally, for three probands we found deleterious mutations in retinal disease genes other than those associated with RP (Table 1). For example, the compound heterozygous mutations c.4349A>G, p.(Lys1450Arg) and c.2285G>A, p.(Arg762His), which are novel and predicted to be damaging, were found in the Usher gene GPR98 in patient 5ES+ 3.87. Although the age at diagnosis of hearing loss in type II Usher patients can be variable, it is generally during childhood, with a median age of 5 years 29. However, patient 5ES+ 3.87 did not show any hearing loss at the time of the clinical visit, when he was 39 years old. It is thus unlikely that patient 5ES+ 3.87 is a typical type II Usher patient (Table 2). In another proband, UFC+ 7.74, the novel missense mutations p.(Val243Phe) and p.(Ser747Leu), both predicted to be damaging, were found in GRM6, a gene for the complete type of congenital stationary night blindness (CSNB). Electroretinogram (ERG) responses for this patient were not recordable in either eye under scotopic and photopic conditions (Table 2 and Supplementary Fig. S2). Finally, the compound heterozygous mutations p.(Phe1950Leufs*15) and p.(Glu1803Asp) in CEP290 were found in proband 5WY+ Y.91. The first allele, p.(Phe1950Leufs*15), has been previously reported in two LCA families in compound heterozygous state with either a splicing mutation or a non-frameshift mutation, and the second allele, p.(Glu1803Asp), was novel 30. A closer investigation of the clinical information of this patient showed that she first experienced night blindness at age 18 and vision loss at age 22. Also, her best corrected visual acuity was 20/20 in both eyes at the time of the clinical visit, when she was 39 years of age.

Discussion
One of the biggest challenges for fulfilling eyeGENE®'s mission is that most of its enrolled patients were not offered a clear plan of genetic testing that would be both cost-efficient and likely to identify the associated genetic cause. The majority of these patients (70%) have RP, which is highly heterogeneous at multiple levels. First, RP is both genetically and clinically heterogeneous, with multiple genes contributing to the disease and phenotypes varying greatly among patients. Second, the inheritance pattern is heterogeneous and not always clear from the pedigree information alone. Finally, because eyeGENE® is a national network, its patients were recruited and examined by many physician groups across the country, so the clinical tests performed and the diagnostic criteria are not always the same. The information available for each proband also varies significantly. As a result, molecular diagnosis of this highly heterogeneous collection is challenging. Here, we performed a pilot study using NGS-based panel sequencing for molecular diagnosis of eyeGENE® RP patients and achieved a yield of solved cases (~50%) similar to that of previous studies that adopted similar approaches 10,11. In addition, of the 82 mutations identified, 48 (~60%) were novel, which is also comparable to previous studies 10,11.
One interesting finding of this study is the inconsistency between the assigned inheritance patterns and the genetic test results for five of the six solved adRP cases (Table 1 and Fig. 3). While two of the families were misclassified (RC+ V.27 and VGM+ V.35), the other three families are inconclusive for dominant inheritance, as members of fewer than three generations were affected. Indeed, assigning inheritance patterns based solely on pedigree information can be prone to error. For example, 8.5% of the families thought to have adRP truly have X-linked RP 31. The NGS approach helps to clarify this issue, since all variants, regardless of inheritance, are considered simultaneously.
Our study also showed that NGS-based molecular diagnosis can potentially reveal novel genotype-phenotype associations. For example, in three cases, we identified potential new associations between mutations in the retinal disease genes GPR98, CEP290 and GRM6 and an RP phenotype. Although the documented clinical information for these three patients supported an RP phenotype, it is possible that these patients had Usher syndrome, LCA, or CSNB and were misdiagnosed as general RP. Segregation tests as well as refinement of the clinical diagnosis will be required to confirm the genetic testing results. Nevertheless, these findings are particularly important for the families, especially for family members at risk. With the identification of many more mutations causing inherited retinal diseases and their associated phenotypes clearly documented in the eyeGENE® database, clinicians and counselors will feel more confident in providing guidance to affected families.
In conclusion, our study showed that an NGS-based approach is robust and effective in providing precise molecular diagnosis for the highly heterogeneous collection of RP patients from eyeGENE®. The results from this study are essential for fulfilling the goals of eyeGENE® to advance vision research and to contribute to the shared resources of the research community. First, the novel mutations identified in these RP patients will be documented in the database and accessible to other research groups to continue the research cycle, which will provide valuable information for future genotype-phenotype correlation studies. Second, patient samples without assigned mutations represent a valuable resource for novel RP gene discovery. In fact, novel RP genes have been identified from these samples and have led to publications 32. Last but not least, genetic testing results will provide registered eyeGENE® patients with the information and opportunity to participate in gene-specific clinical trials.