Panel-Based Population Next-Generation Sequencing for Inherited Retinal Degenerations

Inherited retinopathies affect approximately two and a half million people globally, yet the majority of affected patients lack clear genetic diagnoses given the diverse range of genes and mutations implicated in these conditions. We present results from a next-generation sequencing study of a large inherited retinal disease patient population, with the goal of providing clear and actionable genetic diagnoses. Targeted sequencing was performed on 539 individuals from 309 inherited retinal disease pedigrees. Causative mutations were identified in the majority (57%, 176/309) of pedigrees. We report the association of many previously unreported variants with retinal disease, as well as new disease phenotypes associated with known genes, including the first association of the SLC24A1 gene with retinitis pigmentosa. Population statistics reporting the genes most commonly implicated in retinal disease in the cohort are presented, as are some diagnostic conundrums that can arise during such studies. Inherited retinal diseases represent an exemplar group of disorders for the application of panel-based next-generation sequencing as an effective tool for detection of causative mutations.

Stargardt Disease. The most commonly implicated gene in Stargardt disease was ABCA4, causing recessive Stargardt disease or Fundus Flavimaculatus in 32 pedigrees. Although Stargardt disease is not the most common condition in the Irish IRD patient cohort, it is very genetically homogeneous: 73% of sequenced Irish Stargardt pedigrees showed causative mutations in ABCA4, with many of the remainder having one identifiable ABCA4 mutation but not two. ABCA4 mutations are the primary cause of Stargardt disease and autosomal-recessive cone-rod dystrophy, in addition to causing a small number of cases of RP 9 ; ABCA4 was therefore the single most commonly implicated gene across all conditions in this study. This cohort included one pedigree with a rare case of dominant Stargardt-like disease caused by a mutation in the PROM1 gene (NM_006017.2:c.1117C > T,p.Arg373Cys), which has previously been identified as causative of dominant maculopathy 10 . In addition, mutations in BEST1 were identified in one pedigree in the cohort, leading to a rediagnosis of Best Vitelliform Macular Dystrophy.

Leber Congenital Amaurosis (LCA) and Early-Onset Severe Retinal Degeneration (EOSRD).
The cohort of IRD patients with LCA/EOSRD was relatively small, comprising 18 patients across 11 pedigrees. The mutation NM_001122769.2:c.1756A > T,p.Lys586* in the LCA5 gene, which encodes the Lebercillin protein, segregated with the disease in a recessive LCA pedigree with two affected members. This mutation has not previously been reported as disease causing, although the variant was reported by the Exome Aggregation Consortium (ExAC) 11 at an allele frequency of 1/121,012. No homozygotes have previously been reported. As a premature stop mutation in a gene implicated in LCA and showing segregation with disease, it is highly likely to be the causative mutation. A homozygous mutation (NM_000554.4:c.206G > A,p.Arg69His) in the CRX gene was observed in a simplex pedigree with no other candidate mutations. The Arg69His substitution has not been implicated in disease before, but has been observed in ExAC at extremely low allele frequency (2/121,402), with no homozygotes reported. The mutation is predicted by the ensemble model to cause disease and is located in the crucial homeodomain of CRX, a domain previously reported to be sensitive to missense mutations 12 . Sequencing of the patient's mother, who retains normal visual function, confirmed the heterozygous presence of the mutation, indicating that the mode of inheritance is indeed recessive.
Table 1. List of novel disease-associated mutations identified in this study. Amino acid substitutions were subject to stringent evaluation before inclusion in the list and were only included if segregation with disease could be confirmed in at least three family members, including at least two affected individuals, and bioinformatic methods predicted the mutation to be damaging to protein function.
In one interesting simplex case, compound heterozygous frameshifts in the NRL gene, (NM_006177.3:c.16delA,p.Ser6fs) and (NM_006177.3:c.386delC,p.Ala129fs), were found in a patient with atypical LCA marked by unusually good retention of visual acuity despite the presence of typical LCA features such as congenital nystagmus, extinguished ERG and onset of visual symptoms at a very young age. Dominant-acting NRL mutations have been reported in the past 13 , and so molecular cloning of the entire region was used to verify that the two mutations were on separate chromosomes and did not merely represent a single haplotype. Mutations in NRL have also been implicated previously in recessive inherited retinal disease 14 ; however, we have been unable to find another reported case of a patient with two frameshift or nonsense mutations in NRL. As such, this is a phenotype of interest, as the first biallelic NRL-null case reported in humans.
At 46 years the proband retained best-corrected Snellen visual acuity of 6/60 and 6/15 in the right and left eyes respectively. Visual fields were concentrically constricted to within 10° of fixation to the Goldmann IV4e target. No convincing rod or cone full-field ERG responses were recordable, although delayed and reduced amplitude cone responses were recordable 10 years previously. Fundoscopy at 46 years of age revealed extensive retinal and choroidal atrophy peripherally, with better preservation at the posterior pole in each eye. Scattered clumped pigment deposits were observed, consistent with existing observations of patients with recessive NRL-based disease 14 .
Retinitis Pigmentosa (RP). The largest disease cohort comprised patients with RP, accounting for 37.9% of the total patient cohort. The most commonly implicated genes for RP varied based on inheritance: for dominant RP, the most commonly implicated genes were RHO (8.6%), RP1 (8.6%) and PRPH2 (3.8%). The RHO figure is likely an underestimate of the true prevalence of RHO mutations in autosomal dominant RP (adRP), as pre-NGS sequencing studies focusing solely on exons of RHO had already been performed in the Irish IRD population 15,16 . A number of patients with RHO mutations had therefore already been identified, and this subpopulation of Irish IRD patients did not participate in the current study.
The primary contributor to the high prevalence of RP1 mutations in dominant RP was a frameshift mutation in RP1 (NM_006269.1:c.2285_2289delTAAAT,p.Leu762fs) that was observed to segregate with disease in five dominant RP pedigrees, including one large pedigree with four affected members and several unaffected members, and was never observed in pedigrees with other conditions. Although these patients presented to the clinic as unrelated probands and were not geographically clustered, it is likely that they share a common ancestor and form a contiguous super-pedigree. This RP1 mutation has not been previously reported in dbSNP or ExAC, nor was it observed in a recent study of RP patients in Northern Ireland 17 , but it is the most frequent single mutation causing RP in the current study.
Similarly, the primary contributor to the high prevalence of PRPH2 mutations is the S212G (NM_000322.4:c.634A > G,p.Ser212Gly) mutation, which has previously been identified as causative of disease in the Irish population 18 . As with the RP1 p.Leu762fs mutation, patients carrying the S212G mutation likely share a common ancestor, although they presented as unrelated probands in the clinic.
In cases of X-linked RP, nearly half of all sequenced pedigrees (10/21) showed mutations in the RPGR gene, representing almost all of the XLRP pedigrees for which mutations could be identified in this study, with the exception of two pedigrees with RP2 mutations. The RPGR mutations included the previously-reported G436D (NM_001034853.1:c.1307G > A,p.Gly436Asp) mutation. However, most mutations in RPGR were premature stop and frameshift mutations. Several of these were novel mutations that have neither been reported previously nor implicated in disease (Table 1). There was also one novel amino acid substitution in the RPGR gene (p.Thr99Ala) that segregated with disease in a small pedigree. This region of the protein is highly conserved and quite sensitive to mutation 19 ; a different mutation in the same codon (p.Thr99Asn) 19 , as well as a mutation in the adjacent codon (p.His98Gln) 20 have been implicated in disease. Simulated folding of both the native and the novel p.Thr99Ala proteins using the I-TASSER suite 21 showed a disruption in the beta-propeller domain that contains residue Thr99, converting that domain from beta-sheet to coil conformation. Such a conformational change would be likely to severely affect protein function (Fig. 3).
For recessive and simplex forms of RP, the results obtained were more disparate. Notably, however, four pedigrees within the simplex/recessive RP cohort were found to carry the BBS1 mutation NM_024649.4:c.1169T > G,p.Met390Arg, which has been associated with Bardet-Biedl syndrome. In one of these cases, the patient was noted to have mild mental retardation consistent with Bardet-Biedl syndrome. However, the other three patients showed none of the extraretinal features of the disease. These cases therefore continued to be classified as nonsyndromic RP. This is consistent with reports from other groups, which have identified that BBS1 p.Met390Arg can cause either Bardet-Biedl syndrome or nonsyndromic RP in different pedigrees 22 .
Two cases in the recessive/simplex RP cohort were particularly noteworthy: a homozygous premature stop mutation in the GNAT1 gene (NM_000172.3:c.904C > T,p.Gln302*) in a case of simplex RP, which has been reported in an earlier paper 23 , and a homozygous frameshift in the SLC24A1 gene (NM_004727.2:c.2679delT,p.Asn893fs) in a pedigree with two affected individuals. In both situations, the mutation segregated with the retinopathy (although in the case of the GNAT1 mutation, there was only one patient who carried it homozygously and only this individual was affected in the pedigree). Both of these genes have been implicated in congenital stationary night-blindness (CSNB), but neither gene has previously been implicated in RP. Although both CSNB and RP affect rod photoreceptor cells, rod cells do not die in significant numbers in CSNB, with the result that the disease is largely stationary and does not affect central or colour vision, although dark vision is entirely lost 24 . This finding therefore represents a novel clinical phenotype caused by SLC24A1 mutations.
In both the GNAT1 and the SLC24A1 pedigrees, the mutations were recessive premature-stop mutations, causing loss of protein function. In both pedigrees onset of the disease was late for RP, with a mild, slowly-progressing course. Similar to the GNAT1 phenotype 23 , patients in the SLC24A1 pedigree presented with lifelong night-blindness, but with very distinctive, albeit mild, symptoms of RP including progressive mid-peripheral visual field loss, early appearance of cataracts and unmistakable pigmentary retinopathy on fundus photography (Fig. 4).
Scientific Reports | 6:33248 | DOI: 10.1038/srep33248
Notably, no other plausibly causative mutations were observed in either pedigree in the retinal genes included in the study. The data suggest that in both cases, a severe, homozygous mutation in a known CSNB disease gene has caused a mild, late-onset form of RP, and we therefore hypothesize that this may represent a general pattern in retinal disease. Additional observations of the phenotype resulting from severe mutations in CSNB-associated genes in other pedigrees will be required to confirm or refute this.
Usher Syndrome. A total of 32 pedigrees with Usher syndrome were sequenced during the study, with type II disease being the most common (23 pedigrees), followed by type I (5 pedigrees). Four other pedigrees had atypical or type III Usher syndrome. Consistent with previous work 25,26 , Usher Type I pedigrees were found to be predominantly (80%) caused by MYO7A mutations, while Usher Type II pedigrees were most commonly (74%) caused by mutations in USH2A. No causative mutations could be found in patients with atypical Usher syndrome, although two mutations were identified in the CLRN1 gene in a patient with type III Usher syndrome.
Choroideremia. Choroideremia is caused only by mutations in the CHM gene. As a result, detection was relatively uncomplicated and efficiencies were good. A total of 12 pedigrees with choroideremia were included in the study. Detection rates were the highest of all conditions included in this study; only two of the twelve pedigrees could not be solved. Three pedigrees in this cohort are particularly noteworthy. In one, the mutation NM_000390.2:c.1376C > G,p.Thr459Arg in the CHM gene was observed in all three affected members and no unaffected members, with no other relevant mutations being detected in that gene. This mutation is predicted by Human Splicing Finder 27 to affect correct splicing and has not previously been reported as causative, and so this finding is of clinical relevance for future choroideremia diagnoses. In a second pedigree, a novel 6.3 kb deletion (hg19 co-ordinates chrX:85233437-85239772) was found to remove exons 3 and 4 of the CHM gene.
A third pedigree, however, was an unusual case. NGS of the proband revealed a deleterious mutation in the CHM gene, NM_000390.2:c.715C > T,p.Arg239*, which was confirmed by Sanger sequencing. The pedigree structure was consistent with X-linked inheritance, with no affected females. A second affected member of the pedigree was Sanger sequenced, but was found not to carry the CHM mutation. At the time of assessment, this patient had an end-stage retinopathy which could not be cleanly classified, although it was presumed before sequencing was performed that they carried the same mutation and condition as the rest of the pedigree (see Fig. 5). The absence of the mutation was confirmed again using a fresh sample from the patient, indicating that the CHM mutation was definitively not present. NGS of this patient revealed a homozygous mutation in the RPE65 gene (NM_000329.2:c.271C > T,p.Arg91Trp), which is known to be associated with retinitis pigmentosa 28 . Subsequent clinical work confirmed a diagnosis of RPE65-RP, making this a very unusual case of two rare, genetically distinct Mendelian retinal diseases segregating within the same pedigree.

Discussion
Using a panel-based NGS approach targeting the exons of known retinal disease genes, we identified causative mutations in 57% of pedigrees (a breakdown by condition is given in Fig. 2). The fraction of pedigrees for which causative mutations could be identified varied widely between retinal conditions, with conditions such as Stargardt disease and choroideremia having the highest detection rates (70-86%).
A major novel finding from this study was the association with RP of two genes, GNAT1 and SLC24A1, that had previously been associated only with congenital stationary night blindness (CSNB). Although both retinitis pigmentosa and stationary night blindness affect rod photoreceptor cells, they are considered to be distinct conditions. In both cases, the mutation was a complete loss of function, either a homozygous premature stop or a homozygous frameshift. This suggests the intriguing possibility of a recurring pattern: severe variants in CSNB-associated genes can cause a mild form of RP, potentially blurring the distinction between CSNB- and RP-associated genes, although the data are too limited for strong conclusions to be drawn as yet.
The results of this study emphasize the diversity of mutations underlying IRDs in the Irish population and the significant value of targeted NGS for IRDs. More than forty novel, previously unreported mutations were identified in this patient cohort (Table 1). We demonstrate the application of novel ensemble prediction methods and protein folding simulations for variant effect prediction (Fig. 3). In other cases, mutations in unexpected genes, such as BBS1 mutations in cases of nonsyndromic RP, underscored the significant overlap between different conditions in terms of clinical presentations, as well as the difficulty in identifying the causative gene based on clinical examination alone. The observation of one pedigree in which two IRDs are segregating (RPE65-RP and CHM-Choroideremia) serves to emphasize the essential role that NGS will play in the future diagnosis of this genetically heterogeneous group of conditions.
The pedigrees for which a mutation could not be identified are likely to be a mixture of two groups: those in which the causative mutation could not be observed, such as mutations in promoter regions and other non-coding regions or in genes not included in the panel, and those in which the true mutation was in the sequenced region but could not be uniquely identified as causative because of the presence of multiple candidates. It is difficult to assess what proportion of pedigrees fall into each category; however, an estimate can be made given that just over 25% of pedigrees had no candidate variants remaining in relevant genes after common and synonymous mutations were filtered out of the analysis. We suggest that this may be a reasonable, albeit conservative, lower bound for the fraction of pedigrees that may have non-coding or novel-gene mutations requiring whole-exome or whole-genome sequencing. This issue informs the question of whether to select whole-genome or targeted-panel sequencing for future studies. Costs of NGS can be broken down between library preparation and sequencing. Library preparation for a targeted panel of genes is more expensive in terms of time and reagents than preparing a whole-genome library; however, this is outweighed by the cost of deep-coverage whole-genome sequencing, which is still about $1000/sample, 10 or 20 times the cost of sequencing pooled target-capture samples.
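The cost comparison can be made concrete with a small calculation. The sketch below is illustrative only: it assumes one sequenced sample per pedigree, takes the approximate $1000/sample whole-genome figure quoted above, sets the panel sequencing cost to one tenth of that, ignores differences in library preparation cost, and uses the overall detection rate of 176/309 pedigrees. The function names, and the tiered design in which only pedigrees left unsolved by the panel proceed to whole-genome sequencing, are assumptions made for the example.

```python
def cost_all_wgs(n_samples: int, wgs_cost: float) -> float:
    """Total cost if every sample is whole-genome sequenced."""
    return n_samples * wgs_cost


def cost_two_tier(n_samples: int, panel_cost: float, wgs_cost: float,
                  panel_solved_fraction: float) -> float:
    """Total cost if every sample is panel sequenced first and only the
    unsolved fraction goes on to whole-genome sequencing."""
    unsolved = n_samples * (1.0 - panel_solved_fraction)
    return n_samples * panel_cost + unsolved * wgs_cost


if __name__ == "__main__":
    n = 309                 # pedigrees in the cohort (one sample each: assumption)
    wgs = 1000.0            # approximate per-sample WGS cost from the text
    panel = wgs / 10        # assumed: panel sequencing at one tenth of WGS cost
    solved = 176 / 309      # overall detection rate reported in the study

    print("All WGS :", round(cost_all_wgs(n, wgs)))
    print("Two-tier:", round(cost_two_tier(n, panel, wgs, solved)))
```

Under these assumptions the tiered design is cheaper whenever the per-sample panel cost is lower than the panel detection rate multiplied by the per-sample whole-genome cost.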
Until whole-genome sequencing costs fall to a half or a quarter of current levels, therefore, there is still a role for targeted sequencing due to lower cost and the simplicity of analysis. For the near-term future, we recommend a two-tier approach to genetic analysis of IRDs: targeted sequencing of a core panel of exons to identify mutations in known genes, followed by whole-exome or whole-genome sequencing for pedigrees in which a mutation could not be found by the targeted approach. This applies even when the aim of the study is to identify non-coding mutations and new disease genes, as the cost savings from eliminating 50-60% of patients with coding mutations in known genes from the larger-scale sequencing study would more than outweigh the cost of performing targeted sequencing. The combination of these approaches, in concert with the increasing range of known retinopathy genes and the decreasing cost of wider-scale sequencing approaches, is likely to improve mutation detection rates even further in the next few years.

Methods
29 under standardised lighting conditions. Goldmann perimetry was used to assess the peripheral visual fields to the IV4e, I4e and 04e targets. Full-field electroretinograms were performed according to ISCEV standards 30 using a Roland Consult RETI-port retiscan (Brandenburg an der Havel, Germany). Fundus color and autofluorescence photography was performed using a Topcon CRC50DX. Spectral domain optical coherence tomography was performed using a Cirrus HD-OCT (Carl Zeiss Meditec, Germany).
DNA Isolation and Sequencing. Following informed consent, blood samples were collected from patients after clinical assessment. DNA was isolated from 2 ml of patient blood and fragmented for sequencing by ultrasonication in a Diagenode Bioruptor (Diagenode s.a., Belgium) to an average fragment size of 200-250 bp.
Sequencing libraries were generated and target capture was performed initially using the Agilent SureSelect XT2 kit (Agilent Technologies, Santa Clara, CA). Later captures used a redesigned panel with the Roche NimbleGen SeqCap EZ kit (Roche), incorporating new genes implicated in retinopathies since the design of the earlier panel. For both kits, captures were performed according to the manufacturer's recommendations. Exons for all genes previously implicated in retinal degeneration, as listed by RetNet at the time of capture panel design, were included as capture targets, plus 100 bp surrounding the CEP290 intronic mutation implicated in LCA 31 . Additional intronic regions in ABCA4 32 and USH2A 33 that are commonly implicated in disease were not included in the panel but were sequenced by single-read sequencing in relevant patients when causative mutations were not detected by targeted sequencing. These regions will be included in future versions of the panel. In the earlier capture design, UTRs were also included, but these were excluded in later panels to improve exon coverage and sequencing throughput. The total size of the captured region was 1,490 kb for the earlier captures and 728 kb for the later captures.
Captured patient DNA was multiplexed into either 24- or 96-sample pools and sent for sequencing. Sequencing of 96-sample pools was performed off-site by BGI Tech using an Illumina HiSeq 2000 (Illumina, San Diego, CA). Pools of 24 samples were sequenced locally using an Illumina MiSeq. Confirmatory single-read sequencing was also performed to verify the presence of candidate mutations.
Data Analysis. Sequence data were demultiplexed and mapped to the human genome (hg19) using BWA version 0.7.12 34 . Duplicate reads were flagged using Picard version 1.106 35 and downstream analysis and variant calling were performed using GATK version 3.3.0 36 according to the protocol specified in the GATK Best Practices Workflow, with the notable difference that hard-filtering rather than variant quality score recalibration (VQSR) was used to filter variants, due to the small size of the capture area. Variants filtered by this method were not discarded, but instead marked as potential sequencing artefacts.
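The idea of hard-filtering with flagging rather than removal can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the thresholds are the generic hard-filter values commonly recommended for SNPs in GATK documentation and are assumed here, since the study does not state its cutoffs, and the `Variant` class and `flag_variant` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field

# Assumed thresholds (generic GATK-style SNP hard filters); the cutoffs
# actually used in the study are not given in the text.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,               # low quality by depth
    "FS": lambda v: v > 60.0,              # strong strand bias
    "MQ": lambda v: v < 40.0,              # low mapping quality
    "MQRankSum": lambda v: v < -12.5,      # alt reads map worse than ref reads
    "ReadPosRankSum": lambda v: v < -8.0,  # alt alleles cluster at read ends
}


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    annotations: dict
    flags: list = field(default_factory=list)


def flag_variant(variant: Variant) -> Variant:
    """Append the name of every failed filter to variant.flags.

    The variant is always returned, so flagged calls stay available for
    manual review instead of being discarded. Annotations that are absent
    are skipped in this sketch.
    """
    for name, fails in SNP_FILTERS.items():
        value = variant.annotations.get(name)
        if value is not None and fails(value):
            variant.flags.append(name)
    return variant


if __name__ == "__main__":
    call = Variant("chr1", 100, "A", "G", {"QD": 1.5, "FS": 10.0, "MQ": 60.0})
    print(flag_variant(call).flags)
```

Keeping flagged variants in the output means that a plausible candidate mutation with a borderline quality annotation can still be examined and, if warranted, confirmed by an independent sequencing method.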
The list of identified variants was annotated with SnpEff 37 and dbNSFP 38 , and an ensemble model was used for classification of novel variants. Commonly used variant effect prediction software is based on machine learning approaches that learn a classifier model based on patterns in datasets of known pathogenic and neutral mutations 39,40 . In the fields of statistics and machine learning, 'ensemble' classifier models are commonly used which learn meta-classifiers using the predictions of existing individual classifier models as inputs. Such models frequently have performances superior to any individual model in the ensemble 41 , particularly if, as is the case with variant effect prediction, the individual models employ different approaches, features and training sets to generate their models and therefore have errors that should be substantially decorrelated with each other. The primary tool adopted for novel variant pathogenicity prediction was therefore an ensemble prediction model 42 , which shows significant improvements in accuracy compared to many tools, but which to our knowledge has not yet been used in studies of IRDs.
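The general principle of an ensemble meta-classifier can be illustrated with a small stacking example. This sketch is not the ensemble model cited above: it fits a plain logistic regression by gradient descent to the output scores of three hypothetical individual predictors, and the training rows are invented toy values chosen only to show the mechanics.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_classifier(score_rows, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights over individual predictor scores
    using batch gradient descent on the log-loss."""
    n_features = len(score_rows[0])
    n = len(score_rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(score_rows, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, row) -> float:
    """Return the meta-classifier's probability that a variant is pathogenic."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


if __name__ == "__main__":
    # Toy data (invented): each row holds scores in [0, 1] from three
    # hypothetical predictors; label 1 = pathogenic, 0 = neutral.
    rows = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9],
            [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1]]
    labels = [1, 1, 1, 0, 0, 0]
    w, b = train_meta_classifier(rows, labels)
    print(predict(w, b, [0.85, 0.90, 0.80]))
    print(predict(w, b, [0.10, 0.20, 0.10]))
```

In practice a meta-classifier of this kind would be trained and evaluated on curated sets of known pathogenic and neutral variants, with held-out data used to check that it outperforms its individual inputs.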
Synonymous variants and common polymorphisms were filtered out, and the remaining list of rare variants with the potential to affect protein sequence was output for manual curation. The output for each patient also included a list of coding regions where coverage was insufficient for reliable variant calling.
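A minimal sketch of this final filtering step is given below. The allele-frequency cutoff, the minimum read depth, the consequence labels and the dictionary field names are all assumptions introduced for illustration; the study does not state the values it used.

```python
MAX_POPULATION_AF = 0.01  # assumed cutoff for a "common" polymorphism
MIN_DEPTH = 20            # assumed minimum mean depth for reliable calling

# Hypothetical consequence labels for variants that may alter the protein.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift",
                    "splice_site", "inframe_indel"}


def candidate_variants(variants):
    """Keep rare variants with the potential to affect protein sequence.

    Each variant is a dict with a 'consequence' label and an optional
    'population_af'; a missing frequency is treated as unobserved (0.0).
    """
    return [
        v for v in variants
        if v["consequence"] in PROTEIN_ALTERING
        and v.get("population_af", 0.0) <= MAX_POPULATION_AF
    ]


def low_coverage_regions(region_depths):
    """Return names of coding regions whose mean depth is below MIN_DEPTH."""
    return [region for region, depth in region_depths.items()
            if depth < MIN_DEPTH]


if __name__ == "__main__":
    calls = [
        {"gene": "ABCA4", "consequence": "missense", "population_af": 0.0001},
        {"gene": "ABCA4", "consequence": "synonymous", "population_af": 0.0001},
        {"gene": "USH2A", "consequence": "missense", "population_af": 0.2},
        {"gene": "RPGR", "consequence": "frameshift"},
    ]
    print([v["gene"] for v in candidate_variants(calls)])
    print(low_coverage_regions({"RPGR_ORF15": 8, "RHO_exon1": 150}))
```

Reporting the low-coverage regions alongside the candidate list makes it explicit where a negative result reflects missing data rather than the absence of a mutation.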
Ethical Approval. Ethical approval for this study was granted by the ethics committee of the Royal Victoria Eye and Ear Hospital prior to commencement of this study. All work was carried out in accordance with the approved guidelines. All patients gave written informed consent before recruitment to the study. No patients under 18 years of age were included in the study.