Comprehensive sequencing of the myocilin gene in a selected cohort of severe primary open-angle glaucoma patients

Primary open-angle glaucoma (POAG) is the most common form of glaucoma, prevalent in approximately 1–2% of Caucasians in the UK over the age of 40. It is characterised by an open anterior chamber angle, raised intraocular pressure (IOP) and optic nerve damage leading to loss of sight. The myocilin gene (MYOC) is the most common glaucoma-causing gene, accounting for ~2% of British POAG cases. 358 patients were selected for next generation sequencing (NGS) with the following selection criteria: Caucasian ethnicity, intraocular pressure (IOP) 21–40 mm Hg, cup:disc ratio ≥0.6 and visual field mean deviation ≤−3. The entire MYOC gene (17,321 bp) was captured including the promoter, introns, UTRs and coding exons. We identify 12 exonic variants (one stop-gain, five missense and six synonymous variants), two promoter variants, 133 intronic variants, two 3′ UTR variants and 23 intergenic variants. Four known or predicted pathogenic exonic variants (p.R126W, p.K216K, p.Q368* and p.T419A) were identified across 11 patients, which accounts for 3.07% of this POAG cohort. This is the first time that the entire region of MYOC has been sequenced and variants reported for a cohort of POAG patients.


Aim
In this study, the entire region of MYOC is assessed in 358 individuals with POAG selected from a UK cohort. Through the use of next-generation sequencing (NGS), application of bioinformatic tools and strategic filtering of variants, variants across the intergenic, promoter, UTR, exonic coding sequences and intronic regions are reported for the first time.

Methods
Patients with Primary Open Angle Glaucoma (POAG) were recruited from eye clinics at University Hospital Southampton, Addenbrook's Hospital Cambridge, Frimley Park Hospital Surrey, Queen Elizabeth Hospital Birmingham, Queen Alexandra Hospital Portsmouth, Romsey Hospital, St Mary's Hospital Isle of Wight, Torbay Hospital Devon and New Cross Hospital Wolverhampton. Patient data was collected including gender, ethnicity, family history of POAG, specific diagnosis of the patient, age at diagnosis, intraocular pressure (IOP), cup:disc ratio (CDR), central corneal thickness and visual field mean deviation (VFMD). Blood samples were collected and DNA was extracted using the salting out method 28 and stored at −20 °C. Initially, 372 patients were selected for Next Generation Sequencing (NGS) using the following selection criteria: Caucasian ethnicity, 21 mm Hg ≤ IOP ≤ 40 mm Hg, cup:disc ratio ≥0.6 and visual field mean deviation ≤−3. Ten patients who passed all inclusion criteria also had one affected first degree relative recruited to the study, even if they did not meet all inclusion criteria.
The entire MYOC gene was targeted for inclusion using a custom sequencing panel to include the intronic, exonic, UTR and promoter regions (Table 1) and an additional 1000 bp upstream of the Eukaryotic Promoter Database defined promoter coordinates (hg38 coordinates: chr1:171652678-171652737) 29,30 .
Library preparation and sequencing were performed in local laboratories, where DNA was simultaneously fragmented and tagged with sequencing adapters using the Illumina's Nextera Rapid Capture Custom Enrichment kit (Illumina 5200 Illumina Way San Diego, California USA). Target regions of DNA were bound and amplified with custom capture probes and enriched prior to running on a Illumina NextSeq500 sequencing machine. Sequencing was performed in three batches of 96 samples and one batch of 84 samples.
For the 372 sequenced patient samples, coverage across both the MYOC gene and all targets were determined. The proportion of variants shared between samples was checked for consistency with known sample relationships and ethnicities. VerifyBamID v1.1.13 45 50 . Repetitive regions were annotated using UCSC RepeatMasker data, and rare variants (allele frequency ≤ 0.05) in the POAG cohort and 1000 Genomes Project were plotted.
Variants were considered as previously known glaucoma-causing variants if they were identified as 'Glaucoma-causing' in the 'Myocilin allele-specific glaucoma phenotype database' or ClinVar. Exonic variants were prioritised as candidate causal variants through CADD Phred scores ≥15 51 . Exonic splice variants were prioritised if they exceeded a 0.6 MutPred Splice score threshold 39 .
Non-coding variants were prioritised using FATHMM which out-performs CADD in the non-coding region whilst CADD has a superior classifier over FATHMM in the coding region. Variants were prioritised if the FATHMM score exceeded the default threshold of ≥0.5 43 . Intronic variants were also prioritised if they were flagged as potentially splice affecting in HSF3.0 40,41 .
CNVkit, which is designed for use with custom target panels and short-read Illumina sequencing, was used to infer copy number 52 . Data were analysed in batches to account for variation in average depth between batches, and a pooled reference of all samples within the batch was used.
Consent was obtained in accordance with the Declaration of Helsinki and was approved by South West Hampshire Local Research Ethics Committee (05/Q1702/8). Informed consent was obtained from all subjects, and all methods were carried out in accordance with the relevant guidelines and regulations of Research Ethics Committees (REC).

Results
samples analysed. 358 of 372 patients passed inclusion criteria after sample quality control. One sample was omitted due to insufficient depth (≤20X), five were omitted due to post hoc detection of sample duplication or mixed race individuals. A further eight samples were omitted due to the patient age at diagnosis being less than 40 years. Selected demographic and clinical characteristics for all 358 individuals in the final analysis are summarised in Table 2.
Genomic features and variation across the MYOC gene. The coordinates of the MYOC gene (5′ promoter -3′ UTR) are chr1:171,652,737-171,635,416 (hg38). Coverage was uninterrupted across the entire region with 100% coverage at 20X depth for all samples. The four batches had mean depths of 717X, 552X, 389X, and 726X across target regions respectively, averaged across all samples. The poorest coverage was observed in batch 3. A consistent coverage pattern is seen for all four batches (Fig. 1B) and was found to be correlated with mappability, repetitive context, conservation and GC content (R 2 = 0.3358, p-value < 2.2 × 10 −16 ). Conservation scores derived from PhastCons (Fig. 1C) and PhyloP (Fig. 1D) were highest across exonic regions, with smaller regions of high conservation in intron 1 and a region upstream of the Eukaryotic Promoter Database defined promoter region.
We identified a total of 172 annotated variants comprising 160 SNPs and 12 indels in the POAG cohort of 358 individuals (Table 3). These variants were distributed across MYOC with 21 variants upstream intergenic, two variants in the promoter region, four in exon 1, 118 in intron 1, one in exon 2, 15 in intron 2, seven in exon 3, two in the 3′ UTR and two variants in the downstream intergenic region. 156 SNPs were identified in the non-coding regions of the MYOC gene in the POAG cohort. The majority (70.5%) of non-coding variants were located in the largest intron, intron 1, which spans 13,285 bp (76.7%) of the 17,321 bp length of MYOC. For comparison, there were 574 total SNPs in the 1000 Genomes Project European population (1000gEUR) in MYOC. There were 105 rare (AF < 5%) variants in POAG and 134 rare (AF < 5%) variants in 1000gEUR, and these had a similar distribution across the MYOC gene ( Fig. 1F and G).
Three SNPs were identified with high conservation in PhastCons (20 mammals), using a threshold of 0.4, and excluding variants in repetitive regions. The variant rs76745622 was located in the upstream intergenic region whilst rs11586716 and rs12035960 were located in intron 1. No variants were identified in the non-coding regions with high conservation using PhyloP (≥1.5). www.nature.com/scientificreports www.nature.com/scientificreports/  www.nature.com/scientificreports www.nature.com/scientificreports/ exonic variants. A total of 12 exonic variants were called (Table 4). Four SNPs were detected in exon 1, one SNP in exon 2 and seven SNPs in exon 3. Four variants had CADD Phred scores greater than 15 suggesting the variants were likely pathogenic. Variants NM_000261:exon3:c.C1102T (p.Q368*) and NM_000261:exon1:c. C376T (p.R126W) had previously been identified as 'Glaucoma-causing' by the 'myocilin allele-specific glaucoma phenotype database' . Variant p.R126W also had a high MutPred splice score of 0.605, indicating that this variant was likely to affect splicing.
Two variants NM_000261:exon2:c.G648A (p.K216K) and NM_000261:exon3:c.A1255G (p.T419A) not previously identified as glaucoma-causing, had pathogenicity scores indicating they may be of importance. In three individuals p.K216K had a high PhastCons score of 0.992 indicating that it is within a highly conserved element. The variant was also more common in the POAG cohort (AF = 0.0042) than in the ExAC Non-Finnish European (NFE) population (AF = 0.0005).
The missense variant p.T419A had no known rsID and was not found in ExAC NFE. In the POAG cohort it has an allele frequency of 0.0028, and identified as heterozygous in two individuals. It is located in exon 3 and has a SIFT score of 0, PolyPhen HDIV score of 0, GERP++ score of 4.52 and CADD Phred of 37 which indicate that this variant is likely to be highly pathogenic. However, this variant was found to be present on the same read pair as the p.Q368* variant in both patients (see Supplementary Fig. S1).
There was no significant difference in sub-phenotypes between patients with candidate causal MYOC variants (p.Q368*, p.R126W, p.K216K or p.T419A) and patients with no candidate causal MYOC variants (t-test, IOP p-value = 0.766, CDR p-value = 0.626, VFMD p-value = 0.211). However, hypertension was treated in five of the 11 patients with candidate causal MYOC variants.
Non-coding variants. There were 160 variants identified in non-coding regions (see Table 3 and Supplementary Fig. S2). The majority of variants (118) were identified within the largest region, intron 1.Using a FATHMM threshold of 0.5 to prioritise the non-coding variants, one variant upstream of the promoter, three intron 1 and one intron 2 variants remain ( Table 5). The highest FATHMM score of 0.86 was seen for a common variant with an allele frequency of 10% in the 1000gEUR, and similar (8.2%) in the POAG cohort. A single variant in intron 1, NM_000261.1:c.605-5949C>T, had a CADD Phred score greater than 15, however, there was no significant difference in frequency between this variant in the POAG cohort compared with the 1000gEUR cohort (allelic chi-squared test, p-value = 0.716). A second intron 1 variant, NM_000261.1:c.604+5942G>A, had a CADD Phred score of 12.78, a GERP++ score of 3.57 and PhastCons score of 0.504. Although pathogenicity scores are in favour of a potentially pathogenic effect, allele frequencies show no significant difference between the POAG cohort and 1000gEUR (allelic chi-squared test, p-value = 0.132). The novel intergenic upstream variant identified, NM_000261.1:c.-2851C>T, had a CADD Phred score of 11.19, a GERP++ score of 3.22 and low conservation PhastCons score of 0.0079 providing ambiguous indications of pathogenicity. This variant is not present in the 1000gEUR but is observed as heterozygous in one individual within the POAG cohort. 20 variants were flagged as 'potentially splice altering' by human splice finder (HSF) version 3.0, 18 of which had rsIDs (see Supplementary Table S2). 17 splice variants were in intron 1 (15 SNPs, and 2 insertions) and three in intron 2 (three SNPs). Of these, three variants had potential to introduce both a new splice acceptor and/or a splice donor site (AD), nine introduced a new slice acceptor site only (A), seven a new donor site only (D) and one breaks a branch point (BBP).
No copy number variants (CNVs) were detected within the MYOC gene region. There were no losses in copy number (CN) across the entire region, however, some (N = 113) samples were called as a single copy gain with a CN = 3 across the entire region.  www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
We have performed targeted next-generation sequencing on the full region of the MYOC gene (promoter, UTRs, coding exons, introns and intergenic regions) on 358 POAG patients with severe POAG sub-phenotypes. We report all variants detected across the region and have performed an in silico analysis to assess pathogenicity. We identified a known pathogenic stop-gain in exon 3, a known pathogenic missense variant in exon 1, a known synonymous variant not previously considered pathogenic in exon 2, and an unknown missense variant in exon 3. Between them these variants account for 11/358 (3.07%) of patients within our POAG cohort. NM_000261:exon3:c.C1102T (p.Q368*) is the most common causal variant in POAG, accounting for 31.2% of disease-causing variants in MYOC 24 . This stop-gain is 6.5 times more common in our POAG cohort than in the 1000gEUR (allelic Fisher's Exact test, p-value = 0.0336) and is seen in seven patients, accounting for 63.6% of the candidate causal variants identified. Of the seven patients with p.Q368* variants, three did not have a positive family history of POAG in our database. Craig et al. have previously shown a 100% family history for this variant, however this was only after retrospective follow up and new diagnoses were made. They found that the index patient was sometimes unaware of their family history of disease and thus family history reported may be underestimated, which may also be that case here. Furthermore, the penetrance of this variant is high but not complete. 82% of carriers of p.Q368* have POAG or OHT at the age of 65 years, and an unaffected carrier has been identified at 74 years of age 54 . Shepard et al. have previously shown that protein MYOC monomers with this variant do not contain a cryptic peroxisomal signalling sequence (PTS1) and that it likely exposes the PTS1 sequence in its dimer partner 16 . This is believed to cause the mutant dimer to associate with the PTS1R and ultimately cause deleterious trabecular meshwork cell function 16 . This mechanism has been supported by other studies 55 Supplementary Table S3).
The synonymous variant NM_000261:exon2:c.G648A (p.K216K) was not previously considered a pathogenic variant. This variant is found in exon 2 which contains just one known pathogenic variant and is believed to translate to a linker region within the MYOC protein 24,62 . Synonymous variants in MYOC have been suggested to have a role affecting MYOC mRNA structure and subsequently the translated protein stability 63 . Variant p.L215Q, on the preceding codon of p.K216, is believed to be glaucoma-causing on the basis of an in-silico damaging SIFT score 62 . Similarly, p.K216K has strong in silico pathogenicity scores to suggest possible pathogenic status (PhastCons of 0.992 and CADD Phred of 15.63). Furthermore, this variant is found in the gnomAD Non-Finnish European (NFE) population 25 significantly less frequently than the POAG cohort (allelic Fisher's Exact test, p-value = 0.0109). This heterozygous variant was found in three patients from the University Hospital Southampton site. No evidence of relatedness was identified, however, there is a possibility that there is some distant relatedness which we do not have the capability to detect.
The missense variant NM_000261:exon3:c.A1255G (p.T419A) does not have an associated rsID, nor is it found within 1000gEUR or ExAC. However, it is found at an allele frequency of 8.952e-6 in the gnomAD NFE population. This variant has never been observed in a glaucoma context before but is seen as a heterozygote in two patients in this study (AF = 0.0028). This is a substantially higher frequency than gnomAD NFE (allelic Fisher's Exact test, p-value = 9.4e-6). This variant had a SIFT score of 0, GERP++ score of 5.04, PhastCons of 0.945 and CADD Phred score of 23.5 which indicated further support for pathogenicity. However, we have found that in both patients p.T419A is co-inherited with the upstream p.Q368* variant (see Supplementary Fig. S1), therefore the protein will be truncated before translation of the potentially pathogenic substitution. Although these two patients had the earliest onset (50 & 56 years) of those carrying the p.Q368* variant, it is not possible to provide a plausible mechanism by which this variant could have a modifying effect.
Whilst there were no clear likely pathogenic variants in the non-coding region of the gene, NM_000261.1:c.-2851C>T which is located in the upstream intergenic region (44694 bp from the neighbouring VAMP4 gene) was found to be of potential interest. Whilst it is not within a conserved element in PhastCons, it had damaging GERP++ and FATHMM scores. It has an allele frequency in the POAG cohort of 0.0014 (one heterozygous patient). This variant is not found within the 1000gEUR and is at a position not currently covered by gnomAD. The Ensembl regulatory build indicates that this variant could be functionally important as it is located at a potential CTCF binding site. All other variants with a FATHMM score ≥0.5 were seen at similar frequencies in both the POAG cohort and 1000gEUR. Genotyping this non-coding variant across a wider POAG cohort could prove informative. Variants located up to 1000 bp upstream of MYOC have been implicated as potentially functionally important for controlling IOP 64,65 .  www.nature.com/scientificreports www.nature.com/scientificreports/ Five rare variants were present in six individuals that potentially affect splicing. However, the presence of these variants in the European population cannot be confirmed due to lack of coverage in allele frequency databases for these sites.
We found no evidence of sub-gene copy number changes, and no whole gene deletions. We did predict some whole gene single copy gains and we suspect the predicted gain reflects within-batch depth variation. Patient selection criteria for this study used strict sub-phenotype parameters in order to select most severe POAG sub-phenotypes with a greater chance of an accurate POAG diagnosis. However, such criteria hinders genotype-phenotype analyses within the selected cohort. Genotyping of a larger POAG cohort not selected on sub-phenotypes is necessary in order to perform robust genotype-phenotype analyses. The MYOC gene accounts for ~3% of patients with POAG, therefore a larger cohort would also have greater power to detect rarer causal variants.

Conclusion
For the first time all regions of MYOC have been sequenced and analysed in a POAG cohort. We have identified two known pathogenic variants and two high pathogenic scoring variants, which may cause POAG in 11 patients. Synonymous and non-coding variants have been identified as having pathogenic qualities using in silico pathogenicity predictions, and a known glaucoma-causing variant has been implicated as a potential deep exonic splice variant. This work expands the known allelic diversity of MYOC in POAG which is useful for diagnosis, genetic counselling and cascade genetic testing in families. Additional sequencing of MYOC interacting partners 66 and other POAG-causing genes could reveal rare causal variants and provide further insight into the genetic basis of POAG.

Data Availability
Data generated or analysed during this study are included in this published article and its supplementary files.