A novel variant in SMG9 causes intellectual disability, confirming a role for nonsense-mediated decay components in neurocognitive development

Biallelic loss-of-function variants in the SMG9 gene, encoding a regulatory subunit of the mRNA nonsense-mediated decay (NMD) machinery, are reported to cause heart and brain malformation syndrome. Here we report five patients from three unrelated families with intellectual disability (ID) and a novel pathogenic SMG9 c.551T > C p.(Val184Ala) homozygous missense variant, identified using exome sequencing. Sanger sequencing confirmed recessive segregation in each family. SMG9 c.551T > C p.(Val184Ala) is most likely an autozygous variant identical by descent. Characteristic clinical findings in patients were mild to moderate ID, intention tremor, pyramidal signs, dyspraxia, and ocular manifestations. We used RNA sequencing of patients and age- and sex-matched healthy controls to assess the effect of the variant. RNA sequencing revealed that the SMG9 c.551T > C variant did not affect the splicing or expression level of SMG9 gene products, and allele-specific expression analysis did not provide evidence that the nonsense mRNA-induced NMD was affected. Differential gene expression analysis identified prevalent upregulation of genes in patients, including the genes SMOX, OSBP2, GPX3, and ZNF155. These findings suggest that normal SMG9 function may be involved in transcriptional regulation without affecting nonsense mRNA-induced NMD. In conclusion, we demonstrate that the SMG9 c.551T > C missense variant causes a neurodevelopmental disorder and impacts gene expression. NMD components have roles beyond aberrant mRNA degradation that are crucial for neurocognitive development.


INTRODUCTION
Pathogenic variants in nonsense-mediated mRNA decay (NMD) components in humans have been associated with intellectual disability (ID) and disruption of normal development [1][2][3]. NMD is a selective RNA turnover mechanism maintaining steady-state RNA levels, degrading both aberrant mRNAs harboring premature translation termination codons (PTCs) and subsets of normal mRNAs, including in neural development [4,5]. NMD is a complex pathway involving numerous components regulated in a tissue-specific and developmentally controlled manner [6]. This has raised interest in the developmental role of NMD components and whether disrupting the degradation of aberrant transcripts disturbs normal brain development.
X-linked, recessively inherited loss-of-function variants in the core NMD component UPF3B (OMIM 300298) result in ID [1], whereas autosomal recessive pathogenic variants in NMD regulators SMG9 and SMG8 have been associated with heart and brain malformation syndrome (OMIM 616920) [2,7]. To date, seven patients from five consanguineous families carrying homozygous pathogenic SMG9 variants, unique to their pedigree, have been described [2,[8][9][10]]. In the current study, we report and delineate the phenotypic spectrum of a further five Finnish patients from three unrelated families harboring the same novel homozygous SMG9 c.551T > C missense variant, which is enriched in the Finnish population. The phenotype is associated with mild to moderate ID, mild dysmorphisms, dyspraxia, increased susceptibility to a heart defect, and pyramidal signs. The phenotype is similar to, but milder than, that of the previously reported patients with SMG9 homozygous loss-of-function variants, suggesting a hypomorphic role of the SMG9 c.551T > C variant.
The mechanism by which pathogenic SMG9 variants affect the NMD pathway and lead to heart and brain malformation syndrome is largely unknown. SMG9 encodes a regulatory cofactor of the SMG1 complex, which is essential in NMD, but it is unclear how the altered function of SMG9 affects the NMD mechanism as a whole [11]. To investigate the impact of this SMG9 variant on the NMD mechanism and the human transcriptome, we compared RNA sequencing results of our five patients with the homozygous SMG9 c.551T > C variant with those of age- and sex-matched healthy controls.

SUBJECTS AND METHODS
Subjects and study approval
A total of 966 patients with either ID or pervasive and specific developmental disorders (ICD-10 codes F70-79 and F80-89, respectively) of unknown etiology who belonged to the Northern Finland Intellectual Disability cohort were recruited for clinical and molecular genetic studies. A detailed description of the project is provided by Kurki et al. [12]. Two patients who were identified using whole exome sequencing (WES) performed as part of clinical diagnostics at Centogene (Rostock, Germany) were also recruited for the project and included in the detailed phenotypic analysis. All patients were examined by one of the authors (ER).

DNA sequencing
We used standard methods to extract genomic DNA from the peripheral blood samples of the probands and their participating affected or unaffected healthy relatives. WES of DNA samples from Patients 1-3 and Control 3 was performed at the Broad Institute of MIT and Harvard (Cambridge, MA, USA). WES of DNA samples from Patients 4 and 5 and Control 2 was performed at Centogene (Rostock, Germany). More details of the WES analyses are provided in the supplementary note, including detailed clinical data. We confirmed the presence and segregation of the SMG9 c.551T > C p.(Val184Ala) variant by PCR, followed by conventional Sanger sequencing of DNA samples from patients and their unaffected parents, siblings, and other close relatives who had consented to participate in the study.

RNA sequencing and data analysis
We used standard methods to extract RNA from the peripheral blood samples of the five probands and five age- and sex-matched healthy control individuals (Supplementary Table 1). RNA sequencing and analysis were performed at the Institute for Molecular Medicine Finland (FIMM, Helsinki, Finland).
We performed differential gene expression and differential transcript analysis using a standard pipeline created by the FIMM Sequencing Center. Briefly, we normalized raw gene count data and detected differentially expressed genes (DEGs) with the edgeR package in R [13], and performed differential transcript analysis using Ballgown [14] with transcripts reconstructed by StringTie [15]. We used all 13,822 protein-coding genes from the DEG analysis to perform a ranked list enrichment analysis, using the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway databases [16][17][18] from the Molecular Signatures Database [19]. We tested ranked genes for overrepresented pathways using the FGSEA-multilevel method [20]. Detailed descriptions of the RNA sequencing, RNA data analysis, and enrichment analysis are provided in the supplementary note.
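To illustrate what an FDR-adjusted p-value means in the DEG analysis, the following is a minimal Python sketch of the Benjamini-Hochberg procedure. It assumes that Benjamini-Hochberg is the adjustment in use (the text states only that p-values were FDR adjusted), and the input p-values are hypothetical toy numbers, not study data.

```python
# Benjamini-Hochberg adjustment, written out by hand for illustration.
# Assumption: the FDR adjustment applied to the DEG p-values is BH.

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
toy_p_values = [0.001, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(toy_p_values)
significant = [p < 0.05 for p in padj]
print(padj, significant)
```

In this toy input only the first gene remains below the 0.05 threshold after adjustment, which shows why the number of significant DEGs is smaller than a count based on raw p-values would be.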

Allele-specific expression analysis
We performed allele-specific expression analysis on allele counts from the ASEReadCounter tool from the Genome Analysis Toolkit [21]. Using the resulting allele counts, we compared the proportion of reference reads for likely NMD-targeted protein-truncating (i.e. nonsense or stop-gain) variants to the proportion of reference reads for other, non-protein-truncating variants and for likely NMD-escaping protein-truncating variants. Allele-specific expression was calculated in Patients 1-4 and Controls 2-3, the individuals for whom WES data was available to locate the likely NMD-targeted variants. We accounted for the increased likelihood of allele-specific expression from variants on the same gene by randomly choosing one variant per gene per sample, permuted 1000 times. Detailed descriptions of the definition of likely NMD-targeted and likely NMD-escaping genes and the allele-specific expression analysis are provided in the supplementary note.
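The sampling step described above (one randomly chosen variant per gene per sample, repeated 1000 times) can be sketched as follows. This is an illustrative Python sketch only: the record layout, the function names, and the toy allele counts are hypothetical, and the statistical comparison between variant classes used in the study is not reproduced here.

```python
import random
from collections import defaultdict
from statistics import median

def reference_ratio(ref_reads, alt_reads):
    """Proportion of reads that carry the reference allele."""
    return ref_reads / (ref_reads + alt_reads)

def sample_one_variant_per_gene(variants, rng):
    """Pick one variant at random for each (sample, gene) pair."""
    grouped = defaultdict(list)
    for variant in variants:
        grouped[(variant["sample"], variant["gene"])].append(variant)
    return [rng.choice(group) for group in grouped.values()]

def permuted_median_ratios(variants, n_permutations=1000, seed=0):
    """Median reference ratio for each random one-variant-per-gene draw."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_permutations):
        chosen = sample_one_variant_per_gene(variants, rng)
        medians.append(median(reference_ratio(v["ref"], v["alt"]) for v in chosen))
    return medians

# Hypothetical allele counts; sample and gene names are placeholders.
toy_variants = [
    {"sample": "P1", "gene": "GENE_A", "ref": 40, "alt": 10},
    {"sample": "P1", "gene": "GENE_A", "ref": 36, "alt": 12},
    {"sample": "P1", "gene": "GENE_B", "ref": 30, "alt": 30},
    {"sample": "P2", "gene": "GENE_A", "ref": 45, "alt": 5},
]
medians = permuted_median_ratios(toy_variants)
print(len(medians), min(medians), max(medians))
```

Drawing a single variant per gene keeps two variants on the same transcript from being counted as independent observations, which is the concern the paragraph above raises.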

RESULTS
Clinical delineation of the patients
Clinical findings of the five patients with the homozygous SMG9 c.551T > C p.(Val184Ala) variant are compiled in Table 1 and Fig. 1. All patients were male, aged 24-56 years, and had either mild or moderate ID (N = 5/5, 100%). Motor development was typically mildly delayed and two patients were reported to have had muscular hypotonia during childhood (N = 2/5, 40%). Language development was markedly delayed in all patients (N = 5/5, 100%). First words were delayed in all patients until 2 to 3.5 years, and expressive language increased slowly. Intelligible speech was typically acquired by the age of 6-8 years. Four patients had strabismus (N = 4/5, 80%), and three patients had vertical strabismus (N = 3/5, 60%). One patient had short stature (N = 1/5, 20%); otherwise growth parameters were normal. One patient had a complex congenital heart defect (N = 1/5, 20%). All patients had muscular hypertonia and clonic or very brisk reflexes in their lower limbs. In addition, they had intention tremor and slow diadochokinesis (N = 5/5, 100%). Four patients were described as ataxic when they were children (N = 4/5, 80%). Three patients had planovalgus (N = 3/5, 60%). The electroencephalogram (EEG) of three patients showed abnormalities (N = 3/5, 60%), but none of the patients were diagnosed with epilepsy.
Consistent with previously published patients with pathogenic biallelic SMG9 variants [2,[8][9][10], the patients had ID, increased peripheral muscle tone, brisk deep tendon reflexes, and dysmorphic facial features including prominent forehead, broad nasal bridge, and high arched palate (Fig. 1). All the previously published patients with pathogenic biallelic SMG9 variants had a congenital heart defect (N = 7/7, 100%) and most of them had brain abnormalities (N = 6/7, 86%) [2,[8][9][10]. Unlike previously published patients with both heart and brain malformations, only one patient (N = 1/5, 20%) in this cohort had a congenital heart defect, and one other patient (N = 1/5, 20%) had mild dilation of lateral ventricles and mild lack of white matter in the trigonum area. Detailed clinical descriptions of the patients are provided in the supplementary note.

Genetic results
All five Finnish patients were found to carry a homozygous SMG9 gene variant: c.551T > C p.(Val184Ala) (NM_019108.4, GRCh38 g.19:43747479 A > G, rs749498958). The variant was identified by WES and confirmed by Sanger sequencing. The variant is present in the Genome Aggregation Database (gnomAD v.2.1.1), with a minor allele frequency of 0.0001627. Due to recent population bottlenecks and subsequent rapid population expansion in isolation [22], the allele frequency is 34 times higher in the Finns (0.001598) than in non-Finnish Europeans (0.00004644), and there are no homozygous individuals in gnomAD. The expected number of SMG9 c.551T > C homozygous patients in Finland is $(0.001598)^2 \times 5.5 \times 10^{6} \approx 14$ individuals.
The ID segregated in the families in a recessive manner (Fig. 2). To investigate whether the SMG9 c.551T > C had arisen from a single mutational event, we performed a haplotype analysis using previously published data [23]. The SMG9 gene variant c.551T > C lies on a shared ancestral haplotype with a median length of 1.7 Mb for shared haplotype segments, further suggesting that SMG9 c.551T > C is a Finnish founder variant.
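The fold enrichment and the expected number of homozygotes quoted above follow from simple arithmetic under Hardy-Weinberg proportions. The short Python sketch below reproduces that arithmetic; the allele frequencies and the population size of 5.5 million are the values given in the text, while the function name is illustrative and random mating is an assumption of the calculation.

```python
# Hardy-Weinberg sketch: expected homozygote count for a rare allele.
# Assumption: random mating, so the homozygote frequency is q squared.

def expected_homozygotes(allele_freq, population):
    """Expected number of homozygous individuals, q^2 * N."""
    return allele_freq ** 2 * population

q_finnish = 0.001598                 # gnomAD frequency in Finns (from the text)
q_non_finnish_european = 0.00004644  # gnomAD frequency in non-Finnish Europeans

fold_enrichment = q_finnish / q_non_finnish_european
n_homozygotes = expected_homozygotes(q_finnish, 5.5e6)

print(f"Fold enrichment in Finns: {fold_enrichment:.1f}")
print(f"Expected homozygotes in Finland: {n_homozygotes:.1f}")
```

Because the expected count scales with the square of the allele frequency, a 34-fold enrichment of the allele corresponds to a far larger relative increase in the expected number of homozygotes.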
The SMG9 c.551T > C variant affects a highly conserved amino acid and a moderately conserved nucleotide (Fig. 3A), and its CADD score is 23.8. In silico, the variant is predicted to be disease causing (Mutation Taster), benign (Polyphen), deleterious (SIFT), and potentially capable of altering splicing (Human Splicing Finder). We performed an in silico assessment of the SMG9-Val184Ala mutated structure using SWISS-MODEL [24] that predicted a structural alteration compared to the wild-type SMG9 structure (Fig. 3B, Supplementary Fig. 1 and Supplemental video). In the predicted model of mutated SMG9 p.184Ala    Tables 3, 4 and Supplementary Fig. 2). We first asked whether the SMG9 c.551T > C would lead to the removal of transcripts harboring the variant. Interestingly, the missense variant did not significantly impact the total mRNA levels or abundances of the different isoforms of the gene (Supplementary Table 5), suggesting that the clinical phenotype in patients did not result from loss of SMG9 transcripts. We then asked whether the missense variant would result in disrupted splicing of SMG9 gene products. All exons of the SMG9 gene were found to be expressed in the patient group, and no change in expression levels was detected between the two groups when analyzed on the gene level or transcript level.
To determine the effect of the SMG9 variant on PTC-induced NMD, we examined allele-specific expression (ASE) of genes harboring heterozygous protein-truncating variants that were predicted to be targeted by NMD. We reasoned that for genes where NMD effectively removes the PTC-containing transcripts, the majority of the detected mRNA originates from the transcript containing the reference allele. As recently shown by others [26], this results in deviation from the expected equal biallelic expression for these genes from each parental chromosome. In contrast, if PTC-induced NMD was attenuated by the SMG9 p.(Val184Ala) missense variant in patients, then both alleles would be present in equal proportions, giving a reference read ratio of approximately 0.5. We identified 28 likely NMD-targeted variants in four patients and two controls. Surprisingly, we saw that the likely NMD-targeted variants in patients had a significantly higher proportion of reference reads than non-NMD-targeted variants (p = 3.1e−3), indicating a functioning PTC-induced NMD mechanism (Fig. 3C). We did not see a significantly higher proportion of reference reads in NMD-targeted variants in the controls, likely due to the small number of predicted NMD-targeted variants in the two controls for which WES was available (p = 0.27, Supplementary Fig. 3). Importantly, the protein-truncating variants that were predicted to escape the NMD mechanism did not have a significantly higher proportion of reference reads than other variants (p = 0.021, Supplementary Fig. 4).
We next asked whether the SMG9 p.(Val184Ala) missense variant impacts the function of cells by analyzing differential gene expression (DGE) of 13,822 protein-coding genes in the patient and control data set (Supplementary Table 6). The DGE analysis revealed that the expression of 112 genes was statistically significantly different (false discovery rate adjusted p-value, FDR padj < 0.05) between the patients and the healthy controls (Supplementary Table 7).
This analysis revealed genes whose expression is affected by dysfunctional SMG9, which are also candidates possibly contributing to the disease pathogenesis. The overall pattern observed was an increased expression of genes in the patient group compared to the control group, including 6.5 times more upregulated (97) than downregulated (15) genes (p-value = 7.12e−16) (Supplementary Table 7), suggesting that dysfunctional SMG9 either affects NMD-regulated gene expression outside of the known canonical mechanisms or through a non-NMD regulatory mechanism.
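The imbalance between upregulated and downregulated genes can be checked with a simple sign test. The Python sketch below computes an exact two-sided binomial p-value under the null hypothesis that a DEG is equally likely to be up- or downregulated. The choice of test is an assumption, since the text does not name the test behind the reported p-value; the counts 97 and 15 are taken from the text.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials, p = 0.5.

    With p = 0.5 the null distribution is symmetric, so the two-sided
    p-value is twice the smaller tail, capped at 1.
    """
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

up, down = 97, 15  # upregulated and downregulated DEG counts from the text
p_value = binomial_two_sided_p(up, up + down)

print(f"up/down ratio: {up / down:.1f}")
print(f"two-sided binomial p-value: {p_value:.2e}")
```

Under these assumptions the result is of the same order of magnitude as the reported p-value, which is consistent with, but does not prove, an exact binomial test having been used.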
The ranked list enrichment analysis of the DEGs using the GO function (Supplementary Figs 5, 6) revealed enrichments in biological processes involved in the cellular response to toxic substances (adjusted p-value calculated using the FGSEA-multilevel method [20], or padj, of 0.025), detoxification (padj = 0.03), the hydrogen peroxide metabolic process (padj = 0.03), and cellular oxidant detoxification (padj = 0.05), suggesting that dysfunctional SMG9 may impair the normal function of the SMG1 complex involved in the cell stress response, and the genotoxic and oxidative stress pathway.

DISCUSSION
In this study, we report five patients from three unrelated families with a novel homozygous SMG9 c.551T > C p.(Val184Ala) variant. The ID segregates recessively in the families. The clinical phenotype of all five patients was recognizably similar, and the variant was not present in a homozygous state in the healthy controls of either our own in-house set of Finnish sequencing data or the largest public database of sequence data to date (gnomAD), providing additional support for its role in pathogenicity.
RNA sequencing showed that the SMG9 c.551T > C variant did not affect the splicing of SMG9 gene products, and the expression levels of SMG9 gene products did not significantly differ between SMG9 c.551T > C homozygous patients and healthy controls. Allele-specific expression analysis did not provide evidence that the SMG9 variant affects the PTC-induced NMD mechanism. However, DGE analysis revealed 112 statistically significant DEGs between the cases and controls, with an overall pattern of an increased expression of genes in the patient group. This suggests that normal SMG9 function may have an inhibitory effect on gene expression and the SMG9 c.551T > C variant causes transcriptional upregulation.
SMG9 is a regulatory cofactor that binds to the SMG1 kinase, which carries out an indispensable phosphorylation step in the NMD pathway [27]. SMG9 is ubiquitously expressed in the brain, heart, eye, and blood [28]. It contains 14 exons and has only one known functional mRNA isoform (NM_019108.4). The SMG9 gene product, SMG9, contains two functional domains: the N-terminally located intrinsically disordered domain and the C-terminally located nucleotide-triphosphatase domain [29]. The SMG9 c.551T > C variant is located in exon 5 of the SMG9 gene and in the nucleotide-binding G-fold domain of the SMG9 protein, where it faces the active kinase site of SMG1 [30]. As the SMG9 c.551T > C variant is predicted to be damaging, it possibly reduces the kinase activity of SMG1.
NMD is a regulatory pathway that functions not only to degrade transcripts containing PTCs, but also to maintain normal transcriptome homeostasis [4,6]. RNA sequencing results of samples from patients with homozygous SMG9 c.551T > C and controls showed that 87% (97) of the 112 significant DEGs were upregulated, suggesting that normal SMG9 function may downregulate transcription of these genes. This is consistent with previous studies that demonstrated a prevalent upregulation of gene expression as a result of SMG9 deficiency [2] and depletion of UPF1, the key protein of NMD, in human embryonic stem cells (hESCs) [31]. In this study, twenty-two of the statistically significant DEGs (N = 22/112, 20%) were known NMD substrates upregulated or downregulated in UPF1-depleted hESCs (Supplemental Table 6) [31]. Our results are consistent with previous studies, which have suggested that constitutional defects in the SMG8 and SMG9 genes may cause NMD-related transcriptional dysregulation without affecting PTC-induced NMD [7].
In addition to the NMD pathway, the SMG1 complex plays a role in cell growth, the cell stress response, the genotoxic and oxidative stress pathway, and TNFα-induced apoptosis, which could explain the enrichment of DEGs involved in the cellular response to oxidative damage, toxic substances, detoxification, and apoptosis (Table 2 and Supplementary Fig. 5). Several highly significant DEGs are involved in various pathways, suggesting that multiple processes could play a role in disease pathogenesis. DGE analysis revealed a number of DEGs which are highly expressed in the brain, heart, and/or eye, which could contribute to disease pathogenesis (Table 2 and Supplementary Table 6). The expression of spermine oxidase (SMOX), a highly inducible enzyme that regulates polyamine metabolism, was significantly upregulated in patients. SMOX-associated dysregulation of polyamine metabolism has been suggested to play a role in neurodegenerative diseases [32,33], rendering SMOX a biologically interesting candidate in the pathogenesis of these patients' disease. In addition, elevated SMOX levels and the resultant disturbance of polyamine levels increase the severity of seizures in mouse models [33], and overexpression of SMOX is associated both with excitotoxic injury and higher oxidative stress [34]. Other significantly upregulated genes were those that encode oxysterol-binding protein 2 (OSBP2), which is essential for cell proliferation and survival [35]; RAP1 GTPase activating protein (RAP1GAP), which is involved in neuronal differentiation [36]; and glutathione peroxidase 3 (GPX3), which protects cells from oxidative damage. Another interesting upregulated gene was HEY1, a transcriptional repressor in the Notch signaling pathway that plays an important role in gliogenesis, cardiac morphogenesis, and angiogenesis [37,38], which could contribute to heart and brain pathogenesis.
On the other hand, the expression of zinc finger protein 155 (ZNF155), which may be involved in transcriptional regulation, was significantly downregulated. However, these results should be validated in future studies by confirming the findings using RT-qPCR.
The SMG9 gene variant c.551T > C is enriched in the Finnish population, and its allele frequency is 34 times higher in Finns (0.0016) than in non-Finnish Europeans (4.6e-05) in gnomAD. The genetic architecture of the Finnish population is characterized by recent bottlenecks and genetic drift causing enrichment of unique rare variants, some of which are deleterious [23]. This has led to the identification of numerous recessively inherited pathogenic founder variants that are more common in Finns than in any other population, as exemplified in the Finnish Disease Heritage database [22]. It is likely that more novel pathogenic recessive variants will be identified in the Finnish population in future studies.
Recently, seven patients with biallelic loss-of-function variants in SMG9 were described as having heart and brain malformation syndrome [2,[8][9][10]. In our cohort of five patients with the same novel homozygous SMG9 c.551T > C variant, variable phenotypic penetrance was noted for heart and brain malformations, and consistent findings of ID, pyramidal signs, and dyspraxia were noted in all patients. Several patients also had vertical strabismus, which has been suggested to be part of a broader motor control deficit [39]. All the patients were adults with good general health, providing evidence that the homozygous SMG9 c.551T > C variant was not significantly associated with reduced life expectancy. The milder phenotype in patients with the homozygous damaging SMG9 c.551T > C missense variant, compared to the phenotypes in the previously reported patients with biallelic loss-of-function variants, suggests that SMG9 c.551T > C could be a hypomorphic allele resulting in a milder developmental outcome.
In conclusion, this study shows that the phenotype of heart and brain malformation syndrome ranges from a characteristic set of heart and brain anomalies to the presentation of ID, pyramidal tract defect, and ocular manifestations, extending knowledge of the phenotypic spectrum. RNA sequencing results revealed prevalent upregulation of genes, suggesting that normal SMG9 function is involved in transcriptional downregulation. A series of highly significant DEGs were identified, including SMOX, OSBP2, GPX3, and ZNF155; these are candidate genes possibly contributing to the disease pathogenesis. This study and previous studies [1][2][3]7] confirm the presence of a novel, emerging clinical group of developmental syndromes caused by pathogenic germline variants in genes encoding components or regulators of NMD machinery. As there are several genes in the NMD pathway that have not yet been associated with human disease, it is possible that novel disease genes in this pathway will be identified in the future.

DATA AVAILABILITY
De-identified materials, data sets, and protocols are available upon request. The reported variant was submitted to the LOVD database hosted at Leiden University Medical Center, the Netherlands.