Disrupted auto-regulation of the spliceosomal gene SNRPB causes cerebro–costo–mandibular syndrome

Elucidating the function of highly conserved regulatory sequences is a significant challenge in genomics today. Certain intragenic highly conserved elements have been associated with regulating levels of core components of the spliceosome and alternative splicing of downstream genes. Here we identify mutations in one such element, a regulatory alternative exon of SNRPB, as the cause of cerebro–costo–mandibular syndrome. This exon contains a premature termination codon that triggers nonsense-mediated mRNA decay when included in the transcript. These mutations cause increased inclusion of the alternative exon and decreased overall expression of SNRPB. We provide evidence for the functional importance of this conserved intragenic element in the regulation of alternative splicing and development, and suggest that the evolution of such a regulatory mechanism has contributed to the complexity of mammalian development.

Although only 1.5% of the human genome consists of regions that are translated into proteins, a higher proportion (5-7%) has been shown to be under evolutionary constraint 1,2. These non-coding conserved elements (NCEs) have been subclassified by somewhat arbitrary length and conservation criteria, and include ultraconserved 3 and highly conserved 4 elements. The finding that many NCEs exhibit a higher level of conservation 5 and constraint 6 than protein-coding sequences initially perplexed the genomics community. The discovery that some NCEs act as long-range enhancers of flanking genes, splicing regulators or functional co-activators [7][8][9], together with their frequent association with developmental genes and their potential to regulate spatiotemporal expression 7,10, implies a largely regulatory role. The evolution of these complex regulatory networks may therefore have underpinned the emergence of our organismal complexity 6,11.
Evidence continues to emerge of a critical relationship between NCEs and alternative splicing (AS), the mechanism by which over 95% of human multi-exon genes create additional protein diversity 12,13. Intragenic NCEs are preferentially associated with genes involved in pre-mRNA splicing 3, and are also often involved in the regulation of the expression of this class of genes by coupling AS with nonsense-mediated decay (NMD) [14][15][16]. Many genes involved in pre-mRNA splicing have ultra- and highly conserved NCEs containing premature termination codons (PTCs), which can be alternatively spliced into the mature mRNAs to induce NMD in order to auto-regulate their expression [14][15][16]. These AS-NMD-mediated mechanisms are presumed to be crucial to the homeostatic maintenance of the core spliceosome components and the regulation of AS in a spatiotemporally specific manner by gene auto- and cross-regulation 17,18.
Here we present mutations in a highly conserved, alternative PTC-containing exon of the small nuclear ribonucleoprotein polypeptides B and B1 (SNRPB) gene (Fig. 1) as the cause of cerebro-costo-mandibular syndrome (CCMS), a human multiple malformation disorder characterized by posterior rib gaps and Pierre Robin sequence (micrognathia, glossoptosis and cleft palate). This finding provides biological evidence of a direct link between conserved genomic elements, regulation of AS and human development, and therefore novel insight into the regulatory and developmental role of NCEs.

Results
A combination of whole-exome sequencing and Sanger sequencing was used to identify causative mutations in a cohort of 10 unrelated families with CCMS, a rare genetic disorder characterized by micrognathia and posterior rib gaps 19 (Supplementary Fig. 1). All patients had typical features of CCMS except one (Family D), who had more severe disease (Supplementary Methods, Supplementary Fig. 2 and Supplementary Table 1). Nine of the 10 patients had heterozygous regulatory mutations in SNRPB. Overall, six distinct, novel mutations in SNRPB were identified. Five mutations are within the alternative PTC-containing exon (chr20:g.2447838_2447961) of SNRPB. These mutations cluster at the 5′ and 3′ ends of this exon within areas of high conservation (Fig. 1c). A single patient had a 5′ untranslated region (UTR) mutation, which is predicted to introduce an out-of-frame translation initiation site (TIS) leading to a stop codon after 25 amino acids (Fig. 1b). In the SNRPB-positive families, mutation analysis confirms that CCMS is an autosomal dominant disorder. We observed a high rate of de novo mutations and two instances of non-penetrance. One individual with classic CCMS was negative for sequence or copy-number variants in the coding regions and UTRs of SNRPB.
SNRPB encodes the protein isoforms SmB and SmB′, which are core components of the U1, U2, U4/U6 and U5 small nuclear ribonucleoprotein (snRNP) 20 subunits of the major spliceosome. The highly conserved alternative exon within the second of six introns in SNRPB contains a PTC and has been shown to auto-regulate SNRPB levels through NMD 18. The alternative exon, which has a sub-optimal 5′ splice site, is less frequently included when U1 snRNP levels are low as a result of SmB/B′ depletion 18. Conversely, it is more frequently included with SmB/B′ overexpression 16. We hypothesized that the mutations identified within this exon would alter the homeostatic balance between the coding full-length mRNA and alternative exon-containing transcripts targeted for degradation. Thus, we determined the effect of two of the alternative exon mutations using a splicing reporter minigene assay 21. In the presence of the wild-type exon, 23% of all transcripts include this alternative exon, while introduction of either the chr20:g.2447951C>G or chr20:g.2447847G>T mutation shifts the proportion to 78% and 80%, respectively (Fig. 2a,b). Inclusion of the alternative PTC-containing exon was also assessed by quantitative reverse transcription PCR (qRT-PCR) in patient fibroblasts with the chr20:g.2447951C>G, chr20:g.2449752C>G and chr20:g.2447847G>T mutations. Expression of the PTC-containing transcript increased, whereas overall expression of SNRPB decreased compared with control cells (Fig. 2c,

Discussion
Collectively, these results implicate the deregulation of SNRPB expression as the main disease mechanism for CCMS. Mutations in the alternative PTC-containing exon cluster at two sites, which overlap with known exonic splicing silencers (ESSs) 22. In an experiment by Saltzman et al. 18, deletion of both of these regions resulted in increased inclusion of the alternative exon in HeLa cells. Our results support the functional significance of these ESSs, which are perfectly conserved across placental mammals (Fig. 3 and Supplementary Fig. 3), and suggest that the identified mutations weaken their silencing function. This would lead to the observed increase in the inclusion of this exon in CCMS, which is presumably the cause of the decreased overall SNRPB expression seen in patient cells (Fig. 4). The mutations identified in the alternative exon appear to cause a reduction in the amount of SmB/B′ that is consistent with a hypomorphic, but not a null, allele. qRT-PCR experiments in three patients show a narrow range of total SNRPB expression (0.53-0.66 relative to controls). In the minigene experiment, exclusion of the alternative exon was not eliminated in mutant transcripts, but occurred 20-22% of the time. We also have evidence that null alleles might result in a more severe phenotype as one patient without an alternative exon mutation has a 5′ UTR mutation predicted to result in a null allele causing haploinsufficiency (Supplementary Discussion). This patient's phenotype was more severe than the remainder of the cohort, with only five pairs of poorly ossified ribs, a poorly ossified spine, cystic hygroma and multiple pterygia (Supplementary Table 1).
Since none of the other patients carry truncating mutations in the gene (which would be much more likely to occur by chance than point mutations at two specific loci), and truncating mutations in or deletions encompassing SNRPB have not been reported 23,24 , we suggest that SNRPB haploinsufficiency may cause a more severe and likely lethal phenotype that is distinct from classic CCMS.
CCMS joins a growing list of developmental disorders caused by mutations in core spliceosomal genes. Of particular interest are those with an overlapping craniofacial phenotype, such as Nager syndrome and the EFTUD2-related disorders [25][26][27]. Interestingly, all of the above are caused by dominant mutations that are predicted to reduce expression of a component of the major spliceosome. It is known that the abundance of the spliceosomal machinery influences AS 28,29. In the case of SNRPB, RNAseq experiments have shown that specific AS exons are more sensitive to changes in SmB/B′ levels 18. Among genes containing such exons, nucleic acid binding and RNA processing genes are overrepresented. The SNRPB mutations presented here are therefore predicted to cross-regulate AS and expression of downstream genes. However, it is perplexing that a spliceosomal deficiency could cause such a strikingly specific phenotype. It may be that this deficiency affects a small number of transcripts that are particularly sensitive to spliceosomal protein levels. The identity of such transcripts remains speculative; however, animal models suggest that craniofacial abnormalities commonly found in CCMS are likely due to abnormal cell proliferation 30, whereas the rib abnormalities have long been postulated to be a consequence of abnormal cartilage formation 31. Given the common craniofacial phenotypes associated with the above-mentioned disorders, it is possible that a common gene or network of genes is perturbed in all of these. Another possible explanation for the specificity of the CCMS phenotype is that the spliceosomal deficiency is exacerbated in a critical tissue or developmental stage owing to increased demand for spliceosomal activity.
Studies of the two spliceosome-associated disorders spinal muscular atrophy and retinitis pigmentosa have shown that the retina and spinal cord, tissues that appear to be sensitive to a spliceosomal deficiency, show increased demand for spliceosomal proteins 32,33 .
Our study highlights the importance of accurate AS in development, alludes to the broad network of splicing regulation, and demonstrates the regulatory and developmental importance of a highly conserved regulatory element. The alternative exon of SNRPB has high conservation at the nucleotide level throughout placental mammals (average GERP score 4.08), although to a lesser extent than the ultra-conserved elements (Supplementary Fig. 3). In general, shorter human conserved elements are conserved among mammals, but not with other species 3. It has been suggested that evolution of these elements is ongoing in vertebrates, and that specific specializations may reflect clade-specific adaptive regulatory changes 3. It is then possible that auto-regulation of SNRPB has evolved in mammals with the function of guiding specific cellular and developmental processes. Broadly, we therefore speculate that NCEs may have a significant role in regulating the phenotypic variation on which natural selection acts to drive the evolution of complex and highly integrated traits.

Methods
Patients. A cohort of 10 CCMS families was assembled through the Finding Of Rare disease GEnes (FORGE) Canada Consortium (now called Care4Rare). All patients provided informed consent, and the study was approved by and complies with the ethical regulations of the institutional review board at the University of Calgary. An experienced clinical geneticist was responsible for each diagnosis of CCMS. Exclusion criteria included absence of micrognathia and posterior rib gaps. Other variable features include scoliosis, short stature, conductive hearing loss and congenital heart defects. Although intellectual disability is reported to be a common feature of CCMS, this was not prevalent in our cohort (Supplementary Fig. 2 and Supplementary Table 1). Family A had a sibling recurrence with unaffected parents, families E and F had parent-child transmission, and the seven remaining cases were sporadic (Supplementary Fig. 1).
Exome sequencing. DNA was extracted from whole blood. Exome sequencing was performed for six unrelated cases and seven family members at the McGill University and Génome Québec Innovation Centre. The SureSelect 50 Mb Human All Exon kit (Agilent) was used for exon capture; v3 was used for families A, B and C, and v5 was used for families D, E and F. Captured regions were sequenced on a HiSeq 2000 sequencer (Illumina) with 100 bp paired-end reads. Reads were aligned to the hg19/GRCh37 human reference sequence using the Burrows-Wheeler Aligner 34, and indel realignment was done with GATK 35. Duplicate reads were then marked using Picard (http://picard.sourceforge.net/) and excluded from downstream analyses. Coverage of consensus coding sequence (CCDS) bases was assessed using the GATK, which showed that samples had on average >94% of CCDS bases covered by at least 10 reads, and >90% of CCDS bases covered by at least 20 reads. Single-nucleotide variants and short insertions and deletions were called with SAMtools mpileup 36 with the extended base alignment quality adjustment (-E). Only variants that were supported by ≥20% of reads were returned. These were annotated using both Annovar 37 and custom scripts to identify whether they affected protein-coding sequence, and whether they had previously been seen in the 1000 Genomes data set (April 2012), the National Institutes of Health Heart, Lung, Blood Institute, Grand Opportunity Exome Sequencing Project (NHLBI GO) exomes, or in ~700 exomes previously sequenced at our center.
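The read-support threshold described above can be illustrated with a minimal sketch. It assumes each call is summarised by its alternate-allele and total read counts; the `VariantCall` type, the function names and the example calls are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical minimal record for one called variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int    # reads supporting the alternate allele
    total_reads: int  # all reads covering the position


def alt_fraction(call: VariantCall) -> float:
    """Fraction of covering reads that support the alternate allele."""
    if call.total_reads == 0:
        return 0.0
    return call.alt_reads / call.total_reads


def filter_by_read_support(calls, min_fraction=0.20):
    """Keep calls whose alternate allele is supported by >= min_fraction of reads."""
    return [c for c in calls if alt_fraction(c) >= min_fraction]


# Illustrative calls only (made-up positions and counts).
calls = [
    VariantCall("chrN", 100, "C", "G", alt_reads=18, total_reads=40),  # 45% support
    VariantCall("chrN", 200, "G", "T", alt_reads=3, total_reads=50),   # 6% support
    VariantCall("chrN", 300, "A", "T", alt_reads=10, total_reads=50),  # exactly 20%
]
kept = filter_by_read_support(calls)
```

In this sketch the threshold is inclusive, matching the "≥20%" wording; the call at position 200 is dropped and the other two are retained.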
To identify de novo variants in the probands of the three families for which trios were sequenced, we filtered out all proband variants seen in a parent or in the 1000 Genomes or NHLBI exome data sets, and manually reviewed remaining candidates. For family D a de novo 5′ UTR variant was seen that introduced a potential out-of-frame TIS in SNRPB. We used TIS miner (http://dnafsminer.bic.nus.edu.sg/Tis.html) to predict the effect of this variant.
Figure legend (fragment): Higher levels of these proteins then favour inclusion of the alternative exon, by an unknown mechanism, leading to NMD and a reduction of SmB/SmB′ protein levels. In alleles mutated in CCMS patients, the binding of repressor proteins is thought to be abolished or reduced due to the mutations present in the regulatory sequences. This leads to continued inclusion of the alternative exon, and reduced SmB/SmB′ protein levels due to NMD.
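The trio-based filtering for de novo candidates amounts to a set difference. The sketch below assumes each variant is represented as a `(chrom, pos, ref, alt)` tuple; the function name and the example variants are hypothetical, and the manual review step mentioned in the text is not modelled.

```python
def candidate_de_novo(proband, mother, father, population):
    """Return proband variants absent from both parents and from population data.

    Each argument is an iterable of (chrom, pos, ref, alt) tuples.
    The order of the proband's variants is preserved.
    """
    seen_elsewhere = set(mother) | set(father) | set(population)
    return [v for v in proband if v not in seen_elsewhere]


# Illustrative variants only (made-up coordinates).
proband = [("chrN", 100, "C", "G"), ("chrN", 200, "G", "T"), ("chrN", 300, "A", "T")]
mother = [("chrN", 200, "G", "T")]
father = []
population = [("chrN", 300, "A", "T")]

candidates = candidate_de_novo(proband, mother, father, population)
```

Only the variant at position 100 survives here, because the other two are seen in a parent or in the population set. Matching on the full tuple means a different alternate allele at the same position is treated as a distinct variant.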
Sanger sequencing. For all individuals in the cohort, Sanger sequencing of the alternative exon including the flanking intronic regions was performed. For patient D II-1, the 5′ UTR was sequenced to confirm the presence of the variant identified by exome sequencing. For patient G II-3, the coding regions, including the flanking intronic sequences, and the UTRs of SNRPB were sequenced. Primers were designed with Oligo 6 (Molecular Biology Insights). Sequences can be found in Supplementary Table 2. An amount of 2.5 µl of 50 ng µl⁻¹ DNA was used in a 25 µl PCR using the HotStar Taq amplification system. Thermocycler conditions were as follows: 96°C for 5:00, 35 cycles of 96°C for 0:30, 58°C for 0:30 and 72°C for 0:30, and a final elongation step at 72°C for 7:00. An amount of 5 µl was analysed on a 1% agarose gel. A quantity of 1.2 µl of 1/20 dilution of the PCR product was purified in a reaction with 1 µl ExoSAP-IT (Affymetrix) and 3 µl H2O. The product of this reaction was added to a sequencing reaction with 2.2 µl H2O, 1.875 µl of 5× sequencing buffer, 0.5 µl primer and 0.25 µl BigDye Terminator v1.1 (Life Technologies). Unincorporated nucleotides were removed from the sequencing reaction by passage through a Sephadex column. The products were then analysed on a 3130xL Genetic Analyzer (Applied Biosystems).
Copy-number variant analysis by qPCR. qPCR of all exons of SNRPB was used to search for copy-number variants in patient G II-3. One microlitre of 5 ng µl⁻¹ DNA was used in a 20-µl reaction with 1 µl of 10 µM primer mix, 10 µl of SYBR
Transfections. HEK293 cells were plated in 24-well plates at 40-50% confluency. The following day, cells were transfected with 0.25 µg of DNA and 0.75 µl of Fugene 6 (Promega) in 2 ml of DMEM + 10% FBS. Eighteen hours later, cells were rinsed with Dulbecco's PBS and lysed with 1 ml of TRIzol (Invitrogen). RNA extraction was done using the manufacturer's protocol and the resulting purified RNA was resuspended in 20 µl of diethylpyrocarbonate (DEPC)-treated H2O. Transfection was performed three times.
RT-PCR and analysis. Half of the RNA (10 µl) was treated with 2 units of DNase I (NEB) by incubation at 25°C for 15 min, then 1 µl of 25 mM EDTA was added, followed by incubation at 65°C for 10 min. One-eighth of this reaction (2.5 µl) was used in a 10 µl reverse transcriptase reaction using SuperScript III (Invitrogen), using the manufacturer's protocol, with random primers and a reaction temperature of 50°C. RNase H treatment was performed by adding 1 unit of the enzyme and incubating at 37°C for 20 min. A quantity of 0.75 µl of cDNA was used in a 30 µl PCR, of which 10 µl was analysed on a 2% agarose gel and 20 µl was analysed on a BioAnalyzer (Agilent). Primer sequences are available in the Supplementary Material.
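The inclusion percentages reported for the minigene assay can be expressed as the share of quantified product that contains the alternative exon. The following is a minimal sketch, assuming the two RT-PCR products (exon-included and exon-skipped) have each been quantified in comparable molar units; the function name and the input values are hypothetical, and the exact quantification procedure used in the study is not described here.

```python
def percent_inclusion(inclusion_signal: float, skipping_signal: float) -> float:
    """Percentage of transcripts that include the alternative exon.

    Both inputs are assumed to be molar-equivalent quantities of the
    exon-included and exon-skipped PCR products.
    """
    total = inclusion_signal + skipping_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return 100.0 * inclusion_signal / total


# Illustrative signals only, chosen to reproduce the wild-type proportion
# quoted in the Results.
wild_type = percent_inclusion(23.0, 77.0)
wild_type_exclusion = 100.0 - wild_type
```

Under this definition, the inclusion values of 78% and 80% reported for the two mutant minigenes correspond to exclusion in 22% and 20% of transcripts, which matches the 20-22% range quoted in the Discussion.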
Patient fibroblast culture and RNA extraction. Skin biopsies were collected from patient II-1 from family C, patient II-2 from family E and patient II-1 from family F. Three anonymous control fibroblast lines were obtained from The Centre for Applied Genomics. Fibroblasts were cultured at 37°C in Amniomax media (Invitrogen) (15% Amniomax supplement, 0.5% glutamine and 0.005% Fungizone). At confluency, cells were treated with 3 ml of Hank's balanced salt solution (Invitrogen), then 1 ml of trypsin-EDTA (Invitrogen) and centrifuged at 1,100 r.p.m. for 10 min. Cells were then either resuspended in fresh media for growth of the next passage or used for RNA extraction. Total RNA was extracted using the RNeasy Mini Kit (Qiagen) according to the manufacturer's protocol, with a DNase I digestion performed at the wash step (10 µl DNase I in 70 µl buffer RDD (PreAnalytiX)). RNA was extracted from three subsequent cell passages.
qRT-PCR primer design and efficiency testing. Primers were designed using Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/primer3/) to overlap exon-exon junctions to prevent amplification of genomic DNA. Sequences can be found in Supplementary Table 6. For each primer pair, efficiency was determined by plotting a standard curve of the Ct values of four serial dilutions of cDNA. Primers with efficiency between 90 and 110% were selected. The SNRPBaltexon_qRTPCR_F and R primers amplify the transcript including the alternative PTC-containing exon; SNRPBtotal_qRTPCR_F and R primers amplify all transcripts.
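Primer efficiency is commonly derived from the slope of the standard curve (Ct against log10 of relative template input) as E = 10^(-1/slope) - 1. The sketch below assumes that convention; the dilution factor, the Ct values and the function names are hypothetical, since the text does not give them.

```python
import math


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def amplification_efficiency_percent(relative_inputs, ct_values):
    """Efficiency (%) from a standard curve of Ct versus log10(relative input).

    A slope of about -3.32 corresponds to 100% efficiency, i.e. the
    product doubles every cycle.
    """
    log_inputs = [math.log10(q) for q in relative_inputs]
    slope = fit_slope(log_inputs, ct_values)
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def passes_efficiency_window(efficiency, low=90.0, high=110.0):
    """Selection criterion stated in the text: 90-110% efficiency."""
    return low <= efficiency <= high


# Illustrative standard curve: four two-fold dilutions, one cycle apart.
inputs = [1.0, 0.5, 0.25, 0.125]
cts = [20.0, 21.0, 22.0, 23.0]
efficiency = amplification_efficiency_percent(inputs, cts)
```

With these idealised inputs, each two-fold dilution delays the Ct by exactly one cycle, so the computed efficiency is 100% to within rounding and the primer pair would fall inside the 90-110% window.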
qRT-PCR analysis of SNRPB expression. Two microlitres of RNA were used in a 20 µl reverse transcriptase reaction using SuperScript III (Invitrogen), according to the manufacturer's protocol, with oligo d(T) primers. The resulting cDNA was diluted by one-fifth and used in a qRT-PCR. All qRT-PCRs had a 20 µl volume, with 1 µl cDNA, 1 µl of 10 µM primer mix, 10 µl of SYBR Green (Life Technologies) and 8 µl of H2O. Reactions were run on a 7900HT Fast Real-Time PCR System (Applied Biosystems). Cycling conditions were as follows: 95°C for 10 min, 40 cycles of 95°C for 15 s and 60°C for 1 min, and a dissociation step with 95°C for 15 s, 60°C for 15 s and 95°C for 15 s. Relative expression was calculated using the ΔΔCt method 36, with EIF1B used as a reference gene. The experiment was performed three times. Statistical significance of observed differences was calculated with a Student's t-test.
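The ΔΔCt calculation can be sketched as follows, in its standard form, which assumes roughly 100% amplification efficiency for both the target and the reference gene. The function name and the Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Relative expression by the ΔΔCt method, assuming ~100% efficiency.

    ΔCt  = Ct(target) - Ct(reference gene), computed per sample
    ΔΔCt = ΔCt(patient sample) - ΔCt(control sample)
    Relative expression = 2 ** (-ΔΔCt)
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Illustrative Ct values only: the target crosses threshold one cycle
# later in the patient sample, with identical reference-gene Ct values.
fold = relative_expression(
    ct_target_sample=25.0, ct_reference_sample=20.0,
    ct_target_control=24.0, ct_reference_control=20.0,
)
```

A one-cycle delay relative to the control, after normalisation to the reference gene, corresponds to half the control expression level in this example.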