Synonymous GATA2 mutations result in selective loss of mutated RNA and are common in patients with GATA2 deficiency

Deficiency of the transcription factor GATA2 is a highly penetrant genetic disorder predisposing to myelodysplastic syndromes (MDS) and immunodeficiency. It has been recognized as the most common cause underlying primary MDS in children. Triggered by the discovery of a recurrent synonymous GATA2 variant, we systematically investigated 911 patients with phenotype of pediatric MDS or cellular deficiencies for the presence of synonymous alterations in GATA2. In total, we identified nine individuals with five heterozygous synonymous mutations: c.351C>G, p.T117T (N = 4); c.649C>T, p.L217L; c.981G>A, p.G327G; c.1023C>T, p.A341A; and c.1416G>A, p.P472P (N = 2). They accounted for 8.2% (9/110) of cases with GATA2 deficiency in our cohort and resulted in selective loss of mutant RNA. While for the hotspot mutation (c.351C>G) a splicing error leading to RNA and protein reduction was identified, severe, likely late stage RNA loss without splicing disruption was found for other mutations. Finally, the synonymous mutations did not alter protein function or stability. In summary, synonymous GATA2 substitutions are a new common cause of GATA2 deficiency. These findings have broad implications for genetic counseling and pathogenic variant discovery in Mendelian disorders.

Genomic studies typically focus on the discovery of nonsynonymous variants that alter coding regions or canonical splice sites because their effect is predictable. Conversely, due to codon degeneracy, synonymous substitutions do not alter the amino acid composition of the encoded protein and are usually not reported as pathogenic. However, previous studies revealed that such variants can alter RNA or protein on multiple levels including pre-mRNA splicing, messenger RNA (mRNA) stability and structure, miRNA binding, and translation [15][16][17][18][19][20][21][22][23][24].
Here, we initially identified a synonymous substitution in exon 3 of the GATA2 gene (c.351C>G, p.T117T) in two unrelated pedigrees, with the clinical phenotype of GATA2 deficiency. The variant was recently reported in an adult patient (the mother of two siblings studied here) presenting with immunodeficiency, severe infections and lung disease [25]. This prompted us to study the contribution of synonymous alterations to the genetic spectrum of GATA2 deficiency and to assess their pathogenic role. We discovered and characterized five distinct synonymous mutations with RNA-deleterious effect in nine patients. They represent a new type of mutation in GATA2 deficiency and have broad implications for both the discovery of disease-causing mutations and genetic counseling.
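As a quick consistency check on the variant nomenclature, the coding-DNA positions can be mapped to codon numbers and positions within the codon. The following is a minimal sketch, assuming standard HGVS numbering in which c.1 is the A of the ATG start codon; the function name `codon_coordinates` is illustrative and not part of the study's methods. Under this assumption, four of the five substitutions fall on the third base of their codon and c.649C>T falls on the first base of a leucine codon, positions at which the genetic code tolerates certain base changes.

```python
from typing import Tuple


def codon_coordinates(c_pos: int) -> Tuple[int, int]:
    """Return (codon_number, position_within_codon), both 1-based,
    for a 1-based coding-DNA position (HGVS c. numbering)."""
    if c_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


# Variants as reported in the text: (c. position, residue number from p. notation)
variants = {
    "c.351C>G": (351, 117),
    "c.649C>T": (649, 217),
    "c.981G>A": (981, 327),
    "c.1023C>T": (1023, 341),
    "c.1416G>A": (1416, 472),
}

for name, (c_pos, residue) in variants.items():
    codon, offset = codon_coordinates(c_pos)
    # The codon number computed from the c. position should match the p. notation.
    assert codon == residue
    print(f"{name}: codon {codon}, base {offset} of 3")
```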

Patient cohort and genomics
The screening cohort consisted of 911 patients (Fig. 1a): 729 children and adolescents with primary MDS classified according to WHO criteria [26][27][28] enrolled in the studies 1998 and 2006 of the European Working Group of MDS in Childhood (EWOG-MDS, #NCT00662090), and 182 patients with cytopenias and/or GATA2-specific clinical problems, referred to our diagnostic laboratory. The GATA2 gene sequence, including intron 4, was analyzed in bone marrow (BM) samples using targeted deep sequencing with Sanger sequence validation, and subsequent confirmation of germline mutational status in nonmyeloid tissues as previously reported [7,29]. Whole exome/genome sequencing (WES/WGS) was performed in patients with synonymous GATA2 variants to rule out other hereditary causes (Supplementary methods, Supplementary Table 1).

Targeted investigations of GATA2 transcript expression
We analyzed RNA expression in blood, BM or fibroblasts using Sanger, deep sequencing, and TA cloning-based sequencing (Supplementary methods, Supplementary Fig. 1 and Supplementary Table 2). In addition, GATA2 expression in various hematopoietic compartments of healthy controls was measured (Supplementary methods).

Studies of GATA2 protein stability and function
To explore the influence of synonymous mutations on protein stability and function, in vitro analysis of exogenously expressed GATA2 was performed in 293T cells. To further investigate protein function, in vivo studies in zebrafish were carried out (for details see Supplementary methods). Experiments were performed in duplicate or triplicate as indicated in the figure legends.

Statistics
For reporter assay, data from biological and technical triplicate experiments were presented as the mean values ± standard deviation (SD). Statistical significance was assessed using GraphPad Prism v 7.04 software employing either standard one-way ANOVA test (reporter assay, thermodynamic effect of GATA2 variants) or Student's t test (allele quantification in patients' cDNA by deep sequencing, frequency of zebrafish phenotypes). P values < 0.05 were considered statistically significant.

Identification of synonymous GATA2 variants
We initially discovered two unrelated individuals (P1, P3) with GATA2 deficiency carrying an identical synonymous GATA2 variant. This prompted a systematic evaluation of the GATA2 gene sequence in our screening cohort of patients presenting for the most part with the phenotype of pediatric MDS (Fig. 1a). At first, we categorized "classical" disease-causing alterations and identified 101 patients with 62 distinct pathogenic GATA2 mutations (Fig. 1a). The distribution of mutations corroborated data reported in previous studies [9]. The most common were null mutations affecting the N-terminal part of the protein: stop-gain, frameshift, splice site (N = 52), followed by missense mutations within or adjacent to ZnF2 (N = 36), intron 4 EBOX-GATA-ETS site alterations (N = 10), and other aberrations (N = 3): one in-frame and two whole gene deletions (Fig. 1b).
Next, we searched the GATA2 coding sequence for the presence of synonymous substitutions. Variants that are either not reported or very rare (<0.05% allele frequency) in the gnomAD population database were found in nine patients. These variants were present in 8.2% (9/110) of all patients with GATA2 alterations, and 14.8% (9/61) of cases with GATA2 exonic substitutions only (Fig. 1c). In comparison, common polymorphisms with synonymous effect p.P5P, p.P22P, p.Q38Q, p.T188T, and p.A411A were not significantly enriched in our cohort (not shown), arguing against their disease-causing role in MDS.
The synonymous substitutions encountered in P1-P9 were predicted to have a likely benign effect using the combined annotation-dependent depletion score (CADD) and gene-specific calibration by Gene-Aware Variant INterpretation (GAVIN) (Table 1).
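The cohort proportions reported in this section can be reproduced from the stated counts. The sketch below is illustrative only: the dictionary labels and the helper `pct` are hypothetical names, and the denominator of 61 cases with exonic substitutions is taken as reported rather than derived from the category counts.

```python
# Patients with "classical" pathogenic GATA2 alterations, by category (from the text).
classical = {
    "null (stop-gain, frameshift, splice site)": 52,
    "missense within or adjacent to ZnF2": 36,
    "intron 4 EBOX-GATA-ETS site": 10,
    "other (in-frame and whole gene deletions)": 3,
}
n_classical = sum(classical.values())   # patients with classical alterations
n_synonymous = 9                        # patients with rare synonymous variants
n_all = n_classical + n_synonymous      # all patients with GATA2 alterations
n_exonic_substitutions = 61             # as reported (Fig. 1c); not derived here


def pct(k: int, n: int) -> float:
    """Percentage k/n rounded to one decimal place."""
    return round(100 * k / n, 1)


print(n_classical, n_all)
print(pct(n_synonymous, n_all), pct(n_synonymous, n_exonic_substitutions))
```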

Phenotype of patients with synonymous GATA2 mutations
Patients with synonymous GATA2 mutations were diagnosed at a median age of 11.5 years. Hematologic and immunological phenotypes were consistent with the heterogeneous clinical picture of GATA2 deficiency and included varying degrees of immune cytopenias (low B/NK, DC cells, monocytopenia), immunodeficiency, neutropenia, and/or pancytopenia (supplemental case descriptions). P2 is the sibling of GATA2-deficient patient P1 and was categorized as a silent GATA2 mutation carrier with a reduction of B- and NK-cells. Their mother was previously reported with pulmonary alveolar proteinosis [25]. P7 and P8 (unrelated, carrying the same mutation) initially presented with thrombocytopenia; while P8 developed transfusion-dependent refractory cytopenia of childhood (RCC), P7 remained stable with BM morphology suspicious for RCC. P9 was first seen with complications of immunodeficiency and clinically evolved to MDS. Monosomy 7 in BM was detected at diagnosis in four patients (P1, P3, P4, and P6), normal karyotypes were present in four (P5, P7, P8, and P9), while no marrow exam was performed in P2 (Table 2). According to the WHO classification, P1 and P4-P8 were diagnosed with RCC, and P9 with MDS with multilineage dysplasia as a young adult. Initial disease of P3 was MDS-EB, which progressed to AML after 6 months. Other clinical problems in the affected patients were transient organ dysfunction after birth and facial abnormalities in P4, hepatosplenomegaly in P5, hypospadias in P6, and Crohn's colitis as well as HPV-driven neoplasia in P9. The majority of patients (6/9) underwent allogeneic hematopoietic stem cell transplantation (HSCT) with favorable outcome: 5/6 patients were alive at last follow up (at a median of 1.9 years after HSCT) and 1/6 (P3) died from infection 7 months following HSCT (Table 2).

Exclusion of other hereditary causes
We next aimed to determine if other genetic conditions predisposing to inherited bone marrow failure (IBMF) or

Synonymous GATA2 variants result in selective loss of mRNA expression
Building on the assumption that synonymous variants detected in our patients were associated with degradation of the mutant (Mut) mature mRNA, we first sequenced cDNA transcribed from polyadenylated RNA transcripts (equivalent to mRNA) using the Sanger method. Compared with genomic DNA, cDNA sequences showed loss of heterozygosity manifested by complete lack of the Mut allele in five out of seven cases: P1, P3, P5, P6, and P7, and a substantial reduction in P4 and P9 (Fig. 2a upper panel). Compared with hematopoietic specimens, Mut allele expression was slightly higher in skin fibroblasts of P1 and P4 (Fig. 2a lower panel). Because it is not known if monoallelic GATA2 expression might be a general phenomenon in normal hematopoiesis, we sequenced three healthy controls who carried a common heterozygous polymorphism (rs2335052: c.490G>A; p.A164T). Both the genomic DNA and cDNA showed an equal ratio of alternative to reference alleles (Fig. 2b).
Deep sequencing-based quantification of allelic frequency showed nearly total absence of Mut alleles in P1, P3, P5-P7, and a reduction of Mut expression to 21% in P4 (Table 1 and Fig. 2c). Combined across all samples, we observed median values of 27 reads for Mut, versus 330,544 reads for wild-type (WT) alleles. Lastly, TA cloning of P5's and P6's cDNA followed by sequencing of an average of 345 single colonies was a third independent method confirming the RNA reduction (0% and 11% of Mut amplicons for P5 and P6, respectively, not shown). In order to address at which stage of RNA maturation the Mut alleles were lost, we deep sequenced products that were reverse-transcribed using alternative priming approaches. While oligo(dT) primers, which are specific to mature transcripts (mRNA), produced almost exclusively GATA2 WT reads, the use of random hexamers (enriching both pre-mRNA and mRNA) resulted in an increase of Mut reads to ~30% for P1 and P6 (Fig. 2d).
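To put the read counts above into perspective, the mutant allele fraction can be computed from Mut and WT read counts and compared with the balanced 50% expected for a heterozygous variant expressed from both alleles. This is a minimal sketch, not the study's analysis pipeline: the function name is hypothetical, and combining the two cohort-level medians into one fraction is done only for illustration, since they are not counts from a single sample.

```python
def mutant_fraction(mut_reads: int, wt_reads: int) -> float:
    """Fraction of reads carrying the mutant allele at a heterozygous site."""
    total = mut_reads + wt_reads
    if total == 0:
        raise ValueError("no reads covering the variant position")
    return mut_reads / total


# Median read counts across samples, as reported in the text.
median_mut, median_wt = 27, 330_544

frac = mutant_fraction(median_mut, median_wt)
# Fold depletion relative to balanced biallelic expression (expected fraction 0.5).
fold_depletion = 0.5 / frac

print(f"mutant fraction: {100 * frac:.4f}%")
print(f"fold depletion vs. balanced expression: {fold_depletion:.0f}")
```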

Splicing analysis of the GATA2 gene
In order to ascertain the mechanism of monoallelic GATA2 expression, RNA sequencing (RNAseq) was performed in sorted CD34+ BM cells of five patients (P1, P4-P7). Isoform analysis revealed two novel splice junctions in P1, not observed in the Ensembl database and healthy controls (Fig. 3a and Supplementary Fig. 3). In both new transcripts in P1 the c.351C>G mutation acts as a new splice donor that joins to alternative acceptors either at c.488 or at c.608. Long range RT-PCR and sequencing in P1's BM and fibroblasts (Fig. 3b) confirmed the presence of the transcript with the c.488 alternative acceptor. Finally, TA cloning of the cDNA PCR products of P1 and sequencing of 348 colonies revealed the presence of three novel transcripts (Fig. 3c).
Two of these were identical to those detected by RNAseq; the third transcript, found in only nine colonies, harbors the c.351 donor that joins to a new splice acceptor at position c.539. All three transcripts resulted in sequence frameshift

Synonymous variants are predicted not to affect RNA stability
The impact of synonymous variants on mRNA stability and secondary structure was determined using Mfold, RNAfold, and Quickfold tools. Synonymous substitutions were predicted not to significantly affect secondary structure of mRNA (Supplementary Fig. 4a). In addition, no relevant energy change (ΔG) was observed between Mut and WT (Supplementary Fig. 4b). As a comparison, five common synonymous polymorphisms from gnomAD and five nonsynonymous pathogenic GATA2 mutations were included in the analysis. None of these variants had influence on the mRNA structure and thermodynamic characteristics.

Analysis of protein stability and function
We investigated the levels of endogenous GATA2 protein in P9, who carried the RNA-deleterious mutation c.351C>G, p.T117T and had sufficient primary specimen. Analysis was performed in patient-derived platelets since GATA2 was previously found to be highly expressed in this hematopoietic subpopulation (Supplementary Fig. 5) [33,34]. GATA2 protein levels were severely reduced, similarly to other known pathogenic GATA2 mutations (Fig. 4a). Next, to determine the effect of the synonymous variants on GATA2 transcriptional function, a GATA-specific reporter assay was performed. Transactivation activity was comparable between synonymous Mut and WT (Fig. 4b). We subsequently assessed the protein:DNA-binding ability using the electrophoretic mobility shift assay (EMSA) for the mutation c.649C>T, p.L217L. Using this limited approach, no significant difference in DNA binding between Mut and WT GATA2 proteins was seen (Fig. 4c); the assay was performed at steady state with a high level of ectopic protein expression.
Because it is known that synonymous variants can impair translation, we aimed to analyze the effect of Muts on protein levels. We ectopically expressed cDNA on the premise that, in the absence of introns, no splicing effect is expected, so that any observed protein changes would result from altered translation. We blocked transcription with actinomycin D in transfected 293T cells and analyzed protein levels over time (Fig. 4d). As expected, protein content decreased during the course of treatment for all genotypes, resulting from exhaustion of mRNA reserves. However, p.L217L showed slightly higher protein content as compared with WT. To better delineate the cause for the relative increase in protein levels after transcription blockade, we then quantified the proteins after translation inhibition (cycloheximide). The p.L217L variant was associated with a slowdown of protein degradation visible after 5-7 h of treatment (Fig. 4d).

Effect of synonymous GATA2 c.649C>T variant on zebrafish hematopoiesis
For further analysis we selected the c.649C>T, p.L217L variant due to the only partial reduction of mutated allele expression in the hematopoietic specimen of P4. We hypothesized that the mutation may exert its effect on the protein level and aimed to determine if it alters zebrafish hematopoiesis. We used a previously published morpholino (MO) against gata2b [35] and visualized hematopoietic stem and progenitor cells (HSPCs) in zebrafish embryos by whole-mount in situ hybridization of the HSPC marker c-myb at 28 h post fertilization, when HSPCs arise from the dorsal aorta. As expected, gata2b inhibition resulted in a reduction of HSPCs in zebrafish embryos (Fig. 5a, top right) [35]. We then performed a phenotype rescue experiment by coinjecting gata2b MO with human GATA2 WT or Mut mRNA. Phenotype rescue (defined as medium/high phenotype; Supplementary Fig. 6) was achieved in 83% and 98% of embryos injected with WT and Mut mRNA, respectively (Fig. 5b-d). However, we observed a significantly higher proportion of high phenotypes in animals rescued with Mut mRNA (42%) as compared with WT (19%), p < 0.05 (Fig. 5d right panel).

Discussion
GATA2 deficiency is a monogenic disorder known so far to be caused by heterozygous nonsynonymous mutations, whole gene deletions or intronic enhancer mutations, all of which result in haploinsufficiency. In this study, we report the identification of synonymous, RNA-deleterious mutations in GATA2 that accounted for 8.2% of all GATA2-mutated patients and 14.8% of cases with GATA2 exonic substitutions. In total, we identified nine patients harboring five distinct synonymous GATA2 variants that are either absent or exceedingly rare in the general population: p.T117T, p.P472P, p.L217L, p.G327G, and p.A341A. Two of these (p.T117T and p.P472P) were encountered in multiple unrelated pedigrees, suggesting either independent mutational events or rare founders in the European population (which is possible at least for p.P472P, present in gnomAD in 23 individuals of non-Finnish European ancestry). The phenotype of patients carrying synonymous variants resembled GATA2 deficiency. All of the patients were alive at last follow-up with the exception of one patient who died from HSCT-related complications. Additional mutations in GATA2 were not identified. Other MDS-predisposing conditions were excluded based on clinical studies and WES/WGS in all patients with the exception of P6, who carried two VUS in the SAMD9 gene. No specific features demarcating GATA2 from SAMD9 syndrome were present in this patient; hypospadias are unspecific and had been reported in both conditions [36,37]. At this point, we cannot rule out that in P6 both gene defects acted in a synergistic manner facilitating MDS development.
Computational prediction assigned an increased probability of missplicing to three of the five variants. Further assessment of mutation deleteriousness with existing in silico tools failed to ascribe pathogenic effects. Because of the difficulty in predicting deleteriousness, synonymous mutations have been generally left out in genomic studies. However, it is likely that many disease-causing mutations are being consistently overlooked, including mutations located in noncoding regions of the genome as well as synonymous variants. So far, little is known about the role of such mutations in hematopoietic malignancies due to lack of routine screening of the inter-/intragenic regions. Besides known recurrent deleterious mutations in the regulatory element of GATA2 [10] there are only a few examples of noncoding mutations associated with BMF. Recently, two patients were reported with dyserythropoietic anemia and an intronic substitution in the GATA1 gene that is 24 nucleotides upstream of the canonical splice acceptor site. This alteration resulted in reduced canonical splicing and increased use of an alternative splice acceptor site that causes a partial intron retention event [38]. Moreover, mutations in the 5'UTR and a deep intronic region of the ELANE gene have been reported to be associated with severe congenital neutropenia [39]. Due to lack of studies integrating functional evaluation, the prevalence of such variants in Mendelian disorders is yet to be determined. It is remarkable that recent pancancer studies report acquired synonymous driver mutations at a rate of 6-8% among all single-nucleotide changes found in human cancers [40]. This is strikingly similar to the proportion of (germline) synonymous mutations identified in our study. Mutations causing phenotypically severe hereditary disease are mainly introduced as random de novo events, and it is well accepted that purifying selection will eventually eliminate these deleterious alleles. This is especially valid in high-penetrance conditions, such as GATA2 deficiency, which often manifests before the reproductive age and thus results in reduced fecundity.
There are multiple ways in which synonymous substitutions can exert deleteriousness even though the amino acid sequence is not changed. As confirmed using three orthogonal approaches, all of the mutations found here resulted in a nearly complete and selective loss of the Mut transcript in hematopoietic cells, with the exception of c.649C>T, p.L217L that showed Mut allele reduction to ~20%. In contrast, paired analysis of two patients revealed a higher Mut allele expression in skin fibroblasts versus hematopoietic cells. Potential explanations for this discrepancy might be the variability of allelic expression across different tissues [41,42] or the notion of context-dependent monoallelic expression observed for ~20% of human genes [43]. In addition, we observed that divergence in allelic ratio depends not only on the tissue analyzed but also on the stage of RNA processing. Strikingly, the mutation frequency in BM of two patients (c.351C>G and c.1023C>T) increased from nearly absent in mRNA to 30% in total RNA transcripts, implying that the defect manifests at a late stage of RNA maturation, at least for these two variants. Splicing disruption was predicted for three variants (c.351C>G, c.981G>A, and c.1023C>T); however, splicing analysis confirmed a novel splicing pattern only for c.351C>G. This mutation resulted in aberrant transcripts with a premature stop codon, which makes it functionally equivalent to a frameshift-truncating mutation causing nonsense-mediated decay. For the remaining four mutations, no abnormal splicing was detected. It is conceivable that these Mut mRNAs are extremely unstable and subjected to a very rapid sequestration. Another potential explanation for loss of allelic expression is epigenetic silencing that could arise from aberrant promoter methylation. Supporting this, allelic imbalance due to hypermethylation was recently observed in one patient with GATA2 p.T354M mutation [44].
Synonymous variants can also affect translation and thus result in increased or decreased protein stability or function. Surprisingly, p.L217L Mut protein was slightly more stable in vitro, although its function (tested in vitro using the EMSA DNA gel shift assay at steady state, with ectopic GATA2 overexpression) seemed not to be affected. Further, this mutation not only rescued the GATA2-deficient phenotype in zebrafish, but also resulted in a significantly higher number of HSPCs in comparison with control animals. Higher stability of this mutated protein might potentially explain the relative increase in its functional properties in vivo. In analogy, it is known that moderate GATA2 overexpression enhances proliferation and self-renewal of progenitor cells [45]. We reason that the more efficient rescue of the morphant phenotype can be associated with higher stability of the p.L217L Mut, which is seen when transiently overexpressed in the 293T cell line (Fig. 4d). Because of these conflicting data (decreased Mut RNA expression but higher protein stability), we question the pathogenicity of this mutation until additional biological data or patients are reported. Limited availability of patients' primary specimens as well as instability of the transcripts with synonymous mutations precluded further mechanistic studies.
Reported diagnostic yields for WES/WGS in single individuals can reach ~40% and heavily rely on computational predictions [46,47], which are difficult to achieve for synonymous mutations. Moreover, WES is limited to the analysis of coding regions only. Even though genome sequencing overcomes this constraint, it generates an enormous output of alterations within coding and noncoding regions of the genes. In the setting of GATA2 deficiency, WGS would facilitate the detection of pathogenic intronic mutations in the regulatory region in intron 4 (corresponding to the +9.5 kb enhancer region) as well as whole gene and partial gene deletions. However, allelic loss on the RNA level would be missed. The utility of transcriptome analysis was previously highlighted by the identification of disease-causing mutations in patients with negative exome or genome sequencing results, increasing the diagnostic rate by as much as 35% [48,49]. Hence, we propose that diagnostic sequencing should incorporate a cascade approach where RNA sequencing follows inconclusive DNA analysis in patients with suspected disease. This approach is feasible not only for patients with GATA2 deficiency but also in patients with a high index of suspicion for a specific Mendelian disorder but without a known pathogenic mutation. Our findings suggest that straightforward Sanger or deep sequencing of cDNA would be sufficient to confirm the RNA-deleteriousness of a synonymous variant.
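The cDNA-based confirmation step proposed above amounts to comparing the variant allele fraction in cDNA with that in genomic DNA from the same individual. The following is a minimal sketch of such a comparison; the class and function names, the read counts, and the `max_ratio` threshold of 0.5 are all hypothetical and would need to be calibrated against heterozygous control variants in any real diagnostic setting.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Read counts at one heterozygous position."""
    ref: int
    alt: int

    @property
    def alt_fraction(self) -> float:
        total = self.ref + self.alt
        if total == 0:
            raise ValueError("no reads covering the variant position")
        return self.alt / total


def shows_rna_allelic_loss(gdna: AlleleCounts, cdna: AlleleCounts,
                           max_ratio: float = 0.5) -> bool:
    """Flag a variant whose allele fraction in cDNA falls to at most
    max_ratio times its fraction in genomic DNA (hypothetical threshold)."""
    if gdna.alt_fraction == 0:
        raise ValueError("variant not detected in genomic DNA")
    return cdna.alt_fraction / gdna.alt_fraction <= max_ratio


# Hypothetical example: balanced in genomic DNA, nearly absent in cDNA.
gdna = AlleleCounts(ref=520, alt=480)
cdna = AlleleCounts(ref=9990, alt=10)
print(shows_rna_allelic_loss(gdna, cdna))
```

A control polymorphism with equal allele ratios in genomic DNA and cDNA, such as the heterozygous controls described in the Results, would not be flagged by this comparison.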
In summary, we demonstrate that a significant proportion of GATA2-deficient patients carry damaging synonymous alterations. These genetic changes, previously excluded from analysis due to their likely silent effect, should be incorporated into the standard diagnostic pipeline for individuals with a GATA2 disease phenotype. Patients with other hereditary BM failure and MDS syndromes might also benefit from this extended diagnostic approach. In the long term, identification of pathogenic synonymous variants has the potential to improve genetic counseling, HSCT donor selection, and clinical outcomes.

Data availability
WES data have been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001003817. Further information about EGA can be found on https://ega-archive.org "The European Genome-phenome Archive of human data consented for biomedical research".

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval Primary patients' samples were obtained after written informed consent in accordance with the Declaration of Helsinki. The study was approved by the local Ethics Committee (CPMP/ICH/135/95). All animal experiments were performed in accordance with relevant guidelines and regulations, approved by the review committee of the Max Planck Institute of Immunobiology and Epigenetics and the Regierungspraesidium Freiburg, Germany (license Az 35-9185.81/G-14/95).
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.