Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation

Loss-of-function of DDX3X is a leading cause of neurodevelopmental disorders (NDD) in females. DDX3X is also a somatically mutated cancer driver gene proposed to have both tumour-promoting and tumour-suppressing effects. We perform saturation genome editing of DDX3X, testing in vitro the functional impact of 12,776 nucleotide variants. We identify 3,432 functionally abnormal variants, in three distinct classes. We train a machine learning classifier to identify functionally abnormal variants of NDD-relevance. This classifier has at least 97% sensitivity and 99% specificity to detect variants pathogenic for NDD, substantially out-performing in silico predictors, and resolving up to 93% of variants of uncertain significance. Moreover, functionally abnormal variants can account for almost all of the excess nonsynonymous DDX3X somatic mutations seen in DDX3X-driven cancers. Systematic maps of variant effects generated in experimentally tractable cell types have the potential to transform clinical interpretation of both germline and somatic disease-associated variation.

Optimization of the SGE methodology
[...] the 'snvre' option of our VaLiAnT package, a tool used for SGE oligo design [2]. In short, we introduced an alternative codon for each designed non-synonymous SNV, and all possible synonymous codons for each amino acid. We maximised the number of synonymous variants in the library because we rely heavily on the synonymous variants to establish the LFC/LFC-trend baseline for normalisation during the data analysis. These redundant synonymous variants were particularly important for short exons during LFC/LFC-trend normalisation, such as Exon 3 of DDX3X. Moreover, the redundant codons confirm the concordance of the variant effect at the amino acid level. In our data, we observed that both SGE-depleted and SGE-enriched variants exhibited good concordance between redundant variants: 95% of SGE-depleted variants had a redundant codon that was also classified as SGE-depleted, while 69% of SGE-enriched variants had a redundant codon that was also classified as SGE-enriched, with directional consistencies of 99% and 95%, respectively (Fig. S11).
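The redundant-codon idea described above can be illustrated with a minimal Python sketch. This is not VaLiAnT's implementation or its 'snvre' option: the function names, the use of the standard genetic code, and the choice to treat any codon for the same amino acid that differs from the reference codon at more than one position as a "redundant" alternative are all assumptions made for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """All codons encoding the given one-letter amino acid ('*' = stop)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def synonymous_codons(ref_codon: str) -> list[str]:
    """Every other codon that encodes the same amino acid as ref_codon."""
    return [c for c in codons_for(CODON_TABLE[ref_codon]) if c != ref_codon]


def redundant_codons(ref_codon: str, target_aa: str) -> list[str]:
    """Codons for target_aa that need more than one base change from
    ref_codon (hypothetical definition of an 'alternative codon')."""
    return [c for c in codons_for(target_aa) if hamming(ref_codon, c) > 1]


# Illustrative codon only, not taken from the DDX3X reference sequence.
print(synonymous_codons("CGT"))      # other arginine codons
print(redundant_codons("CGT", "H"))  # multinucleotide route to histidine
```

Under these assumptions, a missense change reachable by a single-nucleotide substitution can also be encoded by a second, multinucleotide codon, which gives two independent measurements of the same amino-acid change.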

Genotype-phenotype correlation
To investigate whether the severity of intellectual disability differs between DDX3X-related NDD probands who carry SGE fast-depleting and SGE slow-depleting variants, we identified 61 probands who had undergone assessment via the Vineland Adaptive Behaviour Scales [3,4] across three studies [5][6][7]. No difference in global adaptive function (Vineland Adaptive Behaviour Composite score) was observed between individuals carrying missense variants and those carrying protein-truncating variants (PTVs) (Fig. 5b). There was also no significant difference in the global adaptive function of individuals carrying fast-depleting or slow-depleting DDX3X variants (two-tailed t-test, p=0.27; Fig. 5c). To investigate phenotypes more broadly, a composite score was devised for the Lennox et al. cohort, encompassing brain MRI findings, microcephaly, sensory deficits, muscle tone anomalies, cardiac findings, precocious puberty, experience of seizures and behavioural assessment. We observed no significant difference in composite score between fast-depleting and slow-depleting variants (Fig. 5c). Finally, we observed no difference in the rate of attainment of developmental milestones (age of speaking first words or taking first independent steps) in patients in the DDD cohort carrying fast-depleting or slow-depleting variants (Fig. 5e,f).

Discrepancies between SGE Data and Functional Data
Relatively few DDX3X variants have previously been functionally characterised, so there are few results against which we could compare our own. Fonseca et al. proposed that the L556S and R376C de novo variants observed in female patients render DDX3X prone to protein aggregation [8]. Both variants are functionally abnormal depleting variants in our assay. Kellaris et al. modelled in vivo in zebrafish embryos the R79K variant seen in two male siblings with NDD and proposed that this variant results in a partial loss-of-function [9]. This variant is functionally normal in our assay.
Snijders Blok et al. [5][6][7][10] applied the same in vivo assay to assess 3 variants observed in male probands and 5 observed in females, and found all male variants to be no different from wild type. Of the 5 female variants for which they identified loss-of-function effects (I214T, R326H, R376C, I507T, R534H), 4 are functionally abnormal depleting variants in our assay. Of the 8 variants previously determined to have negatively impacted helicase activity [6], 7 are functionally abnormal depleting variants in our assay. The remaining variant, R326H, was interpreted to be pathogenic and functionally abnormal by both Lennox et al. and Snijders Blok et al. The SNV generating R326H is classified as functionally normal in our assay. However, the multinucleotide redundant variant for the same amino-acid change is a functionally abnormal, fast-depleting variant.

- Fig. S1: Correlation between sgRNAs and between measures of variant abundance
- Fig. S2: Additional characterisation of SGE-depleted and SGE-enriched variants
- Fig. S3: Map of all SNVs and codon deletion variants' SGE functional class
- Fig. S4: cLFC trend of all DDX3X variants, grouped by exon
- Fig. S5: cLFC trend of all DDX3X variants, grouped by variant type
- Fig. S6: Map of all SNVs and codon deletion variants' predicted relevance for neurodevelopmental disorders
- Fig. S7: Distribution of classifier of NDD-relevant functional abnormality posterior probabilities
- Fig. S8: dNdScv analysis of individual cancer types
- Fig. S9: Characterization of the HAP1 cell bank
- Fig. S10: Optimisation of Saturation Genome Editing
- Fig. S11: Correlation between cLFC trend and average redundant codon cLFC trend
2) Supplementary Results
- Optimization of the SGE methodology
- Genotype-phenotype correlation
- Discrepancies between SGE Data and Functional Data
3) Supplementary References
Fig. S4. cLFC trend of all DDX3X variants, grouped by exon. a) Top panel: DDX3X exon structure with locations of key domains and protein annotations. b and c) y-axis: cLFC-trend; x-axis: Chromosome X hg38 position, grouped by exon. Variants coloured by variant type, shape indicates SGE functional class. Codon deletion variants are shown separately in c for clarity. Source data are provided as a Source Data file.
Fig. S5. cLFC trend of all DDX3X variants, grouped by variant type. a) DDX3X exon structure with locations of key domains and protein annotations. b) cLFC-trend of non-coding (nc), synonymous (Syn), in-frame codon-deletion (Cdel), missense (Miss), nonsense (NonS) and canonical splice acceptor/donor variants (SpA/D), grouped by exon and coloured by SGE functional class. Source data are provided as a Source Data file.

Fig. S6: Sequence-function map of DDX3X. This figure displays the relevance to DDX3X-related neurodevelopmental disorder for all 7,944 nucleotide (SNVs) and 626 codon deletion (Codon_del) variants in DDX3X. The x-axis of each sub-panel shows the chromosome X coordinate based on the hg38 reference genome. The outline colour of each box represents the predicted functional impact of the SNV, while the fill colour reflects the degree of confidence that a variant results in a non-functional protein, calculated using the random forest model. Source data are provided as a Source Data file.

Fig. S8: dNdScv analysis of individual cancer types. The proportion of missense variants classified as SGE-enriched and SGE-depleted (x-axis), and the percentage of missense variants that are likely drivers (y-axis), estimated from the observed:expected number of DDX3X missense variants (dN/dS), in different cancer types where there is more than 1 missense DDX3X variant in A) females and B) males. Error bars denote 95% confidence intervals. CNS-MB: Central nervous system medulloblastoma; CESC: Cervical squamous cell carcinoma and endocervical adenocarcinoma; UCEC: Uterine Corpus Endometrial Carcinoma; HNSC: Head and Neck squamous cell carcinoma; KIRP: Kidney renal papillary cell carcinoma; KIRC: Kidney renal clear cell carcinoma; BLCA: Bladder Urothelial carcinoma; BRCA: Breast invasive carcinoma; Lymph: Lymphomas; OV: Ovarian serous cystadenocarcinoma. Source data are provided as a Source Data file.
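The link between dN/dS and the plotted driver percentage can be made concrete with the commonly used approximation that, when dN/dS = ω exceeds 1, a fraction (ω − 1)/ω of the observed nonsynonymous mutations is in excess of neutral expectation. The sketch below assumes that this approximation underlies the y-axis estimate; the function name and the example counts are hypothetical and are not values from this study.

```python
def driver_fraction(observed: float, expected: float) -> float:
    """Estimated fraction of observed mutations that are likely drivers.

    Uses the approximation (omega - 1) / omega, where omega is the
    observed:expected ratio (dN/dS). Values of omega at or below 1 give
    no evidence of excess mutations, so the estimate is floored at 0.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    omega = observed / expected
    if omega <= 1.0:
        return 0.0
    return (omega - 1.0) / omega


# Hypothetical counts: 8 missense variants observed where 2 are expected.
print(f"{driver_fraction(8, 2):.0%} of missense variants estimated as drivers")
```

With these hypothetical counts ω = 4, so three quarters of the observed missense variants would be estimated as drivers.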

Fig. S3: Map of all SNVs and codon deletion variants' SGE functional class. This figure displays the SGE functional class of all 7,944 nucleotide (SNVs) and 626 codon deletion (Codon_del) variants in DDX3X. The x-axis of each sub-panel shows the chromosome X coordinate based on the hg38 reference genome. The outline colour of each box represents the predicted functional consequence of the SNV, while the fill colour reflects the SGE functional class. Source data are provided as a Source Data file.