Main

SWI/SNF is a family of chromatin remodeling complexes. Due to the combinatorial and modular assembly of their subunits, SWI–SNF complexes can assemble in different configurations. The three main variants identified are BAF, pBAF and ncBAF. ARID1A and ARID1B subunits are mutually exclusive subunits of the BAF complex.

ARID1B is the most frequently mutated SWI/SNF subunit in the autism-spectrum disorder Coffin–Siris syndrome (CSS)1,2. ARID1B-containing BAF complex is critical during neural crest cell differentiation3. ARID1A instead is the most frequently mutated subunit in cancer4. In ARID1A-deficient cancers, ARID1B-containing complex sustains BAF complex activity5,6, explaining the dependency of ARID1A mutant cancers on ARID1B (ref. 7). While most CSS mutations in ARID1B are loss of function nonsense mutations, the functional relevance of missense mutations is largely unexplored (Fig. 1a, Extended Data Fig. 1a,b and Supplementary Table 1).

Fig. 1: Point mutations in ARID1B disrupt its function largely by compromising protein stability.
figure 1

a, Schematic diagrams of ARID1B domains (coordinates refer to UniProt variant Q8FND5-3) and distribution of different types of mutation reported from the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/, October 2022 release). b, Schematic diagram describing the workflow used to perform DMS screens on ARID1B. After library construction and lentiviral transduction, two assays (with their respective libraries) were interrogated, one measuring ARID1B-driven cell proliferation and the other measuring protein stability. c, Scatter plot depicting the effect of each mutation in the libraries Pool4 and Pool5 on cell proliferation and protein stability. Dots are colored based on the −log10 adjusted P value (adjP) for the stability screen and size is based on −log10 (adjP) from the proliferation screen. Blue and orange squares represent cutoffs used to select ‘inert’ and ‘prolif’ mutants, respectively. d, Boxplot representing the percentage of cells gated as GFP<mCherry ‘destabilized population’ (refer to diagram in b and gating scheme in Extended Data Fig. 2b) for the selected set of ‘prolif’ and ‘inert’ mutants (Extended Data Fig. 5a) in Cal51 cells. A boxplot represents the median, first and third quartile and whiskers extend to 95th percentile resulting from six biological replicates. A dotplot underneath each boxplot panel represents values from pooled stability screen colored by log2 fold change (FC) and sized by −log10 (P). Squares underneath the dotplot as well as coloring boxplot coloring represent the category of mutants analyzed. e, Boxplot representing colony formation assay data from cell lines stably expressing the indicated ARID1B mutant cDNA. Data are presented as a percentage of surviving cells relative to NoDox control (shRNA to knockdown endogenous ARID1B is doxycycline inducible). Data are depicted as boxplots (representing median, first and third quartile and whiskers extend to 95th percentile) from three biological replicates. Dotplots underneath each boxplot panel represent values from pooled proliferation screen colored by log2 fold change and sized by −log10 (P). Squares underneath the dotplot as well as the boxplot coloring represent the category of mutants analyzed. f, Coimmunoprecipitation assay in HEK293A ARID1A/B double knockout cells transfected with HA-tagged cDNA constructs indicated. Coloring represent the mutant category (as reported in Fig. 2d,e). Data are representative example from three biological replicates. IP, immunoprecipitation; WT, wild-type.

Source data

Recent cryo-EM studies have explained the structure of ARID1A-containing BAF complex8,9. In such structures, only the C-terminal EHD2 domain is visible8,9. This domain adopts an all-helical armadillo repeat fold and it is centrally located in the base module of the complex engaging multiple subunits in protein–protein interactions to stabilize the base module structure8,9. ARID1A EHD2 domain interacts with a long helical domain of BRG1 (HSA domain), the catalytic subunit of the complex. In vitro, mutations in BRG1 aiming to disrupt the interaction with the ARID1A EHD2 domain have been reported to decrease nucleosome remodeling9 similar to assembly of an ARID-less BAF complex8. However, the in vivo functional consequences of missense ARID1B mutations remain unexplored.

To systematically characterize the ARID1B EHD2 domain structure–function relationship, we performed deep mutational scanning (DMS) with two functional assays: (1) a cellular proliferation assay in which expression of ectopic ARID1B cDNA rescues the lethal phenotype induced by silencing endogenous ARID1B in an ARID1A mutant cancer cell line, (2) a sensor assay for protein stability and/or abundance using a bicistronic vector expressing green fluorescent protein (GFP)-tagged ARID1B EHD2 domain and a mCherry protein to normalize expression levels (Fig. 1b) similar to VAMP-seq10. We refer to protein stability in a broad sense, including protein misfolding and other events causing rapid clearance through the cellular quality control machinery. We validated these two assays using either wild-type sequences or a construct bearing a deletion in the BC-box, which was previously shown to be critical for protein stability11 and consequently also for proliferation (Extended Data Fig. 2a,b). We then created plasmid libraries including a total of 8,960 single-residue variants (divided into six pools, Extended Data Fig. 2c). We accurately quantitated the abundance of each allele from next-generation sequencing (NGS) data (Methods and Extended Data Fig. 3a) and, after assessing homogeneous representation of the variants in the different pools (Extended Data Fig. 3b and Supplementary Table 2), we screened two pools (Pool4 and Pool5) encompassing the region of ARID1B at the interface with BRG1 (amino acids 1970–2130) in both cancer proliferation assay and protein stability assay (Fig. 1b and Extended Data Fig. 2c).

We computed differential representation of each variant allele in each screen and observed that most mutations do not affect protein stability or proliferation (Extended Data Fig. 4a,b and Supplementary Table 3). Direct comparison of the results from the two screening readouts revealed a significant effect on cell proliferation for mutations eliciting substantial decrease in protein stability (Fig. 1c). Only few sporadic mutations (displaying lower significance levels) were able to specifically inhibit ARID1B function (effect on cancer cell proliferation in absence of destabilization), likely due to extensive cooperative binding within the complex impeding the disruption of BRG1 PPI with a single point mutation. Therefore, our DMS data suggest that single point mutations might largely represent loss of function alleles because of an effect on protein stability rather than additional mechanisms (for example, disruption of the protein–protein interaction). To validate our findings, we selected a subset of mutants affecting protein stability as well as cancer cell proliferation (called ‘prolif’) and a subset affecting protein stability in absence of antiproliferative effect (called ‘inert’) (boxes in Fig. 1c and Extended Data Fig. 5a). We selected mutants within 10–15 Å distance from BRG1, attempting to balance the type of mutations (Extended Data Fig. 5a). Using the stability sensor assay we validated that all the selected mutants affect protein stability in both human embyronic kidney 293A (HEK293A) and Cal51 cells (Fig. 1d and Extended Data Fig. 5b), as well as we reproduced the pattern of proliferative effect on Cal51 cells using both colony formation assay (Fig. 1e) and live-monitoring cell growth assay (Extended Data Fig. 5c). To explore the mechanism underlying the effect on cancer cell proliferation elicited by ‘prolif’ mutants compared to the ‘inert’ ones, we analyzed their potential for complex assembly. Coimmunoprecipitation assays reveal that mutants affecting cancer cell proliferation cannot be assembled into the BAF complex (Fig. 1f), suggesting that exploitation of the synthetic lethal relationship between ARID1B and ARID1Amut cancers requires profound perturbation of BAF complex composition beyond ARID1B partial protein degradation.

Given the paucity of point mutations affecting cell proliferation without altering protein stability, we extended our analysis by performing DMS screen using our stability sensor screen with the additional four sublibraries encompassing the entire EHD2 domain (Extended Data Fig. 2c). After quality control of the libraries (Extended Data Fig. 3b and Supplementary Table 2), we conducted differential analysis for each individual library (Extended Data Fig. 4b and Supplementary Table 3) and overlaid the results with a set of structural features of the ARID1B EHD2 domain (Fig. 2a and Extended Data Fig. 6a). Overall, we found that positions in the hydrophobic core are more susceptible to mutations than surface-exposed residues (Extended Data Fig. 7a), which is in line with previous studies using DMS coupled to phenotypic readouts12 or measuring directly epistatic interactions13 or protein abundance readout10,14. Mutations of hydrophobic side chains in helices to charged or polar amino acids caused protein destabilization, while introducing smaller hydrophobic residues (for example, alanine), on average, did not affect stability. Cases where alanine substitutions led to a marked effect on protein stability were limited to positions with large hydrophobic residues as wild-type amino acid (for example, Phe, Leu, Ile). Proline mutations in helices were particularly destabilizing due to its irregular geometry that disrupts the hydrogen bonding pattern in helices (Extended Data Fig. 7b–h). Within the EHD2 domain, we observed that the central helices were particularly sensitive to mutations, in contrast to unstructured regions and peripheral helices with higher degrees of solvent-exposure (Fig. 2b).

Fig. 2: Protein destabilization underlies ARID1B pathogenic missense mutations in CSS.
figure 2

a, Heatmap depicting protein stability score for each mutation interrogated in ARID1B DMS. Positions coordinates refer to UniProt variant Q8FND5-3 and are annotated based on library pools, presence of a secondary structure (helix) and wild-type amino acid (AA). Variants amino acids are color coded based on the amino acid property. Stability sensor log2 fold change values are represented in red-white-blue gradient. Gray values represent wild-type amino acids. Solvent accessible surface area (SASA) per residue in Å2 is depicted with rainbow color scale. b, ARID1B EHD2 domain alphafold model color coded based on average stability score for each position. BRG1 helix is colored in acquamarine and other BAF complex subunits are depicted as gray surface. The inset shows a magnification of ARID1B EHD2 domain alone. c, Boxplot representing the stability score for missense mutations in ARID1B EHD2 domain annotated according to clinical evidence of pathogenicity in ClinVar database. Boxplots represent median and first and third quartiles, and whiskers extend to 95th percentile. d, ARID1B EHD2 domain alphafold model in dark gray with positions annotated as mutated in ClinVar displayed as spheres and color coded based on clinical evidence of pathogenicity. BRG1 helix is colored in acquamarine and other BAF complex subunits are depicted as a gray surface. e, Boxplot representing the percentage of HEK293A cells gated as GFP<mCherry ‘destabilized population’ (refer to diagram in b and gating scheme in Extended Data Fig. 2b) for a set of mutants reported in ClinVar. As positive control for destabilization ("Pos-ctrl") we used a construct bearing deletion of the BC-Box. The boxplot represents the median, first and third quartile and whiskers extend to 95th percentile resulting from six biological replicates. The dotplot underneath the boxplot panel represents values from pooled stability screen colored by log2 fold change and sized by −log10 (P). Squares underneath the dotplot as well as the boxplot coloring represent the category of clinical significance reported in ClinVar.

Source data

Next, we reasoned that having generated a systematic ARID1B mutation-stability map could shed light on the functional relevance of ARID1B clinical missense mutations reported in the ClinVar database. Seven out of eight known pathogenic missense variants were covered by our DMS screen. Six out of these seven showed a strong negative stability score and are located in positions that we found to be highly susceptible to destabilization in general, while only one (E2029K) showed a marginal effect on stability (Fig. 2c). Mutations with uncertain interpretation had varying effects (Fig. 2c). We then essayed the functional impact of a subset of ClinVar mutations on orthogonal readouts of protein stability (Fig. 2e and Extended Data Fig. 8a,b) and cancer cell proliferation (Extended Data Fig. 8c,d), and assessed BAF complex assembly by immunoprecipitation (Extended Data Fig. 8e). Overall, we confirmed that decreased protein levels are the best predictor of CSS pathogenicity, underlining the importance of a tightly regulated ARID1B protein levels during development (as evidenced by the haploinsufficiency of ARID1B mutations in mice15 and humans16,17).

Further, out of the 19 theoretically possible mutations, the clinically manifested mutations in CSS were, in all cases, among the top most destabilizing mutations for the respective position (Extended Data Fig. 9a,b). By contrast, benign CSS mutations and mutations identified in patients without CSS patients (from the GnomAD database) did not impair ARID1B stability (Extended Data Fig. 9c). Structurally, all seven pathogenic mutations are located in the central helices that are particularly critical for ARID1B stability (Fig. 2d). For example, V1939G, M1952T, I2018T and S2019F, cluster in close spatial proximity (<5 Å), supporting that this region is of particular importance to maintain protein integrity. Moreover, the missense variants S2123P and L2070P also showed strong negative stability scores, which is likely caused by perturbing the helical secondary structure (Extended Data Fig. 10a). Thus, we conclude that the pathogenicity of clinically observed mutations in patients with CSS is due to ARID1b destabilization or misfolding, rather than a direct effect on the molecular interaction with another BAF complex subunit.

Together, our DMS data shed light on the functional consequence of ARID1B missense mutations by: (1) pinpointing regions and/or positions that are generally critical for ARID1B stability (for example, central helices) and thus are largely intolerant to almost any mutation, and (2) by providing a comprehensive mutation-stability map to rationalize the interpretation of any known or yet unknown amino acid substitution, including mutations considered ‘uncertain’ (Extended Data Fig. 10b,c).

In summary, our DMS framework integrates functional and protein stability measurements, which should be broadly applicable to facilitate the interpretation of missense mutations and contribute to a deeper understanding of the molecular mechanisms underlying various diseases.

Methods

Cell culture

HEK293A and Cal51 cells were cultured in DMEM (Bioconcept) supplemented with 10% fetal bovine serum (FBS) (Seradigm), 1× l-glutamine (2 mM), 1× sodium pyruvate (1 mM), and 1× nonessential amino acid (0.1 mM). HEK293A cells were obtained from Thermo Scientific and Cal51 from DMSZ, and were tested for identity by single-nucleotide polymorphism genotyping and mycoplasma contamination. The doxycycline-inducible short-hairpin RNA (shRNA) Cal51 cell line was generated by lentiviral transduction of pLKO-TET-ON construct containing the following shRNA sequence: shARID1B_2683 5′-gagagtcacacaaaggaatct-3′. HEK293T ARID1A/B double knockout cells were generated by transfecting all-in-one CRISPR plasmids expressing the following single-guide RNA sequences: sgARID1B_2 (5′-ACCGTGAGGTGCCAACGTTTAGGT-3′) sgARID1B_3 (5′-ACCGAAACTTGATAAGCTTCCTAG-3′), sgARID1B_8 (5′-ACCGGGCACCCCACTATACGCTGG-3′), sgARID1A_2 (5′-ACCGTTGAGATGTCCAAACACCCA-3′), sgARID1A_3 (5′-ACCGGATGTTGGCGAGTGTAACCA-3′) and sgARID1A_4 (5′-ACCGCTTGCAACCAACCTCAATGT-3′). Cells were then transfected with plasmids expressing ARID1B cDNAs cloned in a custom lentiviral vector under an EF1a promoter.

Immunoprecipitation and western blotting

For immunoprecipitation assays, cells transfected with various HA-tagged constructs were collected and lysed in RIPA buffer supplemented with protease inhibitor cocktail (Roche). Cleared lysates were incubated with Anti-HA magnetic beads (ThermoFisher catalog no. 88836) overnight at 4 °C. Beads were then washed three times in RIPA buffer and proteins eluted by boiling for 5 min in laemmli buffer. For western blot analyses, cells were collected and lysed in RIPA buffer supplemented with protease inhibitor cocktail (Roche). Protein samples (from cell collection or immunoprecipitation elutions) were loaded on 3–8% Tris-Acetate gels (Invitrogen), transferred onto nitrocellulose membranes and probed with the following antibodies: Actin (Millipore, catalog no. MAB1501; 1:1,000 dilution), HA (Cell Signaling, catalog no. 3724; 1:1,000 dilution), ARID1B (Sigma, catalog no. WH0057492M1, 1:500 dilution), SMARCB1 (Cell Signaling, catalog no. 91735, 1:1,000 dilution), BRG1 (Abcam, catalog no. ab110641, 1:1,000 dilution) and HRP‐antirabbit and HRP-antimouse (Cell Signaling, 1:2,500 dilution).

Cloning of DMS libraries

Six DMS libraries were designed to cover the whole EHD2 domain of ARID1B: pool 1 (1616–1700), pool 2 (1764–1824), pool 3 (1905–1975), pool 4 (1970–2065), pool 5 (2065–2130) and pool 6 (2131–2236). The oligo pools of the DMS libraries were purchased from Twist Bioscience as a Single Site Variant Library with changes at the amino acid level. The oligos were flanked by the sequences 5′-gccatccagaagacttaccgcgtctcg-3′ and 5′-gcagtctggaagacggaaaccgtctcg-3′, which contain BsmbI restriction sites. Oligo pools were amplified by polymerase chain reaction (PCR) using matching primers for the flanking sequences, and cloned into two different backbones under the EF1a promoter by Golden Gate. For the proliferation assay, libraries were cloned into a lentivirus vector containing FL ARID1B while for the sensor assay, libraries were cloned into a bicistronic lentiviral vector expressing mCherry-P2A-eGFP-ARID1B_EHD2. Endura electrocompetent cells (Lucigen) were transformed according to the manufacturer’s protocol. We estimated that the transformation efficiency was more than 100-fold over the size of the initial oligo pool, indicating that each variant is highly represented in the plasmid libraries. The bacteria were expanded in Luria-Bertani medium for roughly 16 h (optical density at 600 nm (OD600) = 0.8), and plasmid DNA was harvested using a NucleoBond Xtra Maxi kit (Macherey-Nagel). We performed a quality control of all six libraries by NGS. Illumina sequencing libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit and the NEBNext UDI Set 1 (New England Biolabs, catalog nos. E7645L, E6440S), according to the manufacturer’s instructions. Paired-end 250 data were produced using the Illumina NovaSeq 6000 system. Quality control analyses (below) retrieved more than 99% of the variants expected in the DMS libraries.

Proliferation-based assays

For the validation of the proliferation assay, Cal51_shARID1B_2683 cells were transduced with lentivirus plasmids containing either control complementary DNAs (cDNAs) (Nluc or eGFP) or ARID1B constructs under EF1a promoter (multiplicity of infection (MOI) = 0.3). The cells were selected using neomycin (1.2 mg ml−1; Gibco) at 24 h after transduction, after which they were expanded. After 2 weeks, 2,000 cells per well were seeded into six-well plates, in the presence or absence of 100 ng ml−1 doxycycline (n = 8 biological replicates) for colony formation assay. At D14, colony formation assays were stopped by adding 3.7% formaldehyde (Sigma), stained using crystal violet (sigma) and quantified. Stable Cal51_shARID1B_2683 cells expressing a battery of ARID1B mutant cDNAs were seeded at 10% confluency in 96-well plates. After 24 h, half of the wells were treated with 100 ng ml−1 doxycycline (n = 4 biological replicates) and imaged every 12 h for 15 days for live imaging using an Incucyte SX5 (Sartorius).

For the libraries’ screening, Cal51_shARID1B_2683 cells were transduced with independent lentiviral pools (MOI = 0.3) of pool 4 (1970–2065) and pool 5 (2060–2130) libraries. Roughly 1,000 cells per plasmid were transduced to ensure a correct representation of all variants in the cell population. The cells were selected using neomycin (1.2 mg ml−1; Gibco) at 24 h after transduction, after which they were expanded. After 2 weeks, 1.4 million cells were seeded in cell stacks in the presence or absence of 100 ng ml−1 doxycycline (n = 5 biological replicates for each condition) to induce ARID1B KD. After 10 days, cells were collected. Genomic DNA (gDNA) was extracted using Dneasy Blood & Tissue Kit (Qiagen), libraries were amplified by PCR and amplicons were purified using SPRI beads (Beckman) before being submitted to NGS. Illumina sequencing libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit and the NEBNext UDI Set 1 (New England Biolabs, catalog nos. E7645L, E6440S), according to the manufacturer’s instructions. Paired-end 250 data were produced using the Illumina NovaSeq 6000 system.

Stability sensor assays

For the validation of the sensor assay HEK293A cells were transfected with 1 µg of pXP1510-mCherry-eGFP-ARID1B (1565–2236) plasmid and derivatives carrying a battery of mutations. X-tremeGENE9 (Roche) was used for the transfection, according to manufacturer’s protocol. After 72 h of transfection, mCherry and/or eGFP expression was analyzed by fluorescence activated cell sorting (FACS) (Cytoflex, Beckman Coulter) using CytExpert v.2.4.0.28 software.

For the libraries’ screening, HEK293A cells were transduced with independent lentiviral pools (MOI = 0.3) of the six libraries pools (n = 3 biological replicates). Roughly 1,000 cells per plasmid were transduced to ensure a correct representation of all variants in the cell population. Cells were expanded and collected at 14 days before cell sorting by FACS sorting (below). Sorted cells were centrifuged, lysed and gDNA was extracted using Dneasy Blood & Tissue Kit (Qiagen). Libraries were amplified by PCR and amplicons were purified using SPRI beads (Beckman) before being submitted to NGS. Illumina sequencing libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit and the NEBNext UDI Set 1 (New England Biolabs, catalog nos. E7645L, E6440S), according to the manufacturer’s instructions. Paired-end 250 data were produced using the Illumina NovaSeq 6000 system using Illumina NovaSeq control software v.1.8.1.

FACS

A FACS flow cytometer (Aria Fusion, Becton Dickinson, equipped with BD FACSDiva Software) was used for cell sorting, using the 70 µm nozzle at 70 psi pressure, using 1× BioSure Preservative-Free Sheath Solution (Concentrate, catalog no. 1027). Temperature of the sample and of the collecting tubes was set at 4 °C and cells were sorted using a four-way purity mode at an event rate around 11,000 cells per s to maintain a high recovery yield. Cells were sorted into 1.5 ml Eppendorf tubes containing 350 µl of FACS buffer or a 5 ml PP FACS tube containing 350 µl of FACS buffer (DPBS, 2 mM EDTA, 2% FBS).

To match the calculated minimum number of sorted cells with regards to the library structure and sequencing depth, the samples were sorted individually in parallel on different instruments. To avoid batch effects, each instrument’s performance was normalized using CST beads’ brightest peak signal for all instruments, and an equal fraction of each individual sample was sorted on each instrument in parallel before being pooled.

Computational analyses

For each sample, paired-end sequencing reads (Fastq) were stitched into unique sequence fragments using NGmerge (v.0.3)18, stitched reads were mapped to the codon-optimized wild-type reference sequence using bowtie2 (v.2.4.4, ref. 19) and PCR amplification primers were trimmed off using cutadapt (v.3.5, ref. 20). A summary count matrix for each variant (Supplementary Table 3, DMS_data_all and Supplementary Table 2, Plasmid_libraries_QC) was computed using a custom Python and R script (Python v.3.9.9, R v.4.1.10). Differential representation analysis was then performed using DESeq2 (v.1.34.0)21 to identify variants significantly affecting proliferation or protein stability (Supplementary Table 3, DMS_data_all). All statistical analysis and plotting were performed in R (v.4.1.1). The ARID1B structural model was downloaded from the AlphaFold Protein Structure Database in December 2022, and trimmed to residues 1616–2236. The ARID1B model was aligned to the ARID1A cryo-EM structure accessible under Protein Data Bank code 6LTH, with a root mean-squared deviation of 0.6 Å. The solvent accessible surface area was computed the ARID1B (1616–2236) AF2 model using the Shrake–Rupley ‘rolling ball’ algorithm, with a probe radius of 1.4 Å. The surface area was assigned on the residue level. Secondary structure assignment for each residue was computed using Pymol v.2.5.2. DMS values were mapped onto the ARID1B model using a custom Python script, and images were rendered using Pymol v.2.5.2. Figure 1b and Extended Data Fig. 3a have been created with BioRender.com.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.