Single-cell brain organoid screening identifies developmental defects in autism

The development of the human brain involves unique processes (not observed in many other species) that can contribute to neurodevelopmental disorders1–4. Cerebral organoids enable the study of neurodevelopmental disorders in a human context. We have developed the CRISPR–human organoids–single-cell RNA sequencing (CHOOSE) system, which uses verified pairs of guide RNAs, inducible CRISPR–Cas9-based genetic disruption and single-cell transcriptomics for pooled loss-of-function screening in mosaic organoids. Here we show that perturbation of 36 high-risk autism spectrum disorder genes related to transcriptional regulation uncovers their effects on cell fate determination. We find that dorsal intermediate progenitors, ventral progenitors and upper-layer excitatory neurons are among the most vulnerable cell types. We construct a developmental gene regulatory network of cerebral organoids from single-cell transcriptomes and chromatin modalities and identify autism spectrum disorder-associated and perturbation-enriched regulatory modules. Perturbing members of the BRG1/BRM-associated factor (BAF) chromatin remodelling complex leads to enrichment of ventral telencephalon progenitors. Specifically, mutating the BAF subunit ARID1B affects the fate transition of progenitors to oligodendrocyte and interneuron precursor cells, a phenotype that we confirmed in patient-specific induced pluripotent stem cell-derived organoids. Our study paves the way for high-throughput phenotypic characterization of disease susceptibility genes in organoid models with cell state, molecular pathway and gene regulatory network readouts.


Main text
Human cortical development is comprised of many unique and elaborate processes. Following neural tube formation, neuroepithelial cells within the telencephalon proliferate, expand, and generate radial glial progenitors, intermediate progenitors, and outer radial glial progenitors. In the dorsal region, these progenitors give rise to diverse lineages of excitatory neurons to form a layered cortex. In the ventral telencephalon, they instead generate interneurons that migrate into the dorsal cortex to integrate with excitatory projection neurons. In humans, these processes are governed by precise and highly orchestrated genetic, molecular, and cellular programs, many of which have remained largely elusive (3). Neurodevelopmental disorder (NDD) research has advanced our understanding of human brain development and helped reveal how it can go awry. However, many NDDs, such as ASD, are diagnosed only after birth when brain development is almost complete. Analysis of the developmental and celltype specific origins of NDDs is key to understanding their pathogenesis but is currently limited to neuroimaging and postmortem tissue studies. Researching the genetic etiology of NDDs is critical for dissecting their cause (1,5,6). However, studying the effects of risk genes requires access to the unique cell types and developmental processes of the human brain, many of which cannot be analyzed in most animal models. Brain organoids recapitulate early fetal brain development and generate the diverse cell types also found in vivo (7). Brain organoids have been used to examine disease-associated genes but are limited by phenotypic variability and low-throughput (7)(8)(9). This could be solved by analyzing large numbers of genes in parallel with single-cell perturbation screenings (10)(11)(12), but currently such genetic screening technology is missing in organoids and existing methodologies require further improvements in efficiency and controllability. Here, we describe the CHOOSE system that combines efficient, parallel genetic perturbations with single-cell transcriptomic readout in mosaic cerebral organoids. We use the system to identify developmental and cell type-specific phenotypes of ASD. We deliver individually verified, and uniquely barcoded pairs of gRNAs as pooled lentiviral libraries to stem cells. Starting with thousands of genetically barcoded clones for each genetic perturbation, we generate mosaic organoids enriched for telencephalic tissues to identify the loss-of-function phenotypes of 36 high-risk ASD genes at the level of cell type, developmental trajectory, and gene regulation. Using single-cell multiomic data including single-cell transcriptome and accessible chromatin modalities, we construct a developmental gene regulatory network (GRN) of the cerebral organoids and identify ASD-specific regulatory hubs connected to genes dysregulated in response to genetic perturbations. Amongst the 36 genes, one of the most dramatic changes in cell type composition was identified in the context of ARID1B. In particular, we find that perturbing ARID1B expands the ventral telencephalic pro-genitor pool and increases their transition probability to early oligodendrocyte precursor cells (OPCs). Finally, we confirm our findings in brain organoids generated from two ARID1B patient-derived iPSC lines.

Single-cell loss-of-function screening in clonally barcoded
organoids Single-cell RNA sequencing (scRNA-seq) is a high-throughput method for analyzing cellular heterogeneity of complex tissues. To establish an organoid system that enables CRISPR perturbations with single-cell transcriptomic readout, we made use of a human embryonic stem cell (hESC) line that expresses an enhanced specificity SpCas9 (eCas9) with significantly reduced off-target effects and is controlled by an upstream loxp stop element (13) (Fig. 1a). To regulate the eCas9 induction, we engineered a lentiviral vector to deliver a 4-hydroxytamoxifen (4-OHT)-inducible CRE recombinase and a dual single-guide RNA (sgRNA) cassette (Fig. 1a). In this cassette, two sgRNAs targeting the same gene are expressed under the U6 or H1 promoter. The dual-gRNA is located within the 3' long terminal repeat (LTR) of the lentiviral vector and is thus transcribed by RNA polymerase II to be captured by scRNA-seq assays (14). To ensure efficient generation of loss-of-function alleles, we determined the editing efficiency of the candidate sgRNAs for each target gene using a flow cytometry-based gRNA reporter assay (Fig. 1b, Fig. S1a). In this assay, a pre-assembled array of gRNA-targeting sequences is fused with TagBFP and is transduced in 3T3 cells to generate reporter cell lines. sgRNA and eCas9 are then delivered to the reporter cell lines by lentiviruses. Successful genome editing causing frameshift mutations leads to the loss of BFP fluorescence, allowing us to quantitatively evaluate gRNA efficiency in a large cell population (Fig. 1c, Fig. S1b, c). Using our reporter assay, we selected efficient sgRNA pairs for 36 ASD genes (Fig. S1d-e, Table 1). We cloned sgRNA pairs individually and pooled them equally to construct a lentiviral plasmid library. To ensure that the lentiviral integration frequency is limited to one per cell, we used a low infection rate of 2.5% (15) (Fig. S2a-c). Importantly, the development of human brain tissues exhibits generation of cell clones with highly variable sizes both in vivo and in vitro (13,16). To monitor the clonal complexity of the founder cells, we introduced a unique clone barcode (1.4 X 10 7 possible combinations) for each dual-sgRNA cassette to label individual lentiviral integration events (Fig. 1a,  Fig. S3a-c). Our analysis indicated homogeneous distribution of the gRNAs in the plasmid library and in hESCs at the time of infection that was also maintained after the formation of embryoid bodies (EBs) (Fig. 1d). Using this strategy, we obtained an average of 2,770 unique clones for each gene perturbation, which we used to generate mosaic EBs (Fig. 1e). Altogether, we established a highly efficient and controlled pooled screening system with high clonal complexity in the organoid.
CHOOSE cerebral organoids generate a high diversity of telencephalic cell types Cortical abnormalities are a key pathological feature of ASD (6, 17). Many ASD risk genes related to transcriptional regulation and chromatin remodeling pathways are crucial for cortical development (18,19). However, it was previously not possible to explore which cell types are affected when and how by these risk genes. We thus sought to leverage our methodology to explore loss-offunction phenotypes for 36 transcriptional control genes with high causal confidence (SFARI database category 1 genes) (5). We used previously established protocols which reproducibly generate human telencephalon tissues from human pluripotent stem cells (20,21) (Fig. S4a-c). Single-cell transcriptomic profiling (10X Genomics Chromium) of cerebral organoids at 4 months revealed a large diversity of cell types composing the developing dorsal and ventral telencephalon ( Fig. 1f-h, Fig. S5a (25). Interestingly, we found a cluster of interneurons expressing MEIS2 and strong PAX6 (LGE_PAX6), which transcriptionally resembles mouse olfactory bulb precursors, and was recently reported to be redirected to white matter specifically in primates (26). In addition to neuronal populations, we also identified glial cell populations including astrocytes (S100B, APOE, ALDH1L1) and oligodendrocyte precursor cells (OPC; OLIG2, PDGFRA) ( Fig. 1f-g). RNA velocity analysis (27,28) revealed general developmental trajectories from neural progenitor cells to neuronal populations in both the dorsal and ventral telencephalon (Fig. 1i). Taken together, our scRNA-seq dataset of 4-month-old cerebral organoids recapitulates diverse telencephalic cell types that are present in the developing human brain.
Perturbing ASD risk genes depletes dorsal IPCs and enriches ventral RGCs The large cellular diversity detected in our organoids allows us to systematically assess, compare, and categorize the effects of perturbations on cell fates. Using targeted amplification and hemi-nested emulsion PCR, we first recovered sgRNA information and assigned 28,344 cells to unique sgRNAs (Methods, Fig. S5d-e). Using a Cochran-Mantel-Haenszel test stratified by library, we measured the differential abundance of overall dorsal (D) versus ventral (V) telencephalic cells, as well as the abundance of each individual cell type in the various perturbations versus a non-targeting sgRNA control (Fig. 2a, Fig. S6). For 24 perturbations, we detected significant changes in the The CHOOSE system for multiplexed screening of ASD risk genes in human cerebral organoids. a, CHOOSE system overview. Barcoded dual-sgRNA cassette located within the 3' LTR of the lentivirus. b, Reporter assay to test gRNA efficiencies for 36 ASD risk genes. c, Editing efficiencies of gRNAs determined by flow-cytometry.
Plots show examples of gRNAs with no or efficient editing. d, sgRNA sequence read distributions of gRNAs sequenced from the ASD plasmid library, lentivirus infected hESCs, and EBs at day 5. e, Numbers of clones from the starting hESCs for each perturbation used to generate mosaic cerebral organoids. f, UMAP embedding of the scRNA-seq dataset containing dorsal and ventral telencephalon trajectories. g, Subclustering and UMAP embedding of the ventral telencephalon trajectory excluding astrocytes and ccv-RGC to annotate OPC clusters. h, Heatmap shows the expression of marker genes in different cell types. i, RNA velocity and velocity pseudotime inference in dorsal and ventral trajectories to show developmental directions.
ratio of dorsal to ventral (D-V) cell populations (orange to purple, Fig. 2a, left). Interestingly, most perturbations (21/24) lowered the D-V abundance ratio. For 21 perturbations, we detected changes in the abundance of at least one cell type (CMH-test, FDR < 0.05) (red to blue, Fig. 2a, right). Perturbation of KMT2C, for example, led to an overall depletion of cells with dorsal identity and an enrichment of ventral cells, whereas 8 perturbations specifically targeted one cell type without affecting the others, such as ADNP (L2/3 enrichment), MED13L (L2/3 depletion), ILF2 (CGE_IN enrichment), and DDX3X (CGE_IN depletion). Perturbations of LEO1 and TCF20 caused an enrichment of L4 neurons, which is a cell population critical for rapid sensory response that serves as the first level of sensory signal processing in cortical neurons (29). This is consistent with the fact that sensory and in particular auditory and visual processing concerns are one of the prevalent ASD symptoms (30).  . Colors indicate the sign of the log odds ratio multiplied by the -log10 FDR-corrected p-value of a CMH test stratified by library (signed -log10 FDR). b, e, Accumulative differential density of perturbations versus control across a binned pseudo-temporal axis. Each color represents one perturbation. c, f, Heatmaps show the expression of genes used to define developmental stages. d, g, Heatmaps show enrichment of gRNAs in annotated developmental stages for each trajectory.
we found that astrocytes were affected by 2 perturbations, although in the opposite direction (enrichment in DEAF1 and depletion in LEO1 perturbations). Collectively, the CHOOSE system allowed us to simultaneously investigate the effects of numerous ASD genes on cell fate determination. We found that progenitor populations, including dorsal IPCs and ventral RGCs, are among the most affected cell types, which indicates that ASD pathology could already emerge as early as the neural progenitor stage.

Perturbing ASD risk genes impairs upper-layer neuronal specification and ventral progenitor differentiation
scRNA-seq captures transcriptional dynamics in developing tissues and therefore allows the analysis of cell state transitions (31). Using RNA velocity and gene expression patterns, we first defined developmental stages along both the dorsal and ventral telencephalic trajectory covering progenitor proliferation (VIM), onset of neurogenesis (dorsal: NEU-ROG2; ventral: ASCL1), differentiation (dorsal: BCL11B, NEUROD6; ventral: ASCL1, DLX2, OLIG2), and neuronal subtype specification (dorsal: NEUROD2, NEUROD6, BCL11B, SATB2; ventral: MEIS2, NR2F2, DLX5) ( Fig. 2cf). We then calculated the differential density for cells of each perturbation versus control along a binned pseudo-temporal axis within the dorsal and ventral telencephalic trajectory, respectively ( Fig. 2b-e, Fig. S7). Developmental stage-based differential testing identifies effects of perturbations on cell states with increased resolution. For example, in the dorsal trajectory, we found an accumulative high density of perturbed cells in an early stage of upper layer neuron specification, as indicated by decreasing BCL11B and increasing SATB2 expression (Fig. 2b, c). Differential abundance tests revealed that perturbations of ASH1L, ASXL3, and TCF20 were enriched in the upper layer specification stage (Fig. 2d) despite the fact that we did not detect differential abundance of upper layer neurons when comparing to all cell types (Fig. 2a). Thus, the combined analyses of cell states and developmental progression favor the hypothesis that perturbations of these genes cause accelerated upper layer neurogenesis. Additionally, analyses of the ventral trajectory show an enriched density at the stages of neurogenesis and differentiation caused by multiple perturbations (high DLX2 and OLIG2 expression), suggesting that this developmental process is susceptible to ASD genetic perturbations ( Fig. 2e-g). Interestingly, other perturbations which did not cause cell type compositional changes nonetheless affected specific processes along a developmental trajectory. For example, disrupting the chromatin remodeler SRCAP results in reduced dorsal neural stem cell proliferation, which is consistent with a recently identified role of SRCAP in stem cell self-renewal (32) (Fig. 2d). Overall, our perturbation-specific characterizations of differential abundance of cell types as well as of stages along developmental trajectories together create a unique resource to delineate ASD high-risk gene loss-of-function phenotypes.

CHOOSE identifies dorsal and ventral telencephalon-spe-
cific dysregulated genes To further assess molecular changes caused by each perturbation, we performed differential gene expression analysis comparing each perturbation to control within the combined clusters of dorsal and ventral identity and detected in total 885 differentially expressed genes (DEGs) ( Fig. 3a; Supplementary Data 1). We ranked DEGs by detection frequency and discovered that certain genes, or gene families, are identified in multiple perturbations (Fig. 3b). Interestingly, in the dorsal populations, perturbations of DEAF1, FOXP1 and KDM6B, all lead to the down-regulation of CHCHD2 (Fig. 3a), a gene encoding a mitochondrial coiled-coil-helix-coiled-coil-helix domain containing protein and associated with Parkinsons disease (PD) (32). CHCHD2 was also found to be downregulated in both excitatory and inhibitory neuronal populations in postmortem brain tissues from several autism patients (33). Additionally, 4 other PD or ubiquitin proteasome pathway associated genes are dysregulated, including PRKN (LEO1 perturbation), PACRG (SETD5 perturbation), HECW1 and PDZRN3 (SRSF11 perturbation) (Fig. 3a). Of note, ubiquitin genes have been identified to cause autism (34). Our findings support the role of a dysregulated ubiquitin proteasome system in the pathophysiology of ASD during development. In the ventral cell populations, the most frequently detected DEG is the adhesion molecule CSMD1 (Fig. 3a, b), which is upregulated by ARID1B, CIC, MED13, and PHF3 perturbations, and downregulated by LEO1 and KMT2C perturbations. Ion channel deficiency has been implicated in NDDs and is strongly associated with neurological conditions such as epilepsy (35). We found that genes encoding voltage-gated potassium channels (VGKCs), including KCNB2, KCND2, KCNE5, KCNH6, and KCNH7, are differentially expressed in at least one of seven perturbations (ARID1B, CIC, ILF2, KMT2C, LEO1, PHF3, TBL1XR1) in the ventral trajectory (Fig. 3a). VGKCs play an important role in regulating neuronal excitability and have been linked to ASD (36). Interestingly, genes related to sodium channels are specifically dysregulated in the dorsal trajectory ( Fig. 3a, top, SCN2A upregulation in ILF2 and KDM6A perturbations, SCN3A upregulation in BAZ2B perturbation), but not in the ventral trajectory. These data suggest dorsal and ventral telencephalonspecific channel dysregulation caused by ASD genetic perturbations.
CHOOSE combined with GRN inference from single-cell multiome data identifies ASD-associated transcription regulatory modules When combining the DEGs from all perturbations, we found that they were significantly enriched in risk genes associated with ASD (SFARI database, 1031 genes; 2-fold enrichment, p < 10 − 5; Fig. 3c, d; Supplementary Data 1). Interestingly, however, we did not observe enrichment in risk genes for other neurodevelopmental disorders such as intellectual disability (ID; SysID database, 936 primary ID genes; Fig. 3c) indicating that certain biological processes and developmental regulatory programs are specifically susceptible to ASD-associated gene perturbations. To explore these potential gene regulatory 'hubs', we first generated single-cell multiome data including single cell transcriptome and chromatin accessibility modalities from 4month-old cerebral organoids ( Fig. S8a-d). Using Pando (37), we could harness these multimodal measurements to infer a GRN of the developing telencephalon and extract sets of genes regulated by each transcription factor (TF modules) (Fig. S8e,f). We visualized this GRN on the level of TFs us-ing a UMAP embedding (38), which revealed distinct groups of TFs active in NPCs (PAX6, GLI3, OLIG1), inhibitory neurons (DLX2, DLX6) and excitatory neurons (NEUROD2, NFIB, SATB2) as well as regulatory interactions between the TFs (Fig. 3d). To test whether regulatory sub-networks indeed exist at which ASD risk genes accumulate, we tested all TF modules for enrichment with SFARI genes. We found significant enrichment for a set of 40 TFs (adjusted Fisher test p-value < 0.01, >2-fold enrichment; e.g. EOMES, OLIG1, DLX2)( Fig.  S8g), among which 14 TFs are ASD risk genes (e.g. NFIA, BCL11A, MEF2C) (Fig. 3d). All TF regulatory modules enriched in SFARI genes together form an ASD-associated sub-GRN. Next, we assessed the transcriptomic effect of ASD genetic perturbations in the context of the inferred GRN. We performed enrichment tests on perturbation-induced DEGs (CHOOSE-DEGs) from dorsal and ventral telencephalic cells separately. We found that, similar to ASD risk genes, CHOOSE-DEGs were enriched in specific TF modules (Fig.  3f). In the ventral telencephalic cells, OLIG1 and NFIA were most strongly affected, whereas dorsal telencephalonspecific DEGs were most strongly enriched in NEUROD4, SOX4 and EOMES modules. Interestingly, some of the ASD-associated TF modules were among the most strongly enriched in CHOOSE-DEGs, supporting their role in ASDassociated gene dysregulation (Fig. 3e). We finally present gene regulatory subnetworks of OLIG1 and EOMES, two important genes whose regulomes are both enriched in ASD risk genes and strongly affected by ASD genetic perturbations (Fig. 3f). Taken together, we have characterized gene expression changes for each genetic perturbation in both dorsal and ventral telencephalon and uncovered novel molecular programs shared between different perturbations. Leveraging GRN inference from single-cell multiomic data, we further identified ASD-associated TF modules during cortical development and critical regulatory hubs underlying the detected gene expression changes.

ARID1B perturbation increases the transition probability of
v-RGCs to early OPCs Amongst the 36 genes, we found that ARID1B perturbation causes one of the most dramatic enrichments of v-RGCs (Fig. 2a). Interestingly, the OLIG1 regulatory module was also specifically enriched in DEGs caused by ARID1B perturbation (Extended Data Fig. 9). These data intrigued us to further investigate how v-RGCs are affected by ARID1B perturbation. We used Cellrank (39) to resolve the developmental trajectories leading to different inhibitory neuron types as well as OPC populations (Fig.  4a). We visualized the terminal fate probabilities for each cell as a circular projection, which revealed a distinct differentiation trajectory from ventral progenitors towards early OPCs (OLIG2, PDGFRA) and a branching of INPs (DLX2) into different inhibitory neuronal fates (DLX5) (Fig. 4b).
We found that ARID1B perturbed cells were strongly enriched in the OPC trajectory and had a higher percentage of OLIG2+ v-RGCs (Fig. 4c, d). This is an interesting find- ing given that OLIG2 is known to regulate progenitor selfrenewal at earlier developmental stages and is a master regulator for oligodendrocyte lineage specification in the ventral telencephalon (40,41). We then analyzed fate transition probabilities of ventral progenitors and found that ARID1Bperturbed v-RGCs have significantly higher transition probabilities towards early OPC than neuronal fates (Fig. 4e).
Loss-of-function mutations in ARID1B have been identified to cause ID and ASD (5,42). To confirm whether our findings are relevant to human disorders, we recruited two patients with ARID1B heterozygous mutations. Patient 1 harbors a nucleotide duplication (c.2201dupG) that results in a frameshift and an early STOP codon, and Patient 2 carries a microdeletion (6q25.3del) that includes exon 8 and the downstream region of the ARID1B locus (Fig. S10a, b). We established iPSC lines from both patients and a mutationcorrected cell line for patient 1 as an isogenic control. We then generated cerebral organoids with enriched ventral te-lencephalon regions by adding patterning factors SAG and IWP2 and investigated the behavior of v-RGCs at day 40 (43). In organoids from both patients, we observed considerably increased OLIG2+ cells, as well as DLX2+, and DLX2+/OLIG2+ cells compared to the control, which is in line with our previous findings (Fig. 4g). Finally, to further explore the potential consequences of such defects in patients, we analyzed the prenatal brain structure of patient 2 using intra-uterine super-resolution MRI obtained at two gestation stages (GW22 and GW31). 3D reconstruction of the LGE and CGE, sources of ventral telencephalon progenitors, revealed an enlarged volume compared to multiple age-matched controls at both stages (Fig. 4h, Fig. S10c). Taken together, the enrichment of v-RGCs and ccv-RGCs (Fig. 2a), the higher transition probability of v-RGCs to early OPCs, and the increased proportion of OLIG2+ cells in our CHOOSE experiment and in organoids generated from two patient iPSC lines, all suggest that ARID1B perturbation leads to abnormal ventral progenitor expansion and aberrant cell fate specification (Fig. 4i). The enlarged volume of LGE and CGE in the ARID1B patient is consistent with these observations.

Discussion
We have developed the CHOOSE system to characterize the loss-of-function phenotypes of 36 high-risk ASD genes across dozens of cell types spanning early brain developmental stages in human cerebral organoids. Our findings provide a developmental and cell-type specific phenotypic database for ASD high-risk gene loss-of-function research, which will shed light on the disease pathogenesis. We assessed phenotypes affecting dorsal and ventral telencephalic cell populations during brain development. IPCs are transit-amplifying dorsal progenitors which generate neurons for all cortical layers and contribute to the evolutionary expansion of the human cortex (44,45). We found that IPCs appeared to be particularly susceptible to ASD genetic perturbations. Interestingly, some of the perturbed genes have previously been suggested to play a role in IPC regulation within a disease context. For example, brain organoids derived from a patient with an MECP2 mutation were shown to have decreased number of IPCs (46). Among the cells with ventral identity, we identify a strong enrichment of progenitor cells, including v-RGs and ccv-RGs, which are the major progenitor pools differentiating into interneurons and oligodendrocytes (40,47). Compared to CGE-INs, which can be depleted or enriched, we found that LGE-INs can only be enriched by the perturbations. Our results establish a basis for dissecting the relevance of CGE and LGE-IN dysfunction in ASD pathophysiology, which is important considering recently discovered human-specific CGE and LGE interneuron migration features and function (21,26,48). Additionally, DEG analyses revealed that multiple perturbations impaired the same molecular processes, which may be critical for the development of ASD pathophysiology. For example, we found that specifically in the dorsal trajectory, genes related to Parkinsons disease and the ubiquitin system are frequently dysregulated. Interestingly, parkinsonism features have been reported in older autistic adults (49). Our findings further suggest that ubiquitin dysfunction could already be emerging during early development stages. Furthermore, we constructed a GRN underlying telencephalon developmental trajectories and identified ASDassociated regulatory modules in dorsal and ventral cell populations. The OLIG1 regulatory module is particularly interesting as many of its downstream targets are ASD risk genes (SFARI database). This regulatory module was previously identified as being important for oligodendrocyte differentiation in the developing human cerebral cortex (19). Thus, our findings highlight the involvement of the oligodendrocyte population in the development of ASD pathophysiology. Finally, we discovered that loss of ARID1B, a BAF complex component, leads to increased transition of ventral progenitors to early OPCs. Interestingly, the OLIG1 regulatory module is also affected by ARID1B perturbation. The role of the BAF complex in oligodendrocyte generation and maturation has been mainly established by studying one of its components, SMARCA4, which is required for OPC differentiation (50). Considering the temporal and cell-type specific expression of each BAF subunit, it would be interesting to investigate how ARID1B or other subunit-mediated chromatin remodelers regulate oligodendrocyte specification and how their dysfunction contributes to neurodevelopmental disorders. The ability to determine cell-type specific contributions to genetic disorders in a systematic, scalable, and efficient manner will greatly facilitate our understanding of disease mechanisms. As the CHOOSE system provides a robust, precisely controlled screening strategy, we anticipate it will be widely applied beyond brain organoids to study disease-associated genes.

CHOOSE screen
sgRNA selection and cloning Top 4 sgRNAs were first selected based on the predictions using multilayered VBC score (52) and then subjected to the reporter assay (see below) to test editing efficiency. sgRNAs were cloned into the gRNA reporter assay lentivirus construct (containing dual-sgRNA cassette: U6-sgRNA1-H1-sgRNA2) using the GeCKO cloning protocol (53). The two sgRNAs were cloned using IIs class restriction enzymes FastDigest BpiI (ThermoFisher, cat. no. FD1014) and Esp3I (ThemrmoFisher, cat. no. FD0454) separately and verified using sanger sequencing. All gRNAs used for this study can be found in the Extended Data Table 1.
sgRNA reporter assay A construct containing dTomato-2A-gRNA target arrary-TagBFP under the RSV promoter was assembled using Gibson assembly. The construct was packaged into retrovirus using the Platinum-E retroviral packaging cell line via calcium phosphate-based transfection method. Virus-containing supernatant (DMEM/10% FBS/2mM L-Glutamine/ 100 U/ml Penicillin/ 0.1 mg/ml Streptomycin) was collected up to 72 hours, filtered through 0.45 µm filter and then stored on ice. Retroviruses were then used to infect NIH-3T3 cells and BFP and dTomato positive cells were sorted using flow cytometry into single cells to establish reporter cell lines. To deliver sgRNAs, the lentiviral construct containing the dual-gRNA cassette and SFFV driving eCas9 were packaged using HEK293 cells to produce lentivirus. The reporter 3T3 cell lines generated above were cultured in 6-well plates and infected with lentivirus containing dual-sgRNA cassette targeting each gene individually. BFP fluorescent was measured at 7, 14, and 20 days post infection (dpi). Fluorescent changes at 20 dpi were used to evaluate the gRNA editing efficiency. In total, 98 dual-sgRNA cassettes were tested for 36 genes.

Generation of barcoded CHOOSE lentiviral pool, hESCs infection and EB generation
The CHOOSE lentiviral vector was constructed based on a previously published lentiviral vector which carries a CAG driving ERT2-Cre-ERT2-P2A-EGFP-P2Apuro cassette (13). A multi-cloning site including NheI and SgsI recognition sequences was introduced to the 3' LTR of the lentivirus backbone according to the CROP-seq vector design (14). Then the original U6-sgRNA expression cassette was removed, instead the dual-sgRNA (U6-sgRNA1-H1-sgRNA2) cassette was introduced to the 3' LTR cloning site. To generate a barcoded library, the following primers were used to individually amplify (8-10 cycles, monitored using a qPCR machine, stopped when reaching to logarithmic phase) each dual-sgRNA cassette from the lentiviral construct used in the reporter assay, while introducing a 15 bp barcode. FW primer: 5'-tcgaccgctagcagggcctatttcccatga-3'. RV primer: 5'-cagtagggcgcgccNVDNHBNVDNHBNVDccggcgaaccatgatcaaa-3' Equal molar amount of amplicons for ASD library (36 paired sgRNAs targeting ASD genes) or control library (a paired non-targeting control gRNAs) were pooled. Amplicons and lentiviral backbone were then digested with FastDigest NheI (Thermofisher, cat. no. FD0973) and FastDigest SgsI (Thermofisher, cat. no. FD1894) and gel purified. Ligation was performed using T4 DNA ligase (Thermofisher, cat. no. EL0011) and cleaned up by Phenol-Chloroform extraction. In total, 90 ng of ASD library plasmids and 30 ng of control library plasmids were used for electroporation of MegaX DH10B T1R Electrocomp TM Cells (Thermofisher, cat.no. C640003) following the manufacturer's guide. Bacteria were plated on LB media plates containing ampicillin. Dilutions were performed to calculate the complexity. 2.6 X 10 7 colonies were obtained for the ASD library and 0.5 X 10 7 colonies were obtained for the control library. Lentivirus were packaged using HEK293T cells and infection of hESC were performed as before (13). Infection rate was controlled to be lower than 5% to prevent multiple infections (15). 6.6 X10 5 ASD library cells and 2.3 X10 5 control library cells positive for GFP were sorted by flow cytometry. Cells were recovered and passaged two times in 10cm dishes to maintain maximum complexity. Cells were mixed with a ratio of 96: 4 (ASD: Control) and then used to make EBs. Organoids were cultured using conditions described above.
Cerebral organoid tissue dissociation and scRNA-seq For each library, 3-6 organoids at four months were pooled, washed twice in DPBS-/-and dissociated using gentleMACS TM dissociator in trypsin/accutase (1X) solution with TURBO TM DNase (2 ul/mL, ThermoFisher, cat. no. AM2238). After dissociation, DPBS-/-supplemented with 10% FBS (DPBS/10%FBS) was gradually added to stop reaction. Samples were then centrifuged at 400g for 5min at 4řC and the supernatant was aspirated without touching the pellet. The pellet was then resuspended in additional 1-2 ml of DPBS/10%FBS, filtered through 70 µm strainer and FACS tubes. Cells were then stained with viability dye DRAQ7 (Biostatus, DR70250, 0.3mM). Target live cells have been sorted with a BD FACSAria TM III on Alexa 700 filter with low pressure (100 µm nozzle) and collected in DPBS/10%FBS at 4řC. Cells were then centrifuged and resuspended in FBS/10%PBS to achieve a target concentration of 450-1000 cells per ul. Samples with more than 85% viability will be processed. For each library, 16,000 cells were loaded onto a 10X chromium controller to target a recovery of 10, 000 cells. 8 libraries (31 organoids in total) using the Chromium Single Cell 3' Reagent Kits (v3.1) were prepared following the 10X user guide. Libraries were sequenced on a Novaseq S2 flow cell with 4.6 billion paired-end reads.
Custom genomic reference Each cell expresses eCas9 from a genomic locus (AAVS1) and a polyadenylated dual-sgRNA cassette, which is delivered by lentivirus and integrated into the genome. To cover these extrinsic elements, we built a custom genomic reference for mapping 10x single-cell data by amending the GRCh38 human reference. As the individual gRNA sequences differed, we masked them by N's not to interfere with mapping (individual gRNA information is distinguished in a separate counting pipeline). The sequences added covered the genomic loci of AAVS1 with eCas9-dTomato-WPRE-SV40 and the masked lentiviral construct.
Emulsion PCR and target amplification Emulsion PCR was used to recover gRNA and unique clone barcode (UCB) sequences from plasmid libraries, genomic DNA extracted from lentivirus infected hESCs, and 10X single cell cDNA libraries to reduce PCR bias and to prevent the generation of chimeric PCR products (54,55). AmpliTaq Gold TM 360 master mix (Thermofisher, cat. no. 4398876) was used for all PCR reactions. Emulsion PCR was performed using Micellula DNA Emulsion & Purification Kit (EURX, cat. no. E3600) according to the manufacturer's guide. For target amplification from 10X single cell libraries, hemi-nested emulsion PCR were performed using the following primers: 1st PCR FW: 5'-gcagacaaatggctgaacgctgacg-3' RV: 5'-ccctacacgacgctcttccgatct-3' 2nd PCR FW: 5'-ggagttcagacgtgtgctcttccgatcttgggaatcttataagttctgtatgagaccactctttcc-3' RV: 5'-ccctacacgacgctcttccgatctx-3' Amplicons were then indexed with unique NEB dual indexing primers and amplifications were monitored in a qPCR machine and stopped when reaching the logarithmic phase. Amplicons were sequenced using Illumina Nextseq2000 or Novaseq6000 systems. All primers used can also be found in Supplementary Table 1. gRNA and UCB recovery and analyses gRNA sequences were extracted by cutting 5'-and 3'-flanking regions with cutadapt (10% error rate, 1nt/3nt overlap, no indels) (56). Sequences were filtered to be between 15nt to 21nt long. The corrected cell barcode (CBC) and unique molecular identifier (UMI) of each read was derived via the 10x Genomics Cell Ranger 6.0.1 alignment. Only reads with a corresponding GEX cell were accepted. Reads and target sequences were joined by allowing partial overlaps and hamming distances of 2. Reads are counted towards unique CBC-UMI-gRNA combinations. A read count cutoff of 1% of median read count of the UMI with highest reads count per cell was applied. Cells with only one gRNA and more than 1 read were kept. In addition, within unique CBC-UMI combinations, only gRNA with more than 20% of the maximal read count of that group were kept. After read filtering, UMI were counted for each CBC-gRNA combination. If more than one gRNA were found within a cell, only the gRNAs with equal UMI count compared to the maximum UMI count were kept. Only 1-to-1 combinations were considered further. Analogous to gRNA extraction, UCB was extracted with at least 6nt overlap to the flanks. Sequences with 12nt length were selected and had to follow the synthesis pattern. Further processing was done analogous to gRNA.
Preprocessing and annotation of single-cell transcriptomics data We first aligned reads to the above defined custom genomic reference with Cell Ranger 6.0 (10x Genomics) using pre-mRNA gene models and default parameters to produce the cell-by-gene, Unique Molecular Identifier (UMI) count matrix. UMI counts were then analyzed in R, using the Seurat v4 (57). We first filtered features detected in min 3 cells. Next, we filtered high quality cells based on the number of genes detected (min. 1000, max. 8000), removing cells with high mitochondrial (<15%) or ribosomal (<20%) mRNA content. Additionally, cells for which no gRNA could be detected were filtered out. Thereafter, expression matrices of high-quality cells were normalized ("LogNormalize") and scaled to a total expression of 10K UMI for each cell. Principal component analysis (PCA) was 2000 most variable features (RNA). For each cluster, peak accessibility was summarized by computing the arithmetic mean from binarized peak counts so that each cell in the cluster was represented by the detection probability vector of each peak.
To constrain the set of peaks considered by Pando, we used the union of PhastCons conserved elements (64) from an alignment of 30 mammals (obtained from https://genome.ucsc.edu/) and candidate cis-regulatory elements (cCREs) derived from the ENCODE project (65)(initiate_grn()). In these regions we scanned for TF motifs (find_motifs()) based on the motif database shipped with Pando, which was compiled from motifs derived from JASPAR and CIS-BP. Based on motif matches, cell-level log-normalized transcript counts and cluster-level peak accessibilities, we then inferred the GRN using the Pando function infer_grn() (peak_to_gene_method = 'GREAT', upstream = 100,000, downstream = 100,000) for the 5000 most variable features. Here, genes were associated with candidate regulatory regions in a 100,000 radius around the gene body using the method proposed by GREAT (66). From the model coefficients returned by Pando, TF modules were constructed using the function find_modules() (p_thresh = 0.05, rsq_thresh = 0.1, nvar_thresh = 10, min_genes_per_module = 5). To visualize subnetworks centered around one TF, we computed the shortest path from the TF to every gene in the GRN graph. If there were multiple shortest paths, we retained the one with the lowest average p-value. The resulting graph was visualized with the R-package ggraph (https://github.com/thomasp85/ggraph) using the circular tree layout.
Enrichment testing for TF modules To find subnetworks of the GRN at which ASD-associated genes accumulate, we first obtained a list of ASD risk genes from SFARI (https://gene.sfari.org/database/gene-scoring/). For all genes included in SFARI (1031 genes), we tested for enrichment in TF modules using a Fisher exact test. All genes expressed in >5% of cells in our dataset (12079 genes) were treated as the background. Fisher test p-values were multiple testing corrected by using the Benjamini-Hochberg method and significant enrichment was defined as FDR < 0.01 and > 2-fold enrichment (odds ratio). To assess which TF modules were most affected by genetic perturbations of ASD associated genes we similarly used a Fisher exact test. For the set of TOP-DEGs we tested for enrichment in any of the inferred TF modules. Here, all genes included in the GRN (5000 most variable features) were treated as the background.
Cellrank analysis To better understand the differentiation trajectories leading up to inhibitory neuron populations, we used CellRank (39) to compute transition probabilities into each terminal fate based on the previously computed velocity pseudotime. First, the clusters with the highest pseudotime for each terminal cell state were annotated as terminal states. We then constructed a Palantir kernel (PalantirKernel()) (67) based on velocity pseudotime and used Generalized Perron Cluster Cluster Analysis (68)(GPCCA()) to compute a terminal fate probability matrix (compute_absorption_probabilities()). All cellrank functions were run with default parameters. Fate probabilities for each cell were visualized using a circular projection (69). In brief, we evenly spaced terminal states around a circle and assigned each state an angle α t . We then computed 2D coordinates (x i , y i ) from the F ∈ R N ×n t transition probability matrix for N cells and n t terminal states as To visualize enrichment of perturbed cells in this space, we used the method outlined in Nikolova et al. (60). Here, the kNN graph (k=100) was computed using euclidean distances in fate probability space and enrichment scores were visualized on the circular projection. Otherwise the method was performed as described above.
Immunofluorescence Organoid tissues were fixed in paraformaldehyde at 4řC overnight followed by washing in PBS three times for 10 min. Tissues were then allowed to sink in 30% sucrose overnight, followed by embedding in O. Microscopy, image processing and quantification Tissue sections were imaged using an Olympus IX3 Series inverted microscope, equipped with a dual-camera Yokogawa W1 spinning disk. Images were acquired with 10x/0.75 (Air) WD 0.6 mm or 40X 0.75 (Air) WD 0.5 mm objectives and produced by the Cellsense software. For DLX2 and OLIG2 quantification, images were processed and quantified using Fiji. Based on the size of the tissue, 5-12 regions from each organoid were selected using the Hoechst channel. In total, 108 areas (13 organoids from 4 batches) from ARID1B control group (c.2201dupG repair), 104 areas (15 organoids from 4 batches) from ARID1B+/-(c.2201dupG) group, and 94 areas (15 organoids from 3 batches) from ARID1B+/-(6q25.3del) group are collected and subjected to an automatic segmentation using a Fiji macro. Both DLX2 and OLIG2 channels are used to define the cell body area, followed by the intensity measurement. Area mean intensity was used for setting up the threshold.

Patient sample collection
The study was approved by the local ethics committee of the Medical University of Vienna (MUV). Study inclusion criteria were as follows: 1) mutation in the ARID1B gene proven by whole exome sequencing; 2) age between 0 and 18 years; 3) continuous follow-up at the Vienna General Hospital; 4) availability of fetal brain MRI data. After informed consent, 10 ml of blood were collected from two selected patients for iPSC reprogramming.
Reprogramming of PBMCs into iPSCs IPS cells were generated from PBMCs isolated from patient blood samples as previously described (70). Briefly, 10 ml blood was collected in sodium citrate collection tubes. PBMCs were isolated via a Ficoll-Paque density gradient and erythroblasts were expanded for 9 days. Erythroblast-enriched populations were infected with Sendai Vectors expressing human OCT3/4, SOX2, KLF4 and cMYC (CytoTune, Life Technologies, A1377801). Three days after infection, cells were switched to mouse embryonic fibroblast feeder layers. Five days after infection, the medium was changed to IPSC media (KoSR + FGF2). Ten to 21 days after infection, the transduced cells began to form colonies that exhibited IPSC morphology. IPSC colonies were picked and passaged every 5 to 7 days after transfer to the mTeSR culture system (Stemcell Technologies).
Generation of isogenic control cell line for ARID1B patient Isogenic control cell lines of patient 1 were generated using Crispr/Cas9. S. pyogenes Cas9 protein with two nuclear localization signals was purified as previously described (71). gRNA transcription was performed with HiScribe T7 High Yield RNA Synthesis Kit (NEB) according to the manufacturer's protocol and gRNAs were purified via Phenol:Chloroform:Isoamyl alcohol (25:24:1; Applichem) extraction followed by ethanol precipitation. The HDR template (custom ssODN (Integrated DNA Technologies)) was designed to span 100 bp up and downstream of the mutation site. IPSCs had been grown in mTeSR for 14 passages before the procedure. For generation of isogenic control cell lines, cells were washed with D-PBS-/-and incubated for 5 min at 37řC with 1 ml of accutase solution (Sigma-Aldrich A6964-500ML). The plate was gently tapped to detach cells and cells were gently pipetted to generate a single cell suspension, pelleted by spinning at 200 g for 3min, and counted using Trypan Blue solution (ThermoFisher Scientific). For nucleofection, 1.0 x 10 6 cells were spun down and resuspended in Buffer R of the Neon Transfection System (Thermo Fisher Scientific) at a concentration of 2x107 cells/ml. 12 ng of sgRNA and 5 ng of Cas9 protein were combined in resuspension buffer to form the Cas9/sgRNA RNP complex. The reaction was mixed and incubated at 37řC for 5 min. 5 µl of the HDR template (100 µM) were added to the Cas9/sgRNA RNP complex and combined with the cell suspension. Electroporations were performed using a Neon®Transfection System (Thermo Fisher Scientific) with 100 µl Neon®Pipette Tips using the ES cells electroporation protocol (1400 V, 10 ms, 3 pulses). Cells were seeded in one matrigel-coated well of a 6 well-plate in mTeSR. After a recovery period of 3 days, a single-cell suspension was generated and cells were split into another well of a 6 well-plate for banking; and sparsely into two 10 cm dishes for colony formation from single cells. After colony growth for one week, individual colonies were picked and seeded each into one well of a 96 well-plate. After colony expansion, gDNA was extracted using DNA QuickExtract Solution (Lucigen), followed by PCR and Sanger sequencing to determine efficient repair of the mutation.
Fetal MRI and 3D reconstruction Women with singleton pregnancies undergoing fetal MRI at a tertiary care center from January 2016 and December 2021 were retrospectively reviewed. This study was approved by the institutional ethics board and all examinations were clinically indicated. A retrospective review of patient records was performed and a patient with a positive genetic testing report for ARID1B mutation was selected. The participant was included in further analysis, and the gestational age (given in gestational weeks and days post menstruation) was determined by first-trimester ultrasound. High-quality superresolution reconstruction was obtained (72). Age-matched control cases were identified and included if they presented with an absence of severe cerebral or cardiac anomalies or fetal growth restriction. Fetal MRI scans were conducted using 1.5 T (Philips Ingenia/Intera, Best, Netherlands) and 3 T magnets (Philips Achieva, Best, Netherlands). The mother was examined in a supine or, if necessary, left recumbent position to achieve sufficient imaging quality. The examinations were performed within 45 min, neither sedation nor MRI contrast medium were applied and both the fetal head and body were imaged. Fetal brain imaging included T2-weighted sequences in three orthogonal planes (slice thickness 3-4 mm, echo time = 140 ms, field of view = 230 mm) of the fetal head. Post-processing was conducted in a similar methodology as before (73,74). Super-resolution imaging was generated using a volumetric super-resolution algorithm (75). The resulting super-resolution data were quality assessed and only cases that met high-quality standards (score 2 out of 5) were included in the analysis. Segmentation of the GE (LGE and CGE) was performed manually using the open-source application ITK-SNAP (75). To delineate the T2-weighted hypointense GE, histological fetal atlantes by Bayer and Altman were used as a reference guide (76)(77)(78). Volumetric data was extracted and calculations for the LGE and CGE were made based on the investigated gestational ages.    S4. Generation of cerebral organoids using the CHOOSE system. a, Schematic showing the protocol used for generating cerebral organoids. 4-OHT was added to induce eCas9 expression at day 5. A short 3-day CHIR treatment was applied from day 12-14. Culture medium was switched to Brainphys after day 60 for advanced neuronal maturation. Tissues are collected at day 120 and subjected to scRNA-seq assays. b, ESCs were infected by lentivirus and GFP positive cells were collected to make EBs. Microscopic images show the overall morphology and GFP expression of brain organoids overtime. Top, bright-field; Bottom, GFP fluorescence. c. Immunohistochemical staining of cerebral organoids at day 24. GFP labels cells infected with lentivirus and Tomato is a reporter for eCas9 expression. SOX2 is a marker for progenitor cells. FOXG1 is a marker for tissues with telencephalon identity.      Bottom, sanger sequencing showing the mutation was repaired and this cell line is used as an isogenic control. Sequencing was done on fragments amplified from genomic DNA extracted from the two iPSC lines. b, SNP array genotyping of PBMCs and iPSCs from patient 2 shows a microdeletion (dark green arrow) identified on chromosome 6. c, Dot plot of the GE (LGE and CGE) volume of patient 2 (dark green circles) at two gestation stages. The white circles represent age-matched controls. A simple linear regression model was constructed using data from all controls. The gray area represents the 95% confidence interval.