Transcriptome analysis of MBD5-associated neurodevelopmental disorder (MAND) neural progenitor cells reveals dysregulation of autism-associated genes

MBD5-associated neurodevelopmental disorder (MAND) is an autism spectrum disorder (ASD) characterized by intellectual disability, motor delay, speech impairment and behavioral problems; however, the biological role of methyl-CpG-binding domain 5, MBD5, in neurodevelopment and ASD remains largely undefined. Hence, we created neural progenitor cells (NPC) derived from individuals with chromosome 2q23.1 deletion and conducted RNA-seq to identify differentially expressed genes (DEGs) and the biological processes and pathways altered in MAND. Primary skin fibroblasts from three unrelated individuals with MAND and four unrelated controls were converted into induced pluripotent stem cell (iPSC) lines, followed by directed differentiation of iPSC to NPC. Transcriptome analysis of MAND NPC revealed 468 DEGs (q < 0.05), including 20 ASD-associated genes. Comparison of DEGs in MAND with SFARI syndromic autism genes revealed a striking significant overlap in biological processes commonly altered in neurodevelopmental phenotypes, with TGFβ, Hippo signaling, DNA replication, and cell cycle among the top enriched pathways. Overall, these transcriptome deviations provide potential connections to the overlapping neurocognitive and neuropsychiatric phenotypes associated with key high-risk ASD genes, including chromatin modifiers and epigenetic modulators, that play significant roles in these disease states.


Materials and methods
This study was approved by the Baylor College of Medicine Institutional Review Board. All samples and information were collected after obtaining written informed consent. Whole blood and skin samples were provided by each participant. Participant recruitment, sample collection, and all experiments were performed in accordance with relevant guidelines and regulations by the Baylor College of Medicine Institutional Review Board and associated regulatory committees.
Reprogramming skin fibroblasts into iPSC with episomal plasmids. Early passages (between passages p3 and p5) of skin fibroblasts were cultured at low oxygen atmosphere of 4% O 2 /5% CO 2 /91% N 2 and maintained in the condition for the entire reprograming period to enhance iPSC generation 21,26 . Episomal vectors carrying shRNA for p53 suppression and non-transforming MYCL, in addition to the usual reprogramming genes, POU5F1, SOX2, KLF4, and LIN28 were employed to reprogram the fibroblasts 27 . Three micrograms of combined episomal plasmids (Addgene 27077, 27078 and 27080) were electroporated into 6 × 10 5 fibroblast cells by using a Nucleofector II device (Lonza) and an Amaxa NHDF Nucleofector kit (Lonza) according to the manufacturer's instructions 28 . The electroporated cells were allowed to recover for 2 to 4 days by culturing in the above conditions. Cells (2 × 10 5 ) were placed into 100 mm dishes previously coated with Matrigel (BD Bioscience). The following day the culture medium was switched to mTeSR1 29 (STEMCELL Technologies). Colonies emerged about 14 days post-transduction. Colonies were mechanically isolated around day 20 and expanded under feeder free conditions in mTeSR1 medium on a Matrigel coated substratum.

Gene expression network analysis: biological processes and enrichment.
To identify the prevalent biological pathways affected in MAND due to a 2q23.1 deletion, we used four independent pathway analysis databases (BIOCARTA, KEGG 34 , PANTHER 35 and REACTOM 36 ) to incorporate and assess the 468 differentially expressed genes (significance cut off p = 0.05). Since each of these approaches utilize different methodologies and algorithms, we conducted each analysis separately to provide the most comprehensive approach to pathway interrogation. We also utilized Gene Set Enrichment Analysis (GSEA) 37 with default parameters for Gene Ontology (GO) analysis of biological processes, molecular functions and pathways. To identify the enrichment of the 468 differentially expressed genes by chromosomal location, we utilized Enrichr 38 . This enrichment analysis relied on ranking the list of genes based on statistical significance with the default settings of GSEA 39 .

Results
Identification of MAND (2q23.1 deletion syndrome) patients. Three individuals with 2q23.1 deletion syndrome were identified clinically by chromosomal microarray analysis (CMA) (Fig. 1a, Supplementary Table 1). Despite the differently sized deletions (Fig. 1a), all deletions include MBD5 exon 1, containing the transcription start site, as well as the MBD5 translation start site (located in exon 6), and as a consequence, all three patients have ~ 50% MBD5 mRNA expression (p < 0.0001) resulting in haploinsufficiency (Fig. 1b). These three patients have phenotypes commonly observed in MAND, including developmental delay, intellectual disability, sleep disturbance, seizures, language delay, hypotonia, mild dysmorphic features, and ASD-associated behaviors (short attention span and stereotypic, repetitive behaviors), consistent with the most prevalent clinical features of 2q23.1 deletion syndrome 5 . Since the primary phenotypic concerns for patients with 2q23.1 deletion syndrome are neurodevelopmental, we took an approach to investigate the molecular mechanism(s) underlying these phenotypes by creating neural progenitor stem cells from fibroblast cell lines derived from skin biopsies from each patient (Fig. 1c).
Generation and characterization of iPSC derived from MAND patient fibroblasts. Skin fibroblast cell lines from the three MAND patients and four normal controls were developed within one month from the time of the biopsy. The fibroblasts were expanded, stored, and then transduced to produce iPSCs in a feeder-free and defined culture system ( Supplementary Fig. 1). The generated iPSC colonies expressed pluripotent markers, alkaline phosphatase, POU5F1 (also known as OCT4), NANOG, and TRA-1-60 ( Supplementary  Fig. 1). Programming efficiencies for MAND1-3 were 0.016, 0.4, and 0.022%, respectively. The successfully generated colonies were banked, and we utilized a subset of the iPSC lines to characterize MAND1-iPSC, MAND2-iPSC, MAND3-iPSC ( Fig. 2a,b, Supplementary Fig. 2c), and CTR1-iPSC, CTR2-iPSC, CTR3-iPSC, and CTR4-iPSC ( Supplementary Fig. 2a,b). Isolated DNAs from the iPSC lines were analyzed by chromosomal microarray (CMA) to confirm the molecular karyotype for each cell line was preserved and consistent with the original fibroblast line (Fig. 1a). Furthermore, we isolated RNA from these iPSCs and confirmed reduced MBD5 mRNA expression in the MAND-iPSCs (Fig. 2c).

Generation and characterization of NPCs derived from MAND-iPSCs.
Using an established neural induction monolayer culture protocols (STEMCELL Technologies) 42,43 , MAND and control iPSC lines were differentiated into PAX6 expressing NPCs (Fig. 2a,b & Supplementary Fig. 2). Expression of the pluripotent stem cell marker NANOG and POU5F1, which was initially expressed in 100% of the iPSCs before the differentiation, had decreased to undetectable level. Another neural stem cell marker NES was expressed (Fig. 2b). Finally, DNA and RNA were isolated for CMA and RT-qPCR studies, respectively, to confirm cells maintained their original integrity during differentiation into neural lineage cells. CMAs of each NPC line were consistent with fibroblast and iPSC analyses, and RT-qPCR studies confirmed significantly reduced MBD5 mRNA expression in the MAND-NPC lines (Fig. 2d). investigate the genes, pathways, and biological processes that are disrupted due to MBD5 haploinsufficiency that might provide insight into the etiology of this complex neurodevelopmental condition, we performed RNAseq analysis on these established NPC lines. RNA-seq analysis (n = 3 MAND-NPCs, n = 4 CTR-NPCs) initially identified 498 differentially expressed genes in MAND-NPCs (Fig. 3a). From these 498 genes, there was an even distribution of upregulated and downregulated genes, confirming that MBD5 can act as a positive and negative regulator of gene expression (Fig. 3b). The top differentially expressed (up and down) genes in MAND-NPCs compared to the Control-NPCs are depicted in a heat map (Fig. 3c). The 228 genes up-regulated in MAND-NPCs (Fig. 3b,   To identify the key pathways impacted by the 468 differentially expressed genes, we ran our differential gene list through BIOCARTA and PANTHER pathway databases. Overall, our gene set was enriched in pathways involved in signal transduction, neuronal system, developmental biology, hemostasis, immune system, and disease ( Supplementary Fig. 4). Key pathways that contained MAND differentially Transcriptional impact of MAND-associated 2q23.1 deletion on expression of autism-associated genes. We queried if there was overlap between the differentially expressed genes identified via RNAseq in MAND NPCs and those genes known to be associated with autism. We compared MAND RNA-seq data to syndromic and non-syndromic ASD genes from SFARI (https:// www. sfari. org/ resou rce/ sfari-gene/) (Fig. 4a,  Supplementary Data 3, Tab 1-3). Genes differentially expressed in MAND NPCs show overlap with 4.3% of the ASD associated genes from SFARI (n = 20). We then compared our list with the SFARI autism gene set utilizing ToppCluster 40 . This analysis revealed striking significant overlap between the two gene lists on biological processes, which are in keeping with the neurological phenotype and ASD, and include forebrain and telencephalon regionalization, neuron fate commitment, and forebrain development (Fig. 4b). Interestingly, processes unique to each gene list were also identified, which may be an area for future studies of pathogenesis for each genetic subgroup. Finally, we discovered that several MAND differentially expressed genes are also involved in overrepresented pathways commonly seen in ASD 18,44,45 , including Hippo (8 genes identified in the pathway, i.e. n = 8), MAPK (n = 11), DNA replication/cell cycle (n = 11), and spliceosome formation pathways (n = 7) (Fig. 4c, Supplementary Data 3, Tab 5), providing additional biological connections between MAND and ASD.
FOXG1 expression is significantly increased in MAND NPC. In this study, MAND NPC RNA-seq data revealed significantly increased FOXG1 mRNA levels over the controls, which was confirmed by RT-qPCR (Fig. 5a). FOXG1 is essential for the normal development of the telencephalon, which is one biological process that is highly enriched in the MAND NPC data set ( Supplementary Fig. 3 and Fig. 4b) 48 . Further, duplication of FOXG1 is associated with West syndrome, a neurodevelopmental syndrome that includes epilepsy, developmental delays, and severe speech impairment 46,49,50 , similar to the phenotype observed in MAND. Increased FOXG1 expression was also noted in ASD proband forebrain organoids over the control organoids, and down regulation of FOXG1 in the organoid rescued the ASD-associated phenotype 51 . We compared the growth of MAND NPCs to CTR NPC and found that the doubling time of MAND NPC was significantly slower than CTR NPC (Fig. 5b). Overall, 2q23.1 deletion results in de-repression of FOXG1 and other genes, including SOX11, EZH2, EED, ETV5, and KDM1A, which all function as chromatin modifiers and are implicated in neurogenesis (Supplementary Data 2, Tab 4), further connecting this network of epigenetic modifiers and their critical roles in neurodevelopment.

Discussion
Investigation and understanding of neurodevelopmental disorders has been fueled by the development of iPSC from patients with these complex genetic conditions 14 . In this study, we generated and characterized iPSCs and NPCs that were derived from MAND (2q23.1 deletion syndrome) patients with overlapping phenotypes of ID and autism. By utilizing global transcriptome analysis, we were able identify differentially expressed genes in early neuronal development that are associated with pathways and genes common to ASD. These findings support previous knowledge about the functional role of MBD5 but also provide new insights into the consequences of chromosome 2q23.1 deletion in the regulation of numerous pathways that when altered contribute to the phenotype in 2q23.1 deletion syndrome. In this study, cells were directly collected from patients with MAND and converted to iPSCs and NPCs. This approach allowed us to recapitulate the gene dosage effects in early neuronal developmental stages in human cells. There are advantages of our model compared to previous models to understand MAND. A previous study using an Mbd5 gene-trap mouse model showed abnormal behavior and cognitive impairment 9 ; however, this Mbd5 gene-trap model is not representative of human 2q23.1 deletion syndrome. Another study utilized primary human neural precursor cells that were treated by a short hairpin RNA (shRNA) to suppress MBD5 expression 52 ; however, this methodology resulted only in a single cell line with reduced MBD5 expression, limiting the analysis of data. In the most recent study, Seabra et al. (2020) generated CRISPR-derived neurons with mutation of MBD5 and then conducted RNA-seq 53 . However, MBD5 was not sufficiently dysregulated in the study; thus, the data presented are not truly representative of MAND 53 . The present study utilizing patient-derived NPC provides a representative disease model, consistent with other recent ASD studies that have shown the advantages of using iPSC technology for the study of genetic impact on human early neuronal developmental stage 54 .
Previous studies have shown that 2q23.1 deletions are critical for neurodevelopment and that MBD5 haploinsufficiency is responsible for phenotypes, such as seizures, sleep, speech impairment, and autism [5][6][7]30,55 . These studies fell short in identifying the exact targets of MBD5 and the perturbed biological processes and pathways consequent to MBD5 haploinsufficiency that could explain these neurodevelopmental phenotypes. In our study, we recognize that the dysregulated genes and pathways identified may not necessarily be due to direct effects of MBD5 haploinsufficiency alone and could be due to indirect, downstream effects of reduced MBD5 expression or may be due to reduced expression of other deleted gene(s) in these patient samples. Additional molecular studies are required to understand the effect of MBD5 haploinsufficiency specifically on these dysregulated genes. Differentially expressed genes in NPC of MAND patients included genes associated with other neurodevelopmental disorders (Figs. 3, 4). A study of pathogenic de novo mutations that link to intellectual disability identified  . By the Gene Ontology (GO) enrichment analysis of the transcriptome data, biological functions of neuron development, brain development and cellular processes were altered in the cohort in alignment with the neurodevelopmental phenotype present in these individuals ( Supplementary Fig. 3).
In this study, we took advantage of the ability to derive NPCs from MAND patient cells and RNA-seq analysis to investigate the role of MAND on ASD-associated genes and pathways (Figs. 3, 4). Studies of ASD patientderived neuronal cells have demonstrated altered gene expression and dysregulation of pathways such as Hippo, MAPK, DNA replication/cell cycle, and spliceosome formation pathways 57,58 . While our study may have been hindered by small sample size, our analysis using the SFARI database identified 20 genes (4.3%) linked to ASD with altered gene expression due to 2q23.1 deletion, highlighting the commonalities between the MAND and ASD (Fig. 4a). This percentage is similar to the enrichment for SFARI genes in other key ASD RNA-seq studies 44,[57][58][59] . These genes could be the key to understanding the connection between MAND and autism. Further, six key biological processes are shared between our data set and genes with known association to ASD (Fig. 4b), suggesting that these biological processes are key to understanding the overlapping phenotypes between MAND and ASD. Further study of these biological processes is needed to more clearly elucidate the physiological mechanisms that are altered.
The MAND-derived NPC RNA-seq data revealed differentially expressed genes previously linked to neurodevelopmental disorders. Some of the most interesting connections in phenotypically similar neurodevelopmental disorders include the FOXG1 dosage-linked associations with Rett syndrome and West syndrome. Our findings in this study indicate that MBD5 either directly or indirectly regulates FOXG1 expression in an early neurogenesis stage (Fig. 5). Mutation and/or altered expression of FOXG1 have been shown to cause an autosomal dominant form of Rett syndrome 60 , associated with ASD 59 , West syndrome 50 , and schizophrenia 61 . Since FOXG1 is a dosage sensitive gene, its overexpression would likely affect proper brain development and cognitive functions. Furthermore, this may represent a possible route for convergent phenotypes between MAND and other syndromes, including West syndrome associated with duplication of FOXG1 50 . The slow growth of MAND NPCs (Fig. 5b) may represent a link between FOXG1 dysregulation and improper glutathione synthesis. CHAC2 expression is also altered in our dataset (Supplementary Data 1). CHAC2 is thought to be essential for glutathione maintenance 62,63 . Interestingly, FOXG1 plays a role in the transcriptional repression of CHAC1 which is in the same family as CHAC2 and also plays a role in glutathione maintenance (Fig. 6). CHAC1 is thought be a pro-apoptotic factor and inhibitor of NOTCH signaling in glioblastoma cell lines 64 . Further, FOXG1 is thought to play a role in glutathione maintenance and glutathione is thought to regulate cell proliferation 65,66 . Future molecular studies to assess MBD5 regulation of FOXG1 is necessary especially in relation to glutathione synthesis and Notch signaling (Fig. 6).
Here, our study of MAND with human neural in vitro cultures through patient-derived NPC demonstrates FOXG1 upregulation (Fig. 5), as well as perturbed expression of genes implicated in brain development, especially forebrain regionalization (Fig. 4b), and affected genes significantly overlap with ASD associated genes (Fig. 4a). These findings expand our knowledge about gene network differences and possible interactions between these related disease pathways (Figs. 4c, 6). Such findings may lead to potential connections between overlapping neurocognitive and neuropsychiatric phenotypes and associated risk factors, such as other chromatin modifiers and epigenetic factors that link to ASD. We expect these in vitro models will nurture hypotheses toward therapeutic interventions 67,68 .

Web resources
The URLS for data present herein are as follows: AutismKB

Data availability
The RNA-seq data are available in the Gene Expression Omnibus (GSE141835).