Background & Summary

Rett syndrome (RTT) is an X-linked neurodevelopmental disorder that predominantly affects females and is mostly caused by heterozygous de novo mutations in the methyl-CpG-binding protein 2 gene (MECP2)1. MECP2 duplications have been identified in males with developmental encephalopathy, seizures, autistic features, and recurrent infection2. These clinical disorders illustrate the critical requirement for proper MECP2 expression in human brain development, though how MeCP2 dysfunction leads to the RTT phenotype remains unclear.

MeCP2 acts as a global transcriptional regulator by recruiting chromatin-remodeling complexes or regulating higher-order chromatin structures3,4,5,6,7,8. Thus, MeCP2 may be required for fine-tuning the expression of a network of protein-coding genes through both direct and indirect mechanisms. Consistent with this hypothesis, small-magnitude changes in gene expression have been detected in brain tissue from both human postmortem RTT samples and mouse Mecp2 mutants9,10,11,12. However, most transcriptional studies of postmortem RTT brain have used microarray platforms with small sample sizes and without age-matched control samples, both of which limit the sensitivity for detecting transcriptional changes. One study used both microarrays and RNA sequencing (RNA-seq) to examine frontal and temporal cortex from individuals with RTT compared to controls and identified over 200 differentially expressed genes after normalizing data for the neuron versus glia composition of samples13. Another, larger study used RNA-seq to examine motor cortex and cerebellum and identified over 2,000 differentially expressed genes with a global increase in expression14.

We generated RNA-seq data from two distinct brain regions, temporal cortex and cingulate cortex, from four female RTT donors and four age-matched female control donors. Reduced volume and dendritic branching of neurons in the temporal cortex and reduced connectivity of the cingulate cortex have been reported in RTT, indicating the importance of these brain regions in the disorder15,16,17,18. We also compared our data with the transcriptomic profiles of RTT brain samples from published RNA-seq datasets13,14. The composite analysis should facilitate interpretation and further understanding of MECP2-mediated changes in the human brain.

Methods

Brain samples

Postmortem brain tissue samples were obtained from the Harvard Brain Bank (http://hbtrc.mclean.harvard.edu/) and the National Institutes of Health (NIH) NeuroBioBank (https://neurobiobank.nih.gov), with approval from the coordinating foundation (https://www.rettsyndrome.org). Consent was obtained from next of kin and tissue was collected with approval from the Partners Human Research Committee for the Harvard Brain Bank and from The University of Maryland Institutional Review Board (IRB) and The Maryland Department of Health and Mental Hygiene IRB for the NeuroBioBank. Work was approved by the University of Southern California and is compliant with all ethical regulations. Frozen temporal (BA36/38) and cingulate cortex samples were obtained from four RTT and four control (CTL) brain donors that were matched in age (Fig. 1). The Harvard Brain Bank sequenced MECP2 coding exons and reported intragenic mutations in two of the four brains. Brain donor characteristics are described in Table 1.

Fig. 1

Overview of the experimental workflow.

Table 1 Brain Donor Characteristics.

MECP2 variant confirmation

Genomic DNA was isolated from brain samples from donors 7773 and 7783 using the PureLink Genomic DNA Kit (LifeTechnologies) according to the manufacturer’s protocol. We performed Sanger sequencing of MECP2 to verify the reported variants (Table 1). Chromatograms were aligned to MECP2 (ENSG00000169057) using MAFFT v7 (ref. 19). No additional genes were screened.

RNA sample and library preparation

Total RNA was previously isolated using the Qiagen RNeasy Kit according to the manufacturer’s instructions20. Double-stranded cDNA fragments were synthesized from mRNA, ligated with adapters, and size-selected for library construction according to the TruSeq Sample Preparation v2 protocol using 0.5–1.5 μg of total RNA (Table 2). ERCC RNA spike-in controls were not included in this experiment. Library quality was measured using an Agilent 2100 Bioanalyzer and concentration was assessed by PicoGreen incorporation. Barcoded libraries were pooled and sequenced in two lanes using an Illumina HiSeq 2000 sequencer.

Table 2 RNA Sample Characteristics.

RNA-Seq data analysis

Single-end reads (100 bp) were aligned to the human reference genome (NCBI build 37/hg19) using STAR v2.5.3a (ref. 21; see Code Availability 1). Aligned reads mapping to the exons of a gene were summarized into gene counts using featureCounts v1.6 (ref. 22; see Code Availability 2). Picard CollectRnaSeqMetrics was used to measure the 3′ bias of genes in the RNA-seq data (see Code Availability 3). Gene-level differential expression was analyzed using DESeq2 (ref. 23), specifying ~ region + group + bias as the experimental design (see Code Availability 4). Aligned reads mapping to MECP2 isoforms were also summarized using featureCounts v1.6 (ref. 22; see Code Availability 2), with reads grouped by isoform rather than by gene name.
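As an illustration of what the additive design ~ region + group + bias encodes, the following minimal sketch builds a treatment-coded design matrix for 16 libraries (eight donors, two regions each). It is not the authors' code: the sample identifiers, the helper name design_matrix, and the single 3′ bias flag are hypothetical placeholders, and the reference levels (CCTX, CTL, no bias) are assumptions.

```python
from itertools import product


def design_matrix(samples):
    """Treatment (dummy) coding for the additive design ~ region + group + bias.

    Reference levels are assumed to be CCTX, CTL, and "no 3' bias".
    """
    header = ["intercept", "region_TCTX", "group_RTT", "bias_yes"]
    rows = [
        [
            1,                             # intercept
            int(s["region"] == "TCTX"),    # region effect relative to CCTX
            int(s["group"] == "RTT"),      # diagnosis effect relative to CTL
            int(s["bias"]),                # presence of 3' bias
        ]
        for s in samples
    ]
    return header, rows


# Hypothetical sample sheet: 4 CTL and 4 RTT donors, each with two regions.
donors = [("CTL", n) for n in range(1, 5)] + [("RTT", n) for n in range(1, 5)]
samples = [
    {"id": f"{group}{n}_{region}", "group": group, "region": region, "bias": False}
    for (group, n), region in product(donors, ["CCTX", "TCTX"])
]
samples[3]["bias"] = True  # arbitrary placeholder, not a value from Table 2

header, rows = design_matrix(samples)
for sample, row in zip(samples, rows):
    print(sample["id"], dict(zip(header, row)))
```

Under this coding, the group_RTT column carries the RTT versus CTL difference after adjusting for brain region and 3′ bias.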

Data Records

The count matrix and normalized count matrix were submitted to the NCBI Gene Expression Omnibus (GEO) under accession number GSE128380 (ref. 24). The raw FASTQ files can be downloaded from the Sequence Read Archive (SRA) under accession number SRP188555 (ref. 25).

Technical Validation

MECP2 variant confirmation

We verified the presence of the MECP2 c.473C>T (p.Thr158Met) intragenic variant using DNA isolated from brain 7773 (Supplemental Fig. 1). No MECP2 variants were detected in exons 2–4 of brain 7783. Because we were unable to amplify exon 1 in 7783, we infer that exon 1 is likely to be the deleted exon. We also examined the RNA-seq data for the presence of MECP2 variants (Supplemental Fig. 2). The MECP2 c.473C>T (p.Thr158Met) variant was also detected in RNA-seq data from both cingulate cortex (CCTX) and temporal cortex (TCTX) for brain 7773. MECP2 variants were not detected in RNA-seq data for the other RTT brain samples, possibly because of low sequencing read depth across MECP2 (Supplemental Fig. 3) or because the causal variants are present in another gene26,27.
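The correspondence between the coding position c.473 and residue 158 can be checked with simple arithmetic. The sketch below is illustrative only: it maps a 1-based coding position to its codon and asks which threonine codon becomes methionine after a C>T change at that position. The reference codon is inferred from the standard genetic code, not taken from the sequencing data, and the helper names are hypothetical.

```python
def codon_of(cdna_pos):
    """Return (codon number, position within codon) for a 1-based coding position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def apply_substitution(codon, pos_in_codon, ref, alt):
    """Replace the reference base at a 1-based position within a codon."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


# Standard genetic code: threonine is encoded by ACN, methionine only by ATG.
THR_CODONS = {"ACA", "ACC", "ACG", "ACT"}
MET_CODON = "ATG"

codon_number, offset = codon_of(473)

# Which threonine codons yield methionine after C>T at this offset?
compatible = [
    codon
    for codon in sorted(THR_CODONS)
    if apply_substitution(codon, offset, "C", "T") == MET_CODON
]

print(codon_number, offset, compatible)
```

Position 473 falls on the second base of codon 158, and only one threonine codon is compatible with a Thr-to-Met change there, which is consistent with the reported p.Thr158Met annotation.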

RNA and data quality

RNA quality was determined using the Agilent 2100 Bioanalyzer and the RNA 6000 Pico Kit, and high-quality RNA was obtained from all samples (RNA integrity number [RIN] > 8.0; median RIN = 9.4 [Table 2]). At the time the experiment was performed, the TruSeq RNA Sample Prep v2 protocol (Part # 15026495 Rev.C, May 2012) was optimized for 0.1–4 μg of total RNA. Although the quantity of RNA input varied among the samples in our experiment, it was equivalent within each age- and tissue-matched case-control sample pair, and all samples were within the optimized range. On average, RNA-seq generated 21.9 million high-quality reads per sample, 70.3% of which mapped uniquely to the human reference genome (NCBI build 37/hg19) (Table 3). RIN and RNA quantity were each correlated with the number of uniquely mapped reads (Fig. 2). Cook’s distance was calculated to test for outliers, and none were detected (Fig. 3a). The first principal component explained over 50% of the variance (Fig. 3b). A correlation matrix based on the gene expression data indicated that samples mostly cluster by individual and diagnostic group, but also by 3′ bias (Fig. 3c).

Table 3 RNA-seq Data Mapping Statistics.
Fig. 2

RNA quality or RNA quantity versus number of uniquely mapped reads.

Fig. 3

RNA-seq data quality assessment. (a) Boxplots showing Cook’s distance calculated for each sample. (b) Principal component analysis with samples colored by diagnostic group (CTL, RTT), brain region (CCTX, TCTX), or brain donor. (c) Heatmap of the sample distance matrix. Presence (black) or absence (grey) of 3′ bias in RNA-seq data is indicated for each sample.

MECP2 and MET differential expression

We previously used quantitative reverse transcription PCR to compare expression of MECP2_e1 (NM_004992.3), MECP2_e2 (NM_001110792.1), and MET (NM_000245.3) in the temporal cortex between RTT and CTL brains28. Consistent with our previous results, the RNA-seq data showed no significant difference in MECP2_e1 or MECP2_e2 expression between RTT and CTL brains (FDR-adjusted p-value = 0.16 and 0.59, respectively), while MET expression was significantly reduced in RTT brains (FDR-adjusted p-value = 1.07 × 10^−5; Fig. 4).
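For readers unfamiliar with FDR adjustment, the following minimal sketch implements the Benjamini-Hochberg step-up procedure, which is the default p-value adjustment in DESeq2. It assumes that this default was used here and omits DESeq2's independent filtering; the input p-values are invented for illustration and are not taken from this dataset.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for four hypothetical genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each adjusted value is the smallest FDR threshold at which the corresponding gene would be called significant, so a gene with an adjusted p-value below 0.05 passes the FDR < 0.05 criterion.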

Fig. 4

Boxplots showing the expression of MECP2_e1, MECP2_e2, and MET in RTT and CTL brain. Expression values are shown as normalized counts.

Compatibility with published transcriptional profiles

Two RNA-seq datasets of postmortem brain from females with RTT compared to controls have been published13,14 (Table 4). The first dataset examined pooled frontal and temporal cortex (FTTX) for each of three individuals with RTT compared to three CTL and is available from the Sequence Read Archive under accession number PRJNA302685 (ref. 29). The second, larger dataset examined motor cortex (Motor) from nine females with RTT and cerebellum (Cblm) from six females with RTT, each compared to eight CTL samples of the same tissue, but the primary data were not accessible14. We downloaded the FASTQ files for the available dataset, aligned reads using salmon30 (see Code Availability 6), summarized the aligned reads into gene counts using tximport v1.12.1 (ref. 31; see Code Availability 7), and retained genes with ≥10 counts in ≥3 samples. Count data were converted to logCPM to adjust for the total counts per sample using limma v3.40.2 (ref. 32; see Code Availability 8), then observation-level and sample-level weights were estimated using voom32,33. We also reanalyzed our data using this workflow. Analysis of variance models were fit separately for each of three brain regions (CCTX, TCTX, FTTX), then combined in a random effects meta-analysis using GeneMeta v1.56.0 (ref. 34; see Code Availability 9). Our meta-analysis identified 1,455 genes that were significantly differentially expressed (FDR < 0.05) between brain samples from control individuals and those with RTT.
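The gene filter and logCPM conversion described above can be sketched as follows. This is a minimal illustration rather than the limma implementation: it uses the offsets applied by voom (0.5 added to each count and 1 to each library size), assumes library sizes are computed from the retained genes without additional normalization factors, and runs on an invented toy count table.

```python
import math


def filter_genes(counts, min_count=10, min_samples=3):
    """Keep genes with at least `min_count` counts in at least `min_samples` samples.

    `counts` maps a gene name to its list of per-sample counts.
    """
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }


def log_cpm(counts):
    """Convert counts to log2 counts per million using voom-style offsets."""
    n_samples = len(next(iter(counts.values())))
    # Library size per sample, assumed here to be the column sum of retained genes.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


# Invented toy counts for three hypothetical genes across four samples.
toy = {
    "geneA": [12, 15, 30, 0],
    "geneB": [9, 9, 9, 9],
    "geneC": [10, 10, 10, 0],
}
kept = filter_genes(toy)
lcpm = log_cpm(kept)
print(sorted(kept))
```

In the toy table, geneB is dropped because it never reaches 10 counts, while geneA and geneC are retained because each reaches the threshold in three samples.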

Table 4 Published RTT Brain RNA-seq datasets.
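To make the random effects combination step concrete, the sketch below pools one gene's per-dataset effect sizes with the DerSimonian-Laird estimator of between-study variance. It is an illustration under stated assumptions, not a reimplementation of GeneMeta: the way GeneMeta derives standardized effect sizes and their variances is not reproduced, and the effect sizes and variances shown are invented.

```python
import math


def random_effects(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Re-weight each study by its within-study plus between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sws = sum(w_star)
    mu = sum(wi * e for wi, e in zip(w_star, effects)) / sws
    se = math.sqrt(1.0 / sws)
    return {"mu": mu, "se": se, "z": mu / se, "tau2": tau2, "q": q}


# Invented effect sizes for one hypothetical gene in three brain regions.
pooled = random_effects([0.8, 0.5, 0.6], [0.04, 0.05, 0.09])
print(pooled)
```

The returned z value is the pooled effect divided by its standard error, which is the kind of per-gene Z-score that a random effects meta-analysis reports before multiple-testing adjustment.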

To verify these results, we compared the results from our meta-analysis with differential gene expression results from previous RTT RNA-seq analyses13,14 (Fig. 5). We compared the Z-score for each of the significantly differentially expressed genes from our meta-analysis with the log2 fold change from our previous analysis (GEO DESeq2) and from each of the three published RNA-seq datasets (Lin et al., Gogliotti et al. Motor, and Gogliotti et al. Cblm; Fig. 5a). We found strong concordance among RTT transcriptional profiles from regions within the cerebral cortex, while RTT transcriptional profiles from the cerebellum were least correlated with those from the cerebral cortex (Fig. 5b). We aggregated the gene-wise correlation coefficients among datasets and found a positive correlation for 63% of the comparisons, indicating overall agreement in differential gene expression across datasets (Fig. 5c). Our data therefore represent an independent technical and biological replication of molecular alterations in RTT brain, and our meta-analysis demonstrates that combining several smaller studies increases the power to detect differential expression.
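The comparison in Fig. 5b rests on Spearman's rank correlation between meta-analysis Z-scores and log2 fold changes. A minimal sketch of that statistic is given below, computed as the Pearson correlation of average ranks so that ties are handled. The Z-scores and fold changes are invented values for four hypothetical genes, not results from any of the datasets.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: meta-analysis Z-scores and log2 fold changes for four genes.
z_scores = [3.1, -2.4, 4.0, -3.3]
log2_fc = [0.4, -0.2, 0.9, -0.5]
rho = spearman(z_scores, log2_fc)
print(rho)
```

Because the statistic depends only on ranks, it is suited to comparing Z-scores with fold changes, which are on different scales.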

Fig. 5

Replication of differential gene expression between RTT and CTL brain. (a) Meta-analysis Z-score compared to log2 fold change (FC) between RTT and CTL from our initial analysis (GEO), Lin et al. combined frontal and temporal cortex (from Table S5)13, Gogliotti et al. motor cortex (from Table S2)14, and Gogliotti et al. cerebellum (from Table S3)14. Genes with significant differential expression (False Discovery Rate [FDR] < 0.05) in the dataset represented on the X-axis are in red. (b) Spearman’s correlation between the meta-analysis Z-score and the log2 FC for each of the other datasets. Color intensity and circle size are proportional to the correlation coefficients, with values displayed below the diagonal. (c) Density of gene-wise correlation coefficients among the datasets in (b).