Transcriptome data of temporal and cingulate cortex in the Rett syndrome brain

Rett syndrome is an X-linked neurodevelopmental disorder caused by mutation in the methyl-CpG-binding protein 2 gene (MECP2) in the majority of cases. We describe an RNA sequencing dataset of postmortem brain tissue samples from four females clinically diagnosed with Rett syndrome and four age-matched female donors. The dataset contains 16 transcriptomes, including two brain regions, temporal and cingulate cortex, for each individual. We compared our dataset with published transcriptomic analyses of postmortem brain tissue from Rett syndrome and found consistent gene expression alterations among regions of the cerebral cortex. Our data provide a valuable resource to explore the biology of the human brain in Rett syndrome.


Background & Summary
Rett syndrome (RTT) is an X-linked neurodevelopmental disorder mostly caused by heterozygous de novo mutation in the methyl-CpG-binding protein 2 gene (MECP2) and predominantly affecting females 1 . MECP2 duplications have been identified in males with developmental encephalopathy, seizures, autistic features, and recurrent infection 2 . These clinical disorders illustrate the critical requirement for proper MECP2 expression in human brain development, though how MeCP2 dysfunction leads to the RTT phenotype is unclear.
MeCP2 acts as a global transcriptional regulator by recruiting chromatin-remodeling complexes or regulating higher-order chromatin structures [3][4][5][6][7][8] . Thus, MeCP2 may be required for fine-tuning the expression of a network of protein-coding genes through both direct and indirect mechanisms. Consistent with this hypothesis, small-magnitude changes in gene expression have been detected in brain tissue from both human postmortem RTT samples and mouse Mecp2 mutants [9][10][11][12] . However, most transcriptional studies of postmortem RTT brain have used microarray platforms with small sample sizes and without age-matched control samples, which limits the sensitivity for detecting transcriptional changes. One study used both microarrays and RNA sequencing (RNA-seq) to examine frontal and temporal cortex from individuals with RTT compared to controls and identified over 200 differentially expressed genes after normalizing data for neuron versus glia composition of samples 13 . Another larger study used RNA-seq to examine motor cortex and cerebellum and identified over 2,000 differentially expressed genes with a global increase in expression 14 .
We generated RNA-seq data from two distinct brain regions, temporal cortex and cingulate cortex, from four females with RTT and four age-matched female control donors. Reduced volume and dendritic branching of neurons in the temporal cortex and reduced connectivity of the cingulate cortex have been reported in RTT, indicating the importance of these brain regions in the disorder [15][16][17][18] . We also compared our data with published transcriptomic analyses of postmortem RTT brain tissue.

Methods
MECP2 variant confirmation. Genomic DNA was isolated from brain samples 7773 and 7783 using the PureLink Genomic DNA Kit (Life Technologies) according to the manufacturer's protocol. We performed Sanger sequencing of MECP2 to verify the reported variants (Table 1). Chromatograms were aligned to MECP2 (ENSG00000169057) using MAFFT v7 19 . No additional genes were screened.
RNA sample and library preparation. Total RNA was previously isolated using the Qiagen RNeasy Kit according to the manufacturer's instructions 20 . Double-stranded cDNA fragments were synthesized from mRNA, ligated with adapters, and size-selected for library construction according to the TruSeq Sample Preparation v2 protocol using 0.5-1.5 μg of total RNA (Table 2). ERCC RNA spike-in controls were not included in this experiment. Library quality was measured using an Agilent 2100 Bioanalyzer and concentration was assessed by PicoGreen incorporation. Barcoded libraries were pooled and sequenced in two lanes using an Illumina HiSeq 2000 sequencer.

RNA-Seq data analysis.
Single-end reads (100 bp) were aligned to the human reference genome (NCBI build 37/hg19) using STAR v2.5.3a 21 (see Code Availability 1). Aligned reads mapping to the exons of a gene were summarized into gene counts using featureCounts v1.6 22 (see Code Availability 2). Picard CollectRnaSeqMetrics was used to measure the 3′ bias of genes in the RNA-seq data (see Code Availability 3). Gene-level differential expression was analyzed using DESeq2 23 , specifying ~ region + group + bias as the experimental design (see Code Availability 4). Aligned reads mapping to MECP2 isoforms were also summarized using featureCounts v1.6 22 (see Code Availability 2), with isoform substituted for gene name as the summarization feature.
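As an illustration of what the design formula ~ region + group + bias encodes, the following minimal Python sketch expands a small sample sheet into the columns of a design matrix (intercept, region, group, and 3′ bias as a continuous covariate). The sample identifiers, bias values, and the choice of CCTX and CTL as reference levels are assumptions made for the example and are not taken from the dataset; the actual model fitting was done in DESeq2.

```python
# Hypothetical sample sheet: identifiers and 3' bias values are illustrative only.
samples = [
    {"id": "S1", "region": "CCTX", "group": "CTL", "bias": 0.41},
    {"id": "S2", "region": "TCTX", "group": "CTL", "bias": 0.38},
    {"id": "S3", "region": "CCTX", "group": "RTT", "bias": 0.55},
    {"id": "S4", "region": "TCTX", "group": "RTT", "bias": 0.47},
]

# Column names for the expanded design; reference levels are assumed.
columns = ["Intercept", "region_TCTX", "group_RTT", "bias"]


def design_matrix(rows, region_ref="CCTX", group_ref="CTL"):
    """Expand ~ region + group + bias into one numeric row per sample."""
    matrix = []
    for row in rows:
        matrix.append([
            1.0,                                          # intercept
            0.0 if row["region"] == region_ref else 1.0,  # region indicator
            0.0 if row["group"] == group_ref else 1.0,    # group indicator
            float(row["bias"]),                           # continuous covariate
        ])
    return matrix


X = design_matrix(samples)
for sample, row in zip(samples, X):
    print(sample["id"], dict(zip(columns, row)))
```

In this encoding the coefficient on the group column is the RTT versus control effect after adjusting for brain region and 3′ bias.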

Data Records
The count matrix and normalized count matrix were submitted to the NCBI Gene Expression Omnibus (GEO) under accession number GSE128380 24 . The raw FASTQ files can be downloaded from the Sequence Read Archive (SRA) under accession number SRP188555 25 .

Technical Validation
MECP2 variant confirmation. We verified the presence of the MECP2 c.473C>T (p.Thr158Met) intragenic variant using DNA isolated from brain 7773 (Supplemental Fig. 1). No MECP2 variants were detected in exons 2-4 of brain 7783. Since we were unable to amplify exon 1 in 7783, we infer that exon 1 is likely to be the deleted exon. We also examined RNA-seq data for the presence of MECP2 variants (Supplemental Fig. 2). The MECP2 c.473C>T (p.Thr158Met) intragenic variant was also detected in RNA-seq data from cingulate cortex (CCTX) and temporal cortex (TCTX) for brain 7773. MECP2 variants were not detected in RNA-seq data for other RTT brain samples, possibly due to low sequencing read depth of MECP2 (Supplemental Fig. 3), or because causal variants are present in another gene 26,27 .
RNA and data quality. RNA quality was determined using the Agilent 2100 Bioanalyzer and the RNA 6000 Pico Kit, and high-quality RNA was obtained from all samples (RNA integrity number [RIN] > 8.0; median RIN = 9.4; Table 2). At the time the experiment was performed, the TruSeq RNA Sample Prep v2 protocol (Part # 15026495 Rev.C, May 2012) was optimized for 0.1-4 μg of total RNA. Although the quantity of RNA input varied among the samples in our experiment, it was equivalent within each age- and tissue-matched case-control sample pair, and all samples were within the optimized range. On average, RNA-seq generated 21.9 million high-quality reads per sample, 70.3% of which mapped uniquely to the human reference genome (NCBI build 37/hg19) (Table 3). RIN and RNA quantity were each correlated with the number of uniquely mapped reads (Fig. 2). Cook's distance was calculated to test for outliers, with none detected (Fig. 3a). The first principal component explained over 50% of the variance (Fig. 3b). A correlation matrix based on the gene expression data indicated that samples mostly cluster by individual and diagnostic group, but also by 3′ bias (Fig. 3c).
MECP2 and MET differential expression. We previously used quantitative reverse transcription PCR to compare expression of MECP2_e1 (NM_004992.3), MECP2_e2 (NM_001110792.1), and MET (NM_000245.3) in the temporal cortex between RTT and control (CTL) brains 28 . Consistent with our previous results, the RNA-seq data showed no significant difference in MECP2_e1 or MECP2_e2 expression between RTT and CTL brains (FDR-adjusted p-value = 0.16 and 0.59, respectively), while MET expression was significantly reduced in RTT brains (FDR = 1.07 × 10^-5; Fig. 4).
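For readers unfamiliar with FDR-adjusted p-values, the sketch below shows the Benjamini-Hochberg adjustment in plain Python. It assumes Benjamini-Hochberg because that is the default adjustment reported by DESeq2; the text above does not name the procedure explicitly, and the raw p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (not values from this dataset).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"p = {p:.3f}  adjusted = {q:.5f}")
```

A gene is called significant at FDR < 0.05 when its adjusted value falls below 0.05, so the threshold controls the expected proportion of false positives among the reported genes rather than the per-gene error rate.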
Compatibility with published transcriptional profiles. Two RNA-seq datasets of postmortem brain from females with RTT compared to controls have been published 13,14 (Table 4). The first dataset examined pooled frontal and temporal cortex (FTTX) for each of three individuals with RTT compared to three CTL and is available from the Sequence Read Archive under accession number PRJNA302685 29 . The second, larger dataset examined motor cortex (Motor) and cerebellum (Cblm) for nine females and six females with RTT, respectively, compared to eight CTL of each tissue, but the primary data were not accessible 14 . We downloaded the FASTQ files for the available dataset, aligned reads using salmon 30 (see Code Availability 6), summarized the aligned reads into gene counts using tximport v1.12.1 31 (see Code Availability 7), and retained genes with ≥10 counts in ≥3 samples. Count data were converted to logCPM to adjust for the total counts per sample using limma v3.40.2 32 (see Code Availability 8), then observation-level and sample-level weights were estimated using voom 32,33 . We also reanalyzed our data using this workflow. Analysis of variance models were fit separately for each of three brain regions (CCTX, TCTX, FTTX), then combined in a random effects meta-analysis using GeneMeta v1.56.0 34 (see Code Availability 9). Our meta-analysis identified 1,455 genes that were significantly differentially expressed (FDR < 0.05) between brain samples from control individuals and those with RTT.
To verify these results, we compared the results from our meta-analysis with differential gene expression results from previous RTT RNA-seq analyses 13,14 (Fig. 5). We compared the Z-score for each of the significantly differentially expressed genes from our meta-analysis with the log2 fold change from our previous analysis (GEO DESeq2; Fig. 5a).
Table 3. RNA-seq Data Mapping Statistics.
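The filtering and normalization steps in this workflow can be sketched in a few lines of Python. The filter follows the stated rule (≥10 counts in ≥3 samples). The logCPM conversion uses the offsets applied by voom, log2((count + 0.5) / (library size + 1) × 10^6), with library sizes taken as plain column sums of the filtered counts; the absence of additional normalization factors is an assumption of this sketch, and the gene names and counts are hypothetical.

```python
import math


def filter_genes(counts, min_count=10, min_samples=3):
    """Keep genes with at least min_count reads in at least min_samples samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for c in row if c >= min_count) >= min_samples
    }


def log_cpm(counts):
    """voom-style log2 counts per million, one value per gene and sample."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts per sample across the retained genes.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            math.log2((c + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j, c in enumerate(row)
        ]
        for gene, row in counts.items()
    }


# Hypothetical gene-by-sample counts (four samples).
counts = {
    "GENE_A": [120, 95, 130, 110],
    "GENE_B": [0, 12, 3, 11],     # only two samples reach 10 counts: dropped
    "GENE_C": [15, 10, 22, 9],    # three samples reach 10 counts: kept
}

kept = filter_genes(counts)
lcpm = log_cpm(kept)
for gene, values in lcpm.items():
    print(gene, [round(v, 2) for v in values])
```

Filtering before normalization keeps very low-count genes, whose log values are dominated by the 0.5 offset, from inflating the variance estimates used for the downstream weights.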
We found strong concordance among RTT transcriptional profiles from regions within the cerebral cortex, while RTT transcriptional profiles from the cerebellum were least correlated with the regions from the cerebral cortex (Fig. 5b). We aggregated the gene-wise correlation coefficients among datasets and found an overall positive correlation for 63% of the comparisons among datasets, indicating an overall agreement among the differential gene expression per dataset (Fig. 5c). Not only do our data represent an independent technical and biological replication of molecular alterations in RTT brain, but our meta-analysis demonstrates the power of combining datasets to maximize detectable results among several smaller studies.
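To make the idea of combining per-region results concrete, the following Python sketch pools one gene's effect sizes across studies with a random effects model. It assumes the DerSimonian-Laird estimator of between-study variance and is an illustration only, not the GeneMeta implementation used for the analysis; the effect sizes and variances are hypothetical.

```python
import math


def random_effects(effects, variances):
    """DerSimonian-Laird random effects combination for a single gene.

    Returns (pooled effect, between-study variance tau^2, z-score).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, tau2, pooled / se


# Hypothetical per-study effect sizes and variances for one gene.
effects = [-0.8, -0.5, -0.6]
variances = [0.05, 0.04, 0.09]
pooled, tau2, z = random_effects(effects, variances)
print(f"pooled effect = {pooled:.3f}, tau^2 = {tau2:.3f}, z = {z:.2f}")
```

When the studies agree, tau^2 shrinks to zero and the result matches a fixed-effect analysis; when they disagree, the added variance widens the standard error, which is why a consistent direction of change across regions is needed to reach significance.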