A multi-omics dataset for the analysis of frontotemporal dementia genetic subtypes

Understanding the molecular mechanisms underlying frontotemporal dementia (FTD) is essential for the development of successful therapies. Systematic studies on human post-mortem brain tissue of patients with genetic subtypes of FTD are currently lacking. The Risk and Modyfing Factors of Frontotemporal Dementia (RiMod-FTD) consortium therefore has generated a multi-omics dataset for genetic subtypes of FTD to identify common and distinct molecular mechanisms disturbed in disease. Here, we present multi-omics datasets generated from the frontal lobe of post-mortem human brain tissue from patients with mutations in MAPT, GRN and C9orf72 and healthy controls. This data resource consists of four datasets generated with different technologies to capture the transcriptome by RNA-seq, small RNA-seq, CAGE-seq, and methylation profiling. We show concrete examples on how to use the resulting data and confirm current knowledge about FTD and identify new processes for further investigation. This extensive multi-omics dataset holds great value to reveal new research avenues for this devastating disease.


Background & Summary
Frontotemporal Dementia (FTD) is a devastating pre-senile dementia characterized by progressive deterioration of the frontal and anterior temporal lobes 1 .The most common symptoms include severe changes in social and personal behaviour as well as a general blunting of emotions.Clinically, genetically, and pathologically there is considerable overlap with other neurodegenerative diseases including Amyotrophic Lateral Sclerosis (ALS), Progressive Supranuclear Palsy (PSP) and Cortical Basal Degeneration (CBD) 2 .Research into FTD has made major advances over the past decades.Up to 40% of cases 3 have a positive family history and up to 60% of familial cases can be explained by mutations in the genes Microtubule Associated Protein Tau (MAPT), Granulin (GRN) and C9orf72 4 which has been key to the progress in our understanding of its molecular basis.Several other disease-causing genes have been identified that account for a much smaller fraction of cases 5 .Mutations in MAPT lead to accumulation of the Tau protein in neurofibrillary tangles in the brain of patients while mutations in GRN and C9orf72 lead to the accumulation of TDP-43 6 , as well as dipeptide repeat proteins (DPRs) and RNA foci in the case of C9orf72 7 .
As of today, no therapy exists that halts or slows the neurodegenerative process of FTD and to develop successful therapies there is an urgent need to determine whether a common target and therapy can be identified that can be exploited for all patients, or whether the distinct genetic, clinical, and pathological subgroups need tailored treatments.Therefore, the development of remedies relies heavily on a better understanding of the molecular and cellular pathways that drive FTD pathogenesis in all FTD subtypes.
Although our knowledge of FTD pathogenesis using molecular and cellular biology approaches has significantly advanced during recent years, a deep mechanistic understanding of the pathological pathways requires simultaneous profiling of multiple regulatory mechanisms.
Post-mortem human brain tissue is an important source for studying the disturbance of molecular processes in patients with FTD.However, systematic studies on human post-mortem brain tissue with genetic subtypes of FTD are currently lacking.Therefore, the Risk and modifying factors in Frontotemporal Dementia (RiMod-FTD) consortium has generated a multi-omics data resource with the focus on mutations in the three most common causal genes for FTD: MAPT, GRN and C9orf72.
Here, we report four datasets from the frontal lobe of post-mortem human brain from RiMod-FTD samples.We present RNA-seq, CAGE-seq (Cap analysis gene expression sequencing), small RNA-seq (smRNA-seq), and methylation datasets from matched samples, which enable precise profiling of transcriptional dysregulations in genetic FTD subtypes (Fig. 1, Table 1).The RNA-seq dataset can be used to identify general transcriptional differences in the FTD subtypes as well as potential alternative splicing events.The regulation of the transcriptome can be studied using the other three datasets.CAGE-seq enables the detection of active or inactive promoters and thereby helps to pinpoint potentially disease-relevant transcription factors.Methylation and smRNA-seq datasets allow researchers to identify regulatory mechanisms that lead to down-regulation of certain genes.These different aspects of the transcriptome can be studied in detail due to matched samples for all these datasets.
We show that known differences and commonalities of the investigated FTD subtypes can be recapitulated with initial analyses of the datasets, such as for instance neuronal loss due to neurodegeneration.More importantly, interesting new aspects about these FTD subtypes can be easily obtained from this dataset.We therefore believe that a thorough analysis of these data from the RiMod-FTD project can reveal helpful new insights about FTD and believe that researchers will find these datasets helpful for their own studies.To make the data easily accessible, we provide a gene browser which allows to compare expression and methylation information between disease groups for specific genomic locations at www.rimod-ftd.org.

Methods
Donor samples employed in this study.The brain samples and/or bio samples were obtained from The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam (open access: www.brainbank.nl).All Material has been collected from donors for or from whom a written informed consent for a brain autopsy and the use of the material and clinical information for research purposes had been obtained by the NBB.Additional samples were provided by the Queen Square Brain Bank of Neurological Disorders and MRC, King College London.We complied to all relevant regulations of the mentioned brain banks for the work with these samples.
Gyrus frontalis medialis (GFM) tissue from each subject was divided into three pieces for transcriptomic, proteomic, and epigenetic experiments in a dry-ice bath using precooled scalpels and plasticware.

Genetic analysis.
Genomic DNA was isolated from 50 mg of GFM frozen brain tissue by using the Qiamp DNA mini kit (Qiagen) following the manufacturer protocol.DNA concentration and purity were assessed by nanodrop measurement.DNA integrity was evaluated by loading 100 nanogram per sample on a 0,8% agarose gel and comparing size distribution to a size standard.
Presence of C9orf72 hexanucleotide repeat expansion (HRF) in post-mortem brain tissues was confirmed by primed repeat PCR according to established protocols.Reported mutations for MAPT and GRN were verified by sanger sequencing.Multi-omics profiling is performed on all samples to gain detailed insights into disease mechanisms.Integrative data analysis is performed to gain new insights and the data is made available for further studies.The goal of the RiMod-FTD project is to further extend this data resource with fitting datasets.transcriptomic procedures.RNA isolation from human brain tissue.Total RNA for CAGE-seq and RNA-seq was isolated from ±100 mg of frozen brain tissue with TRIzol reagent (Thermo Fischer Scientific) according to the manufacturer recommendation, followed by purification with the RNeasy mini columns (Qiagen) after DNAse treatment.
Total RNA for smRNA-seq was isolated from frozen tissue using the TRIzol reagent (ThermoFischer Scientific).After isopropanol precipitation and 80% ethanol rinsing RNA pellet was resuspended in RNAse free water and up to 10 micrograms of RNA was incubated with 2U of Ambion DNAse I (ThermoFischer) at 37 °C for 20 minutes.DNA-free RNA samples were then further purified by phenol-chloroform-isoamyl-alchol extraction followed by ethanol precipitation.
RNA QC.For each RNA sample, RNA concentration (A260) and purity (A260/280 and A260/230) were determined by Nanodrop measurement and RNA integrity (RIN) was assessed on a Bioanalyser 2100 system and/or Tape station 41200 (Agilent Technologies Inc.) RNA-seq libraries.Total RNA-seq libraries were prepared from 1 microgram of total RNA from frozen brain tissue using the TruSeq Stranded Total RNA with Ribo-Zero Gold kit (Illumina) according to the protocol specifications.RNA-seq libraries were sequenced on a Hiseq2500 and HISeq4000 on a 2 × 100 bp paired end (PE) flow cell (Illumina) at an average of 100 M PE/sample.CAGE-seq libraries.CAGE-seq libraries were prepared from 5 micrograms of RNA from frozen brain tissues according to a published protocol 8 .Libraries were sequenced on a HiSeq2000 and/or HiSeq2500 on a 1 × 50 bp single read (SR) flow cell (Illumina) at an average of 20 M reads/sample.On average, 21,212,923 reads were generated per sample.
smRNA-seq libraries.The small RNA-seq libraries were prepared in two different batches.They were prepared from frozen tissue starting from 2 micrograms of total RNA using the Nextflex Small RNA-seq kit v3 (Bioo Scientific) and the NEBNext Small RNA library prep set for Illumina (New England Biolabs), respectively.Libraries were sequenced on a NextSeq550 on a 75 cycles flow cell.
Methylation assay.To assess the methylation status of over 850000 CpG sites in promoter, gene body and enhancer regions we have used the MethylationEPIC bead chip arrays (Illumina).
Bisulfite conversion of genomic DNA, genome amplification, hybridization to the beadchips, washing, staining, and scanning procedure was performed by Atlas Biolabs (Atlas Biolabs, Berlin, Germany).Cases and controls DNAs were distributed randomly across each array.
RNA-seq processing and analysis.Raw FastQ files were processed using the RNA-seq pipeline from nf-core (nf-core/rnaseq v1.3) 9 , with trimming enabled.Gene quantification was subsequently done using Salmon (v0.14.1) 10 on the trimmed FastQ files.Alignment and mapping were performed against the human genome hg38.On average, 82,165,192 reads could be uniquely mapped, which relates to on average 88.6% uniquely mapped reads per sample (Supplementary Table 2).In total, 59,270 transcripts could be identified.DESeq2 (v.1.26.0) 11 was used to perform differential expression analysis.We corrected for the covariates sex and PH-value.For visualization and clustering, the data was transformed using the variance stabilization transformation from the DESeq2 package.
Cell type deconvolution.We performed cell type deconvolution on the RNA-seq data using Scaden 12 .For training we used the human brain training dataset used in the Scaden publication.Each ensembl model was trained for 5000 steps.Cell type deconvolution was then performed with the trained Scaden model on the RNA-seq count data.Relative changes in cell type composition were quantified by first calculating the average fractions of a cell type for all groups and then calculating the percentual change of cell fractions compared to the average control fractions.This allows to detect relative changes in cell type compositions.To test for statistical differences between disease groups, we have performed an ANOVA test with a post-hoc Tukey HSD test using R.

CAGE-seq processing and analysis.
Sequencing adapters and barcodes in CAGE-seq FastQ files were trimmed using Skewer (v.0.1.126) 13.Sequencing artefacts were removed using TagDust (v1.0) 14 .Processed reads were then aligned against the human genome hg38 using STAR (v.2.4.1) 15 .On average, 16,306,077 could be uniquely mapped per sample (76% uniquely mapped on average reads per sample, Supplementary Table 2).CAGE detected TSS (CTSS) files were created using CAGEr (v1.10.0) 16.With CAGEr, we removed the first G nucleotide if it was a mismatch.CTSS were clustered using the 'distclu' method with a maximum distance of 20 bp.For exact commands used we refer to the reader to the scripts used in this pipeline: https://github.com/dznetubingen/cageseq-pipeline-mf.In total, we could identify 47,298 different peaks.Data was normalized to counts per million (CPM) for visualization on the website.smRNA-seq processing and analysis.After removing sequencing adapters, all FastQ files were uploaded to OASIS2 17 for analysis.On average, 3,430,613 (+/− 1,365,407) reads could be uniquely mapped per sample (Supplementary Table 2).Subsequent differential expression analysis was performed on the counts yielded from OASIS2, using DESeq2 and correcting for sex and PH-value, as was done for the RNA-seq data.Additionally, we added a batch variable to the design matrix to correct for the two different batches of this dataset.In total, 2904 human smRNA genes could be detected.For visualization and clustering, the data was transformed using the variance stabilization transformation from the DESeq2 package.Methylation data processing and analysis.The Infinium MethylationEPIC BeadChip, which consists of 866,091 CpG locations, data was analyzed using the minfi R package 18 .We removed all sites with a detection P-value above 0.01, on sex chromosomes and with single nucleotide polymorphisms (SNPs), leaving 810,290 loci for analysis.Data normalization was done using stratified quantile normalization.Sites with a standard deviation below 0.1 were considered uninformative and filtered out, leaving 170,595 sites that were used to perform a principal component analysis.
Additional sequencing quality control.To gather additional sequencing quality control metrics, all sequencing assays (RNA-seq, CAGE-seq, smRNA-seq) were aligned against the hg38 reference genome (RNA-seq, CAGE-seq) or miRbase (smRNA-seq) using bowtie2 19 .Subsequently, the CollectAlignmentSummaryMetrics and CollectGcBiasMetrics from the Picard toolkit (https://broadinstitute.github.io/picard/)were run on the aligned data.Sample swap analysis on sequencing assays was performed with reads aligned against the hg38 reference genome using SMaSH 20 .We verified sample identify between assays by checking the sample with the most significant p-Value between assays up to a p-value of 0.2.This was not possible for all combinations of samples and assays as insufficient variant overlap led to bad p-values, especially between CAGE-seq and smRNA-seq.However, no sample swaps could be detected and for the majority of sample comparisons the correct identity could be verified.We furthermore verified the sex of all samples by comparing the ratio of reads mapping to the X and Y chromoses.

Data Records
All datasets have been submitted to ArrayExpress with the following accessions: E-MTAB-12647 21 (RNA-seq), E-MTAB-12646 22 (CAGE-seq), E-MTAB-12674 23 (Methylation) and E-MTAB-12731 24 (smRNA-seq).The sequencing datasets (RNA-seq, smRNA-seq and CAGE-seq) contain raw FastQ files and metadata information.The methylation dataset contains raw intensity values and metadata information.Processed data in the form of count tables have been uploaded to FigShare for ease of access: https://doi.org/10.6084/m9.figshare.23825595.v1 25.The cell type fractions from the deconvolution analysis have been uploaded to FigShare as well.technical Validation RNa integrity.We only included samples with an RNA integrity value above 5.The mean RIN value is 6.9 with a standard deviation of 1.0.

Sample identity.
We confirmed the sex of all samples by comparing the ratio of reads mapping to the X and Y chromosomes.All sequencing assays (RNA-seq, CAGE-seq, smRNA-seq) were checked for potential sample swaps using the SMaSH software 20 .

Sample statistics.
To gain an overview of the dataset, general sample statistics are visualized in Fig. 2.
Samples from the control group have a higher age compared to FTD samples, as FTD patients die at a younger age because of the disease.Percentage of different sexes are similar in all groups, with slightly higher percentages of female samples.FTD samples furthermore have, on average, a lower RNA integrity value and a lower pH value compared to healthy controls.The postmortem duration (PMD) is very similar across all groups.All metadata is provided in the supplements (Supplementary Table 1).Additional QC metrics were generated with the Picard toolkit and are provided as supplementary data (supplementary Tables 3 and 4).

Methylation data quality control and normalization.
The detection P-value has been calculated for all positions and samples.This value is calculated by comparing the total DNA signal to the background level, which is estimated using negative control positions.All samples had very low detection P-values, with a maximum value of 0.0005 (Fig. 3a).We used stratified quantile normalization to normalize the methylation signal for every position.From Fig. 3a,b it is visible that this normalization can be used to even out the signal between samples in this dataset.
Cell type composition of samples.To evaluate how the predicted cell type composition of the RiMod-FTD samples reflects the disease, we performed cell type deconvolution with the RNA-seq data using the Scaden algorithm (see Methods).As expected, neuronal cells make up the largest fractions for all samples, although neuronal fractions are smaller in samples with FTD (Fig. 4c).Comparison of percentage changes of average cell type fractions compared to controls (Fig. 4a) reveals that microglial fractions are particularly large in samples with FTD-GRN.Furthermore, all disease subtypes show a strong increase in endothelial cell fractions compared to controls.The decrease in neuronal cells is largely caused by a smaller number of excitatory neurons in all disease subtypes, while inhibitory neuron fractions are not decreasing.A statistical comparison of cellular fractions between groups using an ANOVA with post-hoc Tukey test showed that many of these composition changes are significant for FTD-MAPT and FTD-GRN (Fig. 4b).In the FTD-C9orf72 group, no cell composition change is significant, indicating a less strong change in cellular composition in this group.Generally, these results are in accordance with recent findings about FTD biology, such as the involvement of excitatory neurons 26,27 , the increase microglial inflammatory response in FTD-GRN 28 , or very recently, neurovascular dysfunctions in FTD-GRN 29 .

Usage Notes
inspection of assay results in RiMod-FtD browser.The data from the different assays of the RiMod-FTD project can be easily inspected using the RiMod-FTD Browser at https://www.rimod-ftd.org.The website allows to query the RNA-seq expression levels of single genes.It is easily visible, for instance, that GRN RNA levels are lower in brains from patients with GRN mutations, compared to the other groups (Fig. 5a).When selecting a gene, the surrounding genomic location is additionally shown with RNA-seq expression levels for genes in the region, CAGE-seq peaks and methylation levels.All data can be grouped by disease, pathology, sex, mutated gene and specific mutation (Fig. 5b,c).The browser does allow for quick inspection of FTD-related genes.

Fig. 1
Fig. 1 Schematic overview of the RiMod-FTD project idea.(a) Samples from post-mortem human brain tissue of patients with FTD caused by mutations in MAPT, GRN and C9orf72 and from healthy controls are collected.Multi-omics profiling is performed on all samples to gain detailed insights into disease mechanisms.Integrative data analysis is performed to gain new insights and the data is made available for further studies.The goal of the RiMod-FTD project is to further extend this data resource with fitting datasets.

Fig. 2
Fig. 2 Visualization of sample statistics.(a-d) show boxplots of Age, RIN, pH and PMD, respectively.The different disease groups are indicated on the x-axis and by colour.(e) percentages of sexes among samples for the different disease groups.

Fig. 3
Fig. 3 Methylation data QC statistics.(a) Per-sample Detection P-values calculated with the minfi R-package.Each bar (y-axis) shows the detection P-value (x-axis) for a sample.(b,c) Density plots of methylation beta values before and after quantile normalization, respectively.Every colored line describes one sample.

Fig. 4
Fig. 4 Predicted cell composition of samples and disease groups.(a) Difference in percentage of cell compositions per group and cell type, compared to the healthy controls.Each bar signifies the relative change in percentage that the fraction of a specific cell type (x-axis) has increased or decreased (positive or negative values on the y-axis, respectively).(b) Statistical comparison of cell type fractions between groups using a Tukey test.Shown is the negative log10 P-value (y-axis) for all different cell types (x-axis).The dotted red line indicates a P-value of 0.05.(c) Stacked barplot of cellular composition for all samples, divided by groups (x-axis).Each color represents one cell type.The fraction of this cell type is shown in the y-axis.

Fig. 5
Fig. 5 Exemplary view of RiMod-FTD genome browser.(a) Normalized RNA-seq expression levels of GRN for the different disesae groups in the RiMod-FTD project.(b) Browser view a selected gene (GRN).(c) CAGE-seq peaks and methylation levels for the currently selected region in the browser.(d) Gene expression levels for genes in the selected region of the genome browser.

Table 1 .
Sample numbers of the different disease groups for the available assays.