Common mitochondrial deletions in RNA-Seq: evaluation of bulk, single-cell, and spatial transcriptomic datasets

Omidsalar, Audrey A.; McCullough, Carmel G.; Xu, Lili; Boedijono, Stanley; Gerke, Daniel; Webb, Michelle G.; Manojlovic, Zarko; Sequeira, Adolfo; Lew, Mark F.; Santorelli, Marco; Serrano, Geidy E.; Beach, Thomas G.; Limon, Agenor; Vawter, Marquis P.; Hjelm, Brooke E.

doi:10.1038/s42003-024-05877-4

Download PDF

Article
Open access
Published: 17 February 2024

Common mitochondrial deletions in RNA-Seq: evaluation of bulk, single-cell, and spatial transcriptomic datasets

Communications Biology volume 7, Article number: 200 (2024) Cite this article

2750 Accesses
8 Altmetric
Metrics details

Subjects

Abstract

Common mitochondrial DNA (mtDNA) deletions are large structural variants in the mitochondrial genome that accumulate in metabolically active tissues with age and have been investigated in various diseases. We applied the Splice-Break2 pipeline (designed for high-throughput quantification of mtDNA deletions) to human RNA-Seq datasets and describe the methodological considerations for evaluating common deletions in bulk, single-cell, and spatial transcriptomics datasets. A robust evaluation of 1570 samples from 14 RNA-Seq studies showed: (i) the abundance of some common deletions detected in PCR-amplified mtDNA correlates with levels observed in RNA-Seq data; (ii) RNA-Seq library preparation method has a strong effect on deletion detection; (iii) deletions had a significant, positive correlation with age in brain and muscle; (iv) deletions were enriched in cortical grey matter, specifically in layers 3 and 5; and (v) brain regions with dopaminergic neurons (i.e., substantia nigra, ventral tegmental area, and caudate nucleus) had remarkable enrichment of common mtDNA deletions.

CommonMind Consortium provides transcriptomic and epigenomic data for Schizophrenia and Bipolar Disorder

Article Open access 24 September 2019

Detection of mosaic and population-level structural variants with Sniffles2

Article Open access 02 January 2024

Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain

Article 14 March 2022

Introduction

Mitochondrial DNA (mtDNA) deletions are large structural variants in the mitochondrial genome where a large piece of DNA (often several kilobases or more) is missing^1,2,3,4,5,6. These mtDNA molecules may or may not have the ability to replicate and can lead to metabolic impairment by disrupting regions that code for proteins, ribosomal and/or transfer RNAs essential for the oxidative phosphorylation (OXPHOS) pathway^3,7,8,9. The identification of mtDNA deletion breakpoints and the relative quantification of deleted vs. wild-type mitochondrial genomes have traditionally relied on several approaches, including Southern Blot analysis, Sanger sequencing, and quantitative polymerase chain reaction (qPCR)^10,11,12. Recently, however, several bioinformatics programs and pipelines have been described to identify mtDNA deletions with high accuracy using next-generation sequencing (NGS) data^{13,14,15,16,17}. Our group developed the Splice-Break pipeline for this purpose, and we previously demonstrated its accuracy and applications to human disease and aging research when used on Illumina sequencing data derived from NGS libraries prepared using mitochondrial-enriched, long-range PCR products as the DNA input¹³. While PCR amplification of the mitochondrial genome in one or more large fragments is a common approach used by many mitochondrial research groups, it still represents a niche method, and such sequencing data is limited. With the more widespread availability of RNA-sequencing (RNA-Seq) data from a variety of tissues and phenotypes, however, it is of great interest to determine if these datasets can be utilized as a resource for mtDNA deletion investigations.

In this study, we demonstrate how Splice-Break2¹³ may be utilized to evaluate mtDNA deletions from RNA-Seq data. We describe a robust evaluation of 30 common mtDNA deletions in 1570 human samples from 14 RNA-Seq studies^{18,19,20,21,22,23,24,25,26,27,28,29}, including 1107 samples across 11 tissues from the Genotype-Tissue Expression (GTEx) Project²⁹. Studies analyzed include both publicly available and newly presented datasets and cover a variety of methods including those that utilized bulk RNA-Seq (polyA/non-ribosomal depletion vs. ribosomal depletion), laser capture microdissection (LCM) RNA-Seq, spatial transcriptomics, and single-cell RNA sequencing (scRNA-Seq) library preparation methods. These 14 studies include 1570 samples that vary in age, diagnosis and tissue, including: (1) age ranges from prenatal to 96 years; (2) diagnoses of Parkinson’s Disease (PD)^18,22,24, Alzheimer’s Disease (AD)¹⁹, schizophrenia (SCZ)^20,21, major depressive disorder (MDD)²¹, bipolar disorder (BD)²¹, and healthy controls (CTRL); and (3) tissues of several brain regions^{18,19,20,21,24,25,26,28,29}, skeletal muscle^22,23,29, liver²⁹, blood²⁹, and the pancreas²⁷.

Large mtDNA deletions are often associated with short repeat sequences proximal to the 5′ and 3′ breakpoints; these repeats lead to an incorrect joining of two distal regions of the mitochondrial genome, either through strand-displacement during mtDNA replication or homologous end-joining during DNA repair processes^30,31,32,33. Our previous study evaluating mtDNA deletions from brain and blood samples found that all the “Top 30” most frequent mtDNA deletions (which were detected in 69% or more of all samples) were associated with (perfect/imperfect) repeat sequences of 8–22 base pairs in length. These 30 deletions were previously validated by Sanger sequencing and include 12 breakpoints that had been previously described in patients with mitochondrial deletion disorders^13,34. Three such examples are the 6335–13999, 7816–14807, and 8471–13449 deletions, which represent 5′-3′ positions 6341–14005, 7814–14805, and 8482–13460, respectively, when adjusted to the first repeat base^{13,33,35,36,37,38,39}. These deleted molecules have been associated with a number of rare diseases including (but not limited to) Kearns-Sayre Syndrome (KSS)^36,37, chronic progressive external ophthalmoplegia (CPEO) with or without an associated POLG mutation^37,40,41, and diffuse leukodystrophy³⁹; moreover, they have been found to be enriched in metabolically-active, somatic tissues and have been investigated in pathogenic phenotypes associated with cellular aging (e.g., PD, AD)^42,43,44.

MtDNA sequences present in RNA-Seq datasets are likely derived from real, transcribed RNA molecules, but may also be attributed to DNA contamination or “off-target” effects^26,45,46. The amount of mitochondrial gene transcripts detected in RNA-Seq studies can also be highly variable due to overall RNA quality and library preparation procedures^{26,45,46,47,48,49,50,51}. Due to these concerns, many RNA-Seq and ATAC-Seq bioinformatics pipelines remove mitochondrial reads prior to genome alignment and/or transcript quantification^{26,45,46,47,48,49,50,51}. As such, our first objective was to determine which subset or species of mtDNA deletions demonstrated a significant and positive correlation between RNA-Seq and the aforementioned DNA sequencing processes that utilize long-range PCR amplification of the mitochondrial genome. We PCR-amplified mtDNA from the dorsolateral prefrontal cortex (DLPFC) and processed the NGS data through the traditional Splice-Break2 pipeline¹³, and compared the deletion read %’s to mRNA-Seq data that was previously published by Zeppillo et al. using the same tissue (GSE224683)²⁰. Next, we evaluated 14 RNA-Seq studies^{18,19,20,21,22,23,24,25,26,27,28,29} to determine how methodological differences in library preparation and sequencing depth affected our ability to detect mitochondrial reads and common mtDNA deletions. Lastly, taking our library preparation observations into account, we evaluated 11 of the 14 RNA-Seq datasets derived from human brain, muscle tissue, liver and blood, and examined if these common deletions significantly correlated with age^{20,21,22,23,29}, were enriched in disease phenotypes or brain regions^{18,19,20,21,22,24,25,29}, or had differential levels in cortical grey matter layers (I-VI) compared to white matter²⁶. Cortical layer evaluations were performed using two spatial transcriptomics datasets, including one dataset of middle temporal gyrus (MTG) that is newly presented here in this study²⁶ (GSE226663).

Here, we describe the methodological considerations for applying the Splice-Break2 pipeline to RNA-Seq datasets, and present evidence that somatic aging effects, tissue specificity, and metabolic dysfunction can be studied with this approach. Given the wealth of both publicly available and restricted RNA-Seq datasets, this strategy may expedite investigations of common mtDNA deletions, especially for cases where the affected human tissue is not readily available for additional sequencing studies (e.g., rare diseases). Moreover, RNA-Seq studies are already designed to appreciate factors such as tissue and cellular specificity, environmental context, aging and drug exposure, whereas germline DNA studies (e.g., whole genome sequencing (WGS)) often use DNA derived from saliva or blood, which may not accurately recapitulate the somatic structural variation present in metabolically-active tissues or cells^52,53. Taken together, we believe this study will open a new door to study common mtDNA deletions and their role in human health and disease.

Methods

Samples

We evaluated 1570 samples from 14 RNA-Seq datasets. Thirteen of these datasets are provided on the National Center for Biotechnology Information’s (NCBI) Gene Expression Omnibus (GEO) website or are otherwise publicly available- we refer to these datasets collectively as “GEO+”^{18,19,20,21,22,23,24,25,26,27,28}. This consisted of 7 bulk sequencing with and without ribosomal depletion (GSE226663, GSE114517, GSE159699, GSE224683, GSE140089, GSE164471, and the Stanley Neuropathology Consortium dataset: http://sncid.stanleyresearch.org), 2 LCM RNA-Seq (GSE114918 and GSE166024), 2 spatial transcriptomics (GSE226663 and SpatialLIBD dataset available at Globus endpoint “jhpce#HumanPilot10x”), and 2 single-cell sequencing datasets (GSE81547 and GSE67835). The last dataset came from the Genotype-Tissue Expression Consortium (GTEx)²⁹, where we analyzed bulk sequencing samples from 11 human tissues (dbGaP phs000424.vN.pN). All described studies obtained consent from living donors and/or next-of-kin for postmortem tissues, along with necessary approvals from Institutional Review Boards (IRBs) and/or ethics committees.

The 30 samples derived from mitochondrial-enriched, long-range PCR products as the DNA input are available through dbGaP (phs002395.v1.p1) as part of a larger investigation of blood and brain-derived mtDNA^13,54. DNA sequencing was performed on mitochondrial enriched PCR-amplicons fragmented by sonication and prepared for NGS using a whole-genome library preparation kit^13,55. Briefly, ~50 mg of frozen brain tissue was homogenized manually with a pestel; total DNA was extracted using the AllPrep DNA/RNA/Protein Kit (Qiagen, Valencia, CA, USA) and quantified using the Qubit Fluorometer and dsDNA BR Assay Kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. mtDNA was enriched for each sample using a long-range (LR) PCR. The forward primer used was: 5′-CCGCACAAGAGTGCTACTCTCCTC-3′; the reverse primer used was: 5′-GATATTGATTTCACGGAGGATGGTG-3′ (Integrated DNA Technologies, Coralville, IA, USA)^13,55. Each sample was amplified in a 50 μl PCR reaction that contained the following: 50 ng of total DNA, 1 μl of each 10 uM primer, 8 μl of 2.5 mM dNTPs, 0.5 μl of LA Taq DNA Polymerase, Hot-Start Version (Takara Bio USA, Inc., Mountain View, CA, USA), and 5 μl 10x buffer. Thermocycler parameters were as follows: 94 °C for 1 min, followed by 30 cycles of denaturation at 98 °C for 10 s and annealing/extension at 68 °C for 15 min, with a final extension at 72 °C for 10 min. Reactions were then kept at 4 °C. Following PCR, 5 μl (10% of the total reaction volume) was loaded into a 1% agarose gel containing 10 mg/ml ethidium bromide and the gel was run at 100 V for approximately 2 h to confirm amplification of the full-length mitochondrial genome. PCR products were purified using Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN, USA) and were quantified using the Qubit Fluorometer and dsDNA BR Assay Kit (Invitrogen, Carlsbad, CA, USA). DNA shearing was performed in S220 Covaris microTUBEs with the following settings: Duty Factor = 5%, Peak Incident Power = 175 W, Cycles per Burst = 200, Time = 35. 200 ng of the sheared LR mitochondrial PCR product was used for library preparation using the TruSeq Nano DNA HT Library Preparation Kit (Illumina, San Diego, CA, USA). Each library was barcoded for multiplex sequencing with 96 samples per lane. NGS libraries were quantified and qualified prior to sequencing using the KAPA Library Quantification Kit (KAPA Biosystems, Wilmington, MA, USA) and Agilent Bioanalyzer DNA 7500 chips (Agilent, Santa Clara, CA, USA), respectively. Libraries were sequenced as 150-mer paired-end reads using the Illumina HiSeq 2500 at the UCI Genomics High Throughput Facility.

All accession numbers for the respective RNA-Seq datasets are provided in the Data Availability, Results section, in Supplementary Data 1 and 2, and are on GitHub (https://github.com/aomidsalar/RNA-Seq_Splice-Break2).

Internal data preparation: bulk RNA-Seq

The internal data obtained for methods comparisons (bulk RNA-Seq and spatial transcriptomics) utilized four subject’s brain tissue obtained from the Banner Sun Health Research Institute (BSHRI) Brain and Body Donation Program. These four subjects consisted of one Parkinson’s Disease (PD) subject (144113–144114), one Alzheimer’s Disease (AD) subject (144111–144112), and two age-matched controls with no neurodegenerative disease (144105–144106 and 144107–144108). Briefly, frozen middle temporal gyrus (MTG) was microdissected into ~5 mm³ blocks to be compatible with the grid size used for the 10x Visium Spatial Gene Expression platform. Each block was dissected to contain a sulcus, all six cortical layers of the grey matter, and white matter. Frozen tissue was affixed to cryostat chucks with OTC and sectioned with a Microm HM525 Cryostat at −20 °C to 10 μm slices. Frozen tissue slides were adhered to the 10x Visium Spatial Gene Expression or Tissue Optimization slides following the manufacturer’s protocols. Additional sections were adhered to Selectfrost microscope slides for bulk RNA-Seq assays. All slides were frozen at −80 °C prior to library preparation.

For bulk RNA-Seq, two frozen slides from one control subject (144107/144108) were extracted for RNA using the Direct-Zol RNA mini prep kit. RNA was quantified and qualified using the Agilent Tape Station-HS RNA Screen Tape and Tape Station and had an RNA Integrity Number (RIN) of 5.6. 240 ng of RNA was used as input for each bulk RNA-Seq prep, which utilized the NEBNext Ultra II Directional RNA library prep kit. The polyA (mRNA) library preparation used the NEBNext Poly(A) mRNA Magnetic Isolation Module, and the ribo depletion library preparation used the NEBNext rRNA Depletion Kit V2 and SBP beads. Libraries were barcoded with the NEBNext Multiplex Oligos for Illumina. Library size and quality was evaluated using the Agilent Tape Station-HS DNA Screen Tape and Tape Station; libraries were quantified with the KAPA qPCR library quantification kit. Both RNA-Seq libraries were sequenced as paired-end 100-mer reads on an Illumina NovaSeq 6000 system using a NovaSeq S1 flowcell. Sequencing was performed by the USC Keck Genomics Platform (KGP).

For spatial transcriptomics, tissue optimization (i.e., permeabilization time) was tested according to the 10x Genomics protocols, with 12 min of permeabilization selected for spatial transcriptomics sequencing. Two sections of each of the four subjects that were adhered to the 10x Visium Spatial Gene Expression were processed according to the 10x Genomics Visium protocols. Hematoxylin and eosin (H&E) imaging was performed at 20X magnification and auto-stitching using a Keyence BZ-X810 Microscope to generate high-resolution TIFF files. The cDNA was amplified using 12 cycles following Cq determination steps. Library size and quality was evaluated using the Agilent Tape Station-HS DNA Screen Tape and Tape Station; libraries were quantified with the KAPA qPCR library quantification kit. Spatial transcriptomics libraries were sequenced as paired-end 150-mer reads on an Illumina NovaSeq 6000 system using a NovaSeq S4 flowcell. Sequencing was performed by the USC Keck Genomics Platform (KGP).

Downloading files

FASTQ files available on GEO were obtained using the NCBI tool fastq-dump, with command options to split files (--split-files) and append the read ID (-I). The LIBD spatial transcriptomics dataset²⁶ is available on Globus and Github; FASTQ and BAM files from this study were downloaded directly from the Globus endpoint jhpce#HumanPilot10x (http://research.libd.org/globus/jhpce_HumanPilot10x/index.html); “filtered_feature_matrix.h5” files and tissue images, which were used for clustering and image annotation, were downloaded from their Github repository (https://github.com/LieberInstitute/HumanPilot). RNA-Seq FASTQ files from cerebellum (CER), hippocampus (HIPP), and prefrontal cortex (PFC) samples were downloaded directly from the Stanley Neuropathology Consortium website after requesting access to the data portal (http://sncid.stanleyresearch.org/). GTEx samples were obtained from the AnVIL Gen3 repository and downloaded as BAM after gaining access through dbGaP; they were sorted using samtools sort⁵⁶ (version 1.6) and converted to FASTQ using bedtools bamtofastq⁵⁷ (version 2.25.0).

HISAT2 alignment

In order to isolate reads that uniquely mapped to chrM and did not contain nuclear-mapped reads, we performed HISAT2⁵⁸ alignment to the nuclear genome on the RNA-Seq samples from the Zeppillo et al. study²⁰. To do this, we first downloaded the Ensembl human genome reference FASTA file version GRCh38.103⁵⁹ and removed the mitochondrial genome (chrM) from that file. Next, we built the index files for the human genome reference using the command “hisat2-build”⁵⁸. We then performed alignment using HISAT2⁵⁸ default settings and the updated human genome reference file (without chrM), with the additional option “--un-conc” to store unmapped reads to separate FASTQ files. Those FASTQ files (containing reads that did not map to the nuclear genome) were then used as the input for Splice-Break2 analysis¹³.

Splice-Break2 analysis

RNA-Seq FASTQ files were pushed through Splice-Break2 (https://github.com/brookehjelm/Splice-Break2), using either the single-end or paired-end version based on the data format of the respective study. Default command line options described were used, apart from the pre-alignment steps which were skipped (i.e., command line options: --align=yes, --ref=rCRS, fastq_keep=no, --skip_preAlign=yes). For the spatial transcriptomics datasets, alignment (BCL to BAM file) was performed with 10x Genomics Space Ranger (https://support.10xgenomics.com/spatial-gene-expression/software/overview/welcome) and BAM to FASTQ conversion was performed using the 10x Genomics bamtofastq tool (https://support.10xgenomics.com/docs/bamtofastq). The FASTQ files for Read 2 were processed through the single-end version of Splice-Break2. Single-cell samples were pooled by tissue type (via FASTQ file concatenation) prior to being run through Splice-Break2. Initial read numbers for each study were obtained from the stats file (stats.txt); all other metrics used in this analysis regarding the mtDNA read % for the “Top 30” common deletions were obtained from the “<sample > _LargeMTDeletions_DNAorRNA_Top30_NARpub.txt” files output from Splice-Break2.

Identification of FASTQ reads spanning the 6335–13999, 7816–14807, and 8471–13449 breakpoints

We utilized the Unix command “grep” to identify reads that spanned the three deletion breakpoints described. For 6335–13999, we searched for a string containing the deletion breakpoint and ten bases upstream and downstream from that break (CTCCGTAGACCTAACCTGAC). For the 7816–14807 deletion, we searched for an 18 bp string (TCATCGACCTCCCCACCC) first, followed by strings ATCCTAGT and GAAACTTCGG, which was necessary for specificity. For the 8471–13449 deletion, we searched for a string containing the deletion breakpoint and fifteen bases upstream and downstream from that break (CAAACTACCACCTACCTCCCTCACCATTGG).

Cortical layer imputation

We analyzed two spatial transcriptomics datasets, one from the SpatialLIBD project²⁶ and the other containing internal (USC) samples (GSE226663). Spatial transcriptomics datasets were processed using the 10x Genomics Visium pipeline instructions (https://support.10xgenomics.com/spatial-gene-expression/software/overview/welcome). Scripts for spatial transcriptomics processing and analysis are available on https://github.com/aomidsalar/RNA-Seq_Splice-Break2/.

To evaluate mtDNA deletions in cortical layers using spatial transcriptomics data that did not have stereological assessment, we conducted cortical layer imputation using the SpatialLIBD results as “ground truth” since that dataset included histology. First, we performed Seurat⁶⁰ clustering starting with “filtered_feature_matrix.h5” files on a pool of our eight USC spatial samples along with four SpatialLIBD samples (151673–151676) to generate 10 clusters, as this resulted in the most similar cortical layer architecture to the SpatialLIBD annotation. We performed further sub-clustering of cluster 3 (using the Seurat⁶⁰ command “FindSubCluster”, to generate clusters 3_0 and 3_1), which gave us further distinction between cortical layers 2 and 3. The sample barcodes and their assigned clusters were then used to generate cluster-specific BAM files using the script “split_spatial_bam_per_cluster.py” (provided in our GitHub repository). We compared the proportion of spots that appeared in the eleven Seurat clusters to the seven “ground truth” designations (white matter and cortical layers 1–6) to assign each Seurat cluster to its corresponding imputed layer. We assigned clusters that had over 40 percent of the spots overlapping with “ground truth” annotations to that given cortical layer and omitted clusters 8 and 9 which had an overall frequency of less than 5 percent. BAM files were converted to FASTQ files using the 10x Genomics bamtofastq tool (https://support.10xgenomics.com/docs/bamtofastq); the cluster-specific FASTQ files were concatenated (when required) to make imputed cortical layer FASTQ files for each sample, which were then run through Splice-Break2.

To find spot barcodes that contained either the 6335–13999 deletion or 8471–13449 deletion, we used the Unix tool “grep” to search for the strings containing the deletion breakpoints (see Methods section “Identification of FASTQ Reads Spanning the 6335–13999, 7816–14807, and 8471–13449 Breakpoints”). As a note, the 10x Genomics bamtofastq tool will output 3 types of files for each sample: (1) a “Read 1” file, which contains the UMI and barcode; (2) a “Read 2” file, which contains the cDNA sequence; and (3) an “Index” file, which contains the sample index. Reads that contained each of these deletions (from “grep” results of Read 2) were filtered and cross-referenced with matching headers in the Read 1 FASTQ to output their corresponding barcode sequences; barcodes were then stored in separate csv files (this was done using the script “grepfilestodeletionbarcodes.sh”; https://github.com/aomidsalar/RNA-Seq_Splice-Break2), and these barcodes were used for deletion analyses. Two proportion Z-tests were performed by calculating the proportion of spot barcodes in each imputed cortical layer that contained a “deletion spot”; spot counting and statistics were done manually in Microsoft Excel.

Graphical analysis

All figures in this study (Figs. 1–8, Supplementary Fig. 1–9) were made using R (v.4.1.3); all tables were made in Microsoft Powerpoint for Mac (v.16.61). Bar graphs, scatter plots, and boxplots (Figs. 1, 3–7, 8d; Supplementary Figs. 1–7, 9) were made using the “ggplot2” package (v.3.3.5)⁶¹. All boxplots show the median as a solid black line; the first and third quartiles are captured by the bounds of the box. Boxplot whiskers are defined as the first and third quartiles ± interquartile range times 1.5, respectively, and outliers are denoted as points. The heatmap (Fig. 8a) was made using the “ComplexHeatmap” package (v.2.10.0)⁶². Spatial images (Fig. 8b, c; Supplementary Fig. 8) were made according to the “Secondary Analysis in R” script available on the 10x Genomics website (https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit?src=event&lss=tradeshow&cnm=ts-2020-02-08-event-ra_g-keystone-banff-amr&cid=NULL). Multiple sequence alignment plots for the 6335–13999, 7816–14807, and 8471–13449 deletions (Fig. 2b–d) were generated using the ClustalOmega multiple sequence alignment tool on the EMBL-EBI website (https://www.ebi.ac.uk/Tools/msa/clustalo/) and the “ggmsa” R command (“ggmsa” package v.0.0.6)⁶³.

**Fig. 1: Correlations and comparisons of mtDNA deletions captured by DNA vs RNA-Seq.**

**Fig. 2: MtDNA deletions captured by RNA-Seq.**

**Fig. 3: Sequencing metrics of 14 RNA-Seq datasets evaluated for mtDNA deletions.**

**Fig. 4: Analyses of age in GEO+ samples.**

**Fig. 5: Analyses of age in GTEx samples.**

**Fig. 6: Analyses of brain region and diagnosis in GEO+ samples.**

**Fig. 7: Analysis of brain region and tissue in GTEx samples.**

**Fig. 8: Mitochondrial deletions in spatial transcriptomics and cortical layers.**

Statistics and reproducibility

Statistical analyses were performed in both R and Microsoft Excel. Spearman’s and Pearson correlation tests were done in R using the command “cor.test”. Linear regression calculations were made using the “lm” command and ANOVA test were done using the command “aov”; t-tests were done using the “t.test” command in RStudio or “TTEST” formula in Microsoft Excel. Equality of variances was testing using the FTEST formula in Microsoft Excel. Two-proportion Z-tests were calculated manually in Microsoft Excel. All statistical tests where four deletion metrics were evaluated include multiple comparisons corrections using the Bonferroni method; this was done using the command “p.adjust” with option “method = “bonferroni” in R.

All sample sizes for the RNA-Seq studies are shown in the figure legends and in Tables 1 and 2. All replicates analyzed were from different subjects, apart from GSE140089 that included PD subjects pre- and post-exercise. All deletion metrics for GEO+ datasets are shown in Supplementary Data 2.

Table 1 Summary of 12 GEO + RNA-Seq datasets evaluated for mtDNA deletions

Full size table

Table 2 Summary of GTEx RNA-Seq datasets evaluated for mtDNA deletions

Full size table

Scripts and data processing details

All scripts and details on the above data processing steps are available on GitHub (https://github.com/aomidsalar/RNA-Seq_Splice-Break2). This includes the following sections: (1) “Data_Availability_and_Accession”, which includes references to datasets analyzed in this study and their corresponding accession information; (2) “Processing_FASTQ_from_GEO”, which includes scripts used to format and run single-end and paired-end FASTQ files through Splice-Break2; (3) “Grep_for_Deletions”, which includes scripts used to isolate reads containing the 6335–13999, 7816–14807, or 8471–13449 deletion reads from FASTQ files; (4) “10xSpatial_DataProcessing”, which contains scripts for generating cluster-specific BAM files and identifying barcode sequences for reads containing 6335–13999 or 8471–13449 mtDNA deletions; and (5) “Spatial_Processing_Seurat”, which contains R scripts used to perform Seurat clustering and create annotated tissue images. The GitHub repository for this paper (https://github.com/aomidsalar/RNA-Seq_Splice-Break2) and the Splice-Break2 tool repository (https://github.com/brookehjelm/Splice-Break2) both contain a document for “RNA-Seq Best Practices for Splice-Break2” to help guide users with workflow and command lines.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Common mtDNA deletions in DNA vs RNA sequencing

We used a paired dataset of 30 postmortem brain samples that underwent both DNA and RNA-Sequencing²⁰ to explore Splice-Break2’s utility in analyzing common mtDNA deletions from RNA-Seq data (Fig. 1). These samples came from DLPFC of patients with schizophrenia (SCZ) and healthy controls (CTRL). MtDNA-enriched DNA sequencing was performed as previously described^13,55. This DNA dataset had an average of 3.47 million reads per sample, with 90.31 ± 5.2% alignment to the revised Cambridge Reference Sequence (rCRS) version of the human mitochondrial genome (NC_012920.1)⁶⁴. This resulted in an average MT Benchmark coverage (i.e., mitochondrial depth) of 23,466x. MT benchmark coverage is a measure of the average sequencing depth of the sample, measured from two 250 bp segments within the RNR1 and CYB genes¹³. RNA-Sequencing was performed as described in Zeppillo et al. ²⁰. This RNA-Seq dataset had an average of 114.5 million reads per sample, with 16.17±5.9% alignment to the mitochondrial genome (NC_012920.1)⁶⁴. Spearman and Pearson’s correlations between the Splice-Break2 results from the DNA and RNA datasets were performed for three of the “Top 30” most common mtDNA deletions, as well as the summation (cumulative read %) of all these 30 deletions together (Fig. 1). We also evaluated the summation of additional deletions of high frequency (n = 112), but those cumulative read %’s did not have a significant correlation between DNA and RNA (Supplementary Table 1), so we focused exclusively on the “Top 30” deletions for this investigation. The deletion read %’s for the “Top 30” deletions and the cumulative deletion read rate for every sample and this paired RNA/DNA dataset is provided in Supplementary Data 1.

The cumulative deletion read % of these “Top 30” had a significant and positive correlation between the DNA and RNA-Seq datasets after each was normalized to the amount of mitochondrial data (i.e., MT benchmark coverage) (Spearman’s rho. 0.567, p = 4.33e-3; Pearson’s cor. 0.454, p = 0.0468) (Fig. 1a). Not surprisingly, the DNA dataset had a significantly higher cumulative deletion read % than the RNA-Seq dataset (p = 1.09e-10), with 124-fold higher levels in the DNA (Fig. 1a). Three individual deletions were also examined for DNA/RNA correlations: the 6335–13999 deletion, which was the most frequently detected deletion in our previous study and was present in 98.9% of samples there, the 7816–14807 deletion, which was the second most frequently detected deletion and was present in 92.5% of samples there, and the 8471–13449 deletion (known as “The Common Deletion”), which was the third most frequently detected deletion in our previous study and was present in 91.4% of samples there¹³. The 6335–13999 deletion did not have statistically significant correlations (Spearman’s rho 0.409, p = 0.0993; Pearson’s cor. 0.235, p = 0.845) after multiple comparisons corrections (Fig. 1b). The DNA dataset also had a significantly higher deletion read % for this deletion than the RNA-Seq dataset (p = 1.35e-8), with 512-fold higher levels in the DNA (Fig. 1b). The 7816–14807 deletion did not have a significant correlation between the DNA and RNA-Seq datasets when evaluating the ranks (Spearman’s rho 0.342, p = 0.256); however, the correlation of read values was significant (Pearson’s cor. 0.466, p = 0.0378) (Fig. 1c). The DNA dataset also had a significantly higher deletion read % for this deletion than the RNA-Seq dataset (p = 7.65e-6), with 196-fold higher levels in the DNA (Fig. 1c). The 8471–13449 deletion had a significant and positive correlation between the DNA and RNA-Seq datasets (Spearman’s rho 0.740, p = 1.21e-5; Pearson’s cor. 0.672, p = 1.88e-4) (Fig. 1d). Again, the DNA dataset had a significantly higher deletion read % than the RNA-Seq dataset (p = 5.54e-7), with 22-fold higher levels in the DNA (Fig. 1d). DNA consistently had a higher deletion read % than the RNA-Seq dataset for each of the “Top 30” deletions, and a higher percentage of the “Top 30” deletions were captured in the DNA data (Supplementary Fig. 1).

The “Top 30” mtDNA deletions breakpoints from Splice-Break2, along with their adjusted positions (aligned to the first base based on the repeat sequences), and the mitochondrial genes that encompass these 5′ and 3′ breakpoints are shown (Fig. 2a). Confirmation and visualization of the three mtDNA deletions analyzed was performed using “grep” and the ggmsa R package, respectively⁶³. Reads that contained the sequence CTCCGTAGACCTAACCTGAC, a 20-bp string that encompasses the 6335–13999 breakpoint, aligned as expected to the COX1 and ND5 genes, with retainment of 1 copy of the repeat sequence (Fig. 2b). Reads that contained the sequence TCATCGACCTCCCCACCC an 18-bp string that encompasses the 7816–14807 breakpoint (and that also contained the sequences ATCCTAGT and GAAACTTCGG), aligned as expected to the COX1 and CYB genes, with retainment of 1 copy of the repeat sequence (Fig. 2c). Reads that contained the sequence CAAACTACCACCTACCTCCCTCACCATTGG, a 30-bp string that encompasses the 8471–13449 breakpoint, aligned as expected to the ATP8 and ND5 genes, with retainment of 1 copy of the repeat sequence (Fig. 2d).

To determine if nuclear-mitochondrial DNA segments (NUMTs) were contributing to our detection of common mtDNA deletions, and to assess if it was necessary to process RNA-Seq data with these sequences removed, we compared the mtDNA deletion results from all reads to only those that uniquely mapped to chrM (i.e., after HiSat2⁵⁸ alignment to the nuclear genome and downstream processing of unmapped reads) (Supplementary Fig. 2). The correlations between the mtDNA deletion reads detected (before normalization) was very high for both the “Top 30” sum and the three individual deletions evaluated when comparing all reads to only uniquely mapped reads (Spearman’s rho range: 0.81 to 0.999, p-value range: 2.40e-7 to 2.72e-42; Pearson’s cor range: 0.957–0.998, p-value range: 2.06e-20 to 5.10e-39). There was no significant difference in either the number of deletion reads detected or the deletion read % (after normalization) for any deletion metric (Supplementary Fig. 2). Additional analysis of the uniquely mapped reads from RNA-Seq data compared to the DNA dataset demonstrated that utilizing uniquely mapped reads did not significantly improve the correlations between RNA and DNA (Supplementary Fig. 3); thus, we concluded removal of nuclear mapped reads was not necessary and we processed all RNA-Seq reads through the Splice-Break2 pipeline for the remainder of this study to streamline our workflow. However, both approaches (i.e., using all reads or only chrM uniquely mapped reads) are described in our “RNA-Seq Best Practices for Splice-Break2” document on GitHub.

Total reads, mitochondrial data and common mtDNA deletions across 14 RNA-Seq studies

We processed 14 human RNA-Seq studies^{18,19,20,21,22,23,24,25,26,27,28,29} through the Splice-Break2 pipeline and evaluated the effects of library preparation method and sequencing depth (i.e., total read numbers and MT benchmark coverage) on our ability to capture common mtDNA deletions (Fig. 3). Our analysis included 12 previously published studies and 2 newly presented here. The 12 previously published studies we included are as follows (see Tables 1 and 2 for additional details): (1) Simchovitz et al. (GSE114517) study of ribosomal-depleted RNA-Seq libraries isolated from substantia nigra (SN), amygdala (AM), and middle temporal gyrus (MTG) brain tissue of patients with PD+Dementia and CTRL¹⁸; (2) Nativio et al. (GSE159699) study of ribosomal-depleted RNA-Seq libraries isolated from lateral temporal lobe (TL) tissue of patients with AD and CTRL¹⁹; (3) Zeppillo et al. study (GSE224683) of polyA (non-ribosomal depleted) total RNA libraries isolated from DLPFC brain tissue of patients with SCZ and CTRL²⁰; (4) Kim et al. study (i.e., The Stanley Neuropathology Consortium) of polyA (non-ribosomal depleted) total RNA libraries isolated from cerebellum (CER), hippocampus (HIPP), and prefrontal cortex (PFC) of patients with bipolar disorder (BD), SCZ, major depressive disorder (MDD) and CTRL^21,65; (5) Lavin et al. (GSE140089) study of polyA (non-ribosomal depleted) total RNA libraries isolated from skeletal muscle (SM) of patients with PD and CTRL²²; (6) Tumasian et al. (GSE164471) study of polyA (non-ribosomal depleted) total RNA libraries isolated from SM of CTRL²³; (7) Aguila et al. (GSE114918) study of laser-capture microdissection (LCM) SN and ventral tegmental area (VTA) neurons isolated from patients with PD and CTRL²⁴; (8) Monzón-Sandoval et al. (GSE166024) study of LCM ventral and dorsal SN neurons isolated from CTRL subjects²⁵; (9) Maynard et al. spatial transcriptomics sequencing of DLPFC brain tissue isolated from CTRL subjects²⁶; (10) Enge et al. (GSE81547) analysis of 2544 single cells (alpha, acinar, beta, delta, ductal, mesenchymal, and unsure cell types) isolated from pancreas tissue of CTRL subjects²⁷; (11) Darmanis et al. (GSE67835) analysis of 357 single cells (neurons, fetal quiescent, astrocytes, endothelial, oligodendrocyte precursor cells, and microglia cell types) isolated from fresh brain tissue²⁸; and (12) Lonsdale et al. (i.e., GTEx) study, evaluating multiple brain regions, skeletal muscle, liver and blood²⁹. Single-cell samples were pooled by cell type prior to Splice-Break2 analysis, resulting in 7 pools of pancreas cell types and 6 pools of brain cell types (Table 1). The two newly presented studies include one bulk sequencing dataset of postmortem MTG brain tissue with RNA libraries prepared both with (n = 1) and without (n = 1) ribosomal depletion, as well as one spatial transcriptomics dataset of MTG brain tissue from patients with PD (n = 2), AD (n = 2), and CTRL (n = 4) (GSE226663). One of our internal control samples was sequenced using all three preparation methods; this helped us compare the effect of library preparation method on our ability to capture mtDNA deletions without additional effects of tissue type or sequencing site. In total, we examined 1570 samples ranging in age from prenatal to 96 years across the 14 datasets.

We processed all 1570 samples (Fig. 3a–p), including the one internal (MTG) sample that had been processed three ways (Fig. 3a–h), through the Splice-Break2 pipeline and compared the following key metrics to determine which variables influenced common mtDNA deletion levels: total RNA-Seq reads, mitochondrial (MT) benchmark coverage, MT benchmark coverage per million reads, and the “Top 30” cumulative deletion read numbers. Total RNA-Seq reads are the initial reads from each sample prior to alignment. MT benchmark coverage is a measure of the average sequencing depth of the sample, measured from two 250 bp segments within the RNR1 and CYB genes¹³. The “Top 30” cumulative deletion read amount is the sum of the “Top 30” reads. The deletion read rates for the internal samples processed three ways are also shown (Fig. 3e–h), which are calculated by normalizing the number of reads for each mtDNA deletion to the MT benchmark coverage, and is the metric used to evaluate the effect of age, tissue, and diagnoses (see Figs. 4–7). The deletion read %’s for the “Top 30” deletions and the cumulative deletion read rate for every sample and all RNA-Seq studies is provided in Supplementary Data 2 for the GEO+ datasets; individual-level data is not shown for the GTEx dataset because of dbGaP restrictions.

We observed that samples that used a ribosomal depletion library preparation method (but had similar abundance of sequencing reads prior to alignment) had less MT benchmark coverage than polyA (non-ribosomal depleted) samples (Fig. 3a–c, i–k). Due to a reduced amount of mitochondrial data in the samples subjected to ribosomal depletion, we observed little to no common mtDNA deletions in these samples; conversely, for samples prepared without ribosomal depletion (i.e., bulk polyA or other), we consistently observed these common mtDNA deletions and more mitochondrial data overall (Fig. 3). Samples prepared by LCM showed the highest number of deletion reads (Fig. 3l); however, it is important to note that these LCM samples were all derived from aged human brain regions with high dopaminergic neuron innervation, the SN and VTA, so these high values may not solely reflect the sample preparation method. Evaluations of our internal control samples prepped by three methods further confirmed our observation of higher abundance of mitochondrial reads and deletions in polyA (non-ribosomal depleted) preparations and showed more MT data was captured by spatial transcriptomics than by bulk RNA-Seq (Fig. 3b–h). This is in alignment with previous observations that this spatial transcriptomics method (i.e., 10x Genomics Visium) may capture more “off-target” and/or non-polyadenylated reads compared to other methods^26,66. Single-cell RNA sequencing (scRNA-Seq) samples did not contain the three mtDNA deletions followed in this study, so these samples were not evaluated further (Table 1, Fig. 3l). Taken together, we found that bulk sequencing with polyA (non-ribosomal depletion), spatial transcriptomics, and LCM methods were amenable to mtDNA deletion investigations, while scRNA-Seq and ribosomal depletion methods were generally inutile due to insufficient mitochondrial data.

Common mtDNA deletions and the effects of age and sex

MtDNA mutations and deletions have been positively associated with aging in various tissues such as the brain, skeletal muscle, and heart^{4,5,43,67,68,69,70}. One of the most investigated large mtDNA deletions is the 8471–13449 “common deletion” (also referred to as the 4977 bp deletion), which was found to increase with age in the heart and brain of healthy adult patients^43,69,70. In contrast with the nuclear genome, the mitochondrial genome is particularly vulnerable to DNA damage because it is located in close proximity to damaging free radicals and reactive oxygen species generated through the OXPHOS pathway, does not have the protection of histones, and has less efficient DNA repair machinery^71,72,73. Over time, these factors can lead to the accumulation of mtDNA deletions within cells^70,71,73.

Among the GEO+ datasets, we chose four studies for age analysis because of the wide age distributions of their samples^20,21,22,23. In the DLPFC study²⁰, we observed significant positive correlations between age and the “Top 30” cumulative deletion read rate (p = 0.0119) and the 8471–13449 “common deletion” (p = 0.0159) (Fig. 4a). In the SM study from PD and CTRL samples²², we similarly observed significant positive correlations between age and the “Top 30” cumulative deletion read rate (p = 0.0113) as well the 8471–13449 “common deletion” (p = 0.021; Fig. 4b). In the SM study of CTRL samples²³, we did not observe a significant correlation between age and any of the four mtDNA deletion metrics evaluated (Fig. 4c); only ~1/2 of the samples were included in the aging analysis because we observed a significant “batch effect” in this study (Supplementary Fig. 4). In the Stanley Neuropathology Consortium dataset²¹, which contained CER, HIPP, and PFC, we observed significant positive correlations between age and the “Top 30” cumulative deletion read rate (p = 0.0131) and the 8471–13449 “common deletion” (p = 0.0193) in the PFC (Fig. 4f); these deletion metrics were also observed to increase with age in the CER and HIPP, but the correlations with age were not significant after multiple-comparisons corrections (Fig. 4d, e). We did not observe a significant correlation with age for the 6335–13999 or 7816–14807 deletion in any of these datasets (Fig. 4). In addition, none of these datasets exhibited significant differences between biological sexes for any of the four deletion read rate metrics (Supplementary Table 2).

Age analysis for the GTEx RNA-Seq data was also performed on a subset of available samples, including 183 paired samples of cerebellum (CER) and cortex (CORT), 41 paired samples from multiple brain regions (i.e., amygdala (AM), anterior cingulate cortex (ACC), caudate nucleus (CAUD), frontal cortex (FC), hippocampus (HIPP), and substantia nigra (SN)), and 165 paired samples from non-brain regions (i.e., blood, liver and skeletal muscle (SM)) (Fig. 5 and Supplementary Fig. 5). In the paired CER and CORT dataset, we observed significant positive correlations between age and the “Top 30” cumulative deletion read rate (p = 0.00238) and the 8471–13449 “common deletion” (p = 5.88e-4) in the CORT, and also a significant positive correlation between age and the “Top 30” cumulative deletion read rate (p = 0.0342) in the CER (Fig. 5). The other common deletions evaluated were not significant for these brain regions after multiple comparisons corrections (Fig. 5 and Supplementary Fig. 5). In the paired analysis of 6 brain regions, we observed significant positive correlations between age and the mtDNA deletion metrics for the AM, CAUD and SN; we did not observe significant correlations with age in the ACC, FC or HIPP after multiple comparisons corrections (Fig. 5 and Supplementary Fig. 5). Significant age correlations in the AM included the 8471–13449 deletion (p = 0.0172) and the “Top 30” cumulative deletion read rate (p = 0.0351). Significant age correlations in the CAUD included the 6335–13999 deletion (p = 1.94e-3), the 7816–14807 deletion (p = 0.0358), the 8471–13449 deletion (p = 1.11e-5), and the “Top 30” cumulative deletion read rate (p = 1.20e-4). Significant age correlations in the SN included the 8471–13449 deletion (p = 0.00896), and the “Top 30” cumulative deletion read rate (p = 0.0182). Lastly, in the paired analysis of blood, liver and SM, we only observed significant and positive correlations between age and the mtDNA deletion metrics for SM, specifically for the 8471–13449 deletion (p = 0.0210), and the “Top 30” cumulative deletion read rate (p = 0.0210) (Fig. 5). Overall, we were able to recapitulate previously published findings that common mtDNA deletions increase with age in the brain and muscle, and we provide evidence from paired datasets that different brain regions and tissues have variable susceptibility to this increase, some of which, to our knowledge, is illustrated for the first time.

Common mtDNA deletions across brain regions, tissues, and the effect of diagnosis

Evaluation of the MTG, AM, and SN dataset¹⁸ of PD and CTRL subjects revealed no significant differences in “Top 30” cumulative deletion read % in the SN compared to the MTG or AM for any of the deletion metrics, after multiple comparisons corrections (Fig. 6a). However, the SN did display higher levels and it should be emphasized that this study used a ribosomal depletion method for library preparation, so it is perhaps not surprising we were unable to detect a significant difference since so little mitochondrial data is captured with that RNA-Seq method. The SN and VTA LCM dataset²⁴ from PD and CTRL samples (Fig. 6b) and the dorsal and ventral SN LCM dataset²⁵ of CTRL subjects (Fig. 6c) showed no significant differences in any of the four mtDNA deletion metrics we evaluated. The Stanley Neuropathology Consortium dataset²¹ of CER, PFC, and HIPP tissue showed significantly higher deletion read %’s in the HIPP compared to CER and PFC for the 7816–14807 deletion (p ≤ 0.01324) and the “Top 30” cumulative deletion read rate (p ≤ 3.12e-12; Fig. 6d). The HIPP had significantly higher deletion read %’s than the CER for the 6335–13999 deletion (p = 0.00636) and the 8471–13449 deletion (p = 2.92e-5) (Fig. 6d). The 8471–13449 deletion were also significantly higher in the PFC than in the CER (p = 0.0261) (Fig. 6d).

In general, we did not detect significant differences based on diagnosis in any of the GEO+ datasets we evaluated. The one exception to this was the SN and VTA LCM dataset²⁴ from PD and CTRL samples had significantly higher levels of the 7816–14807 deletion in the CTRL subjects in both SN (p = 0.0109) and VTA (p = 0.0064) (Fig.6f and Supplementary Fig. 6). It should be noted that all of these studies had a small sample size per diagnostic group (range: n = 5–17).

Analysis of GTEx included tissue comparisons of the three paired datasets used in our aging analysis (Fig. 5). In paired CER and CORT, the CORT had significantly higher amounts of all four deletion metrics (6335–13999 deletion (p = 5.72e-19); 7816–14807 deletion (p = 1.43e-15); 8471–13449 deletion (p = 3.86e-33); and the “Top 30” cumulative deletion read rate (p = 7.68e-41) (Fig. 7a). Across the six paired brain regions, deletion levels were also significantly different (6335–13999 deletion (p = 1.46e-17); 7816–14807 deletion (p = 2.29e-15); 8471–13449 deletion (p = 6.64e-20); and the “Top 30” cumulative deletion read rate (p = 3.38e-21) (Fig. 7b). Similarly, in paired blood, liver and SM, all four deletion metrics were significantly different (6335–13999 deletion (p = 1.56e-5); 7816–14807 deletion (p = 5.99e-13); 8471–13449 deletion (p = 1.18e-10); and the “Top 30” cumulative deletion read rate (p = 9.13e-38) (Fig. 7c). Cross comparisons of all GTEx tissues revealed the following results: (1) blood had the lowest deletion read % and was significantly lower than all other tissues; (2) CER had significantly lower deletion levels than all other brain regions; (3) liver had significantly lower deletion levels than SM and cortical brain regions; (4) SM had significantly lower deletion levels than some (but not all) brain regions; (5) brain regions with dopaminergic neurons (i.e., the CAUD and SN) had remarkable enrichment of common mtDNA deletions, with significantly higher levels than all other tissues. The SN had by far the highest mtDNA deletion levels and was likewise significantly higher than all tissues including the CAUD. All tissue comparisons and p-values are shown in Fig. 7 and Supplementary Fig. 7.

Common mtDNA deletions across imputed cortical layers in spatial transcriptomics data

To investigate the common mtDNA deletion metrics across (imputed) cortical layers, we used 12 spatial transcriptomics samples from two separate human brain datasets. The two spatial transcriptomics datasets include the Maynard et al. “SpatialLIBD” dataset²⁶, from which we included 4x section replicates of DLPFC tissue from a single healthy control subject (sample numbers 151673–151676), and a new internal dataset of 8x spatial transcriptomics samples of MTG tissue taken from duplicate sections of 4 patients (diagnosed as AD, PD, or CTRL). Both datasets were prepared according to 10x Genomics’s Visium spatial transcriptomics library preparation and sequencing protocols. The SpatialLIBD samples were included to assist in cortical layer imputation since they contained “ground truth” measures of cortical layer geography based on stereology/imaging analysis²⁶. We performed Seurat⁶⁰ clustering on all 12 spatial samples from both studies and imputed cortical layers based on the Seurat clusters that had the highest overlap with the SpatialLIBD ground truth annotations; this resulted in the following imputed cortical layers: “White Matter”, “Layer 1”, “Layer 2 + 3”, “Layer 3”, “Layer 5”, and “Layer 6” (Fig. 8a–c).

To determine the proportion of spots in each cortical layer that contained the 6335–13999 and/or 8471–13449 mtDNA deletion, we needed the spot barcode data (which is not output as part of the Splice-Break2 process). As such, we used “grep” to identify reads that spanned these breakpoints, as described previously, and then determined which cortical layer these reads mapped to using the spot barcode. The percentage of spots that contained the 6335–13999 deletion (Fig. 8d) or the 8471–13449 deletion (Fig. 8e) for all 12 samples was analyzed for each imputed cortical layer. We observed that the 6335–13999 deletion had a significantly higher percentage of spots (two-proportion Z-test) in imputed cortical layers 3 and 5 compared to the white matter (p ≤ 0.0066; Fig. 8d). The 8471–13449 “common deletion” also had significantly higher geographical representation in cortical layers 3 and 5 compared to white matter (p ≤ 3.30e-5; Fig. 8e). These spots containing deletions were also annotated on tissue images for visualization (Supplementary Fig. 8). Further, we observed similar trends in the percentage of spots with the 6335–13999 or 8471–13449 deletion when analyzing just the USC internal samples (Supplementary Fig. 9). These 8x spatial samples were split according to their imputed cortical layer (i.e., using “cluster BAMS” from Seurat⁶⁰), and each imputed layer was run through Splice-Break2; we found a significant difference between white and grey matter for the 8471–13449 “common deletion” (p = 0.0393; Supplementary Fig. 9). Taken together, these results suggest cortical layers have different levels of common mtDNA deletions in the aged human brain and are most abundant in layers 3 and 5 where there is an enrichment of pyramidal neurons^26,74,75.

Discussion

The aim of this study was to determine whether RNA-Seq data can be used to evaluate common mtDNA deletion levels in somatic tissues using the bioinformatics tool Splice-Break2. Initial analyses were done on a paired dataset where there was DNA and RNA-Seq data available; we observed robust differences in DNA and RNA deletion read %, with the DNA data containing at least 22-fold higher deletion read rates. We hypothesize that the decreased rate of these common deletions in RNA-Seq data may be due to transcript abundance or stability- that mitochondrial genomes with large deletions may transcribe polycistronic transcripts less efficiently than wild-type molecules, or that RNA transcripts containing deletion breakpoints may be less stable, or both. It is worth mentioning that polyadenylation of mitochondrial transcripts serves a different role than it does for nuclear-encoded genes where 3′ polyadenylation helps stabilize mRNA molecules and promotes protein translation; this also occurs for human mitochondrial-encoded genes, but transcripts can also be polyadenylated at many intragenic positions and are subject to polyadenylation-dependent RNA degradation mechanisms that are conserved features of bacteria (the evolutionary origin of mitochondria)^76,77,78. In addition, the deletion read %’s in the DNA data are likely increased due to the long-range PCR amplification used for mitochondrial enrichment and the smaller size of the deleted genomes.

The mtDNA deletion metrics that were examined in this study were the cumulative read % of the “Top 30” most frequent deletions that we identified and Sanger-validated in our previous study, and the individual read % of three specific mtDNA deletions: 6335–13999 (adjusted position 6341–14005), 7816–14807 (adjusted position 7814–14805), and 8471–13449 (the “common deletion”; adjusted position 8482–13460). The cumulative read % of the “Top 30” deletions and the 8471–13449 deletion had statistically significant positive correlations between the DNA and RNA-Seq paired dataset examined, with the “common deletion” having the strongest correlations between methods. Future studies may include additional paired datasets (DNA, RNA-Seq, and other genomics methods) and analyses of less common mitochondrial deletions. Analysis of patients with large clonal deletions will also be of interest. The Splice-Break2 pipeline does output additional breakpoints besides the “Top 30” most common deletions, but those metrics should be used with caution for RNA-Seq data and may require further validation (such as qPCR and/or Sanger sequencing).

All Splice-Break2 data is reported as “deletion read %”, which is the number of deletion reads detected divided by the MT Benchmark coverage; however, this should not be interpreted as an absolute estimation of heteroplasmy rate (for DNA or RNA-Seq) because of inherent biases of these NGS methods. In our original methods article, we did observe a significant positive correlation between the Splice-Break2 deletion read % and qPCR quantification of the 8471–13449 “common deletion”, and here we show the deletion read %’s can correlate between DNA and RNA-Seq datasets- thus, we conclude that although absolute heteroplasmy rates cannot be inferred, the fold differences between samples is retained and allows us to use deletion read % for statistical analyses of age, tissue type and diagnoses. Future studies that pair mitochondrial/metabolic assays with NGS measurements will be important to understand the physiological relevance of these deletions and at what threshold deletion read %’s impact cellular function.

Splice-Break2 was used to examine various library preparation methods, such as bulk RNA-Seq, LCM RNA-Seq, spatial transcriptomics, and scRNA-Sequencing. We observed that bulk sequencing without ribosomal depletion allowed for more consistent capturing of mtDNA transcripts and deletions compared to bulk sequencing with ribosomal depletion. Additionally, the scRNA-Seq studies we evaluated had minimal to no mtDNA deletion capture despite containing comparable amounts of mtDNA transcripts (MT Benchmark coverage) compared to the other studies evaluated. Analysis of additional scRNA-Seq datasets and methods, including LCM, along with other non-RNA single-cell genomics methods (such as scATAC-Seq), is warranted in the future. The LCM RNA-Seq samples we evaluated contained the highest deletion read rates; however, it is important to note that these samples originated from aged brain tissue from regions innervated with dopaminergic neurons (SN and VTA), which have a demonstrated significant increase in mtDNA deletion burden^79,80. Therefore, we cannot determine from these LCM datasets if the increased mtDNA deletion levels are strictly due to brain region, library preparation method, or are a combination of both effects. Overall, these data indicate that the most valuable RNA-Seq wet lab protocols for mtDNA deletion detection include bulk sequencing without ribosomal depletion (e.g., polyA), LCM RNA-Seq, and spatial transcriptomics.

Our analysis of aging, differences among brain regions, and diagnostic effects revealed trends that were consistent with the effects of mitochondrial dysfunction reported in current literature^{2,3,4,5,9,79,80}. It has been reported in several studies that mtDNA deletions accumulate in some tissues as they age, and studies on the “hallmark” mtDNA deletion disorders (e.g., KSS, Pearson’s Syndrome) have proved these deletions can have functional (and even lethal) consequences if at a high enough threshold to affect cellular function^36,37. The age-dependent increase in mtDNA deletions has been observed in various somatic tissues including the brain; in the substantia nigra, high mtDNA deletions have been linked to age-related and disease-related (i.e., PD) neuronal loss^43,67,79,80. We observed statistically significant increases in the mtDNA deletion metrics due to age in brain and/or muscle datasets. Our most robust evaluation of aging was performed using paired RNA-Seq data from GTEx. Overall, we were able to recapitulate previously published findings that common mtDNA deletions increase with age in the brain and muscle, and show that different brain regions and tissues have variable susceptibility to this age-dependent increase.

We were able to observe some interesting differences in tissues and brain regions in the paired GTEx dataset, where common mitochondrial deletions were lowest in blood, lower in liver than skeletal muscle, higher in cortex than in cerebellum, and high (but variable) across cortical brain regions. In both GEO+ and GTEx datasets, we observed increased mtDNA deletions in brain regions containing dopaminergic neurons (SN, VTA and caudate nucleus); however, we did not detect an increase in PD. This is consistent with previous reports that have found an increase in SN tissue but no significant increase in PD specifically^79,81,82,83. It would be interesting to investigate a larger cohort of patient samples taken at various stages of PD and/or compare these results neuropathological measures of dopaminergic cell death (e.g., SN depigmentation scores), as this data suggests the age effects in PD SN may not be linear. Measures of cell type abundance (e.g., histology, in situ gene expression, cell counts from flow cytometry or CBC tests, or single-cell RNA-Seq data) from paired tissue (preferably the same dissection/sample) maybe also improve analyses of age and diagnosis if that data can be obtained and used as a co-variate.

Our common mtDNA deletion analysis of spatial transcriptomics data, specifically across (imputed) cortical layers of the DLPFC and MTG, revealed increased deletion burden in grey matter compared to white matter, which is not surprising given its high metabolic activity^74,84,85. Cortical layers 3 and 5 consistently contained the highest percentage of “spot barcodes” with mtDNA deletions. This supports the hypothesis that brain regions and cortical layers are differentially susceptible to mtDNA damage^69,79,80,86. Future studies investigating the effect of mtDNA damage in neurodevelopment, psychiatric disorders or neurodegenerative diseases may want to focus on spatial transcriptomic approaches so that analyses can focus on specific cortical layers and increase the signal-to-noise ratio for these tests.

In summary, this robust analysis of multiple, highly used RNA-Seq methods demonstrated the utility of detecting mtDNA deletions of high frequency through our bioinformatics pipeline. The Splice-Break2 tool was effective in quantifying common mtDNA deletions in polyA (non-ribosomal depleted) bulk sequencing, LCM, and spatial transcriptomic datasets, and these datasets may also be amenable to other bioinformatics approaches of mtDNA deletion quantification. Of note, the current version of Splice-Break2 is only compatible with human data, as the alignment and annotations utilize the rCRS⁶⁴, our catalogue of human mtDNA deletion breakpoints¹³, and MitoMap⁸⁷, respectively. With the wide breadth of publicly available and restricted human RNA-Seq datasets that encompass a variety of tissues, diseases, and experimental/environmental conditions, the ability to incorporate mtDNA deletion metrics into these investigations may provide information about metabolic effects in tissues and their contributions to disease phenotypes and aging.

Data availability

Our analysis included 12 previously published studies and 2 newly presented here. The 12 previously published studies we included in this paper can be accessed from the following: (1) Simchovitz et al.¹⁸ study is deposited on GEO with accession code “GSE114517”; (2) Nativio et al.¹⁹ study is deposited on GEO with accession code “GSE159699”; (3) Zeppillo et al.²⁰ study is deposited on GEO with accession code “GSE224683”; (4) Kim et al. study^21,65 (i.e., The Stanley Neuropathology Consortium) is available on their website (http://sncid.stanleyresearch.org); 5) Lavin et al. study²² is deposited on GEO with accession code “GSE140089”; (6) Tumasian et al. study²³ is deposited on GEO with accession code “GSE164471”; (7) Aguila et al. study²⁴ is deposited on GEO with accession code “GSE114918”; (8) Monzón-Sandoval et al. study²⁵ is deposited on GEO with accession code “GSE166024”; (9) Maynard et al. study²⁶ is available on their GitHub page (https://github.com/LieberInstitute/HumanPilot) and Globus endpoint “jhpce#HumanPilot10x” (http://research.libd.org/globus/jhpce_HumanPilot10x/index.html); (10) Enge et al. study²⁷ is deposited on GEO with accession code “GSE81547”; (11) Darmanis et al. study²⁸ is deposited on GEO with accession code “GSE67835”; and (12) Lonsdale et al. (i.e., GTEx) study²⁹ is available through their portal (https://gtexportal.org/home/)²⁹. All RNA-Seq samples newly presented in this study have been deposited on GEO with primary accession code “GSE226663”. Source data underlying Fig. 1 and Fig. 3a–h can be found in Supplementary Data 1. Deletion metrics for all GEO+ studies (used for Figs. 3, 4 and 6) can be found in Supplementary Data 2. Source data underlying Fig. 8d can be found in Supplementary Data 3. Raw data is not provided for the GTEx datasets due to restricted access.

Code availability

All scripts and details on processing steps are available (https://github.com/aomidsalar/RNA-Seq_Splice-Break2; https://doi.org/10.5281/zenodo.10499375)⁸⁸. This repository and the Splice-Break2 tool repository (https://github.com/brookehjelm/Splice-Break2; https://doi.org/10.5281/zenodo.10499097)⁸⁹ both contain a document for ‘RNA-Seq Best Practices for Splice-Break2’ to help guide users with workflow and command lines.

References

Holt, I. J., Harding, A. E. & Morgan-Hughes, J. A. Deletions of muscle mitochondrial DNA in patients with mitochondrial myopathies. Nature 331, 717–719 (1988).
Article ADS CAS PubMed Google Scholar
Wallace, D. C. et al. Mitochondrial DNA mutation associated with Leber’s hereditary optic neuropathy. Science 242, 1427–1430 (1988).
Article ADS CAS PubMed Google Scholar
Wallace, D. C. et al. Mitochondrial DNA mutations in human degenerative diseases and aging. Biochim. Biophys. Acta 1271, 141–151 (1995).
Article PubMed Google Scholar
Wallace, D. C. Mitochondrial DNA in aging and disease. Sci. Am. 277, 40–47 (1997).
Article CAS PubMed Google Scholar
Wallace, D. C. Mitochondrial DNA mutations in disease and aging. Environ. Mol. Mutagen. 51, 440–450 (2010).
Article CAS PubMed Google Scholar
Moraes, C. T., Schon, E. A., DiMauro, S. & Miranda, A. F. Heteroplasmy of mitochondrial genomes in clonal cultures from patients with Kearns-Sayre syndrome. Biochem. Biophys. Res. Commun. 160, 765–771 (1989).
Article CAS PubMed Google Scholar
Shoffner, J. M. et al. Myoclonic epilepsy and ragged-red fiber disease (MERRF) is associated with a mitochondrial DNA tRNA(Lys) mutation. Cell 61, 931–937 (1990).
Article CAS PubMed Google Scholar
Shoffner, J. M. et al. Spontaneous Kearns-Sayre/chronic external ophthalmoplegia plus syndrome associated with a mitochondrial DNA deletion: a slip-replication model and metabolic therapy. Proc. Natl Acad. Sci. USA 86, 7952–7956 (1989).
Article ADS CAS PubMed PubMed Central Google Scholar
Wallace, D. C. A mitochondrial paradigm of metabolic and degenerative diseases, aging, and cancer: a dawn for evolutionary medicine. Annu. Rev. Genet 39, 359–407 (2005).
Article CAS PubMed PubMed Central Google Scholar
Moraes, C. T., Atencio, D. P., Oca-Cossio, J. & Diaz, F. Techniques and pitfalls in the detection of pathogenic mitochondrial DNA mutations. J. Mol. Diagn. 5, 197–208 (2003).
Article CAS PubMed PubMed Central Google Scholar
Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).
Article ADS CAS PubMed Google Scholar
Belmonte, F. R. et al. Digital PCR methods improve detection sensitivity and measurement precision of low abundance mtDNA deletions. Sci. Rep. 6, 25186 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Hjelm, B. E. et al. Splice-Break: exploiting an RNA-seq splice junction algorithm to discover mitochondrial DNA deletion breakpoints and analyses of psychiatric disorders. Nucleic Acids Res. 47, e59 (2019).
Article CAS PubMed PubMed Central Google Scholar
Bosworth, C. M., Grandhi, S., Gould, M. P. & LaFramboise, T. Detection and quantification of mitochondrial DNA deletions from next-generation sequence data. BMC Bioinform. 18, 407 (2017).
Article Google Scholar
Basu, S. et al. Accurate mapping of mitochondrial DNA deletions and duplications using deep sequencing. PLoS Genet 16, e1009242 (2020).
Article CAS PubMed PubMed Central Google Scholar
Goudenège, D. et al. eKLIPse: a sensitive tool for the detection and quantification of mitochondrial DNA deletions from next-generation sequencing data. Genet Med. 21, 1407–1416 (2019).
Article PubMed Google Scholar
Lujan, S. A. et al. Ultrasensitive deletion detection links mitochondrial DNA replication, disease, and aging. Genome Biol. 21, 248 (2020).
Article CAS PubMed PubMed Central Google Scholar
Simchovitz, A. et al. A lncRNA survey finds increases in neuroprotective LINC-PINT in Parkinson’s disease substantia nigra. Aging Cell 19, e13115 (2020).
Article CAS PubMed PubMed Central Google Scholar
Nativio, R. et al. An integrated multi-omics approach identifies epigenetic alterations associated with Alzheimer’s disease. Nat. Genet 52, 1024–1035 (2020).
Article CAS PubMed PubMed Central Google Scholar
Zeppillo, T. et al. Functional impairment of cortical AMPA receptors in schizophrenia. Schizophr. Res. https://doi.org/10.1016/j.schres.2020.03.037 (2020).
Kim, S. & Webster, M. J. The stanley neuropathology consortium integrative database: a novel, web-based tool for exploring neuropathological markers in psychiatric disorders and the biological processes associated with abnormalities of those markers. Neuropsychopharmacology 35, 473–482 (2010).
Article CAS PubMed Google Scholar
Lavin, K. M. et al. Rehabilitative impact of exercise training on human skeletal muscle transcriptional programs in Parkinson’s disease. Front Physiol 11, 653 (2020).
Article PubMed PubMed Central Google Scholar
Tumasian, R. A. et al. Skeletal muscle transcriptome in healthy aging. Nat. Commun. 12, 2014 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Aguila, J. et al. Spatial RNA sequencing identifies robust markers of vulnerable and resistant human midbrain dopamine neurons and their expression in Parkinson’s disease. Front. Mol. Neurosci. 14, 699562 (2021).
Article CAS PubMed PubMed Central Google Scholar
Monzón-Sandoval, J. et al. Human-specific transcriptome of ventral and dorsal midbrain dopamine neurons. Ann. Neurol. 87, 853–868 (2020).
Article PubMed PubMed Central Google Scholar
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Article CAS PubMed PubMed Central Google Scholar
Enge, M. et al. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell 171, 321–330.e314 (2017).
Article CAS PubMed PubMed Central Google Scholar
Darmanis, S. et al. A survey of human brain transcriptome diversity at the single cell level. Proc. Natl Acad. Sci. USA 112, 7285–7290 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Consortium, G. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Article Google Scholar
Nissanka, N., Minczuk, M. & Moraes, C. T. Mechanisms of mitochondrial DNA deletion formation. Trends Genet. 35, 235–244 (2019).
Article CAS PubMed Google Scholar
Thyagarajan, B., Padua, R. A. & Campbell, C. Mammalian mitochondria possess homologous DNA recombination activity. J. Biol. Chem. 271, 27536–27543 (1996).
Article CAS PubMed Google Scholar
Lakshmipathy, U. & Campbell, C. Double strand break rejoining by mammalian mitochondrial extracts. Nucleic Acids Res. 27, 1198–1204 (1999).
Article CAS PubMed PubMed Central Google Scholar
Mita, S. et al. Recombination via flanking direct repeats is a major cause of large-scale deletions of human mitochondrial DNA. Nucleic Acids Res. 18, 561–567 (1990).
Article CAS PubMed PubMed Central Google Scholar
Damas, J., Carneiro, J., Amorim, A. & Pereira, F. MitoBreak: the mitochondrial DNA breakpoints database. Nucleic Acids Res. 42, D1261–D1268 (2014).
Article CAS PubMed Google Scholar
Schon, E. A. et al. A direct repeat is a hotspot for large-scale deletion of human mitochondrial DNA. Science 244, 346–349 (1989).
Article ADS CAS PubMed Google Scholar
Zeviani, M. et al. Deletions of mitochondrial DNA in Kearns-Sayre syndrome. Neurology 38, 1339–1346 (1988).
Article CAS PubMed Google Scholar
Moraes, C. T. et al. Mitochondrial DNA deletions in progressive external ophthalmoplegia and Kearns-Sayre syndrome. N Engl. J. Med. 320, 1293–1299 (1989).
Article CAS PubMed Google Scholar
Johns, D. R., Threlkeld, A. B., Miller, N. R. & Hurko, O. Multiple mitochondrial DNA deletions in myo-neuro-gastrointestinal encephalopathy syndrome. Am. J. Ophthalmol. 115, 108–109 (1993).
Article CAS PubMed Google Scholar
Nakai, A. et al. Diffuse leukodystrophy with a large-scale mitochondrial DNA deletion. Lancet 343, 1397–1398 (1994).
Article CAS PubMed Google Scholar
Moslemi, A. R., Melberg, A., Holme, E. & Oldfors, A. Clonal expansion of mitochondrial DNA with multiple deletions in autosomal dominant progressive external ophthalmoplegia. Ann. Neurol. 40, 707–713 (1996).
Article CAS PubMed Google Scholar
Paramasivam, A. et al. Novel mutation in C10orf2 associated with multiple mtDNA deletions, chronic progressive external ophthalmoplegia and premature aging. Mitochondrion 26, 81–85 (2016).
Article CAS PubMed Google Scholar
Ozawa, T. et al. Quantitative determination of deleted mitochondrial DNA relative to normal DNA in parkinsonian striatum by a kinetic PCR analysis. Biochem. Biophys. Res. Commun. 172, 483–489 (1990).
Article CAS PubMed Google Scholar
Wei, Y. H. Mitochondrial DNA alterations as ageing-associated molecular events. Mutat. Res. 275, 145–155 (1992).
Article CAS PubMed Google Scholar
Phillips, N. R., Simpkins, J. W. & Roby, R. K. Mitochondrial DNA deletions in Alzheimer’s brains: a review. Alzheimers Dement 10, 393–400 (2014).
Article PubMed Google Scholar
Yan, F., Powell, D. R., Curtis, D. J. & Wong, N. C. From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis. Genome Biol. 21, 22 (2020).
Article PubMed PubMed Central Google Scholar
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
Article ADS PubMed PubMed Central Google Scholar
Preissl, S. et al. Author Correction: Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat. Neurosci. 21, 1015 (2018).
Article CAS PubMed Google Scholar
Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116.e1120 (2020).
Article CAS PubMed PubMed Central Google Scholar
Rickner, H. D., Niu, S. Y. & Cheng, C. S. ATAC-seq assay with low mitochondrial DNA contamination from primary human CD4+ T lymphocytes. J. Vis. Exp. https://doi.org/10.3791/59120 (2019).
Li, X., Nair, A., Wang, S. & Wang, L. Quality control of RNA-seq experiments. Methods Mol. Biol. 1269, 137–146 (2015).
Article CAS PubMed Google Scholar
Alvarez, M. et al. Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM. Sci. Rep. 10, 11019 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 1237758 (2013).
Article PubMed PubMed Central Google Scholar
Stewart, J. B. & Chinnery, P. F. The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat. Rev. Genet. 16, 530–542 (2015).
Article CAS PubMed Google Scholar
Rollins, B. L. et al. Mitochondrial complex I deficiency in schizophrenia and bipolar disorder and medication influence. Mol. Neuropsychiatry 3, 157–169 (2018).
PubMed Google Scholar
Zhang, W., Cui, H. & Wong, L. J. Comprehensive one-step molecular analyses of mitochondrial genome by massively parallel sequencing. Clin. Chem. 58, 1322–1331 (2012).
Article CAS PubMed Google Scholar
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Article PubMed PubMed Central Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Article CAS PubMed PubMed Central Google Scholar
Martin, F. J. et al. Ensembl 2023. Nucleic Acids Res. 51, D933–D941 (2023).
Article CAS PubMed Google Scholar
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Article CAS PubMed Google Scholar
Zhou, L. et al. ggmsa: a visual exploration tool for multiple sequence alignment and associated data. Brief Bioinform. 23 https://doi.org/10.1093/bib/bbac222 (2022).
Andrews, R. M. et al. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999).
Article CAS PubMed Google Scholar
Hwang, Y. et al. Gene expression profiling by mRNA sequencing reveals increased expression of immune/inflammation-related genes in the hippocampus of individuals with schizophrenia. Transl. Psychiatry 3, e321 (2013).
Article CAS PubMed PubMed Central Google Scholar
Prakrithi, P., Juwayria, Jain, D., Malik, P. S. & Gupta, I. Caution towards spurious off-target signal in 10X Visium spatial transcriptomics assay from potential lncRNAs. Brief. Bioinform. 24, bbad031 (2023).
Google Scholar
Thompson, L. V. Oxidative stress, mitochondria and mtDNA-mutator mice. Exp. Gerontol. 41, 1220–1222 (2006).
Article CAS PubMed PubMed Central Google Scholar
Kujoth, G. C., Bradshaw, P. C., Haroon, S. & Prolla, T. A. The role of mitochondrial DNA mutations in mammalian aging. PLoS Genet 3, e24 (2007).
Article PubMed PubMed Central Google Scholar
Corral-Debrinski, M. et al. Mitochondrial DNA deletions in human brain: regional variability and increase with advanced age. Nat. Genet. 2, 324–329 (1992).
Article CAS PubMed Google Scholar
Cortopassi, G. A., Shibata, D., Soong, N. W. & Arnheim, N. A pattern of accumulation of a somatic deletion of mitochondrial DNA in aging human tissues. Proc. Natl Acad. Sci. USA 89, 7370–7374 (1992).
Article ADS CAS PubMed PubMed Central Google Scholar
Copeland, W. C. & Longley, M. J. Mitochondrial genome maintenance in health and disease. DNA Repair (Amst.) 19, 190–198 (2014).
Article CAS PubMed Google Scholar
Cline, S. D. Mitochondrial DNA damage and its consequences for mitochondrial gene expression. Biochim. Biophys. Acta 1819, 979–991 (2012).
Article CAS PubMed PubMed Central Google Scholar
Kim, H. R. et al. Mitochondrial DNA aberrations and pathophysiological implications in hematopoietic diseases, chronic inflammatory diseases, and cancers. Ann. Lab. Med. 35, 1–14 (2015).
Article PubMed Google Scholar
Harris, J. J. & Attwell, D. The energetics of CNS white matter. J. Neurosci. 32, 356–371 (2012).
Article CAS PubMed PubMed Central Google Scholar
Harris, K. D. & Shepherd, G. M. The neocortical circuit: themes and variations. Nat. Neurosci. 18, 170–181 (2015).
Article CAS PubMed PubMed Central Google Scholar
Slomovic, S., Laufer, D., Geiger, D. & Schuster, G. Polyadenylation and degradation of human mitochondrial RNA: the prokaryotic past leaves its mark. Mol. Cell Biol. 25, 6427–6435 (2005).
Article CAS PubMed PubMed Central Google Scholar
Nagaike, T., Suzuki, T. & Ueda, T. Polyadenylation in mammalian mitochondria: insights from recent studies. Biochim. Biophys. Acta. 1779, 266–269 (2008).
Article CAS PubMed Google Scholar
Shepard, P. J. et al. Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq. RNA 17, 761–772 (2011).
Article CAS PubMed PubMed Central Google Scholar
Bender, A. et al. High levels of mitochondrial DNA deletions in substantia nigra neurons in aging and Parkinson disease. Nat. Genet. 38, 515–517 (2006).
Article CAS PubMed Google Scholar
Kraytsberg, Y. et al. Mitochondrial DNA deletions are abundant and cause functional impairment in aged human substantia nigra neurons. Nat. Genet. 38, 518–520 (2006).
Article CAS PubMed Google Scholar
Dölle, C. et al. Defective mitochondrial DNA homeostasis in the substantia nigra in Parkinson disease. Nat. Commun. 7, 13548 (2016).
Article ADS PubMed PubMed Central Google Scholar
Reeve, A. K. et al. Nature of mitochondrial DNA deletions in substantia nigra neurons. Am. J. Hum. Genet. 82, 228–235 (2008).
Article CAS PubMed PubMed Central Google Scholar
Mamdani, F., Rollins, B., Morgan, L., Sequeira, P. A. & Vawter, M. P. The somatic common deletion in mitochondrial DNA is decreased in schizophrenia. Schizophr. Res. 159, 370–375 (2014).
Article PubMed PubMed Central Google Scholar
Laughlin, S. B. & Sejnowski, T. J. Communication in neuronal networks. Science 301, 1870–1874 (2003).
Article ADS CAS PubMed PubMed Central Google Scholar
Yu, Y., Herman, P., Rothman, D. L., Agarwal, D. & Hyder, F. Evaluating the gray and white matter energy budgets of human brain function. J Cereb. Blood Flow Metab. 38, 1339–1353 (2018).
Article PubMed Google Scholar
Campbell, G. R. et al. Mitochondrial DNA deletions and neurodegeneration in multiple sclerosis. Ann. Neurol. 69, 481–492 (2011).
Article CAS PubMed Google Scholar
Lott, M. T. et al. mtDNA variation and analysis using mitomap and mitomaster. Curr. Protoc. Bioinform. 44, 1.23.21–26 (2013).
Article Google Scholar
Omidsalar, A. A. et al. (Zenodo, https://doi.org/10.5281/zenodo.10499375).
Hjelm, B. E. et al. (Zenodo, https://doi.org/10.5281/zenodo.10499097).

Download references

Acknowledgements

Funding support for A.A.O. was provided by the USC Department of Translational Genomics, the Anenberg Center for Parkinson’s Disease and Other Neurodegenerative Disorders (USC), and NINDS U01 NS120260. B.E.H. was supported by the USC Department of Translational Genomics. Funding for paired RNA and DNA samples was provided by NIMH R21 MH113177 and NIMH R01 MH085801, respectively. Funding for spatial transcriptomics experiments was provided by the USC Department of Translational Genomics and 10x Genomics. Brain tissue used for paired RNA and DNA analyses was collected at the University of California Irvine Brain Bank and funded by philanthropic gifts from the William Lyon Penzner Foundation, the Della Martin Foundation, and the Pritzker Neuropsychiatric Disorders Research Consortium. Brain tissue used for spatial transcriptomics and bulk RNA-Seq was collected through the BSHRI Brain and Body Donation Program, which has been supported by NINDS (U24 NS072026 National Brain and Tissue Resource for Parkinson’s Disease and Related Disorders), NIA (P30 AG19610 and P30AG072980, Arizona Alzheimer’s Disease Center), the Arizona Department of Health Services (contract 211002, Arizona Alzheimer’s Research Center), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05-901 and 1001 to the Arizona Parkinson’s Disease Consortium) and the Michael J. Fox Foundation for Parkinson’s Research. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Mitochondrial DNA data used for this analysis are available in: dbGaP accession number phs002395.v1.p1. The GTEx data used for the analyses described in this manuscript were obtained from: dbGaP accession number phs000424.vN.pN on 07/17/2023. We would like to thank the laboratory of Leonardo Morsut (USC) for the use of their microscope. We would also like to thank the authors of the various human RNA-Seq studies analyzed here for making their data publicly available to the research community. In addition, we are always greatly appreciative to all the patients and tissue donors, as well as their families, without whom this work would not be possible.

Author information

Authors and Affiliations

Department of Translational Genomics, Keck School of Medicine of USC, Los Angeles, CA, USA
Audrey A. Omidsalar, Carmel G. McCullough, Lili Xu, Stanley Boedijono, Daniel Gerke, Michelle G. Webb, Zarko Manojlovic & Brooke E. Hjelm
Department of Psychiatry and Human Behavior, University of California - Irvine (UCI) School of Medicine, Irvine, CA, USA
Adolfo Sequeira & Marquis P. Vawter
Department of Neurology, Keck School of Medicine of USC, Los Angeles, CA, USA
Mark F. Lew
Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine of USC, Los Angeles, CA, USA
Marco Santorelli
Banner Sun Health Research Institute (BSHRI), Sun City, AZ, USA
Geidy E. Serrano & Thomas G. Beach
Mitchell Center for Neurodegenerative Diseases, Department of Neurology, School of Medicine, University of Texas Medical Branch, Galveston, TX, USA
Agenor Limon

Authors

Audrey A. Omidsalar
View author publications
You can also search for this author in PubMed Google Scholar
Carmel G. McCullough
View author publications
You can also search for this author in PubMed Google Scholar
Lili Xu
View author publications
You can also search for this author in PubMed Google Scholar
Stanley Boedijono
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Gerke
View author publications
You can also search for this author in PubMed Google Scholar
Michelle G. Webb
View author publications
You can also search for this author in PubMed Google Scholar
Zarko Manojlovic
View author publications
You can also search for this author in PubMed Google Scholar
Adolfo Sequeira
View author publications
You can also search for this author in PubMed Google Scholar
Mark F. Lew
View author publications
You can also search for this author in PubMed Google Scholar
Marco Santorelli
View author publications
You can also search for this author in PubMed Google Scholar
Geidy E. Serrano
View author publications
You can also search for this author in PubMed Google Scholar
Thomas G. Beach
View author publications
You can also search for this author in PubMed Google Scholar
Agenor Limon
View author publications
You can also search for this author in PubMed Google Scholar
Marquis P. Vawter
View author publications
You can also search for this author in PubMed Google Scholar
Brooke E. Hjelm
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

AAO and BEH wrote the manuscript; AAO and SB performed the data analysis; AAO generated the figures; BEH, CGM, DG, and ZM performed the spatial transcriptomics and library preparations; AAO, LX, and MGW wrote the code; AS, MFL, MS, GES, TGB, AL, and MPV provided materials and/or tissue.

Corresponding author

Correspondence to Brooke E. Hjelm.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Caleb Lareau, Darren Walsh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Debarka Sengupta and George Inglis.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Omidsalar, A.A., McCullough, C.G., Xu, L. et al. Common mitochondrial deletions in RNA-Seq: evaluation of bulk, single-cell, and spatial transcriptomic datasets. Commun Biol 7, 200 (2024). https://doi.org/10.1038/s42003-024-05877-4

Download citation

Received: 15 March 2023
Accepted: 31 January 2024
Published: 17 February 2024
DOI: https://doi.org/10.1038/s42003-024-05877-4

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.