Reducing mitochondrial reads in ATAC-seq using CRISPR/Cas9

ATAC-seq is a high-throughput sequencing technique that identifies open chromatin. Depending on the cell type, ATAC-seq samples may contain ~20–80% of mitochondrial sequencing reads. As the regions of open chromatin of interest are usually located in the nuclear genome, mitochondrial reads are typically discarded from the analysis. We tested two approaches to decrease wasted sequencing in ATAC-seq libraries generated from lymphoblastoid cell lines: targeted cleavage of mitochondrial DNA fragments using CRISPR technology and removal of detergent from the cell lysis buffer. We analyzed the effects of these treatments on the number of usable (unique, non-mitochondrial) reads and the number and quality of peaks called, including peaks identified in enhancers and transcription start sites. Both treatments resulted in considerable reduction of mitochondrial reads (1.7 and 3-fold, respectively). The removal of detergent, however, resulted in increased background and fewer peaks. The highest number of peaks and highest quality data was obtained by preparing samples with the original ATAC-seq protocol (using detergent) and treating them with CRISPR. This strategy reduced the amount of sequencing required to call a high number of peaks, which could lead to cost reduction when performing ATAC-seq on large numbers of samples and in cell types that contain a large amount of mitochondria.


Results
The anti-mt CRISPR treatment consisted of 100 gRNAs targeting the human mitochondrial genome at regular intervals, which is usually densely covered by ATAC-seq reads generated from LCLs, as shown in Fig. 1. The rationale was to cleave targeted DNA fragments in the sequencing library, rendering them unable to bind and amplify on the Illumina HiSeq flow cell. Similarly to Gu et al. 2 and Wu et al. 3 , we chose to treat the final (PCR amplified) sequencing library with the gRNA/Cas9 mix instead of the unamplified tagmented DNA because of the small amount of DNA present in the sample at this earlier step. Although treating the samples before PCR amplification might result in lower fractions of mitochondria, we chose the conservative approach of treating larger amounts of DNA to reduce technical variability.
To develop and analyze the anti-mt CRISPR treatment, we used 50,000 human LCL cells per sample and generated a total of 27 pairs of ATAC-seq libraries for Illumina high-throughput sequencing according to the protocol of Buenrostro et al. 6 (Array Express accession number E-MTAB-5205 and Supplemental File 2). We split each of the 27 libraries into two equal parts, leaving one half untreated and treating the other half with 100 mitochondrial gRNAs and Cas9. Due to the single turn-over nature of Cas9, Gu et al. 2 used an excess of enzyme and of gRNA to deplete mitochondrial ribosomal DNA. Based on this notion, we used 100X Cas9 and 100X gRNA excess. We assumed 50% mtDNA fragments in the PCR-amplified ATAC-seq library to calculate exact amounts to be used in the treatment (see Methods).
We obtained between 9.8 M and 108.6 M reads per sample in four batches of experiments. Because different numbers of reads were sequenced from each sample due to imprecision in DNA quantification and the number of multiplexed samples, we randomly sampled a fixed number of sequenced reads from each library in order to compare across samples. This approach allowed us to assess which library preparation method yielded the best results regardless of how it affected the number of aligned or usable (unique, non-mitochondrial) reads.
After aligning reads to the human genome, we removed mitochondrial reads and reads aligned to identical coordinates and called peaks using HOMER 7 and MACS2 8 . Qualitatively similar results were obtained with both peak callers at three read depths (9.8 M (54 samples), 17 M (52 samples) and 21.9 M (47 samples)) and using different parameters to call peaks (Supplemental File 1). The results reported in the figures were obtained with MACS2 using custom parameters and 21.9 M sequenced reads. Results for all other read depths and parameters are presented in Supplemental Figs S1 and S2 and Supplemental Tables S1 and S2 in Supplemental File 1. Figure 2 shows the comparison between 14 ATAC-seq samples before and after treatment with anti-mt CRISPR, using the original ATAC-seq protocol that includes detergent (DT). Visual inspection of the data showed that the untreated and treated samples were similar (Fig. 2a), indicating that the treatment did not damage the samples. As expected, the anti-mt CRISPR treatment resulted in depletion of mitochondrial reads, while the number of reads in the nuclear genome increased (Fig. 2b).
At the same sequencing depth, the anti-mt CRISPR-treated samples yielded considerably less mitochondrial reads (Fig. 2c). This result is similar to the level of reduction of ~50% reported by Wu et al. 3 . Consequently, more usable reads (non-mitochondrial reads with unique coordinates), were generated ( Fig. 2d and Supplementary  Fig. S1). The increased number of usable reads resulted in 50% more peaks in the treated halves of all samples ( Fig. 2e and Supplementary Fig. S2), demonstrating the importance of removing excess mitochondrial reads from ATAC-seq samples.
One concern when treating samples with CRISPR/Cas9 was whether off-target gRNA/Cas9 activity would affect the data to a significant extent. To address this issue, we compared the percentage of peaks common across replicates (1 bp overlap) and across anti-mt CRISPR-treated and untreated samples. Figure 2f shows that the degree of overlap between untreated and treated samples was not smaller than the degree of overlap between replicates of the same condition, indicating that the anti-mt CRISPR treatment did not cause loss of peaks or create artefactual peaks. This observation is in accordance with a previous report that CRISPR treatment of sequencing libraries did not modify the read enrichment pattern 3 . We also found evidence that the anti-mt CRISPR-treated samples identified more transcription start sites and enhancers than untreated samples (see below), indicating that mtDNA cleavage did not negatively affect the data. Analysis of samples normalized by the number of usable reads instead of total number of reads sequenced (see below), corroborates the idea that the anti-mt CRISPR does not damage ATAC-seq samples.
Effect of removing detergent from the original ATAC-seq protocol. Another method that has been used to reduce the fraction of mitochondrial reads is the removal of detergent from the cell lysis step of the ATAC-seq protocol 5 . We generated seven ATAC-seq samples with no detergent (ND) and observed several differences compared to the original protocol with detergent (DT). Interestingly, the fraction of unique reads was considerably higher in ND samples compared to DT samples (56.5% vs. 32.5%, respectively), which could reflect a lower fraction of mitochondrial fragments before PCR amplification of the sequencing library. In addition, the fraction of reads uniquely aligned to the genome was slightly higher in ND samples, compared to samples prepared with the original protocol (83.6% vs. 74.6%, respectively). This difference is due to discarding reads that map to both mitochondrial and nuclear genomes (6% of ND reads and 17% of DT reads) in order to retain only uniquely aligned reads. Because we started our analyses with the same number of sequenced reads, these differences in mappability were accounted for in our comparisons. Figure 3 shows that the removal of detergent had a pronounced depletive effect on mitochondrial reads compared to untreated DT libraries (Fig. 3a) and consequently increased the fraction of usable reads (Fig. 3b). Despite this 2.4-fold increase of ND usable sequences (0.45/0.19), the number of peaks called was higher by only 1.04-fold compared to untreated DT samples (Fig. 3c, 26,651/25,664). The mean fold-difference using other parameters to call peaks and read depths was higher at 1.2-fold, but still substantially lower than the increase in usable reads (Supplemental Table S1). This difference could be due to the increased background in ND samples ( Fig. 3d and 3e), as suggested previously 5 . We considered background reads as reads that were not mitochondrial and were not in any ATAC-seq peak identified in any DT or ND sample. The lower signal/noise ratio in ND samples ( Fig. 3d) provides an explanation for why fewer peaks were identified in ND samples. Thus, although removing detergent from the lysis buffer increased the overall number of non-mitochondrial reads, the background read coverage also increased, resulting in fewer peaks called at the same sequencing depth compared to the original protocol.
Treating the ND samples with anti-mt CRISPR, i.e. combining anti-mt CRISPR treatment with the detergent-free lysis buffer, led to a 3.1-fold decrease in the fraction of mitochondrial reads compared to untreated ND samples (Fig. 3a, 0.16/0.05). However, unlike DT samples, the fraction of unique, non-mitochondrial reads increased only slightly (Fig. 3b, median fold-change: 1.1), probably because the fraction of mitochondrial reads was already small. Additionally, the effect of the anti-mt CRISPR treatment was inconsistent, with three samples showing a decrease in the fraction of usable reads and four showing an increase (Fig. 3b, dashed lines). When calling peaks in anti-mt CRISPR-treated ND samples, this inconsistency was also observed in some of the comparisons performed with different peak calling parameters and read depths, with some of the samples showing an increase in the number of peaks over their untreated counterparts, while other samples had the opposite effect (Supplemental Fig. S1).
When comparing the effect of the anti-mt CRISPR treatment between ND and DT samples, the former underperformed DT samples in terms of peaks called by 1.3-fold fewer peaks (median of 30,182 vs. 40,682, respectively). Therefore, combining the anti-mt CRISPR treatment with removal of detergent from the lysis buffer did not provide substantial gains over the original protocol with detergent that was treated with anti-mt CRISPR.
In addition to the number of peaks called in the different treatments, we evaluated the quality of peaks using two other parameters: (i) the fraction of Gencode 9 transcription start sites (TSS's) (Fig. 4a) and (ii) the fraction of Epigenome Roadmap 10 annotated enhancers overlapping peaks ( Fig. 4b and Supplemental Fig. S2). The highest fraction of TSS's and enhancers was identified in samples treated with anti-mt CRISPR, regardless of whether they were generated with or without detergent. Whereas both anti-mt CRISPR-treated DT and ND samples identified similar numbers of TSS's, enhancers were identified at a higher rate using detergent in conjunction with the anti-mt CRISPR treatment. This difference could be explained by the notion that chromatin tends to be more open in promoters to allow transcription, while enhancers, due to their dynamic nature, would be less accessible. In this scenario, finding enhancers requires lower background and higher quality data, which we have shown is best represented by the detergent anti-mt CRISPR samples.
To further investigate differences caused by the anti-mt CRISPR treatment and by removing detergent, we normalized samples by the number of usable reads (Fig. 5), instead of total sequenced reads (Figs 2, 3 and 4). Figure 5 shows that at 6.2 M usable reads, ND samples clearly underperformed DT samples. It also shows that the anti-mt CRISPR treatment removed mitochondrial reads without altering the samples in other ways, since the number of peaks identified, fraction of reads in peaks, fraction of TSS and enhancers identified is the same. Notice that 34 M reads from DT samples (median) were necessary to obtain 6.2 M usable reads, while only 17 M reads from DT samples treated with anti-mt CRISPR were necessary to obtain the same number.
We conclude that reducing mitochondrial reads by cleavage of DNA sequencing fragments using an anti-mt CRISPR strategy yielded the best results in terms of numbers of peaks identified and their quality at the same sequencing depth.
Other treatments. Given the success of using CRISPR/Cas9 to reduce the amount of mitochondrial reads, we tested modifications of the treatment to enhance the degree of depletion of mitochondrial reads (Fig. 6). We tested (i) a longer Cas9 incubation of 2 hours instead of 1 hour, (ii) addition of Cas9 for an additional 1 hour after the initial 1 hour treatment (Cas9 boost) and (iii) adding 40X, 200X and 400X Cas9 instead of 100X. None of the treatments led to enhanced depletion of mitochondrial reads and, intriguingly, the 200X and 400X Cas9 treatments performed poorer than the 100X treatment (Fig. 6).
We did not test a larger number of gRNAs targeting the mitochondrial genome, but it is likely that using 200 gRNAs instead of 100, for example, could further reduce the fraction of mitochondrial reads. However, as guide RNAs are priced per unit, the cost of the treatment increases linearly with the number of targets.

Discussion
Our CRISPR/Cas9 treatment targeting 100 loci of the human mitochondrial chromosome successfully reduced the number of mitochondrial reads in LCLs by 1.7-fold, similarly to Wu et al. 3 , and increased the number of usable reads by 1.6-fold. Consequently, at the same read depth, samples generated with the original ATAC-seq protocol (DT) and treated with CRISPR/Cas9 and anti-mt gRNAs resulted in 1.6-fold more peaks than the untreated Removing detergent from the cell lysis step (ND) resulted in even lower number of mitochondrial reads (3.1-fold), but the peaks called were fewer and of lower quality. While the anti-mt CRISPR treatment improved ND samples, resulting in increased number of peaks called, it did not improve over DT samples. We observed more variability in treated ND samples than DT samples, as well as higher background, lower number of peaks and lower overlap with LCL enhancers.
In conclusion, our data show that treating samples prepared using detergent with gRNAs/Cas9 targeting mtDNA was the best way to reduce mtDNA contamination in LCLs, increase the number of peaks, and improve identification of features such as TSS's and enhancers.
Given the cost of gRNA oligos, sacrificing sequencing reads may be more economical than depleting mitochondrial reads if only a few samples are generated. In Supplemental File S3 we provide a cost calculator based on the numbers obtained in this study and the cost of one lane of sequencing at the University of Chicago Functional Genomic Core Facility. As we have not tested the anti-mt CRISPR treatment and detergent removal in other cell types and cell lines, it is possible that different results may be obtained in other systems, which will affect the cost.
Caution should be taken when multiplexing anti-mt CRISPR-treated samples with samples that have not been treated. Treated samples will yield fewer sequencing reads unless a higher library concentration is used relative to other untreated samples. This is because the cleaved mitochondrial fragments will remain in the library but will not be sequenced since they cannot be amplified by bridge amplification. Sequencing a full lane of samples treated the same way does not require any adjustments.
During the execution of this project, an improved ATAC-seq method was published, termed Fast-ATAC 11 , which uses a milder detergent in the cell lysis buffer. This treatment was reported to decrease the fraction of mitochondrial reads from 50% to 11%, while increasing the enrichment of reads in peaks over background and yielding more fragments per cell. The authors noted that cells that are more resistant to lysis may require a stronger detergent, i.e., the original ATAC-seq protocol, in which case, using the CRISPR treatment we analyzed here will remain useful. Since the cost of gRNAs is fixed and can be distributed among multiple laboratories, reducing mtDNA contamination using an anti-mt CRISPR treatment could still lead to significant savings if large numbers of samples are generated.

Human lymphoblastoid cell line growth and harvesting. Human lymphoblastoid cell line NA19193
was obtained from Coriell Cell Repository. Cells were grown in RPMI 1640 medium lacking L-Glutamine (Corning), supplemented with 15% fetal bovine serum, 1% GlutaMAX (ThermoFisher) and 1% penicillin-streptomycin solution (ThermoFisher) at a density of 0.5 × 10 6 to 1.0 × 10 6 cells/mL. Cells were passaged every 2-3 days to maintain this density. Cells were harvested for ATAC-seq by centrifugation at 500 x g for 5 minutes at 4 °C and resuspended in PBS. Cells were counted using a hemocytometer and 50,000 cells were immediately placed into a 1.5 mL Eppendorf tube for ATAC-seq.
Preparation of ATAC-seq libraries. ATAC-seq libraries were generated according to the protocol of Buenrostro et al. 6 with minor changes. Instead of NEB Next High-Fidelity 2X PCR Master Mix, we used Q5 Hot Start High-Fidelity 2X Master Mix (New England Biolabs). Following PCR-amplification, instead of using a column to clean the reaction, we used a 0.8X Ampure bead purification and eluted the library with 20 μL nuclease-free water. For the ND samples, Igepal-CA630 was removed from the lysis buffer and replaced with water. One microliter of the library was used to run a high sensitivity Bioanalyzer to determine fragment size distribution and concentration. Anti-mitochondrial CRISPR/Cas9 treatment. To deplete ATAC-seq libraries of DNA fragments derived from the human (hg38) mitochondrial genome, 100 high-quality guide RNAs that specifically targeted the mitochondrial genome roughly every 250 base pairs were chosen using the gRNA design tool at http://crispr.mit.edu (full list of guide sequences is in Supplemental File 2). Full-length guide RNAs were designed according to Gu et al. 2 and generated from single-stranded oligo templates (Integrated DNA Technologies) according to Linet al. 12

. Briefly, each ol i go c ons i s t e d of t h e s e qu e n c e 5 ′ -TA AT AC G A C T C A C T AT AG ( N 2 0 ) G T T TA AG AG C T-AT G C T G G A A A C A G C AT A G C A A G T T TA A A TA A G G C TA G T C C G T TA T C A A C T T G A A
AAAGTGGCACCGAGTCGGTGCTTTTTTT-3′ w he re N 20 corresponds to the 20 nucleotide guide RNA seed sequence. The PAM sequence would occur at the 3′ end of the N 20 sequence. Oligos were purchased as a 200 picomole plate from Integrated DNA Technologies and received as a lyophilized pool. They were resuspended in 1 mL TE 1.0 buffer (10 mM Tris-HCl, 0.1 mM EDTA). A Nanodrop was used to determine the concentration of the oligos and 8 ng were used as template for PCR to make them double-stranded. The PCR reaction consisted of 4 μL 5X HF Buffer (New England Biolabs), 0.4 μL 10 mM dNTPs, 1 μL of each 10 μM primer (For: 5′-TAATACGACTCACTATAG, Rev: 5′-AAAAAAAGCACCGACTCGGTGC), 0.2 μL Phusion High-Fidelity DNA Polymerase (New England Biolabs) and nuclease-free water to a final volume of 20 μL. Thermocycler conditions were 98 °C for 30 s, followed by 30 cycles of 98 °C for 10 s, 56 °C for 10 s, 72 °C for 10 s, and then a final extension of 72 °C for 5 minutes. The reaction was cleaned using a Qiagen MinElute Purification kit and eluted in 10 μL of nuclease-free water. Enough PCR reactions were performed to obtain 1 μg of double-stranded template (should be in a volume of less than 8 μL). Transcription was carried out on 1 μg of template using the MEGAshortscript T7 Transcription kit (Thermo Fisher) following manufacturer's instructions and then cleaned with the MEGAclear Transcription Clean-Up kit (Thermo Fisher). gRNAs were eluted from the column with 50 μL of RNase-free water and the concentration was determined using a Nanodrop, aliquoted and stored at −80 °C.
We estimated that half of the DNA in each library was of mitochondrial origin, thus a 20 nM ATAC-seq library contained a mtDNA target concentration of 10 nM. Based on this value, 40, 100, 200, or 400 molar excess of Cas9 enzyme was used (New England Biolabs #M0386M) along with 100 molar excess gRNAs in a 30 μL reaction. The reaction was set up according to the protocol for Cas9 from S. pyogenes (New England Biolabs #M0386M). Briefly, the appropriate amounts of Cas9 enzyme and gRNAs were mixed with 3 μL of 10X Cas9 Buffer and water to a final volume of 22 μL. This was incubated at 25 °C for 10 minutes and then 8 μL of the ATAC-seq library was added and the reaction was incubated at 37 °C for one hour. For the two-hour treatment, the incubation was extended an additional hour; for the "Cas9 boost" treatment, the same amount of Cas9 enzyme was added after 1 hour of incubation and left for an additional hour. Reactions were subsequently treated with 1 μL of 20 mg/mL proteinase K for 15 minutes and purified using a Qiagen MinElute kit followed by elution in 10 μL nuclease-free water. Treated libraries were run on a high sensitivity Bioanalyzer chip to assess fragment size distribution and concentration (Supplemental Fig. S4). Because the multiplexing barcodes are added before treatment, for each batch of experiments, samples were sequenced on two lanes of an Illumina Hi-Seq 4000 instrument, separating anti-mt-CRISPR untreated and treated samples.
Mean fraction of common peaks and mean Pearson's R 2 of read counts. We ranked peaks called by MACS2 by -log10(qvalue) and used bedtools intersect to count the number of peaks common between the top 20,000 peaks of each sample. The fraction presented in Fig. 2f is the mean fraction of peaks common between samples of a given group (e.g. treated vs. treated). To calculate the degree of similarity of read counts in ATAC-seq peaks, we merged all peaks from all samples and counted the number of reads in each peak in each sample. We then calculated the R 2 of the read counts per peak in pairs of samples in each group (e.g. treated vs. treated) and obtained the mean per group presented in Fig. 2f.

Statistical tests.
Due to the small number of replicates, we chose the more conservative Wilcoxon rank sum test to compare treatments in the boxplots shown (R statistical package version 3.3.1) 16 . Student t-tests were in agreement with the results presented, yielding smaller P-values. Paired tests were used to compare treated/ untreated pairs and unpaired tests were used to compare samples prepared with and without detergent. One-sided P-values are presented, since we were interested in specific directions of change. Two-tailed P-values do not change our conclusions. Fold-differences of DT samples were calculated pairwise and the median was reported. Fold-differences of DT versus ND samples were calculated on the median of each group. Ethics Statement. The NA19193 cell line was purchased from Coriell Cell Repository. The original samples were collected by the HapMap project in between 2001-2005. All of the samples were collected with extensive community engagement, including discussions with members of the donor communities about the ethical and social implications of human genetic variation research. Donors gave broad consent to future uses of the samples, including their use for extensive genotyping and sequencing, gene expression and proteomics studies, and all other types of genetic variation research, with the data publicly released.
All methods were carried out in accordance with known guidelines and regulations.