Systematic alteration of ATAC-seq for profiling open chromatin in cryopreserved nuclei preparations from livestock tissues

The use of Assay for Transposase-Accessible Chromatin (ATAC-seq) to profile chromatin accessibility has surged over the past years, but its applicability to tissues has been very limited. With the intent of preserving nuclear architecture during long-term storage, cryopreserved nuclei preparations from chicken lung were used to optimize ATAC-seq. Sequencing data were compared with existing DNase-seq, ChIP-seq, and RNA-seq data to evaluate library quality, ultimately resulting in a modified ATAC-seq method capable of generating high quality chromatin accessibility data from cryopreserved nuclei preparations. Using this method, nucleosome-free regions (NFR) identified in chicken lung overlapped half of DNase-I hypersensitive sites, coincided with active histone modifications, and specifically marked actively expressed genes. Notably, sequencing only the subnucleosomal fraction dramatically improved signal, while separation of subnucleosomal reads post-sequencing did not improve signal or peak calling. The broader applicability of this modified ATAC-seq technique was tested using cryopreserved nuclei preparations from pig tissues, resulting in NFR that were highly consistent among biological replicates. Furthermore, tissue-specific NFR were enriched for binding motifs of transcription factors related to tissue-specific functions, and marked genes functionally enriched for tissue-specific processes. Overall, these results provide insights into the optimization of ATAC-seq and a platform for profiling open chromatin in animal tissues.


Results
First iteration of ATAC-seq from cryopreserved nuclei preparations yields poor signal. Fresh lung tissue collected from adult inbred roosters was homogenized, filtered to remove debris, supplemented with cryoprotectant and slow-frozen. This method is intended to isolate nuclei from tissues for cryopreservation, before treatment with DNase-I 12 . Following a similar methodology for use with DNase-seq, cryopreserved nuclei preparations were thawed on ice, pelleted, resuspended in sucrose buffer, filtered, pelleted again, washed, and resuspended in PBS for counting. At this point, the original ATAC-seq protocol 8 was followed without modification. Briefly, 50,000 nuclei were counted, pelleted, and incubated for 30 minutes with hyperactive Tn5 transposase. Transposed DNA was then amplified by PCR (with 12 total cycles determined as optimal by qPCR) and inspected for fragment length distribution on a Bioanalyzer High Sensitivity DNA Chip (Agilent Genomics). This revealed the expected nucleosomal laddering pattern, with subnucleosomal, mononucleosomal, and dinucleosomal fragments enriched at 200, 350, and 550 bp, respectively ( Fig. 2; Library 1-a-50K). Taking this as an indication that the genomic DNA had been successfully tagmented, the library was submitted for sequencing, resulting in over 137 million uniquely mapped monoclonal non-mitochondrial reads (Table 1). However, at a depth of 40 million reads, only 6,664 regions showed significant enrichment, and only 6% of DHS were detected (Figs. 2 and 3a).
Data were generally noisy, with reads aligning broadly across the genome rather than accumulating at DHS. The Fraction of Reads in Peaks (FRiP) was used to measure the signal-to-noise ratio, since reads falling within regions of enrichment are informative of open chromatin, whereas those falling outside those regions are background noise. In this case, the FRiP score was 0.06, well beneath the minimum 0.2 standard set by ENCODE for ATAC-seq libraries (Fig. 3b). This uniform distribution clearly contradicted the nucleosomal laddering pattern that was observed prior to sequencing, which suggested that tagmentation had resulted in enrichment for fragment lengths that should have specifically corresponded to either open chromatin, or DNA wrapped around one or more nucleosomes. It was speculated that the poor library quality of 1-a-50K could have resulted from either (1) degraded cells in the nuclei preparations, whose free-floating genomic DNA was contributing the bulk of ATAC-seq reads, or (2) under-tagmentation, indicating that either the incubation period with the transposase was too short, or the number of cells used in the reaction too high.  Figure 1. ATAC-seq optimization process for compatibility with cryopreserved nuclei preparations. Variations to the ATAC-seq protocol are labeled chronologically, with the first iteration (1) shown in orange, the second iteration (2) in yellow, the second iteration with a size-selection step (2S) also in yellow, and the third iteration (3) in green.
www.nature.com/scientificreports www.nature.com/scientificreports/ Varying nuclei input does not substantially improve ATAC-seq library quality. To exclude cell debris from the tagmentation reaction, intact nuclei were isolated from cryopreserved nuclei preparations using a sucrose cushion and ultracentrifugation. Preparations were thawed on ice, carefully layered on top of a sucrose gradient and pelleted. Nuclei were then counted and ATAC-seq libraries were prepared as previously. All libraries were PCR amplified for a total of 10 cycles -two less than the optimal number of PCR cycles for 1-a-50K -to  minimize PCR bias. The number of nuclei per transposition reaction was varied from 50,000 to 500,000 to determine if over-or under-transposition was affecting the signal-to-noise ratio after sequencing. The rate of duplication among mapped reads remained below 20% (Table 1), indicating that PCR bias remained below the level that was introduced when PCR cycling was initially optimized. After randomly subsampling all libraries to a depth of 40 million reads for peak calling, 15% of DHS were detected on average by ATAC-seq peaks ( Fig. 2 and Supplementary Fig. 1; Protocol Iteration 2), with an average FRiP score of 0.06 (Fig. 3b). Detection of open chromatin at transcription start sites (TSS) was also improved. Open chromatin identified by 1-a-50k only marked 1.7% of promoters, whereas open chromatin identified by libraries from the second ATAC-seq iteration marked 12% of promoters, on average ( Fig. 3c), with preference for actively expressed genes (Fig. 4). While these libraries demonstrated slight improvements over 1-a-50K, they were still not of sufficient quality. Detection of DHS remained below the 46% detected by the original ATAC-seq protocol in a cell line 8 , and signal was still far beneath the 0.2 minimum FRiP score required by ENCODE.
As observed in the first ATAC-seq iteration (1-a-50K), the signal-to-noise ratio remained too low to reliably distinguish peaks in regions with DHS, despite clear nucleosomal laddering patterns prior to sequencing. In addition, isolation of subnucleosomal length alignments (observed template length <150 bp) for peak calling did not improve signal or specificity of reads to DHS. Linear regression analysis of all ATAC-seq libraries from the second protocol iteration showed that higher FRiP scores were correlated with improved detection of DHS (R 2 = 0.83, p = 0.01), indicating that a high signal-to-noise ratio is critical to library quality. Increasing the number of nuclei used in the transposition reaction did not significantly impact number of peaks called (R 2 = 0.55, p = 0.09), but appeared to somewhat improve the FRiP score (R 2 = 0.74, p = 0.03) and DHS detection (R 2 = 0.82, p = 0.01). Despite issues with low signal, 7,546 peaks (61%) were shared between technical replicates (libraries 2-c-500K-1 and 2-c-500K-2), and 6,841 peaks (55%) were shared between biological replicates (libraries 2-b-500K and 2-c-500K-2).
Overall, it was concluded that (1) nucleosomal laddering was not a reliable indicator of library quality, (2) isolating nuclei by sucrose gradient did not solve the issue of low signal, (3) isolating subnucleosomal length reads post-sequencing also did not solve the issue of low signal, and (4) increasing nuclear input could somewhat improve signal quality and DHS detection.

Sequencing only the subnucleosomal fraction substantially improves ATAC-seq signal.
Pursuing the possibility that there was some low-level enrichment of subnucleosomal reads at TSS, a library with sufficient material for re-sequencing (2-c-200K) was size-selected for only subnucleosomal length fragments (150-250 bp) using the PippinHT system, and then re-sequenced on the NextSeq platform to generate 40 bp paired-end reads. This substantially improved the FRiP score from 0.06 to 0.16 (Fig. 3b), quadrupled the number of peaks called, and nearly doubled the detected DHS from 16% to 31% ( Fig. 2; library 2S-c-200K). Clearly, size-selection of libraries for subnucleosomal length fragments substantially improved library quality.
Frac on of reads in peaks Promoter Distal Not in Peaks  Desirable values are shaded darker, with the scale starting at 0 (white) and ending at the maximum value. For the percent mitochondrial reads, anything below 5% is shaded blue. All values were determined from 5 million random aligned reads.
Postulating that library quality could be further improved, alternative methods for isolating nuclei prior to transposition were explored. Previously, a sucrose gradient separation had been employed to isolate intact nuclei for ATAC-seq. In many cases, a large amount of material did not pellet consistently and was discarded. Reasoning that this material might contain intact cells that had not been lysed during the homogenization step prior to cryopreservation, all material from thawed nuclei preparations were pelleted, washed, and incubated with cell lysis buffer. Considering the correlation between increased nuclei input and signal, 200,000-500,000 nuclei were incubated with transposase. Transposition time was also varied from 30 to 90 minutes, and transposed DNA was amplified for 10 PCR cycles as previously. The fragment length distributions of these libraries shifted considerably in response to varied incubation time; the longer the incubation, the higher proportion of subnucleosomal length fragments (Supplementary Fig. 2). For 200,000 nuclei, 30 minutes of exposure to transposase resulted in a preponderance of high molecular weight fragments, possibly indicating under-transposition. At 90 minutes, laddering after the mononucleosomal peak was lost, suggesting over-transposition. Consequently, nuclei were transposed for 60 minutes total, such that the subnucleosomal fraction and nucleosomal laddering were both prominent. Libraries were size-selected for subnucleosomal length fragments, and submitted for sequencing on the NextSeq 500 platform to generate 40 bp paired-end reads. As expected, since smaller fragments are more efficiently amplified by PCR, size selection for the subnucleosomal fraction increased the rate of PCR duplication in (Table 1), indicating that total PCR cycles should be carefully considered when size-selecting libraries.
After subsampling to a depth of 40 million reads ( Fig. 2; Libraries 3-d-200K and 3-e-200K), libraries from the third ATAC-seq protocol iteration captured more DHS than previously and demonstrated FRiP scores above 0.3 ( Fig. 3a,b). Peak calls were consistent between biological replicates, with 61,503 peaks (67.9%) from 3-e-200K directly overlapped by 3-d-200K peaks. These nucleosome-free regions (NFR) also consistently overlapped activating histone modifications ( Fig. 2; ChIP-seq data generated as part of the UC Davis FAANG pilot project): H3K27ac (typically a marker of active enhancers; average 54% ChIP-seq peaks detected), H3K4me1 (a general mark of enhancers; average 24% detected), and H3K4me3 (a marker of active TSS; average 67% detected). Additionally, these NFR more consistently marked the promoters of expressed genes compared to data from previous ATAC-seq protocol iterations (Fig. 4).
To gauge the target sequencing depth for future experiments, all available uniquely-mapped, monoclonal, non-mitochondrial reads were combined from 3-d-200K and 3-e-200K, and then randomly subsampled from 10 to 90 million reads. These subsampled read sets were then subjected to peak calling and assessed for DHS detection ( Supplementary Fig. 3), revealing a plateau in additional DHS detected after about 50 million mapped reads, in concordance with the recommendations by Buenrostro et al. (2013).

Chicken lung ATAC-seq data marks both promoters of active genes and intergenic regions.
One concern when sequencing only the subnucleosomal fraction was whether this would lead to preferential detection of TSS. Buenrostro et al. (2015) showed that TSS were enriched for nucleosome-free reads  www.nature.com/scientificreports www.nature.com/scientificreports/ as compared to distal elements, implying that size selection for the subnucleosomal fraction could lead to an over-representation of TSS in ATAC-seq peaks. In contrast to this expectation, only a third of NFR found in both 3-d-200K and 3-e-200K corresponded to promoters (2 kb upstream of TSS), and nearly half fell in intergenic regions (Fig. 5). The NFR that overlapped promoters were most significantly enriched for the binding motif of Sp1, a transcription factor involved in many cellular processes, and that of ETS factors, which are also implicated in a wide variety of functions via transcriptional regulation. Distal NFR were most significantly enriched for the CTCF binding motif, suggesting that many of these peaks are likely involved in the 3-D organization of the genome, since CTCF has been implicated in long-range chromatin interactions, such as those between enhancers and their targets 13 .
Modified ATAC-seq generates high quality chromatin accessibility data for pig lung, muscle and spleen. To confirm the broader applicability of this modified ATAC-seq protocol, the technique was also applied to lung, muscle, and spleen collected from adult male Yorkshire pigs (Michigan State University) as part of the FAANG pilot project at UC Davis. For all libraries, over 100,000 NFR were called, which consistently overlapped with the active histone modifications H3K27ac, H3K4me1, and H3K4me3 ( Fig. 6a; ChIP-seq data generated as part of the UC Davis FAANG pilot project). Over 20% of reads fell within peaks, with reads distributed across both distal and promoter NFR (Fig. 6b), which marked about a third of promoters, on average (Fig. 6c). Tissue replicates clustered together based on read alignments (Fig. 7a), and NFR were consistent between biological replicates, with 68%, 87%, and 64% of peaks detected in both replicates of lung, muscle, and spleen, respectively.
Comparing sets of NFR revealed both common and tissue-specific NFR (Fig. 7b), such as those observed at genes with tissue-specific functions (Fig. 7c). Muscle-specific NFR were enriched for binding motifs of various myocyte enhancer factors, whereas NFR that were common to all three tissues were primarily enriched for the CTCF motif and promoter-specific sequences ( Table 2). Genes with tissue-specific NFR in their promoters were similarly enriched for tissue-specific functions, with muscle-specific NFR marking genes related to muscle contraction and ion transport, and spleen-specific NFR marking genes related to immune function (Fig. 8). These data suggest that this modified ATAC-seq technique may be broadly applied across species and tissues to profile chromatin accessibility.

Discussion
While ATAC-seq has been used extensively to profile open chromatin in cell lines, its applicability to other sample types, such as frozen tissues, has been limited. Another modified ATAC-seq protocol -Omni-ATAC -reportedly detected 76% of DHS in a cell line 10 . However, Omni-ATAC data generated from frozen tissue was not compared to reference DNase-seq data, so conclusions could not be drawn about the consistency between Omni-ATAC and DNase-seq from frozen tissue samples. Additionally, concerns about possible degradation of the nuclear architecture during flash-freezing suggest that slow-freezing, or cryopreservation, may be the ideal long-term storage method for samples intended for ATAC-seq 11 . Through systematic modifications to ATAC-seq, chromatin accessibility data for chicken lung were generated from cryopreserved nuclei preparations and compared with DNase-seq, ChIP-seq, and RNA-seq data also available for chicken lung. These ATAC-seq data detected 55% of DHS in chicken lung at a depth of 40 million reads, consistent with the original ATAC-seq protocol, which detected 46% of DHS in a human cell line 8 .
Cell quality and thorough lysis appeared to be the most important criteria for generating high-quality ATAC-seq libraries from cryopreserved nuclei preparations. This was exemplified by the observation that subjecting the sample to an additional cell lysis step improved the specificity of ATAC-seq peaks to actively expressed genes. In addition, doubling the tagmentation time and sequencing only the subnucleosomal fraction of libraries substantially improved the signal-to-noise ratio after mapping, ultimately resulting in ATAC-seq data that meets ENCODE standards. Attempts to evaluate library quality before sequencing were largely uninformative.  Figure 5. Characterization of the 61,503 chicken lung ATAC-seq peaks that were shared between libraries from the third ATAC-seq protocol iteration. Genomic location of peaks was determined by minimum 50% overlap with promoters (2 kb upstream of TSS), exons, or introns. Peaks that did not overlap any features were classified as distal. The top 5 enriched binding motifs are reported for peaks that were distal or which localized to promoters. www.nature.com/scientificreports www.nature.com/scientificreports/ Sequencing data revealed that the presence of a nucleosomal laddering pattern did not necessarily indicate a good-quality ATAC-seq library. In fact, strong laddering patterns were evident in libraries that demonstrated very low signal-to-noise ratios (as low as 3% of reads in peaks). Ultimately, we found that the best way to gauge library quality was to generate preliminary sequencing data at a low depth of about 5 million mapped reads, and assess the signal-to-noise ratio by the fraction of reads falling in peaks (Figs. 3c and 6c), which should exceed 10% for good quality libraries.
Overall, we conclude that (1) sample preparation (cell quality and thorough lysis) is critical for generating high quality libraries, (2) nucleosomal laddering is not a consistent indicator of library quality, and (3) sequencing only the subnucleosomal fraction of libraries considerably improved the signal-to-noise ratio after mapping. Following ATAC-seq optimization, regions enriched for ATAC-seq reads captured more DHS and regions with active histone modifications. Consolidating these findings yields a modified ATAC-seq protocol capable of generating high-quality chromatin accessibility data from cryopreserved nuclei preparations. These results broaden the applicability of ATAC-seq to cryopreserved nuclei preparations from tissues, which will benefit current efforts to annotate regulatory elements in a wider range of samples and species.

Materials and Methods
Tissue collection, processing, and cryopreservation of nuclei preparations. Lung tissue was harvested from inbred adult roosters (UCD003 line). One gram of tissue was minced with a razor blade, transferred to a gentleMACS C tube with 10 mL of sucrose buffer (250 mM D-Sucrose, 10 mM Tris-HCl (pH 7.5), 1 mM MgCl 2 ; 1 protease inhibitor tablet per 50 mL solution just prior to use), and homogenized using the gentleMACS Dissociator Program 'E.01c Tube' , twice. Homogenate was filtered using a 100 uM Steriflip Vacuum Filter system. Volume was brought up to 9.9 mL with sucrose buffer, and 1.1 mL DMSO was added to achieve a 10% final concentration. Solution was aliquoted into cryotube vials, frozen overnight at −80 °C in a Nalgene Cryo 1 °C/min Freezing Container, and then stored at −80 °C long-term.
All animal experimental protocols were approved by the University of California Davis Animal Care and Use Committee (IACUC). All methods were performed in accordance with the relevant guidelines and regulations.     14. Subject PCR reaction to 5 additional cycles (for a total of 10 cycles). Optionally, an aliquot of the PCR reaction may be subjected to qPCR to determine the optimal number of additional cycles to minimize PCR bias in final library, with the additional number of cycles calculated as those needed to reach 1/3 of the www.nature.com/scientificreports www.nature.com/scientificreports/ maximum R n value determined by qPCR (see Buenrostro et al. (2013) for additional details on using qPCR to determine optimal PCR cycles).  Pelleted nuclei were resuspended in PBS for counting. Nuclei concentration was determined by counting on a hemocytometer, an aliquot of 50,000-500,000 nuclei was pelleted in a 1.5 mL Eppendorf tube (500 × g for 5 min at 4 °C), and the nuclear pellet was incubated in 50 µL transposition mix ( . Libraries were then purified with the MinElute PCR Purification kit and eluted in 10 µL Buffer EB. If a library was not concentrated enough for sequencing after 10 cycles of amplification, library complexity was a concern, and library preparation was repeated. Libraries were assessed for fragment length distribution on a Bioanalyzer High Sensitivity DNA Chip (Agilent Genomics), and submitted for paired-end 40 bp sequencing on the NextSeq platform.