A scATAC-seq atlas of chromatin accessibility in axolotl brain regions

Axolotl (Ambystoma mexicanum) is an excellent model for investigating regeneration, the interaction between regenerative and developmental processes, comparative genomics, and evolution. The brain, which serves as the material basis of consciousness, learning, memory, and behavior, is the most complex and advanced organ in axolotl. The modulation of transcription factors is a crucial aspect in determining the function of diverse regions within the brain. There is, however, no comprehensive understanding of the gene regulatory network of axolotl brain regions. Here, we utilized single-cell ATAC sequencing to generate the chromatin accessibility landscapes of 81,199 cells from the olfactory bulb, telencephalon, diencephalon and mesencephalon, hypothalamus and pituitary, and the rhombencephalon. Based on these data, we identified key transcription factors specific to distinct cell types and compared cell type functions across brain regions. Our results provide a foundation for comprehensive analysis of gene regulatory programs, which are valuable for future studies of axolotl brain development, regeneration, and evolution, as well as on the mechanisms underlying cell-type diversity in vertebrate brains.


Background & Summary
The axolotl (Ambystoma mexicanum) has remarkable regenerative abilities, including the ability to regenerate its limbs, heart, tail, and spinal cord [1][2][3][4][5] , making it as an ideal model for studying vertebrate regeneration, comparative genomics, and evolution.With the development of technology, great progress has been made in axolotl genome and transcriptome research.In brief, recent advances in single-cell and spatial transcriptomics assays have led to profiling the distribution of cell types of the axolotl telencephalon in situ during development and regeneration 6 .Furthermore, by using either species-shared differentially expressed genes or transcription factors (TFs), previous axolotl telencephalon studies have shown the conservation between the medial pallium neurons and amniote hippocampal neurons 7 .Interestingly, the axolotl genome has conserved coding regions with the human genome but is ten times its size, with the majority of the expansion in non-coding regions.With a 32-gigabase-pair genome 8 , non-coding regulatory DNA sequences in axolotls may be proposed to play a distinctive role in environmental adaptation.However, a comprehensive understanding of cis-regulatory elements in the axolotl genome, especially in complex organs like the brain, has lagged.
Using techniques such as Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), candidate cis-regulatory elements (cCREs) have been mapped to identify and exploit epigenetic features 9,10 .Due to the extreme cellular heterogeneity of the brain, conventional assays that use bulk tissue samples have difficulty identifying cCREs in specific cell types.Single-cell ATAC sequencing (scATAC-seq) is a powerful technique that allows us to identify the regulatory regions of the genome that are active in individual cells [11][12][13][14] .It was developed to determine chromatin accessibility in various biological scenarios and the corresponding transcriptional programs.In recent years, scATAC-seq has gained significant attention in the field of neuroscience, because it allows for the investigation of the epigenetic landscape of cell-type-specific transcriptional regulatory sequences in brain regions [15][16][17] .For example, regulatory elements linked to Cux2 and Foxp2, which are highly expressed only in layers II-IV and layer VI of mice, respectively, are only accessible in cells of the corresponding neuron clusters, reflecting the heterogeneity of chromatin accessibility in different layers of the prefrontal cortex 15 .
In this study, we profiled a comprehensive chromatin accessibility landscape across the axolotl brain with scATAC-seq 16,18,19 , including the olfactory bulb (OB), telencephalon (Tel), diencephalon and mesencephalon (DM), hypothalamus and pituitary (HP), and rhombencephalon (Rho).We were able to obtain single-cell chromatin accessibility for 81,199 cells after applying quality control.By analysing the chromatin accessibility profiles of different cell types, we have identified unique sets of regulatory elements that are specific to each cell type, including microglia, GABAergic, glutamatergic, ependymoglia cell (EGC), and so on.Overall, our findings provide insights into the regulatory mechanisms that underlie the diverse functions of different cell types in axolotl brain regions, and serve as a foundation for future studies aimed at understanding the transcriptional programs that drive the distinguishing features of axolotl brain regions.

Tissue collection. The Biomedical Research Ethics Committee of Guangdong Provincial People's Hospital
(license number: KY-Q-2022-395-01) approved the use of animals in this study.The d/d strain of axolotl was used for all experiments without sex bias in the research.Animals were bred and maintained in freshly dechlorinated tap water at 18-20 °C with a 12 h/12 h light-dark cycle in the Laboratory of Neural Development and Regeneration, Guangdong Provincial People's Hospital.Totally 8 adult axolotls were sacrificed, and OB, Tel, DM, HP, and Rho were harvested in this study.In brief, axolotls were deeply anaesthetized using 0.03% ethyl-p-aminobenzoate.Each brain region from 8 animals was dissected from the axolotl head.Then, each brain region was pooled into a separate tube for single-nucleus isolation, respectively.The samples were snap-frozen in liquid nitrogen and then transferred to a −80 °C freezer for storage before dissociation.
Nuclear isolation from frozen brain tissue.The axolotl brain regions focused on this study were the OB, Tel, DM, HP, and Rho (Supplementary Table 1 and Fig. 1a).The single-nucleus preparations were obtained using the Omni-ATAC protocol with certain modifications, as previously described 20 .In brief, nuclei from the frozen brain regions were isolated 21,22 .Tissue was cut into pieces and ground in 2 mL of chilled homogenization buffer [HB; 120 mM Tris pH 7.8 (Sigma), 150 mM KCl (Sigma), 30 mM MgCl 2 (Invitrogen), 250 mM sucrose (Sigma), 0.1% NP-40 (Roche), 1 × Protease inhibitor cocktail (Roche), 1 mM DTT (Thermo Fisher Scientific), 1% BSA/0.8 × PBS], and incubated on ice for 5 min.The tissue was homogenized by stroking (grinding) to release the nuclei.Then, nuclei were filtered through a 30 μM cell strainer (Sysmex Partec) into a 15 mL centrifuge tube.After centrifugation at 500 g and 4 °C for 5 min, the nuclei were collected and washed twice with 1 mL of chilled blocking buffer [BB, 1% BSA/0.8 × PBS].After another round of centrifugation, the nuclei were resuspended in 50 μl of 1% BSA/0.8 × PBS and counted by staining with DAPI.scATAC-Seq library preparation and sequencing.For the preparation of single-cell ATAC-seq libraries, we employed the DNBelab C Series Single-Cell ATAC Library Prep Set (MGI, #1000021878) 19 .To investigate the transcriptional regulation of different brain regions in axolotl with sufficient and high-quality data, a total of 39 libraries were generated.Specifically, there were 8 libraries for OB, DM, HP, and Rho, and 7 libraries for Tel.After quality control, 25 high-quality libraries were retained, including 6 from the OB, 4 from the Tel, 3 from the DM, 6 from the HP, and 6 from the Rho.The barcoded scATAC-seq libraries were generated from the transposed single-cell suspensions.In summary, the protocol involved droplet encapsulation, pre-amplification, emulsion breakage, collection of capture beads, DNA amplification, and purification.Following this, an indexed sequencing library was prepared as per the user guide.Finally, we used the Qubit ssDNA Assay Kit (Thermo Fisher Scientific) to measure the concentrations of the sequencing libraries.The library was sequenced on the MGI DNBSEQ-T1 platform using the following read lengths: 50 bp for read 1, 70 bp for read 2, and 10 bp for the sample index sequencing scheme of the China National GeneBank 23 (CNGB).
scATAC-seq raw data processing.The processing of scATAC-seq data involved the following steps.First, the raw reads were separated into insertions and barcodes and then filtered using PISA (version 1.1) with a minimum sequencing quality of 20; the software is available at https://github.com/shiquan/PISA.The sequencing reports for the scATAC-seq datasets are summarized in Table 1.Next, the filtered reads were aligned to the axolotl genome 8 using BWA (version 0.7.17-r1188) 24 , and the resulting BAM files were processed using bap2 (version 0.6.2) 25 to identify barcodes from the same cell (Fig. 1b).
Quality control (QC) of the scATAC-seq downstream analysis.ArchR (version 1.0.2) 26 was used to filter low quality cells with the following criteria: unique fragments (nFrags) ≥ 1000 and transcription start site (TSS) score ≥ 4 of each library.Then, we calculated the doublet score by the 'addDoubletScores' function with the parameters of filterRatio = 2 and potential doublets were removed based on the ArchR 26 .
Briefly, considering the 32-gigabase-pair axolotl genome 8 , to determine whether each cell was accessible within each window, we created 100000 bp windows across the genome.ArchR (version 1.0.2) 26 uses a weighted average of the accessibility of the peaks, where the weight of each peak is determined by its proximity to the transcription start site.We calculated gene activity scores using the 'addGeneScoreMatrix' function.Furthermore, due to the sparsity of scATAC-seq data, we employed the MAGIC (Markov Affinity-based Graph Imputation of Cells) imputation technique to impute gene activity scores from bolstering signal strength and reducing noise by sharing information with similar nearby cells.

Latent semantic indexing (LSi) clustering of scATAC-seq.
Analyses of the scATAC-seq data were conducted using ArchR 26 .LSI dimensionality reduction was performed with 'addIterativeLSI' function in ArchR 26 .By using Seurat's 'FindClusters' function, we clustered the data with parameters: reducedDims = 'IterativeLSI' , integration of library.As our data were produced from the same batch, no significant batch effects were observed.The integration of our libraries was performed by simply merging them using ArchR 26 .To accomplish this, we first created a project using ' ArchRProject' function.Next, we applied dimensionality reduction to our data using the 'addIterativeLSI' function with 'TileMatrix' matrix.Subsequently, we incorporated UMAP embedding of reduced dimensions object using 'addUMAP' function, allowing us to visualize different brain regions by 'plotEmbedding' function.Finally, we calculated the Pearson correlation coefficients between different libraries by 'cor' function with the gene activity matrix.
peak calling of scATAC-seq.The model-based analysis of ChIP-seq (MACS2) 27 (https://github.com/hbctraining/Intro-to-ChIPseq)callpeak command was used to perform peak calling on the Tn5-corrected insertions (representing each end of the Tn5-corrected fragments) for each cell type.In brief, the 'addGroupCoverages' function was used to create pseudo-bulk replicates for each cell type and the 'addReproduciblePeakSet' function (parameters: groupBy = 'Clusters' , pathToMA-CS = pathToMacs2, '-nomodel' , genomeSize = 2,782,028,915) was used to call peaks.Next, the 'getMarkerFeatures' function was utilized to define the marker peaks for each cell type, and the 'getMarkers' function was applied to get the marker peaks.
Motif enrichment analysis.We annotated motif using the 'addMotifAnnotations' function of ArchR with the default parameters: motifSet = 'cisbp' , name = 'Motif ' , species = 'homo sapiens' , version = 2.Then, with the marker peaks, motif enrichment was performed by the 'peakAnnoEnrichment' function, and the top 18 cell-type-specific motifs were shown in the heatmap.
Assigning gene to the scATAC-seq peaks.project from the provided ArrowFiles with ' ArchRProject' function (parameters: geneAnnotation = axolotl gene annotation, genomeAnnotation = axolotl genome annotation).With the annotated ArchR project, scATAC-seq peaks were assigned genes by using the peak-to-gene assignment algorithm of ArchR based on their proximity to the transcription start site (TSS) of the nearest gene.
The marker peaks of cell type used as input (FDR ≤ 0.05 & Log2FC ≥ 0.25) were annotated based on proximity to the TSS of the nearest genes.

Data records
We presented chromatin accessibility landscapes for different regions of the axolotl brain, providing valuable insights into the epigenetic regulation mechanisms of brain function and cell heterogeneity.Our data set consists of chromatin accessibility landscapes for 81,199 high-quality single-cells from the OB, Tel, DM, HP, and Rho regions of the axolotl brain, including 3-6 libraries of each region.Figure 1 provides an overview of the laboratory bioinformatical and analysis workflow.Table 1 displays the quality of each library.The raw fastq and fragments data have been deposited in the CNGB Nucleotide Sequence Archive (CNP0004118 28 ).The raw fastq data have also been submitted to the NCBI Sequence Read Archive (PRJNA990676 29 ).Additionally, the peak matrices and metadata has been uploaded to Figshare (https://doi.org/10.6084/m9.figshare.22548400.v7) 30.The Supplementary Table 1 (available at Figshare) 30 exhibited libraries correlation between the accession IDs in the CNGB or NCBI and the sample IDs.

technical Validation
Quality control of scATAC-seq.In this study, nuclei were extracted from the OB, Tel, DM, HP, and Rho of adult axolotl and were prepared for scATAC-seq (see Methods) (Supplementary Table 1 and Fig. 1a) 30 .The total number of raw reads was 9,872,843,413, with 8,872,034,324 reads passing QC (Table 1).The raw data were processed via the standard pipeline (Fig. 1b).
After quality control, we obtained a satisfying set of single-cell chromatin accessibility profiles for axolotl brain regions (Fig. 2a-c).We eliminated doublets by applying a filterRatio parameter of 2 in ArchR 26 (Fig. 2d), and subsequently examined the chromatin landscape of the 81,199 single cells to investigate cell-type-specific regulatory elements.The average unique fragments of cells remaining after quality filtering was 6,281; the average of TSS enrichment was 19.97, the average accessible peaks of each library were 113,571 (Table 1).To identify reproduced peaks between technical replicates, we computed Pearson correlation coefficients to assess the similarity between technical replicates across regions, based on gene activity (Fig. 2e).UMAP showed the heterogeneity of gene activity in different brain regions, which is probably indicative of differential functioning (Fig. 2f,g).For example, the Tel serves as the hub for sensory processing, oversees voluntary movements and activities, and is linked to sophisticated cognitive skills such as emotions, learning, and memory 31 , whereas the primary role of the HP is to manage neuroendocrine processes, with the paraventricular region housing cell bodies that project to the hypophysis and secrete hormones 32 .
Chromatin accessibility analysis was used to identify the differential accessibility regions (DARs) of those 20 cell types (Fig. 3c).For example, peaks were identified specifically within the TSS region of the Gfap marker genes that were accessible in both asclEGC and IPC.In addition, signals around Chd7 were specifically enriched in the chdEGC region but not in other cell types.These phenomena strongly suggest that our data are of adequate quality.To identify responding cells across regions, we investigated the varied cell-type compositions of 25 libraries by percent constituency (Fig. 3e).Interestingly, asclEGC increased gradually from the anterior to the posterior axis, and wntEGC and chdEGC were specifically enriched in Rho.Next, the diversity of cellular compositions across different regions of the axolotl brain was analysed in detail and summarized in Fig. 3f.We found asclEGC, HPN, and GEMs have more distinct gene regulation in regions with a large number of different up-regulated gene activities.As Gfap is a stem-cell associated protein that, like stem cells, exhibits a similar response to injury 31,32,41,42 , we analysed asclEGC (with Gfap chromatin accessible) and found Egr1 and Meis1 are accessible in OB; Mef2c, Slc1a6, Wnt5b, Lhx2 in Tel; Col and the Nd family in DM; Chd6, Nkx2-3 in HP; Pax7   and the Hox family in Rho, respectively (Fig. 3g).Therefore, this analysis strongly suggests that the scATAC-seq profiles of axolotl brain regions can effectively and accurately identify the accessible chromatin regions within the axolotl genome and provides a direct theoretical basis reference that can be exploited by future studies.inferring cell type-specific transcription factors.To gain a better understanding of the regulatory mechanisms governing the chromatin landscape, we implemented peak calling to generate a set of 121,697 cell-type differential peaks based on pseudobulk chromatin accessibility.Then, we used ArchR 26 to identify transcription factors that exhibit strong correlation with cell-type-specific open chromatin and compared these results with previous studies (Fig. 4).Conformably, the master regulators have been shown to be closely related to specific functions of cell types.For example, MCG is enriched for motifs of the BCL11 family (BCL11A 43 , BCL11B 44 ).This was supported by the previous study of BCL11A deletion causes apoptosis in immature B cells and common lymphoid progenitors, and also results in delayed or deficient lymphoid development of hematopoietic stem cells to B, T, and NK cells in adult mice 43 .Besides, BCL11B prevents autoimmune disorders by controlling multiple regulatory T cell gene expression programs 44 .EGCs are enriched for motifs that distinguish progenitor cells from other cell types, in the agreement of the characterization of proliferation in EGCs 31,32,41,42 .In brief, we observed strong enrichment of PLAG1 in asclEGC, which is consistent with regulation of progenitor cell proliferation and neurogenesis during telencephalic development of mice 45 .We also found NEUROD1 is enriched in wntEGC in our dataset.NEUROD1 promotes transit-amplifying progenitors that contribute to the generation of the majority of the excitatory neurons of the neuroepithelium of the dorsal telencephalon in early development [46][47][48] .Moreover, chdEGC is enriched with CTCF, which regulates functional neural development and neuronal diversity 49 .Other well-known cell-type-specific motifs such as the Oligo (TBR1 50 , EOMES 51 ), nptxEX-1 (NEUROG2, NEUROD2, NEUROD4, NEUROD6 52 , BHLHE22), nptxEX-2 (BACH1, BACH2, the JUN 53 , NFIC, and FOSL families), nosIN (DLX family), and GEMs (RFX family) were enriched in our data.Taken together, cell-type-specific TFs that have been reported in previous studies are also present in our data, exhibiting similar patterns of enrichment, which indicates the accuracy of our cell type identification and the high-quality of our data.Therefore, we provide a resource-rich and high-quality data of chromatin landscape of the axolotl brain.

Fig. 1
Fig. 1 Schematic diagram illustrating the experimental and data analysis processes of scATAC-seq in the axolotl brain.(a) Cartoon illustrates the main experimental and analytical processes.Five different regions from the brain of adult axolotl were collected for single-cell Assay for Transposase Accessible Chromatin (scATAC-seq).Region 1, olfactory bulb (OB); region 2, telencephalon (Tel); region 3, diencephalon and mesencephalon (DM); region 4, hypothalamus and pituitary (HP); and region 5, rhombencephalon (Rho).Colors correspond to the five regions.(b) Analysis of the scATAC-seq workflow.

Fig. 2
Fig. 2 Quality metrics for scATAC-seq.(a) Violin plot illustrating the distribution of log 10 (unique fragments) for each library.(b) Violin plot of the transcription start site (TSS) enrichment score for each library.(c) Quality control (QC) filtering plots of the TSS enrichment fractions vs unique cell fragments for each cell from the olfactory bulb (OB), telencephalon (Tel), diencephalon and mesencephalon (DM), hypothalamus and pituitary (HP), and rhombencephalon (Rho).(d) UMAP of scATAC-seq data showing simulated doublet enrichment over expectation.(e) Heatmap clustering of correlation coefficients across 25 libraries of scATAC-seq profiles.(f) A UMAP visualization of the 25 libraries, each color of which represents a per library interaction of the five brain regions.(g) A UMAP visualization representing the five brain regions; Cells are colored by brain region in the plots.

Fig. 3
Fig. 3 Single-cell chromatin accessibility and diversity between assessed regions of the axolotl brain.(a) UMAP of 20 clusters as identified by scATAC-seq.Cells coloration indicates annotation.(b) UMAP visualization of celltype-specific gene activity score.(c) Genome browser view of the aggregated scATAC-seq chromatin accessibility profiles of the housekeeping gene (Gapdh) and the cell-type-specific gene loci.(d) Bar plot exhibiting the representative GO enrichment pathways of cell types.(e) Histogram displaying the distribution of cell types across various brain regions in different libraries.(f) Histogram of the number of up-regulated genes of the same cell type in different axolotl brain regions.(g) Heatmap of asclEGC-specific gene activity scores in the five brain regions.

Estimated number of cells Mean fragments per cell TSS Enrichment Accessible peaks
To assign genes to scATAC-seq peaks, it required axolotl genome annotation object and gene annotation object.Firstly, we forged a BSgenome package for the axolotl genome annotation following the official tutorial (https://github.com/Bioconductor/BSgenome).Briefly, we forged axolotl BSgenome package by the 'forgeBSgenomeDataPkg' function with the seed file and the published axolotl genome.Then, we created axolotl genome annotation by 'createGenomeAnnotation' function (parameters: genome = BSgenome.Axolotl).Secondly, to create gene annotation object, we utilized 'createGeneAnnotation' function and 3 GRanges object (GRanges object containing gene coordinates; GRanges object containing gene exon coordinates; GRanges object containing standed transcription start site coordinates for computing TSS enrichment scores downstream) as input.Finally, scATAC-seq peaks were assigned genes by creating an ArchR