Unraveling the whole genome DNA methylation profile of zebrafish kidney marrow by Oxford Nanopore sequencing

Liu, Xudong; Ni, Ying; Wang, Dandan; Ye, Silin; Yang, Mengsu; Sun, Xuan; Leung, Anskar Yu Hung; Li, Runsheng

doi:10.1038/s41597-023-02431-5

Data Descriptor
Open access
Published: 10 August 2023

Xudong Liu^1,na1,
Ying Ni^2,3,na1,
Dandan Wang^4,
Silin Ye^1 (ORCID: orcid.org/0009-0006-0928-3002),
Mengsu Yang^2,3,5 (ORCID: orcid.org/0000-0003-2083-2296),
Xuan Sun^4,6,
Anskar Yu Hung Leung^4,6 &
Runsheng Li^1,7 (ORCID: orcid.org/0000-0003-1563-1844)

Scientific Data volume 10, Article number: 532 (2023)

Abstract

Zebrafish is a widely used model organism for investigating human diseases, including hematopoietic disorders. However, a comprehensive methylation baseline for the primary hematopoietic organ of zebrafish, the kidney marrow (KM), is still lacking. We employed Oxford Nanopore Technologies (ONT) sequencing to profile DNA methylation in zebrafish KM, generating four KM datasets in two groups based on the presence or absence of red blood cells. Our findings revealed that blood contamination in the KM samples reduced read quality and altered methylation patterns. Compared with whole-genome bisulfite sequencing (WGBS), ONT-based methylation profiling covered more CpG sites (92.4% vs 70%–80%) and exhibited less GC bias with more even genomic coverage. The ONT methylation calling results also showed a high correlation with WGBS results at shared sites. This study establishes a comprehensive methylation profile for zebrafish KM, paving the way for further investigations into epigenetic regulation and the development of targeted therapies for hematopoietic disorders.

Background & Summary

Zebrafish (Danio rerio) is a widely used model organism in biomedical research, particularly for studying human diseases such as cancer¹, heart disease², and blood-related disorders³. This is due to their genetic and physiological similarities to humans^4,5. Comparative genomic analysis has shown that the zebrafish genome shares a high degree of synteny and sequence homology with the human genome, with approximately 70% of human genes having at least one clear zebrafish orthologue^6,7.

The hematopoietic system is responsible for producing and maintaining blood cells in the animal body. In adult zebrafish, the kidney marrow (KM) has been identified as the primary hematopoietic organ, where hematopoietic stem cells differentiate into various types of blood cells including erythrocytes, leukocytes, and thrombocytes^8,9. The zebrafish kidney marrow is considered to be analogous to the mammalian bone marrow⁸. Therefore, zebrafish KM has become an important model for studying hematopoiesis and blood-related disorders^10,11. Unlike mammalian red blood cells, zebrafish red blood cells are nucleated and DNA-rich. Consequently, when KM DNA is extracted from zebrafish, red blood cells can represent a significant fraction of the sample, potentially affecting data interpretation in downstream analyses, including quality control and DNA methylation profiling.

Epigenetic modifications, such as DNA methylation, are heritable changes capable of modulating gene expression in hematopoietic disorders. Aberrant DNA methylation patterns have been observed in various hematologic malignancies, including leukemia¹². These alterations can result in dysregulated gene expression patterns, which contribute to the initiation and progression of these disorders. For example, hypermethylation of tumor suppressor genes can lead to their silencing, promoting cancer development¹³.

Investigating the role of DNA methylation in hematopoietic disorders using the zebrafish KM model is crucial for developing novel diagnostic and therapeutic strategies for human blood-related diseases. However, the absence of a comprehensive methylation baseline in wild-type zebrafish KM remains a challenge for achieving this goal.

The gold standard for detecting 5-methylcytosine (5mC) methylation with high-throughput sequencing still relies on bisulfite conversion. Whole-genome bisulfite sequencing (WGBS) can detect 5mC methylation on a genome-wide scale. However, biases arising from the chemical treatment and the subsequent PCR amplification hinder the detection of 5mC levels in some regions of the genome. A recently published WGBS dataset for different zebrafish tissues¹⁴ is based on the GRCz10 assembly, while the most recent genome assembly is GRCz11. A liftover of these datasets to the new assembly is needed to make them more accessible to the zebrafish research community.

Using nanopore sequencing technology to detect DNA 5mC methylation offers several significant advantages, including long-read sequencing, direct detection of base modifications, and single-molecule resolution¹⁵. On this long-read platform, read GC bias could also be lower than in NGS, where the bias is a result of amplification. Nanopore sequencing can directly detect 5mC modifications by inferring them from the raw current signal. This eliminates the need for additional chemical treatments, reducing the risk of DNA damage and minimizing potential biases introduced during the conversion process. Moreover, nanopore sequencing provides single-molecule resolution, enabling the detection of 5mC methylation patterns on individual DNA molecules. This allows for the identification of rare and heterogeneous methylation events, which may be crucial for understanding the functional consequences of DNA methylation in different biological contexts.

In this study, we employed the Oxford Nanopore Technologies (ONT) platform to profile DNA methylation in the zebrafish KM with or without red blood cells and compared the genome-wide 5mC profiling capacity using ONT reads with the WGBS method. This dataset establishes a foundation for further exploration of the epigenetic regulation of hematopoietic disorders and the development of targeted therapeutic interventions in zebrafish.

Methods

Sample collection and DNA extraction

Wild-type Tubingen (TU) zebrafish lines were obtained from the Zebrafish International Resource Center (ZIRC, USA). The study received approval from the Committee on the Use of Laboratory and Research Animals (CULATR), approval number CULATR 5649-21, at the University of Hong Kong (HKU). Kidney marrow from 8-month-old WT zebrafish was collected, dissociated in 0.9X PBS via pipetting, and filtered through a 40 µm nylon cell strainer (Corning, NY, USA). Some samples underwent treatment with eBioscience™ 1x RBC Lysis Buffer (Invitrogen, MA, USA) to lyse red blood cells, and these samples were designated as the KM group. Samples without red blood cell lysis were categorized as the KMB group. The resulting cells were centrifuged at 500 g for 5 minutes, resuspended, filtered, and washed twice with 0.9X PBS for DNA extraction. Genomic DNA was extracted using QIAamp DNA Kits (Qiagen, Germany).

Library preparation and nanopore sequencing

Approximately 2 μg of genomic DNA extracted from zebrafish kidney marrow cells was used to prepare each library with a ligation sequencing kit (LSK-110, ONT). The libraries were loaded onto R9.4.1 flow cells (FLO-MIN106D, ONT) and sequenced on a MinION Mk1B.

Read processing

The raw Nanopore data (fast5 format) were basecalled with Dorado v.0.1.1 (https://github.com/nanoporetech/dorado) using the dna_r9.4.1_e8_sup@v3.3 super-accuracy model. Adapters were removed from the nanopore reads with Porechop v.0.2.4 (https://github.com/rrwick/Porechop), and the first and last 50 bp of each read were trimmed using NanoFilt v.2.8.0¹⁶. Reads shorter than 200 bp or with an estimated read quality score below 7 were also removed with NanoFilt v.2.8.0¹⁶.
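The trimming and filtering thresholds above can be illustrated with a minimal Python sketch. It is not the NanoFilt implementation: the function names are hypothetical, the mean read quality is assumed to be averaged in error-probability space, and trimming is assumed to happen before the length and quality filters.

```python
import math
from typing import List, Optional, Tuple


def mean_read_quality(quals: List[int]) -> float:
    """Mean Phred score of a read, averaged in error-probability space."""
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)


def trim_and_filter(
    seq: str,
    quals: List[int],
    crop: int = 50,       # bases removed from each end of the read
    min_len: int = 200,   # minimum read length kept
    min_q: float = 7.0,   # minimum mean read quality kept
) -> Optional[Tuple[str, List[int]]]:
    """Trim both ends, then return the read, or None if it fails a filter."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: trimming is applied before the length and quality filters.
    seq = seq[crop:len(seq) - crop]
    quals = quals[crop:len(quals) - crop]
    if not seq or len(seq) < min_len:
        return None
    if mean_read_quality(quals) < min_q:
        return None
    return seq, quals
```

With these defaults, a 300 bp read with uniform quality 20 is kept as a 200 bp read, whereas a 250 bp read is discarded because only 150 bp remain after trimming.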

Read quality analysis

The reads were aligned to the zebrafish reference genome (GRCz11) using Minimap2 v.2.22¹⁷ with the argument “--secondary=on”. To summarize the read accuracy of the different libraries, we computed the estimated accuracy (Eq. 2) and the observed accuracy (Eq. 3) for each aligned read:

$$N(\mathrm{total}) = N(\mathrm{sub}) + N(\mathrm{mat}) + N(\mathrm{ins}) + N(\mathrm{del}) \tag{1}$$

$$\mathrm{Estimated\ read\ accuracy} = 1 - \frac{1}{N}\sum_{i=1}^{N} 10^{-q_i/10} \tag{2}$$

$$\mathrm{Observed\ read\ accuracy} = \frac{N(\mathrm{mat})}{N(\mathrm{total})} \tag{3}$$

Here, N is the number of bases in each read and q_i is the quality score of the i-th basecalled base. N(sub), N(mat), N(ins), and N(del) are the numbers of substitutions, matches, insertions, and deletions in each read, respectively.
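As a worked illustration of Eqs. 1 to 3, the Python sketch below computes both accuracies for a single read. The source does not state how the per-read counts were extracted, so this example assumes an extended CIGAR string that uses the `=` and `X` operators for matches and substitutions; the function names are hypothetical.

```python
import re
from typing import Dict, List

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def estimated_accuracy(quals: List[int]) -> float:
    """Eq. 2: one minus the mean per-base error probability."""
    return 1.0 - sum(10 ** (-q / 10) for q in quals) / len(quals)


def count_cigar_ops(cigar: str) -> Dict[str, int]:
    """Count matches, substitutions, insertions and deletions in a CIGAR."""
    counts = {"mat": 0, "sub": 0, "ins": 0, "del": 0}
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "=":
            counts["mat"] += n
        elif op == "X":
            counts["sub"] += n
        elif op == "I":
            counts["ins"] += n
        elif op == "D":
            counts["del"] += n
        elif op == "M":
            # 'M' mixes matches and mismatches, so it cannot be used here.
            raise ValueError("CIGAR must use =/X instead of M")
        # Clipping (S, H), skipped regions (N) and padding (P) are ignored.
    return counts


def observed_accuracy(counts: Dict[str, int]) -> float:
    """Eq. 3: N(mat) divided by N(total), with N(total) from Eq. 1."""
    total = counts["mat"] + counts["sub"] + counts["ins"] + counts["del"]
    return counts["mat"] / total
```

For example, a read aligned as `5S90=4X3I3D` has N(total) = 100 and an observed accuracy of 0.9, while quality scores of 10 and 20 give an estimated accuracy of 1 − (0.1 + 0.01)/2 = 0.945.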

Methylation profiling

All the raw Nanopore data (fast5 format) were also basecalled with Bonito v.0.6.1 (https://github.com/nanoporetech/bonito) using the dna_r9.4.1_e8_sup@v3.3 model and the argument “--modified_bases 5mc”, which writes 5mC calls to bam files carrying methylation information. To obtain the site-level modification proportion, modbam2bed v.0.6.3 (https://github.com/epi2me-labs/modbam2bed) was used to profile the bam files. To obtain the proportion of CpG sites covered by the ONT method, only sites covered by at least one forward and one reverse sequencing read were counted. To assess the methylation status of each chromosome region, each chromosome was divided into 100Kb fragments. The 5Kb fragment upstream of each transcription start site (TSS) was defined as the promoter region. The regional methylation proportion was then calculated for each fragment using an in-house script, which can be found at https://github.com/lrslab/Zebrafish-Multisequencing.
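The regional aggregation step can be sketched in Python as follows. This is not the authors' in-house script: the input layout (chromosome, 0-based position, methylated calls, unmethylated calls) and the choice to pool calls within a bin, rather than average per-site proportions, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (chromosome, 0-based position, methylated calls, unmethylated calls)
Site = Tuple[str, int, int, int]


def regional_methylation(
    sites: Iterable[Site], bin_size: int = 100_000
) -> Dict[Tuple[str, int], float]:
    """Pool per-site calls into fixed-size bins and return the 5mC proportion."""
    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, n_mod, n_canon in sites:
        key = (chrom, pos // bin_size)  # bin index along the chromosome
        methylated[key] += n_mod
        total[key] += n_mod + n_canon
    # Bins without any call are omitted rather than reported as zero.
    return {key: methylated[key] / total[key] for key in total if total[key] > 0}
```

Promoter-level values could be obtained the same way by replacing the fixed bin index with an interval lookup for the 5Kb window upstream of each TSS.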

Dataset subsampling and coverage estimation

To evaluate the reference coverage of the datasets, we implemented a subsampling approach using the seqkit¹⁸ tool with the “sample” argument, generating subsets of the dataset at ten different proportions, spanning a range from 0% to 100%. These subsets were then aligned to the reference using minimap2¹⁷ with the “-ax map-ont” argument, and the resulting alignments were used to calculate the coverage of the reference using the samtools¹⁹ tool.
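The quantity estimated here is the fraction of reference bases covered by at least one alignment. The study computed it with samtools; the Python sketch below only illustrates the calculation on half-open alignment intervals, and its function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def covered_fraction(
    alignments: Iterable[Tuple[str, int, int]],  # (chrom, start, end), half-open
    chrom_lengths: Dict[str, int],
) -> float:
    """Fraction of reference bases covered by at least one alignment."""
    by_chrom = defaultdict(list)
    for chrom, start, end in alignments:
        by_chrom[chrom].append((start, end))

    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent interval: extend the current block.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / sum(chrom_lengths.values())
```

Applying this function to alignments from each subsampled read set would give one point per subsampling proportion on a coverage curve.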

GC bias comparison

To compare the GC bias of the whole-genome bisulfite sequencing (WGBS) and Oxford Nanopore Technologies (ONT) datasets, we mapped the datasets to the zebrafish genome (GRCz11) using bwa for the NGS data and minimap2¹⁷ for the ONT data. We used samtools¹⁹ with the “bedcov” argument to profile the genome coverage in 1 kb bins. The bins were then classified by their GC content, which ranged from 0% to 100%. We subsequently calculated the read coverage for each GC content level and determined the average read coverage for GC content ranging from 10% to 60%. We normalized the read coverage by dividing the coverage of each GC content level within this range by the overall average read coverage.
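A minimal Python sketch of this normalization is given below. It assumes each bin is supplied as a sequence with its mean depth, that GC content is rounded to whole percent, and that the "overall average" is the mean depth over all bins within the 10% to 60% range; none of these details are specified in the text, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def gc_percent(seq: str) -> Optional[int]:
    """GC content in whole percent, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return None
    return round(100 * gc / (gc + at))


def normalized_coverage_by_gc(
    bins: Iterable[Tuple[str, float]],  # (bin sequence, mean depth of the bin)
    gc_min: int = 10,
    gc_max: int = 60,
) -> Dict[int, float]:
    """Mean depth per GC level divided by the mean depth over the GC range."""
    depths = defaultdict(list)
    for seq, depth in bins:
        gc = gc_percent(seq)
        if gc is not None and gc_min <= gc <= gc_max:
            depths[gc].append(depth)
    in_range = [d for values in depths.values() for d in values]
    if not in_range:
        return {}
    overall = sum(in_range) / len(in_range)
    if overall == 0:
        return {}
    return {gc: (sum(v) / len(v)) / overall for gc, v in sorted(depths.items())}
```

A value near 1 at every GC level indicates even coverage, while values that fall as GC content rises indicate the kind of bias described for WGBS reads.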

Data Records

The raw sequencing reads and methylation bam files of all samples are available from NCBI under BioProject accession number PRJNA930374²⁰. The ONT regional-level methylation profiles on the zebrafish GRCz11 assembly are also available in the Gene Expression Omnibus (GEO) under accession number GSE232842²¹. The WGBS methylation profiles at the per-site, gene promoter, and 100 Kb bin levels on the zebrafish GRCz11 assembly, converted from the GRCz10 assembly using LiftOver, are available at Figshare (https://doi.org/10.6084/m9.figshare.22689700²² and https://doi.org/10.6084/m9.figshare.22785191²³).

Technical Validation

Read statistics

We present four datasets of zebrafish samples collected from kidney marrow (KM). The samples were divided into two groups based on the sample handling process. The first group contained kidney marrow with blood (KMB), while in the second group (KM) the blood DNA was removed by lysing red blood cells before DNA extraction. Both groups of samples were sequenced on the Oxford Nanopore Technologies (ONT) platform, with one R9.4.1 flow cell used for each sample. The raw yields for KMB1, KMB2, KM1, and KM2 were 7.2 Gb, 9.8 Gb, 20.6 Gb, and 20.5 Gb, respectively (Supplementary Table 1). To obtain clean data, several filters were applied, including adapter, short-read, and low-quality-read removal (see Methods for details). After filtering, at least 85% of the reads were retained as high-quality reads for further analysis (Supplementary Table 1).

Read accuracy and genome recovery rate

High read accuracy is of paramount importance for downstream analysis and obtaining accurate results. To assess the quality of the four datasets, we calculated both the estimated read accuracy and the observed read accuracy. The average estimated accuracy for the two datasets in the KMB group was approximately 92.5%, while that for the two datasets in the KM group reached 95% (Fig. 1A). The average observed read accuracies for the KMB and KM samples were approximately 90% and 92.5%, respectively (Fig. 1A). The modal read accuracy agreed with the average read accuracy in showing that read quality is better for KM than for KMB: the modal read accuracy for KM1 and KM2 exceeded 95%, while that for KMB1 and KMB2 was below 95% (Fig. 1B,C). To exclude a potential influence of read length on sequencing quality across the groups, we compared read length with observed read accuracy. Specifically, we computed the correlation coefficient between these two variables in each of the four samples. The absolute values of the Pearson correlation coefficients (R) in all four samples were less than 0.2, suggesting no significant association between read length and read accuracy (Fig. S1). To account for differences in read coverage across the samples, which may affect the quality of sequencing data, we further subsampled the yield of KM1, KM2 and KMB2 to match that of KMB1 (7.2 Gb). We then compared the estimated and observed read accuracy for each sample. The results were consistent with those obtained using the total reads (Fig. S2). It should be noted that zebrafish DNA sequencing data were not used for training Nanopore basecalling models, which may result in a lower observed read accuracy than for human sequencing data²⁴.

To calculate the genome recovery rate for each group, we merged the datasets within the same group and subsampled the reads. The KMB reads covered 96.7% of the GRCz11 genome and the KM reads covered 99.1% when using the full dataset (Fig. 1D). The lower genome recovery rate of the KMB group is related to the lower read yield of the KMB samples.

Our evaluation of read accuracy for the KM and KMB groups revealed that both the estimated and observed read accuracy were consistently lower in the KMB group when compared to the KM group. These results suggest that the presence of red blood cells in the kidney marrow samples may have contributed to the lower read quality during the sequencing.

Methylation pattern

Given the critical role of base methylation in diverse disease mechanisms, we investigated whether blood contamination affects the methylation information of our kidney marrow samples. To ensure that our read depth was sufficient for accurate 5mC methylation calling, we conducted a subsampling analysis on each dataset. Specifically, we randomly subsampled each dataset to different proportions and repeated this process 10 times for each proportion. The overall methylation of each subsampled dataset was calculated, and the results demonstrated that each dataset exhibited stable methylation patterns across all subsampling proportions, indicating that the read depth was sufficient for accurate methylation calling (Fig. 2A). The overall 5mC methylation proportions of the subsampled datasets for KMB1 and KMB2 were 74.7% and 76.8%, respectively, which were lower than that of the KM group (78.2%) (Fig. 2A). To assess whether there were any significant differences in regional methylation status, the correlation coefficients between the four datasets at both the 100Kb bin and promoter levels were computed. The results revealed a high degree of correlation across all four datasets, with correlation coefficients of at least 0.68 (Fig. 2B,C). Among the four datasets, KM1 and KM2 exhibited the highest correlation, with a coefficient of 0.83 at the 100Kb bin level and 0.93 at the promoter level. In contrast, KMB1 and KMB2 had the lowest correlation coefficient, with a value of only 0.68 at the 100Kb bin level and 0.81 at the promoter level (Fig. 2B,C). This suggests that the presence of blood contamination may reduce the stability of the methylation status of kidney marrow.
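The pairwise comparison of regional methylation can be sketched as a correlation over the regions present in both datasets. The text does not name the coefficient used for this comparison, so the Python example below assumes Pearson's R, and the function name is hypothetical.

```python
import math
from typing import Dict, Hashable


def pearson_shared(a: Dict[Hashable, float], b: Dict[Hashable, float]) -> float:
    """Pearson correlation of two region -> methylation maps over shared regions."""
    keys = sorted(set(a) & set(b))
    if len(keys) < 2:
        raise ValueError("need at least two shared regions")
    xs = [a[k] for k in keys]
    ys = [b[k] for k in keys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y)
```

Restricting the calculation to shared regions keeps bins that are missing from one dataset from being treated as unmethylated.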

To further explore the impact of blood contamination on the methylation status of kidney marrow, we compared the distribution of methylation patterns at the regional level for each dataset. From the distribution of regional methylation proportion, the average 5mC methylation proportion of KMB1 and KMB2 were 74.2% and 76.3%, respectively, which were lower than that of the KM group (78.2%) at 100Kb bin level (Fig. 2D). Additionally, the average methylation proportion of KMB1 (approximately 61.7%) is lower than that of KM (approximately 65%) at promoter level (Fig. 2E). Considering the potential influence of read length and read coverage differences among samples on methylation proportion, we calculated the 5mC methylation proportion and read length for each read and subsequently computed the correlation coefficients between these two variables. Our findings revealed that the coefficients were less than 0.2 across all samples, indicating no significant association between read length and 5mC methylation proportion (Fig. S3). Concurrently, we equalized the yield of KM1, KM2, and KMB2 to match that of KMB1 and compared the distribution of regional methylation patterns among the four datasets with equal coverage. Our results demonstrated a pattern similar to that observed when using total reads (Fig. S4). The observation suggests that the KMB samples may have been subject to greater variability in methylation patterns, potentially due to the presence of blood contamination.

Methylation pattern in eleven different tissues from WGBS

As is widely recognized, each tissue possesses its own distinctive methylation pattern. To assess the methylation patterns among different tissues, especially the blood and kidney, we compared CpG site methylation using the zebrafish tissue datasets from the NCBI Gene Expression Omnibus, which include whole-genome bisulfite sequencing (WGBS) results for eleven adult zebrafish tissues²⁵. The liftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver) was used to convert the original base-modification profiling data from the GRCz10 genome to the GRCz11 genome, which is necessary to ensure consistency and compatibility of the data with the latest version of the genome¹⁴. Despite the exclusion of a few positions due to differences between the two references, the average overall 5mC methylation proportion was found to be similar to that of the original GRCz10 genome data (Fig. 3A).

As illustrated in Fig. 3B, the average genome-wide 5mC proportion across all tissues ranged from 77.4% to 86.5%. Notably, the liver and testis exhibited a proportion exceeding 80%, indicating a potentially important role of DNA methylation in these tissues. At the promoter level, the average 5mC proportion ranged from 63.2% to 69.6% across all tissues (Fig. 3C). In particular, the 5mC methylation proportion for blood was 77.4% at the genome-wide level and 64.7% at the promoter level. In contrast, the kidney exhibited a 5mC methylation proportion of 79.6% at the genome-wide level and 65% at the promoter level. This difference may explain why KMB has a lower 5mC level than KM.

Comparison of WGBS and ONT methods in methylation calling

Next, we compared the efficacy of the two methods, WGBS and ONT, in calling methylation at CpG sites. Because the ONT method reports the 5mC methylation proportion separately for each strand, we counted the C base (in the forward direction) and the adjacent G base (which is equivalent to the C base in the reverse direction) as a single CpG site (see Methods for details). As shown in Fig. 4, the ONT method covered 92.4% of CpG sites with at least one forward and one reverse read across the 25 zebrafish chromosomes (GRCz11), while the WGBS method covered a smaller proportion of CpG sites, ranging from 70% to 80%, with a maximum of 83.4%. This observation suggests that the datasets generated using the ONT platform exhibit a higher rate of genome recovery than those generated using WGBS.
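The strand-aware counting rule can be illustrated with a short Python sketch. It assumes 0-based coordinates, with forward-strand calls reported at the C of each CpG and reverse-strand calls at the adjacent G; the function names and the set-based inputs are hypothetical simplifications.

```python
from typing import List, Set


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide of the reference."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def covered_cpg_fraction(
    seq: str,
    fwd_covered: Set[int],  # positions with at least one forward-strand read
    rev_covered: Set[int],  # positions with at least one reverse-strand read
) -> float:
    """Fraction of CpG sites covered on both strands."""
    sites = cpg_positions(seq)
    if not sites:
        return 0.0
    # The reverse-strand cytosine pairs with the G at position i + 1.
    covered = sum(1 for i in sites if i in fwd_covered and (i + 1) in rev_covered)
    return covered / len(sites)
```

For the toy reference `ACGTCGCG`, which has CpG sites at positions 1, 4 and 6, forward coverage at {1, 4} and reverse coverage at {2, 7} gives one site covered on both strands, so the fraction is 1/3.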

To further validate the reliability of our ONT sequencing data, we compared the genome-wide and promoter CpG methylation levels of kidney marrow with those profiled from kidney tissue. We initially included all the CpG sites reported in each dataset and grouped them into 100Kb bins to estimate the correlation coefficient (R) at both the genome-wide and promoter levels. The resulting R values were 0.75 and 0.81, respectively (Fig. 5A,B). As the WGBS datasets covered fewer CpG sites than ONT datasets, we also selected the shared positions in both the WGBS and ONT data to evaluate the correlation at the whole-genome and promoter levels. As shown in Fig. 5C,D, this approach resulted in an improvement in the correlation coefficient in both the 100Kb bin and promoter levels. To further explore the relationship across different tissues and sequencing platforms, we computed the correlation coefficient of regional methylation using the shared sites between Blood (WGBS), Kidney (WGBS), KMB1 (ONT), and KMB2 (ONT). Our results indicated the two samples from WGBS exhibited the highest R values, while the two KMB samples had the lowest. Additionally, although the R values between Kidney and KMBs were high, they were still lower than that between KM and Kidney (Fig. S5). These observations suggest that the presence of blood in kidney marrow may affect the methylation pattern.

To explore the reduced CpG recovery rate in the WGBS dataset and the discrepancies between ONT and WGBS, we assessed potential sequencing bias in genomic regions with varying GC content.

To address this question, we compared four datasets, kidney sample 1 (WGBS)²⁶, kidney sample 2 (WGBS)²⁶, KM1 (ONT), and KM2 (ONT), which had raw yields of approximately 47.4 Gb, 70.9 Gb, 20.6 Gb, and 20.5 Gb, respectively. Notably, both WGBS datasets exhibited a genome recovery rate of approximately 85%, lower than that of the ONT datasets (approximately 98%), despite their higher data yield (Fig. 5E). This indicates the presence of bias in the WGBS method.

We then examined whether the observed bias in the WGBS method is related to GC content. As illustrated in Fig. 5F, the GC content of most of the zebrafish genome lies between 10% and 60%, so we focused the read coverage analysis on this range. Our results demonstrated that ONT reads were relatively uniformly distributed across the genome, irrespective of GC content (Fig. 5G). Conversely, WGBS reads displayed a distinct bias, characterized by increased read depth in regions with low GC content and decreased read depth in regions with high GC content (Fig. 5G). This GC bias may lead to uneven genomic coverage when characterizing DNA methylation using WGBS reads, particularly in high GC content regions.

Code availability

All software used in this study is publicly available, and the respective parameters are detailed in the Methods section. In cases where no parameters were specified, the default values recommended by the software developers were used. The scripts used in this study are accessible at https://github.com/lrslab/Zebrafish-Multisequencing.

References

1. White, R. M. et al. DHODH modulates transcriptional elongation in the neural crest and melanoma. Nature 471, 518–522 (2011).
2. Chico, T. J., Ingham, P. W. & Crossman, D. C. Modeling cardiovascular disease in the zebrafish. Trends in cardiovascular medicine 18, 150–155 (2008).
3. Rasighaemi, P., Basheer, F., Liongue, C. & Ward, A. C. Zebrafish as a model for leukemia and other hematopoietic disorders. Journal of Hematology & Oncology 8 (2015).
4. Kari, G., Rodeck, U. & Dicker, A. P. Zebrafish: an emerging model system for human disease and drug discovery. Clinical Pharmacology & Therapeutics 82, 70–80 (2007).
5. Lieschke, G. J. & Currie, P. D. Animal models of human disease: zebrafish swim into view. Nature Reviews Genetics 8, 353–367 (2007).
6. Howe, K. et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013).
7. Howe, K. et al. Structure and evolutionary history of a large family of NLR proteins in the zebrafish. Open Biology 6, 160009 (2016).
8. Murayama, E. et al. Tracing Hematopoietic Precursor Migration to Successive Hematopoietic Organs during Zebrafish Development. Immunity 25, 963–975 (2006).
9. Kissa, K. & Herbomel, P. Blood stem cells emerge from aortic endothelium by a novel type of cell transition. Nature 464, 112–115 (2010).
10. Weyand, A. C. & Shavit, J. A. Zebrafish as a model system for the study of hemostasis and thrombosis. Current opinion in hematology 21, 418–422 (2014).
11. Wang, D. et al. Transgenic IDH2R172K and IDH2R140Q zebrafish models recapitulated features of human acute myeloid leukemia. Oncogene 42, 1272–1281 (2023).
12. Ley, T. J. et al. DNMT3A Mutations in Acute Myeloid Leukemia. New England Journal of Medicine 363, 2424–2433 (2010).
13. Esteller, M. Epigenetic gene silencing in cancer: the DNA hypermethylome. Human molecular genetics 16, R50–R59 (2007).
14. Yang, H. et al. A map of cis-regulatory elements and 3D genome structures in zebrafish. Nature 588, 337–343 (2020).
15. Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nature Methods 14, 407–410 (2017).
16. De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669 (2018).
17. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
18. Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11, e0163962 (2016).
19. Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
20. NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/PRJNA930374 (2023).
21. Liu, X. et al. GEO. https://identifiers.org/geo/GSE232842 (2023).
22. Liu, X. Methylation information. figshare https://doi.org/10.6084/m9.figshare.22689700.v2 (2023).
23. Liu, X. Methylation proportion at different regional level. figshare https://doi.org/10.6084/m9.figshare.22785191.v1 (2023).
24. Ni, Y., Liu, X., Simeneh, Z. M., Yang, M. & Li, R. Benchmarking of Nanopore R10.4 and R9.4.1 flow cells in single-cell whole-genome amplification and whole-genome shotgun sequencing. Computational and Structural Biotechnology Journal 21, 2352–2364 (2023).
25. Yang, H. et al. GEO. https://identifiers.org/geo/GSE134055 (2020).
26. NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/SRP213938 (2020).

Acknowledgements

This work was supported by the Hong Kong Branch of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou) (SMSEGL20SC02), Early career scheme (project number 9048204) from the Hong Kong Research Grant Council, Hong Kong Health and Medical Research Fund (project number 9211280) and new Research Initiatives support from City University of Hong Kong (project number 9610497) to R.L, and was supported by Hetao Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone Shenzhen Park Project (HZQB-KCZYZ-2021017) and City University of Hong Kong (project number 9680217 and number 9678223) to M.Y; and was supported by Theme-based Research Scheme (T12-702/20-N), Technology Start-up Support Scheme for Universities (TSSSU/HKU/22/12/1), Incu-Bio Programme, HKSTP, and NSFC/RGC Joint Research Scheme (N_HKU745/21) to A.Y.H.L.

Author information

These authors contributed equally: Xudong Liu, Ying Ni.

Authors and Affiliations

Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China
Xudong Liu, Silin Ye & Runsheng Li
Department of Precision Diagnostic and Therapeutic Technology, City University of Hong Kong Shenzhen Futian Research Institute, Shenzhen, China
Ying Ni & Mengsu Yang
Department of Biomedical Sciences and Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong, China
Ying Ni & Mengsu Yang
Division of Haematology, Department of Medicine, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
Dandan Wang, Xuan Sun & Anskar Yu Hung Leung
Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute of City University of Hong Kong, Shenzhen, China
Mengsu Yang
ZeBlast Technology Limited, Hong Kong Science Park, Hong Kong, China
Xuan Sun & Anskar Yu Hung Leung
Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
Runsheng Li

Contributions

Xudong Liu: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Ying Ni: Methodology, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Dandan Wang: Resources, Methodology, Writing – review & editing. Silin Ye: Methodology, Writing – review & editing. Mengsu Yang: Resources, Writing – review & editing, Supervision, Funding acquisition. Xuan Sun: Resources, Methodology, Writing – review & editing. Anskar Yu Hung Leung: Resources, Methodology, Writing – review & editing, Supervision, Funding acquisition. Runsheng Li: Conceptualization, Methodology, Software, Validation, Resources, Writing – review & editing, Supervision, Funding acquisition.

Corresponding authors

Correspondence to Anskar Yu Hung Leung or Runsheng Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

supplementary table and figures

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, X., Ni, Y., Wang, D. et al. Unraveling the whole genome DNA methylation profile of zebrafish kidney marrow by Oxford Nanopore sequencing. Sci Data 10, 532 (2023). https://doi.org/10.1038/s41597-023-02431-5

Received: 22 May 2023
Accepted: 31 July 2023
Published: 10 August 2023
DOI: https://doi.org/10.1038/s41597-023-02431-5