Abstract
Deep-sea trenches represent an intriguing ecosystem for exploring the survival and evolutionary strategies of microbial communities in highly specialized deep-sea environments. Here, 29 metagenomes were obtained from sediment samples collected from the Kermadec and Diamantina trenches. These samples covered a range of water depths (from 5321 m to 9415 m) and distinct layers within the sediment (0–40 cm in the Kermadec Trench and 0–24 cm in the Diamantina Trench). Through metagenomic binning, we reconstructed 982 metagenome-assembled genomes (MAGs) with completeness >60% and contamination <5%. Of these, 351 MAGs were >90% complete, and an additional 331 were >80% complete. Phylogenomic analysis revealed that nearly all of the MAGs were distantly related to known cultivated isolates. The abundant bacterial MAGs were affiliated with the phyla Proteobacteria, Planctomycetota, Nitrospirota, Acidobacteriota, Actinobacteriota, and Chloroflexota, while the abundant archaeal MAGs were affiliated with Nanoarchaeota and Thermoproteota. These results provide a dataset for further interrogation of the diversity, distribution, and ecological function of microbes inhabiting deep-sea trenches.
Background & Summary
The deep-sea environment is unique: it is characterized by near-total darkness, high hydrostatic pressure, low average temperature, and a low supply of organic matter. Trenches are the deepest oceanic areas, featuring extremely high hydrostatic pressure and isolated hydrotopographic conditions. Because of the funnel-shaped topography of trenches, sediments accumulate preferentially along the trench axis, and their quality and quantity vary with depth. The unique geological features of hadal trenches are known to influence both the structure and the ecological function of microbial communities, which rely primarily on chemosynthesis and heterotrophy to synthesize and consume organic material. Consequently, bacteria and archaea, equipped with robust metabolic capabilities for carbon fixation, nutrient recycling, and the assimilation of sparse substrates derived from the upper ocean, form the cornerstone of this distinctive ecosystem. This microbial activity supports a distinct community in hadal sediments and illustrates the versatility and intricacy of life under extreme conditions. Next-generation sequencing has significantly advanced our understanding of microbiomes in trench sediments, expanding microbial ecology from describing patterns of microbial diversity to unraveling the adaptive survival strategies of microbes in trench environments.
Situated approximately 120 km off the northeastern coast of New Zealand in the Southern Hemisphere, the Kermadec Trench reaches a maximum depth of 10,047 m, making it the fifth deepest trench in the world. It extends approximately 1,500 km in length with a mean width of 60 km. The Kermadec Trench exhibits the V-shaped cross-section characteristic of hadal trenches, reflecting their great depths and steep slopes1. It forms part of the Pacific Ring of Fire, a volatile zone with numerous active volcanoes and frequent seismic activity2. This highly variable environment hosts a wide array of extremophilic microorganisms adapted to high-pressure and low-temperature conditions3,4. These microorganisms are essential for converting inorganic substances into organic compounds through chemolithoautotrophy5. The Diamantina Trench, by contrast, is located in the southeastern Indian Ocean and reaches a depth of nearly 8,047 m. It is approximately 520 km long and 70 km wide, runs in a northeast–southwest orientation, and lies about 1,500 km west of Perth, Australia6. Given the scarcity of genomic information on sediment microorganisms in these trenches, we collected sediment samples from several depths within both trenches between Nov. 2022 and Mar. 2023. Each sediment core was then sliced into 2 cm layers, from the surface down to the bottom of the core (Fig. 1 and Supplementary Table S1).
After DNA extraction from each subsample, the metagenomes were sequenced on an Illumina HiSeq X Ten platform; sequencing details for each sample are given in Supplementary Table S1. Metagenome sequencing statistics and assembly results are listed in Supplementary Table S2. Through metagenomic binning, we reconstructed a total of 982 MAGs with estimated completeness >60% and contamination <5%. Of these, 351 were estimated to be >90% complete, and an additional 331 were >80% complete (Supplementary Table S3). Phylogenomic analysis indicates that this set of draft genomes includes many microbial taxa that lack cultured representatives, such as the bacterial phyla Patescibacteria, Zixibacteria, Marinisomatota, Hydrogenedentota, Armatimonadota, and Eisenbacteria and the archaeal phylum Aenigmatarchaeota. It also includes several potential novel phyla, such as CG03, JACPSX01, JdFR-76, KSB1, SAR324, and SM23-31 (Fig. 2). All unique draft MAGs discussed in this study have been deposited at the National Center for Biotechnology Information (NCBI). We anticipate that this dataset will be a valuable resource for downstream analyses, providing reference data for large-scale comparative genomic studies of key phylogenetic groups worldwide and an opportunity to explore previously uncharacterized microbial metabolic processes. All assembled MAGs are also available from the GitHub repository: https://github.com/ylifc/microbial-Genomes-and-metagenomic-assembly-from-Kermadec-and-Diamantina-trench-sediments. The taxonomic classifications of all MAGs are summarized in Supplementary Table S4. The relative abundance of microbial phyla differed significantly among trenches and sampling depths (Fig. 3).
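The completeness tiers reported above can be expressed as a small classification step over CheckM-style completeness and contamination estimates. The sketch below is illustrative only: it assumes each MAG is given as a (name, completeness, contamination) tuple in percent, and the function names and example values are hypothetical, not taken from the dataset.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def quality_tier(completeness: float, contamination: float) -> Optional[str]:
    """Return the completeness tier of a MAG, or None if it fails the cut-offs."""
    # Dataset-level cut-offs stated in the text: >60% completeness, <5% contamination.
    if completeness <= 60 or contamination >= 5:
        return None
    if completeness > 90:
        return ">90%"
    if completeness > 80:
        return "80-90%"
    return "60-80%"


def tier_counts(mags: Iterable[Tuple[str, float, float]]) -> Counter:
    """Count MAGs per tier from (name, completeness, contamination) tuples."""
    counts: Counter = Counter()
    for _name, completeness, contamination in mags:
        tier = quality_tier(completeness, contamination)
        if tier is not None:
            counts[tier] += 1
    return counts
```

Applied to the full dataset, the counts given above imply 982 − 351 − 331 = 300 MAGs in the 60–80% tier.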
A Venn diagram based on the pan-genomic analysis was constructed to illustrate the metabolic differences of microbial phyla between hadal and non-hadal sediments of the Kermadec and Diamantina trenches (Fig. 4). The results revealed distinct differences in metabolic capabilities and distribution patterns within each major microbial phylum, both between the two trenches and between hadal and non-hadal sediments.
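A two-set Venn comparison of this kind reduces to set operations on functional repertoires. The sketch below assumes, as a simplification, that the repertoire of a phylum in one habitat is the union of the KEGG Orthology (KO) identifiers annotated in its MAGs; the helper names and the KO sets in the test are hypothetical and illustrative, and the authors' exact pan-genomic procedure is not described in the text.

```python
from typing import Dict, Iterable, Set


def pool_functions(mag_kos: Dict[str, Set[str]], mags: Iterable[str]) -> Set[str]:
    """Union of the KO identifiers annotated in a group of MAGs (e.g. one phylum in one habitat)."""
    pooled: Set[str] = set()
    for mag in mags:
        pooled |= mag_kos.get(mag, set())  # in-place union; stored sets are not modified
    return pooled


def venn_partition(hadal: Set[str], non_hadal: Set[str]) -> Dict[str, Set[str]]:
    """Split two function sets into the three regions of a two-set Venn diagram."""
    return {
        "hadal_only": hadal - non_hadal,
        "shared": hadal & non_hadal,
        "non_hadal_only": non_hadal - hadal,
    }
```

The sizes of the three returned sets are the numbers that would be drawn in each region of the diagram.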
Methods
Sample collection
Sediment samples were collected with push cores from the Kermadec and Diamantina trenches during the TS29 cruise of the R/V “Tan Suo Yi Hao” (Nov. 2022–Mar. 2023) (Fig. 1A). The push cores, covering 0 to 50 cm below the seafloor (cmbsf), were retrieved by the manned submersible “Fendouzhe”. The cores were sliced on board into 2 cm subsamples, which were stored at −80 °C until further analysis. At each sampling site, the push cores were recovered from the seafloor to the sea surface in less than 30 minutes.
DNA extraction and metagenome sequencing
A schematic overview of the workflow of this study is shown in Fig. 1B. In total, 29 samples were collected, and genomic DNA was extracted from 2.5 g of each sample using the DNeasy PowerSoil Pro Kit (QIAGEN, USA) according to the manufacturer’s instructions. DNA quantity was measured using the Qubit dsDNA assay kit with a Qubit® 2.0 fluorometer (Life Technologies, USA) and verified by 1% agarose gel electrophoresis. DNA quality was assessed with a NanoDrop instrument (Thermo Fisher Scientific, Waltham, MA, USA). Sequencing libraries were generated using the NEBNext Ultra DNA Library Prep Kit for Illumina (NEB, USA) and sequenced on an Illumina HiSeq X Ten platform (Illumina, USA). Short reads were quality-filtered by removing adapters and barcodes, as well as reads containing poly-N stretches or of low quality, from the raw data using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit), and read quality was checked with FastQC (https://github.com/s-andrews/FastQC).
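The read-filtering criteria described above (poly-N content and low base quality) can be sketched as a per-read test over FASTQ records. This is a minimal illustration, not the authors' FASTX-Toolkit configuration: the cut-offs (at most 10% N bases; at most 50% of bases below Q20) and Phred+33 encoding are assumptions, and the quality rule is roughly analogous to running FASTX-Toolkit's fastq_quality_filter with -q 20 -p 50.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def passes_filter(seq: str, qual: str, max_n_frac: float = 0.1, min_q: int = 20,
                  max_low_q_frac: float = 0.5, offset: int = 33) -> bool:
    """Hypothetical cut-offs: drop reads with >10% N or with >50% of bases below Q20."""
    if not seq or len(seq) != len(qual):
        return False  # empty or malformed record
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # poly-N / ambiguous read
    low_q = sum(1 for ch in qual if ord(ch) - offset < min_q)
    return low_q / len(qual) <= max_low_q_frac
```

Adapter and barcode removal would precede this step and is not shown.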
Genome binning and annotation
The clean reads of each sample were assembled using MEGAHIT (v1.2.9)7 with the parameters ‘--k-min 21 --k-max 141 --k-step 10’ and mapped back to the assemblies using Bowtie2 (v2.4.4)8 with default settings to obtain contig coverage. Genome binning was performed using MetaBAT2 (v2.12.1)9, MaxBin2 (v2.2.7)10, and CONCOCT (v1.1.0)11, with a minimum contig length of 1.5 kb. All reconstructed MAGs were refined using the ‘bin_refinement’ module of MetaWRAP (v1.3)12, and their quality and taxonomy were assessed using CheckM (v1.1.2)13 and GTDB-Tk (v1.6.0)14 with the GTDB reference database (version 220), respectively. MAGs with completeness greater than 60% and contamination less than 10% were used for downstream analysis. Open reading frames (ORFs) were predicted using Prodigal (v2.6.3) with the ‘-p meta’ parameter15 and annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (version of May 1, 2024)16 using DIAMOND (v2.0.4)17, retaining hits with >75% coverage and e-values < 1 × 10^−20.
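The annotation step above keeps DIAMOND hits with >75% coverage and e-values below 1 × 10^−20. The sketch below applies those two cut-offs to a tab-separated DIAMOND report and keeps the best-scoring hit per ORF. It assumes a report with the columns qseqid, sseqid, evalue, bitscore, qcovhsp (i.e. that "coverage" means query coverage per HSP) and best-hit selection by bitscore; both are assumptions, as the authors' exact DIAMOND command is not given, and mapping subject IDs to KEGG orthologs would need a separate lookup table not shown here.

```python
import csv
import io
from typing import Dict, Iterable, Tuple

# Assumed column order, e.g. from: --outfmt 6 qseqid sseqid evalue bitscore qcovhsp
# (illustrative only; not the authors' documented command)
COLUMNS = ("qseqid", "sseqid", "evalue", "bitscore", "qcovhsp")


def best_hits(lines: Iterable[str], min_cov: float = 75.0,
              max_evalue: float = 1e-20) -> Dict[str, Tuple[str, float]]:
    """Return {ORF id: (subject id, bitscore)} for the best hit passing both cut-offs."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        rec = dict(zip(COLUMNS, row))
        cov, evalue = float(rec["qcovhsp"]), float(rec["evalue"])
        if cov <= min_cov or evalue >= max_evalue:
            continue  # fails the coverage (>75%) or e-value (<1e-20) threshold
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        # Keep only the highest-scoring hit per ORF (an assumption about hit selection).
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return best


if __name__ == "__main__":
    report = io.StringIO(
        "orf1\tdbA\t1e-50\t200\t90\n"
        "orf1\tdbB\t1e-60\t250\t80\n"
        "orf2\tdbC\t1e-10\t300\t95\n"
    )
    print(best_hits(report))  # prints {'orf1': ('dbB', 250.0)}
```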
Phylogenetic analysis
The 982 draft genomes (Supplementary Table S3) and 234 reference genome sequences retrieved from NCBI GenBank (Supplementary Table S5) were combined to identify orthologs for phylogenetic analysis using OrthoFinder (default parameters)18. Each ortholog was aligned using MUSCLE (v3.8.31; parameter: -maxiters 16)19, trimmed using trimAl (v1.2; parameter: -automated1)20, and manually inspected. A maximum-likelihood (ML) phylogenomic tree of the concatenated orthologs was constructed using IQ-TREE (v2.1.2)21. The final tree was visualized using iTOL (https://itol.embl.de/).
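Concatenating the trimmed ortholog alignments into a single supermatrix is the step that precedes tree inference above. The sketch below assumes each trimmed alignment is held in memory as a dictionary mapping genome name to aligned sequence, and pads genomes missing from an ortholog with gap characters; the function name is hypothetical, and whether the authors padded missing taxa this way is not stated in the text.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate per-ortholog alignments; genomes absent from an ortholog get gaps."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length sequences")
        width = lengths.pop()
        for taxon in taxa:
            # Pad missing genomes with '-' so every row stays the same length.
            supermatrix[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}
```

The resulting rows can be written out as a FASTA alignment for ML inference.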
Data Records
The raw shotgun metagenome data have been deposited in NCBI’s SRA and BioSample repositories under the umbrella project PRJNA1111327 (https://www.ncbi.nlm.nih.gov/bioproject/1111327)22, which is organized to include the nested BioSample and SRA Experiment accessions. The assembled MAGs associated with this study have also been deposited in the NCBI Sequence Read Archive (https://identifiers.org/ncbi/insdc.sra:SRP508881)23, with individual accession numbers ranging from SRR30568798 to SRR30569739, as listed in Supplementary Table S6. Additional data, including FASTA files containing the contigs of all MAGs and the phylogenetic tree in Newick format, are available through Figshare24 and from the GitHub repository: https://github.com/ylifc/microbial-Genomes-and-metagenomic-assembly-from-Kermadec-and-Diamantina-trench-sediments.
Technical Validation
To minimize the risk of sample contamination, we adhered closely to established protocols for microbiome community analysis25. In brief, samples were prepared under sterile conditions aboard the R/V “Tan Suo Yi Hao”. DNA was extracted in a dedicated laboratory area equipped with a laminar flow hood, using aseptic techniques such as surface sterilization with DNA-OFF, sterilized plasticware, and aerosol-barrier pipette tips. All sample processing was completed within 48 hours. The PowerSoil DNA Isolation Kit was used consistently for all sediment samples from the same batch to ensure uniformity. The quality of the sequencing data was evaluated using fastp (v0.20.1; https://github.com/OpenGene/fastp) with default settings26. For all samples, over 90% of the bases had a quality score of at least Q30, corresponding to a base-call error rate of at most 0.1%. The metagenomic data were then assembled and binned into MAGs following the automated quality-control and assembly protocols described in the Methods. To improve contig assembly, a range of k-mer sizes (21 to 141, in steps of 10) was used in MEGAHIT. After assembly, stringent binning criteria were applied, and the binned sequences were reassembled to maximize the quality of the resulting MAGs. The phylogenomic tree was constructed in IQ-TREE with the ‘-m TEST’ option to select the best-fit substitution model, and its topology was highly consistent with the taxonomic classifications obtained from GTDB-Tk (v1.6.0).
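The Q30 criterion above rests on the Phred scale, where a score Q corresponds to an error probability of 10^(−Q/10), so Q30 means one expected error in 1,000 base calls. The sketch below shows that conversion and computes a base-level Q30 rate (the kind of statistic fastp summarizes) from FASTQ quality strings; the function names are hypothetical and Phred+33 encoding is assumed.

```python
from typing import Iterable


def phred_to_error(q: float) -> float:
    """Phred score Q encodes a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def q30_rate(qualities: Iterable[str], threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases with Phred score >= threshold (Phred+33 encoding assumed)."""
    total = high = 0
    for qual in qualities:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                high += 1
    return high / total if total else 0.0
```

For example, in the quality strings "I?5" and "##", only 'I' (Q40) and '?' (Q30) meet the threshold, giving a Q30 rate of 2/5 = 0.4.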
Usage Notes
Investigating the microorganisms in trench sediments is crucial for understanding microbial ecology and evolution. This study provides comprehensive metagenomic and microbial genomic datasets from both hadal and non-hadal sediments of the Kermadec and Diamantina trenches. These datasets were generated using a next-generation sequencing platform and a commonly used metagenomic analysis pipeline. Sample details, including sampling information and the sequencing platform used, are provided in Supplementary Table S1. Metagenome sequencing statistics and assembly results are presented in Supplementary Table S2, and genome quality metrics in Supplementary Table S3. The taxonomy of all MAGs is summarized in Supplementary Table S4. Information on the 234 reference genomes is compiled in Supplementary Table S5, and the accession numbers of all MAGs analyzed in this study are listed in Supplementary Table S6.
Code availability
The present study did not use custom scripts to generate the dataset. The parameters and versions of all the bioinformatics tools used for the analysis are described in the Methods section.
References
Angel, M. V. Ocean trench conservation. Environmentalist 2, 1–17 (1982).
Ewart, A., Collerson, K., Regelous, M., Wendt, J. & Niu, Y. Geochemical evolution within the Tonga–Kermadec–Lau arc–back-arc systems: the role of varying mantle wedge composition in space and time. Journal of Petrology 39(3), 331–368 (1998).
Du, M. et al. Geology, environment, and life in the deepest part of the world’s oceans. The Innovation 2(2) (2021).
Peoples, L. M. et al. Microbial community diversity within sediments from two geographically separated hadal trenches. Frontiers in Microbiology 10, 347 (2019).
Liu, H. & Jing, H. The Vertical Metabolic Activity and Community Structure of Prokaryotes along Different Water Depths in the Kermadec and Diamantina Trenches. Microorganisms 12(4), 708 (2024).
Stewart, H. A. & Jamieson, A. J. The five deeps: The location and depth of the deepest place in each of the world’s oceans. Earth-Science Reviews 197, 102896 (2019).
Li, D. et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102, 3–11 (2016).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9(4), 357–359 (2012).
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32(4), 605–607 (2016).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nature Methods 11(11), 1144–1146 (2014).
Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6(1), 1–13 (2018).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25(7), 1043–1055 (2015).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36(6), 1925–1927 (2020).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11(1), 1–11 (2010).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28(1), 27–30 (2000).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12(1), 59–60 (2015).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20, 1–14 (2019).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32(5), 1792–1797 (2004).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15), 1972–1973 (2009).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37(5), 1530–1534 (2020).
DOE Joint Genome Institute. Metagenomics of sediment samples from the Kermadec Trench and the Diamantina Trench. Genbank. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1111327 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP508881 (2024).
Metagenome sequencing and 982 microbial genomes from Kermadec and Diamantina Trenches sediments, Figshare, https://doi.org/10.6084/m9.figshare.27003355 (2024).
Eisenhofer, R. et al. Contamination in low microbial biomass microbiome studies: issues and recommendations. Trends in Microbiology 27(2), 105–117 (2019).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34(17), i884–i890 (2018).
Acknowledgements
We thank the pilots of the deep-sea HOV “Shen Hai Yong Shi” and the crew of the R/V “Tan Suo Yi Hao” for their professional service during the TS07 cruise in June 2018. This work was supported by the Innovational Fund for Scientific and Technological Personnel of Hainan Province (KJRC2023C37), the National Key R&D Program of China (2022YFC2805505), the Hainan Province Science and Technology Special Fund (ZDKJ2021036), the Special Research Assistant Grant of the Chinese Academy of Sciences (E4710101), and the Hainan Province Postdoctoral Research Project (E33D0101).
Author information
Contributions
H.L. collected the samples. H.J. contributed reagents, materials, and analysis tools. Y.L. and Y.X. contributed to the data analysis. Y.L. wrote the paper. H.J. and H.L. contributed to the manuscript revision. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, Y., Liu, H., Xiao, Y. et al. Metagenome sequencing and 982 microbial genomes from Kermadec and Diamantina Trenches sediments. Sci Data 11, 1067 (2024). https://doi.org/10.1038/s41597-024-03902-z
DOI: https://doi.org/10.1038/s41597-024-03902-z