Background & Summary

The deep-sea environment is characterized by near-total darkness, high hydrostatic pressure, low average temperature, and a limited supply of organic matter. Trenches are the deepest oceanic areas and feature extremely high hydrostatic pressure and isolated hydrotopographical conditions. Because of the funnel-like structure of trenches, sediments accumulate particularly along the trench axis and vary in quality and quantity with depth. The geological features of hadal trenches are known to influence both the structure and the ecological function of microbial communities, which rely primarily on chemosynthesis and heterotrophy to synthesize and consume organic material. Consequently, bacteria and archaea, equipped with robust metabolic capabilities for carbon fixation, nutrient recycling, and the assimilation of sparse substrates from the upper ocean, form the cornerstone of this distinctive ecosystem. This microbial activity supports a distinct community in hadal sediments and sheds light on the versatility and intricacy of life under extreme conditions. Next-generation sequencing technologies have significantly enhanced our understanding of microbiomes in trench sediments, expanding microbial ecology from examining patterns of microbial diversity to unraveling adaptive survival strategies in trench environments.

Located approximately 120 kilometers off the northeastern coast of New Zealand, the Kermadec Trench reaches a depth of 10,047 meters, making it the fifth deepest trench in the world. It is about 1,500 kilometers long, with a mean width of 60 kilometers. The Kermadec Trench has the ‘V’-shaped cross-section characteristic of hadal trenches, reflecting their great depths and steep slopes [1]. It forms part of the Pacific Ring of Fire, a zone with numerous active volcanoes and frequent seismic activity [2]. This highly varied environment hosts a wide array of extremophilic microorganisms that can survive under high-pressure and low-temperature conditions [3,4]. These microorganisms are essential for the conversion of inorganic substances into organic compounds through chemolithoautotrophy [5]. The Diamantina Trench, located in the southeastern Indian Ocean, extends to nearly 8,047 meters below the ocean surface. It is approximately 520 kilometers long and 70 kilometers wide, runs in a northeast-southwest orientation, and lies about 1,500 kilometers west of Perth, Australia [6]. Given the scarcity of genomic information on sediment microorganisms in these trenches, we collected sediment samples from several depths within both trenches between Nov. 2022 and Mar. 2023. After collection, each sediment sample was sectioned into 2 cm layers, from the surface down to the bottom of the sample (Fig. 1 and Supplementary Table 1).

Fig. 1

Sample collection and data analysis procedure. (A) Locations of the sampling areas in the Kermadec Trench (southwest Pacific Ocean) and the Diamantina Trench (southeast Indian Ocean). (B) Schematic overview of the sampling and metagenomic analysis performed in this study. Each rectangle represents a process, with its description in bold and the methods or tools used in that analysis listed alongside.

After DNA was extracted from each subsample, the metagenomes were sequenced on the Illumina HiSeq X Ten platform (Supplementary Table 1). Metagenome sequencing statistics and assembly results are listed in Supplementary Table 2. Through metagenomic binning, we reconstructed a total of 982 metagenome-assembled genomes (MAGs) with estimated completeness >60% and contamination <5%. Of these MAGs, 351 were estimated to be >90% complete, and an additional 331 were >80% complete (Supplementary Table 3). The phylogenomic analysis suggests that this set of draft genomes includes many microbial taxa that lack cultured representatives, such as Patescibacteria, Zixibacteria, Marinisomatota, Aenigmatarchaeota, Hydrogenedentota, Armatimonadota and Eisenbacteria. There are also some potential new phyla, including CG03, JACPSX01, JdFR-76, KSB1, SAR324 and SM23-31 (Fig. 2).

We have uploaded all unique draft MAGs discussed in this study to the National Center for Biotechnology Information (NCBI). We anticipate that this resource will be valuable for downstream analyses: it provides reference data for comparative genomic studies across key phylogenetic groups worldwide and offers the chance to explore previously uncharacterized microbial metabolic processes. All assembled MAGs are also available on GitHub: https://github.com/ylifc/microbial-Genomes-and-metagenomic-assembly-from-Kermadec-and-Diamantina-trench-sediments.

The taxonomy of all MAGs is summarized in Supplementary Table 4. The relative abundance of microbial phyla differed significantly among the trenches and depths (Fig. 3). A Venn diagram of the pan-genomic analysis was constructed to illustrate the metabolic differences of microbial phyla between hadal and non-hadal sediments from the Kermadec and Diamantina trenches (Fig. 4). These results revealed distinct differences in metabolic capabilities and distribution patterns within each major microbial phylum, both between the two trenches and between hadal and non-hadal sediments.
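The set logic behind a four-way Venn comparison such as Fig. 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pan-genomic workflow: the function name `venn_regions`, the group labels, and the placeholder function identifiers (`ko1`, `ko2`, ...) are all assumptions made for the example.

```python
def venn_regions(groups):
    """Partition items by the exact combination of groups that contain them.

    groups: dict mapping a group label to a set of annotated functions.
    Returns a dict mapping frozenset(labels) -> set(items), i.e. one entry
    per non-empty region of the Venn diagram.
    """
    regions = {}
    all_items = set().union(*groups.values())
    for item in all_items:
        key = frozenset(label for label, members in groups.items() if item in members)
        regions.setdefault(key, set()).add(item)
    return regions


# Hypothetical input: placeholder identifiers, not data from this study.
example = {
    "Kermadec hadal": {"ko1", "ko2", "ko3"},
    "Kermadec non-hadal": {"ko2", "ko3", "ko4"},
    "Diamantina hadal": {"ko1", "ko3"},
    "Diamantina non-hadal": {"ko3", "ko5"},
}

for labels, items in sorted(venn_regions(example).items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(labels)), "->", sorted(items))
```

Each key is the exact combination of groups sharing a function, so functions found only in hadal sediments of both trenches, for example, end up in a region of their own.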

Fig. 2

Phylogenetic diversity of the 982 metagenome-assembled genomes (MAGs) from sediments of the Kermadec and Diamantina trenches (Supplementary Table 3) and reference genomes of bacteria (A) and archaea (B) available in RefSeq (Supplementary Table 5). The number of MAGs in each node is given after the phylum name.

Fig. 3

Relative abundance of the major microbial phyla and supergroups of Proteobacteria in sediments of the Kermadec and Diamantina trenches.

Fig. 4

Venn diagram showing the metabolic differences among microbial groups between hadal and non-hadal depths in sediments of the Diamantina and Kermadec trenches.

Methods

Sample collection

Sediment samples were collected with push cores from the Kermadec and Diamantina trenches during the TS29 cruise of the R/V “Tan Suo Yi Hao” (Nov. 2022 to Mar. 2023) (Fig. 1A). The push cores, spanning 0 to 50 cm below the seafloor (cmbsf), were retrieved using the manned submersible “Fendouzhe”. The cores were sliced into 2 cm subsamples on board and stored at −80 °C until further analysis. Recovery of the push cores from the seafloor to the sea surface took less than 30 minutes at each sampling site.
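As a small worked example of the sectioning scheme, the depth intervals of the 2 cm subsamples can be enumerated as follows. The function name `slice_intervals` is hypothetical, and the 50 cm core length is simply the maximum depth quoted above; actual cores may be shorter.

```python
def slice_intervals(core_length_cm, slice_cm=2):
    """Return (top, bottom) depth intervals in cm below seafloor (cmbsf)."""
    if core_length_cm <= 0 or slice_cm <= 0:
        raise ValueError("core length and slice thickness must be positive")
    intervals = []
    top = 0
    while top < core_length_cm:
        # The last slice is truncated if the core is not a multiple of slice_cm.
        bottom = min(top + slice_cm, core_length_cm)
        intervals.append((top, bottom))
        top = bottom
    return intervals


layers = slice_intervals(50)
print(len(layers), layers[0], layers[-1])  # 25 (0, 2) (48, 50)
```

A full 50 cm core therefore yields 25 subsamples, labeled 0 to 2 cmbsf through 48 to 50 cmbsf.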

DNA extraction and metagenome sequencing

A schematic overview of the workflow used in this study is shown in Fig. 1B. In total, 29 samples were collected, and genomic DNA was extracted from 2.5 g of each sample using the DNeasy PowerSoil Pro Kit (QIAGEN, USA) according to the manufacturer’s instructions. The quantity of extracted DNA was measured using the Qubit dsDNA assay kit with a Qubit® 2.0 fluorometer (Life Technologies, USA) and verified by 1% agarose gel electrophoresis. The quality of extracted DNA was assessed with a Nanodrop instrument (Thermo Fisher Scientific, Waltham, MA, USA). Sequencing libraries were generated using the NEBNext Ultra DNA Library Prep Kit for Illumina (NEB, USA) and sequenced on an Illumina HiSeq X Ten platform (Illumina, USA). Short reads were quality-filtered by removing adapters and barcodes, as well as reads that contained poly-N or were of low quality, from the raw data using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit) and FastQC (https://github.com/s-andrews/FastQC).

Genome binning and annotation

The clean reads of each sample were assembled using MEGAHIT v1.2.9 [7] with the parameters ‘--k-min 21 --k-max 141 --k-step 10’ and mapped back to the assemblies using Bowtie2 v2.4.4 [8] with default settings to obtain the coverage of contigs. Genome binning was performed using MetaBAT2 v2.12.1 [9], MaxBin2 v2.2.7 [10] and CONCOCT v1.1.0 [11], with a contig length cut-off of 1.5 kb. All reconstructed MAGs were refined using the ‘bin_refinement’ module of MetaWRAP v1.3 [12]. Their quality was assessed using CheckM v1.1.2 [13], and their taxonomy was assigned using GTDB-Tk v1.6.0 [14] with the GTDB-Tk reference database (version 220). MAGs with completeness greater than 60% and contamination less than 10% were used for downstream analysis. Open reading frames (ORFs) were predicted using Prodigal v2.6.3 with the ‘-p meta’ parameter [15] and then annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) (version May 1, 2024) [16] using DIAMOND v2.0.4 [17] with >75% coverage and e-values < 1 × 10⁻²⁰.
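The two numerical filters described above (MAG quality and annotation-hit stringency) can be expressed as simple predicates. The sketch below is illustrative only: the function names and the example values are hypothetical, and it does not parse real CheckM or DIAMOND output. The contamination cut-off is a parameter because this paragraph quotes <10% whereas the Background section quotes <5%.

```python
def passes_mag_quality(completeness, contamination,
                       min_completeness=60.0, max_contamination=10.0):
    """True if a MAG is above the completeness and below the contamination cut-off."""
    return completeness > min_completeness and contamination < max_contamination


def completeness_tier(completeness):
    """Assign the completeness tiers used when summarizing the MAG set."""
    if completeness > 90:
        return ">90%"
    if completeness > 80:
        return ">80%"
    return "<=80%"


def passes_annotation(coverage_percent, evalue,
                      min_coverage=75.0, max_evalue=1e-20):
    """True if an alignment hit meets the coverage and e-value thresholds."""
    return coverage_percent > min_coverage and evalue < max_evalue


# Hypothetical MAG records: (name, completeness %, contamination %).
mags = [("bin_a", 95.2, 1.1), ("bin_b", 83.0, 4.7), ("bin_c", 58.9, 0.5)]
kept = [(name, completeness_tier(comp))
        for name, comp, cont in mags if passes_mag_quality(comp, cont)]
print(kept)  # [('bin_a', '>90%'), ('bin_b', '>80%')]
```

Passing `max_contamination=5.0` applies the stricter cut-off instead.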

Phylogenetic analysis

The 982 draft genomes (Supplementary Table 3) and 234 reference genome sequences (Supplementary Table 5) obtained from NCBI GenBank were combined to identify orthologs for phylogenetic analysis using OrthoFinder (default parameters) [18]. Each ortholog was aligned using MUSCLE v3.8.31 (parameters: -maxiters 16) [19], trimmed using trimAl v1.2 (parameters: -automated1) [20] and manually assessed. A maximum-likelihood (ML) phylogenomic tree of the concatenated orthologs was constructed using IQ-TREE v2.1.2 [21]. The final phylogenomic tree was visualized using iTOL (https://itol.embl.de/).
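The concatenation step that precedes tree inference can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: each trimmed ortholog alignment is assumed to be available as a dictionary mapping genome name to aligned sequence, and genomes missing from an ortholog are padded with gaps. The function name `concatenate_alignments` is hypothetical.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-ortholog alignments into one supermatrix.

    alignments: list of dicts {genome_name: aligned_sequence}; all sequences
    within one dict must have the same length.
    Returns {genome_name: concatenated_sequence}.
    """
    genomes = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in genomes}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must share one length")
        width = lengths.pop()
        for name in genomes:
            # Pad genomes lacking this ortholog with gap characters.
            parts[name].append(aln.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy input with made-up sequences.
toy = [{"MAG_1": "MK-", "MAG_2": "MKL"}, {"MAG_1": "QQ"}]
print(concatenate_alignments(toy))
```

Every genome in the result has the same total length, which is what a concatenated-alignment tree builder expects as input.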

Data Records

The raw shotgun metagenome data have been deposited in NCBI’s SRA and BioSample repositories under umbrella project PRJNA1111327 (https://www.ncbi.nlm.nih.gov/bioproject/1111327) [22], which is organized to include the nested BioSample and SRA Experiment accessions. The assembled MAGs associated with this study have also been deposited in the NCBI Sequence Read Archive (https://identifiers.org/ncbi/insdc.sra:SRP508881) [23], with individual accession numbers ranging from SRR30568798 to SRR30569739, as listed in Supplementary Table 6. Additional data, including the FASTA files containing the contigs of all MAGs and the phylogenetic tree in Newick format, are available through Figshare [24] and at the following GitHub repository: https://github.com/ylifc/microbial-Genomes-and-metagenomic-assembly-from-Kermadec-and-Diamantina-trench-sediments.

Technical Validation

To minimize the chance of sample contamination, we closely followed established protocols for microbial community analysis [25]. In summary, sample preparation was conducted under sterile conditions aboard the R/V “Tan Suo Yi Hao”. DNA extraction was carried out in a dedicated laboratory area equipped with a laminar flow hood, using aseptic techniques such as surface sterilization with DNA-OFF, sterilized plasticware, and aerosol barrier pipette tips. All sample processing was completed within 48 hours. We consistently used the PowerSoil DNA Isolation Kit for sediment samples from the same batch to ensure uniformity. The quality of the sequence data was evaluated using fastp v0.20.1 (https://github.com/OpenGene/fastp) with default settings [26]. For all samples, over 90% of the reads achieved a quality score of Q30, indicating high base-calling accuracy. We then assembled the metagenome data into MAGs following the quality control and assembly protocols described in the Methods. To ensure the integrity of the assembled contigs, multiple k-mer sizes were used during the MEGAHIT assembly (from 21 to 141, in steps of 10). Following assembly, stringent binning standards were applied, and the sequences obtained after binning were re-assembled to improve the quality of the resulting data. The phylogenomic tree was constructed using IQ-TREE with the -m TEST parameter to select the best-fitting model. The resulting phylogenomic tree is highly consistent with the results from GTDB-Tk v1.6.0.
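For readers unfamiliar with the Q30 metric, a Phred score Q corresponds to an error probability of 10^(−Q/10), so Q30 means an expected error rate of 1 in 1,000 base calls. The sketch below computes the fraction of bases at or above Q30 from FASTQ quality strings. It assumes the standard Phred+33 encoding, uses a hypothetical function name (`q30_fraction`), and is not how fastp itself is implemented.

```python
def q30_fraction(quality_strings, offset=33, threshold=30):
    """Fraction of base calls whose Phred score is >= threshold.

    quality_strings: iterable of FASTQ quality lines (Phred+33 by default).
    """
    total = 0
    high = 0
    for line in quality_strings:
        for char in line:
            total += 1
            if ord(char) - offset >= threshold:
                high += 1
    if total == 0:
        raise ValueError("no base calls supplied")
    return high / total


# 'I' encodes Q40, '?' encodes Q30, '>' encodes Q29, '5' encodes Q20.
print(q30_fraction(["II??", ">5"]))  # 4 of 6 bases are >= Q30
```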

Usage Notes

Investigating the microorganisms in trench sediments is crucial for understanding microbial ecology and evolution. This study provides comprehensive metagenomic and microbial genomic datasets from the sediments of the Kermadec and Diamantina trenches, covering both hadal and non-hadal sediments from these trenches. These datasets were acquired using a next-generation sequencing platform and a commonly used metagenomic analysis pipeline. Detailed information about the samples, including the sampling information and the sequencing platform used, is provided in Supplementary Table 1. Metagenome sequencing statistics and assembly results are presented in Supplementary Table 2, while genome quality metrics are outlined in Supplementary Table 3. The taxonomy of all MAGs (metagenome-assembled genomes) is summarized in Supplementary Table 4. Information on the 234 reference genome sequences is compiled in Supplementary Table 5, and the accession numbers for all MAGs analyzed in this study are listed in Supplementary Table 6.