Background & Summary

Metagenomics has emerged as a fundamental approach in marine environmental studies, deciphering the intricacies and diversity of microbial communities in the oceans and their environmental interplay1,2,3,4. This technology has facilitated the discovery of hitherto unknown microbes5,6,7, genes8,9,10, and metabolic pathways11,12, thereby considerably enriching our understanding of marine biodiversity and ecosystem function. Additionally, metagenomic techniques present opportunities for new biotechnological discoveries, such as enzyme development for industry and the identification of bioactive compound sources13,14,15. The continuous evolution of metagenomic tools can substantially augment our comprehension of Earth’s ecosystems, necessitating the generation and efficient exploitation of purpose-aligned metagenomic data.

The East Sea, which is also referred to as the Sea of Japan, is a semi-enclosed marginal sea situated in the western Pacific Ocean, and is colloquially termed a “miniature ocean” owing to its resemblance to the global oceans16. One of the prominent features of the East Sea is the Tsushima Warm Current (TWC), which originates from the south and is a major driver in shaping the region’s oceanic circulation by modulating water temperature, salinity, and nutrient dispersion. The substantial nutrient influx via these currents, in tandem with coastal upwelling17,18, results in high primary productivity, especially associated with the annual cycle of spring (April-June) and autumn (October-November) phytoplankton blooms19,20,21. Such recurrent biogeochemical fluctuations necessitate regular assessments to understand their impact on the microbial ecology of the East Sea, as shown by studies in the North Sea22.

Numerous long-term studies of the East Sea have been undertaken through programs including the Circulation Research in the East Asian Marginal Sea (CREAMS), CTD station operations23,24 or satellite colour measurements20,25. To date, however, no study appears to have focused specifically on the monthly variation of microbial community structure in the East Sea, a variation that would reflect seasonal changes in environmental parameters such as chlorophyll a concentration. Previous microbial investigations in the East Sea have primarily concentrated on deep-sea sediments or methane hydrate-containing sediments26,27,28,29. In addition, one metagenomic study has assessed the influence of environmental determinants on the spatial distribution of pelagic bacteria in the East Sea, although it was limited to a bimonthly scale during the summer and winter months30.

In this study, we present a one-year (January to December 2009) monthly metagenomic dataset derived from the East Sea’s coastal waters. Seawater samples were filtered using a 0.2 µm pore-size membrane and subsequently cryopreserved at −80 °C until DNA extraction, followed by sequencing via the Illumina HiSeq platform. The physicochemical characteristics of the water samples were concurrently measured to infer the environmental factors influencing the microbial community. The schematic diagram illustrating the methodology used for generating this dataset is presented in Fig. 1.

Fig. 1

Schematic diagram of the processing of seawater samples.

Our sampling locations are of particular scientific interest because numerous bacterial strains belonging to major marine bacterial clades, including SAR11, oligotrophic marine gammaproteobacteria (e.g., SAR92, OM60), OM43, and SAR116, have been successfully isolated from them (refs. 31,32,33). This monthly metagenomic repository can therefore serve as a resource for investigating prokaryotic assemblages of temperate coastal seas through both culture-dependent and culture-independent methodologies. Furthermore, metagenomic analyses may reveal previously uncultured microbial species and suggest potential cultivation strategies. Such a detailed view of the microbial community of this “miniature ocean” holds promise for fostering a deeper understanding of global marine ecosystems.

Methods

Sampling process

Seawater samples were collected monthly off the coast of the East Sea, near Sokcho, Korea, throughout 2009. Sampling was carried out approximately 8 km from Dongmyeong Port (Fig. 2 and Table 1), with the exact location of the sampling stations varying slightly with weather conditions. Approximately 10 litres of surface seawater were collected from a depth of 10 m using a Niskin sampler (General Oceanics, Inc., USA) and transported to the laboratory in an ice-cooled box. For DNA extraction, 6 litres of each sample (six replicates of 1 litre each) were filtered through 0.2 μm pore-size polyethersulfone membrane filters (47 mm in diameter, Supor®, Pall, USA). Additionally, 1 litre was filtered through a 47 mm GF/F glass-fiber filter (Whatman, USA) for chlorophyll a analysis. All filters were stored at −80 °C until further processing. The residual volume was filtered through a 0.45 μm pore-size cellulose ester membrane filter (Advantec, Japan), aliquoted into 50 ml conical tubes (Falcon, USA), and preserved at −80 °C for later analysis of environmental parameters, including dissolved inorganic ions (ammonium, nitrite + nitrate, phosphate, and silicate). Temperature and salinity of the water samples were measured onboard using a YSI 30 (YSI Inc., USA). Total cell counts were obtained by epifluorescence microscopy (Nikon 80i, Nikon, Japan), enumerating DAPI-stained cells (Table 2).

Fig. 2

Map of the sampling stations. Sampling stations of each month are indicated as blue dots with three-letter abbreviations of months. Inset at the upper left shows the approximate location of the sampling stations in the East Sea.

Table 1 Data availability of the metagenomic sequences from the East Sea, South Korea.
Table 2 Physicochemical parameters of seawater samples collected monthly.

Biogeochemical analyses

Chlorophyll a was extracted from GF/F glass-fiber filters using 90% aqueous acetone (v/v) at 4 °C overnight. The extraction solution was centrifuged for 10 min at 2,000 rpm, and the supernatants were analysed via a fluorometer (10 AU, Turner Designs, USA). Concentrations of inorganic nutrients, including NO₂⁻, NO₃⁻, NH₄⁺, PO₄³⁻, and SiO₂, were determined employing a QuAAtro Microflow Analyzer (SEAL Analytical, UK). The values obtained are graphically represented in Fig. 3 and tabulated in Table 2.

Fig. 3

Physicochemical parameters of seawater samples collected monthly.

DNA extraction and metagenome sequencing

DNA was extracted from membrane filters using a protocol based on manual cell lysis, followed by purification with the DNeasy Blood & Tissue Kit (Qiagen, Germany). The membranes were placed inside a 5 mL tube with the sample-filtered side facing inward. After the addition of 1 mL of cell lysis buffer (20 mM EDTA, 50 mM Tris, 400 mM NaCl, and 0.75 M sucrose) and 5 μL of lysozyme solution (10 mg mL−1 in 10 mM Tris-Cl (pH 8.0)), the tubes were incubated for 30 min at 37 °C in a horizontal orientation with rotation at 5 rpm in a hybridization oven. Proteinase K (final concentration 0.2 mg mL−1) and sodium dodecyl sulfate (final concentration 1%) were then added, and the tubes were further incubated at 55 °C overnight with rotation in a hybridization oven. After incubation, RNase A (200 μg mL−1), 1 mL of AL buffer (DNeasy Blood & Tissue Kit, Qiagen), and 70% ethanol were sequentially added to the tubes with appropriate incubation times. From the step of transferring the lysis mixture to the DNeasy Mini spin column onward, the manufacturer’s instructions for the DNeasy Blood & Tissue Kit were followed. The quality and quantity of the extracted DNA were assessed by electrophoresis on a 1% agarose gel, a Nanodrop ND-1000 (Thermo Fisher Scientific, USA), and a Qubit 2.0 Fluorometer (Life Technologies, USA) with the Qubit® dsDNA Assay Kit. Metagenome sequencing was performed at Theragen Etex Inc. (Suwon, Korea). TruSeq library preparation kits with default library linkers and adaptors were used to generate sequencing libraries. The libraries were sequenced on an Illumina HiSeq 2500 platform, producing 250 bp paired-end reads.

Taxonomic profiling of metagenomic reads

Raw sequence reads were cleaned by adapter removal and quality trimming using BBDuk (v39.01) with the following parameters: ktrim=r, k=23, mink=11, hdist=1, tpe, tbo, ftm=5, qtrim=rl, trimq=10, minlen=100. Subsequently, the taxonomic profiling of these metagenomic reads was performed against a customized GTDB database (R207) generated by Struo2 (ref. 34) (http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release207/). Taxonomic classification and species abundance estimation were performed using Kraken2 (v2.1.3) and Bracken (v2.7) (ref. 35). The output report files were organized using Pavian (ref. 36) (https://fbreitwieser.shinyapps.io/pavian/). Finally, the resulting species abundance information was visualized using the R package ‘tidyverse’.
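The read-processing steps described above can be sketched as a small Python helper that only assembles the three command lines and does not run them. The BBDuk options are the ones listed in the text; the adapter file name (`adapters.fa`), the output naming scheme, the thread count, and the Bracken read length and taxonomic level (`-r`, `-l S`) are assumptions made for illustration and are not settings reported in this study.

```python
import shlex


def build_commands(sample, r1, r2, kraken_db, read_len,
                   adapters="adapters.fa", threads=8, outdir="profiling"):
    """Assemble BBDuk, Kraken2 and Bracken command lines for one sample.

    Nothing is executed here; the lists can be passed to subprocess.run().
    `read_len` must match a k-mer distribution file present in the database.
    """
    clean1 = f"{outdir}/{sample}_clean_R1.fastq.gz"   # hypothetical naming
    clean2 = f"{outdir}/{sample}_clean_R2.fastq.gz"
    report = f"{outdir}/{sample}.kreport"
    kraken_out = f"{outdir}/{sample}.kraken"
    bracken_out = f"{outdir}/{sample}.bracken"

    # Adapter removal and quality trimming (parameters as given in the text).
    bbduk = ["bbduk.sh", f"in1={r1}", f"in2={r2}",
             f"out1={clean1}", f"out2={clean2}", f"ref={adapters}",
             "ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo",
             "ftm=5", "qtrim=rl", "trimq=10", "minlen=100"]

    # Read-level taxonomic classification against the GTDB-based database.
    kraken2 = ["kraken2", "--db", kraken_db, "--paired",
               "--threads", str(threads),
               "--report", report, "--output", kraken_out,
               clean1, clean2]

    # Species-level abundance re-estimation from the Kraken2 report.
    bracken = ["bracken", "-d", kraken_db, "-i", report,
               "-o", bracken_out, "-r", str(read_len), "-l", "S"]

    return [bbduk, kraken2, bracken]


# Example with placeholder file names and a placeholder read length.
for cmd in build_commands("jan", "jan_R1.fastq.gz", "jan_R2.fastq.gz",
                          "gtdb_r207", read_len=150):
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact invocation for each of the 12 monthly samples before anything is run.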

Data Records

This project has been deposited at DDBJ/ENA/GenBank under the SRP accession No. SRP396155 (ref. 37). The Sequence Read Archive (SRA) accession numbers associated with the metagenomes are available in Table 1.

Technical Validation

The assessment of quality scores for the raw reads of the 12 metagenomes was performed using FastQC (v0.10.1). The results show that ~91.88% and ~79.60% of the bases have quality scores of ≥20 and ≥30, respectively, indicating that sequencing was performed successfully (Fig. 4). The distribution of per-read quality scores across the 12 metagenomes was similar, further indicating no quality issues (Fig. 4). Consistent with the characteristics of the Illumina sequencing technology, the forward reads exhibited higher quality compared to the reverse reads (Fig. 4). A succinct taxonomic profiling analysis was then conducted to ascertain the suitability of the generated data for subsequent metagenomic analysis (Fig. 5). The taxonomic composition revealed a prominent dominance of Pelagibacterales (26.0%; a median value over a 12-month period), followed by Flavobacteriales (12.6%), SAR86 (7.7%), Pseudomonadales (5.5%), and Rhodobacterales (5.3%). These taxa are typically known for their prevalence in the ocean1,4. This observed pattern also aligns with the microbial community structure derived from culture-independent investigations conducted on a seawater sample collected from the same research station31.
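As a rough illustration of how the fractions of bases with quality scores of ≥20 and ≥30 can be computed, the sketch below tallies Phred scores directly from FASTQ quality lines. It assumes Phred+33 encoding and four-line FASTQ records; it is not how FastQC itself works, and the toy reads are invented for the example rather than taken from the dataset.

```python
import io


def fastq_quality_lines(handle):
    """Yield the quality string (every 4th line) of a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        if i % 4 == 3:
            yield line.rstrip("\n")


def q_fractions(quality_lines, offset=33, thresholds=(20, 30)):
    """Return {threshold: fraction of bases with Phred score >= threshold}."""
    total = 0
    passed = {t: 0 for t in thresholds}
    for line in quality_lines:
        for ch in line.strip():
            q = ord(ch) - offset          # decode Phred+33 character
            total += 1
            for t in thresholds:
                if q >= t:
                    passed[t] += 1
    if total == 0:
        raise ValueError("no bases found")
    return {t: passed[t] / total for t in thresholds}


# Toy input: 'I' = Q40, '5' = Q20, '+' = Q10.
toy = io.StringIO("@r1\nACGT\n+\nII55\n@r2\nACGT\n+\n++II\n")
print(q_fractions(fastq_quality_lines(toy)))  # {20: 0.75, 30: 0.5}
```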

Fig. 4

Distribution of per sequence mean quality scores of the 12 metagenomes.

Fig. 5

Taxonomic profiling of metagenomic reads retrieved from the East Sea. The 34 orders shown are those that ranked in the top 30 by either maximum or median abundance among the 12 metagenome samples.
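The selection rule in this caption explains why more than 30 orders appear: an order is kept if it ranks in the top 30 by maximum or by median across samples, and the union of the two rankings can exceed 30. A minimal sketch of that rule follows, assuming abundances are held in a dictionary keyed by order name; the function name and the toy values are hypothetical and do not come from the dataset.

```python
from statistics import median


def top_orders(abundance, n=30):
    """Return orders ranked in the top n by maximum OR by median abundance.

    abundance: dict mapping order name -> list of per-sample abundances.
    """
    by_max = sorted(abundance, key=lambda o: max(abundance[o]), reverse=True)[:n]
    by_med = sorted(abundance, key=lambda o: median(abundance[o]), reverse=True)[:n]
    # The union may contain more than n orders when the two rankings differ.
    return sorted(set(by_max) | set(by_med))


# Toy example with n=2: B has a single large peak, C is steadily moderate.
toy = {
    "order_A": [10, 10, 10],
    "order_B": [0, 0, 30],
    "order_C": [5, 5, 5],
    "order_D": [1, 1, 1],
}
print(top_orders(toy, n=2))  # ['order_A', 'order_B', 'order_C']
```

Ranking by both statistics keeps taxa that are consistently abundant as well as those that peak in only a few months, which suits a seasonal time series.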