Metagenomic data from surface seawater of the east coast of South Korea

The East Sea, also known as the Sea of Japan, is a marginal sea located in the western Pacific Ocean, displaying comparable characteristics to Earth’s oceans, thereby meriting its recognition as a “miniature ocean”. The East Sea exhibits a range of annually recurring biogeochemical features in accordance with seasonal fluctuations, such as phytoplankton blooms during the spring and autumn seasons. Despite ongoing monitoring efforts focused on water quality and physicochemical parameters, prokaryotic assemblages in the East Sea, including their seasonal variations, have rarely been investigated. Here, we present a monthly time-series metagenomic dataset spanning a one-year period in 2009, obtained from surface (10 m) seawater samples collected off the coast of the East Sea. The dataset encompasses 12 metagenomes, totalling 195 Gbp, with 14.73–22.52 Gbp per sample. This dataset is accompanied by concurrently measured physicochemical parameters. We anticipate that these metagenomes will facilitate extensive investigations aimed at elucidating various aspects of the marine microbial ecosystems in the East Sea.


Background & Summary
Metagenomics has emerged as a fundamental approach in marine environmental studies, deciphering the intricacies and diversity of microbial communities in the oceans and their environmental interplay [1][2][3][4]. This technology has facilitated the discovery of hitherto unknown microbes [5][6][7], genes [8][9][10], and metabolic pathways [11][12], thereby considerably enriching our understanding of marine biodiversity and ecosystem function. Additionally, metagenomic techniques present opportunities for new biotechnological discoveries, such as enzyme development for industry and the identification of bioactive compound sources [13][14][15]. The continuous evolution of metagenomic tools can substantially augment our comprehension of Earth's ecosystems, necessitating the generation and efficient exploitation of purpose-aligned metagenomic data.
The East Sea, which is also referred to as the Sea of Japan, is a semi-enclosed marginal sea situated in the western Pacific Ocean, and is colloquially termed a "miniature ocean" owing to its resemblance to the global oceans [16]. One of the prominent features of the East Sea is the Tsushima Warm Current (TWC), which originates from the south and is a major driver in shaping the region's oceanic circulation by modulating water temperature, salinity, and nutrient dispersion. The substantial nutrient influx via these currents, in tandem with coastal upwelling [17][18], results in high primary productivity, especially associated with the annual cycle of spring (April-June) and autumn (October-November) phytoplankton blooms [19][20][21]. Such recurrent biogeochemical fluctuations necessitate regular assessments to understand their impact on the microbial ecology of the East Sea, as shown by studies in the North Sea [22].
Numerous long-term studies from the East Sea have been undertaken via programs including the Circulation Research in the East Asian Marginal Sea (CREAMS), CTD station operations [23][24], or satellite colour measurements [20][25]. To date, however, there appears to be a gap in studies specifically focusing on the monthly variation of microbial community structure in the East Sea, which is indicative of seasonal changes in environmental parameters such as chlorophyll a concentration. Previous microbial investigations in the East Sea have primarily concentrated on deep-sea sediments or methane hydrate-containing sediments [26][27][28][29]. In addition, a metagenomic study has been undertaken to assess the influence of environmental determinants on the spatial distribution of pelagic bacteria in the East Sea, albeit limited to a bimonthly scale during the summer and winter months [30].
In this study, we present a one-year (January to December 2009) monthly metagenomic dataset derived from the East Sea's coastal waters. Seawater samples were filtered using a 0.2 µm pore-size membrane and subsequently cryopreserved at −80 °C until DNA extraction, followed by sequencing via the Illumina HiSeq platform. The physicochemical characteristics of the water samples were concurrently measured to infer the environmental factors influencing the microbial community. The schematic diagram illustrating the methodology used for generating this dataset is presented in Fig. 1.
Our selected sampling locations are of considerable scientific importance because numerous bacterial strains belonging to major marine bacterial clades, including SAR11, oligotrophic marine gammaproteobacteria (e.g., SAR92, OM60), OM43, and SAR116, have been successfully isolated from them [31][32][33]. Therefore, this monthly metagenomic repository can serve as an asset for investigating prokaryotic assemblages of temperate coastal seas through both culture-dependent and culture-independent methodologies. Furthermore, metagenomic analyses may unveil previously uncultured microbial species and suggest potential cultivation strategies. Such insight into the microbial community of this "miniature ocean" holds promise for fostering a deeper understanding of global marine ecosystems.

Methods
Sampling process. Seawater samples were collected monthly off the coast of the East Sea, in proximity to Sokcho, Korea, throughout the year 2009. Sampling was executed approximately 8 km from Dongmyeong Port (Fig. 2 and Table 1), with the exact location of sampling stations subject to minor variations due to atmospheric conditions. Approximately 10 litres of surface seawater were collected from a depth of 10 m using a Niskin sampler (General Oceanics, Inc., USA) and were transported to the laboratory in an ice-cooled box. The water samples (6 litres; six replicates of 1 litre each) were filtered through 0.2 μm pore-size polyethersulfone membrane filters (47 mm in diameter, Supor®, Pall, USA) for DNA extraction. Additionally, 1 litre was filtered using a 47 mm GF/F glass-fiber filter (Whatman, USA) to analyse chlorophyll a. All filters were stored at −80 °C until further processing. The residual volume was filtered through a 0.45 μm pore-size cellulose ester membrane filter (Advantec, Japan), aliquoted into 50 mL conical tubes (Falcon, USA), and preserved at −80 °C for later analysis of environmental parameters including dissolved inorganic ions (ammonium, nitrite + nitrate, phosphate, and silicate). Temperature and salinity of the water samples were measured onboard using a YSI 30 (YSI Inc., USA). Total cell counts were obtained by enumerating DAPI-stained cells using epifluorescence microscopy (Nikon 80i, Nikon, Japan) (Table 2).

Biogeochemical analyses.
Chlorophyll a was extracted from GF/F glass-fiber filters using 90% aqueous acetone (v/v) at 4 °C overnight. The extraction solution was centrifuged for 10 min at 2,000 rpm, and the supernatants were analysed via a fluorometer (10 AU, Turner Designs, USA). Concentrations of inorganic nutrients, including NO₂⁻, NO₃⁻, NH₄⁺, PO₄³⁻, and SiO₂, were determined employing a QuAAtro Microflow Analyzer (SEAL Analytical, UK). The values obtained are graphically represented in Fig. 3 and tabulated in Table 2.
DNA extraction and metagenome sequencing. DNA was extracted from membrane filters using a protocol based on manual cell lysis, followed by purification with the DNeasy Blood & Tissue Kit (Qiagen, Germany). The membranes were placed inside a 5 mL tube with the sample-filtered side facing inward. After the addition of 1 mL of cell lysis buffer (20 mM EDTA, 50 mM Tris, 400 mM NaCl, and 0.75 M sucrose) and 5 μL of lysozyme solution (10 mg mL⁻¹ in 10 mM Tris-Cl (pH 8.0)), the tubes were incubated for 30 min at 37 °C in a horizontal orientation with rotation at 5 rpm in a hybridization oven. Following this, proteinase K at a final concentration of 0.2 mg mL⁻¹ and sodium dodecyl sulfate at a final concentration of 1% were introduced, and the tubes were further incubated at 55 °C overnight with rotation in a hybridization oven. After incubation, RNase A (200 μg mL⁻¹), 1 mL of AL buffer (DNeasy Blood & Tissue Kit, Qiagen), and 70% ethanol were sequentially added to the tubes with appropriate incubation times. The manufacturer's instructions for the DNeasy Blood & Tissue Kit were followed from the step of transferring the lysis mixture to the DNeasy Mini spin column. The quality and quantity of the extracted DNA were assessed using electrophoresis on a 1% agarose gel, a NanoDrop ND-1000 (Thermo Fisher Scientific, USA), and a Qubit 2.0 Fluorometer (Life Technologies, USA) with the Qubit® dsDNA Assay Kit. Metagenome sequencing was performed at Theragen Etex Inc. (Suwon, Korea). TruSeq library preparation kits with default library linkers and adaptors were used to generate sequencing libraries. The libraries were sequenced on an Illumina HiSeq 2500 platform, producing 250 bp paired-end reads.

Technical Validation
The quality scores of the raw reads of the 12 metagenomes were assessed using FastQC (v0.10.1). The results show that ~91.88% and ~79.60% of the bases have quality scores of ≥20 and ≥30, respectively, indicating that sequencing was performed successfully (Fig. 4). The distribution of per-read quality scores was similar across the 12 metagenomes, further indicating no quality issues (Fig. 4). Consistent with the characteristics of Illumina sequencing technology, the forward reads exhibited higher quality than the reverse reads (Fig. 4). A brief taxonomic profiling analysis was then conducted to ascertain the suitability of the generated data for subsequent metagenomic analysis (Fig. 5). The taxonomic composition revealed a prominent dominance of Pelagibacterales (26.0%; median value over the 12-month period), followed by Flavobacteriales (12.6%), SAR86 (7.7%), Pseudomonadales (5.5%), and Rhodobacterales (5.3%). These taxa are typically known for their prevalence in the ocean [1][4]. This observed pattern also aligns with the microbial community structure derived from culture-independent investigations conducted on a seawater sample collected from the same research station [31].

Fig. 1 Schematic diagram of the processing of seawater samples.

Fig. 2 Map of the sampling stations. Sampling stations of each month are indicated as blue dots with three-letter abbreviations of months. Inset at the upper left shows the approximate location of the sampling stations in the East Sea.

Fig. 4 Distribution of per sequence mean quality scores of the 12 metagenomes.
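
The Q20 and Q30 percentages reported in the Technical Validation can, in principle, be recomputed directly from the raw reads. The following Python sketch is not the authors' procedure (they used FastQC); it assumes Phred+33 quality encoding and uncompressed four-line FASTQ records, and the function names are hypothetical.

```python
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, Sequence

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def iter_quality_strings(handle: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of each four-line FASTQ record."""
    for index, line in enumerate(handle):
        if index % 4 == 3:
            yield line.rstrip("\n")


def quality_fractions(handle: Iterable[str],
                      thresholds: Sequence[int] = (20, 30)) -> Dict[int, float]:
    """Return the fraction of bases whose Phred score is >= each threshold."""
    counts: Counter = Counter()
    for quality in iter_quality_strings(handle):
        counts.update(quality)  # tally every quality character
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases found in input")
    return {
        threshold: sum(n for char, n in counts.items()
                       if ord(char) - PHRED_OFFSET >= threshold) / total
        for threshold in thresholds
    }


if __name__ == "__main__":
    # Toy input (not from the dataset): 'I' = Q40, '5' = Q20, '+' = Q10.
    toy = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n55++\n")
    print(quality_fractions(toy))
```

For real data, an open text handle can be passed in place of the `StringIO` object, for example `gzip.open(path, "rt")` for compressed files. Counting quality characters once and thresholding afterwards keeps memory use independent of file size.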

Table 1. Data availability of the metagenomic sequences from the East Sea, South Korea.

Table 2. Physicochemical parameters of seawater samples collected monthly. *Numeric values less than or equal to zero are denoted as 'n/d'.
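
Readers who reprocess nutrient measurements may want to apply the same 'n/d' convention noted for Table 2. The Python sketch below is one possible way to do so; the function names, the number of decimal places, and the example values are hypothetical and are not taken from Table 2.

```python
from typing import Dict


def mask_non_positive(value: float, digits: int = 2) -> str:
    """Format a measurement, reporting values <= 0 as 'n/d'."""
    if value <= 0:
        return "n/d"
    return f"{value:.{digits}f}"


def mask_series(series: Dict[str, float], digits: int = 2) -> Dict[str, str]:
    """Apply the 'n/d' rule to a month -> value mapping."""
    return {month: mask_non_positive(value, digits)
            for month, value in series.items()}


if __name__ == "__main__":
    # Hypothetical readings for illustration only.
    example = {"Jan": 0.31, "Feb": -0.02, "Mar": 0.0}
    print(mask_series(example))
```

Keeping the raw numeric values alongside the masked strings is advisable, since the masked form cannot be used in downstream statistics.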
