Background & Summary

In pelagic marine ecosystems, a major proportion of primary production is transformed by heterotrophic microbes on the scale of hours to days1,2,3. Much of this rapidly processed primary production is made available as dissolved organic carbon (DOC), released from phytoplankton by direct excretion or through trophic interactions. Bacterial uptake of DOC produces living biomass and regenerates inorganic nutrients1.

Monterey Bay is a coastal ecosystem with high primary production driven by frequent upwelling of nutrient-rich waters4,5. Intense phytoplankton blooms can develop6, and these vary dynamically in taxonomic composition. In 2016, the fall phytoplankton bloom (Fig. 1) was dominated by an unusually intense bloom of the dinoflagellate Akashiwo sanguinea7. A. sanguinea cell abundances reached 4.9 × 10⁶ cells L⁻¹, and chlorophyll a concentrations reached 57 µg L⁻¹ (at ~6 m depth) over the period spanning mid-September to mid-November. Here we present metagenomic, metatranscriptomic, and iTag data on the bacterial and archaeal communities during a 52-day period spanning this unusual plankton bloom in Monterey Bay (Table 1). iTag data on the eukaryotic microbial communities provide contextual information on the dynamics of the bloom-forming phytoplankton and grazer communities.

Fig. 1

MODIS satellite image from September 26, 2016 of the phytoplankton bloom occurring in Monterey Bay and extending into the Pacific. The red dot marks the sampling station M0, located at 36.835° N, 121.901° W.

Table 1 Sequence datasets from the fall bloom in Monterey Bay, CA, 2016.

Methods

Sampling protocol

From September 26 through November 16, 2016, microbial cells were collected at Monterey Bay station M0 for sequence analysis. A moored autonomous robotic instrument, the Environmental Sample Processor (ESP)8, filtered up to 1 L of seawater sequentially through two stacked filters: a 5.0 µm pore-size polyvinylidene fluoride filter on top, which captured primarily eukaryotic microbes, and a 0.22 µm pore-size polyvinylidene fluoride filter beneath it, which captured primarily bacteria and archaea (Table 1). The samples were collected between 5 and 7 m depth at approximately 10 a.m. PST. Samples were collected daily except during October 7 – November 1, when the ESP was offline for repair. ESP filters were preserved with RNAlater at the completion of sample collection and stored in the instrument until retrieval. While the ESP was offline, grab samples were collected by Niskin bottle at the M0 mooring site 2–3 times per week, using the same sampling time, depth, and filters as for the ESP samples, except that the filters were flash frozen in liquid nitrogen.
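The sampling calendar described above can be checked with a few lines of date arithmetic. The following is a minimal sketch, not part of the original workflow; it assumes that both endpoints of each date range are inclusive, and the function name `sampling_mode` is hypothetical.

```python
from datetime import date, timedelta

# Dates taken from the sampling protocol; treating both endpoints of each
# range as inclusive is an assumption of this sketch.
START, END = date(2016, 9, 26), date(2016, 11, 16)
ESP_OFFLINE_START, ESP_OFFLINE_END = date(2016, 10, 7), date(2016, 11, 1)


def sampling_mode(day: date) -> str:
    """Return which collection method applied on a given calendar day."""
    if not START <= day <= END:
        return "outside study period"
    if ESP_OFFLINE_START <= day <= ESP_OFFLINE_END:
        return "Niskin grab (2-3 per week)"
    return "ESP (daily)"


# Enumerate every calendar day of the study period.
study_days = [START + timedelta(days=i) for i in range((END - START).days + 1)]
esp_days = [d for d in study_days if sampling_mode(d) == "ESP (daily)"]

print(len(study_days))  # length of the study period in days
print(len(esp_days))    # days on which the ESP was scheduled to sample
```

Under the inclusive-endpoint assumption, the study period spans 52 days, consistent with the 52-day period stated in the Background, of which 26 fall outside the ESP repair window.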

Environmental data (temperature, salinity, chlorophyll a fluorescence, light transmission, and dissolved O2 concentrations) were collected by a CTD instrument mounted with the ESP9. Additional environmental data were obtained from grab samples collected at the M0 mooring 2–3 times per week [total dimethylsulfoniopropionate concentration (DMSPt), dissolved DMSP concentration (DMSPd), DMSPd consumption rate, chlorophyll a, and cell counts by flow cytometry and microscopy]10,11 (Online-only Table 1).

DNA/RNA extraction

Total community nucleic acids for metagenome, metatranscriptome, and 16S iTag sequencing were obtained from the same 0.22 µm filter (0.22–5.0 μm size fraction) using the ZymoBIOMICS DNA/RNA Miniprep Kit (Zymo Research, Irvine CA). At the start of extraction, internal standards were added to the lysis buffer tube (see Usage Notes), and the filter was cut into small pieces under sterile conditions to facilitate extraction. RNA was treated with in-column DNase I according to the manufacturer’s instructions. After elution, RNA was treated with Turbo DNase (Invitrogen, Carlsbad CA) and concentrated using Zymo RNA Clean and Concentrator (Zymo Research). Except for a few cases of low nucleic acid yields, duplicate filters were sequenced for each sample date.

DNA for 18S rRNA gene sequencing was extracted from the 5.0 μm filters using the DNeasy Plant Mini Kit (Qiagen, Venlo NL) with modifications. Filters were cut into pieces and added into a prepared lysis tube containing ~200 µl of 1:1 mixed 0.1 and 0.5 mm zirconia/silica beads (Biospec Products, Bartlesville, OK) and 400 μl Buffer AP1. Internal standards (see Usage Notes) were added just prior to extraction. Three freeze-thaw cycles were performed using liquid nitrogen and a 65 °C water bath. Following freeze-thaw, bead beating was performed for 10 min, followed by centrifugation at 8,000 rpm for 10 min to remove foam. Following centrifugation, 45 μl of proteinase K (>600 mAU/ml, Qiagen) was added to each tube and incubated at 55 °C for 90 min with gentle rotation. Filters were then removed and the tubes incubated at 55 °C for 1 h. The DNeasy kit protocol was resumed at the RNase A addition step. Final DNA was eluted in 75 μl of diluted (1:10) TE buffer.

Metagenome sequencing and analysis

Sequence data were generated at the Department of Energy (DOE) Joint Genome Institute (JGI) using Illumina technology. Libraries were constructed and sequenced using the HiSeq-2000 1TB platform (2 × 151 bp). For assembly, reads were trimmed and screened, and those with no mate pair were removed using BFC (v r181)12. Remaining reads were assembled using SPAdes (v 3.11.1)13. The read set was mapped to the final assembly and coverage information generated using BBMap (v 37.78)14 with default parameters. Assembled metagenomes were processed through the DOE JGI Metagenome Annotation Pipeline (MAP) and loaded into the Integrated Microbial Genomes and Microbiomes (IMG/M) platform15,16.

Metatranscriptome sequencing and analysis

Sequence data were generated at the DOE JGI using Illumina technology. Libraries were constructed and sequenced using the HiSeq-2500 1TB platform (2 × 151 bp). Metatranscriptome reads were assembled using MEGAHIT (v 1.1.2)17. Cleaned reads were mapped to the assembly using BBMap.

16S and 18S iTag sequencing and analysis

Sequence data were generated at the DOE JGI using Illumina technology. Primers 515FB18 (5′-GTGYCAGCMGCCGCGGTAA) and 806RB19 (5′-GGACTACNVGGGTWTCTAAT) were used for 16S rRNA gene amplification, and primers 565F (5′-CCAGCASCYGCGGTAATTCC) and 948R (5′-ACTTTCGTTCTTGATYRA) were used for 18S rRNA gene amplification20. Libraries were constructed and sequenced using the Illumina MiSeq platform (2 × 301 bp). Contaminant reads were removed using the kmer filter in BBDuk, and filtered reads were processed by the JGI iTagger (v 2.2) pipeline (https://bitbucket.org/berkeleylab/jgi_itagger).
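The primers listed above contain IUPAC ambiguity codes (Y, M, N, V, W, S, R), so each one represents a family of concrete sequences. As an illustration only, the sketch below expands a degenerate primer into a regular expression and strips it from the 5′ end of a read. It requires an exact match and is therefore a simplification, not a reproduction of BBDuk or Cutadapt behavior; the helper names and the example read are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "515FB": "GTGYCAGCMGCCGCGGTAA",
    "806RB": "GGACTACNVGGGTWTCTAAT",
    "565F": "CCAGCASCYGCGGTAATTCC",
    "948R": "ACTTTCGTTCTTGATYRA",
}


def primer_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer))


def strip_primer(read: str, primer: str) -> str:
    """Remove the primer from the 5' end of a read if it matches exactly."""
    match = primer_regex(primer).match(read)
    return read[match.end():] if match else read


# Hypothetical read: one concrete instance of 515FB followed by an insert.
read = "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGTGC"
print(strip_primer(read, PRIMERS["515FB"]))
```

Reads that do not begin with a matching primer are returned unchanged; production tools additionally tolerate mismatches and partial primer occurrences.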

To generate an overview of microbial community composition during the bloom (Figs 2 and 3), the 16S and 18S rRNA amplicon libraries (raw reads) were primer-trimmed using Cutadapt (v 1.18)21 and analyzed using QIIME2 (v 2018.6)22. The DADA223 plugin in QIIME2 was used to generate exact sequence variants (ESVs), which were classified using the QIIME2 naive Bayes classifier trained on 99% Operational Taxonomic Units (OTUs) from the SILVA rRNA database (v 132)24 after trimming to the primer region. Taxonomic bar plots were generated using QIIME2.
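The conversion from classified read counts to the relative abundances shown in taxonomic bar plots can be sketched as follows. This is an illustrative stand-in for the QIIME2 bar plot step, not the code used here; ranking taxa by their summed relative abundance across samples is an assumption, and the counts below are invented for demonstration.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert per-sample taxon read counts to fractions summing to 1."""
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        result[sample] = {t: n / total for t, n in taxa.items()} if total else {}
    return result


def top_taxa(rel_abund, n):
    """Rank taxa by total relative abundance across samples; keep the top n."""
    sums = defaultdict(float)
    for taxa in rel_abund.values():
        for taxon, frac in taxa.items():
            sums[taxon] += frac
    # Sort by descending abundance, breaking ties alphabetically.
    ranked = sorted(sums, key=lambda t: (-sums[t], t))
    return ranked[:n]


# Hypothetical counts for illustration only; not data from this study.
counts = {
    "sample_A": {"Flavobacteriaceae": 70, "Rhodobacteraceae": 20, "Cryomorphaceae": 10},
    "sample_B": {"Flavobacteriaceae": 20, "Rhodobacteraceae": 50, "Cryomorphaceae": 30},
}
rel = relative_abundance(counts)
print(top_taxa(rel, 2))
```

Normalizing within each sample before ranking prevents deeply sequenced libraries from dominating the choice of displayed taxa.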

Fig. 2

Relative abundance of bacterial and archaeal taxa at Monterey Bay station M0 during the fall of 2016. Samples were collected at ~6 m, and 16S rRNA genes were amplified from community DNA in the 0.22 to 5.0 µm size range. Taxonomic groups were defined based on exact sequence variants using DADA2 in QIIME 2 (https://qiime2.org) and assigned taxonomy with the naive Bayes q2-feature-classifier trained using the 515F/806R region from 99% operational taxonomic units from the SILVA 132 16S rRNA database. Assignments of the 30 most abundant taxa are given at the family level.

Fig. 3

Relative abundance of eukaryotic taxa at Monterey Bay station M0 during the fall of 2016. Samples were collected at ~6 m, and 18S rRNA genes were amplified from community DNA in the >5.0 µm size range. Taxonomic groups were defined based on exact sequence variants using DADA2 in QIIME 2 (https://qiime2.org) and assigned taxonomy with the naive Bayes q2-feature-classifier trained using the 565F/948R region from 99% operational taxonomic units from the SILVA 132 18S rRNA database.

Data Records

The raw Illumina sequencing reads for metagenomes, metatranscriptomes, and 16S rRNA and 18S rRNA iTags are available from the NCBI Sequence Read Archive under 342 separate project IDs (summarized in Online-only Table 2), which we have gathered under a single BioProject umbrella ID25.

Contigs assembled within each individual metagenome and metatranscriptome are available from the JGI Integrated Microbial Genomes portal (Online-only Table 2).

Chemical and biological data associated with each sample are available at the Biological and Chemical Oceanography Data Management Office (BCO-DMO)9,10. Measured parameters include temperature, salinity, depth, light transmission, concentrations of dissolved oxygen and chlorophyll, concentration and consumption rates of DMSP, and cell counts for heterotrophic bacteria, Synechococcus, Akashiwo, and photosynthetic eukaryotes.

Technical Validation

For metagenomic and metatranscriptomic Illumina data, BBDuk (version 37.95; https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) was used to remove contaminants, trim reads that contained adapter sequence, and trim reads where quality dropped to zero. BBDuk was also used to remove reads that contained four or more ‘N’ bases, had an average quality score across the read of <3, or were no longer than 51 bp or 33% of the full read length. Reads mapped with BBMap to masked human, cat, dog and mouse references at >93% identity were separated into a chaff file. Reads aligned to common microbial contaminants were also separated into a chaff file. For metatranscriptomic data, reads containing ribosomal RNA and known JGI spike-in sequences were removed and placed into separate fastq files. The internal DNA and mRNA standards added for quantification purposes at the nucleic acid extraction step (see Usage Notes) were recovered at their expected level (0.5–5.0% of sequences).

For 16S rRNA and 18S rRNA iTag data, BBDuk was used to remove contaminants and trim reads that contained adapter sequence. This program was also used to remove reads that contained one or more ‘N’ bases, had an average quality score across the read of <10, or were no longer than 51 bp or 33% of the full read length. Reads mapped with BBMap to masked human, cat, dog and mouse references at >93% identity or aligned to common microbial contaminants were separated into a chaff file. The 16S and 18S rRNA reads amplified from the internal DNA standards added for quantification purposes (see Usage Notes) were recovered at their expected level (0.5–5.0% of sequences).
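The read-level removal criteria described in this section differ between the shotgun and iTag data only in their thresholds, which the following minimal sketch makes explicit. It is not BBDuk and does not reproduce BBDuk's internal quality averaging; the arithmetic mean of quality scores, the reading of the length rule as "removed if either limit is met", and all class and function names are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FilterParams:
    """Read-level thresholds; field names are hypothetical, values from the text."""
    min_n_to_reject: int                # reject reads with at least this many N bases
    min_mean_quality: float             # reject reads whose mean quality is below this
    max_rejected_length: int = 51       # reject reads this short or shorter (bp)
    rejected_length_frac: float = 0.33  # ...or at most this fraction of full length


SHOTGUN = FilterParams(min_n_to_reject=4, min_mean_quality=3)
ITAG = FilterParams(min_n_to_reject=1, min_mean_quality=10)


def passes_filter(seq: str, quals: Sequence[int], full_length: int,
                  params: FilterParams) -> bool:
    """Return True if a trimmed read survives all three checks."""
    if seq.upper().count("N") >= params.min_n_to_reject:
        return False
    if not quals or sum(quals) / len(quals) < params.min_mean_quality:
        return False
    # A read is too short if it meets either the absolute or fractional limit.
    too_short = max(params.max_rejected_length,
                    params.rejected_length_frac * full_length)
    return len(seq) > too_short


# Hypothetical 100 bp read with a single N and uniform quality 30,
# trimmed from a 151 bp original.
seq = "ACGT" * 24 + "ACGN"
quals = [30] * len(seq)
print(passes_filter(seq, quals, 151, SHOTGUN))
print(passes_filter(seq, quals, 151, ITAG))
```

The example read is retained under the shotgun thresholds but removed under the iTag thresholds, because a single ‘N’ base is sufficient for removal from amplicon data.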

Sequence datasets were checked for consistency with the expected composition of coastal marine microbial communities. Taxonomic assignments of 16S and 18S rRNA ESVs matched those of marine microbes common in coastal areas in general26,27 and in Monterey Bay seawater in particular11 (Figs 2 and 3). Taxonomic assignments of protein-encoding genes from metagenomic datasets were likewise representative of coastal and Monterey Bay microbial communities, and had taxonomic assignments consistent with the iTag datasets.

Usage Notes

Sample processing included the addition of internal standards to allow for calculation of volume-based absolute copy numbers for each gene or transcript type (i.e., counts L⁻¹ rather than % of sequence library)28,29. The DNA standards consisted of genomic DNA from Thermus thermophilus DSM7039 HB829 and Blautia producta strain VPI 4299 (American Type Culture Collection, Manassas, VA). mRNA standards consisted of custom-designed 1006 nt artificial transcripts29. Artificial transcript sequences are available at Addgene Plasmid Repository (https://www.addgene.org; products MTST5 and MTST6). All four standards (two DNA and two mRNA) were added to the 0.22 μm pore size samples at the initiation of nucleic acid extraction. In the case of 18S iTag samples, genomic DNA from Arabidopsis (BioChain Institute, Inc., Newark, CA) and Mus musculus (Millipore Sigma, Burlington MA) was similarly added to the 5.0 μm pore size samples at initiation of extraction. Added amounts of internal standards were estimated at ~1% of final yields of DNA or mRNA based on prior recoveries from similar filters. Actual yields averaged ~2% of reads. The internal standards should be removed from the raw data prior to analysis. Information on how internal standards can be used for volume-based quantification is available elsewhere29,30.
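The general logic of internal-standard quantification can be sketched as follows: the ratio of standard molecules added to standard reads recovered gives a per-read scaling factor, which converts read counts for any gene or transcript into copies per liter of seawater filtered. This is a simplified illustration under stated assumptions, not the procedure of the cited references; the function name, its parameters, and the example values are hypothetical, and gene length normalization is omitted.

```python
def copies_per_liter(gene_reads: float, standard_reads: float,
                     standard_copies_added: float,
                     volume_filtered_l: float) -> float:
    """Scale a read count to absolute copies per liter via an internal standard.

    Assumes that reads from the standard and from the target are recovered
    with equal efficiency, and ignores differences in sequence length.
    """
    if standard_reads <= 0 or volume_filtered_l <= 0:
        raise ValueError("standard_reads and volume_filtered_l must be positive")
    # Number of original molecules represented by each sequenced read.
    copies_per_read = standard_copies_added / standard_reads
    return gene_reads * copies_per_read / volume_filtered_l


# Hypothetical values for illustration only; not data from this study.
estimate = copies_per_liter(
    gene_reads=500,
    standard_reads=2000,
    standard_copies_added=1.0e7,
    volume_filtered_l=1.0,
)
print(estimate)
```

Because the standards are added at the start of extraction, the scaling factor absorbs losses during extraction, library preparation, and sequencing, which is why counts can be expressed per liter rather than as a percentage of the library.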

Environmental data collected in association with the nucleic acid samples are given in Online-only Table 1. Available data differ between sampling dates depending on whether sampling was done by the ESP, from Niskin grab samples, or both.