## Introduction

Uptake of carbon by phytoplankton and its exchange between organisms in the marine environment play a critical role in the carbon cycle, with primary production in the world’s oceans representing half of global net primary production [1]. A small proportion of surface production [2] is transported to depth via sinking particles, subduction, and other processes, transferring carbon to deep ocean pools with a residence time of millennia or longer [3]. Considerable interest has consequently focused on exploring relationships between surface microbial community structure, marine production [4,5,6], and particulate carbon export [7,8,9].

Net community production (NCP) rates reflect the productivity and metabolic balance of the surface ocean microbial community. Expressed as the difference between gross primary production and community respiration, NCP rates estimate the mixed-layer production of organic carbon available for export [10,11,12,13]. NCP patterns have been well-examined independently, as have patterns of surface ocean community structure. However, direct comparison of relationships between ecology and productivity remains an emerging line of investigation.

The Western North Atlantic is a region of interest for unraveling potential links between community structure and productivity. New production across this region is thought to be driven by a variety of physical and biological processes including nitrogen fixation, mesoscale features, seasonal mixing, and allochthonous nutrient inputs [14,15,16].

A dominant feature of the Western North Atlantic is the Sargasso Sea, an oligotrophic region typical of other subtropical gyre systems [17]. While spring and winter phytoplankton blooms occur following winter mixing of nutrients into the surface layer, the Sargasso Sea in summer exhibits limiting nitrate and phosphate concentrations (N < 50 nmol kg−1, P < 20 nmol kg−1) [18]. Ongoing changes in the biogeochemistry of the Sargasso Sea may impact community composition, carbon export, and nutrient cycling due to increasing stratification [19] and changing nutrient inputs [20]. Records suggest gradual community shifts are underway, with haptophyte populations declining and Synechococcus and dinoflagellate groups increasing in abundance [20, 21]. Globally, oligotrophic subtropical gyres cover some 40% of the planet’s surface [22], and small shifts in microplankton ecology in such regions may have repercussions for biogeochemistry and climate. Large-scale genomics sampling work has suggested that specific key taxa may be important drivers of carbon export in such regions [7].

To the west, the Western North Atlantic is bounded by the North American continental shelf. High rates of production are observed along this coast well into summer [23]. In this region, the shelf, shelf break, shelf slope, and Gulf Stream exert dynamic physical forcings upon resident microplankton, driving variation in community structure and primary production over short transects [24]. Such coastal regions are increasingly being recognized as potentially important carbon sinks [25] and are also predicted to undergo future ecological shifts in response to eutrophication and climate change [26,27,28].

Considering ongoing shifts in microplankton community structure in ecosystems across the Western North Atlantic, evaluating the impact of future community shifts upon primary production and potential carbon export in this region is of great interest. There is thus a need to identify relationships between community composition and NCP. Few regional NCP measurements have been conducted in the Western N. Atlantic to date, with existing NCP data generally coming from time-series measurements [29] or fine-scale studies [30, 31]. Similarly, while community structure at the Bermuda Atlantic Time Series (BATS) has been regularly studied [32, 33], broader rDNA amplicon data surveying the whole region are far sparser.

In this study, we gathered samples for high-throughput 16S and 18S rDNA amplicon sequencing and concurrently conducted high-resolution O2/Ar-based NCP measurements using Equilibrator Inlet Mass Spectrometry (EIMS) [34] along three transects spanning the oligotrophic Sargasso Sea, the Gulf Stream, and the U.S. East Coast. To obtain absolute taxonomic abundances for the sampled communities, we adapted an internal standard approach for 16S and 18S rDNA sequencing to quantitatively characterize community structure [35]. We then assessed trends in whole-community composition and diversity in relation to NCP and evaluated associations between productivity and specific microplankton groups identified in our samples.

## Materials and methods

### Study Area and collection of O2/Ar and ancillary data

Continuous and discrete measurements were collected over a 3 100 km transect in the western North Atlantic aboard the R/V Atlantic Explorer from 3–12 August 2015. The cruise track progressed west from the BATS Station (32.3°N, 64.6°W) to the North Carolina coast, then northeast to ~50 km south of Long Island, New York before returning to Bermuda (Fig. 1). Fourteen CTD casts were conducted during the cruise at 200–400 km intervals. Underway dissolved O2/Ar measurements were collected alongside discrete sampling for chlorophyll and DNA. O2/Ar was measured continuously from the ship’s underway intake using the EIMS method [34]. Details of O2/Ar-derived NCP calculations and assessment of potential vertical O2/Ar fluxes are described in the Supplementary Methods.

### Microbial community sampling and rDNA amplicon sequencing

Samples for rDNA analysis were obtained from 5 m CTD casts and underway samples (Table S2) pumped from a towfish trailing abeam of the vessel at 3–5 m depth. This custom-built towfish, suspended alongside, is trace metal-clean, using plastic tubing and carrying seawater aboard via an air-driven pump. For each sample, one liter was filtered through a 0.22-μm filter (Millipore, Billerica, MA, USA) using a peristaltic pump, preserved with RNAlater (Thermo Fisher, Waltham, MA, USA), and flash-frozen in liquid nitrogen. At stations with high biomass, the volume of filtrate was reduced to 0.2–0.5 l as filters became clogged.

### Internal controls for quantitative sequencing

A quantitative internal standard approach provides information on per-liter abundance of taxa across samples, yielding more meaningful comparisons between taxonomic abundances and biological rate measurements. To quantify rDNA copy numbers l−1, internal genomic standards were added to each sample following [35]. Genomic DNA was obtained from the American Type Culture Collection (ATCC, Manassas, VA, USA) for Thermus thermophilus (ATCC #27634D-5), a thermophilic hot springs bacterium, and Schizosaccharomyces pombe (ATCC #24843D-5), a yeast species. The S. pombe genome contains ~110 copies of the 18S V4 rDNA amplicon [35], while the T. thermophilus genome contains two 16S V4 copies [36].

Given the large range in 18S rDNA copy number across eukaryotic genomes, we determined an appropriate spike of control DNA (0.073 ng) by evaluating the average 18S rDNA concentration in our samples using qPCR with 18S V4 primers. To ensure that such a diluted spike would reliably manifest in sequencing output, we conducted a pilot sequencing run on duplicate filters from this study at the Boston University Microarray Core on an Ion Torrent PGM using a 314 chip (Supplementary Methods).

The Ion Torrent test revealed that a 15.2 ng T. thermophilus genomic DNA spike resulted in T. thermophilus reads comprising an average of 5.3% of all reads, while the addition of 0.679 ng of S. pombe gDNA yielded 0.9% S. pombe reads. Based upon these results, we adjusted the spike amounts to quantities expected to constitute <1% of sequenced reads, adding 0.679 ng of the S. pombe standard and 3.04 ng of the T. thermophilus standard to each sample, both in 50 μl volumes. This corresponded to adding ca. 5 780 000 rDNA copies sample−1 of S. pombe and 2 800 000 rDNA copies sample−1 of T. thermophilus genomic DNA.
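The conversion from a mass of genomic DNA to a number of rDNA copies can be sketched as follows. This is a minimal illustration rather than the calculation used in this study: the genome sizes and the average base-pair mass of 650 g mol−1 are assumptions introduced here, and the function name is hypothetical. Only the spike masses and per-genome rDNA copy numbers come from the text above.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one double-stranded base pair


def rdna_copies_added(mass_ng, genome_size_bp, rdna_copies_per_genome):
    """Estimate the rDNA copies contained in a spike of genomic DNA."""
    # Mass of a single genome in grams
    genome_mass_g = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    # Number of genome equivalents in the spike (ng converted to g)
    genomes = (mass_ng * 1e-9) / genome_mass_g
    return genomes * rdna_copies_per_genome


# Genome sizes are illustrative assumptions, not values reported in this study.
s_pombe_copies = rdna_copies_added(0.679, genome_size_bp=12.6e6,
                                   rdna_copies_per_genome=110)
t_thermophilus_copies = rdna_copies_added(3.04, genome_size_bp=2.1e6,
                                          rdna_copies_per_genome=2)

print(f"S. pombe: {s_pombe_copies:.2e} rDNA copies per sample")
print(f"T. thermophilus: {t_thermophilus_copies:.2e} rDNA copies per sample")
```

Under these assumptions the estimates fall within roughly 5% of the reported values; the exact figures depend on the genome size used.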

### DNA extraction for 16S and 18S rDNA sequencing

We conducted DNA extraction using the Qiagen DNeasy Plant Mini Kit (Qiagen, Germantown, MD, USA) following manufacturer instructions with slight modifications [37], with internal gDNA standards added prior to bead-beating [38]. PCR amplification was performed for 30 cycles using custom 16S V4 primers 515F-Y (5′-GTGYCAGCMGCCGCGGTAA-3′) and 805R (5′-GACTACNVGGGTATCTAAT-3′) and 18S V4 primers F (5′-CCAGCASCYGCGGTAATTCC-3′) and R (5′-ACTTTCGTTCTTGAT-3′), with attached Illumina adapters and barcodes (Supplementary Table 3). These primers are adapted from widely-used universal primers for the amplification of marine prokaryotic [39, 40] and eukaryotic [41] taxa, modified to improve coverage of SAR11 and haptophytes [5, 42]. Primers were each dual-indexed with 6 bp barcodes, using a heterogeneity spacer approach [43, 44]. 16S samples were run at 94 °C for 3 min, followed by 30 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 1 min, with a final extension at 72 °C for 10 min. 18S samples were run identically apart from an annealing temperature of 57 °C. Each 16S PCR reaction (25 μl volume) consisted of 2.5 μl 10 × PCR buffer, 0.5 μl dNTP mix (10 μM each), 1 μl 50 mM MgSO4, 0.5 μl each of forward and reverse primer (10 μM), 0.1 μl Platinum Taq Hi-Fidelity Polymerase (Thermo Fisher, Waltham, MA, USA), and 19.4 μl of sterile water. 18S PCR reaction mixtures were identical except polymerase amounts were doubled (0.2 μl per reaction) to address weak amplification, with a compensating water volume decrease to 19.3 μl. PCR products were purified using the Qiagen QIAquick PCR Purification Kit and quantified using a Qubit 3.0 fluorometer (Life Technologies, Carlsbad, CA, USA). The samples were then pooled at equimolar concentrations and sequenced using the Illumina MiSeq platform (300 bp PE, V3 chemistry) at the Duke Center for Genomic and Computational Biology.

### Analysis pipeline

We obtained 20 450 700 single-end reads from our 25 sequenced samples. Raw single-end reads were trimmed to remove barcodes, assembled, and quality filtered following [43] using pandaseq [45]. 16S amplicon length was 296.7 ± 2.9 bp (mean ± sd), while mean 18S amplicon length was 424.1 ± 4.3 bp. Demultiplexing was performed in QIIME [46]. Five 18S rDNA samples and one 16S rDNA sample contained no reads, the former likely due to a defective forward primer. Primer and other non-biological sequences were subsequently removed using Tagcleaner [47]. We conducted chimera detection and open-reference OTU picking at 97% similarity using the Usearch 6.1 algorithm [48, 49] and Release 123.1 of the SILVA database [50]. OTU clustering was performed using the usearch61 method for de novo OTU picking, and the usearch61_ref method for reference-based OTU picking. Alignment was performed using PyNAST [51] and taxonomy assignment conducted using the RDP classifier 2.2 [52]. Full sequence processing scripts are included in the Supplementary Material. Following taxonomy assignment, internal standard DNA sequences, eukaryotic metazoans, and plastid 16S sequences were filtered out using the QIIME script ‘filter_taxa_from_otu_table.py’. We further discarded one sample due to the low volume of filtrate, leaving 19 eukaryotic and 23 prokaryotic samples.

Sample diversity metrics were calculated for 16S and 18S datasets using the phyloseq package [53] for R 3.4.1 [54]. For alpha diversity analyses only, sample libraries were rarefied to the smallest library size in each set of samples (16S: 98 819; 18S: 33 245). Rarefaction curves begin to level off at the sequencing depths obtained, suggesting that depth was sufficient to represent major patterns of diversity in our samples (Figure S1). Alpha diversity metrics (observed OTUs and Shannon diversity) were calculated using averages from five rarefactions.
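The rarefaction and averaging procedure can be illustrated with a short sketch. This is not the phyloseq code used in this study; it is a plain Python approximation under stated assumptions: subsampling is done without replacement, Shannon diversity uses the natural logarithm, and the function names and example counts are hypothetical.

```python
import math
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a vector of OTU counts."""
    # Expand the counts into one entry per read, labeled by OTU index
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


def shannon(counts):
    """Shannon diversity H' (natural log) of a vector of OTU counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def mean_alpha_diversity(counts, depth, n_rarefactions=5, seed=0):
    """Average observed OTUs and Shannon diversity over repeated rarefactions."""
    rng = random.Random(seed)
    draws = [rarefy(counts, depth, rng) for _ in range(n_rarefactions)]
    mean_observed = sum(observed_otus(d) for d in draws) / n_rarefactions
    mean_shannon = sum(shannon(d) for d in draws) / n_rarefactions
    return mean_observed, mean_shannon


# Hypothetical OTU counts for one sample, rarefied to 500 reads
example_counts = [400, 300, 200, 50, 30, 15, 4, 1]
print(mean_alpha_diversity(example_counts, depth=500))
```

Averaging over several draws reduces the influence of any single random subsample on the reported metrics, particularly for rare OTUs.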

Using our non-rarefied sample libraries, calculation of absolute abundances for each OTU was performed following [38]:

$$\text{rDNA abundance l}^{-1} = \frac{\text{number of OTU reads}}{R \times V} \tag{1}$$

where V is the volume filtered (in liters) and R represents the recovery ratio of internal standards (genomic standards sequenced/molecules of genomic standard added). Output OTU tables are included in the Supplementary Material (Supplementary Tables 5a, 5b).
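As a worked illustration of Eq. 1, the sketch below computes the recovery ratio from internal standard reads and applies it to an OTU. The read counts and function names are hypothetical and chosen only to make the arithmetic easy to follow; the number of standard copies added is taken from the T. thermophilus spike described above.

```python
def recovery_ratio(standard_reads, standard_copies_added):
    """R in Eq. 1: standard sequences recovered per standard molecule added."""
    return standard_reads / standard_copies_added


def rdna_abundance_per_liter(otu_reads, standard_reads,
                             standard_copies_added, volume_l):
    """Absolute rDNA abundance (copies per liter) of one OTU following Eq. 1."""
    r = recovery_ratio(standard_reads, standard_copies_added)
    return otu_reads / (r * volume_l)


# Hypothetical read counts for a single 1 l sample
abundance = rdna_abundance_per_liter(
    otu_reads=12_000,
    standard_reads=2_800,
    standard_copies_added=2_800_000,
    volume_l=1.0,
)
print(f"{abundance:.2e} rDNA copies per liter")
```

Because V appears in the denominator, the same read counts obtained from a smaller filtered volume (for example, 0.5 l at a high-biomass station) translate to a proportionally higher per-liter abundance.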

Further details of downstream statistical analyses including ordination and PLS regression are described in the Supplementary Methods.

## Results and Discussion

### Patterns of O2/Ar-derived NCP

Underway O2/Ar-derived biological oxygen fluxes within the mixed layer ranged from −2.4 to 17.4 mmol O2 m−3 day−1 (MLD-integrated rates of −25 to 190 mmol O2 m−2 day−1) (Fig. 1). We observed initial rates below 0.5 mmol O2 m−3 day−1 in the open ocean, increasing to 1 mmol O2 m−3 day−1 within 400 km of the coast. Turning north, fluxes reached 2–4 mmol O2 m−3 day−1 along the Carolina coast. Values were subsequently variable along the coast.

The highest O2/Ar supersaturation occurred at the expedition’s northernmost extent within a productive phytoplankton bloom, with values peaking at 17.4 mmol O2 m−3 day−1 south of Long Island. Passing this bloom, O2/Ar supersaturation declined again to typically below 1 mmol O2 m−3 day−1 during transit back to Bermuda.

We assessed the potential contribution of eddy diffusive and entrainment fluxes to mixed-layer O2/Ar values as minimal (Supplementary Methods). Consequently, we report all biological O2 fluxes as NCP rates henceforth. Except when comparing our data with integrated figures from other literature, we also report rates throughout this manuscript as volumetric values, more suitable for relation to quantitative taxonomic abundances.

Overall, our high-resolution NCP measurements agree well with previously measured patterns, with low NCP rates observed in the open ocean and higher values over the continental shelf along the Mid-Atlantic Bight. The marked peak in productivity at the northern end of the expedition coincided with high measured nitrogen fixation rates [55] and high Chl a. Peak MLD-integrated productivity, reaching 190 mmol O2 m−2 day−1 (136 mmol C m−2 day−1 assuming a photosynthetic quotient of 1.4 [56]), is similar in magnitude to integrated 14C-derived primary production rates for the Mid-Atlantic Bight spring bloom of up to 158 mmol C m−2 day−1 [57]. Our observed rates are also comparable to summer peak photic-zone primary production of between 145 and 190 mmol C m−2 day−1 modeled for the same area using profile observations [24].
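The oxygen-to-carbon conversion quoted above is a single division by the photosynthetic quotient. The short sketch below reproduces that arithmetic; the function name is hypothetical.

```python
def o2_to_carbon(ncp_o2, photosynthetic_quotient=1.4):
    """Convert an O2-based NCP rate to carbon units (mol O2 per mol C = PQ)."""
    return ncp_o2 / photosynthetic_quotient


# Peak MLD-integrated rate in mmol O2 m-2 day-1
peak_carbon = o2_to_carbon(190.0)
print(round(peak_carbon))  # 136 mmol C m-2 day-1
```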

Our low MLD-integrated open-ocean NCP rates, with a mean of 2.2 mmol O2 m−2 day−1, are also consistent with prior Sargasso Sea O2/Ar-based estimates in September/October of 1.1–3.4 mmol O2 m−2 day−1 [30], as well as modeled summer regional NCP values of 3–4 mmol O2 m−2 day−1 [58].

### Microbial community quantitative and relative abundance patterns

Analysis of rDNA reads yielded 7 843 eukaryotic and 5 604 prokaryotic OTUs across 19 eukaryotic and 23 prokaryotic samples (Supplementary Table 6). 16S and 18S samples contained at least 98 819 and 33 245 reads per sample.

Our observations of 16S and 18S rDNA abundances per liter were within expected bounds. Bacterial 16S rDNA abundances of 1.78 × 10⁸–5.4 × 10⁹ copies l−1 are consistent with bacterial abundances in the Sargasso Sea and Western North Atlantic of 4.0 × 10⁸–2.3 × 10⁹ cells l−1 [59, 60], assuming a typical 16S copy number of 1–15 [61]. Excluding the three highest NCP stations, where the highest 18S rDNA abundances were observed (range of 1.43 × 10⁸–3.14 × 10¹⁰ 18S rDNA genes l−1), the median 18S rDNA abundance was 1.4 × 10⁹ sequences l−1. This is high compared with surface ocean eukaryotic cell densities of 1 × 10⁷ protists l−1 and 1 × 10⁶ phytoplankton l−1 [62], but is likely driven by variation in 18S rDNA copy number. Peak 18S rDNA abundances, while high, are also reasonable. Phaeocystis blooms can reach cell counts of 1.5 × 10⁸ cells l−1 [63], and Aureococcus blooms of 6 × 10⁸ cells l−1 have been observed along the Long Island Coast [64].

Absolute abundances of individual taxa are also consistent with previous observations. For example, the median SAR11 16S rDNA abundance in our samples (Fig. 2b) was 6.2 × 10⁸ rDNA genes l−1 (SAR11 contains one 16S gene copy cell−1), compared with previous measurements of 2 × 10⁸ SAR11 cells l−1 in the Sargasso Sea from fluorescence in-situ hybridization counts [65]. Similarly, we observed a median of 1.9 × 10⁸ Prochlorococcus 16S rDNA genes l−1 in our samples (Fig. 2b), consistent with Western North Atlantic observations of 1 × 10⁸ cells l−1 based on qPCR quantification and flow cytometry [66, 67]. Applications of the internal standard approach for samples collected in the lower Amazon River, the Southern Ocean, as well as in soil samples have also demonstrated good correspondence between the standard-derived abundances and complementary abundance data measured using epifluorescence microscopy, photosynthetic pigments, flow cytometry, phospholipid fatty acid analysis, and substrate-induced respiration approaches [35, 36, 68].

Notably, calculation of absolute taxonomic abundances using internal standards produces patterns distinct from those generated using relative abundance metrics (Fig. 3). This is evident among several abundant 18S and 16S OTUs, including SAR11 clade members, as well as the protist clades Dinoflagellata, Gonyaulacales, Alveolata, and Gymnodiniphycidae. The latter four eukaryotes increase in absolute abundance within the bloom environment, while their relative abundances decrease due to the dominance of Chrysophyceae and Aureococcus anophagefferens within these samples. A similar phenomenon affects SAR11 relative abundances, which are highest between S2–S9 and S20–S25 due to lower 16S rDNA counts for other prokaryotes at those stations. These discrepancies highlight longstanding criticisms of traditionally-used relative abundance metrics [36, 38, 69,70,71,72] and illustrate advantages offered by the internal standard approach. In addition, avoidance of issues caused by compositional community data [73, 74] is valuable when relating taxonomic abundances to microbial or biogeochemical processes like NCP.
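The divergence between absolute and relative abundance patterns can be illustrated with a toy example. All values below are hypothetical and serve only to show the compositional effect: a taxon whose absolute abundance triples can still decline sharply in relative abundance when another taxon blooms.

```python
def relative_abundance(absolute):
    """Convert absolute abundances (copies per liter) to fractions of the total."""
    total = sum(absolute.values())
    return {taxon: n / total for taxon, n in absolute.items()}


# Hypothetical rDNA copies per liter at a non-bloom and a bloom station
non_bloom = {"dinoflagellate": 1.0e8, "bloom_former": 1.0e6, "other": 1.0e8}
bloom = {"dinoflagellate": 3.0e8, "bloom_former": 1.0e10, "other": 2.0e8}

rel_non_bloom = relative_abundance(non_bloom)
rel_bloom = relative_abundance(bloom)

for label, absolute, relative in (("non-bloom", non_bloom, rel_non_bloom),
                                  ("bloom", bloom, rel_bloom)):
    print(f"{label}: absolute = {absolute['dinoflagellate']:.1e} copies per liter, "
          f"relative = {relative['dinoflagellate']:.1%}")
```

In this constructed case, relative abundance alone would suggest a decline of the dinoflagellate group within the bloom, whereas the absolute values show an increase.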

The internal standard approach is nonetheless subject to several assumptions and limitations. A key assumption is that recovery rates of DNA standards are comparable to those of natural sequences within the sample. Particularly given the general implications of primer biases in amplicon work, this premise warrants further investigation. Recovery rate differences due to amplification bias would not alter how the quantitative abundance pattern of a sampled taxon changes across samples, but might result in discrepancies between estimated and actual in-situ abundances. Another important limitation is that quantitative abundance data produced by this method remain sensitive to differences in rDNA copy number across taxa. Although better knowledge of 16S copy number variation across prokaryotes has spurred efforts to correct for copy number differences [75], existing datasets remain limited particularly for eukaryotes, in which rDNA copy number may vary by multiple orders of magnitude. As data collection continues, corrections will likely become more feasible and commonplace.

Among eukaryotes, dinoflagellate lineages dominated all samples except three from the coastal bloom (S14–S16) (Fig. 4a, b). Most of these dinoflagellate sequences corresponded to Syndiniales, alveolate parasites infecting various marine organisms and often detected at high abundances using molecular tools [76,77,78]. While many of these sequences may originate from endosymbionts inside metazoan zooplankton caught on our filters, Syndiniales also infect microzooplankton protists, including ciliates, cercozoa, and other dinoflagellates, and clades targeting both host categories often exhibit a short free-living life stage [76]. Consequently, these sequences may also represent organisms living outside of metazoan hosts, interacting within the marine microbial environment. To a degree, elevated dinoflagellate abundances observed may also reflect high 18S copy numbers, driven by large dinoflagellate genomes [79, 80].

Two samples (S14, S15) associated with the coastal bloom were dominated (>90% relative abundance) by Aureococcus anophagefferens, a pelagophyte that forms coastal “brown tide” harmful algal blooms (HABs) [81], as well as Chrysophyceae (Fig. 4a, b). qPCR surveys have also detected A. anophagefferens at low abundances in pelagic waters, which some suggest indicates an oceanic origin for this nuisance alga [82]. A wide distribution of A. anophagefferens is also supported by our study. We found Aureococcus present in 16 of 19 18S rDNA samples, with a mean of 7.5 × 10⁴ Aureococcus 18S rDNA genes l−1 observed in non-bloom samples. We estimated abundances of 4.4–6.6 × 10⁴ 18S rDNA genes l−1 in open-ocean samples (S24, S25) collected near Bermuda. In comparison, estimated Aureococcus 18S rDNA gene abundances ranged between 1.8 × 10⁸ and 2.0 × 10¹⁰ rDNA genes l−1 within the observed bloom (Fig. 2a).

Sample 16 featured a high population (~20%) of Prymnesiales, primarily Chrysochromulina and Chrysoculter. Chrysochromulina is another nuisance alga, capable of mixotrophy [83] and of forming blooms that can cause fish kills [84]. Other members of Prymnesiales produce harmful hemolytic compounds [85]. Eukaryotic diversity was lower at two bloom stations, S14 and S15 (Supplementary Figure 2), but was similar across our other samples.

Among bacterioplankton, SAR11, SAR86 clade members (appearing as Oceanospirillales in Fig. 4), and Prochlorococcus (Subsection I cyanobacteria) dominated the communities sampled (Fig. 4c, d). The AEGEAN-169 clade of Alphaproteobacteria (Rhodospirillales), as well as MGII Archaea (Thermoplasmatales) also appeared at high proportional abundances. Within the northern bloom, we observed elevated abundances of Planctomycetales, Flavobacteria, Sphingobacteriales, and Order III Cytophagia, with Phycisphaerales appearing at particularly high abundances (>10%) at two stations. Not much is currently known about Phycisphaerales, although they are hypothesized to form associations with macroalgae, with many representatives facultatively anaerobic [86]. In addition, these bloom samples also appear to contain more sequences belonging to less-abundant and “rare” taxa (labeled ‘Other’ in Fig. 4). This phenomenon of elevated abundances of “rare” taxa in bloom events has also been reported elsewhere and may be related to ecological associations with phytoplankton [87, 88]. Bacterial diversity across samples was more uniform than eukaryotic diversity, with prokaryotic Shannon diversity between 4.1–4.5 versus 2.4–5.7 for eukaryotic samples (Supplementary Figure 2).

### Relationships between microbial community structure and NCP

At the community level, we observed a negative relationship between measured NCP and eukaryotic Shannon’s H diversity (Pearson: −0.81, Spearman: −0.76, p ≪ 0.01 for both) (Fig. 5), which was strongly driven by low diversity at two highly productive stations. This relationship does not remain significant with those samples excluded (Pearson: −0.56, Spearman: −0.61, p > 0.01). We observed no relationship between prokaryotic diversity and NCP.
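The sensitivity of such correlations to a few highly productive stations can be checked by recomputing the coefficients with those stations removed. The sketch below implements Pearson and Spearman coefficients in plain Python and applies them to hypothetical NCP and diversity values; the data are invented for illustration and are not the measurements reported here, and significance testing is omitted.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical stations; the last two mimic low-diversity bloom samples
ncp = [0.3, 0.5, 0.8, 1.0, 2.0, 3.0, 12.0, 17.4]
shannon_h = [5.2, 5.5, 5.0, 5.4, 5.1, 5.3, 2.6, 2.4]

print("all stations:", pearson(ncp, shannon_h), spearman(ncp, shannon_h))
print("bloom excluded:", pearson(ncp[:6], shannon_h[:6]),
      spearman(ncp[:6], shannon_h[:6]))
```

In this constructed example the strong negative correlation largely disappears once the two high-NCP stations are dropped, which is the kind of leverage effect described above.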

Recent debate over the nature of the relationship between marine microplankton diversity and productivity has been energetic. Any overall relationship between community diversity and productivity would reflect the relative importance of functional diversity, cooperation, competitive exclusion, selective feeding by grazers, and other factors in governing ecosystem production [6, 89, 90]. Earlier research suggests a peak of phytoplankton diversity at locations with moderate production, with decreasing diversity observed for less-productive and highly productive sites [91, 92]. Dominance of a handful of taxa beyond the control of grazers may explain decreased diversity at high productivity. Increased diversity at moderate productivity rates may reflect selective feeding pressures that allow coexistence between a higher diversity of taxa. Within the Western North Atlantic, our data support the view that the most productive marine communities may exhibit relatively low eukaryotic diversity, a result consistent with meta-analysis and model-based findings that the most productive communities are among the least diverse [6].

Principal coordinate analysis (PCoA) of both prokaryotic and eukaryotic samples demonstrated distinctions between coastal bloom and other samples (Fig. 6), indicating community dissimilarities. Linear regressions of environmental parameters against the first principal coordinate revealed significant correlations between NCP, temperature, latitude, chlorophyll, and PC1 for both our 18S and 16S datasets (Supplementary Table 1), suggesting associations between these parameters and community structure. None of these trends remained significant once data from bloom stations S14, S15, and S16 were excluded, however, indicating that these relationships were driven largely by these samples, which possess distinctive community structure, high Chl and NCP, and low water temperatures compared to all other stations.

### Relationships between NCP and specific microplankton taxa

Partial Least Squares (PLS) regression analysis revealed groups of prokaryotic and eukaryotic taxa associated with high volumetric NCP rates (Supplementary Tables 4a-4f), with these relationships again strongly driven by the bloom community. Eukaryotic taxa associated with NCP included Ochrophyta, Aureococcus anophagefferens, picozoa, cryptophytes, prymnesiophytes, and stramenopiles, such as several uncultured MArine STramenopile (MAST) clades (Fig. 7b).

Many of these protists are commonly associated with phytoplankton bloom conditions. Aureococcus anophagefferens possesses a large genome optimized for uptake of ambient dissolved organic carbon and nitrogen and is adapted for fast growth under turbid, low-light conditions [81, 93]. Members of Chrysophyceae also form blooms and practice phagotrophy, engulfing and processing particulate matter [94]. The high abundance of these two taxa within the bloom implies an environment favoring opportunistic uptake of available particulate and dissolved organic material.

Other eukaryotes strongly associated with high NCP include groups of heterotrophic protists: radiolarians, centrohelids, Labyrinthulomycetes, Ciliophora, as well as flagellates such as Kathablepharidae, Choanomonada, and uncultured marine stramenopiles. Many of these taxa feed upon algae, bacteria, detritus, and other particles. The associations between these taxa and NCP may indicate flourishing of heterotrophs within an environment with enhanced food and prey concentrations.

The bacterial taxa most correlated with NCP corroborate this picture of a productive bloom ecosystem driven by high phytoplankton productivity. Groups of Bacteroidetes, a phylum of heterotrophic bacteria generally observed to thrive in particle-rich bloom environments [95], are strongly associated with NCP. Other bacterial groups primarily exhibiting surface or particle-associated lifestyles, including Verrucomicrobia and Planctomycetes, also display high correlations with NCP. Numerous Gammaproteobacteria taxa, including the fast-growing Vibrionales clade, are also strongly associated with productivity (Fig. 7a).

We acknowledge that our community sampling represents a snapshot of this bloom and cannot capture successional dynamics. 8-day MODIS satellite chlorophyll data measured before and after our cruise suggest that the bloom first appeared in late July one to two weeks before sampling. Our expedition likely encountered the bloom at its temporal midpoint, with the bloom then fading by late August. We further note that taxa associated with this event may not be characteristic of other blooms that might occur throughout the region. Although satellite imagery indicates that a large bloom often recurs annually in the Mid-Atlantic Bight in late summer, additional sampling is required to confirm whether the observed community structure also recurs.

Interestingly, when PLS regression analyses were repeated while excluding bloom stations S14–S16, only a handful of bacterial and eukaryotic taxa remained associated with NCP rates, and the overall strength of associations weakened. Outside of the observed bloom, moderate correlations with productivity were displayed by only a few groups of cryptophytes and bacterioplankton (Fig. 8a, b). These results might indicate that relationships between specific groups of eukaryotic and prokaryotic taxa and NCP in less-productive locations are either undetected by our study or hidden within the uncertainties of the measurements conducted. At the same time, such a finding may suggest that links between productivity and community structure in this region are complex, with the abundance of any given taxon not strongly associated with measured productivity.

The relationships we have detailed between productivity and selected microplankton taxa exhibit interesting discrepancies with findings from similar work conducted in other regions of the global ocean. A TARA Oceans study of associations between bacterial, eukaryotic, and viral taxa, NPP, and particulate carbon export linked some of the same microplankton groups to primary production and to particle export that were productivity-associated within our full dataset, including Vibrio and Alteromonadales among bacteria, as well as dinoflagellates, Labyrinthula, Cercozoa, Picozoa, prymnesiophytes, MAST-3, and Radiolaria [7].

Intriguingly, however, many of these abovementioned associations vanish from our analysis when our dataset is limited to non-bloom station data, whereas Guidi et al. suggest that these same relationships are strong within the oligotrophic ocean. It is also worth noting that several taxa implicated in carbon export by Guidi et al. show no or even negative correlations with NCP in our analysis, such as Synechococcus (Subsection I Cyanobacteria) and Oceanospirillales. Dissimilarities may be attributable to differences in abundance metrics, molecular methods, and the distinctions between in-situ O2/Ar-derived NCP, modeled NPP, and optically-determined particle export (i.e., not all NCP is exported). Further, ecological dynamics encompassed by our regional study may not be extrapolatable to global open-ocean data. Yet our work nevertheless spans a considerable area and range of marine biomes. Rather, our results suggest that outside of the observed bloom, productivity across a relatively wide region is not strongly associated with specific microbial taxa. Such questions warrant further investigation.

## Conclusions

Our results document a dramatic bloom in Mid-Atlantic Bight coastal waters, where the harmful algal bloom-forming taxon Aureococcus, Chrysophyceae, heterotrophic protists, and particle-associated bacterioplankton were strongly associated with this productivity peak. This result emphasizes the potential significance of large coastal blooms to productivity patterns in the Western North Atlantic, and highlights HAB-forming Aureococcus as a taxon of particular interest. We also find few associations between taxonomy and NCP across a wide range of less-productive waters, suggesting that specific microplankton taxa may not be responsible for driving broader patterns of production across much of this region.

Our quantitative amplicon sequencing approach serves as a useful tool in investigating the ocean microbiome and its influence on the marine environment, providing important additional context beyond relative abundance metrics. Coupled with the ever-increasing resolution and capabilities of in-situ biogeochemical methods, adoption of similar study designs can enable more nuanced examination of the role of the microplankton community across diverse ocean environments.

Supplementary information is available at the ISME Journal’s website. Sequences and metadata are available from the NCBI Sequence Read Archive under accession number SRP126177.