Introduction

Small photosynthetic eukaryotes [1] are a key component of the biomass and primary production in marine ecosystems [2,3,4]. In coastal waters, these small photosynthetic eukaryotes are often dominated by green algae from Mamiellophyceae [5], whereas in oceanic waters they are much more diverse, including green algae (Mamiellophyceae, prasinophytes clade VII) but also pelagophytes, chrysophytes, and prymnesiophytes [3, 6, 7]. While for some groups such as Mamiellophyceae, prasinophytes clade VII, or pelagophytes, numerous representatives have been isolated in culture [8], this is not the case for chrysophytes or prymnesiophytes. Initially, these small eukaryotes were thought to be purely phototrophic, but novel trophic modes and associations have recently been discovered. In particular, quite a few appear to be mixotrophs, such as Micromonas in polar regions [9], and a new type of symbiosis has been found between a small haptophyte alga and the nitrogen-fixing cyanobacterium UCYN-A [10].

Nitrogen (N2) fixation, performed by diazotrophs, is an anaerobic process catalyzed by the nitrogenase enzyme, which produces biologically available ammonium (NH4+). It has a strong impact on primary productivity in past [11], present [12], and future oceans [13]. Trichodesmium sp. was initially identified as the most important marine diazotroph [14, 15]. However, recent discoveries have set the ground for a paradigm shift in our understanding of N2 fixing players regarding their diversity, ecological strategies, geographic distribution, and impact on the primary productivity in the global ocean ([16,17,18] and references therein). Two decades ago, an uncultivated unicellular cyanobacterium UCYN-A was discovered by PCR amplification of the nitrogenase gene nifH [19]. The genome of UCYN-A was found to be extremely streamlined, lacking the genes involved in the oxygen-evolving pathway, CO2 fixation [20], and biosynthetic pathways of several amino acids and purines [21]. Such modifications in the genome, resembling the ones observed in cellular organelles, led to the hypothesis of a symbiotic partnership [21]. This was confirmed later by the identification of an association between UCYN-A and an apparently non-calcifying prymnesiophyte microalga [10], closely related to the coccolithophorid Braarudosphaera bigelowii, which bears pentagon-shaped coccoliths (pentaliths), and to the non-calcifying haptophyte Chrysochromulina parkeae [10]. Recent data suggest that this symbiosis could be responsible for a major part of the oceanic nitrogen fixation [22].

UCYN-A appears to be quite diversified with at least six clades (A1 to A6), which form a monophyletic group with Crocosphaera sp. and Cyanothece sp. [16, 23,24,25]. The hosts of the two major clades (A1 and A2) appear to have different cell sizes, to harbor a different number of symbionts [16, 26], and to present distinct oceanic distributions, the former being less widespread but with higher relative abundance peaks than the latter [27].

Many open questions remain concerning these symbioses, including the ecological distribution of the hosts and their importance within the phytoplankton community. The South Atlantic Bight off Brazil is reported to have a high UCYN-A abundance in comparison to other oceanic regions [27] and also appears to be a hot spot of nitrogen fixation [28]. We investigated pico- and nano-phytoplankton composition in this area using flow cytometry sorting [29] combined with parallel high-throughput sequencing of the V4 region of the 18S rRNA gene and of a nifH gene fragment. Our data revealed two major types of small photosynthetic eukaryote communities, one dominated by green algae and the other dominated by the UCYN-A1 host. By analyzing the nifH diversity in the same samples, we recovered sequences from UCYN-A, confirming the importance of this symbiosis in oceanic regions off Brazil.

Material and methods

Sampling

Three transects (TR0, TR1, and TR2) were sampled off the coast of Brazil during the CARBOM V cruise onboard the R/V “Alpha Crucis” in November 2013 (Table 1; Fig. 1). TR0 was sampled mostly at the surface, TR1 at the deep chlorophyll maximum (DCM), and TR2 at three depths. During the TR2 transect, a Trichodesmium sp. bloom was observed at Station 101, where additional samples were taken (Station Bloom). All samples were collected with 12 L Niskin bottles attached to a CTD Teledyne model PS7000M (Teledyne Technologies Inc., CA, USA), except for surface samples from TR0 and Station Bloom, which were collected with a polycarbonate bucket. Samples (1.5 mL) for cell sorting by flow cytometry were collected into cryotubes with 10% DMSO (final concentration), flash-frozen in liquid nitrogen, and stored at −80 °C until analysis at the Station Biologique de Roscoff, France.

Table 1 List of samples analyzed
Fig. 1 Map of stations

Flow cytometry sorting

Samples were sorted using a FACSAria™ flow cytometer (Becton Dickinson, San Jose, CA), equipped with a laser emitting at 488 nm and a 70-µm nozzle. Emitted light was collected through the following set of filters: 488/10 band pass (BP) for side scatter, 576/26 BP for orange fluorescence, and 655 long pass for red fluorescence. Signal detection was triggered on chlorophyll fluorescence. Photosynthetic pico- and nanoeukaryote populations were selected based on light scatter, orange phycoerythrin, and red chlorophyll fluorescence as described previously [29]. Tris–HCl 50 mM, pH 8.0, NaCl 10 mM was used as sheath liquid. Sheath pressure was set at 70 PSI and nozzle frequency was 90,000 Hz with a deflection voltage of 6,000 V. Cells were sorted in purity mode and collected into Eppendorf tubes containing Tris-EDTA lysis buffer (Tris 10 mM, EDTA 1 mM, and 1.2% Triton, final concentration). In addition, sheath fluid samples were collected and analyzed as negative controls in all subsequent steps, including sequencing, to test for contamination in the flow sorting process.

PCR and Illumina run

DNA from sorted cells was extracted by one cycle of freezing and thawing in liquid nitrogen. Two genes, 18S rRNA and nifH, were amplified by nested PCR on the same sorted populations.

For eukaryotes, the V4 region of the 18S rRNA gene was targeted (about 380 bp). The first round of PCR amplification was done using the following 10 µL mix: 5 µL KAPA HiFi HotStart ReadyMix® 2×, 0.3 µM final concentration of primer 63F, 0.3 µM final concentration of primer 1818R (Supplementary Table 2, [30]), 1 µL of DNA and H2O. Thermal conditions were: 95 °C for 5 min, followed by 25 cycles of 98 °C for 20 s, 52 °C for 30 s, 72 °C for 90 s, and a final cycle of 72 °C for 5 min. For the second round: 12.5 µL KAPA HiFi HotStart ReadyMix® 2×, 0.3 µM final concentration of primer V4F_illum, 0.3 µM final concentration of primer V4R_illum (Supplementary Table 2, [31]), 2.5 µL of first round product and H2O for a 25 µL reaction. Thermal conditions were: 95 °C for 3 min, followed by 25 cycles of 98 °C for 20 s, 65 °C for 1 min, 72 °C for 90 s, and a final cycle of 72 °C for 5 min. One sample (127p, Table 1) failed to amplify the 18S gene. Negative controls and sheath fluid yielded either no amplification or faint bands which provided either no Illumina sequences or sequences from groups not considered in our analysis (Rhizaria, Mammalia) except for one Chrysophyceae clade C sequence that was also found in a few samples.

For the partial nifH gene amplification, the PCR reaction contained for the first round: 7.5 µL HotStartTaq® (Qiagen) buffer, 1.5 µL CoralLoad (Qiagen), 0.6 µM final concentration of primer nifH3, 0.6 µM final concentration of primer nifH4 [32], 2 µL of DNA and H2O for a 15 µL reaction. Thermal conditions were: 95 °C for 10 min, followed by 35 cycles of 95 °C for 1 min, 45 °C for 1 min, 72 °C for 1 min, and a final cycle of 72 °C for 10 min. For the second round: 12.5 µL KAPA HiFi HotStart ReadyMix® 2×, 0.3 µM final concentration of primer nifH1_illum, 0.3 µM final concentration of primer nifH2_illum [32], 2.5 µL of first round product and H2O to a 25 µL reaction. Thermal conditions were: 95 °C for 5 min, followed by 25 cycles of 98 °C for 20 s, 54 °C for 1 min, 72 °C for 1 min, and a final cycle of 72 °C for 5 min. Second round PCRs were performed in triplicate and pooled together. Several samples had very low or no amplification of the nifH gene (Table 1). Negative controls did not yield any amplification.

The library was prepared as detailed on the Illumina® support website (http://support.illumina.com/documents/documentation/chemistry_documentation/16s/16s-metagenomic-library-prep-guide-15044223-b.pdf) with a final concentration of 1 nM. PCR products were checked initially by gel electrophoresis and with the Bioanalyzer High Sensitivity DNA® kit. PCR cleanup and fragment size selection were performed with AMPure XP® (Beckman Coulter, Inc.). The Nextera XT Index Kit (Illumina®, FC-131-1002) was used to attach adapters that are complementary to the flow cell of the Illumina MiSeq® and to provide dual, unique indices for each sample. After another round of purification and size-checking validation, we quantified the final products in triplicate with both the Quant-iT™ PicoGreen® dsDNA Assay Kit and qPCR, using the KAPA® Library Quantification Kit. DNA concentration (nM) was computed as:

$$d = 10^6\frac{D}{{660 \times S}}$$

where d is the DNA concentration in nM, D the DNA concentration in ng µL−1, and S the amplified fragment size in bp; 660 is the average molar mass of one base pair of double-stranded DNA (g mol−1).
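The conversion above can be sketched in a few lines of Python. This is only an illustration of the formula, not the authors' script; the function name and the example values (2.0 ng µL−1, 500 bp) are hypothetical.

```python
def dna_concentration_nm(ng_per_ul: float, fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    Implements d = 1e6 * D / (660 * S), where 660 g/mol is the average
    molar mass of one base pair and S is the fragment size in bp.
    """
    if fragment_bp <= 0:
        raise ValueError("fragment size must be positive")
    return 1e6 * ng_per_ul / (660.0 * fragment_bp)


# Hypothetical library: 2.0 ng/uL with a 500 bp amplicon (illustrative values).
example_nm = dna_concentration_nm(2.0, 500)
print(f"{example_nm:.2f} nM")
```

Values computed this way are what allow samples to be brought to an equal molar concentration before pooling.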

The sequencing run was performed on a MiSeq system at the GenoMer platform at the Station Biologique in Roscoff. Sample concentrations were normalized to equal nM DNA concentration, and samples were pooled, denatured, diluted to 6 pM, and loaded onto the MiSeq, along with 5% of denatured PhiX to prevent sequencing errors due to low-diversity libraries. For the sequencing step, we used MiSeq Reagent Kits v2®. The run was designed with Illumina Experiment Manager (https://support.illumina.com/downloads/illumina-experiment-manager-v1-13.html).

Illumina data processing

Sequences were demultiplexed using the Illumina software. For each sample, the quality of sequences was first verified using the FastQC software [33]. Only sequences longer than 200 bp and with at least 75% of the bases with a quality score higher than 20 were kept. After this step, unpaired reads were also removed. The following steps were performed using Mothur version 1.38 [34]. Contigs were assembled from forward and reverse reads for each sample. Contigs that contained ambiguities (N) were discarded. Amplicons for 18S rRNA and nifH contigs were extracted using the command pcr.seqs, allowing for two mismatches between the sequence and the primers (Supplementary Table 3). For each gene, sequences were dereplicated and singletons were removed. Unique sequences for 18S rRNA and nifH were aligned, respectively, to the Silva_123 SEED available on Mothur website (https://www.mothur.org/wiki/Silva_reference_files) and the aligned nifH database [35] available from http://www.css.cornell.edu/faculty/buckley/nifh.htm. Aligned unique sequences were pre-clustered allowing for two differences. Chimeras were removed with the UCHIME algorithm, as implemented in Mothur. At this stage, sequences from control samples (e.g., sheath fluid sort) and sequences corresponding to <10 reads were removed. The latter step makes all subsequent steps much faster to perform while removing a very small number of reads: for example for 18S rRNA, it decreases by sixfold the number of unique sequences while reducing the total number of reads by only 0.4% (Supplementary Table 3). OTUs were built at the 98% similarity level using Mothur average neighbor algorithm. The 274 OTUs for 18S rRNA were taxonomically assigned using the PR2 database [36] version 4.4 available on figshare repository (https://figshare.com/articles/PR2_rRNA_gene_database/3803709/6) and on GitHub (https://github.com/vaulot/pr2_database/releases). 
This version of PR2 incorporates a revision of Haptophyta taxonomy following Edvardsen et al. [37, 38]. 18S rRNA and nifH OTUs were also searched against the GenBank database using the BLAST plugin (November 2016) in Geneious® 10 [39]. One sample (11n, Table 1) was dominated by Metazoa sequences and was not considered further. For 18S rRNA, 16 OTUs (7.9% of total reads) from Opisthokonta and Streptophyta were not considered further. Six groups (Dinophyta, Ochrophyta, Haptophyta, Cryptophyta, Chlorophyta, and Chlorarachniophyceae) were considered as autotrophic. Most analyses focus on these autotrophic groups to the exclusion of non-autotrophic groups such as Syndiniales. Autotrophic 18S rRNA or nifH OTUs were considered as “major” (Supplementary Table 4; Supplementary Table 5) if they contributed at least 20% of reads in at least one sample. This definition seems better than considering only the total contribution of a given OTU, because a specific sample may have one dominant OTU that is not found in any other sample and therefore has a low total abundance over the whole sample set. nifH sequences from major OTUs were aligned to related sequences obtained by BLAST as well as to some reference sequences using MAFFT [40]. Phylogenetic analysis was performed with FastTree [41]. Both programs were used as implemented in Geneious 10 [39] with default parameters.

Mothur script as well as OTU sequences for 18S rRNA and nifH as fasta files are available on GitHub (https://github.com/vaulot/Ribeiro_CARBOM_ISME_2018). Raw FASTQ sequence files were deposited to GenBank under the project number PRJNA377956.

Statistical analyses

All statistical analyses and graphics were performed under R [42] using the following libraries: ggplot2, ggmap, dplyr, treemap, vegan. NMDS analysis of communities based on major 18S rRNA OTUs was performed using Bray–Curtis distance with vegan metaMDS function. Environmental parameters (Supplementary Material: https://github.com/vaulot/Ribeiro_CARBOM_ISME_2018) were mapped using vegan envfit function. Network analysis was conducted using a matrix combining major 18S rRNA and nifH OTUs as rows, samples as columns, and read abundances as values. Correlations between OTUs and pseudo p-values were computed using SparCC [43]. Only correlations larger than 0.25 and with a pseudo p-value ≤ 0.05 were retained. Network between OTUs was constructed using the R package igraph and drawn using Gephi [44]. Scripts are available on GitHub (https://github.com/vaulot/Ribeiro_CARBOM_ISME_2018).
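The edge-selection rule applied to the SparCC output can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the authors worked in R with igraph, the input layout (a dictionary of OTU pairs mapped to a correlation and a pseudo p-value) is hypothetical, and so are the node names and values. "Larger than 0.25" is read literally here, so negative correlations are dropped.

```python
def filter_edges(pairs, min_corr=0.25, max_p=0.05):
    """Keep OTU pairs with correlation > min_corr and pseudo p-value <= max_p.

    pairs: dict mapping (otu_1, otu_2) -> (correlation, pseudo_p_value).
    Returns a list of (otu_1, otu_2, correlation), strongest edge first.
    """
    edges = [
        (a, b, r)
        for (a, b), (r, p) in pairs.items()
        if r > min_corr and p <= max_p
    ]
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Hypothetical correlations and pseudo p-values (not taken from the study).
pairs = {
    ("otu_a", "otu_b"): (0.62, 0.01),   # kept
    ("otu_c", "otu_d"): (0.41, 0.05),   # kept: p-value equals the cutoff
    ("otu_a", "otu_c"): (0.22, 0.01),   # dropped: correlation too weak
    ("otu_b", "otu_d"): (-0.50, 0.01),  # dropped: negative correlation
    ("otu_a", "otu_d"): (0.55, 0.20),   # dropped: not significant
}
for a, b, r in filter_edges(pairs):
    print(a, b, r)
```

The retained triples are the kind of edge list that a graph library can then turn into a network for drawing.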

Results

The CARBOM V cruise took place during the austral spring of 2013 in a region located off South East Brazil and extending from the continental shelf to the 3510 m isobath (Fig. 1). Phytoplankton samples were preserved with DMSO and deep frozen (Table 1). Once back in the laboratory, we sorted small autotrophic eukaryotes using flow cytometry based on scatter and chlorophyll fluorescence properties. For each sorted population, we simultaneously sequenced the V4 region of the 18S rRNA gene and a fragment of the nifH gene using high-throughput (Illumina) sequencing.

18S rRNA

The number of 18S rRNA reads varied between 14,000 and 135,000 depending on the sample (Table 1). Using a 98% similarity level, 258 OTUs were found (excluding opisthokonts and streptophytes), of which 53 could be attributed to heterotrophs, representing <7% of the reads. Within autotrophs, the major groups were Prymnesiophyceae, Mamiellophyceae, Dinophyceae, Bacillariophyta (diatoms), and Chrysophyceae, with minor contributions of Pelagophyceae and Dictyochophyceae (Fig. 2a). All abundant autotrophic OTUs matched existing GenBank sequences with high similarity (from 99 to 100%), but these did not always correspond to sequences from cultures (Supplementary Table 4). The OTU that made the highest contribution (Fig. 2b, Otu002, 12.8% on average over all samples; note that this was not the OTU with the highest number of reads because read number varied between samples) matched an environmental sequence (FJ537341, clone BIOSOPE T60.34) from an uncultivated Braarudosphaeraceae (Prymnesiophyceae) that has been demonstrated to host the nitrogen-fixing cyanobacterium UCYN-A1 [10]. An additional OTU (Otu020, 1.3% on average), with a lower average contribution, matched another Braarudosphaeraceae, B. bigelowii, which is known to host UCYN-A2 [45]. Two OTUs (Otu003 and Otu001) with high average contributions (10.1 and 9.1%, Fig. 2b) matched two Mamiellophyceae, respectively Bathycoccus [46] and the cultivated but as yet undescribed clade B of Ostreococcus [47]. Other abundant OTUs (Fig. 2) were assigned to dinoflagellates (Prorocentrum, Gonyaulax, Karlodinium), Prymnesiophyceae (Chrysochromulina, Syracosphaera, Emiliania), diatoms (Pseudo-nitzschia, Thalassiosira, Cylindrotheca), uncultivated Chrysophyceae (clade G), Pelagophyceae (Pelagomonas), and other Mamiellophyceae (Micromonas). Within heterotrophs, the most abundant OTU (Otu013) was assigned to Syndiniales, known parasites of dinoflagellates [48].
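The distinction made above between the mean relative contribution of an OTU and its share of the total reads can be made concrete with a small Python sketch. The function names and the read counts are hypothetical and serve only to show how unequal sequencing depth between samples can change the ranking.

```python
def mean_relative_contribution(counts):
    """Average, over samples, of each OTU's fraction of reads in that sample."""
    otus = sorted({otu for reads in counts.values() for otu in reads})
    means = {otu: 0.0 for otu in otus}
    for reads in counts.values():
        total = sum(reads.values())
        if total == 0:
            raise ValueError("sample without reads")
        for otu in otus:
            means[otu] += reads.get(otu, 0) / total / len(counts)
    return means


def pooled_share(counts):
    """Each OTU's fraction of all reads once samples are pooled together."""
    totals = {}
    for reads in counts.values():
        for otu, n in reads.items():
            totals[otu] = totals.get(otu, 0) + n
    grand_total = sum(totals.values())
    return {otu: n / grand_total for otu, n in totals.items()}


# Hypothetical counts: sample "c" is sequenced 100 times deeper than "a" or "b".
counts = {
    "a": {"OtuX": 80, "OtuY": 20},
    "b": {"OtuX": 60, "OtuY": 40},
    "c": {"OtuX": 2000, "OtuY": 8000},
}
means = mean_relative_contribution(counts)
pooled = pooled_share(counts)
print(max(means, key=means.get), max(pooled, key=pooled.get))
```

Here OtuX has the highest mean relative contribution, whereas OtuY has the most reads, because the deeply sequenced sample dominates the pooled total.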

Fig. 2 a Mean relative contribution of each class to 18S rRNA sequences for all sorted samples (pico- and nanoeukaryotes). MOCH: Marine Ochrophyta. b Mean relative contribution for major 18S rRNA OTUs clustered at 98% similarity including both autotrophic (colored bars) and heterotrophic (black bars) groups. Major OTUs are defined as those that contribute to more than 20% of reads in at least one sample

The spatial distribution of the autotrophic groups across the different samples (Fig. 3) or along TR2 (Supplementary Figure 1), for which different depths were sampled, revealed that Prymnesiophyceae were in some cases more abundant at the surface and Mamiellophyceae at depth (e.g., Stations 96 or 101). Some classes with overall relatively low abundance dominated some specific samples, as was the case for members of the division Ochrophyta: Pelagophyceae (nanoeukaryotes, St. 106, 5 m: 63% of reads), Chrysophyceae (picoeukaryotes, St. 87, 105 m and St. 114, 80 m, and nanoeukaryotes, St. Bloom, 5 m, from 57 to 84%), and Dictyochophyceae (picoeukaryotes, St. 101, 5 m, 76%).

Fig. 3 Relative read abundance for 18S rRNA autotrophic classes (left) and for cyanobacteria nifH OTUs (right) in each sorted sample for pico- (top) vs. nanoeukaryotes (bottom). Classes or OTUs that did not contribute to at least 20% of sequences in at least one sample were regrouped into “Others”. Samples for which we could not amplify cyanobacteria nifH are displayed as empty (right side)

Samples were clustered based on the contribution of dominant autotrophic OTUs using Bray–Curtis dissimilarity (Fig. 4). Four major types of communities emerged (clusters A to D). Cluster A encompassed the largest number of samples (Fig. 4) and corresponded to communities dominated by Otu002, the uncultivated Braarudosphaeraceae known to host UCYN-A1. Some samples from this cluster also contained Otu003 (Bathycoccus) and some dinoflagellate OTUs. Samples from cluster B (Fig. 4) were dominated by Mamiellophyceae, with Otu001 (Ostreococcus clade B) and Otu003 (Bathycoccus prasinos) occurring together most of the time in quite similar proportions. In a few samples, Ostreococcus was replaced by Micromonas from clade A (Otu025). Samples from cluster C (Fig. 4) were composed of a mix of Prymnesiophyceae including the host of UCYN-A1 but also OTUs related to Chrysochromulina, Syracosphaera, Emiliania as well as Braarudosphaera. Finally, cluster D (Fig. 4) was dominated by a dinoflagellate OTU related to Prorocentrum along with a smaller contribution of Braarudosphaera. The remaining samples all had quite dissimilar communities without any other clear structure.
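For readers unfamiliar with the measure used for this clustering, the Bray–Curtis dissimilarity between two samples is the sum of the absolute differences in OTU abundance divided by the sum of all abundances in both samples. A minimal pure-Python sketch follows; it is not the authors' R code, and the function names and abundance vectors are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length.

    Returns 0.0 for identical samples and 1.0 for samples sharing no OTU.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def pairwise_bray_curtis(samples):
    """Square dissimilarity matrix (list of lists) for a list of samples."""
    return [[bray_curtis(u, v) for v in samples] for u in samples]


# Hypothetical abundances of three OTUs in three samples.
samples = [
    [6, 4, 0],
    [2, 4, 4],
    [0, 0, 10],
]
for row in pairwise_bray_curtis(samples):
    print([round(value, 2) for value in row])
```

A matrix of this kind is the input to both hierarchical clustering and NMDS ordination.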

Fig. 4 Heatmap for major 18S autotrophic OTUs (see Supplementary Table 4) ordered by taxonomy. Labels of samples corresponding to sorted picoeukaryotes and nanoeukaryotes end with p and n, respectively. Sample names colored in blue correspond to samples from which cyanobacteria nifH could be amplified. Samples were clustered using Bray–Curtis dissimilarity. Four clusters can be defined: A corresponds to samples dominated by the UCYN-A1 host, B by the Mamiellophyceae Ostreococcus and Bathycoccus, C by an assemblage of Prymnesiophyceae, and D by a dinoflagellate from the genus Prorocentrum

NMDS analysis of OTUs based on Bray–Curtis dissimilarity between samples, onto which environmental variables were mapped [49], revealed a clear sample clustering based on the type of community present in the sample (Fig. 5). Samples dominated by Mamiellophyceae (cluster B) were clearly separated from those dominated by the UCYN-A1 haptophyte host (cluster A). The Mamiellophyceae community was linked to high chlorophyll fluorescence, higher N/P, larger depth, and lower temperature, corresponding to the interface between South Atlantic Central Water and Tropical Water [49], whereas the UCYN-A1 host community was linked to high temperature, low chlorophyll fluorescence, and low N/P, corresponding to warm tropical waters. Cluster C, corresponding to Prymnesiophyceae-dominated samples, was also associated with these tropical waters. Finally, Prorocentrum-dominated samples came from low-salinity, Synechococcus-rich, more coastal waters.

Fig. 5 NMDS analysis based on Bray–Curtis dissimilarity of samples based on major 18S OTUs (see Supplementary Table 4) onto which environmental parameters (from [49]) have been mapped. Samples are colored according to the community clusters defined in Fig. 4 and empty symbols correspond to samples not belonging to any of the four main clusters

nifH

The number of nifH reads per sorted sample varied from a few reads to 137,000. The number of OTUs (22 at 98% similarity) was more than ten times smaller than for 18S rRNA. Among the five most abundant OTUs (Supplementary Figure 2), three were related to clades of the unicellular cyanobacterium UCYN-A, respectively A1, A2, and A3 [24]. Interestingly, another minor OTU could be associated with the recently described clade A4 [16]. The fifth most abundant OTU was related (99.7% similarity) to the filamentous nitrogen-fixing cyanobacterium Trichodesmium erythraeum (Supplementary Figure 2).

The third most abundant OTU was related to Rhizobiales (Supplementary Figure 2). Several GenBank sequences matching this OTU with high similarity originated from marine waters. However, this OTU also had 100% similarity to a sequence (AB198390) that has been determined to be a contaminant originating from primers [50]. This OTU, as well as the other minor non-cyanobacterial OTUs, was therefore not considered any further in the paper. Furthermore, we did not consider samples with <2000 cyanobacteria nifH reads, assuming that such a low read number was due to noise.

On average, UCYN-A1 contributed most of the reads, followed by A2, A3, and A4 (Fig. 6a). In a given community, a single OTU generally dominated (in 63% of the cases, one OTU represented more than 90% of the reads). This was confirmed by heatmap cluster analysis, which clearly discriminated four clusters dominated by either UCYN-A1, A2, A3, or Trichodesmium (Fig. 6b). Both A2 and A3 could co-occur with A1. A4 was abundant in only one sample (St. 86, 105 m, 39% of reads), where it co-occurred with A2. Trichodesmium was only present at two locations (Bloom station at the surface and St. 114, 80 m, Fig. 3) and it co-occurred with UCYN-A1 (Fig. 6b). No significant trend with respect to either depth or distance to shore emerged (Fig. 3). In particular, UCYN-A could be found near the surface as well as at depth, down to 110 m.
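The two thresholds used in this part of the analysis (discarding samples with fewer than 2000 cyanobacteria nifH reads, then asking how often a single OTU exceeds 90% of the reads) can be expressed as a short Python sketch. The function name, the input layout, and the read counts are hypothetical; the sketch does not reproduce the 63% figure reported here.

```python
def single_otu_dominance(counts, min_reads=2000, dominance=0.90):
    """Fraction of retained samples in which one OTU exceeds `dominance`.

    counts: dict mapping sample name -> dict mapping OTU name -> read count.
    Samples with fewer than `min_reads` reads in total are discarded first.
    """
    kept = [reads for reads in counts.values() if sum(reads.values()) >= min_reads]
    if not kept:
        return 0.0
    dominated = sum(
        1 for reads in kept if max(reads.values()) / sum(reads.values()) > dominance
    )
    return dominated / len(kept)


# Hypothetical nifH read counts for four sorted samples.
counts = {
    "s1": {"A1": 9500, "A2": 500},             # top OTU at 95%: dominated
    "s2": {"A1": 3000, "A3": 2000},            # top OTU at 60%: not dominated
    "s3": {"A1": 10, "A2": 50},                # 60 reads: discarded as noise
    "s4": {"Trichodesmium": 4000, "A1": 100},  # top OTU at about 98%: dominated
}
print(round(single_otu_dominance(counts), 2))
```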

Fig. 6 a Mean relative contribution of cyanobacteria nifH OTUs clustered at 98% similarity for all sorted samples (both pico- and nanoeukaryotes), excluding only samples from which nifH could not be amplified. b Heatmap for cyanobacteria nifH OTUs (Supplementary Table 5). Labels of samples corresponding to sorted picoeukaryotes and nanoeukaryotes end with p and n, respectively. Samples were clustered using Bray–Curtis dissimilarity

Network analysis of the major autotrophic 18S rRNA and nifH OTUs

To determine the connections among eukaryotes as well as between eukaryotes and UCYN-A clades, we built a contingency table joining both sets of 18S rRNA and UCYN-A nifH OTUs, and performed a network analysis based on SparCC distance [43]. Major eukaryotic taxonomic groups (Prymnesiophyceae, Chrysophyceae, Mamiellophyceae) formed sub-networks (Fig. 7), indicating that their members tended to co-occur. This network analysis (Fig. 7) recovered the two main host–symbiont relationships previously hypothesized in the literature [18], i.e., UCYN-A1 correlated with prymnesiophyte Otu002 and UCYN-A2 correlated with Otu020 (B. bigelowii). UCYN-A4, like A2, appeared to be correlated with B. bigelowii. UCYN-A3 was not connected to either of the two Braarudosphaeraceae but to Bathycoccus and an uncultivated dinoflagellate. Still, A1, A2, and A4 were also correlated with other 18S OTUs, for example A1 with Syracosphaera or A2 with prasinophytes clade VII.

Fig. 7 Network analysis of the major autotrophic 18S rRNA (circles) and nifH (squares) OTUs (see Supplementary Table 4 and Supplementary Table 5) using SparCC correlation [43]. Colors of nodes correspond to taxonomic assignation. Size of nodes is proportional to number of reads obtained. Width of edges is proportional to correlation between OTUs. Only correlations >0.20 with pseudo p-values < 0.05 were considered

Discussion

Phytoplankton communities in the euphotic zone of the Brazilian South Atlantic Bight are influenced by two main water masses: the warm and oligotrophic Tropical Water in the top 100 m and the cold and nutrient-rich South Atlantic Central Water (SACW) below. In this region, photosynthetic pico- and nanoeukaryotes average about 3000 cells per mL and 850 cells per mL, respectively [49]. To assess the taxonomy of these small eukaryotes, we analyzed populations sorted by flow cytometry based on their size and pigment fluorescence. This method has the advantage of targeting specific populations and makes it possible to focus on components of the diversity that usually go unnoticed in filtered samples, due to competition with more abundant or more easily amplified targets [29]. Almost all previous studies using this approach relied on sorting live samples [6, 51, 52], which requires a sorting flow cytometer onboard ship and is therefore logistically complex. A recent study [53] used preserved samples but required both sample concentration and progressive freezing. We used instead a much simpler preservation method, relying on the addition of 10% DMSO to 1.5 mL of non-concentrated seawater samples, followed by flash freezing. By eliminating sample filtration, the protocol used here requires <15 min of hands-on work after the water samples arrive onboard, making sampling considerably easier during oceanographic cruises and minimizing changes in the plankton community.

Despite the small number of cells sorted in some samples (18–16,000), we could successfully PCR amplify and sequence both the 18S rRNA gene and the nifH gene on these samples. The vast majority of eukaryote sequences were from autotrophs (93%). The major heterotrophic group was Syndiniales, which are known to infect dinoflagellates [48]. These parasites were probably sorted in association with the dinoflagellates that were present in our samples (e.g., Otu004 or Otu008). The successful targeting of the autotrophic population by flow cytometry sorting is also demonstrated by the low number of OTUs (258) for 18S rRNA using a 98% similarity level, which is more than one order of magnitude lower than what is found in studies using filtered material (e.g., [31, 54]). Likewise, the number of OTUs for nifH was very low (22) even compared to typical clone library studies (e.g., [55]), suggesting that we probably only targeted bacteria that were in association with the sorted eukaryotes.

Heatmap and network analysis (Figs. 4 and 7) demonstrated that for some of the phylogenetic groups, 18S rRNA OTUs belonging to the same class had a tendency to co-occur, suggesting that population response to environmental forcing occurred often at the class level. This was the case for Mamiellophyceae, Prymnesiophyceae, and Chrysophyceae, which are well known contributors to phytoplankton in tropical oligotrophic and mesotrophic waters [3, 6].

The most prevalent guild, characteristic of warm tropical waters (Fig. 5), was dominated by the Prymnesiophyceae Otu002, whose sequence is identical to that of the known host of UCYN-A1 (Fig. 4). This sequence has been shown to be abundant in specific oceanic areas, in particular off Brazil, where it can exceed 10% of Prymnesiophyceae reads [27]. However, in our case, it could be much more dominant, representing up to 81% of small photosynthetic eukaryotic reads (at St. 96, 5 m). Another OTU (Otu020) matched B. bigelowii, which is hypothesized to be the host of UCYN-A2 [45], and is more widespread but less abundant than the host of UCYN-A1 [27]. Other relatively abundant genera included the coccolith-bearing Algirosphaera, Syracosphaera, and Emiliania. Although the latter genus is extremely widespread [56], data on the distribution of the two former ones are much scarcer, even though they were among the most abundant OTUs in a seasonal study of Oslo fjord (Norway, [57]).

We observed four types of UCYN-A off Brazil belonging to the clades A1 to A4 [16], but none were related to the two recently defined clades A5 and A6 [25]. Although in all samples where UCYN-A was present either the UCYN-A1 host or Braarudosphaera bigelowii was detected, these two OTUs were relatively abundant (in excess of 20% of the reads) in two samples where UCYN-A was not observed (St. 26 and St. 101, both at 5 m). This suggests that the UCYN-A symbiosis may not always be obligatory for the host or that, in these two cases, UCYN-A cell dislodgement may have occurred during sampling or sorting, as observed previously [10, 27]. The most prevalent clade was A1, which is consistent with recent reports of this clade being the most abundant in metabarcoding surveys (e.g., [25]). Its co-occurrence (Fig. 7) with the prymnesiophyte 18S rRNA Otu002 reinforces the hypothesis of the specificity of this symbiotic relationship. Clade A2 was less frequent and less abundant, but was also clearly linked in our network analysis to its hypothesized host, B. bigelowii. It co-occurred with A1 in a few samples, as observed elsewhere in coastal-influenced pelagic waters [25]. In contrast to A1 and A2, virtually nothing is known about the other clades. Off Brazil, A3 could co-occur with A1, as recently observed [25]. Network analysis using major OTUs suggested that A3 could be correlated with Bathycoccus and with a dinoflagellate, while A4, like A2, could also be hosted by B. bigelowii (Fig. 7). However, as A3 and A4 were in relatively low abundance, they could also occur in symbiosis with a rare autotrophic eukaryote not taken into account in the network analysis.

We also detected some nifH sequences related to T. erythraeum [58], which had been previously observed in this area [59]. These sequences originated from samples taken within a Trichodesmium bloom and at 80 m at the most offshore station of TR2. It is surprising that we could sort Trichodesmium cells because they possess phycoerythrin and our settings discriminated against phycoerythrin-containing cells. However, senescent cells of Trichodesmium display a lower phycoerythrin fluorescence signal [60] and could have been sorted along with the eukaryotes. Trichodesmium and UCYN-A1 co-occurred in the same samples (Fig. 3), confirming that these two nitrogen-fixing cyanobacteria are not mutually exclusive, as has been observed elsewhere [55, 61].

The second type of community was dominated by Mamiellophyceae, either Bathycoccus or Ostreococcus, or both (Fig. 4). Mamiellophyceae are known to be ubiquitous in marine ecosystems but more prevalent in coastal waters, being replaced by prasinophytes clade VII in pelagic regions [62]. B. prasinos was first described from a deep Mediterranean Sea isolate [46] but has since been observed in many areas from coastal waters to the open ocean [7, 63]. The existence of two different species of Bathycoccus has been hypothesized based on metagenomic data and strains available in culture [7, 64]. Unfortunately, these two species have exactly the same 18S rRNA sequence, and therefore we were not able to differentiate them in the present study. In contrast, at least four clades can be distinguished by 18S rRNA for Ostreococcus, three corresponding to the named species O. tauri, O. “lucimarinus”, and O. mediterraneus, and the last one, clade B [47, 65], also called OII [7], not yet described. The major Ostreococcus OTU in our study corresponds to clade B, which has been found in more oligotrophic regions than O. “lucimarinus” (e.g., [7]). For the third major Mamiellophyceae genus, Micromonas, the two main OTUs were in relatively low abundance and related to the newly described species [66] M. commoda (clade A) and M. bravo (clade B), which are known to be widespread and abundant in coastal waters [67]. Network analysis demonstrated that these three genera (Ostreococcus, Bathycoccus, and Micromonas), plus an OTU belonging to another green alga (prasinophytes clade VII A4, [62]), often co-occurred in our samples (Fig. 7).

A recent global study on UCYN-A symbioses [27] did not find a strong correlation between their distribution and temperature, N/P ratios, or chlorophyll a, suggesting their structuring factors may lie elsewhere. In the present work, NMDS analysis suggests that the alternation between communities dominated by the UCYN-A1 host and by Mamiellophyceae (Fig. 5) is linked to water masses, with the former found in low-chlorophyll tropical waters and the latter near the interface between warm tropical waters and colder South Atlantic waters where the chlorophyll maximum develops. Two key factors that could explain the dominance of the UCYN-A1 host community are higher temperatures and lower N/P ratios, both favoring nitrogen-fixing cyanobacteria [17]. Tropical waters off Brazil have a very low N/P ratio (down to 0.4 in the present study), which could explain why this symbiosis is so prevalent. Another factor that could be important but was not measured in our study is iron, a key element for nitrogen fixers [17]. By comparison, the “green” community was found in samples at depth with high chlorophyll fluorescence (Supplementary Figure 2), corresponding to the DCM at the interface between tropical water and SACW [49]. This environment matches the hypothesized habitat for Ostreococcus clade B [47, 68] and Bathycoccus [46, 64].

The lack of representative cultures from UCYN-A symbioses limits our comprehension of the factors structuring their populations and their competitive fitness in relation to other populations. Our study provides evidence that small phytoplankton communities in the region are highly structured and fall into a few specific types. It confirms the existing links between UCYN-A cyanobacteria and Prymnesiophyceae as well as the ecological importance of this association in the South Atlantic off Brazil. Moreover, our approach allows us to formulate hypotheses about other UCYN-A clades for which hosts have not yet been characterized.