Impacts of microbial assemblage and environmental conditions on the distribution of anatoxin-a producing cyanobacteria within a river network

Blooms of planktonic cyanobacteria have long been of concern in lakes, but more recently, harmful impacts of riverine benthic cyanobacterial mats been recognized. As yet, we know little about how various benthic cyanobacteria are distributed in river networks, or how environmental conditions or other associated microbes in their consortia affect their biosynthetic capacities. We performed metagenomic sequencing for 22 Oscillatoriales-dominated (Cyanobacteria) microbial mats collected across the Eel River network in Northern California and investigated factors associated with anatoxin-a producing cyanobacteria. All microbial communities were dominated by one or two cyanobacterial species, so the key mat metabolisms involve oxygenic photosynthesis and carbon oxidation. Only a few metabolisms fueled the growth of the mat communities, with little evidence for anaerobic metabolic pathways. We genomically defined four cyanobacterial species, all which shared <96% average nucleotide identity with reference Oscillatoriales genomes and are potentially novel species in the genus Microcoleus. One of the Microcoleus species contained the anatoxin-a biosynthesis genes, and we describe the first anatoxin-a gene cluster from the Microcoleus clade within Oscillatoriales. Occurrence of these four Microcoleus species in the watershed was correlated with total dissolved nitrogen and phosphorus concentrations, and the species that contains the anatoxin-a gene cluster was found in sites with higher nitrogen concentrations. Microbial assemblages in mat samples with the anatoxin-a gene cluster consistently had a lower abundance of Burkholderiales (Betaproteobacteria) species than did mats without the anatoxin-producing genes. The associations of water nutrient concentrations and certain co-occurring microbes with anatoxin-a producing Microcoleus motivate further exploration for their roles as potential controls on the distributions of toxigenic benthic cyanobacteria in river networks.


Microbial assemblage diversity
The taxonomic composition of the microbial assemblage in the samples was investigated using the ribosomal protein S3 (rpS3) gene. The amino acid sequences of all assembled scaffolds >1 kb were searched for the rpS3 gene using custom Hidden Markov Models (HMMs) (https://github.com/AJProbst/rpS3_trckr). The rpS3 amino acid sequences were then clustered at 99% sequence identity to approximate species-level clusters and create unique rpS3 clusters for each organism bin. The longest scaffold from each rpS3 cluster was identified, and reads from each sample were mapped onto that set using Bowtie 2 [2] allowing ≤3 mismatches per read. An organism was considered present in a sample if the rpS3 sequence was found on an assembled scaffold, or if reads from a sample mapped to the rpS3 sequence with a breadth >95%. The coverage values from read mapping for all rpS3 clusters in a sample were then normalized by the number of sequenced gigabase pairs (gbp) that went into each assembly. The relative abundance of each rpS3 gene was calculated by dividing the coverage of each rpS3 sequence in a sample, by the sum of the coverage values of all the rpS3 genes in a given sample, then multiplying by 100.
Preliminary taxonomic identifications for each rpS3 cluster were derived by searching [3] the amino acid sequence against a combined database from previous publications [4,5] and selecting the best match. Refinements to the taxonomic annotation were made using the maximum likelihood phylogenetic tree described below.
A phylogenetic tree was built to investigate the taxonomic diversity of rpS3 sequences.
Reference rpS3 amino acid sequences were downloaded from NCBI and aligned with sample sequences using MUSCLE [6]. Amino acid sites with >95% gaps after the alignment were stripped from the analysis in Geneious v8.1.8 [7], and duplicate sequences were removed. A maximum likelihood phylogenetic tree was constructed from the remaining 363 reference and sample sequences using RAxML [8] with the PROTGAMMALG amino acid evolution model and the number of bootstraps automatically determined (autoMRE).
Average nucleotide identity (ANI) was used to compare the diversity of the Oscillatoriales genomes in the mat. The quality of the 35 assembled genome bins in the order Oscillatoriales was assessed using CheckM [9]. Genomes <75% complete or with >10% contamination were excluded from further analysis. ANI was calculated on the remaining 29 genomes, and 15 reference genomes downloaded from NCBI (Table S1), with the ANIm method [10] implemented using the Python module PYANI (https://github.com/widdowquinn/pyani) [11].
This program first performs pairwise alignments of all input genomes. For each alignment, the nucleotide similarity is calculated. It then averages the nucleotide identity for all aligned regions in each pairwise comparison. As genomes differ in similarity it becomes harder for the algorithm to align the genomes and the alignment coverage decreases [12]. If less than 25% of the genome aligned, we considered the ANI results non-representative and did not include the ANI percentage in the results, and indicative of low genome wide sequence similarity. We chose 25% alignment coverage as a threshold because there was a large break in the distribution of alignment coverage values at 25%. Genomes with ANI less than 96% were considered different species [10,12,13].
To further investigate the taxonomy of Oscillatoriales genomes, maximum likelihood trees were constructed with nucleotide sequences from the 16S ribosomal RNA gene (16S rRNA), gyrB gene (gyrase subunit B), and rbcL gene (ribulose 1,5-bisphosphate carboxylase large subunit).
Reference strains were selected from Strunecký et al. [14] and Sciuto et al. [15], two papers that contributed to revisions of the genera Microcoleus and Phormidium. Genes were identified in the Oscillatoriales genomes from the annotation methods described above and by blasting genomes with sequences from Oscillatoria nigro-viridis PCC 7112. Reference and sample nucleotide sequences (Table S2) were aligned with MUSCLE [6] and non-informative sites stripped with BMGE [16]. Trees were built with RAxML [8] using the CIPRES Science Gateway [17] with the GTRCAT substitution model and 100 bootstraps for each tree.

Metabolic potential and phosphorus acquisition
Profiling of metabolic potential was performed on genome bins that passed quality filtering (>70% complete and <10% contamination according to CheckM). This resulted in 106 genomes with mean and median contamination of 1.4% and 0.6%, respectively, and mean and median completeness of 87.3% and 88.0%, respectively (Table S5). The amino acid sequences of predicted genes from genome bins were compared to TIGRFAM HMMs [18] and custom HMMs for metabolic pathways involving arsenic, C1 compounds, carbon, carbon monoxide, halogenated compounds, hydrogen, nitriles, nitrogen, oxygen, sulfur, and urea from Anantharaman et al. [4]. Cut off values for HMM scores were derived from Anantharaman et al. [4] (Table S3).
Phosphorus acquisition and transport were investigated by searching genomes for genes involved in phosphorus transport, solubilization, mineralization, and regulation using Pfam or TIGRfam HMMs (Table S4) [19]. Cutoff values were derived by downloading 21 annotated isolate genomes and searching them using the HMMs. Search results were verified with blastp against the NCBI RefSeq database (June 2017) by looking for enzyme name keywords from Table S4 indicating gene functions, and then establishing cutoff values.

Anatoxin-a gene cluster
Some, but not all, genes in the anatoxin-a gene cluster were correctly annotated via the procedures described above. Additional anatoxin-a biosynthesis genes were identified by investigating genes surrounding correctly annotated genes. To search for additional scaffolds with the anatoxin-a gene cluster, HMMs were built using hmmbuild (hmmer.org) for each gene in the anatoxin-a gene cluster using reference sequences from NCBI. All genes in the vicinity of the above identified genes that passed our screen thresholds (e-value cutoff: 10 -50 ) were further investigated as candidate anatoxin-a genes, using the same methods described above. In most samples, the anatoxin-a biosynthesis genes were located on multiple scaffolds within a genome.
We successfully combined some scaffolds to get the entire gene cluster on a single scaffold by checking for identical sequences on different scaffolds and using the de novo assembler in Geneious v8.1.2. Then, reads from samples were mapped to the new combined scaffolds with Bowtie 2 [2] to confirm read support for the sequences on the combined scaffolds. Additionally, raw reads were mapped with Bowtie 2 [2] to the scaffold PH2015_06S_scaffold_1561, to confirm that reads from each sample mapped to the anatoxin-a gene cluster. Once all anatoxin-a genes were identified, the protein domains of the samples were compared to reference sequences using hmmscan (https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan), and scaffolds were mapped to a reference anatoxin-a gene cluster from Oscillatoria PCC6506 (NCBI accession FJ477836.2) using Geneious v8.1.2 [7] to analyze gene synteny and sequence identity among samples and two additional reference anatoxin-a gene cluster sequences, Anabaena sp. 37 (NCBI accession JF803645.1) and Cuspidothrix (NCBI accession KM245023.1).
The relationship between the presence of the anatoxin-a gene cluster and the microbial assemblage was investigated with non-metric multidimensional scaling of Bray-Curtis dissimilarities using the R package vegan [20], and statistically tested with permutational multivariate ANOVA (PERMANOVA) [21]. Additionally, a binomial regression was used to test differences in the percent relative abundance of Burkholderiales in samples with and without the anatoxin-a gene cluster.

Anatoxin-a measurements
Anatoxin-a concentrations in Oscillatoriales mats were measured using liquid chromatography mass spectrometry (LC-MS) with Select Ion Monitoring. After a mat sample was collected for DNA extraction, ~1 g of remaining mat on the cobble was placed in a 250 mL glass jar and placed in a cooler on ice in the dark, transported to the laboratory, and stored at 4°C overnight in the dark. The next day 50-100 mL of de-ionized water was added to the sample mat, and the mat homogenized with a blender, then a 15 mL subsample transferred to a glass vial and frozen at -20°C. For anatoxin-a extraction, samples were thawed, and 3 mL sub-sample added to a glass culture tube with 3 mL of 100% MeOH (Fisher A452), then the tube was sonicated for 30 s using a probe sonicator (Sonic Dismembrator 100; Thermo Fisher Scientific, Massachusetts, USA) at Phenylalanine standards were used to check for separation between anatoxin-a and phenylalanine peaks, which can create false anatoxin-a positives in LC-MS measurements [22]. Detection limits were 0.7 parts per billion for anatoxin-a. Calibration was performed using certified reference materials with a minimum of five calibration points for each batch of samples, and analytical blanks and matrix blanks included in each run. After centrifuging and sampling the supernatant, the cyanobacterial mat in the culture tube was transferred to a weighing tin and dry weight measured after 48 hours in a drying oven at 50°C. Anatoxin-a concentrations were then calculated as ng anatoxin-a / g dry weight (DW). We also tested for a positive association between the recovery of the anatoxin-a gene cluster in samples and the detection of anatoxin-a with LC-MS using a Fisher exact test using R [24].