Insights into the Planktothrix genus: Genomic and metabolic comparison of benthic and planktic strains

Planktothrix is a dominant cyanobacterial genus forming toxic blooms in temperate freshwater ecosystems. We sequenced the genomes of planktic and non-planktic Planktothrix strains to better represent the diversity and lifestyles of this genus at the genomic level. Benthic and biphasic strains root the Planktothrix phylogenetic tree and widely expand the pangenome of this genus. We further investigated in silico the genetic potential dedicated to gas vesicle production, nitrogen fixation and natural product synthesis, and conducted complementary experimental tests by cell culture, microscopy and mass spectrometry. Significant differences in the investigated features were found between strains of different lifestyles. The benthic Planktothrix strains showed unexpected characteristics such as buoyancy, nitrogen fixation capacity and unique natural product features. Comparison with Microcystis, another dominant toxic bloom-forming genus in freshwater ecosystems, highlighted different evolutionary strategies, notably that Planktothrix exhibits an overall greater genetic diversity but a smaller genomic plasticity than Microcystis. Our results shed light on Planktothrix evolution, phylogeny and physiology in the context of their diverse lifestyles.

Planktothrix strains each month at the PCC. The other BG11-based media used were: BG11o, a medium with no combined nitrogen source; BG119, with 9 mM NaNO3; and BG112, with 1.8 mM NaNO3. All cultures were grown at 22°C under a 13 h:11 h light:dark cycle at 20 µmol photons m⁻² s⁻¹. Each culture was grown for 1 month before being transferred (a 1/20 dilution for the planktic form, or a fragment of the biofilm for the benthic form) into a subculture under the same conditions. Growth and pigmentation of the strains were estimated visually because, depending on the lifestyle (benthic versus planktic), the filaments were more or less aggregated in pellets or in biofilms (Fig. S1), which made optical density measurements impossible.
Genome assembly. Assembly validation was performed via the Consed interface (www.phrap.org), and 287 and 494 PCR reads were generated for gap closure in PCC 7805 and PCC 7821, respectively. For quality assessment, Illumina reads (GAIIX instrument, 51 bp) amounting to around 100-fold coverage were mapped onto the whole genome sequences using SOAP (http://soap.genomics.org.cn), as described by Aury et al. 1.
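As a rough illustration of the 100-fold coverage figure, the relation between read number, read length and genome size can be sketched as follows. This is a minimal sketch and not the procedure used with SOAP: it assumes that every read maps once and in full, and the genome size of 5.5 Mb is a hypothetical placeholder, not a value taken from the assemblies.

```python
import math


def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean fold coverage, assuming every read maps once over its full length."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


READ_LENGTH = 51          # bp, as stated for the GAIIX reads
GENOME_SIZE = 5_500_000   # bp, hypothetical placeholder value

n = reads_needed(100, READ_LENGTH, GENOME_SIZE)
print(n, fold_coverage(n, READ_LENGTH, GENOME_SIZE))
```

In practice the mapped coverage is lower than this idealized estimate because some reads fail to map or map to repeats.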
Additionally, because the histidine tRNA was missing from the original assembly of our two planktic genomes (similarly, the isoleucine tRNA is lacking in the genomes of eight available planktic Planktothrix), we resequenced these genomes (Illumina NextSeq500 technology) and found the histidine tRNA gene located between highly repetitive sequences.
Core and pan-genome. The Pan/Core genome functionality of the MicroScope platform was used to compute the core genome and the pangenome of Planktothrix strains 2. Putative orthologs were defined as gene pairs satisfying an alignment threshold of at least 80% amino acid sequence identity over at least 80% of the length of the smaller protein.
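The orthology criterion can be illustrated with a short Python sketch. It is not the MicroScope implementation: the `Hit` record and its field names are hypothetical, and the alignment coverage is assumed here to be the aligned length divided by the length of the smaller protein.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise protein alignment (hypothetical record layout)."""
    query_len: int       # length of the first protein, in residues
    subject_len: int     # length of the second protein, in residues
    identity_pct: float  # amino acid identity over the aligned region, in %
    aligned_len: int     # number of aligned residues


def is_putative_ortholog(hit: Hit,
                         min_identity: float = 80.0,
                         min_coverage: float = 0.80) -> bool:
    """Apply the 80% identity / 80% coverage of the smaller protein rule."""
    shortest = min(hit.query_len, hit.subject_len)
    coverage = hit.aligned_len / shortest
    return hit.identity_pct >= min_identity and coverage >= min_coverage


print(is_putative_ortholog(Hit(300, 400, 85.0, 250)))  # coverage 250/300
print(is_putative_ortholog(Hit(300, 400, 85.0, 200)))  # coverage 200/300
```

Using the smaller protein as the denominator keeps a short protein that aligns entirely within a longer one from being rejected on coverage alone.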
Phylogenetic tree reconstruction. The extended species tree was generated from a concatenation of twenty-nine conserved proteins selected from the phylogenetic markers previously validated for Cyanobacteria 3. A Maximum-Likelihood phylogenetic tree was generated from the alignment with PhyML 3.1.0.2, using the LG amino acid substitution model with gamma-distributed rate variation (six categories), estimation of the proportion of invariable sites, and exploration of tree topologies by Nearest Neighbor Interchanges.
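The concatenation step can be sketched as follows, assuming that each marker protein has already been aligned separately. The function name and the choice to pad a strain lacking a marker with gap characters are illustrative assumptions; the text does not state how missing markers were handled.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments into one supermatrix.

    alignments: list of dicts mapping strain name -> aligned sequence.
    Within one marker, all sequences must have the same aligned length.
    A strain absent from a marker is padded with gaps (assumption).
    """
    taxa = sorted(set().union(*alignments))
    parts = {taxon: [] for taxon in taxa}
    for marker in alignments:
        widths = {len(seq) for seq in marker.values()}
        if len(widths) != 1:
            raise ValueError("sequences within a marker must be aligned")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(marker.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with two markers and three strains (made-up sequences).
marker_1 = {"PCC 7805": "MKT-L", "PCC 7821": "MKTAL", "strain X": "MRTAL"}
marker_2 = {"PCC 7805": "GAV", "PCC 7821": "GSV"}
for name, seq in concatenate_alignments([marker_1, marker_2]).items():
    print(name, seq)
```

The resulting supermatrix would then be written to a file in a format accepted by the tree-building program.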
Synteny and estimation of the proportion of repeated sequences. Synteny computation and repeated sequence detection are provided by the MicroScope platform.
The proportion of repeats was estimated using the Repseek algorithm, a fast two-step method (seed detection followed by seed extension) that finds large degenerate repeats within or between large DNA sequences 4. The synteny values, representing the percentage of CDSs belonging to a synteny group, were estimated by taking into account CDSs sharing at least 35% sequence identity over 80% of the length of the smaller protein, with a gap parameter (number of consecutive genes not involved in synteny) set to five.
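The role of the gap parameter can be illustrated with a simplified Python sketch. It is not the MicroScope algorithm: homologous CDS pairs (already filtered on identity and alignment length) are assumed to be given as gene positions in two genomes, groups are built by a single greedy pass along the first genome, and a group is assumed to require at least two gene pairs.

```python
def _adjacent(prev, cur, max_gap):
    """True if at most max_gap genes are skipped between two pairs in both genomes."""
    skipped_a = cur[0] - prev[0] - 1
    skipped_b = abs(cur[1] - prev[1]) - 1
    return 0 <= skipped_a <= max_gap and 0 <= skipped_b <= max_gap


def synteny_groups(pairs, max_gap=5):
    """Chain homologous pairs (pos_a, pos_b) into synteny groups.

    Simplified greedy pass; groups with a single pair are discarded.
    """
    groups, current = [], []
    for pair in sorted(pairs):
        if current and _adjacent(current[-1], pair, max_gap):
            current.append(pair)
        else:
            if current:
                groups.append(current)
            current = [pair]
    if current:
        groups.append(current)
    return [group for group in groups if len(group) >= 2]


def percent_in_synteny(groups, total_cds):
    """Percentage of genome-A CDSs that belong to a synteny group."""
    in_synteny = {pos_a for group in groups for pos_a, _ in group}
    return 100 * len(in_synteny) / total_cds


# Toy example: gene indices in genome A and genome B (made-up values).
pairs = [(0, 10), (1, 11), (2, 12), (9, 13), (30, 50)]
groups = synteny_groups(pairs, max_gap=5)
print(groups, percent_in_synteny(groups, total_cds=40))
```

In this toy example the pair at position 9 is left out because six genes separate it from position 2, one more than the gap parameter allows.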
Comparative analysis of the distribution of the genes among the COG categories. For each genome, all genes were assigned to a COG category using the COGnitor tool available in the MicroScope platform, which compares the gene sequences to the COG database using BLASTP 5. A non-metric multidimensional scaling (NMDS) was performed on the relative abundances of each COG category in all available Planktothrix genomes.
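The input to such an ordination can be prepared as in the following sketch, which converts per-genome COG counts into relative abundances and computes a pairwise dissimilarity matrix. The Bray-Curtis dissimilarity is an assumption made for illustration, since the dissimilarity measure is not specified here, and the counts are made-up values. The NMDS itself would then be run on this matrix with a dedicated ordination tool.

```python
def relative_abundances(counts):
    """Convert COG category counts for one genome into proportions."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    categories = set(p) | set(q)
    numerator = sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)
    denominator = sum(p.get(c, 0.0) + q.get(c, 0.0) for c in categories)
    return numerator / denominator


def dissimilarity_matrix(profiles):
    """Square matrix of pairwise dissimilarities, in sorted genome order."""
    names = sorted(profiles)
    return names, [[bray_curtis(profiles[a], profiles[b]) for b in names]
                   for a in names]


# Made-up COG counts for three hypothetical genomes.
cog_counts = {
    "genome_1": {"C": 120, "E": 200, "L": 180},
    "genome_2": {"C": 110, "E": 210, "L": 260},
    "genome_3": {"C": 130, "E": 190, "L": 150, "X": 90},
}
profiles = {g: relative_abundances(c) for g, c in cog_counts.items()}
names, matrix = dissimilarity_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

Working on proportions rather than raw counts keeps differences in genome size from dominating the ordination.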
Detection of natural product gene clusters in Planktothrix strains. Natural product gene clusters were identified using the antiSMASH 2.0.2 software 6 and the modified version of the complete genome scanning pipeline 2metdb 7. Each gene within a cluster was compared to its syntenic homolog at the amino-acid level in a reference genome, such as P. rubescens PCC 7821, to obtain the deduced amino-acid sequence identity (AAI). Features of genomic plasticity were identified using RGPfinder with SIGI-HMM 8 and AlienHunter (IVOM) 9, implemented on the MicroScope platform.
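The identity between a gene product and its syntenic homolog can be illustrated with the sketch below, which assumes that the two protein sequences have already been aligned pairwise. The choice of denominator (columns in which both sequences carry a residue) is an assumption; other conventions, such as the full alignment length, give slightly different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100 * identical / compared


# Toy alignment (made-up sequences): 4 identical residues out of 5 compared.
print(percent_identity("MKT-LV", "MKSALV"))
```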