A novel cyanobacterial geosmin producer, revising GeoA distribution and dispersion patterns in Bacteria

Cyanobacteria are ubiquitous organisms that contribute substantially to primary production across a wide range of habitats. They are well known for their role in aquatic blooms worldwide and for producing a myriad of natural compounds, some with toxic potential and others of high economic impact, such as geosmin. We performed an environmental survey of cyanobacterial soil colonies to identify interesting metabolic pathways and adaptation strategies used by these microorganisms. We isolated a cyanobacterium that displayed a distinctive earthy/musty smell typical of geosmin, sequenced and assembled its genome, and confirmed geosmin production by GC-MS analysis of the culture’s volatile extract. Morphological studies pointed to a new Oscillatoriales soil ecotype, confirmed by phylogenetic analysis, which we named Microcoleus asticus sp. nov. Our study of geosmin gene presence in Bacteria revealed a scattered distribution among Cyanobacteria, Actinobacteria, Delta- and Gammaproteobacteria, covering different niches. Careful analysis of the bacterial geosmin gene and gene tree suggests an ancient bacterial origin of the gene, which was probably lost successively in different time frames. The high sequence similarity of the cyanobacterial geosmin gene among freshwater and soil strains reinforces the idea that the evolutionary history of geosmin is intimately connected to niche adaptation.


Results
Identification of secondary metabolite production potential through genome mining. The genome assembly of Microcoleus asticus sp. nov. was performed and its major statistical attributes are described in Table 1.
The quantitative assessment and annotation of the generated genome resulted in a value of 99% completeness (see supplementary Fig. S1) and the functional annotation of the predicted transcriptome is presented in supplementary Table S1.
After a genomic survey for non-ribosomal peptide synthetase (NRPS), polyketide synthase (PKS), hybrid NRPS/PKS gene clusters and ribosomally synthesized and post-translationally modified peptides (RiPPs), we were able to identify the complete gene cluster for the synthesis of the terpenoid geosmin, composed of the geosmin synthase gene (geoA) (758 aa in length), followed by two cyclic nucleotide-binding genes (cnb) (471 and 469 aa in length, respectively) of the Crp/Fnr-type global transcription regulators (Fig. 1). Our organism presents the same cluster, as Fig. 1 demonstrates.
Other genes that participate in the synthesis of known cyanobacterial secondary metabolites were also detected, although in no case was the complete cluster found. The incomplete gene clusters identified would code for: the benzenediol resorcinol, cyanopeptolin (DarB was detected as well as genes with shared similarity), trichamide, and the mixed PKS/NRPSs nostophycin, nostopeptolide, and jamaicamide. Furthermore, we identified two unknown terpene biosynthesis gene clusters, showing the potential for alternative pathways for the synthesis of terpenoid compounds and other secondary metabolites that are still unknown.
Analysis of terpenoid compound geosmin production by Microcoleus asticus sp. nov. The extract of volatile compounds produced by the isolate was analyzed to test whether the complete geosmin gene cluster was, in fact, producing this terpenoid secondary metabolite. The odorous molecule geosmin was identified through headspace solid-phase micro-extraction (SPME). Figure 2 shows the ion monitoring chromatogram for the detection of geosmin in the volatile extract of Microcoleus asticus sp. nov., with a retention time of 15.5 min, matching the peaks of the geosmin standards.
Geosmin synthase gene is scattered throughout three bacterial phyla. To better illustrate geosmin presence/production in Bacteria, it was fundamental to tackle several questions: How widespread is the gene across phyla? Which species possess the genetic machinery to produce geosmin, and which niches do they occupy? Can we trace the evolutionary history of the gene in Bacteria? To do so, we compiled a set of geosmin gene sequences by BLAST searching public databases for complete and partial putative geosmin gene sequences. Our first finding was that sequences were restricted to only three bacterial phyla: Cyanobacteria, Actinobacteria, and Proteobacteria, the latter represented by two classes, Delta- and Gammaproteobacteria. Besides being restricted to these phyla, the geosmin synthase gene was not evenly distributed within them (Fig. 3), with only a few representatives of each group carrying the gene.
The environmental sources of our set of bacterial strains are also depicted in Fig. 3, which shows that the strains occupy very diverse niches. Regarding Cyanobacteria, we identified 18 strains associated with terrestrial niches: 8 soil strains, 7 strains in symbiosis with lichens, 1 in symbiosis with liverwort and 2 strains that exist as symbionts of plant roots. The aquatic strains are mostly from freshwaters, with only 1 out of 15 representatives being from brackish waters. Four strains do not have publicly available information on niche occupancy. Deltaproteobacteria species are mostly associated with soil, decaying wood or tree bark, and there is a single aquatic strain, the marine Myxococcus fulvus. Moreover, of the 3 Gammaproteobacteria strains, 2 are related to bacterial infections in edible mushrooms. Actinobacteria are mostly terrestrial strains, but 2 out of the 13 actinobacterial strains are associated with insects, while 1 was identified in a human lung infection. Figure 3 also indicates known geosmin producers, using information collected from the literature and systematized in supplementary Table S2.
Taking into account the number of geosmin producers used in our phylogenetic study, we found that, of the 36 Cyanobacteria analyzed, 21 are known geosmin producers, whilst 2 out of the 14 Deltaproteobacteria strains have been tested positive regarding the synthesis of this metabolite. Furthermore, among the set of 13 Actinobacteria strains, there are 3 known producers of geosmin, while there is no information regarding the production of geosmin by the 3 Gammaproteobacteria strains in our dataset. In fact, to our knowledge, this is the first report of geosmin synthase gene in Gammaproteobacteria.
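The producer counts reported above can be tallied per group as a quick consistency check. The following is a minimal Python sketch: the dictionary layout and the function name `producer_summary` are illustrative assumptions, and only the counts themselves come from the text.

```python
# (strains analyzed, known geosmin producers) per group, as reported in the text.
# No production data exist for the Gammaproteobacteria strains, so their
# known-producer count is entered as 0 (untested, not "tested negative").
DATASET = {
    "Cyanobacteria": (36, 21),
    "Deltaproteobacteria": (14, 2),
    "Actinobacteria": (13, 3),
    "Gammaproteobacteria": (3, 0),
}


def producer_summary(dataset):
    """Return total strains, total known producers, and per-group producer fractions."""
    total = sum(n for n, _ in dataset.values())
    producers = sum(p for _, p in dataset.values())
    fractions = {group: p / n for group, (n, p) in dataset.items()}
    return total, producers, fractions


total, producers, fractions = producer_summary(DATASET)
print(f"strains: {total}, known producers: {producers}")
for group, fraction in fractions.items():
    print(f"{group}: {fraction:.0%} known producers")
```

The four groups add up to the 66 bacterial strains of the dataset, and the fractions make explicit how unevenly production has been tested outside Cyanobacteria.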
Overall, the gene tree topology matched the major phyla divisions of a 16S rRNA based species tree for Bacteria 35. To better understand the evolutionary history of this gene in Bacteria, we compared the geosmin gene tree presented in Fig. 3 with a well-supported 16S rRNA based species tree 35. Some differences in tree topology were visible between these two phylogenetic trees regarding the positioning of Cyanobacteria, Actinobacteria, Delta- and Gammaproteobacteria. The geosmin synthase gene tree shows a close relationship between Cyanobacteria and Deltaproteobacteria, while in the species tree this closer relationship is not clearly visible. Nevertheless, Actinobacteria appears to be the most ancient phylum to harbor the geosmin gene as well as the most ancient of the four taxa divisions in the species tree. We analyzed these two phylogenies using a tree reconciliation algorithm to obtain support for a possible scenario for the evolutionary history of the geosmin gene in Bacteria (see Supplementary Fig. S2). However, the reconciliation did not reveal any clear pattern of evolution for the geosmin gene in Bacteria, probably due to a complex scenario involving several evolutionary drivers that probably led to many geosmin gene loss events; gene loss is the most dominant source of genetic variation in bacterial genomes 35,36. The gene tree also exposed two sister clades in Cyanobacteria, with no apparent similarities in terms of genera or niche of the strains. Because the formation of these two groups is incongruent with cyanobacterial phylogeny and taxonomy, and to complement the geosmin gene's evolutionary history through Bacteria, we analyzed more closely the two conserved magnesium-binding motifs in the N-terminal half of the geosmin synthase (Fig. 4), a region that is important for catalysis during geosmin synthesis.
The N-terminal half displays particular modifications between the groups. To better quantify these differences, we performed a sequence similarity analysis of the gene sequence alignment, producing a similarity percentage matrix that was used to build the heatmap in Fig. 4. The differences and similarities between geosmin genes in Bacteria are noticeable: the sequence similarity values among all 66 bacterial strains of our dataset accentuate five major groups that share, within each group, similarity values between 76 and 100%. The two separate cyanobacterial groups, which we called Cyano I and Cyano II, can be distinguished, as well as three groups we called Delta, Gamma, and Actino; all five are highlighted by the grey bars over the gene tree in Fig. 4. In Bacteria, the two motifs have three metal-binding residues each: the aspartate-rich motif has a universal consensus sequence DDXXX(D) and, downstream of it, the second motif, the NSE/DTE triad, has the consensus sequence (N,D)D(L,I,V)X(S,T)XXXE 30,32,34,37. The metal-ligand residues are identified in Fig. 4 by the letters Mg. The five groups in the gene tree share similarities in the two magnesium-binding motifs, which are represented by the amino acid logos in Fig. 4. We took as reference the Cyano I motif sequences DDHFLE and NDLFSYQRE to highlight the residue substitutions in each group, which are colored in orange. Although these substitutions do not appear to be critical to binding capability, since they do not occur in the binding residues or in strictly conserved positions, they can affect the motif conformation and the binding pocket, with unforeseen consequences.
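The two consensus sequences above can be expressed as regular expressions to locate the motifs in a protein sequence and to list substitutions relative to the Cyano I reference. This is a minimal Python sketch, not the procedure used in the study: the function names and the toy sequence are hypothetical, and reading the parenthesized (D) as "D or E" is an assumption made here so that the Cyano I motif DDHFLE matches.

```python
import re

# Aspartate-rich motif DDXXX(D); the final position is read as D or E
# (assumption, chosen so that the Cyano I reference DDHFLE matches).
ASP_RICH = re.compile(r"DD...[DE]")
# NSE/DTE triad: (N,D)D(L,I,V)X(S,T)XXXE
NSE_DTE = re.compile(r"[ND]D[LIV].[ST]...E")


def find_motifs(seq):
    """Return (0-based start, matched text) for each motif occurrence in seq."""
    return {
        "asp_rich": [(m.start(), m.group()) for m in ASP_RICH.finditer(seq)],
        "nse_dte": [(m.start(), m.group()) for m in NSE_DTE.finditer(seq)],
    }


def substitutions(reference, variant):
    """List (1-based position, reference residue, variant residue) differences."""
    if len(reference) != len(variant):
        raise ValueError("motifs must have the same length")
    return [
        (i + 1, r, v)
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]


# Toy sequence (hypothetical) embedding both Cyano I reference motifs.
toy = "MAAADDHFLEGGGGNDLFSYQREKK"
print(find_motifs(toy))

# A variant of the second motif with I at position 3 and L at position 4.
print(substitutions("NDLFSYQRE", "NDILSYQRE"))
```

Both the reference and the variant second motif still satisfy the NSE/DTE consensus, which illustrates why such substitutions fall outside the strictly conserved positions.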
Focusing on the magnesium-binding motifs of the two cyanobacterial groups, the first motif of the Cyano II group is identical to that found in Cyano I. In the NSE/DTE triad, on the other hand, there are two amino acid substitutions: the 3rd residue (leucine) is replaced in Cyano II by isoleucine, and the 4th residue, a phenylalanine in Cyano I, is replaced by a leucine in 2 strains of Cyano II. Still, none of the strains in Cyano II were tested for geosmin production, so more tests are needed to clarify the impact of these amino acid modifications.
The Cyano I group harbors cyanobacterial strains that share the same residue sequence in the Mg2+ binding motifs, but it is clear that, in terms of the broader analyzed protein sequence, Leptolyngbya sp. A2 is distinct from the other cyanobacterial sequences, sharing with them 77 to 82% similarity, while strains in Cyano I share amongst them 83 to 100% similarity. Strains in Cyano II share high similarity values, 91 to 100%, and lower values with Leptolyngbya sp. A2 (78 to 79%). Leptolyngbya sp. A2, a MIB and geosmin producer from freshwater 23, has its highest similarity values with the Nostocales strains in this study, except for Fischerella muscicola and Calothrix sp. NIES-2100. The Cyano I group also reveals two clusters formed both by gene sequence and niche: the freshwater strains in the clade Aphanizomenon/Anabaena/Dolichospermum share 94 to 100% sequence similarity, and the soil-related strains, which include all Calothrix and Nostoc strains in our set, Oscillatoria sp. PCC 6506, F. muscicola and Cylindrospermum stagnale, share amongst them 90 to 100% similarity. Microcoleus

Figure 3. Phylogeny of the geosmin synthase gene in Bacteria calculated from a protein alignment of the geosmin synthase gene sequence. Tested geosmin producers are identified by the letter G and the letters NT identify strains that have not yet been analytically tested. The environmental origin of each strain is also shown. Black dots and thick branches represent maximum likelihood and posterior probability values higher than 85, respectively.
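The similarity ranges discussed above come from a pairwise percentage matrix built on the protein alignment. As a rough illustration of how such a matrix can be derived, the Python sketch below computes plain percent identity over aligned columns. This is a simplification and an assumption: the study reports "similarity", which may rely on a substitution-based scoring not described in the text, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Build a nested dict {name: {name: percent identity}} from aligned sequences."""
    names = list(aligned)
    return {
        p: {q: percent_identity(aligned[p], aligned[q]) for q in names}
        for p in names
    }


# Toy alignment (hypothetical labels) using the second-motif variants from the text.
toy_alignment = {
    "cyano_I_ref": "NDLFSYQRE",
    "cyano_II_var": "NDILSYQRE",
}
matrix = identity_matrix(toy_alignment)
print(round(matrix["cyano_I_ref"]["cyano_II_var"], 1))
```

A matrix of this form can then be clustered or drawn as a heatmap; the grouping thresholds quoted in the text (for example 76 to 100% within groups) apply to the authors' own similarity values, not to this simplified identity score.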
Analysis of the geosmin gene tree highlights the probable close evolutionary history of the geosmin synthase gene of Cyanobacteria with the Deltaproteobacteria gene, with the higher similarity values shared with the two Archangium strains and C. fuscus (74 to 81%). In fact, the Deltaproteobacteria show a tight proximity among their gene sequences, with similarity values ranging from 76 to 100%. Regarding Actinobacteria, the Streptomyces sp. clade forms a closely related group with clear differences in protein sequence from the other actinobacterial strains, despite the similarities in the geosmin synthase metal-binding motifs. The Pseudomonas strains (P. gingeri, Pseudomonas sp. QS1027 and P. agarici), which we identified by in silico search, have low similarity values with all sequences from the other bacterial phyla, with the lowest values (61%) found with the Deltaproteobacteria strains M. xanthus DK 1622 and C. coralloides DSM 2259 and the Actinobacteria S. roseum and S. cattleya. The best similarity values with the Pseudomonas strains were identified for the deltaproteobacterium C. fuscus (67 to 70%) and the cyanobacterium Fischerella sp. PCC 9431 (69 to 70%). Taking the Cyano I group as reference, Actino is the group with the most modifications, which in turn occur at low frequency (L replaced by V in the first site, and L by I and Q by E in the second binding site), while the Cyano II group has two modifications, both in the second binding site: a complete replacement of L with I, and a replacement of F with L. The Delta and Gamma groups each have one amino acid replacement: E by Q in the second binding site and Y/H in the first binding site, respectively.

The selection pressures on geosmin gene in Bacteria. We quantified the selection pressures on the geosmin gene, first using the GARD algorithm to identify recombination breakpoints, which could inflate the number of false positive selection sites. This pre-analysis identified one recombination point in the amino-acid alignment.
We then tested three different algorithms, SLAC, FUBAR and MEME, and in all of them the global value of dN/dS is significantly higher than 1. Most of the codon sites under selection were identified as positive selection sites, indicative of strong diversifying selection pressure on the geosmin gene in Bacteria. The SLAC analysis points to pervasive (referring to the whole phylogeny) positive selection in 28 out of 269 sites and negative selection in only 1 site (p < 0.05), while the FUBAR test also points to positive (35 sites) over purifying (21 sites) pervasive selection (pp > 0.9). The MEME algorithm, used to identify adaptive evolution at individual sites, shows positive selection in 55 sites (p < 0.05), confirming our hypothesis that several sites in the geosmin gene were subjected to positive or diversifying selection pressure.

Morphological description of Microcoleus asticus sp. nov. The novel specimen is a terrestrial, free-living filamentous cyanobacterium without heterocysts and akinetes. It forms dense mats in culture in both liquid and solid media. In liquid medium, the mat grows both submerged, attached to the bottom and walls of the flasks, and at the surface of the liquid, forming aggregates with pockets of air (Fig. 5a). In solid medium, it penetrates the medium and grows throughout its thickness. The filaments are dark green in color, uniseriate, straight, without false branching (Fig. 5b). Mucilage is present and visible in light microscopy; each filament is enwrapped by a single sheath (Fig. 5b).
In natural samples the filaments were solitary, but in dense cultures entangled filaments were visible with no evidence of a shared sheath (Fig. 5c). The filaments are motile and exhibit phototaxis. Cells are wider than long and rarely isodiametric, 5.87 ± 0.674 µm (CV = 11.4%) wide and 3.95 ± 0.776 µm (CV = 19.7%) long. The morphometry with the distribution density of the cell widths and lengths is presented in Fig. 5k. The minimum cell width measured was 2.90 µm and the maximum 7.62 µm; cell length had a minimum of 1.82 µm and a maximum of 6.05 µm. Filaments are cylindrical, heteropolar, with both straight and narrow ends (Fig. 5e). Tapered filaments can have a reduction of 64% in cell width towards the end (Fig. 5g). The terminal cells can have several morphologies: they can be pointed in tapered filaments, with or without a thickened membrane (calyptra) (Fig. 5g,h), rounded in straight non-calyptrate filaments (Fig. 5f), and can also present spherical, cylindrical or square hyaline membranous structures (Fig. 5i,j).

Figure 5 (partial caption). (e) heteropolar filament; (f) variation in end cell morphology in straight filaments: 1 - truncated, 2 - conical, 3 - broadly rounded; (g) tapered filament, note a reduction of 64% in cell width towards the end between the two arrows, the lines represent the cell morphometric measurements, x - length, y - width; (h) tapered filament with a calyptra (arrow); (i,j) filaments with spherical and square hyaline membranous structures (arrows); (k) distribution density of the cell widths and lengths, the white dot indicates the median, the black bar represents the interquartile range and the black line represents the 95% confidence intervals after 500 measurements. Scale bars 10 µm.

Scientific Reports | (2020) 10:8679 | https://doi.org/10.1038/s41598-020-64774-y | www.nature.com/scientificreports
Filament separation occurs by means of necridia, and two types of these cells are present in this species (Fig. 6). One type is formed by only one dying cell, and the resulting filaments are straight, small and with round ends (hormogonia) that slide away from each other inside the sheath (Fig. 6a). The other type involves several dying cells in a very specific pattern: first a swollen hyaline cell is formed in the filament (Fig. 6b,c); after that, the cells around that hyaline nodule start to degrade (Fig. 6d,e) and the filament can break at any point along the stretch of degrading cells (Fig. 6f,g). The resulting filaments present tapered ends. The dying group of cells can be observed in fluorescence microscopy with cell viability imaging reagents (Fig. 6h-l). Fig. 6h shows the filament in bright field microscopy after the separation of necridia. Observation in bright field can be misleading, since the dying cells can be interpreted as a morphologically different filament end. However, when the filament is stained with propidium iodide (Fig. 6i) and NucBlue® (Fig. 6j), the non-viable cells can be distinguished from the viable cells (Fig. 6l).
In the cell ultrastructure, constrictions at the cross-walls are visible (Fig. 7a), as are regular cyanobacterial cell inclusions such as polyhedral bodies, polyphosphate granules and cyanophycin granules (Fig. 7a,b). Gas vacuoles are absent in this species. Cell division is in one plane and perpendicular to the cell wall; membrane invaginations for the new cells are visible at several stages of development simultaneously (Fig. 7c). The cell wall structure is gram-negative, formed by the S-layer, outer membrane, periplasmic space, peptidoglycan layer and inner cytoplasmic membrane (Fig. 7d). In the cell cross-section there is a well-developed mucilaginous sheath surrounding only one filament (Fig. 7e). At high magnification, we can see the oscillin fibrils attached to the S-layer, which help the movement of the filament inside the sheath (Fig. 7f), and the excreted exopolysaccharides (Fig. 7g). Lipids and cyanophycin granules are located mainly at the cross-walls (Fig. 7a,c,g-i). The nucleoplasmic region is in the center of the cell (Fig. 7c,e). The thylakoids have a fasciculate arrangement of irregularly distributed and omnidirectional membranes (Fig. 7c-f,h-l). The fascicules run in parallel and form curves (Fig. 7i,j) to fully spherical formations (Fig. 7i,k,l).

Phylogenetic analysis of Microcoleus asticus sp. nov. To establish the phylogenetic context of this soil isolate, it is necessary to find its closest related strains. Our primary 16S rRNA based phylogenetic analysis (Fig. 8a), using a curated database of Cyanobacteria 16S rRNA, confirmed the isolate as a member of the order Oscillatoriales and a close relative of Oscillatoria nigro-viridis PCC 7112. In an effort to refine and validate this primary identification, we performed a close-up phylogenetic analysis using the 16S rRNA gene, which is widely represented in genomic databases. This analysis confirmed a close phylogenetic relationship of the isolate with other strains from the family Microcoleaceae (Fig. 8b).
Finally, we performed a refined phylogenetic analysis of a set of 64 housekeeping genes (see Supplementary Table S3) present in 11 fully sequenced Oscillatoriales strains, to unveil the relationship of the isolate with other strains from the Microcoleus vaginatus/Microcoleus autumnalis (former Phormidium autumnale) clade, as presented in Fig. 8c. Our phylogenetic analysis supports this isolate as a new species, its closest completely sequenced genomes being Microcoleus vaginatus FGP-2 and Oscillatoria nigro-viridis PCC 7112. The similarity matrix constructed from the set of 64 genes of these three genomes shows differences of 3% with M. vaginatus FGP-2 and 7% with O. nigro-viridis PCC 7112 (supplementary Table S4). We further compared the average nucleotide identity (ANI), the difference in G + C content and the in silico DNA-DNA hybridization (DDH) between our isolate and its closest relative, Microcoleus vaginatus FGP-2. The ANI analysis 38 showed an average nucleotide identity of 91.71% (Table 2). The DDH estimate 39 (GLM-based) was 46.40%, with a distance of 0.0805, a probability that DDH > 70% (same species) of 10.65%, a probability that DDH > 79% (same subspecies) of 2.28%, and a difference in % G + C of 0.44.
Here designated
Etymology: of/located in a city, city, urban
Type Locality: Lisbon city center, urban area, Av. Infante Santo, coordinates: 38°42′33.7″N 9°10′00.3″W, Portugal.
Habitat: Cyanobacterial mat in soil dominated by Nostoc sp. in a street flowerbed from an urbanized area.
Description: Filaments form dark olive-green mats and are entangled only when a dense mat is formed; otherwise they are solitary, motile and enwrapped in a single sheath. The sheath is prominent, firm, colourless and hyaline. Filaments are cylindrical, heteropolar (straight and/or attenuated towards the ends) and constricted at the cross-walls. Apical cells can be broadly rounded to conical, with or without calyptra. Cell content is granulated, with the cell inclusions visible in light microscopy. Lipid droplets are located near the cross-walls, polyphosphate granules and polyhedral bodies are in the center of the cell, and cyanophycin granules are mainly near the cross-walls but can also be scattered in the cytoplasm. Gas vacuoles are absent. Cells are wider than long and rarely isodiametric; 2.

Discussion
We have isolated and sequenced the genome of a soil cyanobacterium that is able to produce geosmin. Based on a phylogenetic study and morphological analysis, a new representative of the Microcoleus vaginatus/Microcoleus autumnalis 41,42 clade was confirmed and named Microcoleus asticus. The strong support of our phylogenetic analysis, together with the results of average nucleotide identity and DNA-DNA hybridization, allowed us to confirm that it is a new species with close proximity to Microcoleus vaginatus FGP-2, a free-living isolate collected from a desert-soil crust 43 and a member of the Microcoleaceae IV family 41,44,45. The ANI values were below the 95-96% similarity threshold 46,47 and the DDH estimate was also below the 70% similarity threshold for species boundaries 39,46,48, indicating that our isolate is a different species from M. vaginatus. The main distinctive morphological characters are the type of fasciculated thylakoid arrangement and the type of necridia cells. The curved and spherical formations of the thylakoid membranes, which resemble the centers for thylakoid connectivity described by Nevo et al. 48,49, are not described for Microcoleus. Spherical thylakoid formations are present in several distinct Cyanobacteria 44,50, but none of the reported combinations of spatial, directional, and morphological rearrangement of the thylakoids are similar to the ones present in Microcoleus asticus sp. nov. 44,50. Necridia cells are also different in Microcoleus asticus sp. nov.: besides the separation disc formed by one cell that is typically described 44, this species also presents fragmentation by a group of degrading cells. Surprisingly, the resulting filaments have tapered ends. Tapered end cells are usually attributed to filament maturation rather than to newly separated filaments in Oscillatoriales 44,51. All the other morphological characters are congruent with the genus Microcoleus sensu stricto, with the type species Microcoleus vaginatus (Vaucher) Gomont ex Gomont (1892). Nevertheless, morphology is quite similar within some groups of the order Oscillatoriales (e.g. the genus Phormidium), so genetic information is necessary for the distinction 41,42.

Figure 7 (partial caption). (g) Detail of the mucilaginous sheath with excreted exopolysaccharides (Eps); (h) fasciculate thylakoid membranes in longitudinal section, note the different directions of the membranes, some were cut longitudinally and others were cut transversely - arrows; (i) oblique section of the filaments, note the lipid droplets near the cross-walls and the cylindrical membrane structure that resembles "thylakoid centers" - arrow 48,49; (j) close-up of the oblique section evidencing the curved fascicules of thylakoids (arrow); (k) spherical formations of the thylakoid membranes - arrow; (l) spherical formations of the thylakoid membranes in high magnification - arrow. Lp - lipid droplet, Cb - carboxysomes/polyhedral bodies, Ph - polyphosphate granules, Cy - cyanophycin granules, Gy - electroyaline storage granule (glycogen granule), Ccw - cross-walls, Cw - cell wall, Nc - nucleoplasmic region with DNA and ribosomes, Thy - thylakoid membranes, Rhy - ribosomes, Phy - phycobilisomes, Sh - mucilaginous sheath.
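The species-boundary reasoning in the preceding paragraph (ANI below the 95-96% threshold and DDH below the 70% threshold) amounts to a simple threshold comparison. The Python sketch below restates it with the values reported for the isolate versus M. vaginatus FGP-2; the function name is hypothetical, and requiring both criteria, with the lower bound of the ANI range, is an assumption made here for illustration rather than a rule stated in the text.

```python
# Thresholds cited in the text for species boundaries.
ANI_SPECIES_THRESHOLD = 95.0  # lower bound of the 95-96% range (assumption: use the lower bound)
DDH_SPECIES_THRESHOLD = 70.0


def below_species_boundary(ani, ddh,
                           ani_threshold=ANI_SPECIES_THRESHOLD,
                           ddh_threshold=DDH_SPECIES_THRESHOLD):
    """Return True when both ANI and DDH fall below their species thresholds."""
    return ani < ani_threshold and ddh < ddh_threshold


# Values reported for the isolate against Microcoleus vaginatus FGP-2.
print(below_species_boundary(ani=91.71, ddh=46.40))
```

Using the lower bound of the ANI range is the stricter choice for calling two genomes different species, so a pair that passes this check also passes with a 96% cutoff.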
Microcoleus is distributed worldwide, is mainly associated with soil, aerophytic and epiphytic habitats, and is one of the main constituents of biological soil crusts 43,52-54. In soil environments Cyanobacteria play an important ecological role: they provide a nitrogen source in symbiotic relationships, produce phytohormones required for plant growth and development 55,56, and contribute to soil particle aggregation, erosion reduction, increased water penetration and retention, and nutrient recycling 57-60. Although knowledge of the diversity of soil Cyanobacteria is increasing, much is still unknown regarding species composition and bioactive compound production.
The in silico mining of the genomic data identified a complete geosmin synthesis gene cluster, and the analysis of volatile compounds allowed us to conclude that Microcoleus asticus sp. nov. IPMA8 is actively producing geosmin.
Our primary search for the geosmin synthase gene in public bacterial genomic databases allowed the identification of this gene in Gammaproteobacteria (which, to our knowledge, is reported here for the first time) as well as in Deltaproteobacteria, Actinobacteria, and Cyanobacteria. Our efforts to map the presence of the geosmin gene in Bacteria revealed a distribution restricted to these three phyla, crossing distinct genera from different niches and suggesting a possibly rich evolutionary history of the gene. In our analysis, the congruence between the bacterial species tree and the gene tree supports a probable ancient origin of the geosmin gene in Bacteria, which might have resulted from a mixture of evolutionary processes that are difficult to disentangle but that we identified as dominated by gene loss.
It is noteworthy that even in Cyanobacteria the distribution of the geosmin gene is not homogeneous: of the eight orders described for this phylum, the Oscillatoriales, Nostocales and Synechococcales harbor the majority of known geosmin producers 10,11,24,44. The gene cluster in Cyanobacteria is mostly conserved; to our knowledge the only cyanobacterial strain for which a different cluster organization was detected is Phormidium sp. (Pr_1 and Pr_2) 61. The apparent disparity of the Cyano II group when compared with the straightforward divisions in the species tree could be explained by minor specific alterations in taxon and niche, since the geosmin genes studied here appear to come from a common ancient cyanobacterium and thus are orthologous genes. The geosmin gene cluster has two global transcription regulator genes, known to modulate cellular signals associated with responses to environmental stress 62. The high conservation of the cluster arrangement in Cyanobacteria could thus indicate its high importance in controlling environmental adaptation. This is supported by results described in other studies, which indicate that the synthesis of geosmin and other volatile organic compounds could be related to defense/offense mechanisms towards other microorganisms 63,64. In Nostoc punctiforme PCC 73102, formation of geosmin results from conversion of the sesquiterpene precursor farnesyl diphosphate. However, upon expression of the enzyme in E. coli, no geosmin was produced; instead, synthesis of alternative products of geosmin synthases, such as germacradienol, germacrene D, and germacrene A, was observed, suggesting a dependency on environmental factors or on other enzymes and metabolites for the active production of geosmin 65.
Cyanobacteria and Deltaproteobacteria have a very conserved and similar gene architecture, with a three-gene operon assembly, whilst in Gammaproteobacteria and Actinobacteria this construction is not widely conserved. When the geosmin synthase sequences are examined, we find that Actinobacteria representatives are the most distant phylogenetic relatives of Cyanobacteria, while the Deltaproteobacteria protein sequences are the closest (see Fig. 3). However, when the motifs in these protein sequences are analyzed in detail, Actinobacteria have globally more similarities with Cyanobacteria, despite having more modified amino acid positions (see Fig. 4). Actinobacteria are major components of both soils and freshwaters 66. Therefore, inferences from these patterns of evolution cannot be made separately, since the patterns are probably the result of distinct evolutionary events acting on these phyla either simultaneously or in different time frames.
Even though half of the strains in our cyanobacterial dataset are associated with soil-related environments (18 out of 36), a considerable part (15) has an aquatic origin, of which only one is found in brackish waters. It is also evident that free-living terrestrial Cyanobacteria that carry the cluster, including our specimen, are more common than previously reported. This implies that Cyanobacteria are likely important contributors to geosmin production in soil. Cyanobacteria from marine environments with the genetic capability to synthesize geosmin were not identified in our survey, which is congruent with what has been described in other studies 11,15,25. However, as stated by Jüttner and Watson 25, salinity by itself does not prevent geosmin production, since Cyanobacteria from brackish and high-salinity environments are able to produce it. Additionally, there is only one marine organism in our dataset carrying the full geosmin gene cluster, the deltaproteobacterium Myxococcus fulvus HW-1 67.
Myxococcus are mostly associated with terrestrial environments, and marine specimens have only recently been discovered 67,68. From an evolutionary perspective, the salinity barrier in habitat expansion can be difficult to overcome 66. Nevertheless, according to Sánchez-Baracaldo 2015 69, the origin of marine planktonic cyanobacteria lies in freshwater and benthic marine ancestors. The brackish water Nodularia spumigena CCY9414, for example, evolved from freshwater relatives by the end of the Pre-Cambrian period 69. This strain carries the geosmin gene (geoA accession number: WP_063873980.1). So, it is possible that cyanobacterial geosmin producers are already present in marine environments, or perhaps the geosmin gene had no specific advantage in the sea and was lost over time. Interestingly, in Cyanobacteria the geosmin gene shows high sequence similarity among strains from freshwater environments, as well as higher values among strains from soil and symbiosis-related environments, reinforcing that the evolutionary history of this gene could be related to niche adaptation.
Some questions can be posed at this point: could these similarities in the metal-binding motifs and the broader protein sequence imply that Cyanobacteria acquired the geosmin gene from Actinobacteria? And what about the Cyano II group: was it a later acquisition by the cyanobacterial phylum?
(2020) 10:8679 | https://doi.org/10.1038/s41598-020-64774-y | www.nature.com/scientificreports
A congruence test of the geosmin gene tree against a well-supported Bacteria species tree revealed no evidence for any specific pattern of evolution; instead, selection appears to be acting towards the elimination of genes that probably represented excessive energy costs. Nevertheless, our scrutiny of the natural selection pressures on the geosmin gene shows clear evidence of strong positive selection when tested for both the whole bacterial phylogeny and individual codon sites. These results support our hypothesis of an early presence of the geosmin gene in Bacteria, followed by successive losses in the most derived branches of the species tree. The fixation of beneficial mutations through positive natural selection ultimately leads to differences among species and adaptation to different niches. Many bacterial strains have kept the geosmin gene and the ability to synthesize the metabolite, as is clear from the present-day producers in Actinobacteria, Deltaproteobacteria and Cyanobacteria. Recent studies report that Cyanobacteria kept the geosmin gene through purifying selection 70. Also in Cyanobacteria, a similar evolutionary history has been proposed for the cyanotoxin microcystin, which is synthesized by several genera: negative selection was the main driver of the sporadic distribution of microcystin synthesis in this phylum, with the toxin present in ancient cyanobacteria and successively lost throughout the species tree, refuting the lateral gene transfer hypothesis for this gene cluster 71.
These two examples are specific to Cyanobacteria and concern distinct genes, each with its own environmental cues, but they show that distinct evolutionary processes could be pressuring and shaping bacterial genomes, and particularly the geosmin profile in Bacteria. We identified only three Pseudomonas strains carrying the geosmin gene, and we do not know whether they are effective geosmin producers. Interestingly, several studies on geosmin biodegradation have identified strains of Pseudomonas, Sphingopyxis, Chryseobacterium, Sinorhizobium, Stenotrophomonas and Novosphingobium (Gamma-, Beta- and Alphaproteobacteria, and Actinobacteria) with potential for the bioremediation of geosmin and 2-MIB in aquaculture and water treatment facilities, often as bacterial consortia in specific niches 72–75. Several bacteria evolved to rely on less abundant and less explored carbon sources, such as monoterpenes and other volatile organic compounds, as their primary carbon source 76, which could ultimately result in the loss of the ability to synthesize terpenoids like geosmin in specific environments. How this gene was kept or lost by distinct members of Bacteria, especially in the phyla studied here, is the result of many evolutionary processes that acted simultaneously or in different time frames, in different environments and in different microorganisms. Our data support that this gene was present in an ancient bacterium, probably an ancestor of Actinobacteria, but owing to a rich evolutionary history was successively lost by several groups of Bacteria, resulting in its patchy presence in the taxa discussed.
Current knowledge of geosmin distribution points to a soil compound of bacterial origin that is strongly perceived by animals and is also endogenously produced by higher plants, where terpenoids have been linked to defense mechanisms 18,19,21,22,27,29. Nevertheless, the exact function of this compound and the adaptive advantage of retaining this gene cluster in Bacteria are still unknown. The near absence of marine geosmin-producing organisms is clear, which pushes us to pursue the elusive halotolerant Cyanobacteria carrying the geosmin gene. In fact, our study demonstrates that bacterial, including cyanobacterial, soil representatives can still be identified, diversifying the origins of geosmin producers that could have biotechnological uses. Over the last years, the increasing number of reports of geosmin occurrence in freshwaters has established Cyanobacteria as significant contributors to the production of this volatile organic compound in that environment. Likewise, the increasing number of reports of terrestrial Cyanobacteria producing geosmin indicates that they are also key players in its production in soil. Our study demonstrates that geosmin in soil can have multiple cyanobacterial origins resulting from intrinsic factors and environmental drivers during evolution.

Methods
Sample collection, isolation and culture maintenance. The cyanobacterial specimen used in this study was obtained from soil mats of the cyanobacterium Nostoc sp. from an urbanized area in Lisbon, Portugal. A sample of the Nostoc sp. mat was collected and washed with Z8 culture medium. The washings were observed under a Leica® DMi8 inverted microscope, and cyanobacterial filaments were isolated by micromanipulation with a glass capillary. Only one filament was picked at a time, washed several times in fresh culture medium and placed in a culture flask to propagate. Successfully established cultures were maintained at 20 ± 1 °C with a light intensity of 10 μmol photons m−2 s−1 and a 12:12 h light:dark cycle.
Morphological characterization. The specimen was studied using both optical and electron microscopy.
The morphological characters evaluated were: filament color and shape, cell dimensions (length and width), constrictions at the cross-walls, shape of apical cells, and presence/absence of sheath, false branching, calyptra and necridia 44. The ultrastructural features analyzed were: arrangement and vacuolization of the thylakoids, type of cell division and cellular inclusions 44. Images were captured with a Leica® DFC7000T digital camera on a Leica® DMi8 inverted fluorescence microscope at 1000× magnification. Cell measurements were performed using Leica® LAS V4.12.0 software (2017); at least 500 measurements were made on 100 different filaments. The description and identification of the cyanobacterial specimen were based on the following manuals and bibliography 40–42,44,51,77.
Cell viability was assessed in live cells using the ReadyProbes™ Cell Viability Imaging Kit (Blue/Red) from Invitrogen™, composed of NucBlue® Live reagent and propidium iodide, following the manufacturer's instructions. The cells were visualized under the Leica® DMi8 inverted fluorescence microscope and photographs were taken with a Leica® DFC7000T camera.
The cell ultrastructure was analyzed by transmission electron microscopy following the method described in Churro et al. 78 [...] ferrocyanide in 0.1 M cacodylate buffer for 3 h on ice and protected from light. Cells were then washed three times in 0.1 M cacodylate buffer for 10 min. To increase cell contrast, cells were incubated with 0.5% uranyl acetate for 1 h in the dark at room temperature and rinsed with distilled water for 15 min. Cells were dehydrated in a graded ethanol series (30, 50, 70 and 95%, 20 min each, and absolute ethanol 3 × 20 min) at 4 °C, with a final step of propylene oxide for 10 min. After dehydration, cells were embedded in Spurr's low-viscosity epoxy resin overnight, placed in flat embedding silicone rubber molds and polymerized at 60 °C for 24 h. Sections were post-stained with 2% uranyl acetate (5 min in the dark) and lead citrate (5 min). Sections were examined with a Hitachi® H-7650 transmission electron microscope.
DNA extraction and sequencing. DNA was extracted from the cultures using the DNeasy Plant Mini Kit (Qiagen®). The total DNA concentration was quantified using Qubit™ Fluorometric Quantitation (Thermo Fisher Scientific®). DNA libraries were prepared using the Nextera DNA Library Preparation Kit, and the purified DNA was sequenced on an Illumina NextSeq 500 (Illumina, San Diego, CA, USA) by the Genomics Facility of Instituto Gulbenkian de Ciência in Oeiras, Portugal. Genome assembly procedures followed those we have previously described for other genomic studies 79.
Genomic analysis, assembly and general features. The genome was assembled using SPAdes v3.6.10 80.
We used QUAST 81 to calculate the quality statistics of the genome assembly. The genome was annotated with PROKKA v1.11 82 and RAST v2.0 83 using default parameters to identify putative genes (coding and non-coding sequences). Orthologous protein clusters were identified with eggNOG v4.5 84, and CRISPR repeats, typical in Cyanobacteria, were identified with the CRISPR Recognition Tool (CRT) v1.1 85, considering a minimum of 3 repeat units. Transmembrane topology and signal peptide sites were predicted using Phobius 86. To evaluate the presence and possible origin of prophage sequences, these sequences were identified and annotated using PHASTER 87. The quantitative assessment of assembly and annotation completeness was performed with BUSCO v3 88, by comparison with BUSCO's orthologue database set for Cyanobacteria, odb9. We used the online tool antiSMASH v4.0 89 to identify non-ribosomal peptide synthase (NRPS), polyketide synthase (PKS) and hybrid NRPS/PKS gene clusters, and other domains of secondary metabolites produced by Cyanobacteria.
Phylogenetic analysis. To address the phylogenetic relationships of Microcoleus asticus sp. nov. IPMA8 within the cyanobacterial phylum, we inferred a phylogenetic tree from the 16S rRNA gene sequences of 118 strains from a public dataset of cyanobacterial 16S rRNA genes collected in the CyanoType v.1 database 90. This is a curated database of 16S rRNA gene sequences from 317 relevant cyanobacterial strains, which aided our initial identification of kin strains of Microcoleus asticus sp. nov. IPMA8, and from which we used the compressed set of 118 representative cyanobacterial strains. The 16S rRNA gene sequences were aligned with MAFFT 91 and trimmed using Gblocks v0.91b 92,93. The ML phylogeny was calculated with IQTree 94 with 1,000 bootstraps under the GTR + I + G model, the best-fit substitution model proposed by this algorithm. Subsequently, to refine the phylogenetic relationships of our strain and validate this gene tree, we calculated a phylogenetic tree of 12 Oscillatoriales strains from their fully sequenced genomes, built from the concatenation of 64 gene sequences common to all 12 genomes (see supplementary Table S3). Gene sequences were separately aligned with MAFFT and trimmed using Gblocks v0.91b. To infer ML phylogenies, we used IQTree to compute and support the trees with 1,000 bootstraps using a GAMMA distribution and the LG + I + F model, the best-fit substitution model proposed by this algorithm. Bayesian phylogenies were inferred with MrBayes v3.2 95 for 100,000 generations using the same model and a burn-in of 25% of the initial generations. To further investigate the relationship of Microcoleus asticus sp. nov. IPMA8 to other Cyanobacteria from the order Oscillatoriales, we used the 16S rRNA gene sequences of a set of 23 publicly available cyanobacterial sequences. Trees were computed using RAxML under the GTR + I + G model 96, which we confirmed using PartitionFinder v.2.1.1 97, with 500 bootstraps.
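The concatenation step used for the 12-genome tree can be illustrated with a minimal sketch. It assumes each gene has already been aligned and trimmed, and that every alignment is held as a mapping from strain name to aligned sequence; the function name, strain names and toy sequences are hypothetical and do not come from the study.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments (strain -> aligned sequence) into one supermatrix."""
    if not gene_alignments:
        raise ValueError("no alignments given")
    taxa = sorted(gene_alignments[0])
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(gene_alignments):
        # Every gene must be present in every genome, as for the shared genes above.
        if sorted(alignment) != taxa:
            raise ValueError(f"alignment {index} does not cover the same taxa")
        # All rows of one alignment must have the same number of columns.
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data (hypothetical): two genes, three strains.
gene_1 = {"strain_A": "MK-L", "strain_B": "MKAL", "strain_C": "MRAL"}
gene_2 = {"strain_A": "GGT", "strain_B": "GAT", "strain_C": "G-T"}
supermatrix = concatenate_alignments([gene_1, gene_2])
print(supermatrix["strain_A"])  # MK-LGGT
```

Keeping the genes in a fixed order for every strain is what allows a single substitution model, or a per-gene partition scheme, to be applied to the resulting supermatrix.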

Analysis of the geosmin synthase gene in Bacteria.
To interrogate the phylogenetic relationships of the geosmin synthase genes in Bacteria, we inferred a gene tree from the geosmin synthase genes available in GenBank (accessed in Oct. 2018). Candidate geosmin synthase genes were identified by BLAST comparison with a reference geosmin synthase gene from Cylindrospermum stagnale PCC 7417, with a cut-off value of 50% sequence identity. We collected complete and partial geosmin synthase sequences from available databases (for several Nostoc strains the gene was only partly sequenced) from 66 representative bacterial strains from four distinct bacterial groups. Several putative gene sequences from the three bacterial phyla were identified, including 36 cyanobacterial gene sequences, three Gammaproteobacteria, 14 Deltaproteobacteria and 13 Actinobacteria strains, covering all identified genera and strains of known geosmin producers and non-producers. Gene sequences were aligned with MAFFT and trimmed with Gblocks, resulting in an alignment of 273 amino acid positions, which included the N-terminal part of the gene. This alignment was used to calculate the phylogenetic tree of the geosmin synthase gene, with a sesquiterpene cyclase (CAM04173.1) as outgroup 98. To infer ML phylogenies, we used IQTree to calculate and support the trees with 1,000 bootstraps under the LG + I + G4 model, the model proposed by this algorithm. To corroborate this tree, Bayesian phylogenies were calculated with MrBayes v3.2 for 100,000 generations using the same model and a burn-in of 25% of the initial generations.

Visualization of geosmin synthase gene sequence variation.
To evaluate sequence similarities between the geosmin synthase genes in our set of 66 bacterial strains, we compared the geosmin gene tree described in the previous section with a heatmap representing pairwise sequence similarity values between all the gene sequences. The percent identity matrix was calculated from the trimmed amino acid alignment using the Clustal Omega online tool 99,100, and the graphical representation was produced with the online tool Evolview 101,102.
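As an illustration of what a percent identity matrix contains, the sketch below computes pairwise identities from an already aligned set of sequences. It is not the Clustal Omega implementation: the convention of skipping columns where either sequence has a gap is an assumption made here, and the function names and toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(alignment):
    """Return the sequence names and a square matrix of pairwise percent identities."""
    names = list(alignment)
    matrix = [
        [percent_identity(alignment[row], alignment[col]) for col in names]
        for row in names
    ]
    return names, matrix


# Toy alignment (hypothetical sequences).
toy = {"s1": "MKTAY", "s2": "MKSAY", "s3": "M-TAF"}
names, matrix = identity_matrix(toy)
for name, row in zip(names, matrix):
    print(name, [round(value, 1) for value in row])
```

A matrix of this shape, with rows ordered as the leaves of the gene tree, is the kind of input a heatmap next to a phylogeny requires.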

Detection of recombination.
To detect adaptive evolution in the bacterial geosmin gene, the rates of synonymous and non-synonymous substitutions were estimated using algorithms available on the Datamonkey.org web server 103. Possible recombination was detected with the Genetic Algorithm for Recombination Detection (GARD) 104, a method that identifies recombination and can be used as a pre-analysis for the selection inference tests performed afterwards.
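The distinction between synonymous and non-synonymous differences can be made concrete with a small sketch. It only tallies codon differences between two aligned coding sequences using the standard genetic code; it is a naive count, not one of the likelihood-based Datamonkey methods, and it does not normalize by the number of synonymous and non-synonymous sites, so it does not yield dN/dS. The function name and sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    synonymous = nonsynonymous = 0
    for start in range(0, len(seq_a), 3):
        codon_a = seq_a[start:start + 3].upper()
        codon_b = seq_b[start:start + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# TTT -> TTC keeps phenylalanine; GGA -> GAA changes glycine to glutamate.
result = count_codon_differences("ATGTTTGGA", "ATGTTCGAA")
print(result)  # (1, 1)
```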
We used the protein alignment prepared for the geosmin gene distribution analysis in the 3 bacterial phyla, which we back-translated to DNA, as suggested by the software's documentation, using the EMBOSS Backtranseq v6.6.0 online tool 105.
Natural selection scanning. To detect adaptive (positive) or purifying (negative) selection in the geosmin gene in Bacteria, we used the Datamonkey web server 103 algorithms SLAC (Single-Likelihood Ancestor Counting) and FUBAR (Fast Unconstrained Bayesian AppRoximation). Finally, MEME (Mixed Effects Model of Evolution) was used to identify individual sites also subject to positive or purifying selection. The global ratio of non-synonymous nucleotide substitutions (dN) to synonymous nucleotide substitutions (dS) was employed as an indicator of positive or negative selection pressure. Diversifying or positive selection dominates when dN/dS > 1, dN/dS < 1 implies purifying or negative selection, and dN/dS = 1 is suggestive of neutral evolution.
Chemical analysis of geosmin production. The cyanobacterial culture was tested for geosmin production using SPME GC-MS at iBET, Instituto de Biologia Experimental e Tecnológica, by the Food and Health Division Laboratory (Oeiras, Portugal). Solid phase microextraction (SPME) was used for the extraction of volatile compounds from the cyanobacterial cultures. Briefly, 2.5 mL of sample (mix of fresh algae culture, 8 mL of ddH2O and 3 g of NaCl) were measured into a 10 mL headspace vial (La-Pha-Pack®) and capped with a white PTFE silicone septum (Specanalitica). The SPME operating conditions were: extraction temperature 40 °C for 40 min, rotating speed 250 rpm, agitation for 10 s, desorption time 5 min at 260 °C. Analyses were carried out in a GCMS-QP2010 Plus (Shimadzu®) equipped with an AOC-5000 autosampler (Shimadzu®). A divinylbenzene/Carboxen/polydimethylsiloxane (DVB/Car/PDMS) fiber (SUPELCO Analytical, Bellefonte, PA, USA) was used for headspace SPME sampling. A ZB-5MSi capillary column (Zebron, Phenomenex®; 30 m, 0.25 mm internal diameter, 0.25 µm film thickness) was used for the analysis. The working conditions were: injector temperature 260 °C, injection mode splitless, detector temperature 250 °C. High-purity helium (≥99.999%) was used as the carrier gas at 2.00 mL min−1. The column oven temperature was kept at 50 °C for 3 min, increased to 180 °C at a rate of 8 °C min−1 and maintained for 8 min, then increased to 230 °C at 25 °C min−1 and maintained for 1 min. The interface and ion source temperatures in the MS were 250 °C. Mass spectra were acquired in Electron Ionization (EI) mode at 70 eV in an m/z range of 29–299 with a scan speed of 555 scans s−1. Geosmin in samples was detected in Single Ion Monitoring (SIM) mode using the characteristic ions m/z 111, 112, 125 and 126. A geosmin solution (100 µg μL−1 in methanol; Sigma-Aldrich®) was used as the standard for GC-MS analysis. A dilution series from 100 µg ml−1 to 5 ng ml−1 was prepared from the geosmin standard to test the response and sensitivity of the GC-MS method. Geosmin was identified using the mass spectral libraries NIST 21, 27, 107 and 147 and Wiley 229.
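The column oven program described above implies a total run time that is not stated in the text. A minimal sketch of the arithmetic follows, assuming linear ramps and no equilibration or cool-down time; the function name is hypothetical.

```python
def oven_program_minutes(initial_temp, initial_hold, ramps):
    """Total duration of a GC oven program.

    ramps is a list of (rate in degC/min, target temp in degC, hold in min).
    """
    total = initial_hold
    current = initial_temp
    for rate, target, hold in ramps:
        total += (target - current) / rate + hold  # ramp time plus hold time
        current = target
    return total


# 50 degC for 3 min; 8 degC/min to 180 degC, hold 8 min; 25 degC/min to 230 degC, hold 1 min.
run_time = oven_program_minutes(50, 3, [(8, 180, 8), (25, 230, 1)])
print(run_time)  # 30.25
```

Under these assumptions the chromatographic run lasts 30.25 min: 3 min initial hold, 16.25 min first ramp, 8 min hold, 2 min second ramp and 1 min final hold.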

Data availability
The nucleotide sequence data are available at DDBJ/EMBL/GenBank under the accession number PRJNA531034. The holotype was deposited in the Herbarium of the University of Coimbra (COI), Coimbra, Portugal (http://www.uc.pt/en/herbario_digital/) in a metabolically inactive state, as preserved material, under the identifiers COI00097100, COI00097101, COI00097102 and COI00097103; the information is available online at https://coicatalogue.uc.pt/index.php. Cultures of Microcoleus asticus were deposited in the Collection of Microalgae of the University of Coimbra (ACOI) at the Department of Life Sciences, University of Coimbra, Coimbra, Portugal, and in the Culture Collection of Cyanobacteria of the Portuguese Institute for Sea and Atmosphere, I. P. (IPMA, IP), Av. Alfredo Magalhães Ramalho, 6, 1495-165 Algés, under the strain identifiers ACOI 3416 and IPMA8, respectively. Phycobank registration: http://phycobank.org/102090.