New cyanobacterial genus Argonema is hiding in soil crusts around the world

Cyanobacteria are crucial primary producers in soil and soil crusts. However, their biodiversity in these habitats remains poorly understood, especially in the tropical and polar regions. We employed whole genome sequencing, morphology, and ecology to describe a novel cyanobacterial genus, Argonema, isolated from Antarctica. Extreme environments are renowned for their relatively high number of endemic species, but whether cyanobacteria are endemic or not is still debated. Determining whether a cyanobacterial lineage is endemic requires a time-consuming, elaborate, and expensive global sampling effort. Thus, we propose an approach that helps to overcome the limits of the sampling effort and to better understand the global distribution of cyanobacterial clades. We employed the Sequence Read Archive (SRA), which provides a rich source of data from thousands of environmental samples. We developed a framework for characterizing the global distribution of any microbial species using the SRA. Using this approach, we found that Argonema is actually cosmopolitan in arid regions. This finding provides further evidence that endemic microbial taxa are likely to be much rarer than expected.

and Argonema antarcticum (Figs. 6-8). Trichomes of A. galeatum appear straighter (Fig. 2), while trichomes of A. antarcticum form waves (Fig. 6) and loops (Fig. 7). Scale = 10 µm, wide arrow = necridic cells, arrowhead = granules, asterisk = colored apical cell, circle = empty sheath.
www.nature.com/scientificreports/
purple-brown to almost black (Fig. 11b). Cell content often granulated. Reproduction by necridic cells and subsequent breaking of the filaments into hormogonia (Fig. 11a,c).
(Figs. 5-8). Filaments are wavy, gray-green to brown-green in color. The sheaths are colorless to light brown, distinct, and variable in length. The filament can protrude from the sheath or the sheath can exceed the filament. No true branching was observed. Trichomes are cylindrical, not attenuated or slightly attenuated towards the end with a concave apical cell, slightly or not constricted at cell walls (Fig. 11d). Necridic cells present (Fig. 11e), reproduction by hormogonia. The morphological description was based on both culture and fresh material.
Reference strain: Argonema antarcticum A004/B2.
Fig. 9). Cell length is less variable between the two species, with both averaging 1.7 µm (Fig. 10). The difference in cell length was not statistically significant (nested ANOVA, p = 0.7261). The distinctly colored concave apical cell was observed only in A. galeatum strains (Fig. 11). No true branching and no aerotopes were observed in either species. We also observed the morphology of uncultured filaments in the native sample. These filaments matched the morphological characteristics of cultured Argonema strains (Supplements, Fig. S1).
Phylogeny. Phylogeny based on the 16S rRNA gene using Bayesian inference, maximum likelihood, and maximum parsimony revealed that strains of Argonema form a distinct monophyletic clade among other Oscillatoriales (Fig. 12). The A003 strains formed one clade, while the two A004 strains formed a highly supported sister clade, suggesting that they may be different species. The A003 clade also contains an uncultured Antarctic cyanobacterium clone AY151731 30. The closest clade to that formed by Argonema lineages was a clade containing Cephalothrix komarekiana and Aerosakkonema funiforme (Fig. 12). The second closest clade contained Potamosiphon australiensis, Microseira wollei, Phormidium irriguum, and Phormidium ambiguum. Argonema formed a distinct clade different from Oscillatoria sensu stricto and Phormidium sensu stricto 31. There are two significant insertions in the 16S-23S ITS present in A. antarcticum strains that are 9 bp and 6 bp long (Supplements, Fig. S2). We also estimated the secondary structure of the D1-D1' and Box B helices in the 16S-23S ITS. The secondary structures of A. galeatum and A. antarcticum were identical (Supplements, Fig. S3).
The 16S rRNA sequence is often not sensitive enough for species delimitation in cyanobacteria 20, so we investigated the whole-genome sequence for each putative species. Two strains were selected for whole genome sequencing: one A. galeatum strain (A003/A1) and one A. antarcticum strain (A004/B2). Both strains have genome sizes similar to those of other filamentous cyanobacteria (6-8 Mb) 32,33. Both genomes have very similar, average GC content, but differ slightly in the number of coding sequences and number of RNAs (Table 1). We obtained 117 annotated cyanobacterial genomes from the NCBI database for phylogenomic reconstruction (Fig. 13). Both strains of Argonema clustered with genomes of Phormidium sp. LEGE05292 and Phormidium ambiguum (strain IAM M-71), as in the 16S rRNA phylogeny. Argonema thus belongs to the order Oscillatoriales and the family Oscillatoriaceae sensu Komárek et al. 18.
We used whole genome average nucleotide identity (ANI) as additional evidence for erecting a new species. The ANIb score between A. galeatum A003/A1 and A. antarcticum A004/B2 was 92.16-92.55%, and the ANIm score was 93.73-93.74%. Both of these scores are lower than the 95-96% threshold for species delimitation 34 (Supplements, Table S1).
Taken together with the morphological differences, this evidence supports the recognition of two species within Argonema: A. galeatum and A. antarcticum.
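The ANI comparison above amounts to checking each reported range against the delimitation threshold. A minimal sketch in Python, assuming the lower bound of the 95-96% range (95.0) as the cutoff; the function name and the rule that every value must reach the cutoff are illustrative assumptions and are not part of JSpecies.

```python
# Assumption: use the lower bound of the 95-96% species threshold as the cutoff.
ANI_SPECIES_THRESHOLD = 95.0


def same_species(ani_values, threshold=ANI_SPECIES_THRESHOLD):
    """Return True only if every ANI value (in percent) reaches the threshold."""
    return all(value >= threshold for value in ani_values)


# ANI ranges reported in the text for A003/A1 versus A004/B2.
anib_range = (92.16, 92.55)
anim_range = (93.73, 93.74)

for name, values in (("ANIb", anib_range), ("ANIm", anim_range)):
    verdict = "same species" if same_species(values) else "different species"
    print(f"{name} {values[0]}-{values[1]}%: {verdict}")
```

Both ranges fall below the cutoff, which matches the conclusion that the two strains represent separate species.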
We also calculated average percent similarity between Argonema galeatum and Argonema antarcticum strains and closely related strains, based on 16S rRNA phylogeny (Supplements,  , Table S3). Matches with amplicon sequencing were considered positive if at least one 16S rRNA sequence with 97% or higher identity and read length over 100 bp was recovered. Matches from metagenomic sequencing were considered positive if the coverage by reads was at least 10x. The datasets belonged to uncultured cyanobacteria from soil samples or soil crusts from diverse geographic locations, most notably the USA (California, Utah, New Mexico, and the Mojave Desert), China (Tengger Desert, Tibetan Plateau, Gurbantunggut Desert), and Israel, but also Svalbard, Antarctica, Spain, Germany, Austria, Australia, India, and Oman (Fig. 14).
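The 10x coverage criterion for metagenomic datasets can be illustrated with a short calculation. This is a sketch under stated assumptions: coverage is taken as mean depth (total aligned bases divided by reference length), the reference length of 1500 bp is a hypothetical value for a 16S rRNA gene, and the function names are illustrative. The text does not specify how coverage was computed.

```python
def mean_coverage(aligned_lengths, reference_length):
    """Mean depth: total aligned bases divided by the reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return sum(aligned_lengths) / reference_length


def is_metagenome_hit(aligned_lengths, reference_length, min_coverage=10.0):
    """Apply the 'at least 10x' rule stated in the text."""
    return mean_coverage(aligned_lengths, reference_length) >= min_coverage


# Hypothetical dataset: 120 mapped reads of 150 bp against a 1500 bp reference.
reads = [150] * 120
print(mean_coverage(reads, 1500))      # 18000 / 1500
print(is_metagenome_hit(reads, 1500))
```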
We also searched the NCBI nucleotide database for partial 16S rRNA sequences of uncultured cyanobacteria similar to Argonema, and constructed a maximum likelihood tree from sequences with 97% or more similarity to Argonema (Supplements, Fig. S4). We found 15 short partial 16S rRNA sequences in the NCBI database (< 413 bp). These sequences also came from diverse geographical locations (Supplements, Table S3), mostly from Antarctica, notably from the McMurdo Dry Valleys, but also from Nepal, Pakistan, and even Luxembourg (Fig. 14).
In total, we discovered 57 possible Argonema matches at 57 geographical localities. Samples were generally collected from soils or soil crusts, but some were collected from melt-water lakes in Antarctica and one from a freshwater lake (Luxembourg). Samples came from geographical areas with annual precipitation lower than the global average (950 mm), with most samples from areas with annual precipitation lower than 500 mm (Supplements, Table S3).
Discussion. The diversity of soil cyanobacteria remains largely unexplored. Many taxa have been erected in the last decade, but their geographical distribution is still largely unknown outside of the type locality. In this study, we describe a new genus of cyanobacteria with two species. This genus was first found on James Ross Island, Western Antarctica, but we show that it has a cosmopolitan distribution in soils and soil crusts. Our findings stress that our knowledge of the diversity of microbes is still very limited. We provide evidence that the concept of endemism in microbes is heavily dependent on sampling effort, even in extreme environments such as Antarctica 35. Moreover, sequence data archives represent a rich source of data for studying the distribution of newly discovered microbes.
Argonema forms a distinct and highly supported clade among other Oscillatoriales, based on both phylogenetic (16S rRNA) and phylogenomic data. Phylogeny based on whole genomes clustered Argonema strains as a sister clade to a clade consisting of Phormidium ambiguum and Phormidium irriguum, which belong to the genus Phormidium sensu stricto according to Komárek 31. Currently, no whole-genome sequence is available for any of Cephalothrix, Aerosakkonema, Potamosiphon, or Microseira, so these genera could not be included in the phylogenomic analysis. Based on the phylogenomic and phylogenetic analyses, the genus Argonema belongs to the order Oscillatoriales and the family Oscillatoriaceae sensu Komárek et al. 18.
Based on 16S rRNA phylogeny, we recognize two separate clades in Argonema representing two species: A. galeatum and A. antarcticum. Additional evidence supporting a division of Argonema into two species was provided by the estimation of ANI values 36. Both the ANIb and ANIm scores between A. galeatum and A. antarcticum were lower than the 95-96% threshold for species delimitation. Interestingly, the genomes of A. galeatum and A. antarcticum were very similar in GC content, the number of genes, and overall size. In cyanobacteria, genome size and GC content are often connected with adaptations to a new habitat or environment 33,37, so the similar GC content and genome size in Argonema species are likely a result of adaptation to the same environment. Further evidence supporting the division of Argonema strains into two species is the presence of two significant insertions/deletions in the 16S-23S ITS region (Supplements, Fig. S2). We also calculated the average percent similarity between Argonema galeatum and Argonema antarcticum strains and closely related strains, based on 16S rRNA phylogeny (Supplements, Table S2). Generally, the average percent similarity between A. galeatum and A. antarcticum was quite high, at 98.7%. We believe, however, that the division of Argonema strains into two species is sufficiently supported by other evidence.
The two proposed species A. galeatum and A. antarcticum can be differentiated based on their 16S rRNA phylogeny, whole genome phylogeny, and morphological features. Compared to other prokaryotes, cyanobacteria can possess sufficient variability of morphological traits to be used for identification. Thus, we performed a detailed analysis of the morphology of the Argonema strains. Some filaments of A. galeatum possess distinct colored apical cells, which were not observed in A. antarcticum. 
The distinct apical cell may play a role in burrowing of the trichomes into the substrate, as in other soil filamentous cyanobacteria such as Microcoleus vaginatus 38. It is currently unclear whether the apical cell itself is dark colored, or whether it is covered by a dark structure, e.g. the apex of a sheath. Filaments of A. galeatum were generally straight, whereas filaments of A. antarcticum formed waves and loops. Trichomes of A. antarcticum are significantly wider than those of A. galeatum. A. antarcticum also differed in color from A. galeatum, with trichomes being more brown-green to gray-green, rather than blue-green.
Argonema is morphologically similar to other Oscillatoriales, but it can be differentiated from its closely related genera based on morphology. Argonema can be distinguished from Cephalothrix based on the absence of aerotopes in cells. Trichomes of Cephalothrix komarekiana are also narrower, at 4.8-7.3 µm wide. Apical cells of C. komarekiana trichomes can be capitate and calyptras can be present. The apical cells of Argonema galeatum can be distinctly colored, but no calyptras were observed in either A. galeatum or A. antarcticum. Also, while Argonema is a predominantly soil cyanobacterium from areas with generally low precipitation, the genus Cephalothrix is found in tropical or sub-tropical freshwater environments. The type species, Cephalothrix komarekiana, is a tropical freshwater species isolated from the Brazilian Pantanal wetlands 39. Aerosakkonema funiforme is also a freshwater cyanobacterium, isolated from a mesotrophic reservoir in the Lao People's Democratic Republic 40. Aerosakkonema can also be differentiated from Argonema by the presence of small aerotopes in cells. Trichomes of Aerosakkonema funiforme are wider, at 11.7-16.6 µm (compared to an average of 6.5-9.2 µm in Argonema), and do not have sheaths. Argonema can also be morphologically and ecologically differentiated from Potamosiphon australiensis and Microseira wollei. Potamosiphon australiensis was originally isolated from the benthos of a freshwater stream in Australia 41. Filaments of P. australiensis are cylindrical, approximately 20-22 µm wide, not or slightly constricted at cell walls, and encased in distinct and sometimes lamellated sheaths. Two or more filaments often share one sheath following diagonal division. P. australiensis reproduces via motile hormogonia, which often form in series. Microseira wollei is a mat-forming cyanobacterium described from freshwater environments in Australia 42. Filaments of M. wollei are significantly wider than filaments of Argonema, at 30-65 µm, and encased in a distinct lamellated sheath. False branching was also observed in Microseira, but not in Argonema. Distinctly colored apical cells were not observed in either Potamosiphon or Microseira.
Argonema can also be morphologically differentiated from Phormidium ambiguum and Phormidium irriguum, which belong to Phormidium sensu stricto 31. P. ambiguum was described in 1892 43 from northern Germany as a freshwater/marine species, bright blue-green to yellow-green in color, with trichomes 4.5-7.5 µm wide, not attenuated, not capitate, with thin distinct sheaths. However, Compère 44 described a variant, P. ambiguum var. major, that has wider trichomes at 9.5 µm wide, which are shortly attenuated towards the end and have a calyptra. Currently, there is a partial 16S sequence (AB003167) 45 and a whole genome sequence of P. ambiguum available in the NCBI database (strain IAM M-71) 46, but there are no morphological data available to assess which morphological variant it is. Phormidium irriguum is a similar case, as several morphotypes differ significantly. Anagnostidis & Komárek 47 described P. irriguum as blue-green or grayish in color, with cells 6-11.2 µm wide and 4-11 µm long. Apical cells are convex, slightly capitate, and with thickened cell walls. Sciuto et al. 48 described two variants of P. irriguum. One, CCALA 759, has cells that are 9-12 µm wide and 2-3 µm long, and the second, strain ETS-02, has cells 3-5 µm wide and 1-2 µm long, trichomes with rounded apical cells, and colorless sheaths. Neither of these matches P. irriguum as described by Anagnostidis & Komárek 47. Furthermore, the partial sequence of the 16S rRNA gene for P. irriguum CCALA 815, annotated by Strunecký et al. 49, bears a high identity (99.24%) with the 16S rRNA sequence of Tychonema bourrellyi LT546478, annotated by Salmaso et al. 50, which is described as having trichomes 4-6 µm wide, purple or red-brown in color 47. In this case, it might be a result of misidentification of P. irriguum as T. bourrellyi by Salmaso et al. 50. Both P. ambiguum and P. irriguum also differ from Argonema in ecological requirements, as P. irriguum was first isolated from a mossy rock surface in Switzerland 47 and P. ambiguum is marine/freshwater.
No previously described species morphologically or ecologically matches Argonema. The most similar morphospecies is Oscillatoria subproboscidea 51, which was described from Antarctic coastal lakes. O. subproboscidea is quite similar to Argonema strains in cell dimensions (cells 8.2-9 µm wide and 3-4 µm long). It was described as having suddenly attenuated and frequently uncinated, never capitate filaments. We observed some filaments that were suddenly attenuated in Argonema strains, especially in native samples, but only rarely. McKnight et al. 52 also described two morphotypes of O. subproboscidea, one that had filaments 7.5-10 µm wide with calyptrate apical cells and sheaths, and a second morphotype that was 9-12 µm wide with numerous granules and possible aerotopes. While the first morphotype described by McKnight et al. could be morphologically similar to Argonema, the second is likely a completely different filamentous cyanobacterium 52. In Nadeau et al. 53, O. subproboscidea is mentioned as Phormidium subproboscideum and there is even a partial 16S rRNA sequence available in the database (Phormidium sp. Ant-brack-3, AF263332). However, the p-distance between Argonema galeatum strain A003/A1 and Phormidium sp. Ant-brack-3, as computed by MEGA X software, was 0.08 (92% similarity), which further supports the hypothesis that it belongs to a different cyanobacterial clade. Broady & Kibblewhite 54 described several morphotypes of Antarctic Phormidium strains, one of which is morphologically very similar to Argonema galeatum, as it is described as having cells 8.2-10.9 µm wide and 2-6 µm long with sheaths and distinct apical cells. This strain was also described as Phormidium subproboscideum by Broady & Kibblewhite 54, but there are no molecular data available on this strain. 
It is possible that Oscillatoria subproboscidea (or its alternative Phormidium subproboscideum) is a species belonging to Argonema genus. Oscillatoria subproboscidea was described more than 100 years ago and there are no genetic data available to assess its relatedness to our strains based on molecular phylogeny.
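The p-distance quoted above for Phormidium sp. Ant-brack-3 is the proportion of aligned sites at which two sequences differ, and percent similarity follows as 100 × (1 − p). A minimal sketch in Python using toy sequences, not the actual 16S rRNA alignment; skipping gapped sites (pairwise deletion) is an assumption here, since the text does not state which gap treatment was used in MEGA X.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion,
    an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def percent_similarity(p):
    """Convert a p-distance into percent similarity."""
    return 100.0 * (1.0 - p)


# Toy aligned sequences with one mismatch in ten sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))
# The reported p-distance of 0.08 corresponds to 92% similarity.
print(round(percent_similarity(0.08), 2))
```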
Another notable morphologically similar species is Oscillatoria annae 55, which is described as having dull green trichomes and isodiametric cells, 7.5-8 µm wide and 1.5-3 µm long. Trichomes are straight at the end, and apical cells are conically narrowed, rounded, and without calyptras. No colored apical cells, distinct necridic cells, or distinct sheaths were observed in O. annae and, unlike Argonema, which is terrestrial, O. annae was described as a freshwater benthic cyanobacterium from temperate and tropical regions. Another morphologically similar species is Oscillatoria tenuis var. levis 56, which is described as having straight trichomes, blue-green to purple-gray in color, irregularly granulated, not or slightly constricted at crosswalls, with cells 6-11 µm long and 2-3.5 µm wide. O. tenuis can have apical cells with a thickened outer cell wall, but no sheaths or distinct necridic cells were observed. O. tenuis was also described as a predominantly benthic freshwater cyanobacterium from tropical regions, although possibly cosmopolitan. Another possibly similar cyanobacterium is Lyngbya antarctica, which was described by Gain 57 as having pale brownish to blue-green filaments with distinct sheaths and cells 7.5-9 µm wide and 1-1.5 µm long, with calyptrate apical cells. This taxon is, however, problematic. It was discovered in Antarctica, but it was described only once, by Gain, and there are no further records or data available for this species, so its taxonomic status is uncertain.
We also analyzed the secondary structures of the 16S-23S ITS of A. galeatum and A. antarcticum, to further test the hypothesis that they belong to two distinct species. Although the secondary structures turned out to be identical, we have extensive evidence based on morphological, phylogenetic, and phylogenomic reconstructions to differentiate A. galeatum and A. antarcticum.
We searched the NCBI SRA and nucleotide databases for partial sequences similar to our Argonema sequences to assess the possible distribution of the genus Argonema. Our approach allows sequence database mining without previous knowledge of the taxonomic diversity within metagenomic samples. No taxonomic scheme (SILVA, https://www.arb-silva.de/; NCBI, https://www.ncbi.nlm.nih.gov/; RDP, https://rdp.cme.msu.edu/) needs to be applied. We have only limited knowledge about the distribution of most of the cyanobacterial taxa erected based on molecular data within the last two decades. Using this method, the distribution of new taxa can be evaluated upon their description, which provides a rich source of information about the importance of a particular organism. The only disadvantage is that the raw reads in the NCBI SRA cannot be efficiently searched via an online form. The data must be downloaded locally. The search also produces large files. Although the files can be compressed, they still occupy terabytes of storage.
We discovered multiple sequences of Argonema from several soil or soil crust metagenomic samples from various geographical locations (e.g., Antarctica, China, USA, Israel, Svalbard, etc.). This indicates that Argonema is not endemic to Antarctica, but might in fact be cosmopolitan and simply had not been properly described before. The annual precipitation in the sampling locations where the sequences of potential Argonema strains were obtained was generally lower than the 990 mm global average 58, with the McMurdo Dry Valleys receiving less than 10-50 mm of precipitation annually. However, there were some exceptions, with samples from Germany, Austria, or Luxembourg, which have precipitation levels close to the global average. This indicates either that Argonema is generally associated with dry to very dry habitats and is well adapted to low levels of precipitation, or that there is intraspecies variability that cannot be uncovered using this approach. In environmental samples of soil crusts, Argonema filaments might have been overgrown by faster growing Oscillatoriales cyanobacteria, which might be one of the reasons why it avoided detection. This would be in line with the well-known Baas Becking hypothesis that "everything is everywhere but the environment selects" 59. The sequences obtained from metagenomic studies were only partial, so it is not possible to deduce whether they belong to one of the two species described here, or to another new species in the novel genus Argonema.
In conclusion, we used a complex approach based on molecular, morphological, and metagenomic data to describe a novel genus of crust-forming filamentous cyanobacteria with a potentially cosmopolitan distribution. Moreover, our findings provide evidence that endemic microbial taxa may be much rarer than previously expected. With enough effort, a species can be found somewhere in the sea of sequencing data.

Strain isolation. Samples were collected in Bohemian stream valley and Solorina Valley on James Ross Island, Western Antarctica (Table 2) from well-developed soil crusts. The original sample was obtained by Michal Zeman (Masaryk University Brno, Czech Republic) on March 6 and 9, 2020. We isolated 12 strains from the fresh samples using standard isolation techniques 60
Morphology assessment. We observed the morphology of all 12 strains and selected 5 strains for the morphology assessment (three A003 strains and two A004 strains) using light microscopy. A Zeiss AxioImager microscope with a high-resolution AxioCam HRc 13MPx camera was used for morphology assessment. The following features of cultured strains were assessed: cell shape, cell dimensions, terminal cells, reproduction, sheaths, branching, and granulation. To assess the cell dimensions, the width and length of 100 cells from each observed sample were measured. Morphological data were analyzed using PAST software 62. A nested ANOVA test was used to identify whether the morphological difference between the strains was statistically significant. We also observed the native sample and studied the morphology of uncultured filaments using the same method as with the cultured strains.
De novo genome sequencing. Two strains were selected for whole genome sequencing: A003/A1 (A. galeatum) and A004/B2 (A. antarcticum). Genomic DNA for whole genome sequencing was extracted from approximately 100 mg of fresh biomass using the DNeasy UltraClean Microbial Kit (QIAGEN, Hilden, Germany). The quality of the extracted DNA was checked by agarose gel electrophoresis (1.5% agarose gel, GelRed-Biotium, California, USA) and the DNA was quantified using NanoDrop 1000 (Thermo Fisher Scientific, Waltham, USA). The sequencing was done by commercial Illumina sequencing (Novogene, UK). The whole genome sequences were uploaded to the NCBI database as: BioProject: PRJNA761285, Biosamples: SAMN21250282
We selected both draft and complete annotated genomes which represent all lineages of cyanobacteria. Moreover, all annotated genome assemblies identified as Oscillatoria and Phormidium were added. The whole-genome phylogenomic dataset was identified using OrthoFinder 2.3.1 71 with default settings. The search yielded a multiple sequence alignment with 153 585 amino acid sites. The ML phylogenetic reconstruction was performed in IQ-TREE 1.6.5 67. The best model was selected using Modeltest implemented in IQ-TREE 72 based on BIC as follows: LG + F + G4. The tree topology was tested by 2000 ultrafast bootstrap re-samplings 68. The tree was also edited in FigTree 1.4.4 and Inkscape. To test the hypothesis that A003 and A004 strains belong to two distinct species, average nucleotide identity was estimated in JSpecies 1.2.1 software 36.
We estimated secondary structures of the D1-D1' and Box B helices using the Mfold RNA folding form with default options except for structure draw mode, which was changed to 'untangle with loop fix' 73.
Taxonomy and nomenclature. We combined the monophyletic species concept sensu Johansen & Casamatta 74 and ANI values 36 to erect new species. The species and genus descriptions conform to the rules of the International Code of Nomenclature for Algae, Fungi, and Plants (https://www.iapt-taxon.org/nomen/main.php).
SRA database mining. We used the NCBI SRA database to assess whether Argonema DNA was captured in earlier metagenomics studies. We searched through 16S rRNA amplicon, RNA, and whole-metagenome datasets. The fastq files of 1799 amplicon datasets were downloaded from the SRA archive (https://www.ncbi.nlm.nih.gov/sra) using fastq-dump 2.11.0 (SRA tools; https://ncbi.github.io/sra-tools/) with default settings. Moreover, one amplicon dataset of a soil sample originating in Ladakh (India) was provided by Klára Řeháková (Institute of Hydrobiology ASCR, České Budějovice, Czech Republic). All the sequences were mapped to the reference 16S rRNA sequence of strain A003/A1 using minimap2 v2.22 75 with the following command: minimap2 -a -o output.sam A003_A1 database input. The sam files with at least one mapped sequence were converted to fasta using samtools fasta 1.7 76: samtools fasta -@ 8 -0 output input. The fasta files were searched against the BLAST database of the 16S rRNA of strain A003/A1 using blastn 77. Only hits with at least 97% similarity and longer than 100 bp were kept. This length was selected because Soergel et al. suggested that short reads with length > 96 bp provide 82-100% as confident identification as the long or full length sequences 78. The RNAseq (223) and whole-metagenome (191) datasets were downloaded and mapped in the same way as the amplicon datasets. The sam files with coverage of reads > 10 were considered as hits. 
A diagram of the SRA mining workflow can be found in Supplements (Fig. S6). The world map showing the potential geographical distribution of the genus Argonema based on metagenomic data was constructed using R software 79, with packages rnaturalearth v0.1.0 80 and ggplot2 v.3.3.5 81, and modified in Inkscape 70.
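The hit-filtering step of the SRA mining workflow (at least 97% similarity, longer than 100 bp) can be sketched as a small Python filter. This sketch assumes that the blastn results were written in the standard 12-column tabular format (-outfmt 6), which the text does not state; the function name, the sample read identifiers, and the subject name are hypothetical.

```python
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_identity=97.0, min_length=100):
    """Keep hits with identity >= min_identity and alignment length > min_length."""
    kept = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(BLAST6_FIELDS, line.split("\t")))
        identity = float(record["pident"])
        length = int(record["length"])
        if identity >= min_identity and length > min_length:
            kept.append((record["qseqid"], identity, length))
    return kept


# Hypothetical example rows: only read1 passes both criteria.
SAMPLE = (
    "read1\tA003_A1_16S\t99.2\t150\t1\t0\t1\t150\t200\t349\t1e-70\t270\n"
    "read2\tA003_A1_16S\t96.5\t150\t5\t0\t1\t150\t200\t349\t1e-60\t240\n"
    "read3\tA003_A1_16S\t98.0\t90\t2\t0\t1\t90\t10\t99\t1e-35\t150\n"
)

for hit in filter_hits(io.StringIO(SAMPLE)):
    print(hit)
```

A dataset would then count as a positive amplicon match if this filter returns at least one hit.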