Introduction

Picocyanobacteria are small unicellular cyanobacteria that contribute greatly to carbon fixation in aquatic ecosystems. Marine picocyanobacteria comprise two major genera, Prochlorococcus and Synechococcus, which together can contribute about 25% of net primary production in the ocean [1, 2]. While Prochlorococcus is largely restricted to warm oligotrophic water, Synechococcus is widely distributed in various aquatic environments ranging from open oceans to freshwater [3]. The average cell size of Synechococcus (0.9 µm) is larger than that of Prochlorococcus (0.6 µm) [4]. In addition, the average genome size of Prochlorococcus (1.8 Mb) is smaller than that of Synechococcus (2.9 Mb) [5]. Genome streamlining leaves marine Prochlorococcus with less ecological flexibility, whereas the relatively large genome of Synechococcus provides greater genomic plasticity, which enables it to adapt to more variable habitats [6,7,8,9].

Diverse Synechococcus strains have been isolated from freshwater, estuarine, coastal, and oceanic waters [2, 10,11,12], suggesting that Synechococcus can adapt to distinct aquatic environments. In the estuarine environment, picocyanobacteria (mostly Synechococcus) can make up 20–40% of phytoplankton chlorophyll a and up to 60% of primary production in summer [13]. Freshwater Synechococcus can also play an important role in carbon fixation and nutrient cycling in ponds, lakes, and rivers [14,15,16]. Phylogenetic analyses of freshwater and marine Synechococcus show that Synechococcus is polyphyletic [3, 17,18,19]. Molecular systematics has challenged the traditional taxonomy of Synechococcus in the past 20 years [17, 20, 21]. Genetic diversity of Synechococcus has been studied in various aquatic environments [22,23,24,25]. In marine waters, three subclusters of Synechococcus have been defined [10, 26] and subdivided into 28 clades based on the ITS sequences [27]. In freshwater systems, 6–8 clusters of Synechococcus have been identified based on the 16S rRNA gene or other genetic markers [27,28,29,30,31]. Freshwater Synechococcus are deeply branched and are less congruent than marine Synechococcus [12]. Because of their ubiquity in aquatic systems, Synechococcus contain highly diverse phylotypes and ecotypes.

Comparative genomics of cyanobacteria has greatly advanced our understanding of the molecular evolution, metabolic potential, and ecological adaptation of different cyanobacterial types [10, 26]. Unicellular cyanobacteria with smaller genomes (<3.3 Mb) appear to have relatively more genes involved in amino acid metabolism, but fewer genes for environmental sensing (signal transduction) and cell motility, than cyanobacteria with larger genomes (>3.3 Mb) [8]. In the marine environment, ecological adaptation of Synechococcus to different niches is evident at the genomic level. The first complete marine Synechococcus genome (strain WH8102) was sequenced in 2003 [32]. By comparing the genomes of a coastal Synechococcus strain (CC9311) and the oceanic Synechococcus strain (WH8102), Palenik et al. showed that the coastal strain has a greater capacity to sense and respond to changes in its environment than its oceanic counterpart [33]. Open ocean Synechococcus (‘specialists’) tend to have smaller genomes and fewer genome islands than coastal Synechococcus (‘opportunists’ or ‘generalists’) [10, 26]. Coastal Synechococcus strains have an increased tolerance for copper and oxidative stress through distinct transcriptional responses and genomic features [34, 35]. Coastal Synechococcus genomes contain a large portion of accessory and unique genes, which give them considerable flexibility to adapt to diverse habitats [26]. Novel genes on picocyanobacterial genome islands can provide a selective advantage for niche adaptation [10]. Recently, genome sequencing of the Chesapeake Bay Synechococcus strain CB0101 unveiled its increased capacity for environmental sensing, transport, regulation, and stress response [36]. The presence of toxin-antitoxin (TA) genes and their functional assignment in Synechococcus CB0101 suggest that TA systems can be important to the high environmental endurance of estuarine Synechococcus [37].

TA systems are known to be involved in stress responses in microbes, but little is known about TA systems in picocyanobacteria. TA systems are genetic modules composed of a toxin, which often arrests translation and subsequent growth, and a cognate antitoxin, which negates the interruption caused by the toxin [38, 39]. TA system activation often results in persister cell formation, which can be advantageous for bacterial survival in highly variable environments. While TA systems have been broadly described as ubiquitous in nearly all bacterial species, TA systems in cyanobacteria have only recently been described. TA systems have been predicted in freshwater cyanobacteria including Microcystis aeruginosa [40], Synechocystis PCC6803 [41], and Synechococcus PCC7942 [42]. Only a few TA genes (i.e., VapB, VapC, and PemK) were reported in these freshwater cyanobacterial strains, and no systematic genome-level survey of TA genes was performed on them. The first chromosomal TA system in marine Synechococcus was described in the estuarine Synechococcus strain CB0101 [37]. CB0101, isolated from the Chesapeake Bay, belongs to Synechococcus subcluster 5.2 [43]. Transcriptomic analysis of CB0101 revealed a tight coupling between the upregulation of particular toxins, such as relE, and environmental stressors such as zinc toxicity and high light intensity [37]. Marsan et al. showed that TA systems can be important to the environmental stress response in Synechococcus [37]. However, little is known about the occurrence, diversity, evolution, and ecological functions of type II TA systems in Synechococcus and other picocyanobacteria.

The goal of this study is to investigate the presence of TA genes in picocyanobacteria using the TAfinder software [44]. Our search covered 71 complete picocyanobacterial genomes, including 33 Synechococcus and 38 Prochlorococcus genomes. A linear relationship between the number of TA pairs and genome size was found in Synechococcus.

Methods

Complete Synechococcus (n = 33) and Prochlorococcus (n = 38) genomes were downloaded in September 2019 from both the National Center for Biotechnology Information (NCBI) RefSeq database [45, 46] and the Joint Genome Institute genome portal [47]. To ensure quality, incomplete genomes were omitted from this study. The Synechococcus and Prochlorococcus genomes included in this study cover the majority of the major known phylogenetic clades and subclusters (Table S1).

Toxin-antitoxin systems were predicted using the TAfinder software, which utilizes the Toxin-Antitoxin Database (TADB) [44, 48]. Genomes that were not included in TAfinder's list of available genomes were downloaded locally and manually uploaded to TAfinder. TAfinder was used to predict type II TA pairs in Synechococcus (freshwater, estuarine, coastal, and open ocean strains) and marine Prochlorococcus genomes using default settings (BLAST e-value = 0.01, HMMer = 1, Maximum length = 300 aa, Distance = −20 to 150 nt). Synechococcus strains were classified into habitats based on literature searches for original isolation information. Because the estuarine strains (Synechococcus CB0101 and PCC7002) are underrepresented, they were grouped with the coastal strains (Table S1) for the linear regression analysis.
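The length and distance settings above can be illustrated with a minimal sketch. It is not TAfinder's implementation: the `Gene` class, the function names, and the same-strand requirement are hypothetical, and the sketch assumes that the distance setting denotes an allowed range of −20 to 150 nucleotides between two adjacent genes, with negative values meaning that the genes overlap.

```python
from dataclasses import dataclass

# Thresholds taken from the default settings reported above.
MAX_LENGTH_AA = 300    # "Maximum length = 300 aa"
MIN_DISTANCE_NT = -20  # assumed lower bound; negative means overlapping genes
MAX_DISTANCE_NT = 150  # assumed upper bound on the intergenic gap


@dataclass
class Gene:
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str

    @property
    def length_aa(self) -> int:
        # Number of codons minus the stop codon.
        return (self.end - self.start + 1) // 3 - 1


def intergenic_distance(upstream: Gene, downstream: Gene) -> int:
    """Nucleotides between two genes; negative if they overlap."""
    return downstream.start - upstream.end - 1


def is_candidate_pair(a: Gene, b: Gene) -> bool:
    """Check whether two neighbouring genes pass the size and spacing filters."""
    first, second = sorted((a, b), key=lambda g: g.start)
    if first.strand != second.strand:  # assumption: pair lies in one operon
        return False
    if max(first.length_aa, second.length_aa) > MAX_LENGTH_AA:
        return False
    gap = intergenic_distance(first, second)
    return MIN_DISTANCE_NT <= gap <= MAX_DISTANCE_NT


# Made-up coordinates: two short genes that overlap by 10 nt.
antitoxin = Gene("hypothetical_antitoxin", 100, 399, "+")
toxin = Gene("hypothetical_toxin", 390, 650, "+")
print(is_candidate_pair(antitoxin, toxin))
```

A filter of this kind only narrows the list of neighbouring gene pairs; homology evidence (the BLAST and HMMer thresholds) is still needed before a pair is called a putative TA system.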

To estimate the relative diversity of the putative TA families, predicted amino acid sequences were searched against the NCBI conserved domain database (version CDD v3.18 - 55570 PSSMs) [49]. The short names of the matched conserved domains were manually reviewed and each gene was assigned to a consensus major TA family. If the gene did not fall into one of the traditional TA families, it was categorized as “Other”. If the predicted amino acid sequence did not have a significant match in the conserved domain database, it was categorized as “Unknown”.
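The three-way categorization described above was done by manual review; the following is only a rough sketch of how such a rule could be automated. The family list is an illustrative subset (names mentioned in this paper), the PIN-to-VapC alias follows the statement that the PIN domain is characteristic of VapC, and the function name and example inputs are hypothetical.

```python
from collections import Counter
from typing import Optional

# Illustrative subset of traditional TA family names (not a complete list).
TA_FAMILIES = ("VapB", "VapC", "RelE", "HicA", "MazF", "PemK")

# Assumed keyword-to-family alias: PIN domains are treated as VapC toxins.
DOMAIN_ALIASES = {"PIN": "VapC"}


def classify_domain(short_name: Optional[str]) -> str:
    """Map a conserved-domain short name to a TA family, 'Other', or 'Unknown'."""
    if not short_name:
        return "Unknown"  # no significant conserved-domain match
    label = short_name.lower()
    for family in TA_FAMILIES:
        if family.lower() in label:
            return family
    for keyword, family in DOMAIN_ALIASES.items():
        if keyword.lower() in label:
            return family
    return "Other"  # a domain was found, but not a traditional TA family


# Made-up short names standing in for conserved-domain search results.
hits = ["PIN_VapC", "RelE_like", "DUF0000", None, "PIN"]
tally = Counter(classify_domain(name) for name in hits)
for category, count in tally.most_common():
    print(f"{category}: {100 * count / len(hits):.0f}%")
```

Simple substring matching is used to keep the sketch short; in practice, ambiguous short names would still need the manual review described above.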

Linear regressions and linear models were fitted using RStudio software [50], and figures were made using ggplot2 [51]. Genome island regions were predicted using IslandViewer 4 software [52].
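The analysis itself was run in R; as a language-neutral illustration, the sketch below fits an ordinary least-squares line of TA pair count against genome size separately for each habitat group and reports the coefficient of determination (r²). The data points are synthetic placeholders, not values from Table S1, and the function and variable names are hypothetical. P-values are not computed here.

```python
from math import fsum


def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept; returns r² too."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = fsum(x) / n, fsum(y) / n
    sxx = fsum((xi - mean_x) ** 2 for xi in x)
    syy = fsum((yi - mean_y) ** 2 for yi in y)
    sxy = fsum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Synthetic placeholder records: (habitat, genome size in Mb, TA pairs).
records = [
    ("coastal", 2.5, 5), ("coastal", 3.0, 20), ("coastal", 3.5, 35),
    ("freshwater", 2.7, 12), ("freshwater", 3.2, 20), ("freshwater", 3.7, 40),
]

# Group the observations by habitat, then fit each group separately.
by_habitat = {}
for habitat, size_mb, ta_pairs in records:
    sizes, pairs = by_habitat.setdefault(habitat, ([], []))
    sizes.append(size_mb)
    pairs.append(ta_pairs)

for habitat, (sizes, pairs) in by_habitat.items():
    slope, intercept, r2 = linear_fit(sizes, pairs)
    print(f"{habitat}: slope = {slope:.1f} pairs/Mb, r2 = {r2:.3f}")
```

Fitting each habitat group on its own, rather than pooling all strains, is what allows group-specific slopes and r² values to be compared.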

Results

TAfinder predicted at least one TA pair in 27 of 33 Synechococcus genomes (82%). The number of TA systems per Synechococcus genome varied from 0 to 42 (a toxin-antitoxin system normally comprises one toxin gene and one cognate antitoxin gene). Only five strains of Synechococcus did not contain putative TA systems. A total of 986 putative toxin and antitoxin genes, constituting 493 TA systems, were predicted in 27 complete Synechococcus genomes. The occurrence frequency of TA systems in Synechococcus is shown in Fig. 1. The 27 TA-containing Synechococcus strains were isolated from various aquatic environments including freshwater, Antarctic (cold adapted), hot spring (thermophile), estuarine, coastal, and oceanic waters. Representative Synechococcus strains belong to diverse phylogenetic lineages (Table S1).

Fig. 1: Occurrence frequency of putative TA systems in 33 strains of Synechococcus isolated from various aquatic environments.

Numbers shown at the end of bars are the number of putative TA systems.

TAfinder did not predict any TA systems in any of the Prochlorococcus genomes (n = 38). These Prochlorococcus genomes were representative of many clades from both high light and low light adapted strains. These queried Prochlorococcus genomes ranged in size from 1.6 to 2.7 Mb.

Freshwater and coastal Synechococcus contained many putative TA systems. For example, freshwater Synechococcus strains PCC6312 and PCC6307 both contained 42 putative TA systems. These 84 genes accounted for ~1.2% of their total coding sequences (Table S1). Coastal strains PCC7003 and PCC7117 contained 38 and 37 TA pairs, respectively, accounting for 1.24% and 1.17% of their coding sequences.

In general, Synechococcus living in coastal, estuarine, and freshwater environments tend to have larger genomes than their counterparts living in the open ocean. It appears that Synechococcus with larger genomes contain more TA genes than Synechococcus with smaller genomes. A good linear correlation between genome size and the number of putative TA pairs (r² = 0.6235, p < 0.0001) (Fig. 2a) was found in Synechococcus, supporting the observation that larger Synechococcus genomes contain more TA genes. This relationship between genome size and putative TA pairs becomes clearer when endemic ecological conditions are considered, specifically for coastal and freshwater Synechococcus. When analyzed separately, stronger linear regressions between genome size and putative TA pairs were found in coastal (r² = 0.9152, p < 0.00001) and freshwater (r² = 0.8296, p < 0.005) Synechococcus (Fig. 2b). Conversely, the correlation between genome size and the number of TA pairs seen across all Synechococcus strains was not found when only the open ocean strains were analyzed.

Synechococcus toxin genes contained more known conserved domains than antitoxins (Fig. 3). About 77% of toxin genes had a known conserved domain annotated as belonging to a traditionally named TA system. The most common toxin gene included a conserved PIN domain, which is characteristic of the VapC toxin that cleaves tRNAs or rRNAs [53]. Nearly 41% of putative toxin genes contained the conserved domain for VapC.

Fig. 2: Relationship between genome size and the number of putative TA pairs in Synechococcus.

a Linear correlation between genome size and putative TA systems for all complete genomes of Synechococcus strains isolated from all habitats (r² = 0.6235, p < 0.0001); b Linear correlation between genome size (Mb) and the number of putative TA pairs in coastal (gold) and freshwater (royal blue) Synechococcus (r² = 0.9152, p < 0.00001 and r² = 0.8296, p < 0.005, respectively). No such correlation was found in open ocean Synechococcus (green).

Fig. 3: Conserved domain regions of putative toxins and antitoxins.

a Conserved domains in toxins. b Conserved domains in antitoxins. Putative toxin and antitoxin sequences that contained a conserved domain that was not a traditional TA system were categorized as “Other”. Sequences that did not contain a conserved domain were categorized as “Unknown”.

Putative antitoxin sequences contained fewer NCBI conserved domains than toxins. Only 35% of antitoxin genes had a conserved domain annotated as belonging to a traditionally named TA system. Many conserved domains in putative antitoxin genes had generic names such as “domains of unknown function” (DUF) or “clusters of orthologous groups” (COG).

Discussion

This survey of picocyanobacterial TA systems shows that Synechococcus strains with larger genomes contain more TA systems. Although many genetic features of picocyanobacterial genomes have been explored [10, 11], little is known about the prevalence of TA genes in picocyanobacteria. Synechococcus has a remarkable capacity for adaptation, which is reflected by its occupancy of diverse environments ranging from lakes and rivers to estuaries and coastal and oceanic waters. The presence of a single genus over such a wide range of habitats makes Synechococcus an ideal model to explore the relationship between ecological adaptation and genomic features. TA systems have been well studied in bacteria and archaea. One well-known function of TA systems is dormancy, or the ability to initiate a persister state under stressed conditions and recover when the adverse stresses are released [39, 40, 54]. While the actual functions of Synechococcus TA genes have not been tested, we hypothesize that inheritance of more TA genes may allow some Synechococcus strains to endure more variable environments, which could confer a competitive advantage over other, less resilient picocyanobacteria. Coastal, estuarine, and freshwater environments are characterized by rapid changes in environmental conditions, and the higher occurrence frequency of TA genes may provide adaptive advantages for Synechococcus living in these types of aquatic habitats.

TA systems have been shown to provide recoverable persister states when Synechococcus cells were exposed to conditions that induce oxidative stress [37]. These conditions are expected in rapidly changing environments such as estuarine, coastal, and some freshwater environments. In another cyanobacterium, Synechocystis PCC 6803, TA systems have been found and were predicted to have RNase activity, which could have a drastic effect on transcriptomic remodeling [55]. Such remodeling is possible through RNA degradation as a result of toxin overexpression, which can substantially slow translation. Type II TA systems can have other modes of action in other bacteria, including post-segregational killing and abortive infection [54]. Unfortunately, these remain poorly studied and understood in picocyanobacterial systems.

Interestingly, a linear correlation was found between genome size and the number of TA genes in Synechococcus. Such a linear correlation has not been reported in other bacteria [56]. In analyses of 2,181 prokaryotic genomes (archaea and bacteria, from obligate intracellular to free-living species) [56] and of 65 genomes of Acetobacter (with sources ranging from fermented food to fruits to symbionts of the fruit fly Drosophila melanogaster) [57], the number of TA gene pairs did not increase linearly with genome size. The clear linear trend seen in Synechococcus is likely related to larger genomes carrying a wider array of coding sequences (CDS). For strains with expanded genetic capacity, it may be advantageous to retain a multitude of TA systems in aquatic habitats with highly variable chemical and physical features. While Synechococcus is ubiquitous in nearly all aquatic ecosystems, the presence of Synechococcus TA systems is not; this suggests that TA systems are advantageous in some, but not all, aquatic environments. The genome sizes of the Synechococcus strains available for this study range from 2.1 to 3.7 Mb. Synechococcus genome size has previously been shown to correlate strongly with the length of hypervariable genome island regions [26]. In Synechococcus, TA genes can be located on these genome islands, but the majority of the TA pairs are not located on hypervariable genome islands (Table S1).

The lack of TA systems in Prochlorococcus is likely related to their relatively stable habitats. The endemic habitat of Prochlorococcus is the pelagic ocean, which is characterized by a stable, nutrient-limited environment coupled with a predictably high cellular density [5]. The genus Prochlorococcus is a highly diverse group composed of 12 specialized clades with genomic features uniquely adapted to specific conditions in oceanic ecosystems [6]. High light adapted group II has some of the smallest genomes (~1.7 Mb) and lowest GC content (~33%), which is indicative of genomic reduction [9]. Some Prochlorococcus strains (such as low light adapted group IV) have relatively large genomes (2.4–2.6 Mb) and many unique genes [58]. Despite this larger genetic capacity, no TA genes were detected in the genomes of group IV Prochlorococcus strains. Despite the diversity of Prochlorococcus ecotypes, TA systems may not be needed because of the specific adaptation of Prochlorococcus to the oligotrophic ocean.

Among the 33 Synechococcus strains examined in this study, 11 are open ocean strains. Oceanic Synechococcus strains in general contain no TA genes or only a few TA genes. The five open ocean Synechococcus strains that lack TA genes are WH8109, KORDI-52, KORDI-100, CC9605, and CC9902, while four oceanic Synechococcus strains (MIT9504, MIT9508, MIT9509, and KORDI-49) contain few (1 to 5) putative TA pairs. The exception is the open ocean Synechococcus strain WH8102, which contains 15 putative TA systems. WH8102 was originally isolated from the Sargasso Sea, and its genome is more indicative of a ‘generalist’ with features acquired via horizontal gene transfer [59]. Like marine Prochlorococcus, oceanic Synechococcus may not need TA genes because of their acclimation to the stable oligotrophic environment.

Along with genome size, endemic ecological conditions and habitat are important indicators of the prevalence of TA systems in Synechococcus and, more broadly, in picocyanobacteria. Synechococcus strains from more variable environments, such as coastal and freshwater locations, tend to have more TA pairs than open ocean strains that are streamlined for a stable pelagic lifestyle. This pattern may also explain the broader distribution of TA systems in picocyanobacteria: TA pairs are prevalent in picocyanobacteria living in nutrient-rich and dynamic habitats, and rare or completely absent in picocyanobacteria living in the oligotrophic open ocean. The presence of TA systems may be one of the many genetic features that allow Synechococcus to inhabit a wide array of aquatic ecosystems and achieve a cosmopolitan distribution. The lack of TA systems in Prochlorococcus is consistent with their reduced genomes and oligotrophic lifestyle [60].

Originally, 7 TA pairs were predicted in CB0101 using BLASTCLUST [61] and confirmed using RASTA-Bacteria [62] and TADB [48]. More recently, the TAfinder search tool was used to search whole genomes [44], rather than specific gene pairs, to predict type II TA systems. Owing to the ever-expanding TADB and improved prediction methods such as TAfinder, 22 TA systems, including the original 7 pairs, were found in CB0101. These new pairs were confirmed manually, and conserved domains were predicted using NCBI’s conserved domain database and InterPro for protein functional analysis.

The scope of this study is constrained by the use of TAfinder. TAfinder predicts type II TA systems, which are the best studied and characterized TA systems. Type II TA systems comprise 99% of the TA genes in the TADB [48]. To ensure that other, less well-known TA types (I and III–VI) were not overlooked, a BLAST search for those few systems against all the genomes of Synechococcus and Prochlorococcus was completed. No significant matches were reported using default settings. Antitoxin sequences contained fewer conserved domains than toxins. Antitoxin sequences appear to be highly diverse and variable among Synechococcus strains. Multiple antitoxin structures may function to bind their cognate toxin. As long as the paired gene product can sufficiently neutralize the toxin, it acts as an antitoxin, and selection for highly conserved sequences may be relaxed. Although more toxins contain conserved domains than antitoxins, it is important to note that TA systems are not present in all Synechococcus and that they are highly variable in number and type. Even among closely related Synechococcus strains, it is difficult to identify suitable genetic markers for phylogenetic analysis because of the overall poor conservation of TA genes. VapC and its cognate antitoxin VapB form the largest family of bacterial toxin-antitoxin modules [63]. A wide variety of toxin functionality is represented in Synechococcus, as both ribosome-dependent mRNA endonucleases such as RelE and ribosome-independent mRNA endonucleases such as HicA and MazF were predicted [54].

Conclusion

The tight correlation between genome size and the number of TA genes in coastal and freshwater Synechococcus suggests that the retention of TA systems could be advantageous for Synechococcus living in highly variable environments. None of the tested Prochlorococcus genomes (n = 38) contains any TA genes, regardless of genome size (1.6 to 2.7 Mb). This result suggests that Prochlorococcus do not have TA system-mediated dormancy in response to changing environments. This also applies to some Synechococcus living in open oceans, where chemical and hydrological conditions are relatively stable compared to coastal, estuarine, and freshwater environments. The number of TA pairs increases linearly with the genome size of Synechococcus. It appears that the acquisition and retention of TA genes in Synechococcus are influenced not only by genome size but also by environmental stability. Synechococcus strains with large genomes, especially those that inhabit dynamic ecosystems (coastal, estuarine, and freshwater), have more TA systems than strains with smaller genomes that are present in stable environments such as the open ocean. Compared to Prochlorococcus, Synechococcus has a relatively large genome, with space for more coding sequences, ample TA systems, and a wide variety of environmental response genes that allow for its ubiquitous distribution in diverse aquatic environments. TA systems in Synechococcus could confer an ability to enter persister states in the presence of stressful stimuli, which would be advantageous to Synechococcus living in freshwater, coastal, and estuarine waters, where environmental conditions are more variable than in the open ocean.