Introduction

Regulatory small RNA molecules are involved in all major adaptive responses in bacteria. Among the controlled processes are quorum sensing (Lenz et al., 2004; Tu et al., 2008, 2010; Sonnleitner et al., 2011), the response to iron limitation (Wilderman et al., 2004; Masse et al., 2005), low temperature (Sledjeski et al., 1996), photosynthesis under iron limitation (Dühring et al., 2006), pathogenicity and virulence (Lenz et al., 2004; Pfeiffer et al., 2007). Regulatory RNAs in prokaryotes are very diverse and can act through a multitude of different mechanisms (Hess and Marchfelder, 2011). The majority of regulatory RNAs recognize their target mRNAs by base pairing, frequently sequestering the ribosome-binding site or start codon (AUG) sequences, repressing gene expression by modulating the initiation of translation or mRNA stability (Waters and Storz, 2009). However, several examples have shown that the window for target recognition can also be located upstream or downstream of ribosome-binding site/AUG sequences (Vecerek et al., 2007; Bouvier et al., 2008; Desnoyers et al., 2009; Pfeiffer et al., 2009), and that some small RNA (sRNA)–mRNA interactions lead to an activation of gene expression rather than its repression. Consequently, hundreds of such potential riboregulators have been predicted in various groups of bacteria, but their actual verification and functional characterization are still mostly limited to human pathogens, mainly enterobacteria, Staphylococcus aureus and Pseudomonas species, whereas environmentally relevant bacteria are much less investigated in this respect.

Marine picocyanobacteria numerically dominate the euphotic zone of most oceanic waters and are widely recognized as being major contributors to the marine primary production and global photosynthetic biomass. There are only two genera of marine picocyanobacteria, Prochlorococcus (Partensky et al., 1999) and Synechococcus (Scanlan et al., 2009). Both genera comprise a number of different clades or ecotypes (Coleman and Chisholm, 2007), which exhibit distinct ecophysiologies, geographical and spatial distributions (Johnson et al., 2006; Zwirglmaier et al., 2008). The genetic and regulatory basis behind these patterns is only partially understood (Coleman and Chisholm, 2007; Scanlan et al., 2009).

Previously unrecognized putative sRNAs have been detected in microbial metatranscriptomic data sets from the ocean (Shi et al., 2009), including several such sRNAs most likely belonging to Prochlorococcus. Indeed, based on the total genome sequences of four marine cyanobacteria and using comparative genomics, systematic searches for sRNAs in three laboratory strains of Prochlorococcus and one Synechococcus identified seven different sRNAs in these marine cyanobacteria (Axmann et al., 2005), which were called Yfr1–7 for cYanobacterial Functional RNA. Complementing this approach by using high-density microarrays, an additional 14 sRNAs and 24 cis-antisense RNAs were found in Prochlorococcus sp. MED4 (Steglich et al., 2008). On the basis of RNA profiling and detailed biochemical experiments, the involvement of several of these transcripts was suggested in light stress adaptation or the response to phage infection (Stazic et al., 2011), consistent with the location of most sRNA genes in highly variable genomic islands (Steglich et al., 2008). Whereas these data indicate that a whole class of regulatory genes and their functions are still to be identified in marine cyanobacteria, their further functional characterization is hampered by the fact that most Prochlorococcus strains, including MED4, are not genetically tractable. Therefore, we set out here to identify the suite of potentially regulatory sRNAs in Synechococcus sp. WH7803, a strain that can be genetically manipulated (Brahamsha, 1996). Crucial for the successful computational prediction of sRNA genes is the availability of well-annotated genome sequences. The more than 20 genome sequences available for marine Synechococcus and Prochlorococcus (Kettler et al., 2007; Scanlan et al., 2009) provide an excellent data set for this purpose.
Although phylogenetically closely related, these strains differ by their pigmentation and light-harvesting complexes and span the complete rRNA diversity (96% to 99.9% similarity) among the cultured representatives from the marine picocyanobacterial clade.

Materials and methods

Cultivation of Synechococcus sp. WH7803

Synechococcus sp. WH7803 was grown in ASW medium and total RNA was isolated as previously described (Axmann et al., 2005). The cyanobacteria were exposed to several stresses. For high light stress, cultures were transferred from 30 μmol quanta m−2 s−1 to 280 μmol quanta m−2 s−1 white light for 1 h. Synechococcus was cold stressed by incubating the culture flask in a water tank with 10 °C cold running water for 2 h. To expose the culture to iron or phosphate limitation, the cells were centrifuged twice for 10 min at room temperature at 10 000 g. Each time, the medium was replaced and the cells were resuspended in the corresponding depleted medium. For iron stress, ASW without FeCl3·6H2O was prepared and, for phosphate stress, ASW without NaH2PO4·H2O. RNA from iron-limited cells was extracted 12 h after initiating stress conditions. Cyanobacteria growing without added phosphate were collected after 24 h. We began all experiments with an exponentially growing culture diluted to a titer of 1 × 10^7 cells per ml, before applying the respective stress. Control cultures were treated like the stressed ones in terms of centrifugation and dilution but then incubated under standard conditions.

RNA extraction and analysis

Synechococcus were filtered rapidly onto Supor 0.45 μm membrane filters (Pall, New York, NY, USA). The filters were directly placed in RNA resuspension buffer (10 mM sodium acetate, 200 mM sucrose, 5 mM EDTA), flash frozen in liquid nitrogen and stored at −80 °C until extraction. The frozen material was thawed under addition of 3 vol Z6 buffer (8 M guanidinium hydrochloride, 50 mM mercaptoethanol, 20 mM EDTA, 20 mM MES, pH 7 (NaOH)) for 60 min at room temperature and gentle shaking. Acidic phenol was added (0.5 vol) and the cells were incubated for 15 min in a 60 °C water bath with occasional shaking by hand, followed by extraction with 0.5 vol of chloroform:isoamyl alcohol (24:1). After shaking, the phases were separated by centrifugation for 5 min. The aqueous phase was extracted and purified by phase extraction with 1 vol phenol:chloroform:isoamyl alcohol (25:24:1) and finally 1 vol chloroform:isoamyl alcohol (24:1). The RNA in the aqueous phase was precipitated with 1 vol isopropanol at −20 °C overnight. After centrifugation at 13 000 g for 30 min at 4 °C the pellet was washed with 70% ethanol and subsequently centrifuged at 13 000 g for 5 min at 4 °C. All liquid was removed and the pellet dried for 10 min, dissolved in autoclaved water and stored at −80 °C. Quality of RNA was checked visually on 1.5% denaturing agarose gels stained with ethidium bromide.

High resolution northern hybridization and 5′RACE experiments were performed from the separation of 5 to 50 μg of total RNA as described by Georg et al. (2009). Single-stranded RNA probes were constructed by in vitro transcription from the T7 promoter as described (Georg et al., 2009). DNA probes for the detection of transcript levels of control genes for cold stress and high light conditions were labelled by random priming. 12–25 ng of PCR fragment in 5 μl of aqueous solution were mixed with 7 μl of 1 M Hepes-KOH (pH 6.7, final concentration 200 mM; final volume 35 μl). A volume of 2 μl of random hexamer primer (100 μM, final concentration 5.7 μM) (Fermentas, St. Leon-Rot, Germany), 4.5 μl of TE buffer and 8 μl H2O were added. The solution was denatured for 10 min at 95 °C and cooled for 15 min at room temperature. The preparation was continued by the addition of 3.5 μl mercaptoethanol (100 mM, final concentration 10 mM), 2.5 μl dNTPs (200 μM of each dATP, dCTP and dGTP, 500 mM Tris-HCl pH 7.8, 50 mM MgCl2), 1 μl of Klenow exo- (2 U per μl) (Fermentas) and 2.5 μl of [α-32P]dTTP (10 μCi per μl). The solution was incubated for 10 min at 37 °C followed by the addition of 0.5 μl of dNTP mix (1 mM of dATP, dTTP, dCTP, dGTP) and incubation for an additional 5 min at 37 °C. The reaction was stopped by the addition of 3.5 μl of EDTA (0.2 M). The labelled DNA was denatured for 10 min at 95 °C and stored on ice until use. Sequences of all DNA oligonucleotides are given in Supplementary Table S1.

Microarray analysis

To facilitate global experimental analysis, a tiling microarray was designed consisting of 12 000 overlapping oligonucleotides for both DNA strands. We chose the 12 K Electrasense CombiMatrix (CombiMatrix Diagnostics, Irvine, CA, USA) microarray system for this experiment. The 12 000 overlapping oligonucleotide probes covered almost all intergenic regions (IGRs) of the Synechococcus sp. WH7803 genome on both strands with an average probe length of 31.2 nt, resulting in an average overlap of the probes of 12 nt.
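The relation between probe length, overlap and tiling step can be illustrated with a minimal sketch. It is not the design procedure used for the array: the function name `tile_region`, the fixed probe length of 31 nt and the 1-based inclusive coordinates are assumptions made for illustration, whereas the real probes varied in length around the reported average.

```python
def tile_region(start, end, probe_len=31, overlap=12):
    """Tile the region start..end (1-based, inclusive) with overlapping probes.

    Hypothetical helper: probe_len and overlap are rounded from the averages
    reported in the text; real probe lengths were not constant.
    """
    if overlap >= probe_len:
        raise ValueError("overlap must be smaller than the probe length")
    step = probe_len - overlap  # distance between consecutive probe starts
    probes = []
    pos = start
    while pos + probe_len - 1 <= end:
        probes.append((pos, pos + probe_len - 1))
        pos += step
    # If the tail of the region is uncovered, add one probe flush with its end.
    if probes and probes[-1][1] < end:
        probes.append((end - probe_len + 1, end))
    return probes


probes = tile_region(1, 100)
for p_start, p_end in probes:
    print(p_start, p_end)
```

The same function would be applied once per strand to each intergenic region; regions shorter than one probe yield an empty list in this sketch.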

DNase-treated RNA (4 μg) was directly labelled with the Kreatech Diagnostics (Amsterdam, The Netherlands) ULS labelling kit for CombiMatrix arrays with Biotin-ULS, according to the manufacturer's protocol (CombiMatrix Diagnostics). The RNA fragmentation after labelling was performed as advised for aRNA fragmentation reactions by the manufacturer's instructions (Ambion, Austin, TX, USA). The arrays were hybridized with 3 μg of labelled, fragmented RNA, scanned and stripped according to the manufacturer's protocol using chemicals provided by CombiMatrix (CombiMatrix Diagnostics). ElectraSense Application Software (CombiMatrix) was used for feature extraction. Advanced analyses were done with the software Genespring GX 10.0.2 (Agilent Technologies, Santa Clara, CA, USA). The feature intensities of all experiments and their controls were quantile normalised. The microarray data are available in the GEO database with the accession number GSE28263.
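Quantile normalisation forces every array to share the same intensity distribution. The following is a minimal pure-Python sketch of the general technique, not the Genespring implementation; the function name `quantile_normalize` and the toy intensities are assumptions, and ties are broken by input order rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalise a list of equal-length intensity lists (one per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are resolved by input order (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must contain the same number of features")
    # Indices of each array's features, sorted by intensity.
    orders = [sorted(range(n), key=lambda i, col=col: col[i]) for col in columns]
    # Reference distribution: mean intensity at each rank across arrays.
    reference = [
        sum(col[order[rank]] for col, order in zip(columns, orders)) / len(columns)
        for rank in range(n)
    ]
    normalised = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised


# Toy example: three arrays with four features each (made-up numbers).
arrays = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
normalised = quantile_normalize(arrays)
```

After normalisation, every array contains the same set of values, only assigned to different features according to their original ranks.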

Sequence data and computational analysis

A comparative prediction of ncRNA elements within IGRs was performed as previously described (Voss et al., 2009). In short, all IGRs longer than 50 nt were compared among the five different genomes using BlastN. The genome sequence for Synechococcus sp. WH7803 (Dufresne et al., 2008) was obtained from Genoscope (http://www.genoscope.cns.fr/). All other sequences were obtained from the finished or unfinished genomes website at GenBank (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi) with the following accession numbers: Synechococcus CC9605, NC_007516; WH8102, NC_005070; RS9916, NZ_AAUA00000000; Synechococcus CC9311, NC_008319. Based on the BlastN results, homologous sequence regions were clustered, aligned using ClustalW and analyzed for structural significance by RNAz (Washietl et al., 2005). Alignments were postprocessed using the tool rnazSelectSeqs.pl with default parameters from the RNAz package. RNAz was applied in a sliding window approach (step size of 10 nt; window sizes of 80, 100, 120, 140 and 160 nt), and the window with the highest probability was selected. Elements were considered high-scoring when they achieved an RNAz probability above 0.5 or a Z-score below −2.0. The Z-score is a commonly used measure of structural stability. As we had no prior knowledge about the orientation of an sRNA gene (forward or reverse strand), we computed the measure for both possibilities, given as Z and Zrev, respectively.
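The window selection can be sketched as follows. This is an illustration under stated assumptions, not the pipeline of Voss et al. (2009): `score_fn` is a hypothetical stand-in for running RNAz on one alignment slice, `toy_probability` is a made-up scoring function, and the thresholds are read as P > 0.5 or Z < −2.0.

```python
WINDOW_SIZES = (80, 100, 120, 140, 160)  # nt, as stated in the text
STEP = 10                                # nt, as stated in the text


def sliding_windows(aln_length, sizes=WINDOW_SIZES, step=STEP):
    """Yield (start, end) slices (0-based, end-exclusive) over an alignment."""
    for size in sizes:
        for start in range(0, max(aln_length - size, 0) + 1, step):
            if start + size <= aln_length:
                yield (start, start + size)


def best_window(aln_length, score_fn, sizes=WINDOW_SIZES, step=STEP):
    """Return the window with the highest score, or None if none fits."""
    windows = list(sliding_windows(aln_length, sizes, step))
    if not windows:
        return None
    return max(windows, key=lambda w: score_fn(*w))


def is_high_scoring(probability, z_score, p_min=0.5, z_max=-2.0):
    """Assumed reading of the thresholds: P above 0.5 or Z below -2.0."""
    return probability > p_min or z_score < z_max


def toy_probability(start, end):
    """Made-up score peaking for a 120-nt window centred on column 150."""
    return 1.0 / (1.0 + abs((start + end) / 2 - 150) + abs((end - start) - 120))


best = best_window(300, toy_probability)
print(best)
```

In a real run, `score_fn` would extract the alignment columns of the window, call RNAz on them for both strands and return the reported probability.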

In order to assess the coding potential of the candidate sRNAs we used RNAcode (Washietl et al., 2011), which predicts the coding potential for segments of a multiple sequence alignment. We applied RNAcode with default parameters to the same alignments as RNAz and restricted the output to the best hit (option -b). The P-value is given in the comment column, where significant predictions are expected to have a P<0.05.

Results

Computational prediction of sRNAs in marine Synechococcus

Marine Synechococcus have developed specific adaptations to cope with different environmental conditions, especially with regard to the availability of various nutrients, light quantity and quality. Based on 16S rRNA sequences, marine Synechococcus have been divided into three subclusters, with the major one (subcluster 5.1) consisting of nine genetically distinct clades, I to IX (Scanlan et al., 2009). Here, genome sequences of five marine Synechococcus were included (strains CC9311, CC9605, WH8102, WH7803 and RS9916), representing clades I, II, III, VI and IX of subcluster 5.1 (Scanlan et al., 2009).

The comparative prediction was performed based on the sequence-structure conservation of possible RNA elements as described before (Voss et al., 2009) and yielded information on putative sRNAs, possible riboswitches, as well as highly structured 5′ and 3′ mRNA leader and trailer sequences.

This analysis resulted in 156 sequence clusters above threshold (P or Prev >0.5; Washietl et al., 2005) for marine Synechococcus. Of these, 70 included a direct hit in Synechococcus sp. WH7803 and 24 included sequences from all five compared genomes. A summary of the highest scoring clusters is given in Table 1. Comprehensive information on all predicted RNA elements can be accessed at: http://www.cyanolab.de/paper/suppl/synechococcus/summary.html.

Table 1 List of selected top-scoring RNA elements

Short transcripts from IGRs can also be coding because gene modelling for short protein-coding genes is less accurate than for longer genes. Therefore, the RNAcode algorithm (Washietl et al., 2011) was used to differentiate between non-coding and possibly coding regions based on nucleotide substitution patterns in an alignment of at least three homologous sequences. This approach led to the identification of three candidates for peptide-coding or dual-function sRNAs: i519, i1825 and i1971 (Yfr2a). The latter sRNA is ubiquitous in cyanobacteria (Gierga et al., 2009), but a potential peptide-coding function has not been described thus far. Moreover, there is no suitable AUG in Yfr2a; it is therefore more likely a non-coding sRNA than a short mRNA. However, the other two sRNAs are indeed likely to code for short peptides. Their deduced amino acid sequences are presented in Table 2.
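Checking a short transcript for a suitable AUG and an in-frame stop codon can be sketched as below. This is a simple complement to, not a substitute for, the substitution-pattern analysis performed by RNAcode; the function name `find_orfs`, the example sequence and the `min_codons` cut-off are hypothetical, only the forward strand is scanned and alternative start codons such as GUG or UUG are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10):
    """Return (start, end, n_codons) for forward-strand ORFs.

    An ORF here starts at ATG/AUG and ends at the first in-frame stop codon.
    Coordinates are 0-based, end-exclusive and include the stop codon;
    n_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_codons = (j - i) // 3
                        if n_codons >= min_codons:
                            orfs.append((i, j + 3, n_codons))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


# Made-up transcript: 2-nt leader, AUG, four GCU codons, UAA stop, 2-nt trailer.
example = "CC" + "AUG" + "GCU" * 4 + "UAA" + "GG"
orfs = find_orfs(example, min_codons=3)
print(orfs)
```

A transcript for which this scan returns an empty list has no AUG-initiated reading frame of the requested length on the scanned strand.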

Table 2 Newly identified protein-coding genes or gene segments

Verification experiments

To facilitate global experimental analysis, a tiling microarray was designed. This array consisted of 12 000 probes targeting exclusively the non-protein-coding IGRs of the genome (average overlap of 12 nt). To avoid artefacts of cDNA synthesis, RNA was labelled directly, without a cDNA synthesis step (Georg et al., 2009). RNA was isolated from cyanobacteria cultivated under standard growth conditions or under environmentally relevant stress conditions: low temperature, high light, and iron or phosphate limitation. Samples were taken shortly after the initiation of the respective growth condition in order to detect an early response of the cells and to avoid secondary effects. Successful stress induction was verified by tracking the expression of known marker genes.

For the verification of high light stress, the genes psbA, psaA and groEL were chosen, encoding the photosystem II (PSII) D1 protein, the photosystem I (PSI) reaction center protein PsaA and the GroEL chaperone. These choices were based on previous observations that at least one of the four Synechococcus sp. WH7803 psbA genes is strongly induced by high light (Garczarek et al., 2008), that psaA expression is downregulated upon transfer from low to high light in many cyanobacteria (Hihara et al., 2001), whereas groEL is upregulated under high light in Synechocystis PCC 6803 (Hihara et al., 2001). Marker genes for the detection of cold stress were synWH7803_1511, synWH7803_0505 and synWH7803_2417, encoding an ATP-dependent RNA helicase, the phycobilisome rod-core linker polypeptide CpcG and one of two desC-like desaturase genes. Homologs of these genes were previously found to be upregulated under cold stress (the RNA helicase CrhL (Inaba et al., 2003) and desC (Los and Murata, 1999)) or repressed (CpcG). Marker gene for phosphate limitation was pstSII, well known to be upregulated under this condition in Synechococcus sp. WH7803 (Scanlan et al., 1993, 1997), whereas the iron-stress-induced gene futA (Webb et al., 2001; Singh et al., 2003) served as marker for iron limitation. These experiments confirmed the intended effects of gene expression under the respective growth conditions (Supplementary Figure 1).

Validation of novel transcripts from IGRs and their regulation

RNA from the following conditions was directly labelled and hybridized to the microarrays: light shift from 30 μE to 280 μE for 1 h, temperature shift from 21 °C to 10 °C for 2 h, iron and phosphate limitation for 24 h each.

The median intensities of all experiments and controls were quantile normalised, and P-values were calculated with an unpaired t-test and corrected by Benjamini–Hochberg adjustment using the Genespring GX 10.0.2 software (Agilent Technologies).
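The Benjamini–Hochberg adjustment can be written in a few lines. The sketch below shows the standard step-up procedure in pure Python and is not the Genespring code; the function name `benjamini_hochberg` and the example P-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up P-values for four probes
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum from the largest P-value downwards keeps the adjusted values monotonic.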

The detected sRNA candidates were validated by Northern blots (Figures 1 and 2) and in many cases by additional 5′ RACE experiments. We found six sRNAs differentially regulated under cold stress, six ncRNAs differentially regulated under high light stress, two sRNAs responding to iron limitation and one transcript differentially regulated under phosphate limitation. The homolog of the 6S RNA of Escherichia coli (previously called Yfr7 in marine cyanobacteria; Axmann et al., 2005, 2007) was also detected, but revealed no variation in RNA levels. Probes for tmRNA (ssrA) and rnpB were not included on the microarray because these RNAs are already well characterized in the literature. Subsequent to the Northern experiments, the analyses of the IGRs were complemented by 5′RACE experiments.

Figure 1

Verification of microarray data by northern hybridization of differentially expressed sRNAs from IGRs. (a) Transcript levels for sRNAs affected by cold stress. 21 °C marks the control, 10 °C the cold stress conditions and for i1720, a time series is shown with transcript levels after 0, 30, 60 and 120 min of incubation. (b) Levels of sRNA affected by iron limitation. (c) Phosphate limitation. (d) High light. Major sRNA bands are marked by an arrow. Size markers are given to the left. Control hybridizations against the RNAse P RNA (RnpB) are shown at the bottom.

Figure 2

Northern blotting analyses of sRNAs with high microarray probe intensities. Positions are labelled by arrows at which the main sRNA species accumulate.

Nineteen additional sRNAs exhibited an overall high expression, verified by northern hybridization (Figure 2), but their accumulation did not change under the tested conditions. There are several bacterial sRNAs known with constitutive high expression that are physiologically very relevant (for example, tmRNA, RNase P RNA or the 4.5S RNA component of the signal recognition particle) but in this work we focused on those which were differentially expressed as these are more likely to have a regulatory function. All verified sRNAs are summarized in Table 3 together with additional data, such as genome location, orientation and transcriptional start site (TSS), if determined.

Table 3 Location of Northern-verified sRNAs expressed in Synechococcus WH7803

One of the identified sRNAs, i298, is located antisense to the 5′ UTR of psbA. The hybridization results, with several bands from 80 to about 500 nt (Figure 2), resemble more the pattern observed for bacterial cis-antisense RNAs (Georg et al., 2009) rather than that of a trans-acting sRNA originating from an intergenic spacer. Indeed, the TSS of i298 is located in antisense orientation within the coding region of psbA, overlapping 109 of its codons. Another, very short sRNA (i1642) was found partly antisense to the 5′ end of kaiA (Figure 2), encoding one of three core components of the cyanobacterial circadian clock (Ishiura et al., 1998). Like these examples, many identified sRNAs offer an interesting starting point for more detailed analyses.

Differentially expressed sRNAs in Synechococcus sp. WH7803

Figure 3 illustrates the identification of differentially expressed sRNAs by microarray analysis. The sRNA i403 originates 3′ to the gene mpeU (Figure 3) coding for a bilin biosynthesis protein, likely involved in binding phycobilins to one or both types of phycoerythrin in this organism (Six et al., 2007). The Northern blot verified a clear increase in transcript levels after exposure to high light for two hours. The sRNA i403 has a size of 60 nt. This sRNA is located in a genomic island together with several genes encoding phycobilisome rod components (Dufresne et al., 2008). As changes in RNA levels under high light are likely for genes associated with light harvesting, this sRNA might be involved in this regulation (see also sRNA i399 and possible targets below).

Figure 3

Graphical output of microarray data and corresponding northern hybridization results for i403, i855 and i1971. Displayed are the normalized log2 intensity values for the single probes from both strands. The x axis shows a schematic representation of the genome of Synechococcus sp. WH7803. The values for the five different growth conditions are presented in different colors. The values for the control growth condition were calculated as the mean intensity from the controls of all experiments. Every dot represents the middle of the respective probe (the strands for probe orientation have been reversed for easier visualization). The gene for the respective sRNA is indicated by a pink arrow and the location of the single-stranded RNA probes for northern hybridization is marked as a red box.

The sRNA i855 accumulated more strongly under iron-deficient conditions (Figure 3). The size of about 220 nt detected in the Northern blot, together with the microarray data, suggested a partial antisense location to gene synWH7803_1012. However, further inspection of this sRNA revealed its likely coding function and the presence of a 48-codon reading frame (Table 2). Indeed, a related protein-coding gene, RS9916_26469, is annotated in Synechococcus RS9916 (Supplementary Figure 2a). Consequently, we annotated the deduced gene product of i855 as nORF1 (novel ORF1). Blast searches on the Marine Cyanobacteria Blast Server (http://www.sb-roscoff.fr/SIG/blast/blastWH7803.html) detected another homolog in Synechococcus sp. WH7805. This sequence contains an ORF of the same length as nORF1 and the gene RS9916_26469 and shares 92% identical amino-acid residues with the deduced Synechococcus sp. WH7803 nORF1 sequence (Supplementary Figure 2b). In Synechococcus sp. WH7805, this gene is flanked by WH7805_01707 and WH7805_01712. The latter gene codes for a putative Mn2+/Fe2+ transporter of the NRAMP family and constitutes another possible iron stress-response gene. Further analysis of the upstream promoter sequences revealed a conserved sequence motif that is also found in front of putative transcription factor genes belonging to cluster 1390 of the Cyanorak database (http://www.sb-roscoff.fr/Phyto/cyanorak/). We detected a strong conservation of this likely promoter element among the cluster 1390 and nORF1 genes from the mentioned strains, especially in the −35 region (Supplementary Figure 2c). Cluster 1390 proteins most likely constitute a class of CRP-type transcription factors involved in iron regulatory processes (Scanlan et al., 2009), which strongly suggests that nORF1 is involved in the regulation of iron homeostasis.

Expression of the two previously identified highly expressed sRNAs Yfr2a and Yfr2b (Axmann et al., 2005; Gierga et al., 2009) was downregulated under cold stress (Figure 3). Yfr2a and b belong to a conserved sRNA class, found in freshwater as well as marine cyanobacteria (Gierga et al., 2009). Their function is unknown but their differential regulation is here observed for the first time.

Another sRNA (i399) with decreased expression under cold stress and even more reduced levels under high light was detected in the 5′UTR of cpeB (Figure 4). The Northern analyses confirmed the regulation inferred from the microarray data and indicated a prominent transcript of about 80 nt, but also several transcripts of greater length. Because i399 lies close to cpeB and is transcribed in the same direction, the TSSs of i399 and cpeB were mapped. Interestingly, the cpeB TSS was located at genome position 493029, 20 nt upstream of the TSS mapped for i399 (Table 1). We conclude that there are two promoters in front of cpeB. The more distal one controls the transcription of the gene, whereas the more proximally located promoter controls the transcription of i399. The expression of cpeB appears to be downregulated under cold stress and high light, but the expression of i399 is decreased only under high light.

Figure 4

Microarray data and corresponding northern hybridizations for i399 (color code as in Figure 3). The northern blots show the transcript levels under standard and low temperatures (21 °C and 10 °C) and under standard and high light (C and HL). The 5′ RACE results are shown as green bars for (1) cpeB and (2) the small RNA in front of cpeB with the 3′ end of (2) defined by the location of the RACE primers (green triangle) and not reflecting the natural transcript 3′ end. The location of the single-stranded RNA probe for northern hybridization is marked as a red box.

Target prediction for selected ncRNAs

We determined possible target genes with the program IntaRNA (Busch et al., 2008), taking the sRNA as query and all annotated protein-coding genes of Synechococcus sp. WH7803 as the subject. The complete list of target predictions is presented in Supplementary Table S2. Interestingly, cpeB appeared as a possible target gene for two sRNAs, i403 (Figure 3) and the 60 nt long sRNA i812 (Figure 2). The i403 sRNA showed significantly higher transcript levels under high light, possibly caused by an upregulation of transcription. As a consequence, target expression could be downregulated by base-pairing interactions with this sRNA under these conditions. The two potential target mRNAs cpeB and mpeB with their possible interaction sites are shown in Figure 5. The predicted interactions consist of a perfect match of 10 nt between sRNA and mRNA around the AUG. The biological interpretation of such a suggested downregulation of phycobilisome components could be the prevention of photosystem damage by an excess of energy.
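A perfect complementary stretch such as the 10-nt match described above can be located with a simple search. The sketch below is not IntaRNA, which additionally takes site accessibility and hybridization energy into account; it only finds the longest run of uninterrupted Watson-Crick pairs, ignores G–U wobble pairs, and uses made-up sequences and a hypothetical function name.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def longest_perfect_duplex(srna, mrna):
    """Find the longest perfectly complementary, antiparallel stretch.

    Returns (length, srna_start, mrna_start) with 0-based start positions.
    """
    srna = srna.upper().replace("T", "U")
    mrna = mrna.upper().replace("T", "U")
    # Sites that can pair with the sRNA appear in the mRNA as substrings of
    # the sRNA's reverse complement; unknown bases never match ("?").
    rc = "".join(COMPLEMENT.get(base, "?") for base in reversed(srna))
    best = (0, 0, 0)  # (length, start in rc, start in mrna)
    prev = [0] * (len(mrna) + 1)
    # Longest common substring by dynamic programming.
    for i in range(1, len(rc) + 1):
        cur = [0] * (len(mrna) + 1)
        for j in range(1, len(mrna) + 1):
            if rc[i - 1] == mrna[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    length, rc_start, mrna_start = best
    srna_start = len(srna) - (rc_start + length)
    return length, srna_start, mrna_start


# Made-up sequences: the sRNA carries the reverse complement of AUGCCUGAAC.
toy_mrna = "GGAGGAUAAUGCCUGAAC"
toy_srna = "CCGUUCAGGCAUAAAA"
hit = longest_perfect_duplex(toy_srna, toy_mrna)
print(hit)
```

Such a search is useful only as a quick plausibility check of a predicted interaction site; ranking of candidate targets still requires an energy-based tool.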

Figure 5

Potential interactions between the sRNA i403 and the mRNAs cpeB and mpeB involving 2 nt of the AUG (highlighted in red) and eight following positions. The interaction was predicted using the program IntaRNA (Busch et al., 2008). The color reproduction of this figure is available at ISME journal online.

Bioinformatic analysis of the second aforementioned sRNA, i812, also suggested cpeB as a possible target (Figure 6a). A similarly extended complementarity was observed between i812 and two other mRNAs involved in photosynthesis, petA (Figure 6b) and cpeE (Figure 6d). A fourth potentially interesting target is synWH7803_1709, coding for an OmpR-type two-component-system response regulator (Figure 6c).

Figure 6

Potential targets for the sRNA i812. Targets were identified by IntaRNA (Busch et al., 2008) and interactions between sRNA and the respective mRNA target were plotted with RNAhybrid. Presented are possible interactions between sRNA i812 (in green) and the mRNAs (orange) for cpeB (a), petA (b), synWH7803_1709 (c) and cpeE (d). Especially noticeable is the pronounced complementarity in the region directly 5′ of the mRNA AUG (boldface letters and arrow).

Discussion

New sRNAs and novel protein-coding genes in Synechococcus sp. WH7803

The necessity for physiological adaptation to fluctuating mesotrophic conditions, in combination with its streamlined genome (Dufresne et al., 2008) and an existing genetic system, makes Synechococcus sp. WH7803 an attractive model for marine picocyanobacteria. The microarray analysis not only confirmed many of the predicted RNA elements but also revealed the presence of transcripts that had not been computationally predicted. In summary, we identified 15 transcripts differentially regulated under environmentally relevant conditions. An additional 21 transcripts were found with high expression levels. Of these 36 transcripts, four are likely protein coding (three novel short peptide-coding genes and one 5′-prolonged long protein-coding gene), and the other 32 are likely non-coding sRNAs. One of the peptide-coding mRNAs and two of the sRNAs are to a substantial degree cis-antisense RNAs, whereas all others appear to be candidates for bona fide trans-acting, potentially regulatory sRNAs. Their functions are entirely unknown at present but some of them may be involved in the response to distinct types of nutrient limitation, redox stress, or phage infection.

Very short protein-coding genes are notoriously difficult to predict during standard genome annotation. It is therefore not surprising that some of the identified transcripts are, in fact, short mRNAs rather than regulatory non-coding sRNAs. Of the sRNAs validated in this work, only three possessed the potential to code for a protein. The sRNA i855 coding for nORF1 was not classified as potentially protein coding, because it substantially overlaps synWH7803_1012, as described in the Results section. The deduced amino-acid sequences of these three peptide candidates are presented in Table 2.

A cis-antisense RNA to psbA

The combination of computational results with experimental data revealed interesting features of some of the newly identified sRNAs. We not only defined a set of sRNAs for Synechococcus sp. WH7803, but also provide information about the distribution of these sRNAs within a subset of marine picocyanobacteria. With i298 we identified an sRNA antisense to the 5′UTR of psbA in all five tested cyanobacteria. The gene psbA encodes the photosystem II reaction center protein D1 and frequently occurs in the form of a small gene family. Synechococcus sp. WH7803 possesses four psbA copies. One copy (synWH7803_0784) encodes the D1:1 isoform of the protein, mainly expressed under non-stress conditions and providing the majority of the D1 protein (Garczarek et al., 2008). The other three psbA copies encode the D1:2 isoform of the protein. Using 5′ UTR-specific primers, Garczarek et al. (2008) found that one of the three copies (synWH7803_0790) becomes strongly expressed under high light, whereas the other two copies (synWH7803_0366 and synWH7803_2084) were only slightly induced. Our microarray data indicated that sRNA i298 is specifically associated with synWH7803_0366 and synWH7803_2084. Prediction and microarray data were confirmed further by Northern analysis. Mapping the TSS by 5′ RACE revealed that i298 originates within the coding region of psbA, and the Northern results showed a small cohort of bands (Table 1 and Figure 2). Thus, i298 resembles a cis-antisense RNA to psbA more than a typical trans-acting sRNA. Antisense RNAs were also reported for the psbA gene of Synechococcus phage S-PM2 (Millard et al., 2010) and the psbA2 and psbA3 genes of Synechocystis PCC6803 (Mitschke et al., 2011). In Synechococcus sp. WH7803, this sRNA could be a factor contributing to the observed lower expression of the two psbA gene copies synWH7803_0366 and synWH7803_2084.

Novel sRNAs and the adaptation to changes in the environment

Regulatory RNAs that control the response to changing physical parameters such as light or temperature are frequently controlled by this environmental factor itself. Therefore, we included experiments to test the sRNA expression under various conditions. Temperature, in particular cold stress, is one of the most important environmental factors driving the distribution and composition of natural populations of marine cyanobacteria at a global scale (Bouman et al., 2006; Johnson et al., 2006; Zwirglmaier et al., 2007). There is solid evidence for the control of adaptation to low temperatures in bacteria through regulatory RNA (Sledjeski et al., 1996). Indeed, here we found six sRNAs specifically responding to cold. From these, five became repressed and one activated, suggesting sRNA involvement in the response to colder temperatures also in marine picocyanobacteria. Another point for future analysis is how regulation of differential sRNA expression is achieved. There is no doubt that promoters for sRNA genes are fully integrated into the fabric of the intracellular regulatory network. However, cold or heat stress should lead to conformational changes of the regulatory RNA molecules themselves, as well as of other transcripts. Indeed, there is evidence that this is another level of posttranscriptional control of gene expression. Expression of the RNA helicase CrhL becomes induced upon cold stress (Supplementary Figure 1d) and the mutation of its homolog CrhR in Synechocystis sp. PCC6803 leads to a severe phenotype at all temperatures but most pronounced in the cold (Rowland et al., 2011).

Besides cold stress, we studied the responses to high light, phosphate limitation and iron limitation during exponential growth. Regulatory sRNAs involved in the regulation of iron homeostasis are known from several different groups of bacteria. The best studied representative is the 90 nt sRNA RyhB, which was first identified in a genome-wide screen in E. coli (Wassarman et al., 2001) and probably regulates more than 20 iron-storage, iron-using and other proteins when iron becomes limiting (Masse and Gottesman, 2002; Modi et al., 2011; Salvail and Masse, 2011). Functional homologs of RyhB were also identified in several non-enterobacteria, among them PrrF1 and PrrF2 in Pseudomonas aeruginosa (Wilderman et al., 2004). Interestingly, all these sRNAs have in common that their expression is activated when iron becomes limiting, and they share the iron-dependent superoxide dismutase (sodB gene) as a validated target (Afonyushkin et al., 2005; Masse et al., 2005). Of the two sRNAs strongly induced by iron limitation in Synechococcus sp. WH7803, i854 and i855, one (i855) likely codes for a short peptide involved in iron homeostasis. Nevertheless, it qualifies as a candidate for a functional homolog of RyhB and PrrF, even more so as sodB ranks relatively high on the list of predicted targets (predicted energy −14.35 kcal mol−1; Supplementary Table S2).

There is no example in the literature of an sRNA involved in the bacterial phosphate stress response, and the i95 transcript identified here as induced upon phosphate limitation is no exception. Closer inspection of our data revealed that i95 belongs to the SynWH7803_0111 gene, which was incorrectly annotated and codes for a PhoA-type phosphatase. Based on the transcript data, we extended the predicted protein sequence by 109 additional amino acids (Table 2), yielding a giant protein of 1855 amino acids. This protein has homologs in chlorobia (Pelodictyon phaeoclathratiforme and Chlorobium phaeobacteroides DSM 266), Synechococcus WH8102 and the diatom Thalassiosira pseudonana CCMP1335. Notable features include a 5′ nucleotidase and a phytase-like domain (Scanlan et al., 2009).

Most previous work on bacterial sRNA regulators has focused on human pathogens. However, for cyanobacteria, as photosynthetic organisms, light is a key environmental factor. Therefore, the finding of six light-regulated sRNAs is intriguing, as some of them might be involved in the control of photosynthesis. Among the predicted targets of these six light-regulated sRNAs are several components of the light-harvesting apparatus, some of which would be consistent with the effects observed upon mutation of the high light-inducible SyR1 sRNA in the cyanobacterium Synechocystis PCC 6803 (Mitschke et al., 2011). Future work is needed to examine the specific functions of these sRNAs in vivo, by manipulating their expression and physiologically characterizing deletion mutants. The identification here of a small set of sRNAs as particularly interesting targets is an essential precondition for such experiments. Hence, this work provides a solid foundation for further experimental analysis of sRNAs, which are likely to interact with proteins involved in essential cellular processes.