Microbial metabolic networks in a complex electrogenic biofilm recovered from a stimulus-induced metatranscriptomics approach

Microorganisms almost always exist as mixed communities in nature. While the significance of microbial community activities is well appreciated, a thorough understanding about how microbial communities respond to environmental perturbations has not yet been achieved. Here we have used a combination of metagenomic, genome binning, and stimulus-induced metatranscriptomic approaches to estimate the metabolic network and stimuli-induced metabolic switches existing in a complex microbial biofilm that was producing electrical current via extracellular electron transfer (EET) to a solid electrode surface. Two stimuli were employed: to increase EET and to stop EET. An analysis of cell activity marker genes after stimuli exposure revealed that only two strains within eleven binned genomes had strong transcriptional responses to increased EET rates, with one responding positively and the other responding negatively. Potential metabolic switches between eleven dominant members were mainly observed for acetate, hydrogen, and ethanol metabolisms. These results have enabled the estimation of a multi-species metabolic network and the associated short-term responses to EET stimuli that induce changes to metabolic flow and cooperative or competitive microbial interactions. This systematic meta-omics approach represents a next step towards understanding complex microbial roles within a community and how community members respond to specific environmental stimuli.

Three metagenomic assembly metrics, total bases (y-axis), N50 (x-axis), and % mapped raw reads back to the assmbled contigs (size of bubble), are shown with different parameters of both kmer size (dashed line) and bubble length (color) in CLC de novo assembly cell 4.0. Open arrowhead indicates an assembly used in previous study (Ishii et al. 2013. Nature Commun. 4: 1601). Filled arrowhead indicates an assembly used in this study. Panel A shows Bin-genome clusters (colored rounded rectangles) for taxa related to Phylum Proteobacteria and Family Methanosacrinaceae, while panel B shows Bin-genome clusters for the other taxa. Bingenomes (ID described near rounded rectangles) were established using the estimated taxonomic classification of contigs (color of dots), contig lengths (size of dots), GC content of contigs (%), and mean coverage of contig.

Bacteroides fragilis (YP_209956)
Odoribacter splanchnicus (YP_004253093)   Left panels show the relationship between MFC and SP conditions. Right panels show the relationship between SP and OC conditions. The correlation coefficient (R) is shown.

Methanosarcinaceae Betaproteobacteria
Metagenome con. MFC vs con. SP Metagenome con. SP vs con. OC Left panels show the relationship between MFC and SP conditions. Right panels show the relationship between SP and OC conditions. The correlation coefficient (R) is shown. Left panels show the relationship between MFC and SP conditions. Right panels show the relationship between SP and OC conditions. The correlation coefficient (R) is shown. Left panels show the relationship between MFC and SP conditions. Right panels show the relationship between SP and OC conditions. The correlation coefficient (R) is shown. Longer-term population dynamics of those two EET-responsive families, family Desulfobulbaceae (red circle) and family Desulfuromonadaceae (blue square) suggested from the stimulus-induced metatranscriptomics was analyzed by bacterial 16S rRNA-basis community analyses of the anode-associated electrogenic biofilm, which was reported from the previous studies (Ishii et   MFC#1 was analyzed based on the metagenomic DNA frequency (DNA-RPKM), 16S rRNA clone analyses (total number of sequenced clone is described in center circles) for both Bacteria and Archaea which are described in previous study (Ishii et. al. 2013. Nat Commun 4:1601). MFC#2 and MFC#3 were analyzed based on 454 pyrosequencing of V1-V3 region or V3-V5 region of 16S rRNA amplicon (number of sequenced reads are described in paraphrase of the center circle), which method is described in previous work (  Branch points supported with bootstrap values (100 trials) of >90% are indicated with closed circles, while those between 70% and 90% are indicated with open circles. Accession numbers of reference sequences are indicated in parentheses.
e Contig was assigned to phylum "Mixed " when several phyla classified to ORFs within single contigs not met to contig taxnomic binning criteria. Numbers of CDSs were counted of ORFs where at least one mRNA read was mapped on the criteria of 50% length cut-off and 95% identity. d Contig was assigned to phylum "Mixed " when several phyla classified to ORFs within single contigs not met to contig taxnomic binning criteria.
c Raw reads counts to the bin-genomes wwere calculated by read mapping of raw DNA or mRNA reads to ORFs. The total DNA reads were calcurated as means of three operational conditions. The total mRNA reads of each condition were normalized by DNA reads of each condition.   Upregulated genes showed red letters, while down-regulated genes showed blue letters. b DNA frequency (DNA-RPKM) was an average of three operational conditions. c Potential operons (gene cluster within the contig) were highlighted by different colors.

Characterization of thirteen bin-genomes
To accomplish the recovery of bin-genomes of highly abundant microbes in the complex microbial community, we clustered the assembled contigs by using mean contig coverage vs. G+C content plots as reported elsewhere (1). The contig clusters were further analyzed by a set of criteria that considered the predicted taxonomy, mean coverage, G+C content, and contig length (Supplementary  Tables S2). Using this method, we recovered thirteen bin-genomes overall ( Table 1). Six of these bingenomes (MS1, DB1, DB2, DF1, Bac1, and Bac2) showed notably high estimated genome completeness based on observed coverage of single-copy housekeeping genes (Supplementary Table S3-S4) (2, 3), indicating nearly-complete draft genomes of these six strains were recovered from the metagenome. Bingenomes DM1, DF2, and Bet1 showed relatively larger numbers of contigs and less complete genomes than the other bin-genomes. The lower quality of these specific bin-genomes is likely due to the presence of phylogenetically closely related strains that assembled with a similar mean coverage, and/or microdiversity between the strains resulting in different mean coverage but notably similar genomes (4). However, the assembly quality for those bin-genomes was sufficient for the subsequent analyses.
On the other hand, two bin-genomes (Chl1mix and DM2mix) showed duplicate or triplicate existence of the noted single-copy genes (Supplementary Table S3), suggesting that those clusters were mixtures of genomes from at least two different microbes, and were omitted from further metabolic network analyses. In total, eleven bin-genomes were recovered from the new metagenomic assembly and determined to be the dominant strains within the community. These bin-genomes were then utilized for reconstructing the metabolic networks within the community by performing comparative analyses of their gene expression profiles between before and after stimuli addition.
The 16S rRNA gene is generally used to affiliate bin-genomes to their taxonomic position and estimate their potential role in the community; however, it is difficult to assemble highly conserved genomic regions such as 16S rRNA (Supplementary Table S5). Therefore, we used three housekeeping proteins (RplE, NusA, and PheS) for analyzing the phylogenetic positions and taxonomic classification of each bin-genome if the bin-genomes did not include 16S rRNA sequence information ( Supplementary Fig.  S3-S6). This method provided the taxonomic assignment of the eleven bin-genomes (Table 1) and led to the first estimation of functional roles for each strain; i.e., family Methanosarcinaceae (MS1) as a methanogen (5), families Desulfobacteraceae (DF1 and DF2) and Desulfobulbaceae (DB1 and DB2) as dissimilatory sulfate reducers (6), and Desulfuromonadaceae (DM1) as a dissimilatory iron reducer (7). These results also indicate that we successfully identified a bin-genome of the most dominant Desulfuromonadaceae strain DM1 from Desulfuromonadales pan-genome DM of the previous study (1), which is important for analyzing the strain-based metabolic network.

Accurate community composition for complex microbial community
Conventional community analyses, i.e. evaluating community structure based on the relative abundance of small subunit rRNA gene of each taxon, is convenient. However, there are some unavoidable issues due to the inconsistent copy numbers of ribosomal RNA gene operon per genome (8), ununiformed compatibility of universal primers to the 16S rRNA genes from various organisms for the PCR amplification (9), and difficulty to design primers that can amplify both Archaeal and Bacterial 16S rRNA genes (10). Although the 16S rRNA clone analysis showed appropriate identification of phylotypes with phylogenetic positions (Supplementary Fig. S3), the community composition is incorrect due to the reasons described above (1).
To gain more accurate view of the community composition by overcoming the issues, the metagenomics-based community composition analysis has been examined because the next generation sequencing (NGS) is theoretically less-biased with no PCR amplification than 16S rRNA-basis sequencing. Taxonomic assignment of each ORF by similarity search against UniRef (11) or NCBI-NR (12) are frequently used for raw read or ORF frequencies-basis community composition analysis such as MEGAN (13) and MG-RAST (14). However, the results from this method are considerably affected by the accumulation of adequate reference genomes, and never address uncultured taxa with no reference genome and also phylotype level information of the given communities.
Since our bin-genome effort by contig clustering enables to show the strain-level information as well as the unclassified taxa (like bin-genomes Unc1 and Unc2) in the community (Table 1), raw reads frequencies for each bin-genome are another option to identify community composition (Supplementary Table S5). The raw reads frequencies related to bin-genome and taxonomic assignment of contigs showed relatively accurate view of the community composition; however, the approach has potential errors in terms of microbial cell numbers mainly attributed by size of (bin)genomes and raw reads contamination of virus/plasmid. The sizes of genomes directly affect to raw reads frequencies such as bigger genome size overestimates the frequency because of more raw reads existence. We found several contigs taxonomically assigned as Superkingdom Viruses within our metagenomic assembly. Those contigs seem to represent miscellaneous viruses infected and amplified in the microbial cell body. At the case, it is better to omit viruses-associated raw reads for microbial community composition analysis. The plasmidassociated contigs are similar contaminant to consider accurate community composition analysis because their copy numbers within the microbial cell body are highly variable.
To overcome all concerns described above, we used core-gene frequencies based community composition by relative frequencies of sixteen single-copied housekeeping genes for each bin-genome (Fig. 2). The core-gene frequency based community composition analysis along with metagenomics was first reported using 31 proteins as the phylogenetic markers for Bacterial population analyses (15), and using 105 proteins for Archaeal phylogenetic markers (16). However, in anaerobic microbial ecosystems, both Bacteria and Archaea are often existed together in a community; thereby, we selected core genes that are present in both domains' genomes for the community analyses ( Fig. 2A). To the end, core-gene frequencies based community composition would be most suitable way to reflect microbial cell numbers composition within the given communities (Fig. 2C).

Stimulus-induced metatranscriptomics for natural environmental community
In the natural environments, microbial communities are sometimes highly diverse (17) because the selective pressures are low and variable (18). In those cases, metagenomic sequence data can be complex and analytically challenging (19) to obtain sufficient genome-binning; however, if dominant organisms occupy more than 1.5% of a given community, their bin-genomes will be associated using our level of metagenomic sequencing (~100M reads and ~10 Gbp). Given advances in sequencing technologies which can now provide high coverage sequence data, the recovery of draft genomes from the highly diverse samples will continue to improve.
In addition, natural environments generally include multiple variables that impact microbial activities. For example, a diurnal cycle and depth dependency includes light intensity as well as temperature and oxygen concentration differences (20)(21)(22), which are both significant environmental changes relative to microbial activity. If these two environmental factors are divided into individual stimuli (e.g. temperature or light), we can separately analyze the effect of each factor that controls individual microbial activity and metabolism, or interspecies interaction in the complex microbial communities. Since the microbial community should respond by changing gene expression relative to the specified applied stimulus (e.g. only light intensity), our proposed stimulus-induced metatranscriptomics approach with genome-binning should be suitable in many natural environments for further understanding how environmental perturbations impact microbial function at a strain-and community-level.