Freshwater genome-reduced bacteria exhibit pervasive episodes of adaptive stasis

The emergence of bacterial species is rooted in their inherent potential for continuous evolution and adaptation to an ever-changing ecological landscape. The adaptive capacity of most species frequently resides within the repertoire of genes encoding the secreted proteome (SP), as it serves as a primary interface used to regulate survival/reproduction strategies. Here, by applying evolutionary genomics approaches to metagenomics data, we show that abundant freshwater bacteria exhibit biphasic adaptation states linked to the eco-evolutionary processes governing their genome sizes. While species with average to large genomes adhere to the dominant paradigm of evolution through niche adaptation by reducing the evolutionary pressure on their SPs (via the augmentation of functionally redundant genes that buffer mutational fitness loss) and increasing the phylogenetic distance of recombination events, most of the genome-reduced species exhibit a nonconforming state. In contrast, their SPs reflect a combination of low functional redundancy and high selection pressure, resulting in significantly higher levels of conservation and invariance. Our findings indicate that although niche adaptation is the principal mechanism driving speciation, freshwater genome-reduced bacteria often experience extended periods of adaptive stasis. Understanding the adaptive state of microbial species will lead to a better comprehension of their spatiotemporal dynamics, biogeography, and resilience to global change.

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Data analysis
Adrian-Stefan Andrei, Apr 3, 2024
No software was used for data collection.
All sequence data generated during this study have been deposited in the EBI/NCBI (Bioprojects: PRJEB35770, PRJEB35640, PRJNA428721, PRJNA429145). The accession numbers for the 52 raw metagenomic datasets are listed in Table S0 of the Supplementary Dataset. The 5,519 MAG IDs, their accession numbers, Bioproject IDs, and Sample IDs, along with additional metadata, are provided in Table S1 of the Supplementary Dataset. The generated data supporting the conclusions of this study may be found at figshare: 10.6084/m9.figshare.23546067. All additional important data supporting the study's conclusions are included in the publication and its supplemental material files.

Recruitment
Ethics oversight Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
Behavioural & social sciences Ecological, evolutionary & environmental sciences For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.
In response to the study's focus on microbial communities within freshwater ecosystems, our research sample comprises an extensive collection of datasets from metagenomic sequencing efforts targeting five distinct Central European lakes. These lakes were selected to represent a broad spectrum of trophic states, from oligotrophic (nutrient-poor) to dystrophic (rich in organic matter but low in oxygen), ensuring a comprehensive analysis of microbial diversity across different ecological conditions. This selection was motivated by the hypothesis that varying nutrient availability and environmental pressures across these trophic states would influence the diversity and adaptive strategies of the resident bacterial populations.
The datasets themselves are a compilation of approximately 5,500 prokaryotic metagenome-assembled genomes (MAGs), generated from 52 shotgun-sequenced samples. This corresponds to about 11 billion reads and 3.31 Tb of data, making it one of the most substantial collections of lake microbial genomes analyzed to date.
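To put the dataset totals in perspective, the short sketch below derives per-sample averages from the figures quoted above. It is only a back-of-the-envelope illustration: the totals are approximate, "Tb" is assumed here to mean terabases, and the names `per_sample`, `TOTAL_READS`, `TOTAL_BASES`, and `N_MAGS` are hypothetical and not part of the study's workflow.

```python
# Figures quoted in the text; both totals are approximate ("about 11 billion reads").
TOTAL_READS = 11e9
TOTAL_BASES = 3.31e12  # assumption: "Tb" is read as terabases
N_SAMPLES = 52
N_MAGS = 5500  # "approximately 5,500" MAGs


def per_sample(total: float, n_samples: int = N_SAMPLES) -> float:
    """Return the arithmetic mean of a dataset-wide total across samples."""
    return total / n_samples


reads_per_sample = per_sample(TOTAL_READS)
bases_per_sample = per_sample(TOTAL_BASES)
mags_per_sample = per_sample(N_MAGS)

print(f"mean reads per sample: {reads_per_sample / 1e6:.0f} million")
print(f"mean bases per sample: {bases_per_sample / 1e9:.1f} Gb")
print(f"mean MAGs per sample:  {mags_per_sample:.0f}")
```

Under these assumptions the averages come to roughly 212 million reads and 64 Gb per sample. The MAG figure is a simple ratio of totals and does not imply that each sample yielded that many genomes.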
Our rationale for this sample choice is twofold: to leverage the high-resolution insights provided by genome-resolved metagenomics for understanding bacterial adaptation and evolution, and to capture the diversity and dynamism of microbial communities across a gradient of environmental conditions. This approach allows us to dissect the complex interplay between genomic features, such as genome size, and ecological strategies within and among bacterial populations, shedding light on the underlying mechanisms of microbial diversification in freshwater habitats.
Samples from five freshwater lakes in the Czech Republic and Switzerland, ranging in trophic status from oligotrophic to eutrophic, were used to recover genomic information from prokaryotes colonizing diverse freshwater niches.
Římov Reservoir (470 m a.s.l., 48°50'N, 14°29'E, Czech Republic) is a meso-eutrophic, canyon-shaped dimictic water body with an area of 2.0 km^2 (length 13.5 km, volume 34.5 × 10^6 m^3, mean water retention time 77 days, maximum depth 43 m) that was built during 1974-1979 by damming a 13.5 km long section of the River Malše. Sampling was performed between June 2015 and August 2017 above the deepest point of the reservoir using a Friedinger sampler. Water (20 L) was collected from depths of 0.5 m (n=10) and 30 m (n=8) and subjected to sequential peristaltic filtration through a series of 20, 5, and 0.2 µm pore-size polycarbonate membrane filters (Sterlitech Corporation, USA). The sample collection and filtration steps were similar for the rest of the lakes/pools unless otherwise stated.
Jiřická pond (892 m a.s.l., 48°36.96'N, 14°40.59'E, Czech Republic) is a dystrophic humic water body with an area of 0.035 km^2 (volume 6.59 × 10^3 m^3, mean water retention time 9 days, maximum depth 3.7 m), located in the Novohradské Mountains of Southern Bohemia. Fifteen epilimnion (0.5 m depth) water samples were collected between May 2016 and August 2017.
Lake Zurich (406 m a.s.l., 47°18'N, 8°34'E, Switzerland) is an oligomesotrophic, perialpine monomictic water body with an area of 67.3 km^2 (length 40 km, volume 3.3 km^3, mean water retention time 1.4 years, maximum depth 136 m). Thirteen samples were collected between 2013 and 2019 from the epilimnion (5 m depth, n=8) and hypolimnion (80/120 m depth, n=5) layers, and processed as described above.
Lake Thun (558 m a.s.l., 46°41'N, 7°43'E, Switzerland) is an oligotrophic, alpine water body with an area of 48.3 km^2 (length 17.5 km, volume 6.5 km^3, mean water retention time 1.8 years, maximum depth 217 m). Two water samples were collected in June 2018 from 5 and 180 m depths.
Lake Constance (395 m a.s.l., 47°32'N, 9°31'E, Swiss Confederation) is an oligotrophic perialpine lake with an area of 473 km^2 (length 63 km, volume 48 km^3, mean water retention time 5 years, maximum depth 252 m). Four samples were collected in July and October 2018 from 5 m and 200 m depths.
The sampling locations within the lakes were selected based on their stratification (when applicable) to accurately reflect the varying environmental conditions. The volume of water collected aimed to encompass approximately 10^9 to 10^10 prokaryotic cells, with the specific quantity varying by lake, season, and individual sample. We did not perform any sample size calculations for this study.
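As a quick consistency check, the per-lake sample counts given in the site descriptions can be tallied against the 52 metagenomic datasets reported for the study. The sketch below is illustrative only: the dictionary layout and the names `SAMPLES_PER_LAKE` and `lake_totals` are hypothetical, and where the text gives no per-depth split (Lake Thun, Lake Constance) the samples are kept as a single entry.

```python
# Sample counts as stated in the site descriptions, keyed by lake and depth layer.
SAMPLES_PER_LAKE = {
    "Římov Reservoir": {"0.5 m": 10, "30 m": 8},
    "Jiřická pond": {"0.5 m (epilimnion)": 15},
    "Lake Zurich": {"5 m (epilimnion)": 8, "80/120 m (hypolimnion)": 5},
    "Lake Thun": {"5 m and 180 m": 2},        # per-depth split not stated
    "Lake Constance": {"5 m and 200 m": 4},   # per-depth split not stated
}


def lake_totals(samples: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the depth-layer counts for each lake."""
    return {lake: sum(layers.values()) for lake, layers in samples.items()}


totals = lake_totals(SAMPLES_PER_LAKE)
grand_total = sum(totals.values())

for lake, n in totals.items():
    print(f"{lake}: {n}")
print(f"total: {grand_total}")
```

The per-lake counts (18, 15, 13, 2, and 4) sum to 52, matching the number of shotgun-sequenced samples reported for the dataset.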
In our study, the data collection procedure was designed to capture a comprehensive snapshot of prokaryotic diversity across different freshwater ecosystems.
1. Selection of Sampling Sites: We chose sampling locations within each of the five Central European lakes to reflect the unique ecological stratification present within these bodies of water. This approach ensured that the collected samples represented the broad range of microenvironments and the microbial life they harbor.
2. Seasonal and Spatial Sampling: Recognizing the influence of seasonal changes on microbial communities, we conducted sampling across various seasons. Additionally, we selected multiple sites within each lake to account for spatial heterogeneity, aiming to encompass the full spectrum of microbial diversity.
The data generated are available in referenced public repositories, ensuring transparency and accessibility. All methods used in this study are extensively cited, and the parameters for software applications are fully documented. Given the study's reliance on environmental samples, no attempts were made to replicate the experiments.
This is an exploratory study and randomization is not relevant to the study design.
Blinding was not performed because it was not relevant to this study. This study was an exploratory survey of microbial diversity without a priori expectations that would influence the analyses.
Water temperatures varied from 0.2°C to 24.1°C, influenced by the specific lake and season. For comprehensive temperature data, refer to Table S1 in the Supplementary Dataset. Weather conditions at the time of sampling were typical for each location, and no sampling occurred during weather anomalies.
materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
were excluded from the analyses.
Source data are provided with this paper.
This study does not involve human research participants.
natureportfolio | reporting summary
This is an exploratory metagenomic study focused on the recovery and analysis of environmental bacterial genomes. The nature of the study does not necessitate any treatment factors, interactions, design structure (factorial, nested, hierarchical) or replicates.