Evolution in action: Habitat-transition leads to genome-streamlining in Methylophilaceae (Betaproteobacteriales)

The most abundant aquatic microbes are small in cell and genome size. Genome-streamlining theory predicts gene loss caused by evolutionary selection driven by environmental factors, favouring superior competitors for limiting resources. However, the evolutionary histories of such abundant, genome-streamlined microbes remain largely unknown. Here we reconstruct the series of steps in the evolution of some of the most abundant genome-streamlined microbes in freshwaters (‘Ca. Methylopumilus’) and oceans (marine lineage OM43). A broad genomic spectrum is visible in the family Methylophilaceae (Betaproteobacteriales), ranging from sediment microbes with medium-sized genomes (2-3 Mbp) through an occasionally blooming pelagic intermediate (1.7 Mbp) to the most reduced pelagic forms (1.3 Mbp). We show that a habitat transition from freshwater sediment to the relatively oligotrophic pelagial was accompanied by progressive gene loss and adaptive gains. Gene loss has mainly affected functions that are not necessarily required or advantageous in the pelagial, or that are encoded by redundant pathways. Likewise, we identified genes providing adaptations to oligotrophic conditions that have been transmitted horizontally from pelagic freshwater microbes. Remarkably, the secondary transition from the pelagial of lakes to the oceans required only slight modifications, i.e., adaptations to higher salinity, gained via horizontal gene transfer from indigenous microbes. Our study provides the first genomic evidence of genome reduction taking place during habitat transitions. In this regard, the family Methylophilaceae is an exceptional model for tracing the evolutionary history of genome-streamlining, as such a collection of evolutionarily related microbes from different habitats is practically unknown for other similarly abundant microbes (e.g., ‘Ca. Pelagibacterales’, ‘Ca. Nanopelagicales’).


Availability of data
All genomes have been submitted to NCBI under BioProject XXX, BioSamples XY-XYX (Please note: Submission is in progress, accession numbers will be provided as soon as [...]

Currently, 31 Methylophilaceae genomes of high quality (i.e., >99% completeness, <20 scaffolds) are publicly available, mostly from axenic isolates from freshwater sediments (Fig. 1, Table S1). We additionally sequenced the genomes of 41 planktonic freshwater strains affiliated with 'Ca. Methylopumilus planktonicus' (38 strains) and Methylophilus sp. (3 strains). These microbes were isolated from the pelagial of three different freshwater habitats (Lake Zurich, CH; Římov Reservoir, CZ; Lake Medard, CZ) by dilution-to-extinction [14,30]. All novel genomes are of very high quality, i.e., they are complete, with one circular chromosome (Table S1) [...]

[...] medardicus, with closest hits to isolates from freshwater sediment. These strains might originate from the same clone, as they were obtained from the same sample from Lake Medard and were 100% identical in their genome sequence. M. medardicus seems not to be abundant in the pelagial of lakes, as indicated by recruitments from 345 different pelagic freshwater metagenomic datasets; however, it could be readily detected in relatively high proportions in sediment metagenomes (Fig. 1b, Table S2). Sediments also appear to be the main habitat of other Methylophilus and Methylotenera. The three strains isolated from marine systems, that were referred to as [...] 16,23], form two different genera based on AAI (Fig. S3). However, none appear to be abundant in the open ocean (Fig. 1b), and only strain HTCC2181 could be detected in estuarine/coastal metagenomes, although lineage OM43 has been repeatedly reported in coastal oceans by CARD-FISH, where they can reach up to 4% or 0.8 × 10⁵ cells ml⁻¹ during phytoplankton blooms [28,29]. It is thus likely that other, more abundant strains of OM43 still await isolation. 'Ca. Methylopumilus spp.', on the other hand, were found in moderate proportions in estuarine/coastal systems, but their main habitat is clearly the pelagial of lakes, where they are highly abundant (Fig. 1b) [...] were also prevalent in rivers (Fig. 1b).
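
As a rough illustration of the quality threshold quoted above for the public genomes (>99% completeness, <20 scaffolds), the following Python sketch filters a list of genome records. The `GenomeRecord` class, its field names, and the toy values are hypothetical and are not taken from Table S1; how completeness was estimated is not stated in this section, so the value is simply treated as a given percentage.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GenomeRecord:
    """Hypothetical record for one genome assembly (illustration only)."""
    name: str
    completeness: float  # estimated completeness in percent
    n_scaffolds: int     # number of scaffolds in the assembly


def is_high_quality(genome: GenomeRecord,
                    min_completeness: float = 99.0,
                    max_scaffolds: int = 20) -> bool:
    """Apply the thresholds quoted in the text: >99% complete, <20 scaffolds."""
    return (genome.completeness > min_completeness
            and genome.n_scaffolds < max_scaffolds)


def filter_high_quality(genomes: Iterable[GenomeRecord]) -> List[GenomeRecord]:
    """Keep only the genomes that pass both thresholds."""
    return [g for g in genomes if is_high_quality(g)]


if __name__ == "__main__":
    # Toy values, not real assemblies.
    toy = [
        GenomeRecord("toy_closed_genome", 99.8, 1),
        GenomeRecord("toy_incomplete_draft", 97.0, 5),
        GenomeRecord("toy_fragmented_draft", 99.5, 45),
    ]
    for g in filter_high_quality(toy):
        print(g.name)
```

Both comparisons are strict, matching the ">" and "<" signs in the text, so a genome at exactly 99% completeness or with exactly 20 scaffolds would be excluded.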

Genome-streamlining in pelagic strains
The genomes of pelagic freshwater 'Ca. Methylopumilus sp.' (n=39) [...] (Table S1). Moreover, we observed a negative relationship between genomic GC content and stop-codon usage of TAA instead of TAG, as well as a preferred amino acid usage of lysine instead of arginine (Fig. 2), both suggested to be linked to nitrogen limitation [4]. Furthermore, amino acids with fewer nitrogen and sulphur atoms and more carbon atoms were preferentially encoded by the genome-streamlined microbes (Fig. 2, S5, S6).
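
The streamlining signatures described above (genomic GC content, TAA versus TAG stop-codon usage, lysine versus arginine usage, and the nitrogen content of encoded amino acids) can be computed per genome with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names are hypothetical, the standard bacterial stop codons (TAA, TAG, TGA) are assumed, and the side-chain nitrogen counts are standard chemistry rather than values from this study. It is not the pipeline used to produce Fig. 2.

```python
from collections import Counter

# Nitrogen atoms in each amino-acid side chain (standard chemistry);
# residues not listed carry no side-chain nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
STOP_CODONS = ("TAA", "TAG", "TGA")


def gc_content(dna: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    counts = Counter(dna.upper())
    acgt = sum(counts[b] for b in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


def stop_codon_fractions(cds_list):
    """Share of each stop codon among coding sequences ending in a stop."""
    stops = Counter(cds[-3:].upper() for cds in cds_list if len(cds) >= 3)
    total = sum(stops[c] for c in STOP_CODONS)
    return {c: stops[c] / total for c in STOP_CODONS} if total else {}


def lys_arg_ratio(proteins) -> float:
    """Ratio of lysine (K) to arginine (R) residues across all proteins."""
    counts = Counter("".join(proteins).upper())
    return counts["K"] / counts["R"] if counts["R"] else float("inf")


def mean_side_chain_n(proteins) -> float:
    """Average number of side-chain nitrogen atoms per residue."""
    residues = "".join(proteins).upper()
    if not residues:
        return 0.0
    return sum(SIDE_CHAIN_N.get(aa, 0) for aa in residues) / len(residues)
```

Lysine carries one side-chain nitrogen and arginine three, so a shift from arginine to lysine lowers the nitrogen cost per residue, which is the rationale behind reading the lysine/arginine ratio as a nitrogen-saving signature.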

Adaptive gene loss during habitat transition from the sediment to the pelagial
The core genome of the family Methylophilaceae consists of 664 protein families (4.3% [...] and also not very abundant in lake sediments (Fig. 1b, Table S2). The most pronounced differences in the genetic make-up of sediment vs. pelagic strains were detected in motility and chemotaxis (Figs. 3, 4), with all Methylophilus and all but two Methylotenera strains having flagella and type IV pili, while the planktonic strains have lost motility and have also greatly reduced the number of two-component regulatory systems and sigma factors. A large number of membrane transporters for inorganic compounds was detected exclusively in sediment Methylophilaceae, while this number is reduced in 'Ca. M. turicensis' and even more in 'Ca. Methylopumilus' and OM43 (Fig. 4 [...] Table S4). Ammonia is the main microbial nitrogen source in the epilimnion of lakes and oceans, while nitrate and other compounds like urea, taurine, or cyanate are more abundant in deeper, oxygenated layers and the sediment [14,73,74]. Therefore, an adaptation to ammonium uptake might be advantageous for pelagic microbes.
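
To make the notion of a core genome used at the start of this subsection concrete, the sketch below derives the core (protein families present in every genome) and the pan-genome (families present in at least one genome) from per-genome sets of family identifiers. The function name, the genome labels, and the family names are hypothetical toy inputs; the clustering method behind the 664 core families is not described in this section and is not reproduced here.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genome_families: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) protein-family sets for a collection of genomes.

    core: families found in every genome (set intersection)
    pan:  families found in at least one genome (set union)
    """
    family_sets = list(genome_families.values())
    if not family_sets:
        return set(), set()
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


if __name__ == "__main__":
    # Toy presence data loosely themed on the text, not real annotations.
    toy = {
        "sediment_strain": {"xoxF", "rpoB", "flagellin", "cheA"},
        "pelagic_freshwater_strain": {"xoxF", "rpoB", "rhodopsin"},
        "marine_strain": {"xoxF", "rpoB", "putP"},
    }
    core, pan = core_and_pan(toy)
    print(sorted(core), len(pan), len(core) / len(pan))
```

Because the core is an intersection, it can only shrink as more genomes are added, whereas the pan-genome can only grow, which is why the core makes up a small share of all families in a family that spans sediment and pelagic lineages.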

Furthermore, a high diversity of pathways involved in sulfur metabolism was detected in Methylophilaceae, with the genome-streamlined strains again representing the most reduced forms. All Methylophilus, Methylotenera, 'Ca. Methylosemipumilus turicensis' and one strain of 'Ca. Methylopumilus rimovensis' encode ABC transporters for sulfate uptake, and a sulfate permease was annotated in OM43 and several sediment Methylophilaceae, while the majority of 'Ca. Methylopumilus' lack these transporters (Fig. 3, 4). Canonical assimilatory sulfate reduction seems to be incomplete in most Methylophilaceae, as the adenylyl sulfate kinase cysC was annotated only in a few strains (Table S4). Thus, the mode of sulfite generation remains unclear, with unknown APS kinases or other links from APS to sulfite. Methylophilus rhizosphaera encodes genes for dissimilatory sulfate reduction, and most sediment strains possess ABC transporters for alkanesulfonates, most likely transporting methylsulfonate, which can be oxidised to sulfite by methanesulfonate monooxygenases, generating formaldehyde as a by-product. Dimethyl sulfide (DMS) seems to be a source of sulfur and formaldehyde as well, as dimethylsulfoxide and dimethyl monooxygenases are present in several sediment Methylophilaceae, but absent in all pelagic strains. It is thus still unclear how 'Ca. [...] Table S4). However, putative cobalamin transporters were annotated in all isolates.

The methylcitric acid (MCA) cycle for oxidising propionate via methylcitrate to pyruvate is present in Methylotenera, Methylophilus, 'Ca. M. turicensis', and the marine OM43, but absent in all 'Ca. Methylopumilus' strains, suggesting it has been selectively lost in these organisms. All genes were arranged in a highly conserved fashion, with the exception of 'Ca. M. [...]

Genome-streamlining leading to a loss of redundant methylotrophic pathways
Some of the sediment dwellers seem to be facultative methylotrophs, as ABC transporters for amino acids were annotated (Figs. 3, 4, Table S3). Methylotenera versatilis 301 additionally encodes a fructose-specific phosphotransferase system (PTS) and a 1-phosphofructokinase, as well as transporters for putrescine uptake and the subsequent pathway for its degradation. 'Ca. M. turicensis' might also be a facultative methylotroph, as it possesses a PTS for cellobiose, while this (as well as amino acid transporters) is lacking in all 'Ca. Methylopumilus' and OM43 strains, making them obligate methylotrophs. These observations suggest that the ancestor of both pelagic and sediment lineages was also a facultative methylotroph and that obligate methylotrophy emerged only in the truly pelagic strains.
Remarkably, pathways involved in methylotrophy were also reduced in the course of genome streamlining, with the sediment-dwelling Methylophilus and Methylotenera having the most complete modules for C₁ compound oxidation, demethylation and assimilation (Fig. 3, 4, Table S4). They also encode multiple types of methanol dehydrogenases (up to five different types in single strains), while the pelagic forms possess only XoxF4-1 (Fig. S9). Moreover, the latter encode neither traditional methylamine dehydrogenases nor the N-methylglutamate (NMG) pathway for methylamine oxidation. Thus, the mode of methylamine uptake is still unclear, although it has been experimentally demonstrated that some pelagic strains can utilize this C₁ substrate [14,75]. However, nearly half of the sediment strains also lack these well-described pathways, in a patchy manner only partly reflected by phylogeny; it is therefore likely that methylamine utilization is not a common feature within Methylophilaceae, or that alternative routes of its oxidation still await discovery [69]. Formaldehyde oxidation can be achieved via three alternative routes, and only four Methylophilus strains encode all of them, i.e., all others lack a formaldehyde dehydrogenase. All Methylophilus and Methylotenera as well as 'Ca. M. turicensis' carry genes for the tetrahydromethanopterin (H₄MPT) pathway, but none of the 'Ca. Methylopumilus' and OM43 strains do. Therefore, the only route for formaldehyde oxidation in these genome-streamlined microbes is the tetrahydrofolate (H₄F) pathway, which includes the spontaneous reaction of formaldehyde with H₄F and is thought to be relatively slow [14,17]. The ribulose monophosphate (RuMP) cycle for formaldehyde assimilation/oxidation and formate oxidation via formate dehydrogenases was annotated in all Methylophilaceae, while none of them possess other potential methylotrophic modules such as the serine cycle, the ethylmalonyl-CoA pathway for glyoxylate regeneration, a glyoxylate shunt, or the Calvin-Benson-Bassham cycle for CO₂ assimilation, as previously noted to be lacking in Methylophilaceae [17]. Thus, the core methylotrophic modules in Methylophilaceae contain methanol oxidation via XoxF methanol dehydrogenases, formaldehyde oxidation via the H₄F pathway, the RuMP cycle, and formate oxidation (Fig. 3, Table S4) [17,69,76]. The majority of genes encoding these pathways were organized in operon structures or found in close vicinity to each other with high synteny, and their phylogeny reflects the overall phylogeny of the family (Fig. S10, S11).

Photoheterotrophy as adaptation to oligotrophic pelagic conditions
Rhodopsins are light-driven proton pumps producing ATP that fuel e.g., membrane [...] proteorhodopsin was never present in the marine lineage or was lost subsequently, and if so, the reasons for a secondary loss remain enigmatic, as two rhodopsins would provide an even better adaptation to oligotrophic waters than one.

The second transition from freshwater pelagial to the marine realm is characterized by adaptations to a salty environment
The second habitat transition across the freshwater-marine boundary does not appear to involve genome streamlining, as genomes of pelagic freshwater and marine methylotrophs are of similar small size and low GC content (Figs. 1, 2). We hypothesize that this transition had less impact on the lifestyle (purely planktonic, oligotrophic) but required specific adaptations to the marine realm that were mainly acquired by HGT and, as suggested by the long branches in the phylogenetic tree (Fig. 1), by multiple, rapid changes in existing genes. [...] family. Two marine OM43 strains (KB13 and MBRS-H7) encode this pathway followed by the sodium:proline symporter putP, arranged in high synteny and protein similarity with marine/hypersaline sediment microbes; thus it is likely that both components were gained via HGT (Fig. S14). A second copy of the putP symporter was common to all Methylophilaceae (data not shown). A dipeptide/tripeptide permease (DtpD) unique to the marine OM43 lineage also seems to have been transferred horizontally, either from marine Bacteroidetes or from sediment-dwelling Sulfurifustis (Gammaproteobacteria, Fig. S15). Other putative membrane compounds involved in sodium transport in marine OM43 include a sodium:alanine symporter (AlsT, Fig. S16a), a sodium:acetate symporter (ActP, Fig. S16b), a sodium:dicarboxylate symporter (GltT, [...] antiporter (NhaE-like, Fig. S16e). Although several other Methylophilaceae also carry some of these sodium transporters, they are only distantly related to those of OM43; thus they might have been acquired horizontally. Conversely, ActP and GltT of OM43 are most closely related to three 'Ca. M. universalis' strains and the two 'Ca. M. rimovensis' strains, respectively (Fig. S16b, S16c). Both symporters are related to microbes from freshwater and marine habitats, hinting at some yet unknown lineages related to both OM43 and 'Ca. Methylopumilus', most likely thriving in the freshwater-marine transition zone.

Conclusions
Our study provides the first genomic evidence that the ancestors of genome-streamlined pelagic Methylophilaceae can be traced back to sediments, with two habitat transitions occurring in the evolutionary history of the family. The first, from sediments to the pelagial, is characterized by pronounced genome reduction driven by selection pressure under relatively more oligotrophic environmental conditions. This adaptive gene loss has mainly affected functions that (i) are not necessarily required in the pelagial (e.g., motility, chemotaxis), (ii) are not advantageous for survival in an oligotrophic habitat (e.g., low substrate affinity transporters), and (iii) are encoded in redundant pathways (e.g., formaldehyde oxidation). Likewise, (iv) genes providing adaptations to oligotrophic conditions have been transmitted horizontally from indigenous pelagic microbes (e.g., rhodopsins). The second habitat transition across the freshwater-marine boundary did not result in further genome-streamlining, but is [...]

[...] vs. marine OM43. For details on pathways see Table S4.