Borgs are giant genetic elements with potential to expand metabolic capacity

Al-Shayeb, Basem; Schoelmerich, Marie C.; West-Roberts, Jacob; Valentin-Alvarado, Luis E.; Sachdeva, Rohan; Mullen, Susan; Crits-Christoph, Alexander; Wilkins, Michael J.; Williams, Kenneth H.; Doudna, Jennifer A.; Banfield, Jillian F.

doi:10.1038/s41586-022-05256-1

Article
Open access
Published: 19 October 2022

Basem Al-Shayeb^1,2,
Marie C. Schoelmerich¹,
Jacob West-Roberts³,
Luis E. Valentin-Alvarado ORCID: orcid.org/0000-0001-7988-8556^1,2,
Rohan Sachdeva^1,4,
Susan Mullen⁴,
Alexander Crits-Christoph^1,2,
Michael J. Wilkins ORCID: orcid.org/0000-0002-3595-0853⁵,
Kenneth H. Williams^6,7,
Jennifer A. Doudna ORCID: orcid.org/0000-0001-9161-999X^1,8 &
Jillian F. Banfield ORCID: orcid.org/0000-0001-8203-8771^1,3,4,6,9

Nature volume 610, pages 731–736 (2022)

Abstract

Anaerobic methane oxidation exerts a key control on greenhouse gas emissions¹, yet factors that modulate the activity of microorganisms performing this function remain poorly understood. Here we discovered extraordinarily large, diverse DNA sequences that primarily encode hypothetical proteins through studying groundwater, sediments and wetland soil where methane production and oxidation occur. Four curated, complete genomes are linear, up to approximately 1 Mb in length and share genome organization, including replichore structure, long inverted terminal repeats and genome-wide unique perfect tandem direct repeats that are intergenic or generate amino acid repeats. We infer that these are highly divergent archaeal extrachromosomal elements with a distinct evolutionary origin. Gene sequence similarity, phylogeny and local divergence of sequence composition indicate that many of their genes were assimilated from methane-oxidizing Methanoperedens archaea. We refer to these elements as ‘Borgs’. We identified at least 19 different Borg types coexisting with Methanoperedens spp. in four distinct ecosystems. Borgs provide methane-oxidizing Methanoperedens archaea access to genes encoding proteins involved in redox reactions and energy conservation (for example, clusters of multihaem cytochromes and methyl coenzyme M reductase). These data suggest that Borgs might have previously unrecognized roles in the metabolism of this group of archaea, which are known to modulate greenhouse gas emissions, but further studies are now needed to establish their functional relevance.

Main

Of all of the biogeochemical cycles on Earth, the methane cycle may be most tightly linked to climate. Methane (CH₄) is a greenhouse gas roughly 30 times more potent than carbon dioxide (CO₂), and approximately 1 gigatonne is produced annually by methanogenic (methane-producing) archaea that inhabit anoxic environments². The efflux of methane into the atmosphere is mitigated by methane-oxidizing microorganisms (methanotrophs). In oxic environments, CH₄ is consumed by aerobic bacteria that use methane monooxygenase (MMO) and O₂ as a terminal electron acceptor³, whereas in anoxic environments, anaerobic methanotrophic archaea (ANME) use a reverse methanogenesis pathway to oxidize CH₄, the key enzyme of which is methyl-CoM reductase (MCR)^4,5. Some ANMEs rely on a syntrophic partner to couple CH₄ oxidation to the reduction of terminal electron acceptors, yet Methanoperedens (ANME-2d, phylum Euryarchaeota) can directly couple CH₄ oxidation to the reduction of iron, nitrate or manganese^6,7. Some phenomena have been suggested to modulate rates of methane oxidation. For example, some phages can decrease rates of methane oxidation by infection and lysis of methane-oxidizing bacteria⁸, and others with the critical subunit of MMO⁹ probably increase the ability of their host bacteria to conserve energy during phage replication. Here we report the discovery of novel extrachromosomal elements (ECEs) that are inferred to replicate within Methanoperedens spp. Their numerous and diverse metabolism-relevant genes, huge size and distinctive genomic architecture distinguish these archaeal ECEs from all previously reported elements associated with archaea^10,11,12 and from bacteriophages, which typically have one or a few biogeochemically relevant genes^13,14. We hypothesize that these novel ECEs may substantially impact the capacity of Methanoperedens spp. to oxidize methane.

Genome structure and features

By analysis of whole-community metagenomic data from wetland soils in California, USA (Extended Data Fig. 1), we discovered enigmatic genetic elements, the genomes for three of which were carefully manually curated to completion (Methods). From sediment samples from the Rifle, Colorado aquifer¹⁵, we recovered partial genomes from a single population related to those from the wetland soils; the sequences were combined and manually curated to ultimately yield a fourth complete genome (Methods). All four curated genomes are linear and terminated by inverted repeats of more than 1 kb. The genome sizes range from 661,708 to 918,293 bp (Fig. 1a, Extended Data Table 1 and Supplementary Table 1). Prominent features of all genomes are 25–54 regions composed of perfect tandem direct repeats (Fig. 1b and Supplementary Table 2) that are novel (Extended Data Fig. 2) and occur both in intergenic regions and in genes where they usually introduce perfect amino acid repeats (Supplementary Table 2). All genomes have two replichores of unequal lengths and initiate replication at the chromosome ends (Extended Data Fig. 3). Each replichore carries essentially all genes on one strand (Fig. 1a). Although the majority of genes are novel, approximately 21% of the predicted proteins have best matches to proteins of Archaea (Extended Data Fig. 4a), and the largest group of these have best matches to proteins of Methanoperedens spp. (Extended Data Fig. 4b). Of note, the GC contents of the four genomes are approximately 10% lower than those of previously reported and coexisting Methanoperedens species (Fig. 2a). We rule out the possibility that these sequences represent genomes of novel Archaea, as they lack almost all of the single-copy genes found in archaeal genomes and sets of ribosomal proteins that are present even in obligate symbionts (Extended Data Figs. 5 and 6a and Supplementary Tables 3–6). There are no additional sequences in the datasets that could comprise additional portions of these genomes. Thus, they are clearly neither part of Methanoperedens spp. genomes nor parts of the genomes of other archaea.

**Fig. 1: Borgs share overall genomic features.**

**Fig. 2: Borg and *Methanoperedens* spp. genomic features and abundance patterns.**

Abundances of Methanoperedens spp. and some ECEs are tightly correlated over a set of 46 different wetland soil samples (43 genomes were included in the analysis; Extended Data Fig. 6b). This observation supports other indications that these ECEs associate with Methanoperedens and suggests that specific ECEs have distinct Methanoperedens spp. hosts (Fig. 2b). This is true for one ECE whose abundances correlate reasonably well with a specific host group, in which ECE to Methanoperedens spp. abundance ratios range from 2:1 to 8:1. Given their up to approximately 1-Mb length, there may be more ECE DNA in some host cells than host DNA. The Borg sequences are much more abundant in deep, anoxic soil samples (Extended Data Fig. 7a,b).

A few percent of the genes in the genomes have locally elevated GC contents that approach, and in some cases match, those of coexisting Methanoperedens spp. (Fig. 1b). This, and the very high similarity of some protein sequences to those of Methanoperedens spp., indicates that these genes were acquired by lateral gene transfer from Methanoperedens spp. Other genes with best matches to Methanoperedens spp. genes have lower GC contents (closer to those of these ECEs at approximately 33%), suggesting that their DNA composition has partly or completely ameliorated since acquisition¹⁶.

Archaeal ECEs include viruses¹⁷, plasmids¹⁸ and minichromosomes, sometimes also referred to as megaplasmids^10,11,12. The genomes reported here are much larger than those of all known archaeal viruses, some of which have small, linear genomes¹², and at least three are larger than any known bacteriophage¹⁹. These linear elements are larger than all of the reported circular plasmids that affiliate with halophiles, methanogens and archaeal thermophiles. We did not detect genes for plasmid partitioning or conjugative systems, rRNA loci or encoded viral proteins (Supplementary Table 3), and the genomes were markedly different from recently reported Methanoperedens spp. plasmids²⁰. The distinctly lower GC content and variable copy number argue against their classification as archaeal minichromosomes^12,21. Thus, we cannot confidently classify the ECEs as viruses, plasmids or minichromosomes. Moreover, the protein family profiles are quite distinct from those of archaeal and bacterial ECEs (Fig. 2d and Extended Data Fig. 5). Some bacterial megaplasmids have been reported to be very large and linear, but they typically encode few or no essential genes²², and if they contain repeats, they are interspaced (that is, not tandem)²³. Each distinctive feature of the ECEs has been reported in microbial genomes, plasmids or viruses, but the combination of these features in these huge ECEs is unique. Thus, we conclude that the genomes represent novel archaeal ECEs that occur in association with, but not as part of, Methanoperedens spp. genomes. We refer to these as Borgs, a name that reflects their propensity to assimilate genes from organisms, most notably Methanoperedens spp.

Using criteria based on the features of the four complete Borgs, we searched for additional Borgs in our metagenomic datasets from a wide diversity of environment types. From the wetland soil, we constructed bins for 11 additional Borgs, some of which exceed 1 Mb in length (Extended Data Table 1 and Supplementary Table 1). Other Borgs were sampled from the Rifle, Colorado aquifer, discharge from an abandoned Corona mercury mine in Napa County, California, and from shallow riverbed pore fluids in the East River, Colorado. In total, we recovered genome bins for 19 different Borgs, each of which was assigned a colour-based name. We found no Borgs in some samples, despite the presence of Methanoperedens spp. at very high abundance levels (Extended Data Fig. 7). Thus, it appears that these ECEs do not associate with all Methanoperedens spp.

Pairs of the four complete Borg genomes (Purple, Black, Sky and Lilac) and three fragments of the Orange Borg are alignable over much of their lengths (Fig. 1a). The Rose and Sky Borg genomes are also largely syntenous (Extended Data Fig. 8a) and were reconstructed from different samples that contain these Borgs at very different levels of abundance. Despite sharing less than 50% average nucleotide identity across most of their genomes, the Rose and Sky Borgs have multiple regions that share 100% nucleotide identity, one of which is approximately 11 kb in length (Extended Data Fig. 8b,c). This suggests that these two Borgs recombined, indicating that they recently coexisted within the same host cell.

Borg gene inventories

Many Borg genomes encode mobile element defence systems, including RNA-targeting type III-A CRISPR–Cas systems that lack spacer acquisition machinery, a feature previously noted in huge bacterial viruses¹⁹. An Orange Borg CRISPR spacer targets a gene in a mobile region in a coexisting Methanoperedens spp. (Extended Data Fig. 8d), further supporting the conclusion that Methanoperedens spp. are the Borg hosts.

The four complete genomes and almost all of the near-complete and partial genomes encode ribosomal protein L11 (rpL11), and some have one or two other ribosomal proteins (Extended Data Fig. 6a). The rpL11 protein sequences form a group that is placed phylogenetically as a sibling to those of Methanoperedens spp. (Extended Data Fig. 9), further reinforcing the link between Borgs and Methanoperedens spp. Four additional rpL11 sequences, identified on short contigs from the wetland, group with the Borg sequences and probably represent additional Borgs (Supplementary Table 1). The topology of the rpL11 tree, and similar topologies observed for phylogenetic trees constructed using other ribosomal proteins, MCR proteins, electron transfer flavoproteins and aconitase, may indicate the presence of translation-related genes in the Borg ancestor (Extended Data Fig. 9 and Supplementary Information).

The most highly represented Borg genes encode glycosyltransferases and proteins involved in DNA and RNA manipulation, transport, energy and the cell surface (PEGA and S-layer proteins). Also prevalent are many genes encoding membrane-associated proteins of unknown function that may impact the membrane profile of their host (Fig. 2c). At least seven Borgs carry a nifHDK operon for nitrogen fixation, also predicted in Methanoperedens spp. genomes, and may augment the influence of the host on nitrogen cycling (Fig. 1b, Supplementary Information and Supplementary Table 6). Potentially related to survival under resource limitation are genes in at least ten Borg genomes for synthesis of the carbon storage compound polyhydroxyalkanoate (PHA), a capacity also predicted for Methanoperedens spp.²⁴. Other stress-related genes encode tellurium resistance proteins that do not occur in Methanoperedens spp. genomes (Supplementary Table 5). All Borgs carry large FtsZ-tubulin homologues that may be involved in cell division, and TEP1-like TROVE domain proteins that also do not occur in Methanoperedens spp. genomes (Supplementary Table 5). The TROVE domain proteins may form a complex similar to Telomerase, Ro or Vault ribonucleoproteins, although their function remains unclear²⁵. Several Borgs encode two genes of the tricarboxylic acid cycle (citrate synthase and aconitase; Supplementary Information).

Many Borg genes are predicted to have roles in redox and respiratory reactions. The Black Borg encodes cfbB and cfbC, genes involved in the biosynthesis of F430, which is the cofactor for MCR, the central enzyme involved in methane oxidation by Methanoperedens spp. The similarity of the GC content and protein sequences of Borg cfbB and cfbC to those of coexisting Methanoperedens spp. suggests that these genes were acquired from Methanoperedens spp. recently. The Blue and Olive Borgs encode cofE (encoding coenzyme F420:L-glutamate ligase), which is involved in the biosynthesis of a precursor for F420. The Blue and Pink Borgs have an electron bifurcating complex (Supplementary Information) that includes d-lactate dehydrogenase. Eight Borgs encode genes for biosynthesis of tetrahydromethanopterin, a coenzyme used in methanogenesis, and ferredoxin proteins, which may serve as electron carriers. The Green and Sky Borgs also encode 5,6,7,8-tetrahydromethanopterin hydro-lyase (Fae), an enzyme responsible for formaldehyde detoxification and involved in pentose-phosphate synthesis. Also identified were genes encoding carbon monoxide dehydrogenase (CODH), plastocyanin, cupredoxins and many multihaem cytochromes (MHCs). These results indicate substantial Borg potential to augment the energy conservation by Methanoperedens spp. This is especially apparent for the Lilac Borg.

Host-relevant gene inventory of Lilac Borg

We analysed the genes of the complete Lilac Borg genome in detail as, unlike the other Borgs, the Lilac Borg co-occurs with a single group of Methanoperedens spp. that probably represent the host (Fig. 3 and Supplementary Table 7). Remarkably, this Borg genome encodes an MCR complex, which is central to methanogenesis and reverse methanogenesis. The mcrBDGA cluster shares high (75–88%) amino acid sequence identity with that of the coexisting Methanoperedens spp. genome. This complex is also encoded by a fragment of the Steel Borg. For both the Lilac and the Steel Borgs, the GC content of the region encoding this operon is elevated relative to the average Borg values. Methanoperedens spp. pass electrons from methane oxidation to terminal electron acceptors (Fe³⁺, NO₃⁻ or Mn⁴⁺) via MHCs^26,27,28. The Lilac Borg genome encodes 16 MHCs with up to 32 haem-binding motifs within one protein. By analogy with experiments showing that cyanophages with a photosystem gene increase host fitness, we suggest that MHC genes may increase the capacity of Methanoperedens spp. to oxidize methane^9,29. However, this needs to be tested experimentally. Membrane-bound and extracellular MHC may diversify the range of Methanoperedens spp. extracellular electron acceptors.

**Fig. 3: Cell cartoon illustrating capacities inferred to be provided to *Methanoperedens* spp. by the coexisting Lilac Borg.**

The Lilac Borg encodes a functional NiFe CODH, but this is fragmented in some genomes. Other genes for the acetyl-CoA decarbonylase–synthase complex are present only in Methanoperedens spp. The CODH is located in proximity to a cytochrome b and cytochrome c, so electrons from CO oxidation could be passed to an extracellular terminal acceptor such as Fe³⁺ in an energetically downhill reaction. This would allow the removal of toxic CO and may contribute to the formation of a proton gradient that can be harnessed for energy conservation.

The Lilac Borg has a gene resembling the γ-subunit of ethylbenzene dehydrogenase (EBDH), which is involved in transferring electrons liberated from the hydroxylation of ethylbenzene and propylbenzene³⁰. This EBDH-like protein is located extracellularly, and given haem-binding and cohesin domains, it may be involved in electron transfer and attachment.

Although the Lilac Borg lacks genes for a nitrate reductase, it encodes a probable hydroxylamine reductase (Hcp) that may scavenge toxic NO and hydroxylamine byproducts of Methanoperedens spp. nitrate metabolism. As the hcp gene was not identified in coexisting Methanoperedens spp., the Borg gene may protect Methanoperedens spp. from nitrosative stress. Proteins such as H₂O₂-forming NADH oxidase (Nox) and superoxide dismutase (SOD) may protect against reactive oxygen species. An alkylhydroperoxidase, two probable disulfide reductases and a bacterioferritin all may detoxify the H₂O₂ byproduct of Nox and SOD. The Lilac Borg also encodes genes that probably augment osmotic stress tolerance. This Borg, but not Methanoperedens spp., provides genes to make N^ε-acetyl-β-lysine as an osmolyte. An aspartate aminotransferase links the tricarboxylic acid cycle and amino acid synthesis, producing glutamate that can be used for the production of the osmolyte β-glutamate. More importantly, perhaps, it has recently been established that a bacterial homologue of this single enzyme can produce methane from methylamine³¹, raising the possibility of methane cycling within the Borg–Methanoperedens spp. system.

The Lilac Borg has three large clusters of genes. The first may be involved in cell wall modification, as it encodes large membrane-integral proteins with up to 17 transmembrane domains, proteins for polysaccharide synthesis, glycosyltransferases and probably carbohydrate-active proteins. The second contains key metabolic valves that connect gluconeogenesis with mannose metabolism for the production of glycans. One gene, encoding fructose 1,6-bisphosphatase (FBP), was not identified in the Methanoperedens spp. genomes and may regulate carbon flow from gluconeogenesis to mannose metabolism. In between these clusters are 12 genes with PEGA domains with similarity to S-layer proteins. Cell-surface proteins, along with these PEGA proteins, account for approximately 13% of all Lilac Borg genes. We conclude that functionalities related to cell wall architecture and modification are key to the effect of these ECEs on their host, perhaps triggering cell wall modification for adaptation to changing environmental conditions (Fig. 3).

Conclusions

Borgs are enigmatic ECEs that can approach (and probably exceed) 1 Mb in length (Extended Data Table 1). We can neither prove that they are archaeal viruses or plasmids or minichromosomes, nor prove that they are not. Although they may ultimately be classified as megaplasmids, they are clearly different from anything that has been previously reported. It is fascinating to ponder their possible evolutionary origins. Borg homologous recombination may indicate movement among hosts, thus their possible roles as gene transfer agents. It has been noted that Methanoperedens spp. have been particularly open to gene acquisition from diverse bacteria and archaea⁶, and Borgs may have contributed to this. The existence of Borgs encoding MCR demonstrates for the first time (to our knowledge) that MCR and MCR-like proteins for metabolism of methane and short-chain hydrocarbons can exist on ECEs and thus could potentially be dispersed across lineages, as is inferred to have occurred several times over the course of archaeal evolution^17,32. Borgs carry numerous metabolic genes, some of which produce variants of Methanoperedens spp. proteins that could have distinct biophysical and biochemical properties. Assuming that these genes either augment Methanoperedens spp. energy metabolism or extend the conditions under which they can function, Borgs may have far-reaching biogeochemical consequences, with important and unanticipated climate implications. Confirmation that Borgs impact the rate of oxidation of methane by Methanoperedens and extend the conditions under which these archaea can function will require experimental evidence. This could be pursued by establishing cultures that include Methanoperedens with and without Borgs and comparison of the methane oxidation rates, with testing performed under a range of geochemical conditions.

Methods

Sampling and creation of metagenomic datasets

We analysed sequences from sediments of an aquifer in Rifle, Colorado, USA, that were retrieved from cores from depths of 5 m and 6 m below the surface¹⁵ in July 2011, and cell concentrates from pumped groundwater from the same aquifer collected at a time of elevated O₂ concentration in May 2013. Discharge from the Corona Mine, Napa County, California, USA, was sampled in December 2019. Shallow pore water was collected from the riverbed at the East River, Crested Butte, Colorado, in August 2016. Soil was sampled from depth intervals between 1 cm and 1 m from a permanently moist wetland located in Lake County, California. Wetland soils were sampled in late October and early November 2017, 2018 and 2019. DNA was extracted from each sample (DNeasy PowerSoil Pro) and submitted for Illumina sequencing (150-bp or 250-bp reads) at the QB3 facility, University of California, Berkeley. Reads were adapter and quality trimmed using BBduk³³ and sickle³⁴. Filtered reads were assembled using IDBA-UD³⁵ and MEGAHIT, gene predictions were established using Prodigal³⁶, and USEARCH³⁷ was used for initial annotations^34,35,37,38. Functional predictions and predictions of tRNAs followed previously reported methods¹⁹.

Genome identification, binning and curation

Hundreds of kilobases of de novo-assembled sequences were identified to be of interest as potential novel ECEs, first based on their taxonomic profile. The taxonomic profiles were determined through a voting scheme in which the taxonomy is assigned at the species to domain level (Bacteria, Archaea, Eukaryotes and no domain) by comparison with a sequence database (protein annotations in UniProt and ggKbase: https://ggkbase.berkeley.edu/) when the same taxonomic assignment received >50% votes. Assembled sequences selected for further analysis had no taxonomic profile, even at the domain level. The majority of contigs of interest had more genes with similarity to those of archaea of the genus Methanoperedens than to any other genus (see Extended Data Fig. 4). The second feature of interest was dominance by hypothetical proteins yet absence of genes that would indicate identification as phages, viruses or plasmids.
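The voting scheme can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `contig_taxonomy`, the input format (one best-hit label per predicted gene, with `None` for genes without a database hit) and the choice to keep genes without a hit in the denominator are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def contig_taxonomy(gene_labels: Sequence[Optional[str]],
                    threshold: float = 0.5) -> Optional[str]:
    """Return the label that wins more than `threshold` of the votes, else None.

    gene_labels holds one taxonomic label per predicted gene at a single rank
    (for example the domain); None marks a gene with no database hit.
    Assumption: genes without a hit stay in the denominator, so a contig that
    consists mostly of hypothetical proteins ends up with no profile.
    """
    total = len(gene_labels)
    if total == 0:
        return None
    votes = Counter(label for label in gene_labels if label is not None)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count / total > threshold else None


# Hypothetical contig: 21 of 100 genes hit Archaea, the rest have no hit.
labels = ["Archaea"] * 21 + [None] * 79
print(contig_taxonomy(labels))  # None (no domain-level profile)
```

Under these assumptions, a contig on which only a minority of genes match any one domain receives no profile, which is the property used to select candidate sequences.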

These initially identified large fragments were manually curated to remove scaffolding gaps and local assembly errors, to extend and join contigs with the same profile, GC and coverage, and then to extend the near-complete sequences fully into their long terminal repeats. The last step required reassignment of reads mapped at one end and at double depth to both ends. The fully extended sequences had no unplaced reads extending outwards, despite genome-wide deep coverage. Given this, and the absence of any fragments that could potentially be part of a larger genome, it was concluded that sequences represented linear genomes.

In more detail, our curation method involved mapping of reads to the de novo fragments and extension within gaps and at termini using previously unplaced reads that we added based on overlap or by the relocation of misplaced reads (these could often be identified based on improper paired read distances and/or wrong orientation). Local assembly errors were sought by visualization of the reads mapped throughout the assembly and identified based on imperfect read support, or where a subset of reads was partly discrepant and discrepancies involved sequences that were shared by tandem direct repeats of the same region (that is, the tandem direct repeat regions were collapsed during assembly). De novo-assembled sequences often ended in tandem direct repeat regions because repeats fragment assemblies. To resolve local assembly errors, gaps were inserted and reads relocated to generate the sequence required to fill the gaps. This ensured comprehensive essentially perfect agreement between reads and the final consensus sequence. In some cases, the tandem direct repeat regions had greater than the expected depth of mapped reads and no reads spanned the flanking unique sequences. In these cases, the repeat number was approximated to achieve the expected read depth, but some arrays may be larger than shown. GC skew and cumulative GC skew were calculated using iRep³⁹ for the fully manually curated complete genomes, and the patterns were used to identify the origins and terminus of replication. The pattern of use of coding strands for genes (predicted in Bacterial Code 11) was compared with these origin and terminus predictions to resolve genome organization. The curated sequences were searched for perfect repeats of lengths of 50 or more nucleotides using Repeat Finder in Geneious. 
When repeat sequences overlapped, the unit of direct repeat was identified and the length of that repeat, number of repeats, location (within gene versus intergenic) and genome position were tabulated. Once the features characteristic of the ECEs of interest had been determined, we sought related elements. Sequences of interest were identified based on (1) credible partial alignment with the complete sequences, (2) no domain-level profile, (3) GC content of 30–35%, (4) regions with three or more direct tandem repeats scattered throughout the genome fragment, and (5) more best hits to Methanoperedens spp. proteins than to proteins from any other organisms. If scaffolds met criterion (1) they were immediately classified as targets. If they met most or all of the other criteria and had similar coverage values, they were binned together with other scaffolds from the same sample with these features. Often, ends of some of the contigs in the same bin overlapped perfectly and could be joined, increasing confidence in the bin quality. Genome sequences were aligned to each other using Mauve⁴⁰. Where anomalously high (perfect) sequence identity suggestive of recent recombination was detected between Borgs, reads mapped to the region were visualized to verify that the assembly was correct (that is, not chimeric; also see information in the Extended Data).
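The GC skew calculation used to locate replication origins and termini can be sketched as follows. The study used iRep for this step; the code below is only a plain-Python illustration of the underlying quantity, (G − C)/(G + C) per window and its running sum. The window size, the function names and the toy sequence are assumptions.

```python
from itertools import accumulate
from typing import List, Tuple


def gc_skew(seq: str, window: int = 1000) -> List[float]:
    """GC skew (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_extremes(skews: List[float]) -> Tuple[List[float], int, int]:
    """Return the cumulative skew plus the window indices of its min and max."""
    cumulative = list(accumulate(skews))
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return cumulative, i_min, i_max


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "GGGCA" * 400 + "CCCGA" * 400
cumulative, i_min, i_max = cumulative_extremes(gc_skew(toy, window=500))
print(i_min, i_max)  # 7 3
```

In the toy example the cumulative skew rises through the G-rich half and falls through the C-rich half, so its maximum marks the switch between the two halves. In a real genome, such extremes are the candidate positions for the origin and terminus, which are then compared with the coding-strand pattern.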

Genome fragments were phylogenetically profiled to establish relatedness to sequences in public databases. Sequences were classified as having no detectable hit if the protein had no similar database sequence with an E value < 0.0001.

Correlation analyses

Reads from each sample were aligned to each Methanoperedens and Borg genome. Alignments were performed with bbmap⁴¹ using the following parameters: editfilter = 5, minid = 0.96, idfilter = 0.97, ambiguous = random. The number of reads aligning to each genome was then parsed into a matrix, and the correlation between abundance patterns for Methanoperedens and Borg genomes was calculated using the Pearson correlation metric as implemented in scipy⁴². Correlation between a Methanoperedens genome and a Borg genome was deemed significant if the Pearson correlation between the two genomes was higher than 0.92. The code used for this analysis is available through Zenodo (https://doi.org/10.5281/zenodo.6887003).
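A minimal sketch of the correlation step is given below. The published analysis used scipy and is archived on Zenodo; this illustration computes the Pearson coefficient in plain Python so that it has no dependencies. The genome names, the read counts and the helper `linked_pairs` are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length abundance vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def linked_pairs(hosts: Dict[str, Sequence[float]],
                 borgs: Dict[str, Sequence[float]],
                 cutoff: float = 0.92) -> List[Tuple[str, str, float]]:
    """Host/Borg pairs whose abundance patterns correlate above the cutoff."""
    pairs = []
    for host, host_counts in hosts.items():
        for borg, borg_counts in borgs.items():
            r = pearson(host_counts, borg_counts)
            if r > cutoff:
                pairs.append((host, borg, r))
    return pairs


# Hypothetical read counts across four samples (one value per sample).
hosts = {"host_a": [10, 20, 30, 40]}
borgs = {"borg_x": [25, 48, 77, 101], "borg_y": [50, 5, 40, 8]}
for host, borg, r in linked_pairs(hosts, borgs):
    print(host, borg, round(r, 3))
```

Only the `host_a` and `borg_x` pair passes the 0.92 cutoff in this toy matrix, because `borg_y` varies independently of the host.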

CRISPR–Cas analysis

Borg and Methanoperedens-encoded CRISPR repeats and spacers were identified using CRISPRDetect⁴³. The coding sequences from this study were searched against Cas gene sequences reported from previous studies⁴⁴ using hmmsearch with E < 1 × 10⁻⁵ to identify the full locus. Matches were checked using a combination of hmmscan and BLAST searches against the NCBI nr database and manually verified by identifying colocated CRISPR arrays and Cas genes. Spacers extracted from between repeats of the CRISPR locus were compared with sequence assemblies from the sites where Borgs were identified using BLASTN-short⁴⁵. Matches with an alignment length of more than 24 bp and one or fewer mismatches were retained, and targets were classified as bacteria, phage or other. For CRISPR arrays with such a match, the target sequence was searched further for additional spacer matches with three or fewer mismatches.
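The two-pass spacer filter can be sketched as follows, assuming the BLASTN hits have already been parsed into records. The `SpacerHit` fields and the function name are hypothetical, and applying the length filter to the relaxed second pass as well is an assumption, because the text does not say whether it applies there.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SpacerHit:
    array_id: str          # CRISPR array the spacer comes from
    spacer_id: str
    target_id: str         # contig that the spacer matches
    alignment_length: int  # in bp
    mismatches: int


def filter_spacer_hits(hits: Sequence[SpacerHit],
                       min_length: int = 25,
                       strict_mismatches: int = 1,
                       relaxed_mismatches: int = 3) -> List[SpacerHit]:
    """Keep strict hits, then rescue weaker hits from the same array and target."""
    # "More than 24 bp" is expressed here as a minimum length of 25 bp.
    long_enough = [h for h in hits if h.alignment_length >= min_length]
    strict = [h for h in long_enough if h.mismatches <= strict_mismatches]
    # Array/target combinations that already have at least one strict hit.
    linked = {(h.array_id, h.target_id) for h in strict}
    relaxed = [h for h in long_enough
               if strict_mismatches < h.mismatches <= relaxed_mismatches
               and (h.array_id, h.target_id) in linked]
    return strict + relaxed


hits = [
    SpacerHit("array1", "sp1", "contig_A", 30, 0),  # strict hit
    SpacerHit("array1", "sp2", "contig_A", 28, 3),  # rescued in second pass
    SpacerHit("array2", "sp1", "contig_B", 29, 2),  # no strict hit for this pair
    SpacerHit("array1", "sp3", "contig_A", 24, 0),  # too short
]
kept = filter_spacer_hits(hits)
print([h.spacer_id for h in kept])  # ['sp1', 'sp2']
```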

Protein and gene content analysis

After the identification and curation of Borg genomes and accumulation of USEARCH annotations for coding sequences, functional annotations were further assigned by searching against PFAM r32, KEGG and pVOG. Transmembrane regions in proteins were predicted with TMHMM. All Methanoperedens genomes and genome assemblies, as well as 1,153 archaeal viruses and ECEs, were downloaded from the NCBI RefSeq database. Open reading frames were predicted using Prodigal, and all proteins from Borg genomes and the reconstructed ECE database were clustered into protein families and compared across genomes as previously described¹⁹. In brief, the coding sequences were clustered into families using a two-step procedure. First, an all-versus-all sequence search was performed using an E value cut-off of 1 × 10⁻³, sensitivity of 7.5 and coverage of 0.5, and a sequence similarity network was built on the basis of the pairwise similarities and the greedy set cover algorithm to define protein subclusters. Second, the resulting subclusters were grouped into protein families using a comparison of hidden Markov models. For subfamilies with probability scores of at least 95% and coverage of at least 0.50, a similarity score (probability × coverage) was used as the weight of the input network in the final clustering using the Markov clustering algorithm, with 2.0 as the inflation parameter. These clusters were defined as the protein families.
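The edge weighting that feeds the final Markov clustering can be illustrated with a short sketch. It covers only the filtering and the similarity score (probability × coverage), not the clustering itself. The record type, the function name and the representation of probability as a fraction between 0 and 1 are assumptions.

```python
from typing import Iterable, List, NamedTuple, Tuple


class HmmComparison(NamedTuple):
    subfamily_a: str
    subfamily_b: str
    probability: float  # assumed to be a fraction (0.95 means 95%)
    coverage: float     # fraction of the profile that is aligned


def weighted_edges(comparisons: Iterable[HmmComparison],
                   min_probability: float = 0.95,
                   min_coverage: float = 0.50) -> List[Tuple[str, str, float]]:
    """Keep confident comparisons and weight each by probability * coverage."""
    edges = []
    for c in comparisons:
        if c.probability >= min_probability and c.coverage >= min_coverage:
            edges.append((c.subfamily_a, c.subfamily_b,
                          c.probability * c.coverage))
    return edges


# Hypothetical profile-profile comparison results.
comparisons = [
    HmmComparison("sf1", "sf2", 0.99, 0.80),  # kept
    HmmComparison("sf1", "sf3", 0.90, 0.95),  # probability too low
    HmmComparison("sf2", "sf4", 0.97, 0.40),  # coverage too low
]
for a, b, weight in weighted_edges(comparisons):
    print(a, b, round(weight, 3))
```

The resulting weighted edge list is the kind of input that a Markov clustering tool would take, with the inflation parameter set separately.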

Functional annotation

Genes of interest were further verified and compared using the conserved domain search in NCBI and InterproScan⁴⁶ to identify conserved motifs within the amino acid sequence. Multihaem cytochromes (MHCs) were identified based on three or more CxxCH motifs within one gene. The cellular localization of proteins was predicted with Psort (v3.0.3) using archaea as the organism type. Proteins were compared using blastp and aligned using MAFFT⁴⁷ v.7.407 to visualize homologous regions and check conserved amino acid residues that constitute the active site or are required for cofactor and ligand binding.
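The CxxCH criterion for multihaem cytochromes lends itself to a short illustration. The sketch below is an assumption about how such a screen could be written, not the authors' implementation; the function names and the toy sequence are hypothetical, and counting overlapping motifs (via a lookahead) is a choice made here that the text does not specify.

```python
import re

# C, any two residues, C, H. The lookahead lets overlapping motifs be counted,
# which is an assumption of this sketch.
CXXCH = re.compile(r"(?=C[A-Z]{2}CH)")


def count_cxxch(protein_sequence):
    """Count CxxCH haem-binding motifs in a one-letter amino acid sequence."""
    return len(CXXCH.findall(protein_sequence.upper()))


def is_multihaem_cytochrome(protein_sequence, min_motifs=3):
    """Apply the 'three or more CxxCH motifs within one gene' rule."""
    return count_cxxch(protein_sequence) >= min_motifs


# Toy sequence (not a real protein) carrying three motifs.
toy_protein = "MKCAACHGGCTTCHAACKKCH"
print(count_cxxch(toy_protein), is_multihaem_cytochrome(toy_protein))
```

A motif count alone cannot confirm haem binding, which is why the text pairs it with domain searches and alignments.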

Phylogenetic trees

For each gene, references were compiled by BLASTing the corresponding gene against the NCBI nr database, and the top 50 hits were clustered with CD-HIT at a 90% similarity threshold⁴⁸. The final set of genes was aligned using MAFFT v.7.407, and a phylogenetic tree was inferred using IQ-TREE v.1.6.6 with automatic model selection⁴⁹ and visualized using iTOL⁵⁰. Synteny plots were generated using Mauve⁵¹, and gene cluster diagrams were drawn with gggenes and Adobe Illustrator.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The Borg and Methanoperedens genomes and their proteins reported in this study are provided as Source Data (Supplementary Data), along with phylogenetic trees and alignments related to ribosomal protein analysis from Borgs and Methanoperedens. Genomes and reads can be accessed under NCBI BioProject accession PRJNA866293.

Code availability

The code used to perform the correlation analysis is available through Zenodo (https://doi.org/10.5281/zenodo.6887003). All other code is readily available at the cited sources.

References

Wallenius, A. J., Dalcin Martins, P., Slomp, C. P. & Jetten, M. S. M. Anthropogenic and environmental constraints on the microbial methane cycle in coastal sediments. Front. Microbiol. 12, 631621 (2021).
Thauer, R. K., Kaster, A.-K., Seedorf, H., Buckel, W. & Hedderich, R. Methanogenic archaea: ecologically relevant differences in energy conservation. Nat. Rev. Microbiol. 6, 579–591 (2008).
Hanson, R. S. & Hanson, T. E. Methanotrophic bacteria. Microbiol. Rev. 60, 439–471 (1996).
Boetius, A. et al. A marine microbial consortium apparently mediating anaerobic oxidation of methane. Nature 407, 623–626 (2000).
Hallam, S. J., Girguis, P. R., Preston, C. M., Richardson, P. M. & DeLong, E. F. Identification of methyl coenzyme M reductase A (mcrA) genes associated with methane-oxidizing archaea. Appl. Environ. Microbiol. 69, 5483–5491 (2003).
Leu, A. O. et al. Lateral gene transfer drives metabolic flexibility in the anaerobic methane-oxidizing archaeal family Methanoperedenaceae. mBio 11, e01325-20 (2020).
Ettwig, K. F. et al. Archaea catalyze iron-dependent anaerobic oxidation of methane. Proc. Natl Acad. Sci. USA 113, 12792–12796 (2016).
Lee, S. et al. Methane-derived carbon flow through host-virus trophic networks in soil. Preprint at bioRxiv https://doi.org/10.1101/2020.12.16.423115 (2021).
Chen, L.-X. et al. Large freshwater phages with the potential to augment aerobic methane oxidation. Nat. Microbiol. 5, 1504–1515 (2020).
Ng, W. V. et al. Snapshot of a large dynamic replicon in a halophilic archaeon: megaplasmid or minichromosome? Genome Res. 8, 1131–1141 (1998).
Ausiannikava, D. et al. Evolution of genome architecture in Archaea: spontaneous generation of a new chromosome in Haloferax volcanii. Mol. Biol. Evol. 35, 1855–1868 (2018).
Wang, H., Peng, N., Shah, S. A., Huang, L. & She, Q. Archaeal extrachromosomal genetic elements. Microbiol. Mol. Biol. Rev. 79, 117–152 (2015).
Lindell, D. et al. Transfer of photosynthesis genes to and from Prochlorococcus viruses. Proc. Natl Acad. Sci. USA 101, 11013–11018 (2004).
Anantharaman, K. et al. Sulfur oxidation genes in diverse deep-sea viruses. Science 344, 757–760 (2014).
Hug, L. A. et al. Aquifer environment selects for microbial species cohorts in sediment and groundwater. ISME J. 9, 1846–1856 (2015).
Lawrence, J. G. & Ochman, H. Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol. 44, 383–397 (1997).
Hua, Z.-S. et al. Insights into the ecological roles and evolution of methyl-coenzyme M reductase-containing hot spring Archaea. Nat. Commun. 10, 4574 (2019).
DasSarma, S., Capes, M. & DasSarma, P. in Microbial Megaplasmids (ed. Schwartz, E.) 3–30 (Springer Berlin Heidelberg, 2009).
Al-Shayeb, B. et al. Clades of huge phages from across Earth’s ecosystems. Nature 578, 425–431 (2020).
Schoelmerich, M. C. et al. A widespread group of large plasmids in methanotrophic Methanoperedens archaea. Preprint at bioRxiv https://doi.org/10.1101/2022.02.01.478723 (2022).
Hall, J. P. J., Botelho, J., Cazares, A. & Baltrus, D. A. What makes a megaplasmid? Phil. Trans. R. Soc. B 377, 20200472 (2022).
Medema, M. H. et al. The sequence of a 1.8-Mb bacterial linear plasmid reveals a rich evolutionary reservoir of secondary metabolic pathways. Genome Biol. Evol. 2, 212–224 (2010).
Wagenknecht, M. et al. Structural peculiarities of linear megaplasmid, pLMA1, from Micrococcus luteus interfere with pyrosequencing reads assembly. Biotechnol. Lett. 32, 1853–1862 (2010).
Liu, Z. et al. Domain-centric dissection and classification of prokaryotic poly(3-hydroxyalkanoate) synthases. Preprint at bioRxiv https://doi.org/10.1101/693432 (2019).
Berger, W., Steiner, E., Grusch, M., Elbling, L. & Micksche, M. Vaults and the major vault protein: novel roles in signal pathway regulation and immunity. Cell. Mol. Life Sci. 66, 43–61 (2009).
Cai, C. et al. A methanotrophic archaeon couples anaerobic oxidation of methane to Fe(III) reduction. ISME J. 12, 1929–1939 (2018).
McGlynn, S. E., Chadwick, G. L., Kempes, C. P. & Orphan, V. J. Single cell activity reveals direct electron transfer in methanotrophic consortia. Nature 526, 531–535 (2015).
Scheller, S., Yu, H., Chadwick, G. L., McGlynn, S. E. & Orphan, V. J. Artificial electron acceptors decouple archaeal methane oxidation from sulfate reduction. Science 351, 703–707 (2016).
Lindell, D., Jaffe, J. D., Johnson, Z. I., Church, G. M. & Chisholm, S. W. Photosynthesis genes in marine viruses yield proteins during host infection. Nature 438, 86–89 (2005).
Heider, J., Szaleniec, M., Sünwoldt, K. & Boll, M. Ethylbenzene dehydrogenase and related molybdenum enzymes involved in oxygen-independent alkyl chain hydroxylation. J. Mol. Microbiol. Biotechnol. 26, 45–62 (2016).
Wang, Q. et al. Aerobic bacterial methane synthesis. Proc. Natl Acad. Sci. USA 118, e2019229118 (2021).
Boyd, J. A. et al. Divergent methyl-coenzyme M reductase genes in a deep-subseafloor Archaeoglobi. ISME J. 13, 1269–1279 (2019).
Bushnell, B. BBTools software package. http://sourceforge.net/projects/bbmap (Source Forge, 2014).
Joshi, N. & Fass, J. N. Sickle: a sliding-window, adaptive, quality-based trimming tool for FastQ files. GitHub https://github.com/najoshi/sickle (2011).
Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Brown, C. T., Olm, M. R., Thomas, B. C. & Banfield, J. F. Measurement of bacterial replication rates in microbial communities. Nat. Biotechnol. 34, 1256–1263 (2016).
Darling, A. E., Mau, B. & Perna, N. T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5, e11147 (2010).
Bushnell, B. BBMap: A fast, accurate, splice-aware aligner. OSTI.gov https://www.osti.gov/biblio/1241166 (2014).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Biswas, A., Staals, R. H. J., Morales, S. E., Fineran, P. C. & Brown, C. M. CRISPRDetect: a flexible algorithm to define CRISPR arrays. BMC Genomics 17, 356 (2016).
Makarova, K. S. et al. Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 18, 67–83 (2020).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
McWilliam, H. et al. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 41, W597–W600 (2013).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Huang, Y., Niu, B., Gao, Y., Fu, L. & Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26, 680–682 (2010).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128 (2007).
Darling, A. C. E., Mau, B., Blattner, F. R. & Perna, N. T. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403 (2004).

Acknowledgements

We thank G. Tyson, D. Nayak, N. Baliga, J. Cate, S. Diamond, R. Hatzenpichler and A. Murat Eren for helpful discussion; E. Smith, who proposed the name ‘Borg’; L. Law and S. Lei for data management assistance; and Y. Amano for permission to mention Methanoperedens spp.-dominated metagenomic datasets in which we did not discover Borgs. This research was supported by an NSF Fellowship to B.A.-S., a DFG Fellowship to M.C.S., the US Department of Energy, Office of Science, Office of Biological and Environmental Research under award number DE-AC02-05CH11231, the Chan Zuckerberg Biohub and the Innovative Genomics Institute, UC Berkeley.

Author information

Authors and Affiliations

Innovative Genomics Institute, University of California, Berkeley, CA, USA
Basem Al-Shayeb, Marie C. Schoelmerich, Luis E. Valentin-Alvarado, Rohan Sachdeva, Alexander Crits-Christoph, Jennifer A. Doudna & Jillian F. Banfield
Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
Basem Al-Shayeb, Luis E. Valentin-Alvarado & Alexander Crits-Christoph
Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
Jacob West-Roberts & Jillian F. Banfield
Earth and Planetary Science, University of California, Berkeley, CA, USA
Rohan Sachdeva, Susan Mullen & Jillian F. Banfield
Department of Soil and Crop Sciences, Colorado State University, Fort Collins, CO, USA
Michael J. Wilkins
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Kenneth H. Williams & Jillian F. Banfield
Rocky Mountain Biological Lab, Gothic, CO, USA
Kenneth H. Williams
Department of Chemistry, University of California, Berkeley, CA, USA
Jennifer A. Doudna
The University of Melbourne, Melbourne, Victoria, Australia
Jillian F. Banfield

Contributions

The study was conceived by B.A.-S. and J.F.B. Metagenomic datasets were contributed by B.A.-S., R.S., L.E.V.A., A.C.-C., J.W.-R., M.J.W., S.M., K.H.W. and J.F.B. Genome binning was done by J.F.B., B.A.-S. and A.C.-C. Manual genome curation was conducted by J.F.B., with read mappings and CRISPR–Cas analysis from B.A.-S. Borg genome structure, taxonomic breakdowns, horizontal gene transfer and general feature analyses were conducted by B.A.-S. and J.F.B. J.W.-R., B.A.-S. and J.F.B. calculated relative abundances of Borgs and Methanoperedens spp. Phylogenetic analyses were conducted by B.A.-S., and repeat sequence comparisons across Borgs were done by B.A.-S. and J.F.B. General Borg and Methanoperedens spp. gene inventory and protein family analyses were done by B.A.-S. and J.F.B. Lilac Borg in-depth analysis was done by M.C.S. J.A.D. provided advisory support. B.A.-S., M.C.S. and J.F.B. wrote the manuscript, with input from all authors.

Corresponding author

Correspondence to Jillian F. Banfield.

Ethics declarations

Competing interests

J.F.B. is a cofounder of Metagenomi. J.A.D. is a cofounder of Caribou Biosciences, Editas Medicine, Scribe Therapeutics, Intellia Therapeutics and Mammoth Biosciences; is a scientific advisory board member of Vertex, Caribou Biosciences, Intellia Therapeutics, Scribe Therapeutics, Mammoth Biosciences, Algen Biotechnologies, Felix Biosciences, The Column Group and Inari Agriculture; is Chief Science Advisor to Sixth Street, a director at Johnson & Johnson, Altos and Tempus; and has research projects sponsored by Apple Tree Partners and Roche.

Peer review

Peer review information

Nature thanks Christian Rinke, Rudolf Thauer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Geochemical profiles of the permanently moist and organic-rich wetland soils.

(A) The mean concentrations of total carbon, nitrogen as well as (B) iron and manganese in wetland soils at 20 cm (n = 3), 40 cm (n = 5), and 90 cm (n = 2) where n denotes the number of biological samples. Deeper soils, where these extrachromosomal elements are most abundant, are somewhat depleted in carbon, iron and manganese compared to shallow soils. Error bars denote standard deviation. 36 samples were collected and sequenced, with 1 to 10 independent samples collected from the same soil depth.

Extended Data Fig. 2 Sets of three or more perfect tandem direct repeats (TDR) are a characteristic feature of the Borg genomes.

Up to 54 instances occur in the four complete Borg genomes, with, on average, one repeat every 12 (Lilac) to 31 (Sky) kbp. These repeat regions fragment assemblies and cause local assembly errors, which we resolved by manual curation (Methods). Within the TDR regions of the four curated, complete genomes, the unit repeats occur up to 20 times and unit repeats are up to 54 bp in length (Supplementary Table 2). Between 54 and 64% of these perfect TDRs are encoded in intergenic regions, although part or all of the first repeat may occur within the C-terminus of a protein-coding gene. When the TDRs occur within proteins, the unit lengths are almost always divisible by 3, so they introduce perfect amino acid repeats. TDR sequences within a single Borg genome are almost always unique. Repeat sequence comparison from the four complete curated Borgs highlights the novelty of almost all TDR sequences (both within and across genomes).

Extended Data Fig. 3 All genomes have two replichores of unequal lengths.

GC skew (grey plots) and cumulative GC skew (green lines) across the four complete Borg genomes, all of which end in long inverted terminal repeats (1.4–2.7 kbp in length). The cumulative GC skew plots indicate replication is initiated in these terminal repeats (red lines). Blue lines mark the predicted replication termini. The red and blue lines define two replichores of unequal length that correspond almost completely to distinct coding strands (almost all genes on the +ve strand of the large replichore and on the -ve strand of the small replichore).
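For readers unfamiliar with the quantity plotted here, GC skew in a window is commonly computed as (G − C)/(G + C), and the cumulative skew is its running sum along the sequence. The Python sketch below illustrates that calculation under stated assumptions: the window size, the function names and the toy sequence are hypothetical and are not taken from the study, and which extremum corresponds to an origin or a terminus depends on strand convention, so the sketch only reports where the extrema fall.

```python
from itertools import accumulate


def gc_skew_windows(sequence, window=1000, step=1000):
    """Return (G - C) / (G + C) for consecutive windows of a DNA sequence.

    Windows with no G or C are given a skew of 0.0. Window and step sizes
    are arbitrary choices for this sketch.
    """
    skews = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew_extrema(skews):
    """Return (cumulative skew list, index of minimum, index of maximum)."""
    cumulative = list(accumulate(skews))
    if not cumulative:
        return cumulative, None, None
    idx_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    idx_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return cumulative, idx_min, idx_max


# Toy sequence: a G-rich half followed by a C-rich half, so the cumulative
# skew rises and then falls, with its peak at the switch point.
toy_sequence = "G" * 2000 + "C" * 2000
cumulative, idx_min, idx_max = cumulative_skew_extrema(gc_skew_windows(toy_sequence))
```

A change in the sign of the skew shows up as a turning point in the cumulative curve, which is why extrema of that curve are used to delimit replichores.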

Extended Data Fig. 4 Taxonomic profiles of the four complete Borg genomes.

A. In all cases, the majority of proteins have no similarity to proteins in the reference database (“Unknown”; e-value of > 0.0001). For the cases where a protein has an identifiable hit (blue and red bars in A), the plots in B. show the taxonomy of the organisms in which those hits were identified. Only cases where the same organism accounted for hits for > 0.5% of genes are shown. The results clearly indicate that the vast majority of cases where proteins have identifiable matches involve matches to proteins of Methanoperedens spp. (gold bars).

Extended Data Fig. 5 The clustering based on protein family content demonstrates that the Methanoperedens, Borgs, archaeal viruses and plasmids/minichromosomes are distinct from each other.

(A) Colored blocks indicate presence of each protein family in the corresponding genome. The blue highlight at the top indicates the Methanoperedens spp. (top) and Borg (bottom) protein family profiles. For details see Fig. 2d. We note that archaeal plasmids are highly undersampled. If Borgs are ultimately classified as plasmids, they dramatically expand the known characteristics (e.g., size, linear genomes) and diversity of archaeal plasmids. (B) Borg protein inventories (purple highlight) compared to giant linear bacterial plasmids. (C) Protein families occurring in more than 5 genomes of Borgs and giant linear bacterial plasmids. Few protein families are shared between Borgs and linear plasmids in bacteria beyond methyltransferases, histidine kinases, and other enzymes unrelated to replication. (D) Average Nucleotide Identity of different Methanoperedens species that coexist with Borgs (red) and previously reported genomes (gray) and the 95% species threshold shown with a dashed line.

Extended Data Fig. 6 Ribosomal protein analyses and phylogenies.

(A) The array of single-copy archaeal ribosomal genes (columns) vs. Borg (blue) and Methanoperedens spp. (gold) genomes illustrating that although Borgs often have rpL11 and occasionally, other ribosomal proteins, they do not have the gene inventory needed to construct ribosomes. (B) Left; Dendrogram of hierarchical clustering of all-vs-all Pearson correlation values between all Borgs and Methanoperedens spp. from the wetland. Right; Maximum Likelihood Phylogeny of concatenated ribosomal proteins from Methanoperedens species that do and do not coexist with Borgs and previously reported genomes. We found no data indicating the presence of Borgs in samples containing previously reported Methanoperedens genomes. We searched for Borgs in the samples highlighted in blue using the same methods used to detect Borgs in this study and concluded that they do not contain Borgs. A subset of the Borg-free samples contain Methanoperedens spp. at very high abundance levels.

Extended Data Fig. 7 Abundance and distribution of Borgs and Methanoperedens spp. in the wetland soil and Rifle aquifer.

A. Relative abundances of Methanoperedens spp. and Borgs in samples collected over time and arrayed by sample collection depth from the wetland soils, sediments and groundwater. The absolute abundances of Borgs are far greater in the deeper compared to shallower soils. B. Although some Borgs can substantially exceed the combined abundance of all Methanoperedens spp., no Borgs were detected in some Methanoperedens-bearing samples. “W” indicates that the sample was pumped groundwater.

Extended Data Fig. 8 Genome comparisons and CRISPR-Cas interactions.

(A) Genome-to-genome comparisons provide evidence for recombination between two of the most closely related Borgs, Sky and Rose. These Borgs share only moderate overall genomic nucleic acid identity but, as is the case for other Borgs (Fig. 1a), have blocks of partially alignable sequence throughout their genomes. Notable, and indicating recent homologous recombination, are 100% identical regions of up to ~11 kbp in length (B). Although not fully manually curated to completion, the relevant Rose Borg genome regions were carefully checked by inspection of the mapped reads to rule out chimeric assembly that could otherwise explain perfect identity with the Sky Borg sequence (Sky is one of the four curated complete genomes). (C) Read coverages over the Rose and Sky genomes are consistent throughout, with the regions in B noted with green boxes. (D) Diagram illustrating the organization of the Type III-A CRISPR-Cas system variant (lacking acquisition machinery and Csm6) in the Orange Borg. One spacer from the CRISPR array targets a small protein with a ribbon-helix-helix motif, a common transcriptional regulator in archaeal mobile elements, in a mobile region of a Methanoperedens genome bin from the same wetland site.

Extended Data Fig. 9 The Borg ribosomal sequences form monophyletic groups that cluster adjacent to those from Methanoperedens spp.

Phylogenetic tree constructed using the protein sequences for (A) ribosomal protein L11 (rpL11), (B) Ribosomal protein S2 (C) Ribosomal protein 3ae.

Extended Data Table 1 Manually curated complete and draft genomes for the best sampled Borgs

Supplementary information

Supplementary Information

This file contains Supplementary Fig. 1 and a Supplementary guide which includes descriptions for Supplementary Tables 1–7.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–7 – see Supplementary Information document for descriptions.

Supplementary Data

Alignments and trees used to generate figures in the paper, in FASTA and newick format.

Supplementary Data

Genome sequences generated for the paper in FASTA format.

Supplementary Data

Protein sequences of the genome sequences in FASTA format.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Al-Shayeb, B., Schoelmerich, M.C., West-Roberts, J. et al. Borgs are giant genetic elements with potential to expand metabolic capacity. Nature 610, 731–736 (2022). https://doi.org/10.1038/s41586-022-05256-1

Received: 21 July 2021
Accepted: 22 August 2022
Published: 19 October 2022
Issue Date: 27 October 2022
DOI: https://doi.org/10.1038/s41586-022-05256-1
