A genus in the bacterial phylum Aquificota appears to be endemic to Aotearoa-New Zealand

Allopatric speciation has been difficult to examine among microorganisms, with prior reports of endemism restricted to sub-genus level taxa. Previous microbial community analysis via 16S rRNA gene sequencing of 925 geothermal springs from the Taupō Volcanic Zone (TVZ), Aotearoa-New Zealand, revealed widespread distribution and abundance of a single bacterial genus across 686 of these ecosystems (pH 1.2-9.6 and 17.4-99.8 °C). Here, we present evidence to suggest that this genus, Venenivibrio (phylum Aquificota), is endemic to Aotearoa-New Zealand. A specific environmental niche that increases habitat isolation was identified, with maximal read abundance of Venenivibrio occurring at pH 4-6, 50-70 °C, and low oxidation-reduction potentials. This was further highlighted by genomic and culture-based analyses of the only characterised species for the genus, Venenivibrio stagnispumantis CP.B2T, which confirmed a chemolithoautotrophic metabolism dependent on hydrogen oxidation. While similarity between Venenivibrio populations illustrated that dispersal is not limited across the TVZ, extensive amplicon, metagenomic, and phylogenomic analyses of global microbial communities from DNA sequence databases indicate that Venenivibrio is geographically restricted to the Aotearoa-New Zealand archipelago. We conclude that geographic isolation, complemented by physicochemical constraints, has resulted in the establishment of an endemic bacterial genus.

Forty-seven scaffolds from the draft genome sequence of CP.B2T were analysed through the Integrated Microbial Genomes (IMG) annotation pipeline v4.16.4 (IMG Taxon ID 2799112217, GOLD Analysis ID Ga0311387) [1]. The CP.B2T genome is also accessible in GenBank (GCA_026108055.1). The estimated total genome size was 1.6 Mbp, with a G+C content of 29.6 mol%. Of the 1,707 protein-coding genes, 1,409 had a predicted function. Two 16S rRNA genes were detected. A second copy of the CP.B2T genome, sequenced from the strain deposited in the DSMZ culture collection (DSM 18763), exists in GOLD (Analysis ID Ga0170441) and IMG (Taxon ID 2724679818) and was used to corroborate the annotation. Detailed annotation of carbon assimilation, electron transport, sulfur, nitrogen and arsenic metabolism, transmembrane transport, and cytosolic pH moderation, together with a comparison with other Hydrogenothermaceae genomes, is provided below; the full list of genes annotated with a predicted function is given in Supplementary Data 11.
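The G+C content reported above is the percentage of G and C among the unambiguous nucleotides of the assembly. A minimal Python sketch of that calculation follows; the helper name `gc_content` and the toy scaffolds are hypothetical, and ambiguous symbols such as N are simply excluded from the denominator (annotation pipelines may treat them differently).

```python
def gc_content(scaffolds):
    """Return G+C content (mol%) across all scaffolds.

    Only A, C, G and T are counted; ambiguous symbols such as N are ignored
    (an assumption of this sketch).
    """
    gc = 0
    acgt = 0
    for seq in scaffolds:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases in input")
    return 100.0 * gc / acgt


# Toy scaffolds (not the CP.B2T genome): 3 G/C out of 10 unambiguous bases.
print(round(gc_content(["ATGCAT", "ATNNGA"]), 1))  # 30.0
```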

Carbon assimilation
The Type I reductive TCA (rTCA) cycle was evident from the annotation of genes for ATP-citrate lyase (ACL; aclAB), succinate dehydrogenase/fumarate reductase (sdhABC), and 2-oxoglutarate:ferredoxin oxidoreductase (korAB) [2]. Citryl-CoA synthetase (ccsAB) and 2-oxoglutarate carboxylase (cfiAB), which are required for the alternative Type II rTCA cycle used by the Aquificaceae, were not present [2,3]. ACL is thought to have originated from a fusion of citryl-CoA lyase (ccl) and citryl-CoA synthetase (ccsAB), both used in the older Type II cycle, and to have been acquired by the Hydrogenothermaceae through horizontal gene transfer [2].
There was also no evidence of citrate synthase (gltA), which is found in many autotrophic bacteria and can operate both in the oxidative TCA cycle to produce citrate and in the reversed oxidative TCA (roTCA) cycle to fix carbon dioxide [4]. A complete Embden-Meyerhof-Parnas (EMP) pathway for glycolysis/gluconeogenesis was annotated [5], with no evidence of the alternative Entner-Doudoroff (ED) pathway or of the oxidative branch of the pentose phosphate pathway (oxPPP), the conventional sources of NADPH for cell metabolism [6]. A non-phosphorylating glyceraldehyde-3-phosphate dehydrogenase gene (gapN) was annotated; this enzyme can produce NADPH by irreversibly oxidising glyceraldehyde-3-phosphate (GAP) directly to 3-phosphoglycerate (3-PG) within the EMP pathway [6]. No genes were found for carbon monoxide dehydrogenase or for ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) of the Calvin-Benson-Bassham cycle.

Electron transport
The genome encoded 13 subunits of the proton-translocating NADH:ubiquinone oxidoreductase (nuoA-N) for complex I of the electron transport chain. Succinate dehydrogenase (sdhABC and frdB) was annotated for complex II. Three subunits of the cytochrome bc1 complex (ubiquinol:cytochrome c oxidoreductase; petABC) were encoded for complex III, with cytochrome bd (cydAB) serving as the terminal respiratory oxidase (complex IV). Expression of cytochrome bd has been shown to increase in response to a range of environmental stressors, including pH and temperature extremes [7]. Membrane-bound F-type ATPases (atpA-H) were encoded for complex V.

Sulfur, nitrogen & arsenic
The only gene of the SOX pathway for sulfur/thiosulfate oxidation found in the CP.B2T genome was soxD, which encodes a cytochrome c subunit associated with the electron transport chain and may also be involved in arsenic cycling [8]. Cytochrome c carries electrons as an intermediate between complexes III and IV [9], contributing to the proton gradient used for ATP synthesis. There was no evidence of sulfite dehydrogenase (sorAB or soeA) in the genome, even though genes for the biosynthesis of the cofactor molybdopterin were present [10]. No genes from other sulfur-metabolising pathways, including sulfur oxygenase reductase (SOR), thiosulfate dehydrogenase (tqoAB), dissimilatory sulfite reductase (dsrABC), and the heterodisulfide reductase subunits hdrC1 and hdrB2 [11], were detected.

Genes for ammonia assimilation via glutamine synthetase (glnA) and glutamate synthase (gltBD) were present, along with nitronate monooxygenase (NMO), which putatively generates nitrite from nitroalkanes. No genes associated with dissimilatory nitrogen pathways were identified, including nitric oxide reductase (norB) for denitrification, which is commonly found in other Hydrogenothermaceae. In addition, CP.B2T was the only member of the Hydrogenothermaceae analysed that did not encode a nicotinamidase gene (pncA).

While the arsenic resistance operon arsRBC was annotated in the CP.B2T genome, there was no indication of the arsenite-translocating ATPase gene (arsA), which forms part of the more complex arsRDABC operon [12].

Transmembrane transportation
Genes for transport systems mediating the facilitated diffusion of ions across the membrane were found for sodium, iron, calcium, magnesium and ammonium, with multiple transport systems for potassium. Several ABC transporters were also present for zinc, cobalt, nickel, phosphate and a range of molecules involved in cell membrane formation (e.g., phospholipids, lipopolysaccharides, and lipoproteins). Although a gene for the molybdate transport system regulatory protein (modE) was present, the rest of the high-affinity molybdate uptake system (modABCD) was missing, in contrast to the other Hydrogenothermaceae. The genome also encoded only one copper-exporting ATPase (copB), whereas the other Hydrogenothermaceae analysed encoded between two and five.

Ability to moderate cytosolic pH
Two genes of the aguBDAC operon, which increases alkalinity within the cell, were detected in the genome: agmatine deiminase (aguA) and N-carbamoylputrescine amidase (aguB).
Potential for malolactic fermentation was also indicated by the presence of a lactate dehydrogenase gene (dld). No copies of glutamate, arginine or lysine decarboxylase, urease, or tryptophanase were observed, indicating that Venenivibrio lacks these well-known mechanisms for maintaining cytosolic pH homeostasis [17]. However, the Hydrogenothermaceae genomes analysed did not differ substantially from one another in their genomic capacity for cytosolic pH modulation.

Hydrogenothermaceae genomes
Along with CP.B2T, a group 2d hydrogenase was found in three Persephonella spp., H. marinus VM1T, and S. subterraneum HGMK1T. Sulfurihydrogenibium sp. Y03AOP1 and S. yellowstonense SS-5T had no hydrogenases and instead rely on the oxidation of reduced sulfur species [18]. An arsenite oxidase gene (aioA, formerly aroA or aoxA) [15], which was found in four analysed Sulfurihydrogenibium spp., was not annotated in the CP.B2T genome.

study [23], with a similarity range of 95.2-95.7 %. These were most closely related to S. azorense Az-Fu1T (98.9-99.8 %) and were from a hot spring in Nagano Prefecture, Japan. All other results from NCBI, both full- and partial-length, had <95 % sequence similarity to V. stagnispumantis CP.B2T.

Sequence Read Archive (NCBI)
Three SRA samples had total k-mer counts of 1,672, 3,218 and 7,146 assigned to the genus Venenivibrio when searched using the STAT program (Supplementary Data 15). The search covered 208.5 GB of metadata from a possible 12.2 petabytes (27.3 quadrillion bases) of open-access sequence data (06/Dec/2021; https://www.ncbi.nlm.nih.gov/sra/docs/sragrowth/). One of these samples was a metagenome from a geothermal spring in Aotearoa-New Zealand that was already included in this study (P1.0019, Radiata Pool; SRR14702244). The other two (SRR15830908 and SRR15830907) were 16S rRNA gene amplicon samples from seafloor hydrothermal vents near the Baja California Peninsula, Mexico. The highest sequence similarity to the full-length 16S rRNA gene of V. stagnispumantis CP.B2T found in either of these samples was 92 %, with the query covering 16 % of the gene. The remaining samples from this SRA search (n=85) with k-mers assigned to Venenivibrio had counts ranging from 25 to 430 (Supplementary Data 15).
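The counts above are totals of read k-mers assigned to a taxon. The Python sketch below illustrates only the general idea of counting read k-mers that hit a taxon's reference k-mer set; it is not STAT's actual algorithm, the function names, k value and toy sequences are hypothetical, and a production tool would keep only k-mers diagnostic of the taxon rather than every k-mer of its reference.

```python
def kmers(seq, k):
    """Yield every overlapping k-mer of seq, uppercased."""
    s = seq.upper()
    for i in range(len(s) - k + 1):
        yield s[i:i + k]


def build_index(reference_seqs, k):
    """Collect all k-mers from a taxon's reference sequences into a set.

    Hypothetical simplification: no filtering for taxon-specific k-mers.
    """
    index = set()
    for ref in reference_seqs:
        index.update(kmers(ref, k))
    return index


def count_taxon_kmers(reads, index, k):
    """Total number of read k-mers (counted with multiplicity) found in the taxon index."""
    return sum(1 for read in reads for km in kmers(read, k) if km in index)


# Toy example (hypothetical sequences, k=4).
index = build_index(["ACGTACGT"], k=4)
print(count_taxon_kmers(["ACGTA", "TTTTT"], index, k=4))  # 2
```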

SILVA database
The SILVA SSU r138.1 database contained 33 entries classified to Venenivibrio, including the 16S rRNA gene of V. stagnispumantis CP.B2T, out of a total of 9,469,124 aligned rRNA sequences (Supplementary Data 13). Seven of these entries were clones from Aotearoa-New Zealand hot springs, all with ≥98.0 % pairwise sequence similarity to V. stagnispumantis CP.B2T (GenBank accessions AF402979, EF101539, EF101540, FN429034, FN429035, FN429036, and FN429037) [19,21,22]. The remaining 25 entries comprised one isolate and 24 clones, with a similarity range of 79.3-94.7 % to V. stagnispumantis CP.B2T (Supplementary Data 13). In an approximate maximum-likelihood phylogenetic tree of all 33 aligned sequences, only the seven Aotearoa-New Zealand clones clustered with V. stagnispumantis CP.B2T (Figure S6). SILVA identified only one close neighbour (≥95 % identity) to V. stagnispumantis CP.B2T in the Ref NR database: the Aotearoa-New Zealand clone AF402979 [21].

Greengenes database
The latest version of the Greengenes 16S rRNA gene database (v13.8, August 2013) contained 1,262,986 unique sequences, clustered into 203,452 and 99,321 representative OTUs at 99 % and 97 % similarity, respectively. Only one OTU in the 99 % representative set was classified as Venenivibrio stagnispumantis (Greengenes OTU ID 1142935), and one additional sequence in the database (ID 189417) was also mapped to this cluster.
Both sequences corresponded to the 16S rRNA gene sequence of the type strain, CP.B2T (GenBank accessions DQ989208 and NR_044029). No OTU in the 97 % representative set was assigned to Venenivibrio, and only 25 were classified to the family Hydrogenothermaceae.
Of these 25 OTUs, OTU ID 32720 had 98.6 % sequence similarity to V. stagnispumantis CP.B2T; its corresponding GenBank accession was AF402979, a clone from a hot spring in Kuirau Park, Aotearoa-New Zealand [21]. The next highest similarity among the 97 % OTUs was OTU ID 242647 at 94.2 %, with all others at ≤92.7 %. OTU ID 1142935 was also mapped to the OTU 32720 cluster in the 97 % reference set.
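Representative OTU sets such as the 99 % and 97 % Greengenes sets come from clustering sequences at a fixed similarity threshold. The following is a simplified greedy centroid-clustering sketch in Python, assuming pre-aligned, equal-length sequences and a plain identity score; it is not the pipeline used to build Greengenes, and the function names and toy sequences are hypothetical. It illustrates how a sequence can form its own OTU at 99 % yet fall into another sequence's cluster at 97 %.

```python
def identity(a, b):
    """Percent identity of two aligned, equal-length sequences.

    A column counts as a match only when both carry the same non-gap base.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each (name, seq) to the first centroid with identity >= threshold.

    Sequences that match no existing centroid become new centroids.
    Returns a dict mapping each sequence name to its centroid name.
    """
    centroids = []
    assignment = {}
    for name, seq in seqs:
        for c_name, c_seq in centroids:
            if identity(seq, c_seq) >= threshold:
                assignment[name] = c_name
                break
        else:
            centroids.append((name, seq))
            assignment[name] = name
    return assignment


# Hypothetical 100-bp aligned sequences: s2 is 98 % and s3 is 96 % identical to ref.
ref = "A" * 100
s2 = "A" * 98 + "CC"
s3 = "A" * 96 + "CCCC"
seqs = [("ref", ref), ("s2", s2), ("s3", s3)]
print(greedy_cluster(seqs, 97.0))  # {'ref': 'ref', 's2': 'ref', 's3': 's3'}
print(greedy_cluster(seqs, 99.0))  # {'ref': 'ref', 's2': 's2', 's3': 's3'}
```

Because the outcome depends on input order, practical clustering tools usually sort sequences first (for example by abundance or length) before picking centroids.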

Integrated Microbial Next Generation Sequencing (IMNGS) platform
From a total of 500,048 samples (24,835,679,746 reads), 31 OTUs with ≥95 % sequence similarity to the 16S rRNA gene of V. stagnispumantis CP.B2T were found across 29 samples (Supplementary Data 14). None of these OTUs had ≥99 % similarity, and only eight had ≥97 %, over ≤34 % of the CP.B2T 16S rRNA gene. Of these eight, one was from a sugarcane root soil sample in Australia (SRA run accession SRR1924223) [24], and two were from peat soil adjacent to cold-temperature springs in Canada (SRR1029457 and SRR2026416) [25]. However, these OTUs were represented by only one to five reads each, contributing ≤0.01 % of the sample communities, and occurred in ecosystems not conducive to supporting Venenivibrio populations (e.g., soils, cold temperatures). The remaining five OTUs with ≥97 % similarity to CP.B2T were from synthetic samples (Supplementary Data 14). All samples in which reads with ≥95 % sequence similarity to Venenivibrio accounted for ≥0.1 % of the sample community were from Aotearoa-New Zealand geothermal springs (n=10).
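The ≥0.1 % criterion above is a relative-abundance filter: the reads matching the target at ≥95 % similarity divided by the total reads in each sample. A minimal Python sketch follows, assuming per-sample counts have already been tabulated; the function names, sample names and counts are hypothetical.

```python
def relative_abundance(target_reads, total_reads):
    """Percentage of a sample's reads that belong to the target taxon."""
    return 100.0 * target_reads / total_reads if total_reads else 0.0


def samples_above_threshold(counts, min_percent=0.1):
    """Return sorted sample IDs where the target makes up >= min_percent of the community."""
    return sorted(
        sample_id
        for sample_id, (target_reads, total_reads) in counts.items()
        if relative_abundance(target_reads, total_reads) >= min_percent
    )


# Hypothetical counts: (reads matching the target, total reads in the sample).
counts = {
    "spring_A": (150, 20000),  # 0.75 %
    "soil_B": (2, 40000),      # 0.005 %
    "mock_C": (0, 5000),       # 0 %
}
print(samples_above_threshold(counts))  # ['spring_A']
```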

Integrated Microbial Genomes and Microbiomes (IMG/M) database
At the time of analysis, IMG/M contained a total of 147,328 datasets, including 26,097 distinct public non-redundant genomes and 83,287 metagenomic bins. Thirteen genomes were classified to the Hydrogenothermaceae: 11 from isolates and two metagenome-assembled genomes (MAGs). Two of these genomes were assigned to the genus Venenivibrio: Venenivibrio stagnispumantis DSM 18763 (IMG Genome ID 2724679818; GOLD Analysis ID Ga0170441), sequenced from the type strain held in the DSMZ culture collection; and Venenivibrio stagnispumantis CP.B2 (IMG Genome ID 2799112217; GOLD Analysis ID Ga0311387), sequenced in this study from the type strain held by the laboratory that originally isolated the microorganism. No other genome in the collection contained genes matching the 16S rRNA gene of V. stagnispumantis CP.B2T. Likewise, no metagenomic bins were assigned to the genus. Twenty bins were classified to the Hydrogenothermaceae using GTDB-Tk lineage, all belonging to either Sulfurihydrogenibium (n=11) or Persephonella (n=9).

Earth Microbiome Project (EMP) & Qiita database
No samples from the first release of the Earth Microbiome Project (EMP) contained sequences classified as Venenivibrio using SILVA taxonomy. This dataset comprised 27,411 samples, 126,730 OTUs, and 1,754,319,647 sequence reads. Checking the reference OTUs for this analysis showed that two were classified to the genus Venenivibrio (AF402979.1.1441 and KM221400.1.1484). To confirm this result, the redbiom search function was used to query both the taxon (g__Venenivibrio) and the features (OTU IDs 32720 and 1142935) in the Greengenes-assigned taxonomy. Again, no hits were found. The EMP subset used to create trading cards contained 155,002 unique sequences (amplicon sequence variants; ASVs) from a total of 10,000,000 reads, 2,000 samples and 95 studies. Venenivibrio was not found in this subset either, and only two ASVs were assigned to the Hydrogenothermaceae: one found in five samples with a total of 238 reads, and the other in one sample with six reads (or observations). These six samples were from the Lost City Hydrothermal Field, Mid-Atlantic Ridge, and neither ASV showed significant similarity to the 16S rRNA gene of V. stagnispumantis CP.B2T. Finally, no results were found for Venenivibrio within publicly available studies (n=599) in the Qiita platform, where a total of 276,184 samples were searched by taxon name, OTU ID and sequence.

NCBI & Google Scholar word search
A word search for Venenivibrio across all NCBI databases returned four entries in SRA and four in GenBank. Two of the SRA entries were whole-genome sequencing runs of the type strain CP.B2T from the DSMZ culture collection (DSM 18763; SRA runs SRR5889102 and SRR5889103). The other two were amplicon sequencing runs added in 2020 from an unpublished study of hot spring microbial communities in Sichuan, China (SRR10580885 and SRR10580889). The highest sequence similarity to the full-length 16S rRNA gene of V. stagnispumantis CP.B2T found in these runs was 93.1 %, with 16 % of the gene covered. The GenBank results included three accessions from the type strain CP.B2T (DQ989208, NR_044029, and EF581124) [19,20], with the fourth from an environmental clone labelled as uncultured Venenivibrio sp. CCB8131 (GenBank accession KY480601). This sequence had only 78.3 % sequence similarity to V. stagnispumantis CP.B2T, with 85 % of the query covered.
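Similarity values in these searches are reported together with how much of the reference gene the alignment spans, because high identity over a short fragment says little about the full-length gene. A minimal Python sketch of both measures follows, assuming an alignment already produced by a BLAST-style tool and 1-based inclusive coordinates on the reference gene; the function name and toy alignment are hypothetical, and individual tools define identity and coverage in slightly different ways.

```python
def identity_and_coverage(aligned_q, aligned_s, s_start, s_end, s_len):
    """Return (percent identity, percent of reference covered) for one alignment.

    aligned_q / aligned_s: aligned query and reference strings of equal length.
    s_start / s_end: 1-based inclusive alignment coordinates on the reference.
    s_len: full length of the reference gene.
    """
    if len(aligned_q) != len(aligned_s):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for q, s in zip(aligned_q, aligned_s) if q == s and q != "-")
    pct_identity = 100.0 * matches / len(aligned_q)
    pct_coverage = 100.0 * (s_end - s_start + 1) / s_len
    return pct_identity, pct_coverage


# Hypothetical 250-bp amplicon aligned to a 1,500-bp reference 16S rRNA gene.
pid, cov = identity_and_coverage("A" * 230 + "C" * 20, "A" * 250, 1, 250, 1500)
print(round(pid, 1), round(cov, 1))  # 92.0 16.7
```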
Eight published articles in NCBI's PubMed Central database (PMC; accessed 28/Apr/2022) contained the word 'Venenivibrio'. These included five studies that used samples from Aotearoa-New Zealand geothermal springs [26-30], and three publications that referenced the type strain V. stagnispumantis CP.B2T and/or its characterisation [31-33]. A similar search in Google Scholar (accessed 28/Apr/2022) identified an additional nine manuscripts that referenced the type strain [34-42], plus two that reported Venenivibrio taxa from amplicon sequencing of Chinese hot spring and wetland microbial communities [43,44]. The first of these studies reported that Venenivibrio made up an average of 8.7 % of the microbial communities across 16 hot springs [43]; however, the highest sequence similarity to CP.B2T across all of these samples was 93.5 %, over 16 % of the 16S rRNA gene. The second study reported trace amounts of Venenivibrio (0.03-2.15 %) in three microbial communities of a voltage-applied wetland plant [44]. The 16S rRNA gene sequences from this study were not deposited in a public database, so this result could not be verified.

yellowstonense SS-5T) being covered. The depth of coverage of the recovered genomes varied across samples, depending on the proportion of each taxon in the initial community (Supplementary Data 21). The average coverage depth of V. stagnispumantis was 100x in the sample in which all reads came from that species, decreasing to 2.3x in samples containing only 1 % V. stagnispumantis. A similar trend in coverage depth was observed for P. hydrogeniphila and Sulfurihydrogenibium sp. Y03AOP1.
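Average coverage depth is the total number of aligned bases divided by genome length. One way to compute it is from per-position depths such as those written by `samtools depth -a` (contig, position and depth in tab-separated columns for a single input file); the Python sketch below assumes that input format, and the function name and toy lines are hypothetical, not data from this study.

```python
def mean_depth(depth_lines, genome_length):
    """Genome-wide mean coverage depth from 'contig<TAB>position<TAB>depth' lines.

    Positions missing from the input contribute zero depth, so dividing by the
    full genome length (not the number of lines) gives the genome-wide average.
    """
    total = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _contig, _position, depth = line.split("\t")
        total += int(depth)
    return total / genome_length


# Hypothetical 4-bp "genome" with per-base depths 3, 5, 0 and 4.
lines = ["c1\t1\t3", "c1\t2\t5", "c1\t3\t0", "c1\t4\t4"]
print(mean_depth(lines, genome_length=4))  # 3.0
```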

Figure S1. Venenivibrio 16S rRNA gene diversity in the Taupō Volcanic Zone (TVZ), Aotearoa-New Zealand. (a) The number of operational taxonomic units (OTUs) assigned to the genus Venenivibrio across 467 geothermal springs (post-filtering; n=99 OTUs). (b) Springs (n=467) plotted as a function of environmental pH and temperature. The number of Venenivibrio-assigned OTUs in each spring is represented by blue circles (<30), green squares (30-60), or red triangles (>60), with data ellipses assuming a multivariate t-distribution and a 95 % confidence level.

Figure S2. Prevalence, read abundance, pH, and temperature ranges of low-abundance Venenivibrio operational taxonomic units (OTUs). Low-abundance OTUs (<10 % relative abundance per spring community) assigned to the genus Venenivibrio are shown (post-filtering; n=99 OTUs), ordered by the number of springs in which each OTU was found (i.e., prevalence). Median pH and temperature for low-abundance OTUs were 6.0 (IQR 1.7) and 62.9 °C (IQR 23.4), respectively.
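These captions summarise pH and temperature as a median with an interquartile range (IQR = Q3 − Q1). The quartile definition used is not stated; the Python sketch below assumes linear-interpolation quartiles (`statistics.quantiles` with `method="inclusive"`, which matches R's default quantile type), and the helper name and pH values are hypothetical.

```python
import statistics


def median_iqr(values):
    """Return (median, IQR) using linear-interpolation ('inclusive') quartiles."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1


# Hypothetical spring pH values.
print(median_iqr([4.5, 5.5, 6.0, 6.5, 7.5]))  # (6.0, 1.0)
```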

Figure S3. Venenivibrio hotspots in the Taupō Volcanic Zone (TVZ), Aotearoa-New Zealand. Geothermal springs containing Venenivibrio 16S rRNA genes (post-filtering; n=467) are shown on the central map, with springs in which Venenivibrio made up ≥85 % of the microbial community (n=20) highlighted in red. These springs are also shown within their respective geothermal fields, with circles coloured and sized according to Venenivibrio relative read abundance per spring community. The total number of springs containing Venenivibrio in each geothermal field is given in brackets. Median pH and temperature of these hotspots were 5.5 (IQR 0.8) and 66.0 °C (IQR 18.1), respectively. Map data ©2022 Google.

Figure S4. Distance-decay pattern of Venenivibrio populations in Aotearoa-New Zealand. Bray-Curtis dissimilarity was calculated between untransformed Venenivibrio populations only (at the operational taxonomic unit [OTU] level) in geothermal springs across the TVZ (post-filtering, n=467) and plotted against pairwise geographic distance. The fitted linear regression model is shown in red (slope = 4.36 × 10⁻⁵).
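The distance-decay analysis pairs every two springs, computes Bray-Curtis dissimilarity between their untransformed Venenivibrio OTU counts, and regresses dissimilarity on geographic distance. A self-contained Python sketch follows; it assumes great-circle (haversine) distances in kilometres and ordinary least squares, whereas the software, distance calculation and distance units behind the reported slope are not stated, so slopes from this sketch are not directly comparable with it. All function names, coordinates and counts are hypothetical.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length, untransformed count vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all distances are identical; slope undefined")
    return sxy / sxx


def distance_decay(springs):
    """springs: list of (lat, lon, otu_counts). Returns (distances, dissimilarities, slope)."""
    dists, diss = [], []
    for i in range(len(springs)):
        for j in range(i + 1, len(springs)):
            lat1, lon1, c1 = springs[i]
            lat2, lon2, c2 = springs[j]
            dists.append(haversine_km(lat1, lon1, lat2, lon2))
            diss.append(bray_curtis(c1, c2))
    return dists, diss, ols_slope(dists, diss)


# Hypothetical springs: (latitude, longitude, counts for three OTUs).
springs = [
    (-38.14, 176.25, [10, 0, 5]),
    (-38.37, 176.37, [8, 2, 5]),
    (-38.68, 176.08, [0, 9, 1]),
]
dists, diss, slope = distance_decay(springs)
print(f"{len(dists)} pairs, slope = {slope:.3g} per km")
```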

Figure S5. Venenivibrio relative read abundance per geothermal spring in Aotearoa-New Zealand. Springs containing Venenivibrio-assigned reads (post-filtering; n=467) are shown, grouped by geothermal field. The 20 operational taxonomic units (OTUs) with the greatest read abundance are indicated by colour in each spring.

Figure S6. Phylogenetic tree of 16S rRNA gene sequences assigned to Venenivibrio. Maximum-likelihood quartet-puzzling phylogenetic tree showing the position of the type strain Venenivibrio stagnispumantis CP.B2T relative to near full-length environmental 16S rRNA gene clones from the SILVA database (SSU r138.1) that have been reported as belonging to, or closely related to, the genus Venenivibrio. Six new Venenivibrio strains recently isolated from the TVZ (CPO1, KUI1, KUI2, LRO1, LRO2 and OKO1) and type strains from the closely related genera Sulfurihydrogenibium and Persephonella are also included, with type strains from the family Aquificaceae collapsed for clarity. Caldisericum exile AZM16c01T was used as the outgroup. Quartet-puzzling support values (10,000 resamples) at each internal branch are represented by the following symbols: [open circles] >90 %, [closed circles] >80 %, and [open diamonds] >70 %. Multifurcations are drawn where the support value for a bifurcation is <50 %. The scale bar represents 0.05 substitutions per nucleotide position.