Two novel bacteriophage genera from a groundwater reservoir highlight subsurface environments as underexplored biotopes in bacteriophage ecology

Although bacteriophages are central entities in bacterial ecology and population dynamics, there is currently no literature on the genomes of bacteriophages isolated from groundwater. Using a collection of bacterial isolates from an aquifer as hosts, this study isolated, sequenced and characterised two bacteriophages native to the groundwater reservoir. Host phylogenetic analyses revealed that the phages targeted B. mycoides and a novel Pseudomonas species. These results suggest that both bacteriophages represent new genera, highlighting that groundwater reservoirs, and probably other subsurface environments as well, are underexplored biotopes in terms of the presence and ecology of bacteriophages.

Burst size. For each phage, a one-step growth curve experiment 9 was conducted to determine the phage latency period and burst size. A prerequisite for this experiment is to establish the CFU-OD600nm relationship of the host under the relevant conditions. This relationship is in turn used to determine the OD at which the appropriate concentration of viable host cells is reached, so that a sensible inoculum can be used for the given experimental conditions. Unfortunately, a reliable OD600nm-CFU relationship could not be established for the host of Anath owing to the rhizoid growth behaviour of B. mycoides. Furthermore, microscopy observations during exponential growth of this host (data not shown) revealed that the cells existed predominantly as multicellular filaments, alongside some single cells. This type of growth prevents an accurate burst size determination for Anath and stresses the need for new methods to study phage-host interactions in non-model organisms. Thus, it was only possible to complete the one-step growth curve for Lana. For Lana, the latency period was 106.67 ± 3.33 (SEM) min and the burst size was 19.85 ± 3.55 (SEM) phage progeny per cell. The one-step growth curve for phage Lana is provided in Figure S5 in the Supplementary Information. All phage-host interaction experiments were performed under standard laboratory conditions (planktonic cells and room temperature). These conditions do not reflect the natural conditions in groundwater, so the growth parameters determined for Lana may not be representative of its natural growth. Ideally, methods should also be developed that allow the study of phage-host interactions under conditions closer to those found in groundwater, such as cells growing in biofilms.
General features of phage genomes. Assembly of Anath (average coverage 3,168×) revealed a 52,369 bp linear genome with a 41.1% GC content. Interestingly, this GC content is higher than that of its host (35.2%). Anath thus contrasts with the common trend in which phages show a lower 10,11 or similar GC content to their hosts 12, but it falls within the variability of the phage-host GC ratio at this genome size 13. Anath harboured 76 ORFs (Fig. 2A), and via protein database analysis, putative protein functions were assigned to 20 of the 76 ORFs. Furthermore, its genome contained genes similar to counterparts found in other Bacillus phages 14 that are associated with DNA replication/metabolism and lysis. An overview of phage Anath genes annotated with predicted functions is provided in Supplementary Table S1.
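The GC comparison above can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' pipeline: the function names `gc_content` and `gc_difference` are hypothetical, and ambiguous bases (such as N) are simply ignored, which is an assumption.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content (%) of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored. This handling is an assumption.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def gc_difference(phage_gc: float, host_gc: float) -> float:
    """Return phage GC minus host GC, in percentage points."""
    return phage_gc - host_gc


if __name__ == "__main__":
    # GC values reported in the text for Anath and its host.
    print(round(gc_difference(41.1, 35.2), 2))
    # Toy sequence with four G/C bases out of eight.
    print(gc_content("GGCCAATT"))
```

A positive difference, as for Anath (41.1% against 35.2% for its host), indicates a phage genome that is more GC-rich than its host.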
The assembly of phage Lana (average coverage 142.6×) revealed a linear 88,342 bp genome with a 60.8% GC content. The GC content of Lana was similar to that of its host (60.65%) and thus followed the common trend in phage-host GC relationships 13. Putative protein functions were assigned to 27 of the 133 ORFs (Fig. 2B). An overview of phage Lana genes annotated with predicted functions is provided in Supplementary Table S2.
Similarity with other phages. As one of the first two sequenced groundwater phages and a novel B. mycoides phage, Anath was expected to show only a weak resemblance to known phages. A BLASTn search 15 of Anath (June 2019) resulted in 11 Siphoviridae hits, of which one was an unclassified Siphoviridae member, vB_BpsS-36 [...] 16 in a tBLASTx fragmented multiple alignment (accurate mode; fragment size = 200 bp, sliding-step size = 100 bp). The analysis revealed an amino acid sequence similarity of just 28-32% between Anath and any of the included phages (Fig. 3A). Genomes were then compared between Anath, vB_BpsS-36 and selected representatives of Andromedavirus (Gemini, Leo2, Taylor and Finn) using the Easyfig visualisation tool (version 2.2.3) 17 in BLASTn mode (Fig. 3B). As expected, the results revealed similar synteny with the closely related Andromedavirus. However, while Anath shared synteny with these phages, several differences were also revealed that suggested a more distant phylogenomic relation, with phage vB_BpsS-36 as its closest known relative. Based on the relatively conserved large terminase gene in the genomes (Fig. 3B), a maximum-likelihood phylogenetic tree was constructed using the MUSCLE algorithm 18 [...] 21. In support of this, a comparison between Lana and PMBT3 revealed some discrepancies in gene synteny (Fig. 4). Due to the lack of other BLASTn phage hits, no further phylogenomic analyses were undertaken for phage Lana.
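The fragment settings reported above for the Anath alignment (fragment size 200 bp, sliding step 100 bp) describe a sliding-window split of each genome. A minimal sketch of such a split follows. It is an illustration only: the function name `fragment_genome` is hypothetical, and the alignment tool's handling of the genome tail is not stated in the text, so dropping the incomplete final window is an assumption.

```python
def fragment_genome(sequence: str, fragment_size: int = 200, step: int = 100):
    """Split a sequence into overlapping fragments.

    Returns a list of (start, fragment) tuples with 0-based start positions.
    Windows that would run past the end of the sequence are dropped
    (an assumption; the tool used in the study may handle this differently).
    A sequence shorter than fragment_size is returned as one short fragment.
    """
    if fragment_size <= 0 or step <= 0:
        raise ValueError("fragment_size and step must be positive")
    last_start = max(len(sequence) - fragment_size, 0)
    return [
        (start, sequence[start:start + fragment_size])
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    # Placeholder sequence with the length reported for the Anath genome.
    placeholder = "A" * 52369
    fragments = fragment_genome(placeholder)
    print(len(fragments))  # number of 200 bp windows at a 100 bp step
```

With these settings, consecutive fragments overlap by 100 bp, so every position away from the genome ends is covered by two fragments.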
To assess the phylogeny of both phages in relation to more distantly related phages and prophages, trees were built from the conserved nucleotide sequences of the major capsid protein (Anath) and terminase (Anath and Lana). Phages from a combination of BLASTp and BLASTn searches were included in these analyses. However, the results did not provide any new information with regard to classification. The trees are shown in Figures

Conclusions
The sampling method in the present study followed that applied in the recent study by Korbel et al. (2017) 22 to investigate the microbiology of groundwater. That method recommends that groundwater wells be purged of at least three times their volume before a sample is designated as "aquifer" water. Since the wells were purged of a minimum of 13 times their volume before sampling, there can be confidence that bacteriophages indigenous to the reservoir were isolated. In summary, this study revealed for the first time the complete genomes of two novel phages isolated from groundwater, which are likely to represent their own separate genera. Thus, in view of (i) the distant relationship between the Anath and Lana hosts (Bacillus and Pseudomonas, respectively) and (ii) the present evidence suggesting that the phages represent novel genera, it can be argued that this study, despite its limited scope, further supports the view that groundwater reservoirs are an underexplored, heterogeneous and poorly understood biotope. This is in line with recent findings that describe aquifers as (i) harbouring diverse microbial communities 1,6,23 and (ii) acting as viral reservoirs 7. It also emphasises the great potential of groundwater reservoirs for elucidating novel food web interactions and for novel phage discoveries. Therefore, continued effort should be made to isolate and catalogue phages from subsurface environments, in order to contribute to studies of, for example, predator-prey interactions and biogeochemical cycling, and to the development of groundwater bioremediation strategies.

Materials and methods
Media and culture conditions. All bacterial culturing steps, with or without phages, were undertaken at room temperature (~22 °C) using R2A agar and R2B broth (Alpha Biosciences Inc., Baltimore, Maryland, USA) as growth media. Liquid cultures were grown with agitation at 200 rpm. Between uses, bacterial isolates were stored on R2A agar at 4 °C or as cryostocks at −80 °C (~20% glycerol). Purified phage suspensions were stored at 4 °C between uses. The buffer solution (SM buffer) for phage resuspension contained 100 mM NaCl, 8 [...]
In brief, sample bottles with groundwater were shaken thoroughly and 100 µL of each well sample was plated and left to grow for 4-14 days. Colony growth from samples varied greatly and library building was based on acquiring all the colony morphotypes throughout the incubation period. Where possible, several colonies of a distinct morphotype (up to a maximum of eight) were picked. To obtain pure isolates, colonies were re-streaked three times.
Phage screening, isolation and amplification. Phage activity was detected in a pre-screening of phage enrichment cultures containing sample water, concentrated media and the relevant isolate. Screening hits, indicated by clear or turbid zones in the top agar layer as a result of plausible phage activity, were then followed by a new enrichment to verify phage activity and subsequently obtain pure phage isolates.
The initial enrichment and pre-screening were performed by mixing 2.5 mL 2 × R2B and 2.5 mL sample water and then inoculating with 50 µL culture of the relevant isolate (isolated from that sample water). Following five days of growth, enrichment cultures were centrifuged at 10,000 × g for three minutes and the supernatant was filtered through a 0.22-µm sterile PVDF syringe filter (Millex Durapore, Burlington, MA, USA). Phage activity in the filtrate from each enrichment was screened in a standard double agar overlay assay. Briefly, 10 µL of each filtered enrichment was drop-plated in 10 replicates onto a semisolid agar (R2B + 50 mM CaCl2/MgCl2 + 0.6% agarose) inoculated with 100 µL culture of the relevant bacterial isolate, using R2A as the bottom agar layer. Plates were incubated at room temperature and inspected daily for one week, with any potential plaque formation noted as a 'hit' for the given groundwater sample and isolate.
The follow-up enrichment cultures of phage-host 'hits' contained 4 mL 10 × R2B broth, 35 mL groundwater sample, 1 mL culture of the relevant host and 10 mM CaCl2/MgCl2. Enrichment cultures were incubated for 24 h. Subsequently, the culture was supplemented with 1 M NaCl and incubated for 30 min. The culture was then centrifuged at 5,000 × g for five minutes and the supernatant was filtered through a 0.45-µm PVDF syringe filter (Millex Durapore, Burlington, MA, USA). Then 100 µL of the filtered supernatant was used in a double agar overlay assay as described above, and individual plaques were picked and re-plated three times to obtain pure phage samples.
A high titre of both phages was obtained by polyethylene glycol (PEG) precipitation as described previously 9, with modifications. Briefly, 200 mL R2B was inoculated with 100 µL host culture, infected with 100 µL of pure phage lysate and left overnight. The following day, 1 M NaCl was added to the cultures, which were then left on an orbital shaker for one hour to burst the bacterial cells. Cultures were then spun at 12,000 × g for 10 min, the supernatant was collected and PEG was added to reach 10% w/v. The mixture was left for two hours on an orbital shaker for phage-PEG adsorption. Finally, the mixture was centrifuged at 12,000 × g for 10 min and the pellets containing phage virions were resuspended and collected in 3-5 mL SM buffer. Following PEG precipitation, the titre was determined from PFU counts by drop-plating a dilution series of the collected particles on an agar overlay, as described above. A titre of 10¹⁰-10¹¹ PFU mL⁻¹ was obtained. Single plaques or high-titre phage samples (10¹⁰-10¹¹ [...]
[...] California, USA), following the manufacturer's protocol. DNA libraries were prepared with the Nextera XT DNA kit (Illumina, San Diego, USA) according to the manufacturer's protocol. The prepared libraries were then sequenced in a 2 × 251 paired-end sequencing run, as part of a flow cell, using the Illumina MiSeq v2 kit (Illumina, San Diego, CA, USA). In assembling host draft genomes, CutAdapt (v1.8.3) was used to quality-trim sequence reads (bases with < q20 removed from read ends) and to remove any contaminants (primers and indexes). Shorter reads (< 50 bp) were then removed and overlapping read pairs merged using AdapterRemoval (v2.1.0) 26 at default settings. Finally, the cleaned merged and unmerged reads were assembled using SPAdes (v3.6.0) 27 and the assemblies were evaluated in QUAST (v3.1) 28.
For each host, the genome sequence data were uploaded to the Type (Strain) Genome Server (TYGS), a free bioinformatics platform available at https://tygs.dsmz.de, for a whole genome-based taxonomic analysis 8. Methods (and results) for strain identification are provided in the Supplementary Information.
Transmission electron microscopy of virions. The caesium chloride-purified phage sample was adsorbed to freshly prepared ultra-thin carbon film and fixed with 2% (v/v) EM-grade glutaraldehyde (20 min). Fixed samples were then negatively stained with 1% (w/v) uranyl acetate and picked up with 400-mesh copper grids (Plano, Wetzlar, Germany). Finally, prepared samples were analysed using a Tecnai 10 transmission electron microscope (Thermo Fisher, Eindhoven, the Netherlands) at an acceleration voltage of 80 kV. Micrographs were taken with a MegaView G2 CCD camera (EMSIS, Muenster, Germany).
Phage Lana burst size. Phage latency period and burst size were determined for phage Lana in a one-step growth curve experiment as described elsewhere 9. Here, the host was grown to OD600nm 0.75, corresponding to 4 × 10⁷ CFU mL⁻¹. Then, 10 mL culture was infected at a multiplicity of infection of 0.05 and incubated for 20 min at 200 rpm to allow phage-host adsorption. After adsorption, three aliquots of infected culture were diluted 10,000-fold, and PFUs in the three diluted cultures were followed over time to determine the phage latency period and burst size. Experimental cultures were sampled at the beginning of the experiment and then every 10 min from 90 min until the end. Average PFU numbers before and after the burst event were used to calculate the burst size. In calculating the burst size, the infection efficacy of phage Lana after the adsorption step was also considered. After adsorption, one sample from the infected culture was centrifuged at 6,000 × g for five minutes, thereby separating infecting phages (pellet) from unadsorbed phages (supernatant). Subsequently, three technical replicates from the resuspended pellet and the supernatant were plated and PFUs were counted. The infection efficacy of Lana was then determined to be 19.27 ± 1.04% (SEM). Prior to the experiment, the study established (i) a growth curve for the host strain to determine the OD600nm-CFU relationship and (ii) an approximate phage latency period (data not shown).
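The arithmetic behind the phage input and the burst size can be sketched as follows. This is an illustrative calculation only: the exact formula used in the study is not given in the text, so the correction for unadsorbed phages is an assumption, the function names are hypothetical, and the PFU counts in the usage example are invented placeholders rather than measured data.

```python
from statistics import mean


def phage_input_for_moi(cfu_per_ml: float, culture_volume_ml: float, moi: float) -> float:
    """Return the number of PFU needed to infect a culture at a given MOI."""
    return cfu_per_ml * culture_volume_ml * moi


def burst_size(pfu_before, pfu_after, infection_efficacy: float) -> float:
    """Estimate burst size from PFU counts before and after the burst.

    Assumption (not stated in the source): the pre-burst PFU count is the sum
    of infected cells and unadsorbed phages, and infection_efficacy (a
    fraction between 0 and 1) is the share attributable to infected cells.
    """
    if not 0 < infection_efficacy <= 1:
        raise ValueError("infection_efficacy must be in the interval (0, 1]")
    baseline = mean(pfu_before)
    plateau = mean(pfu_after)
    infected_cells = baseline * infection_efficacy
    free_phages = baseline * (1 - infection_efficacy)
    # Progeny released per infected cell, after subtracting unadsorbed phages.
    return (plateau - free_phages) / infected_cells


if __name__ == "__main__":
    # Conditions from the text: 10 mL at 4e7 CFU/mL, MOI 0.05.
    print(phage_input_for_moi(4e7, 10, 0.05))
    # Placeholder counts (hypothetical), efficacy rounded to 0.2 for clarity.
    print(burst_size([100, 100, 100], [480, 480, 480], 0.2))
```

Under the conditions given in the text, the culture contains 4 × 10⁸ CFU, so an MOI of 0.05 corresponds to an input of 2 × 10⁷ PFU.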
Phage DNA extraction and genome sequencing. For both phages the protocol for the direct plaque sequencing (DPS) method was used as described by Kot et al. 29 with the following modifications for phage Anath only: 500 µL high-titre lysate was used as input (~2 × 10¹⁰ PFU mL⁻¹), 10 µL (100 mg mL⁻¹) of proteinase K (Thermo Scientific, Waltham, USA) was used in capsid DNA release, and 10 µL was used as elution volume for purified DNA. Phage DNA libraries were prepared with the Nextera XT DNA kit (Illumina, San Diego, USA), using the DPS method described in Kot et al. 29 for phage Lana and the manufacturer's kit protocol for phage Anath. Prepared libraries were sequenced in a 2 × 250 paired-end sequencing run, as part of a flow cell, using the Illumina MiSeq platform (Illumina, San Diego, USA).
Phage genome assembly and annotation. Sequence reads were trimmed and assembled in the CLC Genomics Workbench 11.1.0 (CLC bio, Aarhus, Denmark) using standard settings, and the assembly was cross-verified using SPAdes (version 3.13.0, using trimmed and merged reads as input and running in --careful mode), as described elsewhere 30. Assembled genomes were automatically annotated using the RAST online tool 31 and were manually curated by cross-referencing with four other publicly available protein recognition tools: BLASTp, Pfam, HHpred and Phyre [32][33][34][35]. Predicted protein functions were annotated accordingly when identical functions were predicted by at least three of the five tools used.
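The annotation rule above (a function is accepted when at least three of the five tools agree) amounts to a simple vote. The sketch below illustrates that rule under stated assumptions: the function name `consensus_function` is hypothetical, and treating two predictions as "identical" when their lower-cased strings match is a simplification of what was a manual curation step.

```python
from collections import Counter
from typing import Optional


def consensus_function(predictions: dict, min_agreement: int = 3) -> Optional[str]:
    """Return the predicted function shared by at least min_agreement tools.

    predictions maps a tool name to its predicted function, or to None when
    the tool returned no prediction. Labels are compared after stripping
    whitespace and lower-casing (a simplification of manual curation).
    Returns None when no function reaches the required agreement.
    """
    labels = [p.strip().lower() for p in predictions.values() if p]
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None


if __name__ == "__main__":
    # Hypothetical predictions for a single ORF.
    orf = {
        "RAST": "Terminase large subunit",
        "BLASTp": "terminase large subunit",
        "Pfam": "terminase large subunit",
        "HHpred": None,
        "Phyre": "DNA packaging protein",
    }
    print(consensus_function(orf))
```

With five tools and a threshold of three, at most one function can reach the threshold, so ties do not need special handling.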
De novo peptide sequencing: identification of structural proteins. To identify the proteins, the procedure for protein purification from Lavigne et al. 36 was followed with minor modifications. In short, 100 µL of the phage extract was transferred to an Amicon Ultra filter unit (MWCO 30 kDa), centrifuged at 14,000 × g for 20 min and further desalted four times with 450 µL water. The filtrate containing the phage particles (10 µL) was denatured in 25 µL buffer consisting of 6 M urea, 5 mM dithiothreitol and 50 mM Tris-HCl (pH 8). The phage particles were destabilised by five successive freeze-thaw cycles followed by a one-hour incubation at 60 °C to reduce the phage proteins. The proteins were alkylated by adding 25 µL of 100 mM iodoacetamide and 150 µL of 50 mM ammonium bicarbonate and incubated for 45 min at room temperature. Phage proteins were digested with 0.8 µg trypsin dissolved in 40 µL 50 mM ammonium bicarbonate and incubated for 24 h at 37 °C. The protein digest was diluted with 200 µL 0.1% trifluoroacetic acid (TFA) and purified by solid-phase extraction using 2 mg hydrophobic reversed-phase well-plate cartridges (Thermo Fisher Scientific) preconditioned with 200 µL acetonitrile and 200 µL 0.1% TFA. The peptides were eluted from the cartridges with two 25 µL volumes of 70% acetonitrile and diluted with 150 µL 0.1% TFA. The phage peptides were analysed using an UltiMate 3000 RSLCnano UHPLC system coupled to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Denmark).
Scientific Reports | (2020) 10:11879 | https://doi.org/10.1038/s41598-020-68389-1 | www.nature.com/scientificreports/
An amount of 6.4 µL of the sample was loaded on a preconcentration trap (C18, 300 µm × 5 mm cartridge, Thermo Fisher Scientific) and eluted onto an analytical column (75 µm × 250 mm, 2 µm C18, Thermo Fisher Scientific) with a chromatographic triple-phasic 53 min gradient ranging from 1 to 64% mobile phase B (98% acetonitrile and 0.1% formic acid) at a flow rate of 300 nL per minute. The total analysis time was 65 min and mobile phase A consisted of 2% acetonitrile and 0.1% formic acid. The high-resolution mass spectrometer was operated with positive electrospray ionisation in data-dependent mode by automatically switching between MS and MS/MS fragmentation. Based on a survey MS scan in the Orbitrap operated at a mass resolution of 120,000 at m/z 200 with a target of 3e6 ions and a maximum injection time of 50 ms, the twelve most intense peptide ions were selected for MS/MS fragmentation in subsequent scans. The selected ions were isolated (in an m/z 1.4 window), higher-energy collision dissociation was performed at a normalised collision energy of 28, and fragments were recorded in centroid mode at a resolution of 60,000 (m/z 200) with a 250 ms maximum filling time and a target of 1e5 ions. The high-resolution data generated were analysed in Proteome Discoverer 2.2 (Thermo Fisher Scientific) and searched against predicted phage/host proteins by the Sequest HT algorithm in an iterative processing pipeline. The search criteria were: enzyme, trypsin (full); dynamic modifications, methionine oxidation and acetyl (N-terminus); precursor mass tolerance, 5 ppm; fragment mass tolerance, 20 mDa. The processed data were filtered in a Proteome Discoverer consensus workflow with the Peptide Validator algorithm (q-value < 0.01) to ensure that the peptide-spectrum matches had a false discovery rate under 1%.
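The two mass tolerances in the search criteria are expressed in different units: the precursor tolerance is relative (ppm) and the fragment tolerance is absolute (mDa). The following sketch shows how each check works in principle. It is not a description of how Sequest HT is implemented, the function names are hypothetical, and the m/z values in the usage example are invented.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Return the relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_precursor_tolerance(observed: float, theoretical: float,
                               tol_ppm: float = 5.0) -> bool:
    """Check a precursor against a relative tolerance (5 ppm in the text)."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def within_fragment_tolerance(observed: float, theoretical: float,
                              tol_da: float = 0.020) -> bool:
    """Check a fragment against an absolute tolerance (20 mDa in the text)."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # Hypothetical values: deviations of roughly 4 ppm and 6 ppm at m/z 500.
    print(within_precursor_tolerance(500.002, 500.0))
    print(within_precursor_tolerance(500.003, 500.0))
    # Hypothetical fragment deviating by 15 mDa.
    print(within_fragment_tolerance(300.015, 300.0))
```

At m/z 500, a 5 ppm window corresponds to an absolute deviation of 2.5 mDa, which is considerably stricter than the 20 mDa allowed for fragment ions.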
The de novo peptide sequencing identified 136 proteins in total with a false discovery rate < 1%. Individual samples contained proteins mapping to predicted proteins of both the phage and its host: 17 identified proteins in Anath/B. mycoides and 122 identified proteins in Lana/Pseudomonas sp. To avoid false-positive identifications, only phage proteins identified with quality scores (Sequest HT score) exceeding the highest quality score of the identified host proteins were regarded as virion proteins.
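The selection rule for virion proteins can be written as a short filter: the highest host score in a sample acts as the threshold, and only phage proteins scoring above it are kept. This is a minimal sketch under stated assumptions. The function name `select_virion_proteins`, the tuple layout and the example scores are hypothetical, and keeping all phage proteins when a sample contains no host hits is an assumption, since the text does not cover that case.

```python
def select_virion_proteins(hits):
    """Return IDs of phage proteins scoring above every host protein.

    hits is a list of (protein_id, source, score) tuples for one sample,
    where source is "phage" or "host". If the sample has no host hits,
    all phage proteins are returned (an assumption).
    """
    host_scores = [score for _, source, score in hits if source == "host"]
    threshold = max(host_scores, default=float("-inf"))
    return [
        protein_id
        for protein_id, source, score in hits
        if source == "phage" and score > threshold
    ]


if __name__ == "__main__":
    # Hypothetical hits and scores for a single phage/host sample.
    sample = [
        ("major_capsid", "phage", 310.5),
        ("tail_fibre", "phage", 42.0),
        ("hypothetical_orf", "phage", 8.3),
        ("host_ribosomal_L2", "host", 12.1),
        ("host_EF_Tu", "host", 9.7),
    ]
    print(select_virion_proteins(sample))
```

In this invented example the threshold is 12.1, so the phage protein scoring 8.3 is excluded even though it passed the false discovery rate filter.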