Aquatic reservoir of Vibrio cholerae in an African Great Lake assessed by large scale plankton sampling and ultrasensitive molecular methods

The significance of large tropical lakes as environmental reservoirs of Vibrio cholerae in cholera endemic countries has yet to be established. By combining large scale plankton sampling, microbial culture and ultrasensitive molecular methods, namely Droplet Digital PCR (ddPCR) and targeted genomics, the presence of Vibrio cholerae was investigated in a 96,600 L volume of surface water collected on a 322 nautical mile (596 km) transect in Lake Tanganyika. V. cholerae was detected and identified in a large area of the lake. In contrast, toxigenic strains of V. cholerae O1 or O139 were not detected in plankton samples possibly in relation to environmental conditions of the lake ecosystem, namely very low salinity compared to marine brackish and coastal environments. This represents to our knowledge, the largest environmental study to determine the role of tropical lakes as a reservoir of V. cholerae.

In the last decades, climate change including warming and decreased winds have been noted with apparent impacts on the thermic stratification, oxygenation and primary production patterns of Lake Tanganyika (8)(9)(10). Cholera outbreaks have occurred at Lake Tanganyika as in the East African region almost every year since 1977-1978. It has been noted that epidemics often started near the lake shores (11)(12). During lull periods, persistence of cholera near the Great African lakes was explained by outbreak dynamics, which suggested a metapopulation pattern, and by endemic foci around the lakes (12). Cross analyses have indicated correlation between phytoplankton blooms and cholera outbreaks at Lake Tanganyika (13). Moore et al. (14) indicate that isolates from the African Great Lakes Region (DRC and Zambia) formed a closely related group. They also found that certain MLVA types collected in the DRC persisted in the country for several years, occasionally giving rise to expansive epidemics.

Large scale Continuous Plankton Recorder sampling in Lake Tanganyika
The CPR is a robust mechanical plankton-sampling device that has been used extensively by large merchant vessels in the oceans across the globe for more than 80 years (15) (Figure S1). For the present study, the CPR was mounted on a small transport vessel Maman Benita KAL 2271 for sampling across Lake Tanganyika. Six tows of 13 (1ALT), 68 (2ALT), 61 (3ALT), 67 (4ALT), 80 (5ALT) and 33 (6ALT) nautical miles were conducted between Kigoma and Kiranda on the eastern part of Lake Tanganyika at the beginning the short rainy season (cholera season) from 22nd to 26th October 2018. 18 standard formalin-fixed CPR samples were collected along routes 1ALT, 2ALT and 3ALT for molecular and plankton analysis. In addition, 18 non-formalin fixed CPR samples (viable-CPR protocol) were collected along routes 4ALT, 5ALT and 6ALT for culture-based microbiological studies. Sampling took place in the surface layer (~7 meters) and plankton was collected on a band of silk (mesh-size 270 μm) moving across the sampling aperture at a rate proportional to the speed of the towing ship. The CPR mesh width of 270 μm retains larger zooplankton with a high efficiency, but also collects small planktonic organisms such as nauplii, microzooplankton and phytoplankton. On return to the laboratory, the silk was removed from the device and divided into individual samples that were stored in sterile plastic boxes with or without 4% buffered formalin. Each sample represented 10 nautical miles of tow (3m 3 of filtered seawater) and was previously shown to capture a substantial fraction of the plankton associated Vibrio community (16) ( Figure S1). A conductivity-temperature depth (CTD) sensor was mounted on the CPR to record surface water temperature and conductivity along the tows ( Figure S2).
A key element of the CPR methodology is its ability to obtain large amounts of information from remote areas at low cost by using merchant ships of opportunity freely towing the sampling machine. The costs involved, other than for purchase of equipment, are largely for subsequent laboratory analysis of the bacteria and plankton. All samples collected during the cruise were shipped to Europe and analyzed within 14 days from collection.

V. cholerae culture-based studies
Laboratory tests were initially conducted to assess recovery and culturability of V. cholerae strains on CPR silk under conditions mimicking those found in Lake Tanganyika environment ( Figure S3). Briefly 1 ml of bacterial suspension corresponding to 10 6 cells of V. cholerae N16961 (ATCC 39315) strain were inoculated on 1cm 2 sterilized CPR silk section. Silks were incubated in 5ml Eppendorf tubes containing 3ml of Artificial Seawater (ASW) or Phosphate-buffered saline (PBS) or Alkaline peptone water (APW) or unsterilized Lake water (LW) at different temperatures (+4°C and room temperature). Each condition was run in triplicate. Culturability was then assessed by conventional APW enrichment and Thiosulfate-citrate-bile salts-sucrose agar (TCBS) plating of the bacterial suspension at different time intervals of 7, 15, 28, 365 and 730 days, respectively (results showed that V. cholerae could be cultivated up to 12 weeks at room temperature on the CPR silk under all tested conditions, Figure S3). Isolation of V. cholerae from CPR samples collected by the viable-CPR protocol (no formalin fixation of the CPR samples) was accomplished by employing APW enrichment of the silk (static incubation for 16h at 30 to 35°C overnight) followed by selective culture on Thiosulfate-citrate-bile salts-sucrose agar (TCBS) (incubation at 30°C to 35°C for 24 ± 2 h) as described in Huq et al. (17).
The silk used for enrichment corresponds to ca 3000 L of filtered lake water. Colonies grown onto TCBS agar were inspected and sucrose fermenting (yellow) colonies that have different sizes and morphologies were picked off the agar and isolated as pure cultures onto the same medium. A PCR assay (18) and partial sequencing of the rpoA gene (19) was used to determine if each colony belonged to the species V. cholerae. Table S1.

Nucleic acid extraction
For each CPR sample, the filtering silk was cut into strip sections (1x10 cm) and DNA was extracted and purified from each section using the methodology described in Vezzulli et al. (16,18). The amount of DNA extracted from the CPR samples was determined fluorimetrically with PicoGreen using a NanoDrop® ND-3300 fluorometer (NanoDrop Technologies, Wilmington, DE, USA). Sizing of genomic DNA was also conducted in an Agilent Bioanalyzer 2100 (Agilent, Palo Alto, CA) using the High Sensitivity DNA kit (Agilent Technologies).

Droplet Digital Polymerase Chain Reaction (ddPCR) studies
Preliminary screening of CPR samples was carried out by ultrasensitive QX200 Droplet Digital PCR System (Bio-Rad Laboratories, Hercules, CA, USA) ddPCR to target the V. cholerae O1, O139, Non-O1, and Non-O139 specific gbpA marker gene (18). ddPCR is a third generation of PCR that allows absolute quantification of DNA copies by partitioning DNA single molecule into approximately 20,000 droplets based on the Poisson distribution, and then counting the number of positive and negative droplets (20). In order to estimate the ddPCR limit of detection (LoD) for detecting V. cholerae genomes into a sample, we firstly determined the genome mass of the pathogen using the formula m = (n) (1.096 x 10E-21 g/bp) where "m" is the genome mass in grams (1g =1E+12 pg) and n is the genome size in base pairs (bp). Therefore, since V. cholerae N16961 reference genome is made up of 4,033,464 bp, one haploid genome weighs 0.004421 pg. Then, we used this value to define the minimum amount of V. cholerae DNA that could be detected. Specifically, we made 10-fold serial dilutions starting from 0.06 ng containing only V. cholerae DNA to 0.00006 ng of V. cholerae DNA mixed with a DNA from V. aestuarianus, a phylogenetically close bacteria with a comparable genome size (4,837,307 bp). Briefly, DNA was mixed with Supermix no d-UTP (Bio-Rad Laboratories) and 0.4 µM primers and 0.2 µM FAM-labeled TaqMan Probe (1) in a final volume of 22 µl. Then, PCR mix was partitioned in at least 10,000 droplets by droplet generator (Bio-Rad Laboratories) and amplified in a Mastercycler nexus gradient thermal cycler (Eppendorf, Hamburg, Germany). The PCR program was as follows: enzyme activation at 95°C for 10 min followed by 40 cycles of denaturation at 94°C for 30 sec and annealing/extension at 57°C for 1 min followed by a final step of enzyme deactivation at 98°C for 10 min. Finally, the droplets were acquired into the droplet reader (Bio-Rad Laboratories), and data analyzed using the QuantaSoft software (Bio-Rad Laboratories). Notably, at a quantity of 0.00006 ng we were able to detect a mean of 6 positive droplets (two replicates) of V. cholerae DNA, reaching a LoD of six haploid genomes on a total of ~13,000 analyzed. Since the DNA composition in our CPR samples was unknown, we assumed that each sample contained a mix of genomes from the major subgroups of life and viruses (average genome sizes: 3,95E-02 Mb viruses; 3,214 Mb prokaryotes; 31,874 Mb fungi; 59,529 Mb unicellular eukaryotes; 855,59 Mb algae; 4456 Mb animals; 5958 Mb plants) (21) with a theoretical mean genome size of approximately 1,623 Mb (1 genome ~ 1.78 pg). In light of these considerations we used an input of 10 ng of DNA for the ddPCR, running these samples in triplicate (total DNA: 30 ng) to reach an ideal sensitivity of approximately 3 VC positive droplets on ~17,000 analyzed genomes. Table S2.

Capillary quantitative Real-Time PCR (qPCR) studies
CPR samples that scored positive to ddPCR were further investigated by quantitative PCR using assays targeting gbpA (control), ctxA, tcpA, rfbN and wbfR genes as described in Vezzulli et al. (22). Briefly, with the exception of the gbpA protocol (18), the LightCycler-FastStart DNA Master SYBR Green I kit (Thermofisher Scientific) optimized for use with glass capillaries and containing a hot start polymerase was used as the master mix base for all reactions. Each reaction mixture contained 5.0 mmol of MgCl2 and 500 nmol of each primer in a final volume of 20 ml. The PCR programme used was as follows: initial denaturation at 95°C for 10 min, followed by 45 cycles of denaturation at 95°C for 10 sec, annealing at 59°C for 20 sec and elongation at 72°C for 4 sec, followed by final elongation at 72°C for 10 min. PCR runs were analyzed directly in the LightCycler using melt-curve analysis and the software provided with the instrument. The correct size of the products was further confirmed by agarose gel electrophoresis. Presence of V. cholerae toxigenic strains was also evaluated by Real-Time PCR using the commercial VibChoTx MONODOSE dtec-qPCR Test (Genetic PCR Solutions™, Alicante, Spain). Table S2.

Metagenomic studies
Selected CPR samples (2ALTstart, 2 ALT2 and 2ALT3) were analyzed by shotgun metagenomic and targeted metagenomic techniques (Whole Genome Enrichment) following protocols described in Vezzulli et al (22) (Figure S4). Briefly, genomic DNA extracted from CPR samples was used for the production of an indexed library for next-generation sequencing on the Illumina platform (Illumina, Inc) using the KAPA HyperPlus Kit for Illumina (Roche Diagnostics, Mannheim, Germany). About 200 ng of the produced shotgun metagenomic library was used for target DNA capturing using biotinylated RNA baits (on average >100-mer). Baits were produced using the MYcroarray WGE proprietary technology (MYcroarray, Ann Arbor, MI, USA) and made out from genomic DNA extracted from different V. cholerae strains representative of the main pathotypes: V. cholerae N16961 (serogroup O1, biotype El Tor), V. cholerae O395 (serogroup O1, biotype classical), V. cholerae MO10 (serogroup O139) and V. cholerae TMA21 (serogroup non O1/O139). The DNA library was heat-denatured and hybridized to the RNA baits under stringent conditions. Hybridization was carried out at 65°C for 36 h. After hybridization, the biotinylated baits hybridized to captured material were pulled out of the solution with streptavidin-coated magnetic beads and the captured genomic DNA was released by chemical degradation of the RNA baits. Enriched libraries were amplified and sequenced by STAB VIDA, LDA company (Caparica, Portugal) on a MiSeq Illumina™ platform (V3 flow cell, 600 cycles, 25M reads 250bp pair ends). Shotgun metagenomic libraries (not enriched libraries) were also sequenced on a Illumina® HiSeq®2500 platform (PE150 on one lane with an output of ~600M reads). Sequence reads data were archived at NCBI sequence read archive (SRA) with Accession Number PRJNA679303.

Bioinformatics analysis
Pair-read sequences from both shotgun and targeted metagenomic analyses were quality trimmed to a minimum length (75bp), with quality scores (quality nucleotide limit 0.05 based on Phred scale) and the presence of ambiguous nucleotides (n=2). Sequencing adapters were also removed. Trimmed reads were mapped against reference V. cholerae N16961 sequence (Accession: AE003852/AE003853) using the mapping tool of the CLC Genomics Workbench (version 20.0.4) (QIAGEN, CLC Bio, Aarthas, Denmark). A length fraction of 0.5 and similarity fraction of 0.8 were employed in the analysis (e.g. 50% minimum read length matching the reference at >80% nucleotide identity). Masking of 16SrDNA encoding genes was applied. Mapped reads-pairs obtained from shotgun and targeted metagenomic analyses for all samples were then extracted and pooled for subsequent bioinformatic analysis. Taxonomic Profiling and Find Best Matches with K-mer Spectra (Microbial Genomics Module; CLC genomics workbench) against a reference database of 466 complete V. cholerae genome sequences were applied for strain identification (V.cholerae sequences used in this study are listed in a separate excel table as supplementary). Metagenome assembly was conducted using metaSPADES (version 3.14.1) (23) and metagenome-assembled genomes (MAGs) reconstruction was carried out using CONCOCT (version 1.1.0) (24). MAGs were then refined using the CheckM (v1.1.3) 'merge' and 'outliers' tools which merge MAGs with complementary sets of marker genes to improve completeness and remove contigs from MAGs which appear to be outliers (contaminants) relative to reference GC and tetranucleotide distributions (25). MAG's taxonomic assignment to V. cholerae N16961 reference sequence was determined by using Kraken2 (26). Phylogenetic analysis was performed by annotating contigs based on core genes and assigning a mass-probability to their classification with Phylosift (27). A phylogenetic tree based on alignment of the conserved codons among the strains was created with Fastree (28). Whole genome alignment and calculation of average nucleotide identity (ANI) of the MAG region with V. cholerae reference genomes were also applied and a tree was created through a neighbor joining clustering method with the CLC software.

Sample contamination
To avoid laboratory contamination of treated samples all the analyses including DNA extraction, DNA amplification and NGS library preparations were carried out in a separate laboratory (nonaquatic/non-microbiological laboratory) using a dedicated set of pipettes, reagents, and consumables.

Plankton analysis
Qualitative assessment of plankton organisms in CPR samples (2ALT1, 2 ALT5, 3ALT2, 3ALT5) was analyzed microscopically according to standard CPR procedures as described in Reid et al. (30) (Figure S5). Figure S1. The Continuous Plankton Recorder (CPR) Figure S2. Water temperature profile measured by a CTD mounted on the CPR during the cruise.

Figure S3
V. cholerae culturability test in CPR samples. Figure S4. Metagenomic analysis workflow applied on CPR samples. Figure S5. Average plankton composition of the collected CPR samples along routes 2ALT and 3ALT. Table S1. Results of V. cholerae cultivation in APW and TCBS media from CPR samples collected in Lake Tanganyika  Table S2 . Results of PCR and ddPCR test targeting V. cholerae in CPR samples collected in Lake Tanganyika (GU/rx=Genomic Unit/reaction)