Faecal pollution source tracking in the holy Bagmati River by portable 16S rRNA gene sequencing

A suitcase laboratory was used for 16S rRNA amplicon sequencing to assess microbial water quality in the holy Bagmati River, Kathmandu, Nepal. SourceTracker analysis and Volcano plots revealed that microbial communities in the downstream part of the river were mainly derived from untreated sewage. Seasonal variability in the sewage microbiome was reflected in the downstream river water quality. The bacterial genera Acidovorax, Geobacillus and Caulobacter predominated in the upstream sites, while genera containing putative human pathogens and gut bacteria, such as Clostridium, Prevotella, Arcobacter, Lactobacillus, Enterococcus and Streptococcus, became prominent in the downstream sites. Marker gene qPCR assays for total bacteria, total coliforms, human E. coli, Arcobacter butzleri and Vibrio cholerae confirmed the sequencing data trends. Even though basic sanitation provision is now near universal in Nepal, our findings show how inadequate wastewater management may turn an urban river into an open sewer, which poses a public health risk.


INTRODUCTION
Bagmati is the principal river of the Bagmati River Basin, one of the major basins of Nepal. It flows through the Kathmandu Valley, which includes Kathmandu, Lalitpur and Bhaktapur. The river basin is mainly fed by natural springs and monsoon rainfall and is considered the source of Nepalese civilisation and urbanisation 1,2. The river flows along the famous Pashupatinath Temple, which holds great religious value for Hindus and is also a World Heritage Site attracting more than a million tourists every year from across the globe 3. Every day, hundreds and thousands of Hindu pilgrims bathe in the Bagmati River as a religious ritual. Water from the upstream part of this river is used for drinking, irrigation, livestock, industrial and domestic purposes by the people of the Kathmandu Valley 3,4. Every day, 30 million litres of water are sourced from the river and its tributaries for domestic purposes 5. However, the river has been polluted due to rapid urbanisation, population growth, unmanaged sewage connection (from domestic, industrial and agricultural effluents) and solid waste disposal along the river banks 4,6-9. Approximately 21,000 kg of domestic sewage is discharged daily into the river from the Kathmandu Valley, including industrial BOD load discharges of 3151 kg per day 1. This pollution has not only affected the aesthetic value of the river, but has also caused high microbial and chemical loads, posing a considerable threat to public health and the ecosystem 7. Furthermore, the river water pollution directly affects the surface, subsurface and groundwater quality, including the shallow wells in the vicinity of the river, as they are directly interconnected 5,10. More than fifty percent of Kathmandu Valley residents rely on ground and subsurface water for various household activities including drinking 11.
Various studies have reported frequent incidents of diarrhoeal diseases in Nepal, including in the Kathmandu Valley, which are mostly caused by drinking water contaminated with human or animal faeces [12][13][14]. Therefore, robust and frequent screening of river water microbiomes is essential to gain insight into faecal contamination sources and the pathways of waterborne disease transmission, which will ultimately help in designing effective interventions for safeguarding public and aquatic ecosystem health. Quantitative "Microbial Source Tracking" (MST) methods employing qPCR and next-generation sequencing (NGS) techniques are increasingly being used to identify the sources of faecal contamination in watersheds 15,16.
Most of the studies done on the Bagmati River have focused on spatiotemporal variation in organic matter, nutrients, major ions and trace elements 4,6,17,18. Only a few recent studies have addressed the microbial water quality and microbiome diversity of this river 14,19,20. Tandukar et al. quantified several human enteric viruses (i.e., adenoviruses, noroviruses and enteroviruses) and faecal indicator bacteria (i.e., total coliforms, Escherichia coli, Enterococcus spp. and Bacteroides) present in the watersheds of the Bagmati River using real-time qPCR, and they also used fluorescence microscopy to quantify protozoa, such as Cryptosporidium spp. and Giardia spp. 14,20. Although qPCR results can be obtained within 2 h, they provide limited insight into the composition of water microbial communities, as they can only identify the targeted pathogens that are specifically chosen for monitoring. In contrast, high-throughput molecular methods, such as Illumina NGS, allow more comprehensive water microbiome characterisations, but there is a delay in the availability of data from centralised laboratory facilities, which can restrict rapid decision making 21. This is especially a problem in resource-limited settings, such as developing countries 22.
An exciting development for near-real-time microbial water quality surveying in developing countries is the recent release of the low-cost, field-deployable and memory-stick sized MinION sequencer from Oxford Nanopore Technologies (ONT) Ltd. Its portability offers the possibility to comprehensively and quickly survey microbial water quality in any kind of setting, including in resource-limited developing countries [23][24][25] . In our previous work, we developed and validated a suitcase laboratory, which combined the MinION sequencer with other small equipment items to enable water microbiome characterisation by portable 16S rRNA gene amplicon sequencing 24,25 . In the current study, we deployed this toolbox to survey the microbiomes of the Bagmati River for different reaches of the river and different seasons. Our previous work had focused on the method development and validation 24 , and the assembly of small equipment items into a suitcase laboratory 25 . The current study reports the innovative application of these tools in combination with SourceTracker (ST) and Volcano plot analysis to facilitate microbial water quality monitoring at the watershed-scale in a culturally, religiously and economically important river located in a developing country.

Microbial community analysis
An overall microbial community analysis is presented as PCA plots and a dendrogram in Fig. 1. The cluster analysis showed good agreement between sample replicates, which clustered most closely. WWTP influent (i.e., untreated sewage) collected in the post-monsoon season clustered most closely with water samples from S4, S5 and S6 collected at the same time, while WWTP influent from the monsoon season clustered with water samples from S6 collected in the same season. The WWTP effluent from both the monsoon and post-monsoon seasons clustered together. The PCA plot with data from all the sampling times (Fig. 1b) generally showed a separation of downstream and WWTP influent water samples from the upstream and WWTP effluent samples along principal component 1, with only a few exceptions. Genera mostly found in the human gut microbiome 15,26, like Streptococcus, Trichococcus, Lactobacillus, Enterococcus, Prevotella and Arcobacter, were highly prevalent in downstream and WWTP influent water samples, which separated these samples from the upstream water samples in the PCA. Among the three factors analysed (i.e., location, sampling time and water sample type), location and sampling time had a significant effect on the similarity of the samples in the ANOSIM, although with relatively low R values (ANOSIM; Location: R = 0.29, p value = 0.001 and Sampling time: R = 0.16, p value = 0.01). ANOSIM further indicated no statistically significant differences between the microbial communities in water from locations S4, S5 and S6 and the wastewater influent (ANOSIM; (1) S4 and Inf: R = 0.0309, p value = 0.357; (2) S5 and Inf: R = 0.0617, p value = 0.369; and (3) S6 and Inf: R = 0.0123, p value = 0.3690).
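The ANOSIM R statistic reported above compares the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities, and its p value comes from permuting the group labels. The following is a minimal Python sketch of that calculation, not the code used in the study (the analysis followed the method described elsewhere 25). The function names, the toy one-dimensional "samples" and the 999-permutation default are assumptions chosen for illustration.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M = n(n - 1) / 2 is the number of sample pairs."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, permutation p value) for a square dissimilarity matrix."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: four samples on a line, two per group.
positions = [0.0, 1.0, 10.0, 11.0]
labels = ["upstream", "upstream", "downstream", "downstream"]
dist = [[abs(a - b) for b in positions] for a in positions]
r_stat, p_val = anosim(dist, labels)
print(f"R = {r_stat:.2f}, p = {p_val:.3f}")
```

An R close to 1 means that samples within a group are more similar to each other than to samples from other groups, whereas values near 0, such as those reported for S4 to S6 versus the influent, indicate little separation between the groups.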
An interesting picture emerged when separate PCA (Fig. 1c, d) and cluster analyses (Supplementary Fig. 1) were conducted for water samples from the monsoon and the post-monsoon season. In both seasons, there were substantial, but seasonally distinct, contributions of genera found in the human gut microbiome to the variance among water samples along principal component 1: in the monsoon season Arcobacter, Aeromonas, Streptococcus and Prevotella had significant PC1 loadings; in the post-monsoon season Enterococcus, Acinetobacter, Streptococcus and Trichococcus had significant PC1 loadings. Separation of wastewater treatment plant effluent (WWTP Effluent) samples along PC1 away from the WWTP influent (WWTP Influent) samples in both sampling events signified the benefits of wastewater treatment, because human gut-associated genera became less predominant in treated wastewater microbiomes, as expected 27. Accordingly, there was a clear separation of the most upstream water samples from the most downstream water samples along PC1 in both events, with the downstream water samples becoming more similar to WWTP Influent (Fig. 1c, d). Evidently, as the Bagmati River flowed into more densely populated areas, the characteristics of its water microbiome changed from a composition more similar to treated sewage to a composition more similar to untreated urban sewage, but the composition of the urban sewage differed between the monsoon and post-monsoon season.
Abundance of human gut and putative pathogenic bacteria in the water microbiomes
A more detailed breakdown of the microbial community composition in the Bagmati River for the monsoon and post-monsoon season is reported in Table 1, and Supplementary Tables 1 and 2, which compare the total percentage relative abundance of putative human gut 28 and pathogenic 29 bacteria at genus and species level for different sampling sites in the Bagmati River, and the WWTP influent and effluent (refer to Supplementary Tables 3-5 for more detailed lists of bacteria). Based on our previous findings 24, species identities are not always reliable due to the limited read accuracy of the MinION sequencing reads, but the overall trends are nonetheless indicative of changes in microbial composition. For all sampling events, the water collected at the most upstream sites S1 and S2 showed the lowest relative abundance for both human gut and putative pathogenic bacteria, whereas the highest relative abundance was observed in the water collected at the most downstream sites S4-S6 (Table 1, Supplementary Tables 1 and 2). The microbial water quality of water samples collected at site S1 can be considered as baseline data, as this watershed is distant from the densely populated Kathmandu Valley and is minimally influenced by human and urbanisation activities. Figure 2, and additional figures in Supplementary Information (Supplementary Figs. 2-9), show how the abundance of human gut and putative pathogenic genera changed in space and time along the Bagmati River. As the river flowed downstream, the abundance of some of these groups of bacteria increased, and the most drastic and significant increase was observed at the sites S4, S5 and S6 downstream of the Pashupatinath Temple as compared to site S1 (Two-sample t test, p value < 0.05) (Table 1, Supplementary Table 1, Supplementary Table 2, Supplementary Figs. 2, 3, 6 and 7).
In comparison with the WWTP influent and effluent samples, the relative abundance of human putative pathogenic species in the water at sites S1 and S2 was significantly lower than in WWTP effluent (Two-sample t test, p value < 0.05) (Table 1 and Supplementary Table 1). However, the pathogen load in the WWTP effluent can always be further reduced by tertiary treatment processes, such as chlorination, before it is discharged into the receiving water bodies 30. Downstream of Pashupatinath Temple, the abundance of some of the human gut and putative pathogenic genera was similar to or even greater than that observed in WWTP influent ( ). These more detailed results supported the observations from the PCA and showed that the microbial water quality of the Bagmati River deteriorated significantly downstream of Pashupatinath Temple as a result of faecal pollution, which was most likely due to the discharge of untreated sewage into the river, a significant public health hazard. Similar results have been reported in previous studies, although for a more limited number of sample sites 14,20.
Effect of untreated sewage discharge in the downstream receiving river
ST analysis was performed to better understand the impact of three sources on the microbial water quality of the Bagmati River: (1) WWTP influent as a proxy for untreated sewage discharges, (2) WWTP effluent and (3) the most upstream river water, which provides a baseline. Each source was distinct in both seasons based on PCA analysis, showing that the leave-one-out source class prediction provided a reasonable reflection of sources. This then allowed us to apportion source influences in sinks for two different seasonal events (i.e., monsoon [June 2019] and post-monsoon [August 2019]; no wastewater treatment data were available for July 2019). In the monsoon season, sequences of microbial communities in location S2 were mostly sourced from the upstream river (63 ± 1.19%), while WWTP effluent contributed 16.46 ± 1.18% (Fig. 3a). In contrast, at S3, S4 and S5, both the upstream river (35.48 ± 0.2% for S3; 30.27 ± 0.4% for S4; and 32.73 ± 1.12% for S5) and WWTP influent [i.e., untreated sewage] (26.90 ± 0.75% for S3; 48.95 ± 0.12% for S4; and 23.23 ± 0.17% for S5) had significant contributions (Fig. 3a). The sequences of microbial communities in location S6 were mostly dominated by sequences from the untreated sewage (72.70 ± 0.34%), and to a lesser extent by the river upstream (S1) and WWTP effluent (Fig. 3a). Interestingly, at all five sites, unknown sources showed an influence (13.33-18.28%). Finally, in the post-monsoon season, the sequences of microbial communities in sites S4 (74.66 ± 0.21%), S5 (62.73 ± 0.29%) and S6 (83.71 ± 0.19%) were largely constituted by the sequences found in WWTP influent, while at sites S2 and S3, the upstream river (S1) contributed 77.91 ± 0.23% and 32.65 ± 0.09%, respectively (Fig. 3b). Interestingly, the unknown source contributed 50% of the sequences in location S3, while in other sites the contribution from an unknown source ranged between 13.66 and 19.7%.
According to a recent study by ENPHO, 76.63% of faecal waste in the Kathmandu Valley is discharged into water bodies without treatment 31. These findings suggest that the whole watershed network can be divided into two main microbial source communities (river upstream and WWTP influent), and their seasonally variable composition and blending explained where, how and why faecal contamination influenced the microbiomes of the Bagmati River. In addition, the water flowing in the downstream part of the Bagmati River had mostly the characteristics of untreated sewage discharged directly into the river.

Faecal markers in the river quantified with qPCR
The qPCR results for different marker genes, including faecal marker genes, for the water samples collected from different sites at three different sampling events, including WWTP influent and effluent, are presented in Fig. 4. These results were from the further analysis of DNA samples from Nepal at Newcastle University in the UK. All the pathogen or faecal marker genes were detected in the WWTP influent and effluent, and the three sites downstream of Pashupatinath Temple (i.e., S4, S5 and S6) for all sampling events. The concentration of these genes in these three sites was significantly higher than in upstream sites and WWTP effluent (Two-sample t test, p value < 0.05), and comparable to what we observed in the WWTP influent, except for the ciaB gene. In the other three sites, which were located upstream of Pashupatinath Temple, the ciaB gene was always detected and quantified in the water collected from site S3, but only detected in one water sample from site S2 in the post-monsoon season (August 2019). Except for the July 2019 water samples from site S1, E. coli of human origin was detected in water samples collected from all locations. In contrast, the ompW gene, which is a specific marker for Vibrio cholerae, was only detected at sites S2 and S3 in the monsoon water samples (June 2019) (Fig. 4). Furthermore, a boxplot presented in Supplementary Fig. 10 revealed the range of concentration of marker genes in the studied seasons for the different sampling events. The observed trends suggested that the concentration of all the studied marker genes increased as the river flowed downstream. Marker genes for total bacteria, human E. coli and Arcobacter butzleri were significantly reduced in the WWTP effluent as compared to WWTP influent (Two-sample t test, p value < 0.05) (Supplementary Fig. 10).
Table footnote: Inf and Eff indicate WWTP influent and WWTP effluent, respectively, while P-mon indicates post-monsoon (August 2019). The number in parentheses is the standard deviation of the duplicate analysis. Data from the other sampling events are available as supporting information, Tables S1 and S2.

Cross-comparison and validation of MinION data with other microbiology data
Our previous analysis of a mock community revealed that MinION sequenced reads might result in false positive outcomes, especially at the species level, which indicated the need for validation with alternative methods, such as qPCR 24. To validate the MinION results in this study, we performed qPCR on the extracted DNA at Newcastle University after returning to the UK, using the marker genes presented in Supplementary Table 6. The marker genes used to quantify Vibrio cholerae, Arcobacter butzleri and human E. coli were virulence genes and specifically present only in those microorganisms. Supplementary Fig. 11 shows the extent of correlation between different microbial water quality indicators determined by qPCR and NGS approaches. A significant Spearman rank correlation was observed between Arcobacter butzleri, total coliform and human E. coli quantified with qPCR, and other putative faecal indicator bacteria screened with the MinION (Supplementary Fig. 11). These correlations were well aligned with our earlier results 24,25, and substantiated the usefulness of the portable MinION platform for bacterial hazard screening in water samples via sequencing of 16S rRNA gene amplicons. Supplementary Table 7 shows the preliminary MPN index for coliform bacteria in different water samples collected from six sites in the monsoon and post-monsoon season. Coliform bacteria were present in water collected from all the sites. The confirmative test for faecal indicator organisms in Supplementary Table 8 showed the growth of characteristic greenish metallic shine bacterial colonies on EMB agar at 44.5°C in all the samples that previously showed the presence of coliform bacteria, except in the most upstream water samples from S1 and S2 collected in the monsoon season (June 2019). This further confirmed the presence of the faecal indicator organism E. coli at the downstream sampling sites and provides further evidence for the faecal pollution at these sites.
Fig. 2 Volcano plots comparing the abundance of specific genera in S6 relative to S1. Comparison of human gut genera 28 for a monsoon and c post-monsoon seasons, and human pathogenic genera 29 for b monsoon and d post-monsoon seasons. Green and blue dots indicate the genera in the S6 site that are at least twofold (x-axis) higher and lower, respectively, than detected in the S1 site, with high statistical significance (−log10 p value, y-axis). The dashed blue line shows where the p value = 0.05, with points above the line having p value < 0.05 and points below the line having p value > 0.05. The relative abundance of human gut and pathogenic genera on the left side of the solid blue line decreased, while on the right side it increased, as compared to the abundance in site S1.
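The coordinates and colour classes described in the Fig. 2 caption can be reproduced from a mean relative abundance per site and a p value per genus. The sketch below is an illustration in Python rather than the R code used in the study. The function names, the pseudocount and the example numbers are hypothetical, and the p values are taken as given inputs (for instance from the two-sample t tests) rather than computed here.

```python
import math


def volcano_point(mean_test, mean_ref, p_value, pseudocount=1e-6):
    """Return (log2 fold change, -log10 p value) for one genus.

    The pseudocount is an assumption added here so that genera absent
    from one site do not cause a division by zero.
    """
    log2_fc = math.log2((mean_test + pseudocount) / (mean_ref + pseudocount))
    return log2_fc, -math.log10(p_value)


def classify(log2_fc, neg_log10_p, fold_cutoff=2.0, alpha=0.05):
    """Label a genus as in the caption: at least twofold change and p < alpha."""
    significant = neg_log10_p > -math.log10(alpha)
    if significant and log2_fc >= math.log2(fold_cutoff):
        return "higher"
    if significant and log2_fc <= -math.log2(fold_cutoff):
        return "lower"
    return "not significant"


# Hypothetical inputs: (mean % abundance at S6, mean % at S1, p value).
genera = {
    "GenusA": (8.0, 1.0, 0.01),
    "GenusB": (0.5, 4.0, 0.02),
    "GenusC": (1.2, 1.0, 0.60),
}
for name, (s6, s1, p) in genera.items():
    x, y = volcano_point(s6, s1, p)
    print(name, round(x, 2), round(y, 2), classify(x, y))
```

Working on the log2 scale makes a twofold increase and a twofold decrease symmetric around zero, which is why the vertical reference line in such a plot sits at a log2 fold change of 0.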

DISCUSSION
In this study, we demonstrated the suitability of the portable MinION NGS platform for comprehensive water quality monitoring and faecal pollution source tracking in a watershed via an innovative combination of 16S rRNA amplicon sequencing, ST and Volcano plot analysis. This demonstrated the practical feasibility of microbial community analysis and bacterial hazard screening with this portable laboratory in a developing country, and is to the best of our knowledge the first report of water microbiomes sequenced in Nepal. Several 16S rRNA gene sequences retrieved from water samples were associated with bacterial groups with public health relevance, and qPCR assays performed later in the UK to target virulence genes confirmed the Vibrio cholerae, Arcobacter butzleri and human faeces derived E. coli hazards.
Putative faecal indicator bacteria were frequently detected in high abundances in the river water at sites downstream of Pashupatinath Temple, which is a World Heritage Site attracting more than one million tourists each year, many of whom bathe in the river as a ritual. There was a significant increase of putative faecal indicator bacteria from the most upstream to the most downstream site, demonstrating the deleterious impact of poorly managed sanitation in the densely settled urban areas of the Kathmandu Valley on the Bagmati River water quality. During the post-monsoon season, the PCA and ST analysis showed that the microbial communities in the more downstream sites most strongly resembled the microbial communities of WWTP influent, which is untreated urban sewage. At the most downstream site (S6) of the Bagmati River in the Kathmandu Valley, the microbial communities of the water were always most similar to WWTP influent communities during both the monsoon and post-monsoon season. This must be attributed to the practice of discharging untreated sewage into the river as soon as it passes the Pashupatinath Temple. These findings for putative faecal indicator bacteria in the NGS libraries were well aligned with the qPCR results for faecal marker genes. The lowest abundance of these microorganisms was seen at S1, which has little human interference and is commonly used as a source of drinking water for Kathmandu City 14,20. Poor microbial river water quality from site S4 downstream could be attributed to unsafe sanitation and open defecation along the river, as has been reported elsewhere 14,20,32. Additional pollution contributed by the river tributaries also cannot be ignored 33. Overall, these results further verify deterioration of the river water quality as the river flows through Kathmandu city, which is of significant concern due to the possible exposure pathway for waterborne diseases.
As the occurrence of enteric viruses strongly correlates with the faecal indicators 20 , additional studies may be conducted to identify pathogenic human viruses in the river water.
People bathing in the Bagmati River (as a routine activity or for a holy dip) between sites S3 and S4 are exposed to faecally polluted water and associated waterborne diseases, such as typhoid, cholera, paratyphoid fever, dysentery, jaundice and amoebiasis 34,35. In addition, people residing near the Bagmati River in urban settings are expected to be exposed indirectly through consumption of water from shallow wells and vegetables grown with polluted water 24. Many research findings have already highlighted river pollution as one of the root causes of groundwater pollution, which includes both pollution with anthropogenic chemicals [36][37][38] and potential pathogenic microbes 39. Chronic exposure to such compounds and microbes may have severe consequences, including disease conditions, multidrug resistance and endocrine disruption 40,41. In addition, consumption of vegetables grown through drip irrigation in this polluted watershed may contribute an additional disease risk, as has been reported elsewhere 42,43. This study revealed that WWTP effluent had substantially better microbial water quality than the downstream Bagmati River, which demonstrates the urgent need for a less fragmented sanitation provision with proper sewage treatment. For instance, appropriate sewage collection strategies and subsequent treatment of sewage can be an appropriate intervention to prevent further degradation of river water quality.
In all the studied areas, the bacterial community structure, including the abundance of putative faecal indicator bacteria, was distinct and more diverse in the monsoon season than in the post-monsoon season. Certain groups of faecal indicator bacteria prevailed in significantly higher abundance in the monsoon season, while separate groups were prevalent in the post-monsoon season. For instance, Enterococcus, Streptococcus, Bacillus, Clostridium and some genera from Enterobacteriaceae were significantly higher in downstream sites than in the WWTP influent during the post-monsoon season. Leight et al. detected higher abundance of organisms, such as Streptococcus, Clostridium and Bacteroides and some genera from the Enterobacteriaceae, in the studied river during the wet season, and further suggested that some of these might be allochthonous organisms that might get washed in from the land 26. Such temporal variation in the abundance of the putative faecal indicator bacteria highlights the necessity of taking into consideration source variability when selecting the most appropriate indicator for MST. In this study of the Bagmati River, species from the genera Enterococcus and Streptococcus appeared to be suitable indicators for faecal pollution during the post-monsoon season, while species from the genera Prevotella and Arcobacter were better indicators during the monsoon season. Therefore, it is imperative to study pollution sources and river water in conjunction, as was done in this study, to account for seasonal variability of indicator bacteria in the sources.
Urban water pollution, mostly due to the discharge of untreated sewage and agricultural or storm-water runoff into waterways from land, is a growing problem in developing countries 44. The observed microbial water quality at downstream sites in the Bagmati River illustrated this problem in a socioeconomically important watershed. This further implies that current sanitation provision in Kathmandu is insufficient for the protection of economies, public health and ecosystems, and thus implementation of a safely managed sanitation system is urgent. In the absence of adequate wastewater treatment facilities and safe faecal sludge The project aims to rehabilitate the sewerage network and further expand the capacity of WWTPs to 90.5 million litres from around 16 million litres per day by constructing five new treatment plants 45. The data presented here provide the status quo for water quality in the Bagmati River, which will be useful for documenting the environmental benefits of the new wastewater treatment infrastructures in the future.
In this study, we used a low-cost, portable sequencer, the MinION, which was also used in our previous study 25, to characterise in Nepal the microbiome of the Bagmati River from its source to the point where it leaves the Kathmandu Valley. Portable sequencing allowed us to simultaneously screen, identify and monitor multiple indicator bacteria for faecal pollution, and to produce multivariate data for faecal pollution source tracking, which is particularly valuable when there is temporal variability in the source composition, as was illustrated in this study. This technology can enable on-site monitoring, and make a vital contribution to the more rapid characterisation of water quality, which underpins the design of adequate treatment technologies and regulatory frameworks for mitigating the associated problems 25,46. Furthermore, such portable methods to comprehensively screen the microbiome of a biological system will play a major role in monitoring wastewater treatment systems and the protection of both ecosystem and human health. Overall, the application of portable sequencing technologies for faecal pollution source tracking will help achieve the UN Sustainable Development Goal (SDG) 6: clean water and sanitation 47. For example, faecal pollution source tracking is essential for SDG 6.3, which aims to protect both ecosystem and human health by eliminating, minimising and significantly reducing different streams of pollution into water bodies 47.

MATERIAL AND METHODS
Study site and sample collection
The study investigated a 25 km stretch of the Bagmati River ranging from Sundarijal (Upstream) to Chovar (Downstream). The Bagmati catchment drains an area of 595.4 km² with Chovar as its outlet. Several tributaries of different order feed the river, including Manahara khola, Hanumante khola, Dhobi khola, Tukucha khola, Bishnumati khola and Nakkhu khola. The altitude of the basin area varies from 1178 to 2723 m above mean sea level.
Eight sampling sites as shown in Fig. 5 were selected based on the population density. The upstream region, which was located above the holy Pashupatinath Temple, contained sampling sites S1, S2 and S3, while the downstream region included sampling sites S4, S5 and S6 (Supplementary Table 9 in supporting information). The sites S4 and S5 were in the areas where the river flows through the settlements with the highest population density. The most downstream sampling site S6 (Chovar), was in the area where the river flows beyond the Valley into a region with a lower population density. In addition, the WWTP influent and effluent of the Guheshwori WWTP, located in between the sampling locations S3 and S4, were also sampled. From each  Table 10 in supporting information. Samples were transported immediately to the laboratory in cold boxes, containing ice cubes and packs, and processed within 2 h.
16S rRNA gene amplicon sequencing analysis of water quality with a suitcase laboratory
All the necessary portable equipment and consumables for the 16S rRNA gene amplicon sequencing analysis with the memory-stick sized MinION of ONT were packed into two check-in sized luggage items at Newcastle University and transported to Nepal by air and road. The molecular microbiology consumables, including flow cells, PCR reagents and the 16S rRNA gene amplicon sequencing kit, required cold storage. Therefore, they were placed into a polystyrene box with the cool packs used by ONT to ship their flow cells to customers. This packaging could maintain the refrigerator temperature of about 4°C for ~2 days. The portable laboratory 25 was set up at a bench space provided by the Research Institute for Bioscience and Biotechnology, Kathmandu, Nepal.
Biomass for total DNA extraction from each sampling event was concentrated in duplicate by independently filtering 100 mL of water through a 0.22 μm membrane (Sartorius UK Limited, Surrey, UK), except for site S1, for which 250 mL was filtered, because a lower amount of biomass was anticipated. Membrane filters were folded aseptically, transferred individually into 15 mL sterile Falcon tubes, and stored at −20°C until use. These filtrations were conducted in the CEMAT water laboratories (Kathmandu, Nepal).
Total DNA from the retained biomass was extracted using a PowerWater DNA Isolation Kit as per the manufacturer's instructions (QIAGEN, Crawley, UK), and the isolated DNA was quantified using a Qubit dsDNA HS Assay Kit (Life Technologies, UK). The 16S rRNA gene amplicon library preparation, subsequent sequencing of the libraries, and data processing and statistical analysis were performed in accordance with the method described elsewhere 25.

Metadata collection
For context and validation of the sequencing data generated with the portable MinION toolbox in Nepal, the multiple tube method to assess the most probable number (MPN) of faecal indicator bacteria was performed for the monsoon (June 2019) and post-monsoon season (August 2019) at CEMAT water laboratories (Kathmandu, Nepal), according to the procedure described elsewhere 48, and qPCR was performed at Newcastle University using surplus DNA extracted in Nepal to quantify different faecal marker genes. Briefly, the MPN index with a 95% confidence limit was worked out with the help of an MPN Chart. The presumptive test was performed using MacConkey broth. Further, a confirmatory test was conducted by inoculating one loopful of the sample from each presumptive positive tube on EMB agar and incubating it at 44.5°C for 24 h. The greenish metallic shine colonies from EMB agar were then streaked on nutrient agar and incubated at 37°C overnight. Standard IMViC tests were performed for isolated colonies from the nutrient agar for confirmation of E. coli and a further complete test was done on MacConkey broth 48. qPCR assays for target genes were conducted on a Bio-Rad CFX C1000 system (Bio-Rad, Hercules, CA USA) by following the method described elsewhere 25. Briefly, primers mentioned in Supplementary Table 6 were used to quantify the target marker genes in a final reaction volume of 15 μL. The final reaction mix consisted of 2 μL of template DNA, 7.5 μL of SsoAdvanced Universal SYBR Green Supermix (Bio-Rad, Hercules, CA USA), 0.75 μL (500 nmol L−1) of each forward and reverse primer, and 4 μL of molecular grade H2O (Invitrogen, Life Technologies, Paisley, UK). The qPCR programme for the quantification of each marker gene was 98°C for 3 min (1×), followed by 40 cycles of 98°C for 15 s and the primer annealing temperature (Ta) for 60 s (Supplementary Table 6).
At the end of each assay, the qPCR products were subjected to melt curve analysis with a gradual temperature increase from 65 to 95°C (0.2°C increments every 10 s). Standard curves were generated for every qPCR assay. All samples, standards and negative controls (in which molecular grade water replaced the DNA template) were run in duplicate. To minimise the effect of inhibitors on the PCR reactions, the DNA samples were diluted to a working solution of 5 ng μL⁻¹ using molecular grade water.
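The reaction composition and the standard-curve quantification described above can be illustrated with a short calculation. The following Python sketch is not the authors' analysis pipeline: the dilution series and Cq values are invented for illustration, the primer stock concentration is inferred from the stated volumes rather than reported, and all function names are hypothetical.

```python
# Check of the stated reaction mix (volumes in microlitres, from the text).
template_ul, supermix_ul, primer_ul, water_ul = 2.0, 7.5, 0.75, 4.0
total_volume_ul = template_ul + supermix_ul + 2 * primer_ul + water_ul

# Assumption: primer stock concentration implied by 500 nmol/L final
# concentration when 0.75 uL is added to a 15 uL reaction.
primer_stock_um = 500.0 * total_volume_ul / primer_ul / 1000.0


def fit_standard_curve(log10_copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_per_reaction(cq, slope, intercept):
    """Invert the standard curve to get gene copies in the reaction."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical ten-fold dilution series (not data from this study).
standards_log10 = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
standards_cq = [33.1, 29.8, 26.4, 23.1, 19.7, 16.4]

slope, intercept = fit_standard_curve(standards_log10, standards_cq)
efficiency = amplification_efficiency(slope)

# Duplicate wells are averaged before quantification (hypothetical Cq values).
sample_cq = (24.9 + 25.1) / 2
sample_copies = copies_per_reaction(sample_cq, slope, intercept)
copies_per_ul_template = sample_copies / template_ul

print(f"total volume: {total_volume_ul} uL, primer stock: {primer_stock_um} uM")
print(f"slope: {slope:.3f}, efficiency: {efficiency:.1%}")
print(f"copies per uL of template: {copies_per_ul_template:.0f}")
```

A slope close to -3.32 corresponds to an efficiency near 100%, which is why the fitted slope is a convenient quality check on each assay's standard curve.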

Data analysis
A Spearman rank correlation analysis between the water quality markers and indicators quantified by the qPCR and NGS approaches, respectively, was performed in RStudio (R version 3.6.2). This helped to validate and cross-compare the outcomes of the different methods. Finally, the abundances of detected human gut and pathogenic genera in S1 (the most upstream and rural site) were compared with those at the other sites, for both the monsoon (June 2019) and post-monsoon (August 2019) seasons, using a Volcano plot in RStudio. A similar comparison between the WWTP influent and water from the other sites, including the WWTP effluent, was also performed with a Volcano plot in RStudio (R version 3.6.2).
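The two statistics named above can be sketched in a few lines. The analysis in this study was run in R; the Python below is only an illustration of the underlying calculations, with invented input values and hypothetical function names. The p-value on the vertical axis of a Volcano plot comes from a statistical test that is not specified in this section, so it is treated here as a given input.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def volcano_point(abundance_ref, abundance_other, p_value, pseudocount=1e-6):
    """Return (log2 fold change, -log10 p) for one genus.

    The pseudocount (an assumption) avoids division by zero when a
    genus is absent from one of the two groups.
    """
    fold = (abundance_other + pseudocount) / (abundance_ref + pseudocount)
    return math.log2(fold), -math.log10(p_value)


# Hypothetical values for six sites: qPCR marker (log10 copies) versus
# NGS relative abundance (%) of the matching genus.
qpcr_log10 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ngs_percent = [0.1, 0.3, 0.2, 0.5, 0.8, 0.9]
rho = spearman_rho(qpcr_log10, ngs_percent)

# Hypothetical genus: 1% at the reference site, 8% downstream, p = 0.01.
log2_fc, neg_log10_p = volcano_point(1.0, 8.0, 0.01, pseudocount=0.0)

print(f"Spearman rho: {rho:.3f}")
print(f"volcano coordinates: ({log2_fc:.2f}, {neg_log10_p:.2f})")
```

Genera that plot far to the right and high on such a diagram are those that are both strongly enriched relative to the reference site and statistically distinguishable from it.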
ST analysis, as described elsewhere 49 , was used to evaluate the relative contribution of microbial communities from different "sources" along the Bagmati River to the "sinks" during both the monsoon (June 2019) and post-monsoon (August 2019) seasons. The sources consisted of the most upstream site, S1 (n = 2), the WWTP influent (n = 2) and the WWTP effluent (n = 2). The sinks consisted of the five sites in the Bagmati River labelled S2, S3, S4, S5 and S6. ST uses Gibbs sampling (a Markov chain Monte Carlo algorithm) to estimate the relative contribution of each source, while operational taxonomic units in the sinks that cannot be attributed to a defined source are assigned to an unknown source. ST analysis was performed with a sequencing depth of 15,000, 100 iterations, ten restarts and the auto-tuning functionality.
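The idea behind this Gibbs sampling step can be conveyed with a deliberately simplified sketch. The Python below is not the SourceTracker implementation: it keeps the source profiles fixed, represents the unknown source with a uniform placeholder profile rather than inferring it, has no auto-tuning, and uses invented counts and hyperparameter values (`alpha`, `beta`). Subsampling to a common depth before sampling is also an assumption about how the stated sequencing depth is applied.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample a taxon-count vector to `depth` reads without replacement."""
    pool = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    if len(pool) <= depth:
        return list(counts)  # already at or below the target depth
    rarefied = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


def estimate_contributions(sources, sink, n_iter=100, restarts=10,
                           alpha=0.001, beta=10.0, seed=1):
    """Estimate the fraction of sink reads attributed to each source.

    Each sink read is repeatedly reassigned to a source with probability
    proportional to P(taxon | source) * (reads currently assigned to that
    source + beta). The result is averaged over independent restarts.
    """
    rng = random.Random(seed)
    n_taxa = len(sink)
    names = list(sources) + ["Unknown"]
    # Smoothed, fixed taxon profiles for the known sources.
    profiles = [[(c + alpha) / (sum(counts) + alpha * n_taxa) for c in counts]
                for counts in sources.values()]
    # Simplification: uniform placeholder profile for the unknown source.
    profiles.append([1.0 / n_taxa] * n_taxa)
    reads = [taxon for taxon, n in enumerate(sink) for _ in range(n)]
    totals = [0.0] * len(names)
    for _restart in range(restarts):
        assignment = [rng.randrange(len(names)) for _ in reads]
        per_source = [assignment.count(v) for v in range(len(names))]
        for _sweep in range(n_iter):
            for i, taxon in enumerate(reads):
                per_source[assignment[i]] -= 1  # remove read i from its source
                weights = [profiles[v][taxon] * (per_source[v] + beta)
                           for v in range(len(names))]
                new_v = rng.choices(range(len(names)), weights=weights)[0]
                assignment[i] = new_v
                per_source[new_v] += 1
        for v in range(len(names)):
            totals[v] += per_source[v] / len(reads)
    return {name: totals[v] / restarts for v, name in enumerate(names)}


# Hypothetical counts over five taxa (not data from this study).
toy_sources = {
    "S1 upstream": [90, 10, 0, 0, 0],
    "WWTP influent": [0, 10, 90, 0, 0],
}
toy_sink = rarefy([200, 100, 600, 0, 100], depth=500, rng=random.Random(0))
contributions = estimate_contributions(toy_sources, toy_sink,
                                       n_iter=50, restarts=3)
for name, fraction in contributions.items():
    print(f"{name}: {fraction:.2f}")
```

In this toy sink, reads of the fifth taxon occur in neither source profile and therefore end up almost entirely in the "Unknown" category, which mirrors how unattributable taxa are handled in the analysis described above.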

DATA AVAILABILITY
Raw sequencing data from the 16S rRNA sequencing are registered on the NCBI BioSample database with Sequence Read Archive (SRA) accession number PRJNA690679. Additional data created during this research are openly available at https://doi.org/10.25405/data.ncl.13160147.v1. Please contact Newcastle Research Data Service at rdm@ncl.ac.uk for access instructions.