Surface freshwater ecosystems including lakes, rivers and streams, play an important role in the recycling of nutrients, biogeochemical cycling, and energy flows. In these ecosystems a diverse range of microorganisms play critical roles in the functioning of these cycles, which is associated with physicochemical parameters of water1. In addition, the use of surface water for human recreation and consumption has increased with the pace and extent of urbanization, making microbial analysis of these water resources an essential part of water quality and human health.

A number of investigations into surface water microbial diversity have been performed2,3. However, the microbial communities inhabiting urban surface waters are still relatively unknown. Beijing, as one of the largest cities in China, with a population of over 20 million, contains a number of surface water ecosystems, including river, reservoir, stream and lakes, all of which are closely tied to people’s lives. A better understanding of the scale spatial characteristics of microbial population (distribution, abundance and structure, etc.). The influence of environmental factors is essential to understanding the microbial processes underlying the urban surface ecosystem and is of particular importance to water management and public health. In previous studies, Araya et al. studied bacterial community composition in an urban river using fluorescent in situ hybridization (FISH) and denaturing gradient gel electrophoresis (DGGE), and showed that stream water was dominated by β-Proteobacteria and the Cytophaga-Flavobacterium cluster4. Zhang et al. investigated the bacterial community structure of surface water in three rivers located in the city of Shanghai over a one-year period using DGGE, showing that temperature was the major driver of bacterial community composition5. Unfortunately, water microbial communities are particularly difficult to study by traditional PCR-cloning based approaches (DGGE, clone library and FISH, etc.). Recently developed high-throughput next-generation sequencing methods, such as MiSeq pyrosequencing of the 16S rRNA gene, have provided more comprehensive descriptions of bacterial communities in various environments due to the increased number of sequence reads that can be obtained. However, there is very little published research on bacterial diversity and community composition of the surface waters in Beijing. In 2015, Wei et al. reported the core microbial community from five surface water samples was composed of 13 phyla and 40 genera6. However, their correlation with physical and chemical factors in the environment remains unknown.

The objective of this study was to examine and compare the microbial diversity and abundance of sixteen surface waters in Beijing, using the Illumina MiSeq system. Meanwhile, multivariate statistical analysis was also performed to evaluate the relationships between microbial communities and environmental factors. Furthermore, the presence of potential pathogens in these water samples was also evaluated. Our results provide more detailed information on the characteristics of urban surface water microbial communities, as well as for risk management of potential bacterial pathogens in urban surface waters.


Physicochemical characteristics of 16 surface water samples

The temperature of sampling sites ranged from 9.0 to 25.0 °C with an average of 17.65 °C. Plate counts showed the highest concentrations of total bacteria were in sample HL (3.83 × 106 CFU/mL), which also exhibited the highest turbidity value, followed by sample QR and GBD, with the values of 3.0 × 104 and 2.8 × 104 CFU/mL, respectively, while sample MY contained the lowest number (3.0 × 102 CFU/mL). The highest conductivity was found in sample QR. Total nitrogen ranged from 1.38 to 19.0 mg/L and nitrate-nitrogen from 0.8 to 17.3 mg/L. The highest of total phosphorus was observed in sample HL, and the lowest in sample CQ. The properties of 16 surface water samples are listed in Table 1.

Table 1 Physical and chemical properties of the 16 surface water samples.

Diversity analysis using operational taxonomic units

In total, Illumina MiSeq sequencing generated clean reads of 585,728 high quality 16S rRNA gene sequences with an average of 36,608 sequences per sample. After trimming, alignment, and chimera removal using the standard operating procedure for Mothur, the number of sequences of each sample ranged from 17,519 (QR) to 36,002 (WY). All samples were randomly resampled to 17,519 sequences, with 280,304 rarefied sequences for the 16 samples retained and were subjected to further statistical analyses.

Biodiversity of the 16 surface water samples was investigated by analyzing OTUs, Chao1, and Shannon indices at cut-off levels of 3% (Table 2). Slight differences in biodiversity were observed among the different sites. For example, the library derived from sampling site CQ had the greatest number of OTUs (1,483), followed by sample GBD (1,424) and QR (1,406), while sample ZGC had the smallest (393). Chao1 numbers were considerably higher than OTU numbers, suggesting that more OTUs may exist in these bacterial communities. However, the Good’s coverage value of all samples was above 95.14%, suggesting that the observed sequences could function as a good representation of the bacterial communities present in the 16 samples. Additionally, the Shannon diversity index was also generated for each sample, with the results supporting that sample CQ had the highest bacterial diversity of the 16 samples.

Table 2 Diversity indices from 16 surface water samples.

In order to test the interaction between biodiversity indices and water quality indices, Spearman’s rank correlation was calculated to compare the correlation between diversity indices (OTUs, Chao1 and Shannon diversity index) and water temperature as well as other qualitative parameters. As indicated by Spearman’s rank correlation, water temperature, nitrate, and TN had significant correlations (p < 0.05) with OTUs and Chao1 (Table 3). Among all the water quality indices, only the TN was significantly positively correlated with Chao1 at p ≤ 0.01 level. It should be noted that no significant correlation was found between Shannon index and any water physicochemical parameter.

Table 3 Correlation between the alpha diversity and water quality indexes.

Taxonomic distributions among samples

Surface water OTUs were classified to 43 bacterial phyla the most predominant phyla being Proteobacteria, Bacteroidetes and Actinobacteria. Based on relative abundances, the dominant groups (greater than 1% abundance) of each sample are displayed in Fig. 1. As shown, Proteobacteria and Bacteroidetes were the most commonly detected phyla in all the samples, accounting for 21.9–78.5% and 19.1–74.7% of sequences, respectively. The sample QR contained the most Proteobacteria (78.5%), while sample ZGC contained the most Bacteroidetes (74.7%). At the class level, the dominant groups for the 16 water samples varied. For instance, Betaproteobacteria were dominant in the majority of water samples including BJ, GBD, HR, NT, QH, QL, QR, RM, WY and YX, while in samples HL, NH, YD and ZGC, Flavobacteriia were the dominant group. It should be noted that only sample CQ was dominated by the Actinobacteria class, and Sphingobacteriia was dominant in only sample MY. Of the 16 samples, only in MY were there OTUs (2.97%) that could not be assigned to any known, which may indicate a novel uncharacterized phylum present at that site.

Figure 1
figure 1

Bacterial community compositions at the phylum level as revealed by MiSeq sequencing. Phyla making up less than 1% of total composition were classified as “others” in the 16 samples.

The abundance of the top 30 genera are shown in Fig. 2. The most predominant genera among Proteobacterial sequences were unclassified_Comamonadaceae, Limnohabitans, Polynucleobacter, Rhodoferax, Hydrogenophaga, Candidatus_Aquirestis and Polaromonas. Flavobacterium was the predominant Bacteroidetes genus and was the most predominant genus in 12 of 16 surface water samples. The exceptions being CQ, GBD, QR and WY.

Figure 2
figure 2

Heat map of the top 30 genera present in the sixteen surface water samples.

Principal coordinate analysis was used to analyze the bacterial community dissimilarity. As shown in Fig. 3, significant differences in bacterial community composition were found among most samples. However, similar patterns of bacterial community composition were observed for few samples, as exemplified by samples MY and RM.

Figure 3
figure 3

The plots of principal coordinate analysis illustrating community dissimilarities based on weighted UniFrac distances metric.

To test possible linkages between the microbial communities and surface water quality parameters, canonical correspondence analysis (CCA) was performed (Fig. 4) and the results showed that bacterial community composition was significantly correlated to turbidity (r2 = 0.56; p = 0.01). CCA axis 1 was positively correlated with water temperature, pH, EC, nitrate, chloride, turbidity, AN, TP and TN, but was negatively correlated with TB. The first CCA dimension explained 8.51% of the variation of bacterial communities, and the second explained 8.42%.

Figure 4
figure 4

Canonical correspondence analysis (CCA) biplot of bacterial communities and environmental parameters.

Detection of potential waterborne pathogens

Of the 66 potential waterborne pathogens recommend by USA EPA7, a total of 15 were detected across all of the samples. Only two types of potential pathogenic bacteria were found in sample ZGC, while 12 were detected in sample NH, the most of any sample. However, the greatest abundance of potential waterborne pathogens was observed in sample QR, accounting for 10.39% of all the OTUs, followed by sample HL (5.33%), QL (3.85%), GBD (1.94%), NH (1.26%), WY (1.34%), YX (1.07%), MY (0.96%), RM (0.94%), NT (0.80%), HR (0.56%), CQ (0.43%), YD (0.43%), QH (0.31%), BJ (0.23%) and ZGC (0.11%). At the phylum level, Proteobacteria contributed the highest proportion of bacterial richness that may be caused diseases in humans, accounting for 24.48–100% in all the potential pathogenic bacteria composition. The next most dominant phylum was the Firmicutes, followed by Actinobacteria, Spirochaetae and Tenericutes. The distribution of potential pathogenic bacteria at genus level is shown in Fig. 5.

Figure 5
figure 5

Potential pathogenic bacteria diversity and abundance at genus level in total pathogen.

Among the potential bacterial pathogens, the genus Pseudomonas was dominant in all the samples expect for sample YD, which was dominated by the genus Mycobacterium. The next most dominant flora in each sample was different. For instance, Aeromonas was the second richest genus of potential pathogenic bacteria in samples CQ, GBD, HL, NT, QH, QL and YX. While in samples NH, QR, RM and WY, Arcobacter was the second most abundant genus of potential pathogenic bacteria among all the OTUs. Additionally, the relationship between water parameters and the percent of total potential pathogens was assessed using Spearman correlation coefficient analysis (Table 4). The results indicated that both AN and TN were significantly correlated with the percent of total potential pathogens (p ≤ 0.05). No significant links were found between the percent of total potential pathogens and the remaining water parameters.

Table 4 The Spearman correlation coefficients between total pathogen percent (%) and water quality indexes.


Surface water, which consists of both natural and man-made water bodies (rivers, lakes, reservoirs, wetlands, parks, etc.) plays a vital role in urban ecosystem services8,9, and remains an important source for drinking water, irrigation, and recreation, etc. Changes in urban surface water may result in fluctuation of microbial community, with consequences to water quality and water-borne pathogen contamination. Assessing and understanding the microbial ecology of urban surface water from different regions is essential for making water quality management plans. In our study, high-throughput amplicon sequencing was used to characterize bacterial community structure and diversity of 16 surface water samples collected from various locations in the city of Beijing. The total number of phyla in each of the 16 samples ranged from 9 to 37, while 20 to 27 phyla have been observed in five surface water samples collected from other areas of Beijing6. These results suggest that the bacterial community composition of Beijing surface waters was significantly different from each other. The dominant phyla also differed among the 16 surface water samples. Proteobacteria (21–53%) and Bacteroidetes (11–64%) were the dominant phyla, which is in agreement with the findings of Bai et al.10, who examined the bacterial community structures in wastewater treatment bioreactors via high-throughput sequencing. However, Actinobacteria (41.17% of total sequences) and Proteobacteria (31.80%) were the dominant phyla in the Ganjiang River basin11, similar results have been found in other Beijing urban surface water bodies6.

In previous studies, the factors that could influence the bacterial community in surface water were well understood. For example, the bacterial community composition in the Danube River was determined by denaturing gradient gel electrophoresis analysis, and results indicated its shifts were related to changes in environmental conditions12. In 2015, Zhong et al. reported that the composition and diversity of the prokaryotic communities were significantly related to the concentrations of salinity in lakes of the Tibetan Plateau13. Recently, Ibekwe et al. reported that microbial community structure of an urban river was significantly shaped by several key physical and chemical variables for water, including NO2, pH, and NO314. Temperature was found to be the most influential factor in determining bacterial community composition of Ganjiang River11. In contrast to these studies, we found turbidity significantly affected bacterial community. This may be because the 16 water samples showed a slightly different turbidity.

In many countries, especially in the developing world, surface waters are severely contaminated with pathogens leading to various waterborne disease outbreaks15. However, few studies concerning waterborne pathogens in urban surface waters have utilized high throughput sequencing. In the present study, it was found that potentially pathogenic bacteria were ubiquitous across all of the sampled surface waters. Among the potential waterborne pathogens, Pseudomonas spp. were the most prevalent at all sites; followed by Aeromonas. Several other potentially pathogenic genera, such as Arcobacter, Legionella, Enterobacter, Bacillus, Serratia, Enterococcus, Mycobacterium, Leptospira, Stenotrophomonas, Clostridium, Mycoplasma, Corynebacterium and Neisseria were also detected in various samples. Examples of how potentially pathogenic these organisms include the following examples: Between 2007 and 2009, 21% of the waterborne outbreaks caused by bacteria was reported in the United States, the leading etiologic pathogen bacteria including E. coli O157: H7, Shigella sonnei, Pseudomonas spp., and Legionella spp16. Among species of Legionella, a majority are considered pathogenic, with L. pneumophila the leading cause of severe pneumonia worldwide17. Members of the Aeromonas genus are common aquatic microorganisms and a number of them have been associated with human diseases18. Several species of Mycobacteria are also pathogenic to humans or animals, causing pulmonary and cutaneous disease, lymphadenitis, and disseminated infections, especially in the growing immunodeficient population19. In addition, three waterborne outbreaks associated with Arcobacter have been previously reported20.

It should be noted that both AN and TN correlated strongly with the percent of total potential pathogens (p ≤ 0.05), and potentially pathogenic bacteria were present at relatively high concentrations in samples QR (10.39%), HL (5.33%) and QL (3.85%). As the short reads generated by Illumina MiSeq platform cannot be accurately classified to the species level nor can they provide information on the amount of total or viable cells in the water samples. This limitation can lead to overestimation of the abundance of potential pathogens. Even so, more attention should be paid to health risks from surface waters in urban areas.


In conclusion, microbial community characteristics of 16 surface water samples in the Beijing area were studied using 16S rRNA gene amplicon sequencing. Bacteroidetes, Proteobacteria and Actinobacteria were the predominant phyla across all the samples. Principal coordinate analysis indicated significant differences in surface water bacterial community composition. Meanwhile, our results demonstrated the presence of potential waterborne pathogens in all of the surface waters sampled and that Pseudomonas spp. are ubiquitous bacteria in a wide variety of surface waters. Spearman correlation analysis showed a positive correlation between water quality properties (AN and TN) and total percent of the potential bacterial pathogens. These results give a better understanding of the urban surface waters ecology.

Materials and Methods

Study areas and sample collection

From March 26th to April 17th in 2014, samples from the 0–15 cm depth were taken from the lakes, rivers, or reservoir banks using ethanol-disinfected core tubes and stored in sterile Nalgene sampling bottles at 4 °C until processed within 24 h. These 16 surface waters are distributed across Beijing and consequently are located in different districts (Fig. 6), i.e., Bajia country park (BJ), Qing river (QR), Zhongguancun forest park (ZGC), Wenyu river (WY), Rome lake (RM), Huairou reservoir (HR), Hongluo lake (HL), Yanxi lake (YX), Qinglong lake (QL), Miyun reservoir (MY), Yongding river (YD), Gaobeidian reservoir (GBD), Chongqing reservoir (CQ), Niantan reservoir (NT), Nanhucheng river (NH) and Qianhai park (QH). All samples were collected in triplicate and then combined to create a composite sample. The temperature (Temp), electrical conductivity (EC) and pH of the samples were determined on-site using a Multi-parameter water quality meter (YSI, USA). Other analyses included assessment of turbidity, chloride, nitrate, ammonia nitrogen, total phosphorus, and total nitrogen were performed according to Chinese national standards (GB: 5749-2006) by Beijing ZKLH Analyzing & Testing Center (Beijing, China). For determination of total bacterial count, microbial biomass was harvested from approximately volume water samples using 0.45 µm polycarbonate membranes by means of a hand-operated pump, and then placed on aerobic count chromogenic plate (Qingdao Hope Bio-Technology Co., Ltd., Shandong, China). Typical colonies were counted and results expressed as colony forming units (CFU) ml−1 after incubated at 37 °C for 48 h. The experiment was repeated in triplicate.

Figure 6
figure 6

Map of sampling site locations in the Beijing city. The map was generated using Esri ArcGIS v10.1 (Redlands, CA).

DNA extraction from water samples and sequencing

Total bacterial DNA was extracted from suitable volumes of water samples from each site using Power Soil and Water DNA kits (MO BIO, Inc., Solana Beach, CA), according to the manufacturer’s protocol. The quality and quantity of the isolated nucleic acids were determined using the Nanodrop ND1000 (NanoDrop Technologies, Delaware, USA) and adjusted to 20–30 ng/µL with sterilized ultra-pure water. The V4-V5 regions of the bacterial 16S rRNA gene were amplified by PCR using primers 515F (5′-GTGCCAGCMGCCGCGG-3′) and 907R (5′- CCGTCAATTCMTTTRAGTTT-3′). PCR reactions were performed using 50 μL mixtures containing 5 μL of 10× PCR Buffer, 4 μL of 2.5 mM dNTPs, 1.5 μL of each primer (5 μM), 0.5 μL of Taq DNA polymerase, and 1 μL of template DNA, and sterilized ultrapure water to bring volume up to 50 μL. A negative control containing all reagents except the template DNA was included with each set of reaction mixtures. PCR conditions were as follows: initial denaturation at 94 °C for 5 min; followed by 35 cycles of 94 °C for 30 s, annealing at 52 °C for 30 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. PCR products were extracted from 1% agarose gels and purified using the AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, USA) according to the manufacturer’s instructions and quantified using QuantiFluorTM-ST (Promega, USA). All PCR products were sequenced using the Illumina MiSeq platform by the Shanghai Majorbio Bio-pharmTechnology Co., Ltd., China.

Data analysis

Sequences obtained from the Illumina sequencing platform were strictly filtered to obtained clean data and each sample was split by Qiime (V1.7.0, Low quality sequences were identified by denoising and filtered out of the dataset. The data generated by Illumina MiSeq was analyzed using the quantitative insights into microbial ecology (QIIME 1.7) toolkit21. Chimeric sequences were removed using the Uchime function with the sequence collection as its own database in the Mothur program. The effective sequences were then clustered into operational taxonomic units (OTU) at 97% sequence similarity with standard UCLUST method22. Finally, the RDP classifier was used to assign representative sequence to the microbial taxa23. Phylogenetic trees were then built from all representative sequences using the FASTTREE algorithm24. Heatmaps were drawn using gplots25 in R. The differences in overall community composition between each pair of samples were determined using the weighted and unweighted UNIFRAC metric26. For the statistical analysis, we used a randomly selected subset of 17,519 sequences per water sample to compare relative differences between samples. Normalized libraries were used to calculate the rarefaction curves and phylogenetic diversity. Rarefaction curves are a result of repeated randomizations of the observed species accumulation curve, allowing the observed richness to be compared among samples27. The Good’s coverage estimator was used to estimate the percent of the total species represented in a sample. Canonical correspondence analysis (CCA) was used to evaluate the relationship between the bacterial diversity revealed by pyrosequencing of 16S rRNA genes and water properties. The data of total bacteria was standardized using ln (CFU/mL) transformed before the analysis. CCA were performed using R (2.14.0, with the community ecology package “vegan (2.0-4)”28. Spearman correlation was applied to identify the associations between water parameters and the percent of total potential pathogens, p-values < 0.05 were considered significant.

Nucleotide sequence accession numbers

The sequences obtained in this study are available in the NCBI Nucleotide Archive database under project identification numbers SRR5405060-62, SRR5405064-65, SRR5405069-70, SRR5405072-75, SRR5420965-68 and SRR5429450.