Dominant and novel clades of Candidatus Accumulibacter phosphatis in 18 globally distributed full-scale wastewater treatment plants

Here we employed quantitative real-time PCR (qPCR) assays for polyphosphate kinase 1 (ppk1) and 16S rRNA genes to assess relative abundances of dominant clades of Candidatus Accumulibacter phosphatis (referred to Accumulibacter) in 18 globally distributed full-scale wastewater treatment plants (WWTPs) from six countries. Accumulibacter were not only detected in the 6 WWTPs performing biological phosphorus removal, but also inhabited in the other 11 WWTPs employing conventional activated sludge (AS) with abundances ranging from 0.02% to 7.0%. Among the AS samples, clades IIC and IID were found to be dominant among the five Accumulibacter clades. The relative abundance of each clade in the Accumulibacter lineage significantly correlated (p < 0.05) with the influent total phosphorus and chemical oxygen demand instead of geographical factors (e.g. latitude), which showed that the local wastewater characteristics and WWTPs configurations could be more significant to determine the proliferation of Accumulibacter clades in full-scale WWTPs rather than the geographical location. Moreover, two novel Accumulibacter clades (IIH and II-I) which had not been previously detected were discovered in two enhanced biological phosphorus removal (EBPR) WWTPs. The results deepened our understanding of the Accumulibacter diversity in environmental samples.


Results
qPCR efficiency and accuracy. The AS samples for DNA extraction were collected from 18 WWTPs, each WWTP was assigned a code consisting of the acronyms of country-city-name of the WWTP (Table 1). To determine whether the extracted genomic DNA contained contaminants that inhibit qPCR, the sludge samples UK-WL-OW, JP-STD-TK, CN-WH-LW, JP-A2O-TK and UK-NW-NW were diluted in series and amplified by using primer sets targeting Accumulibacter clades IA, IIA, IIB, IIC and IID respectively to calculate the PCR efficiency ( Figure S1). The results showed that the standard curves had correlation coefficient of 0.98-0.99 and PCR efficiencies of 93%-106%, demonstrating that the qPCR assay was accurate and PCR inhibitors in the extracted DNA were negligible.
To confirm the specificity of qPCR assays targeting Accumulibacter ppk1 genes in specific clades, two representative qPCR products amplified by each ppk1 gene primer set were cloned and sequenced. The phylogenetic trees recruiting qPCR product sequences and their closest GenBank matches demonstrated the specificity of the qPCR-ppk1 assays applied in this study ( Figure S2). Nucleotide sequences from qPCR products were clustered with its appropriate reference ppk1 genes most of which had been defined by He et al. 8 .
Detection of Accumulibacter abundance. The fractional abundances of the total Accumulibacter lineage relative to the bacterial community have been estimated by three approaches, i.e., 1) ppk1 genes and bacterial 16S rRNA genes; 2) Accumulibacter 16S rRNA genes and bacterial 16S rRNA genes; and 3) pyrosequencing reads assigned to Accumulibacter and bacteria. The first two approaches were determined by qPCR, whereas the third approach was based on the 16S rRNA gene pyrosequencing ( Table 2). Both the finished genome of Accumulibacter Clade IIA UW-1 (referred to CAP IIA UW-1) 15 and the recently reported draft genomes of Clades IA 14,16 , IB 17 , IC 16 , IIC 16 and IIF 16 carried the single copy of ppk1 gene. CAP IIA UW-1 carried two copies of rrn operon. Therefore, we estimated the fractional abundance of Accumulibacter by assuming that Accumulibacter genome had single copy of ppk1 gene and 2 copies of the rrn operon while the bacterial genomes carried 4 copies of rrn operon according to the available 2,671 finished genomes in IMG/M database at the time of writing.
Total Accumulibacter abundances determined by qPCR using the ppk1 primer set displayed similar trends to qPCR-based estimates using the Accumulibacter 16S primer set (paired Student t test, P = 0.08) and, to a lesser extent, 16S rRNA gene pyrosequencing data (paired Student t test, P = 0.38). However, estimates based on the latter two approaches were significantly different (paired Student t test, P = 0.03). The differences between 16S rRNA gene qPCR-based and pyrosequencing-based estimation were due to the different biases existing within each of the two quantification approaches. According to the qPCR-ppk1 assays, AS samples from SG-SG-UP, CN-WH-LW, UK-WL-OW, UK-NW-NW, UK-LB-LB, CN-GZ-DT and JP-A2O-TK had the highest abundances of Accumulibacter lineage among the 18 AS samples, ranging from 1.2% to 7.0%. Five of these top seven plants were indicated to promote biological phosphorus removal with anaerobic/anoxic/aerobic (A/A/O) or anoxic/ aerobic and return activated sludge (A/O + RAS) fermentation configurations. The abundance of the Accumulibacter lineage identified in the AS from JP-A2O-TK was about two times higher than that from JP-STD-TK, which treated the same wastewater using different treatment processes. This can be reasonably explained by the fact that A/A/O processes (JP-A2O-TK) often to enrich Accumulibacter relative to conventional AS processes.
Distribution of Accumulibacter clades. The relative abundance of the five major Accumulibacter clades in each WWTP was assessed by qPCR using the specific ppk1 primer sets designed by He et al. 8 . For this assessment, we adopted the primer set targeting the clade IIC ppk1 genes, excluding the NS D3 clone (accession number EF559355), because the NS D3 clone sequence was only distantly affiliated to other sequences in clade IIC and it had overlap regions with those in other clades. This may significantly affect the quantification of clade IIC by qPCR 8,13 .
Accumulibacter clades IIC & IID dominated the five Accumulibacter clades in 11 out of the 18 AS samples with relative abundances ranging between 30% and 99% of the total Accumulibacter lineage ( Table 2) and 0.01%-1.45% of the total bacterial community (Table S2). It should be born in mind that the Accumulibacter genome of clade IID has not been obtained, although the genome retrieving work is ongoing. Besides the five clades, some other clades co-existed in the Accumulibacter lineage, which had been demonstrated by Peterson et al. 11 . Given that most of the Accumulibacter clades shared highly similar 16S rRNA genes, we assumed abundances of the Accumulibacter lineage in bacterial communities quantified by the qPCR-16S assay had contained the Accumulibacter of all clades. Besides those five clades whose ppk1 genes can be amplified by the qPCR-ppk1 assay, large proportions of other clades were estimated in the Accumulibacter lineage of over 50% abundances in 10 WWTPs, especially over 80% in sludge samples of CN-HK-ST, CN-QD-TD and CN-HK-SH (Table S3). Designing qPCR primer sets targeting more clades, for example, clades IB, IC, ID and IE in type I and clades IIE and IIF in type II 11 based on their divergent regions in ppk1 sequences, is necessary to reveal the distribution of different Accumulibacter clades more accurately.
To compare the Accumulibacter clade diversity and evenness among the full-scale AS samples, the Shannon diversity index and Pielou evenness index were calculated using the relative abundances of the five Accumulibacter clades as determined by qPCR targeting the ppk1 genes ( Table 2 and Fig. 1). The AS from CA-GP-GP was not included in this analysis because no Accumulibacter lineage was detected by qPCR. The least diversity and evenness were found in CN-NJ-SJ with 99% of Accumulibacter in clade IID. Notably, CN-BJ-BX and SG-SG-UP, UK-LB-LB and UK-WL-OW showed highest level of diversity, i.e., all five clades were detected and clades were more evenly distributed than in the other samples. The high diversity and evenly distribution of the Accumulibacter lineage present in these four samples might be due to the complex configuration (e.g. Effects of operational variables on composition of the Accumulibacter lineage. According to the old microbiological tenet 'everything is everywhere, but the environment selects' 18 , the lineage structure may be affected by the stimuli from WWTP operation. To further understand the effects of WWTP operating conditions on compositions of the Accumulibacter lineage, redundancy analysis (RDA) was conducted by recruiting the relative abundances of the Accumulibacter clades in the bacterial community (Fig. 2a) and within the Accumulibacter lineage (Fig. 2b). Given those clades kept quite large distance from the variable lines in Fig. 2a, the total Accumulibacter levels in bacterial communities did not significantly correlate with any single operating variable (P > 0.05) (Table S4). However, the relative abundances of specific clades in the Accumulibacter lineage showed significant correlation (P < 0.05) with WWTP influent levels of total phosphorus (TP) and chemical oxygen demand (COD) (Fig. 2b and Table S5). The proportions of the dominant clade IID in Accumulibacter were most positively influenced by influent TP, COD and total nitrogen (TN). The results were consistent with well-recognized conclusions that the level of PAOs was related to the operational variables of influent biodegradable COD, reactor configuration and amount of oxidized nitrogen entering the anaerobic zone 2 . Compared to the wastewater characteristics the geographic factors such as latitude played little role in distributing the Accumulibacter clades.
Novel clades of the Accumulibacter lineage. The Accumulibacter ppk1 genes were sequenced from PCR amplicons obtained from four selected AS samples carrying out EBPR. All the retrieved sequences from UK-LB-LB and UK-WL-OW were affiliated with previously recognized Accumulibacter clades 8,11 , but 28 ppk1 sequences from CN-GZ-DT and 16 ppk1 sequences from JP-A2O-TK were relatively distant from known clades (Fig. 3 and Figure S3). These ppk1 sequences of two novel clades (IIH and II-I) shared identities of 88% with their closest ppk1 gene matches in Type II respectively. The observation of the novel Accumulibacter clades in EBPR AS systems expand our understanding of the diversity of its ecotypes and implied that different wastewater characteristics or operational parameters may induce the proliferation of different Accumulibacter clades.

Discussion
We used qPCR assays to quantify the abundances of the Accumulibacter lineage in AS samples of 18 full-scale WWTPs. 6 of the 18 plants were indicated as having conditions to promote EBPR. Overall, Accumulibacter were determined to habitat in 17 AS samples with percentage of 0.02%-7.0% in bacterial communities.
Based on currently available qPCR primer sets that target ppk1 genes, the clades IIC and IID dominated the Accumulibacter lineage of 11 of the 18 AS samples. Moreover, deeper analysis suggests that clade IIC abundances are probably underestimated by our qPCR-ppk1 assay due to the exclusion of the NS D3 clone. Among the detected Accumulibacter clades, WWTP influent TP, COD and TN levels were positively correlated with the relative abundance of clade IID, whereas all of them were inversely correlated with clade IIC, implying functional divergence between the two clades. Some operational variables regarding to be meaningful for EBPR, e.g. solid retention time and effluent TP, were not included in the analyses, which may affect the correlation of Accumulibacter distribution and WWTPs operation to a certain extent. However, this is still a valuable starting point for understanding dominant Accumulibacter clades in full-scale WWTPs.
Consistent with previous results 11, 12 , higher Accumulibacter diversities were apparent among the AS samples (compared with the five clades). Thus we suspect the five qPCR primer sets understand the actual abundances of Accumulibacter clades within the WWTPs. Moreover, two novel Accumulibacter clades (IIH and II-I) were identified from two full-scale AS samples, respectively. Therefore, designing new qPCR primer sets targeting more clades of ppk1 genes, according to their divergent regions in sequences, is urgently needed.
Overall, this study for the first time shows Accumulibacter abundances and dominant clades on a global scale among various types of WWTPs. The results deepen our understanding in the distribution  Table 2. Relative distributions of Accumulibacter clades and estimated abundance of the total Accumulibacter lineage relative to the bacterial community. a The relative abundance of specific clade within Accumulibacter lineage was obtained by dividing the ppk1 copy number of each clade by the sum of the ppk1 copy numbers from five clades. The numbers in bold indicated the dominant clade among the five clades in each sludge sample. "\" indicated that no effective amplification were detected in qPCR assays or visualized by the gel electrophoresis. b Percentage of Accumulibacter lineage within bacterial community was calculated by copy numbers of ppk1 genes from five clades and copy numbers of bacterial 16S rRNA genes. c Percentage of Accumulibacter lineage within bacterial community was calculated by copy numbers of 16S rRNA genes from Accumulibacter and copy numbers of bacterial 16S rRNA genes. d Percentage of Accumulibacter lineage within bacterial community was calculated according to 16S rRNA gene pyrosequencing reads assigned to Accumulibacter and those assigned to bacteria. Sludge samples from 12 WWTPs had been conducted 16S rRNA gene pyrosequencing targeting the V4 region 19 . ND, not determine.  Table 1. Out of the 18 AS samples, 12 samples were previously analyzed using 454-pyrosequencing to reveal their bacterial diversity 19 . In the current study, we performed qPCR analysis to more specifically investigate the distribution of specific clades of Accumulibacter.
DNA extraction. DNA extraction was performed in duplicate from each sludge sample, following methods reported previously 19 , using FastDNA SPIN Kit for Soil (MP Biomedicals, Solon, OH, USA). DNA purity was checked by spectrometry at 260 nm and 280 nm measured by NanoDrop ® Spectrophotometer ND-100 (Thermo Fisher Scientific, USA), and the DNA integrity was visualized by gel electrophoresis using λ DNA-HindIII Digest (Takara, Japan) and DL2000 (Takara, Japan) DNA markers. For accurate quantification by qPCR, DNA samples in high purity were extracted, which could be indicated by the 260/280 ratios of 1.84 ± 0.07. qPCR conditions. The qPCR assays quantified the abundances of Accumulibacter ppk1 genes and 16S rRNA genes from Accumulibacter and total bacteria using primer sets published previously 8 . Thermal cycling and fluorescence detection of the qPCR were conducted on a BioRad iCycler (v. 5.0, BioRad, Hercules, CA) in a 25 μ L reaction volume containing 5 ng DNA template and using SYBR Premix Dimer Eraser TM (Takara, Japan). The thermal cycling protocol was as follows: initial denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 94 °C for 5 s, annealing for 45 s, and extension at 72 °C for 30 s. The cycle threshold (Ct) was determined by setting the threshold at a relative fluorescence unit value of 5.0. The annealing temperature and specific primer concentration had been optimized to improve the quality of qPCR (Table S1). Five-to seven-point calibration curves for the qPCR were generated by 10-fold serial dilution of plasmid carrying ppk1 gene fragments of specific clades. Plasmids as the standard for qPCR were generated from appropriate clones. In detail, the PCR product of ppk1-or 16S rRNA-target genes was purified using quick-spin PCR Product Purification Kit (iNtRON, Korea) and cloned using pMD18-T Vector (TaKaRa, Japan). The plasmid then was extracted from the clone and its mass concentration was determined by NanoDrop ® Spectrophotometer ND-100 (Thermo Fisher Scientific, USA). Copy number was calculated based on the molecular weight and the mass concentration of each plasmid 20,21 . Quality control of qPCR. A negative control containing sterile water was used for each assay to check possible contamination and primer-dimer formation. All amplification products were visualized  by agarose gel electrophoresis, some gel images could be found in Figure S4. PCR amplification efficiency (E) was estimated from the slope of the standard curve by the formula E = 10 -1/slope -1. To check the qPCR assay variation, standard curves were run in duplicate reactions within each plate. Samples were run on at least two plates with triplicate reactions per plate. Standard deviations were calculated from the average for the runs. In the present study, only those qPCR results with standard curves of R 2 within 0.97-1.00, Ct values from 12 to 32, and PCR efficiencies of 87%-106% were accepted as the reliable data. In order to evaluate the reliability of qPCR identification and whether DNA extract of sludge samples contained PCR inhibitors, PCR efficiency values of several DNA extracts from AS samples were also checked 22,23 . To validate the specificity of the qPCR assays, representative qPCR products of each Accumulibacter clade were cloned and sequenced. The phylogenetic distances of the sequences and their closest GenBank matches were visualized by using MEGA (v. 5.02) 24 .
Quantification of total Accumulibacter lineage. The abundance of the Accumulibacter lineages was examined by two qPCR assays targeting the 16S rRNA gene and the ppk1 gene, respectively. From this data, the fraction of Accumulibacter was estimated by assuming that one Accumulibacter genome carries two copies of the 16S rRNA gene and one copy of the ppk1 gene according to the finished genome CAP IIA UW-1 15 . It was assumed that other bacteria contained an average of 4.0 copies of the 16S rRNA gene per genome based on the 2,671 finished genomes in IMG/M database (downloaded on April 7, 2014) 25 . The sequences in red are derived from draft genome sequences rather than clones. Black circles on node branches indicate > 60% bootstrap support. The ppk1 gene of Rhodocyclus tenuis was adopted as the most closely related outgroup sequence. Previous 16S rRNA gene pyrosequencing targeted the V4 regions on samples from 12 WWTPs. Given the limited number of current databases, the 16S rRNA gene of the candidate genus Accumulibacter only can be detected by the BLASTN Best-hit method based on similarity searches (not by other methods like RDP 26 and MEGAN-LCA annotation 27,28 ). Therefore, following the methodology published before 29,30 , to quantify the total Accumulibacter 16S rRNA genes, representative 16S rRNA gene sequences of various Accumulibacter clades ( ≥1200 bp) were chosen to form a specialized sub-database (Table S6) and used for identifying Accumulibacter-related sequences from pyrosequencing reads by BLASTN (v. 2.2.28+ ) 31 . The BLASTN Best-hits were screened to identify the candidate Accumulibacter-like 16S rRNA gene sequences according to minimum identity of 97% and alignment length of 200 bp. The candidate sequences from each sample were manually checked online at NCBI to validate the identification 27 .
Accumulibacter diversity analysis. Following the methods published by He et al. 8 , Shannon diversity index (H = -∑Pi ln Pi) 32,33 and Pielou evenness index (R = H/lnS) 34,35 were calculated to determine the clade diversity and evenness within the Accumulibacter lineage for each sample, respectively. Pi indicated the relative abundance of each clade in the Accumulibacter lineage as quantified by the qPCR-ppk1 assay. S represented the richness value that was determined by the number of clades as detected by the qPCR-ppk1 assay. Profiling of the Accumulibacter lineage structure was linked to the operational conditions of full-scale WWTPs, including influent COD, TN and TP, sampling temperature, and latitude of the location, according to RDA. Contribution of each set of variables was evaluated by Monte Carlo permutation tests using 1000 permutations as implemented in CANOCO (v. 4.5) 36 .
The ppk1 gene clone library. To characterize the Accumulibacter population in full-scale AS samples, the ppk1 gene as the phylogenetic marker was amplified from the AS samples of CN-GZ-DT, UK-LB-LB, UK-WL-OW, JP-A2O-TK by using the primer set of ACCppk1-254F (5'-TCACCACCGACGGCAAGAC-3') and ACCppk1-1376R (5'-ACGATCATCAGCATCTTGGC-3') 10 . Triplicate 50 μ L PCR amplification solutions were prepared for each sample, containing 25 μ L of Premix Ex Taq TM (TaKaRa, Dalian, China), 1 μ L of 10 μ M forward and reverse primers, 50 ng of extracted DNA, and water (RT-PCR grade, Ambion Inc., USA). The thermocycling steps for PCR were set as follows: initial denaturation at 95 °C for 4 min followed by 30 cycles of denaturation at 95 °C for 30 s, annealing at 65 °C for 1 min, and extension at 72 °C for 2 min; and a final extension step at 72 °C for 12 min. Fifty clones from each clone library were sequenced in both directions using vector primers, and paired reads were quality-and vector-trimmed, and assembled by a custom script online (http://124.17.29.44/ customer.html). To further confirm the accuracy of the novel clades sequences, the ppk1 gene amplicons from sludge samples of CN-GZ-DT and JP-A2O-TK were sent to BGI for the clone library again. Twenty ppk1 gene sequences were obtained from each sample to validate the accuracy of sequences. Putative chimeras were checked by using Bellerophon 37 and manually using partial treeing analysis 11 . The obtained sequences were aligned with publicly available ppk1 sequences to construct a phylogenetic tree by employing maximum likelihood method within MEGA (v. 5.02) 24 and deposited in GenBank under accession numbers of KP737870-KP738109. Data availability. The draft genome sequences of Candidatus Accumulibacter Clade IB HKU-1 and Clade IIC HKU-2 have been deposited in the NCBI whole genome shotgun database under the accession numbers of LBIU00000000 and LBIV00000000, respectively.