Introduction

Legionella spp. are aerobic, gram-negative bacteria belonging to the class Gammaproteobacteria. They survive and grow in protozoan hosts, such as amoebae, and form biofilms in natural waters (lakes and rivers) and artificial waters (hot springs, bathing facilities, and cooling towers)1,2,3. Legionnaires’ disease is a respiratory illness caused by the aspiration of water or inhalation of aerosols containing Legionella spp.1. More than 60 Legionella species have been identified4, with L. pneumophila being the species most frequently detected in patients5. The risk of developing Legionnaires’ disease is high among immunocompromised individuals, elderly individuals, and smokers2,6, and the number of patients is increasing globally every year. For example, an estimated 8000 to 18,000 individuals are infected each year in the United States1, and more than 2000 cases have been reported annually in Japan since 20182. Bathwater is a potential source of legionellosis in Japan. In the 2001–2007 survey of legionellosis in Japan reported by Kuroki et al., most patients were elderly (mean age, 67 years), and public baths accounted for more than 40% of the identified infection sources7. Public baths are popular among elderly people in Japan because of their perceived therapeutic benefits. According to the Japanese Ministry of Health, Labor and Welfare (MHLW), there are approximately 25,000 public baths in Japan, most of which use a circulating water system. Young, healthy people with strong immune systems rarely develop the disease when exposed to Legionella spp., whereas elderly individuals, those with underlying diseases, and immunocompromised individuals are at higher risk. Thus, strict regulations are needed to prevent bacterial contamination in all public baths2.
The MHLW has established and enforces strict regulations to ensure that public baths provide a safe environment and that effective infection control measures are implemented regularly, and periodic testing for Legionella spp. is mandatory for bathing facilities in Japan. Nevertheless, Legionnaires’ disease remains an important public health threat, especially in Japan’s rapidly aging population2.

Legionella spp. can be detected in water samples using culture or molecular methods, such as real-time PCR8. The culture method is commonly used9; however, it is time-consuming owing to the long incubation period required, and it quantifies only viable and culturable Legionella spp.10. In addition, the reliability of the culture method is limited because its recovery rate is frequently < 100% owing to bacterial loss during the enrichment process11,12. Real-time PCR provides results faster than the culture method; however, the results of the two methods correlate poorly10. PCR is considered to overestimate the presence of Legionella spp. because it detects DNA from dead cells and from viable but non-culturable cells, including those hosted in amoebae13,14.

Previous studies have identified several protozoan species in which Legionella spp. can grow intracellularly, and the presence of these organisms is considered necessary for the survival and growth of Legionella spp. Therefore, effective control of Legionnaires’ disease requires an understanding of the microbiota in the environments where Legionella spp. survive; however, the relationship between the survival and growth of Legionella spp. and the coexisting microbiota is not fully understood1. Peabody et al.15 and Llewellyn et al.16 reported metagenomic data on microbial communities including Legionella spp. in environmental water and cooling towers, respectively, and highlighted the importance of analyzing the microbial flora in these systems. In contrast, extensive metagenomic analyses of microorganisms, including Legionella spp., in bathwater are lacking.

In this study, we investigated the environment in which Legionella spp. reproduce by analyzing the bathwater microbiota and its relationship with Legionella spp. We characterized the microbiota of 112 samples from bathing facilities using 16S and 18S rRNA gene sequencing and compared the detection of Legionella spp. by 16S rRNA sequencing with that by the culture method. Moreover, to identify microorganisms that coexist with Legionella spp., we compared the microbiota of samples in which Legionella spp. were present (positive samples) with those of samples in which they were absent (negative samples), as determined by both 16S rRNA sequencing and the culture method.

Results and discussion

Sequencing and taxonomic assignment

A total of 9,429,223 high-quality 16S rRNA sequences were obtained from the 112 bathwater samples after Illumina MiSeq sequencing, quality filtering, and chimera screening. The average, maximum, and minimum numbers of 16S rRNA sequences in each sample were 84,189; 190,987; and 20,733; respectively. Additionally, 12,179,023 high-quality 18S rRNA sequences were obtained. The average, maximum, and minimum numbers of 18S rRNA sequences in each sample were 108,741; 221,475; and 31,435; respectively.

After BLASTn (ver. 2.8.1)17 searches against the SILVA database (ver. 132)18, we retained the taxonomic information of sequences that met the threshold values (see Methods). Taxonomic information was assigned to 8,398,717 16S rRNA sequences, with an average assignment rate of 89.1% per sample (average 74,989; maximum 144,333; and minimum 16,389 sequences among the 112 samples). BLASTn analysis was also performed on the 18S rRNA sequences: a total of 1,867,244 sequences were assigned, with an average assignment rate of 15.3% per sample (average 16,672; maximum 172,578; and minimum 11 sequences). The 18S rRNA assignment rate was low in many samples because a large proportion of their sequences matched prokaryotes.

Tables 1 and 2 show the 10 most frequent phylogenetic groups, expressed as the number of sequences per group relative to the total number of sequences assigned at the genus level by 16S rRNA (8,398,717 sequences) and 18S rRNA (1,867,244 sequences) sequencing. Proteobacteria was the most frequently detected phylum among 16S rRNA sequences. The most common class was Alphaproteobacteria, followed by Gammaproteobacteria; together, these accounted for approximately 90% of sequences at the class level. At the genus level, we detected Methylobacterium, Sphingomonas, Acinetobacter, and Pseudomonas, which are widely found in the environment. Among 18S rRNA sequences, eight supergroups were detected, of which Opisthokonta, Amoebozoa, and SAR accounted for more than 90% of the total. Vermamoeba, a known Legionella spp. host19,20, was the most frequently detected genus, accounting for approximately 25% of the sequences assigned at the genus level.

Table 1 Top 10 frequencies of the phylogenetic groups identified by 16S rRNA sequences.
Table 2 Top 10 frequencies of the phylogenetic groups identified by 18S rRNA sequences.

Legionella spp. detection using culture method and 16S rRNA sequencing

We compared the detection of Legionella spp. by the culture method and by 16S rRNA sequencing. The culture method yielded positive results for 82 of the 112 samples (detection rate 73.2%; mean 821 CFU per 100 mL; maximum 18,800 CFU per 100 mL; minimum 0 CFU per 100 mL). 16S rRNA sequencing detected Legionella spp. in 86 of the 112 samples (detection rate 76.8%), with 10,270 Legionella spp. sequences in total, corresponding to a relative abundance of 0.0012 of all assigned sequences (10,270/8,398,717 sequences). The average and maximum numbers of Legionella spp. sequences per sample were 92 and 1268, respectively, and the average and maximum relative abundances per sample were 0.0012 and 0.0144, respectively (Supplementary Table 1). The average relative abundance of Legionella spp. per sample was higher than the mean relative abundance of all identified genera (0.0008). Figure 1 shows a scatter plot of the Legionella spp. concentration obtained using the culture method (CFU per 100 mL) against the number of reads assigned to Legionella spp. by 16S rRNA sequencing in the 112 samples. The Spearman’s rank correlation coefficient between the two methods was almost identical to that obtained in a previous study that compared culture and real-time PCR methods for Legionella spp. detection in bathwater. This result is consistent with those of Guillemet et al.21 and Bonetta et al.22, who reported a significant but weak correlation between Legionella spp. concentrations obtained using real-time PCR and those obtained using conventional culture methods in water samples.

Fig. 1: Scatter plot of Legionella spp. content obtained using the culture method and the number of reads assigned using the 16S rRNA sequencing in the 112 samples.

Each axis is shown on a logarithmic scale. rs is Spearman’s rank correlation coefficient between the number of 16S rRNA sequences and the number of colonies in the 112 samples.

Subsequently, we compared the detection status of Legionella spp. obtained using the culture method and 16S rRNA sequencing (Table 3; sample classification is detailed in the Methods). Concordance between the two methods was relatively high: 87.8% of culture-positive samples were also positive by 16S rRNA sequencing (D/P), and 53.3% of culture-negative samples were also negative by sequencing (ND/N). Concordance was particularly high among culture-positive samples (D/P). These results indicate that, although the quantitative correlation between the two methods was low, their qualitative agreement was high. In contrast, the proportion of culture-negative samples in which 16S rRNA sequencing detected Legionella spp. (D/N) was relatively high at 46.7%. This discrepancy may reflect the different measurement principles of the two methods: unlike PCR-based methods, the culture method detects only viable and culturable Legionella spp. and underestimates Legionella spp. present in protozoa21. Conversely, 12.2% of culture-positive samples were negative by 16S rRNA sequencing (ND/P). This pattern might be explained by PCR inhibitors remaining in the extracted DNA21 or by PCR biases during amplicon sequencing. Further analysis is needed to determine the cause.
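
Each proportion in Table 3 is computed within either the culture-positive or the culture-negative samples, so D/P and ND/P (and likewise D/N and ND/N) sum to 100%. The minimal Python sketch below illustrates that arithmetic on hypothetical per-sample labels; the function name conditional_proportions and the toy data are illustrative only and are not taken from this study.

```python
from collections import Counter


def conditional_proportions(samples):
    """Percentages of culture-positive (P) and culture-negative (N) samples in which
    16S rRNA sequencing did (D) or did not (ND) detect Legionella spp.

    samples: iterable of (culture_positive, sequencing_detected) boolean pairs.
    """
    counts = Counter((bool(c), bool(s)) for c, s in samples)
    n_pos = counts[(True, True)] + counts[(True, False)]   # culture-positive samples
    n_neg = counts[(False, True)] + counts[(False, False)]  # culture-negative samples
    return {
        "D/P": 100 * counts[(True, True)] / n_pos,
        "ND/P": 100 * counts[(True, False)] / n_pos,
        "D/N": 100 * counts[(False, True)] / n_neg,
        "ND/N": 100 * counts[(False, False)] / n_neg,
    }


# Hypothetical data: 8 culture-positive and 4 culture-negative samples
toy = [(True, True)] * 7 + [(True, False)] + [(False, True)] + [(False, False)] * 3
print(conditional_proportions(toy))
# {'D/P': 87.5, 'ND/P': 12.5, 'D/N': 25.0, 'ND/N': 75.0}
```

The sketch raises ZeroDivisionError if either culture group is empty, which is acceptable for an illustration but would need a guard in practice.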

Table 3 Legionella spp. testing results using the culture method and 16S rRNA sequencing in 112 samples.

In this study, we found a weak correlation in Legionella spp. abundance between the culture method and 16S rRNA sequencing. In contrast, when only the qualitative results (detection and non-detection) were considered, the consistency between the two methods was high. The growth of Legionella spp. is thought to depend on complex interactions with the habitat, such as water quality and coexisting microorganisms23. As a first step toward identifying microbes that coexist with Legionella spp., we compared the samples using the presence or absence of each microbe as an indicator.

Microbiota comparison between the positive and negative groups

The 112 samples were classified into four groups (P_D, P_ND, N_D, and N_ND) according to the criteria described in the Methods. The positive group (P_D+) comprised 22 samples in which the relative abundance of Legionella spp. exceeded the average relative abundance (0.001) of all detected microbial genera. The negative group (N_ND) comprised 16 samples (Supplementary Table 1). The classification of each sample is shown in Table 4.

Table 4 Number of positive and negative samples.

A total of 1,343 genera (956 prokaryotic and 387 eukaryotic) were identified in the 38 positive and negative samples (Supplementary Table 2). Table 5 shows the genera with the highest detection frequencies (the proportion of samples in which a genus was detected) in the P_D+ and N_ND groups. Some genera were common to both groups, whereas others were unevenly distributed, suggesting that the microbiota may differ between the two groups.

Table 5 Genera with high detection frequency in the positive and negative samples.

To confirm the differences in microbiota between the P_D+ and N_ND groups, we performed multidimensional scaling (MDS) analysis using Jaccard dissimilarity indexes calculated from the presence or absence of all 1,343 genera in the positive and negative samples (Fig. 2). Analysis of similarities (ANOSIM) yielded a statistic R of 0.777 (p = 0.001). The positive and negative samples formed separate clusters, confirming the differences in microbiota between the P_D+ and N_ND groups.
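
A minimal Python sketch of the ordination step is shown below, assuming each sample is represented as the set of genera detected in it. The analysis in this study used the R package vegan; because the exact MDS variant is not specified here, the sketch uses classical (metric) MDS, also called principal coordinates analysis, as a stand-in. The function names and sample sets are hypothetical.

```python
import numpy as np


def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity, 1 - |A & B| / |A | B|, between two sets of genera."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two samples with no detected genera
    return 1.0 - len(a & b) / len(union)


def jaccard_matrix(samples):
    """Symmetric matrix of pairwise Jaccard dissimilarities."""
    n = len(samples)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = jaccard_dissimilarity(samples[i], samples[j])
    return d


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: k coordinates per sample from a distance matrix."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering        # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]                # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # ignore negative eigenvalues
    return eigvecs[:, top] * scale


# Hypothetical presence/absence data for three samples
samples = [
    {"Legionella", "Vermamoeba", "Methyloversatilis"},
    {"Legionella", "Vermamoeba", "Cupriavidus"},
    {"Acinetobacter", "Sphingomonas"},
]
coords = classical_mds(jaccard_matrix(samples))
print(coords.shape)  # (3, 2)
```

Each row of coords is one sample's position in the two-dimensional ordination; in a plot such as Fig. 2 the rows would be colored by group.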

Fig. 2: Multidimensional scaling (MDS) plot of 956 prokaryotic and 387 eukaryotic genera.

Red dots show positive samples (N = 22) and blue dots show negative samples (N = 16).

Extraction of microbial genera coexisting with Legionella spp.

Figure 3 shows the distribution of the number of positive and negative samples in which each of the 1343 genera was detected. Genera detected in only one sample accounted for approximately 40% of the total; these are likely rare microorganisms with no particular association with Legionella spp. The prokaryotes Acinetobacter24, Flavobacterium25, Methylobacterium26, Pseudomonas27, and Sphingomonas28 are commonly found in the environment, including drinking water, groundwater, and soil (Table 1). These prokaryotes might be universally present in the samples regardless of the presence of Legionella spp.

Fig. 3: Number (blue line) and cumulative percentage (orange line) of genera commonly detected within the positive and negative samples.

The cumulative percentage was calculated as (cumulative number of genera / 1343) × 100.
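
The counts and cumulative percentages in Fig. 3 can be sketched as follows. This assumes that the horizontal axis is the number of samples in which a genus was detected and that the cumulative percentage accumulates upward from one sample; the helper occurrence_profile and its input format (one set of genera per sample) are hypothetical.

```python
from collections import Counter


def occurrence_profile(samples):
    """Return (k, genera detected in exactly k samples, cumulative %) for k = 1..n.

    samples: one set of detected genera per sample.
    """
    occurrences = Counter(g for s in samples for g in set(s))  # samples per genus
    n_genera = len(occurrences)
    per_k = Counter(occurrences.values())  # number of genera per occurrence count
    profile, cumulative = [], 0
    for k in range(1, len(samples) + 1):
        cumulative += per_k[k]
        # cumulative number of genera / total genera x 100
        profile.append((k, per_k[k], 100 * cumulative / n_genera))
    return profile


# Hypothetical example: genus A in three samples, B and C in one sample each
for row in occurrence_profile([{"A", "B"}, {"A", "C"}, {"A"}]):
    print(row)
```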

To extract only the microorganisms associated with Legionella spp., we removed rarely detected and commonly detected microorganisms from the positive and negative samples. A total of 1224 genera (approximately 90% of the 1343 genera) were classified as rare and 44 as common (detailed in the Methods). After their removal, 75 genera remained. The classification of each genus is shown in Supplementary Table 2.

We performed MDS analysis using Jaccard dissimilarity indexes calculated from the presence or absence of these 75 genera (Fig. 4). The ANOSIM statistic R was 0.962 (p = 0.001), indicating a clearer separation between the P_D+ and N_ND groups than before microbial removal (R = 0.777). Next, we calculated Spearman’s rank correlation coefficients between the detection of each of the 75 extracted genera and that of Legionella spp. in the positive and negative samples; the top 10 genera are listed in Table 6. The coefficients between Legionella spp. and all 1343 genera are shown in Supplementary Table 2.
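
Spearman’s rank correlation is Pearson’s correlation applied to ranks, with tied values receiving their average rank. The sketch below assumes, as an interpretation rather than a stated detail, that the correlation is computed across the 38 positive and negative samples between the presence (1) or absence (0) of a genus and that of Legionella spp.; for two binary vectors this equals the phi coefficient. The function names and data are hypothetical, and scipy.stats.spearmanr would give the same values.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Hypothetical presence (1) / absence (0) across six samples
legionella = [1, 1, 1, 0, 0, 0]
genus_a = [1, 1, 1, 0, 0, 0]
genus_b = [1, 1, 0, 0, 0, 1]
print(spearman(legionella, genus_a), spearman(legionella, genus_b))
```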

Fig. 4: Multidimensional scaling (MDS) plot of 75 selected genera.

Red dots show positive samples (N = 22), and blue dots show negative samples (N = 16).

Table 6 Genera associated with the presence of Legionella spp.

Among prokaryotes, Methyloversatilis (rs = 0.89), Cupriavidus (rs = 0.85), and Phenylobacterium (rs = 0.84) had high correlation coefficients. Among eukaryotes, Vermamoeba (rs = 0.77) and Aspidisca (rs = 0.58) were highly correlated with Legionella spp. Methyloversatilis, which had the highest correlation coefficient among prokaryotes, has been reported to form biofilms in drinking water pipes and to serve as a food source for amoebae29. Moreover, Methyloversatilis, Phenylobacterium, and Caulobacter have been detected together with Legionella spp. in biofilms formed in water pipes in artificial environments30. Similarly, Reyranella and Bosea have been isolated from tap water biofilms by coculture with amoebae23. Vermamoeba, which was highly correlated among eukaryotes, is considered a Legionella spp. host and has been detected together with Legionella spp. in household hoses31,32. Aspidisca has also been detected in drinking water samples33.

Contrary to expectations, Acanthamoeba and Naegleria, which are protists commonly associated with Legionella spp.34, were not highly correlated with Legionella spp. in this study. Indeed, Acanthamoeba was classified as a rare microorganism. Naegleria was among the 75 extracted genera; however, it was detected in many samples that tested negative by the culture method, resulting in a low correlation with Legionella spp. Interactions between Legionella spp. and protozoa have been reported previously and can be influenced by several factors: the identity of the host cell, variations in the predatory behavior or feeding preferences of the host, the strain or species of the bacterium, the relative abundance of the two organisms, the external environment, and other microorganisms34. These factors may have contributed to the present results. The results may also be partially due to the low assignment rate (15.3%) of 18S rRNA sequences against the database; however, the extent to which this affects genus identification remains unclear.

Several previous studies have characterized the microbiota of biofilms formed artificially using non-chlorinated water29,30. Other studies have reported a marked difference in the amount of Legionella spp. in water depending on whether residual chlorine is present at ≥ 0.2 mg L−1 in the water35,36. However, although Legionella spp. suspended in water are disinfected by chlorine, those within biofilms can survive and multiply because biofilms are highly resistant to disinfectants20,34. We used bathwater samples, which were likely to contain residual chlorine. The similarity between the microbiota detected here and those reported for non-chlorinated systems, despite this difference in habitat, suggests that the microorganisms extracted in this study may be closely related to the survival and growth of Legionella spp.

In conclusion, we observed differences in the microbiota between bathwater samples with and without Legionella spp. among the 112 samples analyzed. The prokaryotes Methyloversatilis, Cupriavidus, and Phenylobacterium and the eukaryotes Vermamoeba and Aspidisca were highly correlated with Legionella spp. Most previous studies of the habitat of Legionella spp. examined the microbiota in environments without residual chlorine, whereas the present results were obtained in environments where residual chlorine was likely present. These findings suggest that Legionella spp. may develop within a specific microbiota regardless of the habitat or formation process. However, because the chlorine concentration was not measured in this study, the presence or absence of residual chlorine should be determined accurately in future studies. In addition, other habitats, such as cooling towers, should be analyzed to clarify the composition of the microbiota coexisting with Legionella spp.

Methods

Sample collection

A total of 112 independent bathwater samples were collected from bathtubs in bathing facilities in Japan between February 2016 and November 2018. Details of the collection date for each sample are provided in Supplementary Table 1. All samples were collected in 200 mL polyethylene flasks containing sterile sodium thiosulfate and stored in the dark at approximately 4 °C until testing.

Legionella spp. detection via the culture method

In this study, Legionella spp. were detected using the filtration method commonly used in Japan9. Each water sample (200 mL) was concentrated by filtration through a membrane filter with a 0.2 μm pore size (Advantec Tokyo Co., Ltd., Tokyo, Japan). The membrane was then immersed in 4 mL of sterile distilled water and vortexed for 2 min. The suspension (1 mL) was mixed with 1 mL of 0.2 M HCl-KCl buffer (pH 2.2), held for 5 min at approximately 25 °C (acid treatment), and spread onto two Wadowsky-Yee-Okuda agar plates supplemented with α-ketoglutarate (100 μL per plate; Eiken Chemical Co., Ltd., Tokyo, Japan), which were incubated for 5–7 days at 36 ± 1 °C. After incubation, 1–10 colonies suspected to be Legionella spp. were subcultured on Legionella buffered charcoal yeast extract (BCYE-α) agar (Nikken Bio Co., Ltd., Tokyo, Japan) and on blood agar base. After 3 days of incubation at 36 ± 1 °C, isolates that grew on BCYE-α agar but not on blood agar base were identified as Legionella spp., and the number of colonies per 100 mL was calculated. The remaining suspension was stored at −20 °C and used for DNA extraction. The detection limit of the culture method is 1 colony per 10 mL, i.e., 10 colonies per 100 mL of bathwater.

Sequencing of 16S and 18S rRNA genes and bioinformatics methods

Total genomic DNA was extracted from each sample using the NucleoSpin Microbial DNA kit (Macherey-Nagel, Germany) and used as a template for PCR amplification of prokaryotic 16S rRNA and eukaryotic 18S rRNA genes. Primers 515F and 806R were used to amplify the V4 region of the 16S rRNA gene, and primers 1389F and 1510R were used to amplify the V9 region of the 18S rRNA gene. Sequencing of the 16S and 18S rRNA genes was performed by Fasmac Co., Ltd. (Atsugi, Japan) using the Illumina MiSeq sequencing platform. The 16S and 18S rRNA sequences were deposited in the DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) Sequence Read Archive under accession no. DRA014009.

To assign taxonomic information based on sequence similarity, we performed BLASTn (ver. 2.8.1)17 searches of the 16S and 18S rRNA sequences against the SILVA database (ver. 132)18. For each sequence, we used the taxonomic information of the best hit if its E-value was ≤ 1e-5 and both its sequence identity and coverage were ≥ 97%. Sequences whose best hit was a reference sequence with an unidentified genus were removed.
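
The hit-filtering criteria can be sketched in Python as follows. This assumes BLASTn tabular output requested with -outfmt "6 qseqid sseqid pident qcovs evalue bitscore", that the best hit is the one with the highest bit score, and that the thresholds are applied to that best hit. The Hit class, assign_genus, and the genus_of lookup (a mapping from reference IDs to SILVA genus names) are hypothetical helpers, not part of BLAST or SILVA.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    qseqid: str      # query (read) ID
    sseqid: str      # reference sequence ID
    pident: float    # percent identity
    qcovs: float     # percent query coverage
    evalue: float
    bitscore: float


def assign_genus(hits, genus_of):
    """Assign a genus to each query from its best hit, keeping it only if
    E <= 1e-5, identity >= 97% and coverage >= 97%.

    genus_of maps a reference ID to its genus, or to None if the genus is unidentified.
    """
    best = {}
    for h in hits:  # best hit per query = highest bit score (assumption)
        if h.qseqid not in best or h.bitscore > best[h.qseqid].bitscore:
            best[h.qseqid] = h
    assigned = {}
    for query, h in best.items():
        if h.evalue <= 1e-5 and h.pident >= 97 and h.qcovs >= 97:
            genus = genus_of.get(h.sseqid)
            if genus is not None:  # drop references without a genus assignment
                assigned[query] = genus
    return assigned
```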

Sample classification

To identify samples in which Legionella spp. were reliably detected, samples were labeled according to the detection of Legionella spp. by the culture method and 16S rRNA sequencing. First, samples in which Legionella spp. were detected using the culture method were labeled “P,” and those in which they were not detected were labeled “N.” Next, samples in which Legionella spp. were detected using 16S rRNA sequencing were labeled “D,” and those in which they were not detected were labeled “ND.” Combining these labels classified the samples into four groups: P_D, P_ND, N_D, and N_ND. P_D samples in which the relative abundance of Legionella spp. exceeded the average relative abundance of all detected microbial genera were defined as P_D+. In this study, P_D+ was defined as the positive group and N_ND as the negative group.
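
The labeling rules can be expressed as a small Python function. It assumes that a culture-positive sample is one with at least one colony counted, that a sequencing-positive sample has at least one read assigned to Legionella spp., and that relative abundance is the fraction of genus-assigned reads in the sample; label_sample and its parameters are hypothetical names, and the read counts in the example are invented.

```python
def label_sample(culture_cfu, legionella_reads, genus_assigned_reads, mean_genus_abundance):
    """Return 'P_D+', 'P_D', 'P_ND', 'N_D' or 'N_ND' for one sample."""
    culture = "P" if culture_cfu > 0 else "N"           # culture method result
    sequencing = "D" if legionella_reads > 0 else "ND"  # 16S rRNA sequencing result
    label = f"{culture}_{sequencing}"
    if label == "P_D":
        # relative abundance of Legionella spp. among genus-assigned reads
        relative_abundance = legionella_reads / genus_assigned_reads
        if relative_abundance > mean_genus_abundance:
            label = "P_D+"
    return label


# Threshold 0.001 as reported in the Results; read counts are made up
print(label_sample(120, 50, 10_000, 0.001))  # prints P_D+ (0.005 > 0.001)
```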

Microbiota comparison in the positive and negative groups

To compare the microbial profiles of the P_D+ and N_ND groups, we performed MDS using the Jaccard dissimilarity index, which was calculated from the genus presence-absence matrix using the R package vegan37. ANOSIM, a non-parametric test, was used to compare dissimilarities between and within groups based on the ranks of the Jaccard dissimilarities38. The ANOSIM statistic (R) was calculated by comparing the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities. R ranges from −1 to 1; values close to 1 indicate that dissimilarities between groups are greater than those within groups, whereas values close to −1 indicate that dissimilarities within groups are greater than those between groups. A value of zero indicates that the grouping is random with respect to dissimilarity38,39.
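
The ANOSIM statistic is R = (rB − rW) / (M / 2), where rB and rW are the mean ranks of between-group and within-group dissimilarities and M = n(n − 1)/2 is the number of sample pairs. The Python sketch below computes R and a permutation p-value from a precomputed dissimilarity matrix; it is an illustrative reimplementation rather than the vegan code used in this study, and the function names are hypothetical. With 999 permutations, the smallest p-value this estimator can return is 1/1000 = 0.001.

```python
import random
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, p) for a square dissimilarity matrix and one group label per sample."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])  # rank all M pairs once
    m = len(pairs)  # M = n(n - 1) / 2

    def r_stat(labels):
        between, within = [], []
        for (i, j), r in zip(pairs, ranks):
            (within if labels[i] == labels[j] else between).append(r)
        return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

    r_obs = r_stat(groups)
    rng = random.Random(seed)
    labels = list(groups)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(labels)  # permute group labels among samples
        if r_stat(labels) >= r_obs:
            exceed += 1
    return r_obs, (exceed + 1) / (permutations + 1)


# Hypothetical 4-sample example: tight within-group, large between-group dissimilarities
dist = [[0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0]]
print(anosim(dist, ["P_D+", "P_D+", "N_ND", "N_ND"]))
```

With only four samples, a third of all label permutations reproduce the observed split, so the p-value in this toy example is large even though R = 1; separation can be declared significant only with enough samples per group.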

Elimination of rarely and commonly detected microorganisms in the positive and negative groups

In 16S and 18S rRNA sequencing using next-generation sequencers, many microorganisms assigned by only one or two sequences may result from sequencing errors, and microorganisms with low read numbers are typically excluded from the analysis40,41,42. In addition, microorganisms commonly detected in both the P_D+ and N_ND groups were considered indigenous to bathwater. Elimination criteria were therefore defined to remove rarely and commonly detected microorganisms. The detection frequency of a microorganism X in the P_D+ group was calculated as SPx = Xp/P, where Xp is the number of P_D+ samples in which X was detected and P is the number of positive samples. Similarly, the detection frequency of X in the N_ND group was calculated as SNx = Xn/N, where Xn is the number of N_ND samples in which X was detected and N is the number of negative samples.

Microorganisms with both SPx and SNx < 0.5 were defined as rarely detected microorganisms and removed from both the P_D+ and N_ND groups. Next, microorganisms with a detection frequency ratio (SPx/SNx) of ≥ 0.5 and ≤ 2 were defined as commonly detected microorganisms and removed from both groups.
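
The elimination rules can be sketched as follows, with each sample given as the set of genera detected in it. The sketch treats a genus as rare only when both SPx and SNx are below 0.5, and, as an assumption, never classifies a genus absent from all negative samples (SNx = 0) as common, because SPx/SNx is then undefined. The function filter_genera and the toy data are hypothetical.

```python
def filter_genera(pos_samples, neg_samples):
    """Split genera into rare, common and retained sets.

    pos_samples / neg_samples: one set of detected genera per P_D+ / N_ND sample.
    """
    n_pos, n_neg = len(pos_samples), len(neg_samples)
    genera = set().union(*pos_samples, *neg_samples)
    rare, common, retained = set(), set(), set()
    for g in sorted(genera):
        sp = sum(g in s for s in pos_samples) / n_pos  # SPx = Xp / P
        sn = sum(g in s for s in neg_samples) / n_neg  # SNx = Xn / N
        if sp < 0.5 and sn < 0.5:
            rare.add(g)        # rarely detected in both groups
        elif sn > 0 and 0.5 <= sp / sn <= 2:
            common.add(g)      # similar detection frequency in both groups
        else:
            retained.add(g)    # unevenly distributed between the groups
    return rare, common, retained


# Hypothetical toy data: four positive and four negative samples
pos = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}, {"A"}]
neg = [{"A", "E"}, {"A"}, {"E", "C"}, {"A", "E"}]
print(filter_genera(pos, neg))
```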

Statistical analysis

All statistical analyses, including Spearman’s rank correlation analysis, were performed using R43.