Legionella is a genus of aquatic Gram-negative, rod-shaped, facultative aerobic bacteria. Members of the genus Legionella are found around the globe in a variety of natural and man-made freshwater environments1,2. Protozoa are considered the natural hosts of Legionella in these environments3. Legionella spp., and especially L. pneumophila, the clinically most relevant species, can cause two clinical syndromes in humans. The first is Pontiac fever, a self-limited flu-like illness that develops within 2–3 days in 95% of the people who are exposed to the bacteria. The second is legionellosis or Legionnaires’ disease (LD), a severe form of pneumonia that can also affect other organs, such as the liver and kidneys4,5. Humans become infected with L. pneumophila by inhaling aerosols from aquatic environments, such as potable water, cooling towers, showerheads, whirlpools, and other man-made devices that generate aerosols6. LD is not transferred from human to human, i.e., it is not a communicable disease2,7. Major risk factors for developing the disease include suppression of the cellular immune system, cigarette smoking, use of well water, and chronic heart or lung disease8. The conventional antibiotic treatment includes azithromycin, fluoroquinolones (levofloxacin, moxifloxacin), and sulfamethoxazole-trimethoprim9,10. Despite antibiotic treatment, the case fatality rate for LD in hospitalized patients remains high, with reported rates of 5–30%11.

The epidemiological features of legionellosis in Israel are rather similar to those in the EU regarding incidence rates, seasonality, and the methods used for laboratory diagnosis. However, a larger proportion of nosocomial cases occurred in Israel12. An average of 49.3 LD cases was diagnosed annually between 2006 and 2011 in Israel; 71.4% of the clinical isolates belonged to serogroup 1 of L. pneumophila12. Indeed, L. pneumophila serogroup 1 was recently reported as the dominant serogroup isolated from the drinking water system in Israel13,14. It is difficult to clinically distinguish LD from other causes of pneumonia; thus, many legionellosis cases remain unreported. It is highly important to develop a clinical scoring system to distinguish LD from other pneumonia causes15. Today, several diagnostic tests are in use for the detection of Legionella infections: culture diagnosis, urinary antigen detection, direct fluorescent antibody (DFA) staining, and Legionella-specific PCR8. Previously, culture diagnosis was the gold standard for legionellosis diagnosis. Nowadays it is hardly used in routine diagnosis because it cannot provide results within a clinically useful time frame, as the diagnosis takes three to five days16. DFA staining is commonly used; however, this test lacks sensitivity for detecting all clinically important Legionella species17. Thus, LD can be described as an elusive diagnosis rather than an exotic infection16. Although molecular tools such as PCR specific for Legionella spp. have been developed, they are rarely used in clinics18. In Europe, only 2% of the 11,832 confirmed or probable LD cases were ascertained by PCR19.

In recent years, Next Generation Sequencing (NGS) high-throughput technologies have been applied to study the human microbiome, i.e., the bacterial communities inhabiting humans20. Using these methods, the microbiota of different human body organs can be studied directly from DNA extracted from samples, without employing any culture techniques. To date, several studies have investigated the lung, respiratory tract, and sputum microbiomes21,22,23,24 with regard to tuberculosis or cystic fibrosis. Nevertheless, none of them studied the sputum microbiome with regard to legionellosis. A better understanding of the microbiota associated with pneumonia patients may improve our understanding of patients with pulmonary infections and particularly patients with LD.

The aim of the current study was to analyze the bacterial community and the proportion of Legionella in sputum samples of patients with pneumonia due to Legionella spp. and to compare it to sputum samples of patients with pneumonia due to other pathogens. Using an NGS approach based on 16S rRNA gene amplicons, we compared bacterial communities of sputum of pneumonia patients with respect to richness, diversity, and relative abundances of bacterial genera in correspondence with the presence and abundance of the genus Legionella and L. pneumophila.


Screening of sputum samples for Legionella

One hundred and thirty-three sputum samples were analyzed for the presence of Legionella species by both culture and molecular methods. To verify the ability of the chosen molecular method to detect the presence of L. pneumophila in a sputum sample, the sensitivity of the PCR was tested as follows: sputum samples were inoculated with known concentrations of L. pneumophila and DNA was extracted from the inoculated samples. The method was proven to be very sensitive as Legionella were still detectable in sputum samples that were inoculated with only 40 cfu/ml using the Legionella genus-specific PCR reaction (unpublished data). Nine sputum samples (6.8%) were positive for Legionella by PCR. One sample (sample 2PS2) out of these nine PCR-positive samples was also positive by culture. The isolate was identified as L. pneumophila serogroup 1. Details regarding the patients are summarized in Table S1.

NGS analyses of the sputum microbiome

Bacterial communities of the nine Legionella-positive sputum samples (LGP) and 13 Legionella-negative samples (LGN, chosen randomly) were studied using Illumina MiSeq deep sequencing. One of the LGP samples had to be discarded due to an insufficient number of sequences. Across all 21 sputum samples analyzed, almost 2 million quality sequences were obtained. Those sequences were classified in total as 64,965 unique OTUs (Operational Taxonomic Units) at the 97% sequence similarity level across all samples. Sequences were classified into 12,738 OTUs after rarefying all the samples to the lowest number of reads (7,155 sequences). At 3% sequence divergence, most rarefaction curves describing the number of OTUs observed as a function of sampling effort asymptotically approached saturation, indicating that the surveying effort covered almost the full extent of taxonomic diversity at this phylogenetic level (Fig. S1). Detailed coverage and Chao1 values of the OTUs are given in Table S2. A Principal Coordinate Analysis (PCoA) was conducted to obtain a global picture of the relations between LGP and LGN samples. The AMOVA that was used to assess the statistical significance of the separation between the LGP and LGN groups did not show significant dissimilarities (F1,19 = 1.11, p = 0.28).
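The rarefying step described above (subsampling every sample to the lowest read count) can be sketched as follows. This is only an illustrative sketch using the Python standard library, not the pipeline used in the study; the function names and the toy OTU counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Draw `depth` reads without replacement from a dict of OTU -> read count."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one label per read, then subsample.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


def rarefy_all(samples, seed=0):
    """Rarefy every sample to the smallest library size among them."""
    depth = min(sum(counts.values()) for counts in samples.values())
    rarefied = {name: rarefy(counts, depth, seed) for name, counts in samples.items()}
    return rarefied, depth


# Hypothetical toy data (not taken from the study).
samples = {
    "sample_A": {"otu1": 50, "otu2": 30, "otu3": 20},
    "sample_B": {"otu1": 10, "otu2": 5, "otu4": 5},
}
rarefied, depth = rarefy_all(samples)
print(depth, rarefied)
```

In the study the common depth was 7,155 sequences; in this toy example it is the size of the smaller sample, which is therefore returned unchanged.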

Taxonomic composition of the sputum samples

The sputum samples were analyzed by Legionella-specific PCR and bacterial-specific NGS. Based on the PCR results, the samples were classified as Legionella positive (LGP) or Legionella negative (LGN). The PCR results were comparable to the NGS results, namely, Legionella sequences were detected by NGS in all of the Legionella PCR-positive samples. For further analyses, we treated the three samples with Legionella abundance above 0.5% separately (samples 2PS2, 6PS2, and 26PS4 with 2.88%, 0.82%, and 0.56% Legionella abundance, respectively), ‘High-LGP’ hereafter. The other five Legionella PCR-positive samples had a far lower abundance, ranging from 0.11% to 0.02%, with sample 33PS3 at only 0.004%. These samples are referred to as ‘Low-LGP’ hereafter.
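The grouping rule described above can be written down compactly. The sketch below assumes only the 0.5% cut-off and the four per-sample abundances quoted in the text; the function name `classify` is hypothetical.

```python
HIGH_LGP_THRESHOLD = 0.5  # percent relative Legionella abundance, as stated in the text


def classify(pcr_positive, legionella_pct):
    """Assign a sample to LGN, Low-LGP or High-LGP.

    PCR decides positive vs. negative; the NGS-derived relative
    abundance (in percent) splits the positives into high and low load.
    """
    if not pcr_positive:
        return "LGN"
    return "High-LGP" if legionella_pct > HIGH_LGP_THRESHOLD else "Low-LGP"


# Relative Legionella abundances (percent) quoted in the text.
abundances = {"2PS2": 2.88, "6PS2": 0.82, "26PS4": 0.56, "33PS3": 0.004}
groups = {name: classify(True, pct) for name, pct in abundances.items()}
print(groups)
```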

At the phylum level, Legionella positive (LGP) and negative (LGN) samples had a comparable composition with a pronounced dominance of Proteobacteria and Firmicutes that together account for more than 80% of the bacterial community (Fig. S2A). Together with Actinobacteria (up to 9%) and Bacteroidetes (up to 5%) these four phyla comprised more than 96% of the total bacterial community. A very distinct pattern was observed in the high and low Legionella samples. Proteobacteria (High-LGP 66%, Low-LGP 27%) and Actinobacteria (High-LGP 11%, Low-LGP 4%) had an almost three-fold higher abundance in high Legionella samples (Fig. S2B). Firmicutes was the dominant phylum for low Legionella abundant samples (High-LGP 16%, Low-LGP 63%) (Fig. S2B). The higher abundance of Firmicutes is mainly due to the high abundance of Streptococcus.

The genus composition of each sample is shown in Fig. 1. At the genus level, the Legionella-positive sample types show both a comparable high abundance of Acinetobacter and a highly distinct abundance of Streptococcus that constitutes a major fraction of the Low-LGP samples. For the Legionella negative samples (LGN), six more genera achieved higher abundances (Stenotrophomonas, Escherichia-Shigella complex, Haemophilus, Proteus, Corynebacterium, and Prevotella) (Fig. 1). The High-LGP group can be distinguished from the Low-LGP and the LGN on the genus level. A main distinction for the High-LGP is a reduced dominance of single genera, i.e., only in one case does a single genus exceed an abundance of 18%, and a broad set of different genera ranges between 2% and 0.2% abundance. By contrast, the Low-LGP group is usually dominated by a single genus, mostly Streptococcus. The LGN samples are mostly (10 out of a total of 13 samples) dominated by a single genus; however, there is a broad set of different genera that may dominate these sputum samples. The genus Streptococcus played a dominant role for four samples. Seven LGN samples were each dominated by a different genus (Acinetobacter, Stenotrophomonas, Escherichia-Shigella-complex, Haemophilus, Proteus, Corynebacterium, and Prevotella). The remaining two LGN samples did not have dominant genera, mostly due to the presence of a high fraction of non-classified genera (Fig. 1).

Figure 1: Microbiome composition of all sputum samples analyzed by NGS at genus level.

Classified genera with relative abundances above a cut-off level of 0.2% are indicated. White bars comprise all the genera with a relative abundance of <0.2%. Gray bars represent taxa with a relative abundance above the cut-off level of 0.2% that could not be classified at the genus level. High Legionella load (High-LGP), low Legionella load (Low-LGP), and Legionella-negative samples (LGN) are displayed as separate groups for comparison.

Genus diversity and genera origin in sputum samples

Chao1 and Shannon indices were calculated at the genus level for Legionella-positive and Legionella-negative samples and, in addition, separately for High-LGP (0.56% to 2.88% relative abundance) and Low-LGP (0.11% to 0.01% relative abundance) samples (Fig. 2). While richness and diversity were not distinct when comparing Legionella-positive vs. Legionella-negative samples, there was a clear distinction of High-LGP from Low-LGP or LGN samples. Richness and diversity were significantly higher for High-LGP compared to Low-LGP and LGN (p < 0.05, Fig. 2 and Table 1). This pattern was also observed using other diversity indices for the genus and the family level (Fig. S3).
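For readers unfamiliar with the two indices, the following sketch shows how they can be computed from a vector of per-genus read counts. It uses the natural-log Shannon index and the bias-corrected form of Chao1; which variants and software the study used is not stated here, so both choices and the toy counts are assumptions.

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa seen exactly once and exactly twice.
    """
    s_obs = sum(1 for n in counts if n > 0)
    f1 = sum(1 for n in counts if n == 1)
    f2 = sum(1 for n in counts if n == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical read counts per genus for one sample (not from the study).
genus_counts = [10, 5, 2, 2, 1, 1, 1]
print(chao1(genus_counts))    # 7 observed genera, F1 = 3, F2 = 2 -> 8.0
print(shannon(genus_counts))
```

A sample dominated by a single genus yields a low Shannon value, whereas many genera at similar abundances yield a high one, which is the kind of contrast reported between the Low-LGP and High-LGP groups.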

Figure 2: Alpha-Diversity estimates for sputum samples based on OTU-data at the genus level.

(A) Diversity of Legionella-negative samples (LGN) and of all Legionella-positive samples (LGP) was not distinct. (B) Samples with high abundance of Legionella (High-LGP) display a higher diversity on the genus level than samples with low Legionella abundance (Low-LGP) and PCR-negative samples (LGN). Levels of significance are indicated by asterisks as inferred from pairwise t-tests with Holm-adjusted p-values (**p-value ≤ 0.01, *p-value ≤ 0.05, n.s.: not significant; more details can be found in Table 1). This pattern was also observed for these samples on the family level. For a comparison of different diversity indices on the genus and family levels see also Supporting Information (Fig. S3) and Table 1.

Table 1 Holm-corrected p values of differences between the richness and diversity estimates displayed in Figs 2 and S3.
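The Holm correction applied to the pairwise p values can be illustrated with a short sketch. This shows the standard Holm step-down procedure, not the authors' code; the raw p values below are hypothetical and are not those of Table 1.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print([round(p, 3) for p in adjusted])
```

With three comparisons, the smallest raw value is multiplied by 3, the next by 2 and the largest by 1, after which the adjusted values are made non-decreasing.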

Details on the genus composition of the sputum samples are shown in Fig. 3 and in Tables S3 and S4. The average of the genera abundances is presented at three different abundance levels (Fig. 3A, from 1.0% to 100%; Fig. 3B, from 0.1% to 10%; and Fig. 3C, from 0.01% to 1%). High-LGP samples had different genera than Low-LGP or Legionella-negative (LGN) samples. The High-LGP samples showed high genus diversity between 2% and 0.2% relative abundance. Though still distinct, the genus compositions of the High-LGP samples were more similar to one another than those of the Low-LGP and the LGN samples (Fig. 3).

Figure 3: Rank abundance curves of the overall genus composition of the sputum microbiomes at different levels of abundances.

Shown are the average relative abundances for all genera that exceeded 0.5% relative abundance in at least one sample. Samples are grouped by relative Legionella abundance, i.e. high Legionella load (High-LGP; red, i.e. >0.5% relative abundance), low Legionella load (Low-LGP; orange, i.e. 0.11 to 0.01% relative abundance) and where Legionella was not detectable by PCR (LGN; blue). For comparison of genera at different abundance levels, three separate plots showed genera in the range of 1–100% (A), 0.1–10% (B) and 0.01–1% (C) relative abundance. For detailed abundance values, see also Supporting Information (Tables S3 and S4).

An interesting observation was that in most of the sputum samples, OTUs belonging to one or two genera were dominant. For example, the abundances of Streptococcus OTUs in samples 26PS1, 26PS6, 11PS4, 22PS3, 30PS2, and 30PS4 were 97.5%, 70.9%, 80.1%, 85.7%, 82.8%, and 72.7%, respectively. The abundances of Acinetobacter OTUs in samples 33PS3 and 9PS2 were 98.9% and 41.5%, respectively. The abundance of Stenotrophomonas OTUs in sample 19SP16 was 95.7% (Tables S3 and S4). Sample 16PS2 was dominated by Streptococcus (29.2%) and Neisseria (15.2%); sample 26PS4 was dominated by Streptococcus (17.9%) and Acinetobacter (31.3%); sample 30PS1 was dominated by Prevotella (50.6%) and Streptococcus (21.8%); and sample 30PS3 was dominated by Corynebacterium (55.6%) and Stenotrophomonas (21.0%) (Tables S3 and S4).

NGS results compared to hospital laboratory culture results

Sputum samples are cultured by the hospital laboratory on selective and diagnostic plates only if they have more than 25 leukocytes per microscopic field. Consequently, seven of the 21 Illumina-analyzed sputum samples were not cultured at all. NGS analyses showed the prevalent genus of the sputum samples (Tables S3 and S4); however, in the majority of cases, these results did not match the hospital laboratory culture results. The culture results of only two samples (2PS2 and 29PS3) matched the NGS result, i.e., Legionella was detected on a selective medium, and Escherichia was identified as the most abundant pathogen. In all other cases, abundant species were not detected; in some cases, bacteria of minor prevalence in the NGS analysis were cultured. Yet, it has to be emphasized that commensal bacteria (for example, streptococci, which can be part of the normal oral microbiota) are not reported by the clinical laboratory, while NGS sequencing is unable to resolve pathogenic streptococci species.

Comparative sequence analysis of Legionella species

The 16S rRNA gene amplicons analyzed in this study allowed differentiation between most Legionella species and also between two clusters of L. pneumophila at the subspecies level. One cluster, Legionella operational taxonomic unit 8 (LTU 8), comprised L. pneumophila reference strain Philadelphia (and also included Corby and Fraseri). A second cluster (LTU 5) comprised L. pneumophila Lens (and Alcoy) (Fig. 4). Clustering at the 99% sequence similarity level showed that the majority of Legionella reads in LGP sputum samples indeed clustered with one of the two aforementioned L. pneumophila clusters (Table S5 and Fig. 4). Phylogenetic comparison further elucidated the prevalence of sequences highly similar to L. pneumophila and the sample-specific compositions of LTUs (Fig. 4). Only in sample 16PS2 were non-pneumophila LTUs (LTU 66 and 68) slightly more abundant than LTU 5 (Fig. 4). Interestingly, non-pneumophila LTUs displayed an even more sample-specific distribution pattern, with each cluster occurring only in a single sputum sample (2PS2 and 6PS2). The fact that the abundance pattern of LTUs differed substantially between all samples indicated that sequence variations were not due to amplification or sequencing errors but might reflect the individual nature of each Legionella infection. The phylogenetic distribution of the major LTUs detected in the sputum samples shows that, in addition to the two subspecies represented by the strains Philadelphia and Lens, a third phylogenetic branch exists that was detected in three of the patients (26PS5, 2PS2, and 6PS2), comprising the LTUs 24, 50, 32, and 59 from L. pneumophila (Fig. 4). Therefore, we conclude that there might be a third L. pneumophila subspecies specifically occurring in the respective patients in Israel. In addition, the two samples (2PS2 and 6PS2) with the highest relative abundance of L. pneumophila reads had sets of 5 and 8 non-pneumophila LTUs associated with the L. pneumophila-specific LTUs (Fig. 4). This hints at additional signatures for a patient-specific NGS profile of Legionella species that could be of clinical relevance. The distribution of the most abundant LTUs in High-LGP and Low-LGP sputum samples is presented in Fig. 5.

Figure 4: Phylogenetic diversity and abundance of Legionella operational taxonomic units (LTUs) in Legionella-positive (LGP) sputum samples based on 16S rRNA sequence comparison.

Representative sequences of each LTU were compared with selected reference sequences of described Legionella species. Coxiella burnetii served as an outgroup. The size of the colored dots represents LTU-specific abundances in read numbers. The colors refer to the different samples. LTUs display a sample-specific distribution pattern and those belonging to the L. pneumophila branch (bracket) were most abundant. For clarity, only clusters with ≥5 reads are shown. Sample 33PS3 was below this threshold. Nodes with bootstrap support of ≥50% are indicated and the scale bar shows the number of substitutions per site.

Figure 5: Distribution of the most abundant Legionella operational taxonomic units (LTUs) in Legionella-positive (LGP) sputum samples with high (red) and low (orange) Legionella load.

For clarity only clusters with more than 10 reads in at least one sample are shown. Sample 33PS3 was below this threshold. Note that the scale is log-transformed.


LD or Legionellosis is a type of bacterial pneumonia caused by L. pneumophila and other pathogenic Legionella species8. Studies in Europe and the United States showed that Legionella infections are responsible for 1–5% of all hospitalized community-acquired pneumonia cases24. LD identification in pneumonia cases is underestimated due to the failure to diagnose LD in routine practice15. Several attempts at developing a clinical scoring system to distinguish LD from other pneumonias have failed8. In the current study we monitored the prevalence of Legionella in hospitalized pneumonia patients by both culture-dependent and -independent (PCR) methods. In addition, we used the Illumina MiSeq platform to sequence the 16S rRNA gene in order to describe the relation between Legionella and the composition of the sputum microbiome. Legionella detection by culture-dependent methods is limited compared to the PCR-based culture-independent methods, in which indicative genes are assessed directly from the DNA extracted from sputum25.

The prevalence of Legionella species in sputum samples, as determined by PCR with genus-specific primers according to Kahlisch et al.26, was 6.8%, with nine positive samples out of 133. The PCR reaction was found to be very sensitive, as it could detect the presence of 40 Legionella CFUs per ml. In contrast, Legionella was cultured from only one sample (2PS2). Patient 2PS2 was the only one recognized by the hospital as having LD. As for the other eight patients that were Legionella-positive only by PCR, there was a debate regarding their treatment. However, they received an antibiotic treatment that also covered LD. There is no reference in the literature discussing similar cases, and the bacterial inoculum required to cause human infection or disease is currently under debate8,27,28,29.

The NGS analyses of the sputum microbiome revealed that a broad set of bacteria were dominating the sputum, especially for samples without higher abundance of Legionella (LGN and Low-LGP). Streptococcus played a major role in these samples. However, in LGN samples six more genera showed high abundances, i.e., Stenotrophomonas, Escherichia-Shigella complex, Haemophilus, Proteus, Corynebacterium, and Prevotella (Fig. 3, Tables S3 and S4). Besides the highly abundant genera, a second or a third bacterial genus with a higher abundance (above 10% relative abundance) was observed. Interestingly, Legionella was never a dominant genus and ranged below 2.9% of relative abundance, even in the case of confirmed Legionellosis.

The bacteria in sputum with high Legionella abundance (>0.5% relative abundance) showed a distinct composition and diversity compared to samples without or with low Legionella abundance. This was indicated by a significantly higher richness and diversity of the high Legionella samples (p < 0.05, Fig. 2 and Table 1). Furthermore, the composition on the phylum and the genus level was distinct; especially pronounced was the lower abundance of Firmicutes in the high-LGP samples.

The genus composition of High-LGP samples was distinct in many respects from the LGN and Low-LGP samples (Figs 1 and 3). High-LGP samples showed a less pronounced dominance of single genera and high genus diversity between 0.2% and 2% relative abundance. A large fraction of the genera of the High-LGP samples can be considered of environmental origin, with many of aquatic origin. In addition to Legionella, waterborne pathogens such as Acinetobacter, Stenotrophomonas, Pseudomonas, Vibrio, Helicobacter, and Aeromonas were present. Some genera that were found in the High-LGP samples were observed in the amoeba microbiome isolated from drinking water distribution systems30,31,32. These include proteobacterial genera of aquatic origin such as the Alphaproteobacteria genera Sphingomonas, Brevundimonas, Novosphingobium, Bradyrhizobium and Methylobacterium, the Betaproteobacteria genus Curvibacter, and the Gammaproteobacteria genera Legionella, Acinetobacter, Stenotrophomonas, Escherichia, Pseudomonas, and Sphingobacterium. These potentially amoeba-borne genera form a fraction between 24.4% and 5.6% of the total genera (including unassigned and unclassified) or 8% to 33% of the classified genera. Thomas et al.32 highlighted the potential risk associated with these amoeba-based bacteria from drinking water systems. The fraction of genera of “potential amoebal origin” increased with higher Legionella abundance. For comparison, the genus Legionella itself contributed 9% to 4% of this “potentially amoeba-borne fraction”. In contrast, in LGN samples, only three Gammaproteobacteria genera with shown presence in amoeba in drinking water were occasionally observed, i.e., Acinetobacter, Stenotrophomonas, and the Escherichia-Shigella-complex, whereas the other genera were not detected or were much less abundant (Fig. 3).

L. pneumophila was the most prevalent Legionella species in LGP samples. The High-LGP samples had not only the highest Legionella abundance, but also the highest sub-species L. pneumophila diversity and non-pneumophila phylotype diversity compared to the Low-LGP samples (Fig. 4). Based on the above observations, we hypothesize that the presence of high L. pneumophila abundance in our samples might have been caused by the transfer of amoeba or amoebal vesicles together with the co-microbiome of the amoeba. This might explain the establishment of L. pneumophila due to their greater virulence after contact with amoeba and the accompanying microbiota that is distinct from the other sputum samples.

Streptococcus was a prevalent genus in almost half of the LGN samples and 80% of the Low-LGP samples (Tables S4 and S5). These results are in agreement with Cho et al.33 who described Streptococcus pneumoniae as the most common bacterial agent in community-acquired pneumonia. Streptococcus was also found to be the most abundant genus in the lung microbiomes of cystic fibrosis patients23,33. Besides Streptococcus, there were seven more pathogenic genera dominating LGN samples (Fig. 3A and Table S5). Although Streptococcus may have an outstanding role, a broad set of other genera was shown to dominate sputa and can be considered as causes of pneumonia29,34.

For LGP samples, at least one more dominant bacterial species was present, suggesting that coinfection of Legionella with another species may occur (Fig. 1 and Table S3). An interesting point that arises here is that Legionella was never the dominating genus and was always accompanied by other respiratory pathogens. The pattern was distinct for samples with high and low Legionella abundance. For Low-LGP samples, there was mostly dominance by Streptococcus, except for one sample with dominance of Acinetobacter. For the High-LGP samples, Acinetobacter and Streptococcus co-occurred but with a much lower abundance. Therefore, for the High-LGP patients, disease due to Legionella is more likely than for the Low-LGP patients. Tan et al.35 described six LD patients who all had bacteremic coinfection of Legionella, particularly with Streptococcus pneumoniae. Legionella coinfections with other bacterial species have also been described in the literature, for example with Mycoplasma pneumoniae, Chlamydia pneumoniae, Chlamydia psittaci, Klebsiella pneumoniae and Pseudomonas aeruginosa36, and with Listeria monocytogenes37. Coinfections of Legionella with the fungus Pneumocystis jirovecii in an infant38, with influenza virus39, and with herpesvirus40 have also been reported. Based on our results and the literature evidence, we hypothesize that Legionella patients might have bacterial coinfections.

The NGS results and the culture results obtained by the hospital laboratory were highly divergent (Table S6). In many cases, species at a low abundance in NGS were cultured. These differences may be due to the fact that the bacterial community in a sputum sample is a mixture of different species, and thus a species with a minor prevalence but more competitive growth in culture may overgrow others and be identified as the disease-causing agent (for example, Acinetobacter can overgrow Legionella even on Legionella-selective medium). In a former large cohort study of cystic fibrosis patients, we23 have shown that NGS data and clinical cultivation-based data may be highly consistent. The full spectrum of bacterial genera can be very helpful to judge the patient’s development and response to the treatment23. In the current study, a very heterogeneous group of pneumonia patients was analyzed and, as shown, these patients were infected by a broad set of different bacterial genera. This may explain the lower match between clinical culture-based and NGS data, because it is challenging to cover such a broad spectrum of bacterial genera by cultivation. Charlson et al.41 and Morris et al.42 suggested that the upper and lower airway bacterial community composition is rather comparable within healthy individuals, except that the bacterial abundance decreases towards the lung tissue. Moreover, some bacterial species tend to increase in the lung compared to the upper airways43. Recently, Segal et al.44 showed that for healthy volunteers, the lung microbiome can be similar to or can differ from the upper airway microbiome. They showed that inflammatory processes occurring in healthy airways lead to a high similarity between the upper airway microbiome and the lung microbiome. They44 concluded that the transfer is based on increased microaspiration caused by inflammatory processes. Thus, sputum samples can be considered as providing a good assessment of relevant pathogens of the overall respiratory system. This was also evident from our former study23. Our results combined with the evidence from the literature demonstrate that there is a need to reevaluate the regulation of medical laboratory routine procedures and to consider using molecular methods for more accurate diagnosis.

In Figs 4 and 5, Legionella species-specific NGS profiles (based on LTUs) provided mostly patient-specific signatures, i.e., the abundances of the individual LTUs vary from patient to patient. While the microbiological understanding of these profiles awaits further studies at the strain level (also conceivable with marker genes of higher resolution), the highly diverse profile could help identify the source of infection if similar profiles were obtained from DNA extracted from suspected sources. Coscollá et al.45 also described mixed infections of L. pneumophila strains in outbreak patients. They analyzed sequence-based typing (SBT) profiles from uncultured respiratory samples and found evidence of a mixture of SBT Legionella profiles in three of the patients. These findings, combined with our results, indicate that patients may be infected from the environment by more than one strain.

The suggested polymicrobial nature of L. pneumophila infections bears essential aspects for the survival and growth of L. pneumophila on one hand, and the development of the disease for the patient on the other hand. Several studies have shown that L. pneumophila is not a very competitive bacterium46,47. Both positive and negative interactions of L. pneumophila with the co-occurring microbiota are highly likely to occur and thus may have an important impact on its proliferation and the resulting disease. The distinct bacterial community occurring in the presence of high Legionella abundance may be an indication in this respect.

With respect to the competitiveness of L. pneumophila and its interaction with the co-occurring microbiota, the strain level may be of further interest. The diversity of the present L. pneumophila strains within the microbiota along the whole respiratory tract may represent an important selection factor. As revealed by the analysis of Legionella clusters in this study, the diversity was greater with increasing Legionella abundance (Figs 2 and S3). Based on our observations, we hypothesize that a high diversity supported the competitiveness of L. pneumophila in the lung microbiota, but this still has to be substantiated by more detailed studies on the strain level. Based on our data, we think that future studies analyzing the interactions between L. pneumophila and the co-occurring microbiota (including non-bacterial pathogens), are valuable and needed. A more thorough understanding of these interactions can be considered to hold great promise for the prevention of legionellosis.

Future perspectives

Overall, our sputum NGS sequencing resulted in profiles of all major bacterial commensals and pathogens for individual patients. We analyzed the microbiota profiles at different taxonomic levels: the genus level for the overall community and the species and sub-species level for the genus Legionella. Ideally, all bacterial species should be identified with such a profile, but the 16S rRNA genes used in our NGS approach do not provide this level of taxonomic resolution for all relevant genera. For example, for the genus Streptococcus only the alpha-hemolytic streptococci, including S. pneumoniae, can be separated from the anaerobic S. milleri group23, whereas for the genus Legionella the resolution is sometimes better than the species level, as observed for L. pneumophila. Other taxonomic marker genes, such as gyrB or rpoB, might offer better taxonomic resolution but await further development and application to clinical specimens48,49.

Since deep sequencing is currently not used as a diagnostic tool in hospital laboratories, further larger-scale studies are essential to assess the correlation between the abundance of Legionella and that of other pathogens in sputum samples, and to compare those results with clinical and laboratory indicators, such as inflammatory markers, related to the presence of Legionella. Future studies should include a comparison of sputum with bronchoalveolar lavage (BAL) in order to assess the transfer of Legionella from the lung to the sputum and the correlation between Legionella abundance in sputum and BAL50. The potential of molecular sputum analyses compared to BAL and culture-based methods could be assessed with the aim of generating a fast, sensitive, and reliable diagnostic tool for Legionella species detection in sputum without the need for BAL. Finally, NGS profiles from sputum could help select the antibiotic mix needed for a successful treatment of pneumonia based on knowledge of the bacterial infections present.

Another aspect of potential relevance for pathology and prevention is the observation that the bacterial community occurring with high Legionella abundance was distinct from the community in sputum with low or no Legionella presence. First, this may indicate that a certain level of L. pneumophila is needed to cause legionellosis. Second, the accompanying microbiota may indicate that the microbiome of amoebae from drinking water is transferred together with the Legionella itself. The relevance of amoebae as a transfer vehicle of highly virulent Legionella may be additionally emphasized by this observation. These aspects deserve more extensive future studies to gain insight into the underlying mechanisms of pathogenesis and potential prevention measures.

In conclusion, the NGS approach allowed the identification of the sputum microbiota at the genus level and, for the genus Legionella, at the species and sub-species level. Legionella sub-species profiles (based on LTUs) provided patient-specific signatures. Legionella was never the dominant genus, and it is possible that in some patients coinfection with other bacterial species occurred. We are at the beginning of the NGS technological era. These novel techniques should certainly be considered as tools for developing new and fast molecular methods to diagnose pathogens in pneumonia patients. Legionella detection in sputum as well as in water samples plays a critical role in public health. Thus, identifying Legionella as the causative agent of infection is crucial for disease treatment and outbreak prevention33,51.

Materials and Methods

Ethics Statement

The methods were carried out in accordance with the relevant guidelines of the Helsinki Committee. Each pneumonia patient provided signed written informed consent before sputum samples were taken. This study was approved by the institutional Helsinki Committee (approval no. 2013-5999).

Sputum sampling

Sputum samples were collected from 133 pneumonia patients who were hospitalized at Poriya Hospital (Israel) between April 2013 and September 2014. All sputum samples were taken before the patients received any antibiotic treatment. All the patients were tested for influenza A (including H1N1), influenza B, and respiratory syncytial virus (RSV). Only patients who tested negative for these viruses were included in the study. All the sputum samples were stored for up to 6 hours at 4 °C and were further treated as described below on the day of sampling. The average age of the patients was 62.5 years; 37 patients were female and 96 were male. Most of the patients were from the respiratory intensive care unit (ICU) (52%). Others were from the internal medicine department (24%), the cardiac ICU (19%), and the pediatrics department (5%).

Culture-dependent detection of Legionella species

For Legionella culturing, 10 μl of each sputum sample was heat-treated (10 min, 56 °C) and then inoculated on GVPC (Glycine-Vancomycin-Polymyxin-Cycloheximide; BD, United States) Legionella-selective agar plates. Plates were incubated at 37 °C for 7 days. Legionella pneumophila was identified based on colony morphology and Gram staining; colonies that were Gram-negative were further analyzed using a Legionella latex agglutination kit (Oxoid, Thermo Scientific). Colonies that were found positive were kept in Luria Broth (LB) supplemented with 30% glycerol at −80 °C.

Culture-independent detection of Legionella species

For sputum DNA extraction, a protocol used for Mycobacterium tuberculosis detection was adapted with some modifications52. Briefly, 1 ml of sputum sample was mixed with MycoPrep (Becton Dickinson, USA) in a ratio of 1:1 and incubated for 17 min at room temperature. NPC-67 neutralizing buffer (Alpha Tec Systems, Inc., USA) was added to a final volume of 25 ml. Samples were centrifuged for 15 min at 3,500 × g. The supernatant was discarded and the pellet was suspended in 1 ml Pellet Resuspension Buffer (Alpha Tec Systems, Inc., USA). After mixing, 200 μl of the sample was used for DNA extraction using the DNeasy Blood & Tissue Kit (Qiagen, Germany), according to the manufacturer’s instructions.

Genomic DNA from sputum was used as a template for amplifying Legionella genus-specific 16S rRNA genes according to Kahlisch et al.26. Primers specific for the genus Legionella were: Lgsp28R 5′-CACCGGAAATTCCACTACCCTCTC-3′ and Lgsp17F 5′-GGCCTACCAAGGCGACGATCG-3′. To verify the identification of Legionella from positive DNA sputum samples, PCR products were run on an agarose gel and bands from all the positive samples were excised. DNA was extracted from the bands using the QIAquick Gel Extraction Kit (Qiagen, Germany). Amplicons were sequenced by MCLAB laboratory (CA, USA). The obtained sequences were compared to those available at the EzTaxon server to ascertain their closest relatives. The band sequences were deposited in GenBank under accession numbers KT382274-KT382277 (some of the sequences were less than 200 bp in length and thus were not deposited).

16S rRNA gene library preparation for microbiome analyses

Sputum DNA was amplified using primers targeting the V4 variable region of the bacterial 16S rRNA gene. Primer sequences were: CS1_515F 5′-ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA-3′ and CS2_806R 5′-TACGGTAGCAGAGACTTGGTCTGGACTACHVGGGTWTCTAAT-3′ with an amplicon size of 291 bp54. These primers contained 5′ common sequence tags, in accordance with Moonsamy et al.55. The PCR amplification procedure is described in detail in Aizenberg-Gershtein et al.56.
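
Both primers contain IUPAC degenerate bases (M in CS1_515F; H, V, and W in CS2_806R), so each primer name stands for a pool of concrete oligonucleotides. The following Python sketch is an illustration only and is not part of the published protocol; the function name expand_degenerate is hypothetical. It enumerates the concrete sequences covered by each primer as written above.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# Primer sequences copied from the text (5' to 3', including the CS1/CS2 tags).
CS1_515F = "ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA"
CS2_806R = "TACGGTAGCAGAGACTTGGTCTGGACTACHVGGGTWTCTAAT"


def expand_degenerate(primer):
    """Return every non-degenerate sequence represented by a primer (hypothetical helper)."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in (("CS1_515F", CS1_515F), ("CS2_806R", CS2_806R)):
        print(name, len(expand_degenerate(seq)), "concrete sequences")
```

The forward primer has a single two-fold degenerate position, whereas the reverse primer combines two three-fold positions and one two-fold position.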

Illumina MiSeq sequencing

MiSeq sequencing was performed at the DNA Services (DNAS) Facility–University of Illinois, Chicago (UIC). The sequencing protocol is described in Aizenberg-Gershtein et al.56. The procedure included a second PCR amplification in 96-well plates, where each well received a separate primer pair, obtained from the Access Array Barcode Library for Illumina Sequencers [10-base barcode (Fluidigm, South San Francisco, CA; Item no. 100-4876)]. Pooled, diluted libraries were sequenced using a MiSeq 600-cycle sequencing kit version 3, and analyzed with Casava 1.8 (pipeline 1.8). Reads were 200 nucleotides in length (paired end, 2×200). PhiX DNA was used as a spike-in control. Barcode sequences were provided to the MiSeq server, and reads were automatically binned according to their 10-base multiplex identifier (MID) sequences. Raw reads were recovered as FASTQ files.

Sequence analyses of all bacteria

Bioinformatic analyses were performed using MOTHUR v.1.33.357. The MiSeq Standard Operating Procedure (SOP) followed was the one described by Kozich et al.58. Briefly, any sequences with ambiguities or homopolymers longer than 8 bases were removed from the data set. Sequences were aligned using the SILVA-compatible alignment database available within MOTHUR. Sequences were trimmed to a uniform length of 290 bp, and chimeric sequences were removed using Uchime59. Sequences were classified using the MOTHUR-formatted version of the RDP training set (v.9) and clustered into OTUs using the furthest-neighbor algorithm, based on 97% sequence identity. The whole dataset was randomly subsampled to the minimum number of sequences per sample (7,155), to avoid bias caused by uneven sequencing depth across the samples.
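
As an illustration only (the study used MOTHUR, not the code below), the two sequence-filtering criteria and the subsampling to a common depth can be sketched in Python. The function names, the dictionary-based OTU count table, and the fixed random seed are assumptions made for this example.

```python
import random
import re

MAX_HOMOPOLYMER = 8  # runs longer than this lead to removal of the sequence


def passes_filter(seq, max_homopolymer=MAX_HOMOPOLYMER):
    """Keep a read only if it has no ambiguous base and no overlong homopolymer."""
    seq = seq.upper()
    if re.search(r"[^ACGT]", seq):
        return False  # ambiguity code such as N
    # One base followed by at least max_homopolymer copies of itself,
    # i.e. a run of max_homopolymer + 1 or more identical bases.
    overlong_run = r"(.)\1{%d,}" % max_homopolymer
    return re.search(overlong_run, seq) is None


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    picked = {}
    for otu in rng.sample(pool, depth):
        picked[otu] = picked.get(otu, 0) + 1
    return picked


def rarefy_all(samples, seed=0):
    """Subsample every sample to the smallest read total found in the data set."""
    depth = min(sum(counts.values()) for counts in samples.values())
    rng = random.Random(seed)
    return {name: subsample(counts, depth, rng) for name, counts in samples.items()}
```

Applied to the data described here, the common depth would be 7,155 reads per sample. Sampling without replacement is one common choice for this step; the exact procedure inside MOTHUR may differ in detail.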

The Illumina sequences are available for download under the accession PRJNA312879 (SRP070932).

Microbial richness and coverage estimations

Community richness (Chao1 estimator), community diversity (coverage), and rarefaction curves were generated using the MOTHUR program (version 1.33.3). A Yue and Clayton-based distance matrix, which measures community structure by incorporating both membership and abundance, was used to generate a Principal Coordinates Analysis (PCoA). The significance of differences in theta index scores between the samples was assessed by analysis of molecular variance (AMOVA). AMOVA tests whether the centers of the clouds of samples representing each group (between-groups variation) are more separated than the variation among the samples of the same group (within-group variation)60. The groups tested were: (1) Legionella-positive sputum samples versus Legionella-negative sputum samples; (2) sputum samples from different age groups; (3) sputum samples from different hospital departments.
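
To make the two indices concrete, the sketch below computes a Chao1 richness estimate and a Yue and Clayton theta distance from OTU counts. It is not the MOTHUR code used in the study: it assumes the bias-corrected form of Chao1, S_obs + n1(n1 − 1) / (2(n2 + 1)), where n1 and n2 are the numbers of singleton and doubleton OTUs, and the usual definition of theta on relative abundances; the function names are hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from a list of OTU read counts."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    singletons = sum(1 for c in observed if c == 1)
    doubletons = sum(1 for c in observed if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))


def theta_yc_distance(sample_a, sample_b):
    """Yue and Clayton distance (1 - theta) between two OTU count dictionaries."""
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples need at least one read")
    sum_ab = sum_aa = sum_bb = 0.0
    for otu in set(sample_a) | set(sample_b):
        a = sample_a.get(otu, 0) / total_a  # relative abundance in sample A
        b = sample_b.get(otu, 0) / total_b  # relative abundance in sample B
        sum_ab += a * b
        sum_aa += a * a
        sum_bb += b * b
    theta = sum_ab / (sum_aa + sum_bb - sum_ab)
    return 1.0 - theta
```

Two samples with identical relative abundances give a distance of 0, and two samples sharing no OTU give a distance of 1, which is why the index captures both membership and abundance. A matrix of such pairwise distances is the input to PCoA and AMOVA.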

Specific sequence analyses of Legionella species

Illumina reads that were classified as Legionella spp. (n = 3,838; average read length: 292 bp) were cleared of singletons (resulting in 3,562 reads representing 260 original sequences) and further aligned with reference sequences of all described Legionella species (last accessed October 27, 2015) using the default settings of the ClustalW tool in MEGA (v. 6)61. Aligned sequences were then grouped into clusters of 99% sequence similarity using the assembly algorithm for dirty data implemented in Sequencher (v. 5.2.4), resulting in 178 clusters. For each cluster, read numbers of all individual sequences were summed and a representative sequence was picked to serve as the “Legionella operational taxonomic unit” (LTU). A phylogenetic comparison was conducted for LTUs with more than 3 reads in total by the Neighbor Joining method under the Kimura 2-parameter model with gamma-distributed rate variation among sites (shape parameter = 5) using MEGA. The phylogenetic tree was tested by bootstrapping with 1000 replicates. The R package “phyloseq” (v. 1.10.0) was used for integration of abundance data and phylogeny as well as for visualization62.
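
The pairwise distance underlying the Neighbor Joining tree can be illustrated with a short Python sketch. It assumes the standard Kimura 2-parameter formula, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), and its gamma-corrected form, d = (a/2)[(1 − 2P − Q)^(−1/a) + ½(1 − 2Q)^(−1/a) − 3/2], where P and Q are the proportions of transitional and transversional differences and a is the shape parameter. This is not MEGA's implementation; the function names are hypothetical, and skipping every site with a gap or ambiguous base is an assumption of this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_fractions(seq1, seq2):
    """Return (P, Q): fractions of transitions and transversions over comparable sites."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return transitions / sites, transversions / sites


def k2p_distance(seq1, seq2, gamma_shape=None):
    """Kimura 2-parameter distance; pass gamma_shape for the gamma-corrected form."""
    p, q = transition_transversion_fractions(seq1, seq2)
    term_a = 1 - 2 * p - q
    term_b = 1 - 2 * q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("sequences too divergent for this model")
    if gamma_shape is None:
        return -0.5 * math.log(term_a) - 0.25 * math.log(term_b)
    exponent = -1.0 / gamma_shape
    return (gamma_shape / 2) * (term_a ** exponent + 0.5 * term_b ** exponent - 1.5)
```

Calling k2p_distance(seq_a, seq_b, gamma_shape=5) corresponds to the shape parameter reported above. For the same pair of sequences the gamma-corrected distance is at least as large as the uncorrected one, because rate variation among sites hides more multiple substitutions.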

Additional Information

How to cite this article: Mizrahi, H. et al. Comparison of sputum microbiome of legionellosis-associated patients and other pneumonia patients: indications for polybacterial infections. Sci. Rep. 7, 40114; doi: 10.1038/srep40114 (2017).
