The human oral cavity is inhabited by over 600 bacterial species, known collectively as the oral microbiome (Dewhirst et al., 2010); these bacteria are involved in a wide variety of functions, and many are important in maintaining oral health (Belda-Ferre et al., 2012). Oral dysbiosis leads locally to periodontitis, dental caries and potentially to head and neck cancer (Wade, 2013; He et al., 2015). There is also increasing evidence of a role for oral dysbiosis in systemic diseases of the lung (Beck et al., 2012), digestive tract (Ahn et al., 2012) and cardiovascular system (Koren et al., 2011), yet factors that influence the oral microbiome are poorly understood. Cigarette smoke is a source of numerous toxicants (WHO, 2012) that come into direct contact with oral bacteria; these toxicants can perturb the microbial ecology of the mouth via antibiotic effects, oxygen deprivation or other potential mechanisms (Macgregor, 1989). Loss of beneficial oral species due to smoking can lead to pathogen colonization and ultimately to disease; this contention is strongly supported by the well-established role of smoking in the onset and progression of periodontitis (Nociti et al., 2015). Previous studies have shown alterations in the abundance of selected oral bacteria in smokers compared with non-smokers (Colman et al., 1976; Ertel et al., 1991; Charlson et al., 2010; Kumar et al., 2011; Hugoson et al., 2012; Morris et al., 2013; Belstrom et al., 2014; Mason et al., 2015); however, results across these studies are largely inconsistent, possibly due to small sample sizes in some, use of different sampling sites in the mouth and use of different laboratory methodologies, some of which impose limitations on bacterial profiling.

To improve our understanding of the influence of smoking on the oral microbiome, we conducted a comprehensive assessment of oral microbiome community composition and individual taxon abundance, by bacterial 16S rRNA gene sequencing, in 1204 individuals from two large US national cohorts. Strengths of our study include the availability of detailed data from both cohorts on individual smoking history and potential demographic confounding factors. In addition, the large sample size available from each cohort provided us with excellent statistical power for discovery in combined analyses, as well as the opportunity to independently replicate results in each cohort.

Subjects and methods

Study population

Participants were drawn from the National Cancer Institute (NCI) Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial cohort (Hayes et al., 2000) and the American Cancer Society (ACS) Cancer Prevention Study II (CPS-II) Nutrition cohort (Calle et al., 2002), which are described in detail in the above-cited references.

Briefly, the PLCO Cancer Screening Trial is a multicenter trial designed to determine the effects of screening on cancer-related mortality and secondary end points among 55- to 74-year-old subjects (Hayes et al., 2000). In PLCO, 154 901 participants were recruited between 1993 and 2001 at 10 centers across the United States and were randomized to either the screening arm (n=77 445) or the control arm (n=77 456) of the trial. Oral wash samples were collected in the control arm only.

The ACS CPS-II Nutrition cohort (n=184 194) is a prospective study of cancer incidence initiated in 1992. It is a subset of the larger CPS-II cohort (n=1.2 million participants) recruited by ACS volunteers in 1982 and followed for mortality. At enrollment in the larger cohort in 1982, and in the subcohort in 1992/1993, participants completed self-administered questionnaires that included information on demographics, family characteristics, personal and family history of cancer and other diseases, reproductive history, as well as various behavioral, environmental, occupational and dietary exposures. Beginning in 1997, follow-up questionnaires were sent to cohort members every 2 years to update exposure information and ascertain newly diagnosed cancers. Oral wash samples were collected by mail from 70 004 CPS-II Nutrition cohort participants.

All subjects included in the present analyses were originally selected from the CPS-II and PLCO cohorts as cases or controls for collaborative nested case–control studies of the oral microbiome in relation to two smoking-related cancers, head and neck cancer and pancreatic cancer. Cases were participants who developed one of these two types of smoking-related cancers at any point after collection of the oral wash samples (time from sample collection to diagnosis ranged up to 12 years). Age- and sex-matched controls were selected by incidence density sampling among cohort members who provided an oral wash sample and had no cancer prior to selection.

Because the oral microbiome assays took place at different times for the pancreas study and the head and neck study, four separate data sets were assembled for this analysis: PLCO-a (n=261 PLCO participants in the head and neck study), PLCO-b (n=400 PLCO participants in the pancreas study), CPS-II-a (n=203 CPS-II participants in the head and neck study) and CPS-II-b (n=340 CPS-II participants in the pancreas study). All participants provided informed consent and all protocols were approved by the New York University School of Medicine Institutional Review Board.

Smoking and other covariate assessment

Comprehensive demographic and lifestyle information was collected by baseline and follow-up questionnaires in the PLCO and CPS-II Nutrition cohorts. Detailed information on cigarette smoking, including smoking status (never, former, current), smoking dose and smoking duration, was ascertained at baseline and follow-up via questionnaires in both cohorts. These smoking data have been used in previous publications (Hocking et al., 2010; Gaudet et al., 2013).

Oral sample collection

Oral wash samples were collected between 1993 and 2001 from control arm participants in the NCI PLCO cohort (Hayes et al., 2000) and between January 2001 and December 2002 in the ACS CPS-II Nutrition cohort (Calle et al., 2002). Participants in both cohorts were asked to swish vigorously with 10 ml Scope mouthwash (P&G) and were directed to expectorate into a specimen tube. Samples were shipped to each cohort’s biorepository, pelleted and stored at −80 °C until use.

Oral microbiome assay

In 2013, we extracted bacterial genomic DNA from oral samples aliquoted from the respective biorepositories using the Mo Bio PowerSoil DNA Isolation Kit (Mo Bio Laboratories, Carlsbad, CA, USA) with the bead-beating method in the Mo Bio Powerlyzer instrument (Mo Bio Laboratories). As reported previously (Wu et al., 2014), 16S rRNA gene amplicons covering variable regions V3 to V4 were generated using primers (347F-5′-GGAGGCAGCAGTRRGGAAT-3′ and 803R-5′-CTACCRGGGTATCTAATCC-3′) (Nossa et al., 2010) incorporating barcode sequences as well as Roche 454 FLX Titanium adapters (454 Life Sciences, Branford, CT, USA), and pooled amplicon libraries were sequenced with the 454 Roche FLX Titanium pyrosequencing system following the manufacturer’s specifications. Laboratory personnel were blinded to smoking status. Multiplexed and barcoded sequences were deconvoluted. Poor-quality sequences were excluded using the default parameters of the QIIME script (minimum average quality score=25, minimum/maximum sequence length=200/1000 base pairs, no ambiguous base calls and no mismatches allowed in the primer sequence) (Caporaso et al., 2010). From the 1204 oral wash samples, we obtained 14 553 620 quality-filtered 16S rRNA gene sequence reads. Sequences were clustered into de novo operational taxonomic units (OTUs) at 97% identity, and representative sequences for each OTU were assigned taxonomy based on fully sequenced microbial genomes (IMG/GG Greengenes), using the QIIME script (Caporaso et al., 2010). Chimeric sequences (identified using ChimeraSlayer; Haas et al., 2011), sequences that failed alignment, and singleton OTUs were removed. The final total data set retained 12 212 734 sequences (mean±s.d.: 10 144±2845 sequences per sample), with similar sequencing depths in the four individual data sets (Supplementary Table S1), and contained 43 435 OTUs.
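
The read-level filtering criteria listed above can be illustrated with a minimal Python sketch. This is not the QIIME implementation: the function names, the choice to measure length on the whole read (primer included) and the assumption that the primer sits at the start of the read are all illustrative assumptions. Only the thresholds (mean quality 25, length 200 to 1000 bases, no ambiguous bases, no primer mismatches) come from the text.

```python
from statistics import mean

# Thresholds quoted in the text above; all names below are illustrative.
MIN_MEAN_QUALITY = 25
MIN_LENGTH, MAX_LENGTH = 200, 1000

# Subset of IUPAC nucleotide codes sufficient for the two primers in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def primer_matches(primer, read):
    """True if the read starts with the primer; degenerate codes (R) may match A or G."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, read))


def passes_quality_filter(read, qualities, primer):
    """Apply the four criteria: length, mean quality, ambiguity, primer match."""
    if not MIN_LENGTH <= len(read) <= MAX_LENGTH:
        return False
    if mean(qualities) < MIN_MEAN_QUALITY:
        return False
    if any(base not in "ACGT" for base in read):  # e.g. an 'N' base call
        return False
    return primer_matches(primer, read)
```

A read is retained only if all four checks pass; a single ambiguous base call or primer mismatch is enough to discard it.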

Quality control

Blinded quality control specimens were used in each data set, respectively, across all sequencing batches. The number of quality control subjects and replicates of each sample are shown in Supplementary Table S2. Quality control samples had good reliability: across the four data sets, coefficient of variability ranged from 0.45% to 8.28% for the Shannon Index, and 6.29% to 26% for various phylum relative abundances.
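
For readers unfamiliar with the two quantities reported above, the following Python sketch shows how a Shannon index and a coefficient of variability across replicate measurements could be computed. The natural logarithm and the sample (n−1) standard deviation are assumptions, since the text does not state either choice; the function names are illustrative.

```python
import math
from statistics import mean, stdev


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)
```

Because changing the base of the logarithm rescales every Shannon value by the same constant, the coefficient of variability of the Shannon index does not depend on that choice.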

Statistical analysis

The relationship between smoking status and overall oral microbiome composition was assessed by analysis of weighted and unweighted UniFrac distances (Lozupone et al., 2011), computed using the QIIME pipeline (Caporaso et al., 2010). Principal coordinate analysis plots were generated using the first two principal coordinates and labeled according to smoking status. Permutational MANOVA ('Adonis' function, vegan package, R) (McArdle and Anderson, 2001) of the weighted UniFrac distance was used to test differences in overall oral microbiome composition across the categories of smoking status, adjusting for age, sex and data set.
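
The logic of a permutational MANOVA on a distance matrix can be sketched in a few lines of Python. This is a simplified, single-factor illustration that follows the pseudo-F statistic for distance matrices; it does not adjust for age, sex or data set as the analysis described above does, and it is not the vegan implementation. The function names, the default of 999 permutations and the seed are assumptions.

```python
import random


def pseudo_f(dist, labels):
    """Pseudo-F statistic from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for group in groups:
        members = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:]
        ) / len(members)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation P-value) by shuffling labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation, hence the +1 terms.
    return observed, (hits + 1) / (n_perm + 1)
```

With this formulation the smallest attainable P-value is 1/(n_perm + 1), so 999 permutations cannot yield a value below 0.001.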

The OTU table of raw counts was normalized to an OTU table of relative abundances, and taxa of the same type were agglomerated at the phylum, class, order, family and genus levels. Our analysis included taxa from the five major phyla of the oral microbiome (Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria and Fusobacteria), and we additionally filtered out taxa present in less than 10% of participants, leaving 13 classes, 20 orders, 40 families and 69 genera in this analysis. Relative abundances of bacterial taxa were compared across the categories of smoking status using the nonparametric Kruskal–Wallis test. We further used polytomous logistic regression for the pairwise comparisons of former vs never smokers and current vs never smokers, adjusting for age and sex. We calculated nominal P-values for each individual data set (PLCO-a, PLCO-b, CPS-II-a and CPS-II-b) and report meta-analysis P-values based on Z-score methods (Evangelou and Ioannidis, 2013). Kendall's Tau rank coefficient was used to test the association of smoking-related variables (number of cigarettes per day and number of years since quitting) with relative abundances of selected taxa.
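
The table preparation described at the start of this paragraph (normalization to relative abundances, agglomeration by taxonomic rank and removal of low-prevalence taxa) can be sketched as follows. The nested-dictionary data layout and all function names are assumptions made for illustration; only the 10% prevalence threshold comes from the text.

```python
from collections import defaultdict


def relative_abundance(counts):
    """counts: {sample: {otu: raw count}} -> per-sample proportions."""
    table = {}
    for sample, otus in counts.items():
        total = sum(otus.values())
        table[sample] = {otu: c / total for otu, c in otus.items()}
    return table


def agglomerate(table, taxonomy):
    """Sum OTU values that share a taxon; taxonomy maps otu -> taxon name."""
    out = {}
    for sample, otus in table.items():
        summed = defaultdict(float)
        for otu, value in otus.items():
            summed[taxonomy[otu]] += value
        out[sample] = dict(summed)
    return out


def prevalence_filter(table, min_prevalence=0.10):
    """Drop taxa detected (value > 0) in fewer than min_prevalence of samples."""
    n_samples = len(table)
    detected = defaultdict(int)
    for taxa in table.values():
        for taxon, value in taxa.items():
            if value > 0:
                detected[taxon] += 1
    keep = {t for t, k in detected.items() if k / n_samples >= min_prevalence}
    return {s: {t: v for t, v in taxa.items() if t in keep} for s, taxa in table.items()}
```

Calling `agglomerate` once per rank, with a different `taxonomy` mapping each time, would give the phylum- through genus-level tables.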

To investigate whether any of the differentially abundant taxa in the relative abundance analysis were identified only because of autocorrelation with other taxa, we conducted a parallel analysis using the DESeq function in the DESeq2 R package (Love et al., 2014; McMurdie and Holmes, 2014); this function models raw counts using a negative binomial distribution, taking into account sample library size and the dispersion for each taxon. Using this model we compared current smokers to never smokers, and former smokers to never smokers, adjusting for sex, age and data set, at the phylum through genus levels and at the OTU level.
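
DESeq2 accounts for library size through per-sample size factors, which by default are estimated with a median-of-ratios approach. The Python sketch below illustrates that calculation only; it is not the DESeq2 code, the function name and list-of-lists layout are assumptions, and the dispersion estimation and negative binomial model fitting are not shown.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of samples, each a list of raw counts in the same taxon order."""
    n_taxa = len(counts[0])
    # Log geometric mean of each taxon across samples; undefined if any count is 0.
    log_geo_means = []
    for t in range(n_taxa):
        column = [sample[t] for sample in counts]
        if all(c > 0 for c in column):
            log_geo_means.append(sum(math.log(c) for c in column) / len(column))
        else:
            log_geo_means.append(None)
    # Size factor: median ratio of a sample's counts to the geometric means.
    factors = []
    for sample in counts:
        log_ratios = [
            math.log(sample[t]) - log_geo_means[t]
            for t in range(n_taxa)
            if log_geo_means[t] is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

In this sketch, taxa with a zero count in any sample are excluded from the calculation, and an error is raised if no taxon is observed in every sample.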

Bacterial metagenome content was predicted from 16S rRNA gene-based microbial compositions, and functional inferences were made from the Kyoto Encyclopedia of Genes and Genomes (KEGG) catalog (Kanehisa et al., 2012), using the PICRUSt algorithm (Langille et al., 2013). A total of 6909 inferred genes were categorized into 275 KEGG functional pathways; pathways present in <10% of participants were removed, leaving 252 KEGG pathways for analysis. The 'DESeq' function in DESeq2 was used to test for differentially abundant KEGG pathways by smoking status, adjusting for sex, age and data set. Spearman's rank correlation was used to examine associations between pathways significantly associated with smoking status and genera significantly associated with smoking status.
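
As a reminder of what the rank correlation measures, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks. It is an unadjusted illustration with hypothetical function names and makes no claim about how the correlations in this study were computed.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(ranks(x), ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between a genus and a pathway gives a coefficient of 1 or −1, regardless of its shape.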

All statistical tests were two-sided, and a P-value of less than 0.05 (or false discovery rate (FDR) adjusted q less than 0.05) was considered statistically significant. All analyses were carried out using SAS 9.3 (SAS Institute, Cary, NC, USA) and R 3.2.0.
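
The text does not name the FDR procedure used. Assuming the common Benjamini-Hochberg method, the adjustment from P-values to q-values can be sketched in Python as follows; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values
```

For example, the P-values 0.01, 0.04, 0.03 and 0.005 give q-values of 0.02, 0.04, 0.04 and 0.02, so all four would pass a q<0.05 threshold.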

Results

Oral biospecimens, smoking history and other demographic information were collected at baseline for both the CPS-II and PLCO cohorts. Of the 1204 individuals included in the present analyses, 43.3% (n=521) were never smokers, 47.4% (n=571) were former smokers and 9.3% (n=112) were current smokers. Among the former smokers, 17% had quit within 10 years of sample collection, 47% 10–30 years prior to sample collection and 30% over 30 years prior to sample collection (6% were missing this information). Current and former smokers tended to be younger and more often male, compared with never smokers (Table 1). All subjects included in the present analyses were originally selected from the CPS-II and PLCO cohorts as cases and controls for nested case–control studies of the oral microbiome in relation to two smoking-related diseases, head and neck cancer and pancreatic cancer. Therefore, approximately half (49.3%) of the 1204 subjects were individuals who went on to develop one of these cancers at some point after providing an oral sample (ranging from 1 to 12 years after oral sample collection); future cancer status was not a confounder of the observed smoking status—microbiome relationships. Additionally, results were highly similar between cases and controls (data not shown).

Table 1 Selected demographic characteristics of study populations from four data sets

To determine whether overall microbiome composition differed according to never, former and current smoking status, we conducted principal coordinate analysis based on UniFrac phylogenetic distances. We found a significant difference in composition between current, former and never smokers (P=0.001, permutational MANOVA based on weighted UniFrac; Figure 1a), after controlling for age, sex and data set. Former and never smokers overlapped on the principal coordinate analysis plot (Figure 1a); when combining former and never smokers, we observed a significant difference in oral microbiome composition between current and non-current (former + never) smokers (P=0.001; Figure 1b). To further support these findings, comparison of within- and between-group distances for all smoking categories indicated that current smokers tended to be more heterogeneous than former or never smokers (Figure 1c), and that never and former smokers were more similar to each other than either group was to current smokers (P=0.001 and P=0.003, respectively; Figure 1d). Results were similar with the unweighted UniFrac distances (data not shown).

Figure 1

Overall oral microbiome composition according to smoking status (current, former and never) in 1204 individuals. The principal coordinate analysis was conducted based on the weighted UniFrac distance. Sixty-eight percent confidence ellipses were drawn using the panel.ellipse function (Lattice, R), and centroids represent the coordinate mean of the first and second axes. (a) Adjusting for data set, age and sex, there was a significant difference in composition according to smoking status (P=0.001). (b) When combining former and never smokers, there was a significant difference in composition between current and non-current smokers (P=0.001). Comparison of within (c) and between (d) group distances for all smoking categories indicated that never and former smokers are more alike than are the current smokers.

We next examined the relative abundance of individual taxa according to smoking status using the Kruskal–Wallis test, and found that relative abundance of the phylum Proteobacteria differed significantly by smoking status (meta-analysis for the four data sets, P=2.29 × 10^−9, FDR adjusted q=1.15 × 10^−8; Table 2), with a clear depletion of Proteobacteria among current smokers (data set-weighted median: 11.7% in never smokers, 4.6% in current smokers). After adjustment for age and sex in polytomous logistic regression, the lower relative abundance of phylum Proteobacteria among current smokers compared with never smokers was statistically significant (q=5.24 × 10^−7), while there was no difference between former and never smokers (q=0.60). As cross-validation, we assessed these relationships in the four data sets separately and found highly consistent results (Table 2). Aside from Proteobacteria, we observed that relative abundance of the phylum Actinobacteria was elevated in current smokers compared with never smokers after adjustment for age and sex (q=0.04) (Table 2), while relative abundance of the other major phyla did not differ significantly by smoking status (Supplementary Table S3).

Table 2 Median relative abundances of selected taxa according to smoking status in four data sets

Lower-level analysis within Proteobacteria revealed that relative abundances of the major classes, Betaproteobacteria and Gammaproteobacteria, were significantly lower in current smokers (Table 2; q=5.05 × 10^−15 and q=3.62 × 10^−5, respectively), as were several genera, including Neisseria, Haemophilus and Aggregatibacter (Supplementary Table S4). Several taxa not belonging to Proteobacteria, including the class Flavobacteriia (q=0.0003; Table 2) and genus Capnocytophaga, were also significantly lower in smokers, as were several genera not altered in class-level analysis, including Corynebacterium (Actinobacteria), Porphyromonas and Prevotella (Bacteroidetes), Leptotrichia (Fusobacteria), and Peptostreptococcus, Abiotrophia and Selenomonas (Firmicutes) (Supplementary Table S4). The Gram-positive class Coriobacteriia tended to be enriched in current smokers (Table 2). Lower-level analysis indicated that the genus Atopobium, belonging to Coriobacteriia, as well as the genera Bifidobacterium (Actinobacteria), Lactobacillus and Streptococcus (Bacilli), were increased in current smokers (Supplementary Table S4).

The majority of the findings in our relative abundance analysis were confirmed in a parallel analysis of raw counts using DESeq2 (Supplementary Table S5), with some exceptions. Notably, the genus Lactobacillus was not identified as differentially abundant in the DESeq2 analysis, while the Firmicutes phylum was found to be significantly more abundant in current compared with never smokers. At the OTU level, 249 OTUs were identified as differentially abundant between current and never smokers at the q<0.05 cutoff (Supplementary Table S6; Figure 2). These included 42 OTUs within Actinobacteria (generally enriched in current smokers), 95 OTUs from Bacilli (mostly from the Streptococcus genus, generally enriched in current smokers), 32 OTUs from Clostridia (generally depleted in current smokers), 25 OTUs from Proteobacteria (generally depleted in current smokers) and 27 unclassified OTUs. In contrast, only 17 OTUs were identified as differentially abundant between former and never smokers.

Figure 2

Cladogram representation of oral microbiome OTUs associated with smoking status. A red branch indicates a taxon or OTU enriched in current smokers and a green branch indicates a taxon or OTU depleted in current smokers, as detected in the DESeq2 analysis. The bars represent log2 fold changes of counts in current compared with never smokers; red bars indicate positive fold change and green bars indicate negative fold change. A total of 1158 OTUs are included in the cladogram, representing OTUs with at least two sequences in at least 30 subjects in the five major phyla; only OTUs with q<0.05 are colored. Cladogram was created using EvolView (Zhang et al., 2012).

We further examined whether class-level bacterial relative abundances differed according to number of cigarettes smoked per day and years since smoking cessation. We observed inverse associations of the relative abundances of Betaproteobacteria, Gammaproteobacteria and Flavobacteriia with a greater number of cigarettes smoked per day in an analysis including current and never smokers (Figure 3a; q=1.13 × 10^−17, q=4.64 × 10^−7 and q=1.29 × 10^−6, respectively), and positive associations of the relative abundances of Betaproteobacteria, Gammaproteobacteria and Flavobacteriia with a greater number of years since quitting smoking in an analysis including former and current smokers (Figure 3b; q=1.96 × 10^−10, q=8.13 × 10^−4 and q=8.13 × 10^−4, respectively). However, these associations did not remain significant when the cigarettes-per-day analysis was restricted to current smokers only, or when the years-since-quitting analysis was restricted to former smokers only, indicating that the observed associations were primarily determined by strong differences between current and never smokers, and between current and former smokers, respectively.

Figure 3

Median relative abundance of selected taxa according to the number of cigarettes smoked per day and number of years since quitting. Plot (a) includes current and never smokers, while plot (b) includes former and current smokers. False discovery rate adjusted q-values were calculated based on meta-analysis P-values of correlations between relative abundance of taxa and number of cigarettes smoked per day or number of years since quitting.

We also explored microbiota function based on inferred metagenomes using the PICRUSt algorithm (Langille et al., 2013). Of 252 KEGG pathways tested, 83 non-human-gene pathways differed in abundance between current and never smokers at q<0.05, adjusting for sex, age and data set (Supplementary Table S7); these included pathways relating to environmental information processing, carbohydrate and energy metabolism, glycan biosynthesis and metabolism, and xenobiotic biodegradation. Interestingly, pathways related to aerobic metabolism (tricarboxylic acid (TCA) cycle and oxidative phosphorylation) were depleted in current smokers, whereas oxygen-independent pathways (glycolysis, fructose, galactose and sucrose metabolism, and photosynthesis) were enriched in current smokers. Additionally, abundances of xenobiotic biodegradation pathways were significantly altered in current smokers, including some enriched (polycyclic aromatic hydrocarbon degradation, xylene degradation and drug metabolism) and some depleted (styrene, toluene, nitrotoluene, chlorocyclohexane and chlorobenzene degradation) pathways in current compared with never smokers. Bacterial genera altered in current smokers were related to many of these pathways (Figure 4). For example, genera depleted in current smokers were positively associated with styrene and toluene degradation, the TCA cycle and oxidative phosphorylation, and negatively associated with glycolysis and other carbohydrate metabolism pathways.

Figure 4

Bacterial taxa associated with smoking status are related to several gene functional pathways. Bacterial gene functions were predicted from 16S rRNA gene-based microbial compositions using the PICRUSt algorithm to make inferences from KEGG annotated databases. Genus and KEGG pathway counts were normalized for DESeq2 size factors and adjusted for data set using the 'removeBatchEffect' function (limma). Spearman's correlation coefficients were estimated for each pairwise comparison of genus counts and KEGG pathway counts, adjusting for age and sex. Only KEGG pathways relating to carbohydrate, energy, xenobiotic and glycan metabolism and selected genera of interest are included in the heatmap; full lists of genera and KEGG pathways associated with smoking can be found in Supplementary Tables S4, S5 and S7.

Discussion

In this large meta-analysis of four data sets, we observed that the oral microbiome of current smokers differed substantially from that of never and former smokers. Specifically, at the phylum level we observed a significant depletion of Proteobacteria, and enrichment of Firmicutes and Actinobacteria, in current compared with never smokers. These strong differences at the phylum level resulted from phylum-wide differences in OTU abundance between current and never smokers. Analysis of inferred metagenomes indicated that smoking may alter oral microbial ecology through influencing oral oxygen availability, while simultaneously having consequences for microbial degradation of xenobiotics. Finally, we observed that the overall oral microbiome composition of former smokers did not differ from that of never smokers; this is a promising indication that smoking-related changes to the oral microbiome are not permanent. As has been observed with other smoking-related health changes, smoking cessation clearly remains the best practice to restore a healthy phenotype (Godtfredsen and Prescott, 2011).

Given the many toxicants found in cigarette smoke (Rodgman and Perfetti, 2013), it is not surprising that smoking drastically alters the microbial ecology of the mouth. Indeed, several other studies have also observed effects of smoking on oral bacteria. Early in vitro studies using culture-based methods noted that cigarette smoke has a strong inhibitory effect on the growth of Neisseria species (Bardell, 1981; Ertel et al., 1991), while Streptococcus species were less inhibited by cigarette smoke (Bardell, 1981). Additionally, early studies in humans identified decreased Neisseria species on mucosal surfaces of smokers (Colman et al., 1976), and an increased proportion of Gram-positive to Gram-negative bacteria on developing plaques of smokers (Bastiaan and Waite, 1978). Recently, studies with comprehensive oral bacterial profiling in humans have found increased Streptococcus sobrinus and Eubacterium brachy in the saliva of smokers (Belstrom et al., 2014), decreased Neisseria, Porphyromonas and Gemella in oral wash samples from smokers (Morris et al., 2013), enrichment of Megasphaera, Streptococcus and Veillonella, and depletion of Capnocytophaga, Fusobacterium and Neisseria, in the oropharynx of smokers (Charlson et al., 2010), and alterations in 172 subgingival plaque OTUs in smokers (Mason et al., 2015). Because of the various sample types used to study the oral microbiome, and the known variation in microbial communities in different parts of the oral cavity (Segata et al., 2012), comparison across studies is difficult. In the current study we have employed Scope mouthwash samples to study the microbiome, which are likely most comparable with studies using saliva or other types of mouthwash samples, and less comparable with plaque samples (Segata et al., 2012). Nevertheless, consistent with some of the above-mentioned studies, we observed decreases in Neisseria, Porphyromonas and Capnocytophaga and increases in Veillonella and Streptococcus, in current compared with never smokers. To our knowledge, this is the first study to report phylum- and class-wide associations of bacterial taxa with smoking status; our more robust findings may relate to the large sample size, which provided the power to detect these associations.

There are several potential mechanisms by which smoking may alter microbial ecology, including increasing the acidity of saliva (Parvinen, 1984; Kanwar et al., 2013), depleting oxygen (Kenney et al., 1975), antibiotic effects (Macgregor, 1989), influencing bacterial adherence to mucosal surfaces (Brook, 2011) and impairing host immunity (Sopori, 2002). Our analysis of inferred metagenomes revealed decreased abundance of aerobic metabolism pathways, including the TCA cycle and oxidative phosphorylation, and increased abundance of glycolysis and other oxygen-independent carbohydrate metabolism pathways, in current smokers compared with never smokers. This finding suggests that cigarette smoke creates an environment favoring strict or facultative anaerobes over strict aerobes. At the genus and OTU level, we observed increased abundance of Streptococcus in current smokers; members of the Streptococcus genus are facultative or obligate anaerobes (Patterson, 1996) and generally acid tolerant, which may explain their success in the smoking environment. Additionally, we observed smoking-related increases in the anaerobic Veillonella genus and Actinobacteria OTUs from anaerobic Actinomyces spp., Rothia mucilaginosa, Bifidobacterium longum and Atopobium spp. Conversely, aerobes such as Neisseria subflava and Corynebacterium were depleted in smokers. Consistent with the oxygen deprivation hypothesis, Mason et al. observed higher abundance of anaerobes and lower abundance of aerobes in subgingival plaque samples of smokers compared with non-smokers (Mason et al., 2015). Interestingly, we observed depletion of certain anaerobic OTUs in smokers as well, including Leptotrichia spp., Veillonella parvula and Peptostreptococcus sp. It is possible that these bacteria were depleted due to specific antibiotic toxicants in cigarette smoke, or depleted indirectly due to competition for colonization with smoking-enriched bacteria or co-aggregation with smoking-depleted bacteria. Because this is an observational study, we cannot determine which of the altered taxa are directly affected by cigarette smoke or indirectly affected through microbe–microbe interactions.

Aside from creating an anaerobic, acidic and/or selectively toxic environment, smoking is also known to have prominent effects on human immunity (Sopori, 2002), which can in turn influence the host's ability to stave off colonization by pathogens. The chemotactic mobility and phagocytic function of oral polymorphonuclear leukocytes are diminished in smokers (Noble and Penny, 1975; Kenney et al., 1977; Archana et al., 2015); as these cells are crucial to the host defense against pathogens, smoking inherently promotes a more pathogen-friendly oral ecosystem, thus increasing risk for oral disease (e.g. periodontitis). Several of our findings are consistent with progression towards a diseased state: Neisseria and Eikenella are depleted in oral mucosa from periodontitis patients (Mager et al., 2003), and Streptococcus species are more abundant in periodontal disease-progressing oral sites than healthy oral sites (Yost et al., 2015). However, most of the taxa typically implicated as periodontal pathogens were not affected by smoking in the current study, despite smoking being a strong risk factor for periodontitis. Our use of oral mouthwash samples rather than subgingival samples may account in part for this discrepancy.

The depletion of certain xenobiotic biodegradation pathways in current smokers suggests important functional losses with potential health consequences. The oral bacteria are first to come into contact with cigarette smoke as it enters the human body, and may play an important role in degrading the accompanying toxic compounds. We observed that functional pathways relating to toluene, nitrotoluene, styrene, chlorocyclohexane and chlorobenzene degradation were depleted in current smokers, as was cytochrome P450 xenobiotic metabolism. Conversely, polycyclic aromatic hydrocarbon and xylene degradation were enriched in current smokers. These chemicals are components of cigarette smoke (Rodgman and Perfetti, 2013), and thus alterations in the ability of the oral community to degrade these substances may have toxic consequences for the host. It is surprising that some of the xenobiotic degradation pathways are depleted in smokers, given the need for bacterial upregulation of these pathways to detoxify cigarette smoke. This result is also in contrast with a metagenomic study which observed increased cytochrome P450 xenobiotic metabolism in smokers (Boyle et al., 2010). A simple explanation for our finding is that these pathways are carried out in bacteria that were depleted in smokers. Alternatively, the toxic compounds themselves may saturate the enzymes responsible for their degradation, thus killing the bacteria possessing these enzymes. Although the long-term effects of smoking-related oral dysbiosis remain unclear at this time, oral bacteria are known to play an important role in both oral and systemic diseases (Wade, 2013; Olsen, 2015). It is therefore not unreasonable to imagine that changes in the oral bacterial community due to smoking may have detrimental health effects.

We additionally observed that oral bacteria abundances were generally similar between former and never smokers, implying that specific bacteria depleted by smoking may be restored following smoking cessation. Interestingly, a small number of OTUs were identified as differentially abundant between former and never smokers, including a few that were altered in the same direction as in current smokers. This finding may indicate some minor lingering effects of smoking. We did not observe an association between years since smoking cessation and bacterial class relative abundances in analyses restricted to former smokers. The absence of a clear trend with years since quitting among former smokers may be due to restoration of the oral microbiome occurring relatively quickly following smoking cessation, for example, during the first year or two immediately after quitting. We had insufficient data on the precise timing of smoking cessation to examine potential trends in oral microbiome composition during the period within a few years of quitting. Moreover, the effect of smoking cessation on the oral microbiome would be better studied with longitudinally collected oral wash samples, which would allow for within-person comparison of the oral microbiome pre- and post-smoking cessation. A few studies have examined the effect of smoking cessation on subgingival plaque bacteria longitudinally (Fullmer et al., 2009; Delima et al., 2010); however, these studies were limited by the extent of bacterial profiling. A longitudinal investigation of smoking cessation involving more extensive bacterial profiling of the oral microbiome (i.e. 16S rRNA gene sequencing) will be important to determine which taxa recolonize the oral environment after smoking cessation.

In summary, in this large study of the human oral microbiome, we observed that smoking is related to overall oral microbiome community composition, and to the abundance of many taxa. Smoking may promote an anaerobic oral environment and a bacterial community with reduced xenobiotic degradation capabilities. Strengths of this study include the large sample size, the ability to check replication of findings in four data sets and the control of potential confounders. This study was limited by lack of metagenomic data to determine the actual gene content of bacteria altered by smoking and lack of longitudinal samples pre- and post-smoking cessation. Additionally, due to the elderly age of our study participants, our findings may not be generalizable to younger populations, particularly since the oral microbiome changes with age (Xu et al., 2015). Future studies should investigate the impact of smoking on the metagenomic content of the oral microbiome, and whether smoking-related oral bacterial and/or metagenomic changes mediate the health effects of smoking.