The human body is home to many indigenous microorganisms, with distinct communities at different anatomical sites (Dethlefsen et al., 2007). Recent studies have shown the importance of the gut microbiota in digestion, fat storage, angiogenesis, immune system development and response, colonization resistance and epithelial architecture (reviewed in Flint et al., 2007; Tappenden and Deutsch, 2007; Cogen et al., 2008). The oral cavity is also home to microbial communities, with important implications for human health and disease. Chronic periodontitis is one of the most common inflammatory conditions worldwide, and is associated with bacterial community structures that are distinct from those of health.

Efforts to characterize microbial diversity increasingly rely on cultivation-independent molecular techniques (Hugenholtz, 2002; Schloss and Handelsman, 2004), as the vast majority of bacteria have yet to be cultivated. Most of these molecular studies are based on the small subunit (16S) ribosomal RNA (rRNA) gene because of its universal presence in cellular organisms, the presence of conserved regions and its reliability for phylogenetic analysis (Woese and Fox, 1977). Recent molecular surveys of the human distal gut microbiota have shown that each individual gut is home to 500–3000 bacterial species, with a large degree of interindividual variation (Eckburg et al., 2005; Dethlefsen et al., 2007, 2008). Using rRNA gene-based techniques, it is estimated that the human oral cavity harbors 500–700 different bacterial species (Kroes et al., 1999; Paster et al., 2001; Kazor et al., 2003; Aas et al., 2005; Dewhirst et al., 2008). A recent study based on 14 115 partial 16S rRNA gene sequences obtained from saliva specimens from 120 healthy individuals from 12 different geographic locations around the world found 101 different bacterial genera, with a high level of interindividual variation (Nasidze et al., 2009). Two recent 16S rRNA gene-tag pyrosequencing-based studies have suggested that there are 250–300 species-level phylotypes in the mouth of any given individual, and that they segregate based on mucosal versus dental surfaces (Keijser et al., 2008; Zaura et al., 2009). All three of these recent studies are limited by their dependence on relatively short (<500 nucleotides) sequences, and hence, by limited phylogenetic resolution.

We analyzed 1000 near full-length-cloned 16S rRNA gene sequences from each of 10 individuals with healthy oral tissues and gingiva, and examined variation in patterns of diversity between individuals.

Materials and methods

Subjects and specimen collection

Specimens were collected from 10 individuals with healthy oral tissues and gingiva (five women; age range 27–61 years; average age 38.1 years; ethnicity: six Caucasian, one Afro-American, two Chinese and one from India). Oral health status of all individuals was determined by a dentist who performed a full-mouth clinical examination that included inspection of the teeth, oral mucosa and periodontal tissues. All participants had normal oral mucous membranes and were free from nonrestored carious lesions. At most sites, periodontal tissues showed no clinical signs of inflammation, such as redness, swelling or bleeding on probing, and were judged to be free of gingivitis or periodontitis. Details of the periodontal data obtained from sites from which plaque specimens were taken are provided in Table 1 and Supplementary Methods. From each individual, 26 oral specimens were collected. Separate dental plaque specimens were taken with sterile curettes from supragingival and subgingival surfaces of seven target teeth (no. 3, 9, 12, 19, 25, 28 and 30). The 26th sample consisted of whole saliva that was expectorated into a test tube. Healthy human mouths have relatively little bacterial biomass compared with the gastrointestinal tract; therefore, because the ultimate purpose of this project was to obtain community-wide shotgun sequence data, specimens were pooled in order to ensure sufficient DNA. One-third of each of the 26 specimens obtained from each individual was combined to obtain 10 ‘individual-specific’ pools, whereas a separate third of each of the subgingival specimens from all 10 individuals (seven specimens per subject) was pooled to create a single ‘subgingival pool’. To study the influence of DNA isolation method in the UniFrac analysis (see below), specimens were also collected from three additional healthy individuals. These specimens were not included in downstream analyses, unless otherwise noted. 
Further details about inclusion and exclusion criteria, specimen collection and other procedures are provided in Supplementary Methods.

Table 1 Characteristics of subjects who participated in this study

DNA extraction

To extract DNA, pooled specimens were washed twice in 1 ml ice-cold phosphate-buffered saline, pelleted by 5 min centrifugation at 16 000 g at 4 °C and resuspended in 100 μl phosphate-buffered saline, to reduce the amount of contaminating free human DNA. To this suspension, 10 μl of a 10% Triton-X100 solution and 2.5 μl of a 20 mg ml−1 Proteinase K solution (Qiagen, Valencia, CA, USA) were added, and the suspension was incubated at 60 °C for 30 min. A volume of 200 μl of a cell lysis buffer (100 mM Tris-HCl (pH 7.4), 20 mM EDTA, 5 M guanidine isothiocyanate) was added. To obtain maximum bacterial diversity, we split each specimen pool into two equal portions. To one specimen portion, three sizes of baked zirconia beads were added, and the mixture was agitated in a FastPrep FP120 machine (Qbiogene, Carlsbad, CA, USA) at 4.0 m s−1 for 30 s. The bead-beaten portion was recombined with the nonbead-beaten portion. The DNA was further purified, precipitated, washed, dried and resuspended in 50 μl of 10 mM Tris (pH 8.0) (details are provided in the Supplementary Methods). Extraction controls were processed in parallel during the DNA extraction procedure to monitor contamination. A second set of pooled oral specimens from three additional healthy mouths was extracted using the QIAamp DNA mini kit (Qiagen).

16S rRNA gene amplification, cloning and sequencing

The 16S rRNA gene was amplified using broad-range bacterial-specific primers 8FM (5′-AGAGTTTGATCMTGGCTCAG-3′) (Edwards et al., 1989; Palmer et al., 2007) and 1391R (5′-GACGGGCGGTGTGTRCA-3′) (Lane et al., 1985; Palmer et al., 2007). These primers amplify 90% of the bacterial 16S rRNA coding sequence. PCR was performed as described previously (Eckburg et al., 2005), except that PCRs were performed with 5 min at 95 °C, 20 cycles of 30 s at 94 °C, 30 s at 55 °C and 90 s at 72 °C, followed by 8 min at 72 °C. To obtain sufficient PCR product for cloning, the products of four replicate 20-cycle amplification reactions were pooled. No amplification product was observed in the extraction controls and negative PCR controls. Purified PCR products were cloned with the TOPO TA cloning kit (Invitrogen, Carlsbad, CA, USA), and plasmid inserts were sequenced on both strands.
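
Both primers contain one degenerate position (M in 8FM, R in 1391R). As a minimal sketch, assuming the standard IUPAC meanings of these codes (M = A or C, R = A or G), the following Python snippet lists the unambiguous oligonucleotides that each primer represents. The function name `expand_primer` is illustrative and not part of any published tool.

```python
from itertools import product

# Subset of the IUPAC nucleotide code table needed for these two primers.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

def expand_primer(seq):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in seq]
    return ["".join(bases) for bases in product(*choices)]

PRIMER_8FM = "AGAGTTTGATCMTGGCTCAG"   # forward primer from the text
PRIMER_1391R = "GACGGGCGGTGTGTRCA"    # reverse primer from the text

variants_8fm = expand_primer(PRIMER_8FM)
variants_1391r = expand_primer(PRIMER_1391R)

for name, variants in (("8FM", variants_8fm), ("1391R", variants_1391r)):
    print(name, variants)
```

Each primer therefore corresponds to a mixture of two oligonucleotides.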

Phylogenetic analysis

A total of 11 447 high-quality, 1400 bp-length, 16S rRNA gene sequences were aligned with the online Greengenes NAST aligner (DeSantis et al., 2006) and inserted into the Greengenes version of ARB (Ludwig et al., 2004). The alignment was further refined by manual optimization. In total, 79 chimeras (0.7%) were manually identified and removed from the analysis, so that 11 368 sequences were included in the final analysis. Operational taxonomic units (OTUs; phylotypes) were defined using a 99% sequence similarity cutoff, by using similarity matrices and a filter of 1253 nucleotide positions, masking out the hypervariable regions. The 99% cutoff in this setting roughly corresponds to species-level groupings. One representative for each of the 247 OTUs found in this study was deposited in GenBank (accession numbers FJ976202 to FJ976448) (Supplementary Table S1). Sequences with less than 99% similarity to sequences in public databases were considered novel (Supplementary Table S2). Genus names were assigned based on placement of sequences within defined groups, or on a cutoff of 95% sequence identity in the case of unclassifiable sequences. The DOTUR and mothur packages were used to calculate the number of OTUs at different cutoffs, and to calculate collector’s curves and the Chao1 species richness (Schloss and Handelsman, 2005; Schloss et al., 2009).
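
For readers unfamiliar with the Chao1 richness estimator mentioned above, the following Python sketch shows the bias-corrected form, S_obs + F1(F1 − 1)/(2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and exactly twice. Which variant DOTUR and mothur applied to this data set is not stated here, so the choice of the bias-corrected form is an assumption, and the counts below are toy values rather than data from this study.

```python
def chao1(abundances):
    """Bias-corrected Chao1 richness estimate from per-OTU sequence counts."""
    counts = [c for c in abundances if c > 0]
    s_obs = len(counts)                          # observed number of OTUs
    f1 = sum(1 for c in counts if c == 1)        # singletons
    f2 = sum(1 for c in counts if c == 2)        # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Toy library: 7 observed OTUs, 3 singletons, 2 doubletons (hypothetical).
toy_counts = [10, 5, 2, 2, 1, 1, 1]
estimate = chao1(toy_counts)
print(estimate)
```

The estimate exceeds the observed OTU count only when singletons are present, which is why libraries with many rare phylotypes yield larger richness estimates.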

Estimates of microbial diversity

Richness estimates and diversity indices were determined (Simpson and Shannon formulae) with EstimateS (Colwell, 2005). The percentage of coverage was calculated by Good's method using the formula [1 − (n/N)] × 100, where n is the number of phylotypes in a specimen represented by one clone (singletons) and N is the total number of sequences in that specimen (Good, 1953). The Shannon index of evenness was calculated using the formula E = e^D/N, where D is the Shannon diversity index.
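
The two formulas above can be sketched in Python as follows. This is an illustration on toy counts, not the EstimateS implementation. One assumption is labeled explicitly: in the evenness formula, the denominator is taken to be the number of observed phylotypes, as in the usual definition of Shannon evenness, whereas in Good's formula N is the total number of sequences as stated in the text.

```python
import math

def goods_coverage(counts):
    """Good's coverage in percent: [1 - (n/N)] * 100."""
    n_total = sum(counts)                          # N: total sequences
    singletons = sum(1 for c in counts if c == 1)  # n: phylotypes seen once
    return (1 - singletons / n_total) * 100

def shannon_index(counts):
    """Shannon diversity index D (natural logarithm)."""
    n_total = sum(counts)
    return -sum((c / n_total) * math.log(c / n_total) for c in counts if c > 0)

def shannon_evenness(counts):
    """E = e^D / N, with N assumed to be the number of observed phylotypes."""
    richness = sum(1 for c in counts if c > 0)
    return math.exp(shannon_index(counts)) / richness

# Hypothetical clone library: 100 sequences in 6 phylotypes, 2 singletons.
toy_library = [50, 30, 15, 3, 1, 1]
coverage = goods_coverage(toy_library)
evenness = shannon_evenness(toy_library)
print(coverage, evenness)
```

Under this reading, a perfectly even library gives E = 1, and libraries dominated by a few phylotypes give values closer to 0.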

UniFrac analysis

After calculating, with an Olsen correction, a neighbor-joining tree containing representatives of all 247 OTUs found in this study, the 11 different oral environments were clustered using principal coordinates analysis, as enabled in UniFrac (Lozupone et al., 2006), using weighted, normalized abundance data. To compare sequence data obtained from oral specimens in this study against data obtained from other locations in the human body from subjects in previously published studies, UniFrac principal coordinates analysis was also performed on a second data set. These combined data included data obtained from the 11 oral pools of this study, 3 additional oral pools from healthy human mouths isolated using a different DNA extraction method (QIAamp DNA mini kit from Qiagen; 1034 sequences; unpublished data), 18 colonic biopsy and 3 stool specimens from 3 healthy subjects (Eckburg et al., 2005) and 15 stool specimens from 3 healthy subjects in an antibiotic perturbation study (Dethlefsen et al., 2008).

Community comparisons

Community composition was examined in two separate ways. First, the communities were compared using shared species estimators. Second, community assembly was examined using taxon co-occurrence. The Chao–Jaccard abundance-based similarity index is a shared species estimator that measures the probability that two randomly chosen individuals, one from each of two specimens, both belong to species shared by both specimens (Chao et al., 2005). This index can only be used to examine similarity between two communities at a time. The Chao–Jaccard similarity index was calculated using EstimateS (Colwell, 2005) for all possible pairwise comparisons of the communities from the 10 mouths. Community similarity was compared at two taxonomic levels: OTU and genus. The subgingival pool was not included in this analysis.
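
As a rough illustration of the raw (unadjusted) form of this index, the sketch below computes UV/(U + V − UV), where U and V are the summed relative abundances of the shared taxa in the first and second specimen, respectively. It does not reproduce the estimated index, which additionally corrects for unseen shared taxa, and it is not the EstimateS code. Taxon counts are hypothetical.

```python
def chao_jaccard_raw(counts_a, counts_b):
    """Raw abundance-based Jaccard index for two specimens.

    counts_a and counts_b map taxon name -> number of sequences.
    """
    shared = {t for t, c in counts_a.items() if c > 0 and counts_b.get(t, 0) > 0}
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    u = sum(counts_a[t] for t in shared) / total_a  # shared fraction, specimen A
    v = sum(counts_b[t] for t in shared) / total_b  # shared fraction, specimen B
    denominator = u + v - u * v
    return 0.0 if denominator == 0 else (u * v) / denominator

# Hypothetical genus-level counts for two mouths.
mouth_a = {"Streptococcus": 60, "Neisseria": 30, "Rothia": 10}
mouth_b = {"Streptococcus": 50, "Neisseria": 25, "Prevotella": 25}
similarity = chao_jaccard_raw(mouth_a, mouth_b)
print(round(similarity, 3))
```

Because the index weights taxa by abundance, two specimens that share only their dominant taxa can still score high, which is consistent with the higher similarities reported at the genus level than at the OTU level.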

In addition to community similarity, we tested for nonrandom patterns of taxon co-occurrence by calculating C-scores for this data set (Stone and Roberts, 1990). This measure of community structure calculates the number of checkerboard units (specimens in which two taxa are not found together) between all possible taxon pairs in a matrix, and calculates a single score for the entire data set. The C-score is the average for all of the possible pairs in the matrix. This measure is compared with a null distribution of random matrices of the same size. If the observed C-score is larger than the score for the null hypothesis, it suggests significant segregation between taxa, and if the observed C-score is smaller than the score for the null hypothesis, it suggests significant aggregation between taxa. In this case, we calculated C-scores using an abundance matrix of all taxa, organized by mouth (Supplementary Table S1), which was then converted to a presence/absence matrix. The subgingival pool was not included in this analysis. These scores were compared with those generated from a null model based on 500 randomly generated matrices of the same size using the program EcoSim (Gotelli and Entsminger, 2004). Co-occurrence patterns were examined at three separate taxonomic levels—OTU level (n=247, approximately species level), genus level (n=53) and phylum level (n=9).


Results

Bacterial diversity of the healthy human mouth

From each of 10 individuals with a healthy oral status, 26 specimens from different parts of the mouth were collected. Portions of the specimens were pooled per individual and an 11th pool was constructed with portions of each subgingival specimen from all 10 individuals. Ribosomal RNA gene sequences were amplified using broad-range bacterial primers, cloned and sequenced. The 11 368 near full-length, nonchimeric sequences of the combined data set were manually assigned to 247 OTUs (phylotypes) using a cutoff of 99% sequence identity (Supplementary Table S1). DOTUR and mothur analyses revealed a total of 228 OTUs at this cutoff level, with an expected OTU richness of 236 (Supplementary Figure S1, which also shows the rarefaction curves of each of the 11 clone libraries). A graph of the DOTUR-determined number of phylotypes versus the phylogenetic distance showed the typical ‘hockey stick shape’ that is found in most animal-associated bacterial communities, with an enriched representation of diversity at the tip (Supplementary Figure S2). Nine bacterial phyla were identified within the combined data set (Figure 1). Of these, Firmicutes (33.2% of all sequences; mean abundance in 11 pools is 32.2±8.1%), Proteobacteria (27.5% in combined set; mean 24.6±8.1%), Bacteroidetes (16.6%; mean 14.6±8.4%) and Actinobacteria (14.5%; mean 11.9±10.3%) were the most abundant. Less abundant phyla included Fusobacteria (6.7%; mean 5.6±4.1%), TM7 (1.3%; mean 0.52±1.3%), as well as Spirochaetes, OD2 and Synergistes (all <1%). Figure 2 displays a phylogenetic tree and relative abundance of all genera found in this study. In the combined data set, Streptococcus was the most abundant genus (2180 sequences; 19.2% of total). Other abundant genera include Haemophilus (1325; 11.7%), Neisseria (1042; 9.2%), Prevotella (974; 8.6%), Veillonella (973; 8.6%) and Rothia (820; 7.2%). However, the genera and species that dominate the mouth vary between individuals (see below).

Figure 1

Relative abundance of phylum members of the oral communities from 10 healthy individuals. A total of 11 368 bacterial rRNA gene sequences derived from pools of specimens from different oral habitats, from each of 10 healthy individuals (numbered 1–10), as well as from a pool of all subgingival samples (S), were analyzed and assigned to phyla (color-coded, according to the scheme at the right). ‘Total’ refers to the combined set of sequences from all pools. The number of clones in each rRNA gene library is given below the name of the pool.

Figure 2

Phylogenetic relationships and relative abundance of the genera found in pools of oral specimens. (a) Phylogenetic tree for the 247 OTUs found in this study, grouped by genus. A 95% sequence similarity threshold was used for unclassified groups. The tree was constructed by neighbor-joining analysis with an Olsen correction. Bootstrap values that met the cutoff of 50 (expressed as percentages of 1000 replicates) are shown at branch points. The scale bar represents evolutionary distance (10 substitutions per 100 nucleotides). (b) Relative abundance of genera in each of the 11 oral specimen pools displayed with gray scale values (white, 0% present; black, 100% of clone library; exact scale shown at the bottom). 1–10, each of the individual subject pools; S, subgingival; T, total. Genera are shown in the same order as in (a).

Novel sequences

Using a 99% sequence identity cutoff, 24 OTUs (10%) were considered novel (Supplementary Table S2). Of these, six had less than 97% sequence identity to published sequences (Table 2). The sequences with the least identity to previously reported sequences were clone 10B928 (phylum Bacteroidetes), which was 92.5% identical to AF371900 (isolated from the intestinal tract of a pig; Leser et al., 2002), and clone 7BB842 (phylum OD2), which displayed 92.5% sequence similarity to its closest neighbor, AB243989 (detected in a Japanese oil well; unpublished data).

Table 2 Novel OTUs found in this study

Comparisons between bacterial communities in the 11 oral pools

Observed bacterial richness was highest in subject 4, in whom the highest number of OTUs, singletons and doubletons was found (Table 3). In contrast, both Shannon and Simpson estimators of bacterial diversity were the highest for subject 3. This subject also showed the highest Shannon estimator of evenness. Good's estimator suggested >95% coverage for each of the 11 libraries, indicating that fewer than five additional OTUs would be expected if 100 additional clones were sequenced. UniFrac analysis showed no clustering of the oral communities from the 10 individuals based on gender, age or ethnic background (Figure 3a). Pairwise comparisons of the oral pools showed that all individuals were equally distinct (Bonferroni-corrected P-values all >0.5).
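
To make the scale of the multiple-testing correction concrete, the sketch below counts the pairwise comparisons among 11 pools and applies a plain Bonferroni adjustment (each P-value multiplied by the number of tests, capped at 1). How UniFrac applies the correction internally is not described here, so this is an assumption-laden illustration, and the P-values are hypothetical.

```python
from math import comb

def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

N_POOLS = 11                      # 10 individual pools plus the subgingival pool
n_pairs = comb(N_POOLS, 2)        # number of distinct pairwise comparisons

# Hypothetical raw P-values for three comparisons.
adjusted = bonferroni([0.01, 0.02, 0.5])
print(n_pairs, adjusted)
```

With this many comparisons, a raw P-value must be below roughly 0.001 to remain below 0.05 after correction.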

Table 3 Estimators of sequence library diversity, evenness and coverage
Figure 3

Variation in patterns of diversity. UniFrac principal coordinates analysis was performed using weighted, normalized abundance data (Lozupone et al., 2006). (a) Analysis of oral specimen pools from each of 10 healthy subjects (white circles, females, n=5; gray circles, males, n=5), and one pool of the subgingival specimens from all of these 10 subjects (black circle). (b) Analysis of the oral specimen pool data obtained from this study (white circles, n=11), data from additional oral specimen pools extracted with a different DNA extraction method (gray circles, n=3, unpublished data) and previously published data from human colon samples (gray squares, n=18, Eckburg et al., 2005) and human stool samples (gray triangles, n=3, Eckburg et al., 2005; black triangles, n=15, Dethlefsen et al., 2008). All sequences were compared using the same alignment and 1253-nucleotide filter.

Microbiota from human oral cavity is distinct from that of other human habitats

We compared the oral bacterial communities described in this study with those found in previously published studies of the human colon and stool (Figure 3b). Although these specimens were derived from different studies and different individuals (except for certain stool and colonic specimens that were derived from the same three individuals), specimens from different anatomical sites clustered in a distinct fashion; the corrected UniFrac significance (all environments together) was 0.01, indicating that the environments were significantly different from each other. Three additional oral communities from QIAamp-extracted specimens (CDL, unpublished results) clustered with the 11 communities from 11 benzyl alcohol-extracted specimen pools described in this study, suggesting that DNA extraction method accounts for less variation in the composition of communities than do differences between individuals.

Shared taxa among the bacterial communities of the healthy human mouth

The different bacterial communities were compared using the Chao–Jaccard abundance-based similarity index. An average of 50.5 (range 29–76) OTUs were found to be shared between any two specimens (Table 4a). Similarity between communities was typically low, averaging 0.671 (range 0.501–0.801) with the raw index and 0.760 (range 0.533–0.969) with the estimated index (Table 4b). When community similarity was examined on the genus level, observed shared genera averaged 25 (range 18–34) (Table 4a) and the Chao–Jaccard abundance similarity averaged 0.942 (range 0.845–0.988 for raw) and 0.963 (range 0.845–1 for estimated) (Table 4b). A value of 1 indicates that all genera are shared between the two specimens examined. A total of 15 bacterial genera were observed in all 10 healthy individuals: Neisseria, Cardiobacterium, Haemophilus and Campylobacter (Proteobacteria); Streptococcus, Granulicatella and Veillonella (Firmicutes); Fusobacterium (Fusobacteria); Rothia, Actinomyces, Corynebacterium and Atopobium (Actinobacteria); and Prevotella, Capnocytophaga and Bergeyella (Bacteroidetes). Every individual also contained TM7 sequences (Figure 4). All of these bacterial taxa were also present in the pooled subgingival library. Of these shared genera, species belonging to eight were present in all 10 individuals, leading to eleven shared bacterial species: Haemophilus parainfluenzae, Streptococcus oralis, Streptococcus sanguinis, Granulicatella adiacens, Veillonella parvula, Veillonella dispar, Rothia aeria, Actinomyces naeslundii, Actinomyces odontolyticus, Prevotella melaninogenica and Capnocytophaga gingivalis.

Table 4a Observed, shared OTUs (to the left and below the diagonal, in bold) and genera (above and to the right of the diagonal, italicized) for each subject pair
Table 4b Chao–Jaccard abundance-based similarities between each pair of subjects (raw values)
Figure 4

Schematic depiction of oral community membership among 10 healthy individuals. Inner circle, bacterial genera found in all 10 individuals (100%); second circle, present in 6–9 out of 10 individuals (51–99%); third circle, present in 3–5 individuals (21–50%); outer circle, present in 1–2 individuals (1–20%). Genera are grouped according to phylum.

Interindividual differences among the bacterial communities of the human mouth

Despite conserved oral bacterial community composition at the genus level, there were also interindividual differences. Several different patterns of genus dominance were found in the 10 healthy mouths. Of the 10 mouths, 5 were dominated by Streptococcus species (nos. 2, 5, 7, 9 and 10). Two mouths were dominated by Prevotella (nos. 1, 4), and one each was dominated by Neisseria (no. 3), Haemophilus (no. 8) and Veillonella (no. 6) (Supplementary Figure S3). In addition, even among the genera present in all 10 healthy individuals, the presence of particular species within that genus was variable between individuals. For example, although every subject had sequences belonging to the genus Neisseria, no single Neisseria species was shared across all subjects. The same was true for species in the genera Fusobacterium and Corynebacterium.

Co-occurrence of bacterial taxa

Co-occurrence analysis was performed on the data obtained from the 10 individual subjects, using the C-score of Stone and Roberts (1990), which compares the taxon distribution of a data set to a randomized distribution of the same number of taxa. This method calculates the checkerboard units for each taxon pair (how often those two taxa are not found together). When analyzed at the level of OTU, the observed C-score was not significantly different from the null hypothesis (random distribution). When the same data were analyzed at the genus level, the C-score indicated that the communities display co-occurrence patterns significantly different from the null hypothesis (observed C=0.99184, expected C=0.95366; P=0.02860). The observed score (higher than expected) suggested segregation or competition among taxa. Examination of the matrix of checkerboard units between each taxon pair can pinpoint taxa that are more or less likely to be found together. Figure 5 displays the taxa pairs as a matrix of C-scores. Taxa with low C-scores (found together frequently) are colored white, whereas those with high C-scores (rarely or never found together) are colored black. Genus pairs in which both genera are found in all mouths, such as pairs among Streptococcus, Neisseria and Haemophilus, have zero checkerboard units, as expected. When examining the genus pairs with high checkerboard units, the genus Abiotrophia was identified as unlikely to be found together with the genera Dialister, Oribacterium, Eubacterium and Treponema. In addition, the genus Scardovia was unlikely to be found with Eikenella or Dialister. Because it may be inappropriate to compare this broad range of bacterial taxa in a single analysis (owing to the fact that members of different phyla may not be in competition), we re-analyzed the OTU-level data, this time comparing patterns only within a given phylum, with C-scores again calculated from presence/absence of all OTUs in that phylum. This was repeated for each phylum, except for OD2 and Synergistes, owing to the few observations in each of these two groups. This OTU-level, within-phylum analysis revealed that only the taxa within Firmicutes showed a C-score significantly different from the null hypothesis (observed=2.1370, expected=2.08243; P=0.03460), suggesting segregation of species and evidence of possible competitive species interactions.
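
The C-score calculation can be illustrated with a short Python sketch. For a taxon pair, the number of checkerboard units is (r_i − S)(r_j − S), where r_i and r_j are the numbers of mouths containing each taxon and S is the number of mouths containing both (Stone and Roberts, 1990); the C-score is the mean over all taxon pairs. The presence/absence rows below are hypothetical, and the null-model randomization performed in EcoSim is not reproduced here.

```python
from itertools import combinations

def checkerboard_units(row_i, row_j):
    """(r_i - S) * (r_j - S) for two presence/absence rows of equal length."""
    shared = sum(1 for a, b in zip(row_i, row_j) if a and b)
    r_i = sum(1 for a in row_i if a)
    r_j = sum(1 for b in row_j if b)
    return (r_i - shared) * (r_j - shared)

def c_score(matrix):
    """Mean number of checkerboard units over all taxon pairs (rows)."""
    pairs = list(combinations(matrix, 2))
    return sum(checkerboard_units(a, b) for a, b in pairs) / len(pairs)

# Hypothetical taxa across 10 mouths (1 = present, 0 = absent).
taxon_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # found in 4 mouths
taxon_b = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]   # found in 4 other mouths
taxon_c = [1] * 10                         # found in every mouth

units_ab = checkerboard_units(taxon_a, taxon_b)
units_ac = checkerboard_units(taxon_a, taxon_c)
score = c_score([taxon_a, taxon_b, taxon_c])
print(units_ab, units_ac, score)
```

In this toy matrix, two taxa that never co-occur and are each present in four mouths yield 16 checkerboard units, whereas any pair involving a taxon present in every mouth yields zero.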

Figure 5

Checkerboard (C) scores for each possible combination of two genera. The C-scores are shown in gray scale. White depicts a C-score of 0 for genera always found together. Darker colors show higher C-scores for genera that co-occur less frequently than expected. The highest C-score in this data set, 16, was found for the Abiotrophia–Treponema genus pair, two fairly abundant genera never found together. Genera are ordered according to their overall abundance in the 10 individual mouth pools. The numbers after the genus names indicate the number of individuals (out of 10) in which that genus was found. The 16 taxa that were found in all 10 individuals, as expected, show a C-score of 0 (white). Data obtained from the subgingival pool were not included in this analysis.


Discussion

The composition of the microbial communities on and within the human body varies between individuals. Interindividual variation has been shown in a variety of studies for the healthy intestinal tract (Eckburg et al., 2005; Dethlefsen et al., 2006; Ley et al., 2006; Palmer et al., 2007). In contrast, knowledge about the interindividual differences in the healthy human mouth microbiota and the uniqueness of the oral microbiota compared with other microbial communities in our bodies is still somewhat sparse. Several molecular studies have been carried out regarding the composition of the oral microbiota, but these studies used limited numbers of sequences per individual or only looked at short regions of the 16S rRNA gene (Kroes et al., 1999; Paster et al., 2001; Kazor et al., 2003; Aas et al., 2005). A study of three individuals by Diaz et al. (2006) showed that early colonization of enamel is subject specific. The distinctness of the phylogenetic structure of the human oral microbiota in relation to the microbiota of the skin and feces in nine individuals was revealed in a recent study (Costello et al., 2009). Although other studies have considered the oral microbiota of a larger number of individuals, our study was based on one of the largest sets of near full-length sequences per individual to date for the human oral cavity. The most important contributions of this work are the combination of depth of coverage and degree of phylogenetic resolution for the human mouth, the features of a human oral core microbiota and previously unrecognized patterns of taxon co-occurrence.

In this study, we amplified and analyzed an average of 1029 near full-length, well-aligned oral 16S rRNA gene sequences (range 931–1070) per subject from each of 10 healthy individuals, as well as an additional 1083 clones from the pooled subgingival specimens, bringing the total number of sequences analyzed in this study to 11 368. The advantage of near full-length 16S rRNA gene sequences in providing greater phylogenetic resolution than hypervariable region ‘tag’ sequences was highlighted in a comparative analysis of these two types of sequence data (Huse et al., 2008). In this data set, we identified a total of 247 different OTUs at the level of species, of which 24 were less than 99% identical to previously published sequences. Approximately 10% of the OTUs found in this study were thus previously uncharacterized.

The abundant bacterial groups found in our study are similar to those found in most other studies. For example, nearly 20% of our sequences belonged to the genus Streptococcus, confirming the preponderance of Streptococcus species within a healthy mouth previously shown by microscopy and culture (Socransky, 1963) and by molecular methods (Kroes et al., 1999). In a recent molecular study, the most predominant bacterial genera in the oral cavity were Streptococcus, Gemella, Abiotrophia, Granulicatella, Rothia, Neisseria and Prevotella (Aas et al., 2005). We found those same groups to be prevalent as well, but, in addition, we found many Proteobacteria (for example, Haemophilus and Lautropia) to be abundant. This difference may be the result of a deeper sequencing effort per individual in the current study (an average of approximately 518 clones per subject in the Aas et al. study, based on a total of 2589 clones from five subjects, in contrast to an average of 1029 clones per individual in this study). In addition, different DNA extraction methods and different broad-range PCR primers could also explain the divergent results.

Despite the evidence for a conserved healthy oral community at the genus level in all 10 healthy mouths, there was also evidence in this study for large interindividual differences. Our study confirms results by Nasidze et al. (2009), suggesting high variability in the oral microbiome between individuals, although in the latter study, saliva was the only specimen-type examined. In addition to Streptococcus, which was the most abundant genus in the combined data set and in five of the individual mouths, we identified four additional genera that may dominate the oral ecosystem of a healthy subject. Our data indicate that there are various alternative oral bacterial community structures and a greater degree of variation in patterns of diversity associated with oral health than previously thought. It remains to be seen what factors, for example, human genetics or lifestyle, correlate with oral bacterial community structure. Clearly, the concept of a core oral microbiome may be better defined with measurements of community function rather than community membership. Such analyses will need to include community-wide assessments of gene content, gene transcript abundance and protein products.

The role of bacteria in periodontal disease is complex, and likely involves polymicrobial consortia (Lepp et al., 2004). Socransky and Haffajee have proposed that the presence of a high proportion of so-called ‘red complex’ bacteria, that is, Porphyromonas gingivalis, Tannerella forsythia and Treponema denticola, is associated with periodontal disease (Socransky et al., 1998; Haffajee et al., 2008). In a survey of five healthy mouths, Aas et al. (2005) did not find any representatives of the ‘red complex’. Other studies have, however, identified members of this complex in healthy mouths (Ximenez-Fyvie et al., 2000). In our study, all three species were found in subjects with healthy gingival tissues, although in low numbers and limited to subjects 1, 4 and 9. Taken together with previous studies, this study confirms that the ‘red complex’ group may be found in small numbers in healthy individuals. Other bacterial species such as Filifactor alocis, Selenomonas species and Dialister species have been associated with a worsening periodontal status (Kumar et al., 2005). A bacterial species previously shown to be associated with periodontal health (Veillonella parvula, Veillonella X042, GenBank accession number AF287781) (Kumar et al., 2005) was found in all specimens in this study, and was the third most abundant OTU in our combined sequence data set.

UniFrac principal coordinates analysis showed no apparent clustering of oral microbial communities based on gender, age or ethnicity. In addition, UniFrac analysis showed no apparent effect of DNA extraction method of oral specimens. No individual pool was found to be more significantly different than others in pairwise comparisons, and the subgingival library was not significantly different from the individual pools. This may be indicative of the fact that (1) despite the many different habitats in the human mouth, many bacterial species are shared among those habitats or (2) that the individual pools are dominated by the subgingival specimens. However, the number of subjects in this study was relatively small, and interindividual differences associated with gender, age or ethnicity might become apparent when larger numbers of subjects are studied. Because specimens from multiple sites within an individual were pooled, bacterial community differences between anatomical sites could not be examined.

When the oral sequence libraries were compared with similar sequence libraries from the human colon and stool, a clear clustering according to anatomical site was observed. These results need to be interpreted with caution, as data were obtained from different individuals and differences between study groups might drive some of the findings. Nevertheless, it is appealing to hypothesize that each anatomical location within a healthy human has specific physicochemical conditions that shape the composition of a microbial community specifically adapted to that site. Our finding of human habitat-specific microbial community structure is supported by recently published data (Costello et al., 2009).

Tests for significant segregation patterns of taxa were originally developed as a means of assessing whether competition between taxa is a driving force behind community assembly. C-scores higher than expected are consistent with inter-species competition, but also with habitat differences that cut across the sampling scheme and with historical processes. We believe that habitat differences (other than host genotype) were minimized in our study because the pools presumably represented multiple intra-oral sites in a consistent manner across individuals. However, successional or early historical differences between subjects cannot be eliminated as a possible explanation of the observed segregation patterns. It has been previously suggested that as taxonomic level is refined, C-scores become more statistically significant (Horner-Devine et al., 2007). The fact that significant segregation was found at the genus level in our study, but not at a level equivalent to species, has several possible interpretations. One possibility is that taxonomic levels are not the relevant biological units of measure. Another possibility is that the level of ecological interest and interaction in the mouth is the level that humans have chosen to label as genus, rather than species.
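To make the segregation test concrete, the following is a minimal sketch of how a C-score can be computed from a presence/absence matrix and compared against a null distribution. It assumes the standard definition of checkerboard units for a taxon pair, (Ra - S)(Rb - S), where Ra and Rb are the numbers of subjects in which each taxon occurs and S is the number of subjects in which both occur. The function names, the toy matrix and the null model (independently shuffling each taxon's presences across subjects, which preserves taxon frequencies but not per-subject richness) are illustrative assumptions and not necessarily the procedure used in this study.

```python
import random
from itertools import combinations


def checkerboard_units(row_a, row_b):
    """Checkerboard units for two taxa: (Ra - S) * (Rb - S)."""
    shared = sum(1 for a, b in zip(row_a, row_b) if a and b)
    return (sum(row_a) - shared) * (sum(row_b) - shared)


def c_score(matrix):
    """Mean checkerboard units over all taxon pairs.

    matrix: list of rows, one row per taxon, one 0/1 entry per subject.
    """
    pairs = list(combinations(matrix, 2))
    return sum(checkerboard_units(a, b) for a, b in pairs) / len(pairs)


def null_p_value(matrix, n_iter=999, seed=0):
    """One-sided p-value: fraction of randomized matrices whose C-score
    is at least as large as the observed one.

    Assumed null model (a simplification): each taxon's presences are
    shuffled independently across subjects.
    """
    rng = random.Random(seed)
    observed = c_score(matrix)
    hits = 0
    for _ in range(n_iter):
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        if c_score(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


# Hypothetical toy data: 3 taxa (rows) across 4 subjects (columns).
toy = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]

observed, p = null_p_value(toy)
print(f"C-score = {observed:.2f}, p = {p:.3f}")
```

An observed C-score in the upper tail of the null distribution indicates more segregation than expected by chance. As noted above, this pattern alone cannot distinguish competition from habitat or historical effects.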

Co-occurrence analysis not only addresses the forces structuring a community but also draws attention to specific taxa that have apparent interactions and may be worthy of further investigation. For instance, in this study, Abiotrophia was found to have a high number of checkerboard units with the genera Dialister, Oribacterium, Eubacterium and Treponema, and the genus Scardovia had a high number of checkerboard units with Eikenella and Dialister. Interactions among these genera have not been the focus of research so far, but such research may lead us to understand whether and why these taxa compete. Each of these genera (except Treponema) is represented in this data set by a single species, each of which has been implicated in human disease; recognition of competitive partners may prove useful in preventive medicine. For instance, it has been suggested that known competitive interactions between Streptococcus mutans and other species may be exploited to develop preventive treatments for dental caries by encouraging growth of species with lower cariogenicity (Kreth et al., 2005).
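As an illustration of how taxon pairs with many checkerboard units might be singled out for follow-up, the sketch below ranks all pairs in a presence/absence table by their checkerboard units. The genus labels and presence values are hypothetical placeholders, not data from this study, and `rank_pairs` is an illustrative helper name.

```python
from itertools import combinations


def rank_pairs(presence):
    """Return (units, taxon_a, taxon_b) tuples, highest units first.

    presence: dict mapping a taxon name to a list of 0/1 values,
    one value per subject (all lists the same length).
    """
    ranked = []
    for (name_a, row_a), (name_b, row_b) in combinations(presence.items(), 2):
        # S: number of subjects in which both taxa occur
        shared = sum(1 for a, b in zip(row_a, row_b) if a and b)
        # Checkerboard units: (Ra - S) * (Rb - S)
        units = (sum(row_a) - shared) * (sum(row_b) - shared)
        ranked.append((units, name_a, name_b))
    ranked.sort(key=lambda item: -item[0])  # stable sort keeps input order for ties
    return ranked


# Hypothetical presence/absence of four genera across six subjects.
toy_presence = {
    "genus_A": [1, 1, 1, 0, 0, 0],
    "genus_B": [0, 0, 0, 1, 1, 1],
    "genus_C": [1, 1, 0, 1, 0, 0],
    "genus_D": [1, 1, 1, 1, 1, 0],
}

for units, a, b in rank_pairs(toy_presence):
    print(f"{a} vs {b}: {units} checkerboard units")
```

Pairs at the top of such a ranking are those that most often occur in different subjects. A high count suggests, but does not demonstrate, a negative interaction between the two taxa.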

This study shows that each person's mouth harbors a unique community of bacterial species, but that these communities tend to be more similar when classified at the level of genus. Ecological tools initially developed for larger organisms, such as co-occurrence analysis, will greatly facilitate the analysis of complex bacterial communities such as those found in the human body and will enhance our understanding of the role of the microbiota in health and disease.