A recent review by Vaarala et al. (2008) describes a trio of factors that create a perfect storm of events leading to autoimmunity in type 1 diabetes (T1D). These factors include an aberrant intestinal microbiota, a leaky intestinal mucosal barrier and an altered intestinal immune responsiveness. The interplay of these factors seems to have a crucial role in the onset of several allergic and autoimmune diseases, including Crohn's disease, celiac disease, T1D and multiple sclerosis (Frank et al., 2007; Wen et al., 2008; Willing et al., 2009).

Insulin-dependent T1D is a relatively common chronic autoimmune disease with no female dominance. It is usually diagnosed in young children and is caused by T-cell-mediated destruction of the insulin-producing β cells in the pancreatic islets (Harrison et al., 2008). A role for bacteria in the onset of diabetes has been shown in two murine models. For example, feeding probiotic bacterial strains, usually lactic acid bacteria, to nonobese diabetic mice or biobreeding diabetes-prone (BB-DP) rats can delay or prevent diabetes (Matsuzaki et al., 1997; Calcinaro et al., 2005; Yadav et al., 2007). Feeding antibiotics to nonobese diabetic mice or BB-DP rats can also increase survival in these models (Brugman et al., 2006; Schwartz et al., 2007). In addition, pathogen-free nonobese diabetic mice lacking an adaptor protein for multiple toll-like receptors known to bind to bacterial ligands fail to develop diabetes (Wen et al., 2008).

Roesch et al. (2009b) conducted a culture-independent analysis of gut bacteria in BB-DP and biobreeding diabetes-resistant (BB-DR) rats and showed that, at the time of diabetes onset, the bacterial communities in these two rat strains differed significantly. Stool from BB-DR rats contained much higher populations of probiotic-like bacteria, such as Lactobacillus and Bifidobacterium, whereas BB-DP rats had higher numbers of Bacteroides, Eubacterium and Ruminococcus. A total of 24 bacterial species were found to differ significantly in abundance between the BB-DP and BB-DR rat samples. Five species of Clostridium were higher in BB-DP rats, whereas Clostridium hylemonae was higher in BB-DR rats. In addition, hundreds of bacterial taxa that could not be classified to genus level were also found to differ. Many Lachnospiraceae were in higher abundance in BB-DP rats, whereas many unclassified Clostridiaceae were more common in BB-DR rats. The differences in Lactobacillus and Bifidobacterium observed by pyrosequencing were confirmed by quantitative PCR.

These results from Roesch et al. (2009b) are consistent with the notion that beneficial bacteria provide a protective effect in rodent models by delaying or preventing the onset of diabetes. As BB-DP rats have lower populations of species that contain known probiotic strains than do BB-DR rats, potentially beneficial bacteria may be necessary for the maintenance of a healthy microbiome, which in turn is essential in preventing a leaky gut. A Lactobacillus johnsonii strain has since been isolated from the stool of the same set of BB-DR rats as was used in Roesch et al. (2009b). This strain of L. johnsonii prevents diabetes when fed to BB-DP rats (Valladares et al., 2010).

These results encouraged a close examination of gut bacteria in humans who are at high risk for autoimmunity and T1D. Human stool samples for such an analysis have been collected by the Diabetes Prediction and Prevention study (DIPP) in Finland (Nejentsev et al., 1999; Kupila et al., 2001). DIPP has been collecting stool samples from children on the basis of their genotype since 1994. At birth, the HLA-DQ genotypes of babies are determined. Those infants who possess specific HLA genotypes are considered to be at high risk for autoimmunity and progression toward T1D early in life. When children enter the study, stool samples are collected every 3 months and blood samples are collected and assayed for the presence of specific autoantibodies. Once two autoantibodies are detected, the child is diagnosed as having seroconverted to autoimmunity for T1D.

Materials and methods

The samples used in this study came from a total of eight Finnish children, each represented by three stool samples collected at three time points, for a total of 24 separate samples. The case children all developed autoimmunity and eventually T1D over time (Table 1). Autoimmunity was diagnosed by the appearance of at least two autoantibodies. The three samples from each case are matched with three samples from a child of the same age and HLA-DQ genotype who did not become autoimmune during the study.

Table 1 Age (days after birth) of each subject at the three time points of collection

Stool samples were collected by parents at home and delivered to the repository for frozen storage within 48 h. Recent work has shown that storage at room temperature for up to 72 h has a minimal effect on stool bacterial community structure (Roesch et al., 2009a). The case children became autoimmune at or near the time of the third sampling (Table 1). Autoimmunity in T1D is defined as the appearance of two autoantibodies in the serum, as described by The Environmental Determinants of Diabetes in the Young (TEDDY) Study Group (2007). Each case subject was matched with a healthy (that is, non-T1D-associated autoantibody positive, nondiabetic) child of the same genotype and of approximately the same age.

DNA extraction, 16S rRNA amplification and pyrosequencing were performed as described previously (Roesch et al., 2009a). An average of 15 709 sequences was obtained for each of the 24 samples (Table 2). The barcodes used in this study to differentiate the samples are described in Supplementary Table 1. The original sequences and the corresponding quality scores have been submitted to GenBank under study accession number SRP002359.1.

Table 2 Characteristics of samples collected from the DIPP study

The sequences were processed and analyzed to determine differences at all taxonomic and community levels using PANGEA (Giongo et al., 2010a). In PANGEA, short sequences are discarded, poor-quality ends are trimmed, 16S rRNA sequences are separated by barcode, the closest cultured relative of each sequence is identified using Megablast, data are collated into tables, statistically significant differences in the abundance of taxa are determined and the Shannon diversity index (Shannon and Weaver, 1949) of each community is calculated. Classification was performed using an up-to-date RDP-II database modified using TaxCollector (Giongo et al., 2010b). The Shannon diversity index was chosen because it considers both the presence and the abundance of operational taxonomic units. Significant differences between taxa are determined using a modified χ2-test in a manner that includes a false discovery rate determination.
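As an illustration of the diversity measure named above, the following minimal Python sketch computes the Shannon index, H' = -Σ p_i ln p_i, from per-taxon read counts. It is not the PANGEA implementation; the function name `shannon_index`, the use of the natural logarithm and the two example count vectors are assumptions made for illustration only.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this taxon
            h -= p * math.log(p)   # natural log is an assumption of this sketch
    return h


# Hypothetical read counts per OTU (not data from this study).
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

h_even = shannon_index(even_community)
h_skewed = shannon_index(skewed_community)
print(round(h_even, 4))   # equals ln(4) for four equally abundant taxa
print(h_skewed < h_even)  # dominance by one taxon lowers the index
```

Both communities contain four taxa, so the difference in the index comes entirely from evenness, which is why the index is sensitive to abundance as well as presence.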

To test whether the microbial communities from controls are more similar to each other than those from cases, UniFrac (Lozupone et al., 2006) pairwise distances between all pairs of cases and all pairs of controls were determined. Specifically, for each of the six possible pairs of individuals in the same group, a corresponding phylogenetic tree was generated using MUSCLE (Edgar, 2004). The sum of the six UniFrac distances in the case group was computed, as well as the sum of the six UniFrac distances in the control group. The difference between these sums (D_observed) was used to quantify the difference between cases and controls. A permutation test was performed to verify whether the difference was statistically significant.

To assess the significance of the UniFrac distances described above, we ran a permutation test as follows (R code is available from the authors). For each phylogenetic tree, we randomly permute the labels of the individuals (children) and calculate D for the data with permuted labels. This is repeated 1000 times to obtain a sample of differences (D_permuted), and to create a histogram that estimates this distribution. As the labels are permuted, the populations are now equivalent, and the average value of D should be closer to zero.

If the observed distance, D_observed, is in the tail of the sample of D_permuted, pairwise distances in the case group are significantly greater than those of the control. In other words, the microbial communities in healthy children are more similar to each other.
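The permutation procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: it assumes that a symmetric matrix of pairwise UniFrac distances between all eight children is already available, it uses a one-sided upper-tail comparison with a +1 correction, and the function names and the toy distance values are hypothetical.

```python
import itertools
import random


def group_sum(dist, members):
    """Sum of pairwise distances among the members of one group."""
    return sum(dist[i][j] for i, j in itertools.combinations(members, 2))


def d_statistic(dist, cases, controls):
    """D = (sum of case-case distances) - (sum of control-control distances)."""
    return group_sum(dist, cases) - group_sum(dist, controls)


def permutation_test(dist, cases, controls, n_perm=1000, seed=0):
    """Return D_observed and a one-sided p-value from label permutations."""
    rng = random.Random(seed)
    observed = d_statistic(dist, cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        d_perm = d_statistic(dist, pooled[:n_cases], pooled[n_cases:])
        if d_perm >= observed:
            hits += 1
    # The +1 correction (an assumption of this sketch) avoids a p-value of zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy distance matrix (hypothetical values): cases 0-3 are far apart,
# controls 4-7 are close together, case-control pairs are intermediate.
n = 8
cases, controls = [0, 1, 2, 3], [4, 5, 6, 7]
dist = [[0.0] * n for _ in range(n)]
for i, j in itertools.combinations(range(n), 2):
    if i in cases and j in cases:
        value = 0.9
    elif i in controls and j in controls:
        value = 0.2
    else:
        value = 0.6
    dist[i][j] = dist[j][i] = value

observed, p_value = permutation_test(dist, cases, controls)
print(observed, p_value)
```

With four cases and four controls there are only 70 distinct ways to assign the labels, so the permutation distribution is coarse; this is worth keeping in mind when reading the significance levels reported for a design of this size.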

To test whether the microbial communities from controls are more similar to each other than those of cases during the time course, samples from the same time point were grouped, clustered using CD-HIT (cluster database at high identity with tolerance; Li and Godzik, 2006) and then submitted to principal coordinate analysis using weighted UniFrac (Lozupone et al., 2006).
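For readers unfamiliar with principal coordinate analysis, the ordination step can be sketched as classical multidimensional scaling of a distance matrix. The NumPy code below is an illustrative sketch only and is not the software used in this study; the function name `pcoa`, the clipping of negative eigenvalues (which can arise because UniFrac distances are not guaranteed to be Euclidean) and the three-point example are assumptions.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # Gower-centered matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    kept = np.clip(eigvals[:n_axes], 0.0, None)   # assumption: drop negative axes
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum() # share of positive variance
    return coords, explained


# Hypothetical example: three points forming a 3-4-5 right triangle.
example = [[0.0, 3.0, 4.0],
           [3.0, 0.0, 5.0],
           [4.0, 5.0, 0.0]]
coords, explained = pcoa(example)
print(coords.shape, explained.sum())
```

For a Euclidean input such as this triangle, the first two axes reproduce the original distances; for weighted UniFrac the axes are an approximation, and the proportion of variance shown on each axis depends on how negative eigenvalues are handled.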


Results

Stool samples were obtained from eight children in the DIPP study in Finland. These children represented four matched case–control pairs in which the samples of each autoimmune child were paired with samples from a nonautoimmune child of approximately the same age and HLA genotype (Table 1). Samples from each individual were taken at three time points (Table 1). The first sample was taken between 4 and 8 months of age, before any child was found to have autoantibodies. In two of the four case subjects, the first autoantibody appeared about 6 months before the second collection point. In all four case subjects, the second autoantibody was detected within several months of the third collection point. Ultimately, all four cases were diagnosed with T1D.

After removal of low-quality sequences and bases from the 16S rRNA reads, a total of 390 759 sequences were usable in the analysis of the 24 stool samples examined for this case–control experiment (Table 2). This represents an average of 15 709 sequences per sample. Across all samples, 377 species were identified, with species defined at the 99% level of similarity. The classification of sequences at each of the six phylogenetic levels for all cases and controls at each collection time point was determined (Tables 3 and 4; Supplementary Tables 2–5).

Table 3 Mean percent of total reads for all phyla identified in the case and control samples
Table 4 Mean percent of total reads for those species (or OTU, operational taxonomic unit) identified that showed significant differences in abundance between cases and controls at any time point of collection

Phylum level

By far, the two most striking differences between the healthy (nonautoimmune) and autoimmune stool microbial communities were the differences within the two most abundant phyla, the Bacteroidetes and the Firmicutes (Table 3). In case samples, the Bacteroidetes sequences increased from 53.27% of all sequences at the first collection point to 69.17% of all sequences at the third collection point, whereas in control samples, Bacteroidetes sequences decreased over time from 76.13% to 54.65% of all sequences (Table 3). In contrast, the second most abundant phylum, the Firmicutes, showed the inverse pattern. The Firmicutes sequences declined in case samples over time from 43.41% to 20.66% of all sequences, whereas they increased in control samples from 21.78% to 25.89% of all sequences (Table 3). The differences in abundance of both Bacteroidetes and Firmicutes observed between cases and controls were significant at all three time points.

Class and order levels

Similar trends occurred at the class level for the two most abundant classes, the Bacteroidetes and the Clostridia (Supplementary Table 2). As the Bacteroidetes became more abundant in the case samples over time, they became less abundant in the control samples. Conversely, as the abundance of Clostridia decreased over time in the case samples, the Clostridia sequences increased in control samples. These trends repeated at the order level: the Bacteroidales sequences increased in case samples, whereas they decreased in control samples, and the Clostridiales sequences increased in control samples, whereas they decreased in case samples. Four other orders differed at one time point or another between cases and controls, but no clear trends emerged over time (Supplementary Table 3).

Family level

Analysis at the family level showed that three families among the Firmicutes, the Ruminococcaceae, the Lachnospiraceae and the Eubacteriaceae, were significantly more abundant in controls than in cases at time point 3. In contrast, Veillonellaceae, also in the Firmicutes, were present at significantly higher levels in cases than in controls at all time points, but declined rapidly over time in cases. Bacteroidaceae was the dominant family of the Bacteroidetes phylum and was significantly more abundant in cases than in controls (Supplementary Table 4). Among Bacteroidetes, the Porphyromonadaceae sequences were present at greater levels in controls than in cases at all time points. The Rikenellaceae family was also significantly more abundant in cases compared with controls at the second and third time points.

Genus level

At all time points, more than 92% of case sequences and more than 81% of control sequences could be classified to the genus level. Hence, the majority of organisms in these samples are well known to science, and their physiology and morphology are understood. Bacteroides is by far the dominant genus in these samples, representing over two-thirds of all sequences early in controls, and at the third time point in cases. In cases, the number of Bacteroides increases over time and is significantly higher than that in controls at time points 2 and 3. In controls, however, Bacteroides declines from 66.47% to 38.63% of the bacteria present in stool samples. Two genera in the Firmicutes, Eubacterium and Faecalibacterium, increase dramatically over time in controls compared with cases, and together represent more than 13% of the total population of bacteria in autoimmune children at time point 3 (Supplementary Table 5).

Species level

At the 99% identity level, 59% and 49% of case and control sequences, respectively, were classified to named and cultured species. Of the 377 bacterial species identified in these samples, 51 species differed significantly in abundance between cases and controls, with a P-value <0.01 at one or more time points (Table 4). At the species level, specific taxa make large contributions to the overall differences between cases and controls. Perhaps the most striking example is Bacteroides ovatus. Nearly one-fourth of the difference between cases and controls within the phylum Bacteroidetes can be explained by this single species (Figure 1a). At the first time point, there are slightly more B. ovatus sequences in control samples than in case samples, but, at the time of autoimmunity, there are 16-fold more B. ovatus sequences in cases than in controls (Table 4).

Figure 1

Significant differences in taxa between cases (autoimmune) and controls (healthy). Samples were collected approximately 4 months, 1 year and 2 years after birth, represented, respectively, as time points 1, 2 and 3: (a) increasing numbers of Bacteroidetes in cases over time compared with controls; (b) increasing numbers of Firmicutes in controls over time compared with cases; and (c) higher proportion of unclassified sequences in controls compared with cases. Significant differences between cases and controls are designated by a star (P≤0.002).

Most other Bacteroides species are also present in greater proportions in cases than in controls. However, two Bacteroides species, B. vulgatus and B. fragilis, are observed at much higher levels in controls than in cases at the third time point and represent more than 11% of total sequences in the control samples.

Another bacterium of interest is the human intestinal firmicute CO19 (Table 4). Although not sufficiently characterized to be given a species name, this organism has been cultured (Hayashi et al., 2002). At the 99% similarity level, more than 7% of all control sequences at the third time point clustered with human intestinal firmicute CO19. Although it increases in both cases and controls over time, the abundance of human intestinal firmicute CO19 at the third time point seems to be nearly fourfold greater in controls than in cases.

At the third collection time point, just 37 of 377 species differed significantly between cases and controls (Table 4). The 15 species that are higher in abundance in controls represent 30% of all control sequences at time point 3. Similarly, the 22 species that are much higher in cases than in controls represent nearly half of all sequences present in cases (autoimmune children).

The unclassified sequences

At all taxonomic levels, the sequences that cannot be classified to known taxa are significantly more abundant in control samples than in cases at the third time point (Figure 1). As expected, the number of unclassified sequences increases as the phylogenetic classification becomes more restrictive. Hence, the proportion of unclassified sequences at the phylum level is <1% of all sequences, whereas the proportion of unclassified sequences at the species level is over 30% in some control samples.

Community diversity indices

At the genus level, bacterial diversity, as measured by the Shannon index, increases over time in control samples (Figure 2). At the third time point, the diversity index is significantly higher in control communities than in case communities (P-value <0.05).

Figure 2

Bacterial community differences between cases and controls during the development of autoimmunity in cases: (a) significant increase in Bacteroidetes with concomitant decrease in Firmicutes in cases compared with controls (P-value ≤0.01 at all time points); (b) significantly higher (P<0.05) Shannon diversity index in controls compared with cases at time point 3. Significant differences between cases and controls are designated by a star. The P-values for time points 1, 2 and 3 are (a) 0.0000, 0.0000 and 0.0000, and (b) 0.80, 0.33 and 0.03, respectively.

In addition, the analysis of microbial communities using principal coordinate analysis shows that the bacterial communities in control samples are more similar to each other than are the bacterial communities in case samples (Figures 3 and 4). No differences between the groups were observed at time point 1 at the 10% significance level. However, the average distance between any pair of cases was significantly higher than that between any pair of controls at the second and third collection points at the 5% and 10% significance levels, respectively. Although these significance levels are relatively lenient, the result is notable given that only six pairwise comparisons were available within each group.

Figure 3

Histograms showing the permutation test based on the UniFrac distances obtained from the three time points (a, b and c). Dashed blue lines represent the 0.10, 0.05 and 0.01 quantiles, and the red line indicates the value of the observed difference. No differences between the groups were observed at time point 1 at the 10% significance level. However, the average distance between any pair of cases was significantly higher than that between any pair of controls at the second and third collection points at the 5% and 10% significance levels, respectively. A summary of data over all time points is shown in (d). (The color version of this figure is available in the online version only.)

Figure 4

Principal coordinate analysis for the case and control communities at time points 1 (a), 2 (b) and 3 (c).


Discussion

The data presented here were analyzed at various taxonomic levels and at the community level to identify specific taxa and community characteristics that differ between microbiomes in healthy children and autoimmune microbiomes in children who develop T1D. One striking result of this analysis is the decline in Firmicutes and increase in Bacteroidetes in the gut microbiome over time as children become autoimmune; in contrast, Firmicutes increase as Bacteroidetes decline in healthy children (Figures 1a, b and 2a). These two phyla comprise more than 80% of the sequences at all time points in both cases and controls. Phylogenetic analysis revealed that virtually all changes that occurred over time between cases and controls within the Bacteroidetes phylum could be attributed to a single genus, Bacteroides. More than one-fifth of the changes that occurred within the genus Bacteroides can be ascribed to a single species, B. ovatus (Figure 1a). Nearly all of the increased abundance in Firmicutes in control samples compared with cases can be attributed to a single order, Clostridiales (Supplementary Table 3). Over 17% of that increase is ascribed to a single species represented in the literature by a single strain, human intestinal firmicute CO19 (Figure 1b), which was isolated from the intestine of a healthy human subject (Hayashi et al., 2002).

The dysbiosis observed at the phylum level between Bacteroidetes and Firmicutes in the human gut has been described in several human disorders. In human type 2 diabetes, the ratio of Firmicutes to Bacteroidetes declines compared with controls (Larsen et al., 2010). In Crohn's disease, both Bacteroidetes and Firmicutes seem to decline, whereas Proteobacteria increase (Frank et al., 2007; Willing et al., 2009). The reverse was observed in obesity, in which the imbalance appears as a reduction in the proportion of Bacteroidetes in obese human subjects, with a corresponding increase in the Firmicutes/Bacteroidetes ratio. Among Firmicutes, Lactobacillus numbers seem to increase in obese patients (Turnbaugh et al., 2006; Armougom et al., 2009).

In all four pairs of cases and controls at the third time point, 22 bacterial species were significantly more abundant (P<0.01) in cases than in controls. Of those 22 species, 5 (bacterium mpn-isolate group 18, B. ovatus, Bacteroides sp. CJ78, B. thetaiotaomicron and B. uniformis) each represented more than 1% of all sequences. Similarly, 15 bacterial species were found to be significantly more abundant (P<0.01) in controls compared with cases in all four pairs of children at the third time point. Of those 15 species, 7 (Bacteroides fragilis, B. vulgatus, Eubacterium eligens, E. rectale, Faecalibacterium prausnitzii, human intestinal firmicute CB47 and human intestinal firmicute CO19) each represented at least 1% of all sequences. Thus, this study identified highly abundant bacteria in the gut microbiomes that are either negatively or positively correlated with the development of autoimmunity in children who are at high risk for the onset of T1D.

Three lines of evidence suggest that, over time, nonautoimmune children are able to build a healthy and stable gut microbiome, whereas the microbiomes of autoimmune children are less diverse and less stable. Hence, we now refer to this unhealthy, unstable microbiome as the autoimmune microbiome for T1D. The evidence presented here does not prove that the microbiomes of case children are less healthy than those of control children. These data, however, do build a framework on which specific questions can be asked regarding the autoimmune microbiome. These three lines of evidence are as follows:

First, as children become older, changes in the composition of the intestinal microbiota take place depending on the feeding stage of childhood (Favier et al., 2002). Near 2 years of age, the microbiota of children become more similar to the adult and present large changes thereafter (Tiihonen et al., 2009). In this study, healthy children harbor a significantly higher number of poorly known, unclassified microorganisms than do autoimmune children over time (Figure 1c). The presence of a greater proportion of unclassified organisms in microbiomes of healthy subjects suggests that these individuals may host a greater proportion of nonpathogenic organisms than do autoimmune subjects. Historically, microbiology has focused on the characterization of microorganisms that cause disease, as well as their mechanisms of pathogenesis (de Kruif, 1926; Lechevalier and Solotorovsky, 1974; Sapp, 2009). Thus, if a bacterium is found that causes disease, the scientific community studies this organism immediately and with great intensity. As a result, a pathogenic bacterium is far more likely to be classified and characterized than a nonpathogenic one. Thus, if the autoimmune microbiome possesses more organisms with a pathogenic potential than the healthy microbiome, the number of unclassified sequences would be expected to be lower in autoimmune subjects compared with healthy ones.

Second, bacterial diversity increased in controls, but remained constant in cases (Figure 2b). The same phenomenon was described in Crohn's disease, in which the diversity of the fecal microbiota was decreased in cases in comparison with controls (Manichanh et al., 2006; Sartor, 2008). A decrease in microbial diversity in the gut was also observed in obese individuals relative to lean individuals (Turnbaugh et al., 2009). This result may be caused by a disturbance, likely an immunological response, in autoimmune children that reduces the microbial diversity in these case subjects. In natural systems, microbial communities are very sensitive to disturbance and are not sufficiently resilient to return to their original states (Allison and Martiny, 2009). Reduced community composition decreases the set of ecosystem processes available in this community. In the human intestine, limited diversity may lead to a reduced capacity to digest a diverse diet, leading to lower energy levels in affected individuals. Hence, a gut microbial community with less diversity may lead to less healthy individuals or may be an indicator of an unhealthy state.

Finally, the mean molecular distance between any two case samples is greater than the molecular distance between any two control samples collected at any of the time points (Figures 3 and 4). Thus, although control communities are more diverse than case communities, overall, control communities are much more similar to each other and show a community stability that is lacking in case communities. This community difference between any two case samples suggests that the autoimmune communities are much less stable.

Of all the differences observed between autoimmune and healthy microbiomes, there are two indicators of the autoimmune microbiome that can be observed at the earliest time points, before the first appearance of autoantibodies in case subjects. The first of these characteristics is the instability of the autoimmune microbiome. The second characteristic is the high ratio of Firmicutes to Bacteroidetes in cases, which is observed within the first 6 months after birth, compared with the low ratio found in controls. Both these characteristics may be early diagnostic markers of pending autoimmunity that can be observed before the first appearance of autoantibodies in serum. This knowledge may allow early interventions that can delay or prevent autoimmunity in children by altering the intestinal microbiome or by other means.
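The Firmicutes-to-Bacteroidetes ratio referred to above can be worked out directly from the mean phylum percentages quoted in the phylum-level results (Table 3). The short Python sketch below does only that arithmetic; the dictionary layout and the function name `fb_ratio` are illustrative assumptions, and the ratios are ratios of group means rather than per-child values.

```python
# Mean percent of total reads at time points 1 and 3, as quoted in the
# phylum-level results (Table 3). Tuple order: (time point 1, time point 3).
READS = {
    ("case", "Firmicutes"): (43.41, 20.66),
    ("case", "Bacteroidetes"): (53.27, 69.17),
    ("control", "Firmicutes"): (21.78, 25.89),
    ("control", "Bacteroidetes"): (76.13, 54.65),
}


def fb_ratio(group, index):
    """Firmicutes/Bacteroidetes ratio for a group at tuple position `index`."""
    firmicutes = READS[(group, "Firmicutes")][index]
    bacteroidetes = READS[(group, "Bacteroidetes")][index]
    return firmicutes / bacteroidetes


ratios = {
    (group, time_point): round(fb_ratio(group, index), 2)
    for group in ("case", "control")
    for index, time_point in enumerate((1, 3))
}
for key, value in ratios.items():
    print(key, value)
```

On these figures the ratio at the first time point is about 0.81 in cases against about 0.29 in controls, and by the third time point the ordering has reversed (about 0.30 against about 0.47).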

Taken collectively, these findings support the concept of an autoimmune microbiome for T1D. Although the number of samples used here is too low to draw any firm conclusions, these data suggest that the autoimmune microbiome for this disease tends to have more classified members but decreased diversity and reduced stability when compared with a healthy microbiome. The forces at work that result in the autoimmune microbiome are unknown, but may be related to the breaching of the epithelial layer combined with abnormal immune responsiveness. This work provides a model of the autoimmune microbiome for T1D and delineates four features that can serve as springboards for further analysis. The defining characteristics of the stable, healthy microbiome and the unstable, unhealthy autoimmune microbiomes are reminiscent of the famous first line from Tolstoy's Anna Karenina: ‘All happy families are alike; each unhappy family is unhappy in its own way.’