Introduction

Crohn's disease (CD) is a chronic relapsing inflammatory disorder of the gastrointestinal tract with an unknown etiology. Available data suggest that inflammation occurs due to an imbalanced mucosal immune response to the commensal bacteria in genetically susceptible individuals (Sartor, 2006).

The knowledge about genetic factors that are relevant for CD has increased considerably during recent years and several susceptibility genes have been associated with CD. For example, polymorphisms in pattern recognition receptors, such as CARD15/NOD2 that recognize microbial components, have highlighted the importance of the microbiota in pathogenesis of CD (Sartor, 2006). The genetic influence is also supported by higher concordance rates (approximately 50%) for CD occurrence in monozygotic twins (Tysk et al., 1988; Orholm et al., 2000; Halfvarson et al., 2003; Jess et al., 2005). Still, approximately 50% of identical twin pairs are discordant for CD (that is, one is diseased and one is healthy) demonstrating that environmental factors are also important for disease incidence (Loftus, 2004; Halfvarson et al., 2006).

It has been difficult to correlate specific causative bacterial agents to CD. An increased prevalence of mucosal bacteria has been observed in CD patients, with higher levels of E. coli and Bacteroides species (Keighley et al., 1978; Swidsinski et al., 2002, 2005). Representatives of these bacteria have also been demonstrated to induce colitis when inoculated into germ-free animals, although with conflicting results (Sartor, 2003). Recently, increased levels of adherent, invasive Escherichia coli were found in ileal tissues of CD patients (Darfeuille-Michaud et al., 1998; Barnich and Darfeuille-Michaud 2007). Furthermore, there have been reports of reduced numbers and a lower diversity of Firmicutes in the gut microbiota in individuals with CD (Gophna et al., 2006; Manichanh et al., 2006). However, the total microbiota of a patient with CD has not yet been profiled to a degree where its composition can be considered predictive for disease development, and specific bacterial species responsible for the bowel inflammation have not yet been identified.

The search for a causative disease agent is complicated by the great individuality of the gut microbiota with little overlap between individuals (Zoetendal et al., 1998; Eckburg et al., 2005; Dicksved et al., 2007). However, previous findings have shown that there is a high similarity in the composition of the fecal microbial communities in monozygotic twins (Van de Merwe et al., 1983; Zoetendal et al., 2001; Stewart et al., 2005). Therefore, one way to unravel the respective contributions of host genetics and commensal bacteria towards CD development and establishment would be to study identical twins.

The aims of this study were (1) to determine if the gut microbiota in healthy twins have a higher degree of similarity than in twins that have CD and (2) to determine whether there are differences in the composition of the gut microbiota in individuals that have CD compared to healthy individuals. In particular, we focused on a set of discordant twin pairs, where one individual is healthy and one has CD, because they provided each other's genetically matched control, thus enabling us to focus on changes in the gut microbiota according to disease state.

To test these hypotheses, we used molecular approaches to provide community profiles of the fecal microbiota. By focusing on the nucleic acid composition of the gut microbiome, we were able to overcome the known biases with cultivation-based approaches. Current estimates are that only 20% of the total species residing in the human gut have been cultivated to date (Eckburg et al., 2005). Therefore, we are still greatly limited in our knowledge about the physiology and ecology of the majority of the gut microbiota.

In this study, we used the molecular fingerprinting approach, terminal-restriction fragment length polymorphism (T-RFLP), to monitor the bacterial community architecture in concordant and discordant identical twins with CD, and healthy twins. In addition, we used percent guanine+cytosine (%G+C) profiling of the total bacterial microbiome as a complementary approach. Furthermore, we aimed to identify members of the microbiota that could be linked to CD incidence or development. We specifically targeted members of the genus Bacteroides as these are dominant members of the commensal biota with functionally important roles in the gut. Some Bacteroides species have previously been shown to be present in increased levels in persons suffering from inflammatory bowel disease (IBD) (Swidsinski et al., 2005; Gophna et al., 2006), yet animal models have provided conflicting evidence as to which species may cause CD, warranting closer examination.

Materials and methods

Patient cohort

The twins with CD were derived from a Swedish twin population, described previously (Tysk et al., 1988; Halfvarson et al., 2003, 2004). In short, twin pairs where at least one twin in each pair had been hospitalized for IBD were identified by matching the Swedish twin registry against the Swedish Hospital Discharge Register. All twins were sent a questionnaire concerning diagnosis of IBD, general gastrointestinal symptoms and exposure to environmental factors. After consent from each twin, the medical notes of all twins were evaluated to verify or refute the diagnosis of IBD and to characterize the disease according to the Montreal classification (Satsangi et al., 2006). Zygosity was assessed by a questionnaire-based method, applied by the Swedish twin registry (Cederlöf et al., 1961). It relies on questions on childhood resemblance and has been shown to be very accurate (Lichtenstein et al., 2002). Monozygotic twin pairs with CD born between 1936 and 1986, who had approved further contact and had not undergone extensive CD-related surgical resections, that is, colectomy, were invited to undergo colonoscopy. Ten monozygotic twin pairs were studied; 6 of these were discordant and 4 were concordant for CD, for a total of 14 individuals with CD. Data on age, disease location, disease duration, and disease behavior at diagnosis in the CD twins are presented in Table 1. All diseased twins, except two (labeled 10b and 15a; Supplementary Table S1), were in clinical remission according to the Harvey–Bradshaw score (Harvey and Bradshaw 1980). All remaining twins were in endoscopic remission or had only post-inflammatory changes. All twins were asked to send fecal samples 7–10 days prior to the colonoscopy. In addition, they submitted responses to a questionnaire regarding usage of antibiotics or non-steroid anti-inflammatory drugs within the last 12 months, gastroenteritis within the last 3 months or specific dietary habits, and this information is provided in Supplementary Table S1.

Table 1 Clinical characteristics of the twins with Crohn's disease according to the Montreal classification (n=14)

Eight healthy twin pairs, five monozygotic and three dizygotic pairs, not suffering from any gastrointestinal disease, were also invited to participate but did not undergo colonoscopy. The mean (range) age of these twins was 19 (6–56) years. All healthy twins sent fecal samples and responded to the same questionnaire described above. All collected fecal samples were placed in a freezer at −70 °C, immediately after arrival, that is, at most 1 day after the samples were collected, and were stored there until analysis. For detailed characteristics of the twins, see Supplementary Table S1. The Örebro County Ethical Committee approved the use of human subjects for this study (Dnr; 167/03).

Percent guanine+cytosine profiling of the bacterial community DNA

Bacterial cells were extracted from 0.5 g fecal samples by differential centrifugation as described previously (Apajalahti et al., 1998). The isolated bacteria were then lysed and DNA was purified by a protocol comprising enzymatic, chemical and physical steps as described elsewhere in detail (Apajalahti et al., 1998, 2001). The DNA was fractionated by 72 h CsCl equilibrium density gradient centrifugation, which fractionates chromosomes of the component taxa, based on their characteristic G+C content as described previously (Apajalahti et al., 1998, 2001; Holben et al., 2004). This separation is based on differential density imposed by the AT-dependent DNA-binding dye bis-benzimidazole. Following ultracentrifugation, a Brandel model SYR-94 syringe pump (Brandel Inc., Gaithersburg, MD, USA) was used to pass the formed gradients through an ISCO UA-5 UV absorbance detector (ISCO Inc., Lincoln, NE, USA) set to 280 nm. The %G+C content represented by each gradient fraction was determined by linear regression analysis (r²>0.99) of data obtained from control gradients containing standard DNA samples of known %G+C composition as described previously (Apajalahti et al., 1998). This procedure requires a minimum of 30 μg high molecular weight DNA from each sample.
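The calibration step described above can be illustrated with a minimal sketch in Python. It assumes an ordinary least-squares fit of known %G+C against gradient position; the standard positions and %G+C values are hypothetical placeholders rather than data from this study, and the function names are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, r^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical control gradient: positions of standard DNA samples in the
# gradient and their known %G+C content (placeholder numbers).
standard_positions = [10.0, 20.0, 30.0, 40.0]
standard_gc = [30.0, 45.0, 60.0, 75.0]

slope, intercept, r_squared = fit_line(standard_positions, standard_gc)


def position_to_gc(position):
    """Convert a gradient position to %G+C using the fitted calibration line."""
    return slope * position + intercept
```

With such a calibration, each gradient fraction of a fecal sample could be assigned a %G+C value by calling position_to_gc on its position, provided that r_squared meets the acceptance criterion.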

PCR amplification conditions

DNA was extracted from duplicate 250 mg samples from each fecal sample using the MoBio Power Soil DNA Kit (MoBio, Solana Beach, CA, USA), according to the manufacturer's instructions. 16S rRNA genes were PCR amplified from each DNA extract (two technical replicates per extract) using the general bacterial primers Bact-8F (5′-AGAGTTTGATCCTGGCTCAG-3′) (Edwards et al., 1989), 5′ end-labeled with 6-carboxyfluorescein, and 926r (5′-CCGTCAATTCCTTTRAGTTT-3′) (Muyzer et al., 1993) using conditions that have been described in detail elsewhere (Dicksved et al., 2007).

In addition, 16S rRNA genes of the B. fragilis subgroup were specifically PCR amplified using a B. fragilis subgroup-specific reverse primer, g-Bfra-R (5′-CCAGTATCAACTGCAATTTTA-3′) (Matsuki et al., 2002) in combination with the same end-labeled Bact-8F general bacterial forward primer mentioned above. PCR amplification was carried out with an initial denaturation step at 95 °C for 3 min, followed by 30 cycles consisting of 20 s at 95 °C, 20 s at 49 °C and 30 s at 72 °C. The reaction was completed with a final primer elongation step at 72 °C for 5 min. PCR amplified DNA product amounts and sizes were confirmed by agarose gel electrophoresis using GeneRuler 100bp DNA ladder Plus (Fermentas Life Sciences, Burlington, Canada) as a size marker.

Terminal-restriction fragment length polymorphism

PCR products were digested with the HaeIII restriction enzyme (GE Healthcare, Uppsala, Sweden) and the digested fragments were separated on an ABI 3700 capillary sequencer (ABI Applied Biosystems, Foster City, CA, USA), as described previously (Hjort et al., 2007). The sizes of the fluorescently labeled fragments were determined by comparison with the internal GS ROX-500 size standard (ABI). T-RFLP electropherograms were imaged using GeneScan software (ABI). Relative peak areas of each terminal restriction fragment (TRF) were determined by dividing the area of the peak of interest by the total area of peaks within the following threshold values; lower threshold at 50 bp and upper threshold at 500 bp. Data was normalized by applying a threshold value for relative abundance at 0.5% and only TRFs with higher relative abundances were included in the remaining analyses.
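The peak-area normalization described above can be sketched as follows. This is a minimal illustration in Python, assuming that the 50 bp and 500 bp limits are inclusive and that relative abundances are not rescaled after the 0.5% cut-off is applied; neither detail is stated in the text, and the peak areas shown are hypothetical.

```python
def relative_abundances(peaks, min_bp=50, max_bp=500, min_fraction=0.005):
    """Return {TRF size: relative abundance} for TRFs passing both filters.

    peaks maps TRF size in bp to raw peak area. Window inclusiveness and the
    absence of renormalization after thresholding are assumptions.
    """
    in_window = {size: area for size, area in peaks.items()
                 if min_bp <= size <= max_bp}
    total = sum(in_window.values())
    if total == 0:
        return {}
    fractions = {size: area / total for size, area in in_window.items()}
    # Keep only TRFs with a relative abundance above the 0.5% threshold.
    return {size: frac for size, frac in fractions.items()
            if frac > min_fraction}


# Hypothetical raw peak areas for one electropherogram.
example_peaks = {40: 500.0, 83: 300.0, 262: 450.0, 264: 246.0,
                 310: 4.0, 520: 100.0}
example_profile = relative_abundances(example_peaks)
```

In this example the peaks at 40 bp and 520 bp fall outside the size window, and the 310 bp peak is dropped because it contributes less than 0.5% of the total in-window area.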

Cloning and sequencing

Cloning and sequencing of 16S rRNA genes from DNA extracted from the fecal samples were performed to confirm the identities of bacterial species corresponding to dominant TRFs from the Bacteroides data set. DNA samples from four twin pairs (one healthy, one concordant and two discordant pairs) were amplified using the B. fragilis subgroup-specific primer g-Bfra-R in combination with the general Bact-8F primer. Three replicate PCR products from each individual were pooled and gel purified using the Qiagen gel extraction kit (Qiagen, Hilden, Germany). A total of eight libraries were constructed by inserting PCR products into TOPO TA PCR 4.0 vectors (Invitrogen, Carlsbad, CA, USA), followed by transformation into E. coli TOP 10 competent cells (Invitrogen). A total of 24 inserts from each library were PCR amplified using vector primers M13f and M13r (Invitrogen) using the same thermal cycling program as described above for amplification using general bacterial primers for T-RFLP. The PCR products were diluted 50-fold and used in a nested PCR reaction with primers g-Bfra-R and fluorescently tagged primer Bact-8F for T-RFLP analysis of inserted clones, with the same running conditions as described previously for these primers (see above). All clones with unique TRF sizes were selected for sequencing, in addition to several clones from redundant TRFs, for a total of 136 clones. Obtained sequences were examined using MacVector 8.1.1 (Accelrys Software Inc, San Diego, CA, USA), to remove redundant sequences. The remaining sequences were aligned against GenBank database entries using standard nucleotide BLAST at NCBI (URL: www.ncbi.nlm.nih.gov). Hits defined as unknown or uncultured bacteria were subsequently aligned against sequenced bacterial genomes (genomic BLAST at NCBI), as well as examined with the Ribosomal Database Project II Sequence Match, in an attempt to classify them. Sequences with 99–100% identity were given the same name as the species hit. Sequences with 97–99% identity were assigned ‘spp-like’. Sequences were aligned using the online MAFFT (standard FFT-NS-i) aligner (Katoh et al., 2002), followed by construction of a circular Neighbor-Joining tree, using BioNJ settings in PAUP4b10 (Swofford, 1993). Unique sequences were deposited in GenBank at NCBI, under the following accession numbers: EU381163–EU381180.
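The naming rule based on sequence identity can be written as a small decision function. The sketch below is illustrative only: the handling of a value of exactly 99%, the "-like" label format and the "unclassified" fallback are assumptions, as the text does not specify them.

```python
def classify_hit(species, percent_identity):
    """Name a clone sequence from its best database hit and percent identity.

    Assumptions: exactly 99% is treated as a species-level match, and hits
    below 97% are returned as "unclassified" (a hypothetical label).
    """
    if percent_identity >= 99.0:
        return species
    if percent_identity >= 97.0:
        return species + "-like"
    return "unclassified"


# Hypothetical hits used purely as an illustration.
name_high = classify_hit("B. vulgatus", 99.5)
name_mid = classify_hit("B. vulgatus", 98.0)
name_low = classify_hit("B. vulgatus", 95.0)
```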

Statistical analysis

The samples were initially statistically assessed as a blind study; that is, without any prior knowledge of disease status or twin relationship, to avoid potential biases in subsequent data analyses. T-RFLP data from each individual were normalized and entered into a data matrix that consisted of the TRFs as variables and individuals as objects. A consensus T-RFLP profile, from each biological replicate, was constructed by averaging the technical duplicates. Principal component analysis (PCA) plots were generated using the multivariate statistics software Canoco (version 4.5, Microcomputer Power, Ithaca, NY, USA) and statistical significance of ordination was tested using a Monte Carlo permutation test with 999 permutations. Diversity, defined as evenness and richness of the bacterial community members detected as TRFs by T-RFLP analysis, was calculated using Simpson's index of diversity (D) (Begon et al., 2006) and Shannon's diversity (H) and equitability index (E) (Begon et al., 2006). Differences in diversity between different groups of twins were analyzed by Mann–Whitney's U-test. Agreement of diversity within twin pairs was analyzed by calculating the intra-class correlation coefficient (ICC) according to Dunn (1989). Good agreement is indicated by an ICC value higher than 0.8, fair agreement by values between 0.4 and 0.8, and strong disagreement by negative values. Differences in bacterial composition (TRF data) within each of the twin pairs were computed with Manhattan distances, and significance between the groups was tested with an analysis of variance and Tukey's post hoc test. T-RFLP binary data, that is, presence or absence of TRFs, were analyzed by cluster analysis using Jaccard's similarity index. P values <0.05 were considered statistically significant.
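As an illustration of the diversity indices named above, the following Python sketch computes them from a list of relative TRF abundances. It assumes the 1 − Σp² form of Simpson's index of diversity, which is consistent with values that fall between 0 and 1, together with H = −Σp ln p and E = H / ln S; the exact formulations used in the analysis may differ, and the example profiles are hypothetical.

```python
import math


def simpson_diversity(p):
    """Simpson's index of diversity, assumed here to be 1 - sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)


def shannon_diversity(p):
    """Shannon's diversity H = -sum(p_i * ln p_i), skipping zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0)


def shannon_equitability(p):
    """Equitability E = H / ln(S), where S is the number of TRFs present."""
    richness = sum(1 for x in p if x > 0)
    if richness < 2:
        return 0.0  # evenness is not meaningful for fewer than two TRFs
    return shannon_diversity(p) / math.log(richness)


# Hypothetical profiles: one perfectly even, one dominated by a single TRF.
even_profile = [0.25, 0.25, 0.25, 0.25]
skewed_profile = [0.7, 0.1, 0.1, 0.1]
```

Both profiles have the same richness (four TRFs), but the skewed profile yields lower values for all three indices, which is the kind of difference the indices are intended to capture.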

Results

Percent G+C profiling

Percent G+C profiling of the bacterial chromosomes recovered from the fecal samples was used to detect major differences in the fecal bacterial communities of healthy and diseased individuals in a subset of the twin samples. The power of this method is its robustness; that is, it examines a large pool of DNA representing the microbial community of interest, and is not susceptible to biases caused by primer mismatches or PCR inhibitors. Six monozygotic twin pairs (one healthy, two discordant and three concordant) were analyzed using this approach. The %G+C profiles that were obtained from the healthy twin pair were very similar, which shows that the major bacterial genera present were similar (Figure 1). Microbial communities in the feces of the twin pairs that were concordant for CD were also very similar, but the profiles from discordant twin pairs were dissimilar, illustrating that the microbial composition differed when one of the twins was healthy and the other had CD (Figure 1). It was not possible, however, to distinguish a common pattern for sick or healthy individuals by assessment of the %G+C profiles in this sample set.

Figure 1

Percent guanine+cytosine profiles for six twin pairs: healthy pair (a), two discordant (b and c) and three concordant (d–f) twin pairs for Crohn's disease (CD). Healthy individuals are labeled dark or light blue and CD individuals are labeled pink or red. Sample identifications are provided at the top of each panel according to assignments given in Supplementary Table S1. Horizontal axis shows GC content in percent and the vertical axis indicates relative absorbance values in percent.

T-RFLP profiles using general bacterial primers

Terminal-restriction fragment length polymorphism was used to obtain bacterial community profiles from fecal samples obtained from 10 monozygotic twin pairs with CD (concordant; n=4 and discordant; n=6) and 8 healthy pairs. The reproducibility of the T-RFLP data was very high within technical and biological duplicates. Similarity scores for biological replicates were generally higher than 90%, regardless of whether abundance data (Manhattan index) or binary data (Jaccard's index) were used. Similar to the %G+C profiling results shown in Figure 1, we found that the T-RFLP patterns were more similar for healthy twins, and for some of the concordant twin pairs, whereas the discordant twins had large differences in their T-RFLP profiles. An illustrative example of the distribution of TRFs for discordant, concordant and healthy twins is shown in Figure 2a.

Figure 2

Terminal restriction fragment (TRF) distributions generated using general bacterial primers (a), and a Bacteroides group-specific reverse primer (b), for three twin pairs (one healthy, one discordant and one concordant for Crohn's disease). Identification of individuals according to assignments given in Supplementary Table S1 is shown below each pie chart. Each area represents the relative abundance of a particular TRF. TRFs of the same size are the same color for all individuals and for both a and b considering that the forward primer used for PCR was the same in all cases.

Community diversity based on TRF diversity

Diversity indices were used to determine the richness (number of TRFs) and evenness of the T-RFLP profiles. The TRF diversity was significantly higher in the healthy group, median (range) 0.91 (0.82–0.93) than in CD patients, median (range) 0.87 (0.71–0.94) when Simpson's index of diversity was used (P=0.029). However, this significance could not be reproduced for the entire sample group with Shannon's diversity (H) and equitability (E) index (Figure 3). Nevertheless, all healthy individuals in the discordant twin pair sets had a higher TRF diversity than their matched diseased twin according to pairwise comparisons using all diversity indices. Using the ICC, a high agreement was observed within healthy pairs (ICC=0.51). In contrast, this high agreement was not observed in discordant (ICC=−0.16) or concordant twin pairs with CD (ICC=−0.05).
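To make the interpretation of the ICC values concrete, the sketch below computes a one-way random-effects ICC for paired measurements. This is one common formulation and is used here as an assumption; it may differ in detail from the calculation of Dunn (1989) applied in the analysis. The diversity values in the example are hypothetical.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC for pairs (k = 2 members per pair).

    pairs is a list of (value_twin_a, value_twin_b) tuples. Returns values
    near 1 when twins agree and negative values when twins within a pair
    differ more than pairs differ from each other.
    """
    n = len(pairs)
    k = 2
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / k for a, b in pairs]
    ms_between = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, pair_means)) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical Simpson's diversity values for three twin pairs.
agreeing_pairs = [(0.80, 0.80), (0.90, 0.90), (0.85, 0.85)]
disagreeing_pairs = [(0.80, 0.90), (0.90, 0.80), (0.85, 0.85)]
```

In this toy example the agreeing pairs give an ICC of 1, whereas the disagreeing pairs, which share the same pair means but differ within pairs, give a negative ICC.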

Figure 3

Box plots comparing the diversity of T-RFLP profiles calculated for healthy individuals to those with CD. (a) Simpson's index of diversity, (b) Shannon's richness index and (c) Shannon's evenness index. CD, Crohn's disease; T-RFLP, terminal-restriction fragment length polymorphism.

Multivariate analyses of T-RFLP profiles

The T-RFLP data representing the gut microbial community profiles were analyzed using multivariate statistics separately for the healthy twin pairs (Figure 4a) and the twin pairs that were concordant or discordant for CD (Figure 4b). Principal component analyses of the T-RFLP profiles obtained from the healthy twin pairs clearly demonstrated that the bacterial community profiles were highly similar to each other for both the first and the second principal component (x and y axes on the PCA plot, respectively), for individuals of a given pair (Figure 4a). The second principal component differed for only one of the pairs (6a and 6b in Figure 4a). The dizygotic twins (3a,b, 5a,b and 8a,b in Figure 4a) were as similar to each other as the monozygotic pairs in their microbial profiles (Figure 4a).

Figure 4

Principal component analysis (PCA) plots for T-RFLP profiles (including TRF size and relative abundance data) for healthy twins (a) and twins discordant or concordant for CD (b); eigenvalues are shown in parentheses for PC1 and PC2. (c) shows a similarity plot based on the binary T-RFLP data (that is, presence or absence of TRFs) for all individuals calculated using Jaccard's index and UPGMA as a cluster method. Twin pairs are coded according to colors and shapes of symbols: healthy individuals in control group, closed triangles; healthy individuals in discordant twin pairs, closed circles; individuals with CD, closed squares. Individuals within a twin pair have the same colored symbol. In c, clusters comprised of healthy individuals are indicated with blue branches whereas those for CD individuals are indicated with red. Sample ID is shown next to the symbols on the PCA plots (a and b) according to the assignments given in Supplementary Table S1. Abbreviated disease locations: I, ileum; C, colon; IC, ileocolon; U, upper gastrointestinal tract. CD, Crohn's disease; TRF, terminal restriction fragment; T-RFLP, terminal-restriction fragment length polymorphism.

The bacterial community profiles from fecal samples of twin pairs that were concordant or discordant for CD were less similar to each other compared to those from the healthy twin pairs (Figure 4b). In particular, two of the discordant twin pairs showed large differences in their community profiles (16a,b and 18a,b in Figure 4b). Interestingly, all of the healthy twins in the discordant pairs grouped to the left of the PCA plot, suggesting that the microbial communities of the healthy individuals share some characteristics that differentiate them from many of the CD individuals. There was, however, no clear gradient that differentiated the whole CD group from the healthy group. Nevertheless, the bacterial community profiles of twins with ileal involvement were separated from the others and grouped to the right of the PCA plot (Figure 4b). In contrast, the community profiles of twins with colonic disease were similar to those of the healthy individuals, and localized to the left region of the PCA plot (Figure 4b). This separation according to location of the disease was highly significant in ordination space (P=0.001).

To verify the results shown with PCA, Manhattan distances were computed for each twin pair to determine the similarities of the microbial communities within twin pairs. The T-RFLP profile similarities within pairs were significantly different when making between-group comparisons (P=0.008), with the highest degree of similarity in healthy pairs compared to concordant (P=0.019) or discordant (P=0.033) pairs. In addition, by comparing Manhattan distances within discordant pairs, we observed that individuals with ileal CD involvement were less similar to their healthy matching twin compared to discordant twins with colonic CD.
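The within-pair comparison can be illustrated with a short Python sketch of the Manhattan distance between two T-RFLP profiles. The conversion to a similarity score assumes that each profile sums to 1, so that the maximum possible distance is 2; this scaling is an assumption, as the text does not state how similarity percentages were derived. The profiles shown are hypothetical.

```python
def manhattan_distance(profile_a, profile_b):
    """Sum of absolute abundance differences over all TRFs in either profile.

    Profiles map TRF size (bp) to relative abundance; a TRF missing from one
    profile is treated as having an abundance of zero.
    """
    sizes = set(profile_a) | set(profile_b)
    return sum(abs(profile_a.get(s, 0.0) - profile_b.get(s, 0.0))
               for s in sizes)


def manhattan_similarity(profile_a, profile_b):
    """Similarity in [0, 1], assuming both profiles sum to 1 (max distance 2)."""
    return 1.0 - manhattan_distance(profile_a, profile_b) / 2.0


# Hypothetical relative-abundance profiles for the two twins of one pair.
twin_a = {83: 0.3, 262: 0.5, 264: 0.2}
twin_b = {83: 0.3, 142: 0.1, 262: 0.2, 264: 0.4}
pair_distance = manhattan_distance(twin_a, twin_b)
pair_similarity = manhattan_similarity(twin_a, twin_b)
```

One distance per twin pair, computed in this way, would then serve as the input to the between-group comparison.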

Binary analyses of the T-RFLP data

The T-RFLP binary data, that is, presence or absence of TRFs, was analyzed by cluster analysis using Jaccard's similarity index. In this analysis all sample data were analyzed together, including healthy, concordant and discordant twin pairs. Four out of eight of the healthy twin pairs, both monozygotic and dizygotic, were more closely clustered to each other than to other individuals, supporting the PCA analyses shown above. Two out of four of the concordant, and one out of six discordant twin pairs, were also similar in their microbial compositions, according to binary similarities (Figure 4c). One of the older healthy pairs (6a,b) had community profiles that were similar according to Jaccard's similarity index of the binary data, but this similarity was not reflected to the same extent in the PCA plots when abundance values were included. Another older twin pair (1a,b) had community profiles that were closely clustered on the PCA plots, but not so when analyzing binary data.
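For the binary analysis, Jaccard's similarity index between two individuals is the number of shared TRFs divided by the number of TRFs found in either individual. The Python sketch below shows the calculation only; the subsequent UPGMA clustering is not reproduced, the TRF sets are hypothetical, and returning 1.0 for two empty profiles is an arbitrary convention adopted for this example.

```python
def jaccard_similarity(trfs_a, trfs_b):
    """Jaccard index: shared TRFs divided by TRFs present in either profile."""
    a, b = set(trfs_a), set(trfs_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty profiles
    return len(a & b) / len(union)


# Hypothetical TRF sizes (bp) detected in two individuals.
individual_1 = {83, 142, 262, 264}
individual_2 = {83, 262, 264, 310}
binary_similarity = jaccard_similarity(individual_1, individual_2)
```

Because only presence or absence is used, two individuals can score as highly similar here even when the abundances of their shared TRFs differ considerably, which is consistent with the discrepancies noted between the binary and PCA results for some pairs.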

Similar to the PCA analyses, the samples grouped into several clusters (Figure 4c). In particular, CD patients with ileal involvement, except for three individuals (10b, 17a and 12b), clustered separately from all others. Patients with colonic disease tended to cluster with healthy individuals. There was, however, a large cluster with 16 healthy individuals and only 2 CD patients, 1 with ileal involvement (12b) and 1 with colonic disease (14b).

T-RFLP analysis using primers targeting Bacteroides

When using general bacterial primers some trends were observed in the abundances of TRFs within discordant twin pairs with CD, possibly representing Bacteroides spp. (Figure 2).

For example, CD individuals tended to have a higher relative abundance of TRF 264 (Figure 2a, black areas) and a lower relative abundance of TRF 262 (Figure 2a, dark blue areas). According to in silico digestion of 16S rRNA genes deposited in existing databases these TRF sizes could represent Bacteroides spp. Although other genera could potentially have similar TRF lengths, it is highly likely that these are representative of Bacteroides in our samples, as it is known that Bacteroides spp. are dominant members of the fecal microbiota (Seksik et al., 2003; Ott et al., 2004; Scanlan et al., 2006) and our Bacteroides clone sequences from the same material had the same TRF sizes (see below). Therefore, a group-specific primer set was used during T-RFLP to focus on the Bacteroides group in the same DNA extracts from the fecal samples that were previously analyzed using general bacterial primers. The T-RFLP profiles of the Bacteroides community generally had a low complexity, with one or a few dominant peaks shared by most of the individuals, albeit with large differences in abundance between individuals (Supplementary Figure S1). In contrast to the results obtained using the general bacterial primers (Figure 4a), the Bacteroides profiles within the healthy twin pairs were not more similar compared to discordant or concordant pairs (P=0.85). However, some of the twin pairs had highly similar Bacteroides profiles (over 85% similarity based on Manhattan distances), which could not be correlated to disease state (Figure 2b, and Supplementary Figure S1). Interestingly, when the discordant and concordant CD pairs were analyzed by PCA the pattern of the clustering was similar to that observed with general bacterial primers; that is, the samples from healthy twins in discordant twin pairs grouped together with individuals with colonic disease and the individuals with ileal involvement were significantly separated from the others (P=0.030, Supplementary Figure S2).
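The principle of in silico digestion mentioned above can be sketched in a few lines of Python. HaeIII recognizes GGCC and cuts between the second G and the first C, so the predicted TRF length is the distance from the labeled 5′ end of the amplicon to the first cut. The sequence used below is a toy construct built from the Bact-8F primer and filler bases, not a real 16S rRNA gene, and the function name is illustrative.

```python
HAEIII_SITE = "GGCC"  # HaeIII recognition sequence
CUT_OFFSET = 2        # the enzyme cuts between GG and CC (blunt ends)


def predicted_trf_length(amplicon):
    """Length of the 5'-terminal fragment after HaeIII digestion.

    The amplicon is read 5' to 3', starting at the labeled forward primer.
    If no site is present, the full amplicon length is returned.
    """
    position = amplicon.upper().find(HAEIII_SITE)
    if position == -1:
        return len(amplicon)
    return position + CUT_OFFSET


# Toy amplicon: Bact-8F primer, 30 filler bases, one HaeIII site, 10 more bases.
toy_amplicon = "AGAGTTTGATCCTGGCTCAG" + "A" * 30 + "GGCC" + "T" * 10
toy_trf = predicted_trf_length(toy_amplicon)
```

Applied to database sequences, a prediction of this kind gives candidate taxa for an observed TRF size, although observed fragment sizes may deviate slightly from predicted lengths, so such assignments are best confirmed by cloning and sequencing as done here.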

Clone libraries of Bacteroides spp.

To determine the identities of the different Bacteroides spp. detected in the T-RFLP profiles, clone libraries of the amplified 16S rRNA genes were made from four twin pairs (one concordant (15a,b), one healthy (6a,b) and two discordant pairs (12a,b and 18a,b)). The same Bacteroides group-specific target regions for PCR amplification were used as those used for T-RFLP of the Bacteroides group. A total of 24 clones from each clone library were screened for their TRF fragment sizes (192 total) and 136 of these were sequenced. Most of the clones were identified as B. vulgatus, B. uniformis and B. ovatus (Figure 5). TRFs 262 and 264 matched to sequences corresponding to B. uniformis and B. ovatus, respectively, and both TRFs 83 and 142 matched to B. vulgatus sequences. Some of the CD individuals had a higher relative abundance of the TRF corresponding to B. ovatus and a lower relative abundance of the TRF corresponding to B. uniformis compared to healthy individuals, but this trend did not hold for the entire sample cohort. However, when looking at disease location the TRF representative of B. uniformis was present in significantly lower abundances in twins with ileal involvement (P=0.0005, average abundance; 21±s.d. 11%) compared with both healthy (average abundance; 45±15%, P=0.006) and twins with colonic disease (average abundance; 54±19%, P=0.0003). By contrast, there was a trend that the TRFs corresponding to B. ovatus (264; P=0.08) and B. vulgatus (83 and 142, P=0.12) were present in higher abundances in patients with ileal involvement (Supplementary Figure S1). Some TRFs had no sequence matches in the clone library and this was generally the case for those TRFs that had a low relative abundance. The relative proportions of specific populations detected by T-RFLP and by cloning and sequencing were highly correlated (Supplementary Table S2).

Figure 5

Neighbor joining tree showing clustering of the Bacteroides clone sequences based on libraries created from one healthy twin pair (6a,b), one concordant CD pair (15a,b) and two discordant CD pairs (16a,b and 18a,b); the T-RFLP abundance data for these individuals are shown in Supplementary Figure S1. Coloring of branches illustrates the respective Bacteroides species that matched to the clone sequences. For sequences where the species names are given, the matches were 99–100%, and species with sequence identities >97% were called Bacteroides spp-like. CD, Crohn's disease; T-RFLP, terminal-restriction fragment length polymorphism.

Discussion

The most widely accepted hypothesis about the pathogenesis of CD is that it is due to a combination of microbial colonization, environmental factors, immune dysfunction and host genetics. Untangling the possible contribution of microorganisms to CD has been complicated by the large variability in the composition of the gut microbiota in humans. In essence, each human has an individual fecal microbial fingerprint (Zoetendal et al., 1998; Eckburg et al., 2005; Dicksved et al., 2007). However, the study of monozygotic twins effectively eliminates the variable of host genetics, except for potential epigenetic factors. In particular, the study of a set of discordant monozygotic twin pairs, where one had CD and the other was healthy, was extremely valuable for determination of differences in the gut microbiota independent of host genetics.

The microbial compositions in fecal samples collected from healthy twin pairs were highly similar, using both T-RFLP and %G+C profiling techniques, supporting the hypothesis that genetics has a strong influence on the composition of the gut microbiota. However, six out of eight healthy pairs were young and were still living in the same household. This could contribute to their similar microbial profiles. Nevertheless, it was particularly interesting to note that the microbial community profiles of individuals in healthy twin pairs that had lived apart for many years/decades, for example, twin pairs 1a,b and 6a,b, were still highly similar. Zoetendal et al. (2001) also observed high similarities among identical twins that had lived separated for more than 5 years.

In our sample set, three of the healthy twin pairs were dizygotic and the similarities were not higher within the monozygotic compared to the dizygotic twins, although too few pairs were studied to determine the significance of these observations. Even if dizygotic twins share a certain genetic relatedness they are not as closely related as monozygotic twins and therefore, our results also lend support to the hypothesis that there is environmental programming of the gut microbiota soon after birth (Ley et al., 2006). In addition, all of the dizygotic healthy twin pairs were very young (7–8 years old), and were still living in the same household and this could also contribute to their high similarities in profiles.

Another important finding in this study was that patients with ileal involvement of CD had a significantly different gut microbiota than healthy individuals and those with colonic CD. It is increasingly apparent that CD is not a homogeneous disease but a tissue response to various etiologic factors (Järnerot, 1996), and our results lend support to this hypothesis. Pairwise comparisons of the microbial profiles from twin pairs also showed that all discordant twins with ileal involvement had community profiles that were less similar to those of their healthy twins than was the case for discordant pairs with colonic disease. A possible confounding factor is surgery prior to sampling, such as ileocecal or ileocolonic resection. However, for the subjects included in this study, prior surgery was not extensive enough for short bowel syndrome to develop. In support of our findings, ileal CD has previously been reported to differ from colonic CD, with dysbiosis of the ileal mucosa-associated microbiota correlating with the ileal disease phenotype (Baumgart et al., 2007). There are also differences in genetic susceptibility (Ahmad et al., 2002) and adaptive immune responses (Targan et al., 2005) between CD patients with ileal disease and those with colonic CD. In this study, dysbiosis of the fecal microbiota correlated with ileal involvement of CD, suggesting that fecal samples could potentially serve as a diagnostic marker for the ileal disease phenotype. Some reports suggest that feces are not appropriate for diagnosis of CD, as they may not reflect the composition of mucosa-associated bacteria that are more directly responsible for inflammation (Lepage et al., 2005). However, previous findings show that components of feces are relevant for ileal CD recurrence (Rutgeerts et al., 1991) and induction of inflammation (Harper et al., 1985). Importantly, fecal sampling is non-invasive and fecal samples are easier to obtain than biopsies, and our findings encourage the use of fecal samples for eventual monitoring and/or diagnosis of CD.

Previous studies have found that the microbial diversity in the gut is lower in individuals with CD than in healthy individuals (Seksik et al., 2003; Manichanh et al., 2006; Scanlan et al., 2006). For example, there have been reports of a reduced diversity of Firmicutes (Gophna et al., 2006; Manichanh et al., 2006) and Bacteroides (Ott et al., 2004; Frank et al., 2007) in CD patients. We also found a significantly higher bacterial diversity in healthy individuals (based on T-RFLP profiles) than in the CD patients. In addition, all healthy individuals in the discordant twin pairs had higher T-RFLP profile diversity than their respective CD twins. Assessment of the intraclass correlation coefficient (ICC) of T-RFLP diversity for the twin pairs also showed higher agreement within healthy pairs than within discordant and concordant pairs. These results highlight the power of studying twins as genetically matched controls.
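
As an illustration of how a diversity value can be derived from a T-RFLP profile and how within-pair agreement can then be summarized, the following is a minimal sketch. It assumes the Shannon index as the diversity measure and a one-way random-effects ICC for pairs of two individuals; both choices, the function names and the example numbers are assumptions made for illustration and may differ from the calculations used in this study.

```python
import math


def shannon_diversity(peak_areas):
    """Shannon index H' = -sum(p_i * ln p_i) over TRFs with non-zero area."""
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("profile has no signal")
    proportions = [a / total for a in peak_areas if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def one_way_icc(pairs):
    """One-way random-effects ICC for a list of (twin_a, twin_b) values.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2 members per pair.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two pairs")
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / k for a, b in pairs]
    # Between-pair and within-pair mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, pair_means)) / (n * (k - 1))
    if msb + (k - 1) * msw == 0:
        raise ValueError("no variance in the data")
    return (msb - msw) / (msb + (k - 1) * msw)


# Invented peak areas for two hypothetical pairs, not study data.
pair_1 = ([40, 30, 20, 10], [35, 35, 20, 10])
pair_2 = ([90, 10], [80, 15, 5])
diversities = [(shannon_diversity(a), shannon_diversity(b)) for a, b in (pair_1, pair_2)]
print("Shannon diversity per pair:", diversities)
print("ICC across pairs:", one_way_icc(diversities))
```

An ICC close to 1 indicates that twins within a pair have nearly the same diversity relative to the spread between pairs, whereas values near or below 0 indicate little within-pair agreement.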

In this study, all of the CD twins except two (10b and 15a) were in clinical remission. It has previously been shown that the microbiota of CD patients differs from that of healthy individuals regardless of disease state (Seksik et al., 2003). The two individuals with active disease in this study had a low degree of inflammation. A possible advantage of studying patients in remission is that, during the active stage of the disease, alterations in the microbiota could be a consequence rather than a cause of inflammation (Manichanh et al., 2006). In addition, patients in remission are less likely to be taking medication that could induce changes in the microbiota independent of disease state.

We focused our attention on Bacteroides species because these seemed to differentiate between healthy individuals and those with CD when general bacterial primers were used for T-RFLP. B. vulgatus, B. uniformis and B. ovatus were the most abundant Bacteroides species detected. B. uniformis (TRF 262) was present in all samples, and B. ovatus (TRF 264) and B. vulgatus (TRFs 83 and 142) were present in most of the samples (Supplementary Figure S1). The two representative TRFs for B. vulgatus probably reflect strain-level variation in the 16S rRNA gene within this species. The Bacteroides communities were not significantly similar within any of the twin sets, including the healthy twins (Supplementary Figure S1). However, it has previously been shown that the Bacteroides group has a very large inter-individual variation (Eckburg et al., 2005; Jernberg et al., 2007).

Interestingly, our data suggest that the composition of Bacteroides species differs between healthy individuals and CD patients with ileal involvement (Supplementary Figure S2). This difference was largely due to lower relative abundances of B. uniformis and higher abundances of B. ovatus and B. vulgatus in patients with ileal involvement compared with both healthy twins and twins with colonic disease. Several previous reports have also shown an abnormal Bacteroides community in CD patients compared to healthy individuals (Seksik et al., 2003; Ott et al., 2004; Scanlan et al., 2006). For example, Scanlan et al. (2006) reported a lower complexity of DGGE (denaturing gradient gel electrophoresis) profiles within the B. fragilis subgroup in CD patients than in healthy individuals, and greater difficulty in obtaining PCR products from CD patients than from controls. In this study, it was not problematic to obtain PCR products for the Bacteroides group, which could simply reflect differences in Bacteroides abundances between the sample groups of the two studies.

One main conclusion of this study was that the healthy twins and some of the concordant twins that were sampled had similar microbial community profiles, and these were closely matched within a particular twin set. However, this similarity did not hold for discordant twins, suggesting that the diseased individuals had a different microbial community structure than their healthy twins. This finding was made using two independent molecular approaches: %G+C profiling and T-RFLP. Assessment of T-RFLP diversity within twin pairs showed higher agreement within healthy twin pairs than within discordant and concordant twin pairs. In addition, the bacterial diversity was higher in healthy twins than in CD twins. Interestingly, cluster analysis of binary T-RFLP data as well as ordination techniques applied to T-RFLP abundance data showed that CD twins with ileal involvement differed from healthy twins as well as from twins with colonic disease. This difference could partly be explained by a shift in the dominant Bacteroides community members of CD patients with ileal involvement.
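
Cluster analysis of binary (presence/absence) T-RFLP data starts from a pairwise distance matrix between samples. The following minimal sketch assumes the Jaccard distance for that step; the distance measure, the sample names and the TRF sets are hypothetical and serve only to illustrate the idea, not to reproduce the analysis performed here.

```python
from itertools import combinations


def jaccard_distance(trfs_a, trfs_b):
    """Jaccard distance between two sets of detected TRF sizes."""
    union = trfs_a | trfs_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(trfs_a & trfs_b) / len(union)


def distance_matrix(samples):
    """Return {(name_a, name_b): distance} for all sample pairs, symmetric."""
    names = sorted(samples)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = jaccard_distance(samples[a], samples[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Invented presence/absence profiles (sets of TRF sizes), not study data.
samples = {
    "healthy_1": {100, 200, 300, 400},
    "healthy_2": {100, 200, 300},
    "ileal_1": {100, 500, 600},
}

dist = distance_matrix(samples)
for (a, b), d in sorted(dist.items()):
    if a < b:
        print(f"{a} vs {b}: {d:.3f}")
```

Such a matrix could then be passed to a hierarchical clustering routine; in this toy example the two hypothetical healthy profiles lie much closer to each other than either does to the hypothetical ileal profile.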