Introduction

In addition to providing a variety of functional and metabolic capabilities relevant to host nutrition and well-being (Sonnenburg et al., 2004; Backhed et al., 2005; Mazmanian et al., 2008), the microbiota resident within the human colorectum have also been observed to change in subjects suffering from inflammatory bowel diseases (IBD). Although these alterations are widely believed to be involved with disease (Campieri and Gionchetti, 2001; Gill and Rowland, 2002; Guarner and Malagelada, 2003; Sartor, 2008), the specific role(s) of individual species or groups of species in disease initiation and/or progression are still largely unknown. Furthermore, IBD are clinically heterogeneous diseases; almost all cases of Crohn's disease involve a primary lesion in the vicinity of the ileocaecal junction, whereas ulcerative colitis usually affects the distal section of the colon, with a high incidence in the rectum (Maratka et al., 1985; Gasche et al., 1998; Radford-Smith et al., 2002). Similarly, around 70% of the adenocarcinomas occur in the distal section of the colon and rectum (Office for National Statistics, UK, 2004), and although proximal cancers are more likely to be associated with changes in CpG island methylation (the CIMP pathway) (Weisenberger et al., 2006), distal colorectal cancers are more commonly driven by chromosomal instability (Grady and Carethers, 2008). Such differences in IBD type and tumour biology might reflect, at least in part, differences in tissue microenvironments, including microbiological differences. To that end, it remains pertinent to elucidate how the microbiome might be structured within the human colon of both healthy and diseased subjects.

Ecological theory suggests that different environments give rise to different communities; therefore, the known anatomical and physicochemical differences along the large intestine should result in a ‘biogeographical’ distribution of microbes along and across the digestive tract. For instance, cultivation-independent studies have shown that the mucosa-associated microbiota from the distal small intestine and large intestine differ (Eckburg et al., 2005; Wang et al., 2005; Frank et al., 2007). Earlier studies of sudden-death victims revealed that the abundance of methanogenic archaea, the activity of sulphate-reducing bacteria, and other parameters such as the pH and concentration of fermentation products, varied considerably along the human colorectum (Macfarlane et al., 1992). Despite these observations, and the technological advances that have been applied to the analysis of gut microbial communities, evidence of a biogeographical pattern within the microbiome of the human large intestine has been elusive, principally because of the high degree of intersubject variability and constraints in sample number and size. A number of studies have shown a lack of variation in the mucosa-associated microbial communities along the human colon (Zoetendal et al., 2002; Lepage et al., 2005; Bibiloni et al., 2006; Green et al., 2006; Willing et al., 2009) and between diseased and healthy mucosal tissue (Seksik et al., 2005; Vasquez et al., 2007). These studies have usually relied on alpha- and beta-diversity measures for the comparative analysis of microbial communities. Although such approaches were useful in revealing the main effect on variation, which in all cases was reported to be intersubject variability, they also hindered the identification of more subtle patterns.

Eckburg et al. (2005) were the first to undertake large-scale rrs gene library construction and DNA sequencing to characterize the human gut microbiome, using matched tissue and faecal samples from three healthy subjects. Their pairwise comparisons also revealed large differences between subjects, and between faecal and mucosal communities. However, they could not find conclusive evidence of biogeographical trends along the colonic mucosa, and suggested that higher resolution studies using microarray and/or next-generation sequencing technologies, as well as alternative methods of data analysis, were needed. Both phylogenetic microarrays (Paliy et al., 2009; Rajilic-Stojanovic et al., 2009; Kang et al., 2010) and high-throughput sequencing methods are now used more frequently to examine gut microbiome diversity, but the development of robust quantitative methods to reveal the biogeographical features of the gut microbiota is considered a ‘great challenge’ and ‘the diversity and biogeography of the gut microbiota needs to be defined at varying scales of resolution’ (Peterson et al., 2008). Although some clustering and ordination methods of numerical ecology are commonly used to compare microbial community profiles through open source software (for example, QIIME (Caporaso et al., 2010)), and are increasingly used in gut microbiomics research (Fuentes et al., 2008; Biagi et al., 2010; Janczyk et al., 2010), we contend that other numerical ecology methods are also useful. For instance, constrained ordination methods are being increasingly used for the analysis of mammalian gene expression (Culhane et al., 2002; Baty et al., 2006, 2008) and of the microbial communities present in terrestrial and aquatic samples (see Ramette (2007) for a recent review). To date, there are only a few instances in which these methods have been used with gut microbiome samples (Eckburg et al., 2005; Mondot et al., 2010; Van den Abbeele et al., 2010 with in vitro cultures), even though such approaches may be useful to subtract the well-established intersubject variability that prevails in human-derived samples, thereby revealing more subtle patterns in gut microbiome structure and dynamics.

In this study, we present our findings obtained with a human intestinal tract-specific phylogenetic microarray (the Aus-HIT Chip) and, by applying constrained ordination numerical ecology methods to these data, reveal not only gender-specific differences in the human colonic mucosa, but for the first time a longitudinal gradient for specific microbes along the colorectum.

Materials and methods

Sampling

This study prospectively recruited subjects scheduled for outpatient colonoscopy at the Royal Brisbane and Women's Hospital in Brisbane and the Flinders Medical Centre in Adelaide, Australia. To be included in this study, subjects had to be between 45 and 70 years of age, be non-smokers and have a normal colonoscopy, including no diverticulosis and no past history of colorectal neoplasia. In addition, subjects were excluded if they were receiving anticoagulant therapy or had a history of IBD, familial adenomatous polyposis or hyperplastic polyposis. The first five women (age 57.5±7.7) and five men (age 54.3±9.9) who satisfied these criteria were selected for this study. All subjects gave written informed consent and the study was approved by the local institutional review boards. During colonoscopy, all patients had four mucosal biopsy samples taken, one each from the caecum, the transverse colon, the sigmoid colon and the rectum. The mucosal biopsy samples were snap frozen in liquid N2 and stored at −80 °C until processing. DNA was extracted and purified from the frozen biopsy samples using the AllPrep DNA/RNA Mini Kit (Qiagen Pty Ltd, Hilden, Germany), according to the manufacturer's instructions, and stored at 4 °C until processing.

Generation and fluorescent labelling of cRNA

The methods described by Kang et al. (2010) were used with several modifications. First, greater amounts of total DNA (100–500 ng) extracted from the biopsy samples were used for the PCR amplification of the Archaea and Bacteria rrs genes, using the primers described in Table 1. As an internal standard, the mitochondrial rrs gene was also PCR amplified using DNA extracted from blood as the template and the mitochondrial-specific primers listed in Table 1. All PCR reactions were performed in triplicate for each sample, then pooled and purified using the MinElute PCR purification kit (Qiagen). Where necessary, the amplicons were concentrated using Pellet Paint (Novagen, Madison, WI, USA).

Table 1 Primers used in this study

Single-stranded cRNA was then produced from 500 ng aliquots of the pooled PCR products, using the MEGAScript T7 In Vitro transcription kit (Ambion, Austin, TX, USA) as specified by the manufacturer. The resulting cRNA samples were purified using the MEGAclear Kit (Ambion), and the size and purity of the products obtained at each step were assessed by agarose gel electrophoresis and with an ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Aliquots (500 ng) of the cRNA samples were spiked with 7 ng of the control mitochondrial cRNA, then labelled and fragmented using the Label IT μArray Cy5 labelling Kit (Mirus, Madison, WI, USA).

Microarray hybridization and image analysis

The cRNA samples described above were hybridized in quadruplicate to CustomArray 4 X 2 K microarrays fabricated by CombiMatrix (Mukilteo, WA, USA). The relevant probe and other methodological details have previously been described by Kang et al. (2010) and are accessible at the GEO database (http://ncbi.nlm.nih.gov/GEO, under GPL9543 and GSE18933). In brief, the microarrays include 593 probes specific for human intestinal bacteria and archaea derived from the published literature, another 163 newly designed probes using the GoArray algorithm (Rimour et al., 2005), as well as 10 probes (6 positive and 4 negative mismatch controls) targeting the human mitochondrial rrs gene. The hybridized microarray slides were scanned using an Axon Genepix 4000A microarray scanner (Axon Instruments, Union City CA, USA). The images obtained were analysed using GenePix Pro 6.0 software (Axon Instruments), and probe signals were recorded as 635 nm foreground and 635 nm background intensities.

Data processing and analysis

Raw signal data were first processed using Genespring GX10 software (Agilent Technologies, Santa Clara, CA, USA). The probe intensity values were log2 transformed and, to avoid the generation of biased profiles, the distribution of probe intensities for all the hybridizations was first checked by boxplot graphing before normalization using the quantile normalization method (based on the normalization of intensity distributions) (Bolstad et al., 2003). Any probes that produced signal values lower than the highest negative control signals were also removed from further analysis. Of the 40 samples used in this study, one was consistently found to be an outlier and was excluded from the analyses to avoid bias. Using these criteria, the remaining 39 samples produced a total of 115 profiles that were used for analysis, with no fewer than two hybridization profiles produced for each cRNA sample.
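The preprocessing steps described above can be sketched in Python with NumPy. This is an illustrative sketch only, not the Genespring GX10 implementation used in the study: the function names, the random example data and the rule of comparing each probe's mean signal with the highest negative-control signal are assumptions made for the example, and ties between equal intensities are broken arbitrarily.

```python
import numpy as np


def quantile_normalize(log_signal):
    """Give every array (column) the same intensity distribution.

    Rows are probes and columns are hybridizations. Ties are broken
    arbitrarily by argsort, which is a simplification.
    """
    x = np.asarray(log_signal, dtype=float)
    order = np.argsort(x, axis=0)                    # probe ranking within each array
    sorted_x = np.take_along_axis(x, order, axis=0)  # each column sorted ascending
    reference = sorted_x.mean(axis=1)                # mean intensity at each rank
    normalized = np.empty_like(x)
    # Write the rank-wise mean back at each probe's original position.
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized


def filter_probes(normalized, negative_control_rows):
    """Flag probes whose mean signal exceeds the highest negative-control signal.

    Assumption: the threshold is applied to the per-probe mean across arrays;
    the source text does not specify this detail.
    """
    threshold = normalized[negative_control_rows].max()
    return normalized.mean(axis=1) > threshold


# Hypothetical 635 nm signals for 12 probes on 4 arrays; rows 0 and 1 stand in
# for negative controls.
rng = np.random.default_rng(0)
raw = rng.uniform(50.0, 5000.0, size=(12, 4))
log2_signal = np.log2(raw)
normalized = quantile_normalize(log2_signal)
keep = filter_probes(normalized, negative_control_rows=[0, 1])
```

After normalization every column contains the same set of values, so differences between arrays reflect only the rank order of the probes.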

The averaged per-sample profiles were first subjected to hierarchical clustering, using the average linkage method based on χ2 dissimilarity measures, with the R package ade4. The profiles were then subjected to multivariate analysis methods derived from numerical ecology. For these analyses, both the averaged per-sample profiles (n=39) and all the microarray profiles (n=115) were used to examine the technical variation associated with the profiling approach used. Although both data sets produced similar results, only the former (n=39) was used in the assessment of biological effects. Between-group analysis (Culhane et al., 2002) and analysis with respect to instrumental variables (Baty et al., 2006) applied to correspondence analysis (Fellenberg et al., 2001) were performed, using the R package ade4 (Dray and Dufour, 2007). The correspondence analysis method, often referred to as reciprocal averaging, is considered to be particularly useful in ecological studies. Its underlying model assumes a unimodal distribution of species along environmental gradients (as stated by niche theory), and is based on χ2 distance measures, thus avoiding the double-zero problem (Legendre and Legendre, 1998; Ramette, 2007). These methods are expected to identify the probes that best explain the differences between a priori classifications, and permit the subtraction of particular effects (for example, intersubject variability), which then allows the remaining effects to be effectively compared. Monte Carlo tests (Baty et al., 2006), in which sample labels are randomly permuted to assess the significance of the constraint being applied, and bootstrapping tests (Baty et al., 2008), which assess the significance of probe contribution to the model, were performed to assess the reliability and stability of the results obtained. Only probes showing a stable contribution (P<0.05) and a larger effect on the model than that of the control mitochondrial probes were considered for further analyses, to avoid spurious results. Pairwise differences between class (sampling sites) positions in the model were calculated by analysis of variance using the Statistica 6.0 software (StatSoft Inc., Tulsa, OK, USA).
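The logic of a between-group analysis on a correspondence analysis table, together with the Monte Carlo label-permutation test, can be illustrated with the following Python sketch. It is not the ade4 code used in the study: the function names are hypothetical, the test statistic (the ratio of between-class to total inertia) is assumed to mirror what ade4 reports, and the input is assumed to be a non-negative samples × probes matrix with no empty rows or columns.

```python
import numpy as np


def coa_table(counts):
    """Return the correspondence analysis table with its row and column weights."""
    x = np.asarray(counts, dtype=float)
    p = x / x.sum()
    row_w = p.sum(axis=1)                    # sample masses
    col_w = p.sum(axis=0)                    # probe masses
    z = p / np.outer(row_w, col_w) - 1.0     # chi-square scaled deviations
    return z, row_w, col_w


def between_inertia_ratio(z, row_w, col_w, labels):
    """Fraction of the total inertia carried by the class centroids."""
    total = np.sum(row_w[:, None] * col_w[None, :] * z ** 2)
    between = 0.0
    for group in np.unique(labels):
        mask = labels == group
        mass = row_w[mask].sum()
        centroid = (row_w[mask, None] * z[mask]).sum(axis=0) / mass
        between += mass * np.sum(col_w * centroid ** 2)
    return between / total


def permutation_test(counts, labels, n_perm=999, seed=0):
    """Monte Carlo test: permute sample labels and recompute the inertia ratio."""
    labels = np.asarray(labels)
    z, row_w, col_w = coa_table(counts)
    observed = between_inertia_ratio(z, row_w, col_w, labels)
    rng = np.random.default_rng(seed)
    null = np.array([
        between_inertia_ratio(z, row_w, col_w, rng.permutation(labels))
        for _ in range(n_perm)
    ])
    # The observed value is counted as one of the permutations.
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `permutation_test(profiles, gender_labels)` on a hypothetical profile matrix returns the share of inertia explained by the grouping and its permutation P-value; with 999 permutations the smallest attainable P-value is 0.001.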

Real-time PCR analysis

Some key bacterial groups, identified by the numerical ecology methods to vary with respect to gender and/or site, were quantified by quantitative PCR. The primers used in these assays are listed in Table 1. The reaction mixtures were prepared using a Biomek2000 automated workstation (Beckman, Brea, CA, USA) and contained 2X iQ SYBR Green Supermix (Bio-Rad, Hercules, CA, USA), 400 nM each of the appropriate forward and reverse primer, 10 pg of template DNA (the pooled PCR amplicons obtained from the original sample) and sterile water. Quadruplicate 5 μl aliquots of each reaction were dispensed into a 384-well plate, and a dilution series was also prepared from a pooled sample to determine the amplification efficiencies for each group-specific PCR reaction. Real-time PCR was performed using a 7900HT Sequence detection system (Applied Biosystems, Mountain View, CA, USA), and after an initial denaturation step of 95 °C for 10 min, 40 cycles of 95 °C for 15 s and 58 °C for 30 s were performed. The resulting data for each specific group were normalized relative to the Bacteria rrs gene using the methods described by Pfaffl (2001), and subjected to analysis of variance using Statistica 6.0 software. To visualize a possible site-related structure in these data, the datum produced for each individual sample was also divided by the subject average and then these values were plotted using the triangle.plot function from ade4.
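The efficiency-corrected relative quantification of Pfaffl (2001) and the subsequent scaling by the subject average can be sketched as follows. The function names are hypothetical, and the use of a calibrator sample (for instance, the pooled sample used for the dilution series) is an assumption for the example, as the calibrator is not specified in the text.

```python
import numpy as np


def efficiency_from_dilutions(log10_dilution, ct):
    """Amplification efficiency E = 10^(-1/slope) from a standard dilution series."""
    slope = np.polyfit(log10_dilution, ct, 1)[0]
    return 10.0 ** (-1.0 / slope)


def pfaffl_ratio(e_target, ct_target_cal, ct_target_sample,
                 e_ref, ct_ref_cal, ct_ref_sample):
    """Relative abundance of a target group, normalized to a reference gene.

    ratio = E_target^(Ct_cal - Ct_sample) / E_ref^(Ct_cal - Ct_sample),
    where 'cal' is the calibrator (assumed here) and the reference gene
    would be the Bacteria rrs gene.
    """
    target_term = e_target ** (ct_target_cal - ct_target_sample)
    reference_term = e_ref ** (ct_ref_cal - ct_ref_sample)
    return target_term / reference_term


def scale_by_subject_mean(values, subjects):
    """Divide each sample's value by the mean of that subject's samples."""
    values = np.asarray(values, dtype=float)
    subjects = np.asarray(subjects)
    scaled = np.empty_like(values)
    for subject in np.unique(subjects):
        mask = subjects == subject
        scaled[mask] = values[mask] / values[mask].mean()
    return scaled
```

Dividing by the subject mean removes the between-subject offset, so that the remaining values describe how each site deviates from that subject's own average.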

Results

Hierarchical clustering of the microarray profiles shows marked intrasubject grouping

Similar to the findings of Eckburg et al. (2005), the initial clustering analysis of the microarray data showed that the microbial profiles produced from different tissues from the same subject clustered together (Supplementary Figure 1). The analysis did not reveal any clear evidence of a biogeography along the colorectum, other than an inconclusive tendency of the caecum-derived profiles to cluster with the transverse colon-derived profiles from the same patient (for 6 of the 10 subjects). However, the clustering analysis did indicate that there might be a gender effect, as the profiles derived from all but one of the male subjects clustered together (Supplementary Figure 1).

Between-group analysis of the microarray data confirmed a gender-based bifurcation in microbiome profiles

When the microarray data were subjected to constrained ordination (that is, between-group analysis using gender as the constraint), gender had a statistically significant effect on the profiles and accounted for 15.5% of the variance in the data (P<0.05), compared with 13.3% (P<0.05) for the replicated (n=115) data set (Figure 1). The analysis also showed that the male-derived profiles were more similar to each other, whereas the female-derived profiles contained a greater amount of variation. The cRNA samples prepared from the female subjects produced relatively stronger signals from most of the Streptococcus, Veillonella, Mannheimia and Ruminococcus spp. probes, whereas samples from the male subjects produced relatively greater signals from several probes targeting Faecalibacterium prausnitzii, Bacteroides, Clostridium, Prevotella, Enterococcus and Bifidobacterium spp.

Figure 1

Discrimination between female- and male-derived profiles. The diagram shows the single axis result of a between-group analysis applied to correspondence analysis on the data using gender as constraint. The seven most discriminant probes from either end of the axis are included (error bars are used to represent the spread of the probe coordinates on the model after bootstrapping).

Subtraction of intersubject variability from the microarray data by application of numerical ecology methods revealed evidence of colonic mucosal biogeography

To assess whether there were differences between sites, an analysis with respect to instrumental variables (that is, a within-group analysis using subject as constraint, followed by a between-group analysis using sampling site as constraint) (Baty et al., 2006) was performed with the data set. The analyses revealed that intersubject variability accounted for 94.2% of the variance in the data, and when this variability was not removed from the analysis no differences in the profiles with respect to site could be observed (Figure 2b, P<1.0). However, once the intersubject variability was removed, the sampling site was found to significantly contribute to the remaining variance (10.18% and P<0.087, as compared with 4.8%, P<0.01 for the individual hybridizations). Indeed, the between-site analysis applied to the residual variance once the interindividual effects had been removed shows the position of the sites along the colorectum to be the main (x) axis of variation (Figures 2a and 3a). Pairwise tests revealed that all classes (sampling sites) were significantly separated by one of the three axes of the model. Furthermore, there was also evidence of gradients in the intensity of some probe signals along the colorectum (Figure 3a). Probes targeting Streptococcus, Comamonadaceae, Enterococcus and Corynebacterium spp. showed strong associations with the caecum and transverse colon classes, whereas several Enterobacteriaceae probes showed associations with the sigmoid colon and rectum classes. Taken together, the microarray data suggest that, in the subjects examined in this study, members of the Streptococcus, Comamonadaceae, Enterococcus and Corynebacterium spp. were present in greater abundance in tissue samples collected from the caecum and transverse colon, whereas members of the Enterobacteriaceae were present in relatively greater amounts in tissue samples collected from the sigmoid colon and rectum.
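The two-step decomposition described above (removal of the subject effect, followed by a between-site analysis of the residuals) can be approximated with the following Python sketch. It is a simplified illustration under stated assumptions rather than the ade4 procedure used in the study: the function names are hypothetical, only inertia shares are computed (no ordination axes), and the input is assumed to be a non-negative samples × probes matrix with no empty rows or columns.

```python
import numpy as np


def coa_table(counts):
    """Return the correspondence analysis table with its row and column weights."""
    x = np.asarray(counts, dtype=float)
    p = x / x.sum()
    row_w = p.sum(axis=1)
    col_w = p.sum(axis=0)
    return p / np.outer(row_w, col_w) - 1.0, row_w, col_w


def inertia(z, row_w, col_w):
    """Weighted sum of squares of a table (chi-square metric)."""
    return np.sum(row_w[:, None] * col_w[None, :] * z ** 2)


def class_fit(z, row_w, labels):
    """Replace every row by the weighted centroid of its class."""
    labels = np.asarray(labels)
    fit = np.zeros_like(z)
    for group in np.unique(labels):
        mask = labels == group
        fit[mask] = np.average(z[mask], axis=0, weights=row_w[mask])
    return fit


def partition_inertia(counts, subjects, sites):
    """Share of inertia due to subject, and share of the residual due to site."""
    z, row_w, col_w = coa_table(counts)
    subject_fit = class_fit(z, row_w, subjects)
    residual = z - subject_fit               # within-subject table
    subject_share = inertia(subject_fit, row_w, col_w) / inertia(z, row_w, col_w)
    site_fit = class_fit(residual, row_w, sites)
    site_share = inertia(site_fit, row_w, col_w) / inertia(residual, row_w, col_w)
    return subject_share, site_share
```

In this framing, the first returned value plays the role of the intersubject share of the variance, and the second plays the role of the share of the remaining variance attributable to sampling site.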

Figure 2

Decomposition of the data set variability according to site. The diagram shows the results of a between-group analysis applied to correspondence analysis using site as the constraint on the data, in which the ‘individual’ effect has been previously removed (a), or not removed (b). The coloured ellipsoids represent the collective variance for each site. In panel a, a maximum of 20 probes significantly contributing to each class are shown, and the axes comprise 53.13% (x) and 26.87% (y) of the variance explained by the constraint.

Figure 3

(a) Species contributing to the main (x) axis of variation shown in Figure 2a, plotted with respect to the microarray hybridization profiles generated from the individual samples. The positional location of the coloured lines represents their relative position along the axis of variation and their association with the listed probe identifiers. Only those probe identifiers that make a statistically significant contribution to the site differentiation are shown. (b) A triangle plot showing the quantitative PCR results produced from the individual samples using group-specific primers. The coloured ellipsoids represent the collective variance for each site, which concurs with the microarray results shown in Figure 2a.

Quantitative PCR analysis supports the findings derived from the microarray profiles

Overall, the results derived from the real-time PCR data validate the findings of the microarray analysis. The gender-based differences revealed by the microarray analyses were confirmed, with the female DNA samples found to possess fewer F. prausnitzii (female/male ratio: 0.41, P<0.05) and higher numbers of streptococci (female/male ratio: 1.29), although the latter was not statistically significant. The multivariate test applied to site-specific data revealed that there were statistically significant differences (Wilks' lambda, P<0.01), and that streptococci were present in lower abundance in the sigmoid colon compared with other sites (ratio sigmoid/other sites 0.63, P<0.05). Furthermore, when the values obtained for each sample were plotted together (Figure 3b), the per-site distribution obtained was quite similar to results produced using the microarray data (Figure 2a). Taken together, these results are consistent with the findings derived from the microarray analyses and provide further evidence that select groups of bacteria may preferentially colonize different regions of the colorectum.

Discussion

Gender-related differences in gut microbial communities have previously been observed in macaques (McKenna et al., 2008) and mice (Schloss and Handelsman, 2006). In humans, gender-related differences have previously been reported in a cross-sectional study evaluating differences in a small set of members of the faecal microbiota (Mueller et al., 2006). The study concluded that members of the Bacteroides-Prevotella group were in greater abundance in males than in females, which is in accordance with our microarray results. Gender-based differences in the gut microbiota could be indicative of underlying differences in gut physiology or host effect on the microbiome (that is, diet and lifestyle). Regarding the biogeography of the colorectum, it is evident from the overlap observed between the classes (Figure 2a) that the model obtained could not perfectly resolve in all cases between different sampling sites. Nevertheless, the results obtained in this study clearly point to a consistent and common biogeography in the colorectum of the patients studied, as evidenced by the significant effect that sampling site had on the variance of the microarray profiles, and on the abundance of select bacterial taxa measured by quantitative PCR.

The main trend observed in the microarray analyses was a longitudinal gradient for specific microbes along the colorectum, with probes targeting members of the Streptococcus, Comamonadaceae, Enterococcus, Corynebacterium and Lactobacillus producing the strongest signals with caecal and transverse colon samples. These findings are not inconsistent with the observations reported by Frank et al. (2007), in which Streptococcaceae and Corynebacteriaceae were present in higher numbers in distal small intestinal samples compared with those from the colorectum. Conversely, probe signals for several members of the Enterobacteriaceae were found in our studies to increase towards the rectum, suggesting that these bacteria largely reside in the distal regions of the colorectum in healthy subjects. Interestingly, Willing et al. (2009) reported a large increase in Enterobacteriaceae in samples analysed from subjects with the ileocaecal form of Crohn's disease; our observations therefore further support the contention that this form of IBD is associated with a proliferation of specific bacteria ‘outside’ their normal biogeographical location. The environmental drivers of the differences for both axes of variation cannot be determined from this study, but the trend follows the main physicochemical changes known to occur through the colorectum: lumen content dehydration and pH increase towards the rectum (Bown et al., 1974), as well as highest short-chain fatty acid, lactate and ethanol concentrations in the proximal colon, decreasing distally with a concomitant increase in products of protein fermentation (for example, ammonia, branched chain fatty acids and phenolic compounds) (Macfarlane et al., 1992). There is also one report of differences in mucosal oxygen concentration along the human colon, especially between the caecum and distal regions (pO2 [mm Hg] being 33.7±7.5 in the caecum, compared with 40.7±8.2, 41.2±8.7 and 39.7±7.3 for the transverse colon, sigmoid colon and rectum, respectively (Sheridan et al., 1987)). Interestingly, recent studies linking a higher abundance of aerotolerant bacteria with ileal CD (Baumgart et al., 2007) and surgical intervention (Hartman et al., 2009) suggest that perturbations in oxygen concentration may be associated with the observed dysbiosis. Our findings are not inconsistent with these reports, and collectively, all three studies support the hypothesis that the dysbiosis seen with some types of IBD might initially be in response to the altered oxygenation state of the inflamed tissue.

In conclusion, it has long been recognized that there is variation in the physicochemical conditions along the human colon, and it is biologically intuitive that such conditions would influence the biogeography of the colonic mucosal microbiome. However, the detection of such differences has been elusive, obscured behind intersubject variability. Our study shows that the application of numerical ecology methods to the analysis of data generated with a phylogenetic microarray is a powerful approach in the study of the mucosa-associated microbial community of the human colorectum. Not only did we find differences in the mucosa-associated microbiome with respect to gender but we also showed, for the first time, evidence of a distinctive biogeography along the length of the human colon. Given the region-specific nature of many types of colonic cancers and inflammatory diseases, our findings raise interesting questions on how the ‘functional core’ of the mucosa-associated gut microbiome might change along the length of the human colon, as well as the nature of host–microbe interactions at different regions.