Introduction

The human gastrointestinal tract (GIT) hosts a complex bacterial community which, for decades, has been the subject of extensive study using both culture-dependent and culture-independent methods1,2,3,4. The GIT microbiota is acquired very early, within a few months of delivery5. Besides this very short time scale, the initial “inoculum” is determined by the mode of delivery (vaginal or C-section), the mode of feeding (breast milk or formula) and environmental factors at the time of delivery, among others6,7,8.

Most likely, very few organisms of maternal origin are maintained in the newborn after the first month of life9,10; after this short period, the intestine already hosts its own microbiota, which undergoes complex bacterial turnover until it reaches a stable composition that then remains fairly constant for life11. Innate and adaptive immune responses work together to achieve this homoeostasis. Growing evidence shows a parallel evolution of the immune system and members of the gastrointestinal microbiota12,13,14. Thus, from the first day of life, Toll-like receptors (TLR1 to TLR9) are adapted to recognize specific microbial structures such as lipopolysaccharide (LPS)15. Lotz and collaborators described how newborn mice undergo a process of tolerance acquisition to LPS-expressing bacteria within one hour of delivery; this process contributes to establishing microbiota-host homoeostasis early after birth16. In a very well-designed study, Giraud and collaborators demonstrated that Escherichia coli MG1655 inoculated into 8- to 12-week-old gnotobiotic mice evolves rapidly through an adaptive process based on mutation and adaptation to the new environment: after inoculation with the wild-type strain, the EnvZ/OmpR operon was altered in 90% of the bacteria harvested from the faeces of independent mice17. On the other hand, the immune system has to adapt to microbial invasion while avoiding complete annihilation of the intruders, which have to be recognized as partners in a mutualistic relationship. These interactions are maintained throughout life. In this scenario, the immune system is continuously stimulated, although in a controlled way, avoiding abnormal responses to commensals or potential pathogens passing through the GIT. For instance, some bacteria, such as members of the genus Sphingomonas, have been identified as players in the maintenance of the immune response: they produce antigenic glycosphingolipids that have been correlated with the expansion of invariant natural killer T cells18.

As early as 1987, Conley & Delacroix estimated that approximately 40 mg of IgA per kg of body weight is produced every day in the human intestine19. It has been estimated that in healthy conditions about 36% of the GIT microbiota is coated by IgA and that this proportion can increase up to 69% during inflammatory processes20.

A hot research topic is understanding the role of dominant bacteria and their relationships with rare ones. In previous work, we showed that low-frequency taxa are present and consist mainly of active bacteria whose diversity is statistically hidden by the dominant majority21. The gastrointestinal tract is the richest body site in terms of microbial groups. The most represented taxa are also the most commonly shared between different samples. However, most of the as-yet-unknown taxa belong to the under-represented fraction of the gut microbiota22,23.

While next-generation DNA sequencing methodologies have shed light on the enormous variety of bacteria inhabiting the human body, a direct, cell-by-cell understanding of which bacteria are the active players and which of them interact with the immune system remains a research challenge. Previously, Van Der Waaij and collaborators (1994) described a method for sorting human faecal microbiota based on fluorescently labelled anti-human IgA24. We used an interdisciplinary single-cell approach involving RNA staining as a measure of cell activity25,26, fluorescent anti-human IgA labelling, flow cytometry and cell sorting, 16S rDNA amplification and deep pyrosequencing to shed light on different segments of the gut microbiota, such as the potential activity of under-represented bacteria. We observed that a long tail of low-frequency taxa (each below 1% in frequency) could account for up to 20% of the active population. We also analysed active and S-IgA-coated bacteria from apparently healthy volunteers who had not received antibiotic treatment. Our aim was to determine whether S-IgA coats specific taxa. We identified several groups, such as members of Actinobacteria or Sphingobacteria, for which hypotheses already exist regarding their role and activity, and which have been confirmed in murine models as active players in the gut microbiota and/or as part of the S-IgA-coated fraction.

Results

Sequencing and diversity overview

From each sample and fraction we obtained an average of 6,980 sequences of at least 200 bp (min: 727; max: 14,359 reads). The average read length was 422 bp (± 89 bp). The number of reads per sample/fraction, the number of particles (cells) sorted by flow cytometry and their proportion with respect to the total number are reported in Table 1. The fraction of active cells ranged from 34% (sample V2) to 66% (V4) of the total community, and the proportion of S-IgA-coated bacteria ranged from 21% (V4) to 98% (V3).

Table 1 Details of samples/fractions. For each sample and fraction, columns give the number of sorted cells, the proportion of sorted cells with respect to the total number, the number of genera, the Shannon index of diversity, and the Chao1 richness and its standard deviation, respectively

Diversity analysis at the genus level showed that FS fractions have lower evenness than almost all Act samples (V2, V3, V4, V5 and V6) and all IgA fractions, as revealed by the Shannon index values. The Chao1 richness estimator showed a different behaviour of FS with respect to Act and IgA: FS richness was lower in samples V1, V2, V3, V5 and V6 and higher in V4 (see Table 1).
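The Shannon index and Chao1 estimator in Table 1 were computed by the authors with the vegan R package. The following is only a minimal Python sketch of the textbook formulas, assuming a list of genus-level read counts for one sample/fraction as input. It uses the classic Chao1 form, S_obs + F1²/(2·F2), and falls back to the bias-corrected term F1(F1 − 1)/2 when there are no doubletons; vegan's `estimateR` applies a bias-corrected variant, so its values may differ slightly.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Classic Chao1 richness estimate from per-taxon read counts.

    S_obs + F1**2 / (2 * F2), where F1 and F2 are the numbers of taxa seen
    exactly once (singletons) and exactly twice (doubletons). If F2 == 0,
    the bias-corrected term F1 * (F1 - 1) / 2 is used instead.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical genus counts for one sample/fraction (not data from this study).
genus_counts = [120, 45, 30, 2, 1, 1, 0]
print(round(shannon_index(genus_counts), 3), chao1(genus_counts))
```

Because Chao1 adds an extrapolated term driven by singletons and doubletons, a fraction with many rare genera can show higher estimated richness even when its observed genus count is similar.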

Rarefaction curves flatten at the family level, and all FS fractions flatten earlier than Act and IgA fractions (see Figure 1). Analysis of the family distribution revealed that Act and IgA fractions tend to cluster together (Figure 2, panel A). Moreover, the active or S-IgA-coated bacterial fractions are not representative of the whole community, nor can they be considered a subset of it, because of the high richness and evenness of the species found. Sample V1 showed a different microbial distribution with respect to the other samples, as also shown in the cluster analysis, where V1 fractions behave as an outgroup with respect to the other samples/fractions (Figure 2, panel A). At the genus level, canonical correspondence analysis of all samples showed that the two major components account for 43.62% and 17.60% of the observed diversity (Figure 3). As expected, sample V1 is the most divergent; also noteworthy is the clear separation between the three fractions. These results are statistically corroborated by permutational multivariate analysis of variance using distance matrices (“Adonis”), also known as “permutational MANOVA”27,28, applying the Bray-Curtis distance at the genus rank. The analysis showed significant p-values for both the fraction and the sample grouping criteria (Figure 3 and Supplemental Information, Table 1).
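The authors ran vegan's `adonis` on Bray-Curtis distances at genus rank, testing both the fraction and the sample factors. The sketch below is only a one-factor illustration of the same idea in plain Python: it builds a Bray-Curtis distance matrix, computes Anderson's pseudo-F statistic, and derives a p-value from random label permutations. It is not vegan's implementation, and the profiles and group labels in the example are hypothetical.

```python
import random


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]


def pseudo_f(dist, groups):
    """Anderson's one-way PERMANOVA pseudo-F statistic."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=1):
    """Return (pseudo-F, permutation p-value) for one grouping factor."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise in equal partitions.
        if pseudo_f(dist, shuffled) >= f_obs - 1e-12:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical genus profiles: four "FS-like" and four "Act-like" fractions.
profiles = [
    [10, 1, 0], [9, 2, 0], [11, 1, 1], [10, 0, 1],
    [0, 1, 10], [1, 2, 9], [0, 0, 11], [1, 1, 10],
]
groups = ["FS"] * 4 + ["Act"] * 4
f_stat, p_value = permanova(distance_matrix(profiles), groups)
print(f_stat, p_value)
```

With only a few samples per group, the number of distinct label permutations is small, which puts a floor on the smallest attainable p-value regardless of how well separated the groups are.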

Figure 1

Rarefaction curves calculated for each sample/fraction.

FS: faecal suspension. Act: active fractions. IgA: S-IgA-coated bacterial fractions. Every plot corresponds to a volunteer. X axis represents the number of sequences while Y axis represents the number of families encountered.
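Rarefaction curves such as those in Figure 1 plot the number of families expected as sequencing depth increases. The paper does not state whether its curves were obtained by random subsampling or analytically; the sketch below assumes Hurlbert's analytical formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total number of reads and N_i the reads assigned to family i. The family counts in the example are hypothetical.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of taxa in a random subsample of n reads (drawn without replacement)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of reads")
    denom = comb(total, n)
    # Each taxon contributes the probability that at least one of its reads is drawn.
    return sum(1 - comb(total - c, n) / denom for c in counts)


def rarefaction_curve(counts, step):
    """(depth, expected richness) pairs at depths 0, step, 2*step, ... up to the total read count."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]


# Hypothetical family-level read counts for one sample/fraction.
family_counts = [500, 220, 80, 12, 5, 3, 1, 1]
for depth, richness in rarefaction_curve(family_counts, step=100):
    print(depth, round(richness, 2))
```

A curve that flattens early, as reported for the FS fractions, means that additional reads mostly resample families already observed.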

Figure 2

Frequency distributions at family taxonomic rank level.

Panel (A) shows the clustering of samples/fractions at family level using a Bray-Curtis distance matrix and the complete linkage clustering method. Panel (B) shows the distribution of all frequency values calculated at family level; the Y axis is on a logarithmic scale. Panel (C) shows the aggregation of dominant and rare families, whose abundance values are equal to or higher than 1% or lower than 1%, respectively. Bars represent samples, while bar groups represent fractions (see legend above the panel).
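Panel A relies on complete linkage clustering of a Bray-Curtis distance matrix. As a minimal illustration (not the authors' R code), the naive agglomerative procedure below repeatedly merges the two clusters whose farthest members are closest and records the merge height. It is cubic in the number of items, which is acceptable for the 18 samples/fractions analysed here but not for large datasets. The labels and distances in the example are hypothetical.

```python
def complete_linkage(dist, labels):
    """Agglomerative clustering with complete linkage on a symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, height) tuples, where
    each cluster is a tuple of labels. Labels must be unique.
    """
    clusters = [(label,) for label in labels]
    index = {label: i for i, label in enumerate(labels)}
    merges = []

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[index[x]][index[y]] for x in a for y in b)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical Bray-Curtis distances between three samples/fractions.
labels = ["V1.FS", "V1.Act", "V2.FS"]
dist = [
    [0.0, 0.1, 0.8],
    [0.1, 0.0, 0.6],
    [0.8, 0.6, 0.0],
]
merges = complete_linkage(dist, labels)
for a, b, height in merges:
    print(a, b, height)
```

In R, the equivalent step is usually `hclust` with `method = "complete"` on a distance object.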

Figure 3

Canonical correspondence analysis.

The distribution was obtained from the Bray-Curtis distance matrix calculated from the genus contingency table. Axes represent distance values on coordinates 1 and 2. Red lines and labels separate samples; light blue lines separate fractions. Gray labels with “Sample.Fraction” indications describe the distribution of each experiment. Pink labels represent the distribution of genera.

To classify low- and high-frequency families, we applied a method based on the average inflection point, as previously described21. The frequency distribution per family over the whole dataset had mean and median values of 1.60% and 0.1%, respectively (see Figure 2, panel B), confirming the numerical importance of rare families. Overall mean values per fraction showed that dominant families represented 97.11% (s.d.: 1.1), 36.37% (s.d.: 20.39) and 43.46% (s.d.: 33.54) of the family-level abundance in FS, Act and IgA fractions, respectively, while rare families (frequency below 1%) represented on average 2.89% (s.d.: 1.1), 5.81% (s.d.: 2.41) and 6.21% (s.d.: 2.08) of the family-level abundance in FS, Act and IgA fractions, respectively (see Figure 2, panel C). Thus, the low-frequency taxa were extraordinarily diverse and, as expected, diversity is characterized by the “long-tail” effect.
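The average inflection point method from our previous work21 is not reproduced here. The sketch below only illustrates the fixed 1% cut-off used in Figure 2, panel C, together with the mean and median summary of the frequency distribution, assuming a dictionary of family abundances (in percent) for one sample/fraction; the family values are hypothetical.

```python
from statistics import mean, median


def split_dominant_rare(family_percent, threshold=1.0):
    """Sum family abundances (percent) into dominant (>= threshold) and rare (< threshold) parts."""
    dominant = sum(v for v in family_percent.values() if v >= threshold)
    rare = sum(v for v in family_percent.values() if 0 < v < threshold)
    return dominant, rare


def frequency_summary(family_percent):
    """Mean and median of the non-zero family frequencies."""
    values = [v for v in family_percent.values() if v > 0]
    return mean(values), median(values)


# Hypothetical family percentages for one FS fraction (not values from this study).
fs_families = {
    "Lachnospiraceae": 40.0,
    "Ruminococcaceae": 30.0,
    "Bacteroidaceae": 27.0,
    "Sphingomonadaceae": 0.5,
    "Moraxellaceae": 0.3,
}
print(split_dominant_rare(fs_families), frequency_summary(fs_families))
```

A median far below the mean, as reported for the whole dataset (0.1% versus 1.60%), is the signature of a long tail of many low-frequency families.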

OTUs97 distribution and core microbiota

By analogy with the pan-genome concept29,30, we can define a “pan-microbiota” by identifying sequence clusters shared among samples (a “core” set of bacteria) as well as clusters specific to each sample.

Comparative analysis of species-level operational taxonomic units (sequences clustered at 97% similarity with 80% overlap, hereinafter OTUs97) revealed that 28, 21 and 69 OTUs97 were shared among all samples in the IgA, Act and FS fractions, respectively (see Figure 4). The shared diversity of the FS core comprised only members of highly abundant families of Firmicutes and Bacteroidetes, while in Act and IgA fractions members of Proteobacteria were also found, such as Alphaproteobacteria mainly belonging to Sphingomonadales, Gammaproteobacteria belonging to Pseudomonadales and Betaproteobacteria belonging to Burkholderiales; in the IgA core we also found members of Actinobacteria, which were absent from FS (see Table 2).

Table 2 Core microbiota. Columns describe taxonomic ranks of core OTUs97 of each fraction among all samples. Asterisks indicate families found in at least 5 out of 6 samples
Figure 4

Venn diagrams of species-level OTUs97 distributions.

Graphs show the unique and shared OTUs97 in FS, Act and IgA fractions, respectively. Sample codes are reported next to each area. Numbers in the central overlapping areas refer to the OTUs97 shared in the core microbiota.

A less restrictive definition, in which core bacteria are clusters represented in at least 5 of the 6 samples, showed that the FS core was almost unchanged, adding only members of Streptococcaceae. The Act fraction core also included members of Xanthomonadaceae (Gammaproteobacteria) and of Microbacteriaceae and Nocardiaceae (Actinobacteria) (see Table 2).
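Once sequences have been clustered into OTUs97, both the strict core (OTUs97 shared by all samples, as in the central areas of Figure 4) and the relaxed core (present in at least 5 of the 6 samples) reduce to counting occurrences across samples. A minimal sketch, assuming each volunteer is represented by a set of hypothetical OTU identifiers for a single fraction:

```python
from collections import Counter


def core_otus(otus_by_sample, min_samples=None):
    """OTUs present in at least `min_samples` samples (default: every sample)."""
    if min_samples is None:
        min_samples = len(otus_by_sample)
    occurrence = Counter(otu for otus in otus_by_sample.values() for otu in set(otus))
    return {otu for otu, n in occurrence.items() if n >= min_samples}


# Hypothetical OTU97 identifiers per volunteer for a single fraction.
act = {
    "V1": {"otu1", "otu2", "otu3"},
    "V2": {"otu1", "otu2", "otu4"},
    "V3": {"otu1", "otu2"},
    "V4": {"otu1", "otu2", "otu3"},
    "V5": {"otu1", "otu3"},
    "V6": {"otu1", "otu2", "otu3"},
}
strict_core = core_otus(act)                  # shared by all six samples
relaxed_core = core_otus(act, min_samples=5)  # shared by at least five of six
print(sorted(strict_core), sorted(relaxed_core))
```

OTUs found in only one sample (such as `otu4` here) make up the sample-specific part of the pan-microbiota.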

Taxonomy analysis

Unassigned genera

Genera were considered unassigned when the RDP classifier annotated them with a confidence score lower than 0.8; these reads were annotated using higher taxonomic ranks. A very high rate of unassigned genera was found in all samples. In FS, unassigned genera ranged from 24.77% of reads in sample V6 to 44.56% in sample V4. In the Act fraction, the percentage of unassigned genera was more uniform, ranging from 8.27% (V3) to 18.83% (V4), except for sample V1 (0.46%). In S-IgA-coated fractions, unassigned genera ranged from 2.45% in V1 to 35.01% in V2.
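The fallback to higher ranks can be sketched as follows, assuming each read's classification is available as an ordered list of (rank, name, confidence) tuples from phylum down to genus. This input format is a hypothetical simplification, not the RDP classifier's actual output format, and the “unassigned <taxon>” label style is likewise illustrative.

```python
def genus_label(lineage, cutoff=0.8):
    """Return the genus name, or a label for the deepest confidently assigned higher rank.

    `lineage` is an ordered list of (rank, name, confidence) tuples from the
    highest rank down to genus (hypothetical input format).
    """
    label = "Unassigned"
    for rank, name, confidence in lineage:
        if confidence < cutoff:
            break  # stop at the first rank below the confidence cut-off
        if rank == "genus":
            return name
        label = f"unassigned {name}"
    return label


read = [
    ("phylum", "Firmicutes", 1.00),
    ("class", "Clostridia", 0.99),
    ("order", "Clostridiales", 0.95),
    ("family", "Lachnospiraceae", 0.72),
    ("genus", "Blautia", 0.55),
]
print(genus_label(read))
```

Labels such as “unidentified Clostridiales” in the sections that follow correspond to this kind of fallback annotation.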

The number of OTUs97 belonging to unassigned genera was 9,122 out of 14,558. These data are shown grouped by main class in Supplemental Information, Figure 1.

Firmicutes

As expected, Clostridiales was the most populated order, not only within Firmicutes but across all phyla. Overall, among taxa with frequencies equal to or greater than 1% we found Lachnospiraceae (including Lachnospira, Roseburia, Incertae Sedis XIV and Blautia), Ruminococcaceae (Faecalibacterium) and other unidentified Clostridiales (see Supplemental Information, Figure 2 - Firmicutes). Within the Bacilli class, we found several low-frequency (below 1%) families and genera belonging only to Act or IgA fractions. Within Lactobacillales, members of Enterococcus, some unidentified Enterococcaceae and Lactobacillus were common to all samples, at least in Act and IgA fractions. Within Streptococcaceae, Streptococcus was the genus shared among all samples (see Supplemental Information, Figure 2 - Firmicutes).

Bacteroidetes

Bacteroidetes was the second most recruited phylum; within it, the genus Bacteroides was found in all samples except V1. It was highly recruited from the FS fraction, accounting for up to 38.86% of total reads in sample V6 (19.81%, 15.21%, 13.93% and 7.80% in V5, V3, V4 and V2, respectively). Several groups of Bacteroidetes were mainly recruited from the FS fractions; for example, unidentified Porphyromonadaceae were found only in FS fractions, in all samples except V5. Prevotella was also one of the widely recruited groups, found in Act (all samples), IgA (samples V1, V2, V3 and V5) and FS (samples V1, V2, V3 and V4) fractions. Parabacteroides showed a distribution similar to that of Prevotella: it was found in FS (all samples except V6), Act (all samples) and IgA (all samples except V5) fractions (see Supplemental Information, Figure 2 - Bacteroidetes).

Proteobacteria

Almost all Proteobacteria were found as part of the active and/or S-IgA-opsonised microbiota.

Within Alphaproteobacteria, only Sphingomonas was found at a frequency higher than 1%, at least in the IgA fraction of V1, V2, V3 and V4; it was also present at less than 1% in V6 (Act and IgA). Pseudomonas was also highly recruited in the Act and IgA fractions of samples V3, V4 and V6. No Proteobacteria above 1% in frequency were found in sample V5. Several Sphingomonadales taxa were common to most samples, such as Porphyrobacter and unassigned Erythrobacteraceae (all samples except V4 and V5), Novosphingobium (all samples), Sphingomonas (all samples except V5) and other unidentified families (see Supplemental Information, Figure 2 - Proteobacteria).

Within Betaproteobacteria (order Burkholderiales, family Sutterellaceae), Parasutterella was commonly retrieved from all fractions. Sutterella, from the same family, was also found in all samples except V4 and V5. Within Comamonadaceae, Pelomonas was found in all samples but never in FS fractions.

Some Deltaproteobacteria, mainly related to Desulfovibrionales, were found without any consistent pattern, mostly in FS fractions.

Gammaproteobacteria such as Escherichia/Shigella were found at low frequency in all samples: in FS (samples V5 and V6), Act (all samples) and IgA (V1, V2, V5 and V6) fractions. Among other Enterobacteriaceae, Morganella was found in the IgA fraction of all samples. Interestingly, members related to Alcanivorax (Oceanospirillales) were found in all samples except V4, at least in the IgA fraction. Within Pseudomonadales, Acinetobacter was found in all samples except V2, in Act and/or IgA fractions. Pseudomonas was also one of the common genera retrieved from all Act and IgA fractions, but not from FS. Stenotrophomonas (Xanthomonadales) was also found in all samples except V2, in Act or IgA fractions (or both).

Actinobacteria

No genera belonging to Actinobacteria were found at a frequency higher than 1%. Within Actinomycetales, Microbacterium (Microbacteriaceae) was commonly present in all samples in Act (except V1) and/or IgA fractions, but never in the FS fraction. The same trend was observed for Rhodococcus (Nocardiaceae) and Propionibacterium (Propionibacteriaceae). Within Coriobacteriales, Collinsella was retrieved from all samples except V5; in samples V2 and V4 it was also recruited from the FS fraction (see Supplemental Information, Figure 2 - Actinobacteria).

Discussion

It is now widely accepted that most relationships between micro-organisms and higher organisms (e.g., plants, animals and others) are most probably commensal or mutualistic31. In this context, there is growing evidence that the immune system evolved to accommodate bacterial colonization of growing complexity by modulating its reactivity18,32,33. In this work, we show that the taxonomic distributions obtained from active and S-IgA-coated bacterial fractions isolated from faecal samples of healthy human volunteers are highly dissimilar from those obtained from DNA extracted directly from faecal material. Generally speaking, what we observed in FS is consistent with the results of several other studies of gut microbiota distribution based on stool samples34. For example, previous reports show that dominant OTUs are the most widespread among samples23. Firmicutes and Bacteroidetes are the main taxa commonly retrieved from all human stool samples, and their ratio is often influenced by diet35. As observed by Tap and collaborators34, several dominant taxa belonging to Lachnospiraceae, Clostridiaceae or Bacteroidaceae were found in this work in the core microbiota of the FS fractions of all samples. In addition, flow cytometry-based sorting followed by pyrosequencing enabled us to access the active and S-IgA-coated bacterial fractions.

It may seem contradictory that many bacteria found in the active and S-IgA-coated fractions were not detected in the faecal controls. This lack of detection is probably due to the failure to amplify rare bacteria, as PCR is biased towards the most common DNA templates in a sample36. Thus, the dominance of some bacterial taxa in faecal samples masks an important part of the diversity, including active and IgA-opsonised bacteria.

We found that, overall, the Bacteroidetes/Firmicutes ratio of recruited reads is maintained in the FS, Act and IgA microbiota but, in the latter two fractions, their presence decreased in favour of other phyla, such as Proteobacteria and Actinobacteria. Thus, using RNA content per cell as a measure of bacterial activity25,26, we show that several still unidentified or very rare taxa are indeed active players in the human microbiota and interact closely with the immune system. In this context, when active and S-IgA-coated bacterial fractions were characterized, we were able to widen the microbiota core to members of Proteobacteria, including Sphingomonadaceae and Pseudomonadaceae, while in a less restrictive core (five out of six samples) we also found members of Actinobacteria. The presence of core S-IgA-coated bacteria is consistent with observations that the immune system is continuously stimulated by commensal bacteria37,38.

Within Firmicutes, we found that the active core microbiota includes members of Blautia, whose species B. producta has been characterized as an active fermenter in simplified human intestinal microbiota experiments using germ-free rats39. Other members of Firmicutes, such as Lactobacillus or Enterococcus, which are important as immune system modulators or for aiding intestinal absorption40, were commonly found as part of the active and IgA fractions.

Regarding Bacteroidetes, strict anaerobes such as members of the genus Prevotella, known as cellulose and xylan degraders, have already been found as part of the core microbiota of children whose diet was high in fibre35. In our experiment, we found Prevotella in the core only when we focused our attention on active or S-IgA-coated bacteria. Thus, although members of this group are considered rare, they are commonly found to be active.

A very interesting finding concerns the phylum Proteobacteria, whose members are almost invisible when DNA extracted from faecal samples is analysed without any selection or sorting protocol. In our work, we found that Proteobacteria are recurrent active members of the gut microbiota and interact closely with the immune system. Sphingomonadales members (Alphaproteobacteria) are important players in shaping the immune system. Sphingomonas sp. was highly represented in four out of six samples, while in the other two samples it was found at frequencies below 1%. Studies in germ-free mice supplemented with conventional or restricted gut microorganisms show that glycosylceramides, which are components of the Sphingomonas cell wall, act as CD1d ligands that stimulate invariant natural killer T cells18,41,42. We observed that several other Sphingomonadales taxa appear to be represented in the core microbiota, including the genera Porphyrobacter, Novosphingobium and Sphingomonas, and others not yet well characterized but falling within the Sphingomonadaceae family.

Other Proteobacteria commonly retrieved from all fractions were members of the Betaproteobacteria belonging to the order Burkholderiales. Parasutterella and Sutterella have been linked to healthy flora in studies of lean and obese mice43. Burkholderia were also commonly retrieved from active and S-IgA-coated microbial fractions but never from the FS fraction, probably because their low frequency is overwhelmed by dominant bacteria when a standard stool-based approach is applied. Burkholderia is often considered an opportunistic pathogen, but its commensal lifestyle allows it to survive in the intestine without causing infection. Hutchinson and collaborators reported Burkholderia to be strictly aerobic, although viable colonies have also been reported to appear after days of anaerobic incubation44. This could explain its scarce presence in the gastrointestinal tract and its recruitment in active and S-IgA-coated fractions.

Enterobacteriaceae related to Escherichia/Shigella were found in almost all samples and fractions. We also found Alcanivorax-related members; Alcanivorax species are known to be mostly marine, active obligate oil-degrading bacteria45. Although the 16S rDNA similarity of these sequences to the closest GenBank member is only around 96% (data not shown), it is surprising to find them in the active or S-IgA-coated fractions of almost all samples. Another Gammaproteobacterium identified was Acinetobacter (family Moraxellaceae), although its 16S rDNA sequence is relatively distant from A. baumannii (97% similarity) and closer (99%) to other unidentified or uncultured Acinetobacter (data not shown). In a recent study in mice, Acinetobacter, Stenotrophomonas and Comamonas were hypothesized to be part of the so-called “crypt-specific core microbiota”46. Here, we found that these groups of bacteria are almost always recruited from the human active or S-IgA-coated microbiota. Some genera were commonly found in Act and IgA fractions but seem to be hidden when considering FS fractions.

Moreover, Actinobacteria were commonly recruited at frequencies lower than 1%. In their review, Turroni and collaborators47 reported that Actinobacteria have commonly been retrieved from faecal samples using classical culture-dependent approaches but are rarely found in metagenomics-based experiments2,48,49. Likewise, in our samples, Actinobacteria were almost absent from FS fractions, while they were recruited at low frequency in active and/or S-IgA-coated bacterial fractions. Thus, their absence from the FS fraction cannot be attributed to biases in the DNA extraction method (applied to all fractions alike) or in the primers used for PCR amplification; rather, it results from concentrating the sequencing effort on a specific fraction selected by the flow cytometry-based sorting approach.

In conclusion, the proposed approach enabled us to obtain 16S rDNA amplicons from the genomic DNA of active bacterial cells selected and sorted by flow cytometry. Compared with other methodologies in which active bacteria are identified by reverse transcription of ribosomal RNA, the taxonomic distributions observed in this work are not biased by the number of rRNA copies in each cell; they are possibly biased only by the number of chromosomal 16S rDNA copies. As already pointed out in our previous work21, studying the active fraction of the gut microbiota represents a different approach to the human microbiota. Combining this approach with the characterization of bacteria opsonised by different human immunoglobulins will allow future studies of active bacteria ignored by the immune system, as well as of those actively growing but opsonised by different Ig types. We believe this will provide important insights into host-microbiota interplay under both healthy and disease-related conditions.

Finally, we describe a microbiota core shared among faecal samples from six healthy volunteers. This core contains members of taxa already described in previous works; however, focusing the analysis on the active or S-IgA-coated fractions highlights other “important” taxa that were previously not visible. For instance, in addition to already known members of the gut microbiota core, such as Firmicutes and Bacteroidetes, other active bacteria, such as members of Actinobacteria and Proteobacteria (including genera of the Sphingomonadaceae and Moraxellaceae families), may play an important role in gut homeostasis that has yet to be elucidated.

Methods

Samples and fractions

Samples were obtained from six healthy volunteers (aged 20 to 36 years; three male and three female), identified as V1, V2, V3, V4, V5 and V6. All participants expressed their interest in participating in this study by signing an informed consent form approved by the Ethics and Research Committee of the Centre for Public Health Research (CSISP) of Valencia, Spain. One of the main exclusion criteria was antibiotic administration during the two months prior to sampling. None of the volunteers had organic intestinal disorders. All volunteers followed a Mediterranean diet.

For each faecal sample, we studied the taxonomical distribution of faecal suspension (FS), active bacteria (Act) and S-IgA-coated bacterial fractions (IgA).

The volunteers collected faecal material in sterile 30 ml screw-cap containers (25 × 90 mm; PP SPOON; DELTALAB) containing 8 ml RNAlater (Ambion #AM7020) to preserve RNA. Samples were kept at −25°C, delivered to the lab within 24 hours and immediately processed. The faecal material from each sample was resuspended by vortexing (2 min) and washed twice in physiological solution (NaCl 0.9%). The faecal suspension was centrifuged at 800 × g for 2 min to pellet large aggregates. The supernatant was then centrifuged at 7,500 × g for 7 min to collect microbial cells from the faecal suspension, and the pellet was washed twice in physiological solution (NaCl 0.9%). Cells were immediately fixed by adding 1/10 volume of 37% formaldehyde (final concentration: 3.7%) and incubated overnight at 4°C. Fixed cells were washed twice to remove residual formaldehyde and resuspended in 0.1 ml of physiological solution. The samples were stored at −20°C after adding 1 volume of absolute ethanol.

Three fractions were obtained from each sample stored in the previous step. DNA from the first aliquot was extracted prior to the flow cytometry steps (FS fraction). The second and third fractions were obtained by flow cytometry sorting: the active population, on the basis of RNA content, and the S-IgA-coated population, using anti-human IgA immunoglobulin. For staining and flow cytometry, fixed cells from the previous steps were washed and diluted in physiological solution to an optical density (OD600) of about 0.2.

Staining

Cell labelling and flow cytometry sorting

Before sorting, cell suspensions were disaggregated by mild sonication and filtering so that cells ran freely in the microfluidic system (see Supplemental Information, Figure 3). Ten microlitres of pyronin Y (Sigma-Aldrich, #P9172, 10 mg/ml) diluted to 100 μM were added to the samples (1 ml volume) for total RNA staining and incubated for 1 hour at 4°C. The samples were then stained with SYTO62 (Invitrogen, #S11344) according to the manufacturer's instructions (final concentration 0.5 μM) to distinguish bacteria from background noise during flow cytometry on the basis of their DNA content. S-IgA-coated bacteria were stained using FITC-labelled anti-human IgA (Invitrogen, #62-7411). FITC-labelled anti-mouse IgA (Invitrogen, #M31001) was used as the isotype control (see Supplemental Information, Figures 3 and 4).
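The stain and fixative concentrations follow from the usual mixing relation C_final = C_stock × V_added / (V_sample + V_added). A small helper, applied to the pyronin Y step (10 μl of a 100 μM solution added to a 1 ml sample), gives a final concentration of roughly 1 μM; for the fixation step, the stated 3.7% formaldehyde corresponds to one part of 37% stock in ten parts of final volume. The helper is an illustrative sketch, not part of the original protocol.

```python
def final_concentration(stock_conc, added_volume, sample_volume):
    """Concentration after mixing `added_volume` of stock into `sample_volume` (C1*V1 = C2*V2).

    The result has the units of `stock_conc`; both volumes must share one unit.
    """
    return stock_conc * added_volume / (sample_volume + added_volume)


# Pyronin Y: 10 ul of a 100 uM solution added to a 1,000 ul sample (result in uM).
pyronin_uM = final_concentration(100, 10, 1000)
# Formaldehyde: 1 part of 37% stock added to 9 parts of cell suspension (result in %).
formaldehyde_pct = final_concentration(37, 1, 9)
print(round(pyronin_uM, 3), round(formaldehyde_pct, 2))
```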

Flow cytometry sorting was carried out using a MoFlo™ XDP Cell Sorter. The light sources were a 488 nm argon (blue) laser (200 mW) and a 635 nm (red) diode laser (25 mW). The lasers were aligned using Flow-Check™ (10 μm) and Flow-Set™ (3 μm) beads. The emission filters were 520/30 (FL1), 580/30 (FL2) and 680/30 (FL4), collecting emission from FITC, pyronin Y and SYTO62, respectively. The trigger was set on side scatter.
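The gate settings used for sorting are not reported here. The sketch below shows one plausible way to compute a fraction rate from exported event data, and every rule in it is an assumption: events are dictionaries keyed by the channel names above (FL1 for FITC, FL2 for pyronin Y, FL4 for SYTO62), cells are events above a hypothetical SYTO62 threshold, and the IgA-positive gate is placed at the 99th percentile of the FITC signal in the isotype control. The cytometry data in this study were analysed with flowCore and flowViz, not with this code.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of `values` for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_rate(events, channel, threshold, dna_channel="FL4", dna_threshold=0.0):
    """Share of DNA-positive events whose `channel` signal exceeds `threshold`."""
    cells = [e for e in events if e[dna_channel] > dna_threshold]
    if not cells:
        return 0.0
    positive = sum(1 for e in cells if e[channel] > threshold)
    return positive / len(cells)


# Hypothetical events: isotype-control FITC values and a stained sample.
isotype_fl1 = list(range(1, 101))        # FITC signal with the isotype control
iga_gate = percentile(isotype_fl1, 99)   # hypothetical gating rule
sample_events = [
    {"FL1": 150, "FL2": 40, "FL4": 10},
    {"FL1": 20, "FL2": 80, "FL4": 12},
    {"FL1": 300, "FL2": 90, "FL4": 9},
    {"FL1": 500, "FL2": 5, "FL4": 0},    # no DNA signal: treated as noise
]
print(iga_gate, fraction_rate(sample_events, "FL1", iga_gate))
```

The same function applied to FL2 with an RNA-content threshold would give an active-fraction rate.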

DNA extraction and sequencing

DNA was extracted from all fractions using the CTAB method50. Total 16S rDNA was amplified from each fraction using the universal bacterial primers 8F and 530R51 with multiplex identifiers (MIDs; Supplemental Information, Table 2). PCR products from each fraction/sample were purified with Nucleofast 96 PCR filter plates (Macherey Nagel #74310050), and concentrations were measured with the PicoGreen assay. The products were pooled at balanced final concentrations and sequenced on a 454 GS-FLX pyrosequencer (Titanium chemistry, Roche).

Bioinformatics

Sequences were trimmed from the 3′ end using a sliding window of 10 nucleotides whenever the average quality value in the window was lower than 20, using Prinseq (v0.19.4) software52. This considerably improved read quality, as the quality of pyrosequencing reads has been shown to decrease dramatically towards the end of the sequences53. All sequences shorter than 200 nucleotides were discarded. Taxonomic assignments were made with the RDP classifier54, and taxonomic ranks were assigned when confidence scores exceeded 0.8. Sequences were clustered with CD-HIT software55.
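The read-cleaning rules above can be approximated as follows. This is a simplified re-implementation under stated assumptions, not Prinseq itself: reads are (sequence, list of Phred scores) pairs, the window slides one base at a time from the 3′ end (the step size is an assumption), and reads shorter than 200 nucleotides after trimming are dropped.

```python
def trim_3prime(seq, quals, window=10, min_mean=20, step=1):
    """Trim from the 3' end while the mean quality of the last `window` bases is below `min_mean`."""
    end = len(seq)
    while end >= window and sum(quals[end - window:end]) / window < min_mean:
        end -= step
    return seq[:end], quals[:end]


def clean_reads(reads, min_length=200, **trim_options):
    """Quality-trim each (sequence, qualities) pair and keep reads of at least `min_length` bases."""
    kept = []
    for seq, quals in reads:
        seq, quals = trim_3prime(seq, quals, **trim_options)
        if len(seq) >= min_length:
            kept.append((seq, quals))
    return kept


# Hypothetical reads: a 250-nt read with a low-quality tail, and a short 150-nt read.
reads = [
    ("A" * 250, [35] * 230 + [8] * 20),
    ("C" * 150, [35] * 150),
]
cleaned = clean_reads(reads)
print([len(seq) for seq, _ in cleaned])
```

Because trimming stops as soon as the window mean reaches the threshold, a few low-quality bases can remain at the new 3′ end; this is inherent to window-based trimming.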

The percentage distributions of the fractions obtained by flow cytometry sorting (Act and IgA) were multiplied by the cell abundance rate of the sorted fraction with respect to the total number of cells counted during the whole sorting (see Table 1, column “Fraction rate”). Percentage values of FS fractions were left unaltered.
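The rescaling step amounts to a single multiplication per taxon. The sketch below assumes within-fraction percentages keyed by taxon and a fraction rate expressed as a proportion between 0 and 1 (an assumption about units); the taxa and percentages are hypothetical, and only the 0.34 rate echoes the active fraction reported for sample V2. After scaling, the percentages of a sorted fraction sum to 100 × fraction rate rather than to 100.

```python
def scale_by_fraction_rate(percentages, fraction_rate):
    """Multiply within-fraction percentages by the fraction's share of all sorted cells.

    `fraction_rate` is assumed to be a proportion between 0 and 1.
    """
    if not 0 <= fraction_rate <= 1:
        raise ValueError("fraction_rate must be a proportion between 0 and 1")
    return {taxon: pct * fraction_rate for taxon, pct in percentages.items()}


# Hypothetical genus percentages in an Act fraction whose fraction rate is 0.34.
act_v2 = {"Bacteroides": 50.0, "Sphingomonas": 30.0, "Other": 20.0}
scaled = scale_by_fraction_rate(act_v2, 0.34)
print(scaled, sum(scaled.values()))
```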

Descriptive and statistical analyses were carried out in the R statistical environment using the vegan package56,57. Venn diagrams for the pan-microbiota analysis were obtained using the Vennerable R package58. Flow cytometry data were analysed using the Bioconductor R packages flowCore and flowViz59,60,61.

Accession numbers

Sequences were deposited in EMBL-EBI Sequence Read Archive (SRA) under study number ERP002046.