Active and secreted IgA-coated bacterial fractions from the human gut reveal an under-represented microbiota core

Host-associated microbiota varies in distribution depending on the body area inhabited. Gut microbes are known to interact with the human immune system, maintaining gut homoeostasis. Thus, we studied whether secreted IgA (S-IgA) coats specific microbial taxa without inducing strong immune responses. To do so, we fractionated gut microbiota by flow cytometry. We found that active and S-IgA-coated bacterial fractions were characterized by a higher diversity than that observed in raw faecal suspensions. A long-tail effect was observed in family distribution, revealing that rare bacteria represent up to 20% of total diversity. While Firmicutes was the most abundant phylum, the majority of its sequences were not assigned at the genus level. Finally, the single-cell-based approach enabled us to focus on active and S-IgA-coated bacteria. Thus, we revealed a microbiota core common to the healthy volunteers participating in the study. Interestingly, this core was composed mainly of low-frequency taxa (e.g. Sphingomonadaceae).

bacteria whose diversity is statistically hidden by the dominant majority 21 . The gastrointestinal tract is the richest body area in terms of microbial groups. The most represented taxa are also the most commonly shared between different samples. However, most of the yet-unknown taxa belong to the under-represented fraction of the gut microbiota 22,23 .
While next-generation DNA sequencing methodologies shed light on the enormous variety of bacteria inhabiting the human body, a direct cell-by-cell understanding of which bacteria are the active players and which of them interact with the immune system still represents a research challenge. Previously, Van Der Waaij and collaborators (1994) described a method for human faecal microbiota sorting based on fluorescent labelling with anti-human IgA 24 . We used an interdisciplinary single-cell approach involving RNA cell staining as a measure of activity 25,26 , fluorescent anti-human IgA labelling, flow cytometry and cell sorting, 16S rDNA amplification and deep pyrosequencing to shed light on different segments of gut microbiota, such as the potential activity of under-represented bacteria. We observed that a long tail of low-frequency taxa (below 1% in richness) could involve up to 20% of the active population. We also analysed active and S-IgA-coated bacteria from apparently healthy volunteers who had not received antibiotic treatment. Our aim was to determine whether IgA was coating specific taxa. We identified several groups, such as members of Actinobacteria or Sphingobacteria, for which there are already hypotheses regarding their role and activity, confirmed in murine models, as active players in gut microbiota and/or as part of the S-IgA-coated fraction.

Results
Sequencing and diversity overview. From each sample and fraction we obtained an average of 6,980 sequences over 200 bp (min: 727; max: 14,359 reads). The average read length was 422 bp (± 89 bp). The number of reads per sample/fraction, as well as the number of particles (cells) sorted by flow cytometry and the ratio with respect to the total number, are reported in Table 1. As can be observed, the fraction of active cells ranged from 34% (sample V2) to 66% (V4) with respect to the total community, and the proportion of S-IgA-coated bacteria ranged from 21% (V4) to 98% (V3). Diversity analysis at the genus level showed that FS fractions present a lower evenness than almost all Act samples (V2, V3, V4, V5 and V6) and all IgA fractions, as revealed by means of the Shannon index. The richness estimator Chao1 detected different behaviours of FS with respect to Act and IgA, being lower in samples V1, V2, V3, V5 and V6 and higher in V4 (see Table 1).
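The two diversity measures named above can be illustrated with a short Python sketch. It assumes the input is a list of genus-level read counts for one sample/fraction; the counts below are hypothetical and are not data from this study. Chao1 is written in its bias-corrected form, which may differ from the exact variant the authors computed.

```python
import math

def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical genus-level read counts (assumption, for illustration only).
fs_counts = [900, 50, 30, 10, 5, 2, 2, 1]        # dominated by one genus
act_counts = [300, 250, 200, 120, 80, 30, 15, 5]  # more even distribution

print("Shannon FS :", round(shannon_index(fs_counts), 3))
print("Shannon Act:", round(shannon_index(act_counts), 3))
print("Chao1 FS   :", chao1(fs_counts))
```

A community dominated by a single genus yields a lower Shannon value than a more even one with the same number of genera, which is the pattern described for FS versus Act and IgA fractions.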
Rarefaction curves flatten at the family level, and all FS fractions flatten earlier than Act and IgA fractions (see Figure 1). Analysis of family distribution revealed that Act and IgA fractions tend to cluster together (Figure 2, panel A). Moreover, Act or S-IgA-coated bacterial fractions are not representative of the whole community; neither could they be considered a subset of it because of the great richness and evenness of species found. Sample V1 showed a different microbial distribution with respect to the other samples, as also shown in the cluster analysis where V1 fractions behave as an outgroup with respect to the other samples/fractions (Figure 2, panel A). At the level of genera, canonical correspondence analysis among all samples collected showed that two major components yield 43.62% and 17.60% of the observed diversity (Figure 3). As expected, sample V1 is the most divergent; also noteworthy is the clear separation between the three fractions. These results are statistically corroborated by the analysis of variance, using distance matrix by "Adonis" (Permutational Multivariate Analysis of Variance Using Distance Matrices), known also as "Permutational Manova" 27,28 , and applying Bray-Curtis distance calculation at the genus taxonomic rank. The analysis showed significant p-values for both fractions and sample grouping criteria (Figure 3 and Supplemental Information, Table 1).
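To make the statistical step concrete, the following Python sketch computes Bray-Curtis dissimilarities and a one-factor permutational analysis of variance (pseudo-F with label permutations). It is a simplified stand-in for the Adonis procedure cited above, not a reimplementation of the Vegan function; the abundance table, group labels and function names are assumptions introduced for illustration.

```python
import numpy as np

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.abs(x - y).sum() / (x + y).sum()

def distance_matrix(table):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities between rows."""
    n = len(table)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = bray_curtis(table[i], table[j])
    return d

def pseudo_f(d, labels):
    """Pseudo-F statistic from a distance matrix and one grouping factor."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    ss_total = (d[np.triu_indices(n, k=1)] ** 2).sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.where(labels == g)[0]
        sub = d[np.ix_(idx, idx)]
        ss_within += (np.triu(sub, k=1) ** 2).sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(d, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(d, labels)
    hits = sum(pseudo_f(d, rng.permutation(labels)) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical genus counts: rows are sample/fraction pairs, columns are genera.
table = np.array([
    [80, 15, 5, 0], [75, 20, 5, 0], [85, 10, 5, 0],         # "FS"
    [30, 30, 25, 15], [25, 35, 25, 15], [35, 25, 20, 20],   # "Act"
])
labels = ["FS", "FS", "FS", "Act", "Act", "Act"]

d = distance_matrix(table)
f_obs, p_value = permanova(d, labels)
print(f"pseudo-F = {f_obs:.2f}, p = {p_value:.3f}")
```

With only a handful of samples the number of distinct label permutations is small, so the attainable p-values are coarse; the sketch shows the mechanics rather than a realistic power calculation.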
To classify low- and high-frequency families, we applied a method based on the average inflection point, as previously described 21 . The frequency distribution per family, considering the whole dataset, had mean and median values of 1.60% and 0.1%, respectively (see Figure 2, panel C). Thus, the low-frequency taxa were extraordinarily diverse and diversity is characterized, as expected, by the "long-tail" effect.
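The gap between mean and median is what signals a long tail. The sketch below illustrates this with a fixed 1% cutoff, used here as a simple stand-in for the inflection-point method (which is not described in enough detail in this text to reproduce). The family names appear in this paper, but the percentages are hypothetical.

```python
from statistics import mean, median

def split_by_frequency(freqs, threshold=1.0):
    """Split a {family: percent} mapping into high- and low-frequency groups."""
    high = {fam: pct for fam, pct in freqs.items() if pct >= threshold}
    low = {fam: pct for fam, pct in freqs.items() if pct < threshold}
    return high, low

# Hypothetical family frequencies in percent of reads (not study data).
family_pct = {
    "Lachnospiraceae": 41.0,
    "Ruminococcaceae": 28.0,
    "Bacteroidaceae": 19.0,
    "Streptococcaceae": 2.0,
    "Sphingomonadaceae": 0.9,
    "Pseudomonadaceae": 0.8,
    "Enterococcaceae": 0.5,
    "Comamonadaceae": 0.4,
    "Microbacteriaceae": 0.3,
    "Nocardiaceae": 0.1,
}

high, low = split_by_frequency(family_pct)
values = list(family_pct.values())
print(f"mean = {mean(values):.2f}%, median = {median(values):.2f}%")
print(f"{len(low)} of {len(family_pct)} families fall below the 1% cutoff")
```

A mean far above the median, as in the figures reported above, indicates that a few dominant families carry most reads while most families sit in the low-frequency tail.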
OTUs 97 distribution and core microbiota. Using an analogy with the pan-genome concept 29,30 we can define "pan-microbiota" by identifying sequence clusters shared among samples (a "core" set of bacteria) as well as clusters specific to each sample. Comparative analysis of species-level operational taxonomic units (sequences at 97% similarity, 80% overlapping, hereinafter OTUs 97 ) revealed that among all samples, 28, 21, and 69 OTUs 97 were shared, respectively, among IgA, Act and FS fractions (see Figure 4). The shared diversity of the FS core comprised only members of highly abundant families of Firmicutes and Bacteroidetes, while in Act and IgA fractions, members of Proteobacteria were also found, such as Alphaproteobacteria mainly belonging to Sphingomonadales, Gammaproteobacteria belonging to Pseudomonadales, and Betaproteobacteria belonging to Burkholderiales; in the IgA core we also found members of Actinobacteria, which were absent in the FS (see Table 2).
A less restrictive analysis describing the core bacteria made by clusters represented in at least 5 out of 6 samples showed that the FS core was almost invariable, adding only members of Streptococcaceae. The Act fraction also included members of Xanthomonadaceae from Gammaproteobacteria; Microbacteriaceae and Nocardiaceae from Actinobacteria (see Table 2).
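The strict and relaxed core definitions amount to counting in how many samples each cluster occurs. A minimal Python sketch, assuming each sample is represented by the set of OTU identifiers detected in one fraction (the identifiers and the function name `core_otus` are hypothetical):

```python
from collections import Counter

def core_otus(samples, min_samples=None):
    """Return OTU ids present in at least `min_samples` samples.

    `samples` maps a sample name to an iterable of OTU ids.
    By default an OTU must be present in every sample (strict core).
    """
    if min_samples is None:
        min_samples = len(samples)
    presence = Counter(otu for otus in samples.values() for otu in set(otus))
    return {otu for otu, n in presence.items() if n >= min_samples}

# Hypothetical OTU membership for one fraction across six volunteers.
iga_fraction = {
    "V1": {"otu_1", "otu_2", "otu_3"},
    "V2": {"otu_1", "otu_2", "otu_4"},
    "V3": {"otu_1", "otu_2", "otu_3"},
    "V4": {"otu_1", "otu_2", "otu_3"},
    "V5": {"otu_1", "otu_3", "otu_5"},
    "V6": {"otu_1", "otu_2", "otu_3"},
}

strict_core = core_otus(iga_fraction)                  # present in 6 of 6
relaxed_core = core_otus(iga_fraction, min_samples=5)  # present in >= 5 of 6
print(sorted(strict_core), sorted(relaxed_core))
```

Running the same function once per fraction (FS, Act, IgA) gives the per-fraction cores that are compared in the text.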
Taxonomy analysis. Unassigned genera. Genera were considered unassigned when annotated by RDP classifier with confidence value score lower than 0.8. These were annotated using upper taxonomic rank levels. In all samples a very high rate of unassigned genera was found. In FS we found that unassigned genera ranged from 24.77% of reads in V6 sample to 44.56% in V4 sample. In the Act fraction, the percentage of unidentified genera was more uniformly distributed, ranging from 8.27% (V3) to 18.83% (V4), except for sample V1 (0.46%). In S-IgA-coated fractions, unidentified genera ranged from 2.45% in V1 to 35.01% in V2.
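The rule for unassigned genera can be sketched as follows, assuming each read carries a lineage of (rank, name, confidence) triples ordered from phylum to genus, as a classifier such as RDP reports. The data structure, the label format and the function name are assumptions made for this illustration; the 0.8 cutoff comes from the text.

```python
def assign_label(lineage, min_confidence=0.8):
    """Return the genus name, or an 'unassigned <higher taxon>' label.

    `lineage` is a list of (rank, name, confidence) tuples ordered from
    phylum down to genus. A rank counts as assigned when its confidence
    is not lower than `min_confidence`.
    """
    last_confident = None
    for rank, name, confidence in lineage:
        if confidence < min_confidence:
            break  # stop at the first rank that is not supported
        last_confident = (rank, name)
    if last_confident is None:
        return "unassigned"
    rank, name = last_confident
    return name if rank == "genus" else f"unassigned {name}"

# Hypothetical classifier outputs for two reads.
read_a = [("phylum", "Firmicutes", 1.0), ("class", "Clostridia", 0.99),
          ("order", "Clostridiales", 0.98), ("family", "Lachnospiraceae", 0.95),
          ("genus", "Blautia", 0.91)]
read_b = [("phylum", "Firmicutes", 1.0), ("class", "Clostridia", 0.97),
          ("order", "Clostridiales", 0.93), ("family", "Lachnospiraceae", 0.62),
          ("genus", "Roseburia", 0.30)]

print(assign_label(read_a))
print(assign_label(read_b))
```

The second read is reported at the order level because its family and genus scores fall below the cutoff, which is how entries such as unidentified Clostridiales arise.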
The number of OTUs 97 belonging to unassigned genera was 9,122 out of 14,558. These data are shown grouped by main classes in Supplemental Information, Figure 1.
Firmicutes. As expected, Clostridiales was the most populated order, not only within Firmicutes but also considering the rest of the phyla. Overall, among taxa with frequencies equal to or greater than 1% we found: Lachnospiraceae (including Lachnospira, Roseburia, Incertae Sedis XIV and Blautia); Ruminococcaceae (Faecalibacterium) and other unidentified Clostridiales (see Supplemental Information, Figure 2 -Firmicutes). Within the Bacilli class, we found several low-frequency (less than 1% in frequency) families and genera belonging only to Act or IgA fractions. Within Lactobacillales, members of Enterococcus, some unidentified Enterococcaceae and Lactobacillus were found to be common to all samples, at least in Act and IgA fractions. Within Streptococcaceae, Streptococcus is the one shared among all samples (see Supplemental Information, Figure 2 -Firmicutes).
Bacteroidetes. Bacteroidetes was the second most recruited phylum, in which the genus Bacteroides was found among all but sample V1. It was highly recruited from the FS fraction, accounting for up to 38.86% of total reads in sample V6 (19.81%, 15.21%, 13.93% and 7.80% in V5, V3, V4 and V2, respectively). Several groups of Bacteroidetes were mainly recruited from the FS fractions; for example, unidentified Porphyromonadaceae were found uniquely in FS fractions from all but V5 samples. Prevotella was also one of the universally recruited groups in Act (all samples), IgA (samples V1, V2, V3 and V5) and FS (samples V1, V2, V3, V4) fractions.
Parabacteroides was similar to Prevotella; it was found in FS (all but V6), Act (all samples) and IgA (all but V5) fractions (see Supplemental Information, Figure 2 -Bacteroidetes).
Proteobacteria. Almost all Proteobacteria have been found as part of active and/or S-IgA-opsonised microbiota.
Only Sphingomonas, a member of Alphaproteobacteria, was found at a frequency higher than 1%, at least in the IgA fraction of V1, V2, V3 and V4, although it was also present at less than 1% in V6 (Act and IgA). Pseudomonas was also highly recruited in Act and IgA fractions of samples V3, V4 and V6. No Proteobacteria over 1% in frequency were found in sample V5. Several taxa of Sphingomonadales were found to be common among samples, such as Porphyrobacter and unassigned Erythrobacteraceae (all but V4 and V5), Novosphingobium (all samples), Sphingomonas (all but V5) and other unidentified families (see Supplemental Information, Figure 2 -Proteobacteria).
Within Betaproteobacteria, order Burkholderiales, family Sutterellaceae, Parasutterella was commonly retrieved from all fractions. Sutterella, from the same family, was also found in all but V4 and V5 samples. Within Comamonadaceae, Pelomonas was found in all samples but never in FS fractions.
Some members of Deltaproteobacteria, mainly related to Desulfovibrionales were found, but without any kind of pattern and mainly present in FS fractions.
Gammaproteobacteria such as Escherichia/Shigella were found at low frequency among all samples: in FS (V5 and V6 samples), Act (all samples) and IgA (V1, V2, V5 and V6). Other Enterobacteriaceae, such as Morganella, were found in the IgA fraction of all samples. Interestingly, members related to Alcanivorax from Oceanospirillales were found in all but V4 samples, at least in the IgA fraction. Within Pseudomonadales, Acinetobacter was found in all but V2 samples in Act and/or IgA fractions. Pseudomonas was also one of the common genera retrieved from all Act and IgA fractions, but not from FS. Stenotrophomonas from Xanthomonadales was also found in all but V2 samples from Act or IgA (or both) fractions.
Actinobacteria. No genera belonging to Actinobacteria were found at a frequency higher than 1%. Within Actinomycetales, Microbacterium from Microbacteriaceae was found to be commonly present in all samples in Act (except for V1) and/or IgA but never in the FS fraction. The same trend was observed for Rhodococcus from Nocardiaceae and Propionibacterium from Propionibacteriaceae. Within Coriobacteriales, Collinsella was retrieved from all but V5 samples whereas in samples V2 and V4 it was also recruited from the FS fraction (see Supplemental Information, Figure 2 -Actinobacteria).

Discussion
It is now widely accepted that most of the relationships between micro-organisms and higher organisms (e.g., plants, animals and others) are most probably commensal or mutualistic 31 . In this frame, there is growing evidence that the immune system evolved to accommodate bacterial colonization of growing complexity by modulating its reactivity 18,32,33 . In this work, we show that taxonomic distributions obtained from active and S-IgA-coated bacterial fractions isolated from healthy human volunteers' faecal samples are highly dissimilar from those obtained from DNA extracted directly from faecal material. Generally speaking, what we observed in FS is coherent with results obtained in several other studies concerning gut microbiota distribution in which stool samples are used 34 . For example, there are previous reports that dominant OTUs are the ones which are more widespread among samples 23 . Firmicutes and Bacteroidetes are the main taxa commonly retrieved from all human stool samples and their ratio is often influenced by diet 35 . As also observed by Tap and collaborators 34 , several dominant taxa belonging to Lachnospiraceae, Clostridiaceae or Bacteroidaceae were found in the core microbiota of FS fractions of all samples in this work. In addition, the use of flow cytometry-based sorting to obtain fractions, with subsequent pyrosequencing, enabled us to access active and S-IgA-coated bacterial fractions.
It may seem contradictory that many bacteria found in the active and S-IgA-coated fractions were not detected in the faecal controls. The reason for this lack of detection is probably related to the lack of amplification of rare bacteria, as PCR is biased to preferentially amplify the most common DNA templates in the sample 36 . Thus, the dominance of some bacterial taxa in faecal samples masks an important part of the diversity, which includes active and IgA-opsonised bacteria.
We found that overall the ratio of Bacteroidetes/Firmicutes recruitments is maintained in FS, Act and IgA microbiota but, in the latter two fractions, their presence decreased in favour of other phyla, such as Proteobacteria and Actinobacteria. Thus, employing the RNA content per cell as a measure of bacterial activity 25,26 , we show that several still unidentified or very rare taxa are indeed active players of human microbiota and in strict interaction with the immune system. In this context, when active and S-IgA-coated bacterial fractions were characterized, we were able to widen the microbiota core to members of Proteobacteria including Sphingomonadaceae and Pseudomonadaceae, while, in a less restricted core (in five out of six samples), we also found members of Actinobacteria. The presence of core S-IgA-coated bacteria is coherent with observations that the immune system is continuously stimulated by commensal bacteria 37,38 .
Within Firmicutes, we found that the active core microbiota includes members of Blautia, whose species B. producta has been tested as an active fermenter in simplified human intestinal microbiota experiments using germ-free rats 39 . Other members of Firmicutes, such as Lactobacillus or Enterococcus, which are important due to their role as immune system modulators or for helping intestinal absorption 40 , have been commonly found as part of active and IgA fractions.
Regarding Bacteroidetes, strict anaerobes such as members of the genus Prevotella, known as cellulose and xylan degraders, were already found as part of the core microbiota of children whose diet was high in fibre content 35 . In our experiment we found Prevotella as core bacteria only when we focused our attention on active or S-IgA-coated bacteria. Thus, although members of this group are considered rare, they are commonly found to be active. A very interesting finding is the case of the phylum Proteobacteria, whose members are almost invisible when DNA extraction from faecal samples is used without the application of any selection or sorting protocol. In our work, we pointed out that Proteobacteria are recurrent active members of gut microbiota, as well as in strict interaction with the immune system. Sphingomonadales members (from Alphaproteobacteria) are important players shaping the immune system. Sphingomonas sp. was highly retrieved in four out of six samples while it was found at less than 1% in frequency in the other two samples. Studies in germ-free mice supplemented with conventional or restricted gut microorganisms show that glycosylceramides, which are a component of the Sphingomonas cell wall, are identified as CD1d ligands stimulating invariant natural killer T cells 18,41,42 . We observed that several other Sphingomonadales families appear to be represented in the core microbiota, including the genera Porphyrobacter, Novosphingobium, Sphingomonas and others not well characterized yet but falling within the Sphingomonadaceae family.
Other Proteobacteria commonly retrieved from all fractions were members of the Beta class, belonging to the order Burkholderiales. Parasutterella and Sutterella have been linked to healthy flora in studies of lean and obese mice 43 . Burkholderia were also commonly retrieved from active and S-IgA-coated microbial fractions but never from the FS fraction, probably because their low frequency is overwhelmed by dominant bacteria when a standard stool-based approach is applied. Burkholderia is often considered to be an opportunistic pathogenic bacterium but its commensal role makes it apt to survive in the intestine without generating infection processes. Hutchinson and collaborators reported Burkholderia as strict aerobic bacteria although viable colonies have also been reported to appear after days of anaerobic incubation 44 . This could explain its scarce presence and recruitment from the gastrointestinal tract in active and S-IgA-coated fractions.
Escherichia/Shigella-related Enterobacteriaceae were found in almost all samples and fractions. We also found Alcanivorax-related members, which are known to be mostly marine, active obligate oil-degrading bacteria 45 . Although its 16S rDNA homology with the closest member in GenBank is around 96% (data not shown), it is surprising to find it in active or S-IgA-coated fractions of almost all samples. Another Gammaproteobacteria member identified was Acinetobacter (family Moraxellaceae), although its 16S similarity is far from A. baumannii (97% similarity) and closer (99%) to other unidentified or uncultured Acinetobacter (data not shown). In a recent study in mice, Acinetobacter, Stenotrophomonas and Comamonas are hypothesized to be part of the so-called "crypt-specific core microbiota" 46 . Here we found that these groups of bacteria are almost always recruited from human active microbiota or from human S-IgA-coated microbiota. Some genera were commonly found in Act and IgA fractions but they seem to be hidden when considering FS fractions.
Moreover, Actinobacteria were commonly recruited at frequencies lower than 1%. In their review, Turroni and collaborators 47 described Actinobacteria as commonly retrieved from faecal samples using classical culture-dependent approaches, but rarely found in metagenomics-based experiments 2,48,49 . Likewise, in our samples, Actinobacteria were almost absent in FS fractions while they were recruited at low frequency in active and/or S-IgA-coated bacterial fractions. Thus, their absence in the FS fraction cannot be attributed to some biases in the DNA extraction method (commonly applied to all fractions) or to the primers used for PCR amplification, but is rather a result of the sequencing optimization when a specific fraction is selected using the flow-cytometry-based sorting approach.
In conclusion, the proposed approach enabled us to obtain 16S rDNA amplicons from genomic DNA of active bacterial cells, selected and sorted by flow cytometry. Compared to other methodologies where active bacteria are identified by retrotranscription of ribosomal RNA, the taxonomic distributions observed in this work are not biased by the number of rRNA copies found in each cell; they are possibly biased only by the number of chromosomal 16S rDNA copies. As already pointed out in our previous work 21 , the active fraction of gut microbiota represents a different approach to studying the human microbiota. The combination of this approach with the characterization of bacteria opsonised by different human immunoglobulins will allow future study of the active bacteria ignored by the immune system, as well as those actively growing but opsonised by different Ig types. We believe this will provide important insights into the host-microbiota interplay under both healthy and disease-related conditions. Finally, we describe the presence of a microbiota core among faecal samples of six healthy volunteers. This core contains members of taxa already described in previous works; however, focusing the analysis on fractions of active or S-IgA-coated bacteria highlights the presence of other "important" taxa, which were previously not visible. For instance, we found that in addition to the already known members of the gut microbiota core, such as Firmicutes and Bacteroidetes, other active bacteria such as members of Actinobacteria and Proteobacteria (including genera of the Sphingomonadaceae and Moraxellaceae families) may play an important role in gut homeostasis that has yet to be elucidated.

Methods
Samples and fractions. Samples were obtained from six healthy volunteers (between 20 and 36 years old, three male and three female) identified as V1, V2, V3, V4, V5 and V6. All participants expressed their interest in participating in this study by signing an informed consent form, approved by the Ethics and Research Committee of the Centre for Public Health Research (CSISP) of Valencia, Spain. One of the main exclusion criteria was the administration of antibiotics during the two months prior to sampling. None of the volunteers had organic intestinal disorders. All volunteers followed a Mediterranean diet.
For each faecal sample, we studied the taxonomical distribution of faecal suspension (FS), active bacteria (Act) and S-IgA-coated bacterial fractions (IgA).
The volunteers collected faecal material in sterile 30 ml screw-cap containers (25 × 90 mm; PP SPOON; DELTALAB), containing 8 ml RNAlater (Ambion #AM7020) in order to preserve RNA. Samples were kept at −25 °C, delivered to the lab within the next 24 hours and immediately processed. From each sample, the faecal material was resuspended by vortexing (2 min). It was washed twice in physiological solution (NaCl 0.9%). Faecal suspension was centrifuged (800 g) for 2 min to pellet large aggregates. Then, supernatant was centrifuged at 7500 g for 7 min to collect microbial cells from faecal suspension. The pellet was washed twice in physiological solution (NaCl 0.9%). Cells were immediately fixed by adding 1/10 volume of 37% formaldehyde (final concentration: 3.7%) and incubated overnight at 4 °C. Fixed cells were washed twice to remove residual formaldehyde and resuspended in 0.1 ml of physiological solution. The samples were stored at −20 °C after adding 1 volume of absolute ethanol. Three fractions were obtained from each sample stored in the previous step. The DNA in the first aliquot was extracted prior to flow cytometry steps (FS fraction). The second and third fractions were obtained by flow cytometry sorting. Thus we obtained the active population by means of RNA content and the S-IgA-coated population using anti-human IgA immunoglobulin. For staining and flow cytometry protocols, fixed cells from previous steps were washed and diluted to achieve an optical density (O.D. 600) around 0.2 using physiological solution.
Staining. Cell labelling and flow cytometry sorting. Before sorting, cell suspensions were disaggregated by mild sonication and filtering, so cells ran freely in the microfluidic system (see SI Figure 3). Ten microlitres of pyronin-Y (Sigma-Aldrich, #P9172, 10 mg/ml) diluted to 100 µM was added to the samples (1 ml of volume) for total RNA staining and incubated for 1 hour at 4 °C. The samples were then stained with SYTO62 (Invitrogen, #S11344) according to the manufacturer's instructions (final concentration of 0.5 µM) in order to distinguish the bacteria from the noise during the flow cytometry by means of their DNA content. S-IgA-coated bacterial staining was performed using anti-human IgA labelled with FITC (Invitrogen, #62-7411). Anti-mouse IgA labelled with FITC (Invitrogen, #M31001) was used for isotype control (see Supplemental Information, Figures 3 and 4).
DNA extraction and sequencing. DNA extraction from all fractions was carried out using the CTAB method 50 . Total 16S rDNA was amplified from each fraction using 8F and 530R universal primers for bacteria 51 (Table 2). PCR products obtained from each fraction/sample were purified by Nucleofast 96 PCR filter plates (Macherey Nagel #74310050) and concentrations were measured by PicoGreen assay. The products were pooled obtaining balanced final concentrations and sequenced using the 454 GS-FLX pyrosequencer (Titanium chemistry, Roche).
Bioinformatics. Obtained sequences were trimmed from the end in sliding windows of 10 nucleotides when the average quality value was lower than 20, using Prinseq (v0.19.4) software 52 . This considerably improved the quality of the reads, as the quality of pyrosequencing reads has been shown to dramatically decrease towards the end of the sequences 53 . All sequences shorter than 200 nucleotides were not considered. Taxonomic assignations were carried out using the RDP_classifier 54 , and phylogenetic ranks were assigned when scores exceeded 0.8. Clustering was performed by the use of CD-HIT software 55 .
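The trimming and length-filtering rule can be sketched in Python as below. This is an illustration of the idea (drop bases from the 3' end while the mean quality of the last 10 positions stays below 20, then discard reads shorter than 200 nucleotides), not a reproduction of Prinseq's exact behaviour; the function names and the window step of one base are assumptions.

```python
def trim_3prime(seq, quals, window=10, min_mean_q=20):
    """Trim the 3' end until the last `window` bases reach the mean quality.

    `seq` is the read sequence and `quals` the per-base quality scores.
    One base is removed per step (an assumption of this sketch).
    """
    end = len(seq)
    while end > 0:
        win = quals[max(0, end - window):end]
        if sum(win) / len(win) >= min_mean_q:
            break
        end -= 1
    return seq[:end], quals[:end]

def passes_length(seq, min_len=200):
    """Keep only reads of at least `min_len` nucleotides after trimming."""
    return len(seq) >= min_len

# Hypothetical read: 220 good bases followed by 30 low-quality bases.
read_seq = "A" * 250
read_quals = [30] * 220 + [5] * 30

trimmed_seq, trimmed_quals = trim_3prime(read_seq, read_quals)
print(len(trimmed_seq), passes_length(trimmed_seq))
```

Because the criterion is a window mean, a few low-quality bases can remain at the trimmed end as long as the surrounding positions keep the average at or above the cutoff.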
Percentage distributions of fractions coming from flow cytometry sorting (Act and IgA) were multiplied by the cell abundance rate of the sorted fraction with respect to the total amount of cells counted during the whole sorting (see Table 1, column "Fraction rate"). Percentage values of FS fractions remained unaltered.
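As a worked illustration of this normalisation, the Python sketch below rescales within-fraction percentages by the fraction rate. The 34% rate matches the active fraction reported for sample V2; the taxon percentages and the function name are hypothetical.

```python
def scale_by_fraction_rate(percentages, fraction_rate):
    """Rescale within-fraction percentages to the whole community.

    `percentages` maps taxon -> percent of reads within the sorted fraction.
    `fraction_rate` is the share of sorted cells over all counted cells,
    expressed as a proportion between 0 and 1.
    """
    return {taxon: pct * fraction_rate for taxon, pct in percentages.items()}

# Hypothetical within-fraction percentages for an active (Act) fraction.
act_pct = {"Blautia": 20.0, "Sphingomonas": 5.0, "Pseudomonas": 2.5}

# Fraction rate of 0.34, i.e. active cells were 34% of all counted cells.
act_scaled = scale_by_fraction_rate(act_pct, 0.34)
for taxon, pct in act_scaled.items():
    print(f"{taxon}: {pct:.2f}% of the total community")
```

After scaling, values from Act and IgA fractions are expressed relative to the total community, which is why FS percentages need no adjustment.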
Descriptive and statistical analyses were carried out in the R statistical environment using the Vegan R package 56,57 . Venn diagrams for pan-microbiota analysis were obtained using the Vennerable R package 58 . Flow cytometry data were analysed using the R packages flowCore and flowViz from Bioconductor 59-61 .
Accession numbers. Sequences were deposited in EMBL-EBI Sequence Read Archive (SRA) under study number ERP002046.