The microbial communities colonizing different body districts comprise trillions of microorganisms that perform vital functions and play a role in keeping us healthy1. The human gut microbiota is composed of bacteria, archaea, fungi, protozoans, and viruses, particularly bacteriophages, probably due to the prevalence of bacteria in this environment2. The microbiota can be viewed as a community composed of autochthonous or resident microorganisms and allochthonous or transient microorganisms3. Bacteria, mostly anaerobic, are predominant in this environment and consist of two main phylotypes: Bacteroidetes, including the genera Prevotella and Bacteroides, and Firmicutes, including Clostridium clusters and members of Eubacterium, Faecalibacterium, Roseburia, and Ruminococcus. On the other hand, Proteobacteria, Actinobacteria, Fusobacteria, and Verrucomicrobia phyla are present in relatively small numbers4. Various roles are attributed to the microbial community, including immune system maintenance, vitamin production, digestion, energy homeostasis, angiogenesis, metabolite synthesis, and the maintenance of intestinal barrier integrity5. The gut microbiota develops in children between the ages of one and three and remains relatively stable throughout life. During the transition from childhood to adulthood, the genera Bifidobacteria decrease, while Bacteroidetes increase, affecting the gut's metabolic activity and health. Several studies have reported that Bifidobacteria are beneficial, playing a role in protecting the gut epithelium, while Firmicutes, particularly Clostridia and Enterobacteriaceae, whose numbers increase in elderly subjects, are considered detrimental6.

Although microbiota evolves throughout the lifetime, it is now recognized that one-third of the gut microbiota is common to most people, whereas the remaining two-thirds are specific to each individual. In particular, at lower taxonomic levels (i.e., species), the microbiota is influenced by several individual factors, such as type of delivery at birth and the method of infant feeding, the use or abuse of medications, especially antibiotics, diet, supplements, lifestyle habits (smoking, physical activity) etc1,7,8. Recent studies have highlighted emerging differences in microbiota composition even in population cohorts with similar genetic and cultural backgrounds9,10,11. The balance between the two most important phyla found in the gut, Firmicutes and Bacteroidetes, is essential to maintaining homeostasis in the host and, consequently, health.

Considering the central role played by intestinal microbiota in modulating several pathological disorders and response to treatments, investigating its population-specific variations may lead to findings that contribute to enhancing the benefits of existing diagnostic and therapeutic strategies12,13,14. Identifying and classifying specific sets of microbiota features that promote health is an essential first step to correcting microbial configurations implicated in disease. In fact, although attention is already being focused on how to manipulate microbiota to improve health status, there is a consensus within the scientific community that studying the factors constituting the normal ranges of these features in healthy populations is of fundamental importance15. Although several important papers have been published on the microbiota composition of Italians suffering from specific diseases or as a changing ecosystem, few data are available on the microbiota of the healthy population1,9,16. Hence, our study aimed to define the reference intervals of the gut microbiota of a sample of Italian subjects with relatively homogeneous physiological features.


Samples of 148 participants (M: 69, F: 79; age: 39.8 ± 16.8 years; height: 164.4 ± 18.3 cm; weight: 61.2 ± 17.5 kg; body mass index: 22.0 ± 4.1 kg/m²) were collected and analysed. Of the 148 subjects, 22 were under 18 years old, 16 were smokers, 35 were ex-smokers, and 97 were non-smokers. In addition, 104 practised sports. Regarding diet, 93 participants had a typical Mediterranean diet, 15 were vegetarian, 4 were vegan, 4 were on a paleo diet, and the remaining 32 reported following another type of diet. Finally, 15 subjects reported avoiding eating certain foods for dietary or ethical reasons, and 27 had other kinds of sporadic food restrictions. The sample size was large enough for the aim of the paper. Indeed, the total number of genera found in the gut pan-microbiota increased with the number of samples, as shown in Supplementary Fig. S1.

Subject clustering

The analysis performed using the elbow and silhouette methods showed that the best clustering solution was the one with two clusters of subjects; both methods yielded the same results (Supplementary Fig. S2). The two clusters are clearly defined: Cluster 1 (C1) comprises 24 subjects, while Cluster 2 (C2) includes 124 subjects. Differences were observed between the two clusters in terms of phyla (F = 72.42), families (F = 9.43), and genera (F = 4.94) (PERMANOVA, Bray–Curtis dissimilarity index, Bonferroni corrected; P < 0.001 for all comparisons). A heatmap was used to represent the abundance of the most prevalent phyla. Firmicutes and Bacteroidetes together constituted more than 85% of the total phyla abundance. The difference between the two clusters, C1 (Fig. 1 above) and C2 (Fig. 1 below), is evident: Bacteroidetes were prevalent in C1, while Firmicutes were prevalent in C2.
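As an illustration of the Bray–Curtis dissimilarity index used in the comparison above, the following minimal Python sketch computes it for two abundance profiles. It is not the authors' code (the study used R), and the two profiles are hypothetical values chosen only for illustration, not data from this study.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both vectors are empty of counts")
    return numerator / denominator


# Hypothetical relative abundances (%), in the order:
# Firmicutes, Bacteroidetes, Proteobacteria, all other phyla.
sample_a = [50.0, 40.0, 6.0, 4.0]
sample_b = [27.0, 57.0, 10.0, 6.0]

dissimilarity = bray_curtis(sample_a, sample_b)
print(round(dissimilarity, 3))
```

The index ranges from 0 (identical profiles) to 1 (no shared taxa), which is why it is a natural input for both clustering and PERMANOVA.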

Figure 1

Heatmap of the most represented phyla. Each column represents a single phylum, with each row representing a different subject/sample. Bray–Curtis distance was used as a clustering method. Two different clusters of subjects were selected as the best solution (see above).

General features of the population

The subjects in the two clusters had similar characteristics, as shown by the comparison reported in Table 1. No significant differences were found between the two clusters in terms of gender, body mass index, diet type, smoking habits, or alcohol consumption. Regarding food restrictions, no significant difference was found for lactose "intolerance" (subject's self-definition, in the absence of any clinical diagnosis) between the two clusters. By contrast, a three-fold increase was found for other food restrictions in the subjects in C1 (41.7% C1 vs 13.7% C2, P = 0.003). The macro-region of origin (north vs centre vs south) also showed a difference in the distribution between the two clusters, with C2 comprising almost double the proportion of subjects living in northern Italy compared with C1 (62.1% vs 33.3%, respectively), and a lower percentage of subjects living in central Italy (29.8% vs 54.2%, respectively) (P = 0.03). Finally, physical activity habits differed between the two clusters, with a higher proportion of subjects in C2 than in C1 declaring more than 4 h/week of physical activity (26.6% vs 8.3%, respectively; P = 0.04).

Table 1 The sociodemographic characteristics of the subjects in the two clusters.

The microbial diversity and richness of the two clusters

In addition to a lower prevalence of food restrictions, C2 is also characterized by higher species richness (OTU number: C1 = 146.67 ± 43.67; C2 = 198.17 ± 48.47; F(1,146) = 23.40; P < 0.001) and diversity (Shannon effective number of species: C1 = 16.88 ± 8.66; C2 = 35.01 ± 13.40; F(1,146) = 40.50; P < 0.001) when compared to C1.
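For readers unfamiliar with the two indices reported above, the following Python sketch shows how richness and the Shannon effective number of species (the exponential of the Shannon entropy) can be computed from a vector of counts. It is a minimal illustration, not the authors' R code, and the count vectors are hypothetical.

```python
import math


def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_effective(counts):
    """Exponential of the Shannon entropy (natural log) of the counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(entropy)


# Hypothetical count vectors, used only to show the behaviour of the indices.
even_sample = [10, 10, 10, 10]      # four equally abundant taxa
dominated_sample = [97, 1, 1, 1]    # same richness, one dominant taxon

print(richness(even_sample), shannon_effective(even_sample))
print(richness(dominated_sample), shannon_effective(dominated_sample))
```

Both samples have the same richness, but the effective number drops sharply when one taxon dominates, which is why the two indices are reported side by side.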

Microbial community composition

The microbiota profiles of the 148 Italian volunteers were characterized at the taxonomic level (i.e., phylum, family, and genus; Table 2). The taxonomic assignment of the V3-V4 hypervariable region of the 16S rRNA gene showed Firmicutes (46.5%) and Bacteroidetes (43.2%) to be the predominant phyla. Proteobacteria (6.2%) were less abundant, while the remaining phyla were rarely detected, constituting altogether 4.2% of the total population. The analysis of the two clusters shows a higher percentage of Bacteroidetes (57.5% vs 40.4%) and a lower percentage of Firmicutes (26.9% vs 50.2%) in C1 compared to C2; hence, there was a higher Firmicutes/Bacteroidetes ratio in C2 than in C1, as shown in Fig. 2 (F/B Ratio: C1 0.51 ± 0.22, C2 1.33 ± 0.48; Mann–Whitney U = 73.0; P < 0.001).

Table 2 Mean, upper and lower limits of 95% reference intervals are reported for phyla, families, and genera with abundance > 0.1%.
Figure 2

Violin plots of the Firmicutes/Bacteroidetes ratio in the two clusters.

At the family level, Bacteroidaceae (25.9%), Lachnospiraceae (21.8%), Prevotellaceae (10.5%), Ruminococcaceae (9.9%), and Oscillospiraceae (2.5%) were the most represented among the more than two hundred families that were detected. Among these, significant differences were detected between the two clusters for Lachnospiraceae, Ruminococcaceae, and Oscillospiraceae, all three being more abundant in C2 than in C1. Among the genera, Bacteroides (25.9%), Prevotella (8.0%), and Faecalibacterium (4.1%) showed the highest percentages of abundance.

In Fig. 3, non-parametric correlation matrices are reported for all genera that showed an abundance greater than 0.5%, both for C1 (Fig. 3, upper plot) and C2 (Fig. 3, lower plot). In C1, some genera showed a strong negative correlation, such as Prevotella with Bacteroides, as well as Sutterella with Lachnospira, Monoglobus, and Parasutterella. Conversely, some genera were found to be positively correlated. Overall, correlations detected in C1 should be interpreted with caution, as this cluster only comprises 24 subjects. As can be noted, C2 showed weaker correlations among genera than C1, except for strongly negative values between genus Bacteroides and genus Clostridia UCG-014, and again, between Bacteroides and Prevotella.

Figure 3

Spearman correlation plots of genera (cut off: relative abundance 0.5%) in the two clusters.


Defining a healthy microbiota has become a major challenge for scientists. Although variations in the microbial community associated with a wide range of pathologies, from gastrointestinal disorders, autoimmune diseases, and cancer to mood disorders, have been documented, the composition and functional characteristics of a healthy microbiota have yet to be fully elucidated12,13,17,18,19,20. Defining these characteristics is challenging because the microbiota is strongly influenced by a wide range of factors, including genetics; the mode of delivery at birth and the method of infant feeding; the use or abuse of medications, especially antibiotics, and supplements; diet; and lifestyle habits (smoking, physical activity)1,7,8. Recent studies have highlighted differences in microbiota composition even in population cohorts with similar genetic and cultural backgrounds9,10,11.

This study aimed to characterize the gut microbiota composition of healthy volunteers from Italy. The analysis of faecal samples provided information on the microbial composition that was in line with previously reported results21. The phyla Firmicutes and Bacteroidetes are indeed closely related to a healthy profile, as is a low number of species belonging to the phylum Proteobacteria, owing to the anaerobic conditions in the colon22. Nevertheless, by k-means clustering, we were able to identify two main groups, C1 and C2, with the latter characterized by higher richness, diversity, and Firmicutes/Bacteroidetes ratio.

It is well known that most bacterial genera of the human gut microbiota belong to the phyla Firmicutes or Bacteroidetes, which account for about 90% of intestinal resident microorganisms23. Firmicutes, sub-grouped in Clostridium coccoides (Clostridium cluster XIVa) and Clostridium leptum (Clostridium cluster IV), are responsible for assimilating carbohydrates and animal fat, which are associated with the onset of obesity24,25. Among Bacteroidetes, the two prevalent genera in the human colon are Bacteroides and Prevotella; the former is highly associated with the consumption of animal proteins, amino acids, and saturated fats, which are typical components of the Western diet, and the latter with the consumption of complex carbohydrates and simple sugars, which are important components of vegetarian diets26,27.

Although several studies have attempted to define the composition of a healthy microbiota, such a definition remains elusive due to the many intrinsic and extrinsic factors influencing the gut ecosystem28. Nishijima et al.29 compared the compositions of the gut microbiomes of people from twelve different countries worldwide, showing great variations in the microbiome structure and function in healthy adults from different countries. At the genus level, the relative abundance of Bacteroides reported herein for the Italian population was similar to that reported for the United States, Canada, and Spain; similarly, the relative abundance of Prevotella was analogous to that which was reported for Canada, Denmark, Spain, and Russia. It can be noted that the microbiota composition found in this study in the healthy Italian population is similar to the composition reported in other countries with a predominantly Caucasian ethnicity29. The additional value of our study compared to similar investigations consists of quantifying reference intervals, which could have a direct application in diagnostics. Moreover, De Filippo et al. reported a significant enrichment in Bacteroidetes and a depletion of Firmicutes in African children whose diet was based on cereals, legumes, and vegetables and rich in carbohydrates, fibre, and non-animal protein30. The bacteria belonging to the genus Bacteroides are known to produce short-chain fatty acids (SCFA) and may thus contribute to preventing gut inflammation. Accordingly, multiple studies have reported an association between inflammatory bowel disease (IBD) and the flora disequilibrium of Bacteroides31,32,33,34.

In our study, Firmicutes showed a lower abundance than Bacteroidetes in C1. The greater relative abundance of Bacteroidetes suggests that in the intestines of those subjects, there may be a lower number of bacterial species favouring the onset of metabolic diseases, such as those belonging to Firmicutes. In this regard, we also observed that the participants in this study, grouped according to their food habits and, to a lesser degree, according to their geographical origins, are mostly included in C2. Indeed, a significantly higher proportion of participants in C1 reported having food restrictions, mainly related to the consumption of dried and fresh vegetables. A limited consumption or, even worse, the exclusion of fresh vegetables and their derivatives from the diet can induce alterations in the gut microbiota composition, leading to a reduction of fibre-degrading bacteria, which are able to produce greater amounts of SCFA35,36. Further, the heterogeneity of vegetables consumed has been positively correlated with microbial alpha-diversity37. In addition, a significant difference in the distribution of the subjects among the Italian macro-regions (north, centre, south) was detected, with participants from the northern regions being more represented in C2 and participants from the central regions being prevalent in C1. This result is in accordance with those previously reported by Fontana and colleagues9, who detected differences in the gut microbiota composition of people from three different regions of Italy (one for each macro-area). Further, regular physical activity seems to be a discerning factor between the two clusters, with people who practise a higher volume of physical exercise (i.e., more than 4 h per week) being more represented in C2. This result is also coherent with a higher Firmicutes/Bacteroidetes ratio in this cluster, as this ratio was previously associated with higher cardiorespiratory fitness and athletic status38.
This result is in accordance with other studies in the literature: Clarke et al.39 found a higher Firmicutes/Bacteroidetes ratio in a sample of rugby players compared to overweight controls; Huang et al.40 reported an increase in this ratio after six weeks of exercise and dietary restriction in obese adolescents; Donati Zeppa et al.41 found an increase in the Firmicutes/Bacteroidetes ratio after nine weeks of high-intensity interval training in healthy males.

In conclusion, this study further supports the significance of Firmicutes and Bacteroidetes and their ratio as a scaffold of a microbiologically healthy gut and, consequently, of the body's wellbeing, similarly to a conventional human structural organ. Reference intervals at every taxonomic level can be used as a reference for verifying whether a single subject or sample belongs to the microbiological community defined in this study. Reference intervals here reported refer to a sample of subjects who do not self-report clear disease symptoms; however, it will be interesting to use these reference intervals for future studies to verify whether samples from patients with specific diseases may or may not have a different microbial community. In this sense, it will then be possible to define the sensitivity and specificity of the reported reference intervals according to specific pathologies. This can be done either at the univariate level (e.g., genus by genus) to verify whether each specific abundance is included within the reference interval or using a multivariate approach. In the latter case, the non-parametric approach of the Mahalanobis distance can be used. In this approach, the ranks of the abundances are used rather than the single absolute values42.
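The univariate use of the reference intervals described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `outside_reference`, the interval limits, and the sample abundances are all hypothetical and are not the values of Table 2. The rank-based multivariate approach is not shown.

```python
def outside_reference(sample, intervals):
    """Return the taxa whose abundance falls outside its reference interval.

    sample    -- dict mapping taxon name to relative abundance (%)
    intervals -- dict mapping taxon name to (lower, upper) limits (%)
    A taxon missing from the sample is treated as having abundance 0.
    """
    flagged = {}
    for taxon, (lower, upper) in intervals.items():
        value = sample.get(taxon, 0.0)
        if not lower <= value <= upper:
            flagged[taxon] = (value, lower, upper)
    return flagged


# Hypothetical limits and abundances, for illustration only.
example_intervals = {
    "Bacteroides": (5.0, 55.0),
    "Prevotella": (0.0, 40.0),
    "Faecalibacterium": (0.5, 12.0),
}
example_sample = {"Bacteroides": 61.0, "Prevotella": 3.2, "Faecalibacterium": 4.0}

for taxon, (value, lower, upper) in outside_reference(example_sample, example_intervals).items():
    print(f"{taxon}: {value}% is outside [{lower}, {upper}]%")
```

Counting how often samples from a patient group are flagged in this way is one route to the sensitivity and specificity estimates mentioned in the text.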

The association between the richness and diversity of gut microbiota and health has been demonstrated by Rinninella et al.1, although it appears difficult to identify a unique optimal gut microbiota composition. The main contribution of the present study is to help identify the existence, within the healthy Italian population, of a commonly distributed, constantly present, principal microbiological pattern, thus suggesting the presence of a sort of microbiological framework or scaffolding. In our view, this finding reinforces the concept that the human intestinal microbiota, with its morpho-functional and pathophysiological aspects, represents a real organ. Indeed, like anatomical organs, the intestinal microbiota may have individual variability; however, such variability must not substantially alter the fundamental framework of a healthy microbiota to ensure its correct functioning. Possible differences in the gut microbiota composition, diversity, and richness among individuals with the same ethnicity, residing in different Italian regions, or with different lifestyles only marginally affect the composition of the two main microbiological clusters identified in the present study. Taken together, our data highlight the significance of studies on population-specific variations in human microbiota composition. Nevertheless, at the same time, the present investigation underscores the need for variability studies to be able to consider even minimal variations in the intestinal microbiological population, given that, at least in the healthy population, there is a significant and reproducible presence of well-defined groups of bacteria, which represent a constant scaffold, or framework. 
In the near future, it will be crucial to share these data coming from different research groups in order to implement the “normal” range values or to build an algorithm capable of translating the composition of the microbiota associated with disease states and of suggesting any dietary, pharmacological or lifestyle interventions in order to recover the state of eubiosis. Moreover, integrating these data with metabolomics and genetic variants could improve patient management. Again, this approach to studying the intestinal microbiota calls to mind studies focusing on human structural organs. Indeed, in our view, such an approach to the microbiota could help scientists to better design experimental plans and set up strategies based on precision-tailored microbiota engineering.



A total of 148 control subjects from 17 Italian regions were recruited by the medical board, some of whose members were contributing authors of this work. The subjects, 69 males and 79 females, ranging in age from 23 to 57, were recruited from different Italian universities under the supervision of the University of Urbino Carlo Bo (Ethics Committee approval no. 34_2021). All the participating institutions followed the same pre-analytical and analytical procedures. All the subjects agreed to participate according to the ethical guidelines of the 2013 Declaration of Helsinki and signed written informed consent. The volunteer participants were selected to create a model of the Italian adult Caucasian population adequately represented in terms of gender, age, geographical origin, and place of residence (city or countryside) and that falls within the criteria of the WHO definition of a “healthy” state of “complete physical, mental, and social well-being, not merely the absence of disease or infirmity”. In detail, the medical board evaluated each subject's complete medical history in order to exclude those who did not meet the study's inclusion criteria. The following subjects were excluded: those being treated with antibiotics or other drugs, those consuming probiotics, and those having a known history of inflammatory bowel disease, systemic disease, other autoimmune, metabolic, or psychiatric disorders or cancer. A questionnaire was then administered to each participant to collect the following information: body mass index, dietary habits, contact with farm animals or pets, smoking and physical activity habits, alcohol consumption, and breastfeeding. Dietary patterns were classified as Mediterranean, vegetarian/vegan, or other; the routine use of probiotics was also assessed. Furthermore, lactose and other food restrictions were evaluated, i.e., voluntary limited consumption or exclusion of specific food groups (mainly dried or fresh vegetables and derivatives).

Sample collection and DNA extraction

Samples were collected over two years, from fall 2017 to spring 2019. Fresh stool samples were collected from each participant in tubes containing a DNA stabilization buffer (Canvax Biotech). In order to reduce any possible bias, pre-analytical and analytical procedures were performed at only one centre, according to our previously published study9. The QIAamp DNA Stool Mini Kit (Qiagen, Milan, Italy) was used to perform total DNA extraction starting from 250 µL of each sample following the manufacturer's protocol. Once collected in the stabilizing liquid, samples were processed according to standardized times, usually not exceeding 5 days from the withdrawal; when it was not possible to process the samples immediately upon arrival in the laboratory, they were stored at −80 °C until processing, after assessing DNA concentration and purity.

16S rRNA gene sequence data processing

The Illumina 16S Metagenomic Sequencing Library Preparation for high-throughput sequencing was performed as follows: 12.5 ng of each DNA extract was employed for the amplification of the V3–V4 hypervariable regions of the bacterial 16S ribosomal RNA (rRNA) gene, using the following primers with Illumina adapters (underlined):

Forward primer (341F):


Reverse primer (785R):


As reported in Klindworth et al.43. Agencourt AMPure XP beads (Beckman Coulter, Milan, Italy) were used to purify PCR amplicons. The amplicons were then used for a second PCR in order to barcode the libraries using the Illumina dual-index system (Nextera XT Index Kit, Illumina Inc., San Diego, CA, USA) necessary for multiplexing. Following a second purification step, the eluted DNA products were quantified using the Qubit dsDNA BR Kit assay, diluted to 4 nM and pooled. Paired-end sequencing (2 × 300 cycles) was carried out using an Illumina MiSeq instrument (Illumina Inc.) according to the manufacturer's instructions. Sequences were demultiplexed based on index sequences, and FASTQ files were generated. FASTQ raw sequencing data were imported into the QIIME2 (v.2021.2)44 environment, and then Illumina primers were removed using the q2-cutadapt plugin in trim-paired mode45. Trimmed sequences were denoised in paired-end mode using the q2-dada2 plugin46. The assignment of taxonomy to amplicon sequence variants (ASVs) was performed with the q2-feature-classifier plugin47 against the pre-trained Naïve Bayes classifier SILVA 138 99% operational taxonomic units (OTUs) full-length sequence dataset49.

Statistical analyses

The pan-microbiota (total observed richness in all samples) was determined in subsets of increasing size composed of randomly chosen samples (250 repetitions for each sample size). A collector’s curve, i.e., the total number of observed genera with increasing numbers of samples collected, was subsequently calculated (chronological order) (10 repetitions for each sample size), according to Falony et al.50. Once the sample's representativeness was checked, and it was noted whether the abundances showed very dispersed values, the presence of any homogeneous subgroups within the sample was verified. The vegdist function (vegan R package) was used to calculate Bray–Curtis distance, and the kmeans function was used to create the clusters51. The elbow and silhouette methods were used for determining the optimal number of clusters. Permutational multivariate analysis of variance (PERMANOVA) was performed on the Bray–Curtis distance matrix to determine if the gut microbiota structure differed between the two clusters, considering phyla, families, and genera. The adonis2 function of the vegan R package was used. A heatmap was built as a graphical representation of the most abundant (representing 99% of total abundance) phyla using the pheatmap R package52. A chi-square test, or Fisher exact test when at least one class had n < 5, was used to test differences in microbiota composition and participant characteristics between the two clusters. The results are presented with the chi-square statistic (χ²) and Cramér’s V. V values should be interpreted as > 0.5 = high association, 0.3 to 0.5 = moderate association, 0.1 to 0.3 = low association, 0 to 0.1 = little or no association. Richness (OTUs number) and Shannon's effective number were calculated using the vegan R package. A non-parametric method was used to calculate the reference intervals related to phyla, families, and genera.
The 90% confidence intervals relative to the 95% lower and upper limits of the reference intervals were calculated using the bootstrap method according to the NCCLS C28-A2 guidance document53. The referenceIntervals R package was used54. The significance of differences in the abundance of phyla, families, and genera between clusters was tested using the Mann–Whitney test: P values from all statistical tests were adjusted for multiple comparisons within each taxonomic level, controlling the False Discovery Rate (FDR) (FSA R package) at level 0.05 using the Benjamini–Hochberg step-up procedure55. A graphical display of a non-parametric correlation matrix, based on Spearman's R, ordered according to hierarchical clustering, was obtained using the corrplot R package. All the analyses were conducted using Microsoft Excel 16, Prism 8 (GraphPad Software, San Diego, CA), and R Studio 3.6.2.
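The reference-interval calculation described above can be illustrated with a minimal Python sketch: a non-parametric 95% interval taken as the 2.5th and 97.5th percentiles, with a percentile-bootstrap 90% confidence interval for each limit. This is not the authors' code, which relied on the referenceIntervals R package. The linear-interpolation percentile rule, the number of bootstrap resamples, the seed, and the synthetic input values are all assumptions made for illustration.

```python
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no values supplied")
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction


def reference_interval(values, coverage=0.95):
    """Non-parametric reference interval: central `coverage` of the data."""
    ordered = sorted(values)
    tail = (1 - coverage) / 2
    return percentile(ordered, tail), percentile(ordered, 1 - tail)


def bootstrap_limit_ci(values, n_boot=1000, ci=0.90, seed=0):
    """Percentile-bootstrap confidence intervals for both reference limits."""
    rng = random.Random(seed)
    lowers, uppers = [], []
    for _ in range(n_boot):
        resample = [rng.choice(values) for _ in values]
        lower, upper = reference_interval(resample)
        lowers.append(lower)
        uppers.append(upper)
    lowers.sort()
    uppers.sort()
    tail = (1 - ci) / 2
    return ((percentile(lowers, tail), percentile(lowers, 1 - tail)),
            (percentile(uppers, tail), percentile(uppers, 1 - tail)))


# Synthetic abundances (1.0 to 100.0), used only to exercise the functions.
synthetic = [float(i) for i in range(1, 101)]
lower_limit, upper_limit = reference_interval(synthetic)
lower_ci, upper_ci = bootstrap_limit_ci(synthetic)
print(lower_limit, upper_limit)
print(lower_ci, upper_ci)
```

With small groups such as C1 (24 subjects), the bootstrap confidence intervals around each limit can be wide, which is one reason for reporting them alongside the limits themselves.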

Ethics approval

The study was approved by the Urbino University Ethics Committee (approval number 34_2021).