A proposal for the reference intervals of the Italian microbiota “scaffold” in healthy adults

Numerous factors, ranging from genetics, age, lifestyle, and dietary habits to local environments, contribute to the heterogeneity of the microbiota in humans. Understanding the variability of a “healthy microbiota” is a major challenge in scientific research. The gut microbiota profiles of 148 healthy Italian volunteers were examined by 16S rRNA gene sequencing to determine the range and diversity of taxonomic compositions in the gut microbiota of healthy populations. Possible driving factors were evaluated through a detailed anamnestic questionnaire. Microbiota reference intervals were also calculated. A “scaffold” of a healthy Italian gut microbiota composition was identified. Differences in relative quantitative ratios of microbiota composition were detected in two clusters: a bigger cluster (C2), which included 124 subjects, was characterized by more people from the northern Italian regions, who habitually practised more physical activity and with fewer dietary restrictions. Species richness and diversity were significantly higher in this cluster (C2) than in the other one (C1) (C1: 146.67 ± 43.67; C2: 198.17 ± 48.47; F = 23.40; P < 0.001 and C1: 16.88 ± 8.66; C2: 35.01 ± 13.40; F = 40.50; P < 0.001, respectively). The main contribution of the present study was the identification of the existence of a primary healthy microbiological framework that is only marginally affected by variations. Taken together, our data help to contextualize studies on population-specific variations, including marginal aspects, in human microbiota composition. Such variations must be related to the primary framework of a healthy microbiota and providing this perspective could help scientists to better design experimental plans and develop strategies for precision tailored microbiota modulation.


Results
Samples of 148 participants (M: 69, F: 79; age: 39.8 ± 16.8 years; height: 164.4 ± 18.3 cm; weight: 61.2 ± 17.5 kg; body mass index: 22.0 ± 4.1 kg/m 2 ) were collected and analysed. Of the 148 subjects, 22 were under 18 years old, 16 were smokers, 35 were ex-smokers, and 97 were non-smokers. In addition, 104 practised sports. Regarding diet, 93 participants had a typical Mediterranean diet, 15 were vegetarian, 4 were vegan, 4 were on a paleo diet, and the remaining 32 reported following another type of diet. Finally, 15 subjects reported avoiding eating certain foods for dietary or ethical reasons, and 27 had other kinds of sporadic food restrictions. The sample size was large enough for the aim of the paper. Indeed, the total number of genera found in the gut pan-microbiota increased with the number of samples, as shown in Supplementary Fig. S1. Subject clustering. The analysis performed using the elbow and silhouette methods showed that the best clustering solution was the one with two clusters of subjects; both methods yielded the same results (Supplementary Fig. S2). The two clusters are clearly defined: Cluster 1 (C 1 ) comprises 24 subjects, while Cluster 2 (C 2 ) includes 124 subjects. Differences were observed between the two clusters in terms of phyla (F = 72.42), families (F = 9.43), and genera (F = 4.94) (PERMANOVA, Bray-Curtis dissimilarity index, Bonferroni corrected; P < 0.001 for all comparisons). Heatmap was used to represent the abundance of the most prevalent phyla. Firmicutes and Bacteroidetes together constituted more than 85% of the total phyla abundance. The difference between the two clusters, C 1 (Fig. 1 above) and C 2 ( Fig. 1 below), is evident: Bacteroidetes were prevalent in C 1 , while Firmicutes were prevalent in C 2 .
General features of the population. The subjects in the two clusters had similar characteristics, as was shown by the comparison reported in Table 1. No significant differences were found between the two clusters in terms of gender, body mass index, diet type, smoking habits, or alcohol consumption. Regarding food restrictions, no significant difference was found for lactose "intolerance" (subject's self-definition, in the absence of any clinical diagnosis) between the two clusters. By contrast, a three-fold increase was found for other food restrictions in the subjects in C 1 (41.7% C 1 vs 13.7% C 2 , P = 0.003). The macro-region of origin (north vs centre vs south) also showed a difference in the distribution between the two clusters, with C 2 being comprising almost a doubled proportion of subjects living in northern Italy with respect to C 1 (62.1% vs 33.3%, respectively), and a lower percentage of subjects living in central Italy (29.8% vs 54.2%, respectively) (P = 0.03). Finally, physical activity habits differed between the two clusters, with a higher proportion of subjects in C 2 who declared practising more than 4 h/week of physical activity with respect to C 1 (26.6% vs 8.3%, respectively; P = 0.04).

Discussion
Defining a healthy microbiota has become a major challenge for scientists. Although variations in the microbial community associated with a wide range of pathologies, from gastrointestinal disorders, autoimmune diseases, and cancer to mood disorders, have been documented, the composition and functional characteristics of a healthy microbiota have yet to be fully elucidated 12,13,[17][18][19][20] . Defining the composition and functional characteristics of a healthy microbiota is so challenging because the microbiota is strongly influenced by a wide range of factors, including genetics; the mode of delivery at birth and the method of infant feeding; the use or abuse of medications and supplements, especially antibiotics, diet, lifestyle habits (smoking, physical activity), etc 1,7,8 . Recent studies have highlighted differences in microbiota composition even in population cohorts with similar genetic and cultural backgrounds [9][10][11] .
This study aimed to characterize the gut microbiota composition of healthy volunteers from Italy. The analysis of faecal samples provided information on the microbial composition that was in line with previously reported results 21 . The phyla Firmicutes and Bacteroidetes are indeed closely related to a healthy profile as well as a low number of species belonging to Proteobacteria phyla due to the anaerobic conditions in the colon 22 . Nevertheless, by k-means clustering, we were able to identify two main groups, C 1 and C 2 , with the latter characterized by higher richness, diversity, and Firmicutes/Bacteroidetes ratio.
It is well known that most bacterial genera of the human gut microbiota belong to the phyla Firmicutes or Bacteroidetes, which account for about 90% of intestinal resident microorganisms 23 . Firmicutes, sub-grouped in  www.nature.com/scientificreports/ Clostridium coccoides (Clostridium cluster XIVa) and Clostridium leptum (Clostridium cluster IV), are responsible for assimilating carbohydrates and animal fat, which are associated with the onset of obesity 24,25 . Among Bacteroidetes, the two prevalent genera in the human colon are Bacteroides and Prevotella; the former is highly associated with the consumption of animal proteins, amino acids, and saturated fats, which are typical components of the Western diet, and the latter with the consumption of complex carbohydrates and simple sugars, which are important components of vegetarian diets 26,27 .
Although several studies have attempted to define the composition of a healthy microbiota, such a definition remains elusive due to the many intrinsic and extrinsic factors influencing the gut ecosystem 28 . Nishijima et al. 29 compared the compositions of the gut microbiomes of people from twelve different countries worldwide, showing great variations in the microbiome structure and function in healthy adults from different countries. At the genus level, the relative abundance of Bacteroides reported herein for the Italian population was similar to that reported for the United States, Canada, and Spain; similarly, the relative abundance of Prevotella was analogous to that which was reported for Canada, Denmark, Spain, and Russia. It can be noted that the microbiota composition found in this study in the healthy Italian population is similar to the composition reported in other countries with a predominantly Caucasian ethnicity 29 . The additional value of our study compared to similar investigations consists of quantifying reference intervals, which could have a direct application in diagnostics. Moreover, De Filippo et al. reported a significant enrichment in Bacteroidetes and a depletion of Firmicutes in African children whose diet was based on cereals, legumes, and vegetables and rich in carbohydrates, fibre, and non-animal protein 30 . The bacteria belonging to the genus Bacteroides are known to produce short-chain fatty acids (SCFA) and may thus contribute to preventing gut inflammation. Accordingly, multiple studies have reported an association between inflammatory bowel disease (IBD) and the flora disequilibrium of Bacteroides [31][32][33][34] .
In our study, Firmicutes showed a lower abundance than Bacteroidetes in C 1 . The greater relative abundance of Bacteroidetes suggests that in the intestines of those subjects, there may be a lower number of bacterial species favouring the onset of metabolic diseases such as those belonging to Firmicutes. In this regard, we also observed that the participants in this study, grouped according to their food habits and, to a lesser degree, according to their geographical origins, are mostly included in C 2 . Indeed, a significantly higher proportion of participants in C 1 reported having food restrictions, mainly related to the consumption of dried and fresh vegetables. A limited consumption, or even worst, the exclusion of fresh vegetables and their derivatives from the diet, can induce alteration in the gut microbiota composition, leading to a reduction of fibre-degrading bacteria, able to produce www.nature.com/scientificreports/ greater amounts of SCFA 35,36 . Further, the heterogeneity of vegetables consumed has been positively correlated to the microbial alpha-diversity 37 . In addition, a significant difference in the distribution of the subjects among www.nature.com/scientificreports/ the Italian macro-regions (north, centre, south) was detected, with participants from the northern regions being more represented in the C 2 and participants from the central regions being prevalent in the C 1 . This result is in accordance with those previously reported by Fontana and colleagues 9 , who detected differences in the gut microbiota composition of people pertaining to three different regions of Italy (one for each macro-area). Further, regular physical activity seems to be a discerning factor between the two clusters, with people who practice a higher volume of physical exercise (i.e., more than 4 h per week) being more present in the C 2 . This result is also coherent with a higher Firmicutes/Bacteroidetes ratio in this cluster, as this ratio was previously associated with higher cardiorespiratory fitness and athletic status 38 . This result is in accordance with other studies in the literature: Clarke et al. 39 found a higher Firmicutes/Bacteroidetes ratio in a sample of rugby players compared to overweight controls; Huang et al. 40 reported an increase in this ratio after six weeks of exercise and dietary restriction in obese adolescents; Donati Zeppa et al. 41 found an increase in the Firmicutes/Bacteroidetes ratio after nine weeks of high-intensity interval training in healthy males.
In conclusion, this study further supports the significance of Firmicutes and Bacteroidetes and their ratio as a scaffold of a microbiologically healthy gut and, consequently, of the body's wellbeing, similarly to a conventional human structural organ. Reference intervals at every taxonomic level can be used as a reference for verifying whether a single subject or sample belongs to the microbiological community defined in this study. Reference intervals here reported refer to a sample of subjects who do not self-report clear disease symptoms; however, it will be interesting to use these reference intervals for future studies to verify whether samples from patients with specific diseases may or may not have a different microbial community. In this sense, it will then be possible to define the sensitivity and specificity of the reported reference intervals according to specific pathologies. This can be done either at the univariate level (e.g., genus by genus) to verify whether each specific abundance is included within the reference interval or using a multivariate approach. In the latter case, the non-parametric approach of the Mahalanobis distance can be used. In this approach, the ranks of the abundances are used rather than the single absolute values 42 .
The association between the richness and diversity of gut microbiota and health has been demonstrated by Rinninella et al. 1 , although it appears difficult to identify a unique optimal gut microbiota composition. The main contribution of the present study is to help identify the existence, within the healthy Italian population, of a commonly distributed, constantly present, principal microbiological pattern, thus suggesting the presence of a sort of microbiological framework or scaffolding. In our view, this finding reinforces the concept that the human intestinal microbiota, with its morpho-functional and pathophysiological aspects, represents a real organ. Indeed, like anatomical organs, the intestinal microbiota may have individual variability; however, such variability must not substantially alter the fundamental framework of a healthy microbiota to ensure its correct functioning. Possible differences in the gut microbiota composition, diversity, and richness among individuals with the same ethnicity, residing in different Italian regions, or with different lifestyles only marginally affect the composition of the two main microbiological clusters identified in the present study. Taken together, our data highlight the significance of studies on population-specific variations in human microbiota composition. Nevertheless, at the same time, the present investigation underscores the need for variability studies to be able to consider even minimal variations in the intestinal microbiological population, given that, at least in the healthy population, there is a significant and reproducible presence of well-defined groups of bacteria, which represent a constant scaffold, or framework. In the next future, it will be crucial to share these data coming from different research groups in order to implement the "normal" range values or to build an algorithm capable of translating the composition of the microbiota associated with diseases states and of suggesting any dietary, pharmacological or lifestyle interventions in order to recover the state of eubiosis. Moreover, integrating these data with metabolomics and genetic variants could improve patient management. Again, this approach to studying the intestinal microbiota calls to mind studies focusing on human structural organs. Indeed, in our view, such an approach to the microbiota could help scientists to better design experimental plans and set up strategies based on precision tailored microbiota engineering.

Participants.
A total of 148 control subjects from 17 Italian regions were recruited by the medical board, some of whose members were contributing authors of this work. The subjects, 69 males and 79 females, ranging in age from 23 to 57, were recruited from different Italian universities under the supervision of the University of Urbino Carlo Bo (Ethics Committee approval no. 34_2021). All the participating institutions followed the same pre-analytical and analytical procedures. All the subjects agreed to participate according to the ethical guidelines of the 2013 Declaration of Helsinki and signed written informed consent. The volunteer participants were selected to create a model of the Italian adult Caucasian population adequately represented in terms of gender, age, geographical origin, and place of residence (city or countryside) and that falls within the criteria of WHO definition of a "healthy" state of "complete physical, mental, and social well-being, not merely the absence of disease or infirmity". In detail, the medical board evaluated each subject's complete medical history in order to exclude those who did not meet the study's inclusion criteria. The following subjects were excluded: those being treated with antibiotics or other drugs, those consuming probiotics, and those having a known history of inflammatory bowel disease, systemic disease, other autoimmune, metabolic, or psychiatric disorders or cancer. A questionnaire was then administered to each participant to collect the following information: body mass index, dietary habits, contact with farm animals or pets, smoking and physical activity habits, alcohol consumption, breastfeeding). Dietary patterns were classified into the Mediterranean, vegetarian/vegan, and others; the routine use of probiotics was also assessed. Furthermore, lactose and other food restrictions were evaluated, i.e., voluntary limited consumption or exclusion of specific food groups (mainly dried or fresh vegetables and derivatives). www.nature.com/scientificreports/ Sample collection and DNA extraction. Samples were collected over two years, from fall 2017 to spring 2019. Fresh stool samples were collected within tubes containing a DNA stabilization buffer (Canvax Biotech) from each participant. In order to reduce any possible bias, pre-analytical and analytical procedures were performed at only one centre, according to our previously published study 9 . QIAamp DNA Stool Mini Kit (Qiagen, Milan, Italy) was utilized to perform total DNA extraction starting from 250 µL of each sample following the manufacturer's protocol. Once collected in the stabilizing liquid, samples were processed according to standardized times, usually not exceeding 5 days from the withdrawal; when was not possible to process the samples immediately at the arrival in the laboratory, they were stored at − 80° until processing, after assessing DNA concentration and purity.
16S rRNA gene sequence data processing. The Illumina 16S Metagenomic Sequencing Library Preparation for high-throughput sequencing was performed as follows: 12.5 ng of each DNA extract was employed for the amplification of the V3-V4 hypervariable regions of the bacterial 16S ribosomal RNA (rRNA) gene, using the following primers with Illumina adapters (underlined): Forward primer (341F): 5′-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG CCT ACG GGNGGC WGC AG Reverse primer (785R): 5′-GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACAG GAC TAC HVGGG TAT CTA ATC C As reported in Klindworth et al. 43 . Agencourt AMPure XP beads (Beckman Coulter, Milan, Italy) were used to purify PCR amplicons. The amplicons were then used for a second PCR in order to barcode the libraries using the Illumina dual-index system (Nextera XT Index Kit, Illumina Inc., San Diego, CA, USA) necessary for multiplexing. Following a second purification step, the eluted DNA products were quantified using the Qubit dsDNA BR Kit assay, diluted to 4 nM and pooled. The purified DNA products were then subjected to an additional PCR to attach dual Illumina indices (Nextera XT Index Kit, Illumina Inc., San Diego, CA, USA) necessary for multiplexing. Paired-end sequencing (2 × 300 cycles) was carried out using an Illumina MiSeq instrument (Illumina Inc.) according to the manufacturer's instructions. Sequences were demultiplexed based on index sequences, and FASTQ files were generated. FASTQ raw sequencing data were imported into QIIME2 v.2021.2 44 environments, and then Illumina primers were removed using q2-cutadapt plugin in trim-paired mode 45 . Trimmed sequences were denoised in paired-end mode using q2-dada2 plugin 46 . The assignment of taxonomy to amplicon sequence variants (ASVs) was performed with q2-feature-classifier plugin 47 against the pre-trained Naïve Bayes classifier SILVA 138 99% operational taxonomic units (OTUs) full-length sequence dataset 49 .
Statistical analyses. The pan-microbiota (total observed richness in all samples) was determined in subsets of increasing size composed of randomly chosen samples (250 repetitions for each sample size). A collector's curve, i.e., the total number of observed genera with increasing numbers of samples collected, was subsequently calculated (chronological order) (10 repetitions for each sample size), according to Falony et al. 50 . Once the sample's representativeness was checked, and it was noted whether the abundances showed very dispersed values, the presence of any homogeneous subgroups within the sample was verified. The vegdist function (vegan R package) was used to calculate Bray-Curtis distance, and the kmeans function was used to create the clusters 51 . The elbow and silhouette methods were used for determining the optimal clusters. Permutational multivariate analysis of variance (PERMANOVA) was performed on the Bray-Curtis distance matrix to determine if the gut microbiota structure differed between the two clusters, considering phyla, families, and genera. The adonis2 function of the vegan R package was used. A heatmap was built as a graphical representation of the most abundant (representing 99% of total abundance) phyla using the pheatmap R package 52 . A chi-square test, or Fisher exact test when at least one class had n < 5, was used to test differences in microbiota composition and participant characteristics between the two clusters. The results are presented with p(χ 2 ) and Cramer's V. V values should be interpreted as > 0.5 = high association, 0.3 to 0.5 = moderate association, 0.1 to 0.3 = low association, 0 to 0.1 = little or no association. Richness (OTUs number) and Shannon's effective number were calculated using the vegan R package. A non-parametric method was used to calculate the reference intervals related to phyla, families, and genera. The 90% confidence intervals relative to the 95% lower and upper limits of the reference intervals were calculated using the bootstrap method according to the NCCLS Guidance Document C28A2 53 . The referenceIntervals R package was used 54 . The significance of differences in the abundance of phyla, families, and genera between clusters was tested using the Mann-Whitney test: P values from all statistical tests were adjusted for multiple comparisons within each taxonomic level, controlling the False Discovery Rate (FDR) (FSA R package) at level 0.05 using the Benjamini-Hochberg step-up procedure 55 . A graphical display of a non-parametric correlation matrix, based on Spearman's R, ordered according to hierarchical clustering, was obtained using the corrplot R package. All the analyses were conducted using Microsoft Excel 16, Prism 8 (GraphPad Software, San Diego, CA), and R Studio 3.6.2.
Ethics approval. The study was approved by the Urbino University Ethics Committee (approval number 34_2021).

Data availability
Additional data is available in the supplementary material, and the datasets generated during the current study will be available upon request to the corresponding author.