Discrepant gut microbiota markers for the classification of obesity-related metabolic abnormalities

The gut microbiota (GM) is related to obesity and other metabolic diseases. To detect GM markers for obesity in patients with different metabolic abnormalities and to investigate their relationships with clinical indicators, 1,914 Chinese adults were enrolled for 16S rRNA gene sequencing in this retrospective study. Based on GM composition, random forest classifiers were constructed to distinguish obese patients with (Group OA) or without metabolic diseases (Group O) from healthy individuals (Group H), and high accuracies were observed for the discrimination of Group O and Group OA (areas under the receiver operating characteristic curve (AUC) of 0.68 and 0.76, respectively). Furthermore, six GM markers were shared by obese patients with various metabolic disorders (Bacteroides, Parabacteroides, Blautia, Alistipes, Romboutsia and Roseburia). In contrast, the discrimination between Group O and Group OA showed low accuracy (AUC = 0.57). Likewise, GM classifiers that distinguished Group O from obese patients with specific metabolic abnormalities were not accurate (AUC values from 0.59 to 0.66). Nonetheless, common biomarkers, such as Clostridium XIVa, Bacteroides and Roseburia, were identified for obese patients with high uric acid, high serum lipids and high blood pressure. A total of 20 genera were significantly associated with multiple clinical indicators. For example, Blautia, Romboutsia, Ruminococcus2, Clostridium sensu stricto and Dorea were positively correlated with indicators of bodyweight (including waistline and body mass index) and serum lipids (including low density lipoprotein, triglyceride and total cholesterol). In contrast, these clinical indicators were negatively associated with Bacteroides, Roseburia, Butyricicoccus, Alistipes, Parasutterella, Parabacteroides and Clostridium IV. Overall, these biomarkers hold the potential to predict obesity-related metabolic abnormalities, and interventions based on them might benefit weight loss and the improvement of metabolic risk.


Discussion
In this retrospective study, we characterized the GM of obese patients with various metabolic abnormalities. Although studies have revealed decreased bacterial diversity in obese patients 29,30, in the current study higher bacterial diversity was detected in obese patients without metabolic abnormalities than in healthy individuals. Therefore, we hypothesized that specific bacteria and their associations with obesity should be examined, rather than bacterial diversity alone, which might be affected by diet, body size and other factors 31. With the onset of metabolic abnormalities in obese adults, aggravated GM dysbiosis brings about a decline in bacterial diversity and genus number 29. Moreover, a clear inter-group GM discrepancy was observed between Group H and Group OA in the PCoA analysis, while Group O appeared to be an intermediate state between healthy individuals and obese patients with metabolic abnormalities. We therefore suggest that gradual GM changes occur with the aggravation of obesity and the occurrence of other metabolic diseases.
To differentiate obese patients from healthy individuals, six universal biomarkers were identified through random forest classifiers: Bacteroides, Parabacteroides, Blautia, Alistipes, Romboutsia and Roseburia. Interestingly, most of these genera have been found to interact with the host immune system. For example, Bacteroides has been shown to promote the differentiation of regulatory T cells (Treg) and to protect against inflammatory reactions 32. Meanwhile, systemic inflammatory responses can be suppressed by Parabacteroides through its regulation of IL-10 and Treg cells 33. Conversely, Alistipes can trigger inflammatory reactions in hosts, and the genus was also found to be abundant in Chinese T2D patients 9. Based on their close relationships with the host immune system, these biomarkers could be applied for the early diagnosis of obesity and other metabolic risks, given the observed high accuracy (AUC ranged from 0.68 to 0.77). Furthermore, these biomarkers seem to be population specific. For a Danish population 22, 18 biomarkers have been identified to differentiate obese and lean individuals, including species from Bacteroides, Clostridium, Faecalibacterium and Ruminococcus. However, only one of these biomarkers was also found in our Chinese cohort. On the other hand, nine obesity-associated genera were reported in Chinese children 24, and three of them (Bacteroides, Parabacteroides and Blautia) were consistent with the findings of this study. These outcomes suggest that specific GM interventions should be considered for different populations with various lifestyles 27.
Compared with the obese patients without abnormalities, the patients with metabolic abnormalities showed altered GM components, and Clostridium XIVa contributed to the discrimination of obese patients with high UA, serum lipid or blood pressure. A previous report documented that Clostridium XIVa can produce butyrate 34, which suppresses systemic inflammatory responses. In addition, Roseburia also contributed to the differentiation of obese patients with high UA, serum lipid or blood pressure. As a butyrate-producing bacterium 35, Roseburia can stimulate the differentiation of Treg cells, which is beneficial for the alleviation of inflammation. Despite their distinct clinical symptoms, the obese patients with different metabolic abnormalities shared some GM biomarkers, such as Blautia, Dorea and Gemmiger. As an acetate producer 36, Blautia can drive insulin release and promote metabolic syndromes, such as hypertriglyceridaemia, fatty liver disease and insulin resistance 16. Meanwhile, Dorea was negatively associated with insulin resistance 37, and Gemmiger can aggravate inflammatory reactions in hosts through its colonization factors 38. These biomarkers indicate common GM alterations in obese patients with different metabolic abnormalities, so other factors (such as genetic variation) might be involved in the occurrence of the different metabolic diseases 39. Based on these observations, we also speculate that obesity-related GM alterations lay the foundation for the occurrence of metabolic disorders, and that other specific pathogenic mechanisms need to be explored beyond GM dysbiosis.
The associations between bacterial components and the clinical indicators were explored. Faecalibacterium and Butyricicoccus were negatively correlated with LDL, GLU, UA, TC and BMI, which is consistent with their ability to secrete butyrate 40 and boost insulin sensitivity 41. In contrast, Blautia was positively correlated with these clinical indicators, possibly because of its acetate secretion 36. Given that Faecalibacterium and Butyricicoccus play roles opposite to that of Blautia, we speculate that synergism and antagonism within the microbial community are also crucial for obesity development. In addition, Parabacteroides 33 and Clostridium IV 42 can suppress inflammatory responses, and they were negatively associated with blood pressure, blood lipids and GLU. Since some of these bacteria were GM biomarkers in the obese patients, we infer that they might be potential targets for interventions in metabolic disorders, and that the corresponding clinical symptoms could possibly be relieved by exploiting these host-microbial relationships. In addition, the relationships among physiological parameters suggested that fat primarily accumulates at the waist in Chinese populations when obesity occurs 43, and increased waistline was positively associated with elevated blood pressure, blood sugar, UA and TG. Hence, waistline can be regarded as a signal for the occurrence of metabolic abnormalities in Chinese adults.
A limitation of the current research is that the validation accuracy of the biomarkers was not tested in different populations. Since GM composition is affected by ethnicity and lifestyle 27, obesity cohorts from other populations would help to define the scope of application of the biomarkers. In further studies, additional work is also imperative: (I) examine the genetic characteristics of patients with different metabolic diseases; (II) perform metagenomic sequencing to evaluate the microbial functions; (III) explore the alteration of intestinal metabolites in patients with metabolic diseases, and their associations with the gut microbiome.
In conclusion, the study detected the GM features of a large cohort of Chinese obese adults, furnished genus markers for obese patients with different metabolic abnormalities, and illustrated the associations between bacterial commensals and various clinical indicators. These findings suggest roles for the GM in the pathogenesis of metabolic diseases, and offer potential GM targets for adjuvant interventions in the treatment of obesity with metabolic abnormalities.

Ethics statement. This study was approved by the Ethics Committee of The General Hospital of the People's Liberation Army (PLAGH) under registration number S2016-068-01, and the research was carried out according to The Code of Ethics of the World Medical Association. All participants provided signed informed consent and volunteered to be investigated for scientific research.

Participant recruitment. Randomly selected volunteers were recruited from four hospitals in China: The 180th Hospital of People's Liberation Army of China (Quanzhou, China), China-Japan Union Hospital (Changchun, China), Southwest Hospital (Chongqing, China) and Longkou People's Hospital (Longkou, China). A total of 2,058 Han Chinese joined the study and completed physical testing, including height, weight, waistline and blood pressure. Blood tests were performed with a blood auto-analyzer (Beckman Coulter AU5800, Brea, CA, USA) to evaluate health status, including glucose (GLU), total cholesterol (TC), triglyceride (TG), low density lipoprotein (LDL), high density lipoprotein (HDL), uric acid (UA) and estimated glomerular filtration rate (eGFR) (Supplementary Table 1).
Participants who met any of the following criteria were excluded from this study: (I) younger than 18 years or older than 75 years; (II) exposed to antibiotics, probiotics or proton pump inhibitors within 1 month before physical examination; (III) suffered from diarrhoea, constipation, haematochezia or other gastrointestinal infectious diseases within 1 month prior to physical examination; (IV) underwent enema or other gastroenterological procedures within 1 month before physical examination; (V) suffered from mental disorders (e.g., depression, anxiety and post-traumatic stress), autoimmune diseases (e.g., type 1 diabetes, rheumatoid arthritis, multiple sclerosis and psoriasis) or hereditary diseases (e.g., thalassemia, hereditary deafness and phenylketonuria); (VI) had a history of drug abuse; (VII) exposed to antibiotics, probiotics or proton pump inhibitors within 4 weeks prior to the study. Finally, 1,914 individuals, from whom faecal samples were collected, were enrolled in the study between Jan. 2016 and Sep. 2016.

Grouping based on clinical indicators.
The participants were first divided into 2 groups: a healthy group and an obesity group. The healthy group (Group H) included individuals who passed their physical examinations with a normal BMI (between 18.5 and 23.99) 44. Overweight and obese patients, whose BMI was greater than 24, were assigned to the obesity group in this study. Using previously published clinical standards, five kinds of metabolic abnormalities were defined in the obesity cohorts: high UA 45 (>416 µmol/L in males or >350 µmol/L in females), high serum lipid 46 (TC ≥ 6.22 mmol/L, TG ≥ 2.26 mmol/L, LDL ≥ 4.14 mmol/L and/or HDL < 1.04 mmol/L), high blood pressure 47 (SBP ≥ 140 mmHg, DBP ≥ 90 mmHg), abnormal renal function 48 (eGFR < 60 ml/min/height²) and high serum glucose 49 (≥7.0 mmol/L). Based on clinical indicators and personal confirmation, the obese patients were divided into obesity groups with (Group OA) or without metabolic abnormalities (Group O), and Group OA was then subdivided into 15 obesity groups with different metabolic abnormalities (Table 1). To avoid data deviation, groups with fewer than 100 individuals were removed from subsequent analysis.
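The grouping rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the field names (`BMI`, `UA`, `SBP` and so on) and the function names are hypothetical, the blood pressure and serum lipid criteria are read as "any one threshold is enough", and Group H is assumed to require the absence of all five abnormalities because the text only says that these individuals "passed their physical examinations".

```python
from typing import Any, Dict, List, Optional

Participant = Dict[str, Any]  # hypothetical record: one dict per participant


def metabolic_abnormalities(p: Participant) -> List[str]:
    """Return the abnormality labels met by one participant (thresholds from the text)."""
    found = []
    ua_cutoff = 416 if p["sex"] == "male" else 350  # µmol/L
    if p["UA"] > ua_cutoff:
        found.append("high UA")
    # Assumption: "and/or" is read as "any single lipid criterion is enough".
    if p["TC"] >= 6.22 or p["TG"] >= 2.26 or p["LDL"] >= 4.14 or p["HDL"] < 1.04:
        found.append("high serum lipid")
    # Assumption: either systolic or diastolic elevation counts.
    if p["SBP"] >= 140 or p["DBP"] >= 90:
        found.append("high blood pressure")
    if p["eGFR"] < 60:
        found.append("abnormal renal function")
    if p["GLU"] >= 7.0:
        found.append("high serum glucose")
    return found


def assign_group(p: Participant) -> Optional[str]:
    """Return 'H', 'O' or 'OA', or None when the record fits none of the groups."""
    abnormal = metabolic_abnormalities(p)
    if 18.5 <= p["BMI"] <= 23.99:
        # Assumption: healthy controls show none of the five abnormalities.
        return None if abnormal else "H"
    if p["BMI"] > 24:
        return "OA" if abnormal else "O"
    return None


example = dict(sex="male", BMI=27.0, UA=430, TC=5.0, TG=1.5, LDL=3.0,
               HDL=1.2, SBP=120, DBP=80, eGFR=90, GLU=5.5)
print(assign_group(example), metabolic_abnormalities(example))
```

Joining the labels returned by `metabolic_abnormalities` gives one key per combination of abnormalities, which is one way the subgroups of Group OA could be enumerated before the groups with fewer than 100 individuals are dropped.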
Faecal sample collection. Sterile stool collection tubes (Axygen, California, USA) were delivered to the participants, and fresh stools were collected when they underwent physical examination. Two kinds of tools were prepared to collect different types of stool: (I) a swab (Huachenyang Technology Co., Ltd, Shenzhen, China) was used to collect hard stools, and approximately 5 g of stool was obtained from each person; (II) a dropper (Shanghai Truelab Lab, Shanghai, China) was used to collect loose stools, and approximately 5 ml of stool was acquired from each person. The stool samples were preserved in the stool collection tubes and transferred within half an hour to a −80 °C freezer for long-term storage. Contamination from urine or the environment was avoided during stool sample collection.
DNA extraction, library construction and sequencing. Microbial DNA was extracted from stool samples using a Power Soil DNA Isolation Kit (Mo Bio Laboratories, Carlsbad, USA). The V3-V4 region of the 16S rRNA gene was amplified with primers 338F and 806R using a PCR kit (TransGen AP221-02, Peking, China). The quality of the PCR products was assessed with Qubit (Thermo Fisher, Singapore), and the qualified DNA was prepared for library construction (TruSeq DNA PCR-Free kit, Illumina, San Diego, USA). The libraries were sequenced on an Illumina MiSeq sequencing platform (Illumina, San Diego, USA) with 300-base-pair reads.
Data filtering and taxonomical annotation. Raw sequenced reads were first filtered as pairs for adapter contamination (>15 bases), low quality (10 bases with <Q20) and N content (>1 base) using a self-programmed script. Then, the filtered reads were processed with the DADA2 (v1.6.0) package 50 in R (v3.4.4). Bases were trimmed from the reads if their quality scores were lower than 2, and the trimmed reads were discarded if they were shorter than 200 bp. Then, the sequence variants were inferred for each sample with default parameters and merged into tags. After chimera removal, qualified tags were aligned to the RDP 16S rRNA database (trainset 16/release 11.5) 51 to obtain the corresponding taxonomic profiles. The Shannon index was calculated to evaluate sample biodiversity using the vegan package in R.
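For readers unfamiliar with the Shannon index, a minimal Python sketch is given here. It computes H = -Σ p_i ln(p_i) over the relative abundances p_i of the taxa in one sample, using the natural logarithm, which is also the default of the `diversity` function in vegan. The function name and the example counts are illustrative and do not come from the study.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p * ln p) for one sample's taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("the sample must contain at least one read")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical genus-level read counts for two samples.
even_sample = [250, 250, 250, 250]   # four equally abundant genera
skewed_sample = [970, 10, 10, 10]    # one dominant genus
print(shannon_index(even_sample), shannon_index(skewed_sample))
```

An even community reaches the maximum value ln(k) for k taxa, whereas a community dominated by one genus scores close to zero, which is why the index is sensitive to both richness and evenness.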

Construction of random forest models and selection of GM markers.
Using the relative abundances of genera, random forest classifiers 53 were constructed with the randomForest package in R following a three-step scheme. First, the samples in each group were randomly split into 2 sets: a discovery data set (70% of the samples) and a validation data set (30% of the samples). Second, random forest models were trained on the discovery data sets of the two compared groups. Finally, the trained models were applied to the validation data sets of the compared groups, and the predictions were compared with the actual categories of the samples. Model validity was evaluated with sensitivity, specificity, precision, F1 score and AUC value over 10 repeats, and the ROC curves were plotted using the R package "pROC". The detailed script and parameters are shown in the Supplementary scripts. GM biomarkers were obtained from the constructed random forest models. Based on the optimal branch number and Gini values, genera were selected as candidate biomarkers. Since the models were constructed with 10 repeats, candidate biomarkers that appeared more than 8 times among the 10 repeats were selected as final GM biomarkers for the discrimination of the two compared groups.
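The evaluation scaffold around the classifier can be sketched as follows. This is a hedged illustration in plain Python, not the authors' R script (which is in their Supplementary scripts): the random forest itself is omitted, all function names are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) formulation, and "more than 8 times among 10 repeats" is read strictly as at least 9 repeats.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def split_discovery_validation(sample_ids: Iterable, frac: float = 0.7,
                               seed: int = 0) -> Tuple[list, list]:
    """Randomly split one group's samples into discovery (70%) and validation (30%)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * frac)
    return ids[:cut], ids[cut:]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, precision and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sens, spec, prec = ratio(tp, tp + fn), ratio(tn, tn + fp), ratio(tp, tp + fp)
    return {"sensitivity": sens, "specificity": spec, "precision": prec,
            "f1": ratio(2 * prec * sens, prec + sens)}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stable_markers(candidates_per_repeat: Sequence[Iterable[str]],
                   min_count: int = 9) -> List[str]:
    """Keep genera selected as candidates in at least `min_count` repeats."""
    counts = Counter(g for rep in candidates_per_repeat for g in set(rep))
    return sorted(g for g, c in counts.items() if c >= min_count)


discovery, validation = split_discovery_validation(range(100), seed=1)
print(len(discovery), len(validation))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In this sketch `split_discovery_validation` would be called once per group and per repeat with a different seed, so that both compared groups contribute 70% of their samples to training in every repeat.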
Statistics. All statistical analyses were performed in R (version 3.4.1). The Wilcoxon rank-sum test was applied to the Shannon index and genus number between different obese groups using "wilcox.test" in R, and statistical differences among Group H, Group O and Group OA were examined using "kruskal.test" or "chisq.test" in R. Spearman correlation was used to evaluate the associations between GM and clinical indicators, and the relationships among different clinical indicators (using "cor" in R). P values from these tests were adjusted with the Benjamini-Hochberg method (FDR < 0.05) using "p.adjust", and the results were plotted using the package "ggplot2" in R.
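The Benjamini-Hochberg adjustment mentioned above can be illustrated with a short Python sketch. It follows the standard step-up procedure (each sorted P value is multiplied by n divided by its rank, then made monotone from the largest rank downwards and capped at 1), which is what the "BH" method of "p.adjust" computes. The function name and the example P values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted P values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(n, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values from four genus-indicator correlation tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]  # FDR < 0.05, as in the text
print(adj, significant)
```

Because the adjustment depends on the number of tests n, all P values from one family of tests (for example, every genus-indicator pair) should be passed in together.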

Data Availability
The DNA sequencing data are available in the NCBI Sequence Read Archive (SRA) under accession number SRP125854.