Diversity in gut bacterial community of school-age children in Asia

Asia differs substantially among and within its regions, which are populated by diverse ethnic groups that maintain their own cultures and dietary habits. To address the diversity in their gut microbiota, we characterized the bacterial community in fecal samples obtained from 303 school-age children living in urban or rural regions in five countries spanning temperate and tropical areas of Asia. The microbiota profiles of the 303 subjects were classified into two enterotype-like clusters, driven by Prevotella (P-type) and Bifidobacterium/Bacteroides (BB-type), respectively. The majority of children in China, Japan and Taiwan harbored the BB-type, whereas those from Indonesia and Khon Kaen in Thailand mainly harbored the P-type. The P-type microbiota was characterized by a more conserved bacterial community sharing a greater number of type-specific phylotypes. Predictive metagenomics suggests higher activity of carbohydrate digestion and lower activity of bile acid biosynthesis in P-type subjects, reflecting their high intake of diets rich in resistant starch. Random-forest analysis classified their fecal species community as mirroring the country of residence, suggesting that eco-geographical factors shape the gut microbiota. In particular, children living in Japan harbored a less diversified microbiota with a high abundance of Bifidobacterium and fewer potentially pathogenic bacteria, which may reflect their living environment and unique diet.


DNA/RNA extraction
Bacterial DNA was extracted from samples by the bead-beating method and purified as described previously (1), with some modifications. In brief, a freshly voided fecal sample was diluted 10-fold with RNAlater and homogenized. Then, 200 µl of the diluted fecal sample was mixed with 1 ml PBS and vortexed. After centrifugation at 20,000 × g for 5 min at 4 °C, the supernatant was removed and the pellet was washed twice with 1 ml of PBS buffer to remove PCR inhibitors. The supernatant was discarded and the pellet was stored at -30 °C until use.
Three hundred milligrams of glass beads (diameter, 0.1 mm) (TOMY SEIKO Co., Ltd., Tokyo, Japan), 300 μl of Tris-SDS solution and 500 μl of TE buffer-saturated phenol were added to a thawed sample, and the mixture was vortexed vigorously using a FastPrep FP120 (Bio 101) at a speed of 5.0 m/s for 30 s. Four hundred microliters of phenol/chloroform/isoamyl alcohol (25:24:1; v/v) was added to 400 μl of the supernatant, and the mixture was shaken vigorously using the FastPrep FP120 at a speed of 4.0 m/s for 45 s. After centrifugation at 20,000 × g for 5 min at 4 °C, 250 μl of the supernatant was mixed with 25 μl of 3 M sodium acetate (pH 5.2). After the mixture was kept on ice for 3 min, 300 μl of ice-cold 100% isopropanol was added and the mixture was centrifuged at 20,000 × g for 5 min at 4 °C. The DNA pellet was washed in 500 μl of ice-cold 70% ethanol, air-dried, suspended in 1 ml of TE buffer (pH 8.0), and stored at -30 °C until use.
RNA was extracted from the stool samples by the method described previously (2). The thawed sample was resuspended in a solution containing 346.5 µl RLT lysis buffer (catalog no. 79216; QIAGEN Sciences, Germantown, MD), 3.5 µl β-mercaptoethanol (Sigma-Aldrich Co., St. Louis, MO) and 100 µl Tris-EDTA buffer (pH 8.0). Then 300 mg of glass beads (diameter, 0.1 mm; TOMY SEIKO Co., Ltd.) was added to the suspension, and the mixture was vortexed vigorously for 60 s using a FastPrep FP120 (BIO 101) at a speed of 5.0 m/s. Then 500 µl acid phenol (Wako Pure Chemical Industries, Ltd.) was added, and the mixture was incubated at 60 °C for 10 min. After incubation, the mixture was cooled on ice for 5 min prior to the addition of 100 µl chloroform-isoamyl alcohol. After centrifugation at 12,000 × g for 5 min, 400 µl of the supernatant was collected and subjected to isopropanol precipitation. Finally, the nucleic acid fraction was suspended in 50 µl nuclease-free water (Ambion Inc., Austin, TX, USA). To remove contaminating genomic DNA from the RNA fraction, 0.5 U RNase-free DNase I (Takara Bio Inc., Shiga, Japan) per µg RNA was added to each sample in a solution containing 1 µl DNase I buffer (Takara Bio Inc.), followed by incubation at 37 °C for 20 min. After incubation, the DNase was inactivated and removed by two rounds of acid-phenol and chloroform-isoamyl alcohol extraction as described above, and the RNA in the resultant supernatant was collected by isopropanol precipitation. Finally, the RNA was suspended in 50 µl nuclease-free water.

qPCR and RT-qPCR
Quantitative PCR amplification (qPCR) and reverse transcription quantitative PCR (RT-qPCR) were performed in an ABI PRISM 7900HT Sequence Detection system (Applied Biosystems, Foster City, CA, USA). For qPCR amplification, a 10 µl reaction mixture was composed in 10 mM Tris-HCl (pH 8.  (Table S12) at a concentration of 0.6 µM, and 5 µl template RNA. The reaction mixture was incubated at 50 °C for 30 min for reverse transcription, followed by amplification consisting of one cycle at 95 °C for 15 min and 45 cycles at 94 °C for 20 s, 55 °C for 20 s, and 72 °C for 50 s. To distinguish the target PCR product from non-target PCR products, a melt curve was obtained by continuous fluorescence intensity measurement while the reaction mix was slowly heated from 60 to 95 °C in increments of 0.2 °C/s. Amplification and detection were carried out in 384-well optical plates with an ABI PRISM 7900HT Sequence Detection system (Applied Biosystems).
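As a rough illustration of how a melt curve separates a target amplicon from non-specific products, the sketch below locates the melting temperature as the peak of the negative derivative −dF/dT. The fluorescence readings are synthetic, the function name `melting_temperature` and the 82 °C melting point are hypothetical, and this is not the algorithm used by the instrument software.

```python
import math

def melting_temperature(temps, fluorescence):
    """Return the temperature at which -dF/dT is largest (central differences)."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        # Negative slope of fluorescence around reading i.
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic sigmoid melt profile from 60 to 95 °C with a hypothetical Tm of 82 °C.
temps = [60 + 0.5 * i for i in range(71)]
signal = [1.0 / (1.0 + math.exp(t - 82.0)) for t in temps]
tm = melting_temperature(temps, signal)
print(tm)
```

A product that melts at a clearly different temperature from the expected amplicon would show up as a separate peak in −dF/dT, which is the basis for flagging non-target amplification.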

454 pyrotag sequencing and data processing
The V6-V8 region of bacterial 16S rRNA gene was amplified by PCR with a bacterial universal primer set, Q-

Supplementary Note 1 Clustering of fecal bacteria communities of 303 Asian children.
Clustering of the 303 Asian samples was attempted at each taxonomic level. From the phylum to the species level, clustering was performed using the Jensen-Shannon divergence (JSD) and the partitioning around medoids (PAM) algorithm, as in the study that originally defined the enterotypes¹. For the phylotype level, the distance was calculated by weighted UniFrac using phylogenetic information². Significant clustering was not observed from the phylum to the order level, whereas consistent clustering emerged with high significance from the family to the phylotype level (Table S9). At these taxonomic levels, the optimal number of clusters was suggested to be two by maximizing the Calinski-Harabasz (CH) index (inset in Fig. S2), and this was confirmed by two further indices, prediction strength (PS) and average silhouette width (SW) (Fig. S3a). With the number of clusters set to two, PS and SW were superior to those of 200 simulated datasets randomized according to Gaussian deviates based on the experimental data (Fig. S3a), suggesting significant clustering that represents two distinct bacterial communities occurring in the gut microbiota of Asian children. The Jaccard coefficient remained high over 1,000 bootstrap resamplings, suggesting that the two clusters are stable (Fig. S3b).
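The clustering procedure described above can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: it assumes the square root of the JSD (log base 2) as the distance, replaces the PAM heuristic with an exhaustive search over medoid pairs (which minimizes the same objective for two clusters), and uses six hypothetical genus-level profiles. All function names are illustrative.

```python
import math
from itertools import combinations

def jsd_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2) between two profiles."""
    def kld(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kld(p, m) + 0.5 * kld(q, m)))

def two_medoid_clustering(dist):
    """Pick the medoid pair minimizing total distance to the nearest medoid."""
    n = len(dist)
    best = None
    for a, b in combinations(range(n), 2):
        cost = sum(min(dist[i][a], dist[i][b]) for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    _, a, b = best
    return [0 if dist[i][a] <= dist[i][b] else 1 for i in range(n)]

def average_silhouette(dist, labels):
    """Mean silhouette width; samples in singleton clusters contribute 0."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        other = [dist[i][j] for j in range(n) if labels[j] != labels[i]]
        if not own or not other:
            continue
        a, b = sum(own) / len(own), sum(other) / len(other)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Hypothetical relative abundances: [Prevotella, Bacteroides, Bifidobacterium, other].
profiles = [
    [0.60, 0.05, 0.05, 0.30], [0.55, 0.10, 0.05, 0.30], [0.50, 0.05, 0.10, 0.35],
    [0.02, 0.40, 0.30, 0.28], [0.05, 0.35, 0.30, 0.30], [0.01, 0.30, 0.40, 0.29],
]
dist = [[jsd_distance(p, q) for q in profiles] for p in profiles]
labels = two_medoid_clustering(dist)
silhouette = average_silhouette(dist, labels)
print(labels, round(silhouette, 2))
```

The exhaustive search is feasible at this scale (about 46,000 medoid pairs for 303 samples), but for more than two clusters a real PAM implementation would be the better choice.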
The clusterings on family and genus level are displayed on the PCA plots ( Fig. S2a  in the genus level, likely due to the complexity of genus-level taxonomy in these families. The clustering based on the weighted UniFrac distance (Fig. 2c) showed a profile similar to those at the family and genus levels; 96% and 91% of samples were consistently classified, respectively. This clustering remained robust even when all samples from any one city were removed from the dataset used for clustering (Fig. S3c), suggesting that it does not depend on local variation specific to a certain city but reflects the global distribution of these two microbiota types in Asian children. The distribution of P- and BB-type children in each city is nearly consistent among the family, genus, and phylotype levels (right panels of Fig. S2).
Table S4 to S8)

China
Among BB-type countries, the gut microbiota of children living in Beijing and Lanzhou is characterized by a high abundance of cluster II species, including two Dorea species, as indicated by the heat map in Fig. 5b. This feature is shared with P-type cities. Another noticeable common feature of these two Chinese cities is the low abundance of the genus Fusobacterium, notably the species Fusobacterium mortiferum. Although the majority of subjects in Beijing and Lanzhou harbored the BB-type microbiota, there are significant differences between their microbiota. The Bifidobacterium population represents the major difference, with average abundances of 20.0% (10^10.25 cells/g feces) in Lanzhou and 11.7% (10^9.89 cells/g feces) in Beijing (P = 0.0043 for cell counts). In contrast, Lachnospiraceae and Ruminococcaceae are more abundant in children in Beijing than in those in Lanzhou. These differences may be due to the unique diet of residents of Lanzhou, where dough noodles are the main source of dietary carbohydrate.
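The log-scale cell counts quoted above can be turned into a fold difference with a short calculation. The sketch below uses only the two means reported in the text; if those means were computed on log-transformed counts, the result is a ratio of geometric means rather than arithmetic means, which is an assumption here. Variable names are illustrative.

```python
# Mean Bifidobacterium counts reported in the text, in log10(cells/g feces).
log10_lanzhou = 10.25
log10_beijing = 9.89

# A difference of d log10 units corresponds to a 10**d-fold difference in cells.
fold_difference = 10 ** (log10_lanzhou - log10_beijing)
print(round(fold_difference, 2))  # about 2.29
```

Under that assumption, a gap of 0.36 log10 units corresponds to roughly 2.3 times as many Bifidobacterium cells per gram of feces in Lanzhou as in Beijing.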

Japan
Children in Japan mostly harbored the BB-type microbiota. Among BB-type countries, their microbiota is particularly characteristic in its high abundance of Bifidobacterium (20.3%) and relatively low abundance of Bacteroides (12.4%), similar to that of the children from Lanzhou. Further, their microbiota has a number of distinct features, e.g., a high abundance of the families Peptostreptococcaceae and Bacillaceae and the genera Veillonella and Eggerthella, and a low abundance of the family Enterobacteriaceae and the genera Phascolarctobacterium, Slackia, and Desulfovibrio. As shown in Fig. 5b, Dialister invisus is particularly frequent in children in Japan (67%) compared with other children (18%). Dialister invisus is known to be associated with dysbiosis of the faecal microbiota in patients with Crohn's disease. It is also interesting that Bifidobacterium animalis was detected in 10 out of 83 tested subjects in Japan, whereas in the other countries it was detected in only two subjects, both from China. E. coli (0.10% versus 0.61% for total average) and Clostridium perfringens (0.0029% versus 0.088% for total average) were notably less abundant in the children in Japan than in those in the other countries. Lactococcus garvieae, which is known as a fish pathogen, was not detected in the children from Japan, whereas it was detected in 20% of children from the other countries. The qPCR data also indicate a significantly lower level (p < 0.01) and prevalence (p < 0.05) of C. perfringens and Enterobacteriaceae in Japan (Table S8).
It is also evident from Fig. 6a that the gut microbiota of Japanese children is remarkably less diversified than that of children in the other countries. As shown in Fig. 6b, the children from the two Japanese cities possess significantly similar bacterial communities. Only minor differences were observed in Methylobacterium

Supplementary Figure S1. Phylogenetic tree of dominant and subdominant phylotypes (overall mean abundance >0.01%) found in stool samples of 303 Asian children. Bubble size and color represent the overall abundance in the total population and the bacterial family of each phylotype, respectively. Dominant phylotypes are labeled with their OTU ID (see Supplementary Table S2 and Table S3).

Stability of each simulated cluster was assessed by bootstrap resampling (1,000 times) using the "clusterboot" command of the fpc R package and displayed as box plots showing the smallest and highest values, the 25% and 75% percentiles, and the median. (c) Dependency of the type clustering on each city was examined with the adjusted Rand index, comparing the clustering obtained after removing the whole dataset of one city with that obtained from the complete dataset. The adjusted Rand index was calculated using the cluster.stats function of the fpc R package.
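To make the adjusted Rand index used for the leave-one-city-out comparison concrete, here is a minimal pure-Python sketch. The authors used the fpc R package; the function below is a stand-in written for illustration, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same (>= 2) samples")
    n = len(labels_a)
    # Pairs of samples that fall together in both labelings, in A only, in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)

# Hypothetical type assignments for the samples remaining after one city is removed.
full_dataset = ["P", "P", "P", "BB", "BB", "BB"]
city_removed = ["BB", "BB", "BB", "P", "P", "P"]  # same partition, swapped names
ari_same = adjusted_rand_index(full_dataset, city_removed)
ari_unrelated = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ari_same, ari_unrelated)
```

The index depends only on which samples are grouped together, not on cluster names, so a value near 1 after dropping a city indicates that the two-type partition of the remaining samples is unchanged.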

Supplementary Figure S4. Phylogenetic tree of 250 common phylotypes. Phylotypes observed in >50% of subjects in each type group were chosen (see details in Supplementary Table S2) and their sequences were subjected to phylogenetic analysis. The median of their abundances in the P- and BB-type groups is represented by the bar chart outside the tree. Phylotypes showing a P value < 10^-4 in the chi-square test between the P- and BB-type groups have their ID numbers colored and are listed in the right-side table with their closest species name. The colors used for the tree branches represent bacterial families (see Fig. 1b).

Fig. S1 and Fig. S4. The coloured circle represents the bacterial family (see the index in Supplementary Fig. S1).
b Closest species were searched by RDP seqmatch followed by Seqmatch Q400 algorithm. The values in parentheses represent sequence identity to the 16S rRNA of the indicated species in the RDP-II database.
c The number of carriers is indicated by the green bar graph.
d Mean abundance (%) by country is shown and coloured in red according to the abundance.
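The chi-square screen mentioned in the Figure S4 caption can be illustrated with a 2 × 2 test. The sketch below assumes the test compares the number of carriers of a phylotype in the P- and BB-type groups, which the caption does not state explicitly; the counts and group sizes are hypothetical, and no continuity correction is applied.

```python
import math

def chi_square_2x2(carriers_p, n_p, carriers_bb, n_bb):
    """Pearson chi-square statistic and P value (1 degree of freedom) for carriage."""
    table = [[carriers_p, n_p - carriers_p], [carriers_bb, n_bb - carriers_bb]]
    total = n_p + n_bb
    rows = [n_p, n_bb]
    cols = [carriers_p + carriers_bb, total - carriers_p - carriers_bb]
    if min(rows + cols) <= 0:
        raise ValueError("every row and column total must be positive")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical phylotype carried by 90 of 100 P-type and 40 of 100 BB-type children.
chi2, p_value = chi_square_2x2(90, 100, 40, 100)
print(round(chi2, 2), p_value < 1e-4)
```

With these made-up counts the phylotype would pass the 10^-4 threshold and be flagged as type-specific.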