Bacterial community in naturally fermented milk products of Arunachal Pradesh and Sikkim of India analysed by high-throughput amplicon sequencing

Naturally fermented milk (NFM) products are popular ethnic fermented foods in Arunachal Pradesh and Sikkim states of India. The present study is the first to have documented the bacterial community in 54 samples of NFM products viz. chhurpi, churkam, dahi and gheu/mar by high-throughput Illumina amplicon sequencing. Metagenomic investigation showed that Firmicutes (Streptococcaceae, Lactobacillaceae) and Proteobacteria (Acetobacteraceae) were the two predominant members of the bacterial communities in these products. Lactococcus lactis and Lactobacillus helveticus were the predominant lactic acid bacteria while Acetobacter spp. and Gluconobacter spp. were the predominant acetic acid bacteria present in these products.

is produced, and is consumed as curry/soup in meals; and churkam (hard-variety of chhurpi) is the product of dehydrated chhurpi, which is used as masticatory as chewing gum in high altitudes. Lactic acid bacteria were predominant with the load of 10 8 cfu/g in the Himalayan fermented milk products 17 . Lactobacillus bifermentans, Lb. alimentarius, Lb. paracasei subsp. pseudoplantarum, Lactococcus (Lc.) lactis subsp. lactis, Lc. lactis subsp. cremoris; Lb. plantarum, Lb. curvatus, Lb. fermentum, Lb. kefir, Lb. hilgardii, Enterococcus faecium and Leuconostoc mesenteroides were reported from dahi and chhurpi of Sikkim based on phenotypic, biochemical characterization and mol (%) content of G+C of DNA 14,17 . However, no study has been conducted yet on churkam and gheu/mar.
As it is well known that the cultivability of microbiota is still a limiting factor in understanding the natural food fermentation 23,24 , application of high throughput metagenomic techniques like Illumina amplicon sequencing may serve to give more insight into microbial ecology of natural food fermentation. Metagenomic studies of various fermented milk products like kefir, buttermilk, cheeses etc have shown a realistic view of the microbial community structure involved in the natural milk fermentation 21,[24][25][26][27][28] . In this study we aimed to anlayse the bacterial community structure of fifty-four samples of naturally fermented milk products (chhurpi, churkam, dahi and gheu/mar) of Arunachal Pradesh and Sikkim by Illumina amplicon sequencing. This is the first report on bacterial community in NFM products of the Himalayas using in-depth metagenomic analysis.

Results
Overall microbial community structure. The bacterial composition of the different naturally fermented milk products (chhurpi, churkam, dahi and gheu/mar) was compared at different taxonomic levels ( Fig. 2a-c). The bacterial phyla present in four types of NFM products were Firmicutes and Proteobacteria, respectively (data not shown). Phylum Firmicutes was represented by six families belonging to Streptococcaceae (24.2%), Lactobacillaceae (16.8%), Leuconostocaceae (8.0%), Staphylococcaceae (6.8%), Bacillaceae (1.6%), and Clostridiaceae (1.3%); and phylum Proteobacteria included Acetobacteraceae (26.8%), Pseudomonadaceae (3.3%) and Enterobacteriaceae (1.2%) (Fig. 1a). The overall bacterial diversity of these NFM products were predominated by species belonging to the lactic acid bacteria: Lactococcus lactis (19.7%) and Lactobacillus helveticus (9.6%) and Leuconostoc mesenteroides (4.5%) (Fig. 2b,c). Additionally, species belonging to the acetic acid bacteria: Acetobacter lovaniensis (5.8%), Acetobacter pasteurianus (5.7%), Gluconobacter oxydans (5.3%), and Acetobacter syzygii (4.8%) were also observed (Fig. 2b,c). The percentage of Enterobacteriaceae was 1.2% (Fig. 2a), whereas the percentage of genus Enterococcus was below 0.5% (data not shown), hence it was not shown at the genus level (Fig. 2b). Percentage of Streptococcus thermophilus was below 0.1% (data not shown). The percentage of unclassified bacteria at the taxonomical levels was 7.9% ( Fig. 2a-c). Presence of uncultured bacterium was shown in all samples (Fig. 2c). Multivariate analysis. PCA using species-level OTUs data showed significant differences among the NFM products studied (Fig. 3). The NFM products collected from two regions (Arunachal Pradesh and Sikkim) showed significant difference in the bacterial community structure (ANOSIM, p = 0.005, R = 0.16), but however, there was no significant difference between the same products prepared from different sources of milk (cow or yak). This reflects the regional contribution to the bacterial diversity of these products with respect to their location of preparation, but not from the milk source whereby these products are being prepared.
Alpha diversities. Alpha diversities were compared on the basis of states (Sikkim and Arunachal Pradesh)/ places of collection of samples, animal's milk source (cow/yak) and product types (Table 1). There was no significant difference between the states/regions and animal's milk source, respectively. However, significance difference (p = 0.0125) was observed in terms of product types i.e., chhurpi and churkam in Chao1 species richness (Fig. 4). Chhurpi and churkam are two final products of milk fermentation where the latter is produced through a process of dehydration of the former and is usually kept for a longer fermentation. Multivariate analysis of species level OTUs showed a significant difference (ANOSIM p = 0.002, R = 0.16) between the two products. However, there is no significant difference among the general fermenting bacteria. Also, we observed a significant difference in Clostridiaceae (p = 0.0004) and Pseudomonadaceae (p = 0.013) between these two food types (Fig. 5).

Discussion
In this study, bacterial diversity was explored by barcoded Illumina MiSeq amplicon sequencing of the 16 S rRNA gene (V4-V5 region). The applied method using high throughput sequencing detected Lactococcus lactis, Lb. helveticus, Acetobacter lovaniensis, A. pasteurianus, A. syzygii, Gluconobacter oxydans and Leuconoctoc mesenteroides (above 1%) in all 4 samples of NFM products. Reads of OTUs in present study could not detect Lb. farciminis, Lb. biofermentans, Lb. hilgardi, Lb. paracasei subsp. pseudoplantarum, Lb. hilgardii, Lb. paracasei subsp. paracasei which were reported earlier in chhurpi and dahi based on limited phenotypic characterization 14,17 . However, Lb. helveticus (9.6%) was detected in the present culture-independent method which was not reported in culture dependent method earlier. Lb. helveticus is known to be present in dairy products 29 . A major composition of Lactococcus lactis (Streptococcaceae) and Lb. helveticus (Lactobacillaceae) was found to be the most predominant species along with Leuc. mesenteroides (Leuconostocaceae) in the NFM products of India, which still form what are commonly known as the primary cultures in milk fermentation 1 . Metagenomics-based studies of other milk products around the world like kefir, cheeses, have also reported to harbour species of Lactobacillus, Lactococcus and Leuconostoc 25,26,30,31 as the dominant bacteria in general. Apart from the common known lactic acid bacteria group, a relatively high abundance of Proteobacteria-associated Acetobacteraceae (acetic acid bacteria) was observed in gheu/mar products. Acetobacteraceae members have also been reported in milk-related products 19,25,32,33 , and their dominance in gheu/mar (churned before heating) products than the subsequent downstream products (chhurpi and churkam) may be due to the effect of heating during the processing steps. Even though the Acetobacteraceae members were still present in chhurpi and churkam, the abundance was generally low. During the fermentation of chhurpi and churkam, we observed an increase in the abundance of Streptococcaceae (Lactococcus) and subsequently a build-up in the Lactobacillaceae (Lactobacillus) population in churkam.  Based on OTUs system, the percentage of Enterobacteriaceae and genus Enterococcus was very low in NFM samples analyzed. Enterococcus faecalis, Ent. faecium along with Lactococcus lactis subsp. lactis were reported from dahi of Bhutan based on 16 S rRNA gene sequencing 6 . Nunu, African NFM product, is frequently contaminated with pathogenic Enterobacteriaceae, demonstrated by short-read-alignment-based bioinformatics tools which may be used for high-throughput food safety testing 34 . Staphylococcaceae, Bacillaceae, Clostridiaceae and Pseudomonadaceae were observed at relatively low level in this study probably as contaminants. Pseudomonadaceae (Pseudomonas fluorescens) is usually present in milk and milk products as sources of contaminants 35 and Clostridiaceae (Clostridium tyrobutyricum) is another bacterium found in cheese causing late blowing defect 36 . These contaminants were probably associated with the overall handling process, since samples are naturally fermented milk products, and there is no controlled process involved. Contamination of unwanted or rather non-fermenting bacteria are known to have acquired from various sources of production environment 37,38 . Presence of uncultured bacterium was shown in all samples analyzed. Uncultured bacterium group at species level were obtained using OTUs method, as the database could not assign them to any of their closest taxa. OTUs system put sequences into bins based on similarity of sequences within a data set to each other 39 . Moreover, limitations to using OTUs-based method is that the clustering algorithms are computationally intensive, relatively slow, and require significant amounts of memory 40 .
However, the predominance of few species were observed in a particular product showing the remarkable diversity of microbiota among 4 analyzed samples of NFM products and subsequently a build-up in the Lactobacillaceae (Lactobacillus) population in churkam. Lactococcus lactis was predominant in chhurpi, dahi and churkam, whereas in gheu/mar samples, it was relatively less. Lb. helveticus was dominant in churkam comparable to other 3 NFM products. However, Leuc. mesenteroides was predominant in dahi samples. Though we observed a fairly equal distribution between Lactococcus and Acetobacter species in 4 NFM products, however, at species level Lactococcus was represented only by Lc. lactis whereas Acetobacter was represented by A. lovaniensis, A. pasteurianus, A. syzygii and Gluconobacter oxydans. Diversity in bacterial species among the 4 NFM products was observed based on alpha diversity analysis. However, significance difference was observed only in between chhurpi and dahi (p = 0.0152) and chhurpi and churkam (p = 0.0125), respectively.

Conclusion
Earlier reports on chhurpi and dahi of North East India was based on limited culture-dependent analysis with some species of lactic acid bacteria. However, in the present study the NGS data of chhurpi, churkam, dahi and gheu showed the abundance of Lactococcus lactis (Streptococcaceae), Lb. helveticus (Lactobacillaceae) with Leuc. mesenteroides (Leuconostocaceae) as one the main bacterial species which may be the reliable information on microbial profile of NFM products. The application of NGS culture-independent methods to study the microbial ecology of fermented foods is of great significance in understanding the products, where Illumina sequencing has been shown to be one of the reliable tools in this study. Further studies on selective culturing of dominant bacteria, development of probiotic starter cultures and standardisation of processing methods may lead to industrialisation of ethnic food products.

Materials and Methods
Sampling. Fifty-four samples of naturally fermented milk products (chhurpi, churkam dahi and gheu/mar) were collected from high altitude mountains (1650-2587 meter) in Arunachal Pradesh (n = 35) and hills and mountains (381-4878 meter) in Sikkim (n = 19) of India ( Table 2). The products were aseptically collected from the traditional production centres, transported in an ice-box and stored in the laboratory at −20 °C.
Metagenomic DNA extraction. Metagenomic DNA was extracted by two different methods based on the nature of the samples i.e., lipid-rich sample (gheu/mar) and casein-based samples (dahi, chhurpi and churkam). For the gheu/mar (lipid-rich) samples, extraction of DNA was performed as per method I as described in 48 with some modifications. This method was chosen on the basis of the product being rich in its fatty content. The usage of a combination of petroleum ether:hexane (1:1) serves the purpose of dissolving the fat content resolving the product into two phases after rigorous vortexing. Briefly, 2 mL of the sample melted in low temperature was homogenized with 2 ml citrate buffer (2%). To this, 4 ml of petroleum ether: hexane (1:1) was added followed by vortexing and 10 min incubation at room temperature. 2 mL of the lower part of the homogenate was transferred to a sterile 2 ml screw-cap tube containing 0.5 g of zirconia/silica beads (0.1 mm) and 4 glass beads (2 mm). The tubes were centrifuged and the pellet resuspended in 150 µl proteinase-K buffer [50 mM Tris-Cl, 10 mM EDTA (pH 8), 0.5% (w/v) SDS]. After overnight incubation at 65 °C with 25 µl proteinase K (25 mg/ml), it was treated with 150 µl of 2X breaking buffer [4% Triton X-100 (v/v), 2% (w/v) SDS, 200 mM NaCl, 20 mM Tris (pH 8), 2 mM EDTA (pH 8)]. After addition of phenol (pH 8.0), the samples were treated in a bead beater three times (30 sec beating, 10 sec in ice) and further purified with chloroform: isoamyl alcohol mixture (24:1). Lastly, DNA was precipitated with ethanol and the pellet is dissolved in 50 µl of TE buffer (10 mM Tris, 1 mM EDTA).
For the casein-based samples (dahi, chhurpi and churkam), metagenomic DNA was extracted using the method of Keisam et al. 41 . This method was shown to recover maximum DNA yield from fermented milks 41 , hence it was also applied in this study. Briefly, 10 g or 10 ml of the samples were mixed with 90 mL 2% sodium citrate buffer and homogenized in a stomacher at 200 rpm for 2 min. Churkam (hard-cheese) samples were first grinded into powder before the homogenization. 1.5 mL of the homogenate was transferred to a sterile centrifuge  The raw sequence reads obtained was analysed using the default settings in MG-RAST 43 and an open-source bioinformatics pipeline QIIME v1.8.0 44 . A total of 7,614,683 post-quality filtered sequences originating from 54 samples belonging to 4 food types of NFM samples were uploaded to MG-RAST server with the MG-RAST ID number 4732361 to 4732414. The reads were subjected to secondary quality filtering to remove non-rRNA sequences before clustering into operational taxonomic units (OTUs) and subsequent generation of OTU tables at four different taxonomic levels (phylum, family, genus and species) using the SILVA SSU database in MG-RAST. Eukaryota-specific and unassigned OTUs were removed before performing further analysis.
Statistical Analysis. Normalisation of the OTUs relative abundance data was performed by log transformation log 10 (x i + 1). To understand the variation in the microbial community structure of different food types, PCA was plotted using Canoco software v4.52 (Wageningen University, The Netherlands). Significant difference in the bacterial community structure amongst the four food type was evaluated by ANOSIM with 10,000 permutations using Bray-Curtis similarity index in PAST v2.17. Any significant difference in the abundance of individual taxa at four different taxonomic levels between the four food types was tested by p-value calculation using Student's twov-tailed paired t-test and ANOVA. p-value < 0.05 was considered statistically significant and the differences in taxon abundance were represented as boxplots using BoxPlotR 45,46 . Species level-OTUs table was rarefied at a depth of 50 to 6482 sequences using the multiple_rarefactions.py script in QIIME for generation of alpha diversities rarefaction curves. Rarefaction plots were generated for Chao1 richness, diversity indices (Fisher alpha, Shannon), Shannon's equitability and Good's coverage using the make_rarefaction_plots.py script 44 . Significant differences in the alpha indices amongst the food types were calculated using the script compare_alpha_diversity. py in QIIME.
Data availability. Sequence data associated with this present work have been uploaded to MG-RAST server with the MG-RAST ID number 4732361 to 4732414.