Introduction

Gastric cancer (GC) is the fifth most common cancer worldwide and one of the main causes of cancer-related deaths1. The incidence of GC in eastern Asia, including Korea, is the highest worldwide, more than four times the rates in Western Europe2. The age-adjusted incidence rate of GC in Korea has been reported to be 33.8 per 100,000 (ref. 3). According to projections of cancer incidence and mortality in Korea, GC is the second most common cancer among Koreans and accounts for a substantial proportion of the overall cancer burden4. Interest in the role of the human microbiome in disease, particularly in gastrointestinal cancers, has surged in recent years5. Different body sites harbor diverse bacterial communities, colloquially referred to as normal flora. These microbiota can help maintain human health through their interactions with the host, but they can also contribute to the development of certain diseases5.

Because of its complex and dynamic nature, the human gastrointestinal microbiota has recently come to be regarded as a metabolically active organ that regulates gastrointestinal homeostasis through its interactions with immune cells6. The normal flora of the gastrointestinal tract influences several processes, including the host mucosal immune response, energy metabolism, pathogen elimination, and cancer development7. Human gut bacteria are widely thought to play a crucial role in the etiology of gastrointestinal cancers, particularly GC, through dysbiosis8. Dysbiosis is an imbalance in the gastrointestinal microbiota that can lead to several pathological conditions, including GC. Furthermore, gastric microbial community profiling has revealed that dysbiosis of the gut microbiota is associated with GC or precancerous lesions9.

The relationship between particular microbial pathogens and carcinogenesis has been explored in the context of systems epidemiology. Many studies have focused on individual pathogens, such as Helicobacter pylori (H. pylori), and their ability to initiate conditions such as gastritis or GC10. Of the numerous risk factors for GC, H. pylori infection plays a pivotal role11. H. pylori infection is widespread in East Asia and is associated with GC development12. Thus, reducing chronic H. pylori infection has been widely proposed as a strategy for preventing GC. Notably, the expansion of advanced sequencing technologies has made it possible to identify specific microbial species associated with various diseases, particularly cancers of the gastrointestinal tract. It has also been suggested that colonization of the stomach by non-H. pylori bacteria can increase GC risk13. Thus, understanding how dysbiosis influences host metabolic and inflammatory responses is critical to defining the roles that specific components of the microbiota play in carcinogenesis.

However, data on the association between the gastric microbiota and GC remain scarce. Because the composition of the microbiota is well documented to shape immune responses at local and systemic levels, as well as inflammatory signaling related to GC development, it is necessary to investigate gastric microbiota-associated alterations that may influence GC development. Moreover, because various experimental approaches have already been employed to elucidate the microbiota profiles of GC patients14, evidence on this association from epidemiological studies is needed. Therefore, in the present case-control study, we aimed to investigate the association between the relative abundance of gastric microbiota components and the risk of GC in a Korean population.

Results

General characteristics of the study population

Table 1 presents the general characteristics of the study participants with and without GC. The proportion of current smokers was higher in the patient group (29.1%) than in the control group (17.7%), and the patients were more likely to have a family history of GC (p = 0.003). Compared with the controls, the patients were less likely to exercise regularly (p < 0.001), were less educated (p < 0.001), and had lower employment rates (p = 0.037) and lower monthly incomes (p < 0.001). The prevalence of H. pylori infection was higher among the patients (99.6%) than among the controls (93.4%). The patients had a higher energy intake than the controls (p < 0.001). The Shannon index, a measure of alpha diversity, was significantly higher in the controls than in the patients (p < 0.001).
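The Shannon index summarizes alpha diversity as H′ = −Σ pᵢ ln pᵢ, where pᵢ is the proportion of a sample's reads assigned to taxon i. The minimal Python sketch below illustrates the formula. The logarithm base is not stated in the text, so base e is an assumption here; other tools may default to base 2, which divides every value by the same constant (ln 2) and leaves group comparisons unchanged. The two samples are hypothetical.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float], base: float = math.e) -> float:
    """Shannon diversity H' = -sum(p_i * log(p_i)) over taxa with nonzero counts.

    The log base is an assumption (natural log by default); the study text
    does not state which base its software used.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample contains no reads")
    h = 0.0
    for c in counts:
        if c > 0:                       # zero-count taxa contribute nothing
            p = c / total
            h -= p * math.log(p, base)
    return h


# Hypothetical read counts for four taxa in two samples of equal depth.
even = [25, 25, 25, 25]          # evenly distributed community
dominated = [97, 1, 1, 1]        # one taxon dominates the reads
print(round(shannon_index(even), 3))                     # 1.386, i.e. ln(4)
print(shannon_index(dominated) < shannon_index(even))    # True
```

A community dominated by a single taxon has a lower index than an even community with the same number of taxa.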

Table 1 General characteristics of the study population.

Comparison of relative abundance and diversity indices

Table 2 compares the mean relative abundances between the patients with GC and the controls. At the family level, the patients had higher relative abundances of Helicobacteraceae, Propionibacteriaceae, and Prevotellaceae than the healthy subjects (p adjusted for false discovery rate (FDR) = 0.025, <0.001, and 0.013, respectively). At the genus level, the relative abundances of Helicobacter, Propionibacterium, and Prevotella were higher in the patients than in the controls (p adjusted for FDR = 0.025, <0.001, and 0.013, respectively), whereas the relative abundance of Lactococcus was higher in the controls than in the patients (p adjusted for FDR < 0.001). At the species level, the patients had higher relative abundances of H. pylori, Propionibacterium acnes (P. acnes), and Prevotella copri (P. copri) than the controls (p adjusted for FDR = 0.035, 0.002, and 0.049, respectively), whereas the relative abundance of Lactococcus lactis (L. lactis) was higher in the healthy controls than in the patients (p adjusted for FDR = 0.003).
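The text reports FDR-adjusted p values without naming the adjustment procedure. The Benjamini-Hochberg step-up method is a common choice (it is what R's `p.adjust(method = "BH")` computes), so the sketch below assumes it; the input p values are hypothetical.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0                                    # also caps values at 1
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four taxa.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))   # approx. [0.02, 0.04, 0.04, 0.02]
```

Each p value is scaled by m/rank, and the running minimum from the largest p value downward keeps a smaller raw p value from receiving a larger adjusted value.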

Table 2 Mean relative abundance of selected family, genus, and species between the GC cases and controls.

Figure 1A presents a box plot of the Shannon index for the entire study population, in which the Shannon index was significantly higher in the controls than in the patients (p < 0.001). Figure 1B shows box plots of the Shannon index by sex; among both males and females, the Shannon index was significantly higher in the controls than in the patients (p = 0.013 and p < 0.001, respectively). Figure 2 presents a principal coordinates analysis (PCoA) plot of the Bray-Curtis distance based on the operational taxonomic unit (OTU) relative abundance table at the species level. The blue triangles represent the controls, and the red dots represent the patients. The blue and red ellipses indicate the regions expected to contain 95% of the data for the controls and patients, respectively. The 2-D plot of the first two principal coordinates shows a marked divergence between the GC patients and the healthy controls, and these two coordinates together explained 26.4% of the total variation. According to the Bray-Curtis dissimilarity measure, the microbiota composition of the patients with GC differed significantly from that of the healthy controls (analysis of similarity (ANOSIM) R = 0.006, p = 0.030).
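Bray-Curtis dissimilarity and the ANOSIM R statistic can be sketched in pure Python as below. The Methods state that the authors used the R package vegan; this is an independent illustration of the same statistics, not their code. ANOSIM ranks all pairwise dissimilarities and computes R = (mean between-group rank − mean within-group rank) / (M/2), where M is the number of sample pairs; R ranges from −1 to 1, with values near 0 indicating similar rank distributions within and between groups. Significance comes from permuting the group labels. The permutation count, seed, and toy abundance vectors are assumptions.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    den = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / den if den > 0 else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim(samples: Sequence[Sequence[float]], labels: Sequence[str],
           permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (R, permutation p value) for a one-way ANOSIM on Bray-Curtis distances."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[a], samples[b]) for a, b in pairs])
    half_m = len(pairs) / 2

    def r_stat(labs: Sequence[str]) -> float:
        within = [r for (a, b), r in zip(pairs, ranks) if labs[a] == labs[b]]
        between = [r for (a, b), r in zip(pairs, ranks) if labs[a] != labs[b]]
        return (sum(between) / len(between) - sum(within) / len(within)) / half_m

    observed = r_stat(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)            # permute labels; distances stay fixed
        if r_stat(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical species-level abundance profiles (three taxa per sample).
controls = [[10, 0, 0], [9, 1, 0], [10, 1, 0]]
cases = [[0, 0, 10], [0, 1, 9], [1, 0, 10]]
r, p = anosim(controls + cases, ["control"] * 3 + ["case"] * 3)
print(r, p)
```

In this toy example every between-group dissimilarity exceeds every within-group one, so R is 1; real data with heavily overlapping groups give R close to 0.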

Figure 1

Box plot of the Shannon index. (A) Box plot of the Shannon index for the cases and controls (p < 0.001). (B) Box plot of the Shannon index for cases and controls by sex (p = 0.013 for male and p < 0.001 for female).

Figure 2

Principal coordinate analysis (PCoA) plot of the Bray-Curtis distance. The blue triangles indicate the controls, and the red dots indicate the cases. The blue and red ellipses indicate the regions expected to contain 95% of the data for the controls and cases, respectively.

Association of candidate species with GC risk and evaluation of their abundance for GC diagnosis

Table 3 shows the associations between the relative abundances of the candidate bacterial species and GC risk. In model II, subjects with a higher relative abundance of H. pylori had a significantly higher risk of GC than subjects with a lower relative abundance (odds ratio (OR) = 1.86, 95% confidence interval (CI) = 1.17–2.97, p-trend = 0.017). Females with a higher relative abundance of H. pylori had a significantly higher risk of GC than females with a lower relative abundance (OR = 3.36, 95% CI = 1.41–8.00, p-trend = 0.008). A positive association between the relative abundance of H. pylori and GC was also observed in males, although it was not statistically significant.

Table 3 Associations between the relative bacterial species abundances and GC risk.

Compared with those in the lowest relative abundance group, those with a higher relative abundance of P. acnes had a significantly higher risk of GC in the entire study population (OR = 4.48, 95% CI = 2.79–7.21, p-trend ≤ 0.001), in males (OR = 3.03, 95% CI = 1.62–5.66, p-trend ≤ 0.001), and in females (OR = 8.19, 95% CI = 3.41–19.65, p-trend ≤ 0.001). Carriers of P. copri had a significantly higher risk of GC than noncarriers in the entire population (OR = 2.39, 95% CI = 1.36–4.19) and in males (OR = 2.78, 95% CI = 1.32–5.85). Carriers of L. lactis had a significantly lower risk of GC than noncarriers in the entire population (OR = 0.19, 95% CI = 0.10–0.39), in males (OR = 0.22, 95% CI = 0.08–0.57), and in females (OR = 0.18, 95% CI = 0.05–0.60) (Table 3).
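For a binary exposure such as carrier versus noncarrier, the crude (model I) OR from an unconditional logistic regression equals the cross-product ratio of the 2×2 table, and its Wald interval on the log scale is the Woolf interval. The sketch below shows that arithmetic. The cell counts are hypothetical, not the study's data, and the covariate-adjusted model II estimates require a multivariable logistic regression, which is not reproduced here.

```python
import math
from typing import Tuple


def crude_odds_ratio(a: int, b: int, c: int, d: int,
                     z: float = 1.959964) -> Tuple[float, float, float]:
    """Crude OR with a Woolf (log-scale Wald) 95% CI.

    2x2 layout:          cases   controls
        carriers           a        b
        noncarriers        c        d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction or exact methods")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers are rarer among cases than controls, so OR < 1.
or_, lo, hi = crude_odds_ratio(a=20, b=40, c=80, d=40)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

The interval is symmetric on the log scale, which is why reported CIs such as 0.10–0.39 around an OR of 0.19 look lopsided on the natural scale.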

Principal component analysis (PCA) of the four candidate species yielded two linear combinations (principal components) with eigenvalues greater than 1.00. Based on the principal component loadings, H. pylori and P. acnes were the dominant species in components 1 and 2, respectively. Together, the first two principal components explained approximately 63.0% of the variation. In the linear discriminant analysis, the coefficients of the linear discriminant for PC1 (H. pylori dominant) and PC2 (P. acnes dominant) were −0.07 and 0.97, respectively. We then used receiver operating characteristic (ROC) curve and area under the curve (AUC) analyses to evaluate how well these four bacterial species, combined, discriminate patients from controls. The model based on the two linear combinations of the four species showed 79.7% sensitivity and 67.1% specificity. The AUC was 77.7%, indicating a 77.7% probability that a randomly selected patient is ranked by the model as more likely to have GC than a randomly selected control (Supplementary Fig. S1). The best cutoff point, 0.492, was chosen to balance sensitivity and specificity.
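The AUC has a concrete probabilistic reading: it equals the probability that a randomly chosen patient has a higher discriminant score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it that way and, as one common way of balancing sensitivity and specificity, picks the cutoff that maximizes Youden's J (sensitivity + specificity − 1). The text does not say which criterion produced the 0.492 cutoff, so Youden's J is an assumption, and the scores are hypothetical.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(case_scores: Sequence[float],
                     control_scores: Sequence[float]) -> float:
    """P(score of a random case > score of a random control); ties count 1/2."""
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def youden_cutoff(case_scores: Sequence[float],
                  control_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A subject is classified as a case when score >= cutoff.
    """
    best = (-1.0, 0.0, 0.0, 0.0)   # (J, cutoff, sensitivity, specificity)
    for cut in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= cut for s in case_scores) / len(case_scores)
        spec = sum(t < cut for t in control_scores) / len(control_scores)
        j = sens + spec - 1
        if j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical discriminant scores for four patients and four controls.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.5, 0.3, 0.2]
print(auc_mann_whitney(cases, controls))   # 0.875
print(youden_cutoff(cases, controls))      # (0.7, 0.75, 1.0)
```

Here 14 of the 16 patient-control pairs are ordered correctly, giving AUC = 14/16.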

Discussion

In this case-control study of 556 participants (268 patients and 288 controls), we observed that the relative abundances of H. pylori, P. acnes, and P. copri were significantly higher in the patients than in the controls, whereas the relative abundance of L. lactis was significantly higher in the controls than in the patients. In general, subjects with a high relative abundance of H. pylori, P. acnes, or P. copri had a significantly higher risk of GC, whereas those with a high relative abundance of L. lactis had a significantly lower risk of GC. The Shannon index was significantly higher in the controls than in the patients. The ROC and AUC analyses suggest that the four identified candidate bacterial species may be clinically useful for identifying GC in Koreans.

H. pylori infection is the strongest single risk factor for GC, particularly in countries where H. pylori infection is endemic15. Reduced gastric acidity caused by chronic H. pylori infection lowers nutrient availability and local innate immune responses16. Corpus-dominant infection leads to gastric mucosal atrophy, with an increase in gastric pH due to the loss of acid-producing parietal cells17. Given the high relative abundance of H. pylori, our data likely reflect a bacterial profile unique to the Korean population, which has been exposed to a similar diet for a long period of time18. In contrast, one study suggested that neither H. pylori colonization nor the anatomic site within the stomach influences the gastric microbiota composition, although the composition differed between paired nonmalignant and tumor tissues19. In our study population, H. pylori had the highest mean relative abundance of all bacterial species, particularly in the GC patients, and the results were similar at the family (Helicobacteraceae) and genus (Helicobacter) levels. A study of the molecular characterization of the human stomach microbiota in GC patients concluded that H. pylori is the dominant member of the nonmalignant gastric tissue microbiota in many GC patients20. Interestingly, a study conducted in Colombia found that gastric microbiome composition differed considerably between individuals and correlated significantly with town of origin, although no significant correlation was identified between the H. pylori phylogeographic population and microbial composition21. Thus, the biological role of H. pylori needs further investigation in clinical practice.

The critical role of H. pylori in GC pathogenesis has been well documented. The carcinogenic potential of H. pylori can be attributed largely to two H. pylori-related virulence factors: vacuolating cytotoxin A (VacA) and cytotoxin-associated gene A (CagA)22. The immunosuppressive activities of VacA promote immune evasion by H. pylori, which eventually enhances gastric tumor survival15. The CagA protein can enter gastric epithelial cells, where it undergoes phosphorylation15, leading to structural changes in the cells, including cell scattering and elongation22, and resistance to apoptosis23. The resulting changes in the physiological and immunological environment of the stomach can alter the composition of the gastric microbiota, producing a proinflammatory condition15. Studies have shown a strong association between polymorphisms in proinflammatory cytokine genes and an increased risk of developing H. pylori-associated GC24,25,26.

In addition to the carcinogenic role of H. pylori, mucosal atrophy plays a critical role in GC pathogenesis23. Our results showed that a higher relative abundance of H. pylori was associated with an increased risk of GC. Although it has been widely accepted that H. pylori has no direct influence on the gastric microbiota composition of adults, a few studies have reported an effect: adults who carry H. pylori tend to have higher abundances of Spirochetes, Acidobacteria, and non-Helicobacter Proteobacteria and comparatively lower abundances of the Actinobacteria, Bacteroidetes, and Firmicutes phyla than uninfected adults27,28. In males, our analysis also showed a positive association between the relative abundance of H. pylori and GC risk, although it was not statistically significant. This may reflect limited statistical power in the sex-stratified analysis, even though the overall sample was relatively large. The Shannon index box plot showed that alpha diversity was significantly higher in the controls than in the GC patients in our study sample. A similar pattern was reported in a study in which healthy subjects had a higher Shannon index than GC patients29. Such changes in bacterial composition may be a predominant cause of the gastric atrophy that results in GC, even though the number of H. pylori bacteria decreases as atrophy develops28. Additionally, our PCoA plot showed that the species-level bacterial composition was markedly divergent between the GC patients and control subjects. This finding also resembles that of Li, T. H. et al.29, who observed a clear divergence between GC patients and healthy subjects in PCoA.

Regarding P. acnes, we found that individuals with a high relative abundance of P. acnes had an increased risk of GC. A study based on comparative microbial community profiling of human stomach biopsies found that an overabundance of P. acnes is a cause of lymphocytic gastritis (LyG)30. In our study, the mean relative abundance of P. acnes was also higher in the GC patients than in the controls, and the same results were found at the family (Propionibacteriaceae) and genus (Propionibacterium) levels. Notably, P. acnes, a classic skin bacterium that causes acne, has recently been identified as a member of the gastric microbiota31. Furthermore, by producing proinflammatory cytokines such as IL-15, LyG caused by P. acnes can enhance GC development30. It is important to note that the gut-brain-skin hypothesis posits a complex, brain-mediated interrelationship between acne and gut dysfunction, which has recently been validated by microbiome studies32. This is supported by the frequent co-occurrence of anxiety, depression, and gastrointestinal distress with acne. Such conditions can cause neuropeptides to be released from enteroendocrine cells through the production of neurotransmitters, such as serotonin, norepinephrine, and acetylcholine, by the normal flora. These chemicals can trigger both intestinal and systemic inflammation, which may eventually lead to GC development by increasing gut permeability, which allows crosstalk between the gut and the skin32.

Recent human studies have found that overabundance of Prevotella species at mucosal sites can cause localized and systemic diseases, including periodontitis, bacterial vaginosis, rheumatoid arthritis, metabolic disorders, and low-grade systemic inflammation33. In vitro studies suggest that Prevotella has a remarkable capacity to drive T helper type 17 (Th17) immune responses, supporting an association between increased Prevotella abundance and augmented Th17 responses33. The ability of Prevotella to produce redox proteins that increase its resistance to the host has also been proposed34. In the current study, subjects who carried P. copri had a significantly higher risk of GC than those who did not, and the mean P. copri abundance was significantly higher in the GC patients than in the controls. It has been strongly suggested that P. copri induces inflammatory conditions in the human body, leading to the development of several diseases, including GC34. A previous study in animal models showed that long-term H. pylori infection can affect the overall bacterial composition35. Prevotella has been found to be relatively abundant in patients without H. pylori infection36. Thus, H. pylori-induced changes in the gastric microflora can be attributed to various factors and can lead to gastric atrophy, increased gastric pH, and colonization of the stomach by transient bacteria35. However, the role of P. copri in GC development needs further investigation.

L. lactis appeared to be beneficial in the current study, as subjects who carried L. lactis had a lower GC risk. A previous study revealed strong antiproliferative activity of the cytoplasmic fraction of L. lactis against human colon cancer cells37. Furthermore, a study of the antiproliferative effects of the cytoplasmic fraction of L. lactis on a human stomach cancer cell line found that L. lactis treatment inhibited cell proliferation in a time- and dose-dependent manner38. L. lactis caused G0/G1 cell cycle arrest, which was associated with increased p53 and p21 expression and with reduced cyclin D1 expression and retinoblastoma protein phosphorylation, thereby inducing apoptosis. Additionally, L. lactis has been noted to have probiotic effects in the human gut that can benefit gastrointestinal health, including the prevention of gastrointestinal cancers38. Consistent with this, the mean relative abundance of L. lactis was higher in the healthy controls than in the GC patients in our study population.

We observed that two linear combinations (H. pylori dominant and P. acnes dominant) of the four candidate bacterial species may be predictive of GC and may therefore serve as a potential diagnostic marker. Together, our data indicated that H. pylori and P. acnes were highly abundant in the GC patients and identified GC with 79.7% sensitivity. It has been reported that the abundance of H. pylori decreases as GC advances locally, owing to the succession of microbial species14. In contrast, our data indicated that GC patients had significantly higher abundances of H. pylori. A possible explanation is that we collected biopsy samples from patients with early GC, in whom bacterial succession had not yet progressed. A study conducted in Taiwan concluded that, regardless of the biological roles of Clostridium, Fusobacterium, and Lactobacillus in oncogenesis, the overabundance of these microbes can serve as a diagnostic tool for GC14. Thus, if a bacterial species promotes GC occurrence, eradicating that species may be useful for decreasing GC incidence.

The human gut microbiome plays a critical role in gastrointestinal cancers39. A report on the next steps in studying the human microbiome and health in prospective studies emphasized the importance of continuing and expanding recent microbiome research40. Even with optimized techniques for sample collection, processing, and storage, supplementary methodological approaches are required, particularly in epidemiological studies, because of their inherent limitations. Potential limitations of case-control studies of the microbiome and cancer include diet-related changes in microbiome composition, selection bias, and recall bias. It has been recommended that biorepositories for human samples adopt consistent collection methods so that they can serve as a basis for nested case-control studies. In addition, short interventions in human and animal studies can identify important effects of diet on the microbiome, particularly the fecal microbiome40. Furthermore, optimal storage conditions must be established soon after sample collection to reduce bias arising from rapid changes in the genetic components of the microbes41,42.

Our study has both strengths and limitations. A major strength is the relatively large sample size (268 GC patients and 288 healthy controls), which provided sufficient statistical power to detect relevant associations between the gastric microbiota and GC risk. Additionally, several potential confounders that are possible risk factors for GC development, including age, smoking, family history of GC, regular exercise, education, occupation, income, and total energy intake, were taken into account throughout the analysis. However, our study also has potential limitations. First, hospital-based case-control studies are generally susceptible to bias, including selection bias and recall bias. Second, because this was not a prospective study, the observed associations between the microbiome and GC may not be causal; patients with early GC may have altered microbiomes as a consequence of progressing atrophy. Third, we collected only a single sample per participant for microbial measurements, whereas microbiome measurements at multiple time points could yield more precise exposure estimates43. However, repeating biopsies in healthy subjects and in those with normal gastric histology raises ethical issues.

In conclusion, H. pylori, P. acnes, and P. copri appear to be strong risk factors for GC development in Koreans, whereas L. lactis appears to be a protective factor. The four identified candidate bacterial species may be clinically useful for identifying GC in Koreans. Further microbiome studies, such as prospective studies capable of inferring causality, are warranted to confirm these findings. The study could also be expanded by evaluating the impact of diet on the bacterial composition of the gastric mucosa and by including other races and ethnicities, particularly other East Asian populations, to improve the generalizability of the results. Characterizing the gastric bacterial composition could also provide a readily accessible, noninvasive biomarker for identifying GC risk44. Moreover, GC-related pathway and functional studies based on gastric microbiome data are warranted, as most cancer-related pathways remain to be discovered45,46.

Materials and Methods

Study subjects

Participants were recruited at the National Cancer Center Hospital in Korea between March 2011 and December 2014. Individuals who had been histologically confirmed at the Center for Gastric Cancer as having early GC within the preceding three months were included in the patient group. Early GC was defined as an invasive carcinoma confined to the mucosa and/or submucosa, regardless of lymph node metastasis status. Patients who had been diagnosed with diabetes mellitus or had a history of cancer within the past five years, advanced GC, or severe systemic or mental diseases, as well as women who were pregnant or breastfeeding, were excluded. The control group was selected from individuals undergoing health-screening examinations at the Center for Cancer Prevention and Detection at the same hospital. Individuals in the control group with a history of cancer, diabetes mellitus, gastric ulcers, and H. pylori treatment were excluded. The final sample of 556 participants was composed of 268 patients and 288 controls (men, 353; women, 203). All study protocols were conducted according to the Declaration of Helsinki principles. This study was approved by the Institutional Review Board of the National Cancer Center (IRB number: NCCNCS-11-438). Written informed consent was obtained from all participants.

Data collection

During endoscopic examination of the stomach, five gastric mucosal biopsy samples were collected from each participant following the Sydney system. A biopsy sample from the greater curvature, at least 3 cm away from the tumor, was used for the metagenomic analysis. H. pylori infection status was determined by a rapid urease test, a serological test, and histological evaluation. For the rapid urease test, one biopsy sample was taken from the greater curvature of the corpus. For histological evaluation, four biopsy samples were collected from the lesser curvature of the corpus and the antrum, and H. pylori status was determined via Wright-Giemsa staining of the biopsy specimens by a pathologist specializing in GC. Current infection was defined as at least one positive result in the rapid urease test or in the histological evaluation of the four biopsy sites47.
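The definition of current infection is a logical OR over the rapid urease test and the four histology sites. A minimal sketch follows; the record type and its field names are hypothetical, and, as the definition is stated, the serological result does not enter it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HPyloriWorkup:
    """Hypothetical per-participant record of H. pylori test results."""
    rapid_urease_positive: bool
    histology_positive: Sequence[bool]   # Wright-Giemsa result per histology biopsy site


def is_current_infection(w: HPyloriWorkup) -> bool:
    """Current infection: at least one positive result across the rapid urease
    test and the histology sites (serology is not part of this definition)."""
    return w.rapid_urease_positive or any(w.histology_positive)


# Urease-negative, but one of four histology sites is positive: current infection.
print(is_current_infection(HPyloriWorkup(False, [False, True, False, False])))   # True
print(is_current_infection(HPyloriWorkup(False, [False] * 4)))                  # False
```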

Participants completed a self-administered questionnaire covering demographic characteristics, lifestyle, physical activity, and medical history. Total energy intake was estimated using a semiquantitative food frequency questionnaire (SQFFQ) that has previously been reported to be reliable and valid48.

DNA extraction

DNA was extracted from the biopsy samples using the MagAttract DNA Blood M48 kit (Qiagen, Hilden, Germany) and BioRobot M48 automated extraction equipment (Qiagen), according to the manufacturer's instructions.

Metagenomic 16S rRNA gene sequencing

Input gDNA (12.5 ng) was amplified with 16S rRNA gene V3-V4 primers, and a subsequent limited-cycle amplification step was performed to add multiplexing indices and Illumina sequencing adapters. The final products were normalized and pooled using PicoGreen, and library sizes were verified using the LabChip GX HT DNA High Sensitivity Kit (PerkinElmer, Massachusetts, USA). The libraries were then sequenced on the MiSeq™ platform (Illumina, San Diego, USA). Each sample was prepared according to the Illumina 16S rRNA gene Metagenomic Sequencing Library protocols. DNA quantity and quality were measured by PicoGreen and Nanodrop analyses, respectively. The 16S rRNA genes of the 288 control samples and the 268 GC patient samples were amplified using 16S rRNA gene V3-V4 primers, whose sequences were as follows:

16S rRNA gene amplicon PCR forward primer: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3′

16S rRNA gene amplicon PCR reverse primer: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3′
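Each primer above can be read as an adapter overhang followed by a 16S-specific region, and the 16S-specific parts contain IUPAC degenerate bases (N, W, H, V). The sketch below splits the printed sequences and expands the degenerate codes into a regular expression. The split point is an assumption based on the 19-nt sequence AGATGTGTATAAGAGACAG that ends both overhangs, and matching the reverse primer against a template would additionally require reverse-complementing it, which is omitted.

```python
import re
from typing import Tuple

# Primer sequences exactly as printed in the Methods (5' -> 3').
FORWARD = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
REVERSE = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"

# Assumed boundary: both adapter overhangs end with this 19-nt sequence.
OVERHANG_END = "AGATGTGTATAAGAGACAG"

# IUPAC nucleotide codes -> the bases each code stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}


def split_primer(primer: str) -> Tuple[str, str]:
    """Split into (adapter overhang, 16S-specific part) after OVERHANG_END."""
    cut = primer.index(OVERHANG_END) + len(OVERHANG_END)
    return primer[:cut], primer[cut:]


def degeneracy(seq: str) -> int:
    """Number of distinct concrete sequences a degenerate primer represents."""
    n = 1
    for base in seq:
        n *= len(IUPAC[base])
    return n


def to_regex(seq: str) -> "re.Pattern[str]":
    """Compile a degenerate sequence into a regex, e.g. W -> [AT]."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in seq))


fwd_overhang, fwd_locus = split_primer(FORWARD)
rev_overhang, rev_locus = split_primer(REVERSE)
print(fwd_locus, degeneracy(fwd_locus))   # CCTACGGGNGGCWGCAG 8
print(rev_locus, degeneracy(rev_locus))   # GACTACHVGGGTATCTAATCC 9
print(bool(to_regex(fwd_locus).fullmatch("CCTACGGGAGGCAGCAG")))   # True
```

The forward 16S-specific region therefore stands for 4 × 2 = 8 concrete sequences and the reverse for 3 × 3 = 9.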

Preprocessed reads from each sample were used to determine the number of OTUs, which were obtained by clustering the sequences from each sample at a 97% sequence identity cutoff using the Quantitative Insights Into Microbial Ecology (QIIME) software (v.1.8.0). Taxonomic abundances were counted from the preprocessed reads of each sample against the National Center for Biotechnology Information (NCBI) database using a confidence threshold of 0.8. To obtain relative abundances, the taxonomic abundance counts were normalized by dividing them by the number of preprocessed reads in each sample.
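The normalization step divides each taxon's read count by the sample's total number of preprocessed reads. Because the denominator is all preprocessed reads rather than only the reads assigned at the 0.8 confidence threshold, relative abundances need not sum to 1. A sketch with hypothetical counts:

```python
from typing import Dict, Mapping


def relative_abundance(assigned_counts: Mapping[str, int],
                       n_preprocessed_reads: int) -> Dict[str, float]:
    """Divide each taxon's read count by the sample's preprocessed read total."""
    if n_preprocessed_reads <= 0:
        raise ValueError("sample has no preprocessed reads")
    if sum(assigned_counts.values()) > n_preprocessed_reads:
        raise ValueError("more assigned reads than preprocessed reads")
    return {taxon: n / n_preprocessed_reads for taxon, n in assigned_counts.items()}


# Hypothetical sample: 8,000 preprocessed reads, 1,600 of them not assigned to a species.
counts = {"Helicobacter pylori": 6000,
          "Propionibacterium acnes": 300,
          "Lactococcus lactis": 100}
ra = relative_abundance(counts, n_preprocessed_reads=8000)
print(ra["Helicobacter pylori"])      # 0.75
print(round(sum(ra.values()), 6))     # 0.8 (unassigned reads stay in the denominator)
```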

Statistical analysis

To compare the demographic and lifestyle characteristics between the controls and patients, the chi-square test and Student's t-test were used for categorical and continuous variables, respectively. We compared the relative abundance of the bacterial taxa (class, family, genus, and species) between the patients and controls, applying FDR correction for multiple comparisons. The relative abundance of each candidate species was categorized into tertiles based on its distribution in the control group. As an exception, if more than one third of the subjects had a relative abundance of zero, the species was categorized into two groups (noncarriers and carriers) based on the median distribution of the controls, with noncarriers defined as subjects with a relative abundance of zero. The group with the lowest relative abundance served as the reference group. ORs and 95% CIs were estimated using unconditional logistic regression models, and the median relative abundance in each tertile category was treated as a continuous variable to test for trends. OR estimates were calculated for the crude model (model I) and for model II, which was adjusted for age, smoking, first-degree family history of GC, regular exercise, education, occupation, monthly income, and total energy intake. Association analyses were also performed separately for males and females. Box plots comparing the Shannon index between the patients and controls were drawn using the ggplot2 package. A PCoA was performed on the Bray-Curtis dissimilarity calculated from the species-level OTU relative abundance table using the R package vegan. Sample clustering in the beta diversity analysis was tested using ANOSIM, and the statistical significance of the observed R was assessed by 10^4 permutations9. To determine whether the candidate bacterial species could serve as a diagnostic tool for GC, a ROC curve analysis was performed based on a linear discriminant function.
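The exposure coding described above can be sketched as follows. Two details are assumptions: the zero-abundance share is computed among the controls (the text says "subjects"), and the tertile cut points come from Python's `statistics.quantiles` with its default method, which may differ slightly from the percentile definition used in SAS. The adjusted ORs would then come from an unconditional logistic regression on these codes, which is not reproduced here.

```python
import statistics
from typing import List, Sequence


def code_exposure(values: Sequence[float],
                  control_values: Sequence[float]) -> List[int]:
    """Category codes for one species; 0 is always the reference group.

    - If more than one third of controls have zero abundance:
      0 = noncarrier (zero abundance), 1 = carrier.
    - Otherwise: tertiles 0, 1, 2 using cut points from the control distribution.
    """
    zero_share = sum(v == 0 for v in control_values) / len(control_values)
    if zero_share > 1 / 3:
        return [1 if v > 0 else 0 for v in values]
    t1, t2 = statistics.quantiles(control_values, n=3)
    return [0 if v <= t1 else 1 if v <= t2 else 2 for v in values]


# Hypothetical relative abundances; controls define the cut points for everyone.
controls_common = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
print(code_exposure([0.02, 0.05, 0.08], controls_common))   # [0, 1, 2]
controls_sparse = [0.0, 0.0, 0.0, 0.1, 0.2]
print(code_exposure([0.0, 0.05], controls_sparse))           # [0, 1]
```

Deriving the cut points from the controls only, as described, keeps the categories anchored to the distribution in the source population rather than being shifted by the cases.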
Initially, PCA was conducted to obtain two linear combinations of the four variables that explain part of the variance. A linear discriminant function was then constructed to distinguish patients with GC from those without GC. To construct the linear discriminant function, 424 observations were selected as the training data set and 132 observations as the test data set. The training data set was used to calculate the linear discriminant function with the “MASS” package in R, and the model was evaluated using ROC and AUC analyses. Statistical analyses were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) and R (version 3.5.1; The R Foundation for Statistical Computing, Vienna, Austria).
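The discriminant step can be illustrated with Fisher's linear discriminant on the two principal component scores: the direction is w = S⁻¹(μ₁ − μ₀), where S is the pooled within-group covariance matrix and μ₀, μ₁ are the control and case means. This pure-Python sketch is not the MASS implementation the authors used; for two groups, the coefficients from `MASS::lda` are proportional to w but scaled differently, and its priors shift only the classification threshold, whereas this sketch assumes equal priors. The 424/132 split mirrors the Methods, but the simulated scores and random seeds are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (PC1 score, PC2 score) for one participant


def fit_fisher_lda(x: Sequence[Point], y: Sequence[int]) -> Tuple[Point, float]:
    """Return (w, c): classify as GC (1) when w . p - c > 0.

    w = S^-1 (mean1 - mean0), with S the pooled within-group covariance;
    c places the boundary at the projected midpoint of the two group means
    (equal priors, a simplifying assumption).
    """
    groups = {0: [p for p, lab in zip(x, y) if lab == 0],
              1: [p for p, lab in zip(x, y) if lab == 1]}
    means = {k: (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
             for k, g in groups.items()}
    sxx = sxy = syy = 0.0
    for k, g in groups.items():
        mx, my = means[k]
        for px, py in g:
            sxx += (px - mx) ** 2
            sxy += (px - mx) * (py - my)
            syy += (py - my) ** 2
    dof = len(x) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")
    dx = means[1][0] - means[0][0]
    dy = means[1][1] - means[0][1]
    # Closed-form inverse of [[sxx, sxy], [sxy, syy]] applied to (dx, dy).
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    mid = ((means[0][0] + means[1][0]) / 2, (means[0][1] + means[1][1]) / 2)
    return w, w[0] * mid[0] + w[1] * mid[1]


def split_indices(n: int, n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Random train/test split of row indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


# Simulated PC scores: 288 controls and 268 cases, with cases shifted along PC2.
rng = random.Random(1)
scores = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(288)]
          + [(rng.gauss(0, 1), rng.gauss(1.5, 1)) for _ in range(268)])
labels = [0] * 288 + [1] * 268

train, test = split_indices(len(scores), n_train=424)
w, c = fit_fisher_lda([scores[i] for i in train], [labels[i] for i in train])
predicted = [int(w[0] * scores[i][0] + w[1] * scores[i][1] - c > 0) for i in test]
accuracy = sum(p == labels[i] for p, i in zip(predicted, test)) / len(test)
print(w, round(accuracy, 3))
```

The discriminant scores for the held-out rows (w · p − c) are what a ROC analysis on the test set would rank.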

Ethical statement

This study was approved by the Institutional Review Board of the National Cancer Center and all the methods were performed in accordance with the approved guidelines and regulations (IRB number: NCCNCS-11-438). Written informed consent was obtained from all participants.