Suicide is the second leading cause of death in individuals between the ages of 10–34 in the United States1. Worldwide life-time prevalence of suicidal ideation (SI) was estimated to be 9.2% in a 2008 report of suicidal behaviors in 84,850 adults across 17 countries2. In a study of 67,000 degree-seeking college students across 108 institutions, 16,337 (24.3%) reported they harbored SI and 9.3% had made a suicide attempt3. The strongest association with suicidality, self-injury, suicide attempts, and mental health diagnosis in students was exposure to stress, in a dose-dependent manner2. Three or more exposures to stress resulted in 4.25 to 10.06 times higher endorsement of SI3, consistent with the diathesis-stress model of depression4. In 729 students, surveyed from 2002 to 2005, 11.1% endorsed having SI within the past 4 weeks, the majority of whom indicated they were not receiving treatment, and 16.5% reported at least one suicide attempt in their lifetime4. According to a 2020 CDC report, as many as one in four people in the U.S. between ages 18 and 24 had seriously contemplated suicide in the month prior5. Here, genetics, lifestyle (diet and sleep) factors, and salivary bacteria were explored for their associations with SI in 489 college students.

The genetic basis of SI risk is understudied, particularly in general population cohorts. Limited in number, genome-wide association studies (GWAS) and polygenic risk score studies have identified some possible genetic variants increasing in subjects with previous suicide attempts6,7,8,9, SI7,9,10,11, and other suicidal behavior11. However, this is often studied in restricted cohorts of individuals with schizophrenia12, bipolar disorder (BD)6,11, or other mood disorder7,8. Reports of suicide attempts in BD have yielded suggestive associations or gene variants with small effects11 or without definitive evidence of association8. Recent validation was attempted in studies of schizophrenia6 and BD, uncovering evidence for a role by cholesterol biosynthesis13. However, validation of other SNPs remains to be seen13. In one report, polygenic scores for major depressive disorder (MDD), but not for previous attempts of suicide, were predictive of SI7.

Evidence of autoimmunity or immune dysfunction in neurobiological and neuropsychiatric disorders14 warrant investigation into the mechanism by which immune haplotypes may play a role. Autoimmunity or immune dysfunction in neurobiological and neuropsychiatric disorders has been shown for schizophrenia14 and autism15,16,17,18,19 and human leukocyte antigen (HLA) genetics may play a moderate effect in depression20, although this connection is controversial in the literature. Furthermore, HLA alleles, particularly those that confer risk for type 1 diabetes, have been shown to influence the gut microbiome. However, the role of HLA in saliva microbiome composition has not been explored. Host immune surveillance systems recognize molecular patterns of microbiota, leading to inflammatory mediator recruitment. The role of HLA in depression, as well as in the saliva microbiome, remains to be elucidated.

Dietary influences on depression have been extensively described, with increasing attention on suicidal behaviors, and diet is also a confound of microbiome composition. Thus, it was of interest to the present investigation. Much of the nutritional psychiatry work has been performed in preclinical animal models and more human study is needed21. Clinical trials have suggested that the Mediterranean diet may prevent depression in patients with type 2 diabetes22, although the MoodFood trial demonstrated no benefit from a behavioral activation intervention and a reduction in MDD incidence was not observed after one year in overweight or obese individuals with subsyndromal depression23. Recently, a connection between fast food consumption and suicide attempts was reported in a cross-sectional study of adolescents from 32 countries, when controlling for food insecurity, obesity, physical exercise, and consumption of fruits and vegetables24.

While the literature on the gut-brain axis is growing, the oral microbiome has garnered little attention. Microbe-brain interactions play a consequential role in governing peripheral and central nervous system events such as cytokine production, short chain fatty acid release, microglial maturation and activation25, as well as expression of the synaptic plasticity-related gene, brain-derived neurotrophic factor (BDNF) in the amygdala and hippocampus26. Weakening of intestinal and mucosal layers facilitates communication of microbiota to the brain and can cause chronic, low-grade inflammation, as seen in depression27. The oral-gut-brain axis describes a pathway by which oral microbiota may influence neurological and systemic disorders either directly or indirectly, including periodontitis and non-oral systemic disease28. Within the saliva resides a complex network of microbes that closely represents the structure of commensal bacteria in the gut29. Microorganisms in the oral cavity are important for host health, as they deter exogenous organisms, contribute to pH dynamics required for a healthy mouth, and can influence nutrient supply in the oral cavity30. These microorganisms are found on both hard and soft oral tissues, with an estimated 100–200 phylotypes in every individual31. Saliva plays several important roles32 including: (1) delivery of innate and adaptive defense to the host, (2) provision of nutrients that support the growth of pathogenic or health-promoting microorganisms, (3) removal of substrates, and (4) regulation of pH within the oral cavity. It is the interaction with host that determines whether an oral bacterium is commensal or pathogenic31. Periodontal pathogens can insert into the periodontal pocket epithelium, where they can then enter the bloodstream and promote infection in the body and brain. Species such as Porphyromonas gingivalis, Tannerella forsythia, and Treponema denticola have been associated with periodontitis. Both gingivitis and periodontitis33, a chronic inflammatory condition, have been linked to cardiovascular health, diabetes, depression, gastrointestinal disease, Alzheimer’s, and certain cancers, suggesting a two-way effect between oral and systemic disease28. Emerging literature suggests that the oral microbiome may be leveraged for treatment and diagnosis of oral as well as systemic disease. Oral disease prevention strategies have been proposed to reduce burden.

Our understanding of the role of oral bacteria in the systemic pathophysiology of depression is limited and, to our knowledge, links with SI have not yet been explored in human populations. Saliva is an understudied entryway of bacteria into the human body and is easily sampled. In this investigation, saliva microbiome, genetics, and dietary habits were studied to identify risk indicators or factors associated with increased incidence of recent SI in a cohort of 489 college students. We sought to examine whether bacterial genera, species, and high-resolution amplicon sequence variants (ASVs) are associated with recent SI were identified, even when controlling for sex, dietary habits, and sleep. Bacterial associations were assessed by two approaches: across the general cohort as well as by accounting for lifestyle (diet and sleep) confounds. As diet is a major contributor to saliva microbiome composition, an assessment of diet was required to understand the interplay of the microbiome with host genetics and SI34. In particular, microbiome composition was examined in the context of reported sleep issues given that chronic fatigue has been linked to dysbiosis35 and insomnia is considered a risk factor for SI, suicidal behavior, and suicide36,37. Fourteen SNPs with published evidence of association to suicidal behaviors were assessed for links to SI and subsequently, salivary microbiota. HLA connections with SI and salivary microbes were also examined.


Description of the cohort and survey instruments used

Mediterranean Diet Quality Index (KIDMED) and Patient Health Questionnaire-9 (PHQ-9) responses are provided in Table 1. The cohort was comprised of 489 total students of whom 318 were female (65.0%) and 171 were male; the average age was 21.37 ± 4.14 years. Of the 489 subjects, 60 (12.3%) screened positive for SI, as defined by a score ≥ 1 on question 9 of the PHQ-9, and 109 (22.3%) scored ≥ 10 on the PHQ-9, which is the suggested cut-off for Major Depressive Disorder (MDD) with optimal sensitivity and specificity according to Kroenke et al.38. Of the 109 subjects (22.3%) who scored ≥ 10 on the PHQ-9, 64 scored within the moderate, 33 within moderately severe, and 12 within severe ranges, respectively. A total of 166 subjects (34.0%) scored in the mild range for depression (PHQ-9 score of 5–9), while 214 (43.8%) scored in the none to minimal depression range (PHQ-9 score of 0–4). The KIDMED questions, as presented to the participants in the electronic survey, are provided in Supplementary Table 1.

Table 1 Diet and depression distributions in the cohort.

The KIDMED and PHQ-9 total scores followed a non-normal distribution (female N = 317 and male N = 170; all p values < 0.026) and had a low, negative correlation (Spearman ρ = − 0.150, N = 487). PHQ-9 scores were statistically higher in females (\(\overline{x}\) = 7.0 ± 5.4, \(\tilde{x}\) = 6.0) than males (\(\overline{x}\) = 5.3 ± 4.6, \(\tilde{x}\) = 4.0, p = 0.001), although total KIDMED scores did not differ significantly (p = 0.082). Females were more likely to report having had SI in the past 2 weeks, χ2 (1, N = 489) = 5.3, p = 0.021, comprising 78.3% of those with SI but only 65% of the cohort.

Total KIDMED scores did not differ significantly between individuals with or without SI, nor across traditional PHQ-9 severity levels, excepting a higher score in those with “no” depression (\(\overline{x}\) = 5.3 ± 2.6, \(\tilde{x}\) = 5.0) compared to those with “mild” (\(\overline{x}\) = 4.4 ± 2.7, \(\tilde{x}\) = 4.0, p < 0.001).

Incidence of SI increases with the frequency of reported sleep issues

Overall, individuals with SI (n = 59) were more likely to endorse all eight of the other symptoms captured in the PHQ-9 than individuals without SI (n = 428, p < 0.001 for each comparison). Specifically, females who reported having issues with sleep nearly every day in the past 2 weeks were significantly more likely to endorse SI (OR = 23.5 [6.6–83.9], p < 0.0001). A higher proportion of females (66.0%) with SI and males with SI (38.5%) had diminished sleep most days within the past 2 weeks, compared to females (28.0%) and males (24.1%) without SI, respectively.

Regular consumption of nuts is inversely correlated with incidence of SI

Individuals who reported consuming nuts less than 2–3× weekly (N = 489, χ2 = 10.5, p = 0.001) were also more likely to endorse recent SI. In addition, females who reported not consuming a second fruit or fruit juice daily (N = 317) χ2 = 3.2, p = 0.072), trended toward being more likely to express SI in the past 2 weeks. Fewer females with SI (14.6%) reported consuming at least two fruits a day, compared to 24.7% of those without (24.7%). No single dietary habit on the KIDMED correlated with SI in males, possibly attributable to the low incidence of SI in males.

HLA Class II DQA1*01, DRB1*15, DRB3*02, DPB1*01/*05 and HLA Class I C*08 and A*30 are positively associated with incidence of SI, while DRB1*04, DQA1*03, DQB1*03, DRB3*01, and DQ8 appear protective

Five HLA haplotypes appear to be protective of SI, while eight were significantly associated with increased risk (see Table 2). Full statistics for all chi square analyses performed across the two HLA class I and eight HLA class II genes are provided in Supplementary Table 2.

Table 2 Human Leukocyte Antigen (HLA) haplotypes associated with incidence of suicidal ideation (SI).

Compared to those with NOSI, subjects with SI were more likely to have DQA1*01 homozygosity than to not have any DQA1*01 alleles, χ2 (1, N = 302) = 6.3, p = 0.012, and subjects who had at least one DQA1*01 allele were slightly more likely to have screened positive for SI, χ2 (1, N = 302) = 2.7, p = 0.0995. Subjects who screened positive for SI were twice as likely to carry DRB3*02, χ2 (1, N = 307) = 3.8, p = 0.051, and 1.98 times more likely to carry at least one allele of DRB1*15, χ2 (1, N = 289) = 3.4, p = 0.065. Furthermore, subjects with SI were more likely to be homozygous for DRB1*15, χ2 (1, N = 222) = 7.3, p = 0.0069. Subjects with SI were two to three times more likely to carry DPB1*01 or DPB1*05, respectively. Alleles C*08 and A*30 were also observed at higher frequencies in individuals with SI. Those with SI were also more likely to be DQ2.2 and DQ2.5, χ2 (1, N = 285) = 4.9, p = 0.026. 15.2% of SI were DQ2.2 and DQ2.5 compared to only 5.2% of subjects with no SI.

Regarding the protective haplotypes, subjects with DQA1*03 were 7.3× more likely to not have SI, p = 0.001. Subjects without DQB1*03 were 2.2× more likely to screen negative for SI, χ2 (1, N = 292) = 4.8, p = 0.028, than subjects with DQB1*03. Subjects who had at least one DRB1*15 allele were slightly more likely to have screened positive for SI, χ2 (1, N = 289) = 3.4, p = 0.064. No individual with SI was homozygous for DQ8 χ2 (1, N = 289) = 6.2, p = 0.0051, or had DR4–DQ8, χ2 (1, N = 270) = 8.8, p = 0.003, the latter, a haplotype which was observed in 21.5% of subjects with no SI. Only 6.1% of subjects with SI had DQ8 compared to 28.6% of those with no SI; those with DQ8 were 6.2× more likely to not endorse SI, χ2 (1, N = 285) = 6.2, p = 0.0051. Those with DRB1*04 (p = 0.0001) and DQ8 were 23.5 and 6.2 times less likely to report recent SI, respectively. Subjects with DRB3*01 were 5.2× more likely to not have SI (p = 0.015).

Presence of minor G allele at rs10437629 is associated with 2.6× increased incidence of SI

Recent SI in the cohort was assessed with respect to genotype at available single nucleotide polymorphisms (SNP) with a prior connection to suicidal attempts, suicide, or suicidality per EBI annotation in the NHGRI-EBI GWAS catalog. Of the sixteen SNPs acquired by the PMRA, fourteen were observed at a minor allele frequency (MAF) of ≥ 0.10 in our cohort and were subsequently tested against incidence of SI. rs10437629, which has been associated with attempted suicide in bipolar disorder6, was the only SNP found to be significantly associated with SI, with the minor G allele conferring increased incidence of SI (OR = 2.62 [1.17 to 5.828, 95% CI], p = 0.0186) at a MAF of 0.132. Data for this SNP was available for 269 individuals (Fig. 1). The increased incidence of SI was observed in both heterozygotes and homozygotes, although the presence of only one G allele was sufficient and homozygosity for G was rare (Fig. 1). While 8.3% of individuals homozygous for the A allele expressed SI, the incidence was 19.0% for those possessing at least one G allele at rs10437629.

Figure 1
figure 1

Interaction of rs10437629 genotype and endorsement of recent suicidal ideation (SI) with significantly associated saliva species. (a) Distribution of A and G alleles in rs10437629 (N = 269). (b) Relative abundance of Alloprevotella rava counts across genetic variants and SI, with a significant difference in abundances between individuals with AA and individuals with AG or GG, Wilcoxon p = 0.037. Here, two groups were compared (homozygotes AA versus heterozygotes, AG, or homozygotes, GG, for the minor G allele) with a Wilcoxon test. (ck) Selected bacteria that were differentially abundant across rs10437629 genotype (AA vs. AG or GG) and endorsement of SI. For these associations, Kruskal–Wallis tests were performed against four separate genotype-SI combinations: (1) AA genotype and no SI (AA_No), (2) AG or GG genotype and no SI (AA or GG_No), (3) AG or GG and SI (AG or GG_Yes), and (4) AA genotype and SI (AA_Yes). In e, Alloprevotella rava is again presented but with this four-group comparison. For each four-group comparison, a Kruskal Wallis test was conducted; p values are presented within the plots. Species assignment was based on Silva taxonomy of the amplicon sequence variants (ASVs) and included all ASVs that were mapped to that closest cultured relative.

Differences in the saliva microbiota were also observed with respect to the interaction of this genotype with recent SI (Fig. 1). Alloprevotella rava was rarely seen in individuals with AG or GG (Fig. 1b, p = 0.037 for two group comparison), and more specifically was highest in abundance in those with AA and without SI (Fig. 1e, p = 0.054 across the four groups). Prevotella melaninogenica, which was a significant microbe present as high as 20% relative abundance in individuals AA and without SI, had the lowest abundances in individuals with SI and possessing the G allele (Fig. 1c, p = 0.007). Prevotella pallens was highest in those with AA and without SI (p = 0.054). Campylobacter concisus was highest in individuals with AA, whether they expressed recent SI or not, but was lowest in those with the G allele and expressing SI (Fig. 1f, p = 0.12).

Although observed at low relative abundances and prevalence, significant differences were observed in the abundance of Treponema denticola (Fig. 1h, p = 0.037) and Fusobacterium naviforme (Fig. 1j, p = 0.0054), and near significant differences were observed across the four groups with respect to Leptotrichia hofstadii (Fig. 1g, p = 0.11), Dialister pneumosintes (Fig. 1i, p = 0.067), and Conservatibacter flavescens (Fig. 1k, p = 0.056), which were all seen at the highest abundance in individuals who were AA and did not have SI. No individual who had the G allele and expressed recent SI had Leptotrichia hofstadii in their saliva, and Dialister pneumosintes was only observed in one of these subjects.

Dietary confounders of saliva microbiome diversity

Saliva microbiome data was available for a total of 372 individuals. Microbial divergence of non-SI controls (N = 325) and individuals with SI (N = 47) was observed at the community-level by the Bray–Curtis dissimilarity measure (p = 0.047). Dissimilarity appears to be regulated in part by sex and regular pasta or rice consumption (Supplementary Table 1, p’s < 0.05). Variance is also explained by daily consumption of fruit and whether the subject visits a fast-food hamburger restaurant more than once weekly (p’s < 0.1), but not by overall KIDMED compliance score (p = 0.697).

Microbial diversity indices are lower in individuals with SI in the presence of sleep issues

Microbiome diversity differences in suicidal ideation (SI) endorsement concurrent with the presence of sleep issues were assessed. Shannon and Diversity Coverage indices (p’s = 0.045 and 0.036, Supplementary Fig. 1b,d) were significantly lower in SIsleep. Diversity Gini Simpson, Inverse Simpson, and Evenness Simpson were lower in individuals with SI, although at p ≤ 0.1. Richness in NOSIsleep and SIsleep was not significantly different (Supplementary Fig. 1g,h).

Network analysis of salivary microbiota observed in individuals with sleep issues

Bacterial networks of salivary genera were assessed in the saliva of individuals with sleep issues with correlation analysis (Supplementary Fig. 2). Spearman correlation coefficients from the network analysis are provided in Supplementary Table 3.

Overall, Veillonella had strong negative correlations with genera including Actinomyces, Alloprevotella, Bergeyella, Campylobacter, Granulicatella, Oribacterium, Prevotella, Rothia, and Stomatobaculum (Spearman correlations < − 0.4; Supplementary Fig. 2a) and was positively correlated with Megasphaera, Saccharimonadaceae TM7x, and Selenomonas (Spearman correlations > 0.45; Supplementary Fig. 2a), all three of which were more abundant in individuals with SI (Supplementary Fig. 2b). Genera that negatively correlated with Veillonella were observed at higher abundance in controls (Supplementary Fig. 2b). Granulicatella correlated positively with Streptococcus, Rothia, Haemophilus, Porphyromonas, Fusobacterium, Gemella, Actinomyces, and Aggregatibacter (Spearman correlation between 0.649 and 0.362, Supplementary Fig. 2c) and negatively with Selenomonas, while Alloprevotella correlated positively with Neisseria, Haemophilus, Eubacterium nodatum group, and Peptostreptococcus (Supplementary Fig. 2d). Prevotella and Atopobium (higher in NOSIsleep), and Selenomonas, Veillonella, and Saccharimonadaceae TM7x (higher in SIsleep) were the top five genera most positively correlated with Megasphaera (Supplementary Fig. 2e). Haemophilus, Porphyromonas, Capnocytophaga, Gemella, and Fusobacterium were higher in NOSIsleep and were the top five genera most negatively correlated with Megasphaera.

Differentially expressed ASVs in individuals with SI

Using DESeq2, normalized base mean counts of the ASVs observed in the saliva community were compared in four methods: (1) the full cohort: those with SI compared to those with no SI, NOSI, and then selected cohorts, where those with SI were compared to (2) those with PHQ-9 scores above 10 (the published threshold for MDD), DEP, (3) those with PHQ-9 scores < 5, NODEP, and (4) those with PHQ-9 scores between 5 and 9, MILD), separately. Depending on the pairwise group comparison, between 240 and 439 ASVs were found to be differentially abundant between those with SI and those without (Supplementary Fig. 3; see Supplementary Tables 47 for full statistics of all significant ASVs). SI and NOSI groups demonstrated the greatest number of significantly different ASVs, while SI and DEP had the fewest, suggesting that their microbial communities were more similar, although still distinctive.

The four models shared 75 differentially abundant ASVs in common (Supplementary Fig. 3e), of which 31 were more abundant in individuals with SI and 44 were more abundant in all four comparative groups without SI. Full statistics and taxonomy for these ASVs are provided in Table 3. Bacteroides dorei (ASV509), Streptococcus undetermined (ASV396), Selenomonadaceae undetermined (ASV333), and Klebsiella oxytoca (ASV163) were at least four times higher in individuals with SI, irrespective of the comparative groups’ total PHQ-9 scores (Table 2). Consistently elevated at least four-fold in the non-SI groups were Alloprevotella tannerae (ASV250), Alloprevotella undetermined (ASV115), Prevotella 7 melaninogenica (ASV147), Veillonella parvula (ASV180), Leptotrichia undetermined (ASV289 and ASV179), Saccharimonadales undetermined (ASV336), and Klebsiella michiganensis (ASV58), irrespective of PHQ-9 score (Table 3).

Table 3 Amplicon sequence variants (ASVs) elevated in students with suicidal ideation (SI) or without.

Among the taxa observed at higher abundances in individuals with SI were members of Actinobacteria, Bacteroidia (Bacteroides dorei, Prevotella melaninogenica), Bacilli (Streptococcus gordonii), Negativicutes (Veillonella atypica, Veillonella dispar, Dialister, and Veillonella), Fusobacteriia (Leptotrichia and Fusobacterium nucleatum), Patescibacteria (Saccharimonadaceae TM7x), Burkholderiales (Neisseria mucosa), Enterobacterales (Klebsiella oxytoca). Among those observed more abundantly in individuals without SI (irrespective of no, mild, or moderately severe depression) were members of Actinomycetales, Bacteroidales (Alloprevotella rava, Tannerella, Porphyromonas, Alloprevotella tannarae), Capnocytophaga granulosa, Solobacterium moorei, Bacilli (Abiotrophia, Granulicatella, Streptococcus, and Gemella), Negativicutes (Selenomonas sputigena, Veillonella dispar, Veillonella atypica, and Veillonella parvula), Fusobacteriia (Leptotrichia, Fosbacterium), Neisseria, and Gammaproteobacteria (Haemophilus haemolyticus, Enterobacter kobei, Klebsiella michiganensis, Serratia marcescens, Haemophilus pittmaniae, and Klebsiella michiganensis).

Of the more dominant ASVs (those with a normalized base mean count > 40; Supplementary Fig. 3g) seen in any of the four models, ASVs belonging of Lautropia, Veillonella, and Megasphaera micronuciformis were among those more abundant in individuals with SI, while ASVs belonging to Alloprovetella, Porphyromonas, R. mucilaginosa, N. perflava, P. nanceiensis, S. oralis, and were more abundant in any of the comparative groups without SI. The largest increase in relative abundance in individuals with SI compared to NOSI was observed in ASV509 (Bacteroides dorei, 4.015 log2 fold change), and ASVs that were the most elevated in NOSI included ASV214 and ASV250 (Alloprevotella tannerae), ASV115 (Alloprevotella), and ASV147 (Prevotella melaninogenica), all log2 fold changes > 3.1.

In the NOSI model comparison, Veillonella ASV121 and ASV40 were both elevated in individuals with SI. According to NCBI BLAST, these ASVs have 99.8% and 95.0% homology with Veillonella nakazawae T1-7. ASV40 was 3.14× higher in individuals with SI versus those without (NOSI), with a normalized base mean of 125. Individuals with SI were 2× more likely to have a non-zero count of at least one of the ASVs, χ2 (1, N = 372) = 5.1, p = 0.024. While 51.1% of SI had at least one of these ASVs, they were found at 34.2% prevalence in NOSI.

Differentially abundant closest species and genera in SI after controlling for sex, diet, and sleep

After propensity score matching on sex and the diet confounders were defined, 17 unique species were discovered to be differentially more abundant in the matched controls (NOSI) and 22 species and 21 genera were more abundant in individuals with SI at p < 0.1 (Table 4). Among the NOSI controls,18 genera were more abundant compared to those with SI (Table 4).

Table 4 Salivary bacterial genera and closest cultured relative species that are differentially abundant in suicidal ideation (SI) after controlling for sex and diet.

After additionally matching on sleep issues, 26 genera were discovered to be more abundant in controls and 8 were more abundant in individuals with SI (Table 5). Twenty species were more abundant in controls and 14 were more abundant in individuals with SI (Table 5). Several species and genera were consistent across the two propensity score matching paradigms, demonstrating the bacterial association is present with no regard to sleep as a potential confounder. In individuals with no SI, Granulicatella adiacens, Streptococcus parasanguinis, Fusobacterium periodonticum, Haemophilus influenzae, Caulobacter vibrioides, and Treponema amylovorum were elevated in both models, as were Alloprevotella, Porphyromonas, Gemella, Fusobacterium, Massilia, and Caulobacter. In individuals with SI, Megasphaera micronuciformis, Veillonella rogosae, Lactoboccous lactis, and Fusobacterium naviforme, as well as Megaphaera, Lactococcus, and Lautropia were elevated in both models.

Table 5 Salivary bacterial genera and closest cultured relative species that are differentially abundant in suicidal ideation (SI) after controlling for sex, diet, and the presence of sleep issues.

Granulicatella was the single genus more abundant (log2 Fold Change of 0.903, padj = 0.00913) in NOSIsleep controls than those with SIsleep, with a base mean of 241.86. Eight species were more abundant in NOSIsleep controls, including Capnocytophaga leadbetteri, Granulicatella adiacens, Prevotella melaninogenica, and Prevotella intermedia, while Megasphaera micronuciformis was more abundant in SIsleep (log2 Fold Change of 1.203, padj = 0.0264) and with normalized base mean counts of 1800.6 and 1564.6 for P. melaninogenica and M. micronuciformis, respectively, indicating they are among the more dominant bacteria in the saliva ecosystem. See Supplementary Table 8 for the DESeq2 statistics for the significant genus and species.

Core microbial differences in individuals with and without SI, irrespective of sleep problems

Within the full cohort, at a threshold of 70% prevalence, representing 39 unique ASVs and 7,570,711 sequences, an out of bag error rate of 0% was observed for SI classification (Fig. 2a). Lautropia (ASV110), Megasphaera micronuciformis (ASV20, ASV19), Candidatus Saccharimonas (ASV55), Rothia dentocariosa (ASV91), Catonella (ASV116), Leptotrichia (ASV52), Stomatobaculum (ASV94), Haemophilus parainfluenzae (ASV11), and Streptococcus parasanguinis (ASV39) were among the top ASVs contributing to the classification at 70% prevalence (Fig. 2b,c).

Figure 2
figure 2

Microbial community prevalence differences in the full cohort (SI vs. NOSI) and in the sleep-restricted cohort (SIsleep vs. NOSIsleep). (a) Principal coordinates analysis of amplicon sequence variants (ASVs) present at 70% prevalence in either the SI (N = 47) or NOSI (N = 325) in the full dataset. (b) Silva taxonomy of the top 30 ASVs contributing to the classification for SI versus NOSI. (c) Mean decrease accuracy of the top 10 ASVs for the model. (d) Principal coordinates analysis of amplicon sequence variants (ASVs) present at 70% prevalence in either the SI (N = 47) or matched NOSI (N = 94) subjects who were selected by propensity score matching on sex, diet factors, and sleep issues. In this analysis, microbial communities were assessed after removal of the thirteen depressed individuals from the matched NOSI cohort. (e) Silva taxonomy of the top 30 ASVs contributing to the classification of SI vs. NOSI after propensity matching. (f) Mean decrease accuracy of the top 10 ASVs for the model. Full sequences for the ASVs are provided in Supplementary Table 10.

The same analysis was performed within the cohort matched on sex, diet (consumption of fruit daily, pasta or rice at least five times weekly, nuts at least twice weekly), and whether sleep issues were present, to assess whether similar patterns in the microbiota were observed when holding sleep issues constant. At a threshold of 70% prevalence, 38 ASVs and 2,868,689 resultant sequences were retained with an out-of-bag error rate of 0% for the classification of SI when considering only those individuals with sleep issues (Fig. 2d). The most important ASVs for the classification are listed (Fig. 2e) and include members of M. micronuciformis, Rothia dentocariosa, F. periodonticum, P. melaninogenica, C. concisus, P. salivae, Eubacterium nodatum group sulci, H. parainfluenzae, V. atypica, G. adiacens, S. parasanguinis and undetermined species of Lautropia, Stomatobaculum, Saccharimonadales TM7x, Catonella, Porphyromonas, Bergeyella, Peptostreptococcus, Lachnoanaerobaculum, Candidatus Saccharimonas, Atopobium, and Veillonella, with the mean accuracy for the top 10 most important ASVs provided in Fig. 2f.

Removal of the thirteen individuals from the matched NOSIsleep cohort with the highest PHQ-9 scores (five subjects with moderately severe to severe PHQ-9 total scores and eight subjects with moderate PHQ-9 total scores) did not significantly change the classification error. As with the original model, the top ten ASVs at 70% prevalence included ASV19, 20, 110, 55, 52, 94, 116. However, Granulicatella adiacens (ASV22), Veillonella (ASV7), and Streptococcus oralis (ASV3) comprised the remaining ASVs with the highest contribution to this model in place of Rothia dentocariosa (ASV91), Haemophilus parainfluenzae (ASV11), and Streptococcus parasanguinis (ASV39).

Salivary microbiota abundances are associated with HLA haplotypes linked to SI

Several notable bacterial species were associated with the HLA haplotypes linked to increased SI incidence in this cohort. Bacterial mean relative abundances are provided in Supplementary Table 9.

In subjects with DQ2.2 and DQ2.5, we observed higher relative abundances of M. micronuciformis (\(\overline{x}\)Not DQ2.2 and DQ2.5 = 2.85 and \(\overline{x}\)DQ2.2 and DQ2.5 = 4.60, p = 0.026) and 9× higher counts of Veillonella rogosae (\(\overline{x}\)Not DQ2.2 and DQ2.5 = 0.135 and \(\overline{x}\)DQ2.2 and DQ2.5 = 0.994, p = 0.001). M. micronuciformis was also elevated in those who were not DR4 and DQ8 (3.1% relative abundance). In these individuals, 2–4× higher relative abundances of Haemophilus haemolyticus and Haemophilus pittmaniae were also observed.

Those who were DR4–DQ8 had elevated relative abundances of Enterobacter kobei (\(\overline{x}\)Not DR4–DQ8 = 0.029 and \(\overline{x}\)DR4–DQ8 = 1.395, p = 0.013) and Klebsiella aerogenes (\(\overline{x}\)Not DR4–DQ8 = 0.078 and \(\overline{x}\)DR4–DQ8 = 3.3056, p < 0.0001). Moreover, both bacterial species had a mean abundance of 0 in individuals with DQ2.2 and DQ2.5. A 1.5-fold elevation of Haemophilus pittmaniae and nearly fourfold elevation of Veillonella rogosae were observed in the absence of DR4–DQ8.

In subjects homozygous for DRB1*03, higher abundances of P. melaninogenica were observed compared to heterozygotes (\(\overline{x}\)DRB1*03 homozygotes = 8.731 and \(\overline{x}\)DRB1*03 heterozygotes = 4.025, p = 0.032) as well as in those who did not carry DRB1*03 (\(\overline{x}\)Not DRB1*03 = 3.892, p = 0.018). Those who were homozygous for DPA1*01 displayed higher abundances of Rothia mucilaginosa than heterozygotes (\(\overline{x}\)DPA1*01 homozygotes = 1.536 and \(\overline{x}\)DPA1*01 heterozygotes = 0.9458, p = 0.036). Neisseria mucosa was elevated in DRB1*15 homozygotes, both when compared to heterozygotes (\(\overline{x}\)DRB1*15 homozygotes = 1.822 and \(\overline{x}\)DRB1*15 heterozygotes = 0.454, p = 0.003) and when compared to those without the allele (\(\overline{x}\)Not DRB1*15 = 0.619, p = 0.007).

The mean relative abundance for Enterobacter kobei was observed at 0.0% but 0.35% in those who did not have the haplotype (Supplementary Table 9). Likewise, Serratia marcescens was not observed in DRB1*15 homozygotes and was present at 0.90% in those who did not have DRB1*15 and at 0.97% in those with only one allele.

Although at low abundance, after controlling for sex, dietary habits (e.g., daily consumption of fruit, regular consumption of nuts, dining at a fast-food hamburger restaurant more than once weekly, consumption of pasta or rice every day or at least five times a week), and sleep issues, several genera and species were significantly differentially abundant between those who were DRB1*15 (N = 78) and those who were not (N = 211). Of note, Tannerella was observed at significantly higher counts in those with DRB1*15, excepting one outlier who reported to eat sweets frequently (more than three times daily) and pasta or rice more than five times a week (Fig. 3a, p = 0.0011). Johnsonella was also observed higher frequency in those with DRB1*15 (Fig. 3b, p = 0.0059). Faecalibacterium prausnitzii, although rarely observed in the saliva, was not present in any individual endorsing SI, nor any individual with DRB1*15, both in the full cohort (N = 289) and when controlling for sex, diet, and sleep in a 2:1 propensity score match (N = 234, Fig. 3c).

Figure 3
figure 3

Selected genera and species that are associated with DRB1*1500 genotype and increased incidence of suicidal ideation (SI). Relative abundances of (a) Tannerella and (b) Johnsonella were compared across DRB1*1500 groups after controlling for sex, diet factors, and sleep using propensity score matching in a 1:2 ratio of those with or without DRB1*1500 (n = 156). (c) Relative abundance of ASVs with a closest cultured relative of Faecalibacterium prausnitzii. The two genotype groups are further categorized based on SI endorsement (Yes/No) at right. This species was not observed in any individual endorsing SI and only ever present in those without SI and not harboring the DRB1*15 allele. For all taxa at (ac), significance was calculated with a Wilcoxon test statistic between those with or without the DRB1*15 allele.

Salivary bacteria associated with SI after additionally controlling for HLA genetics

After controlling for sex, dietary habits, presence of sleep issues, and HLA genetics at DRB1*15, DQA1*01, DQB1*03, DPB1*05, and A*30, notable relative abundance differences were discovered in 17 genera and 21 species (Fig. 4). Fifteen genera were significantly more abundant in the NOSI controls, including Streptococcus, Prevotella, Haemophilus, Rothia, Fusobacterium, Alloprevotella, Granulicatella, Porphyromonas, Peptostreptococcus, Stomatobaculum, Gemella, Catonella, Solobacterium, and Dialister (Fig. 4a–c). Granulicatella and Rothia were rarely observed at an abundance > 1% in individuals with SI. The saliva microbiome was heavily dominated by Veillonella in individuals with SI (median ± SD relative abundance of 60.9% ± 21.1) compared to the NOSI matched controls (41.8% ± 22.2). Megasphaera was also significantly higher in individuals with SI.

Figure 4
figure 4

Differentially abundant genera and species in individuals with suicidal ideation (SI) vs individuals without SI (NOSI) after controlling for sex, diet, sleep, and HLA genetics. Wilcoxon statistics (N = 93, SI = 31, NOSI = 62) for relative abundances compared between SI and NOSI according to a 1:2 nearest neighbor selection using propensity score matching paradigm on sex, five dietary habits, presence of sleep issues, and HLA genetics at DRB1*15, DQA1*01, DQB1*03, DPB1*05, and A*30 (haplotypes that were earlier identified to either carry increased risk or appear to be protective for SI in the cohort). Relative abundance (%) is on the y axis. Significant results are presented here at the genus (ac) and species (df) taxonomic ranks.

Excepting Bacteroides dorei, Megasphaera micronuciformis, and Fusobacterium naviforme, all the identified significant species were higher in abundance in the matched NOSI controls (Fig. 4d–f). Streptococcus oralis, Prevotella melaninogenica, Haemophilus parainfluenzae, Fusobacterium periodoncitum, Rothia mucilagniosa, Streptoccocus salivarius, Prevotella salivae, and Prevotella nanciensis were all higher in NOSI matched controls, and Graulicatella adiacens and Streptococcus parasanguinis were rarely observed at an abundance > 1% in individuals with SI. Although generally less abundant in the saliva, Actinomyces pacaensis and Actinomyces graevenitzii were both observed at higher abundance in NOSI controls. Solobacterium moorei, Dialister pneumosintes, Corynebacterium durum, Actinomyces naeslundii, Peptoanaerobacter stomatis, and Prevotella micans were rarely observed at all in individuals with SI.


To our knowledge, this is the first investigation of oral microbiota associations with SI. Suicidal thoughts and behaviors are a strong risk factor for MDD onset39, and the converse is also true: MDD increases risk for SI. Discovering links to oral microbiota may help us elucidate novel therapeutic targets for MDD and other neuropsychiatric conditions. Accumulating evidence supports the view that depression is an inflammatory disease, which motivated us to consider HLA genes and salivary microbiota which could contribute to the etiology of SI. Here, we identified novel HLA haplotypes that may be risk indicators or confer protection against SI, as well as salivary microbes that were differentially abundant across these haplotypes (Fig. 5).

Figure 5
figure 5

Summary of findings. Here we describe novel associations of diet, genetics, and the saliva microbiome with suicidal ideation (SI) in a cross-sectional study of 489 university students (65.0% female). A 12.3% incidence of SI was observed in this cohort. Students provided a 2-mL saliva sample for microbiome and genetic analysis and completed two brief surveys electronically. Dietary patterns were assessed using the KIDMED survey and SI determined by dichotomized responses to Question 9 of the Patient Health Questionnaire-9 (PHQ-9). Created with

Across the analytical approaches applied in this investigation, Megasphaera micronuciformis was found to be elevated in individuals with SI. Among the genera that comprise the mixed species signature of periodontitis, confirmed by both culture-dependent and culture-independent methods31, is the lactic acid producing genus, Megasphaera, which is also elevated in periodontal disease40. Although certain Megasphaera species may be commensal, such as Megasphaera sp. Strain DISK18 which does not possess any virulence determining genes41, M. micronuciformis has been implicated in various human diseases including bacterial vaginosis42 and cancer43,44, with evidence to support its role in increasing elevation and cell death42. It has been isolated from gut, oral, and female reproductive tract42,43,45. Megasphaera micronuciformis has been implicated in various cancers, where it is a putative microbial biomarker found commonly on tongue coatings in gastric cancer patients43, along with Selenomonas sputigena ATCC 35185, and is in elevated abundance in the gut microbiota of locally advanced lung cancer chemotherapy treatment non-responders44. M. micronuciformis has been shown to elevate inflammatory mediators, induce cytotoxicity, and lead to accumulation of cell membrane lipids in a model of vaginal ecosystems where Veilloneallaceae members including M. micronuciformis were shown to alter the microenvironment of the cervix42. Selenomonas was also found at higher abundance in individuals with SI. This is a genus that is elevated in generalized aggressive periodontal disease46. Another notable bacterium observed in higher abundances in individuals with SI included Bacteroides dorei (log2fold change of 4.0 when compared to all individuals without SI). This bacterium has been implicated as a potential biomarker for future autoimmunity, where it dominates the gut microbiome of infants prior to seroconversion47.

Two ASVs that were more likely to be present in individuals with SI and were found at elevated abundance have high shared sequence identity with a newly proposed species of Veillonella, Veillonella nakazawae T1-7 (AP022321.1 and LC512486.1)48. Strain T1–7 produces acetic and propionic acids as fermentation end-products. Its catalase production and partial sequence for dnaK, rpoB, and citrate synthase (gltA) distinguish the species from other members of Veillonella, suggesting a novel lineage within the genus. The literature on the oral microbiome and its connection to depression is quite limited, with only one recent publication in this area. The investigation found that Neisseria spp. and Prevotella nigrescens were more abundant in 40 depressed young adults compared to matched controls49. This study did not control for sleep or diet factors, as we have done in the current report, and utilized a six-item instrument adapted from a diagnostic interview which did not capture suicidal behaviors or SI. We controlled for sleep and diet factors, and found decreased abundance in individuals with SI, while the authors found slightly increased abundance in their depression cohort.

Saliva is a major nutrient source for the surface coating of the tongue, contributing glycoproteins, amino acids, and peptides. Some bacteria including Actinomyces, Veillonella, and Fusobacterium are capable of degrading short-chain fatty acids into ammonia, sulfur, and indoles, while the latter two and Lactobacillus are also capable of lactate conversion into weaker acids, which can neutralize acid in the mouth50. Although Fusobacterium periodonticum and Fusobacterium as a genus were both overall found in greater abundance in NOSI, one Fusobacteriaceae species, Fusobacterium naviforme, was found higher in individuals with SI. Veillonella atypica was observed in elevated abundances in individuals with SI. In the present report, two Lactobacilliaceae species, G. adiacens and S. parasanguinis, were observed at higher abundance in the absence of SI. All associations occurred after propensity matching on sex and diet, and irrespective of sleep as a potential confounder.

Among the bacteria associated more with the microbiome of the controls rather than those with SI were Granulicatella, R. mucilaginosa, Haemophilus parainfluenzae, Streptococcus, and N. perflava, which is considered a commensal. These findings are consistent with the current literature investigating the healthy oral cavity. R. mucilaginosa is a Gram-positive, facultative anaerobe with weak catalase activity and is a common resident of the natural microbiota of the upper respiratory tract and oropharynx51,52. R. mucilaginosa was detected at 3.41% of the aerobic microbiota of the human oral cavity and has been isolated from pit, fissure, buccal surface plaque, buccal membrane surface, soft tissues of the oral cavity, and the tongue dorsum, where predominantly resides52. In a recent WGS analysis of healthy individuals, H. parainfluenzae was found to be the most abundant species in the genus, as were P. melaninogenica, N. subflava, and R. dentocariosa of their respective genera, while Streptococci were the most abundant genus in the oral cavity53. Haemophilus, Streptococcus and Pseudomonas were found to be present on the tongue only in low abundance in patients with liver carcinoma and were more abundant in controls54. Likewise, Granulicella was significantly elevated in lung cancer chemotherapy responders versus non-responders33. Here Granulicatella, specifically Granulicatella adiacens, was characteristic of individuals without SI all major comparative approaches: (1) SI versus all individuals without SI (NOSI); (2) SIsleep versus NOSIsleep when assessing only those with sleep issues, irrespective of the inclusion or exclusion of NOSI individuals with moderate to moderately severe PHQ-9 scores; and (3) using propensity score matching on sex and diet variables, as well as sleep. Another Lactobacillales bacterium, Streptococcus parasanguinis, was discovered to be less abundant in individuals with SI, even after matching on sex, diet, and sleep. Streptococcus parasanguinis is a commensal bacterium of the oral cavity and gut55 and an early colonizer of the oral cavity56 with high prevalence. It possesses the ability to adhere to the oral cavity and bind to other bacteria directly due to its long peritrichous fimbrial surface structures57. Importantly, S. parasanguinis demonstrates antimicrobial properties in the oral cavity and has been shown to inhibit periodontopathogen growth in vitro through the production of hydrogen peroxide as the primary mechanism58, playing a role in the etiology of dysbiotic oral disease.

Increasing evidence of autoimmunity or immune dysfunction being implicated in neurobiological, neurodevelopmental and neuropsychiatric disorders, such as in schizophrenia14, motivated us to investigate whether HLA genes encoding proteins that regulate the immune system are associated with increased SI risk or protection and to explore bacterial associations with those haplotypes. Whether HLA genes are involved in psychiatric and neurodevelopmental disorders is not established, with contrasting conclusions. The role of HLA in depression is not clear or may have a moderate effect20. HLA genes or C4 complement loci59,60 have been implicated in intellectual disability, autism15,16,17,18,19, and schizophrenia61. Although the haplotype risk is inconclusive62, molecular mechanisms involving excessive complement activity may explain in part the synaptic reduction in schizophrenia61. Most recently, in a large dataset of over 65,000 individuals participating in the Danish Integrative Psychiatric Research Consortium, Nudel et al. were unable to replicate reported associations but found protective effects of DPB1*1505 on autism and intellectual disability15.

Here, those with DQ2.2 and DQ2.5 alleles were more than three times as likely to endorse SI, as were those who did not carry DR4–DQ8. None of the 55 subjects carrying the homozygous DR4–DQ8 allele expressed SI but was found in 51 of the 235 subjects (21.7%) without SI. DR4–DQ8 is associated with multiple autoimmune diseases including celiac disease and type 1 diabetes63. We are not aware of any connections between DR4–DQ8 and mental health. There are reports of connections between celiac disease and cognition, but these are controversial. The DR4–DQ8 positive association with no SI and celiac disease might predict that celiac disease would not be associated with mental health challenges. The DRB1*15 allele doubled the chances of SI in this cohort. DRB1*15 increases risk for multiple sclerosis63,64,65,66. A previous connection with a nervous disorder is consistent with a role for DRB1*15 in SI. DRB1*04 is negatively associated with schizophrenia15. DPB1*15 may be protective against intellectual disability and autism15. The associations in this cohort provide additional evidence that HLA may contribute to depressive etiologies.

In this work, we also assessed SNPs that have been associated with suicidality in published GWAS studies. Of the fourteen SNPs we tested, rs10437629 was significantly associated with incidence of SI in this cohort. This SNP has been previously identified as associated with attempted suicide6. We found that the homozygous GG or heterozygous AG alternative alleles were positively correlated with SI. This SNP is located within an intron of the gene coding for the membrane bound KIAA1549L protein, which suggests that the effect of this mutation may be regulatory. Furthermore, the KIAA1549L protein is enriched in the brain and parathyroid gland67. Succinate-producing Alloprevotella rava was only found in those without SI and only in those homozygous for rs10437629 AA alleles (Fig. 2b). Succinate has been shown to be produced by gut bacteria and is a known signaling molecule to the brain. Succinate has long been known as a short chain fatty acid that assists in glucose oxidation in the brain and in brain respiration68,69. Application of succinate to the brain can improve cerebral metabolism after traumatic brain injury70,71. Moreover, succinic acid is among several plasma metabolite neurotransmitters proposed to have diagnostic capability in MDD, found at significantly reduced levels in plasma of individuals with MDD in a metabolomics study72. Succinate levels have been reported to be 2.3 times higher in the saliva of males compared to the saliva of females73. The mechanism for higher levels of succinate in male versus female saliva is unknown. This is consistent within our cohort, where females were 1.94 times more likely to express SI than males.

Investigating suicidal behaviors at the university affords the opportunity to intervene in at-risk students prior to perceived and anticipated stress events, such as mid-term examinations, in a population where the incidence of suicidal ideation is high. Academic stress has been associated with increased interleukin-1B74, plaque75, and gingival inflammation76 and it is unknown whether the bacteria identified here mediate these associations. The microbiota may provide a potential therapeutic target for brain health and neurological disorders given their role in neuroinflammation25. Although we did not assess inflammatory markers, the present study identifies specific bacteria in the mouth that appear to occur more frequently or at higher abundances in individuals with SI and bring to question the mechanism by which such bacteria might confer risk or protection against these phenotypes. Chronic low-grade inflammation may increase risk for MDD. Basal and LPS-stimulated inflammatory markers predicted MDD symptoms over nine years in the Netherlands Study of Depression and Anxiety (NESDA) cohort of MDD patients77. Environmental selection of microbiota based on resilience and diversity may be partly responsible for the rise of autoimmune and inflammatory disorders78 but could also contribute to the etiology of depression. Understanding how these bacteria promote inflammation may give mechanistic insight into the diatheses of suicide, offering opportunity to characterize an individual’s susceptibility to stressors, including precipitating and perpetuating factors.

This work is the first to report associations of the oral microbiome with recent SI. It is important to note that the PHQ-9, upon which these results are based, only assessed for SI in the past 2 weeks. As such, it is possible that some individuals in the “No SI” control group had a history of SI or self-harm, which could alter the interpretation of results. History of self-harm was not addressed in this investigation, which would help elucidate whether the findings are specific to SI or to suicidal behaviors in general. Metrics such as prior history of SI and suicidal behaviors, antidepressant use, and comorbidities would benefit future extension and confirmation of these findings. Drug by gene interactions may increase risk of SI as a result of antidepressant treatment10. Data on antidepressant use must be included in future cohorts. Likewise, although several significant confounds of the microbiome, such as sex, sleep, and diet, were accounted for in this investigation, other data were not obtained, such as geography or ethnicity of the subjects. Because the present analyses are cross-sectional, we cannot speak to the chronology or causality of the factors that we have identified as potential risk indicators or being associated with increased incidence of SI. In this investigation, we assigned species names based on the closest cultured relative and 100% matching of the ASV using the 16S rRNA V3–V4 methodology. To be more confident in the precise taxonomy of the bacteria, metagenomic sequencing is warranted to interrogate bacterial functions and/or other sequencing methods such as the rrn operon to obtain longer reads. Investigating the neurophysiological consequences of these microbes in the pathology of MDD and suicidal behaviors will require longitudinal cohorts and further study on the role in precipitating and perpetuating inflammation.

Here, we identified novel connections between diet, genetics, and saliva microbiome composition in suicidal ideation in young adults, expanding on the existing evidence supporting the diathesis of suicidality. To our knowledge, this is the first study to demonstrate a role for genetic and salivary bacterial interactions in the etiology of suicidal ideations. Our work warrants further investigation into the role that rs10437629 and HLA play in SI, and whether these genotypes influence suicidality risk via the saliva microbiota or independently. Evidence supports the notion that depression is an inflammatory disease79, with at least one-quarter of patients with depression demonstrating low-grade inflammation80. Elucidating how the oral microbiota mediate or contribute to the inflammatory and metabolic status of the oral milieu may inform novel interventions or preventions for MDD and SI, especially given the emerging and important connection between the mouth and brain.


Study design

Undergraduate and graduate students taking Microbiology and Cell Science courses at the University of Florida voluntarily enrolled in the study between August 2018 and February 2020. The study was approved by and carried out in accordance with the University of Florida Institutional Review Board (IRB), with study approval IRB study #201801744. All methods were carried out in accordance with relevant guidelines and regulations.

Participants were recruited from Microbiology & Cell Science courses as approved by the IRB. The cohort was comprised of 489 total students of whom 318 were female (65.0%) and 171 were male; the average age was 21.37 ± 4.14 years. After providing their informed consent, participants completed the Mediterranean Diet Quality Index (KIDMED81) and the Patient Health Questionnaire (PHQ-938) electronically through Research Electronic Data Capture (REDCap)82. The KIDMED was initially developed for children and adolescents, but has been validated in college students as a measure of Mediterranean diet adherence, with adequate test–retest reliability83. The PHQ-9 is widely used in clinical and research settings, with a sensitivity of 88% (potentially higher84) and specificity of 88% for identifying major depression in respondents scoring ≥ 10. The 1-factor structure of the PHQ-9 has been validated in a sample of 857 U.S. college students, suggesting equivalent assessment across sex and racial/ethnic groups. Subjects provided a 2 mL saliva sample concurrently with the survey using the IsoHelix GeneFix DNA/RNA Collection kit (IsohelixTM DNA/RNA GFX-02 2 mL; Cell Projects Ltd., Harrietsham, UK), which stabilizes DNA and RNA long term at ambient temperature. Students who screened positive for suicidal ideation (SI) based on a non-zero response on question 9 of the PHQ-9 as well as those who scored > 19 on the PHQ-9 were immediately referred to the Counseling and Wellness Center at UF and those students who scored in the moderate to moderately severe range (10–19) were informed of mental health resources.

Depression levels were calculated from the PHQ-9 total score as previously documented by Kroenke et al.38 Binary (zero or non-zero) responses were considered in this manuscript for questions 3 and 9 of the PHQ-9, pertaining to sleep issues and SI, respectively. A non-zero response on PHQ-9 question 9 reflects that the participant denied having had “thoughts that [they] would be better off dead, or of hurting [themselves] in some way” for several days or more over the past 2 weeks38. A non-zero response on PHQ-9 question 3 reflects that the participant denied having “trouble falling or staying asleep, or sleeping too much” several days or more over the past 2 weeks.

Processing of saliva samples

Saliva samples were stored at ambient temperature per the manufacturer’s instructions. These kits are designed to collect a 2 mL saliva sample into a 2 mL mix of proprietary stabilization and lysis buffer pre-filled into the collection kit. The optimized stabilization buffer allows for long-term storage of DNA and RNA at ambient temperature. DNA was extracted and stored at 4C. Extraction was carried out in accordance with the Isolation Protocol for GFX-02 GeneFix saliva samples, with the following revisions: incubation at 37 °C and an increased TE/DNA rehydration step to a duration of 95 min, and omission of purification steps 9–11 since purification was subsequently carried out with Qiagen or the Zymo Genomic DNA Clean & Concentrator kit. DNA was quantified, and 260/280 and 260/230 values were obtained with NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE). The DNA was then fractionated for 16S rRNA and human genetic analysis.

16S rRNA gene PCR and amplicon sequencing

The V3-V4 variable region of the bacterial 16S rRNA gene was polymerase chain reaction (PCR)-amplified with 341F (NNNNCCTACGGGAGGCAGCAG) and 806R (GGGGACTACVSGGGTATCTAAT) with 1 µM final concentration of Illumina fusion, using single-indexed, barcoded primers. Primers, purified with Qubit dsDNA High Sensitivity (Invitrogen, Life technologies Inc., Carlsbad, CA), and quantified with the 1X dsDNA High Sensitivity kit with a Qubit 2.0 fluorometer (Invitrogen, Life Technologies Inc., Carlsbad, CA as we have previously described85. Briefly, 50 ng of template DNA for each sample was taken into the PCR reaction. The templates were barcoded and adapted using custom primers during PCR amplification prior to pooling. Standard Illumina Read 1 sequencing/indexing primers were used to produce the amplicons. Each of the standard Illumina barcodes carries 11 unique nucleotides. PCR amplification was carried out with an initial denaturation of 95 °C for 3 min, 25 cycles of 95 °C for 45 s, 53 °C for 30 s, and 75 °C for 90 s, with a final step of elongation at 75 °C for 10 min. Those samples that did not initially amplify were subjected to barcode shotgun PCR. One sample did not amplify with barcode shotgun PCR and thus was excluded from the analysis. Fragment size was verified with gel electrophoresis, and the purified PCR products were quantified using the 1X dsDNA High Sensitivity kit with a Qubit 2.0 fluorometer (Invitrogen, Life Technologies Inc., Carlsbad, CA) via Qubit fluorometer.

PCR products were spin column purified using either a Qiagen kit or the Zymo Genomic DNA Clean & Concentrator kit, and then their concentration was determined using Qubit dsDNA High Sensitivity (Invitrogen, Life Technologies Inc., Carlsbad, CA). Equal masses of 40 ng DNA for each sample were pooled for Illumina MiSeq processing. The three pools of the barcoded, paired-end amplicons were then sequenced using the Illumina MiSeq with 2x300 cycles at UF’s Interdisciplinary Center for Biotechnology Research (ICBR) NextGen DNA Sequencing facility.

Amplicon processing

Forward and reverse reads were merged using Qiime1 using the script. Demultiplexing was performed either with Qiime1 ( and or the BBMap pipeline.

The sample inference program DADA286 was employed in R for filtering, trimming, and taxonomic assignment of the amplicon sequence variants (ASVs) with single-nucleotide resolution. DADA2 has demonstrated fewer false positive sequence variants than OTU-based methods, and its resolution of biological differences allows for exact sequence inference. It has been suggested that use of ASVs with exact matching (or 100% identity) is an appropriate method for assigning bacterial species to amplicons generated by high-throughput 16S rRNA techniques86. However, species assignments here are based on the closest cultured relatives in the Silva database. Reads shorter than 420 bp were discarded, as were reads that matched against the phiX genome. Amplicons were truncated to remove low quality tails and the 21 bp forward primer was removed, resulting in reads with 399 nucleotides in length. Reads with at least one ambiguous nucleotide were filtered out. Reads were truncated at the first instance of a quality score ≤ 11. The maximum error allowed in a read was set to 187. Error rates were learned by DADA2’s machine learning algorithm. The resulting reads from the four sequencing pools were merged and chimeras removed. Between 95.47 and 98.43% of sequence variants remained in each pool after chimera removal.

In aggregate, the 372 samples for whom we had microbiome data represented 9352 unique ASVs (6125 without chimeras) with 96.83% of the sequence variants remaining after chimera removal. A total of 13,259,769 reads were present in the dataset, with an average of 35,644.54 (median 33,751) reads per sample. The fewest number of reads in a sample was 6506.

Taxonomy was assigned in DADA2 with the assignTaxonomy and addSpecies functions, using the Silva version 138 database88. Species-level assignments were exclusively reserved for exact matching between the ASVs and reference strains, which is considered optimal for identity thresholds using 16S rRNA sequencing data89. Full sequences corresponding to the 6135 ASVs are provided as a reference in Supplementary Table 10, along with the taxonomic assignment from SILVA.

Human genotyping and HLA imputations

Extracted DNA was purified using either a Qiagen kit or the Zymo Genomic DNA Clean & Concentrator kit (Genomic DNA Clean & Concentrator-10; Zymo Research, Irvine, CA USA) for high yield ultra-pure and large fragment recovery. DNA concentration was determined using Qubit dsDNA High Sensitivity (Invitrogen, Life technologies Inc., Carlsbad, CA). Purified samples were stored at 4 °C. 100 ng of DNA at a concentration of 5 ng/uL were plated and then sequenced using the Axiom Precision Medicine Research Array (PMRA) (Applied Biosystems, ThermoFisher Scientific) against a standard control for microarray analysis according to the PMRA manufacturer’s instructions. The PMRA has a genome-wide imputation grid that covers over 902,560 single-nucleotide polymorphism (SNP) markers. The sequencing was performed according to the protocol described in the Axiom 2.0 Assay Manual Workflow User Guide (Thermo Fisher Scientific Inc.) with the GeneTitan MC Instrument. We generated call codes from the SNP data of the 310 subjects whose samples were sequenced by the PMRA and met quality control cut-offs, using the Axiom Array Suite Software (Thermo Fisher Scientific Inc.).

HLA haplotypes were imputed from the SNP genotype data with the Axiom HLA Analysis Suite version 1.2 (Thermo Fisher Scientific Inc.). This imputation determines the HLA type of 11 major MHC Class I and Class II loci with 2-digit and 4-digit resolution over an extended major histocompatibility (MHC) region that affords accurate HLA typing from SNPs90, using a multi-population reference panel and HLA imputation model to infer HLA from Affymetrix genotyping arrays91. The HLA*IMP:02 model is tolerant of missing data and performs at 2-digit resolution with an accuracy of over 90% across most ethnicities (and with 95% accuracy at 4-digit resolution on diverse European panels). HLA markers were evaluated at 2-digit resolution.

Statistical analyses

Continuous variables (age, KIDMED and PHQ-9 total scores) were compared by sex and across depression severity levels using a Mann–Whitney U or Kruskal Wallis test after testing the dataset for normal distribution. Correlations between the sixteen binary dietary habits and HLA haplotypes with recent SI (non-zero scores on PHQ-9:9) were conducted with chi-square tests. These tests were carried out in SPSS version 27 (IBM). All statistical tests were two-tailed.

HLA Class II DQA1*01, DQA1*03, DQB1*03, and DRB1*015 associations with increased incidence of SI

HLA Class I and Class II haplotype distributions for A*01, A*02, DPA1*01, DPA1*02, DPA1*03, DPA1*04, DPB1*01, DQA1*01, DQA1*01, DQA1*01, DQA1*05, DQB1*03, DQB1*05, DRB1*03, DRB1*04, DRB1*15, DRB5*01, and DRB5*99 were compared between the NOSI and SI groups using a Pearson chi-square statistic in SPSS. For those haplotypes with lower incidence, subjects with 1–2 alleles of the haplotype were combined. Individuals who had at least one allele each of DQA1*02 and DQB1*02 were designated DQ2.2, and those with at least one allele each of DQA1*05 and DQB1*02 were designated DQ2.5. Those who were DQA1*03 and DQB1*03 were designated DQ8, and those with DRB1*04 and DQ8 designated DR4–DQ8.

Human SNPs associated with increased incidence of SI and subsequent testing of associations with bacterial abundances

There are sixteen single nucleotide polymorphisms (SNPs) assessed by the PMRA and with an EBI trait inclusive of the key word “suicide” or “suicidal.” These SNPs are rs12373805, rs17387100, rs11143230, rs4918918, rs300774, rs358592, rs2462021, rs10437629, rs2610025, rs10748045, rs10448044, rs3781878, rs10854398, rs17173608, rs4732812, and rs7296262. The MAF of each SNP was calculated within the cohort (Supplementary Table 11). Fourteen of the SNPs were observed at a MAF ≥ 0.010, while rs17387100 and rs17173608 were below. The genotypes at these fourteen loci (excluding rs17387100 and rs17173608) were assessed in the cohort with respect to the presence of SI using 3 × 2 and 2 × 2 chi square tests based on dosage and then presence of the minor allele (homozygotes and heterozygotes combined). Kruskal Wallis statistics were then conducted to identify whether and which genera and species were differentially abundant in individuals carrying the risk allele for those SNPs that we found to be associated with SI.

Bacterial statistical analyses

Confounders of the saliva microbiome

To determine which diet features differed in saliva microbial composition and whether there was microbial divergence among those with or without SI, a PERMANOVA based on Bray–Curtis distance matrix was conducted using the adonis function of the vegan R package on the compositionally transformed counts of the 6,135 ASVs that we identified across the 372 samples. The computation was carried out with 1000 permutations. Reads were compositionally transformed using the microbiome R package.

Microbiome diversity

Global ecosystem state measures including richness92, evenness , and diversity were calculated using the alpha function of the microbiome R package and compared across the Nsleep cohort, comprised of subjects with sleep issues only (Nsleep = 229 with SIsleep = 43 and NOSIsleep = 186), using a non-parametric Wilcoxon test in ggpubr.

Differentially expressed ASVs in individuals with SI

To assess differential expression of the 6,135 ASVs across SI (n = 47) and various depression categories, we first used an RNA-seq based method for assessing a negative binomial distribution by maximum likelihood estimation, DESeq293 which has been adapted for microbiome data94. For this approach, the log geometric means were calculated using the unnormalized counts of the 6,135 ASVs and applied a pseudo-count of 1. Independent filtering was applied in the DESeq2 package to optimize the adjusted p value by hypothesis weighting95.

Four comparative groups were compared against SI using this approach: (1) all subjects without SI (NOSI, n = 325), (2) subjects without SI and with a PHQ-9 total score of 0–4 (no or minimal depression, NODEP, n = 157), (3) subjects without SI and with PHQ-9 of 5–9 (mild depression, MILD, n = 128), (4) subjects without SI and with a PHQ-9 score ≥ 10 (high likelihood of depression, DEP, n = 40). For each model, outliers were replaced and 2076 (NOSI), 1369 (DEP), 1813 (NODEP), 1746 (MILD) ASVs were refitted for differential abundance analyses. A local smoothed dispersion fit was applied to estimate the dispersions, and the Wald test used to assess statistical significance. Prevalence differences were calculated using a chi-square test in a SI (n = 47) versus NOSI (n = 325) comparison.

Network analysis of salivary microbiota

Genus-level networks were assessed using a Spearman correlation in MicrobiomeAnalyst96. To control for sleep issues, the correlation was carried out with the Nsleep restricted cohort (Nsleep = 229).

Differentially abundant closest species and genera in individuals with SI after controlling for sex, diet, and sleep

ASVs were combined at higher taxonomic ranks using the tax_glom function of phyloseq. 60.7% of ASVs were identifiable with a closest microbiome relative at the species level in SILVA. Of the 560 species observed in the saliva, 177 were found in at least 2% of the cohort and were retained for differential abundance analysis. Of the 227 genera observed in the saliva, 118 were present in at least 2% of the subjects.

To offset potential over-dispersion that could result from application of RNA-seq paradigms to microbiome datasets97, differential abundances were also modeled with nonparametric approaches after matching on potential confounders. Propensity score matching was employed using the matchit function of the MatchIt R package to match SI and NOSI samples on confounders of SI or the microbiome: sex, daily fruit consumption, regular consumption of pasta and rice, dining at a fast-food hamburger restaurant more than once weekly, and regular consumption of nuts, since we found that the former and latter are associated with increased risk of SI and the other variables are possible confounders for the saliva microbiome. Propensity score matching was employed at both a 1:2 and 1:3 ratio of SI to NOSI controls. Given that individuals with SI were significantly more likely to report sleep issues on the PHQ-9 (91.5% SI vs. 57.2% NOSI), propensity score matching was again applied to additionally account for sleep differences using a dichotomous classification of response to Q3 of the PHQ-9. In a third model controlling for HLA genetics, propensity score matching was applied on the previously mentioned variables (sex, five dietary habits, and presence of sleep issues) as well as DRB1*15, DQA1*01, DQB1*03, DPB1*05, and A*30 genotype at a 1:2 ratio of SI to NOSI controls. Only individuals with a genotype at all five haplotypes were included in this model, for an N of 93 (SI n = 31, NOSI n = 62).

For the propensity score matching analyses, Kruskal–Wallis or Mann–Whitney tests were conducted in R across the 177 species’ and 118 genera’s percent relative abundances to identify significant differentially abundant features along each of the matching paradigms.

In a restricted dataset comprised only of those individuals who reported sleep issues (Nsleep = 229), DESeq2 was run again at the genus and species levels to identify differentially abundant taxa. Of the 227 and 560 genera and species respectively, 75 genera and 219 species passed the internal filtering of the DESeq2 program. These were considered in the final dispersion estimates and model fitting. Those who reported sleep problems and were included in this restricted Nsleep dataset.

Core microbial differences in individuals with SI after matching on sleep issues

Microbial communities were defined in the SIsleep and NOSIsleep cohorts selected in the 2:1 propensity score match. In the 47 SIsleep and 94 NOSIsleep individuals, a total of 1960 and 2881 unique ASVs were observed respectively. Prevalence thresholds were determined with out-of-bag error rates for the classification of SI and the ASVs comprising the prevalence threshold resulting in 0% error using the PIME package98 in R.

Salivary species and genera associated with HLA haplotypes linked to SI

With a two-sided means test in SPSS v27 (IBM), the relative abundances of filtered species and genera were compared across the HLA haplotypes linked to SI risk (DQ2.2 and DQ2.5, DQ8, DR4–DQ8, and DRB1*15).