Transcriptome analysis in whole blood reveals increased microbial diversity in schizophrenia

The role of the human microbiome in health and disease is increasingly appreciated. We studied the composition of microbial communities present in blood across 192 individuals, including healthy controls and patients with three disorders affecting the brain: schizophrenia, amyotrophic lateral sclerosis, and bipolar disorder. By using high-quality unmapped RNA sequencing reads as candidate microbial reads, we performed profiling of microbial transcripts detected in whole blood. We were able to detect a wide range of bacterial and archaeal phyla in blood. Interestingly, we observed an increased microbial diversity in schizophrenia patients compared to the three other groups. We replicated this finding in an independent schizophrenia case–control cohort. This increased diversity is inversely correlated with estimated cell abundance of a subpopulation of CD8+ memory T cells in healthy controls, supporting a link between microbial products found in blood, immunity and schizophrenia.


Introduction 46
eukaryotes, bacteria, archaea and viruses. In recent years, mounting evidence has 48 demonstrated the involvement of the microbiome in human health and disease. In particular, 49 through the 'microbiota-gut-brain-axis' (1, 2), the microbiome has been implicated in complex 50 psychiatric disorders, including schizophrenia and major depressive disorder(3-8), possibly via 51 an impact on intestinal permeability(9). 52 High-throughput sequencing offers a powerful culture-independent approach to study 53 the underlying diversity of microbial communities in their natural habitats across different 54 human tissues (10) and diseases (3,(11)(12)(13)(14)(15). The majority of current microbiome studies use 55 fecal samples and target 16S ribosomal RNA gene sequencing (16). With the availability of 56 comprehensive compendia of reference microbial genomes and phylogenetic marker genes 57 (17), it has become feasible to use non-targeted sequencing data to identify the microbial 58 species across different human tissues and diseases in a relatively inexpensive and easy way. 59 Other than in cases of sepsis, we currently lack a comprehensive understanding of the 60 human microbiome in blood, as blood has been generally considered a sterile environment 61 lacking proliferating microbes (18). However, over the past few decades, this assumption has 62 been challenged (19,20), and the presence of a microbiome in the blood has received 63 increasing attention (21)(22)(23). 64 To explore potential connections between the microbiome and diseases of the brain, we 65 performed a comprehensive analysis of microbial products detected in blood in almost two

Sequence Analysis 101
We separated human and non-human reads, and use the latter as candidate microbial reads for 102 taxonomic profiling of microbial communities. To identify potentially microbial reads, we 103 developed the following pipeline. First, we filtered read pairs and singleton reads mapped to 104 the human genome or transcriptome. Because total number of reads may affect microbial 105 profiling, we performed normalization by sub-sampling to 100,000 reads for each sample. Next, 106 we filtered out low-quality and low-complexity reads using FASTX and SEQCLEAN (see urls). 107 Finally, the remaining reads were realigned to the human references using the Megablast 108 aligner (27) in order to exclude any potentially human reads. The remaining reads were used as 109 candidate microbial reads in subsequent analyses. Figure 1 displays an overview of our pipeline. 110

Taxonomic profiling 111
To access the assembly and richness of the microbiomial RNA in blood, we used phylogenetic 112 marker genes to assign the candidate microbial reads to the bacterial and archaeal taxa. We 113 used PhyloSift (v 1.0.1 with default parameters) to perform taxonomic profiling of the whole 114 blood samples (17). PhyloSift makes use of a set of protein coding genes found to be relatively 115 universal (i.e., present in nearly all bacterial and archaeal taxa) and have low variation in copy 116 number between taxa. Homologs of these genes in new sequence data (e.g., the 117 transcriptomes used here) are identified and then placed into a phylogenetic and taxonomic 118 context by comparison to references from sequenced genomes. For our replication study, we 119 used MetaPhlAn for microbial profiling v.1.7.7(28). MetaPhlAn was run in 2 stages; the first 120 stage identifies the candidate microbial reads (i.e., reads hitting a marker) and the second stage profiles meta-genomes in terms of relative abundances. We used MetaPhlAn, rather than 122 PhyloSift, due to differences in library preparation (polyA enrichment versus Ribo-Zero); there 123 were an insufficient number of reads matching the database of the marker genes curated by 124 PhyloSift for adequate microbial profiling of the replication sample. 125 126

Estimating Microbial diversity 127
Microbial diversity, or alpha diversity, within each sample was determined using the inverse 128 Simpson index. This index simultaneously assesses both richness (corresponding to the number 129 of distinct taxa) and relative abundance of the microbial communities within each sample (29). 130 In particular, it enables effective differentiation between the microbial communities shaped by 131 the dominant taxa and the communities with many taxa with even abundances (30) (asbio R 132 package). To measure sample-to-sample dissimilarities between microbial communities, we use 133 Bray-Curtis beta diversity index, which accounts for both changes in the abundances of the 134 shared taxa and for taxa uniquely present in one of the samples (vegan R package). Higher beta 135 diversity indicates higher level of dissimilarity between microbial communities, providing a link 136 between diversity at local scales (alpha diversity) and the diversity corresponding to total 137 microbial richness of the subject group (gamma diversity (31)). 138 139

Statistical analysis of microbiome diversity 140
To test for differences in alpha diversity between disease groups, we fit an analysis of 141 covariance (ANCOVA) model using normalized values of alpha, including sex and age, and

Reference-free microbiome analysis 149
We complement the reference-based taxonomic analysis with a reference independent 150 analysis. We use EMDeBruijn (https://github.com/dkoslicki/EMDeBruijn), a reference-free 151 approach capable of quantifying differences in microbiome composition between the samples. 152 EMDeBruijn compresses the k-mer counts of two given samples onto de Bruijn graphs and then 153 measures the minimal cost of transforming one of these graphs into the other. To determine 154 overlap between the results from PhyloSift and EMdeBruin, we correlated principal 155 components of EMdeBruin and PhyloSift by Spearman rank correlation, including all samples. 156 157

Estimation of cell proportions in whole blood 158
We assessed DNA methylation data from 65 controls taken from our replication sample, and we 159 compared methylation-derived blood cell proportions estimated using Houseman's estimation 160 method (32, 33) to alpha diversity after adjusting for age, gender, RIN and all technical 161 parameters. We tested whether alpha diversity levels are associated with cell type abundance 162 estimates. More details on the method, quality control pipeline of the methylation data and 163 statistical analysis can be found in Supplementary Methods.

Studying microbial RNA in blood 166
To study the composition of microbial RNA in blood, we determined the microbial meta-167 transcriptome present in the blood of unaffected controls (Controls, n=49) and patients with 168 three brain-related disorders: schizophrenia (SCZ, n=48), amyotrophic lateral sclerosis (ALS, 169 n=47) and bipolar disorder (BPD, n=48) ( Figure 1, Table 1). 170 [ Figure 1 about here] 171 [ Table 1 about here] 172 173 Using our filtering pipeline, an average of 33,546 of 100,000 unmapped reads are identified as 174 high quality, unique non-host reads and were used as candidate microbial reads in our analyses. 175 From these, PhyloSift was able to assign an average of 1,235 reads (1.24% ± 0.41%, mean ± 176 standard deviation) to the bacterial and archaeal gene families. A total of 1,880 taxa were 177 assigned, with 23 taxa at the phylum level ( Figure 2). Most of the taxa we observed derived 178 from bacteria (relative genomic abundance 89.8% ± 7.4%), and a smaller portion derived from 179 archaea (relative genomic abundance 12.28% ±6.4%). abundance 73.4% ± 18.3%, 14.9 ±10.9%, and 11.0% ± 8.9% (Table S2). This is in line with recent 188 published work on the blood microbiome using 16S targeted metagenomic sequencing 189 reporting relative abundance of 80.4-87.4% and 3.0-6.4% for Proteobacteria and Firmicutes,190 respectively (23).
The other two phyla identified in this study (Actinobacteria and 191 Bacteroidetes) were also detected in our sample in more than 25 individuals. Although 192 Proteobacteria and Firmicutes, are commonly associated with the human microbiome (34) To compare the inferred microbial composition found in blood with that in other body 204 sites, we used taxonomic composition of 499 meta-genomic samples from Human Microbiome 205 Project (HMP) obtained by MetaPhlAn or five major body habitats (gut, oral, airways, and skin) 206 (10). Of the 23 phyla discovered in our sample, 15 were also found in HMP samples, of which 13 are confirmed by at least ten samples. Our data suggest that the predominant phyla detected in 208 blood are most closely related to the known oral and gut microbiome (Table S2). Comparing the 209 microbial composition of whole blood with the microbiome detected in atherosclerotic plaques 210 (37), we observe that the four phyla that together make up for >97% of the microbiome in 211 plaques are also identified in our sample (Firmicutes, Bacteroidetes, Proteobacteria, 212 Actinobacteria). 213 Finally, it should be noted that the sequencing technology does not allow for 214 identification of the origin of microbial RNA. That is, we can not distinguish whether the 215 observed microbial signatures in blood are originate from bacterial communities actually 216 present in the blood, or whether the RNA crossed into the blood stream from elsewhere. 217 218

Increased microbial diversity in schizophrenia samples 219
To evaluate potential differences in microbial profiles of individuals with the different disorders 220 (SCZ, BPD, ALS) and unaffected controls, we explored the composition and richness of the 221 microbial communities across the groups. 222 We observed increased alpha diversity in schizophrenia samples compared to all other 223 groups (ANCOVA P < 0.005 for all groups, Figure 3a, Table 2 and Table S5, Bonferroni 224 correction). These differences are corrected for covariates and are independent of potential 225 confounders, such as experimenter and RNA extraction run ( Figure S1 and S2), and they are not 226 the consequence of a different number of reads being detected as microbial in schizophrenia 227 samples (see Supplementary Results). No significant differences were observed between the three remaining groups (BPD, ALS, Controls). In our sample, alpha diversity was found to be a 229 significant predictor of schizophrenia status and explained 5.0% of the variation as measured by 230 reduction in Nagelkerke's R 2 from logistic regression. We observe no correlation between 231 polygenic risk scores (39) and alpha diversity in our schizophrenia sample (n= 32, Kendall's tau= 232 0.008, P = 0.96, Supplementary Methods). We also did not observe differences in alpha 233 diversity between sexes or across ages, nor are our results driven by the relatively younger 234 schizophrenia cohort (Supplementary Results). Alpha diversity at other main taxonomic ranks 235 yields a similar pattern of increased diversity in schizophrenia ( Figure S3). 236 The increased diversity observed in schizophrenia patients may be due to specific phyla 237 characteristic to schizophrenia, or due to a more general increased microbial diversity in people 238 affected by the disease. To investigate this, we compared diversity across individuals within the 239 schizophrenia group to control samples. We compared beta diversity across pairs of samples 240 with schizophrenia and controls, resulting in three subject groups: SCZ_Controls, SCZ_SCZ and 241 Controls_Controls. The lowest diversity was observed in the Controls_Controls group (0.43 ± 242 0.21), followed by SCZ_SCZ (0.50 ± 0.14), and the highest beta diversity values were observed 243 for SCZ_Controls (0.51 ± 0.17) (P< 0.05 for each comparison, by ANCOVA after correcting for 244 three tests). This result was confirmed by permanova (P<0.001) based on 1,000 permutations. 245 Thus, the observed increased alpha diversity in schizophrenia is not caused by a particular 246 microbial profile, but most likely represents a non-specific overall increased microbial burden 247 (see also Figure S4 and Supplementary Results). 248 In addition to measuring individual microbial diversity (alpha), and diversity between 249 individuals (beta), we measured the total richness of the microbiome by the total number of distinct taxa of the microbiome community observed within an entire subject group (gamma 251 diversity (40)). We observed that all 23 distinct phyla are observed in schizophrenia: 252 gamma(SCZ)=23 compared to gamma(Controls)=20, gamma(ALS)=16 and gamma(BPD)=18. 253 We complemented reference-based methods (PhyloSift and MetaPhlAn) with 254 EMDeBruijn, a reference-independent method. EMDeBruin distances measured between 255 samples correlated significantly with beta diversity (Spearman rank P < 2.2e-16, rho = 0.37, 256 including SCZ and Controls). Also, EMDeBruijn PCs correlated with principal components 257 obtained from edge PCA based on the PhyloSift taxonomic classification (correlation between 258 EMDeBruijn PC1, and PhyloSift PC1 is P = 1.824e-09; Spearman rank correlation is rho = -0.42; 259 see also Figure S5). After correcting covariates, the first three EMDeBruijn PCs are significant 260 predictors of schizophrenia status, and jointly explained 7.1% of the variance (P< 0.05 for each 261

Group differences of individual phyla 265
In addition to a global difference between schizophrenia and the other groups, we also 266 investigated whether there are particular individual phyla contributing to the differences 267 between schizophrenia and other groups. There are two phyla detected more often in 268 schizophrenia cases versus all the other groups: Plactomycetes, observed in 20 SCZ cases 269 compared to 3(ALS) 2(BPD) 5(Controls) (P= 0.0002 Fisher's exact for four groups, Bonferroni corrected for 23 tests P=0.0057) and Thermotogae, observed in 20 SCZ cases compared to 6 271 ALS, 3 BPD and 6 Controls (P= 0.0006 Fisher's exact, corrected P=0.014). No outliers were 272 observed for the other groups (see Table S7). 273 274

families. 279
Schizophrenia samples showed increased alpha diversity on genus level (2.73 ± 0.77 for 280 cases, versus 2.32 ± 0.57 for controls, corrected P = 0.003 Figure 3b) and explained 2.5% of 281 variance as measured by reduction in Nagelkerke R 2 , thus replicating our main finding of 282 increased diversity in schizophrenia. While our original analysis was performed on the phylum 283 level, in our discovery sample we observe a similar increase of diversity at the genus level (see 284 Figure S3). Similar to our discovery cohort, we observed no significant correlation between 285 alpha diversity and age or differences across gender. Beta diversity and EMDeBruijn analyses 286 also show similar, though not identical, patterns of nonspecific increased diversity in 287 schizophrenia samples (Supplementary Results). 288 289

Cell type composition and diversity 290
We hypothesized that differences in microbial diversity may be linked to whole blood cell type 291 composition. Our analysis shows that the proportion of one cell type, CD8 + CD28 -CD45RAcells, 292 is significantly negatively correlated with alpha diversity after correction for all other cell-count 293 estimates as estimated from whole blood DNA methylation data (correlation = -0.41, P=7.3e-4, 294 n= 65 Controls from the Replication study, Figure S6, Table S6). These cells are T cells that lack 295 CD8 + naïve cell markers CD28 and CD45RA and are thought to represent a subpopulation of 296 CD8 + memory T cells (41,42). We observed that low alpha diversity correlates with high levels 297 of cell abundance of this population of T cells. 298 299 300

Discussion 301
We used high throughput RNA sequencing from whole blood to perform microbiome profiling 302 and identified an increased diversity in schizophrenia patients. 303 While other studies of human microbiome using RNA-Seq have been conducted (43,44), 304 this is the first assessing the microbiome from whole blood by using unmapped non-human 305 reads. Despite the fact that transcripts are present at much lower fractions than human reads, 306 we were able to detect microbial transcripts from bacteria and archaea in almost all samples. 307 The microbes found in blood are thought to be originating from the gut as well as oral cavities 308 (45, 46), which is in line with our finding that the microbial profiles found in our study most 309 closely resemble the gut and oral microbiome as profiled by the HMP (10). The taxonomic 310 profile of the cohort samples suggests the prevalence of the several phyla, Proteobacteria, 311 Firmicutes and Cyanobacteria, across individuals and different disorders included in our study. This is in line with a recent study that used 16S targeted meta-genomic sequencing, which 313 reported Proteobacteria and Firmicutes among the most abundant phyla detected in blood 314

(23). 315
Rigorous quality control is critically important for any high-throughput sequencing 316 project, especially in the context of studying the microbiome (35). To this end, we performed 317 both negative and positive quality control experiments, and we carefully evaluated possible 318 contamination effects introduced during the experiments. Our results suggest that the detected 319 phyla represent true microbial communities in whole blood and are not due to contaminants. 320 However, it should be noted that whether only the microbial products crossed into the 321 bloodstream or whether the microbes themselves are present in blood cannot be answered 322 using sequencing techniques. Future experiments, for example, using microscopy, culturing or 323 direct measures of gut permeability, may be able to shed light on this question. 324

325
The most striking finding of our study that relates to diseases affecting the central 326 nervous system is the increased microbial alpha diversity in schizophrenia patients compared to 327 controls and the other two disease groups (ALS, BPD). We replicate this finding in an 328 independent cohort of schizophrenia cases and controls. The replication experiment, while 329 based on different library preparation (Ribo-Zero versus Poly(A)), provides strong evidence for 330 an increased alpha diversity of the microbiome detected in blood in schizophrenia and explains 331 roughly 5% of disease variation. We not only observe an increased individual microbial 332 diversity, but also an increased diversity between individuals (Beta diversity) with schizophrenia 333 compared to controls, rendering it unlikely that a single phylum or microbial profile is causing the disease-specific increase in diversity. Nevertheless, in our study we observed that two phyla 335 in particular, Planctomycetes and Thermotogae, were present in significantly more 336 schizophrenia samples when compared to the other groups. Interestingly, Planctomycetes is 337 group of gram-negative bacteria closely related to Verrucomicrobia and Chlamydiae; together 338 these comprise the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum (47). From 339 peripheral blood, infection with Chlamydiaceae species has been reported to be increased in 340 schizophrenia (40%) compared to controls (7%) (48). Since Chlamydiae is one of the taxa of the 341 superphylum, it is possible that the increase in Planctomycetes we observe is related to the 342 observed increase in Chlamydiaceae species. As the collection of available reference genomes 343 continues to grow and improve, future studies are needed to corroborate and refine these 344

findings. 345
For the study of microbiome diversity, we employed reference-based methods 346 (PhyloSift and MethPhlAn) and the EMDebruin method, a purely reference-agnostic approach. 347 The latter showed strong correspondence to both reference-based methods, highlighting the 348 value of this unbiased sequence-based analysis for investigating microbial differences across 349 groups. However, in addition to differences in distribution of microbial transcripts, EMDebruin 350 may capture variation of other, yet unknown, origin. 351 In addition to our observation that microbial diversity is more generally increased in 352 schizophrenia, our study demonstrates the value of analyzing non-human reads present in the 353 RNA-Seq data to study the microbial composition of a tissue of interest (49, 50). The RNA-Seq 354 approach avoids biases introduced by primers in targeted 16S ribosomal RNA gene profiling. In 355 addition, since mRNA stability is low in prokaryotes, RNA-Seq might offer a potential advantage of avoiding contamination of genomic DNA by dead cells compared to genome sequencing (51). 357 Given the many large-scale RNA-Seq datasets that are becoming available, we anticipate that 358 high-throughput meta-transcriptome-based microbiome profiling will find broad applications as 359 a hypothesis-generating tool in studies across different tissues and disease types. 360

361
The increased microbial diversity observed in schizophrenia could be part of the disease 362 etiology (i.e., causing schizophrenia) or may be a secondary effect of disease status. In our 363 sample, we observed no correlation between increased microbial diversity and genetic risk for 364 schizophrenia as measured by polygenic risk scores (52). In addition, it is remarkable that 365 bipolar disorder, which is genetically and clinically correlated to schizophrenia (53), does not 366 show a similar increased diversity. We did observe a strong inverse correlation between 367 increased diversity and estimated cell abundance of a population of T-cells in healthy controls. 368 Even though this finding is based on indirect cell count measures using DNA methylation data 369 (41), the significant correlation highlights a likely close connection between the immune system 370 and the blood microbiome, a relationship that has been documented before (54). More 371 extensive cell count measures and/or better markers of immune sensing of microbial products 372 could be used to study this relationship more directly. In the absence of a direct link with 373 genetic susceptibility and the reported correlation with the immune system, we hypothesize 374 that the observed effect in schizophrenia may be mostly a consequence of disease. This may be 375 affected by lifestyle and/or health status differences of schizophrenia patients, including 376 smoking, treatment plans, (chronic) infection, GI status, the use of probiotics, antibiotics and 377 other drug use or other environmental exposures. Future targeted and/or longitudinal studies with larger sample sizes, detailed clinical phenotypes and more in-depth sequencing are 379 needed to corroborate this hypothesis. Another interesting direction for future work is to study 380 gut permeability in the context of our findings more directly. For example, how does damage to 381 the gut (such as measured using I-FABP) affect observed microbial diversity in blood? These 382 studies would likely result in an expanded understanding of the functional mechanisms 383 underlying the connection between the human immune system, microbiome, and disease 384 etiology. In particular, we hope that these future efforts will provide a useful quantitative and 385 qualitative assessment of the microbiome and its role across the gut-blood barrier in the 386 context of psychiatric disorders. total RNA using ribo-depletion protocol. Reads that failed to map to the human reference 551 genome and transcriptome were sub-sampled and further filtered to exclude low-quality, low 552 complexity, and remaining potentially human reads. High quality, unique, non-host reads were 553 used to determine the taxonomic composition and diversity of the detected microbiome. See 554 also Table S1. 555 556 Figure 2. Relative abundances of microbial taxa at phylum level. Phylogenetic classification is 557 performed using PhyloSift, which is able to assign the filtered candidate microbial reads to the 558 microbial genes from 23 distinct taxa on the phylum level. increased diversity compared to all three other groups (ANCOVA P < 0.005 for all groups, after 564 adjustment of covariates, see also Methods, Table S5 and Figure S3). (B) Alpha diversity per 565 sample of schizophrenia cases and controls, measured using the inverse Simpson index on the