Metagenome-wide association of gut microbiome features for schizophrenia

Evidence is mounting that the gut-brain axis plays an important role in mental diseases, fueling mechanistic investigations to provide a basis for future targeted interventions. However, shotgun metagenomic data from treatment-naïve patients are scarce, hampering comprehensive analyses of the complex interaction between the gut microbiota and the brain. Here we explore the fecal microbiome based on 90 medication-free schizophrenia patients and 81 controls and identify a microbial species classifier distinguishing patients from controls with an area under the receiver operating characteristic curve (AUC) of 0.896, and replicate the microbiome-based disease classifier in 45 patients and 45 controls (AUC = 0.765). Functional potentials associated with schizophrenia include differences in short-chain fatty acid synthesis, tryptophan metabolism, and synthesis/degradation of neurotransmitters. Transplantation of a schizophrenia-enriched bacterium, Streptococcus vestibularis, appears to induce deficits in social behaviors and alters neurotransmitter levels in peripheral tissues in recipient mice. Our findings provide new leads for further investigations in cohort studies and animal models.
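As an illustration of how an AUC such as those reported above can be computed from classifier scores, the following minimal Python sketch uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen patient receives a higher score than a randomly chosen control. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration only; they are not the study's classifier or data.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (case, control) pairs in which the case
    has the higher score; tied pairs count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy, made-up example: 1 = patient, 0 = control (hypothetical scores).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the discovery (0.896) and replication (0.765) values are read.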


Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code
Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:

Sample size
A total of 240 mice were used for the animal study and 299 human fecal samples were used in the metagenome-wide association study. Sample sizes were chosen with reference to several previous studies with similar trial design and methods. Our sample sizes are larger than those of these studies and should therefore provide greater statistical power (n = 15/group for behavioral tests; n = 8 or 6/group for ELISA and chemical analysis; n > 80 subjects/group for the human cross-sectional study). Because the effect size of the treatment was difficult to predict, the exact statistical power could not be calculated, and we did not use any software to estimate it.

Data exclusions
When mice died or were injured, their behavioral test data were excluded.

Replication
For the human study, we validated the results in an additional 90 samples (the testing set), and the consistent findings are reported in the Results section. For the mouse study, all analyses were carried out in at least two batches of mice, which were investigated independently at different times using exactly the same procedure. Results that were consistent between the two experiments were regarded as true. When the results were contradictory, we carried out an additional batch to replicate the investigation, and the result that was consistent in two of the three experiments was regarded as true.

Randomization
The mice were randomly allocated into three groups using random numbers. For the human study, cases and controls were not randomly sampled. However, we investigated and collected most of the reported confounding factors that can affect our main outcome variable (gut microbiota diversity and composition), such as dietary habits, medication use, antibiotic use, sports activity, drug use, alcohol drinking, smoking, BMI, and serum metabolic biomarkers, among others. All of these potential confounding factors were included in the statistical analysis to control for their effects.

Blinding
The technician who performed the behavioral tests did not know the group assignment of the mice or which treatment they had received. The scientists performing the data analysis were also blinded to the grouping information, i.e., whether a participant was a case or a control and which treatment was given to the mice.
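The random allocation of mice into three groups using random numbers, as described above, could be sketched as follows. This is only an illustrative Python example, not the authors' procedure: the function name `allocate`, the mouse identifiers, the seed, and the total of 45 animals (chosen to give 15 per group, matching the behavioral test group size) are all assumptions.

```python
import random


def allocate(ids, n_groups=3, seed=None):
    """Shuffle the identifiers with a seeded random number generator and
    deal them out round-robin so that group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical identifiers: 45 mice split into three groups of 15.
mouse_ids = [f"mouse_{i:02d}" for i in range(1, 46)]
groups = allocate(mouse_ids, n_groups=3, seed=2018)
for index, members in enumerate(groups, start=1):
    print(f"group {index}: {len(members)} mice")
```

Fixing the seed makes the allocation reproducible and auditable, while keeping the assignment independent of any characteristic of the animals.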
Briefly describe the study type including whether data are quantitative, qualitative, or mixed-methods (e.g. qualitative cross-sectional, quantitative experimental, mixed-methods case study).
State the research sample (e.g. Harvard university undergraduates, villagers in rural India) and provide relevant demographic information (e.g. age, sex) and indicate whether the sample is representative. Provide a rationale for the study sample chosen. For studies involving existing datasets, please describe the dataset and source.
Describe the sampling procedure (e.g. random, snowball, stratified, convenience). Describe the statistical methods that were used to predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient. For qualitative data, please indicate whether data saturation was considered, and what criteria were used to decide that no further sampling was needed.
Provide details about the data collection procedure, including the instruments or devices used to record the data (e.g. pen and paper, computer, eye tracker, video or audio equipment) whether anyone was present besides the participant(s) and the researcher, and whether the researcher was blind to experimental condition and/or the study hypothesis during data collection.
Indicate the start and stop dates of data collection. If there is a gap between collection periods, state the dates for each sample cohort.
If no data were excluded from the analyses, state so OR if data were excluded, provide the exact number of exclusions and the rationale behind them, indicating whether exclusion criteria were pre-established.
State how many participants dropped out/declined participation and the reason(s) given OR provide response rate OR state that no participants dropped out/declined participation.
If participants were not allocated into experimental groups, state so OR describe how participants were allocated to groups, and if allocation was not random, describe how covariates were controlled.

nature research | reporting summary
October 2018
Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.
Note the sampling procedure. Describe the statistical methods that were used to predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient.
Describe the data collection procedure, including who recorded the data and how.
Indicate the start and stop dates of data collection, noting the frequency and periodicity of sampling and providing a rationale for these choices. If there is a gap between collection periods, state the dates for each sample cohort.
Specify the spatial scale from which the data are taken.
If no data were excluded from the analyses, state so OR if data were excluded, describe the exclusions and the rationale behind them, indicating whether exclusion criteria were pre-established.
Describe the measures taken to verify the reproducibility of experimental findings. For each experiment, note whether any attempts to repeat the experiment failed OR state that all attempts to repeat the experiment were successful.
Describe how samples/organisms/participants were allocated into groups. If allocation was not random, describe how covariates were controlled. If this is not relevant to your study, explain why.
Describe the extent of blinding used during data acquisition and analysis. If blinding was not possible, describe why OR explain why blinding was not relevant to your study.
Describe the study conditions for field work, providing relevant parameters (e.g. temperature, rainfall).
State the location of the sampling or experiment, providing relevant parameters (e.g. latitude and longitude, elevation, water depth).
Describe the efforts you have made to access habitats and to collect and import/export your samples in a responsible manner and in compliance with local, national and international laws, noting any permits that were obtained (give the name of the issuing authority, the date of issue, and any identifying information).
Describe any disturbance caused by the study and how it was minimized. Note that full information on the approval of the study protocol must also be provided in the manuscript.

Human research participants
Policy information about studies involving human research participants

Population characteristics
Recruitment
Describe all antibodies used in the study; as applicable, provide supplier name, catalog number, clone name, and lot number.
Describe the validation of each primary antibody for the species and application, noting any validation statements on the manufacturer's website, relevant citations, antibody profiles in online databases, or data provided in the manuscript.
State the source of each cell line used.
Describe the authentication procedures for each cell line used OR declare that none of the cell lines used were authenticated.
Confirm that all cell lines tested negative for mycoplasma contamination OR describe the results of the testing for mycoplasma contamination OR declare that the cell lines were not tested for mycoplasma contamination.
Name any commonly misidentified cell lines used in the study and provide a rationale for their use.
Provide provenance information for specimens and describe permits that were obtained for the work (including the name of the issuing authority, the date of issue, and any identifying information).
Indicate where the specimens have been deposited to permit free access by other researchers.
If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement), where they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new dates are provided.
Male C57BL/6J mice (five weeks of age; 4-5 per cage) were obtained from the Experimental Animal Center of Xi'an Jiaotong University Medical College. The mice were maintained in a temperature-controlled (21-23°C), specific pathogen-free environment with a relative humidity of 55 ± 10% and a 12/12-h light-dark cycle.
This study did not involve wild animals.
This study did not involve samples collected from the field.
All animal procedures were approved by the Animal Care and Use Committee of Xi'an Jiaotong University.
All subjects are Han Chinese and local residents of Shaanxi Province, China. Their families have been entirely Han for at least the last two generations, i.e. the participants' parents and grandparents are all Han. Although we did not test their genetic background, we consider it very unlikely that cases and controls differ in genetic background. Current mental and physical disorders or diseases were assessed using the Mini-International Neuropsychiatric Interview (MINI), a physical examination, and blood biochemical analysis. No subject had a current physical illness (such as diabetes, heart disease, thyroid disease, autoimmune disease or any recent infection) or a DSM-IV axis I or axis II disorder (except schizophrenia in patients). Other covariates characterizing the study population include demographic features, socio-economic level, alcohol and tobacco use, and dietary habits, which are described in detail in the Supplementary Information.
All patients were recruited from among inpatients. All healthy controls were selected from people who attended psychological counseling or a physical examination at five hospitals. All participants who met our inclusion criteria were asked whether they wanted to join the study; those who were willing were included and signed written consent. Bias in dietary habits and region of residence may affect gut microbiota composition. This information was collected by self-reported questionnaire. Diet is a very strong determinant of the gut microbiota, so inaccurate profiling of dietary intake would distort the estimated effects of other variables on the gut microbiota.

Ethics oversight
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links
May remain private before publication.
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).

Files in database submission
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.
The diversity of gut microbiota and relative abundance of each microbe; the functional potential of gut microbiota; the serum biomarkers for tryptophan metabolism and 5-HT, dopamine, glutamine and GABA.
For "Initial submission" or "Revised version" documents, provide reviewer access links. For your "Final submission" document, provide a link to the deposited data.
Provide a list of all files available in the database submission.
Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version" documents only, to enable peer review. Write "no longer applicable" for "Final submission" documents.
Describe the experimental replicates, specifying number, type and replicate agreement.
Describe the sequencing depth for each experiment, providing the total number of reads, uniquely mapped reads, length of reads and whether they were paired- or single-end.
Describe the antibodies used for the ChIP-seq experiments; as applicable, provide supplier name, catalog number, clone name, and lot number.
Specify the command line program and parameters used for read mapping and peak calling, including the ChIP, control and index files used.
Describe the methods used to ensure data quality in full detail, including how many peaks are at FDR 5% and above 5-fold enrichment.
Describe the software used to collect and analyze the ChIP-seq data. For custom code that has been deposited into a community repository, provide accession details.