Structural basis of mammalian high-mannose N-glycan processing by human gut Bacteroides

The human gut microbiota plays a central role not only in regulating nutrient metabolism but also in promoting immune homeostasis and immune responses and in protecting against pathogen colonization. The genome of the Gram-negative symbiont Bacteroides thetaiotaomicron, a dominant member of the human intestinal microbiota, encodes polysaccharide utilization loci (PULs), each encoding the apparatus required to orchestrate the degradation of a specific glycan. EndoBT-3987 is a key endo-β-N-acetylglucosaminidase (ENGase) that initiates the degradation and processing of mammalian high-mannose-type (HM-type) N-glycans in the intestine. Here, we provide structural snapshots of EndoBT-3987: the unliganded form, the EndoBT-3987-Man9GlcNAc2Asn substrate complex, and two product complexes, EndoBT-3987-Man9GlcNAc and EndoBT-3987-Man5GlcNAc. Combining these structures with alanine-scanning mutagenesis and activity measurements, we unveil the molecular mechanism of HM-type N-glycan recognition and specificity for EndoBT-3987 and for an important group of GH18 ENGases. This group includes EndoH, an enzyme used extensively in biotechnology whose mechanism of substrate recognition was largely unknown.
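As a reading aid, the overall reaction that connects the substrate complex to the product complexes can be written out. This sketch assumes the general ENGase cleavage site, which is the β-1,4 glycosidic bond between the two GlcNAc residues of the N,N'-diacetylchitobiose core. That site is consistent with the Man9GlcNAc2Asn substrate and the Man9GlcNAc product named above. The GlcNAc-Asn leaving group is inferred from this cleavage site and is not stated in the abstract.

```latex
\mathrm{Man_9GlcNAc_2\text{-}Asn} + \mathrm{H_2O}
  \xrightarrow{\text{EndoBT-3987}}
  \mathrm{Man_9GlcNAc} + \mathrm{GlcNAc\text{-}Asn}
```

Under the same assumption, the Man5GlcNAc product complex corresponds to the analogous cleavage of a Man5GlcNAc2-containing substrate.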

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Software and code
Policy information about availability of computer code
Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub).

All other data that support the findings of this study are available from the corresponding authors on reasonable request.

Methods section
No data were excluded from the alanine-scanning experiments.
Methods section: the alanine-scanning experiments were performed in triplicate; the mean of the three experiments was plotted, and confidence intervals (CIs) were calculated from these replicates.
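The replication statement does not give the confidence level or say how the CIs were derived. The sketch below makes two assumptions that the source does not confirm: the interval is a two-sided 95% CI, and it is built from Student's t distribution with n - 1 = 2 degrees of freedom, a common choice for triplicates. Under those assumptions, it shows the calculation for one mutant's triplicate activity values. The function names are hypothetical, and the input values are placeholders, not data from the study.

```python
import math
import statistics


def t_quantile_df2(p: float) -> float:
    """Exact Student-t quantile for 2 degrees of freedom.

    For nu = 2 the CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t**2)),
    which inverts to t = a * sqrt(2 / (1 - a**2)) with a = 2p - 1.
    """
    a = 2.0 * p - 1.0
    return a * math.sqrt(2.0 / (1.0 - a * a))


def triplicate_mean_ci(values, level=0.95):
    """Mean and two-sided t-based CI for exactly three replicates.

    Assumption (not stated in the source): CI = mean +/- t * s / sqrt(n),
    where s is the sample standard deviation (n - 1 denominator).
    """
    if len(values) != 3:
        raise ValueError("expected exactly three replicate measurements")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)               # sample SD, n - 1 denominator
    t_crit = t_quantile_df2(0.5 + level / 2.0)  # df = n - 1 = 2
    half_width = t_crit * sd / math.sqrt(len(values))
    return mean, (mean - half_width, mean + half_width)


# Placeholder values only (hypothetical, not data from the study).
mean, (low, high) = triplicate_mean_ci([1.0, 2.0, 3.0])
print(f"mean = {mean:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The closed-form quantile keeps the example free of SciPy. With n = 3 the t critical value (about 4.30) is more than twice the normal-approximation value of 1.96, so CIs from triplicates are wide.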
Describe how samples/organisms/participants were allocated into experimental groups. If allocation was not random, describe how covariates were controlled OR if this is not relevant to your study, explain why.
Describe whether the investigators were blinded to group allocation during data collection and/or analysis. If blinding was not possible, describe why OR explain why blinding was not relevant to your study.
Describe the validation of each primary antibody for the species and application, noting any validation statements on the manufacturer's website, relevant citations, antibody profiles in online databases, or data provided in the manuscript.

Human research participants
Policy information about studies involving human research participants
Population characteristics

Recruitment
Ethics oversight
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Clinical data
Policy information about clinical studies
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.

Clinical trial registration
Study protocol

Data collection

Cell line source: HEK293T (ATCC)
Describe the authentication procedures for each cell line used OR declare that none of the cell lines used were authenticated.
Confirm that all cell lines tested negative for mycoplasma contamination OR describe the results of the testing for mycoplasma contamination OR declare that the cell lines were not tested for mycoplasma contamination.
Name any commonly misidentified cell lines used in the study and provide a rationale for their use.
Provide provenance information for specimens and describe permits that were obtained for the work (including the name of the issuing authority, the date of issue, and any identifying information).
Indicate where the specimens have been deposited to permit free access by other researchers.
If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement), where they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new dates are provided.
For laboratory animals, report species, strain, sex and age OR state that the study did not involve laboratory animals.
Provide details on animals observed in or captured in the field; report species, sex and age where possible. Describe how animals were caught and transported and what happened to captive animals after the study (if killed, explain why and describe method; if released, say where and when) OR state that the study did not involve wild animals.
For laboratory work with field-collected samples, describe all relevant parameters such as housing, maintenance, temperature, photoperiod and end-of-experiment protocol OR state that the study did not involve samples collected from the field.
Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or guidance was required and explain why not.
Describe the covariate-relevant population characteristics of the human research participants (e.g. age, gender, genotypic information, past and current diagnosis and treatment categories). If you filled out the behavioural & social sciences study design questions and have nothing to add here, write "See above." Describe how participants were recruited. Outline any potential self-selection bias or other biases that may be present and how these are likely to impact results.
Identify the organization(s) that approved the study protocol.
Provide the trial registration number from ClinicalTrials.gov or an equivalent agency.
Note where the full trial protocol can be accessed OR if not available, explain why.
Describe the settings and locales of data collection, noting the time periods of recruitment and data collection.

nature research | reporting summary
October 2018

Outcomes

ChIP-seq
Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links
May remain private before publication.

Files in database submission
Genome browser session

The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.

Methodology
Sample preparation
Instrument
Software
Cell population abundance
Gating strategy
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
Describe how you pre-defined primary and secondary outcome measures and how you assessed these measures.
For "Initial submission" or "Revised version" documents, provide reviewer access links. For your "Final submission" document, provide a link to the deposited data.
Provide a list of all files available in the database submission.
Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version" documents only, to enable peer review. Write "no longer applicable" for "Final submission" documents.
Describe the experimental replicates, specifying number, type and replicate agreement.
Describe the sequencing depth for each experiment, providing the total number of reads, uniquely mapped reads, length of reads and whether they were paired- or single-end.
Describe the antibodies used for the ChIP-seq experiments; as applicable, provide supplier name, catalog number, clone name, and lot number.
Specify the command line program and parameters used for read mapping and peak calling, including the ChIP, control and index files used.
Describe the methods used to ensure data quality in full detail, including how many peaks are at FDR 5% and above 5-fold enrichment.
Describe the software used to collect and analyze the ChIP-seq data. For custom code that has been deposited into a community repository, provide accession details.
Describe the sample preparation, detailing the biological source of the cells and any tissue processing steps used.
Identify the instrument used for data collection, specifying make and model number.
Describe the software used to collect and analyze the flow cytometry data. For custom code that has been deposited into a community repository, provide accession details.
Describe the abundance of the relevant cell populations within post-sort fractions, providing details on the purity of the samples and how it was determined.
Describe the gating strategy used for all relevant experiments, specifying the preliminary FSC/SSC gates of the starting cell population, indicating where boundaries between "positive" and "negative" staining cell populations are defined.