Microinvasion by Streptococcus pneumoniae induces epithelial innate immunity during colonisation at the human mucosal surface

Control of Streptococcus pneumoniae colonisation at human mucosal surfaces is critical to reducing the burden of pneumonia and invasive pneumococcal disease, interrupting transmission, and achieving herd protection. Here, we use an experimental human pneumococcal carriage model (EHPC) to show that S. pneumoniae colonisation is associated with epithelial surface adherence, micro-colony formation and invasion, without overt disease. Interactions between different strains and the epithelium shaped the host transcriptomic response in vitro. Using epithelial modules from a human epithelial cell model that recapitulates our in vivo findings, comprising innate signalling and regulatory pathways, inflammatory mediators, cellular metabolism and stress response genes, we find that inflammation in the EHPC model is most prominent around the time of bacterial clearance. Our results indicate that, rather than being confined to the epithelial surface and the overlying mucus layer, the pneumococcus undergoes micro-invasion of the epithelium that enhances inflammatory and innate immune responses associated with clearance.

Software and code
Policy information about availability of computer code

Data collection
- Zen Zeiss software
- TissueFAXS software
- LSR II Flow Cytometer

Data analysis
- Statistical analysis: GraphPad Prism v7
- Microscopy images: LSM Image Browser
- Flow cytometry: FlowJo
- RNA-seq data: processing and analysis were conducted in R, a language and environment for statistical computing (https://www.R-project.org). Mapping and generation of read counts per transcript were performed using Kallisto (ref. 60), which is based on pseudoalignment. The R/Bioconductor package tximport was used to import the mapped counts and summarise the transcript-level data to gene level (ref. 61). The DESeq2 and SARTools packages (ref. 62) were used for differential gene expression analysis.
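
The transcript-to-gene summarisation step described above can be illustrated with a minimal sketch. This is not tximport itself, which is an R/Bioconductor package with additional handling of transcript lengths and offsets; it only shows the basic idea of summing per-transcript estimated counts into per-gene counts. The function name, the identifiers and the counts are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def summarise_to_gene_level(transcript_counts, tx2gene):
    """Sum the estimated counts of all transcripts that map to each gene.

    transcript_counts: dict of transcript ID -> estimated count
    tx2gene: dict of transcript ID -> gene ID (the annotation map)
    """
    gene_counts = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_id = tx2gene.get(transcript_id)
        if gene_id is None:
            # Assumption: transcripts missing from the annotation are skipped.
            continue
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical identifiers and counts, for illustration only.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
transcript_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0, "tx4": 2.0}

gene_counts = summarise_to_gene_level(transcript_counts, tx2gene)
print(gene_counts)
```

In this toy input, tx1 and tx2 both belong to geneA, so their counts are added together, while tx4 has no annotation and is dropped.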

Life sciences study design

Sample size
Sample size for in vivo data depended on the number of volunteers in the study (up to 18). Exact numbers are defined in the main table (microscopy and microbiology) and supplementary figure legends (flow cytometry). Sample sizes for in vitro experiments were a minimum of three independent experiments with technical replicates.

Data exclusions
For flow cytometry EHPC data, samples were excluded from analyses if the epithelial cell population was less than 500. For other exclusions, values more than 2 standard deviations away from the mean were excluded (for Figure 3c: 1x 23F, 1x dPly for IL-8, 1x for dPly ICAM).
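
The two exclusion rules above can be written down as a minimal sketch. The thresholds (500 epithelial cells, 2 standard deviations) come from the text; the function names, the use of the sample standard deviation, and the example values are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, stdev

MIN_EPITHELIAL_CELLS = 500  # threshold stated in the text
SD_CUTOFF = 2.0             # threshold stated in the text


def passes_cell_count(n_epithelial_cells):
    """Keep a flow cytometry sample only if it has at least 500 epithelial cells."""
    return n_epithelial_cells >= MIN_EPITHELIAL_CELLS


def exclude_outliers(values, cutoff=SD_CUTOFF):
    """Split values into those within and beyond `cutoff` SDs of the mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the text does not state which estimator was applied.
    """
    centre = mean(values)
    spread = stdev(values)
    kept = [v for v in values if abs(v - centre) <= cutoff * spread]
    excluded = [v for v in values if abs(v - centre) > cutoff * spread]
    return kept, excluded


# Hypothetical measurements, for illustration only.
measurements = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
kept, excluded = exclude_outliers(measurements)
print(kept, excluded)
```

Note that the mean and standard deviation are computed with the candidate outlier included, so in very small groups a single extreme value inflates the spread and may not be flagged.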

Replication
In vitro experiments were conducted in three or more independent experiments with technical replicates, unless stated otherwise. For EHPC data, entire samples were utilized for each assay.

Randomization
No randomization was required since experimental groups for the EHPC were double-blinded.

Blinding
For EHPC data, CMW was blinded to the microbiology carriage status of each volunteer for the entirety of the data collection. FlowJo data collection was also blinded to avoid introducing bias into the gating strategy for the EHPC data.
Reporting for specific materials, systems and methods

Flow Cytometry
Plots
Confirm that:
- The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
- The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
- All plots are contour plots with outliers or pseudocolor plots.
- A numerical value for number of cells or percentage (with statistics) is provided.

Methodology

Sample preparation
Human mucosal cells from the inferior turbinate were obtained by curettage using a plastic Rhino-probe and incubated in cold PBS++ (PBS supplemented with 5 mM EDTA and 0.5% FCS). Cells were dislodged by pipetting and centrifuged at 440 g for 5 minutes at 4°C. Supernatant was removed, and cells were resuspended in 25 μl of PBS++ with Live/Dead™ Fixable stain. After a 15-minute incubation on ice, an antibody cocktail to stain for epithelial surface marker expression was added and incubated for another 15 minutes. Samples were vortexed, resuspended in 3.5 ml of PBS++ and filtered over a pre-wetted 70 μm filter. Samples were transferred to a 5 ml FACS tube, centrifuged and resuspended in 200 μl Cell Fix.
For the in vitro analysis, confluent monolayers of Detroit 562 cells on 6-well plates were incubated with S. pneumoniae for 6 hours in 1% FCS phenol-free alpha MEM (base media, Life Technologies). Cells were washed three times in PBS and gently lifted from the plate using a cell scraper in 300 μl of base media supplemented with 1 mM EDTA. Samples were transferred to 5 ml FACS tubes and placed on ice for the duration of the protocol. Each cell sample was incubated with an antibody cocktail for epithelial surface marker expression (see Supplemental Information) for 30 minutes before rinsing in 1 ml base media and centrifuging at 300 g for 5 minutes at 4°C. Cells were fixed in 600 μl of 4% PFA and acquired on an LSR II Flow Cytometer (BD Biosciences). Compensation was run and applied for each experimental replicate, and voltages were kept consistent throughout. Isotype controls (BD Biosciences), FL-1 and single stains were also run for each experiment. Samples were acquired until 300,000 events had been collected. Analyses were performed using FlowJo LLC version 10 software.

Instrument
Samples were acquired on an LSR II Flow Cytometer (BD Biosciences).

Software
Data analyses were performed using FlowJo LLC version 10 software.

Cell population abundance
In vivo: The entire sample was acquired as the starting population size varied between volunteers. Analysis was performed on the gated epithelial cell population, and only samples containing 500 or more cells are reported. In vitro: Samples were acquired until 300,000 events had been collected.

Gating strategy
In vivo: Samples were gated into 'all cells', 'single cells', and finally 'EpCAM positive' cells, against an empty channel (AF-700-A), to analyse the epithelial cell population of the samples. In vitro: Cells collected from Detroit 562 confluent monolayers were gated into 'all cells', then further defined as 'single cells', and finally dead cells (defined by treatment with H2O2) were excluded, yielding the live cell population used for analyses.
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
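
The in vivo gating hierarchy ('all cells', then 'single cells', then 'EpCAM positive') is sequential: each gate is applied only to the events that passed the previous one. The sketch below illustrates that structure only. The actual gates were drawn in FlowJo; the field names, the numeric thresholds and the use of a forward-scatter height/area ratio for singlet discrimination are all assumptions introduced here for illustration.

```python
def apply_gates(events, gates):
    """Apply gates in order; each gate sees only events that passed the previous gate.

    Returns the final population and the event count remaining after each gate.
    """
    counts = []
    current = list(events)
    for name, predicate in gates:
        current = [event for event in current if predicate(event)]
        counts.append((name, len(current)))
    return current, counts


# Hypothetical thresholds and field names, NOT the values used in the study.
in_vivo_gates = [
    ("all cells", lambda e: e["fsc_a"] > 20_000),                       # drop debris
    ("single cells", lambda e: 0.8 <= e["fsc_h"] / e["fsc_a"] <= 1.2),  # drop doublets
    ("EpCAM positive", lambda e: e["epcam"] > 1_000),                   # epithelial cells
]

# Hypothetical events, for illustration only.
events = [
    {"fsc_a": 50_000, "fsc_h": 48_000, "epcam": 5_000},  # passes every gate
    {"fsc_a": 5_000, "fsc_h": 5_000, "epcam": 5_000},    # debris-like
    {"fsc_a": 90_000, "fsc_h": 45_000, "epcam": 5_000},  # doublet-like
    {"fsc_a": 60_000, "fsc_h": 58_000, "epcam": 200},    # EpCAM negative
]

epithelial, gate_counts = apply_gates(events, in_vivo_gates)
print(gate_counts)
```

The final count from such a hierarchy is the number that would be compared against the 500-cell minimum described under cell population abundance.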
