High-resolution in situ transcriptomics of Pseudomonas aeruginosa unveils genotype independent patho-phenotypes in cystic fibrosis lungs

Life-long bacterial infections in cystic fibrosis (CF) airways constitute an excellent model both for persistent infections and for microbial adaptive evolution in complex dynamic environments. Using high-resolution transcriptomics applied to CF sputum, we profile transcriptional phenotypes of Pseudomonas aeruginosa populations in patho-physiological conditions. Here we show that the soft-core genome of genetically distinct populations, while maintaining transcriptional flexibility, shares a common expression program tied to the lung environment. We identify genetically independent traits defining P. aeruginosa physiology in vivo, documenting the connection between several previously identified mutations in CF isolates and some of the convergent phenotypes known to develop in later stages of the infection. In addition, our data highlight the extent to which this organism can exploit its extensive repertoire of physiological pathways to acclimate to a new niche, and suggest how alternative nutrients produced in the lungs may be utilized in unexpected metabolic contexts.

genome (com_pangenome in Supplementary Figure 10), which was made up of all the complete genomes of species belonging to the genera representing at least 2% of the active community in at least one sample (see Supplementary Data 1 for the complete list of accession numbers). Reads not mapping to the community pan-genome were considered free of transcripts deriving from bacteria other than P. aeruginosa and were mapped against 79 deposited P. aeruginosa genomes marked as complete in the NCBI database (pau_pangenome in Supplementary Figure 10; see Supplementary Data 1 for the complete list of accession numbers). Reads mapping to "com_pangenome" were inspected using BLASTn to recover sequences that could be uniquely assigned to one of the P. aeruginosa genomes contained in "pau_pangenome" (identity > 95% on 99% of length).
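The BLASTn rescue step can be illustrated with a minimal sketch. It assumes a tabular BLAST output with the five columns qseqid, sseqid, pident, length and qlen, and it interprets "99% of length" as alignment length divided by read length; the function name, the subject identifiers and the example hits are hypothetical and are not taken from the study.

```python
import csv
import io
from collections import defaultdict


def reads_unique_to_target(blast_tsv, target_subjects,
                           min_identity=95.0, min_coverage=0.99):
    """Return read IDs whose passing hits all belong to target_subjects.

    Assumed columns per row: qseqid, sseqid, pident, length, qlen.
    A hit passes when identity > min_identity and the alignment covers
    at least min_coverage of the read (an assumed reading of the cutoff).
    """
    passing = defaultdict(set)
    for qseqid, sseqid, pident, length, qlen in csv.reader(blast_tsv, delimiter="\t"):
        coverage = int(length) / int(qlen)
        if float(pident) > min_identity and coverage >= min_coverage:
            passing[qseqid].add(sseqid)
    # Keep a read only if none of its passing hits falls outside the target set.
    return {read for read, subjects in passing.items() if subjects <= target_subjects}


# Hypothetical hits: read2 also matches a non-target genome, read3 fails identity.
example = io.StringIO(
    "read1\tPA14_chr\t99.0\t100\t100\n"
    "read2\tPA14_chr\t98.0\t100\t100\n"
    "read2\tother_chr\t97.0\t100\t100\n"
    "read3\tPA14_chr\t90.0\t100\t100\n"
)
kept = reads_unique_to_target(example, {"PA14_chr"})
print(sorted(kept))
```

In this toy input only read1 is retained, because it is the only read with a passing hit and no passing hit outside the target set.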
Reads reassigned to P. aeruginosa by BLASTn analysis were integrated with those mapping uniquely to the "pau_pangenome" data set ("Filter and mapping on pan-genomes" strategy in Supplementary Figure 10) and compared with reads obtained by mapping the non-human read (NHR) datasets directly against the P. aeruginosa PA14 genome ("Direct mapping" strategy in Supplementary Figure 10). Reads shared between the two datasets were identified using their unique read identifiers (IDs).
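The comparison by read identifier amounts to a set intersection. The sketch below is illustrative only: the function name and the example IDs are hypothetical, and because the text does not state which denominator was used for the shared percentage, both are reported.

```python
def compare_read_sets(direct_ids, filtered_ids):
    """Count reads shared by two mapping strategies, matched by read ID."""
    direct, filtered = set(direct_ids), set(filtered_ids)
    shared = direct & filtered
    return {
        "shared": len(shared),
        # Fractions relative to each strategy; guard against empty inputs.
        "fraction_of_direct": len(shared) / len(direct) if direct else 0.0,
        "fraction_of_filtered": len(shared) / len(filtered) if filtered else 0.0,
    }


# Hypothetical read IDs from the two strategies.
summary = compare_read_sets(
    ["r1", "r2", "r3", "r4"],
    ["r2", "r3", "r4", "r5", "r6"],
)
print(summary)
```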
When we used the "Direct mapping" strategy, a highly variable number of reads mapped directly to the P. aeruginosa PA14 genome, with most samples (n = 10, ca. 77%) falling in the range of 1-5 million reads (Supplementary Table 3). As expected, the percentage of P. aeruginosa reads obtained agreed with the observed relative abundance of the pathogen in the transcriptionally active bacterial community, with samples dominated by P. aeruginosa (i.e. P30M0_S1, P30M0_S2, P24M1_S1, P77F1_S1, P11F2_S1) scoring the highest percentage of reads assigned to this bacterium (Supplementary Figure 2 and Supplementary Table 3). When we employed the "Filter and mapping on pan-genomes" strategy, on average 85% of reads (range 54%-96%) were not assignable to any bacterial species contained in the com_pangenome data set (Supplementary Table 3). Almost half of the sequences surviving the filtering step were assigned to P. aeruginosa by mapping against the pau_pangenome data set, giving figures similar to those obtained by mapping directly against the P. aeruginosa PA14 genome (Supplementary Table 3). Moreover, ~90% of the reads were shared between the two datasets in all samples. This figure increased to 99% when we reassigned the sequences that mapped to the "com_pangenome" but could be unequivocally assigned to P. aeruginosa alone using BLASTn. Therefore, pre-filtering and mapping the filtered reads against a pan-genome of P. aeruginosa strains did not improve read assignment compared with directly mapping total non-human reads against the P. aeruginosa PA14 genome, and the faster direct method was chosen for our analysis. However, we expect that the negligible bias we observed derives from the nature of our samples, in which P. aeruginosa is the major pathogen. We therefore strongly advise read filtering when the percentage of the target species drops below 10-15% of the total bacterial community and the initial number of reads used for mapping is lower than 500,000 high-quality reads, as experienced with another analyzed sample not connected to this study.

Supplementary Note 3. Technical and biological reproducibility
To evaluate the technical reproducibility of the technique, we used three samples (P30M0_S3, P30M0_S4 and P77F1_S1). Each sputum sample was physically split into two technical replicates at the time of collection, and each replicate was processed independently, RNA sequenced, and gene expression evaluated as described in the Methods section. Normalized read counts and the relative composition of the transcriptionally active microbial community were used to evaluate reproducibility, calculated as the Pearson correlation coefficient (r) (Supplementary Figure 4).
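For reference, the Pearson correlation coefficient between two replicates can be computed as in the following sketch. The function name and the per-gene values are hypothetical, and this is the textbook formula rather than the specific software used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical normalized read counts for the same genes in two replicates.
replicate_a = [10.0, 20.0, 30.0, 40.0]
replicate_b = [12.0, 22.0, 32.0, 42.0]
r = pearson_r(replicate_a, replicate_b)
print(round(r, 6))
```

Values of r close to 1 indicate that the two replicates rank and scale gene expression in nearly the same way.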
To evaluate the biological reproducibility of the technique, we collected two independent expectorates from the same patient on the same day, 15 minutes apart (P30M0_S3, P30M0_S4) (Supplementary Table 1).

Genes belonging to the accessory, soft-core and strict-core genomes are represented by red, gray and black dots, respectively.

Relative abundance of bacterial genera identified by mapping total reads classified as non-human against a curated database of clade-specific markers. The graph reports unfiltered data containing all the genera identified by the MetaPhlAn 2 tool.

Samples' and patients' identifiers, antibiotic administration (oral, inhalation, intravenous), and attendance to hospital are reported. Bacterial genera diversity is measured using the Shannon diversity index, with 0.0 representing the presence of only one dominant genus. If a genus was described in the literature as associated with cystic fibrosis using either culture-dependent or culture-independent methods, the "Y" character is reported in the CF microbiome column.
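As an illustration of the Shannon diversity index mentioned above, the sketch below computes H = -sum(p_i ln p_i) from genus-level abundances. The natural logarithm is an assumption, since the log base is not stated here, and the function name and abundance values are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity from non-negative abundances (natural log assumed)."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return sum(-p * math.log(p) for p in proportions)


# Hypothetical communities: one dominant genus versus two equally abundant genera.
single_genus = shannon_index([100.0])
two_even_genera = shannon_index([50.0, 50.0])
print(single_genus, round(two_even_genera, 3))
```

A community with a single genus gives an index of 0.0, consistent with the description above, and the value grows as abundance is spread more evenly across genera.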