Dynamics of the compartmentalized Streptomyces chromosome during metabolic differentiation

Bacteria of the genus Streptomyces are prolific producers of specialized metabolites, including antibiotics. The linear chromosome includes a central region harboring core genes, as well as extremities enriched in specialized metabolite biosynthetic gene clusters. Here, we show that chromosome structure in Streptomyces ambofaciens correlates with genetic compartmentalization during exponential phase. Conserved, large and highly transcribed genes form boundaries that segment the central part of the chromosome into domains, whereas the terminal ends tend to be transcriptionally quiescent compartments with different structural features. The onset of metabolic differentiation is accompanied by a rearrangement of chromosome architecture, from a rather ‘open’ to a ‘closed’ conformation, in which highly expressed specialized metabolite biosynthetic genes form new boundaries. Thus, our results indicate that the linear chromosome of S. ambofaciens is partitioned into structurally distinct entities, suggesting a link between chromosome folding, gene expression and genome evolution.


Supplementary Figure 1: Size distribution of S. ambofaciens GIs
Using a minimal threshold of 15 CDSs within the synteny break in at least one of the compared genomes, 50 GIs (composed of 2 to 158 CDSs) and 22 insertion points (≤ 1 CDS within the synteny break) were identified in the S. ambofaciens ATCC 23877 genome.

Supplementary Figure 2: The metabolic differentiation of S. ambofaciens
a Growth curves of Streptomyces ambofaciens in MP5 and YEME10 liquid media. In these media, S. ambofaciens grows in a rather dispersed way, so that growth can be monitored by optical density (OD600nm). We refer to "pseudo-opacimetry" because of the presence of some clumps of bacteria. The boxplots represent the first quartile, median and third quartile, each dot corresponding to an independent experiment. The upper whisker extends from the hinge to the largest value no further than 1.5 * the inter-quartile range (IQR, i.e. distance between the first and third quartiles) from the hinge. The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. b Bioassays performed in the conditions used in this study. Bioassays were performed against Micrococcus luteus with the supernatants of S. ambofaciens ATCC 23877 grown in MP5 or YEME10 liquid media and harvested at different time points. Non-inoculated media were used as negative controls ('T-'). 'C1' to 'C7' refer to the names of the studied conditions (Table 1). The width of the plates is 12 cm. The presence of a halo indicates the presence of one or more antibacterial activities. The expression of the SMBGCs encoding known antibiotics is presented in Supplementary Fig. 3c. Source data are provided as a Source Data file. c HPLC analyses of the supernatants of S. ambofaciens ATCC 23877 during growth in MP5 medium. After 24 h and 48 h of growth, filtered culture supernatants were submitted to HPLC analysis (see Methods section for details). Sterile MP5 medium was used as a negative control. The absorbance was monitored at 297 nm. After 48 h of growth, the presence in the culture supernatants of several peaks (highlighted by numbers) that are absent after 24 h of growth illustrates the process of metabolic differentiation of S. ambofaciens. Peak #4, at around 15.6 min, has the same retention time and absorbance spectrum (one representative result in the inset) as an authentic congocidine standard and is absent from the culture supernatant of a strain with the congocidine cluster deleted ('ΔCGC', lab collection). This peak therefore corresponds to congocidine. The results correspond to the mean of five independent experiments per strain. Source data are provided as a Source Data file.
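The whisker rule stated for the boxplots in panel a can be made concrete with a minimal Python sketch. It assumes that the hinges are the first and third quartiles obtained by linear interpolation (the `inclusive` method of the standard library). The function name `boxplot_stats` and any values passed to it are illustrative and are not taken from the study.

```python
from statistics import quantiles


def boxplot_stats(values, k=1.5):
    """Return hinges, whisker ends and outliers under the 1.5 * IQR rule.

    Assumption: hinges are the quartiles computed by linear interpolation.
    """
    if len(values) < 2:
        raise ValueError("at least two values are required")
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    lower_whisker = min(v for v in values if v >= lower_fence)
    upper_whisker = max(v for v in values if v <= upper_fence)
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }
```

Values falling outside the fences are returned separately, which corresponds to the dots drawn (or omitted) as outliers in some of the other boxplots of this document.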

Supplementary Figure 3: The metabolic differentiation of S. ambofaciens analyzed at the transcriptional level
a Multidimensional analyses of the studied conditions. Left panel: axes 1 and 2 of the principal component analysis (PCA) of the whole data set (21 samples), with the percentage of variance associated with each axis. The numbers indicate the condition (see Table 1), with a distinct letter (a, b, c, d) for each replicate. Right panel: ascending hierarchical classification (Euclidean distance, Ward criterion) and its inertia gain. The conditions form two clusters (1 and 2) and five subclusters, which are boxed. Cluster 1 corresponds to the earliest growth time points (24 h, 30 h and 36 h), at which no antibiotic production was detected by bioassay (Supplementary Fig. 2b). Cluster 2 corresponds to conditions associated with metabolic differentiation and the latest growth time points. Its subclusters separate according to growth time and medium: 48 h in MP5 (subcluster 2.1), 72 h in MP5 (subcluster 2.2), 48 h in YEME10 (subcluster 2.3). b MA-plots comparing transcriptomes over growth and across media. Each MA-plot represents the log ratio of differential expression as a function of the mean intensity for one pairwise comparison. The differentially expressed features are highlighted in red, and the numbers of up- and down-regulated genes are indicated on each graph. Triangles correspond to features whose log2(fold change) is too low or too high to be displayed on the plot. c Transcriptomes over growth of the four biosynthetic gene clusters encoding all known antibacterial activities of S. ambofaciens. Genes expressed at the highest level are boxed in red for the congocidine, spiramycin and kinamycin BGCs. They correspond to the cgc1 (SAM23877_RS39345, encoding a transcriptional regulator), srmB (SAM23877_RS26680, encoding the ABC-F type ribosomal protection protein SrmB) and alpZ (SAM23877_RS00890, encoding a transcriptional regulator) genes, respectively. The gene encoding SrmS (SAM23877_RS26595), the transcriptional regulator of the spiramycin BGC, is boxed in black. The dots correspond to the mean number of DESeq2 normalized reads per kb (normRPK) of the replicates and are joined by a line for clarity (the lines represent neither real observations nor a model).
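As an illustration of how the coordinates of an MA-plot such as those in panel b are obtained, the following Python sketch computes the mean intensity (A) and the log2 ratio (M) for one feature and flags values that exceed the plotted range, which the legend describes as triangles. It is a simplified sketch: it uses a plain log ratio of two positive mean values, which is an assumption and not the model-based estimate used in the study. The name `ma_point` and the parameter `y_limit` are hypothetical.

```python
import math


def ma_point(mean_ref, mean_test, y_limit):
    """Return (A, M, marker) for one feature of an MA-plot.

    A is the mean of the two intensities, M the log2 ratio test/reference.
    Assumption: both inputs are strictly positive normalized values.
    """
    if mean_ref <= 0 or mean_test <= 0:
        raise ValueError("intensities must be strictly positive")
    a = (mean_ref + mean_test) / 2
    m = math.log2(mean_test / mean_ref)
    if abs(m) > y_limit:
        # Out-of-range points are drawn on the plot border as triangles.
        return a, math.copysign(y_limit, m), "triangle"
    return a, m, "dot"
```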

Supplementary Figure 4: The dynamics of the S. ambofaciens transcriptome
a Categories of gene expression level. The boxplot represents the distributions of the read counts per gene (normalized by DESeq and on gene size, RPK) individually for each tested condition and as the mean expression of each gene in all tested conditions. The genes were classified according to their expression level by considering the distribution parameters as indicated. The boxplots represent the first quartile, median and third quartile. The upper whisker extends from the hinge to the largest value no further than 1.5 * the inter-quartile range (IQR, i.e. distance between the first and third quartiles) from the hinge. The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Outliers are not represented. b Pattern of gene expression depending on genome features of interest. The genes were classified according to their mean (upper panel), their minimal or their maximal (lower panel) expression level in all studied conditions. The table presents the statistical analysis of the data (two-sided Fisher's exact tests for count data). c Multidimensional analysis (overlay factor map) of genomic and transcriptomic data. The squares correspond to the row labels (genome features, see Fig. 1 for abbreviations). The triangles correspond to the column labels (e.g. category of level of mean, maximal, minimal sense-transcription, antisense index < 0.05, antisense index > 0.5). The closer the squares or triangles are to the origin of the graph, the closer they are likely to be to the mean data. Proximity between triangles or between squares indicates similarity. A small angle (e.g. less than 30°) between the lines connecting a triangle and a square to the origin indicates that they are probably associated. If they are located on opposite sides of the origin, they are probably negatively associated. The proximity between the variables 'SMBGCs' and 'Switch' is due to the fact that a high proportion (19.3 %) of the SMBGC genes switched on at the transcriptional level in at least one studied condition (compared with 4.4 % at the level of the whole genome, odds ratio 5.2, p-value < 2.2 × 10⁻¹⁶; and with 7.6 % of genes within GIs, odds ratio 2.9, p-value 2.2 × 10⁻⁹; two-sided Fisher's exact tests for count data). Other abbreviations: 'AS < 0.05' and 'AS > 0.5': very low (below 0.05) or very high (above 0.5) antisense-transcription indices; 'Max', 'Mean' and 'Min': maximal, mean and minimal expression levels in all studied conditions, from the lowest category ('0') to the highest ('4'); 'Switch': switch from CAT_0 to CAT_3 or more. d & e Correlation between gene persistence and transcription in sense (d) and antisense (e) orientation. This correlation was analyzed by a Spearman's rank correlation test. Gene transcription corresponds to the number of DESeq2 normalized reads per kb (normRPK). f Mapping of sense and antisense reads along the chromosome during growth in MP5 medium. The normalized counts [transcripts per million, normalized considering the total (sense plus antisense) transcription] were mapped on the S. ambofaciens chromosome. Please note that transcription in the antisense orientation is presented on two different scales.
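The Spearman's rank correlation used in panels d and e can be sketched as follows. This minimal Python example only computes the correlation coefficient (not the associated p-value), as the Pearson correlation of the ranks, with tied values receiving their average rank. The function names are illustrative and any input would be hypothetical persistence and normRPK values, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (>= 2)")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient is insensitive to the very wide dynamic range of expression values.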

Supplementary Figure 5: Analyses of chromosome structure and its correlation with transcription
a Contact probability plotted as a function of genomic distance in YEME10 at 24 h. The plot represents the intra-chromosomal contact probability, P(s), for pairs of loci separated by a genomic distance s on the chromosome. Results of two replicates are presented. b Derivative of the contact matrix for WT cells in YEME10 at 24 h. The map shows, for each genome locus, the tendency of contact frequencies to go downwards or upwards. Boundaries are visualized as the black vertices of the squares along the diagonal. Regions with no upstream/downstream bias show no squares (e.g. terminal regions). c Boxplot presenting the antisense index over growth depending on the genes and their structural location. The distribution of the antisense index is presented depending on the location of the genes. The antisense indexes in the C1 condition of the genes at the boundaries or domains in common between the C1 and C6 conditions ('Common C1 & C6') are also presented. The boxplots represent the first quartile, median and third quartile. The upper whisker extends from the hinge to the largest value no further than 1.5 * the inter-quartile range (IQR, i.e. distance between the first and third quartiles) from the hinge. The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Outliers are not represented. The p-values of two-sided Wilcoxon rank sum tests with continuity correction (boundaries versus domains) are indicated. The number of analyzed genes ('#') per condition is indicated. Please note that the antisense index was not calculated when no reads were detected, either in sense or antisense orientation, in all replicates. d Dispersion index (DI) of compartments plotted as a function of genomic distance in YEME10 after 24 h of growth. The DI reflects the range of variations relative to the mean value as a function of the genomic distance for the contact maps. Long-range DNA contacts within the terminal compartments (> 100 kbp) are more variable than within the central compartment. e Contact probability of the compartments plotted as a function of genomic distance. f Contact map analysis of the primary and secondary diagonals. Contact maps were obtained by applying a threshold for significant interactions of 0.5× and 1.0× the standard deviation above the median to the original 3C-contact maps obtained in exponential (24 h in MP5 and YEME10 media) and stationary (48 h in MP5 medium) phase. Contact frequencies above or below the threshold were assigned a value of 1 or 0, respectively, generating a binary contact map in which significant interactions are represented in yellow, whereas non-significant interactions are represented in blue. Connected binary maps were generated by connecting 10 elements considered as significant, using a diamond shape of 15 to fill out the empty points comprised by the connected elements (refs. 1, 2).

Supplementary Figure 6: Contact maps and 3D-models for S. ambofaciens
a Replicate of cells grown 24 h in YEME10 growth medium, C6 condition. b Replicate of cells grown 24 h in MP5 growth medium, C1 condition. c Replicate of cells grown 48 h in MP5 growth medium, C4 condition. The violet and pink arrows indicate the location of the boundaries formed at the desferrioxamine and congocidine BGCs, respectively. d 3D-models of the S. ambofaciens genome generated by the ShRec3D software (ref. 3). e Results of the frontier index analysis, including peaks that do not form boundaries. This panel presents the frontier index analysis of the 3C-maps presented in Fig. 3 and Fig. 4, showing single peaks that were not considered as indicative of a boundary. At each locus, two indices are computed, reflecting the intensity of the loss of contact frequencies when going downwards (green peaks) or upwards (orange peaks), respectively, to the locus. A boundary is defined as any bin in which there is a change in the right bias of contacts towards the left bias (± 2 bins, green and orange peaks, respectively). The detection of bins with only an upstream or a downstream significant peak reflects the intrinsically noisy nature of 3C-seq data.

[…] and of the region encompassing 100 % of the core CDSs ('complete core region') were also determined and expressed as percentages relative to the size of the whole chromosome (plasmids were not considered). The boxplots represent the first quartile, median and third quartile. The upper whisker extends from the hinge to the largest value no further than 1.5 * the inter-quartile range (IQR, i.e. distance between the first and third quartiles) from the hinge. The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Outliers are represented (dots). The odds ratio represents the odds that an outcome will occur within the extremities compared to the odds of the outcome occurring in the central region. The genetic information of each TIR was taken into consideration for the calculations.
b The statistical significance of the odds ratios was assessed by two-sided Fisher's exact tests for count data.

Supplementary Table 2: Gene distribution in the central region versus the extremities (defined by the first and last rRNA operons) of the S. ambofaciens ATCC 23877 chromosome depending on features of interest
c For this calculation, only the chromosomal strain-specific CDSs were taken into account (excluding the 44 strain-specific CDSs present on the non-integrative plasmid pSAM1).
d 'always' refers to all the conditions investigated in this study (i.e. C1 to C7). Only chromosomal genes were considered in the calculation.
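The odds ratios and two-sided Fisher's exact tests mentioned in the footnotes of this table can be illustrated with a minimal Python sketch for a 2 × 2 table of counts. The layout is an assumption: a and b are the numbers of genes with and without the feature in the extremities, and c and d the corresponding numbers in the central region. The sketch returns the simple sample odds ratio (a·d)/(b·c); statistical packages may instead report a conditional maximum-likelihood estimate, so values can differ slightly from those in the table. The function names are illustrative.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a/b) / (c/d) of a 2 x 2 table [[a, b], [c, d]]."""
    if b == 0 or c == 0:
        raise ValueError("sample odds ratio is undefined when b or c is 0")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Probability of k counts in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    p_value = sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)
```

An odds ratio above 1 in this layout means that the feature is enriched in the extremities relative to the central region, and below 1 that it is depleted.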