Introduction

Mycobacterium tuberculosis (Mtb) is the causative agent of tuberculosis (TB), a disease that has existed for millennia and remains a major global health problem. Primary infection occurs by inhalation of aerosol particles containing bacteria. Mtb is able to replicate inside alveolar macrophages, and the inflammatory cells recruited to the infection site drive the formation of a histological pulmonary lesion called a granuloma. In most cases Mtb is never cleared and survives inside granulomas in a non-replicative and non-infectious state known as latency1. Around one third of the world’s population is affected by latent TB. Latently infected individuals have a 5–15% probability of developing the active disease during their lifetime. According to the 2016 World Health Organization (WHO) report, 10.4 million new TB cases were estimated worldwide, with 480,000 new cases of multidrug-resistant TB (MDR-TB) and 1.4 million deaths. In addition, the emergence of extensively drug-resistant TB (XDR-TB) and totally drug-resistant TB (TDR-TB) is becoming one of the biggest threats to public health and TB control programs2. Therefore, new insights into Mtb physiology are required to better characterize the pathogenic mechanisms that Mtb exploits to survive and persist in its host, in order to identify strategies to eradicate this ancient pathogen.

G-quadruplexes (G4s) are nucleic acid secondary structures that may form in single-stranded G-rich sequences under physiological conditions3. Four Gs associate via Hoogsteen-type hydrogen bonds to yield G-quartets, which stack to form the G4. The presence of K+ cations specifically supports G4 formation and stability4. Based on the strand orientation, G4s can adopt three main topologies: parallel, antiparallel, and hybrid-type structures. Stability studies have demonstrated that these non-canonical DNA secondary structures are able to destabilize the double helix, since many G4 structures are thermodynamically more stable than double-stranded DNA and their unfolding kinetics are significantly slower5, 6. In eukaryotes, G4s have been reported to play key regulatory roles, including transcriptional regulation of gene promoters and enhancers, translation, chromatin epigenetic regulation, and DNA recombination6,7,8,9,10. Expansion of G4-forming motifs has been associated with relevant human neurological disorders8, 11. Formation of G4s in vivo has been corroborated by the discovery of cellular proteins that specifically recognize G4s12, 13 and the development of G4-specific antibodies14, 15. In viruses, G4s have been implicated in key steps16: in the human immunodeficiency virus, the presence of functionally significant G4s10, 13, 17,18,19 and their targeting by G4 ligands with consequent antiviral effects10, 20, 21 have been reported. G4s have also been discovered in herpesviruses22,23,24,25, SARS coronavirus26 and human papilloma, Zika, Ebola and hepatitis C virus genomes27,28,29,30.

In prokaryotes, G4 sequences have been reported in Escherichia coli 7, 31, 32, Deinococcus radiodurans 33,34,35, Xanthomonas and Nostoc sp. 36. Evidence of bacterial enzymes that process G4s, such as Pif1 and RecQ helicases, has been provided in Escherichia coli, Clostridium difficile and Bacteroides sp. 37,38,39,40,41,42. Bacterial G4s have been implicated in antigenic variation of the cell-surface pilin proteins of Neisseria gonorrhoeae 43,44,45,46. In Mtb, whose genome has a GC content of 65%, previous bioinformatics analysis identified more than 10,000 motifs with the potential to fold into G4 structures32. Additionally, evidence has been provided in Mtb for a specific helicase that targets G4s (DinG) and for a G4 aptamer that inhibits a polyphosphate kinase involved in intracellular inorganic polyphosphate metabolism 47, 48.

The involvement of G4 structures in several human diseases propelled the development of small molecules directed against G4s9. Aromatic cores with protonatable side chains, such as the acridine BRACO-19 49, 50 and water-soluble naphthalene diimides (NDIs)21, 51,52,53,54,55,56, specifically bind the G4 conformation. So far, the vast majority of molecules have been tested against cellular G4s implicated in tumor pathogenesis: some compounds showed interesting antiproliferative properties57; in particular, quarfloxin proceeded into phase II clinical trials, but its limited bioavailability prevented further progress58. In bacteria, N-methyl mesoporphyrin has been shown to attenuate Deinococcus resistance to radiation33; to our knowledge, no other G4 ligand has so far been tested in bacteria.

To search for G4 motifs in Mtb, we implemented a tool that scans the whole genome and ranks potentially interesting G4s according to their score. Only high-scoring hits close to known transcription start sites (TSS) were considered. Four G4 sequences, close to the TSS of genes with known function, were selected and their G4 folding was confirmed in solution. Two G4 ligands stabilized the selected G4s and inhibited bacterial cell growth with minimal inhibitory concentrations (MIC) in the low micromolar range.

Results and Discussion

Identification of putative G4 motifs in the promoter region of Mtb genes

To detect the presence of putative G4 motifs, the Mtb genome was scrutinized in silico, assessing various lengths of G-islands and loops (Supplementary Figures 1a and b). A G4 was reported when at least four consecutive G-islands (n = 4) were identified. We also defined two parameters, l and d, corresponding to the minimal length of a homopolymeric G-island and the maximum allowed distance between consecutive G-islands, respectively. Different combinations of the l and d parameters were applied to allow the detection of G4 motifs with increasing stringency (i.e. 2 ≤ l ≤ 5 and d = 7, 11, and 15); we included G4s with loop lengths up to 15 nucleotides since such sequences have been reported to fold into stable G4s59. Computational searches have detected a high concentration of G4 motifs near promoter regions in both eukaryotic and prokaryotic genomes, and in some cases a possible role of G4 motifs in transcription regulation has been reported60. For this reason, and because of the high GC content of the Mtb genome, we restricted the G4 analysis to regions close to transcription start sites (TSS). A short and a long score were computed considering 15 and 50 nucleotides, respectively, both upstream and downstream of the G4 motif, according to Beaudoin et al.61 (Table 1).

Table 1 Number of putative G4s in both strands of the Mtb genome within 50 nts upstream of a primary TSS.

The genomic coordinates of the predicted G4s in both the forward (Supplementary File S1a) and the reverse strand (Supplementary File S1b) were intersected with the putative gene promoters, inferred by considering 50 nt upstream of the known primary TSS62 (Table 1 and Supplementary File S2 “Primary TSS”). The G4 motifs overlapping promoter regions were ranked by the short and long scores (Supplementary File S2 “G4 overlapping promoters”). As expected, the number of detected G4 motifs decreased with the stringency of the search parameters (i.e. longer G-islands and shorter distance between them). Moreover, the distribution of the predicted G4s was homogeneous across the two strands of the genome, with a slight prevalence of the reverse strand in six categories (out of 12), as opposed to four categories that were more abundant in the forward strand (Table 1). Note that either the forward or the reverse strand, depending on the gene, can be the coding strand in transcription.

Genes with putative DNA G4 forming sequences in Mtb

Based on the described bioinformatics analysis, we identified 45 genes with a putative G4, upstream of or overlapping their TSS, with at least 3 Gs in each island (and therefore able to form a G4 with at least three stacked tetrads) and a short or long score ≥ 2 (Table 2 and Supplementary File S2 “Candidate genes”). This threshold was chosen according to Beaudoin et al.61, who did not validate G4s with a lower score. These genes were classified according to their functional category as reported in TubercuList63. In addition, a de novo function prediction based on Gene Ontology (GO) annotations was performed with the online server Argot2.564 to expand the available annotations and potentially define functions for those genes that are still hypothetical/unknown (Supplementary File S2 “Function prediction”). Globally, 35 genes out of 45 were annotated with at least one GO term: 8 of them had been previously unannotated, while the annotations of the others were confirmed or expanded (Supplementary File S2 “Candidate genes”). We found that most G4s were distributed among the following functional categories: “cell wall and cell processes”, “intermediary metabolism and respiration”, “regulatory proteins”, and “conserved hypotheticals” (i.e. conserved proteins with no confirmed known function).

Table 2 G4 sequences upstream or overlapping TSS in the Mtb genome, forming G4s with at least three stacked tetrads (at least 3 Gs in each G-rich island) and with short or long score ≥ 2.

Among the identified putative G4s, the sequence upstream of rv0166 (fadD5) (Supplementary File S2 “Candidate genes”) had been previously reported by Thakur and colleagues to fold into a G4 structure47. The same authors reported two additional genes displaying a G4 motif; these genes are not present in our analysis since they are not associated with a reported TSS62.

Selected G-rich sequences in the Mtb genome fold into G4

Among the genes with a predicted G4 in their promoter region, we selected four candidates for further experimental validation, namely Glucose-6-phosphate dehydrogenase 1 (zwf1), ATP-dependent Clp protease (clpX), Oxidation-sensing Regulator Transcription Factor (mosR), and membrane NADH dehydrogenase (ndhA) (Table 2). We chose putative G4s belonging to the most stable categories (at least three ‘Gs’ in each island and loops no longer than 11 nt), prioritizing those that were present in multiple categories (for instance, zwf1 has a G4 that falls in both the 3_4_7 and 3_4_11 categories), had at least one score > 2, and were located in the promoter of genes with a known function.

G4 folding and topology were initially assessed by circular dichroism (CD) spectroscopy in the absence or presence of increasing concentrations of K+, since this monovalent cation is reported to stabilize the G4 conformation. In the presence of K+, all the selected molecules displayed the G4 CD signature (Fig. 1a–d).

Figure 1

CD spectra of the putative G4 molecules of zwf1 (a), clpX (b), mosR (c) and ndhA (d) in the presence of increasing KCl concentrations (0–150 mM).

The zwf1 G4 structure exhibited a mixed-type conformation in K+, with a shoulder at 265 nm, a positive peak at 290 nm and a negative peak at 240 nm (Fig. 1a). The clpX G4 adopted a parallel-like conformation in K+, with a maximum at 265 nm and a minimum at 240 nm (Fig. 1b). The mosR G4 folded into a mixed-type conformation in K+, showing a spectrum with two positive peaks (267 and 290 nm) and a negative peak at 240 nm (Fig. 1c). Molar ellipticity values of all these structures increased in a K+-dependent manner, further supporting G4 formation (Fig. 1a–c). zwf1 and mosR displayed a G4-like CD spectrum (mixed-type conformation) even in the absence of K+, indicating a high propensity to fold and high stability. The ndhA G4 sequence transitioned from mixed-type in the absence of K+ to fully antiparallel (CD spectrum with two maxima at 240 and 290 nm and a minimum at 265 nm) in the presence of 150 mM K+ (Fig. 1d). Overall, our data indicate that the selected Mtb sequences can effectively fold into G4 conformations.

The stability of the zwf1, clpX, mosR and ndhA G4s in the absence and presence of increasing K+ concentrations (50–150 mM) was assessed by melting experiments monitored by CD, calculating the melting temperatures (Tm) according to the van’t Hoff equation (Table 3).

Table 3 Melting temperatures (Tm) of Mtb G4 oligonucleotides (4 µM) in the absence and presence of increasing KCl concentrations (50–150 mM) and G4 ligands (16 µM).

In all cases the CD signal decreased with increasing temperature. For the zwf1, clpX and mosR G4s, a single transition between 20 °C and 90 °C was observed, leading to discrete Tm values. The ndhA G4 showed a peculiar behaviour, with a relatively high Tm (60.5 ± 0.3 °C) in the absence of K+ and two different Tm values in the presence of K+, ascribable to two transitions due to the presence of spectroscopically distinct species in solution. Overall, Tm values increased in a K+-dependent manner, indicating that the G4s were stabilized by K+, with Tm increases of up to 34.1 °C (Table 3).

Effect of G4 ligands on Mtb G4s

We next investigated Mtb G4 sequences in the presence of G4 ligands that have been reported to specifically recognize and stabilize G4 structures over double- and single-stranded nucleic acids. In particular, we tested a commercially available G4 ligand, BRACO-19 65, and a newly synthesized compound, c-exNDI 2 21, both of which have shown high selectivity for tetraplex over duplex structures. The effect of the two G4 ligands on the selected sequences in the presence of 100 mM K+ was initially assessed by CD analysis: they induced mild conformational changes in Mtb G4s without affecting the main topology, which remained characteristic of the G4 conformation (Fig. 2).

Figure 2

Effect of the G4 ligands BRACO-19 and c-exNDI 2 on the conformation of the selected Mtb G4s. (a) Chemical structures of the G4 ligands BRACO-19 and c-exNDI 2. (b) CD spectra of G4 oligonucleotides zwf1, clpX, mosR and ndhA (final concentration 4 μM) in the presence of KCl (100 mM) and BRACO-19 or c-exNDI 2 (final concentration 16 μM) to assess G4 topology changes. The oligonucleotide:compound molar ratio was 1:4.

G4 ligand-induced stabilization was assessed by CD thermal unfolding analysis. G4 ligands were able to highly stabilize Mtb G4s with Tm values in some cases higher than 90 °C (Table 3). In cases where several transitions were observed (Supplementary Figures 2 and 3), Tm values for each transition were reported (Table 3). zwf1 G4 was the most efficiently stabilized sequence with an increase of Tm higher than 41.5 °C in the presence of both BRACO-19 and c-exNDI 2 (Table 3).

G4 folding of the zwf1, clpX, mosR and ndhA sequences in the absence/presence of G4 ligands was additionally tested by the Taq polymerase stop assay (Fig. 3). This technique allows the evaluation of G4 formation in a DNA template and of the ability of the G4 to arrest Taq polymerase processing. The G4-specific block can then be accurately resolved on a denaturing polyacrylamide gel in terms of intensity and position in the sequence.

Figure 3

Taq polymerase stop assay. (a) Sequencing PAGE of Taq-amplified zwf1, clpX, ndhA and mosR templates in the absence (lanes 1) or presence of 100 mM KCl (lanes 2) and G4 ligands BRACO-19 (lanes 3) or c-exNDI 2 (lanes 4). The control template is a sequence unable to fold in G4. Symbols *, ¤, § and # indicate pausing sites just before the G4 region of the templates. Pr indicates the band of the labeled primer. M is a marker lane obtained with the Maxam and Gilbert sequencing protocol. B and N indicate BRACO-19 and c-exNDI 2, respectively. (b) Sequences of the selected G4 oligonucleotides. The exact position of the pausing sites within the template G4 sequence is indicated by the symbols *, ¤, § and #, as shown also in (a). (c) Quantification of the intensity of the stop bands obtained in (a).

For this purpose, the zwf1, clpX, mosR and ndhA oligonucleotides were extended with a primer-annealing region at their 3′-end. Moreover, additional flanking T bases were added at both the 5′- and 3′-ends to separate the 3′-end of the primer from the first G of the G4 portion. Samples were incubated in the absence or presence of 100 mM KCl (Fig. 3a, lanes 1 and 2, respectively), and with 200 nM BRACO-19 or 100 nM c-exNDI 2 (Fig. 3a, lanes 3 and 4, respectively). A control template unable to fold into G4 was also used to exclude non-specific inhibition of the polymerase by the G4 ligands. Taq polymerase was tested at 47 °C on all DNA templates. On all Mtb G4 templates, G4 ligands blocked enzyme processing (Fig. 3a, *, ¤, § and # symbols in lanes 3–4). Stop sites were specific and located at or just before the first 5′ G-tract involved in G4 folding (Fig. 3b). No stop site was detected on the negative control template (Fig. 3a). Quantitative analysis of G4 stop bands showed increased G4 formation in the presence of G4 ligands for all G4-forming sequences (Fig. 3c). Taken together, these data indicate that the tested G4 binders strongly recognize and stabilize Mtb G4 sequences.

Effect of G4 ligands on Mtb growth

The effect of BRACO-19 and c-exNDI 2 on Mtb growth was analyzed using a resazurin microtiter assay (REMA). As shown in Fig. 4, both compounds were able to inhibit bacterial cell growth with minimal inhibitory concentrations (MIC80) in the micromolar range; c-exNDI 2 was 10 times more potent than BRACO-19, with an MIC80 of 1.25 μM vs 12.5 μM. The increased potency of c-exNDI 2 may be at least in part due to its higher efficiency in stabilizing Mtb G4s (Table 3). However, the intracellular concentration reached by these compounds under the investigated conditions is not known. Interestingly, at least for BRACO-19, the MIC80 was lower than the toxic concentration for eukaryotic cells20, supporting the possibility of using G4 ligands to develop new antitubercular agents.

Figure 4

Resazurin microtiter assay to measure the activity of different G4 ligands (BRACO-19 and c-exNDI 2) on Mtb.

Conclusions

Among the putative G4s identified in the Mtb genome, we selected 45 that were located upstream of confirmed TSS and formed by at least 3 Gs in each island. The genes with predicted G4s close to their TSS were distributed across several functional gene categories. Four putative G4s were selected for further characterization: we showed that all of them actually folded into G4s and were stabilized by two G4 ligands. Interestingly, the two ligands were able to inhibit Mtb growth in vivo. Our data support the possibility of Mtb G4 formation in vivo and their role as potential modulators of gene expression. Finally, our data suggest the possibility of using G4s as novel targets to develop antitubercular agents with a new mechanism of action.

Materials and Methods

Bioinformatics prediction of putative G4 motifs in the Mtb genome

An algorithm for the detection of putative G4 motifs was developed in house using the Perl programming language and was applied to the reference genome of Mtb H37Rv (NC_000962.3). First, all guanine homopolymers (G-islands) were identified through pattern matching with the following line of code (equation I):

$${\textstyle \text{I)}}\,seq=\sim /(G\{l,\})/g$$

where seq is the complete genome of Mtb and l is the minimum length required for the homopolymer. A putative G4 was reported when at least four G-islands were detected and the distance between consecutive homopolymers (loop region) was less than or equal to an additional parameter d (distance). G4s in the reverse strand were searched considering cytosines (C) in the same reference sequence.
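The search described above can be sketched in a few lines. The following is a minimal Python illustration, not the authors' Perl tool: the function name `find_g4_motifs`, the 0-based half-open coordinates, and the choice to merge every chain of qualifying islands into a single motif are assumptions made for the example.

```python
import re

def find_g4_motifs(seq, l=3, d=7, n=4, base="G"):
    """Return (start, end) spans (0-based, half-open) of putative G4 motifs.

    l: minimum G-island length; d: maximum loop length between consecutive
    islands; n: minimum number of islands. Use base="C" to search the
    reverse strand on the same reference sequence.
    """
    # Equivalent of the pattern G{l,}: maximal runs of at least l bases.
    pattern = re.compile(f"{base}{{{l},}}")
    islands = [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

    motifs, chain = [], []
    for island in islands:
        # A loop longer than d breaks the current chain of islands.
        if chain and island[0] - chain[-1][1] > d:
            if len(chain) >= n:
                motifs.append((chain[0][0], chain[-1][1]))
            chain = []
        chain.append(island)
    if len(chain) >= n:
        motifs.append((chain[0][0], chain[-1][1]))
    return motifs
```

For instance, `find_g4_motifs("ATGGGTAGGGTTAGGGAGGGTTTT", l=3, d=7)` reports a single motif spanning the four GGG islands, whereas tightening the loop limit to `d=1` reports none.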

In order to rank the identified G4s and focus only on those with the highest folding probability, we implemented a score measure as reported by Beaudoin et al.61. This score evaluates the presence and the relative positioning of cytosines (C) in the flanking regions surrounding a G4 motif and within the loops, since runs of consecutive ‘Cs’ were demonstrated to impair the folding of G4 structures by sequestering the ‘Gs’ in canonical Watson-Crick pairing. The score was calculated as follows (equation II):

$${\textstyle \text{II)}}\,G4\,score=\frac{cG\,score}{cC\,score}$$

cG and cC scores are defined as (equations III and IV):

$$\begin{array}{c}\mathrm{III})\quad cG(s)=\sum _{i=1}^{n}(|Gs(i)|\,\ast \,10\,\ast \,i)\\ \mathrm{IV})\quad cC(s)=\sum _{i=1}^{n}(|Cs(i)|\,\ast \,10\,\ast \,i)\end{array}$$

where ‘Gs(i)’ is the set of substrings of i consecutive ‘Gs’ found in the string s, and |Gs(i)| is the cardinality of the set; ‘Cs(i)’ is defined analogously for ‘Cs’. A short and a long score were calculated, considering the G4 region extended by 15 or 50 nucleotides, respectively, both upstream and downstream.
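As a worked illustration of equations II–IV, the Python sketch below computes the score for a motif and its flanks. It rests on assumptions not stated in the text: substrings of i consecutive bases are counted with overlaps (a run of length n contributes n − i + 1 substrings of length i), the scored window is the motif plus `flank` nucleotides on each side, and a cC score of zero is replaced by 1 to avoid division by zero. The function names are hypothetical.

```python
import re

def run_score(s, base):
    """Sum over i of |substrings of i consecutive `base`| * 10 * i (eq. III/IV)."""
    total = 0
    for m in re.finditer(f"{base}+", s.upper()):
        n = len(m.group())
        for i in range(1, n + 1):
            # Assumption: overlapping substrings are counted, so a run of
            # length n contains n - i + 1 substrings of length i.
            total += (n - i + 1) * 10 * i
    return total

def g4_score(seq, start, end, flank):
    """G4 score (eq. II) for the motif seq[start:end] plus `flank` nt per side."""
    window = seq[max(0, start - flank): end + flank]
    cg = run_score(window, "G")
    cc = run_score(window, "C")
    # Assumption: a window without any C gets cC = 1 instead of 0.
    return cg / (cc if cc else 1)
```

Under these assumptions a lone GGG run scores 3·10 + 2·20 + 1·30 = 100 for cG, so the window `CGGGT` has a score of 100 / 10 = 10. The short and long scores correspond to `flank=15` and `flank=50`.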

The genomic coordinates of the predicted G4s were then intersected with promoter regions. To this aim, the list of primary TSS62 was used to extract putative promoters, which were considered to be embedded in the 50 nts upstream of each TSS (downstream, in genomic coordinates, for TSS in the reverse strand). A G4 was deemed associated with a TSS when at least one nucleotide of the G4 overlapped with the promoter. A list of all potential G4s associated with promoters is provided in Supplementary File S1.
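The overlap rule can be illustrated with a short Python sketch. The coordinate convention (0-based, half-open intervals), the exclusion of the TSS nucleotide itself from the promoter, and the function names are assumptions for the example; the text does not say whether G4 strand and TSS strand were required to match, so no strand filter is applied here.

```python
def promoter_interval(tss, strand, length=50):
    """Promoter as the `length` nt upstream of a TSS (0-based, half-open)."""
    if strand == "+":
        return (max(0, tss - length), tss)
    # Reverse strand: upstream of the TSS lies at higher genomic coordinates.
    return (tss + 1, tss + 1 + length)

def overlaps(a, b):
    """True when half-open intervals a and b share at least one nucleotide."""
    return a[0] < b[1] and b[0] < a[1]

def g4s_near_tss(g4_spans, tss_list, length=50):
    """Pair every TSS (position, strand) with the G4 spans overlapping its promoter."""
    hits = []
    for tss, strand in tss_list:
        promoter = promoter_interval(tss, strand, length)
        for span in g4_spans:
            if overlaps(span, promoter):
                hits.append((tss, strand, span))
    return hits
```

With this convention, a G4 spanning positions 95 to 119 is associated with a forward-strand TSS at position 100, because five of its nucleotides fall inside the promoter interval (50, 100).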

Oligonucleotides

All oligonucleotides used in this study were from Sigma-Aldrich (Milan, Italy) (Supplementary Table S1). BRACO-19 was from ENDOTHERM (Saarbruecken, Germany); c-exNDI 2 was synthesized by Dr. Filippo Doria and Prof. Mauro Freccero (University of Pavia).

CD spectroscopic analysis

For CD analysis, all DNA oligonucleotides were diluted to a final concentration of 4 μM in lithium cacodylate buffer (10 mM, pH 7.4) and, where appropriate, KCl (50–150 mM). After annealing (95 °C for 5 min), all samples were gradually cooled to room temperature and compounds were added from stock to a final concentration of 16 µM. CD spectra were recorded on a Chirascan™-Plus (Applied Photophysics, Leatherhead, UK) equipped with a Peltier temperature controller, using a quartz cell of 5-mm optical path length and an instrument scanning speed of 50 nm/min over a wavelength range of 230–320 nm. The reported spectrum of each sample, representing the average of 2 scans, is baseline-corrected for signal contributions due to the buffer. Observed ellipticities were converted to mean residue ellipticity (θ), expressed in deg × cm² × dmol⁻¹ (mol. ellip.). For the determination of Tm, spectra were recorded over a temperature range of 20–90 °C, with a temperature increase of 5 °C/min. Tm values were calculated according to the van’t Hoff equation, applied to a two-state transition from a folded to an unfolded state, assuming that the heat capacities of the folded and unfolded states are equal.
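For reference, a common textbook form of the two-state van’t Hoff model under the equal heat capacity assumption is given below. The symbols θ_F and θ_U (ellipticity of the fully folded and unfolded states), ΔH (van’t Hoff enthalpy of unfolding) and R (gas constant) are introduced here for illustration; the exact fitting expression used in this work is not specified.

$$K_{u}(T)=\exp \left[-\frac{\Delta H}{R}\left(\frac{1}{T}-\frac{1}{{T}_{m}}\right)\right],\qquad \theta (T)={\theta }_{U}+\frac{{\theta }_{F}-{\theta }_{U}}{1+K_{u}(T)}$$

At T = Tm the unfolding constant K_u equals 1, so the folded fraction is one half and θ(Tm) lies midway between θ_F and θ_U.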

Taq polymerase stop assay

The Taq polymerase stop assay was carried out as previously described10. Briefly, the 5′-end labelled primer was annealed to its template (Supplementary Table S1) in lithium cacodylate buffer, in the presence or absence of 100 mM KCl, by heating at 95 °C for 5 min and gradually cooling to room temperature. Where specified, samples were incubated with BRACO-19 (250 nM) or c-exNDI 2 (100 nM). Primer extension was conducted with 2 U of AmpliTaq Gold DNA polymerase (Applied Biosystems, Carlsbad, California, USA) at 47 °C for 30 min. Reactions were stopped by ethanol precipitation; primer extension products were separated on a 16% denaturing gel and visualized by phosphorimaging (Typhoon FLA 9000).

Mtb strains and growth conditions

Mtb strain H37Rv was grown at 37 °C in Middlebrook 7H9 containing 0.5% glycerol and supplemented with 10% bovine serum albumin (BSA) – D-dextrose – NaCl (ADN), 0.05% Tween 80. Middlebrook 7H10 medium supplemented with ADN and glycerol was used as solid medium.

Resazurin microtiter assay (REMA)

Drug sensitivity was determined using REMA as previously described66. Briefly, frozen stock cultures were grown on solid medium 7H10/ADN. Subsequently, a pre-culture was carried out in 2 ml of liquid medium (7H9/ADN) starting from an OD540 of 0.05. Cultures were then grown to mid-exponential phase (OD540 0.6–0.8) and diluted to an OD540 of 0.01. Microplates suitable for fluorescence reading (96-well FluoroNunc™ black flat-bottom plates) were used to determine the MIC for each bacterial strain. Serial dilutions were used to dispense the correct amount of each compound in each well. Each well was then inoculated with a bacterial suspension containing 5 × 10⁴ cfu. The plates thus obtained were sealed and incubated for 1 week at 37 °C. After incubation, 10 µl (10% of final volume) of Alamar-Blue (Invitrogen) was added to each well and the plates, after another day of incubation at 37 °C, were read on a microplate reader (Tecan Infinite 200 Pro) to determine the relative fluorescence (excitation 535 nm and emission 590 nm). For each strain we used a positive control (cells without antibiotic) to determine the maximum fluorescence that could be obtained, and a negative control (medium plus antibiotic without cells).
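One way to derive an MIC80 from such fluorescence readings is sketched below in Python. It assumes that MIC80 means the lowest tested concentration at which fluorescence, normalized between the negative and positive controls, drops to 20% or less of the untreated signal; this definition, the function names, and the example numbers are illustrative assumptions and are not taken from the study.

```python
def percent_growth(fluorescence, pos_ctrl, neg_ctrl):
    """Growth relative to the untreated control, after background subtraction."""
    return 100.0 * (fluorescence - neg_ctrl) / (pos_ctrl - neg_ctrl)

def mic80(readings, pos_ctrl, neg_ctrl):
    """Lowest concentration giving at least 80% inhibition, or None.

    readings: dict mapping compound concentration to measured fluorescence.
    """
    inhibitory = [
        conc for conc, fluor in readings.items()
        if percent_growth(fluor, pos_ctrl, neg_ctrl) <= 20.0
    ]
    return min(inhibitory) if inhibitory else None

# Hypothetical plate readings (arbitrary units), not data from this work.
example = {1: 9500, 2: 7000, 4: 1500, 8: 600}
print(mic80(example, pos_ctrl=10000, neg_ctrl=500))
```

In this made-up example the well at concentration 4 retains about 10% of the control signal, so it is the lowest concentration meeting the 80% inhibition threshold.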