The first mechanistic description of gene expression control was proposed by Jacob and Monod (1961) for the lac operon of E. coli. The first version of the model contemplated exclusively a negative regulation of the Plac promoter by a repressor that released its cognate operator in the presence of lactose. Much of the success of the scheme relied not only on its simplicity, but also on its similarity to well-known feedback inhibition devices employed by engineers to control, for example, hydraulic fluxes. It is worth remembering that the ensuing evidence of yet another, this time positive, regulator of the same operon (that is, Crp, the cAMP receptor protein) was received at first with considerable skepticism. Why was such complexity (two factors: one negative, one positive) necessary when something simpler could suffice to explain much of the phenomenon at stake? Since then, the regulation of the lac operon has done nothing but reveal more and more sophisticated features that could hardly have been anticipated at the beginning. These include the way the system reacts to the disappearance of the inducer (what is called hysteresis) and the stochastic performance of the promoter that makes cells in a population adopt one of two extreme physiological states (Veening et al., 2008). Moreover, the function of lacA, the third gene of the lac operon, is still controversial (Danchin, 2009). But the question still remains as to why some promoters, let alone regulatory networks, are so complex when their function could be accomplished with much simpler counterparts.

Let us take one paradigmatic genetic circuit that stems from environmental bacteria: the xyl operons encoded by the TOL plasmid pWW0 carried by Pseudomonas putida mt-2 (Ramos et al., 1997). This strain has received much attention since its isolation in the early 1970s because of its fascinating ability to thrive on (the otherwise quite unpalatable) m-xylene and toluene as sole C sources. Although many other strains have been described to grow on the same or similar hydrocarbons, the complexity of the regulatory network that orchestrates their biodegradation in P. putida mt-2 is quite perplexing. If the problem were only maximizing m-xylene biodegradation, an engineer (or a synthetic biologist) would surely consider arraying the genes encoding the necessary enzymatic activities one after the other to form a single polycistronic operon and placing the whole under the control of a strong inducible promoter responsive to the pathway substrate (Figure 1a). From such an engineering perspective, the only uncertainty in the blueprint would be the sequence of the connecting intercistronic regions of the long mRNA, which must guarantee an adequate stoichiometry of each of the enzymes of the biochemical process. But, in sharp contrast with these sensible design principles, what we find in the TOL system (and in many other strains that degrade unusual chemicals; Tropel and van der Meer, 2004) is a regulatory scheme that looks somewhat overdone (Figure 1b). First, the route for m-xylene catabolism is divided into two parts, one for what the biodegradation literature calls the upper pathway and another for the lower pathway. It is remarkable that such a genetic division of labor (genes to go from m-xylene to 3-methylbenzoate (3MBz), and from this to pyruvate and acetaldehyde) is not entirely coincident with the major biochemical blocks of the catabolic pathway, that is, those that go from m-xylene to 3-methylcatechol (the key ring-cleavage intermediate), and from there to the Krebs cycle.
A second feature is the existence of two separate transcription factors (TFs), one for the upper operon (XylR, responsive to m-xylene) and one for the lower operon (XylS). The latter responds to 3MBz, the intermediate that results from complete oxidation of one methyl group of m-xylene through the action of the upper pathway enzymes (Ramos et al., 1997). Why do we need two TFs? If the concern were that too long an operon causes a sort of fatigue of the transcribing RNA polymerase, one could boost expression of the downstream genes with an extra promoter responsive to the same TF but advantageously located at the site of the DNA sequence where such transcriptional strengthening was needed. This is in fact observed in the regulation of the two nah operons of the naphthalene-degrading strain P. putida NAH7, which both respond to a single TF (NahR) in the presence of salicylate, one of the pathway intermediates (Huang and Schell, 1991). Why do we not have something similar in the case of the TOL operons? Perhaps the NAH system has evolved to deal with just one compound (naphthalene), whereas TOL has the flexibility to activate only the lower pathway for 3MBz, or the entire system, upper and lower, for m-xylene. In this respect, it is remarkable that the upper and the lower TOL pathways are transcribed by variants of RNA polymerase with different sigma factors (Ramos et al., 1997; Silva-Rocha et al., 2011b). This may reflect the need to link the chemistry of each of the compounds with the stress associated with their metabolism.

Figure 1

Rational engineering versus evolutionary tinkering for biodegradation of m-xylene. (a) Forward design of a new pathway for biodegradation/biotransformation of aromatic compounds. The figure sketches the minimal components of an idealized engineered system: one substrate-responsive regulator R able to induce transcription of the adjacent operon, an array of enzymatic modules, intervening intercistronic sequences and a terminator. The system can also be fitted with a feedback loop for controlling the flux of the biotransformation at stake. (b) The extant TOL network of P. putida mt-2 for metabolism of m-xylene. The system has two operons, upper and lower, expressed from the Pu and Pm promoters, respectively. The XylR regulator is expressed from the Pr promoter, whereas XylS is expressed from Ps. When active, XylR triggers the expression of Pu and Ps while it represses Pr. The active form of XylS triggers the expression of Pm. At the metabolic level, m-xylene is converted to 3-methylbenzoate through the action of the upper pathway enzymes, and this product is further metabolized to TCA intermediates by the action of the lower pathway. At the regulatory level, m-xylene binds to the inactive form of XylR, switching this regulator to an active form. Similarly, XylS switches to the active form upon binding of 3-methylbenzoate. In addition, XylR/m-xylene triggers overexpression of XylS, which can then activate Pm without 3-methylbenzoate.

But still, the most striking feature of the regulatory architecture of the TOL plasmid is the interplay between the two regulators (XylR and XylS) and the way they activate their cognate promoters (Figure 1b). Expression of XylS is under the control of the XylR-responsive promoter Ps. This means that the presence of m-xylene triggers both transcription from the upper pathway promoter (called Pu) and overproduction of XylS. What makes this system really extraordinary is that such an overproduction suffices to activate Pm, the promoter of the lower pathway, in the absence of the endogenous effector of XylS (3MBz). This unusual property of XylS results in the simultaneous activation of the upper and the lower operons before the substrate of the lower route has had time to materialize (Silva-Rocha et al., 2011a). This anticipatory behavior looks like a considerable waste at first sight. Is it just a casual occurrence or does it bear some significance?
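The interplay just described can be written down as a handful of Boolean rules. The following Python fragment is only a minimal sketch of that wiring, not a published model: the function name `tol_logic` and its arguments are hypothetical, XylR and a basal amount of XylS are assumed to be always present, the negative autoregulation of Pr is left out, and the presence of 3MBz in the cell is treated as an independent input so that the moment before the upper pathway has produced it can be represented.

```python
from itertools import product


def tol_logic(m_xylene: bool, mbz_present: bool) -> dict:
    """Promoter states (True = active) for one combination of inputs.

    Assumptions (not from the source): XylR and basal XylS are always
    present; Pr autorepression is ignored; 3MBz presence is an input.
    """
    # m-xylene switches XylR to its active form.
    xylr_active = m_xylene
    # Active XylR turns on Pu (upper operon) and Ps (XylS overproduction).
    pu = xylr_active
    ps = xylr_active
    # Pm fires if basal XylS has bound 3MBz, or if XylS is overproduced
    # from Ps, which needs no effector.
    pm = mbz_present or ps
    return {"Pu": pu, "Ps": ps, "Pm": pm}


if __name__ == "__main__":
    # Print the full truth table for the two inputs.
    for m_xylene, mbz_present in product([False, True], repeat=2):
        state = tol_logic(m_xylene, mbz_present)
        print(f"m-xylene={m_xylene!s:5} 3MBz={mbz_present!s:5} -> {state}")
```

Under these rules, the row with m-xylene present and 3MBz still absent already has Pm switched on, which is the anticipatory behavior in question, whereas 3MBz alone activates only Pm and leaves Pu and Ps off.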

The issue of complexity and its raison d'être is at the core of the evolutionary understanding of biological phenomena, and biodegradative systems are no exception. We entertain the idea that the extant architecture of regulatory circuits (more so in those that deal with metabolism of recalcitrant and xenobiotic compounds) contains a record of the series of bottlenecks that bacteria had to overcome to assemble a functional pathway for a new compound, as well as of the solutions they found (Silva-Rocha et al., 2011a). Note that such problems are not limited to finding the right combination of enzymes and expressing them at the right time when the substrate is available, that is, the easy part that any genetic engineer could figure out (Figure 1a). The most difficult part is wiring the new activities to the existing metabolic and transcriptional network of the cell without creating biochemical havoc, managing the division of labor (metabolic or otherwise) in the population when cells are exposed to mixtures of nutrients-to-be, and dealing with neighbors of other species that may share the same nutritional niche. The more the TOL system is examined, the more traces we find that P. putida mt-2 has been through these (and possibly many other) challenges along its evolutionary history, and that the extant genetic and biochemical network has found solutions to (perhaps) all of them.

How can we decode the metabolic and regulatory roadmap encrypted in the TOL system? Biological networks can be abstracted as relational, dynamic objects whose functionality is not determined by the material nature of their components, but by the interactions between them: their number, their topology and their kinetic parameters. A useful approach to penetrate the inner logic of a given genetic circuit is the adoption of simple Boolean formalisms in which every relevant action (enzymatic or regulatory) can be represented as a binary logic gate with defined inputs and outputs (Silva-Rocha et al., 2011b). Although the logic circuit that results from combining all of the gates present in a complex network (the so-called logicome) does not reveal much about the performance of the system from a kinetic point of view, the resulting scheme does tell something about why a specific configuration has been selected instead of another. The findings of applying such tools to the TOL system have been quite surprising. On one hand, it appears that the entire regulatory architecture of the circuit revolves around what we called a metabolic amplifier motif (Silva-Rocha et al., 2011a). This regulatory device causes a premature response to a signal (3MBz) that will appear in the system only afterwards. Such a metabolic memory (Mitchell et al., 2009) seems to serve various needs, for example, avoiding accumulation of toxic intermediates and preventing misrouting of 3MBz through a nonproductive pathway (Silva-Rocha et al., 2011a). Indeed, managing the possible biochemical conflict posed by the production of 3MBz by the upper TOL enzymes and the risk of its being channeled through an alternative dead-end pathway encoded in the chromosome of P. putida could account for much of the intricacy of the TOL regulation.
On the other hand, the layout of the circuit and the very low levels of two limiting factors (XylR and σ54), which are required for the activity of Pu and Ps (Fraile et al., 2001; Jurado et al., 2003), make the TOL architecture prone to generate a stochastic activation of the promoters at stake, and thus a considerable phenotypic diversity (Silva-Rocha and de Lorenzo, 2012). We argue that such stochasticity favors the population, as it allows a degree of metabolic variation when an otherwise genetically identical population faces a mixture of nutrients. Finally, it should not escape our notice that the main regulator of the system, XylR, responds to many structural analogs of m-xylene that cannot be metabolized by the TOL enzymes (Abril et al., 1989). Furthermore, XylR easily mutates towards effector promiscuity (Galvao et al., 2007). Although activating the TOL operons with a non-substrate makes no sense in single cells or in a genetically homogeneous population, it can be advantageous in a site with various chemicals available as carbon sources and inhabited by a multi-strain community. This is because new metabolic abilities can emerge through the combination of biochemical steps contributed by different bacteria. If the endogenous regulators/promoters of the operons that encode the enzymes were absolutely specific for each pathway and each strain, such an ectopic route (from the Greek ek, out, and topos, place) would never materialize. In contrast, if the substrate promiscuously activates the corresponding promoters, then the community could increase its biodegradative potential (de Lorenzo et al., 2010).
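The link argued above between scarce limiting factors and phenotypic diversity can be illustrated with a toy simulation. The sketch below is purely illustrative and does not reproduce any of the cited studies: it assumes that the numbers of available XylR and σ54 molecules per cell follow independent Poisson distributions, and that a promoter such as Pu fires only in cells that hold at least one molecule of each. The function names, the one-molecule threshold and the mean values are all hypothetical.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Draw one Poisson-distributed integer (Knuth's method, small means)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def fraction_on(n_cells: int, mean_xylr: float, mean_sigma54: float,
                seed: int = 0) -> float:
    """Fraction of cells in which the promoter is active.

    Toy assumption: a cell is ON only if it holds at least one molecule
    of each limiting factor (XylR and sigma-54).
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    cells_on = 0
    for _ in range(n_cells):
        xylr = poisson(rng, mean_xylr)
        sigma54 = poisson(rng, mean_sigma54)
        if xylr >= 1 and sigma54 >= 1:
            cells_on += 1
    return cells_on / n_cells


if __name__ == "__main__":
    # Scarce factors: the isogenic population splits into ON and OFF cells.
    print("scarce factors:  ", fraction_on(10_000, 1.0, 1.0))
    # Abundant factors: nearly every cell is ON and the diversity vanishes.
    print("abundant factors:", fraction_on(10_000, 10.0, 10.0))
```

In this toy setting, a genetically identical population splits into active and inactive cells only when both factors are scarce; raising their mean levels drives virtually every cell to the active state.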

In conclusion, we argue that regulatory complexity is hardly gratuitous and that the unusual architectures that we often find in environmental biodegradative systems (for example, the TOL plasmid) have been shaped by prior biochemical, populational and community conflicts. It is remarkable that current views on the evolutionary interplay between regulation and metabolism mostly contemplate single strains growing in a homogeneous culture medium with single carbon sources (Shlomi et al., 2007). Under such conditions, the architecture of metabolic networks can be explained through a sheer economic objective (the so-called Pareto optimality; Schuetz et al., 2012), that is, maximum production of competing goods from the same set of resources. Yet microbes in the environment are not only about metabolic economy, but also about sociology, non-uniform territory and competition/collaboration. In this context, the architecture of genetic circuits of the sort discussed above appears to encode a record of historical bottlenecks (for example, biochemical jams) as well as the evolutionary novelties that have solved them. It is difficult to establish a temporal series of problems and solutions, because what we see today represents the outcome of all of them, and they may have arisen and been solved simultaneously. This state of affairs resembles what in some proteins has been called moonlighting, that is, the property of a single polypeptide to hold entirely different functions in the same structure (Huberts and van der Klei, 2010). The data so far with the TOL plasmid (and possibly many other circuits of the sort; Tropel and van der Meer, 2004) indicate that one regulatory scheme can meet a considerable number of different needs at the level of single cells, populations and multi-strain communities. It could well be the case that not only the physiology, but also the code of social conduct of environmental bacteria is chartered in their regulatory networks.