Abstract
Inspired by multiprotein complexes in biology and recent successes in synthetic DNA tile and colloidal selfassembly, we study the spontaneous assembly of structures made of many kinds of components. The major challenge with achieving high assembly yield is eliminating incomplete or incorrectly bound structures. Here, we find that such undesired structures rapidly degrade yield with increasing structural size and complexity in diverse models of assembly, if component concentrations reflect the composition (that is, stoichiometry) of the desired structure. But this yield catastrophe can be mitigated by using highly nonstoichiometric concentrations. Our results support a general principle of ‘undesired usage’—concentrations of components should be chosen to account for how they are ‘used’ by undesired structures and not just by the desired structure. This principle could improve synthetic assembly methods, but also raises new questions about expression levels of proteins that form biological complexes such as the ribosome.
Introduction
A central feature of living systems is that they selfassemble their parts. Within the cell, an enormous variety of macromolecular complexes form spontaneously^{1}, with nearly every part having different shape and binding affinities. These complexes range from structures with a small number of different components (chaperones and proteasomes) to large structures with complex topologies (the nuclear pore complex and the ribosome). Although the assembly pathways of such complexes have been investigated^{2,3,4,5}, general design principles for binding energies, concentrations and other parameters of the selfassembly process are still mysterious. To minimize waste or errors, these complexes presumably assemble with high yield.
Material synthesis strategies have also recently started to use large numbers of different components, particularly with nanostructures either coated^{6} with or entirely composed of DNA^{7,8,9,10}. For example, work^{7,8,11,12,13} on assemblies of short DNA strands has now been extended to a plethora of complex large shapes called tile or ‘brick’^{9,10} assemblies. Similar efforts are underway using rationally designed proteins^{14,15} or colloidal particles^{16} with specific interactions.
The major challenge with achieving high yield is eliminating competing undesired structures. Even when components are chosen so that the desired structure is the ground state, incomplete and incorrectly formed structures will occur. Incomplete structures can result from a lack of sufficient components to drive the reaction to completion. Incorrectly bound structures can result from inevitable nonspecific binding between components. Such crosstalking interactions are ubiquitous in both natural and synthetic systems. Nonspecific interactions between proteins arise from protein structure constraints^{17,18}, creating challenges for both signalling pathways^{19} and selfassembly^{12,20}. In DNA brick assembly^{9,10}, even with designer DNA sequences that precisely match the desired structure, there is still (weaker) binding affinity of each strand to other general components.
The number of undesired structures of varying size and composition grows exponentially with the number of component types in the desired structure^{21,22,23,24,25}. A fundamental question is whether undesired structures can be sufficiently suppressed by choosing the binding energies^{26,27} and concentrations^{28} of different components.
Within biology, it is believed that the concentrations of individual proteins of multiprotein complexes obey the dosage balance hypothesis (DBH)^{29}, which asserts that the optimal expression levels are proportional to the stoichiometric ratio in which proteins form complexes. Likewise, the synthetic examples that we are aware of choose concentrations of individual components to match the stoichiometry of the desired structure. However, there is no evidence that stoichiometrically balanced concentrations maximize assembly yield.
We study these questions in a broad set of models relevant to protein complexes, DNA brick and colloidal assemblies. Individual structures are represented by their connectivity graphs (Fig. 1), which we then model with multivalent components with short ranged binding affinity. Our models build upon a framework introduced^{12,13,30} in the context of DNA tiles.
Our analysis of these models suggests a simple new principle of ‘undesired usage’: control parameters such as component concentrations should be chosen based on how those components are ‘used’ by undesired structures, as opposed to just the desired structure. For example, if undesired structures ‘use’ component A more often than component B, A should be supplied in smaller amounts than B—even if the desired structure is made of one A and one B. We give a mathematical definition of ‘usage’ and show that this principle leads to regimes with markedly higher yields in diverse models of assembly, relevant to protein complexes, DNAbased selfassembly and colloidal assemblies. We first rigorously establish these results in an equilibrium model of selfassembly, introducing a perturbative Feynman diagram technique for computing yield that accounts for the combinatorial explosion of competing structures. We then demonstrate that the principle of undesired usage can also alleviate kinetic yield catastrophes in nonequilibrium models, focusing on two models of recent interest in colloidal and DNA brick assembly. In the systems we study, the yield improvement from nonstoichiometric concentrations is typically larger when the stoichiometric concentrations yield is smaller.
Results
Undesired usage must balance desired usage
We consider selfassembly of heterogeneous structures made of n components; each component is one of m species types and has multiple distinct binding sites. Besides the desired structure, the m species of components can assemble numerous incomplete or incorrect undesired structures. (See Fig. 2 where m = n =4.) Incomplete structures are pieces of the desired structure that do not have all the necessary components, whereas incorrectly bound structures contain weak ‘crosstalking’ interactions. Such undesired structures vary in size, shape and composition and can markedly reduce yield.
We define yield Y as the number of desired structures produced relative to all undesired structures. That is,
where X_{d}(c_{i}), X_{u}(c_{i}) are the numbers of desired and undesired structures produced by assembly and which depend on species concentrations c_{i}. X_{u} can be written as a sum over all undesired structures a, .
We define the ‘usage’ of species i by structure a:
reflects how the production rate of structure a depends on the concentration of component i. For example, consider an experiment carried out using concentrations c_{i} and then perturb around these concentrations, c_{i} → f_{i}c_{i}. If the perturbations are small, we can Taylor expand to write . Thus, large usage implies that increasing the concentration of species i will greatly increase the production of structure a. The gradient of yield with respect to log c_{i} can be written in terms of usage,
where the average is over all undesired structures a, each weighted by the amount X_{a} of it produced. The proportionality constant in equation (3) is always positive.
This equation defines the principle of undesired usage—yield is improved by lowering the concentration of component i whose average undesired usage is higher than its correct usage (and viceversa). Changing concentrations in such a manner produces different distributions of undesired structures X_{a}/X_{u}, whose average undesired usage is closer to the correct usage . The resulting optimal concentrations can be very different from the stoichiometry in the desired structure.
Our analysis of specific models will show that the resulting yield improvements are marked. In what follows, we apply this principle to equilibrium assembly as well as to two paradigmatic kinetic examples. In our model of equilibrium assembly, c_{i} and X_{a} in equation (2) will refer to steadystate concentrations, and we analytically show that the usage is the number of occurrences of species i in structure a. In our kinetic models, quantifies the dependence of the final amount X_{a} of structure a on initial concentrations c_{i} that deplete with time. Thus, equation (3) can be applied to both steadystate or initial concentrations and to equilibrium or kinetic assembly.
Equilibrium assembly and yield catastrophes
We begin by studying equilibrium assembly. We assume that the interactions between binding sites in the desired structure are strong and of energy s k_{B}T <0. For example, in Fig. 3, components 6 and 7 and components 4 and J_{1} have strong binding with each other. In many systems, such as DNA bricks designed with random sequences or protein assemblies, the strong interaction energy will typically vary across the structure. Optimal concentrations will depend on such variation in binding energies and can be computed using the framework we introduce below; for simplicity, we focus on the case of a single strong binding energy scale s k_{B}T and discuss generalizations in Supplementary Note 1.
In any natural or synthetic system, there is always some level of nonspecific interactions that we call ‘crosstalk’^{12,18,20,31,32,33}. To model crosstalk, we assume that all nondesired binding sites interact weakly with an energy w k_{B}T <0 that is distributed randomly as ρ(w). Thus, components 6 and 2 interact through such a weak crosstalk interaction. The ‘male’ site of component 3 interacts weakly with all components since it is unbound in the desired structure.
We assume for now that each component i is supplied at a fixed chemical potential μ_{i}, so the concentration of free components has the constant value , with . This steadystate model mimics the assembly of the ribosome and other macromolecular complexes whose protein components are being continually produced, or assembly in a large sea of components whose concentrations change very little during assembly. Recent works^{10,34,35} suggest that, in some temperature regimes, the experiments of^{9,10} can be described in such a manner. See kinetic models below for complementary possibilities.
Yield at equilibrium can be obtained by summing the partition function over all undesired structures; we developed a method adapted from Feynman diagrams to perform such numerical computations efficiently (see Methods section). We find that yield is determined by an energy–entropy balance, with the number of the most stable competing structures growing as ~n^{2} and each such crosstalkcontaining structure suppressed by an energetic factor of e^{−β(w−s)}. Building structures of size n with any yield at all requires the energetic suppression to dominate which in turn implies a bound on the crosstalk energy w. Extending such analytic arguments (detailed in Supplementary Note 2 and Supplementary Fig. 5), we find , where is the exponential average of crosstalk energies w between all species, assumed to be distributed as ρ(w). A and B are O(1) numbers that depend on the topology of the desired structure. Note that stronger interactions correspond to more negative energies in our convention. If crosstalk strength w exceeds this bound, the different species of components are indistinguishable and the system effectively behaves as having only one species. Hence, we will restrict ourselves to crosstalk below this bound (which we call the ‘ crosstalk threshold’) and examine how the maximum yield depends on the size and structural complexity of the desired structures.
Equal chemical potentials lead to a yield catastrophe. For simplicity, we begin by considering structures where all components are distinct, that is, the number m of distinct species used is also the size n of the structure. We first assume that the chemical potentials of all m = n species are stoichiometric, so μ_{i} = μ. Figure 4a shows the yield (red curve) as a function of μ, for a linear structure of size n =8. Low chemical potentials favour incomplete structures, whereas high potentials favour large aggregates held together by weak bonds; yield is maximized for an intermediate value of μ.
The red data points in Fig. 4b show how this maximum yield depends on n, the size of the linear structure. To fairly compare yields for structures of different n, we need to increase the difference g = w−s between strong and weak bond energies as , so that g stays above the crosstalk threshold by a fixed amount G. We set A=2.0 for linear structures. Strikingly, the yield degrades exponentially with increasing n, indicating that by n ≈35 the maximum yield is at most ~1%. The yield degrades because the number of competing structures increases markedly with increasing size n; although each individual competing structure has higher energy, the combinatorial explosion of possibilities with growing n strongly limits the yield.
This yield catastrophe also occurs with increasing structural complexity. Using the Feynman method, we numerically computed the yield of branched structures for fixed n, but with increasing numbers of arms (Fig. 4c). Again, we find that the yield decreases exponentially with the number of arms; the reason is that the number of competing structures increases markedly with the number of arms (calculated in Supplementary Note 1).
The yield catastrophe presents a fundamental constraint: if the concentrations of components are stoichiometrically matched to the structure, then there is a limit to how large and how complex a structure can be robustly assembled.
Unequal chemical potentials alleviate the yield catastrophe. We now allow the components to have unequal chemical potentials μ_{i}. We numerically vary μ_{i} independently to optimize yield, using gradient descent on our Feynman construction for partition functions. Strikingly, the optimal values of μ_{i} are highly nonuniform across the structure, with exterior pieces having much higher μ_{i} (and hence concentrations) than interior pieces. Moreover, the optimal μ_{i}s lead to a nearly complete recovery from the yield catastrophe. Figure 4 shows the effect on our two model calculations: Fig. 4b,c (black dots) show that the optimal μ_{i}s lead to a yield that is independent of the length n of linear structures and the number of arms a in branched structures.
To understand why the optimal μ_{i}s are unequal, we analyse undesired usage. In equilibrium, the ‘usage’ of species i by structure a, defined in equation (2), simply reduces to the number of times species i occurs in structure a (see derivation in Methods section).
Figure 5 shows the implications of usage graphically; we plot the undesired usage of components i = 1,…8 averaged over competing structures grouped as larger and smaller than the desired structure. With equal chemical potentials (Fig. 5a), only 60% of small undesired structures contain component 1 but almost 90% of them contain 5. This asymmetry arises because, for example, the dominant small structures of length four are 1234, 2345, 3456, 4567, 5678 (others, such as 1278, are suppressed by weak bonds). Component 5 occurs in most of these structures while 1 occurs in only one. Large structures have unequal usage for similar reasons.
In the desired structure, each component is only used once. Hence, increasing the supply of 1 (that is, increasing μ_{1}) will suppress small structures (whose usage of 1 is 0.6) relative to the desired structure while only somewhat enhancing large structures (whose usage of 1 is 1.1). The suppression of small structures dominates and yield is improved. Similarly, decreasing μ_{5} will improve yield by greatly suppressing large structures (usage ≈2) and somewhat enhancing small structures (usage ≈0.9).
Modifying chemical potentials in this way changes the distribution of competing structures produced and hence usage will have to be recomputed. The process terminates when the undesired usage of different components matches the stoichiometry in the desired structure (see equation (3) and Fig. 5b). Increasing μ_{1} any further will promote large structures more than it suppresses small structures. Similarly decreasing μ_{5} any further will promote small structures and harm yield. In this way, the optimal supply of components strikes a balance between different groups of competing structures.
Undesired usage analysis also explains the optimal profiles for general branched structures shown in Fig. 6; for example, the dominant competing structures for the 3armed star in Fig. 6a consist of junction J with partially built arms. These partial arms will almost always contain component 1 and not contain component 3 unless it is a complete arm. Partial arms in which component 3 occurs without 1 involve crosstalking interactions and contribute less to the partition function. In the millipedelike structure, junction J_{2} is to have lower concentration than J_{1}, since J_{2}, with four sites, is more prone to selfaggregation. See Supplementary Figs 1–3 and Supplementary Note 1 for a sampling of such competing structures, further usage analysis of structures shown here and the weak dependence of optimized yield on topology of structures in Fig. 4c.
In Supplementary Fig. 4, we show how the optimal concentration profile changes if the strong binding energy s varies across the structure; we show that our results on the yield catastrophe and its alleviation continue to hold.
In summary, we see marked yield improvement (Fig. 4) by using unequal chemical potentials as dictated by undesired usage (equation (3)). Thus, unequal potentials provide sufficient ‘control knobs’ to suppress competing structures whose number is exponential in size n.
Undesired usage analysis alleviates kinetic yield catastrophes
Thus far, we have analysed the mitigation of the equilibrium yield catastrophe using the principle of undesired usage. However, assembly in many systems is limited by kinetics. Unlike equilibrium yield, the details of kinetic pathways differ from system to system. Nonetheless, equation (3) shows that undesired usage analysis still applies, although the definition of usage depends on the details of the kinetics.
Here we examine complementary exemplars of kinetic issues in models of two synthetic systems—DNA brick and irreversible colloidal assemblies. In both cases, we show that nonstoichiometric concentrations markedly alleviate yield catastrophes. In the kinetic problems we study here, concentrations c_{i} will refer to the initial amounts of components supplied, since free components are significantly depleted over the course of assembly.
Incomplete incompatible structures. Kinetic yield catastrophes can occur even in the absence of crosstalk. Our first model assumes that a desired structure can nucleate and grow from several inequivalent seeds (Fig. 7a). If the rate of nucleation is comparable or fast compared with the rate of subsequent growth of seeds into full structures, many inequivalent seeds will nucleate and grow in parallel. Such parallel growth can lead to a kinetic yield catastrophe in the form of a ‘depletion trap’; all components are locked up in incomplete incompatible structures.
The problem of incomplete structures has been argued to be central to both DNA brick assembly^{9,34,35,36} and protein complexes^{27,37}. Indeed, before the experiments of ref. 9, such depletion traps were thought to make DNA bricklike approaches unlikely to succeed. Recent simulations^{34} show that in a narrow regime of parameter space, assembly is successful because nucleation is sufficiently slow compared with growth, that is, each nucleated structure typically completes growth before the nucleation of another seed. However, outside of this regime, multiple structures nucleate rapidly, deplete monomers and might incorrectly bind each other^{34}. Hence, it is important to develop strategies that expand the range of conditions for successful assembly; other materials, assembly conditions and larger target structures may not satisfy the criterion of slow nucleation and fast growth.
Undesired usage analysis implies that unequal concentrations can alleviate depletion traps. We first demonstrate this in a simple nucleationandgrowth model of a ring (Fig. 7a), where the ring is made of n bivalent components that bind only to their correct partners. We assume that the interactions are cooperative (for example, through allostery) so that the ring grows from critical nucleating seeds of size a or greater: structures smaller than size a quickly dissolve into monomers, but a nucleating seed of size a grows irreversibly^{30}. Assembly starts with an initial supply of components of concentration c_{i},i =1,…,n that is depleted over time. We compute yield by numerically solving the master equation with initial conditions c_{i} (see Methods section).
With equal concentrations, we find that the depletion trap strongly degrades yield for large n; this degradation occurs because the nucleation rate of seeds is independent of structure size n while the subsequent growth rate of the seed into the full structure falls as 1/n. Further, the number of parallel nucleation pathways increases as n. Hence, yield falls exponentially with increasing structure size n, giving a kinetic yield catastrophe (Fig. 7b).
With unequal concentrations, this depletion trap is strongly alleviated. Through gradient descent optimization, we find that the optimal concentration profile is highly nonuniform (Fig. 7c), with high concentrations for a region of size ~a and low concentrations everywhere else. (By symmetry, this high concentration region can be anywhere around the ring.) The unequal concentrations greatly enhance nucleation for one of the many assembly parallel paths shown in Fig. 7a, suppressing the rest. Hence, even with slow growth, the components needed for growth are not significantly depleted by competing pathways, with a marked effect on yield (red dots in Fig. 7b). We validated these same conclusions in a 2dim ‘P’shaped structure (Fig. 7d), made of n =208 tiles and resembling DNA brick assemblies^{9}. We assume that each tile in the structure is a unique species with four distinct binding sites that bind only to their correct partners. We simulated a nucleationandgrowth model similar to the ring above (see Methods section); seeds of size a =18 could nucleate anywhere in the structure and then grow irreversibly, with the rates of both processes determined by the concentrations of components.
Starting assembly with equal amounts (300 copies) of each species, we find a severe depletion trap, resulting in a wide distribution of incomplete structures of typical size ~100–150 tiles (blue bars in Fig. 7e). Only nine structures are of size within 15% of the complete n =208 Pstructure. On the other hand, the highly unequal profile (with the same average number 300 as earlier) shown as a heatmap in Fig. 7d, gives a yield of ~135 structures, a 15fold improvement.
Incomplete sticky structures. Colloidal particles form aggregates easily because they are isotropically sticky and pick up multiple partners, making it difficult to build predictable finite structures. Such a yield catastrophe was recently observed^{38} and mitigated using highly nonstoichiometric concentrations. Finite clusters of type AB_{4} were built from two kinds of DNAcoated colloidal particles A and B, designed so as to bind each other irreversibly but not bind their own kind. The radii of A and B were chosen so that exactly four Bs bind to each A. Figure 8a shows a selection of kinetic pathways leading to the desired AB_{4} structure (black) and to several undesired aggregates (red). The main hurdle to high assembly yield is ensuring that incomplete structures (that is, AB, AB_{2} and AB_{3}) bind only to particle B and not to other incomplete structures, A, or larger aggregates. The experiments^{38} found large aggregates and a very low yield of AB_{4} if assembly starts with the stoichiometric ratio 4:1 of B:A. On the other hand, supplying a large excess of B greatly enhanced the yield of AB_{4}.
We were able to reproduce this behaviour with the stochastic simulation of a simple model of irreversible aggregation (see Methods section). Yield is maximized at the highly nonstoichiometric ratio of 8:1 for B:A (Fig. 8b). Figure 8c (red bars) shows that the 8:1 supply shifts the distribution of assembled structures towards lower mass, compared with a 4:1 supply (blue bars). An excess of B suppresses the probability of taking the red pathways, relative to the correct black pathway (Fig. 8a). The depletion of free A particles with time helps further suppress undesired pathways—otherwise, the desired structure AB_{4} can pick up free As and aggregate. Hence, a high ratio of B:A helps rapidly complete incomplete structures through addition of monomers and prevent them from sticking to each other. The results here generalize Flory's classic theory of irreversible condensation^{39} that involved only a small number of species.
These complementary examples from DNA, protein and colloidal assembly demonstrate that undesired usage analysis can greatly enhance yield in examples where assembly is kinetically controlled. Unlike our equilibrium examples, the character of kinetic yield catastrophes depends on modelling assumptions; for example, numerical prefactors in nucleation and growth rates, reversibility of growth and so on, which vary across different systems. Nevertheless, using unequal concentrations markedly alleviates kinetic yield catastrophes, especially when kinetic traps are strong. We leave a study of unequal concentrations using kinetic models tailored to particular systems such as colloids or DNA brick assemblies (for example, along the lines of refs 25, 34) to future work.
Discussion
We have studied strategies to suppress undesired structures that can catastrophically reduce the assembly yield of complex heterogeneous structures. The central lesson of our work is that the supply of different components should account for their ‘usage’ by undesired structures. The resulting optimal concentrations can differ greatly from the stoichiometry of the desired structure itself.
We have shown that undesired usage analysis is a useful consideration to apply to a range of assembly experiments involving DNA, colloids and proteins^{9,10,14,15,40}, in both equilibrium and highly kinetic conditions, even if the precise degree of kinetic yield improvement is hard to predict without knowledge of the detailed kinetics. We leave the extensions to more complex kinetic models^{41,42,43} such as hierarchical assembly to future work. Our framework and results have precedence in earlier work on the impact of concentrations in specific models of micelles^{44}, polymer condensation^{39} and protein complexes^{37}. Chen and Kao^{28} studied a DNA tile model with crosstalk and found that optimal concentrations are proportional to the square root of stoichiometric ratios. They assume that correct bonds are irreversible. In contrast, we used a fully reversible crosstalk model that describes intrinsically more errorprone assembly and showed that even catastrophically low yields can be restored by nonstoichiometric concentrations. As a result, while our results are similar in spirit, yield improvements found in ref. 28 are weaker than those reported here. Our results also differ because our physical partition function treatment accounts for all possible erroneous structures while ref. 28 used a heuristic local definition of errors.
The principle of undesired usage can be generalized to other control parameters beyond concentrations. For example, unequal binding energies are expected to alleviate yield catastrophes in our linearly branched models and in depletion trap models^{27}. These results differ from earlier analyses^{26} that only considered competing structures of the same size as the desired structure (that is, not bulk assembly). The design of structure itself^{45} can be subject to undesired usage analysis; given a physical structure, how should it be composed of different species to minimize assembly of competing structures? Finally, in practice, some structures might be more undesirable than others, owing to their being more difficult to separate or more toxic in a cell. Our framework can be generalized to accommodate such varying functional costs (Supplementary Note 4), providing a way to produce the least undesirable mix of structures appropriate in a given functional context.
Biology has many examples of heterogeneous multiprotein/RNA complexes^{1}, ranging from simple linear complexes to ribosomes and the nuclear pore complex. Our work predicts that optimal expression levels of components of large protein complexes can be highly nonuniform; for example, if a protein component is particularly likely to form toxic aggregates, that protein should be expressed at lower levels than other complex components.
Phrased this way, the implications of our work seem intuitive and perhaps even obvious—and yet they seemingly contradict a common formulation of the DBH^{29,46,47,48}. DBH states that the optimal expression levels for complexforming proteins is stoichiometric and that deviations have large fitness costs. Such fitness costs are believed to be a strong evolutionary constraint on gene duplication events, aneuploidy, gene family sizes and dominance.
Instead, our results suggest that the optimal expression level baseline may differ greatly from stoichiometry due to undesired structures. However, the rest of DBH—that changes from this optimal level have large fitness costs—would still apply about this modified baseline.
We note that all the strong evidence for DBH involves changes in expression levels^{29,46}; such evidence includes experiments that enhance or repress expression of select proteins and statistical studies of gene duplication events. These data say little about what the optimal ratios are in the first place.
In fact, the database consensus for expression levels of proteins involved in complexes is highly uneven across those complexes (ref. 49 and Supplementary Note 4 and Supplementary Fig. 6). Our proposal also finds support in the work of refs 20, 31, which found that highly promiscuous proteins in yeast have lower expression levels. The production of proteins comes at a cost (energy, material and time) to the cell and highly unequal production of proteins needed in equal amounts seems wasteful, unless some other benefit were conferred^{49,50}. However, expression levels need to be measured more accurately to draw further conclusions.
Finally, it is intriguing to ask how the organization of operons—sets of bacterial genes whose expression is regulated together—is related to potential undesired complexes^{51,52}. There is evidence for such connections between genome structure and the physics of protein complexes in other contexts^{53,54}. We leave a detailed study of these questions to future work.
Methods
Equilibrium assembly and Feynman expansion
Our model has m species of multivalent particles with distinguishable binding sites as shown in Fig. 3. The desired structure defines the interaction energy between pairs of binding sites on two different particles. If a pair of sites is bound in the desired structure, the binding is strong and of energy sk_{B}T; else, the binding energy is wk_{B}T (a weak ‘crosstalk’ interaction). Each species is supplied at a steady chemical potential μ_{i} (or equivalently concentration where . We define g = w−s.
At equilibrium, yield can be written in terms of partition functions of desired and undesired structures; in equation (1), X_{d} = Z_{d} and where Z_{a} is the partition function of structure a,
where p is the total number of bonds in a, q is the number of crosstalking bonds and g = w−s >0 is the difference between the strong and weak bond energies. is the sum of chemical potentials over all particles k (of species type i_{k}) present in structure a.
Since X_{a} = Z_{a} at equilibrium and , equation (4) implies that usage at equilibrium is simply given by the number of occurrences of species i in structure a.
Feynman method
Finding the yield requires summing the partition function over all competing structures a of varying shapes and sizes as shown in Fig. 3; the list can be quite large even for a simple structure. We markedly simplified such tedious calculations by developing a perturbative method for computing the partition functions of linearly branched and/or looped structures based on rules adapted from Feynman diagrams. Feynman methods have been used before in the computation of partition functions of polymers^{55}. In our context, Feynman rules give us a onestep method of summing over structures of all sizes that are consistent with a given topology. The Feynman rules here are: (1) A linear segment made of any number of bivalent components between two junctions is associated with the matrix D_{i}^{j}.
where E_{ij} is the interaction energy between species i and j, binding in the order i−j, and c_{j} are the concentrations of components. (2) Every binding site p of a junction J is associated with a row vector (column vector if site p is female) where E_{pi} is the binding energy of p to bivalent species i. In addition, each junction J is associated with a concentration factor c_{J}. 3. Free ends of linear segments are associated with a special vector that can account for solvent–component interactions. We take in this paper.
To find the partition function summed over all structures with a given topology, we first carry out matrix multiplication of the row and column vectors associated with junctions (or free ends) and the matrix D_{i}^{j} associated with the linear segment between such junctions, giving a scalar factor of the type . The total partition function is the product of all such scalar factors associated with a given topology.
The Feynman method is particularly efficient when the number of bivalent species is large, as long as the number of species with valence larger than two stays small. Supplementary Fig. 1 and Supplementary Note 1 contain detailed derivations and examples of applying these Feynman rules.
Kinetic depletion trap
Ring. The ring structure is modelled as made of n bivalent particles, each of a distinct species. The left binding site on species i−1 will bind only to the right binding site of i; for example, in Fig. 7a, the left site of species 5 will only bind the right site of 6. Correct binding is irreversible and no incorrect binding is allowed. Particle n is assumed to bind to 1, allowing assembly of a ring.
We denote concentration of species i by positive real numbers C_{i}(t), i = 1…n. The concentration of a clockwise segment of the ring from i to j is denoted X_{ij}(t); we assume modular (mod n) arithmetic for indices. Structures of size less than a critical nucleation size a are assumed to quickly dissolve back into monomers. Structures of size exactly a are created through nucleation at a rate k_{n}. Additional monomer are added to the structure at the two ends with rate k_{g}. For simplicity we assume that all the correct partners bind with the same rate k_{g}. The master equation for X_{i, i+a−1}(t) is:
Structures of size larger than critical size a and smaller than n grow by picking up monomers on either side, that is, for n−1 > j−i > a−1,
Structures of size n are assumed to close up and form full rings, which are stable and inert.We numerically solved these equations in Mathematica with fixed values for different initial concentrations C_{i}(t = 0) and found the value of X_{ij}(t) at large times when no further changes occur. We varied n between 25 and 50 with fixed a =10. Supplementary Note 3 contains numerical details.
Pshaped structure. We carried out discretetime stochastic simulations for nucleation and growth of the Pshaped structure, which is made of n =208 distinct square tiles; each tile has four distinct binding sites that bind only to their correct partners in Fig. 7(d). We start the simulation with an integer c_{i}(t =0) copies of species i. At each discrete time step, we randomly choose to either nucleate a new seed of size a =18 or grow existing seeds. The probability of nucleating a seed S of critical size a =18 is where the product is over the a =18 species jεS found in the seed S and λ_{N} is a normalization constant. The probability of a preexisting seed growing by picking up the correct tile of species i at a boundary site is . We reduce the numbers c_{i}(t + 1) = c_{i}(t) − 1 for species i, which participate in nucleation or growth of seeds and run the simulation until no further nucleation or growth takes place (that is, tiles are fully depleted).
The ratio of the constants λ_{N}, λ_{G} in the above probabilities of nucleation and growth is a free parameter that is not set by normalization and determines the relative speed of nucleation and growth processes. We report results for rapid nucleation (that is, a high ratio of λ_{N} to λ_{G}), a limit in which depletion traps are severe. See Supplementary Note 3 for more details of the above Gillespie algorithm, including normalization and numeric details.
Kinetic aggregation of colloids
In our discretetime stochastic simulations, we begin with a mixture of c_{A} particles of type A and c_{B} particles of type B such that c_{A},c_{B} are positive integers with c_{A} + c_{B} = 1,000. Any structure composed of i_{A} A and i_{B} B particles is said to be of type i = (i_{A}, i_{B}). At each time step, we pick two random structures (including free particles), uniformly and without replacement, out of the mixture; let us say they are of type i = (i_{A}, i_{B}) and type j = (j_{A}, j_{B}). Then, we glue them together—producing a new structure of type k = (i_{A} + j_{A},i_{B} + j_{B})—with a probability given by a kernel K(i, j)ε[0,1] that depends on the mass and composition of the two structures. Hence, reactions (that is., gluing) between structures of type i and j happen at a rate c_{i}c_{j}K(i, j), where c_{i}, c_{j} are the numbers of the type i and j structures. The two original structures of type i and j are removed from the solution. Kernel K(i, j) implements the rule that A and B only bind to each other and not to themselves. Note that as an approximation, we track structures only by their overall composition (that is, numbers of A and B) and do not track the precise arrangement of the A, B particles within the structure. This approximation and corresponding details of the kernel K(i, j) are described in Supplementary Note 3. We evolve the system in discrete time steps until no further gluing occurs, giving a final mix of structures that cannot react any further.
Additional information
How to cite this article: Murugan, A. et al. Undesired usage and the robust selfassembly of heterogeneous structures. Nat. Commun. 6:6203 doi: 10.1038/ncomms7203 (2015).
References
Levy, E. D., PereiraLeal, J. B., Chothia, C. & Teichmann, S. A. 3D complex: a structural classification of protein complexes. PLoS Comput. Biol. 2, e155 (2006).
Mizushima, S. & Nomura, M. Assembly mapping of 30S ribosomal proteins from E. coli. Nature 226, 1214–1218 (1970).
Talkington, M. W. T., Siuzdak, G. & Williamson, J. R. An assembly landscape for the 30S ribosomal subunit. Nature 438, 628–632 (2005).
Mulder, A. M. et al. Visualizing ribosome biogenesis: parallel assembly pathways for the 30S subunit. Science 330, 673–677 (2010).
Trylska, J., McCammon, J. A. & Hamacher, K. a. y. Dependency map of proteins in the small ribosomal subunit. PLoS Comput. Biol. 2, e10 (2006).
Mirkin, C. A., Letsinger, R. L., Mucic, R. C. & Storhoff, J. J. A DNAbased method for rationally assembling nanoparticles into macroscopic materials. Nature 382, 607–609 (1996).
Fu, T. J. & Seeman, N. C. DNA doublecrossover molecules. Biochemistry 32, 3211–3220 (1993).
Seeman, N. C. Nucleic acid junctions and lattices. J. Theor. Biol. 99, 237–247 (1982).
Wei, B., Dai, M. & Yin, P. Complex shapes selfassembled from singlestranded DNA tiles. Nature 485, 623–626 (2012).
Ke, Y., Ong, L. L., Shih, W. M. & Yin, P. Threedimensional structures selfassembled from DNA bricks. Science 338, 1177–1183 (2012).
Winfree, E., Liu, F., Wenzler, L. A. & Seeman, N. C. Design and selfassembly of twodimensional DNA crystals. Nature 394, 539–544 (1998).
Winfree, E. Algorithmic SelfAssembly of DNA California Institute of Technology (1998).
Adleman, L. M. Towards a Mathematical Theory of SelfAssembly. Technical Report No. 00722 (University of Southern California, USA, 2000) .
King, N. P. et al. Computational design of selfassembling protein nanomaterials with atomic level accuracy. Science 336, 1171–1174 (2012).
Lai, Y.T., King, N. P. & Yeates, T. O. Principles for designing ordered protein assemblies. Trends Cell Biol. 22, 653–661 (2012).
Biancaniello, P. L., Kim, A. J. & Crocker, J. C. Colloidal interactions and selfassembly using DNA hybridization. Phys. Rev. Lett. 94, 058302 (2005).
Segel, L. A. & Perelson, A. S. Shape space: an approach to the evaluation of crossreactivity effects, stability and controllability in the immune system. Immunol. Lett. 22, 91 (1989).
Johnson, M. E. & Hummer, G. Nonspecific binding limits the number of proteins in a cell and shapes their interaction networks. Proc. Natl Acad. Sci. USA 108, 603–608 (2011).
Laub, M. T. & Goulian, M. Specificity in twocomponent signal transduction pathways. Annu. Rev. Genet. 41, 121–145 (2007).
Zhang, J., Maslov, S. & Shakhnovich, E. I. Constraints imposed by nonfunctional proteinprotein interactions on gene expression and proteome size. Mol. Syst. Biol. 4, 210 (2008).
Halverson, J. D. & Tkachenko, A. V. DNAprogrammed mesoscopic architecture. Phys. Rev. E 87, 062310 (2013).
Tkachenko, A. V. Theory of programmable hierarchic selfassembly. Phys. Rev. Lett. 106, 255501 (2011).
Winfree, E. & Bekbolatov, R. Proofreading tile sets: error correction for algorithmic selfassembly. DNA Comput. 2943, 126–144 (2004).
Fujibayashi, K., Zhang, D. Y., Winfree, E. & Murata, S. Error suppression mechanisms for DNA tile selfassembly and their simulation. Nat. Comput. 8, 589–612 (2009).
Hedges, L. O., Mannige, R. V. & Whitelam, S. Growth of equilibrium structures built from a large number of distinct component types. Soft Matter 10, 6404–6416 (2014).
Hormoz, S. & Brenner, M. P. Design principles for selfassembly with shortrange interactions. Proc. Natl Acad. Sci. USA 108, 5193–5198 (2011).
Deeds, E. J., Bachman, J. A. & Fontana, W. Optimizing ring assembly reveals the strength of weak interactions. Proc. Natl Acad. Sci. USA 109, 2348–2353 (2012).
Chen, H.L. & Kao, M.Y. in Proceedings of the 16th International Conference on DNA Computing and Molecular Programming 13–24 (Heidelberg (2010).
Birchler, J. A. & Veitia, R. A. Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines. Proc. Natl Acad. Sci. USA 109, 14746–14753 (2012).
Adleman, L., Cheng, Q., Goel, A., Huang, M.D. & Wasserman, H. a. l. in Sixth International Conference on Difference Equations and Applications Taylor and Francis (2001).
Heo, M., Maslov, S. & Shakhnovich, E. Topology of protein interaction network shapes protein abundances and strengths of their functional and nonspecific interactions. Proc. Natl Acad. Sci. USA 108, 4258–4263 (2011).
Milenkovic, O. & Kashyap, N. On the design of codes for DNA computing. Coding Cryptogr. 3969, 100–119 (2006).
Sacanna, S., Irvine, W. T. M., Chaikin, P. M. & Pine., D. J. Lock and key colloids. Nature 464, 575–578 (2010).
Reinhardt, A. & Frenkel, D. Numerical evidence for nucleated selfassembly of DNA brick structures. Phys. Rev. Lett. 112, 238103 (2014).
Gothelf, K. V. LEGOlike DNA structures. Science 338, 1159–1160 (2012).
Rothemund, P. W. K. & Andersen, E. S. Nanotechnology: the importance of being modular. Nature 485, 584–585 (2012).
Bray, D. & Lay, S. Computerbased analysis of the binding steps in protein complex formation. Proc. Natl Acad. Sci. USA 94, 13493–13498 (1997).
Schade, N. B. et al. Tetrahedral colloidal clusters from random parking of bidisperse spheres. Phys. Rev. Lett. 110, 148303 (2013).
Flory, P. J. Molecular size distribution in linear condensation polymers. J. Am. Chem. Soc. 58, 1877–1885 (1936).
Rothemund, P. W. K. Folding DNA to create nanoscale shapes and patterns. Nature 440, 297–302 (2006).
Haxton, T. K. & Whitelam, S. Do hierarchical structures assemble best via hierarchical pathways? Soft Matter 9, 6851–6861 (2013).
Whitelam, S. & Jack., R. L. The Statistical Mechanics of Dynamic Pathways to Selfassembly. arXiv (2014).
Whitelam, S., Schulman, R. & Hedges, L. Selfassembly of multicomponent structures in and out of equilibrium. Phys. Rev. Lett. 109, 265506 (2012).
Leibler, L., Orland, H. & Wheeler, J. C. Theory of critical micelle concentration for solutions of block copolymers. J. Chem. Phys. 79, 3550 (1983).
Adleman, L. e. n. et al. in Proceedings of the Thiryfourth Annual ACM Symposium on Theory of Computing 23–32 (New York, NY, USA (2002).
Papp, B., Pál, C. & Hurst, L. D. Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197 (2003).
Veitita, R. A., Bottani, S. & Birchler, J. A. Cellular reactions to gene dosage imbalance: genomic, transcriptomic and proteomic effects. Trends Genet. 24, 390–397 (2008).
Veitita, R. A. Nonlinear effects in macromolecular assembly and dosage sensitivity. J. Theor. Biol. 220, 19–25 (2003).
Warner, J. R. The economics of ribosome biosynthesis in yeast. Trends Biochem. Sci. 24, 437–440 (1999).
Warner, J. R. & McIntosh, K. B. How common are extraribosomal functions of ribosomal proteins? Mol. Cell. 34, 3–11 (2009).
Dean, D. & Nomura, M. Feedback regulation of ribosomal protein gene expression in Escherichia coli. Proc. Natl Acad. Sci. USA 77, 3590–3594 (1980).
Nomura, M., Yates, J. L., Dean, D. & Post, L. E. Feedback regulation of ribosomal protein gene expression in Escherichia coli: structural homology of ribosomal RNA and ribosomal protein mRNA. Proc. Natl Acad. Sci. USA 77, 7084–7088 (1980).
Callahan, B., Thattai, M. & Shraiman, B. I. Emergent gene order in a model of modular polyketide synthases. Proc. Natl Acad. Sci. USA 106, 19410–19415 (2009).
Marsh, J. A. et al. Protein complexes are under evolutionary selection to assemble via ordered pathways. Cell 153, 461–470 (2013).
de Gennes, P. G. Statistics of branching and hairpin helices for the dAT copolymer. Biopolymers 6, 715–729 (1968).
Dumas, J. J. Crystal structure of the GpIb thrombin complex essential for platelet aggregation. Science 301, 222–226 (2003).
Acknowledgements
We thank Miranda HolmesCerfon, Miriam Huntley, Sarah Kostinksi, Nicholas Schade, Eugene Shakhnovich, William Shih, Tom Witten, Zorana Zervavcic and members of the Brenner group for their helpful discussions. This research was funded by the National Science Foundation through the Harvard Materials Research Science and Engineering Center (DMR0820484, DMR1435964), the Division of Mathematical Sciences (DMS1411694) and by grant RFP1204 from the Foundational Questions in Evolutionary Biology Fund. M.P.B. is an investigator of the Simons Foundation.
Author information
Authors and Affiliations
Contributions
A.M., J.Z. and M.P.B. designed the study; A.M. carried out the calculations and analysis; and A.M. and M.P.B. wrote the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
Supplementary Figures 16, Supplementary Notes 14 and Supplementary References. (PDF 3907 kb)
Rights and permissions
About this article
Cite this article
Murugan, A., Zou, J. & Brenner, M. Undesired usage and the robust selfassembly of heterogeneous structures. Nat Commun 6, 6203 (2015). https://doi.org/10.1038/ncomms7203
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/ncomms7203
This article is cited by

Integrating protein copy numbers with interaction networks to quantify stoichiometry in clathrinmediated endocytosis
Scientific Reports (2022)

Programming multiprotein assembly by genebrush patterns and twodimensional compartment geometry
Nature Nanotechnology (2020)

Shaping the topology of folding pathways in mechanical systems
Nature Communications (2018)

The importance of thermodynamics for molecular systems, and the importance of molecular systems for thermodynamics
Natural Computing (2018)

Associative Pattern Recognition Through Macromolecular SelfAssembly
Journal of Statistical Physics (2017)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.