Abstract
From bacteria following simple chemical gradients^{1} to the brain distinguishing complex odour information^{2}, the ability to recognize molecular patterns is essential for biological organisms. This type of information-processing function has been implemented using DNA-based neural networks^{3}, but has been limited to the recognition of a set of no more than four patterns, each composed of four distinct DNA molecules. Winner-take-all computation^{4} has been suggested^{5,6} as a potential strategy for enhancing the capability of DNA-based neural networks. Compared to the linear-threshold circuits^{7} and Hopfield networks^{8} used previously^{3}, winner-take-all circuits are computationally more powerful^{4}, allow simpler molecular implementation and are not constrained by the number of patterns and their complexity, so both a large number of simple patterns and a small number of complex patterns can be recognized. Here we report a systematic implementation of winner-take-all neural networks based on DNA-strand-displacement^{9,10} reactions. We use a previously developed seesaw DNA gate motif^{3,11,12}, extended to include a simple and robust component that facilitates the cooperative hybridization^{13} involved in the process of selecting a ‘winner’. We show that with this extended seesaw motif, DNA-based neural networks can classify patterns into up to nine categories. Each of these patterns consists of 20 distinct DNA molecules chosen from the set of 100 that represents the 100 bits in 10 × 10 patterns, with the 20 DNA molecules selected tracing one of the handwritten digits ‘1’ to ‘9’. The network successfully classified test patterns with up to 30 of the 100 bits flipped relative to the digit patterns ‘remembered’ during training, suggesting that molecular circuits can robustly accomplish the sophisticated task of classifying highly complex and noisy information on the basis of similarity to a memory.
Main
Winner-take-all computation^{4} is one of the simplest competitive neural-network models, inspired by the lateral inhibition and competition observed among biological neurons in the brain^{14}. In this model, the output of a neuron is ON if and only if the weighted sum of all binary inputs is the largest among all neurons (Fig. 1a). Here, in a winner-take-all neural network, the weight matrix associated with each output is referred to as a ‘memory’. As shown in Fig. 1b, a simple training algorithm involves using the target patterns as weights. The example network has two memories—in other words, it ‘remembers’ two patterns—‘L’ and ‘T’. The network ‘recognizes’ a pattern by comparing it to all memories and identifying the memory to which the pattern is most similar—the output associated with this memory will be ON and all other outputs will be OFF. For instance, a corrupted ‘L’ with the last bit flipped from 1 to 0 can be recognized as ‘L’, because it will result in y_{1} (the output of the neuron remembering ‘L’) being ON and y_{2} (the output of the neuron remembering ‘T’) being OFF.
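The winner-take-all rule itself is a few lines of arithmetic. The following minimal sketch (in Python; the 3 × 3 ‘L’ and ‘T’ bitmaps are our own transcription of Fig. 1b) classifies the corrupted ‘L’ described above.

```python
import numpy as np

# Memories transcribed from Fig. 1b: each target pattern is used directly as a weight row.
L = np.array([1, 0, 0,
              1, 0, 0,
              1, 1, 1])
T = np.array([1, 1, 1,
              0, 1, 0,
              0, 1, 0])
W = np.stack([L, T])

def winner_take_all(x, W):
    """Return one-hot outputs: y_j = 1 iff the j-th weighted sum is strictly the largest."""
    s = W @ x                                # weighted sums s_j = w_j . x
    y = np.zeros(len(s), dtype=int)
    if np.count_nonzero(s == s.max()) == 1:  # a unique winner must exist
        y[np.argmax(s)] = 1
    return y

corrupted_L = L.copy()
corrupted_L[-1] = 0                      # flip the last bit from 1 to 0
print(winner_take_all(corrupted_L, W))   # expected: [1 0], i.e. recognized as 'L'
```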
The winner-take-all function can be broken into five subfunctions, each of which can be implemented with a simple chemical reaction (Fig. 1c). First, weight multiplication of x_{i} × w_{ij} (where x_{i} is a binary input and w_{ij} is an analogue weight) is implemented with reactions wherein an input species X_{i} catalytically converts a weight species W_{ij} to an intermediate product P_{ij}. If X_{i} is absent, then no P_{ij} will be produced; if X_{i} is present, then the final concentration of P_{ij} will be determined by the initial concentration of W_{ij}, thus setting the value of the weighted input. Second, summation is implemented with reactions that convert all intermediate species P_{ij} within the same neuron to a common weighted-sum species S_{j}. Third, comparison of weighted sums to determine which is the largest is implemented with a set of ‘pairwise annihilation’ reactions, wherein each weighted-sum species S_{j} destroys any other weighted-sum species S_{k} until only a single winner remains. Fourth, signal-restoration reactions bring the concentration of the winner species back to a predetermined output value—the final concentration of a winning output species Y_{j} corresponds to the initial concentration of a restoration-gate species RG_{j}. Last, reporting reactions are used to convert each output Y_{j} to a fluorescent signal Fluor_{j}.
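If every reaction runs to completion in the intended order, the five subfunctions reduce to simple arithmetic on concentrations. The sketch below is a two-memory idealization (species names follow the text; the run-to-completion assumption and the two-memory restriction are ours) that traces an input pattern through all five steps.

```python
import numpy as np

def wta_completion(x, W, rg=1.0):
    """Idealized end-state of the five subfunctions for a two-memory network.

    x  : binary input vector (1 = input strand present)
    W  : 2 x n array of weight concentrations
    rg : restoration-gate concentration, which sets the ON output level
    """
    P = W * x                    # weight multiplication: X_i converts all of W_ij to P_ij
    s = P.sum(axis=1)            # summation: all P_ij become the common species S_j
    leftover = abs(s[0] - s[1])  # pairwise annihilation consumes S_1 and S_2 one-for-one
    y = np.zeros(2)
    if leftover > 0:             # any surviving winner acts catalytically, so signal
        y[np.argmax(s)] = rg     # restoration converts all of RG_j to output Y_j
    return y                     # reporting maps each Y_j to fluorescence one-to-one
```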
All reactions except pairwise annihilation and signal restoration naturally take place sequentially, because the product of a previous reaction is a reactant of the next one. Because there are common reactants in the annihilation and restoration reactions, we used different rates to control their order: the former has a much faster rate constant than the latter, so a winner that survives all fast competitions is then converted slowly to an output signal.
Weight multiplication and signal restoration are both catalytic reactions, implemented with a pair of seesawing reactions^{11} (Fig. 1e, Extended Data Fig. 1). An input X_{i} (or weighted-sum S_{j}) species first interacts with a weight W_{ij} (or restoration-gate RG_{j}) species through a reversible strand-displacement reaction^{15} to release an intermediate product P_{ij} (or output Y_{j}) species. A fuel strand XF_{i} (or YF_{j}) then frees the input (or weighted-sum) species for more catalytic cycles. As long as the fuel strand is in excess, all weight (or restoration-gate) molecules will eventually be converted to intermediate (or output) molecules. Summation is implemented with a single seesawing reaction facilitated by a summation gate SG_{j} (Extended Data Fig. 1). The reaction is reversible by itself but is drained forward by the downstream irreversible pairwise-annihilation reaction.
The annihilation reaction is implemented with cooperative hybridization^{13} (Fig. 1f). One weighted-sum strand S_{j} can bind to a toehold on one side of an annihilator molecule Anh_{jk} and branch-migrate to the middle point of the double-stranded domain. If only S_{j} is present, then this process is completely reversible and no molecules will be consumed. However, if another weighted-sum strand S_{k} is also present, then it can bind to another toehold on the opposite side of the annihilator and also branch-migrate to the middle point of the double-stranded domain. When the S_{j} and S_{k} strands reach the middle point simultaneously, the annihilator is split apart into two waste molecules. Because neither waste molecule has an exposed toehold, neither can interact with any other molecules. The annihilation reaction shown in Fig. 1f is designed to be roughly 100 times faster than the signal-restoration reaction shown in Fig. 1e, owing to the two extra nucleotides in both toeholds on the annihilator—the rate of a strand-displacement reaction is known to grow exponentially with toehold length^{15,16}.
Reporting is implemented with an irreversible strand-displacement reaction, wherein an output strand Y_{j} interacts with a double-stranded reporter molecule Rep_{j} (Extended Data Fig. 1) to separate the fluorophore- and quencher-labelled strands in the reporter, resulting in increased fluorescence. Overall, the implementation of an arbitrary winner-take-all neural network can be mapped systematically to a seesaw DNA circuit (Extended Data Fig. 2).
We started the experimental demonstration with a two-species winner-take-all function (Fig. 2a), which is similar to approximate majority^{17} and consensus network^{18} functions. If the initial concentration of one weighted-sum species (S_{1} or S_{2}) is higher than that of the other, then we expect the corresponding output strand (Y_{1} or Y_{2}) to be released catalytically and the fluorescent signal to reach an ideal ON state, while the other output signal remains at an ideal OFF state. The data agree with the expected overall circuit behaviour, and lead to two main observations. First, the circuit computed an ON state faster with a larger difference between the two species, as shown in the plots farther away from the diagonal line in Fig. 2a. This is because the signal-restoration reaction reaches completion faster with a larger amount of catalyst, which is the leftover amount of the winner after the annihilation reaction. Second, among experiments for which the differences between the two species are the same, the circuit maintained a cleaner OFF state with lower initial concentrations of the two species, as shown in the plots that are equidistant to the diagonal line but closer to the bottom left corner of the grid. This is because a small fraction of the weighted-sum strands will interact with a restoration-gate molecule before encountering an annihilator molecule—the stronger the runner-up is (that is, the higher its concentration), the more of it can escape being completely annihilated. These observations suggest that the DNA circuit does not yield a perfect winner-take-all behaviour, but that it does compute correctly for competitors that are not too similar to each other and are not both too strong.
Next, we added a weighted-sum layer to the winner-take-all circuit to demonstrate recognition of 4-bit patterns (Fig. 2b). Using the two target patterns as weights, the perfect input patterns each triggered the desired output trajectory to turn ON, indicating that the inputs were recognized correctly. When one or two bits of the input patterns were flipped, either from a 1 to a 0 or vice versa, the circuit still yielded the desired output for all six examples that are classifiable. The other eight possible inputs are not classifiable because they result in equal weighted sums (s_{1} = s_{2}). Interestingly, the circuit behaviour was better for the inputs with 2-bit corruptions than for the perfect inputs: the ON trajectories reached completion just as fast and the OFF trajectories remained lower. This result can be understood by looking at the input patterns in the weighted-sum space (Fig. 2b, bottom left): all four inputs are equidistant to the diagonal line and the corrupted patterns are closer to the bottom left corner of the space. Because catalytic reactions are used to implement weight multiplication, together with thresholding reactions, the circuit can also handle input concentrations that deviate from the ideal high or low values (Extended Data Fig. 3).
To understand the theoretical limits of the scalability and power of winner-take-all DNA neural networks, in the context of simply using the target patterns as weights, we now address the following three questions. The first is the number of distinct target patterns that can be remembered simultaneously. Any set of patterns in which each pattern contains the same number of 1s can be remembered (Methods, Theorem 1). For example, the largest set of 9-bit patterns that can be remembered, each consisting of five 1s, contains ^{9}C_{5} = 126 patterns. Moreover, any set of patterns can be remembered if it does not contain a pattern in which all 1s are a subset of the 1s in another pattern (Methods, Theorem 2). The second question concerns which corrupted patterns can be recognized. All patterns with fewer than b − o corrupted bits can be recognized, where b is the total number of 1s and o is the maximum number of overlapped 1s in all target patterns (Methods, Theorem 3). For example, all patterns with fewer than three corrupted bits can be recognized for the 9-bit target patterns ‘L’ and ‘T’ shown in Fig. 1b, because b = 5 and o = 2. Moreover, some patterns with more than b − o corrupted bits can still be recognized; for example, among all possible 9-bit patterns, there are 128, 102 and 30 patterns with three, four and five corrupted bits, respectively, that can be recognized as ‘L’ or ‘T’. We chose 28 example 9-bit patterns with an increasing number of corrupted bits from one to five, and demonstrated that the DNA neural network correctly classified all examples (Extended Data Fig. 4). The final question asks how the size of the DNA circuit scales with an increasing number of more complex patterns. In general, constructing a network that can remember m distinct n-bit patterns requires n input strands, n × m weight molecules and n fuel strands for weight multiplication, m summation gates, ^{m}C_{2} annihilators, m gates and m fuel strands for signal restoration, and m reporters, totalling n × m + 2n + 4m + ^{m}C_{2} molecules. However, for a specific set of target patterns, only a subset of the weight molecules is required, each corresponding to a 1 in the patterns.
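The molecule count stated above is straightforward bookkeeping; a small helper that tabulates it (our own restatement of the formula in the text) is shown below.

```python
from math import comb

def circuit_size(n, m):
    """Distinct molecules needed to remember m distinct n-bit patterns."""
    counts = {
        "input strands":     n,
        "weight molecules":  n * m,
        "weight fuels":      n,
        "summation gates":   m,
        "annihilators":      comb(m, 2),
        "restoration gates": m,
        "restoration fuels": m,
        "reporters":         m,
    }
    assert sum(counts.values()) == n * m + 2 * n + 4 * m + comb(m, 2)
    return counts

print(sum(circuit_size(9, 2).values()))  # upper bound for the 'L'/'T' network; fewer are
                                         # needed in practice because only weight molecules
                                         # matching a 1 in some pattern are included
```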
To demonstrate the scalability and power of winner-take-all DNA neural networks experimentally, we chose a task that is visually interesting: recognizing handwritten digits. Some aspects of this task are computationally non-trivial, such as distinguishing a sloppy ‘4’ from a sloppy ‘9’. The patterns of digits were taken from the Modified National Institute of Standards and Technology (MNIST) database^{19}, which is commonly used to test machine-learning algorithms^{20}. We converted the original patterns to binary patterns with 20 1s on a 10 × 10 grid, averaged 100 example ‘6’ and ‘7’ patterns, and selected and normalized the top 20 pixels as weights (Fig. 3a, Methods section ‘Neural network training and testing’). The value of each analogue weight was then implemented with the concentration of a weight molecule. The test inputs remained binary patterns, in which each 1 or 0 corresponded to the presence or absence of an input strand, respectively (Fig. 3b). The theoretical limits of winner-take-all neural networks with analogue weights are similar to those with binary weights (Methods, Theorems 4 and 5). In total, 104 distinct molecules were used for testing any specific input pattern, out of 184 distinct molecules for all possible inputs (Fig. 3c).
In the MNIST database, there are more than 14,000 example handwritten ‘6’ and ‘7’ digits. On the basis of the understanding that we have established from the experimental characterization of smaller winner-take-all circuits, we looked at all example patterns in the weighted-sum space (Fig. 3d, Extended Data Fig. 5a): 2% of the patterns are on the wrong side of the diagonal line, which means that it is impossible for the DNA circuit to recognize them correctly; 8% of the patterns are fairly close to the diagonal line (within a 15% margin), which we expect to be experimentally difficult; however, the remaining 90% of the patterns are far enough from the diagonal line that we expect correct recognition. We therefore chose 36 representative example patterns from the last category, ensuring both uniform distribution in the weighted-sum space and the full range of bit deviation from the memories (Methods section ‘Neural network training and testing’). As shown in the experimental data (Fig. 3e, Extended Data Fig. 5d), the perfect patterns (the weights converted to binary) each yield the desired circuit output. More importantly, patterns that increasingly deviate from the memories were also recognized, with up to 30 flipped bits. Similar to observations in the smaller DNA neural networks, some of the patterns that are visually more challenging to recognize are not necessarily more difficult for the DNA circuit—a desirable property of the winner-take-all computation.
We have shown that winner-take-all DNA neural networks scale well to more complex patterns. Next, we explore whether they could also be used to remember an increasing number of distinct patterns simultaneously. The pairwise-annihilation approach alone is not well suited for scaling up the number of patterns, because the number of annihilators grows quadratically with the number of patterns. We show that the three-species winner-take-all function was still robust enough (Extended Data Fig. 6a) to allow the construction of a DNA neural network that remembers three 100-bit patterns. However, the competition became harder with more competitors: the reaction rates for multiple annihilation pathways could be matched approximately but not perfectly (Methods section ‘Sequence design’, Extended Data Fig. 6b, c), and it took much longer for the annihilation reactions to yield a winner and for the signal level of the winner to be fully restored (Extended Data Fig. 7). Using the same method, it would be difficult to construct networks that remember more patterns. We therefore propose an alternative approach that first divides the target patterns into groups and then uses multiple distinct group identities to classify the patterns (Fig. 4a). The nine digits ‘1’–‘9’ can be divided into three groups in two ways (shown as three rows and three columns in Fig. 4b), such that a pair of outputs corresponds uniquely to each digit (Fig. 4d). For example, a ‘4’ is recognized if and only if y_{1} = 1 and z_{1} = 1 (where y_{1} is the output identifying the first row and z_{1} is the output identifying the first column). With this grouping approach, nine distinct patterns can be recognized using only \({}^{\sqrt{9}}{C}_{2}\times 2=6\) annihilators, which would otherwise require ^{9}C_{2} = 36 annihilators. In total, 225 distinct molecules were used for testing any specific input pattern, out of 305 distinct molecules for all possible inputs (Fig. 4c).
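The saving from grouping, and the decoding of a digit from its pair of winning outputs, can be sketched as follows (the row/column assignment shown is illustrative only; the actual grouping is the one in Fig. 4b, where, for example, y_{1} = z_{1} = 1 identifies a ‘4’).

```python
from math import comb, isqrt

def annihilators_needed(m):
    """Direct pairwise competition versus two sqrt(m)-way grouped competitions."""
    g = isqrt(m)                        # assumes m is a perfect square, as for m = 9
    return comb(m, 2), comb(g, 2) * 2   # (36, 6) for m = 9

# Illustrative row/column grouping only; each digit is identified by the unique
# pair (winning row output y, winning column output z).
groups = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

def decode(y_row, z_col):
    return groups[y_row][z_col]

print(annihilators_needed(9))           # expected: (36, 6)
print(decode(1, 1))                     # the digit in the second row and second column
```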
We determined the weights for each group using a simple ‘average then subtract’ method (Fig. 4b): take the average of 100 examples per in-group digit, subtract the average of 100 examples per out-of-group digit, then select and normalize the top 20 pixels (Methods section ‘Neural network training and testing’). The trade-off of the grouping approach is that fewer example patterns can be recognized. With the best grouping, 47% of the patterns can potentially be recognized, of which 48% are experimentally feasible (with a 15% margin to the diagonal line in the normalized weighted-sum space). In general, with the same circuit complexity, this alternative approach enables a larger set of distinct target patterns to be classified, but with less accuracy. Nonetheless, as shown in the experimental data, the circuit yields the desired pair of outputs for 99 representative example patterns (Fig. 4d, e).
To facilitate the design of winner-take-all DNA neural networks, we developed an online software tool. The WTA Compiler^{21} (Extended Data Fig. 8) converts a user-defined set of memories and test patterns into program code that describes a DNA neural network, which can then be used to simulate the kinetics of the network. It also provides the sequences of the DNA strands that are required to construct the DNA neural network experimentally.
It is interesting to compare the performance of winner-take-all neural networks with that of logic circuits. For example, it is possible to distinguish whether a 9-bit pattern is more similar to ‘L’ or ‘T’ using a circuit consisting of 8 logic gates, for all input patterns that we have tested experimentally. However, a more complex circuit consisting of 21 logic gates is required to correctly compute the output for all classifiable patterns (Extended Data Fig. 9a). Similarly, the 100-bit handwritten digits can be recognized by circuits with up to 23 logic gates, if only the example patterns that we have tested experimentally are considered. But these logic circuits perform poorly when tested against the entire MNIST database (Extended Data Fig. 9b). To match the theoretical limit of winner-take-all neural networks, measured by the percentage of classifiable patterns, much more complex logic circuits are needed. Importantly, varying the concentrations of the weight molecules in the winner-take-all neural networks would enable the same set of DNA molecules to be used for different pattern-classification tasks. By contrast, without reconfigurable circuit architectures, a different set of DNA molecules would be required for a logic circuit that performs a different task.
The power of winner-take-all DNA neural networks could be explored further in several directions. Instead of the pairwise-annihilation approach, a winner could be selected by utilizing competing resources^{5,6}, which could potentially lead to more scalable and accurate pattern recognition. It could also provide the possibility of selecting several winners instead of just one, which in theory is computationally more powerful^{4}. Extending the circuit construction from single-layer to multilayer winner-take-all computation, or simply allowing the outputs of winner-take-all circuits to be connected to downstream logic circuits, could enable more sophisticated pattern recognition (such as recognition of translated and rotated patterns)^{22}. Using a variable-gain amplifier^{23,24}, winner-take-all DNA circuits could be adapted to process analogue inputs, which would enable a wider range of signal-classification tasks, including applications in detecting complex disease profiles that consist of mRNA and microRNA signals. With aptamers^{25,26}, more diverse biomolecules could be detected.
The fact that we were able to use target patterns as weights in winner-take-all DNA neural networks opens up immediate possibilities for embedding learning within autonomous molecular systems. With one additional circuit component that activates weight molecules during a supervised training process, the DNA circuits would be capable of activating a specific set of wires in the weight-multiplication layer when exposed to a specific set of patterns. As widely discussed in experimental^{27} and theoretical^{28,29,30} studies, learning—the most desirable property of biochemical circuits—would allow artificial molecular machines to adapt their functions on the basis of environmental signals during autonomous operations.
Methods
Sequence design
All DNA strands used in the winner-take-all neural networks were composed of long branch-migration domains and short toehold domains. Owing to the modularity of the previously developed seesaw DNA motif^{3,11} and of the new circuit component that extends it—the annihilator—the sequence design was performed at the domain level. A pool of domain sequences was generated according to a set of design heuristics that have previously been experimentally validated^{12}. All domains used a three-letter code (A, T and C) to reduce secondary structure and undesired strand interactions. No domain sequence includes runs of more than four consecutive As or Ts or more than three consecutive Cs, which reduces synthesis errors. All domain sequences had between 30% and 70% C content, so that all double-stranded complexes would have similar melting temperatures. Finally, no pair of domain sequences shares a matching subsequence longer than 35% of the domain length, and all pairs differ in at least 30% of their nucleotides. This ensures that a strand with a mismatched branch-migration domain will not complete strand displacement initiated from either the 3′ or the 5′ end. In addition to a 15-nucleotide sequence pool used in previous work^{3,11,12}, a 20-nucleotide sequence pool was generated and used in the weight-multiplication layers because of the large number of molecules used here. The two sequence pools were checked to ensure that the same pairwise criteria were met. All domains included the clamp design introduced previously^{11}, to reduce leak reactions between initial gate species.
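These heuristics translate directly into a filter over candidate domains. A minimal sketch follows (thresholds are taken from the text; the rejection-sampling generator and the substring test are our own simplifications, and the published pools were generated with additional criteria not reproduced here).

```python
import random

def ok_domain(seq):
    """One candidate domain against the single-sequence heuristics."""
    if set(seq) - set("ATC"):                         # three-letter code only
        return False
    if "AAAAA" in seq or "TTTTT" in seq or "CCCC" in seq:
        return False                                  # no >4 A/T runs, no >3 C runs
    return 0.30 <= seq.count("C") / len(seq) <= 0.70  # comparable melting temperatures

def distinct_enough(a, b):
    """Pairwise heuristics: longest shared substring and mismatch fraction."""
    n = len(a)
    longest = max((j - i for i in range(n) for j in range(i, n + 1) if a[i:j] in b),
                  default=0)
    mismatches = sum(x != y for x, y in zip(a, b))
    return longest <= 0.35 * n and mismatches >= 0.30 * n

random.seed(0)
pool = []
while len(pool) < 10:                                 # toy pool of 15-nucleotide domains
    cand = "".join(random.choice("ATC") for _ in range(15))
    if ok_domain(cand) and all(distinct_enough(cand, d) for d in pool):
        pool.append(cand)
print(pool[:3])
```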
All molecular complexes shared a 5-nucleotide universal toehold domain^{3,11,12}. The annihilator complexes had 7-nucleotide toeholds, composed of the 5-nucleotide universal toehold and a 2-nucleotide extension that matched the 2 nucleotides adjacent to the toehold on the upstream seesaw gate. This increased the binding energy, and thus the effective strand-displacement reaction rate, between the annihilator complexes and the weighted-sum strands, compared to that between the signal-restoration gates and the weighted-sum strands.
To ensure ‘fair competition’ between the weighted-sum species (that is, the same rates for all pairwise-annihilation reactions), all annihilators within a set of winner-take-all computations had identical toehold extensions, and the weighted-sum strands had the same single-nucleotide dangle, to keep the binding energies consistent within a winner-take-all computation. Here, we used up to two sets of three annihilators. The extension and dangle sequences were chosen by estimating the binding energies using NUPACK^{31}, and the sequences for the second set of annihilators were chosen to have energies similar to those of the first set, which had worked well in the three-species winner-take-all experiments (Extended Data Fig. 6a). In addition, the rate of an annihilation reaction can depend on the sequence of the branch-migration domains. We measured the rates of 15 catalytic gates, and selected two groups of three gates with the closest rates (Extended Data Fig. 6b, c). By using these gates for signal restoration, the branch-migration domains in the annihilators were determined simultaneously, because the signal-restoration gates and annihilators share the same branch-migration domains (Extended Data Fig. 1).
All DNA sequences are listed in Supplementary Table 1.
Neural network training and testing
The winner-take-all DNA neural network was tested on patterns derived from the MNIST handwritten-digit database^{19}. The training and testing sets were downloaded and merged into a single database, and all example patterns of the digits ‘1’–‘9’ were retained, totalling 63,097 images. The original MNIST dataset consists of weight-centred greyscale images on a 28 × 28 grid. Here, we used binary patterns on a 10 × 10 grid. First, the images were rescaled to a 12 × 12 grid using Gaussian resampling. Next, the largest 20 bits in each image were set to 1 and the remaining bits were set to 0. Finally, the digits were recentred on a 10 × 10 grid on the basis of their bounding boxes.
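A sketch of this preprocessing pipeline is given below (assumptions: gaussian_filter followed by zoom stands in for ‘Gaussian resampling’, the filter width is our choice, and the recentred bounding box is assumed to fit on the 10 × 10 grid).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def to_binary_pattern(img28, n_ones=20):
    """Convert a 28 x 28 greyscale MNIST image to a binary 10 x 10 pattern."""
    small = zoom(gaussian_filter(img28.astype(float), sigma=1.0), 12 / 28)  # 12 x 12
    flat = small.ravel()
    binary = np.zeros_like(flat)
    binary[np.argsort(flat)[-n_ones:]] = 1        # largest 20 bits set to 1
    binary = binary.reshape(small.shape)
    rows, cols = np.nonzero(binary)               # recentre by bounding box; assumes
    out = np.zeros((10, 10))                      # the box fits on the 10 x 10 grid
    r0 = (10 - (rows.max() - rows.min() + 1)) // 2
    c0 = (10 - (cols.max() - cols.min() + 1)) // 2
    out[rows - rows.min() + r0, cols - cols.min() + c0] = 1
    return out
```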
We made a conscious effort to train the neural networks using a simple algorithm. In the neural networks that remember two or three handwritten digits, for each digit, the weight matrix was the average of the first 100 example patterns in the database, restricted to the 20 most common bits (that is, those with the largest averaged values) and normalized to sum to 1. For the nine-digit network, all digits were divided into three groups in two ways. For each group, the weight matrix was the average of the first 100 examples of the three in-group digits less the average of the first 100 examples of the six out-of-group digits. The 20 most common bits were retained, and all weight matrices were normalized to sum to 1.15, to shift the test patterns into a more ideal area in the weighted-sum space. The fraction of experimentally feasible test patterns (with a 15% margin to the diagonal line in the weighted-sum space for all pairs of species) was calculated for all ways of grouping the nine digits, and the best grouping was chosen. The classification performance of the network using weights determined by non-negative least squares was only slightly better than the performance using weights from the simple ‘average then subtract’ method (54% versus 47%).
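Both training rules reduce to a few lines of array arithmetic, sketched below (array shapes and the non-negativity guard are our assumptions).

```python
import numpy as np

def train_weights(in_group, out_group=None, n_bits=20, total=1.0):
    """'Average' or, when out_group is given, 'average then subtract' training.

    in_group, out_group : (num_examples, 100) arrays of binary patterns
    total               : normalization target (1.0 here; 1.15 for the nine-digit network)
    """
    w = in_group[:100].mean(axis=0)
    if out_group is not None:
        w = w - out_group[:100].mean(axis=0)
    keep = np.argsort(w)[-n_bits:]         # the 20 most common bits
    out = np.zeros_like(w)
    out[keep] = np.clip(w[keep], 0, None)  # guard: concentrations cannot be negative
    return out * (total / out.sum())       # normalize to sum to `total`
```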
Experimentally tested input patterns were chosen to represent the whole weighted-sum space as well as the full range of bit deviation from the memories of the networks. To choose a set of test patterns for a digit, all correctly classified examples of that digit with at least a 15% margin in the weighted-sum space were divided into six corruption classes. The weighted sums for the digits in each class were then clustered using the k-medoids algorithm, and an example test pattern was chosen uniformly at random from each cluster. This ensured that the test patterns represented the whole weighted-sum space and not just the most common digits.
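A sketch of this selection procedure follows (a basic alternating k-medoids; the iteration count, seeding and Euclidean distance are our choices).

```python
import numpy as np

def k_medoids(points, k, iters=50, seed=0):
    """Cluster 2D weighted-sum coordinates; returns a cluster label per point."""
    rng = np.random.default_rng(seed)
    medoids = points[rng.choice(len(points), k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        labels = np.linalg.norm(points[:, None] - medoids[None], axis=2).argmin(axis=1)
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members) == 0:
                continue
            pd = np.linalg.norm(points[members][:, None] - points[members][None], axis=2)
            medoids[j] = points[members[pd.sum(axis=1).argmin()]]  # member minimizing
    return labels                                                  # total in-cluster distance

def pick_tests(weighted_sums, k, seed=1):
    """One uniformly chosen example index per cluster of the weighted-sum space."""
    rng = np.random.default_rng(seed)
    labels = k_medoids(np.asarray(weighted_sums, dtype=float), k)
    return [int(rng.choice(np.flatnonzero(labels == j)))
            for j in range(k) if np.any(labels == j)]
```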
Weights and inputs used in all experiments are listed in Supplementary Table 2. By exporting each sheet of the Excel file to a .csv file and uploading it to the WTA Compiler^{21}, the weights and inputs can be visually displayed, the inputs analysed in their weighted-sum space, the kinetics behaviour of the winner-take-all DNA neural network simulated and DNA sequences generated.
DNA oligonucleotide synthesis
All DNA strands were purchased from Integrated DNA Technologies (IDT). The reporter strands with fluorophores and quenchers were purified (HPLC) and the other strands were unpurified (standard desalting). All strands were shipped lyophilized, then resuspended at 100 μM in Tris-EDTA (TE) buffer, pH 8.0, and stored at 4 °C.
Annealing protocol and buffer condition
Annihilator and gate complexes were prepared for annealing at 45 μM with top and bottom strands in a 1:1 ratio. Reporters were prepared at 20 μM with top quencher strands in 20% excess of bottom strands. The buffer for all experiments and annealed complexes was TE with 12.5 mM Mg^{2+}. Complexes were annealed in a thermal cycler (Eppendorf) by heating to 90 °C for 5 min and then cooling to 20 °C at a rate of 0.1 °C per 6 s.
Purification
Annealed annihilator and gate complexes were purified using 12% polyacrylamide gel electrophoresis (PAGE). Double-stranded complex bands were cut from the gel, chopped into pieces and incubated for 24 h at room temperature in TE buffer with 12.5 mM Mg^{2+} to allow the DNA to diffuse into the buffer. The solution with purified complexes was recovered and concentrations were determined with a NanoDrop (Thermo Fisher). Weight matrices for the DNA neural networks that remember handwritten digits had 20 gate complexes for each neuron. These gates (weight molecules) were annealed individually and then mixed together in the appropriate ratios, on the basis of the values of the weights. This mixture was then purified via PAGE and recovered, and its concentration was determined by NanoDrop using the weighted-average extinction coefficient.
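The concentration of the pooled weight mixture follows from a mixing-fraction average of the per-complex extinction coefficients, as sketched below (the numbers shown are placeholders, not measured values).

```python
def weighted_avg_extinction(weights, extinctions):
    """Effective extinction coefficient of a mixture of weight complexes.

    weights     : relative amounts of each gate in the mixture (the weight values)
    extinctions : per-complex extinction coefficients (M^-1 cm^-1)
    """
    return sum(w * e for w, e in zip(weights, extinctions)) / sum(weights)

# Placeholder numbers, for illustration only:
print(weighted_avg_extinction([0.06, 0.04], [5.0e5, 4.8e5]))
```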
Fluorescence spectroscopy
Fluorescence kinetics data were collected every 2, 3 or 4 min, depending on the overall length of the experiment, using a microplate reader (Synergy H1, BioTek). Excitation (emission) wavelengths were 496 nm (525 nm) for dye ATTO488, 555 nm (582 nm) for dye ATTO550 and 598 nm (629 nm) for dye ATTO590. Experiments were performed in 96-well plates (Corning) with 160 μl of reaction mixture per well for the nine-digit experiments and 200 μl per well for all other experiments. Experiments were performed at a standard concentration of 100 nM for all 4-bit and 100-bit pattern recognition and at a standard concentration of 50 nM for all other experiments. Initial concentrations of all species are listed in Extended Data Fig. 10. Detailed protocols for all experiments are listed in Supplementary Table 3.
In the nine-digit experiments, six distinct output trajectories were read using three distinct fluorophores. Every experiment was run twice, each run having half of the outputs connected to fluorophore-labelled reporters and the other half to non-fluorophore-labelled reporters. Combining the output trajectories from each pair of experiments into a single plot allows all six outputs to be observed simultaneously.
Data normalization
All data were normalized from raw fluorescence levels to standard concentration, defined as the maximum concentration of output strand Y_{j} that can be released from gate RG_{j} and react with the double-stranded reporter molecule Rep_{j}. The fluorescence level that corresponds to the standard concentration (1×) was obtained from the average of the final five measurements of the highest signal produced from gate RG_{j} on a plate. Negligible concentration (0×) corresponds to the background fluorescence of the reaction mixture before any reporter molecules have been triggered, which was obtained from the first measurement of the lowest signal produced from gate RG_{j} on a plate. All experiments on a single plate were normalized together, allowing direct comparison between the outputs of a network for different input patterns. In the two-species winner-take-all experiments shown in Extended Data Fig. 3, the first six columns of data were measured on one plate and the last five columns on another. In the 9-bit pattern-recognition experiments shown in Extended Data Fig. 4, the input patterns with 0–2 corrupted bits were measured on one plate and those with 3–5 corrupted bits on another.
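In code, this normalization is a two-point linear map per plate, sketched below (variable names are ours).

```python
import numpy as np

def normalize(raw, hi_trace, lo_trace):
    """Map raw fluorescence to standard-concentration units for one plate.

    hi_trace : kinetics trace of the highest signal produced from gate RG_j
    lo_trace : kinetics trace of the lowest signal produced from gate RG_j
    """
    f1 = np.mean(hi_trace[-5:])   # 1x: average of the final five measurements
    f0 = lo_trace[0]              # 0x: first measurement, untriggered background
    return (np.asarray(raw) - f0) / (f1 - f0)
```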
Model and simulations
Mass-action simulations were performed using the same set of reactions and rate constants developed in the seesaw model^{11}, with four additional reactions to model pairwise annihilation:
$$\begin{array}{l}{S}_{j}+{{\rm{Anh}}}_{jk}\,\underset{{k}_{{\rm{r}}}}{\overset{{k}_{{\rm{f}}}}{\rightleftharpoons }}\,{S}_{j}{{\rm{:Anh}}}_{jk}\\ {S}_{k}+{{\rm{Anh}}}_{jk}\,\underset{{k}_{{\rm{r}}}}{\overset{{k}_{{\rm{f}}}}{\rightleftharpoons }}\,{S}_{k}{{\rm{:Anh}}}_{jk}\\ {S}_{j}{{\rm{:Anh}}}_{jk}+{S}_{k}\mathop{\to }\limits^{{k}_{{\rm{f}}}}\varnothing \\ {S}_{k}{{\rm{:Anh}}}_{jk}+{S}_{j}\mathop{\to }\limits^{{k}_{{\rm{f}}}}\varnothing \end{array}$$

Here, k_{f} = 2 × 10^{6} M^{−1} s^{−1}, which is the same as the forward rate constant of the thresholding reaction in the seesaw model^{11}. The reverse rate constant k_{r} = 0.4 s^{−1} was determined using the experimental data shown in Extended Data Fig. 3a, and is of the same order as found in a previous study of cooperative hybridization^{13}. Similar to the spurious reactions in the original seesaw model, temporary toehold binding between any single-stranded species and any annihilator (or the intermediate annihilator species listed above) is also included here.
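A minimal simulation of these four reactions, with signal restoration collapsed into a single effective catalytic step (our stand-in for the full seesaw model, with k_{s} = k_{f}/100 per the roughly 100-fold rate separation described in the main text), is sketched below.

```python
import numpy as np
from scipy.integrate import solve_ivp

kf, kr = 2e6, 0.4  # forward (M^-1 s^-1) and reverse (s^-1) rate constants from the text
ks = kf / 100      # effective restoration rate; our stand-in for the seesaw step

def rhs(t, v):
    s1, s2, a, c1, c2, rg1, rg2, y1, y2 = v
    b1 = kf * s1 * a - kr * c1     # S1 + Anh <-> S1:Anh
    b2 = kf * s2 * a - kr * c2     # S2 + Anh <-> S2:Anh
    x1 = kf * c1 * s2              # S1:Anh + S2 -> waste
    x2 = kf * c2 * s1              # S2:Anh + S1 -> waste
    r1 = ks * s1 * rg1             # catalytic restoration: S1 converts RG1 to Y1
    r2 = ks * s2 * rg2
    return [-b1 - x2, -b2 - x1, -b1 - b2, b1 - x1, b2 - x2, -r1, -r2, r1, r2]

# 0.6x versus 0.4x weighted sums at a 50 nM standard; annihilator at 1.5x, gates at 1x.
v0 = [30e-9, 20e-9, 75e-9, 0, 0, 50e-9, 50e-9, 0, 0]
sol = solve_ivp(rhs, [0, 9000], v0, method="LSODA")  # 2.5 h, as in Extended Data Fig. 3a
print(sol.y[7, -1] / 50e-9, sol.y[8, -1] / 50e-9)    # Y1 approaches 1x; Y2 stays near 0
```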
Code availability
Simulation code is available at the WTA Compiler website^{21}.
Theoretical limits of the power of winner-take-all neural networks
The winner-take-all function shown in Fig. 1a is defined to have:
$$\begin{array}{ll}{\rm{Inputs}} & {\boldsymbol{x}}=({x}_{1},{x}_{2},\ldots ,{x}_{n})\\ {\rm{Weights}} & W=({{\boldsymbol{w}}}_{1}^{\top },{{\boldsymbol{w}}}_{2}^{\top },\ldots ,{{\boldsymbol{w}}}_{m}^{\top }),\hspace{2.77626pt}{\rm{with}}\hspace{2.77626pt}{{\boldsymbol{w}}}_{j}=({w}_{1j},{w}_{2j},\ldots ,{w}_{nj})\\ {\rm{Weighted\; sums}} & {s}_{j}={{\boldsymbol{w}}}_{j}\cdot {\boldsymbol{x}}\\ {\rm{Outputs}} & {y}_{j}=\left\{\begin{array}{ll}1 & {\rm{if}}\hspace{2.77626pt}{s}_{j} > {s}_{k}\hspace{2.77626pt}\forall \hspace{2.77626pt}k\ne j\\ 0 & {\rm{otherwise}}\end{array}\right.\end{array}$$

Definition 1
Let X = {x^{1}, x^{2}, …, x^{m}} be a set of m patterns, each with n bits. Let an example pattern from X be \({{\boldsymbol{x}}}^{\alpha }=({x}_{1}^{\alpha },{x}_{2}^{\alpha },\cdots \,,{x}_{n}^{\alpha })\), with \({x}_{i}^{\alpha }\in \{0,1\}\). We say that a winner-take-all neural network with weights W remembers X if y_{α} = 1 for all 1 ≤ α ≤ m (and y_{j} = 0 for all j ≠ α) when x = x^{α}.
Theorem 1
If X is a set of m distinct n-bit patterns, each containing exactly b 1s, then the winner-take-all neural network with \(W=({{\boldsymbol{w}}}_{1}^{\top },{{\boldsymbol{w}}}_{2}^{\top },\ldots ,{{\boldsymbol{w}}}_{m}^{\top })\) and w_{j} = (w_{1j}, w_{2j}, …, w_{nj}) = x^{j} (that is, \({w}_{ij}={x}_{i}^{j}\)) remembers X.
Proof. Consider this network on input x = x^{α}. First, for j = α, we calculate s_{α} = x^{α} · x^{α} = b. Second, for j ≠ α, x^{j} ≠ x^{α}. Because the number of 1s in both of these patterns is b, the number of indices at which the bits are both 1 is strictly less than b. Therefore, s_{j} = x^{j} · x^{α} < b. Putting the first and second calculations together, we conclude that s_{α} > s_{j} and thus y_{α} = 1 and y_{j} = 0 for all j ≠ α.
The next theorem is a generalization of Theorem 1.
Theorem 2
If X is a set of m distinct n-bit patterns, and the 1s in any example pattern x^{α} are not a subset of the 1s in another pattern x^{β} (that is, no two example patterns satisfy x^{α} · x^{β} = x^{α} · x^{α}), then the winner-take-all neural network with \(W=({{\boldsymbol{w}}}_{1}^{\top },{{\boldsymbol{w}}}_{2}^{\top },\ldots ,{{\boldsymbol{w}}}_{m}^{\top })\) and w_{j} = x^{j} remembers X.
Proof. Consider this network on input x = x^{α}. First, s_{α} = x^{α} · x^{α} and is equal to the total number of 1s in x^{α}. Second, for j ≠ α, s_{j} = x^{j} · x^{α} ≠ x^{α} · x^{α}. Third, for all j, s_{j} = x^{j} · x^{α} ≤ x^{α} · x^{α} = s_{α}. Putting these three constraints together, we conclude that s_{α} > s_{j} and thus y_{α} = 1 and y_{j} = 0 for all j ≠ α.
Definition 2
In a winner-take-all neural network with \(W=({{\boldsymbol{w}}}_{1}^{\top },{{\boldsymbol{w}}}_{2}^{\top },\ldots ,{{\boldsymbol{w}}}_{m}^{\top })\) and w_{j} = x^{j}, we say that each x^{j} is a memory. We say that the network recognizes input x as memory x^{α} if y_{α} = 1 (and y_{j} = 0 for all j ≠ α). We say that a pattern x has c corrupted bits compared to a memory x^{α} (or has c-bit deviation from x^{α}) if the number of indices at which the bits are different (that is, one bit is 0 and the other is 1 or vice versa) in x and x^{α} is exactly c. We say that two memories x^{α} and x^{β} have o overlapped bits if the number of indices at which the bits are both 1 in these memories is exactly o.
Theorem 3
If x is a pattern with c < b − o corrupted bits compared to a memory x^{α}, where b is the total number of 1s in x^{α} and o is the maximum number of overlapped bits in x^{α} and x^{j} for all j ≠ α, then the winner-take-all neural network recognizes x as x^{α}.
Proof. Let c_{0} be the number of flipped 0s (that is, where 1 in x and 0 in x^{α} appear at the same index) and c_{1} be the number of flipped 1s (that is, where 0 in x and 1 in x^{α} appear at the same index). First, s_{α} = x^{α} · x = b − c_{1}. Second, for j ≠ α, s_{j} = x^{j} · x ≤ o + c_{0} (s_{j} reaches its maximum when all o overlapped bits remain 1 in x and all c_{0} flipped 0s occur at indices where x^{j} is 1). Third, because c = c_{0} + c_{1} and c < b − o, o + c_{0} = o + c − c_{1} < o + b − o − c_{1} = b − c_{1}. Putting the three constraints together, we conclude that s_{α} > s_{j} and thus y_{α} = 1 and y_{j} = 0 for all j ≠ α.
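Theorem 3 and the pattern counts quoted in the main text can be checked by brute force over all 2^{9} = 512 patterns; the sketch below (using our transcription of the ‘L’ and ‘T’ bitmaps) should reproduce the counts of 128, 102 and 30 recognizable patterns with three, four and five corrupted bits.

```python
import numpy as np
from collections import Counter
from itertools import product

L = np.array([1, 0, 0, 1, 0, 0, 1, 1, 1])   # b = 5 ones each; o = 2 overlapped ones
T = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0])

tally = Counter()
for bits in product([0, 1], repeat=9):
    x = np.array(bits)
    sL, sT = L @ x, T @ x
    if sL == sT:
        continue                             # equal weighted sums: not classifiable
    mem = L if sL > sT else T
    tally[int(np.sum(x != mem))] += 1        # corrupted bits relative to the winner

print(sorted(tally.items()))
# Theorem 3 guarantees recognition for c < b - o = 3; the entries at c = 3, 4, 5
# count the additional corrupted patterns that are still classifiable.
```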
Next, we consider a much larger set of n-bit patterns, X = {x^{1}, x^{2}, …, x^{M}}, with \(M\gg m\).
Definition 3
Let each example pattern \({{\boldsymbol{x}}}^{\mu }=({x}_{1}^{\mu },{x}_{2}^{\mu },\cdots \,,{x}_{n}^{\mu })\) be associated with a desired output \({{\boldsymbol{y}}}^{\mu }=({y}_{1}^{\mu },{y}_{2}^{\mu },\cdots \,,{y}_{m}^{\mu })\), with \({y}_{j}^{\mu }\in \{0,1\}\) and \({\sum }_{j=1}^{m}{y}_{j}^{\mu }=1\) (that is, only one specific \({y}_{\alpha }^{\mu }=1\) and \({y}_{j}^{\mu }=0\) for all j ≠ α). If \({y}_{\alpha }^{\mu }=1\), then we say that x^{μ} is a pattern in class α.
Let \({\widetilde{{\boldsymbol{x}}}}^{\alpha }=({\widetilde{x}}_{1}^{\alpha },{\widetilde{x}}_{2}^{\alpha },\cdots \,,{\widetilde{x}}_{n}^{\alpha })=\left({\sum }_{\mu }{x}_{1}^{\mu },{\sum }_{\mu }{x}_{2}^{\mu },\cdots \,,{\sum }_{\mu }{x}_{n}^{\mu }\right)\) for all μ with \({y}_{\alpha }^{\mu }=1\) (that is, the sum of all patterns in class α). Let \({t}_{\alpha }={\sum }_{i}{\widetilde{x}}_{i}^{\alpha }\) for the b largest components of \({\widetilde{{\boldsymbol{x}}}}^{\alpha }\). Let \({\bar{{\boldsymbol{x}}}}^{\alpha }=({\bar{x}}_{1}^{\alpha },{\bar{x}}_{2}^{\alpha },\cdots \,,{\bar{x}}_{n}^{\alpha })\), with \({\bar{x}}_{i}^{\alpha }={\widetilde{x}}_{i}^{\alpha }/{t}_{\alpha }\) if \({\widetilde{x}}_{i}^{\alpha }\) is one of the b largest values and \({\bar{x}}_{i}^{\alpha }=0\) otherwise (that is, the averaged pattern for class α, restricted to the b most common bits and normalized to sum to 1). Let \({\hat{{\boldsymbol{x}}}}^{\alpha }=({\hat{x}}_{1}^{\alpha },{\hat{x}}_{2}^{\alpha },\cdots \,,{\hat{x}}_{n}^{\alpha })\) , with \({\hat{x}}_{i}^{\alpha }=1\) if \({\bar{x}}_{i}^{\alpha } > 0\) and \({\hat{x}}_{i}^{\alpha }=0\) if \({\bar{x}}_{i}^{\alpha }=0\). Let \(\hat{X}=\{{\hat{{\boldsymbol{x}}}}^{1},{\hat{{\boldsymbol{x}}}}^{2},\cdots \,,{\hat{{\boldsymbol{x}}}}^{m}\}\) be the set of averaged patterns converted to binary.
The next two theorems are similar to Theorems 1 and 3, but generalized to using averaged training patterns as analogue weights rather than using a single training pattern (that is, target pattern) as binary weights.
Theorem 4
If X is a set of M distinct n-bit patterns, \({\hat{{\boldsymbol{x}}}}^{j}\) contains exactly b 1s for all j and \({\hat{{\boldsymbol{x}}}}^{j}\ne {\hat{{\boldsymbol{x}}}}^{k}\) for all j ≠ k, then the winner-take-all neural network with \(W=({{\boldsymbol{w}}}_{1}^{\top },{{\boldsymbol{w}}}_{2}^{\top },\ldots ,{{\boldsymbol{w}}}_{m}^{\top })\) and \({{\boldsymbol{w}}}_{j}={\bar{{\boldsymbol{x}}}}^{j}\) remembers \(\hat{X}\).
Proof. Consider this network on input \({\boldsymbol{x}}={\hat{{\boldsymbol{x}}}}^{\alpha }\). First, we calculate \({s}_{\alpha }={\bar{{\boldsymbol{x}}}}^{\alpha }\cdot {\hat{{\boldsymbol{x}}}}^{\alpha }={\sum }_{i=1}^{n}{\bar{x}}_{i}^{\alpha }=1\). Second, for j ≠ α, \({\hat{{\boldsymbol{x}}}}^{j}\ne {\hat{{\boldsymbol{x}}}}^{\alpha }\). Because the number of 1s in both of these patterns is b, there exists at least one index i at which \({\hat{x}}_{i}^{j}=1\) (and \({\bar{x}}_{i}^{j} > 0\)) but \({\hat{x}}_{i}^{\alpha }=0\); thus \({s}_{j}={\bar{{\boldsymbol{x}}}}^{j}\cdot {\hat{{\boldsymbol{x}}}}^{\alpha } < {\sum }_{i=1}^{n}{\bar{x}}_{i}^{j}=1\). Putting the two constraints together, we conclude that s_{α} > s_{j} and thus y_{α} = 1 and y_{j} = 0 for all j ≠ α.
Definition 4
In a winner-take-all neural network with W = (w_{1}^{⊤}, w_{2}^{⊤}, …, w_{m}^{⊤}) and \({{\boldsymbol{w}}}_{j}={\bar{{\boldsymbol{x}}}}^{j}\), we say that each \({\bar{{\boldsymbol{x}}}}^{j}\) is a memory and each \({\boldsymbol{x}}={\hat{{\boldsymbol{x}}}}^{j}\) is a perfect input. We say that a binary pattern x has c-bit deviation from a memory \({\bar{{\boldsymbol{x}}}}^{\alpha }\) if the number of indices at which the bits are different in x and \({\hat{{\boldsymbol{x}}}}^{\alpha }\) is exactly c. We say that two memories \({\bar{{\boldsymbol{x}}}}^{\alpha }\) and \({\bar{{\boldsymbol{x}}}}^{\beta }\) have overlap \(o=\max \{{\bar{{\boldsymbol{x}}}}^{\alpha }\cdot {\hat{{\boldsymbol{x}}}}^{\beta },{\bar{{\boldsymbol{x}}}}^{\beta }\cdot {\hat{{\boldsymbol{x}}}}^{\alpha }\}\). We say that a bit i is no more than average in \({\bar{{\boldsymbol{x}}}}^{\alpha }\) if \({\bar{x}}_{i}^{\alpha }\le 1/b\), where b is the total number of 1s in \({\hat{{\boldsymbol{x}}}}^{\alpha }\).
Theorem 5
If x is a pattern with c-bit deviation from a memory \({\bar{{\boldsymbol{x}}}}^{\alpha }\), where c < b(1 − o), b is the total number of 1s in \({\hat{{\boldsymbol{x}}}}^{\alpha }\) and o is the maximum overlap between \({\bar{{\boldsymbol{x}}}}^{\alpha }\) and \({\bar{{\boldsymbol{x}}}}^{j}\) for all j ≠ α, and if all flipped 1s are no more than average in \({\bar{{\boldsymbol{x}}}}^{\alpha }\) and all flipped 0s are no more than average in \({\bar{{\boldsymbol{x}}}}^{j}\) for all j ≠ α, then the winner-take-all neural network recognizes x as \({\hat{{\boldsymbol{x}}}}^{\alpha }\).
Proof. Let c_{0} be the number of flipped 0s (that is, where 1 in x and 0 in \({\hat{{\boldsymbol{x}}}}^{\alpha }\) appear at the same index) and c_{1} be the number of flipped 1s (that is, where 0 in x and 1 in \({\hat{{\boldsymbol{x}}}}^{\alpha }\) appear at the same index). First, \({s}_{\alpha }={\bar{{\boldsymbol{x}}}}^{\alpha }\cdot {\boldsymbol{x}}\ge 1-{c}_{1}/b\), because each flipped 1 is no more than average in \({\bar{{\boldsymbol{x}}}}^{\alpha }\). Second, for j ≠ α, \({s}_{j}={\bar{{\boldsymbol{x}}}}^{j}\cdot {\boldsymbol{x}}\le o+{c}_{0}/b\), because each flipped 0 is no more than average in \({\bar{{\boldsymbol{x}}}}^{j}\). Third, because c = c_{0} + c_{1} and c < b(1 − o), o + c_{0}/b = o + (c − c_{1})/b < o + [b(1 − o) − c_{1}]/b = 1 − c_{1}/b. Putting the three constraints together, we conclude that s_{α} > s_{j} and thus y_{α} = 1 and y_{j} = 0 for all j ≠ α.
These are not the strongest results possible, but they provide intuition about how the winner-take-all neural network functions, with both binary and analogue weights, and how tolerant to errors it is.
Data availability
All data that support the findings of this study are included in the manuscript and its Extended Data. Source Data for Figs. 2–4 and Extended Data Figs. 3–7 are provided with the online version of the paper.
References
1. Wadhams, G. H. & Armitage, J. P. Making sense of it all: bacterial chemotaxis. Nat. Rev. Mol. Cell Biol. 5, 1024–1037 (2004).
2. Mori, K., Nagao, H. & Yoshihara, Y. The olfactory bulb: coding and processing of odor molecule information. Science 286, 711–715 (1999).
3. Qian, L., Winfree, E. & Bruck, J. Neural network computation with DNA strand displacement cascades. Nature 475, 368–372 (2011).
4. Maass, W. On the computational power of winner-take-all. Neural Comput. 12, 2519–2535 (2000).
5. Kim, J., Hopfield, J. & Winfree, E. Neural network computation by in vitro transcriptional circuits. Adv. Neural Inf. Process. Syst. 17, 681–688 (2005).
6. Genot, A. J., Fujii, T. & Rondelez, Y. Scaling down DNA circuits with competitive neural networks. J. R. Soc. Interface 10, 20130212 (2013).
7. Muroga, S. Threshold Logic and its Applications (Wiley Interscience, New York, 1971).
8. Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. USA 79, 2554–2558 (1982).
9. Yurke, B., Turberfield, A. J., Mills, A. P., Simmel, F. C. & Neumann, J. L. A DNA-fuelled molecular machine made of DNA. Nature 406, 605–608 (2000).
10. Zhang, D. Y. & Seelig, G. Dynamic DNA nanotechnology using strand-displacement reactions. Nat. Chem. 3, 103–113 (2011).
11. Qian, L. & Winfree, E. Scaling up digital circuit computation with DNA strand displacement cascades. Science 332, 1196–1201 (2011).
12. Thubagere, A. J. et al. Compiler-aided systematic construction of large-scale DNA strand displacement circuits using unpurified components. Nat. Commun. 8, 14373 (2017).
13. Zhang, D. Y. Cooperative hybridization of oligonucleotides. J. Am. Chem. Soc. 133, 1077–1086 (2011).
14. Redgrave, P., Prescott, T. J. & Gurney, K. The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89, 1009–1023 (1999).
15. Zhang, D. Y. & Winfree, E. Control of DNA strand displacement kinetics using toehold exchange. J. Am. Chem. Soc. 131, 17303–17314 (2009).
16. Yurke, B. & Mills, A. P. Using DNA to power nanostructures. Genet. Program. Evol. Mach. 4, 111–122 (2003).
17. Cardelli, L. & Csikász-Nagy, A. The cell cycle switch computes approximate majority. Sci. Rep. 2, 656 (2012).
18. Chen, Y.-J. et al. Programmable chemical controllers made from DNA. Nat. Nanotechnol. 8, 755–762 (2013).
19. LeCun, Y., Cortes, C. & Burges, C. J. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/index.html
20. Deng, L. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process. Mag. 29, 141–142 (2012).
21. Cherry, K. M. WTA Compiler. http://www.qianlab.caltech.edu/WTAcompiler/ (2017).
22. Rojas, R. Neural Networks: A Systematic Introduction (Springer, Berlin, 2013).
23. Zhang, D. Y. & Seelig, G. DNA-based fixed gain amplifiers and linear classifier circuits. In DNA 2010: DNA Computing and Molecular Programming (eds Sakakibara, Y. & Mi, Y.) 176–186 (Springer, 2011).
24. Chen, S. X. & Seelig, G. A DNA neural network constructed from molecular variable gain amplifiers. In DNA 2017: DNA Computing and Molecular Programming (eds Brijder, R. & Qian, L.) 110–121 (Springer, Cham, 2017).
25. Cho, E. J., Lee, J.-W. & Ellington, A. D. Applications of aptamers as sensors. Annu. Rev. Anal. Chem. 2, 241–264 (2009).
26. Li, B., Ellington, A. D. & Chen, X. Rational, modular adaptation of enzyme-free DNA circuits to multiple detection methods. Nucleic Acids Res. 39, e110 (2011).
27. Pei, R., Matamoros, E., Liu, M., Stefanovic, D. & Stojanovic, M. N. Training a molecular automaton to play a game. Nat. Nanotechnol. 5, 773–777 (2010).
28. Fernando, C. T. et al. Molecular circuits for associative learning in single-celled organisms. J. R. Soc. Interface 6, 463–469 (2009).
29. Aubert, N. et al. Evolving cheating DNA networks: a case study with the rock–paper–scissors game. In ECAL 2013: Advances in Artificial Life (eds Liò, P. et al.) 1143–1150 (MIT Press, Cambridge, 2013).
30. Lakin, M. R., Minnich, A., Lane, T. & Stefanovic, D. Design of a biochemical circuit motif for learning linear functions. J. R. Soc. Interface 11, 20140902 (2014).
31. Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011).
Acknowledgements
We thank R. M. Murray for sharing an acoustic liquid-handling robot. We thank C. Thachuk and E. Winfree for discussions and suggestions. K.M.C. was supported by an NSF Graduate Research Fellowship. L.Q. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund (1010684), a Faculty Early Career Development Award from NSF (1351081), and the Shurl and Kay Curci Foundation.
Reviewer information
Nature thanks R. Schulman and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Author information
Affiliations
Bioengineering, California Institute of Technology, Pasadena, CA, USA
 Kevin M. Cherry
 & Lulu Qian
Computer Science, California Institute of Technology, Pasadena, CA, USA
 Lulu Qian
Contributions
K.M.C. developed the model, designed and performed the experiments, and analysed the data; K.M.C. and L.Q. wrote the manuscript; L.Q. initiated and guided the project.
Competing interests
The authors declare no competing interests.
Corresponding author
Correspondence to Lulu Qian.
Extended data figures and tables
Extended Data Fig. 1 DNA implementation of winner-take-all neural networks.
The winner-take-all computation is broken into five subfunctions: weight multiplication, summation, pairwise annihilation, signal restoration and reporting. In the chemical reactions listed next to the five subfunctions, the species in black are needed as part of the function, the species in grey are needed to facilitate the reactions and the waste species are not shown. k_{f} and k_{s} are the rate constants of the pairwise-annihilation and signal-restoration reactions, respectively. In the DNA-strand-displacement implementation, weight multiplication and signal restoration are both catalytic reactions. The grey circle with an arrow indicates the direction of the catalytic cycle. Representative, but not all possible, states are shown for the pairwise-annihilation reaction. Zigzag lines indicate short (5- or 7-nucleotide) toehold domains and straight lines indicate long (15- or 20-nucleotide) branch-migration domains in DNA strands, with arrowheads marking their 3′ ends. Each domain is labelled with a name, and asterisks in the names indicate sequence complementarity. Black-filled and white-filled arrowheads indicate the forward and backward directions of a reaction step, respectively. All DNA sequences are listed in Supplementary Table 1.
Extended Data Fig. 2 Seesaw circuit implementation of winner-take-all neural networks.
a, Same as Fig. 1a. b, Seesaw circuit diagram^{11} for implementing the winner-take-all neural network. Each black number indicates the identity of a seesaw node. A total of n + 3m nodes are required to implement a winner-take-all neural network with m memories, each of which has n bits. The location and absolute value of each red number indicate the identity and relative initial concentration of a DNA species, respectively. A red number on a wire connected to a node (or between two nodes) indicates a free signal molecule, which can be an input or fuel strand. A red number inside a node indicates a gate molecule, which can be a weight, summation gate or restoration gate. A red number on a wire that stops perpendicularly at two wires indicates an annihilator molecule. A negative red number inside a half node with a zigzag arrow indicates a reporter molecule.
Extended Data Fig. 3 Experimental characterization of winner-take-all DNA neural networks.
a, Two-species winner-take-all behaviour. The experimental data (left, same as Fig. 2a) were used to identify the reverse rate constant k_{r} = 0.4 s^{−1} of the annihilation reaction in simulations (right). All fluorescence kinetics data and simulations are shown over the course of 2.5 h. The standard concentration is 50 nM (1×). Initial concentrations of the annihilator, restoration gates, fuels and reporters are 75 nM (1.5×), 50 nM (1×), 100 nM (2×) and 100 nM (2×), respectively. b, A 4-bit pattern-recognition circuit with input concentrations varying from 50 nM to 500 nM. In each output trajectory plot, dotted lines indicate fluorescence kinetics data and solid lines indicate simulation. The patterns to the left and right of the arrow indicate the input signal and output classification, respectively. c, Applying thresholding to clean up noisy input signals. The thresholding mechanism has been reported previously in work on seesaw DNA circuits^{11}. The extended toehold in the threshold molecule has 7 nucleotides. In b and c, to compare the range of inputs, the concentration of each input strand is shown relative to 50 nM. The initial concentration of each weight molecule is either 0 or 50 nM; weight fuels are at twice the concentration of weight molecules. The initial concentrations of the summation gates, annihilator, restoration gates, restoration fuels and reporters are 100 nM (1×), 400 nM (4×), 100 nM (1×), 200 nM (2×) and 200 nM (2×), respectively, with a standard concentration of 100 nM. Source Data
Extended Data Fig. 4 A winner-take-all DNA neural network that recognizes 9-bit patterns as ‘L’ or ‘T’.
In each output trajectory plot, dotted lines indicate fluorescence kinetics data and solid lines indicate simulation. The standard concentration is 50 nM (1×). The initial concentration of each input strand is either 0 or 50 nM (1×). The initial concentration of each weight molecule is either 0 or 10 nM (0.2×); weight fuels are at twice the concentration of weight molecules. The initial concentrations of the summation gates, annihilator, restoration gates, restoration fuels and reporters are 50 nM (1×), 75 nM (1.5×), 50 nM (1×), 100 nM (2×) and 100 nM (2×), respectively. The patterns to the left and right of the arrow indicate the input signal and output classification, respectively. In addition to the perfect inputs, 28 example input patterns with 1–5 corrupted bits were tested. Note that 5 is the maximum number of corrupted bits, because an ‘L’ with more than 5-bit corruption will be as similar as or more similar to a ‘T’, and vice versa. Source Data
Extended Data Fig. 5 A winner-take-all DNA neural network that recognizes 100-bit patterns as one of two handwritten digits.
a, Choosing the test input patterns on the basis of their locations in the weighted-sum space. b, Overlap between the two memories: ‘6’ and ‘7’. c, The 36 test patterns, with the number of flipped bits shown next to their weighted sums. d, Recognizing handwritten digits with up to 30 flipped bits compared to the perfect digits. Dotted lines indicate fluorescence kinetics data and solid lines indicate simulation. The standard concentration is 100 nM. Initial concentrations of all species are listed in Extended Data Fig. 10. The input pattern is shown in each plot. Note that 40 is the maximum number of flipped bits, because all patterns have exactly 20 1s. Source Data
Extended Data Fig. 6 Three-species winner-take-all behaviour and rate measurements for selecting DNA sequences in winner-take-all reaction pathways.
a, Fluorescence kinetics data for a three-species winner-take-all circuit. Initial concentrations of the three weighted-sum species are shown above each plot as a number relative to a standard concentration of 50 nM (1×). The initial concentrations of the annihilator, restoration gates, fuels and reporters are 75 nM (1.5×), 50 nM (1×), 100 nM (2×) and 100 nM (2×), respectively. b, Measuring the rates of 15 catalytic gates. Fluorescence kinetics data (dotted lines) and simulations (solid lines) of the signal-restoration reaction are shown, with a trimolecular rate constant (k) fitted using a Markov chain Monte Carlo package (https://github.com/joshburkart/mathematica-mcmc). The reporting reaction was needed for the fluorescence readout. Initial concentrations of all species are listed as numbers relative to a standard concentration of 50 nM. c, The 15 catalytic gates sorted and grouped on the basis of their rate constants. All rate constants are within ±65% of the median. The two coloured groups of three rate constants are within ±5% of the median. These two groups of catalytic gates were selected for signal restoration in the winner-take-all DNA neural networks that remember two to nine 100-bit patterns (Methods section ‘Sequence design’). Source Data
Extended Data Fig. 7 A winner-take-all DNA neural network that recognizes 100-bit patterns as one of three handwritten digits.
a, Circuit diagram. b, Choosing the test input patterns on the basis of their locations in the weighted-sum space. c, Overlap between the three memories: ‘2’, ‘3’ and ‘4’. d, Recognizing handwritten digits with up to 28 flipped bits compared to the ‘remembered’ digits. Dotted lines indicate fluorescence kinetics data and solid lines indicate simulation. The standard concentration is 100 nM. Initial concentrations of all species are listed in Extended Data Fig. 10. The input pattern is shown in each plot. Note that 40 is the maximum number of flipped bits, because all patterns have exactly 20 1s. Source Data
Extended Data Fig. 8 Workflow of the winner-take-all compiler.
The compiler^{21} is a software tool for designing DNA-based winner-take-all neural networks. Users start by uploading a file that describes a winner-take-all neural network. Alternatively, the weight matrix and test patterns can be drawn graphically. Next, a plot of the weighted-sum space provides a visual representation of the classification decision boundaries. The kinetics of the system can be simulated using Mathematica code downloaded from the compiler website, and the set of reaction functions is displayed online. Finally, the compiler produces a list of the DNA strands that are required to experimentally demonstrate the network as designed by the user.
Extended Data Fig. 9 Size and performance analysis of logic circuits for pattern recognition.
a, Logic circuits that determine whether a 9-bit pattern is more similar to ‘L’ or ‘T’. b, Logic circuits that recognize 100-bit handwritten digits. To find a logic circuit that produces correct outputs for a given set of inputs, with no constraint on other inputs, we first created a truth table including all experimentally tested inputs and their corresponding outputs. The outputs for all other inputs were specified as ‘don’t care’, meaning that the values could be 0 or 1. The truth table was converted to a Boolean expression and minimized in Mathematica, then minimized again jointly for multiple outputs and mapped to a logic circuit in Logic Friday (https://download.cnet.com/LogicFriday/300020415_475848245.html). In the minimized truth tables shown here, ‘X’ indicates a specific bit of the input on which the output does not depend. For comparison, minimized logic circuits were also generated from training sets with a varying total number of random examples from the MNIST database. The performance of each logic circuit, defined as the percentage of correctly classified inputs, was computed using all examples in the database. To make the minimization and mapping to logic gates computable in Logic Friday, the size of the input was restricted to the 16 most significant bits, determined on the basis of the weight matrix of the neural networks.
Extended Data Fig. 10 Species and their initial concentrations in all neural networks that recognize 100-bit patterns.
a, List of species and strands. Reporters were annealed with top strands (that is, Rep[j]t) in 20% excess. All other two-stranded complexes were annealed with a 1:1 ratio of the two strands and then PAGE-purified (Methods section ‘Purification’). b, Weights and example inputs in the neural network that recognizes ‘6’ and ‘7’. c, Weights in the neural network that recognizes ‘1’–‘9’. Weights and inputs used in all experiments are listed in Supplementary Table 2. Detailed protocols for all experiments are listed in Supplementary Table 3.
Supplementary information
Supplementary Table 1
DNA sequences
Supplementary Table 2
Weights and inputs
Supplementary Table 3
Experimental protocols