An event-based architecture for solving constraint satisfaction problems

Constraint satisfaction problems are ubiquitous in many domains. They are typically solved using conventional digital computing architectures that do not reflect the distributed nature of many of these problems, and are thus ill-suited for solving them. Here we present a parallel analogue/digital hardware architecture specifically designed to solve such problems. We cast constraint satisfaction problems as networks of stereotyped nodes that communicate using digital pulses, or events. Each node contains an oscillator implemented using analogue circuits. The non-repeating phase relations among the oscillators drive the exploration of the solution space. We show that this hardware architecture can yield state-of-the-art performance on random SAT problems under reasonable assumptions on the implementation. We present measurements from a prototype electronic chip to demonstrate that a physical implementation of the proposed architecture is robust to practical non-idealities and to validate the theory proposed.

Constraint satisfaction problems (CSPs) are a fundamental class of problems in computer science with wide applicability in areas such as channel coding [1], circuit optimization [2], and scheduling [3]. Algorithms for solving CSPs are typically run on von Neumann architectures where a number of processing units compute using a shared memory pool. These architectures were not explicitly developed to solve CSPs. This raises the question: How can we construct a more efficient computing substrate whose architecture better reflects the distributed nature of CSPs? In this paper we address this question by describing a distributed dynamical system whose dynamics execute an efficient search for CSP solutions and which can be easily implemented using Complementary Metal-Oxide Semiconductor (CMOS) Very Large Scale Integration (VLSI) electronic chips.
Many dynamical systems that have been proposed for solving CSPs violate the 'physical implementability' condition [4][5][6]. Non-physicality arises from the use of variables that can grow without bounds as the system is searching for solutions. On the other hand, there is a long, well-established tradition of developing physically realizable dynamical systems, e.g., in the form of artificial neural networks, to solve CSPs or "Best-match problems" [7,8]. Early attempts in this field used attractor networks, such as Hopfield networks [9], to solve CSPs like the traveling salesman problem [10,11]. These attractor networks, however, would often get stuck at locally optimal problem solutions (so-called "local minima"). To avoid getting stuck at local minima, stochastic attractor networks were proposed [12,13] which make use of explicit sources of noise to force the network to continuously explore the solution space. While noise is an inextricable part of any physical system, dynamically controlling the noise power to balance "exploratory" versus "greedy" search, or to realize an annealing schedule that moves the network from an exploratory phase to a greedy one, is not a trivial operation and puts an additional overhead on the physical implementation.
The architecture we propose in this paper makes use of analog oscillator circuits that, once fabricated, inevitably have incommensurable frequencies (i.e., frequencies that are not rational multiples of each other). Rather than requiring external sources of noise or relying on random fluctuations, our architecture exploits the non-repeating phase relations among the analog oscillators to drive the search for optimal solutions. This greatly reduces the requirements for the design of the hardware implementation, as it does not require precise circuits. Indeed, as the system exploits, by construction, the inhomogeneities present in physical devices, it is possible to design extremely compact and simple oscillator circuits that are allowed to have large mismatch figures. Each variable in a CSP is represented by a node consisting of an analog oscillator and a state-holding asynchronous digital circuit. To achieve robust and scalable computation, the nodes communicate using digital pulses, or events. This combination of analog and digital circuits running in a hybrid continuous/event-driven mode avoids many of the problems that affect pure analog VLSI systems such as susceptibility to noise, degradation of analog signals during storage and communication, and signal restoration/refresh issues.
This architecture is inspired by computational neuroscience studies that used oscillatory rate-based neural networks to solve constraint satisfaction problems [14]. The architecture we present here uses different dynamics that were developed primarily based on engineering considerations to be as robust and easily implementable as possible. The architecture was also developed to be general enough to allow the instantiation of various efficient algorithms for solving CSPs. We present results from a CMOS implementation of this architecture on a prototype VLSI chip and show that the chip operation reproduces the dynamics of the simulated system. Our results expose a surprising relation between the dynamics of coupled multi-stable oscillators and the search for CSP solutions and highlight a novel mode of distributed, parallel, mixed analog/digital computation that can form the basis of various hardware/physical systems for solving CSPs.

Description of the architecture
The proposed event-based architecture for solving CSPs is composed of a network of nodes which communicate via digital events. A node is shown schematically in Fig. 1a. Each node has N externally accessible input ports, one internal input port, M output ports, and one dummy output port. The analog oscillator in the node generates a continuous stream of digital events which are sent to the node's internal port, in.0. The digital logic in the node has an internal state s which can take one of Q possible values. On the arrival of an event on any of the input ports, the node's digital logic evaluates the index of the output port to which it should send the event based on the index of the triggered input port and on the current state of the digital logic; it updates its internal state; and it transmits the event via the selected output port (see Fig. 1b). Selection of the 'dummy' output port out.0 is equivalent to suppressing the event. The digital logic is fully described by the event routing function g and the state update function f, which are both deterministic. Given their analog nature, the frequencies of the oscillators in the different nodes are not rational multiples of each other. This requirement is trivial to achieve in a VLSI design of analog oscillators.
For solving CSPs, a subset of the nodes in the network will represent the actual problem variables while others will represent helper variables that encode other problem-relevant quantities (for example, whether a constraint is satisfied or not). The value of a variable/node at a point in time is the index of the output port on which the node emitted its last event. Thus, a variable/node with M output ports can have M possible values. The output port of one node can connect to the input ports of one or more nodes and one input port can receive events from multiple output ports. One output port cannot be connected to multiple input ports on the same node. In the following sections, we describe how to connect nodes/variables together and how to define the behavior of the nodes/variables so as to solve a number of hard CSPs. The procedure to map a CSP to this distributed architecture depends on the type of the CSP but in general, the mapping is done so that the distributed and parallel dynamics of the network of nodes tries to put the problem variables/nodes in a state where their outputs satisfy all the constraints.
Figures 1c and 1d show the definition and illustrate the behavior of an example node which has two input ports, two output ports, and two possible internal states (N = M = Q = 2). The state of the example node/variable is the index of the last external event it received and the node/variable advertises its state by generating an event on the corresponding output port when it receives an event on the internal port in.0 as shown in Fig. 1c (we refer to this as 'updating'). Assume this example node is the target node receiving events from multiple source nodes. Since it is only the last received event that determines the value advertised by an updating node, the phase relations between the analog oscillators in the network determine which of the source nodes generates the decisive event that determines the event generated by the target node. This would be the source node that updated just before the target node updates. The phase relations are continuously changing in an aperiodic manner since the oscillation frequencies are incommensurable. The shifting phase relations thus continuously change which source node manages to influence the output events of the target node.
Figure 1. (a) General form of a node composed of an analog oscillator and digital logic. The digital logic is event-driven. (b) On an input event, the digital logic evaluates the index of the output port, updates its state, and generates an output event in that order according to the functions f and g. (c) Definition of the f and g functions for an example binary node. (d) Simulation of the example node showing how its output events reflect the identity of the last input event it received.
For the node described in Fig. 1c, assume $N_1$ nodes are trying to force a target node to 1 and $N_2$ nodes are trying to force it to 2. If observed for a long enough time, the fraction of 1 events generated by the target node then roughly goes to $N_1/(N_1+N_2)$ (assuming the differences in oscillator frequencies of the source nodes are small). Thus, the more nodes that try to force a target node to a particular value, the more likely the target node is to output that value, yet there is always a chance that even a single source node that is in conflict with the majority will update just before the target node updates, thereby causing the target node to go against the majority influence. As we will show in the next sections, this behavior can be exploited to allow the network to escape from local minima where flipping a single variable/node may increase the number of violated constraints. However, a node will never take a value that is in conflict with all incoming influences, which is why the globally optimal state is stable. We show that this mostly greedy, but sometimes exploratory, behavior can be exploited to efficiently solve a variety of hard CSPs.
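The $N_1/(N_1+N_2)$ behavior can be checked numerically with a minimal sketch. The specific periods below (square roots of primes for the sources, $\pi/3$ for the target) are illustrative assumptions standing in for mismatched analog oscillators, and the function name `fraction_of_ones` is hypothetical. At each target update, the source with the most recent event decides the advertised value.

```python
import math

def fraction_of_ones(n1, n2, n_updates=200_000):
    """Fraction of target updates that advertise value 1.

    n1 sources push the target to 1, n2 sources push it to 2.
    All periods are assumed values chosen to be mutually mismatched.
    """
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    n = n1 + n2
    if n > len(primes):
        raise ValueError("sketch supports at most 10 source nodes")
    # Source periods differ by a few percent (small frequency differences).
    periods = [1.0 + 0.01 * math.sqrt(p) for p in primes[:n]]
    values = [1] * n1 + [2] * n2
    target_period = math.pi / 3  # assumed target oscillator period
    ones = 0
    for k in range(1, n_updates + 1):
        t = k * target_period
        # Time elapsed since each source's most recent event.
        ages = [t % p for p in periods]
        # The source that fired last before t sets the target's state.
        winner = min(range(n), key=ages.__getitem__)
        if values[winner] == 1:
            ones += 1
    return ones / n_updates

if __name__ == "__main__":
    print(fraction_of_ones(3, 1))  # expected to be close to 3/4
```

Because the source periods are not exactly equal, the measured fraction is only approximately $N_1/(N_1+N_2)$, which matches the "roughly" in the text.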

Boolean Satisfiability Problems
Let $X = \{x_1, \ldots, x_N\}$ be a set of boolean variables. A literal is either a variable or its negation. A solution to a boolean satisfiability or K-SAT problem is a variable assignment that satisfies the logical expression $\bigwedge_i c_i$ involving the variables of X, where each clause $c_i$ is the disjunction of K literals. n-SAT for n ≥ 3 is NP-complete [15].

The probSAT Algorithm
One of the most efficient algorithms for solving SAT problems is the probSAT algorithm [16], which iteratively modifies a variable assignment by choosing a random unfulfilled clause $c_u$ and changing the assignment of ('flipping') a random variable $x_f$ in $c_u$, thereby fulfilling $c_u$. The choice of $x_f$ is governed by a heuristic function $f(m, b)$, where m (the 'make' heuristic) is the number of clauses that are newly fulfilled when $x_f$ is flipped and b (the 'break' heuristic) is the number of clauses that are newly unfulfilled when $x_f$ is flipped. The heuristic function is renormalized into a probability over the available choices and $x_f$ is chosen according to these probabilities. The heuristic function $f(m, b)$ can take several different forms. In our benchmarks, we use the particularly effective 'exponential' form, which has two parameters x and y.
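The loop described above can be sketched as follows. This is a minimal, unoptimized illustration and not the benchmarked implementation: it uses only the 'break' count, and the weight `cb ** -b` with base `cb` is an assumed break-only exponential weighting, not necessarily the form or parameter values used in the benchmarks. The function names are hypothetical. Clauses are tuples of signed integers (a negative integer denotes a negated variable).

```python
import random

def is_true(lit, assign):
    """A literal is true when its variable's value matches its sign."""
    return assign[abs(lit)] == (lit > 0)

def break_count(clauses, assign, var):
    """Number of clauses kept fulfilled only by `var` (they break if it flips)."""
    count = 0
    for clause in clauses:
        true_lits = [l for l in clause if is_true(l, assign)]
        if len(true_lits) == 1 and abs(true_lits[0]) == var:
            count += 1
    return count

def probsat_break_only(clauses, n_vars, cb=2.3, max_flips=100_000, seed=0):
    """Return (assignment, flips) or (None, max_flips) if no solution is found."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    for flips in range(max_flips):
        unsat = [c for c in clauses if not any(is_true(l, assign) for l in c)]
        if not unsat:
            return assign, flips
        clause = rng.choice(unsat)              # random unfulfilled clause
        variables = [abs(l) for l in clause]
        # Assumed weighting: fewer broken clauses -> larger weight.
        weights = [cb ** -break_count(clauses, assign, v) for v in variables]
        flip = rng.choices(variables, weights=weights)[0]
        assign[flip] = not assign[flip]
    return None, max_flips

if __name__ == "__main__":
    demo = [(1, 2, 3), (-1, 2, -3), (1, -2, 3), (-1, -2, -3), (1, 2, -3)]
    print(probsat_break_only(demo, 3))
```

`random.choices` performs the renormalization of the heuristic values into probabilities implicitly, since it samples proportionally to the given weights.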

Mapping Break-only probSAT to a network of nodes
The probSAT algorithm that only uses the 'break' heuristic can be loosely mapped to our architecture by using two types of nodes: nodes representing variables and nodes representing constraints/clauses (see Fig. 2). We consider only 3-SAT problems but extensions to n-SAT for n > 3 are straightforward. Each variable node has two states. It updates and advertises its state (by generating an event on one of its two output ports) whenever it receives an event from a clause node. Additionally, it advertises its state whenever it receives an event from the internal oscillator on input port in.0.
When a clause node receives a break event (an event arriving on one of the three break input ports, one corresponding to each variable), it increments the corresponding break counter. On events from the internal oscillator, a clause node evaluates what state the connected variables have last advertised. If there is no variable in a 'fulfilling' state, the clause node sends an event to flip the variable with the smallest associated 'break' count and sends a 'break' event to every constraint this variable is connected to in order to indicate that the flipped variable is the only variable keeping the constraint fulfilled. If there is exactly one 'fulfilling' variable, the clause node sends a 'break' event to every constraint that the variable is connected to. If there is more than one 'fulfilling' variable, the clause node does not send out any events. The break counters are reset after each event from the internal oscillator. An unfulfilled clause node thus always chooses to flip the variable with minimal break count (with ties resolved according to a fixed variable ordering). This flip heuristic is deterministic and simpler than the heuristic employed by standard probSAT.

Network and Sequential probSAT: Performance Comparison
We compare the performance of the network to that of the standard (sequential) probSAT algorithm [16]. We evaluate the network performance in two cases: the 'ideal' case, where events are transmitted instantly and never lost; and a 'non-ideal' case, where events have a delay uniformly distributed between zero and ten percent of the node oscillation period and a ten percent chance of being lost completely. The non-ideal case simulates the imperfections of an actual physical implementation where spike delivery is neither instantaneous nor guaranteed. The first benchmark is a set of 1000 intermediate-size, difficult 3-SAT problems taken from SATLIB [17] with 50 variables and 218 clauses each. Figure 3a shows the number of variable flips needed to reach the solution for the ideal and non-ideal network and for standard probSAT. The median number of flips to solution is smallest for the non-ideal network. This indicates that using incommensurable oscillators to effectively "randomize" clause update order is better than choosing a clause to fulfill at random. Surprisingly, the non-ideal network performs better than the ideal one. Losing/delaying events might increase network efficiency by making it more exploratory, as clauses now have imperfect information about the state of the variable nodes.
Figure 3b shows the average number of oscillation cycles (per variable) needed to find the solution. For standard probSAT, we assume one cycle corresponds to one variable flip. This measure takes into account the fact that the network, due to its distributed nature, can update variables in parallel. The number of oscillation cycles indicates how fast a hardware implementation of the network would need to be to run faster than the standard algorithm on a conventional CPU (which achieves around 1-10 Mega-Flips per second). Since the average number of oscillation cycles to solution is about a fifth of the number of flips performed by the sequential algorithm, a hardware implementation would need to use analog oscillators with an average frequency of 0.2-2 MHz to perform as well as a conventional computer.
We cannot give a comparison with state-of-the-art benchmarks because our software simulator is only able to simulate 10 cycles for each node per second if used to simulate networks implementing large modern benchmarks. Based on the solution times of standard probSAT, an estimated $10^6$ to $10^9$ cycles are needed to solve a single current-day problem (i.e., 1-1000 days to simulate the corresponding network).
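The quoted range of simulation times follows from dividing the estimated cycle counts by the simulator throughput of 10 cycles per node per second; the worked arithmetic, using 86400 seconds per day, is:

```latex
\[
\frac{10^{6}\ \text{cycles}}{10\ \text{cycles/s}} = 10^{5}\ \text{s} \approx 1.2\ \text{days},
\qquad
\frac{10^{9}\ \text{cycles}}{10\ \text{cycles/s}} = 10^{8}\ \text{s} \approx 1157\ \text{days}.
\]
```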
As an alternative, we evaluate the performance on various problem sizes to ensure that the network performance scales as well as that of standard probSAT; since this is the case (see Fig. 4), it is reasonable to assume that for large problems, the network performs as well as the standard algorithm in terms of number of flips to solution.
Figure 3 (partial caption). Note that the histograms have logarithmically spaced and sized bins. The best performing algorithm in both metrics is the network with lost and delayed events. In plot 3b, the distribution looks bimodal because for very small numbers of cycles to solution, this number could not be accurately measured (network convergence is checked every 20 cycles).

Graph coloring problem
A k-coloring for a graph G with vertices V(G) and a set of edges E(G) is a map $\varphi : V(G) \to \{1, 2, \ldots, k\}$.
In the graph coloring problem, the goal is to find a proper k-coloring $\varphi_0$ of G, where $\varphi_0(x) \neq \varphi_0(y)$ for all $\{x, y\} \in E(G)$.

Network implementation and performance
To solve a k-coloring problem, we map each vertex in the graph to a network node with k input ports and k output ports as shown in Fig. 5. Whenever the internal oscillator in a node/vertex generates an event, the node advertises its color by generating an event on one of the k output ports. Events from a node/vertex are routed to all its neighbors in the graph. Each node maintains k counters that count how many of its neighbors have a particular color. These counters are incremented when a node receives events from its neighbors. At an internal oscillator event, if the counter corresponding to the current node color is non-zero (one of the neighbors has the same color), the node chooses a different color. If the internal boolean variable, 'heuristic', is true, the node chooses the color with the fewest conflicts (smallest neighbor count). If 'heuristic' is false, the node chooses the next color in a fixed arbitrary ordering of colors. The node then resets the k counters, flips the 'heuristic' binary variable and generates an event to advertise its color. A min-conflict heuristic thus takes turns with a heuristic-free scheme to update a conflicting node in each cycle.
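The node behavior described above can be sketched as a small event-driven simulation. This is a simplified illustration under stated assumptions rather than the simulator used for the reported results: events are delivered instantly, the oscillator periods are arbitrary mismatched values, all nodes start with color 0, and the counters are reset and 'heuristic' is flipped at every oscillator event. The function name `color_graph` is hypothetical.

```python
import heapq
import math

def color_graph(adjacency, k, max_events=100_000):
    """adjacency[v] lists the neighbors of vertex v; returns (colors, converged)."""
    n = len(adjacency)
    # Assumed mismatched periods standing in for analog oscillators.
    periods = [1.0 + 0.01 * math.sqrt(i + 2) for i in range(n)]
    colors = [0] * n
    counters = [[0] * k for _ in range(n)]   # neighbor count per color
    heuristic = [True] * n
    queue = [(periods[v], v) for v in range(n)]
    heapq.heapify(queue)

    def proper():
        return all(colors[u] != colors[v] for u in range(n) for v in adjacency[u])

    for _ in range(max_events):
        t, v = heapq.heappop(queue)          # next oscillator event
        if counters[v][colors[v]] > 0:       # a neighbor advertised our color
            if heuristic[v]:
                # Min-conflict choice; ties go to the lowest color index.
                others = [c for c in range(k) if c != colors[v]]
                colors[v] = min(others, key=lambda c: counters[v][c])
            else:
                # Heuristic-free choice: next color in a fixed ordering.
                colors[v] = (colors[v] + 1) % k
        counters[v] = [0] * k
        heuristic[v] = not heuristic[v]
        for u in adjacency[v]:               # advertise color to neighbors
            counters[u][colors[v]] += 1
        if proper():
            return colors, True
        heapq.heappush(queue, (t + periods[v], v))
    return colors, False

if __name__ == "__main__":
    triangle = [[1, 2], [0, 2], [0, 1]]
    print(color_graph(triangle, 3))
```

The global `proper()` check is only a convenience for stopping the simulation; the nodes themselves use purely local information.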
We assessed the performance of this algorithm on several k-coloring problems of intermediate difficulty (see Table 1) taken from [18] in which a different massively parallel coloring algorithm ('gravitational swarm intelligence' (GSI)) was assessed. As in the previous section on boolean satisfiability, we cannot attempt state-of-the-art sized problems since the software simulation of a large network takes an infeasibly long time. In terms of average numbers of oscillation cycles to solution, the network compares favorably to GSI [18].
Table 1 (partial caption). ... [18] of our network and a massively parallel algorithm [18]. Each number in the network column is an average of 4 runs with redrawn oscillator frequencies; one run for the queens graph did not converge in $10^5$ cycles (the other runs averaged 530 steps to convergence).

Implementing analog costs and the traveling salesman problem
The problems considered so far, 3-SAT and graph coloring problems, have hard constraints that should all be satisfied. Our architecture can also handle CSPs with weighted constraints by exploiting the relations between the frequencies of the analog oscillators. We illustrate this approach using the traveling salesman problem (TSP). A TSP with N cities is defined by the N × N matrix D where $d_{ij}$ is the distance from city i to city j. The goal is to find a minimum-length closed tour of all the cities. We cannot require a distributed architecture like ours to stabilize at the optimal solution as there is no general way, short of using long-term memory resources for storing previously visited tours, to verify that a tour is optimal. Our goal is thus to map a TSP to a network of nodes that continuously explores all possible valid tours but that has a higher chance of visiting tours with smaller distances. We only consider symmetric TSPs where $d_{ij} = d_{ji}$. We associate nodes in our network with directed edges in the TSP tour. An edge from city i to city j is represented by node (i, j). Each edge node has 3 possible internal states, 3 input ports and 1 output port, and an event from edge node (i, j) means that this edge is present in the tour. Edge nodes are described by the state update function $f_{edge}$ and spike routing function $g_{edge}$ (see Fig. 1b), which behave as follows. The node will only generate an event when it is in state 3 (activated state) and when the internal oscillator generates an event. It then goes to state 1 (deactivated state). The node will go to state 3 if it receives an event on input port 4 or if it is in state 2 and receives an event on input port 2. We first consider a six-city symmetric TSP which can be mapped to the network shown in Fig. 6a. Row i contains edge nodes for edges originating from city i and column j contains edge nodes for edges that terminate at city j + 1.
Events from edge node (i, j) are routed to port 1 of all edge nodes in the same row and column, and to port 2 on all edge nodes in row j. Assume initially all edge nodes are in state 2 except the nodes in row 1 which are in state 3 (activated). One of the nodes in row 1, for example (1, 3), will generate an event first and deactivate all edge nodes originating from city 1 or terminating on city 3 by putting these edge nodes in state 1. The (1, 3) event activates all edge nodes originating from city 3 (edge nodes in row 3) by putting them in state 3 (Fig. 6a). Assume (3, 2) generates an event first in row 3, thereby disabling all edges originating from city 3 or terminating on city 2. The event from (3, 2) switches edge nodes in row 2 that were in state 2 to state 3 (Fig. 6b). One of the activated edge nodes in row 2 will now generate an event and the sequence continues (Fig. 6c). Through the events of the edge nodes (exactly 5 nodes will generate an event), we obtain a valid tour, for example: (1, 3); (3, 2); (2, 4); (4, 6); (6, 5). After these 5 events, all edge nodes are in state 1 (deactivated).
The chance for an edge node (i, j) that is in state 3 to "win the race" and send an event first when row i is activated depends on its frequency $f_{i,j}$ and the frequencies of the other active nodes in the row. As $f_{i,j}$ increases, the likelihood for edge node (i, j) to generate an event first and become part of the tour when its row is activated increases. To increase the "probability" of obtaining a short tour, we set $f_{i,j}$ as a function of the edge length $d_{ij}$, a positive scaling constant K, and a small perturbation $\eta_{i,j}$ that keeps the frequencies incommensurable. The node of a shorter edge has a higher frequency, which makes the edge more likely to appear in the tour. The 'tour completion' node in Fig. 6a is responsible for resetting the network at the end of a valid tour. It has 2 internal states, 1 input port, and 1 output port. The frequency of the 'tour completion' node is chosen to be slightly less than that of the edge node having the smallest frequency and it receives events from all edge nodes. If the 'tour completion' node does not receive any event within one oscillation cycle (all edge nodes are inactive), it generates an event that goes to port 4 of the edge nodes in row 1 and port 3 of all other edge nodes, thereby resetting the network and starting the tour generation process. Figure 6d shows the frequency of occurrence of the tours generated by a network encoding a six-city TSP. There are 120 tours as the first city in the tour is always city 1. Tours having the same distance do not occur equally often and some tours with longer distances occur more often than tours with shorter distances. In general, however, shorter tours tend to occur more often.
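The "race" between edge nodes can be illustrated with a simplified sketch. Several things here are assumptions made only for illustration: the frequency rule `K / d + eta` is one hypothetical choice that gives shorter edges higher frequencies, the perturbation `eta` is an arbitrary small irrational-looking offset, the reset by the 'tour completion' node is modeled as a fixed time advance, and the port-level state machine is abstracted into "the first active node in the current row to fire wins". Cities are indexed from 0.

```python
import math

def next_event_time(t, period):
    """First oscillator event after time t (events occur at multiples of period)."""
    return (math.floor(t / period) + 1) * period

def sample_tours(dist, n_tours, K=1.0):
    """Generate tours starting at city 0 by racing free-running edge oscillators."""
    n = len(dist)
    period = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                eta = 1e-3 * (math.sqrt(2 + i * n + j) % 1.0)  # assumed perturbation
                period[(i, j)] = 1.0 / (K / dist[i][j] + eta)  # assumed frequency rule
    # Assumed reset delay: slightly longer than the slowest edge period.
    reset_delay = 1.05 * max(period.values())
    t, tours = 0.0, []
    for _ in range(n_tours):
        city, visited, tour = 0, {0}, [0]
        while len(tour) < n:
            candidates = [j for j in range(n) if j not in visited]
            # The active edge node in row `city` that fires first joins the tour.
            t, nxt = min((next_event_time(t, period[(city, j)]), j) for j in candidates)
            visited.add(nxt)
            tour.append(nxt)
            city = nxt
        tours.append(tuple(tour))
        t += reset_delay
    return tours

if __name__ == "__main__":
    d = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 1], [3, 5, 1, 0]]
    print(sample_tours(d, 5))
```

Because simulated time keeps advancing across tours, the oscillator phases at the start of each tour differ, so successive tours are generally not identical even though the dynamics are deterministic.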
For large problems, the performance of the TSP network falls far short of state-of-the-art TSP algorithms. TSPs have a global constraint, which is the requirement that a tour is valid. The architecture we describe is ill-suited to handling such a global constraint as different parts of the network can no longer operate in parallel to optimize the local constraints. As the size of the TSP increases, the fraction of valid tours decreases exponentially, which makes it imperative that this global constraint be strictly enforced; otherwise the obtained tours will mostly be invalid. We followed a pseudo-sequential scheme, which is counter to the distributed nature of the architecture, where edges are added one by one to keep the global tour validity constraint satisfied at all times. Even though the TSP architecture we describe is inefficient, it shows that analog costs can be implemented through the appropriate choice of node frequencies and that the deterministic network can exhibit a sampling-like behavior. In implementations where individual node frequencies are not controllable, the same effect can be achieved by choosing nodes having the appropriate frequency relations from a large pool of mismatched nodes.

[Figure 6, panels (a) to (c). Recoverable labels: "Spikes from all nodes", "to port 1", "to port 2".]

Prototype VLSI implementation
The prototype VLSI chip that implements a version of the architecture described in this paper is composed of a 2D array of binary nodes that communicate using events. The problem of transmission and routing of asynchronous events has been thoroughly investigated in the neuromorphic engineering literature [19,20]. An elegant solution uses a communication protocol based on the Address-Event Representation (AER). When a node generates an event on one of its output ports, it executes a handshake protocol with the 'output AER interface'. The 'output AER interface' encodes the address of the output port on which the event was generated and transmits the address off-chip using an output bus that has $\log_2(K_{out})$ lines, where $K_{out}$ is the number of possible event sources (the output ports of all the nodes). The array has $K_{in}$ possible event targets (the input ports of the nodes). If an event is to be sent to one of these targets, the target address is sent to the 'input AER interface' on a bus that has $\log_2(K_{in})$ lines. The 'input AER interface' decodes the address and sends an event to the target element by simultaneously activating the correct row and column in the array.
The 2D array on the chip comprises 64 × 32 binary nodes/variables, i.e., nodes/variables with two output ports. The chip can be configured so that 2, 3, or 4 adjacent variables are merged together to realize 4-, 6-, or 8-valued variables respectively. An n-valued variable (n ∈ {2, 4, 6, 8}) has n output ports and n possible internal states and has $2^n - 1$ input ports. Physically, a variable has n digital input lines on which it receives a binary word encoding the index of the input port receiving the event. An off-chip event router implemented on a field programmable gate array (FPGA) communicates with the output and input AER interfaces to route events from the output ports of nodes/variables to input ports according to a programmable routing table.
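As a rough illustration of the AER bus-width relation, the sketch below computes the number of address lines needed for a given number of endpoints, rounding $\log_2$ up to the next integer. The endpoint count used in the example (two output ports for each of the 64 × 32 binary nodes) is an assumption for illustration; the actual address space of the chip is not stated here, and the function name is hypothetical.

```python
def address_lines(n_endpoints: int) -> int:
    """Smallest number of binary bus lines able to address n_endpoints distinct ports."""
    if n_endpoints < 1:
        raise ValueError("need at least one endpoint")
    # ceil(log2(n)) computed with integer arithmetic to avoid rounding issues.
    return (n_endpoints - 1).bit_length()

if __name__ == "__main__":
    n_nodes = 64 * 32
    k_out = 2 * n_nodes  # assumed: two output ports per binary node
    print(address_lines(k_out))
```

Integer `bit_length` is used instead of `math.log2` so that exact powers of two are handled without floating-point error.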
When an n-valued node/variable receives an event on port i, the 1s in the binary representation of i denote the allowable internal states that the variable can take. The variable has n possible internal states and an event on one of the $2^n - 1$ input ports can thus decide which non-empty subset of these states is allowed. If multiple states are allowed, the variable stays at its current state if the current state is one of the allowed states; otherwise it goes to the lowest-index allowed state. Let $i(p)$ be the p-th bit of i, where indexing starts at 1 (the least significant bit). For a node in state s receiving an event on port $i \geq 1$, the state update function f is thus: $$f(s, i) = \begin{cases} s & \text{if } i(s) = 1 \\ \min\{p : i(p) = 1\} & \text{otherwise} \end{cases}$$ The node/variable generates an event only when it receives an event on port 0. The event is generated on the port corresponding to the current state. The event routing function g is: $$g(s, i) = \begin{cases} s & \text{if } i = 0 \\ 0 & \text{otherwise} \end{cases}$$ where output 0 is the dummy port (no event). The analog oscillator in each variable is realized using an integrate and fire neuron [21] receiving constant current injection. As shown in Fig. 7c, the oscillator frequencies are significantly different due to transistor mismatch. Since the oscillation frequencies are real numbers drawn from a probability distribution arising from the variability inherent in the fabrication process, the probability that one oscillator's frequency is a rational multiple of another's is zero.
[17] send to each other will not be able to move any node out of these 2 states. We use the more elaborate scheme shown in Fig. 9a where two 4-valued chip nodes are used to implement one 4-valued graph vertex. The value of this graph vertex is the index of the last event emitted by the 'main' chip node. Pair-wise inequality constraints are implemented by routing events from the i-exclude output port of one vertex to the i-exclude input port of the other vertex. Assume a vertex has value 1, i.e., the states of the 'main' and 'helper' chip nodes are 1 and 4, respectively. The state/color of this vertex will only change if it receives an event on the 1-exclude port. In that case, the 'main' and 'helper' chip nodes go to states 2 and 1 respectively since these are the lowest-index allowed states in the two chip nodes. The two chip nodes now have inconsistent states and whichever of them generates an event first forces the other node to switch its state; for example, if the 'helper' node generates an event first, it forces the 'main' node to take state 4. A 1-exclude input event effectively has a 50% chance of moving this graph node to state 2 and a 50% chance of moving it to state 4 due to the irregular phase relations.
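The chip node's port semantics can be sketched in a few lines, following the verbal description: the set bits of the port index mark the allowed states, bit 1 being the least significant bit. The function names are hypothetical, and leaving the state unchanged on a port-0 event is an assumption (the text only says that port 0 triggers an output event).

```python
def update_state(state: int, port: int) -> int:
    """New internal state (1-based) after an event on input port `port`."""
    if port == 0:
        return state                      # assumed: oscillator event keeps the state
    if (port >> (state - 1)) & 1:
        return state                      # current state is among the allowed states
    # Otherwise jump to the lowest-index allowed state (lowest set bit of port).
    return (port & -port).bit_length()

def route_event(state: int, port: int) -> int:
    """Output port for an input event; 0 is the dummy port (no event emitted)."""
    return state if port == 0 else 0

if __name__ == "__main__":
    # 1-exclude event on a vertex with value 1: 'main' is in state 1, 'helper' in state 4.
    print(update_state(1, 0b1110))  # main chip node moves to state 2
    print(update_state(4, 0b0111))  # helper chip node moves to state 1
```

The two printed cases reproduce the 1-exclude example in the text, where the 'main' and 'helper' chip nodes go to states 2 and 1 respectively.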
The scheme can be extended to 6- and 8-valued vertices by using three 6-valued and four 8-valued chip nodes respectively to represent a single graph vertex, and it is straightforward to show that using this scheme, the network representing the coloring graph always uses all available colors. 3-, 5-, and 7-coloring problems can be implemented by adjusting the even color schemes so that events are routed to input ports that exclude both the color/index of the source output port and the highest index/color, which will then be unused.
One difficult graph for this architecture is the '5 × 5 queen' graph whose solution is equivalent to finding the non-interfering positions of 5 queens on a 5 × 5 chess board. The average number of cycles needed to find a solution is shown in Fig. 9b.
Figure 9. (a) Implementation of a 4-valued graph vertex using two 4-valued chip nodes which are coupled so that an event from port 1, 2, 3, or 4 of one chip node puts the other node in state 4, 3, 2, or 1 respectively. This vertex receives events from other vertices which go to the exclude input ports of the two chip nodes (red dashed lines). For example, an event arriving on the 1-exclude input port goes to port 14 (binary '1110') on the 'main' chip node and port 7 (binary '0111') on the 'helper' chip node. (b) Histogram of the number of oscillation cycles (averaged per variable) needed by the chip to find the optimal coloring of the 5 × 5 queen graph.

Discussion and Conclusions
CSPs have often been examined through the lens of statistical physics [22,23]. Within the framework of statistical physics, a CSP is formulated as a distributed system that seeks to minimize the number of frustrated interactions (violated constraints) between its elements. Direct analogies can be established between the ground energy states of physical systems (where frustrated interactions are at a minimum) and solutions to CSPs [24]. The architecture we describe in this paper is fundamentally different from the systems analyzed in the framework of statistical physics, yet it captures some of the general features of such systems: the architecture makes use of a large number of locally interacting elements that mutually constrain each other so that the system as a whole tries to go to states where the number of frustrated interactions is at a minimum. Perhaps the most distinguishing feature of our system is the mechanism used to explore the solution space. In lieu of random fluctuations, the continuously varying phase relations between incommensurable oscillators are a source of non-repeating fluctuations that can be easily exploited in our event-based architecture to realize efficient search algorithms. True or pseudo-random number generators require significant hardware resources, in terms of power and silicon area. The paradigm described in this paper is more efficient, as only mismatched oscillator circuits are needed. The transistor mismatch inherent in VLSI electronic circuits ensures that the fabricated oscillator circuit frequencies are incommensurable. While circuit designers typically struggle to minimize the effects of device mismatch by using larger devices, in our case mismatch is a beneficial property whose effects should be preserved, thereby simplifying the implementation of the oscillators/variables and minimizing their area on silicon.
The digital event-based nature of variable communication is key to the architecture's scalability and configurability. These digital pulses can be transmitted and routed using a digital fabric that links together a large number of nodes/variables. In the prototype chip, event routing is done off-chip in a serial manner on the FPGA. This introduces a serial bottleneck in the otherwise massively parallel operation of the architecture. However, configurable and parallel AER routing fabrics are already in use in large-scale neuromorphic systems [25,26] and could be directly adapted for use in an implementation of the described architecture. As shown when solving SAT problems, the architecture is robust to event delays and lost events, which relaxes the requirements on the event routing fabric.
In simulation, we showed for the case of SAT problems that the proposed architecture can run at a surprisingly slow mean oscillation frequency (around 0.2-2 MHz) and still attain a time to solution comparable to that of a CPU running at a clock rate three orders of magnitude higher. The simple logic operations in the constraint and literal nodes can certainly run at such slow frequencies. These results indicate that the proposed architecture is a far more efficient approach to solving SAT problems than conventional CPUs.
Algorithms for solving CSPs are often conceived with the digital von Neumann model of computation in mind. The results presented in this paper highlight an alternative approach which starts with no prior assumptions about the computational model, and seeks to exploit the physical characteristics of the underlying substrate in order to find a solution tailored to the computational problem at hand. In our case, we exploited the natural incommensurability of physical analog oscillators to derive a novel distributed algorithm for solving CSPs. This algorithm naturally results in an efficient implementation in the physical substrate that underlay its derivation. The computing architectures developed using this bottom-up approach, such as the VLSI device we present in this paper, have the potential to achieve considerable performance gains on their target problems compared to conventional purely digital approaches.

Figure 2. Network corresponding to the example SAT problem C1 ∧ C2 where C1 = (L1 ∨ L2 ∨ ¬L3) and C2 = (L2 ∨ L3 ∨ L4). For the constraints C1 and C2, the squares at the edge of the box indicate input ports (purple) and output ports (red). Events are routed along the arrows. Input/output behavior of both types of nodes is described in the main text.

Figure 3. Performance comparison of network and standard probSAT with exponential heuristic function (x = 1, y = 2.06). Note that the histograms have logarithmically spaced and sized bins. The best performing algorithm in both metrics is the network with lost and delayed events. In plot 3b, the distribution looks bimodal because for very small numbers of cycles to solution, this number could not be accurately measured (network convergence is checked every 20 cycles).

Figure 4. Performance comparison of network and standard probSAT with exponential heuristic function (x = 1, y = 2.06) on different problem sizes with 100 trials per problem. The red line indicates the median, the box outlines the 1st and 3rd quartiles, and the whiskers show the full range of the data. In plot 4c, the ideal network did not converge in 8 of 100 cases within the given 1.5 × 10^8 cycles. Sequential and network formulations scale similarly well with problem size.

Figure 5. Network corresponding to the 3-coloring of the graph V = {V1, V2}, E = {(V1, V2)}. The squares at the edge of the box indicate input ports (purple) and output ports (red). Events are routed along the arrows.

Figure 6. (a, b, c) TSP with 6 cities. Red and green arrows indicate inputs to ports 1 and 2 respectively. Numbers next to black arrows denote target input ports. Only the routing of the events of edge nodes (1, 2) and (4, 3) and of the 'tour completion' node is shown. A green node is a node that has just generated an event, red nodes are in state 1 (inactive), and yellow nodes are in state 3 (active). Panels a, b, and c show the network state after each of 3 successive events. (d) Frequency of occurrence of each tour in 1.5 × 10^7 tours generated by a network implementing a six-city TSP, as a function of the tour distance.

Figure 7. (a) Layout of the minimum-size (2 mm × 3 mm) prototype chip fabricated using a 180 nm process that implements the architecture described in this paper. The 64 × 32 node array in the middle is surrounded on three sides by the digital asynchronous AER interfaces. An externally programmable bias generation block generates the analog biases needed by the analog oscillators. (b) An off-chip event router implemented on an FPGA communicates with the chip AER interfaces to route events from output ports to input ports. (c) Frequency distribution of the 2048 analog oscillators on-chip for the bias conditions used in the experiments in this paper.

Figure 8. (a) Network showing the implementation of C1 ∧ C2 where C1 = (L1 ∨ L2 ∨ ¬L3) and C2 = (L2 ∨ L3 ∨ L4). Numbered squares indicate the output ports and, indirectly, the input ports of a variable, and arrows indicate how events are routed. For example, events from port 1 of L1 go to input port 9 ('1001' in binary) of C1, which instructs C1 to go to state 4 or state 1. Events from port 2 of L1 go to port 8 of C1, which instructs C1 to go to state 4. (b) Histogram of the number of oscillation cycles (averaged per variable) needed by the chip to find the solution of a 3-SAT problem with 50 variables and 218 clauses over 1000 trials taken from [17].

Figure 9. (a) Implementation of a 4-valued graph vertex using two 4-valued chip nodes which are coupled so that an event from port 1, 2, 3, or 4 of one chip node puts the other node in state 4, 3, 2, or 1 respectively. This vertex receives events from other vertices which go to the exclude input ports of the two chip nodes (red dashed lines). For example, an event arriving on the 1-exclude input port goes to port 14 (binary '1110') on the 'main' chip node and port 7 (binary '0111') on the 'helper' chip node. (b) Histogram of the number of oscillation cycles (averaged per variable) needed by the chip to find the optimal coloring of the 5 × 5 queen graph.

Table 1. Number of cycles to convergence on common k-coloring benchmarks.