Uncertain behaviours of integrated circuits improve computational performance

Improvements to the performance of conventional computers have mainly been achieved through semiconductor scaling; however, scaling is reaching its limits. Natural phenomena, such as quantum superposition and stochastic resonance, have been introduced into new computing paradigms to improve performance beyond these limits. Here, we show that the uncertain behaviours of devices that arise from semiconductor scaling can improve the performance of computers. We prototyped an integrated circuit that performs a ground-state search of the Ising model. Bit errors in the memory cells holding the current state of the search occur probabilistically when fluctuations are inserted into the dynamic device characteristics; such fluctuations are expected to appear in chips in the future. As a result, we observed better solution accuracy than without fluctuations. Although conventional devices have been designed to eliminate uncertain behaviours, we demonstrate that uncertain behaviours have become the key to improving computational performance.

where T_i is the temperature at the i-th step and P is the probability function. The state transition to the neighbouring state in Eq. (2) is accepted with probability P. When the neighbouring state has lower energy, the state transition is always accepted, just as in the local search in Eq. (1). However, simulated annealing also accepts a transition to a higher-energy state, with a probability that depends on the energy difference and the current temperature T_i. The temperature is scheduled by T_{i+1} = β T_i, where β is the cooling coefficient (0 < β < 1), so the temperature decreases exponentially from a given initial temperature T_0.
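The acceptance rule and cooling schedule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the Metropolis form exp(−ΔE/T) is assumed for the probability function P, the cooling is written in the closed form T_i = T_0 β^i, and all function names are hypothetical.

```python
import math
import random


def accept_probability(delta_e, temperature):
    """Assumed Metropolis form of P: 1 for downhill moves, exp(-dE/T) otherwise."""
    if delta_e <= 0:
        return 1.0
    if temperature <= 0:
        return 0.0  # at zero temperature no uphill move is accepted
    return math.exp(-delta_e / temperature)


def temperature_at(step, t0, beta):
    """Geometric cooling T_i = T_0 * beta**i, with 0 < beta < 1."""
    return t0 * beta ** step


def anneal_step(energy_current, energy_neighbour, step, t0, beta, rng=random):
    """Return True if the move to the neighbouring state is accepted at this step."""
    t = temperature_at(step, t0, beta)
    delta_e = energy_neighbour - energy_current
    return rng.random() < accept_probability(delta_e, t)
```

The call to `rng.random()` is where the pseudo-random number generator enters the algorithm.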
A pseudo-random number generator is needed to achieve probabilistic behaviour as part of the algorithm. Threshold accepting 26,27, which eliminates the necessity for randomness from simulated annealing, can be described by: where Th is the threshold that takes the role of temperature T in simulated annealing. Threshold accepting still needs the energy difference between the current and neighbouring states, E(s'_i) − E(s_i), to be calculated.
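For comparison, a threshold-accepting decision can be sketched as follows. The rule used here (accept when the energy increase is below the threshold Th) is the commonly used form and is an assumption about the equation referred to above; the function name is hypothetical.

```python
def threshold_accept(energy_current, energy_neighbour, threshold):
    """Deterministic acceptance: move if the energy increase is below the threshold.

    No random number is drawn, but the energy difference is still required.
    """
    return (energy_neighbour - energy_current) < threshold
```

As the threshold is lowered towards zero, this rule approaches plain local search.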
Here, we propose a method for the optimisation problem that does not need the energy difference to be calculated, where the bits representing the next state, s_{i+1}, are randomly flipped in addition to the local search in Eq. (1). The probability of flipping the bits is described by: where P is the probability function and T_i is the temperature at the i-th step, as shown in Eq. (4). The behaviour of Eq. (6) is independent of the local search, and it can be implemented as intentionally induced errors in the memory that stores state s_{i+1}.
Ground-state search of Ising model. We applied the proposed method to the ground-state search of the Ising model. The N-spin Ising model can be described by the energy function: where s = {σ_1, σ_2, …, σ_N} is a state of the Ising model called a spin configuration, σ_i ∈ {+1, −1} is a spin, J_ij is an interaction coefficient between the i-th and j-th spins, and h_i is an external magnetic field for the i-th spin. The goal of the ground-state search is to find a spin configuration that minimises the energy function.
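A small Python sketch may help to make the energy function concrete. It assumes the conventional sign convention E(s) = −Σ J_ij σ_i σ_j − Σ h_i σ_i, with each interacting pair counted once; the data layout (a dictionary of pairs) and the function name are hypothetical choices for illustration.

```python
def ising_energy(spins, couplings, fields=None):
    """Energy of a spin configuration under the assumed sign convention.

    spins     : list of +1 / -1 values
    couplings : dict mapping a pair (i, j) to J_ij, each pair listed once
    fields    : optional list of external fields h_i
    """
    energy = 0.0
    for (i, j), j_ij in couplings.items():
        energy -= j_ij * spins[i] * spins[j]
    if fields is not None:
        for i, h_i in enumerate(fields):
            energy -= h_i * spins[i]
    return energy
```

Under this convention, a positive J_ij lowers the energy when the two spins are aligned, and a positive h_i favours σ_i = +1.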
The local search of the Ising model can be achieved by using the nearest-neighbour interactions of individual spins. Each i-th spin is pulled in the direction of +1 or −1 by the force of the interaction with its nearest-neighbour spins and the external magnetic field. The next state of the i-th spin that minimises the local energy within the neighbourhood can be determined by:
Ising models of the kind shown in Fig. 1(a), whose ground-state search is an NP-hard problem, were implemented as a single silicon chip called an Ising chip 28, which is shown in Fig. 2(a). The chip was constructed as a repetition of a unit element called a "spin unit" to enable scalability. A spin and its accompanying coefficients are grouped into a spin unit, as outlined in Fig. 1(b). The three-dimensional lattice topology was mapped onto the two-dimensional array of spin units shown in Fig. 2(b). Each spin unit had a memory cell array to represent a spin and the coefficients, as outlined in Fig. 1(c). The next state of the spin was determined by a digital logic gate and an analog majority decision circuit according to Eqs. (8) and (9). The memory cells could be accessed via the bit lines and word lines outlined in Fig. 1(d) from outside the Ising chip in the same way as in static random access memory (SRAM). Unlike a conventional memory chip, the Ising chip had inter-spin-unit connections for local search and random bit flipping. The spin units were connected as outlined in Fig. 1(e) according to the topology of the Ising model. The connections transferred the values of spins from nearest-neighbour spin units. All spin units were connected to the wires outlined in Fig. 1(f) that distributed random pulse signals. The random pulses emulated the uncertain behaviours of semiconductors in future processes. The memory cells representing the spins changed randomly under the influence of the random pulses.
The random pulses were injected from outside the Ising chip and the ratio of high and low was controlled to satisfy the probability of bit flipping shown in Eq. (6).
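The combination of local search and random bit flipping can be illustrated in software. The sketch below is a simplified model under several assumptions, not a description of the circuit: each spin takes the sign of its local field (one plausible reading of the update rule), a spin with zero local field keeps its value, spins are updated one after another, and every spin is flipped independently with the given probability. All names are hypothetical.

```python
import random


def local_field(i, spins, neighbours, fields):
    """Sum of J_ij * s_j over the neighbours of spin i, plus the external field h_i."""
    return sum(j_ij * spins[j] for j, j_ij in neighbours[i]) + fields[i]


def search_step(spins, neighbours, fields, flip_probability, rng):
    """One step: sequential local update of every spin, then random flips."""
    spins = list(spins)  # work on a copy so the caller's state is not modified
    for i in range(len(spins)):
        f = local_field(i, spins, neighbours, fields)
        if f > 0:
            spins[i] = 1
        elif f < 0:
            spins[i] = -1
        # f == 0: keep the current value (assumed tie rule)
    for i in range(len(spins)):
        if rng.random() < flip_probability:
            spins[i] = -spins[i]  # emulated memory bit error
    return spins
```

Here `neighbours[i]` is a list of `(j, J_ij)` pairs. Note that no energy difference is computed anywhere in the step; the random flips act on the stored state only.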
Compared with a conventional computer, the Ising chip has several significant differences. A conventional computer is controlled by a program, which is a sequence of instructions for the central processing unit. All programs and data are stored in the main memory, and the central processing unit reads and writes the main memory. The interconnection between the memory and the central processing unit causes a performance bottleneck and consumes power. Unlike a conventional computer, the Ising chip is not controlled by a program. The Ising chip is a kind of analog computer whose behaviour is defined by its hardware properties. All data, including spins and coefficients, are placed as close as possible to where they are used in computation. The hardware structure corresponds to the spatial structure of the problem. These features lead to a simple, small hardware implementation with low power consumption (49.2 mW for the inter-spin interactions).
Scientific Reports | 5:16213 | DOI: 10.1038/srep16213
The results obtained from the ground-state search of Ising models by using the Ising chip are presented in Figs 3 and 4. The Ising models were generated randomly with various problem sizes under the restrictions of the specifications for the Ising chip. The topology of an Ising model is a three-dimensional lattice with free boundary conditions, as was previously described. This means that spins have interaction coefficients in a lattice pattern. Each coefficient is randomly chosen from two possible values: +1 and −1. The ratio of coefficients, r, is also varied from r = 0 (all coefficients are −1) to r = 1 (all coefficients are +1).
Figure 3 plots the process of the ground-state search by using the Ising chip with and without random pulses for random bit flipping. One step corresponds to 100 ns of operation of the Ising chip. All experiments in Fig. 3 used the same problem, which had 20-k spins and r = 0.5. With local search only, the state fell into a local optimum (at energy −28070) in the first several steps and the solution never improved, as plotted in Fig. 3(a). To further improve the solution, memory cells representing spins were randomly flipped by injecting random pulses; the probability of random flips is plotted in Fig. 3(b). By applying both local search and random flips, the solution improved beyond the previous solution, as plotted in Fig. 3(c). The randomness helped the state to escape from the local optimum and reach a better solution that was not possible with local search only.
Figure 4 plots performance evaluations of the Ising chip with various problem sizes. The ratio of coefficients is r = 0.5 in all the experiments in Fig. 4, which is the same as in the experiments in Fig. 3. Performance has two main aspects: accuracy and time. The time needed to solve the problem should be evaluated under conditions that achieve at least the same accuracy.
The Ising chip was evaluated with various numbers of steps: 10^2, 10^3, 10^4, 10^5 and 10^6. The number of steps is equivalent to the time to solve the problem, and the Ising chip has a tradeoff between the number of steps (time) and accuracy. Figure 4(a) plots the accuracy of the solution for various problem sizes. The same problem was solved by using ten Ising chips and the best solution was selected. We evaluated the accuracy of the solution by comparing it with three methods on conventional computers: the Spin Glass Server 29, SG3 30 and simulated annealing. The Spin Glass Server is a cloud service that computes the exact ground-state of the Ising model, that is, the global optimum. The exact ground-state from the Spin Glass Server indicates an upper bound for accuracy. However, the problem size is limited to 512 spins. SG3 is a greedy algorithm for the maximum cut problem of a graph and it is relatively fast because its computation time is almost proportional to the number of spins (or nodes in the graph). The maximum cut problem is essentially equivalent to the ground-state search of the Ising model without an external magnetic field, as was previously explained. SG3 can provide an approximate ground-state even for larger problem sizes. We used a highly optimised implementation of simulated annealing for the Ising model called optimised simulated annealing 31. Optimised simulated annealing needs some parameters, as does the Ising chip. We used two parameter sets, referred to as SA (Speedy) and SA (Accurately). SA (Speedy) is a configuration that emphasises a computing time comparable to that of the Ising chip. SA (Accurately) spends as much computing time as was possible in the experiment to emphasise solution accuracy.
The quality of the solution can be measured by the energy that is calculated with the energy function. However, it is difficult to compare the quality across various problem sizes by using energy as a metric because the energy of the global optimum solution differs for each problem. Therefore, we defined the relative energy as a metric for comparison, given by E(s) / E(s_SG3), where s is the solution derived from the method for comparison, and s_SG3 is the solution to the same problem derived from the SG3 algorithm. The method for comparison achieves better accuracy than SG3 when the relative energy is greater than one. Figure 4(b) plots the computing times for the methods we described. The accuracy of the Ising chip depends on both the number of steps and the problem size, as indicated in Fig. 4(a). The red dotted line (SG3 comparable) plots the computing time needed to achieve accuracy equal to or better than that of the SG3 algorithm. The number of steps for the ground-state search by the Ising chip was chosen appropriately for each problem size.
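The relative energy can be computed as in the following sketch, which assumes that the metric is the ratio of the energy reached by the method under comparison to the energy reached by SG3. Because the energies of good solutions are negative, a ratio above one then means a lower (better) energy than SG3. The function name and the numbers in the usage note are hypothetical.

```python
def relative_energy(energy_method, energy_sg3):
    """Ratio of the method's energy to the SG3 energy (assumed definition)."""
    if energy_sg3 == 0:
        raise ValueError("reference energy must be non-zero")
    return energy_method / energy_sg3
```

For instance, with illustrative energies of −110 for a method and −100 for SG3, the relative energy is about 1.1, so the method is the more accurate of the two.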
The accuracy of the ground-state search depends on the number of steps. Using multiple Ising chips also improves accuracy, as was previously explained, and ten Ising chips were used in the experiments. Figure 5 shows the accuracy achieved for combinations of the number of steps and the number of chips. This experiment used the 20-k spin problem with r = 0.5.

Memory error under voltage control. The uncertain behaviours of miniaturised semiconductor devices in the future will be a serious problem for the conventional computer architecture. However, we demonstrated that the uncertain behaviours of transistors can be used as a source of randomness in each spin unit. The memory cells in an Ising chip represent spins and coefficients. The bit error rate (BER) of the memory cells in an SRAM varies with the supply voltage 32. A sufficiently high voltage (~1 V) is supplied to the SRAM in conventional computers to keep the BER low and the memory accurate. We set a lower voltage (~0.7 V) to intentionally induce memory errors in the SRAM. Furthermore, a memory read operation, called a dummy read, was executed during the low-voltage period to raise the bit error rate above that obtained without a read operation.
Figure 6 plots the randomness of memory cell values under voltage control and dummy read operation. We controlled the power supply voltage of the memory cells that represented the values of spins. A memory cell value of zero represented spin state −1, and a memory cell value of one represented spin state +1. For both initial values, zero and one, we can observe a spatially random pattern of memory cells that depends on the voltage. The dummy read operation accelerates the occurrence of bit errors, so that bit errors occur at a relatively high voltage that is easier to control.
Figure 7 plots the ground-state search of the Ising model by using the previously mentioned voltage control scheme. Figure 7(a) presents the schedule of voltage control for the power supply to the memory cells that represent the values of spins. This schedule corresponds to the probability of spin flips plotted in Fig. 7(b). Figure 7(b) plots the process of the ground-state search with this methodology. The quality (energy) of the answer is better than that with local search plotted in Fig. 3(a), but worse than that obtained by using the random pulses shown in Fig. 3(c).
The cause of this phenomenon is that the randomness of memory cells is mainly dominated by static properties in the 65-nm node. Figure 8 plots the ground-state search of various Ising models by using the Ising chip and the previous algorithms. The ratio of coefficients is varied from r = 0 to r = 1.0. All models are 20-k spin models. SA (Speedy) and the voltage-controlled Ising chip have similar solution accuracy.
The time-independent fluctuations of transistors in a semiconductor chip, which are called mismatch properties, have been studied and their causes have been analysed 33. One cause of mismatch is random dopant fluctuations (RDFs), which affect the threshold voltage of transistors 34. The variations in threshold voltage are spatially random but temporally permanent. The fluctuations due to RDF increase with process miniaturisation and they determine the limits of scaling 35. From the viewpoint of SRAMs as memories in computers, the effect of RDF should be suppressed under normal operating conditions, and this can be done by optimising the device structure in current processes including the 65-nm node 36. From another viewpoint, the time-independent spatial randomness from RDF is used as a fingerprint to identify individual semiconductor chips 37-39. However, a few memory cells have time-dependent random behaviours and these are obstacles to fingerprints.
The temporal random behaviours of transistors and memory cells occur due to random telegraph noise (RTN) 40-42. The influence of RTN has been increasing with device scaling and its growth is faster than that of RDF 43,44. The impact of RTN is expected to be more dominant than that of RDF in the 15-nm node. Temporal random behaviours of memory cells in SRAMs have been observed 45,46 and bit errors have varied both in space and time. RTN is an obstacle to conventional usage. However, we expect that temporal variations in memory cell behaviours will help to search for better states in ground-state searches.

Discussion
In this study, we examined the possibility of using fluctuations in device characteristics as computational resources by carrying out experiments on real integrated circuits. We chose an optimisation problem, specifically the ground-state search of the Ising model, as an example for the proof of concept. The randomness inherent in current devices was tested, but its effect was insufficient for the ground-state search because temporally static behaviours are dominant in current devices. Emulating the temporally dynamic behaviours expected in the future can achieve significant results, comparable to those of a well-known greedy algorithm on conventional computers. We have proposed the use of random telegraph noise (RTN) as a source of randomness. However, the characteristics of RTN are still being investigated. The time constant of RTN and its controllability are the main problems in applying the effect of RTN to information processing.

Methods
Detailed structure of chip in experiment. See Yamaoka et al. 28 for the detailed structure of the chip. The chip was operated at 10 MHz clock frequency for the interactions.
Generating problem for experiment. We generated various Ising models for the experiment. All problems were three-dimensional lattice Ising models without external magnetic coefficients. The problems varied in two aspects: problem size and the ratio of interaction coefficients.
Random pulse injected to chip. Two random pulse signals were injected into the chip. The signals were generated by a pseudo-random number generator at three times the clock rate used to operate the chip. The random pulse signals had high-level and low-level periods. The product of the two signals was used in each spin unit inside the chip. A spin was flipped when the product of the two signals was high-level. The probability of a high-level period occurring was defined as the mark rate. The scheduled spin flip probability plotted in Fig. 3(b) was achieved by dynamically changing the mark rate. Mark rate p(t) at time t is defined as: where N is the number of steps to solve the Ising model, p_initial is the initial mark rate, and p_final is the final mark rate at step N. After that, we added a period of 1000 steps with a mark rate of zero to stabilise the solution. p_initial was 0.75 and p_final was 0.01 for all the experiments discussed in the paper. The random pulse signals were generated by comparing the mark rate with random numbers. The random numbers were generated by a pseudo-random number generator 47. The random signal is high-level when r < p(t), where r is the output from the pseudo-random number generator.
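The pulse generation described above can be sketched in Python as follows. Several points are assumptions made for illustration only: the mark-rate schedule is passed in as a value, and the linear interpolation shown is a hypothetical placeholder rather than the formula used in the experiments; each spin is modelled as seeing independent samples of the two signals; and the two signals are allowed to have different mark rates. All names are hypothetical.

```python
import random


def placeholder_mark_rate(step, n_steps, p_initial=0.75, p_final=0.01):
    """Hypothetical linear schedule from p_initial to p_final (placeholder only).

    After step n_steps the mark rate is zero (stabilisation period).
    """
    if step > n_steps:
        return 0.0
    return p_initial + (p_final - p_initial) * step / n_steps


def pulse(mark_rate, rng):
    """Signal is high-level (True) when the PRNG output r satisfies r < mark_rate."""
    return rng.random() < mark_rate


def flip_mask(n_spins, mark_rate_a, mark_rate_b, rng):
    """A spin flips when the product (logical AND) of the two signals is high."""
    mask = []
    for _ in range(n_spins):
        a = pulse(mark_rate_a, rng)
        b = pulse(mark_rate_b, rng)
        mask.append(a and b)
    return mask
```

Under the independence assumption, the flip probability of a spin is the product of the two mark rates, which is why the flip probability is lower than either mark rate alone.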
Initial spin values. All the initial spin values were − 1 in all the experiments described in this paper to align the experimental conditions.
Previous algorithms for performance evaluation. In this paper, three implementations of previous algorithms, which run on conventional computers, were used for the performance evaluation: Spin Glass Server 29, SG3 30 and optimized simulated annealing 31. These implementations are available at refs 29, 48 and 49, respectively. SG3 30,48 was executed on a typical personal computer (Intel Core i5 1.87 GHz, 8 GB memory, Windows 7). This is sufficient for SG3 because the algorithm uses only a single core. Optimized simulated annealing 31,49 was executed on a high-performance server (Hitachi HA8000/RS440, Intel Xeon E7-4807 1.87 GHz × 4 sockets, 256 GB memory, CentOS 6.5). We used the "an_ms_r1_nf_omp" program of the optimized simulated annealing because it is highly optimized for the problems in our experiment. We assumed two experimental conditions: SA (Speedy) and SA (Accurately). The Speedy condition is 10 sweeps and 10 repetitions (the -s 10 -r 10 options are used). The Accurately condition is 10^4 sweeps and 10^3 repetitions (the -s 10000 -r 1000 options are used). Sweeps is the number of times each spin is updated in one annealing process. Multiple annealing processes, whose multiplicity is defined by the repetitions parameter, are executed and the best solution is chosen. The "an_ms_r1_nf_omp" program exploits the parallelism in a 64-bit integer. Therefore, the actual multiplicity is multiplied by 64.