Introduction

Ising Machines, as dynamical systems, have recently shown promise for accelerating computationally challenging problems in combinatorial optimization. The intrinsic energy minimization in the highly interconnected system gives rise to rich spatio-temporal properties, which can subsequently be mapped to the solutions of many computationally intractable optimization problems1,2. However, the highly interconnected nature of the system also poses a significant implementation and scalability challenge for Ising platforms. In fact, the number of coupling elements (representing edges) required for mapping an arbitrary graph scales up quadratically (~ N2) with the number of nodes in the graph. Consequently, scaling the system to large sizes continues to be a significant challenge for most Ising machine designs. Our approach to addressing this challenge relies on developing novel hardware components that are not only compact but can also leverage the maturity of CMOS-process technology and integration.

There is a full pallet of hardware technologies3,4,5,6,7,8,9,10,11,12,13,14,15,16, each with their advantages and shortcomings, that have been considered for implementing Ising machines. Quantum annealing approaches17,18,19,20,21 using qubits offer the possibility of exponential speed up (by overcoming the fundamental hardness of the problem) but require cryogenic cooling. Besides cost, this requirement also restricts the type of applications where this approach would be practical. At the classical end, Ising machine implementations can be classified into the optical domain—using optoelectronic oscillators to design Coherent Ising Machines (CiM)22,23,24,25, and the electronic domain using a variety of classical spin implementations. CiMs offer advantages such as speed as well as a relatively broad dynamic range for the implementation of weights26. However, such implementations have traditionally been bulky, requiring long optical fibers, although there is some recent work on monolithic integration27,28,29.

Electronic Ising machines have relied on the following approaches: iterative annealing in memory (AIM)30,31, that digitally emulates the Ising model32,33 and the actual implementation of the dynamical system34,35,36,37,38. While the former approach essentially minimizes the Ising energy using a heuristic iterative approach, the latter method (relevant to this work) uses the hardware as classical spins and maps the energy minimization in the hardware directly for minimization of the Ising Hamiltonian. Various devices such as oscillators39,40,41,42,43,44,45,46 and ZIV diodes47 have been experimentally demonstrated as classical Ising spins. More recently, CMOS-based (bi-stable) latches have also been theoretically shown to behave as Ising spins as well48. Here, we experimentally demonstrate CMOS-latches as highly scalable and compact Ising spins. Additionally, in all of the electronic designs, the implementation of the coupling network continues to be a significant challenge for scaling. To address this, we propose to exploit the non-volatile behavior of CMOS-compatible FeFET memory arrays (in fact, the FeFETs used in this work are built in 28 nm high-κ metal gate technology platform) to implement the interaction among the spins (CMOS latches). Consequently, our work enables a pathway to a compact Ising platform (Fig. 1) that is positioned to exploit the maturity of CMOS process technology to realize a scalable solution.

Figure 1
figure 1

Overview of bi-stable latch based ising machine hardware. Proposed design for the Ising machine using CMOS latches (cross-coupled inverters) as artificial Ising spins. The interaction among the spins is implemented using a CMOS-process compatible FeFET based array.

Results

CMOS latch as a classical spin

We first focus on experimentally evaluating the behavior of CMOS latches as classical spins with simple resistive coupling. The theoretical foundation for the latch-based Ising machine was elegantly formulated by J. Roychowdhury48 wherein the energy function for a resistively coupled system of latches was shown to map to the Ising Hamiltonian. The Ising Hamiltonian is given by \({\text{H}}=-\sum_{\mathrm{i},\mathrm{j}}^{\mathrm{N}}{\mathrm{J}}_{\mathrm{ij}}{\mathrm{s}}_{\mathrm{i}}{\mathrm{s}}_{\mathrm{j}}\), where \({\mathrm{s}}_{\mathrm{i}}\) {± 1} corresponds to the \(\mathrm{ith}\) spin, and \({\mathrm{J}}_{\mathrm{ij}}\) is the interaction coefficient between nodes \(\mathrm{i}\) and \(\mathrm{j}\). Figure 2 shows the experimentally observed behavior of a system of two latches with positive (Jij = + 1) and negative (Jij = -1) coupling over 100 runs; the details of the setup used in the experiments are discussed in supplementary note 1. It can be observed that the latches settle to the same voltage output level [i.e., the read terminals have the same outputs (0, 0) or (VDD, VDD)], when positively coupled, and opposite voltage levels [i.e., the read terminals have opposite outputs (VDD, 0) or (0, VDD)], when negatively coupled. We note, however, that the exact output (i.e., whether Vout1 or Vout2 settles to VDD or 0) shows statistical behavior, as expected. Moreover, the skew in the statistical behavior likely arises from the mismatch in the coupling resistances; although the correct ground state is attained, one configuration of the ground state is preferred over the other one.

Figure 2
figure 2

Coupling of two CMOS latch-based artificial Ising spins. Negative (Jij = − 1) and positive coupling (Jij = + 1) among the latches. The latch outputs (at the read terminal) always settle to the opposite (same) polarity when the negatively (positively) coupled, respectively, when the system is powered up. It is noted though that the exact output [i.e., whether Vout1 or Vout2 settle to 1 (= VDD) or 0 (= 0 V)] shows probabilistic behavior.

Building on this coupled two-latch spin system, we evaluate the dynamics of a system of four negatively coupled latches as shown in Fig. 3a–c. Our choice of negative coupling is motivated by the fact that the dynamics of such an Ising network can be directly mapped to computing the Maximum Cut (MaxCut) of a topologically equivalent graph-the MaxCut of a graph is defined as the challenge of dividing the nodes of a graph into two sets such that the number of common edges is maximized (unweighted graphs are considered here).

Figure 3
figure 3

Solving MaxCut using latch based ising machine. (a) A representative 4 node graph problem. (b) Time domain outputs of the negative coupled (Jij = − 1) CMOS latches for the input graph. (c) MaxCut solution measured using the latch outputs in (b). (d) Effect of coupling resistance on the solution quality. (e) A representative network of 10 spins with randomly generated interactions (represented by edges). (f) Experimentally observed MaxCut solution over 150 trials, illustrating the statistical behavior of the Ising machine. (g) Distribution showing occupied solution states (represented by cut values) and their frequency (orange) compared to the complete solution space (grey) for the problem. (h) Experimentally measured MaxCut solutions measured for multiple graphs of various size from 4 to 10 nodes. Each graph is measured 10 times, and the measured solution is represented by its distance d from the optimal solution.

When powered up, the latch outputs [L1 = L3 = VDD (+ 1); L2 = L4 = 0 V (− 1)] correspond to the two sets created by the MaxCut (Fig. 3b) and yield an optimal MaxCut solution of 4 (Fig. 3c). Furthermore, to rule out the possibility that the output states of the latches are resulting from any inherent asymmetry among them, we map the nodes of the graph in Fig. 3 to different physical latches. The results, discussed in supplementary note 2, show that optimal solutions are observed in all the cases irrespective of the mapping, implying that the observed relationships between the latch outputs arise from the interaction among the latches governed by the minimization of the energy of the system. We also evaluate the effect of the coupling strength on the system properties by tuning the coupling resistance (\({J}_{ij}\propto 1/R)\). It can be observed that the system exhibits the desired functionality as an Ising machine only in a limited coupling range (Fig. 3d). When the coupling is very weak (large R), the latch outputs are essentially decoupled (to a varying degree depending on the coupling strength) resulting in sub-optimal ‘solutions’. In contrast, when the coupling is very strong (small R), the latches settle to a nearly common state (~ VDD/2). Next, we experimentally evaluate the (Max)Cut on graphs of up to 10 nodes. Figure 3f shows the stochastic behavior of the system (expected for the Ising machine) for a representative graph with 10 nodes (Fig. 3e) measured over 150 iterations. Optimal solutions are observed in 86 out of the 150 measurements and the sub-optimal solutions in the remaining measurements result from the system getting trapped in local minima in the high dimensional phase space (Fig. 3g). Further, we also experimentally compute the cut value on multiple graph configurations (= 20) up to 10 nodes (Fig. 3h); each graph is measured 10 times. Optimal MaxCut solutions are obtained in 17 of the 20 graphs.

Coupling spins using FeFETs

While the above experiments showcase the functional behavior of CMOS latches as classical Ising spins, the implementation of programmable monolithic resistors as coupling elements can be challenging and area inefficient, particularly in scaled systems. We therefore evaluate, first at a singular device level, the possibility of using non-volatile FeFETs (Fig. 4a) as programmable coupling elements between the CMOS latches. Our choice of using the FeFET as the coupling element is motivated by the fact that FeFETs are compatible with CMOS process technology, provide a wide dynamic range for the resistance (coupling strength), and can be efficiently integrated and programmed in a scalable array that is required to map the spin interactions in the entire network. We envision that the tunable threshold voltage of the FeFET would allow us to program the interaction between the latches; the low VT (high conductance) state would correspond to Jij = ± 1 whereas the high VT (low conductance) state would correspond to Jij = \(0\). Figure 4b shows the experimentally measured transfer characteristics of the ferroelectric-HfO2-based FeFETs used in this work; the devices are fabricated (see “Methods”) on 28 nm high-κ metal gate technology platform, as shown in the cross-sectional TEM image49,50. It features a doped HfO2 layer as the ferroelectric and SiO2 as the interlayer. Detailed processing information is described elsewhere50. Figure 4c shows the memory window vs. programming voltage characteristics for the FeFET. When a programming voltage of ± 4 V is used to program the FeFET state, a 100 × modulation in the current is obtained for VGS = 1 V.

Figure 4
figure 4

FeFET coupled CMOS latches. (a) Schematic and TEM cross-section50 of a 28 nm high-κ metal gate FeFET device. (b) IDS–VGS characteristics of the FeFET (W/L = 0.5/0.5 µm) after program and erase pulses. (c) Evolution of memory window (MW) as a function of write voltage (VGS). FeFET coupled two-latch system settles (d) out-of-phase and (e) in-phase when the coupling is negative (Jij = − 1) and positive (Jij = + 1), respectively.

We subsequently characterize the behavior of the FeFET as a programmable coupling element in a two-latch system. To evaluate this, the FeFETs are first programmed into the low VT/high conductance state (Jij = ± 1) using a programming pulse of magnitude + 4 V and a period of 1 µs. We test the interaction induced by the FeFETs among the latches by intentionally programming them into the ‘incorrect’ state, in order to observe the system evolve into the correct state i.e., when Jij = − 1 (+ 1), the latches are initialized into the same (opposite) states (0/VDD), and subsequently, it is observed whether the system evolves to the correct state. During the initialization of the latches, the FeFETs are maintained at VGS = − 0.5 V. This reduces the conductance of the FeFETs without affecting the threshold voltage. After the latches are initialized, the gate voltage is increased to VGS = 1.5 V, and the corresponding dynamics of the FeFET coupled CMOS latches are evaluated. We note that VGS = 1.5 V was required to achieve the desired level of conductance from the FeFET-based coupling element (coupling strength) since the threshold voltage of the device has not been optimized for this application. The gate voltage can be reduced to zero by appropriately adjusting the threshold as considered in the following simulations and a detailed study of the optimization and the effect of the finite FeFET output conductance will be undertaken in future work. Figure 4d,e shows the observed behavior of the coupled system. Similar to the resistive coupling above, the coupled two-latch system settles in-phase (out-of-phase) when the coupling is positive Jij = + 1 (negative, Jij = − 1), respectively.

Using the above building blocks, we now explore a pathway to design a scalable Ising machine using latches as classical Ising spins and the FeFETs as programmable coupling elements. We propose to use a FeFET-based array to realize the coupling network among the latches (Fig. 5a). In this architecture, the rows (bit lines; BL) and the columns (source lines; SL) are connected to the drain and the source of the FeFET, respectively. Each row and column are driven by the two complementary outputs of a latch, effectively realizing the negative coupling (Jij = − 1), required to solve the MaxCut problem using the Ising model (Fig. 5a). We also note that the latches can be decoupled from the rows and columns using the transmission gate-based switches. This is required for the two-stage operation of the array as illustrated further. The word-line (WL), connected to the gates of all the FeFETs in a row, is used to program the FeFET state according to the adjacency matrix of the input graph.

Figure 5
figure 5

CMOS Latch-based Ising Machine with FeFET-based Coupling. (a) Schematic of the FeFET array used to implement the coupling network and its interfacing with the CMOS-latches; negative coupling (Jij = − 1) is implemented here. (b) Half select VW/2 write scheme that is used to program the FeFET array. (c) A representative 4 node graph problem. (d) Time domain output of the write voltages and latch outputs for solving the representative problem in (c). (e) MaxCut solutions obtained for graphs of various size up to 50 nodes. The graphs were randomly generated; 30 graphs were tested in total (10 different graph configurations per node).

The above array is simulated using the HSPICE platform51 which is interfaced with MATLAB in order to input the circuit simulation parameters required as well as to evaluate the output obtained from the circuit simulation. The CMOS latches are designed using the 10 nm PTM model52; a 5-fin FinFET design is used in order to achieve the desired drive strength for each inverter. The FeFETs are implemented using a circuit-compatible SPICE model which tracks the history-dependent switching behavior of the ferroelectric and the details of the model have been presented in our prior work50. A nominal variation of 10 mV in threshold voltage (per fin) is also considered; furthermore, the impact of the increased threshold voltage is detailed in supplementary note 3. Additionally, for the interconnect routing in the FeFET array, a line-to-ground capacitance of 0.122 fF/µm and 0.109 fF/µm, and a line-to-line capacitance of 0.0229 fF/µm and 0.0217 fF/µm are considered for the two metal layers- M1 and M2, respectively44. A random noise source, available in the spice stimulus53, of a maximum amplitude of 50 mV is added to the supply voltage.

Computing the MaxCut using the above array is a two-step process, and is illustrated here with the aid of a small 4-node graph as shown in Fig. 5c: (a) Programming the FeFET array to represent the input graph: During this phase, the latches are decoupled from the FeFET array by turning OFF the interfacing switches. The FeFET array can now be programmed as a standard memory array, and we employ the half-select (VW/2) bias scheme (Fig. 5b) to write the desired state to the FeFETs. First, a negative write voltage VW (− 3 V considered here) pulse is applied to all the WLs ensuring that all the FeFETs are initialized to the high-VT (low conductance) state. Subsequently, a column-wise write scheme is used wherein all the FeFET cells corresponding to Aij = 1 (in the respective column) are programmed into the low-VT (high conductance) state; no programming is required for the FeFETs that represent to Aij = 0 since the whole FeFET array was initialized into the high VT (low conductance) state at the start, as discussed earlier. Programming the FeFETs into the low-VT (high conductance) state is achieved by asserting the word line to + 3 V and the corresponding BL and SL to 0 V. The column-wise programming scheme as well as the applied voltages are selected such that the half-selected cells in the corresponding column and row experience minimal program disturbs (see Supplementary Note 4).

(b) Execution phase: Once the FeFET array has been programmed to represent the input graph, the latches are powered on, and connected to the rows and columns of the FeFET array. The FeFETs now act as coupling elements among the latches. Subsequently, the coupled system evolves towards the ground-state energy- manifested as certain latches changing their state, and in the process, computing the solution to the MaxCut problem. Considering the statistical nature of the computation, wherein the system can get trapped in local minima (resulting in a sub-optimal solutions), the power supply to the latches and the enable signal to the switches interfacing the latches and the FeFET array are cycled five times. Such periodic cycling is done to facilitate the system’s evolution through multiple trajectories through the phase space, and consequently, increase the probability of finding the global minima. However, we note that the size of the graph and the resulting complexity of the phase space will impact the number of cycles required to achieve a high-quality solution, which will be a critical factor for evaluating the scalability of the proposed implementation.

Figure 5d shows the programming and execution of a 4-node representative problem using the array proposed above. Once the FeFET array has been programmed according to the adjacency matrix of the graph, the interface between the latch and the array is turned ON and OFF periodically five times. It can be observed that each time the latches are connected to the array, the latch outputs evolve toward the ground-state energy, and the final state (Vout1 = Vout3 = VDD; Vout2 = Vout4 = 0) represents the two sets created by the MaxCut of the input graph (Fig. 5d). We note that optimal solutions are observed in all five evaluations here. However, this may not be the case in all cases, particularly, as the graph size increases.

Subsequently, we evaluate the dynamics of the proposed array using numerous (30) graph instances of varying sizes up to 50 nodes; 10 randomly instantiated graphs are considered for each graph size. The entire two-stage process including the programming of the FeFETs is simulated in each case, and hence the simulations are computationally intensive. We also observe that in all the evaluated instances, the latches settle to a steady-state within 100 ns after the transmission gates are turned ON. Figure 5e shows the Cut computed by the proposed array. For 10-node problem instances, the solutions are normalized to the MaxCut solution computed by the BiqMac solver54 which guarantees an optimal solution if the algorithm converges (within the maximum run-time of 3 h). For the larger 25 and 50-node graphs, since the convergence time exceeded the BiqMac solver’s run-time limit, we compare our results with the heuristic manifold optimization55 algorithm. We observe that for smaller graphs (10 nodes) optimal solutions are obtained in 9 out of the 10 cases. However, the solution quality degrades (mean accuracy for the 50 node graphs is 94%) as the graph size is increased owing to the increasing complexity of the solution space- a feature observed in all Ising machine implementations.

Discussion

Figure 6 compares the proposed approach with other approaches used to implement Ising machines. The implementation provides a pathway to realizing a compact Ising machine to solve computationally challenging problems such as MaxCut using CMOS-compatible components. We also note that the FeFET array, in general, provides a coupling framework among the Ising spins, that in principle, could also be used for other realizations of Ising spins such as oscillators39,40,41,42,43. We also acknowledge that the present design considers only a subset of graphs i.e., unweighted graphs, and has been evaluated here for negative Jij coefficients only (since they are relevant to solving the MaxCut problem). Nevertheless, these results motivate future work into the feasibility of this approach for solving weighted graphs (preliminary simulations shown in supplementary note 5) that can potentially exploit the multi-bit operation of the FeFET—a task that requires co-optimization of the FeFET array at the materials, device, and circuit level56,57,58,59.

Figure 6
figure 6

Comparison of the Latch-based Ising machine. Comparison of the proposed Latch-based Ising machine with other design approaches.

In summary, this work marks the first step towards proposing an Ising machine implementation that is well-positioned to take advantage of the maturity of the CMOS process technology making it a promising design approach for realizing high-performance application-specific accelerators for solving combinatorial optimization problems.

Methods:

Device fabrication

FeFETs employed in this work have a poly-crystalline Si/TiN/doped HfO2/SiO2/p-Si gate stack, which is integrated on the 28 nm node high-κ metal gate CMOS technology platform on 300 mm silicon wafers. Detailed information is described elsewhere50. For the ferroelectric gate stack, a thin SiO2 interfacial layer is grown first, followed by the deposition of the doped HfO2 film. Then a TiN metal gate electrode was deposited using physical vapor deposition and following that the poly-Si gate electrode is deposited. The source and drain n + regions were formed by phosphorous ion implantation and then rapid thermal annealing at ~ 1000 °C. This step also results in the formation of the ferroelectric orthorhombic phase within the doped HfO2.