Integrated neuromorphic computing networks by artificial spin synapses and spin neurons

One long-standing goal in the emerging neuromorphic field is to create a reliable neural network hardware implementation that has low energy consumption, while providing massively parallel computation. Although diverse oxide-based devices have made significant progress as artificial synaptic and neuronal components, these devices still need further optimization regarding linearity, symmetry, and stability. Here, we present a proof-of-concept experiment for integrated neuromorphic computing networks by utilizing spintronics-based synapse (spin-S) and neuron (spin-N) devices, along with linear and symmetric weight responses for spin-S using a stripe domain and activation functions for spin-N. An integrated neural network of electrically connected spin-S and spin-N successfully proves the integration function for a simple pattern classification task. We simulate a spin-N network using the extracted device characteristics and demonstrate a high classification accuracy (over 93%) for the spin-S and spin-N optimization without the assistance of additional software or circuits required in previous reports. These experimental studies provide a new path toward establishing more compact and efficient neural network systems with optimized multifunctional spintronic devices. An array of low-power devices that mimic the behavior of the brain has been constructed by researchers in South Korea. The brain works by passing chemical and electrical signals across a network of cells called neurons, which are connected via synapses. Scientists want to create an artificial neural network to take advantage of the parallel way in which the brain processes information. YeonJoo Jeong from the Korea Institute of Science and Technology, Jin Pyo Hong from Hanyang University, both in Seoul, and their colleagues have demonstrated a proof-of-principle neuromorphic computing network using so-called spintronic devices. 
Spintronics are low-power devices that use the property of an electron called spin rather than its charge as in conventional electronics. The team’s neural network of electrically connected spin-synapses and spin-neurons successfully performed a simple pattern classification task. We introduced spintronics-based synapses (spin-S) by utilizing a stripe domain ensuring its highly linear and symmetric weight responses, together with domain-wall motion-based neurons (spin-N) for activation functions. In addition, a crossbar array architecture for the spin-S/N has been proposed and experimentally demonstrated. A simple pattern-classification task was tested using an integrated network of electrically connected spin-S and spin-N to mimic a human brain. Our experimental findings provide a new avenue toward establishing more efficient neural network systems with spintronic devices.


Introduction
Advances in hardware technologies have resulted in hardware implementations of numerous neural network algorithms, including deep neural networks and convolutional neural networks, that use a feasible amount of computing resources. In turn, such implementations have fueled further algorithm developments 1,2 . Intensive studies and commercialization are underway to accelerate execution of such algorithms, using mature complementary metal-oxide semiconductor (CMOS) technology; some examples are graphics processing units 3,4 and application-specific integrated circuits 5-7 . However, this approach has the disadvantages of high power consumption and large-area coverage, which limit the use of recently developed algorithms despite significant breakthroughs in CMOS-based frameworks. To overcome these limitations, one promising approach is to apply the emerging in-memory analog computing concept in a crossbar array network, which is similar to an artificial synapse 8 . The simple Ohm's and Kirchhoff's laws for crossbar networks can enable massively parallel data processing that offers beneficial energy efficiency and performance 9-11 . For example, previous studies using oxide-based resistive random access memory (RRAM) have successfully demonstrated in situ training of a simple perceptron algorithm in emerging device networks and validated the aforementioned concept using classification tasks 12-14 . Nevertheless, hardware-based systems require further improvements. From an emerging device viewpoint, a linear and symmetric resistance change is important for achieving higher accuracy on a given task; however, most RRAMs exhibit intrinsically imperfect linear and symmetric characteristics. In addition, the neuron, another essential component of an artificial neural network such as the perceptron shown in Fig. 1a, relies mainly on software or dedicated CMOS analog circuits.
Thus, developing neuron components that can be monolithically integrated at a simple device level with the existing weight device crossbar array in a compatible fabrication process is necessary for the widespread use of hardware-based neural networks.
In recent years, spintronics-based devices, such as those employing current-induced domain wall (DW) motion, have attracted considerable interest as basic building blocks for advanced neuromorphic component deployments. These devices offer low power consumption and highly stable, reproducible operation 15-20 , according to the experimentally well-established model 21,22 . Experimental demonstrations of DW-based neuromorphic components using magnetic tunnel junctions (MTJs) have been reported 23,24 . MTJ-based resistance output devices generate a current output from the voltage input signal. Therefore, the current output signal from the weight device must be converted to a voltage signal to operate the activation function, which is achieved by an I-V converter with an operational amplifier (op-amp). Note that an I-V converter is required by all resistance-based neuromorphic computing components. However, if the weight devices were to generate an output signal as voltage, they could be connected directly to the activation generator to operate it. Recently 25 , the performance of an artificial synapse was investigated by manipulating Hall voltage output signals conjugated by a novel spin texture, a magnetic skyrmion. However, previously reported output voltage-based artificial synapse devices still have the aforementioned issues (such as nonlinearity) that prevent practical implementations of hardware-based neural networks. In contrast, this work achieves linear weight behavior by employing stripe domain motion, rather than the conventional fully filled domain wall, to serve as a weight in the deep neural network (DNN) algorithm and to produce a spin-synapse (spin-S) voltage output signal. Furthermore, by simply tuning the device operation principles, we also test a DW-based neuron (spin-N) in the same device geometry that functions as a sigmoidal activation function and has a voltage output signal.
Together, these findings provide a new crossbar array configuration that employs voltage output (Hall voltage) weights and acts as a DNN accelerator. Furthermore, we propose the concept of how the voltage output signal of the spin-Ss can operate the spin-N at an array level, along with an experimental demonstration.
Fig. S1) were patterned into a Hall bar geometry with an asymmetric length, as shown in Fig. 1b. The distance between the nucleation region and the Hall detection region (red and white boxes, respectively, in Fig. 1b) was ~260 µm, and the channel width was 60 µm. Figure 1c, d shows the variation in the Hall resistance (R H ) under voltage pulses and the corresponding DW states observed from the contrast difference in polar magneto-optical Kerr effect (MOKE) microscopy images. Here, R H is defined as the Hall voltage (V H ) detected in the Hall detection region (red box) divided by the x-axis injected current (I x ). It reflects the z-component of the magnetization in the Hall detection region because the Hall voltage detected here is dominated by the anomalous Hall effect. In its initial state, the entire magnetic layer is intentionally aligned with the −z-axis (state 1 in Fig. 1c, d). Upon the application of a voltage pulse of +18 V and 50 ms with an external x-axis magnetic field (H x ) of 225 Oe, a DW is clearly formed at the nucleation point (details are presented in the "Methods" section), and then the DW position shifts to state 2 or 3. However, R H remains unaffected until the DW reaches the Hall detection region (red line). R H starts to increase in state 4, reflecting the arrival of the DW in the Hall detection region, and becomes saturated after reaching state 7 upon consecutive voltage pulses. These DW behaviors, including the velocity, detection, and starting positions, can be precisely adjusted by changing diverse operation parameters and adjusting the device architecture, as described later.

Spin-synapse for highly linear and symmetric functions
Figure 1e shows the linear and symmetric variation in the R H of spin-S versus the number of pulses during potentiation (blue)-depression (red) operation, along with the corresponding time-synchronized MOKE images (yellow box). These distinct weight features are mainly attributable to the uniform shifts of the DW caused by the consecutive pulses. Voltage pulses of ±12 V (duration of 50 ms) are introduced with a y-axis magnetic field of −80 Oe.
To gain insights into the linear and symmetric variation in R H under identical voltage pulses, we adopt a creep scaling model for the DW motion, in which the velocity of the DW (v DW ) is explained by the Arrhenius form with an effective energy barrier height, $\alpha H_{z,\mathrm{eff}}^{-\mu}(V)$:

$$v_{\mathrm{DW}} = v_0 \exp\left[-\frac{\alpha H_{z,\mathrm{eff}}^{-\mu}(V)}{k_B T}\right]$$

where v 0 , α, k B , T, and H z,eff represent the characteristic velocity, scaling constant, Boltzmann constant, absolute temperature, and z-component of the magnetic field, respectively, and the exponent μ is 0.25 (refs. 26,27 S3). To further achieve a precise weight performance by means of the stripe domains, the number of stripe domains associated with the ratio of the stripe domain width and device width must be determined. The stripe domain width can be controlled by material parameter engineering, including PMA, dipole energy, DMI, and pinning densities.
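As a rough numerical illustration of the creep scaling model, the following Python sketch evaluates an Arrhenius-type velocity of the form v = v0 exp[−α H^(−μ) / (k_B T)] for a few effective fields. Only the exponent μ = 0.25 is taken from the text; the prefactor `v0`, the scaling constant `alpha`, and the field values are placeholder assumptions chosen to make the trend visible, not parameters fitted to the devices in this work.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def creep_velocity(h_eff, v0=1.0, alpha=1.0e-19, mu=0.25, temperature=300.0):
    """Creep-law DW velocity v0 * exp(-alpha * h_eff**(-mu) / (k_B * T)).

    h_eff is the effective field in arbitrary units; v0 and alpha are
    hypothetical placeholders, not values extracted from the devices.
    """
    if h_eff <= 0:
        raise ValueError("h_eff must be positive")
    barrier = alpha * h_eff ** (-mu)  # effective energy barrier height
    return v0 * math.exp(-barrier / (K_B * temperature))


if __name__ == "__main__":
    for h in (50.0, 100.0, 200.0, 400.0):
        print(f"H_eff = {h:6.1f} -> v = {creep_velocity(h):.3e} (arb. units)")
```

Under these assumptions the velocity rises monotonically with the effective field, and a plot of ln v against H^(−1/4) would be a straight line, which is the usual signature of creep motion.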
To further examine the foregoing observations, the effects of the total pulse numbers for potentiation and depression were evaluated, as shown in Fig. 2d. The linearity parameter (β) was quantitatively extracted from the curves using a quadratic model for the change in the Hall resistance under the assumption that the stripe DW has a trapezoidal shape, as shown in Fig. S4, where β = 0 indicates a completely linear function. The green line in Fig. 2e represents the β of the representative oxide-based weight device 31 exhibiting a higher β, and the red and orange lines reflect the β values taken from the spin-S for potentiation and depression, respectively. Both lines are relatively close to the ideal case (blue line). The slight difference in the β value for the potentiation (β p ) and depression (β d ) of spin-S can be explained by considering either the surface energy of the DW or a slight change in the shape of the stripe domain during operation 32 . Figure 2f shows a plot of the representative endurance features of R H , which reflects the stability of the weight operation.
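To make the role of the linearity parameter concrete, the sketch below uses an assumed normalized quadratic response, w(u) = (1 − β)u + βu² with u = n/N, in which β = 0 reduces to a perfectly linear weight update. This parameterization is a hypothetical stand-in: the actual quadratic expression belongs to Fig. S4 and is not reproduced here. The helper `fit_beta` recovers β from a normalized potentiation curve by least squares.

```python
def quadratic_weight(n, n_total, beta):
    """Assumed normalized weight (0..1) after n of n_total pulses.

    beta = 0 gives a perfectly linear response; this form is a hypothetical
    stand-in for the quadratic model referred to in the text.
    """
    u = n / n_total
    return (1.0 - beta) * u + beta * u * u


def fit_beta(weights):
    """Least-squares estimate of beta from normalized weights w[0..N]."""
    n_total = len(weights) - 1
    num = 0.0
    den = 0.0
    for n, w in enumerate(weights):
        u = n / n_total
        basis = u * u - u  # deviation of the quadratic from the straight line
        num += (w - u) * basis
        den += basis * basis
    return num / den


if __name__ == "__main__":
    curve = [quadratic_weight(n, 20, beta=0.3) for n in range(21)]
    print(fit_beta(curve))
```

Applying the same fit separately to the potentiation and depression branches would give the two values whose difference measures the asymmetry of the update.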

Spin-neuron for integration and activation functions
To satisfy the demand for compact artificial neuron-synapse integrated networks, a spin-N with a sigmoid function was prepared. The fabrication approach for the spin-N was identical to that of the spin-S described earlier to facilitate monolithic network integration. To achieve a basic sigmoid function, which is essential for solving nontrivial problems in multilayer networks 33 , both the inactive states (1-3 in Fig. 1c) and the saturation states (7-9 in Fig. 1c) of the DW device were employed as sources of nonlinearity in the spin-N. The plot in Fig. 3a shows the spin-N responses in R H as a function of the voltage amplitude with a pulse width of 500 ms under various x-axis magnetic fields (H x ). The response can be fitted with a sigmoid function in which x 0 and k represent the rising point and slope parameters, respectively. Interestingly, both parameters can be tuned in the spin-N by varying the operation conditions or the device geometry.
x 0 can be controlled by changing H x ; that is, the shift in x 0 with an increase in H x is a result of the higher initial DW velocity caused by the reduced DW energy (Fig. 2a). At a high H x , the DW reaches the Hall detection region quickly, yielding a small value of x 0 . Figure 3b summarizes the relationship between x 0 and k at various H x values, where k remains almost unaffected by H x . However, because intentional control of k is one of the most important goals in the development of the sigmoid activation function, various k values were achieved, as shown in Fig. 3d, e, where the initial DW positions were intentionally selected by applying different erase pulse durations, as indicated by the numbers in the corresponding MOKE images of Fig. 3c. When the distance from the initial DW position to the Hall detection region is shorter, the DW passes the Hall detection region at a lower applied voltage. Because k depends on how fast the DW passes the Hall detection region, x 0 shifts to the left and k changes accordingly. These tuning parameters can also be determined by varying the device geometry, such as the distance between the nucleation region and the Hall detection region. Finally, because the spin-N functions as a nonvolatile neuron, an erase operation is required after the activation level is read from the spin-N. Thus, the spin-N should also have high endurance. Figure 3f presents the endurance performance of spin-N in a cycling test (up to 10⁴ cycles).
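The roles of the two sigmoid parameters can be illustrated with a short Python sketch. It assumes a normalized logistic form, f(x) = 1 / (1 + exp[−k(x − x0)]), which is one common way to write a sigmoid with a rising point x0 and a slope k; the measured R H would first have to be rescaled to the interval (0, 1). The function names and the sample values are hypothetical. The fit linearizes the curve through the logit transform, so that k and x0 follow from an ordinary straight-line regression.

```python
import math


def sigmoid(x, x0, k):
    """Normalized logistic activation with rising point x0 and slope k."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def fit_sigmoid(xs, ys, eps=1e-6):
    """Estimate (x0, k) from a straight-line fit to logit(y) = k * (x - x0).

    ys must already be normalized to (0, 1); values are clipped by eps so
    that the logit stays finite.
    """
    logits = []
    for y in ys:
        y = min(max(y, eps), 1.0 - eps)
        logits.append(math.log(y / (1.0 - y)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logits))
    k = sxl / sxx                  # slope of the regression line
    x0 = mean_x - mean_l / k       # point where the logit crosses zero
    return x0, k


if __name__ == "__main__":
    volts = [10.0 + i for i in range(11)]                  # hypothetical sweep
    resp = [sigmoid(v, x0=15.0, k=0.5) for v in volts]     # synthetic response
    print(fit_sigmoid(volts, resp))
```

In this picture, a larger H x corresponds to a smaller x0 (the curve moves left), while a shorter initial DW distance changes both x0 and k.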

Integration of spin-S and spin-N for pattern classification
To facilitate massively parallel computation, we propose the incorporation of the obtained spin-S and spin-N functions into the crossbar array frame frequently adopted in conventional RRAM-based networks, as shown in Fig. 4a. The proposed spin-N/S devices are based on four-terminal electrodes, in which the programming path is separated from the read operation path and, more importantly, the input and output are both voltage signals. Thus, ideally, the two devices (spin-S and spin-N) can be directly connected to form a crossbar network. Although we connected them via an op-amp in this simple proof-of-concept, the proposed network array may not require an op-amp after further optimization, as described later. The working principle is as follows. The input voltages (x n ), which are proportional to the input amplitude, are introduced into the network, leading to the accumulation of charge carriers weighted by the synaptic weights (s nm ) at the Hall detection electrode. Then, the accumulated charges from all the weights in the same column gather and develop a Hall voltage. Hence, in the proposed network, all the information from all the connected weights is simply integrated in a convenient voltage form. The obtained total Hall voltage is appropriately adjusted through an op-amp to supply a suitable y m to the next neuron stage. This operational principle follows vector matrix multiplication (VMM): $y_m = A \sum_n x_n s_{nm}$, where A represents the gain of the op-amp. The corresponding y m drives the connected spin-N to operate in the same manner, producing an activation output (O m ) at each column.
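The working principle can be sketched in a few lines of Python: each column sums the weighted inputs, the op-amp gain A scales the result, and the neuron applies its activation. The weight matrix, the gain, and the sigmoid parameters below are hypothetical numbers chosen only for illustration, and the logistic activation is an assumed stand-in for the measured spin-N response.

```python
import math


def crossbar_vmm(inputs, weights, gain):
    """Return y_m = gain * sum_n inputs[n] * weights[n][m] for every column m."""
    n_rows = len(weights)
    n_cols = len(weights[0])
    if len(inputs) != n_rows:
        raise ValueError("one input voltage is needed per row")
    return [gain * sum(inputs[n] * weights[n][m] for n in range(n_rows))
            for m in range(n_cols)]


def neuron_activation(y, x0, k):
    """Logistic activation standing in for the spin-N response."""
    return 1.0 / (1.0 + math.exp(-k * (y - x0)))


if __name__ == "__main__":
    x = [3.0, 0.0]                       # input voltages x_n (V), hypothetical
    s = [[0.8, 0.2],                     # hypothetical normalized weights s_nm
         [0.1, 0.9]]
    y = crossbar_vmm(x, s, gain=5.0)     # hypothetical op-amp gain A
    o = [neuron_activation(v, x0=6.0, k=0.5) for v in y]
    print(y, o)
```

With these placeholder values, the column whose weights overlap the active input receives the larger y m and therefore the larger activation output.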
To experimentally demonstrate the aforementioned operation principle, we plotted the response curves of y m and O m with respect to the input and weight values for integrated devices consisting of one spin-S and one spin-N, as shown in Fig. 4b, c. The output value at y 2 was clearly linearly dependent on the input and weight (Hall resistance). In addition, the final output of spin-N (O 2 ) generated representative activation curves depending on the y 2 level. These experimental observations demonstrate a successful VMM operation in the integrated spin-S and spin-N networks. We examined the concept further by considering a simple inference task of pattern classification utilizing the integrated frame from two spin-Ss and one spin-N (2 S + 1 N), where each device is electrically wire-connected in a printed circuit board (PCB), as shown in Fig. 4e. To conduct a proof-of-concept experiment for the integrated neuromorphic network behavior of Fig. 4, we installed a spin-N rotated by 90 degrees with respect to the spin-Ss under a single external magnetic field. Specifically, we used two manual patterns as inputs (Fig. 4d), where the weight column was already programmed to one of the patterns. With input pulses of 3 and 0 V (corresponding to the black and white pixels of the patterns, respectively), the network provides a higher activation value for the matched column. Figure 4f shows the two output levels corresponding to patterns 1 and 2 for a simple classification task performed by the small DW device array.

Simulation of a multilayer neural network with spin-S and spin-N
By exploiting the aforementioned observed spin-S and spin-N characteristics, we performed the pattern classification task shown in Fig. 5a by utilizing the Modified National Institute of Standards and Technology (MNIST) 34 and Canadian Institute For Advanced Research 10 (CIFAR-10) 35 datasets (see the "Methods" section and Fig. S5). To evaluate the impact of the spin-N/S characteristics on the performance of the network, four different types of artificial synapse devices were selected: a software synapse (SW-S, red), the spin-S (stripe domain type, green), the C-DW (conventional domain type, pink), and an oxide-based RRAM artificial synapse (blue) previously reported by another group 29 . Figure 5b shows the representative long-term potentiation (LTP) and long-term depression (LTD) operational curves for the four different artificial synapses. The SW-S corresponded to the ideal condition for weight updating and error propagation, exhibiting high linearity and symmetry in the LTP and LTD curves. Thus, the SW-S can develop a precise weight update calculated via the gradient descent method during the entire training process, regardless of the current weight value. Figure 5c shows the results for the evolution of the classification accuracy under different combinations of neural components. As predicted, the SW-S/rectified linear unit (ReLU; red) combination exhibited the highest accuracy (>96%) after 200 iterations, whereas the oxide-based RRAM synapse device integrated with the ReLU (blue) exhibited the lowest accuracy (~79%). Surprisingly, due to its optimized characteristics, the spin-S/ReLU (orange) also generated a high accuracy (~94%) even without aid from a circuit or compensation algorithm. In contrast, when the same ReLU activation function and the conventional DW type serving as weights (pink) were adopted, the accuracy reached only 91.5%.
Moreover, the accuracy remained >93% even for the full hardware combination (i.e., the spin-S/spin-N case). These results indicate that the proposed spin-S and spin-N devices provide a novel solution for building a complete neuromorphic computing hardware implementation, whereas previously reported artificial synapse devices require a software-assisted ReLU activation function to play a neuron role. The outstanding abilities of the spin-S and spin-N components make them promising for use in fully operational artificial networks for high-performance systems with a simple artificial neural network design. These trends were confirmed in a simulation on the CIFAR-10 dataset, as depicted in Fig. S6. To establish the crucial relationship between the spin-S and spin-N device features and the fitting parameters, the simulation was conducted in an experimentally possible (x 0 , k′) subspace range, where two parameters, x 0 (start point) and k′ (converted slope), were selected for spin-N, as shown in Fig. 5d. As shown in Fig. 3e, k varied from 0.3 to 0.6; thus, the offset translation for the k value was established by adding an external resistor (Fig. S7) to employ the converted k (k′) value in the simulation. The accuracy was increased to >95.18% by suitably tuning the parameters of spin-N (x 0 = 5.3 and k′ = 5.5) with the fixed weight characteristics of spin-S. Further performance enhancements could be achieved by adjusting the magnetic field, voltage range, or device design, as suggested in Fig. 3. In addition, the effects of the spin-S parameters, including the nonlinearity (β p and β d , defined in Fig. 2e), were simulated with the spin-N parameters fixed, as shown in Fig. 5e. Here, in addition to the nonlinearity, the symmetry between β p and β d is important to the accuracy. Our findings may provide guidelines for implementing novel hardware-based neural networks.

Potential advantages and issues of spin-N and spin-S for DNN accelerator applications
Our findings with the spin-N/S devices form an initial proof-of-concept experiment; such devices still have numerous limitations to the realization of real device applications. Thus, this section emphasizes the potential for employing spin-N and spin-S to construct DNN accelerators after further optimization. Possible approaches are discussed to resolve the current issues and to compare the potentials to those of other emerging devices.
The first issue is the operating speed. The operating speed of the current spin-N/S devices is governed by DW motion in the creep regime, as given in Fig. 2a, and the creep regime of DW motion is too slow to be used in a modern computing system. However, the device operation could be extended to the flow regime of DW motion, which provides a higher DW velocity of ~5700 m/s under ~1 ns pulses 36 . This speed has also been demonstrated in a racetrack memory 37 , supporting the future operation of spin-N/S devices in the GHz range.
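A back-of-the-envelope estimate shows how the flow-regime velocity translates into an operating time. The Python sketch below assumes the DW moves at a constant velocity over the full distance and ignores nucleation, pinning, and acceleration; the 260 µm length is the nucleation-to-detection distance of the present devices, whereas the shorter lengths are hypothetical scaled-down devices.

```python
def traversal_time(distance_m, velocity_m_per_s):
    """Time for a DW moving at constant velocity to cover distance_m."""
    if velocity_m_per_s <= 0:
        raise ValueError("velocity must be positive")
    return distance_m / velocity_m_per_s


if __name__ == "__main__":
    v_flow = 5700.0  # m/s, flow-regime velocity quoted in the text
    # 260 um is the present device; 10 um and 1 um are hypothetical.
    for length_um in (260.0, 10.0, 1.0):
        t_ns = traversal_time(length_um * 1e-6, v_flow) * 1e9
        print(f"{length_um:6.1f} um -> {t_ns:.3f} ns")
```

Under these simplifying assumptions, a sub-nanosecond traversal requires a travel distance on the micrometer scale or below, so the speed and scaling issues are closely linked.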
The second issue involves scaling. This work employed relatively large (60 × 260 μm²) devices as an initial approach for spin-N/S devices and to precisely detect domain state variation. However, the device size could be reduced to a sub-nm scale because the physically possible minimum size of spin-N/S can be estimated by the minimum stripe domain size. Note that the stripe domain is the trace left by the half-skyrmion at its end. Thus, the minimum stripe width corresponds to the minimum half-skyrmion or minimum skyrmion size. To date, skyrmion sizes down to the sub-nm scale have been reported both theoretically and experimentally. Therefore, the stripe domain device (a half-skyrmion device) can be scaled to sub-nm sizes.
The third issue is the high operating voltage. The operation voltage of the spin-N/S was ~20 V, which is incompatible with modern circuitry. We believe that this issue will be solvable in the future because spin-N/S device operation is based on current, not on voltage (electric field). In this paper, the driving force for the spin-N/S is spin-orbit torque (SOT), which is the torque created by the spin Hall effect. Thus, the main parameter for spin-N/S devices is the operating current density, which is 10¹¹ A/m² in our paper. However, to experimentally demonstrate the relation between DW states and electrical outputs, we fabricated spin-N/S devices at a relatively large size (260 × 60 µm), which led to a high resistance and is the reason why our prototype devices operated at a relatively high voltage (~20 V). At a real device design level, the ferromagnetic layer thickness, heavy-metal layer thickness, and the device width/length ratio could all be parameters for adjusting the device resistance. For example, one recent paper 25 utilized Pt/CoFeGd/MgO multilayers to operate skyrmion synapse devices with low resistance; the operation voltage was only a few millivolts with a subnanosecond pulse duration. It is worth noting that the operating mechanism of the skyrmion synapse device in that paper was exactly the same as that of our work: SOT. Thus, further optimizing the device material parameters to lower the device resistance may be a reliable approach for overcoming the high operating voltage issue in this work (more information is provided in Figs. S8 and S9).
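The argument that a current-driven device needs less voltage as it shrinks can be made explicit with Ohm's law. For an idealized uniform conductor, V = I R with I = J w t and R = ρ L / (w t), so V = J ρ L at a fixed current density J. The Python sketch below backs out an effective resistivity from the prototype operating point and rescales the length. It is only an order-of-magnitude illustration: it treats the multilayer as a single uniform conductor, ignores contact resistance and current shunting, and assumes that the 260 µm dimension is the full current path, none of which is established in the text.

```python
def drive_voltage(current_density, resistivity, length_m):
    """Voltage across a uniform conductor at fixed J: V = J * rho * L."""
    return current_density * resistivity * length_m


def effective_resistivity(voltage, current_density, length_m):
    """Back out rho_eff = V / (J * L) from a known operating point."""
    return voltage / (current_density * length_m)


if __name__ == "__main__":
    j = 1e11           # A/m^2, operating current density quoted in the text
    v_proto = 20.0     # V, approximate prototype operating voltage
    l_proto = 260e-6   # m, ASSUMED length of the current path
    rho = effective_resistivity(v_proto, j, l_proto)
    # 260 um is the prototype; 10 um and 1 um are hypothetical devices.
    for length_um in (260.0, 10.0, 1.0):
        v = drive_voltage(j, rho, length_um * 1e-6)
        print(f"L = {length_um:6.1f} um -> V = {v:.4f} V")
```

In this idealized picture the voltage scales linearly with the channel length at a given current density, so a micrometer-scale channel would need well under 1 V; lowering the effective resistivity through material choices would reduce it further.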
The fourth issue is the need for an additional external magnetic field during device operation. Before describing a possible alternative that would remove the external magnetic field in future applications, we briefly describe the roles of the magnetic fields parallel and perpendicular to the current direction. First, the magnetic field parallel to the current in spin-N controls the DW velocity, that is, the operating voltage of the spin neuron, as seen in Fig. 3a, b. Therefore, the role of the x-axis magnetic field in the spin neuron could easily be replaced by employing a suitable PMA value, which could be achieved by varying the CoFeB layer thickness or choosing a postannealing temperature. A more critical factor is the magnetic field perpendicular to the current in the spin-S device. This additional magnetic field serves to adjust the stripe domain elongation direction. The stripe domain with a sufficient DMI has a half-skyrmion at the end of the stripe domain, thereby inducing the skyrmion Hall effect. This effect implies that the skyrmion motion has a velocity component transverse to the driving force (the driving current). As such, the stripe domain elongates along a direction deviating from the current, which consequently pushes the stripe domain toward the edges of the patterned device. As mentioned before, because the linearity of spin-S originates from stripe domain motion away from the edge, the stripe domain must elongate in the direction parallel to the current. Manipulation of the half-skyrmion Hall effect can be achieved by an in-plane magnetic field perpendicular to the driving force, owing to internal structure deformation. Among the various approaches for creating effective in-plane fields, one of the most compatible is the insertion of an exchange bias layer.
For example, a recent report 38 addressed magnetic field-free SOT switching by inserting an in-plane magnetized layer on a perpendicularly magnetized ferromagnet; that is, the stable external magnetic field can be replaced by inserting an exchange bias layer into the previous configuration. Thus, although our current work utilizes an external y-axis magnetic field to ensure the straight motion of the stripe domain, the insertion of an additional exchange bias layer may be crucial for removing the external magnetic field in the near future.
The possible advantages of spin-N/S devices over the existing emerging devices are as follows: first, spin-S has a more linear weight variation due to the stripe domain motion. As described in Fig. 5e, the linearity of weight variation has a significant effect on the accuracy of the trained network. However, owing to their resistance variation mechanism, the emerging weight devices 25,39,40 still possess nonlinearity features, except for those reported in a few papers 41,42 , and also possess a relatively wide cycle-to-cycle distribution in device performance. In contrast, the main information carriers of the spin-N/S (stripe domain or conventional DW) are governed by a well-established physical model, thereby enabling precise control of the information carriers. In addition, the spin-N/S shares similar materials, structures, and operating schemes with the recently well-established SOT random access memories and racetrack memories, which have high endurance and retention features compared to those of RRAM and PCRAM devices. Table S1 summarizes the potential performance for diverse weight devices.

Conclusion
This study presented the first proof-of-concept demonstration of an integrated neuromorphic network using spintronics-based synapses (spin-S) and neurons (spin-N), both of which are prepared via the same fabrication process. We provided a crossbar array architecture for the Hall voltage output of the spin-S, not for conventional resistance output devices, and experimentally applied it to a simple pattern classification task using an electrically integrated two spin-Ss/one spin-N network, showing its potential for constructing more compact neuromorphic computing networks. Simulations using experimentally determined parameters yielded a high accuracy (>93%) in completely spin-N/S-based neural networks, thereby showing the possibility of developing compact and efficient spin-based neural networks. Nevertheless, further empirical observations and comparisons, together with the elimination of the external magnetic field commonly required for device operation, are needed to exploit a crucial device architecture.

Sample fabrication
The films used in this study were deposited on 200-nm-thick thermally oxidized Si substrates via magnetron sputtering with a base pressure of <7 × 10⁻⁸ Torr at room temperature. To provide the PMA characteristics, a postannealing process was conducted at 350 °C for 30 min under vacuum conditions of <1 × 10⁻⁶ Torr with a 3-T perpendicular magnetic field. The asymmetric Hall bar geometry was obtained by utilizing photolithography and Ar ion milling, followed by an O₂ plasma ashing process for 2 min at 50-W radiofrequency power to completely remove the residual photoresist material hardened by the ion milling process. The electrodes for the Hall channel and the nucleation line were prepared as Ta (3 nm)/Pt (100 nm) layers.

MOKE microscopy and electrical measurement
A custom-built MOKE microscopy system with out-of-plane and in-plane electromagnets was employed to image the domains used in the spin-N and spin-S devices. As shown in Fig. S1, a stable PMA feature was observed. The +z and −z domains were clearly identified by the contrast difference in the MOKE microscopy images, as shown in Fig. 1d, f. To observe the current-induced DW motion, four probes were incorporated in the MOKE system; two were connected to the voltage source path, and the other two were connected to the Hall voltage detection terminals. The Hall voltage was monitored using a Hewlett Packard 34401A multimeter. In addition, to synchronize the MOKE images with the Hall voltage signals, the MOKE images were programmed to be taken immediately after the injection of each voltage pulse.

Formation of DWs at nucleation sites and current-driven DW motion
With the Hall bar design, the driving current flowed mainly through the W layer along the x-direction (yellow line in the right-hand image in Fig. 1b). The magnetic CoFeB layer could easily be damaged in the nucleation region by the sputtering growth of the nucleation electrode, reflecting the presence of a significantly reduced PMA energy (K eff ) that was proportional to the energy required for magnetization reversal. Hence, the initially reversed magnetization drove the formation of the DW within the nucleation region; then, the DW spread out in the CoFeB layer along the driving current direction via the SOT phenomenon, where the Néel-type DW was stabilized by the finite DMI energy.

Network structure for simulation
In the neural network simulation, two-synapse-layer perceptron networks with 784 (28 × 28) input neurons, 128 hidden neurons, and 10 output neurons were used, as shown in Fig. 5a. ReLU and softmax 35 were adopted as the activation and loss functions, respectively. ReLU is highly popular due to its hardware-friendly implementation 36 .
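The layer sizes described above can be made concrete with a minimal forward pass in plain Python. The sketch only fixes the 784-128-10 shape with a ReLU hidden layer; the random weight initialization, the omission of bias terms, and the placement of softmax at the output (as is usual when it is paired with a cross-entropy loss) are assumptions made for illustration, since the text only names the functions. Training and the device-specific weight-update models are not included.

```python
import math
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [v if v > 0.0 else 0.0 for v in values]


def softmax(values):
    """Softmax with the maximum subtracted for numerical stability."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights):
    """Multiply an input vector by a weight matrix stored as weights[n][m]."""
    return [sum(x * row[m] for x, row in zip(inputs, weights))
            for m in range(len(weights[0]))]


def init_weights(n_in, n_out, rng):
    """Hypothetical uniform initialization; not taken from the paper."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_out)]
            for _ in range(n_in)]


def forward(pixels, w_hidden, w_out):
    hidden = relu(dense(pixels, w_hidden))   # 784 inputs -> 128 hidden neurons
    return softmax(dense(hidden, w_out))     # 128 hidden -> 10 output neurons


if __name__ == "__main__":
    rng = random.Random(0)
    w1 = init_weights(784, 128, rng)
    w2 = init_weights(128, 10, rng)
    image = [rng.random() for _ in range(784)]   # stand-in for a 28 x 28 image
    probs = forward(image, w1, w2)
    print(len(probs), sum(probs))
```

Each of the two weight matrices corresponds to one synapse layer, which in hardware would map onto one crossbar array.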