Impact of Synaptic Device Variations on Classification Accuracy in a Binarized Neural Network

Brain-inspired neuromorphic systems (hardware neural networks) are expected to be an energy-efficient computing architecture for solving cognitive tasks, which critically depends on the development of reliable synaptic weight storage (i.e., the synaptic device). Although various nanoelectronic devices have successfully reproduced the learning rules of biological synapses through their internal analog conductance states, the sustainability of such devices is still in doubt due to the variability common to all nanoelectronic devices. Alternatively, a neuromorphic system based on a relatively more reliable digital-type switching device has recently been demonstrated, i.e., a binarized neural network (BNN). Its synaptic device is a mature digital-type switching device, and the training/recognition algorithm developed for the BNN enables facial image classification with a supervised training scheme. Here, we quantitatively investigate the effects of device parameter variations on the classification accuracy; the parameters include the number of weight states (N state ), the weight update margin (ΔG), and the weight update variation (G var ). This analysis demonstrates the feasibility of the BNN and introduces a practical neuromorphic system based on mature, conventional digital device technologies.

Conventional computing architectures (von Neumann architectures) consume large amounts of energy when solving cognitive tasks due to the unavoidable inefficiency of data transfer between the processor and the off-chip memory. This inefficiency is referred to as the von Neumann bottleneck. Alternatively, by mimicking both the functional and structural advantages of the biological neural system, power-efficient computing systems (i.e., neuromorphic systems 1 ) have recently been developed and are expected to offer promising breakthroughs. The practical implementation of the neuromorphic system depends on the development of ideal synaptic weight storage (i.e., the synaptic device). Highly integrated synaptic devices with sufficient reliability are essential for the on-chip implementation of a neuromorphic system that can process big data in real time, similar to the human brain.
Currently, various nanoelectronic synaptic devices based on two-terminal resistive switches (i.e., memristors) have demonstrated promising results by emulating the functionalities of biological synapses using their intrinsic analog conductance states [2][3][4][5][6][7][8] . Furthermore, using an integrated memristor network, functional neuromorphic systems have been experimentally applied to practical calculation tasks involving pattern recognition 9 , sparse coding 10 , matrix equations 11 , and differential equations 12 . Nevertheless, the sustainability of such devices is still in doubt due to the variability that is common to all nanoelectronic devices [13][14][15] . Because the physical mechanism of the conductance modulation in most prospective synaptic devices is a random process, that is, an atomic-level change based on electro/thermodynamics 16 , both cycle-to-cycle and device-to-device variations of conductance modulation are unavoidable 17 .
This concern may result from a misunderstanding of the neuromorphic system. The neuromorphic system simulates and exploits the characteristics and advantages of the brain, but this simulation and exploitation do not mean that the system must exactly imitate all of the structural and functional features of the brain. Unfortunately, with the goal of realizing a neuromorphic system that resembles the brain, most previous synaptic device studies blindly worked to demonstrate devices that were as similar as possible to biological synapses. As a result, most of the previous studies have focused only on the development/improvement of the analog conductance modulation dynamics, attempting to make them more similar to the dynamics of biological synapses while ignoring the variability issues [2][3][4][5][6][7][8] .
Alternatively, the sustainability and reliability of digital-type switching devices have been consistently ensured over the past 20 years 18 . Using current NAND flash technology, stable multiple memory states (4-bit = 16 states) with three-dimensional stackability have already been applied to a product. Therefore, if well-qualified conventional digital devices can serve as synaptic devices, the aforementioned variability issues of memristors can be addressed. We demonstrated a binarized neural network (BNN) in our previous study 19 , in which the synaptic device was a mature digital-type switching device, a gate-all-around (GAA) silicon nanosheet transistor. By applying a supervised online training scheme, a set of multiple digital-type synaptic devices (buckets) was able to represent an analog synaptic weight. The BNN had an image classification capability that was verified by simulation and experiment 19 . However, our previous simulation was limited because the effect of synaptic device variations was ignored; the simulation was performed under the assumption that all synaptic devices in the system had identical characteristics without any variations. Therefore, in this study, the BNN is applied to facial image classification, and the effects of synaptic device variations, namely the number of weight states (N state ), the weight update margin (ΔG), and the weight variation (G var ), are included. The effect of device variations on the classification accuracy is analyzed quantitatively using the simulation. These results demonstrate the feasibility of BNNs, which provide higher immunity to synaptic device variability than conventional neuromorphic systems based on analog synaptic devices do.

Results and Discussion
In our previous work, we demonstrated a BNN and its supervised training scheme for an image classification application 19 . Briefly, Fig. 1a depicts the architecture of a BNN with M inputs and N outputs. The input image information is delivered into the network by two types of vectors, u 1 (i) and w 1 (i), which denote the probability- and write-vectors, respectively (subscripted numbers indicate the order of each network when multiple networks are involved). When an input pattern needs to be distinguished from previously trained patterns (i.e., the recognizing phase), u 1 (i) is applied to the network. The vector u 1 (i), which is rescaled to 0 ≤ u 1 (i) ≤ 1, directly corresponds to the intensity of each pixel. When an input pattern needs to be trained by updating the synaptic weights (i.e., the training phase), w 1 (i) is applied to the network instead of u 1 (i). The vector w 1 (i), defined as w 1 (i) = {0 or 1}, is stochastically determined by the learning probability p, defined as p = γ·u 1 (i) (γ is the learning rate). Note that the most distinctive feature of the BNN is that the synaptic weights in the network G 1 (i, j) are restricted to binary values: G 1 (i, j) = {G high or G low }, where G high and G low represent the high- and low-conductance states of the synaptic device, respectively. To represent actual analog weights using only G high and G low , the network G 1 is partitioned into sub-buckets (the size of each bucket is B 1 ). Each bucket is trained with a single specific input image according to the label. In addition, the selection vector s 1 (i), defined as s 1 (i) = {1 or −1 or 0}, directs the training on the input image according to the label, where 1, −1, and 0 represent "potentiation," "depression," and "no update" of the synaptic weight, respectively.
Consequently, a set of binary values stored in the bucket can represent analog synaptic weights, which are related to the input image according to the label. Additional explanations for the operational principles of the BNN are presented in Supplementary Information Note 1.
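The stochastic write-vector and the bucket representation described above can be sketched in a few lines of Python. This is an illustrative model only; the function and variable names are ours, not taken from the original simulator:

```python
import numpy as np

rng = np.random.default_rng(0)

def write_vector(u, gamma, rng):
    """Stochastic binary write-vector w(i): each element is 1 with
    probability p = gamma * u(i), where 0 <= u(i) <= 1."""
    p = np.clip(gamma * u, 0.0, 1.0)
    return (rng.random(u.shape) < p).astype(int)

def bucket_weight(g_bits, g_high, g_low):
    """A bucket of binary conductances represents one analog weight;
    here the mean of its G_high/G_low states serves as that weight."""
    g = np.where(g_bits == 1, g_high, g_low)
    return g.mean()

u = np.array([0.0, 0.25, 0.5, 1.0])      # normalized pixel intensities
w = write_vector(u, gamma=0.5, rng=rng)  # stochastic binary writes

bits = np.array([1, 0, 1, 1, 0])         # one bucket of B = 5 binary weights
print(bucket_weight(bits, g_high=1.0, g_low=0.1))  # ≈ 0.64
```

Because each write is a Bernoulli draw with probability proportional to the pixel intensity, repeated training epochs let the bucket's fraction of G high devices converge toward the analog weight, even though every individual device is strictly binary.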
In this study, the performance of the BNN is evaluated on the task of classifying facial images from the Yale Face Database 20 , which contains a total of 165 grayscale images (32 × 32 pixels) of 15 individuals. The database contains 11 images per subject, each representing a different facial expression or configuration (center light, with glasses, happy, left light, without glasses, normal, right light, sad, sleepy, surprised, and winking). Here, we select 8 of the 11 images per subject for the training set, and the remaining 3 images are used as the test set. Only the images in the training set are input to the network during the training phase. To evaluate the classification accuracy during the recognizing phase, only the images in the test set are input to the network.
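For concreteness, the 8/3 split can be sketched as follows. This is a hypothetical reconstruction: the paper does not specify how the 8 training images per subject are chosen, so a seeded random selection is assumed here:

```python
import numpy as np

N_SUBJECTS, IMGS_PER_SUBJECT = 15, 11
N_TRAIN = 8  # the remaining 3 images per subject form the test set

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

train_idx, test_idx = [], []
for s in range(N_SUBJECTS):
    # Image indices of subject s, shuffled (assumed selection strategy)
    imgs = s * IMGS_PER_SUBJECT + rng.permutation(IMGS_PER_SUBJECT)
    train_idx.extend(imgs[:N_TRAIN])
    test_idx.extend(imgs[N_TRAIN:])

print(len(train_idx), len(test_idx))  # 120 45
```

Keeping the split per subject (rather than over the pooled 165 images) guarantees that every individual is represented by exactly 8 training and 3 test images.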
For the storage of binarized synaptic weights in the BNN, a GAA silicon nanosheet transistor is used as the synaptic device (Fig. S2, Supplementary Information Note 2). The embedded charge-trap layer (silicon nitride, SiN) in the gate dielectric enables adjustable channel conductance (i.e., a synaptic weight update). The synaptic device array is configured such that s 1 (i) corresponds to the gate voltage (V G ) of the synaptic transistors in a particular row, and either u 1 (i) or w 1 (i) corresponds to the drain voltage (V D ). The source current of each synaptic transistor (I S ) is determined by the channel conductance (G high or G low ) and V D . The integrated I S of each row is the summation vector z 1 (i). Figure 1c shows the evolution of channel conductance in the synaptic transistors as a pulse train is applied. Negative V G (V G = V pot ) leads to the detrapping of electrons from the SiN layer, which increases the channel conductance up to G high (i.e., potentiation). In contrast, positive V G (V G = V dep ) decreases the channel conductance down to G low (i.e., depression). The number of trapped electrons in the SiN layer depends on the level of V G . This dependence allows G high or G low to be adjusted, which enables control of the weight update margin (ΔG = G high /G low ) and the number of weight states (N state ). The cycle-to-cycle weight variation (G var = [max(G) − min(G)]/mean(G)) remains relatively small even after thousands of switching cycles, and ΔG is larger than that of previous two-terminal memristors, whose ΔG is below 10 with severe fluctuations [21][22][23][24] . The remainder of the paper discusses how the improved reliability of the digital-type weight update contributes to the sustainability of the entire neuromorphic system. First, we investigate the impact of the number of weight states (N state ) on the classification accuracy of the BNN.
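The two reliability metrics used throughout the paper follow directly from their definitions in the text; a minimal sketch, with hypothetical conductance readouts standing in for measured traces:

```python
import numpy as np

def delta_g(g_high, g_low):
    """Weight update margin, defined in the text as dG = G_high / G_low."""
    return g_high / g_low

def g_var(trace):
    """Cycle-to-cycle weight variation of one state:
    G_var = [max(G) - min(G)] / mean(G)."""
    trace = np.asarray(trace, dtype=float)
    return (trace.max() - trace.min()) / trace.mean()

# Hypothetical repeated readouts of the two conductance states
g_high_trace = [1.05, 0.95, 1.00, 1.02, 0.98]   # arbitrary units
g_low_trace = [1.1e-6, 0.9e-6, 1.0e-6]

print(delta_g(np.mean(g_high_trace), np.mean(g_low_trace)))  # ~1e6
print(round(g_var(g_high_trace), 3))                         # 0.1
```

Note that ΔG is a ratio (dimensionless), so the absolute conductance units cancel; what matters for the analyses below is only the high/low contrast and the relative spread of each state.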
Conventional memristors can theoretically have infinite internal conductance states (N state = ∞), but considering only the states that can guarantee reliability (e.g., data retention time or endurance), N state = 8-16 is the current technological limit [25][26][27] . Given this reliability limitation, the N state obtainable with current digital-type switching devices (e.g., N state = 16 for a quad-level-cell NAND flash) is not inferior to that of memristors. To identify the effect of N state on the classification accuracy of the BNN, two cases are compared: one with N state = 2 (Fig. 2a) and the other with N state = 16 (Fig. 2b). The comparison assumes that there is no device-to-device variation. The simulated accuracy of facial image classification is shown in Fig. 2c,d as a function of the training epoch, where the number of networks alters the accuracy. With a single network (gray curve), the accuracy reaches approximately 50% with B 1 = 200. By deploying an additional network (red curve), the accuracy improves to approximately 70% with B 1 = 200 and B 2 = 100. The accuracy continues to improve with more networks, up to 80% (blue curve). However, as shown in Fig. 2d, a larger N state is less effective in improving the accuracy; rather, a greater number of training epochs is required. In a typical memristor-based neuromorphic system, the synaptic weight must be adjustable exactly as desired to achieve higher accuracy, so a larger N state is advantageous for more precise G control. In our BNN, however, binarized/quantized weights are aggregated to represent a specific analog weight, so the controllability of each individual synaptic weight has a reduced effect on the accuracy. Additionally, pattern classification in the BNN is performed on buckets that group multiple synaptic weights, which further reduces the effect of each individual weight value.
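The difference between N state = 2 and N state = 16 amounts to how finely a single device can approximate a target analog weight. A toy quantizer makes this concrete, assuming evenly spaced levels between G low and G high (an illustrative simplification, not the device's actual level spacing):

```python
import numpy as np

def quantize(w, n_state, g_low=0.0, g_high=1.0):
    """Map an analog target weight onto the nearest of n_state evenly
    spaced conductance levels between g_low and g_high (illustrative
    model; real device levels need not be evenly spaced)."""
    levels = np.linspace(g_low, g_high, n_state)
    return levels[np.argmin(np.abs(levels - w))]

print(quantize(0.37, n_state=2))   # 0.0  (binary: nearest of {0, 1})
print(quantize(0.37, n_state=16))  # 0.4  (16 levels, step 1/15)
```

A single binary device thus carries a large per-device quantization error, which is exactly what the bucket averaging in the BNN compensates for: the bucket mean, not any single device, approximates the analog weight.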
This unique feature of the BNN allows a feasible implementation of the neuromorphic system; engineering the synaptic device for a larger N state , as required in a conventional memristor-based neuromorphic system, is no longer necessary. Consequently, although only binarized/quantized weights are used, a reasonable accuracy can be obtained from a BNN with a higher training speed. This result indicates that a neuromorphic system without analog-type synaptic weights can perform a cognitive task by exploiting both the BNN architecture and its supervised training scheme.
Next, a similar analysis was performed to study the effect of the weight update margin (ΔG) on the classification accuracy. In a conventional memristor-based neuromorphic system, increasing ΔG can improve the classification accuracy 28,29 . The ΔG of common memristors is about 10 [21][22][23][24] ; thus, much research has been devoted to further increasing ΔG. In contrast, our digital-type synaptic device (i.e., a GAA silicon nanosheet transistor) can obtain a larger ΔG of up to 10 6 (Fig. 1c) by modulating the amplitude of V pot or V dep . In this BNN simulation, as shown in Fig. 3a, ΔG is adjusted from 2 to 10 3 , assuming no device-to-device variation. Figure 3b shows the classification accuracy as a function of ΔG. The modulation of ΔG (like the increase of N state ) has little effect on the accuracy, which is contrary to the behavior of conventional memristor-based neuromorphic systems. The reason for this conflicting result is as follows: the memristor-based neuromorphic system uses multiple analog states defined between G high and G low for image training and recognizing, and the distinguishability and stability of each analog state critically affect the performance of the system. A larger ΔG leads to better distinction of each analog state, resulting in better distinction between the patterns to be distinguished and the background (noise) 30 . However, since the BNN uses only binarized synaptic weight values (G high and G low ), the magnitude of the difference between G high and G low is not critical. Therefore, the classification accuracy of the BNN is independent of ΔG. This feature of the BNN can be a great advantage in realizing practical on-chip neuromorphic systems, because current nanoelectronic device technology is already sufficient to produce a ΔG of more than 10 without any further engineering of the synaptic device.
Finally, the effect of the weight variation (G var ) on the classification accuracy was analyzed. The intrinsic instability and lack of control of the analog conductance switching behavior in memristors critically degrade the performance of neuromorphic systems 24,28 , although these systems are capable of tolerating device-to-device variation or noise to a certain degree. In our digital-type synaptic device, shown in Fig. 1c, G high and G low fluctuate during repeated switching. G var can be defined as [max(G) − min(G)]/mean(G), where G is either G high or G low . In this BNN simulation, shown in Fig. 4a, G var is adjusted from 0.2 to 1.0 for several values of ΔG. As the weight of every synaptic device is determined stochastically within the given G var range during the weight update process, this simulation considers not only cycle-to-cycle variation but also device-to-device variation. Figure 4b shows the classification accuracy as a function of G var . An increase of G var degrades the accuracy. When ΔG = 2 (blue curve), the accuracy is severely degraded, to below 40%. However, when ΔG is above 5 (green and red curves), the effect of an increase of G var is not critical. As the BNN uses only binarized synaptic weight values, its immunity to cycle-to-cycle and device-to-device variations is considerably higher than that of memristor-based neuromorphic systems. This high immunity to device variability is not merely the result of using a well-qualified digital-type synaptic device; rather, the BNN architecture and its supervised training scheme contribute to the high sustainability of the system. Therefore, further research efforts to implement a practical neuromorphic system should be devoted to developing the architecture and training scheme, rather than focusing on improving the analog properties of the synaptic device.
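Why a small ΔG is vulnerable to a large G var can be illustrated with a toy read-error model: when each programmed state is drawn from a band of relative width G var around its nominal conductance, the G high and G low distributions begin to overlap at ΔG = 2 and G var = 1.0, while a larger ΔG keeps them separable. This is our illustrative model, not the paper's actual simulator:

```python
import numpy as np

def read_error_rate(delta_g, g_var, n=100_000, g_low=1.0, seed=1):
    """Fraction of binary weights misread against the midpoint threshold
    when each programmed state is drawn uniformly from a band of relative
    width g_var (G_var = [max(G) - min(G)]/mean(G)) around its nominal
    conductance. Illustrative model, not the paper's simulator."""
    rng = np.random.default_rng(seed)
    g_high = g_low * delta_g
    bits = rng.integers(0, 2, size=n)            # intended binary weights
    nominal = np.where(bits == 1, g_high, g_low)
    half = 0.5 * g_var * nominal                 # half-width of the G band
    g = rng.uniform(nominal - half, nominal + half)
    thresh = 0.5 * (g_low + g_high)              # midpoint read threshold
    return np.mean((g > thresh) != (bits == 1))

for dg in (2, 5, 10):
    print(f"dG={dg}: misread fraction = {read_error_rate(dg, g_var=1.0):.3f}")
```

Under these assumptions the misread fraction falls steadily as ΔG grows, mirroring the simulated trend: severe accuracy loss at ΔG = 2 but only a mild effect once ΔG exceeds 5.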
In summary, we have analyzed the impact of synaptic device variations on image classification accuracy in a BNN. The BNN has the following unique characteristics: 1) using only binarized weights, the BNN can classify the input images with reasonable accuracy through the supervised training scheme; 2) the classification accuracy is independent of the weight update margin (ΔG) of the synaptic device; and 3) the BNN is highly immune to variability (such as G var ). Due to characteristics 2 and 3, current device technology is sufficient to create a synaptic device without any further research effort. In fact, prior to our study, memristor-based BNNs had been proposed to reduce memory access by binarizing the weights [31][32][33] . However, it remains an open question how to build and train a neural network with binarized weights. So far, each previous study has proposed a different BNN operation scheme from a different point of view. The main goal of previous memristor-based BNNs is more energy-efficient processing of deep neural network algorithms, whereas our research focuses on providing an architecture and operation scheme that is less sensitive to synaptic device variations. Consequently, our BNN can provide a device-level breakthrough for neuromorphic systems, which are currently based on conventional memristors, and provide a novel direction and inspiration for future neuromorphic engineering.