The von Neumann architecture provides accurate calculations; however, it is not suitable for low-power applications because of the data bottleneck between the memory and the processor1. To overcome this limitation, various artificial neuromorphic devices have been explored to imitate functions of the brain. In detail, two-terminal memristors such as resistive random-access memory (RRAM) and phase-change memory (PCM), as well as three-terminal charge trap memory and electrochemical random-access memory (ECRAM) with separated reading and writing paths, have been actively studied as synaptic devices for artificial neural networks (ANN)2,3,4,5,6.

Meanwhile, vision systems assisted by neural processing allow accurate object detection, pattern recognition, and real-time image processing for robotics, autonomous vehicles, and sensory electronics7,8,9,10,11. A conventional vision system separates image sensing and signal processing. Its performance is thus limited by the signal latency and power consumption that arise from processing a huge amount of data, including redundant data, passing through a converting circuit such as an analog-to-digital converter (ADC), as illustrated in Fig. 1a12,13,14. In contrast, a biological retina performs sensing and simultaneous pre-processing of visual information in order to extract key features from the input visual data15,16,17,18. By eliminating redundant visual data, subsequent information processing in the brain, such as object detection and pattern recognition, becomes faster and consumes less power.

Figure 1

Schematic diagram of (a) a conventional vision system and (b) an in-sensor vision system. An internal artificial neural network (ANN) performs both sensing and preprocessing in the sensor to reduce the signal latency and power consumption at converting circuits such as an analog-to-digital converter (ADC). (c) An internal ANN constituting an in-sensor vision system with the MOSTs, which can be located at the forefront of the ANN to simultaneously detect the optical signals and transmit the preprocessed signal to the next layer. (d) Schematic of the mnemonic-opto-synaptic transistor (MOST). A charge trap layer (CTL) for tunable photoresponsivity and memory function is inserted into the gate dielectrics. (e) Scanning electron microscopy (SEM) image of the MOST array and (f) cross-sectional transmission electron microscopy (TEM) image of the gate region. Bandgap-engineered (BE) tunneling dielectrics (OI/NI/OII) were adopted to reduce the operating voltage.

Recently, inspired by the biological vision system, various optoelectronic synaptic devices have been demonstrated that act as both a photodetector and a synapse for an ANN, preprocessing the data within the sensor9,10,11. During optical sensing, however, their synaptic weight changes because it is optically controllable. This optical weight update is useful for recognizing one pattern or similar patterns, but it makes it difficult to recognize various subsequent patterns because the synaptic weights are customized to a previous pattern. Therefore, repetitive reset operations are needed before accepting new patterns. Unlike the abovementioned optoelectronic synaptic devices, Wang et al. and Mennel et al. demonstrated vision sensors in which repetitive reset operations were unnecessary because the synaptic weight is invariant during optical sensing. They reported tunable photoresponsivity using a photodetecting device composed of two-dimensional (2D) materials, such as a phototransistor or a photodiode18,19. The tunable photoresponsivity in a photodetecting device corresponds to the controllability of the weight update in a synapse, and it is a significant advantage for an in-sensor vision system because a photodetecting device with tunable photoresponsivity can act as a synapse for an ANN as well as a photodetector for a sensor. Thus, in-sensor processing that combines image sensing and signal processing allows real-time multiplication of the image with a memorized photoresponsivity matrix. Such an in-sensor vision system is attractive for reducing the signal latency and power consumption that occur at converting circuits such as the ADC, as illustrated in Fig. 1b.

It is worth noting that the previous photodetecting devices with tunable photoresponsivity require external memory, which is indispensable for storing the value of the gate voltage that tunes the photoresponsivity18,19. This memory makes it harder to access a designated memory cell at high speed and to realize a compact mobile vision system on an all-in-one chip. Thus, the signal latency and power consumption that arise from the external memory become increasingly problematic. In addition, 2D materials cannot be easily integrated into a high-throughput complementary metal–oxide–semiconductor (CMOS) based image sensor process owing to their limited CMOS compatibility. For a large-scale vision system, a CMOS-compatible photodetecting device such as a photodiode or a phototransistor is preferred; however, tunable photoresponsivity is not available in such devices. Each approach, tunable photoresponsivity without CMOS compatibility or CMOS compatibility without tunable photoresponsivity, has its respective strengths and weaknesses. Therefore, it is timely to explore another photodetecting device that offers tunable photoresponsivity, CMOS compatibility, and, moreover, a built-in memory function.

In this work, a mnemonic-opto-synaptic transistor (MOST) is demonstrated in the form of a metal–oxide–semiconductor field-effect transistor (MOSFET). This MOSFET has a vertical pillar-shaped channel protruding from a bulk silicon substrate, and a gate completely wraps the sidewall of the pillared channel in a gate-all-around structure. This vertical MOSFET is advantageous in terms of footprint area and light absorption20,21,22. Moreover, by embedding a nitride (Si3N4) charge trap layer (CTL) into the gate dielectrics of the MOST for the memory function, individual control of the photoresponsivity of each MOST is achieved and real-time multiplication of the image with a memorized photoresponsivity matrix is performed. Therefore, owing to the intrinsic memory function of the intercalated CTL, it can act as a photodetector and as a synapse with non-volatile retention of learned weights in the ANN for the in-sensor vision system. It does not need repetitive reset operations because the synaptic weight is not changed during optical sensing. This characteristic is attributed to fully electrical control of the synaptic weight. Furthermore, by virtue of 100% CMOS-compatible fabrication, it can be integrated with a conventional large-scale CMOS image sensor system comprising numerous small-sized pixels. After optical and electrical characterization of the MOST, recognition of a simple pattern is performed using sets of the fabricated devices, and recognition of more complex hand-written digits from the MNIST dataset is demonstrated using software simulations.

Results and discussion

Figure 1c represents the ANN for the in-sensor vision system using the MOSTs. The MOSTs are located at the forefront of the ANN to detect the light intensity and to transmit the optical signals, pre-processed by the stored weights, to the next layer. The photocurrents (Iphoto) summed at each neuron of the next layer are produced by multiplying the memorized photoresponsivity matrix by the light intensity of each pixel. When the vision system has N pixels and M neurons at the next layer, the current summed in the mth neuron of the next layer (Im) can be represented by the following equation: \({I}_{m}=\sum_{n=1}^{N}{I}_{photo,mn}=\sum_{n=1}^{N}{R}_{mn}{P}_{n}\), where n = 1, 2, …, N and m = 1, 2, …, M denote the indices of the pixel and the neuron at the next layer, respectively. Iphoto,mn is the photocurrent of the MOST that connects pixel n to neuron m, Rmn represents the memorized photoresponsivity matrix, and Pn represents the light intensity of each pixel. In this way, in-sensor processing that combines image sensing and signal processing allows real-time multiplication of the image with the memorized photoresponsivity matrix19.
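The summation above is an ordinary matrix-vector product, which can be sketched in a few lines of Python. This is only an illustration: the function name `summed_currents` and the numerical values of the responsivity matrix and pixel powers are hypothetical and are not measured data from this work.

```python
def summed_currents(responsivity, light_power):
    """Return I_m = sum_n R_mn * P_n for every neuron m of the next layer."""
    currents = []
    for row in responsivity:
        if len(row) != len(light_power):
            raise ValueError("each row of R needs one entry per pixel")
        # One MOST per (m, n) pair contributes R_mn * P_n to neuron m.
        currents.append(sum(r * p for r, p in zip(row, light_power)))
    return currents


# Hypothetical example: M = 2 neurons, N = 3 pixels, arbitrary units.
R = [[1.0, 0.0, 2.0],
     [0.5, 0.5, 0.5]]
P = [2.0, 4.0, 1.0]
I = summed_currents(R, P)
print(I)  # [4.0, 3.5]
```

In hardware the multiplication and the summation happen in the analog domain, as device photocurrents added on a shared output line, so no explicit loop is executed.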

Figure 1d shows a schematic of an n-channel MOST with a vertical pillar structure. Heavily doped n+ source (S) and drain (D) regions are located at the top and the bottom, respectively, of each pillar in the array of MOSTs shown in Fig. 1e, which protrudes from a bulk-silicon wafer. Between the S and D, there is a p-type channel. As gate dielectrics, quintuple layers (OI/NI/OII/NII/OIII) composed of triple-layered tunneling dielectrics (OI/NI/OII), the aforementioned CTL nitride (NII), and a blocking oxide (OIII) wrap around the sidewall of the pillared channel, as shown in Fig. 1f. The triple layers of OI/NI/OII were adopted to reduce the operating voltage by bandgap engineering (BE) of the tunneling dielectrics23,24. The thicknesses of the gate dielectrics are 1.3 nm/1.3 nm/1.6 nm/5.6 nm/6.3 nm in the order of OI/NI/OII/NII/OIII. A triple-layered metal gate composed of titanium, titanium nitride, and tungsten (Ti/TiN/W) surrounds the outer sidewall of the gate dielectrics and the pillar. Under light illumination, carriers are generated and flow in the channel in the form of Iphoto, which drives the photodetector. Iphoto is actually the drain current (ID) flowing between the source and the drain, which is controlled by the gate voltage (VG) and the drain voltage (VD). The gate electrode makes the photoresponsivity tunable by charging and discharging the CTL of NII (hereafter simply abbreviated as ‘CTL’) and thereby controls the memory function. Note that NI in the tunneling dielectrics cannot serve as a CTL because OI is too thin to block tunneling of the trapped charges. Fabrication details of the MOST are described in Figure S1.

In the MOST, the threshold voltage (VT) can be adjusted by two factors: photo-carriers controlled by light illumination and trapped electrons in the CTL modulated by VG. Figure 2 shows the transfer characteristic curves of ID versus VG (ID–VG) according to the light intensity (P) and the number of gate pulses (Npulse). Npulse determines the level of ID at each state in the synaptic operation, and its range sets the number of states. As an example, for 32 multi-states in the depression, an Npulse of 0 is the initial state with the highest ID due to the lowest VT, and an Npulse of 31 corresponds to 31 gate pulses that produce the lowest ID due to the highest VT. In this work, a variable pulse number with an identical pulse amplitude and width is used for the potentiation–depression (P–D) operation. An LED (SOL 3.0, Fiber Optic Korea Co., Ltd.) was used as a white light source. The P indicated in Fig. 2 is the value measured in the blue region at a wavelength of 405 nm. It was quantified by a power meter that has a detection spot area of 0.785 cm². Figure 2a shows a leftward VT shift. This is caused by photo-carrier generation, which arises from light illumination25. In contrast, Fig. 2b exhibits a rightward VT shift. It is attributed to electron trapping in the CTL by the applied positive depression gate voltage (VG,dep); i.e., the trapped electrons suppress inversion at the channel surface. This is analogous to the depression operation that reduces the synaptic weight in an artificial synapse26,27,28. The magnitude of VG,dep is 9 V and its pulse width is 10 μs. It should be noted that the rightward VT shift by electron trapping is semi-permanent, whereas the leftward VT shift by light illumination is temporary. In other words, VT returns to its pre-illumination value when the light illumination is removed. Figure 2c superimposes the ID–VG curves with photo-carrier generation by incident light and with electron trapping by the applied VG,dep in one graph. The ratio (η) of the photoresponsivity without charge trapping to that with charge trapping by VG is approximately 800 at a VG,read of 0 V. In this way, the photoresponsivity can be modulated effectively by controlling the trapped electrons in the CTL. Therefore, for the in-sensor vision system, the MOST acts as a photodetector by sensing Iphoto with light, a synapse by updating a weight with VG, and a non-volatile memory by holding a weighted state with trapped charges. This tunable photoresponsivity is utilized as a controllable synaptic weight in the ANN. Unlike the previously reported photodetecting devices, extra memory is no longer needed because the MOST itself has an inherent non-volatile memory function18,19.

Figure 2

(a) Transfer characteristics (ID–VG) of the MOST for various light intensities (P). The leftward VT shift with increased P corresponds to a temporary response caused by the photo-carrier density. (b) ID–VG for various Npulse with positive VG. The rightward VT shift with increased Npulse corresponds to a semi-permanent response caused by the trapped electron density. This is a depression operation for reducing the weight of the synaptic device. (c) ID–VG in the dark and under 1 mW light illumination before and after depression. The photocurrent (Iphoto) at a read gate voltage (VG,read) of 0 V is approximately 0.1 μA before the depression and 0.1 nA after the depression. In this way, the trapped electron density tunes the photoresponsivity.

Figure 3a shows the depression where ID was decreased by an increased Npulse for various P. Herein Npulse is varied from 0 to 31; i.e., there are 32 states. The magnitude of VG,dep is 9 V and its pulse width is 1 μs. This result shows that the photoresponsivity was finely tunable with multi-states. For a typical synaptic operation, the potentiation that increases the synaptic weight should be available, similar to the depression that decreases the synaptic weight. Figure S2(a) represents the P-D characteristics for various P, i.e., with light illumination. The conductance (G) is defined as ID/VD, which can be simplified to ID because the applied VD was 1 V. The photoresponsivity was finely tunable during the potentiation as well as the depression. The magnitude of potentiation gate voltage (VG,pot) is − 10 V and its pulse width is 200 μs. Figure S2(b) shows another P-D characteristic in a dark environment, i.e., without light illumination. From Figure S2(b), the nonlinearity parameters (α) were extracted using the following equation:

$$ G = \left\{ {\begin{array}{ll} {\left( {\left( {G_{max}^{\alpha } - G_{min}^{\alpha } } \right) \times w + G_{min}^{\alpha } } \right)^{1/\alpha } } & {\text{if } \alpha \ne 0,} \\ {G_{min} \times \left( {G_{max} /G_{min} } \right)^{w} } & {\text{if } \alpha = 0,} \\ \end{array} } \right. $$

where Gmax is the maximum conductance, Gmin is the minimum conductance, α is the nonlinearity parameter, and w is an internal variable that ranges from 0 to 1 (ref. 29). The extracted αpot and αdep were − 0.02 and − 0.58, respectively. These parameters are used for the subsequent software simulations. It is well known that a large number of states is preferred to enhance the performance of pattern recognition in a synaptic device26,27,28. In this context, it was also confirmed that the P–D characteristics for Npulse of 64 and 128 were achievable by delicately tuning the gate pulse, as shown in Figure S3.
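As a minimal sketch of how this conductance model can be evaluated, the Python function below returns G for a normalized state w. Three things here are assumptions rather than details from this work: the α = 0 branch is taken as the geometric interpolation Gmin(Gmax/Gmin)^w, which is the α → 0 limit of the first branch; the conductance window of 1 to 10 (arbitrary units) is hypothetical; and the mapping of 32 states onto evenly spaced w values is chosen only for illustration.

```python
def conductance(w, g_min, g_max, alpha):
    """Conductance for a normalized state w in [0, 1] and nonlinearity alpha."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    if alpha == 0:
        # Geometric interpolation, the alpha -> 0 limit of the branch below.
        return g_min * (g_max / g_min) ** w
    mixed = (g_max ** alpha - g_min ** alpha) * w + g_min ** alpha
    return mixed ** (1.0 / alpha)


# Hypothetical conductance window (arbitrary units); alpha values from the text.
G_MIN, G_MAX = 1.0, 10.0
ALPHA_POT, ALPHA_DEP = -0.02, -0.58

# Assumed mapping: 32 states spread evenly over w = 0 ... 1.
states = [k / 31 for k in range(32)]
pot_curve = [conductance(w, G_MIN, G_MAX, ALPHA_POT) for w in states]
dep_curve = [conductance(w, G_MIN, G_MAX, ALPHA_DEP) for w in states]
linear_curve = [conductance(w, G_MIN, G_MAX, 1.0) for w in states]
```

With α = 1 the model reduces to a straight line between Gmin and Gmax, which is the ideal linear case; the further α departs from 1, the more the curve bends.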

Figure 3

(a) ID versus Npulse for various P. The photoresponsivity is reduced by the depression. (b) Real-time ID for various P when Npulse is zero. ID is increased as P increases. (c) Real-time ID for various Npulse when P is 1 mW. ID is decreased as Npulse increases. (d) Retention characteristics of the MOST for various Npulse when P is 1 mW.

Figure 3b and c show the real-time ID for various P and Npulse, respectively, when the light is turned on and off. At a fixed Npulse, ID increased as P increased. At a fixed P, ID decreased as Npulse increased. It is worth noting that ID returned to the initial state when the light was off. This feature assures that the synaptic weight is not changed during optical sensing and that repetitive reset operations are not needed. As shown in Fig. 3d, ID was sustained even after 40,000 s owing to the superior retention characteristics of the CTL-based memory. This attribute has been proven by commercial flash memory adopting the CTL. It should be recalled that good retention characteristics of a synaptic device are crucial for reliable operation over time28.

Figure S4 shows the P–D characteristics of the MOST for various wavelengths (λ). Measurements were performed using blue (B), red (R), and infrared (IR) light sources. The λ of the B, R, and IR light is 405 nm, 638 nm, and 1550 nm, respectively. As shown in Figure S4, tunable photoresponsivity was observed for the visible B and R light, whereas it was not observed for the IR light. This is because the B and R light can generate photo-carriers to increase Iphoto, whereas the IR light cannot create them owing to its small photon energy of 0.80 eV compared to the silicon energy bandgap of 1.12 eV30,31. It should also be noted that the photoresponsivity for the B light was smaller than that for the R light because the penetration depth decreases with shorter λ32. The demonstrated wavelength dependency as well as the intensity dependency of the tunable photoresponsivity can help in recognizing a color-mixed pattern33,34.
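The photon-energy argument can be checked with the relation E = hc/λ. The short Python sketch below uses the exact SI values of the physical constants and the 1.12 eV silicon bandgap quoted in the text; the function and variable names are illustrative only.

```python
PLANCK = 6.62607015e-34            # Planck constant, J s
LIGHT_SPEED = 299792458.0          # speed of light in vacuum, m/s
ELECTRON_CHARGE = 1.602176634e-19  # joules per electron-volt
SI_BANDGAP_EV = 1.12               # silicon bandgap quoted in the text, eV


def photon_energy_ev(wavelength_nm):
    """Photon energy E = h c / lambda, converted from joules to eV."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK * LIGHT_SPEED / wavelength_m / ELECTRON_CHARGE


for name, wavelength_nm in [("B", 405), ("R", 638), ("IR", 1550)]:
    energy = photon_energy_ev(wavelength_nm)
    above_gap = energy > SI_BANDGAP_EV
    print(f"{name}: {energy:.2f} eV, exceeds Si bandgap: {above_gap}")
```

The B and R photons carry roughly 3.06 eV and 1.94 eV, both above the bandgap, while the 1550 nm photon carries about 0.80 eV and therefore cannot excite carriers across it.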

As mentioned above, BE tunneling dielectrics composed of the triple layers (hereafter referred to as BE layers) were adopted to reduce the operating voltage. To confirm this effect, simplified MOSTs were fabricated as a control group. The BE layers of OI/NI/OII were replaced by a single layer of thermal oxide (Osingle). The rest of the structure was kept the same. As plotted in Figure S5(a), the measured transfer characteristics of the fabricated MOST with Osingle/NII/OIII showed photoresponsivity similar to that with OI/NI/OII/NII/OIII. This is because the gate dielectric has no effect on photo-carrier generation by light. Whereas VT was shifted rightward by a VG,dep of 9 V in the case of OI/NI/OII/NII/OIII (Fig. 2), it was not changed by the same voltage in the case of Osingle/NII/OIII, as shown in Figure S5(b). A VG,dep larger than 11 V must be applied to change the VT and update the synaptic weight, as shown in Figure S5(b). As a consequence, the P–D characteristics in Figure S5(c) show that a synaptic weight update is impossible with the same VG,dep in the case of Osingle/NII/OIII. Therefore, it is confirmed that the gate dielectric structure of OI/NI/OII/NII/OIII is more attractive than that of Osingle/NII/OIII for low-power neuromorphic hardware.

Using a full set of the fabricated MOSTs, simple pattern recognition was performed with a single-layer perceptron (SLP). As illustrated in Fig. 4a, two images, ‘A’ of an off-diagonal pattern and ‘B’ of a diagonal pattern, were prepared. Each pattern comprises 2 × 2 black-and-white pixels, and classification of the two patterns was attempted. The neural network was composed of four input pixels labeled P1, P2, P3, and P4 and two nodes in the output layer labeled OA and OB, as depicted in Fig. 4b. Each pattern was recognized by detecting the output current of the MOSTs connected to each output node. The photoresponsivity that corresponds to the synaptic weight was preset to one of two binary values, the maximum photoresponsivity or the minimum photoresponsivity, taken from the data of Fig. 3a. The solid lines and the dashed lines in Fig. 4b represent the devices with the maximum photoresponsivity and the minimum photoresponsivity, respectively. Each photoresponsivity is represented as ‘R’ in the neural network configuration. This in-sensor processing, which combines image sensing and signal processing, performs real-time multiplication of the image with a memorized photoresponsivity matrix19. Figure 4c shows the circuit diagram used to construct the neural network of Fig. 4b. VG and VD were set to 0 V and 1 V, respectively. Each output was measured in the form of an output current, Iout,A or Iout,B; i.e., Iout,A was measured at the output node OA for the input image ‘A’ and Iout,B was measured at the output node OB for the input image ‘B’, as shown in Fig. 4d. As a result, inference for the simple pattern was experimentally verified. It is worth comparing the components required to distinguish the two simple patterns. The present approach, which is applicable to an in-sensor vision system, demands only eight MOSTs without extra photodetectors, ADCs, or synaptic devices. In contrast, a conventional approach for a conventional vision system may need four photodetectors, an ADC, and eight synaptic devices. Thanks to this in-sensor vision system, rapid classification within 1 ms was achieved with low power consumption under 150 nW. This is very small compared to the power consumption of an ADC used for a conventional vision system, which ranges from a few tens of μW to a few mW35,36.
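The classification principle of this eight-device network can be sketched in Python as follows. Several details are assumptions made only for illustration: the pixels are ordered row by row (P1 top-left, P2 top-right, P3 bottom-left, P4 bottom-right), a white pixel is treated as illuminated and a black pixel as contributing no current, and the per-device photocurrents of 0.1 μA and 0.1 nA are borrowed from the values quoted in the Fig. 2c caption rather than from the Fig. 4 measurement.

```python
I_MAX = 1e-7   # A, lit device at maximum photoresponsivity (assumed from Fig. 2c)
I_MIN = 1e-10  # A, lit device at minimum photoresponsivity (assumed from Fig. 2c)

# 1 = white (illuminated) pixel, 0 = black pixel; order P1, P2, P3, P4 (assumed).
PATTERNS = {
    "A": [0, 1, 1, 0],  # off-diagonal
    "B": [1, 0, 0, 1],  # diagonal
}

# Four MOSTs per output node, eight in total, each preset to a binary weight.
WEIGHTS = {
    "A": [I_MIN, I_MAX, I_MAX, I_MIN],
    "B": [I_MAX, I_MIN, I_MIN, I_MAX],
}


def output_currents(pixels):
    """Sum the photocurrents of the four devices tied to each output node."""
    return {
        node: sum(w * p for w, p in zip(weights, pixels))
        for node, weights in WEIGHTS.items()
    }


def classify(pixels):
    """Pick the output node that collects the larger summed current."""
    currents = output_currents(pixels)
    return max(currents, key=currents.get)


for label, pixels in PATTERNS.items():
    print(label, "->", classify(pixels), output_currents(pixels))
```

Under these assumptions the matching node collects about 0.2 μA while the other node collects about 0.2 nA, so a simple comparison of the two output currents separates the patterns.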

Figure 4

Demonstration of hardware-based pattern recognition. (a) Two input images ‘A’ (off-diagonal) and ‘B’ (diagonal), which are composed of 2 × 2 black-and-white pixels. (b) Neural network and (c) Circuit diagram for the 2 × 2 pattern recognition. They are composed of eight MOSTs with each tunable photoresponsivity represented as ‘R’. (d) Measured classification data of the off-diagonal and diagonal patterns. By comparing two output currents (Iout,A and Iout,B), fast classification within 1 ms was achieved with low power consumption under 150 nW.

To demonstrate recognition of more complex patterns such as hand-written digits in the MNIST dataset, a multi-layer perceptron (MLP) network composed of two hidden layers was constructed, as illustrated in Fig. 5a. The input layer corresponds to 528 input pixels, which were cropped from the 28 × 28 pixels, and the output layer corresponds to the 10 digits from 0 to 9. Each hidden layer is composed of 250 neurons. The MOSTs were located at the forefront of the network to detect the light intensity and to transmit the optical signals, pre-processed by the stored weights, to the first hidden layer. Each device has its own photoresponsivity corresponding to the synaptic weight, which is represented as ‘R’ in the neural network configuration. This simultaneous image sensing and signal processing allows real-time multiplication of the image with a memorized photoresponsivity matrix19. The photoresponse characteristics and the P–D characteristics in a dark environment, both measured from the fabricated MOSTs, were reflected in the software simulations. Figure 5b shows a flow chart that summarizes the simulation sequence used to reflect the measured photoresponse characteristics and electrical characteristics of the fabricated MOST. Iphoto is the drain current with light illumination (ID,light) and Idark is the reference drain current without light illumination (ID,dark). Except for the light being on or off, all other conditions are the same. Herein the ratio Iphoto/Idark, i.e., ID,light/ID,dark, is defined as γ, which is extracted from the experimental results. Prior to the simulation, γ was obtained for various light intensities (P) by linear interpolation of the measured data, as shown in Fig. 5c. To improve the simulation accuracy, this step was repeated for each synaptic state. The γ of each pixel was obtained by substituting the MNIST data into the interpolated curve, because the MNIST dataset represents the pixel intensity. Afterwards, the conductance of each synapse in a dark environment (Gdark), which was extracted from the P–D characteristics of Figure S2(b), was multiplied by γ. Because the applied VD of the MOST is 1 V, Gdark, defined as Idark/VD, is simplified to Idark. The multiplication thus results in Iphoto. Finally, Iphoto, which contains information on the pixel intensity and the photoresponsivity of the synapse, is transmitted to the first hidden layer for summation at each neuron. In detail, the current summed in the mth neuron in the first hidden layer (Im) can be represented by the following equation:

$$ I_{m} = \mathop \sum \limits_{n = 1}^{528} I_{photo,mn} = \mathop \sum \limits_{n = 1}^{528}\gamma_{mn} G_{dark,mn}, $$

where n = 1, 2, …, 528 and m = 1, 2, …, 250 denote the indices of the pixel and the neuron at the first hidden layer, respectively.
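The simulation step described above, interpolating γ from the light intensity and multiplying it by Gdark before summation, might be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the simulation code of this work: a single γ curve is shared by all synaptic states (the text repeats the interpolation for each state), the 0 to 255 pixel values are mapped linearly onto light power, and all numerical values and function names are hypothetical.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation over ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def hidden_layer_currents(pixels, g_dark, power_points, gamma_points, max_power):
    """Return I_m = sum_n gamma_mn * G_dark,mn for each hidden neuron m."""
    currents = []
    for row in g_dark:  # one row of dark conductances per hidden neuron
        total = 0.0
        for pixel, g in zip(pixels, row):
            power = pixel / 255.0 * max_power  # assumed linear pixel-to-power map
            gamma = interpolate(power, power_points, gamma_points)
            total += gamma * g                 # I_photo = gamma * G_dark (V_D = 1 V)
        currents.append(total)
    return currents


# Hypothetical gamma curve and a tiny 2-pixel, 2-neuron network.
power_points = [0.0, 0.5, 1.0]   # light power, arbitrary units
gamma_points = [1.0, 3.0, 5.0]   # gamma = I_photo / I_dark at those powers
pixels = [0, 255]
g_dark = [[2.0, 1.0],
          [1.0, 2.0]]
I_hidden = hidden_layer_currents(pixels, g_dark, power_points, gamma_points, 1.0)
print(I_hidden)  # [7.0, 11.0]
```

In the actual network the same calculation runs over 528 pixels and 250 hidden neurons, and the resulting currents are passed through the activation function of the first hidden layer.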

Figure 5

Demonstration of software-based pattern recognition. (a) Neural network for recognition of hand-written numbers in the MNIST dataset. Photoresponsive optical characteristics (sensory function) and non-volatile electrical characteristics (mnemonic and synaptic function) measured from the fabricated MOSTs are reflected at the forefront of the network. Measured electrical characteristics are reflected in normal synapses that are connected to the first hidden layer and the second hidden layer, or the second hidden layer and the output layer. (b) Software-based simulation sequence to reflect the measured characteristics of the MOST. (c) γ (≡Iphoto/Idark) as a function of the light intensity (P). Linear interpolation from the measured data is utilized to create extra data. (d) Simulated recognition accuracy according to the number of training epochs. Recognition rate of 85.7% is achieved, which is close to the upper limit of 88.3% by an ideal software-based algorithm.

For a normal synapse between the first hidden layer and the second hidden layer or between the second hidden layer and the output layer, only the electrical characteristics (e.g., P–D characteristics in a dark environment) were reflected because such a synapse does not respond to light owing to the absence of a photo-effect. The sigmoid activation function was adopted, and supervised learning with backpropagation was employed to update the synaptic weights of the MOSTs and the normal synapses. Figure 5d shows the simulated recognition accuracy according to the number of training epochs; the saturated recognition rate was 85.7%. This recognition rate is comparable to the upper limit of 88.3%, which is achievable by software-based pattern recognition simulations that directly multiply the MNIST dataset by the conductance of each synapse with ideal P–D characteristics of perfect linearity and symmetry; i.e., αpot = 1 and αdep = 1.


In summary, a mnemonic-opto-synaptic transistor (MOST) was demonstrated for an in-sensor vision system by embedding a non-volatile memory function into a photodetecting device. Because the threshold voltage of the MOST was controlled both by light illumination and by an electrical pulse, the photoresponsivity was tunable by changing the trapped electrons in the charge trap layer (CTL) that enable the non-volatile memory function. Thereby it performed triple functions: photoresponsivity memorizing as a memory cell, light-sensing as a photodetector, and weight updating as a synapse. At the forefront of the ANN, the MOST simultaneously detects light and generates a pre-processed signal to perform real-time multiplication of an image with a memorized photoresponsivity matrix in sensors. More advantageously, it does not require repetitive reset operations because of the invariant synaptic weight during the optical sensing (Table S6). Furthermore, it does not require external memory because of the inherent memory function of the CTL. In addition, the MOST can be integrated with a conventional CMOS image sensor composed of numerous small-sized pixels because it was fabricated with 100% CMOS compatible microfabrication.