Photo-induced non-volatile VO2 phase transition for neuromorphic ultraviolet sensors

In the quest for emerging in-sensor computing, materials that respond to optical stimuli in conjunction with a non-volatile phase transition are highly desired for realizing bioinspired neuromorphic vision components. Here, we report non-volatile multi-level control of VO2 films by oxygen stoichiometry engineering under ultraviolet irradiation. Based on the reversible regulation of VO2 films using ultraviolet irradiation and electrolyte gating, we demonstrate a proof-of-principle neuromorphic ultraviolet sensor with integrated sensing, memory, and processing functions at room temperature, and also demonstrate its potential for silicon compatibility through the wafer-scale integration of a neuromorphic sensor array. The device displays a linear weight update with optical writing because its metallic phase proportion increases almost linearly with the light dosage. Moreover, an artificial neural network incorporating this neuromorphic sensor can extract ultraviolet information from the surrounding environment and significantly improve the recognition accuracy from 24% to 93%. This work provides a path to design neuromorphic sensors and will facilitate potential applications in artificial vision systems.

Here, $G_n$ and $G_{n+1}$ denote the conductance after the nth and (n+1)th pulses, respectively. $G_{\min}$ and $G_{\max}$ are the minimum and maximum conductance. $\alpha_P$ and $\alpha_D$ are the differences in conductance between two points on the potentiation and depression curves. $\beta_P$ and $\beta_D$ denote the curvatures of the potentiation and depression curves (i.e., the nonlinearity, NL).
The fitted curves and the extracted fitting parameters of the experimental LTP/LTD data are shown in Supplementary Figure 6c-d and summarized in Supplementary Table 1, respectively.
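To make the roles of the step sizes and curvatures concrete, the sketch below uses one widely used phenomenological update form in which the conductance step decays exponentially as the device approaches its limit. This form is an assumption chosen for illustration and may differ from the expressions fitted to the experimental data; the function names and the numerical values in the usage lines are hypothetical.

```python
import math


def potentiate(g, g_min, g_max, alpha_p, beta_p):
    """Conductance after one potentiating (optical) pulse.

    Assumed form: the step alpha_p shrinks exponentially as g nears g_max;
    beta_p = 0 gives a perfectly linear update.
    """
    step = alpha_p * math.exp(-beta_p * (g - g_min) / (g_max - g_min))
    return min(g + step, g_max)


def depress(g, g_min, g_max, alpha_d, beta_d):
    """Conductance after one depressing (electrical) pulse (assumed form)."""
    step = alpha_d * math.exp(-beta_d * (g_max - g) / (g_max - g_min))
    return max(g - step, g_min)


def ltp_curve(n_pulses, g_min, g_max, alpha_p, beta_p):
    """Conductance values G_0 ... G_n, starting from g_min."""
    curve = [g_min]
    for _ in range(n_pulses):
        curve.append(potentiate(curve[-1], g_min, g_max, alpha_p, beta_p))
    return curve


# Hypothetical normalized values: zero curvature gives equal steps,
# a positive curvature gives steps that shrink pulse by pulse.
linear = ltp_curve(4, 0.0, 1.0, 0.25, 0.0)
saturating = ltp_curve(5, 0.0, 1.0, 0.2, 2.0)
```

In this form a smaller curvature parameter corresponds to a more linear weight update.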

Supplementary Note 2
The X-ray diffraction (XRD) pattern demonstrated that the structure of VO2 transformed from the monoclinic phase to the rutile phase under UV exposure (Supplementary Figure 12). Due to the lattice expansion associated with the monoclinic distortion, the XRD peak of VO2 shifted from 37.1° to 37.15°, corresponding to the Bragg angles of (200)M (2θ = 37.088°) and (011)R (2θ = 37.12°) in the XRD standard cards, respectively. This result indicated that, with the introduction of oxygen vacancies into the crystal lattice, VO2 underwent a structural phase transition from a low-symmetry phase to a high-symmetry phase.

Supplementary Note 3 Construction of convolution kernel
In convolution processing, the weighted average of the pixels in a small area of the input image becomes the corresponding pixel in the output image. The weight values are defined by a function called the convolution kernel. Convolution kernels are commonly used in image processing applications. For example, the Sobel operator, a filtering operation, is one such convolution kernel. A previous study (ref. 1) showed that the convolution kernels in a convolutional neural network acquire the function of local feature extraction after training. Here, we used the proposed device as a convolution kernel for extracting ultraviolet features in neural network applications. It is worth noting that this convolution kernel performs a weighted average operation on the RGB and UV values of a single pixel. Therefore, a device array with the same size as the input image can realize the whole convolution processing in one step.
The convolution kernel is essentially a weight function, so a set of weight functions based on the device characteristics needs to be defined for use in the subsequent computer simulation. Since the device exhibits a unique wavelength-dependent light response, the result of convolution processing based on the device characteristics is the sum of the responses to light of different wavelengths. The convolution operation is described by Supplementary Equation 3.

Substituting Supplementary Equations 4-7 into Supplementary Equation 3 yields the convolution kernel that describes the device characteristics. Supplementary Equations 4-7 show that the weight function of the UV value is much larger than that of the RGB values. After the weighted average, that is, the convolution processing, the UV information in each pixel dominates, while the RGB information is greatly suppressed.
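The per-pixel operation described in this note can be sketched as follows. The weight values are hypothetical placeholders chosen only so that the UV weight is much larger than the RGB weights; they are not the device-derived weight functions. Normalizing by the sum of the weights is likewise an assumption about how the weighted average is formed.

```python
# Hypothetical channel weights in the order (R, G, B, UV).
# Only the qualitative relation "UV weight >> RGB weights" is taken from the text.
WEIGHTS = (0.01, 0.01, 0.01, 1.0)


def convolve_pixel(pixel, weights=WEIGHTS):
    """Weighted average of the (R, G, B, UV) values of a single pixel."""
    weighted_sum = sum(w * v for w, v in zip(weights, pixel))
    return weighted_sum / sum(weights)


def convolve_image(image, weights=WEIGHTS):
    """Apply the single-pixel kernel to every pixel of an image.

    `image` is a list of rows, each row a list of (R, G, B, UV) tuples.
    Every pixel is processed independently, which mirrors a device array
    of the same size as the input image working in one step.
    """
    return [[convolve_pixel(px, weights) for px in row] for row in image]


# A pixel that is bright in the visible range but dark in the UV
# is mapped to a value close to zero, and vice versa.
visible_only = convolve_pixel((1.0, 1.0, 1.0, 0.0))
uv_only = convolve_pixel((0.0, 0.0, 0.0, 1.0))
```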

The preparation of test dataset
To compare the image recognition accuracy under different conditions, three test datasets were used. The first is the original MNIST test dataset downloaded from the website (http://yann.lecun.com/exdb/mnist/), which was used to verify the reliability of the simulated ANN built from the proposed devices. This test dataset includes 10,000 test images, each 28 × 28 pixels. It is worth noting that the test images are all grayscale, that is, each pixel is represented by one value. To store information of different colors in the images, the RGB mode was used. However, RGB values can only store visible light information, so a separate value was introduced to store the UV information. Subsequently, to design a set of images with fuzzy visible light information and clear UV information (i.e., characteristic information that humans cannot recognize but bees can), Gaussian noise was added to the RGB values while the values representing UV information were left unchanged, which formed the second test dataset. The third test dataset is the result of preprocessing the second dataset with the simulated convolution kernel array. In the third test dataset, the UV information is strengthened and the visible light information is weakened, which simulates how bees perceive and focus on UV information when collecting nectar.
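The construction of the second test dataset can be sketched as below. Several details are assumptions made for illustration: pixel values are normalized to [0, 1], the grayscale value is copied into all four channels before noise is added, noisy values are clipped back to [0, 1], and the noise level `sigma` is a free parameter. The function names are hypothetical.

```python
import random


def add_uv_channel(gray_image):
    """Expand each grayscale value v into an (R, G, B, UV) tuple (assumed mapping)."""
    return [[(v, v, v, v) for v in row] for row in gray_image]


def add_visible_noise(rgbuv_image, sigma, rng):
    """Add Gaussian noise to R, G and B while leaving the UV value unchanged."""

    def noisy(v):
        # Clip to the valid range after adding zero-mean Gaussian noise.
        return min(1.0, max(0.0, v + rng.gauss(0.0, sigma)))

    return [
        [(noisy(r), noisy(g), noisy(b), uv) for (r, g, b, uv) in row]
        for row in rgbuv_image
    ]


# Tiny 2 x 2 stand-in for a 28 x 28 MNIST image.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = add_uv_channel([[0.0, 1.0], [0.5, 0.25]])
blurred = add_visible_noise(clean, sigma=0.3, rng=rng)
```

Passing the result through the simulated convolution kernel array would then give the third test dataset.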

The architecture of ANN
The simulated three-layer ANN includes an input layer (784 neurons), a hidden layer (300 neurons), and an output layer (10 neurons). The activation functions of the hidden layer and output layer were ReLU and Softmax, respectively. It is worth noting that a synaptic weight, in practical applications, can take both positive and negative values, while the conductance of the proposed device is always positive. Therefore, each synaptic weight ($w$) in the ANN was determined by a pair of normalized conductance values, i.e., $w = (G^{+} - G^{-})/G_{\max}$.
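A minimal sketch of this architecture is given below, assuming that each signed weight is the difference of the two conductances in a device pair divided by the maximum conductance. Bias terms are omitted, which is an assumption, and all function names are hypothetical. The layer sizes follow from the shapes of the weight matrices, so the same code covers the 784-300-10 network or the tiny example in the usage lines.

```python
import math


def weight_from_pair(g_plus, g_minus, g_max):
    """Signed synaptic weight from two always-positive conductances."""
    return (g_plus - g_minus) / g_max


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights):
    """Fully connected layer; weights[i][j] links input i to output j."""
    n_out = len(weights[0])
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_out)
    ]


def forward(inputs, w_hidden, w_out):
    """Input -> ReLU hidden layer -> Softmax output layer."""
    hidden = relu(dense(inputs, w_hidden))
    return softmax(dense(hidden, w_out))


# Tiny 2-2-2 example with hypothetical conductance pairs.
w_hidden = [
    [weight_from_pair(0.75, 0.25, 1.0), weight_from_pair(0.25, 0.75, 1.0)],
    [weight_from_pair(0.5, 0.5, 1.0), weight_from_pair(0.5, 0.5, 1.0)],
]
w_out = [[1.0, 0.0], [0.0, 1.0]]
probabilities = forward([1.0, 0.0], w_hidden, w_out)
```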

Weight update method
For weight updating, the back-propagation algorithm was used in the training process. The weight update rule was defined using the experimental LTP/LTD data. Long-term potentiation (LTP) and long-term depression (LTD) were achieved by applying optical pulses of 375 nm wavelength and electrical pulses ranging from −1.5 V to −3.5 V, respectively.
Next, the sign of the weight change was calculated from the output value and the label value to determine whether the synaptic weight needed to increase (potentiation) or decrease (depression). In addition, to prevent overfitting, if $|\Delta w| < 5 \times 10^{-4}$, the conductance of w was not updated; otherwise, the conductance was updated as follows. Since at most one pulse was applied to each device per update, there were three ways to update the conductance of a device: applying an optical pulse, applying an electrical pulse, or applying no pulse. Since w was determined by the conductance of the paired devices, there were nine (3 × 3) update combinations in total. The optimal combination was then used to update the conductance.
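The selection among the nine combinations can be sketched as follows. Two points are assumptions made for illustration: "optimal" is taken to mean the combination whose resulting weight is closest to the target weight, and each pulse changes the conductance by a fixed hypothetical `step` rather than by the fitted LTP/LTD curves. The function and constant names are hypothetical.

```python
from itertools import product

THRESHOLD = 5e-4  # weight changes smaller than this are skipped
ACTIONS = ("optical", "electrical", "none")


def apply_pulse(g, action, step, g_min, g_max):
    """Conductance after at most one pulse (fixed-step stand-in for LTP/LTD)."""
    if action == "optical":  # potentiation
        return min(g + step, g_max)
    if action == "electrical":  # depression
        return max(g - step, g_min)
    return g


def update_pair(g_plus, g_minus, delta_w, step=0.05, g_min=0.0, g_max=1.0):
    """Return the updated (g_plus, g_minus) for a requested weight change."""
    if abs(delta_w) < THRESHOLD:
        return g_plus, g_minus  # change too small: leave the pair untouched

    target = (g_plus - g_minus) / g_max + delta_w
    # Three actions per device give 3 x 3 = 9 candidate pairs.
    candidates = [
        (
            apply_pulse(g_plus, act_plus, step, g_min, g_max),
            apply_pulse(g_minus, act_minus, step, g_min, g_max),
        )
        for act_plus, act_minus in product(ACTIONS, repeat=2)
    ]
    # Keep the pair whose weight lands closest to the target.
    return min(candidates, key=lambda p: abs((p[0] - p[1]) / g_max - target))


# A requested increase of 0.1 is best met by potentiating the positive
# device and depressing the negative one.
new_plus, new_minus = update_pair(0.5, 0.5, 0.1)
```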

Calculation of learning accuracy
After the parameters were updated, the three test datasets were sent to the ANN for image recognition, and the recognition accuracy of each dataset was calculated by:

$$\mathrm{Accuracy} = \frac{n}{N} \times 100\%$$

where Accuracy denotes the recognition accuracy, n is the number of correctly identified images, and N is the total number of images in the test dataset, which was 10,000.