Introduction

Color vision is the ability to perceive differences between light composed of different wavelengths, providing organisms with substantial environmental adaptivity1. Light passes through the cornea and lens and forms an inverted image on the retina at the back of the eye. The retina contains two types of photoreceptors: rods and cones2. Color information is detected in daylight by cones and transmitted to the brain for color perception3. Unlike a digital camera, which obtains color information through filters and processes it with centralized, sequential, and binary operations, color vision in a biological visual system relies on cone-type photoreceptors that selectively respond to light of three wavelengths and encode it into spike trains for event-driven, temporally correlated, and parallel processing4,5,6,7,8. As a result, the feature size and response time of image sensors such as charge-coupled devices (CCDs) or complementary metal oxide semiconductor (CMOS) sensors are crucial to the efficiency of intelligent tasks such as image segmentation and object recognition9, which inevitably require enormous throughput as well as energy consumption10,11. There are three types and ~6 million cones in the human eye12,13, each of which consumes roughly hundreds of picowatts, and together they enable us to discriminate more than 1 million colors in a very compact configuration, outperforming most digital sensors14. Hence, developing biologically plausible artificial photoreceptors, especially the cone type, would give rise to visual systems with exquisite visual perception and extremely high energy efficiency and would advance related areas such as prosthetics5,15,16, neurorobotics17,18, and cyborgs19.

Hardware spiking cone photoreceptors (SCPs) respond differently to light composed of different wavelengths and encode it into spikes at a certain rate. Early work pursued the emulation of essential synaptic functions of the visual neural system and the development of light-sensitive synaptic devices, aiming to mimic short-term/long-term memory with respect to light20,21,22. For example, an optic-neural synaptic device based on an optical sensing transistor and a synaptic transistor connected in series was proposed, which is able to classify color-mixed patterns with a fully connected optic-neural network20. More recently, artificial spiking receptors have attracted great interest, based on a consensus that information encoded in spike rate is extremely energy-efficient and very robust to noise23,24,25,26,27,28,29,30. A spiking photoreceptor based on an optical sensor and an oscillation neuron in series exhibited efficient edge image segmentation out of a complex background, representing a pioneering and feasible approach toward SCPs10. However, a capacitor-free and more compact configuration is required to achieve a smaller footprint. Furthermore, high biological plausibility is still challenging, and essential properties with respect to energy consumption, range of spiking rate, selectivity to different wavelengths of light, and so on have not been well realized.

Here, we demonstrate a vertically integrated SCP (VISCP), which is capable of converting light into spike trains and discriminating light composed of different wavelengths at an ultralow energy consumption. The VISCP is built based on a vertically integrated configuration of indium–tin–oxide (ITO)/tantalum-oxide (Ta2O5)/Ag/indium–gallium–zinc-oxide (IGZO)/ITO. The power consumption in response to visible light is ≤400 pW, and the spiking rate of such a VISCP is ~0.1–1200 Hz, which is very close to the response range of biological cones14. Such devices have been verified with three wavelengths of light and exhibit a high selectivity (>1.5 orders of magnitude), which enables the discrimination of different combinations of these lights. This high selectivity also facilitates the demonstration of color-blind test simulations. Handwritten digits rendered in a color-blind test style serve as the testing dataset, and differences in recognition accuracy can be observed between devices with and without the ability to discriminate mixed colors. This oxide-based VISCP could be regarded as a building block for hardware spiking neural networks with sophisticated color perception.

Results

Structure and characterization of the VISCP

In biological visual systems, photoreceptors (rod and cone cells) convert external optical stimuli into spiking potentials, which eventually form vision in the brain, as shown in Fig. 1a. Rod cells are sensitive to dim light but lack color-distinguishing ability31. Cone cells, which work in bright environments, contain three types of light-sensitive pigments (red, green, and blue). Cone cells encode specific wavelengths of light into spikes with specific frequencies, which are the basis of color vision32. These photoreceptor cell bodies are precisely arranged in clusters in the apical region of the eye disc and project axons into the brain’s optic lobe (Fig. 1b)33. Inspired by the cone cells, a VISCP is proposed with a device configuration of ITO/Ta2O5/Ag/IGZO/ITO (Fig. 1c–e), and the response to colored light is conceptually illustrated in Fig. 1f. The VISCP consists of an IGZO-based photoresistor and a Ta2O5-based spike-encoder, and is capable of directly transducing persistent light into spike trains whose rate depends on the light intensity. This enables the discrimination of optical patterns whose pixels differ in intensity, similar to rod cells. To mimic the properties of cone cells, however, a VISCP should also respond differently to light of different wavelengths (e.g., the responses to wavelengths of λ1, λ2, and λ3 as shown in Fig. 1g). In this case, colorful patterns such as a color-blind test image can be transformed into a pattern with a high contrast ratio in terms of spiking rate, which facilitates the recognition of such patterns.

Fig. 1: The vertically integrated spiking cone photoreceptors (VISCP) and their biological counterparts.

a Biological photoreceptors convert specific colors of optical stimuli into spiking potentials. The coded information is ultimately sent to the brain for further processing. The portraits of the colorful parrot and the woman’s eye are reproduced with permission from Pexels. b The photoreceptor cell bodies of wild Drosophila are connected to the brain’s optic lobe. Reproduced with permission33. Copyright 2004, Company of Biologists. c Digital image of the oxide-based VISCP array on a two-inch silicon wafer. The whole array contains 5 × 5 sub-arrays, and each sub-array consists of 8 × 8 VISCP devices. d Micrograph of a group of oxide-based VISCPs showing the micropillar structure. e Sectional view of a micropillar illustrating the vertically integrated layers. f Schematic illustration of color-blindness image perception in an artificial spiking cone photoreceptor array (ITO/Ta2O5/Ag/IGZO/ITO). g The spiking frequency as a function of light intensity at different wavelengths. h A 3D stereo image of the firing rate after recognition by the artificial spiking cone photoreceptors.

Electrical characterizations of the spike-encoder and photoresistor

The spike-encoder is an ITO/Ta2O5/Ag-based threshold switching (TS) memristor, as shown in the inset of Fig. 2a. The I–V characterization exhibits the typical TS property (Fig. 2a). A sweep voltage was applied to the top electrode with a compliance current of 1 µA, and the bottom ITO electrode was grounded. When the applied positive voltage exceeds the threshold voltage (VTH), Ag conductive filaments (CFs) are formed to bridge the top and bottom electrodes, switching the memristor from the high resistance state (HRS) to the low resistance state (LRS). The formation of CFs is dominated by cation migration and redox processes34,35,36,37,38,39. When the voltage sweeps back and falls below the hold voltage (VHOLD), the CFs rupture spontaneously due to interfacial energy minimization40,41, returning the memristor to the HRS. The TS characteristics exhibit no obvious degradation during 500 consecutive cycles. The VTH and VHOLD increase with the sputtering time of the Ta2O5 layer, as shown in Supplementary Fig. 1. The HRS–LRS switching speed of the TS memristor is shown in Fig. 2b. A driving pulse with an amplitude of 2.5 V and a duration of 2.0 µs is applied to the memristor. A fast switching-on speed of ~40 ns and a recovery time of ~55 ns after the driving pulse can be observed by applying a reading voltage with an amplitude of 0.05 V and a duration of 2.0 µs. The TS behavior is accompanied by a negative differential resistance (NDR) effect, in which the voltage decreases as the applied current increases:

$$R=\frac{\mathrm{d}V}{\mathrm{d}I} < 0 \tag{1}$$

Fig. 2: Electrical characterizations of the photoresistor and spike-encoder.

a Current–voltage curves of the ITO/Ta2O5/Ag memristor over 500 sweep loops. b The V–T characteristics of the memristor and the switching speed between high and low resistance states. c Frequency-dependent specific capacitance of the Ta2O5-based TS memristor, with the inset showing the equivalent circuit. d The firing frequency as a function of IIN, showing a linear relationship; the short-dashed line is a linear fit. e The resistance of the IGZO sensor as a function of light intensity at different wavelengths. Inset: schematic of the Ag/IGZO/ITO sensor. f The transient electrical characteristics of the IGZO sensor under different wavelengths with an applied voltage of 0.2 V.

In Supplementary Fig. 2, the NDR effect can be observed through current sweeping. The voltage decreases as the applied current increases, resulting in a negative resistance when the TS memristor voltage reaches the threshold voltage (VTH). The NDR effect is attributed to the formation of the Ag filament, resulting in a sharp drop in resistance above VTH. Such an NDR effect provides the basis for the oscillation of a spike-encoder42,43. The parasitic capacitance (CP) of the TS memristor is estimated to be 2.5-5 μF/cm2 at the frequency range between 1 and 1000 Hz, as shown in Fig. 2c. The equivalent circuit of the device is shown in the inset of Fig. 2c. The spike-encoder takes full advantage of such high parasitic capacitance, enabling a capacitor-free configuration, simplifying the structure, and offering greater potential for further scaling down.

The output spikes (VOUT) can be observed by applying a current bias (IIN), as shown in Fig. 2d. The current bias charges the parasitic capacitor as long as the Ta2O5-based TS memristor is in its HRS. This charging process increases VOUT until it reaches VTH, at which point the memristor switches from HRS to LRS. Due to the reduced resistance, the parasitic capacitor discharges, and VOUT drops rapidly. When the voltage falls below VHOLD, the TS memristor recovers to HRS, and CP is charged again. As a consequence, the charging/discharging process, along with the spontaneous resistance switching (HRS→LRS→HRS) of the TS memristor, underlies the oscillation of VOUT. Supplementary Fig. 3 clearly shows the charging and discharging processes in a single spike. An increase in input current accelerates the charging process and leads to an increase in the spiking rate of VOUT. The spiking rate plotted as a function of the input current (Fig. 2d) exhibits a nearly linear relationship. As the input current (IIN) increases from 5 to 500 nA, the spiking rate of VOUT increases from 25 Hz to 4 kHz. The inset shows the typical response with IIN = 5 nA, which depicts the voltage-spiking behavior. The spiking behavior for the other input currents is shown in Supplementary Fig. 4.
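The charge/discharge cycle described above can be illustrated with a minimal sketch of an ideal relaxation oscillator, in which a constant current charges CP from VHOLD to VTH and the discharge is treated as instantaneous, giving f ≈ IIN / (CP · (VTH − VHOLD)). This is an illustrative estimate rather than the model used in this work. It assumes that CP scales with the 200 µm Ag electrode area given in Methods, that the VTH and VHOLD values are the mean values from the device-variation statistics, and that HRS leakage is negligible; the function names are hypothetical.

```python
import math


def parasitic_capacitance(area_cm2, c_specific_uf_per_cm2):
    """Capacitance in farads from device area and specific capacitance."""
    return area_cm2 * c_specific_uf_per_cm2 * 1e-6


def ideal_spiking_rate(i_in_a, c_p_f, v_th, v_hold):
    """Rate (Hz) of an ideal relaxation oscillator: f = I / (C * (V_TH - V_HOLD)).

    Assumes constant-current charging, instantaneous discharge, no leakage.
    """
    return i_in_a / (c_p_f * (v_th - v_hold))


# Assumed inputs for illustration:
AG_DIAMETER_CM = 200e-4                       # 200 um Ag electrode (Methods)
AREA_CM2 = math.pi * (AG_DIAMETER_CM / 2) ** 2
V_TH, V_HOLD = 0.27, 0.04                     # mean values quoted in the text (V)

for c_spec in (2.5, 5.0):                     # reported range, uF/cm^2
    c_p = parasitic_capacitance(AREA_CM2, c_spec)
    for i_in in (5e-9, 500e-9):               # reported current range, A
        rate = ideal_spiking_rate(i_in, c_p, V_TH, V_HOLD)
        print(f"C_P = {c_p * 1e9:.2f} nF, I_IN = {i_in * 1e9:.0f} nA -> ~{rate:.0f} Hz")
```

Under these assumptions the estimate at 5 nA falls between roughly 14 and 28 Hz, which brackets the measured 25 Hz. At 500 nA the ideal model gives roughly 1.4 to 2.8 kHz, somewhat below the measured 4 kHz, so the sketch captures the order of magnitude and the proportionality to IIN rather than the exact values.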

The artificial photoreceptor is an Ag/IGZO/ITO-based photoresistor, as shown in Fig. 2e. The resistance state of the IGZO-based photoresistor in response to increasing light intensity under different wavelengths is also presented. The band gap of perfect IGZO films is about 3.5 eV, which can absorb high-energy photons (e.g., UV light with a wavelength of 360 nm)44. The introduction of oxygen vacancies in the IGZO film can be achieved by controlling the growth atmosphere (oxygen–gas partial pressure relative to argon) during the sputtering deposition process. The oxygen vacancies lead to a defect energy level lower than 3.5 eV, which enables a certain level of sensing capability to visible lights45,46. Supplementary Figure 5 shows the optical absorption spectra of the IGZO film. It clearly shows that the light absorption of the IGZO film decreases with increasing wavelength. Therefore, light with different wavelengths can induce significant differences in conductance. Lights with three wavelengths were used in this work. The wavelength gaps among the three lights are large enough to enable the differentiation by the IGZO-based photoresistor, as shown in Fig. 2e. Figure 2f displays the transient response to different wavelengths with a light intensity of 0.5 nW/μm2. Significant differences can be observed in the resistance state of the IGZO-based photoresistor among different wavelengths, indicating the capability of color selectivity. The IGZO photoresistors with different sputtering times are shown in Supplementary Fig. 6. It shows that the resistance increases with the thickness of the IGZO layer.
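To relate the three illumination wavelengths to the ~3.5 eV band gap quoted above, the photon energy can be computed from E = hc/λ. The short sketch below is only a unit conversion; the constant hc ≈ 1239.84 eV·nm is a standard physical value, and the function name is hypothetical.

```python
HC_EV_NM = 1239.84   # h*c expressed in eV*nm
BAND_GAP_EV = 3.5    # nominal IGZO band gap quoted in the text


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a wavelength given in nm."""
    return HC_EV_NM / wavelength_nm


for wavelength in (360, 405, 532):
    energy = photon_energy_ev(wavelength)
    print(f"{wavelength} nm -> {energy:.2f} eV "
          f"({BAND_GAP_EV - energy:.2f} eV below the nominal gap)")
```

The 360 nm photons (about 3.44 eV) lie close to the nominal band edge, while the 405 nm (about 3.06 eV) and 532 nm (about 2.33 eV) photons lie progressively further below it, which is consistent with the decreasing absorption at longer wavelengths and the role of defect levels in the visible response.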

Vertically integrated VISCP and color selectivity

The vertically integrated ITO/Ta2O5/Ag/IGZO/ITO device, which combines spike-encoding and light-sensing properties, enables the mimicking of cone functions. The equivalent circuit of the integrated VISCP is shown in Fig. 3a. A constant voltage (VBias = 0.5 V) and ground voltage were applied to the top and bottom ITO electrodes, respectively. The output voltage (VOUT) was measured on the Ag electrode. A more detailed description of the design of the VISCP is given in Supplementary Fig. 7. A detailed structural and chemical characterization of the IGZO and Ta2O5 films is presented in the Supplementary Information (Supplementary Figs. 8 and 9). The device monolithically encodes persistent light into a spike train with a certain frequency, as shown in Fig. 3b. The VISCP rests in a dark environment. In contrast, it continuously encodes and fires under persistent light illumination (λ = 360 nm, P = 0.03 nW/μm2). The frequency of output spikes in the VISCP increases with light intensity and with decreasing wavelength (Fig. 3c). Lights with wavelengths of 360, 405, and 532 nm were used as stimulations in this work, and the oxide-based VISCP distinguishes strongly among these ‘colors’. There is no overlap among the frequency ranges in response to these wavelengths. In this case, the spiking rate can convey both color and intensity information of the light stimulation. Figure 3d shows the experimental observation of the spiking in response to the three wavelengths at a fixed intensity of 0.5 nW/μm2. The spiking frequencies of the VISCPs were 1200, 7, and 0.1 Hz for lights with wavelengths of 360, 405, and 532 nm, respectively. The effective input of the memristor depends on the resistance of the IGZO-based photoresistor, which is wavelength- and intensity-dependent and modulates the spike frequency. For the wavelength of 360 nm, as the light intensity decreases from 0.5 to 0.03 nW/μm2, the frequency decreases from 1200 Hz to 37 Hz, as shown in Fig. 3e. In summary, the VISCPs exhibit color selectivity in bright environments but are inactive in the dark, similar to biological cones.
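The wavelength selectivity implied by the spiking frequencies reported above can be expressed in orders of magnitude as the base-10 logarithm of the ratio between adjacent wavelengths. The sketch below uses only the three frequencies quoted at 0.5 nW/μm2; the dictionary and function names are hypothetical.

```python
import math

# Spiking frequencies (Hz) reported at a fixed intensity of 0.5 nW/um^2.
RATES_HZ = {360: 1200.0, 405: 7.0, 532: 0.1}


def orders_of_magnitude(rate_a_hz, rate_b_hz):
    """Separation between two spiking rates in decades (log10 of their ratio)."""
    return abs(math.log10(rate_a_hz / rate_b_hz))


wavelengths = sorted(RATES_HZ)
for shorter, longer in zip(wavelengths, wavelengths[1:]):
    decades = orders_of_magnitude(RATES_HZ[shorter], RATES_HZ[longer])
    print(f"{shorter} nm vs {longer} nm: {decades:.2f} orders of magnitude")
```

Both adjacent pairs are separated by more than 1.5 decades (about 2.2 between 360 and 405 nm, and about 1.8 between 405 and 532 nm), in line with the selectivity of >1.5 orders of magnitude stated for the device.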

Fig. 3: The spike-encoding behavior and color selectivity of the vertically integrated spiking cone photoreceptors (VISCP) under light illumination.

a The equivalent circuit of the integrated VISCP. b The VISCP rests in a dark condition and fires spikes under light with a wavelength of 360 nm. c The firing frequency plotted as a function of the light intensity at different wavelengths. d Experimental observation of the VISCP firing at three different frequencies under 360, 405, and 532 nm light. The light intensity was kept constant at 0.5 nW/μm2. e Experimental observation of the VISCP firing under various intensities at a wavelength of 360 nm.

The device-to-device variation data are shown in Supplementary Fig. 10. The Gaussian fit in Supplementary Fig. 10a reveals a certain amount of variation in VTH (~0.27 ± 0.06 V) and VHOLD (~0.04 ± 0.017 V) (the amplitude of the output spike train). The frequency statistics of different devices in response to the three wavelengths are shown in Supplementary Fig. 10b. The spike frequencies of the VISCPs are clearly distinguished at the three illumination wavelengths. Table 1 compares several artificial visual nerves/photoreceptors with the human eye photoreceptor10,14,18,20,47,48,49,50,51,52,53. Our work features both spike encoding and color perception. It has a low power consumption (≤400 pW per spike) in response to visible light, similar to the photoreceptors of the human eye, and closely mimics the information encoding scheme of its biological counterpart.

Table 1 Summary of the reported basic performance of the artificial visual nerve/photoreceptors and our device

Color-blind image recognition

The color perception of the biological visual system depends on the cone’s response to the ratio of red, green, and blue. The VISCP spike rates increase as the wavelength decreases from 532 nm (λ1) to 405 nm (λ2) and 360 nm (λ3). We utilized the three wavelengths as pseudo-colors and performed a mixed-color test as shown in Fig. 4a. Four mixed colors were defined by the power percentage of the combined lights: 100% λ1 for ‘red’, 50% λ1 and 50% λ2 for ‘orange’, 50% λ2 and 50% λ3 for ‘olive’, and 100% λ3 for ‘green’. The total intensity of each light input is fixed at 0.5 nW/μm2. The spike rates in response to red, orange, olive, and green lights increase exponentially, as shown in Fig. 4b. The spike rate for red light is the lowest at 0.2 Hz, while the rate reaches the highest value of 1200 Hz under green light irradiation. The difference in spike rate between adjacent colors is over one order of magnitude, indicating the excellent selectivity of the VISCP in distinguishing mixed colors. Modified MNIST handwritten digit images in the style of a color blindness test, consisting of randomly distributed circles in several similar colors, were generated in MATLAB for image preprocessing. In this work, the main body pixels of a handwritten digit were randomly painted orange and red, while the background pixels were randomly painted olive and green. In this way, a dataset for red-green color blindness, the most common type of color blindness, can be generated with a size of 280 × 280. Generally, people with red-green color blindness have difficulty telling red from green, especially for mixed colors that contain red or green, such as orange and olive. As shown in Fig. 4c, we simulate the behaviors of individuals with normal color vision and color blindness. The device with excellent selectivity to the four ‘colors’, as demonstrated in Fig. 4b, was analogous to an individual with normal color vision. A simulated device that can only differentiate ‘red’ and ‘green’ and cannot differentiate ‘red/orange’ and ‘green/olive’ was analogous to an individual with red-green color blindness. The parameters for simulation are extracted from Fig. 4b (for simulation details, see Supplementary Note 3). Five thousand treated images were mapped to light matrices, which can trigger the spiking responses of the VISCP array. The spiking rate of each VISCP was then measured, and the resulting rate maps served as the preprocessed images for further processing. The preprocessed images were fed into a five-layer convolutional neural network for recognition, with 90% of the images used for training and the rest for testing (for simulation details, see Supplementary Note 4). Figure 4d shows the recognition accuracy during 30 training epochs for the devices with and without mixed-color selectivity. Although the treated digits are more complex than the original MNIST digits and a relatively low recognition accuracy is observed, the VISCP ‘eye’ with color selectivity can identify the target digit from the mixed-color background with an accuracy of ~83.2%. In contrast, the ‘eye’ without mixed-color selectivity shows great difficulty, and a low accuracy of only 75.5% was achieved. These results mimic human color perception, incorporating sensing, rate encoding, and recognition. We also show that color selectivity has a significant positive effect on the recognition accuracy of complex objects, mirroring the difference between humans with normal color vision and those with color blindness.
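The preprocessing step described above, in which a color-coded digit image is turned into a map of spiking rates, can be sketched as follows. This is a simplified illustration, not the MATLAB pipeline or the simulation of Supplementary Notes 3 and 4. The ‘red’ and ‘green’ rates are the values quoted in the text; the ‘orange’ and ‘olive’ rates are placeholders chosen only so that adjacent colors differ by more than one order of magnitude, and the 8 × 8 mask, the log-scale normalization, and all names are assumptions made for the example. The color-blind device variant is not modeled here.

```python
import math
import random

RED, ORANGE, OLIVE, GREEN = "red", "orange", "olive", "green"

# Spike rates in Hz. Red and green are quoted in the text; orange and olive
# are PLACEHOLDER values for illustration, not measured data.
RATE_HZ = {RED: 0.2, ORANGE: 5.0, OLIVE: 80.0, GREEN: 1200.0}


def paint_labels(digit_mask, rng):
    """Paint digit pixels red/orange and background pixels olive/green at random."""
    return [
        [rng.choice((RED, ORANGE)) if is_body else rng.choice((OLIVE, GREEN))
         for is_body in row]
        for row in digit_mask
    ]


def encode_log_rates(labels):
    """Map each color label to its spike rate, rescaled to [0, 1] on a log axis."""
    lo = math.log10(min(RATE_HZ.values()))
    hi = math.log10(max(RATE_HZ.values()))
    return [[(math.log10(RATE_HZ[c]) - lo) / (hi - lo) for c in row]
            for row in labels]


# Toy 8 x 8 mask with a vertical bar standing in for a handwritten digit.
mask = [[2 <= r <= 5 and 3 <= c <= 4 for c in range(8)] for r in range(8)]
labels = paint_labels(mask, random.Random(0))
features = encode_log_rates(labels)
```

With rates separated by more than a decade, every digit pixel lands in the lower half of the normalized range and every background pixel in the upper half, which is the kind of contrast a downstream classifier can exploit.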

Fig. 4: Color-mixed pattern recognition.

a The oscillation waveforms under different combinations of wavelengths (red, orange, olive, and green). b The firing rate for different ratios of wavelength components. c Comparison of color blindness recognition results with and without color selectivity. d The evolution of recognition accuracy as a function of training epochs with and without color selectivity.

Discussion

Future apparatuses that are intended to interact with humans and/or the environment can benefit greatly from biological systems with highly sophisticated perceptual and sensorimotor capabilities. The biological counterparts enable energy-efficient and autonomous interactions with the real world, where the signals are always unstructured, non-normalized, and fragmented. The sensing information in conventional systems is encoded into amplitudes (analog) or represented by binary signals (digital), which are thought to be data-intensive and highly redundant. Moreover, the physical separation of sensing, memory, and processing in these systems aggravates the computational burden. Encoding external stimuli into spikes could be regarded as the most biologically plausible coding scheme, and a converter that is capable of spike-encoding could be regarded as the core of a future bionic system.

In this work, the vertically integrated oxide-based devices, which are capable of monolithically converting light into spikes, represent a step forward toward artificial visual systems with high biological plausibility. More importantly, the large response range corresponding to three wavelengths of light makes it possible to discriminate ‘colors’. An ultralow power consumption of ≤400 pW per spike in response to visible light can be achieved. As a proof of concept, such devices were implemented to mimic color perception. The devices with mixed-color selectivity showed a higher recognition accuracy for MNIST handwritten digits in a color-blind test style than the devices without such selectivity. These results reveal the great potential of such devices for constructing an artificial visual system with high energy efficiency and high biological plausibility. Future improvements could be devoted to manufacturing large-scale arrays capable of processing high-resolution images or even videos with the aid of the necessary peripheral circuits. With further integration with spiking neural networks, energy efficiency and accuracy might be improved. Furthermore, a more biologically plausible system would become available by translating the devices into a flexible/stretchable form, just like the biological cones in the retina.

Methods

Fabrication of the vertically integrated VISCP

The VISCP is built based on a vertically integrated configuration of ITO/Ta2O5/Ag/IGZO/ITO. First, an ITO bottom electrode with a thickness of 100 nm was deposited on a pre-cleaned silicon substrate by radio frequency (RF) magnetron sputtering for 10 min in a pure argon ambient at 0.8 Pa using an ITO target (90 wt% In2O3 and 10 wt% SnO2). A Ta2O5 (80 nm) switching layer was deposited by magnetron sputtering for 30 min using a Ta2O5 target (100 wt% Ta2O5). During this sputtering process, the RF power and the Ar:O2 ratio were 100 W and 30:2, respectively. The patterned circular Ag intermediate electrodes (200 µm in diameter) were deposited on the Ta2O5 switching layer by thermal evaporation through a metal mask. Then, the 55 nm IGZO sensitized layers (120 μm in diameter) were deposited on the Ag (100 nm) electrodes by RF magnetron sputtering using IGZO targets (In:Ga:Zn = 2:2:1 atomic ratio). During this sputtering process, the RF power was 100 W and the Ar:O2 ratio was 15:15, with a deposition time of 30 min. Finally, the ITO top electrodes (70 μm in diameter) were deposited by RF magnetron sputtering using an ITO target in a pure argon atmosphere at 0.8 Pa for 10 min.

Device characterization

The threshold switching characteristics and encoding performance of the devices were tested with the Fs-Pro PX500. For the VISCP measurements, fiber-coupled laser modules (CNI, Laser PGL-FC-360, Laser PGL-FC-405, Laser PGL-FC-532) were used to apply persistent light at 360, 405, and 532 nm to the top of the device. The switching times of the devices were measured using a Keithley 4200 semiconductor parameter analyzer. The capacitance of Ta2O5 was measured with a HIOKI IM 3533-01 LCR instrumentation impedance analyzer. Cross-sectional images of the ITO/Ta2O5/Ag/IGZO/ITO optical encoding nerve components were acquired by field emission scanning electron microscopy (JEOL, JSM-7000F). All the devices in this work were measured in a probe station in an atmospheric environment. The humidity and temperature were ~50% RH and ~300 K, respectively.