Curved neuromorphic image sensor array using a MoS2-organic heterostructure inspired by the human visual recognition system

Conventional imaging and recognition systems require an extensive amount of data storage, pre-processing, and chip-to-chip communication, as well as aberration-proof light focusing with multiple lenses, to recognize an object from massive optical inputs. This is because separate chips (i.e., a flat image sensor array, a memory device, and a CPU), in conjunction with complicated optics, must capture, store, and process massive image information independently. In contrast, human vision employs a highly efficient imaging and recognition process. Here, inspired by the human visual recognition system, we present a novel imaging device for efficient image acquisition and data pre-processing by conferring the neuromorphic data processing function on a curved image sensor array. The curved neuromorphic image sensor array is based on a heterostructure of MoS2 and poly(1,3,5-trimethyl-1,3,5-trivinyl cyclotrisiloxane). It features photon-triggered synaptic plasticity owing to its quasi-linear time-dependent photocurrent generation and prolonged photocurrent decay, which originate from charge trapping in the MoS2-organic vertical stack. Integrated with a plano-convex lens, the curved neuromorphic image sensor array derives a pre-processed image from a set of noisy optical inputs without redundant data storage, processing, and communication, and without complex optics. The proposed imaging device can substantially improve the efficiency of the image acquisition and recognition process, a step toward next-generation machine vision.

…which components were patterned (i.e., MoS2, graphene, etc.) into a serpentine mesh, or if other features were included in the sensor device to accommodate the strains resulting from the induced curvature.

Schematic diagram that describes the overall architecture for image acquisition, data pre-processing, and data post-processing by using cNISA, a customized data acquisition system, and post-processors.

Comment #2:
We can visualize that the overall processing is sequential, but our visual system performs the processing in parallel. How is the model brain-inspired?
Our response to comment #2: We appreciate the reviewer's comment. The human visual recognition system features efficient imaging and data processing, based on the synaptic plasticity of the neural network in the brain as well as single-lens imaging with the curved retina in the eye. We therefore adopted these two features of the human visual recognition system in our imaging device to achieve efficient image acquisition and data pre-processing in a single integrated device. Although the overall processing in our system is still sequential once the post-processing step is considered, our device retains efficiency-related features inspired by the human visual recognition system with regard to image acquisition and data pre-processing. We revised the manuscript to clarify these points.
Our modification to the manuscript:
(Line 1, page 4: in the revised main text) "A distinctive feature is extracted in the neural network from the visual information acquired by the human eye15,16, which is used for image identification based on memories17."
(Line 22, page 4: in the revised main text) "The cNISA integrated with a single plano-convex lens realizes unique features of the human visual recognition system, such as imaging with simple optics and data processing with photon-triggered synaptic plasticity."
(Line 8, page 5: in the revised main text) "In addition, the neural network exhibits high efficiency for classification of unstructured data by deriving distinctive features of the input data based on the synaptic plasticity14 (i.e., short-term plasticity (STP) and long-term potentiation (LTP); Fig. 1a inset); the intensity of the post-synaptic output signal is weighted by the frequency of pre-synaptic inputs15."
(Line 18, page 5: in the revised main text) "The photon-triggered electrical responses, which are similar to synaptic signals in the neural network, are enabled by the MoS2-pV3D3 heterostructure and result in a weighted electrical output from optical inputs (Fig. 1b inset)."
(Line 2, page 13: in the revised main text) "Distinctive features of the human visual recognition system, such as the curved retina with simple optics and the efficient data processing in the neural network with synaptic plasticity, have inspired the development of a novel imaging device for efficient image acquisition and pre-processing with a simple system construction."
(Figure 1a: in the revised manuscript) Figure 1a | Schematic illustration of the human visual recognition system comprised of a single human-eye lens, a hemispherical retina, optic nerves, and a neural network in the visual cortex. The inset schematic shows the synaptic plasticity (i.e., STP and LTP) of the neural network.

Comment #3:
The article is hard to comply with a complex organization. They present the figures after the reference and in a separate file also. This organization confuses readers. Besides, some pictures appear in the paper several times.
Our response to comment #3: We thank the reviewer for the comment. In the original manuscript, we included each figure with its corresponding caption only once, but the same figures were automatically appended at the end of the manuscript again by the submission system of Nature Communications. This may have caused confusion, and we apologize for it. This issue will be resolved during the editing procedure. In addition, we revised the manuscript to avoid redundancy among some figure frames.
Our modification to the manuscript:
(Line 18, page 3: in the revised main text) "Such iterative computing steps1 as well as multi-lens optics13 (Supplementary Fig. 1a) in the conventional imaging device increase system-level complexity."
(Line 24, page 5: in the revised main text) "In case of a conventional imaging system with a conventional processor (i.e., von Neumann architecture; Fig. 1c top), a flat image sensor array responds to incoming light (i.e., optical inputs) focused by multi-lens optics10 (Supplementary Fig. 3a) and generates a photocurrent proportional to the intensity of applied optical inputs23."
(Line 5, page 9: in the revised main text) "The synaptic weight (An/A1), a ratio of the photocurrent generated by n optical pulses (An) to the photocurrent generated by a single optical pulse (A1), is defined to analyze the contrast quantitatively (Supplementary Fig. 9)."
(Line 2, page 12: in the revised main text) "A customized data acquisition system including current amplifiers and an analog-to-digital converter (ADC) enables the photocurrent measurement from cNISA (Fig. 4d)."
Figure 17a | Exploded schematic illustration that shows the electrical connection between cNISA and the customized data acquisition system through the ACF.

Comment #4:
Their motivation is mainly adopting the visual recognition system in the sensor. It is desired to discuss the relevant works in literature and the performance comparison with them.
Our response to comment #4: We appreciate the reviewer's comment. As the reviewer pointed out, there have been innovative works on neuromorphic image sensors. We summarized the specifications of our curved neuromorphic imaging device and compared them with those of the relevant works.
Our modification to the manuscript: (Line 12, page 13: in the revised main text) "Such a curved neuromorphic imaging device could enhance the efficiency for image acquisition and data pre-processing as well as simplify the optics to miniaturize the overall device size, and thus has a potential to be a key component for efficient machine vision applications. Note that detailed specifications of the curved neuromorphic imaging device are compared to those of the relevant neuromorphic image sensors in Supplementary Table 3." (Line 2, page 21: in the revised main text)

Supplementary Table 3 | Specifications of neuromorphic image sensors.
Comparison of the current curved neuromorphic imaging device with state-of-the-art neuromorphic image sensors in terms of device type, key materials, device format, type of optical inputs, number of pixels, optics, and applications.

Comment #5:
The authors miss referring to recent work such as "Bio-inspired smart vision sensor: toward a reconfigurable hardware modeling of the hierarchical processing in the brain" or "Event-Based Reconfigurable Hierarchical Processors for Smart Image Sensors." It would be interesting to observe the distinction among these works.
Our response to comment #5: We thank the reviewer for the comment. The bio-inspired image sensors with reconfigurable hierarchical processors reported in the suggested references are highly promising options toward machine vision applications. We revised the manuscript to include these references and discuss the necessity of highly efficient post-processors for machine vision applications.

Our modification to the manuscript:
(Line 17, page 13: in the revised main text) "Nevertheless, additional processors for data post-processing, which extract features from the pre-processed image data and identify the target object, are still necessary for machine vision applications46,47. Therefore, further device units for efficient post-processing of the pre-processed image data should still be integrated36, although the pre-processed image can be efficiently obtained by cNISA."

Thank you very much again for your insightful comments. We feel that these comments have helped to improve the quality of the manuscript significantly.

Reviewer #2:
Summary Comments: The paper describes a concept and initial prototype for a combined visual imager and object-recognition neural network based on the hardware implementation of a photon-triggered MoS2-pV3D3-PTr artificial synapse. The paper is generally well written, and the figures clearly communicate the concepts and ideas. The paper merges material synthesis and characterization, electronic theory and modeling, and a prototype device. This work attempts to recreate certain characteristics of biological vision systems, including plasticity and potentiation, using the optically stimulated field-effect transistor. Overall the paper is original work and demonstrates several potential advantages of thin-film electronics and 2D materials, including the ability to be curved into 3D shapes to reduce optical complexity. The authors are commended on the extension of the excellent electronic materials development and modeling discoveries to a fully functional prototype imaging system. In the reported work, the sensitivity of the optical modulation is clearly visible and functional, but the dynamics occur over a range of seconds, limiting the practical applicability of this specific material and phototransistor design. The paper presents an initial implementation of a physical neural network within this optical context; for comparison, other groups focusing on crossbar networks for purely electrical signals (e.g., microphone outputs) have used hidden layers and training to perform very low-power signal pattern recognition.
Our response to summary comments: We sincerely appreciate the reviewer's precise comments, which were very helpful in improving the quality of our manuscript. The time-dependent photocurrent dynamics in our device confer the neuromorphic data processing function (i.e., synaptic plasticity) on the image sensor, although some issues remain to be solved in the future. In this regard, we would like to emphasize that the main contribution of this work is the development of a novel imaging device for efficient image acquisition and data pre-processing through a single readout operation in a simple miniaturized device. Previously, these processes (i.e., image acquisition and data pre-processing) have required complicated optics as well as massive data storage, processing, and communications across multiple devices. Although future studies are needed for further integration and optimization of the post-processing part toward practical applications, this device is expected to become a key component among various types of image sensors. We modified our manuscript to convey these points.
Our modification to the manuscript:
(Line 18, page 4: in the revised main text) "We herein present a curved neuromorphic image sensor array (cNISA) using a heterostructure of MoS2 and poly(1,3,5-trimethyl-1,3,5-trivinyl cyclotrisiloxane) (pV3D3), aiming at aberration-free image acquisition and efficient data pre-processing with a single integrated neuromorphic imaging device (Supplementary Fig. 1d)."
(Line 18, page 5: in the revised main text) "The photon-triggered electrical responses, which are similar to synaptic signals in the neural network, are enabled by the MoS2-pV3D3 heterostructure and result in a weighted electrical output from optical inputs (Fig. 1b inset)."
(Line 22, page 6: in the revised main text) "By a single readout of the electrical output, cNISA can derive a pre-processed image from a set of noisy optical inputs. Therefore, massive data storage, numerous data communications, and iterative data processing that have been required to obtain the pre-processed image data in conventional systems are not necessary7,12."
(Line 7, page 11: in the revised main text) "The remaining image can be immediately erased, if needed, by applying a positive gate bias (e.g., Vg = 1 V) (Fig. 3f). The positive gate bias facilitates de-trapping of holes in the MoS2-pV3D3 heterostructure, which removes the photogating effect and returns pV3D3-PTrs to the initial state29. Therefore, the subsequent image acquisition and pre-processing can proceed without interference from the afterimage of the previous imaging and pre-processing step36."
(Line 12, page 13: in the revised main text) "Such a curved neuromorphic imaging device could enhance the efficiency for image acquisition and data pre-processing as well as simplify the optics to miniaturize the overall device size, and thus has a potential to be a key component for efficient machine vision applications."
(Line 17, page 13: in the revised main text) "Nevertheless, additional processors for data post-processing, which extract features from the pre-processed image data and identify the target object, are still necessary for machine vision applications46,47. Therefore, further device units for efficient post-processing of the pre-processed image data should still be integrated36, although the pre-processed image can be efficiently obtained by cNISA."

Comment #1:
… (Fig 4). For example, what is the lens type, focal length, radius of curvature, etc.? Was any aperture used for the imaging testing?
Our response to comment #1: We thank the reviewer for the comment. We used a plano-convex lens whose radius of curvature is 13.127 mm. The focal length, indicating the distance between the lens and cNISA, was set to be 17.045 mm. In addition, a front aperture was used to block stray light. We revised the original manuscript to provide details on the optical experiments and the experimental setup.
Our modification to the manuscript:
(Line 16, page 6: in the revised main text) "The detailed optical analyses for the plano-convex lens and cNISA in comparison with the conventional imaging system are described in Supplementary Note 1 and Supplementary Tables 1 and 2."
(Line 2, page 12: in the revised main text) "A customized data acquisition system including current amplifiers and an analog-to-digital converter (ADC) enables the photocurrent measurement from cNISA (Fig. 4d)."

Our response to comment #3: We thank the reviewer for the comment. We revised the manuscript to explain in more detail our strategies for fabricating the concavely hemispherical image sensor array without mechanical fractures.
Our modification to the manuscript:
(Line 18, page 11: in the revised main text) "By employing an ultrathin device structure37,38 (~2 μm thickness including encapsulations) and using intrinsically flexible materials (i.e., graphene39, MoS240-42, and pV3D343), we could fabricate a mechanically deformable array. We also adopted a strain-releasing mesh design44,45, added patterns to fragile materials (i.e., Si3N4), and located the array near the neutral mechanical plane. Therefore, the strain induced on the deformed array was less than 0.053% (Supplementary Fig. 16). As a result, the array can be integrated on a concavely curved surface without mechanical failures (Fig. 4c). Additional details of the mechanical analyses are described in Supplementary Note 4 and Supplementary Fig. 16d."
(Line 9, page 6: in the revised Supplementary Note 4) "Supplementary Note 4. Mechanical analyses of the curved neuromorphic image sensor array. Finite element analysis of the strain distribution of cNISA on a hemispherical substrate was carried out using COMSOL Multiphysics software (COMSOL Inc., USA). The mesh-patterned PI film, whose thickness is 2 μm, was deformed along the hemispherical substrate, whose bending radius is 11.3 mm. Conformal contact was assumed. The first principal mechanical strain, with plasticity of PI, was calculated using an initial yield stress of 24.8 MPa, an isotropic tangent modulus of 1.39 GPa, and a Poisson's ratio of 0.4."
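As a quick plausibility check on the reported strain bound (not a substitute for the COMSOL analysis above), the nominal peak bending strain of a thin film bent to the substrate radius can be estimated as t/(2R). The sketch below uses the 2 μm film thickness and 11.3 mm bending radius quoted in Supplementary Note 4:

```python
# Back-of-the-envelope estimate, not the finite element analysis itself:
# for a thin film of thickness t bent to radius R, with the neutral plane
# at the film mid-plane, the peak surface strain is roughly t / (2R).
t_film = 2e-6    # film thickness: ~2 um (ultrathin device structure)
r_bend = 11.3e-3 # bending radius of the hemispherical substrate: 11.3 mm

strain = t_film / (2.0 * r_bend)  # dimensionless
strain_percent = 100.0 * strain
print(f"nominal peak bending strain: {strain_percent:.4f} %")  # ~0.0088 %
```

The nominal estimate sits comfortably below the reported 0.053% upper bound; the full finite element model accounts for the serpentine mesh pattern, the multilayer stack, and the plasticity of PI, so the two numbers are not expected to coincide.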

Comment #4:
The MoS2/pV3D3 clearly exhibits response to optical inputs; however, the presented experiments in Figure 2 do not provide complete details on the optical portion of the experiment. Can the authors provide the required optical illumination intensity (W/cm2) used in the experiments in Figs 3 and 4? What is the spectrum?
Our response to comment #4: We appreciate the reviewer's comment. We added the normalized emission spectrum and the optical illumination intensity of the white light-emitting diode that we used in this work.

Our modification to the manuscript:
(Line 11, page 16: in the revised main text) "A white light-emitting diode whose intensity is 0.202 mW cm−2 was used as a light source for the device characterization. The emission spectrum of the white light-emitting diode is shown in Supplementary Fig. 20."
(Line 5, page 17: in the revised main text) "The programmed white optical pulses with durations of 0.5 sec, intervals of 0.5 sec, and intensities of 0.202 mW cm−2 were irradiated onto cNISA for illumination of a series of 20 noisy optical inputs (Supplementary Fig. 19)."

Comment #5: In figure 3, it is clear that there are no hidden layers that would store weighting functions to translate the built up potentiation to recognition of specific objects. It would be helpful if the authors could elaborate on the potential advantages or disadvantages to including additional processing layers in close proximity to the imaging surface.
Our response to comment #5: We thank the reviewer for the comment. As the reviewer pointed out, image recognition requires several steps, including image acquisition, data pre-processing, and data post-processing. Although our curved neuromorphic imaging device enables efficient image acquisition and data pre-processing, additional processors for post-processing of the pre-processed data are still necessary for machine vision applications. We modified the manuscript to discuss these points, including potential advantages and disadvantages.
Our modification to the manuscript: (Line 21, page 13: in the revised main text) "Neuromorphic processors (e.g., a memristor crossbar array) enable efficient post-processing of the pre-processed image data in terms of fast computation and low power consumption11. The combination of cNISA with such neuromorphic processors would be helpful for demonstrating machine vision applications, although massive data storage and communications between them are still required due to their isolated architectures. In this regard, the development of a fully integrated system, which can perform all steps from image acquisition to data pre-/post-processing in a single device, is an important goal for future research. The development of such technologies would be a step forward toward high-performance machine vision."

Thank you very much again for your insightful comments. We feel that these comments have helped to improve the quality of the manuscript significantly.
Other minor modifications:
#1 The order of authors was changed according to their contributions to the revision, one more affiliation of the corresponding author was added, and one more acknowledgement of funding was added.
(Line 16, page 2: in the revised main text) "The cNISA integrated with a plano-convex lens derives a pre-processed image from a set of noisy optical inputs without redundant data storage, processing, and communications as well as without complex optics. The proposed imaging device can substantially improve efficiency of the image acquisition and recognition process, a step forward to the next generation machine vision."
"The cNISA receives optical inputs through a single lens, which can simplify the optical system construction (Supplementary Fig. 3b)."

Our response to summary comments: We sincerely appreciate the reviewer's evaluation of our revised work.

Reviewer #2:
Summary Comments: Manuscript clarity and completeness have been significantly improved with the revision. The additional information and experimental details will help others in the field with understanding and benefiting from this work.
Our response to summary comments: We thank the reviewer for the positive evaluation of our revised work.

Thank you very much again for your insightful comments. We feel that these comments have helped to improve the quality of the manuscript significantly.
Modifications upon the editor's request: #1 Abstract was revised.
(Line 9, page 2: in the revised main text) "Here, inspired by the human visual recognition system, we present a novel imaging device for efficient image acquisition and data pre-processing by conferring the neuromorphic data processing function on a curved image sensor array. The curved neuromorphic image sensor array (cNISA) is based on a heterostructure of MoS2 and poly(1,3,5-trimethyl-1,3,5-trivinyl cyclotrisiloxane) (pV3D3). The curved neuromorphic image sensor array features photon-triggered synaptic plasticity owing to its quasi-linear time-dependent photocurrent generation and prolonged photocurrent decay, originated from charge trapping in the MoS2-organic vertical stack. The curved neuromorphic image sensor array integrated with a plano-convex lens derives a pre-processed image from a set of noisy optical inputs without redundant data storage, processing, and communications as well as without complex optics."
"For quantitative comparison, a linearity factor (α), a degree of linearity of the photocurrent increase with respect to the illumination time (Iph ∝ t^α), is analyzed. As α approaches 1, the photocurrent increase becomes linear. However, if α is much larger than 1, the photocurrent increases nonlinearly and becomes saturated shortly, which hinders efficient pre-processing of data1. The linearity factor of pV3D3-PTr (α_pV3D3) and that of Al2O3-PTr (α_Al2O3) are obtained by fitting log(Iph) with respect to log(t), where α_pV3D3 (1.52) is closer to unity than α_Al2O3 (2.50)."
(Line 13, page 8: in the revised main text) "The analytical model, Iph(t) = I1(1 − exp(−t/τ1)) + I2(1 − exp(−t/τ2)), consists of two exponential photocurrent generation terms with time constants (τ1 and τ2) and photocurrent coefficients (I1 and I2)."
(Line 17, page 8: in the revised main text) "The pV3D3-PTr exhibits a large photocurrent coefficient ratio (I2,pV3D3/I1,pV3D3 = 11.03) and large τ2 (τ2,pV3D3 = 12.85 sec), resulting in a quasi-linear photocurrent generation function after series expansion of the exponential function (Iph(t) ≈ I2,pV3D3(t/τ2,pV3D3); Supplementary Fig. 7a). In contrast, the control device (i.e., Al2O3-PTr) exhibits a much smaller photocurrent coefficient ratio (I2,Al2O3/I1,Al2O3 = 0.95) and smaller τ2 (τ2,Al2O3 = 4.39 sec), thus showing non-linear photocurrent generation (Supplementary Fig. 7b)."
(Line 22, page 8: in the revised main text) "The total decay time becomes longer with more frequent optical inputs. The decay time constant (τdecay), the time required for the photocurrent to decay to 1/e of an initial value, of pV3D3-PTr is dependent on the number of applied optical pulses (Supplementary Fig. 8). The decay time constants for LTP and STP (τdecay,LTP and τdecay,STP) are 8.61 sec and 1.43 sec, respectively (red line and black line in Fig. 2h), and the retention times for LTP and STP are 3,600 sec and 1,200 sec, respectively (Supplementary Fig. 9)."
(Line 7, page 9: in the revised main text) "The synaptic weight (An/A1), a ratio of the photocurrent generated by n optical pulses (An) to the photocurrent generated by a single optical pulse (A1), is defined to analyze the contrast quantitatively."
(Line 13, page 9: in the revised main text) "Therefore, pV3D3-PTr exhibits a larger synaptic weight (A25/A1) of 5.93 than Al2O3-PTr with A25/A1 of 2.89 upon the irradiation of 25 optical pulses (Supplementary Fig. 10), leading to a better contrast in the neuromorphic imaging and pre-processing."
(Line 7, page 10: in the revised main text) "The spatial distribution of the charge density difference (Δρ), in which negative Δρ indicates the existence of potential hole trapping sites, was computationally analyzed.
The MoS2-pV3D3 heterostructure exhibits a spatially inhomogeneous distribution of Δρ in both the in-plane and out-of-plane directions (Fig. 2k and its inset), compared to the relatively homogeneous distribution in the MoS2-Al2O3 heterostructure (Supplementary Fig. 13b and its inset), due to the irregular geometry of the polymeric pV3D3 structure.
Such an inhomogeneous distribution of Δρ results in the complex spatial and energy distribution of the potential hole trapping sites which are required for the active interfacial charge transfer35."
(Line 19, page 10: in the revised main text) "A set of noisy optical inputs (Im), successively incident to the array, induces a weighted photocurrent (Iph,n) in each pixel (Pn) (Fig. 3a). For example, Iph,n changes gradually by the irradiation of nine optical inputs (I1-I9; Supplementary Fig. 14)."
(Line 9, page 11: in the revised main text) "The remaining image can be immediately erased, if needed, by applying a positive gate bias (e.g., Vg = 1 V) (Fig. 3f)."
Based on the photocurrent measurement data in Fig. 2f, the time constants (τ1, τ2) and the ratio of photocurrent coefficients (I2/I1) of pV3D3-PTr and Al2O3-PTr were estimated. The estimated τ1, τ2, and I2/I1 of pV3D3-PTr are 0.68 sec, 12.85 sec, and 11.03, respectively, and those of Al2O3-PTr are 0.80 sec, 4.39 sec, and 0.95, respectively.
Using these estimated parameters, we can analyze the contribution of each photocurrent term in the model, i.e., Iph,1(t) = I1(1 − exp(−t/τ1)) and Iph,2(t) = I2(1 − exp(−t/τ2)), to the overall photocurrent Iph(t)."
"In the case of pV3D3-PTr, I2,pV3D3 is an order of magnitude larger than I1,pV3D3 (I2,pV3D3/I1,pV3D3 ≈ 11.03). In addition, τ2,pV3D3 is 18.90 times larger than τ1,pV3D3. As a result, Iph,2(t) is much larger than Iph,1(t), which leads to the quasi-linear increase of the overall photocurrent (Supplementary Fig. 7a). The overall photocurrent can thus be approximated as Iph(t) ≈ I2,pV3D3(t/τ2,pV3D3) for small t (i.e., t/τ2,pV3D3 ≪ 1)."
"The interfacial charge density difference, Δρ = ρMoS2,B − ρMoS2 − ρB, where the subscript B denotes the dielectric (e.g., pV3D3 or Al2O3), was also computed to investigate the charge density distribution and potential charge trapping sites at the interfaces of the MoS2-pV3D3 and MoS2-Al2O3 heterostructures."
(Line 2, page 5: in the revised Supplementary Note 5) "Although the PL intensity of the MoS2-pV3D3 heterostructure is consistent with that of the as-grown MoS2, the intensity ratio between the charged exciton and the A exciton (IA−/IA) increases upon pV3D3 deposition, which suggests a small electron doping effect by the deposited pV3D3 layer13."
Figures 4e-4h | e-h, Demonstrations for deriving a pre-processed image from massive noisy optical inputs (e.g., acquisition of a pre-processed C-shape image (i), decay of the memorized C-shape image (ii), erasure of the afterimage (iii), and acquisition of a pre-processed N-shape image (iv)). Figure 4e shows applied optical inputs and an applied electrical input. Figure 4f shows the obtained images at each time point. Figures 4g and 4h show the photocurrent obtained from the indicated pixels at each time point.
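The two-term photocurrent model and the parameter estimates quoted above can be checked numerically. The sketch below (with I1 normalized to 1, since only the ratio I2/I1 is reported) evaluates the relative contribution of each term and the small-t linear expansion of the dominant term:

```python
import math

# Parameter estimates quoted in the response (fits to the Fig. 2f data);
# I1 is normalized to 1 because only the ratio I2/I1 is reported.
params = {
    "pV3D3-PTr": {"tau1": 0.68, "tau2": 12.85, "I2_over_I1": 11.03},
    "Al2O3-PTr": {"tau1": 0.80, "tau2": 4.39, "I2_over_I1": 0.95},
}

def photocurrent_terms(t, tau1, tau2, I2_over_I1):
    """Return (Iph,1, Iph,2) for Iph(t) = I1(1-exp(-t/tau1)) + I2(1-exp(-t/tau2))."""
    iph1 = 1.0 - math.exp(-t / tau1)
    iph2 = I2_over_I1 * (1.0 - math.exp(-t / tau2))
    return iph1, iph2

t = 10.0  # seconds, within the illumination window discussed in the text
for name, p in params.items():
    iph1, iph2 = photocurrent_terms(t, p["tau1"], p["tau2"], p["I2_over_I1"])
    print(f"{name}: tau2/tau1 = {p['tau2'] / p['tau1']:.2f}, "
          f"Iph,2/Iph,1 at t = {t:.0f} s: {iph2 / iph1:.2f}")

# Quasi-linear small-t approximation of the dominant term for pV3D3-PTr:
# I2(1 - exp(-t/tau2)) ~= I2 * (t/tau2) when t/tau2 << 1.
p = params["pV3D3-PTr"]
t_small = 1.0
_, iph2 = photocurrent_terms(t_small, p["tau1"], p["tau2"], p["I2_over_I1"])
linear = p["I2_over_I1"] * t_small / p["tau2"]
print(f"linear-expansion error at t = {t_small} s: {abs(linear - iph2) / iph2:.1%}")
```

Consistent with the text, τ2/τ1 evaluates to 18.90 for pV3D3-PTr, the τ2 term dominates the pV3D3-PTr photocurrent at long illumination times while it does not for Al2O3-PTr, and the linear expansion of the dominant term deviates by only a few percent at t = 1 s.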