Higher-dimensional processing using a photonic tensor core with continuous-time data

New developments in hardware-based ‘accelerators’ range from electronic tensor cores and memristor-based arrays to photonic implementations. The goal of these approaches is to handle the exponentially growing computational load of machine learning, which currently requires a doubling of hardware capability approximately every 3.5 months. One solution is to increase the data dimensionality that is processable by such hardware. Although two-dimensional data processing by multiplexing space and wavelength has been previously reported, three-dimensional processing has not yet been implemented in hardware. In this paper, we introduce the radio-frequency modulation of photonic signals to increase parallelization, adding a further dimension to the data alongside spatially distributed non-volatile memories and wavelength multiplexing. We leverage this higher-dimensional processing to configure the system into an architecture compatible with edge computing frameworks. Our system achieves a parallelism of 100, two orders of magnitude higher than implementations using only the spatial and wavelength degrees of freedom. We demonstrate this by performing a synchronous convolution of 100 clinical electrocardiogram signals from patients with cardiovascular diseases, and by constructing a convolutional neural network capable of identifying patients at sudden death risk with 93.5% accuracy.

Radio-frequency modulation of optical signals increases the parallelization of photonic processors beyond that afforded by exploiting the spatial and wavelength dimensions alone. The approach is demonstrated on electrocardiogram signals and identifies patients at sudden death risk with 93.5% accuracy.


Article
https://doi.org/10.1038/s41566-023-01313-x

…units 13, field-programmable gate arrays 14 and application-specific integrated circuits 15. In addition, perhaps the most notable recent advance is the use of memristive crossbar arrays for analogue in-memory computing 16-18. Various mechanisms have been explored to store memories in the physical states of materials (redox 19, phase change 20, ferroelectric 21 and magnetoresistive 22) to enable such in-memory computing. A memristive crossbar array with M inputs and K outputs mathematically represents a matrix of dimension d K×M that contains K d 1×M kernels. Multiplication and addition operations are performed according to Ohm's law and Kirchhoff's law, respectively. The input data use the spatial degree of freedom (DOF) and form a one-dimensional (1D) array X 1D = (x 1 x 2 … x M)^T representing a d M×1 vector, leading to one d K×M × d M×1 MVM per operation cycle (Fig. 1a).

Photonic MVM is emerging as a next-generation alternative with the advantages of low latency, low energy consumption and high DOFs 23,24. Compared with electronic data transmission, which is inherently limited by capacitive delay and the energy consumed to charge/discharge electronic integrated circuits, photons transmit data at the speed of light with near-zero power consumption 25. Photonic MVM can access a huge terahertz bandwidth, compared with the gigahertz bandwidth accessible by electronics, opening the possibility of high parallelism by exploiting the wavelength DOF, that is, wavelength-division multiplexing (WDM). Traditionally, photonic MVM was implemented by light diffraction in free space, an approach that continues to inspire computing architectures 26. In the past decade, photonic MVM using photonic integrated circuits (PICs) has flourished 27,28 owing to the development of scalable on-chip dense integration of optical waveguide components 29,30. Notable progress includes the demonstration of PIC-based MVM processors based on cascaded Mach-Zehnder interferometer arrays using coherent light as the data carriers and thermo-optic phase shifters as weighting elements 31. Broadcast-and-weight PIC-based MVM processors using light at different wavelengths as data carriers and tunable microring resonator add-drop filters as weighting elements have also been developed 32. More recently, optical frequency comb technology was introduced with PIC-based MVM processors to provide a high-quality multiwavelength light source with dense wavelength spacing 33,34. A record high of 11 tera operations per second has been realized using a single optical frequency comb with the wavelength-and-time interleaving technique 33. The latest advance in delocalized photonic deep learning shows the advantages of using PIC-based MVM processors at the Internet's edge 35. In addition, it is worth noting that a photonic counterpart of an electronic crossbar array has been demonstrated 34. The passive photonic crossbar array uses waveguide directional couplers and crossings as interconnects and phase-change materials (PCMs) as memories (optical transmissions tuned by the non-volatile crystalline state of the PCM 36).

In all the PIC-based MVM processors, two DOFs are accessible by the input data, that is, space and wavelength, allowing a two-dimensional (2D) array input (Fig. 1b). Here Q d M×1 input vectors, each carried by a different wavelength λ q, can be processed in parallel, leading to one d K×M × d M×Q matrix-matrix multiplication (equivalent to Q d K×M × d M×1 MVMs). A parallelism (defined as the number of MVMs per operation cycle of a physical device) of 4 using a photonic crossbar array and WDM has been realized 34. Recently, a similar endeavour to increase data dimensionality was reported in electronic crossbar arrays by exploring the continuous-time data representation 37.

Here we demonstrate a computing architecture in hardware that allows three-dimensional (3D) array inputs for higher-dimensional MVM by simultaneously exploiting three DOFs, that is, space, wavelength and radio frequency (RF). The input data are a 3D array: X 3D represents multiple d M×N matrices, each carried by a wavelength λ q, when N RF components (f 1 to f N) and Q wavelengths are used (Fig. 1c). The 3D array input is processed by an electro-optically controlled photonic tensor core with reconfigurable non-volatile PCM memories to enable photonic in-memory computing. Our system effectively implements one d K×M × d M×(N×Q) matrix-matrix multiplication per operation cycle (equivalent to Q × N MVMs) and achieves a remarkable ultrahigh parallelism of 100, two orders of magnitude higher than the previous implementation 34 using only two DOFs. This higher-dimensional processing advantage allows our system to hugely accelerate common artificial-intelligence-type processing tasks. We demonstrate this by realizing the synchronous convolution of 100 clinical electrocardiogram (ECG) signals from cardiovascular disease (CVD) patients and facilitating a convolutional neural network (CNN) to identify patients at sudden death risk with 93.5% accuracy. Increasing the dimensionality from 1D to 2D to 3D data processing by exploiting additional DOFs, the system parallelism increases from 1 to (Q or N) to Q × N, providing a viable path for ultraparallel photonic computing.

Fig. 1 | Data representation in higher dimensions. a, Discrete-time data representation only. b, Discrete-time data representation with WDM. c, Data representation in higher dimensions.
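The 1D-to-2D-to-3D scaling described above can be sketched numerically. The following NumPy snippet is illustrative only: the sizes K = M = 3, N = 50 and Q = 2 mirror the demonstrated configuration, and `einsum` stands in for the analogue optical core. It shows how one stored d K×M matrix services 1, Q or Q × N MVMs per operation cycle depending on the input dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N, Q = 3, 3, 50, 2          # outputs, inputs, RF tones, wavelengths
W = rng.random((K, M))            # d_KxM weight matrix stored in the tensor core

# 1D input (spatial DOF only): one MVM per cycle
x1 = rng.random((M, 1))
y1 = W @ x1                       # parallelism = 1

# 2D input (space + wavelength): Q MVMs per cycle
X2 = rng.random((M, Q))
Y2 = W @ X2                       # parallelism = Q

# 3D input (space + wavelength + RF): Q*N MVMs per cycle
X3 = rng.random((M, N, Q))
Y3 = np.einsum('km,mnq->knq', W, X3)   # parallelism = Q * N = 100

# each (n, q) slice of the result is an ordinary MVM
assert np.allclose(Y3[:, 7, 1], W @ X3[:, 7, 1])
```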

Data architecture and working principle
The proposed computing architecture utilizes a continuous-time data representation instead of the traditional discrete-time data representation, adding RF as a third DOF for data input. Figure 2 conceptually illustrates the data architecture and working principle of using the continuous-time data representation. An example of matrix-matrix multiplication without using WDM is illustrated to highlight the RF parallelism while maintaining visual clarity (Fig. 2a).
To perform higher-dimensional in-memory computing that simultaneously utilizes the spatial, wavelength and RF DOFs, a photonic tensor core system based on electro-optically controlled PIC technology is proposed (Fig. 2b). To implement the matrix-matrix multiplication shown in Fig. 2a, the photonic tensor core with M inputs and K outputs defines a d K×M matrix W. A cell (red-dashed box) contains a tunable power splitter for power distribution and routing, a PCM memory (or weight) for multiplication, a directional coupler for accumulation and a crossing for interconnect (Fig. 2c). The system scalability is evident from the periodic cell layout in the 2D plane. MVM requires equal power distribution to all the PCM weights and the same contribution from different cells for linear accumulation. These requirements are fulfilled by a meticulous design of the power splitter and directional coupler (Supplementary Section 1). In addition to equal power distribution, power splitters also serve to concentrate all the optical power in a specific cell during the PCM weight-setting process (Methods). The input data architecture features a 2D array X 2D: the elements of row m are encoded into the amplitudes of N different RF components and input via optical channel m of the photonic tensor core as in m(t) = ∑_{n=1..N} x mn e^{i2πf n t}. The weighted sum of M such inputs that is output from column k is out k(t) = ∑_{m=1..M} w km in m(t) = ∑_{n=1..N} (∑_{m=1..M} w km x mn) e^{i2πf n t}. Consequently, the collective outputs from all the columns are Y = WX 2D, where Y represents the N MVM results of all the N d M×1 vectors in X 2D multiplied by the kernel matrix W. Using Q WDM channels will result in Q × N MVMs.
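As a numerical sanity check of this working principle, the sketch below emulates the RF encoding and readout in NumPy under simplifying assumptions: real-valued cosine tones on integer-cycle frequency bins, ideal weights and no noise. The tone frequencies and sample rate are arbitrary illustrative choices. The amplitude at each tone frequency of the summed, weighted channels recovers the corresponding MVM result.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, N = 3, 3, 4                      # input channels, outputs, RF tones
fs, T = 1000.0, 1.0                    # sample rate (Hz), duration (s)
t = np.arange(0, T, 1 / fs)
freqs = np.array([50.0, 100.0, 150.0, 200.0])   # tone frequencies f_1..f_N

X = rng.random((M, N))                 # x_mn amplitudes to encode
W = rng.random((K, M))                 # PCM weight matrix

# continuous-time inputs: in_m(t) = sum_n x_mn * cos(2*pi*f_n*t)
inputs = X @ np.cos(2 * np.pi * np.outer(freqs, t))    # shape (M, len(t))

# each output column k weights and sums the M optical channels
outputs = W @ inputs                   # shape (K, len(t))

# demodulate: the amplitude at tone f_n recovers (W X)[k, n]
spectrum = np.fft.rfft(outputs, axis=1) * 2 / len(t)
bins = (freqs * T).astype(int)         # integer-cycle tones fall on exact bins
Y = np.abs(spectrum[:, bins])          # shape (K, N)

assert np.allclose(Y, W @ X)
```

The design point this illustrates: because the weighting is a common multiplier on each channel, all N tones riding on that channel are weighted simultaneously, which is where the RF parallelism comes from.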

Verification of fundamental operations
The additional RF DOF is introduced to the system using a continuous-time data representation. We first verify the feasibility of using continuous-time data representation for photonic in-memory computing. A photonic tensor core provides three fundamental functions: data summation by routing cell outputs to common buses, data weighting by PCM memory and, consequently, weighted data summation. These three functions correspond to three mathematical operations, namely, addition, multiplication and multiply-accumulate (MAC), respectively. These three operations are studied using a Y junction loaded with PCM memories on both arms (Fig. 3a). To verify the addition operation, each arm is fed a continuous-time input of the form in(t) = ∑_k x k e^{i2πf k t} (Fig. 3b), and the frequency-domain output is the sum of the two RF amplitudes at each discrete frequency (Fig. 3c). The accuracy of the addition operation is revealed by its error distributions (Supplementary Section 4). The wavelength spacing (Δλ) between the two inputs is also studied (Supplementary Section 5) for harnessing dense WDM parallelism in the system implementation. The accuracy of the addition operation under different numbers of multiplexed RF components is also studied (Supplementary Section 6), suggesting that the N = 50 presented here is not a limitation of parallelism for low-precision ML models 38. To verify the multiplication operation, only one arm of the Y junction is active. A continuous-time input consisting of multiplicands is in(t) = ∑_{j=1..N} x j e^{i2πf j t}. The multiplier w (or weight) is determined by the crystalline state of the PCM and can be set using optical pulses (Supplementary Section 7). The resultant change in optical transmission ΔT can be continuously tuned from 0% to more than 20% by increasing the amorphization pulse width (Fig. 3d). The weight w can be mapped to [0, 1], leading to normalized outputs from the PCM memory: w × x ∈ [0, 1]. Supplementary Section 8 describes the details of weight mapping. The frequency-domain outputs at different weights are examined to confirm that the multiplicands encoded in the different RF components are operated on by the same multiplier (Supplementary Section 9). The accuracy of the multiplication operation is revealed by the Gaussian error distribution of 1,500 multiplication results, obtained by multiplying 300 random numbers and 5 weights, showing a standard deviation of 0.056 ± 0.001 (Fig. 3e). The whole Y junction is active for the verification of the two-channel MAC operation. The input vectors and operation principle are similar to the combination of the addition and multiplication operations. Using 300 pairs of random numbers and 5 pairs of weights on just a Y junction, we obtain a standard deviation of 0.057 ± 0.001 in the Gaussian error distribution from 1,500 MAC operations (Fig. 3f). In a photonic tensor core with 300 three-element arrays of random numbers and 5 three-element arrays of weights, the standard deviation we record is 0.063 ± 0.001 (Supplementary Section 10), where the expected performance of using more optical channels is also estimated. The errors arise mainly from inaccuracies in setting the PCM weights and from photodetection noise. The former can be minimized by the progressive setting method that gradually increases the setting pulse energy until the desired transmission is reached 34, and the latter can be improved by using on-chip integrated photodetectors with a lower noise-equivalent power or by reducing the optical loss of the PIC to enhance the signal-to-noise ratio. This successful verification of the three fundamental operations proves the feasibility of using a continuous-time data representation to add the RF DOF to photonic in-memory computing. Using N = 50 multiplexed RF components on a simple PCM-loaded Y junction, a parallelism of 50 is achieved, showing the high parallelism provided by the additional RF DOF. Importantly, this high parallelism contributed by RF can be conveniently incorporated into optoelectronic systems since it involves no additional optical multiplexing or filtering. Possible existing solutions to implement RF multiplexing include the use of field-programmable gate arrays and operational amplifier banks 39, making on-chip integration feasible for our proposed architecture.
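The progressive setting method mentioned above can be sketched as a simple closed loop. The snippet below is a toy model, not the authors' control code: `ToyPCM` and its saturation constants are invented stand-ins for the real pulse-transmission response, and only the loop structure (read, compare with target, pulse with gradually increasing energy) reflects the described method.

```python
def set_weight(target_T, apply_pulse, read_T, e0=1.0, de=0.05, max_steps=100):
    """Progressively increase the pulse energy until the measured
    transmission change reaches the target (simplified sketch)."""
    energy = e0
    for _ in range(max_steps):
        if read_T() >= target_T:
            return read_T()
        apply_pulse(energy)
        energy += de               # gradually raise the setting pulse energy
    return read_T()

# toy PCM model: transmission change grows with accumulated pulse
# energy and saturates at +20%, loosely mimicking Fig. 3d's trend
class ToyPCM:
    def __init__(self):
        self.T = 0.0               # normalized transmission change, 0 .. 0.20
    def pulse(self, e):
        self.T = min(0.20, self.T + 0.002 * e)
    def read(self):
        return self.T

pcm = ToyPCM()
final_T = set_weight(0.10, pcm.pulse, pcm.read)
assert final_T >= 0.10
```

Starting from low pulse energies and stopping at the first crossing keeps the overshoot small, which is the point of the progressive scheme.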

Healthcare monitoring using a CNN
Statistics from the World Health Organization show that CVDs are the leading cause of death, taking 17.9 million lives annually, with more than 80% of these deaths caused by sudden heart attacks and strokes 40. Real-time ECG recording and analysis are crucial to minimize sudden death risks. An edge computing framework is one solution for simultaneously monitoring the health condition of multiple CVD patients in real time with low latency 41. The proposed computing architecture exploiting three DOFs is a potential platform for computing in edge clouds: it can perform the high-dimensional synchronous convolution of ECG signals and facilitate ML-aided analysis to warn of sudden death events, simultaneously benefiting a large number of CVD patients.
Having verified the feasibility of simultaneously using three DOFs, we configure our system into an architecture for edge cloud computing (Fig. 4). Specifically, the wavelength and spatial DOFs are utilized for high-bandwidth parallel convolution, and the RF DOF enables low latency and synchronization between the end devices. The system contains three layers (edge device, edge interface and edge cloud) with five functional blocks: input light generation and (de)multiplexing in the edge cloud, input-multiplexed RF generation at the edge device and interface, optical modulation relating the edge interface and edge cloud, a photonic tensor core for in-memory computing in the edge cloud, and output light (de)multiplexing and detection in the edge cloud. In our system implementation, six wavelengths covering 1,548.51 to 1,552.52 nm, with an adjacent spacing of 0.8 nm (equivalent to 100 GHz), are used for WDM. The highest RF frequency, limited by our variable optical attenuators, is 1 kHz. Methods and Supplementary Section 11 show the detailed system setup and electro-optic response, respectively. The corresponding operation is a specific case of the generalized data architecture and working principle described previously and is discussed in detail in Supplementary Section 12. In a single operation cycle, the system synchronously performs 300 convolutions, convolving 100 ECG signals using three kernels.
The convolution results are further fed to a CNN for ML-aided ECG signal analysis. The CNN is designed to identify CVD patients at sudden death risk caused by ventricular fibrillation (a type of abnormal heart rhythm). The CNN architecture is illustrated with a single ECG signal, without loss of generality (Fig. 5a), and described in detail in Methods. Figure 5b shows a typical expected (Fig. 5b(i), convolved by CPU) and measured (Fig. 5b(ii), convolved by the photonic system) convolution result for normal ECG signals, whereas Fig. 5c shows those in sudden death events. All the convolutions are performed once; the error bands are shown in Fig. 5b,c. The features are effectively extracted, and the measured results resemble the expected ones. The convolution accuracy is examined by comparing 24,750 pairs of expected and measured results, showing a Gaussian error distribution with a low standard deviation of 0.015 ± 0.001 (Fig. 5d). Supplementary Fig. 17 shows the expected convolution result density. The standard deviation is lower than that obtained in the MAC verification because most convolution results are small, within the range [0, 0.5]. The CNN classification accuracies are presented in Fig. 5e. In the absence of a convolution layer, only 89% accuracy can be reached. With a convolution layer that helps to extract features, the accuracy increases to 94.0% and 93.5% when the expected and measured convolution results are used, respectively. Minor differences in the loss and accuracy evolution curves are observed between the use of expected and measured convolution results (Supplementary Fig. 18), suggesting a high accuracy of the photonic-system-implemented convolution using continuous-time data representation. The confusion maps of the classification results are shown in Supplementary Fig. 19, showing that there is only a 1% probability that abnormal ECG signals will be misclassified as normal ECG signals. Similar details are observed in the two maps, indicating the simultaneous achievement of high accuracy, effectiveness and ultraparallelism using our system that exploits three DOFs.

Fig. 4 (caption): The system has five functional blocks: input light generation and (de)multiplexing in the edge cloud, input-multiplexed RF generation at the edge device and interface, optical modulation relating the edge interface and edge cloud, a photonic tensor core for in-memory computing in the edge cloud, and output light (de)multiplexing and detection in the edge cloud. In the device layer, each ECG signal is a 1D time-domain signal. In the edge interface layer, the ECG signal data from patient j at time i are denoted x ij and encoded in the amplitude of RF f mod(j,50) using λ i or λ′ i as the carrier (λ i if j ≤ 50; λ′ i if j > 50). For j ∈ [1, 100] ⊆ Z+ and i ∈ [1, 3] ⊆ Z+, the input matrix X has dimension d 3×100. In the edge cloud layer, the weight bank determined by the photonic tensor core defines a d 3×3 matrix W containing three d 1×3 kernels. Effectively, one such matrix-matrix multiplication performs 300 convolutions, resulting in a d 3×100 matrix Y obtained by convolving the middle three time-domain data of 100 ECG signals using 3 kernels. PD, photodetector; EOM, electro-optic modulator; PC, polarization controller; VOA, variable optical attenuator.

Discussion and conclusion
We have demonstrated the first instance of a photonic in-memory computing architecture capable of implementing higher-dimensional MVM in a single operation cycle of a physical device by increasing the multiplexing dimensionality using RF as a carrier. By verifying the feasibility of computing with continuous-time data in the optical domain, we provide an additional pathway for increasing the parallelism of photonic processors. An electro-optically controlled photonic tensor core system was built to simultaneously exploit the spatial, wavelength and RF DOFs to harness ultrahigh parallelism. A parallelism of 100, two orders of magnitude higher than the previous implementation 34, was achieved by multiplexing 50 RF components on top of 2 WDM channels. Leveraging this higher-dimensional processing capability and high parallelism, we configured our system into an architecture for edge cloud computing to perform the synchronous convolution of 100 clinical ECG signals from CVD patients and built a CNN capable of identifying patients at sudden death risk with 93.5% accuracy. Although these results were achieved using a small 3 × 3 photonic tensor core, larger photonic tensor cores are envisioned for better compute density, compute efficiency and more general applications 42. The scalability and performance estimation of larger photonic tensor cores are also discussed in detail (Supplementary Section 15). Crucially, the parallelism of 100 is not an upper limit (Supplementary Fig. 9); multiplexing 150 RF components is possible if lower precision is allowed. By using 16 WDM channels, an overall parallelism of 2,400 can also be achieved, suggesting that a single system can synchronously process signals from 2,400 end devices; this is currently not possible using existing technologies with lower-dimensional processing capability. Possible alternative routes to this high computing capability include increasing the clock speed of electronics and using ultradense WDM channels. Supplementary Section 16 discusses the challenges associated with these two alternatives. Our proposed architecture is ubiquitously applicable to other photonic processing systems 43-45 to enrich data information by exploiting more DOFs.
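The parallelism figures quoted here follow directly from the product of the multiplexed dimensions; a trivial check (the helper name is illustrative):

```python
def parallelism(n_wdm, n_rf):
    """MVMs per operation cycle = wavelength channels x RF components."""
    return n_wdm * n_rf

assert parallelism(2, 50) == 100      # demonstrated configuration
assert parallelism(16, 150) == 2400   # projected configuration in the text
```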
A key understanding underlying the mechanism of higher-dimensional data processing is that although the wavelength spacing of 0.8 nm may be considered 'dense' in WDM, it is orders of magnitude larger from an RF perspective. Therefore, the RF dimension can be regarded as a quasi-independent dimension that enriches data information. Meanwhile, continuous-time data representation brings another key advantage of avoiding electronic logic-state flips to potentially increase the clock frequency 46. More interestingly, the recent exploration of synthetic dimensions in photonics suggests that a single photonic cavity acousto-optic modulator, naturally compatible with RF, could be adopted to substantially reduce the footprint of the weighting matrix 47,48. From the hardware perspective, even though off-chip light sources, circulators, amplifiers, modulators and photodetectors were used in a lab environment aiming to verify high parallelism, these active photonic components can be monolithically integrated on a single chip 29,49,50. Complementary metal-oxide-semiconductor RF electronics can be adopted in the system to maximize the compute efficiency and density (Supplementary Section 17). In addition to the RF DOF, the phase 51, polarization 52 and mode 53 DOFs of light could also offer more dimensions to further parallelize signal processing. However, the possible parallelism from these dimensions is restricted by their limited number of possible states and the requirement of waveguide compactness. It is also worth highlighting that the realization of ultrahigh parallelism relies on the combination of photonics, which provides the wavelength DOF, and electronics, which provides the additional RF DOF, suggesting that synergy between photonics and electronics should be sought to fully unleash the potential of both in a single integrated system.

Device fabrication
Waveguide devices for verification of basic operations. The fabrication started from a silicon-on-insulator wafer (Soitec) with a 220 nm silicon (Si) device layer and a 2 µm buried oxide layer. A 400-nm-thick positive electron-beam resist (CSAR-62) was spin coated on a diced 1 cm × 1 cm silicon-on-insulator chip, followed by 3 min of pre-baking at 150 °C. The electron-beam resist was patterned by electron-beam lithography (EBL; JEOL JBX-5500, 50 kV) and developed in AR600-546 for 30 s, methyl isobutyl ketone for 15 s and isopropanol for 15 s in sequence. The waveguide patterns were transferred to the Si device layer (etch depth, 110 nm) by reactive ion etching (Oxford Instruments PlasmaPro) with SF6 and CHF3 gases, followed by O2 plasma cleaning of the CSAR. Next, a 2-µm-thick double-layer PMMA (PMMA 495 A8 and PMMA 950 A8) was spin coated on the chip, followed by EBL patterning and development in methyl isobutyl ketone:isopropanol = 1:3 for 1 min to define the sputtering windows. A 10-nm-thick/10-nm-thick Ge2Sb2Te5 (GST)/indium tin oxide (ITO) stack was deposited on the waveguide using a magnetron sputtering system (PVD, AJA International). The GST and ITO targets were sputtered at 30 W RF power with 3 s.c.c.m. Ar flow and 40 W RF power with 3 s.c.c.m. Ar flow, respectively, at a base pressure of 10−7 torr. The stack was then lifted off in acetone for 180 min at 50 °C. Finally, the chip was annealed on a hotplate for 5 min at 250 °C to fully crystallize the GST.
Electro-optically controlled photonic tensor core. The passive silicon photonic circuit was fabricated using the foundry multi-project wafer service provided by CORNERSTONE. The detailed specifications of CORNERSTONE standard waveguide components can be found at https://cornerstone.sotonfab.co.uk/. The fabricated Si photonic circuit has a 1-µm-thick silicon dioxide (SiO2) upper cladding. SiO2 windows were patterned by EBL and opened by hydrogen fluoride for the subsequent deposition of the GST/ITO stack, following a procedure similar to the previously described GST/ITO sputtering. Next, NiCr heater patterns were defined by EBL using a double-layer PMMA (PMMA 495-A3 and PMMA 495-A6) as the photoresist. A 200-nm-thick NiCr layer was sputtered, followed by PMMA lift-off to form the NiCr heaters. Gold pads with 75 nm thickness were fabricated using a process similar to the NiCr heater fabrication, but with thermal evaporation (Edwards 306). A 3-5 nm Cr layer was deposited before the gold deposition to serve as an adhesion layer. The chip was then annealed on a hotplate for 5 min at 250 °C to fully crystallize the GST. Finally, the chip was wire bonded to a printed circuit board for electro-optic control.

Measurement setup
Setup for verification of operations using continuous-time data representation. Supplementary Section 2 comprehensively describes the experimental setups used to verify the fundamental operations using continuous-time data representation. The setup used to verify the transmission and multiplication operations is an optical waveguide pump-probe setup (Supplementary Fig. 3), which has been reported before 54. The pump line and probe line took opposite routes through the waveguide by means of two fibre-optic circulators. The full setup was used for multiplication; the pump laser line was idle during transmission measurements. The setup used to verify the addition and MAC operations is a modified optical waveguide pump-probe setup that accommodates a Y junction (Supplementary Fig. 4). Here the pump line and probe line followed the same route through the waveguide. The full setup was used for verifying the MAC operation; the pump laser line was idle when verifying the addition operation.
System setup for synchronous convolution. The experimental setup for the synchronous convolution of 100 ECG signals is shown in Fig. 4. The photonic tensor core has three input optical channels and three output optical channels, representing a d 3×3 matrix consisting of three d 1×3 kernels. The input light was switchable between a supercontinuum laser (SuperK COMPACT, NKT Photonics) and a tunable pump laser (Santec TSL-550) using an optical switch (Gezhi GZ-12C-1×2-SM). The PCM memory in each cell of the photonic tensor core was first set to the desired weight to correctly define the kernels. The tunable pump laser was used for the PCM weight setting. The amplified pump light passed through a demultiplexer (DEMUX) module (Gezhi DWDM-100G-DEMUX) so that different wavelengths were routed to different input optical channels (λ1 = 1,552.52 nm to optical channel 1, λ2 = 1,551.72 nm to optical channel 2 and λ3 = 1,550.92 nm to optical channel 3). The tunable power splitters of the photonic tensor core were controlled by a microprocessing unit (Analog Devices DC2026) to ensure that all the pump power was concentrated into the PCM of the target cell. For example, to set w23, λ3 was used so that the pump light was routed to optical channel 3.
Cell 13 was controlled to distribute all the light into the top channel of its 2 × 2 multimode interferometer (MMI), and cell 23 was controlled to distribute all the light into the bottom channel of its MMI to efficiently set w23. In this case, cell 33 was idle. After setting all the PCM weights, a parallel convolution was performed using the supercontinuum laser. The DEMUX module was used to separate six wavelengths with a spacing of 0.8 nm (equivalent to 100 GHz) into different optical channels (λ1 = 1,552.52 nm, λ2 = 1,551.72 nm, λ3 = 1,550.92 nm, λ′1 = 1,550.12 nm, λ′2 = 1,549.32 nm and λ′3 = 1,548.51 nm). The ECG signal data were loaded onto each wavelength using a variable optical attenuator (VOA; Thorlabs V1550A). The VOAs, with a highest RF frequency of 1 kHz, were driven via coaxial cables by a digital signal processor (NI USB-6259) that generated 50 multiplexed RF components. Note that, in practice, when the RF frequency is high (in the gigahertz range) and the transmission distance is long (>10 m) in the edge cloud computing framework, coaxial cables should be replaced by fibre-optic connections to avoid the power loss of high-frequency signals in the coaxial cables. Here λ1 to λ3 carried the three respective time-domain data points of ECG signals 1-50, whereas λ′1 to λ′3 carried the same data of ECG signals 51-100. The polarization of the output light from each VOA was controlled by a polarization controller (Thorlabs FPC032). The six wavelengths were then grouped by a multiplexer (MUX) array (Gezhi DWDM-100G-MUX) to form three inputs to the respective input optical channels of the photonic tensor core (λ1 and λ′1 to optical channel 1, λ2 and λ′2 to optical channel 2, and λ3 and λ′3 to optical channel 3). Convolutions were naturally performed as light propagated through the photonic tensor core. Each output optical channel of the photonic tensor core contained all the wavelengths λ1-λ3 and λ′1-λ′3. These six wavelengths were demultiplexed and regrouped by a MUX/DEMUX array to form two groups of multiplexed output. Here λ1-λ3 formed one group, representing the convolution results of the three time-domain data points of ECG signals 1-50, and λ′1-λ′3 formed another group representing the same for ECG signals 51-100. The resultant six groups of output light were detected by a photodetector array (Newport New Focus 2011).

Generation, convolution and output of multiplexed RF signals
The properties of the original ECG data collected from Holter monitors are described in the 'ECG signal dataset' section. The Holter monitors represent the edge device layer. The generation of multiplexed RF signals represents the operations performed in the edge interface layer. The convolution and output are implemented in the edge cloud layer.
For the parallel convolution of the middle three time-domain data of 100 ECG signals, the input matrix X is a d 3×100 matrix. The jth column of X contains the middle three time-domain data of the jth ECG signal (Fig. 4). The ith row of X contains the ith time-domain data point of the 100 ECG signals. Taking the first row (x 11 x 12 … x 1,100), for example, the jth element x 1j, where j ∈ [1, 100] ⊆ Z+, was encoded in the amplitude of the RF component f mod(j,50), resulting in a continuous-time data representation of x 1j e^{i2πf mod(j,50) t}. The whole row was represented by the multiplexed RF signal in 1(t) = ∑_{j=1..100} x 1j e^{i2πf mod(j,50) t}. Similarly, in 2(t) = ∑_{j=1..100} x 2j e^{i2πf mod(j,50) t} and in 3(t) = ∑_{j=1..100} x 3j e^{i2πf mod(j,50) t}. The three inputs with continuous-time data representation were mathematically generated in MATLAB R2021b and converted to .tfw files 55 readable by a function generator (Tektronix AFG3102C). The subsequent electrical output from the function generator drove the VOAs to load the ECG data into the optical domain. Here in 1(t) to in 3(t) were input to optical channels 1 to 3, respectively. The photonic tensor core then effectively performed Y = WX. In the frequency-domain representation of Y, the element y ij = ∑_{m=1..3} w im x mj was encoded in the RF component f mod(j,50), representing the convolution result of the middle three time-domain data of the jth ECG signal using the ith kernel. Each row of Y was output from the respective output optical channel of the photonic tensor core.
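A minimal NumPy sketch of this mapping (random stand-in data; `tone` and `group` are illustrative helper names, using zero-based signal indices j = 0..99) shows how one matrix-matrix product yields all 300 convolution results, while the tone/wavelength-group pair uniquely addresses each signal:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sig, n_tap, n_rf = 100, 3, 50        # ECG signals, taps per kernel, RF tones

X = rng.random((n_tap, n_sig))         # x_ij: ith time point of signal j
W = rng.random((3, n_tap))             # three d_1x3 kernels as rows

# signal j is carried by RF tone mod(j, 50) on one of two wavelength groups
tone = lambda j: j % n_rf              # f_mod(j,50) with j = 0..99 here
group = lambda j: 0 if j < n_rf else 1 # lambda group vs lambda-prime group

# one matrix-matrix multiplication performs all 300 convolution MACs
Y = W @ X                              # y_ij = sum_m w_im * x_mj, shape (3, 100)

# signals 7 and 57 share a tone but ride on different wavelength groups
assert tone(7) == tone(57) and group(7) != group(57)
```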

CNN model
ECG signal dataset. Long-duration ECG signals (shortest duration, 4 h 15 min 10 s) from ten CVD patients were taken from the Sudden Cardiac Death Holter Database in PhysioNet 56,57. Supplementary Section 14 provides the corresponding clinical information for these ten patients. Here 50 normal pulses and 50 abnormal pulses were extracted from each patient, giving a total of 500 normal pulses and 500 abnormal pulses. Each pulse has a 0.7 s duration. The original ECG signals have a 0.004 s time resolution. The ECG pulses were extracted with a time interval of 0.02 s (that is, one out of every five original data points), leaving 35 data points per extracted ECG pulse. The 0.02 s time interval was carefully chosen to minimize the size of the extracted dataset while maintaining the key features of the original ECG pulses.
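The arithmetic of the extraction step is easy to verify: a 0.7 s pulse at the original 0.004 s resolution contains 175 samples, and keeping one in five (a 0.02 s interval) leaves 35 data points. A minimal sketch, using a synthetic stand-in waveform rather than the PhysioNet data:

```python
import numpy as np

original_dt, kept_dt, duration = 0.004, 0.02, 0.7
n_samples = int(round(duration / original_dt))      # 175 samples per pulse
pulse = np.sin(np.linspace(0, np.pi, n_samples))    # stand-in for an ECG pulse
step = int(round(kept_dt / original_dt))            # keep one out of every 5
extracted = pulse[::step]
print(extracted.size)                               # 35
```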
Here 80% of the pulses were used for training and 20% were used for testing, that is, a total of 800 pulses for training (400 normal pulses and 400 abnormal pulses) and 200 pulses for testing (100 normal pulses and 100 abnormal pulses).
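A per-class 80/20 split reproducing these counts can be sketched as follows; the label array and the random seed are assumptions for illustration (the source does not specify how the split was randomized).

```python
import numpy as np

rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 500)          # 500 normal (0) + 500 abnormal (1) pulses

train_idx, test_idx = [], []
for c in (0, 1):                         # split each class separately: 400 + 100
    idx = rng.permutation(np.where(labels == c)[0])
    train_idx.extend(idx[:400])
    test_idx.extend(idx[400:])

print(len(train_idx), len(test_idx))     # 800 200
```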
CNN architecture. The CNN architecture is shown in Fig. 5a. The input layer takes the ECG pulse in the form of a 35 × 1 1D array. Time multiplexing is used to assist in sending the ECG signal data.
At each time step, the convolution window takes three data points. The window is moved by one data point after each step; therefore, 35 − 3 + 1 = 33 time steps are required to process the whole trace of 35 data points. This signal, represented as a 1D array, is passed to a convolution layer consisting of three 1 × 3 kernels. Convolution operations were implemented with a stride of 1 and valid padding, resulting in a 3 × 33 output. The output was activated by a rectified linear unit layer and flattened to a 99 × 1 vector. The flattened, activated output was then fed to a fully connected layer with 20 neurons, and the output from the fully connected layer was converted into probabilities by a softmax layer to obtain the classification result. The ECG pulses were classified into 20 categories, representing two heart health conditions (normal or abnormal) for each of the 10 individual patients. The convolution operations were implemented using the electro-optically controlled photonic tensor core system. The convolution results were processed by the subsequent CNN layers using the Deep Learning Toolbox in MATLAB R2021b. Weights of the fully connected layer were trained with the Adam optimizer, and 100 epochs were used to reach the final CNN outcomes.
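The layer dimensions can be traced with a minimal forward pass. This NumPy sketch uses random weights purely to check shapes (the paper set the kernel weights in PCM memories and trained the fully connected layer in MATLAB's Deep Learning Toolbox):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random(35)                 # one ECG pulse, 35 data points
kernels = rng.random((3, 3))       # three 1 x 3 kernels (random stand-ins)
W_fc = rng.random((20, 99))        # fully connected layer, 20 classes
b_fc = np.zeros(20)

# Valid convolution, stride 1: 35 - 3 + 1 = 33 outputs per kernel
conv = np.array([[k @ x[i:i + 3] for i in range(33)] for k in kernels])  # (3, 33)
relu = np.maximum(conv, 0)         # ReLU activation
flat = relu.reshape(99)            # 3 x 33 flattened to 99 x 1
logits = W_fc @ flat + b_fc
probs = np.exp(logits - logits.max())
probs /= probs.sum()               # softmax over 20 categories

print(conv.shape, probs.shape)     # (3, 33) (20,)
```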

Fig. 1 | High-dimensional photonic in-memory computing using data with three DOFs. Comparison of computing schemes. a, Traditional electronic computing uses the spatial DOF for data input, inputting 1D arrays to achieve MVM. b, Recent photonic computing uses the spatial and wavelength DOFs,

Fig. 2 | Data architecture and working principle of a photonic tensor core for in-memory computing using continuous-time data representation. a, Target matrix-matrix multiplication using only one optical wavelength and N multiplexed RF components. b, Implementation of the matrix-matrix multiplication. The weight matrix W of dimension K × M, containing K 1 × M kernels, is defined by the tensor core with M inputs and K outputs. Carried by one wavelength λ_1, an M × N matrix X is input using M input optical channels and N multiplexed RFs. The nth M × 1 vector (x_1n x_2n … x_Mn)^T is encoded in the amplitude

Fig. 3 | Photonic addition, multiplication and MAC operations using continuous-time data representation with 50 multiplexed RFs. All the input numbers x are randomly generated from [0, 1] ⊆ R with a 0.01 resolution. a, Y junction loaded with PCM memories on each arm for the verification of operations. b,c, Comparison of normalized measured and expected time-domain addition output (b) and frequency-domain output (c). d, Quasi-analogue PCM weight setting using optical pump pulses with varying widths. e, Accuracy of 1,500 multiplication results from 300 random multiplicands and 5 multipliers. The inset shows the normalized error distribution. s.d., standard deviation. f, Accuracy of 1,500 MAC results using 300 pairs of random numbers and 5 pairs of weights. The inset shows the normalized error distribution.

Fig. 4 | System architecture for edge cloud computing to synchronously convolve 100 clinical ECG signals from patients with CVD. The system has five functional blocks: input light generation and (de)multiplexing in the edge cloud, input-multiplexed RF generation at the edge device and interface, optical modulation relating the edge interface and edge cloud, the photonic tensor core for in-memory computing in the edge cloud, and output light (de)multiplexing and detection in the edge cloud. In the device layer, each ECG signal is a 1D time-domain signal. In the edge interface layer, the ECG signal data from patient j at time i are denoted as x_ij and encoded in the amplitude of RF f_mod(j,50) using λ_i or

Fig. 5 | Healthcare monitoring of CVD patients using a CNN. a, CNN architecture. The CNN is designed to identify CVD patients at risk of sudden death. ECG signals are supplied to the input layer. The system presented in Fig. 4 performs higher-dimensional convolution. A rectified linear unit (ReLU) layer, a fully connected layer and a softmax layer are applied in sequence after convolution. b,c, Comparison of expected convolution results (CPU convolved) (i) and measured convolution results (photonic system convolved) (ii) of normal ECG signals when patients are safe (b) and when patients are at risk when

In the frequency domain, the output of the tensor core contains components of the form

(w_1i x_1j + w_2i x_2k + w_3i x_3l) × δ(f_j − f_k) δ(f_k − f_l), for i = 1, 2, 3,

so only terms with matching RF frequencies (f_j = f_k = f_l) survive, yielding the output matrix

Y = (y_11 y_12 ⋯ y_1,100; y_21 y_22 ⋯ y_2,100; y_31 y_32 ⋯ y_3,100).

Conceptually similar to WDM, continuous-time data are generated by multiplexing radio-frequency (RF) signals at different frequencies, with the data encoded in the RF amplitudes. When this is done in electronics, the input data are a 2D array restricted to the spatial and RF DOFs, leading to one (K × M) × (M × N) matrix-matrix multiplication (equivalent to N (K × M) × (M × 1) MVMs) if N RF components are used. An M × N input matrix represents multiple M × 1 vectors; here N RF components are multiplexed to produce this matrix. The nth vector is carried by the corresponding RF component at frequency f_n, and the data in the mth row are carried by a continuous-time signal in_m(t) = ∑_(n=1)^N x_mn e^(i2πf_n t). Supplementary Section 3 shows the basic transmission performance of the multiplexed RF-modulated optical signal. To verify the addition operation, two weights are idle. Each value in a 1 × 50 vector (x_1 x_2 … x_50) is encoded in the respective RF amplitude, and two multiplexed RFs modulate two optical carriers to generate the continuous-time inputs in_1(t) and in_2(t). … the greatest common divisor. All the numbers are randomly generated from [0, 1] with a 0.01 resolution.
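The delta-function selection rule (only spectral components at matching RF frequencies survive, each carrying one MAC result) can be checked with an FFT. This NumPy sketch uses assumed tone frequencies and a 1 s window so every tone falls exactly on an FFT bin:

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 1000
t = np.arange(fs) / fs                      # 1 s window, 1 Hz bin spacing
f = np.array([10, 20, 30])                  # three RF tones (assumed values)
X = rng.random((3, 3))                      # x_mj: input data on three channels
w = rng.random(3)                           # one kernel (w_1i, w_2i, w_3i)

signals = X @ np.exp(2j * np.pi * np.outer(f, t))   # in_m(t), multiplexed rows
out = w @ signals                                   # weighted addition in the core

spectrum = np.fft.fft(out) / t.size
# Only bins at f_j carry energy; each equals the MAC y_j = sum_m w_m x_mj
peaks = spectrum[f].real
print(np.allclose(peaks, w @ X))            # True
```

Because the tones sit on integer FFT bins, all cross terms vanish exactly, which is the discrete analogue of the δ(f_j − f_k) δ(f_k − f_l) factors above.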