Programmable photonic neural networks combining WDM with coherent linear optics

Neuromorphic photonics has so far relied solely on either coherent or Wavelength-Division-Multiplexing (WDM) designs for enabling dot-product or vector-by-matrix multiplication, which has led to an impressive variety of architectures. Here, we go a step further and employ WDM to enrich the layout with parallelization capabilities across the fan-in and/or weighting stages, instead of serving the computational purpose, and present, for the first time, a neuron architecture that combines coherent optics with WDM towards a multifunctional programmable neural network platform. Our reconfigurable platform accommodates four different operational modes over the same photonic hardware, supporting multi-layer, convolutional, fully-connected and power-saving layers. We mathematically validate the performance in all four operational modes, taking into account crosstalk, channel spacing and the spectral dependence of the critical optical elements, and conclude that operation is reliable, with a MAC relative error < 2%.


S1 PPNN modes of operation
Depending on the configuration of the switches in the Photonic Neural Network (PNN) axons, introduced in Fig. 1 of the main body of the Manuscript, the Programmable PNN (PPNN) can operate in four distinct modes, illustrated in Fig. S1. The left-hand side of Fig. S1 shows the n-th branch (axon) of the PPNN, according to Fig. 1(e) of the main body of the Manuscript, with inaccessible (inactive) optical paths drawn semi-transparent, whereas the right-hand side of Fig. S1 shows the corresponding abstraction of the NN layer.
When the input switch $S_{X,n}$ is in its on (up) state ($S_{X,n} = 1, \forall n$), as in Fig. S1(a), (b), each channel $m$ carries its designated input sequence $X_m = [x_{1,m}, \ldots, x_{N,m}]$, depicted by appropriately colored input circles in the right-hand side abstractions. Otherwise, when $S_{X,n}$ is in its off (down) state ($S_{X,n} = 0, \forall n$), as in Fig. S1(c), (d), all channels $m \in [1, M]$ carry an identical input sequence $X_0 = [x_{1,0}, \ldots, x_{N,0}]$, represented by grey circles in the right-hand side abstractions.
A similar conclusion can be drawn for the weights, with the exception that they are controlled by a combination of the input switches, $S_{X,n}$, and the weight switches, $S_{W,n}$. Let us introduce a convention where the bar (straight) position of the weight switch is its on state ($S_{W,n} = 1$), whereas the cross position is its off state ($S_{W,n} = 0$). If the optical signal can reach the upper weight modulator bank of the axons, enclosed between the demultiplexer (DEMUX) and the multiplexer (MUX), as is the case in Fig. S1(a) for $S_{X,n} = 1 \wedge S_{W,n} = 1, \forall n$, or in Fig. S1(c) for $S_{X,n} = 0 \wedge S_{W,n} = 0, \forall n$, each channel $m$ is weighted by its designated weight set $W_m = [w_{1,m}, \ldots, w_{N,m}]$, depicted by appropriately colored lines connecting input and output circles in the right-hand side abstractions. Otherwise, if the optical signal reaches a single weight modulator, as is the case in Fig. S1(b) for $S_{X,n} = 1 \wedge S_{W,n} = 0, \forall n$, or in Fig. S1(d) for $S_{X,n} = 0 \wedge S_{W,n} = 1, \forall n$, all channels $m \in [1, M]$ are weighted by an identical set of weights $W_0 = [w_{1,0}, \ldots, w_{N,0}]$, represented by grey lines connecting input and output circles in the right-hand side abstractions.
Finally, the output switch $S_{O,n}$ is not controlled independently; its state depends on the path along which the optical signal arrives at it. Its truth table is given in Table 1 in the main body of the Manuscript and can be summarized through the XNOR operation as $S_{O,n} = S_{X,n} \odot S_{W,n}$.
The following modes of operation are supported by the PPNN, according to Fig. S1 and Table 1:
a) multi-neuron: Each channel $\lambda_m$ carries its designated N-element input sequence $x_{n,m}$, where $n \in [1, \ldots, N]$ (in total N × M unique inputs), which is weighted by a designated set of N × M weights $w_{n,m}$, yielding M outputs $y_m$, as illustrated in Fig. S1(a).
b) convolutional: Each channel $\lambda_m$ carries its designated N-element input sequence $x_{n,m}$, where $n \in [1, \ldots, N]$ (in total N × M unique inputs), which is weighted by a single N-element set of weights $w_{n,0}$ shared among all channels, yielding M outputs $y_m$, as illustrated in Fig. S1(b).
c) fully-connected: All channels $\lambda_m$ carry an identical N-element input sequence $x_{n,0}$, where $n \in [1, \ldots, N]$ (in total N unique inputs), which is weighted by a designated set of N × M weights $w_{n,m}$, yielding M outputs $y_m$. As illustrated in Fig. S1(c), this implies that each of the N inputs is connected to each of the M outputs via a unique connection, resulting in a fully-connected layer. These types of layers are particularly convenient for classification and denoising purposes.
d) power-saving: All channels $\lambda_m$ carry an identical N-element input sequence $x_{n,0}$, where $n \in [1, \ldots, N]$ (in total N unique inputs), which is weighted by a single N-element set of weights $w_{n,0}$, yielding M identical outputs $y_1 = \ldots = y_M$. From a practical point of view, this layer is intended to be used with only one channel active when sequential operation is required, resulting in a power-saving regime (M − 1 channels are powered off). If all available channels are active, it can be useful for in-situ PPNN calibration with respect to wavelength-sensitive performance.

Figure S1. Mapping between the PPNN modes of operation represented through its n-th branch (axon) configuration (left-hand side) and the corresponding NN layer abstraction (right-hand side) for: (a) multi-neuron, (b) convolutional, (c) fully-connected (FC), and (d) power-saving arrangement. Semi-transparent paths of axons are inaccessible to the optical signal. MUX: multiplexer, DEMUX: demultiplexer.
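The mapping between switch states and modes described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function names `output_switch` and `ppnn_mode` and the returned dictionary layout are assumptions made here, not part of the Manuscript; the logic follows the switch conventions stated in this section, including $S_O = S_X \odot S_W$.

```python
def output_switch(s_x: int, s_w: int) -> int:
    """Output switch state, S_O = S_X XNOR S_W."""
    return int(s_x == s_w)


def ppnn_mode(s_x: int, s_w: int) -> dict:
    """Map the input/weight switch states of the axons to a PPNN mode."""
    # S_X = 1: every channel carries its own input sequence.
    per_channel_inputs = s_x == 1
    # The per-channel weight bank (between DEMUX and MUX) is reached
    # when S_X and S_W agree, i.e. exactly when S_O = 1.
    per_channel_weights = output_switch(s_x, s_w) == 1
    names = {
        (True, True): "multi-neuron",
        (True, False): "convolutional",
        (False, True): "fully-connected",
        (False, False): "power-saving",
    }
    return {
        "mode": names[(per_channel_inputs, per_channel_weights)],
        "s_o": output_switch(s_x, s_w),
        "per_channel_inputs": per_channel_inputs,
        "per_channel_weights": per_channel_weights,
    }


if __name__ == "__main__":
    for s_x, s_w in [(1, 1), (1, 0), (0, 0), (0, 1)]:
        print(f"S_X={s_x}, S_W={s_w}:", ppnn_mode(s_x, s_w))
```

Note that, under these conventions, the output switch state coincides with the use of per-channel weights.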

S2 Multichannel PPNN theoretical foundations
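The input signal into the PPNN, given by the column-vector $E_{LD}$, as defined in the main body of the Manuscript, first passes through the 3dB X-coupler, as shown in Fig. 1(a), resulting in the signal entering the bias branch, $E_{B,in}$, and the one entering the OLAU, $E_{OLAU,in}$. After the 1-to-N splitting stage, the signal in the n-th axon has accumulated a phase of $k_n \pi/2$, where $k_n$ is the n-th element of the Hamming weight sequence (S3), and is weighted by the input and weight matrices, $X_n$ and $W_n$, so that the output of the n-th axon is $E_{OLAU,out,n} = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{N}} e^{i \frac{\pi}{2} k_n} W_n X_n E_{LD}$.
The outputs from the N axons subsequently interfere within the N-to-1 combiner given in Fig. 1(d). The combiner is designed to be a π-rotated copy of the input 1-to-N splitter, where the lower outputs of one stage enter the inputs of the next stage, while the upper outputs of the previous stage are discarded, ensuring identical phase accumulation across all N signals. The result of the interference is $E_{OLAU,out} = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} e^{i \frac{\pi}{2} l_n} E_{OLAU,out,n}$, where $l_n$ represents the n-th element of the reversed sequence of Hamming (binary) weights, $L = [\log_2 N, \ldots, 3, 2, 2, 1, 2, 1, 1, 0]$. Finally, $l_n + k_n$ can be substituted by $\log_2 N$ according to equations (S3) and (S6), so that the output of the OLAU reads $E_{OLAU,out} = \frac{1}{\sqrt{2}} \frac{1}{N} e^{i \frac{\pi}{2} \log_2 N} \sum_{n=1}^{N} W_n X_n E_{LD}$, while the output of the bias branch, $E_{B,out}$, is the bias-branch input weighted by the bias matrix $W_b$. The two signals interfere in the last 3dB X-coupler of Fig. 1(a), giving the PPNN output signal.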
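The phase bookkeeping of the power-of-2 splitter and combiner can be checked numerically. The sketch below assumes N = 8, unit inputs and weights, and the ideal amplitude factor $1/\sqrt{N}$ in each of the two stages; the function names are illustrative and not taken from the Manuscript. It builds the Hamming weight sequence $k_n$, its reverse $l_n$, and verifies that every path accumulates the same total phase, $\log_2 N \cdot \pi/2$.

```python
import cmath
import math


def hamming_weights(n_axons: int) -> list:
    """k_n: number of ones in the binary representation of n = 0..N-1."""
    return [bin(n).count("1") for n in range(n_axons)]


def reversed_hamming_weights(n_axons: int) -> list:
    """l_n: the Hamming weight sequence in reversed order."""
    return hamming_weights(n_axons)[::-1]


N = 8  # assumed power-of-2 axon count
k = hamming_weights(N)           # [0, 1, 1, 2, 1, 2, 2, 3]
l = reversed_hamming_weights(N)  # [3, 2, 2, 1, 2, 1, 1, 0]

# Every path sees the same number of pi/2 phase steps in total.
assert all(kn + ln == int(math.log2(N)) for kn, ln in zip(k, l))

# Coherent sum with unit inputs and weights: splitter and combiner each
# contribute an amplitude factor 1/sqrt(N) and a phase exp(i*pi/2*steps).
total = sum(
    (1 / math.sqrt(N)) * cmath.exp(1j * math.pi / 2 * kn)
    * (1 / math.sqrt(N)) * cmath.exp(1j * math.pi / 2 * ln)
    for kn, ln in zip(k, l)
)
expected = cmath.exp(1j * math.pi / 2 * math.log2(N))
print(abs(total - expected) < 1e-12)
```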

S3 Engineering a non-power-of-2 splitter and combiner
In what follows, we present an algorithm for designing a 1-to-N splitter with an arbitrary number of outputs N, based on cascading stages of X-couplers that are not restricted to equal power splitting and whose electric field transfer function is described by the matrix $\begin{bmatrix} \sqrt{1-\alpha} & i\sqrt{\alpha} \\ i\sqrt{\alpha} & \sqrt{1-\alpha} \end{bmatrix}$ (S9), where $\alpha$ denotes the power fraction transmitted to the cross port of the coupler. The flowchart of the algorithm is given in Fig. S2. The algorithm is designed to be resilient to variations in the splitting ratio due to fabrication tolerances by limiting $\alpha$ to the range [1/2, 2/3]. We verify the algorithm for N = 9 and N = 11. The algorithm starts by examining whether the splitter can be implemented by cascading $\log_X N$ identical, smaller-scale unit cells providing 1-to-$N^{1/Y}$ splitting, with $X, Y, \log_X N, N^{1/Y} \in \mathbb{N}$. If so, $X = N^{1/Y}$; otherwise $X = N$, and the design of the 1-to-X splitter is initiated. If X is not a prime number, the unit cells should be further split into k sub-cells, such that the output port count of a single sub-cell, $x_i$, is a prime number, $X = x_1^{y_1} x_2^{y_2} \cdots x_k^{y_k}$. Following the algorithm, in the case of N = 9 we have X = 3 and Y = 2, whereas for N = 11 we have X = 11 and Y = 1. Both X values are prime numbers, yielding $x_1 = 3$ and $y_1 = 1$ in the first case and $x_1 = 11$ and $y_1 = 1$ in the second case. The unit cell (or sub-cell) must achieve equal power splitting among all of its outputs, while the induced phase difference is monitored for each output and compensated by proper engineering of the combining stage.
In designing the sub-cell(s), if the number of outputs, $x_i$, is even, a 3dB splitter is used with the input forwarded to the upper port and the two outputs, equal in terms of power, recorded at the upper and lower output ports, having accumulated phase shifts of 0 and $\pi/2$, respectively. In general, the algorithm would proceed by designing two smaller-scale splitters, each with $x_i/2$ output ports; however, since the only even prime number is 2, each of them has a single output and the design of the sub-cell is finished.
Otherwise, if the number of outputs, $x_i$, is odd, the algorithm starts by bringing the input signal to the upper input of an X-coupler with $\alpha = (x_i + 1)/(2x_i)$, and collecting the signals with a phase change of 0 at the upper output and $\pi/2$ at the lower one. The algorithm proceeds with designing two new sub-cells, one with $(x_i - 1)/2$ outputs at the upper port and the other with $(x_i + 1)/2$ outputs at the lower port. For each of them, it checks whether the number of outputs is even or odd and follows the previously outlined procedure, which is repeated until a single output is reached.
Applying this method to the arbitrary number of outputs X, where X is a prime number, the number of couplers for 1-to-X cell will be X − 1, and the number of stages (cascades) ⌈log 2 X⌉, implying that the maximum accumulated phase shift per cell will be ⌈log 2 X⌉π/2. For the whole 1-to-N splitter, (N − 1) couplers are needed, arranged in Y ⌈log 2 X⌉ stages, yielding a maximum phase accumulation of Y ⌈log 2 X⌉π/2.
Applying the developed algorithm to x i = 3, we have the first X-coupler with α = 2/3, or the splitting ratio 1/3-to-2/3, followed by another 3dB (or α = 1/2) X-coupler connected to the lower port of the initial coupler. The three outputs are equal in terms of power and have accumulated phase shifts of [0, π/2, π]. The full 1-to-N splitter can be realized by concatenating 1-to-X unit cells at each of the outputs of the initial, first-layer's cell. In the case of N = 9, we use a total of N − 1 = 8 couplers, the powers at the outputs are identical, 1/N = 1/9 of the input power, whereas the phase accumulation within the n-th axon reads exp(ik n π/2), where k n is the n-th element of k = [0, 1, 2, 1, 2, 3, 2, 3, 4].
The combining of the signals leaving the axons is done in an inverse manner, using X-to-1 combiner elementary units, constructed by rotating the splitter elementary unit by π. Signals are forwarded to both inputs of X-coupler, but collected only from the lower output. In this manner, it is ensured that the phase accumulation for the signal coming from the n-th input will read exp(il n π/2), where l n is the n-th element of ⌈log 2 X⌉ − k, yielding an overall identical phase accumulation for all signals, ensuring coherence preservation and constructive interference.
When it comes to the unit cell with $x_i = 11$, we split the power as 5/11-to-6/11 by setting $\alpha = 6/11$. We then proceed with the design of two splitters, one with 5 outputs and the other with 6 outputs. The first one splits the input with the power ratio of 2/5-to-3/5, further forwarded to a 3dB coupler at the upper port and a 1-to-3 splitter at the lower port (designed by concatenating a 1/3-to-2/3 coupler and another 3dB coupler at its lower output port). The second one, used for 6 outputs, starts with a 3dB coupler, followed by a 1-to-3 splitter at each of its outputs. The total number of couplers used is X − 1 = 10, the total number of stages is ⌈log 2 X⌉ = 4 and the maximum phase accumulation is 2π. The X-to-1 combiner is designed following the same, previously described approach: rotating the splitter by π and collecting the outputs only from the lower ports.

Figure S2. Flowchart of the algorithm for designing a power-conserving 1-to-N splitting stage by employing X-couplers. The couplers can have an arbitrary splitting ratio defined on the domain α ∈ [1/2, 2/3], and N does not need to fulfil any particular requirement.
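The recursive even/odd rule described above can be sketched in Python to reproduce the N = 9 and N = 11 examples. The function names `design_cell` and `cascade` and the tuple layout are assumptions made for illustration; the sketch applies the even/odd rule directly to a cell with x outputs and does not implement the prime-factor bookkeeping of the flowchart in Fig. S2. Phases are counted in units of π/2 and powers are kept as exact fractions.

```python
from fractions import Fraction


def design_cell(x: int):
    """Design a 1-to-x cell; return (outputs, couplers, stages).

    outputs is a list of (power_fraction, phase_steps) pairs, ordered from
    the uppermost to the lowermost port; phase_steps counts pi/2 shifts.
    """
    if x == 1:
        return [(Fraction(1), 0)], 0, 0
    if x % 2 == 0:
        # 3dB coupler feeding two equal sub-cells.
        alpha, n_up, n_low = Fraction(1, 2), x // 2, x // 2
    else:
        # alpha is the power fraction sent to the cross (lower) port.
        alpha = Fraction(x + 1, 2 * x)
        n_up, n_low = (x - 1) // 2, (x + 1) // 2
    up, c_up, s_up = design_cell(n_up)
    low, c_low, s_low = design_cell(n_low)
    outputs = [((1 - alpha) * p, k) for p, k in up]
    outputs += [(alpha * p, k + 1) for p, k in low]  # cross port adds pi/2
    return outputs, 1 + c_up + c_low, 1 + max(s_up, s_low)


def cascade(x: int, y: int):
    """Cascade y layers of identical 1-to-x cells into a 1-to-x**y splitter."""
    cell, c_cell, s_cell = design_cell(x)
    outputs, couplers, stages = [(Fraction(1), 0)], 0, 0
    for _ in range(y):
        couplers += len(outputs) * c_cell
        stages += s_cell
        outputs = [(p * pc, k + kc) for p, k in outputs for pc, kc in cell]
    return outputs, couplers, stages


if __name__ == "__main__":
    outs, couplers, stages = cascade(3, 2)  # N = 9
    print([k for _, k in outs], couplers, stages)
    outs, couplers, stages = cascade(11, 1)  # N = 11
    print(max(k for _, k in outs), couplers, stages)
```

For N = 9 the sketch returns the phase-step sequence k quoted in the text, and for N = 11 it returns 10 couplers in 4 stages with a maximum of 4 phase steps (2π).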

S4 Splitters, combiners and switches
Let us assume that the X-coupler has a wavelength-dependent transfer function, such that (S9) for the m-th channel can be rewritten as $\begin{bmatrix} \sqrt{1-\alpha_m} & i\sqrt{\alpha_m} \\ i\sqrt{\alpha_m} & \sqrt{1-\alpha_m} \end{bmatrix}$ (S10), where $\alpha_m = 1/2 + \Delta\alpha_m$ denotes the power splitting ratio of the m-th channel and $\Delta\alpha_m$ its deviation from the targeted value of 1/2 in the case of power-of-2 splitting/combining stages. The deviation can be either positive or negative and is otherwise unconstrained, except that its magnitude does not exceed 1/2, i.e., that the X-coupler allows communication between all input and output ports. When operated in splitter mode (having only one active input, $E_{in}$), the column-vector signals exiting through the two output ports of the X-coupler, $E_{out,bar}$ and $E_{out,cross}$, are given as $E_{out,bar} = \frac{1}{\sqrt{2}} A_{bar} E_{in}$ and $E_{out,cross} = \frac{i}{\sqrt{2}} A_{cross} E_{in}$ (S11), where the diagonal matrices $A_{bar}$ and $A_{cross}$ carry the wavelength-dependent deviations of the splitting ratios, with m-th diagonal elements $\sqrt{1 - 2\Delta\alpha_m}$ and $\sqrt{1 + 2\Delta\alpha_m}$, respectively (S12). On the contrary, when operated in coupler mode (having both inputs active, $E_{in,bar}$ and $E_{in,cross}$), the column vector of the signal leaving the X-coupler reads $E_{out} = \frac{1}{\sqrt{2}} \left( A_{bar} E_{in,bar} + i A_{cross} E_{in,cross} \right)$ (S13). Let us also assume that the transfer function of the switch introduces a wavelength-dependent loss penalty originating from non-ideal routing, such that the optical power forwarded to the active port (consult Table 1 in the main body of the Manuscript) is proportional to $s_m \leq 1$, implying that the electric field of the optical signal passing through the switch is scaled by $\sqrt{s_m}$. Assuming that the inactive branches of the input or weight banks have their modulators set to zero transmission, the excess optical power, proportional to $1 - s_m$, is extinguished and is not of concern for further analysis. The transfer function of the switch can be given in matrix form as the diagonal matrix $S$ with m-th diagonal element $\sqrt{s_m}$ (S14). Having three switches in each axon ($S_X$, $S_W$ and $S_O$, see Fig. 1(e) in the main body of the Manuscript), and assuming they are identical among themselves and among different axons, the loss penalty accumulates to $S^3$.
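Taking (S10)-(S14) into account, we repeat the procedure from Section S2 and find that the signals entering the bias branch and the OLAU read $E_{B,in} = \frac{i}{\sqrt{2}} A_{cross} E_{LD}$ and $E_{OLAU,in} = \frac{1}{\sqrt{2}} A_{bar} E_{LD}$ (S15). After passing through the 1-to-N splitting stage, the signal entering the n-th axon is $E_{OLAU,in,n} = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{N}} A_{bar}^{1+l_n} A_{cross}^{k_n} e^{i \frac{\pi}{2} k_n} E_{LD}$ (S16), where $k_n$ and $l_n$ denote the n-th element of the Hamming weight sequence and its reverse, given by (S3) and (S6), respectively. At the output of the n-th axon, accounting for the switch-induced wavelength-selective loss, we have $E_{OLAU,out,n} = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{N}} S^3 A_{bar}^{1+l_n} A_{cross}^{k_n} e^{i \frac{\pi}{2} k_n} W_n X_n E_{LD}$. (S17)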

Passing the signals from all axons through N-to-1 combining stage yields The signal leaving the bias branch reads Finally, the two signals given by (S18) and (S19) interfere in the last X-coupler, giving where W b denotes the bias branch channel-wise transfer matrix accounting for loss balancing and phase alignment, with its m-th element being

S5 Inputs: Amplitude modulator -MZM
In the case of input imprinting, we assume that the Mach-Zehnder Modulators (MZMs) are voltage controlled, with both of their arms having Phase Shifters (PSs), and that splitting/coupling is ideal in terms of optical power. The induced phase shifts are decomposed into the contribution coming from the DC bias voltage, $\phi_{DC,1/2}(V_{DC,1/2}, \lambda)$, and that from the modulation RF voltage, $\phi_{1/2}(V_{1/2}, \lambda)$, where subscripts 1 and 2 correspond to the upper and lower phase shifter, respectively. Assuming push-pull operation, i.e., $V_1 = V_{RF}$ and $V_2 = -V_{RF}$, and assuming that the dependence of the refractive index $n$ on the applied voltage can be represented by an odd function in the first-order approximation, phase shifts can be written in the following form where $L_{DC}$ and $L$ stand for the lengths of the DC and RF electrodes of the PSs, which may be different if separate phase shifters are used, or identical if the bias is applied together with the RF signal, whereas $n_0 = n(V = 0)$ denotes the refractive index of the material with no voltage applied and $\Delta n(V) = n(V) - n_0$. The electric field transfer function of the MZM in push-pull configuration reads Let us assume that the modulator is centered to operate at the wavelength $\lambda_c$, which can be either equal to the channel wavelength if a modulator-per-channel is used (as is the case in modes of operation #1 and #2 shown in Fig. S1(a), (b)), or chosen independently if one modulator is used for several channels (as is the case in modes of operation #3 and #4 shown in Fig. S1(c), (d)). In either of the two cases, the transfer function should be optimized to yield the appropriate $x_{n,c}$ value at $\lambda_c$ and the deviation should be monitored for the remaining wavelengths. Choosing the length of the PS such that $\phi_0(\lambda_c) = 2p_x\pi$, where $p_x \in \mathbb{N}$, and thus eliminating the accumulated phase shift at the central wavelength, we have Assigning the minimum value of the input transfer function to zero RF voltage requires the following condition to be met where $q_1 \in \mathbb{Z}$.
The simplest approach, by which generality is still not lost, is to choose q 1 = 0 and set φ DC, In order to eliminate the accumulated phase shift at λ c , we choose the DC voltages such that φ DC (V DC , λ c ) = 2q x π + π/2, where q x ∈ N resulting in which will eventually be equal to the input x n,c , which we are aiming to imprint at λ c . Variations of ∆φ (V RF , λ ) and φ DC,1 (λ ) − φ DC,2 (λ ) with wavelength can be neglected in the following analysis as they are orders of magnitude lower than the variation of either φ 0 (λ ) or φ DC,1 (λ ) + φ DC,2 (λ ), i.e., they are proportional to ∆n, as opposed to n. This implies that no significant variation of the transfer function magnitude is anticipated with variation of wavelength; rather, the major contribution will be reflected within the transfer function's phase. This allows us to write, based on equation (S23) and with previously introduced assumptions Restricting ourselves to the 1 st order approximation, phases φ 0 and φ DC can be estimated for λ in close proximity of λ c as where ∆λ = λ − λ c and n g = n/(1 + λ /n · ∂ n/∂ λ ) is the group index of refraction. Introducing equation (S30) to equation (S29) we have which implies that for a nominal input x n,c , only the channel λ c will have the targeted value imprinted, whereas any other channel m will carry the signal where the subscript "{m, c}" denotes that the value x n,m,c is experimental (recorded at channel m ̸ = c) rather than targeted. In equation (S32b), ∆λ 1 = λ m+1 −λ m denotes the channel spacing (assuming equidistant channels), whereas p x = n(V = 0, λ c )L/λ c and q x = n(V DC , λ c )L DC /λ c represent the normalized lengths of the RF and DC pads of the phase shifters within the MZM and are restricted to p x , q x ∈ N.

S6 Weights: Amplitude modulator followed by a phase shifter -MZM-PS
Assuming that both arms have thermally-controlled PSs and that splitting/coupling is ideal in terms of power, the MZM's electric field transfer function will depend on different phase shifts in two arms, φ 1 and φ 2 . Adding an additional PS following the MZM, with the phase shift φ 3 , allows for precise control of the signal's phase, which carries the sign of the weight. The electric field transfer function of the MZM-PS system reads
At any point in time, only one phase shifter is being used for adjusting the weight magnitude |w n,c | by increasing its temperature. The lengths of the two PSs within MZM arms are equal and the inherent phase difference is achieved by increasing/reducing the length of the waveguide in the arms by the appropriate amount. Under these assumptions, phases φ 1 and φ 2 can be written in the following form, if the magnitude of the weight is |w n,c | ≤ cos θ or, if |w n,c | ≥ cos θ where φ (T 0 , λ ) = 2πn(T 0 , λ )L/λ and ∆φ (∆T, λ ) = 2π∆n(∆T, λ )L/λ . Based on equations (S34) and (S35), the sum and the difference of the two phases is Substituting equation (S36) to equation (S33) we have Incorporating the condition for eliminating the phase offset at the nominal temperature introduced earlier, φ (T 0 , λ c ) = 2p w π, where p w ∈ N, we can equate the transfer function of the MZM-PS system, given by equation (S37), at the central wavelength λ c with the targeted weight value w n,c and determine the required thermally induced phase shift in MZM arms, as well as in the subsequent standalone PS as follows ∆φ (∆T, λ c ) = 2 sgn (|w n,c | − cos θ ) (θ − arccos |w n,c |) , (S38a) When looking at λ ̸ = λ c , variation of ∆φ (∆T, λ c ) with wavelength can be neglected as it is orders of magnitude lower than variation of either φ (T 0 , λ ) or φ 3 (λ ), i.e., it is proportional to ∆n, as opposed to n, resulting in Restricting ourselves to the 1 st order approximation, phases φ and φ 3 can be estimated for λ in close proximity of λ c as where ∆λ = λ − λ c and n g = n/(1 + λ /n · ∂ n/∂ λ ) is the group index of refraction. Introducing equation (S40) to equation (S39) and recognizing that in all cases of practical interest p s , p w ≫ 1, we have

which implies that for a nominal weight w n,c , only the channel λ c will have the targeted value imprinted, whereas any other channel m will carry the signal w n,m,c ≈ w n,c exp −iξ where the subscript {m, c} denotes that the value w n,m,c is experimental (recorded at m ̸ = c) rather than targeted. In equation (S42b), ∆λ 1 = λ m+1 − λ m denotes the channel spacing (assuming equidistant channels), whereas p w = n(T 0 , λ c )L/λ c and p s = n(T 0 , λ c )L 3 /λ c represent normalized lengths of the PSs within the MZM and the standalone PS, respectively, and are restricted to p w , p s ∈ N.
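Equation (S38a) can be evaluated directly to see how much thermally induced phase shift a target weight magnitude requires. The sketch below is illustrative only: the function name `thermal_phase_shift` is an assumption, and θ is treated as a given bias parameter taken from (S34)-(S35). It also makes explicit that (S38a) is equivalent to $2|\theta - \arccos|w_{n,c}||$, i.e., the required shift is never negative, in line with the weight being set by heating a single phase shifter.

```python
import math


def sgn(x: float) -> int:
    """Sign function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)


def thermal_phase_shift(w_mag: float, theta: float) -> float:
    """Thermally induced phase shift of (S38a) for a weight magnitude |w|."""
    if not 0.0 <= w_mag <= 1.0:
        raise ValueError("weight magnitude must lie in [0, 1]")
    return 2.0 * sgn(w_mag - math.cos(theta)) * (theta - math.acos(w_mag))


if __name__ == "__main__":
    theta = math.pi / 4  # assumed bias parameter, for illustration only
    for w in (0.0, 0.3, 0.9, 1.0):
        print(w, thermal_phase_shift(w, theta))
```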

S7 Signal multiplexing and demultiplexing
As outlined in the main body of the Manuscript, Arrayed Waveguide Gratings (AWGs) with an assumed parabolic channel-wise power transfer function are used for (de)multiplexing. According to the power conservation law, the transfer function of the pass channel reads $T_{AWG}(0) = (1 + 2r_{AWG})^{-1}$, whereas for the suppressed channels we have $T_{AWG}(\pm\Delta\lambda_1) = r_{AWG}/(1 + 2r_{AWG})$, with $r_{AWG}$ denoting the AWG crosstalk in linear terms. The formalism above is valid for both the DEMUX and the MUX. In the case of the DEMUX, the m-th channel, denoted by subscript, is distributed to the targeted port and the two adjacent ports, denoted by superscript. These signals get modulated either by $\{x_{n,m-1}, x_{n,m}, x_{n,m+1}\}$ or by $\{w_{n,m-1}, w_{n,m}, w_{n,m+1}\}$, depending on the mode of PPNN operation. However, as already shown in Sections S5 and S6, being detuned from the wavelength for which the modulators are optimized, the side channels, indexed by $m \pm 1$, will carry suboptimal input or weight values. In the following analysis we focus on the imprinting of inputs in modes of operation #1 and #2, as shown in Fig. S1(a), (b), recognizing that the same formalism can be applied to the imprinting of weights in modes #1 and #3, as given in Fig. S1(a), (c).
After demultiplexing, the signals are weighted by the corresponding $x_{n,m,c}$ values. When reaching the MUX, instead of collecting only the pass channel (at the m-th port), the MUX will also collect the residuals of the two adjacent ports (indexed $m - 1$ and $m + 1$ in the superscript), all of which are at the same wavelength $\lambda_m$, and which together form the output electric field. Two additional approximations can be made: (i) knowing that the crosstalk exists only between adjacent channels, and assuming that the channel spacing $\Delta\lambda_1$ is not large, the phase shift due to suboptimal inputs/weights can be neglected, implying $x_{n,m,m-1} \approx x_{n,m-1}$ and $x_{n,m,m+1} \approx x_{n,m+1}$, and (ii) typical values of $r_{AWG} \ll 1$ allow us to approximate $(1 + 2r_{AWG})^{-1} \approx 1 - 2r_{AWG}$, finally resulting in an experimentally recorded input $x^{AWG}_{n,m} \approx x_{n,m} + r_{AWG}(x_{n,m-1} - 2x_{n,m} + x_{n,m+1})$, under the constraint $x_{n,0} = x_{n,M+1} = 0$. The same formalism can be applied to weights in modes of operation #1 and #3, $w^{AWG}_{n,m} \approx w_{n,m} + r_{AWG}(w_{n,m-1} - 2w_{n,m} + w_{n,m+1})$, with $w_{n,0} = w_{n,M+1} = 0$, as well as to biases in all modes of operation, $w^{AWG}_{b,m} \approx w_{b,m} + r_{AWG}(w_{b,m-1} - 2w_{b,m} + w_{b,m+1})$, with $w_{b,0} = w_{b,M+1} = 0$.
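The first-order crosstalk correction can be illustrated with a short Python sketch. The function names and the example values (three channels, $r_{AWG} = 0.01$) are assumptions chosen for illustration; the formulas are the AWG pass/side-channel transfer functions and the approximate expression for the recorded weights (or inputs) with zero boundary values.

```python
def awg_power_transfer(r_awg: float):
    """Return (pass-channel, adjacent-channel) power transfer of the AWG."""
    pass_channel = 1.0 / (1.0 + 2.0 * r_awg)
    side_channel = r_awg / (1.0 + 2.0 * r_awg)
    return pass_channel, side_channel


def apply_awg_crosstalk(values, r_awg: float):
    """First-order crosstalk: v_m + r * (v_{m-1} - 2 v_m + v_{m+1}).

    values holds the per-channel inputs or weights for m = 1..M; the
    boundary values v_0 and v_{M+1} are set to zero.
    """
    padded = [0.0, *values, 0.0]
    return [
        padded[m] + r_awg * (padded[m - 1] - 2.0 * padded[m] + padded[m + 1])
        for m in range(1, len(values) + 1)
    ]


if __name__ == "__main__":
    t_pass, t_side = awg_power_transfer(0.01)
    print(t_pass + 2 * t_side)  # power conservation over the three ports
    print(apply_awg_crosstalk([1.0, 1.0, 1.0], 0.01))
```

With identical values on all channels, the inner channel is unaffected to first order, whereas the edge channels lose a fraction $r_{AWG}$ because one of their neighbours is absent.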

S8 Approximate experimental PPNN matrices
Experimental operation of the PPNN can be described by its transfer function, Q e , given in diagonal matrix form, similar to the targeted one, Q t , defined by equation (2) where quantities indexed by "e" take different form depending on the mode of operation.

S8.2 Convolutional
This mode of operation is given in Fig. S1(b). Following the previously adopted approach while deriving $Q_t$, where the accumulated phase was not taken into consideration for the transfer matrix of the PPNN (see equations (1) and (2) in the main body of the Manuscript), equation (S54) can be rewritten such that the phase shifters in the bias branch take the responsibility for phase-aligning the signals leaving the OLAU with the signals coming from the bias branch, to allow for constructive interference. The diagonal matrix describing the bias branch is modified accordingly, with its m-th element evaluated under the constraints $x_{n,0} = x_{n,M+1} = 0$ and $w_{b,0} = w_{b,M+1} = 0$.

S8.3 Fully-connected
This mode of operation, given in Fig. S1(c), exhibits similar behaviour to convolutional mode, having its inputs imprinted by a single modulator for all channels, whereas the weights are controlled on per-channel basis, yielding Substituting equation (S32) to equation (S59) we have Disregarding the accumulated phase, equation (S60) can be rewritten as and with the m-th element of W under the constrain w n,0 = w n,M+1 = 0 and w b,0 = w b,M+1 = 0.

S8.4 Power-saving
Final, power-saving mode of operation is given in Fig. S1(d). If used with a single channel, no deviation due to either AWG or wavelength-dependent operation of the modulators will exist and the experimental matrix element will be equal to the targeted one. However, if employed in PPNN calibration with all channels active using a single input and a single weight modulator per axon, matrix element will read where W c and Ξ (w) c being given by equations (S62) and (S56), Substituting equation (S49) to equation (S67) we have

S9 PPNN performance metrics
Let us assume that the number of active channels is given as M A ≤ M and the number of active axons as N A ≤ N. Insertion loss (IL) for the bias branch, in units of dB, remains identical for all four modes of operation given in Table 1 in the main body of the Manuscript, whereas the IL of the OLAU depends on the path taken by signal, given by states of the switches S W and S O as follows where we adopt the following notation for the insertion losses originating from R . In (S70b) we assume that the splitting and combining stages are designed according to the algorithm from Fig. S2 with X being the smallest principal integer root of N; if N is a power of 2, log X N⌈log 2 X⌉ reduces to log 2 N.
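The stage count $\log_X N \lceil \log_2 X \rceil$ entering the insertion loss can be computed as follows. This is a sketch under the assumption that the "smallest principal integer root" is the smallest integer $X \geq 2$ for which $N = X^Y$ with integer $Y \geq 1$; the function names are illustrative.

```python
def smallest_integer_root(n: int):
    """Return (X, Y) with the smallest X >= 2 such that X**Y == n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for x in range(2, n + 1):
        power, y = x, 1
        while power < n:
            power *= x
            y += 1
        if power == n:
            return x, y
    raise AssertionError("unreachable: x = n always satisfies x**1 == n")


def splitter_stage_count(n: int) -> int:
    """Number of coupler stages, log_X(N) * ceil(log2(X))."""
    x, y = smallest_integer_root(n)
    ceil_log2_x = (x - 1).bit_length()  # exact integer ceil(log2(x))
    return y * ceil_log2_x


if __name__ == "__main__":
    for n in (8, 9, 11, 16):
        print(n, smallest_integer_root(n), splitter_stage_count(n))
```

For a power-of-2 N the result reduces to $\log_2 N$, as stated above.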
The loss of the PPNN as a whole will be dictated by IL OLAU , being the greater of the two given by (S70). Leaving the optical power, or, equivalently, the loss, in the bias branch as is, will allow its proper operation if bias is used only for sign conversion from phase to the amplitude of the electrical field; however, if the bias also carries useful information, the losses in the bias branch and the OLAU should be made equal, which can be achieved either by relying on the T/O MZM used for bias weight amplitude modulation to suppress the excess optical power or by introducing a Variable Optical Attenuator (VOA) in the bias branch with the attenuation equal to Finally, the PPNN loss reads As a comparison, the non-programmable counterpart of PPNN (denoted as dual-IQ), which supports only one channel, has the insertion loss of IL dual−IQ = 2(1 + log X N⌈log 2 X⌉)IL C + IL X + IL W + IL implying that the penalty introduced by programmability and multi-channel operation reads Power consumption of the PPNN is dictated by all of its active components, including the Laser Diodes (LDs), assumed to have the optical output power P LD per channel and wall-plug efficiency of η wp , input amplitude modulators (P (DC) X and P (RF) X ),

weight amplitude and phase modulators ($P_W$) and switches ($P_S$). Having the input modulators biased such that they output 0 at zero RF voltage, (S25), implies that their power consumption will be proportional to the number of active axons $N_A$ and active channels $M_A$. A similar conclusion can be drawn for the power consumption of the weights, even though they are biased at the $2\theta$ point at the nominal temperature $T_0$, (S33)-(S35), implying that their transfer function is nonzero if no control voltage is applied. Nevertheless, this poses no issue, as the signals will already be suppressed by the zero transfer function of the input modulators when $N_A < N$, or will not be launched into the PPNN at all if $M_A < M$. The total power consumption (in units of mW) can be calculated as where we also account for the power consumption of the optional Transimpedance Amplifiers (TIAs), $P_{TIA}$, following the photodiodes (PDs) if immediate detection is mandated by the specific application, as well as the optional temperature controller (TEC), $P_{TEC}$. Note that these two terms get reduced proportionally to the number of interconnected PPNN layers. The power consumption per active channel is As (S75)-(S76) show, there is no power penalty when excess LDs are powered off, $M_A < M$. On the other hand, a penalty exists when the number of employed axons is below the maximum one, $N_A < N$, which is attributed to the synchronized switch states in all axons (a penalty that can be alleviated by allowing the switches to be set independently), as well as to the DC biasing of the input modulators. We note that $P_X^{(DC)}$ can be set to zero if asymmetrical MZMs are used, providing a built-in phase difference of π between the upper and lower MZM branch.
In the case of the non-programmable PNN, which supports only one channel, the power consumption amounts to $P_{dual-IQ} = P_{LD}/\eta_{wp} + N P_X^{(DC)} + N_A P_X^{(RF)} + (N_A + 1) P_W + P_{TIA} + P_{TEC}$.
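As a numerical illustration, the dual-IQ power budget can be evaluated with the sketch below. It assumes that the laser term is the optical output power divided by the wall-plug efficiency, $P_{LD}/\eta_{wp}$; the function name, argument names and all numerical values are placeholders chosen here for illustration, not values from the Manuscript.

```python
def dual_iq_power_mw(
    p_ld_mw: float,    # optical output power of the laser diode
    eta_wp: float,     # wall-plug efficiency, 0 < eta_wp <= 1
    n_axons: int,      # N, total number of axons
    n_active: int,     # N_A, number of active axons
    p_x_dc_mw: float,  # DC bias power per input modulator
    p_x_rf_mw: float,  # RF drive power per input modulator
    p_w_mw: float,     # power per weight modulator (bias weight included)
    p_tia_mw: float = 0.0,  # optional transimpedance amplifier
    p_tec_mw: float = 0.0,  # optional temperature controller
) -> float:
    """Power consumption of the single-channel, non-programmable neuron."""
    if not 0.0 < eta_wp <= 1.0:
        raise ValueError("eta_wp must lie in (0, 1]")
    if not 0 <= n_active <= n_axons:
        raise ValueError("n_active must lie in [0, n_axons]")
    return (
        p_ld_mw / eta_wp
        + n_axons * p_x_dc_mw
        + n_active * p_x_rf_mw
        + (n_active + 1) * p_w_mw
        + p_tia_mw
        + p_tec_mw
    )


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(dual_iq_power_mw(10.0, 0.2, 4, 2, 1.0, 5.0, 2.0, 3.0, 7.0))
```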
If PPNN is configured to operate in multi-neuron mode (#1), where S X = S W = S O = 1, or in power-saving mode (#4), where M A = 1, S X = S O = 0 and S W = 1, P dual−IQ and P PPNN,m are comparable and only marginal power-consumption penalty arises in the PPNN case attributed to switches, which is, in mode #1, counterbalanced by the reduction in TEC power consumption on per-channel basis. On the contrary, operating in modes #2 (convolutional) or #3 (fully-connected) allows sharing of the weight or input modulators, driving the power consumption of PPNN below the one of dual-IQ.
It is worth noting that the power consumption of the lasers need not be the maximum available; the LDs should be biased such that they guarantee enough power at the PPNN output to meet the sensitivity requirements ($P_R$) and the appropriate margin ($IL_M$), as follows: $P_{LD} = P_R \cdot 10^{(IL_{PPNN} + IL_M)/10}$.
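The laser power requirement translates into a one-line calculation. In the sketch below the function name and the numerical values are placeholders for illustration; the receiver sensitivity is taken in linear units (mW) and the losses in dB.

```python
def required_laser_power_mw(
    p_r_mw: float, il_ppnn_db: float, il_margin_db: float
) -> float:
    """P_LD = P_R * 10**((IL_PPNN + IL_M) / 10), with P_R in mW and IL in dB."""
    return p_r_mw * 10.0 ** ((il_ppnn_db + il_margin_db) / 10.0)


if __name__ == "__main__":
    # Placeholder values: 0.01 mW sensitivity, 17 dB loss, 3 dB margin.
    print(required_laser_power_mw(0.01, 17.0, 3.0))
```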
The footprint of the PPNN is governed by the number of employed components and the minimum spacing between them. Let us denote the length of one X-coupler with the associated routing waveguides as $L_C$, the length of the whole axon as $L_A^{(PPNN)}$ and the minimum spacing between the waveguides as $L_\Delta$. Without accounting for any particular optimization in device placement, we estimate the PPNN area and find that the added benefit of programmability introduces a penalty along the longitudinal neuron dimension. When it comes to the lateral one, if we assume the best-case scenario for PPNN operation, $M_A = M$, the coefficient multiplying $L_\Delta$ reduces to $(N + 1) + (N - 1)/M$, which, in the limiting case of a very large M, yields $N + 1$, revealing the always-present penalty in the lateral dimension originating from the added benefit of programmability, i.e., the existence of two alternative routes a signal can take within the input and/or weight banks. In a more realistic case, when M is large but not infinite, e.g., of the order of N, the lateral coefficient is approximately $N + 2$. The larger N is, the smaller the footprint penalty [∼ (1 + 2/N)]. On the other hand, when operating with a single wavelength (such as in mode #4, where $M_A = 1$), the lateral penalty becomes proportional to M, i.e., the number of channels for which the PPNN was designed.
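The behaviour of the lateral coefficient in the best-case scenario, $M_A = M$, can be checked with a minimal sketch; the function name is an assumption made here, and only the expression $(N + 1) + (N - 1)/M$ quoted in the text is used.

```python
def lateral_coefficient(n_axons: int, m_channels: int) -> float:
    """Coefficient of the waveguide spacing for M_A = M: (N+1) + (N-1)/M."""
    if n_axons < 1 or m_channels < 1:
        raise ValueError("N and M must be positive integers")
    return (n_axons + 1) + (n_axons - 1) / m_channels


if __name__ == "__main__":
    n = 16
    print(lateral_coefficient(n, n))      # M of the order of N: close to N + 2
    print(lateral_coefficient(n, 10**6))  # very large M: approaches N + 1
```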
The throughput of the PPNN in inference applications, measured in Multiply-Accumulate (MAC) operations per second, depends on the bandwidth of the input modulators and the mode in which the network is operated. Assuming the maximum data rate of $B_X$, we find that modes #1 through #3 operate at $T_{PPNN}^{(\#1-\#3)} = M_A N_A B_X$, whereas in mode #4 the throughput reduces to $T_{PPNN}^{(\#4)} = N_A B_X$, which can also be deduced from $M_A = 1$. In other words, the throughput per channel equals $T_{PPNN,m} = N_A B_X$. Finally, the footprint and energy efficiencies ($\eta_F^{PPNN}$ and $\eta_E^{PPNN}$) are defined as the ratios of the throughput to the area and to the consumed power, respectively, or equivalently, of their per-channel values, and can be calculated based on (S82), (S79) and (S75).
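The throughput scaling can be sketched as below. The function name and the example numbers are assumptions for illustration; the expression for modes #1 through #3 is taken as $M_A N_A B_X$, consistent with the per-channel throughput $N_A B_X$ stated above, and mode #4 is treated as single-channel operation.

```python
def ppnn_throughput(
    mode: int, n_active: int, m_active: int, b_x: float
) -> float:
    """Throughput in MAC/s for PPNN modes 1..4 at input data rate b_x."""
    if mode not in (1, 2, 3, 4):
        raise ValueError("mode must be 1, 2, 3 or 4")
    # Mode #4 (power-saving) runs with a single active channel.
    channels = 1 if mode == 4 else m_active
    return channels * n_active * b_x


if __name__ == "__main__":
    # Placeholder values: 4 axons, 4 channels, 10 GBaud input modulators.
    for mode in (1, 2, 3, 4):
        print(mode, ppnn_throughput(mode, 4, 4, 10e9))
```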