Rotating neurons for all-analog implementation of cyclic reservoir computing

Hardware implementation of resource-efficient reservoir computing is of great interest for neuromorphic engineering. Recently, various devices have been explored to implement hardware-based reservoirs. However, most studies focused mainly on the reservoir layer, whereas an end-to-end reservoir architecture has yet to be developed. Here, we propose a versatile method for implementing cyclic reservoirs using rotating elements integrated with signal-driven dynamic neurons, whose equivalence to the standard cyclic reservoir algorithm is mathematically proven. Simulations show that the rotating neuron reservoir achieves record-low errors in a nonlinear system approximation benchmark. Furthermore, a hardware prototype was developed for near-sensor computing, chaotic time-series prediction and handwriting classification. By integrating a memristor array as a fully connected output layer, the all-analog reservoir computing system achieves 94.0% accuracy, while simulation shows >1000× lower system-level power than prior works. Therefore, our work demonstrates an elegant rotation-based architecture that exploits hardware physics as a computational resource for high-performance reservoir computing.


Report on "Rotating neurons for all-analog implementation of cyclic reservoir computing"
In this work, the authors realize an analog electronic implementation of cyclic reservoirs.
On the plus side:
-cyclic reservoirs have to my knowledge not been implemented before (while reservoirs with delay lines, which are very similar, have been extensively studied).
-the method to implement cyclic reservoirs, using a rotating logic element, is novel and elegant.
-the system is fully analog, including an analog output layer. An interface with a tactile screen is presented.
-An estimate of the energy consumption of an integrated system using CMOS technology is presented.
-Performance on tasks is good.
-The paper is mostly well written and easy to follow.
On the minus side:
-The trick of using a rotating logic element is a trick, not an important conceptual advance.
-The cyclic reservoir is only a minor variation on delay reservoirs, which have been much studied.
-A convincing argument for using cyclic reservoirs with rotating elements, rather than delay reservoirs, is not presented.
-The analog output layer has been presented in previous work.
Overall my feeling is that, while this is elegant and well implemented work, it does not reach the threshold of impact and broad interest for publication in Nature Communications.
Major comments:
-The prototype system is not presented in detail. Please provide enough details for another researcher to be able to reproduce the experiment.
-Methods. Power estimation. I find this section unclear.
-Eq. 11 does not have any dependence on the number N of neurons, only on M.
-"The M parallel eRNRs can share one counter but the power for the other components increases." Increases with what? Unclear.
-What would be the best speed at which to operate the system? 50 ns per operation to be compatible with the memristor array?
-Panels c and d. Please add in caption that this is 1-step-ahead prediction.
-Panels e and f. Unclear how these are computed. Please clarify.
Reviewer #3 (Remarks to the Author): In the manuscript "Rotating neurons for all-analog implementation of cyclic reservoir computing" by X. Liang, Y. Zhong, J. Tang, et al., the authors adopt a cyclic reservoir computing architecture which they show can be efficiently implemented using a rotating neuron reservoir integrated with an analog memristor array. They prove the equivalence between the software simulation and the hardware implementation. A proof-of-concept prototype of the RNR was developed to demonstrate near-sensor computing. The novel hardware design is tested on benchmarks, showing excellent performance in tasks such as nonlinear and Mackey-Glass chaotic time series prediction as well as in handwriting recognition.
Overall, the manuscript describes a novel physical design principle and hardware implementation that adopts a cyclic reservoir computing architecture. The work in this manuscript is systematic and well organized, representing interesting and exciting progress towards practical RC systems for real-time signal processing in applications. I recommend publication in Nature Communications, subject to the technical questions below being addressed properly, as follows.
1. The proposed physical implementation relies on a simple (yet effective) cyclic reservoir structure. The underlying hypothesis seems to be that random RCs are not as easily realizable physically. In between pure random and cyclic RC, and from a more fundamental and basic perspective, can the authors discuss and comment on what would be the class of RC structures that can similarly be mapped to efficient physical designs?

Reviewer #1
General comments: This manuscript reports on a study demonstrating a novel type of reservoir computing based on a cyclic reservoir. The study is noteworthy in two aspects: (i) it is the first to demonstrate a hardware realization of the concept of a cyclic reservoir, showing its equivalence with a rotating neuron reservoir device; (ii) the novel hardware prototype developed in this study demonstrates a low-power end-to-end analog system capable of online learning. The work is highly original and has potential to significantly impact the field of neuromorphic computing. It is refreshing to see such a compelling demonstration of physical reservoir computing. In my opinion, the novelty of the work and quality of the results merit publication in NCOMMS. The supplementary videos are impressive. This notwithstanding, I do have some suggestions below for improving the manuscript.

Response:
We are grateful for the reviewer's positive comments and for recognizing the novelty and importance of our work in the field of neuromorphic computing. We have revised the manuscript following the reviewer's suggestions, as detailed below.

Comment #1:
Discussion: The discussion as currently presented is a summary of the results. I would like to see this rewritten so that the authors discuss their results, both qualitatively and quantitatively, in the context of other similar studies. e.g. how do their NARMA10 simulation results compare to those reported by Appeltant et al. (2011) for a single cyclic reservoir? What are the similarities/differences? Similarly, how do their experimental results for the Mackey-Glass forecasting task compare to other physical RC studies? Can the authors comment on the ability to predict more chaotic MG signals (e.g. tau >> 17)? So, instead of predicting more steps ahead, predict one step ahead, but for a more chaotic signal. See also this paper that shows how training on more chaotic MG signals can improve prediction of less chaotic signals (i.e. demonstration of transfer learning): doi: 10.1109/ICRC2020.2020.00007

Response:
We thank the reviewer for raising this important point. Following your suggestions, we have added more qualitative discussions and comparisons in the revised Discussion section. Compared to literature reports, our simulation result on NARMA10 achieves the best performance in terms of NRMSE (0.078 for a single eRNR and 0.055 for parallel eRNRs), to the best of our knowledge. Appeltant et al. (2011) claimed that their result (NRMSE = 0.15) was the best one using a hardware-based model at that time. Meanwhile, it has also been found that an earlier work 1 using a software-based echo state network achieved NMSE = 0.0098 (equivalent to an NRMSE of 0.099). It can be seen that our NRMSE values are clearly lower than both prior works. Such improvement in our work can be mainly attributed to the extra nonlinearity provided by the diode model in our hardware-based simulation, which differs from the abrupt ON/OFF of the software-based ReLU function. In addition, the Mackey-Glass (MG) forecasting task preliminarily tested the computing ability of our eRNR prototype under different experimental setups and numbers of parallel eRNRs. The main advantages of our prototype compared with other physical implementations are the system simplicity and the novel architecture. As the reviewer suggested, we have now tested the prototype's performance with more chaotic MG signals (τ >> 17), and the results show that the system still works properly for one-step-ahead prediction of more chaotic MG signals (Supplementary Figs. 1a-f). Meanwhile, as a sanity check, we have also evaluated the system performance for 20-step-ahead prediction, which degrades more obviously as τ increases (Supplementary Fig. 1g).
This result is also consistent with the paper suggested by the reviewer.
Changes in the manuscript:
Page 14, Line 352: It is found that the additional nonlinearity provided by the hardware-based dynamic neuron could enhance our system performance on approximating the NARMA10 system, which demonstrates the computing potential of the proposed RNR method.
Page 15, Line 358: This experimental result further validates the computing ability of our eRNR prototype under different experimental setups.
Page 10, Line 242: Moreover, our experiment also revealed that the eRNR prototype can properly predict more chaotic signals (τ >> 17) one step ahead (Supplementary Fig. 1a-f). In comparison, the system performance could degrade as τ increases when predicting more steps ahead (Supplementary Fig. 1g).
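For concreteness, the NARMA10 benchmark and the NRMSE metric quoted in this response can be sketched in a few lines of Python. This is a minimal sketch using the standard Atiya-Parlos form of NARMA10; the exact task setup and washout used in the manuscript may differ.

```python
import numpy as np

def narma10(u):
    """10th-order NARMA series in the standard Atiya-Parlos form."""
    y = np.zeros(len(u))
    for k in range(9, len(u) - 1):
        y[k + 1] = (0.3 * y[k]
                    + 0.05 * y[k] * y[k - 9:k + 1].sum()  # sum of last 10 outputs
                    + 1.5 * u[k - 9] * u[k]
                    + 0.1)
    return y

def nrmse(target, pred):
    """Normalized root-mean-square error used to score the benchmark."""
    return np.sqrt(np.mean((target - pred) ** 2) / np.var(target))

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 0.5, 1000)   # NARMA10 input is drawn from [0, 0.5]
y = narma10(u)
```

A trivial mean predictor scores NRMSE = 1 by construction, which puts the reported 0.055-0.099 values in perspective.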

Comment #2:
On line 347-348: please cross-reference supp table 1. Also, please add clarification that the stated power (32.7 uW) is for 10 Hz while the memristor array dominates at higher processing rates. Can the authors similarly compare to other memristor-based RC systems?

Response:
Thank you for your comment. We have now cross-referenced Supplementary Table 1 in the revised Discussion section. Our power analysis shows that the static power, mainly dissipated by the dynamic neurons, dominates the system for processing rates lower than 100 kHz, while the overall system power remains at a low level for higher processing rates. We have added Supplementary Table 2 to show the power breakdown at different processing frequencies according to Eq. (11). This result can be explained by the fact that most computations occur in the analog domain and only contribute to the static power, which is in line with the advantage of analog neuromorphic computing.
In addition, we have also tried to compare the power consumption of our eRNR system with memristor-based reservoir computing systems reported in the literature 2,3. However, system-level power consumption was not reported in most studies, which mainly focused on device-level power consumption instead. We have therefore listed in Supplementary Table 1 all the previous studies that clearly reported how much power is needed to run their reservoir computing system.
Changes in the manuscript:
Page 15, Line 363: The overall system power consumption was estimated to be as low as 32.7 µW for the handwriting tasks operating at 10 Hz (τr = 0.1 s), which showed an advantage of more than three orders of magnitude compared to literature-reported reservoir computing systems. Also, further power analysis suggests that the static power, mainly dissipated by the dynamic neurons, dominates the system for processing rates lower than 100 kHz, while the overall system power remains at a low level for higher processing rates (>100 kHz) (see Supplementary Table 1). This result can be explained by the fact that most computations occur in the analog domain, which only contributes to the static power at low frequency, in line with the advantage of analog neuromorphic computing. The dynamic power, mainly attributed to the logic switches and memristor array, starts to dominate for processing rates higher than 100 kHz (see Supplementary Table 2). Further discussions on the low-power advantage of eRNR can be found in Supplementary Note 3.
Page 23, Line 530: The power breakdown at different processing frequencies is shown in Supplementary Table 2.
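The static/dynamic crossover described in this response follows from a simple first-order power model. The sketch below uses illustrative numbers (assumed for this example, not the values from the manuscript's Supplementary Table 2) to show how a crossover near 100 kHz arises.

```python
def system_power(f_hz, p_static=30e-6, e_dyn=300e-12):
    """First-order model: total power = static power + energy/op x rate.
    p_static (W) and e_dyn (J per operation) are illustrative assumptions."""
    return p_static + e_dyn * f_hz

# dynamic power equals static power at f = p_static / e_dyn
crossover_hz = 30e-6 / 300e-12
print(crossover_hz)  # ≈ 1e5 Hz, i.e. 100 kHz for these assumed numbers
```

Below the crossover the (frequency-independent) static term dominates; above it, total power grows roughly linearly with the processing rate.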

Comment #3:
Lastly, the authors should mention prospects for future studies that expand the capabilities of their system. e.g. online classification of more than 5 classes.

Response:
Thank you for pointing this out. There are several techniques to improve the RNR system's capability to deal with more complex tasks (such as classification of more than 5 classes): 1) increasing the number of neurons (N) to enhance both the MC and the network size; 2) increasing the number of parallel RNRs (M) to enhance the network size; 3) using different configurations for each neuron (currently they are simply replicated), which is expected to enhance the state richness; the configurations could also be optimized for different tasks, e.g., a higher n could enhance the system performance on more chaotic signals; 4) developing a deep eRNR, consisting of multiple eRNR cells in series, which could enhance the classification ability for inputs of different classes. We have incorporated these performance enhancement techniques in the Discussion section to provide prospects for future studies.
Changes in the manuscript: Page 15, Line 375: To further enhance the eRNR system capabilities to deal with more complex tasks, a useful approach would be increasing the number of neurons (N) or the number of parallel eRNRs (M) to expand the network size. Furthermore, a deep eRNR, consisting of multiple eRNR cells in series, could enhance the classification performance for inputs of different classes. Also, from a hardware perspective, different configurations of each neuron could be beneficial to enhance the state richness and hence improve the system performance. In addition, the eRNR design can be miniaturized and monolithically integrated on chip for low-power and ultrafast computing. It is also worth mentioning that the dynamic neuron may be replaced by recently reported emerging devices (e.g., dynamic memristors 2,8, spintronics 9) to further reduce the size and power consumption.
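The first two enhancement knobs (N and M) can be sketched numerically: the Python sketch below concatenates the states of M parallel cyclic reservoirs with different random ±1 input masks. The mask choice and tanh nonlinearity are illustrative assumptions, not the prototype's exact configuration.

```python
import numpy as np

def parallel_states(u, N=8, M=4, r=0.9, seed=2):
    """States of M parallel cyclic reservoirs (N neurons each) with
    different +/-1 input masks, concatenated into T x (M*N) features."""
    rng = np.random.default_rng(seed)
    masks = rng.choice([-1.0, 1.0], size=(M, N))   # one mask per reservoir
    feats = []
    for m in range(M):
        x, traj = np.zeros(N), []
        for uk in u:
            x = np.tanh(r * np.roll(x, 1) + masks[m] * uk)
            traj.append(x.copy())
        feats.append(np.array(traj))
    return np.concatenate(feats, axis=1)

X = parallel_states(np.random.default_rng(0).uniform(0, 0.5, 100))
print(X.shape)  # (100, 32): readout size scales as M x N
```

Growing either N or M enlarges the feature vector seen by the linear readout, which is the mechanism behind enhancement techniques 1) and 2) above.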

Response:
We thank the reviewer for this comment. The reservoir computer based on memristive nanowires is indeed a novel and advanced implementation that should be cited.

Minor Comment #3:
lines 117, 119 (and possibly elsewhere): will be --> is

Response:
We have corrected the grammatical errors as the reviewer suggested. Thank you.

Response:
We have cross-referenced to the Methods section as the reviewer suggested. Thank you.

Minor Comment #5:
lines 141-143: needs rephrasing, e.g. The simulator was developed to evaluate the performance of the eRNR and demonstrate its equivalence to a CR (as shown analytically in Methods).

Response:
We have rephrased the sentence as the reviewer suggested. Thank you.
Changes in the manuscript: Page 6, Line 141: Moreover, a simulator was developed to evaluate the performance of the eRNR under different configurations and demonstrate its equivalence to a CR (as proved analytically in Methods). The first simulation was designed to confirm the consistency between RNR and CR and emphasize the role of rotation in RNR. The key network characteristics under different parameters, nonlinearity and rotation directions were investigated.
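As a numerical illustration of the CR-RNR equivalence referred to here (sketched with a tanh nonlinearity and a simple self-feedback form, which are assumptions standing in for the paper's diode-based dynamic neuron and its exact rotor wiring):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, r = 8, 40, 0.9
w_in = rng.uniform(-1, 1, N)       # input weight mask
u = rng.uniform(0, 0.5, T)
f = np.tanh                         # stand-in pointwise nonlinearity

# Cyclic reservoir: ring topology, neuron i driven by neuron i-1.
x, xs = np.zeros(N), []
for k in range(T):
    x = f(r * np.roll(x, 1) + w_in * u[k])
    xs.append(x.copy())

# Rotating-neuron view: every neuron feeds back only on itself, and the
# input mask is rotated by one position at each step instead.
z, zs = np.zeros(N), []
for k in range(T):
    z = f(r * z + np.roll(w_in, -(k + 1)) * u[k])
    zs.append(z.copy())

# Trajectories agree up to a per-step rotation of the neuron labels.
assert all(np.allclose(xs[k], np.roll(zs[k], k + 1)) for k in range(T))
```

Because a rotation commutes with any pointwise nonlinearity, the ring wiring of the CR can be traded for stationary self-feedback neurons plus a rotating input mask, which is the essence of the mapping proved in the Methods.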

Response:
Thank you very much for the comment. As the reviewer correctly pointed out, NARMA is not a chaotic system since the output is insensitive to the initial condition; the output depends only on a certain number of previous inputs, as defined by the equation. We have removed 'chaotic' from the sentence.
Changes in the manuscript: Page 7, Line 170: the eRNR should be able to approximate a nonlinear system.

Response:
Thank you very much for pointing out the typo. We have corrected it in the revision.

Response:
We have corrected it as the reviewer suggested. Thank you.

Minor Comment #9:
lines 192 -194: this strong statement needs to be backed up with references and discussed in the Discussion (as per my comment above).

Response:
We thank the reviewer for raising this good point. We have carefully checked related literature. For the NARMA10 system, the article 10 that proposed delay-based reservoir computing mentioned that their result (NRMSE=0.15) was the best using hardwarebased model. Meanwhile, it has also been found that an earlier article 1 using software echo state network can reach NMSE = 0.0098 (equivalent to NRMSE of 0.099). Here we have now cited these two papers to support our statement in the revision.

Changes in the manuscript:
Reference cited at Page 8, Line 196: To the best of our knowledge, the NRMSE values for both the single eRNR (0.078) and the parallel eRNR (0.055) are the lowest compared with previous studies 1,10 in the field of RC.

Response:
Thank you for your comment. We agree that "nonlinearity" is a better word to describe the reason for the record-low NRMSE value achieved in this work.
Changes in the manuscript: Page 8, Line 198: The reason is that the exponential type of nonlinearity provided by the transition region of the diode (different from the ideal ON/OFF of the software ReLU function) enhances the state representation of the NARMA10 system.

Minor Comment #11:
line 207: remove "it has been studied that"

Response:
We have removed "it has been studied that" as the reviewer suggested. Thank you.

Minor Comment #12:
lines 238-239: cite references to back up this statement

Response:
We thank the reviewer for this comment. We have now cited three papers using spintronic and memristive devices.
Changes in the manuscript:
Reference cited at Page 10, Line 247: In the literature, the previously reported reservoir computing demonstrations can reach rather low power consumption for certain parts of the system using novel devices and materials 2,8,9.

Minor Comment #13:
line 361: remove "or backpropagation"

Response:
Thank you. We have removed the word 'backpropagation' as the reviewer suggested.
Minor Comment #14: line 378: remove "which is of great interest"

Response:
We have corrected it as the reviewer suggested. Thank you.

Response:
We have corrected them as the reviewer suggested. Thank you.

Minor Comment #16:
line 593: Fig. 1 caption, possibly try placing the legend at the bottom because it applies to each fig

Response:
We thank the reviewer for this good suggestion. We have corrected the figure caption.
Changes in the manuscript: the Fig. 1 caption has been revised accordingly.

Minor Comment #17:
line 616: Fig. 2b caption, mention a(k) is neuron output at k-th step

Response:
We have mentioned that a(k) is the neuron output at the k-th step, as the reviewer suggested. Thank you.
Changes in the manuscript: Page 31, Line 686: b, A general schematic of the dynamic properties required for the neuron in an RNR. On the arrival of a neuron input R_k W_in u(k) that has been processed by the pre-neuron rotor and input weights, the neuron performs the nonlinear transform f, integration (feedback line), and leakage (decay factor d) operations on the signal. a(k) is the neuron output at the k-th step.
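A behavioural sketch of the three operations named in this caption (nonlinear transform f, integration via the feedback line, leakage via decay factor d). The leaky-integrator form and the tanh are illustrative assumptions; the hardware neuron's actual diode-based transfer function is given in the paper's Methods.

```python
import numpy as np

def dynamic_neuron(inputs, d=0.8, f=np.tanh):
    """a(k) = d * a(k-1) + f(input(k)): nonlinear transform of the
    incoming signal, integrated on the feedback line with decay d."""
    a, out = 0.0, []
    for s in inputs:
        a = d * a + f(s)
        out.append(a)
    return np.array(out)

resp = dynamic_neuron([1.0, 0.0, 0.0, 0.0])
# after a single impulse the stored state decays geometrically with d
```

The decay factor d is what gives each neuron its fading memory of past inputs, which the rotor then distributes around the ring.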

Response:
We have clarified it as the reviewer suggested. Thank you.
Changes in the manuscript: Page 33, Line 696: Figure 3. Simulation results on network characteristics of eRNR and its performance in time series prediction.

Response:
We have clarified it as the reviewer suggested. Thank you.

Reviewer #2
General comments: Report on "Rotating neurons for all-analog implementation of cyclic reservoir computing". In this work, the authors realize an analog electronic implementation of cyclic reservoirs.
On the plus side: -cyclic reservoirs have to my knowledge not been implemented before (while reservoirs with delay lines, which are very similar, have been extensively studied).
-the method to implement cyclic reservoirs, using a rotating logic element, is novel and elegant.
-the system is fully analog, including an analog output layer. An interface with a tactile screen is presented.
-An estimate of the energy consumption of an integrated system using CMOS technology is presented.
-Performance on tasks is good.
-The paper is mostly well written and easy to follow.

Response:
We are grateful to the reviewer for the thoughtful comments and for recognizing the merits of this work. In the following, we address your detailed comments point by point and have revised the manuscript accordingly. We hope the revised version and our responses fully address the reviewer's concerns.

Comment #1:
The trick of using a rotating logic element is a trick, not an important conceptual advance.

Response:
We thank the reviewer for this comment. The community of neuromorphic computing has been actively pursuing resource-efficient hardware to implement brain-like learning algorithms. The major conceptual advance of our work is that we have proposed a novel rotation-based hardware architecture to implement reservoir computing.
We have also proved excellent network-level consistency (see Fundamentals of RNR, Methods) between the software algorithm (CR) and a physically rotating object, and therefore we can empower rotating objects with computing capabilities. The eRNR system composed of logic elements is just a prototype example to demonstrate the concept of RNR, which can be readily extended to other rotating objects. In our opinion, the conceptual advance of RNR could provide an advantageous paradigm for exploring computing resources from physics. Meanwhile, it is also a practical solution for engineering a resource-efficient processor at the edge, which could find appealing applications in near-sensor or in-sensor computing, where the all-analog processor can act as a direct interface for a sensory signal without any intermediary modules.
With the above conceptual advances, we believe our work could make a valuable contribution to the field of neuromorphic computing.
Changes in the manuscript: Page 14, Line 342: In summary, we have developed a novel RNR architecture for all-analog neuromorphic computing for the first time, which represents a fundamentally different reservoir architecture compared to conventional hardware implementations. The proposed RNR has been validated in theory, simulation, and experiment. The theoretical analysis of RNR rigorously mapped the CR algorithm onto the physical rotation of a dynamic neuron array, laying a solid foundation for the subsequent hardware implementation. Such an RNR can be embedded into the natural rotating components of various electronic or mechanical systems, or even nanorobotics, and empower them with computing ability.

Comment #2:
The cyclic reservoir is only a minor variation on delay reservoirs, which have been much studied.

Response:
We thank the reviewer for this comment. In our opinion, the cyclic reservoir, which is equivalent to the proposed rotating neuron reservoir (RNR) in this work, is fundamentally different from delay-based reservoirs. In fact, delay-based reservoir computing proposed in 2011 10 was also inspired by the cyclic reservoir 11 (see the Supplementary Information of Appeltant et al., Nature Communications, 2011 10) and has since gained much attention because it is friendly to hardware implementations. The relations between classical random RC, cyclic RC, delay-based RC and RNR are illustrated in Fig. R1.
Figure R1. Relations between classical random RC, cyclic RC, delay-based RC and RNR.
As we can see from the comparison in Table R1, delay-based RC should be considered in parallel with RNR since they represent completely different implementation paradigms, while both are inspired by the cyclic reservoir, which is a simplified version of the classical reservoir. Although delay-based reservoir computing has been well studied, its weaknesses hinder its further development, as discussed in the Introduction (Page 3, Lines 57-69).
The highly cited review article on physical reservoir computing 12 by Tanaka et al. (2019) also pointed out that implementing a delayed feedback loop is not a straightforward task. A novel resource-efficient implementation is still of great interest in the fields of reservoir computing and neuromorphic computing, which is what we have demonstrated in this work with the RNR.
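For readers comparing the two families, the cyclic reservoir at the root of both approaches reduces the recurrent weight matrix to a single ring weight r (the simple-cycle topology; the exact reservoir cited as reference 11 here is assumed to follow this form). A minimal sketch:

```python
import numpy as np

def cyclic_weights(n, r=0.9):
    """Ring reservoir matrix: neuron i is driven only by neuron i-1,
    all with the same weight r (no random connectivity to tune)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = r
    return W

W = cyclic_weights(8)
# every eigenvalue of a scaled cyclic permutation has modulus exactly r,
# so the echo-state spectral radius is set by the single parameter r
print(np.max(np.abs(np.linalg.eigvals(W))))  # ≈ 0.9
```

This one-parameter structure is what makes the CR attractive for direct physical realization, in contrast to the time-multiplexed virtual nodes of a delay reservoir.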

Comment #3:
A convincing argument for using cyclic reservoirs with rotating elements, rather than delay reservoirs, is not presented.

Response:
Thanks for the comment. In the Introduction, we have already discussed the major drawbacks associated with the use of delayed feedback in delay-based reservoirs (Page 3, Lines 57-69), which motivated us to propose a novel resource-efficient implementation, the cyclic reservoir (CR) with rotating elements. In the revision, we now make more comparisons between the proposed RNR and existing reservoir implementations including the delay-based one (see Line 75-77, Line 343-344, Line 330-332, Line 365-366, Supplementary Note 3). The main advantages of using RNR can be summarized as follows:
- Explainability. In the Methods section, we have proven the equivalence between the software CR and the rotation-based framework, which is also endorsed by the task-independent property analysis in Fig. 3. Such explainability, which is currently lacking for delay-based reservoirs, would benefit future developments and optimizations. In a delay-based reservoir, a nonlinear node provides the nonlinear function while its dynamic property creates connections between neighboring virtual nodes. Additionally, one or more external delayed feedback lines merge the current input and the previous state matrix to provide memory capacity. Appeltant et al. (2011) also mentioned in their supplementary information that the working principle is different from that of traditional reservoir computing. Therefore, a delay-based dynamical system can act like a reservoir, but it is less likely to be explained by a standard software reservoir computing model.
- Hardware implementation. The advantage of RNR in hardware implementation has been discussed in Line 366, Line 424 (added), and Supplementary Note 3 (added). In our method, a physical reservoir computer can be implemented by rotating elements. The electrical RNR can be achieved by low-cost logic elements without other critical components, such as analog-to-digital converters (ADCs) and memory units. For delay-based reservoirs, the delayed feedback line plays a crucial role in generating memory capacity. However, the electrical implementation of delayed feedback usually requires an ADC, a digital-to-analog converter (DAC) and memory, as presented in 10. These components are usually not desirable in neuromorphic computing since they (1) require additional space and cost; (2) consume more power, especially at high processing speed or large network size; (3) require an additional control unit for reading, erasing and writing the memory as well as for the ADC/DAC, which increases the system complexity. More comprehensive discussions about these digital memory-related constraints in neuromorphic computing can be found in the literature 9,13,14.
- Parallel computing. RNR is a highly parallel architecture (see Fig. 2 and Supplementary Videos 1 and 2). In comparison, for a delay-based reservoir, the use of time-multiplexing leads to serial operation at both input and readout; therefore, only one state value can be obtained at each time step. The benefits of parallel computing can be found in the literature 12-14.
- Power consumption. Because of the simplicity of eRNR, it exhibits excellent power efficiency. According to the implementation of eRNR (Fig. 2), all the components that consume power are discussed (see Methods). The comparison with delay-based reservoirs and other implementations is listed in Supplementary Table 1. To the best of our knowledge, our implementation shows record-low power consumption for the entire system, which highlights the power advantage of using eRNR.
Of course, we have to emphasize that the delay-based reservoir and the cyclic reservoir are not necessarily exclusive; instead, they can be complementary to each other. For example, one can add a delayed feedback line to an RNR to further enhance the memory capacity. Such a hybrid rotation and delay reservoir is feasible and warrants future exploration.
Here we summarize the above comparison in Table R2:

Comment #4:
The analog output layer has been presented in previous work.

Response:
Thanks for the comment. As the reviewer pointed out, memristor crossbars have been presented in previous works to implement the analog fully connected output layer for reservoir computing. In fact, memristors have been widely used to implement fully connected layers in artificial neural networks by taking advantage of their computing-in-memory capability 19-22. Similarly, memristor arrays have also been used in recently reported reservoir computing demonstrations to implement the final output layer 23,24. However, we shall clarify that this is not the main focus and key novelty of our work, which instead aims to demonstrate novel rotation-based reservoir computing. The memristor crossbar-based analog output layer used in our eRNR system rather serves the purpose of demonstrating all-analog neuromorphic computing. To clarify this point, we have added a statement at Line 499.
Changes in the manuscript: Page 21, Line 499: Memristor-based analog computing shows great potential in neuromorphic computing. While the input and reservoir layers have been achieved by the eRNR design, the output layer, which employs standard vector-matrix multiplication operations, can be effectively implemented by a memristor array for end-to-end all-analog computing. The description of the memristor array can be found in Supplementary Fig. 2.

Technical Comment #1:
The prototype system is not presented in detail. Please provide enough details for another researcher to be able to reproduce the experiment.

Response:
We thank the reviewer for this thoughtful comment. The details about the prototype system have now been added to Supplementary Note 2.
Changes in the manuscript: Supplementary Note 2, Line 110: The schematic of the eRNR is shown in Fig. 2. The network size of our prototype is N = 8 and M = 8, which means each single eRNR consists of 8 neurons and there are 8 parallel eRNRs. Both pre- and post-neuron rotors were implemented by eight CD4051B chips, 8-channel analog multiplexers from Texas Instruments. The three signal selection ports were connected to a 3-bit binary counter consisting of a 4-bit counter (74LS161) and an inverter (74HC04). The input mask was implemented by 8 switches to select positive or negative signals. In order to improve the state richness, each eRNR circuit should use a different input mask configuration.
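The rotor addressing described above can be sketched behaviourally: the 3-bit counter drives the multiplexer select lines so that the neuron-to-channel mapping advances by one position per clock tick. The specific channel assignment below is an assumption for illustration; the actual wiring is given in Supplementary Note 2.

```python
N = 8  # 8 neurons, 8-channel multiplexers, 3-bit counter

def rotor_channel(neuron, tick):
    """Channel connected to `neuron` after `tick` counter increments
    (assumed forward rotation by one channel per tick)."""
    return (neuron + tick) % N

# after a full 8-tick revolution every neuron is back on its own channel
assert all(rotor_channel(i, N) == i for i in range(N))
```

Because the counter wraps modulo 8, the rotation is a permutation at every step and returns to the identity once per revolution, matching the cyclic structure of the reservoir.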

Technical Comment #2:
The paper is not very clear (particularly in the abstract and introduction) on what is experimental, what is simulation, and what is theoretical calculation.

Response:
Thank you very much for the comment. We have now clarified how the results were obtained in the revised manuscript: (1) the equivalence between software and hardware was proven mathematically; (2) the record-low performance on NARMA10 was obtained in simulation using the eRNR model; (3) the Mackey-Glass chaotic signal was tested in hardware experiments; (4) handwritten vowel recognition was demonstrated in hardware experiments; (5) the power consumption was estimated using a 65 nm CMOS design in Cadence.
Changes in the manuscript:
Page 2, Line 26: The equivalence between the rotating neuron reservoir and the standard cyclic reservoir algorithm is mathematically proved. Simulation shows the rotating neuron reservoir achieved record-low errors in the time-series prediction benchmark.
Page 4, Line 79: Furthermore, a prototype of the eRNR composed of eight parallel reservoir circuits was built to demonstrate analog near-sensor computing, where Mackey-Glass time series prediction and real-time handwriting recognition were successfully performed in hardware experiments.
Page 4, Line 86: Finally, the CMOS circuit simulation based on standard 65 nm technology indicated that the eRNR system consumed as little as 32.7 µW for the handwriting recognition task, which was more than three orders of magnitude lower than literature-reported reservoir systems.
Page 33, Line 697: Figure 3. Simulation results on network characteristics of eRNR and its performance in time series prediction.

Technical Comment #3:
If the authors claim that, in the context of CMOS implementations, their rotating architecture is more efficient than other architectures. They should present evidence. For instance it is not clear why a delay system with a single nonlinear node (Ref 6 Appeltant et al) could not be as energy efficient.

Response:
We thank the reviewer for this comment. The high efficiency of the RNR implementation can be explained as follows: 1. In the proposed rotating neuron architecture, the CR algorithm closely matches the hardware behavior. This excellent consistency frees the system from using an extra control unit, ADC and memory, which significantly reduces system complexity and power consumption. Meanwhile, the rotation implemented by logic elements consumes extremely low power (on the pW scale). In the delay-based reservoir presented by Appeltant et al. 10, a continuous analog signal needs to be delayed for a certain time length. In their experiment, the delay line was implemented by a PC-controlled NI-6025E with 12-bit AD/DA from National Instruments, which alone consumes >5 W according to its datasheet, much more than the logic elements for rotation. For a more practical comparison, we can roughly estimate the power needed to implement a typical delay line using CMOS components: an 8-bit ADC (ADC1175-50), an 8-bit DAC (DAC084S085) and an 8-bit static random-access memory (SRAM). As their datasheets show, their static power is on the milliwatt scale (ADC: 5 mW, SRAM: ~6.6 mW, DAC: ~1.1 mW). In practical implementations, this delay line must be governed by a controller or logic circuit that manages the timing and digital addressing. Extra modulation units are also essential in both the input and output layers for the serial time-multiplexing operation and for summing the input and feedback signals. These components would consume considerable additional power. Besides the ADC-SRAM-DAC solution, there is an analog CMOS-based approach to delaying a signal, the bucket-brigade delay line invented by Philips Research Labs 25,26, which was discontinued last century. Its datasheet (MN3004) indicates a power consumption of 165 mW.
These values indicate that, even if we only consider the static power of the delay line, the delay-based approach is significantly more power-hungry than eRNR. 2. From a more fundamental perspective, the different mechanisms for introducing memory in the delay-based and rotation-based methods determine their potential power efficiency. In the delay-based approach, the memory is separated from the processor: although the processing is carried out in a nonlinear dynamic node, the memory is mainly provided by the delay unit, which is constrained by the limitations of conventional digital computing, such as power consumption, throughput and latency 13. In the rotation-based method, the memory is provided by the rotating dynamic node itself (see Fig. 3a and Methods). Implementing the logic switches for rotation is much more resource-efficient than delay lines in terms of power and cost. Meanwhile, the rotating dynamic node processes the signal and retains previous information simultaneously. Such an in-memory computing paradigm is advantageous for low-power computing. These fundamental differences result in the higher power efficiency of the rotation-based implementation in this work. 3. In addition, as mentioned in our earlier response, comparisons with the delay-based reservoir and other implementations are listed in Supplementary Table 1.
To the best of our knowledge, our implementation shows record-low power consumption for the entire system, which highlights the power advantage of eRNR. Changes in the manuscript: Page 14, Line 338: More discussion and comparison of the power efficiency of eRNR can be found in Supplementary Note 3. Page 15, Line 366: Also, further power analysis suggests that the static power, mainly dissipated by the dynamic neurons, dominates the system for processing rates below 100 kHz, while the overall system power remains low for higher processing rates (>100 kHz) (see Supplementary Table 1). This result can be explained by the fact that most computations occur in the analog domain and thus contribute only to static power, in line with the advantages of analog neuromorphic computing. The dynamic power, mainly attributed to the logic switches and memristor array, starts to dominate the system for processing rates higher than 100 kHz (see Supplementary Table 2). Further discussion of the low-power advantage of eRNR can be found in Supplementary Note 3. Page 18, Line 424: It implies that a rotating object with dynamic neurons can act as a reservoir computer without extra control units, ADC or memory, which remarkably reduces system complexity and power consumption compared with conventional hardware implementations (see Supplementary Note 3).
Supplementary Note 3. Why can eRNR be more power efficient? From a fundamental perspective, the different mechanisms for introducing memory in the rotation-based architecture and other architectures largely determine their power efficiency. In the rotation-based architecture, the memory is provided by the rotating dynamic node itself (see Fig. 3a and Methods). The excellent consistency between the rotation behavior and the software algorithm frees the system from using extra control units, ADC and memory, which remarkably reduces system complexity and power consumption. Also, implementing the logic switches for rotation using CMOS-based transmission gates is resource-efficient. Meanwhile, the rotating dynamic node processes signals and retains previous information simultaneously. Such an in-memory computing paradigm is advantageous for low-power computing. In other architectures, such as the well-studied delay-based one, the memory is separated from the processor. Although carrying out the processing in a nonlinear dynamic node was significant progress, the memory is mainly provided by the delay unit, which is constrained by the limitations of conventional digital computing, such as power consumption, throughput and latency 13. These fundamental differences result in the better power efficiency of the rotation-based architecture.
Compared with classic random reservoir computing, the key difference of the cyclic reservoir is the connectivity of the reservoir layer defined by Wres. The Wres of a random reservoir is a randomly generated matrix with a proper spectral radius, while its cyclic counterpart is a shifted identity matrix, which can be implemented in a more deterministic manner without performance degradation 11. In this work, it has been proven that the cyclic Wres can be equivalent to a physical rotor (see Methods), while an effective physical counterpart of the random Wres is yet to be found, which remains an exciting challenge for future studies.
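To make the contrast concrete, here is a minimal NumPy sketch (our illustration, not the authors' code) of the two Wres constructions discussed above: the cyclic shifted identity versus a random matrix rescaled to a target spectral radius. The function names and the value r = 0.75 are illustrative assumptions.

```python
import numpy as np

def cyclic_wres(n, r=0.75):
    """Cyclic reservoir weights: a shifted identity matrix scaled by r.
    Neuron i feeds only neuron (i + 1) mod n, forming a single ring."""
    return np.roll(np.eye(n), 1, axis=0) * r

def random_wres(n, spectral_radius=0.9, seed=0):
    """Classic random reservoir weights, rescaled to a target spectral radius."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n, n))
    return w * spectral_radius / max(abs(np.linalg.eigvals(w)))

# The cyclic matrix is deterministic: its spectral radius is exactly r,
# and every row and column has a single nonzero entry.
w = cyclic_wres(8, r=0.75)
assert np.isclose(max(abs(np.linalg.eigvals(w))), 0.75)
assert np.count_nonzero(w) == 8
```

The single-nonzero-per-row structure is exactly what lets a rotor implement Wres by rerouting signals instead of performing a full matrix multiply.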

Minor Comment #1:
I found it quite confusing to use the acronym RC for reservoir computer, and CR for cyclic reservoir computer. Initially while reading the paper, I thought CR was a typo.

Response:
Thanks for the comment. To avoid any confusion, we have changed every instance of the acronym 'RC' back to 'reservoir computing' in the revision, while keeping the acronym 'CR' for 'cyclic reservoir'.

Changes in the manuscript:
All instances of the acronym 'RC' were changed to 'reservoir computing'.

Minor Comment #2:
"Through our noise-aware training method, the conductance variation of memristor array was accommodated and a high classification accuracy of 94.0% was achieved." Imprecise. Classification of what?

Response:
Thanks for the comment. The accuracy of 94.0% was achieved in the handwritten vowel recognition task, which has been clarified in the revision.
Changes in the manuscript: Page 4, Line 83: Through our noise-aware training method, the conductance variation of memristor array was accommodated and a high classification accuracy of 94.0% was achieved in the handwriting vowel recognition task.

Minor Comment #3:
"Finally, the system benchmark indicated that the eRNR system consumed as low as 32.7 microW for the handwriting recognition task, which was more than three orders of magnitudes lower than literature-reported reservoir systems." Isn't this a theoretical estimate. The sentence suggests it is an experimental result.

Response:
We thank the reviewer for this comment. Yes, the power consumption is a theoretical estimate using the circuit design with standard 65nm CMOS technology. We have clarified it in the text.
Changes in the manuscript: Page 4, Line 86: Finally, the CMOS circuit simulation based on standard 65 nm technology indicated that the eRNR system consumed as low as 32.7 µW for the handwriting recognition task, which was more than three orders of magnitude lower than literature-reported reservoir systems.

Minor Comment #4:
Performance benchmark of eRNR. « the eRNR should be able to approximate a nonlinear chaotic system, for which NARMA is a widely recognized benchmark task to test RC performance." NARMA is a Nonlinear Auto Regressive Moving Average system, not a nonlinear chaotic system.

Response:
Thanks for pointing this out. NARMA is indeed not a chaotic system, since its output is insensitive to initial conditions; the output depends only on a fixed number of previous inputs, as defined by the equation. We have therefore removed the word 'chaotic' from the sentence.
Changes in the manuscript: Page 7, Line 170: the eRNR should be able to approximate a nonlinear system.

Minor Comment #5:
"It is worth mentioning that the biologically realistic time constant values ( … ) were used throughout our hardware implementation and simulation so as to interact with the environment in biological time scales." Unclear. What is biological? Why is slow good? Is it the electronic neurons that should imitate biological neurons? Or the task (handwriting recognition) that must be on the time scale of humans?

Response:
We thank the reviewer for this comment. The reviewer is correct that, for the specific task of handwriting recognition, the electronics should preferably operate on a human time scale. In fact, the 'biologically realistic time constant' originated from the literature 13 (Indiveri, G. et al., Proc. IEEE, 2015). It has been discussed that, in neuromorphic processors, electronics directly interacting with the environment and natural signals could exhibit a much longer time constant (e.g., >millisecond scale) than a conventional digital processor (e.g., <nanosecond scale). Similarly, if we designed the neurons with a short time constant (τn) in our handwriting application, the user would have to write quickly to maintain the effect of memory capacity, which is not realistic. In general, the choice of time constant when implementing reservoir computing systems depends on the specific task. To avoid confusion, we have further clarified this point in the revision.
Changes in the manuscript: Page 8, Line 186: It is worth mentioning that, in a neuromorphic computing system, the electronics directly interacting with the environment and natural signals could exhibit a much longer time constant (e.g., >millisecond scale) than those of a digital system 13.

Minor Comment #6:
"To the best of our knowledge, the NRMSE values for both single eRNR (0.078) and parallel eRNR (0.055) are the lowest compared with the previous studies in the field of RC." Maybe add some references?

Response:
We thank the reviewer for raising this important point. We have carefully checked the related literature. For the NARMA10 system, the article 10 that proposed delay-based reservoir computing stated that its result (NRMSE = 0.15) was the best using a hardware-based model. Meanwhile, an earlier article 1 using a software echo state network reached NMSE = 0.0098 (equivalent to an NRMSE of 0.099). We have now cited these two papers to support our statement in the revision.

Changes in the manuscript:
Reference cited at Page 8, Line 197: To the best of our knowledge, the NRMSE values for both single eRNR (0.078) and parallel eRNR (0.055) are the lowest compared with the previous studies 1,10 in the field of RC.
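For context, the NARMA10 benchmark and the NRMSE metric quoted above can be sketched as follows. This is the standard formulation from the literature, not the authors' exact code; the seed and sequence length are arbitrary choices.

```python
import numpy as np

def narma10(T, seed=0):
    """Generate a NARMA10 sequence. The output depends on the last 10
    inputs/outputs but is not chaotic (insensitive to initial conditions)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - 9:t + 1].sum()
                    + 1.5 * u[t] * u[t - 9]
                    + 0.1)
    return u, y

def nrmse(target, pred):
    """Normalized root-mean-square error, the metric behind 0.078/0.055."""
    return np.sqrt(np.mean((pred - target) ** 2) / np.var(target))

u, y = narma10(2000)
assert np.isfinite(y).all()
assert nrmse(y, y) == 0.0  # perfect prediction gives zero error
```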

Minor Comment #7:
Eq. 2. The Mackey Glass equation has a power at the denominator. Eq. 2 seems wrong?

Response:
Thanks for pointing out the typo in Eq. 2. The power n in the denominator was indeed missing. In addition, we suggest using the differential equation rather than the discrete approximation. We have corrected it in the revised manuscript.
Changes in the manuscript: Page 9, Line 210: The Mackey-Glass system is defined by: dy(t)/dt = b·y(t − τ)/(1 + y(t − τ)^n) − g·y(t), where the system parameters g, b, and n follow the widely used values 0.1, 0.2, and 10, respectively.
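As an illustration of the differential form with the power n restored, a simple Euler integration of the Mackey-Glass equation can be sketched as below. This is our own sketch; the delay τ = 17, step size and constant initial history are assumptions, since the letter does not specify them.

```python
import numpy as np

def mackey_glass(T, tau=17.0, g=0.1, b=0.2, n=10, dt=0.1, y0=1.2):
    """Euler integration of dy/dt = b*y(t-tau)/(1 + y(t-tau)**n) - g*y(t).
    tau = 17 (assumed here) gives the classic chaotic regime."""
    steps = int(T / dt)
    delay = int(tau / dt)
    y = np.full(steps + delay, y0)  # constant history before t = 0
    for t in range(delay, steps + delay - 1):
        yd = y[t - delay]  # the delayed term, with the power n in the denominator
        y[t + 1] = y[t] + dt * (b * yd / (1.0 + yd ** n) - g * y[t])
    return y[delay:]

series = mackey_glass(500)
assert series.min() > 0 and series.max() < 2  # stays in the usual bounded range
```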

Minor Comment #8:
Demonstration of near-sensor computing: handwriting recognition ."Using the datasets collected from the eight participants, the noise of our memristor array associated with the mapping and reading processes was taken into simulation to analyze the performance". Unclear sentence.

Response:
Thanks for pointing this out. Owing to the conductance noise of memristor devices, the weight values cannot be mapped accurately onto the memristors, which has to be taken into account in the simulation in order to find a proper training scheme. We have completely revised the sentence to avoid confusion.
Changes in the manuscript: Page 12, Line 300: The next simulation evaluates the effect of conductance noise of memristors on the classification performance in order to find a proper training scheme.

Minor Comment #9:
Demonstration of near-sensor computing: handwriting recognition. Noise-aware training. Adding noise to the training data before regression is equivalent to Ridge Regression. It would seem to me easier to use Ridge regression with an appropriate parameter. If this is indeed equivalent to Ridge Regression, then please don't introduce a new terminology for something already existing.

Response:
Thanks for your comment. As the reviewer mentioned, adding noise before regression has an effect on the weight matrix similar to that of ridge regression. In this work, by adding noise we can systematically evaluate the effect of the noise amplitude and distribution on system performance, which enables us to match the memristor behavior in simulation and experiment, as shown in Fig. 5h. In fact, the idea and terminology of noise-aware training are commonly used in the literature on memristor-based neural networks [27][28][29]. Considering that the main noise source in the all-analog architecture is the conductance noise of the memristor devices, we suggest keeping the term 'noise-aware training' in this work and adding explanations to clarify it.
Changes in the manuscript: Page 12, Line 303: In our experiment, the intrinsic noise of the memristors is the dominant noise source in the all-analog system. To achieve a high accuracy, we have adopted a noise-aware training method to obtain a more robust Wout in the presence of memristor conductance variation 27.
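The equivalence the reviewer points to, namely that additive Gaussian state noise acts in expectation like a ridge penalty with lam = T * sigma^2, can be checked numerically with a small sketch (hypothetical state matrix and dimensions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))  # hypothetical reservoir state matrix
w_true = rng.standard_normal(20)
y = X @ w_true + 0.1 * rng.standard_normal(500)

# Ridge regression: w = (X^T X + lam*I)^(-1) X^T y
sigma = 0.3
lam = X.shape[0] * sigma ** 2  # noise variance maps to the ridge penalty
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(20), X.T @ y)

# Noise-aware training: least squares averaged over noisy copies of the states.
# E[(X+N)^T (X+N)] = X^T X + T*sigma^2*I, so the two solutions converge.
XtX, Xty = np.zeros((20, 20)), np.zeros(20)
for _ in range(2000):
    Xn = X + sigma * rng.standard_normal(X.shape)
    XtX += Xn.T @ Xn
    Xty += Xn.T @ y
w_noisy = np.linalg.solve(XtX / 2000, Xty / 2000)

assert np.linalg.norm(w_noisy - w_ridge) < 0.1 * np.linalg.norm(w_ridge)
```

The practical difference, as the response argues, is that injecting noise lets one match the measured amplitude and distribution of the memristor conductance variation, which a single scalar ridge parameter cannot capture.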

Minor Comment #10:
Methods. Power estimation. I find this section unclear.
-Eq. 11 does not have any dependence on the number N of neurons, only on M.
-"The M parallel eRNRs can share one counter but the power for the other components increases." Increases with what ? Unclear -What would be the best speed at which to operate the system? 50ns per operation to be compatible with the memristor array?

Response:
We thank the reviewer for raising these important points. For the first question, the power estimation aims to analyze the power consumed by M parallel 8-neuron eRNRs in the handwriting recognition and Mackey-Glass time-series prediction tasks. The number of neurons is fixed at N = 8 in order to evaluate how the computing ability depends on M in the Mackey-Glass prediction. We have clarified this in the revision. For the second question, the M parallel eRNRs can share a common counter, so the static power consumed by the counter does not increase with M; this is why Pc is excluded from the multiplication with M in Eq. (11). Meanwhile, the dynamic power of the counter and transmission gates, and the static power of the neurons and transmission gates, are proportional to M. We have clarified this in the revision.
For the last question, the best speed at which to operate the system depends on the specific application. As discussed above, applications interacting with the natural environment in real time usually have a slow processing rate. For example, our eRNR prototype was designed for the handwriting recognition task, so both the time constants for operating the system (τr) and the neurons (τn) were chosen to be relatively slow (on the order of ~0.1 s). Meanwhile, the system speed does not need to be compatible with the memristor array.
During every τr, only a single inference is needed, since all state channels increase or decrease monotonically. So the memristor array is activated only once, for ~50 ns, during every τr to yield the output. More information about our memristor array can be found in our previous works 19,21. Furthermore, the operation speed can be much faster when the system acts as a data-processing accelerator that serves to speed up computing without interacting with the natural environment, which remains for future exploration. We have clarified this point in the revision.

-Panels c and d. Please add in caption that this is 1 step ahead prediction.
-Panel e and f. Unclear how these are computed. Please clarify.

Response:
Thank you for the comment. We have added the one-step-ahead prediction to the caption. In addition, the phase diagram was computed by plotting y(t) on the x-axis and y(t − τ) on the y-axis, with the y(t) values taken from the signals in Fig. 4c and d. The phase plot is a commonly used diagram for studying chaotic signals and visualizing the chaotic attractor.
Changes in the manuscript: Page 35, Line 720: Two episodes of one-step-ahead prediction of the Mackey-Glass time series compared with the ground truth using (c) one eRNR (NRMSE = 0.17) and (d) eight parallel eRNRs (NRMSE = 0.03). Page 35, Line 724: The phase diagram was computed by plotting the prediction and ground-truth series y(t) on the x-axis and y(t − τ) on the y-axis, respectively.
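The delay-embedding construction behind such a phase diagram can be sketched as follows. This is our illustration using a sinusoid instead of the Mackey-Glass signal, so the expected geometry is known exactly.

```python
import numpy as np

def phase_points(y, delay):
    """Return the (y(t), y(t - delay)) pairs forming the phase diagram:
    y(t) on the x-axis, y(t - delay) on the y-axis."""
    return np.column_stack([y[delay:], y[:-delay]])

# For a pure sinusoid, a quarter-period delay makes the phase plot a unit
# circle, since y(t) and y(t - T/4) form a sin/cos pair.
t = np.arange(1000)
y = np.sin(2 * np.pi * t / 100)
pts = phase_points(y, 25)  # quarter period
radii = np.hypot(pts[:, 0], pts[:, 1])
assert np.allclose(radii, 1.0, atol=1e-6)
```

For a chaotic signal such as Mackey-Glass, the same construction traces out the attractor rather than a closed curve.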

Reviewer #3
Overall Comment: In the manuscript "Rotating neurons for all-analog implementation of cyclic reservoir computing" by X. Liang, Y. Zhong, J. Tang, et al., the authors adopted a cyclic reservoir computing architecture which they show can be efficiently implemented using a rotating neuron reservoir integrated with an analog memristor array. They prove the equivalence between software simulation and hardware implementation. A proof-of-concept prototype of RNR was developed to demonstrate near-sensor computing.
The novel hardware design is tested on benchmarks, observing excellent performance in tasks such as nonlinear and Mackey-Glass chaotic time series prediction as well as in handwriting recognition.
Overall, the manuscript describes a novel physical design principle and hardware implementation that adopts a cyclic reservoir computing architecture. The works in this manuscript are systematic and well organized, representing an interesting and exciting progress towards practical RC systems for real-time signal processing in applications. I recommend publication in Nature Communications, subject to some of the technical questions be addressed properly -as follows.

Response:
Thank you very much for your positive comments on our work. We have improved the manuscript as the reviewer suggested.

Comment #1:
The proposed physical implementation relies on a simple (yet effective) cyclic reservoir structure. The underlying hypothesis seems to be that random RCs are not as easily realizable physically. In between pure random and cyclic RC, and from a more fundamental and basic perspective, can the authors discuss and comment on what would be the class of RC structures that can similarly be mapped to efficient physical designs?

Response:
We thank the reviewer for raising this important question. Between pure random and cyclic RCs, and from a more fundamental perspective, the cyclic reservoir and the RNR are an interesting pair that exhibits a high level of consistency (see Methods). Such consistency means that a complete cyclic reservoir algorithm can be fully mapped onto the physical behavior of an RNR without extra components. In neuromorphic computing, such consistency is highly favored, as it allows computing operations to be mapped onto more efficient electronics. Recent examples include: 1) using the computing-in-memory property of memristors to perform vector-matrix multiplication or multiply-accumulate operations 21,22; 2) using device nonlinearity for nonlinear calculations 10; and 3) exploring device oscillation 9, phase change 29,30 and stochastic responses as computational resources. A neuromorphic system fully utilizing these physical properties can be more resource-efficient than digital implementations. However, these physical behaviors usually can only be partially mapped onto the functionalities of a computing system, and in most cases extra components, such as ADCs, DACs, memory and controllers, are needed to run the entire system. In contrast, the proposed RNR maps the cyclic reservoir onto a rotation-based object at the architecture level, which yields an efficient physical design with unique advantages. Therefore, a class of RC structures (as well as other algorithms) can be mapped onto efficient physical designs when the algorithm is highly consistent with the physical behavior of the hardware. Regarding classic random RC (also known as the echo state network), the key difference of the cyclic reservoir is the connectivity of the reservoir layer defined by Wres. The Wres of a random reservoir is a randomly generated matrix with a proper spectral radius, while its cyclic counterpart is a shifted identity matrix, which can be implemented in a more deterministic manner without performance degradation 11.
In this work, it has been proven that the cyclic Wres can be equivalent to a physical rotor (see Methods), while an effective physical counterpart of the random Wres is yet to be found, which remains an exciting challenge for future studies. Therefore, at the current stage, random RC is less likely to be mapped to an efficient physical design in the way the RNR is. To clarify this point, we have provided detailed discussion in Supplementary Note 3 of the fundamental reasons for the power efficiency and of why random RC is not easily mapped to an efficient design. Changes in the manuscript: Page 14, Line 338: More discussion and comparison of the power efficiency of eRNR can be found in Supplementary Note 3.
From a fundamental perspective, the different mechanisms for introducing memory in the rotation-based architecture and other architectures largely determine their power efficiency. In the rotation-based architecture, the memory is provided by the rotating dynamic node itself (see Fig. 3a and Methods). The excellent consistency between the rotation behavior and the software algorithm frees the system from using extra control units, ADC and memory, which remarkably reduces system complexity and power consumption. Also, implementing the logic switches for rotation using CMOS-based transmission gates is resource-efficient. Meanwhile, the rotating dynamic node processes signals and retains previous information simultaneously. Such an in-memory computing paradigm is advantageous for low-power computing. In other architectures, such as the well-studied delay-based one, the memory is separated from the processor. Although carrying out the processing in a nonlinear dynamic node was significant progress, the memory is mainly provided by the delay unit, which is constrained by the limitations of conventional digital computing, such as power consumption, throughput and latency 13. These fundamental differences result in the better power efficiency of the rotation-based architecture.
Compared with classic random reservoir computing, the key difference of the cyclic reservoir is the connectivity of the reservoir layer defined by Wres. The Wres of a random reservoir is a randomly generated matrix with a proper spectral radius, while its cyclic counterpart is a shifted identity matrix, which can be implemented in a more deterministic manner without performance degradation 11. In this work, it has been proven that the cyclic Wres can be equivalent to a physical rotor (see Methods), while an effective physical counterpart of the random Wres is yet to be found, which remains an exciting challenge for future studies.

Comment #2:
Suitable mask matrix can enhance the nonlinear dynamics in the reservoir. What is the influence of mask matrix (or the input layer matrix) on the proposed RNR? Will the performance be enhanced if chosen a non-binary mask matrix?

Response:
Thank you for raising this important point. An earlier study actually compared a binary Win and a multilevel Win (see Fig. S2) 31. Interestingly, the results suggested no obvious difference between the two Win configurations. In response to your comment, we have also carried out additional simulations comparing the NARMA10 results of binary weights and uniformly distributed multilevel weights for the cyclic reservoir. The other parameters in Eq. (3) are the same: the standard deviation of Win is 0.5 (tuned by a scaling factor), the network size is 400, and the reservoir feedback weight is 0.75.
The results, shown in Fig. S3, are: NRMSE of 0.2116 ± 0.0146 for multilevel weights and 0.2177 ± 0.0117 for binary weights. The multilevel weights yielded almost the same result as the binary weights. In practice, for convenience of hardware implementation, a reconfigurable input layer using binary weights can easily be realized by switching between the negative and positive signal sources, as shown in Fig. 2a of the main text. This is also a key design choice allowing the system to interface directly with analog sensory signals. In comparison, multilevel weights would be much more complicated to implement, since additional memory would be required. We have clarified this important point in the revision. Changes in the manuscript: Page 5, Line 106: The implementation of the input layer using binary weights is also a key design choice allowing the system to interface directly with analog sensory signals. Win can be a matrix of randomly generated -1 and 1 entries, which has been proven as effective as multilevel weights 31. Reference added: 26 Kuriki, Y., Nakayama, J., Takano, K. & Uchida, A. Impact of input mask signals on delay-based photonic reservoir computing with semiconductor lasers. Opt. Express 26, 5777-5788 (2018).
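The two input-mask options compared above can be sketched as follows (illustrative NumPy code; the way the 0.5 standard deviation is applied is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 8  # prototype sizes quoted in the letter

# Binary mask: random +/-0.5 entries. In hardware this only requires routing
# each channel to either the positive or the negative signal source.
win_binary = 0.5 * rng.choice([-1.0, 1.0], size=(N, M))

# Multilevel alternative: uniform weights scaled to the same nominal
# standard deviation (uniform on [-1, 1] has std 1/sqrt(3)).
win_multi = rng.uniform(-1.0, 1.0, size=(N, M)) * np.sqrt(3) * 0.5

# Same shape and nominal spread; only the set of values differs.
assert set(np.unique(win_binary)) == {-0.5, 0.5}
assert np.all(np.abs(win_multi) <= np.sqrt(3) * 0.5)
```

The binary mask needs one bit (one switch position) per channel, whereas the multilevel mask would require storing an analog or multi-bit value per channel, which is the extra-memory cost noted in the response.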

Comment #3:
Could the authors discuss more about the impact of the consistency between each nonlinear node on the system performance?

Response:
We thank the reviewer for this comment. As the reviewer pointed out, the hardware consistency (or variability) of each nonlinear node could affect the system performance.
In our eRNR design, the nonlinear nodes are implemented with standard resistors, capacitors and diodes (Fig. 2c), which have negligibly small device variations. The variability problem would become more prominent when novel devices or materials (such as dynamic memristors or spintronic devices) are used in such computing methods. According to previous studies, an interesting and important conclusion is that a certain degree of device variation can benefit the performance of physical reservoir computing by enhancing the state richness 2,8. In practice, how to precisely control device variability to improve system performance remains challenging and warrants future exploration.
Changes in the manuscript: Page 19, Line 433: The dynamic nodes working in a physical reservoir may suffer from device variation, which impacts system performance. Previous studies find that a certain degree of device variation may benefit system performance by enhancing the state richness 2,8; how to precisely control device variability warrants future exploration.
We would like to thank the reviewers again for taking the time to assess our manuscript.

REVIEWERS' COMMENTS
Reviewer #1 (Remarks to the Author): I am satisfied with the responses to my comments and with the changes made to the manuscript, particularly the additional results added to Supplementary Material. I have no further comments and recommend publication.
Reviewer #2 (Remarks to the Author): Second report on Rotating neurons for all-analog implementation of cyclic reservoir computing I thank the authors for having carefully taken into account the comments of the reviewers. I understand now much better the significance of their work, and can recommend that it be published in Nature Communications. A few points should still be addressed.
First, the English is weak, with many mistakes and even a few sentences are misleading or incomprehensible. I list below an INCOMPLETE list of mistakes. The authors should have the manuscript reread by a native English speaker. Such mistakes will decrease the impact of their work, as it makes it more difficult for the reader to assess what has been achieved.
Second, numerical simulations of the CRC are described. Please specify whether realistic noise levels are taken into account in the simulators, or whether the simulators are noise free. This is particularly important in regards to the NARMA10 benchmark, which to my knowledge is very difficult to carry out in noisy systems. Personnaly I attribute the very good performance on NARMA10 to absence of noise, and to the size of the system: there seem to be 388*50 trained output weights.
Third. There is sometimes a confusion about whether power levels are estimated for the CMOS simulations, or measured for the demonstrator. For instance please specify in Supplementary Tables 1  and 2. (And it would be most interesting if you could give both numbers: for simulations and for demonstrator).
Introduction recurrent neural network with much lower training cost. Either lower training cost THAN something; or LOW training cost.
In principle, the complex dynamicS generated by the Furthermore, reservoir computing is powerful FOR processing temporal signalS owing to the recurrent connections that create the dependency between the current and the past neurons dynamicS, which is also known as (ii) in the absence of the delayed feedback line, the reservoir computing hardware suffers from the trade-off between memory capacity (MC) and state richness. Unclear sentence. Are you sure this is what you want to say?
indicated that the eRNR system WOULD consume as low as 32.7microW for the handwriting recognition task, which WOULD BE more than three orders of magnitudes lower than literature-reported reservoir systems. These results show the tremendous potential of the proposed RNR, offering a novel paradigm FOR resource-efficient reservoir computer. To better understand how the number of parallel RNRs affected the prediction, the states within 360s … Unclear Sentence.

Results
In comparison, the system performance could degrade as tau increases when predicting more steps ahead ( Supplementary Fig. 1g). Unclear sentence.
processing and feature extraction, are massively required. Unclear, particularly use of MASSIVELY system complexity and power consumption but cannot be neglected under **ARE NECESSARY IN** conventional physical RC, which remains a **AND REMAIN A KEY CHALLENGE *** key challenge for practical deployments**NO S** AS that the processor can act as a direct sensor interface for cognitive computing purposes To demonstrate **the** analog near-sensor computing, REMOVE THE and this experiment DEMONSTRATES that five different handwriting vowels Also, one important advantage of using eRNR is that its short-term memory property allows the network to retain the fading information of previous inputs in the state matrix at every time step. THIS IS GENERIC FOR RESERVOIR COMPUTING, NOT SPECIFIC TO eRNR. Maybe revise sentence.
- "Further advancement in this system involves the analog output weights stored in our memristor crossbar array." Not clear why memristors are mentioned here, as they are discussed later. Maybe remove this sentence.
- "Using the labeling, training and testing procedure introduced in the Methods section, 683 handwritings in the testing set (in total 703 handwritings) were correctly recognized." Please replace, here and throughout, "handwritings" by "handwritten vowels". Maybe better: "of the 703 handwritten vowels in the test set, 683 were correctly recognized."
- "Here the software-trained Wout was deployed onto our demonstration platform interfacing the eRNR hardware to perform real-time near-sensor handwriting recognition (see Supplementary Video 2)." Unclear sentence.
- "In the noise-aware training, a Gaussian white noise of ±0.03 was added to the normalized training state data" — Is this the standard deviation? Please be more precise.
- "and the standard deviation (target conductance-measured conductance) was about …" Unclear. Particularly "conductance-measured conductance" seems garbled.
- "system that does not need to consume energy on writing and reading binary data frequently" → "system THAT RARELY NEEDS TO CONSUME ENERGY ON WRITING AND READING BINARY DATA."
- "lower power consumption compared with previous cutting-edge reservoir computing systems **in recent years**" — REMOVE "in recent years". "… whose values are in the range …"
- "dynamic neuron array, laying a solid foundation for **the following** hardware implementation" — REMOVE "the following".
- "Furthermore, the output of dynamic neurons is **determined by both the shifted input and the previous states**" (better than "subjected to both shifted input and previous states").
- "the signal is fed into the dynamic nonlinear neurons and output a(n+1). If …" — Argument of a is a(k+1) or a(n+1). Please check.
- "which remarkably reduceS the system complexity and power consumption"
- "By observing Eq. (3), it **is suggested that**" — bad formulation. Maybe "It appears that".
- "a dynamic neuron for the proposed RNR should satisfy … could be considered as a dynamic neuron under RNR architecture by coupling the time constant between the neuron and rotors." Unclear.

I thank the authors for having carefully taken into account the comments of the reviewers. I understand now much better the significance of their work, and can recommend that it be published in Nature Communications. A few points should still be addressed.

Response:
We are grateful for the reviewer's constructive comments in the first round of revision and for recognizing the significance of our work. We have further revised our manuscript following the reviewer's suggestions. Our point-by-point responses to the comments are as follows.

Comment #1:
First, the English is weak, with many mistakes, and even a few sentences that are misleading or incomprehensible. I list below an INCOMPLETE list of mistakes. The authors should have the manuscript reread by a native English speaker. Such mistakes will decrease the impact of their work, as they make it more difficult for the reader to assess what has been achieved.

Response:
We appreciate the reviewer's careful check of the language problems. We have corrected all the mistakes according to the reviewer's comments. In addition, as suggested by Editor Dr. Iryna Omelchenko, the manuscript has been further edited by qualified native English-speaking editors from the Springer Nature Author Service (see the certificate below). We hope the revised manuscript is now acceptable for publication.

Comment #2:
Second, numerical simulations of the CRC are described. Please specify whether realistic noise levels are taken into account in the simulators, or whether the simulators are noise-free. This is particularly important with regard to the NARMA10 benchmark, which to my knowledge is very difficult to carry out in noisy systems. Personally, I attribute the very good performance on NARMA10 to the absence of noise, and to the size of the system: there seem to be 388×50 trained output weights.

Response:
Thank you for your comment. Our simulator was indeed noise-free, for the purpose of analyzing the working mechanism of the rotating neuron reservoir. We agree with the reviewer that the simulation result would be worse if noise were taken into consideration. In fact, all neuromorphic computing systems that work in the analog domain inevitably suffer from noise problems. In our study, we surprisingly found that a specific region of nonlinear transformation results in much better performance on NARMA10 system approximation, which reveals an important fact: the rich dynamics can be explored as a computing resource to enhance the approximation performance. It also demonstrates the computing potential of the proposed hardware, in line with the purpose of neuromorphic engineering, which fully exploits physical dynamics for computing.
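To make the noise discussion concrete: the robustness of an analog reservoir is often probed by perturbing the normalized state matrix before training the readout. The sketch below is illustrative only — it is not the authors' simulator, `add_state_noise` is a hypothetical helper of ours, and the sigma = 0.03 value echoes the noise-aware-training figure quoted elsewhere in this review:

```python
import numpy as np

def add_state_noise(states, sigma=0.03, rng=None):
    """Return a copy of the reservoir state matrix perturbed by zero-mean
    Gaussian noise of standard deviation sigma, mimicking analog read noise."""
    rng = np.random.default_rng() if rng is None else rng
    return states + rng.normal(0.0, sigma, size=states.shape)

# Example: fit the output weights on noisy copies of the states so the
# learned readout tolerates device-level fluctuations at inference time.
states = np.random.default_rng(0).uniform(-1.0, 1.0, size=(1000, 400))
noisy = add_state_noise(states, sigma=0.03, rng=np.random.default_rng(1))
```

Training the readout on such perturbed states is one standard way to make an analog system tolerate the noise levels the reviewer asks about.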
Besides the absence of noise, we respectfully disagree with the reviewer that the performance is attributable to the large network size. In our study, the large network size of 388×50 applies to the case of parallel RNRs, which achieved an NRMSE value of 0.055. For a single RNR with a smaller network size of 400×1, the achieved NRMSE is 0.078. Both values are record low compared with literature-reported noise-free models, as summarized in Table R1. For comparison, Appeltant et al. (2011) claimed that their result (NRMSE = 0.15) was the best one using a hardware-based model (noise-free simulation)1. Meanwhile, an earlier work2 using a software-based echo state network achieved NMSE = 0.0098 (equivalent to an NRMSE of 0.099). It has also been found that the NRMSE value cannot keep decreasing by simply expanding the network size; it converges at a certain level. In our opinion, the record performance achieved in this work should mainly be attributed to the hardware-based neuron dynamics and the proposed eRNR architecture, which is the key innovation of our work. To clarify this point and also stimulate follow-up studies in the community, we have revised the manuscript accordingly and published the source code of our eRNR simulator (see Code availability).

Fig. 3c

Changes in the manuscript: Page 9, Line 209: This result demonstrates the tremendous potential of the eRNR in high-order nonlinear system approximation due to the rich physical dynamics of electronic devices.
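For reference, the NARMA10 benchmark and the NRMSE metric compared in this response can be written down compactly. The sketch below is generic — it is not the published eRNR simulator, and the function names are ours — and it also makes explicit the NMSE-to-NRMSE conversion used in the comparison above:

```python
import numpy as np

def narma10(T, rng):
    """Generate a NARMA10 target sequence driven by i.i.d. u ~ U[0, 0.5]."""
    u = rng.uniform(0.0, 0.5, T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])  # 10-step history
                    + 1.5 * u[t] * u[t - 9]
                    + 0.1)
    return u, y

def nrmse(y_pred, y_true):
    """Normalized root-mean-square error: RMSE divided by the target std."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2) / np.var(y_true))

# NRMSE = sqrt(NMSE), so the NMSE of 0.0098 cited above corresponds
# to an NRMSE of about 0.099.
print(round(float(np.sqrt(0.0098)), 3))  # 0.099
```

Since NRMSE is the square root of NMSE, the two literature figures (NMSE = 0.0098 and NRMSE = 0.099) describe the same error level, which is what makes the cross-paper comparison in Table R1 meaningful.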

Comment #3:
Third. There is sometimes confusion about whether power levels are estimated from the CMOS simulations or measured on the demonstrator. For instance, please specify in Supplementary Tables 1 and 2. (And it would be most interesting if you could give both numbers: for the simulations and for the demonstrator.)

Response:
Thank you very much for pointing this out. The power results were estimated by simulation of the CMOS circuit using standard 65 nm technology, where an eRNR was designed and simulated. The device models and library provided by the foundry are quite accurate, so the simulation results are close to those of actual silicon chips. The previous works in Supplementary Table 1 that reported their system-level power also gave simulation results or rough estimates. We have further clarified that these are simulation results in the revised manuscript. Regarding the demonstrator implemented with discrete components, its power can be estimated by referring to the component datasheets, as shown in Table R2. Such a demonstration system only serves to prove the proper functionality and signal flows of the eRNR, and the estimation of its power consumption (dominated by the associated parts) is not meaningful.

Changes in the manuscript: Page 14, Line 333: The power estimation and simulation are described in the Methods, where the power of the eRNR was estimated by simulation of the CMOS circuit using a foundry-provided library. The result indicates that the eRNR method can reduce the system power consumption for the handwriting task and chaotic signal prediction to 32.7 µW. The simulation also suggests that the static power, mainly associated with the dynamic neurons and the leakage current of transistors, plays a dominant role when the processing rate (1/r) is lower than 100 kHz (for which the power consumption was estimated to be 79.1 µW).
Page 23, Line 549: The simulated power breakdown at different frequencies is shown in Supplementary Table 2.
Supplementary Page 15, Line 375: In the simulation of the eRNR circuit, the overall system power consumption was estimated to be as low as 32.7 µW for the handwriting tasks operating at 10 Hz (r = 0.1 s), reflecting an advantage of more than three orders of magnitude compared with the consumption reported for reservoir computing systems in the literature.
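The two quoted operating points (32.7 µW at 10 Hz, static-dominated; 79.1 µW near the 100 kHz crossover) suggest a simple first-order static-plus-dynamic power model. The sketch below is our own back-of-envelope inference for illustration — the linear model and the fitted per-step energy are assumptions, not figures reported in the manuscript:

```python
# Assumed first-order model: P(f) = P_static + E_dyn * f.
# E_dyn is inferred (not reported) by fitting the two quoted points,
# treating the 10 Hz figure as essentially all static power.

P_STATIC_UW = 32.7                       # static power in uW (quoted at 10 Hz)
E_DYN_UJ = (79.1 - P_STATIC_UW) / 1.0e5  # inferred dynamic energy per step, uJ

def power_uw(rate_hz):
    """Total power (uW) at a given processing rate (Hz) under the linear model."""
    return P_STATIC_UW + E_DYN_UJ * rate_hz

for f in (10.0, 1.0e3, 1.0e5):
    print(f"{f:>9.0f} Hz -> {power_uw(f):6.1f} uW")
```

Under this model, static power indeed dominates below ~100 kHz, consistent with the revised text: at 1 kHz the dynamic contribution adds well under 1 µW to the 32.7 µW static floor.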

Minor comments:
Abstract: "the all-analog reservoir computing system achieved 94.0% accuracy with >1000× lower power than prior works." Are you referring to the actual demonstrator you built, or to your simulations of the CMOS system? This sentence suggests the former, but the main text indicates the latter.

Response:
Here we are referring to the simulation result. We have revised this sentence as follows.
Changes in the manuscript: By integrating a memristor array as a fully-connected output layer, the all-analog reservoir computing system achieves 94.0% accuracy, while simulation shows >1000× lower system-level power than prior works.
- "reservoir achieveS record-low errors in A time-series prediction benchmark"
- "By integrating A memristor array as A fully-connected output layer"