Unveiling the complexity of spatiotemporal soliton molecules in real time

Observing the dynamics of 3D soliton molecules can hold great opportunities for unveiling the mechanism of molecular complexity and other nonlinear problems. In spite of this fantastic potential, real-time visualization of their dynamics occurring on femtosecond-to-picosecond time scales is still challenging, particularly when high-spatiotemporal-resolution and long-term observation are required. In this work, we observe the real-time speckle-resolved spectral-temporal dynamics of 3D soliton molecules for a long time interval using multispeckle spectral-temporal measurement technology. Diverse real-time dynamics of 3D soliton molecules are captured for the first time, including the speckle-resolved birth, spatiotemporal interaction, and internal vibration of 3D soliton molecules. Further studies show that nonlinear spatiotemporal coupling associated with a large average-chirp gradient over the speckled mode profile plays a significant role in these dynamics. These efforts may shed new light on decomposing the complexity of 3D soliton molecules, and create an analogy between 3D soliton molecules and chemical molecules.

1. There are other existing methods that are capable imaging complex optical processes in multiple domains. Notably the compressed ultrafast photography (Nature volume 516, pages74-77 (2014)). Soliton dynamics (Nature Communications 11,2059(2020) and optical chaos (Science advances 7 (3), eabc8448) have been observed. These methods should be discussed/referenced and compared with the method in the paper.
2. The spatial resolution is implemented by using a single-mode fiber as a spatial filter for the multimode fiber in which soliton is generated. Discussion about the spatial resolution should be included. Moreover, some information and plan about how to realize more spatial channels will be helpful for readers to understand the limit and advantage of this method.
3. The discussion to link soliton molecules with chemical and physical properties of real molecules is vague. Some references should be given in the introduction part. It may be helpful to give some concrete examples about how the nonlinear interaction in fiber optics can be mapped to molecule dynamics.
4. The authors did very careful characterization of the soliton molecule. However, it will be good if more discussion can be included about the soliton molecule dynamics. Such as what are the parameters and factors that lead to the observation of different soliton molecules. While some simulation results are given, a more intuitive explanation and clear physics picture can be helpful.
5. There are too many abbreviations in the manuscript, and some are difficult to follow. For example, SG in some other papers are normally used as Signal Generation, but here as Speckle Grain. RT can be Room Temperature, but here as Round Trip.
Overall, I believe the manuscript is novel, and can be published under revision.

Authors' responses to reviewers' comments
We thank the reviewers for their valuable comments and suggestions on improving our manuscript. We have carefully considered all the comments and made appropriate amendments to the manuscript. Our point-by-point responses are shown below, wherein we first list reviewers' comments in italics and then respond to them. The manuscript has also been accordingly modified, wherein the changes are highlighted in red.

Authors' point-by-point responses to reviewers' comments
Reviewer #1: The authors reported the real-time speckle-resolved spectral-temporal dynamics of 3D soliton molecules for a long time interval using multi-speckle spectral-temporal measurement technology.
They observed speckle-resolved birth, spatiotemporal interaction, and internal vibration of 3D soliton molecules. By comprehensively analyzing the observations, they found that nonlinear spatiotemporal coupling is the main origin for these dynamics, where a large average-chirp gradient over the speckled area was characterized. These findings are original, and can pave the way for further understanding the complexity of 3D soliton molecules, as well as other multidimensional nonlinear problems. The manuscript is well-organized and the results support their claims, thus I recommend it for publishing if the authors address the following concerns.
Authors' response: We thank the reviewer for the positive comments on our manuscript. Our responses to the reviewer's concerns are as follows.
2 Comment 1: The information acquired by inspecting the gradient distribution of the average chirp is interesting. Does it indicate that the speckle grains covering the area with larger average-chirp gradient have more complex evolutionary dynamics?
Authors' response: We appreciate the reviewer for the insightful comment. The simulation results show that, over the speckled mode profile, the area with a larger average-chirp gradient has more complex evolutionary dynamics, as shown in Fig. R1. As can be observed, the temporal signal in the area with a larger average-chirp gradient dramatically varies over the evolution, while it exhibits small variation in the area with a smaller average-chirp gradient.

3
To address the reviewer's concern, the corresponding discussion has been provided in the revised manuscript: The simulation results imply that, in the area with a large average-chirp gradient, e.g., the white arrow of Fig. 4a, the temporal and spectral profiles significantly vary along the gradient direction (Fig. 4c). In contrast, they remain relatively stable in the area with a small average-chirp gradient (Fig. 4d). Notably, the temporal and spectral evolutions in the area with a large averagechirp gradient dramatically change over roundtrips (Figs. 4c, Authors' response: We are grateful to the reviewer for the valuable comment. As shown in Fig. R2, the periodic intensity variation is barely observed from the evolution of the corresponding temporal waveform directly captured by the real-time oscilloscope, i.e., Fig. R2b. Corresponding temporal waveform captured by the real-time oscilloscope.

Comment 3:
The authors used a YDFA in the MUST system, will the performance of the YDFA affect the measurement?
Authors' response: We thank the reviewer for his/her careful studies on our results. To avoid the potential spectral distortion from using a YDFA in the spectroscopy measurement, we have carefully optimized the parameters of the YDFA, mainly adjusting the length of the Yb-doped gain fiber used in the YDFA to suppress the nonlinear effects, which potentially occur on the forward direction of the double-pass dispersive Fourier transformation (DFT). It is also worth noting that, the YDFA imparts limited change to the spectral shape of the amplified signal on the backward direction, which is because the pulse signal has been largely broadened after double-passing the long SMF that is highly dispersive, such that its peak power is sufficiently low and the nonlinear effects are prevented.
To clarify this concern, we tested the spectroscopy performance of the MUST measurement system employing the YDFA, and the results are shown in Fig. R3. In this case, we used a single-mode modelocked (SM ML) laser ( Fig. R3a) with a repetition rate comparable to that of the STML laser utilized in the manuscript. To examine the influence of the YDFA on the spectral shape of the amplified signali.e., the main concern for real-time spectroscopy analysis, in each measurement we fixed the net gain of the YDFA to be ~20 dB, which is comparable to that of the loss of double-passing the long SMF (about 16 dB). Two different power levels of the input signal were tested, i.e., -8 dBm (Fig. R3b) and 1 dBm ( Fig. R3c), respectively. The results show that, the optical spectrum of the amplified signal is well maintained for a low signal power level, i.e., the case of -8 dBm in Fig. R3b, while it could be slightly changed for a higher signal power level, i.e., 1 dBm in Fig. R3c. In our experiments, the power of the signal captured by the MUST measurement system is typically less than -10 dBm. As a result, measurement accuracy can be ensured, which can also be justified by other implementations of DFTs involving optical amplifiers [Nat. Photon. 7, 102-112 (2013)]. respectively. Here, the net gain is fixed to be ~20 dB for both cases. The OSA curves were recorded before the YDFA, while the MUST curves were recorded after the YDFA and double-pass SMF units, i.e., right before the photodiode (PD).
To clarify this concern, we have provided the corresponding discussion in the revised Supplementary Information: To avoid the potential spectral distortion from using a YDFA in the dispersive Fourier transformation (DFT), we have carefully optimized the parameters of the YDFA, mainly adjusting the length of the Yb-doped gain fiber used in the YDFA to suppress the nonlinear effects, which potentially occur on the forward direction. It is also worth noting that, the YDFA imparts limited change to the spectral shape of the amplified signal on the backward direction, which is because the pulse signal has been largely broadened after double-passing the long SMF that is highly dispersive, such that its peak power is sufficiently low and the nonlinear effects are prevented.
We tested the spectroscopy performance of the MUST measurement system employing the YDFA, and the results are shown in Supplementary Figure 10. In this case, a single-mode mode-locked (SM ML) laser with a repetition rate comparable to that of the STML laser utilized in the manuscript.
In each measurement, we fixed the net gain of the YDFA to be ~20 dB, which is comparable to that of the loss of double-passing the long SMF (about 16 dB). Two different power levels of the input signal were tested, i.e., -8 dBm and 1 dBm, respectively. The results show that, the optical spectrum of the amplified signal is well maintained for a low signal power level, i.e., the case of -8 dBm, while it could be slightly changed for a higher signal power level, i.e., 1 dBm. In this work, the power of the speckle-resolved signal captured by the MUST measurement system is typically less than -10 dBm, such that the accuracy of the spectroscopy measurement can be ensured. Authors' response: We thank the reviewer for his/her insightful comments. Indeed, to accelerate the computation of (3+1)-dimensional (i.e., the x, y, t and z dimensions) simulations, in the numerical simulation we used a fiber length that is shorter than that of the experimental implementation, but without loss of generality for qualitative analysis, which has also been adopted in prior works, like Ref. [Nat. Phys. 16, 565-570 (2020)] that studies the mechanism of the spatiotemporal mode-locking.
To address the reviewer's concern, here we present the simulation results (Fig. R4) by using a laser cavity the same as that of the experiment, i.e., composed of 5-m gain fiber and 4.5-m passive fiber (noted that, 7 there is also a small part of free space, about 30 cm). As can be observed, 3D soliton molecules can be successfully generated. In this case, the massive calculation of (3+1)-D light field (x,y,t;z) takes about 3 days to finish a single simulation of generating 3D soliton molecules. Although the use of generalized multimode nonlinear Schrödinger equations (GMMNLSEs) can, to some extent, reduce the computation, it is still time-consuming for the (3+1)-D simulation of complex laser dynamics over a long evolution time, especially when many transverse modes (several tens to hundreds) are included.
To speed up the simulation, in this work all the parameters of the fiber laser cavity, including gain, nonlinearity, chromatic/modal dispersion, saturable absorption, spectral/spatial filtering, etc, have been equivalently integrated into the simulation model with a shorter fiber length, i.e., 50 cm, such that it takes only about 3 hours to finish a single simulation of generating 3D soliton molecules, which largely facilitates the numerical study of complex 3D soliton molecules. Compared with the results generated from a simulation model consistent with the experiment (Fig. R4), similar dynamics of 3D soliton molecules can be generated from the equivalent model used in this work, e.g., Supplementary Figure 5.
Considering the numerical results using both consistently long fiber and equivalently short fiber (i.e., the equivalent model), the latter case can precisely predict the experiment with a much faster simulation speed, but without loss of generality, e.g.,  From the above discussion, we can conclude that the equivalent simulation model using a shorter fiber length can provide good enough qualitative analysis and a much faster simulation speed. To clarify this concern, we have provided a corresponding discussion in the revised manuscript: It is worth noting that, in contrast to a fiber length of 5 m used in the experiment, this equivalent simulation model using a shorter fiber length can provide good enough qualitative analysis and a much faster simulation Authors' response: We appreciate the reviewer for his/her careful reading of our manuscript. We have modified the corresponding description in the revised manuscript. We hope that these revisions could satisfy the reviewer's concerns and that they meet the publication requirements. Thank you very much for your attention and consideration to our paper. should be much more profound for inspiring audiences.

Soliton molecules have been undergoing lots of studies, and the authors shift interests into
In addition, concerning some technical questions, the retrieved phase seems to be not very clear or exact.
It is due to the optical detections, or the phase-retrieved method.

Authors' response:
We are grateful to the reviewer for these positive and valuable comments on improving our manuscript. According to the comments from the reviewer, the deeper understandings of what we have observed are provided as follows, i.e., the major concern from the reviewer, particularly in the aspects: (1) underlying dynamic rules of these observations and profound understandings of these interesting views of 3D soliton dynamics; (2) accuracy of the phase-retrieval method.

(1) Underlying dynamic rules of these observations and profound understandings of these interesting views of 3D soliton dynamics
To provide an intuitive picture of the underlying dynamic rules of spatiotemporal (ST) soliton molecules, we first make a closely-related analogy between the photonic and chemical concepts, as outlined in Fig.   R6, and then try to understand these observations using such a photonic-chemical analogy.

3D dual-soliton molecule
In comparison with 1D soliton molecules with pulse-to-pulse interaction resulted from modal-irrelevant Particularly, their assembling dynamics are governed by the spatiotemporal interaction involved Lett. 42, 3419 (2017)], i.e., where ; and ; are the field envelopes of the LP21a and LP31a modes, respectively; while , and , are the corresponding transverse-mode-field distributions. and are the effective mode area and second-order dispersion of the LP21a mode, respectively. is the nonlinear refractive index. Δ represents the propagation constant mismatch, and is the speed of light. To intuitively understand the mechanism of providing the binding force, we try to interpret it through an analogous 'chemical bond' generated between two distinguishing atoms (e.g., a C for LP21a and a E for LP31a 15 as indicated in Fig. R7a) by means of the Gordon-Mollenauer approach [Phys. Rev. A 78, 063817 (2008)], and the force can be expressed as the 1D gradient of the average chirp 〈 〉 (also see Methods), i.e., By combining it with the equation of ⁄ mentioned before, we have where accounts for the group velocity mismatch between LP21a and LP31a modes. and are the first-order dispersions of LP21a and LP31a modes, respectively. and 〈 〉 are the second-order dispersion and average chirp of LP31a mode. The calculated results are shown in Fig. R7b, wherein the binding force reaches its maximum during the reaction (as marked by the arrow in Fig. R7b), resulting in the assembling of molecule 'CE'. It is noteworthy that this 3D dual-soliton molecule, as a photonic counterpart of heteronuclear diatomic molecule, is only accessible in the ST mode-locking that involves diverse multimode elements.

17
For kind-II dual-soliton molecules, the pulses in a pair of degenerate modes (e.g., LP31a and LP31b as indicated in Fig. R7c) have very similar characteristics, particularly their propagation constant, and they evolve in a landscape akin to these 1D soliton molecules, while their synchronization can be further reinforced by the maximum gain principle in the ST mode-locking [Nat. Phys. 16, 565 (2020); Nat. Commun. 12, 67 (2021)]. In contrast to kind I, kind-II 3D soliton molecules resemble homonuclear diatomic molecules. Despite its similarity with these 1D counterparts, the binding mechanism driven by the modal degeneracy and the maximum gain principle still has no analogue in the scenario of singlemode mode-locking.

3D triple-soliton molecule
The generation of 3D triple-soliton molecules in the ST mode-locking is much more complicated due to the modal/spatial complexity [Nat. Commun. 10, 1638(2019]. To show this, we present typical assembling dynamics in Fig. R8a. As illustrated, in the initial state, a single 3D soliton that involves LP21a, LP12a and LP12b modes approaches a kind-II dual-soliton molecule (i.e., a homonuclear diatomic molecule involving LP31a and LP31b). As depicted in Figs. R8a,b, the production of the 3D triple-soliton molecule, as a hybrid case of kind-I and -II dual-soliton molecules, undergoes a similar binding process as that of the dual-soliton molecule (Fig. R7). Such assembling dynamics can emulate a combination reaction of generating a compound, as the invariant modal energy distribution can serve as the analogue of the law of conservation of mass. According to the mode-resolved intensity profiles shown in Fig. R8c, the 3D triplesoliton molecule can be treated as an analogous molecular formula CF2E2.
(2) Accuracy of the phase-retrieval method To address the reviewer's concern on the accuracy of the phase-retrieval method, we first present the numerical validation of the phase-retrieval method and then discuss the phase ambiguity of the optical detection.

Numerical validation of the phase-retrieval method
To address the reviewer's concern on the validation of the phase-retrieval method used in this work, we numerically create interferograms of soliton molecules with preset relative phases , as shown in Fig.   R9a, and subsequently evaluate the accuracy of the retrieved relative phases using these preset values, as shown in Fig. R9b. As can be observed, a good agreement is obtained, theoretically confirming the feasibility of the phase-retrieval method.

Phase ambiguity of the optical detection
In the experimental implementation, the limited bandwidth of the detection system ( where, is the bandwidth of the detection system, i.e., 21 GHz in this case (defined by the digitizer).
The phase ambiguity in this work is comparable to prior works (Table R1).
To further identify the feasibility of the phase evaluation, we calculate and simulate the roundtrip-evolved retrieved phase for Fig. 6. As shown in Figs. R10a,b, the numerical simulation can well reproduce varying features of , , and , e.g., the jump and plateau over the phase evolution (as indicated by the arrows). As marked by the black dashed square in Fig. R10a, a non-trivial fluctuation of is observed, also see the close-up shown in Fig. R10c.  From the above discussion, we can conclude that: (i) the phase-retrieval method is numerically validated; (ii) the bandwidth limitation of the detection system can impart an acceptable phase ambiguity for qualitative study, wherein the major feature of the phase evolution can be retrieved.
To address the reviewer's concern, we have provided a corresponding discussion in the main text of the More discussions have also been provided in the responses to Comments 3,4 of Reviewer #3.
We hope that these revisions could satisfy the reviewer's concerns and that they meet the publication requirements. Thank you very much for your attention and consideration to our paper.  (2014)). Soliton dynamics (Nature Communications 11,2059(2020) and optical chaos (Science advances 7 (3), eabc8448) have been observed. These methods should be discussed/referenced and compared with the method in the paper.

Authors' response:
We appreciate the reviewer for his/her valuable comments. 3D solitons and their molecule-fashion can exhibit complicated spatio-spectral-temporal dynamics, due to spatiotemporal coupling and nonlinear interaction. The evolution of these dynamics can last for 1000s of roundtrips/frames, corresponding to a time scale of 10's-to-100's µs, as shown in Figs. 6,7 as well as Supplementary Figures 1,3,18,22. From this perspective, multi-dimensional measurement technologies with both fast refresh rate and long recording length/time are highly desired for studying these dynamics of 3D soliton molecules. As mentioned by the reviewer, the compressed ultrafast photography (CUP) technology [Nature 516, 74-77 (2014)] is a powerful tool for spatiotemporal characterization of transient events with a femtosecond-to-picosecond temporal resolution, and significant findings have been obtained in the field of soliton dynamics [Nat. Commun. 11, 2059(2020] and optical chaos [Sci. Adv. 7, eabc8448 (2021)]. In short, the CUP technology works by encoding the transient scenes with pseudo-random pattern, which is followed by a shearing operation in the time domain using a streak camera with a fully opened aperture. The CUP technology can capture non-repetitive time-evolving events with a refresh rate of 10 11to-10 13 frames per second, which is prominent for measuring x-y-t (x, y, spatial coordinates; t, time) scenes with super-fine temporal resolution. It is also noticed that, the number of continuous recording frames of the CUP technology is from 10s to 100s, and it may prevent the recording of the long-term evolving optical events, like the formation of 3D soliton molecules in this case. To this end, the MUST technology used in this work compromises the temporal resolution to a moderate level and thus prolongs the recording length (Table R2), especially useful for studying the long-term evolving 3D soliton molecules. To address the reviewer's concern, we have cited and discussed the CUP technology in the revised manuscript: In the meantime, super-high-speed photography technologies, especially the compressed ultrafast photography (CUP) 24 , have been successfully demonstrated, and interesting findings have been obtained in the field of soliton dynamics 25 and optical chaos 26 .

Comment 2:
The spatial resolution is implemented by using a single-mode fiber as a spatial filter for the multi-mode fiber in which soliton is generated. Discussion about the spatial resolution should be included.
Moreover, some information and plan about how to realize more spatial channels will be helpful for readers to understand the limit and advantage of this method.

Authors' response:
We thank the reviewer for the insightful comments. For collecting the spectraltemporal signal from a single speckle grain, a single-mode fiber that can serve as spatial filter is useful.
To do so, we utilize three probes to collect the spectral-temporal signals of three different speckle grains, and each probe consists of a single-mode fiber and a collimator for light coupling (Fig. R11). To freely access different speckle grains, the collimator is mounted on the translation stage. It is worth noting that, each bright speckle grain is a coherent spot resulted from the constructive interference. In this sense, a spatial resolution that can resolve the speckle grains is good enough for the MUST measurement, as shown in Fig. R12.To ensure this capability, the collimator is associated with a telescope for magnifying the size of the speckle grains, and such a configuration can provide a spatial resolution of about 2.6 μm.  To demonstrate the ability of simultaneously measuring more speckle grains, we implement a MUST measurement system with six channels, and the spectral evolutions of six different speckle grains are simultaneously captured, as shown in Fig. R13. This implies that, the channel scaling is feasible by using free-space optics, if compactness compromise is not a concern. It should also be pointed out that, here the real-time DFT spectroscopy is performed in a single long-SMF unit that is associated with the OTDM 27 technology. Thus, temporal overlapping of the DFT signal will limit the number of channels, i.e., about 8 channels for the configuration used in this work. To solve this problem, parallel detection with more DFT units and digitizers is helpful. In addition, using micro-lens array and multicore dispersive fiber, as shown in Fig. R14, can potentially further increase the number of channels, but making the measurement system more complicated and expensive. Based on the above discussion, we implemented a MUST measurement system with only three channels in this study, given that resolving the spectral-temporal signal for three speckle grains is already useful for probing the spatial-spectral-temporal dynamics of the 3D soliton molecule, from the perspective of qualitative analysis. To address the reviewer's concern, we have correspondingly discussed in the revised manuscript: The laser signal is extracted by a 50:50 beam splitter. The size of the extracted laser beam is enlarged by a 5 magnification telescope, which is subsequently launched to the MUST measurement system.
The magnification telescope associated with the single-mode fiber probe can enable a spatial resolution of ~2.6 μm (Supplementary Note 3.2). In the MUST measurement system, the laser signal is split into three branches, and three speckle grains of the multimode laser beam are individually received by the three single-mode fiber probes (while a larger number of probes is also available by compromising the compactness of the measurement system, see Supplementary Note 3.3).
Two new sections have also been added to the Supplementary Information,  Authors' response: We thank the reviewer for the insightful comment. To clarify the reviewer's concern, we first discuss the influence of nonlinear interaction on 3D soliton molecular dynamics over multiple dimensions. Then, the photonic-chemical analogy is established for correlating the 3D soliton molecule to real molecules, based on which molecular-like assembling dynamics are demonstrated.
To extend the existing theoretical framework for studying 3D soliton molecules, we consider a prime pulse with field envelope ( ; ) (corresponding to transverse mode ) that is perturbed by another pulse ( ; ) in a distinguishing transverse mode . In this case, the nonlinear interaction between pulses in two distinguishing transverse modes can be described by a generalized nonlinear Schrödinger equation in the form of Through the Gordon-Mollenauer approach [Opt. Lett. 17, 1575-1577(1992Phys. Rev. A 78, 063817 (2008)], molecular-like binding force can be defined by the 1D gradient of the average chirp (also see Methods), i.e., The binding force accounting for the IM-XPM and IM-FWM effects is where accounts for the group velocity mismatch between transverse modes and . and are first-order dispersions of transverse modes and , respectively. and 〈 〉 are the second-order dispersion and average chirp of mode .
Consequently, the connection between spatiotemporal interaction in multimode fiber and molecular dynamics can be established.

Photonic-chemical analogy
Based on the above discussion, we can make a closely-related analogy between the photonic and chemical concepts, as shown in Fig. R15, wherein multimode elements (A-F) and photonic isotopes are defined.
Using such a photonic-chemical analogy, we discuss a paradigm -assembling dynamics of a heteronuclear diatomic molecule 'CE' (i.e., a 3D dual-soliton molecule made up of LP21a and LP31a modes), as shown in Fig. R16a. Three phases of the assembling dynamics are illustrated in Fig. R16b. In phase 1 (P1, i.e., dissociated atoms), the pulses in LP21a and LP31a modes propagate at different group velocities, and there is no spatiotemporal overlapping, leading to weak nonlinear-interaction-induced binding force. Thus, the pulses in LP21a and LP31a modes behave like dissociated atoms, i.e., elements a C and a E. The existence of group velocity mismatch, primarily caused by the modal and chromatic dispersion, can impose weak attraction on the atoms and thus initiate the self-organization of the 'CE' molecule.
In phase 2 (P2, i.e., binding process), the desynchronized pulses overlap in the time and space domains and thus nonlinearly interact to produce molecular-like binding force between the initially uncorrelated 33 atoms (i.e., dissociated elements a C and a E). As shown in Fig. R16c, the binding process (like reaction) starts from roundtrip ~700 and lasts for ~360 roundtrips, during which a maximum strength of = 4.3 × 10 is generated.
In phase 3 (P3, i.e., molecule formation), the equilibrium of the reaction attains is established, i.e., 〈 〉 ⁄ = 0, leading to the successful assembling of the heteronuclear diatomic molecule 'CE', whose intensity profiles on the time-intensity plane before and after the assembling are shown in Fig. R16d.
To address the reviewer's concern, we have correspondingly discussed in the revised manuscript: While experimentally observing the real-time motion of atoms and molecules is still difficult 3-5 , it can be studied by referencing the dynamics of three-dimensional (3D) soliton molecules in nonlinear optical systems, which may share many common characteristics with the dynamics of chemical molecules 6 ; while one-dimensional (1D) soliton generated through nonlinear interaction in fiber optics has been intensively studied and correlated to chemical molecular dynamics 7-9 .
A new section has also been added to the Supplementary Information, i.e., Supplementary Note 7: Underlying dynamic rules of these observations and profound understandings of these interesting views of 3D soliton dynamics.
More discussions have also been provided in the responses to Comments of Reviewer #2.

Authors' response:
We thank the reviewer for the valuable suggestions. In addition to what has been discussed in the response to Comment 3, we further discuss 3D soliton molecular dynamics with greater complexity and subsequently the parameters affecting the state of 3D soliton molecules.

3D soliton molecular dynamics with greater complexity
In our simulation, because of the modal/spatial complexity, we observe different 3D soliton molecules in a long-term assembling process (~30,000 roundtrips), as illustrated in Figs. R17a,b. Such an evolution can emulate a combination reaction of generating a compound, as the invariant modal energy distribution ( Fig. R17c) can serve as the analogue of the law of conservation of mass. During the assembling process, different soliton molecules are discovered (Fig. R17b), which can be analogous to multiple intermediates produced in a complex combination reaction. In the initial state, a pair of pulses in LP31a and LP31b modes co-propagate as a soliton molecule. Here, we classify the dual-soliton molecule comprised of degenerate modes (i.e., photonic isotopes) as the kind-II dual-soliton molecule -an analogue of homonuclear diatomic molecule (see the responses to the comments of Reviewer #2). Individual pulses in LP31a and LP31b modes are synchronized because of similar propagation constant, and also it can be further reinforced by the maximum gain extraction in the ST mode-locking [Nat. Phys. 16, 565 (2020); Nat. Commun. 12, 67 (2021)]. According to the photonic-chemical analogy described in Fig. R15, this dualsoliton molecule can be treated as an analogous molecular formula, i.e., E2. Then, a single 3D soliton that involves LP21a, LP12a and LP12b modes (indicated by the blue arrow in Fig. R17a) approaches the homonuclear diatomic molecule E2, resulting in the production of a 3D triple-soliton molecule, as a hybrid case of kind-I and -II dual-soliton molecules. This 3D triple-soliton molecule can be treated as an analogous molecular formula, i.e., CF2E2. Finally, by further reacting with a single soliton in LP21b mode, as indicated by the red arrow in Fig. R17a, a stable molecule C2F2E2 is produced. Three different 3D soliton molecules, i.e., E 2 , CF 2 E 2 , and C 2 F 2 E 2 shown as isosurface plots. c. Energy distribution of the multimode elements at roundtrips 50*20 and 1470*20.

Parameters affecting the state of 3D soliton molecules
Here, we want to stress that the configuration of the intermediates, e.g., E2 and CF2E2, are susceptible to the initial modal condition (i.e., 3D initial condition). This reminds us of the versatile spectral-temporal evolution of single-passing the parabolic multimode optical fiber by controlling the input spatial profile [Opt. Express 25, 9078 (2017)].
To address the reviewer's concern, we have provided a corresponding discussion in the main text of the A new section has been added to the Supplementary Information, i.e., Supplementary Note 7: Underlying dynamic rules of these observations and profound understandings of these interesting views of 3D soliton dynamics, wherein related discussion has been provided: Here, we address that the assemblings of dual-soliton or triple-soliton molecules are usually susceptible to the initial modal condition (i.e., 3D initial condition). This reminds us of the versatile spectral-temporal evolution of single-passing the parabolic multimode optical fiber by controlling the input spatial profile 12 .
More discussions have also been provided in the responses to Comments of Reviewer #2.