Filming ultrafast roaming-mediated isomerization of bismuth triiodide in solution

Roaming reaction, defined as a reaction yielding products via reorientational motion in the long-range region (3 – 8 Å) of the potential, is a relatively recently proposed reaction pathway and is now regarded as a universal mechanism that can explain the unimolecular dissociation and isomerization of various molecules. The structural movements of the partially dissociated fragments originating from the frustrated bond fission at the onset of roaming, however, have been explored mostly via theoretical simulations and rarely observed experimentally. Here, we report an investigation of the structural dynamics during a roaming-mediated isomerization reaction of bismuth triiodide (BiI3) in acetonitrile solution using femtosecond time-resolved x-ray liquidography. Structural analysis of the data visualizes the atomic movements during the roaming-mediated isomerization process including the opening of the Bi-Ib-Ic angle and the closing of Ia-Bi-Ib-Ic dihedral angle, each by ~40°, as well as the shortening of the Ib···Ic distance, following the frustrated bond fission.

The experiment was conducted at the XSS beamline of PAL-XFEL (the Pohang Accelerator Laboratory X-ray Free-Electron Laser) 1,2 . The XFEL delivered x-ray pulses with 12.7 keV energy and a temporal width less than 50 fs at a repetition rate of 30 Hz to the experimental hall of the XSS beamline, and the x-ray beam was focused to a spot of ~30 μm diameter. Optical pump pulses at the wavelength of 400 nm were generated via second-harmonic generation of the 800 nm femtosecond laser pulses from a Ti:sapphire regenerative amplifier. The optical pump pulses were temporally compressed to a temporal width of ~100 fs at the sample position and spatially focused to a spot of ~200 μm diameter, yielding a laser fluence of ~1.8 mJ/mm 2 . The x-ray and the optical laser beams were overlapped at the sample position in a collinear geometry.
The sample solution of BiI3 was prepared by dissolving powder of bismuth iodide (BiI3, Aldrich, 99%) in acetonitrile (Aldrich, anhydrous, 99.8%) at a concentration of 1 mM. In addition, 0.5 mM a dye, 4-bromo-4′-(N,N-diethylamino)-azobenzene (HANCHEM, 99.9%), in acetonitrile was prepared to measure the solvent heating signals. All of the reagents were used as received without further purifications. The sample solution was circulated through a quartz capillary nozzle, which provided a free-flowing cylindrical liquid jet of 100 μm diameter, delivering a fresh sample solution for every pump-probe pair. Scattered x-rays were detected by a Rayonix MX225-HS CCD (5760 × 5760 pixels with a pixel size of 39 μm) placed ~33.7 mm behind the liquid jet, which allowed a q-space coverage up to ~7.5 Å −1 . The detector was operated in the 4 × 4 binning mode.
We performed 13 independent sets of measurements (runs) and obtained ~2,560 images per time delay on average. The timing jitter between the x-ray and optical laser pulses was corrected in real time using an Optical laser and XFEL Cross-correlator (OXC) timing feedback tool.

Data processing
The time-resolved two-dimensional (2D) difference scattering images were obtained by subtracting the laser-off images from the laser-on images. The 2D difference scattering images S5 were converted to one-dimensional (1D) azimuthally-integrated difference scattering curves, ΔSazimut(q, t), by calculating the average intensity as a function of the momentum transfer, q = (4π/λ)•sin(2θ/2) = (4π/λ)•sin[1/2•tan −1 (l/d)], where λ is the wavelength of x-ray, the 2θ is the scattering angle, l is the distance from the beam center to a given pixel and d is the sample-todetector distance (see Supplementary Fig. 2). The resultant scattering curves are contaminated by systematic noise. To get rid of the systematic noise, we applied the SANOD method 3 to ΔSazimut(q, t) and used the output ΔSazimut(q, t) for the subsequent data processing (see Supplementary. Fig. 4).
As can be seen in Supplementary Fig. 2, the raw difference scattering images are anisotropic, and therefore ΔSazimut(q, t) can be decomposed into anisotropic and isotropic components. Based on a well-established method 4,5 , the 2D difference scattering images at each time delay were decomposed into 1D anisotropic and isotropic difference scattering curves, ΔSaniso(q, t) and ΔSiso(q, t), respectively (see Supplementary Fig. 4). Briefly, the difference scattering images can be decomposed as follows iso 2 aniso is the second-order Legendre polynomial, and θq is the angle between the laser polarization axis and the momentum transfer vector. For the configuration of fs-TRXL experiment where x-ray beam and optical laser are almost collinear, which is the standard experimental geometry of fs-TRXL including that of this work, cosθq can be expressed as follows cos cos cos where ϕ is the azimuthal angle on the detector plane. Following the linear relation between P2(cos q) and the scattering intensity in equation (S1), the contributions of ΔSaniso(q, t) and ΔSiso(q, t) can be considered as the slope and intercept, respectively, of a linear function with P2(cos q) as an independent variable defined along the ϕ on the detector plane at a given θ (or q). Therefore, at each q, ΔSaniso(q, t) and ΔSiso(q, t) as well as their corresponding errors can be obtained through a linear-regression fit.
The systematic noises generate an artifact in the scattering curve with the shape similar to the scattering curve arising from the solvent density change mainly in the low-q region.
Considering that the spatial fluctuation of the liquid jet is one of the major origins of the systematic noises, we can infer that the systematic noises affect the scattering curve in an isotropic manner.
Therefore, most of the systematic noises arise from ΔSiso(q, t) and ΔSaniso(q, t) should be free from the effect of systematic noises. By taking advantage of the noise-free ΔSaniso(q, t), we filtered out S6 the noise from ΔSiso(q, t) in the following manner. First, 2D anisotropic images were inversely reconstructed from ΔSaniso(q, t). Then, by azimuthally averaging the reconstructed 2D anisotropic images, we obtained the anisotropic component of the azimuthally averaged 1D difference scattering curves, ΔSaniso,azimut(q, t) (see Supplementary Fig. 4). By subtracting this anisotropic component from ΔSazimut(q, t), noise-filtered ΔSiso(q, t) were obtained (see Supplementary Fig. 4).
These procedures were applied to each of multiple runs. After correcting the time-zero of each run and taking an average, we finally obtained ΔSiso(q, t), which were used in the subsequent analysis.
The resultant ΔSiso(q, t) were further scaled to the absolute scale corresponding to the scattering intensity of one solvent molecule by multiplying the scaling factor, which was obtained from the comparison of the scattering intensity in the high-q region (q = 3.5 ~ 6.5 Å −1 ) of ΔSiso(q, 100 ps) with that of the 100-ps scattering curve measured at ESRF, which had been already scaled to the absolute scale in our previous study 6 . In supplementary Fig. 16, the scaled ΔSiso(q, 100 ps) is shown together with the ΔSazimut(q, 100ps) measured at ESRF.

Singular value decomposition of ΔSiso(q, t) and ΔSaniso(q, t)
To examine the temporal behavior of the scattering data, we applied the singular value decomposition (SVD) analysis to both ΔSaniso(q, t) and ΔSiso(q, t) by reforming each data set into the form of nq × nt matrix, M, where nq and nt are the number of data points along the q-and taxes, respectively. Upon the SVD, each of the ΔSaniso(q, t) and ΔSiso(q, t) can be decomposed into three matrices satisfying the relationship M = U•S•V T where the superscript T means the transpose of a matrix, U is an nq × nt matrix whose column vectors are called the left singular vectors (LSVs), S is a diagonal nt × nt matrix whose elements (si, where i = 1nt) are called singular values of a matrix M, and V is an nt × nt matrix whose column vectors are called the right singular vectors (RSVs). The LSVs represent time-independent scattering curves, and RSVs represent the timedependent amplitude changes of the corresponding LSVs. The singular values represent the weights of the corresponding LSVs and RSVs. Since the singular values are ordered such that s1 ≥ s2 ≥ … ≥ sn ≥ 0, the left-hand columns of U and V have a larger contribution to the data.
From the SVD analysis of ΔSiso(q, t), four significant RSVs were obtained (see Supplementary Fig. 5). The first five RSVs of ΔSiso(q, t) of the BiI3 solution weighted by the corresponding singular values are shown in Supplementary Fig. 5a. Among those RSVs, 3 rd and S7 two RSVs are removed from ΔSiso(q, t), the ultrafast shifts of the positive and negative peaks in qspace occurring at < 500 fs disappear, as shown in Supplementary Fig. 17. Therefore, we can infer that the first two RSVs of the ΔSiso(q, t) of the BiI3 solution contain the population kinetics of reacting chemical species while RSV3 and RSV4 are associated with coherent motions of BiI3.
Thus, to extract the population kinetics, we globally fit the first two RSVs using a sum of exponential functions convoluted with a Gaussian instrumental response function (IRF). As shown in Supplementary Fig. 6a and b, a sum of four exponential functions was enough to fit those two RSVs satisfactorily, whereas a sum of three exponential functions did not. As a result, we obtained four time constants, 508 (± 13) fs, 3.11 (± 0.43) ps, 8.83 (± 1.11) ps, and 11.90 (± 1.67) ps, and the full width at half-maximum (FWHM) of the IRF was determined to be 162 (± 7) fs.
In Supplementary Fig. 3, the first two RSVs and LSVs of ΔSaniso(q, t) are shown in comparison with ΔSaniso(q, t) of a dye solution, and it can be seen that ΔSaniso(q, t) are dominated by the first two singular values. We note that, as shown in Supplementary Figs. 3a and b, the first two SVs of significant contributions obtained from ΔSaniso(q, t) of BiI3 solution are nearly identical to those obtained from ΔSaniso(q, t) of a dye solution. Since ΔSaniso(q, t) of the dye solution originates from the optical Kerr effect (OKE) of neat solvent, that is, transient alignment of solvent molecules induced by polarized optical field, the similarity of ΔSaniso(q, t)'s of the BiI3 and dye solutions indicates that the structural dynamics of BiI3, which is of our interest, are not reflected in ΔSaniso(q, t) but contained only in ΔSiso(q, t). Therefore, only ΔSiso(q, t) of the BiI3 solution was further analyzed.

Extraction of the difference scattering signal, ΔSiso'(q, t), representing the exponential kinetics
The difference scattering signal, ΔSiso(q, t), of BiI3 solution has rich features including the signals from both exponential kinetics and coherent dynamics. To simplify the analysis, we first extracted and analyzed the contribution corresponding to the exponential kinetics, termed ΔSiso'(q, t). As explained in the previous section, the first two RSVs of the ΔSiso(q, t) of the BiI3 solution contain information on the exponential kinetics. To reconstruct ΔSiso'(q, t), the 1 st and 2 nd RSVs in the V of ΔSiso(q, t) were replaced by their corresponding fit curves, and the 3 rd and 4 th SVs were filtered by setting the corresponding singular values, LSVs and RSVs to zero in the S, U and V, generating a new set of matrices, U', S', and V'. By using the relation M = U•S•V T , ΔSiso'(q, t) were S8 constructed in the form of U'•S'•V' T . These procedures are relevant only for the scattering data at early time delays, where coherent dynamics are present, and therefore were applied to the data up to the time delay of 2 ps. Since the coherent dynamics are nearly absent after 2 ps in our data, Δ Siso'(q, t) was set to be equal to the original data, ΔSiso(q, t), after 2 ps. The reconstructed signals, ΔSiso'(q, t), representing the exponential kinetics are shown in Supplementary Fig. 18. As stated in the previous section, ΔSiso(q, t) of BiI3 solution was fitted by four exponentials with time constants, 508 (± 13) fs, 3.11 (± 0.43) ps, 8.83 (± 1.11) ps, and 11.90 (± 1.67) ps, The separation of signal contributions corresponding to exponential kinetics and coherent dynamics can be expressed as follows: es gs solvent es es,eq es,eq gs solvent es es,eq es,eq gs solvent where c(t) is the fractional concentration of excited molecules, S es (q ,t), S es,eq (q), S gs (q) is the scattering signals from the time-dependent structures of the solute (or cage) in the excited-state, equilibrium structure of the solute (or cage) in the excited-state and the structure of the solute (or cage) in the ground-state, respectively, and ΔSsolvent(q, t) is the difference scattering signal arising from the temperature rise of the bulk solvent, whose detailed description is given in the following section. Therefore, the kinetics of solute and solvent as well as time-independent excited-state structures can be obtained by analyzing ΔS ' (q, t). Then, as the next step, the time-dependent snapshots of the structures of reacting molecules can be obtained by analyzing ΔSresidue(q, t) or, equivalently, the whole signal, ΔS(q, t), based on the information of kinetics and equilibrium structure obtained from ΔS'(q, t). In this work, we chose the later method.

Global fit analysis on ΔSiso'(q, t)
As mentioned in the main text, the previous TRXL study on BiI3 in solution reported an isomer (iso-BiI2-I) and dissociation fragments (BiI2· and I·) are the only detectable solute species at 100 ps 6 . Fitting ΔSiso'(q, t) at earlier time delays, for example, 1 ps, using the scattering curves of these known components of solute and solvent did not provide any satisfactory agreement, as shown in S9 Supplementary Fig. 8. Therefore, we hypothesized that an additional intermediate, "X", was involved in the time delays earlier than 100 ps.
Then, we built kinetic models including the X intermediate by assuming that three out of the four time constants obtained by fitting RSVs correspond to the kinetics of solute and the other one to the heating kinetics of bulk solvent. According to our previous result 6 , "X" should completely decay to another species before 100 ps. Also, the fs-TRXL data of the BiI3 solution shown in Fig.   2a undergoes a significant reduction of the overall amplitude over time. Thus, at least one of the reacting species should relax back to the ground-state BiI3. We built eleven kinetic model frames compatible with these conditions, as shown in Supplementary Fig. 9. These eleven kinetic model frames require three time constants, τa, τb, and τc, to be assigned, as shown in Supplementary Fig.   9. Accordingly, three time constants from the four time constants obtained from the fit of RSVs were selected and distributed to τa, τb, and τc while the remaining time constant was assigned to the solvent heating. In this way, we built a total of 264 kinetic models (24 per each kinetic model frame), each of which was tested through global fit analysis (GFA) on ΔSiso'(q, t). We note that if only three exponentials are used, then the whole kinetic framework of the solute needs to be described with only two exponentials (one is for the solvent heating). Such a kinetic framework cannot explain the experimental data well.
In GFA, ΔSiso'(q, t) was fit against theoretical scattering curves, ΔStheory'(q, t), by minimizing the reduced χ 2 value (χ 2 red), which is defined as follows where N is the total number of data points along the q-and t-axes, p is the number of fit parameters, and σi,j is the standard deviation of the difference scattering intensity at i th q of j th time delay. The χ 2 red minimization was performed using the MINUIT package written at CERN and the error analysis was performed by MINOS, a built-in algorithm in the MINUIT software 7 . ΔStheory'(q, t) was constructed by a linear combination of the scattering signals of solute and solvent as follows where the symbols are explained in the following. During the GFA process, (i) the excitation ratio, (ii) a prefactor for the exponential growth function for describing the temperature rise of the solvent, and (iii) the structural parameters for "X", which will be described in the following section, S10 were used as the free parameters for the fit. Depending on the kinetic model frame, several branching ratios are used as free parameters among eight branching ratios, which are (i) the ratio between the initial "X" and the dissociation fragments, (ii) the ratio of "X" relaxing to the iso-BiI2-I, (iii) the ratio of "X" relaxing to the ground-state BiI3, (iv) the ratio of "X" BiI3 were used as reported in the previous TRXL study on the BiI3 in solution 6 . The structure of "X" was optimized during GFA by using the structural parameters for "X" as free parameters for the fit and using the structure of the ground state BiI3 as the initial starting structure. To determine the structure of "X", we introduced total five structural parameters, one for simultaneously modifying the two Bi-I bonds (Ia-Bi and Bi-Ib), constraining the two bonds to have the same length, and the others for the Bi-Ib-Ic angle, the Ia-Bi-Ib-Ic dihedral angle, the Ia-Bi-Ib angle, and the Ib‧‧‧Ic distance.
The scattering signal of the cage, ΔScage(q, t), stems from the atomic pair distances between solute and solvent molecules and can be theoretically calculated from pair distribution functions (PDFs) obtained from the molecular dynamics (MD) simulations. Since we have already calculated the cage signal of the dissociation fragments, iso-BiI2-I, and ground-state BiI3, only the cage signal for "X" was newly calculated. The molecular structure of "X" used to calculate its cage signal was roughly determined by performing GFA by using the cage signal of iso-BiI2-I for that of "X". Then, the obtained structure was used to perform the MD simulation using the MOLDY 9 program with the same condition described in our previous study 6 and obtained PDFs from which the cage signal of "X" was calculated. During the MD simulation, the atomic charges for "X" were approximated to those of ground-state BiI3. Considering the relatively small contribution of the cage signal compared to that of the solute signal due to the heavy atoms consisting of solute molecules, the discrepancy coming from those approximations should not be significant. We calculated the cage signal of "X" using the charge values obtained from DFT calculations described in Supplementary Method 8, and the calculated cage signal does not show significant difference from that calculated with the charge values of the ground-state BiI3.
The scattering signal of solvent, ΔSsolvent(q, t), originates from the change of the scattering signal due to a temperature increase of the bulk solvent, (∂S(q)/∂T)ρ, scaled by the amount of the temperature change, ΔT(t). The temperature derivative of the scattering signal, (∂S(q)/∂T)ρ, can be obtained from a separate TRXL experiment on a dye solution as described elsewhere 10,11 . The temperature change, ΔT(t), of the bulk solvent was modeled using an exponential growth function as follows where A is a prefactor, t0 is the time-zero, τ is the time constant, H(t − t0) is the Heaviside step function. In fact, the temperature of the solvent will also respond to the transition between S12 intermediates (for example, early isomer → late isomer or early isomer → ground state) because the energy difference between the states is transferred to the solvent, increasing the solvent temperature. Considering the current sample concentration (1 mM), excitation ratio, and the relative energies of the reacting species, a relatively small amount of additional rise of the solvent temperature (~0.05 K for the best kinetic model) is expected. In the fit, however, we did not take this additional source of the temperature change into account because the fit qualities were not affected meaningfully due to the relatively small contribution of the solvent heating signal.
By putting all these components together, ΔStheory'(q, t) in GFA is constructed according to the following equations where R is the number ratio of the solvent molecules to solute molecules, k is the index of the solute species, ck(t) is the fractional concentration of k th species at time delay t, Sk(q) is the soluterelated scattering curve of k th species, which consists of the scattering curves of solute and cage, Sg(q) is the scattering curve of the ground state BiI3 molecule, IRF(t) is the Gaussian IRF determined in the fitting of RSVs and ⊗ is the convolution operator. The fractional concentration of each species, ck(t), was calculated by numerically solving differential equations derived from the kinetic models.
In Supplementary Data 1, we summarize the relative χ 2 red values with respect to the minimum χ 2 red value obtained from GFA on 264 kinetic models. The kinetic model A3 gives the best quality of fit, and the detailed kinetic parameters determined based on this kinetic model are described in the main text and shown in Fig 3a. The concentration profiles of the reacting solute species and the time profile of the temperature rise are shown in Fig. 3b and Supplementary Fig. 10, respectively. The optimized structure of "X" (the early isomer) from GFA based on the kinetic model A3 is shown in Fig. 4.

Structural analysis on ΔSiso(q, t ≥ 175 fs)
Based on the kinetics and the equilibrium structures of the intermediates determined in the GFA on ΔSiso'(q, t), the coherent structural motions of the reacting molecules were analyzed by fitting S13 the ΔSiso(q, t). For the time delays later than the experimental IRF (t ≥ 175 fs), the fit was performed at each time delay independently using the following equations theory g 1 ( , 175 fs) where Nq is the number of q points, p is the number of parameters, σ(qi, t) is the standard deviation of the ΔSiso at qi of time delay t. Terms in Equation (S11) are the same as those in Equation (S10) except Sk(q, t). In contrast with the Sk(q) in Equation (S10), the time-dependent structures were taken into account in calculating the Debye equation for the solute signal of the solute-related scattering curve of k th species, Sk(q, t), which is, therefore, time-dependent here. During the fitting, only the structural parameters of "X" and the dissociating BiI3 were used as free parameters while the parameters for the kinetics of solute and solvent, ck(t) and ΔT(t), were fixed to the values determined in the GFA of ΔSiso'(q, t). For the structure of "X", five structural parameters, one for the two bond lengths, Ia-Bi and Bi-Ib, and the others for Bi-Ib-Ic angle, Ia-Bi-Ib-Ic dihedral angle, Ia-Bi-Ib angle, and Ib‧‧‧Ic distance, were used. For the dissociating BiI3, three structural parameters, one for Bi‧‧‧Ic distance, another for the two bond lengths, Ia-Bi and Bi-Ib, and the other for Ia-Bi-Ib angle. Besides, a Debye-Waller factor (DWF), 22 /2 q e  , was applied only for the atomic pairs accompanying the departing iodine atom, Ic, of the dissociating BiI3 to compensate for the effects of the wavepacket dispersion during the dissociation. Hence, the root-mean-squared distance, σ, of the DWF was used as another free parameter. Since the dissociating iodine atom has little effect on the theoretical scattering curves at time delays later than 600 fs in the measured q-range (1.0 Å -1 ≤ q ≤ 6.5 Å -1 ), the fully dissociated fragments, BiI2· and I·, in their equilibrium structures were used instead of the dissociating BiI3 for the signals at time delays later than 600 fs. An averaged set of structural parameters was obtained by fitting the signals using various initial guesses for the structural parameters at each time delay. Specifically, a random structural pool consisting of ~5000 structures for each time delay was generated within boundary conditions, which are, for the "X", conditions were chosen to search the structures between the ground state BiI3 and the final S14 equilibrium structure for each species, which is "X" and the dissociation fragments, BiI2• and I•.
Every structure in the structural pool of each time delay was used as the initial guess to fit the signal at the given time delay. During the fit, those structures were optimized and the corresponding χ 2 red was obtained. With respect to the minimum value of χred 2 , min[χ 2 red (t)], at the given time delay among the obtained χ 2 red (t), the output structures giving less than 1.00001 times the min[χ 2 red (t)] were collected and the structural parameters of those structures were averaged. By performing this process at each time delay independently, the time-dependent structural motions were obtained.

Structural analysis on ΔSiso(q, t < 175 fs)
For the time delays shorter than, or nearby, the experimental IRF (t < 175 fs), an approach slightly modified from that used for the time delays larger than the experimental IRF was used because directly fitting the difference scattering signals at the time delays shorter than, or nearby, the temporal width of IRF results in distorted structural information as explained in Supplementary Note 2.
Each of the time-dependent parameters for the structures of "X" and dissociating BiI3 and σ of the DWF was modeled by a quartic polynomial function, where a4−k is the coefficient of the polynomial function. We note that structural fit with a cubic polynomial degrades the quality of fit while using quintic polynomial results in a similar quality of fit to that with a quartic polynomial. The Debye equations for the solute signal in Sk(q, t) were calculated based on x(t) from which the molecular structures were constructed and the DWF for the dissociating BiI3 was calculated. Subsequently, the instantaneous theoretical difference scattering curves, ΔSinst(q, t), were calculated using the following equation where the terms in Equation (S14) are the same as those in Equations (S10) and (S11). Then, ΔSinst(q, t) was temporally blurred by convoluting with the Gaussian IRF, IRF(t), which was determined in the fitting of RSVs, resulting in ΔStheory(q, t < 175 fs) as shown in the following equation S15 '' theory inst ( , 175 fs) where t' is the dummy variable of time delay. Under the constraints where the polynomial functions smoothly connect the structure at 0 fs, which is the structure of the BiI3 at Frank-Condon state, and the structures at 175 fs obtained from the fitting using Equations (S10) and (S11), the coefficients of the polynomial functions for each structural parameter were optimized by minimizing χ 2 red using the following equation where N is the total number of data points along the q-and t-axes, p is the number of fit parameters, and σi,j is the standard deviation of the difference scattering intensity at i th q of j th time delay. The structural parameters obtained in this way show oscillatory features in terms of time. Such oscillations cannot be justified with the current IRF and signal-to-noise ratio even if structural parameters indeed have temporal oscillations. We suspected that the temporal oscillations of structural parameters are artifact caused by use of a polynomial function. To remove the artifact, we applied penalties to parameters that inverted the sign of the first derivative. We minimized a sum of following penalty term (P) and χ 2 red instead of χ 2 red alone.
where α, which is set to be 100, is a constant weighting factor, xi is the i-th structural parameter belonging to the subset C in which the sign of the first derivative of the parameter changes, tj is the j-th time delay, (∂xi/∂t)t = t j is the value at t = tj of the first derivative of xi with respect to t, and max[(∂xi/∂t)] is the maximum value of the first derivative of xi with respect to t. The value of this penalty term increases when the temporal profiles of structural parameters have more oscillatory features. Consequently, the addition of the penalty term smooths the resulting temporal profiles of structural parameters.
Finally, the resultant ΔStheory(q, t < 175 fs) was concatenated with ΔStheory(q, t ≥ 175 fs), giving rise to ΔStheory as shown in Fig. 2b, Supplementary Fig. 7 and Supplementary Fig. 18 where σx(t) is the standard deviation of the parameter x at time delay t, Var(a4₋k) is the variance of the coefficient a4₋k and Cov(a4₋i, a4₋j) is the covariance between the coefficients a4₋i and a4₋j. We note that, in most cases, the standard deviations obtained at t < 175 fs are relatively small compared to those of at t ≥ 175 fs because of the involvement of additional penalty term as well as the globally constrained parameters over several time delays, rather than a specific time delay, for the analysis at t < 175 fs.

Computational details of quantum and DFT calculations
The geometry optimization and subsequent harmonic vibrational frequency calculation of BiI3 (C3v) using the coupled-cluster singles and doubles with perturbative triples (CCSD(T)). The scalar relativistic effects of Bi and I were treated using the relativistic effective core potential (RECP).
The triple-ζ basis sets (aug-cc-pVTZ) were used for the valence electrons of Bi and I. This  All DFT and CCSD(T) calculations were performed using the Gaussian16 program and MS-CASPT2 with SOC calculations were performed using the Molcas8.0 program.

Ic of the early isomer
To check the possible effect of σ on the structure of the early isomer, we also analyzed the data by including DWFs on the structure of the early isomer. Since the position of partially dissociated iodine, Ic, is expected to have a distribution, we applied DWFs with a shared σ to the atomic pairs containing I(c) of the early isomer. Following the same procedures introduced in section 3 but including the σ for the early isomer as another free-parameter, we performed global fit analysis on ΔSiso'(q, t). The fitting results show that the value of σ converges to zero, giving the same kinetics and equilibrium structure shown in Fig. 4. Thus, the equilibrium structure of the early isomer has a fairly well-defined structure within our error ranges of structural parameters. To further check the effect of σ on the early time structural motions of the early isomer, we also performed structure refinement on ΔSiso(q, t) up to 600 fs, following the procedures in section 4 with including DWFs with a shared σ to the atomic pairs containing Ic of the early isomer. The fit results at each time delay provide σ values varying around 0.1 Å . This value is comparable to the error range for the structural parameters that can modulate those atomic pairs. The Bi-Ib-Ic angle, for example, has an error range of about ±3 degrees. When the Bi-Ib-Ic angle is changed by 3 degrees, the pair distances containing Ic are altered by ~0.14 Å , which covers comparably wider distance distributions to those generated based on the obtained σ.

S18
Thus, the distributions in pair distances during the roaming reaction of BiI3 might be wide, but they are in the comparable range with the structural error range of our signal. Therefore, we conclude that the effect of σ on the structural dynamics of the early isomer is negligible under the structural and temporal resolution of our experiment.
Supplementary Note 2: Simulation to check the effect of IRF on structural analysis of ΔSiso(q, t < 175 fs) Directly fitting the difference scattering signals at the time delays shorter than the temporal width of IRF results in distorted structural information, as shown in Supplementary Fig. 19. In Supplementary Fig. 19, we generated mock data of a virtual diatomic molecule whose bond length undergoes an elongation from 2.816 Å to 3.616 Å following a sum of two exponential growth functions whose time constants are 50 fs and 150 fs, respectively. The mock data of instantaneous signals, qΔSinst, are shown in Supplementary Fig. 19a and are convoluted with a Gaussian IRF whose FWHM is 162 fs to make the temporally blurred signals, qΔSconv shown in Supplementary   Fig. 19b. The qΔSconv was directly fit using the Debye equation at each time delay independently and the fit result is shown in Supplementary Fig. 19c. This mode of data analysis corresponds to the one used to fit the data of ΔSiso(q, t ≥ 175 fs). The resultant bond length, rfit(t) is shown in Supplementary Fig. 19d and is compared to the true bond length, rtrue(t), used for generating the mock data. While the general trends of the bond elongation are captured by the fit, the rfit(t) do not correctly reproduce neither the rtrue(t) nor the IRF convoluted rtrue(t). In particular, rfit(t) shows a significant discrepancy at the time delays shorter than the temporal width of the IRF. In Supplementary Fig. 19e, Bi-Ib-Ic angle Ia-Bi-Ib-Ic dihedral angle, which are obtained from fitting ΔSiso using the same analysis method used for fitting ΔSiso(q, t ≥ 175 fs), are shown as an example.
The two angles evolve in a similar fashion to those shown in Fig. 5a, but the temporal profiles in Supplementary Fig. 19e are distorted as in the case of the mock data.