Vibronic origin of long-lived coherence in an artificial molecular light harvester

Natural and artificial light-harvesting processes have recently gained new interest. Signatures of long-lasting coherence in spectroscopic signals of biological systems have been repeatedly observed, albeit their origin is a matter of ongoing debate, as it is unclear how the loss of coherence due to interaction with the noisy environments in such systems is averted. Here we report experimental and theoretical verification of coherent exciton–vibrational (vibronic) coupling as the origin of long-lasting coherence in an artificial light harvester, a molecular J-aggregate. In this macroscopically aligned tubular system, polarization-controlled 2D spectroscopy delivers an uncongested and specific optical response as an ideal foundation for an in-depth theoretical description. We derive analytical expressions that show under which general conditions vibronic coupling leads to prolonged excited-state coherence.

in the main text). The time interval between pulses is called coherence time t 1 , waiting time t 2 , and rephasing time t 3 for the first and second, the second and third, and the third excitation pulse and the emerging signal, respectively. The Fourier transform along t 1 and t 3 leads to the absorption and detection frequencies denoted by ω 1 and ω 3 , respectively. b-e, The stimulated emission and ground state bleaching diagrams contributing to the beating signals in R31. In a-d, grey shaded waiting time periods during t 2 highlight vibronic coherences in the electronic excited states. In e, on the other hand, the vibronic system is in the electronic ground state during t 2 .
Supplementary Figure 3. Absorption spectrum of C8O3. a, Absorption spectrum with light polarized parallel to the longitudinal axis of C8O3. Experimental and theoretical results are shown as a black solid line and a black dashed line, respectively. Theoretical results were modeled by a sum of Lorentzian functions, which describe bands 1-5 of C8O3. Each Lorentzian function is shown as a colored dashed line. b, Absorption spectrum with light polarized normal to the longitudinal axis of C8O3. Note that the vertical scales in a and b are different.
Supplementary Figure 4. Long-lived beating signals in N11 and R31. a, The ratio |η e /η g | between the contributions of the vibronic and vibrational coherences to the long-lived beating signal in R31. For the experimentally estimated value of Γ 13 , marked by a vertical dashed line, the contribution of the vibronic coherence is greater than the vibrational coherence. b, The ratio Γ g3 (Γ g1 |η e − η g |) −1 between the amplitudes of the long-lived beating signals in N11 (∝ Γ −2 g1 ) and R31 (∝ Γ −1 g3 Γ −1 g1 |η e − η g |). In both a and b, we take the values of the parameters estimated from experimental results. According to Eq. (19), (Γ g1 + Γ g3 ) ≈ 215 cm −1 is the theoretical upper bound for Γ 13 . Here we take the lowest decoherence rate Γ 13 ≈ (100 fs) −1 of the inter-exciton coherence allowed by correlated fluctuations, constrained by experimentally determined population transfer rates from band 3 to bands 1 and 2. As shown in the inset, correlated fluctuations cannot explain the experimentally measured long-lived beating signal in R31. b, Simulated beating signal for R31 in the absence of the exciton relaxation, shown as a solid blue line. The correlated fluctuation model predicts the lowest decoherence rate Γ 13 ≈ (303 fs) −1 of the inter-exciton coherence when there is no exciton relaxation, even though this condition is not satisfied for C8O3. Nevertheless, the correlated fluctuation model cannot explain the experimentally measured long-lived beating signal in R31, which persist beyond t 2 ≈ 800 fs, as shown in an inset. c, Simulated beating signal for R31 in the presence of a vibrational mode with frequency ν 1 ≈ 668 cm −1 , which is quasi-resonant with the exciton energy splitting ∆Ω 31 between bands 1 and 3. Here the vibronic and vibrational coherences induce a long-lived beating signal in good agreement with the experimental results. The root-mean-square deviation (RMSD) of the experimental results and theoretical prediction in a, b and c is 0.74, 0.86 and 0.59, respectively. We note that the correlated fluctuation model can also not explain the long-lasting beating signal in N11 (not shown). The monomer, tetrachlorobenzimidacarbocyanine chromophore with two attached hydrophobic octyl groups (FEW-Chemicals, Wolfen, Germany) was dissolved in 10 −2 M NaOH solution to achieve a concentration of 10 −4 M. The solution was then stirred in the dark for several hours. Subsequently, polyvinyl alcohol (PVA) of molecular weight ∼130000 was added in 1:10 w/w ratio (monomer:PVA) to slow down the formation of aggregate bundles during the storage of dye solutions. Moreover, the adsorbed PVA chains 1 obviously prevent the reassembly of double-layered into single-layered tubes upon bundling. This effect was observed recently for another derivative 2 (C8S3) of the present dye. The individual tubular aggregates degrade in that case into single-layered tubes, which is accompanied by a dramatic change of absorption spectra. Similar effects were not observed for C8O3 when PVA is present. In particular, the aggregate solutions prepared in the described way were stable for approximately ten days when stirred continuously. Without stirring the spectral signature of the double-layered tubes retained even after 12 weeks of storage 3 . For 2D experiments, we additionally diluted the sample with 10 −2 M NaOH to obtain optical density below 0.3 at 598 nm at a path length of 200 µm.
A total sample volume of approximately 10 ml was circulated through the U-shaped wire-guided jet 4 by a peristaltic pump (Masterflex C/L) with a flow speed optimized for film stability. Solvent evaporated from the recollecting container was refilled every 4 hours during the course of 13 hour measurement.

Data acquisition
Passively stabilized 2D spectroscopy was described in detail elsewhere 5 . Briefly, a home-built non-collinear optical parametric amplifier (NOPA) seeded by 180 fs pulses at 1030 nm from PHAROS (Light Conversion Ltd) was tuned to generate ∼16 fs pulses (80 nm full width at half maximum) centered at 580 nm. The NOPA output was split into four pulses and arranged in the so-called boxcar geometry. Waiting time t 2 was controlled by a mechanical translation stage (PI), whereas coherence time t 1 was scanned by inserting a pair of fused silica wedges into the first two pulses. All four pulses were focused and overlapped in the sample. The first three generated a third order nonlinear optical response which is emitted in the photon echo phase-matched direction. This signal was heterodyned with an attenuated fourth pulse, called local oscillator (LO). The resulting interference pattern was spectrally resolved and detected by a CCD camera (PIXIS, Princeton Instruments). Most of the scatter was eliminated by the double-frequency lock-in modulation of the first two pulses 6 . The polarization of each pulse was controlled by the combination of λ/4 wave plates and wire grid polarizers (contrast ratio > 800). The accuracy of the polarization angle was estimated to be ±1 • , where the unwanted signals were typically suppressed by a factor of ∼80 for the selected polarization sequence.
To prevent degradation of the sample, the power and repetition rate of the laser were set to 200 pJ/pulse and 40 kHz, respectively. Spectral resolution of ∼35 cm −1 for the detection frequency ω 3 was determined by the grating, the number of CCD pixels and Fourier filtering of the signal during the standard analysis procedure. Coherence time was scanned from −300 fs to 384 fs in 1.5 fs steps, providing ∼43 cm −1 spectral resolution of absorption frequency ω 1 . Waiting time steps of 12 fs were sufficient to resolve oscillatory features up to 1350 cm −1 with ∼35 cm −1 resolution along ω 2 .

Polarization-controlled 2D-ES
The strength of 2D signals is determined by the scalar products of molecular transition dipole moments and pulse polarizations. To take advantage of i) the preferential orientation of the J-aggregate (from here on referred to as C8O3) along the flow direction of the jet and ii) mutually perpendicular transition dipole moments of bands 1(2) and 3 of C8O3, we designed a polarization scheme selective for interband coherences. This is similar to the case of an isotropic sample discussed both theoretically 7 and experimentally 8,9 . In the presented experiments, the polarization scheme for rephasing signals reads (90, 0, 90, 0) for beams 1-4, respectively. The first and third pulses, polarized orthogonal (90) to the jet's flow direction, interact with bands 1-3. The second pulse, polarized parallel (0) to the jet's flow direction, interacts preferentially with bands 1 and 2, due to the negligible transition dipole moment of band 3 along this direction. The polarization scheme for non-rephasing spectra reads (0, 90, 90, 0), as the ordering of the first two pulses is reversed. These polarization schemes restrict oscillatory signals induced by interband coherences to the lower cross peak in rephasing spectra (R31) and the lower diagonal peak in non-rephasing spectra (N11), as shown in Figures 2 and 3 of the main text. Non-oscillatory 2D signals were subtracted prior to Fourier transformation t 2 → ω 2 .
The polarization-controlled 2D spectra were phased to pump-probe data where pump and probe pulses were polarized in parallel. This procedure is not rigorously correct because in polarization-controlled 2D-ES the first two pulses have different polarization directions while in pump-probe the first two interactions derive from the same pump pulse which naturally means parallel interactions. In other words, the projection slice theorem is strictly speaking not valid for the experiments presented here 10 . Despite this discrepancy, one can still satisfactorily phase polarization-controlled 2D spectra to all-parallel pump-probe as shown in Supplementary Figure 1. One explanation of this is leakage of the much stronger all-parallel signals through the crossed polarizers, meaning that the all-parallel signal still dominates the non-oscillatory part of the (90,0,90,0) 2D-signal. In this work, we decided to phase polarization-controlled 2D data to parallel pump-probe data. We note that the imperfection in phasing parameters only affects the lineshapes of the real and imaginary part of ω 2 maps, but preserves their amplitude-maps in both lineshape and magnitude. Hence, the difficulties in phasing polarization-controlled 2D spectra discussed above do not affect the conclusions drawn in the main part of the main text, which were based on ω 2 amplitude-maps. To this end, we found that arbitrary and large changes of the phasing parameters do not alter ω 2 amplitude-maps shown in Figure 2 of the main text (results not presented). It is noted that sophisticated phasing techniques based on heterodyned transient grating instead of pump-probe offer a correct method to phase crossed-polarization 2D signals 11 .
Supplementary Note 2: Theory of Coherent Vibronic Coupling

A vibronic model for bands 1 and 3 of C8O3
In the following the vibronic model used to describe bands 1 and 3 of C8O3 and simulate 2D spectra is described. We consider coherent interaction of bands 1 and 3 with the intramolecular vibrational modes of frequency ν 1 ≈ 668 cm −1 , which is quasiresonant with the exciton energy splitting between bands 1 and 3. The environmental noise induced by background phonons (a phonon bath) is modeled by a Markovian quantum master equation.

1-1. Hamiltonian
The electronic Hamiltonian of C8O3 that consists of a network of cyanine dye molecules is described by where |e α represents the excited state of site α (or molecule α), E α denotes the site energy including electronic and reorganization energies, and J αβ the electronic coupling between sites α and β. The diagonalization of the electronic Hamiltonian H e gives rise to the exciton states |k = α |e α e α |k associated with the exciton energies Ω k , where bands 1 and 3 are denoted by |1 and |3 , respectively: e α |k ∈ R for E α , J αβ ∈ R. The vibrational modes with frequency ν 1 ≈ 668 cm −1 are described by a set of harmonic oscillators where a † α and a α represent the creation and annihilation operators, respectively, of the intramolecular vibrational mode of site α.

The interaction between vibrations and the electronic excitation of molecules is modeled by
where s 1 denotes the Huang-Rhys factor of the vibrational modes. In the exciton basis {|k }, the interaction Hamiltonian H e−v is represented by where the diagonal terms (k = l) lead to adiabatic surfaces in the electronic excited states, called vibrons, while the non-diagonal terms (k l) induce coherent transition between different excitons mediated by exciton-vibrational couplings.
In this work, we are interested in the coherent interaction of bands 1 and 3 with the quasi-resonant vibrational modes of frequency ν 1 , which is described by the following cross termH e−v in Eq. (5) whereã 1 = N α 1|e α e α |3 a α describes an effective vibrational mode with frequency ν 1 . Here N is introduced to normalize the effective vibrational mode, such that [ã 1 ,ã † 1 ] = 1, leading to an effective Huang-Rhys factor S 1 = s 1 /N 2 . This implies that for a given Huang-Rhys factor s 1 , the effective Huang-Rhys factor S 1 is increased as the spatial overlap 1|e α e α |3 between excitonic wavefunctions of bands 1 and 3 increases, leading to smaller N and larger S 1 = s 1 /N 2 . The effective Hamiltonian of bands 1 and 3 coupled to the effective vibrational mode is then described byH =H e +H v +H e−v , whereH e = Ω 1 |1 1| + Ω 3 |3 3| andH v = ν 1ã † 1ã 1 . We note that the vibrational energy ν 1 ≈ 668 cm −1 is higher than the thermal energy k B T ≈ 208 cm −1 at room temperature T = 300 K, implying that the thermal state of the vibrational mode is well approximated by its ground state. In addition, when exciton-vibrational couplings are sufficiently small, the light-induced vibrational excitation of overtones is negligible due to the small Franck-Condon factors. This is the case for C8O3, where N11 and R31 in 2D spectra can be well described within a subspace spanned by {|g 0 , |g 1 , |1 0 , |1 1 , |3 0 }. Here, |k 0 and |k 1 denote the vibrational ground and first excited states of an electronic state |k , respectively, i.e. (H e +H v ) |k l = (Ω k +lν 1 ) |k l , where |g represents the electronic ground state with Ω g = 0. In this scenario, {|1 0 , |3 0 } can be directly excited by light from the ground state |g 0 , while {|g 1 , |1 1 } has an extremely low transition probability due to small Franck-Condon factors. Nonetheless, {|g 1 , |1 1 } can be populated through exciton-vibrational coupling ν 1 √ S 1 , leading to transition from |3 0 to |1 1 , and subsequently to |g 1 via emission. The coherent transition between |3 0 and |1 1 requires resonance between vibrational frequency ν 1 and exciton energy splitting ∆Ω 31 = Ω 3 −Ω 1 between bands 1 and 3, i.e. ∆Ω 31 ≈ ν 1 .

1-2. Decoherence
In addition to the coherent interaction of bands 1 and 3 with the effective vibrational modeã 1 , we consider electronic decoherence induced by background phonons. We characterize the decoherence by two dynamical processes, i) the incoherent population transfer between excitons, called exciton relaxation, and ii) the pure dephasing noise that destroys electronic coherence without exciton population transfer. In addition, we consider iii) relaxation of the effective vibrational mode.
We assume that each cyanine dye molecule is coupled to an independent phonon bath. The Hamiltonian of the background phonons is given by between molecules and phonons, where b † ξ and b ξ denote the creation and annihilation operators, respectively, of a background phonon mode ξ. Here g αξ represents the exciton-phonon coupling between site α and phonon mode ξ, which satisfies g αξ g βξ = 0 for all β α, implying that when site α is coupled to the phonon mode ξ with g αξ 0, all the other sites β are decoupled from the mode with g βξ = 0. For the sake of simplicity, we assume that there is no degeneracy in the exciton energies Ω k , which leads to a relatively simple form of a Markovian quantum master equation. This condition is satisfied even if the exciton energies are close to degeneracy unless they are strictly degenerate, which is satisfied for bands 1 and 3 of our interest. The influence of the background phonons on the vibronic system consisting of bands 1 and 3 with the effective vibrational mode is then described by a Markovian quantum master equation 12 where ρ(t) denotes the reduced vibronic state, while D r [ρ(t)], D d [ρ(t)] and D v [ρ(t)] describe exciton relaxation, pure dephasing noise and relaxation of the effective vibrational mode, respectively.
By substituting electronic coherences |g 1|, |g 3| and |1 3| to the dissipators D r [ρ(t)] and D d [ρ(t)] in Eqs. (8) and (10), one can obtain the following electronic decoherence rates Γ g1 , Γ g3 and Γ 13 of the coherences |g 1|, |g 3| and |1 3| where γ k→l denotes the incoherent population transfer rate from band k to l while γ g1 and γ g3 represent the pure dephasing rates of the coherences |g 1| and |g 3|, respectively, and γ 13 represents the pure dephasing rate of the inter-exciton coherence |1 3| between bands 1 and 3 These results imply that the inter-exciton dephasing rate γ 13 should be lower than the sum of the other dephasing rates γ g1 and γ g3 when there is a spatial overlap between excitonic wavefunctions of bands 1 and 3 with γ αα (0) ≥ 0 for all α, the equality γ 13 = γ g1 + γ g3 holds if and only if there is no spatial overlap between excitonic wavefunctions, i.e. | 1|e α | 2 | 3|e α | 2 = 0 for all α, or the spectral densities J α (ω) of the molecules shared by bands 1 and 3 do not induce pure dephasing noise by γ αα (0) = 0 for all sites α satisfying | 1|e α | 2 | 3|e α | 2 0. This implies that even if each molecule is coupled to an independent phonon bath, the spatial overlap between excitonic wavefunctions can reduce the interexciton dephasing rate γ 13 . Here the independent phonon baths of the molecules shared by excitons effectively form a common phonon bath coupled to both excitons, leading to a partial dephasing-free subspace. For instance, if bands 1 and 3 have perfect spatial overlap, i.e. | 1|e α | 2 = | 3|e α | 2 for all α, while the orthogonality between them is satisfied by the phases of 1|e α e α |3 , i.e. 1|3 = α 1|e α e α |3 = 0, the inter-exciton dephasing rate γ 13 will become zero, as each k| forms a dephasing-free subspace |1 1| + |3 3| of bands 1 and 3. Since band 1 is localized on the inner layer of C8O3, while band 3 is delocalized on both the inner and outer layers 13 , there is a partial spatial overlap between excitonic wavefunctions, leading to γ 13 < γ g1 + γ g3 . The spatial overlap is also required for a non-zero value of the effective Huang-Rhys factor S 1 , which is responsible for the long-lived beating signals observed in the experiment, as will be discussed later.
iii. Relaxation of quasi-resonant vibrations Finally, D v [ρ(t)] in Eq. (7) describes the relaxation of the effective vibrational mode Since n(ν 1 ) ≈ 0.04 at room temperature T = 300 K due to the high vibrational energy ν 1 k B T , Eq. (20) can be reduced to which describes the dissipation of the vibrational mode with the rate of γ v .
In Supplementary Figure 2a, the Feynman diagram contributing to the beating signals in N11 after the employed (0, 90, 90, 0) excitation is displayed. As the thermal state of the effective vibrational mode at room temperature is well approximated by its ground state ( ν 1 k B T ), the initial state of the vibronic system is given by |g 0 g 0 |. After excitation to |1 0 g 0 | by the first pulse, the dynamics of |1 0 g 0 | during coherence time t 1 is governed by a time evolution super-operator U(t 1 ) determined by the quantum master equation in Eq. (7) for which the Fourier transform is given by where the prefactor −(i(ω 1 − Ω 1 ) − Γ g1 ) −1 determines the lineshape of N11 along the ω 1 -axis, which is centered at ω 1 = Ω 1 with a linewidth of 2Γ g1 . By the second pulse, |1 0 g 0 | becomes |1 0 3 0 |, which evolves during waiting time t 2 into a mixture of |1 0 3 0 | and |1 0 1 1 |, mediated by exciton-vibrational coupling, scaling with ν 1 √ S 1 . The time evolution of |1 0 3 0 | is formally expressed as where K is a non-Hermitian operator describing both the Hamiltonian dynamics and decoherence Here we evaluate f (t 2 ) in Eq. (24), which describes the case that |1 0 3 0 | becomes |1 0 g 0 | by the third pulse, as shown in Supplementary Figure 2a. By diagonalizing the non-Hermitian operator K, one can show that f (t 2 ) is given by for which the Fourier transform leads to the lineshape −(i(ω 3 − Ω 1 ) − Γ g1 ) −1 of N11 along the ω 3 -axis. Therefore, the response function for N11 is given by where µ 1p denotes the transition dipole moment of band 1 for light polarized parallel to the longitudinal axis of C8O3, while µ 3n represents the transition dipole moment of band 3 for light polarized normal to the axis. This is due to the (0, 90, 90, 0) polarization scheme employed for measuring non-rephasing spectra in the experiment, as schematically shown in Supplementary  Figure 2a.  28) is decreased to µ 2 1n µ 2 3n with µ 2 1p > µ 2 1n , as band 1 is mainly polarized along the longitudinal axis of C8O3, as shown in the linear dichroism spectrum in Figure 1 of the main text. This implies that the (0, 90, 90, 0) polarization scheme for non-rephasing spectra enhances the signal-to-noise ratio when compared to the (90, 90, 90, 90) excitation. Similarly, the signal-to-noise ratio of rephasing spectra is enhanced by (90, 0, 90, 0) excitation.
The contribution of the GSB diagram to R31, with ground state coherence 14 during t 2 , is given by with 3 representing the vibronic mixing during t 3 which is associated with the transition |3 0 g 1 | → |1 1 g 1 | during t 3 shown in Supplementary Figure 2e. Here the vibrational frequency ν 1 in e iν 1 t 2 stems from the vibrational coherence |g 0 g 1 | in the electronic ground-state manifold and not the result of the approximation δω ≈ 0. In summary, when ν 1 √ S 1 < |i∆ν 1 − Γ 13 |, ν 1 √ S 1 < |i∆ν 1 + Γ g1 − Γ g3 | and γ v ≈ 0, the response function for R31 at (ω 1 , ω 3 ) = (Ω 3 , Ω 1 ) is given by where η e = (Γ g1 + Γ 13 )(Γ g1 + i∆ν 1 ) −1 stems from the SE diagrams shown in Supplementary Figures 2b-d, while η g = (Γ 13 − i∆ν 1 ) 2 (Γ g1 + i∆ν 1 ) −1 (Γ g3 + i∆ν 1 ) −1 originates from the GSB diagram shown in Supplementary Figure 2e. It is interesting to note that the origin of the long-lived oscillations at R31, whether predominantly vibrational or vibronic, depends upon the electronic decoherence rates {Γ g1 , Γ g3 , Γ 13 } and detuning ∆ν 1 = ∆Ω 31 − ν 1 . In Supplementary Figure 4a, the ratio |η e /η g | between the contributions of the vibronic and vibrational coherences to the long-lived beating signal in R31 is displayed as a function of the inter-exciton decoherence rate Γ 13 , where {Γ g1 , Γ g3 , ∆ν 1 } are taken to be the values estimated from experimental results. Here |η e /η g | > 1 implies that the long-lived beating signal in R31 is dominated by the vibronic coherence |1 1 1 | in the electronic excited-state manifold. By fitting the experimentally measured beating signals in N11 and R31 to the theoretical model, we found that Γ 13 ≈ 80 cm −1 , which is marked by a vertical dashed line in Supplementary Figure 4a, where the contribution of the vibronic coherence is ∼ 2.5 times greater than the vibrational coherence. These results imply that the long-lived beating signal in R31 is dominated by vibronic coherence, originating from electronic excited states. It is notable that the vibronic contribution outweighs the vibrational part for a wide range of Γ 13 . This is mainly due to the fact that the vibronic mixing 2 ∝ (i∆ν 1 − Γ 13 ) −1 during t 2 depends on the inter-exciton decoherence rate Γ 13 , while the other vibronic mixings 1 ∝ (i∆ν 1 + Γ g1 − Γ g3 ) −1 and 3 ∝ (−i∆ν 1 + Γ g1 − Γ g3 ) −1 during t 1 and t 3 are independent of Γ 13 . Considering that vibronic coherence depends on 2 (see Eq. (34) and Supplementary Figures 2b and c), while vibrational coherence depends on 1 3 (see Eq. (36) and Supplementary  Figure 2e), the vibronic contribution is increased as Γ 13 decreases. We note that these results are in line with the experimental observation that the amplitude of the long-lived beating signal in N11 is greater than that of R31 (see Figures 3b and c in the main text). In Supplementary Figure 4b, the ratio Γ g3 (Γ g1 |η e − η g |) −1 between the amplitudes of the long-lived beating signals in N11 and R31 is displayed as a function of the inter-exciton decoherence rate Γ 13 . Here the amplitude of the long-lived beating signal in N11 is greater than R31, i.e. Γ g3 (Γ g1 |η e − η g |) −1 > 1, for a range of Γ 13 where the vibronic coherence dominates the long-lived beating signal in R31, as shown in Supplementary Figure 4a.

1-5. Numerical simulation of N11 and R31
So far the analytic form of the response functions for N11 and R31 were derived with the assumption that the vibronic system is well described within the subspace of the vibrational ground and first excited states, which is valid for a small Huang-Rhys factor S 1 . To clarify the validity of this assumption, we performed numerical simulation of the beating signals in N11 and R31 with higher vibrational excited states, i.e. {|g 0 , |g 1 , · · · , |g n , |1 0 , |1 1 , · · · , |1 n , |3 0 , |3 1 , · · · , |3 n } with n ≥ 1. We found that the theoretical beating signals converge for n ≥ 1 and the numerical results are well matched to the analytical results. Here the electronic decoherence was modeled by a convex combination of two effective dissipators, i.e. pD 1 [ρ(t)] + (1 − p)D 2 [ρ(t)] with 0 ≤ p ≤ 1, where the dissipators are given by By substituting electronic coherences |g 1| and |g 3| to the dissipators, one can show that both D 1 [ρ(t)] and D 2 [ρ(t)] give rise to the same set of decoherence rates Γ g1 and Γ g3 for |g 1| and |g 3|, respectively, implying that the decoherence rates of |g 1| and |g 3| are independent of the value of p in the convex combination. For |1 3|, on the other hand, D 1 [ρ(t)] and D 2 [ρ(t)] lead to different decoherence rates Γ g1 + Γ g3 and ( Γ g1 − Γ g3 ) 2 , respectively. This enables us to vary the inter-exciton decoherence rate Γ 13 within a range of ( Γ g1 − Γ g3 ) 2 ≤ Γ 13 ≤ Γ g1 + Γ g3 by changing the value of p in the convex combination. In addition to the electronic decoherence, the relaxation of the vibrational mode was modeled by Eq. (20) in the simulations. We found that Eq. (20) can be approximated by Eq. (21) due to the high vibrational frequency ( ν 1 k B T ).

1-6. Feynman diagrams represented in vibronic eigenbasis
Here we provide the Feynman diagrams for N11 and R31 represented in the vibronic eigenbasis of the time evolution superoperator U(t), which are equivalent to the Feynman diagrams in the uncoupled state basis shown in Supplementary Figure 2. For N11, the vibronic mixing 2 takes place during waiting time t 2 (cf. Supplementary Figure 2a), where the vibronic coherences responsible for the short-lived and long-lived beating signals in N11 are given by respectively, where the vibronic eigenstates 3 0 | ∝ 3 0 | + 2 1 1 | and 1 1 | ∝ 1 1 | − 2 3 0 | are normalized by (1 + 2 2 ) −1/2 , not by (1 + | 2 | 2 ) −1/2 , due to the biorthogonality of the eigenstates of the non-Hermitian operator K in Eq. (25). When the light-induced vibrational excitation of overtones, i.e. 1 1 |, is negligible due to the small Franck-Condon factors, the transition dipole moments of the vibronic eigenstates 3 0 | and 1 1 | are determined by their amplitudes in 3 0 |, each of which is given by µ 3n (1 + 2 2 ) −1/2 and −µ 3n (1 + 2 2 ) −1/2 2 , respectively. Here µ 3n denotes the transition dipole moment of 3 0 |. In the eigenbasis, the Feynman diagrams responsible for the short-lived and long-lived beating signals in N11 are described by Supplementary Figures 5a and b, respectively. Given that there are two transitions between g 0 | and 3 0 | (and also between g 0 | and 1 1 |) by the second and third pulses, the square of the transition dipole moments of 3 0 | and 1 1 | is reflected in the response function, each of which is given by µ 2 3n (1 + 2 2 ) −1 ≈ µ 2 3n (1 − 2 2 ) and µ 2 3n 2 2 , respectively. This is in line with the analytic form of the response function for N11 shown in Eq. (31).

The response function for N22
Here we provide a vibronic model for bands 2 and 3 of C8O3, where bands 2 and 3 are coupled to the intramolecular vibrational modes with frequency ν 2 ≈ 470 cm −1 .
In Supplementary Figure 6a, the absolute square of the Fourier transform of the beating signal in N22 is displayed as a function of ω 2 , which is normalized by the amplitude of N11 at ω 2 ≈ 705 cm −1 . The amplitude of N22 is maximized around ω 2 ≈ 460 cm −1 with an amplitude in the range of 5 % of the N11 peak. When bands 2 and 3 are coupled to a vibrational mode with frequency ν 2 mediated by an effective Huang-Rhys factor S 2 , the response function for N22 is given by with ∆ν 2 = ∆Ω 32 − ν 2 , µ 2p denotes the transition dipole moment of band 2 for light polarized parallel to the longitudinal axis of C8O3 and Γ g2 ≈ 110 cm −1 represents the electronic decoherence rate of band 2, both of which can be estimated using the absorption spectrum shown in Supplementary Figure 3. From the experimentally measured beating signal in N22, we found that Γ 23 ≈ 200 cm −1 < (Γ g2 + Γ g3 ) (not shown). In Supplementary Figure 6b, the amplitude of the theoretical N22 is displayed as a function of the Huang-Rhys factor S 2 , which is about 5 % of N11 over a range of realistic S 2 values. For a comparison, the Huang-Rhys factor S 1 of the vibrational mode with frequency ν 1 ≈ 668 cm −1 is marked by a vertical dashed line. These results imply that the small amplitude of the beating signal in N22 is mainly due to the high electronic decoherence rate of band 2.

Supplementary Note 3: Theory of Markovian Correlated Fluctuations
Here we provide a correlated fluctuation model for bands 1 and 3 of C8O3 where coherent interaction between excitons and quasi-resonant vibrations is not considered. Within the level of Markovian quantum master equations, we show that the experimentally measured long-lived beating signals in N11 and R31 cannot be explained by correlated fluctuations.
The main idea of the correlated fluctuations is that when bands 1 and 3 are coupled to a common environment, the correlated noise enables the inter-exciton coherence |1 3| to decohere very slowly compared to the coherences |g 1| and |g 3| between electronic ground state and excitons. This is similar in spirit to the decoherence-free subspaces in quantum information theory 15 . Here we consider a Markovian quantum master equation in the form of which is the same to Eq. (3.143) in The Theory of Open Quantum Systems by H.-P. Breuer and F. Petruccione 12 , which is called the Redfield equation with the secular approximation in some literature 16 . Here the interaction Hamiltonian is modeled by H e−ph = α A α ⊗ B α with A α = A † α and B α = B † α , each of which is a Hermitian operator of the system and environmental degrees of freedom, respectively. With the exciton states |k defined byH e |k = Ω k |k , we introduce a projection operator Π(Ω) = Ω=Ω k |k k| = k δ(Ω, Ω k ) |k k| where the Kronecker delta is defined by δ(i, j) = 1 if i = j and δ(i, j) = 0 otherwise. In other words, Π(Ω) is a projection operator onto the exciton subspace belonging to the exciton energy Ω. In Eq. (52), A α (ω) = Ω −Ω=ω Π(Ω)A α Π(Ω ) = Ω,Ω δ(ω, Ω − Ω)Π(Ω)A α Π(Ω ). The interaction Hamiltonian H e−ph between excitons and background phonons is modeled by A α = |e α e α | and B α = ξ g αξ (a † ξ + a ξ ), where g αξ denotes the coupling of the local excitation of site α to a background phonon mode ξ. When g αξ 0 and g βξ 0 for different α and β, spatially separated sites α and β are coupled to a common phonon mode ξ, leading to correlated fluctuations in the energy levels of the different sites α