Determination of hot carrier energy distributions from inversion of ultrafast pump-probe reflectivity measurements

Developing a fundamental understanding of ultrafast non-thermal processes in metallic nanosystems will lead to applications in photodetection, photochemistry and photonic circuitry. Typically, non-thermal and thermal carrier populations in plasmonic systems are inferred either by making assumptions about the functional form of the initial energy distribution or using indirect sensors like localized plasmon frequency shifts. Here we directly determine non-thermal and thermal distributions and dynamics in thin films by applying a double inversion procedure to optical pump-probe data that relates the reflectivity changes around Fermi energy to the changes in the dielectric function and in the single-electron energy band occupancies. When applied to normal incidence measurements our method uncovers the ultrafast excitation of a non-Fermi-Dirac distribution and its subsequent thermalization dynamics. Furthermore, when applied to the Kretschmann configuration, we show that the excitation of propagating plasmons leads to a broader energy distribution of electrons due to the enhanced Landau damping.


SUPPLEMENTARY NOTE 1. SOLUTION OF THE FIRST INTEGRAL EQUATION
The first integral equation of our double inversion procedure (Eq. (2) in the main text), can be solved by assuming that ∆ (ω) is composed from a sum of square pulse functions ∆ (ω) = j a j P j (ω), where a j = a j are the components of the unknown solution vector, and each pulse function P j (ω) is localized around ω j , i.e., P j (ω) = 1, ω j − ∆ω/2 < ω < ω j + ∆ω/2, 0, otherwise.
Substituting this sum in Eq. (2) in the main text and enforcing the equation on each data point at ω i leads to Finally this equation can be written in a matrix form by defining the data vector (r) which elements are r j = ∆R R (ω j ), the solution coefficients vector a which elements are a j , solving the integral and noting that P j (ω i ) = δ ij , giving the matrix equation A simple inverse of the matrix M might not exist, hence a regularized pseudo inverse should be applied as explained in Supplementary Note 4.

SUPPLEMENTARY NOTE 2. CALCULATION OF D( ω, E)
Following Rosei and co-workers [1, 2] we calculate D( ω, E), the energy distribution of the joint density of states (EDJDOS) with respect to the energy of the final state associated with transitions from a lower (l) to upper (u) band, The lower and upper bands are assumed to be parabolic in character near the relevant transition points, with the convention that all the masses are positive. Use of these expressions with a line integral equivalent of Supplementary Equation (6) [1] can be shown to lead to It is important to note that implicit in Supplementary Equation (9) are the two conditions implied by Supplementary Equation (6), namely that and For given E and ω these two equations can readily be solved for k 2 ⊥ and k 2 and if either of these is negative then one must set D = 0. (Note that simply checking that the argument of the square root in Supplementary Equation (9) is greater than zero is not sufficient.) For gold at the frequencies of interest there are two points in the Brillouin zone of relevance, the L and X points. As such there are two terms of the form Supplementary Equation (9), each with different lower and upper band parameterizations, and the full EDJDOS is written where a L/X is the ratio between the different L and X contributions. The connection between and the room temperature T r = 298K Fermi-Dirac distribution f (E) = 1/(e E/kTr + 1) is This integral can be performed numerically over a suitably broad range of energies (e.g., E min = -5 eV and E max = 5 eV) and keeping in mind the conditions on each contributing EDJDOS factor as discussed below Supplementary Equation (9). (Alternatively one can use these conditions to determine the integration range.) It is possible for the argument of the square root in Supplementary Equation (9) to become very small for energies near where the conditions become unsatisfied and we have found that simply setting a non-zero cut-off minimum value for the argument is sufficient for obtaining good results. The parameters a L/X and A are related to electronic transition moments and are not readily available. However, by evaluating Supplementary Equation (13) as outlined above for varying a L/X and A, one can obtain a fit with the empirical dielectric function of Johnson and Christy [3], and the result is presented in Supplementary Figure 1. The value of a L/X in Supplementary Equation (12) was fit to be 0.4 which is close to the values found by Guerrisi et al., [2] which were 0.32 and 0.37, and the value of A in Supplementary Equation (13) was found to be 10 atomic units (hartree 3 a 3 0 ). SUPPLEMENTARY FIGURE 1: Fit of using the fit of the D function (red), and comparison to the − intra (ω) measured by Johnson and Christy [3] in blue.

SUPPLEMENTARY NOTE 3. BAND STRUCTURE PARAMETERIZATIONS
The band energies and electron masses were inferred from Christensen and Seraphin. Note that the energy positions the bands, X u and X l above, are given relative to the Fermi level (7.208 eV) as the zero of energy.

SUPPLEMENTARY NOTE 4. SOLUTION OF THE SECOND INTEGRAL EQUATION
Here we aim to solve the second and final integral equation of our double inversion procedure (Eq. (4) of the main text), Next, as in the first integral equation, we express the solution function ∆f as a sum of non-overlapping square pulse functions, i.e., ∆f = j b j P j (E), where: and substituting back inside the integral equation gives: Applying this equation for each data point ∆ ( ω i ) gives a set of equation written in matrix form: where ∆ is the vector of data points (result of the first inversion of the data), b is the vector of unknown coefficients which needs to be solved, and where The naive solution is c = N −1 J, however the matrix N is highly singular and an inverse does not exists. Therefore the solution is achieved using a pseudo inverse which is regularized using the L-Curve method. Increasing the number of singular values used in the inversion traces a curve from the down right. Upon arriving on the knee point (marked here with red circle), which correspond in this case for using 507 singular values, the optimal inversion is reached.
Here we briefly summarize how to solve and regularize a matrix equation of the form r = M a in cases where the full inverse of M does not exist, using the L-curve method [5]. Obviously, other regularization techniques might be applied. The matrix M is decomposed using the singular value decomposition as where u, v are orthonormal vector bases spanning the vector spaces with appropriate dimensions according to the dimension of M , {σ j } is a set of non-increasing real singular values. Note that the decomposition is a sum of outer products of column vector with a row vector. Next, the pseudo inverse of M is written as The task is now to the determine the number N j of meaningful singular values σ j to be added, since small singular values are attributed to error in the model and may also be below the noise level. The number N j is therefore data-dependent and needs to be carefully chosen for each specific problem.
In the method of the L-Curve (Supplementary Figure 2), one calculates the solution with a specific number of singular values, and then plots the solution norm (y axis) versus the residue of re-inserting the solution back into the original matrix equation (x axis). Written more simply, given a solution a Nj the L-curve is the set of points r − M a Nj , a Nj . Increasing the number of singular values used in the pseudo inverse, the residue of the matrix equation becomes smaller, since the solution is improving, and therefore the curve will show a horizontal approaching left behavior (reduced residue). After a certain point the singular values σ j are too small and describe error in the model and/or values below the noise level. Therefore the inverse term 1 σj will result in exploding norm of the solution almost without improvement of the residue (i.e., the solution does not become more exact), and hence the curve will exhibits a vertical leg.

SUPPLEMENTARY NOTE 6. NUMERICAL DETAILS
In the normal incidence configuration, for each time instant we had 532 data points, plus 100 points of extrapolation for each side, summing to 732 points. In the first inversion the L-Curve method yield the result that all singular values should be used, however in the second inversion the elbow was reached at 385 singular values. For the Kretschmann configuration we had respectively 540 data points plus 100 points of extrapolation from each side, summing to 740 points. Out of that, 405 singular values were determined as the optimal number for the second inversion.  Fig. 2(c) in the main text, which has been truncated at ±1eV above/below Fermi level, and then extrapolated to mimic extrapolation of unkonwn data ; (b) inverted curve is shown together with the original curve used for calculation, and with an truncated and extrapolated curve which avoids discontinuities.; (c) inverted curve along with original ∆f The use of the Kramers-Kronig relation to implicitly account for the real part of the dielectric function changes requires that the imaginary part, ∆ (ω) be known for a sufficiently broad range of ω values such that it is effectively zero outside this range. This means, in turn, that the experimental ∆R/R(ω) data must also be known over such a sufficiently broad range of ω. While this is generally the case for most of the experimental data we have analyzed, there can be situations, particularly when one is examining the very short-time dynamics with very small ∆R/R responses that are somewhat noisy that some sort of extrapolation of the results outside the known region is necessary. This can be a subtle issue and care must be taken to ensure that the results so-obtained are reliable.
Here we demonstrate the effect of the extrapolation of the data at the edges, using simulated data which has been synthetically truncated above/below Fermi level, then extrapolated with smooth decay to mimic the extrapolation done on incomplete data, and finally inverted. At the second step, the result is checked and if there is large non-physical SUPPLEMENTARY FIGURE 4: An extended two-temperature model result for ∆T e (green curve) is compared with the T e inferred from our present approach (purple curve).
discontinuity/behavior, the result is again truncated and extrapolated at the same points of the original truncation. It should be noted that since the original simulated data is created starting from the box-like step functions, which have strong discontinuity, then the simulated data also has strong discontinuities, and the extrapolation done is therefore very sensitive to cases where the truncation and extrapolation is done before the discontinuity point, before the slope has changed. In such cases the extrapolation will miss the correct decay, however, we show here that the inversion procedure still yields reasonable results. Supplementary Figure 3(a) shows the simulated data in purple, and the truncated and extrapolated curve in dashed blue. Next, Supplementary Figure 3

SUPPLEMENTARY NOTE 8. EXTENDED TWO-TEMPERATURE MODEL
Extended two-temperature models (TTMs) [6][7][8][9] can incorporate some aspects of the early-time non-thermal energy density, N (t). We use the following relatively simple extended TTM [9]: with C e (t) = γT e (t), and where T e (t) is the thermalized electron temperature and T l (t) is the thermalized phonon temperature. The rate of absorbed energy density (associated with the growth of non-thermal electrons) is given by P (t) = AI(t)/α where A is the absorbance, α is the absorption coefficient , and I(t) is the laser pulse intensity associated with the pump of the form B exp[−4ln(2)(t/t pulse ) 2 ]. The above equations represent a simple set of three coupled first-order differential equations in time that can be readily solved numerically.
SUPPLEMENTARY FIGURE 5: Power dependence of the experimental differential reflectivity at an early time (122 fs).Full spectrum of the probe reflectivity (left panel) and the maximum of the reflectivity peak(right panel).
We take [9] γ = 70 J m −3 K −2 , and C l = 2.5 x 10 6 J m −3 K −1 . The full-width at half-maximum of the laser pulse is t pulse = 130 fs in our case and B is determined to be such that the time integral of I(t) is the fluence, which we have estimated to be 640 µJ/cm 2 = 6.4 J/m 2 for our experiments. We focus on the case of normal incidence case with incident light having wavelength 1200 nm. Fresnel calculations then suggest that A = 0.03 and α −1 = 11.8 nm. We take a = 1/τ th and b = 1/τ p where τ th is the electron thermalization time constant and τ p is the non-thermal electron-phonon relaxation time constant. We employ τ th = 100 fs, τ p = 1.4 ps, and g = 2 x 10 16 W m −3 . These parameter values are typically used for gold, except for the value of τ th , which is smaller than the 500 fs value of Ref. 6 which is typically assumed. Note, however, that in general the early times for which τ th is relevant pertain to times for which our ∆f functions will contain a significant amount of non-thermal contributions and so we probably cannot reliably infer this parameter from our data.
Supplementary Figure 4 displays the resulting electron temperature T e dynamics (green curve). We compare these extended two-temperature results to electron temperatures determined from fits of the normalized ∆f (E)/∆f max electron population changes inferred from the experimental data, where ∆f (E) = 1/(exp(E/kT e )+1)−1/(exp(E/kT 0 )+1) and ∆f max is the maximum value of this difference. This result is shown as the purple curve in Supplementary Figure 4. While the present effective two-temperature model underestimates the maximum temperature, it is seen to be a reasonable description of the results. It is possible to much achieve better agreement if the fluence were increased and/or the absorption coefficient were increased, reflecting more energy being pumped into the sample than expected on the basis of our limiting formulae.
An interesting issue arises concerning radiation penetration depths that should be noted. It turns out that the skin-depth for gold (i.e., its inverse absorption coefficient) for wavelengths near the pulse wavelength of 1200 nm is on the order of 10nm and that the skin-depth for shorter wavelengths (e.g., 520 nm) consistent with the probe light is on the order of 25nm. This discrepancy in skin depths can have consequences for gold films that are d = 30 nm or larger because the non-thermal electron disturbance created by the pump corresponds to just a portion of the film thickness whereas the probes are sampling the entire film thickness. If s pump is the skin depth associated with the pump, and s probe is the skin-depth associated with the probe, then a correction factor of s probe (ω)/s pump should be applied to the ∆ values inferred from the probe-based reflection measurements. We have not investigated this correction but anticipate that it could lead to somewhat larger magnitude ∆f (E) values.

SUPPLEMENTARY NOTE 9. POWER DEPENDENCE
First, we present the differential reflectivity data at early times (122 fs) for normal incidence for four different excitation powers ranging from 500 µW to 5 mW, Supplementary Figure 5. From this figure it is clear that the dependence of the pump-probe response is close to linear. The left panel shows the full spectrum of the probe reflectivity and the right panel shows the maximum of the reflectivity peak as a function of pump power. The linear dependence on the pump power would imply the absence of multi-photon processes. To examine the power dependence in more detail we plot in Supplementary Figure 6 the normalized differential reflectivity spectra for two different times SUPPLEMENTARY FIGURE 6: Power dependence of the experimental differential reflectivity at two different times. The curves are normalized to have the same maximum. SUPPLEMENTARY FIGURE 7: Calculated population changes, ∆f , inferred from our double-inversion procedure, at early and late times for various powers. The curves are normalized to have the same maximum (90 fs and 122 fs). The normalized spectra nearly collapse into a single curve indicating that one-photon processes dominate the population change across the whole measured spectrum. This not very surprising given the fact that we are probing only < 1 eV above and below the Fermi level whereas the pump photon energy is 1 eV. The only noticeable deviation in the normalized spectra is the short wavelength (high energy) tail of the spectra where we see slight broadening of the signal at higher power. The deviation is small indicating that the contribution of multi-photon processes is negligible.
To quantify this non-linear optical contribution we have calculated the carrier population change (∆f ) using our double inversion method, Supplementary Figure 7. The population change is calculated for two different times: early time when the electron gas is not completely thermalized (left panel) and at later time when it is fully thermalized. As it can be seen from these normalized curves the dependence of the width of these curves on the excitation power is very small (it is nonexistent for thermalized gas). Most importantly the observed difference is much smaller than the wide distribution observed for Kretschmann configuration shown in Fig. 6 in the main text. Given that in Kretschmann configuration (Fig. 6) ∆R/R signal strength is comparable to the signal strength for normal incidence at 5 mW (11 mOD vs 9 mOD) we can conclude that the observed broad distributions for Kretschmann configuration in Fig. 6 are very unlikely to be caused by multi-photon processes. Thus qualitatively different mechanism (such as Landau damping) is needed to explain the broad distribution of ∆f in Fig. 6.