## Introduction

Optical coherence tomography (OCT) has been established as one of the most successful and significant optical techniques for biophotonics and clinical biomedical imaging, most notably within the field of ophthalmology. OCT has the ability to perform real-time, non-invasive, and noncontact measurements in reflection, providing three-dimensional (3D) sample visualization1,2. Rapid advances in light sources, detectors, and components for the visible and near-infrared (IR) spectral region have enabled the development of advanced functional techniques as well as high-speed, high-resolution in vivo imaging3,4, including OCT. In recent years, there has been a growing interest in applying OCT for non-destructive testing (NDT) in cultural heritage conservation and industrial quality control to measure, for example, the thickness of coatings and layered materials and to identify subsurface structures and defects5,6,7. In this regard, OCT stands out from other traditional NDT techniques, such as high-frequency laser ultrasonic (LUS) imaging, terahertz (THz) imaging, and micro X-ray computed tomography (μCT), each with its own advantages and disadvantages. Specifically, OCT can offer several advantages in applications prohibiting the use of μCT due to its low contrast and hazardous ionizing radiation, LUS due to the need for noncontact measurements, and THz due to poor spatial resolution and long acquisition time7. Furthermore, OCT is an industry-ready technology that is relatively robust and easy to apply, and can be implemented using low optical power8. However, the main limitation of OCT is the strong scattering of light at visible and near-IR wavelengths, which limits the penetration depth in turbid media from a few tens to hundreds of microns depending on the sample. Since scattering losses are inversely proportional to the wavelength of light relative to the size of the scattering features, it has long been known that the penetration of OCT would benefit from employing a longer center wavelength. Current state-of-the-art commercially available OCT systems for NDT operate in the 1.3-μm wavelength range, utilizing the low water absorption and the maturity of optical fibers and components developed for telecommunications in this region. At longer wavelengths, the light sources and detectors are significantly less efficient and the components are less mature. In addition, water absorption is generally considered to be too strong for imaging of biological tissues and other aqueous samples in this region, and even in the absence of water, many materials may have significant vibrational absorption bands in this region. Therefore, the combined effect of absorption and scattering on the penetration depth makes it nontrivial to assess whether a sample would benefit from being imaged at a longer wavelength. However, limited penetration is easily confirmed at 1.3 μm, where previous studies on a great variety of samples have found that scattering is the primary obstacle for deeper imaging. These studies have examined tablet coatings9, various polymer samples, fiber composite materials6,10, LEDs and printed electronics11,12, and paper quality13, as well as microchannels in ceramics14.

OCT studies investigating longer wavelengths in the near-IR regime at 1.7 and 2 µm have been reported, where increased penetration was demonstrated in composite paint samples15,16, intralipids, rubber17,18, and ceramic materials such as alumina and zirconia19, although scattering was still considered to be the dominant factor in limiting penetration.

Few attempts have been made to utilize the vast potential of mid-IR OCT, such as the early proof of concept work of Colley et al.20,21 demonstrating the first in-depth reflectivity profiles. In their work, a single reflectivity profile (A-scan) of a calcium fluoride window and a topographic image of a gold-palladium-coated tissue sample were presented. These first measurements were based on quantum cascade laser (QCL) emission and cryogenically cooled mercury cadmium telluride (MCT) detection using a time-domain OCT scheme. With a center wavelength of 7 µm, Colley et al.20,21 achieved an axial resolution of 30 µm with an acquisition time of 30 min for a single reflectivity profile, which was heavily affected by the side lobes due to the heavily modulated spectral shape. A similar study on mid-IR OCT was reported by Varnell et al.22 using a superluminescent QCL centered at 5 µm with a smooth spectral shape, resulting in an improved signal-to-noise ratio (SNR) but a reduced bandwidth, leading to a poor axial resolution of 50 µm. Recently, Paterova et al.23 investigated tunable mid-IR OCT up to 3 µm using visible light detection by exploiting the nonlinear interference of correlated photon pairs generated through spontaneous parametric downconversion. However, due to the narrow bandwidth of their tunable source, the axial resolution at 3-µm center wavelength was limited to 93 µm, while the time-domain OCT scheme chosen limited the sensitivity and acquisition speed. In this work, a real-time mid-IR OCT system with a high axial resolution of 8.6 µm is demonstrated. This is achieved by the use of a broadband, high-brightness mid-IR supercontinuum (SC) source for illumination and a broadband frequency upconversion system for detection. Using this technique, an improved penetration of the 4-µm OCT system compared to a 1.3-µm OCT system is demonstrated for the same optical power. The 1.3-µm OCT system used for comparison is a state-of-the-art spectral-domain system using a near-IR SC source with a 400-nm bandwidth (for details, see Supplementary materials), which was recently used in a clinical skin study24.

## Results

### Characterization of the 4-µm OCT system

An overview of the system is shown in Fig. 1 (see Methods for details). The system consists of five modular parts: a custom mid-IR SC source based on a 1.55-μm master-oscillator power amplifier (MOPA) pump laser and a single-mode zirconium fluoride fiber, a Michelson interferometer, a scanning sample translation system, an in-house developed frequency upconversion module, and a silicon complementary metal–oxide–semiconductor (CMOS)-based spectrometer. Each subsystem is connected via optical fiber to ease the coupling and alignment between subsystems. The mid-IR SC source produces a continuous spectrum from 0.9 to 4.7 μm and is set to operate at 1 MHz pulse repetition rate generating 40 mW of average power above 3.5 μm, which is comparable in brightness to synchrotron radiation25. The spectral components below 3.5 μm are blocked by a longpass filter, resulting in 20 mW coupled to the sample arm of the interferometer. The beam is focused onto the sample using a barium fluoride (BaF2) lens, and images are acquired by moving the sample using motorized translation stages. The sample and reference signals are collected in a single-mode indium fluoride fiber and relayed to the upconversion module for spectral conversion to the near-IR. The upconverted signal is then coupled to a multimode silica fiber and imaged onto the spectrometer to resolve the spectrum. Examples of the spectra before and after upconversion are shown in Fig. 2a.

Here, the key technology for enabling fast and low-noise detection is the broadband nonlinear frequency upconversion. Shifting the spectrum to the near-IR is critical to the performance of the system, since state-of-the-art mid-IR detectors (e.g., PbSe, InSb, MCT) suffer from intrinsic thermal background noise and low responsivity compared to their near-IR counterparts. Furthermore, due to the relative immaturity and high cost of mid-IR detectors, focal plane arrays and array spectrometers usually have a limited number of pixels available for detection, thus reducing the spectral resolution that minimizes the axial range. To alleviate these shortcomings, researchers have demonstrated frequency upconversion-based detectors that are functional at room temperature as a promising alternative to traditional direct detection schemes26. In this process, the mid-IR signal passes through a suitable nonlinear crystal, where it mixes with a strong mixing field; thus, a near-IR sum frequency signal is generated without any loss of the information encoded in the spectral modulation of the mid-IR signal. The near-IR signal is then detected in-line by a low-noise high-resolution silicon-based spectrometer. Here, the upconversion module is designed to convert a broad bandwidth of more than 1 μm in the mid-IR (3576–4625 nm) to a narrow band in the near-IR (820–865 nm) without any parametric tuning. To achieve this, a noncollinear angular phase-matching scheme is employed for fast parallel detection (see Methods for details). This scheme includes an MgO-doped periodically poled lithium niobate (PPLN) crystal as the nonlinear medium and a continuous-wave (CW) solid-state pump laser at 1064 nm with ~30 W of mixing power. The PPLN crystal is placed inside a laser cavity to gain access to the highest available power, which is directly proportional to the upconversion efficiency27. The module works at room temperature with a high quantum efficiency (QE) of ≥1% for polarized input signal over the entire targeted spectral range. The upconverted spectra of 45-nm bandwidth (corresponding to 1049-nm mid-IR bandwidth) are detected over 2286 pixels, which results in an average mid-IR sampling resolution of 0.46 nm.

The lateral resolution of the system is investigated by scanning a USAF 1951 resolution test target. The translation speed of the fast (horizontal) scanning axis was set to 3 mm/s, and with an integration time of 3 ms per line for maximum signal, 1000 line scans were performed in 3 s, resulting in a 9-μm horizontal sampling resolution. Along the slow (vertical) scanning axis of the stage, the resolution is determined by the step size of the scanner, which was set to 10 μm. From Fig. 2b, it is shown that the system is able to resolve elements 1 and 2 of group 6, which results in a lateral resolution of ~15 μm. To determine the axial resolution, the sensitivity, and the sensitivity roll-off, a plane mirror was positioned in the sample arm and the resulting interferograms were recorded for different one-way optical path difference (OPD) values by translating the mirror in the reference arm.

Figure 2c displays the sensitivity roll-off for 3-mm translation, which can be used to infer the effective imaging depth of the system. From the 6 dB roll-off of 1.35 mm, the corresponding mid-IR spectral resolution was determined to be 2.7 nm. From the same data, an axial resolution of 8.6 µm is found at an OPD of ~100 µm (see inset of Fig. 2c) and better than 9 µm within the first 1 mm. Therefore, even for such a long center wavelength, the axial resolution in air is similar to values reported for near-IR systems at 1.7 and 2 µm15,17,18,19. In the best case, a sensitivity of 60 dB is obtained, which is significantly lower than the 90 dB sensitivity of the 1.3-µm OCT system used for the near-IR comparison (see Supplementary section IV for details).

### Mid-IR OCT in comparison to near-IR OCT: proof-of-principle

As an example of the improved penetration of the mid-IR OCT system for high-resolution imaging of subsurface features, the experiments of Su et al.19,28 are replicated. They demonstrated that features scanned through ceramic plates, which were not distinguishable using 1.3-µm OCT, became visible by employing a 1.7-µm central wavelength (see Supplementary section VI for a direct comparison to this work). Supported by a series of Monte Carlo (MC) simulations, Su et al.19,28 concluded that a 4-μm wavelength OCT system would be able to image through the milled alumina plate to reveal the backside of the stack. To test this, the same ceramic samples were obtained through the Research Center for Non Destructive Testing (RECENDT) in Linz, Austria. A schematic of the ceramic stack is illustrated in Fig. 3a. The stack consists of three layers of ceramic plates (C1–C3), where C1 is a 375-μm-thick zirconia plate, C2 is a 475-μm-thick alumina plate with up to 60 μm deep channels and squares of varying widths between 5- and 300-μm laser milled into the surface, and C3 is a 300-μm alumina plate. All three plates have negligible absorption from 0.5 to 4 μm19. Due to the strong scattering of alumina at 1.3 μm, the near-IR system was not able to penetrate the alumina plates, so for comparing OCT at 1.3 and 4 μm, the sample was imaged first from the top passing through the zirconia plate C1. Figure 3b shows an average of ten A-scans obtained using the two different systems, corresponding to the vertical dashed lines in Fig. 3c. The peaks between 1080- and 1200-μm OPD is the interface between the zirconia (C1) and inscribed alumina samples (C2) separated by an inscribed air channel, and the peak at 1880-μm OPD is the interface between the two alumina plates (C2 and C3). The thickness of the plates can then be calculated by dividing the OPD thickness (809.9 μm for C1 and 787.0 μm for C2) by the refractive indices (2.138 at for zirconia and 1.675 for alumina)19, resulting in 378.8 ± 3.8 and 469.9 ± 4.9 μm for C1 and C2, respectively, with the uncertainty corresponding to twice the sampling resolution divided by the refractive index. Both thicknesses are very close to the values measured using the mechanical thickness gauge, especially considering the uncertainty of the refractive indices. Figure 3c shows the corresponding B-scans obtained using the two OCT systems. By comparing the two images, it is clear that the mid-IR signal penetrates deeper into the sample and is able to resolve the boundary between C1 and C2 with a higher contrast, as well as the boundary between C2 and C3. The 1.3-μm OCT system provides only a weak signal from the C1 to C2 boundary despite the much higher sensitivity. This result is consistent with the findings of Su et al.19; to further support this finding, a series of MC simulations were performed (see Methods for details). The simulations, shown in Fig. 3d, qualitatively confirm the improved visualization of the in-depth interfaces in the 4-μm OCT images (see also Supplementary video I). Figure 3e shows a 3D visualization of the sample over a 5.6 mm × 7.3 mm scanned area, which shows strong localized scattering centers from within the porous C1 layer and a clear identification of the laser-milled structures below the C1–C2 interface (see also Supplementary videos II and III).

To further test the penetration capability of the 4-μm system, the stack was imaged from the bottom through the C3 alumina plate. Figure 3f shows a 3D scan of this configuration, which surprisingly reveals a significantly improved signal from the backside C2–C1 interface despite having traveled through 775 μm of alumina. As seen in Fig. 3g, the microstructures are clearly resolved, revealing the surface roughness from inside the milled areas (see Supplementary Fig. S1). This demonstrates the capability of the mid-IR OCT system to perform high-resolution imaging in scattering media. Due to the high spatial sampling resolution and maximum integration time used, the 800 × 730 × 2048 pixel volume took 36.5 min to acquire.

### Multiple scattering, dispersion, and non-uniform samples

In the experiments with the ceramic plates, the reduced scattering at longer wavelengths clearly resulted in improving the imaging depth. However, for highly scattering materials, OCT imaging is severely impaired not only by the reduced penetration depth but also by the scrambling of the spatial image information through the effect of multiple scattering. To illustrate this, a sample consisting of a highly scattering alumina green tape of varying thickness applied to a cellulose acetate foil of ~62-μm thickness was imaged with the near-IR and mid-IR OCT systems. Figure 4 shows a photograph of the sample with annotations to indicate the scanned regions P1–P5, where the thickness of the alumina tape increases from P1 to P5. The corresponding B-scans for the two systems are presented in adjacent columns for direct comparison. P1 corresponds to a B-scan of the acetate foil only, while the following images (P2–P5) portray the effect of the increased thickness of the alumina tape imaged by the two systems. It is clear from the B-scans using the 1.3-μm OCT system that a band of multiple scattered signals increasing in thickness from P2 to P5 eventually obscures the acetate–air interface in P4 and P5, extending the apparent thickness of the alumina tape beyond its physical extension. However, in the images acquired using the 4-μm OCT system, the integrity of the spatial information is retained and the thickness of the alumina tape can be evaluated. Sectioning the structure along P3, the backside of the acetate foil (n = 1.48) is just barely resolved with 1.3-μm OCT, while it can be clearly seen in the 4-μm B-scan. From the latter, the thickness of the alumina film (n ~1.68)19 is estimated to be just 24 μm, but the multiple scattered signals in the B-scan image at 1.3 μm extend all the way beyond the backside of the acetate foil. The strong scattering of the 1.3-μm light indicates that the porous structure of the alumina layer is comparable in size to the wavelength, which is consistent with the 0.5-μm mean pore diameter for alumina reported in Su et al19. The significant reduction in multiple scattering going from 1.3 to 4 μm therefore represents the transition from the domain of strong directional Mie scattering with a short photon mean-free path to the domain of weaker uniform Rayleigh scattering having a long photon mean-free path, compared to the wavelength of light.

The mid-IR OCT system can also have an advantage even in highly uniform and transparent media, such as germanium due to its bandgap in the near-IR, or silicon due to its lower group-velocity dispersion (GVD) at 4 μm. The effect of dispersion is to introduce a difference in delay between the short and long wavelengths of the spectrum, which in turn results in broadening of the backscattered signal. For this reason, the correct depth resolved information in OCT can only be obtained after dispersion compensation29. To illustrate this, a 255-μm-thick (±5 µm) silicon wafer was imaged using the two systems having a GVD of 1576 and 385fs2/mm at 1.3 and 4 μm, respectively30. At 1.3 μm, the strong dispersion results in significant broadening of the reflection peak of the wafer from 5 to 18 μm full-width at half-maximum (FWHM), whereas at 4 μm the peak broadened significantly less from 12 to 17 μm FWHM, as seen in Fig. 5. As a result, the image distortion due to dispersion is less pronounced in the 4-μm OCT system, which could be useful for characterization of silicon-based devices, such as micro electromechanical systems, solar cells, and waveguides. The thicknesses measured with the 4- and 1.3-µm OCT systems are 250.6 ± 1.7 and 263.0 ± 0.79 µm, respectively, when accounting for the refractive indices and with the uncertainty given by the digital sampling. The discrepancy in the measured thicknesses is attributed to the strong dispersion at 1.3 µm.

Finally, to demonstrate 3D imaging of more complex non-uniform structures, a Europay, MasterCard, Visa-chip (EMV chip) and a near-field communication antenna embedded in a standard credit card are imaged at 4 μm. Credit cards are commonly made from several layers of laminated polymers mixed with various dyes and additives. An overview of the scanned chip area is shown in Fig. 6ab in an en face and cross-sectional view, respectively (see also Supplementary video IV and Supplementary Fig. S2). Underneath the thin transparent surface, three layers of highly scattering polymers are identified, which are seen most clearly at the edges of Fig. 6b as bright uniform bands (P1–P3) separated by dark lines. As seen in Fig. 6d, the polymer is so scattering in the near-IR that even the top most layer (P1) cannot be penetrated by the 1.3-μm OCT system. On the other hand, the 4-μm OCT system is able to resolve all three polymer layers and can in some places even detect the backside of the card, which is 0.76 mm thick. Below the first scattering polymer layer, the first structure to appear is the encapsulation layer protecting the embedded silicon microprocessor. The bonded wires and circuitry connecting the microprocessor to the underlying gold contact pad are also clearly visible, as seen in Fig. 6c.

## Discussion

From the images of various samples presented here, it would seem that the 4-μm OCT system is superior to the 1.3-μm OCT system in every aspect. Even with a significantly reduced sensitivity, the reduced scattering at 4 μm allows clear depth resolved imaging of interfaces from samples considered too thick for the 1.3-μm OCT system. The increased water absorption at long wavelengths naturally excludes biological samples, but except for a group of carboxylic acids, the spectral region of 3.6–4.6 μm is remarkably devoid of vibrational resonances, making it ideal for NDT using OCT. It is therefore expected that most materials and samples with low water content, that are currently being analyzed at 1.3 μm but are limited to a penetration depth of few tens to hundreds of microns due to scattering, can benefit from being imaged at 4 μm. The measured axial resolution is 8.6 µm, which agrees well with the expected value for a zero-padded harmonically modulated spectrum ranging from 3576 to 4625 nm, obtained for the shortest OPDs. To improve the resolution further, it is necessary to increase the bandwidth of the detected spectrum beyond the already remarkably broad >1-µm bandwidth, which is currently limited by the bandwidth of the upconversion scheme and the long-wavelength edge of the SC source. The sensitivity roll-off is currently limited by the radially varying spatial mode profile from the upconversion module, which enforces the use of a large multimode fiber as the limiting aperture for the spectrometer. This could be optimized if the upconverted beam could be efficiently transformed to match the fundamental mode of a single-mode collecting fiber, for which the spectrometer was designed. One way of achieving this could be implementing aperiodic poling of the nonlinear crystal31,32 to satisfy collinear phase matching over a broad spectral range and thus enabling single-mode fiber coupling to the spectrometer. Fulfilling this, the spectral resolution can be made comparable to the sampling resolution of the spectrometer, thus increasing the maximum image depth of the system. According to the characterization made by the manufacturer, the maximum image depth supported by the spectrometer is 8.6 mm, which is possible since only a 15 dB attenuation is expected using an optimized single-mode spot size. Approaching a centimeter in optical path length would pave the way to a whole new series of NDT applications where both imaging at macroscopic length scales along with microscopic details are required. In the end, a compromise between the upconversion efficiency, the upconversion bandwidth and the properties of the output spatial mode must be achieved. To reach 90 dB sensitivity, the upconversion efficiency has to be increased tenfold. In this regard, new upconversion schemes under development may allow an increase in the intracavity power to as much as 100–150 W, thus improving the efficiency by a factor of 3–5. Similarly, the acquisition speed could be improved through increased conversion efficiency or by increasing the signal power, to reduce the integration time of the spectrometer.

In conclusion, fast real-time spectral-domain OCT imaging in the mid-IR was demonstrated using an SC source in combination with parallelized frequency upconversion covering more than 1µm bandwidth from 3.58 to 4.63 μm. The broad bandwidth resulted in an axial resolution as high as 8.6 µm, which together with a lateral resolution of 15 µm enabled detailed imaging of microscopic structures embedded in media that are highly scattering at the more conventional 1.3-µm wavelength. The upconverted signal was measured using a Si-CMOS spectrometer acquiring data with a line rate of 0.33 kHz, which enabled real-time B-scans at mm/s speed, and C-scans at mm2/min acquisition rate. To demonstrate the superiority of mid-IR OCT, comparative data have been presented using an ultra-high-resolution OCT system at the more conventional 1.3-µm wavelength. To validate the results, samples were assembled similar to those used in other previous reports. Through the realization of real-time mid-IR OCT imaging, this work bridges the gap between technology and practical applications of mid-IR OCT as an industry-ready tool for NDT.

## Materials and methods

### SC generation

The SC source was pumped by a four-stage MOPA using an unfolded double-pass amplifier configuration based on a 1.55-μm directly modulated seed laser diode. The seed pulse duration is 1 ns, and the repetition rate is tunable between 10 and 10 MHz. The seed is subjected to three stages of amplification in erbium- and erbium-ytterbium-doped silica fibers, which extends the spectrum to 2.2 μm by in-amplifier nonlinear broadening. To further push the spectrum towards longer wavelengths, the erbium fiber was spliced to ~40 cm of the 10-μm core diameter thulium-doped double-clad fiber, which extended the spectrum to 2.7 μm. The thulium-doped fiber was subsequently spliced to a short piece of silica mode-field adapter fiber having a mode-field diameter of 8 μm, which provided a better match to the fluoride fiber. The mode adapter fiber is butt-coupled to a 6.5-μm core diameter single-mode ZrF4-BaF2-LaF3-AlF3-NaF (ZBLAN) fiber from FiberLabs Inc., with a short length of approximately 1.5 m to reduce the effect of strong multiphonon absorption beyond 4.3 μm.

### Interferometer

The interferometer is based on a Michelson design employing a gold-coated parabolic mirror collimator, a broadband CaF2 wedged plate beam splitter, a BaF2 plano-convex lens in the sample arm, and a BaF2 window and flat silver mirror in the reference arm. The BaF2 lens was chosen to minimize the effect of dispersion, while being available at a relatively short focal length of 15 mm. At 4 μm, the dispersion of BaF2 is relatively low at 16.4 ps/nm/km compared to other standard lens substrates, such as CaF2 (33.0), Si (−45.8), and ZnSe (−59.9), but most importantly, the dispersion slope is flat from 3.5–4.5 μm (13.6–19.1 ps/nm/km)33. Even so, the residual dispersion from the 6.3-mm center thickness lens was roughly compensated by a 5-mm window and the remaining dispersion was compensated numerically. Coupling to the upconversion module was performed using a 6-mm focal length aspheric chalcogenide lens and a 9-μm core diameter single-mode indium fluoride patch cable.

### Upconversion

Upconversion is realized by mixing the low-energy IR photons at a wavelength λIR with pump photons at a wavelength λP to generate upconverted photons with a wavelength λUP, under energy and momentum conservation:

$${\mathrm{\lambda }}_{\mathrm{P}}^{ - 1} + {\mathrm{\lambda }}_{{\mathrm{IR}}}^{ - 1} = {\mathrm{\lambda }}_{{\mathrm{UP}}}^{ - 1},\:{\mathrm{\Delta }}\vec k = \vec k_{{\mathrm{UP}}} - \vec k_{\mathrm{P}} - \vec k_{{\mathrm{IR}}}$$
(1)

where $$\vec k$$ is the wave propagation vector and $${\mathrm{\Delta }}\vec k$$ is a measure of the phase mismatch among the three interacting waves, which should ideally be zero for maximum QE. Furthermore, the QE scales linearly with the pump power and scales quadratically with the effective nonlinear coefficient (deff) and the length of the nonlinear medium27. The mid-IR OCT system is operated at a wavelength of 4 µm with a spectral bandwidth >1 µm. Accordingly, the upconversion module was designed and optimized to upconvert the entire spectral range from 3.6 to 4.6 μm for the fastest detection. Among the various phase-matching configurations, quasi-phase matching in a PPLN crystal was chosen for the broadband upconversion, owing to its design flexibility, access to a high deff (14 pm/V), and optical transparency up to 5 µm26. The upconversion took place inside the PPLN crystal, where each wavelength was phase-matched at a different propagation angle. Thus, noncollinear interaction among the three participating waves was used to phase match over a wide spectral range (see Supplementary Fig. S3a). As the λUP is below λP, by choosing the pump wavelength λP=1 µm, conventional Si-CMOS detection can be engaged for λUP. Here, a solid-state (Nd:YVO4) CW linearly polarized laser operating at 1064 nm was used as the pump. This was driven by a broad area emitting laser diode (3 W, 880 nm). The high-finesse folded solid-state laser cavity was formed by mirrors M1–M7 (see Supplementary Fig. S4). All mirrors are high-reflective coated for 1064 nm and anti-reflective (AR) coated for 700–900 nm. The mirror M7 is based on an undoped YAG substrate and additionally high-transmission -coated for the 2–5-µm range. The mirror M6 acts as an output coupler for the upconverted light. The mirrors M4 and M5 are placed in a separate compartment to filter out the fluorescence from the laser crystal and 880-nm pump laser. A 20-mm-long 5% MgO-doped PPLN crystal is used for the experiment (Covesion, AR-coated for 1064 nm, 2.8–5.0 µm on both facets). The PPLN crystal consists of five different poling periods (Λ) ranging from 21 to 23 µm in steps of 0.5 µm. Each poled grating has a 1 mm × 1 mm aperture and is separated by 0.2-mm-wide regions of unpoled material. For different values of Λ, the phase mismatch and hence the overall upconversion spectral bandwidth varies (see Supplementary Fig. S3b). A wider bandwidth requires larger input angles for the IR beam, which reduces the overall QE as the effective interaction length is reduced. For a best-case scenario, Λ = 23 µm was considered in the setup. A CW intracavity power of >30 W at 1064 nm was realized with a spot size (beam radius) of 180 µm inside the PPLN crystal. The entire system is operated at room temperature. The estimated maximum QE at each mid-IR wavelength is plotted in Supplementary Fig. S3c, considering an effective interaction length of 20 mm inside the PPLN crystal. The IR light (output of the fiber coupled 4-µm OCT signal) is collimated and then focused into the PPLN crystal using a pair of CaF2 aspheric lenses (f = 50 mm, AR coated for 2–5 µm). The upconverted light is collimated by a silica lens (f = 75 mm, AR coated for 650–1050 nm). A shortpass 1000-nm filter and a longpass 800-nm filter are inserted to block the leaked 1064-nm beam and 532-nm parasitic second harmonic light, respectively. A schematic representation of the radial wavelength distribution across the transverse upconverted beam profile is shown in Supplementary Fig. S4. The module is able to upconvert all wavelengths in a broad mid-IR spectral range of 3.6–4.8 µm to 820–870 nm, simultaneously.

### Detection, scanning, and data processing

After the upconversion module, the near-IR light is collected by a 100-µm core multimode silica fiber guiding the light to a line scan spectrometer (Cobra UDC, Wasatch Photonics, USA) operating with a maximum line rate of 45 kHz (for a bit depth of 10). The spectral range covered the wavelengths of 796 to 879 nm, which is sampled by 4096 pixels. To scan the sample, it is mounted on a double translation stage (2 × ILS50CC from Newport) with a maximum travel speed of 100 mm/s, a travel range of 50 mm and a stepping resolution of 1 µm. The detected raw spectra are dark signal subtracted and normalized to the reference arm signal. Pixel to wavenumber translation and interferometer dispersion compensation are achieved by exploiting the phase information across the pixel array retrieved for two reference interferograms showing clear interference fringes. In this way, spectral resampling is performed to linearize wavenumber sampling34, after which a phase shift is applied for compensating the unevenly matched dispersion in the arms of the interferometer. To suppress the effects stemming from the spectral envelope of the interferograms, a Hanning spectral filter is applied to the spectral region of the interferometric signals. Finally, a fast Fourier transform is applied to generate a reflectivity profile, that is, an A-scan. A compromise between signal strength and acquisition time is made that leads to an A-scan acquisition time of 3 ms. To build B-scans (2D images), the horizontal stage (X) is programmed to move continuously over a specified distance, achieving a 500 line B-scan in 1.5 s. 3D scans are built by stepping the vertical (Y) stage at a proportionately slower rate to assemble multiple B-scans.

### MC simulations

The MC simulations are performed using the open source software MCX35. The simulated domain consists of 1400 × 9 × 230 uniform voxels in XYZ with a size of 5 × 5 × 5 μm3 each, corresponding to a total size of 7000 × 45 × 1150 μm3. The structure is considered uniform along the Y direction, similar to what is shown in the ridge/valley part of Fig. 3a. The slab thicknesses and the valley depths in the simulation are identical to the real ones, but the valley widths in the simulation increase along the horizontal direction from 5 μm at one side of the domain to 180 μm to the other side in steps of 5 μm for the sake of simplicity. The optical properties of zirconia and alumina used in the MC simulations are taken from Su et al19. The MC software launches a number of photons into the defined structure and tracks their path taking into account the refractive index, absorption, scattering, and scattering anisotropy using a random number generator to emulate scattering statistics. The simulated source uses a beam with a Gaussian transverse distribution with a constant width (5 μm for 1.3-µm center wavelength, and 15 μm for 4-µm center wavelength), due to the MC simulations being pure geometrical optics, which hinders the use of a diverging Gaussian beam. The OCT signal in a single A-scan is extracted from the MC simulation by, at each time step, summing the flux of photons that return to the top layer of the simulation within an angle corresponding to the NA of the system, which is 0.1 for both 1.3 and 4 µm. The simulations thus emulate a time-domain OCT implementation. The simulations ran for T = 14.4 ps, in steps of dt = 24 fs, giving a digital sampling of dz = (c dt)/ 2 = 3.6 µm in air. Due to the lack of wave nature of light in the MC simulations, both the axial and lateral resolutions are limited by the digital sampling only, which is unphysical of course, but accepted here because the investigation focuses on the comparison of the penetration depth. The B-scans are constructed by collecting A-scans obtained by moving the source 10 µm laterally, which gives 700 A-scans in a B-scan. Each A-scan simulation is performed using a different seed for the random number generator. To make the raw B-scans represent their experimental counterparts, they are corrected for roll-off by multiplying each A-scan with the measured roll-off curve, as well as corrected for the reduction in signal strength caused by the divergence of the Gaussian beam. This correction is introduced by multiplying the signal at a given depth, z, by the overlap integral, C, between the fundamental mode of the fiber that collects the light and the transverse distribution of the light reflected at z. This overlap integral is approximated here by the overlap integral between the Gaussian distribution of the propagating beam at the focal point U(x,y,zF) and the transverse distribution of light that is reflected at another point z>zF. Due to the double pass of light, the second distribution becomes Gaussian like the input beam at a distance 2(z − zF) from the focal point:

$$C \approx \frac{{\left( {{\int} {{\int}_{ - \infty }^\infty {U\left( {x,y,2z - z_{\mathrm{F}}} \right)U\left( {x,y,z_{\mathrm{F}}} \right){\mathrm{d}}x{\mathrm{d}}y} } } \right)^2}}{{{\int} {{\int}_{ - \infty }^\infty {U\left( {x,y,2z - z_{\mathrm{F}}} \right)^2{\mathrm{d}}x{\mathrm{d}}y} } {\int} {{\int}_{ - \infty }^\infty {U\left( {x,y,z_{\mathrm{F}}} \right)^2{\mathrm{d}}x{\mathrm{d}}y} } }} \\ = \frac{{4w_0^4\pi ^2\left( {4\left( {z - z_{\mathrm{F}}} \right)^2\lambda ^2 + \pi ^2w_0^4} \right)}}{{\left( {4\left( {z - z_{\mathrm{F}}} \right)^2\lambda ^2 + 2\pi ^2w_0^4} \right)^2}}$$
(2)

where U(x,y,z) represents a Gaussian beam with e–2 width w0. The focal point, zF, is placed 50 µm below the top surface of C1 to emulate the experimental conditions. The exact parameters and options input to the MCX software as well as the files specifying the spatial domain and the overlap integrals are freely available from the authors upon request.