Introduction

Abnormalities in intraluminal epithelium can herald the onset of serious diseases such as cervical cancer1,2, bladder cancer3, esophageal cancer4, lung cancer (especially bronchial carcinoma)5, and pancreatic cancer6. The importance of early detection and intervention strategies based on imaging membrane microstructures, such as the mucosa, is broadly acknowledged in the literature7. Optical coherence tomography (OCT) generates high-resolution images with an axial resolution of ~1–15 µm8,9. Its penetration depth is tissue-dependent, spanning 0.1–2.0 mm, which is shallower than that afforded by ultrasound10. Other methods, like confocal endoscopy, offer higher resolution (0.5–1 µm) but limited imaging depth (up to 250 µm)11. By striking a balance between imaging resolution and depth, OCT emerges as a promising modality for intraluminal imaging. It has also been validated for accurately diagnosing coronary artery disease and thrombus formation12,13 (see

However, the complex structure of internal lumens14 and their dynamic nature15, as well as the mechanical rotational resistance encountered during imaging, impose challenges to conventional OCT endoscopes operating in proximal scanning mode12. This mode transmits torque from a proximal motor via a torsion spring. It is not applicable for deployment in certain in vivo settings12, such as the nearly perpendicular intersection between the pancreatic duct entrance and the duodenum16, and the large subcarinal angles of the anterior segment of the left main bronchi17. The sharp turns introduce uneven pressure along the torsion spring, hindering smooth torque transmission. As a consequence, nonuniform distal rotation and imaging distortions occur. To overcome these obstacles, the miniature permanent magnet synchronous motor (PMSM) presents a promising alternative for directly driving the distal end of the probe18,19. However, the use of a PMSM inevitably generates local heat, which may trigger thermal injuries and lead to thrombus formation and blood–brain barrier disruptions20,21,22,23. In addition, voltage leakage is unacceptable because it can induce arrhythmias, respiratory arrest, seizures, or even direct brain damage and paralysis24,25,26. Electrical shock injuries may stimulate a rise in thrombin levels, thereby leading to vasoconstriction and thrombus formation27,28,29. These potential risks have precluded the use of PMSM-based endomicroscopy in thermoelectric-sensitive scenarios.

Previous studies have demonstrated the application of an external magnetic field to a magnetized object, enabling both rotational and linear movements30. The telerobotic mono-magnet actuator has recently demonstrated potential in tasks such as guidewire navigation31 and stent deployment32, owing to its operational flexibility, ease of implementation, and compact design. Moreover, its ability to continuously achieve a 0–30 Hz magnetic field working frequency without significant heat effect and to avoid substantial interference with ferromagnetic surgical instruments enhances its safety and clinical potential33,34,35 (see Supplementary Note 1). This has led to the replacement of internal electromagnetic motors in PMSMs with external magnetic actuators, circumventing the risks of local heat and voltage leakage36,37. However, the exploration of mono-magnet-actuated robotic systems for OCT imaging in complex and curved environments is still limited. In particular, it remains poorly understood how imaging quality is affected by a 6-degree-of-freedom (DoF) mono-magnet, which allows translation along the x, y, and z axes and rotation around three axes (see Fig. 1a), operating in a nonlinear magnetic field (0–10 mT/mm), and by the difficulties of designing and fabricating a millimeter-scale probe. Furthermore, simultaneous remote optical scanning and steering requires further research.

Fig. 1: Overview of the MMAT-OCT system and its capacities.

a A 6-degree-of-freedom (6-DoF) robotic arm terminated with a magnetic actuator for probe navigation and OCT imaging in in vivo scenarios. The black box represents the MMAT-OCT probe. BEPM indicates the magnetic field generated by the actuator. b Schematic of the ultrahigh-resolution 800-nm SD-OCT endoscopic system. AC achromatic collimator, PP prism pair, M mirror, PC polarization controller, CLTS computerized linear translation stage. c Assembly design (I) and photography (II) of the MMAT-OCT probe. d Schematic illustration of steering MMAT-OCT in a phantom model. Fmag denotes the magnetic force, Tmag denotes the magnetic torque. e The angular scanning MMAT-OCT with a pullback CLTS, enabling programmable imaging of the targeted tissue. \(r\) denotes the inner diameter of the cavity; \(\theta\) is the scanning angle; and \({{{\mathcal{l}}}}\) represents the laser’s path during the angular scanning. The cross-sections display the representative angular scans of the mouse colon tissue. a, b Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

Here, we present the mono-magnet actuated OCT (MMAT-OCT), a telerobotic-OCT endoscope that incorporates a rotatable diametrically magnetized cylinder permanent magnet (RDPM) controlled by the mono-magnet actuator, enabling high-resolution navigational imaging in complex lumens (see Fig. 1a, b). The externally driven magnetic field eliminates heat effects induced by current and voltage, achieving a maximum generated voltage of less than 0.02 mV (estimated current of less than 0.016 mA as opposed to 0.7–3.8 A in PMSMs19,38), and a temperature rise of less than 0.5 °C (compared to 2–8 °C in PMSMs19,38) after an hour of continuous operation (see Supplementary Note 2). The proposed MMAT-OCT probe is produced utilizing a simple and economical laboratory technique, with a total cost not exceeding US$ 14 (as opposed to US$ 1000–2000 in commercial PMSMs, see Fig. 1c and Supplementary Table II)38,39. The small probe size makes closed-loop feedback control unfeasible, and open-loop sampling introduces nonuniform rotational distortion (NURD); to address this, a neural network trained with synthetic data is employed to identify and correct distorted regions in OCT images (see Figs. 2 and 3). The 3D imaging performance of MMAT-OCT has been further validated on in vivo mouse colon and rat esophagus (see Fig. 3c, d). The mono-magnet actuated RDPM inside the distal end of the probe allows for maximum steering of 110° at a working distance of 30 mm (magnetic field strength ~500 mT, gradient ~0.06 mT/mm). We demonstrate the guidewire-free navigation of MMAT-OCT under extreme bending conditions using an ex vivo vascular phantom (see Figs. 1d and 4). This feature improves the utility of MMAT-OCT in thermoelectrically sensitive cavities by addressing issues such as guidewire shadows that obstruct tissue or stent visualization. It also prevents OCT endoscope entrapment and tissue damage during guidewire withdrawals in distal, calcified, or curved vessels12. The mono-magnet actuator reliably controls the RDPM’s orientation with an average open-loop angle control error of ~4°, facilitating programmable 3D imaging (see Figs. 1e and 5). These advancements have the potential to greatly enhance microscopic mucosal imaging technologies, opening up possibilities for targeted imaging, diagnosis, and therapy40.

Fig. 2: Quantitative evaluation of the MMAT-OCT’s scanning performance.

a Operational configuration of the EPM and MMAT-OCT probe within the robotic system (I), inclusive of RDPM force analysis (II), and the simulated and theoretical impact of relative positioning (i.e., working distance) on the magnetic forces (III1 and IV1) and torques (III2 and IV2) exerted on the RDPM (FEM refers to the finite element method; MDM refers to the magnetic dipole model). Plane abcd serves as the central radial section of the EPM for reduced 2D dynamics analysis of the RDPM. \({{{\mathcal{L}}}}\) indicates the working distance. The forces and torques analyzed include the magnetic force \({{{{\bf{F}}}}}_{{{{\rm{m}}}}}\), gravity \({{{\bf{G}}}}\), normal force \({{{{\bf{F}}}}}_{{{{\rm{n}}}}}\), and Coulomb friction \({{{{\bf{F}}}}}_{{{{\rm{f}}}}}\) from the enclosure, along with magnetic torque \({{{{\bf{T}}}}}_{{{{\rm{m}}}}}\) and frictional torque \({{{{\bf{T}}}}}_{{{{\rm{f}}}}}\); n represents the RDPM’s magnetic moment, and \(\alpha\) indicates the contact point position between the RDPM and the enclosure. b Comparison of simulated and example measured rotation dynamics (n = 3) under varying \({{{\mathcal{L}}}}\) ranging from 45 to 65 mm, based on three independent measurements, along with the corresponding box plots from 100 samples, consistent with the measured data. Mean friction coefficients in simulated conditions are: \(\bar{\mu }\)c1 = 0.12; \(\bar{\mu }\)c2 = 0.325; \(\bar{\mu }\)c3 = 0.53. Cubic spline interpolation is used to approximate the rotation dynamics in the left figure. c Comparison of OCT images at sampling frequencies of 20 Hz, 30 Hz, and 40 Hz for a rated rotation frequency of 10 Hz. Blue arrows indicate missing A-line regions, while red arrows indicate strut locations. In all box plots, the center line represents the median. The red hollow square in the middle indicates the mean. The bounds of the box indicate the 25th (Q1) and 75th (Q3) percentiles. The whiskers extend to the smallest and largest values within 1.5 times the interquartile range (IQR) from the bounds of the box (Q1 − 1.5IQR, Q3 + 1.5IQR). Data points outside this range are considered outliers and are plotted individually. Source data are provided as a Source Data file.

Fig. 3: NDNC-Net modeling RDPM rotation dynamics for correcting OCT NURD.

a Synthetic data generation involves distortion-free B-frame resampling with theoretical and measured rotation dynamics. The integrated area (e.g., s1 and s’1, unit: rad) of rotational speed (unit: rad/s) over the sampling interval (i.e., \({\lambda }_{1}\) and \({\lambda {\prime} }_{1}\), unit: second) represents the displacement of the acquired A-line. The matched index pairs (e.g., 1 and 1’) describe the same column pixel. The intervals between adjacent A-lines in resampled images either stretch or shrink compared to the distortion-free B-frame due to nonuniform rotation dynamics during resampling. Conversely, NURD correction is the inverse operation of synthetic data generation. The estimated rotation dynamics are derived from NCNet. b Inference phase: the strut detection network and NDNet detect strut regions and stretching regions, respectively. After preprocessing steps, including cropping and padding, NCNet estimates the rotation dynamics. The unit “a.u.” stands for arbitrary units. The final corrected image is obtained by resizing and concatenating the stretch-corrected regions, stretch-free regions, and strut regions. c, d Test of NDNC-Net on mouse colon (n = 4 biological replicates) and rat esophagus (n = 4 biological replicates) images, demonstrating high-resolution tissue microstructures correlating well with histological micrographs. Scale bars are 250 µm.

Fig. 4: Design and evaluation of the steerable MMAT-OCT for luminal navigation.

a Design of the Flexible–Soft–Rigid segmented gradient stiffness distribution of the MMAT-OCT. b Analytical and experimental results (n = 3) of the maximum bending angle in the orthographic (OPA, I and II) and lateral (LPA, III) projection areas. A dashed line indicates the fit. Data are presented as mean values +/− SD. c MMAT-OCT with a 10° pre-bend (a) tested on a transparent, rigid vascular phantom (I). The gray arrows indicate the feed or withdrawal direction of the MMAT-OCT. The blue arrows represent the resistance applied by the phantom on the probe. The red arrows denote the steerable force/torque. Manual advancement without rotation. Path 1 (a1–e1) shows navigation challenges overcome by EPM-induced 20° deflection at 00.24 s. Successful lumen retraction between 00.44 s and 00.46 s. At 00.53 s, the probe traps at c1, likely due to frictional self-locking. (II). Effective navigation of a longer, curved lumen (a2–g2) demonstrates the Flexible–Soft–Rigid design’s efficiency (III). The probe’s position is indicated by a red square marker. Source data are provided as a Source Data file.

Fig. 5: Programmable imaging of the MMAT-OCT facilitated by angle control.

a (I) The input magnetic field in the PMSM’s coordinate system {\({{{\mathcal{N}}}}\)} is unobservable due to catheter deformation, making positional information alone in the global coordinate system {\({{{\mathcal{M}}}}\)} insufficient for precise angular control. (II) The state of the EPM’s fixed coordinate system {\({{{\mathscr{Q}}}}\)} is observable. With the known origin of probe’s coordinate system {\({{{\mathcal{P}}}}\)}, effective angular control in {\({{{\mathcal{M}}}}\)} is feasible. b Raw B-frames of scanning angle of 90° (I) and 360° (II), with red boxes signifying similar tissue areas. (III) The effective scanning area scales from -15% to 15%, using the circular scan image as a template for feature matching. The full-circle scanning duty cycle is set to 1 by default. The unit “a.u.” stands for arbitrary units. c Angular errors (n = 5) increase with the predefined scanning angle. Data are presented as mean values +/− SD. d Polar plots of post-processed OCT angular scanning. The red arrows indicate the epidermis; the blue arrows indicate the dermis, and the blue asterisks indicate the air gaps. e 3D programmable imaging of the capital “CUHK” on mouse colon tissue. (I) Targeted 3D scanning region defined as the capital letters “CUHK”. (II) 3D reconstruction via back-projection to reflect the pattern’s morphology. (III) Patterned scanning results (top view) on mouse colon tissue. Scale bars are 250 µm. (IV) Representative B-scan images at 200 µm intervals. Source data are provided as a Source Data file. a Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

Results

Design of MMAT-OCT endoscopic system

The endoscopic system consists of an 800-nm spectral-domain OCT (SD-OCT) engine (see Fig. 1b and “Methods”) and an integrated MMAT-OCT probe (see Fig. 1c). The MMAT-OCT probe comprises two key parts: the optical assembly and the rotational assembly (see Supplementary Fig. S1). The optical assembly utilizes a gradient-index (GRIN) lens-based approach, common in OCT endoscopes41,42. Traditional designs use BK7 lenses as spacers, increasing assembly complexity and insertion loss due to additional reflective surfaces. We have replaced the BK7 lens with UV glue as a lower-dispersion spacer material. This change enabled several optimizations. First, we optimized the UV glue-based spacer length (the distance between the single-mode fiber end facet and the GRIN lens) to achieve a high lateral resolution of ~7.6 µm. Second, we set the reflector’s angle to 54°, resulting in a back-reflection level of less than −60 dB. The use of UV glue also reduced the chromatic shift to 74 µm, compared to 133 µm with BK7 lenses, and achieved an optimal axial resolution of 2.4 µm. In addition, using UV glue reduces the number of reflective surfaces in the optical path, thus lowering insertion loss and simplifying the assembly process compared to BK7 lenses. The rotational assembly is made of a reflector affixed to a cylindrical magnet (or rotor component, see “Methods”). The reflector redirects the focused laser beam onto the adjacent tissue.

The distal enclosure of the probe is designed with a 1.3-mm outer diameter and 1-mm inner diameter, matching the dimensions of the GRIN lens and reflector integrated within. To optimize the functionality of the probe, we opted for nickel-titanium as the enclosure material instead of the commonly used stainless steel 316. This choice was made due to nickel-titanium’s weak magnetism, which allows for better interaction with the RDPM and reduces rotational damping. To fabricate the enclosure, laser micromachining was employed to create three struts occupying a 12° area. In addition, a 5° chamfer was added at the root of each strut to prevent breakage under significant external loads (see Supplementary Fig. S1). The length of the struts, 1.7 mm, was carefully chosen to ensure adequate assembly tolerance and unobstructed laser beam transmission (see Fig. 1cI and Supplementary Movie S1). It is important to note that the current MMAT-OCT probe can be further downsized to accommodate smaller lumens (see “Discussion”).

The length of the probe’s distal rigid part poses challenges in cavity navigation and should be minimized43,44. To minimize the rigid length, we designed a hemisphere-shaped stopper using a 0.05-mm thick 304 stainless steel sheet (see Supplementary Fig. S2). The stopper’s weak magnetism guarantees the magnet’s stability even under vertical gravity, while the rotational friction between the stopper and RDPM is negligible. The length of rotor consisting of RDPM and stopper is about 1.28 mm or 2.28 mm (see Fig. 1cI, II). This size (1.28 mm) is ~40% smaller than commercial miniature PMSMs (see Supplementary Fig. S1). The air gap (\({l}_{2}\)) between the GRIN lens and the reflector is correlated with the working distance (see Supplementary Fig. S1 and Supplementary Note 3). The probe fabrication involves five steps (see “Methods”) and costs about US$ 14 (see Supplementary Table II). This represents a significant cost reduction of more than 95% compared to commercial PMSMs, making it an attractive option for cost-effective, single-use applications in clinical settings. To facilitate 3D imaging of luminal organs, a computerized linear translation stage (CLTS, Fig. 1b) is employed to perform the pullback motion of the MMAT-OCT probe, while it conducts circumferential scanning within the luminal organ.

Characterization of MMAT-OCT

Considering safety and spatial constraints (see Supplementary Note 2), we propose utilizing a mono-magnet-based external magnetic actuator to remotely drive the RDPM in the probe (see Fig. 2a). A 6-DoF robotic arm was used to carry a servo motor and an external diametrically magnetized permanent magnet (EPM, 50 mm in length and diameter, see “Methods”). \({{{\mathcal{L}}}}\) is defined as the working distance between the EPM and the RDPM (see Fig. 2aI, II). However, this configuration introduces a nonlinear driving magnetic field, necessitating a characterization using the finite element method (FEM) and a simplified magnetic dipole model (MDM, see Supplementary Note 4 and “Methods”). For instance, when \({{{\mathcal{L}}}}\) is 55 mm, we note torque fluctuations within ±0.125 mN·m (see Fig. 2aIII2) and force fluctuations within ±3.8 mN (see Fig. 2aIII1). This nonlinearity correlates with \({{{\mathcal{L}}}}\), as torque and force decrease cubically and quartically with distance, respectively (see Fig. 2aIV)30. Our findings indicate an operational range for \({{{\mathcal{L}}}}\) of 45–65 mm, beyond which the magnetic field strength is inadequate for stable torque, causing unstable rotation of the RDPM (1 mm in length and diameter). A coplanar arrangement of EPM and RDPM was assumed (see Fig. 2aII), with non-coplanar effects discussed later.
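The cubic and quartic decay quoted above can be checked with a minimal point-dipole sketch. This is not the FEM or MDM implementation used in the study; the dipole moments, the geometry (EPM moment perpendicular to the line joining the magnets, slightly misaligned with the RDPM moment), and all function names are illustrative assumptions.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, T*m/A


def dipole_field(m, r):
    """Flux density (T) of a point dipole with moment m (A*m^2) at displacement r (m)."""
    d = math.sqrt(sum(c * c for c in r))
    rhat = [c / d for c in r]
    m_dot_rhat = sum(a * b for a, b in zip(m, rhat))
    k = MU0 / (4 * math.pi * d ** 3)
    return [k * (3 * m_dot_rhat * rh - mc) for rh, mc in zip(rhat, m)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def torque(m_epm, m_rdpm, r):
    """Torque on the small magnet: T = m_rdpm x B."""
    return cross(m_rdpm, dipole_field(m_epm, r))


def force(m_epm, m_rdpm, r, h=1e-6):
    """Force on the small magnet: F = grad(m_rdpm . B), by central differences."""
    f = []
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        up = sum(a * b for a, b in zip(m_rdpm, dipole_field(m_epm, rp)))
        um = sum(a * b for a, b in zip(m_rdpm, dipole_field(m_epm, rm)))
        f.append((up - um) / (2 * h))
    return f


# Hypothetical moments (not measured values from the study).
m_epm = [0.0, 0.0, 100.0]   # large external magnet
m_rdpm = [1e-3, 0.0, 0.0]   # small rotor magnet
near, far = [0.045, 0.0, 0.0], [0.090, 0.0, 0.0]
torque_ratio = norm(torque(m_epm, m_rdpm, near)) / norm(torque(m_epm, m_rdpm, far))
force_ratio = norm(force(m_epm, m_rdpm, near)) / norm(force(m_epm, m_rdpm, far))
print(torque_ratio, force_ratio)
```

Doubling the separation in this idealized model reduces the torque by a factor close to 8 and the force by a factor close to 16, which is the cubic and quartic behavior referred to in the text. A finite-size magnet at short range deviates from this, which is one reason an FEM characterization is still needed.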

Furthermore, fluctuations in rotational speed are influenced not only by the magnetic field but also by the nonuniform distribution of the friction coefficient (FC) between the enclosure and RDPM (see Supplementary Note 5 and Supplementary Figs. S3–S6). Therefore, the rotation dynamics of the RDPM are influenced by a complex interplay of gravity, enclosure support, friction, and the magnetic field (see Fig. 2aII). These forces, along with the driving torque, exhibit nonlinearity and time dependence, resulting in an unpredictable rotational speed of the RDPM (see Supplementary Figs. S7–S10). For instance, while the rotation frequency oscillates around 10 Hz (see Supplementary Figs. S11–S13), OCT imaging reveals unexpected rotation frequencies exceeding 25 Hz in certain regions (see Fig. 2b). This discrepancy can be attributed to the reduced magnetic moment along the 1-mm long RDPM and the impact of the reflector on the rotational inertia. To minimize the nonuniform distribution of the FC, a smooth surface roughness of ~322 nm is achieved by coating 100 nm of silver and 200 nm of silicon dioxide (see Supplementary Fig. S5I2, “Methods”). Furthermore, speed fluctuations are reduced within the range of \({{{\mathcal{L}}}}\) from 55 to 60 mm due to a decrease in the variation of the magnetic force (see Fig. 2aIV1).

We further develop an analytical rotation dynamic model that assumes the RDPM maintains pseudo-equilibrium through inertial forces (see Supplementary Note 6 and Supplementary Movie S2). Experimental data captured using a high-speed industrial camera (800 frames/s, see Supplementary Movie S3) confirm the validity of this model. Consequently, by assuming random FCs, a multitude of complex and random rotation dynamics curves can be generated. These curves can be utilized to generate synthetic data for training neural networks aimed at correcting distorted images. The rotational speed fluctuations observed in the RDPM are applicable to both PMSMs and EPM (see Supplementary Fig. S14). In EPM, these fluctuations contribute to errors that cause image distortion. These errors can be categorized into two types: high-frequency errors (see Supplementary Note 7), primarily caused by nonuniform friction between the rotor and enclosure, resulting from dynamic changes in FC and squeezing stress; and low-frequency errors (see Supplementary Note 7), primarily due to variations in the input torque to the rotor, leading to low-frequency components of angular speed variation. Inspired by the Nyquist sampling theorem, a rotation frequency (\({{{{\rm{F}}}}}_{{{{\rm{r}}}}}\)) of 10 Hz can be paired with a sampling frequency (\({{{{\rm{F}}}}}_{{{{\rm{s}}}}}\)) of 20 Hz to mitigate image distortions caused by high-frequency errors. As the \({{{{\rm{F}}}}}_{{{{\rm{s}}}}}\) / \({{{{\rm{F}}}}}_{{{{\rm{r}}}}}\) ratio increases, OCT images of mouse rectal mucosa exhibit reduced shrinkage (see Fig. 2c). In this study, all OCT image acquisitions followed this strategy, typically employing an \({{{{\rm{F}}}}}_{{{{\rm{s}}}}}\) / \({{{{\rm{F}}}}}_{{{{\rm{r}}}}}\) ratio of 2.

Deep-learning method for distortion correction

The nonuniform rotational speed of MMAT-OCT induces rotational distortions or NURD, which manifests as stretching and shrinking distortions in OCT images. Given the comparable lumen sizes (2–3 mm) of the luminal organs under study, namely the rat esophagus and the mouse colon, and the 1.8-mm outer diameter of MMAT-OCT, rotational distortions are more pronounced compared to the common distortions caused by translational displacement of tissue relative to the endoscope and the resultant radial sampling nonuniformity. To accurately correct rotational distortions, we have developed a deep-learning approach called NDNC-Net. This approach integrates the rotation dynamics model of the RDPM and is trained using synthetic data (see Fig. 3).

Synthetic data generation involves distortion-free B-frame resampling with theoretical (as described in “Characterization of MMAT-OCT”) and measured (see Fig. 2b and Supplementary Fig. S10) rotation dynamics. The displacement of the acquired A-line is determined by the integral of the rotational speed (unit: rad/s), denoted as s1 and s’1, over the sampling interval (unit: second), represented by \({\lambda }_{1}\) and \({\lambda ^{\prime} }_{1}\). Specifically, this displacement is the actual positional difference between two points on the tissue; in rotational scanning, it is the angular difference. As shown in Fig. 3a, the matched index pairs (e.g., 1 and 1’) describe the same column pixel. However, due to nonuniform rotation dynamics during resampling, the intervals between adjacent A-lines in resampled images either stretch or shrink compared to the distortion-free B-frame (see Supplementary Fig. S15). Conversely, NURD correction is the inverse operation of synthetic data generation. The estimated rotation dynamics are derived from NCNet. The efficacy of this approach is evaluated by the error between the estimated rotation dynamics and the preset ground truth within the synthetic dataset (see Supplementary Fig. S16 and “Methods”). This inverse mapping allows for the correction of NURD regions, resulting in corrected OCT images (see Supplementary Fig. S17).
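The forward (distortion) and inverse (correction) resampling described above can be illustrated with a minimal sketch. It is not the NDNC-Net pipeline: the rotation dynamics are supplied directly instead of being estimated by NCNet, nearest-neighbour indexing stands in for whatever interpolation the authors use, and the function names are hypothetical.

```python
import math


def cumulative_angle(speeds, dt):
    """Integrate rotational speed (rad/s) over uniform sampling intervals dt (s).

    speeds[i] is the speed during the interval between A-line i and A-line i + 1.
    Returns the angle (rad) at which each A-line is acquired.
    """
    angles = [0.0]
    for w in speeds[:-1]:
        angles.append(angles[-1] + w * dt)
    return angles


def distort(frame, speeds, dt):
    """Resample a distortion-free frame (A-lines spanning 2*pi) at the angles actually visited."""
    n = len(frame)
    out = []
    for a in cumulative_angle(speeds, dt):
        idx = int(round(a / (2 * math.pi) * n)) % n
        out.append(frame[idx])
    return out


def correct(distorted, speeds, dt, n_out):
    """Inverse operation: put each acquired A-line back at its angle, nearest-neighbour fill."""
    targets = [a / (2 * math.pi) * n_out for a in cumulative_angle(speeds, dt)]
    out = []
    for k in range(n_out):
        j = min(range(len(targets)), key=lambda i: abs(targets[i] - k))
        out.append(distorted[j])
    return out


# Toy frame: eight A-lines, each represented by a single label instead of a depth profile.
frame = list(range(8))
dt = 1e-3
nominal = 2 * math.pi / (len(frame) * dt)                  # speed giving one A-line per column
speeds = [nominal * m for m in (1, 1, 2, 2, 1, 1)]         # rotor runs fast in the middle
distorted = distort(frame, speeds, dt)                     # columns 3 and 5 are skipped (shrinking)
restored = correct(distorted, speeds, dt, len(frame))
print(distorted, restored)
```

When the rotor runs faster than nominal, columns are skipped and the tissue appears shrunk; when it runs slower, columns repeat and the tissue appears stretched. Correction can only restore positions that were actually sampled, so skipped columns are filled from neighbours in this sketch.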

NURD correction involves a two-stage process of detection and correction. For the detection task, we devised a strut detection network to identify struts in raw B-frames and crop the frames into three strut-free subsections, named tissue regions (see Fig. 3b). Subsequently, a NURD Detection Network (NDNet) is implemented to localize stretching regions within the tissue regions, demarcating these regions with bounding boxes. Stretching regions represent high-frequency errors, while low-frequency errors are indicated by struts (detailed in Supplementary Note 7). For the correction task, we developed a NURD Correction Network (NCNet) to estimate the rotation dynamics between adjacent A-lines. During the inference phase, the strut detection network delineates bounding boxes for strut regions (blue, see Fig. 3c) and crops the B-frame into tissue regions. NDNet is then used to detect stretching distortions (red, see Fig. 3c) within tissue regions, followed by NCNet’s estimation of rotation dynamics to obtain corrected OCT images. Finally, strut regions are used to resize the corrected OCT images, producing undistorted images (see Supplementary Fig. S18).

To evaluate the imaging and distortion correction capabilities, we conducted in vivo investigations on mouse colon and rat esophagus tissues (see Supplementary Movie S4). Using MMAT-OCT, we captured and processed full circumferential scans in real-time at a rate of 20–40 frames per second, with each frame comprising 2048 ×  8196 pixels (axial × circumferential) as illustrated in Fig. 3c, d. The alignment of struts and scaling of A-lines were utilized to correct each B-frame prior to the 3D rendering of OCT volume. Representative OCT images, along with their fourfold magnified views, exhibit intricate tissue microstructures. The OCT images of the mouse colon depict distinct layers, including the upper colonic mucosa (CM), mucosal muscle layer (MM), submucosa (SM), internal muscle layer (MI), and external muscle layer (ME). Similarly, in the rat esophagus, the multi-layered structure is resolved, highlighting the keratinized stratified squamous epithelium (EP), lamina propria (LP), muscularis mucosae (MM), submucosa (SM), and muscularis propria (MP). These OCT images correspond well with hematoxylin and eosin (H&E)-stained histological micrographs, confirming the system’s potential for high-resolution, label-free imaging. Furthermore, in-situ OCT imaging of the porcine pancreatic duct was conducted to validate its deployment and application within the complex lumens of medium-sized mammals (see Supplementary Fig. S19).

Steerability of MMAT-OCT

Axially magnetized guidewires have been well-studied for intracorporeal navigation31,45. However, using a diametrically magnetized rotor, i.e., RDPM, for steering remains relatively underexplored30. The effect of the EPM on the RDPM can be evaluated within its working spaces: the orthographic (OPA) and lateral (LPA) projection areas (see Supplementary Fig. S20). The preferred working distance, ɦ, ranges from 30 to 60 mm31,32, contingent on the catheter’s stiffness and length31. To enhance the MMAT-OCT’s navigational performance and extend effective ɦ, we implemented a Flexible–Soft–Rigid segmented stiffness distribution (see Fig. 4a). This design includes two stiffness levels in the flexible section: ~200 GPa for the catheter body and ~140 GPa for the optical components. The 8-mm soft segment, with a Young’s modulus of 16–79 GPa, consists of the optical fiber and coating. The 7-mm rigid segment features a nickel-titanium enclosure for protecting the optical components. We evaluated the MMAT-OCT’s steering capabilities using an EPM-bending angle mapping model based on the potential energy optimization method (see Supplementary Note 8 and Supplementary Fig. S21)46. The distal shape is described by an iso-curvature model47. In the OPA configuration, EPM I can achieve a maximum turning angle of 110° at 30 mm and 5° at 60 mm (see Fig. 4bI, II). The LPA configuration showed a faster attenuation rate with increasing ɦ (see Fig. 4bIII). Experimental results indicated smaller deflection angles than theoretical estimates, decreasing from 40° to 15° as ɦ ranged from 35 to 50 mm, due to model errors, misalignment, and initial catheter state46. For larger bending angles, increasing the EPM volume is feasible. EPM II, with a diameter and height of 100 mm, has been validated for cerebral vessel navigation31. Simulations show a maximum deflection angle of ~123° within the 30–60 mm range (see Fig. 4bII), slightly decreasing in the LPA configuration from ~120° to ~106° (see Fig. 4bIII).
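As a geometric illustration of the iso-curvature description of the distal shape, the sketch below computes the tip position of a segment that bends into a circular arc. It covers only the kinematics, not the potential energy optimization that maps EPM pose to bending angle; applying it to the 8-mm soft segment alone, and the function name, are assumptions made here for illustration.

```python
import math


def tip_position(theta_deg, length_mm):
    """Tip of a constant-curvature arc that starts at the origin pointing along +z.

    theta_deg is the total bending angle; length_mm is the arc length.
    Returns (lateral offset x, axial position z) in mm.
    """
    theta = math.radians(theta_deg)
    if abs(theta) < 1e-9:
        return 0.0, length_mm          # straight segment
    radius = length_mm / theta         # iso-curvature: curvature = theta / length
    return radius * (1.0 - math.cos(theta)), radius * math.sin(theta)


# Bending angles taken from the text; the 8-mm arc length is the soft-segment length.
for angle in (5, 40, 110):
    x, z = tip_position(angle, 8.0)
    print(f"{angle:>3} deg: lateral {x:.2f} mm, axial {z:.2f} mm")
```

In this model the bending radius is the arc length divided by the bending angle in radians, so larger deflections over a fixed segment length imply a tighter turning radius.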

We further validated the MMAT-OCT’s navigational abilities using a transparent, rigid vascular phantom (30 mm thickness, 2–5 mm apertures) (see Fig. 4cI and Supplementary Movie S5). The MMAT-OCT’s flexible section has a pre-bend of 10° (see Fig. 4a) and was manually advanced without rotational maneuvers (see Fig. 4cII). In path 1 (a1–e1), an obstacle at b1 required an active deflection of ~5°. The endoscope’s ~7 mm rigid length risked entrapment at d1, and the structure at c1 limited pre-deflection. At 00.24 s, the EPM enabled a 20° active deflection, reaching e1. Smooth retraction within the lumen was demonstrated between 00.44 s and 00.46 s. However, at 00.53 s, the probe became entrapped at c1, likely due to frictional self-locking. We optimized segment lengths using Abaqus CAE, revealing that large bending angles reduce the minimum turning radius, necessitating adjustments for MMAT-OCT compatibility (see Supplementary Fig. S22).

We also tested a longer, continuously curved lumen (a2–g2) (see Fig. 4cIII and Supplementary Movie S5), requiring bending angles of 79° at b2 and 72° at f2. The combined use of the EPM and manual advancement allowed the MMAT-OCT to navigate the path smoothly, demonstrating its effective navigational abilities. This confirmed our Flexible–Soft–Rigid gradient stiffness design’s capability to transmit force and enable active deflection. Here we chose an axially magnetized EPM for its superior gradient distribution, enhancing navigation at equivalent distances. Moreover, in complex curved areas, the MMAT-OCT’s distal imaging window could be magnetically manipulated to attach to the target, ensuring optimal beam focus.

Programmable imaging using angle control

The MMAT-OCT angle control relies on the EPM’s ability to generate a magnetic field in any desired direction at the RDPM’s location (see Supplementary Fig. S23a). Ideally, a near-linear relationship between the magnetic field direction and EPM angle is observed when \({{{\mathcal{L}}}}\) is 45–100 mm (see Supplementary Fig. S23b). PMSMs have been used for B-scan angular imaging but are limited to in vitro settings or larger structures (diameter: 12 mm)38,48 due to catheter deformation, which obscures the magnetic field’s input signal coordinate system {\({{{\mathcal{N}}}}\)}. This makes global coordinate system {\({{{\mathcal{M}}}}\)} positional information insufficient for precise angular control without additional orientation feedback. In contrast, the EPM’s fixed coordinate system {\({{{\mathcal{Q}}}}\)} state is easily obtained, and with the OCT probe’s coordinate system {\({{{\mathcal{P}}}}\)} origin known, effective angular control in {\({{{\mathcal{M}}}}\)} is feasible if the probe’s axial direction is visible (see Fig. 5a).

Magnetic field attenuation due to \({{{\mathcal{L}}}}\) and friction between the RDPM and the enclosure complicate alignment. A direct mapping between the EPM and RDPM angles (see Supplementary Fig. S24 and Supplementary Note 9) addresses this. Numerical noise from friction coefficient variations (0.12–0.53) can be mitigated by filtering and approximating control sequences49. As \({{{\mathcal{L}}}}\) exceeds 65 mm, the gradient approaches zero, making control impractical (see Supplementary Fig. S24). Unstable states near 0° should also be avoided. This study focuses on angular imaging within 180°, operating within (−180°–0°) and (0°–180°) ranges, preferably around −90° or 90°.

To validate the angle control precision, we performed programmable B-scan imaging of human fingers50,51. The probe, positioned between the thumb and index finger, operated at four angles (90°, 120°, 180°, 360°) (see Fig. 5b, d). Angle control precision involves restricting scanning boundaries during open-loop operation. Additionally, angular scanning introduces NURD; however, these NURD areas have consistent features that can be corrected through NDNC-Net (see Fig. 3c). Here, we use shared features in B-frames from different angular scans (e.g., similar tissue areas indicated by the red dotted box in Fig. 5bI, II) for registration to characterize angle control precision. Specifically, the effective scanning area within the B-scan undergoes scaling between −15% and 15% to facilitate feature matching (see Fig. 5bII, Supplementary Fig. S25, and Supplementary Note 8II). The peak values of the matching coefficients, which indicate the actual angular scanning boundary, correspond to image scaling ratios of ~−0.4%, ~2.2%, and ~4.7%, respectively (see Fig. 5bIII). The maximum angle errors, calculated from these peak values, suggest that a larger scanning range leads to a greater angle error, with an average angular error of ~4° (see Fig. 5d; versus about 6° in PMSMs38). The predetermined angle can therefore be treated as the scanning border of the B-scan during image processing (see Fig. 5d). Compared to the PMSM, the MMAT-OCT probe provides stepless angle adjustment with a comparable angle resolution38. By harnessing an external magnetic field for control, the rotor’s pose in {\({{{\mathcal{M}}}}\)} is secured, thus enhancing the safety margin for angle control and holding promise for clinical applications such as targeted laser ablation52,53 or micrometer-scale photodynamic regional treatment54, in addition to optimizing OCT data collection and storage55.

The angle control capabilities have been extended to facilitate programmable pattern scanning, superimposing angle scanning on a linear motion with 5 µm/frame fiber retraction. Figure 5eI defines the targeted 3D scanning region as the capital letters “CUHK”. This is followed by generating corresponding 2D orthographic contours and 2D path planning (see Supplementary Note 8III). Ultimately, 3D reconstruction is achieved via back-projection to elucidate the pattern’s morphology (see Fig. 5eII). The 3D surface is then converted into a set of direction-scanning angle pairs, which serve as the input for the stepper motor. Figure 5eIII shows the patterned scanning results (top view) on mouse colon tissue (see Supplementary Movie S6), aligning with the planned target region. Representative B-scan images at 200 µm intervals distinguish between “CU” and “HK”, exemplifying the EPM’s controllability within an absolute coordinate framework and further validating the MMAT-OCT’s superior angle control capabilities and potential for selective tissue mucosa scanning (see Fig. 5eIV). Notably, the shape of the letter K has been modified to maintain the continuous beam scanning of SD-OCT.
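The conversion from a planned pattern to per-frame angular scan windows can be sketched as follows. This is a minimal illustration and not the authors' path planner (Supplementary Note 8III): the function name `mask_to_angle_pairs`, the binary-mask representation, and the assumption that each mask row is one pullback frame and each column an equal angular bin are all hypothetical. Only the 5 µm/frame retraction step comes from the text.

```python
def mask_to_angle_pairs(mask, span_deg=180.0, step_um=5.0):
    """Convert a binary pattern into (pullback position in µm, start angle, end angle).

    mask     : list of rows; each row is one pullback frame (assumption),
               each column one equal-width angular bin (assumption).
    span_deg : angular range covered by one row of the mask.
    step_um  : fiber retraction per frame (5 µm/frame in the text).
    """
    pairs = []
    for row_idx, row in enumerate(mask):
        active = [j for j, value in enumerate(row) if value]
        if not active:
            continue  # nothing to scan in this frame
        bin_deg = span_deg / len(row)
        start = active[0] * bin_deg        # left edge of first active bin
        end = (active[-1] + 1) * bin_deg   # right edge of last active bin
        pairs.append((row_idx * step_um, start, end))
    return pairs


# Toy 3-frame pattern with 4 angular bins of 45° each.
toy_mask = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
pairs = mask_to_angle_pairs(toy_mask)
```

The sketch emits a single contiguous window per frame, so a row containing two separated segments would be scanned across the gap as well. Under this simplification, patterns with disjoint strokes in one frame would have to be adapted for an uninterrupted sweep.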

Discussion

The Food and Drug Administration (FDA) approval for 1300 nm endoscopic OCT for gastrointestinal and coronary artery imaging reinforces its status as a minimally invasive, low-risk imaging modality56. The remote control of MMAT-OCT is accomplished through the utilization of a mono-magnet actuator, enabling improved pathological examination of living tissues in small, intricate lumens (see Fig. 6 and Supplementary Note 2), as well as heat-sensitive structures such as blood vessels (see Fig. 7 and Supplementary Note 10).

Fig. 6: MMAT-OCT imaging performance in curved lumens.

a, b Experimental setup: MMAT-OCT navigated through four distinct lumens in a human lung phantom and an ex vivo rat trachea to assess imaging quality. Pathway I: Catheter route from trachea to the right lower lobe lateral basal segment, with a terminal curvature of 55°. Pathway II: Extends from left main bronchus to left upper lobe bronchus, showing continuous bending with a terminal angle of 101°. Pathway III: From right main bronchus to right upper lobe apical segment, with severe bending of 315°, causing catheter stress and wall pressure that induce NURD and axial resolution loss. Pathway IV: Starts at left main bronchus, ends at left upper lobe anterior segment; features 3D continuous bending and an interplanar angle of 62°, leading to rotational inhomogeneity and axial translation. c Quantitative analysis (n = 3) of the impact on imaging quality across the four conditions. The high-frequency errors (Supplementary Note 7) have a mean of 47.06% with a standard deviation of 4.12%, while the low-frequency errors (Supplementary Note 7) have a mean of 5.73% with a standard deviation of 1.27%. Data are presented as mean values +/− SD. Source data are provided as a Source Data file.

Fig. 7: Electrical and thermal properties of MMAT-OCT.

a Leakage current: MMAT-OCT complies with FDA’s safety limit (0.1 mA) for cardiac floating-type devices. b Voltage characteristics: The MMAT-OCT’s 304 stainless steel stopper exhibits a voltage amplitude <0.02 mV, measured at 200 mV and 50 Hz using an oscilloscope and signal amplification filter circuit. c Clinical relevance: Intraluminal temperatures exceeding 41 °C are associated with esophageal ulceration. d Temperature monitoring: MMAT-OCT probe temperature, monitored by a thermal camera over 45 min, showed minimal fluctuations within ±0.5 °C from ambient conditions. Source data are provided as a Source Data file. a, c Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

The imaging quality in MMAT-OCT is influenced by the distance between RDPM and EPM. However, it remains robust due to several factors. First, the system has a fault-tolerant working range of 45–65 mm (see Fig. 2c), which allows for maintaining quality without the need for positional feedback. Second, the experiments were conducted on small animals, such as rats and mice, with body size variations under 20 mm. This ensures that the imaging is not significantly affected by anatomical differences. In addition, the pullback distance is 2.5 mm in Fig. 3c, d, and 0.4 mm for each pattern in Fig. 5e, which further minimizes anatomical variations to below 5 mm. Furthermore, the OCT image serves as a sensor during angle scanning, allowing for pre-configured adjustments based on observed rotational angles (see Supplementary Fig. S26). This strategy effectively corrects for working distance variations, with an acceptable distance error of 3–5 mm.

Despite a suitable working distance, there are still risks of misalignment due to angular discrepancies (Θ) as shown in Fig. 8a. Achieving and maintaining optimal alignment (Θ = 0°) without probe pose feedback is challenging. To evaluate the imaging quality of MMAT-OCT under non-ideal conditions, we conducted experiments and found that it remains robust up to a 40° angular deviation (see Supplementary Fig. S27a). Our findings indicate a linear relationship between the low-frequency error and Θ (see Supplementary Fig. S27b), suggesting the potential for implementing closed-loop feedback using distorted OCT image features (see Supplementary Fig. S27c). To explore this further, we designed an experiment that utilized OCT image feedback for servo control, with actions executed through teleoperation (see Fig. 8b). Post-processed OCT images with NDNC-Net were used to calculate the low-frequency error. The optimization of Θ involves two operations: negative action (NA), which deviates from the optimal configuration, and positive action (PA), which approaches the optimal configuration. An increase in the low-frequency error curve indicates NA, while a decrease indicates PA. The changes in the system configuration caused by NA (red arrows) and PA (blue arrows), specifically the 2D pose correction of the EPM, are visualized in Fig. 8b, where Θ is adjusted from the initial state of 30° to the final state of 5°.
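The NA/PA labeling described above reduces to the sign of the change in the low-frequency error between consecutive teleoperation steps. The following is a minimal sketch of that rule only: the function name, the tolerance parameter, and the extra "hold" label for unchanged error are assumptions added for illustration and are not part of the reported experiment.

```python
def classify_actions(errors, tol=0.0):
    """Label each step as NA, PA, or hold from a low-frequency error sequence.

    errors : low-frequency error after each EPM pose adjustment.
    tol    : hypothetical dead band; changes within +/- tol count as "hold".
    """
    labels = []
    for previous, current in zip(errors, errors[1:]):
        delta = current - previous
        if delta > tol:
            labels.append("NA")    # error rose: moved away from optimal alignment
        elif delta < -tol:
            labels.append("PA")    # error fell: moved toward optimal alignment
        else:
            labels.append("hold")  # no measurable change (assumed label)
    return labels


# Illustrative error trace (values are made up, not measured data).
labels = classify_actions([5.0, 6.0, 4.0, 4.0])
```

An operator or controller following this rule would reverse the last EPM adjustment after an NA and continue in the same direction after a PA.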

Fig. 8: OCT images as feedback control in MMAT-OCT for optimal imaging quality.

a Diagram showing the intersection angle (Θ) between the EPM and probe, influencing torque as T∙cos(Θ), with the mouse body thickness approximated to be 30 mm. b Post-processed OCT images using NDNC-Net can calculate the low-frequency error. The red arrows (left figure) indicate the position of the strut. The Θ optimization process involves two types of operations: negative action (NA, deviating from the optimal configuration) and positive action (PA, approaching the optimal configuration). An increase in the low-frequency error curve indicates NA, while a decrease indicates PA. Changes in system configuration caused by NA (red arrows, right figure) and PA (blue arrows) operations, specifically the EPM’s 2D pose correction, are visualized. Source data are provided as a Source Data file. a, b Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

MMAT-OCT utilizes a cost-effective design but requires compromises in probe dimensions, such as diameter and rigid tip length, due to the use of commercially sourced optical components. Future research should focus on miniaturizing the reflector to mitigate its impact on rotation dynamics and comply with the dimensions needed for submillimeter structures. While reducing commercial reflectors to 0.3 mm³ could increase costs 30-fold, using polished and electroplated capillary metal tubes might lower costs37 but potentially degrade image quality. In addition, replacing GRIN lenses with ball lenses to reduce probe dimensions has become common but may compromise image quality due to severe back-reflection37,57.

The main source of background noise in endoscopic OCT imaging arises from back reflections of optical surfaces in the optical path. To reduce the background noise in our OCT endoscope, we have implemented the following measures: first, the GRIN lens is designed with one flat end coated with anti-reflection material (reflection <0.5% @ 850 nm), while the other end is angled at 8° to minimize back-reflection; second, the 8° angled end of the single-mode fiber is precisely aligned with the 8° angled facet of the GRIN lens, and UV glue is utilized as the spacer material to minimize the refractive index difference between optical surfaces; lastly, we have incorporated a reflective mirror with a 54° angle to further minimize back-reflection from the plastic sheath. These measures collectively contribute to the reduction of back-reflection in our OCT endoscope, resulting in a back-reflection level of less than −60 dB.
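To put the quoted figures on a common scale, reflectance fractions and decibel levels can be converted with the standard power relation dB = 10·log10(R). The short sketch below is only this unit conversion; the helper names are illustrative, and it does not model how much reflected light actually couples back into the fiber.

```python
import math


def reflectance_to_db(reflectance):
    """Convert a power reflectance fraction (e.g. 0.005 for 0.5%) to dB."""
    return 10.0 * math.log10(reflectance)


def db_to_reflectance(level_db):
    """Convert a level in dB back to a power fraction."""
    return 10.0 ** (level_db / 10.0)


coating_db = reflectance_to_db(0.005)    # the <0.5% anti-reflection coating spec
system_fraction = db_to_reflectance(-60)  # the reported system-level back-reflection
```

On this scale, a 0.5% surface reflectance is roughly −23 dB, whereas −60 dB corresponds to one part in a million of the incident power, which suggests that the angled facets and index matching account for much of the overall suppression.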

The MMAT-OCT represents a significant advance in microscopic mucosal imaging, surpassing traditional OCT endoscopes in resilience to probe deformation and thermal-electrical effects, thus providing a robust and safe solution for in vivo imaging. Optimizing sampling frequency and applying learning-based corrections effectively mitigate stretch–shrink NURD, improving image quality. The integration of an RDPM enhances steering capabilities and ensures 3D imaging under extreme bending. In addition, its compatibility with standard optical devices and cost-effective production makes it economically viable, aligning with single-use interventional catheters. Despite challenges like small probe size and nonuniform friction, system optimizations have minimized tissue information loss and corrected distortions. Validated under various conditions, the system demonstrates versatility and reliability, enhancing early detection and supporting targeted therapies, thereby potentially revolutionizing mucosal imaging and expanding applications in controlled imaging, diagnosis, and treatment40,58.

Methods

Manufacture and fabrication procedures of MMAT-OCT probe

The construction of MMAT-OCT probe is a five-stage process. Initially, a rod reflector (OD 1 mm, reflection angle 54°, Foctek Photonics, China) is bonded to the RDPM (diametrically magnetized N52, 1 mm diameter, 1–2 mm length \({l}_{3}\), Hangzhou Yongci Group Company Ltd.) using epoxy resin (G14250, Thorlabs Inc.), creating a magnetic reflector. The N-BK7 rod reflector features one 54-degree tilted flat surface with a silver coating for beam redirection and side-viewing. To reduce surface roughness, a surface coating process was implemented using the IVS EB-600 E-beam Evaporator. The RDPM surface was consecutively coated with 100 nm of Ag (Silver) and 200 nm of SiO2 (Silicon Dioxide), each applied at a rate of 0.1 nm/s.

Following this, a single-mode fiber (SMF, 780HP, Thorlabs Inc., USA) is cleaved at an 8° angle using an automated glass processor (GPX3800, Thorlabs Inc.). This cleaved fiber is then affixed to a 3-mm-long GRIN lens (OD 1 mm, ComFiber Communications Technology, USA), maintaining a distance \({l}_{1}\) of 275 μm between the distal end of the SMF and the GRIN lens for optimal optical performance, as determined by ray-tracing simulations. The gap is filled with an optical adhesive (NOA 68, Norland Product Inc., USA) and the SMF-GRIN-lens assembly is subjected to 4 h of UV light exposure (central wavelength of 365 nm, power density of 65 mW/cm² on sample) to ensure complete polymerization and rigid bonding.

In the third stage, we employed the YC-SLC300 medical stent laser cutting machine (Kunshan Yunco Precision Co., LTD) to craft the desired light-transparent structure from a thin-walled, small-diameter nickel-titanium tube (inner diameter 1.1 mm, wall thickness 0.1 mm). Two perpendicularly oriented cameras equipped with a 10× objective oversee each step, monitoring the precise alignment and relative position of components in the longitudinal direction.

In the subsequent stage, the prefabricated elements are assembled within the enclosure. The rotor is inserted at the distal end of the enclosure, and the stopper is introduced to generate a magnetic attractive force and confine the reflector’s position longitudinally. The dome shape of the metal stopper minimizes friction during high-speed rotation. Epoxy resin is applied to the stopper’s brim to secure its position relative to the enclosure.

Finally, the SMF-GRIN-lens assembly is integrated within the preassembled enclosure. The distance \({l}_{2}\) between the distal surface of the GRIN lens and the reflector’s front end is ~1.2 mm. \({l}_{1}\) and \({l}_{2}\) collectively fine-tune the focused beam spot size and the focal length. Epoxy resin fills the gap between the GRIN lens and the enclosure. A transparent plastic sheath of 1.8 mm OD and 200 μm wall thickness safeguards the MMAT-OCT probe during imaging. To ensure precise alignment, a custom-built platform, consisting of separate holders for the magnet, fiber, and prism mounted on independent translational and rotational stages, is used to assemble the optical components at the catheter’s distal end.

Surface roughness characterization

Under the operation of MarSurf MSW 8.6 software, high-resolution surface topographical maps of the RDPM and the enclosure were acquired using the MarSurf CM mobile confocal microscope. The aim of utilizing this apparatus was to ascertain the roughness characteristics of the rotating components. The arithmetical mean height (Sa), as defined by ISO 25178, is employed to measure and assess surface roughness, representing the mean deviation across a surface from a reference plane59.
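For reference, Sa is the mean absolute height deviation from the mean plane. The sketch below computes it from a small height map. It assumes the map has already been leveled and filtered (as instrument software typically does before reporting Sa), takes the arithmetic mean height as the reference plane, and uses a hypothetical function name; it is not the MarSurf implementation.

```python
def arithmetical_mean_height(heights):
    """Sa: mean absolute deviation of surface heights from their mean.

    heights : 2D list of height samples on a regular grid (any length unit).
    Assumes the surface has already been leveled (form removed).
    """
    values = [z for row in heights for z in row]
    mean_height = sum(values) / len(values)
    return sum(abs(z - mean_height) for z in values) / len(values)


# Toy 2 x 2 height map in µm (illustrative values only).
sa = arithmetical_mean_height([[0.0, 2.0], [2.0, 0.0]])
```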

Simulation of magnetic field

We developed a simulated workspace utilizing two cylindrical permanent magnets: an EPM measuring 50 mm in both diameter and length, and an RDPM measuring 1 mm in both diameter and length. We employed the COMSOL Multiphysics platform for the simulation, using the ‘magnetic field, no current’ physics interface. Both magnets were assigned the same material, N52 (sintered NdFeB). For the magnetization model, we used the remanent flux density formulation, with a recoil permeability of 1.05 and a remanent flux density norm of 1.44 T. During the simulation, the mesh element size for the EPM was set to a maximum of 3 mm and a minimum of 2.88 mm. In contrast, the RDPM mesh had a maximum element size of 0.05 mm. The magnetization direction for both magnets was set diametrically. We calculated the magnetic field within a cubic space with a side length of 320 mm using a stationary study.
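As a rough cross-check on the order of magnitude of the EPM field, the finite-element model can be replaced by a point-dipole approximation. This is not the COMSOL workflow described above. It is a simplified sketch that assumes uniform magnetization, treats the 50 mm EPM as a point dipole (a coarse approximation at working distances comparable to the magnet size), and measures distance from the magnet center. The function name and the sample distances are illustrative.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, T·m/A


def dipole_field_magnitude(br, radius, length, r, theta=0.0):
    """|B| of a uniformly magnetized cylinder treated as a point dipole.

    br     : remanent flux density (T)
    radius : cylinder radius (m)
    length : cylinder length (m)
    r      : distance from the magnet center (m)
    theta  : angle between the dipole moment and the position vector (rad)
    """
    volume = math.pi * radius ** 2 * length
    moment = br * volume / MU0  # dipole moment, A·m^2
    angular = math.sqrt(1.0 + 3.0 * math.cos(theta) ** 2)
    return MU0 * moment / (4.0 * math.pi * r ** 3) * angular


# EPM from the text: 50 mm diameter and length, Br = 1.44 T.
b_65mm = dipole_field_magnitude(1.44, 0.025, 0.05, 0.065)
b_45mm = dipole_field_magnitude(1.44, 0.025, 0.05, 0.045)
b_90mm = dipole_field_magnitude(1.44, 0.025, 0.05, 0.090)
```

The inverse-cube falloff means that doubling the distance reduces the field eightfold in this approximation, which illustrates why the usable working range is narrow. Absolute values should be taken from the finite-element results rather than from this estimate.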

Robotic magnetic actuator

We utilized a 6-DoF Universal Robots UR5 serial robotic manipulator, capable of handling a 5-kg payload, to manipulate the actuator attached to its end-effector. For circumferential scanning, we employed an external motor, the RMD-L-90 (GYEMS), which carries a cylindrical permanent magnet (50 mm diameter, 50 mm height, NdFeB, N52 grade). For angular scanning, we deployed a planetary reduction stepper motor (80FXMA-016A, Shanghai Fengxin Transmission Machinery Co., Ltd).

Ultrahigh-resolution 800-nm SD-OCT endoscopic system

A near 800-nm SD-OCT distal scanning system (Fig. 1b) has been devised to enable ultrahigh-resolution tissue imaging. The system uses a broadband superluminescent diode source (M-D-840-HP, Superlum, Ireland) with a central wavelength of 842 nm and a 3-dB bandwidth of 160 nm. A 2 × 2 broadband 50:50 fiber coupler (TW850R5A2, Thorlabs, USA) is utilized to construct the interferometric system. Within the detection arm, a high-speed spectrometer with 2048 pixels (CS800-800/300, Wasatch Photonics, USA) is employed, covering a spectral range from 650 to 950 nm and achieving a maximum scanning rate of 250 kHz. The light beam in the reference arm is collimated by a reflective fiber collimator (RC08APC-P01, Thorlabs, USA) and reflected by a protected silver mirror (PF05-03-P01, Thorlabs, USA). A pair of prisms (N-SF11, Edmund, USA) is placed between them to balance the dispersion between the reference and sample arms. Furthermore, polarization controllers (FPC030, Thorlabs, USA) are deployed to regulate light polarization in the system and augment image contrast. In the sample arm, the imaging probe is affixed to a translational stage for longitudinal scanning. A traditional OCT catheter is driven through a fiber-optic rotary joint for circumferential tissue image acquisition, which incurs an insertion loss of ~3 dB; the MMAT-OCT probe requires no such component. This results in a one-way transmission efficiency of ~90% and a back-reflection of less than −60 dB. The optimized imaging optics resulted in a working distance of 260 μm, an axial resolution of 2.4 µm, and a lateral resolution of 7.6 µm. The working distance is defined as the distance from the outer surface of the plastic sheath to the endoscope’s focal plane.
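For orientation, the source parameters above can be compared with the textbook estimate of OCT axial resolution for a Gaussian spectrum, δz = (2 ln 2/π)·λ0²/(n·Δλ). The sketch below evaluates only this idealized formula. It assumes a Gaussian spectral shape, treats the quoted 3-dB bandwidth as the FWHM, and uses a hypothetical function name; the reported 2.4 µm is a system-level figure that this estimate does not attempt to reproduce.

```python
import math


def axial_resolution_um(center_nm, bandwidth_nm, refractive_index=1.0):
    """Idealized axial resolution (µm) for a Gaussian source spectrum.

    center_nm        : central wavelength (nm)
    bandwidth_nm     : FWHM spectral bandwidth (nm)
    refractive_index : medium index (1.0 = air)
    """
    coherence_nm = (2.0 * math.log(2.0) / math.pi) * center_nm ** 2 / bandwidth_nm
    return coherence_nm / refractive_index / 1000.0


# Source from the text: 842 nm center wavelength, 160 nm bandwidth.
dz_air = axial_resolution_um(842.0, 160.0)
```

With these inputs the idealized limit in air comes out just under 2 µm, the same order as the reported axial resolution.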

The OCT image acquisition is performed as follows: for continuous rotation scanning at frequencies ranging from 10 to 30 Hz, data from the initial and final phases were excluded, focusing solely on data obtained during regular rotation periods. In the case of angular scanning, data beyond the boundaries of the B-scan (see the red boxes in Supplementary Fig. S25) were discarded. The pertinent B-scan region (see the blue boxes in Supplementary Fig. S25), characterized by periodic decelerations (speed decreases) and accelerations (speed increases), is corrected using NDNC-Net.

NDNC-Net

Addressing the challenge of rotational distortion, we propose a bifurcated strategy delineated in Fig. 3, segregating the process into detection and correction phases. The initial phase employs a struct detection network to locate struct regions within raw B-frames using a semi-supervised object detection framework60. This framework introduces a soft teacher-student training paradigm, where the student model is trained with robustly augmented data and the teacher model with mildly augmented data, generating high-fidelity pseudo boxes. The composite loss function combines supervised and unsupervised loss elements, optimizing performance across both labeled and pseudo-labeled data:

$$\mathcal{L}=\frac{1}{{N}_{l}}\sum_{i=1}^{{N}_{l}}\left({\mathcal{L}}_{{cls}}\left({I}_{l}^{i}\right)+{\mathcal{L}}_{{reg}}\left({I}_{l}^{i}\right)\right)+\alpha \left(\frac{1}{{N}_{u}}\sum_{i=1}^{{N}_{u}}\left({\mathcal{L}}_{{cls}}\left({I}_{u}^{i}\right)+{\mathcal{L}}_{{reg}}\left({I}_{u}^{i}\right)\right)\right)$$
(1)

where \({I}_{l}^{i}\) denotes the ith labeled image, \({I}_{u}^{i}\) denotes the ith unlabeled image, \({{{{\mathcal{L}}}}}_{{cls}}\) denotes the classification loss, \({{{{\mathcal{L}}}}}_{{reg}}\) denotes the box regression loss, \({{{\rm{\alpha }}}}\) denotes the weight factor, and \({N}_{l}\) and \({N}_{u}\) denote the numbers of labeled and unlabeled images, respectively. The teacher model is updated using an exponential moving average (EMA) strategy. We manually labeled a subset of the raw MMAT-OCT images and used them together with 100 unlabeled images to train the struct detection network. Using the detected struct regions, raw B-frames are cropped into three subsections, termed tissue regions, which exclude the struct regions and are passed on for subsequent processing.
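The structure of Eq. (1) and the EMA teacher update can be sketched in plain Python. This is a schematic of the arithmetic only, not the training code: the function names are hypothetical, per-image losses are passed in as precomputed numbers, model weights are represented as a flat list, and the decay value is an assumed placeholder since the text does not state it.

```python
def composite_loss(labeled_losses, unlabeled_losses, alpha):
    """Eq. (1): mean supervised loss plus alpha times mean unsupervised loss.

    labeled_losses / unlabeled_losses : lists of (cls_loss, reg_loss) per image.
    alpha                             : weight factor for the unsupervised term.
    """
    supervised = sum(c + r for c, r in labeled_losses) / len(labeled_losses)
    unsupervised = sum(c + r for c, r in unlabeled_losses) / len(unlabeled_losses)
    return supervised + alpha * unsupervised


def ema_update(teacher_weights, student_weights, decay=0.999):
    """Exponential moving average of student weights into the teacher.

    decay is an assumed value; the text does not specify it.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


# Toy numbers for illustration only.
loss = composite_loss([(1.0, 1.0), (2.0, 0.0)], [(0.5, 0.5)], alpha=0.5)
teacher = ema_update([1.0, 0.0], [0.0, 0.0], decay=0.9)
```

In a soft-teacher setup, the teacher receives no gradient updates and changes only through this averaging, which keeps the pseudo boxes stable while the student trains.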

The NURD Detection Network (NDNet) is proposed to pinpoint potential NURD-affected areas. Using separate networks for struct and NURD region detection improves NDNet’s precision and specificity. NDNet is implemented using the YOLO v8 framework, which is renowned for its efficiency and accuracy in object detection tasks. Key features of YOLO v8, such as the use of CSPDarknet as the backbone network, and the integration of a Path Aggregation Network (PANet) for better feature fusion, were pivotal in achieving high detection accuracy. The model also employs an anchor-free detection head, which simplifies the detection process and reduces computational complexity. NDNet is trained with the self-developed synthetic dataset (see Supplementary Fig. S15).

The correction network (NCNet) maps the raw OCT image domain \({{{\mathcal{I}}}}\) to the rotation dynamics domain \({{{\mathcal{D}}}}\), a 1D vector specifying the variation of sampling distance differences between adjacent A-lines. Rotation dynamics indicates over- and under-sampling (see Supplementary Fig. S15). The estimated rotation dynamics \(\hat{D}\) is optimized to match the ground truth \({{{\mathcal{D}}}}\) by minimizing the mean square loss:

$${{{{\mathcal{L}}}}}_{C}=\frac{1}{{N}_{d}}{\sum}_{i=1}^{{N}_{d}}{{{{\mathcal{L}}}}}_{{mse}}({D}^{i},\, {\widehat{D}}^{i})$$
(2)

where \({N}_{d}\) denotes the number of A-lines within a B-frame, \({D}^{i}\) refers to the ith ground truth, \({\hat{D}}^{i}\) refers to the ith estimated rotation dynamics and \({{{{\mathcal{L}}}}}_{{mse}}\) denotes the mean square loss. NCNet is implemented using the ViT-Base (Vision Transformer) architecture, which has demonstrated significant advancements in image analysis by leveraging the transformer model, originally designed for natural language processing, to process visual data. NCNet comprises 12 transformer blocks, and we employ two fully connected layers following the ViT to output the estimated rotation dynamics with the correct dimensions.

During inference, stretching regions \(\{{b}_{i}\}\) are detected by NDNet. NCNet estimates the corresponding rotation dynamics, and a distortion pattern \(F\) is calculated by cumulative summation and normalization:

$${F}_{{raw}}^{k}={\sum}_{i=0}^{k}{\widehat{D}}^{i}$$
(3)
$$F={minmax}(F_{raw}) \, \cdot \, W$$
(4)

where \({minmax}\) is the min-max normalization that maps the input to [0, 1] and W represents the width of the image. The distortion pattern \(F\) is also a 1D vector indicating where A-lines in the input OCT image should move to obtain the corrected image. Each detected region within the B-frame is then resampled with the inverse mapping \({F}^{-1}\) of \(F\). Since the inverse mapping is not a one-to-one function, bilinear interpolation is used to fill in the zero A-lines of the inverse-mapped OCT image. Subsequently, the NURD-corrected OCT image is resized, using the detected struct regions as markers in the MMAT-OCT probe, with each structure, uniformly distributed around the probe, representing 4° of the full 360° circumference.
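Equations (3) and (4) and the subsequent resampling can be sketched with NumPy. This is one plausible reading of the procedure, not the authors' implementation: A-lines are moved to their rounded target columns, and empty columns are then filled by linear interpolation along the lateral axis (a simplification of the bilinear interpolation mentioned above). The function names, the rounding, and the overwrite rule for colliding A-lines are assumptions.

```python
import numpy as np


def distortion_pattern(d_hat, width):
    """Eqs. (3)-(4): cumulative sum, min-max normalization to [0, 1], scale by width."""
    f_raw = np.cumsum(np.asarray(d_hat, dtype=float))
    f_norm = (f_raw - f_raw.min()) / (f_raw.max() - f_raw.min())
    return f_norm * width


def remap_alines(image, f):
    """Move A-line i (column i) to column round(f[i]) and fill the gaps.

    image : 2D array, shape (depth, n_alines).
    f     : target lateral position of each A-line, in column units.
    Colliding A-lines overwrite earlier ones (assumed rule); empty columns
    are filled by linear interpolation between the nearest filled columns.
    """
    depth, n_alines = image.shape
    targets = np.clip(np.rint(f).astype(int), 0, n_alines - 1)
    out = np.zeros((depth, n_alines), dtype=float)
    filled = np.zeros(n_alines, dtype=bool)
    for src, dst in enumerate(targets):
        out[:, dst] = image[:, src]
        filled[dst] = True
    cols = np.arange(n_alines)
    for row in range(depth):
        out[row] = np.interp(cols, cols[filled], out[row, filled])
    return out


# Uniform rotation dynamics should leave the image unchanged.
uniform_f = distortion_pattern([1, 1, 1, 1], width=3)
image = np.arange(8).reshape(2, 4)
unchanged = remap_alines(image, uniform_f)

# Two A-lines collapsing onto column 0 leave a gap at column 1.
gap_filled = remap_alines(np.array([[1, 5, 3]]), np.array([0.0, 0.0, 2.0]))
```

Here `width` is passed as the index of the last column so that the normalized pattern spans the valid column range; whether W in Eq. (4) is the column count or the last index is not stated in the text.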

A large-scale synthetic dataset was created for NDNet and NCNet, where the NURD region and corresponding RDVV exhibit similar features to the rotational NURD distortion region of MMAT-OCT (see Fig. 3a and Supplementary Fig. S15). We used distortion-free OCT images acquired by the proximal scanning method. The proximal scanning approach involves rotating the imaging probe with a fiber-optic rotary joint for two-dimensional circumferential imaging. There are a total of 300 distortion-free OCT images, including mouse colon and rat esophagus. A total of 11,600 synthetic OCT images depicting stretching and shrinking distortion regions were utilized to train the NDNC-Net (see Supplementary Fig. S15). Focused stretching distortion regions exhibit a normalized rotation dynamics value below 0.

The training process was carried out using the PyTorch framework on a single Nvidia 3090 GPU, with memory allocations of 7 GB, 7 GB, and 6 GB for struct detection network, NDNet, and NCNet, respectively. Optimization was performed using the Adam algorithm, with both the initial learning rate and weight decay parameter set at 0.0001. The training process was configured with a batch size of 4 with 100 epochs for struct detection network and NDNet, and a batch size of 16 with 200 epochs for NCNet, spanning up to 50 epochs for each task. The inference times for the struct detection network, NDNet, and NCNet are 0.9 s, 0.9 s, and 0.012 s, respectively. We conducted various rotation dynamics experiments, obtaining 5000 synthetic samples and 120,000 synthetic samples for training NDNet and NCNet, respectively. Based on current performance metrics, increasing the number of training samples does not significantly enhance performance. However, increasing the diversity of OCT image types and tissue types could improve the network’s generalization ability.

NDNC-Net’s correction capabilities depend on the synthetic data and the methods used to accurately model the rotation dynamics of RDPM. To ensure the correction capability of NDNC-Net and understand its dependency on the quality of training data, we explored two methods for modeling the rotation dynamics in the synthetic data used to train NDNC-Net. The first method involves approximating theoretical data with cubic spline interpolation to simulate rotation dynamics. The rotation dynamics of the RDPM are influenced by various factors such as gravity, enclosure support, friction, and the magnetic field, resulting in complex and unpredictable angular speed due to the nonlinearity and time dependence of these forces. To model this complexity, we used cubic spline interpolation to approximate theoretical data (see Supplementary Fig. S8 and Fig. 2b), generating complex and random curves by assigning random friction coefficients. This synthetic dataset currently simulates 10,000 scenarios, with potential for expansion. The second method utilizes measured rotation dynamics. We captured high dynamics of the rotating RDPM using a high-speed industrial camera with a frame rate of 800 frames per second, as shown in Supplementary Fig. S9 and Supplementary Movie S3. These measured rotational dynamics curves were then approximated using cubic spline interpolation (see Supplementary Fig. S10), resulting in a synthesized dataset of 11,600 images incorporating these measured dynamics.

To evaluate the performance of the trained NDNC-Net, we conducted tests on 350 randomly selected synthetic images, with the distortion-free OCT images serving as the ground truth. The results show that the mean squared error (MSE) of the rotation dynamics prediction is 4.3%, thereby validating the correction performance of NDNC-Net.

Animal studies

The experimental protocols for animal imaging on mice (n = 4, Nu/J, The Laboratory Animal Services Centre, The Chinese University of Hong Kong) and rats (n = 4, Sprague Dawley, The Laboratory Animal Services Centre, The Chinese University of Hong Kong) received ethical approval from the Animal Experimentation Ethics Committee (AEEC) at The Chinese University of Hong Kong. The OCT imaging on the pig (n = 3, Yorkshire Pig, Tianjin Bainong Laboratory Animal Breeding Technology Co., Ltd.) was approved by Qilu Hospital of Shandong University. The animals were not kept on a dark/light cycle; the ambient temperature was 23 °C and the humidity was maintained at 35%. Ex vivo imaging was preceded by the humane euthanasia of rats through the administration of a pharmacological overdose, specifically 100 mg/kg ketamine and 10 mg/kg xylazine, after which OCT evaluation was performed. In vivo imaging required the induction of anesthesia via subcutaneous injection of ketamine and xylazine at the same dosages mentioned above, enabling the acquisition of OCT scans. The pig anesthesia was initiated with ketamine (25 mg/kg) and then maintained by continuous intravascular administration of propofol (800 μg/kg/h) during imaging. Following the completion of the OCT imaging procedures, the tissues were promptly excised and preserved in formalin for subsequent histological examination.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.