Single-frame deep-learning super-resolution microscopy for intracellular dynamics imaging

Chen, Rong; Tang, Xiao; Zhao, Yuxuan; Shen, Zeyu; Zhang, Meng; Shen, Yusheng; Li, Tiantian; Chung, Casper Ho Yin; Zhang, Lijuan; Wang, Ji; Cui, Binbin; Fei, Peng; Guo, Yusong; Du, Shengwang; Yao, Shuhuai

doi:10.1038/s41467-023-38452-2

Article
Open access
Published: 18 May 2023

Nature Communications volume 14, Article number: 2854 (2023)

Abstract

Single-molecule localization microscopy (SMLM) can be used to resolve subcellular structures and achieve a tenfold improvement in spatial resolution compared to that obtained by conventional fluorescence microscopy. However, the separation of single-molecule fluorescence events that requires thousands of frames dramatically increases the image acquisition time and phototoxicity, impeding the observation of instantaneous intracellular dynamics. Here we develop a deep-learning based single-frame super-resolution microscopy (SFSRM) method which utilizes a subpixel edge map and a multicomponent optimization strategy to guide the neural network to reconstruct a super-resolution image from a single frame of a diffraction-limited image. Under a tolerable signal density and an affordable signal-to-noise ratio, SFSRM enables high-fidelity live-cell imaging with spatiotemporal resolutions of 30 nm and 10 ms, allowing for prolonged monitoring of subcellular dynamics such as interplays between mitochondria and endoplasmic reticulum, the vesicle transport along microtubules, and the endosome fusion and fission. Moreover, its adaptability to different microscopes and spectra makes it a useful tool for various imaging systems.

Introduction

Live-cell fluorescence imaging, requiring both low phototoxic illumination and a high imaging speed, is usually performed with a wide-field (WF) fluorescence microscope¹. The spatial resolution of a conventional fluorescence microscope is limited by diffraction and thus unable to resolve subcellular structures smaller than 200 nm. In the past two decades, various types of super-resolution microscopy surpassing the diffraction limit have been developed. For example, structured illumination microscopy (SIM)² can be used for live-cell imaging with low invasiveness; however, it only improves the spatial resolution of images by a factor of up to 2. Although advanced SIM³ has improved the resolution to ~60 nm, multiple frames are still required to construct a single super-resolution (SR) image. Stimulated emission depletion (STED) microscopy⁴ can achieve an ~50 nm resolution using highly intense light pulses, but point-to-point scanning makes STED too slow for live-cell imaging. Single-molecule localization microscopy (SMLM)^5,6,7,8, including photoactivated localization microscopy (PALM)^6,7 and stochastic optical reconstruction microscopy (STORM)⁵, further enhances the spatial resolution by a factor of 10 (~20 nm) but typically requires thousands of frames with separated single-molecule fluorescence events to reconstruct one SR image; hence, SMLM has only rarely been applied to live cells, and then at a second-scale temporal resolution^9,10,11. To perform time-resolved and noninvasive super-resolution imaging, numerous advanced labeling strategies^12,13,14, optical imaging systems^15,16, and image reconstruction methods^17,18,19 have been explored in recent decades. Nonetheless, inherent tradeoffs among spatial resolution, temporal resolution, achievable signal intensity, and cytotoxicity must be made because of the physical boundaries of optical systems²⁰.

The rapid development of artificial intelligence has led to many traditional hardware limits being surpassed. Various deep learning networks have displayed excellent performance in the single-image super-resolution (SISR) task^21,22,23 which usually transforms a single low-resolution (LR) photograph to a high-resolution (HR) photograph. The focus of the SISR task for realistic photographs is to enhance texture and improve visual quality^24,25. In contrast, super-resolution tasks for microscopic images demand ultrastructure recovery from diffraction-limited images with high accuracy. Recently, popular neural networks in computer vision have been modified to enhance the resolution of microscopic images, for instance, from low magnification to high magnification^26,27, confocal to STED^27,28, and total internal reflection fluorescence (TIRF) or WF to SIM^27,28,29; and also combined with PALM and STORM to accelerate the localization process of SMLM reconstruction³⁰ and reduce the number of frames of single-molecule images required for SMLM reconstruction^31,32. However, due to the large resolution gap between the LR images acquired by WF microscopes and HR images obtained from SMLM reconstructions, multiple frames of LR images with single-molecule fluorescence events are still required to reconstruct an SR image. Therefore, the fundamental problems of multi-frame super-resolution imaging, such as the long acquisition time and photobleaching-induced phototoxicity in localization microscopy, still hinder its application in the imaging of live-cell dynamics.

In this work, we first explore the possibility of using a neural network to directly transform a single diffraction-limited image to an SR image with a 10-fold higher resolution. By applying an enhanced super-resolution generative adversarial network (ESRGAN)²⁵, multi-component loss function, and prior information regulation, we develop a super-resolution network (SRN) that can resolve a single diffraction-limited frame to an SR image with up to a 10-fold resolution improvement. Then, we investigate the challenges of implementing this SRN for real-time live-cell observations where the acquired images normally have an ultralow signal-to-noise ratio (SNR). By deploying a signal-enhancement network (SEN) in advance to progressively optimize the image SNR and resolution, we are able to reduce the requirement on the input SNR for satisfactory reconstruction quality, thus allowing for high-speed live-cell imaging without sacrificing the spatial resolution. Taken together, we propose a single-frame super-resolution microscopy (SFSRM) approach that allows us to reveal time-resolved intracellular events in live cells, for instance, the vesicle transport dynamics, the endosome fusion and fission process, and mitochondria-endoplasmic reticulum interactions. Moreover, we demonstrate that the well-trained SFSRM networks can be used in various imaging systems without further training, making super-resolution imaging possible for laboratories lacking training datasets.

Results

SFSRM based on joint-optimization-enhanced deep learning networks

The central goal of deep-learning-based microscopic image SR tasks is to reconstruct high-frequency structures with high accuracy from LR images. Therefore, in pursuit of high fidelity, the most commonly used loss functions in microscopic image restoration are mean absolute error (MAE) loss, mean square error (MSE) loss, and structural similarity (SSIM) loss. These loss functions, which focus on pixel-wise differences between the network output and the ground-truth (GT) image, can achieve a high peak signal-to-noise ratio and SSIM index, but suffer from oversmoothed reconstruction results and loss of high-frequency details²⁴ (Supplementary Fig. 1). By contrast, perceptual loss and adversarial loss²⁵, which have been extensively utilized in photograph SR tasks to restore high-frequency details, are regarded as inappropriate for microscopic image restoration because undesirable artifacts can be induced²⁴.

To achieve high-frequency detail reconstruction, here we investigated the possibility of using perceptual loss and adversarial loss for microscopic image restoration. We propose a multi-component loss function containing (i) the combination of multi-scale structure similarity loss and mean absolute error loss, denoted as MS-SSIM-L1 loss, to improve the pixel-wise reconstruction accuracy, (ii) the perceptual loss to generate high-frequency structures, (iii) the adversarial loss from a U-net discriminator to provide pixel-wise feedback to the generator about whether the reconstructed image is true or fake, and (iv) the frequency loss to suppress the high-frequency artifacts (Fig. 1a). To quantitatively assess the functionality of the proposed loss function, we simulated randomly distributed polymer lines in the GT images with a pixel size of 10 nm. The GT images were then blurred by a Gaussian kernel of 200-nm full-width-half-maximum (FWHM) size to generate the corresponding LR images. The results show that the multi-component loss notably improves the fine-structure reconstruction capability of the network compared to the conventional pixel-wise loss and effectively suppresses the artifacts seen with the original ESRGAN (Supplementary Fig. 2). We repeated the experiments on 30 images and consistently found that the network trained with the multi-component loss was able to restore fine structures from blurred LR images and achieved an MS-SSIM index of ~0.98 with respect to the GT, demonstrating the capability of the network to transform a single diffraction-limited image to an SR image with a 10-fold resolution increase under the noise-free condition.
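The simulated LR data described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 10-nm pixel size and the 200-nm FWHM come from the text, whereas the straight-segment generator `random_lines`, the image size, the number of lines, and the zero-padded separable blur are assumptions made here for brevity. The sketch also leaves the LR image on the GT pixel grid rather than resampling it to a camera pixel size.

```python
import numpy as np

PIXEL_NM = 10.0   # GT pixel size stated in the text
FWHM_NM = 200.0   # FWHM of the blur kernel stated in the text


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def random_lines(size=128, n_lines=5, seed=0):
    """Hypothetical GT generator: straight one-pixel-wide random segments."""
    rng = np.random.default_rng(seed)
    image = np.zeros((size, size))
    for _ in range(n_lines):
        r0, c0, r1, c1 = rng.uniform(0, size - 1, 4)
        steps = 2 * size  # oversample so the segment has no gaps
        rows = np.round(np.linspace(r0, r1, steps)).astype(int)
        cols = np.round(np.linspace(c0, c1, steps)).astype(int)
        image[rows, cols] = 1.0
    return image


def gaussian_blur(image, sigma_px, truncate=4.0):
    """Separable Gaussian blur with zero padding at the borders.

    The image must be wider than the kernel (2 * ceil(truncate * sigma) + 1).
    """
    radius = int(np.ceil(truncate * sigma_px))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_px) ** 2)
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, blurred, kernel, mode="same")


sigma_px = fwhm_to_sigma(FWHM_NM) / PIXEL_NM   # about 8.5 GT pixels
gt = random_lines()
lr = gaussian_blur(gt, sigma_px)
```

The conversion in `fwhm_to_sigma` is the standard Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ, so a 200-nm FWHM corresponds to a standard deviation of roughly 85 nm.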

Unfortunately, the reconstruction quality quickly degrades if the image is corrupted by noise (Supplementary Fig. 3, without the edge map), which is reasonable since single-image super-resolution restoration is already an ill-posed problem, and noise adds further complexity to this task. Determining how to improve the reconstruction accuracy of noisy images remains a critical task. Here, we integrate the prior information (edge map) from an LR image into the network to aid in the reconstruction. Although edge priors have been considered in realistic photograph restoration³³, edge detection operators that are well suited for realistic photographs cannot be directly applied to microscopic images since the diffraction effect is not considered. As shown in Supplementary Fig. 4a, the edges extracted from the microscopic LR image by these operators fail to indicate the high-resolution structures in the GT image. Instead, we extracted a subpixel edge map from a microscopic image based on the radial symmetry of imaged fluorophores (Supplementary Fig. 4b), which has been utilized in super-resolution microscopy methods^34,35,36. Inspired by Gustafsson et al.³⁴, who analyzed the temporal cumulants in the radiality maps of a sequence of images to reconstruct one SR image, we computed the edge map from a single LR image (Supplementary Fig. 4c) and used it as an additional input to the network, which effectively improves the fine-structure reconstruction accuracy of the network from the noisy LR image (Supplementary Fig. 3, with the edge map).

We then asked to what extent the network can maintain its performance in the presence of different levels of noise. To test this, we set the background and noise at a certain level and decreased the signal intensity to different levels to simulate a set of images with different SNRs, as shown in Supplementary Fig. 5. After comparing the reconstruction results from inputs with different SNRs by visually inspecting the reconstruction quality (Supplementary Fig. 5a) and quantitatively analyzing the reconstruction accuracy (Supplementary Fig. 5b), we found that the network requires an input SNR of about 15, which poses a challenge for applications in live-cell imaging where the signal level could be quite low due to short exposure and low illuminance. To address this challenge, we tried to improve the image SNR in advance. There are plenty of networks that could be used for denoising, such as RCAN²⁸ and CARE³⁷. Because the ESRGAN generator works well for denoising tasks with registered low-SNR (LSNR) and high-SNR (HSNR) data when trained with MS-SSIM-L1 loss and perceptual loss, we adopted another ESRGAN generator as the SEN ahead of the SRN to progressively optimize the image SNR and resolution, so that an SR image can be restored from a low-SNR LR image (Fig. 1b). With the aid of SEN, the minimum SNR requirement of SFSRM could be extended to an SNR of 7 (Supplementary Fig. 5, HSNR-SR), making it accessible to most live-cell applications (see Supplementary Note 1 for the SNR estimation in fluorescent imaging).
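The SNR series described above (fixed background and noise, decreasing signal) can be sketched as follows. This is a sketch under stated assumptions: the Poisson-plus-Gaussian noise model, the parameter values, and the SNR definition in `estimate_snr` are all chosen here for illustration. The definition actually used in the paper is given in Supplementary Note 1 and may differ.

```python
import numpy as np


def add_noise(clean, signal_level, background=100.0, read_sigma=10.0, seed=0):
    """Scale a clean image in [0, 1], add a constant background, then noise.

    Poisson shot noise plus Gaussian read noise is an assumed noise model.
    """
    rng = np.random.default_rng(seed)
    expected = background + signal_level * clean
    shot = rng.poisson(expected).astype(float)
    return shot + rng.normal(0.0, read_sigma, clean.shape)


def estimate_snr(image, signal_mask):
    """Assumed SNR definition: (mean signal - mean background) / background std."""
    background = image[~signal_mask]
    return (image[signal_mask].mean() - background.mean()) / background.std()


# Toy structure: a bright square on an empty field.
clean = np.zeros((64, 64))
clean[24:40, 24:40] = 1.0
mask = clean > 0

# Background and noise parameters stay fixed; only the signal level drops.
signal_levels = [400.0, 200.0, 100.0, 50.0]
snrs = [estimate_snr(add_noise(clean, s), mask) for s in signal_levels]
```

Because the background and read noise are held constant, the estimated SNR falls roughly in proportion to the signal level, which mirrors how the input series in the text spans values above and below the reported thresholds.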

SFSRM reconstructs a super-resolution image from a single diffraction-limited image

Considering signal variations in density and intensity during fluorescent imaging, we systematically evaluated the resolution and accuracy of SFSRM over a range of signal densities and intensities on simulated line pairs. As shown in Supplementary Note 2 (Section I), we investigated the reconstruction accuracy of line pairs with different interpair distances between 10 nm and 50 nm at different SNRs and signal densities and regarded the smallest interpair distance that can be resolved at an accuracy >0.85 as the best achievable resolution of the network (Supplementary Figs. 6–12). We summarized the achievable resolution of the network at different SNRs and signal densities in Fig. 2a, which indicates that SFSRM can generally separate two lines that are 30 nm apart when the SNR of the LR image is above 7 and the signal density is <60%.
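The resolution criterion above amounts to a simple selection rule, sketched below. The function name `best_resolution` and the accuracy values in `example` are hypothetical placeholders, not measurements from the paper. Only the 0.85 threshold and the 10–50 nm range come from the text.

```python
def best_resolution(accuracy_by_distance, threshold=0.85):
    """Return the smallest interpair distance (nm) resolved above threshold.

    accuracy_by_distance maps interpair distance in nm to reconstruction
    accuracy in [0, 1]. Returns None if no distance passes.
    """
    resolved = [d for d, acc in accuracy_by_distance.items() if acc > threshold]
    return min(resolved) if resolved else None


# Placeholder accuracies for one (SNR, density) condition; NOT measured data.
example = {10: 0.40, 20: 0.70, 30: 0.90, 40: 0.95, 50: 0.97}
resolution_nm = best_resolution(example)
```

Repeating this selection for every combination of SNR and signal density would give a grid of achievable resolutions of the kind summarized in Fig. 2a.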

**Fig. 2: Overview performance of SFSRM.**

We further evaluated the network reconstruction accuracy considering more reconstruction errors such as missing/biased structures via HAWKMAN analysis³⁸ which gives a HAWKMAN score indicating the overall structural cross-correlation between the SR and HR images, and a confidence map marking the low-confidence structures (Supplementary Note 2, Section II, Supplementary Figs. 13 and 14). We regarded the HAWKMAN score as the reconstruction accuracy of the network. Figure 2b shows that SFSRM can achieve an accuracy over 0.9 when the SNR of the LR image is above 7 and signal density is <60%; while for higher signal density over 60%, the network requires a higher input SNR to achieve comparable accuracy.

SFSRM reconstructs super-resolution images from single experimental images of fixed cells

We then validated the resolution of SFSRM on experimental images. DNA origami nanorulers are standard samples that have two fluorescent markers with a specified mark-to-mark distance. To test SFSRM on DNA origami nanorulers, we first simulated dot pairs with an interpair distance ranging from 20 nm to 50 nm randomly distributed in GT images. The GT images were then blurred by a Gaussian kernel of a 280-nm FWHM size and corrupted with Poisson noise and Gaussian noise to obtain the LR images. The results in Supplementary Fig. 15 show that SFSRM accurately reconstructs 46%, 80%, 82%, and 84% of the dot pairs with 20-nm, 30-nm, 40-nm, and 50-nm interpair distances from the indistinguishable spots in the LR image, and that the proportion of reconstructions whose bias is within half of the interpair distance is 76%, 99%, 99%, and 99%, respectively, suggesting a reliable highest resolution at ~30 nm, similar to our observation on the simulated line pairs. We then used the trained SFSRM network to process the experimental WF images of DNA origami nanorulers with a 30-nm mark-to-mark distance (Fig. 2c). SFSRM clearly distinguished two spots and accurately reconstructed the distance between the two spots, which is about 30 nm as measured from the STORM image. By contrast, a representative deep-learning-based super-resolution method called ANNA-PALM³¹ is only able to reduce the size of the spot while failing to reconstruct the dot pairs from the blurred spots in the WF image given the challenging SNR of the WF image (e.g., SNR ~8).

We then investigated the performance of SFSRM on the experimental images of subcellular structures. We first validated the effectiveness of the SFSRM method on experimental images of fixed microtubules. We collected training data (11 frames of STORM images with the corresponding WF images) of fixed microtubules stained with Alexa Fluor 647, and trained the network with different strategies. The results in Supplementary Fig. 16 suggest our approach can effectively improve the reconstruction resolution and reconstruction fidelity of fine structure compared to the basic ESRGAN generator trained with the pixel-wise loss (MS-SSIM-L1 loss). To test network robustness to different levels of experimental noise, a sequence of images of different SNRs was obtained and processed by the network. The reconstructed SR images were then compared with the corresponding STORM image by HAWKMAN analysis. The confidence maps indicate that the reconstruction errors increase as the SNR decreases (Supplementary Fig. 17, LSNR-SR confidence map). When only SRN is used, the HAWKMAN score falls below 0.8 for SNR < 15, indicating a less reliable reconstruction result (Supplementary Fig. 17b, LSNR-SR). By contrast, if SEN is used in combination with SRN, the input SNR limit can be extended to SNR > 7 (Supplementary Fig. 17b, HSNR-SR).

We compared the performance of SFSRM and ANNA-PALM on experimental images of microtubules in Fig. 2d. Although ANNA-PALM successfully reconstructs isolated microtubules, some of the microtubules are merged or lost in the reconstruction results where the microtubules are densely distributed (Fig. 2d, indicated by white arrows). In contrast, SFSRM correctly reconstructed most microtubules without losing or merging them even when they are close to each other. Quantitative assessment of the network reconstruction fidelity via HAWKMAN analysis demonstrates notably reduced local errors in the SFSRM reconstruction and on-average higher fidelity of the SFSRM reconstruction (HAWKMAN score: 0.95 vs. 0.90) (Fig. 2d, confidence maps). As depicted by the intensity profiles for the lines in Fig. 2d, two microtubules only 75 nm apart are indistinguishable in the ANNA-PALM reconstruction and are resolved in the SFSRM reconstruction result (Fig. 2d, plot), demonstrating a superior fine-structure reconstruction capability of SFSRM. In addition to our experimental data, SFSRM also achieves comparable reconstruction results to those via Deep-STORM³⁰ on the public dataset from the EPFL SMLM challenge website³⁹ (Supplementary Fig. 18). Unlike Deep-STORM which requires 300 frames of densely-distributed single-molecule images to reconstruct an SR image, SFSRM restores the SR image only from a single WF image, greatly reducing the photobleaching to the specimen as well as the data acquisition time. Apart from filaments, the performance of SFSRM on diverse subcellular structures is also promising. As shown in Fig. 2e, SFSRM resolves the ring-shaped clathrin-coated pits (CCPs) with diameters ranging from 50 nm to 160 nm from the noisy WF image. The estimated diameters of the CCPs from the SFSRM reconstructions show good consistency with that measured from the STORM images (Supplementary Fig. 19).

We further benchmarked the performance of SFSRM on more subcellular structures including mitochondrial outer membrane, endoplasmic reticulum (ER), epidermal growth factor receptor (EGFR) protein, and nuclear pore complex proteins post a 2.5-fold expansion (Fig. 3a). The reconstruction fidelity is measured by the MS-SSIM index of the SR images with respect to the STORM images, and the resolution is measured by decorrelation analysis⁴⁰. SFSRM achieves in general an MS-SSIM score over 0.8 (Fig. 3b) and resolutions of different structures ranging from 15 nm to 40 nm, consistent with those obtained from the corresponding STORM images (Fig. 3c). In addition to the SR reconstruction of diverse organelles, SFSRM also demonstrates remarkable robustness to changes in imaging conditions including different imaging systems (Fig. 3d) and different spectra (Fig. 3e). Therefore, it serves as a versatile tool to transform different types of LR images to their SR counterparts by overcoming the limitations of SR microscopy such as requiring fluorophore blinking, long acquisition time, and high illuminance.

**Fig. 3: SFSRM applies to different subcellular structures, imaging systems, and spectra.**

SFSRM enables live-cell SR imaging at millisecond temporal resolution

Benefiting from its capability of reconstructing an SR image from a low-SNR LR image, SFSRM achieves SR imaging of ER in live cells at low illuminance (e.g., 15 W/cm²) (Fig. 4a), which allows long-term observation of ER dynamics for over 5000 frames without an apparent shrinking of the ER network or bleaching of fluorescent signals (Supplementary Movie 1). We assessed the reconstruction fidelity via resolution-scaled error analysis⁴¹ and network ensemble disagreement (see more details about disagreement analysis in Supplementary Note 2, Section II). The resolution-scaled error analysis measures a resolution-scaled error and a resolution-scaled Pearson coefficient to indicate the correlation between the SR and WF images based on their intensity distribution. As shown in the error maps in Fig. 4b, no significant artifacts were found. Instead, we observed some errors in the upper corner of the error maps which gradually fade out. This might be caused by the non-linear mapping between the SR images and the WF images, since the STORM images, which are regarded as the GT images of the network, cannot preserve the intensity information in the WF images. In contrast to the error maps, which indicate apparent errors occurring in the upper corner, the disagreement maps suggest that some abnormally thin ER tubules in the reconstructed images could be problematic (Fig. 4b, disagreement map; Supplementary Fig. 20).

**Fig. 4: SFSRM enables noninvasive super-resolution imaging in live cells at millisecond temporal resolution for thousands of frames.**

The enhancements in both spatial and temporal resolutions can promote the visualization of mitochondrial dynamics in live cells (Fig. 4c). The SR imaging of mitochondria at 100 Hz reveals frequent “kiss-and-run” interactions between mitochondria at the millisecond scale, which are indistinguishable in the LR time-lapse images (Fig. 4d and Supplementary Movie 2). Because of the high local signal density in mitochondria, it is necessary to check whether the observed fusion and fission events are true mitochondrial interactions or reconstruction artifacts. Therefore we investigated the network reconstruction consistency of the image sequence. We used the network to reconstruct a sequence of WF images of mitochondria in fixed cells and analyzed the consistency of the SR image sequence by calculating the pixel-wise agreement score (Supplementary Fig. 21a). Comparing the agreement map with the STORM image, no obvious reconstruction errors are detectable at the junctions where the mitochondrial connections are solid (Supplementary Fig. 21b, solid connections). However, reconstruction errors occur at junctions with ambiguous connections, which can further induce artificial fusion/fission events. Fortunately, these junctions can be detected by the agreement map (Supplementary Fig. 21b, ambiguous connections). This suggests that we can use the agreement map to detect the problematic junctions in the SR images. However, it is not feasible to acquire multiple WF images of the same sample to calculate the agreement map in live-cell imaging. Hence we used the five adjacent frames in the SR time-lapse images to check the temporal consistency (Supplementary Fig. 22a), which helps to filter out structures with low agreement scores in the five adjacent frames. From the filtered SR time-lapse images, we observed a higher transient fission and fusion frequency in the SR image sequence compared to those observed from the LR image sequence (Fig. 4f). 
Besides, the mitochondrial morphological changes can also be more precisely quantified with the aid of SFSRM. Figure 4e indicates the mitochondria undergo diverse morphological changes; however, these changes can barely be detected in the LR time-lapse images (Supplementary Movie 2). Clear tracking of the mitochondrial morphological changes enabled by SFSRM improves the segmentation accuracy and discloses a more rapid mitochondrial area change (Fig. 4g) that might be associated with the transient fusion which is reported to enhance the functional stability and plasticity of mitochondria⁴².
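The five-frame temporal consistency check used for the mitochondrial sequences can be sketched as follows. The text does not define the agreement score, so the definition here is an assumption: for each pixel, the fraction of frames in a five-frame window in which that pixel is foreground. The threshold values and the names `temporal_agreement` and `filter_low_agreement` are likewise illustrative.

```python
import numpy as np


def temporal_agreement(stack, window=5, rel_threshold=0.5):
    """Fraction of frames in a sliding window in which each pixel is foreground.

    stack: array of shape (T, H, W) holding the SR time-lapse frames.
    A pixel counts as foreground when it exceeds rel_threshold times the
    maximum of its own frame (an assumed, simplistic segmentation).
    """
    frame_max = stack.max(axis=(1, 2), keepdims=True)
    masks = stack > rel_threshold * frame_max
    half = window // 2
    n_frames = stack.shape[0]
    scores = np.zeros(stack.shape, dtype=float)
    for t in range(n_frames):
        lo, hi = max(0, t - half), min(n_frames, t + half + 1)
        scores[t] = masks[lo:hi].mean(axis=0)
    return scores


def filter_low_agreement(stack, scores, min_score=0.6):
    """Zero out pixels whose agreement score falls below min_score."""
    return np.where(scores >= min_score, stack, 0.0)


# Toy sequence: a persistent horizontal structure plus a one-frame blip.
stack = np.zeros((7, 8, 8))
stack[:, 2, :] = 1.0
stack[3, 5, 5] = 1.0
scores = temporal_agreement(stack)
filtered = filter_low_agreement(stack, scores)
```

In the toy sequence the persistent structure keeps a score of 1 and survives, while the single-frame blip scores 1/5 and is removed. A filter of this kind would also suppress genuinely fast events that last only one or two frames, so the window length and threshold trade sensitivity against robustness.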

Real-time SFSRM imaging has also revealed some dynamics of microtubules that were unexplored in previous studies. Compared to the wide-field imaging result, the tangled microtubule network is more clearly resolved in the SR image with various morphologies such as bending, crossing, and bundles (Fig. 4h, SR; Supplementary Movie 3, part I). Besides, deformation dynamics of microtubules, such as bending (Supplementary Fig. 23a and Supplementary Movie 3, part II) and growth and shrinkage instability (Supplementary Fig. 23c and Supplementary Movie 3, part II), are recorded at high temporal resolution (10-ms intervals), which allows us to capture high-frequency fluctuations including the time-varying bending (Supplementary Fig. 23b; 100 Hz) and the random-walk growth trajectory (Supplementary Fig. 23d; 100 Hz). These results suggest that the intracellular dynamics at the millisecond scale may be greatly underestimated at a low sampling frequency (Supplementary Figs. 23b,d; 2 Hz). Moreover, we also noticed the intracellular transverse fluctuation of microtubules from the temporal-coded image of the SR image sequence, which is undetectable in the LR counterpart (Fig. 4j). We observed that the transverse position of the microtubule varies over an ~600 nm range (Fig. 4k and Supplementary Movie 3, part III), which is composed of displacements ranging from −100 nm to 100 nm at 50 ms intervals (Fig. 4l), far larger than the system drift (<10 nm in 50 s; Supplementary Fig. 24a). To determine whether these displacements are true microtubule vibrations or noise-induced reconstruction misplacements, fixed microtubules were imaged at different illumination intensities to get time-lapse images at different SNRs. The statistical results in Supplementary Fig. 24b indicate that fewer than 18% of microtubules present noise-induced reconstruction misplacements and that these misplacements are within the ±25 nm range when the SNR is >7.
However, in the SFSRM-reconstructed live-cell image sequences, we observed ~46% of displacements in the ±25 nm range and ~18% of displacements with magnitudes between 25 nm and 100 nm. Besides, we also observed that the microtubule can rapidly move towards one direction in a short time, as shown in Supplementary Fig. 24c, and the distributions of the displacements shift toward that direction correspondingly, suggesting that true microtubule fluctuations rather than random reconstruction misplacements were present in our SFSRM live-cell imaging. The rapid and random fluctuations of microtubules could further cause changes in local microtubule network morphology such as bundle instability (Fig. 4i), which may be involved in multiple cellular functions, such as organizing and maintaining cell shape⁴³, promoting cilia movement⁴⁴, and modulating cargo transport⁴⁵.

SFSRM reveals the millisecond dynamics of cargo trafficking in live cells

Intracellular transport plays an essential role in maintaining cellular functions. Many cellular processes rely on the transport system to deliver proteins or organelles to a specific functional location. External cargos such as viruses and nanoparticles also utilize the transport system to deliver their genomes or drugs to specific compartments for function⁴⁶. Considering that the cytoplasm of eukaryotic cells is highly crowded and dynamic, how cargo is delivered across the cytoplasm to specific positions remains largely unclear. Previous studies have reported that microtubules serve as highways to deliver cargo between the perinuclear region and the cell periphery in rapid and directed motions involving motor proteins⁴⁷. Recently, facilitated by single-particle tracking techniques^48,49, mounting dynamic behaviors during the cargo transport process, e.g., back-and-forth movement, rotation, pause, and switching direction, have been discovered, suggesting that rapid and directed motions are frequently interrupted. Some in vitro studies have suggested that the intersections of the microtubules are likely to interfere with cargo transport and form tethering points for cargo⁴⁵. Single-particle tracking combined with confocal microscopy or STORM microscopy has also been employed to investigate vesicle behavior at microtubule intersections in live cells^48,49,50. However, confocal microscopy fails to provide a high-resolution microtubule map, while STORM requires the sequential imaging of vesicles and microtubules. In addition, to register the vesicle trajectory along microtubules in STORM images, microtubule dynamics are stabilized by paclitaxel and nocodazole during live-cell imaging⁵⁰. The lack of real-time high-resolution microtubule imaging has impeded the further exploration of cargo-microtubule interactions. Hence, the underlying mechanism of the complicated dynamics of vesicular trafficking along microtubules remains largely unknown.

Here, benefiting from the high spatiotemporal resolution offered by SFSRM, we can simultaneously monitor the cargo and microtubule dynamics, thereby investigating how the observed microtubule dynamics could affect cargo transport. As a demonstration, we imaged the intracellular transport of the endocytic trafficking of epidermal growth factor (EGF) protein in live cells. The internalization of EGF was recorded with dual-channel SFSRM (Fig. 5a and Supplementary Movie 4, part I). High-spatiotemporal-resolution videometry reveals the vesicle transport details, from which we noticed that slight fluctuations of a single microtubule do not interrupt the directed transport of vesicles, but the motions of the vesicles along the fluctuating microtubules are significantly more dynamic than expected. Figure 5b illustrates three examples of vesicle transport dynamics: (I) moving back and forth along a microtubule, (II) moving around a microtubule along a sinusoidal-like trajectory (Supplementary Fig. 25), and (III) colliding with other vesicles and then changing direction (Supplementary Movie 4, part II). These subtle and fast random walks are undetectable at low spatial (Fig. 5c) or temporal (Fig. 5d) resolutions (Supplementary Movie 4, part II), which implies that vesicle movement is scale-dependent. At the millisecond scale, thermal diffusion is dominant (Fig. 5e, 100 Hz, α = 0.25), and at the second scale, directed transport dominates (Fig. 5e, 100 Hz, α = 1.2)⁵¹. At inadequate imaging speeds, these diffusive motions would have been missed, as manifested by the distinct trajectories derived by the images taken at 2 Hz and 100 Hz shown in the MSD plot (Fig. 5e, 2 Hz vs. 100 Hz). Consequently, the actual instantaneous velocity of vesicles during transport, which is ~4 µm/s (Fig. 5f, 100 Hz), would have been substantially underestimated (estimated as ~0.5 µm/s in Fig. 5f at 2 Hz, which is in accordance with a previous report⁴⁹).
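The dependence of the measured motion on sampling rate can be illustrated with a short sketch. It computes a time-averaged mean squared displacement (MSD), fits the exponent α in MSD ∝ τ^α on a log-log scale, and compares frame-to-frame speeds for the same track sampled at 100 Hz and 2 Hz. The synthetic track (slow drift plus fast jitter) and all of its parameter values are made up for illustration. It is not meant to reproduce the α values or velocities reported in Fig. 5.

```python
import numpy as np


def msd(track, max_lag):
    """Time-averaged mean squared displacement for lags 1..max_lag (in frames)."""
    lags = np.arange(1, max_lag + 1)
    values = np.array([
        np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, values


def anomalous_exponent(lags, values, dt):
    """Slope of log(MSD) against log(lag time): alpha in MSD ~ tau**alpha."""
    slope, _intercept = np.polyfit(np.log(lags * dt), np.log(values), 1)
    return slope


def mean_speed(track, dt):
    """Average frame-to-frame speed, the 'instantaneous velocity' at this rate."""
    steps = np.linalg.norm(np.diff(track, axis=0), axis=1)
    return steps.mean() / dt


# Synthetic track in micrometres: slow drift plus fast jitter (made-up values).
rng = np.random.default_rng(0)
dt_fast = 0.01                                   # 100 Hz
n = 5000
drift = np.outer(np.arange(n) * dt_fast, [0.5, 0.0])
jitter = np.cumsum(rng.normal(0.0, 0.03, (n, 2)), axis=0)
track_fast = drift + jitter
track_slow = track_fast[::50]                    # same motion sampled at 2 Hz

speed_fast = mean_speed(track_fast, dt_fast)
speed_slow = mean_speed(track_slow, 50 * dt_fast)
lags, values = msd(track_fast, 10)
alpha_fast = anomalous_exponent(lags, values, dt_fast)
```

In this toy model the 2 Hz sampling averages out the jitter, so `speed_slow` comes out well below `speed_fast` even though both describe the same trajectory. This is the same qualitative effect the text describes for vesicle velocities measured at 2 Hz versus 100 Hz.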

**Fig. 5: Dual-color real-time SFSRM imaging reveals the microtubule-vesicle interactions.**

In addition to these subtle diffusive motions, we also observed nondirected transport, which has been reported in previous studies using single-particle tracking (SPT) but not fully explained^49,52. Compared to SPT, our SFSRM can not only precisely determine the vesicle positions following its moving trajectory (Supplementary Fig. 26), but also resolve the vesicle morphology and orientation, thus enabling the study of the interaction of vesicles with their surroundings in a dynamic process. From the dual-channel video via SFSRM (Supplementary Movie 4), we observed some microtubule dynamics that might contribute to nondirected vesicle transport. For example, the transverse movement of a microtubule can transfer the vesicles attached to it to a nearby microtubule (Fig. 5g, first row), and the fluctuations in the surrounding microtubules can cause vesicles to switch among different microtubules (Fig. 5g, second row), resulting in nondirected transport (Supplementary Movie 4, part III). We compared the instantaneous velocities of directed transport and nondirected transport in Fig. 5h and found that nondirected movements have an ~2-fold higher average instantaneous velocity and a four-fold broader range of distribution than directed movements, indicating that these displacements are likely related to microtubule fluctuations rather than motor-driven movement. This observation is in good agreement with the results of a previous study by Giannakakou et al.⁵³, who reported that the suppression of microtubule dynamics enhanced nuclear-targeted cargo P53 accumulation near the cell nucleus. More interestingly, the movement of vesicles can in turn contribute to the microtubule morphology change (Supplementary Fig. 27).

Since microtubules are densely distributed, aside from providing tracks for vesicles, they also form intersections that may interrupt vesicle transport. Previous studies have shown that vesicles may pass, pause, switch, or reverse at an intersection⁴⁵,⁵⁰. In our experiments, we noticed that all vesicles eventually passed the observed intersections; however, the dwell time varied greatly and largely depended on the complexity of the intersection. To further quantify how the complexity of the intersection would affect the vesicle transport, we first assessed the accuracy of our network in reconstructing the microtubule network morphology. As shown in Supplementary Fig. 28, we quantitatively analyzed, as a function of signal density, the reconstruction errors that may affect the identification of intersections: reconstructing an artificial microtubule (false positive error) or missing a microtubule (false negative error) (Supplementary Fig. 28b), and false positive or negative intersections or a wrong microtubule number at the intersection (Supplementary Fig. 28c). As expected, the error rate increases as the signal density increases. To ensure that the reconstruction error rate is smaller than 15% (corresponding to a HAWKMAN score >0.8), we selected regions with signal density <50% for the following analysis. The intersections in the selected regions were classified into three groups based on the number of microtubules at each intersection (Fig. 5i). For the simplest intersections of two microtubules, the vesicle can easily pass through by climbing over one microtubule, usually within two seconds, and the microtubule vibration is unlikely to interrupt vesicle transport. For intersections with 3–5 microtubules, the vesicles tended to interfere with the dynamics of nearby microtubules. Thus, if the surrounding microtubules fluctuate severely, the vesicles are hindered, and the time needed to pass through these intersections ranged from several to ten seconds. For intersections involving more than 5 microtubules tethered together, the vesicles were most likely to be trapped at the intersection until the fluctuations of the surrounding microtubules became coordinated and the stellate intersection loosened. However, coordinated fluctuations and intersection loosening are highly uncertain, and such processes may take tens of seconds to minutes (Supplementary Movie 4, part IV). A statistical comparison of the dwell time at different kinds of intersections is shown in Fig. 5j. Generally, the more complex the intersection is, the longer the resulting dwell time. For intersections involving more than 5 microtubules, the dwell time could be longer than one minute. Fortunately, this kind of intersection only accounts for ~9% of all intersections in a cell, whereas more than half of the intersections consist of only 3–5 microtubules (Fig. 5j, pie chart).

SFSRM is robust to different imaging systems and different samples

In the above demonstration, we used a Zeiss Elyra 7 microscope as the live-cell imaging system. Here, we demonstrate that SFSRM can also be applied in different live-cell imaging systems without retraining the networks. We validated the robustness of our network based on a commercial confocal microscope (Confocal sp8, Zeiss). Compared to that used for WF imaging, a confocal microscope requires a longer time (2.5 s/frame) to obtain a dual-channel image due to the point scanning strategy used. Here, we recorded the EGF receptor (EGFR) protein transport dynamics at 0.4 Hz for over 10 min after EGF treatment (Supplementary Movie 5, part I). Long-term observation allows us to discover some long-time-scale phenomena. For example, as shown in Fig. 6a, the EGFR protein gradually accumulates in endosomes, which appear as ring structures in the image. We noticed that the microtubules tended to generate local grids to trap the endosomes, as shown in the zoomed-in view in Fig. 6a. These traps will actively participate in the transport (Fig. 6b) and fusion processes of the endosomes (Fig. 6c), and the morphology of these grids will dynamically change in response to the endosome shape (Supplementary Movie 5, part II).

**Fig. 6: SFSRM imaging allows long-time-scale observation.**

SFSRM can also be applied to monitor different live-cell dynamic processes involving various subcellular structures. For example, Fig. 7a shows the colocalization of clathrin protein and EGFR protein after EGF treatment, indicating the role of clathrin protein during EGF endocytosis. In addition to mediating the endocytosis process of EGF by generating ring-shaped CCPs (Supplementary Fig. 29; Supplementary Movie 6), clathrin protein is also recruited to endosomes that are larger than endocytic vesicles (Fig. 7b). During the fusion process of two adjacent endosomes, the membranes of endosomes that are uncoated with clathrin fused with each other. After the two endosomes are fully fused, the extra clathrin protein is released from the fused endosomes (Supplementary Movie 6). The function of the clathrin protein on endosomes has been reported in cargo sorting⁵⁴. After being delivered to the early endosomes, some of the endocytic EGFR will be sorted to tubular structures to be retrieved back to the cell surface⁵⁵. As demonstrated in Fig. 7c, we detected that the endocytic EGFR in endosomes was sorted to the tubular membranes on endosomes, and then vesicles enriched with EGFR were generated via the fission of tubular membranes (Supplementary Movie 6).

**Fig. 7: Dual-color real-time SFSRM imaging reveals subcellular dynamics of diverse organelles.**

Besides, SFSRM also reveals the intensive contact between mitochondria and endoplasmic reticulum (ER) (Fig. 7d and Supplementary Movie 7), such as mitochondrial fission at the ER-mitochondria contact site⁵⁶ (Fig. 7e), mitochondrial growth and branching along the ER tubules, and ER tubule hitchhiking on a moving mitochondrion⁵⁷ (Supplementary Fig. 30 and Supplementary Movie 7). These subtle yet fast interplays between different organelles, which can happen within a second, can be revealed by SFSRM, indicating that its high spatiotemporal resolution can contribute to biological studies involving live-cell dynamic processes.

Discussion

We developed an SFSRM method for single-frame SR reconstruction from the LR images acquired from live cells with up to 10-fold resolution improvement. We have demonstrated that SFSRM restores subtle structures from LR images owing to the adoption of multicomponent loss and shows superior robustness to the corruption of noise with the assistance of the edge map. Besides, by employing a dual subnet framework that progressively improves the image SNR and resolution, the single-frame image conversion of SFSRM circumvents the possible tradeoffs among the spatial resolution, imaging speed, and light dose in the SR microscopies and allows long-term SR imaging in live cells without inducing noticeable photodamage to the cells. The coupling of SFSRM with different live-cell imaging systems improves the spatiotemporal resolution of fluorescence microscopes with a limited photon budget, thus serving as a powerful tool combining merits of the super-resolution microscopy and real-time live-cell imaging.

Nonetheless, as with all other learning-based methods, SFSRM faces accuracy concerns. Although it achieves a higher average accuracy than previous methods (Fig. 2 and Supplementary Figs. 16 and 18), SFSRM still has local reconstruction errors. Therefore, we carefully inspected SFSRM reconstructions against their corresponding GT images on fixed samples using multiple metrics, including reconstruction bias, MS-SSIM, and HAWKMAN score. From our observations, the reconstruction accuracy of SFSRM depends on the SNR and signal density of the input image (Supplementary Note 2, Supplementary Figs. 5, 7–14, 17). It achieves a generally satisfactory accuracy (mean reconstruction bias <1 pixel, HAWKMAN score >0.8, error rate <0.15) when the input SNR is >7 and the signal density is ≤0.5. In live-cell applications, where the GT images are inaccessible, we assessed the reconstructions based on their similarity with the LR images (Fig. 4b) and on reconstruction uncertainty analysis of noise ensemble images of the same sample, network ensembles, and adjacent frames in live-cell imaging (Supplementary Note 2 and Supplementary Figs. 20–22). Considering that SFSRM is robust to different spectra and imaging systems (Fig. 3d, e), we believe our prior validation of SFSRM on different SNRs, signal densities, and structures on fixed samples, combined with the quality check methods in live-cell imaging, can make it a practical tool in live-cell super-resolution imaging. In addition to these on-average accuracy evaluations, we also suggest custom-designed evaluations for specific applications. For example, when we were investigating the microtubule’s transversal fluctuations, we validated the pixel-level reconstruction consistency of SFSRM under different SNRs on the fixed cells in advance (Supplementary Fig. 24); and when we were studying the influence of the microtubule intersections on the vesicle transport, we quantified the precision of SFSRM for detecting the correct number of microtubules at each intersection beforehand (Supplementary Fig. 28).

The implementations of SFSRM with common fluorescent microscopes have allowed high-frequency transverse vibration of microtubules and surprisingly dynamic behaviors of vesicles such as diffusive motions along microtubules, swinging on microtubules, and switching between microtubules to be clearly resolved. Many of these processes, which have not been seen before, enhance our understanding of the real intracellular transport environment. Moreover, other subcellular processes revealed by SFSRM such as clathrin-endosome colocalization and ER-mitochondria interactions, suggest the potential of SFSRM in promoting the investigation of subcellular processes that necessitate interpreting temporal dynamics in the context of ultrastructural information, which may open doors to discoveries in live-cell imaging involving organelle dynamics⁵⁸ and interactions⁵⁹.

Overall, we consider SFSRM a useful alternative to traditional SR microscopies in challenging conditions such as low illuminance, short acquisition time, or multi-channel SR imaging. Its success lies in the use of adequate training data obtained from advanced SR microscopies. Considering that these high-cost SR imaging systems are not available in most biological laboratories, we hope SFSRM can make the most of the available SR datasets and serve more researchers. Nevertheless, in a realistic experimental setting, the risk that structural features present in the experimental images fall outside the range of the training dataset cannot be precluded. For example, the curvatures of the microtubules or the diameters of the clathrin-coated pits may lie outside the curvature/diameter range of the training dataset, or there may be a cruder mismatch, such as a network trained on endoplasmic reticulum being used to process images of microtubules. From our observations in Supplementary Note 3 (Supplementary Figs. 31–33) and Supplementary Fig. 34, the SFSRM network is not able to handle such scenarios. A convenient way to address this problem is to fine-tune the trained network using a matched training dataset, which helps the network quickly adjust to a different structural feature space.

Beyond our approach, we observed that the network can reconstruct the SR image from a 10-fold blurred LR image without difficulty, whereas reconstructing a noise-corrupted image requires much more effort, such as the edge map assistance and the multicomponent loss function. This suggests that improving the SNR of the network input and introducing prior regularization (e.g., total variation prior⁶⁰, sparse prior⁶¹) are useful strategies to improve the reconstruction quality and could be further investigated to improve the performance of other networks.

Methods

SFSRM network

Network architecture

The networks of our SFSRM, including the SEN and SRN, are based on the ESRGAN generator²⁵, which includes 23 residual-in-residual dense blocks used to map low-resolution images to super-resolution images. By inheriting the basic architecture of SRGAN²⁴, this network performs most computations in the LR feature space, hence reducing complexity and achieving high stability without requiring batch normalization (BN) layers²⁵. The original ESRGAN is designed for a single RGB image. When it was applied to a grayscale image, we found that the network will easily crash at the beginning of or during the training process if we duplicate the grayscale image three times to generate a fake RGB input. Therefore, we adopted a single-channel ESRGAN generator. To incorporate the prior information provided by the edge map, we added another input channel to the network. The input LR image and the corresponding edge map are initially concatenated to generate a two-channel input to the generator. Similarly, duplicated grayscale images are used as fake RGB inputs to the well-trained VGG network⁶² for feature map extraction.

Loss functions

To generate high-resolution details while maintaining high fidelity, the network is trained with a multicomponent loss function, as follows:

(1) Content loss. The L₁-norm loss evaluates the distance between an estimated SR image ${\rm{G}}\left(x\right)$ and a GT image $y$. It focuses on pixel differences, thus allowing the network to quickly converge but often resulting in a blurred image.
$${L}_{1}={\left\Vert {\rm{G}}\left(x\right)-y\right\Vert }_{1}$$
(1)

MS-SSIM measures the structural similarity of SR and GT images based on luminance, contrast, and structure at different scales. The computation of MS-SSIM is detailed in the assessment metrics section. Here, we focus on the construction of the loss function. MS-SSIM loss is defined as:
$${L}_{{{{{{\rm{MS}}}}}}-{{{{{\rm{SSIM}}}}}}}=1-{{{{{\rm{MS}}}}}}-{{{{{\rm{SSIM}}}}}}\left({{{{{\rm{G}}}}}}\left(x\right),y\right)$$
(2)

Content loss is a hybrid of MS-SSIM loss and L₁-norm loss and is noted as MS-SSIM-L1 loss:
$${L}_{{{{{{\rm{MS}}}}}}-{{{{{\rm{SSIM}}}}}}-{{{{{\rm{L}}}}}}1}=\alpha \cdot {L}_{{{{{{\rm{MS}}}}}}-{{{{{\rm{SSIM}}}}}}}+(1-\alpha )\cdot {L}_{1}$$
(3)
where $\alpha$ is used to balance the contributions of MS-SSIM loss and ${L}_{1}$-norm loss and is empirically set as $\alpha=0.84$⁶³.

(2) Perceptual loss ${L}_{{\rm{Percep}}}$ measures the distance between the features of the estimated SR image and those of the corresponding GT image. Features are extracted by a VGG network⁶² that is pretrained for material recognition and is good at texture extraction.
$${L}_{{{{{{\rm{Percep}}}}}}}={{{{{\rm{||F}}}}}}\left({{{{{\rm{G}}}}}}(x)\right)-{{{{{\rm{F}}}}}}(y){{{{{\rm{||}}}}}}$$
(4)
where F represents the feature extraction network.

(3) Adversarial loss estimates the probability that the discriminator input is real or fake. Here we use U-net as the discriminator, which has an encoder and a decoder. The discriminator is trained to provide both global and pixelwise decisions on whether the input image is real or fake⁶⁴. Specifically, an input real image $y$ or fake image ${\rm{G}}(x)$ is first gradually convolved by the encoder down to one pixel to get a global decision on whether this image is real or fake; the input is then gradually deconvolved by the decoder to its original size to get a per-pixel decision on whether each pixel is real or fake. The encoder and decoder are trained by the following losses:
$${L}_{{{{{\rm{enc}}}}}}=-{{{{{\rm{E}}}}}}\left[{\log }{D}_{{{{{\rm{enc}}}}}}\left( y \right)\right]-{{{{{\rm{E}}}}}}\left[{\log }(1-{D}_{{{{{{\rm{enc}}}}}}}\left({{{{{\rm{G}}}}}}(x)\right))\right]$$
(5)
$${L}_{{{{{{\rm{dec}}}}}}}=-{{{{{\rm{E}}}}}}\left[\mathop{\sum}\limits_{i,j}{\log }{\left[{D}_{{{{{{\rm{dec}}}}}}}\left(y\right)\right]}_{i,j}\right]-{{{{{\rm{E}}}}}}\left[\mathop{\sum}\limits_{i,j}{\log }(1-{\left[{D}_{{{{{{\rm{dec}}}}}}}\left({{{{{\rm{G}}}}}}(x)\right)\right]}_{i,j})\right]$$
(6)
where ${D}_{{{{{{\rm{enc}}}}}}}\left( \cdot \right)$ is the encoder decision of the whole input and ${\left[{D}_{{{{{{\rm{dec}}}}}}}\left(\cdot\right)\right]}_{i,j}$ is the decoder decision at pixel $\left(i,\, j\right)$; E[·] represents taking the average for all data in the minibatch. The discriminator is trained by both encoder loss and decoder loss.
$${L}_{D}={L}_{{{{{{\rm{enc}}}}}}}+{L}_{{{{{{\rm{dec}}}}}}}$$
(7)

Correspondingly, the discriminator feedback to the generator, i.e., the adversarial loss is formulated as
$${L}_{{{{{{\rm{Adv}}}}}}}=-{{{{{\rm{E}}}}}}[{\log }{D}_{{{{{{\rm{enc}}}}}}}\left({{{{{\rm{G}}}}}}(x)\right)]-{{{{{\rm{E}}}}}}\left[\mathop{\sum}\limits_{i,j}{\log }{\left[{D}_{{{{{{\rm{dec}}}}}}}\left({{{{{\rm{G}}}}}}(x)\right)\right]}_{i,j}\right]$$
(8)

(4) Frequency loss compares the frequency difference between an estimated SR image and the original GT image:

$${L}_{{{{{{\rm{Freq}}}}}}}={{{{{\rm{||FFT}}}}}}\left({{{{{\rm{G}}}}}}(x)\right)-{{{{{\rm{FFT}}}}}}(y){{{{{\rm{||}}}}}}$$

(9)

where FFT is the fast Fourier transformation function. We compared all frequency components when the GT images do not contain noise and 75% of frequency components when the GT images contain noise, for experimental images as well as some simulation images.
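As a rough illustration of the frequency loss in Eq. (9), the sketch below compares the 2D Fourier transforms of two images with NumPy. The text does not specify which 75% of the frequency components are retained for noisy GT images, so keeping the lowest spatial frequencies is an assumption made here, as are the function name `frequency_loss`, the `keep_fraction` parameter, and the use of a mean absolute difference as the norm.

```python
import numpy as np

def frequency_loss(sr, gt, keep_fraction=1.0):
    """Mean absolute difference between the 2D FFTs of an SR and a GT image.

    keep_fraction < 1 restricts the comparison to the lowest spatial
    frequencies (assumption: the retained components are the low-frequency ones).
    """
    diff = np.abs(np.fft.fft2(sr) - np.fft.fft2(gt))
    if keep_fraction < 1.0:
        fy = np.fft.fftfreq(sr.shape[0])[:, None]
        fx = np.fft.fftfreq(sr.shape[1])[None, :]
        radius = np.sqrt(fx ** 2 + fy ** 2)          # radial spatial frequency
        cutoff = np.quantile(radius, keep_fraction)  # radius below which components are kept
        diff = diff[radius <= cutoff]
    return float(diff.mean())

# Toy example: a short line segment, and the same image plus a checkerboard
# pattern that lives entirely at the highest (Nyquist) frequency.
gt = np.zeros((8, 8))
gt[2:6, 3] = 1.0
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 2.0 - 1.0
loss_all = frequency_loss(gt + checker, gt)         # all components compared
loss_low = frequency_loss(gt + checker, gt, 0.75)   # highest frequencies ignored
```

With the full spectrum the checkerboard is penalized, whereas with `keep_fraction=0.75` it is ignored, which mirrors the idea of not forcing the network to reproduce high-frequency noise present in the GT image.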

When using the ESRGAN as SEN for signal enhancement, the network only uses LR images as single-channel inputs. The training of the SEN includes two steps:

(1) Training with MS-SSIM-L1 loss for ~100,000 minibatch iterations at a 3 × 10⁻⁴ learning rate:
$${L}_{{\rm{G}}}={L}_{{\rm{MS}}-{\rm{SSIM}}-{\rm{L}}1}$$
(10)

(2) Training with MS-SSIM-L1 loss and perceptual loss for 20,000 to 50,000 minibatch iterations at a 1 × 10⁻⁴ learning rate:

$${L}_{{\rm{G}}}={L}_{{\rm{MS}}-{\rm{SSIM}}-{\rm{L}}1}+\delta \cdot {L}_{{\rm{Percep}}}$$

(11)

where $\delta$ is the coefficient to balance different loss components and we empirically set $\delta=0.1$.

When using the ESRGAN as SRN for super-resolution restoration, the network uses both LR images and edge maps as inputs. The training process also includes two stages. The first stage uses the same loss function as the SEN, and the second stage uses the following loss function with a 5 × 10⁻⁵ learning rate for ~10,000 minibatch iterations.

$${L}_{{\rm{G}}}={L}_{{\rm{MS}}-{\rm{SSIM}}-{\rm{L}}1}+\delta \cdot {L}_{{\rm{Percep}}}+\beta \cdot {L}_{{\rm{Adv}}}+\gamma \cdot {L}_{{\rm{Freq}}}$$

(12)

In our experiments, we empirically set $\delta=0.1$, $\beta=0.001$, and $\gamma=0.01$.
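The way the loss components are weighted across the training stages (Eqs. 3 and 10–12) can be summarized in a small sketch. It operates on already-computed scalar loss values and does not implement the individual losses; the function names and the stage labels (`"content"`, `"content_percep"`, `"full"`) are hypothetical, while the weights are the empirical values stated in the text.

```python
def ms_ssim_l1_loss(ms_ssim_value, l1_value, alpha=0.84):
    """Eq. (3): alpha * (1 - MS-SSIM) + (1 - alpha) * L1."""
    return alpha * (1.0 - ms_ssim_value) + (1.0 - alpha) * l1_value

def generator_loss(stage, ms_ssim_value, l1_value, percep=0.0, adv=0.0, freq=0.0,
                   delta=0.1, beta=0.001, gamma=0.01):
    """Weighted generator loss for a given (hypothetically named) training stage."""
    loss = ms_ssim_l1_loss(ms_ssim_value, l1_value)   # Eq. (10): first stage
    if stage in ("content_percep", "full"):
        loss += delta * percep                        # Eq. (11): SEN second stage
    if stage == "full":
        loss += beta * adv + gamma * freq             # Eq. (12): SRN second stage
    return loss

# Illustrative scalar values only, not measurements from the paper.
stage1 = generator_loss("content", ms_ssim_value=0.9, l1_value=0.05)
stage2_sen = generator_loss("content_percep", 0.9, 0.05, percep=0.2)
stage2_srn = generator_loss("full", 0.9, 0.05, percep=0.2, adv=1.0, freq=0.5)
```

Because β and γ are small, the adversarial and frequency terms act as minor corrections on top of the content and perceptual terms in this example.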

Assessment metrics

Multiscale structure similarity (MS-SSIM)⁶⁵ quantifies the similarity of two images and is an improvement of SSIM⁶⁶, which assesses the similarity between two images, $x$ and $y$, based on three factors: luminance $l\left(x,\, y\right)$, contrast $c\left(x,\, y\right)$, and structure $s\left(x,y\right)$.

$$l\left(x,y\right)=\frac{2{u}_{{\rm{x}}}{u}_{{\rm{y}}}+{C}_{1}}{{u}_{{\rm{x}}}^{2}+{u}_{{\rm{y}}}^{2}+{C}_{1}}$$

(13)

$$c\left(x,y\right)=\frac{2{\sigma }_{{\rm{x}}}{\sigma }_{{\rm{y}}}+{C}_{2}}{{\sigma }_{{\rm{x}}}^{2}+{\sigma }_{{\rm{y}}}^{2}+{C}_{2}}$$

(14)

$$s\left(x,y\right)=\frac{{\sigma }_{{\rm{xy}}}+{C}_{3}}{{\sigma }_{{\rm{x}}}{\sigma }_{{\rm{y}}}+{C}_{3}}$$

(15)

where ${u}_{{\rm{x}}},\, {u}_{{\rm{y}}}$ represent the averages of $x,\, y$; ${\sigma }_{{\rm{x}}},\, {\sigma }_{{\rm{y}}}$ represent the standard deviations of $x,\, y$; ${\sigma }_{{\rm{xy}}}$ is the covariance of $x$ and $y$; ${C}_{1}$, ${C}_{2}$ and ${C}_{3}$ are small constants given by ${C}_{1}={({K}_{1}L)}^{2}$, ${C}_{2}={({K}_{2}L)}^{2}$, and ${C}_{3}={C}_{2}/2$. Here $L$ is the dynamic range of pixel values, and ${K}_{1}$ and ${K}_{2}$ are two scalar constants.

The general form of SSIM is defined as:

$${{{{{\rm{SSIM}}}}}}\left(x,\, y\right)={\left[l\left(x,\, y\right)\right]}^{\alpha }{\left[c\left(x,\, y\right)\right]}^{\beta }{\left[s\left(x,\, y\right)\right]}^{\gamma }$$

(16)

where $\alpha$, $\beta$, and $\gamma$ are parameters used to define the relative importance of the three components and are set to 1 in most cases⁶⁶.
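A minimal sketch of the SSIM index with all three exponents set to 1 is given below. It uses global image statistics on flattened pixel lists, whereas practical implementations usually evaluate SSIM in local windows and average the result. The structure term is taken in its standard form, $({\sigma }_{{\rm{xy}}}+{C}_{3})/({\sigma }_{{\rm{x}}}{\sigma }_{{\rm{y}}}+{C}_{3})$. The values $K_1 = 0.01$ and $K_2 = 0.03$ are commonly used defaults and are an assumption here, since the text does not state them; the function name is hypothetical.

```python
import math
from statistics import fmean

def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized, flattened images using global statistics."""
    n = len(x)
    ux, uy = fmean(x), fmean(y)
    var_x = sum((a - ux) ** 2 for a in x) / (n - 1)
    var_y = sum((b - uy) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - ux) * (b - uy) for a, b in zip(x, y)) / (n - 1)
    sx, sy = math.sqrt(var_x), math.sqrt(var_y)

    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    c3 = c2 / 2.0

    luminance = (2 * ux * uy + c1) / (ux ** 2 + uy ** 2 + c1)
    contrast = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    structure = (cov_xy + c3) / (sx * sy + c3)
    return luminance * contrast * structure

# Toy 8-bit "images" flattened to lists.
image = [0, 50, 100, 150, 200, 250, 30, 220]
inverted = [255 - v for v in image]
same = ssim_global(image, image)
opposite = ssim_global(image, inverted)
```

An image compared with itself scores 1, while the intensity-inverted image scores much lower because the covariance in the structure term becomes negative.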

MS-SSIM is calculated by iteratively applying a low-pass filter and downsampling the filtered image by a factor of 2, and then calculating the SSIM components of the scaled images at each of the $M$ resulting scales. The overall MS-SSIM evaluation is based on combining the measurements at different scales:

$${\rm{MS}}-{\rm{SSIM}}\left(x,\, y\right)={\left[{l}_{M}\left(x,\, y\right)\right]}^{{\alpha }_{M}}\cdot \mathop{\prod }\limits_{j=1}^{M}{\left[{c}_{{\rm{j}}}\left(x,\, y\right)\right]}^{{\beta }_{{\rm{j}}}}{\left[{s}_{{\rm{j}}}\left(x,\, y\right)\right]}^{{\gamma }_{{\rm{j}}}}$$

(17)

where ${l}_{M}$ is the luminance comparison, computed only at the coarsest scale $M$; ${c}_{{\rm{j}}}$ and ${s}_{{\rm{j}}}$ are the contrast and structure comparisons at scale $j$; and ${\alpha }_{M}$, ${\beta }_{{\rm{j}}}$, and ${\gamma }_{{\rm{j}}}$ are used to adjust the relative importance of different components⁶⁵.

HAWKMAN analysis³⁸ assesses the similarity of two images based on their structures rather than their intensity, making it suitable for SMLM images whose intensity is not linearly related to the labeling density. In HAWKMAN analysis, two images are first normalized and blurred by Gaussian kernels with successive sizes up to a user-specified maximum, and the blurred images are then normalized to a maximum intensity of one. Next, the images are binarised based on the local threshold to extract the feature signals. The obtained images are regarded as sharpening images. The sharpening images are further blurred and flattened, re-binarised at a higher threshold, and then skeletonized to get skeletonized images. The skeletonized images are re-blurred with a Gaussian kernel of FWHM equal to the original scale to get the structure images. Finally, the cross-correlations of the sharpening images and the structure images are calculated to yield a confidence score of the test image (here denoted as the HAWKMAN score). A confidence map is also produced, in which a local confidence score below 0.85 indicates that the structures are less trustworthy.

$${{{{{\rm{HAWKMAN}}}}}}\; {{{{{\rm{score}}}}}}=\frac{1}{2}{\min }\left(1,\frac{{{{{{{\rm{PCC}}}}}}}^{{{{{{\rm{sharp}}}}}}}}{0.85}\right)+\frac{1}{2}{\min }\left(1,\frac{{{{{{{\rm{PCC}}}}}}}^{{{{{{\rm{str}}}}}}}}{0.85}\right)$$

(18)

where PCC^sharp and PCC^str are the Pearson correlation coefficients for the sharpening and structure images.
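Eq. (18) can be evaluated directly once the two Pearson correlation coefficients are available. The sketch below covers only this final scoring step; the blurring, binarisation, and skeletonization stages of HAWKMAN are not reproduced, and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long pixel lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def hawkman_score(pcc_sharp, pcc_str, threshold=0.85):
    """Eq. (18): each correlation is scaled by the threshold and capped at 1."""
    return 0.5 * min(1.0, pcc_sharp / threshold) + 0.5 * min(1.0, pcc_str / threshold)

# Correlations at or above 0.85 saturate their half of the score.
full_score = hawkman_score(0.85, 0.95)
partial_score = hawkman_score(0.425, 0.85)
```

Because each term is capped at 1, the score reaches its maximum of 1 as soon as both correlations are at least 0.85.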

Signal density is computed from a GT image by first conducting binarization for the image to extract the signal-containing pixels and then calculating the ratio of the number of signal-containing pixels to the total number of pixels in the image.

$${{{{{\rm{signal\; density}}}}}}=\frac{{{{{{{\rm{Pixel}}}}}}}_{{{{{{\rm{signal}}}}}}}}{{{{{{{\rm{Pixel}}}}}}}_{{{{{{\rm{total}}}}}}}}$$

(19)
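Eq. (19) amounts to counting the pixels that survive binarization. In the sketch below, a fixed intensity threshold stands in for the binarization step, which is an assumption because the text does not state how the threshold is chosen; the function name is hypothetical.

```python
def signal_density(image, threshold):
    """Fraction of pixels whose intensity exceeds `threshold` (Eq. 19).

    `image` is a list of rows; the fixed threshold is an illustrative
    stand-in for the binarization step described in the text.
    """
    total = sum(len(row) for row in image)
    signal = sum(1 for row in image for value in row if value > threshold)
    return signal / total

# Toy 2 x 4 GT image with three signal-containing pixels.
toy_gt = [
    [0, 120, 0, 0],
    [0, 200, 90, 3],
]
density = signal_density(toy_gt, threshold=10)
```

Under the criterion used in the Results, a region would be retained for the intersection analysis only if this value is below 0.5.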

Simulation image generation

For the simulation of polymer lines, simulated polymer chains in a 10 × 10 µm² region were generated in MATLAB. The polymer density was set to 50 polymers per image to mimic a densely distributed microtubule network. The GT image was created by fitting the fluorophore positions to an image with a pixel size of 10 nm and convolved with a Gaussian kernel of a 20-nm FWHM size. For the GT image, no noise and a uniform background were used. Similar to the process of generating the GT image, the corresponding LR image was generated by fitting the fluorophore positions to images with a pixel size of 100 nm and then performing convolution with a Gaussian kernel of a 200-nm FWHM size. In addition to the background, Poisson noise and read noise were added to the LR image.

For the simulation of dot pairs and line pairs, simulated line/dot pairs with a distance randomly decided in the range of 10 nm/20 nm to 50 nm, and randomly distributed in a 10 × 10 µm² region were first generated. The GT image was created by fitting the signal positions to an image with a pixel size of 10 nm and convolved with a Gaussian kernel of a 20-nm FWHM size. The GT images were then blurred by a Gaussian kernel of a 280-nm FWHM size and followed by applying Poisson noise and Gaussian noise to get the LR images.
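When reproducing such simulations, the stated FWHM values have to be converted to the standard deviation of the Gaussian kernel in pixel units, using σ = FWHM / (2√(2 ln 2)). The sketch below works through this conversion and the resulting image sizes for the numbers quoted above. The helper names are hypothetical, and the original MATLAB code may parameterize its kernels differently.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # about 0.4247

def fwhm_to_sigma_px(fwhm_nm, pixel_nm):
    """Gaussian sigma in pixels for a kernel with the given FWHM."""
    return fwhm_nm * FWHM_TO_SIGMA / pixel_nm

def grid_size_px(field_um, pixel_nm):
    """Number of pixels along one side of a square field of view."""
    return round(field_um * 1000.0 / pixel_nm)

# Polymer simulation: 10 x 10 um^2 field.
gt_sigma = fwhm_to_sigma_px(20, 10)      # GT: 20-nm FWHM on a 10-nm grid
lr_sigma = fwhm_to_sigma_px(200, 100)    # LR: 200-nm FWHM on a 100-nm grid
gt_size = grid_size_px(10, 10)           # 1000 pixels per side
lr_size = grid_size_px(10, 100)          # 100 pixels per side

# Dot/line pairs: the 280-nm FWHM blur is applied to the 10-nm GT grid.
pair_sigma = fwhm_to_sigma_px(280, 10)
```

The GT and LR kernels for the polymer simulation have the same width in pixel units (about 0.85 pixels), so the 10-fold difference in resolution comes entirely from the 10-fold difference in pixel size.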

Sample preparation

Cell culture and transfection

The Beas2B cell line was bought from ATCC (CRL-9609) and was grown in Dulbecco’s Modified Eagle Medium (DMEM) (Gibco) supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin/streptomycin at 37 °C. The plasmid constructs used in this study included EGFR-mCherry and EGFR-EGFP (the cDNA encoding human EGFR was ordered from BGI (Beijing, China); the plasmids Str-KDEL_SBP-mCherry-EGFR and Str-KDEL_SBP-EGFP-EGFR were generated by standard molecular cloning procedures; the N-termini of the SBP-EGFP and SBP-mCherry tags are followed by a signal sequence derived from IL-2⁶⁷), Tomm20-EGFP (artificially constructed based on EGFP-N1 backbone), 3XmEmerald-ensconsin (a gift from Prof. Dong Li (University of Chinese Academy of Sciences)), Tomm20-mCherry (artificially constructed based on mCherry-N1 backbone), and EGFP-Sec61β and Halo-clathrin (gifts from Prof. Yuhui Zhang (Huazhong University of Science and Technology)). The day before transfection, cells were seeded into the wells of a 24-well plate with 500 μL culture medium. The indicated plasmid was transfected into cells with Lipofectamine LTX (Invitrogen) according to the standard protocol. The cells were digested with 0.25% trypsin (Thermo Fisher Scientific) 6–8 h after transfection, seeded onto confocal dishes, and cultured at 37 °C with 5% CO₂ for another 24 h.

Staining organelles in fixed cells

For labeling microtubules in fixed cells, Beas2B cells cultured on coverslips after 24 h were stained according to the approach in⁶⁸. Briefly, cells were first washed with cytoskeleton buffer (CB buffer: 10 mM MES of pH 6.1, 150 mM NaCl, 5 mM EGTA, 5 mM D-glucose, and 5 mM MgCl2) three times, prefixed with 0.6% paraformaldehyde with 0.1% glutaraldehyde and 0.25% Triton in CB buffer for 1 min. Then, cells were fixed with 4% paraformaldehyde and 0.2% glutaraldehyde in CB buffer for 15 min. After washing three times with 1× PBS, cells were incubated for 10 min in 0.1% NaBH4 to reduce background fluorescence due to glutaraldehyde, and another washing step with PBS was performed. To quench reactive cross-linkers, cells were incubated in 10 mM Tris for 10 min, followed by 2 washes with PBS. Then, cells were permeabilized in 5% BSA and 0.05% Triton X-100, diluted in PBS for 15 min, and then incubated with 1:500 mouse anti-α-tubulin antibody (Sigma, T6199) for 1 h, followed by three washes with PBS. Cells were then incubated with 1:500 Alexa Fluor 647 goat anti-mouse IgG (Invitrogen, A-21236) for 1 h. Finally, the cells were washed with PBS three times.

For labeling EGFR and CCP in fixed cells, Beas2B cells cultured on coverslips for 24 h were treated with 5 ng/ml EGF in the culture medium for 3 min. Then, cells were incubated with 0.25% Triton and 0.1% glutaraldehyde in PEM buffer (80 mM PIPES, 5 mM EGTA, 2 mM MgCl2, pH 6.8) for 30 s. Next, cells were fixed with 0.25% Triton and 0.5% glutaraldehyde in PEM for 10 min. After washing three times with 1× PBS, cells were incubated for 7 min with 0.1% NaBH4. After another washing step, cells were incubated with blocking buffer (5% normal goat serum, 0.05% Triton X-100 in PBS) for 1 h, which was increased to 3 h for labeling clathrin⁶⁹. Then cells were incubated overnight with primary antibodies (1:200 anti-EGFR antibody (R-1) (SCBT, sc-101) for EGFR and 1:200 anti-clathrin heavy chain antibody (Abcam, ab2731) for clathrin) in blocking buffer. After incubation with primary antibodies, the coverslips were rinsed using the blocking buffer (3 × 10 min). Then, cells were incubated with 1:500 corresponding secondary antibodies in the blocking buffer for 1 h.

For labeling ER and mitochondria in fixed cells, Beas2B cells were transfected with EGFP-Sec61β, Tomm20-EGFP. After being transfected for 24 h, cells were first fixed with 3% paraformaldehyde and 0.1% glutaraldehyde in PBS for 10 min, then incubated with 0.1% NaBH4 for 7 min. After a washing step with PBS, cells were blocked with blocking buffer (5% normal goat serum, 0.05% Triton X-100 in PBS) for 1 h. Then cells were incubated with 1:500 anti-GFP primary antibody (Proteintech, 50430-2-AP) in the blocking buffer for 1 h and then incubated with the secondary antibody in the blocking buffer for another hour.

For labeling nuclear pore complex in fixed cells, Beas2B cells were fixed with 4% paraformaldehyde in PBS for 10 min, then incubated with 0.2% Triton X-100 for 10 min, next blocked with blocking buffer (2.5% BSA and 0.1% Triton X-100 in PBS) for 15 min. After that cells were incubated with 1:100 anti-Nup133 antibody (Sigma-Aldrich, HPA059767) in blocking buffer at 4 °C for 12 h, and then washed four times for 30 min with PBS. Next, cells were incubated with 1:500 goat anti-rabbit Alexa Fluor 647 (Sigma-Aldrich, SAB4600184) in the blocking buffer for 2–3 h. Finally, cells were anchored with MA-NHS (Sigma-Aldrich, 730300) for 1 h. Then a gelation solution of monomers was cast across the sample and polymerized at 37 °C for 2 h. The gelation solution was prepared according to the previous method⁷⁰. Next, cells were homogenized by proteinase K (New England Biolabs, #P8107) at 50 °C for 2 h. After homogenization, the gel was expanded with ddH2O.

For single-molecule imaging, we used the standard photoswitching buffer that contained 50 mM Tris of pH 7.5, 10 mM NaCl, 0.5 mg/mL glucose oxidase, 40 μg/mL catalase, 10% (w/v) glucose, and 1% (v/v) β-mercaptoethanol.

Labeling organelles in live cells

For labeling microtubules and EGF in live cells, Beas2B cells were transfected with 3XmEmerald-ensconsin plasmid. At 24 h post-transfection, cells were incubated with Qdot 655 (ThermoFisher, Q10123MP) conjugated EGF (5 ng/ml) in the culture medium for 30 min at 37 °C with 5% CO2. Then, the EGF solution was replaced by the culture medium for the following live-cell imaging.

For labeling microtubules and EGFR in live cells, Beas2B cells were co-transfected with plasmids encoding 3XmEmerald-ensconsin and EGFR-mCherry. After 24 h post-transfection, cells were prepared for live-cell imaging.

For labeling clathrin and EGFR in live cells, Beas2B cells were co-transfected with plasmids encoding Halo-clathrin and EGFR-EGFP and cultured for 24 h. Then cells were incubated with Halo-SiR in the culture medium at 37 °C for 1 h. After washing three times with the prewarmed culture medium, cells were incubated with EGF (5 ng/ml) in the culture medium for 3 min at 37 °C with 5% CO2 to induce endocytosis. Then, the EGF solution was replaced by the culture medium for the following live-cell imaging.

For labeling ER and mitochondria in live cells, Beas2B cells were transfected with Tomm20-mCherry, EGFP-Sec61β plasmids, and cultured for 24 h. After 24 h post-transfection, cells were prepared for live-cell imaging.

Experimental data acquisition

Data acquisition from fixed cells

The experimental training data for fixed cells were obtained from a home-built super-resolution localization microscope⁷¹ based on an inverted microscope (Nikon Ti Eclipse) equipped with a 100 × 1.49 NA TIRF objective (Nikon Apo TIRF). Excitation was provided by a 500 mW 656 nm laser (CNI, MRL-N-656.5–5500 mW), and images were acquired by EMCCD (Andor, IXon-Ultra) with a 16 μm pixel size. When performing single-molecule imaging, a 1.5× telescope was used, resulting in a 106 nm effective pixel size. For training data acquisition, a WF image of every field of view was first acquired at low illuminance, and then the laser intensity was increased to the maximum to obtain single-molecule images. For super-resolution imaging, an optimal focus system and a home-built drift-correction system were used to correct system drift⁷¹. The software was provided by NanoBioImaging Ltd. The frame rate was set to 30 frames per second, and 20,000 frames were acquired per super-resolution image.
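The acquisition parameters above can be checked with a short piece of arithmetic. The sketch derives the effective pixel size from the camera pixel size and the total magnification, and the time needed to collect one SMLM ground-truth image at the stated frame rate. The variable names are illustrative.

```python
# Effective pixel size: camera pixel divided by total optical magnification.
camera_pixel_um = 16.0
objective_magnification = 100
telescope_magnification = 1.5
effective_pixel_nm = camera_pixel_um * 1000.0 / (
    objective_magnification * telescope_magnification
)  # about 106.7 nm, consistent with the stated 106 nm

# Acquisition time of one SMLM ground-truth image.
frames_per_sr_image = 20_000
frame_rate_hz = 30
smlm_acquisition_s = frames_per_sr_image / frame_rate_hz  # about 667 s
smlm_acquisition_min = smlm_acquisition_s / 60.0          # about 11 min
```

Each ground-truth SMLM image therefore takes roughly 11 minutes to acquire, which illustrates why single-frame reconstruction is needed for imaging fast intracellular dynamics.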

Data acquisition from live cells

The live-cell data were acquired from different systems, and the effective pixel size of the images was adjusted to ~100 nm. Specifically, the data shown in Figs. 4, 5, 7, and the corresponding supplementary figures were acquired with a commercial Zeiss Elyra 7 microscope in HILO mode with a 60×/1.46 oil objective. For a FOV size of 25.6 × 25.6 µm², we recorded dual-color live-cell images at 100 Hz with 15 W/cm² illuminance for 5000 time points (Figs. 4, 5, and 7, and the corresponding supplementary figures), except for the data in Fig. 7a, which were recorded at 0.5 Hz for 200 time points, and Supplementary Fig. 29, which were recorded at 1 Hz for 250 time points. For whole-cell imaging with a FOV size of 60 × 50 µm², the data transmission limit of the system restricted the imaging speed to 20 Hz for 5000 time point recordings (Fig. 5a); in this case, the illumination intensity was reduced to 3 W/cm². The data in Fig. 6 were acquired with a Zeiss SP8 confocal microscope at 3 W/cm² illuminance with a 63×/1.4 oil objective. We recorded 300 time points at 0.4 Hz for a FOV of 51.2 × 51.2 µm².
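The recording settings above translate directly into total recording durations and image sizes in pixels. The following sketch works these out from the stated frame rates, time-point counts, and FOV sizes, assuming a pixel size of exactly 100 nm (the text gives ~100 nm); the helper names and dictionary labels are illustrative.

```python
def duration_s(n_timepoints, rate_hz):
    """Total recording time in seconds."""
    return n_timepoints / rate_hz


def fov_pixels(fov_um, pixel_nm=100.0):
    """Number of pixels along one FOV edge, assuming a 100 nm pixel."""
    return round(fov_um * 1000.0 / pixel_nm)


# (time points, frame rate in Hz) as stated in the text
recordings = {
    "dual-color, 25.6 x 25.6 um FOV": (5000, 100.0),
    "Fig. 7a": (200, 0.5),
    "Supplementary Fig. 29": (250, 1.0),
    "whole-cell, 60 x 50 um FOV (Fig. 5a)": (5000, 20.0),
    "confocal, 51.2 x 51.2 um FOV (Fig. 6)": (300, 0.4),
}

for label, (n, hz) in recordings.items():
    print(f"{label}: {duration_s(n, hz):.0f} s")

print("25.6 um edge:", fov_pixels(25.6), "pixels")
print("51.2 um edge:", fov_pixels(51.2), "pixels")
print("60 x 50 um FOV:", fov_pixels(60.0), "x", fov_pixels(50.0), "pixels")
```

Under this assumption, the 25.6 µm and 51.2 µm fields of view correspond to 256 and 512 pixels per edge, and the 100 Hz recordings span 50 s.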

Image processing

The single-molecule image sequences were analyzed with the ThunderSTORM⁷² plug-in in FIJI. The super-resolution reconstructed images were obtained at 5× magnification for images of microtubules and 10× magnification for vesicle images. To generate the training data, the LR images were processed with custom code to extract the edge map. To generate the training pairs of LR images, edge maps, and GT images, the LR images and edge maps were upsampled by a factor of 1.25 using bicubic interpolation. The intensity of all images was normalized to the range of 0–255. Then, the images were split into small blocks: 256 × 256 pixels for the GT images and 64 × 64 pixels for the LR images and edge maps. Finally, ~1000 training pairs were used to train the network for the simulated polymer images; ~300 training pairs were used to train the network for the experimental images of microtubules; ~600 training pairs were used to train the network for the experimental vesicle images.
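The normalization and tiling steps can be sketched as follows. This is a minimal illustration and not the authors' custom code: it assumes min-max normalization, non-overlapping tiles, and LR images and edge maps that have already been interpolated so that the GT image is exactly 4× larger per side (256/64). All function names are hypothetical, and the bicubic interpolation and edge-map extraction are not reproduced here.

```python
import numpy as np


def normalize_0_255(img):
    """Min-max normalize an image to the range 0-255 (assumed scheme)."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo) * 255.0


def split_into_tiles(img, tile):
    """Split a 2D image into non-overlapping tile x tile blocks, row-major."""
    h, w = img.shape
    rows, cols = h // tile, w // tile
    img = img[: rows * tile, : cols * tile]  # drop incomplete border blocks
    blocks = img.reshape(rows, tile, cols, tile).swapaxes(1, 2)
    return blocks.reshape(-1, tile, tile)


def make_training_pairs(lr, edge, gt, lr_tile=64, scale=4):
    """Return a list of (LR tile, edge-map tile, GT tile) triples."""
    if lr.shape != edge.shape:
        raise ValueError("LR image and edge map must have the same shape")
    if gt.shape != (lr.shape[0] * scale, lr.shape[1] * scale):
        raise ValueError("GT image must be `scale` times larger than LR")
    lr_tiles = split_into_tiles(normalize_0_255(lr), lr_tile)
    edge_tiles = split_into_tiles(normalize_0_255(edge), lr_tile)
    gt_tiles = split_into_tiles(normalize_0_255(gt), lr_tile * scale)
    return list(zip(lr_tiles, edge_tiles, gt_tiles))


# Synthetic stand-ins for an interpolated LR image, its edge map, and the GT.
rng = np.random.default_rng(0)
lr = rng.random((128, 128))
edge = rng.random((128, 128))
gt = rng.random((512, 512))

pairs = make_training_pairs(lr, edge, gt)
print(len(pairs), pairs[0][0].shape, pairs[0][2].shape)
```

For the microtubule data, the stated factors are mutually consistent: 1.25× interpolation of the LR image followed by the 4× size ratio between GT and LR blocks gives the 5× reconstruction magnification.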

Statistics and reproducibility

Except for network ensembles, all networks for different simulation/subcellular structures mentioned in this work were trained once per set of hyper-parameters and input dataset. For network inference results, using the same network parameters, repetition of the inference on the same input should always produce identical results.

Experiments on DNA origami (Fig. 2c) were repeated on 2 WF images of 256 × 256 pixels. Experiments for testing the effectiveness on experimental images were performed on 50 WF images of 64 × 64 pixels (Supplementary Fig. 16). Experiments for network performance evaluation on different subcellular structures in fixed cells were repeated on 4 WF images of 256 × 256 pixels (Figs. 2d, e and 3a, and Supplementary Fig. 19). Experiments for testing the network robustness to different microscopies and fluorescent dyes were performed on 4 WF images of 256 × 256 pixels (Fig. 3d, e). Experiments on live-cell imaging were performed on 2–3 similar image sequences containing 2000–5000 frames (Figs. 4, 5, and 7, Supplementary Figs. 29 and 30) or 300 frames. All the simulation images were randomly generated. All the experimental images for the same experiment were acquired under the same experimental conditions. No data were excluded from the analyses. Similar results were observed for the multiple instances examined.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The source images generated in this study are publicly accessible at https://doi.org/10.5281/zenodo.7805563. The source data supporting the findings in this study are provided with this paper.

Code availability

The code of the SFSRM network, the trained models, and some example images for testing are publicly available at https://github.com/crrayna/SFSRM.

References

Stephens, D. J. & Allan, V. J. Light microscopy techniques for live cell imaging. Science 300, 82–86 (2003).
Gustafsson, M. G. Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy. J. Microsc. 198, 82–87 (2000).
Zhao, W. et al. Sparse deconvolution improves the resolution of live-cell super-resolution fluorescence microscopy. Nat. Biotechnol. 40, 606–617 (2021).
Hell, S. W. & Wichmann, J. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19, 780–782 (1994).
Rust, M. J., Bates, M. & Zhuang, X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–796 (2006).
Hess, S. T., Girirajan, T. P. & Mason, M. D. Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006).
Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006).
Sharonov, A. & Hochstrasser, R. M. Wide-field subdiffraction imaging by accumulated binding of diffusing probes. Proc. Natl Acad. Sci. USA 103, 18911–18916 (2006).
Jones, S. A., Shim, S., He, J. & Zhuang, X. Fast, three-dimensional super-resolution imaging of live cells. Nat. Methods 8, 499 (2011).
Huang, F. et al. Video-rate nanoscopy using sCMOS camera–specific single-molecule localization algorithms. Nat. Methods 10, 653–658 (2013).
Shim, S. et al. Super-resolution fluorescence imaging of organelles in live cells with photoswitchable membrane probes. Proc. Natl Acad. Sci. USA 109, 13978–13983 (2012).
van de Linde, S., Heilemann, M. & Sauer, M. Live-cell super-resolution imaging with synthetic fluorophores. Annu. Rev. Phys. Chem. 63, 519–540 (2012).
Jungmann, R. et al. Quantitative super-resolution imaging with qPAINT. Nat. Methods 13, 439–442 (2016).
Takakura, H. et al. Long time-lapse nanoscopy with spontaneously blinking membrane probes. Nat. Biotechnol. 35, 773–780 (2017).
Gustavsson, A., Petrov, P. N., Lee, M. Y., Shechtman, Y. & Moerner, W. E. 3D single-molecule super-resolution microscopy with a tilted light sheet. Nat. Commun. 9, 1–8 (2018).
Hu, Y. S. et al. Light-sheet Bayesian microscopy enables deep-cell super-resolution imaging of heterochromatin in live human embryonic stem cells. Opt. Nanoscopy 2, 1–12 (2013).
Holden, S. J., Uphoff, S. & Kapanidis, A. N. DAOSTORM: an algorithm for high-density super-resolution microscopy. Nat. Methods 8, 279–280 (2011).
Zhu, L., Zhang, W., Elnatan, D. & Huang, B. Faster STORM using compressed sensing. Nat. Methods 9, 721–723 (2012).
Cox, S. et al. Bayesian localization microscopy reveals nanoscale podosome dynamics. Nat. Methods 9, 195–200 (2012).
Scherf, N. & Huisken, J. The smart and gentle microscope. Nat. Biotechnol. 33, 815–818 (2015).
Dong, C., Loy, C. C., He, K. & Tang, X. Learning a deep convolutional network for image super-resolution. Computer Vision–ECCV, 184–199 (Springer, 2014).
Kim, J., Kwon Lee, J. & Mu Lee, K. Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE conference on computer vision and pattern recognition, 1646–1654 (IEEE, 2016).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. Proceedings of the IEEE conference on computer vision and pattern recognition, 4700–4708 (IEEE, 2017).
Ledig, C. et al. Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE conference on computer vision and pattern recognition, 4681–4690 (IEEE, 2017).
Wang, X. et al. Esrgan: Enhanced super-resolution generative adversarial networks. Proceedings of the European Conference on Computer Vision (ECCV 2018), 63–79, (Springer, 2018).
Wang, Z. et al. Real-time volumetric reconstruction of biological dynamics with light-field microscopy and deep learning. Nat. Methods 18, 551–556 (2021).
Wang, H. et al. Deep learning enables cross-modality super-resolution in fluorescence microscopy. Nat. Methods 16, 103–110 (2019).
Chen, J. et al. Three-dimensional residual channel attention networks denoise and sharpen fluorescence microscopy image volumes. Nat. Methods 18, 678–687 (2021).
Qiao, C. et al. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat. Methods 18, 1–9 (2021).
Nehme, E., Weiss, L. E., Michaeli, T. & Shechtman, Y. Deep-STORM: super-resolution single-molecule microscopy by deep learning. Optica 5, 458–464 (2018).
Ouyang, W., Aristov, A., Lelek, M., Hao, X. & Zimmer, C. Deep learning massively accelerates super-resolution localization microscopy. Nat. Biotechnol. 36, 460–468 (2018).
Speiser, A. et al. Deep learning enables fast and dense single-molecule localization with high accuracy. Nat. Methods 18, 1082–1090 (2021).
Fang, F., Li, J. & Zeng, T. Soft-Edge Assisted Network For Single Image Super-resolution. IEEE Trans. Image Process. 29, 4656–4668 (2020).
Gustafsson, N. et al. Fast live-cell conventional fluorophore nanoscopy with ImageJ through super-resolution radial fluctuations. Nat. Commun. 7, 1–9 (2016).
Chen, R. et al. Efficient super‐resolution volumetric imaging by radial fluctuation Bayesian analysis light‐sheet microscopy. J. Biophoton. 13, e201960242 (2020).
Parthasarathy, R. Rapid, accurate particle tracking by calculation of radial symmetry centers. Nat. Methods 9, 724–726 (2012).
Weigert, M. et al. Content-aware image restoration: pushing the limits of fluorescence microscopy. Nat. Methods 15, 1090–1097 (2018).
Marsh, R. J. et al. Sub-diffraction error mapping for localisation microscopy images. Nat. Commun. 12, 1–13 (2021).
Sage, D. et al. Quantitative evaluation of software packages for single-molecule localization microscopy. Nat. Methods 12, 717–724 (2015).
Descloux, A., Grußmayer, K. S. & Radenovic, A. Parameter-free image resolution estimation based on decorrelation analysis. Nat. Methods 16, 918–924 (2019).
Culley, S. et al. Quantitative mapping and minimization of super-resolution optical imaging artifacts. Nat. Methods 15, 263–266 (2018).
Liu, X., Weaver, D., Shirihai, O. & Hajnóczky, G. Mitochondrial ‘kiss‐and‐run’: interplay between mitochondrial motility and fusion–fission dynamics. EMBO J. 28, 3074–3089 (2009).
Pelling, A. E. et al. Distinct contributions of microtubule subtypes to cell membrane shape and stability. Nanomed. Nanotechnol. Biol. Med. 3, 43–52 (2007).
Sanchez, T., Welch, D., Nicastro, D. & Dogic, Z. Cilia-like beating of active microtubule bundles. Science 333, 456–459 (2011).
Ross, J. L., Shuman, H., Holzbaur, E. L. & Goldman, Y. E. Kinesin and dynein-dynactin at intersecting microtubules: motor density affects dynein function. Biophys. J. 94, 3115–3125 (2008).
Sodeik, B., Ebersold, M. W. & Helenius, A. Microtubule-mediated transport of incoming herpes simplex virus 1 capsids to the nucleus. J. Cell Biol. 136, 1007–1021 (1997).
Vale, R. D. Intracellular transport using microtubule-based motors. Annu. Rev. Cell Biol. 3, 347–378 (1987).
Gao, Y., Anthony, S. M., Yu, Y., Yi, Y. & Yu, Y. Cargos Rotate at Microtubule Intersections during Intracellular Trafficking. Biophys. J. 114, 2900–2909 (2018).
Liu, S. et al. Globally visualizing the microtubule-dependent transport behaviors of influenza virus in live cells. Anal. Chem. 86, 3902–3908 (2014).
Bálint, Š., Vilanova, I. V., Álvarez, Á. S. & Lakadamyali, M. Correlative live-cell and superresolution microscopy reveals cargo transport dynamics at microtubule intersections. Proc. Natl Acad. Sci. USA 110, 3375–3380 (2013).
Fakhri, N. et al. High-resolution mapping of intracellular fluctuations using carbon nanotubes. Science 344, 1031–1035 (2014).
Xia, L., Zhang, L., Tang, H. & Pang, D. Revealing microtubule-dependent slow-directed motility by single-particle tracking. Anal. Chem. 93, 5211–5217 (2021).
Giannakakou, P. et al. Enhanced microtubule-dependent trafficking and p53 nuclear accumulation by suppression of microtubule dynamics. Proc. Natl Acad. Sci. USA 99, 10855–10860 (2002).
Raiborg, C. et al. Hrs sorts ubiquitinated proteins into clathrin-coated microdomains of early endosomes. Nat. Cell Biol. 4, 394–398 (2002).
Maxfield, F. R. & McGraw, T. E. Endocytic recycling. Nat. Rev. Mol. Cell Biol. 5, 121–132 (2004).
Friedman, J. R. et al. ER tubules mark sites of mitochondrial division. Science 334, 358–362 (2011).
Guo, Y. et al. Visualizing intracellular organelle and cytoskeletal interactions at nanoscale resolution on millisecond timescales. Cell 175, 1430–1442.e17 (2018).
Fenton, A. R., Jongens, T. A. & Holzbaur, E. L. Mitochondrial dynamics: Shaping and remodeling an organelle network. Curr. Opin. Cell Biol. 68, 28–36 (2021).
Hirabayashi, Y. et al. ER-mitochondria tethering by PDZD8 regulates Ca2+ dynamics in mammalian neurons. Science 358, 623–630 (2017).
Basty, N., McClymont, D., Teh, I., Schneider, J. E. & Grau, V. In Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment 127–135 (Springer, 2017).
Wang, Z., Liu, D., Yang, J., Han, W. & Huang, T. Deep Networks For Image Super-resolution With Sparse Prior (Proceedings of the IEEE international conference on computer vision, 2015).
Bell, S., Upchurch, P., Snavely, N. & Bala, K. Material recognition in the wild with the materials in context database. Proceedings Of The IEEE Conference On Computer Vision And Pattern Recognition, 3479–3487 (IEEE, 2015).
Zhao, H., Gallo, O., Frosio, I., & Kautz, J. Loss functions for image restoration with neural networks. IEEE Transactions on computational imaging, Vol. 3, 47–57 (IEEE, 2016).
Schonfeld, E., Schiele, B. & Khoreva, A. A u-net based discriminator for generative adversarial networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8207–8216 (IEEE, 2020).
Wang, Z., Simoncelli, E. P. & Bovik, A. C. Multiscale structural similarity for image quality assessment. The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Vol. 2, p. 1398–1402 (IEEE, 2003).
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Boncompain, G. et al. Synchronization of secretory protein traffic in populations of cells. Nat. Methods 9, 493–498 (2012).
Li, Y. et al. Real-time 3D single-molecule localization using experimental point spread functions. Nat. Methods 15, 367–369 (2018).
Jimenez, A., Friedl, K. & Leterrier, C. About samples, giving examples: optimized single molecule localization microscopy. Methods 174, 100–114 (2020).
Chen, F., Tillberg, P. W. & Boyden, E. S. Expansion microscopy. Science 347, 543–548 (2015).
Zhao, T. et al. A user-friendly two-color super-resolution localization microscope. Opt. Express 23, 1879–1887 (2015).
Ovesný, M., Křížek, P., Borkovec, J., Švindrych, Z. & Hagen, G. M. ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Bioinformatics 30, 2389–2390 (2014).

Acknowledgements

We thank Dong Li (University of Chinese Academy of Sciences) for providing 3XmEmerald-ensconsin plasmid. We also thank HKUST Bioscience Central Research Facility and Super-Resolution Imaging Center for essential equipment support. We gratefully acknowledge financial support from the Research Grants Council of Hong Kong under the General Research Fund (GRF; 16205818 and 16205619 to S.Y.; and 16102921, 16102218, 16103319, and 16104020 to Y.G.).

Author information

Authors and Affiliations

Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Rong Chen, Binbin Cui, Shengwang Du & Shuhuai Yao
Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China
Xiao Tang, Zeyu Shen, Yusheng Shen, Tiantian Li & Yusong Guo
School of Optical and Electronic Information, Huazhong University of Science and Technology, 430074, Wuhan, China
Yuxuan Zhao, Meng Zhang & Peng Fei
Department of Mechanical and Aerospace Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Casper Ho Yin Chung, Ji Wang & Shuhuai Yao
School of Pharmaceutical Sciences, Guizhou University, 550025, Guizhou, China
Lijuan Zhang
Department of Physics, The Hong Kong University of Science and Technology, Hong Kong, China
Shengwang Du
Department of Physics, The University of Texas at Dallas, Richardson, TX, 75080, USA
Shengwang Du

Contributions

S.Y. and S.D. conceived the project. S.Y., S.D., and Y.G. supervised the research. S.Y., S.D., and R.C. designed the experiments. X.T., Y.Z., M.Z., T.L., Y.S., J.W., B.C., and C.C. prepared samples. Z.S. and R.C. performed experiments. R.C. analyzed the data with conceptual advice from S.Y., L.Z., S.D., and Y.G. R.C. wrote the manuscript with input from all authors under the supervision of S.Y., P.F., S.D., and Y.G. All authors discussed the results and commented on the manuscript.

Corresponding authors

Correspondence to Yusong Guo, Shengwang Du or Shuhuai Yao.

Ethics declarations

Competing interests

The authors declare the following competing interests: S.Y., S.D., and R.C. are listed as inventors on US patent applications 63/252,181 and 17/822,902 filed by The Hong Kong University of Science and Technology. All remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Description of Additional Supplementary Files

Supplementary Movie 1

Supplementary Movie 2

Supplementary Movie 3

Supplementary Movie 4

Supplementary Movie 5

Supplementary Movie 6

Supplementary Movie 7

Supplementary Software 1

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, R., Tang, X., Zhao, Y. et al. Single-frame deep-learning super-resolution microscopy for intracellular dynamics imaging. Nat Commun 14, 2854 (2023). https://doi.org/10.1038/s41467-023-38452-2

Received: 18 July 2022
Accepted: 28 April 2023
Published: 18 May 2023
DOI: https://doi.org/10.1038/s41467-023-38452-2
