Introduction

Caveolae are smooth 50–80 nm plasma membrane invaginations whose formation requires the coat protein Cav1 and the adaptor protein CAVIN1 (also called PTRF)1. Caveolae function as mechanoprotective membrane buffers, mechanosensors, signaling hubs and endocytic transporters2. Cryo-electron microscopy (cryoEM) analysis of caveolae has reported that the Cav1 coat is polygonal, formed of distinct edges and suggested to form a dodecahedral cage3,4. CryoEM analysis of Cav1 protein distribution in the caveolae coat, either in mammalian cells or following heterologous Cav1 expression in bacteria (h-caveolae), shows that Cav1 exhibits a highly regular distribution of repeating polygons3,5,6. CAVIN1 forms an outer filamentous coat layer whose filamentous structure likely corresponds to the striations observed on caveolae, as well as flattened caveolae, by deep-etch EM3,4,6,7,8. In the absence of CAVIN1, Cav1 is localized to non-caveolar membrane domains known as Cav1 scaffolds9,10. While scaffolds have been characterized functionally11,12, defining their structure has proven more difficult. Biochemical analysis identifies small 8S oligomers that correspond to SDS-resistant oligomers of 10–15 Cav1 molecules as well as larger 60S oligomers that correspond to the caveolae coat13,14. CryoEM suggests that small 8S oligomers combine to form the caveolar coat3,4. However, the structural relationship of scaffolds to caveolae remains an open question. Both caveolae and scaffolds are smaller than the diffraction limit of visible light (~200–250 nm) and therefore cannot be distinguished by diffraction-limited microscopy.

Super-resolution microscopy is therefore ideally suited to identify and characterize these sub-diffraction limit cellular structures. Of the various super-resolution microscopy approaches, the best resolution is obtained using single-molecule localization microscopy (SMLM) methods such as dSTORM, PALM and MINFLUX, which are based on the repeated activation (blinking) of small numbers of discrete fluorophores15,16,17,18. In dSTORM, precise localization of these blinks is determined from a Gaussian fit of the point spread function (PSF), providing ~10–15 nm X-Y (lateral) resolution and ~30 nm Z (axial) resolution for astigmatic lens 3D SMLM19,20. SMLM generates point coordinates in 3D space that can then be used to reconstruct localizations with significantly improved resolution and has been applied to distinguish invaginated and flattened caveolae based on Cav1 density in clusters21. An alternative approach to study point distributions is to visualize them as a graph or network. Graphs are mathematical structures used to model interactions between entities for many systems, with the entities represented as graph nodes and the connections between them as edges22. Real world graphs are frequently complex networks that have many different subgraphs or modules23. Networks with high modularity have dense connections (edges) between the nodes within modules (sub-networks) and sparse connections between nodes in different modules. The optimization problem of finding divisions within a network (i.e. modules or communities) has been addressed via various methods such as normalized-cut graph partitioning and spectral algorithms24,25,26. Network and subgraph (module) analysis are therefore ideally suited to define molecular and subgroup organization between labeled molecules within the 3D SMLM point cloud of macromolecular complexes.

Previously, network analysis of SMLM Cav1 data sets in PC3 prostate cancer cells27, which express Cav1 but not CAVIN1 and therefore lack caveolae1, identified two classes of Cav1 scaffolds: small S1 scaffolds, which correspond to 8S Cav1 homo-oligomers14,28, and larger hemispherical S2 scaffolds. The formation of curved Cav1 structures in the absence of CAVIN1 is consistent with Cav1 induction of invaginated h-caveolae in bacteria and supports a role for Cav1 in membrane curvature5,29. To identify signatures for caveolae, we compared PC3 prostate cancer cells that lack caveolae with PC3 cells transfected with the CAVIN1 adaptor required for caveolae formation. Larger hollow caveolae were only detected upon transfection of PC3 cells with CAVIN1 (PC3-PTRF cells)27 and their modular nature supported the polyhedral Cav1 coat structure observed by cryoEM3,4.

We now process (using spectral decomposition) the array of distances between localizations to find modules in endogenous Cav1 domains of HeLa cells. To determine the relationship between Cav1 scaffold domains and the multimeric caveolae structure, we leveraged a multi-threshold modularity analysis to extract the modules of the various Cav1 blobs. We then matched the blobs with the sub-modules of the various Cav1 domains based on the similarity (i.e., smaller Euclidean or L2 norm) between their biosignatures. To enhance localization precision, we used an SMLM microscope equipped with real-time nanometer-scale drift correction hardware30 and included only high-precision localizations to improve localization accuracy31,32. With this in-house built SMLM microscope, the localization precision approaches 10 nm33, and the drift is limited to 1 nm in the x-y plane and 3 nm in the z axis30,34. Modularity analysis and group matching show that S1A scaffolds can dimerize to form S1B scaffolds and oligomerize to form hemispherical S2 scaffolds. S1B scaffolds match the modules that make up the caveolae coat suggesting that the caveolae coat is built progressively by dimerization of S1A scaffolds, composed of the basic polygonal Cav1 units, that then combine to form a polyhedral caveolae coat.

Results and Discussion

Tunable iterative merge algorithm

A major challenge to determining molecular structure by SMLM is defining molecular localizations (i.e. the location of the labeled molecule, in this case Cav1) from the millions of blinks (i.e. 3D spatial coordinates of the labeled Cav1 fluorescent events) generated from the stochastic blinking detected by SMLM. Many blinks derive from the same labeled molecule, particularly when the labeling approach is based on antibody labeling (i.e. dSTORM). The same fluorophore can blink twice in succeeding acquisition frames depending on the on-off duty cycle35, and the same molecule can be labeled by different fluorophores, either on the same secondary antibody or on different secondaries bound to the same target protein, introducing error in blink localization relative to the actual antigen. Each of these blink localizations is in addition subject to localization error due to drift and to Gaussian fitting of the PSF. Multiple, distinct blink localizations, therefore, derive from the same molecule and generate a dense non-biological network (NBN) with high degree nodes centered around the actual molecule27,36. Network analysis of the biological network, composed of nodes corresponding to predicted molecular localizations of the labeled proteins, requires reduction/consolidation of the NBNs.

Several methods have been proposed to reduce this artifact using temporal or spatial fluorophore information. Annibale et al.37,38 proposed a method to correct multiple-blinking in PALM by determining the merging time for the mEos2 photoactivatable fluorescent protein. Other methods39,40,41 spatially merged nearby localization events. Here, to correct for multiple-blinking and estimate molecular localization, we adopt the iterative merging algorithm of Khater et al.27, which iteratively merges nearby nodes (blinks) that are within a threshold merging distance until convergence is reached. The process starts with the high network degree nodes and continues until no pair of reconstructed nodes, which correspond to predicted or estimated molecular localizations, remains within the threshold merging distance. Nodes in closest proximity are combined first such that merging is initiated within the dense NBNs and continues progressively until no nodes within the point cloud are closer than the merging proximity threshold (MPT).
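The merging rule described above can be illustrated with a minimal sketch. It is not the implementation of Khater et al.27: it assumes a simplified greedy variant in which the globally closest pair within the MPT is replaced by its blink-weighted centroid, and the function name iterative_merge is hypothetical. The brute-force pair search is only practical for small toy inputs.

```python
import math


def iterative_merge(points, mpt):
    """Greedy sketch of blink merging (an assumed simplification).

    Repeatedly replaces the closest pair of nodes lying within `mpt`
    by their weighted centroid, until no pair is within `mpt`.
    """
    # Each node is (coordinates, number of blinks merged into it).
    nodes = [(tuple(p), 1) for p in points]
    while True:
        best = None  # (distance, i, j) of the closest mergeable pair
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = math.dist(nodes[i][0], nodes[j][0])
                if d <= mpt and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            # Converged: every remaining pair is farther apart than mpt.
            return [p for p, _ in nodes]
        _, i, j = best
        (pi, wi), (pj, wj) = nodes[i], nodes[j]
        merged = tuple((a * wi + b * wj) / (wi + wj) for a, b in zip(pi, pj))
        nodes.pop(j)  # remove j first (j > i) so index i stays valid
        nodes.pop(i)
        nodes.append((merged, wi + wj))


if __name__ == "__main__":
    blinks = [(0, 0, 0), (5, 0, 0), (3, 4, 0), (100, 0, 0), (104, 0, 0)]
    print(iterative_merge(blinks, mpt=19))
```

Weighting the centroid by the number of blinks already merged keeps the estimated localization at the mean of all contributing blinks, which is one reasonable choice among several.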

3D point clouds of Cav1 from 10 dSTORM images of HeLa cells were processed using the 3D SMLM Network Analysis computational pipeline27. To address the multiple-blinking artifact that may bias the quantification process, we applied a tunable MPT from 10–20 nm in steps of 1 nm. Importantly, 4 classes were learnt at each MPT from 10–20 nm. Further, tuning the MPT from 10–20 nm minimally impacted classification, size, modularity, characteristic path and hollowness of all 4 classes of blobs (Fig. 1A). Machine learning blob classification is therefore independent of the merge algorithm for MPTs from 10–20 nm. Not unexpectedly, increasing the MPT reduced the predicted molecular localization number per blob. We set the MPT based on the reported 145 Cav1 proteins per caveola42. An MPT of 19 nm resulted in an average of 142 localizations for the largest H2 blobs (Fig. 1A), which match the PP2 caveolae blobs from PC3-PTRF cells (see Fig. 2A).

Figure 1

MPT tuning does not impact blob identification. (A) Biological signatures of HeLa Cav1 blobs at different MPTs (10–20 nm) were obtained by 3D SMLM Network Analysis27. We learn 4 groups/classes of Cav1 domains at each MPT. Cav1 blob shape, topology, hollowness, and network features are minimally affected by MPT tuning, whereas the number of molecular localizations changes with the MPT. Error bars represent standard deviation. (B) 3D Cav1 point clouds of a representative HeLa cell imaged with drift-corrected dSTORM30,34 before (green) and after (red) iterative blink merging at 19 nm and filtering out noisy localizations. Color-coded representations of blobs after segmentation and after identification by machine learning using the 3D SMLM Network Analysis pipeline27 are shown. We identified four groups of blobs representing different Cav1 domains in HeLa cells.

Figure 2

3D SMLM Network Analysis of the HeLa cells dataset. (A) Matching HeLa Cav1 groups with previously identified Cav1 domains in PC3 and PC3-PTRF cells27. The numbers are the Euclidean distances that capture the similarity/dissimilarity between the groups with smaller numbers indicating increased similarity. We matched learned groups from PC3, PC3-PTRF, and HeLa cells and show distances among the feature vector of group centers (in bold are the closest matching groups). The table to the right shows color matching of HeLa groups with previously identified P1 and P2 Cav1 domains in PC3 cells and PP1, PP2, PP3, and PP4 Cav1 domains in PC3-PTRF cells27. (B) Distribution of the matched groups from HeLa, PC3-PTRF and PC3 datasets are presented for comparison. (C) Signatures of matched groups from HeLa (at 19 nm MPT), PC3 and PC3-PTRF (at 20 nm MPT) cells show a high degree of correspondence of the individual group features. See Supp. Fig. S1 for the rest of the features.

Figure 1B shows the 3D point cloud of one of the HeLa cells in our dataset at successive stages of the pipeline: (1) the 3D point cloud of a Cav1-labeled HeLa cell generated by real-time drift control SMLM; (2) after iterative merging and denoising filtration; (3) after segmentation into separate blobs and extraction of a 28-feature descriptor vector for every blob; and (4) after unsupervised machine learning to learn the various Cav1 domains from the extracted blobs and their descriptor features. The denoising module in stage 2 visits every Cav1 event and predicts whether it is signal or noise. This prediction is based on comparing the network features of every Cav1 event in our data with the corresponding network features of nodes in a random network. If the network features of a Cav1 event are similar to those of the random network’s nodes, that Cav1 event is declared noise and removed. This denoising process retains the Cav1 clusters (blobs) and filters out noisy localizations and monomeric Cav1.
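The idea behind the denoising step can be sketched as follows. This is an illustrative simplification rather than the pipeline's denoising module: it assumes a single network feature (node degree at a fixed radius), a uniform random reference network drawn in the bounding box of the data, and a quantile cutoff. The names degrees and denoise and the parameters radius and quantile are hypothetical.

```python
import math
import random


def degrees(points, radius):
    """Node degree: number of other points within `radius` of each point."""
    deg = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                deg[i] += 1
                deg[j] += 1
    return deg


def denoise(points, radius, quantile=0.95, seed=0):
    """Keep points whose degree exceeds what a random network would show.

    Assumption: the reference network has the same number of points,
    drawn uniformly in the bounding box of the data.
    """
    rng = random.Random(seed)
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    reference = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi))
                 for _ in points]
    ref_deg = sorted(degrees(reference, radius))
    # Degree below which most random-network nodes fall.
    cutoff = ref_deg[min(len(ref_deg) - 1, int(quantile * len(ref_deg)))]
    deg = degrees(points, radius)
    return [p for p, d in zip(points, deg) if d > cutoff]
```

A point whose local connectivity is no higher than that of typical random-network nodes is treated as noise, so isolated localizations are dropped while clustered ones are retained.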

Group matching

Machine learning identified four groups of Cav1 domains (H1, H2, H3, and H4) in HeLa cells (Fig. 1A). We used the Euclidean distance in 28 dimensions to encode similarity of HeLa groups with groups previously identified in PC3 and PC3-PTRF cells27, with similarity proportional to the inverse Euclidean distance. As seen in Fig. 2A, for the groups with larger blobs, H2 matches PP2, corresponding to caveolae, and H1 matches PP1, corresponding to the larger hemispherical S2 scaffolds. For the smaller S1 scaffolds, H4 matches PP3 and H3 matches PP4. Distribution of the different classes of blobs in the different cell types shows that HeLa and PC3-PTRF cells present a similar distribution of Cav1 blobs with slightly more caveolae detected in HeLa cells (Fig. 2B).
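Group matching by Euclidean distance between group centers can be sketched as below. The three-feature vectors are made-up toy values, not data from this study (the actual descriptors have 28 features), and the function name match_groups is hypothetical. The sketch also assumes the features are already on comparable scales.

```python
import math


def match_groups(query_centers, reference_centers):
    """Map each query group to its closest reference group.

    Returns {query name: (best reference name, Euclidean distance)}.
    """
    matches = {}
    for q_name, q_vec in query_centers.items():
        dists = {r_name: math.dist(q_vec, r_vec)
                 for r_name, r_vec in reference_centers.items()}
        best = min(dists, key=dists.get)  # smallest distance = most similar
        matches[q_name] = (best, dists[best])
    return matches


if __name__ == "__main__":
    # Toy feature vectors for illustration only.
    hela = {"H1": (0.9, 0.1, 0.5), "H2": (0.2, 0.8, 0.9)}
    pc3_ptrf = {"PP1": (1.0, 0.0, 0.5), "PP2": (0.2, 0.9, 1.0)}
    for group, (match, dist) in match_groups(hela, pc3_ptrf).items():
        print(group, "->", match, round(dist, 3))
```

Nearest-center matching does not force a one-to-one assignment; if two query groups selected the same reference group, that ambiguity would need to be resolved separately.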

Feature analysis after group matching shows that the four HeLa groups closely match the four PC3-PTRF groups as well as the S2 and S1A scaffolds present in PC3 cells (Fig. 2C; see also Supp. Fig. S1 for additional data on blob features). Relative to the PC3 data27, we observed a doubling in molecular localizations for S1B scaffolds relative to S1A scaffolds and increased modularity of S1B scaffolds in the HeLa data set, which we attribute to the improved resolution obtained with the real-time drift control SMLM30,34. The increased Cav1 localization number in S1B scaffolds parallels the increased size (X-range) and reduced network density of these clusters relative to S1A scaffolds, reflecting differences between these structures that led to their classification as distinct cluster groups in this and our previous analysis27. Indeed, we observe a progressive increase in the number of localizations (Caveolae > S2 scaffolds > S1B scaffolds > S1A scaffolds) associated with increased modularity and decreased network density (Fig. 2C). Caveolae are the most modular structures (modularity >0.4), followed by S2 and S1B scaffolds. S1A scaffolds have the least tendency to form modules (modularity <0.04).

Small Cav1 S1 scaffolds combine to build larger scaffolds and caveolae

The variable modularity of the different classes of Cav1 blobs led us to extract the blobs’ modules and study their features. To make sure that we correctly construct the modules within the blob’s network, we used multi-proximity threshold (PT) network analysis (Fig. 3A) to decompose the blobs’ networks into modules using spectral analysis. For all the groups, any PT greater than 60 nm renders every blob a single connected component; the average connected component size plateaus and equals the blob size for the different groups at PTs greater than 60 nm. This indicates that at PT >60 nm, every post-merge Cav1 localization in a cluster lies within 60 nm of at least one other localization of that cluster, so that the whole cluster is linked into a single network. The number of modules and of Cav1 localizations per module are stable across the PT range from 60 to 170 nm. This range is therefore suitable to determine the number of modules and of Cav1 localizations per module. HeLa caveolae were found to be highly modular, containing 6–7 modules of ~29 Cav1 localizations, S2 scaffolds 5 modules of ~14 localizations and S1B scaffolds ~4 modules of 7–8 localizations each (Fig. 3A). S1A scaffolds have the fewest modules, averaging ~2 modules per blob of ~5–6 localizations each.

Figure 3

Modularity analysis of Cav1 blobs. (A) Multi-proximity threshold modularity analysis shows the number of connected components, number of modules and localizations per module (at 19 nm MPT) for HeLa blobs at different proximity thresholds. (B) Representative blobs from the different HeLa Cav1 domains are shown. The visualization shows the blob’s localizations, the localizations’ connections, and the blob’s modules.

Visualization of blobs from the identified groups (Fig. 3B) highlights the modular nature of the various Cav1 structures. At 80 nm, each blob forms one connected component network and extracted modules for every blob are shown in different colors. The presence of small modules (~5–8 molecules) within both S1A and S1B scaffolds is indicative of an additional degree of suborganization within these small scaffold domains. 3D cryoEM tomography identified a network of 3-way junctions and polygonal arrangements of Cav1 protein densities within the caveolae coat3. Similarly, cryoEM analysis of Cav1-induced vesicles in bacteria (h-caveolae) revealed distinct polygonal repeating units on the h-caveolae cage5. We propose that the sub-modules that we detect in S1A and S1B scaffolds correspond to these polygonal repeating units that comprise the caveolae coat. The fact that S1A scaffolds form one connected component unit and that the number of localizations of S1A scaffolds matches that of Cav1 homo-oligomers (~14–15 Cav1s)14,28 suggests that interaction between these polygonal sub-modules forms more stable structural units. This is supported by the identification of larger modules in both S2 scaffolds and caveolae (Figs 3A and 4A).

Figure 4

Module-blob matching between Cav1 domains. (A) Signatures of Cav1 blobs and blob modules show that some module features are similar to blob features. For example, the right bars that represent the caveolae modules (blue) are very similar to the left bars that represent the S1B blobs (magenta). (B) We extracted 28 features (e.g. shape, topology, hollowness, network) for every blob and module. The table encodes the module-blob similarity between the different Cav1 domains (blobs) and the modules of each type as Euclidean distances between every pair of group centres.

Most interestingly, the decomposed modules from the different Cav1 cluster groups show a much higher degree of similarity in terms of the shape, topology and network features than the clusters from which they originate (Fig. 4A). For instance, while Cav1 blob classes show a progressive reduction in network density from S1A scaffolds to caveolae, modules from the different blob classes show a similar network density. This suggests that differential interaction between modules is responsible for the changes in network density of the different classes of Cav1 blobs and that these modules form fundamental building blocks of larger Cav1 structures.

Indeed, many features of caveolae modules match S1B scaffolds while S2 and S1B modules match S1A scaffolds. For instance, number of localizations, hollowness, characteristic path, modularity, size, and network density of the caveolae modules (blue bars to the right of graphs) are very similar to their corresponding features in the S1B blobs (magenta bars to left) (Fig. 4A). We quantitatively assessed module-blob similarity across all features using a matching matrix of Euclidean distances between the feature vectors of the blob and module group centers (Fig. 4B). The column-wise (i.e. the modules) similarity shows that: S2 scaffold modules match S1A blobs; caveolae modules match S1B blobs; and S1B modules match S1A blobs. The close matching of S1B modules with S1A blobs and doubling in the number of modules and localizations of S1B modules relative to S1A blobs suggests that S1B scaffolds represent dimers of S1A scaffolds. Further, PC3 cells that lack caveolae have only S1A and S2 scaffolds (Fig. 2B,C)27, supporting the matching between S1A blobs and S2 modules reported here. The dissimilarity between caveolae and S2 scaffolds and the modules of any other blob types suggests that these are complex structures made up of primitive S1A and S1B scaffolds.

Overall, our data support a model in which Cav1 is organized into smaller units of 5–8 Cav1 localizations that correspond to the polygonal base units observed by cryoEM analysis of the Cav1 caveolar coat3,5. These base units combine to form larger stable structures, the smallest of which are S1A scaffolds, which we propose correspond to the previously identified ~14–15 Cav1 homo-oligomers14,28. We also identify S1B scaffolds, previously classified as distinct from S1A scaffolds27, as larger structures that may correspond to S1A dimers. Modularity analysis and group matching show that S1A scaffolds combine to form both S1B dimers and the larger hemispherical S2 scaffold structures. Caveolae modules show better matching and correspond in size to S1B and not S1A modules, suggesting that caveolae formation may be a two-step process in which S1A scaffolds first combine to form dimers that then interact to form the caveolae coat (Fig. 5, Video S1). Consistent with a role for S1B in caveolae formation, PC3 cells that lack caveolae and CAVIN1 do not contain S1B scaffolds27. As cluster size increases, Cav1 domains show a more pronounced reduction in density relative to their constituent modules. This suggests that interaction between smaller S1 scaffolds to form larger structures, including caveolae, is associated with changes in how modules interact and are organized. Importantly, our analysis based on TIRF microscopy argues that all 4 Cav1 domains, from S1A scaffolds to caveolae, are present at the plasma membrane. Based on the role of caveolae as membrane buffers that flatten in response to mechanical stretching43, we suggest that these modular interactions are dynamic and reversible.

Figure 5

Modular interaction of Cav1 S1A scaffolds forms larger scaffolds and caveolae. Based on the module-blob matching results (Fig. 4B), S1A blobs are stable primitive structures (simplex) that are used to build up more complex, modular S1B and S2 scaffolds. S1B scaffolds correspond to S1A dimers and are used to build the caveolae coat complex (see Video S1). The figure also shows the hemispherical shape of S2 blobs and the hollow caveolae blobs.

In this work, we applied multi-threshold modularity analysis to networks/blobs constructed from 3D point clouds of Cav1 localizations acquired via SMLM. Classification of endogenous Cav1 domains in HeLa cells matched those previously identified in CAVIN1-transfected PC3 prostate cancer cells27. Spectral decomposition allowed us to define the relationship between the different Cav1 blobs and their extracted sub-networks/modules via biosignature similarity and matching across the various domains/groups. This approach is applicable to the modular analysis of other oligomeric macromolecular biological structures, improving our understanding of their architecture by disassembling them into their basic building components.

Materials and Methods

Cell culture and immunofluorescent labeling

HeLa cells were tested for mycoplasma by PCR (Applied Biomaterial, Vancouver, BC, Canada) and cultured in Dulbecco’s Modified Eagle’s medium (DMEM; Invitrogen) containing 10% fetal bovine serum (Invitrogen). For SMLM imaging, cells were plated on fibronectin-coated coverslips (No. 1.5 H) for 24 h prior to fixation with 3% paraformaldehyde (PFA) for 15 min at room temperature. Fixed cells were rinsed with PBS/CM (phosphate buffered saline complemented with 1 mM MgCl2 and 0.1 mM CaCl2), permeabilized with 0.2% Triton X-100 in PBS/CM, and blocked with 10% goat serum and 1% bovine serum albumin (BSA; Sigma-Aldrich Inc.) in PBS/CM before incubation with rabbit anti-caveolin-1 (BD Transduction Inc.) for 12 h at 4 °C and then Alexa Fluor 647-conjugated goat anti-rabbit (Thermo-Fisher Scientific Inc.) for 1 h at room temperature. Primary and secondary antibodies were used at saturating concentrations and diluted in SSC (saline sodium citrate) buffer containing 1% BSA, 2% goat serum and 0.05% Triton X-100. Cells were washed extensively after each antibody incubation with SSC buffer containing 0.05% Triton X-100 and post-fixed using 3% PFA for 15 min followed by extensive washing with PBS/CM. Near-infrared fiducial markers (diameter 100 nm; Thermo Fisher Scientific) were added for real-time drift correction. Immediately prior to imaging, cells were mounted and sealed on glass depression slides in freshly prepared imaging buffer (10% glucose, 0.5 mg/mL glucose oxidase, 40 μg/mL catalase, 50 mM Tris, 10 mM NaCl and 50 mM β-mercaptoethylamine (Sigma-Aldrich Inc.) in double-distilled water)20,35.

SMLM Imaging

Imaging of HeLa cells was performed on an in-house built SMLM system equipped with an apochromatic TIRF oil immersion objective lens (60×/1.49; Nikon Instruments) and a real-time drift correction system that limits the lateral drift to ~1 nm and the axial drift to ~3 nm. A 639 nm laser line (Genesis MX639, Coherent Inc., USA) was used to excite Alexa Fluor 647 fluorophores and near-infrared fiducial markers. A 405 nm laser line (Laserglow Technologies) was used to activate Alexa Fluor 647. The detailed optical setup and the imaging acquisition procedure were described previously30,34.

The dataset used in this work consists of 10 fields of view (FOV) of Cav1-labeled HeLa cells. Each HeLa FOV is 54 × 54 × 1 µm3, which is 9 times larger than the FOV in the PC3 cell study (18 × 18 × 1 µm3), acquired using a Leica GSDIM microscope equipped with a 160× objective27. Each HeLa FOV therefore included multiple cells and this study analyzed a larger number of cells than the PC3 study. We collected 40,000 frames per super-resolution image. The total number of collected localizations that we processed per image ranged from 1.6 to 6.7 million. The 3D SMLM Network Analysis method was able to process the whole FOV. Details of the 3D SMLM Network Analysis approach have been described previously27.

Multi-proximity threshold network modularity analysis

For every blob of 3D localizations, more than one network can be constructed, one for each proximity threshold in the set {PT_1, PT_2, …, PT_T}. That is, blob_i has T networks {G_{i,1}, G_{i,2}, …, G_{i,T}}, where G_{i,t} = (V_i, E_{i,t}) is composed of a set of nodes V_i and a set of edges E_{i,t}. V_i, which is unaffected by PT_t, represents the molecules of blob_i, and E_{i,t} is the set of edges connecting all pairs of molecules that lie within PT_t nm of each other.
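The construction above can be sketched in a few lines: the node set stays fixed while one edge set is produced per proximity threshold. The function name build_networks and the toy coordinates are hypothetical and serve only as an illustration of the definition.

```python
import math
from itertools import combinations


def build_networks(blob, thresholds):
    """Return {PT: edge set} for one blob.

    Nodes are the indices 0..len(blob)-1 and are the same for every PT;
    an edge (m, n) is present when the two localizations are within PT.
    """
    # Pairwise distances are computed once and reused for every threshold.
    dist = {(m, n): math.dist(blob[m], blob[n])
            for m, n in combinations(range(len(blob)), 2)}
    return {pt: {pair for pair, d in dist.items() if d <= pt}
            for pt in thresholds}


if __name__ == "__main__":
    blob = [(0, 0, 0), (50, 0, 0), (50, 70, 0), (200, 0, 0)]  # toy data, nm
    for pt, edges in build_networks(blob, [60, 100, 170]).items():
        print(pt, sorted(edges))
```

Because the edge condition is a distance cutoff, the edge sets are nested: every edge present at a smaller PT is also present at any larger PT.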

We leverage a spectral decomposition algorithm to find modules within Cav1 blobs. Given G_{i,t}, the network of blob_i at PT_t, we find its modules (communities) using the Newman method24,25,26. Specifically, we first calculate an adjacency matrix whose element (m, n) encodes the distance between the m-th and n-th localizations. A spectral decomposition method calculates the eigenvector representation of this adjacency matrix. This eigendecomposition defines the modules as it maximizes the intra-connectivity between Cav1 molecules within a module and minimizes inter-connectivity between Cav1 molecules across modules.
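For orientation, a minimal sketch of Newman-style spectral community detection (repeated leading-eigenvector bisection of the modularity matrix) is given below. It is not the code used in this study. It assumes a binary, symmetric adjacency matrix obtained by thresholding distances at one PT, which may differ from the distance-encoding matrix described above, and the function name spectral_modules is hypothetical.

```python
import numpy as np


def spectral_modules(adj, tol=1e-9):
    """Split a network into modules by leading-eigenvector bisection.

    `adj` is assumed to be a symmetric 0/1 adjacency matrix.
    Returns a sorted list of modules, each a sorted list of node indices.
    """
    adj = np.asarray(adj, dtype=float)
    k = adj.sum(axis=1)               # node degrees
    two_m = k.sum()                   # twice the number of edges
    if two_m == 0:
        return [[i] for i in range(len(adj))]
    B = adj - np.outer(k, k) / two_m  # modularity matrix
    modules, queue = [], [list(range(len(adj)))]
    while queue:
        g = queue.pop()
        Bg = B[np.ix_(g, g)]
        Bg = Bg - np.diag(Bg.sum(axis=1))  # generalized matrix for subgroup g
        vals, vecs = np.linalg.eigh(Bg)    # eigenvalues in ascending order
        s = np.where(vecs[:, -1] >= 0, 1.0, -1.0)
        gain = s @ Bg @ s / (2 * two_m)    # modularity change of this split
        if vals[-1] <= tol or gain <= tol:
            modules.append(sorted(g))      # indivisible: keep as one module
            continue
        queue.append([n for n, si in zip(g, s) if si > 0])
        queue.append([n for n, si in zip(g, s) if si < 0])
    return sorted(modules)
```

A group is split only when the split increases modularity, so the number of modules is determined by the data rather than fixed in advance. Refinement steps that are sometimes applied after each bisection are omitted here for brevity.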

Given G_{i,t}, the network of blob_i at PT_t, we find the optimal number of modules (communities) using eigenvectors of the network adjacency matrix. At small PTs, the molecules of a blob might not form one connected network (i.e. the network might consist of more than one connected component). A blob network containing non-dense and non-connected regions cannot be used to extract modules (i.e. as per definition, networks with high modularity have dense connections between the nodes within modules and sparse connections between nodes in different modules). Hence, PTs that generate networks with more than one connected component should be avoided when extracting the modules.
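The connected-component criterion can be checked with a short union-find sketch. The function names count_components and admissible_thresholds and the toy coordinates are hypothetical; the sketch only illustrates how PTs yielding more than one connected component could be excluded.

```python
import math


def count_components(blob, pt):
    """Number of connected components when points within `pt` are linked."""
    parent = list(range(len(blob)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for m in range(len(blob)):
        for n in range(m + 1, len(blob)):
            if math.dist(blob[m], blob[n]) <= pt:
                parent[find(m)] = find(n)  # union the two components
    return len({find(i) for i in range(len(blob))})


def admissible_thresholds(blob, thresholds):
    """Keep only the PTs at which the blob forms a single connected network."""
    return [pt for pt in thresholds if count_components(blob, pt) == 1]


if __name__ == "__main__":
    blob = [(0, 0, 0), (40, 0, 0), (80, 0, 0), (80, 55, 0)]  # toy data, nm
    print(admissible_thresholds(blob, [20, 40, 60, 80]))
```

Note that a single connected component only requires each localization to be linked to the rest through a chain of neighbors, not that all pairs lie within the PT.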

Feature extraction and module-blob similarity

For every segmented Cav1 blob, we extracted 28 features that are then used to group the blobs into classes using the 3D SMLM Network Analysis pipeline27. The learned classes from the HeLa dataset are then matched with the previously identified Cav1 domains from the PC3 and PC3-PTRF datasets27. The modules of the HeLa Cav1 blobs are then extracted using the multi-proximity threshold network modularity analysis described in the previous subsection. We extracted the same 28 features for every module. To find the similarity/dissimilarity among the extracted modules and the various blobs, we matched blob modules with intact blobs using the Euclidean distance between group centers.