Classifying soft self-assembled materials via unsupervised machine learning of defects

Gardin, Andrea; Perego, Claudio; Doni, Giovanni; Pavan, Giovanni M.

doi:10.1038/s42004-022-00699-z

Download PDF

Article
Open access
Published: 14 July 2022

Classifying soft self-assembled materials via unsupervised machine learning of defects

Communications Chemistry volume 5, Article number: 82 (2022) Cite this article

2908 Accesses
12 Citations
21 Altmetric
Metrics details

Subjects

Abstract

Unlike molecular crystals, soft self-assembled fibers, micelles, vesicles, etc., exhibit a certain order in the arrangement of their constitutive monomers but also high structural dynamicity and variability. Defects and disordered local domains that continuously form-and-repair in their structures impart to such materials unique adaptive and dynamical properties, which make them, e.g., capable to communicate with each other. However, objective criteria to compare such complex dynamical features and to classify soft supramolecular materials are non-trivial to attain. Here we show a data-driven workflow allowing us to achieve this goal. Building on unsupervised clustering of Smooth Overlap of Atomic Position (SOAP) data obtained from equilibrium molecular dynamics simulations, we can compare a variety of soft supramolecular assemblies via a robust SOAP metric. This provides us with a data-driven “defectometer” to classify different types of supramolecular materials based on the structural dynamics of the ordered/disordered local molecular environments that statistically emerge within them.

Reinforcing materials modelling by encoding the structures of defects in crystalline solids into distortion scores

Article Open access 17 September 2020

Alexandra M. Goryaeva, Clovis Lapointe, … Mihai-Cosmin Marinica

Machine learning enabled autonomous microstructural characterization in 3D samples

Article Open access 06 January 2020

Henry Chan, Mathew Cherukara, … Subramanian K. R. S. Sankaranarayanan

Tracking the time evolution of soft matter systems via topological structural heterogeneity

Article Open access 12 January 2022

Ingrid Membrillo Solis, Tetiana Orlova, … Malgosia Kaczmarek

Introduction

Supramolecular structures, composed of molecular units that self-assemble via non-covalent interactions, represent the key substrate for biological systems (membranes, micelles, protein fibers, etc.), and for new types of self-healing, stimuli-responsive and bioinspired materials^1,2,3,4. Among many interesting features, the highest potential of soft supramolecular materials lies in their intrinsically dynamic character at room temperature⁵. The self-assembled monomers continuously exchange within and in-and-out these materials^6,7, controlling how they communicate with each other at the equilibrium^8,9, how they respond to external stimuli¹⁰, and how they behave out-of-equilibrium¹¹. Such dynamical features determine a set of intriguing properties that, on the one hand are crucial for the functioning of biological tissues (e.g., self-healing, chemotacticity, molecular transport, etc.)^{12,13,14,15,16} and, on the other hand, are promising features for the design of new functional materials and nano-technologies^{1,17,18,19,20,21,22,23}.

A major goal in the study of self-assembled architectures is understanding how changes in the structure of the self-assembling building blocks (input) affect the overall properties of the supramolecular assembled architecture (output) (Fig. 1). Gaining such knowledge would pave the way toward the rational design of new functional materials with controlled properties, reducing the costs of trial-and-error synthesis. In the last two decades many efforts have been made in this direction. Notably, previous works thoroughly investigated and tried to rationalize how geometric properties of dispersed colloidal particles drive their mutual assembly into complex structures^24,25,26,27. However, because of the complexity and the dynamical, multiform nature of such systems^28,29,30,31, a direct connection between the collective properties of supramolecular materials and the microscopic features of the constituent monomers remains often impossible to attain.

**Fig. 1: Classification of supramolecular polymers based on the structural/dynamical features of their molecular motifs.**

Molecular simulations play a fundamental role in this framework, providing abundant data on the structure and dynamics of supramolecular assemblies, with thorough chemical-physical information on the factors that control the dynamical features of these materials^{6,7,11,32,33,34,35,36}. However, analysing and interpreting the high-dimensional, high-detail data produced by computer simulations can be challenging, particularly for supramolecular systems. In such complex structures, irregularities and defects play a crucial role^6,8,10,11,16, demanding for molecular descriptors capable to translate the simulation data into more reliable classifications of the molecular structure and dynamics characteristic of these systems.

"Human-based” analyses rely on low-dimensional descriptors, defined based on the experience and on those features directly readable by visual inspection. This often leads to biased predictions, affected by the choice of such low-dimensional descriptors, which possibly overlook important degrees of freedom of the system. To overcome these limitations, data-driven, Machine-Learning (ML) approaches, such as, e.g., unsupervised clustering, dimensionality reduction, pattern recognition, etc., are particularly useful to characterize complex self-assembling systems^{37,38,39,40,41}. These techniques allow to fully exploit and analyse the rich, high-dimensional information contained in high-resolution MD simulations, providing insight on key patterns and correlations that may remain hidden to conventional human-based analyses^{39,40,42,43,44}.

In particular, data-driven analyses allow to effectively monitor the local environment of each atom/molecule in a system, providing accurate, high-dimensional atomic/molecular descriptors, called “fingerprints”, that classify the mutual arrangement of atomic/molecular entities ^{45,46,47,48,49}.

The Smooth Overlap of Atomic Position (SOAP)⁴⁹, proved to be a very efficient density-based descriptor. It encodes the information of atomic environments into a rich, high-dimensional and agnostic (i.e., independent from a priori knowledge of the system) fingerprint^50,51,52. Coupling a descriptor-based analysis with unsupervised clustering techniques enables a robust characterization of the ordered/disordered arrangement in molecular systems, as demonstrated for Lennard-Jones clusters, peptides⁵¹, and water^53,54,55. Recently, combining SOAP descriptors and unsupervised clustering, we were able to reconstruct the complex structural dynamics of molecules in supramolecular materials, tracking e.g., defects formation and evolution⁷. This data-driven approach, in principle enables a rigorous and unbiased classification of soft structures^56,57.

In the present work we push the data-driven comparability between supramolecular assemblies to the limit. Based on SOAP description and unsupervised clustering, as well as on a SOAP-based metric, we construct a data-driven analysis strategy, which can be used as a “defectometer”, measuring and comparing different structures based on their local molecular environments. This allows us to compare different types of assemblies, such as supramolecular fibers, micelles, layers, nanoparticles, etc. We obtain a general, unbiased classification of supramolecular systems that are profoundly different among each other, capturing structural and dynamical aspects that are hardly readable with standard descriptors.

Results and discussion

The formation and dynamics of defects play a fundamental role in determining the properties of soft, supramolecular assemblies. Detecting and classifying such structural defects require adequate descriptors of local molecular environments, carrying high-dimensional information on the monomer arrangements and dynamicity. In this framework, general, data-driven descriptors and analyses were recently proven to be more effective than standard, “human-based” descriptors, especially for complex and dynamic molecular systems⁷. In the following, we characterise the behavior of different types of soft, supramolecular self-assembled materials as emerging from equilibrium MD trajectories obtained using CG molecular models. For all considered systems, we compute the SOAP spectra associated to the local molecular environment that surrounds each monomer core in the system, at every sampled step of an equilibrium CG-MD trajectory. Gathering these SOAP spectra in global datasets, we can compare and classify different systems. (i) Combining dimensional reduction and unsupervised clustering techniques, we can determine from the SOAP dataset the main molecular motifs present in each studied assembly. This allows us, for example, to identify defects in such soft assemblies that are key for their intrinsic dynamics^6,7,8. (ii) By defining a SOAP-based metric, we can compute the distance between the compared systems in the high-dimensional SOAP space, assessing and quantifying their similarity in terms of the SOAP monomeric motifs that emerge and are present within them, in equilibrium conditions^55,56. This approach is first applied to a set of one-dimensional supramolecular polymers, having different monomer structures, and is then extended to compare and classify two-dimensional as well as three-dimensional assemblies with each other, proving a remarkable flexibility.

Comparing variants of a supramolecular polymer

Recently, it has been shown that unsupervised clustering of SOAP data from equilibrium MD of 1,3,5-Benzenetricarboxamides (BTA) supramolecular polymers allows the reconstruction of the structural dynamics of such complex assemblies⁷. In particular, monitoring the SOAP vectors defined in the center of each monomer in a BTA supramolecular fiber allows us to retrieve detailed information on the structural order/disorder (both at a local and global level), as well as on the internal monomer dynamics that characterize such supramolecular polymers. A pre-requisite is that the MD simulation trajectories considered in the analysis provide sufficient sampling of the equilibrium of the simulated systems (in terms of, e.g., microstates, their populations, and transitions between them). All simulations used in the following analyses meet such requirement (see the Methods section for complete details on the simulation and analysis protocols).

Considering different variants of BTA fibers, and combining the SOAP data obtained for each system, we can assemble a dataset gathering all possible configurations (each one identified by a SOAP feature vector) visited by monomers in the different BTA fiber variants. This guarantees that SOAP vectors in this dataset, associated to the local arrangement of monomers in all the different assembly variants, belong to a unique high-dimensional feature space, thus allowing us to compare between different systems⁷. In these analyses, the resolution of the employed molecular models is crucial, as it implicitly determines the accuracy by which, e.g., the monomer-monomer interactions, the monomers’ flexibility, etc., are described. Coarse-grained (CG) models with a resolution <5 Å— developed based on the Martini force-field⁵⁸, and optimized to match the behavior of the respective all-atom (AA) models—were proven accurate enough to capture all essential information while guaranteeing sufficient sampling of the assembled fibers in equilibrium^6,7,34. With such CG models, it was demonstrated that water-soluble BTA polymers, composed of BTA_W monomers with amphiphilic arms, possess a variegated structure with a rich and diverse set of monomeric states, in continuous exchange and dynamic interconnection with each other. The SOAP data of the BTA_W fiber were analysed via the Probabilistic Analysis of Molecular Motifs (PAMM) unsupervised clustering (see Methods section for details)^7,51. As shown in Fig. 1C, PAMM identifies different molecular motifs (microstates) in the SOAP data, indicated by different colors in the 2D PCA projection. The hierarchical structural/dynamical interconnections between microstates (i.e., how structurally similar they are, and how much they dynamically exchange with one another) are captured by a statistical analysis of the identified microclusters (see the Methods section) which outputs the related microstates dendrogram (Fig. 1C, bottom-left). As the SOAP data analyzed herein are extracted from snapshots taken along equilibrium MD trajectories, the microclusters that are found close to each other in the dendrogram identify SOAP monomer environments (monomer microstates) that are similar from a structural point of view (see, e.g., Supplementary Fig. 5) and/or quickly dynamically exchanging with each other. The dendrogram also allows to perform a systematic coarse-graining of the classification, identifying the dominant motifs (macrostates) in the BTA_W fiber (Fig. 1C, right): the ordered/persistent interior (bulk) of the fiber (in red: ~66% of the monomers), the stacking defects (~30% of the monomers, in green: bound to the stack only by one side), and the monomers which are adsorbed on the fiber surface, moving from one defect to another (in blue: population ~3% of the monomers). From the frequency at which the monomers change state during the equilibrium MD, we can estimate the relative transition probabilities between the various states. The BTA_W fiber turns out to possess a very rich and diverse internal dynamics. In comparison, the BTA_C8 variant, with shorter carbon side-chains, produce straight and substantially defect-free fibers (Fig. 1D). Interestingly, artificially reducing the directional interactions between the monomers a new BTA^* fiber variant is defined, where a fraction (~2%) of monomers form defects that are continuously created-and-repaired ⁷.

It is worth noting that such a direct comparison between fiber variants is possible since all SOAP vectors, sampled in the compared systems, belong to the same high-dimensional feature space (324-dimensional, see Methods for details). This approach allows to identify defects (i.e., disordered domains) in the supramolecular fibers, to track their emergence/disappearance, and to use them to compare fiber variants. Moreover, as the concept of defects is elusive in such dynamical structures, this approach is general and flexible, as the definition of the molecular motifs that statistically populate the assemblies, and thus of defects, is exquisitely data-driven. This analysis can be thus extended to other families of assemblies. As a first proof-of-concept, we proceed by comparing variants of different supramolecular polymer families.

We focus on two other families of supramolecular polymers, formed by monomers based on naphthalene diimide (NDI)⁵⁹ (Fig. 2A, B) and benzotrithiophen cores (BTT)⁶⁰ (Fig. 2C, D). We compare two variants per-family, NDI_O vs. NDI_S, and BTT_F vs. BTT_5F, for which we employed reliable CG models having analogous resolution of the BTA models used in the analyses of Fig. 1^59,60. We repeated the analysis performed for BTAs to compare the NDI_O and NDI_S fiber variants, building a unique SOAP dataset from the equilibrium MD of the two systems (Fig. 2A, B). Also in this case, we computed SOAP vectors, defined in the center of each monomer (see Supplementary Fig. 11 for a visual representation), over equilibrium MD trajectories, using the same sampling frequency and length used for BTA analysis (see Methods section for complete details). We then performed PCA on the SOAP dataset, and used PAMM to identify the molecular states. Again, in this case, PAMM highlighted three dominant monomeric states (Fig. 2A, B: in red, green and blue). Indeed, this analysis of NDI variants evidences a picture similar to that of BTAs. One variant, NDI₀, exhibits a well-ordered structure with all monomers belonging to the fiber backbone (Fig. 2A: in red), apart from rare fluctuations. The slight change in the NDI_S monomer structure (the replacement of two oxygen with two sulfur atoms) produces instead a more disordered fiber, rich of defects (Fig. 2B, in green: ~16%), and of adsorbed monomers which travel from one defect to another within the assembly (in blue: ~7%). The same protocol was used for the BTT fibers (Fig. 2C, D), showing again three molecular states, and providing a similar picture to that of BTA and NDI systems. The BTT_F molecule tends to induce a much ordered fiber than the BTT_5F variant, in agreement with what is known experimentally for these systems⁶⁰. A difference that emerges from these analyses, is that BTT fibers possess a more diverse structure than NDI fibers, which appear as more ordered in comparison. In particular, BTT_5F exhibits the highest occurrence of disordered monomeric domains (~49% of the monomers are in defected state and ~19% are adsorbed/traveling monomers). The ordered monomers in the BTT_5F fibers are only ~32% (~78% in BTT_F), and considerable dynamic interconversions are observed between all monomeric states (Fig. 2C, D).

**Fig. 2: Comparing variants of supramolecular polymers.**

The proposed analysis allowed us to qualitatively detect the statistical formation of relevant molecular motifs such as defects. This is an important ingredient since, as demonstrated for BTA supramolecular polymers^6,7,8, the possibility to create and repair defects in the assembly is crucial for their dynamic properties¹⁰. The analyses of Figs. 1, 2 demonstrate that this approach is versatile, and it can be used to compare fiber variants within the same supramolecular polymer family. However, at this stage, the analysis does not allow us to unambiguously compare between the different families. Especially, since the definition of defects emerge from the data contained in separate SOAP datasets, it is not clear to what extent the defects in the BTA fibers are comparable to the BTT or NDI ones. The SOAP dataset used to compare the BTA variants in Fig. 1, in fact, does not contain information on the monomer states in the NDI and BTT fibers. To compare these supramolecular fibers in a more objective and quantitative way, a further step is required.

Comparing different types of 1D supramolecular polymers

As described above, by processing a comprehensive dataset containing SOAP vectors sampled via MD simulation of multiple systems, one can build a framework to rigorously and unambiguously compare different supramolecular systems. The idea is to retain in the SOAP analysis those relevant features that are common to all the systems (a sort of “common molecular denominator”)⁷. As detailed in the Methods section, we considered only the position of monomer centers in the SOAP calculation, with two advantages: (i) this has been shown to retain sufficient information on the monomer arrangement in the fibers, and thus on their supramolecular structure and dynamics^7,8. (ii) This makes the analysis very general/abstract—all assemblies are composed of mutually interacting monomers—opening the possibility to compare, not only variants of a supramolecular fiber, but also widely different assemblies.

To this end, we thus built a joint dataset containing all the SOAP vectors sampled via the equilibrium MD of the systems simulated for the analyses of previous section (3 BTA, 2 NDI and 2 BTT fibers), obtaining the global contour plot reported in Fig. 3A. We repeated the PCA and PAMM analyses over this dataset, identifying the main molecular motifs shared by all these one-dimensional assemblies (Fig. 3A). As in the analyses of Figs. 1, 2, three main structural clusters are identified by PAMM: backbone (in red), defects (in green) and adsorbed/surface-diffusing monomers (in blue). This distinction in three macrostates is used in the the PCA scatter plots of Fig. 3A to classify the SOAP states sampled by the individual systems, overimposed on the global SOAP distribution (contour plot). These scatter plots can represent characteristic “fingerprints” of the supramolecular structures, indicating which molecular states are populated in each system. Qualitatively, such fingerprints provide an information of similarity between systems, in terms of structural arrangement and dynamicity of the monomers. However, since PCA scatter plots are the result of a dimensionality reduction, they can provide a distorted picture, where dimensions potentially relevant in the overall feature space may be neglected.

To reach a more quantitative insight, we employed a SOAP-based metric that allows comparing complex supramolecular systems based on the average SOAP spectra of the monomers forming the assemblies (i.e., based on the structural/dynamical features of the local environment that, on average, surrounds the monomers). Proven useful to compare complex interacting molecular systems, such as, e.g., lipid bilayers⁵⁶, or liquid aqueous systems⁵⁵, we build on such high-dimensional metric to quantify the similarity between the SOAP data associated to each fiber (see Methods for details). For each frame extracted from an equilibrated MD simulation, SOAP vectors associated to the local environments surrounding all monomers are averaged into a single SOAP spectrum (the frame-average, Eq. 4). These frame spectra are then averaged along the MD trajectory, obtaining an average SOAP spectrum, which is characteristic of a given assembly and of those molecular motifs that populate it at equilibrium (the simulation-average, Eq. 5). The frame- and simulation-averages associated to each system are projected onto the first two PCs, in the scatter plot of Fig. 3B. We can thus assess the similarity among the different 1D assemblies in a more rigorous way, by employing a SOAP-induced metric^50,55,56 (Eq. 7) to compute the distance (d_SOAP) between the SOAP simulation-averages of the various systems. Such distance is defined directly in the SOAP space, and preserves the complete information contained in such high-dimensional descriptors. The result of this analysis is the d_SOAP distance matrix in Fig. 3C. The off-diagonal values in the d_SOAP matrix grant a classification of the assemblies under investigation in the global SOAP space. The darker the color of the entry, the lower is the d_SOAP between the assemblies, indicating their similarity (in terms of molecular environments). The d_SOAP matrix confirms the qualitative indications provided by the scatter plots of Fig. 3A: the most ordered supramolecular polymers, namely, BTA_C8, BTA^*, and NDI_O, are similar and mainly populated by ordered, backbone-like molecular domains (Fig. 3A: in red). In these systems, the defect states are just sparsely populated, and located very close to the ordered domains—this is consistent with the fact that defects are relatively rare fluctuations, which are readily repaired (consistently with, e.g., Fig. 1E). The remaining 1D systems, containing a higher defect concentration (Figs. 1, 2), are more or less distant from the previous three fibers. NDI_S and BTT_F SOAP spectra are nearly superimposed, with a d_SOAP ~ 0, and very similar PCA projections (Fig. 3A, B). This indicates that the monomeric environments associated to these systems are, on average, structurally/dynamically very similar (see also population histograms and interconversion graphs in Fig. 2). Conversely, BTT_5F and BTA_W present unique features, distancing themselves from the other fibers. Such a peculiarity, already suggested by the fingerprints of Figs. 1, 2, is here confirmed quantitatively.

Classifying different types of supramolecular polymers, these results demonstrate how the proposed approach is effective in comparing 1D assemblies in general. This is possible through the unbiased, high-dimensional, SOAP-based metric, that quantitatively compares the average molecular environment (and defects) emerging in the various assemblies. The flexibility of the approach suggests to investigate whether such a “defectometer” can be used to compare assemblies with higher structural dimensionality (i.e., 2D or 3D assemblies).

Comparing 2D dynamic assemblies

We here extend our approach to compare 2D supramolecular systems. We focus on assemblies where the monomers may be arranged on flat, as well as on curved surfaces. As case studies, we chose DPPC lipids, which self-assemble into 2D (planar) lipid bilayers, and DPC and SDS surfactants, that form nearly spherical micellar aggregates. For all these systems, we employed validated Martini-based CG models (same resolution of the previously studied models) and collected equilibrium MD trajectories for the SOAP analysis (Fig. 4A). Also in these analyses, we consider one SOAP center per-monomer (centered in the lipid/surfactant heads).

DPPC bilayers are known to undergo a gel-to-liquid transition around ~300–320 K of temperature, which is well-captured by DPPC Martini models⁵⁶. In our analysis, we considered equilibrium MD trajectories for DPPC at T = 273, T = 293, and T = 323 K, as representative of 2D planar assemblies having variable reconfiguration dynamics of monomers, depending on the bilayer gel/liquid state. Analysing the DPPC trajectories at different temperatures, we first performed a SOAP + PAMM analysis, analogous to the SOAP + PAMM comparisons between fiber variants belonging to the same family, in Figs. 1, 2 (see Methods for details). The results prove that this approach captures the bilayer gel-to-liquid transition correctly, solely based on how the environment surrounding the lipids changes with the temperature (Fig. 4B). At low temperature (~273 K, Fig. 4B: left) the bilayer is entirely in the gel phase (in red), while as the transition temperature is approached (between ~300 and 320 K), liquid domains appear (in blue, ~5% at ~293 K, Fig. 4B: center)⁵⁶. When the temperature raises to ~323 K, the bilayer is entirely liquid, with residual (<5%), gel-like lipids (Fig. 4B: right). Our analysis built on the SOAP data correctly distinguishes gel and liquid domains in planar lipid bilayers, without prior knowledge on the lipid arrangement in each phase. Nonetheless, interesting questions may arise on how similar the identified monomeric environments are to, e.g., those present on the curved surface of a micelle.

We thus, extended our SOAP + PAMM analysis to curved 2D assemblies, enriching the dataset with the SOAP data obtained from the equilibrium MD (at T = 300 K) of two micellar aggregates, made of DPC or SDS surfactants. The PCA scatter plots (Supplementary Fig. 10) indicate a significant overlap of both micelles fingerprints with that of the lipid membrane in the liquid phase (T = 323 K). This suggests that the structural/dynamical features of the monomeric environments characterizing these micelles are closer to those of a dynamic/liquid, rather than of a static/gel-like flat bilayer.

To obtain quantitative insights we then projected the SOAP frame- and simulation-averages of these systems along the first two PCs (Fig. 4D), and computed the d_SOAP matrix (Fig. 4E). The DPPC_273K and DPPC_293K assemblies appear relatively close to each other, and separated from the other three systems—while in DPPC_293K both gel and liquid phases coexist, most of the lipids are gel-like. Similarly, DPC_300K and SDS_300K micelles have d_SOAP ~ 0, appearing close in the scatter plot (Fig. 4D). Interestingly, the liquid DPPC_323K shows “intermediate” features, being closer (in terms of monomers environment, disorder/reshuffling) to dynamic micelles formed by different monomers (SDS/DPC), than to the DPPC bilayer at lower temperature (Fig. 4D, E).

Comparing 3D dynamic assemblies

We tested this approach also on soft self-assembled 3D systems. In particular, we focused on spherical-shaped nanoparticles. As an example, we chose hexadecane (HEXA), an hydrophobic alkane composed of 16 Carbon atoms, for which we employed a Martini-based CG model. HEXA molecules undergo aggregation forming spherical assemblies (droplets) in water. We tested our method by comparing assemblies of variable size, composed of 128, 512 and 2048 molecules (Fig. 5A). For each assembly, we collected equilibrium MD trajectories in explicit water, and analysed them via SOAP + PAMM analysis. Again, we defined one SOAP vector in the center of each monomer. Since the compared systems have different size, we adapted the sampling statistics, considering different number of frames depending on the system size (with fixed sampling frequency of 1 ns⁻¹). Therefore, all systems contribute to the global SOAP dataset in equal percentages (see Methods for details).

The comparison of HEXA droplets is reported in Fig. 5B–D. Not surprisingly, the molecular environments do not change radically across different system sizes (scatter plots in Fig. 5B–D). Our analysis identifies two populated motifs, essentially corresponding to the monomers that are on the surface (pink) or in the bulk (gray) of the HEXA assemblies (in general, the larger is the assembly, the more populated is the bulk phase). The analysis also highlights a dynamicity between these two molecular motifs (interconversion between bulk and surface), which varies with the aggregate size in the same way as the ratio between the population of the pink and gray domains. While these cases are rather simple, they show that our analysis provides robust and reasonable results also in the case of 3D assemblies of variable size. This analysis enriches the systems studied in the present work with isotropic 3D assemblies, providing additional data to push the limits of the comparability in the next section.

A “defectometer” to compare and classify different types of soft dynamic assemblies

As a last step, we processed the simulation data of all the studied systems in a single SOAP + PAMM characterization, assessing to what extent completely different assemblies can be compared to each other. It is worth noting that all self-assembled systems share a common feature, in that they are formed of individual monomers that interact with each other. Having used a single-center per-monomer SOAP definition in each system, we can thus compare all these different assemblies in a common SOAP feature space containing information on the mutual arrangement of their monomers along the equilibrium MD trajectories. We gathered in a single dataset all SOAP vectors computed in the analyses of the previous sections. An identical number of SOAP vectors is considered for each system, to equally weight them, guaranteeing a balanced comparability.

We processed the complete SOAP dataset via the described PCA + PAMM workflow. Figure 6A shows the PCA projections of the SOAP vectors for each system, overimposed onto the contour map of the global dataset. Different colors are associated to the five dominant molecular motifs detected via PAMM. The systems populate different regions in the SOAP space (Fig. 6A). This comparative analysis provides results compatible with those of previous analyses (e.g., different PCA scatter plots between ordered and disordered fibers). Surprisingly, many systems populate all the identified motifs, suggesting that structural analogies are present, despite the intrinsic diversity of the considered assemblies. Nonetheless, the cluster superposition in the PCA scatter plots hampers the comparison, as relevant dimensions might be hidden. To obtain a more quantitative insight, we turned again to the average SOAP vectors associated to each system. Using the high-dimensional SOAP metric⁵⁶ introduced above, we computed the d_SOAP matrix comparing the simulation-averages associated to each assembly with each other, indicating the similarity and differences between the systems in terms of monomer environments (distance in the SOAP space). An important parameter for the SOAP analysis is the cutoff radius (rcut), which determines the size of the neighborhood considered in characterizing the molecular environment of each SOAP center⁷. In principle, the d_SOAP scale may change while changing the rcut in the analysis. In order to prove the robustness of our analysis, and given the differences in the assemblies compared herein, we computed the d_SOAP matrix using three different rcut values: 0.8, 1.6, and 3.0 nm. Shown in Fig. 6B (top panels), the matrices show a reduced d_SOAP between the various systems when a shorter rcut is used (i.e., the differences between systems fade for rcut = 0.8 nm with respect to rcut = 3.0 nm). The bottom panels of Fig. 6B report the same d_SOAP matrices of the top panels, with an adapted color scale. Aside from subtle differences—e.g., a slightly lower resolution when a smaller rcut is used—the global picture in terms of similarity between the compared assemblies is consistent in all cases (more details on the effect of changing SOAP + PAMM parameters on the resolution and completeness of the analysis are reported both in the Methods and Supplementary Note 2).

**Fig. 6: Comparison of different types of soft dynamic supramolecular materials.**

In general, in the d_SOAP matrices we observe four main dark areas—i.e., that of fibers (1D assemblies), two neighboring/entangled ones for the flat and spherical 2D assemblies, and a third one for the 3D aggregates (bottom-right in the matrix). An interesting exception is the BTA_W fiber, found more similar to 2D assemblies (DPPC and surfactant micelles) than to the other ordered 1D assemblies. Another interesting system is the DPPC bilayer at 323 K, which is found closer to highly dynamic SDS and DPC micelles rather than to DPPC at lower temperature (293 or 273 K).

In the d_SOAP matrix, one can select one assembly and rank all the others with respect to it. For example, selecting the ordered BTA_C8 fiber, the plot of Fig. 6C shows a high similarity (small d_SOAP) with the other 1D ordered assemblies, lower similarity with disordered 1D fibers (e.g., d_SOAP ~ 0.6–0.7 vs. BTA_W), while d_SOAP increases further with respect to 2D and 3D assemblies. As anticipated above, an interesting result is obtained for BTA_W (Fig. 6D). In terms of d_SOAP ranking, the closer assemblies to this water-soluble, disordered fiber are indeed highly dynamic, planar or spherical 2D assemblies. Surprisingly, all the ordered 1D fibers are less similar to BTA_W than all the 2D studied systems. This suggests that the solvophobic component of the BTA_W-BTA_W interactions in water (key factor in controlling the defect formation in such 1D assemblies)^7,8 can shape a molecular environment in the surrounding of the monomers that is closer to the environment of 2D micelles or liquid-like lipid bilayers than to the environment of ordered BTA variants (e.g., BTA_C8). Noteworthy, this is known to produce a dynamic surface adaptability in this specific fiber that is similar to the surface fluidity seen, e.g., in lipid bilayers^6,10. The relative d_SOAP distances in the matrix (computed with rcut = 3.0 nm, Fig. 7A) were then processed via a further clustering step (see the Methods section for details), to assess the similarity interconnections among all considered systems. This led to the global dendrogram shown in Fig. 7B, which clearly underlines how the disordered BTA_W fiber is, for example, closer to the ensemble of 2D assemblies—bilayers and micelles (highlighted in green) -, rather than to that of ordered 1D fibers (in blue)^6,7,10. This demonstrates that the proposed SOAP-based analysis can provide a handle to classification exceeding the typical ones based on, e.g., the theoretical dimensionality of an assembly, monomer-monomer coordination number, etc., which cope badly with complex and dynamic assemblies. This crucial difference between “standard” human-based classifications and the data-driven method proposed herein is underlined further in Fig. 7C, D. As an example, these plots report the dimensionality of each system as a function of its distance d_SOAP from the BTA_C8 (C) and BTA_W (D) systems, showing that a correlation between microscopic similarity and a priori theoretical dimensionality is not obvious when dealing with soft dynamic assemblies as those studied herein.

**Fig. 7: Similarity classification of soft, dynamic supramolecular materials.**

It is worth noting that such similarity measurements rely solely on data extracted from equilibrium MD simulations, with no major assumption on the structure/features of the studied materials. While this approach is flexible, and can be used to compare in an unbiased way different types of materials, it also creates the important opportunity to classify assemblies based on the molecular environments that populate them, which is a crucial step towards the rational design of supramolecular materials with programmable dynamic properties.

Conclusion

Rigorous and precise classification of the structure and dynamics of soft supramolecular materials is essential for the rational design of monomers that can self-assemble into supramolecular structures having controllable dynamic behavior. While it is known that the dynamics of such materials originates from defect states that statistically populate them, it is not trivial to design a unified, unbiased, and robust approach to classify soft assemblies based on their “defectivity”.

Here we report an unsupervised, machine-learning approach that allows comparison and classification of soft self-assembled materials (1D supramolecular fibers, 2D, and 3D assemblies) based on the structural dynamics of the internal molecular environments that surround their constituive monomers. We simulate these supramolecular assemblies via molecular models having submolecular (<5 Å) resolution, and analyse their equilibrium MD trajectories via a synergistic use of atomic environment descriptors, unsupervised clustering, and similarity/dissimilarity measurements in a high-dimensional feature space. Monomers’ displacement/arrangement data extracted from the MD simulations are translated into relevant information on the molecular arrangements within the assemblies by means of SOAP feature vectors⁴⁹. The high-dimensional data contained in the resulting SOAP spectra are then used to identify dominant monomer clusters/states in the different assemblies via unsupervised PAMM clustering (and PCA)^7,51. The analysis outputs a characteristic fingerprint for each system in terms of the most populated molecular motifs, their structural and dynamical features. A metric defined in the high-dimensional SOAP feature space^55,56 then provides a quantitative way to compare and classify the studied systems. We show how such data-driven analysis can be used to compare a wide range of supramolecular systems, providing a classification based on the monomeric motifs that emerge from equilibrium MD data, overcoming standard classification approaches based on a priori assumptions on the features of the assemblies (such as, e.g., their ideal/theoretical dimensionality, etc.). The obtained results also demonstrate how, monitoring the monomer transitions between the detected molecular motifs, it is possible to obtain relevant information on the inner dynamics of the assemblies. We can observe how ordered 1D stacked fibers are quite similar with each other, while defected supramolecular polymers (e.g., BTA_W) have a richer and diverse internal structure/dynamics, closer to that of some types of considered lipid bilayers or micelles. This fits well, e.g., with the complex dynamics of the surface of BTA_W water-soluble fibers^6,61,62,63, and with their dynamic adaptivity and structural reconfigurability ^10,64.

This workflow is powerful for multiple reasons. (i) It does not build on any a priori knowledge on the structure and dynamics of the various assemblies that are compared. (ii) Building on the concept of “defectivity”, it proposes defects as a common ground to compare different materials, a direction that holds a great potential to unify supramolecular materials. (iii) Such a data-driven “defectometer” allows to quantitatively classify dynamic assemblies that are different from each other (e.g., fibers vs. micelles vs. layers vs. nanoparticles). This provides us with a precious tool toward the rational design of self-assembled materials with controllable dynamic properties, which is key to conceive complex systems where multiple assembled entities can effectively communicate with each other in a dynamic way.

Methods

Descriptors of atomic environments

Let a generic output of an MD simulation be the ensemble of system conformations sampled at a series of MD timesteps, represented by the atomic coordinates R(t) of the molecular species in the system at timestep t. R is a 3N-dimensional configuration vector, where N is the number of particles. A generic descriptor is a mapping from the 3N-dimensional coordinate space to a D-dimensional feature space, that associates a feature vector to each of the sampled conformations R(t). We aim at a descriptor capable to characterize and compare the atomic/molecular environment surrounding the multiple constituents of a soft, supramolecular system. A first, crucial requirement of this mapping is that it preserves physical symmetries such as permutation, translation and rotation invariance, ensuring that physically equivalent configurations are recognized as such by the descriptor⁶⁵. Other less obvious properties, required for a useful representation of the atomic/molecular environment are e.g., smoothness, additivity and level of locality; SOAP descriptors satisfy all these requirements ^66,67.

Smooth Overlap of Atomic Position (SOAP)

The SOAP⁴⁹ is an atom-centered descriptor, that accurately reproduces density correlation features of many-body systems. SOAP were introduced in ref. ⁴⁹ as new bond-order parameters, able to efficiently account for radial and angular information of the environment that surrounds atoms or molecules^41,49,66. The SOAP descriptor of a system of N particles describes the atomic/molecular surroundings of a selected set of M coordinates of the system components, which are referred to as the “centers” of the SOAP vector. These M centers can include the position of every single atom of the system, as well as selections or combinations (as e.g., center of geometry/mass) of them. In the following, we comment on the choice of the SOAP center for each of the studied systems, but in general we associate a single SOAP vector per each monomer (see also “Results and discussion”). In this work we have considered only single-specie systems, but the approach is generalizable to multiple species.

The SOAP descriptor is built from the density of neighboring centers j that surround the i-th center, namely

$${\rho }^{i}({{{{{{{\bf{r}}}}}}}})\,=\,\mathop{\sum}\limits_{j}\exp \left[\frac{-| {{{{{{{\bf{r}}}}}}}}\,-\,{{{{{{{{\bf{r}}}}}}}}}_{ij}{| }^{2}}{2{\sigma }^{2}}\right]{f}_{{\mathtt{rcut}}}(| {{{{{{{\bf{r}}}}}}}}\,-\,{{{{{{{{\bf{r}}}}}}}}}_{ij}| ),$$

(1)

where a Gaussian function is associated to each neighboring center (located at distance r = r_ij from the i-th center), to build a smooth density function. The σ parameter sets the width of the Gaussian located at the j-th neighboring center. The total neighbor density ρ = ∑_iρⁱ is retrieved by summing all contributions. The function f_rcut smoothly goes from 1 to 0 at rcut, so that the environment of each center extends up to a fixed cutoff rcut. Starting from Equation (1), which is intrinsically invariant for the permutation of centers, the SOAP descriptors are defined by incorporating the translation and rotation invariances. This is done in two steps: (i) Eq. 1 is expanded in the basis of orthonormal radial functions R_n(r) and spherical harmonics ${Y}_{l,m}(\hat{{{{{{{{\bf{r}}}}}}}}})$; (ii) rotational invariance is enforced by building symmetrized combinations of the expansion coefficients^49,66.

We here employ the second-order SOAP descriptor, also called SOAP power spectrum (in analogy with the Fourier analysis), which can be written as

$${\gamma }_{nn^{\prime} l}^{i}\,\propto\, \frac{1}{\sqrt{2l\,+\,1}}\mathop{\sum }\limits_{m\,=\,-l}^{+l}{({c}_{nlm}^{i})}\,{* }\,{c}_{n^{\prime} lm}^{i},$$

(2)

where ${c}_{nlm}^{i}$ are the expansion coefficients of the particle density surrounding the ith-center. The full SOAP descriptor associated to the i-th center is a vector including all contributions from Eq. 2,

$${{{{{{{{\bf{p}}}}}}}}}_{i}\,=\,\{{\gamma }_{nn^{\prime} l}^{i}\},$$

(3)

where n and $n^{\prime}$ range from 1 to ${\mathtt{nmax}}$ and l ranges from 1 to ${\mathtt{lmax}}$, setting the dimension D of the vector (in cases of multiple species D depends also on the number of species)^41,49. To compute Eq. 2 we used the python package DScribe⁶⁸, using nmax, lmax = 8 and three different cutoff values rcut = 0.8, 1.6, 3.0 nm. The remaining parameters were set to default values of the DScribe library.

To summarize, for a given 3N-dimensional configuration vector R sampled via CG-MD, we define a set of M centers {i}, and for each center we compute the SOAP vector {p_i}. This generates a dataset of M SOAP vectors that describe the structural arrangement of the centers in the selected configuration of the system. Such SOAP vectors represent “local” descriptors, encoding the information on the environments that surrounds each center.

We can also define “global” descriptors, useful for the comparison of different systems: (i) the frame-average SOAP descriptor ${\bar{{{{{{{{\bf{p}}}}}}}}}}_{t}\,=\,\{{\bar{\gamma }}_{nn^{\prime} l}^{t}\}$, where each component of the vector is computed as the power spectrum of the density (Eq. 1), after averaging over all the M centers, namely ⁶⁸:

$${\bar{\gamma }}_{nn^{\prime} l}^{t} \,\sim\, \mathop{\sum }\limits_{m\,=\,-l}^{+l}{\left(\frac{1}{M}\mathop{\sum }\limits_{i}^{M}{c}_{nlm}^{i}\right)}\,{* }\,\left(\frac{1}{M}\mathop{\sum }\limits_{i}^{M}{c}_{n^{\prime} lm}^{i}\right).$$

(4)

This gives a global (averaged) picture of the structural features characterizing a specific MD frame. (ii) The simulation-average SOAP descriptor,

$$\langle \bar{{{{{{{{\bf{p}}}}}}}}}\rangle \,=\,\frac{1}{T}\mathop{\sum }\limits_{t}^{T}{\bar{{{{{{{{\bf{p}}}}}}}}}}_{t},$$

(5)

that averages all the frame-average SOAP descriptors along the T frames collected through the MD simulation. Equation 5 represents a compact global fingerprint for the equilibrium structure of the system under investigation, provided that the T frames are sampled at the equilibrium; we used the frame-average SOAP to assert similarities between different structures. See further details in Supplementary Figs. 1, 2 and Supplementary Note 1.

Comparison of molecular environments

Once SOAP feature vectors are computed, the similarity between the SOAP characterization of different supramolecular structures can be inferred via a distance estimation (or kernel function), provided that the dimensionality D of the compared SOAP vectors is the same (i.e., using equal nmax and lmax) between the compared systems. We measure the similarity between two SOAP vectors using a linear polynomial kernel ^49,50,

$${{{{{{{\mathcal{K}}}}}}}}(i,j)\,=\,\left({{{{{{{{\bf{q}}}}}}}}}_{i}\cdot {{{{{{{{\bf{q}}}}}}}}}_{j}\right),$$

(6)

where q = p/∣p∣ is the unit-normalized SOAP vector. Upon normalization, ${{{{{{{\mathcal{K}}}}}}}}(i,j)$ is equal to 1 if the two molecular environments are exactly superimposed or 0 if no overlap occurs. Since Eq. 6 defines a positive-definite kernel, it naturally induces a metric, for which the distance between two feature vectors is defined as^50,67

$${{{{{{{{\rm{d}}}}}}}}}_{{{{{{{{\rm{SOAP}}}}}}}}}(i,j)\,=\,\sqrt{{{{{{{{\mathcal{K}}}}}}}}(i,i)\,+\,{{{{{{{\mathcal{K}}}}}}}}(j,j)\,-\,2{{{{{{{\mathcal{K}}}}}}}}(i,j)}.$$

(7)

Recently, this SOAP-induced metric was employed in ref. ⁵⁶ to compare and classify the structural arrangement of lipid membranes represented via different force-fields. We here apply the same approach for the comparison of different supramolecular systems, computing Eq. 7 for the pairs of simulation-average SOAP vectors obtained via the simulation of each different test case.

The SOAP metric of Eq. 7 fully depends on the high-dimensional information contained in the SOAP spectra, and it provides a classification that is free from prior assumptions on the structural order of the considered systems. The setting of the cutoff radius rcut in the calculation of the SOAP can influence the resulting feature vectors, and few considerations are useful in this sense. The comparative analysis of all the systems, reported in Fig. 6, shows that the choice of a smaller cutoff (rcut = 0.8 nm) can lower the accuracy of the comparison, especially when substantially different systems are compared (see the d_SOAP matrices in Fig. 6B:top, Supplementary Figs. 6, 7 and 9, and Supplementary Note 2 for additional discussion on this matter). Nonetheless, even a rcut = 0.8 nm (which takes into account only the closest neighbors of the monomers in the SOAP analysis, see Supplementary Fig. 4) appears sufficient to obtain a good characterization of the molecular environment in each system, capturing an analogous pattern of relative distances d_SOAP with respect to the results obtained with larger rcut (see the rescaled d_SOAP matrices in Fig. 6B:bottom). We underline that we chose a fixed value of rcut for all the systems, in each of the analysis reported previously. This is made possible by the fact that the studied systems are all described with a similar resolution, namely employing Martini-based CG models. This gives rise to comparable inter-beads distances and facilitates a comprehensive analysis that accounts for the same neighborhood region per each monomer. In principle it should be possible to include in the dataset also simulation results obtained with molecular models at different resolution, provided that the cutoff is differentiated to have a consistent SOAP description across the different models.

Dimensionality reduction

The SOAP data output of an MD simulation consists of a set of feature vectors (Eq. 3) of dimension D (typically large). High-dimensional data, while rich in information, are not well-suited for visualization and classification/clustering, especially with methods based on Euclidean distance. To overcome these limitations a dimensionality reduction method is necessary.

In the realm of available dimensionality reduction algorithms we have opted for the Principal Component Analysis (PCA)^69,70, a widely used method that was already proven to handle with efficacy high-dimensional SOAP data obtained from high-resolution MD simulations of complex molecular and supramolecular systems^7,54,55,56. For such systems, PCA was found to offer a good compromise between efficiency, reliability of the results and affordable computational cost (as shown in Supplementary Fig. 3, discussed in the following).

Briefly, PCA projects the ensemble of SOAP data onto an orthogonal basis set, that best accounts for the variance in the original dataset, namely catching the main directions along which the SOAP feature vectors undergo the larger variations. The first components of this projection contain most of the relevant information, allowing one to retain only few components, thus reducing the dimensionality. We here performed PCA via the python class sklearn.decomposition.PCA() from the python library Scikit-learn⁷¹. After PCA, we chose to keep only three principal components, (if not stated otherwise), as this allows us to maintain more than 90% of the total variance in all our cases (see Supplementary Fig. 3). The PAMM clustering step is performed on 3-dimensional vectors containing the first three PCs of the SOAP feature vectors. For visual representation of the SOAP vectors in the scatter plots (Figs. 1–6) we employed the first two PC projections.

Building of a training set

Crucial to our comparative analysis is the construction of a “shared” dataset, that comprises SOAP feature vectors from all the individual datasets associated to the systems that are compared. This shared dataset is then used to train the PCA model reducing the dimensionality of the SOAP feature vectors, so that we can proceed with the application of the clustering algorithm and visualization of the data in the PC-plane. The usage of comprehensive data coming from a set of multiple systems allows us to take into account the overall data diversity, and compare the molecular environment across the different systems. To allow for a faster PCA step, we sampled the production trajectories including equal subsets of conformations per each system.

Unsupervised clustering

Relevant patterns in the SOAP data are detected by means of unsupervised clustering: we used Probabilistic Analysis of Molecular Motifs (PAMM)⁵¹, a density-based clustering algorithm, specifically developed to partition the SOAP data collected via MD simulations. In ref. ⁵¹ the authors presented the complete workflow for the algorithm. PAMM takes as input a set of N feature vectors, that represent local or global SOAP descriptors. As detailed earlier, in our case the algorithm processes the three-dimensional projections of SOAP vectors on the first three PCs. Then, the Probability Distribution Function (PDF) of these dimensionally reduced SOAP vectors is estimated by means of a Kernel Density Estimation (KDE), built from a multivariate anisotropic Gaussian function in the 3D space. In order to mitigate the computational load of KDE, the PDF estimation is done on a grid of N_grid ⊂ N points, selected non-uniformly through a farthest point sampling method. After the PDF is accurately estimated a density-based clustering algorithm, based on Gaussian Mixture Modeling (GMM), associates each different local maximum in the PDF to a cluster. Once this analysis is completed we obtain the so-called Probabilistic Motifs Identifiers (PMIs) (as the molecular motifs were called in the original reference)⁵¹ that characterize the features of the system under consideration. The robustness of the clustering analysis (in terms of identified microclusters) is assessed by means of a bootstrapping procedure⁵¹. During this step, the SOAP vectors are resampled generating several (N = 73) independent samples of SOAP vectors (sub-datasets).The clustering procedure (KDE and GMM steps) is then performed on each resampled sub-dataset, and the resulting clusters are compared to those obtained using the original (non-resampled and complete) dataset. This comparison results in a stability matrix, which measures how robust the detected clusters are. From this matrix we can build a hierarchical tree (dendrogram) that indicates the adjacency of the obtained microclusters (or the monomer microstates) in the SOAP space. The analysis is performed on snapshots collected along equilibrium MD trajectories, keeping track of the microclusters to which each individual monomer belongs to at each timestep. Thus, this analysis allows us to retrieve information on how structurally similar/different and how dynamically interconnected (i.e., how quickly/slowly monomers exchange between the microstates) the detected microclusters are. The dendrogram of Fig. 1C describes how the microclusters can be condensed in macroclusters (based on their structural/dynamic adjacency), obtaining a coarse-grained version of the analysis (see also Supplementary Fig. 5). All PAMM clustering analyses were conducted using the original PAMM algorithm (Available online at https://github.com/cosmo-epfl/pamm) modified with a tailored Python3 wrapper for handling the different analysis steps and post-processing.

Hierarchical clustering

Hierarchical clustering generally refers to clustering methods that fracture datasets into subsets according to a selected measure of (dis)similarity. In our work we employed the SOAP metric d_SOAP to merge pairs of clusters in higher-level groups, until the hierarchy of all the systems is completed. This results in the tree-like plot or dendrogram reported in Fig. 7B, showing the connectivity among the systems. The hierarchical clustering step was performed using the open source Python library Scikit-learn sklearn.cluster.AgglomerativeClustering() and choosing as linkage method single.

Molecular dynamics simulations

Our analyses build on sampling conformations/transitions during equilibrium MD simulations of the different assembled systems that are studied herein. In general, since in these analyses we reconstruct the internal dynamics and the equilibrium microstates present in these assemblies, a prime requisite for the robustness of the obtained results is that the monomer conformations and transitions are statistically well-sampled, in order to obtain a reliable representation of the equilibrium for each considered assembly (compatibly with the approximations and limits of the molecular models that are used). For this reason, in all systems considered herein, the length of the MD simulations is in the order of microseconds, ensuring that the equilibrium ensemble of each system is well represented by the collected conformational data. In the following we describe the MD approach and sampling protocol that we adopted to ensure this condition for each individual systems simulated herein.

All the simulations presented in this work were performed using the MD package GROMACS⁷², version 2018.6, from the setup of the simulation box, to the equilibration and production runs. We adopted CG descriptions for each system, built via the Martini scheme⁵⁸, with explicit solvent description. The parametrization of the monomer models was conducted following the literature data (where available), using the standard, non-polarizable Martini force-field⁵⁸ (version 2.2). Details per each system are provided in the following. All simulations were performed using Periodic Boundary Conditions (PBC) to limit the finite-size effects, also allowing to simulate portions of infinite aggregates or membranes. To prepare the CG model systems for MD in equilibrium conditions we performed CG-MD equilibration runs of the order of μs in NPT conditions, starting from energy-minimized system conformations. We then proceeded with the production CG-MD runs, to collect the statistics of equilibrium configurations processed in our analyses; all our systems where sampled every 1 ns of CG-MD. For all the CG-MD runs (equilibration and production) we used a 20 fs timestep, an interaction cutoff (1.1 nm) where the non-bonded potentials are truncated and shifted (consistently with the Martini scheme). The Verlet neighbor list scheme⁷³ is employed to reduce the load of calculating pair interactions. The temperature was maintained through the V-rescale thermostat⁷⁴, set at 300 K for all the simulations, except the cases where a different temperature is stated (i.e., for lipid membranes) with a coupling constant of 1.0 ps. The pressure was maintained via Berendsen barostat⁷⁵, set at 1 atm with a coupling constant of 2 ps. Isotropic scaling of the simulation cell is adopted when finite aggregates are simulated (e.g., the HEXA clusters), while semiisotropic scaling (decoupling x/y from z) is adopted when a portion of an infinite aggregate is considered (crossing the PBCs along one or two axes), as for fibers and membranes.

In the following, we report the model references and the specific simulation setup for each system studied.

Supramolecular polymers (1D assemblies)

The supramolecular polymers studied in this work belong to three families, characterized by a specific chemical structure of their monomer functional core. For each family we considered monomer variants:

1,3,5-Benzenetricarboxamides (BTA) monomers, already extensively studied in the literature^6,7,76. We considered three different variants: a water-soluble BTA_W (Fig. 1A), an organic solvent (e.g., octane) soluble BTA_C8 (Fig. 1B), and an intermediate case, BTA^* (Fig. 1C), obtained from BTA_C8 by artificially changing the inter-monomer interaction. The parametrization used for the three monomer models was the same as in ref. ⁷ and the solvents, water and octane, were parametrized according to Martini standards⁵⁸. For the calculation of SOAP vectors we associated a single center for each BTA monomer, namely the center-of-geometry (COG) of the monomer core (the three central beads in Fig. 1), which was proven sufficient to describe the structural dynamics of such supramolecular fiber ⁷.
Core-substituted naphthalene diimide (NDI)⁵⁹ monomers (Fig. 2A); we studied two different variants of NDIs, by changing the substituent atom on the side of the core structure. Following ref. ⁵⁹ we considered a substituted Oxygen (NDI_O, Fig. 2A) and Sulfur (NDI_S, Fig. 2B). The parametrization of the two monomer models was that of ref. ⁵⁹. In both cases the solvent was cyclohexane, parametrized according to Martini standards⁵⁸. Also for the analysis of NDI dynamics we selected the COG of the monomer cores (which correspond to the central pink bead in the central arrangement of five pink beads in Fig. 2A, B) as SOAP center.
Benzotrithiophene (BTT)⁶⁰ monomers; we studied two variants of these planar, water-soluble monomers, changing the three amino-acids attached to the aromatic core of the molecule. As in ref. ⁶⁰ we used L-phenylalanine for the first variant (BTT_F, Fig. 2C), and pentafluoro-L-phenylalanine for the other (BTT_5F, Fig. 2D). Both monomers present octaethylene glycol side-chains that impart water-solubility to the compound. The CG parametrization of these monomers was the same of ref. ⁶⁰. The solvent used for these BTT polymers was water, parametrized according to Martini standards (Martini water⁵⁸). The COG of the monomer cores (computed amongst the 3 central pink beads in Fig. 2C, D) was employed as SOAP center.

For all the seven fiber-systems we have built an ordered, one-dimensional stack that crosses the PBCs along the principal axis of the aggregate, so that the monomers at the extremes are in contact with each other mimicking and a infinite-fiber portion. Each pre-stacked aggregate was then equilibrated, and a CG-MD production run of about ~2 μs was performed to sample the structural dynamics in equilibrium conditions.

Micelles and membranes (2D assemblies)

The lipid molecule selected as reference case for two-dimensional aggregates was the dipalmitoylphosphatidylcholine phospholipid (DPPC, Fig. 4A) for which homogeneous membranes at three different temperatures (273, 293, and 323 K, Fig. 4B) were prepared. In this work we utilized the CG Martini parametrization adopted in ref. ⁵⁶, where SOAP + PAMM characterization of this phospholipid membrane, modeled using different descriptions, is performed. As test cases for the micelle aggregates we selected two surfactant molecules, namely Dodecylphosphocholine (DPC), and Sodium-dodecylsulfate (SDS); both parametrized according to the standard Martini force-field^58,77. The explicit solvent used for these lipid and surfactant systems was Martini water. The equilibrated structure of the surfactant micelles was obtained via spontaneous self-assembly from dispersed monomers, to obtain a size near the normal size distribution at our conditions. The adopted equilibration procedure for the bilayer models is described in ref. ⁵⁶ (which is the minimization and equilibration protocol given by CHARMM-GUI⁷⁸). For each system we performed production CG-MD runs of 1 μs of simulation time. For DPPC, DPC and SDS molecules we employed a single-center approach for the SOAP vector calculation, choosing the CG-bead that represents the head of the amphiphilic molecules as SOAP center (i.e., the “PO4” (DPPC), “PO4” (DPC) and “SO3” (SDS) beads in Fig. 4A). The results of an extra SOAP + PAMM analysis, limited to DPC and SDS systems are reported in Supplementary Fig. 8.

Spherical nanoparticles (3D assemblies)

The molecule chosen as representative case for spherical aggregates is the alkane hydrocarbon Hexadecane (HEXA); we parametrized this molecule following the Martini force-field standards⁵⁸. The explicit solvent used for these simulations was Martini water. Three aggregate structures of different size (128, 512, and 2048 monomers) were constructed via 2 μs of CG-MD equilibration run, starting from randomly dispersed monomers in water. During such CG-MD stage the monomers quickly self-assemble and equilibrate forming the spherical structures represented in Fig. 5A). Production runs of different length were carried out to gather the data for the analysis of the equilibrium structures, collecting the same number of SOAP vectors per each systems, independently of their different sizes. Namely, the 128 monomer system was simulated for 2 μs, the 512 monomer system for 0.5 μs and the 2048 monomer system for 0.125 μs. As center for the SOAP representation we chose the COG of the two central beads in the CG model of HEXA.

Data availability

Details on the molecular models and MD simulations, and additional MD data are provided in the “Methods” section and in the Supplementary Information. Computational materials and data pertaining to the study conducted herein are available at: https://doi.org/10.5281/zenodo.6822330. Other information needed is available from the corresponding author upon request.

References

Aida, T., Meijer, E. W. & Stupp, S. I. Functional supramolecular polymers. Science 335, 813–817 (2012).
Article CAS PubMed PubMed Central Google Scholar
Brunsveld, L., Folmer, B. J. B., Meijer, E. W. & Sijbesma, R. P. Supramolecular polymers. Chem. Rev. 101, 4071–4098 (2001).
Article CAS PubMed Google Scholar
de Greef, T. F. A. & Meijer, E. W. Supramolecular polymers. Nature 453, 171–173 (2008).
Article PubMed CAS Google Scholar
Boekhoven, J. & Stupp, S. I. 25th anniversary article: supramolecular materials for regenerative medicine. Adv. Mater. 26, 1642–1659 (2014).
Article CAS PubMed PubMed Central Google Scholar
Lehn, J.-M. Supramolecular chemistry. Science 260, 1762–1763 (1993).
Article CAS PubMed Google Scholar
Bochicchio, D., Salvalaglio, M. & Pavan, G. M. Into the dynamics of a supramolecular polymer at submolecular resolution. Nat. Commun. 8, 147 (2017).
Article PubMed PubMed Central CAS Google Scholar
Gasparotto, P., Bochicchio, D., Ceriotti, M. & Pavan, G. M. Identifying and tracking defects in dynamic supramolecular polymers. J. Phys. Chem. B 124, 589–599 (2020).
Article CAS PubMed Google Scholar
de Marco, A. L., Bochicchio, D., Gardin, A., Doni, G. & Pavan, G. M. Controlling exchange pathways in dynamic supramolecular polymers by controlling defects. ACS Nano 15, 14229–14241 (2021).
Article PubMed PubMed Central CAS Google Scholar
Crippa, M., Perego, C., de Marco, A. L. & Pavan, G. M. Molecular communications in complex systems of dynamic supramolecular polymers. Nat. Commun. 13, 2162 (2022).
Article CAS PubMed PubMed Central Google Scholar
Torchi, A., Bochicchio, D. & Pavan, G. M. How the dynamics of a supramolecular polymer determines its dynamic adaptivity and stimuli-responsiveness: Structure-dynamics-property relationships from coarse-grained simulations. J. Phys. Chem. B 122, 4169–4178 (2018).
Article CAS PubMed Google Scholar
Bochicchio, D., Kwangmettatam, S., Kudernac, T. & Pavan, G. M. How defects control the out-of-equilibrium dissipative evolution of a supramolecular tubule. ACS Nano 13, 4322–4334 (2019).
Article CAS PubMed Google Scholar
LI, J. & LOH, X. Cyclodextrin-based supramolecular architectures: Syntheses, structures, and applications for drug and gene delivery. Adv. Drug Deliv. Rev. 60, 1000–1017 (2008).
Article CAS PubMed Google Scholar
Matson, J. B., Zha, R. H. & Stupp, S. I. Peptide self-assembly for crafting functional biological materials. Curr. Opin. Solid State Mater. Sci. 15, 225–235 (2011).
Article CAS PubMed PubMed Central Google Scholar
Bastings, M. M. C. et al. A fast pH-switchable and self-healing supramolecular hydrogel carrier for guided, local catheter injection in the infarcted myocardium. Adv. Healthc. Mater. 3, 70–78 (2013).
Article PubMed CAS Google Scholar
Nagel, S. R. Experimental soft-matter science. Rev. Mod. Phys. 89, 025002 (2017).
Article Google Scholar
Lionello, C. et al. Toward chemotactic supramolecular nanoparticles: from autonomous surface motion following specific chemical gradients to multivalency-controlled disassembly. ACS Nano 15, 16149–16161 (2021).
Article CAS PubMed PubMed Central Google Scholar
Yan, X., Wang, F., Zheng, B. & Huang, F. Stimuli-responsive supramolecular polymeric materials. Chem. Soc. Rev. 41, 6042–6065 (2012).
Article CAS PubMed Google Scholar
Davis, A. V., Yeh, R. M. & Raymond, K. N. Supramolecular assembly dynamics. Proc. Natl Acad. Sci. 99, 4793–4796 (2002).
Article CAS PubMed PubMed Central Google Scholar
Lehn, J.-M. Dynamers: dynamic molecular and supramolecular polymers. Prog. Polym. Sci. 30, 814–831 (2005).
Article CAS Google Scholar
Lancia, F. et al. Reorientation behavior in the helical motility of light-responsive spiral droplets. Nat. Commun. 10, 5238 (2019).
Article PubMed PubMed Central CAS Google Scholar
Babu, D. et al. Acceleration of lipid reproduction by emergence of microscopic motion. Nat. Commun. 12, 2959 (2021).
Article CAS PubMed PubMed Central Google Scholar
Xiu, F. et al. Multivalent non-covalent interfacing and cross-linking of supramolecular tubes. Adv. Mater. 34, 2105926 (2021).
Article CAS Google Scholar
Gentile, S. et al. Spontaneous reorganization of DNA-based polymers in higher ordered structures fueled by RNA. J. Am. Chem. Soc. https://doi.org/10.1021/jacs.1c09503 (2021).
Glotzer, S. C. & Solomon, M. J. Anisotropy of building blocks and their assembly into complex structures. Nat. Mater. 6, 557–562 (2007).
Article PubMed Google Scholar
Xia, Y. et al. Self-assembly of self-limiting monodisperse supraparticles from polydisperse nanoparticles. Nat. Nanotechnol. 6, 580–587 (2011).
Article CAS PubMed Google Scholar
Cersonsky, R. K., van Anders, G., Dodd, P. M. & Glotzer, S. C. Relevance of packing to colloidal self-assembly. Proc. Natl Acad. Sci. 115, 1439–1444 (2018).
Article CAS PubMed PubMed Central Google Scholar
Lee, S., Teich, E. G., Engel, M. & Glotzer, S. C. Entropic colloidal crystallization pathways via fluid–fluid transitions and multidimensional prenucleation motifs. Proc. Natl Acad. Sci. 116, 14843–14851 (2019).
Article CAS PubMed PubMed Central Google Scholar
Capito, R. M., Azevedo, H. S., Velichko, Y. S., Mata, A. & Stupp, S. I. Self-assembly of large and small molecules into hierarchically ordered sacs and membranes. Science 319, 1812–1816 (2008).
Article CAS PubMed Google Scholar
Marchetti, M. C. et al. Hydrodynamics of soft active matter. Rev. Mod. Phys. 85, 1143–1189 (2013).
Article CAS Google Scholar
Chen, Q., Bae, S. C. & Granick, S. Directed self-assembly of a colloidal kagome lattice. Nature 469, 381–384 (2011).
Article CAS PubMed Google Scholar
Sanchez, T., Chen, D. T. N., DeCamp, S. J., Heymann, M. & Dogic, Z. Spontaneous motion in hierarchically assembled active matter. Nature 491, 431–434 (2012).
Article CAS PubMed PubMed Central Google Scholar
Garzoni, M. et al. Effect of H-bonding on order amplification in the growth of a supramolecular polymer in water. J. Am. Chem. Soc. 138, 13985–13995 (2016).
Article CAS PubMed Google Scholar
Tantakitti, F. et al. Energy landscapes and functions of supramolecular systems. Nat. Mater. 15, 469–476 (2016).
Article CAS PubMed PubMed Central Google Scholar
Bochicchio, D. & Pavan, G. M. From cooperative self-assembly to water-soluble supramolecular polymers using coarse-grained simulations. ACS Nano 11, 1000–1011 (2017).
Article CAS PubMed Google Scholar
Carter-Fenk, K., Lao, K. U., Liu, K.-Y. & Herbert, J. M. Accurate and efficient ab initio calculations for supramolecular complexes: symmetry-adapted perturbation theory with many-body dispersion. J. Phys. Chem. Lett. 10, 2706–2714 (2019).
Article CAS PubMed Google Scholar
Souza, P. C. T. et al. Martini 3: a general purpose force field for coarse-grained molecular dynamics. Nat. Methods 18, 382–388 (2021).
Article CAS PubMed Google Scholar
Ceriotti, M., Tribello, G. A. & Parrinello, M. Simplifying the representation of complex free-energy landscapes using sketch-map. Proc. Natl Acad. Sci. 108, 13023–13028 (2011).
Article CAS PubMed PubMed Central Google Scholar
Long, A. W. & Ferguson, A. L. Nonlinear machine learning of patchy colloid self-assembly pathways and mechanisms. J. Phys. Chem. B 118, 4228–4244 (2014).
Article CAS PubMed Google Scholar
Cheng, B. et al. Mapping Materials and Molecules. Acc. Chem. Res. 53, 1981–1991 (2020).
Article CAS PubMed Google Scholar
Magdău, I.-B. & Miller, T. F. Machine learning solvation environments in conductive polymers: Application to ProDOT-2hex with solvent swelling. Macromolecules 54, 3377–3387 (2021).
Article CAS Google Scholar
Deringer, V. L. et al. Gaussian process regression for materials and molecules. Chem. Rev. 121, 10073–10141 (2021).
Article CAS PubMed PubMed Central Google Scholar
Ferguson, A. L. Machine learning and data science in soft materials engineering. J. Phys. Condens. Mat. 30, 043002 (2017).
Article Google Scholar
Chen, W., Sidky, H. & Ferguson, A. L. Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets. J. Chem. Phys. 150, 214114 (2019).
Article PubMed CAS Google Scholar
Ceriotti, M. Unsupervised machine learning in atomistic simulations, between predictions and understanding. J. Chem. Phys. 150, 150901 (2019).
Article PubMed CAS Google Scholar
Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).
Article PubMed CAS Google Scholar
Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
Article PubMed CAS Google Scholar
Faber, F., Lindmaa, A., von Lilienfeld, O. A. & Armiento, R. Crystal structure representations for machine learning models of formation energies. Int. J. Quantum Chem. 115, 1094–1101 (2015).
Article CAS Google Scholar
Huo, H. & Rupp, M. Unified representation of molecules and crystals for machine learning. https://doi.org/10.48550/arXiv.1704.06439 (2018).
Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. Phys. Rev. B 87, 184115 (2013).
Article CAS Google Scholar
De, S., P. Bartók, A., Csányi, G. & Ceriotti, M. Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 18, 13754–13769 (2016).
Article CAS PubMed Google Scholar
Gasparotto, P., Meißner, R. H. & Ceriotti, M. Recognizing local and global structural motifs at the atomic scale. J. Chem. Theory Comput. 14, 486–498 (2018).
Article CAS PubMed Google Scholar
Engel, E. A., Anelli, A., Ceriotti, M., Pickard, C. J. & Needs, R. J. Mapping uncharted territory in ice from zeolite networks to ice structures. Nat. Commun. 9, 2173 (2018).
Article PubMed PubMed Central CAS Google Scholar
Gasparotto, P. & Ceriotti, M. Recognizing molecular patterns by machine learning: An agnostic structural definition of the hydrogen bond. J. Chem. Phys. 141, 174110 (2014).
Article PubMed CAS Google Scholar
Monserrat, B., Brandenburg, J. G., Engel, E. A. & Cheng, B. Liquid water contains the building blocks of diverse ice phases. Nat. Commun. 11, 5757 (2020).
Article CAS PubMed PubMed Central Google Scholar
Capelli, R., Miranda, F. M. & Pavan, G. M. Ephemeral ice-like local environments in classical rigid models of liquid water. J. Chem. Phys. 156, 214503 (2022).
Article CAS PubMed Google Scholar
Capelli, R., Gardin, A., Empereur-mot, C., Doni, G. & Pavan, G. M. A data-driven dimensionality reduction approach to compare and classify lipid force fields. J. Phys. Chem. B 125, 7785–7796 (2021).
Article CAS PubMed PubMed Central Google Scholar
Bian, T. et al. Electrostatic co-assembly of nanoparticles with oppositely charged small molecules into static and dynamic superstructures. Nat. Chem. 13, 940–949 (2021).
Article CAS PubMed PubMed Central Google Scholar
Marrink, S. J., Risselada, H. J., Yefimov, S., Tieleman, D. P. & de Vries, A. H. The MARTINI force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 111, 7812–7824 (2007).
Article CAS PubMed Google Scholar
Sarkar, A. et al. Self-sorted, random, and block supramolecular copolymers via sequence controlled, multicomponent self-assembly. J. Am. Chem. Soc. 142, 7606–7617 (2020).
Article CAS PubMed Google Scholar
Casellas, N. M. et al. From isodesmic to highly cooperative: reverting the supramolecular polymerization mechanism in water by fine monomer design. Chem. Commun. 54, 4112–4115 (2018).
Article CAS Google Scholar
Albertazzi, L. et al. Probing exchange pathways in one-dimensional aggregates with super-resolution microscopy. Science 344, 491–495 (2014).
Article CAS PubMed Google Scholar
Baker, M. B. et al. Exposing differences in monomer exchange rates of multicomponent supramolecular polymers in water. ChemBioChem 17, 207–213 (2016).
Article CAS PubMed Google Scholar
Lou, X. et al. Dynamic diversity of synthetic supramolecular polymers in water as revealed by hydrogen/deuterium exchange. Nat. Commun. 8, 15420 (2017).
Article PubMed PubMed Central CAS Google Scholar
Albertazzi, L. et al. Spatiotemporal control and superselectivity in supramolecular polymers using multivalency. Proc. Natl Acad. Sci. 110, 12203–12208 (2013).
Article CAS PubMed PubMed Central Google Scholar
Ceriotti, M. Unsupervised machine learning in atomistic simulations, between predictions and understanding. J. Chem. Phys. 150, 150901 (2019).
Article PubMed CAS Google Scholar
Musil, F. et al. Efficient implementation of atom-density representations. J. Chem. Phys. 154, 114109 (2021).
Article CAS PubMed Google Scholar
Musil, F. et al. Physics-inspired structural representations for molecules and materials. Chem. Rev. 121, 9759–9815 (2021).
Article CAS PubMed Google Scholar
Himanen, L. et al. DScribe: Library of descriptors for machine learning in materials science. Comput. Phys. Commun. 247, 106949 (2020).
Article CAS Google Scholar
Pearson, K. LIII. on lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin philos. mag. j. sci. 2, 559–572 (1901).
Article Google Scholar
Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441 (1933).
Article Google Scholar
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Google Scholar
Abraham, M. J. et al. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25 (2015).
Article Google Scholar
de Jong, D. H., Baoukina, S., Ingólfsson, H. I. & Marrink, S. J. Martini straight: boosting performance using a shorter cutoff and GPUs. Comput. Phys. Commun. 199, 1–7 (2016).
Article CAS Google Scholar
Bussi, G., Donadio, D. & Parrinello, M. Canonical sampling through velocity rescaling. J. Chem. Phys. 126, 014101 (2007).
Article PubMed CAS Google Scholar
Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A. & Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690 (1984).
Article CAS Google Scholar
Bejagam, K. K. & Balasubramanian, S. Supramolecular polymerization: a coarse grained molecular dynamics study. J. Phys. Chem. B 119, 5738–5746 (2015).
Article CAS PubMed Google Scholar
Alessandri, R., Grünewald, F. & Marrink, S. J. The martini model in materials science. Adv. Mater. 33, 2008635 (2021).
Article CAS Google Scholar
Lee, J. et al. CHARMM-GUI input generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM simulations using the CHARMM36 additive force field. J. Chem. Theory Comput. 12, 405–413 (2015).
Article PubMed PubMed Central CAS Google Scholar

Download references

Acknowledgements

G.M.P. acknowledges the funding received by the ERC under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 818776 - DYNAPOL), by the Swiss National Science Foundation (SNSF grant IZLIZ2_183336) and H2020 under the FET Open RIA program (grant agreement no. 964386 - MIMICKEY). The authors acknowledge the computational resources provided by the Swiss National Supercomputing Center (CSCS), CINECA and HPC@POLITO (http://www.hpc.polito.it).

Author information

Authors and Affiliations

Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
Andrea Gardin & Giovanni M. Pavan
Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, Lugano-Viganello, Switzerland
Claudio Perego, Giovanni Doni & Giovanni M. Pavan

Authors

Andrea Gardin
View author publications
You can also search for this author in PubMed Google Scholar
Claudio Perego
View author publications
You can also search for this author in PubMed Google Scholar
Giovanni Doni
View author publications
You can also search for this author in PubMed Google Scholar
Giovanni M. Pavan
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

G.M.P. conceived this research and supervised the work. A.G. performed the simulations and the analyses. A.G. and G.D. implemented the analysis methods. A.G., C.P., G.D., and G.M.P. discussed the results. A.G., C.P., and G.M.P wrote the paper.

Corresponding author

Correspondence to Giovanni M. Pavan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Chemistry thanks the anonymous reviewers for their contribution to the peer review of this work

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Gardin, A., Perego, C., Doni, G. et al. Classifying soft self-assembled materials via unsupervised machine learning of defects. Commun Chem 5, 82 (2022). https://doi.org/10.1038/s42004-022-00699-z

Download citation

Received: 22 December 2021
Accepted: 29 June 2022
Published: 14 July 2022
DOI: https://doi.org/10.1038/s42004-022-00699-z

This article is cited by

Machine learning of atomic dynamics and statistical surface identities in gold nanoparticles
- Daniele Rapetti
- Massimo Delle Piane
- Giovanni M. Pavan
Communications Chemistry (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.