## Abstract

The complexity and diversity of polymer topologies, or chain architectures, present substantial challenges in predicting and engineering polymer properties. Although machine learning is increasingly used in polymer science, applications to address architecturally complex polymers are nascent. Here, we use a generative machine learning model based on variational autoencoders and data generated from molecular dynamics simulations to design polymer topologies that exhibit target properties. Following the construction of a dataset featuring 1342 polymers with linear, cyclic, branch, comb, star, or dendritic structures, we employ a multi-task learning framework that effectively reconstructs and classifies polymer topologies while predicting their dilute-solution radii of gyration. This framework enables the generation of polymer topologies with target size, which is subsequently validated through molecular simulation. These capabilities are then exploited to contrast rheological properties of topologically distinct polymers with otherwise similar dilute-solution behavior. This research opens avenues for engineering polymers with more intricate and tailored properties with machine learning.

## Introduction

The topology of a polymer chain, or equivalently the chain architecture, can substantially influence its properties and those of derivative materials. For example, in natural polymers, while linear amylose forms dense aggregates with low aqueous solubility, the analogous but highly branched structure of amylopectin impedes association of chains, thereby enhancing its solubility^{1}. In the realm of synthetic polymers, the branching in low-density polyethylene improves its processability for applications like blow and extrusion molding, whereas linear high-density polyethylene possesses superior mechanical strength and chemical resistance. There is also growing interest in understanding implications of polymer topology due to advancements in various controllable synthetic methodologies^{2,3,4,5}. These methods enable the creation of polymers with a wide range of complex topologies, such as stars^{6,7}, combs^{8,9}, branches^{10,11}, hyperbranches^{12,13}, dendrimers^{14,15}, rings^{16,17}, and brushes^{18,19}.

Establishing quantitative relationships between polymer topology and material properties remains challenging. Both experimental and computational investigations have enhanced understanding of how polymer topology influences properties of interest to many areas, such as enhanced oil recovery^{20,21}, coatings and adhesives^{22,23}, rheology and fluid dynamics^{24,25,26}, energy storage^{27,28,29,30,31,32,33}, and biomedical applications^{34,35,36,37,38}. Nevertheless, labor-intensive and potentially costly synthesis and characterization typically limit experimental studies to a small set of systems, which may still not yield well-defined topological ensembles^{3,39}. Computationally, although there is no ambiguity associated with the underlying topologies of the polymers or their construction, simulations are often restricted to a particular class of topologies owing to computational costs and perhaps uncertainty about how to tangibly compare diverse topologies^{40,41}. Overall, these factors impede the construction of general topology-property correlations, which also precludes facile design of topologically complex polymers.

Recent advancements in and applications of machine learning have spurred significant developments in polymer design. These efforts span many applications, such as tailoring the structures of single-chain nanoparticles^{42,43}, enhancing enzyme stability^{44,45}, delivering drugs and therapeutics^{46,47,48}, and identifying gas-separation membranes^{49}. Generative machine learning models^{50} are a particularly intriguing class of algorithms for chemical design. For example, variational autoencoders (VAEs) are adept at encoding complex data into lower-dimensional latent spaces^{51,52} and have previously facilitated the generation of small molecules^{53,54}. Applications of VAEs in polymer science are also emerging^{55,56}. Shmilovich et al. combined VAEs with molecular dynamics (MD) simulations and Bayesian optimization to guide the discovery of *π*-conjugated oligopeptides^{57} with desirable aggregation behavior to influence optoelectronic properties. In devising the Open Macromolecular Genome (OMG), Kim et al. utilized a generative framework with VAEs that can not only provide polymer structures but also retrosynthesis^{58}, thereby facilitating optimization of synthetically accessible materials. Nevertheless, these and other studies primarily focus on specific chemical spaces or linear polymers, highlighting the need for methods to generate polymers with complex topologies and tailored properties.

In this study, we create a multi-task VAE to generate polymers with specified topology and desired characteristics. This model is developed using an original dataset comprising coarse-grained MD data for over 1300 polymers of various topologies, including star, comb, *α**ω*-branched, linear, cyclic, and dendrimer structures, spanning a range of molecular weights. Input and encoding strategies are critically assessed by training several models that aim to reconstruct the polymer topology and also perform auxiliary tasks of estimating the characteristic size of the polymer and classifying its topology. We find that auxiliary tasks enhance the physical interpretability of the learned latent space of the VAE. Our most effective generative modeling framework, TopoGNN, incorporates both graph and topological descriptor features. For demonstrative purposes, TopoGNN is leveraged to produce sets of topologically diverse polymers that exhibit the same characteristic size in dilute solution (Fig. 1, top) but contrasting rheological behavior at finite concentrations (Fig. 1, bottom). This work expands the utility of generative modeling for polymer design and demonstrates how such algorithms can also facilitate controlled studies across complex, topologically diverse polymers.

## Results

### Polymer dataset

We first generate and characterize a topologically diverse set of polymers for training and evaluating the VAE. In particular, we initially prepare and simulate 1342 polymers across six architectural classes (11 each for linear and cyclic and 330 each for *α**ω*-branched, comb, star and dendrimer); the *α**ω*-branched architecture possesses side-chains at two backbone termini and is simply denoted “*branch*” in figures and tables. The degree of polymerization ranges from 90 to 100 for each architectural class. The VAE here is tasked to encode and decode a specific manifestation of a polymer topology, although the representation of ensembles of such structures is of future interest.

Figure 2a showcases the diversity of structures across a representative set of these polymers. This diversity is also manifest through the variation of numerous topological descriptors shown in Fig. 2b: *N*_{nodes} represents the number of nodes, *N*_{edges} the number of edges, \(\bar{d}\) the average node degree, \({\bar{d}}_{{{{\rm{nb}}}}}\) the average neighbor degree, *δ* the graph density, *ϕ*_{diam} the graph diameter, *ϕ*_{rad} the graph radius, *λ*_{alg} the algebraic connectivity, *C*_{deg} the degree centrality, *C*_{bet} the betweenness centrality, and *r*_{deg} the degree assortativity. These descriptors, which are derived purely from knowledge of the molecular graph/polymer connectivity, provide a first means to quantitatively characterize and distinguish polymer topologies. The current dataset is restricted to architectures that possess at most one cycle (corresponding to the macrocycle of the cyclic polymer) and also does not describe polymer networks; however, including descriptors related to cyclization or meshes may benefit future models. Despite the uniformity in the number of nodes and edges, which are commonly used to characterize polymers, significant variations are observed in other topological descriptors. For instance, comb, branch, star, and dendrimer topologies exhibit notable differences in descriptors like graph diameter, radius, betweenness centrality, and degree assortativity, even when node and edge counts are identical. Our primary aim is to assess the efficacy of ML to describe properties of topologically complex polymers. Consequently, the CG simulations used to generate data for training and benchmarking are based on the phenomenological Kremer-Grest (KG) model, which is agnostic to constitutional unit chemistry. Furthermore, the results do not represent any specific polymer, although KG can be descriptive of many polymer systems based on mapping schemes^{59}. We anticipate that future work could straightforwardly leverage similar ML architectures with chemically specific parameterized CG models, including those with hydrodynamic interactions.

Figure 2c and Supplementary Fig. 1 illustrate the range of characteristic polymer sizes, as expressed through the simulated mean squared radius of gyration \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\), observed in each class. Because the present study imposes a maximum number of monomers, polymers from the linear, cyclic, and dendrimer classes exhibit relatively narrow distributions in \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) in contrast to the comb, branch, and star classes. Dendrimers notably form compact, globular structures over the range of simulated molecular weights relative to all other classes. Overall, the dataset is partitioned into a 64/16/20 train/validation/test split for subsequent model construction and evaluation; stratified sampling is used to ensure proportional representation of architectural classes across all splits.
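As an illustration of the stratified 64/16/20 partition described above, the following Python sketch splits indices class by class so that each architecture is represented proportionally in every subset. The function name `stratified_split`, the rounding rule, and the random seed are assumptions made for this example; the actual splitting procedure may differ in its details.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.64, 0.16, 0.20), seed=0):
    """Split sample indices into train/validation/test, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Class sizes taken from the dataset description (11 + 11 + 4 x 330 = 1342).
labels = (["linear"] * 11 + ["cyclic"] * 11 + ["branch"] * 330
          + ["comb"] * 330 + ["star"] * 330 + ["dendrimer"] * 330)
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class, rather than over the pooled dataset, keeps the 11-member linear and cyclic classes from being absent in the validation or test sets by chance.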

### Polymer reconstruction and property prediction

Based on prior work on linear polymer featurization^{60,61}, we hypothesized that polymer reconstruction with a VAE could be enhanced if derived topological descriptors were supplied as inputs. To examine this, we evaluate three distinct encoding strategies: TopoGNN, which integrates topological descriptors with graph features; GNN, which exclusively relies on graph features; and Topo, which solely employs topological descriptors. For each strategy, we consider a multitude of models with distinct hyperparameters and their performance across a broad range of evaluation metrics. For example, reconstruction performance is quantitatively evaluated with balanced accuracy (BACC), which measures the accuracy of individual entries in the reconstructed adjacency matrix. For topology classification, *F*_{1} score is chosen to address the class imbalance in our dataset. Other metrics include the coefficient of determination *R*^{2} for regression on \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) and the Kullback-Leibler (KL) divergence. Representative models for each encoding strategy are selected using a comprehensive evaluation score (CES) that simultaneously considers all criteria:

where \(\overline{a}\) denotes the min-max normalized value of *a*; CES can be interpreted as the distance from the origin (a perfect model) in a vector space spanned by error metrics.
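As a minimal sketch of how a score of this kind could be evaluated, the Python snippet below assumes that each criterion is first expressed as an error (1 − BACC, 1 − *R*^{2}, 1 − *F*_{1}, and the KL divergence itself), min-max normalized across the candidate models, and then combined as a Euclidean distance from the origin. This specific functional form, the function names, and the metric values are illustrative assumptions; they are not taken from Table 1.

```python
import math

def min_max(values):
    """Min-max normalize a list of values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def comprehensive_scores(models):
    """Assumed CES: Euclidean norm of min-max normalized error metrics."""
    errors = {
        "bacc": [1.0 - m["bacc"] for m in models],
        "r2": [1.0 - m["r2"] for m in models],
        "f1": [1.0 - m["f1"] for m in models],
        "kl": [m["kl"] for m in models],  # already an error-like quantity
    }
    normed = {name: min_max(vals) for name, vals in errors.items()}
    return [
        math.sqrt(sum(normed[name][i] ** 2 for name in normed))
        for i in range(len(models))
    ]

# Hypothetical candidate models (values invented for illustration only).
candidates = [
    {"bacc": 0.95, "r2": 0.99, "f1": 0.98, "kl": 0.10},
    {"bacc": 0.90, "r2": 0.95, "f1": 0.94, "kl": 0.50},
    {"bacc": 0.93, "r2": 0.97, "f1": 0.96, "kl": 0.30},
]
scores = comprehensive_scores(candidates)
best_index = min(range(len(scores)), key=scores.__getitem__)
print(scores, best_index)
```

Under these assumptions, a model that is best on every criterion scores 0 and a model that is worst on every criterion scores 2, so smaller values indicate better overall performance.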

Table 1 summarizes the performance of these representative models. Across encoding strategies, TopoGNN emerges as the most effective overall, registering the smallest CES. By comparison, the Topo model yields slightly superior performance on regression and a comparable *F*_{1} score. Conversely, the GNN model demonstrates a slightly higher balanced accuracy in reconstruction tasks and a lower KL divergence; however, it significantly underperforms in regression and classification. These results support the inclusion of topological descriptors during construction of the VAE.

To assess model generalizability, we examine the performance of the representative models on the held-out test set. Figure 3 again indicates that TopoGNN delivers consistently strong performance across several evaluation criteria, while GNN and Topo can be deficient in particular metrics. Balanced accuracy is highest for GNN (0.9397), closely followed by TopoGNN (0.9369) and then Topo (0.9164). This suggests that topological descriptors do not necessarily enhance reconstruction performance, although the ability of Topo to effectively reconstruct certain topologies (e.g., branch polymers) highlights the extensive information content encompassed by the 11 topological descriptors. By contrast, directly supplying topological information is clearly advantageous for predicting the characteristic polymer size. Here, TopoGNN stands out as the most effective, achieving the highest mean *R*^{2} (0.9920), surpassing Topo (0.9854) and GNN (0.9639). Meanwhile, GNN achieves the highest mean *F*_{1} score (0.9783), followed by TopoGNN (0.9689) and Topo (0.9678); however, all models display statistically comparable results regarding this classification metric. Taken together, these results suggest that workflows with VAEs can effectively address complexities induced by these polymer architectures.
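Because bonded pairs are far rarer than non-bonded pairs in a polymer adjacency matrix, plain accuracy would be dominated by correctly predicted zeros; balanced accuracy averages the recall on bonds and on non-bonds instead. The sketch below illustrates the standard definition on a toy four-bead chain. Evaluating only the upper triangle of the matrix and the function name `balanced_accuracy` are assumptions of this example.

```python
def balanced_accuracy(true_adj, pred_adj):
    """Mean of bond recall and non-bond recall over unique node pairs.

    Assumes both bonded and non-bonded pairs occur in true_adj.
    """
    tp = tn = fp = fn = 0
    n = len(true_adj)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle: each pair counted once
            t, p = true_adj[i][j], pred_adj[i][j]
            if t and p:
                tp += 1
            elif t and not p:
                fn += 1
            elif p:
                fp += 1
            else:
                tn += 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# True topology: linear chain 0-1-2-3.
true_adj = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
# Hypothetical reconstruction: bond 2-3 missed, spurious bond 0-3 added.
pred_adj = [[0, 1, 0, 1],
            [1, 0, 1, 0],
            [0, 1, 0, 0],
            [1, 0, 0, 0]]
bacc = balanced_accuracy(true_adj, pred_adj)
print(bacc)
```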

For a more nuanced assessment of model quality, Fig. 4 breaks down TopoGNN performance across architectures; comparable information for other models is in Supplementary Figs. 2 and 3. In polymer reconstruction, TopoGNN excels but faces challenges with specific cyclic and comb polymers (Fig. 4a, gray dashed boxes). Notably, GNN generates errors, especially for star polymers, while Topo exhibits minor errors across most architectures. Regarding the prediction of \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) (Fig. 4b), TopoGNN performs well regardless of polymer class. Both GNN and Topo display high correlation, but errors are generally larger for GNN (Supplementary Fig. 2), indicating the difficulty in establishing a direct relationship between graph features and \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\). A saliency analysis (Supplementary Fig. 3) reveals that graph diameter, betweenness centrality, and algebraic connectivity most strongly influence \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\), aligning with their direct correlation with \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) (Supplementary Fig. 1). For topology classification, TopoGNN (Fig. 4c) is broadly effective, with most misclassifications occurring in linear, *α**ω*-branched, and comb architectures. These issues are more pronounced in Topo and GNN (Supplementary Fig. 3) and can be compounded by other misclassifications. Overall, TopoGNN, which utilizes both graph and topological features, delivers the most consistently high-quality results across tasks. The remainder of the article therefore focuses on analysis and applications of TopoGNN to illustrate its practical deployment.

### Latent space exploration and polymer generation

Figure 5 presents the UMAP projection of the 8-dimensional latent space of TopoGNN into a 2-dimensional space for visualization. Distinct topological clusters emerge in Fig. 5a and b, revealing an organization of the latent space that depends on relationships amongst architectures and their physical properties. Dendrimers, characterized by their high degree of branching, form three mostly isolated and distinct clusters that reflect how the dendrimer architectures were algorithmically generated; they are most closely related to star polymers and *α**ω*-branched polymers (particularly those with pom-pom architectures). Branch, comb, and star polymers all notably overlap within the latent space, which is attributed to topological similarities (Fig. 2b). Cyclic and linear polymers are interspersed within comb and branch clusters, with linear polymers sharing a long backbone and cyclic polymers possessing a long ring-closed backbone. This organization is clearly informed by training with auxiliary tasks for predicting \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) and classifying topologies, as illustrated in Fig. 5b. A vertical trajectory in the UMAP space (marked by an increase in *Z*_{2}) results in an almost monotonic increase in \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) for the generated polymer topologies (Fig. 5c). Conversely, a horizontal trajectory (associated with an increase in *Z*_{1}) primarily transitions between topology classes with only slight variations in \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) (Fig. 5d). Omitting the auxiliary tasks leads to less distinct separation of topological classes and disrupts the monotonicity of \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) (Supplementary Fig. 6). The latent spaces of GNN and Topo (Supplementary Fig. 5) are prone to similar issues. Overall, this highlights the effectiveness of the workflow for TopoGNN to produce an intuitive and physically meaningful latent space.

The latent space of TopoGNN can be used to generate a diverse set of polymer topologies. This is exemplified by computing the Vendi Score (VS) for each architecture (see section “Machine Learning Details” for details) and comparing it to that of the originally constructed dataset. Whereas the VS for the original dataset (1342 points) is 2.0968, that for 1342 topologies generated using TopoGNN is 5.0684, which exceeds those for GNN (4.9580) and Topo (4.3305). Examples of the generated polymer topologies and their distribution are shown in Supplementary Figs. 7–10. This indicates that all models can generate a more diverse range of polymer topologies compared to the original handcrafted dataset, which could have implications for downstream tasks, as explored in the next section.
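For readers unfamiliar with the Vendi Score, it is defined as the exponential of the Shannon entropy of the eigenvalues of *K*/*n*, where *K* is an *n* × *n* positive semi-definite similarity matrix with unit diagonal. The dependency-free Python sketch below illustrates this definition; the similarity kernel used to compare polymer graphs is not specified here and is left as an input, and the Jacobi eigenvalue routine is simply a stand-in for any symmetric eigensolver. All function names are assumptions of this example.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-18):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def vendi_score(similarity):
    """exp(Shannon entropy) of the eigenvalues of K/n."""
    n = len(similarity)
    scaled = [[v / n for v in row] for row in similarity]
    entropy = 0.0
    for lam in symmetric_eigenvalues(scaled):
        if lam > 1e-12:  # zero eigenvalues contribute nothing
            entropy -= lam * math.log(lam)
    return math.exp(entropy)

identical = [[1.0] * 3 for _ in range(3)]  # three copies of one item
distinct = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(vendi_score(identical), vendi_score(distinct))
```

The two limiting cases show why the score can be read as an effective number of distinct items: a set of identical items scores approximately 1, whereas *n* mutually dissimilar items score approximately *n*.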

### Property-guided polymer topology generation

To illustrate one application for TopoGNN, we generate a series of distinct polymer topologies that exhibit specific \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\). While \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) itself is a fundamental characteristic of the polymers, the rationale here is rather to demonstrate the production of alternative materials with similar characteristics and further to assess how topology affects other polymer properties, such as rheology, without conflation of other factors. We therefore select target \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) ranges of 7.5 ± 2, 30 ± 2, and 50 ± 2, which represent the low, intermediate, and high regions of \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) in the dataset, respectively (Fig. 2), and conditionally sample polymers from the latent space across the different topological classes. The \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) are then validated for the generated polymer topologies using MD simulation. These results are shown in Fig. 6, which illustrates that TopoGNN can indeed produce a range of distinct structures that exhibit effectively similar \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\). Targeting \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle =7.5\pm 2\) predominantly yields dendrimer and star topologies, targeting \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle =30\pm 2\) yields branch, comb, cyclic, and star topologies, and targeting \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle =50\pm 2\) mostly yields branch and comb architectures. With the current approach, however, architectures that satisfy specific targets cannot be arbitrarily produced because of the molecular-weight restrictions. For example, dendrimers are more or less restricted to low \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\), while linear polymers are mostly restricted to larger \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\). Moreover, relatively few polymers meet the ambitious target of 50 ± 2, which is consistent with the paucity of data points around \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle =50\pm 2\) within the original dataset; however, the \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) of the polymers in this group uniformly exceed those of the 30 ± 2 group. Interestingly, TopoGNN also proffers architectures, such as irregular dendrimers and nuanced branching patterns in stars and combs, that go beyond those of the original dataset. Overall, these results reflect the intended capability of TopoGNN to generate a broad spectrum of original polymer topologies that align with a target property.

### Rheological Analysis

The viscosity-modifying properties of polymers are key to numerous applications^{62,63,64} and depend on a variety of factors, including unit chemistry, polymer composition, and chain topology^{65,66}. The relative impacts of such factors can be difficult to disentangle. Using TopoGNN, we specifically explore the influence of polymer topology on rheological characteristics. Because solution viscosity at dilute concentrations is primarily determined by polymer size, which sets the overlap concentration^{67}, we control for this factor by designing topologies with specified \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) and examine topological implications across a range of concentrations. Figure 7a presents the concentration-dependent shear viscosity as determined from MD simulations of four selected topologies. Differences emerge beyond 0.4 *σ*^{−3}, with cyclic polymers showing lower viscosities due to reduced entanglements, and branched polymers exhibiting elevated viscosities due to extended side chains. Star and comb polymers demonstrate similar, somewhat lower shear viscosities compared to branched polymers, highlighting the impact of side-chain position and density on entanglement effectiveness. Additionally, we observe nuanced differences in frequency-dependent storage and loss moduli, \(G^{\prime}\) and *G*″, across topologies and concentrations (Fig. 7b, c). Below 0.6 *σ*^{−3}, all solutions exhibit liquid-like viscous behavior at low frequencies and solid-like behavior at high frequencies; as concentration increases, star, branch, and comb polymers display three crossover frequencies. In contrast, cyclic polymers maintain a single crossover frequency, indicating less nuanced viscoelastic behavior. The presence of multiple crossover frequencies at higher concentrations (Fig. 7c and Supplementary Fig. 11) has been previously observed in both simulations and experiments^{68,69,70,71,72}. Notably, the region between the lowest and second-lowest crossover frequencies, where \(G^{\prime}\, >\, G^{\prime\prime}\), signifies a rubbery plateau attributed to polymer entanglement. Regarding the relative viscosities of differing architectures, some results are also consistent with prior work. For example, cyclic polymers exhibit relatively lower viscosities, which is due to the absence of free ends that tends to reduce entanglements^{73}, and *α**ω*-branched polymers tend to possess higher viscosities, which is consistent with expectations set by experimental investigation of the impact of side-chain length on viscosity^{74,75}. Here, polymers classified with comb architectures have a similar number of side chains and similar backbone lengths as those classified as *α**ω*-branched architectures; however, the side chains are shorter, resulting in less effective friction and lower viscosity. This highlights potential for how rheological properties might be modulated through strategic architecture design.

## Discussion

This study employed variational autoencoders to address emergent combinatorial complexity of diverse polymer topologies, which has been scarcely addressed in machine learning of macromolecules. We constructed an extensive dataset featuring the average squared radius of gyration (\(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\)) for 1342 polymers with various architectures, including linear, cyclic, branch, comb, star, and dendrimer structures. By analyzing different encoding strategies and input representations, we found that meaningful latent spaces of polymers with complex topologies can be established by (i) incorporating both graph-explicit and graph-derived features and (ii) coupling graph reconstruction tasks with auxiliary prediction tasks, such as those related to physical properties. Probabilistic sampling over the latent space was shown to result in rich topological diversity. These generative capabilities were then used to produce distinct polymer topologies with target characteristic sizes in dilute solution. This enabled subsequent investigation by coarse-grained molecular dynamics into how topology influences rheological properties, such as shear viscosity and viscoelastic moduli, while controlling for polymer size. While all architectures exhibited similar rheological behavior at relatively low concentrations, distinct responses emerged at higher concentrations. For instance, localized branches at chain ends resulted in more viscous solutions compared to other architectures, including cyclic structures that exhibited minimal entanglements. Apart from illustrating how rheological behavior might be tuned or altered via polymer architecture, this also showcases a paradigm for studying the physical properties of topologically distinct systems.

This work also invites several future research directions. Particularly, TopoGNN exhibits promising potential as a generative model, offering a cost-effective alternative to experiments or simulations in predicting properties like \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\). While \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) serves as a straightforward and computationally accessible quantity, there is interest in extending the strategy to incorporate or utilize other properties. Although this work leveraged TopoGNN to simply compare rheological properties in systematic fashion, in the future, it may be deployed to guide design efforts aimed at optimizing polymer properties. We also note that the dataset and machine learning framework are currently limited to polymers with a narrow range of bead numbers (equivalently, molecular weights). Future research will explore the extensibility and transferability of machine learning architectures across various molecular weights, potentially through the use of string-based representations^{76,77,78}.

This study also focused on specific structural and rheological characterizations of chemically homogeneous and precise polymers at coarse-grained resolution. The ML framework might be feasibly extended to address compositional complexity; however, such efforts will need to address increased data burdens to capture the behavior of such systems. For this and other reasons, TopoGNN and related ML strategies will benefit from advancements that accelerate molecular simulation to increase data throughput, and those that improve the accuracy of CG models, which will expand the validity and range of properties that can be reliably computed. In particular, parameterized CG models may not be transferable across all thermodynamic conditions of interest, and dynamical consistency between CG models and high-resolution systems poses a persistent challenge^{40}. Finally, the dataset and ML models introduced here feature precisely defined polymer architectures. Although the architectural classes studied are broadly accessible, precision control over architecture is beyond current synthetic capabilities. Therefore, future efforts in both the ML and modeling space must address how to predict and represent ensembles of polymer structures^{77} that are accessible with modern synthetic approaches and how to appropriately tailor generative capabilities towards these^{58}. Overall, understanding and controlling the properties of polymers, which involve chemical, compositional, and topological complexity, and aligning these properties with synthesizable polymer systems remain significant challenges in polymer science. This study provides a foundation to pursue these directions.

## Methods

### Description of dataset

The dataset comprises 1342 polymer architectures, each containing between 90 and 100 constitutional units, or beads. Polymer architectures encompass a wide range of topologies, including linear, cyclic, branch, comb, star, and dendrimer structures. Due to limitations on bead count, linear and cyclic topologies are restricted to 11 distinct polymers each, whereas other topologies are represented by 330 distinct polymers each. The polymers are chemically homogeneous with all beads treated equivalently. The procedure for generating polymer graphs is described in the Supplementary Discussion Section 2. For each polymer graph, we calculate an 11-dimensional topological descriptor vector^{43,79} using the number of nodes, number of edges, average degree, average neighbor degree, density, diameter, radius, algebraic connectivity, degree centrality, betweenness centrality, and degree assortativity as elements. For further details on these descriptors, readers are referred to Supplementary Discussion Section 1.
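To make the graph-derived descriptors concrete, the following Python sketch computes six of the eleven (number of nodes, number of edges, average degree, density, diameter, and radius) for a small star polymer using only the standard library. The helper names and the toy star with three arms of three beads are assumptions for illustration; the remaining descriptors (e.g., algebraic connectivity and the centralities) require spectral or shortest-path-counting routines that are omitted here.

```python
from collections import deque

def star_graph(n_arms, arm_length):
    """Adjacency sets for a star: node 0 is the core, each arm is a path."""
    adj = {0: set()}
    next_id = 1
    for _ in range(n_arms):
        prev = 0
        for _ in range(arm_length):
            adj.setdefault(next_id, set())
            adj[prev].add(next_id)
            adj[next_id].add(prev)
            prev = next_id
            next_id += 1
    return adj

def eccentricities(adj):
    """Largest shortest-path distance from each node (connected graph)."""
    ecc = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ecc[src] = max(dist.values())
    return ecc

def basic_descriptors(adj):
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    ecc = eccentricities(adj)
    return {
        "n_nodes": n,
        "n_edges": m,
        "avg_degree": 2 * m / n,
        "density": 2 * m / (n * (n - 1)),
        "diameter": max(ecc.values()),
        "radius": min(ecc.values()),
    }

desc = basic_descriptors(star_graph(n_arms=3, arm_length=3))
print(desc)
```

For this toy star, the diameter spans two arms (six bonds) while the radius is the length of one arm (three bonds), which illustrates how the two descriptors together separate architectures with identical node and edge counts.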

### Calculation of polymer properties

#### Radius of gyration

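We investigate the structural properties of individual polymer chains using coarse-grained molecular dynamics. To do so, we compute the gyration tensor **S** of a polymer with *N* beads:

\[ {\bf{S}}=\frac{1}{N}\sum_{i=1}^{N}({{\bf{r}}}_{i}-{{\bf{r}}}_{{\rm{cm}}}){({{\bf{r}}}_{i}-{{\bf{r}}}_{{\rm{cm}}})}^{T} \]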

where **r**_{i} denotes the position vector of the *i*th bead, **r**_{cm} represents the center-of-mass position of the polymer, and *T* indicates the transpose operation. Diagonalizing **S** yields \({{{\bf{S}}}}={{{\rm{diag}}}}({\lambda }_{1}^{2},{\lambda }_{2}^{2},{\lambda }_{3}^{2})\), where the diagonal elements are the principal moments of the gyration tensor ordered as *λ*_{1} ≤ *λ*_{2} ≤ *λ*_{3}. The squared radius of gyration can be subsequently computed as

\[ {R}_{{\rm{g}}}^{2}={\lambda }_{1}^{2}+{\lambda }_{2}^{2}+{\lambda }_{3}^{2} \]

and quantifies the size of a given polymer conformation. The ensemble average \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) is then constructed by averaging over all sampled configurations. This ensemble-averaged quantity serves as the target for the regression auxiliary task.
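The calculation can be sketched in a few lines of Python. Because the sum of the principal moments equals the trace of the gyration tensor, the example below evaluates the squared radius of gyration from the trace and avoids an explicit diagonalization. Equal bead masses are assumed (consistent with chemically homogeneous beads), and the function names and toy coordinates are hypothetical.

```python
def gyration_tensor(coords):
    """3x3 gyration tensor of equal-mass beads given as (x, y, z) tuples."""
    n = len(coords)
    cm = [sum(r[d] for r in coords) / n for d in range(3)]
    return [
        [sum((r[a] - cm[a]) * (r[b] - cm[b]) for r in coords) / n
         for b in range(3)]
        for a in range(3)
    ]

def squared_radius_of_gyration(coords):
    """Trace of the gyration tensor, equal to the sum of principal moments."""
    s = gyration_tensor(coords)
    return s[0][0] + s[1][1] + s[2][2]

def mean_squared_radius_of_gyration(frames):
    """Average of the squared radius of gyration over sampled frames."""
    return sum(squared_radius_of_gyration(f) for f in frames) / len(frames)

# Two toy four-bead conformations: a straight rod and a unit square.
rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
rg2_mean = mean_squared_radius_of_gyration([rod, square])
print(squared_radius_of_gyration(rod), squared_radius_of_gyration(square), rg2_mean)
```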

#### Rheological properties

We also characterize several rheology-related properties for select polymer systems. The shear viscosity *η* of the polymer solution is formally calculated via

\[ \eta =\int_{0}^{\infty }G(t)\,{\rm{d}}t \tag{4} \]

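where *G*(*t*) denotes the stress relaxation modulus. We determine *G*(*t*) at thermal energy *k*_{B}*T* using the Green-Kubo relation

\[ G(t)=\frac{V}{{k}_{{\rm{B}}}T}\langle \overline{{\sigma }_{\alpha \beta }}(t)\,\overline{{\sigma }_{\alpha \beta }}(0)\rangle \]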

with *V* representing the simulation box volume, \(\overline{{\sigma }_{\alpha \beta }}(t)\) signifying the off-diagonal stress tensor components averaged at intervals of 1000 steps, and 〈 ⋯ 〉 denoting an ensemble average. Often, *G*(*t*) exhibits significant noise at long times, which renders direct numerical integration of Eq. (4) unreliable. Therefore, following prior work^{80}, we fit the simulated *G*(*t*) data to a generalized Maxwell model, given by \(G(t)={\sum }_{p}{G}_{p}\exp (-t/{\tau }_{p})\), where *G*_{p} and *τ*_{p} represent the modulus and relaxation time of the *p*-th element, respectively. This approach yields the viscosity *η* = ∑_{p}*G*_{p}*τ*_{p}. We also compute the storage modulus (\(G^{\prime}\)) and the loss modulus (*G*″) to better characterize the viscoelastic properties of the polymers. These moduli are obtained from the Fourier transform of the stress relaxation modulus, yielding

\[ G^{\prime} (\omega )=\omega \int_{0}^{\infty }G(t)\sin (\omega t)\,{\rm{d}}t,\qquad G^{\prime\prime} (\omega )=\omega \int_{0}^{\infty }G(t)\cos (\omega t)\,{\rm{d}}t \]

Here, \(G^{\prime} (\omega )\), the storage modulus, reflects the elastic, or energy-storing, aspect of the material, while *G*″(*ω*), the loss modulus, represents the viscous, or energy-dissipating, component. This analysis is thus restricted to linear viscoelasticity.
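Once *G*(*t*) has been fit to a generalized Maxwell model, the viscosity and both moduli follow in closed form, since the one-sided sine and cosine transforms of each exponential mode are analytic: each mode contributes *G*_{p}(*ωτ*_{p})^{2}/[1 + (*ωτ*_{p})^{2}] to the storage modulus and *G*_{p}*ωτ*_{p}/[1 + (*ωτ*_{p})^{2}] to the loss modulus. The Python sketch below evaluates these standard expressions; the function names and the mode parameters are hypothetical and are not fitted values from this study.

```python
def maxwell_viscosity(modes):
    """Zero-shear viscosity: sum of G_p * tau_p over Maxwell modes."""
    return sum(g * tau for g, tau in modes)

def storage_modulus(modes, omega):
    """G'(omega) for a sum of exponential relaxation modes."""
    return sum(g * (omega * tau) ** 2 / (1.0 + (omega * tau) ** 2)
               for g, tau in modes)

def loss_modulus(modes, omega):
    """G''(omega) for a sum of exponential relaxation modes."""
    return sum(g * (omega * tau) / (1.0 + (omega * tau) ** 2)
               for g, tau in modes)

# Hypothetical (G_p, tau_p) pairs in reduced units.
modes = [(1.0, 1.0), (0.5, 10.0)]
eta = maxwell_viscosity(modes)

# A single mode crosses over (G' = G'') exactly at omega = 1 / tau.
single = [(2.0, 0.5)]
g_storage = storage_modulus(single, omega=2.0)
g_loss = loss_modulus(single, omega=2.0)
print(eta, g_storage, g_loss)
```

Working from the fitted modes rather than from the raw correlation function avoids transforming the noisy long-time tail of *G*(*t*) numerically.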

### MD simulation details

MD simulations are used to generate polymer configurations for the characterization of polymer properties. All simulations are conducted using the LAMMPS simulation package^{81} in reduced units; the units of mass, distance, and energy are denoted by *m*, *σ*, and *ε*, respectively. The reduced time unit follows as \({(m{\sigma }^{2}/\varepsilon )}^{1/2}\). All simulations are considered to take place in an implicit athermal solvent environment, with dynamics of the polymer(s) governed by the Langevin equation, such that hydrodynamic interactions are neglected. The equations-of-motion are numerically integrated using the velocity-Verlet integration scheme with a 0.001 timestep. The solvent friction coefficient is set to *ς* = 0.1.

Polymer interactions are modeled via a combination of bonded and nonbonded potential energy contributions. The total potential energy *U* of a system with configuration **r**^{N} is expressed as:

where *r*_{ij} represents the internal distance calculated from the coordinates **r**^{N}. The nonbonded energy contributions for all pairs of beads are computed using the following equation:

where *ε*_{ij} and *σ*_{ij} are set to 1. For directly bonded beads, the stretching energy is calculated as:

where *K*_{ij} is assigned a value of 30, and \({R}_{ij}^{(0)}\) is fixed at 1.5.

#### Single-chain simulations

Simulations of single coarse-grained polymer chains (no boundary conditions) are used to characterize \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\). Each simulation is conducted for 2 × 10^{7} steps, allocating the first half for system equilibration. Configurations for analysis are sampled every 2 × 10^{3} timesteps during the latter half of the simulation.
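As a concrete illustration of this measurement, the squared radius of gyration of one configuration with equal-mass beads is the mean squared distance of the beads from their center of mass, and \(\langle {R}_{{\rm{g}}}^{2}\rangle\) is its average over sampled configurations. The following sketch uses hypothetical function names and assumes coordinates are supplied as lists of (x, y, z) tuples; reading trajectory files is not shown.

```python
def squared_radius_of_gyration(coords):
    """Rg^2 for equal-mass beads: mean squared distance from the center of mass."""
    n = len(coords)
    center = [sum(r[k] for r in coords) / n for k in range(3)]
    return sum(sum((r[k] - center[k]) ** 2 for k in range(3)) for r in coords) / n

def ensemble_average_rg2(frames):
    """Average Rg^2 over a sequence of sampled configurations."""
    values = [squared_radius_of_gyration(frame) for frame in frames]
    return sum(values) / len(values)

# Number of configurations retained under the stated protocol:
# second half of 2e7 steps, one configuration every 2e3 steps.
total_steps = 20_000_000
sample_interval = 2_000
n_frames = (total_steps // 2) // sample_interval

# Toy check with a two-bead "chain" of bond length 2 (hypothetical data).
rg2_dimer = squared_radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```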

#### Many-chain simulations

Simulations of many chains within a simulation cell with cubic periodic boundary conditions are used for rheological analysis of a subset of polymers with comparable ensemble-averaged square radii of gyration, \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\). Simulations are performed across various concentrations (0.1 to 0.8) to cover both semi-dilute and semi-concentrated regimes. Each simulation uses 100 chains with the simulation cell dimensions adjusted to match the desired concentration. Equilibration periods of 10^{7} steps are utilized for all simulation concentrations. Upon achieving equilibrium, data are collected for 10^{7} steps at a timestep of 0.001. We note that using an implicit-solvent environment and neglecting hydrodynamic interactions has implications for simulating rheological properties^{82}. However, while these choices affect data generation and its interpretation relative to the physics of real polymer solutions, they do not affect the analysis of the ML task.
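To illustrate how the cell dimensions follow from the target concentration, the sketch below assumes that concentration means bead number density in units of *σ*^{-3}; the text here does not define it explicitly, so this is an assumption. The function name and the choice of 100 beads per chain are likewise illustrative.

```python
def cubic_box_length(n_chains, beads_per_chain, concentration):
    """Edge length L of a cubic cell such that (total beads) / L**3 = concentration.

    Assumes concentration is a bead number density in reduced units.
    """
    return (n_chains * beads_per_chain / concentration) ** (1.0 / 3.0)

# 100 chains (as stated) of 100 beads each (illustrative) at three concentrations.
lengths = {c: cubic_box_length(100, 100, c) for c in (0.1, 0.4, 0.8)}
```

Under this assumption, raising the concentration from 0.1 to 0.8 halves the box edge length, since the volume shrinks by a factor of eight.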

### Machine learning details

#### Data preprocessing

Polymers are represented using graph notation \(\mathcal{G}=(V,E)\), where *V* is the set of nodes, and *E* is the set of edges. To address the variability in node counts across different polymers, ranging from 90 to 100, we introduce “ghost” nodes with zero-edge connections to standardize graph sizes to 100 nodes using node padding^{83,84}. Because all polymer beads are equivalent, the adjacency vector \({a}_{i}\in {\mathbb{R}}^{100}\) serves as the sole node feature for each polymer bead. Elements of this vector are defined such that *a*_{i} = 1 if node *i* is connected to the current node, and *a*_{i} = 0 otherwise. All bonds are also equivalent, and so edge features are not included in the representation. Polymers are also characterized by an 11-dimensional topological descriptor vector \(\mathbf{t}\in {\mathbb{R}}^{11}\) as previously described. For the task of polymer reconstruction, an adjacency matrix \(\mathbf{A}\in {\mathbb{R}}^{100\times 100}\) is associated with each polymer, where *A*_{ij} = 1 indicates an edge between nodes *i* and *j*, and *A*_{ij} = 0 indicates no edge. For the auxiliary regression task, each polymer is associated with a label for \(\langle {R}_{{\rm{g}}}^{2}\rangle\), denoted \({y}_{r}\in {\mathbb{R}}\). For the auxiliary classification task, each polymer is associated with a one-hot encoded topology label, denoted \({\mathbf{y}}_{{\rm{t}}}\in {\mathbb{R}}^{6}\).

The dataset of 1342 polymers is divided into three subsets: 858 for training (64%), 215 for validation (16%), and 269 for testing (20%). Stratified splitting is used to ensure each subset represents all polymer topologies. The training set is utilized to train the VAE, the validation set for hyperparameter optimization, and the test set to evaluate the model generalizability.
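The padded representation can be sketched as follows. This is a minimal NumPy illustration, with a hypothetical helper name and a hypothetical 90-bead linear chain as input; it is not the preprocessing code from the accompanying repository. Ghost nodes simply correspond to all-zero rows and columns.

```python
import numpy as np

MAX_NODES = 100  # padded graph size stated in the text

def padded_adjacency(edges, n_nodes, max_nodes=MAX_NODES):
    """Build a symmetric 0/1 adjacency matrix, zero-padded to max_nodes."""
    if n_nodes > max_nodes:
        raise ValueError("graph has more nodes than the padded size")
    adj = np.zeros((max_nodes, max_nodes), dtype=np.float32)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0  # bonds are undirected
    return adj

# Hypothetical example: a linear chain of 90 beads (89 bonds).
edges = [(i, i + 1) for i in range(89)]
A = padded_adjacency(edges, n_nodes=90)

# Node feature matrix: row i is the adjacency vector of bead i.
X = A.copy()
```

Because the node features are the rows of the adjacency matrix, the feature matrix and the adjacency matrix coincide for this representation; they are kept as separate arrays here only to mirror the two inputs described above.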

#### Model architectures

Overall, we explore three distinct encoder architectures while maintaining a uniform decoder architecture. The first model, designated as TopoGNN, combines a graph encoder with a topological descriptor encoder, thus operating as a multi-input model. The second model, GNN, exclusively employs the graph encoder. The third model, Topo, relies solely on the topological descriptor encoder. The architecture of the VAE for TopoGNN is depicted in Fig. 8.

The encoder transforms input data into a latent space representation. Graph inputs are represented using an adjacency matrix \(\mathbf{A}\in {\mathbb{R}}^{100\times 100}\) and a node feature matrix \(\mathbf{X}\in {\mathbb{R}}^{100\times 100}\), with the adjacency vector serving as the node feature due to identical nodes. The Graph Isomorphism Network (GIN) encoder^{85}, equipped with two graph convolutional layers, maps these inputs into a 32-dimensional feature vector **h**_{g}. Despite its shallow architecture and narrow receptive fields, GIN has demonstrated robust performance across a range of tasks in materials science and chemistry^{86,87}. The topological descriptor vector is similarly converted into a 32-dimensional feature vector **h**_{t} by a dense neural network (DNN) encoder. Subsequently, the feature vectors **h**_{g} and **h**_{t} are concatenated to yield a combined feature vector \(\mathbf{h}\in {\mathbb{R}}^{64}\). Additional dense layers generate the parameters of the latent Gaussian distribution: the mean **μ** and the logarithm of variance \(\log {\boldsymbol{\sigma }}^{2}\). These parameters define the latent space embedding \(\mathbf{z} \sim \mathcal{N}({\boldsymbol{\mu }},{\boldsymbol{\sigma }})\), which has a dimensionality of 8.

The decoder then samples from **z** to reconstruct data. A convolutional neural network is used to reconstruct the adjacency matrix \(\hat{\mathbf{A}}\). Two additional, distinct neural networks are tasked with predicting \({\hat{y}}_{{\rm{r}}}\) and \({\hat{\mathbf{y}}}_{{\rm{t}}}\). We note that the present approach does not enforce symmetry of the reconstructed adjacency matrix during training, similar to the approach of prior work using a VAE to generate conjugated peptides^{57}. However, symmetry is enforced during the polymer graph reconstruction and generation process by selecting \({\hat{\mathbf{A}}}_{{\rm{sym}}}={\max }_{ij}\{{\hat{\mathbf{A}}}_{ij},{\hat{\mathbf{A}}}_{ij}^{T}\}\).
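The symmetrization rule amounts to taking, for each pair of nodes, the larger of the two decoder outputs for that pair. A minimal NumPy sketch is given below. The binarization threshold of 0.5 and the function names are assumptions introduced for illustration; the text does not state how decoder outputs are converted to discrete edges.

```python
import numpy as np

def symmetrize(a_hat):
    """Element-wise maximum of the decoded matrix and its transpose."""
    return np.maximum(a_hat, a_hat.T)

def binarize(a_sym, threshold=0.5):
    """Convert edge probabilities to 0/1 edges (threshold is an assumption)."""
    return (a_sym >= threshold).astype(np.int8)

# Hypothetical decoder output for a three-node graph (not symmetric).
a_hat = np.array([[0.0, 0.9, 0.1],
                  [0.2, 0.0, 0.4],
                  [0.3, 0.7, 0.0]])
a_sym = symmetrize(a_hat)
a_bin = binarize(a_sym)
```

Using the maximum rather than the mean keeps an edge whenever either of the two entries for a node pair indicates one, which biases the reconstruction toward retaining bonds.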

#### Loss functions

Training of the VAE uses a composite loss function \({{{{\mathcal{L}}}}}_{{{{\rm{VAE}}}}}\)

which features terms associated with reconstruction, \({\mathcal{L}}_{{\rm{Rec}}}\), via binary cross-entropy (BCE); Kullback-Leibler (KL) divergence, \({\mathcal{L}}_{{\rm{KL}}}\); regression for *y*_{r}, \({\mathcal{L}}_{{\rm{Reg}}}\); and classification for *y*_{t} via cross-entropy (CE), \({\mathcal{L}}_{{\rm{Cls}}}\). In Eq. (10), *λ*_{Reg} and *λ*_{Cls} are hyperparameter weights that are adjustable for optimizing performance. The individual loss terms are defined as follows:
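A numerical sketch of how such a composite loss can be assembled is given below. Several choices are assumptions made for illustration and are not confirmed by the text shown here: the reconstruction and KL terms carry unit weight, the regression term is a mean squared error, and the latent prior is a standard normal, which gives the usual closed-form KL divergence for a diagonal Gaussian. All function names and toy inputs are hypothetical.

```python
import numpy as np

EPS = 1e-7  # numerical guard against log(0)

def bce(a_true, a_pred):
    """Binary cross-entropy averaged over adjacency-matrix entries."""
    p = np.clip(a_pred, EPS, 1.0 - EPS)
    return float(-np.mean(a_true * np.log(p) + (1.0 - a_true) * np.log(1.0 - p)))

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), assuming a standard-normal prior."""
    return float(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var)))

def mse(y_true, y_pred):
    """Mean squared error (assumed form of the regression term)."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def cross_entropy(y_onehot, probs):
    """Categorical cross-entropy for a one-hot label."""
    return float(-np.sum(y_onehot * np.log(np.clip(probs, EPS, 1.0))))

def composite_loss(rec, kl, reg, cls, lam_reg=1.0, lam_cls=1.0):
    """Weighted sum; unit weights on rec and kl are an assumption."""
    return rec + kl + lam_reg * reg + lam_cls * cls

# Hypothetical toy inputs (not data from this study).
a_true = np.array([[0.0, 1.0], [1.0, 0.0]])
a_pred = np.array([[0.1, 0.8], [0.7, 0.2]])
mu, log_var = np.zeros(8), np.zeros(8)
y_t = np.array([0, 1, 0, 0, 0, 0], dtype=float)
p_t = np.array([0.1, 0.5, 0.1, 0.1, 0.1, 0.1])

terms = dict(
    rec=bce(a_true, a_pred),
    kl=kl_to_standard_normal(mu, log_var),
    reg=mse([21.0], [20.0]),
    cls=cross_entropy(y_t, p_t),
)
total = composite_loss(**terms, lam_reg=0.1, lam_cls=10.0)
```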

#### Model training and hyperparameter tuning

All models are implemented using TensorFlow^{88}. Models undergo training for 1000 epochs with the Adam optimizer^{89}. A broad range of hyperparameters is explored, encompassing batch sizes {32, 64, 128}, learning rates {0.0001, 0.001, 0.01}, and regularization terms *λ*_{Reg} ∈ {0.01, 0.1, 1, 10, 100} and *λ*_{Cls} ∈ {0.01, 0.1, 1, 10, 100}. Criteria for model weight saving include overall validation loss, Evidence Lower Bound (ELBO), and reconstruction balanced accuracy. Across three encoder types, this approach results in 2025 distinct hyperparameter combinations. For each encoder type, the optimal hyperparameter configuration is selected based on a composite validation metric that combines several key performance indicators: reconstruction balanced accuracy (BACC), KL divergence, \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) regression *R*^{2} value, and the topology classification *F*_{1} score.

These metrics are min-max normalized

and consolidated into a four-dimensional vector as

Subsequently, the optimal hyperparameter configuration is determined as that nearest to the origin (0, 0, 0, 0). Because hyperparameter selection does not involve updating model weights, it need not rely on the training loss; compared to abstract loss values, these metrics are more interpretable and relate directly to our objectives, such as improving reconstruction, prediction accuracy, and model generalization.
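The search-space size and the selection rule can be sketched as follows. The enumeration reproduces the stated count of 2025 combinations from the listed options. For the selection step, the orientation of the normalization is an assumption: metrics for which larger is better (BACC, *R*^{2}, *F*_{1}) are flipped after min-max scaling so that zero is the best value in every dimension, which is what makes "nearest to the origin" meaningful. The labels for the saving criteria and the toy metric values are hypothetical.

```python
from itertools import product
import numpy as np

# Options stated in the text; the criterion labels are hypothetical names.
batch_sizes = [32, 64, 128]
learning_rates = [1e-4, 1e-3, 1e-2]
lambda_reg = [0.01, 0.1, 1, 10, 100]
lambda_cls = [0.01, 0.1, 1, 10, 100]
save_criteria = ["val_loss", "elbo", "rec_bacc"]
encoders = ["TopoGNN", "GNN", "Topo"]

configs_per_encoder = list(
    product(save_criteria, batch_sizes, learning_rates, lambda_reg, lambda_cls)
)
n_total = len(encoders) * len(configs_per_encoder)

def min_max(values):
    """Scale values to [0, 1]; a constant column maps to zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return np.zeros_like(values) if span == 0 else (values - values.min()) / span

def select_best(bacc, kl, r2, f1):
    """Index of the configuration closest to the origin in normalized metric space."""
    m = np.column_stack([
        1.0 - min_max(bacc),  # higher BACC is better, so flip
        min_max(kl),          # lower KL is better
        1.0 - min_max(r2),    # higher R^2 is better, so flip
        1.0 - min_max(f1),    # higher F1 is better, so flip
    ])
    return int(np.argmin(np.linalg.norm(m, axis=1)))

# Hypothetical validation metrics for three configurations.
best = select_best(
    bacc=[0.90, 0.95, 0.80],
    kl=[5.0, 4.0, 9.0],
    r2=[0.97, 0.99, 0.90],
    f1=[0.92, 0.96, 0.85],
)
```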

#### Random polymer generation

To generate random polymer topologies, points are sampled from a predefined latent distribution, and the resultant latent vector, **z**_{gen}, is transformed into an adjacency matrix, **A**_{gen}. Each element in **A**_{gen} indicates the connectivity between nodes. To avoid spurious and unphysical edge-formation or other errors during reconstruction, generated polymers then undergo a graph-cleansing step. This step principally removes isolated nodes and breaks small rings. Because this modifies the original adjacency matrix, we implement a validation protocol, which is fully described in Supplementary Discussion Section 3. Briefly, the cleansed graph and its recalculated topological descriptors are re-encoded to derive updated values for \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) and topology class. Cleansed graphs are considered valid if they satisfy three criteria. First, the difference in \(\langle {R}_{{{{\rm{g}}}}}^{2}\rangle\) values before and after cleansing is less than 2 *σ*^{2}. Second, the topology classification is unchanged. Third, the mean squared difference between the pre- and post-cleansing latent vectors is less than 1. These criteria preserve the inherent properties of the generated polymers.
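The three validity criteria translate directly into a short check. The sketch below uses a hypothetical function name and assumes the predicted \(\langle {R}_{{\rm{g}}}^{2}\rangle\), predicted topology class, and latent vectors before and after cleansing are already available; the cleansing and re-encoding steps themselves are not shown.

```python
def is_valid_cleansed(rg2_before, rg2_after, cls_before, cls_after,
                      z_before, z_after, rg2_tol=2.0, latent_tol=1.0):
    """Apply the three stated criteria to a cleansed graph.

    rg2_tol is in units of sigma^2; latent_tol bounds the mean squared
    difference between the pre- and post-cleansing latent vectors.
    """
    rg2_ok = abs(rg2_after - rg2_before) < rg2_tol
    cls_ok = cls_before == cls_after
    msd = sum((a - b) ** 2 for a, b in zip(z_before, z_after)) / len(z_before)
    return rg2_ok and cls_ok and msd < latent_tol

# Hypothetical examples with 8-dimensional latent vectors.
z0 = [0.0] * 8
z1 = [0.5] * 8  # mean squared difference of 0.25
ok = is_valid_cleansed(20.0, 21.0, "star", "star", z0, z1)
changed_class = is_valid_cleansed(20.0, 21.0, "star", "comb", z0, z1)
shifted_size = is_valid_cleansed(20.0, 22.5, "star", "star", z0, z1)
```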

#### Polymer generation with target properties

To generate polymers with specific target properties, namely \(\langle {R}_{{\rm{g}}}^{2}\rangle\) and topology, “parent” polymers that exhibit these desired characteristics are first identified from the original dataset. The criterion for \(\langle {R}_{{\rm{g}}}^{2}\rangle\) is relaxed to allow a tolerance range of ±2 around the target value. Points are then sampled near the latent-space vectors of the parent polymers by introducing Gaussian noise with a mean of 0 and a variance of 0.1. The \(\langle {R}_{{\rm{g}}}^{2}\rangle\) and topology of each generated candidate polymer are then predicted using the trained ML model. Candidates that do not exhibit the target topology or that deviate in \(\langle {R}_{{\rm{g}}}^{2}\rangle\) by more than 2 *σ*^{2} are discarded. Following this initial screening, polymer graphs undergo cleansing as previously described, except that the \(\langle {R}_{{\rm{g}}}^{2}\rangle\) of candidates must, more stringently, remain within 2 *σ*^{2} of both the initial target and pre-cleansing values. Subsequently, non-distinct graphs, either duplicated from the original dataset or already present within the generated pool, are identified and removed through graph isomorphism checks. Additional details are in the Supplementary Discussion Section 4. The proportion of generated polymer graphs with target properties that undergo graph cleansing and pass all validation checks is detailed in Supplementary Table 1.
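The sampling and initial screening steps can be sketched as below. Note that a variance of 0.1 corresponds to a standard deviation of about 0.316, which is what the random number generator expects. The sketch assumes the noise is independent and identical across the 8 latent dimensions, which the text does not state explicitly; the function names, the fixed seed, and the toy predictions are hypothetical, and decoding and property prediction by the trained model are not shown.

```python
import numpy as np

def sample_near_parent(z_parent, n_samples, variance=0.1, seed=0):
    """Perturb a parent latent vector with zero-mean Gaussian noise.

    Assumes isotropic, independent noise in each latent dimension.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance),
                       size=(n_samples, z_parent.shape[0]))
    return z_parent + noise

def screen(pred_rg2, pred_cls, target_rg2, target_cls, tol=2.0):
    """Indices of candidates with the target topology and Rg^2 within tol."""
    return [i for i, (r, c) in enumerate(zip(pred_rg2, pred_cls))
            if c == target_cls and abs(r - target_rg2) <= tol]

# Hypothetical parent at the latent-space origin.
z_parent = np.zeros(8)
z_gen = sample_near_parent(z_parent, n_samples=10000)

# Hypothetical model predictions for three candidates.
kept = screen([20.5, 24.0, 21.0], ["star", "star", "comb"],
              target_rg2=21.0, target_cls="star")
```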

#### Latent-space visualization

The latent space is visualized using the Uniform Manifold Approximation and Projection (UMAP) algorithm^{90}. The parameters follow those of prior work^{43}, wherein the UMAP local neighborhood size is fixed at 200, the minimum embedding distance between points is set to 1, and the Euclidean distance metric is utilized in feature space analysis. This results in a mapping from \({\mathbb{R}}^{8}\) to \({\mathbb{R}}^{2}\): UMAP(**z**) = **u**, where **z** denotes a latent vector and **u** its corresponding low-dimensional representation.

#### Diversity evaluation

To calculate the diversity of a set of polymer topologies, each graph representation undergoes transformation into a Laplacian spectrum, encapsulating all eigenvalues of the graph Laplacian matrix. The Laplacian matrix is defined as the degree matrix minus the adjacency matrix of the graph, **L** = **D** − **A**. Diversity quantification employs the Vendi Score (VS)^{91}, defined as:

where *λ*_{i} represents the eigenvalues of the matrix **K**/*n*, with the convention \(0\log 0=0\). The similarity function in use is the dot product between normalized Laplacian spectra, denoted as \({{{\bf{X}}}}\in {{\mathbb{R}}}^{n\times 100}\), with 100 indicating the maximum eigenvalue count. For spectral vectors shorter than 100, zero-padding ensures length standardization. For reference, the minimum VS value is unity.
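A compact NumPy sketch of this calculation is given below, using the standard Vendi Score form, the exponential of the Shannon entropy of the eigenvalues of **K**/*n*. Two details are assumptions made for illustration: spectra are sorted in descending order before zero-padding, and "normalized" is taken to mean unit Euclidean norm, so that each graph has self-similarity 1. The helper names and the small test graphs are hypothetical.

```python
import numpy as np

MAX_EIG = 100  # maximum eigenvalue count stated in the text

def laplacian_spectrum(adj, length=MAX_EIG):
    """Eigenvalues of L = D - A, sorted descending and zero-padded to `length`."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    eig = np.sort(np.linalg.eigvalsh(lap))[::-1]
    out = np.zeros(length)
    out[: eig.size] = eig
    return out

def vendi_score(spectra):
    """exp(-sum_i lam_i log lam_i) for eigenvalues lam_i of K / n, with 0 log 0 = 0."""
    x = np.asarray(spectra, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.where(norms == 0, 1.0, norms)  # unit-norm rows (assumed normalization)
    k = x @ x.T                               # dot-product similarity matrix
    lam = np.linalg.eigvalsh(k / x.shape[0])
    lam = lam[lam > 1e-12]                    # drop zeros, consistent with 0 log 0 = 0
    return float(np.exp(-np.sum(lam * np.log(lam))))

def chain_adjacency(n):
    """Adjacency matrix of a linear chain with n beads."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj

def ring_adjacency(n):
    """Adjacency matrix of a ring with n beads."""
    adj = chain_adjacency(n)
    adj[0, n - 1] = adj[n - 1, 0] = 1.0
    return adj

spec_chain = laplacian_spectrum(chain_adjacency(5))
spec_ring = laplacian_spectrum(ring_adjacency(5))
vs_same = vendi_score([spec_chain, spec_chain])   # two identical graphs
vs_mixed = vendi_score([spec_chain, spec_ring])   # two distinct graphs
```

A set of identical graphs gives a score of one, and a set of *n* graphs can score at most *n*, so the value can be read as an effective number of distinct topologies.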

## Data availability

The data associated with this study are publicly accessible at https://doi.org/10.5281/zenodo.10672434.

## Code availability

The code associated with this study is publicly accessible at https://github.com/webbtheosim/poly-topoGNN-vae.

## References

1. Bertoft, E. Understanding starch structure: recent progress. *Agronomy* **7**, 56 (2017).
2. Gao, Y. et al. Complex polymer architectures through free-radical polymerization of multivinyl monomers. *Nat. Rev. Chem.* **4**, 194–212 (2020).
3. Blosch, S. E., Scannelli, S. J., Alaboalirat, M. & Matson, J. B. Complex polymer architectures using ring-opening metathesis polymerization: synthesis, applications, and practical considerations. *Macromolecules* **55**, 4200–4227 (2022).
4. Matyjaszewski, K. Atom transfer radical polymerization (ATRP): current status and future perspectives. *Macromolecules* **45**, 4015–4039 (2012).
5. Chiefari, J. et al. Living free-radical polymerization by reversible addition-fragmentation chain transfer: the RAFT process. *Macromolecules* **31**, 5559–5562 (1998).
6. Bazan, G. C. & Schrock, R. R. Synthesis of star block copolymers by controlled ring-opening metathesis polymerization. *Macromolecules* **24**, 817–823 (1991).
7. Levi, A. E. et al. Efficient synthesis of asymmetric miktoarm star polymers. *Macromolecules* **53**, 702–710 (2020).
8. Yoo, J., Runge, M. B. & Bowden, N. B. Synthesis of complex architectures of comb block copolymers. *Polymer* **52**, 2499–2504 (2011).
9. Bousquet, A., Barner-Kowollik, C. & Stenzel, M. H. Synthesis of comb polymers via grafting-onto macromolecules bearing pendant diene groups via the hetero-Diels-Alder-RAFT click concept. *J. Polym. Sci. Part A: Polym. Chem.* **48**, 1773–1781 (2010).
10. Bayer, U. & Stadler, R. Synthesis and properties of amphiphilic “dumbbell”-shaped grafted block copolymers, 1. anionic synthesis via a polyfunctional initiator. *Macromol. Chem. Phys.* **195**, 2709–2722 (1994).
11. Knauss, D. M. & Huang, T. Star-block-linear-block-star triblock (pom-pom) polystyrene by convergent living anionic polymerization. *Macromolecules* **35**, 2055–2062 (2002).
12. Liu, B., Kazlauciunas, A., Guthrie, J. T. & Perrier, S. One-pot hyperbranched polymer synthesis mediated by reversible addition fragmentation chain transfer (RAFT) polymerization. *Macromolecules* **38**, 2131–2136 (2005).
13. Chen, S., Xu, Z. & Zhang, D. Synthesis and application of epoxy-ended hyperbranched polymers. *Chem. Eng. J.* **343**, 283–302 (2018).
14. Hawker, C. J. & Frechet, J. M. Preparation of polymers with controlled molecular architecture. a new convergent approach to dendritic macromolecules. *J. Am. Chem. Soc.* **112**, 7638–7647 (1990).
15. Lepoittevin, B., Matmour, R., Francis, R., Taton, D. & Gnanou, Y. Synthesis of dendrimer-like polystyrene by atom transfer radical polymerization and investigation of their viscosity behavior. *Macromolecules* **38**, 3120–3128 (2005).
16. Lepoittevin, B. et al. Synthesis and characterization of ring-shaped polystyrenes. *Macromolecules* **33**, 8218–8224 (2000).
17. Iatrou, H., Hadjichristidis, N., Meier, G., Frielinghaus, H. & Monkenbusch, M. Synthesis and characterization of model cyclic block copolymers of styrene and butadiene. comparison of the aggregation phenomena in selective solvents with linear diblock and triblock analogues. *Macromolecules* **35**, 5426–5437 (2002).
18. Zhang, H., Gnanou, Y. & Hadjichristidis, N. Well-defined polyethylene molecular brushes by polyhomologation and ring opening metathesis polymerization. *Polym. Chem.* **5**, 6431–6434 (2014).
19. Zhang, H. & Hadjichristidis, N. Well-defined bilayered molecular cobrushes with internal polyethylene blocks and *ω*-hydroxyl-functionalized polyethylene homobrushes. *Macromolecules* **49**, 1590–1596 (2016).
20. Wever, D. A. Z., Picchioni, F. & Broekhuis, A. A. Polymers for enhanced oil recovery: a paradigm for structure-property relationship in aqueous solution. *Prog. Polym. Sci.* **36**, 1558–1628 (2011).
21. Wever, D. A. Z., Polgar, L. M., Stuart, M. C. A., Picchioni, F. & Broekhuis, A. A. Polymer molecular architecture as a tool for controlling the rheological properties of aqueous polyacrylamide solutions for enhanced oil recovery. *Ind. Eng. Chem. Res.* **52**, 16993–17005 (2013).
22. Fan, Z. W. et al. Topology and dynamic regulations of comb-like polymers as strong adhesives. *Macromolecules* **56**, 1514–1526 (2023).
23. Xiong, C., Xiong, W., Mu, Y., Pei, D. & Wan, X. Mussel-inspired polymeric coatings with the antifouling efficacy controlled by topologies. *J. Mater. Chem. B* **10**, 9295–9304 (2022).
24. Modica, K. J., Martin, T. B. & Jayaraman, A. Effect of polymer architecture on the structure and interactions of polymer grafted particles: theory and simulations. *Macromolecules* **50**, 4854–4866 (2017).
25. Khabaz, F. & Khare, R. Effect of chain architecture on the size, shape, and intrinsic viscosity of chains in polymer solutions: a molecular simulation study. *J. Chem. Phys.* **141**, 214904 (2014).
26. Wijesinghe, S., Perahia, D. & Grest, G. S. Polymer topology effects on dynamics of comb polymer melts. *Macromolecules* **51**, 7621–7628 (2018).
27. Liu, Y. et al. Recent development in topological polymer electrolytes for rechargeable lithium batteries. *Adv. Sci.* **10**, e2206978 (2023).
28. Zhou, Y. et al. Dicationic tetraalkylammonium-based polymeric ionic liquid with star and four-arm topologies as advanced solid-state electrolyte for lithium metal battery. *React. Funct. Polym.* **145**, 104375 (2019).
29. Zhang, L., Wang, S., Wang, Q., Shao, H. & Jin, Z. Dendritic solid polymer electrolytes: a new paradigm for high-performance lithium-based batteries. *Adv. Mater.* **35**, e2303355 (2023).
30. Su, Y. et al. Rational design of a topological polymeric solid electrolyte for high-performance all-solid-state alkali metal batteries. *Nat. Commun.* **13**, 4181 (2022).
31. Webb, M. A. et al. Systematic computational and experimental investigation of lithium-ion transport mechanisms in polyester-based polymer electrolytes. *ACS Cent. Sci.* **1**, 198–205 (2015).
32. Fong, K. D. et al. Ion transport and the true transference number in nonaqueous polyelectrolyte solutions for lithium ion batteries. *ACS Cent. Sci.* **5**, 1250–1260 (2019).
33. Brandell, D., Priimägi, P., Kasemägi, H. & Aabloo, A. Branched polyethylene/poly (ethylene oxide) as a host matrix for Li-ion battery electrolytes: a molecular dynamics study. *Electrochim. Acta* **57**, 228–236 (2011).
34. Cook, A. B. & Perrier, S. Branched and dendritic polymer architectures: functional nanomaterials for therapeutic delivery. *Adv. Funct. Mater.* **30**, 1901001 (2020).
35. Yu, C. et al. Molecular dynamics simulation studies of hyperbranched polyglycerols and their encapsulation behaviors of small drug molecules. *Phys. Chem. Chem. Phys.* **18**, 22446–22457 (2016).
36. Javan Nikkhah, S. & Thompson, D. Molecular modelling guided modulation of molecular shape and charge for design of smart self-assembled polymeric drug transporters. *Pharmaceutics* **13**, 141 (2021).
37. Ahmad, S. et al. In silico modelling of drug–polymer interactions for pharmaceutical formulations. *J. R. Soc. Interface* **7**, S423–S433 (2010).
38. Martinho, N. et al. Molecular modeling to study dendrimers for biomedical applications. *Molecules* **19**, 20424–20467 (2014).
39. Polymeropoulos, G. et al. 50th anniversary perspective: polymers with complex architectures. *Macromolecules* **50**, 1253–1290 (2017).
40. Dhamankar, S. & Webb, M. A. Chemically specific coarse-graining of polymers: methods and prospects. *J. Polym. Sci.* **59**, 2613–2643 (2021).
41. Gartner III, T. E. & Jayaraman, A. Modeling and simulations of polymers: a roadmap. *Macromolecules* **52**, 755–786 (2019).
42. Webb, M. A., Jackson, N. E., Gil, P. S. & de Pablo, J. J. Targeted sequence design within the coarse-grained polymer genome. *Sci. Adv.* **6**, eabc6216 (2020).
43. Patel, R. A., Colmenares, S. & Webb, M. A. Sequence patterning, morphology, and dispersity in single-chain nanoparticles: insights from simulation and machine learning. *ACS Polym. Au* **3**, 284–294 (2023).
44. Kosuri, S. et al. Machine-assisted discovery of chondroitinase abc complexes toward sustained neural regeneration. *Adv. Healthc. Mater.* **11**, e2102101 (2022).
45. Tamasi, M. J. et al. Machine learning on a robotic platform for the design of polymer–protein hybrids. *Adv. Mater.* **34**, e2201809 (2022).
46. Kumar, R. et al. Efficient polymer-mediated delivery of gene-editing ribonucleoprotein payloads through combinatorial design, parallelized experimentation, and machine learning. *ACS Nano* **14**, 17626–17639 (2020).
47. Kumar, R. Materiomically designed polymeric vehicles for nucleic acids: quo vadis? *ACS Appl. Bio Mater.* **5**, 2507–2535 (2022).
48. Panganiban, B. et al. Random heteropolymers preserve protein function in foreign environments. *Science* **359**, 1239–1243 (2018).
49. Barnett, J. W. et al. Designing exceptional gas-separation polymer membranes using machine learning. *Sci. Adv.* **6**, eaaz4301 (2020).
50. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. *Science* **361**, 360–365 (2018).
51. Kingma, D. P. & Welling, M. Auto-encoding variational bayes. In *International Conference on Learning Representations* (2014).
52. Dieng, A. B., Kim, Y., Rush, A. M. & Blei, D. M. Avoiding latent variable collapse with generative skip models. In *International Conference on Artificial Intelligence and Statistics*, 2397–2405 (PMLR, 2019).
53. Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In *International Conference on Machine Learning*, 2323–2332 (PMLR, 2018).
54. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. *ACS Cent. Sci.* **4**, 268–276 (2018).
55. Batra, R. et al. Polymers for extreme conditions designed using syntax-directed variational autoencoders. *Chem. Mater.* **32**, 10489–10500 (2020).
56. Chiu, Y.-H., Liao, Y.-H. & Juang, J.-Y. Designing bioinspired composite structures via genetic algorithm and conditional variational autoencoder. *Polymers* **15**, 281 (2023).
57. Shmilovich, K. et al. Discovery of self-assembling *π*-conjugated peptides by active learning-directed coarse-grained molecular simulation. *J. Phys. Chem. B* **124**, 3873–3891 (2020).
58. Kim, S., Schroeder, C. M. & Jackson, N. E. Open macromolecular genome: generative design of synthetically accessible polymers. *ACS Polym. Au* **3**, 318–330 (2023).
59. Everaers, R., Karimi-Varzaneh, H. A., Fleck, F., Hojdis, N. & Svaneborg, C. Kremer–grest models for commodity polymer melts: linking theory, experiment, and simulation at the kuhn scale. *Macromolecules* **53**, 1901–1916 (2020).
60. Patel, R. A., Borca, C. H. & Webb, M. A. Featurization strategies for polymer sequence or composition design by machine learning. *Mol. Syst. Des. Eng.* **7**, 661–676 (2022).
61. Patel, R. A. & Webb, M. A. Data-driven design of polymer-based biomaterials: high-throughput simulation, experimentation, and machine learning. *ACS Appl. Bio Mater.* **7**, 510–527 (2024).
62. Inoue, K. Functional dendrimers, hyperbranched and star polymers. *Prog. Polym. Sci.* **25**, 453–571 (2000).
63. Scott, A. J., Romero-Zerón, L. & Penlidis, A. Evaluation of polymeric materials for chemical enhanced oil recovery. *Processes* **8**, 361 (2020).
64. Alves, T. F. R. et al. Applications of natural, semi-synthetic, and synthetic polymers in cosmetic formulations. *Cosmetics* **7**, 75 (2020).
65. Martini, A., Ramasamy, U. S. & Len, M. Review of viscosity modifier lubricant additives. *Tribol. Lett.* **66**, 58 (2018).
66. van Ravensteijn, B. G. P., Zerdan, R. B., Hawker, C. J. & Helgeson, M. E. Role of architecture on thermorheological properties of poly(alkyl methacrylate)-based polymers. *Macromolecules* **54**, 5473–5483 (2021).
67. Larson, R. G. The rheology of dilute solutions of flexible polymers: progress and problems. *J. Rheol.* **49**, 1–70 (2005).
68. Colby, R. H., Fetters, L. J. & Graessley, W. W. The melt viscosity-molecular weight relationship for linear polymers. *Macromolecules* **20**, 2226–2237 (1987).
69. Rubinstein, M. & Colby, R. H. *Polymer Physics* (Oxford Univ. Press, 2003).
70. Ferry, J. D. *Viscoelastic Properties of Polymers* (John Wiley & Sons, 1980).
71. Johnson, K. J., Glynos, E., Sakellariou, G. & Green, P. Dynamics of star-shaped polystyrene molecules: from arm retraction to cooperativity. *Macromolecules* **49**, 5669–5676 (2016).
72. Roland, C., Archer, L., Mott, P. & Sanchez-Reyes, J. Determining rouse relaxation times from the dynamic modulus of entangled polymers. *J. Rheol.* **48**, 395–403 (2004).
73. Pasquino, R. et al. Viscosity of ring polymer melts. *ACS Macro Lett.* **2**, 874–878 (2013).
74. Inkson, N., Graham, R., McLeish, T., Groves, D. & Fernyhough, C. Viscoelasticity of monodisperse comb polymer melts. *Macromolecules* **39**, 4217–4227 (2006).
75. Abbasi, M., Faust, L. & Wilhelm, M. Comb and bottlebrush polymers with superior rheological and mechanical properties. *Adv. Mater.* **31**, 1806484 (2019).
76. Lin, T.-S. et al. BigSMILES: a structurally-based line notation for describing macromolecules. *ACS Cent. Sci.* **5**, 1523–1531 (2019).
77. Lin, T.-S., Rebello, N. J., Lee, G.-H., Morris, M. A. & Olsen, B. D. Canonicalizing BigSMILES for polymers with defined backbones. *ACS Polym. Au* **2**, 486–500 (2022).
78. Schneider, L., Walsh, D., Olsen, B. & de Pablo, J. Generative BigSMILES: an extension for polymer informatics, computer simulations & ML/AI. *Digit. Discov.* **3**, 51–61 (2024).
79. Hu, G., Yan, W., Zhou, J. & Shen, B. Residue interaction network analysis of Dronpa and a DNA clamp. *J. Theor. Biol.* **348**, 55–64 (2014).
80. Liang, H., Webb, M. A., Chawathe, M., Bendejacq, D. & de Pablo, J. J. Understanding the structure and rheology of galactomannan solutions with coarse-grained modeling. *Macromolecules* **56**, 177–187 (2022).
81. Plimpton, S. J. Fast parallel algorithms for short-range molecular dynamics. *J. Comput. Phys.* **117**, 1–19 (1993).
82. Ripoll, M., Winkler, R. & Gompper, G. Hydrodynamic screening of star polymers in shear flow. *Eur. Phys. J. E* **23**, 349–354 (2007).
83. Niepert, M., Ahmed, M. & Kutzkov, K. Learning convolutional neural networks for graphs. In *International Conference on Machine Learning*, 2014–2023 (PMLR, 2016).
84. Grattarola, D. & Alippi, C. Graph neural networks in Tensorflow and Keras with Spektral [application notes]. *IEEE Comput. Intell. Mag.* **16**, 99–106 (2021).
85. Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? In *International Conference on Learning Representations* (2018).
86. Peng, Y. et al. Enhanced graph isomorphism network for molecular ADMET properties prediction. *IEEE Access* **8**, 168344–168360 (2020).
87. Bao, L. et al. Kinome-wide polypharmacology profiling of small molecules by multi-task graph isomorphism network approach. *Acta Pharm. Sin. B* **13**, 54–67 (2023).
88. Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous systems. In *OSDI’16: Proc. 12th USENIX Conf. Operating Systems Design and Implementation*, 265–283 (USENIX Association, 2016).
89. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In *International Conference on Learning Representations* (2015).
90. McInnes, L., Healy, J., Saul, N. & Großberger, L. Umap: uniform manifold approximation and projection. *J. Open Source Softw.* **3**, 861 (2018).
91. Friedman, D. & Dieng, A. B. The vendi score: a diversity evaluation metric for machine learning. *Trans. Mach. Learn. Res.* (2023).

## Acknowledgements

M.A.W. and A.B.D. acknowledge funding from the Princeton Catalysis Initiative for this research. M.A.W. and S.J. also acknowledge support from the donors of ACS Petroleum Research Fund under Doctoral New Investigator Grant 66706-DNI7.

## Author information

### Contributions

S.J. performed all computations and analyses. S.J. wrote the paper with input from M.A.W. and A.B.D. M.A.W. and A.B.D. supervised all the work and edited the paper. M.A.W. and A.B.D. acquired funding for the work.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Jiang, S., Dieng, A.B. & Webb, M.A. Property-guided generation of complex polymer topologies using variational autoencoders.
*npj Comput Mater* **10**, 139 (2024). https://doi.org/10.1038/s41524-024-01328-0

DOI: https://doi.org/10.1038/s41524-024-01328-0