Systematic Coarse-graining of Epoxy Resins with Machine Learning-Informed Energy Renormalization

A persistent challenge in predictive molecular modeling of thermoset polymers is to capture the effects of chemical composition and degree of crosslinking (DC) on dynamical and mechanical properties with high computational efficiency. We established a new coarse-graining (CG) approach that combines the energy renormalization method with Gaussian process surrogate models of the molecular dynamics simulations. This allows a machine-learning informed functional calibration of DC-dependent CG force field parameters. Taking versatile epoxy resins consisting of Bisphenol A diglycidyl ether combined with curing agent of either 4,4-Diaminodicyclohexylmethane or polyoxypropylene diamines, we demonstrated excellent agreement between all-atom and CG predictions for density, Debye-Waller factor, Young's modulus and yield stress at any DC. We further introduce a surrogate model enabled simplification of the functional forms of 14 non-bonded calibration parameters by quantifying the uncertainty of a candidate set of high-dimensional and flexible calibration functions. The framework established provides an efficient methodology for chemistry-specific, large-scale investigations of the dynamics and mechanics of epoxy resins.


INTRODUCTION
Computational design of high-performance epoxy resins calls for methods to circumvent costly experiments. Chemistry-specific molecular models are critically needed to bridge the gap in scales between molecular dynamics (MD) simulations and experiments, while predicting accurately the highly tunable macroscopic properties of epoxy resins and their composites [1][2][3] . This remains a challenging problem to tackle due to the chemical complexity 4-6 of epoxy resins, the high number of properties that must be targeted for realistic predictions, and their strong dependence on the degree of crosslinking (DC) [7][8][9][10][11][12] . This up-scaling problem requires multi-dimensional functional calibration, taking inputs from high-fidelity simulations such as all-atomistic simulations. Allatom (AA) MD simulations have demonstrated great success in predicting the effect of DC on the glass transition temperature (Tg), thermal expansion coefficient and elastic response 13,14 of epoxy resins, and the fracture behavior of epoxy composites 15,16 . This makes AA-MD suitable for informing larger-scale models, provided that the data required for upscaling is not prohibitively expensive to obtain. While theoretical tools such as time-temperature superposition have been instrumental in bridging temporal scales 17,18 , AA simulations on their own remain prohibitively expensive for high-throughput design.
Systematically coarse-grained (CG) models can extend the length and time scales of MD simulations by orders of magnitude, but chemistry-specificity requires calibration of a complex force-field to match the properties of underlying AA simulations or experimental data. Most CG models proposed for epoxies matched the structural features 19 or the thermomechanical properties 20,21 for highly-crosslinked networks. Prior models have generally not addressed the question of transferability of the model over different temperatures or curing states, which is challenging because of the smoother energy landscape and reduced degrees of freedom of CG models compared to AA models 22,23 . This particular aspect requires a functional calibration of the forcefield parameters against DC, temperature (T), or any other variable over which transferability is desired. Machine Learning (ML) tools can efficiently handle such a parametric functional calibration in a complex force field. Despite the growing interest in utilizing ML approaches to CG modeling [24][25][26] , complex chemistries such as epoxy resins have not been explored extensively.
Progress was made on this issue in a recent epoxy CG model 27 where a particle swarm optimization algorithm was used to calibrate a T-dependent force-field for three different curing states with elastic modulus as the only target property. A general CG framework for epoxy resins that can target multiple properties at different DCs and demonstrate the method for more than one cure chemistry remains to be established. An accurate description of the dynamics and mechanical properties of partially cured epoxies is particularly relevant in the context of epoxy-based composites, where the exploitation of partial and multi-step curing processes can lead to enhanced performance of the epoxy resin for storage, additive manufacturing or functionalization 28 . Additionally, a model that can account for differences in curing degree across the material can be used to capture gradient properties within interphase regions of composites like CFRP 29 .
To address this issue, here we simultaneously target the DC-dependence of density, dynamics, modulus, and yield strength of two model epoxy resins. A parametric functional calibration requires the functional form to be defined a priori 30,31 . This is not required by non-parametric methods that construct the calibration functions through a reproducing kernel Hilbert space 32,33 .
However, either approach requires additional assumptions when used to calibrate functions in high-dimensional spaces to avoid identifiability issues [34][35][36] . For this reason, we employ a physicsinformed strategy, leveraging our recently developed energy renormalization (ER) 37 method, which calibrates the non-bonded interactions of the CG model in a T-dependent fashion to match the underlying AA simulation. Based on the generalized entropy theory of the glass formation 38,39 , the variation of the cohesive interaction of the CG model with varying external parameters allows to tune the activation energy of the system, which compensates for the different entropic variations of the AA and CG models caused by the different resolution of the energy landscape.
Recent ER models for different homopolymers 40 , molecular glass-formers 41 , and biomimetic copolymers 42 matched the mean square displacement at the picosecond time scale, 〈 ! 〉, to also predict dynamical and mechanical properties. This is because 〈 ! 〉 is strongly connected to diffusion 41 , relaxation time [43][44][45] , shear modulus 37 , and vibrational modes 46 in glass-formers.
Here we extended the ER protocol to a CG model for epoxy resins, focusing on the DC-transferability and simultaneously matching the density, dynamics, and mechanical properties of the systems. We supported our protocol with the use of Gaussian processes for the calibration of the force field. This particular ML technique is extremely efficient in treating high-dimensional parametrizations, and naturally incorporates multi-response calibrations. Details of our protocol are reported in the Methods section. We targeted a system with Bisphenol A diglycidyl ether (DGEBA) as the epoxy and either 4,4-Diaminodicyclohexylmethane (PACM) or polyoxypropylene diamines (Jeffamine D400) as the curing agents. We focused on this versatile system because recent experiments 47-49 on resins prepared using a combination of PACM and Jeffamines of varying molecular weight showed remarkable mechanical properties stemming for dynamical heterogeneities at molecular scales not easily accessible to AA models. For the DCdependent parameters of the CG force field, we initially assumed a relatively high dimension and flexible class of radial basis functions. For uncertainty quantification purposes, we calculated the fluctuations of the Gaussian process prediction in response to perturbations of the optimal solution. This information was then used to simplify calibration functions while maintaining a comparable degree of accuracy.
The manuscript is laid out as follows. We first report the target properties from AA simulations at different values of DC from 0% to 95%. Then, we define the parametric range for the non-bonded parameters of the CG models and determine the sensitivity of the target properties on the CG parameters in this 15-dimensional range. We train surrogate ML models based on the CG and AA simulations and we report the optimal functions for all the non-bonded parameters. Using uncertainty quantification, we simplify the functional form of the parametrization, resulting in only 21 free parameters needed to calibrate 14 functions. We show that the optimized CG model has excellent agreement with all eight (8) target macroscopic properties from the AA simulations.
Finally, we also show that optimal parameters for the target properties also provide a reasonably good match between AA and CG curves for the complete mean square displacements and stressstrain response datasets.

All-atom model target properties
The CG model for the proposed double curing agent epoxy resin system contains 7 types of beads, and 7 types of bonds and 10 types of angles among them. We aim to functionally calibrate the parameters of the CG model to simultaneously capture the DC-dependent density, 〈 ! 〉, Young's modulus, and yield stress at T=300K of an underlying AA model. The AA force field here employed has been validated for similar epoxy systems 50 , showing that it captures the glass transition temperature and fracture behavior of experimental systems. Details on the AA model are given in our Methods section. We first calibrate the bonded parameters using a standard Boltzmann inversion (BI) approach. More importantly, the non-bonded parameters calibration was done using Machine-Learning (ML) Gaussian process models as they are data-efficient 51,52 and enable the quantification of the modeling uncertainties intrinsic to MD simulations 53 . To manage the high dimensionality of inverse functional calibration, we employ a statistical inference approach to simplify the underlying function forms. We report a scheme of our CG model and a flowchart of our parametrization process in Figure 1.
The first step in the calibration of the CG force field was to set the parameters of the bonded potentials, which was done through a BI 54 approach, to match the probability distributions  Table 1.
To determine the non-bonded parameters, we first extracted initial values for the cohesive energies and bead sizes $ , %, ( = 1, … ,7) from the AA radial distribution functions of all seven CG beads of the model using BI. These non-bonded parameters correctly reproduce the structure of the AA system in CG representation but fail to capture the macroscopic dynamics and mechanical properties of the system. This inadequacy makes the model insufficient to extract quantitative information from the simulations and guide the experimental design of these materials. In this study, we treat the non-bonded force field parametrization as a multi-objective optimization problem where we aim to determine 14 parameters $ , %, ( = 1, … ,7) to simultaneously match the target density, Debye-Waller factor 〈 ! 〉, Young's modulus, and yield stress at all DCs. Figure 2 reports the values of density, 〈 ! 〉, Young's modulus, and yield stress of the AA systems for DGEBA+PACM and DGEBA+D400. We note that the values found for the Young' modulus of the high DC systems are in line with experimental results 47,49 , in the range of 2.5 to 3 GPa. For both systems, the density and mechanical properties increase with increasing DC, while 〈 ! 〉, a marker of mobility, decreases. This is expected, and more pronounced in the DGEBA+PACM system, which has stiffer and less mobile chain networks due to the rigidity of the curing agent PACM. Flexibility introduced by D400 increases mobility and reduces density as well as mechanical properties of the DGEBA+D400 system 47  Young's modulus in particular changes differently depending on DC in the two systems, since the spatial density of crosslinks is higher in the DGEBA+PACM system due to the lower molecular weight of PACM compared to D400. In other words, because of the different chain configurations of the curing agent, increasing DC leads to different changes in configurational entropy caused by the reduction in degrees of freedom. In addition, we observe that the dependence of Young's modulus on DC is nonlinear, indicating complex changes of configurational entropy with increasing DC in the epoxy resin networks.

Non-bonded CG force field: sensitivity analysis
Any fixed parametrization of the CG model is not able to match the properties of the AA system at all DC values, as we show in Supplementary Figures 2 and 3 in our Supplementary Note 2. This is arguably caused by the different rate with which the configurational entropy of the AA and CG models changes with varying DC, similarly to what happens with varying T 37 . Thus, we introduced a DC-dependence for all non-bonded parameters $ , % = [ (DC), (DC)], ( = 1, … ,7) . In previous models with highly homogeneous polymers and few CG bead types, it was possible to study the dependence on temperature with manual parameter sweeps. ER in these circumstances required only one T-dependent function to rescale all cohesive energies (the ' ) and another to rescale all the effective sizes of the CG beads (the ' ). We found that this was not possible in our current epoxy model due to the high complexity of the system, including the effect of crosslinks and the large amount of CG beads with different cohesive energies and sizes. Here, we introduced a generalization of previous protocols that relies on ML to explore the high-dimensional space of the model parameters. The idea is to surrogate the AA and CG models with Gaussian random processes followed by minimizing the difference between the CG and the AA models for all DC with respect to the calibration functions. Preserving the seminal idea of the ER procedure, the protocol outlined in this paper can be easily generalized to any CG model. We used the simulation data presented in Figure 2 to train the AA Gaussian process models: 19 samples for the DGEBA+PACM system and 20 samples for the DGEBA+D400 system. In the AA model, DC is the only input variable. For the CG model, DC and the non-bonded parameters $ , % are the input parameters. The range of the parameters was determined by preliminary simulations calibrating the cohesive energies either to match the dynamics of the AA systems at DC=0% or the Young's modulus at DC=90% or 95% (the highest DC we can achieve for the DGEBA+PACM or DGEBA+D400 AA networks respectively). This gave us extremes for the values of cohesive energies # , and we further expanded them by around 20%. We also selected a range of around +/-20% for the # parameters from the initial estimate obtained from the BI of the radial distribution functions. We report the final range for all parameters $ , %, ( = 1, … ,7) in Supplementary Table   1 of the Supplementary Note 7. Our ranges were post-validated by our final calibration, as discussed in the following.
We trained the Gaussian process surrogate models on 700 simulation samples of the CG DBEGA+PACM system, which also allowed us to fine-tune the extremes for the calibration parameters. Then we trained 500 simulation samples of the CG DGEBA+D400 system, where fewer simulations where needed thanks to the initial fine-tuning. With these surrogates it was possible to perform a variance-based sensitivity analysis, as reported in Figure 3. This type of analysis provided insight into how the responses of the surrogate models depend on their inputs 57,58 .
As one would expect, the analysis revealed a strong influence of the ' parameters on the density, while the dynamics and mechanical properties of the system depend more on the cohesive energies ' . This separation was already assumed in previous ER models 40 and it was confirmed here. Since the main sensitivity (white, thinner bars) dominates the total sensitivity (which includes the higherorder interaction effects between the input parameters) in all cases, the response of the CG model can be approximated with a first-degree polynomial. This also suggests that many of the functional relations between the forcefield parameters and DC can be described through a linear function, since the target responses presented in Figure 2 are also close to linear. The relative contribution of the different cohesive energies to our target properties is similar for 〈 ! 〉, Young's modulus, and yield stress. DC is as relevant as the cohesive energies for 〈 ! 〉 and yield stress, while its role is suppressed for the Young's modulus. We notice the prominent influence of the parameter ( on all four measures used here to quantify the mechanical and dynamical properties of the DGEBA+D400 network. This is expected, as bead 6 is a relatively large bead in the repeated unit of the longer D400 molecule. As such, bead 6 makes up for around 28% of all the CG beads of the network, and close to 40% in terms of the bead volume. Variations of ( lead to large changes in the density of the system, as well as dynamics and mechanical properties.

CG force field optimization and validation
Before the calibration of the CG force-field, we needed to identify a flexible candidate class of calibration functions for the nonbonded parameters of the CG model. Previous ER papers 40-42 for simple glass-forming polymers used a sigmoid function for the temperature dependence of cohesive energy and bead size with temperature. The choice is theoretically supported 38 by the transition from the Arrhenius regime of liquids at high temperature to the glassy regime below the glass transition temperature Tg, with the supercooled phase in between dominated by the caging dynamics and α-relaxation processes. We initially assumed a similar sigmoidal function for DC, roughly equating an increase in DC to a decrease in temperature given that both actions slow down dynamics. We found this constraint to be too restrictive for our systems: minimizing the discrepancy between the AA and CG response ( equation (3)  To uncover what functions best describe the DC dependence of the 14 non-bonded parameters, we employed a class of radial basis functions (RBF) described in our methods section. We assumed that each calibration function shares the same shape parameter and that we have three centers for each calibration parameter = [0% , 50% , 100%]. The number of centers can be increased to capture more complex behavior, but at the cost of overfitting the data and getting unrealistic approximations of the 'true' calibration functions. Our goal was to obtain the simplest force field that is still able to capture the response of the system. To demonstrate the effect of an overfitting parametrization, we include an example in the Supplementary Note 4 (see Supplementary Figures   6 and 7) where the model has been calibrated at DC = 5% increments without analytical description.
The approach described so far using RBF for all the parameters gave us a possible solution for our force-field (see Supplementary Figures 8 and 9 in the Supplementary Note 5), but at the cost of a highly complex parametrization. We wanted to simplify our model by reducing the degrees of freedom of the parametrization without affecting the model's accuracy. Given that our CG and AA models have intrinsic uncertainty that is approximated with our Gaussian process models through the assumption of homoscedasticity, we calculated the probability that for a specific set of calibration parameters the CG models came from the same distribution as the AA models through an objective function that captures the goodness of fit: where the subscript corresponds to the 78 response variable. Equation (1) has similar properties as a likelihood function and thus lends itself to be used in an approximate Bayesian computation scheme to get a posterior approximation of the parameters that make up the calibration functions.
Through a quasi-random sampling scheme, we approximated the first two statistical moments of the calibration functions.
The green curves in Figure 4 show the functions in the RBF class that maximize the objective function of the CG and AA models yielding the same target properties, where the uncertainty quantification for each function is also reported (green band). Note that some of the calibration functions have a large envelope of uncertainty (e.g., ( and 9 ), while others have a small uncertainty envelope (e.g. ! and ( ). If the uncertainty envelope is small, we were able to make a well-informed decision on the class of functions that would be most suited to model the nonbonded force field relation to DC. When the uncertainty bounds are large, then the choice of function is not consequential to the calibration accuracy, and we were able to simplify the function.
In essence, the quantified uncertainty provides a decision support tool that gives modelers insight into what calibration functions are most significant to the calibration accuracy. The functions' uncertainty reported in Figure 4 is a local measure of uncertainty around the function mean value considering all the target properties, while the sensitivity analysis of Figure 3 is a global measure in the whole parameter space for each property separately. Still, it is possible to connect the two quantities considering the joint probability distributions. We discuss this briefly in our Supplementary Note 8 (see Supplementary Figures 11 and 12), and we will report these technical findings in detail in an upcoming paper focused on the statistical analysis approach to functional calibration.
With this procedure, it was possible to drastically simplify our parametrization, reducing most functional forms either to linear functions or constants with changing DC. For the simplification, we used the results presented in Figure 4 and considered either a constant function or a linear function if it would fit within the envelope of uncertainty (where we preferred constant over linear as it requires one fewer parameter). With this initial guess, we used Equation (3)  properties in the AA and CG force fields, as we show in the following, which is robust against these variations. For reproducibility purposes, we include in our supplementary materials our complete data set, inputs and outputs of all AA and CG simulations, as well as the LAMMPS input files and structure used to obtain these results.
We report in Table 2 the analytical description of all the parameters in the simplified formulation shown in the black curves of Figure 4. For each parametrization, the ML algorithm predicted the response of the CG model for all target properties as a function of DC, which was compared to the values of the same properties in the AA Gaussian process model through Equation (3). For the parametrization shown in Figure 4, the ML-predicted response of the CG model compared to the AA values is reported in Figure 5. For each target property, the ML extrapolation assigned a confidence interval in addition to the expected value for both the AA and the CG systems, with larger intervals for complex properties like the Young's modulus, that has a higher measurement uncertainty (see Figure 2c) and, for the CG model, large sensitivity to the variation of the force field parameters. The CG prediction is in line with the AA values for all properties and at any DC. Our parametrization has a high level of accuracy, and we found a fair agreement 59 (average RMSRE = 10%) between the AA and CG responses. We also note that the limit on the accuracy is easily generalizable to any system, for any set of target properties. Higher accuracy can be achieved, if needed, at the cost of a more complex force field. We discuss other possible parametrizations in our Supplementary Notes. We note that the framework here developed can be generalized to different systems of high chemical complexity, where a tradeoff between accuracy and generality of the CG force field must be considered depending on the goal and application of the model. Our method can be readily applied to multi-objective parametrizations, where proper weights are attributed, tailoring the force field to specific applications.
Finally, we discuss the results of the CG simulations performed with the parameters reported in Table 2. The stars in Figure 5 correspond to the values of the target properties extracted from CG simulations performed with the simplified parametrization of Figure 4, showing the agreement between the CG Gaussian process prediction and the actual CG simulation.

CG model predictivity beyond target properties
With the validated approach and optimized CG force field parameters, we now report the overall dynamics and mechanical response of the CG and AA systems with varying DC. Figure 6 shows the MSD and stress-strain curves up to 20% tensile deformation for both DGEBA+PACM and DGEBA+D400 systems at DC=0%, 50%, and 90-95% (for PACM and D400 respectively  The CG model here reported is »10 3 times faster than AA simulations and it will allow the investigation of a broad class of epoxy resins beyond the nanoscale, providing quantitative predictions to explain experimental findings and to guide the design of new materials. By introducing bond-breaking events at large deformations, it would be possible to use this model to study the fracture and impact resistance of epoxy resin networks. Our preliminary results show that this model is robust when multiple curing agents in varying stoichiometric ratio are used, but a more quantitative analysis will be the focus of a future study. Thanks to the larger scales achievable by this model, it will also be possible to investigate the properties of composite systems by adding nanofillers, polymer matrixes or other elements to the resin, at size scales of hundreds of nanometers. The ML tools developed for the parametrization of our model allowed the extension of the energy renormalization CG protocol to a highly complex system with multiple target macroscopic properties. The same scheme can be adopted by the modeling community for the creation of chemistry-specific CG models of arbitrary complexity, coupling physical intuition with the computational power of Gaussian processes for the exploration of the force field parameters space.

Systems preparation
Our simulations were performed with the LAMMPS software 60  DGEBA and 500 D400 molecules. In our CG representation, shown in Figure 1a, we used five beads to represent DGEBA (with only three different bead types due to the molecular symmetry), four beads to represent PACM (of two different types), and fifteen beads (of three different types) to represent D400. This choice allowed us to have independent beads to conveniently use for crosslinking (one for the epoxide group and one for the amino group). The centers of the beads locate at the center of mass of the grouped atoms. We note that other mappings might also work, and have been used in the literature 27 . We think that the capability of our ML protocol is robust to variations in the mapping choice, though rigorous testing of this idea is beyond the scope of this paper. We refer to the DGEBA beads as beads 1, 2, 3; PACM beads as beads 4, 5; D400 beads as

Crosslinking protocol
We used the Polymatic package 63 to create crosslinks in our systems in cycles of polymerization.
In each cycle, the Polymatic algorithm created a certain number of new bonds between target beads within a distance criterion, and for each new bond, it updated the topology Each amine group can be connected to two DGEBA epoxide groups. In the formation of our networks, we first prioritized the crosslinking between an epoxide group and an amine group with no other crosslinks, creating networks with a DC of up to 50%. After that, we created crosslinks between amine groups and epoxide groups of DGEBA molecules that are not already in the same network, to avoid the formation of closed loops involving only a fraction of the molecules of the system. This restriction allows up to 75% crosslinked networks, at which point all molecules of the system are connected to the same network. We applied no restriction after that, and stopped the procedure when the formation of a new crosslink is not achieved within 30 MD cycles. This limit was at DC=90% for the atomistic DGEBA+PACM system, at DC=95% for the AA DGEBA+D400, and at DC>99% for the CG systems. The data production of this work used these networks with varying chemistry and DC as starting points.

Data production
After a short run with a non-bonded soft potential at T=300 K and P=0 atm to remove overlapping atoms, we followed previous annealing protocols 64 to reach an equilibrated state (signaled by zero residual stress in the system) at room temperature and pressure in the NPT ensemble. For the AA systems, we used a timestep of 1 fs. We first increased the temperature to T=600 K and the pressure to P=1000 atm in 50 ps in NPT ensemble, then equilibrated the system for 100 ps at high T and P, then quenched down to T=300 K and P=0 atm in 100 ps and finally equilibrated at T=300 K and P=0 atm for 200 ps. The mean square displacement of the systems was calculated after the equilibration, for the following 100 ps, then a tensile deformation was performed in the NPT ensemble at strain rate ̇= 0.5 > s ?4 . 〈 ! 〉 was calculated from the mean square displacement at * = 3 ps, following previous protocols 40 . The choice of the timescales was made to obtain an equilibrated system within a reasonable computational time. The tensile deformation was performed separately in three different directions, i.e., x, y and z to obtain improved statistics of the mechanical properties of the systems. The Young's modulus was calculated from the slope of the stress curve during the tensile test within total strain=2%. The yield stress was calculated at the intersection of the stress curve with a fit of the Young's modulus shifted to start at strain=3%.
The CG systems used a timestep of 4 fs. They were first equilibrated at T=800 K and P=100 atm, then quenched to 500 K and 0 atm to relax the pressure, then quenched in temperature to 300 K and 0 atm, and finally equilibrated at constant T=300 K and P=0 atm. Each of these simulation phases run for 2 ns. The dynamics was then measured in the equilibrated state to extract 〈 ! 〉 and density. A tensile test with strain rate ̇= 0.5 × 10 > s ?4 (same as the AA simulations) was performed in the NPT ensemble to extract the Young's modulus and the yield stress.

Machine Learning and Functional Calibration
A key component of the proposed framework is the adoption of Gaussian process ML models to replace our costly AA and CG models and simulations. The motivation for choosing Gaussian process models over other ML models (e.g., in comparison to artificial neural networks 65 and random forests 66 ) is that they are data efficient and enable the quantification of prediction uncertainty. The uncertainty quantification allows us to start with a high-dimensional parametrization with many free parameters, and simplifying the final solution based on the predicted uncertainty, as we show in Figure 4. Gaussian processes naturally incorporate the multiresponse calibration that we need. Finally, we remark that the convergence of alternative methods such as a particle swarm optimization (PSO) algorithm would require millions of CG simulations even for a 30-dimensional function 67 with exponential growth, whereas our protocol only needed around 1000 CG simulations to converge for a 43-dimensional problem. For the epoxy model of interest, we trained four Gaussian process models for the DGEBA+PACM and DGEBA+D400 systems (two CG and two AA models).
For the Gaussian process surrogates of the CG models, we designed a set of simulations where each simulation is represented by a point in a 15-dimensional hypercube (7 # and 7 # parameters describing the non-bonded interactions of the seven beads, plus DC). Since our two CG networks do not share the same set of CG beads, we created two experimental designs containing samples , (2) where ‖⋅‖ K ! is the ! norm and the subscript corresponds to the <= response variable. This is a parametric approach that allows the identification of a set of parameters that are constant over the space of ∈ [0%, 100%]. However, this assumption greatly limits the flexibility of the CG models' responses (i.e., poor calibration performance) We showed in Supplementary Figures 2   and 3 that DC-independent parameters are not sufficient to obtain a good match between the AA and CG models. Consequently, we required that each parameter has a dependence on crosslinking density [ ' (DC), ' (DC)] described analytically from DC=0% to DC=100%. Using the functional representation and by replacing the ! norm with the sample average taken over samples gives where * (⋅), * (⋅) is the set of calibration functions associated with the nonbonded potentials of the DGEBA+PACM system, and 6 (⋅), 6 (⋅) is the set of calibration functions for the nonbonded potentials of the DGEBA+D400 system. We chose radial basis functions as the class of functions describing $ , (⋅) (⋅)%, ( = 1, … ,7). The general formulation of the RBFs is given as where worrying about the bounds of the space over which has been defined, as we can set it equal to the bounds used to generate the training data set of the CG models. This is important for two reasons (i) we can ensure that we do extrapolate from our Gaussian process surrogate models as the search space is restricted to a hypercube, and (ii) having the search space defined on a hypercube greatly simplifies the optimization scheme as no constraints need to be enforced.

DATA AVAILABILITY
The authors confirm that the data supporting the findings of this study are available within the   systems. Error bars result from the variance of statistically independent simulations. Density, modulus, and yield stress increase with increasing DC, while 〈 ! 〉, related to the mobility of the system, decreases. The D400 system, with the longer and flexible curing agent, has a lower density, higher mobility, and softer mechanical response. The dependence of these properties on DC is different in the CG model due to the different changes in configurational entropy caused by the reduction in degrees of freedom. This is typically discussed for changes in temperature, and here observed during the curing process of the polymer network.
For this reason, a DC-independent parametrization of the CG model cannot fully capture the features of the AA model at all DC values (see Supplementary Figures 2 and 3), and an energy renormalization procedure is needed.    Figure 4, and the results of the corresponding CG simulations (black stars). Debye-Waller factor, density, Young's modulus and yield stress are reported for the DGEBA+PACM system (panels a-d) and for the DGEBA+D400 system (panels e-h). The confidence intervals were obtained from the data of Figure   (as shown in Figure 5) but can also predict the whole MSD (panels a-b) and tensile stress curves (panels cd), validating our choice of targets as good predictors of the systems dynamics and mechanical properties.

SUPPLEMENTARY NOTE 1: BONDED POTENTIALS CALIBRATION
In Supplementary Figure 1 we report the AA distributions and CG potentials obtained through iterative Boltzmann inversion of the bonds, angles, and radial distribution functions.
Supplementary Figure 1: Complete set of all distributions of bonds, angles, and radial distribution functions for the centers of mass informed from the AA systems corresponding to the CG beads 1-7 (blue), and the derived CG potentials obtained for bonds and angles using Boltzmann inversion (BI) (red). The atomistic data are the distributions of the distance/angle between the centers of mass of the corresponding CG beads (calculated as the center of mass of the atoms included in the CG bead), from which the CG bonded potentials are obtained through BI. These bonded potentials are directly adopted in the CG model, which are listed in Table 1 of the manuscript. The last three panels report the radial distribution functions of all seven beads, from which we extract Lennard-Jones (LJ) parameters through BI, which include the cohesive energy ( ) and bead size ( ) of each type of beads. We then use an arithmetic rule of mixing for cross-interactions between different types of beads. We note that these LJ potential parameters are initial guesses, which are subject to further calibration using the ML approach.

SUPPLEMENTARY NOTE 2: DC-INDEPENDENT FUNCTIONAL CALIBRATION
We mention in the main manuscript that a DC-dependence of the non-bonded parameters is necessary to obtain a DC-transferable matching between the AA and CG responses, as the free energy landscape and configurational entropy of the two systems are affected differently by the creation of crosslinks. In Supplementary Figures 2 and 3 we show the model response for a DCindependent calibration which is optimized only for the DC=0% or DC=100% response, For each target property it is possible to obtain a high level of accuracy. This shows that the main limit to the accuracy of the simultaneous calibration resides on the competition between physical properties, rather than on the ability of the ML protocol to find optimized parameters.

SUPPLEMENTARY NOTE 7: THE RANGE OF NON-BONDED PARAMETERS
Supplementary Table 1 reports the range of the non-bonded parameters used for the training set and the calibration of the model, as shown by the dotted lines in Figure 4.
Using the RBF formulation described, the resulting calibration functions are presented in Figure   4, and the corresponding performance predictions are presented in Figure 5. It should be clear that despite the relatively high demand of having to match eight response curves, the RBF functions provide sufficient freedom to minimize the discrepancy between the CG and AA models.