Introduction

Mechanical metamaterials inherit unique behavior and extreme properties from their intricate internal organization. Conceptualized at the end of the previous century1,2, mechanical metamaterials have undergone significant progress over the last thirty years3,4, in part thanks to the revolution in additive manufacturing, which enables precise fabrication5,6. Whereas classical metamaterials were mostly lattices assembled from repeating unit cells7, modern metamaterials often feature heterogeneous8 or modular designs9, exhibit multistability10, and are even capable of performing simple computations11,12,13. Such advanced mechanical behavior relies on structure-property relationships more complex than ever before14. Tailoring metamaterial architectures to achieve a specific desired behavior is a challenging task, but it has become more accessible thanks to advancements in artificial intelligence (AI) and machine learning (ML)15,16. While some approaches rely directly on inverse models8,17,18, others couple forward surrogate models with iterative design improvements using, for example, genetic algorithms19,20,21.

Several successful architectures, such as Variational Autoencoders (VAE)22, Generative Adversarial Networks (GAN)23 and the more recent Denoising Diffusion Models24, have revolutionized image generation25,26. Initially developed for the generation of images, these techniques have recently been adapted for the inverse design of metamaterials27,28,29,30,31. However, upon examining the application of generative models in the inverse design of mechanical metamaterials32,33, it is apparent that they are used differently than in image generation. In image generation, the strength of these models lies in their ability to narrow the design space from all possible images to a subset of admissible ones, such as generating only human faces34. In contrast, it is common to parameterize mechanical metamaterials such that every randomly chosen set of parameters is admissible18,35, or the restrictions are trivial, such as requiring the trusses to have positive lengths. Imposing more complex restrictions on the design space requires advanced generative ML techniques. In general, learning restrictions is not a trivial task, even within the computer science domain. In some instances, incorporating negative data or invalid designs can enhance the learning of constraints36,37. Another method involves directly feeding constraints as input parameters into network architectures such as GANs38. Additionally, some strategies employ a dual training process, where the general generative model is trained alongside a constraint function that learns to capture structured knowledge about restrictions39. Despite noticeable progress in constraint learning, ML methods applied to metamaterials are still primarily used to learn the relationship between structure and properties, rather than to learn the structural restrictions that should be imposed.

Figure 1

Mechanical metamaterials based on straight cuts. (a) Initial \(6\times 6\) pattern with alternating cuts. This architecture gives rise to auxetic behavior through the rotating squares mechanism. (b) Perturbation of the initial architecture is performed by adding rotations \(\beta _{i,j}\) to each cut. The absolute value of the rotations is capped by the parameter \(\beta _{max}\). (c) Resulting admissible design without intersections between cuts. The likelihood of obtaining an intersection-free sample through random rotations (\(\beta _{max}=90^\circ\)) does not exceed 0.001%.

However, an all-admissible parameterization can come at the cost of excluding more sophisticated and sometimes optimal solutions from the design space. A good example is metamaterials that employ straight cuts in planar sheets – often called kirigami metamaterials40,41,42,43 – to program the desired mechanical behavior. For instance, alternating cuts with horizontal and vertical orientations within the sheet (Fig. 1a) can lead to the manifestation of auxetic behavior via the rotating squares mechanism44. Further, it was shown45 that adding random rotations to the base alternating structure (Fig. 1b) can be used to program the behavior of the resulting metamaterial. The greater the maximum deviation from the base structure, the greater the achievable range of material properties. It is important to note that the rotations of each cut were limited to prevent intersections, which can cause undesired effects such as stress concentration or even lead to disconnected regions. While limiting the deviations from the initial structure helps prevent intersections, it simultaneously disqualifies the majority of intersection-free configurations, such as the one shown in Fig. 1c.

While kirigami metamaterials have been the focus of multiple studies employing machine learning techniques, such studies were either limited to property predictions46 or employed strict restrictions to obtain all-admissible parameterizations47,48. This prompts the question of why ML is not more extensively used to learn design restrictions in metamaterials in a manner akin to its application in image generation. In this manuscript, we show the fundamental problem in the generation of kirigami metamaterials, associated with a non-trivial design space and the absence of a good similarity metric. We discuss why certain classical ML algorithms can learn these design restrictions while others cannot. Through this analysis, we also aim to highlight the presence of survivorship bias in the existing literature on generative AI in metamaterial design, which arises from considering only metamaterials with favorable design spaces.

In general, kirigami metamaterials present a case where humans outperform machines in an intuitive task. It is extremely easy for a human to generate intersection-free samples after examining just a dozen valid examples. In contrast, standard generative models struggle with this seemingly simple task. This contrast underscores the importance of either adapting existing methods or developing new approaches for generating metamaterials with complex restrictions. Furthermore, we demonstrate that the challenge of generating the aforementioned kirigami metamaterials serves as an effective benchmark for assessing the ability of generative algorithms to learn general design space restrictions.

Results

Problem statement

Most generative design algorithms adhere to a similar core concept. A dataset filled with examples of what to generate, e.g. admissible metamaterials, is presented to the corresponding algorithm, which learns from this information to generate similar data. The difficulty lies in determining what qualifies as “similar data”. There are two approaches to this, which can also be combined. The first approach views the dataset as samples from a probability distribution, where certain combinations of parameters are more or less likely than others. In this case, similarity is assessed by comparing the distribution of the generated data with that of the example data. The second approach performs a direct comparison between samples. For this, a specific sample-to-sample similarity metric must be chosen. The most commonly used one is the Euclidean distance (ED). In an n-dimensional space \(\mathbb {R}^n\), the ED between two samples x and \(\hat{x}\) is calculated as follows:

$$\begin{aligned} D_E(x, \hat{x}) = \sqrt{\sum _{i=1}^n (x_i-\hat{x}_i)^2} \end{aligned}$$
(1)
Figure 2

Suitability of the Euclidean Distance for two cuts. (a) Three different configurations (A: [\(5^\circ\),\(4^\circ\)], B: [\(-5^\circ\),\(-3^\circ\)], C: [\(65^\circ\),\(-45^\circ\)]) of adjacent cuts with unit length between centers and length of \(\sqrt{3}\). (b) The design space for the considered system with two cuts. Purple zones correspond to the angle pairs of intersecting cuts. Magenta and green lines show two possible routes between (A) and (B). (c) Sequence of cut positions corresponding to the direct transition from (A) to (B) (magenta path). (d) Sequence of cut positions for the detour path shown by the green line. Note the passage through configuration (C) on the route from (A) to (B).

While other similarity measures might be more effective for image generation49, ED has still been shown to yield good results, and it is part of the original formulations of several generative algorithms22,50. For mechanical metamaterials, on the other hand, ED may not be the most appropriate choice for measuring similarity. For illustration, consider Fig. 2a, which displays three different configurations for two neighboring cuts with angles to the vertical direction as follows: A) [\(5^\circ\),\(4^\circ\)], B) [\(-5^\circ\),\(-3^\circ\)], C) [\(65^\circ\),\(-45^\circ\)]. We can pose the question: which two configurations are most similar? At first glance, the answer (A, B) appears straightforward, which aligns with ED since \(D_E({\textbf {A}},{\textbf {B}})<D_E({\textbf {A}},{\textbf {C}})<D_E({\textbf {B}},{\textbf {C}})\), where

$$\begin{aligned} \begin{aligned} D_E(A,B)&= \sqrt{(5+5)^2 + (4+3)^2} \approx 12.206\\ D_E(A,C)&= \sqrt{(5-65)^2 + (4+45)^2} \approx 77.466 \\ D_E(B,C)&= \sqrt{(-5-65)^2 + (-3+45)^2} \approx 81.633\\ \end{aligned} \end{aligned}$$
(2)

However, this conclusion overlooks a crucial point. Dark regions in Fig. 2b represent pairs of angles where the two neighboring cuts intersect, forming a non-admissible zone. The transition from configuration A to configuration B through linear interpolation (as illustrated in Fig. 2c) follows the shortest path in Euclidean space, indicated by the magenta line. This path, however, clearly passes through the non-admissible zone. Consequently, not all intermediate configurations between A and B belong to the admissible design space, despite both end configurations being intersection-free. To navigate from A to B while staying within the admissible design space, a considerably longer trajectory is required, as shown by the green line in Fig. 2b. Notably, the admissible path from A to B includes passing through configuration C (Fig. 2d). Therefore, if no intersections are allowed, the most similar pair is actually (A, C). This implies that ED might not be an appropriate measure of similarity even for these simplified two-cut kirigami designs. Moreover, it suggests that generative algorithms relying on ED may not effectively learn to avoid intersections, as will be shown below. This paper analyzes the ability of the four most common generative design algorithms – VAE22, GAN23, Wasserstein GAN (WGAN)50 and Denoising Diffusion Probabilistic Models26 – to handle such geometrical challenges.
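To make the geometric argument concrete, the following sketch reproduces the distances of Eq. (2) and checks the straight path between A and B numerically. It is a minimal illustration, not code from this study: the vertical stacking of the two cut centers and the measurement of both angles from the vertical are assumptions chosen to mirror Fig. 2, and the intersection check is a standard segment-crossing test.

```python
import numpy as np

# Assumed geometry of the two-cut example: cut length sqrt(3), centres one unit apart,
# stacked vertically, both angles measured from the vertical direction.
CUT_LENGTH = np.sqrt(3.0)
CENTERS = np.array([[0.0, 0.0], [0.0, 1.0]])

def endpoints(angle_deg, center):
    """Endpoints of a cut rotated angle_deg away from the vertical about its centre."""
    a = np.radians(angle_deg)
    half = 0.5 * CUT_LENGTH * np.array([np.sin(a), np.cos(a)])
    return center - half, center + half

def segments_intersect(p1, p2, p3, p4):
    """Orientation-based test for a proper crossing of segments p1p2 and p3p4
    (touching or collinear overlaps are ignored for simplicity)."""
    cross = lambda o, a, b: (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    return (cross(p3, p4, p1) * cross(p3, p4, p2) < 0 and
            cross(p1, p2, p3) * cross(p1, p2, p4) < 0)

def admissible(config):
    """A configuration (angle_1, angle_2) is admissible if the two cuts do not cross."""
    s1 = endpoints(config[0], CENTERS[0])
    s2 = endpoints(config[1], CENTERS[1])
    return not segments_intersect(*s1, *s2)

A, B, C = np.array([5.0, 4.0]), np.array([-5.0, -3.0]), np.array([65.0, -45.0])
print([np.linalg.norm(p - q) for p, q in [(A, B), (A, C), (B, C)]])  # ~12.2, 77.5, 81.6, cf. Eq. (2)
print([admissible(cfg) for cfg in (A, B, C)])                        # all three are intersection-free

# Walk the Euclidean-shortest (straight) path from A to B and count inadmissible waypoints.
waypoints = [(1 - t) * A + t * B for t in np.linspace(0.0, 1.0, 101)]
print(sum(not admissible(w) for w in waypoints), "of 101 interpolated configurations intersect")
```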

Figure 3

Generative approaches. (a) Variational Autoencoder (VAE), comprised of Encoder and Decoder stages, learns to map the designs into latent space and retrieve them back. (b) Generative Adversarial Network (GAN) utilizes competition between Generator and Discriminator to create samples that look real. (c) Denoising Diffusion Probabilistic Model (DDPM) employs sequential addition of noise to map the designs to latent space.

Variational autoencoders

The Variational Autoencoder (VAE), shown in Fig. 3a and introduced in 2013 by Kingma and Welling22, is a modification of the traditional autoencoder51 for generative design. Autoencoders transform input data into a usually lower-dimensional representation in a so-called latent space and are comprised of two parts. The encoder creates a representation of the original data in the latent space, while the decoder tries to reconstruct the original data from this representation. Both parts are trained jointly by minimizing the error between the original and reconstructed data, usually referred to as the reconstruction loss. While these traditional autoencoders can be used for a variety of tasks, including image denoising52, dimensionality reduction53, and anomaly detection54, they lack the ability to generate new data. Unlike traditional autoencoders, VAEs map the input to the mean and variance of a normal distribution in the latent space. An additional regularization term in the loss function compels this distribution to have zero mean and unit variance. This enables the generation of new data by sampling a representation from this normal distribution and then decoding it. While the Kullback-Leibler divergence55 has been consistently used as the regularization loss, the choice of reconstruction loss, in general, depends on the input data. In the original formulation, the binary cross-entropy loss was used for the MNIST dataset, where pixels are supposed to be either white or black, while the mean squared error (the square of the ED) was used for continuous problems22. Together they yield the following combined loss function:

$$\begin{aligned} \mathscr {L}_{VAE} = \underbrace{D_{KL} \big (\mathscr {N}(\mu _x, \sigma _x), \mathscr {N}(0, 1) \big )}_\text {regularization loss} + \underbrace{\kappa D_E(x,\hat{x})^2}_\text {reconstruction loss} \end{aligned}$$
(3)

where x is the original input, \(\hat{x}\) the reconstructed one, and \(\mu _x, \sigma _x\) are the mean and variance of the learned distribution in the latent space, while \(\kappa\) is a parameter controlling the trade-off between the two losses. Note the reliance of the reconstruction loss on the Euclidean distance. We note that there have been efforts to make VAEs independent of the ED, such as by combining them with a GAN, that have shown promising results56.
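A minimal PyTorch sketch of the loss in Eq. (3) is given below. It assumes, as is common in practice, that the encoder outputs the mean and log-variance of the latent Gaussian; kappa is the trade-off weight from the text, and the reparameterization helper shows how the sampling step is kept differentiable.

```python
import torch

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps so that gradients can flow through the sampling step."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def vae_loss(x, x_hat, mu, log_var, kappa=1.0):
    """Eq. (3): KL(N(mu, sigma^2), N(0, 1)) regularization plus kappa times the squared
    Euclidean (reconstruction) distance between the input and its reconstruction."""
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp(), dim=1)
    reconstruction = torch.sum((x - x_hat) ** 2, dim=1)
    return (kl + kappa * reconstruction).mean()
```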

Generative adversarial networks

In 2014, Goodfellow et al.23 introduced a framework for training generative models via an adversarial process, giving rise to Generative Adversarial Networks (GANs). This architecture (Fig. 3b) encompasses both a generative model \(\mathscr {G}\) and a discriminative model \(\mathscr {D}\), which are trained simultaneously. This training takes the form of a two-player game. While \(\mathscr {D}\) is trained to distinguish between data generated by \(\mathscr {G}\) and the training data, \(\mathscr {G}\) is simply trained to maximize the probability of \(\mathscr {D}\) making a mistake. This process reaches an equilibrium when the generative model has learned a mapping between a chosen distribution in the latent space and the data distribution, similar to the decoder of a VAE. It has further been shown that this process is equivalent to minimizing the Jensen-Shannon divergence between the distribution of the data generated by \(\mathscr {G}\) and the distribution of the training data:

$$\begin{aligned} \mathscr {L}_{GAN} = D_{JS} \big (p(x), p(\hat{x}) \big ) = \tfrac{1}{2} D_{KL} \big (p(x), m \big ) + \tfrac{1}{2} D_{KL} \big (p(\hat{x}), m \big ), \qquad m = \tfrac{1}{2} \big (p(x) + p(\hat{x}) \big ) \end{aligned}$$
(4)

This means that the vanilla GAN relies only on distances between probability distributions and not on the distance between samples, which should allow it to learn intersection avoidance.
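The sketch below shows one adversarial update in PyTorch (the non-saturating variant of the original objective). The generator G, discriminator D, their optimizers, and latent_dim are placeholders rather than the architectures used in this study; the point of the sketch is that no sample-to-sample distance appears anywhere in the losses.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, latent_dim=64):
    """One GAN update. D is assumed to end in a sigmoid and return one score per sample."""
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator: push scores of real samples towards 1 and of generated samples towards 0.
    fake = G(torch.randn(n, latent_dim)).detach()
    loss_D = F.binary_cross_entropy(D(real), ones) + F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator: maximize the probability of the discriminator being fooled.
    loss_G = F.binary_cross_entropy(D(G(torch.randn(n, latent_dim))), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```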

Wasserstein generative adversarial networks

The Wasserstein Generative Adversarial Network (WGAN)50 is a GAN variant where the Jensen-Shannon divergence has been replaced by the Wasserstein distance (also called the Kantorovich-Rubinstein metric or Earth Mover's distance) to measure the discrepancy between the distributions of the generated and the training data. This metric, which was first introduced by Kantorovich57, is based on the principle of optimal transport. A probability distribution is seen as a distribution of mass in the design space, and the difference between distributions is measured as the minimal cost of transporting mass so that one distribution is reshaped into the other. While it has the benefit of always staying finite and can, therefore, always provide meaningful gradients to update the generator, the Wasserstein distance (\(\text {Wass}_1\)) relies on an underlying metric to measure how far the mass has been transported. For the WGAN, this is usually the Euclidean metric \(D_E\):

$$\begin{aligned} \text {Wass}_1\big (p(G(z)),p(x)\big ) = \inf _{\pi \in \Pi \big (p(G(z)),p(x)\big )} \mathbb {E}_{(X_1,X_2) \sim \pi }\, D_E(X_1,X_2) \end{aligned}$$
(5)

Note that replacing the Jensen-Shannon divergence with the Wasserstein distance generally makes training more stable, but the WGAN formulation comes with the drawback of not always converging to the equilibrium point58.
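For comparison with the GAN update above, a sketch of the WGAN step in its original weight-clipping form follows; the critic approximates \(\text {Wass}_1\) between the generated and training distributions, and \(D_E\) enters implicitly through the Lipschitz constraint. The names and hyperparameters are illustrative, not those of the networks trained in this study.

```python
import torch

def wgan_step(G, C, opt_G, opt_C, real, latent_dim=64, clip=0.01):
    """One WGAN update with weight clipping. The critic C returns an unbounded score."""
    n = real.size(0)

    # Critic: maximize E[C(real)] - E[C(fake)], i.e. minimize the negative difference.
    loss_C = C(G(torch.randn(n, latent_dim)).detach()).mean() - C(real).mean()
    opt_C.zero_grad(); loss_C.backward(); opt_C.step()
    for p in C.parameters():                 # crude enforcement of the Lipschitz constraint
        p.data.clamp_(-clip, clip)

    # Generator: move generated samples towards higher critic scores.
    loss_G = -C(G(torch.randn(n, latent_dim))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_C.item(), loss_G.item()
```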

Denoising diffusion probabilistic models

In 2015, Sohl-Dickstein et al.25 laid the foundation for a class of latent variable models, which have since become widely known as Denoising Diffusion models (Fig. 3c). The ingenious concept behind these models is that, rather than attempting to learn an arbitrary direct mapping from the latent space to the design space, they learn to reverse a diffusion process that stepwise transforms an image into its representation in the latent space. This diffusion process is defined as a Markov chain – a stochastic model where the probability of transitioning to another state depends only on the current state – that gradually introduces noise to the image over a number of steps T. So for data of the form \({\textbf {x}}_0 \sim q({\textbf {x}}_0)\), the forward process \(q({\textbf {x}}_{1:T}|{\textbf {x}}_0)\) is given as:

$$\begin{aligned} q({\textbf {x}}_{1:T}|{\textbf {x}}_0) = \prod _{t=1}^{T} q({\textbf {x}}_t|{\textbf {x}}_{t-1}) \end{aligned}$$
(6)

where the variance of the Gaussian noise added at each step follows a variance schedule \(\beta _1, \ldots ,\beta _T\):

$$\begin{aligned} q({\textbf {x}}_t|{\textbf {x}}_{t-1}) :=\mathscr {N}({\textbf {x}}_t;\sqrt{1-\beta _t}{} {\textbf {x}}_{t-1},\beta _t{\textbf {I}}) \end{aligned}$$
(7)

This formulation possesses the advantage that when the variances of the forward process \(\beta _t\) are small, the reverse process \(p_{\theta }({\textbf {x}}_{0:T})\) can also be described as a Markov chain with Gaussian transitions, except that in this instance both the mean and variance of the transitions are learned25:

$$\begin{aligned} p_\theta ({\textbf {x}}_{0:T}) = p({\textbf {x}}_T) \prod _{t=1}^{T} p_\theta ({\textbf {x}}_{t-1}|{\textbf {x}}_{t}) \end{aligned}$$
(8)

where

$$\begin{aligned} p_\theta ({\textbf {x}}_{t-1}|{\textbf {x}}_{t}) :=\mathscr {N}({\textbf {x}}_{t-1};\varvec{\mu }_\theta ({\textbf {x}}_t,t), \varvec{\Sigma }_\theta ({\textbf {x}}_t,t)), \quad p({\textbf {x}}_T) = \mathscr {N}({\textbf {x}}_T;\varvec{0}, {\textbf {I}}) \end{aligned}$$
(9)

Training of the reverse process is usually performed by minimizing the variational upper bound on the negative log-likelihood using stochastic gradient descent. In 2020, Ho et al.26 introduced a variant of these models called the Denoising Diffusion Probabilistic Model (DDPM). When conditioned on \({\textbf {x}}_0\), this bound can be rewritten using KL divergences between Gaussians:

$$\begin{aligned} \begin{aligned} L(\theta ) = \mathbb {E}_q \Big [&D_{KL}( q({\textbf {x}}_T | {\textbf {x}}_0) \Vert p({\textbf {x}}_T)) + \sum _{t>1} D_{KL}( q({\textbf {x}}_{t-1} | {\textbf {x}}_t, {\textbf {x}}_0) \Vert p_\theta ({\textbf {x}}_{t-1} | {\textbf {x}}_t)) \\&- \log p_\theta ({\textbf {x}}_0 | {\textbf {x}}_1) \Big ] \end{aligned} \end{aligned}$$
(10)

Furthermore, as these KL-divergences are computed between two Gaussian distributions, reparametrization enables the formulation of the variational bound as the MSE between the actual noise \(\epsilon \sim \mathscr {N}({\textbf {0}},{\textbf {I}})\) and its predicted counterpart \(\epsilon _\theta ({\textbf {x}}_t,t)\):

$$\begin{aligned} L(\theta ) :=\mathbb {E}_{t,{\textbf {x}}_0,\epsilon } \Big [ \Vert \epsilon - \epsilon _\theta ({\textbf {x}}_t,t) \Vert ^2 \Big ] \end{aligned}$$
(11)

Here, once again, ED is encountered, though this time it is computed between instances of noise rather than instances from the design space. Because this distance derives from the initial choice of Gaussian noise in the forward process, that choice limits the ability of denoising diffusion models to deal with non-Euclidean data. Therefore, research to extend diffusion models has recently focused on more general choices of forward processes and their reversal59,60,61.
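The following sketch condenses Eqs. (6)–(11) into the closed-form forward noising step and the simplified training objective. A linear variance schedule of the kind used by Ho et al. is assumed, and the model is assumed to take \(({\textbf {x}}_t, t)\) and predict the added noise; this is an illustration of the general DDPM recipe, not the exact implementation used in this study.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear variance schedule beta_1 .. beta_T
alphas_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative products of (1 - beta_s)

def q_sample(x0, t, eps):
    """Closed form of the forward process, Eqs. (6)-(7):
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

def ddpm_loss(model, x0):
    """Simplified objective of Eq. (11): squared error between the true and predicted noise."""
    t = torch.randint(0, T, (x0.size(0),))
    eps = torch.randn_like(x0)
    x_t = q_sample(x0, t, eps)
    return F.mse_loss(model(x_t, t), eps)
```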

Kirigami dataset

Figure 4

Limiting the perturbations enables control over the success rate of generation. (a) The average number of intersections in the samples and the likelihood of generating unit cells without intersections vs the maximum deviation \(\beta _{max}\) from the base structure. (b) Exemplary unit cells for maximum added rotations \(\beta _{max}\) of \(20^\circ\), \(60^\circ\) and \(90^\circ\).

To demonstrate the practical limitations of the generative algorithms discussed in the previous sections, we assessed their ability to generate kirigami metamaterials akin to those introduced by Grima et al.45. We started with the alternating \(6\times 6\) pattern (Fig. 1a) and introduced random rotations for each cut, maintaining the periodicity of the structure in both directions. These added rotations are denoted as \(\beta _{i,j}\) for a given cut \(c_{i,j}\), where \(i,j=0,\ldots ,5\) (Fig. 1b). Correspondingly, the angle of each cut relative to the vertical direction is denoted as \(\alpha _{i,j}\). The dimensions were chosen to ensure that intersections begin to occur only if the maximum absolute value of the added rotations \(\beta _{max}\) exceeds \(30^\circ\). The centers of adjacent cuts are one unit apart, while the length of each cut is \(l = \sqrt{3}\). Through variation of \(\beta _{max}\), it is possible to control the average number of intersections in the generated data and the likelihood of randomly generating unit cells without intersections, as illustrated in Fig. 4a. It becomes virtually impossible to obtain an admissible configuration by randomly selecting 36 rotation values even if the maximum disturbance is limited to \(\beta _{max}=60^\circ\), with only three out of a million samples containing no intersections. It is interesting to observe that the average number of intersections per sample is slightly lower for \(\beta _{max}=90^\circ\) than for \(\beta _{max}=60^\circ\). This is caused by the existence of a number of intersection-free configurations around the alternating structure that mirrors the one shown in Fig. 1a, in which all vertical cuts are replaced by horizontal cuts and vice versa.

Figure 4a illustrates that if \(\beta _{max}\) is set to less than \(30^\circ\), intersections are not possible. In this case, ED serves as a suitable metric, and any linear combination of two samples remains within the admissible design space. However, once \(\beta _{max}\) exceeds this threshold, the likelihood of randomly generating intersection-free samples rapidly becomes practically negligible. In general, this means that admissible designs become sparsely scattered in the design space, making it challenging for ED to measure similarity, as there is no longer any guarantee that a linear combination of two samples is admissible. Therefore, the extent to which ED can effectively describe the similarity between unit cells for generative models can be indirectly controlled through \(\beta _{max}\). Here, we created datasets for three distinct values of \(\beta _{max}\): \(20^\circ\), \(60^\circ\) and \(90^\circ\), to see the corresponding effect on different machine learning algorithms. Each dataset comprised 20,000 unit cells, represented as \(6\times 6\) matrices containing the values of \(\alpha _{i,j}\). Given the near impossibility of randomly generating intersection-free unit cells for \(\beta _{max}=60^\circ\) and \(\beta _{max}=90^\circ\), we employed a randomization process. Starting from the base alternating structure, each cut was sequentially replaced with another cut, chosen randomly from those that would not create intersections. This sequence of random replacements was repeated until each cut had been replaced 200 times. Examples of admissible designs for \(\beta _{max}\) of \(20^\circ\), \(60^\circ\) and \(90^\circ\) are shown in Fig. 4b. After normalization, performed independently for each dataset, the values of \(\alpha _{i,j}\) lay in the range \([-1,1]\). In general, samples can be represented as greyscale \(6 \times 6\) images where a value of \(-1\) corresponds to a black pixel and a value of 1 to a white pixel, with intermediate values mapping linearly to shades of grey.
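A sketch of this randomization procedure is given below. The checkerboard of vertical and horizontal base cuts on a unit grid, the use of the 3×3 neighbourhood (the only cuts close enough to intersect at length \(\sqrt{3}\)), and the simple rejection sampling used for "chosen randomly from those that would not create intersections" are assumptions; the published datasets were produced with the authors' own implementation.

```python
import numpy as np

N, CUT_LENGTH = 6, np.sqrt(3.0)
# Assumed base pattern: vertical cuts (0 deg) where i + j is even, horizontal (90 deg) otherwise.
BASE = 90.0 * (np.add.outer(np.arange(N), np.arange(N)) % 2)

def endpoints(angle_deg, center):
    a = np.radians(angle_deg)
    half = 0.5 * CUT_LENGTH * np.array([np.sin(a), np.cos(a)])
    return center - half, center + half

def crossing(p1, p2, p3, p4):          # same proper-crossing test as in the two-cut sketch
    cross = lambda o, a, b: (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    return cross(p3, p4, p1) * cross(p3, p4, p2) < 0 and cross(p1, p2, p3) * cross(p1, p2, p4) < 0

def conflicts(alpha, i, j, angle):
    """Does cut (i, j) at absolute angle `angle` cross any of its eight periodic neighbours?"""
    here = endpoints(angle, np.array([float(j), float(i)]))
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) == (0, 0):
                continue
            neighbour_angle = alpha[(i + di) % N, (j + dj) % N]   # wrapped index, unwrapped position
            there = endpoints(neighbour_angle, np.array([float(j + dj), float(i + di)]))
            if crossing(*here, *there):
                return True
    return False

def generate_sample(beta_max=90.0, sweeps=200, seed=None):
    """Sequentially replace every cut by a random admissible one, `sweeps` times per cut."""
    rng = np.random.default_rng(seed)
    alpha = BASE.copy()
    for _ in range(sweeps):
        for i in range(N):
            for j in range(N):
                for _ in range(1000):                             # rejection sampling
                    candidate = BASE[i, j] + rng.uniform(-beta_max, beta_max)
                    if not conflicts(alpha, i, j, candidate):
                        alpha[i, j] = candidate
                        break
    return alpha                                                  # absolute angles alpha_ij
```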

In order to demonstrate the varying degree of reliance of four different machine learning algorithms (VAE, GAN, WGAN and DDPM) on ED, generators for each approach were trained separately on the three datasets (\(\beta _{max}=\) \(20^\circ\), \(60^\circ\) and \(90^\circ\)). Theory suggests that VAE and WGAN will learn to avoid intersections only when ED is applicable for the dataset, i.e., when rotations are limited. Proving that an approach is not learning can be conceptually more challenging than showing that one is. The inability to learn might stem from insufficient model complexity or poor parameter selection. To rule out these factors, the VAE, GAN and WGAN models had almost identical architectures with consistent hyperparameters across all experiments. This ensures that observed differences in performance can be attributed solely to the training process. Thus, if one of the methods successfully learns to avoid intersections, insufficient model complexity or poor parameter choices cannot explain the failure of the others.

Implementation

For implementing the four machine learning approaches – VAE, GAN, WGAN, and DDPM – the PyTorch deep learning framework62 was chosen. PyTorch provides the capability to capture the periodicity of the samples by using Convolutional Neural Networks (CNN) with circular padding. The CNN architecture is particularly well-suited for the unit cells of the kirigami metamaterial under investigation, as intersections can be effectively represented through hierarchical feature mapping. Intersections primarily depend on the direct neighbors of a cut, which corresponds to a 3 \(\times\) 3 convolution, with these neighboring arrangements influencing each other. Both GAN architectures, as well as the decoder of the VAE, were based on the DCGAN framework63. Batch Normalization64 was employed in the VAE's encoder and sparingly in the generators/decoder to minimize batch-internal dependencies65. The discriminator architecture was nearly identical for the GAN and WGAN, with the addition of a sigmoid activation function for the GAN due to the different value ranges required by the objective functions. Given the potential instability of GAN training, different learning rates were utilized for the generator and discriminator (\(10^{-5}, 5\cdot 10^{-4}\)) to improve convergence towards a local Nash equilibrium66. Due to the restrictions on the dimensions of the latent space, a different architecture was required for the DDPM26. A convolutional U-Net architecture67 was chosen there, with filter numbers similar to those of the other generative networks. For all models, Adam68 was used as the optimizer, and the batch size was set to 32. Note that no explicit penalty for intersections was imposed, so the models had to learn the restrictions solely by examining the training dataset. More details about the network architectures are presented in the supplementary materials.
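As an illustration of the periodicity-aware building block described above, the snippet below defines a small discriminator/critic-style network whose 3×3 convolutions use circular padding, so that each cut is always seen together with its eight periodic neighbours. The filter counts and depth are illustrative only; the exact architectures are listed in the supplementary materials.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(
    # 3x3 convolutions with circular padding respect the periodicity of the 6x6 unit cell.
    nn.Conv2d(1, 32, kernel_size=3, padding=1, padding_mode="circular"),
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1, padding_mode="circular"),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(64 * 6 * 6, 1),           # one score per 6x6 sample of normalized angles
)

x = torch.randn(32, 1, 6, 6)            # a batch of 32 unit cells
print(critic(x).shape)                  # torch.Size([32, 1])
```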

Figure 5

Training of models for \(\beta _{max}=20^\circ\). (a) The evolution of the average number of intersections during training for unit cells generated by different machine learning approaches. Averaging over five epochs was used for curve smoothing. (b) Distribution of the added angles \(\beta _{i,j}\) in the training dataset for \(\beta _{max}=20^\circ\) (red) and in the set generated by the trained DDPM (blue).

Figure 6

Relation between adjacent cuts for \(\beta _{max}=20^\circ\). 2D histograms show the likelihood of angle combinations for a cut and its bottom neighbor for a maximum absolute value of the added rotations of \(20^\circ\). Only angles at positions that correspond to vertical cuts in the base structure were chosen as the first elements in the pairs.

Euclidean case

As previously discussed, when the maximum absolute value for the added rotations \(\beta _{max}\) is set to \(20^\circ\), the angles of these rotations can be chosen independently, allowing ED to effectively measure similarity between samples. This scenario is akin to the classic mechanical metamaterials with benign parameterization. In this case, to generate intersection-free unit cells, the model simply needs to understand that the angle of each cut must be confined within a specific range. Figure 5a demonstrates that after a few epochs, three out of the four approaches learn to generate unit cells with no or very few intersections on average. The WGAN is capable of reducing the number of intersections to less than 0.1 on average, although it exhibits poorer stability during training. In general, if \(\beta _{i,j}\) values are drawn from a uniform random distribution with \(\beta _{max}=20^\circ\), only admissible configurations are created. The histograms in Fig. 5b reveal that after 3000 epochs, the DDPM successfully learns to emulate this uniform distribution of rotation angles \(\beta _{i,j}\), maintaining the generated angles within the range of \(-25^\circ\) to \(25^\circ\). Minor deviations between the training dataset and the DDPM-generated datasets at the boundary angles of \(-20^\circ\) and \(20^\circ\) can be attributed to the challenges of learning sharp transitions within continuous models.

Another method to evaluate the effectiveness of a model in learning the intricate constraints of the design space involves examining the overall distribution of rotation angles between neighboring cuts. If the samples generated by the trained model exhibit distributions that closely match those of the training dataset, the model can be deemed suitable for generation. Figure 6 shows 2D histograms for the training dataset and the datasets generated by trained models. The color intensity represents the number of instances where a random cut and its bottom-side neighbor possess a specific combination of angles (\(\beta _{i,j}\), \(\beta _{i+1,j}\)). For illustrative purposes, the first elements of these angle pairs are always taken from cuts at positions corresponding to vertical cuts in the initial undisturbed sample (Fig. 1a). Since no intersections are possible by construction due to \(\beta _{max}= 20^\circ\), all combinations of neighboring angles are equally likely to be observed in the training dataset, as indicated by the homogeneous square in Fig. 6. A comparison of the datasets generated by trained models with the initial training dataset reveals that all evaluated models (VAE, GAN, WGAN, DDPM) effectively capture the limits of the perturbations from the initial alternating pattern. However, it is noticeable that the VAE model slightly narrows the range of generated angles, avoiding borderline cases. This behavior is associated with the VAE's loss function (eq. 3), consisting of reconstruction and regularization terms. The regularization term, weighted against the reconstruction term through the hyperparameter \(\kappa\), is needed to enforce a normal distribution in the latent space. Due to the regularization term, even an imperfect reconstruction of a sample, usually characterized by rotation angles closer to the average values, becomes acceptable. The observed effect of narrowing the angle range persists even after careful hyperparameter tuning. In contrast to the other models, the histogram for the WGAN clearly shows a Gaussian-like rather than a uniform distribution, indicating its difficulty in capturing the angle relationships of the training dataset. The other models (GAN and DDPM) maintain a nearly uniform 2D distribution akin to the training dataset.
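The 2D histograms of Figs. 6 and 8 can be assembled with a few lines of NumPy, as sketched below for a set of training or generated samples. It is assumed here that `angles` stores the added rotations \(\beta\) with shape (n_samples, 6, 6) and that cuts with a vertical base orientation sit at positions with even i + j, matching the dataset sketch above; these conventions are illustrative rather than taken from the study's code.

```python
import numpy as np

def neighbour_pair_histogram(angles, bins=36):
    """Histogram of (beta_ij, beta_{i+1,j}) pairs: each cut whose base orientation is
    vertical is paired with its bottom neighbour (periodic in i)."""
    firsts, seconds = [], []
    for i in range(6):
        for j in range(6):
            if (i + j) % 2 == 0:                      # vertical cut in the assumed base pattern
                firsts.append(angles[:, i, j])
                seconds.append(angles[:, (i + 1) % 6, j])
    hist, x_edges, y_edges = np.histogram2d(
        np.concatenate(firsts), np.concatenate(seconds), bins=bins)
    return hist, x_edges, y_edges                     # e.g. visualize with plt.imshow(hist.T)
```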

Fully random case

Figure 7

Training of models for \(\beta _{max}=90^\circ\) (a) and \(\beta _{max}=60^\circ\) (b). The evolution of the average number of intersections during training for unit cells generated by different machine learning approaches. An averaging over 20 epochs was used for curve smoothing. The dashed line corresponds to the average number of intersections when sampling all angles of a unit cell randomly and independently from the angle distribution of the training dataset.

When the maximal absolute value for the added rotations, \(\beta _{max}\), is set to \(90^\circ\), the design space encompasses all possible samples with non-intersecting cuts. The final rotations of the cuts \(\alpha _{i,j}\) are no longer affected by their position in the unit cell and depend only on the rotations of the neighboring cuts. This dependency influences the frequency at which certain rotations occur in the dataset, as some rotations are less likely to result in intersections with random neighbors. As a result, randomly generated unit cells, created by sampling cuts based on the angle distribution of the training dataset, typically have fewer intersections (represented by the black dashed line in Fig. 7a) compared to those generated from a uniform distribution (indicated by the orange dotted line). Therefore, a reduction in the number of intersections during the training of a machine learning algorithm may occur for two different reasons: the algorithm might learn to fit the angle distribution, or it might also learn the dependency of cuts on each other. In this context, randomly drawing the rotation angles of the cuts from the angle distribution of the training data serves as a valuable baseline. A decrease in the average number of intersections below this baseline unequivocally indicates that the model is learning the dependencies between neighboring cuts. Figure 7a demonstrates that the VAE does not succeed in reducing the average number of intersections below the established baseline. Meanwhile, the WGAN manages to slightly lower the number of intersections without dropping below the baseline but fails to converge, which is a common drawback of WGANs58. In contrast, both GAN and DDPM achieve significantly lower intersection counts, although they still fall short of generating completely intersection-free samples.
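This baseline can be estimated as sketched below: every one of the 36 angles is drawn independently from the training set's empirical distribution at that position, so any reduction below the resulting value must come from learned neighbour dependencies. The `count_intersections` argument is an assumed helper, a periodic 6×6 intersection counter that can be assembled from the crossing test in the dataset sketch above.

```python
import numpy as np

def baseline_intersections(train_angles, count_intersections, n_samples=1000, seed=0):
    """Average intersections per unit cell when angles are sampled independently,
    position by position, from the training data (no neighbour dependencies)."""
    rng = np.random.default_rng(seed)
    n_train = train_angles.shape[0]                        # train_angles: (n_train, 6, 6)
    rows, cols = np.arange(6)[:, None], np.arange(6)[None, :]
    totals = []
    for _ in range(n_samples):
        donors = rng.integers(0, n_train, size=(6, 6))     # independent donor per position
        sample = train_angles[donors, rows, cols]
        totals.append(count_intersections(sample))
    return float(np.mean(totals))
```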

Figure 8

Relation between adjacent cuts for \(\beta _{max}=90^\circ\). 2D histograms show the likelihood of angle combinations for a cut and its bottom neighbor for fully random rotations of the cuts. Only angles at positions that correspond to vertical cuts in the base structure were chosen as the first elements in pairs.

As previously mentioned, a more illustrative measure of whether the different generative approaches successfully learn the dependencies between neighboring cuts is provided by the 2D histograms (Fig. 8). Unlike the dataset with \(\beta _{max} = 20^\circ\), the distribution of angles in neighboring cut pairs for the training dataset with \(\beta _{max} = 90^\circ\) is no longer uniform. Recall that the dark regions in the histogram for the training dataset correspond to angle pairs where cuts intersect (compare with Fig. 2b). Therefore, avoiding these zones is an important indicator of a successful generative model. By comparing the training dataset with the datasets generated by different trained models, we can categorize these models into two groups. Models belonging to the first category (VAE and WGAN) can limit the range of rotation angles but fail to capture the dependency of a cut on its bottom neighbor. Notably, the datasets generated by the trained VAE and WGAN models contain angle pairs even from non-admissible zones, likely because these models rely on linear interpolation between samples. On the other hand, models from the second category (GAN and DDPM) demonstrate a much better understanding of the design space by learning additional constraints between neighboring cuts, with DDPM slightly outperforming GAN.

Intermediate case

When the maximum absolute value for the added rotations, \(\beta _{max}\), is set to \(60^\circ\), it constitutes an intermediate situation between the Euclidean and the fully random cases, with intersections still occurring in the design space. Figure 4a illustrates that it remains almost impossible to generate designs without intersections by chance, even under stricter restrictions compared to the fully random case. Figure 7b displays the training progress of the different generative models on the dataset with \(\beta _{max}=60^\circ\). Similar to the fully random scenario, all models exhibit gradual improvement during training, as evidenced by a decrease in the average number of intersections in the generated samples. However, in contrast to the \(\beta _{max}=90^\circ\) case, all models are capable of going significantly below the baseline defined by random sampling from the training dataset distribution. Since learning the angle restriction (\(|\beta _{i,j}|<60^\circ\)) plays an important role here, both VAE and WGAN benefit from their capacity to perform that task, as in the Euclidean case. Nevertheless, as in the fully random case, DDPM and GAN surpass the other two models, more accurately recreating the admissible design space after training. As seen in Fig. 7b, DDPM does not achieve a 100% success rate in generating intersection-free configurations, with an average of 1.7 intersections per sample observed. While the average number of intersections per sample provides an adequate measure of model performance (Fig. 7), it is often more convenient to use another metric: the probability that the corresponding model generates an intersection-free sample. For random sampling from a uniform distribution, approximately three structures out of a million have no intersections (success rate less than 0.001%). If random sampling is done from the training set distribution, the success rate is around 0.5%. After training, both GAN and DDPM show approximately a 25% success rate, meaning that one out of every four generated samples has no intersections. This represents a 100,000-fold improvement over uniform distribution sampling and a more than 50-fold improvement compared to sampling from the training set distribution. VAE and WGAN lag behind with a success rate of around 10%. These results underscore the potential of DDPM and GAN for generating metamaterials with complex geometrical restrictions.

While DDPM demonstrates the best performance in this scenario, it is noteworthy that the previously weaker models, such as VAE and WGAN, also show significant improvements. Given that their network architectures are identical to those used in the fully random case, it logically follows that changes in the design space are the primary contributors to their improved performance. To avoid the need to work in a 36D space, the corresponding changes in the characteristics of the design space can be illustrated using the example of two neighboring cuts with perturbations added to the initial angles of \(0^\circ\) and \(90^\circ\) (Fig. 1b). In the case of fully random rotations (\(\beta _{max}=90^\circ\)), the fraction of angle pairs corresponding to intersecting configurations, calculated as the normalized area of the non-admissible zones (shown in Fig. 2b), is approximately 16.5%. This implies that one in six randomly chosen cut pairs is non-admissible. Surprisingly, the probability of intersection under a \(\beta _{max}=60^\circ\) constraint increases to 18.7%. This trend also holds for \(6\times 6\) samples, where the average number of intersections is slightly higher for \(\beta _{max}=60^\circ\) than for \(\beta _{max}=90^\circ\) (Fig. 4a). Therefore, the size of the non-admissible zone alone does not account for the improved performance of generative models in the intermediate case.

At the same time, an alternative metric, closely linked to ED, can be considered. As shown earlier, the suitability of ED as a similarity measure is compromised when there is no direct path between samples, as demonstrated in Fig. 2. Therefore, the shapes and positions of the non-admissible zones, in addition to their overall area, can significantly influence the appropriateness of ED as a similarity measure in the examined cases. In the previous example involving two adjacent cuts, there is a 23.7% probability that a straight path connecting two random points within the admissible design space of the fully random case (\(\beta _{max}=90^\circ\)) passes through a non-admissible zone. However, when \(\beta _{max}\) is set to \(60^\circ\), this probability drops significantly to only 3.5%, making ED a much more suitable similarity measure. While generalizing these findings from this 2D example to the 36D case is not straightforward, the observed relationships between neighboring cuts suggest that the improved performance of certain models, particularly VAE and WGAN, in the intermediate case is likely due to a design space that aligns better with the ED metric. Consequently, the ability to influence the “goodness” of the design space through the selection of \(\beta _{max}\) could hold significant promise for future generative models tailored to deal with the complex design spaces of mechanical metamaterials.
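Both two-cut quantities discussed above (the area fraction of intersecting angle pairs and the fraction of straight paths between admissible configurations that leave the admissible zone) can be estimated by a Monte Carlo sketch of the following kind. The geometry repeats the assumptions of the earlier two-cut sketch, except that, following the text, the perturbations \(\beta\) are now added to base orientations of \(0^\circ\) and \(90^\circ\); the exact percentages obtained therefore depend on these geometric assumptions.

```python
import numpy as np

CUT_LENGTH, BASE = np.sqrt(3.0), np.array([0.0, 90.0])
CENTERS = np.array([[0.0, 0.0], [0.0, 1.0]])            # assumed bottom-neighbour placement

def _endpoints(angle_deg, center):
    a = np.radians(angle_deg)
    half = 0.5 * CUT_LENGTH * np.array([np.sin(a), np.cos(a)])
    return center - half, center + half

def admissible(betas):
    """True if the two cuts, perturbed by betas from their base orientations, do not cross."""
    p1, p2 = _endpoints(BASE[0] + betas[0], CENTERS[0])
    p3, p4 = _endpoints(BASE[1] + betas[1], CENTERS[1])
    cross = lambda o, a, b: (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    return not (cross(p3, p4, p1) * cross(p3, p4, p2) < 0 and
                cross(p1, p2, p3) * cross(p1, p2, p4) < 0)

def two_cut_statistics(beta_max, n_pairs=20000, n_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = rng.uniform(-beta_max, beta_max, size=(n_pairs, 2))
    ok = np.array([admissible(b) for b in betas])
    intersecting_fraction = 1.0 - ok.mean()              # area fraction of non-admissible pairs

    good = betas[ok]                                     # admissible configurations only
    rng.shuffle(good)
    half = len(good) // 2
    ts = np.linspace(0.0, 1.0, n_steps)
    blocked = [any(not admissible((1 - t) * a + t * b) for t in ts)
               for a, b in zip(good[:half], good[half:2 * half])]
    return float(intersecting_fraction), float(np.mean(blocked))

print(two_cut_statistics(90.0))   # fully random case
print(two_cut_statistics(60.0))   # intermediate case
```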

Discussion

Machine learning is deeply embedded in research on mechanical metamaterials, as evidenced by the growing number of successful generative approaches applied to inverse design problems. However, this perceived success may be somewhat misleading, akin to survivorship bias, where only designs with “nice” parameterizations are considered. In image generation, machine learning often derives design constraints from the training data, whereas, for many mechanical metamaterials, parameterizations are preselected to include these constraints. Our study extends beyond such benevolent parameterizations to examine more complex kirigami structures. One identified issue with the application of generative models to such metamaterials lies in the inapplicability of the classical Euclidean distance (ED) as a metric for assessing the similarity between unit cells.

Thus, in this study, we explore the extent to which four of the most common generative design algorithms – VAE, GAN, WGAN, and DDPM – rely on ED. Using established theoretical findings and the example of kirigami structures, we demonstrate that, out of these four algorithms, both VAE and WGAN depend on this similarity measure for effective generation. This dependence limits their ability to learn complex design space constraints, although they are more suitable for generating metamaterials with simpler all-admissible parameterizations due to their stability and lower computational costs during training compared to GAN and DDPM. In contrast, GAN and DDPM demonstrate potential in learning design space limitations but still fall short of fully capturing these constraints. This suggests that reliance on ED is just one factor contributing to the lack of generative models for kirigami metamaterials, highlighting the need for further investigation. At the same time, even imperfect generation does not undermine the potential of the considered generative models. Since checking whether a sample contains intersections is much faster than generating one with any available algorithm, ML-based generative models that propose samples with a sufficiently high success rate, combined with subsequent screening, might provide superior performance. Nevertheless, further improvement in the ML-based generation of such structures is beneficial, especially considering the relative ease of conditioning the models on specific mechanical properties.

Several factors could contribute to such an improvement of ML-based generation. One idea is the incorporation of negative data during training69,70. When learning the admissible design space using only valid data, it does not matter whether samples are generated inside or outside the admissible space; what matters is how well the probability distributions match. Adding invalid examples during training penalizes generated samples that lie outside the admissible space, allowing the model to more accurately estimate the density of the valid data. A similar approach is to increase the model's ability to infer sharp decision boundaries from the training data. Even with negative examples included, edge cases where two cuts barely avoid intersection have a very low likelihood. As a result, these cases may not be represented in the training data, making it impossible to determine their admissibility from the data alone. Active learning, which is concerned with choosing samples in a dataset so that the information gain is maximized71, might offer a solution for including such important borderline cases. Various active learning approaches have been successfully applied to constraint learning in engineering applications, such as Gaussian Processes72 and t-METASET73. Recent progress in this field has been rapid, especially with the development of Generative flow networks74. However, these methods still need to be adapted for complex and high-dimensional spaces.

Another possible factor in the struggle of generative models (GAN and DDPM) with kirigami metamaterials is connected to the assumptions they make about the structure underlying the data. While VAE and WGAN assume that the Euclidean distance can be used to measure data similarity, GAN and DDPM assume that the data lies on a low-dimensional Euclidean manifold. If no such manifold exists and the underlying structure of the training data is more complex, GAN and DDPM will perform suboptimally. Research into extending diffusion models to more complex data geometries, such as Riemannian manifolds, is still young but promising59,60,61.

Our research highlights the inherent limitations of classical generative approaches when employed in the domain of mechanical metamaterials, as opposed to their typical application in image generation. We underscore the reliance of these methods on the Euclidean distance metric, which is unsuitable for many metamaterials with intricate design spaces. The kirigami metamaterials presented here serve as an ideal benchmark for the development of new generative models, given that the complexity of their design space can be modulated by adjusting the perturbations of the initial system.