Abstract
The energy harvesting capability of a graded metamaterial is maximised via reinforcement learning (RL) under realistic excitations at the microscale. The metamaterial consists of a waveguide with a set of beam-like resonators of variable length, with piezoelectric patches, attached to it. The piezo-mechanical system is modelled through equivalent lumped parameters determined via a general impedance analysis. Realistic conditions are mimicked by considering either magnetic loading or random excitations, the latter scenario requiring the enhancement of the harvesting capability for a class of forcing terms with similar but not identical frequency content. The RL-based optimisation is empowered by using the physical understanding of wave propagation in such a local resonance system to constrain the state representation and the action space. The procedure outcomes are compared against grading rules optimised through genetic algorithms. While genetic algorithms are more effective in the deterministic setting featuring the application of magnetic loading, the proposed RL-based approach proves superior in the inherently stochastic setting of the random excitation scenario.
Introduction
The reduced power requirements of recent small electronic components1 make energy harvesting solutions a realistic alternative to conventional chemical batteries, allowing for clean and low-cost autonomous devices2. Several solutions have been developed for harvesting energy from elastic and acoustic waves, one of the most ubiquitous and accessible energy sources. Acoustic and elastic energy is transduced into electricity through electromagnetic, electrostatic and piezoelectric mechanisms3 or, alternatively, through magnetostriction and electroactive polymers4,5. Among these transduction mechanisms, piezoelectric materials are usually preferred thanks to their large power densities and ease of application6. The basic mechanism of piezoelectric energy harvesting is the direct piezoelectric effect, which converts the deformation of the host structure due to mechanical or acoustic vibrations into an electrical potential7.
In an attempt to enhance energy harvesting capabilities, metamaterial based structures are increasing in popularity, thanks to their advanced performance in wave control and manipulation8,9. They often rely on local resonance for sub-wavelength control of waves, enabling the design of compact devices10. Since such systems already contain a collection of resonators, the inclusion of vibrational energy harvesters is straightforward leading to truly multifunctional meta-structures combining vibration insulation with harvesting11,12,13. To reduce wave scattering induced by impedance mismatch and simultaneously increase the broadband capabilities, graded designs have been proposed. The term graded is adopted to denote a smooth variation of a particular parameter of the local resonators along space (conventionally the resonant frequency), creating local band-gaps to control wave propagation. Graded metamaterials are thus able to manipulate waves by confinement over some spatial region along the structure, enabling wideband vibration attenuation and simultaneous energy harvesting14,15,16,17. In Fig. 1, we exemplify the idea of combining the use of the piezoelectric effect with a metamaterial based structure employing a graded array of resonators.
Different grading profiles have been studied for energy harvesting17,18, spatially programmable trapping or mode conversion19,20. High frequency homogenisation was employed to set the gradient function in graded metamaterials21. A general procedure for grading optimisation (both in terms of frequency and spacing) has also been proposed22. The procedure, based on reinforcement learning (RL), has been developed to maximise the mechanical energy confinement of a graded metamaterial under monochromatic input. Whilst the solution obtained provides relevant insights on the physics of graded metamaterials, the relation between grading and energy harvesting efficiency remains an unsolved problem. Moreover, most of the studies on metamaterials consider ideal input sources, neglecting the effect of realistic noisy signals. In this paper, we define a RL procedure for the optimisation of graded metamaterials for energy harvesting under general loading conditions, here exemplified for the common cases of magnetic loading or random vibrations. Other works treating design optimisation as a Markov decision process (MDP) solvable via RL were proposed for phononic crystals, for matching23,24 or maximising band-gaps25, and for acoustic metamaterials, to minimise wave scattering26. The combination of MDP and RL was also employed in structural optimisation27,28, material design29,30, optics31, and chemistry32. Attempts to improve the computational efficiency of RL-based design optimisation were made by exploiting transfer learning to accelerate the design synthesis starting from data regarding existing bio-structures33, and by using a convolutional neural network to generate optimisation initiation points closer to the final outcome26.
A RL-based approach has been preferred to gradient-based optimisation because RL does not require continuity and low-modality throughout the design space34, and because RL does not require the functional relation between the design parameters and the objective function, which is hard to obtain when there are discontinuities with respect to the design parameters. Alternative optimisation methods such as particle swarm usually suffer from the difficulty of imposing constraints on the design parameters35. To assess the performance of the proposed RL-based procedure, a comparison has been made with genetic algorithms (GA)36, a state-of-the-art metaheuristic method for nonlinear optimisation. When the optimisation problem is defined in a deterministic setting, GA have proved to be more effective and less computationally demanding than RL. However, when a stochastic setting is considered, the proposed RL-based procedure has notably outperformed GA.
While physics-informed machine learning (ML) has emerged as an effective and elegant approach to solve regression problems37, the combination of physical knowledge and RL has only been explored recently. For example, physical considerations were exploited to provide meaningful and synthetic representations of the studied system38, and governing equations were used to inform policy search and model learning39. Here, the design space is constrained on the basis of the physical understanding of wave propagation in a locally resonant system. Specifically, the dimensionality of the design space is reduced to get configurations possibly leveraging the rainbow effect14,15,40,41, i.e. spatial signal separation depending on frequency.
The innovative aspects of this paper concern both the methodology and the application. On the methodological side, the knowledge of the physics of the problem has been exploited to constrain the state representation of the system and the action space. Unlike what was proposed in22, we modified the interpolation rules, used to reduce the dimensionality of the state of the system, on the basis of considerations on the learning of the RL agent, in this way greatly benefiting its training. Such considerations are general and can be applied to other design optimisation problems. On the application side, the optimisation of graded metamaterials for energy harvesting under a realistic input has been considered for the first time. The design we propose is fully compatible with microfabrication, paving a path towards the development of ultra-wide band (UWB) micro-electro-mechanical systems (MEMS)42 for energy harvesting using metamaterials.
Results
Dealing with energy harvesting at the microscale, one of the greatest limitations is the enormous mismatch between the frequencies at which the energy content of the environment is available and the MEMS natural frequency43,44. For this reason, frequency up-conversion (FuC) techniques have been developed to enable MEMS excitation at low frequencies45. Among different FuC techniques, a privileged position is occupied by the magnetic interaction via permanent magnets46, consisting of exploiting a magnetic field to generate an impulsive phenomenon on a piezoelectric transducer without any contact. An optimisation of the graded metamaterial for this kind of input is addressed here. As the theoretical evaluation of the magnetic force is a complex operation, out of the scope of the present paper, we use the simplified formulation developed by Akoun and Yonnet47, which demonstrated good experimental prediction for cubic magnets in Neodymium–Iron–Boron alloy46. In particular, the definition of the magnetic force is done by evaluating47: the magnetisation vectors; the magnetic permeability of vacuum; a few coefficients whose definition depends on the dimension of the two magnets and on the force direction. The magnetic loading is depicted in Fig. 2, alongside a second loading scenario that mimics the excitations to which the device is subjected under random vibrations. Specifically, 128 different time histories, each featuring a frequency spectrum within 0.1 and 2 MHz, are considered to account for the random nature of the vibrations. The use of many load histories avoids the trivial matching of single excitation spectrum peaks; from a numerical point of view, this random loading is generated through a white noise shaped by a Gaussian pulse.
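As a minimal sketch of how load histories of this kind could be generated (an illustration, not the code used in this work), the snippet below band-limits Gaussian white noise to 0.1–2 MHz in the frequency domain and then multiplies it by a Gaussian pulse in time. The function name `random_loads`, the record length, the sampling step, and the centre and width of the pulse are all assumptions introduced for illustration.

```python
import numpy as np

def random_loads(n_loads=128, n_steps=4096, dt=1.5e-8,
                 f_min=0.1e6, f_max=2.0e6, seed=0):
    """Band-limited white noise shaped in time by a Gaussian pulse.

    All default values except n_loads and the frequency band are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps) * dt
    freqs = np.fft.rfftfreq(n_steps, d=dt)
    in_band = (freqs >= f_min) & (freqs <= f_max)

    # One white-noise realisation per load history.
    noise = rng.standard_normal((n_loads, n_steps))

    # Keep only the spectral content inside [f_min, f_max].
    spectra = np.fft.rfft(noise, axis=1)
    spectra[:, ~in_band] = 0.0
    filtered = np.fft.irfft(spectra, n=n_steps, axis=1)

    # Gaussian pulse in time (centre and width are assumptions).
    t0, sigma = 0.5 * t[-1], 0.125 * t[-1]
    envelope = np.exp(-0.5 * ((t - t0) / sigma) ** 2)
    return t, filtered * envelope
```

Because the time-domain envelope slightly smears the spectrum, a small amount of energy leaks just outside the nominal band; each realisation nonetheless distributes its energy differently across frequencies, which is the property the 128 histories are meant to capture.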
In Fig. 2, we show the geometry of the metamaterial and the resonator configuration that is employed to assess the increase in energy harvesting possibly enabled by grading the resonators’ lengths. A number \(\text {N}_{\text {r}}=30\) of resonators have been considered, symmetrically placed to minimise possible torsional motions17. Both the waveguide and the resonators are made from silicon with Young’s modulus \(E_{\text {al}}=160\) GPa and density \(\rho _{\text {al}}=2330\) kg/m\(^3\). Each resonator is covered by a piezoelectric patch with density \(\rho _{\text {piez}}=7800\) kg/m\(^3\), Young’s modulus \(E_{\text {piez}}=59\) GPa, piezoelectric constant \(d_{31}= -171 \times 10^{-12}\) C/N, and permittivity at constant stress \({\bar{\epsilon }}^S_{33}= 13.3 \times 10^{-12}\) C/(Vm). The cross sections of the silicon and piezoelectric parts of each resonator are \(8{\times }5 \,\upmu\)m and \(8{\times }2 \,\upmu\)m, respectively. The waveguide is \(L_{w}=6\) mm long, and it has cross section \(50{\times }5 \,\upmu\)m. Therefore, the cross sectional area and the moment of inertia relevant for the dispersion relation of a propagating flexural wave are \(B_{\text {w}}=250.0 \,\upmu\)m\(^2\) and \(I_{\text {w}}=520.8 \,\upmu\)m\(^4\), respectively. The resonator spacing is set to \(5.5\, \upmu\)m to enable resonator interactions at the sub-wavelength scale48. Alternative choices are not investigated due to the reduced impact of spacing on the dispersive properties of rainbow-based metamaterials22.
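The quoted section properties of the waveguide can be checked with the standard formulas for a rectangular section, assuming bending about the axis parallel to the 50 μm side. The symbols \(b\) and \(h\) for the section width and thickness are introduced here only for illustration.

```latex
B_{\text{w}} = b\,h = 50 \times 5 = 250\ \upmu\text{m}^2, \qquad
I_{\text{w}} = \frac{b\,h^{3}}{12} = \frac{50 \times 5^{3}}{12} \approx 520.8\ \upmu\text{m}^4 .
```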
The mechanical response of the system is simulated through the finite element (FE) method by employing 304 Euler–Bernoulli beam elements to discretise the waveguide, and by associating a single degree of freedom (dof) to each resonator pair. Specifically, an equivalent lumped mechanical parameter system is employed to simultaneously model the mechanical response of the beam-like resonators and the effect of the attached circuit17. From the mechanical side, a certain mass \(M_{\text {m}}\), damping \(D_{\text {m}}\) and stiffness \(K_{\text {m}}\) are associated to each resonator according to its first bending mode. An equivalent damping \(D_{\text {e}}\) and stiffness \(K_{\text {e}}\) is also introduced to account for the electrical side. Such quantities are determined through a general impedance analysis method49. The steps of the method are briefly described here; additional details can be found in17,50. First, the piezoelectric coupling term \(\vartheta\) is computed on the basis of \(d_{31}\) and of the distance between the centre of the piezoelectric layer and the resonator neutral axis. Second, the electro-mechanical coupling coefficient \(\alpha\) and the piezoelectric capacitance coefficient \(C_p\) are worked out from \(\vartheta\), \({\bar{\epsilon }}^S_{33}\), and the geometric properties of the resonators. Last, the electrically induced damping \(D_{\text {e}}\) and stiffness \(K_{\text {e}}\) of the circuit are determined through the following relations:
where \(Z_{\text {e}}=\frac{1}{\omega C_p}\) is the equivalent impedance of the piezo-mechanical system; \(\text {Re}\left( Z_{\text {e}}\right)\) and \(\text {Im}\left( Z_{\text {e}}\right)\) denote the real and imaginary parts of \(Z_{\text {e}}\); \(\omega\) is the resonance frequency of the considered resonator pair. The final damping and stiffness properties of each resonator pair are computed by summing the mechanical and electrical contributions, employing an iterative procedure to account for the consequent modification of \(\omega\).
Two absorbing layers are used to avoid wave reflections51, thus mimicking the operative conditions in which, typically, reflections at the waveguide extremities cannot be exploited due to the large substrate to which the device is attached. In the laboratory, such absorbing properties are achievable through acoustic black holes15,52,53. The length of the absorbing layers is set to 2.5 mm, as it must be at least 4 or 5 times longer than the wavelength to be effective. Consequently, the exciting force, applied at \(x=2.5\) mm, produces a left-propagating wave immediately damped out by the absorbing conditions, and a right-propagating wave interacting with the resonator array.
Wave propagation is simulated in the time domain with an integration step of \(1.5 \times 10^{-8}\) s, chosen to correctly sample the vibration of each resonator pair and to satisfy the Courant–Friedrichs–Lewy condition. The duration of the analysis is not fixed in advance, since local interactions between resonators may require a very long observation time22. Hence, the analysis is ended only when the harvested energy does not change appreciably over a sufficiently long time interval. The cumulative harvested energy is computed as in Eq. (2):
where \({\dot{u}}_{\text {n}_{\text {r}}}\) is the velocity of the dof associated to \(\text {n}_{\text {r}}\)th resonator pair; \({\dot{u}}^{\text {w}}_{\text {n}_{\text {r}}}\) is the velocity of the dof of the waveguide to which the resonator pair is attached; T is the time length of the analysis.
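A hedged numerical sketch of this kind of energy evaluation is given below. It assumes, as is usual for an equivalent viscous damper, that the instantaneous harvested power of each resonator pair equals \(D_{\text {e}}\) times the squared relative velocity between the resonator dof and the waveguide dof; this is an assumption about the form of the integrand, not a transcription of Eq. (2). The function names, the trapezoidal integration, and the plateau test with its tolerance are likewise illustrative choices.

```python
import numpy as np

def harvested_energy(u_dot, u_dot_w, d_e, dt):
    """Cumulative energy dissipated in the equivalent electrical dampers.

    u_dot, u_dot_w : arrays of shape (n_steps, n_pairs) with the velocities
                     of the resonator dofs and of the attached waveguide dofs.
    d_e            : array of shape (n_pairs,) with the electrical damping.
    Assumes power = d_e * (relative velocity)**2 for each resonator pair.
    """
    rel = np.asarray(u_dot, dtype=float) - np.asarray(u_dot_w, dtype=float)
    power = np.asarray(d_e, dtype=float) * rel ** 2
    # Trapezoidal rule in time, accumulated step by step.
    increments = 0.5 * (power[1:] + power[:-1]) * dt
    per_pair = np.concatenate(
        [np.zeros((1, power.shape[1])), np.cumsum(increments, axis=0)]
    )
    return per_pair.sum(axis=1)  # total over all resonator pairs

def has_plateaued(energy, window, rel_tol=1e-3):
    """True if the energy grew by at most rel_tol (relative) over `window` steps."""
    energy = np.asarray(energy, dtype=float)
    if energy.size <= window or energy[-1] <= 0.0:
        return False
    gain = energy[-1] - energy[-1 - window]
    return bool(gain <= rel_tol * energy[-1])
```

In a time-marching loop, `has_plateaued` could be evaluated periodically on the running energy history to decide when to stop the analysis; the window length and tolerance would need to be tuned to the slowest resonator interactions.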
The optimisation of the grading rule is carried out by a RL agent. The maximum length assignable to each resonator is roughly \(300\, \upmu\)m, the minimum is \(10\, \upmu\)m. The corresponding resonant frequencies are, respectively, 0.1 and 350 MHz. If one or more resonators are made shorter than \(10\, \upmu\)m, they are removed to avoid ill conditioning of the stiffness matrix.
For magnetic loading, the outcome of the RL-based procedure is reported in Fig. 3. The optimised RL configuration allows an increase in the harvested energy of 32.3% with respect to the random resonator arrangement depicted in Fig. 2 (which spans the same frequency range as the optimised array). In particular, the rainbow effect is enabled by the monotonic increase of the resonators’ lengths. Informing the RL agent of the physics of the problem facilitates this outcome, greatly reducing the variability of the optimisation process. Beyond the rainbow effect, resonators’ lengths are set to amplify the harmonic components of the loading that bear the highest amount of elastic energy. Figure 3 reports three different rainbow configurations: linear (Fig. 3a), optimised using GA (Fig. 3b), and optimised using RL (Fig. 3c). The linear case is reported to give a baseline for the effectiveness of a naively monotonic array. The GA solution is reported to show the result suggested by a different optimisation algorithm. Finally, the RL solution is the one reached by a SAC policy. Both GA and RL converge to a solution where a fast initial change of resonators’ lengths is optimal for the combined purpose of slowing down the passing wave and absorbing it. More interestingly, the resulting grading function is not monotonic, and it reaches a maximum resonator length in the second half of the array. Finally, in the last segment both optimisers choose to shorten the resonators again, confirming that this final decrease in length is more effective, or at least equally effective, with respect to a simple length increase (commonly implemented in rainbow reflection systems). These last resonators have the role of reflecting back into the first part of the array the leftover energy components of the wave that manage to get through the absorbing array.
Generally, longer resonators are implemented because they implicitly generate a local resonance band gap that stops the wave and reflects it backward. In order to see whether this result of shorter resonators has merit or is an artefact of the constrained design space, we performed new analyses in which the resonators after the longest one in the array are forced to increase rather than decrease in length. Results showed a similar efficiency (decrease of 0.4%) with respect to the original case. The linear grading and the one optimised through GA are respectively 1.9% and 38.4% more efficient than the random case of Fig. 2. This implies that, for magnetic loading, GA find a better configuration than RL. GA also outperform RL in terms of computing time. Specifically, GA complete the optimisation procedure in 1 h and 18 min by performing 50 generations, while RL takes 3 h and 47 min to run 5000 episodes. Analyses have been performed on a workstation equipped with an Intel® Core\(^{\text {TM}}\) i9 CPU @ 3.6 GHz and 32 GB of RAM.
For white noise loading, the performance of the RL-based optimisation has been evaluated again by comparing it to a linear grading and a GA optimised grading. The white noise spectrum always spans the frequency range between 0.1 and 2 MHz, but the energy carried by each frequency component is randomised for a total of 128 different loads. Moreover, given that the input to the numerical analyses is a force that follows the white noise function, the energy component of each frequency is not constant with frequency. The relation between the work inserted in the system by an applied harmonic force F and its frequency content (in steady state regime) is \({\mathscr {L}}(f) \propto f^{-5/2}\). The proof is based on the Green’s function governing the response to a harmonic force of an infinite waveguide54 ruled by the Euler–Bernoulli beam theory. The extended proof and its numerical validation are reported in the supplementary material. This denotes an intrinsic energy decrease at higher frequencies for the same applied force. For the linear grading, the upper limit of 2 MHz is exceeded by the resonance of the first three resonator pairs with which the propagating waves interact.
The outcome of the RL optimisation for the white noise scenario (Fig. 4c) is reported in Fig. 4 against the linear grading (Fig. 4a) and the GA optimised configuration (Fig. 4b). The RL optimised configuration outperforms both GA and the linear grading, as can be checked by looking at the harvested energy of each of the 128 loading scenarios reported in Fig. 5a. This result shows that it is beneficial for the graded array to have resonators’ lengths that follow a monotonically increasing function and, more importantly, that are concentrated on a smaller frequency range, where most of the energy of the input is concentrated. This means that the interval between 0.1 and 2 MHz is not fully covered. This can be in part attributed to the fact that the circuit damping broadens the bandwidth around the resonance, so a more distributed frequency range is actually absorbed. Looking at Fig. 5b, where \({\mathscr {E}}\) is the energy absorbed by all the resonators combined, a Gaussian distribution shows how much each configuration is able to generalise and be efficient for a wider range of loads. The comparison shows that GA are actually more efficient at finding better performing configurations for a specific range of loads, while the RL optimiser finds a configuration that does not outperform the GA one for most of the loads, but that can, in general, harvest more energy from all the possible load histories. Looking at the computing time, the duration of the optimisation processes run by the two algorithms is equal to the one detailed for the magnetic loading case, as the same number of generations (for GA) and of episodes (for RL) have been considered. It can be noted that constraining the design space has reduced the possible outcomes of the optimisation process, but it has not precluded obtaining unexpected configurations.
Indeed, the finally obtained designs are not only of interest for fabrication, but also help in further understanding the physics of the problem.
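The two viewpoints used in this comparison, per-load wins versus performance over the whole ensemble of loads, can be made explicit with a short sketch. It assumes that the harvested energy of each configuration is available as one value per load history; the function names are hypothetical, and the Gaussian summary is simply the sample mean and standard deviation.

```python
import numpy as np

def gaussian_summary(energies):
    """Sample mean and standard deviation of the harvested energy over the loads."""
    e = np.asarray(energies, dtype=float)
    return float(e.mean()), float(e.std(ddof=1))

def win_fraction(energies_a, energies_b):
    """Fraction of load histories for which configuration A harvests more than B."""
    a = np.asarray(energies_a, dtype=float)
    b = np.asarray(energies_b, dtype=float)
    return float(np.mean(a > b))
```

A configuration can have a win fraction below one half and still a larger mean, which is the situation in which it loses on most individual loads but harvests more energy overall.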
Discussion
We have shown the outcome of a RL-based optimisation procedure as applied to the design of a graded metamaterial under two realistic loading scenarios, the first considering magnetic loading as a FuC mechanism, the second mimicking the excitation due to a random source. The latter scenario has required us to evaluate the performance of the optimised design on a statistical basis by considering a set of excitations with different frequency content.
For magnetic loading, GA outperform the proposed RL-based approach, both in terms of the performance of the system and of the computing time. In contrast, for white noise loading, the RL-based optimisation outperforms both GA and the linear grading, the latter serving as a baseline for the effectiveness of the optimisation algorithms. Such an outcome is due to the stochastic optimisation setting featured by this latter load case.
The application of the RL-based optimisation procedure to the design optimisation of graded metamaterials for energy harvesting shows that, on the one hand, physics can be effectively used to inform the optimisation procedure by constraining the design space; on the other hand, investigating the reasons behind the decisions of the RL agent can improve the physical understanding of the problem, highlighting working mechanisms different from the expected ones.
Methods
The workflow of operations of the RL-based design optimisation is detailed in Fig. 6. A model-free actor-critic approach55, namely the soft actor critic (SAC) algorithm56, is used to maximise the optimisation target, here the amount of harvested energy. The RL agent takes actions according to a stochastic policy parametrised by a neural network (NN). The NN plays the role of a component of the agent termed actor. The policy is updated according to a second NN, modelling another component of the agent called critic. In turn, the critic is updated on the basis of the expected return computed in the environment, here the FE setting in which the response of the piezo-mechanical system is modelled. By assuming a finite number \(\text {N}_{\text {t}}\) of states or, in other words, a finite number of decisions modifying the system configuration, the expected return \(G_{\text {n}_{\text {t}}}\) at the current \({\text {n}}_{\text {t}}\)th state is computed as:
where \({\mathscr {E}}_{{\text {n}}_{\text {t}}+1}\) is the harvested energy, see Eq. (2), for the \(({\text {n}_{\text {t}}+1})\)th state of the system. More details on the application of the SAC algorithm can be found in the literature56.
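A common way to evaluate such a return, and only an assumption about the form used here, is to sum the rewards collected from the current state to the end of the episode, optionally weighted by a discount factor. The sketch below takes the reward of each step to be the harvested energy of the state reached after that step; the function name and the discount factor `gamma` are hypothetical.

```python
def expected_returns(energies, gamma=1.0):
    """Return-to-go for each step of one episode.

    energies[n] is the harvested energy of the state reached after the n-th
    action. The return is G[n] = sum_k gamma**k * energies[n + k], accumulated
    backwards in time. gamma = 1 gives the plain undiscounted sum.
    """
    returns = [0.0] * len(energies)
    running = 0.0
    for n in reversed(range(len(energies))):
        running = energies[n] + gamma * running
        returns[n] = running
    return returns
```

With only a few decisions per episode, the backward recursion is trivial in cost; it is shown mainly to make the indexing between states and rewards explicit.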
The choice of SAC over other RL algorithms is now motivated. Policy gradient methods are suggested to handle the stochastic optimisation setting required by the white noise load case. The actor-critic paradigm reduces the variance error in estimating the optimal policy55. Only algorithms handling continuous action spaces have been considered. Looking for the best performance, we have tested the following algorithms on the white noise loading case: the deep deterministic policy gradient (DDPG)57, the trust region policy optimisation (TRPO)58, the proximal policy optimisation (PPO)59, and SAC. The latter has recorded the best performance. It is worth mentioning that SAC enjoys the advantage of not requiring the tuning of the learning rate.
We have pointed out that the selected algorithm can handle continuous action space. In principle, a discrete action space could have been set by modifying the length of each resonator according to the tolerances of the manufacturing process. However, considering 30 resonators and a fabrication process tolerance of 0.15 \(\upmu\)m (consistent with the STMicroelectronics ThELMA®technology42) leads to an exploding number of possible designs. Consequently, a continuous action space and a continuous system state have been preferred, and the use of function approximators, namely NNs, is required instead of tabular approaches.
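The size of such a discrete design space can be estimated with a rough count. The sketch assumes that each resonator length can take any multiple of the tolerance between the minimum and maximum lengths of 10 and 300 μm quoted earlier, and that all 30 lengths are chosen independently; both the function name and this counting rule are illustrative assumptions.

```python
import math

def n_discrete_designs(l_min_um=10.0, l_max_um=300.0, tol_um=0.15, n_free=30):
    """Number of admissible lengths per resonator and of overall designs."""
    # Lengths l_min, l_min + tol, ..., up to the largest value not exceeding l_max.
    levels = math.floor((l_max_um - l_min_um) / tol_um) + 1
    return levels, levels ** n_free

levels, designs = n_discrete_designs()
# Order of magnitude of the design count, printed as a power of ten.
print(levels, "lengths per resonator, about 10^%d designs" % math.floor(math.log10(designs)))
```

Even if symmetry or other constraints reduced the number of independent lengths, the count would remain far beyond what a tabular method could enumerate.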
As the automatic extraction through NNs of low-dimensional features informative of the system state is challenging in a RL setting, we have decided to provide an already dimensionality-reduced vector as input instead of a vector detailing the length of each resonator. To this end, we have introduced the interpolation rule reported in Fig. 7b, as it limits the system space to just three dimensions. As a secondary outcome, we have reduced by a factor of 10 (from 30 to 3) the number of states \(\text {N}_{\text {t}}\) explored by the agent in each episode.
To reduce the dimensionality of the system state and of the action space, we have used the physical understanding of wave propagation in a locally resonant system to exploit the rainbow effect. Specifically, resonators’ lengths are set according to an envelope defined by interpolating a reduced number of points, as done in our previous proposal22, in which B-spline interpolation was exploited. However, B-spline interpolation makes it extremely challenging for the agent to foresee the effect of its decisions. As shown in Fig. 7a, every action of the agent, consisting in modifying one interpolation point, completely changes the lengths of all the resonators, possibly altering the concavity and the ascending or descending behaviour of their envelope. The current proposal overcomes such difficulties. The first two actions, setting the lengths of the first and last resonator and a linear grading between them, make it possible to roughly define the operating frequency range of the resonators. The main effect of the third action is to control how fast the dispersion properties of the medium are modified, allowing for a possibly parabolic final grading. The arrangement defined for one side of the device is then mirrored to get a symmetrical configuration.
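The three-action construction can be illustrated with a hypothetical parameterisation; the actual rule is the one shown in Fig. 7b and may differ. Here the first two parameters fix the end lengths and hence the linear grading, and the third adds a parabolic term that vanishes at both ends, so that changing it never moves the end points. The function name, the parabolic form, and the handling of the length bounds are assumptions.

```python
import numpy as np

def grading(l_first, l_last, curvature, n_res=30, l_min=10.0, l_max=300.0):
    """Resonator lengths (in micrometres) from three design parameters.

    l_first, l_last : lengths of the first and last resonator (linear grading).
    curvature       : departure from the linear grading at mid-array; the
                      parabolic term is zero at both ends of the array.
    Returns the lengths, capped at l_max, and a mask flagging the resonators
    at or above l_min (shorter ones are meant to be removed).
    """
    s = np.linspace(0.0, 1.0, n_res)           # normalised position in the array
    linear = l_first + (l_last - l_first) * s  # effect of the first two actions
    bump = 4.0 * curvature * s * (1.0 - s)     # effect of the third action
    lengths = np.minimum(linear + bump, l_max)
    keep = lengths >= l_min
    return lengths, keep
```

With this kind of rule each action has a predictable, localised meaning (frequency range first, rate of change of the grading second), which is the property that a B-spline through freely moving control points lacks.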
In conclusion, we have demonstrated potential advantages of RL-based optimisation in designing graded arrays of resonators. Superior performance with respect to conventional optimisation procedures, e.g. genetic algorithms, has been demonstrated for the inherently stochastic setting of random excitations. More generally, the greatest potential of the procedure is the ability to deal with stochastic optimisation. Possible limitations come from the large number of required model evaluations, implying a three times longer optimisation process with respect to genetic algorithms for the problem at hand. Such a drawback becomes more severe when scaling the dimensionality of the problem. For this reason, a critical aspect of the method is the definition of the physics-based constraints on the state representation and action space, which avoid evaluating less promising designs while still allowing for unexpected outcomes, as seen in the proposed applications.
Future work will aim to refine the RL-based optimisation with additional interpolation points, as well as to include asymmetric configurations with further types of waves, e.g. longitudinal or torsional. Extensive experimental activity will also be carried out to validate the obtained numerical results. The proposed method could be applied to 2D or even 3D wave propagation problems for the optimisation of elastic lattices or frames. Finally, further extensions come from the ability to cope with non-simultaneous, possibly competing, design goals thanks to the adopted MDP formalisation of the optimisation problem.
Data availability
The datasets analysed during the current study are available from the corresponding author on reasonable request.
Code availability
Codes will be made available upon reasonable request.
References
Newell, D. & Duffy, M. Review of power conversion and energy management for low-power, low-voltage energy harvesting powered wireless sensors. IEEE Trans. Power Electron. 34, 9794–9805. https://doi.org/10.1109/TPEL.2019.2894465 (2019).
Ma, Y., Ji, Q., Chen, S. & Song, G. An experimental study of ultra-low power wireless sensor-based autonomous energy harvesting system. J. Renew. Sustain. Energy 9, 1–19. https://doi.org/10.1063/1.4997274 (2017).
Williams, C. & Yates, R. Analysis of a micro-electric generator for microsystems. Sens. Actuators A 52, 8–11. https://doi.org/10.1016/0924-4247(96)80118-X (1996).
Beeby, S., Tudor, M. & White, N. Energy harvesting vibration sources for microsystems applications. Meas. Sci. Technol. 17, 1–22. https://doi.org/10.1088/0957-0233/17/12/R01 (2006).
Koh, S., Zhao, X. & Suo, Z. Maximal energy that can be converted by a dielectric elastomer generator. Appl. Phys. Lett. 94, 1–4. https://doi.org/10.1063/1.3167773 (2009).
Erturk, A. & Inman, D. Piezoelectric Energy Harvesting 1st edn. (Wiley, 2011).
Elvin, N. & Erturk, A. Advances in Energy Harvesting Methods 1st edn. (Springer, 2016).
Carrara, M. et al. Metamaterial-inspired structures and concepts for elastoacoustic wave energy harvesting. Smart Mater. Struct. 22, 1–9. https://doi.org/10.1088/0964-1726/22/6/065004 (2013).
Wen, Z., Wang, W., Khelif, A., Djafari-Rouhani, B. & Jin, Y. A perspective on elastic metastructures for energy harvesting. Appl. Phys. Lett. 120, 1–13. https://doi.org/10.1063/5.0078740 (2022).
Craster, R. V. & Guenneau, S. Acoustic Metamaterials 1st edn. (Springer, 2013).
Gonella, S., To, A. C. & Liu, W. K. Interplay between phononic bandgaps and piezoelectric microstructures for energy harvesting. J. Mech. Phys. Solids 57, 621–633. https://doi.org/10.1016/j.jmps.2008.11.002 (2009).
Li, Y., Baker, E., Reissman, T., Sun, C. & Liu, W. K. Design of mechanical metamaterials for simultaneous vibration isolation and energy harvesting. Appl. Phys. Lett. 111, 1–6. https://doi.org/10.1063/1.5008674 (2017).
Sugino, C. & Erturk, A. Analysis of multifunctional piezoelectric metastructures for low-frequency bandgap formation and energy harvesting. J. Phys. D Appl. Phys. 51, 1–12. https://doi.org/10.1088/1361-6463/aab97e (2018).
De Ponti, J. M. et al. Graded elastic metasurface for enhanced energy harvesting. New J. Phys. 22, 013013. https://doi.org/10.1088/1367-2630/ab6062 (2020).
De Ponti, J. M. et al. Experimental investigation of amplification, via a mechanical delay-line, in a rainbow-based metamaterial for energy harvesting. Appl. Phys. Lett. 117, 143902. https://doi.org/10.1063/5.0023544 (2020).
De Ponti, J. et al. Enhanced energy harvesting of flexural waves in elastic beams by bending mode of graded resonators. Front. Mater. 8, 1–7. https://doi.org/10.3389/fmats.2021.745141 (2021).
Zhao, B. et al. A graded metamaterial for broadband and high-capability piezoelectric energy harvesting. Energy Convers. Manage. 269, 116056. https://doi.org/10.1016/j.enconman.2022.116056 (2022).
Alshaqaq, M. & Erturk, A. Graded multifunctional piezoelectric metastructures for wideband vibration attenuation and energy harvesting. Smart Mater. Struct. 30, 1–11 (2020).
Alshaqaq, M., Sugino, C. & Erturk, A. Programmable rainbow trapping and band-gap enhancement via spatial group-velocity tailoring in elastic metamaterials. Phys. Rev. Appl. 17, L021003. https://doi.org/10.1103/PhysRevApplied.17.L021003 (2022).
Alan, S., Allam, A. & Erturk, A. Programmable mode conversion and bandgap formation for surface acoustic waves using piezoelectric metamaterials. Appl. Phys. Lett. 115, 1–6. https://doi.org/10.1063/1.5110701 (2019).
Davies, B., Fehertoi-Nagy, L. & Putley, H. On the problem of comparing graded metamaterials. Proc. R. Soc. A Math. Phys. Eng. Sci. 479, 20230537. https://doi.org/10.1098/rspa.2023.0537 (2023).
Rosafalco, L., De Ponti, J. M., Iorio, L., Ardito, R. & Corigliano, A. Optimised graded metamaterials for mechanical energy confinement and amplification via reinforcement learning. Eur. J. Mech. A. Solids 99, 104947. https://doi.org/10.1016/j.euromechsol.2023.104947 (2023).
Luo, C., Ning, S., Liu, Z. & Zhuang, Z. Interactive inverse design of layered phononic crystals based on reinforcement learning. Extreme Mech. Lett. 36, 100651. https://doi.org/10.1016/j.eml.2020.100651 (2020).
He, L. et al. Machine-learning-driven on-demand design of phononic beams. Sci. China Phys. Mech. Astron. 65, 214612. https://doi.org/10.1007/s11433-021-1787-x (2021).
Maghami, A. & Hosseini, S. M. Automated design of phononic crystals under thermoelastic wave propagation through deep reinforcement learning. Eng. Struct. 263, 114385. https://doi.org/10.1016/j.engstruct.2022.114385 (2022).
Shah, T. et al. Reinforcement learning applied to metamaterial design. J. Acoust. Soc. Am. 150, 321–338. https://doi.org/10.1121/10.0005545 (2021).
Ororbia, M. E. & Warn, G. P. Design synthesis through a Markov decision process and reinforcement learning framework. J. Comput. Inf. Sci. Eng. 22, 021002. https://doi.org/10.1115/1.4051598 (2021).
Brown, N. K., Garland, A. P., Fadel, G. M. & Li, G. Deep reinforcement learning for engineering design through topology optimization of elementally discretized design domains. Mater. Design 218, 110672. https://doi.org/10.1016/j.matdes.2022.110672 (2022).
Rajak, P. et al. Autonomous reinforcement learning agent for stretchable kirigami design of 2D materials. npj Comput. Mater. 7, 102. https://doi.org/10.1038/s41524-021-00572-y (2021).
Zheng, B., Zheng, Z. & Gu, G. X. Designing mechanically tough graphene oxide materials using deep reinforcement learning. npj Comput. Mater. 8, 225. https://doi.org/10.1038/s41524-022-00919-z (2022).
Sajedian, I., Badloe, T. & Rho, J. Optimisation of colour generation from dielectric nanostructures using reinforcement learning. Opt. Express 27, 5874–5883. https://doi.org/10.1364/OE.27.005874 (2019).
Zhou, Z., Kearnes, S., Li, L., Zare, R. N. & Riley, P. Optimization of molecules via deep reinforcement learning. Sci. Rep. 9, 10752. https://doi.org/10.1038/s41598-019-47148-x (2019).
Badloe, T., Kim, I. & Rho, J. Biomimetic ultra-broadband perfect absorbers optimised with reinforcement learning. Phys. Chem. Chem. Phys. 22, 2337–2342. https://doi.org/10.1039/C9CP05621A (2020).
Skinner, S. & Zare-Behtash, H. State-of-the-art in aerodynamic shape optimisation methods. Appl. Soft Comput. 62, 933–962. https://doi.org/10.1016/j.asoc.2017.09.030 (2018).
Perez, R. & Behdinan, K. Particle swarm approach for structural design optimization. Comput. Struct. 85, 1579–1588. https://doi.org/10.1016/j.compstruc.2006.10.013 (2007).
Holland, J. H. Genetic algorithms. Sci. Am. 267, 66–73 (1992).
Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S. & Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440. https://doi.org/10.1038/s42254-021-00314-5 (2021).
Zhao, P. & Liu, Y. Physics informed deep reinforcement learning for aircraft conflict resolution. IEEE Trans. Intell. Transp. Syst. 23, 8288–8301. https://doi.org/10.1109/TITS.2021.3077572 (2022).
Liu, X.-Y. & Wang, J.-X. Physics-informed dyna-style model-based deep reinforcement learning for dynamic control. Proc. R. Soc. A Math. Phys. Eng. Sci. 477, 20210618. https://doi.org/10.1098/rspa.2021.0618 (2021).
Colombi, A., Colquitt, D., Roux, P., Guenneau, S. & Craster, R. V. A seismic metamaterial: The resonant metawedge. Sci. Rep. 6, 1–6. https://doi.org/10.1038/srep27717 (2016).
De Ponti, J. M. Graded Elastic Metamaterials for Energy Harvesting (Springer, 2021).
Corigliano, A. et al. Mechanics of Microsystems (Wiley, 2017).
Roundy, S., Wright, P. K. & Rabaey, J. M. Energy Scavenging for Wireless Sensor Networks 1st edn. (Springer, 2004).
Kulah, H. & Najafi, K. Energy scavenging from low-frequency vibrations by using frequency up-conversion for wireless sensor applications. IEEE Sens. J. 8, 261–268. https://doi.org/10.1109/JSEN.2008.91712 (2008).
Maamer, B. et al. A review on design improvements and techniques for mechanical energy harvesting using piezoelectric and electromagnetic schemes. Energy Convers. Manage. 199, 1–23. https://doi.org/10.1016/j.enconman.2019.111973 (2019).
Rosso, M., Corigliano, A. & Ardito, R. Numerical and experimental evaluation of the magnetic interaction for frequency up-conversion in piezoelectric vibration energy harvesters. Meccanica 57, 1139–1154. https://doi.org/10.1007/s11012-022-01481-0 (2022).
Akoun, G. & Yonnet, J.-P. 3D analytical calculation of the forces exerted between two cuboidal magnets. IEEE Trans. Magn. 20, 1962–1964. https://doi.org/10.1109/TMAG.1984.1063554 (1984).
Lemoult, F., Fink, M. & Lerosey, G. Acoustic resonators for far-field control of sound on a subwavelength scale. Phys. Rev. Lett. 107, 064301. https://doi.org/10.1103/PhysRevLett.107.064301 (2011).
Liang, J. & Liao, W.-H. Impedance modeling and analysis for piezoelectric energy harvesting systems. IEEE ASME Trans. Mechatron. 17, 1145–1157. https://doi.org/10.1109/TMECH.2011.2160275 (2012).
Hu, G., Tang, L., Liang, J. & Das, R. Modelling of a cantilevered energy harvester with partial piezoelectric coverage and shunted to practical interface circuits. J. Intell. Mater. Syst. Struct. 30, 1896–1912. https://doi.org/10.1177/1045389X19849269 (2019).
Rajagopal, P., Drozdz, M., Skelton, E. A., Lowe, M. J. & Craster, R. V. On the use of absorbing layers to simulate the propagation of elastic waves in unbounded isotropic media using commercially available finite element packages. NDT & E Int. 51, 30–40. https://doi.org/10.1016/j.ndteint.2012.04.001 (2012).
O’Boy, D., Krylov, V. & Kralovic, V. Damping of flexural vibrations in rectangular plates using the acoustic black hole effect. J. Sound Vib. 329, 4672–4688. https://doi.org/10.1016/j.jsv.2010.05.019 (2010).
Georgiev, V., Cuenca, J., Gautier, F., Simon, L. & Krylov, V. Damping of structural vibrations in beams and elliptical plates using the acoustic black hole effect. J. Sound Vib. 330, 2497–2508. https://doi.org/10.1016/j.jsv.2010.12.001 (2011).
Graff, K. Wave Motion in Elastic Solids. Dover Books on Physics Series (Dover Publications, 1991).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
Haarnoja, T. et al. Soft actor-critic algorithms and applications. https://doi.org/10.48550/arXiv.1812.05905 (2019).
Lillicrap, T. et al. Continuous control with deep reinforcement learning. arXiv:1509.02971 (2019).
Schulman, J., Levine, S., Abbeel, P., Jordan, M. & Moritz, P. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning, vol. 37 of Proceedings of Machine Learning Research (Bach, F. & Blei, D., eds.), 1889–1897 (PMLR, 2015).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. https://doi.org/10.48550/ARXIV.1707.06347 (2017).
Acknowledgements
The authors are grateful to Bao Zhao (ETH Zürich) for fruitful discussion on the equivalent lumped parameter description of the piezo-mechanical system. The authors acknowledge the support of the H2020 FET-proactive Metamaterial Enabled Vibration Energy Harvesting (MetaVEH) project under Grant Agreement No. 952039.
Author information
Contributions
L.R. and J.M.D.P. conceived the methodology; L.R. ran the numerical experiments; L.R., J.M.D.P., L.I. and R.V.C. interpreted the results; R.A. and A.C. contributed to the ideas and general organisation of the work. All authors contributed to writing the original draft and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rosafalco, L., De Ponti, J.M., Iorio, L. et al. Reinforcement learning optimisation for graded metamaterial design using a physical-based constraint on the state representation and action space. Sci Rep 13, 21836 (2023). https://doi.org/10.1038/s41598-023-48927-3