Abstract
Wavelength-selective thermal emitters (WS-TEs) are frequently designed to achieve target emissivity spectra, a typical task in emissivity engineering, for broad applications such as thermal camouflage, radiative cooling, and gas sensing. However, previous designs require prior knowledge of materials or structures for each application, and the designed WS-TEs usually vary from application to application in terms of materials and structures; a general design framework for emissivity engineering across different applications is therefore lacking. Moreover, previous designs fail to tackle the simultaneous design of both materials and structures, as they either fix materials to design structures or fix structures to select suitable materials. Herein, we employ the deep Q-learning network algorithm, a reinforcement learning method based on the deep learning framework, to design multilayer WS-TEs. To demonstrate the general validity, three WS-TEs are designed for various applications, including thermal camouflage, radiative cooling and gas sensing, and are then fabricated and measured. The merits of the deep Q-learning algorithm are that it can (1) offer a general design framework for WS-TEs beyond one-dimensional multilayer structures; (2) autonomously select suitable materials from a self-built material library; and (3) autonomously optimize structural parameters for the target emissivity spectra. The present framework is demonstrated to be feasible and efficient in designing WS-TEs across different applications, and the design parameters are highly scalable in materials, structures, dimensions, and target functions, offering a general framework for emissivity engineering and paving the way for the efficient solution of nonlinear optimization problems beyond thermal metamaterials.
Introduction
All objects in nature emit thermal radiation at all times in a broadband, non-selective, incoherent, diffuse, and reciprocal manner1,2. Thanks to the fast development of thermal metamaterials and metasurfaces in recent years, thermal radiation can now be engineered with comprehensive control of its spectral, directional, and dynamic characteristics, enabling higher-efficiency radiative heat transfer than the thermal radiation of natural objects3. Among these, spectral emissivity engineering of thermal radiation enables the most applications, such as energy harvesting4, thermal management5,6, radiative cooling7,8, thermal camouflage9,10, infrared (IR) sensing11, far-/near-field radiation control12, thermophotovoltaics13, thermography14,15, and heat-assisted magnetic recording16. Emissivity engineering aims to select materials and design nanostructures that achieve specific functionalities with a target emissivity spectrum. The common physics of selective emissivity comes from the excitation of different photon modes, which leads to the local enhancement or suppression of the internal electric field, thus allowing control over the radiation emitted at different wavelengths17. Wavelength-selective thermal emitters (WS-TEs), as the main output of emissivity engineering, can be realized by multilayers18, photonic crystals, nano-gratings19, nano-antenna arrays20, multiple quantum wells21, Fabry-Perot cavities22, hollow cavities23, etc. One-dimensional multilayers, among the simplest WS-TE structures, are frequently employed. They are composed of alternating layers of materials with different refractive indices, which allow or block the propagation of light at specific wavelengths and, together with absorption in lossy media, regulate the emissivity24. The diversity of materials and the large parameter space of multilayer structures provide significant flexibility in tuning emissivity. 
Additionally, they are relatively easy to fabricate using thin film deposition at a low cost, which makes them promising for large-scale manufacturing. More importantly, the emissivity spectra of the multilayers can be efficiently simulated using the transfer matrix method (TMM), which is easily combined with various optimization algorithms. Therefore, multilayers are frequently designed as typical WS-TEs and are widely applied for extensive applications in thermal camouflage (TC)25,26, radiative cooling (RC)27,28,29, gas sensing (GS)30,31,32, etc.
In general, different applications require distinct emissivity spectra, as illustrated in Fig. 1. For instance, TC necessitates low emissivity within the long-wavelength IR range (8–13 μm) to prevent detection by most IR detectors when the background temperature is low; this long-wavelength IR range is known as the atmospheric window (AW) owing to its high atmospheric transmittance. Additionally, it is advantageous for the emissivity outside the AW to remain as high as possible to facilitate further radiative heat dissipation33. To achieve TC, Peng et al. designed a silver/germanium (Ag/Ge) multilayered structure, where impedance matching is utilized to manipulate the radiation characteristics34. Zhu et al. designed a Ge/ZnS multilayer on a silica aerogel substrate with efficient radiative cooling capability for TC at high ambient temperatures35. In contrast, RC aims to achieve passive cooling by radiating heat directly to outer space at ~3 K via high emissivity within the AW. In addition, a high reflectivity in the solar band is required to reflect as much solar energy as possible and maximize the cooling power, ultimately achieving net energy outflow and reducing the object temperature36. Raman et al. adopted a needle optimization method to design a seven-layer HfO2/SiO2 emitter. The fabricated multilayer emitter achieved daytime RC under direct solar irradiance for the first time; it reflected 97% of solar irradiance and cooled to 4.9 °C below the ambient temperature37. Similarly, Ma et al. optimized a seven-layer SiO2/Si3N4 emitter using an evolutionary algorithm, and the emitter was highly reflective towards solar radiation and had a broadband high emissivity within the AW38. Unlike the broadband emissivity spectra for RC and TC, the emissivity spectrum for GS needs narrow-band peaks that match the absorption peaks of the detected gas. 
Sakurai et al.39 and Xi et al.31 both utilized machine learning to design and optimize multilayered structures and achieved ultra-narrowband emission peaks at multiple wavelengths. In particular, Xi et al. obtained a complete database of multilayered WS-TEs for narrowband emissivity spectra in the wavelength range of 3 to 10 μm, and the highest q-factor reaches 508, far beyond the previous q-factor record in the literature.
To the best of our knowledge, although many material combinations for multilayer WS-TEs have been proposed to regulate emissivity, both material selection and structural design still rely on physics-inspired methods and past design experience or guidelines, which are inefficient and rarely reach the optimal structural design. To further improve the performance of multilayer WS-TEs, machine-learning optimization algorithms have shown unique advantages in structure optimization and design problems31,40. However, designers still have to search existing work extensively to determine suitable materials and initial structural parameters for their design goals before optimization. Consequently, researchers, following their prior knowledge of materials or structures for different applications, either fix materials to design structures11,39,41 or fix structures to optimize material arrangement13,42 in order to reduce the optimization space and improve the design efficiency. Hence, one open question arises: can we offer a general framework for designing WS-TEs for different applications without prior knowledge of materials and structures? If so, we need only change the target emissivity spectrum, and WS-TEs whose emissivity spectra match the target will be output directly.
Recently, deep learning has attracted increasing attention in various domains, such as natural language processing, computer vision, image processing, speech recognition and material structure optimization43. By establishing an artificial neural network and a data-driven method, deep learning obtains the mapping relationship between data pairs, that is, from emissivity spectra to design parameters of the emitters. However, challenges such as the one-to-many mapping problem, the difficulty of inferring design parameters from complex spectra, and dataset acquisition collectively render most neural network models inefficient for emitter design within an enormous optimization space that encompasses material selection and structural optimization simultaneously. Fortunately, deep reinforcement learning (DRL), which combines deep learning and reinforcement learning, promises to address the above challenges. It does not directly parse the mapping relationship between data pairs from a pre-collected dataset, but constantly interacts with the current environment to make decisions that update the state of the environment, and uses historical experience as the dataset to learn and optimize its decisions so as to maximize the accumulated reward44. Consequently, it has been proven capable of solving large-scale and complicated tasks, such as Go and Chess45. Wang et al. proposed a sequence generation network based on DRL for the design of optical multilayer films46. However, because the design parameters are all generated from the same network, their diversity is limited. In addition, other DRL-based design frameworks still face serious challenges in terms of design efficiency47.
In this study, we propose a general design framework based on the deep Q-learning network (DQN) for the design and optimization of WS-TEs in emissivity engineering without prior knowledge of materials and structures. This framework demonstrates high accuracy and efficiency as well as flexibility and scalability in design parameters and applications. Three multilayer WS-TEs for three applications, namely TC, RC, and GS, are designed and optimized by the framework and then experimentally fabricated and measured, and the measurements match the designed emissivity spectra. The selection of materials and the design of the structure are independently completed by DQN within the extensive optimization space. The designed multilayer WS-TEs all exhibit exceptional performance in these three applications, validating DQN as a general deep learning framework for emissivity engineering.
Results
Construction of DQN framework for WS-TEs design and optimization
The roadmap of the DQN optimization process is illustrated in Fig. 2. The whole optimization process can be described as an interactive process with the environment. The state of the environment, which consists of the material ID numbers and the thickness of each layer, represents the materials and structural parameters of the current multilayer. Here, we set up the multilayer WS-TE as a 5-layer structure composed of two alternating materials, considering that this specific structural configuration has been implemented for various applications in emissivity engineering41,48. Naturally, the setting of design parameters is flexible and can be adjusted according to design objectives, including the kinds of materials, layer count, and other structural parameters (for more details, see Supplementary Information Note 3). It is worth mentioning that, while increasing the number of layers and materials may meet more rigorous emissivity spectrum requirements, it also expands the optimization space by several orders of magnitude, requiring greater computing power and longer design time. With the structural configuration set above, the state can be represented by a 1×7 vector containing material and structure information. The two materials are selected from the self-built material library, as shown in Table 1, which contains 8 commonly used materials for emissivity engineering. These candidate materials cover most optical properties. Their optical properties (refractive indices) are taken from E. Palik’s and Querry’s books49,50 and other research work51,52 (see Supplementary Information S1). The substrate material needs to be selected according to the specific design goal: we chose silver for RC, silica for TC, and tungsten for GS. Each layer thickness is varied within the range of 20–1000 nm with a uniform step size of 20 nm, which results in a total of 50 possible values for each layer. 
Considering the 8 available materials, this structural configuration leads to \(8\times 7\times {50}^{5}=1.75\times {10}^{10}\) potential candidate structures. The demand for simultaneous material selection and structure optimization, together with the sheer volume of the optimization space, renders manual design impractical and presents significant challenges to conventional machine learning methods.
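The size of this design space can be checked with a few lines of Python. The sketch below is illustrative only: the ordering of the 1×7 state vector (two material IDs followed by five thicknesses) and the helper name `encode_state` are assumptions, not taken from the original implementation.

```python
# Illustrative sketch: count the candidate structures and build a 1x7 state.
# The state layout [ID_A, ID_B, d1..d5] is an assumption for illustration.

N_MATERIALS = 8                          # size of the material library (Table 1)
N_LAYERS = 5                             # alternating A/B/A/B/A stack
THICKNESS_GRID = range(20, 1001, 20)     # 20-1000 nm in 20 nm steps

thickness_levels = len(THICKNESS_GRID)               # 50 values per layer
material_pairs = N_MATERIALS * (N_MATERIALS - 1)     # ordered pairs: 8 x 7
search_space = material_pairs * thickness_levels ** N_LAYERS


def encode_state(mat_a, mat_b, thicknesses_nm):
    """Return the state as a flat list of 7 integers (hypothetical layout)."""
    if mat_a == mat_b:
        raise ValueError("the two alternating materials must differ")
    if len(thicknesses_nm) != N_LAYERS:
        raise ValueError("expected one thickness per layer")
    for d in thicknesses_nm:
        if d not in THICKNESS_GRID:
            raise ValueError(f"thickness {d} nm is not on the 20 nm grid")
    return [mat_a, mat_b, *thicknesses_nm]


state = encode_state(2, 5, [120, 400, 60, 980, 20])
print(thickness_levels, search_space, state)
```

The count uses ordered material pairs because an A/B/A/B/A stack differs from a B/A/B/A/B stack.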
After the physical information of the multilayer structure is encoded into digital information, it is inputted into an artificial neural network. The network, called ‘agent’ in DQN, consists of an input layer, three fully connected layers and an output layer. The number of neurons in the three fully connected layers is 24, 48 and 24, respectively. These layers perform computations on the input data, extracting relevant features and learning patterns from the encoded structural information. The output layer of the agent is referred to as the “action” layer. It generates a single value and each value corresponds to a policy that can be applied to update the current state (structure). More details about the actions and their corresponding policies can be found in Table 2, which provides a mapping between the output values of the action layer and the structural modifications they represent. Then TMM is adopted to simulate the radiation characteristics of the new state (new structure), and obtain its emissivity according to Kirchhoff’s law. To evaluate the performance of the new state, a reward R is obtained from the emissivity spectra. The reward serves as feedback for the agent and plays a crucial role in determining the convergence direction of the DQN model. The specific definition of the reward will depend on the desired application or emissivity target, and further details regarding the reward for TC, RC, and GS will be provided later.
In the DQN, a Q-function \(Q(s,a)\) is defined to represent the expected cumulative reward for taking the action \(a\) in state \(s\) and following the optimal policy thereafter. The agent is trained to approximate the Q-function, so that it chooses the action leading to a higher reward, by utilizing the replay buffer, which stores historical experiences (state, action, reward, and next state) collected during the interaction with the environment. To enhance the stability of the training process, a dual network structure is utilized, where the main network (agent) is used to collect experiences and the target network, a copy of the agent, is used to calculate the target Q-value based on the Bellman equation as follows53:
\[{Q}_{{\rm{target}}}={r}_{t}+\gamma Q({s}_{t+1},{a}^{\ast };{w}^{-})\quad (1)\]
where \({r}_{t}\) is the reward, \(\gamma\) is the discount factor, and \({a}^{\ast }=\text{arg}{\max }_{a}Q({s}_{t+1},a;w)\) represents the action selected by the main network that maximizes the Q-value. \({w}^{-}\) and \(w\) are the weights of the target network and the main network, respectively. The network parameters are updated by the back-propagation algorithm to minimize the loss function, which is the mean squared error between the predicted Q-value and the target Q-value, as follows:
\[L(w)={\mathbb{E}}\left[{\left({r}_{t}+\gamma Q({s}_{t+1},{a}^{\ast };{w}^{-})-Q({s}_{t},{a}_{t};w)\right)}^{2}\right]\quad (2)\]
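The target and loss described above can be illustrated with a minimal, framework-free sketch. It assumes that the Q-values of the next state have already been evaluated by both networks and are passed in as plain lists; the function names and the discount factor of 0.9 are hypothetical choices for illustration, not values from the paper.

```python
# Minimal sketch of the dual-network target and the mean-squared-error loss.
# Q-values are plain Python lists here; in practice they would come from the
# main and target networks. Names and gamma=0.9 are illustrative assumptions.

def dqn_target(reward, q_main_next, q_target_next, gamma=0.9):
    """Select a* with the main network, evaluate it with the target network."""
    a_star = max(range(len(q_main_next)), key=q_main_next.__getitem__)
    return reward + gamma * q_target_next[a_star]


def mse_loss(predicted_q, target_q):
    """Mean squared error between predicted and target Q-values."""
    n = len(predicted_q)
    return sum((y - q) ** 2 for q, y in zip(predicted_q, target_q)) / n


# The main network prefers action 1, so the target network's value for
# action 1 (0.4) is used: 1.0 + 0.9 * 0.4 = 1.36.
y = dqn_target(1.0, q_main_next=[0.2, 0.7, 0.5], q_target_next=[0.3, 0.4, 0.9])
loss = mse_loss([1.0, 2.0], [y, 2.0])
print(y, loss)
```

Decoupling action selection (main network) from action evaluation (target network) is what keeps the target from simply tracking the largest, possibly overestimated, Q-value.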
In addition, the Epsilon Greedy Exploration (EGE) algorithm is employed to balance exploration and exploitation. Initially, DQN tends to generate actions randomly, but gradually, as epsilon decreases, it relies on the Q-function for decision making. Finally, it is crucial to design an appropriate initialization method for the state so that DQN can optimize multilayers with high efficiency. Here we randomly initialize the two materials of the state from the material library, with the thickness of each layer randomly generated within the range described above. Additionally, we introduce an iteration threshold, which serves to evaluate whether the iteration should continue. When the reward R of a state exceeds the iteration threshold, the state with the highest historical reward is chosen as the initial structure for the next iteration. In each iteration, DQN repeatedly accepts the state, takes an action, simulates the emissivity spectrum, receives the feedback, and then accepts the next state. Once the reward of a new state falls below the iteration threshold, the structure is reinitialized for the next iteration. It is important to note that, because of the ‘train from buffer’ mechanism, the number of simulations (i.e., the number of calculated structures) is not equal to the number of iterations. In simple terms, the design and optimization process of DQN can be likened to playing a game. The game continues until the mission fails, at which point it needs to be initialized and restarted. A well-designed initialization method helps achieve higher scores efficiently.
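The exploration and restart logic described above can be sketched as follows. The decay rate, the minimum epsilon, and all function names are illustrative assumptions; the source does not give the values actually used.

```python
import random

# Sketch of epsilon-greedy action selection and the threshold-based restart
# rule. Decay rate, minimum epsilon and all names are illustrative assumptions.

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decay(epsilon, rate=0.995, eps_min=0.01):
    """Multiplicative decay with a floor (hypothetical schedule)."""
    return max(eps_min, epsilon * rate)


def next_start_state(reward, threshold, best_state, random_state):
    """Restart from the best state seen so far while the reward exceeds the
    threshold; otherwise draw a fresh random structure."""
    return best_state if reward > threshold else random_state()


rng = random.Random(0)
epsilon = 1.0
for _ in range(1000):          # one run of 1000 iterations
    epsilon = decay(epsilon)

greedy_action = epsilon_greedy([0.1, 0.9, 0.3], 0.0, rng)
print(epsilon, greedy_action)
```

With this assumed schedule, epsilon reaches its floor before 1000 iterations, which mirrors the statement that the agent eventually dominates the choice of actions.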
In order to showcase the generality and effectiveness of the DQN algorithm, we design multilayer WS-TEs in the following for three applications in emissivity engineering, including TC, RC, and GS, respectively, under the same optimization framework and utilizing a common material library.
Design and optimization of WS-TE for TC
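As mentioned earlier, the reward function needs to be meticulously defined to ensure that the optimization progresses in the desired direction. For TC, since an ideal TC emitter requires low emissivity inside the AW (8–13 μm) but high emissivity outside, we define the reward R as the difference between the average emissivity outside and inside the AW, which can be calculated as:
\[R={\bar{\varepsilon }}_{{\rm{out}}}-{\bar{\varepsilon }}_{{\rm{in}}},\quad {\bar{\varepsilon }}_{{\rm{band}}}=\frac{{\int }_{{\rm{band}}}\varepsilon (\lambda ){I}_{{\rm{BB}}}(\lambda ,T){\rm{d}}\lambda }{{\int }_{{\rm{band}}}{I}_{{\rm{BB}}}(\lambda ,T){\rm{d}}\lambda }\quad (3)\]
Here the subscript "in" denotes the AW and "out" denotes the remainder of the spectral range considered,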
where \({I}_{{\rm{BB}}}=h{c}^{2}/{\lambda }^{5}\cdot {[\exp (hc/\lambda {k}_{{\rm{B}}}T)-1]}^{-1}\) is the spectral radiance of a blackbody at wavelength \(\lambda\) and temperature \(T\). \(h\) and \({k}_{{\rm{B}}}\) are the Planck constant and the Boltzmann constant, respectively, and \(c\) is the speed of light. \(\varepsilon (\lambda )\) is the emissivity spectrum of the designed TC emitter. The temperature here is set to 350 K, which is slightly higher than the average surface temperature of military armored vehicles. The reward R yields a value between 0 and 1 based on Eq. (3). Based on preliminary trials, the iteration threshold is set to 0.2. In addition, any reward R below 0.2 is forcibly set to −0.2, which signals to the agent that the corresponding states do not meet the design requirements. The initialization method may introduce randomness into the optimization results or lead the optimization to a local optimum. To mitigate this, the optimization process is run 5 times to obtain the optimal TC emitter structure. Each run consists of 1000 iterations, which is sufficient to reduce epsilon in the Epsilon Greedy algorithm to its minimum value. This ensures that the agent dominates the selection of actions. Once the optimization is completed, the optimal structure is fabricated using magnetron sputtering to demonstrate the feasibility of the structural optimization.
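The TC reward can be illustrated with a short numerical sketch. It approximates the blackbody-weighted band averages by sums on a uniform wavelength grid; the 3–14 μm range, the grid spacing, and the function names are assumptions made for illustration, and the step-like test spectrum simply reuses the band-averaged values reported for the optimal design.

```python
import math

# Sketch of the TC reward: blackbody-weighted mean emissivity outside the
# atmospheric window minus that inside it, with sub-threshold rewards set to
# -0.2. Spectral range, grid and function names are illustrative assumptions.

H = 6.62607015e-34      # Planck constant, J s
C = 2.99792458e8        # speed of light, m/s
KB = 1.380649e-23       # Boltzmann constant, J/K


def planck(lam_m, temp_k):
    """I_BB as written in the text: h c^2 / lam^5 / (exp(hc / (lam kB T)) - 1)."""
    return H * C ** 2 / lam_m ** 5 / math.expm1(H * C / (lam_m * KB * temp_k))


def band_average(wl_um, eps, temp_k, in_band):
    """Blackbody-weighted mean emissivity over the samples selected by in_band."""
    num = den = 0.0
    for lam, e in zip(wl_um, eps):
        if in_band(lam):
            w = planck(lam * 1e-6, temp_k)
            num += e * w
            den += w
    return num / den


def tc_reward(wl_um, eps, temp_k=350.0, aw=(8.0, 13.0), threshold=0.2):
    inside = band_average(wl_um, eps, temp_k, lambda x: aw[0] <= x <= aw[1])
    outside = band_average(wl_um, eps, temp_k, lambda x: not aw[0] <= x <= aw[1])
    reward = outside - inside
    return reward if reward >= threshold else -threshold


wl = [3.0 + 0.1 * i for i in range(111)]                 # 3-14 um (assumed)
step_eps = [0.18 if 8.0 <= x <= 13.0 else 0.79 for x in wl]
flat_eps = [0.5] * len(wl)
print(tc_reward(wl, step_eps), tc_reward(wl, flat_eps))
```

A spectrally flat emitter earns the penalty value because its inside and outside averages coincide, which is the behaviour the threshold rule is meant to discourage.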
The schematic of the resulting optimal structure and the corresponding scanning electron microscopy (SEM) image of the fabricated multilayer are shown in Fig. 3a. It can be seen that DQN finally chooses ZnS and Ge as the materials for the TC emitter. The thickness of each layer, both as designed and as obtained from the SEM image of the fabricated sample, is presented in Fig. 3a. The layer thicknesses in the optimal TC emitter are evidently irregular and aperiodic, which makes them difficult to arrive at through manual optimization. However, owing to limited manufacturing precision, the thicknesses of the fabricated sample deviate from the designed values, resulting in the discrepancy between their emissivity spectra depicted in Fig. 3b. In addition, the differences between the optical properties of the sputtered materials and the input parameters used in the numerical simulation also contribute. Nevertheless, both the designed and fabricated structures exhibit low emissivity within the AW and high emissivity outside the window. The calculated average normal emissivity of the simulated structure is 0.18 within the AW and 0.79 outside the AW, resulting in a reward value of 0.61. The excellent camouflage effect is attributed to low thermal emission in the AW (the band detected by IR cameras) and high emission outside the AW for further radiative cooling. For further verification, the normalized electric field intensities of the optimal structure at 6.65 μm and 8.93 μm are plotted in Fig. 3c. The intensity of the electric field at 8.93 μm decays heavily, which means that a forbidden band is formed in the AW, resulting in low absorption (and therefore low emissivity) in this band. In contrast, the intensity outside the AW remains relatively unchanged, resulting in high emissivity for the structure with the lossy SiO2 substrate. 
The emissivity of the optimal structure as a function of incident angle and wavelength is shown in Fig. 3d, indicating the angular independence of the excellent performance.
To demonstrate the efficiency of the optimization under the DQN framework, we quantitatively analyze the reward R as a function of the percentage of calculated structures. As shown in Fig. 4a, DQN calculated less than 0.2% of all the candidate structures to obtain 70% and 90% of the maximum reward, and only 4.428% of the structures to find the optimal structure for TC. As the optimization iterations progress, the emissivity within the AW decreases continuously, while the emissivity outside the window gradually increases, yielding a better camouflage effect. In addition, the material combinations of the structures achieving 70% and 90% of the maximum reward are the same as that of the optimal structure, as shown in Fig. S2, which indicates that DQN is capable of selecting appropriate materials rapidly and then performing the subsequent structural optimization. The parametric distribution curves of each layer thickness are presented in Fig. 4b, which indicate that the optimal layer thicknesses lie at the peaks of the curves. To further confirm the optimality of the structure obtained, we perform Bayesian optimization (BO) on the multilayer WS-TEs for TC under the fixed material combination, namely ZnS and Ge. Figure 4c illustrates the reward histories, showing that the maximum reward and corresponding structure found by BO are consistent with those obtained using DQN. However, the proposed DQN-based framework still demonstrates higher efficiency even though it optimizes both materials and structure. Further details on BO for the TC emitter are available in Supplementary Information Note 1.
Design and optimization of WS-TE for RC
For designing an RC emitter, the objective is to maximize the emissivity within the AW while minimizing it in the solar band, so as to achieve the maximum net outgoing power. The net power, also called the cooling power, can be expressed as
\[{P}_{{\rm{cooling}}}(T)={P}_{{\rm{rad}}}(T)-{P}_{{\rm{atm}}}({T}_{{\rm{amb}}})-{P}_{{\rm{sun}}}(\theta )-{P}_{{\rm{cond}}+{\rm{conv}}}(T,{T}_{{\rm{amb}}})\quad (4)\]
where \({P}_{{\rm{rad}}}\) is the output power from the RC emitter, \({P}_{{\rm{atm}}}\) is the input power from the atmospheric radiation, \({P}_{{\rm{sun}}}\) is the input power from the sun and \({P}_{{\rm{cond}}+{\rm{conv}}}\) describes the heat exchange between the RC emitter and the environment by conduction and convection. \(T\) and \({T}_{{\rm{amb}}}\) are the temperatures of the RC emitter and the ambient, respectively. \(\theta\) is the angle of solar radiation. A more detailed calculation method for each power term is provided in Supplementary Information Note 2. In the following calculation, the combined heat transfer coefficient in \({P}_{{\rm{cond}}+{\rm{conv}}}\) is set to \({h}_{{\rm{c}}}=5\,{\rm{W}}\cdot {{\rm{m}}}^{-2}\cdot {{\rm{K}}}^{-1}\) and the ambient temperature is kept at \({T}_{{\rm{amb}}}=25\,^\circ {\rm{C}}\) to simulate a light-breeze condition. The greater the cooling power, the better the performance of the designed RC emitter. However, the cooling power is not an intuitive reward, and it is difficult to set a suitable iteration threshold for it. Therefore, the reward R is set as the difference between the ambient temperature and the steady-state temperature (\({T}_{{\rm{steady}}}\)) of the RC emitter, namely the temperature drop below \({T}_{{\rm{amb}}}\). If \({P}_{{\rm{cooling}}}\) is positive at the initial temperature \({T}_{{\rm{init}}}\) (\({T}_{{\rm{init}}}\)=\({T}_{{\rm{amb}}}\)), the RC emitter starts to cool down. As the temperature of the RC emitter decreases, the cooling power \({P}_{{\rm{cooling}}}\) also decreases until \({P}_{{\rm{cooling}}}({T}_{{\rm{steady}}})=0\). At that point, the RC emitter reaches an equilibrium state and \({T}_{{\rm{steady}}}\) can be obtained from Eq. (4)27. Previous studies have shown that the temperature difference (\(\varDelta T={T}_{{\rm{amb}}}-{T}_{{\rm{steady}}}\)) can reach 8 °C or even higher8,36, so the iteration threshold is set to 5 °C. Similar to the previous design for TC, any reward R below 5 is forcibly set to −5. 
The optimization is also implemented for 5 times with 1000 iterations each.
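The steady-state temperature used as the RC reward is the root of the cooling-power balance, which can be found by bisection. The sketch below is a deliberately crude illustration: it replaces the spectral integrals of Supplementary Information Note 2 with a gray-body toy model, and the effective emissivity, atmospheric emissivity, solar absorptance, and solar irradiance are all hypothetical numbers rather than values from the paper. Only the heat transfer coefficient of 5 W·m⁻²·K⁻¹, the 25 °C ambient, and the threshold rule follow the text.

```python
# Sketch: solve P_cooling(T) = 0 by bisection and convert the temperature drop
# into the RC reward. The gray-body power model and its parameters are
# hypothetical stand-ins for the spectral integrals used in the paper.

SIGMA = 5.670374419e-8      # Stefan-Boltzmann constant, W m^-2 K^-4
T_AMB = 298.15              # 25 degC ambient, K
H_C = 5.0                   # heat transfer coefficient, W m^-2 K^-1


def toy_cooling_power(t, eps=0.6, eps_atm=0.8, a_sun=0.03, i_sun=1000.0):
    """Gray-body toy model of P_rad - P_atm - P_sun - P_cond+conv (assumed)."""
    p_rad = eps * SIGMA * t ** 4
    p_atm = eps * eps_atm * SIGMA * T_AMB ** 4
    p_sun = a_sun * i_sun
    p_cond_conv = H_C * (T_AMB - t)      # heat gained from warmer surroundings
    return p_rad - p_atm - p_sun - p_cond_conv


def steady_state_temperature(p_cooling, t_low=200.0, t_high=T_AMB, tol=1e-6):
    """Bisection; assumes p_cooling(t_low) < 0 < p_cooling(t_high)."""
    while t_high - t_low > tol:
        mid = 0.5 * (t_low + t_high)
        if p_cooling(mid) > 0.0:
            t_high = mid     # still cooling at mid: equilibrium lies below
        else:
            t_low = mid
    return 0.5 * (t_low + t_high)


def rc_reward(delta_t, threshold=5.0):
    """Temperature drop in K; sub-threshold values are replaced by -threshold."""
    return delta_t if delta_t >= threshold else -threshold


t_steady = steady_state_temperature(toy_cooling_power)
delta_t = T_AMB - t_steady
print(delta_t, rc_reward(delta_t))
```

Bisection is used here because the toy cooling power increases monotonically with emitter temperature, so the bracket always contains exactly one root.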
The design and optimization results of the RC emitter are presented in Fig. 5a. SiO2 and TiO2 are finally chosen as the materials for the optimal RC structure. The layer thicknesses of the optimal RC emitter are also irregular and aperiodic. The emissivity spectra of the designed and the fabricated structures are shown in Fig. 5b. It can be seen that the designed RC emitter exhibits near-zero emissivity in the solar spectrum band, allowing it to reflect the incident solar radiation. In contrast, a high emissivity is obtained within the AW, enabling it to radiate heat efficiently to outer space. Owing to the differences between the thicknesses of the fabricated sample and the designed values, their emissivity spectra are not completely consistent. The reward R of the optimal RC emitter is 16.99, which means that in theory it can stay 16.99 °C below the ambient temperature at thermal equilibrium. The cooling power at the initial temperature is 132.40 W/m2. Both the equilibrium temperature difference and the cooling power demonstrate the excellent performance of the designed RC emitter. The normalized electric field intensities of the optimal structure in the visible wavelength band and the AW are illustrated in Fig. 5c, indicating the strong reflection of the Ag substrate and the high emissivity resulting from the electric field enhancement, respectively. Furthermore, the emissivity spectra are angle-independent for angles below 80°, as shown in Fig. 5d.
The optimization process is quantitatively shown in Fig. 6a. In the early stage of optimization, the reward R increases sharply, which means that DQN can quickly identify suitable materials for the RC emitter and perform optimization under this material combination until the optimization process tends to be smooth (as shown in Fig. S3). The material combination of the structure yielding 50% of maximum reward is Si/SiO2, which indicates that DQN replaces Si with TiO2 to achieve better cooling performance, as shown in Fig. S3a. During the smooth optimization period, the thickness of each layer is continuously optimized to further enhance the radiative cooling performance. When calculating less than 2% of the candidate structures, the RC emitter could reach a temperature drop of 14.94 °C below the ambient temperature in a steady state. After 1000 iterations, only 6.31% of structures need to be calculated to find the structure for the RC emitter with the maximum reward. To further exhibit the details of the optimization, the parametric distribution curves of each layer thickness are shown in Fig. 6b. In addition, except for the material combination of the optimal RC emitter, other material combinations are shown in Fig. 6c. It can be seen that SiO2 and Si3N4 also exhibit potential as the materials of RC emitter, in addition to TiO2 and SiO2. The occurrence of less frequent material combinations can be explained by the random initialization of the DQN and the random selection of the EGE algorithm used in DQN.
Design and optimization of WS-TE for GS
In the final part of this work, we adopt DQN to tackle a more rigorous task, that is, to achieve peak emissivity at a fixed wavelength for GS. More specifically, the target is to obtain a narrow-band emission peak with a high emissivity at the wavelength of the absorption peak of the detected gas, while the emissivity at other wavelengths is zero to eliminate the impact of absorption by other gases. Here, we take carbon dioxide (CO2) as the target gas, which has an absorption peak at 4.26 μm. The reward is defined as the product of the emissivity at 4.26 μm and the q-factor of the narrow peak, as follows:
\[R=q\cdot {\varepsilon }_{{\rm{t}}}\quad (5)\]
where \(q\) ensures that a narrow-band emission peak is generated and \({\varepsilon }_{{\rm{t}}}\) ensures a high emissivity at the target wavelength, 4.26 μm. Maximizing the product of the two terms yields a GS emitter with a narrow-band emission peak that matches the carbon dioxide absorption peak. Based on preliminary trials, the iteration threshold is set to 2. The optimization was run 5 times with 1000 iterations each to obtain the optimal structure while reducing the influence of randomness.
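The GS reward can be evaluated from a sampled spectrum as sketched below. The text does not define how the q-factor is computed, so the sketch assumes the common definition of target wavelength divided by the full width at half maximum (FWHM); the Lorentzian test spectrum, its width, and the function names are likewise illustrative assumptions.

```python
# Sketch of the GS reward: q-factor times emissivity at the target wavelength.
# Assumes q = lambda_t / FWHM (the text does not state the definition) and a
# peak that drops below half maximum on both sides within the sampled range.

def _crossing(x0, y0, x1, y1, y):
    """Linear interpolation of the x at which the segment crosses level y."""
    return x0 + (y - y0) * (x1 - x0) / (y1 - y0)


def gs_reward(wl_um, eps, target_um=4.26):
    k = min(range(len(wl_um)), key=lambda i: abs(wl_um[i] - target_um))
    half = eps[k] / 2.0
    i = k
    while i > 0 and eps[i] > half:                 # walk left to half maximum
        i -= 1
    j = k
    while j < len(wl_um) - 1 and eps[j] > half:    # walk right to half maximum
        j += 1
    left = _crossing(wl_um[i], eps[i], wl_um[i + 1], eps[i + 1], half)
    right = _crossing(wl_um[j - 1], eps[j - 1], wl_um[j], eps[j], half)
    q = wl_um[k] / (right - left)
    return q * eps[k], q


# Lorentzian test peak with FWHM chosen so that q is about 60.64.
PEAK, FWHM, AMP = 4.26, 4.26 / 60.64, 0.9996
wl = [3.0 + 0.001 * n for n in range(3001)]        # 3-6 um, 1 nm steps
eps = [AMP / (1.0 + ((x - PEAK) / (FWHM / 2.0)) ** 2) for x in wl]
reward, q = gs_reward(wl, eps)
print(reward, q)
```

Multiplying by the peak emissivity prevents the search from favouring very narrow but weak peaks, which a q-factor alone would reward.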
As shown in Fig. 7a, Si and SiO2 are chosen by DQN as the materials of the GS emitter. The emissivity spectra of the optimized structure are shown in Fig. 7b. The simulation result shows that the optimized structure produces a sharp and high emissivity peak at 4.26 μm, and the emissivity outside the narrow band is close to zero. The corresponding emissivity of the peak is 0.9996, and the reward R of the structure is 60.62. The result shows that the designed WS-TE is well suited to CO2 sensing. Owing to the thickness deviation of the fabricated sample, the measured wavelength of the emissivity peak deviates from the target wavelength but remains within the CO2 absorption peak. The emission peak is located at 4.3 μm and the peak value is 0.905. Regrettably, the fabricated sample shows some residual emission outside the absorption peak, which can be attributed to the discrepancy in properties between the sputtered and simulated materials. Figure 7c displays the normalized electric field intensities of the optimal structure at 4.26 μm and 5 μm. Owing to the excitation of the localized Tamm plasmon state, the electric field intensity is significantly enhanced in a region 0.3 μm from the top of the substrate, resulting in peak emissivity at 4.26 μm. However, there is no notable enhancement of the electric field intensity at 5 μm, resulting in near-zero emissivity at this wavelength. The angle-dependent emissivity spectra are displayed in Fig. 7d. It can be seen that the angular independence holds only within 30°, but this does not affect gas sensing since the emitter typically faces the detected gas in the normal direction.
The optimization process of the GS emitter is presented in Fig. 8a. In the early stage of optimization, the emitter has only a small emissivity peak within the studied band and the wavelength of the emissivity peak deviates from 4.26 μm. As the iterations progress, suitable material combinations and optimized structural parameters lead to a higher and more pronounced emissivity peak that gradually shifts towards the target wavelength of 4.26 μm. Eventually, a near-perfect emissivity peak is achieved at 4.26 μm with a q-factor of 60.64. Further insights into the structure evolution during the optimization process can be obtained from Fig. S4. The distribution of each layer thickness and the material combinations are shown in Fig. 8b and c, respectively. Note the formation of peaks in the layer thickness distribution and the diversity of material combinations, indicating that appropriate material combinations are more important for achieving finer emissivity spectra. DQN successfully recognized this feature and efficiently implemented the design of the emitter with the help of the defined initialization method. Consequently, the combination of Si and SiO2 emerges as the most suitable choice in the material library for achieving the target emissivity spectrum for CO2 sensing.
Discussion
In summary, we present a general deep learning framework, i.e., DQN, for the emissivity engineering of WS-TEs across applications. To demonstrate its generality, three WS-TEs are designed for typical applications, namely TC, RC and GS. For each design target, DQN autonomously selects suitable materials from the same self-built material library and efficiently finds the best structural parameters in a huge optimization space. The three design tasks are based on the same structural framework, so they share the same material library and can be extended from one application to another simply by setting the corresponding reward function. The merits of the deep Q-learning algorithm are that it can (1) offer a general design framework for WS-TEs beyond one-dimensional multilayer structures, such as two-dimensional periodic arrays and more complicated structures; (2) autonomously select suitable materials from a self-built material library without presetting the initial materials; and (3) autonomously and efficiently optimize structural parameters for the target emissivity spectra across different applications. Additionally, the input parameters of the DQN framework are highly flexible in materials, structures, dimensions, and target functions, offering a general solution to other nonlinear optimization problems beyond emissivity engineering.
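The point that a new application only requires a new reward function can be illustrated with a minimal Python sketch. The reward definition used here (one minus the mean absolute deviation from a step-like target spectrum), the band edges, and the names `band_target` and `spectral_reward` are illustrative assumptions and are not the reward functions used in this work.

```python
import numpy as np

def band_target(wavelengths_um, bands):
    """Return a 0/1 target emissivity: 1 inside any (lo, hi) band, 0 elsewhere."""
    target = np.zeros_like(wavelengths_um, dtype=float)
    for lo, hi in bands:
        target[(wavelengths_um >= lo) & (wavelengths_um <= hi)] = 1.0
    return target

def spectral_reward(emissivity, target):
    """Illustrative reward: 1 minus the mean absolute deviation from the target."""
    emissivity = np.asarray(emissivity, dtype=float)
    return 1.0 - float(np.mean(np.abs(emissivity - target)))

wavelengths = np.linspace(3.0, 14.0, 1101)  # wavelength grid in micrometres

# Hypothetical band choices: only the target changes between applications,
# while the reward routine and the material library stay the same.
targets = {
    "RC": band_target(wavelengths, [(8.0, 13.0)]),  # atmospheric window
    "GS": band_target(wavelengths, [(4.2, 4.3)]),   # around the CO2 line near 4.26 um
}
```

In such a setup, the agent would be trained against `spectral_reward(simulated_emissivity, targets[task])`, so switching tasks amounts to switching the dictionary key.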
Materials and methods
Simulation
The reflection and transmission of the multilayer WS-TEs were calculated using the transfer matrix method based on the Fresnel equations. The emissivity was then obtained from the corresponding reflection and transmission according to the law of conservation of energy. The DQN code was written in Python using the Keras package in TensorFlow.
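As a minimal sketch of this calculation, the following Python function evaluates the reflectance R, transmittance T, and emissivity 1 − R − T of a thin-film stack at normal incidence with the characteristic-matrix form of the transfer matrix method. It is not the code used in this work: the function name, the restriction to normal incidence, the n + ik sign convention, and the constant refractive indices in the usage line are all assumptions made for illustration.

```python
import numpy as np

def multilayer_emissivity(wavelength, indices, thicknesses, n_in=1.0, n_out=1.0):
    """Normal-incidence reflectance, transmittance and emissivity of a stack.

    indices     : complex refractive indices n + ik (k >= 0 for loss), top layer first
    thicknesses : layer thicknesses in the same unit as `wavelength`
    n_in, n_out : indices of the (lossless) incident medium and of the exit medium
    """
    matrix = np.eye(2, dtype=complex)
    for n, d in zip(indices, thicknesses):
        delta = 2.0 * np.pi * n * d / wavelength  # complex phase thickness
        layer = np.array([[np.cos(delta), -1j * np.sin(delta) / n],
                          [-1j * n * np.sin(delta), np.cos(delta)]])
        matrix = matrix @ layer
    # Field amplitudes at the front surface for a unit field in the exit medium.
    b, c = matrix @ np.array([1.0, n_out], dtype=complex)
    denom = n_in * b + c
    reflectance = abs((n_in * b - c) / denom) ** 2
    transmittance = 4.0 * n_in * np.real(n_out) / abs(denom) ** 2
    # Energy conservation plus Kirchhoff's law: emissivity = absorptance.
    return reflectance, transmittance, 1.0 - reflectance - transmittance

# Placeholder, dispersion-free indices and thicknesses (in micrometres),
# for illustration only; they are not the tabulated data used in this work.
r, t, eps = multilayer_emissivity(4.26, [3.4 + 0.05j, 1.4 + 0.0j], [0.30, 0.75], n_out=1.5)
```

A full spectrum follows by looping over wavelengths with wavelength-dependent indices; oblique incidence would additionally require polarization-dependent admittances, which this sketch omits.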
Sample fabrication and preparation
The designed multilayer WS-TE samples were all deposited using a magnetron sputtering system (Kurt J. Lesker-VD75). The deposition rates of SiO2, TiO2, Ag, Si, W, Ge and ZnS were 2, 3, 6, 9, 3, 5, and 11 nm/min, respectively.
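With these rates, the sputtering time of each layer follows from dividing its thickness by the deposition rate. The short Python sketch below shows the arithmetic; the rates are those listed above, whereas the example stack and the function name `deposition_minutes` are hypothetical and do not correspond to any of the fabricated samples.

```python
# Deposition rates in nm/min, as listed above.
RATES_NM_PER_MIN = {"SiO2": 2, "TiO2": 3, "Ag": 6, "Si": 9, "W": 3, "Ge": 5, "ZnS": 11}

def deposition_minutes(stack):
    """Return per-layer sputtering times (min) and their total for (material, nm) pairs."""
    times = [thickness_nm / RATES_NM_PER_MIN[material] for material, thickness_nm in stack]
    return times, sum(times)

# Hypothetical two-layer stack, for illustration only.
example_stack = [("Si", 270.0), ("SiO2", 100.0)]
times, total = deposition_minutes(example_stack)  # 30 min + 50 min = 80 min
```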
Optical characterization
The infrared emissivity of the multilayer WS-TE samples was measured using a Fourier transform infrared spectrometer (Nicolet iN10, Thermo Scientific).
Data availability
The data that support the findings of this study are available from the corresponding authors upon request.
References
Baranov, D. G. et al. Nanophotonic engineering of far-field thermal emitters. Nat. Mater. 18, 920–930 (2019).
Li, W. & Fan, S. H. Nanophotonic control of thermal radiation for energy applications [Invited]. Opt. Express 26, 15995–16021 (2018).
Wang, J., Dai, G. L. & Huang, J. P. Thermal metamaterial: fundamental, application, and outlook. iScience 23, 101637 (2020).
Byrnes, S. J., Blanchard, R. & Capasso, F. Harvesting renewable energy from Earth’s mid-infrared emissions. Proc. Natl Acad. Sci. USA 111, 3927–3932 (2014).
Xu, J., Mandal, J. & Raman, A. P. Broadband directional control of thermal emission. Science 372, 393–397 (2021).
Xu, L. J., Dai, G. L. & Huang, J. P. Transformation multithermotics: controlling radiation and conduction simultaneously. Phys. Rev. Appl. 13, 024063 (2020).
Cho, J. W., Lee, E. J. & Kim, S. K. Radiative cooling technologies: a platform for passive heat dissipation. J. Korean Phys. Soc. 81, 481–489 (2022).
Hossain, M. M., Jia, B. H. & Gu, M. A metamaterial emitter for highly efficient radiative cooling. Adv. Opt. Mater. 3, 1047–1051 (2015).
Zhu, H. Z. et al. Multispectral camouflage for infrared, visible, lasers and microwave with radiative cooling. Nat. Commun. 12, 1805 (2021).
Xu, L. J., Wang, R. Z. & Huang, J. P. Camouflage thermotics: a cavity without disturbing heat signatures outside. J. Appl. Phys. 123, 245111 (2018).
He, M. Z. et al. Deterministic inverse design of Tamm plasmon thermal emitters with multi-resonant control. Nat. Mater. 20, 1663–1669 (2021).
Biehs, S. A., Tschikin, M. & Ben-Abdallah, P. Hyperbolic metamaterials as an analog of a blackbody in the near field. Phys. Rev. Lett. 109, 104301 (2012).
Hu, R. et al. Machine learning-optimized Tamm emitter for high-performance thermophotovoltaic system with detailed balance analysis. Nano Energy 72, 104687 (2020).
Yang, R. Z. & He, Y. Z. Optically and non-optically excited thermography for composites: a review. Infrared Phys. Technol. 75, 26–50 (2016).
Bai, B. J. et al. To image, or not to image: class-specific diffractive cameras with all-optical erasure of undesired objects. eLight 2, 14 (2022).
Cen, Z. H. et al. Optical property study of FePt-C nanocomposite thin film for heat-assisted magnetic recording. Opt. Express 21, 9906–9914 (2013).
Liu, T. J. et al. Thermal photonics with broken symmetries. eLight 2, 25 (2022).
Liu, M. Q. et al. Broadband mid-infrared non-reciprocal absorption using magnetized gradient epsilon-near-zero thin films. Nat. Mater. 22, 1196–1202 (2023).
Greffet, J. J. et al. Coherent emission of light by thermal sources. Nature 416, 61–64 (2002).
Liu, B. A. et al. Perfect thermal emission by nanoscale transmission line resonators. Nano Lett. 17, 666–672 (2017).
De Zoysa, M. et al. Conversion of broadband to narrowband thermal emission through energy recycling. Nat. Photonics 6, 535–539 (2012).
Ying, Y. B. et al. Whole LWIR directional thermal emission based on ENZ thin films. Laser Photonics Rev. 16, 2200018 (2022).
Kim, Y. B. et al. High-index-contrast photonic structures: a versatile platform for photon manipulation. Light Sci. Appl. 11, 316 (2022).
Yue, Y. F. & Gong, J. P. Tunable one-dimensional photonic crystals from soft materials. J. Photochem. Photobiol. C Photochem. Rev. 23, 45–67 (2015).
Pan, M. Y. et al. Multi-band middle-infrared-compatible camouflage with thermal management via simple photonic structures. Nano Energy 69, 104449 (2020).
Kim, J., Park, C. & Hahn, J. W. Metal–semiconductor–metal metasurface for multiband infrared stealth technology using camouflage color pattern in visible range. Adv. Opt. Mater. 10, 2101930 (2022).
Sheng, C. X. et al. Colored radiative cooler under optical Tamm resonance. ACS Photonics 6, 2545–2552 (2019).
Yao, K. Q. et al. Near-perfect selective photonic crystal emitter with nanoscale layers for daytime radiative cooling. ACS Appl. Nano Mater. 2, 5512–5519 (2019).
Zhu, Y. N. et al. Color-preserving passive radiative cooling for an actively temperature-regulated enclosure. Light Sci. Appl. 11, 122 (2022).
Xu, H. et al. Photonic crystal for gas sensing. J. Mater. Chem. C 1, 6087–6098 (2013).
Xi, W. et al. High-throughput screening of a high-Q mid-infrared Tamm emitter by material informatics. Opt. Lett. 46, 888–891 (2021).
Yang, Z. Y. et al. Narrowband wavelength selective thermal emitters by confined Tamm plasmon polaritons. ACS Photonics 4, 2212–2219 (2017).
Hu, R. et al. Thermal camouflaging metamaterials. Mater. Today 45, 120–141 (2021).
Peng, L. et al. A multilayer film based selective thermal emitter for infrared stealth technology. Adv. Opt. Mater. 6, 1801006 (2018).
Zhu, H. Z. et al. High-temperature infrared camouflage with efficient thermal management. Light Sci. Appl. 9, 60 (2020).
Fan, S. H. & Li, W. Photonics and thermodynamics concepts in radiative cooling. Nat. Photonics 16, 182–190 (2022).
Raman, A. P. et al. Passive radiative cooling below ambient air temperature under direct sunlight. Nature 515, 540–544 (2014).
Ma, H. C. et al. Multilayered SiO2/Si3N4 photonic emitter to achieve high-performance all-day radiative cooling. Sol. Energy Mater. Sol. Cells 212, 110584 (2020).
Sakurai, A. et al. Ultranarrow-band wavelength-selective thermal emission with aperiodic multilayered metamaterials designed by Bayesian optimization. ACS Cent. Sci. 5, 319–326 (2019).
Hu, R. et al. Machine-learning-optimized aperiodic superlattice minimizes coherent phonon heat conduction. Phys. Rev. X 10, 021050 (2020).
Wang, Q. X. et al. Module-level polaritonic thermophotovoltaic emitters via hierarchical sequential learning. Nano Lett. 23, 1144–1151 (2023).
Kim, S. et al. High-performance transparent radiative cooler designed by quantum computing. ACS Energy Lett. 7, 4134–4141 (2022).
Molesky, S. et al. Inverse design in nanophotonics. Nat. Photonics 12, 659–670 (2018).
Ma, W. et al. Deep learning for the design of photonic structures. Nat. Photonics 15, 77–90 (2021).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Wang, H. Z. et al. Automated multi-layer optical design via deep reinforcement learning. Mach. Learn. Sci. Technol. 2, 025013 (2021).
Sajedian, I., Badloe, T. & Rho, J. Optimization of colour generation from dielectric nanostructures using reinforcement learning. Opt. Express 27, 5874–5883 (2019).
Xi, W. et al. Ultrahigh-efficient material informatics inverse design of thermal metamaterials for visible-infrared-compatible camouflage. Nat. Commun. 14, 4694 (2023).
Palik, E. D. Handbook of optical constants of solids (Academic Press, 1997).
Querry, M. R. Optical constants of minerals and other materials from the millimeter to the ultraviolet. https://apps.dtic.mil/sti/citations/ADA192210 (1987).
Siefke, T. et al. Materials pushing the application limits of wire grid polarizers further into the deep ultraviolet spectral range. Adv. Opt. Mater. 4, 1780–1786 (2016).
Yang, H. U. et al. Optical dielectric function of silver. Phys. Rev. B 91, 235137 (2015).
Van Hasselt, H., Guez, A. & Silver, D. Deep reinforcement learning with double Q-learning. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI Press, 2016).
Acknowledgements
The authors would like to acknowledge the financial support from the National Natural Science Foundation of China (52211540005, 52076087, 52161160332), the Natural Science Foundation of Hubei Province (2023AFA072), the Open Project Program of Wuhan National Laboratory for Optoelectronics (2021WNLOKF004), the Wuhan City Science and Technology Program (2020010601012197), and the Knowledge Innovation Shuguang Program. W.L. acknowledges the financial support from the Key Research and Development Plan of Hubei Province (2021BGE037). J.S. acknowledges the financial support from JSPS Bilateral Joint Research Projects (120227404).
Author information
Contributions
S.Y. and R.H. conceived the study. S.Y., W.X., Z.C., and R.H. developed the code and performed numerical simulations. P.Z., Y.D., and W.L. performed the experiment. S.Y. and R.H. wrote the manuscript and analyzed the results. X.L., J.S., W.L., and R.H. revised the manuscript. W.L. and R.H. supervised the study. All the authors provided feedback and contributed to the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yu, S., Zhou, P., Xi, W. et al. General deep learning framework for emissivity engineering. Light Sci Appl 12, 291 (2023). https://doi.org/10.1038/s41377-023-01341-w
DOI: https://doi.org/10.1038/s41377-023-01341-w