Introduction

Metasurfaces can manipulate the phase and spectrum of impinging light1,2,3,4,5,6,7,8. The geometrical properties of metasurfaces can be tuned to change the phase of light for desired applications. Many applications have emerged from this idea, including ultrathin lenses9,10, vortex beam generators11,12 and holograms13,14,15,16,17,18,19,20,21. Different designs have been proposed to increase the holograms’ efficiency18,19,20,21; some used metallic structures but the efficiency in visible wavelengths was low because of metals’ intrinsic loss22,23,24,25. Some efficient designs work only for reflected light26.

The incident light’s polarization must be considered when a metasurface hologram is designed. Some of the proposed metasurfaces are noncircular18,27,28,29 so they only work for a specific polarization. Some are highly efficient and work independently of polarization, but they do not work in visible light20. The ideal structure would generate the whole phase map, work with transmitted light, work in the visible regime, be independent of polarization, and have high efficiency. No structure has yet satisfied all these conditions simultaneously.

The task of finding the best geometrical parameters and choosing the right materials for a structure is always a challenge. Recently, researchers have used neural networks (NNs) to design nanophotonic structures30,31,32,33,34,35,36, and to design chiral metamaterials37. The double deep Q-learning network (DDQN)38 has been used to find the optimized parameters for a photonic structure39. Here we use DDQN to optimize the design of a metasurface holograms. This method is like an intelligent sweep, which learns how to efficiently explore the given parameter space to reach the highest reward in the lowest time, and it can be used to optimize a given physical structure.

We would like to clarify why using DDQN (which belongs to the family of reinforcement learning (RL) methods) is much more efficient (or probably the only way) than using classic optimization methods like Monte-Carlo search, swarm intelligence, genetic algorithms or Bayesian method as a few examples. In neural networks, a number of hidden layers of nodes connect the input data to the output data. This connection is then optimized by an optimization method. This way the network not only optimizes the problem but also learns from it. To compare this to a human mind, assume that we are getting a reward for each step that we take. For example if we get 10 points for the first step and 20 points for the second step, we immediately learn that to get to 100 points we need to take 10 steps, and no further actions is needed to be taken. In a same way, an RL method optimizes a problem by learning from it and not just finding a maxima or minima in a mathematical space (like optimization methods). This leads to a much more efficient optimization method and in some cases the only method that can find the solution for a complex problem.

A detailed benchmark of DDQN compared to other methods is provided by other researchers38. We explained the details of this method and how it can be used in optics and its benefits previously39. Reinforcement learning was also used for identifying variational protocols in quantum physics40 as an optimizing mechanism.

Methods

We can tune the phase of the incident light by changing the geometrical properties of the metasurfaces. The metasurface consists of nano-antennas with dimensions much smaller than the operating wavelength, so the thin layer of metasurface acts as a homogeneous medium with different refractive index n than the surrounding medium. The function of this thin layer is to apply a small phase delay to the light that is passing through it26. n of this thin layer can be controlled by changing the geometrical properties of the nano-antennas; the change in n leads to different phase delays by different structures. So each structure creates a phase delay. This way we can create a phase map (which is a collection of different phase delays from −π to π) by combining different structures.

Some authors use non circular Nano-antennas (like V-shaped25, rectangular22,…) to produce the required phase map. As an example, V-shaped Nano-antennas can easily be tuned by changing the length of each antenna or by changing the angle between them, to create the required phase delay. But they only work for a specific polarization. A structure that can produce a polarization independent phase delay should be cylindrical. But the problem with cylindrical structures is that they can only be tuned by their radius to generate the required phase delay, compared to non-circular shaped structures (since the thickness should be kept constant for manufacturing limitations). This makes it very hard to find the right structure which can generate the whole phase map and at the same time have a high transmission efficiency. Adding lattice constant and material type variables (which is common between all holograms) to this problem makes it very hard for human researchers to check all the possibilities to find the optimum structures. Here we use an AI code to help us find the optimum structure.

Structure definition

Here we try to find the optimal structure type to achieve high efficiency in the visible range for transmission-type holograms. The metasurface structure that we chose for this idea is a nano-disk laying on a thin film that is laying upon a grating, all on a glass substrate (Fig. 1(a)). Having this structure as the starting point covers many possibilities. The combination of the grating with nano-disks can increase transmission in metallic metamaterials by forming a structure that is similar to a Fabry-Perot cavity41. Using the circular shape for nanoantennas makes the hologram independent of polarization. All geometrical properties (except disk radius) and material types will be found using the DDQN. The important factor here is that the DDQN decides to use the starting structure as it is or to change it (removing grating, thin film, or both).

Figure 1
figure 1

Schematic of application of DDQN algorithm to find high-efficiency holograms. (a) Initial structure used as the input DDQN at each step. Abbreviations are defined in the text and in Table 1. (b) Structure of DDQN method used to optimize the metasurface hologram. (c) The optimal structure found by DDQN with a high efficiency for generating holograms. The DDQN determined that no grating or film is required, by setting the coverage of the grating to zero and the material of the film as glass.

DDQN structure

DDQN can be used to optimize a physical structure, as described previously39. Briefly, based on the given and future rewards, DDQN tries to connect the state of the structure to the action that should be taken (Fig. 1(b)). In DDQN we have two neural network models. A main model and an auxiliary model. The auxiliary model is used to update the main model’s weights, and the main model is used to predict the actions. These models had 3 hidden layers with 24, 48, 24 neurons each, with an Adam optimizer with a learning rate of 0.00539. A Markov decision process42 is used to predict the actions.

The model creates some data for itself by initial guessing at the beginning and by doing some actions (the model creates some data by itself from what it learned so far) as the code progresses, and all of these data are saved as an experience replay. This experience replay keeps getting updated as the model progresses (the old data is replaced by new data) so the model learns from the newly generated data. In other words, the model is training on data that is continuously updated.

An epsilon-greedy method is used to create the initial database. This method determines when the guessing should finish and the learning should start. To do this we define an epsilon function starting from 0.95 to 0.1 with a decay rate of 0.995 as shown in Fig. 2. At each step, a random number is generated by the code. If the generated random number was lower than epsilon, then the model guesses the next action randomly (known as exploration), and if it was higher than epsilon the model predicts the next action by what it learned so far. At each step, the epsilon decays until it reaches 0.1 (this assures that the model always has a 10 percent chance of exploration).

Figure 2
figure 2

Epsilon greedy method. At each step, a random number is generated by the model. If the number was lower than epsilon in that step (exploration area) the model chooses a random action and if it was higher than epsilon in that step (exploitation area) the model chooses an action based on what it learned.

At each step, the DDQN changes a geometrical property or material type of the structure. Based on the given feedback from the simulating environment it learns the effect of the change it made, and so it learns how to act better in future. The model consists of three parts: (1) the state of the structure at each step; (2) the action that should be taken to change the geometrical properties of the structure at each step; (3) a reward system that awards or penalizes the model for the action that it chose.

The state of the structure is composed of the geometrical properties and material types of the structure at each step:

  • Nano-disk material type (D_M): 23 different materials.

  • Thin film material type (F_M): 23 different materials.

  • Grating material type (G_M): 23 different materials.

  • Nano disk thickness (D_H): 10 nm–250 nm, step size: 10 nm; number of steps: 24

  • Film thickness (F_H): 10 nm–150 nm, step size: 10 nm; number of steps: 14

  • Grating thickness (G_H): 10 nm–150 nm, step size: 10 nm; number of steps: 14

  • Grating coverage (G_C) in each unit cell: 0–100%; number of steps: 10

  • The spacing between disks (L): 20 nm–120 nm, step size: 10 nm; number of steps: 10

The total number of possible states = 23 × 23 × 23 × 24 × 14 × 14 × 10 × 10 = 5,723,356,800.

We did not include the disk’s radius in the parameters, because it is used to evaluate the structure’s ability to produce the needed phase map. At each state, a separate loop is performed on the disk’s radius between 45 nm to 190 nm and the phases generated by different radii are computed and saved. If the phase range generated by the structure is big enough for holographic uses, the structure is considered as a candidate for optimization by the DDQN. All of these processes are performed in the reward system.

Action definitions

The next step is to define the actions, which determine what the model should do at each step. To change the material of the disks, film, and grating we represented 23 materials in a matrix (Table 1). If the model wants to change the material of one of the parts, it simply changes the index of the material matrix of that specific part. Two actions are defined for changing the material of each part: one to increase the material’s matrix index and one to decrease it. The definitions of actions are shown in Table 2.

Table 1 List of materials used for disks, thin film, and grating.
Table 2 Definitions of actions used in DDQN.

Reward system

The final step is to define a reward system. It gives feedback to the model at each step, so it learns to improve its actions in future steps. We designed the reward system to give the highest feedback to the phase-generating property of the structure, and lesser feedback to its efficiency; i.e., the model prioritizes the structures that increase the phase map, then considers their efficiencies. To do this we divided the range of −π to π into six equal parts. A model gets 100 points for finding a structure that generates one of these parts, so in total, a model can get 600 points for finding a structure that can generate the whole phase map. A model gets additional points for the minimum transmitted power of the structure times 100. For example, if a structure generates four phase parts and has a minimum transmitted power of 0.25 it will get 4 × 100 + 0.25 × 100 = 425 points. This way, a structure that generates a large number of phase parts will be preferred by the model compared to a structure with a lower number of generated phase parts regardless of the structure’s efficiency. We choose this scheme because we seek a structure that can generate the whole phase map. The scheme also sets the terminating reward of the structure as 700 as is needed for DDQN model. A score of 700 means that the model has reached its ideal structure and should stop looking for new structures. The final found structure by DDQN is shown in (Fig. 1(c)).

Generating the hologram

Now we discuss how the found structure can be used to generate holograms. The first step is to find the phase that the structure generates. This procedure is performed by calculating the S-parameter, which shows the generated phase. We generated the whole phase range of [−π, π] while changing the radius of the disk from 45 nm to 190 nm. The radius affected the phase and amplitude of the S-parameter of the transmitted light (Fig. 3(a)). It also affects the transmission (Fig. 3(b)), which will be used for calculating hologram’s efficiency.

Figure 3
figure 3

Physical results of the optimal structure found by DDQN. (a) Phases and amplitudes of the S-parameter for transmitted light generated by disks with different radii. This information is used to create the phase map needed for generating the hologram. (b) Effect of disk radius on transmitted power, which is used to determine the hologram’s efficiency.

To generate the hologram, we need the phase map of our desired image. To find the phase map of a given image, we used an algorithm43 that creates a numerical phase map from a given image. Once we have the phase map, the next step is to construct the phase map by metasurfaces (Fig. 4(a)). The phase map is a matrix of numbers. We replace each phase by its corresponding diameter (Fig. 3(a)); this process yields a matrix of diameters by which we can construct an array of metasurfaces, and create the needed phase map and also calculate the efficiency of the hologram (Fig. 3(b)). The full phase map contains all the phases. A Fourier transform of the phase map generates a hologram (Fig. 4(b)). This procedure is done physically by using a Fourier transform lens44 or done numerically by applying a Fourier transform to the phase map matrix.

Figure 4
figure 4

Procedure to generate holograms from metasurfaces. (a) Transforming phase map to phase matrix, diameter matrix and transmission matrix for constructing the final structure and calculating transmission efficiency. This is shown for a 3 × 3 pixel subsection of the phase map. (b) Converting phase map to hologram.

To generate the whole phase map, we need all radii from 45 nm to 190 nm. However, fabricating radii with a 1 nm is precision is impossible in practice, so we chose only some of them. This process is known as an m-level phase map, in which m represents the number of chosen radii. For example, m = 6 means that the phase map has 6 levels or in other words only 6 radii. This procedure leads to loss of data and decreases the quality of the recovered image (Fig. 5(a–d)). Only 6 cylinders were used to calculate the final phase map and since the average of transmission power of those cylinders were higher than the average of transmission power of all the cylinders, the efficiency of 6-level phase map is higher in this case.

Figure 5
figure 5

Numerically-generated holograms and transmission efficiencies of the structure found by DDQN. (a) Desired image and its full phase map, and recovered image from the full phase map. The desired image and its 30-level (b), 16-level (c) and 6-level (d) phase map and their corresponding images recovered by Fourier transform, and the calculated transmission efficiencies. Increase in the level of the phase map, increases the image quality and decreases the transmission efficiency.

Results

The simulations were done at 532 nm (green) to be compatible with most recent experimental work in the visible range21,23,45. The numerical simulations were performed in Lumerical and the machine learning codes are written in Python. A machine with a 16-core 3.40 GHz processor, 64 GB of RAM, and a NVIDIA GTX 1080ti GPU with 11GB DDR5X RAM was used. Although the number of states was ~5.7 billion, DDQN found the optimal results in only 2169 steps. As can be seen the model could find the results pretty fast. This may imply that the model just found this result by random guessing. As can be seen from Fig. 2 after step 400 the model predicts the actions just by learning and only 10 percent of the actions are done by guessing. It should be noted that there might be better answers than what the model found. So based on how long we let the model run, how the good the rival results are, initial conditions or complexity of the problem, it may take longer or shorter to find the optimum results.

It took a month for coding and running the model. The time is variable for different problems based on their complexity.

The final structure is as follows:

  • Nano-disk material type (D_M): 19 (Indium phosphide) (Table 2)

  • Thin film material type (F_M): 23 (SiO2 Glass)

  • Grating material type (G_M): 23 (SiO2 Glass)

  • Nano disk thickness (D_H): 190 nm

  • Film thickness (F_H): 10 nm

  • Grating thickness (G_H): 20 nm

  • Grating coverage (G_C) in each unit cell: 0\%

  • The spacing between disks (L): 70 nm

  • Minimum transmitted power: 0.16

We can compute the efficiency of the hologram by calculating the average transmitted power from the phase map. We have the transmission for each of the radii (Fig. 3(b)), so by counting the number of each of the radii used in the phase map and calculating the average of transmission power, we can estimate the efficiency of the corresponding phase map (Fig. 5(a)). This is a rough estimation for two reasons. First the coupling between adjacent cylinders should be considered, and second, each image will have its own phase map and so different setup of cylinders is used for each image which results in different transmission efficiencies. So the only way to correctly find the transmission efficiency of a hologram is by fabricating it, and each image will have its own efficiency21. But this method gives us an approximate estimation of the average transmission efficiency as is shown in Fig. 4.

The computed transmission efficiency was 32% for a high-quality recovered image (Fig. 5(a)). Compared to the structures with the same properties (transmission type, polarization independent, and in visible regime) our proposed structure’s transmission efficiency is two times higher than21 with 17% transmission efficiency (theoretical) (6% experimental), and much higher than other similar work24 with <1% transmission efficiency(experimental). It should be noted that what we computed here is the total transmission efficiency (that is defined as the ratio of image intensity to the total power of incident light) and it shouldn’t be confused with diffraction efficiency (that is defined as the ratio of image intensity to the total power of hologram plane24). In diffraction efficiency, the source monitor is placed after the hologram (unlike the transmission efficiency in which the source monitor is placed before hologram) and so the efficiency is much higher compared to transmission efficiency, since the effect of hologram is neglected. A comparison of our hologram’s transmission efficiency with some of the previously reported results is shown in Table 3.

Table 3 Comparison of our hologram’s efficiency with some of the previously reported results.

Conclusion

Here, we used double deep Q-learning to optimize a hologram structure to increase its efficiency. The DDQN model optimized the geometrical properties and also found the best material types for the structure. The hologram structure reported here is transmission type, works in the visible range and is independent of polarization. The previously reported structures with these properties had a maximum of 17% transmission efficiency, but our AI code could find a structure that had a 32% transmission efficiency while yielding a high-quality output.