Abstract
We use a double deep Qlearning network (DDQN) to find the right material type and the optimal geometrical design for metasurface holograms to reach high efficiency. The DDQN acts like an intelligent sweep and could identify the optimal results in ~5.7 billion states after only 2169 steps. The optimal results were found between 23 different material types and various geometrical properties for a threelayer structure. The computed transmission efficiency was 32% for highquality metasurface holograms; this is two times bigger than the previously reported results under the same conditions. The found structure is transmissiontype and polarizationindependent and works in the visible region.
Introduction
Metasurfaces can manipulate the phase and spectrum of impinging light^{1,2,3,4,5,6,7,8}. The geometrical properties of metasurfaces can be tuned to change the phase of light for desired applications. Many applications have emerged from this idea, including ultrathin lenses^{9,10}, vortex beam generators^{11,12} and holograms^{13,14,15,16,17,18,19,20,21}. Different designs have been proposed to increase the holograms’ efficiency^{18,19,20,21}; some used metallic structures but the efficiency in visible wavelengths was low because of metals’ intrinsic loss^{22,23,24,25}. Some efficient designs work only for reflected light^{26}.
The incident light’s polarization must be considered when a metasurface hologram is designed. Some of the proposed metasurfaces are noncircular^{18,27,28,29} so they only work for a specific polarization. Some are highly efficient and work independently of polarization, but they do not work in visible light^{20}. The ideal structure would generate the whole phase map, work with transmitted light, work in the visible regime, be independent of polarization, and have high efficiency. No structure has yet satisfied all these conditions simultaneously.
The task of finding the best geometrical parameters and choosing the right materials for a structure is always a challenge. Recently, researchers have used neural networks (NNs) to design nanophotonic structures^{30,31,32,33,34,35,36}, and to design chiral metamaterials^{37}. The double deep Qlearning network (DDQN)^{38} has been used to find the optimized parameters for a photonic structure^{39}. Here we use DDQN to optimize the design of a metasurface holograms. This method is like an intelligent sweep, which learns how to efficiently explore the given parameter space to reach the highest reward in the lowest time, and it can be used to optimize a given physical structure.
We would like to clarify why using DDQN (which belongs to the family of reinforcement learning (RL) methods) is much more efficient (or probably the only way) than using classic optimization methods like MonteCarlo search, swarm intelligence, genetic algorithms or Bayesian method as a few examples. In neural networks, a number of hidden layers of nodes connect the input data to the output data. This connection is then optimized by an optimization method. This way the network not only optimizes the problem but also learns from it. To compare this to a human mind, assume that we are getting a reward for each step that we take. For example if we get 10 points for the first step and 20 points for the second step, we immediately learn that to get to 100 points we need to take 10 steps, and no further actions is needed to be taken. In a same way, an RL method optimizes a problem by learning from it and not just finding a maxima or minima in a mathematical space (like optimization methods). This leads to a much more efficient optimization method and in some cases the only method that can find the solution for a complex problem.
A detailed benchmark of DDQN compared to other methods is provided by other researchers^{38}. We explained the details of this method and how it can be used in optics and its benefits previously^{39}. Reinforcement learning was also used for identifying variational protocols in quantum physics^{40} as an optimizing mechanism.
Methods
We can tune the phase of the incident light by changing the geometrical properties of the metasurfaces. The metasurface consists of nanoantennas with dimensions much smaller than the operating wavelength, so the thin layer of metasurface acts as a homogeneous medium with different refractive index n than the surrounding medium. The function of this thin layer is to apply a small phase delay to the light that is passing through it^{26}. n of this thin layer can be controlled by changing the geometrical properties of the nanoantennas; the change in n leads to different phase delays by different structures. So each structure creates a phase delay. This way we can create a phase map (which is a collection of different phase delays from −π to π) by combining different structures.
Some authors use non circular Nanoantennas (like Vshaped^{25}, rectangular^{22},…) to produce the required phase map. As an example, Vshaped Nanoantennas can easily be tuned by changing the length of each antenna or by changing the angle between them, to create the required phase delay. But they only work for a specific polarization. A structure that can produce a polarization independent phase delay should be cylindrical. But the problem with cylindrical structures is that they can only be tuned by their radius to generate the required phase delay, compared to noncircular shaped structures (since the thickness should be kept constant for manufacturing limitations). This makes it very hard to find the right structure which can generate the whole phase map and at the same time have a high transmission efficiency. Adding lattice constant and material type variables (which is common between all holograms) to this problem makes it very hard for human researchers to check all the possibilities to find the optimum structures. Here we use an AI code to help us find the optimum structure.
Structure definition
Here we try to find the optimal structure type to achieve high efficiency in the visible range for transmissiontype holograms. The metasurface structure that we chose for this idea is a nanodisk laying on a thin film that is laying upon a grating, all on a glass substrate (Fig. 1(a)). Having this structure as the starting point covers many possibilities. The combination of the grating with nanodisks can increase transmission in metallic metamaterials by forming a structure that is similar to a FabryPerot cavity^{41}. Using the circular shape for nanoantennas makes the hologram independent of polarization. All geometrical properties (except disk radius) and material types will be found using the DDQN. The important factor here is that the DDQN decides to use the starting structure as it is or to change it (removing grating, thin film, or both).
DDQN structure
DDQN can be used to optimize a physical structure, as described previously^{39}. Briefly, based on the given and future rewards, DDQN tries to connect the state of the structure to the action that should be taken (Fig. 1(b)). In DDQN we have two neural network models. A main model and an auxiliary model. The auxiliary model is used to update the main model’s weights, and the main model is used to predict the actions. These models had 3 hidden layers with 24, 48, 24 neurons each, with an Adam optimizer with a learning rate of 0.005^{39}. A Markov decision process^{42} is used to predict the actions.
The model creates some data for itself by initial guessing at the beginning and by doing some actions (the model creates some data by itself from what it learned so far) as the code progresses, and all of these data are saved as an experience replay. This experience replay keeps getting updated as the model progresses (the old data is replaced by new data) so the model learns from the newly generated data. In other words, the model is training on data that is continuously updated.
An epsilongreedy method is used to create the initial database. This method determines when the guessing should finish and the learning should start. To do this we define an epsilon function starting from 0.95 to 0.1 with a decay rate of 0.995 as shown in Fig. 2. At each step, a random number is generated by the code. If the generated random number was lower than epsilon, then the model guesses the next action randomly (known as exploration), and if it was higher than epsilon the model predicts the next action by what it learned so far. At each step, the epsilon decays until it reaches 0.1 (this assures that the model always has a 10 percent chance of exploration).
At each step, the DDQN changes a geometrical property or material type of the structure. Based on the given feedback from the simulating environment it learns the effect of the change it made, and so it learns how to act better in future. The model consists of three parts: (1) the state of the structure at each step; (2) the action that should be taken to change the geometrical properties of the structure at each step; (3) a reward system that awards or penalizes the model for the action that it chose.
The state of the structure is composed of the geometrical properties and material types of the structure at each step:

Nanodisk material type (D_M): 23 different materials.

Thin film material type (F_M): 23 different materials.

Grating material type (G_M): 23 different materials.

Nano disk thickness (D_H): 10 nm–250 nm, step size: 10 nm; number of steps: 24

Film thickness (F_H): 10 nm–150 nm, step size: 10 nm; number of steps: 14

Grating thickness (G_H): 10 nm–150 nm, step size: 10 nm; number of steps: 14

Grating coverage (G_C) in each unit cell: 0–100%; number of steps: 10

The spacing between disks (L): 20 nm–120 nm, step size: 10 nm; number of steps: 10
The total number of possible states = 23 × 23 × 23 × 24 × 14 × 14 × 10 × 10 = 5,723,356,800.
We did not include the disk’s radius in the parameters, because it is used to evaluate the structure’s ability to produce the needed phase map. At each state, a separate loop is performed on the disk’s radius between 45 nm to 190 nm and the phases generated by different radii are computed and saved. If the phase range generated by the structure is big enough for holographic uses, the structure is considered as a candidate for optimization by the DDQN. All of these processes are performed in the reward system.
Action definitions
The next step is to define the actions, which determine what the model should do at each step. To change the material of the disks, film, and grating we represented 23 materials in a matrix (Table 1). If the model wants to change the material of one of the parts, it simply changes the index of the material matrix of that specific part. Two actions are defined for changing the material of each part: one to increase the material’s matrix index and one to decrease it. The definitions of actions are shown in Table 2.
Reward system
The final step is to define a reward system. It gives feedback to the model at each step, so it learns to improve its actions in future steps. We designed the reward system to give the highest feedback to the phasegenerating property of the structure, and lesser feedback to its efficiency; i.e., the model prioritizes the structures that increase the phase map, then considers their efficiencies. To do this we divided the range of −π to π into six equal parts. A model gets 100 points for finding a structure that generates one of these parts, so in total, a model can get 600 points for finding a structure that can generate the whole phase map. A model gets additional points for the minimum transmitted power of the structure times 100. For example, if a structure generates four phase parts and has a minimum transmitted power of 0.25 it will get 4 × 100 + 0.25 × 100 = 425 points. This way, a structure that generates a large number of phase parts will be preferred by the model compared to a structure with a lower number of generated phase parts regardless of the structure’s efficiency. We choose this scheme because we seek a structure that can generate the whole phase map. The scheme also sets the terminating reward of the structure as 700 as is needed for DDQN model. A score of 700 means that the model has reached its ideal structure and should stop looking for new structures. The final found structure by DDQN is shown in (Fig. 1(c)).
Generating the hologram
Now we discuss how the found structure can be used to generate holograms. The first step is to find the phase that the structure generates. This procedure is performed by calculating the Sparameter, which shows the generated phase. We generated the whole phase range of [−π, π] while changing the radius of the disk from 45 nm to 190 nm. The radius affected the phase and amplitude of the Sparameter of the transmitted light (Fig. 3(a)). It also affects the transmission (Fig. 3(b)), which will be used for calculating hologram’s efficiency.
To generate the hologram, we need the phase map of our desired image. To find the phase map of a given image, we used an algorithm^{43} that creates a numerical phase map from a given image. Once we have the phase map, the next step is to construct the phase map by metasurfaces (Fig. 4(a)). The phase map is a matrix of numbers. We replace each phase by its corresponding diameter (Fig. 3(a)); this process yields a matrix of diameters by which we can construct an array of metasurfaces, and create the needed phase map and also calculate the efficiency of the hologram (Fig. 3(b)). The full phase map contains all the phases. A Fourier transform of the phase map generates a hologram (Fig. 4(b)). This procedure is done physically by using a Fourier transform lens^{44} or done numerically by applying a Fourier transform to the phase map matrix.
To generate the whole phase map, we need all radii from 45 nm to 190 nm. However, fabricating radii with a 1 nm is precision is impossible in practice, so we chose only some of them. This process is known as an mlevel phase map, in which m represents the number of chosen radii. For example, m = 6 means that the phase map has 6 levels or in other words only 6 radii. This procedure leads to loss of data and decreases the quality of the recovered image (Fig. 5(a–d)). Only 6 cylinders were used to calculate the final phase map and since the average of transmission power of those cylinders were higher than the average of transmission power of all the cylinders, the efficiency of 6level phase map is higher in this case.
Results
The simulations were done at 532 nm (green) to be compatible with most recent experimental work in the visible range^{21,23,45}. The numerical simulations were performed in Lumerical and the machine learning codes are written in Python. A machine with a 16core 3.40 GHz processor, 64 GB of RAM, and a NVIDIA GTX 1080ti GPU with 11GB DDR5X RAM was used. Although the number of states was ~5.7 billion, DDQN found the optimal results in only 2169 steps. As can be seen the model could find the results pretty fast. This may imply that the model just found this result by random guessing. As can be seen from Fig. 2 after step 400 the model predicts the actions just by learning and only 10 percent of the actions are done by guessing. It should be noted that there might be better answers than what the model found. So based on how long we let the model run, how the good the rival results are, initial conditions or complexity of the problem, it may take longer or shorter to find the optimum results.
It took a month for coding and running the model. The time is variable for different problems based on their complexity.
The final structure is as follows:

Nanodisk material type (D_M): 19 (Indium phosphide) (Table 2)

Thin film material type (F_M): 23 (SiO2 Glass)

Grating material type (G_M): 23 (SiO2 Glass)

Nano disk thickness (D_H): 190 nm

Film thickness (F_H): 10 nm

Grating thickness (G_H): 20 nm

Grating coverage (G_C) in each unit cell: 0\%

The spacing between disks (L): 70 nm

Minimum transmitted power: 0.16
We can compute the efficiency of the hologram by calculating the average transmitted power from the phase map. We have the transmission for each of the radii (Fig. 3(b)), so by counting the number of each of the radii used in the phase map and calculating the average of transmission power, we can estimate the efficiency of the corresponding phase map (Fig. 5(a)). This is a rough estimation for two reasons. First the coupling between adjacent cylinders should be considered, and second, each image will have its own phase map and so different setup of cylinders is used for each image which results in different transmission efficiencies. So the only way to correctly find the transmission efficiency of a hologram is by fabricating it, and each image will have its own efficiency^{21}. But this method gives us an approximate estimation of the average transmission efficiency as is shown in Fig. 4.
The computed transmission efficiency was 32% for a highquality recovered image (Fig. 5(a)). Compared to the structures with the same properties (transmission type, polarization independent, and in visible regime) our proposed structure’s transmission efficiency is two times higher than^{21} with 17% transmission efficiency (theoretical) (6% experimental), and much higher than other similar work^{24} with <1% transmission efficiency(experimental). It should be noted that what we computed here is the total transmission efficiency (that is defined as the ratio of image intensity to the total power of incident light) and it shouldn’t be confused with diffraction efficiency (that is defined as the ratio of image intensity to the total power of hologram plane^{24}). In diffraction efficiency, the source monitor is placed after the hologram (unlike the transmission efficiency in which the source monitor is placed before hologram) and so the efficiency is much higher compared to transmission efficiency, since the effect of hologram is neglected. A comparison of our hologram’s transmission efficiency with some of the previously reported results is shown in Table 3.
Conclusion
Here, we used double deep Qlearning to optimize a hologram structure to increase its efficiency. The DDQN model optimized the geometrical properties and also found the best material types for the structure. The hologram structure reported here is transmission type, works in the visible range and is independent of polarization. The previously reported structures with these properties had a maximum of 17% transmission efficiency, but our AI code could find a structure that had a 32% transmission efficiency while yielding a highquality output.
Data Availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information files).
References
 1.
N. K. Kildishev, A. V., Boltasseva, A. & Shalaev, V. M. Broadband light bending with plasmonic nanoantennas. Science 335, 427–427 (2012).
 2.
Sun, S. et al. Gradientindex metasurfaces as a bridge linking propagating waves and surface waves. Nature materials 11, 426 (2012).
 3.
Yu, N. et al. Light propagation with phase discontinuities: generalized laws of reflection and refraction. science 334, 333–337 (2011).
 4.
Kim, I. et al. Thermally robust ringshaped chromium perfect absorber of visible light. Nanophotonics 7, 1827–1833 (2018).
 5.
Rana, A. S. et al. Tungstenbased ultrathin absorber for visible regime. Scientific Reports 8, 2443 (2018).
 6.
Kim, I. et al. Outfitting next generation display with optical metasurfaces. ACS Photonics 5, 3876–3895 (2018).
 7.
Lee, T. et al. Plasmonic and dielectricbased structural coloring: from fundamentals to practical applications. Nano Convergence 5, 1 (2018).
 8.
Li, Z. et al. Fullspace cloud of random points with a scrambling metasurface. Light: Science & Application 7, 63 (2018).
 9.
Aieta, F. et al. Aberrationfree ultrathin flat lenses and axicons at telecom wavelengths based on plasmonic metasurfaces. Nano letters 12, 4932–4936 (2012).
 10.
Ni, X., Ishii, S., Kildishev, A. V. & Shalaev, V. M. Ultrathin, planar, Babinetinverted plasmonic metalenses. Light: Science & Applications 2, e72 (2013).
 11.
Li, G. et al. Spinenabled plasmonic metasurfaces for manipulating orbital angular momentum of light. Nano letters 13, 4148–4151 (2013).
 12.
Mahmood, N. et al. Polarisation insensitive multifunctional metasurfaces based on alldielectric nanowaveguides. Nanoscale 10, 18323–18330 (2018).
 13.
Devlin, R. C., Khorasaninejad, M., Chen, W. T., Oh, J. & Capasso, F. Broadband highefficiency dielectric metasurfaces for the visible spectrum. Proceedings of the National Academy of Sciences 113, 10473–10478 (2016).
 14.
Lee, G.Y. et al. Complete amplitude and phase control of light using broadband holographic metasurface. Nanoscale 10, 4237–4245 (2018).
 15.
Yoon, G. et al. “Cryptodisplay” in dualmode metasurfaces by simultaneous control of phase and spectral responses. ACS Nano 12, 6421–6428 (2018).
 16.
Li, Z. et al. Dielectric metaholograms enabled with dual magnetic resonances in visible light. ACS Nano 11, 9382–9389 (2017).
 17.
Ansari, M. A. et al. A spinencoded alldielectric metahologram for visible light. Laser and Photonics Reviews 13, 1900065 (2019).
 18.
Arbabi, A., Horie, Y., Bagheri, M. & Faraon, A. Dielectric metasurfaces for complete control of phase and polarization with subwavelength spatial resolution and high transmission. Nature nanotechnology 10, 937 (2015).
 19.
Chong, K. E. et al. Efficient polarizationinsensitive complex wavefront control using Huygens’ metasurfaces based on dielectric resonant metaatoms. Acs Photonics 3, 514–519 (2016).
 20.
Wang, L. et al. Grayscale transparent metasurface holograms. Optica 3, 1504–1505 (2016).
 21.
Yoon, G., Lee, D., Nam, K. T. & Rho, J. Pragmatic metasurface hologram at visible wavelength: the balance between diffraction efficiency and fabrication compatibility. ACS Photonics 5, 1643–1647 (2017).
 22.
Li, X. et al. Multicolor 3D metaholography by broadband plasmonic modulation. Science advances 2, e1601102 (2016).
 23.
Huang, K. et al. Ultrahighcapacity nonperiodic photon sieves operating in visible light. Nature communications 6, 7059 (2015).
 24.
Huang, K. et al. Photonnanosieve for ultrabroadband and largeangleofview holograms. Laser & Photonics Reviews 11, 1700025 (2017).
 25.
Ni, X., Kildishev, A. V. & Shalaev, V. M. Metasurface holograms for visible light. Nature communications 4, 2807 (2013).
 26.
Zheng, G. et al. Metasurface holograms reaching 80% efficiency. Nature nanotechnology 10, 308 (2015).
 27.
Wang, B. et al. Visiblefrequency dielectric metasurfaces for multiwavelength achromatic and highly dispersive holograms. Nano letters 16, 5235–5240 (2016).
 28.
Qin, F. et al. Hybrid bilayer plasmonic metasurface efficiently manipulates visible light. Science advances 2, e1501168 (2016).
 29.
Ding, X. et al. Ultrathin Pancharatnam–Berry Metasurface with Maximal CrossPolarization Efficiency. Advanced Materials 27, 1195–1200 (2015).
 30.
Liu, D., Tan, Y., Khoram, E. & Yu, Z. Training deep neural networks for the inverse design of nanophotonic structures. ACS Photonics 5, 1365–1369 (2018).
 31.
Peurifoy, J. et al. In Frontiers in Optics. FTh4A. 4 (Optical Society of America).
 32.
Malkiel, I. et al. Deep learning for design and retrieval of nanophotonic structures. arXiv preprint arXiv 1702, 07949 (2017).
 33.
Peurifoy, J. et al. Nanophotonic particle simulation and inverse design using artificial neural networks. Science advances 4, eaar4206 (2018).
 34.
Sajedian, I., Kim, J. & Rho, J. Finding the optical properties of plasmonic structures by image processing using the combination of convolutional neural networks and recurrent neural networks. Microsystems and Nanoengineering 5, 27 (2019).
 35.
So, S. & Rho, J. Designing nanophotonic structures using conditionaldeep convolutional generative adversarial networks. Nanophotonics 8, 1255–1261 (2019).
 36.
So, S., Mun, J. & Rho, J. Simultaneous inverse design of materials and structures via deep learning: Demonstration of dipole resonance engineering using coreshell nanoparticles. ACS Applied Materials & Interfaces 11, 24264–24268 (2019).
 37.
Ma, W., Cheng, F. & Liu, Y. Deeplearningenabled ondemand design of chiral metamaterials. ACS nano 12, 6326–6334 (2018).
 38.
Mnih, V. et al. Humanlevel control through deep reinforcement learning. Nature 518, 529 (2015).
 39.
Sajedian, I., Badloe, T. & Rho, J. Optimisation of colour generation from dielectric nanostructures using reinforcement learning. Optics express 27, 5874–5883 (2019).
 40.
Bukov, M. et al. Reinforcement learning in different phases of quantum control. Physical Review X 8, 031086 (2018).
 41.
Sajedian, I., Zakery, A. & Rho, J. High efficiency second and third harmonic generation from magnetic metamaterials by using a grating. Optics Communications 397, 17–21 (2017).
 42.
Sutton, R. S. & Barto, A. G. Reinforcement learning: An introduction. (MIT, 1998).
 43.
Gerchberg, R. W. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 35, 237–246 (1972).
 44.
Goodman, J. W. Introduction to Fourier optics. (Roberts and Company Publishers, 2005).
 45.
Huang, K. et al. Silicon multimetaholograms for the broadband visible light. Laser & Photonics Reviews 10, 500–509 (2016).
Acknowledgements
This work is financially supported from the National Research Foundation (NRF) grants (NRF2018M3D1A1058998, NRF2019R1A2C3003129, CAMM2019M3A6B3030637 and NRF2015R1A5A1037668) funded by the Ministry of Science and ICT (MSIT), Republic of Korea.
Author information
Affiliations
Contributions
J.R. and I.S. conceived the concept and initiated the project. I.S. designed the research and performed numerical calculations. H.L. provided the materials chracterizations. I.S. and J.R. wrote the manuscript. All authors contributed to the discussion, analysis and confirmed the final manuscript. J.R. guided the entire the project.
Corresponding author
Correspondence to Junsuk Rho.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Received
Accepted
Published
DOI
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.