The ability to present three-dimensional (3D) scenes with continuous depth sensation has a profound impact on virtual and augmented reality, human–computer interaction, education and training. Computer-generated holography (CGH) enables high-spatio-angular-resolution 3D projection via numerical simulation of diffraction and interference [1]. Yet, existing physically based methods fail to produce holograms with both per-pixel focal control and accurate occlusion [2,3]. The computationally taxing Fresnel diffraction simulation further imposes an explicit trade-off between image quality and runtime, making dynamic holography impractical [4]. Here we demonstrate a deep-learning-based CGH pipeline capable of synthesizing a photorealistic colour 3D hologram from a single RGB-depth image in real time. Our convolutional neural network (CNN) is extremely memory efficient (below 620 kilobytes) and runs at 60 hertz for a resolution of 1,920 × 1,080 pixels on a single consumer-grade graphics processing unit. Leveraging low-power on-device artificial intelligence acceleration chips, our CNN also runs interactively on mobile (iPhone 11 Pro at 1.1 hertz) and edge (Google Edge TPU at 2.0 hertz) devices, promising real-time performance in future-generation virtual and augmented-reality mobile headsets. We enable this pipeline by introducing a large-scale CGH dataset (MIT-CGH-4K) with 4,000 pairs of RGB-depth images and corresponding 3D holograms. Our CNN is trained with differentiable wave-based loss functions [5] and physically approximates Fresnel diffraction. With an anti-aliasing phase-only encoding method, we experimentally demonstrate speckle-free, natural-looking, high-resolution 3D holograms.
Our learning-based approach and the Fresnel hologram dataset will help to unlock the full potential of holography and enable applications in metasurface design [6,7], optical and acoustic tweezer-based microscopic manipulation [8,9,10], holographic microscopy [11] and single-exposure volumetric 3D printing [12,13].
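For readers unfamiliar with the wave-optics simulation the abstract refers to, the following minimal NumPy sketch (our illustration, not the authors' released code) implements plain angular-spectrum free-space propagation of a complex wavefront; the paper's pipeline builds on band-limited variants of this simulation [60].

```python
import numpy as np

def propagate_asm(field, wavelength, pitch, distance):
    """Free-space propagation via the angular spectrum method.

    field: complex 2D wavefront; pitch: pixel size (m);
    wavelength and distance in metres. Illustrative only; the
    paper's pipeline uses a band-limited transfer function
    (Matsushima & Shimobaba, 2009) to avoid aliasing.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Transfer function H = exp(i 2π z sqrt(1/λ² − fx² − fy²))
    arg = (1.0 / wavelength) ** 2 - FX ** 2 - FY ** 2
    H = np.exp(2j * np.pi * distance * np.sqrt(np.maximum(arg, 0.0)))
    H[arg < 0] = 0.0  # suppress evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)
```

Because the method is two FFTs plus a pointwise multiply, its cost per depth layer is O(N log N) in the pixel count, which is the runtime bottleneck the learned pipeline replaces.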
Our hologram dataset (MIT-CGH-4K) and the trained CNN model will be made publicly available (on GitHub) along with the paper.
The code to evaluate the trained CNN model will be made publicly available (on GitHub) along with the paper. Additional codes are available from the corresponding authors upon reasonable request.
Benton, S. A. & Bove, V. M. Jr Holographic Imaging (John Wiley & Sons, 2008).
Maimone, A., Georgiou, A. & Kollin, J. S. Holographic near-eye displays for virtual and augmented reality. ACM Trans. Graph. 36, 85:1–85:16 (2017).
Shi, L., Huang, F.-C., Lopes, W., Matusik, W. & Luebke, D. Near-eye light field holographic rendering with spherical waves for wide field of view interactive 3D computer graphics. ACM Trans. Graph. 36, 236:1–236:17 (2017).
Tsang, P. W. M., Poon, T.-C. & Wu, Y. M. Review of fast methods for point-based computer-generated holography [Invited]. Photon. Res. 6, 837–846 (2018).
Sitzmann, V. et al. End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Trans. Graph. 37, 114:1–114:13 (2018).
Lee, G.-Y. et al. Metasurface eyepiece for augmented reality. Nat. Commun. 9, 4562 (2018).
Hu, Y. et al. 3D-integrated metasurfaces for full-colour holography. Light Sci. Appl. 8, 86 (2019).
Melde, K., Mark, A. G., Qiu, T. & Fischer, P. Holograms for acoustics. Nature 537, 518–522 (2016).
Smalley, D. et al. A photophoretic-trap volumetric display. Nature 553, 486–490 (2018).
Hirayama, R., Plasencia, D. M., Masuda, N. & Subramanian, S. A volumetric display for visual, tactile and audio presentation using acoustic trapping. Nature 575, 320–323 (2019).
Rivenson, Y., Wu, Y. & Ozcan, A. Deep learning in holography and coherent imaging. Light Sci. Appl. 8, 85 (2019).
Shusteff, M. et al. One-step volumetric additive manufacturing of complex polymer structures. Sci. Adv. 3, eaao5496 (2017).
Kelly, B. E. et al. Volumetric additive manufacturing via tomographic reconstruction. Science 363, 1075–1079 (2019).
Levoy, M. & Hanrahan, P. Light field rendering. In Proc. 23rd Annual Conference on Computer Graphics and Interactive Techniques 31–42 (ACM, 1996).
Waters, J. P. Holographic image synthesis utilizing theoretical methods. Appl. Phys. Lett. 9, 405–407 (1966).
Leseberg, D. & Frère, C. Computer-generated holograms of 3-D objects composed of tilted planar segments. Appl. Opt. 27, 3020–3024 (1988).
Tommasi, T. & Bianco, B. Computer-generated holograms of tilted planes by a spatial frequency approach. J. Opt. Soc. Am. A 10, 299–305 (1993).
Matsushima, K. & Nakahara, S. Extremely high-definition full-parallax computer-generated hologram created by the polygon-based method. Appl. Opt. 48, H54–H63 (2009).
Symeonidou, A., Blinder, D., Munteanu, A. & Schelkens, P. Computer-generated holograms by multiple wavefront recording plane method with occlusion culling. Opt. Express 23, 22149–22161 (2015).
Lucente, M. E. Interactive computation of holograms using a look-up table. J. Electron. Imaging 2, 28–35 (1993).
Lucente, M. & Galyean, T. A. Rendering interactive holographic images. In Proc. 22nd Annual Conference on Computer Graphics and Interactive Techniques, 387–394 (ACM, 1995).
Lucente, M. Interactive three-dimensional holographic displays: seeing the future in depth. Comput. Graph. 31, 63–67 (1997).
Chen, J.-S. & Chu, D. P. Improved layer-based method for rapid hologram generation and real-time interactive holographic display applications. Opt. Express 23, 18143–18155 (2015).
Zhao, Y., Cao, L., Zhang, H., Kong, D. & Jin, G. Accurate calculation of computer-generated holograms using angular-spectrum layer-oriented method. Opt. Express 23, 25440–25449 (2015).
Makey, G. et al. Breaking crosstalk limits to dynamic holography using orthogonality of high-dimensional random vectors. Nat. Photon. 13, 251–256 (2019).
Yamaguchi, M., Hoshino, H., Honda, T. & Ohyama, N. in Practical Holography VII: Imaging and Materials Vol. 1914 (ed. Benton, S. A.) 25–31 (SPIE, 1993).
Barabas, J., Jolly, S., Smalley, D. E. & Bove, V. M. Jr in Practical Holography XXV: Materials and Applications Vol. 7957 (ed. Bjelkhagen, H. I.) 13–19 (SPIE, 2011).
Zhang, H., Zhao, Y., Cao, L. & Jin, G. Fully computed holographic stereogram based algorithm for computer-generated holograms with accurate depth cues. Opt. Express 23, 3901–3913 (2015).
Padmanaban, N., Peng, Y. & Wetzstein, G. Holographic near-eye displays based on overlap-add stereograms. ACM Trans. Graph. 38, 214:1–214:13 (2019).
Shimobaba, T., Masuda, N. & Ito, T. Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane. Opt. Lett. 34, 3133–3135 (2009).
Wakunami, K. & Yamaguchi, M. Calculation for computer generated hologram using ray-sampling plane. Opt. Express 19, 9086–9101 (2011).
Häussler, R. et al. Large real-time holographic 3D displays: enabling components and results. Appl. Opt. 56, F45–F52 (2017).
Hamann, S., Shi, L., Solgaard, O. & Wetzstein, G. Time-multiplexed light field synthesis via factored Wigner distribution function. Opt. Lett. 43, 599–602 (2018).
Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. International Conference on Machine Learning (ICML) 807–814 (Omnipress, 2010).
Sinha, A., Lee, J., Li, S. & Barbastathis, G. Lensless computational imaging through deep learning. Optica 4, 1117–1125 (2017).
Metzler, C. et al. prDeep: robust phase retrieval with a flexible deep network. In Proc. International Conference on Machine Learning (ICML) 3501–3510 (JMLR, 2018).
Eybposh, M. H., Caira, N. W., Chakravarthula, P., Atisa, M. & Pégard, N. C. in Optics and the Brain BTu2C–2 (Optical Society of America, 2020).
Rivenson, Y., Zhang, Y., Günaydın, H., Teng, D. & Ozcan, A. Phase recovery and holographic image reconstruction using deep learning in neural networks. Light Sci. Appl. 7, 17141 (2018).
Ren, Z., Xu, Z. & Lam, E. Y. Learning-based nonparametric autofocusing for digital holography. Optica 5, 337–344 (2018).
Wu, Y. et al. Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery. Optica 5, 704–710 (2018).
Horisaki, R., Takagi, R. & Tanida, J. Deep-learning-generated holography. Appl. Opt. 57, 3859–3863 (2018).
Peng, Y., Choi, S., Padmanaban, N. & Wetzstein, G. Neural holography with camera-in-the-loop training. ACM Trans. Graph. 39, 185:1–185:14 (2020).
Jiao, S. et al. Compression of phase-only holograms with JPEG standard and deep learning. Appl. Sci. 8, 1258 (2018).
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S. & Vedaldi, A. Describing textures in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3606–3613 (IEEE, 2014).
Dai, D., Riemenschneider, H. & Gool, L. V. The synthesizability of texture examples. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3027–3034 (IEEE, 2014).
Kim, C., Zimmer, H., Pritch, Y., Sorkine-Hornung, A. & Gross, M. Scene reconstruction from high spatio-angular resolution light fields. ACM Trans. Graph. 32, 73:1–73:12 (2013).
Matsushima, K. & Shimobaba, T. Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields. Opt. Express 17, 19662–19673 (2009).
Shimobaba, T. & Ito, T. A color holographic reconstruction system by time division multiplexing with reference lights of laser. Opt. Rev. 10, 339–341 (2003).
Hsueh, C. K. & Sawchuk, A. A. Computer-generated double-phase holograms. Appl. Opt. 17, 3874–3883 (1978).
Mendoza-Yero, O., Mínguez-Vega, G. & Lancis, J. Encoding complex fields by using a phase-only optical element. Opt. Lett. 39, 1740–1743 (2014).
Xiao, L., Kaplanyan, A., Fix, A., Chapman, M. & Lanman, D. DeepFocus: learned image synthesis for computational displays. ACM Trans. Graph. 37, 200:1–200:13 (2018).
Wang, Y., Sang, X., Chen, Z., Li, H. & Zhao, L. Real-time photorealistic computer-generated holograms based on backward ray tracing and wavefront recording planes. Opt. Commun. 429, 12–17 (2018).
Hasegawa, N., Shimobaba, T., Kakue, T. & Ito, T. Acceleration of hologram generation by optimizing the arrangement of wavefront recording planes. Appl. Opt. 56, A97–A103 (2017).
Sifatul Islam, M. et al. Max-depth-range technique for faster full-color hologram generation. Appl. Opt. 59, 3156–3164 (2020).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR) (2015).
Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) 234–241 (Springer, 2015).
Yu, F., Koltun, V. & Funkhouser, T. Dilated residual networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 472–480 (IEEE, 2017).
We thank K. Aoyama and S. Wen (from Sony) for discussions; J. Minor, T. Du, M. Foshey, L. Makatura, W. Shou and T. Erps from MIT for improving/editing the manuscript; R. White for the administration of the project; X. Ju for the design of the iPhone demo; and P. Ma for providing an iPhone 11 Pro for the mobile demo. We acknowledge funding from the Sony Research Award Program.
The authors declare no competing interests.
Peer review information Nature thanks Tomoyoshi Shimobaba and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Visualization of masked Fresnel zone plates computed by OA-PBM and performance comparison of foreground occlusion.
a, A depth image cropped from a frame of Big Buck Bunny. Three regions with different depth landscapes are highlighted in different colours. b, Masked Fresnel zone plates computed for the centre pixel of each highlighted region. Three pixels are propagated for the same distance for ease of comparison. The flat depth landscape around the green pixel results in a non-occluded Fresnel zone plate. The masked Fresnel zone plates of red and blue pixels contain sharp cutoffs at their long-distance separated occlusion boundaries, and freeform shapes at occlusion boundaries with moderate distance separation and varying depth distribution. c, Comparison of foreground reconstruction by the PBM, OA-PBM and Fresnel diffraction. The scene is a cropped modulation transfer function bar target with a step depth profile. The PBM leaks a considerable portion of the background into the foreground due to a lack of occlusion handling. The artefacts are clearly visible in the original unmagnified view. The OA-PBM removes a considerable portion of the artefacts and the remaining artefacts are visually inconsequential in the unmagnified view. d, Comparison of focal stacks reconstructed by the PBM and OA-PBM for the Big Buck Bunny. The orange bounding boxes mark the background leakage in the PBM reconstructions. a, d, Images reproduced from www.bigbuckbunny.org (© 2008, Blender Foundation) under a Creative Commons licence (https://creativecommons.org/licenses/by/3.0/).
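As an illustration of the building block this caption describes, the sketch below (ours, not the authors' code) computes the unmasked Fresnel zone plate, i.e. the subhologram of a single unit-amplitude point source; the OA-PBM then multiplies each pixel's zone plate by a per-pixel occlusion mask derived from nearer geometry (masking not shown).

```python
import numpy as np

def fresnel_zone_plate(n, pitch, wavelength, z):
    """Subhologram (Fresnel zone plate) of a unit point source
    at distance z, sampled on an n × n grid of the given pixel
    pitch. A spherical-wave sketch of the per-point contribution
    that the OA-PBM masks at occlusion boundaries."""
    x = (np.arange(n) - n // 2) * pitch
    X, Y = np.meshgrid(x, x)
    r = np.sqrt(X ** 2 + Y ** 2 + z ** 2)  # point-to-pixel distance
    return np.exp(2j * np.pi * r / wavelength) / r
```

Summing such (masked) zone plates over all scene points yields the target complex hologram; the flat, non-occluded case in b corresponds to using this pattern without any mask.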
a, The RGB-D image, amplitude and phase of two samples from the MIT-CGH-4K dataset. The RGB image records the amplitude of the scene (directly visualized in sRGB space) and consists of large variations in colour, texture, shading and occlusion. The pixel depth has a statistically uniform distribution throughout the view frustum. The phase presents high-frequency features at both occlusion boundaries and texture edges to accommodate rapid depth and colour changes. b, A sample RGB-D image from the DeepFocus dataset [51]. c, Histograms of pixel depth distribution computed for the MIT-CGH-4K dataset and the DeepFocus dataset. b, Image reproduced from ‘3D Scans from Louvre Museum’ by Benjamin Bardou under a Creative Commons licence (https://creativecommons.org/licenses/by-nc/4.0/).
a, A holographic display magnified through a diverging point light source. b, A holographic display unmagnified through the thin-lens formula. c, The target hologram in this example is propagated to the centre of the unmagnified view frustum to produce the midpoint hologram. The width of the maximum subhologram is considerably reduced.
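The benefit of the midpoint hologram can be made concrete with a small calculation (our illustration, under the standard grating-equation assumption that the spatial light modulator's maximum diffraction angle satisfies sin θ = λ/(2Δ) for pixel pitch Δ):

```python
import numpy as np

def max_subhologram_width(z, wavelength, pitch):
    """Width (m) over which a point at signed depth z spreads its
    subhologram, given the maximum diffraction angle
    sin(θ) = λ / (2·pitch). Propagating the target hologram to the
    centre of the view frustum halves the worst-case |z| and thus
    the subhologram width (an illustrative calculation, not the
    authors' code)."""
    theta = np.arcsin(wavelength / (2 * pitch))
    return 2 * abs(z) * np.tan(theta)
```

Since the width grows linearly with |z|, halving the worst-case depth offset halves the maximum subhologram width, and with it the receptive field the CNN must cover.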
a, Performance comparison of different CNN architectures. b, Performance comparison of different CNN miniaturization methods. c, CNN prediction of two standard test-pattern variants (USAF-1951 and RCA Indian-head) made by the authors.
a, b, CNN prediction of amplitude and phase along with focused reconstructions for holograms of a living room scene from the DeepFocus dataset [51] (a) and a night landscape scene from the Stanford light field dataset [29] (b). a, Certain still images from ‘ArchVizPRO Vol. 2’ were used to render new images for inclusion in this publication with the permission of the copyright holder (© Corridori Ruggero 2018), under a Creative Commons licence (https://creativecommons.org/licenses/by-nc/4.0/). Panel b reproduced with permission from ref. 29, ACM.
a, b, CNN prediction of amplitude and phase along with focused reconstructions for holograms of a statue scene (a) and a mansion scene (b). Both scenes are from the ETH light field dataset [46].
Reconstruction of two real-world scenes from the encoded phase-only holograms. The couch scene is focused on the mouse toy and the statue scene is focused on the black statue. Orange bounding boxes highlight regions with strong high-frequency artefacts. Left: DPM. Right: AA-DPM.
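The DPM compared in this figure is the classic double-phase method of Hsueh & Sawchuk (1978) [62]: a complex field a·exp(iφ), with amplitude normalized to at most 1, is split into two phase-only components φ ± arccos(a) interleaved on a checkerboard, since their average exp(i(φ+θ))/2 + exp(i(φ−θ))/2 = a·exp(iφ). A minimal sketch (ours; the paper's AA-DPM additionally pre-filters the field to suppress the high-frequency artefacts highlighted here, which is not shown):

```python
import numpy as np

def double_phase_encode(field):
    """Encode a complex field as a single phase-only pattern via
    checkerboard-interleaved double phase (a sketch of plain DPM,
    without the anti-aliasing pre-filter of AA-DPM)."""
    a = np.abs(field)
    a = a / a.max()                 # normalise amplitude to [0, 1]
    phi = np.angle(field)
    offset = np.arccos(a)           # amplitude folded into phase
    ph1, ph2 = phi + offset, phi - offset
    out = ph1.copy()
    # interleave the two phase maps on a checkerboard lattice
    checker = (np.indices(field.shape).sum(axis=0) % 2).astype(bool)
    out[checker] = ph2[checker]
    return out
```

The checkerboard places the two phase samples at alternating pixels, so neighbouring pairs average to the desired complex value after low-pass filtering by the optics; without pre-filtering, high-frequency content breaks this averaging, producing the artefacts marked by the orange boxes.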
Extended Data Fig. 8 Holographic display prototype used for the experimental results shown in this paper.
The control box of the laser, the LabJack DAQ and the camera are not visualized in the figure.
The RGB-D input can be found in Extended Data Fig. 6.
This video demonstrates a simulated focal sweep of a CNN predicted hologram computed for a real-world captured 3D couch scene. The image resolution is 1080p.
This video demonstrates a simulated focal sweep of a CNN predicted hologram computed for a computer-rendered 3D living room scene. The image resolution is 1,024 × 1,024 pixels.
This video demonstrates a photographed focal sweep of a CNN predicted hologram computed for a real-world captured 3D couch scene. The video is captured by a Sony A7 Mark III mirrorless camera paired with a Sony GM 16–35 mm f/2.8 lens at 4K/30 Hz and downsampled to 1080p. Only the green channel is visualized for temporal stability.
This video demonstrates real-time 3D hologram computation on an NVIDIA TITAN RTX GPU. The video is captured by a Panasonic GH5 mirrorless camera with a Lumix 10–25 mm f/1.7 lens at 4K/60 Hz (a colour frame rate of 20 Hz) and downsampled to 1080p. Colour is obtained field-sequentially.
This video demonstrates interactive hologram computation on an iPhone 11 Pro using a mini version of tensor holography CNN (see Fig. 2 caption for network architecture details).
This video demonstrates a simulated focal sweep of a CNN predicted hologram computed for a 3D Star test pattern. The image resolution is 1,550 × 1,462 pixels.
Cite this article
Shi, L., Li, B., Kim, C. et al. Towards real-time photorealistic 3D holography with deep neural networks. Nature 591, 234–239 (2021). https://doi.org/10.1038/s41586-020-03152-0