Towards real-time photorealistic 3D holography with deep neural networks

Shi, Liang; Li, Beichen; Kim, Changil; Kellnhofer, Petr; Matusik, Wojciech

doi:10.1038/s41586-020-03152-0

Article
Published: 10 March 2021

Nature volume 591, pages 234–239 (2021)

An Author Correction to this article was published on 26 April 2021

This article has been updated

Abstract

The ability to present three-dimensional (3D) scenes with continuous depth sensation has a profound impact on virtual and augmented reality, human–computer interaction, education and training. Computer-generated holography (CGH) enables high-spatio-angular-resolution 3D projection via numerical simulation of diffraction and interference¹. Yet, existing physically based methods fail to produce holograms with both per-pixel focal control and accurate occlusion²,³. The computationally taxing Fresnel diffraction simulation further places an explicit trade-off between image quality and runtime, making dynamic holography impractical⁴. Here we demonstrate a deep-learning-based CGH pipeline capable of synthesizing a photorealistic colour 3D hologram from a single RGB-depth image in real time. Our convolutional neural network (CNN) is extremely memory efficient (below 620 kilobytes) and runs at 60 hertz for a resolution of 1,920 × 1,080 pixels on a single consumer-grade graphics processing unit. Leveraging low-power on-device artificial intelligence acceleration chips, our CNN also runs interactively on mobile (iPhone 11 Pro at 1.1 hertz) and edge (Google Edge TPU at 2.0 hertz) devices, promising real-time performance in future-generation virtual and augmented-reality mobile headsets. We enable this pipeline by introducing a large-scale CGH dataset (MIT-CGH-4K) with 4,000 pairs of RGB-depth images and corresponding 3D holograms. Our CNN is trained with differentiable wave-based loss functions⁵ and physically approximates Fresnel diffraction. With an anti-aliasing phase-only encoding method, we experimentally demonstrate speckle-free, natural-looking, high-resolution 3D holograms.
Our learning-based approach and the Fresnel hologram dataset will help to unlock the full potential of holography and enable applications in metasurface design⁶,⁷, optical and acoustic tweezer-based microscopic manipulation⁸,⁹,¹⁰, holographic microscopy¹¹ and single-exposure volumetric 3D printing¹²,¹³.
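
The Fresnel diffraction simulation mentioned above is commonly implemented by propagating a sampled complex field with the angular spectrum method. The NumPy sketch below is a minimal illustration of that generic technique, not the authors' pipeline or their CNN; the function name `asm_propagate`, the 256 × 256 grid, the 8 µm pixel pitch, the 520 nm wavelength and the 5 mm distance are all assumed values chosen for illustration.

```python
import numpy as np

def asm_propagate(field, z, wavelength, pitch):
    """Propagate a sampled complex field by distance z with the angular spectrum method.

    All lengths are in metres. The FFT implies periodic boundaries, so practical
    use usually adds zero-padding (and a band limit); this sketch does neither.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)  # spatial frequencies, cycles per metre
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)      # shape (ny, nx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Transfer function; evanescent components (arg <= 0) are dropped.
    H = np.where(arg > 0, np.exp(1j * kz * z), 0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Example with assumed parameters: random-phase field, 8 um pitch, 520 nm light, 5 mm.
rng = np.random.default_rng(0)
field = np.exp(1j * rng.uniform(0, 2 * np.pi, (256, 256)))
out = asm_propagate(field, z=5e-3, wavelength=520e-9, pitch=8e-6)
```

Because the transfer function has unit magnitude for all propagating frequencies, the sketch conserves energy and propagating by `-z` undoes propagation by `z`. The band-limited angular spectrum method of Matsushima and Shimobaba adds a frequency cutoff that this sketch omits.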

**Fig. 1: Tensor holography workflow for learning Fresnel holograms from RGB-D images.**

**Fig. 2: Performance evaluation of the OA-PBM and tensor holography CNN.**

**Fig. 3: Experimental demonstration of 2D and 3D holographic projection.**

Data availability

Our hologram dataset (MIT-CGH-4K) and the trained CNN model will be made publicly available (on GitHub) along with the paper.

Code availability

The code to evaluate the trained CNN model will be made publicly available (on GitHub) along with the paper. Additional code is available from the corresponding authors upon reasonable request.

Change history

26 April 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41586-021-03476-5

References

1. Benton, S. A. & Bove, V. M. Jr. Holographic Imaging (John Wiley & Sons, 2008).
2. Maimone, A., Georgiou, A. & Kollin, J. S. Holographic near-eye displays for virtual and augmented reality. ACM Trans. Graph. 36, 85:1–85:16 (2017).
3. Shi, L., Huang, F.-C., Lopes, W., Matusik, W. & Luebke, D. Near-eye light field holographic rendering with spherical waves for wide field of view interactive 3D computer graphics. ACM Trans. Graph. 36, 236:1–236:17 (2017).
4. Tsang, P. W. M., Poon, T.-C. & Wu, Y. M. Review of fast methods for point-based computer-generated holography [Invited]. Photon. Res. 6, 837–846 (2018).
5. Sitzmann, V. et al. End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Trans. Graph. 37, 114:1–114:13 (2018).
6. Lee, G.-Y. et al. Metasurface eyepiece for augmented reality. Nat. Commun. 9, 4562 (2018).
7. Hu, Y. et al. 3D-integrated metasurfaces for full-colour holography. Light Sci. Appl. 8, 86 (2019).
8. Melde, K., Mark, A. G., Qiu, T. & Fischer, P. Holograms for acoustics. Nature 537, 518–522 (2016).
9. Smalley, D. et al. A photophoretic-trap volumetric display. Nature 553, 486–490 (2018).
10. Hirayama, R., Plasencia, D. M., Masuda, N. & Subramanian, S. A volumetric display for visual, tactile and audio presentation using acoustic trapping. Nature 575, 320–323 (2019).
11. Rivenson, Y., Wu, Y. & Ozcan, A. Deep learning in holography and coherent imaging. Light Sci. Appl. 8, 85 (2019).
12. Shusteff, M. et al. One-step volumetric additive manufacturing of complex polymer structures. Sci. Adv. 3, eaao5496 (2017).
13. Kelly, B. E. et al. Volumetric additive manufacturing via tomographic reconstruction. Science 363, 1075–1079 (2019).
14. Levoy, M. & Hanrahan, P. Light field rendering. In Proc. 23rd Annual Conference on Computer Graphics and Interactive Techniques 31–42 (ACM, 1996).
15. Waters, J. P. Holographic image synthesis utilizing theoretical methods. Appl. Phys. Lett. 9, 405–407 (1966).
16. Leseberg, D. & Frère, C. Computer-generated holograms of 3-D objects composed of tilted planar segments. Appl. Opt. 27, 3020–3024 (1988).
17. Tommasi, T. & Bianco, B. Computer-generated holograms of tilted planes by a spatial frequency approach. J. Opt. Soc. Am. A 10, 299–305 (1993).
18. Matsushima, K. & Nakahara, S. Extremely high-definition full-parallax computer-generated hologram created by the polygon-based method. Appl. Opt. 48, H54–H63 (2009).
19. Symeonidou, A., Blinder, D., Munteanu, A. & Schelkens, P. Computer-generated holograms by multiple wavefront recording plane method with occlusion culling. Opt. Express 23, 22149–22161 (2015).
20. Lucente, M. E. Interactive computation of holograms using a look-up table. J. Electron. Imaging 2, 28–35 (1993).
21. Lucente, M. & Galyean, T. A. Rendering interactive holographic images. In Proc. 22nd Annual Conference on Computer Graphics and Interactive Techniques 387–394 (ACM, 1995).
22. Lucente, M. Interactive three-dimensional holographic displays: seeing the future in depth. Comput. Graph. 31, 63–67 (1997).
23. Chen, J.-S. & Chu, D. P. Improved layer-based method for rapid hologram generation and real-time interactive holographic display applications. Opt. Express 23, 18143–18155 (2015).
24. Zhao, Y., Cao, L., Zhang, H., Kong, D. & Jin, G. Accurate calculation of computer-generated holograms using angular-spectrum layer-oriented method. Opt. Express 23, 25440–25449 (2015).
25. Makey, G. et al. Breaking crosstalk limits to dynamic holography using orthogonality of high-dimensional random vectors. Nat. Photon. 13, 251–256 (2019).
26. Yamaguchi, M., Hoshino, H., Honda, T. & Ohyama, N. in Practical Holography VII: Imaging and Materials Vol. 1914 (ed. Benton, S. A.) 25–31 (SPIE, 1993).
27. Barabas, J., Jolly, S., Smalley, D. E. & Bove, V. M. Jr in Practical Holography XXV: Materials and Applications Vol. 7957 (ed. Bjelkhagen, H. I.) 13–19 (SPIE, 2011).
28. Zhang, H., Zhao, Y., Cao, L. & Jin, G. Fully computed holographic stereogram based algorithm for computer-generated holograms with accurate depth cues. Opt. Express 23, 3901–3913 (2015).
29. Padmanaban, N., Peng, Y. & Wetzstein, G. Holographic near-eye displays based on overlap-add stereograms. ACM Trans. Graph. 38, 214:1–214:13 (2019).
30. Shimobaba, T., Masuda, N. & Ito, T. Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane. Opt. Lett. 34, 3133–3135 (2009).
31. Wakunami, K. & Yamaguchi, M. Calculation for computer generated hologram using ray-sampling plane. Opt. Express 19, 9086–9101 (2011).
32. Häussler, R. et al. Large real-time holographic 3D displays: enabling components and results. Appl. Opt. 56, F45–F52 (2017).
33. Hamann, S., Shi, L., Solgaard, O. & Wetzstein, G. Time-multiplexed light field synthesis via factored Wigner distribution function. Opt. Lett. 43, 599–602 (2018).
34. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. International Conference on International Conference on Machine Learning (ICML) 807–814 (Omnipress, 2010).
35. Sinha, A., Lee, J., Li, S. & Barbastathis, G. Lensless computational imaging through deep learning. Optica 4, 1117–1125 (2017).
36. Metzler, C. et al. prDeep: robust phase retrieval with a flexible deep network. In Proc. International Conference on Machine Learning (ICML) 3501–3510 (JMLR, 2018).
37. Eybposh, M. H., Caira, N. W., Chakravarthula, P., Atisa, M. & Pégard, N. C. in Optics and the Brain BTu2C–2 (Optical Society of America, 2020).
38. Rivenson, Y., Zhang, Y., Günaydın, H., Teng, D. & Ozcan, A. Phase recovery and holographic image reconstruction using deep learning in neural networks. Light Sci. Appl. 7, 17141 (2018).
39. Ren, Z., Xu, Z. & Lam, E. Y. Learning-based nonparametric autofocusing for digital holography. Optica 5, 337–344 (2018).
40. Wu, Y. et al. Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery. Optica 5, 704–710 (2018).
41. Horisaki, R., Takagi, R. & Tanida, J. Deep-learning-generated holography. Appl. Opt. 57, 3859–3863 (2018).
42. Peng, Y., Choi, S., Padmanaban, N. & Wetzstein, G. Neural holography with camera-in-the-loop training. ACM Trans. Graph. 39, 185:1–185:14 (2020).
43. Jiao, S. et al. Compression of phase-only holograms with JPEG standard and deep learning. Appl. Sci. 8, 1258 (2018).
44. Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S. & Vedaldi, A. Describing textures in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3606–3613 (IEEE, 2014).
45. Dai, D., Riemenschneider, H. & Van Gool, L. The synthesizability of texture examples. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3027–3034 (IEEE, 2014).
46. Kim, C., Zimmer, H., Pritch, Y., Sorkine-Hornung, A. & Gross, M. Scene reconstruction from high spatio-angular resolution light fields. ACM Trans. Graph. 32, 73:1–73:12 (2013).
47. Matsushima, K. & Shimobaba, T. Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields. Opt. Express 17, 19662–19673 (2009).
48. Shimobaba, T. & Ito, T. A color holographic reconstruction system by time division multiplexing with reference lights of laser. Opt. Rev. 10, 339–341 (2003).
49. Hsueh, C. K. & Sawchuk, A. A. Computer-generated double-phase holograms. Appl. Opt. 17, 3874–3883 (1978).
50. Mendoza-Yero, O., Mínguez-Vega, G. & Lancis, J. Encoding complex fields by using a phase-only optical element. Opt. Lett. 39, 1740–1743 (2014).
51. Xiao, L., Kaplanyan, A., Fix, A., Chapman, M. & Lanman, D. DeepFocus: learned image synthesis for computational displays. ACM Trans. Graph. 37, 200:1–200:13 (2018).
52. Wang, Y., Sang, X., Chen, Z., Li, H. & Zhao, L. Real-time photorealistic computer-generated holograms based on backward ray tracing and wavefront recording planes. Opt. Commun. 429, 12–17 (2018).
53. Hasegawa, N., Shimobaba, T., Kakue, T. & Ito, T. Acceleration of hologram generation by optimizing the arrangement of wavefront recording planes. Appl. Opt. 56, A97–A103 (2017).
54. Sifatul Islam, M. et al. Max-depth-range technique for faster full-color hologram generation. Appl. Opt. 59, 3156–3164 (2020).
55. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. International Conference on Learning Representations (ICLR) (2015).
56. Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) 234–241 (Springer, 2015).
57. Yu, F., Koltun, V. & Funkhouser, T. Dilated residual networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 472–480 (IEEE, 2017).

Acknowledgements

We thank K. Aoyama and S. Wen (from Sony) for discussions; J. Minor, T. Du, M. Foshey, L. Makatura, W. Shou and T. Erps from MIT for improving and editing the manuscript; R. White for the administration of the project; X. Ju for the design of the iPhone demo; and P. Ma for providing an iPhone 11 Pro for the mobile demo. We acknowledge funding from the Sony Research Award Program.

Author information

Authors and Affiliations

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Liang Shi, Beichen Li, Changil Kim, Petr Kellnhofer & Wojciech Matusik
Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, MA, USA
Liang Shi, Beichen Li, Changil Kim, Petr Kellnhofer & Wojciech Matusik

Contributions

L.S. conceived the idea, implemented the proposed framework, built the display prototype, performed experimental validation, and conducted the iPhone and Edge TPU demos. B.L. performed the pipeline evaluation and made the Supplementary Videos. B.L., C.K. and P.K. were involved in the design of the proposed framework. L.S. and P.K. led the writing and revision of the manuscript. W.M. supervised the work. All authors discussed ideas and results, and contributed to the manuscript.

Corresponding authors

Correspondence to Liang Shi or Wojciech Matusik.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Tomoyoshi Shimobaba and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Visualization of masked Fresnel zone plates computed by OA-PBM and performance comparison of foreground occlusion.

a, A depth image cropped from a frame of Big Buck Bunny. Three regions with different depth landscapes are highlighted in different colours. b, Masked Fresnel zone plates computed for the centre pixel of each highlighted region. Three pixels are propagated for the same distance for ease of comparison. The flat depth landscape around the green pixel results in a non-occluded Fresnel zone plate. The masked Fresnel zone plates of red and blue pixels contain sharp cutoffs at their long-distance separated occlusion boundaries, and freeform shapes at occlusion boundaries with moderate distance separation and varying depth distribution. c, Comparison of foreground reconstruction by the PBM, OA-PBM and Fresnel diffraction. The scene is a cropped modulation transfer function bar target with a step depth profile. The PBM leaks a considerable portion of the background into the foreground due to a lack of occlusion handling. The artefacts are clearly visible in the original unmagnified view. The OA-PBM removes a considerable portion of the artefacts and the remaining artefacts are visually inconsequential in the unmagnified view. d, Comparison of focal stacks reconstructed by the PBM and OA-PBM for the Big Buck Bunny. The orange bounding boxes mark the background leakage in the PBM reconstructions. a, d, Images reproduced from www.bigbuckbunny.org (© 2008, Blender Foundation) under a Creative Commons licence (https://creativecommons.org/licenses/by/3.0/).
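
In a point-based method, every scene point contributes a Fresnel zone plate confined to a subhologram, and the hologram is the sum of these contributions; OA-PBM additionally masks each zone plate at occlusion boundaries, as panel b illustrates. The sketch below computes one such zone plate under the paraxial (Fresnel) approximation. The function `point_zone_plate`, its square support, the phase sign convention and the optional `mask` argument are illustrative assumptions; how OA-PBM derives the occlusion mask from the depth landscape is not reproduced here.

```python
import numpy as np

def point_zone_plate(n, pitch, wavelength, z, amplitude=1.0, mask=None):
    """Fresnel zone plate of one scene point at distance z, centred on an n x n hologram.

    Uses the paraxial phase pi * r^2 / (wavelength * z) and keeps only the square
    subhologram reachable within the maximum diffraction angle asin(wavelength / (2 * pitch)).
    `mask` is a hypothetical boolean occlusion mask; computing it is not shown.
    """
    coords = (np.arange(n) - n // 2) * pitch
    X, Y = np.meshgrid(coords, coords)
    phase = np.pi * (X**2 + Y**2) / (wavelength * z)
    half_width = abs(z) * np.tan(np.arcsin(wavelength / (2 * pitch)))
    support = (np.abs(X) <= half_width) & (np.abs(Y) <= half_width)
    if mask is not None:
        support &= mask  # zero the zone plate wherever the point is occluded
    return np.where(support, amplitude * np.exp(1j * phase), 0)

# Example with assumed parameters: 8 um pitch, 520 nm light, point 1 mm from the hologram.
zp = point_zone_plate(64, 8e-6, 520e-9, 1e-3)
```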

Extended Data Fig. 2 Samples of the MIT-CGH-4K dataset and comparison with the DeepFocus dataset.

a, The RGB-D image, amplitude and phase of two samples from the MIT-CGH-4K dataset. The RGB image records the amplitude of the scene (directly visualized in sRGB space) and contains large variations in colour, texture, shading and occlusion. The pixel depth has a statistically uniform distribution throughout the view frustum. The phase presents high-frequency features at both occlusion boundaries and texture edges to accommodate rapid depth and colour changes. b, A sample RGB-D image from the DeepFocus dataset⁵¹. c, Histograms of pixel depth distribution computed for the MIT-CGH-4K dataset and the DeepFocus dataset. b, Image reproduced from ‘3D Scans from Louvre Museum’ by Benjamin Bardou under a Creative Commons licence (https://creativecommons.org/licenses/by-nc/4.0/).
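
Panel c compares per-pixel depth histograms of the two datasets. A minimal sketch of how such a pooled histogram could be computed is shown below, assuming depth maps already normalized to [0, 1]; that normalization is a hypothetical convention, since the dataset's actual depth encoding is not described here. The name `depth_histogram` and the 32-bin default are illustrative choices.

```python
import numpy as np

def depth_histogram(depth_maps, bins=32, depth_range=(0.0, 1.0)):
    """Pooled, normalized histogram of per-pixel depth over a collection of depth maps.

    Assumes every depth value already lies in `depth_range` (hypothetical convention).
    Returns the fraction of pixels per bin and the bin edges.
    """
    values = np.concatenate([np.asarray(d, dtype=np.float64).ravel() for d in depth_maps])
    counts, edges = np.histogram(values, bins=bins, range=depth_range)
    return counts / counts.sum(), edges
```

A roughly flat output, with every fraction close to `1 / bins`, corresponds to the statistically uniform depth distribution described for MIT-CGH-4K; peaks indicate depths concentrated near particular planes.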

Extended Data Fig. 3 Schematic of the midpoint hologram calculation.

a, A holographic display magnified through a diverging point light source. b, A holographic display unmagnified through the thin-lens formula. c, The target hologram in this example is propagated to the centre of the unmagnified view frustum to produce the midpoint hologram. The width of the maximum subhologram is considerably reduced.
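
Why the midpoint hologram helps follows from simple geometry. A point at distance z from the hologram plane illuminates a subhologram whose half-width is |z| tan θ, where θ = arcsin(λ/(2p)) is the maximum diffraction angle for pixel pitch p. If the hologram plane initially sits at one end of an unmagnified frustum of depth D, the farthest point is D away; after propagating to the frustum centre, no point is more than D/2 away, so the maximum subhologram width halves. The sketch below assumes an 8 µm pitch, 520 nm light and a 6 mm frustum depth purely for illustration; `max_subhologram_width` is a hypothetical helper.

```python
import math

def max_subhologram_width(z, wavelength, pitch):
    """Width (metres) of the square subhologram for a point at distance z (metres).

    The half-width is |z| * tan(theta), with theta = asin(wavelength / (2 * pitch))
    the maximum diffraction angle of a display with the given pixel pitch.
    """
    theta = math.asin(wavelength / (2 * pitch))
    return 2 * abs(z) * math.tan(theta)

# Hypothetical numbers: 8 um pitch, 520 nm light, 6 mm deep unmagnified frustum.
pitch, wavelength, depth = 8e-6, 520e-9, 6e-3
w_end = max_subhologram_width(depth, wavelength, pitch)      # hologram at one end of the frustum
w_mid = max_subhologram_width(depth / 2, wavelength, pitch)  # midpoint hologram
print(f"max subhologram width: {w_end / pitch:.1f} px -> {w_mid / pitch:.1f} px")
```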

Extended Data Fig. 4 Evaluation of tensor holography CNN on model architecture and test patterns.

a, Performance comparison of different CNN architectures. b, Performance comparison of different CNN miniaturization methods. c, CNN predictions for author-made variants of two standard test patterns (USAF-1951 and RCA Indian-head).

Extended Data Fig. 5 Evaluation of tensor holography CNN on additional computer-rendered scenes.

a, b, CNN prediction of amplitude and phase along with focused reconstructions for holograms of a living room scene from the DeepFocus dataset⁵¹ (a) and a night landscape scene from the Stanford light field dataset²⁹ (b). a, Certain still images from ‘ArchVizPRO Vol. 2’ were used to render new images for inclusion in this publication with the permission of the copyright holder (© Corridori Ruggero 2018), under a Creative Commons licence (https://creativecommons.org/licenses/by-nc/4.0/). Panel b reproduced with permission from ref. ²⁹, ACM.

Extended Data Fig. 6 Evaluation of tensor holography CNN on real-world captured scenes.

a, b, CNN prediction of amplitude and phase along with focused reconstructions for holograms of a statue scene (a) and a mansion scene (b). Both scenes are from the ETH light field dataset⁴⁶.

Extended Data Fig. 7 Comparison of the original DPM and the AA-DPM.

Reconstruction of two real-world scenes from the encoded phase-only holograms. The couch scene is focused on the mouse toy and the statue scene is focused on the black statue. Orange bounding boxes highlight regions with strong high-frequency artefacts. Left: DPM. Right: AA-DPM.
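
The original DPM compared here builds on the double-phase decomposition of Hsueh and Sawchuk: a complex value a·exp(iφ) with 0 ≤ a ≤ 1 equals the mean of exp(i(φ + arccos a)) and exp(i(φ − arccos a)). The sketch below encodes a field this way using a checkerboard interleaving of the two phases, which is one common layout. It illustrates only the baseline DPM; the anti-aliasing steps of AA-DPM are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def double_phase_encode(field):
    """Encode a complex field as a phase-only pattern with the double-phase method.

    Each value a*exp(i*phi), with a normalized to [0, 1], equals the average of
    exp(i*(phi + acos(a))) and exp(i*(phi - acos(a))); the two phases are placed
    on alternating checkerboard pixels. Assumes the field is not identically zero.
    """
    amp = np.abs(field) / np.abs(field).max()
    phi = np.angle(field)
    offset = np.arccos(np.clip(amp, 0.0, 1.0))
    yy, xx = np.indices(field.shape)
    checker = (xx + yy) % 2 == 0
    encoded = np.where(checker, phi + offset, phi - offset)
    return np.mod(encoded, 2 * np.pi)  # wrap to [0, 2*pi) for a phase-only SLM
```

Averaging neighbouring pixels, which the display optics approximate by low-pass filtering, recovers the normalized complex field; the high-frequency checkerboard residue is what produces the artefacts highlighted in the DPM reconstructions.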

Extended Data Fig. 8 Holographic display prototype used for the experimental results shown in this paper.

The laser control box, LabJack DAQ and camera are not shown in the figure.

Extended Data Fig. 9 Additional experimental demonstration of 3D holographic projection (part 1).

The RGB-D input can be found in Extended Data Fig. 6.

Extended Data Fig. 10 Additional experimental demonstration of 3D holographic projection (part 2).

The RGB-D inputs can be found in Extended Data Fig. 6 for a, and Extended Data Fig. 4 for b. Panel a reproduced with permission from ref. ²⁹, ACM.

Supplementary information

Video 1

This video demonstrates a simulated focal sweep of a CNN-predicted hologram computed for a real-world captured 3D couch scene. The image resolution is 1080p.

Video 2

This video demonstrates a simulated focal sweep of a CNN-predicted hologram computed for a computer-rendered 3D living room scene. The image resolution is 1,024 × 1,024.

Video 3

This video demonstrates a photographed focal sweep of a CNN-predicted hologram computed for a real-world captured 3D couch scene. The video is captured with a Sony A7 Mark III mirrorless camera and a Sony GM 16–35 mm f/2.8 lens at 4K/30 Hz and downsampled to 1080p. Only the green channel is shown, for temporal stability.

Video 4

This video demonstrates real-time 3D hologram computation on an NVIDIA TITAN RTX GPU. The video is captured with a Panasonic GH5 mirrorless camera and a Lumix 10–25 mm f/1.7 lens at 4K/60 Hz and downsampled to 1080p. Colour is produced field-sequentially, so the 60 Hz capture corresponds to a colour frame rate of 20 Hz.

Video 5

This video demonstrates interactive hologram computation on an iPhone 11 Pro using a miniaturized version of the tensor holography CNN (see the Fig. 2 caption for network architecture details).

Video 6

This video demonstrates a simulated focal sweep of a CNN-predicted hologram computed for a 3D Star test pattern. The image resolution is 1,550 × 1,462.

About this article

Cite this article

Shi, L., Li, B., Kim, C. et al. Towards real-time photorealistic 3D holography with deep neural networks. Nature 591, 234–239 (2021). https://doi.org/10.1038/s41586-020-03152-0

Received: 22 April 2020
Accepted: 21 December 2020
Published: 10 March 2021
Issue Date: 11 March 2021
DOI: https://doi.org/10.1038/s41586-020-03152-0