Dielectric metalens for miniaturized imaging systems: progress and challenges

Lightweight, miniaturized optical imaging systems are highly sought after in aerospace exploration, industrial vision, consumer electronics, and medical imaging. However, conventional optical systems are difficult to downscale because refractive lenses rely mostly on phase accumulation along the propagation path. The metalens, composed of subwavelength nanostructures that locally control light waves, offers a disruptive path toward small-scale imaging systems. Recent advances in the design and nanofabrication of dielectric metalenses have led to several high-performance practical optical systems. This review outlines these developments while highlighting the challenges of using dielectric metalenses to replace conventional optics in miniature optical systems. After a brief introduction to the fundamental physics of dielectric metalenses, the progress and challenges are discussed in terms of typical performance metrics. Common challenges hindering further development are then presented, including the limitations of conventional design methods, difficulties in scaling up, and device integration. Finally, potential approaches to address these challenges are discussed.


Introduction
Compact and lightweight imaging systems are increasingly desired in the consumer electronics, industrial, medical, and automotive markets. The primary component in such systems is the optical lens that focuses light, so it is critical to develop lenses that are small in size. With the development of freeform lenses and digital surfacing, thinner and lighter refractive lenses are achievable. Microlens arrays with thicknesses comparable to several wavelengths have also been developed with advanced nanofabrication techniques. However, light focusing by conventional lenses relies on the accumulated propagation phase, and further downscaling is difficult because the limited refractive indices of natural materials make it challenging to accumulate sufficient phase over a short distance. Moreover, cascaded lenses are usually required for high imaging quality, leading to a bulky architecture and challenges in precise alignment. Diffractive lenses, which rely on constructive interference of transmitted light shaped by spatially arranged "zones", have been proposed as planar optical lenses [1]. However, they are often impeded by low efficiency, high dispersion, shadowing effects, and integration difficulties [2]. Furthermore, it is difficult for a single diffractive lens to replace multiple refractive lenses.
As an alternative, the metasurface-based flat lens, or metalens, could overcome most of the existing challenges [3][4][5][6][7][8]. Metalens focusing is achieved by an abrupt phase change imparted locally by subwavelength structures, namely, meta-atoms. Plasmonic metalenses were first demonstrated with metallic nanoantennas [9], but they suffer from large intrinsic loss. To improve the overall efficiency, all-dielectric metalenses consisting of materials with a high refractive index and low loss are preferred. With the rapid progress in meta-optics, dielectric metalenses have shown the following advantages over conventional refractive lenses: (i) Both monochromatic and chromatic aberrations can be corrected by a single layer or a few layers of nanostructures, although one needs to manage the conflicts among lens parameters (e.g., the trade-off between numerical aperture (NA) and field of view (FOV) when correcting multiple aberrations). (ii) By virtue of the high degree of freedom in shaping the wavefront, multiple functions are simultaneously achievable with a single metalens, opening a new era for building diverse optical devices. For instance, a single-shot polarization camera is achieved by a metalens paired with an image sensor, whereas the corresponding conventional division-of-focal-plane system consists of beam splitters, polarizers, waveplates, cascaded lenses, and multiple detectors. (iii) The fabrication of metalenses is compatible with the complementary metal-oxide-semiconductor (CMOS) fabrication process of the microelectronics industry [2]. Hence, metalenses have the potential to be directly integrated with the image sensor, and precise alignment of the component elements is enabled by the well-developed techniques of semiconductor foundries. Thus, dielectric metalenses have been explored in imaging and optical information processing, showing the potential to replace conventional cascaded lenses.
However, several challenges must be addressed before metalens-based integrated systems can find further technological application.
To encourage the further development of metalens-based compact devices, this review emphasizes the progress of dielectric metalenses in compact imaging systems and highlights the challenges obstructing future advancements (Fig. 1). First, a brief introduction to the fundamental physics of dielectric metalenses is presented. The progress and challenges of dielectric metalenses are then introduced in terms of typical performance metrics, focusing on metalenses with a high NA, a large FOV, dispersion engineering capability (for achromatic imaging), and multifunctionality. The fundamental limitations and design constraints of each performance metric, and the conflicts among them, are discussed further. Subsequently, the common challenges hindering future applications of dielectric metalenses in integrated systems are highlighted, including the limitations of conventional design methods, the scaling up of dielectric metalenses, and approaches for integrated devices. Finally, we conclude the review with our perspectives on future work.

Principle
The design of a metalens typically involves three steps: (i) designing the target phase profile; (ii) constructing a nanostructure-phase library; (iii) matching the target phase with the nanostructure phase. Particularly in step (ii), there are several methods to control the local phase by adjusting the geometrical parameters of meta-atoms, and the nanostructure-phase library is conventionally obtained by parameter sweep.
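The three steps above can be sketched in a few lines of Python. This is a minimal illustration only: the wavelength, focal length, period, and the eight-entry library (pillar diameters paired with evenly spaced phases) are hypothetical placeholders, whereas a real library would come from a parameter sweep with a full-wave solver. The target is the hyperboloidal focusing profile in air.

```python
import numpy as np

# Assumed design parameters (illustrative only).
wavelength = 532e-9      # design wavelength (m)
focal_length = 50e-6     # focal length (m)
period = 250e-9          # unit-cell period (m)
n_cells = 81             # unit cells along one radial cut

# Step (i): target hyperboloidal phase sampled on the unit-cell grid, wrapped to [0, 2*pi).
r = (np.arange(n_cells) - n_cells // 2) * period
target = -2 * np.pi / wavelength * (np.sqrt(r**2 + focal_length**2) - focal_length)
target = np.mod(target, 2 * np.pi)

# Step (ii): placeholder nanostructure-phase library.
# Hypothetical pillar diameters with evenly spaced phases; not simulated data.
lib_diameters = np.linspace(80e-9, 220e-9, 8)
lib_phases = np.arange(8) * 2 * np.pi / 8

# Step (iii): for each cell, pick the library entry whose phase is closest on the unit circle.
wrapped_diff = np.angle(np.exp(1j * (target[:, None] - lib_phases[None, :])))
idx = np.argmin(np.abs(wrapped_diff), axis=1)
layout = lib_diameters[idx]                      # chosen diameter per unit cell
residual = np.angle(np.exp(1j * (target - lib_phases[idx])))

print("max residual phase error (rad):", np.max(np.abs(residual)))
```

Comparing phases on the unit circle avoids spurious mismatches near the 0/2π wrap, and the residual error is bounded by half the spacing between library phases.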

Phase profile design
According to Huygens' principle [10], each point on the wavefront of a wave propagating in a homogeneous medium is itself a source of spherical wavelets, and the wavefront at the next instant conforms to the envelope of these secondary wavelets. As depicted in Fig. 2a, aberration-free focusing requires a spherical wavefront. Hence, the radius of the secondary wavelet at the radial position $r$ should be $R(r) = \sqrt{r^2 + f^2} - f$, where $f$ is the focal length of the metalens. The phase retardation relative to the reference phase at the center is then $\varphi(r) = -kR(r)$, where $k$ is the wavevector of the light in the medium. Here $k = n_b\omega/c = 2\pi n_b/\lambda$, where $n_b$ is the refractive index of the background medium, $c$ is the speed of light in vacuum, $\omega$ is the angular frequency, and $\lambda$ is the vacuum wavelength. Consequently, normally incident light is focused to a spot after passing through a surface with a hyperboloidal phase profile:
$$\varphi(r) = -\frac{2\pi n_b}{\lambda}\left(\sqrt{r^2 + f^2} - f\right) \qquad (1)$$
In the design of a metalens, the local phase shifts are imposed by nanostructures with a finite unit-cell size, so discrete phase levels covering a 2π phase delay are imparted instead of a continuous phase distribution [11] (Fig. 2b). The difference between the envelope wavefront of the discretized phase profile and the reference spherical wavefront is represented by the wave aberration function (WAF). As the number of phase discretization levels increases, the discrete phases of the metalens approach their continuous counterparts, and the real envelope wavefront becomes more consistent with the ideal one. Consequently, the root mean square of the WAF, $\mathrm{WAF_{rms}} = \sqrt{\langle \mathrm{WAF}^2 \rangle - \langle \mathrm{WAF} \rangle^2}$, where the brackets represent the mean value [12], decreases as shown in Fig. 2c (red line).
Meanwhile, the Strehl ratio (SR), a conventional criterion for focusing performance defined as the peak intensity normalized to that of the Airy disk, increases with decreasing $\mathrm{WAF_{rms}}$ [12]. Therefore, the SR increases with the number of phase discretization levels (blue line in Fig. 2c). According to the Maréchal criterion [13], aberrations are negligible when $\mathrm{WAF_{rms}}$ is less than $\lambda/14$, i.e., $\mathrm{WAF_{rms}}/\lambda < 0.071$. In this case, diffraction is the dominant factor limiting the imaging quality. As shown in Fig. 2c, the Maréchal criterion is satisfied with a phase discretization into four levels (N = 4) [11], and the corresponding SR is about 0.81. In some cases, for instance in wide-angle imaging, the envelope may be an aspherical wavefront chosen as a compromise between on-axis and off-axis aberrations (detailed discussion in section "Compromise between high-NA and high FOV"). The $\mathrm{WAF_{rms}}$ then includes contributions not only from the phase discretization but also from the difference between the aspherical wavefront and the reference spherical wavefront. Increasing the number of phase discretization levels (N) reduces the $\mathrm{WAF_{rms}}$, but it also poses a nontrivial fabrication challenge because of the reduced unit-cell size. Hence, the trade-off between focusing quality and fabrication difficulty should be balanced according to the application requirements.
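The dependence of the SR on the number of phase levels can be illustrated with a simple numerical model. The sketch below is an assumption-laden simplification, not the calculation behind Fig. 2c: it treats the ideal phase as uniformly distributed over one 2π period, snaps it to the nearest of N equally spaced levels, and estimates the SR as the squared magnitude of the averaged phasor of the residual error. Under this model N = 4 gives an SR of roughly 0.81.

```python
import numpy as np

def quantize_phase(phi, n_levels):
    """Snap each phase to the nearest of n_levels equally spaced levels."""
    step = 2 * np.pi / n_levels
    return np.round(phi / step) * step

def strehl_ratio(n_levels, n_samples=200_000):
    """Strehl ratio caused by phase quantization alone (simplified model)."""
    # Sample the ideal phase uniformly over one 2*pi period.
    phi = np.linspace(0.0, 2 * np.pi, n_samples, endpoint=False)
    error = quantize_phase(phi, n_levels) - phi
    # On-axis peak intensity relative to the error-free case.
    return np.abs(np.mean(np.exp(1j * error))) ** 2

for n in (2, 4, 8, 16):
    print(n, round(strehl_ratio(n), 3))
```

For a uniformly distributed quantization error, this average reduces analytically to sinc²(1/N), so the numerical result can be checked against `np.sinc(1 / n) ** 2`.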
Benefiting from their high design freedom, metalenses are also widely used to focus customized beams such as vortex beams [14][15][16][17][18][19]. The desired phase profile, especially one that is too complicated to be expressed by an analytical equation, can be extracted from simulations or obtained by numerical strategies such as computer-generated holography methods [20][21].

Phase modulation mechanisms
A metalens is designed by selecting meta-atoms that impose the required local phase of the target profile. The required 2π phase delay is imparted by the meta-atoms mainly through three mechanisms: the resonant phase, the propagation phase, and the geometric phase, as shown in Fig. 3.

Resonant phase
An ultrathin dielectric metalens (thickness $H \ll \lambda$), namely, the dielectric Huygens' metalens, is achieved with a resonant-type phase related to excited resonances such as the Mie resonance [22][23][24][25][26][27][28] and the Fabry-Pérot resonance [29][30][31]. A typical building block is a nanodisc with a high refractive index in the spectral range of interest (Fig. 3a). The excited electric and magnetic dipole moments can be tuned by tailoring the nanodisc's dimensions to meet the so-called first Kerker condition, allowing a phase shift of up to 2π with near-unity transmission [22] (Fig. 3b). Using anisotropic nanoantennas introduces a degree of freedom related to polarization manipulation [24]. The birefringent effect can also introduce a phase change accompanied by polarization transformation [32][33]. Nevertheless, such resonant-type metalenses operate only in a narrow bandwidth. Moreover, the considerable resonance-mode coupling between adjacent nanostructures may lead to significant errors in the case of large phase gradients, degrading the focusing performance [25].

Propagation phase
Dielectric meta-atoms with heights comparable to the wavelength ($H \sim \lambda$) can be regarded as truncated waveguides [34][35][36][37] (an example is shown in Fig. 3c [35]). In this situation, the phase modulation is based on the propagation phase (also known as the dynamic phase) accumulated as light propagates through the meta-atom:
$$\varphi(x, y, \lambda) = \frac{2\pi}{\lambda}\, n_{\mathrm{eff}}(x, y, \lambda)\, H \qquad (2)$$
where $n_{\mathrm{eff}}(x, y, \lambda)$ is the local effective refractive index of the meta-atoms. The propagation phase can be adjusted through the physical parameters of each unit cell (the size of the nanostructure, the duty cycle, the lattice periodicity of the unit cell, etc.). For meta-atoms with a simple configuration, $n_{\mathrm{eff}}$ can be readily computed using single step-index waveguide models. An example is shown in Fig. 3d. For the nanopillars, the phase calculated with an isolated cylindrical waveguide model agrees well with the results obtained via a finite-difference time-domain (FDTD) analysis. The agreement improves for larger diameters, where the field is more strongly confined [35]. Polarization-insensitive metalenses are achieved with isotropic nanostructures [34][35][36], while linear-polarization-dependent responses are realized with anisotropic meta-atoms [37]. As with resonant-type nanoantennas, phase modulation is also achievable through polarization transformation [38].
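As a worked illustration of the propagation phase, the sketch below evaluates the phase delay 2π·n_eff·H/λ and the effective-index range needed for full 2π coverage at a given pillar height. The numerical values (600 nm height, 532 nm wavelength, starting index 1.2) are assumptions chosen for illustration and are not taken from the cited works.

```python
import numpy as np

def propagation_phase(n_eff, height, wavelength):
    """Phase accumulated through a meta-atom treated as a truncated waveguide."""
    return 2 * np.pi * n_eff * height / wavelength

def index_range_for_2pi(height, wavelength):
    """Effective-index variation needed for a full 2*pi phase coverage."""
    return wavelength / height

height = 600e-9       # assumed pillar height (m)
wavelength = 532e-9   # assumed vacuum wavelength (m)

delta_n = index_range_for_2pi(height, wavelength)
phase_span = (propagation_phase(1.2 + delta_n, height, wavelength)
              - propagation_phase(1.2, height, wavelength))
print(f"required n_eff range: {delta_n:.3f}")
print(f"phase span / 2pi: {phase_span / (2 * np.pi):.3f}")
```

The required index range scales as λ/H, which is why taller (higher aspect ratio) or higher-index meta-atoms make full phase coverage easier to reach.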

Geometric phase
For circularly polarized incidence $E_{\mathrm{in}} = [1, \pm i]^T$, the output electric field $E_{\mathrm{out}}$ from an anisotropic meta-atom rotated by an angle $\theta$ is [7]:
$$E_{\mathrm{out}} = C_1 e^{i\phi_1}\begin{bmatrix}1\\ \pm i\end{bmatrix} + C_2 e^{i\phi_2}\, e^{\pm i2\theta}\begin{bmatrix}1\\ \mp i\end{bmatrix} \qquad (3)$$
where $C_1 e^{i\phi_1}$ and $C_2 e^{i\phi_2}$ are the transmission coefficients, which include the propagation phase. The first term on the right side of Eq. (3) is the co-polarized component, while the second term is the cross-polarized component, which acquires a phase shift of 2θ (Fig. 3f). This additional phase shift originates from the geometric phase, also known as the Pancharatnam-Berry (PB) phase [39][40]. The PB phase is the extra phase difference acquired along paths from one point to another on the Poincaré sphere, so it depends only on the rotation orientations of the anisotropic meta-atoms, which correspond to points on the Poincaré sphere (Fig. 3e). Hence, broadband phase modulation with 2π phase coverage can be achieved by rotating identical anisotropic meta-atoms, enabling broadband imaging [41].
Fig. 3 (caption, panels c-f): c The normalized magnetic field indicating a waveguide mode. Reproduced with permission [46]. Copyright 2018, American Physical Society. d Comparison of the phase calculated by FDTD simulation of a nanopillar on a glass substrate and the phase due to propagation in an isolated cylindrical waveguide. Reproduced with permission [35]. Copyright 2016, American Chemical Society. e Schematic of the geometric phase on the Poincaré sphere. f Geometric phase generated by rotating the orientation of anisotropic meta-atoms (blue) and the corresponding conversion efficiency of the incident circular polarization (red). Reproduced with permission [34]. Copyright 2017, American Chemical Society.
The degree of freedom for phase control can be further extended by combining two or more types of phase control methods. Considering the phase $\phi_2$ in Eq. (3), the phase profile can be achieved by combining the propagation and geometric phases, simultaneously adjusting the rotation angle and the size of the meta-atoms [42][43][44]. The resonant phase can also be combined with the propagation or geometric phase [32].
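The 2θ geometric phase can be checked with a short Jones-matrix calculation. The sketch below assumes an idealized meta-atom that acts as a lossless half-wave plate (transmission coefficients +1 and −1 along its two axes), which is a simplification of a real nanofin; the function and variable names are illustrative. It rotates the element by θ, illuminates it with one circular polarization, and reads the phase of the cross-polarized output.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def jones_matrix(theta, t_o=1.0, t_e=-1.0):
    """Jones matrix of a birefringent element rotated by theta.

    The defaults (t_o = 1, t_e = -1) model an ideal half-wave plate.
    """
    return rotation(theta) @ np.diag([t_o, t_e]) @ rotation(-theta)

# The two circular polarization states, written as [1, +i] and [1, -i].
circ_plus = np.array([1, 1j]) / np.sqrt(2)
circ_minus = np.array([1, -1j]) / np.sqrt(2)

def output_components(theta):
    """Return (co-polarized amplitude, cross-polarized amplitude) for [1, +i] input."""
    out = jones_matrix(theta) @ circ_plus
    return np.vdot(circ_plus, out), np.vdot(circ_minus, out)

for theta in (0.1, 0.5, 1.0):
    co, cross = output_components(theta)
    print(f"theta = {theta:.1f} rad, |co| = {abs(co):.1e}, "
          f"cross phase = {np.angle(cross):.3f} rad")
```

For an ideal half-wave plate the co-polarized term vanishes and the cross-polarized phase equals 2θ, so rotating the element through π covers the full 2π range. With unequal-magnitude or lossy coefficients the co-polarized term reappears, which corresponds to a reduced polarization conversion efficiency.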

Progress and challenges for typical performances
High-resolution imaging is enabled by a single metalens, which is capable of both monochromatic aberration correction and dispersion engineering. In addition, multifunctionality can be achieved through the versatile control that metalenses exert over electromagnetic fields. By virtue of these high-performance features, the metalens is a competitive candidate to replace conventional optical components in integrated or microscale optics for applications such as microscopy, augmented reality (AR), virtual reality (VR), polarization imaging, adaptive zoom, and three-dimensional (3D) imaging. Nevertheless, despite this outstanding performance, multiple theoretical and experimental challenges have to be overcome for the continued development of metalens-based optical applications.

Monochromatic aberration correction
Resolution is a key factor for judging the performance of a lens. According to the diffraction theory of physical optics, the focal spot size of a lens is limited by its NA:
$$\mathrm{FWHM} = \frac{\lambda}{2\,\mathrm{NA}} \qquad (4)$$
where FWHM represents the full width at half maximum of the focal spot, λ is the incident wavelength, and the NA depends on the ambient index ($n$) as well as the collecting angle of the marginal ray ($\theta$): $\mathrm{NA} = n \sin\theta$. At a fixed wavelength, a higher theoretical resolution can be obtained with a lens having a larger NA. However, the actual resolution of conventional lenses is lowered by monochromatic aberrations, including on-axis and off-axis aberrations. As mentioned above, spherical wavefronts are required for aberration-free focusing. However, conventional lenses, which are usually manufactured with spherical surfaces, generate non-spherical wavefronts for parallel incidence. The on-axis aberration, i.e., spherical aberration, occurs when rays are focused on different image planes after passing through the central and marginal regions of the lens (Fig. 4a). Off-axis aberrations include coma, astigmatism, and field curvature. Coma occurs when rays from an off-axis point are focused at different foci in the ideal image plane, forming a comet-like pattern (Fig. 4b). Even without spherical aberration and coma, astigmatism occurs when the meridional beam and the sagittal beam from an off-axis object point cannot be focused at the same position along the propagation direction (Fig. 4d), and field curvature is the bending of the image plane (Fig. 4c). Therefore, conventional spherical lenses with a high NA suffer from large spherical aberration, while the off-axis aberrations become non-negligible with a large FOV. With their local phase control capability, metalenses can provide spherical wavefronts for both on-axis and off-axis incident light (Fig. 4e, f), so they are widely explored for the high-NA and large-FOV cases.

High-NA metalens
A large NA is desirable for applications requiring small light-matter interaction volumes or large angular collection (e.g., microscopy and functionalized fiber applications). In conventional optics, the accompanying spherical aberration is usually eliminated by cascading a series of precisely aligned compound lenses or aspherical lenses. In contrast, diffraction-limited imaging is achievable with a single-piece metalens.

Representative designs and the sampling constraints
Metalenses with hyperbolic phase profiles are inherently free of spherical aberration, so they are extensively explored for high-NA cases. In the work of the Capasso group, a hyperbolic phase profile is implemented with the PB phase controlled by rotating TiO2 nanofins, giving a free-space NA of 0.8 [41]. The achievable NA is related to the parameters of the unit cell. According to the Nyquist-Shannon sampling theorem in the spatial domain, spherical aberration can be prevented when $\mathrm{NA} < \lambda/(2p)$ [45], where $p$ is the periodicity of the unit cell. Hence, small periodicities (i.e., high sampling rates) are generally chosen for large-NA metalenses, as listed in Table 1. Note that in Fan's work [46], the divergent metalens produces blurred images because the periodicity is not small enough for the near-unity NA.
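The sampling condition above translates directly into an upper bound on the unit-cell period. The short sketch below computes the NA of a lens from its aperture radius and focal length and the largest period allowed by the Nyquist condition; the helper names and the example values (532 nm wavelength, a 4:3 radius-to-focal-length ratio) are illustrative assumptions, and the inequality is evaluated at its limit.

```python
import math

def numerical_aperture(radius, focal_length, n_ambient=1.0):
    """NA = n * sin(theta), with theta the marginal-ray angle."""
    return n_ambient * math.sin(math.atan2(radius, focal_length))

def max_period(wavelength, na):
    """Largest unit-cell period allowed by the sampling condition NA < lambda / (2p)."""
    return wavelength / (2.0 * na)

wavelength = 532e-9                       # assumed vacuum wavelength (m)
na = numerical_aperture(40e-6, 30e-6)     # 3-4-5 geometry gives NA = 0.8 in air
print(f"NA = {na:.2f}, period must stay below {max_period(wavelength, na) * 1e9:.1f} nm")
```

Because the bound scales as 1/NA, pushing the NA toward unity in air requires periods approaching half the wavelength, which is where the coupling and fabrication issues discussed next become relevant.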
However, coupling between adjacent elements becomes inevitable as the unit cell shrinks. For PB phase-based metalenses, the peak polarization conversion efficiency of the nanofin blue-shifts as $p$ decreases [45]. For waveguide-type metalenses, although the coupling between adjacent elements is efficiently reduced by the highly confined electromagnetic field, the range of phase delay is reduced by the limited variation in the lateral dimensions. Hence, meta-atoms with a higher refractive index or a higher aspect ratio are preferred to push the NA further. The former, however, is limited by the available materials, while the latter is restricted by fabrication constraints. A simpler approach to increasing the NA is conventional immersion [45][47]. The highest reported NA is 1.48, obtained with an oil-immersion c-Si metalens [47]. Moreover, a solid-immersion lens composed of nanoscale diamond pillars has been proposed to collect and collimate the emission of an individual nitrogen-vacancy (NV) center [48].
Approaches to relax sampling restriction
The aforementioned lattice constraint can be overcome with several methods. For example, Paniagua-Domínguez et al. [49] designed a metalens based on diffracted energy redistribution, employing amorphous silicon (a-Si) nanoantenna inclusions with asymmetric scattering patterns (Fig. 5a). The designed lens is applied in a confocal configuration to map color centers in sub-diffractive diamond nanocrystals. Inverse design methods are another route to high-NA metalenses, since freeform structures are not subject to the phase-sampling limitation (details in section "Advanced design methods"). Benefiting from its small dimensions, an inverse-designed metalens has been fabricated on an optical fiber tip via 3D nanoprinting. The metalens-functionalized fiber device is used in a homemade two-photon direct laser lithography setup [50] (Fig. 5b). At the operating wavelength of 980 nm, the concentric circular structure offers a large NA (~0.85), a focal length of ~8 μm, and a small written linewidth (~200 nm). In addition to the discussion of metalens-incorporated optical tweezers [51], the unique properties of the metalens-based fiber tip have been demonstrated by optically trapping freely diffusing micro-objects in water (NA~0.88) [52] (Fig. 5c).
Fig. 4 Schematic diagrams of the monochromatic aberrations. Panels a-d show the aberrations generated by waves passing through conventional spherical lenses: a spherical aberration is generated by the non-spherical wavefront at normal incidence; b coma is generated by the non-spherical wavefront at oblique incidence; c field curvature occurs when the incident-angle-dependent focal spots form a curved plane; d astigmatism occurs when the transmitted wavefronts of the meridional beam and the sagittal beam correspond to different focal lengths. Panels e-h show aberration-free focusing by metalenses that provide ideal spherical wavefronts: e spherical-aberration-free focusing; f coma-free focusing; g focusing without field curvature; h astigmatism-free focusing. The parallel normal (oblique) incidence comes from an on-axis (off-axis) object at an infinite distance. The orange dashed lines represent the wavefront of light after transmission through the focusing lenses (which could be metalenses or their conventional counterparts). Insets are the corresponding focal spots.
Limited focusing efficiency
The focusing efficiency is defined as the ratio of the integrated power in the focal spot to the incident optical power. To achieve a high focusing efficiency, high transmission and tight focusing of the transmitted wave are simultaneously required. The former is vital to avoid image defects such as flare spots, while the latter promises high contrast. Hence, lossless materials possessing high refractive indices within the bandwidth of interest are preferred. With a high refractive index, the electromagnetic field is strongly confined within the nanostructures, enhancing the phase modulation. Singlet metalenses with focusing efficiencies over 90% [34][53][54][55] have been demonstrated. In particular, NIL Technology has reported silicon metalenses with 94% efficiency at a wavelength of 940 nm, with an anti-reflection coating on the non-structured side of the glass substrate. However, the NAs of these high-efficiency metalenses are smaller than 0.2. The fundamental trade-off between NA and efficiency can be understood with the following physical intuition. The diffraction angle of light at the edge of the metalens increases with NA, requiring a larger phase gradient. Phase errors due to coupling among adjacent elements then become inevitable, and the number of phase discretization levels available within each phase period must be reduced. Both factors lower the diffraction efficiency at the edge, and the focusing efficiency thus decreases. Faced with the great demand for metalenses with high efficiency and large NA in state-of-the-art imaging devices, several design approaches have been developed, including adjoint optimization [56], the grating averaging technique [57], and patching methods [58]. However, challenges remain, such as the large computational resources required and boundary discontinuities. Hence, innovative methods for designing high-NA metalenses with high focusing efficiencies are in demand.

Wide-angle imaging
Wide-angle imaging is required in imaging applications such as AR, microscopy, landscape imaging, and image projection. Nevertheless, the FOV is often limited by off-axis aberrations, in particular coma. Aplanatic metalenses, which correct both spherical aberration and coma simultaneously, are highly desired. As depicted in Fig. 4a-d, spherical wavefronts with different centers are required to eliminate on-axis and off-axis aberrations, respectively. Therefore, aplanatic imaging is challenging with conventional optics. To balance the on-axis and off-axis aberrations, the conventional solution is a bulky lens kit with cascaded lenses. Patterning meta-atoms on a curved surface has been demonstrated to reduce the volume [11], but this approach also inherits the drawbacks of conventional lenses. For further miniaturization, flat metalenses that ideally focus light at an arbitrary incident angle $\theta_i$ are desired, and the corresponding ideal phase profile should be a function of the incident angle [59] [Eq. (5)]. Note that Eq. (5) yields the hyperboloidal phase profile [Eq. (1)] when $\theta_i = 0$. However, the phase shift is usually angle-independent in common metalens designs. Alternatively, aplanatic metalenses with three main types of configurations have been proposed to achieve wide-angle imaging (Fig. 6).
The first type is the singlet metalens with hyperbolic [60][61][62] or quadratic [63] phase profiles (Fig. 6a). In this case, the FOV is limited because the $\mathrm{WAF_{rms}}$ increases with the incident angle, and further measures are required for wide-angle imaging. For instance, to achieve wide-field microscopy imaging with a FOV of 18°, a metalens array was developed to cover a wide area of a CMOS image sensor so that a full stitched wide-field image is obtained [60]. Using a nanoimprinted large-area singlet metalens, Lee et al. [62] experimentally demonstrated AR and VR near-eye displays (Fig. 7a). The metalens is constructed by spatially rotating poly-Si nanorods and shows full-color imaging with a wide FOV near 90° with the assistance of three dichroic mirrors. With circularly polarized incidence, the cross-polarized component providing a virtual image is focused, while the co-polarized component carrying real-world scenes is directly transmitted [62]. Another kind of singlet aplanatic metalens is the concentric nanoring structure fabricated using 3D printing by Qiu's group [64] (Fig. 7b). The vertical sizes and heights of the nanorings are designed with an epsilon-greedy algorithm-based scheme, achieving a full FOV of 32°. As shown by microscopic images of the USAF-1951 target and a bird-feather sample, the imaging quality of the fabricated metalens is comparable to that of a commercial plano-convex lens.
The second type is the Chevalier Landscape metalens (Fig. 6b). Normally and obliquely incident rays are separated by a small aperture located at the front focal plane of the metalens and then focused by different parts of the metalens, correcting on-axis and off-axis aberrations simultaneously [65][66]. With this method, Uriel Levy's group achieved a compact camera for wide-angle outdoor NIR imaging (FOV~30°) with a Huygens metalens patterned with a-Si nanopillars [65] (Fig. 8a). A quadratic phase function is used, and the sufficiently long focal length (3.36 mm) provides the angular resolution necessary for the recognition of facial features at a distance of several meters. Using a polynomial phase profile instead, Hu's group demonstrated a fish-eye metalens operating at a wavelength of 5.2 μm [66] (Fig. 8b). The Huygens metalens is constructed from rectangular and H-shaped PbTe blocks fabricated on a 2-mm-thick CaF2 planar substrate, with a 1-mm-diameter circular aperture on the other side of the substrate. The resolved images from 0° to 82° show the capability of the metalens to perform diffraction-limited imaging over an unprecedented FOV near 170°. Such a wide-angle single-piece metalens has no bulk counterpart, but the configuration would still be bulky with a large focal length.
Fig. 6 Three types of configurations for aplanatic imaging. a Singlet aplanatic metalens. b Chevalier Landscape metalens. c Doublet aplanatic metalens (multilayer metalens).
The third type is the multilayer metalens. In particular, the doublet configuration pairs a corrector metasurface with a focusing phase profile (Fig. 6c). Capasso's group utilizes TiO2 nanofins patterned on opposite surfaces of a silica substrate according to the PB phase (Fig. 9a). The doublet aplanatic metalens offers an NA of 0.44 and a 50° FOV under circularly polarized illumination at the working wavelength of 532 nm [67]. An achromatic aplanatic metalens has also been proposed by combining this approach with the dispersion engineering method [68]. Similarly, Faraon's group constructed a doublet metalens comprising a-Si nanopillars clad in an SU-8 polymer protection layer, which shows nearly diffraction-limited focusing for incident angles smaller than 30°. A conceptual miniaturized planar camera (total size: 1.6 mm × 1.6 mm × 1.7 mm) was then realized using the doublet metalens and a CMOS image sensor, offering wide-FOV imaging with unpolarized light [69] (Fig. 9b). Working similarly to the doublet metalens, two reflective metalenses are integrated side by side in a compact spectrometer with a millimeter-scale volume [70]. Light of different wavelengths is diffracted by the first grating (diffraction angle up to 33.9°) and then collimated by the subsequent metalenses. As a result, the spectrometer has a resolution of about 1.2 nm, resolving more than 80 spectral points from 760 to 860 nm [70] (Fig. 9c). By adding a transmissive metalens, a push-broom hyperspectral imager has further been demonstrated [71] (Fig. 9d). However, precise alignment is challenging in these aplanatic configurations.

Compromise between high-NA and high FOV
High-NA metalenses usually utilize hyperbolic phase profiles to avoid spherical aberration, but the FOV is limited because coma is inevitable at oblique incidence and is accompanied by decreased focusing efficiency [72]. The underlying physics can be revealed with the Fourier transform (FT) spectrum of the field distribution immediately after the metalens [63][72]. Only k-vector components between $-k_0$ and $k_0$ are leaky and contribute to focusing, and the symmetry within this effective space leads to the symmetry of the focal spot. As shown in Fig. 10a, the FT components of hyperbolic phase profiles are tightly confined inside the light line for normal incidence, and the dominant contribution of large-k components ensures a tight focal spot. Nevertheless, coma becomes obvious when the large-k component is shifted beyond the effective space at an oblique angle. Because the amplitude of the large-k component increases with NA [72], the achievable FOV decreases with increasing NA. Hence, one approach to enlarging the FOV of a hyperbolic metalens is to reduce the NA, as adopted in some of the aforementioned multilayer and Chevalier Landscape metalenses. To maintain a high NA, the phase profile can be optimized by superimposing a term such as a polynomial function [73] or be replaced by a spherical or quadratic phase profile [63]. These profiles offer flatter FT spectra than the ideal hyperbolic counterpart, showing larger tolerance to the shift caused by oblique incidence (Fig. 10b, c). However, the on-axis resolution is compromised because the broad spectra extend beyond the effective space. Although the resolution can be increased with designs such as an aplanatic superoscillatory metalens, the focusing efficiency is limited by the sidelobes [74].
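The NA-FOV conflict described above can be illustrated with a rough local-spatial-frequency estimate. This is a simplified sketch and not the FT analysis of the cited works: it assumes a hyperbolic metalens in air whose phase gradient at the aperture edge corresponds to a normalized transverse wavevector equal to the NA, and it assumes that oblique incidence at angle θ_i simply shifts the spectrum by sin θ_i. Under these assumptions the edge contribution leaves the effective space (|k| ≤ k_0) once NA + sin θ_i exceeds 1.

```python
import math

def edge_transverse_k(na, incidence_deg):
    """Largest normalized transverse wavevector |k_x| / k_0 just behind the lens edge."""
    return na + math.sin(math.radians(incidence_deg))

def max_incidence_angle_deg(na):
    """Incidence angle at which the edge spatial frequency reaches the light line."""
    return math.degrees(math.asin(1.0 - na))

for na in (0.2, 0.5, 0.8):
    print(f"NA = {na:.1f}: edge stays propagating up to about "
          f"{max_incidence_angle_deg(na):.1f} deg")
```

The estimate only locates where the spectrum starts to be truncated; aberrations such as coma degrade the focal spot well before this limit, so the numbers should be read as an upper bound under the stated assumptions.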
Several advanced design methods, such as adjoint optimization and catenary optics, have been discussed for high-performance wide-angle metalenses [75]. In addition, exploiting the angular dispersion capability of metasurfaces could efficiently manage the on-axis and off-axis aberrations. By controlling the coupling between meta-atoms and the radiation properties of the constituent meta-atoms, an angle-dependent phase shift has been achieved in meta-devices with a metal-insulator-metal configuration [76][77] and in multilayered structures [78][79]. Sell et al. [80] demonstrated a freeform silicon-based metagrating whose response can be continuously adjusted from linear to elliptical birefringence by changing the angle of incidence. Although the overall phase shift is not the focus of that work, it is angle-dependent according to the Jones matrix analysis.

Chromatic aberration correction
Chromatic aberration occurs when light rays passing through a lens focus at frequency-dependent points due to optical dispersion. Chromatic aberration elimination at discrete wavelengths has been achieved through spatial multiplexing 34,81-83 . Multiple sets of meta-atoms are interleaved or stacked, with each set controlling the wavefront of one wavelength. Meanwhile, broadband achromatic focusing is achievable through the dispersion engineering capability of metalenses.

Dispersion engineering for achromatic imaging
Metalenses introduce dispersion that is dictated mainly by the geometrical parameters and arrangement of the subwavelength structures 7,84 . The phase profile imparted by a metalens depends on the frequency of the incident light, and it can be expanded as a Taylor series around the design frequency ω_d:

$$\varphi(r,\omega)=\varphi(r,\omega_d)+\left.\frac{\partial\varphi(r,\omega)}{\partial\omega}\right|_{\omega=\omega_d}(\omega-\omega_d)+\left.\frac{\partial^{2}\varphi(r,\omega)}{2\,\partial\omega^{2}}\right|_{\omega=\omega_d}(\omega-\omega_d)^{2}+\cdots \qquad (6)$$

The first term on the right-hand side corresponds to the transmitted wavefront at the design frequency ω_d, and it can be engineered by the phase control methods introduced in section "Phase modulation mechanisms". The higher-order derivative terms, including the second and the third terms ($\partial\varphi(r,\omega)/\partial\omega$ and $\partial^{2}\varphi(r,\omega)/2\partial\omega^{2}$ represent the group delay and group delay dispersion profiles, respectively), control the chromatic focal length shift of the metalens 85 . By simultaneously manipulating the phase, group delay, and group delay dispersion profiles, both achromatic and chromatic applications have been demonstrated. On the one hand, the high diffractive dispersion is utilized for spectral tomographic imaging negating chromatic aberrations 86 , and a metalens-integrated nano-optic endoscope is presented for optical coherence tomography application 87 . On the other hand, achromatic focusing requires the dispersion to be compensated. According to Eq. (6), the group delay (GD) for a hyperbolic phase profile is:

$$\frac{\partial\varphi(r,\omega)}{\partial\omega}=-\frac{1}{c}\left(\sqrt{r^{2}+f^{2}}-f\right) \qquad (7)$$

and the corresponding group delay dispersion $\partial^{2}\varphi(r,\omega)/2\partial\omega^{2}$ is zero. To achieve achromatic imaging, the GD is compensated by structural dispersion. Achromatic metalenses fall into four cases according to how the wavefront and the dispersion are controlled: (i) The transmitted wavefront and the structural dispersion are simultaneously tailored by changing the geometric parameters of resonant meta-atoms. (ii) The transmitted wavefront and the structural dispersion are simultaneously tailored by changing the geometric parameters of waveguide-type meta-atoms. (iii) The transmitted wavefront and the structural dispersion are controlled by the rotating orientation (PB phase) and the geometric parameters of resonant meta-atoms, respectively. (iv) The transmitted wavefront and the structural dispersion are controlled by the rotating orientation (PB phase) and the geometric parameters of waveguide-type meta-atoms, respectively. Cases (ii) and (iv) are more common, as waveguide-type meta-atoms usually offer larger achievable GD compensation than their resonant counterparts 97 (see discussion in section "Challenges for broadband achromatic imaging"). A representative work is shown in Fig. 11a. Each meta-atom comprises two TiO 2 nanofins in close proximity, acting as a coupled waveguide. Their geometric parameters are designed for linear structural dispersion in the 470-600 nm band (Fig. 11b), and the wavefront is tailored by the PB phase 88 . Nevertheless, for a larger relative bandwidth (Δω/ω_d), the structural dispersion becomes strongly nonlinear, leading to a notable increase in the compensation error. To better match the nonlinear structural dispersion, Hu et al. 98 constructed the dispersion model with a wavelength-dependent position (Fig. 11c). Using the nonlinear dispersive phase compensation (Fig. 11d), they achieved ultra-broadband (400-1000 nm) achromatic focusing (Fig. 11e).
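The roles of the phase, group delay, and group delay dispersion terms can be made concrete with a short numerical sketch. It assumes the hyperbolic profile φ(r, ω) = -(ω/c)(√(r² + f²) - f) and estimates the first and second frequency derivatives at the design frequency by central finite differences. The function names, the sample radius, focal length, and frequency step are hypothetical values chosen only for illustration.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def hyperbolic_phase(r, f, omega):
    """Assumed hyperbolic phase profile phi(r, omega) = -(omega/c)(sqrt(r^2+f^2) - f)."""
    return -(omega / C) * (math.hypot(r, f) - f)

def taylor_terms(phase_of_omega, omega_d, d_omega):
    """Return (phase, first derivative, second derivative) at omega_d.

    The first derivative is the group delay; half of the second derivative is
    the coefficient of the (omega - omega_d)^2 term in the Taylor expansion.
    """
    p_minus = phase_of_omega(omega_d - d_omega)
    p_zero = phase_of_omega(omega_d)
    p_plus = phase_of_omega(omega_d + d_omega)
    group_delay = (p_plus - p_minus) / (2.0 * d_omega)
    second_derivative = (p_plus - 2.0 * p_zero + p_minus) / d_omega ** 2
    return p_zero, group_delay, second_derivative

if __name__ == "__main__":
    r, f = 10e-6, 50e-6                       # hypothetical radial position and focal length
    omega_d = 2.0 * math.pi * C / 550e-9      # design frequency for 550 nm
    phi, gd, d2 = taylor_terms(lambda w: hyperbolic_phase(r, f, w), omega_d, 1e13)
    print(f"phase = {phi:.3f} rad, group delay = {gd * 1e15:.3f} fs, "
          f"second derivative = {d2:.3e} s^2")
```

For this profile the group delay depends only on the radial position and the second derivative vanishes up to rounding, so an achromatic design must supply the opposite position-dependent group delay through structural dispersion.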
Waveguide-type achromatic metalenses have shown their potential in lenslet-array-based imaging systems. Achromatic projection of objects at different depths under white-light illumination (430-780 nm) is achieved through an integral imaging system using a SiN metalens array based on propagation phase 99 (Fig. 12a, b). In addition, an achromatic full-color light-field (LF) camera is demonstrated in the 400-660 nm band by a metalens array that captures LF information based on PB phase control of GaN nanoantennas 100 (Fig. 12c, d).

Challenges for broadband achromatic imaging
Broadband achromatic focusing by a single metalens is an inspiring feature, but the achievable achromatic bandwidth is limited by some fundamental bounds and fabrication challenges.
Fundamental limit for ideal metalens
Regardless of the specific implementation, the achromatic bandwidth (Δω) of a metalens is related to the time delay experienced by the signals (ΔT) and a dimensionless quantity κ that gives the upper bound of the time-bandwidth product 101 :

$$\Delta\omega\,\Delta T\le\kappa$$

where the time delay ΔT compensates for the additional time taken by a signal arriving from the edge, and it is consistent with the group delay (i.e., ΔT(r) = GD(r) - GD(R)). The largest delay is required at the center of the lens, which gives

$$\Delta\omega\le\Delta\omega_{\max}=\frac{\kappa c}{n_b\left(\sqrt{R^{2}+f^{2}}-f\right)}$$

where n_b represents the background refractive index. With increasing physical parameters of the metalens such as radius (R), focal length (f), and NA, a greater time delay is required, which shrinks the achievable achromatic bandwidth (Δω_max). For a metalens with specific R and f, the way to increase the achievable achromatic bandwidth is to enlarge κ, which depends on the type of the metalens.
For resonant-type metalenses (H ≪ λ), coupled-mode theory provides a geometry- and material-independent value (κ = 2) for a single resonant cavity 102 . Hence, the achromatic bandwidth of such metalenses is limited. The value may be enlarged with meta-atoms that contain multiple resonant cavities. Indeed, multi-resonances have been found to support a broadband achromatic reflective response 84,97,103,104 , but the efficiency is limited by the lossy metallic components. Multi-resonances in dielectric nanostructures deserve attention, and the underlying physics merits investigation. Moreover, one needs to trade off the achromatic bandwidth against the transmission according to the impedance-matching limit, known as the Bode-Fano limit 101,105 .

For non-resonant dielectric meta-atoms acting as truncated waveguides (H ∼ λ), Tucker et al. 106 gave the upper bound as κ = ω_d H(n_max - n_min)/c, where n_max and n_min are the maximum and minimum effective indices. Miller et al. 107 provided a limit that is valid for more generic cases (not necessarily dielectric) or those with larger thicknesses (H > λ). In contrast to resonant-type metalenses, non-resonant dielectric metalenses can offer a larger fundamentally achievable achromatic bandwidth, as the time delay is compensated for by the structural dispersion of the effective waveguides. As discussed in Shrestha's work 90 , the waveguide-mode dispersion converges to the light line of the high (low)-index material at high (low) frequency. Therefore, a large structural dispersion can be obtained with a high refractive-index contrast (Δn = n_eff(ω_max) - n_eff(ω_min)) and a large height (H), automatically reaching Tucker's limit.
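A back-of-the-envelope sketch can show how these bounds interact. It assumes the time-bandwidth relation Δω ΔT ≤ κ, takes the required delay as the extra optical path from the lens edge, ΔT = n_b(√(R² + f²) - f)/c, and uses the waveguide-type value κ = ω_d H(n_max - n_min)/c quoted above. The function names and the example geometry (radius, NA, height, index contrast) are hypothetical and serve only to illustrate the scaling.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def focal_length_from_na(radius, na, n_b=1.0):
    """Focal length of a lens with the given radius and NA = n_b * sin(theta)."""
    return radius * math.sqrt(n_b ** 2 - na ** 2) / na

def required_delay(radius, focal_length, n_b=1.0):
    """Extra travel time of light from the lens edge relative to the center."""
    return n_b * (math.hypot(radius, focal_length) - focal_length) / C

def kappa_waveguide(omega_d, height, delta_n):
    """Waveguide-type upper bound kappa = omega_d * H * (n_max - n_min) / c."""
    return omega_d * height * delta_n / C

def max_relative_bandwidth(height, delta_n, radius, focal_length, n_b=1.0):
    """Upper bound on delta_omega / omega_d; the design frequency cancels out."""
    omega_d = 1.0  # any positive value gives the same ratio
    kappa = kappa_waveguide(omega_d, height, delta_n)
    return kappa / required_delay(radius, focal_length, n_b) / omega_d

if __name__ == "__main__":
    radius, na = 0.5e-3, 0.1  # hypothetical 1-mm-diameter lens
    f = focal_length_from_na(radius, na)
    dt = required_delay(radius, f)
    rel = max_relative_bandwidth(height=600e-9, delta_n=1.0, radius=radius, focal_length=f)
    print(f"required delay = {dt * 1e15:.1f} fs, max relative bandwidth = {rel:.4f}")
```

Under these assumptions the bound reduces to Δω/ω_d ≤ HΔn/[n_b(√(R² + f²) - f)], so the achievable relative bandwidth grows linearly with the meta-atom height and index contrast and shrinks as the lens gets larger.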
Typical challenges for waveguide-type achromatic metalenses
Nevertheless, reaching the fundamental achromatic bandwidth is challenging from a fabrication perspective. Using taller meta-atoms made of transparent materials is an effective way to improve the bandwidth. However, if the height of a meta-atom is increased beyond a wavelength, its lateral size must be very small to form an effectively homogeneous slab, leading to a high aspect ratio that is still challenging to fabricate. Another approach to enlarging the achromatic bandwidth is to increase the refractive-index contrast Δn. However, an order-of-magnitude improvement in achromatic bandwidth is unlikely, even with freeform all-area optimization, because the refractive-index contrast cannot be raised by an order of magnitude given the limited refractive indices of transparent dielectrics. Although large effective indices are enabled by hyperbolic photonic crystals 108 , heights comparable to several wavelengths are required, which again leads to high aspect ratios.
Another restriction related to the fabrication challenge is the diameter. According to Eq. (7), the required group delay grows with the radius of the metalens for a given NA. For instance, to achieve the group delay required by a 1-mm-diameter TiO 2 metalens with NA = 0.1 for the VIS band, the height of the meta-atoms must be about 7.5 μm, implying aspect ratios of tens or even hundreds. Therefore, discrete-wavelength achromatic imaging based on spatial multiplexing 34,81-83 or dispersive phase compensation 109 is exploited instead for the large-diameter case. However, large computation resources are required in the multi-objective design process. A compromise is to exploit both the constructive interference of light from multiple zones and dispersion engineering 89 (Fig. 13a).
With this method, a 2-mm-diameter RGB-achromatic metalens is practically applied in a VR/AR eyepiece (Fig. 13b, c). The achromatic imaging capability of the metalens makes the eyepiece simpler and lighter than the previous work shown in Fig. 7a.
Approaches to relax the constraints
For a non-ideal metalens that does not achieve aberration-free focusing, the achromatic bandwidth may surpass the upper bound (Δω_max), since an error in the implemented phase profile would be acceptable 101 . Hence, a compromise must be struck between imaging quality and achromatic bandwidth.
In addition to multi-resonances, multilayer metalenses 110-113 that break the assumption of one-dimensionality may surpass the aforementioned bounds to some degree while maintaining small aberrations. For instance, a hybrid achromatic metalens designed by combining recursive ray-tracing and phase libraries shows average focusing efficiencies greater than 60% over a broad NIR band (1-1.8 μm), while the refractive index of composite material (GaN) is about 1.5 110 . Furthermore, one can seek other approaches to realize achromatic focusing. For example, superoscillation, which is usually utilized for subwavelength super-resolution focusing, has also been reported as a holographic approach for achromatic focusing 114 . A binary amplitude mask is constructed from two sets of PB meta-atoms with different sizes but the same rotation rule. The focused double-hump-shaped intensity distributions of different wavelengths along the propagation axis overlap even though the focal centers are not in the same plane. Therefore, achromatic imaging could be realized in the overlapped area. However, the extension of the intensity distribution also leads to low efficiency. Inverse design methods and artificial intelligence algorithms would give further access to high-performance achromatic metalenses for practical applications.

Multifunctionality
For some imaging applications such as polarization imaging, zoom imaging, and 3D imaging, simultaneous or tunable multi-foci are required. Conventionally, cascaded optical elements with different functionalities, and sometimes mechanical components, are required to work with the lenses. The difficulty of reducing the volume of these imaging systems therefore increases with the number of separate components. The alignment, as well as the adjustment and matching of the elements' performances, also becomes increasingly challenging as the number of elements grows. Although fewer elements are used in some optical systems that rely on time-division strategies (for example, rotating a polarizer for polarization detection), real-time measurements are then unachievable. In contrast, multiple functionalities can be implemented within the same shared aperture of a single metalens owing to its high flexibility in controlling light 115,116 . Hence, the multifunctionality of metalenses enables compact, simple, and real-time (if required) imaging systems by reducing both the size and the number of elements. To illustrate the opportunities for single-piece multifunctional metalenses to replace complicated systems, we highlight some applications here.

Real-time polarization imaging
The real-time polarization imaging technique, which simultaneously forms multiple polarization-resolved images, can reveal in a single shot information that is invisible in traditional imaging. Conventional optical setups for real-time polarization imaging involve division-of-focal-plane systems with multiple components, including beam splitters, polarizers, waveplates, cascaded lenses, and multiple detectors 117 . Although simplified systems have been demonstrated by incorporating a layer of polarizers above the photodiodes 118 or pairing a metagrating with a lens 119 , they suffer from low efficiency and limited image contrast, respectively. Polarization-assisted multifunctional metalenses offer a new platform for high-efficiency and compact polarization cameras.
Real-time polarization imaging requires a transverse separation of the focal spots that depends on the incident polarization. Birefringent effects in anisotropic meta-atoms are utilized for linearly polarized (LP) waves 38,120,121 , and the geometric phase is widely used for circularly polarized (CP) waves 94,122,123 . Because incident LP and CP waves cannot be simultaneously separated by the same meta-atoms due to structural symmetry, full-polarization (full-Stokes) metalenses, which involve six foci, are mostly realized by combining several sets of sub-units. Yang et al. proposed a generalized Hartmann-Shack wavefront sensor by integrating a standard camera with a metasurface array whose superpixel contains six metalenses, each of which focuses one particular polarization state (Fig. 14a) 124 . In addition to real-time full-polarization imaging, the sensor can simultaneously detect phase profiles by further analyzing the displacements of the foci. Faraon's group divided the meta-unit into three parts, with each splitting an orthogonal polarization basis set 125 (Fig. 14b). Consequently, the power in the six polarization states is measured by the corresponding image sensor pixels, offering a full-Stokes measurement. Furthermore, the illumination system of a polarization microscope is demonstrated by a metalens whose unit comprises four nanofins on a SiO 2 substrate, with each nanofin acting as a half-wave plate to control a linear polarization state 126 (Fig. 14c). Similarly, Luo's group achieved simultaneous imaging of four polarization states at the MIR wavelength of 10.6 μm (Fig. 14d). Two sets of silicon pillars are interleaved. One forms convergent wavefronts of horizontal/vertical polarization by the dynamic phase, while the other offers a CP-dependent transverse shift of the focus by combining the dynamic and geometric phases (i.e., the side length and the rotation angle of the silicon pillars are simultaneously adjusted) 127 .
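The way six polarization-resolved foci yield a full-Stokes measurement can be sketched with the standard textbook relations between the Stokes parameters and the intensities in the horizontal/vertical, diagonal/anti-diagonal, and right/left circular channels. This is a generic illustration rather than the processing pipeline of any cited work; the function names are hypothetical, and the sign of S3 (right minus left) is a convention that varies between references.

```python
import math

def stokes_from_six_intensities(i_h, i_v, i_d, i_a, i_r, i_l):
    """Stokes vector from the six channel intensities (H, V, +45, -45, RCP, LCP)."""
    s0 = i_h + i_v  # total intensity
    s1 = i_h - i_v  # horizontal versus vertical
    s2 = i_d - i_a  # +45 degrees versus -45 degrees
    s3 = i_r - i_l  # right versus left circular (sign convention assumed)
    return s0, s1, s2, s3

def degree_of_polarization(stokes):
    """Fraction of the total intensity that is polarized."""
    s0, s1, s2, s3 = stokes
    if s0 <= 0.0:
        raise ValueError("total intensity must be positive")
    return math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2) / s0

if __name__ == "__main__":
    # Ideal readings for fully horizontally polarized light of unit intensity.
    readings = dict(i_h=1.0, i_v=0.0, i_d=0.5, i_a=0.5, i_r=0.5, i_l=0.5)
    stokes = stokes_from_six_intensities(**readings)
    print("Stokes vector:", stokes, "DOP:", degree_of_polarization(stokes))
```

In a metalens polarization camera, each of the six intensities would be read from the sensor pixels under the corresponding focal spot of one superpixel.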

Zoom imaging
Zoom imaging is usually performed by longitudinal motion of multiple lenses in conventional optics. The complex and bulky tunable lens kit, however, has been transformed into a single element with dynamically tunable metalenses. The methods for zoom metalens can be divided into the following three strategies.
(i) Zoom via mechanical deformation or displacement, which incorporates microelectromechanical systems 128-133 or elastomeric materials 9,53,134,135 . For instance, the focal length of a dielectric Moiré metalens at 532 nm is changed from ∼10 to ∼125 mm by tuning the mutual angle between the two complementary phase plates composed of GaN meta-atoms 130 (Fig. 15a). The designed telecentric configuration enables high-contrast multiplane fluorescence imaging, and optically sectioned images of ex vivo mouse intestine tissue samples are demonstrated. As shown in Fig. 15b, focal length tuning for multiple color channels is demonstrated by laterally stretching a varifocal graphene metalens. A focal length tuning range of over 20% is achieved for red (650 nm), green (550 nm), and blue (450 nm) light, and zoom imaging of different objects located along the axial direction is demonstrated 135 . Furthermore, with a tunable metalens controlled by electrically engineered artificial muscles, monochromatic aberrations are corrected while the focal length is continuously adjusted 53 .

(ii) Step zoom via polarization tuning, where the focal length depends on the polarization state of the incident light 136-142 . Dual-step zoom is achievable with the LP-dependent or CP-dependent response of multilayer metalenses 136,137 . Fu et al. proposed a double-sided metasurface utilizing dynamic phase modulation of anisotropic silicon nanobricks 136 . At the operating wavelength of 658 nm, the effective focal length is switched by the LP state of the incident beam, as the front metasurface acts as a convergent or divergent element for the two orthogonal LPs, respectively (Fig. 15c). However, such a multilayer metalens suffers from relatively low focusing efficiencies (<20%) and is more difficult to fabricate than singlet metalenses offering a multi-step zoom function 138,139,143 .
Continuous focal length tuning by changing the LP state of the incident light is also achieved 140 , although the tuning range is smaller than the depth of focus of the metalens. Combining dynamic and geometric phases, Tian et al. proposed a metalens offering two CP-dependent foci with high efficiency (up to 72%), whose relative intensities can be conveniently modulated by changing the ellipticity of the incident light 138 (Fig. 15d).
In particular, OPCMs offer a promising route to zoom metalenses owing to the extremely large refractive index contrast associated with the material phase transformation 156-160 . Yin et al. 145 patterned two groups of plasmonic antennas on top of a blanket OPCM film, with each group providing phase control of the incident light in either the amorphous or the crystalline state of the film. However, the focusing efficiencies are limited (5% and 10% for the two states, respectively) by the shared-aperture layout and the use of metallic meta-atoms. Shalaginov et al. 146 demonstrated an OPCM-based all-dielectric Huygens metalens that is optimized via a generic design methodology. The focal length can be switched between 1.5 mm (amorphous state) and 2 mm (crystalline state) by electrical tuning, with a focusing efficiency over 20% at the operating wavelength of 5.2 μm. Using the bifocal metalens as an objective in a multi-depth imaging system, well-resolved images of standard USAF-1951 resolution charts coinciding with the two foci are both captured (Fig. 15e). LC-based metalenses have also been presented, owing to the maturity of LC materials 147,148 . For instance, Hu et al. 148 demonstrated an electrically tunable multi-zoom metalens by integrating birefringent nematic LCs with a PB phase-based metasurface (Fig. 15f). By applying different voltages, the orientation of the LC molecules is adjusted to realize a variable wave plate with different phase retardations.

3D imaging
LF imaging is an approach to single-shot 3D imaging that requires no physically moving parts. In an LF camera, a lenslet array is employed to capture images from different vantage points. The depth information of the object is encoded in the pixelated image and can be reconstructed slice by slice from a series of rendered images with different depths of focus. The monochromatic and chromatic aberrations in LF imaging can be corrected by specific algorithms 161 and by the dispersion engineering capability of metalenses 100 , respectively. However, there is still an inherent trade-off between spatial and angular resolution in conventional LF imaging. As the spatial density of the lenslet array increases, the spatial resolution increases, while the angular resolution is limited by the reduced aperture of each lens.
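The spatial-angular trade-off can be illustrated with a simplified pixel-budget view, which is an assumption made here for illustration rather than a model taken from the cited works: along one axis, a sensor with a fixed number of pixels is divided among the lenslets, so the number of lenslets sets the spatial samples and the pixels behind each lenslet set the angular samples. The function name and the pixel counts are hypothetical.

```python
def light_field_sampling(sensor_pixels, lenslets):
    """Return (spatial samples, angular samples) along one axis.

    Assumes every lenslet covers the same whole number of sensor pixels.
    """
    if lenslets <= 0 or sensor_pixels % lenslets != 0:
        raise ValueError("sensor_pixels must be a positive multiple of lenslets")
    angular_samples = sensor_pixels // lenslets
    return lenslets, angular_samples

if __name__ == "__main__":
    for lenslets in (50, 100, 200, 400):
        spatial, angular = light_field_sampling(4000, lenslets)
        print(f"{lenslets:4d} lenslets -> {spatial:4d} spatial x {angular:3d} angular samples")
```

Doubling the lenslet density doubles the spatial sampling and halves the angular sampling, because their product is fixed by the sensor.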
An approach to mitigate this issue is to interleave the metalenses. As a proof of concept, Holsteen et al. 162 proposed a metalens containing three interleaved phase profiles. The three sets of meta-atoms are arranged according to the PB phase and are randomly interleaved within a 200-μm-diameter shared aperture. As depicted in Fig. 16a, the depth information on particles is directly translated to lateral information in the sub-images, and the resolution is doubled compared with three non-interleaved metalenses occupying the same area. Synchronous 3D imaging without substantially modifying the optical system of a conventional optical microscope is further demonstrated simply by adding a patterned coverslip on top of a fluorescent sample specimen. Another example is the passive snapshot depth sensor demonstrated by Capasso's group 163 . The 3-mm-diameter metalens comprises two sets of meta-atoms. By applying cutting-edge nanotechnology and computer vision algorithms to two differently defocused images, depth measurement over a 10-cm distance range is achieved (Fig. 16b).
Park et al. 164 focused on enhancing the spatial resolution of LF images by a virtual moving metalens array. By capitalizing on resonant a-Si:H meta-atoms that impart polarization-dependent phase, the sampling position is laterally shifted by half the elemental lens pitch by switching the linear polarization state of incidence (Fig. 16c). Compared with the normal case (demonstrated by only TE illumination), combining two sets of images (with TE and TM illumination, respectively) enhances the image resolution with the same aperture, so the spatial resolution of the reconstructed depth-slice images is increased with no angular resolution degradation.
3D imaging is also demonstrated via a single-aperture lens implemented by a polarization-insensitive Huygens' metasurface phase mask 165,166 . As the generated double-helix point spread function (PSF) rotates with the object distance, the depth information of the 3D scene is retrieved from the captured raw image 165 (Fig. 16d).

Limitations of multifunctional metalenses
In addition to the applications that have been introduced in this review, there are multifunctional metalenses for nanotracking 167 , holography 168 , optical analog computing 169 , and phase measurement 170,171 . However, multifunctional metalenses usually involve multiple focal spots, so several sets of sub-units are usually interleaved to achieve the multiple functions within a shared aperture. Hence, the overall efficiency is limited not only by the useful fraction of the power but also by the crosstalk among the sub-units. Balancing the performance among different focal spots is also a tough task. To optimize a multifunctional metalens, a new criterion could be proposed for the overall evaluation. For example, the performance could be evaluated by the ratio of the average focal efficiency (η) to the root-mean-square deviation of the focal efficiencies (η_rms) of the multiple focal spots, i.e., FOM = η/η_rms. Intelligent algorithms would be a powerful aid in speeding up the optimization 172-174 .
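The proposed figure of merit is simple to evaluate, as the following minimal sketch shows. It interprets η as the arithmetic mean of the focal efficiencies and η_rms as their root-mean-square deviation from that mean (the population standard deviation). The function name and the sample efficiencies are hypothetical, and returning infinity for perfectly balanced foci is a choice made here for illustration.

```python
import math
import statistics

def multifocal_fom(efficiencies):
    """FOM = mean focal efficiency / RMS deviation of the focal efficiencies."""
    if not efficiencies:
        raise ValueError("at least one focal efficiency is required")
    mean_efficiency = statistics.fmean(efficiencies)
    rms_deviation = statistics.pstdev(efficiencies)  # RMS deviation from the mean
    if rms_deviation == 0.0:
        return math.inf  # perfectly balanced foci
    return mean_efficiency / rms_deviation

if __name__ == "__main__":
    balanced = [0.12, 0.11, 0.12, 0.13, 0.12, 0.11]    # hypothetical six-focus design
    unbalanced = [0.30, 0.05, 0.10, 0.02, 0.20, 0.04]  # similar mean, poorer balance
    print("balanced FOM:  ", round(multifocal_fom(balanced), 2))
    print("unbalanced FOM:", round(multifocal_fom(unbalanced), 2))
```

A design with high average efficiency but strongly unequal foci scores low, so maximizing this ratio rewards both efficiency and balance.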

Common challenges for further development
In addition to the challenges concerning typical performances, there are some common challenges for metalens-integrated applications, as discussed below.

Limitations of the conventional design approach
The high number of degrees of freedom enables versatile functionalities and high performance of metalenses. Nevertheless, the conventional design approach for metalenses requires an extremely large memory space. As introduced in step (ii) in the section "Principle", the nanostructure-phase library is constructed by exhaustively scanning all parameters of the meta-atoms (including the geometry, size, constituent, etc.) and of the incident beam (including polarization, incident angle, etc.). Therefore, the required simulation time increases explosively with the number of design degrees of freedom.
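The growth of an exhaustively scanned library can be estimated by multiplying the number of values swept for each parameter. The sketch below is purely illustrative: the parameter names and sweep counts are hypothetical and do not come from any specific design.

```python
import math

def library_size(sweep_counts):
    """Number of full-wave simulations needed for an exhaustive parameter scan."""
    return math.prod(sweep_counts.values())

if __name__ == "__main__":
    # Hypothetical sweep: values scanned per meta-atom or illumination parameter.
    sweep = {"length": 20, "width": 20, "rotation": 12, "polarization": 2}
    print("baseline library:", library_size(sweep))
    sweep["incident_angle"] = 10  # one more degree of freedom
    sweep["wavelength"] = 30      # and another
    print("extended library:", library_size(sweep))
```

Each added degree of freedom multiplies the simulation count by the number of values it takes, which is the combinatorial growth described above.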
Moreover, the different performance targets of metalenses conflict with one another, as mentioned above: high NA and large FOV are incompatible due to the trade-off between on-axis and off-axis resolution; the broadband achromatic capability of a single metalens involves compromises among the operating bandwidth, the focusing efficiency, the diameter, and the structural complexity; and further applications of multifunctional metalenses require optimization for multiple objectives. These problems lead to multi-objective tasks that further increase the memory space requirement.
Another challenge is the overall optimization of metalens-based imaging systems. The conventional key indicators for lens systems, such as the PSF, SR, modulation transfer function, and signal-to-noise ratio, are also suitable for characterizing metalenses. These criteria can be directly measured 65 . The overall optimization further calls for the modeling of metalenses in optical systems, which could be performed with commercial software such as Zemax that involves ray tracing and wave optics. Note that the design of a metalens typically relies on numerical programs solving the full-wave Maxwell equations. Zemax interoperability has been enabled by Lumerical, with the near-field data directly imported into the ray-tracing solver. VirtualLab Fusion provides another approach to designing metasurfaces by combining fast physical optics and ray/field-tracing solvers. In the overall optimization, however, efficient looped data transfer and iteration are currently difficult to achieve without the assistance of customized codes.

Advanced design methods
Intelligent design
The computational resources required for simulation can be reduced with the assistance of machine learning algorithms such as neural networks 175-178 . For instance, a phase library containing 15,753 meta-atoms is generated in less than one second by a backpropagation neural network 179 . However, the massive training dataset is expensive and sometimes unrealistic in terms of economic and labor costs. To relieve the computational burden, artificial intelligence methods requiring smaller training datasets (such as transfer learning 177 and reinforcement learning 180 methods) deserve particular consideration. The intelligent methods are also helpful in overall design and optimization. Tseng et al. 181 presented a high-quality nano-optics imager via neural computational imaging. They devised a fully differentiable learning framework that learns a metasurface physical structure in conjunction with a neural feature-based image reconstruction algorithm. From the optimizable phase profile, the differentiable model produces spatially varying PSFs, which are then patch-wise convolved with the input image to form the sensor measurement (Fig. 17a). The final image is produced by deconvolving the sensor reading using a neural feature-based image reconstruction algorithm. After optimization, a metalens with a FOV of 40° and an f-number of 2 is experimentally demonstrated, providing full-color imaging on par with a bulky commercial lens. By incorporating reconfigurable meta-atoms, we may envision extending such neural nano-optics towards adaptively programmable imaging and sensing devices. There have been intelligent imagers based on programmable metasurfaces through similar end-to-end pipelines 182 . The responses of the programmable meta-atoms act as physical weights, which are jointly trained with the digital weights 183 . Using dynamic metasurface apertures capable of transceiving programmable microwave patterns, Hougne et al. numerically demonstrated a prototypical object recognition task 184 (Fig. 17b). Subsequently, Li et al. reported the first experimental implementation of learned sensing, which completes a human gesture recognition task with a programmable metasurface reflect-array 185 (Fig. 17c). However, both meta-imagers are realized with microwaves, for which the meta-atoms are large enough to allow locally tunable control of the response. Although application at optical frequencies is still challenging, these intelligent electromagnetic meta-imagers may indicate important directions for dielectric metalens-based imaging systems.
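The forward model mentioned above, in which spatially varying PSFs are convolved patch by patch with the input image, can be sketched in a few lines. This is a plain, non-differentiable illustration written for clarity, not the framework of the cited work: each patch is convolved independently with zero padding at its borders, and the function names, patch layout, and PSF values are hypothetical.

```python
def convolve_same(patch, psf):
    """Zero-padded 2D convolution returning an array of the same size as `patch`."""
    ph, pw = len(patch), len(patch[0])
    kh, kw = len(psf), len(psf[0])
    cy, cx = kh // 2, kw // 2  # PSF center
    out = [[0.0] * pw for _ in range(ph)]
    for y in range(ph):
        for x in range(pw):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    sy, sx = y - (ky - cy), x - (kx - cx)
                    if 0 <= sy < ph and 0 <= sx < pw:
                        acc += patch[sy][sx] * psf[ky][kx]
            out[y][x] = acc
    return out

def patchwise_forward(image, psf_grid, size):
    """Sensor image formed by convolving each size-by-size patch with its own PSF."""
    h, w = len(image), len(image[0])
    sensor = [[0.0] * w for _ in range(h)]
    for pi in range(h // size):
        for pj in range(w // size):
            rows = image[pi * size:(pi + 1) * size]
            patch = [row[pj * size:(pj + 1) * size] for row in rows]
            blurred = convolve_same(patch, psf_grid[pi][pj])
            for y in range(size):
                sensor[pi * size + y][pj * size:(pj + 1) * size] = blurred[y]
    return sensor

if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # on-axis PSF
    blur = [[1.0 / 9.0] * 3 for _ in range(3)]                   # off-axis PSF
    psf_grid = [[sharp, blur], [blur, blur]]
    for row in patchwise_forward(image, psf_grid, size=2):
        print([round(v, 2) for v in row])
```

In an end-to-end framework, the PSFs would be computed from the optimizable phase profile and every operation would be differentiable, so that the reconstruction loss can be backpropagated to the metasurface parameters.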
Freeform all-area optimization
Freeform all-area optimization algorithms such as the topology optimization (TO) method 186-188 offer an alternative design route that avoids the aforementioned memory issue. Instead of the fixed primitive shapes used in conventional metalens designs, the meta-atoms have nontrivial shapes that can be topologically modified 189 (Fig. 18a). The TO method treats the dielectric permittivity at every spatial point as a design variable. The target is a faithful representation that coincides with the whole device design space; a classical explicit representation is the grid representation, which approaches the target as the grid becomes finer (Fig. 18b). There are also implicit design representation methods that try to capture the "axes" of the true design space via domain expertise 189 .
Fig. 16 Metalenses for single-shot 3D imaging. a Layout scheme of the spatially interleaved metalenses with the zoomed-out false-colored SEM (left), and the comparison of imaging results (right). Reproduced with permission 162 . Copyright 2019, American Chemical Society. b A depth sensor based on a metalens simultaneously capturing two images with different defocus; insets are enlarged views of the false-colored design (top) and the side-view SEM image (bottom). In the right figure, meta-atoms with different colors correspond to distinct focusing phase profiles of the interleaved lenses. Reproduced with permission 163 . Copyright 2019, National Academy of Sciences. c Schematic configuration of the virtual moving metalens array (VMMA) for TE/TM beams and the corresponding transmission phase profile (left); reconstructed depth-slice images for incidence with one and two polarizations (right). Reproduced with permission 164 . Copyright 2020, Wiley-VCH. d Schematic of 3D imaging with a double-helix PSF rotating with the object distance (left) and the experimental result: photo of a 3D scene (middle), raw image (top right), and the retrieved depth information (bottom right). Reproduced with permission 165 . Copyright 2019, The Authors. Published by SPIE

In terms of the optimizers, both local gradient-based and global population-based optimizers are used 190 . The local gradient-based optimizers leverage the adjoint variables method. An example is shown in Fig. 18c 191 . A physical model, the coupled-mode-theory (CMT) model, is first built from previous simulations. The researchers then initialized the iterative optimization process with a randomly selected design and calculated the far field. Using the adjoint method, the gradient of the loss function is calculated to update the design parameters. After the optimization, the corresponding geometric parameters of the resonators can be directly obtained from a look-up table. With this method, a large-scale metalens (R~10,000λ) with NA = 0.9 is designed, and complex functionalities such as angle-multiplexed metasurface holograms are also demonstrated. However, the construction of the CMT model involves the conventional design approaches of metalenses. Moreover, an initial dielectric distribution is required.
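The loop described above (random initialization, field evaluation, gradient of the objective, parameter update) can be illustrated with a toy problem. The sketch below is not the CMT-based method of the cited work: it treats a one-dimensional row of unit-amplitude emitters whose phases are the design parameters, uses the intensity at a target focal point as the objective, and evaluates the gradient in closed form. In a full-wave design, the adjoint method plays the role of supplying this gradient from one forward and one adjoint simulation. All names and numerical values are hypothetical.

```python
import cmath
import math
import random

def focal_field(phases, propagators):
    """Complex field at the focal point: sum of emitter contributions."""
    return sum(g * cmath.exp(1j * p) for g, p in zip(propagators, phases))

def intensity_gradient(phases, propagators):
    """Analytic gradient of |E|^2 with respect to each emitter phase."""
    field = focal_field(phases, propagators)
    return [-2.0 * (field.conjugate() * g * cmath.exp(1j * p)).imag
            for g, p in zip(propagators, phases)]

def optimize_phases(positions, focal_length, wavelength, steps=2000, seed=0):
    """Gradient ascent on the focal intensity, starting from random phases."""
    k = 2.0 * math.pi / wavelength
    # Free-space phase accumulated from each emitter to the focal point.
    propagators = [cmath.exp(1j * k * math.hypot(x, focal_length)) for x in positions]
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in positions]
    step = 1.0 / (4.0 * len(positions))  # conservative step size for stable ascent
    for _ in range(steps):
        gradient = intensity_gradient(phases, propagators)
        phases = [p + step * g for p, g in zip(phases, gradient)]
    return phases, propagators

if __name__ == "__main__":
    positions = [(i - 20) * 0.3e-6 for i in range(41)]  # 41 emitters, 0.3 um pitch
    phases, propagators = optimize_phases(positions, focal_length=10e-6, wavelength=0.6e-6)
    final = abs(focal_field(phases, propagators)) ** 2
    print(f"focal intensity: {final:.2f} of a maximum {len(positions) ** 2}")
```

The optimized phases cancel the propagation phase of each emitter up to a common constant, which is the hyperbolic profile. The benefit of a gradient-based update is that its cost per iteration does not grow with the number of design parameters in the way an exhaustive scan does.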
In contrast, the global population-based optimizers reframe the optimization process as the training of a generative neural network 190 . Researchers in Fan's group have given several examples 192,193 . Take the global optimizer presented by Jiang et al. 193 as an example. Instead of using a training set of known devices, the global topology optimization networks (GLOnets) learn the physical relationship between device geometry and response directly through electromagnetic simulations. As illustrated in Fig. 18d, the conditional GLOnet consists of trainable fully connected and deconvolutional neural layers. The input comprises the desired parameters (wavelength λ and output angle θ) and a random noise vector z, and the output is the spatial distribution of the meta-atoms. During each training iteration, a batch of devices is generated, and the efficiency gradients g obtained from forward and adjoint electromagnetic simulations are used to evaluate the loss function of the network and update the weights of the neurons. As a proof of concept, the GLOnet outputs ensembles of highly efficient topology-optimized metagratings operating across different wavelengths (600, 900, and 1200 nm) and deflection angles (40°, 60°, and 80°).

Large-area metalenses
Large-scale, low-cost mass production of practical imaging devices such as eyeglasses for AR/VR/mixed reality (MR) and 3D displays 62,89,194,195 requires metalenses that can be designed and fabricated over large areas. In recent years, more and more millimeter-scale metalenses have been experimentally demonstrated 29,64-66,120,194-198 . However, the diameters of metalenses can rarely reach the centimeter scale due to the extremely high data density (owing to

Design challenges
To reduce the phase error resulting from the coupling of adjacent meta-atoms after the implementation process (step (iii) in the section "Principle"), global simulation and optimization are necessary. Nevertheless, the simulation time conventionally scales approximately as O(R^2.4) for electromagnetic solvers that utilize standard matrix multiplication (e.g., FDTD method) 199 . Hence, extremely long run-times and large memory are required for the numerical simulation and optimization of metalenses larger than a millimeter. To stay within the available computing resources, numerical simulations are usually performed on metalenses scaled down to micrometer dimensions instead (Table 2). The O(R^2.4) scaling trend also holds for topology-optimized metalenses. To overcome this, Phan et al. 187 introduced a conceptually new approach for optimizing large-area metasurfaces in a computationally efficient manner. By stitching together individually optimized 3λ-wide sections of the metalens, the computational complexity of the total optimization is reduced from high-polynomial to linear. Nevertheless, the diameter of topology-optimized metalenses has rarely exceeded a millimeter so far. To alleviate these issues, extending computational resources with hardware-accelerated electromagnetic solvers may be another option. Hughes et al. 200

Demand for minimizing the layout file
A key to the efficient fabrication of metalenses is the corresponding layout files. They must undergo computationally demanding processing, such as fracturing, to convert the data into the proper format for manufacturing a reticle. However, unmanageably large layout files are generated for metalenses with diameters of centimeters or more. For example, the general design file of a 5-cm-diameter metalens comprising over 6 billion nanoscale meta-atoms is over 200 gigabytes in size 54 .
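The two scaling arguments above, solver run-time growing roughly as R^2.4 and layout size growing with the number of meta-atoms, can be made concrete with a back-of-envelope sketch. The function names are hypothetical, and the inputs are either the round figures quoted in the text (200 gigabytes, 6 billion meta-atoms) or assumed illustrative radii.

```python
def runtime_ratio(radius_small, radius_large, exponent=2.4):
    """Relative solver cost when the lens radius grows, assuming cost ~ R**exponent."""
    return (radius_large / radius_small) ** exponent


def bytes_per_meta_atom(file_size_bytes, n_meta_atoms):
    """Average layout storage per meta-atom for an uncompressed design file."""
    return file_size_bytes / n_meta_atoms


# Assumed example: scaling a simulated 10-um-radius lens up to a 1-mm radius.
conventional = runtime_ratio(10e-6, 1e-3)            # R**2.4 scaling
stitched = runtime_ratio(10e-6, 1e-3, exponent=1.0)  # linear scaling after stitching
print(f"R^2.4 cost ratio: {conventional:.3g}, linear cost ratio: {stitched:.3g}")

# Figures quoted in the text: about 200 GB for about 6 billion meta-atoms.
print(f"about {bytes_per_meta_atom(200e9, 6e9):.0f} bytes per meta-atom")
```

Under these assumptions a 100-fold increase in radius raises the conventional cost by about 6.3 × 10^4, against a factor of 100 for a linearly scaling method, and the uncompressed layout spends roughly 33 bytes on each meta-atom.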
To reduce the file size, Capasso's group proposed a scalable metasurface layout compression algorithm, coined METAC. A library of self-referenced structures is generated by using several layers to represent successively doubled copies of a primitive structure. At each radial position, the core algorithm then efficiently assembles appropriate library elements to create the desired structure, forming a ring. The design file size is reduced to approximately 131 megabytes, a compression of about 3 orders of magnitude 54 . However, the compression algorithm is valid only for radially symmetric designs. Colburn et al. developed a more general algorithm that does not require any symmetry in the layout 131 . By using a dictionary data structure containing 6 unique nanopost radii as keys, more than a 2600× reduction in memory is achieved for a metasurface with a 1 cm × 1 cm aperture.
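The idea behind a dictionary-based layout can be sketched as follows. This is a minimal illustration of the general principle (store each unique shape once and reference it by position), not the algorithm of ref. 131; the polygon resolution, the 400-nm pitch, the six radii, and all function names are assumptions.

```python
import math


def circle_polygon(radius, n_vertices=32):
    """Approximate a circular nanopost by a polygon centered at the origin."""
    return [(radius * math.cos(2.0 * math.pi * i / n_vertices),
             radius * math.sin(2.0 * math.pi * i / n_vertices))
            for i in range(n_vertices)]


def flat_layout(posts, n_vertices=32):
    """Naive layout: every post stores its own translated polygon."""
    return [[(x + px, y + py) for px, py in circle_polygon(r, n_vertices)]
            for x, y, r in posts]


def dictionary_layout(posts, n_vertices=32):
    """Compressed layout: one polygon per unique radius plus a list of positions."""
    layout = {}
    for x, y, r in posts:
        entry = layout.setdefault(
            r, {"shape": circle_polygon(r, n_vertices), "positions": []})
        entry["positions"].append((x, y))
    return layout


def stored_numbers_flat(flat):
    return sum(2 * len(polygon) for polygon in flat)


def stored_numbers_dictionary(layout):
    return sum(2 * len(e["shape"]) + 2 * len(e["positions"]) for e in layout.values())


# Assumed example: a 10 x 10 grid of posts drawn from 6 unique radii (in nm).
radii = [60, 70, 80, 90, 100, 110]
posts = [(i * 400, j * 400, radii[(i + j) % 6]) for i in range(10) for j in range(10)]
n_flat = stored_numbers_flat(flat_layout(posts))
n_dict = stored_numbers_dictionary(dictionary_layout(posts))
print(f"flat: {n_flat} numbers, dictionary: {n_dict} numbers")
```

In this sketch the saving approaches the number of polygon vertices per post as the array grows, because each additional post costs only one coordinate pair. Real layout formats achieve further savings with hierarchical references and arrayed placement.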

Scaling up and manufacturing
Another nontrivial challenge posed by large-area metalenses is to reach high performance using simple fabrication methods. The performance of metalenses is degraded by a series of fabrication imperfections. For the sake of high resolution, most metalenses, especially those operating in the visible regime, have been fabricated by electron-beam lithography (EBL) or focused ion beam (FIB) techniques 201 (including those with millimeter-scale diameters 35,41,47,65,66,70,71,89,122,128,130,163,202 ). Nevertheless, a typical EBL or FIB process for defining large-area structures is time-consuming and costly, and the costs increase steeply with both the resolution and the size. One approach to this problem is projection photolithography, which is commonly used in semiconductor foundries. Although customized reticles must first be fabricated with UV laser direct writing (LDW) 203 , EBL, or FIB techniques, the lens features are magnified several times on the reticle to accommodate the image reduction in the projection system. Hence, a lower resolution is acceptable for reticle writing. Mass production of metalenses has been demonstrated by stepper/scanner lithography 53,131,204-208 and nanoimprint lithography (NIL) 62,209-211 technologies (Table 2), which are the most promising candidates to move metalenses from lab to fab in the future. For example, 1-cm-diameter all-glass metalenses operating at visible wavelengths are manufactured using 248-nm deep-ultraviolet (DUV) projection stepper lithography 204 . The pattern of the metalens is replicated rapidly over the face of a 4-inch wafer by repeatedly exposing and incrementally stepping the wafer position. Consequently, the patterning throughput is as high as thousands of metalenses per hour. As an alternative to stepper lithography, NIL transfers the pattern of the mask by direct contact and resin curing, without being subject to the geometrical limitations and processing capacity of semiconductor lithography tools.
Hence, NIL is cost-effective, high-resolution, and high-throughput. However, the contact mode of NIL raises concerns about defects, throughput, and template wear, especially for a large-area template. For example, thermal NIL 212 , which is widely used, requires high pressure and temperature that easily damage the template and the adhesive layer. UV-NIL 209-211 is performed at low pressures and room temperature, minimizing magnification and distortion errors. Nevertheless, bubbles that are difficult to expel from the UV-curable resist cause defects in the meta-atoms.
To solve these problems, a systematic study of the quantitative relationship between processing errors and device performance should be considered. In addition, methods for evaluating processing errors and standards to guide the industrial production of metasurfaces need to be established.

Device integration
Packaged miniaturized optical systems are required in consumer applications. For high-performance metadevices, one must consider the necessary elements, the precise alignment, the stability, etc. A few prototypes of imaging systems combining meta-optics and refractive optics have been developed 119,195 . For instance, a portable packaged full-Stokes polarization camera is presented by integrating a metagrating with an off-the-shelf catalog lens and a CMOS imaging sensor 119 . The grating sample is 1.5 mm in diameter, but the prototype is much larger owing to the bulk lens and optomechanical mounting. A compact spectrometer with a millimeter-scale volume is achieved by side-by-side alignment 70,71 (Fig. 9c, d). Such a device, however, is not suitable for transparent dielectric metalenses. To further reduce the device size for practical applications, achieving an integrated optical module comprising metalenses and photosensors is vital.

Wafer stacking and packaging
Current advanced stacking and packaging technologies 213-218 make it possible to integrate the lens wafer on the photosensor wafer. Hu et al. 215 demonstrated full-color holography by stacking a metasurface upon a layer of color filter array (Fig. 19a). Considering the stepwise structures of the color filters, PMMA is coated as a spacer to form a uniform surface for the metasurface (Fig. 19b). In another work by Hu et al. 216 , a metasurface-stacked structure is integrated upon a CMOS imaging detector by a layer of optically clear adhesive (OCA) with the desired shape and thickness (Fig. 19c). Martins et al. 63 proposed a possible prototype of a metalens camera based on a translation stage (Fig. 19d). As shown in Fig. 19e, Xu et al. 60 mounted the metalens on a CMOS image sensor with OCA tapes. The OCA tapes also serve as the spacer medium, providing a well-defined, fixed thickness. Hence, the imaging distance v is fixed, and a clear image is acquired by tuning the object distance u with the translation stage. As a result, high-resolution images (∼1.74 μm) with a millimeter-scale image area are achieved by a ∼3-cm-size device prototype 60 (Fig. 19f). Li et al. 61 demonstrated a similar compact NIR microscopic configuration whose performance is comparable to that of a commercial microscope.
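The relation between the fixed imaging distance v and the tuned object distance u can be sketched with the paraxial thin-lens equation, 1/u + 1/v = 1/f. Treating the metalens as an ideal thin lens is an assumption, and the focal length and spacer thickness used below are illustrative values, not those of the cited prototypes.

```python
def object_distance(focal_length, image_distance):
    """Object distance u that gives a sharp image at a fixed image distance v.

    Uses the paraxial thin-lens equation 1/u + 1/v = 1/f (real image, v > f).
    """
    if image_distance <= focal_length:
        raise ValueError("a real image requires image_distance > focal_length")
    return 1.0 / (1.0 / focal_length - 1.0 / image_distance)


def lateral_magnification(object_dist, image_distance):
    """Magnitude of the lateral magnification, v / u."""
    return image_distance / object_dist


# Assumed example: focal length 2 mm, spacer-defined image distance 6 mm.
u = object_distance(2.0, 6.0)
print(f"u = {u:.2f} mm, magnification = {lateral_magnification(u, 6.0):.2f}")
```

With the spacer fixing v, the translation stage only has to bring the sample to the single value of u returned here, which is why a well-defined spacer thickness simplifies focusing.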

Monolithic integration
The small dimensions of metalenses require fabrication by lithography techniques, where photolithography is currently used as a part of the standard CMOS fabrication process in the microelectronics industry. Hence, a metalens is generally more compatible with a CMOS fabrication line than a conventional lens 2 . In particular, lossless CMOS-compatible materials (e.g., niobium pentoxide and silicon for the visible and infrared wavelengths, respectively) 219 were chosen to form metalenses, reducing the difficulties of integration preparation and encapsulation. The precise alignment of multiple layers of elements is also achieved 218 . The CMOS compatibility could efficiently help the incorporation of metalenses in the manufacturing process of sensors or other on-chip systems 208,220 .
Xie et al. 221 achieved a monolithic back-emitting configuration for arbitrary beam shaping of vertical-cavity surface-emitting lasers (VCSELs) with wafer-level integration through VCSEL-compatible technology. The VCSEL wafer is first fabricated on a GaAs substrate, and then the backside of the substrate (emitting surfaces) is directly sculptured into metasurfaces (Fig. 20a). When the metasurface acts as a focusing lens that compensates for the beam divergence, the emitted light is self-collimated. As a proof of concept, a chip of 10 × 10 metasurface-VCSELs with different deflection angles is fabricated (Fig. 20b).
Goi et al. 222 manufactured a 3D metasurface on a CMOS image sensor using Galvo-dithered two-photon nanolithography (TPN). Using a dip-in approach and a liquid photoresist in the TPN process (Fig. 20c), an array of 2 × 3 metasurfaces is printed directly on the sensor (Fig. 20d). To precisely regulate the distance between the structures and the imaging plane, the metasurface is mounted on table-like supports with the pillars' height being 47.1 μm. Similar TPN-based 3D nanoprinting technology is also applied to fabricate metalens on a fiber tip 50 .

Other challenges
Encapsulation is necessary for both types of integration. Nevertheless, owing to the index difference between the encapsulation glass and air, the efficiency of the device would be reduced by the reflection at the interface of the two media; the focal length is altered compared with the unpacked metalens, and additional monochromatic aberrations could arise. One also needs to consider the matching of metalens performances and photosensor performances 223 . Hence, the metalens should be designed and optimized according to the overall packaging module from the beginning. For example, the phase profile should be optimized by adding a slab-corrected phase profile 224 .
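Two of the packaging effects named above can be estimated with textbook formulas: the normal-incidence Fresnel reflectance at a single interface, and the paraxial focal displacement introduced by a plane-parallel slab in a converging beam. The sketch below is only an order-of-magnitude aid; the refractive index of 1.5 and the 0.5-mm cover thickness are assumed values, and the slab-corrected phase profile of ref. 224 is not reproduced here.

```python
def normal_incidence_reflectance(n1, n2):
    """Fresnel power reflectance at a single interface for normal incidence."""
    return ((n1 - n2) / (n1 + n2)) ** 2


def paraxial_focal_shift(thickness, n_slab, n_ambient=1.0):
    """Longitudinal focus displacement caused by a plane-parallel slab (paraxial)."""
    return thickness * (1.0 - n_ambient / n_slab)


# Assumed example: a 0.5-mm-thick cover glass with n = 1.5 in air.
loss = normal_incidence_reflectance(1.0, 1.5)
shift_mm = paraxial_focal_shift(0.5, 1.5)
print(f"reflectance per interface: {loss:.1%}, focal shift: {shift_mm:.3f} mm")
```

For high-NA metalenses the paraxial shift is only a first estimate, since marginal rays are displaced by a different amount; that difference is the spherical aberration the text refers to.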
Moreover, thermal stability and mechanical stability are important in practical applications. The metalens performances would be degraded by variations in refractive index and volume of the constituent material owing to the temperature change. Hence, thermally stable and transparent materials for the operating wavelengths are worth considering. Liu et al. 225 considered infrared metalenses made of Si and Ge. When the temperature is changed from -40 to 80 °C, the relative variation in focal length is calculated as only 0.032% and 0.121% for 250-μm-diameter metalenses made of Si and Ge, respectively. Nevertheless, the calculation considers the free-standing case. When the metalens is mounted for device integration, the difference between thermal expansion coefficients of the mounting and the metalens materials may cause a surface bending of the metalens, which, in turn, may degrade the focusing performance and damage (crack) the metalenses. Hence, one needs to take the thermal stability and the mechanical stability of metalenses into consideration for metalens-integrated devices.

Conclusion and outlook
Overall, this review summarizes the progress and challenges of metalenses in miniaturized optical systems. Owing to their capability for customized wavefront engineering, the unique features of metalenses, such as aberration correction, dispersion engineering, and multifunctionality, are utilized in ultra-compact imaging systems. Despite the tremendous progress achieved so far, several challenges should be addressed for the practical development of metalens-based imaging systems, which mostly target the VIS-NIR-MIR wavelengths. High focusing efficiency is challenging for high-NA metalenses due to the fundamental phase discretization and diffraction constraints. The breakthrough broadband achromatic feature of metalenses suffers from some fundamental constraints, design limitations, and fabrication challenges. Moreover, conflicts among different performance metrics should be managed. In addition, common challenges are hindering further development of dielectric metalens-integrated systems, including limitations of conventional design methods, the scaling up of dielectric metalenses, and the approaches for integrated devices.
In turn, these challenges point to possible future development directions for metalenses. A few examples, which are not exhaustive, are as follows. (i) New strategies, both theoretical and experimental, to break the conflicts among metalens performances (efficiency versus NA, NA versus FoV, achromatic bandwidth versus diameter, etc.). (ii) Multifunctional and reconfigurable metalenses that could replace complicated configurations of conventional optics (e.g., polarization cameras, zoom lenses, all-optical setups for signal processing as well as optical analog computing). (iii) High-efficiency design methods for large-area and high-performance metalenses (such as end-to-end intelligent designs, freeform optimizations, as well as computational imaging techniques where the aberration correction is offloaded to post-processing software). (iv) Fabrication methodologies for large-area metalenses. A systematic model of the quantitative relationship between processing errors and device performance would be helpful, and we expect that evaluation methods and standards to guide the industrial production of metasurfaces will be established.
Another interesting direction for metalenses (local flat-optics) in compact imaging systems is the combination with nonlocal flat-optics. Even for the simplest imaging system composed of a metalens and a photosensor, free space constitutes a great portion of the system volume. Recently, free-space volumes have been compressed by a kind of nonlocal flat-optics structure called a "spaceplate", which is characterized by a momentum-dependent transfer function. After transmission through a spaceplate with a physical length d, the output wavefront is equivalent to one that has propagated over an effective length d_eff in free space. The space-squeezing effect has been demonstrated using 3D photonic crystals with isotropic band structures 226 , uniaxial birefringent slabs (with the same refractive index as the background for the extraordinary light and a larger index for ordinary light) 78 , and multilayered structures 78,79 . In particular, the potential of combining a metalens and a spaceplate has been theoretically demonstrated 79 . By matching the compression ratio (d_eff/d) of the spaceplate to the imaging distance, an ultrathin, fully solid-state imaging system is potentially achieved with a metalens and a photosensor integrated on the two sides of the spaceplate, respectively. However, some issues related to combining metalenses and spaceplates remain to be addressed. For instance, since the propagating momentum depends not only on the propagation direction but also on the wavelength, there would be a trade-off between the achromatic bandwidth and the compression ratio.
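The matching condition between the compression ratio and the imaging distance amounts to simple arithmetic, sketched below. The function names and the numerical values (a 5-mm imaging distance and a compression ratio of 10) are hypothetical and serve only to show how d_eff/d sets the physical thickness of the system.

```python
def spaceplate_thickness(effective_length, compression_ratio):
    """Physical thickness d of a spaceplate that emulates a free-space length d_eff."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return effective_length / compression_ratio


def length_saved(effective_length, compression_ratio):
    """Reduction in system length when free space is replaced by the spaceplate."""
    return effective_length - spaceplate_thickness(effective_length, compression_ratio)


# Assumed example: a 5-mm imaging distance replaced by a spaceplate with d_eff/d = 10.
d = spaceplate_thickness(5.0, 10.0)
print(f"spaceplate thickness: {d:.2f} mm, length saved: {length_saved(5.0, 10.0):.2f} mm")
```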
By addressing the present challenges of metalenses and combining them with nonlocal optics, we envision that metalens-based imaging systems would be increasingly compact and widely employed in future applications, ranging from camera modules for consumer photography and autonomous vehicles to wearable displays for AR/VR/MR and machine vision, to bioimaging and endoscopy, to signal processing and optical computation.