A machine learning route between band mapping and band structure

The electronic band structure and crystal structure are the two complementary identifiers of solid-state materials. Although convenient instruments and reconstruction algorithms have made large, empirical, crystal structure databases possible, extracting the quasiparticle dispersion (closely related to band structure) from photoemission band mapping data is currently limited by the available computational methods. To cope with the growing size and scale of photoemission data, here we develop a pipeline including probabilistic machine learning and the associated data processing, optimization and evaluation methods for band-structure reconstruction, leveraging theoretical calculations. The pipeline reconstructs all 14 valence bands of a semiconductor and shows excellent performance on benchmarks and other materials datasets. The reconstruction uncovers previously inaccessible momentum-space structural information on both global and local scales, while realizing a path towards integration with materials science databases. Our approach illustrates the potential of combining machine learning and domain knowledge for scalable feature extraction in multidimensional data.


Introduction
The modelling and characterization of the electronic BS of materials play an essential role in materials design [1] and device simulation [2]. The BS lives in the momentum space, Ω(k x , k y , k z , E) and imprints the multidimensional and multi-valued functional relations between energy (E) and momenta (k x , k y , k z ) of periodically confined electrons [3]. Photoemission band mapping [4] (see Fig. 1a) using momentum-and energy-resolved photoemission spectroscopy (PES), including angle-resolved PES (ARPES) [5,6] and multidimensional PES [7,8] measures the BS as an intensity-valued multivariate probability distribution directly in Ω.
The proliferation of band mapping datasets and their public availability brought about by recent hardware upgrades [7][8][9][10] have ushered in the possibilities of comprehensive benchmarking between theories and experiments, which become especially challenging for multiband materials with complex band dispersions [11][12][13]. The available methods for interpreting the photoemission spectra fall into two categories: Physics-based methods require least-squares fitting of 1D lineshapes, named energy or momentum distribution curves (EDCs or MDCs), to analytical models [5,14,15]. Although physics-informed data models guarantee high accuracy and interpretability, upscaling the pointwise fitting (or estimation) to large, densely sampled regions in the momentum space (e.g. including 10 4 or more momentum locations) presents challenges due to limited numerical stability and efficiency. Therefore, their use is limited to selected momentum locations determined heuristically from physical knowledge of the materials and the experimental settings. Image processing-based methods apply data transformations to improve the visibility of dispersive features [16][17][18][19]. They are more computationally efficient and can operate on entire datasets, yet offer only visual enhancement of the underlying band dispersion.
They don't allow reconstruction and therefore are insufficient for truly quantitative benchmarking or archiving.
A method balancing the two sides will extract the band dispersion with sufficiently high accuracy and be scalable to multidimensional datasets, therefore providing the basis for distilling structural information from complex band mapping data and for building efficient tools for annotating and understanding spectra. In this regard, we propose a computational framework (see Fig. 1b) for global reconstruction of the photoemission (or quasiparticle) BS as a set of energy (or electronic) bands, formed by energy values (i.e. band loci) connected along momentum coordinates. This local connectedness assumption is more valid than using local maxima of photoemission intensities because local maxima are not always good indicators of band loci [20]. We exploit the connection between theory and experiment in our framework, based on a probabilistic machine learning [21,22] model to approximate the intensity data from band mapping experiments. The gist of the model is rooted in Bayes rule, where X are the random variables to be inferred and the data D are mapped directly onto unknowns and experimental observables. We assign the energy values of the photoemission BS as the model's variables to extract from data, and a nearest-neighbor (NN) Gaussian distribution as the prior, p(X), to describe the proximity of energy values at nearby momenta. The EDC at every momentum grid point relates to the likelihood, p(D|X), when we interpret the photoemission intensity probabilistically. The optimum is obtained via maximum a posteriori (MAP) estimation in probabilistic inference [21] (see Methods and Supplementary Fig. 2). Given the form of the NN prior, the posterior, p(X|D), in the current setting forms a Markov random field (MRF) [21,23,24], which encapsulates the energy band continuity assumption and the measured intensity distribution of photoemission in a probabilistic graphical model. For one benefit, the probabilistic formulation can incorporate imperfect physical knowledge algebraically in the model or numerically as initialization (i.e. warm start, see Methods) of the MAP estimation, without requiring de facto ground truth and training as in supervised machine learning [25].
For another, the graphical model representation allows convenient optimization and extension to other dimensions (see Supplementary Fig. 1 and Section 1).
To demonstrate the effectiveness of the method, we have first reconstructed the entire 3D dispersion surfaces, E(k x , k y ), of all 14 valence bands within the projected first Brillouin zone (in (k x , k y , E) coordinates) of the semiconductor tungsten diselenide (WSe 2 ), spanning ∼ 7 eV in energy and ∼ 3Å −1 along each momentum direction. Furthermore, we adapt informatics tools to BS data to sample and compare the reconstructed and theoretical BSs globally. The accuracy of the reconstruction is validated using synthetic data and the extracted local structural parameters in comparison with pointwise fitting. The available data and BS informatics enable detailed comparison of band dispersion at a resolution of < 0.02Å −1 . Besides, we performed various tests and benchmarking on datasets of other materials and simulated data, where ground truth is available to evaluate the accuracy and computational efficiency.

Results
Band structure reconstruction and digitization. Our primary example is the 2D layered semiconductor WSe 2 in the hexagonal lattice with a bilayer stacking periodicity (denoted as 2H-WSe 2 ) is a model system for band mapping experiments [11,27,28]. Earlier valence band mapping and reconstruction in ARPES experiments on WSe 2 have demonstrated a high degree of similarity between theory and experiments [11,27,28], but a quantitative assessment within the entire (projected) Brillouin zone is still lacking. The valence BS of 2H-WSe 2 contains 14 strongly dispersive energy bands, formed by a mixture of the 5d 4 and 6s 2 orbitals of the W atoms and the 4p 4 orbitals of the Se atoms in its hexagonal unit cell. The strong spin-orbit coupling due to heavy elements produces large momentum-and spin-dependent energy splitting and modifications to the BS [11,29].
We use a 2D MRF to model the loci of an energy band within the intensity-valued 3D band mapping data, regarded as a collection of momentum-ordered EDCs. It is graphically represented by a rectangular grid overlaid on the momentum axes with the indices (i, j) (i, j are nonnegative integers), as shown in step (3) of Fig. 1b. The undetermined band energy of the EDC at (i, j), with the associated momentum coordinates (k x,i , k y,j ), is considered a random variable,Ẽ i,j , of the MRF. Together, the probabilistic model is characterized by a joint distribution, expressed as the product of the likelihood and the Gaussian prior, in Eq. (1). To maintain its simplicity, we don't explicitly account for the intensity modulations of various   (2) Postprocessing (4) Figure 1: From band mapping to band structure. a, Schematic of a photoemission band mapping experiment. The electrons from a crystalline sample's surface are liberated by extreme UV (XUV) or X-ray pulses and collected by a detector through either angular scanning or timeof-flight detection schemes. b, Overview of the computational framework for reconstruction of the photoemission (or quasiparticle) band structure: (1) The volumetric data obtained from a band mapping experiment first (2) go through preprocessing steps, then are (3) fed into the probabilistic machine learning algorithm along with electronic structure calculations as initialization of the optimization. The reconstruction algorithm for volumetric band mapping data is represented as a 2D probabilistic graphical model with the band energies represented as nodes, leading to tens of thousands of nodes in practice. (4) The outcome of the reconstruction is postprocessed (e.g. symmetrization) to (5) yield the dispersion surfaces (i.e. energy bands) of the photoemission band structure ordered by band indices. c-f, Effects of the intensity transforms in preprocessing viewed in both 3D and along high-symmetry lines of the projected Brillouin zone (hexagonal as in b(1)), from the original data (c) through intensity symmetrization (d), contrast enhancement [26] (e) and Gaussian smoothing of intensities (f). The intensity data in c-f are normalized individually for visual comparison. origins (such as imbalanced transition matrix elements [20]) in the original band mapping data, which cannot be remediated by upgrading the photon source or detector. Instead, we preprocess the data to minimize their effects on the reconstruction (see Fig. 1c-f). The preprocessing steps include (1) intensity symmetrization, (2) contrast enhancement [26], followed by (3) Gaussian smoothing (see Methods), whereafter the continuity of band-like features is restored. The EDCs from the preprocessed data,Ĩ, are used effectively as the likelihood to calculate the MRF joint distribution, Here, Z is a normalization constant, η is a hyperparameter defining the width of the Gaussian   Fig. 3a-b, the band dispersions show generally decreasing dependence (seen from the magnitude of coefficients) on basis terms with increasing complexities (see Fig. 3a), and the majority of dispersion is encoded into a subset of the terms (see Fig. 3b). This observation implies that moderate smoothing may be applied to remove high-frequency features to improve the reconstruction in case of limited-quality data (acquired without sufficient accumulation time), which is often unavoidable when materials exhibit vacuum degradation, or during experimental parameter tuning. The example in Fig. 3b . 2d). These observations are consistent with the outcome obtained from DFT calculations (see Supplementary Fig. 13).
Computational metrics and performance. To quantify the computational advantages of the machine learning-based reconstruction approach, we examine the outcome from diverse perspectives in consistency, accuracy and cost. To assess the consistency of reconstruction in its entirety, we introduce a BS distance metric (see Methods), invariant to the global energy shift frequently used to adjust the energy zero, to quantify the differences in band dispersion and the   Figure 4: Performance evaluation on benchmarks. Visual summary of the benchmarking outcomes for band structure reconstruction using normalized metrics that are able to compare across datasets. These include a, the computing time and b, root-mean-square error (reconstruction error), both normalized to the per-band, per-spectrum level [39]. The other metrics, including c the hyperparameter tuning time and d, the reconstruction instability are normalized to the per-band level. The methods used in reconstruction include pointwise line fitting (LF) and the Markov random field (MRF) approach presented in this work, while the synthetic data are around the K point and along the high-symmetry line (HSL) of the WSe 2 band structure. The benchmarks were run with synthetic datasets terminated at fixed energy ranges that contain the specified number of bands (2, 4, 8, and 14, the maximum band index in the dataset) shown in a-d.
a considerable reduction in both normalized computing time and hyperparameter tuning time, while achieving consistently higher accuracy and stability in all but the two-band case. The combination of accuracy and stability in MRF reconstruction is due to the connectivity built into the prior, whereas in the pointwise fitting approach, information is not explicitly shared among neighbors. Since the number of bands reflects the complexity of multicomponent spectra, a near-constant normalized computing time and hyperparameter tuning time (see Fig. 4a in MRF reconstruction regardless of the number of bands (or spectral components) allow us to scale up the computation to datasets comprising 10 4 -10 5 or more spectra. The substantial gain in computational efficiency is a result of the inherent divide-and-conquer strategy in our BS  In addition to WSe 2 , we have performed BS reconstruction on two other photoemission datasets from other classes of materials: (1) Bismuth tellurium selenide (Bi 2 Te 2 Se), a topological insulator, measured using the same laboratory photoemission setup (see Fig. 6a-e) as for the WSe 2 dataset. Although we used only simple numerical functions (Gaussian and paraboloid) to initialize the MRF reconstruction, the outcome demonstrates correct discrete momentum-space symmetry and details of energy dispersion down to the concave-shaped hexagonal warping in the band energy contours around the Dirac point [43]. Four energy bands, including the two lowenergy valence bands, a surface-state energy band, and a partially occupied conduction band, were recovered using our approach for Bi 2 Te 2 Se. (2) Bulk gold (Au) photoemission dataset measured at a synchrotron X-ray source (see Fig. 6f-g). We used DFT calculations as the initialization to reconstruct four of the bulk energy bands, which are usually very challenging to extract by hand tracing or parametric function fitting, due in part to blurring (k z dispersion) from the 3D characteristics of the electrons in the metallic bulk. Further discussions on these two materials and their band reconstructions are provided in Supplementary Section 3.

Discussion
The reconstruction approach described here provides a quantitative connection between em- ) obtained from photoemission band mapping and their theo- ) through various orders of momentum-dependent "perturbations" ). The connection may be expressed as, In Eq. (3), b is the band index, Σ represents electron self-energy, the zeroth-order term (∆E   largely valid, our reconstruction may still recover the quasiparticle dispersion. We demonstrate this in Supplementary Fig. 10 using simulated photoemission data with a kink anomaly, a strong modification of dispersion from electron self-energy [5,6]. Thirdly, an appropriate initialization may be expensive or impossible to obtain, either due to the computational cost, if higher-level theories (such as DFT with hybrid functionals and GW ) are required, or due to the complexity of the materials system, including undetermined microscopic interactions, sample defects or structural disorder, creating strong intensity blurring from k z dispersion, etc. These scenarios will remain challenging for band reconstruction.
Besides our demonstrations, we anticipate additional use cases that include (i) online mon-

Methods
Band mapping measurements of WSe 2 . Multidimensional photoemission spectroscopy experiments were conducted with a laser-driven, high harmonic generation-based extreme UV light source [9] operating at 21.7 eV and 500 kHz and a METIS 1000 (SPECS GmbH) momentum microscope featuring a delay-line detector coupled to a time-of-flight drift tube [8,57].
The experiment captures photoelectrons directly in their 3D coordinates, (k x , k y , E) [7,8]. Single crystal samples of WSe 2 (> 99.995% pure) were purchased from HQ graphene and were used directly for measurements without further purification. Before measurements, the WSe 2 samples were attached to the Cu substrate by conductive epoxy resin (EPO-TEK H20E). The samples were cleaved by cleaving pins attached to the sample surface upon transfer into the measurement chamber, which operates at an ambient pressure of 10 −11 mbar during photoemission experiments. No effect of surface termination has been observed in the measured WSe 2 photoemission spectra, similar to previous experimental observations [11,27]. Data binning is carried out in conjunction with the necessary lens distortion correction [60] and calibrations as described in [58]. The outcome provides a sufficient level of granularity in the momentum space to resolve the fine features in band dispersion while achieving higher signal-to-noise ratio than using single-event data directly. Afterwards, we applied intensity symmetrization to the data along the sixfold rotation symmetry and mirror symmetry axes [11] of the photoemission intensity pattern in the (k x , k y ) coordinates, followed by contrast enhancement using the multidimensional extension of the contrast limited adaptive histogram equalization (MCLAHE) algorithm, where the intensities in the image are transformed by a look-up table built from the normalized cumulative distribution function of local image patches [26].
Finally, we applied Gaussian smoothing to the data along the k x , k y and E axes with a standard deviation of 0.8, 0.8 and 1 pixels (or about 0.012Å −1 , 0.012Å −1 , and 18 meV), respectively.
After data preprocessing, we sequentially reconstructed every energy band of WSe 2 from the photoemission data using the maximum a posteriori (MAP) approach described in the main text. The reconstruction requires tuning of three hyperparameters: (1) the momentum scaling and (2) the rigid energy shift to coarse-align the computed energy band, e.g. from density functional theory (DFT), to the photoemission data, and (3) the width of the nearest-neighbor Gaussian prior (η in Eq. (2)). The hyperparameter tuning is also carried out individually for each band to adapt to their specific environment. An example of hyperparameter tuning is given in Supplementary Fig. 4. The MAP reconstruction method involves optimization of the band energy random variables, {Ẽ i,j } to maximize the posterior probability p = p({Ẽ i,j }) or to minimize the negative log-probability loss function, L := − log p, obtained from Eq. (2) as is used in our actual implementation.
We implemented the optimization using a parallelized version of the iterated conditional mode during optimization. Typically, the optimization process in the reconstruction of one band converges within and therefore is terminated after 100 epochs, which takes ∼ 7 seconds on a single NVIDIA GTX980 GPU for the above-mentioned data size. Details on the parallelized implementation are provided in Supplementary Section 1. In addition, because symmetry information is not explicitly included in the MRF model, the reconstructed bands generally requires further symmetrization as refinement or post-processing to be ready for database integration.
We described our approach of using band structure calculations to initialize the MAP optimization as a warm start. The term "warm start" in the context of numerical optimization generally refers to the initialization of an optimization using the outcome of an associated and yet more solvable problem (e.g. surrogate model) obtained beforehand that yields an approximate answer, instead of starting from scratch (i.e. cold start). Warm-starting an optimization improves the effective use of prior knowledge and its convergence rate [40]. In the current context, we regard the band structure reconstruction from photoemission band mapping data as the optimization problem to warm start, and the outcome from an electronic structure calcula-tion can produce a sufficiently good approximate to the solution of the optimization problem.
For WSe 2 , straightforward DFT calculations with semi-local approximation (which in itself involves explicit optimizations such as geometric optimization of the crystal structures) are sufficient, but our approach is not limited to DFT. Therefore, the use of "warm start" in our application is conceptually well-aligned with the origin of the term.
To between energy bands, which matter in the event of (anti)crossings. In combining these two approaches, we typically choose the same color map and scale to maintain referenceability between the two representations. For each energy band, the full color scale is used to cover its energy range, becoming the normalized energy scale, which illustrates the local detail of the dispersion that otherwise may be hard to discern. . The momentum grid used for the calculation was equally sampled with a spacing of 0.012Å −1 in both k x and k y directions that covers the irreducible part of the first Brillouin zone at k z = 0.35Å −1 , estimated using the inner potential of WSe 2 from a previous measurement [11]. The calculated band structure is symmetrized to fill the entire hexagonal Brillouin zone to be used to initialize the band structure reconstruction and synthetic data generation. We note here that for the MAP reconstruction, the momentum grid size used in theoretical calculations (such as DFT at various levels used here) need not be identical to that of the data (or instrument resolution) and in those cases an appropriate upsampling (or downsampling) should be applied to the calculation to match their momentum resolution.
Further details are presented in Supplementary Section 4.
Band structure informatics. The shape feature space representation of each electronic band is derived from the decomposition, Here, k = (k x , k y ) represents the momentum coordinate, E b (k) is the single-band dispersion relation (e.g. dispersion surface in 3D), a l and φ l (k) are the coefficient and its associated basis term, respectively. They are grouped separately into the feature vector, a = (a 1 , a 2 , ...), and the basis vector, Φ = (φ 1 , φ 2 , ...). The orthonormality of the basis is guaranteed within the projected Brillouin zone (PBZ) of the material.
For the hexagonal PBZ of WSe 2 , the basis terms are hexagonal Zernike polynomials (ZPs) con- The cosine similarity is bounded within [−1, 1], with a value of 0 describing orthogonality of the feature vectors and a value of 1 and -1 describing parallel and anti-parallel relations between them, respectively, both indicating high similarity. The use of cosine similarity in feature space allows comparison of dispersion while being unaffected by their magnitudes. In comparing the dispersion between single energy bands using Eq. (7), the first term in the polynomial expansion, or the hexagonal equivalent of the Zernike piston [73], is discarded as it only represents a constant energy offset (with zero spatial frequency) instead of dispersion, which is characterized by a combination of finite and nonzero spatial frequencies.
The electronic band structure is a collection of energy bands To quantify the distance between two band structures, containing the same number of energy bands while ignoring their global energy difference, we first subtract the energy grand mean (i.e. mean of the energy means of all bands within the region of the band structure for comparison). Then, we compute the Euclidean distance, or the 2 -norm, for the ith pair of bands, d b,i .
Here,ã denotes the feature vector after subtracting the energy grand mean so that any global energy shift is removed. We define the band structure distance as the average distance over all We use same-resolution error metrics to evaluate the approximation quality of the expansion basis and to quantify the reconstruction outcome with a known ground-truth band structure.
Specifically, we define the average approximation error (with energy unit), η avg , for each energy band using the energy difference at every momentum location, where N k is the number of momentum grid points and the summation runs over the projected Brillouin zone. In addition, we construct the relative approximation error, η rel , following the definition of the normwise error [74] in matrix computation, Eq. (9)

Code availability
The code developed for band structure reconstruction including examples is available at https://github.com/mpes-kit/fuller.

Competing interests
The authors declare no competing interests in the content of the article.

Supplementary Information
A machine learning route between band mapping and band structure 1 Band structure reconstruction

Physical foundations
The three quantities of common interest for the interpretation of photoemission spectra are (1) the bare band energy, k , (2) the complex-valued electron self-energy, Σ(k, E) = ReΣ(k, E) + iImΣ(k, E), and (3) the transition matrix elements connecting the final (f ) and initial (i) electronic states, M f,i (k, E). An established interface between theory and experiment for quantitating and interpreting the photoemission signal is the formalism of an experimental observable: the single-particle spectral function [5,75], A(k, E). For a single energy band of a many-body electronic system, Within this framework, the band loci of the photoemission (or quasiparticle) band structure For a multiband electronic structure, band mapping measurements, in principle, have access to the spectral functions of at least all valence bands. The photoemission intensities are combined in summation to form the multiband (MB) counterpart of the single-band formula.
The condition f FD (E) → 1 applies to valence bands, while |M f j ,i j (k, E)| → 1 may be achieved through nonlinear intensity normalization or contrast enhancement in data processing. The expression of the multiband photoemission intensity in Eqs. (13)- (14) provides the physical foundation and inspiration for the approximate generation of band mapping data (see Supplementary Section 2) that we employ to validate the reconstruction algorithm introduced in this work.

Markov random field modeling
The Deriving the MRF amounts to determining the joint distribution of the random variables associated with its graphical representation. In probabilistic graphical model theory [76], a graph is constructed from the fundamental components called cliques. Each clique C of a graph is a subset of nodes that shares an edge with another node in C, with the total number of nodes in C defined as its size. The MRFs in Supplementary Fig. 1a-c that model the photoemission data are built out of cliques of sizes 1-2 shown in Supplementary Fig. 1d. Although larger cliques can be constructed similarly [76], their parent graphical models are described by more complex joint distributions with drastically higher computational costs in optimization, therefore are not used in our MRFs. Mathematically, each clique is represented by a so-called potential function, ψ C , which is used to derive the joint distribution that characterizes the MRF. The potential function only depends on the node configuration in the cliques, X C , and satisfies ψ C (X C ) 0.
According to the Hammersley-Clifford theorem [76][77][78], the joint distribution of a vector of random variables, X, can be written in the factorized form, Here, C is the set of all cliques in the graph, and the partition function Z is a normalization constant given by   means that the potential function of size-2 cliques can be represented by a Gaussian on adjacent momentum grid positions. Intuitively, this means that the closer the two adjacent energies is, the more probable they are the actual band loci, and vice versa.
In the 1D case (see Supplementary Fig. 1a), the potential function of each node (containing one band energy random variableẼ i ) is given by whereĨ is the photoemission intensity after preprocessing. The potential function of two connected nodes (describing the similarity between two neighboring band energy random variables) is given by Plugging Eqs. (16)- (17) into Eq. (15) yields as the joint distribution of the 1D MRF, with N being the total number of momentum grid points. Analogously, we can derive the joint distribution of the 2D MRF as given in the main text, and that for the 3D MRF in the (k x , k y , E, t) coordinates is The MRF models in different dimensions discussed here follow the same Bayesian interpretation as the 2D MRF (Eq. (1) in the main text).
In practice, a 4D dataset of the kind, I(E, k x , k y , k z ), and comparable spacing along the momentum dimensions (∆k x ∼ ∆k y ∼ ∆k z ) should be treated together to best use the connectivity encoded in the structured prior of the Markov random field model, which becomes a 3D grid of random variables that accounts for the connectedness along the momentum directions. When the fourth dimension (such as k z ) in the data is not sampled as densely as the other dimensions (∆k x ∼ ∆k y ∆k z ), which may be the case for synchrotron-based photoemission instruments (resulting in 3.5 D or quasi-4D datasets), the dataset can also be treated individually per scanned photon energy, since the local connectedness assumption along the third momentum direction is no longer retained.

Optimization procedure
Optimization of the MRF model is a local minima-finding process [76]. The following procedures are described using the 2D MRF in the main text as an example, but the approach can be extended to arbitrary dimensions. Due to the large number of random variables (∼ 10 4 for the 2D MRF in the main text) and their complex dependence structure in the MRF, we solved it numerically using iterated conditional mode (ICM) [61] procedure and implemented with efficient parallelization schemes, including the coding method and the hierarchical grouping of random variables. Next, we discuss the motivations and clarify the details of these three aspects. We provide the associated pseudocode in Algorithm 1.

Iterated conditional mode:
Originally developed for similar optimization problems arising in image denoising [76,79,80], ICM is applicable to optimizing MRF at any dimension. The separation [76,81]. Analogously, blocking the black nodes d-separates the white nodes. Since the MRF models satisfy the Hammersley-Clifford theorem [77], d-separation is equivalent to conditional independence, meaning that the random variables represented by the black nodes are independent if we condition on those represented by the white nodes. Therefore, conditioning on the nodes of one color allows us to compute the terms in the log-probability loss (Eq. (3) in main text Methods) that depends on the nodes of another color in parallel, which means that the nodes associated with different colors can be updated alternately. Further details and proofs related to the coding method have been elaborated in [78,82].

Robust initialization:
Since the current MRF model doesn't include any explicit regularization on the outcome with respect to the initialization, the optimizer is free to explore a large range of values. In other words, the initial band dispersion is able to freely deform to fit to the band loci embedded in the data. This design improves the robustness of the algorithm to initialization. As a result, in scenarios with only non-crossing energy bands, the MAP optimization can simply be initialized with constant energy values to yield consistent results. In general situations involving band crossings, the optimization procedure requires an initialization with approximate energy values that preserves the band-crossing information, such as those provided by electronic structure calculations. In this scenario, the robustness of the algorithm is manifest in the fact that it can tolerate a certain amount of deviation in the initialization and still converges to a satisfactory reconstruction, which, in realistic settings, is closer to the real band structure contained in photoemission data than the initialization (e.g. from electronic structure calculations). Quantitative examples demonstrating the robustness of initialization are provided using synthetic data in Supplementary Figs. 6-9 (see Supplementary Section 2).

Hyperparameter tuning
The optimization process in the band structure reconstruction involves the tuning of three kinds of hyperparameters, which are the momentum scaling parameter, the rigid energy shift and the width of the nearest-neighbor Gaussian prior. A flowchart presented in Supplementary Fig. 3 illustrates the general steps in obtaining a desirable reconstruction including where the tuning of each hyperparameter fits in.
1. Momentum scaling: applied to equalize the momentum scale and resolution between the BS calculation (e.g. conducted on relaxed unit cells, see Supplementary Table 3) and the experimental data (measured on real materials). In our reconstruction procedure, the scaling factor is fixed in the reconstruction of all energy bands using a particular level of density functional theory (DFT) calculation as initialization.

Rigid energy shift (∆E)
: separately applied to each energy band in the calculated BS to coarse-align to the band mapping data. In our case, the shift is chosen manually by visual inspection of the theoretical band energies overplotted on photoemission data (usually in the energy-momentum slices). In practice, the necessary energy shifts vary between bands and also depend on the level of approximation in the BS calculation used as initialization, as illustrated in Fig. 2a of the main text.

Width of the nearest-neighbor Gaussian prior (η):
The value of the parameter η is chosen manually from an initial estimate and subsequently optimized by visual inspection of the reconstruction outcome. In the case of WSe 2 , the momentum grid of the experimental data has a spacing of ∆k x = ∆k y ≈ 0.015Å −1 , we used η ∈ [0.05, 0.2] eV. Generally speaking, the initial estimate of η has the order of magnitude proportional to the momentum grid spacing times the dispersion due to the following argument: To obtain a consistent reconstruction, we expect the  Figure 3: Flowchart for reconstruction tuning. Illustration of the steps for tuning the reconstruction starting from preprocessed data (outcome from the procedure illustrated in the main text Fig. 1c-f). Tuning of the three hyperparameters -the momentum scaling, energy shift (∆E) and nearest-neighbor Gaussian width (η), are placed in sequence within the workflow. The workflow outputs reconstruction of a single band with tuned hyperparameters at the end. For reconstructing the dispersion of multiple energy bands, the workflow is repeated over each band.
posterior to stay relatively constant and be independent of the momentum grid spacing, which should be sufficiently fine to ensure band continuity. Since after preprocessing the data, the intensity (i.e. the likelihood) is normalized and stays constant with respect to the momentum grid spacing, the nearest-neighbor Gaussian prior term should stay constant correspondingly.
For example, for two nearest-neighbor energy variables along the k x axis, the reasoning above requires, Thereby, we obtain η ∝ ∂E ∂kx ∆k x , which provide an order-of-magnitude estimate of η. The same lines of reasoning apply to the k y axis, for detector systems with relatively constant momentum resolution. As the grid spacing is the same in both k x and k y directions, a single η is used for reconstructing each band in the case of WSe 2 , but the best η differs somewhat between energy bands due to their various amounts of dispersion and how they are connected to the neighboring bands (i.e. their environment), hence the range of η as specified earlier.
To demonstrate the process of hyperparameter tuning, we provide an example showing the reconstruction of the second valence band of WSe 2 (see Supplementary Fig. 4), visualized in the top view of the reconstruction outcome and in the momentum path along high-symmetry lines of the projected Brillouin zone. The orange-framed subfigures represent the range of hyperparameter settings that yield a good reconstruction, which represents a relatively broad acceptance range to yield a good reconstruction. Although this aspect is dependent on the data, in our experience, the hyperparameter tuning may be carried out in a semi-automated fashion guided by visualization and heuristics. Typically, 10-20 trials are sufficient to yield a good reconstruction, although a grid search may also be carried out for completeness. For a given dataset, the hyperparameters typically fall within a similar range, therefore, determining the range of hyperparameters need only be carried out once. The choice of hyperparameters is more flexible for reconstructing more isolated bands or those with fewer crossings, and vice

Initialization Reconstruction
Supplementary Figure 4: Demonstration of hyperparameter tuning. An example of tuning the hyperparameters, the rigid energy shift (∆E) and the width of the nearest-neighbor Gaussian prior (η), for reconstructing the second valence band of WSe 2 . a, Evolution of reconstructed energy band during hyperparameter tuning. b, Evolution of the initialization and reconstructed band along high-symmetry directions of the hexagonal lattice of WSe 2 . The energy bands are overlaid on top of preprocessed data from photoemission band mapping of WSe 2 (Fig. 1f in the main text). In a,b, the images showing the optimal region for the hyperparameters identified by the scientists are emphasized with orange-colored frames. versa. The band-wise reconstruction and the computational efficiency of the algorithm also enable further parallelization in hyperparameter tuning by distributing the optimization tasks in a high-performance computing infrastructure.

Reconstructions using different theories as initializations
Comparison between reconstructed and theoretical band structures for 2H-WSe 2 are presented as a similarity matrix in the main text. To provide more intuitive visual guidance in interpreting the BS distance metric used in constructing the similarity matrix, we compare these band structures along the high-symmetry lines of the Brillouin zone in Supplementary Fig. 5.
Here, the comparison between reconstructed and calculated band structures show that the HSE06 and PBE have, respectively, the largest and smallest overestimation of total band width of WSe 2 among the four initializations, though HSE06 has a higher level of chemical accuracy than PBE [30]. The calculated band structures are closer to the reconstruction near the K-point than elsewhere in the projected Brillouin zone, reflecting the difference in electronic dimensionality between K (nearly ideally 2D) and elsewhere [11].

Generation of and validation on synthetic data
The advantage of using synthetic data is that the underlying band structure (i.e. ground truth) is exactly known so it can be used for benchmarking the performance of the MAP reconstruction algorithm described in this work. Benchmarking includes numerical experiments on two interrelated aspects: (1) testing the robustness of the reconstruction algorithm using different initializations and comparing the deviations of the outcome from the ground-truth; (2) testing the accuracy of reconstruction by determining the closest-possible reconstruction outcome from a given initialization. In the following, we first describe the workflow of generating the band structure, the photoemission data and the initializations, which provide all essential components to carry out the tests. Then we present the benchmarking results on various cases.

Generation of band structure data
We have adopted two approaches to generate band structure data to meet the needs for testing the reconstruction algorithm. Firstly, we used analytic functions to describe the band dispersion (see Supplementary Fig. 6). They are computationally efficient, contain tunable parameters, can be produced at any resolution, and are easily extendable to higher dimensions. In 2D momentum space, we constructed a multi-sinusoidal band and two double-crossing parabolic bands.
In 3D momentum space, we constructed a scaled version of the strongly oscillating secondorder Griewank function [83] and the tight-binding formulation of the two-band graphene band structure [84] as model band dispersion surfaces. The modified Griewank function takes the form, The two-band tight-binding model of graphene has energy dispersion relations, E ± (k x , k y ) = ± 3 + 2 cos √ 3k y a + 4 cos Here, E + and E − refer to the conduction band and the valence band, respectively. Secondly, we used numerical band structures from DFT calculations with different exchange-correlation functionals (see Supplementary Section 4). They are more physically realistic, but also require more computation to obtain than generating bands from analytic functions.

Approximate generation of photoemission data
We approximately synthesized momentum-resolved photoemission data for each energy band .
Without loss of generality, we assume the energy resolution in detection for all bands to be the same (σ j = σ). For the cases shown in Supplementary Figs. 6-9, the linewidth parameter γ are set to a constant throughout the band. In all synthetic data, we omitted the inhomogeneous intensity modifications in realistic photoemission data due to experimental factors such as the experimental geometry, sample condition, matrix element effect, photon energy, etc. This omittance relies on the assumption that the essential preprocessing step, such as symmetrization and contrast enhancement [26] in our workflow (see main text Methods), can sufficiently restore the intensity continuity along the energy bands. The momentum resolution effect is also not accounted for because the instrument (such as METIS 1000 [8,59]) has a higher momentum resolution than the momentum spacing used in data binning or generation.

Validation of the reconstruction algorithm
Using synthetic data generated from analytic functions of varying complexities as the band structure, we test out the accuracy of the reconstruction algorithm (see Supplementary Fig. 6); Using synthetic multiband data generated from the LDA-level DFT (LDA-DFT) band structures of WSe 2 (see Supplementary Section 4), we tested out the sensitivity of reconstruction to the initialization (see Supplementary Fig. 9). In this case, to capture sufficient physical realism similar to the photoemission band mapping of WSe 2 presented in the main text, we set the energy resolution parameter of σ = 100 meV, the lineshape parameter γ = 50 meV [88], and the energy spacing of data to ∼ 18 meV, identical to the energy bin size for the experimental data.
The tests include four sets of numerical experiments summarized below: 1. Reconstructing non-crossing bands: For isolated bands, we tested synthetic data constructed from a multi-sinusoidal band ( Supplementary Fig. 6a), the band generated by the Griewank function ( Supplementary Fig. 6c-d), and the two-band tight-binding model of graphene ( Supplementary Fig. 6e-f). In these cases, initialization with a flat band without any initial knowledge of the band dispersion (i.e. cold start) is sufficient to recover its shape, regardless of the complexity of the dispersion.

Reconstructing crossing bands:
We tested the simplest case of crossing bands with two parabolas of opposite directions of opening ( Supplementary Fig. 6b) Results from a series of numerical experiments for demonstrating the effects of data resolution in either instrument resolution (σ) or energy spacing of data (∆E) on the reconstruction accuracy. The results are compared against the ground truth (g.t.) band dispersion -displaced parabolas -by overplotting in dashes lines. The σ parameter is tuned to 50 meV, 100 meV and 200 meV, while the ∆E parameter to 6 meV, 12 meV and 24 meV. In a-i, the synthetic data with ground truth dispersion is shown on the left, the reconstruction outcome is displayed on the right, along with a zoomed-in view near the crossing placed at the bottom. Quantitative values of the reconstruction error are given in Supplementary Table 1. crossing bands in the reconstruction (Supplementary Fig. 7a-c). When the initial straight line contains the crossings in the ground truth, a symmetry breaking in the reconstruction takes place ( Supplementary Fig. 7c), depending on the data and the Gaussian width hyperparameter (η).
(2) Initialization with two straight lines containing a single crossing yields a reconstruction with at most a single crossing ( Supplementary Fig. 7d-f). (3) Initialization with double parabolas yields a reconstruction with at most the same number of crossings within the range of the data ( Supplementary Fig. 7g-l). When the reconstruction is successful, the crossings in the initialization are close to the intersection between the two parabolas. Besides, double-crossing parabolas with other parameters from those in Supplementary Fig. 7 are tested and similar outcomes are obtained.
Supplementary Table 1: Reconstruction error in resolution tuning experiments. For each band, the reconstruction error is the root-mean-square error per momentum spacing (unit in meV) between reconstruction and the ground truth, according to Eq. (9). In each numerical experiment, the tabulated reconstruction error is averaged over the corresponding two parabolic bands shown in Supplementary Fig. 8. The columns are the instrument resolution (σ) and the rows are the energy spacing (∆E) used to generate the intensity data. The crossing-band model is also an effective test case for resolution effects of the reconstruction algorithm. In this case, a momentum shift is introduced to two parabolic bands to produce the crossing, similar to the Rashba-split surface states of Au [89], which if often used

Computational benchmarks
We used the available synthetic photoemission datasets based on the computed band structure of WSe 2 to construct benchmarks. The synthesis made use of the approach described in Supplementary Section 2.1. The two datasets used here, taken from [90], exhibit different characteristics, which may be qualitatively described using the energy range of an energy band. The more overlap in the energy range between two energy bands, the more likely they have crossings (or anti-crossings). The dataset-specific information is as follows: • The synthetic dataset of the WSe 2 K-point shows close proximity in energies between neighboring momentum locations. The energy ranges of all energy bands have no or up to a moderate degree of overlap. The dataset size is 30 × 30 × 500 and contains 900 photoemission spectra.
• The synthetic high-symmetry line dataset of WSe 2 exhibits large dispersion. Since the highsymmetry line often represents the direction with the most dispersion in the band structure, the energy ranges of all energy bands are strongly overlapping. The dataset size is 186 × 500 and contains 186 photoemission spectra.
In both cases, the ground-truth band dispersions are taken from the LDA-DFT calculation, including all 14 valence bands, while the initializations for benchmarking both band reconstruction approaches are the PBE-DFT calculation (partial example see Supplementary Fig. 9g).
Using these two datasets, we compare the reconstruction algorithms based on pointwise fitting .
where N b is the number of bands and the subscript i is the band index. The instability is quantified by the standard deviation of the residual (difference between the ground truth and reconstructed energy dispersion), δE = E gt − E recon .
where the overline indicates the mean. This metric has been used in earth sciences to quantify surface roughness [41,91], which may be interpreted similarly in our case. The difference is that the roughness in the reconstructed surface is largely due to the instability of the optimization, besides the quality of the data, because band dispersions are generally smooth and continuous. In the main text Fig. 4, these tabulated metrics are normalized by the number of spectra to allow comparison between datasets, as is also adopted in [39]. We interpret the results in Supplementary Table 2 in the following two aspects:

Reconstruction for other datasets
To test the functionality of our MRF reconstruction algorithm on other materials, we have acquired photoemission band mapping datasets from gold (Au), a metal, and bismuth tellurium selenide (Bi 2 Te 2 Se), a topological insulator. Due to the complexity of the electronic structure of these materials, we focus here on reconstructing a subset of the energy bands of these two materials that are pronounced within the measured energy range. Besides, we simulated the case where the electron self-energy strongly modifies the band dispersion that results in kink anomalies [6,92,93].

Near-gap electronic bands of a topological insulator (Bi 2 Te 2 Se)
The dataset for Bi 2 Te 2 Se was measured at room temperature at the Fritz Haber Institute in Berlin using a momentum microscope (SPECS METIS 1000). The sample growth method was previously described in [94]. Preprocessing of the 3D band mapping data follows the procedure for WSe 2 data described in the main text, except that the rotational symmetrization is only threefold, due to the symmetry of the material. We used numerical initializations from simple functions such as paraboloid and Gaussian in 2D, instead of any first-principles calculation. The reconstructed energy dispersions were smoothed using Chambolle's total variation denoising algorithm [96] implemented in scikit-image [97], removing the high-frequency noise as a result of the Poisson statistics of the photoemission data. As shown in main text Fig. 6c, the simple initializations we chose are sufficient to reconstruct the complex dispersion from the first two valence bands, the SS and parts of the first conduction band occupied by the excited electronic population. The appearance of the first conduction band for Bi 2 Te 2 Se is a result of photoexcitation [98]. The reconstructed bands show sixfold symmetry and warping in agreement with previous theoretical investigations [43,95], which is more straightforwardly visualized in 2D and 3D as in main text Fig. 6d-e.
For the dispersion surfaces of the SS and the conduction band, we truncated the dispersion to a realistic energy range not far from the photon energies of the excitation light pulses.

Bulk electronic bands of gold (Au)
The Au dataset was measured at 100 K at the SGM-3 beamline [99] of the 3rd-generation synchrotron radiation facility ASTRID2 in Aarhus, Denmark. The Au samples were purchased from MaTecK GmbH with a (111) surface. The sample preparation procedure has been previously described [100]. The photoemission data were measured along the high-symmetry direction (ΓKMΓ) of Au(111), which exhibits a hexagonal symmetry in the surface Brillouin zone [101] (indicated with an overbar over each symmetry label) similar to WSe 2 . As shown in main text Fig. 6f, the collection of energy bands present in the photoemission data for Au (111) includes the surface states (SSs), which, at sufficient momentum and energy resolution, are composed of momentum-shifted parabolas [89]. The photon energy for the photoemission measurement is ∼ 80 eV, which resolves the sp bands and the surface states poorly but the bulk d bands better. The sp bands and the d bands are the low-energy bulk electronic bands of Au.
Before reconstruction, the Au data has been preprocessed using contrast enhancement and intensity smoothing as described in the main text for the WSe 2 data before reconstruction. The reconstruction used existing DFT calculations, which feature a Au(111) slab containing five Au layers constructed according to [102], as initialization to retrieve parts of the d bands that are resolvable within the current dataset. The comparison between initialization and reconstruction is shown in main text Fig. 6g. The choice of the initialization is a consistent set of energy bands (i.e. produced by the same slab) from DFT calculations of Au(111) slabs in the energy range close to the noticeable bands in the photoemission data. Although traditionally, slab calculations along with overplotting are used to approximate the total band width, we have shown that our reconstruction approach can detect existing band-like dispersive features in these highly congested data.

Reconstructing the kink anomaly
Kink anomalies are a kind of feature for electron-phonon interaction in photoemission signals [5,6] found in various materials [92,93,103]. To test the reconstruction performance, we simulated the photoemission signal for a kink anomaly using the full spectral function introduced in Eq. (11). The real part of the electron self-energy is calculated using the Eliashberg function [104] represented as an Einstein mode (i.e. single-phonon mode with a delta-function-like frequency response) [105], which appears near the Fermi level. Further details of the computational model can be found in the Jupyter notebooks within the associated compute capsule.
The presence of an Einstein mode in the spectral function results in a phonon-induced kink at around -0.1 eV.
The outcome of the reconstruction, shown in Supplementary Fig. 10, indicates that the MRF model can recover faithfully the quasiparticle dispersion including the shape of the kink anomaly. The reconstruction can simply be initialized with a flat line, which produces identical results from initialization with a linear dispersion that could represent prior knowledge of the algorithm user. The results show that for strongly dispersive energy bands with almost vertical dispersion along the energy direction, reconstruction along the energy direction (i.e. treating the data as a collection of momentum distribution functions) yields a better outcome. This is because the existence of the kink violates the one-to-one mapping between the band energy, E(k), and the photoelectron momentum, k (see Supplementary Fig. 10a-b). In these cases, a b c d Supplementary Figure 10: Band reconstruction involving a kink anomaly. Band reconstruction was carried out along either the momentum (a, b) or energy (c, d) directions. Reconstruction (recon.) along the momentum direction using a, a flat (uninformative) initialization (init.) and b, an informative initialization that approximates foreknowledge of the linear bare-band dispersion yield mostly identical outcomes, which have deviations from the ground truth (g.t.) near the kink and the Fermi level, as indicated with black arrows in a. Reconstruction along the energy direction using c, a flat (uninformative) initialization and d, an informative initialization both yield highly accurate outcomes compared with the ground-truth quasiparticle dispersion in dashed green lines in a-d.
the reconstruction is still viable using the momentum distribution function as the likelihood in the MRF model, which effectively amounts to transposing the image and swapping the momentum and energy coordinates, while the same optimization algorithm described in this work for the EDC-based approach can be reused to obtain the correct quasiparticle dispersion (see Supplementary Fig. 10c-d).
4 Band structure calculations

DFT calculations
The crystal structure of bulk WSe 2 with 2H stacking (2H-WSe 2 ) belongs to the P6 3 /mmc space group and consists of two Se-W-Se triatomic layers as shown in Supplementary Fig. 11. The  Table 3 shows the optimized lattice constants, a and c, as obtained by the evaluation of the analytical stress tensor [106] using different exchange-correlation functionals. In all BS calculations, we included the effect of spin-orbit coupling, which is known to introduce a large splitting of the outermost valence states of bulk 2H-WSe 2 [107].
The calculated BSs of bulk 2H-WSe 2 using different levels of approximation for the exchangecorrelation (XC) functional are shown in Supplementary Fig. 12. For each XC functional, the a b Supplementary Figure 11: Crystal structure of bulk 2H-WSe 2 . a, Side view and b, top view of the crystal structure of 2H-WSe 2 . The space group of the hexagonal structure is P6 3 /mmc with the c-axis oriented perpendicular to the stacking layers. In each case, the real-space unit cell is labelled by dashed black lines.

Brillouin zone tiling
The generation of a large and densely sampled patch of energy bands covering the first Brillouin zone and beyond is crucial for initializing the MRF model. To balance the computational cost using different XC functionals with the dense sampling similar to the experimental data grid, we used the symmetry properties of the Brillouin zone to tile the calculated momentum-space rectangular patch that covers the Γ, K and M points of the Brillouin zone. The hexagonal Brillouin zone of WSe 2 has a sixfold rotation symmetry axis and two independent mirror planes in the (k x , k y ) coordinates. The initial rectangular patch is first symmetrized about the two mirror planes in the Γ-K and Γ-M directions to form a larger patch, which is then rotated by 60 • and 120 • , respectively, and combined with the original mirror-symmetrized patch. The composite patch is then shifted along all six Γ-M directions by one unit cell distance and the   Fig. 3a in the main text, including the sparse distribution of major basis terms and the decreasing dependence on higher-order basis terms. e-h, Cosine similarity matrices between the 14 energy bands of WSe 2 for the DFT band structure calculations carried out at the levels of LDA (e), PBE (f), PBEsol (g), and HSE06 (h), respectively. The characteristics of these matrices resemble that calculated for the reconstructed band structure as shown in Fig.  3c in the main text.    Supplementary Figure 14: Approximation to the band structure of WSe 2 by a polynomial basis. a-j, Demonstration of the convergence properties of the polynomial approximation using reconstructed photoemission band structure (a-d) and DFT band structure calculated at the LDA level (e-h). When summing the hexagonal Zernike polynomial in the default order, the average and relative approximation errors for the reconstructed (a,b) and theoretical (e,f) energy bands converge much slower than summing the polynomials in an ordering ranked by the magnitude of their coefficients (coefficient order). This observation is similar for reconstructed (c,d) and theoretical (g,h) energy bands. i-j, Visualization of the difference in convergence rates using the reconstructed band structure along the high-symmetry lines. The naturally-ordered polynomial basis has not yet converged with 150 terms (i), while the coefficient-ranked polynomials (j) produces an accurate approximation well within that limit.
result is cut to the required shape compatible with photoemission data.
5 Band structure informatics

Global structure descriptors
We use informatics tools for data retrieval, representation and comparison for entire bands. We extend the examples given in main text Fig. 3b to other bands of WSe 2 reconstructed in the present work. Supplementary Fig. 13 displays the band-wise comparison of dispersion surfaces within other DFT calculations. These results contain similar features as in main text Fig. 3a and c, reaffirming that the geometric featurization provides a sparse representation of the band dispersions and that the dispersion similarities are largely preserved despite the use of different exchange-correlation functionals in the DFT calculations. They may, therefore, be regarded as general features of the WSe 2 band structure.
In Supplementary Fig. 14, we demonstrate numerically the approximation capability of the hexagonal ZP basis set to all 14 valence bands of WSe 2 . Despite the stark differences in energy dispersion, the approximation to reconstructed bands ( Supplementary Fig. 14a-d) and theoretical band structure at the level of LDA-DFT ( Supplementary Fig. 14e-h) show comparable convergence rates. Quantitatively, the approximation using hexagonal ZPs ordered by the magnitude of the corresponding coefficients (i.e. coefficient order) converges to within 10-30 meV/band within 50 polynomial basis terms, substantially faster than using the default order (see also Fig. 3b for reference). The remaining errors are on par with the finite step size along the energy axis in the data (∼ 18 meV) that results in the imperfect smoothness of the reconstructed bands. This further proves that the hexagonal ZPs can provide an accurate and sparse approximation for the band structure data. The trend of convergence between these two types of polynomial ordering is further illustrated in Supplementary Fig. 14i-j in the momentum path along high-symmetry lines of the reconstructed band structure.

Local structure descriptors
Local structural information includes energy gaps, effective masses, warpings, (avoided) crossings, etc. We extracted some of their associated parameters at and around three high-symmetry points (K, M , and Γ, see main text Fig. 5a) and compiled the results in Supplementary Table   4. The dispersions and band structure parameters from the MAP reconstruction are compared with those extracted by the line-by-line fitting of the EDCs, which used the band energies from the reconstruction as initialization to improve robustness. Around K, two spectral peaks corresponding to two spin-split bands were fit simultaneously, while around M and Γ, four were fit simultaneously due to the spectral proximity of the first four valence bands (see Supplementary   Fig. 5). The fitting is carried out using a linear superposition of Voigt lineshapes and the lmfit package [110] with the reconstructed band energy as initialization (but not fixed). The fitting procedure iterates over the EDCs (e.g. a total of 50×50 EDCs for the patch around M ). Unstable fits yielding erratic results (e.g. if differing greatly from neighboring values) are re-fit with either algorithmically or manually adjusted initialization. Supplementary Table 4 shows that the local structural information from reconstruction is generally consistent with those obtained E(q) = 2 q 2 2m K + C|q| 3 cos(3ϕ q + θ) + E 0 .
The second valence band is not fitted at M due to the pronounced dispersion modulation by interband coupling unaccounted for in the existing saddle-shaped model. At around Γ, a single effective mass is extracted by fitting a paraboloid to a local patch of the dispersion surface.