Machine learning the microscopic form of nematic order in twisted double-bilayer graphene

Modern scanning probe techniques, such as scanning tunneling microscopy, provide access to a large amount of data encoding the underlying physics of quantum matter. In this work, we show how convolutional neural networks can be used to learn effective theoretical models from scanning tunneling microscopy data on correlated moiré superlattices. Moiré systems are particularly well suited for this task as their increased lattice constant provides access to intra-unit-cell physics, while their tunability allows for the collection of high-dimensional data sets from a single sample. Using electronic nematic order in twisted double-bilayer graphene as an example, we show that incorporating correlations between the local density of states at different energies allows convolutional neural networks not only to learn the microscopic nematic order parameter, but also to distinguish it from heterostrain. These results demonstrate that neural networks are a powerful method for investigating the microscopic details of correlated phenomena in moiré systems and beyond.


I. INTRODUCTION
Driven by the impressive improvements in machine learning (ML) in the last couple of years, exploring its potential for quantum many-body physics has recently become subject of intense research [1,2].For instance, ML provides powerful tools to solve inverse problems that occur frequently in physics [3][4][5][6]: given a model, it is often straightforward with conventional many-body techniques to compute observables that can be measured experimentally, whereas the often needed inverse problem of extracting the model and underlying microscopic physics from observations is much more challenging and typically even formally ill-defined.A second example of a large class of applications of ML in physics is ML-assisted analysis of experiments, in particular of those yielding imagelike data like scanning tunneling microscopy (STM) [7][8][9][10], photoemission [11], and others [12][13][14][15][16][17].
In the context of applying ML algorithms to data from imaging techniques like STM, van der Waals moiré superlattices [18,19] are particularly promising for three reasons: (i) they display a huge variety of correlated quantum-many-body phenomena, such as interactioninduced insulating phases [20], magnetism [21], superconductivity [22], electronic nematic order [23][24][25][26], which can also coexist microscopically [26,27].Despite intense research on these phenomena over several decades, e.g., in the pnictides or cuprates, their origin and relations are still subject of ongoing debates.However, compared to these microscopic crystalline quantum materials, moiré superlattices are (ii) highly tunable; for instance, the density of carriers can be varied within a single sample just by applying a gate voltage (as opposed to chemical doping) and even the interactions can be tuned [28].This allows producing large data sets of measurements on a single sample, containing a lot of information on the microscopic physics.This aspect, which is crucial for data-driven approaches, is further enhanced by (iii) the large moiré unit cells of these systems compared to that of microscopic crystals, increasing the relative spatial resolution of scanning-probe techniques significantly.This enables experiments to probe the structure of the wave functions within the unit cell, and thus provides unprecedented access to the microscopic physics compared to conventional quantum materials.For instance, in the extreme limit of only one degree of freedom (Wannier state or pixel) per unit cell, the broken rotational symmetry of the electron liquid-the defining property of electronic nematic order [29,30]-is not visible as a consequence of translational symmetry and thus requires a careful analysis of the behavior around impurities [31].
In this work, we explore these advantages of moiré superlattices for extracting or 'learning' effective fieldtheoretical descriptions of their correlated many-body physics from STM data.This can be viewed as an inverse problem and is also conceptually related to the goal of 'Hamiltonian learning' in quantum simulation [32][33][34][35][36][37], albeit in rather different regimes and based on different measurement schemes.As a concrete example, we use electronic nematic order in twisted double-bilayer graphene (TDBG) [38][39][40][41][42][43][44].This moiré system consists of two AB-stacked bilayers of graphene that are twisted against each other; as one can see in Fig. 1(a), it exhibits the point group D 3 , generated by three-fold rotation C 3 along the out-of-plane z axis and two-fold rotation C 2x along the in-plane x axis.Evidence of electronic nematic order has been observed in previous STM experiments [41,45] which clearly exhibit stripe-like features breaking the C 3 symmetry spontaneously for certain electron concentrations.While simple limiting cases have been compared with the data in Ref. 45, there is no systematic analysis of the microscopic form of nematicity in the system.To fill this gap, we consider the more general case in which all leading terms on the graphene and moiré scale describing nematic order in a continuum-model description of TDBG are included.In addition, as it is common in graphene moiré systems [23-25, 41, 46], we also allow for finite strain.The Hamiltonian defining the changes in TDBG resulting from nematic order and strain depends on a set of parameters β, which we reconstruct from STM data using convolutional neural networks (CNN) in a supervised learning procedure.As such, our study differs significantly from recent works, which focused on detecting the presence or absence of nematic order [31] or performed a phenomenological data analysis of STM measurements [47] with ML, rather than extracting the underlying microscopic physics as we do here.

A. Nematic order in TDBG
The non-interacting band structure of TDBG features two moiré minibands per spin and valley close to charge neutrality, where a variety of correlation-driven phenomena can emerge [38][39][40][41][42][43][44].In Fig. 1(b), these minibands are denoted as valence (VFB) and conduction flat bands (CFB).The band structure shown is obtained from continuum model calculations close to half filling of the CFB (band filling ν = 0.475), where electronic nematic order was observed to be the strongest [41], see Appendix A for more details.STM experiments probe the band structure and wave functions of a system by providing direct access to the spatial and energy dependence of the LDOS.Most commonly, the LDOS is studied either for a fixed position r 0 over a range of different energies, D r0 (ω), or for a fixed energy ω 0 covering a spatial region of the system, D ω0 (r).The behavior of D ω0 (r) and D r0 (ω) following from the continuum model for TDBG for three different energies and high-symmetry positions in the moiré unit cell is shown in Fig. 1(c).The C 3 rotational and translational symmetry of the moiré lattice can be clearly seen in D ω0 (r).Meanwhile, C 2x is broken, albeit weakly, as a consequence of the electric field required to control the electron filling to be close to the middle of the CFB in an open-faced STM sample geometry [41].
In graphene moiré systems, there are two fundamentally distinct sources of C 3 symmetry breaking-strain and electronic nematic order.Postponing the discussion of the former to below, electronic nematic order [29,30] refers to the spontaneous rotational symmetry breaking as a result of electronic correlations.While recent works also indicate the possibility of nematic chargedensity wave states in TDBG [42,48], where moiré translational symmetry is simultaneously broken, we here focus on translationally symmetric nematic order since the STM data of Ref. 41 preserves moiré translations.The underlying nematic order parameter we study is a time-reversal-and moiré-translation-invariant vector Φ = Φ Φϕ , Φϕ = (cos 2ϕ, sin 2ϕ), transforming under the irreducible representation E of D 3 (or of C 3 , taking into account the weak C 2x breaking); Φ and ϕ stand for the intensity and orientation of the nematic director, respectively.The microscopic form of nematicity can be modeled by a coupling of Φ to a fermionic bilinear and reads in its most general form in a continuum-model description as [45] H Φ = r ∆r Φ • φ σ, ,s,η; σ , ,s ,η (r, ∆r) where c † and c are the electronic creation and annihilation operators.This general form encompasses couplings between the two sublattices s = A, B of the microscopic graphene sheets, the four graphene layers = 1, . . ., 4, the valley η = ± and spin σ = ↑, ↓ degrees of freedom in the tensorial form factor φ σ, ,s,η; σ , ,s ,η (r, ∆r); its two components are required to transform in the same way as Φ under all symmetries of the system.In the following, we will take φ to be trivial in the spin and diagonal in the valley indices, φ σ, ,s,η; σ , ,s ,η = δ σ,σ δ η,η φ ,s; ,s (η).This is motivated by the weak spin-orbit coupling in graphene [49,50] and the lack of indications of interaction-induced spin-orbit coupling, which is also strongly constrained [51].Furthermore, the intervalley-coherent nematicity is known to lead to stronger effects on the remote bands [45] that were not observed experimentally [41].
Since we are working with a continuum theory, the space of possible couplings φ in Eq. ( 1) is technically infinite dimensional.As such, a complete reconstruction of φ from experimental data is impossible given the finite resolution and energy range of the available data.On top of this, it is not required either as we are primarily interested in understanding the low-energy behavior of the system.In the spirit of gradient expansions commonly used in continuum low-energy field theories, we will therefore only keep the leading terms in Φ.There is, however, a subtlety associated with the presence of an additional moiré length scale.We will therefore have to consider two basic classes of nematic orders, referred to as graphene (GN) and moiré (MN) nematicity [41,45].
In the case of MN, nematic order is associated with the moiré scale, i.e., we choose ∆r 2 in Eq. ( 1), m j ∈ Z, with moiré lattice vectors L M j , to represent the non-trivial transformation behavior of φ under C 3 .We can thus take it to be diagonal in the remaining internal indices, yielding Φϕ MN • φ m1,m2 (r) with multi-index α = (σ, , s, η).We further focus on the lowest moiré-lattice harmonic by setting φ m1,m2 (r) = φ m1,m2 and only keeping the terms with the shortest possible R m1,m2 .Intuitively, MN order can be thought of as a distortion of the effective inter-moiré-unit-cell hopping matrix elements, as illustrated schematically in the lower right panel of Fig. 1(d).
Conversely, GN acts as a local order parameter, ∆r = 0 in Eq. ( 1), without any explicit reference to the moiré scale, Here, the correct transformation properties of φ result from its structure in the internal indices.Focusing on the local intra-layer contributions and the leading (constant) basis function, the most general form reads as where Pauli matrices in sublattice space are represented by ρ j ; α l and ψ l are real-valued parameters.As shown schematically in the upper left panel of Fig. 1(d), one can think of GN as the nematic distortion of the bonds of the individual graphene layers in a way that preserves the graphene translational symmetry.We emphasize that GN and MN should not be viewed as distinct phases; they break the same symmetries and as such in general mix.We thus take H MN Φ + H GN Φ to describe nematicity in TDBG in the following, which depends on the set of parameters β = {α , ψ , Φ MN , Φ GN , ϕ MN , ϕ GN }.The computation of the LDOS for a specific set of parameters can be done straightforwardly from the continuum model.The resulting spatial dependence of the LDOS, D ω0 (r), is also shown in Fig. 1(d) for two different values of β.As opposed to the plots without nematic order, C 3 is now broken, leading to stripes in the VFB, while translational symmetry is still preserved.The inverse probleminferring the value of the parameters β from a given LDOS pattern-is a much more challenging task.Our goal in the following sections will be to use ML, in particular, CNNs to learn the set β directly from LDOS images.

B. ML architecture
Using CNNs to solve this inverse problem can be interpreted as a supervised learning task [2], i.e., a regressionlike procedure using synthetic LDOS data labeled by their respective value of nematicity parameters β.More specifically, our CNNs take as inputs 65 × 65 pixels of LDOS images and apply consecutive transformations (represented by a set of weights between each layer) in order to extract meaningful correlations that represent the set β.One example of the CNN image inputs is shown in Fig. 2(a).The complete dataset consists of 12000 images which are divided into training (60%), validation (20%) and test (20%) subgroups.Each image is generated for a randomly sampled set of nematic parameters β and the intensities in the LDOS are modified with the addition of Gaussian noise (see Appendix A).The motivation for noise is twofold: to avoid overfitting [52] and to test the stability against and performance of the procedure with noise, which is inevitably present in experimental data.
The ML architecture chosen for this task is represented in Fig. 2(a) and its implementation was done with the TensorFlow library [53].Each convolutional layer is followed by batch normalization and max pooling layers (Conv-Batch-MaxPool).The batch normalization layers normalize the input weights in each stage, and also reduce the number of epochs necessary for convergence [54].This process is repeated four times, with the convolutional layers having a kernel size of 3 × 3 and strides set to 1.The filters follow a sequence of 16 − 32 − 32 − 16 with rectified linear unit (ReLU) activation functions [55].Padding is set to zero such that the reduction of dimensionality is performed only by the MaxPool layers.In turn, these have both strides and pool sizes set to 2×2.After a Flatten stage, dense layers lead to a dropout before the final layer with filters equal to the number of parameters in β.The Flatten layer transforms the data to a one-dimensional shape, and the Dropout reduces overfitting by setting a percentage of 20% adjusted weights to zero [56].Tests on variations of this architecture are briefly described in Appendix B.
The learning procedure is then defined by the minimization of the loss function with respect to the CNN's weights in a backward propagation procedure [57].The loss function can be represented as the mean squared error (MSE), which is defined as the difference between the true and expected set of parameters /N , with N representing the number of samples in the test dataset.Finally, we consider the adaptive moment estimation (ADAM) for the minimization of the loss function, with a learning rate of 0.001 and batch size equal to 64 [58].After the completion of the training stage, the algorithm is ready to be deployed to previously unseen data, returning as outputs the parameters β predicted .

C. Orientation of nematic director
As a first investigation, we consider the task of predicting the orientation ϕ of the nematic director from D ω0 (r) images at a single energy in the VFB [ω 0 = −15 meV, see Fig. 1(b)].For this, we consider a dataset with randomly generated MN and GN intensities Φ MN , Φ GN ∈ [0.001, 0.1] eV, and ϕ MN = ϕ GN = ϕ ∈ [0, π].Furthermore, ψ l = 1 and α l = 0 for all layers.The relation between the shape of the LDOS at a single energy D ω0 (r) and ϕ is highly non-trivial for two reasons: even for a given form of nematicity, changing ϕ generically not just merely rotates the LDOS pattern, due to the lattice, but leads to complex distortions of its structure.Additionally, by sampling H MN Φ + H GN Φ , even if the same bond direction is favored over the C 3 -related ones in the LDOS pattern of two samples, the underlying ϕ can be rather different.As can be seen in the three sample LDOS plots in Fig. 2(b) with different values of ϕ, the correspondence between ϕ and D ω0 (r) is complex and not apparent to the human eye.
Using the angles ϕ as labels to the data is the most straightforward choice, but leads to inaccurate predictions around 0 and π due to the periodicity in the definition of the nematic order parameter, Φϕ = (cos 2ϕ, sin 2ϕ) = Φϕ+π .To circumvent this feature, we use the two-component label Φϕ instead of ϕ in the training process and then fold the network's prediction back to ϕ with the arctan2 function [59].The results, shown in Fig. 2(b), are consistent with the true labels, including at the boundaries of ϕ's domain.This shows that even when the precise nature of nematicity (predominantly MN or GN or an admixture of the two) is not known, the director orientation ϕ can be accurately predicted with our CNN setup from D ω0 (r) at a single energy.We have checked that the few outliers in Fig. 2(b) are directly related to small nematic intensities, where ϕ has virtually no impact on the LDOS and is, thus, impossible to predict.

D. Form of nematicity
After successfully learning the director orientation ϕ in the presence of different nematicities, we proceed into investigating finer details of these couplings by learning the parameters β = {Φ MN , Φ GN , α l } defined in Eqs.(2)(3)(4).To this end, we consider ψ l = 1 and α l = α for all layers.For concreteness, we set ϕ MN = ϕ GN = ϕ = 2π/3, which is one of the possible discrete orientations (ϕ MN = ϕ GN = 2π/3, π/6 and symmetry related) of the nematic director in presence of C 2x .The dataset now consists of randomly generated MN and GN intensities Φ MN , Φ GN ∈ [0.001, 0.1] eV, and α ∈ [0, π].The intensity values are chosen such that the stripes in the VFB resemble the experimental results [41].As with ϕ, instead of learning the angular variable α directly, the arctan2 mapping from Section II C is applied.
Using only the LDOS at a single energy (i.e. one D ω0 (r) channel) in the ML architecture for this task does not produce accurate predictions.Additionally, both hyperparameter optimization and architecture modifications did not lead to any significant improvement, implying that nematic order impacts the electronic structure in complex ways that cascade across energy scales.In fact, this is also intuitively clear since, for example, the samples marked by a star and pentagon in Fig. 3(a) have fundamentally different nematic couplings and yet exhibit visually similar D ω0 (r) images at the VFB energy.
In experiments, one can typically obtain single point spectra [D r0 (ω)] and real space LDOS images at fixed energies [D ω0 (r)].We can therefore include additional input channels corresponding to D ω0 (r) and D r0 (ω) for different energies ω 0 and points r 0 , respectively.In the second case, the individual point spectra are transformed to scaleogram images for consistency with the input data for CNNs [5,60], see upper left inset in Fig. 3(a) and Appendix A. The new architecture is then formed by four channels with D ω0 (r) inputs at fixed energies ω 0 = (−35, −15, 1, 23) meV within the flat and remote bands, such that they resemble visually the corresponding ones in the experimental data of Ref. 41, and three channels for D r0 (ω) scaleogram inputs at stacking positions r 0 = {ABAB, BAAC, ABCA}, cf.Fig. 1(c).Each channel is passed through parallel Conv-Batch-MaxPool layers as in Fig. 2(a), but instead of flattening each channel separately, they are concatenated to a Dense-Dropout stage before the last layer [Fig.3(a)].
In Fig. 3(b-d), predictions on the test data set are represented for (b) α, and (c) the moiré and (d) graphene nematic intensities; as can be seen, very good agreement is found between the reconstructed and true parameters.The outliers in α are related to small Φ GN (brighter colors).From Eqs. ( 3) and (4), it is clear that for small Φ GN , minimal changes will be induced in the LDOS, irrespective of the true value of the phase governed by α.This is a similar behavior to what was observed for outliers in the nematic director prediction.If α is maintained fixed, we observed (not shown) that predictions for Φ GN and Φ MN get more accurate.The results of Fig. 3 demonstrate that the microscopic form of nematicity can be extracted from the LDOS if significant energy dependence is included in the input data set.

E. Including strain
As already alluded to above, another possible source of C 3 breaking is strain [46,[61][62][63], which is believed to be a ubiquitous property of graphene moiré superlattices at small twist angles.Breaking the same symmetries as nematic order, strain can obscure the experimental identification of nematic order and their precise interplay is still under debate [23][24][25]64].Experiments indicate [23-25, 41, 46] that the most relevant form of strain in graphene superlattices such as twisted bilayer graphene or TDBG is uniaxial heterostrain.In this case, the matrices E j describing the in-plane metric deformation of the coordinates in the jth rotated bernal bilayer of TDBG FIG.4: Predicted versus true values for the strain intensity (a) and angle θ (b).The prediction for the nematic intensities are depicted in panels (c) and (d).The CNN architecture used to produce these results is described in Fig. 3(a).Similarly to the prediction of the α parameter in the presence of only nematicity, outliers in θ are related to small .are of the form Here ν = 0.16 is the Poisson ratio for graphene and R(θ ) is the 2 × 2 matrix describing rotations of 2D vectors by angle θ .We see that uniaxial heterostrain is characterized by two variables, the strain intensity and the direction of strain, parameterized by the angle θ .
In the following, we allow for the simultaneous presence of uniaxial heterostrain and nematic order, leading to two additional parameters, and θ , in β.We will study whether our ML approach is still able to extract the microscopic form of nematicity and also learn the relative strength and direction of strain.Note that the form of nematicity is still given by Eqs.(2)(3)(4), with the only difference that we replace L M j in the definition of R m1,m2 by the strained moiré lattice vectors.The data set for this task is built with nematic intensities Φ MN , Φ GN ∈ [0.001, 0.1] eV, with the addition of strain parameters ∈ [0, 0.8] % and θ ∈ [0, π/3].Here, α l = 0, ψ l = 1 and ϕ = ϕ MN = ϕ GN = 2π/3.The domain for the strain intensities is chosen based on typical values observed in TBG [23], and for θ on the periodicity of the unstrained system as θ → θ + π/3 [63].The ML architecture employed in this section is the same as in the previous investigation [Fig.3(a)].
In Fig. 4(a-d), predictions on the test data set are shown for (a), θ (b), and the nematic intensities (c-d).At first sight, the result for the strain angle in Fig. 4(b) looks as if the procedure ceased to work, since there are many data points where the true and predicted value of θ differ significantly.However, when indicating the true strain intensity label for each prediction, it becomes clear that the outliers are related to small values of (brighter colors).As such, this behavior is not a shortcoming of the learning procedure but actually a feature of strain: for small enough in Eq. ( 5), the angle θ has no meaning.We have checked that removing the samples with small strain from the training and test data set will lead to accurate predictions of θ .The stability that we find for our learning procedure in the presence of virtually vanishing is, however, important when applying it to experimental data, where the strength of strain is unknown.
Most importantly, we see in Fig. 4(c-d) that the nematic couplings can still be accurately predicted when varying strain is present.The MAE is equally distributed in these cases, in contrast to the strain intensity prediction.This shows that not only nematic order can be identified when strain is present, but also its internal structure and the strength of strain that is present at the same time can be resolved when using different channels consisting of both D r0 (ω) and D ω0 (r) as inputs.This allows the networks to take into account correlations between different energies in the STM data, which in turn conveys the crucial microscopic physics, enabling the model to disambiguate between lattice and electronic effects.

F. Experimental data
After demonstrating the effectiveness of CNNs on learning microscopic parameters {β i } from a synthetic (theoretical) data set D th (β 1 , • • • , β N th ) with N th samples, we now proceed into applying the trained ML architecture for predictions of the a priori unknown sets of parameters {β i } in an experimental data set For concreteness, we use the same synthetic training data set as in Appendix B, where only the nematic and strain intensities are predicted, i.e., β = {Φ MN , Φ GN , }.The data set D exp is constituted of both scaleograms D r0 (ω) and D ω0 (r) maps for different fillings of the CFB (n s ).More details about the preprocessing of the experimental data D exp can be found in Appendix C.
In Fig. 5, predictions of the trained CNN for the set {β i } show non-zero values of nematicity (a) and strain (b) for all fillings of the CFB.For n s ≥ 0.47 (gray region), the experimental data shows the most pronounced signatures of broken rotational symmetry to the human eye, which was previously interpreted as electronic nematic order [41,45].Here the CNN predicts MN to dominate over GN, although both are finite (as expected by symmetry).As can be seen in Fig. 5(c predicted by the CNN nicely reproduce the key features in the experimental data, including the strong stripes in the VFB and the much weaker, albeit finite, signatures of nematicity in the other bands. For smaller fillings, n s < 0.47, the experimental data still exhibit distortions that break C 3 , see Appendix C, but no clear stripe-like features appear.The CNN tries to assign different anisotropy sources to these distorted regions, but the agreement between theoretical prediction and experiment is less accurate than for larger n s .It is clearly possible that, indeed, a crossover from primarily MN to GN occurs when lowering n s , as predicted by the neural network, see Fig. 5(a), in particular, since nematic order is also a plausible instability in non-twisted bilayer graphene [28,65].However, we believe that additional experimental data and refined theoretical models are required to conclude whether this is really the case.
In contrast to this interplay between the nematic couplings, strain remains relatively constant for all n s , and slightly decreases in Fig. 5(b) for n s ≥ 0.47 as it approaches the same order of magnitude of ∈ [0.003 − 0.1%] that is expected for the experimental samples in D exp [41].We note that at low fillings the value of strain that is predicted by the neural network is nevertheless significantly greater than the value extracted from experimental topography.This is likely a consequence of subtle differences between the continuum model calculations and the experimental spectroscopy, which the network attempts to accommodate by including finite strain.

III. DISCUSSION
We constructed and demonstrated a ML procedure that can extract the form of the nematic order parameter in TDBG from LDOS data.The key ingredient was the use of several channels that capture the correlations among different energies.Our work has several important implications.First, it shows that the presence and even the strength and internal structure of nematic order can be extracted when the sample exhibits significant heterostrain; this is a crucial aspect for moiré systems where the issue of distinguishing between nematicity and strain has been subject of debate.Second, our analysis also shows which type of STM data is needed and most useful to extract information about nematicity: as we have seen, the LDOS maps at a single energy, D ω0 (r), are not enough to deduce the form of the nematic order parameter and-contrary to what one might have expected-point spectra, i.e., D r0 (ω), contain a lot of helpful complementary information for that task (see also the second model discussed in Appendix D).We emphasize that this form of 'solid-state Hamiltonian learning', i.e., of parameterizing the leading terms of a set of microscopic order parameters (like nematic order) or perturbations (such as strain) and extracting their form using multi-channel CNNs can be more broadly appliedto other systems, see Appendix D where we discuss a toy model for twisted-bilayer graphene, and other forms of instabilities.As such, this could open up completely new ways of revealing the form and role of nematic order and other phases for the physics of quantum materials.[53], batch sizes (9, 16, 32, 48 and 96), different number of filters and convolution layers in the Conv-Batch-MaxPool channels, different learning rates (10 −1 , 10 −2 , 10 −3 and 10 −4 ) and optimizers (RMSprop, SGD and ADAM).The architecture described in Sec.II B and its variation in Fig. 3(a) already correspond to the optimal configuration.Even though these investigations are not an exhaustive treatment with respect to all possible parameters and correspondent combinations, it shows that certain elements play a major role in the CNN's performance, such as choosing ReLU as activation functions, setting padding to zero in the convolution layers and using a learning rate of 10 −4 .Finally, even though the four Conv-Batch-MaxPool channels in the main architecture may not be necessary for simpler cases (e.g.predicting the nematic director with fixed nematic intensities), it is essential for increasing parameters and complexity, such as learning strain and the internal structure of nematicity simultaneously.We have also investigated the influence of additional preprocessing of the D ω0 (r) channels of the data set D exp on the predictions.We introduce more contrast to the images [Fig.8(c)] and reduce noise by smoothing the pixel distribution with a multidimensional Gaussian filter [Fig.8(d)].In Fig. 8(e-f), the resulting predictions for the nematicities and strain using these augmented D exp are shown.Here, every dot represents an average over the predictions on 10 variations of D exp with Gaussian filter with standard deviations of the Gaussian kernel in σ GF = {0, 1, 2, 5, 10} with and without higher contrast.The overall behavior described in Sec.II F is unaffected by these modifications, but the predictions with the lowest strain intensity in the gray region of Fig. 5(a-b) were found for the raw data inputs [Fig.8(b)] -see Fig. 9.
We emphasize the importance of the multi-channel CNN architecture in Fig. 3(a) for the predictions in Fig. 5(a-b).
While using only D ω0 (r) at the flat bands already seems to capture the interplay between MN and GN as a function of fillings of the CFB, the addition of channels for the remote bands and scaleograms is crucial to discern the influence of strain and nematicity in the experimental samples.This can be intuitively understood by the relative stability of the remote bands with respect to heterostrain over a wide range of fillings n s in D exp [41].These results indicate that the inclusion of more channels from even more remote bands could potentially produce more accurate predictions for the strain intensity; we leave this for future work.To demonstrate that our ML approach of extracting microscopic parameters based on CNNs with multiple channels works more generally, we here apply it to a different moiré system.To further increase the variability of models studied in this work, we do not use a continuum model but, instead, consider a tight-binding model on the moiré scale that captures the symmetries and topological features of the twisted bilayer graphene (TBG).
As is well known [69][70][71], the representations of the flat bands of TBG at high-symmetry momenta requires taking a model on the honeycomb lattice.To be able to study the valleys separately, we take the valley quantum number to be conserved such that the associated (fragile) topological obstructions necessitates taking at least four bands [72].We, therefore, place two Wannier orbitals W ± (r) at every site of the honeycomb lattice.We choose them to be invariant under C 3 (three-fold rotation perpendicular to the graphene layers) and transform into one-another under C 2x (two-fold rotation along x) and ΘC 2 (the product of time-reversal and two-fold rotation perpendicular to the layers); this specifies the behavior of the Wannier orbitals under all symmetries of TBG that act within a given valley.Here, our goal is not to provide a quantitatively accurate description of the LDOS of TBG but rather to demonstrate our ML procedure.It is therefore sufficient to take the simple, phenomenological forms of the Wannier states given by W ± (r) ∝ exp ∓c 1 y y 2 − 3x 2 − c 2 x 2 + y 2 2 , r = (x, y) T , (D1) which obey the required symmetry constraints, as shown in Fig. 10(a).In Eq. (D1), c 1 and c 2 are real-valued constants that we set to {c 1 , c 2 } = {1.5, 0.7} for concreteness.Including symmetry-allowed intra-orbital (inter-orbital) hopping processes up to third-nearest (nearest) neighbor leads to a tight-binding model with momentum-space form (for valley η = + and a given spin flavor)

FIG. 1 :
FIG. 1: (a) Representation in real space of the TDBG heterostructure.Green highlighted domains emphasize the emerging moiré pattern due to the combination of two AB-stacks of graphene bilayers with a relative twist angle θ.(b) Band structure at θ = 1.05 • along highly symmetrical points from the moiré Brillouin zone (inset).Solid lines represent conduction and valence flat bands (CFB/VFB) as well as remote bands (R) with valley η = +.The chemical potential corresponds to roughly a half-filling fraction (ν = 0.475) of the CFB.(c) LDOS for three fixed energies [black dotted lines in (b)] as a function of position (top), and for varying energy at fixed high-symmetry positions (bottom) in the moiré unit cell (white rhombus).(d) Schematic real-space illustration of two limiting cases of graphene and moiré nematicity, along with two sample LDOS plots at a fixed energy in the VFB; both show clear C 3 breaking.

FIG. 2 :
FIG. 2: (a) Schematic figure of the CNN architecture used with only one D ω0 (r) input channel at an energy ω 0 in the VFB, see Sec.II B for details on the architecture and training dataset.(b) Comparison between true and predicted nematic director angles ϕ.Three samples of D ω0 (r) (star, pentagon and triangle) are displayed to emphasize that the relation between the LDOS and ϕ is highly non-trivial as a result of the presence of different forms of nematicity.

FIG. 3 :
FIG. 3: (a) CNN architecture used for learning the nematic microscopic parameters.Each 'Conv2D-MaxPool-Dense' channel refers to the structure from Fig. 2(a).(b) Predicted versus true α parameter, with outliers (brighter colors) being related to small graphene nematic intensity Φ GN .(c-d) Predicted versus true parameters for graphene and moiré intensities, with colorbars representing the mean absolute error (MAE) in the intensities.Two examples (star and pentagon) indicate that two very different forms of nematicity can lead to very similar LDOS patterns at a single energy, making the inclusion of several channels necessary.
FIG. 5: Predicted values from the trained CNN to nematic intensities (a) and strain strength (b) as a function of the filling of the CFB (n s ).The gray region (n s ≥ 0.47) indicates the fillings where the continuum model showed more resemblance to the experimental data obtained in Ref. 41.In panel (c) the experimental D r0 (ω) channels for n s = 0.67 are shown for comparison with the ones obtained from the continuum model with the parameters β exp = {Φ MN , Φ GN , } = {0.086eV, 0.024 eV, 0.05%} predicted by the trained CNN.

FIG. 6 :
FIG. 6: Local density of states for a fixed position D r0=ABCA (ω) as a function of energy ω (a) and its corresponding image W (t, h) after a continuous wavelet transformation (b).Intensities displayed are in arbitrary units.

FIG. 8 :
FIG. 8: (a) Location of ABAB, ABCA and BAAC sites in the moiré superlattice for images in D exp .The LDOS channels D r0 (ω) are built by taking an average over the intensities of equivalent sites for each filling n s .(b-d) Examples of different preprocessing methods of the D ω0 (r) channels: (b) raw data, (c) more contrast and (d) contrast with Gaussian filter.(e-f) Predictions for the same parameters as in Fig. 5(a-b) for different preprocessing procedures; the dots and lines indicate the average values and the shaded region the corresponding standard deviation, see text for more details.

FIG. 9 :
FIG. 9: Comparison between D ω0 (r) from the experimental data set D exp (left panel), and the corresponding configuration obtained posteriorly within the continuum model with the predicted β exp from Fig. 5(a-b).For D th a half-filling fraction of the CFB (ν = 0.475) corresponds to a chemical potential of µ ∼ −15 meV, and the equivalent energies for the RV 1 , VFB, CFB and RC 1 in the continuum model are found for ω 0 = {−35, −15, 1, 23} meV.These values were chosen for the best possible resemblance of the images D ω0 (r) in D th with the ones in D exp , a naturally constrained procedure by the representational power of the theoretical model to the experimental data.The gray box corresponds to the gray regions in Fig. 5(a-b).The experimental images are shown with higher contrast [Fig.8 (c)] for better visual comparison.

Appendix D :
Applicability of the CNN to different models and moiré systems FIG. 11: Predicted versus true values for the nematic paramters α 0 (a), α 1 (b), α 2 (c) and α 3 (d) defined in (D7) for the minimal model in Eq. (D2).As before, colorbars indicate the MAE for each respective case.