Introduction

The recent surge of interest in kagome materials, often discussed in the context of frustrated magnetism and spin liquid phases1,2,3,4,5,6,7, has been boosted by the discovery of the kagome metals AV3Sb5 (A = K, Rb, Cs) undergoing successive charge density wave (CDW) and superconducting transitions upon lowering temperature8,9,10,11. The presence of flat bands, Dirac points, and van Hove singularities in the electronic band structure of the ideal kagome lattice provides a playground for exotic topological properties and a variety of phases, ranging from superconductivity to charge, orbital momentum, and spin density waves12,13,14,15,16,17,18. Density functional theory calculations for AV3Sb5 have categorized the normal state of this family as a \({{\mathbb{Z}}}_{2}\) topological metal with multiple protected Dirac crossings10, and renormalization group analyses have proposed the occurrence of various complex CDW and charge bond order (CBO) phases16,17,19,20. Interestingly, reports of a giant extrinsic anomalous Hall effect suggest nontrivial band topology in the absence of long-range magnetic order21, possibly driven by a CDW order with orbital currents, and high-resolution STM (scanning tunneling microscopy) measurements point to an unconventional intrinsic chiral charge order22,23,24,25 consistent with a doubling of the unit cell (2 × 2 superlattice)26. These observations suggest the relevance of the ubiquitous chiral charge order present in the Haldane model27, and the possibility of higher-order topological insulators, an avenue that demands further exploration. Although a thorough understanding of the various electronic orders warrants detailed microscopic investigations16,17,19,28, we use the available plethora of experimental observations as our motivation to learn about the possible topological phases that can manifest within the electronic parameter space of the 2 × 2 kagome superlattice.

Another rapidly developing field of research is the application of machine learning to tackle physical problems29, from variational representation of wave functions30, to the detection of phase transitions31,32. Due to the absence of a local order parameter, topological phase transitions are generally more difficult to capture than symmetry-breaking phase transitions, although some progress has been achieved33,34,35. Additionally, an immediate physical interpretation of the results can turn out to be a complicated task in unbiased machine learning approaches. Yet, in a recent study36 we proposed a statistical learning of topological models on a honeycomb lattice and showed that machine-assisted unbiased learning can differentiate between the electronic parameters that are most significant for the manifestation of the well-known topological Haldane phase in the underlying lattice structure. Making use of a generalization of this method, in this work we extract topological information for the generic 2 × 2 kagome superlattice, as “learned” from the statistics of data-sets of randomized tight-binding parameters, constrained only by specific crystal symmetries. The variations of the tight-binding parameters can be interpreted as modified hoppings arising from changes in lattice parameters, atomic mass, effects of strain, pressure, spin-orbit coupling, etc. We find topologically trivial Star-of-David (SoD)-like CBO phases and non-trivial chiral flux phases. Our results are compatible with present theoretical predictions and experimental observations for the AV3Sb5 family. Additionally, we predict higher-order topological phases which might be realized in future experimental work through appropriate manipulation of the kagome lattice.

Results

Model

We consider a generic tight-binding Hamiltonian on the kagome lattice which can be taken, in a first approximation, as a minimal model to describe the low-energy electronic properties of the vanadium 3d bands in AV3Sb5 (refs. 8,37):

$$H=\mathop{\sum}\limits_{i}{\epsilon }_{i}{c}_{i}^{{\dagger} }{c}_{i}+\mathop{\sum}\limits_{\left\langle i,j\right\rangle }{t}_{i,j}{c}_{i}^{{\dagger} }{c}_{j}$$
(1)

Here 〈i, j〉 runs over nearest-neighbor sites and ci (\({c}_{i}^{{\dagger} }\)) annihilates (creates) an electron at site i. The ϵi denote onsite potentials, while ti,j are hopping integrals between sites i and j. In the simple case of uniform hopping, i.e., ti,j = −1, and zero onsite potentials, ϵi = 0, the band structure of the system, shown in Fig. 1a, is characterized by a flat band at high energy and two lower-lying dispersive bands that touch each other at a Dirac point at the corners of the Brillouin zone (BZ; K points) and exhibit van Hove singularities at the M points.
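These band features can be checked numerically. The following is a minimal NumPy sketch, assuming a lattice constant of 1 and sublattice positions 0, a1/2 and a2/2 (illustrative conventions, not necessarily those used for Fig. 1); the names `kagome_bloch`, `bands` and `fermi_energy` are hypothetical. It builds the 3 × 3 Bloch Hamiltonian of Eq. (1) for ti,j = −1 and ϵi = 0, evaluates it at Γ, K and M, and locates the Fermi energy at 5/12 filling on a k grid.

```python
import numpy as np

# Assumed conventions (illustration only): lattice constant 1,
# sublattices A, B, C at 0, a1/2 and a2/2.
A1 = np.array([1.0, 0.0])
A2 = np.array([0.5, np.sqrt(3.0) / 2.0])
B1 = 2 * np.pi * np.array([1.0, -1.0 / np.sqrt(3.0)])  # a_i . b_j = 2*pi*delta_ij
B2 = 2 * np.pi * np.array([0.0, 2.0 / np.sqrt(3.0)])

GAMMA = np.zeros(2)
K = (2 * B1 + B2) / 3   # BZ corner, equal to (4*pi/3, 0)
M = B2 / 2              # midpoint of a BZ edge

def kagome_bloch(k, t=-1.0):
    """3x3 Bloch Hamiltonian of Eq. (1) with uniform hopping t and eps_i = 0."""
    # Each pair of sublattices is linked by two bonds, along +d and -d.
    h_ab = 2 * t * np.cos(k @ (A1 / 2))
    h_ac = 2 * t * np.cos(k @ (A2 / 2))
    h_bc = 2 * t * np.cos(k @ ((A2 - A1) / 2))
    return np.array([[0.0, h_ab, h_ac],
                     [h_ab, 0.0, h_bc],
                     [h_ac, h_bc, 0.0]])

def bands(k, t=-1.0):
    """Band energies at momentum k, in ascending order."""
    return np.linalg.eigvalsh(kagome_bloch(k, t))

def fermi_energy(filling, n=60, t=-1.0):
    """Highest occupied energy on an n x n grid of the BZ at the given filling."""
    energies = np.sort(np.concatenate(
        [bands(i / n * B1 + j / n * B2, t) for i in range(n) for j in range(n)]))
    n_occ = int(round(filling * energies.size))
    return energies[n_occ - 1]

print(bands(GAMMA))  # flat band touches the upper dispersive band at Gamma
print(bands(K))      # Dirac point at K
print(bands(M))      # van Hove energies at M
print(fermi_energy(5 / 12))
```

Up to rounding, the printed energies are (−4, 2, 2) at Γ, (−1, −1, 2) at K and (−2, 0, 2) at M, so the upper van Hove singularity sits at E = 0. This is also the Fermi energy at 5/12 filling: the lowest band is full (1/3), and the middle band lies below E = 0 on a quarter of the BZ around the K points (1/4 × 1/3 = 1/12).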

Fig. 1: Band structure.
figure 1

Metallic band structure along the high-symmetry path a. Γ-M-K-Γ for the kagome lattice, and b. Γ-M\({}^{\prime}\)-K\({}^{\prime}\)-Γ for the 2 × 2 kagome superlattice, for the tight-binding model with uniform nearest-neighbor hopping (ti,j = −1) and ϵi = 0. The dashed horizontal line indicates the Fermi energy at 5/12 filling. The inset of a shows the hexagonal Brillouin zone of the kagome lattice (in black) together with the location of the high-symmetry points and the chosen path (in red).

Several works on AV3Sb5 (refs. 16,17,19) have suggested that CDW instabilities at the van Hove fillings may break the translational symmetry of the perfect kagome lattice, leading to a lower periodicity described by a 2 × 2 supercell, analogous to that observed in STM experiments22,23,24,25. For this reason, we focus on a filling of 5/12, with the Fermi energy lying at the higher van Hove singularity (see Fig. 1a). We assume our Hamiltonian to be periodic over the 2 × 2 enlarged unit cell illustrated in Fig. 2a, which contains 12 sites (corresponding to the onsite terms ϵi=1,…,12) arranged in a SoD pattern and retains all the symmetries of the point group of the kagome lattice (D6). As a consequence of the superlattice periodicity, we are left with 24 independent nearest-neighbor hoppings, which we label as ts=1,…,24, as depicted by the blue links in Fig. 2a. The hopping parameters can be categorized into three distinct classes: the hoppings in the inner hexagon (ts≤6), the hoppings on the spikes of the SoD (t7≤s≤18), and the hoppings connecting sites belonging to different unit cells (t19≤s≤24). The hoppings within each class can be mapped into each other by point group symmetry operations. In the uniform case (ts = −1 for all s) with zero onsite potential, one obtains 12 bands, shown in Fig. 1b for the BZ corresponding to the 2 × 2 superlattice. These bands can be unfolded to the 3 bands corresponding to the elementary 1 × 1 unit cell. By tuning the different hopping parameters, it is possible to open a topologically non-trivial gap at 5/12 filling. We remark that the tight-binding model of Eq. (1) is not strictly bound to the description of the vanadium 3d bands of AV3Sb5 compounds, but retains a certain level of generality and could be applicable to other kagome systems at the van Hove filling. To investigate the possible topological phases of the 2 × 2 kagome superlattice, we employ the statistical approach described in the Methods section.
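The site and bond counting of the 2 × 2 supercell, and the statement that its 12 bands unfold to the 3 kagome bands, can be verified with a short sketch. It assumes a lattice constant of 1 and sublattices at 0, a1/2 and a2/2; the site ordering differs from the counter-clockwise labeling of Fig. 2a, and the names `supercell` and `bloch` are hypothetical. Bonds are found by searching for neighbors at distance a/2 among the supercell sites and their periodic images.

```python
import itertools
import numpy as np

A1 = np.array([1.0, 0.0])                 # kagome Bravais vectors (assumed lattice constant 1)
A2 = np.array([0.5, np.sqrt(3.0) / 2.0])
TAU = (np.zeros(2), A1 / 2, A2 / 2)       # three sublattice offsets

def supercell(L):
    """Sites and directed nearest-neighbor links of an L x L kagome supercell."""
    sites = [m * A1 + n * A2 + tau for m in range(L) for n in range(L) for tau in TAU]
    links = []  # (i, j, d): term c_i^dagger c_j with bond vector d = r_j + R - r_i
    for (i, ri), (j, rj) in itertools.product(enumerate(sites), repeat=2):
        for p, q in itertools.product((-1, 0, 1), repeat=2):
            d = rj + L * (p * A1 + q * A2) - ri
            if abs(np.linalg.norm(d) - 0.5) < 1e-9:   # nearest-neighbor distance a/2
                links.append((i, j, d))
    return sites, links

def bloch(k, L, t=-1.0):
    """Bloch Hamiltonian of Eq. (1) on the L x L supercell with uniform hopping t."""
    sites, links = supercell(L)
    h = np.zeros((len(sites), len(sites)), dtype=complex)
    for i, j, d in links:
        h[i, j] += t * np.exp(1j * (k @ d))
    return h

sites, links = supercell(2)
print(len(sites), len(links) // 2)   # 12 sites, 24 independent bonds

# Band folding: the 12 supercell bands at k are the 3 kagome bands at k + G/2.
B1 = 2 * np.pi * np.array([1.0, -1.0 / np.sqrt(3.0)])
B2 = 2 * np.pi * np.array([0.0, 2.0 / np.sqrt(3.0)])
k = np.array([0.37, -0.81])
folded = np.sort(np.concatenate([np.linalg.eigvalsh(bloch(k + g, 1))
                                 for g in (0 * B1, B1 / 2, B2 / 2, (B1 + B2) / 2)]))
print(np.allclose(np.linalg.eigvalsh(bloch(k, 2)), folded))
```

The folding check compares the supercell spectrum at an arbitrary momentum k with the kagome spectrum at the four momenta that map onto k in the reduced BZ; it should print True for any k. At 5/12 filling, 5 of the 12 supercell bands are occupied.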

Fig. 2: 2 × 2 kagome supercell.
figure 2

a Symmetric 2 × 2 kagome supercell. The unit cell, delimited by black dashed lines, contains 12 distinct sites. The 24 independent nearest-neighbor links are colored in blue. Sites and links are labeled counter-clockwise. b Schematic representation of the hoppings and onsite potentials in the 2 × 2 unit cell with C6 symmetry. Spheres of different colors represent different onsite terms. Symmetries dictate equality of hoppings, as indicated by the colors of the bonds. The direction of the arrows signifies hopping from site j to site i (i.e., \({c}_{i}^{{\dagger} }{c}_{j}\)); it has been chosen arbitrarily but adheres to the constraints imposed by the C6 symmetry.

Statistical analysis

A completely unbiased analysis of the full nearest-neighbor 2 × 2 kagome superlattice involves sampling of 11 onsite and 24 hopping parameters (one onsite term is kept fixed to set the global energy scale). To improve tractability, we scale down the number of independent features by enforcing specific symmetry operations on the feature space, for instance, the point group C6, which is a subgroup of the kagome point group D6, lacking reflection symmetries. This specific choice is necessary in order to construct non-trivial tight-binding models, since the Chern number—which we set as our topological index—is odd under the effect of reflections, and thus a reflection-invariant Hamiltonian could only have C = 0.
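The statement that reflection symmetry forces C = 0 follows from a short, standard argument, sketched here under the assumption that the mirror acts as a unitary symmetry U with \(U H(\mathbf{k}) U^{\dagger} = H(M\mathbf{k})\), where M is the corresponding orthogonal reflection of momentum space (det M = −1). The Berry curvature Ω of the occupied bands is the xy component of an antisymmetric tensor and therefore satisfies Ω(Mk) = −Ω(k), so that a change of variables over the BZ gives

$$C=\frac{1}{2\pi}\int_{\mathrm{BZ}}\Omega(\mathbf{k})\,{\mathrm{d}}^{2}k=\frac{1}{2\pi}\int_{\mathrm{BZ}}\Omega(M\mathbf{k})\,{\mathrm{d}}^{2}k=-\frac{1}{2\pi}\int_{\mathrm{BZ}}\Omega(\mathbf{k})\,{\mathrm{d}}^{2}k=-C\quad\Rightarrow\quad C=0,$$

where the second equality uses |det M| = 1 and the fact that M maps the BZ onto itself.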

Under C6 symmetry, the onsite terms and hoppings are reduced to a set of six unique features, as illustrated in Fig. 2b. These are the real onsite terms ϵi≤6 = ϵ and \({\epsilon }_{i\ge 7}={\epsilon }^{\prime}\), and the complex hoppings ts≤6 = t, \({t}_{s}={t}^{\prime}\) for s ∈ {7, 9, 11, 13, 15, 17}, ts = \(t^{\prime \prime}\) for s ∈ {8, 10, 12, 14, 16, 18}, and ts≥19 = \(t^{\prime \prime \prime}\). We choose \({{{\bf{{x}}}^{{{\mbox{ref}}}}}}=(\epsilon ,{\epsilon }^{\prime},t,{t}^{\prime},{t}^{\prime\prime},{t}^{\prime\prime\prime })=(0,1,-1,-1,-1,-1)\) as the reference point. Since the sampling radius is proportional to the modulus of the reference value (see Methods), it vanishes for ϵ, so that ϵ = 0 for all samples. We generate a data set of ns = \(2\times 10^{6}\) samples, 67% (33%) of which are insulators (metals). (The classification of the samples into metals and insulators is performed numerically by computing the energy bands on a grid of 82 × 82 k-points and checking whether the indirect gap at 5/12 filling is smaller or larger than an energy threshold, chosen here to be 0.01\(|t^{\mathrm{ref}}|\).) As shown by the pie chart in Fig. 3, 69.6% of the insulating samples are topologically non-trivial. The largest topological fraction (60.4% of all insulating samples) has C = ±1, while the second largest fraction (9%) has C = ±2. In the following, we discuss the properties of these topological insulators in detail.
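The reduction to six features and the metal/insulator criterion can be summarized in a short sketch. It assumes NumPy, uses the site labels i = 1 to 12 and hopping labels s = 1 to 24 of Fig. 2 (stored 0-based), and takes as input an array of band energies already computed on a k grid (e.g., 82 × 82 points); the function names are hypothetical, and the threshold 0.01 is expressed in units of \(|t^{\mathrm{ref}}|\) = 1.

```python
import numpy as np

def expand_c6_features(x):
    """Map x = (eps, eps', t, t', t'', t''') to 12 onsite terms and 24 hoppings.

    Index i-1 stores eps_i and index s-1 stores t_s (labels as in Fig. 2).
    """
    eps, eps_p, t, t1, t2, t3 = x
    onsite = np.array([eps] * 6 + [eps_p] * 6, dtype=float)
    hoppings = np.empty(24, dtype=complex)
    hoppings[0:6] = t        # s = 1..6: inner hexagon
    hoppings[6:18:2] = t1    # s = 7, 9, ..., 17: t'
    hoppings[7:18:2] = t2    # s = 8, 10, ..., 18: t''
    hoppings[18:24] = t3     # s = 19..24: t''' between unit cells
    return onsite, hoppings

def indirect_gap(bands, n_occ):
    """bands: (n_k, n_bands) array of ascending energies on a k grid."""
    return bands[:, n_occ].min() - bands[:, n_occ - 1].max()

def is_insulator(bands, filling=5 / 12, threshold=0.01):
    """Insulator if the indirect gap at `filling` exceeds `threshold` (units of |t_ref|)."""
    n_occ = int(round(filling * bands.shape[1]))  # 5 of 12 bands at 5/12 filling
    return indirect_gap(bands, n_occ) > threshold
```

Using the indirect gap (lowest unoccupied energy anywhere in the BZ minus highest occupied energy anywhere) rather than the direct gap at each k ensures that band overlaps at different momenta are classified as metallic.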

Fig. 3: Chern numbers and marginal probability distributions.
figure 3

Distribution of Chern numbers (pie chart) and a–g marginal PDFs for the most descriptive features of insulators within the 2 × 2 kagome superlattice with C6 symmetry. The pie chart shows the percentages of trivial (C = 0) and topological (C ≠ 0) samples obtained out of all insulating samples. For the PDF plots corresponding to different feature components, grey lines denote trivial phases (C = 0) and colored lines denote non-trivial (C ≠ 0) phases as indicated by the legend on top. Dashed vertical lines indicate the reference point.

In the analysis of the hopping parameters, we focus on the marginal probability distribution functions (PDFs) (see Eq. (2) of Methods) of the onsite energy \({p}_{C}(\,{{\mbox{Re}}}\,[{\epsilon }^{\prime}])\) (Fig. 3a), of the imaginary parts of the hoppings pC(Im[ts]) [Fig. 3(b–d)], which determine the hopping direction, and of their moduli pC(|ts|) [Fig. 3(e–g)], which describe the overall hopping strength. These features provide most of the information about the topological character of the samples. Due to the inherent symmetry of the kagome lattice, the PDFs for \({t}^{\prime}\) and \({t}^{{\prime}{\prime}}\) show the same behavior. Hence, only the distribution for \({t}^{\prime}\) is shown. All PDFs are provided in Supplementary Figure 1. First, we analyze the PDF of the onsite term \(\epsilon ^{\prime} ={\epsilon }_{i\ge 7}\) (Fig. 3a), which distinguishes between trivial (grey line) and topological phases (colored lines). For the trivial phase, \(\epsilon ^{\prime}\) tends to be larger than zero, i.e., the outer ring of the spikes tends to be “heavier” than the inner hexagon. By contrast, in the topological phase \(\epsilon ^{\prime}\) tends to take smaller values. This behavior is well known from the Haldane model38 and reflects the fact that a large \(| {\epsilon }^{\prime}|\) eventually turns the system into a trivial insulator.

Trivial phases (C = 0)

In the trivial phase (C = 0), we observe that the PDFs p0(Im[t]), \({p}_{0}(\,{{\mbox{Im}}}\,[{t}^{\prime}])\) and p0(Im[\(t^{\prime \prime \prime}\)]) [Fig. 3b–d] show a maximum at zero and are perfectly symmetric around it. Hence, no particular hopping direction is preferred. The moduli |t| and \(| {t}^{\prime}|\) [Fig. 3e–f] tend to be slightly larger than 1 (the reference value), and their PDFs do not show any significant structure. On the other hand, p0(\(| t^{\prime \prime \prime}|\)) (Fig. 3g) possesses two local maxima of similar magnitude. By restricting the data set to the samples with \(| t^{\prime\prime\prime}|\) < 1.25 and \(| t^{\prime\prime\prime}|\) > 1.25 [corresponding to the approximate midpoint between the two local maxima of p0(\(| t^{\prime\prime\prime}|\)), see Supplementary Figure 2], we identify two distinct dominant configurations with C = 0, which are illustrated schematically in Fig. 4 (top row). One of them shows strong |t| and \(| t^{\prime\prime\prime}|\) and weaker \(| {t}^{\prime}|\), consistent with an inverse Star of David (iSoD)-like CBO pattern (Fig. 4, top row, left panel). This configuration accounts for 48% of the trivial samples. The remaining 52% of the trivial samples show the opposite pattern, similar to the SoD-like CBO pattern, with larger \(| {t}^{\prime}|\) and smaller |t| and \(| t^{\prime\prime\prime}|\) (Fig. 4, top row, right panel). Such CBO patterns have also been predicted by phenomenological analyses of possible electronic instabilities at the van Hove filling13,16,19,39, and STM experiments have hinted at the presence of chiral charge order patterns22 in KV3Sb5 with an iSoD-like CBO, as observed here for the trivial phase. The C = 0 phase of our analysis, however, is not chiral, since the real hoppings of the tight-binding model fulfill all mirror symmetries of the kagome superlattice.
On the other hand, the topological phases discussed in the remainder of the paper possess a chiral character due to complex hoppings, which induce non-trivial fluxes with a specific handedness (see Fig. 4).

Fig. 4: Insulating phases of the kagome superlattice.
figure 4

Overview of the characteristics of possible insulating phases of the 2 × 2 kagome superlattice with C6 symmetry, as extracted from an unbiased data set. The hopping parameter t is colored in blue, \({t}^{\prime}\) and \(t^{\prime\prime}\) in red, and \(t^{\prime\prime\prime}\) in green. The \(t^{\prime\prime\prime}\) hoppings connect different unit cells and are arranged in a triangular manner. Arrows illustrate the hopping flux direction, and the line thickness indicates the relative magnitude of |t|, \(| {t}^{\prime}| =| {t}^{\prime\prime}|\) and \(| {t}^{\prime\prime\prime}|\). For C = 1 and C = 2, the hopping flux direction is mirrored with respect to C = −1 and C = −2, respectively. The handedness of the complex hopping patterns testifies to the chiral nature of the topological phases.

Topological phases (C = ±1)

In contrast to the trivial phase, the hoppings in the topological phases display a preference for certain winding directions, as can be seen from the distributions pC≠0(Im[t]), \({p}_{C\ne 0}(\,{{\mbox{Im}}}\,[{t}^{\prime}])\) and pC≠0(Im[\(t^{\prime\prime\prime}\)]) in Fig. 3b–d. Phases with positive Chern number can be distinguished from the corresponding phases with negative Chern number by the sign of the imaginary parts of the hoppings, since their respective PDFs are mirror images of each other. The PDFs of the moduli [Fig. 3e–g], instead, are equal for phases with positive and negative Chern number, and hence, do not distinguish between them. By restricting the data set to specific feature values, we analyze the most likely configurations for the respective Chern numbers.

We evaluate the importance score DB(pC(xi), p0(xi)) (see Eq. (3) of the Methods Section) to identify the most descriptive features xi that distinguish the non-trivial phases from the trivial one. The results are shown in Fig. 5. The importance score of ϵ is trivially zero, since ϵ is always kept at a constant value of ϵ = 0. Due to the large overlap of \({p}_{0}({\epsilon }^{\prime})\) and \({p}_{C\ne 0}({\epsilon }^{\prime})\), the importance of \({\epsilon }^{\prime}\) is rather low. However, as described earlier, the PDFs show clear peaks revealing \({\epsilon }^{\prime}\) as a distinguishing parameter between topological and trivial phases. Next, we infer from Fig. 5 that \({t}^{\prime}\) and \(t^{\prime\prime}\) have the same importance, since their PDFs show the same behavior. t and \(t^{\prime\prime\prime}\) have higher importance than \({t}^{\prime}\) (and \(t^{\prime\prime}\)) for differentiating C = ±1 phases from the C = 0 phase, while \({t}^{\prime}\) (and \(t^{\prime\prime}\)) and \(t^{\prime\prime\prime}\) are more important than t for differentiating C = ±2 phases from the C = 0 phase. These differences in importance between the Chern classes are also reflected qualitatively in the PDFs of these features, as discussed in the following.

Fig. 5: Importance score.
figure 5

Importance score defined by the Bhattacharyya distance DB(pC(xi), p0(xi)) (Eq. (3)) for phases with C = ±1 (left) and C = ±2 (right) of the C6 symmetric model for the 2 × 2 kagome superlattice. The distinct features xi are ϵ, \({\epsilon }^{\prime}\), t, \({t}^{\prime}\), \(t^{\prime\prime}\) and \(t^{\prime\prime\prime}\) as defined in the main text.

For C = ±1 phases, we find that the moduli |t|, \(| {t}^{\prime}|\) and \(| t^{\prime\prime\prime}|\) behave similarly [Fig. 3e–g], and we infer that the relative bond strengths may not be a strong distinguishing feature for the topological phases. This is depicted in Fig. 4 (middle row) by the equal thickness of the blue, red and green bonds for C = ±1 phases. On the other hand, we gain crucial insight from the imaginary parts of the hoppings in this phase [Fig. 3(b–d)]. For C = 1, both Im[t] and Im[\(t^{\prime\prime\prime}\)] tend to be larger than zero, which corresponds to a counter-clockwise winding of the hoppings of the inner hexagon and a clockwise winding of the hoppings forming the outer triangles (connecting different 2 × 2 cells), as schematically illustrated by arrows in Fig. 4 (middle row). The signs of \(\,{{\mbox{Im}}}\,[{t}^{\prime}]\) and Im[\(t^{\prime\prime}\)], instead, do not discriminate between C = 1 and C = −1, due to the missing contrast between the corresponding probability distributions. Hence, the orientations of the \(t^{\prime}\) and \(t^{\prime\prime}\) bonds are not shown in Fig. 4 for C = ±1. We note that a large fraction of C = 1 topological insulators (49%) shows this configuration, while the remaining samples are distributed incoherently.

Our characterization of the C = ±1 phase shares similarities with the “chiral flux phase” (CFP) proposed in ref. 40 as a minimal model for the time-reversal symmetry breaking observed in muon spin relaxation experiments in KV3Sb5 (ref. 41) and CsV3Sb5 (ref. 42), and for the giant anomalous Hall effect measured in KV3Sb5 (ref. 21). The CFP, which represents a possible electronic instability of the kagome metal at the van Hove filling16, is described by a C6-symmetric tight-binding model that breaks time-reversal symmetry but is invariant under the combined action of time reversal and lattice reflections40, analogous to the Haldane model on the honeycomb lattice27. As opposed to the CFP of ref. 40, our results for the C = ±1 phase suggest that the imaginary parts of the \({t}^{\prime}\) and \(t^{\prime\prime}\) hoppings may not play a relevant role in the topological character of this phase.

Topological phases (C = ±2)

In the C = ±2 phases, the moduli of all features behave similarly to the C = ±1 phases. On the other hand, the sign of Im[t] does not discriminate between C = 2 and C = −2 due to low contrast between the PDFs p2(Im[t]) and p−2(Im[t]) [Fig. 3 (b)]. Phases with positive and negative Chern number are differentiated by the signs of \(\,{{\mbox{Im}}}\,[{t}^{\prime}]\), Im[\(t^{\prime\prime}\)] and Im[\(t^{\prime\prime\prime}\)], which leads to their relatively higher importance score. For C = 2, the hoppings along the outer spikes of the SoD (\({t}^{\prime}\) and \(t^{\prime\prime}\)) show clockwise winding, while the hoppings in the outer triangles (\(t^{\prime\prime\prime}\)) show counter-clockwise winding, as illustrated in Fig. 4 (bottom row). The largest coherent group of samples of topological insulators with Chern number C = 2 (56%) shows this particular configuration.

Further analysis

Motivated by recent experimental results detecting signatures of rotational symmetry breaking in the electronic properties of some AV3Sb5 materials18,43, we investigate the fate of the topological phases of Fig. 4 when the symmetry of the tight-binding model is reduced from C6 to C2. We repeat our statistical analysis by forcing the Hamiltonian on the 2 × 2 superlattice to be invariant only under rotations by 180°, thus increasing the feature space of the model to 18 distinct parameters, i.e., 6 onsite potentials and 12 hoppings. We start from a reference point (as explained in the Methods Section) with uniform hoppings (t = −1) and zero onsite potentials. Among our samples, only 13% represent topological insulators (vs. 46% in the analysis with C6 symmetry), most of which (98.6%) have Chern number C = ±1. While a smaller fraction of topological insulators can be expected as a consequence of the enlargement of the feature space, the strong reduction of C = ±2 samples (1.4% of all topological insulators) may suggest that this phase is rather fragile to rotational symmetry breaking. In contrast, the C = ±1 insulating phase is considerably less affected and thus seemingly more stable.

It is worth emphasizing that, although our analysis has been performed on a model of spinless electrons which explicitly breaks time-reversal symmetry (due to the presence of complex hoppings), our results can provide direct information about what one should expect for a time-reversal invariant Hamiltonian of spinful electrons. Indeed, in analogy to the generalization of the Haldane model to the Kane-Mele model44, we took two copies of our topological tight-binding Hamiltonians of Fig. 4 to construct a time-reversal invariant model for spinful electrons. The samples with odd Chern number in the spinless case, i.e., those belonging to the C = ±1 phase, yield a non-trivial \({{\mathbb{Z}}}_{2}\) invariant in the case of spinful electrons, which is characteristic of quantum spin Hall phases44.
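For decoupled spin sectors, this construction can be written explicitly. A minimal sketch, assuming (as in the Kane-Mele construction44) that the spin projection Sz is conserved and that the spin-down block is the time-reversal partner of the spin-up block under Θ = iσyK:

$$H_{\mathrm{spinful}}(\mathbf{k})=\begin{pmatrix}H(\mathbf{k}) & 0\\ 0 & H^{*}(-\mathbf{k})\end{pmatrix},\qquad C_{\uparrow}=C,\quad C_{\downarrow}=-C,\qquad \nu=C \bmod 2.$$

The total Chern number C↑ + C↓ vanishes, as required by time-reversal symmetry, while the \({\mathbb{Z}}_{2}\) invariant ν equals 1 exactly when the spinless Chern number C is odd.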

Discussion

By employing machine-assisted unbiased statistical learning constrained only by specific crystal symmetries, we extract meaningful topological information concerning the 2 × 2 kagome superlattice. The highlights of our procedure are three-fold: first, one is able to tune through a large parameter space to find non-trivial topology in the kagome superlattice, second, specific crystal symmetries can constrain these parameters resulting in certain flux patterns concomitant with CBO/CDW orders, and third, one retains high levels of physical interpretability of the results. For the kagome superlattice with C6 symmetry, we infer possible SoD/iSoD-like CBO patterns and topologically non-trivial flux patterns from the large data sets of randomized hopping parameters. Our findings for the trivial and topological phases share similarities with recent experimental observations and theoretical predictions for the intensely discussed AV3Sb5 kagome materials. Moreover, we infer that additional topological phases with higher Chern index (C = ±2) might exist. Furthermore, by reducing the crystal symmetry to C2, whose signatures were found in AV3Sb5 in recent experiments, we examined the stability of topological phases. While C = ±1 appears to be stable, the discovered C = ±2 phases seem to be rather fragile. We also extended our analysis to spinful Hamiltonians, which show quantum spin Hall states. Our results provide a repository of knowledge that can guide future engineering endeavors to build kagome materials (or modify existing ones) with a desirable topological phase. In this regard, a foreseeable extension of the present work consists of pursuing a material-specific analysis, searching for topological phases in the feature space of a tight-binding model obtained by ab initio calculations for a specific target material. 
In the case of AV3Sb5 compounds, this may involve a multi-orbital description, featuring additional vanadium d-orbitals and antimony p-orbitals37, and the inclusion of spin-orbit coupling effects. Furthermore, the investigation of a layered superlattice geometry, such as the 2 × 2 × 4 structure recently observed in Raman spectroscopy45 and x-ray diffraction46 experiments in CsV3Sb5, represents a viable future direction. In both cases, the addition of physically motivated ingredients in the tight-binding Hamiltonian could lead to an improved understanding of the actual physical origin of the chiral topological phases.

Methods

Definitions

Our statistical approach36 uses random number generators to yield a data set of ns different tight-binding Hamiltonians (samples) for a given lattice. Each sample is characterized by a vector of features, \({{{\bf{x}}}}=({x}_{1},\ldots ,{x}_{{n}_{{{\mbox{f}}}}})\), grouping the nf distinct parameters of the model, and is classified by the label, which is a function of the features, i.e., l = f(x). For the current system, the onsite terms ϵi and the (complex) hopping parameters ti act as features. Samples are categorized into metals and insulators based on the presence of a finite band gap at the filling to be considered. After omitting metallic samples, the first Chern number C47,48,49 is then chosen as the label for insulating samples. Hence, a feature vector for a sample is given by x = (ϵ1, ϵ2, … , t1, t2, … ) with label \(l=C[H({{{\bf{x}}}})]\in {\mathbb{Z}}\).
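The text does not specify how C is evaluated numerically. One common choice, assumed here and not necessarily the one used in this work, is the lattice method of Fukui, Hatsugai and Suzuki, which yields an integer by construction on a discrete k grid. The sketch below tests it on the two-band Qi-Wu-Zhang model, a stand-in unrelated to the kagome model; `chern_number` and `qwz` are hypothetical names. The method requires the Bloch Hamiltonian to be periodic in the reduced momenta, so a Bloch Hamiltonian written in a gauge that includes intra-cell positions must first be converted to the periodic gauge.

```python
import numpy as np

def chern_number(h_of_k, n_occ, n=30):
    """Lattice Chern number (Fukui-Hatsugai-Suzuki) of the n_occ lowest bands.

    h_of_k(k1, k2) returns the Bloch Hamiltonian at reduced momenta (k1, k2)
    in [0, 1); it must be periodic, h_of_k(k1 + 1, k2) == h_of_k(k1, k2).
    """
    ks = np.arange(n) / n
    # Occupied eigenvectors on the periodic n x n grid: shape (n, n, dim, n_occ)
    vecs = np.array([[np.linalg.eigh(h_of_k(k1, k2))[1][:, :n_occ] for k2 in ks]
                     for k1 in ks])

    def link(a, b):
        """U(1) link variable between two sets of occupied states."""
        overlap = np.linalg.det(a.conj().T @ b)
        return overlap / abs(overlap)

    flux = 0.0
    for i in range(n):
        for j in range(n):
            ip, jp = (i + 1) % n, (j + 1) % n
            plaquette = (link(vecs[i, j], vecs[ip, j]) * link(vecs[ip, j], vecs[ip, jp])
                         * link(vecs[ip, jp], vecs[i, jp]) * link(vecs[i, jp], vecs[i, j]))
            flux += np.angle(plaquette)  # Berry flux through one plaquette
    return int(round(flux / (2 * np.pi)))

def qwz(m):
    """Two-band Qi-Wu-Zhang model, used here only as a test case."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    def h(k1, k2):
        kx, ky = 2 * np.pi * k1, 2 * np.pi * k2
        return np.sin(kx) * sx + np.sin(ky) * sy + (m + np.cos(kx) + np.cos(ky)) * sz
    return h

print([chern_number(qwz(m), n_occ=1) for m in (-1.0, 1.0, 3.0)])
```

For this test model, |C| = 1 with opposite signs for m = −1 and m = 1, and C = 0 for m = 3. The eigenvector phases returned by `eigh` are arbitrary, but they cancel in each plaquette product, which is what makes the method gauge invariant.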

Data generation

Each sample in the data set is generated by randomly picking a value for each feature from a uniform probability distribution function (PDF). Specifically, for a given complex feature xi, where i indexes different features, we sample the uniform PDF restricted to a disk in the complex plane centered at a given reference point \({x}_{i}^{\,{{\mbox{ref}}}\,}\), with radius \(\alpha | {x}_{i}^{\,{{\mbox{ref}}}\,}|\), where \(\alpha \in {\mathbb{R}}\). Throughout this work, we choose α = 1.5. This choice of sampling space ensures physically reasonable configurations, since extreme hopping values are excluded. To gain maximum insight into the data, we can decompose the complex features xi into real features, namely the real part Re[xi], the imaginary part Im[xi], the modulus |xi|, and the phase \(\varphi [{x}_{i}]=\arg [{x}_{i}]\).
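A minimal sketch of this sampling step, assuming NumPy's `Generator` API; `sample_features` is a hypothetical name. The disk is sampled uniformly in area by drawing the radius as \(R\sqrt{u}\) with u uniform in [0, 1). How real features such as the onsite terms are sampled is not spelled out above; here we assume, hypothetically, a uniform interval of the same radius around the reference value. With the reference point (0, 1, −1, −1, −1, −1) of the main text, the vanishing radius for ϵ reproduces ϵ = 0 for all samples.

```python
import numpy as np

ALPHA = 1.5  # sampling-radius factor used in the text

def sample_features(x_ref, n_samples, is_complex, rng):
    """Uniformly sample each feature around its reference value.

    Complex features: uniform on the disk |x - x_ref| <= ALPHA * |x_ref|.
    Real features (assumption): uniform on the interval of the same radius.
    """
    x_ref = np.asarray(x_ref, dtype=complex)
    radius = ALPHA * np.abs(x_ref)
    shape = (n_samples, x_ref.size)
    # sqrt makes the samples uniform in area rather than in radius
    r = radius * np.sqrt(rng.random(shape))
    phi = 2 * np.pi * rng.random(shape)
    disk = x_ref + r * np.exp(1j * phi)
    line = x_ref.real + radius * rng.uniform(-1.0, 1.0, shape)
    return np.where(is_complex, disk, line)

rng = np.random.default_rng(0)
x_ref = np.array([0, 1, -1, -1, -1, -1])                     # (eps, eps', t, t', t'', t''')
is_complex = np.array([False, False, True, True, True, True])
X = sample_features(x_ref, 20000, is_complex, rng)
```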

Statistical analysis

To understand which features play a major role in determining the topological properties of the model, we calculate the PDFs pl(xi) of each feature xi for each label l. This is achieved by integrating out all other features xj (j ≠ i) from the bare distribution ρl(x) of the topological class l:

$${p}_{l}({x}_{i})=\int \ldots \int {\rho }_{l}({{{\bf{x}}}})\mathop{\prod}\limits_{j\ne i}\,{{\mbox{d}}}\,{x}_{j}.$$
(2)
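In practice, Eq. (2) does not require an explicit integration: since the samples are drawn from the joint distribution, a histogram of one feature component over the samples with a given label already estimates the marginal pl(xi). A minimal sketch, assuming NumPy and one real feature component at a time (complex features first decomposed into Re, Im, modulus, or phase as described above); `marginal_pdfs` is a hypothetical name.

```python
import numpy as np

def marginal_pdfs(values, labels, bins):
    """Histogram estimate of p_l(x_i) for one real feature and every label l.

    values: feature component per sample (e.g. Im[t]); labels: Chern number per sample.
    Pooling all samples with a given label integrates out the other features (Eq. (2)).
    """
    pdfs = {}
    for label in np.unique(labels):
        density, _ = np.histogram(values[labels == label], bins=bins, density=True)
        pdfs[int(label)] = density
    return pdfs

# Hypothetical usage on synthetic data with two labels.
rng = np.random.default_rng(1)
vals = np.concatenate([rng.normal(-0.5, 0.2, 5000), rng.normal(0.5, 0.2, 5000)])
labs = np.array([-1] * 5000 + [1] * 5000)
bins = np.linspace(-1.5, 1.5, 61)
p = marginal_pdfs(vals, labs, bins)
```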

For a given feature xi, the comparison of the PDFs for different labels l, i.e., for different Chern numbers, provides information on the importance of xi for the topological properties of the tight-binding model. To quantify the difference between two PDFs, we make use of the Bhattacharyya distance50, defined for a complex feature as

$${D}_{B}(p,q)=-\log \left[\int \sqrt{p({x}_{i})q({x}_{i})}\,{{\mbox{d}}}\,{x}_{i}\right].$$
(3)

Here, p(xi) and q(xi) are generic PDFs and DB(p, q) is always larger than zero unless p = q.
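For histogram estimates on common bins of width Δx, the integral in Eq. (3) becomes a sum, \(D_B \approx -\log \sum_b \sqrt{p_b q_b}\,\Delta x\). The sketch below (NumPy assumed, hypothetical names, one real feature component at a time rather than a complex feature) implements this and the resulting importance score DB(pC(xi), p0(xi)) for each feature column. A feature held constant, such as ϵ here, gets a score of zero, because both histograms fall into the same single bin.

```python
import numpy as np

def bhattacharyya_distance(p, q, bin_width):
    """Discretised Eq. (3): -log of the overlap of two densities on common bins."""
    overlap = np.sum(np.sqrt(p * q)) * bin_width
    return -np.log(overlap)  # +inf (with a warning) if the histograms do not overlap

def importance_scores(features, labels, chern, bins=50):
    """Score D_B(p_C(x_i), p_0(x_i)) for every real feature column x_i."""
    scores = []
    for column in features.T:
        edges = np.histogram_bin_edges(column, bins=bins)  # common bins for both labels
        p_c, _ = np.histogram(column[labels == chern], bins=edges, density=True)
        p_0, _ = np.histogram(column[labels == 0], bins=edges, density=True)
        scores.append(bhattacharyya_distance(p_c, p_0, edges[1] - edges[0]))
    return np.array(scores)
```

For two unit-variance Gaussians whose means differ by Δμ, the exact value is Δμ²/8, which offers a quick sanity check of the discretisation.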

The measure DB acts as an indicator of the descriptiveness of a feature xi through DB(pl(xi), p0(xi)), which quantifies the difference between the PDFs for a Chern label l ≠ 0 and the PDF of the trivial case. Features with larger values of this importance score are the most descriptive of the topological character. Based on this, one can simplify the investigation of the feature space by focusing only on the most descriptive features, which amounts to a dimensionality reduction. A complementary strategy makes use of symmetries that are based either on the observed behavior of the PDFs or on physical motivation.

The combined approach can generally take several iterations of re-sampling and analyzing the obtained data sets. The interplay of different features can be assessed by computing statistical correlations among them. A straightforward estimator of linear correlations is provided by the Pearson correlation coefficient51. A complementary way to investigate correlations, used in this work, is to restrict the data set to samples where certain features take specific values, e.g., Im[xi] < 0, and then to investigate the PDFs of the restricted data set.
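Both correlation diagnostics are short in NumPy. A minimal sketch with hypothetical names: the Pearson matrix of the sample features (complex features split into real components first), and a restriction of the data set by a boolean condition, such as the split at \(| t^{\prime\prime\prime}|\) = 1.25 used for the trivial phase. The synthetic data below merely stand in for a real sample set.

```python
import numpy as np

def pearson_matrix(features):
    """Pearson correlation coefficients between the columns (features) of a sample array."""
    return np.corrcoef(features, rowvar=False)

def restrict(features, labels, condition):
    """Keep only the samples for which the boolean array `condition` is True."""
    return features[condition], labels[condition]

# Hypothetical usage on synthetic data: column 1 is built to depend on column 0.
rng = np.random.default_rng(3)
a = rng.normal(size=5000)
features = np.column_stack([a, 2.0 * a + 0.1 * rng.normal(size=5000), rng.normal(size=5000)])
labels = np.where(a > 0, 1, 0)
corr = pearson_matrix(features)
low, low_labels = restrict(features, labels, features[:, 0] < 0.0)  # e.g. Im[x_i] < 0
print(np.round(corr, 2))
print(low.shape, np.unique(low_labels))
```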

Discussions

Summarizing, this approach tackles an nf-dimensional feature space by sampling onsite and hopping parameters and computing the Chern number of the resulting Hamiltonians. From the average properties of the distributions of the different Chern numbers, we are able to reconstruct a posteriori an effective description of the topological phases and their properties. This method not only yields information on the symmetry of the topological phases, but also provides crucial insights into which hoppings play a relevant role in determining the topological character. For example, the statistical analysis of the C = ±1 phases identified in our work indicates that the imaginary parts of the \(t^{\prime}\) and \(t^{\prime\prime}\) hoppings are not important for determining the topological character of the state, as discussed in the main text.