Machine Learning for Exploring Small Polaron Configurational Space

Polaron defects are ubiquitous in materials and play an important role in many processes involving carrier mobility, charge transfer and surface reactivity. Determining the spatial distribution of small polarons is essential to understand materials properties and functionalities. This requires an exploration of the configurational space, which is computationally demanding when using standard first principles methods, and technically prohibitive for many-polaron systems. Here, we propose a machine-learning (ML) accelerated search that compares the energy stability of different polaron patterns and determines the ground state configuration. The kernel-regression based ML model is trained on databases generated by density functional theory (DFT) calculations on a minimal set of initial polaron patterns, obtained by using either molecular dynamics simulations or a random sampling approach. To establish an efficient mapping between training data and configuration stability we designed simple descriptors that model the interactions among polarons and charged point defects. The proposed DFT+ML protocol is used here to explore millions of polaron configurations for two different systems, oxygen defective rutile TiO$_2$(110) and Nb-doped SrTiO$_3$(001). Our data shows that the ML-aided search correctly individuates the ground-state polaron patterns, proposes polaronic configurations not visited in the training and can be used to efficiently determine the optimal distribution of polarons at any charge concentration.


Introduction
Polarons are quasiparticles that can form in polarizable materials by entanglement between charge carriers and lattice distortions 1, 2 . An unbound electron or hole injected in a material can interact with the lattice vibrations (phonon). If the electron-phonon coupling is strong enough, the excess carrier induces local structural deformations associated with a potential well in which the carrier is selftrapped 3,4 . Polarons represent a long-standing yet still exciting field of research with profound impact in different disciplines ranging from physics to chemistry and material science, where they play a pivotal role in several phenomena of practical importance such as carrier mobility [5][6][7][8][9][10][11] , surface reactivity [12][13][14][15][16] and optical excitations [17][18][19][20] , and represent a testbed for the development of many theoretical models and numerical methods 1 .
Small polarons, whose wave function is spatially confined within a few Å around their trapping site, are subjected to thermally-activated hopping processes, which enable polaron mobility. As a consequence, small polarons can travel through the material forming different spatial distributions (polaron configurations) that have a strong impact on the properties and functionality of the material 21 . Predicting favorable polaron configurations is key to correctly interpret experimental measurements and predict material's behaviour, but it is a challenging task. Polaron ground-state distributions result from the balance between contrasting interactions, primarily polaron-polaron repulsion and the attraction between the negatively-charged polarons and the positively-charged donor defects (e.g., oxygen vacancies and/or dopants) 22 . Considering that polaron formation is favored on surface or near-surface sites, dimensionality effects and surface reconstruction also play a crucial role in defining the optimal polaron configuration, complicating the scenario even further 14,21 .
Material-specific properties of small polarons are typically computed by using first principles approaches in the framework of density functional theory (DFT) and appropriate extensions [22][23][24][25] . The DFT modelling of defects-induced polarons is complicated by the need to adopt large supercells in order to attenuate artificial interactions between periodic images of the polaron, which hampers an efficient exploration of the huge configurational space and makes the calculations computationally very demanding [26][27][28] .
The routine approach to explore the polaron configurational space is based on a combination of manual selection and molecular dynamics (MD) 22,29,30 . This protocol involves a three-step procedure: (i) Generating a pool of initial structures, where the polaron trapping sites are manually selected (using different types of site-controlled strategies 22,30,31 ); (ii) subsequent MD runs at temperatures high enough to activate small polaron hopping, thus allowing for a guided exploration of energetically favorable configurations; (iii) finally, a set of ground-state static DFT calculations for all inequivalent configurations found in the MD runs will determine the energetically favorable solutions at low temperature. This DFT-MD scheme can be easily automatized in a workflow, but the prohibitively long time required to execute MD runs on large supercells and the sporadic thermally-activated polaron hopping events, prevent an efficient exploration and restricts de facto the search to a limited subset of the full configurational space. An alternative approach to bypass the extensive MD-simulations (step ii) would be to create a larger pool of manually-selected polaronic configurations in step i and directly determine their energy stability as in step iii. However, this strategy relies on human intuition that could bias the selection process of trapping sites, thus excluding possible favorable configurations from the investigated pool. Exhaustive searches including all possible configurations can be performed for the simplest cases, i.e., the dilute limit, where the amount of polarons is relatively low. At higher defect concentrations, however, the combinatorial growth of configurations is excessive (i.e., dense limit). Therefore, random sampling approaches have to be used, which cannot guarantee the determination of the proper ground state. For this reason, alternative sampling methods are necessary to correctly describe the properties of materials hosting small polarons.
In this work we propose a data-driven strategy for an accelerated exploration of the polaron configurational space in order to predict the optimal trapping-site patterns for polarons. In recent years, machine learning (ML) has been used extensively in problems involving many combinatorial possibilities 32 , as in the cases of chemical compound space 33 , materials design 34 and the calculation of potential energy surfaces and force fields [35][36][37] . Here, we propose a simple kernel regression scheme 38, 39 , with descriptors that embody the polaron-polaron and polaron-defects interactions, trained on minimal DFT datasets to assess the relative stability of the initial (few hundreds) polaron configurations. The trained ML algorithm can then be used to extend the exploration of the configurational space by systematically analyzing millions of configurations, going far beyond the limits of first principles MD or any alternative sampling approaches based entirely on DFT.
The proposed DFT+ML strategy is applied here to two prototypical polaronic materials considering different types of doping: the oxygen-defective rutile TiO 2 (110) surface 12,15,22,29,[40][41][42][43] and the Nb-doped perovskite SrTiO 3 (001) surface 44-46 . We show that the ML-aided search correctly identifies the ground-state polaron configuration for arbitrary carrier density (from the dilute to the dense limit), as confirmed by benchmark DFT tests on selected ML-predicted configurations. Our results show that our proposed method can be employed to efficiently determine the optimal polaron patterns in diverse polaronic materials, and can be trained to account for the interaction of polarons with different point-like defects (oxygen vacancies and dopants, but also interstitials and adatoms for example).

Results
We start with a brief description of the ML-aided algorithm designed to predict the stability of multipolaron configurations (more details can be found in the Methods 4). The schematic protocol, shown in Fig. 1, involves a mapping between a general surface structure containing charge-donor defects (e.g. dopants or oxygen vacancies, Fig. 1(a)) and possible polaron hosting sites (typically transition metal ions) with a kernel-regression scheme. The connection is established by means of a simple descriptor (D) of local interactions ( Fig. 1(b), details provided in Section 4.2) which encode the spatial distance between the reference polaron to other polarons and point defects included within a cutoff sphere. This representation is used in a kernel regression scheme (Fig.1(c)) to map a given many-polaron configuration to the corresponding polaron formation energy (E pol ) as calculated by DFT (see Fig. 1(d)). In DFT the polaron formation energy E DFT pol is defined as the total energy difference between the polaronic solution (E dist loc , with the excess charges localized in a locally distorted lattice sites) and the delocalized solution (E undist deloc , with all excess electrons uniformly delocalized over the lattice), 22 . In our approach, the kernel regressor assigns a virtual single-polaron energy to every polaron in the configuration; the predicted single polaron energies are summed up to return the total polaronic energy of the test configuration, which can be compared to the E pol calculated by DFT.
This scheme is used to first train the kernel regressor on an energy dataset of different polaronic configurations calculated by DFT. Then, the trained ML algorithm can be applied to polaron configurations not included in the DFT training database, in order to drastically expand the exploration of the configuration space. To ensure the quality of the predictions, the optimal polaron patterns identified by the ML algorithm can be ultimately tested and refined by few final DFT calculations.
In the following, we assess the quality and general efficiency of this computational strategy on rutile TiO 2 (110) and perovskite SrTiO 3 (001) surfaces by studying the relative stability of millions of possible polaron configurations with varying polaron concentration (i.e., various polaron densities), from the dilute to dense polaron limits. (c) Kernel functions k(·, ·) with parameters α i measure the similarity of the target-polaron descriptor D with the training-set samples D i , and assign a virtual single-polaron energy to every polaron in the configuration. (d) The ML-predicted single-polaron energies are summed up to return the total polaronic energy E ML pol of the configuration, which can be compared to the E DFT pol .

Rutile TiO 2 (110)
The structural unit of rutile TiO 2 (110) is sketched in Fig. 2(a). The surface layer S0 consists of alternating rows of five-and sixfold-coordinated Ti atoms, referred to as Ti A and Ti B , respectively. Sub-surface and sub-sub-surface layers are indicated with the label S1 and S2, respectively. Ti atoms coordinated below the surface Ti A S0 / Ti B S0 sites are labelled as A-/ B-sites, respectively, with an additional subscript l indicating the layer, e.g., Ti A S1 refers to a Ti A sites located in the subsurface layer S1, whereas Ti B S2 indicates a B-type Ti located in S2. The surface terminates with two parallel rows of oxygen atoms oriented along the [001] direction, which are easily removable to form surface oxygen vacancies (V O ). Each (V O ) effectively donates two electrons that can be trapped in titanium sites converting two pristine Ti 4+ ions into Ti 3+ ions. Increasing the oxygen vacancy concentration c V O leads to higher densities of polarons. Due to the combinatorial growth of accessible polaron configurations with progressively larger amounts of polarons, finding the most favorable distribution of polarons is a very challenging task.
Here we employ a 5 layers thick slab with a large 9×2 two-dimensional (2D) unit cell containing 36 Ti sites per layer. 14, 21 Since the bottom two layers are fixed to bulk form (polaron inactive since no structural relaxation are allowed), this setup results in 108 potential trapping sites. We have inspected nine different defect concentrations starting from 1 V O per 2D unit cell (corresponding to c V O =5.5%) up to 9 V O (c V O =50%). Since polaron trapping at Ti sites follows a binomial distribution, the number of possible configurations ranges from ≈ 5 × 10 3 (2 polarons, c V O =5.5%) to ≈ 10 20 (18 polarons, c V O =50%), without considering symmetries (see Section S1.1 of the SM for a more detailed discussion).
To tackle this formidable problem, we rely on a previously generated MD polaron energy dataset 14,21 . It consists of 492 symmetrically-inequivalent polaron configurations, obtained by running MD simulations at nine different V O concentration levels (ranging from c V O = 5.5% up to c V O = 50%). Details on the database are provided in Table ST1 of the SM. The MD-based dataset suggest that at low c V O , polarons preferably localize at Ti A S1 sites, while at higher defect and polaron concentration the S0 sites become comparatively more populated 14,21 .
To begin with, this DFT-MD database was split in train, validation, and test datasets (70%, 20%, 10%, respectively). The ML scatter plots shown in Figs. 2 (b) and (c) show that the model can well reproduce theĒ DFT pol with a mean square error MSE≈ 1.2 · 10 −4 for training, and MSE≈ 1.3 · 10 −4 for both validation and test (see Table ST5 in SM). We note that configurations at very shallowĒ DFT pol (i.e., values higher than -0.2 eV) show larger errors, which we attribute to the lack of samples in the training dataset, owing to rare occurrence of these unfavorable polaronic solutions in the MD. In fact, these

4/13
polaron energies correspond to configurations where a polaron is localized on the unstable Ti B S1 site. Since we aim to identify the optimal polaron configuration, this issue is not of great concern, but it highlights the need of a well representative sampling of the polaron configurational space. As we show in Section 2.2, a random sampling can overcome these limits of an MD-generated database.
To assess the validity of our method to predict configurations that strongly differ from the available training data, we have performed an additional test on unexplored polaron V O concentrations, by adopting the "omitted defect concentration" strategy described below. We constructed nine distinct test cases, one for each V O concentration level available in the MD database. In each of these cases, we limited the training dataset to samples belonging to only 8 of the 9 available defect concentrations. Finally, we tested the model on the remaining concentration, not used in the training. We analyzed qualitative and quantitative aspects of this procedure by conducting a direct comparison between ML-derived and target DFT energies for specific defect concentrations. We also compared the most stable configurations found via ML and via DFT-MD, by means of a convex hull diagram (reporting the most stable configuration at every defect concentration), to inspect whether ML is capable to deliver a qualitatively consistent energy landscape. This is particularly important for the case under study, since experimental measurements indicate that, for c V O ≈ 20%, highly defective TiO 2 (110) undergoes a 1 × 1 to 1×2 surface reconstruction, which can be connected to a lowering of the polaron energy with increasing c V O beyond a certain critical polaron concentration 14,47 .
The results are collected in Fig. 2(d) and in the Figure SF5 and Table ST5 of the SM. Even though we notice an apparent loss of precision for configurations that are different from those included in the training data, leading to larger systematic errors (the resulting MSE ofĒ pol is of the order of 10 −3 for all test concentrations ), the ML model is still able to correctly identify the optimal arrangements of polarons from the MD dataset and find the correct minimum at c V O =22%. This is shown by the dot sizes in Fig. 2(d), indicating the ML-predicted favorability of a configuration relative to other predicted configurations in the respective defect concentration. This result demonstrates that the DFT+ML scheme, trained with the available DFT dataset, is capable to predict with good accuracy the relative stability of solutions corresponding to different concentrations.
From these results, based on a ML-processing of the DFT-MD input database, we can conclude that a large-scale ML-based systematic search of new configurations (i.e., configurations not included in the original DFT database) should be feasible, provided that the final results are validated by a subsequent comparative DFT simulation. In this way, an accelerated ML exploration of the full configurational space would be possible, and only a small fraction of first principles calculations would be necessary, limited to the most favorable configurations found by ML. Different methods could be used to perform this exhaustive search in the configuration space, and find the optimal polaron arrangement for each concentration: Our designed "exhaustive search" strategy is described in the following.

Systematic search of configurations via ML
An efficient search in the entire configuration space needs to address the problem of the combinatorial growth of polaron configurations with increasing polaron density (increasing c V O ). In our DFT calculations, the unit cell contains 108 possible Ti trapping site, leading to 108 18 ≈ 10 21 possible arrangements of polarons in the most defective case (c V O = 50%, 9 V O defects in our cell with 18 polarons). To cope with this huge combinatorial space, and avoid an inefficient blind search, we have developed a bottom-up searching strategy based on the concept of polaron building units. We have first trained our ML model on the entire DFT-MD database (492 inequivalent configurations). We have then addressed the problem of two polarons (lowest considered concentration, c V O = 5.5%) and predicted all 108 2 = 5778 possible variants using the ML model, thus extending considerably the available DFT-MD database at that concentration. From these 5778 configurations we selected the 100 with the most favorable polaron energy E ML pol as calculated by ML, and used them as polaron building units to construct the polaron configurations at the next concentration level (c V O = 11.1%, 4 polarons). To this aim, we followed the same additive strategy, namely adding two additional polarons to each of the 100 two-polarons configurations, scrutinizing with the ML model all 106 not yet occupied hosting sites. Similarly to the previous concentration, also in this case only a subset of 100 distinct, energetically most favorable configurations has been used to build the next configurations (c V O = 16.7%, 6 polarons). The same protocol has been adopted for all higher concentrations. Finally, at each concentration the three best ML-predicted polaron configurations have been verified by performing a DFT run using ML-predicted pattern as input with a selective initialization of the occupation matrix (see Section 4.1 and Section S2 in the SM). We refer to these final set of energies as exhaustive search ML-DFT energies.
The scheme described above allows for a quick calculation of roughly 4 · 10 6 configurational energies, excluding a larger number of highly unfavorable configurations, and keeping the computational cost constant among each defect concentration. The results are displayed in Fig. 2(d), where we show the comparison between the reference MD-DFT convex hull (input database, dashed line) and the best obtained ML-DFT energies at each concentration (full lines). In Figure SF6 and SF8(a,b) of the SM most stable configurations and the distribution of polarons to specific sites, respectively, are shown. The overall outcome is very satisfactory with a few positive aspects to note. First, the ML-aided DFT procedure finds a lowest absolute minimum not present in the MD database, clearly demonstrating the remarkable efficiency of the proposed ML scheme in exploring the configurational space. The absolute minimum is shifted from c V O = 22.2% to 27.8% resulting in a much smoother shape of the ML-derived convex hull. At defect concentrations lower than 22% the two curves essentially overlap, with the ML configurations all slightly lower than the best MD configurations (in this case with a modest improvement of few meV). More subtle are cases at c V O > 33%, where polarons start populating Ti B S0 -sites. This is due to a poor sampling of Ti B S0 -site in the MD simulations, and to a tendency at high concentration towards charge delocalization which impedes a site-specific assignment of the excess charge, pushing the system out of the polaron regime. As already mentioned, to avoid an unrealistic densely-packed arrangement of polarons or occupation of highly unfavorable trapping sites (unlikely to be observed in experiment 48 ), TiO 2 (110) samples undergo a polaron-driven (1×1) → (1×2) structural reconstruction at c V O ≈ 20% 14 . Therefore, the apparent inefficiency of the ML search above the critical concentration is not an intrinsic deficiency of the designed strategy, rather it should be traced back to physical arguments that hamper the construction of a suitable database for an unrealistic situation. Forcing the structure to maintain the (1×1) symmetry leads to formation of Ti B S0 polarons, which are energetically not favorable, and tend towards charge delocalization and to polaron diffusion to the bulk (trapping at S 2 sites) 49 . In this sense the present ML data provide further support and validation to the DFT-based description of the polaron-driven surface reconstruction discussed in Ref. 14.

SrTiO 3 (001)
To assess the degree of transferability and generality of the proposed methodology we applied a similar scheme to Nbdoped SrTiO 3 (001), a different structure (atomically flat perovskite surface, see Figure 3(a)) with different source of excess charge (chemical doping instead of V O ), resulting in different type of interaction categories (see Section 4.2). To further generalize our DFT+ML procedure we decided to follow a different strategy to build the necessary DFT database. Instead of MD runs, which require long execution time and could result in an inefficient exploration of the configuration ground state, we have adopted a randomly generated polaronic database using the occupation matrix approach (see Table ST2, Section 4.1 and Section S2 in the SM). This procedure has few advantages as compared to the MD-sampling methodology. First of all, obtained samples are less correlated than in the MD-runs. Secondly, the bias towards low energy configurations, which are disproportionately more often visited in an MD-simulation, is removed. This should result in a more general model that has higher accuracy across all possible polaron patterns. Lastly, and most importantly, it fully bypasses the cost of the MD-simulations and only the structural relaxations for each distinct polaronic pattern has to be performed (see Section S5 in the SM). Following the same protocol as discussed in the previous section, we again performed a randomized split for training the model on the entire database, consisting of 379 polaronic configurations. Following this we assessed the extrapolation capabilities by testing the model on defect concentrations not present in the training data. Lastly, we performed an exhaustive bottom-up search for all defect concentrations, were we then evaluated most favorable predictions at the DFT-level. Figure 3 (b) and (c) show the results from the randomized data-split. The model is converged to a similar extent as in the case of TiO 2 (Ē pol MSE of 5.7 · 10 −5 , 9.3 · 10 −5 for training and test data, respectively), and, noticeable and unlike TiO 2 , the model delivers consistent accuracy at virtually all energies. This increased performance most likely originates from the less biased energy database obtained by the random sampling. In the case of omitted defect concentrations (relative favorablity prediction shown in Figure 3 (d)), the model correctly extrapolates the low energy configurations for the omitted concentration based on the energies of the 4 included concentrations, and suffers to a smaller extent from systematic errors as compared to TiO 2 (compare TiO 2 and SrTiO 3 cases in Tables ST5 and ST6 as well as Figure SF5).
Finally, we tested the efficiency of the exhaustive ML search, by exploring 2.25·10 6 nonequivalent configurations. The outcome (see dotted and full line in Figure 3(d)), clearly shows that this ML-augmented scheme outperforms the standard random-database approach, as it finds multiple configurations lower in energy than the minima identified in the randomized search, at any polaron concentration. We confirmed the five lowest energies E ML pol predicted by the ML exhaustive search at each Nb-concentration by a comparative static DFT-calculation and each of the ML-predicted polaron pattern was found more stable than the optimal pattern included in the training database. The most stable configuration predicted by ML (see Figure  SF7 for a collection of the most optimal polaron configurations) typically improved the mean polaronic energyĒ pol by 30 to 50 meV compared to the reference data. Interestingly, the results of the exhaustive ML search suggest a rationale for the most stable configuration based on a few simple rules: the energetically most stable configurations host polarons in the surface and subsurface layer, usually placed as close as possible to the Nb-dopants (preferably below or above a dopant rather than in the same atomic layer (see site distribution of favorable configurations collected in Figure SF8(c) in the SM).

Discussion and Conclusions
In this paper we presented an ML-aided procedure to enhance and accelerate the identification of small polaron ground state configurations in multi-defect systems with varying defect concentration, by employing simple and general descriptors based on distance-dependent interaction categories and a standard kernel regression fed by a DFT energy database. We tested and discussed a few alternative protocols: (i) A conventional train/validation/test ML protocol. (ii) Omitted-defect concentration model, based on extrapolating the polaron energy for a given defect concentration from the DFT energies obtained at other concentrations.
(iii) Exhaustive ML search. An exploration of the polaron configurational space based on a guided bottom-up selection of the most favorable configuration from all possible nonequivalent configurations at each given defect concentration.
We assessed the generality of the procedure by applying it to two different materials (TiO 2 (110) and SrTiO 3 (001)) with different types of defects (V O and Nb dopants) and adopting different strategies to construct the DFT database (MD and random sampling). Our data indicates that a randomized sampling approach is superior to MD-generated database which suffers from undesirable correlation between the MD-generated configurations and excessive computational cost. Importantly, the combination of random sampling and exhaustive ML search results in a robust algorithm that delivers very good results, as demonstrated for SrTiO 3 (001) where this procedure leads to a systematic improvement at all explored concentration: the exhaustive ML search finds configurations with lower 0-K DFT energy as compared to those included in the input database.
While our model has been applied to the identification of polaron configurations with static dopant/vacancy patterns, it can be further extended to consider optimized configurations with mobile point defects considering other type of defects (e.g. hydrogen adatom or Ti interstitials) and other materials 9 . In fact, the descriptor only relies on identifying polaron hosting sites with different local coordination and lattice symmetry and their relative position with respect to the surface, to structure a list of distances between polarons and defects. From this information, the descriptor structure can easily be attained for any material and only few parameters (i.e., number of included distances in each interaction category and cutoff radius) need to be determined to optimize the performance.
A final positive aspect offered by the proposed method is the arbitrary scalability with respect to the supercell size, enabling access to large scale simulations, where defect arrangements could be precisely aligned with experimental data, to determine likely polaronic configurations observed in surface imaging techniques. Also, qualitative extrapolations and interpolations to defect concentrations where no data is available and predictions seem plausible in the presented test cases and could be further developed.

Density Functional Theory
Density-functional theory (DFT) and first-principles molecular dynamics (MD) calculations were performed using the Vienna Ab-initio Simulation Package (VASP) 50, 51 . For our DFT+U calculations we adopted the generalized gradient approximation with Perdew, Burke, and Ernzerhof parametrization (PBE) 52 , including an on-site effective U = 3.9 eV enacted on the d-orbitals of Ti atoms in the case of rutile TiO 2 (previously determined by constrained random-phase approximation (cRPA) 41 ) and U = 4.5 eV for SrTiO 3 , here enacted on the d-orbitals of Ti and Nb, in line with the cRPA value determined in previous works. 53-55 We used the Γ point only for the integration in the reciprocal space, and standard convergence criteria with a plane-wave energy cutoff of 250 eV 21 for rutile TiO 2 , and 350 eV for SrTiO 3 .
The rutile TiO 2 (110) and SrTiO 3 (001) surfaces were modeled by super cells, containing five stochiometric layers in large two-dimensional 9 × 2 and 6 × 4 unit cells, respectively. The three surface layers and the corresponding labeling are shown in Figure 2(a) and Figure 3(a). In both cases the bottom two stochiometric layers were kept fixed at bulk positions in order to mimic and retrieve the electronic and structural properties of the bulk. Oxygen vacancies (V O ) on the two-fold coordinated O sites on the TiO 2 surface, and Nb dopants replacing Ti atoms on the surface and subsurface SrTiO 3 layers, were modeled at nine and five different concentration levels, c V O and c Nb respectively. The defect positions were chosen such that inter-defect distances are maximized, and the concentrations are given in percentage with respect to the number of two-fold surface oxygen sites in TiO 2 and the total number of Ti sites in SrTiO 3 . SrTiO 3 is known to exhibit a wide variety of surface reconstructions, and it has recently been shown that flat bulk-terminated (001)-surface with surface defects can be stabilized using novel cleaving procedure 56, 57 .
The polaronic localization sites were identified by inspecting the size of the local magnetic moments on Ti ions (larger than 0.5 µ B ) and relaxations to 0 K have been performed from each distinct polaron localization pattern. In the case of rutile TiO 2 nonequivalent polaron configurations were generated via MD at high temperature (700 K). Instead, to build a database of polaronic configurations in SrTiO 3 , for each concentration, an appropriate number of Ti sites within the top three layers were randomly chosen to host the polarons. For localizing the excess charge in the selected Ti hosting site we employed the occupation matrix control tool 30 . This tool allows us to constrain the electron density matrix of atoms in the cell directly, such that a polaron can be placed explicitly at the desired site, and in the desired orbital. Polaron configurations suggested by the exhaustive ML-aided search were always initialized via occupation matrix control, starting from initial occupation matrices taken from the training databases. Examples of occupation matrices are collected in the Supplementary Material in Section S2.
A final note on the calculation of the DFT polaron energy E DFT pol : DFT calculations do not grant access to formation energies of individual polarons, but only to the total energy of all polarons in the unit cell. In order to compare polaron formation energies for configurations with different polaron concentration, we defined a mean polaronic energyĒ DFT pol = E DFT pol /N pol , obtained by averaging over the number of polarons N pol in a given configuration. This operation allows us to compare the polaron energies obtained for different defect concetrations.

Descriptors
To design a suitable descriptor, we have exploited the fact that polaron energies are affected by two main interactions 21 : (i) Electrostatic interaction between charged defects: attractive interaction between negatively charged polarons and positively charged defects and polaron-polaron repulsion, both rapidly decreases with increasing spatial separation.
(ii) Polaron orbital symmetry, determined by the local coordination and symmetry of the hosting site as well as by the distance from the surface layer (For example, in rutile TiO 2 (110) the distance-dependence of the polaron-polaron Ti A S1 -Ti A S1 interaction is different from Ti A S1 -Ti A S0 21 ). Based on these principles, we designed a descriptor composed of a list of pairwise polaron-polaron and polaron-defect interactions, defined by the corresponding inter-polaron and polaron-defect distances within a cutoff-sphere R c around each polaron (defect: V O and Nb, for TiO 2 and SrTiO 3 , respectively). In this way, fixed portions of the descriptor vector can be assigned to specific interaction categories. An interaction category always depends on the local coordination of the hosting site (e.g., Ti A S1 ) and the local coordination of its interacting partner (e.g., Ti A S0 ). Within each interaction category, a fixed number of preassigned slots n can be used, containing rescaled distances according to the expression: The rescaling allows that the vector is filled with zeros, were no interaction is present. For an exemplary description of the evaluation of a polaron descriptor see Section S3 in the SM. Mind that each descriptor corresponds to the environment of a single polaron. Therefore, the number of descriptors from the databases are 4368 and 2257 for TiO 2 and SrTiO 3 , respectively.
We complete this section by providing a brief description of the resulting interaction categories for TiO 2 and SrTiO 3 . More details can be found in Section S3 in the Supplementary Materials.

Interaction categories in TiO 2 (110)
In rutile-TiO 2 the three topmost layers contain hosting sites with two different local orientations, for a total of 36 interaction categories (Ti X Si -Ti Y Sj , with X,Y ∈ {A, B} and i,j ∈ {0, 1, 2}). Due to the employed 9 × 2 supercell and the slightly different distance-dependence of stacked and non-stacked hosting sites of same local coordination 21 , we distinguish between stacked (A-or B-sites) and non stacked (A -or B -sites). This adds 18 additional interaction categories. Lastly, a category for the interaction with the V O is added for each differently coordinated sites (Ti X Si -V O , with X ∈ {A, B} and i ∈ {0, 1, 2}), resulting in 60 possible interaction categories. Three shortest distances per interaction category and a cutoff radius of 15 Å resulted in optimal model predictions for this descriptor. The full list of interaction categories for TiO 2 is given in the Supplementary Materials in Table ST3 and examples of interaction categories are collected in Figures SF1 and SF2.

Interaction categories SrTiO 3 (001)
Cubic and atomically flat SrTiO 3 has a more symmetrical structure, which results in a simpler descriptor. The three topmost layers contain identically coordinated hosting sites, leading to nine different interaction categories (Ti Si -Ti Sj , with i,j ∈ {0, 1, 2}). However, a more fine grained distinction of polaron-dopant interactions is necessary, since dopants can in principle lay in any layer (unilke V O in rutile TiO 2 (110)). Therefore, we consider six additional interaction categories (Ti Si -Nb Sj , with i ∈ {0, 1, 2} and j ∈ {0, 1}), resulting in a total of 15 different interaction categories with an optimal number of four included distances per category and a R c = 13 Å. A full list of interaction categories for SrTiO 3 is given in the Supplementary Materials in Table ST4 and examples of interaction categories are collected in Figure SF1

Machine learning model
We have used instance-based learning in form of kernel regression 38 , where a kernel-function k(·, ·) (or similarity measure) of the descriptor of interest x and all descriptors x i of the training set is calculated and scalar multiplied with the optimized regression parameters α, giving a weighted mean of the target quantity based on similarity to training instances: We determine optimal regression parameters of each kernel regressor, corresponding to a specific type of polaron, via backpropagation and gradient-descent performed on the sum of each kernel regressors prediction. We found that training the 9/13 regressors with a stochastic gradient descent varient, results in better extrapolation capabilities then performing an exact fit.
To optimize regression parameters of the different kernel regressors on the training data, the predicted energy E ML pol is used to calculate the loss function with respect to the target polaron energy E DFT pol of the configuration from the training dataset . We employed an adapted mean squared error loss function J, where the error of each training sample is normalized to the number of polarons in the configuration (N pol,i ).
Without a normalization of the error to the number of polarons, the model tended to systematically underpredict energies at low defect concentrations, which can likely be attributed to a greater total number of polarons at high defect concentrations. More polarons result in a cumulative higher error and the model shifts towards providing a better fit for configurations with many polarons at high defect concentrations -compromising accuracy for configurations with fewer polarons. For the optimization of the loss function, an ADAM optimizer 58 has been used with a learning rate set to 0.0001, which resulted in slow yet consistent convergence of regression parameters (see Figure SF3 in the SM). It has been found that an initialization of regression parameters set to 0 resulted in a faster convergence than a random initialization. A batch size of 64 randomly chosen configurations per epoch, and a weight decay of 0.1 to enforce small regression parameters, also benefited convergence of the model to consistent performance throughout many tests. We used the Laplacian kernel for all models and found that a hyperparameter γ between 0.1 and 0.5 leads to optimal results (see Figure SF4 in the SM). The algorithm has been implementd using NumPy 59 and Scikit-Learn 60 for preprocessing, and PyTorch 61 to perform the backpropagation and optimization of parameters.

Data availability
The data and code used in the presented article is available from the authors upon reasonable request.

S1 Datasets
The distribution of polarons in the training databases for different V O and Nb-doping concentrations are collected in Table ST1 Table ST2: Randomly generated dataset of configurations in SrTiO 3 (001). Number of inequivalent polaronic configurations generated for every defect concentration, and corresponding site-dependent polaron occupation (number of times the site is occupied by a polaron).

S1.1 Number of possible configurations
We briefly discuss the number of possible configurations in the TiO 2 super cell. In our setup, consisting of a 9 × 2 large unit-cell with two of the five stochiometric layers fixed to bulk positions (see Methods in the main text), the TiO 2 slab contains 108 Ti sites that can possibly host polarons (36 sites per S0, S1 and S2 layer). In the simplest case of one oxygen vacancy and two excess electrons in the slab (c VO = 5.5%), the number of possible polaronic configurations (with no symmetry applied) is given by the binomial coefficient, 108 2 = 5778. At higher concentration, the number of configurations explodes, e.g., for 4 oxygen vacancies we obtain 108 8 ≈ 3 · 10 11 polaronic configurations. We note that, especially at high defect concentration, exploiting symmetry operations to simplify the problem does not bring any considerable advantage.

S2 Occupation matrices
We used the Occupation Matrix Control scheme as developed in Ref. [?] to localize polarons at specific sites in a two step protocol. In the first step, occupation matrices of chosen Ti sites are fixed in order to ensure polaron localization at the specified sites. The second step consists of an unrestricted relaxation, which starts from the previously determined distorted structure. The second step is crucial to obtain reliable self-consistent results of the polaron trapping.
In the following, we list the spin-resolved density matrices used in the first step of the relaxation procedure. In the case of rutile TiO 2 (110) we employed specific occupation matrices for every site type (Ti A S0 , Ti A S1 , Ti B S1 , Ti B S0 ), here listed with the corresponding spin-channel ↑ and ↓ arrows. All occupation matrices have been extracted from a single polaron configuration at c VO =38.9%. Polarons at Ti A S0 , Ti A S1 , and Ti B S1 showed well defined orbital characters, while the unstable Ti B S0 polarons varied in their specific orbital character. Below, the used input occupation matrices are listed:

S3 Descriptors
In this section we briefly describe the most important concepts employed for constructing our descriptor. To ensure consistent representations of the polaron environment we structure the 2-body-interaction terms used to construct the descriptor via interaction categories. An interaction category depends on the type of polaron-hosting site, and on the nature of interaction (whether with another polaron of with a donor defect). Figure SF1 collects examples of interaction categories for the two materials of our study. The arrows in the left side of Figure SF1 show four interaction categories for rutile TiO 2 (110): polaron-polaron on the same-row (Ti A S1 -Ti A S1 ), non-stacked on different layers (Ti A S0 -Ti A S1 ), non-stacked on same layer (Ti A S1 -Ti A S1 ), and polaron-defect (Ti A S1 -V O ). The right panel similarly shows four possible interaction categories in SrTiO 3 : polaron-polaron on the same layer (Ti S0 -Ti S0 ), polaron-defect on the same (Ti S0 -Nb S0 ) and different (Ti S1 -Nb S0 , Ti S2 -Nb S1 ) layers. A full list of possible interaction categories for TiO 2 (110) and SrTiO 3 (001) is provided in Table ST3 and ST4, respectively.

S4 Model optimization and evaluation
In the following we show the convergence of the models in the training procedure. The hyperparameter optimization results are shown for the case of rutile TiO 2 (110). Finally, we show scatter plots of the omitted defect concentrations cases and provide quantitative information on errors on the target quantityĒ pol .

S4.1 Training
We recorded the adapted loss (see Equation 3 in the main text) at an interval of 100 stochastic optimization epochs for the training and validation phases, using a randomized split of the databases.

S4.2 Hyperparameter optimization
We determined optimal hyperparameters of the model (kernel parameter γ, cutoff-radius R c and number of elements per interaction category) by optimizing the validation accuracy. The results of this procedure are displayed for rutile TiO 2 (110) in Figure SF4. (c) Loss of train and validation data obtained after 5000 epochs for γ = 0.5, by varying the number of shortest distances included in each interaction category. While different interaction categories could be tuned separately to further optimize the descriptor, here we limited the optimization to an equal number of features per interaction category. Figure SF4: Examples of the hyperparameter optimization based of results from oxygen defective rutile TiO2(110).

S4.3 Omitted defect concentrations
Considering that the goal of this work is the efficient exploration of the polaron configurational space, the ML-model must be robust and guarantee accurate predictions for samples not visited in the training phase. Usually, the application of ML-models for extrapolation problems results in unreliable predictions, and careful tests are required. To this aim, we adopted a validation scheme, the "omitted defect concentrations" presented in the main text. Figure SF5 shows the results at every defect concentration for both the SrTiO 3 and TiO 2 systems. Furthermore, in Table ST5 and ST6, we report the mean squared error of the target quantityĒ pol as obtained in the training (including all concentrations in the database except the omitted one) and the test phase (omitted concentration only). The results show the good level of accuracy of the proposed ML model in addressing novel polaron configurations. The error obtained for the omitted concentration is indeed slightly larger than the value obtained for the standard test on randomly split training-vs-test databases (see last line in the Tables).
c Nb Train Test 3.3% 3.01 · 10 −5 2.19 · 10 −4 4.2% 4.26 · 10 −5 1.64 · 10 −4 5.0% 3.25 · 10 −5 3.79 · 10 −4 5.8% 3.64 · 10 −5 1.75 · 10 −4 6.7% 2.92 · 10 −5 4.07 · 10 −4 Randomized 5.75 · 10 −5 9.35 · 10 −5 Table ST6: The mean squared error of the mean polaronic energy for different test cases in the randomized dataset, reported for the training and test sets. The training was performed on configurations from 4 defect concentrations, and the testing on the remaining one (labeled in column c Nb ). The last line shows results from a randomized split of the data from all defect concentrations.  Figure SF6 and Figure SF7 collect the most-stable polaronic configuration as found by the machine-learning exhaustive search for every defect concentration in TiO 2 and SrTiO 3 , respectively. These ground-state configurations were individuated by the ML algorithm among a pool of ∼ 4 and ∼ 2 million polaron patterns, respectively. We note that searches based purely on DFT calculations are limited to smaller datasets due to the higher computational costs: The characterization of just ∼ 500 distinct polaronic configurations in TiO 2 (i.e., calculations including MD simulations plus ionic relaxations for the calculation of E DFT pol ) required us ∼ 2 · 10 6 core hours on our high-performance computing facility (HPC); similarly, the ∼ 400 configurations analyzed for SrTiO 3 via the random sampling approach were calculated in ∼ 0.3 · 10 6 HPC-core hours. The exhaustive ML-search on a configuration space of millions of polaron pattern requires instead a handful of hours (11 and 2 hours for TiO 2 and SrTiO 3 , respectively) on a single core of standard personal computers. The bottleneck of the ML-based strategy is the training data generation, which relies on DFT databases; however, our results for SrTiO 3 show that the ML model can be efficiently trained by performing a random sampling of the configuration space, relying on few hundreds of polaron energies calculated via DFT. The machine learning workflow proposed in this work is indeed able to overcome the practical limitations of approaches based purely on DFT and on physical intuition. We conclude with a statistical analysis on the polaron patters explored during the exhaustive search for TiO 2 and SrTiO 3 at low defect concentrations, as shown in Figure SF8. By considering only favorable configurations (i.e., configurations with E ML pol at least 80% of the ground state energy), the frequency distribution of polarons in TiO 2 shows a clear tendency towards charge trapping in Ti A S1 sites close to the V O at the lowest low defect concentration (SF8 (a)), correctly capturing the underlying symmetries of the supercell. With increasing defect concentration (see SF8 (b)), polarons tend to distribute more evenly across all Ti A S1 sites compared to the lower concentration, and occupation of Ti A S0 sites between the two V O becomes more favorable. In the case of an asymmetric defect pattern, as for SrTiO 3 , the distribution of favorable sites gets more complicated (see SF8 (c)). Nevertheless, The ML-aided search identifies sites directly above or below the Nb-dopants in the first two surface layers as the preferred polaron localization sites.