Introduction

Control of charge-carrier concentration of semiconducting materials is vitally important in a variety of applications, including photovoltaics,1,2 optoelectronics,3,4 transistors,5,6 and thermoelectrics (TE).7,8 To maximize efficiency, many of these applications require tuning both the type (p- or n-type) as well as the concentration of carriers. For many well-studied systems, the methods of controlling the carrier concentration are well-established, both in choice of dopant species and synthetic technique.9,10 However, the control of carrier concentration is not well-understood in novel material systems. Traditionally, experimentalists have relied on basic metrics to guide the choice of doping species, namely ionic charge counting and radius ratio “rules of thumb.”11 These rules may not directly translate to more complex chemistries and structures. Theorists have recently been able to better guide efforts using defect calculations as computational capabilities have improved.12,13,14,15,16 Despite these improvements, issues remain in the widespread use of these calculations due to their computational costs and inaccuracy. Therefore, methods to address dopability are critical for advances in complex semiconductors.

One such example is the high-throughput prediction of material properties, which has become increasingly common in the TE community.17,18,19,20 Various groups have developed their own models which can predict the optimal potential thermoelectric performance for a material based on its structure and simple density functional theory (DFT) calculations. One of these metrics, quality factor (β), is a descriptor for the potential of a material to exhibit high thermoelectric performance, and has been previously shown to track well with experimental TE performance.21 Despite the reasonable accuracy of these models in predicting the potential of a compound’s performance, they rely on a key assumption. In order to predict this quality factor, it must be assumed that the chemical potential (i.e., Fermi level) can be sufficiently tuned to the type and concentration of charge carrier that optimizes performance. In the absence of dopability guidance, experimental investigation of high β compounds is inefficient due to the large number of false positive compounds that cannot be doped.

In the discussion of dopability, we find it helpful to identify the distinct sources that limit dopability. The discussion here is in relation to p-type dopability, but the schematic and discussion for n-type would be a mirror image. In the first case, Fig. 1a, the red native donor defect represents a hole “killer” defect, one that prevents the Fermi level (EF) from being driven beyond some energy range, as it spontaneously produces an electron which increases EF. This donor defect pins the minimum thermodynamically achievable limit of the Fermi level (EF,lim) at the location of the red tick, with the possible doping range shown by the red horizontal gradient bar. As the native donor energy (En,d) increases, EF,lim moves toward the valence band, allowing a larger possible doping range shown by the green bar. Eventually, the donor energy is great enough such that the native donor dopability window (Wn,d) becomes positive, allowing greater p-type carrier concentration. Beyond killer defects, a system may exhibit limited dopability due to the lack of chemical flexibility in the native structure or extrinsic dopants. The Fermi level of the material will be set near the intersection of the lowest energy acceptor and donor defects, regardless of whether they are native or extrinsic (Fig. 1b). In some cases, the lowest energy extrinsic acceptor dopant is too high to substantially lower the Fermi level, meaning there is no extrinsic dopant which increases the dopability window,22 represented with the red extrinsic acceptor. This arises when there is significant phase competition for the dopant element and dopant solubility limits are reached. In more well-behaved systems, high dopant solubility is achieved due to a lack of phase competition and the minimal energetic penalty for dopant incorporation. This scenario yields a dopant where the energy of the extrinsic acceptor (Ee,a) is lower than the window of the native donor (Wn,d), drastically altering the Fermi level toward or into the valence band, leading to an associated high p-type carrier concentration (green extrinsic defect).
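The geometry of the Fig. 1 schematic can be illustrated with a minimal sketch. It assumes the usual linear picture in which a defect's formation energy varies with the Fermi level with a slope equal to its charge, and it measures the Fermi level from the VBM. The band gap argument, the function names, and all numerical values are illustrative assumptions and are not taken from this work.

```python
def donor_formation_energy(e_fermi, e_nd, charge, band_gap):
    """Formation energy (eV) of a native donor with positive charge `charge`.

    `e_fermi` is measured from the VBM (eV); `e_nd` is the donor energy
    evaluated at the CBM (E_n,d in Fig. 1a). Linear form is an assumption.
    """
    return e_nd - charge * (band_gap - e_fermi)


def dopability_window(e_nd, charge, band_gap):
    """W_n,d: the donor formation energy evaluated at the VBM (e_fermi = 0)."""
    return donor_formation_energy(0.0, e_nd, charge, band_gap)


def fermi_level_limit(e_nd, charge, band_gap):
    """E_F,lim: Fermi level (from the VBM) where the donor energy reaches zero."""
    return band_gap - e_nd / charge


def is_good_p_type_dopant(e_ea, w_nd):
    """Criterion from the Fig. 1b caption: W_n,d - E_e,a >= 0."""
    return w_nd - e_ea >= 0


# Hypothetical 1 eV gap and singly charged donor
for e_nd in (0.6, 1.5):
    w = dopability_window(e_nd, 1, 1.0)
    lim = fermi_level_limit(e_nd, 1, 1.0)
    print(f"E_n,d = {e_nd:.1f} eV: W_n,d = {w:+.2f} eV, E_F,lim = {lim:+.2f} eV")
```

In this sketch a larger E_n,d moves E_F,lim toward (and eventually below) the VBM and makes W_n,d positive, which is the behavior described for the green donor line.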

Fig. 1

a Defect diagram schematic showing native defects, including an acceptor defect (black line) and two possible variations of a native donor defect (red and green lines). The intersection at the valence band maximum (VBM) of the native donor defect gives the p-type dopability window (Wn,d). The achievable thermodynamic limit of the Fermi level (EF,lim) is set by the charge (which determines slope) of the native donor defect and the conduction band minimum (CBM) defect energy (native donor energy or En,d). b Defect diagram schematic showing the effect of extrinsic dopants (dashed colored lines), given native acceptor and donor defects (solid black lines). The Fermi level will be near the intersection of the lowest energy donor and acceptor defects. The red extrinsic acceptor is a poor dopant as it does not significantly lower EF. A good p-type dopant is one where the extrinsic acceptor energy (Ee,a) is less than or equal to Wn,d (Wn,d − Ee,a ≥ 0), allowing high p-type carrier concentration

To date, there have been few efforts to model charge-carrier concentrations, and no comprehensive, analytical/physical model exists to estimate the dopability of materials. Conventional wisdom posits that large band gap materials are harder to dope, that elemental properties such as size and electronegativity should be considered when choosing a substituting species, and that the structure and lattice energy have some effect.11 Yet little is known about the relationship (sign, magnitude, functional form) between the physical properties and carrier concentration. DFT can predict intrinsic defects and external dopants, guiding experimentalists to regions of phase space and the dopants needed to achieve the desired carrier concentration.23 In the field of dopability, diamond-like semiconductors (DLS) have received the most computational attention. This includes an amphoteric-defect model,15,24 phenomenological models for doping limits based on universal band alignment,12,13,22,25 and detailed analysis of defects in individual DLS systems.26,27,28 Despite this success, defect calculations cannot be used for high-throughput screening due to their computational cost.

One possible solution is the development of semi-empirical models, as they have proven successful in combining experimental data with physics-based models.18,21 In the absence of an analytic model or high-throughput defect calculations, statistical learning from experimental or computational data can serve as an alternative to create empirical models and rules of thumb to make predictions of dopability in new compounds. Machine learning has proven successful in understanding and predicting energy and entropy,29 potentials and forces,30,31,32 structure, physical, and elastic properties,33,34,35,36,37,38 bandgap,34,39,40 and defects,41 as well as enabling high-throughput screening and discovery,42,43,44,45,46 and guiding experimental synthesis.47,48

In order to properly model and interpret dopability, the construction of an empirical dataset for cross-validation is of vital importance. While other physical properties have been tabulated in databases, there are few resources where carrier concentration in semiconductors has been collected. Minimization of the number of uncontrolled variables and maximization of the size of the dataset is helpful in improving accuracy, statistical significance, and applicability to the largest possible group of materials.40,49 Again, DLS stands out from this perspective as there are a large number of compounds and they are technologically relevant,50,51,52,53 including recent discovery of high β quaternary compounds.54 DLS compounds have the same tetrahedral local bonding environment and span an impressive fraction of the periodic table.55,56

The goal of the following is to establish a broader understanding of the drivers underpinning dopability and develop a method that predicts the possible carrier concentration range in DLS compounds. By performing a careful and extensive literature search with DLS as the model system, the experimentally realized carrier concentration ranges of 127 compounds have been obtained. Input features for modeling have been generated using structural information, periodic table properties of constituent elements, and widely-available, inexpensive DFT calculation results. We show, using cross-validation, that accurate predictions are possible for this dataset across the entire family, with the model capturing experimental trends in subsets of compounds as well. The features determined to be important in the linear regression are explained and matched with intuition and previous computational results. Finally, the dopability prediction engine is applied to additional DLS compounds which have not been experimentally studied to assess their ultimate potential as TE materials.

Results

Experimental and prediction comparison

The dopability dataset scraped from literature reports of carrier concentration is presented in Fig. 2. The width of each bar represents the dopability range for each of the 127 compounds found in the comprehensive literature search. While the theoretical limits on dopability are determined by defects and chemistry (see Fig. 1), there are also practical limits due to historical research which we call “persistence”. The more times a compound has been reported, the more likely it is that someone has pushed the dopability limit. This is highlighted by the color shading in Fig. 2. The persistence is quite varied, from one report of carrier concentration in a compound to dozens, with the average or median value being approximately five per compound. For compounds that have not been measured with high persistence, that have been studied for only a single application, or that have not been made with the explicit goal of exploring the full dopability range, it is likely that only a single distribution (n-type, intrinsic, or p-type) has been investigated. Whether intentionally or not, compounds which have been studied extensively are more likely to have been sampled in each distribution, thus pushing the dopability limits (Fig. S6). While one may intuit that the low-persistence compounds could be ignored, we provide two virtual experiments (Fig. S7) which demonstrate that we should not ignore these compounds in modeling the dataset and should instead use this persistence value in weighting the data for fitting.

Fig. 2

Experimental dopability range for diamond-like semiconductors collected from literature data. Left end of bar represents highest n-type carrier concentration while right side shows highest p-type achieved. Top to bottom order chosen to minimize both the difference in dopability range and the left/right displacement of the bar. Compounds with more experimental measurements are darker blue

Using this set of dopability data, linear regression, random forest, and neural network models were compared based on cross-validated prediction accuracy, with the results shown in Fig. S3. It was found that a linear regression provided similar or better prediction accuracy and superior interpretability, thus it was chosen for further refinement. Feature downselection for the linear model was performed using LASSO (least absolute shrinkage and selection operator), shown in Fig. S4. Due to the effect of persistence in this dataset, sample weighting was also applied to increase prediction accuracy, followed by further feature downselection. Linear regression and other models predict the mean value, while we are interested in the extreme limits of dopability, therefore confidence and prediction intervals were calculated to determine reasonable estimates of these limits.

The resulting predictions using leave-one-out cross-validation (LOOCV) are shown in Fig. 3. This figure contains only the subset of DLS compounds for which both experimental dopability data could be found and properties from DFT databases (OQMD and MP) had been calculated; defect structures were removed. The experimental range is shown with a blue bar representing the maximum extent of dopability in both directions observed in a given compound. A separate model is used to predict the maximum dopability on each side, with the predicted dopability range being all values between these maxima and given by the red bar. Since this model predicts the mean value for dopability, whereas we are interested in the maximum extent, a 50% prediction interval is given by the grey error bars. The prediction interval is calculated individually for each type (n/p) and compound, but in this dataset the bars are largely of the same length, indicating the compounds are fairly evenly distributed across the feature space. As the left and right sides of this dopability range are predicted separately, there is a difference between the accuracy of these individual models. The MAE of the CB is 1.16, while for the VB it is 1.22, giving an average MAE of 1.19 (about one order of magnitude in carrier concentration on average), demonstrating the model is roughly equally predictive of both n- and p-type carrier concentration. The predictive quality of this model has thus been established using LOOCV, demonstrating the ability to predict dopability using this data collection and statistical fitting method.
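The cross-validation procedure can be sketched in a few lines. The example below is a deliberately reduced illustration and not the model used in this work: it fits a single hypothetical feature by weighted least squares, uses the persistence count directly as the sample weight (an assumption), and scores the held-out points without weighting (also an assumption). All data values are made up.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def loocv_predictions(xs, ys, ws):
    """Predict each point from a model fitted to all of the other points."""
    predictions = []
    for i in range(len(xs)):
        keep = [j for j in range(len(xs)) if j != i]
        intercept, slope = weighted_line_fit(
            [xs[j] for j in keep], [ys[j] for j in keep], [ws[j] for j in keep]
        )
        predictions.append(intercept + slope * xs[i])
    return predictions


# Hypothetical data: one feature, p-type limit on the dopability scale,
# and persistence (number of literature reports) used as the weight
feature = [0.0, 1.0, 2.0, 3.0, 4.0]
p_limit = [4.2, 3.9, 3.1, 2.4, 2.2]
persistence = [12, 3, 1, 5, 2]

held_out = loocv_predictions(feature, p_limit, persistence)
mae = sum(abs(y - p) for y, p in zip(p_limit, held_out)) / len(p_limit)
print(f"LOOCV MAE = {mae:.2f} (dopability-scale units)")
```

Because every prediction comes from a model that never saw the compound being predicted, the resulting MAE estimates how the model would perform on a compound outside the learning set.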

Fig. 3

For each compound, the experimental range is shown in blue and the prediction in red (shade of blue denotes experimental persistence). Grey error bar style lines represent a 50% prediction interval for both the n- and p-type models. Top to bottom order is the same as in Fig. 2

Careful observation of the experimental dopability dataset reveals some trends, and these trends are captured by the model. A few of these trends are highlighted here and are discussed in more detail in Fig. S5. The first trend that is captured is in the Group IV compounds, where Si, Ge, and Sn all have quite large carrier concentration ranges across n- and p-type, whereas C can only be made p-type and SiC only n-type. Second, in binary materials, there is a clear trend in II–VI compounds where dopability shifts from left to right (n- towards p-type) as you move down the anion group which is not observed in experimental III–V data. Third, compounds of the II–IV–V2 family show a similar trend in the experimental data for those with Zn but not those with Cd. Finally, in I–III–VI2 compounds, Cu-containing compounds are much more p-type whereas Ag ones are intrinsic or n-type. The predictions for all of these sub-families of DLS match qualitatively with the observation of experimental trends and are discussed in more detail in the Supplemental Information.

Model interpretation

Rather than serving only as a prediction engine, the hope is that some physical insight can be gained through interpretation of the resulting model. As this is a linear regression, it has an intercept (baseline n/p-type dopability) and a number of features that modify the intercept based on the coefficient value associated with each of them. Therefore, Fig. 4 can be interpreted by looking at the sign and magnitude of the coefficient value of each feature in relation to that of the intercept. Features whose coefficients are opposite in sign to their intercept contribute to lowering the carrier concentration range for either the n/p-type side. The eight features of Fig. 4 constitute the set which provide the best fit (lowest MAE) before overfitting begins by the addition of more features. Each of the features and their associated coefficient values will be discussed individually but viewed together, there are three drivers of dopability in this model: substitutional defects, other chemistry related features (including lattice and electronic energy), and practical limits due to historical persistence.
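To make the reading of Fig. 4 concrete, the following minimal sketch evaluates a linear model as an intercept plus coefficient-weighted features. The intercept, coefficients, and feature values are invented for illustration only and are not the fitted values of Fig. 4; the feature names are likewise hypothetical labels.

```python
def predict_limit(intercept, coefficients, features):
    """Linear model: intercept plus the sum of coefficient * feature value."""
    return intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )


# Made-up p-type (VB) model: positive intercept, one coefficient of the
# same sign and one of opposite sign
vb_intercept = 3.0
vb_coefficients = {"n_unique_elements": 0.2, "contains_Ag": -1.0}

compound = {"n_unique_elements": 3, "contains_Ag": 1}
p_limit = predict_limit(vb_intercept, vb_coefficients, compound)
print(f"predicted p-type limit: {p_limit:.1f}")
```

Here the coefficient with the opposite sign to the intercept pulls the predicted p-type limit back toward intrinsic, which is the sense in which such features narrow the carrier concentration range on that side.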

Fig. 4

Linear regression intercepts and coefficients that were most important in determining carrier concentration ranges. Separate models were fit for n-type (CB) shown in green and p-type (VB) in purple. Error bars represent a 50% confidence interval for the coefficients. Features are in descending order based on the sum of the absolute value of the coefficients

The first set of coefficients to discuss are those that are partly related to the persistence limitation and more broadly the historical bias of prior experimental work: the intercept, Feature 1, and Feature 4. The sign of the intercept is as we should expect, negative for n-type and positive for p-type. A large number of p-type Cu-containing compounds form the learning set (over 45%), thereby making the intercept for the p-type prediction quite large. Conversely, the low individual persistence of these Cu-containing compounds yields a narrow dopability window and thus a smaller n-type intercept.

Feature 1 represents the number of unique elements in a compound; both coefficients are positive and the n-type value is significantly larger than the p-type value. In other words, the range shrinks and becomes more p-type with increasing number of unique elements. Unary and binary compounds have been well-studied and thus their experimental dopability range is quite large, while ternary and quaternary compounds have fewer reports in the literature and smaller experimental dopability ranges. The impact of the number of unique elements is not a priori obvious: as the number of sites increases, there are more possible sites to dope but also more possible substitutional defects that could pin the Fermi level. Again, the Cu-containing compounds induce this shift due to their native p-type behavior and limited persistence.

Feature 4 is whether silver is present in the compound, driving the dopability more n-type with a smaller range. Since many of the Ag-containing compounds have low p-type carrier concentration (more intrinsic) or are n-type, the presence of silver as one of the elements in the composition drives both the CB and VB more negative (Feature 4). As the model is heavily biased by Cu-containing compounds, the presence of Ag thus requires a correction to overcome this difference. Once again, persistence is relevant as the Ag-compounds likewise have limited persistence and a small range.

The next set of features all relate to the cations and their similarity to the anion: Feature 3 is the maximum cation average ionic radius, Feature 7 is the maximum electronegativity of the cations, and Feature 8 is the minimum absolute pairwise difference in atomic number between the anion and each cation. The effects of Features 7 and 8 are quite similar; when there is a cation that is more like the anion in electronegativity or atomic number, the carrier concentration is pushed toward intrinsic and the compound has a lower dopability range. This can be seen as the mean VB coefficient for both of these features is negative and the mean CB coefficient is positive, both opposite in sign to the VB/CB intercept. Since the VB coefficient for Feature 3 is negative (opposite to the VB intercept) and more negative than the CB coefficient, the dopability range decreases and shifts toward n-type when there is a cation with an ionic radius similar to that of the anion. Our interpretation of these trends is that these three features, each relating to having at least one cation that is similar to an anion, whether in size, electronegativity, or atomic number, reduce the dopability range. Such reduced range may be due to the emergence of low energy “killer” compensating defects.

The other important features are related to chemistry, but not closely linked to the concept of substitutional defects. Feature 2 is the atomic number of the anion; therefore the trend is that as the anionic species is found further to the right and bottom of the periodic table, the compound is more p-type and has a smaller dopability range. Feature 5 is the minimum Bader charge for the compound, and in this case, the minimum Bader charge is always that of the anion. The Bader charge is an approximation of the total electronic charge of an atom and depends on both the chemistry and structure of the compound. The coefficients for the linear model imply that as the anionic element becomes more positive (i.e., less negatively charged), the anion holds the electrons less strongly, the compound becomes more p-type, and the dopability range increases. This matches our intuition that compounds that are more covalent and less ionic are more dopable. Similarly, Feature 6 is the band gap from OQMD, with larger band gaps leading to a slightly more p-type material but one with a much smaller dopability range, again matching our intuition.

Phase competition has been found to affect dopability in materials.57 The number of elements plays a role in phase competition as there are likely to be more possible compounds in the phase diagram as the number of elements increases; as such, this prior work can be related to Feature 1. Another factor that enters into dopability is the energy of the lattice,13,15 captured in this model by the Bader charge in Feature 5. It has long been assumed that compounds with larger band gaps are harder to dope; however, more recent work has found that the band offset and the position of the band extrema relative to the Fermi stabilization or pinning energy is what controls doping limits.12,15,22,58 These computational efforts find that universal band alignment relates dopability in DLS compounds to the relative position of the VBM and CBM, creating “Pauling-esque” rules. However, these are not linked directly to the elements that are present and the trends in associated properties of those elements and compounds. Therefore, the remaining features are not directly comparable with previous computational work.

High-throughput dopability predictions

Combining dopability predictions with quality factor predictions allows the identification of potential new TE materials in the DLS family. While the optimized version of the dopability model includes features and weighting that do not lend themselves well to high-throughput predictions, a simplified version can be used. A model for dopability was constructed utilizing only a feature set created using the formula and chemistry-based inputs, and LOOCV has a prediction accuracy reasonably similar to the optimized model (MAE/MSE of 1.34/3.05 compared with 1.19/2.23). This model was then used in tandem with β20 to generate predictions for DLS compounds, with results shown in Fig. 5. Out of a total of 188 DLS compounds considered in this study, the dopability learning set was based on 127 compounds. This yields 61 compounds that had no experimental dopability information; both β and dopability could then be predicted for these compounds. An additional 67 compounds were part of the dopability learning set and had β predictions but had a persistence of less than five papers. In this latter case, the dopability prediction may highlight opportunities for extending the known dopant range in promising β compounds.

Fig. 5

Predicted quality factor (β) vs. dopability for compounds in the diamond-like structure. Filled circles are experimental dopability, open circles are predicted. Horizontal colored lines connect experimental and predicted values of dopability, with color and thickness of line showing persistence. Dashed gray lines represent approximate benchmarks for good TE performance (carrier concentration > |3 × 10^18| cm^−3 and β > 10), meaning promising materials are in the regions labeled n-type and p-type
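The two benchmarks in the Fig. 5 caption amount to a simple joint filter, sketched below. The thresholds come from the caption; the function name, the argument layout, and the example values are hypothetical and only illustrate how a high-β compound can still fail the screen.

```python
CARRIER_MIN_CM3 = 3e18  # approximate carrier concentration benchmark (cm^-3)
BETA_MIN = 10           # approximate quality factor benchmark


def promising_types(beta_n, beta_p, max_n_cm3, max_p_cm3):
    """Return the carrier types for which both benchmarks are exceeded.

    `max_n_cm3` and `max_p_cm3` are the magnitudes of the maximum
    achievable (or predicted) n- and p-type carrier concentrations.
    """
    types = []
    if beta_n > BETA_MIN and max_n_cm3 > CARRIER_MIN_CM3:
        types.append("n-type")
    if beta_p > BETA_MIN and max_p_cm3 > CARRIER_MIN_CM3:
        types.append("p-type")
    return types


# Hypothetical compound: high n-type beta, but only dopable p-type
print(promising_types(beta_n=20, beta_p=2, max_n_cm3=1e17, max_p_cm3=1e20))
```

A compound like the hypothetical one above returns an empty list: it is the kind of false positive that a β-only screen would flag and a dopability prediction removes.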

The first notable observation is that both the model predictions and experimental values (where they exist) indicate that far more DLS compounds are dopable to a p-type carrier concentration appropriate for thermoelectric applications than to an n-type one. Furthermore, there are far more DLS compounds with β predicted to be 10 or greater as n-type than p-type. The average hole mobility is approximately two orders of magnitude lower than the electron mobility, and the valence band mass is about one order of magnitude higher than the conduction band mass on average. This leads to many materials with high-potential n-type performance and fewer with high p-type β. Unfortunately, it appears that far fewer compounds can be realized n-type at reasonably high carrier concentrations. Many of these are quaternary Cu-containing compounds which have been made p-type only, and while the quality factor indicates they have the potential to be very good thermoelectric materials, the dopability predictions show it is very unlikely they can be made n-type to realize that potential. However, the high quantity of low-persistence quaternary Cu compounds may bias this conclusion; further studies of extrinsic doping in these materials are needed to establish the attainable carrier concentration bounds.

Figure 5 also demonstrates there are a number of binary and ternary compounds which could be interesting thermoelectric materials where the dopability limits have not been fully explored experimentally. For example, β calculations indicate GaP could be a good thermoelectric, with both experimental and predicted dopability in the appropriate range. ZnSnSb2 is predicted to have a much larger carrier concentration window than that found to date experimentally, and it could be a good thermoelectric if pushed from p-type through intrinsic to n-type. The predictions also indicate that CuInSe2, which has had recent attention as a p-type thermoelectric, should be looked at instead as an n-type TE. Likewise, p-type CuFeS2 and AgFeS2 could be promising TE materials. SnTe59,60,61 and SnS62,63 have both received significant interest as TE materials in their typical structures (rocksalt and orthorhombic, respectively) and our predictions indicate they could exhibit high performance if stabilized in the diamond-like structure.

Discussion

In this work, we show that dopability ranges can be predicted to approximately one order of magnitude against experimental results through linear modeling. The accuracy of a simple linear model is found to be similar to more complex machine learning techniques and allows greater interpretability regarding the features and properties that control dopability in diamond-like semiconductors. A number of features indicate that substitutional defects play a major role in driving carrier concentration ranges toward intrinsic ranges. In addition, features such as band gap and Bader charge match either “rules of thumb” or previous computational findings. Compounds that possess both promising thermoelectric quality factor (β) and complementary dopability are identified. Our results serve as a caution against pursuing a subset of high β compounds that have unfavorable dopability ranges. For compounds or subgroups of DLS where persistence is historically low, this work serves to inspire further experimental interrogation of carrier concentration limits when the predicted dopability range is far greater than the experimentally realized limits. These results also indicate where detailed defect calculation efforts are most impactful. We note that the predictive dopability model is not inherently limited to DLS compounds; an expanded learning set could incorporate structural descriptors so that the model could be applied to a more diverse set of compounds, including all thermoelectric materials.

Methods

An extensive literature search was undertaken to compile a dataset that could be used for statistical modeling. Although a few sources of tabulated carrier concentration exist, these often do not note sample quality or processing conditions, even though these are very important in determining carrier concentration in semiconductors. For this reason, careful consideration was given to experimental conditions from which measurements could be relied upon and all measurements for this dataset were scraped from original sources where possible. An attempt was made to use bulk samples produced at or near equilibrium conditions, measured using the Hall effect technique near room temperature and pressure. Nanomaterials, thin films, and other non-equilibrium processing methods were avoided where possible. The results of this literature search are given in Table S1.

With DLS as the model system, the scale for dopability must be defined such that the data are distributed fairly normally so that they can be modeled. The primary goal is to describe the full extent of the achievable dopability range in a compound, from the maximum n-type carrier concentration to the maximum p-type concentration. In some compounds, carrier concentration can be experimentally varied across many orders of magnitude for both hole and electron majority carriers, while for others the range is narrow or limited to a single type. Although carrier concentration spans many orders of magnitude for both positive and negative values, we are primarily interested in the range from intrinsic to degenerately doped. Thus, our scale ranges from −1 × 10^21 (degenerate n-type) to 1 × 10^21 (degenerate p-type), with intrinsic considered any concentration less than |1 × 10^16| (all units given as cm^−3). This allows us to define a linear scale from −5 to 0 to 5, corresponding to −1 × 10^21 to ±1 × 10^16 to 1 × 10^21, where every integer value represents an order of magnitude in carrier concentration. Any concentration greater than 1 × 10^21 in magnitude is assigned a value of ±5, and anything less than 1 × 10^16 in magnitude is considered to be 0. Using this scale for the carrier concentration data scraped from the literature results in a distribution that is relatively normal, as seen in Fig. S1.
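The scale defined above can be written as a short function. This is a minimal sketch of the stated mapping, assuming the sign convention used in the text (negative concentrations for n-type, positive for p-type); the function name is illustrative.

```python
import math


def dopability_scale(concentration_cm3):
    """Map a signed carrier concentration (cm^-3) onto the -5 to 5 scale.

    Negative inputs denote n-type and positive inputs p-type. Magnitudes at
    or below 1e16 map to 0 (intrinsic); magnitudes above 1e21 are capped at 5.
    Each integer step corresponds to one order of magnitude.
    """
    magnitude = abs(concentration_cm3)
    if magnitude <= 1e16:
        return 0.0
    value = min(math.log10(magnitude) - 16.0, 5.0)
    return -value if concentration_cm3 < 0 else value


for n in (-1e21, -3e18, 1e15, 1e18, 5e22):
    print(f"{n:+.0e} cm^-3 -> {dopability_scale(n):+.2f}")
```

On this scale a prediction error of 1 corresponds to a factor of ten in carrier concentration, which is why an MAE near 1 is described as about one order of magnitude.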

To model this dataset, a list of features was first compiled. This includes chemistry-based features from periodic table properties of the constituent elements. Added to this feature set were inexpensive calculations from the Open Quantum Materials Database (OQMD)64,65 and Materials Project (MP).66,67,68 Lastly, structure and other miscellaneous features were added (Fig. S2). Modeling was performed using a number of Python packages, most notably Scikit-learn and StatsModels. Linear regression and leave-one-out cross-validation (LOOCV) were chosen due to a combination of prediction accuracy and interpretability after comparing the results with other machine learning methods. A detailed discussion of data and feature set preparation, model comparison and analysis, and refinement can be found in the Supplemental Information. The primary metric for scoring accuracy used here is the mean absolute error (MAE), which is the average distance between the experimental and predicted values, though mean squared error (MSE) is also evaluated. For both metrics, smaller values indicate less difference between experiments and predictions.
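For reference, the two scoring metrics can be computed as follows. This is a plain-Python sketch with invented values on the dopability scale; it is not the code used for the reported results, which relied on the packages named above.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_error(observed, predicted):
    """Average squared difference; large misses are penalized more heavily."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical experimental and predicted dopability limits
experimental = [2.0, -1.0, 4.0]
predicted = [1.0, -1.5, 2.0]

print(f"MAE = {mean_absolute_error(experimental, predicted):.2f}")
print(f"MSE = {mean_squared_error(experimental, predicted):.2f}")
```

Reporting both is informative because the MSE grows faster than the MAE when a few compounds are predicted poorly, so a large gap between the two points to outliers rather than uniformly moderate errors.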