Introduction

In the wake of the climate crisis, transitioning to a carbon-free society is crucial in creating a sustainable future. In their 2018 report, the Intergovernmental Panel on Climate Change (IPCC) explores sustainable scenarios with PV systems producing up to a third of the planet’s renewable energy1. Today’s most prominent PV technology is silicon (Si) based solar cells2. Improving the efficiency and reliability of Si PV modules and solar cells, as well as reducing the cost of this technology, is paramount in meeting the IPCC targets3. Research focused toward higher efficiency Si solar cells has shone light on Si bulk defects as a key contributor to their total efficiency loss4. It is therefore primordial to identify and characterize bulk defects in order to eliminate or at least reduce their impact. Techniques such as temperature- and injection-dependent lifetimes spectroscopy (TIDLS)5 enables characterization of a wide range of Si bulk defects, such as copper6, aluminium7,8, chromium9,10, iron11,12 and many more13,14,15,16,17. To identify defects, the Shockley-Read-Hall (SRH) equation18,19 that describes the impact of a defect on the overall minority carrier lifetime is often used to fit the lifetime measurements20. Three defect parameters are defined by the SRH equation to represent a defect: its energy level Et and its electron and hole capture cross-sections, σn and σp, respectively. The state of the art fitting method uses Rein’s defect parameter solution surface (DPSS)20 which determines, for each injection-dependent measurement at a fixed temperature, a set of solutions to the SRH equation20. By continuously varying Et, the best fit for other parameters, or combinations of parameters, can be found, creating a continuous surface in the parameter space which represents the entirety of the mathematical solutions for that fit. By repeating the process for each temperature-dependent measurement, different DPSS curves are determined. These curves usually intersect at two points that indicate the possible combinations of the defect parameters, with one set of parameters in the upper bandgap and one in the lower bandgap7,8,9,10,11,12,13,21. This method, henceforth referred to as the ‘traditional approach’, has been refined over the years, such as linearizing the SRH equation under specific conditions14,22 or applying faster convergence techniques to find the intersection points21. However, the technique and its derivatives assume that the defect parameters are independent of temperature and that the defect possesses only one energy level in the Si’s bandgap7,9. In addition, in practice, it fails to identify which of the two mathematical solutions is the correct one.

To overcome these limitations, this study proposes a machine learning (ML) based approach to extract the defect parameters from lifetime curves. ML-based methods are already used at the PV system level, for example for fault detection23,24 or to identify cracks in modules using luminescence imaging techniques25,26,27,28. ML has also been used in non-Si applications to find relevant material parameters for fabrication of CIGS solar cells29, multi-junction solar cells30, organic solar cells31, or perovskite solar cells32. More recently, Kurchin et al.33 proposed a Bayesian-based model to predict the probability distribution of the defect parameters from temperature dependent current-voltage measurements. The model is designed to fit a single combination of defect parameters and needs to be retrained for different measurement scenario, such as iron or copper in silicon. Furthermore, it seems that the use of current-voltage measurements to extract the defect parameters can be influenced by other solar cells parameters, such as edge recombination and shunt resistance, that can mask the impact of the studied defect. This study’s approach brings the technological prowess of ML to defect parameters extraction in order to propose an alternative way to solve this decades-old challenge. Multiple ML training algorithms are considered, and compared, to predict the defect parameters directly from TIDLS measurements with no knowledge of the SRH equation. The developed method is not limited to silicon and can be used for non-Si PV materials, such as perovskites or CIGS.

Results

Defect parameter regression

Five iterations of training for each of the ML training algorithms are performed. Each training uses different random model initialization and random training/validation dataset split to ensure the repeatability of the ML algorithm’s predictive power. For neural networks (NN), the evaluation data is held out until the end of the epoch-based training. In Fig. 1, the results of each training’s coefficient of determination (R2) score are presented for the different ML models for both Et (a) and k (b) predictions. The models have been ordered from the best performers to the worst and the average and standard deviation R2 are given for each model and regression target. In the figure, the statistical box plot for each model is shown in green, while the black dots represent each individual training iteration.

Fig. 1: Defect parameter regression model comparison.
figure 1

a R2 scores of models on validation dataset for Et. b R2 scores of models on validation dataset for k.

Apart from the support vector machine (SVM) models – expected to perform better when the datasets are small and sparse34—and one instance of NN, all trained ML models have am R2 above 95% on the validation dataset, showing the feasibility of using ML as an inverse function, with no a priori knowledge of the SRH equation. The highest average R2 (>99%) is achieved by the random forest (RF) model for both k and Et prediction. From Fig. 1a, b, it can be concluded that the decision tree methods, such as RF, adaptive boosting, and gradient boosting achieve the highest average R2 score in predicting both Et and k, followed by NN and SVM. NN have a higher variance than the other models, due to its sensitivity to the random initial weights and the random order in which the data is processed during training35. Indeed, the training of NN relies on backpropagation steps that shift the weights of the network proportional to the error of the data points that it is trained to predict in order to minimize that error36. Changing the order in which the NN evaluates the data shifts the networks to different local minima because of these initial larger weight shifts.

Figure 2 shows the predicted parameters versus the true parameters for the best performing RF model using the validation dataset. Both (a) Et and (b) k predictions have R2 scores above 99%. In each graph, the data points are semi-transparent, and the darker area outlines the linearity of the correlation between predicted and true values, as most data points lie on the y = x line. The RF model correctly predicts Et across the entire range 87% of the time at ±0.02 eV precision which is typical error bars for reported Et values (between ± 0.02 eV to ±0.04 eV)8,9,10,15,16,17. It is noticeable that far from the mid-gap (|Et | > 0.1 eV), the correct predication is much higher (91%) for this precision level. Close to the mid-gap (|Et | < 0.1 eV), the prediction of Et at ±0.02 eV correctness drops to 75%. This can be explained by examining the SRH equation [Eq. (2)] when Et is close to 0 eV. The dependency on Et comes from the n1 and p1 terms [Eq. (4)]. When Et approaches 0 eV, both n1 and p1 terms becomes negligible compare to n0 + Δn and p0 + Δn, respectively. Thus, the resulting SRH lifetime has a weak dependence on Et, increasing the difficulty of both human and ML prediction near the mid-gap.

Fig. 2: Random forest defect parameter regression.
figure 2

a True vs predicted Et on validation dataset for best performing random forest. b True vs predicted k on validation dataset for best performing random forest.

To validate this explanation, a simulation of 200 defects with a fixed k = 100 and Et spanning from −0.1 eV to 0.1 eV is performed. Figure 3 presents the lifetime curves of these defects at 400 K. It can be observed that the defects are indistinguishable from one another on a macro scale. The insert is a zoom-in of the ensemble of lifetime curves at low injection level and it reveals that this ensemble of curves is 50 ns wide. Whereas it is extremely difficult to distinguish those lifetime curves with the naked eye, the RF model still correctly predicted 75% of the defects with ±0.02 eV accuracy.

Fig. 3: Mid-gap defects lifetime curve simulation.
figure 3

Simulation of 200 defects lifetime curves for Et near the mid-gap and k = 100 at 400 K. The insert is a zoom-in to estimate the width of the lifetime ‘band’.

Defect bandgap location classification

In the traditional DPSS approach two possible solutions are often identified, one in the lower bandgap and one in the upper bandgap7,8,9,21. In certain cases, the half bandgap location can be determined using additional measurements, such as deep level transient spectroscopy17. To address this limitation, ML models are trained to classify in which half bandgap the defect is located. The models have been ordered from the best performers to the worst one in Fig. 4a, and the average and standard deviation of each model’s accuracy. The statistical box plot for each model is shown in green, while the black dots represent each individual training iteration. NN significantly outperforms the other models, reaching an average accuracy of 82.7%. It is worth noting that all the other algorithms produce models above 75% accuracy proving the possibility of identifying the half bandgap’s location with TIDLS measurements. Despite NN’s good performance, the variation in the accuracy score between trained models shows the same behaviour than in the regression’s case of sensitivity to the first few steps of backpropagation (as shown by the standard deviation of this model). From Fig. 4a, it can also be observed that the RF models are significantly better than its boosting counterparts. Boosting techniques add error scoring methods during the training of the different decision trees of the RF37. The difference in performance between boosting and RF is due to the overfitting of the training data by the boosting models38. The best trained NN has an accuracy of 87.7% on the validation dataset and its confusion matrix is shown in Fig. 4b. The confusion matrix is symmetric, as expected, showing that both half bandgaps are treated equally.

Fig. 4: Defect half bandgap energy location classification model comparison.
figure 4

a Accuracy scores for each model on validation dataset. b True vs predicted confusion matrix on validation dataset for best performing neural network.

To provide more insight into the 12.4% incorrectly predicted half bandgap locations, the corresponding defects are plotted in Fig. 5, as a k vs Et plot. Each data point, representing a single defect parameter combination fed into the NN model, is coloured based on the NN’s classification probability towards their true half bandgap location. The probabilities are obtained directly from the output layer of the NN, which are known to be well-calibrated39. The grey datapoints corresponds to correctly predicted defect half bandgap location while coloured datapoints to incorrectly predicted defect half bandgap location. The colour scale represents the correct label probability, and for incorrectly labelled datapoints, is necessarily between 0 and 0.5. The darker the datapoints, the more inaccurate the classification by the NN is.

Fig. 5: Best performing neural network k vs Et plot.
figure 5

Coloured datapoints represents defects where half bandgap location was incorrectly predicted. Grey datapoints represents correctly predicted half bandgap location. The dotted black line corresponds to k values close to vp/vn.

The data points aggregate around two trends. For defects with |Et| < 0.1 eV, the classification accuracy drops to 68.7%, which still provides a decent indication regarding the half bandgap location without the need for additional measurements. As previously discussed for Fig. 2, defects with Et near the mid-gap are harder to differentiate for both human and ML. Outside the mid-gap range (|Et| > 0.1 eV), the NN classification model performs extremely well, with a 94.7% half bandgap location classification accuracy. The second trend is that outside the mid-gap region, the most incorrect classification is for defects with k values around 0.83 ± 0.01, represented by the dotted black line in Fig. 5. This corresponds to the ratio between hole and electron thermal velocity. Indeed, when the value k approaches vp/vn, it equalizes τn0 and τp0, resulting in the following simplified SRH equation:

$$\tau _{{\mathrm{SRH}}} = \tau _{{\mathrm{n}}0} \cdot \left( {1 + \frac{{p_1 + n_1 + {\mathrm{{\Delta}}}n}}{{p_0 + n_0 + {\mathrm{{\Delta}}}n}}} \right)$$
(1)

Since n1 and p1 have opposite Et dependence, the SRH equation in the case k = vp/vn is symmetric in respect to Et. This means that for a fixed k value, satisfying k = vp/vn, or equivalently σnp = vp/vn, the lifetime curves for any given Et are identical to the lifetime curves of the opposite energy level −Et. For this reason, it is difficult to pinpoint the exact half bandgap location of defects when the value of k : = σn/σp is close to vp/vn, as shown in Fig. 5 where the prediction probability tends to 50% for each half. However, the regression targets k and Et are not compromised by that regime, as seen in Fig. 2.

Verification

A comparison between DPSS and the best-trained ML models is given in Fig. 6. The SRH lifetime curves at various temperatures for a selected defect (Et = 0.335 eV and k = 1.1) are presented in Fig. 6a. From these, five DPSS curves are generated following Rein’s method20 [see Fig. 6b] to identify the defect parameters. Figure 6b also includes the combined predications of Et and k by the best of the previously trained ML regressor models of each category (indicated by coloured symbols). As can be noticed, a great agreement is demonstrated between all the ML models and the DPSS method in terms of the prediction of the defect parameters. The best regressor predict Et within ±0.0003 eV and k within ±0.05 of the simulated values.

Fig. 6: Traditional and proposed method comparisons on simulated defect.
figure 6

Comparison of the same defect parameters (Et = 0.335 eV and k = 1.1) without (ac) and with (df) noise. a, d Simulated SRH lifetime curves. b, c Calculated DPSS curves for k := σnp. c, f DPSS curves intersection sharpness ε defined as the standard deviation of the k-DPSS curves divided by the mean of those curves.

As stated previously, the DPSS approach usually provides two solutions for the defect parameter extraction, with one in the upper half bandgap and one in the lower half bandgap. To distinguish between the two solutions, the sharpness of the intersection of the DPSS is plotted in Fig. 6c as a function of Et, where the sharpness ε is defined as the standard deviation of the k-DPSS curves divided by the mean of the curves. The sharpness is only defined when all the DPSS curves exists, thus, the discontinuity in the sharpness curve. Using this approach, the minimum of the sharpness curve points to the most probable solution. In Fig. 6e, the DPSS curves sharpness clearly indicates the correct upper bandgap as the solution (also marked by the orange band). The green band represents the best ML classifier model’s classification of the defect energy level half bandgap location. The ML classifier correctly identify the half bandgap location and, in this case, aligns with the DPSS classification.

This analysis is repeated on the same defect parameters, adding random gaussian noise to the SRH lifetime curves. As shown in Fig. 6d, the noise is scaled up with decreasing excess carrier concentration to reproduce similar behaviour of actual lifetime measurements. The DPSS curves of the noisy data [Fig. 6e] shows that the ML regressor models are robust to the noise and continues to show accurate predictions of the defect parameters, with the best regressor predicting Et within ±0.004 eV and k within ±0.05. However, by adding noise to the SRH lifetime curves, the sharpness of the DPSS curves becomes ambiguous [Fig. 6f] and the DPSS approach now predicts the wrong half bandgap energy location. The ML still predicts the correct half bandgap energy location, showing the ML robustness to noise and presenting a case where the ML approach outperforms the DPSS. The comparison done in Fig. 6 is repeated 1,000 times for the same defect with random noise. The ML approach has an accuracy of 87.7% for half bandgap energy location prediction, while DPSS reaches 80.9% accuracy. When increasing the noise intensity, the accuracy of the DPSS drops to 44.7%, while the ML retains 76.4% accuracy. This is a strong demonstration that ML-based predictions are more reliable compared to DPSS-based predications when dealing with noisy signals.

The comparison between the DPSS and the ML has been repeated for 10,000 random defect parameters and set of SRH lifetime curves with random noise. In terms of classification, the DPSS approach predicts the correct half bandgap location 60% of the time, while the ML is correct 80% of the time over those 10,000 defects. Furthermore, the regression of the correct bandgap has shown an average defect energy level residual of 0.093 eV for the DPSS compared to almost of magnitude lower for the ML (0.011 eV). This is a clear indication that using the ML approach has a significant advantage over the DPSS approach in terms of its robustness to noise. This is quite remarkable considering the fact the ML classifier and regressors have been trained only on a dataset with no noise. We assume that the proven good robustness to potential measurement noise can be even further improved by training on a noisy dataset.

Experimental application

To be able to apply the ML approach to experimental measurements, an adaptation of the simulation parameters (such as wafer type, Ndop, and temperature) to the desired measurement is necessary. Once the ML model is trained, the experimental measurements can be fed to the model in order to extract the defect parameters. This method is validated below by comparison to the traditional approach using the DPSS analysis.

For validation, an n-type Si wafer with Ndop = 5.1 × 1015 cm−3 from Zhu et al.13. is used. Lifetime measurements are done using TIDLS at eight different temperatures. The DPSS method is then used to extract Et and k and the results are compared to ML predictions in Fig. 7. Each DPSS curve represents the best fit to the SRH model at that temperature. As discussed, the defects parameters are identified by a simultaneous minimization of the fit errors at all temperatures. The two solutions found by the DPSS method are marked by the black diamond symbol. The transparent ellipse represents the error-space defined by the error bars of this method. These two solutions are (Et = −0.30 ± 0.03 eV; k = 1.87 ± 1.18) for the lower half bandgap and (Et = 0.28 ± 0.04 eV; k = 1.01 ± 0.44) for the upper half bandgap solution. The ML predictions are obtained by training an instance of different ML training algorithms compared in this study: NN, RF and its boosting counterparts. Figure 7 shows that the ensemble models—RF, adaptive, and gradient boosting—all predict defect parameters within the DPSS solutions uncertainty range for both Et and k, while NN predicts Et within the DPPS solution uncertainty range but overshoot the value of k. Furthermore, calculating the sharpness of the DPSS curves’ intersection points the half bandgap defect energy location in the lower half bandgap. In this experimental case, the ML approach concur with the DPSS results. The agreement between the ML models and the DPSS method highlights the success of this first-ever application of ML techniques for extraction of defect parameters. Based on this study, the approach can be extended to overcome the limitations of the currently used methods. For example, multi-level defects or multiple single-level defects modelling can be used to generate the training dataset and temperature-dependence of the defect parameters, such as the capture cross-sections, can be introduced into the physical models as well.

Fig. 7: Experimental measurements from machine learning predictions.
figure 7

DPSS curves of experimental measurements at multiple temperature. Black point and ellipse (error bar) show the traditional approach parameter extraction and coloured symbols the defect parameters extracted using machine learning.

Discussion

In this study, we proposed a ML-based approach to defect parameters extraction. By generating more than a million TIDLS lifetime curves with the SRH equation, ML models were trained and evaluated on simulated data. Overall, ensemble methods, such as RF, adaptive, and gradient boosting, seem to be the most promising algorithms to extract the defect parameters with an average R2 score of 99%, whereas NN excels at predicting the half bandgap location of the defect with an average accuracy of 82.7%, outperforming the DPSS in predicting the half bandgap location of noisy SRH lifetime curves, which showcase an important advantage of the ML approach over the traditional approach. With no knowledge of the relationships predicated by the SRH equation, ML models provide insights regarding the physical limitations, such as when the lifetime curves are indistinguishable from one another, while still performing well in those conditions. Experimental measurements validated the ML approach in this successful defect parameter extraction using ML from lifetime measurements.

As more complex models are trained, additional insights can be gained that can contribute to a better understanding of defects. More complex models can also breach the limitations of traditional approaches as more modelling possibilities could be incorporated into the dataset generation step, such as multiple single-level defects and temperature-dependent capture cross-sections. The proposed approach can be generalized to any number of temperature measurements and wafer conditions and is not limited to Si wafers. The adaptability of this method ensures its uses in very varied scenarios and transfers its potential to other measurements techniques or material measured such as thin-film or perovskites with possibly more complex ML models.

Methods

Dataset Simulation

The SRH18,19 equation is used to generate a dataset of simulated lifetime curves for a wide range of defect parameters (Et, σn, σp). The SRH lifetime τSRH is given by:

$$\tau _{{\mathrm{SRH}}} = \frac{{\tau _{{\mathrm{p}}0} \cdot \left( {n_0 + n_1 + {\mathrm{{\Delta}}}n} \right) + \tau _{{\mathrm{n}}0} \cdot \left( {p_0 + p_1 + {\mathrm{{\Delta}}}n} \right)}}{{n_0 + p_0 + {\mathrm{{\Delta}}}n}}$$
(2)

with

$$\tau _{{\mathrm{p}}0}: = \left( {\sigma _{\mathrm{p}} \cdot v_{\mathrm{p}} \cdot N_{\mathrm{t}}} \right)^{ - 1};\tau _{{\mathrm{n}}0}: = \left( {\sigma _{\mathrm{n}} \cdot v_{\mathrm{n}} \cdot N_{\mathrm{t}}} \right)^{ - 1}$$
(3)
$$n_1: = n_{\mathrm{i}} \cdot e^{\frac{{E_{\mathrm{t}}}}{{k_{\mathrm{b}}T}}};p_1: = n_{\mathrm{i}} \cdot e^{\frac{{-E_{\mathrm{t}}}}{{k_{\mathrm{b}}T}}}$$
(4)

In Eq. (2), n0 (p0) is the electron (hole) carrier concentration at thermal equilibrium and Δn is the excess carrier concentration. In Eq. (3), vn (vp) is the electron (hole) thermal velocity calculated using the model of Green et al.40 and Nt is the defect density. In Eq. (4), kb is Boltzmann’s constant, T is the temperature and ni is the intrinsic carrier concentration calculated using the model of Couderc et al.41, taking into account bandgap narrowing with the model of Yan et al.42. Et is defined relative to the intrinsic energy of Si Ei (Et = 0 eV is at mid-gap) and grows positive as Et gets closer to the conduction band. At this stage, it is assumed that the defect parameters (Et, σn, σp) are independent of temperature.

For each combination of defect parameters (Et, σn, σp), 500 lifetime points are simulated as a feature vector of 100 Δn points at five different temperatures. Without loss of generality, p-type wafers are chosen with a bulk doping density Ndop fixed at 1015 cm−3, while Nt is fixed at 1012 cm−3. In total, 300,000 defect parameters combinations are chosen randomly within the reasonable physical range resulting in a million and a half-simulated lifetime curves ranging from 0.1 μs to 10 ms. The ranges of the other simulation parameters are given in Table 1.

Table 1 Data simulation parameter range.

Machine learning training

The simulated dataset is randomly split into a training dataset (90% of the simulated data) used to train the ML model and a validation dataset (10%) used to evaluate the model. In the traditional defect parameter extraction approach, such as DPSS20 and its derivatives14,21,22, two solutions for Et are usually obtained, with one in the upper half of the bandgap (Et,+) and the second in the lower half of the bandgap (Et,-). In this study, for the regression of Et, the training of the ML model is done separately for each half bandgap. The results are then combined and presented together. In order to identify in which half-bandgap the defect is located, the same approach is used to train a classifier model to predict the sign of Et. The classifier outputs the probability of the defect being in the upper or lower half bandgap. The capture cross-section regression’s target is the ratio k:= σnp, which is usually reported in defect parameters’ characterization7,9,14,21,22. For the regression of k, the training of the ML model is done on the whole bandgap. In total, four ML models are trained to inverse the SRH equation: two regression models to predict Et, one regression model to predict k; and one classification model to determine the defect energy location. The ML training flow is summarized in Fig. 8.

Fig. 8: Machine learning defect parameter extraction training flow.
figure 8

Dataset is simulated using random defect parameters within defined constraints and different machine learning models are trained for each of the prediction targets: Et,+, Et,−, k and the defect energy location.

For each of the four ML models needed, different ML training algorithms can be used, defined in each grey box of Fig. 8. Random forests (RF)43 are an ensemble of decision trees, where each tree is built from a random subset of the training data. The RF model outputs an average decision by all its decision trees. Enhancements of this method are possible through boosting, notably adaptive boosting37,44 and gradient boosting38, which weights the decision of every tree to improve the predictive power of the overall model. A neural network (NN)36,45 is a layered group of nodes, called neurons, with a set of weights that are updated as it processes the training data. NN are widely used for different applications, such as image recognition46, text translation46 or beating Go world champions47. Another widely used algorithm is support vector machines (SVM)48,49 which consists of mapping the training datapoints to a higher dimensional feature space where the data becomes easily separable. For classification problems, the K-nearest neighbours (KNN)50 algorithm simply assigns the predicted label to be the majority vote of the closest known training data points. Figure 8 also shows that most ML training algorithms are used to train both regression and classification models, however, SVM is dedicated to regressions and KNN to classification.

In order to compare the different trained ML models, two scoring techniques are introduced. For the regression of defect parameters, predictions are made on the validation dataset and correlated to the true defect parameter values. To score the correlation, the coefficient of determination (R2) is used51:

$$R^2: = 1 - \frac{{\mathop {\sum }\nolimits_{i = 1}^N ( {y_{\mathrm{i}}^{{\mathrm{pred}}} - y_{\mathrm{i}}^{{\mathrm{true}}}} )^2}}{{\mathop {\sum }\nolimits_{i = 1}^N \left( {y_{\mathrm{i}}^{{\mathrm{true}}} - y_{{\mathrm{mean}}}^{{\mathrm{true}}}} \right)^2}};$$
(5)

where ‘pred’ (‘true) reference to the predicted (true) values of the validation dataset and yi spans the N data points of the validation dataset. The R2 score has a maximum value of 1 which indicates a perfect correlation and enables comparison between prediction targets. For the trained classification models, a confusion matrix is created, comparing predicted and true labels. The considered classification is binary, and predictions can have one of the following four outcomes: true positive (TP), true negative (TN), false negative (FN) and false positive (FP). The accuracy (‘Acc’) is a defined as52:

$${\mathrm{Acc}}: = \frac{{{\mathrm{TP}} + {\mathrm{TN}}}}{{{\mathrm{TP}} + {\mathrm{TN}} + {\mathrm{FP}} + {\mathrm{FN}}}}$$
(6)

The accuracy measures the ratio between correctly predicted points and the total number of points of the training dataset. The lifetime curves dataset simulation, ML model training and evaluation are implemented using Python and Scikit-learn53. More information about the hyper-parameter used can be found in the Supplementary Methods. The machine used for training has an Intel® Xeon® W-2145 processor, a central processing unit up to 3.70 GHz and a random-access memory of 64 GB. The training time is in the order of minutes for all the algorithms presented in this paper.