Extracting bulk defect parameters in silicon wafers using machine learning models

The performance of high-efficiency silicon solar cells is limited by the presence of bulk defects. Identification of these defects has the potential to improve cell performance and reliability. The impact of bulk defects on minority carrier lifetime is commonly measured using temperature- and injection-dependent lifetime spectroscopy, and the defect parameters, such as the energy level and capture cross-section ratio, are usually extracted by fitting the Shockley-Read-Hall equation. We propose an alternative extraction approach that uses machine learning trained on more than a million simulated lifetime curves, achieving coefficients of determination between the true and predicted values of the defect parameters above 99%. In particular, random forest regressors show that defect energy levels can be predicted with a high precision of ±0.02 eV, 87% of the time. The traditional approach of fitting to the Shockley-Read-Hall equation usually yields two sets of defect parameters, one in each half bandgap. The machine learning model is trained to predict the half bandgap location of the energy level, and successfully overcomes this limitation of the traditional approach. The proposed approach is validated using experimental measurements, where the machine learning model predicts the defect energy level and capture cross-section ratio within the uncertainty range of the traditional fitting method. The successful application of machine learning in the context of bulk defect parameter extraction paves the way to more complex data-driven physical models which have the potential to overcome the limitations of traditional approaches and can be applied to other materials such as perovskites and thin films.


INTRODUCTION
In the wake of the climate crisis, transitioning to a carbon-free society is crucial in creating a sustainable future. In their 2018 report, the Intergovernmental Panel on Climate Change (IPCC) explores sustainable scenarios with PV systems producing up to a third of the planet's renewable energy 1 . Today's most prominent PV technology is silicon (Si) based solar cells 2 . Improving the efficiency and reliability of Si PV modules and solar cells, as well as reducing the cost of this technology, is paramount in meeting the IPCC targets 3 . Research focused toward higher efficiency Si solar cells has shed light on Si bulk defects as a key contributor to their total efficiency loss 4 . It is therefore essential to identify and characterize bulk defects in order to eliminate or at least reduce their impact. Techniques such as temperature- and injection-dependent lifetime spectroscopy (TIDLS) 5 enable characterization of a wide range of Si bulk defects, such as copper 6 , aluminium 7,8 , chromium 9,10 , iron 11,12 and many more [13][14][15][16][17] . To identify defects, the Shockley-Read-Hall (SRH) equation 18,19 that describes the impact of a defect on the overall minority carrier lifetime is often used to fit the lifetime measurements 20 . Three defect parameters are defined by the SRH equation to represent a defect: its energy level E t and its electron and hole capture cross-sections, σ n and σ p , respectively. The state-of-the-art fitting method uses Rein's defect parameter solution surface (DPSS) 20 which determines, for each injection-dependent measurement at a fixed temperature, a set of solutions to the SRH equation 20 . By continuously varying E t , the best fit for other parameters, or combinations of parameters, can be found, creating a continuous surface in the parameter space which represents the entirety of the mathematical solutions for that fit.
By repeating the process for each temperature-dependent measurement, different DPSS curves are determined. These curves usually intersect at two points that indicate the possible combinations of the defect parameters, with one set of parameters in the upper bandgap and one in the lower bandgap [7][8][9][10][11][12][13]21 . This method, henceforth referred to as the 'traditional approach', has been refined over the years, such as linearizing the SRH equation under specific conditions 14,22 or applying faster convergence techniques to find the intersection points 21 . However, the technique and its derivatives assume that the defect parameters are independent of temperature and that the defect possesses only one energy level in the Si's bandgap 7,9 . In addition, in practice, it fails to identify which of the two mathematical solutions is the correct one.
To overcome these limitations, this study proposes a machine learning (ML) based approach to extract the defect parameters from lifetime curves. ML-based methods are already used at the PV system level, for example for fault detection 23,24 or to identify cracks in modules using luminescence imaging techniques [25][26][27][28] . ML has also been used in non-Si applications to find relevant material parameters for fabrication of CIGS solar cells 29 , multijunction solar cells 30 , organic solar cells 31 , or perovskite solar cells 32 . More recently, Kurchin et al. 33 proposed a Bayesian-based model to predict the probability distribution of the defect parameters from temperature-dependent current-voltage measurements. The model is designed to fit a single combination of defect parameters and needs to be retrained for different measurement scenarios, such as iron or copper in silicon. Furthermore, the use of current-voltage measurements to extract the defect parameters can be influenced by other solar cell parameters, such as edge recombination and shunt resistance, that can mask the impact of the studied defect. This study's approach brings the technological prowess of ML to defect parameter extraction in order to propose an alternative way to solve this decades-old challenge. Multiple ML training algorithms are considered, and compared, to predict the defect parameters directly from TIDLS measurements with no knowledge of the SRH equation. The developed method is not limited to silicon and can be used for non-Si PV materials, such as perovskites or CIGS.
1 The University of New South Wales, Sydney, NSW 2052, Australia. ✉ email: yoann.buratti@student.unsw.edu.au

Defect parameter regression
Five iterations of training for each of the ML training algorithms are performed. Each training uses a different random model initialization and a different random training/validation dataset split to ensure the repeatability of the ML algorithm's predictive power. For neural networks (NN), the evaluation data is held out until the end of the epoch-based training. In Fig. 1, the coefficient of determination (R 2 ) score of each training is presented for the different ML models for both E t (a) and k (b) predictions. The models are ordered from the best performers to the worst, and the average and standard deviation of R 2 are given for each model and regression target. In the figure, the statistical box plot for each model is shown in green, while the black dots represent each individual training iteration.
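A minimal sketch of this repeated-training protocol using Scikit-learn, assuming a toy stand-in dataset (a simple learnable target in place of the simulated lifetime curves); five runs with fresh splits and seeds yield a mean and standard deviation of the R 2 score:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Toy stand-in dataset: a learnable linear target replaces the simulated
# lifetime curves purely to illustrate the repeated-training flow.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = X @ rng.random(20)

scores = []
for seed in range(5):                     # five training iterations
    # fresh random train/validation split and model seed per iteration
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.1,
                                              random_state=seed)
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X_tr, y_tr)
    scores.append(r2_score(y_va, model.predict(X_va)))

# average and spread reported per model, as in Fig. 1
mean_r2, std_r2 = np.mean(scores), np.std(scores)
```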
Apart from the support vector machine (SVM) models (expected to perform better when datasets are small and sparse 34 ) and one instance of NN, all trained ML models have an R 2 above 95% on the validation dataset, showing the feasibility of using ML as an inverse function, with no a priori knowledge of the SRH equation. The highest average R 2 (>99%) is achieved by the random forest (RF) model for both k and E t prediction. From Fig. 1a, b, it can be concluded that the decision tree methods, such as RF, adaptive boosting, and gradient boosting, achieve the highest average R 2 scores in predicting both E t and k, followed by NN and SVM. NN have a higher variance than the other models, due to their sensitivity to the random initial weights and the random order in which the data is processed during training 35 . Indeed, the training of NN relies on backpropagation steps that shift the weights of the network proportionally to the error on the data points it is trained to predict, in order to minimize that error 36 . Changing the order in which the NN evaluates the data shifts the network to different local minima because of these initial larger weight shifts. Figure 2 shows the predicted parameters versus the true parameters for the best performing RF model using the validation dataset. Both (a) E t and (b) k predictions have R 2 scores above 99%. In each graph, the data points are semi-transparent, and the darker area outlines the linearity of the correlation between predicted and true values, as most data points lie on the y = x line. The RF model correctly predicts E t across the entire range 87% of the time at ±0.02 eV precision, which is typical of the error bars for reported E t values (between ±0.02 eV and ±0.04 eV) [8][9][10][15][16][17] . It is noticeable that far from the mid-gap (|E t | > 0.1 eV), the rate of correct prediction is much higher (91%) at this precision level. Close to the mid-gap (|E t | < 0.1 eV), the prediction of E t at ±0.02 eV correctness drops to 75%.
This can be explained by examining the SRH equation [Eq. (2)] when E t is close to 0 eV. The dependence on E t comes from the n 1 and p 1 terms [Eq. (4)]. When E t approaches 0 eV, both the n 1 and p 1 terms become negligible compared to n 0 + Δn and p 0 + Δn, respectively. Thus, the resulting SRH lifetime has a weak dependence on E t , increasing the difficulty of both human and ML prediction near the mid-gap.
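This weakening of the E t dependence can be checked numerically with a self-contained SRH sketch. The constants below (n i , thermal velocities) are illustrative 300 K values rather than the Couderc and Green models used in the paper, and the defect (k = 100) is a hypothetical example:

```python
import numpy as np

K_B, N_I = 8.617e-5, 9.7e9     # Boltzmann const. (eV/K); illustrative n_i (cm^-3)
V_N, V_P = 2.05e7, 1.69e7      # illustrative thermal velocities (cm/s)

def srh(dn, Et, sn=1e-13, sp=1e-15, Ndop=1e15, Nt=1e12, T=300.0):
    """SRH lifetime sketch for a p-type wafer; k = sn/sp = 100 here."""
    p0, n0 = Ndop, N_I**2 / Ndop
    tn0, tp0 = 1 / (Nt * sn * V_N), 1 / (Nt * sp * V_P)
    n1, p1 = N_I * np.exp(Et / (K_B * T)), N_I * np.exp(-Et / (K_B * T))
    return (tp0 * (n0 + n1 + dn) + tn0 * (p0 + p1 + dn)) / (n0 + p0 + dn)

dn = np.logspace(13, 17, 100)
# near mid-gap, moving Et by 50 meV barely changes the curve...
near = np.abs(srh(dn, 0.05) / srh(dn, 0.0) - 1).max()
# ...while far from mid-gap the same curve changes by orders of magnitude
far = np.abs(srh(dn, 0.30) / srh(dn, 0.0) - 1).max()
```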
To validate this explanation, a simulation of 200 defects with a fixed k = 100 and E t spanning from −0.1 eV to 0.1 eV is performed. Figure 3 presents the lifetime curves of these defects at 400 K. It can be observed that the defects are indistinguishable from one another on a macro scale. The inset is a zoom-in of the ensemble of lifetime curves at low injection level, and it reveals that this ensemble of curves is only 50 ns wide. Whereas it is extremely difficult to distinguish these lifetime curves with the naked eye, the RF model still correctly predicted 75% of the defects with ±0.02 eV accuracy.
Defect bandgap location classification
In the traditional DPSS approach, two possible solutions are often identified, one in the lower bandgap and one in the upper bandgap [7][8][9]21 . In certain cases, the half bandgap location can be determined using additional measurements, such as deep level transient spectroscopy 17 . To address this limitation, ML models are trained to classify in which half bandgap the defect is located. The models are ordered from the best performer to the worst in Fig. 4a, and the average and standard deviation of each model's accuracy are given. The statistical box plot for each model is shown in green, while the black dots represent each individual training iteration. NN significantly outperforms the other models, reaching an average accuracy of 82.7%. It is worth noting that all the other algorithms produce models above 75% accuracy, proving the possibility of identifying the half bandgap location with TIDLS measurements. Despite NN's good performance, the variation in accuracy between trained models shows the same sensitivity to the first few steps of backpropagation as in the regression case (as shown by the standard deviation of this model). From Fig. 4a, it can also be observed that the RF models are significantly better than their boosting counterparts. Boosting techniques add error scoring methods during the training of the different decision trees of the RF 37 . The difference in performance between boosting and RF is due to the overfitting of the training data by the boosting models 38 . The best trained NN has an accuracy of 87.7% on the validation dataset and its confusion matrix is shown in Fig. 4b. The confusion matrix is symmetric, as expected, showing that both half bandgaps are treated equally.
To provide more insight into the 12.4% of incorrectly predicted half bandgap locations, the corresponding defects are plotted in Fig. 5 as a k vs E t plot. Each data point, representing a single defect parameter combination fed into the NN model, is coloured based on the NN's classification probability towards its true half bandgap location. The probabilities are obtained directly from the output layer of the NN, which is known to be well-calibrated 39 . The grey datapoints correspond to correctly predicted defect half bandgap locations, while the coloured datapoints correspond to incorrectly predicted ones. The colour scale represents the correct label probability, which, for incorrectly labelled datapoints, is necessarily between 0 and 0.5. The darker the datapoint, the more inaccurate the NN's classification.
The data points aggregate around two trends. For defects with |E t | < 0.1 eV, the classification accuracy drops to 68.7%, which still provides a decent indication regarding the half bandgap location without the need for additional measurements. As previously discussed for Fig. 2, defects with E t near the mid-gap are harder to differentiate for both human and ML. Outside the mid-gap range (|E t | > 0.1 eV), the NN classification model performs extremely well, with a 94.7% half bandgap location classification accuracy. The second trend is that outside the mid-gap region, the most incorrect classifications are for defects with k values around 0.83 ± 0.01, represented by the dotted black line in Fig. 5. This corresponds to the ratio between hole and electron thermal velocity. Indeed, when the value of k approaches v p /v n , it equalizes τ n0 and τ p0 (= τ 0 ), resulting in the following simplified SRH equation:

τ SRH = τ 0 (n 0 + p 0 + n 1 + p 1 + 2Δn) / (n 0 + p 0 + Δn)

Since n 1 and p 1 have opposite E t dependence, the SRH equation in the case k = v p /v n is symmetric with respect to E t . This means that for a fixed k value satisfying k = v p /v n , or equivalently σ n /σ p = v p /v n , the lifetime curves for any given E t are identical to the lifetime curves of the opposite energy level −E t . For this reason, it is difficult to pinpoint the exact half bandgap location of defects when the value of k := σ n /σ p is close to v p /v n , as shown in Fig. 5 where the prediction probability tends to 50% for each half. However, the regression targets k and E t are not compromised by that regime, as seen in Fig. 2.
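The symmetry argument can be verified numerically: when σ n /σ p equals v p /v n , τ n0 = τ p0 and the lifetime depends on E t only through n 1 + p 1 ∝ cosh(E t /k b T), so +E t and −E t give identical curves. The constants below are illustrative 300 K values (giving v p /v n ≈ 0.82), not the paper's exact models:

```python
import numpy as np

K_B, N_I = 8.617e-5, 9.7e9     # eV/K; illustrative n_i at 300 K, cm^-3
V_N, V_P = 2.05e7, 1.69e7      # illustrative thermal velocities, cm/s

def srh(dn, Et, sn, sp, Ndop=1e15, Nt=1e12, T=300.0):
    """SRH lifetime sketch for a p-type wafer."""
    p0, n0 = Ndop, N_I**2 / Ndop
    tn0, tp0 = 1 / (Nt * sn * V_N), 1 / (Nt * sp * V_P)
    n1, p1 = N_I * np.exp(Et / (K_B * T)), N_I * np.exp(-Et / (K_B * T))
    return (tp0 * (n0 + n1 + dn) + tn0 * (p0 + p1 + dn)) / (n0 + p0 + dn)

dn = np.logspace(13, 17, 100)
sp = 1e-15
sn = sp * V_P / V_N            # choose k = sigma_n/sigma_p = v_p/v_n
tau_pos = srh(dn, +0.3, sn, sp)   # upper half bandgap defect
tau_neg = srh(dn, -0.3, sn, sp)   # mirrored lower half bandgap defect
```

With this special k the two curves coincide, whereas for any other k (e.g. sn = 1e-13) they differ, which is exactly why only this regime confuses the half bandgap classifier.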

Verification
A comparison between DPSS and the best-trained ML models is given in Fig. 6. The SRH lifetime curves at various temperatures for a selected defect (E t = 0.335 eV and k = 1.1) are presented in Fig. 6a. From these, five DPSS curves are generated following Rein's method 20 [see Fig. 6b] to identify the defect parameters. Figure 6b also includes the combined predictions of E t and k by the best of the previously trained ML regressor models of each category (indicated by coloured symbols). As can be noticed, excellent agreement is demonstrated between all the ML models and the DPSS method in terms of the prediction of the defect parameters. The best regressor predicts E t within ±0.0003 eV and k within ±0.05 of the simulated values.
As stated previously, the DPSS approach usually provides two solutions for the defect parameter extraction, with one in the upper half bandgap and one in the lower half bandgap. To distinguish between the two solutions, the sharpness of the intersection of the DPSS is plotted in Fig. 6c as a function of E t , where the sharpness ε is defined as the standard deviation of the k-DPSS curves divided by the mean of the curves. The sharpness is only defined when all the DPSS curves exist, hence the discontinuity in the sharpness curve. Using this approach, the minimum of the sharpness curve points to the most probable solution. In Fig. 6c, the DPSS curves' sharpness clearly indicates the correct upper bandgap as the solution (also marked by the orange band). The green band represents the best ML classifier model's classification of the defect energy level half bandgap location. The ML classifier correctly identifies the half bandgap location and, in this case, aligns with the DPSS classification.
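The sharpness criterion can be sketched as follows. The synthetic linear curves below are stand-ins for real per-temperature k-DPSS fits and are constructed to intersect at E t = 0.335 eV, k = 1.1, the defect of Fig. 6:

```python
import numpy as np

# One k-DPSS curve per temperature; where the curves intersect, their
# spread (std/mean) is minimal. Linear curves are an illustrative stand-in.
Et_grid = np.linspace(0.2, 0.5, 301)               # candidate energy levels, eV
slopes = np.array([2.0, 3.5, 5.0, 6.5, 8.0])       # one synthetic curve per T
k_dpss = 1.1 + slopes[:, None] * (Et_grid[None, :] - 0.335)

# sharpness eps(Et) = std of the k-DPSS curves / mean of the curves
sharpness = k_dpss.std(axis=0) / k_dpss.mean(axis=0)
Et_best = Et_grid[np.argmin(sharpness)]            # most probable solution
```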
This analysis is repeated on the same defect parameters, adding random Gaussian noise to the SRH lifetime curves. As shown in Fig. 6d, the noise is scaled up with decreasing excess carrier concentration to reproduce the behaviour of actual lifetime measurements. The DPSS curves of the noisy data [Fig. 6e] show that the ML regressor models are robust to the noise and continue to produce accurate predictions of the defect parameters, with the best regressor predicting E t within ±0.004 eV and k within ±0.05. However, by adding noise to the SRH lifetime curves, the sharpness of the DPSS curves becomes ambiguous [Fig. 6f] and the DPSS approach now predicts the wrong half bandgap energy location. The ML still predicts the correct half bandgap energy location, showing the ML's robustness to noise and presenting a case where the ML approach outperforms the DPSS. The comparison done in Fig. 6 is repeated 1,000 times for the same defect with random noise. The ML approach has an accuracy of 87.7% for half bandgap energy location prediction, while DPSS reaches 80.9% accuracy. When increasing the noise intensity, the accuracy of the DPSS drops to 44.7%, while the ML retains 76.4% accuracy. This is a strong demonstration that ML-based predictions are more reliable than DPSS-based predictions when dealing with noisy signals.
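An injection-dependent noise model of this kind might be sketched as below. The flat placeholder curve, the 2% noise floor, and the fourth-root scaling are all illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
dn = np.logspace(13, 17, 100)         # excess carrier concentration, cm^-3
tau = 1e-4 * np.ones_like(dn)         # placeholder lifetime curve, seconds

# Hypothetical noise model: relative Gaussian noise grows as the injection
# level drops, mimicking the low-signal end of lifetime measurements.
rel_sigma = 0.02 * (dn.max() / dn) ** 0.25   # ~20% at 1e13, ~2% at 1e17
tau_noisy = tau * (1 + rel_sigma * rng.standard_normal(dn.size))
```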
The comparison between the DPSS and the ML has been repeated for 10,000 random defect parameters and sets of SRH lifetime curves with random noise. In terms of classification, the DPSS approach predicts the correct half bandgap location 60% of the time, while the ML is correct 80% of the time over those 10,000 defects. Furthermore, the regression of the correct bandgap shows an average defect energy level residual of 0.093 eV for the DPSS, compared to almost an order of magnitude lower for the ML (0.011 eV). This is a clear indication that the ML approach has a significant advantage over the DPSS approach in terms of robustness to noise. This is quite remarkable considering that the ML classifier and regressors have been trained only on a noise-free dataset. We expect that this robustness to measurement noise can be further improved by training on a noisy dataset.

Experimental application
To be able to apply the ML approach to experimental measurements, an adaptation of the simulation parameters (such as wafer type, N dop , and temperature) to the desired measurement is necessary. Once the ML model is trained, the experimental measurements can be fed to the model in order to extract the defect parameters. This method is validated below by comparison to the traditional approach using the DPSS analysis.
For validation, an n-type Si wafer with N dop = 5.1 × 10 15 cm −3 from Zhu et al. 13 is used. Lifetime measurements are done using TIDLS at eight different temperatures. The DPSS method is then used to extract E t and k, and the results are compared to the ML predictions in Fig. 7. Each DPSS curve represents the best fit to the SRH model at that temperature. As discussed, the defect parameters are identified by a simultaneous minimization of the fit errors at all temperatures. The two solutions found by the DPSS method are marked by the black diamond symbols. The transparent ellipses represent the error-space defined by the error bars of this method. These two solutions are (E t = −0.30 ± 0.03 eV; k = 1.87 ± 1.18) for the lower half bandgap and (E t = 0.28 ± 0.04 eV; k = 1.01 ± 0.44) for the upper half bandgap. The ML predictions are obtained by training an instance of the different ML training algorithms compared in this study: NN, RF and its boosting counterparts. Figure 7 shows that the ensemble models (RF, adaptive, and gradient boosting) all predict defect parameters within the DPSS solutions' uncertainty range for both E t and k, while NN predicts E t within the DPSS solution's uncertainty range but overshoots the value of k. Furthermore, calculating the sharpness of the DPSS curves' intersection points to the defect energy location being in the lower half bandgap. In this experimental case, the ML approach concurs with the DPSS results. The agreement between the ML models and the DPSS method highlights the success of this first-ever application of ML techniques for the extraction of defect parameters. Based on this study, the approach can be extended to overcome the limitations of the currently used methods.
For example, multi-level defects or multiple single-level defects can be modelled to generate the training dataset, and temperature dependence of the defect parameters, such as the capture cross-sections, can be introduced into the physical models as well.

DISCUSSION
In this study, we proposed a ML-based approach to defect parameter extraction. By generating more than a million TIDLS lifetime curves with the SRH equation, ML models were trained and evaluated on simulated data. Overall, ensemble methods, such as RF, adaptive, and gradient boosting, appear to be the most promising algorithms to extract the defect parameters, with an average R 2 score of 99%, whereas NN excels at predicting the half bandgap location of the defect with an average accuracy of 82.7%, outperforming the DPSS in predicting the half bandgap location from noisy SRH lifetime curves, which showcases an important advantage of the ML approach over the traditional approach. With no knowledge of the relationships predicated by the SRH equation, ML models provide insights regarding the physical limitations, such as when the lifetime curves are indistinguishable from one another, while still performing well in those conditions. Experimental measurements validated the approach, demonstrating a successful defect parameter extraction using ML from lifetime measurements.
As more complex models are trained, additional insights can be gained that can contribute to a better understanding of defects. More complex models can also overcome the limitations of traditional approaches, as more modelling possibilities could be incorporated into the dataset generation step, such as multiple single-level defects and temperature-dependent capture cross-sections. The proposed approach can be generalized to any number of temperature measurements and wafer conditions and is not limited to Si wafers. The adaptability of this method ensures its use in varied scenarios and transfers its potential to other measurement techniques and materials, such as thin films or perovskites, possibly with more complex ML models.

Dataset Simulation
The SRH 18,19 equation is used to generate a dataset of simulated lifetime curves for a wide range of defect parameters (E t , σ n , σ p ). The SRH lifetime τ SRH is given by:

τ SRH = [τ p0 (n 0 + n 1 + Δn) + τ n0 (p 0 + p 1 + Δn)] / (n 0 + p 0 + Δn)   (2)

with the capture time constants

τ n0 = 1/(N t σ n v n ),  τ p0 = 1/(N t σ p v p )   (3)

and the SRH densities

n 1 = n i exp(E t /(k b T)),  p 1 = n i exp(−E t /(k b T))   (4)

In Eq. (2), n 0 (p 0 ) is the electron (hole) carrier concentration at thermal equilibrium and Δn is the excess carrier concentration. In Eq. (3), v n (v p ) is the electron (hole) thermal velocity calculated using the model of Green et al. 40 and N t is the defect density. In Eq. (4), k b is Boltzmann's constant, T is the temperature and n i is the intrinsic carrier concentration calculated using the model of Couderc et al. 41 , taking into account bandgap narrowing with the model of Yan et al. 42 . E t is defined relative to the intrinsic energy of Si, E i (E t = 0 eV is at mid-gap), and grows positive as E t gets closer to the conduction band. At this stage, it is assumed that the defect parameters (E t , σ n , σ p ) are independent of temperature.
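Equations (2)-(4) can be sketched in a few lines of Python. The physical constants below (n i , thermal velocities) are illustrative 300 K values, not the Couderc and Green parameterizations used in the paper:

```python
import numpy as np

K_B = 8.617e-5             # Boltzmann constant, eV/K
N_I = 9.7e9                # illustrative intrinsic carrier conc. at 300 K, cm^-3
V_N, V_P = 2.05e7, 1.69e7  # illustrative electron/hole thermal velocities, cm/s

def srh_lifetime(dn, Et, sigma_n, sigma_p, Ndop=1e15, Nt=1e12, T=300.0):
    """SRH lifetime [Eq. (2)] of a p-type wafer vs excess carrier conc. dn."""
    p0 = Ndop                             # p-type: equilibrium holes = doping
    n0 = N_I**2 / p0                      # law of mass action
    tau_n0 = 1.0 / (Nt * sigma_n * V_N)   # Eq. (3)
    tau_p0 = 1.0 / (Nt * sigma_p * V_P)
    n1 = N_I * np.exp(Et / (K_B * T))     # Eq. (4)
    p1 = N_I * np.exp(-Et / (K_B * T))
    return (tau_p0 * (n0 + n1 + dn) + tau_n0 * (p0 + p1 + dn)) / (n0 + p0 + dn)

dn = np.logspace(13, 17, 100)             # 100 injection points per temperature
tau = srh_lifetime(dn, Et=0.3, sigma_n=1e-14, sigma_p=1e-15)
```

A useful sanity check: at high injection the computed lifetime approaches the limit τ n0 + τ p0.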
For each combination of defect parameters (E t , σ n , σ p ), 500 lifetime points are simulated as a feature vector of 100 Δn points at five different temperatures. Without loss of generality, p-type wafers are chosen with a bulk doping density N dop fixed at 10 15 cm −3 , while N t is fixed at 10 12 cm −3 . In total, 300,000 defect parameter combinations are chosen randomly within a reasonable physical range, resulting in a million and a half simulated lifetime curves ranging from 0.1 μs to 10 ms. The ranges of the other simulation parameters are given in Table 1.
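The random sampling of defect parameter combinations might look as follows; the sampling bounds below are illustrative stand-ins for the actual Table 1 ranges:

```python
import numpy as np

rng = np.random.default_rng(42)
n_defects = 1000                                   # 300,000 in the paper

# illustrative sampling ranges (Table 1 holds the real ones)
Et = rng.uniform(-0.55, 0.55, n_defects)           # eV, relative to E_i
sigma_n = 10.0 ** rng.uniform(-17, -13, n_defects) # cm^2, log-uniform
sigma_p = 10.0 ** rng.uniform(-17, -13, n_defects) # cm^2, log-uniform

temperatures = np.linspace(200, 400, 5)            # K, five measurements
dn = np.logspace(13, 17, 100)                      # 100 injection points each
# one feature vector per defect: 5 temperatures x 100 points = 500 values
n_features = temperatures.size * dn.size
```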

Machine learning training
The simulated dataset is randomly split into a training dataset (90% of the simulated data) used to train the ML model and a validation dataset (10%) used to evaluate the model. In the traditional defect parameter extraction approach, such as DPSS 20 and its derivatives 14,21,22 , two solutions for E t are usually obtained, with one in the upper half of the bandgap (E t,+ ) and the second in the lower half of the bandgap (E t,− ). In this study, for the regression of E t , the training of the ML model is done separately for each half bandgap. The results are then combined and presented together. In order to identify in which half bandgap the defect is located, the same approach is used to train a classifier model to predict the sign of E t . The classifier outputs the probability of the defect being in the upper or lower half bandgap. The capture cross-section regression target is the ratio k := σ n /σ p , which is usually reported in defect parameter characterization 7,9,14,21,22 . For the regression of k, the training of the ML model is done on the whole bandgap. In total, four ML models are trained to invert the SRH equation: two regression models to predict E t , one regression model to predict k, and one classification model to determine the defect energy location. The ML training flow is summarized in Fig. 8.
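The four-model training flow can be sketched with Scikit-learn. The random arrays below are toy stand-ins for the flattened 500-point lifetime feature vectors and their targets, and regressing log10(k) rather than k is an illustrative choice of this sketch, not necessarily the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 500))              # stand-in for flattened lifetime curves
y_Et = rng.uniform(-0.55, 0.55, 1000)    # defect energy level, eV
y_k = 10.0 ** rng.uniform(-4, 4, 1000)   # capture cross-section ratio

X_tr, X_va, Et_tr, Et_va, k_tr, k_va = train_test_split(
    X, y_Et, y_k, test_size=0.1, random_state=0)   # 90/10 split, as in the paper

upper = Et_tr > 0                                   # half-bandgap split for E_t
reg_Et_up = RandomForestRegressor(n_estimators=10).fit(X_tr[upper], Et_tr[upper])
reg_Et_low = RandomForestRegressor(n_estimators=10).fit(X_tr[~upper], Et_tr[~upper])
reg_k = RandomForestRegressor(n_estimators=10).fit(X_tr, np.log10(k_tr))
clf_half = RandomForestClassifier(n_estimators=10).fit(X_tr, Et_tr > 0)

# the classifier exposes P(defect in upper half bandgap) directly
proba_upper = clf_half.predict_proba(X_va)[:, 1]
```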
For each of the four ML models needed, different ML training algorithms can be used, as defined in each grey box of Fig. 8. Random forests (RF) 43 are an ensemble of decision trees, where each tree is built from a random subset of the training data. The RF model outputs an average decision over all its decision trees. Enhancements of this method are possible through boosting, notably adaptive boosting 37,44 and gradient boosting 38 , which weight the decision of every tree to improve the predictive power of the overall model. A neural network (NN) 36,45 is a layered group of nodes, called neurons, with a set of weights that are updated as it processes the training data. NN are widely used for different applications, such as image recognition 46 , text translation 46 or beating Go world champions 47 . Another widely used algorithm is support vector machines (SVM) 48,49 , which consists of mapping the training datapoints to a higher dimensional feature space where the data becomes easily separable. For classification problems, the K-nearest neighbours (KNN) 50 algorithm simply assigns the predicted label to be the majority vote of the closest known training data points. Figure 8 also shows that most ML training algorithms are used to train both regression and classification models; however, SVM is dedicated to regression and KNN to classification.

Fig. 8 Machine learning defect parameter extraction training flow. The dataset is simulated using random defect parameters within defined constraints, and different machine learning models are trained for each of the prediction targets: E t,+ , E t,− , k and the defect energy location.

In order to compare the different trained ML models, two scoring techniques are introduced. For the regression of defect parameters, predictions are made on the validation dataset and correlated to the true defect parameter values. To score the correlation, the coefficient of determination (R 2 ) is used 51 :

R 2 = 1 − Σ i (y i pred − y i true ) 2 / Σ i (y i true − mean(y true )) 2

where 'pred' ('true') refers to the predicted (true) values of the validation dataset and the index i spans the N data points of the validation dataset. The R 2 score has a maximum value of 1, which indicates a perfect correlation and enables comparison between prediction targets. For the trained classification models, a confusion matrix is created, comparing predicted and true labels. The considered classification is binary, and predictions can have one of the following four outcomes: true positive (TP), true negative (TN), false negative (FN) and false positive (FP). The accuracy ('Acc') is defined as 52 :

Acc = (TP + TN) / (TP + TN + FP + FN)

The accuracy measures the ratio between correctly predicted points and the total number of points of the evaluated dataset. The lifetime curves dataset simulation, ML model training and evaluation are implemented using Python and Scikit-learn 53 .
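Both scoring techniques are available off the shelf in Scikit-learn; the short worked example below uses made-up prediction values purely to illustrate the two scores:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, r2_score

# Regression scoring: R2 between true and predicted defect parameters
# (made-up values for illustration).
y_true = np.array([0.30, -0.12, 0.45, -0.33, 0.05])
y_pred = np.array([0.29, -0.10, 0.46, -0.35, 0.08])
r2 = r2_score(y_true, y_pred)            # 1 - SS_res / SS_tot

# Classification scoring: confusion matrix and accuracy for the binary
# half-bandgap labels (1 = upper half bandgap).
labels_true = np.array([1, 0, 1, 1, 0, 0])
labels_pred = np.array([1, 0, 1, 0, 0, 1])
tn, fp, fn, tp = confusion_matrix(labels_true, labels_pred).ravel()
acc = (tp + tn) / (tp + tn + fp + fn)    # matches accuracy_score
```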
More information about the hyperparameters used can be found in the Supplementary Methods. The machine used for training has an Intel® Xeon® W-2145 central processing unit (up to 3.70 GHz) and 64 GB of random-access memory. The training time is on the order of minutes for all the algorithms presented in this paper.