Introduction

Today, artificial intelligence (AI) systems are demonstrating abilities that match or surpass skilled human performance in many fields, a feat that was barely possible a year ago and that is evolving at an unprecedented rate1. There is a growing focus on applying AI techniques to data extraction and analysis in the physical and applied sciences2. Only a few studies have applied machine learning (ML)-based algorithms to model the internal parameters of photodiodes. Ruiz Euler et al.3 utilized deep neural networks (DNN) to optimize multi-terminal nanoelectronic devices. They employed the gradient descent algorithm4 and achieved successful predictions of device functionality in disordered networks of dopant atoms in silicon. El-Mahalawy and El-Safty5 employed a Quantum Neural Network (QNN) to model the characteristics of the NTCDA/p-Si UV photodiode, accurately capturing trends and extrapolating unknown current values under different illuminations. ML algorithms have also found applications in laser welding6,7,8, optical photodiodes9,10, organic diodes11, and photonics12.

Theoretical background

In this study, we assemble, train, and apply machine learning (ML) models to evaluate the internal parameters of semiconductor photodiodes (SPDs) when their current responses to illumination are empirically known. This is a standard experiment for semiconductor diodes. The current response of an SPD is governed by the thermionic emission (TE) equation. This is a complex equation that depends on the internal parameters, namely the barrier height \(\phi \), the ideality factor n, and the series resistance \(R_s\); on the applied voltage bias V; and on the ambient parameters, i.e., the absolute device temperature T and the illumination P. An empirical data point in a typical SPD measurement (at a given P and T) consists of the applied V and the external, observable diode current I. Notably, in the TE model, I depends implicitly on itself in combination with \(R_s\), V, T, \(\phi \), and n according to the expression

$$\begin{aligned} I=AA^*T^2\exp \Big (-\frac{q\phi }{kT}\Big )\exp \Big (\frac{q(V-IR_s)}{nkT}\Big )\Big \{1-\exp \Big (-\frac{q(V-IR_s)}{kT}\Big )\Big \}, \end{aligned}$$
(1)

where q is the electronic charge, k is the Boltzmann constant, A is the diode area, and \(A^*\) is the Richardson constant13,14,15,16. For a given SPD, the aim is to characterize n, \(R_s\), and \(\phi \). Evidently, Eq. (1) is extremely difficult to solve for these parameters, and many methods have been devised over the last five decades. Many are still in use, but almost all rely on heavy simplifying approximations owing to the typically non-zero \(R_s\) in real devices17,18,19,20. One such method is the Cheung–Cheung method, developed in the 1980s18. It relies on two functions that are linear in the current:

$$\begin{aligned} \frac{dV}{d\ln I}&= R_sI+\frac{nkT}{q},\nonumber \\ H(I)&= R_sI+n\phi = V-\frac{nkT}{q}\ln \Big (\frac{I}{AA^*T^2}\Big ), \end{aligned}$$
(2)

with the symbols as previously defined. The method gives two estimates of \(R_s\) (the slopes of the two plots). The intercepts of the first and second plots then yield estimates of n and \(\phi \), respectively. For the datasets used in this study, \(A^*\) = 32 A/K\(^2\)cm\(^2\), A = 1 mm\(^2\), and T = 300 K. We then assemble, train, and apply an artificial neural network (ANN) machine learning model to the empirical datasets to evaluate the internal parameters of a p-Si/Au SPD without explicitly presenting Eq. (1) to the models. For background, we shall briefly describe the ANN model (Fig. 1); its detailed operational principles can be found in several sources.
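To illustrate, the extraction via Eq. (2) can be sketched numerically. The snippet below is a minimal sketch, not the authors' code: it assumes forward-bias (I, V) arrays and the device constants quoted above, builds the two Cheung functions, and reads \(R_s\), n, and \(\phi \) off straight-line least-squares fits.

```python
import numpy as np

# Device constants quoted in the text (area converted to cm^2)
K_OVER_Q = 8.617e-5      # Boltzmann constant over electronic charge [V/K]
T = 300.0                # absolute temperature [K]
AREA = 0.01              # diode area: 1 mm^2 expressed in cm^2
A_STAR = 32.0            # Richardson constant [A K^-2 cm^-2]

def cheung_extract(V, I):
    """Estimate (Rs, n, phi) from forward-bias I-V data via Eq. (2)."""
    kT_q = K_OVER_Q * T
    # First Cheung function: dV/d(ln I) = Rs*I + n*kT/q
    dV_dlnI = np.gradient(V, np.log(I))
    slope1, icept1 = np.polyfit(I, dV_dlnI, 1)
    n = icept1 / kT_q                      # intercept gives the ideality factor
    # Second Cheung function: H(I) = V - n*(kT/q)*ln(I/(A*A**T^2)) = Rs*I + n*phi
    H = V - n * kT_q * np.log(I / (AREA * A_STAR * T**2))
    slope2, icept2 = np.polyfit(I, H, 1)
    phi = icept2 / n                       # intercept gives n*phi
    rs = 0.5 * (slope1 + slope2)           # average the two Rs estimates
    return rs, n, phi
```

Applied to synthetic data generated from the TE relation with known parameters, the routine recovers them to within a few percent; on real data, the quality of the fits depends on the chosen voltage range, as discussed later.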

Figure 1
figure 1

A schematic depiction of a typical ANN model.

In brief, the ANN model simulates networks of interconnected neurons (represented by interconnected nodes), inspired by the human brain. The depiction shows an input layer, one or more hidden layers that extract features and patterns from the inputs, and an output layer that produces the final result. The j-th output, \(Y_j\), is computed using weights \(w_i\) on inputs \(X_i\) according to

$$\begin{aligned} Y_j=\sum _{i=1}^nw_iX_i+b. \end{aligned}$$
(3)

A bias factor, b, is included for flexibility in training the model. In this study, the weighted sum is passed to an efficient, nonlinearity-inducing function, the rectified linear unit (ReLU), which determines whether a node is activated. The ReLU function is defined as

$$\begin{aligned} f(x)=\max (0,x)= {\left\{ \begin{array}{ll} x &{} \text {if } x>0\\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(4)

where x refers to the input to a neuron in the network.
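Equations (3) and (4) combine into a single node computation. The following minimal sketch (illustrative only, with made-up weights and bias) evaluates one node's output as the ReLU of the biased weighted sum:

```python
import numpy as np

def relu(x):
    """Eq. (4): pass positive inputs through, clamp the rest to zero."""
    return np.maximum(0.0, x)

def node_output(X, w, b):
    """Eq. (3) followed by Eq. (4): biased weighted sum, then ReLU."""
    return relu(np.dot(w, X) + b)
```

For example, inputs (1, 2) with weights (0.5, -1) and bias 0.2 give a weighted sum of -1.3, so the node outputs 0 and stays inactive.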

In this study, we trained and evaluated the ANN ML model on thirty-six private experimental datasets from three different Schottky barrier diodes (SBDs), denoted D1, D2, and D3, using Google Colaboratory21. The datasets are derived from our previously published SPDs22,23,24. They contain the responses to the illumination falling on three photodiodes doped with 0%, 1%, 3%, 5%, and 10% graphene oxide (GO) on the p-type silicon substrate material. Each SPD is subjected to illumination from dark up to 60 mW/cm2. The datasets also contain the calculated values of the following parameters for each diode: the barrier height (\(\phi \)), the ideality factor (n), and the series resistance (\(R_s\)). The results demonstrate that the ANN achieved a mean square error (MSE) and a mean absolute error (MAE) below 0.003 for all datasets. Thus, the ANN ML model efficiently learned the photo responses of the photodiodes and predicted their barrier height, ideality factor, and series resistance with high accuracy. We show that the ML model needs no prior knowledge of the mathematical TE model in any form to reach its predictions. We argue that ML modeling should be seen as a complement to researchers' existing methods, adding a new, rapidly evolving tool to their arsenal. These tools can substantially reduce analysis time in the device development cycle and can be adapted to other areas and fields.

Results

The ANN model

Table 1 shows the training and validation accuracy for the ANN model after 30 epochs. An epoch marks one processing pass over all the data. It typically involves several iterations, each processing a data batch of a specified size.
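The relationship between epochs, iterations, and batches reduces to simple arithmetic; a small sketch (the batch size of 32 is an illustrative assumption, not a value from Table 3):

```python
import math

def iterations_per_epoch(n_samples, batch_size):
    # One epoch = one full pass over the data; one iteration = one batch.
    # Batch size 32 in the examples below is an assumed, illustrative value.
    return math.ceil(n_samples / batch_size)
```

A 251-sample dataset with batches of 32 thus needs 8 iterations per epoch, and 30 epochs amount to 240 iterations.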

Table 1 Averaged MSE and MAE performance of the ANN models for three different SBDs, D1, D2, and D3.

Additionally, the table provides the average test accuracy for each model, calculated by averaging the three test accuracies. These findings suggest that the ANN model achieved good accuracy in predicting the target variables. The results also indicate that the model's performance varied with the dataset used for training. This could be due to differences in the characteristics of the datasets, such as the range and distribution of the variables. Further investigations are needed to determine the factors affecting the model's performance and to optimize its training parameters. In summary, the ANN model trained on thirty-six datasets from three devices, using two independent and three target variables, showed promising results in predicting \(\phi \), n, and \(R_s\). Figure 2 shows the training loss and accuracy curves for some models over 30 epochs. The gaps between the training and validation accuracy curves and between the training and validation loss curves are small, indicating that the ANN models do not overfit.

Figure 2
figure 2

Collated plots showing the training (curves T) and validation (curves V) losses for the ANN ML model applied to the three SBDs D1, D2, and D3. All plots have a scale factor of 10\(^{-3}\). The doping levels range from 0 to 10% GO, and the illuminations from 0 to 60 mW/cm2. The original simulation images are available21.

The comparison is depicted in Fig. 3 where, owing to space limitations, we present ten predicted and actual values per dataset. The plots reveal a negligible difference between the predicted and actual values, compellingly demonstrating the accuracy of the ANN models' predictions: the predicted values align satisfactorily with the expected values.

Figure 3
figure 3

Plots showing comparisons between the actual and the ANN model predicted values of n, \(\Phi \), and \(R_s\) for three SBDs, D1, D2, and D3. For reasons of limited space, only values at dark and 30 mW/cm\(^2\) intensities are compared21.

Furthermore, it affirms that the ANN models effectively learned the photodiodes' light responses, leading to accurate predictions of the barrier height, ideality factor, and series resistance. The performance of the ANN models varied with the training dataset utilized. This variability may arise from disparities in dataset characteristics, such as the range and distribution of the variables. Further investigations are warranted to comprehensively understand the factors influencing the models' performance and to optimize the training parameters. In summary, the ANN model trained on the thirty-six datasets, incorporating two independent variables and three target variables, demonstrated promising outcomes in accurately predicting \(\Phi \), n, and \(R_s\).

Discussion

The machine learning (ML) models described above have demonstrated the ability to deduce patterns in data pertaining to the transfer characteristics of the SPD without any prior knowledge of the device physics or of the TE equation. This is achieved through training on a small set of data, which allows the ML model to generalize and make accurate predictions on unseen data. The current-voltage relationship of SPDs is governed by the highly nonlinear TE equation. For parameter extraction from a given SPD, it is essential to linearize specific regions of the I-V characteristic. Many methods, like the previously described Cheung–Cheung method, can accomplish this task but require a carefully selected voltage range to give accurate results. As a consequence, different individuals may select different ranges, leading to a significant variance in the extracted parameters even for the same SPD. During the implementation of the ANN models, it was observed that the algorithms rapidly and unexpectedly converged to the optimal ranges of applied bias across all instances. The study's findings demonstrate the effectiveness of ANN ML-based algorithms in accurately modeling the light responses of photodiodes. Through extensive training and evaluation on data collected from three photodiodes, the models successfully predicted critical parameters such as the barrier height, series resistance, and ideality factor. Importantly, the models also estimated photodiode light responses under varying illuminations and voltage settings, demonstrating their broad applicability. Furthermore, the models predicted responses with minimal error from 0 to 100 mW/cm2. However, complete reliance on ML models may not provide a comprehensive understanding of peculiarities in the data, such as negative differential conductance regions or breakdowns.
Moreover, this study only explored one type of Schottky barrier diode (SBD) with p-Si/Au construction. However, SBDs with different constructions exhibit similar characteristics to the present devices, and it is highly likely that the same models can be used to determine their internal parameters. Further investigations are necessary to verify this, as the scope of this work is not intended to be exhaustive, but rather to demonstrate the potential of ML tools. Researchers can utilize these models to minimize the time and resources required for conducting experiments. These findings provide an exciting avenue for future research through the application of ML techniques to model complex and highly nonlinear systems, thereby enhancing our overall understanding of their behaviors. Finally, the dataset25 and models26,27,28 are available upon request to enable independent evaluation.

Methods

Dataset creation

The datasets presented to the developed ML models are based on V and I data points measured on three different published SBDs: D1 = Al/GO:CoPc/p-Si/Au22, D2 = Al/p-Si/GO:PCBM/Au23, and D3 = Au/GO:Coumarin/p-Si/Al24. The publications report the corresponding estimates of \(\phi \), n, and \(R_s\) obtained using the Cheung–Cheung functions in Eq. (2). Table 2 shows the known results of the Cheung–Cheung method for all three diodes with 0%, 1%, 3%, 5%, and 10% GO content.

Table 2 Empirical results using the Cheung–Cheung functions for SBDs D1, D2, and D3. The entries at 80 and 100 mW/cm2 are the predictions by the ANN model.

Each dataset entry therefore has the structure (V, I, \(\phi \), n, \(R_s\)) for each illumination intensity. The four illumination intensities used are 0 (dark), 10, 30, and 60 mW/cm2. Thirty-six private datasets spanning the 0%, 1%, 3%, 5%, and 10% doping levels were collected from the three devices. Twelve of the datasets were collected from D1 with 101 measured (I, V) samples each, twelve from D2 with 201 samples, and twelve from D3 with 251 samples. Figure 4 shows the raw empirical data collected for these diodes over the illumination range from dark to 100 mW/cm2. Only the 0–60 mW/cm2 intensities were used in developing the models; the 80 and 100 mW/cm2 intensities were reserved as prediction data during the ML model development.

Figure 4
figure 4

The plots of the measured current-voltage characteristics for the three diodes D1, D2, and D3 with various dopants and doping concentrations. Each set of I-V characteristics is measured over the 0 to 100 mW/cm2 illumination intensity range.

In all, there are 36 datasets consisting of 101, 201, and 251 sample points, respectively. They represent actual measurements along the current-voltage characteristics, from −5 V to +5 V, including 0, in steps of 0.1 V, 0.05 V, and 0.04 V, respectively.
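The three voltage grids follow directly from the stated range and sample counts; a quick numpy sketch:

```python
import numpy as np

# Voltage grids implied by the -5 V..+5 V range and the sample counts:
# (number of points - 1) equal steps across 10 V.
v_d1 = np.linspace(-5.0, 5.0, 101)   # D1: 101 points, 0.10 V steps
v_d2 = np.linspace(-5.0, 5.0, 201)   # D2: 201 points, 0.05 V steps
v_d3 = np.linspace(-5.0, 5.0, 251)   # D3: 251 points, 0.04 V steps
```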

The datasets were standardized before training and contain the calculated \(\phi \), n, and \(R_s\) for each diode. Standardization was done to ensure that all the responses in the dataset contribute equally to the applied ML model.
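A minimal sketch of the standardization step (z-score scaling per column is an assumption here; the exact scaler used is not specified in the text):

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance so that all
    responses contribute equally to the applied ML model."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```

Keeping the fitted (mu, sigma) allows model outputs to be mapped back to physical units afterwards.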

The overall approach

We utilize ANN models to assess the internal parameters of a p-Si/Au SPD using empirical datasets, without explicitly presenting Eq. (1). The ANN model comprises an input layer, one or more hidden layers, and an output layer, with the number of hidden layers determined through experimentation. The input layer consists of two neurons, while the output layer consists of three neurons representing \(\Phi \), n, and \(R_s\). The ANN architecture and training parameters are depicted in Fig. 1 and Table 3, respectively. Figure 5 shows the data flow in the ANN ML model used in this work. To accommodate the dataset size, we employed five-fold cross-validation to evaluate the ANN models, ensuring a more reliable estimation of performance on unseen samples. The five-fold cross-validation procedure utilized the KFold cross-validation function from the scikit-learn ML library29. Figure 6 illustrates the steps involved in the five-fold cross-validation.
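The forward pass of such a network can be sketched in plain numpy. This is an illustrative stand-in, not the trained model: the hidden width of 16 and the random weights are assumptions (the paper determined the layer configuration experimentally).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# 2 input neurons (e.g. V and P) -> one hidden layer -> 3 output neurons
# (phi, n, Rs). Hidden width 16 and random weights are illustrative only.
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

def forward(X):
    h = relu(X @ W1 + b1)    # hidden layer with ReLU activation
    return h @ W2 + b2       # linear outputs for the regression targets
```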

Table 3 Training parameters for the ANN used in this work.
Figure 5
figure 5

A block schematic representation of the machine learning models and the data flows in the processing.

Figure 6
figure 6

The five-fold cross-validation procedure. TE is the total error.

The dataset is split into five folds, and the experiment is executed over five iterations. In each iteration, four folds are utilized for training the ANN model, while the remaining fold is used for testing. This process is repeated until every fold has served once as the test set. Throughout each iteration, the MSE and the MAE are calculated and recorded. Finally, the average MSE and MAE are computed at the conclusion of the last iteration.
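The loop just described can be sketched as follows. This is a numpy-only stand-in mirroring scikit-learn's KFold procedure, with a simple least-squares model in place of the ANN:

```python
import numpy as np

def five_fold_cv(X, y, n_splits=5, seed=0):
    """Average the held-out MSE and MAE over five folds, mirroring the
    KFold procedure described in the text (least-squares stand-in model)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_splits)
    mses, maes = [], []
    for i in range(n_splits):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        # Stand-in model: ordinary least squares with an intercept column
        Xtr = np.c_[X[train], np.ones(len(train))]
        w, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        pred = np.c_[X[test], np.ones(len(test))] @ w
        err = y[test] - pred
        mses.append(np.mean(err ** 2))
        maes.append(np.mean(np.abs(err)))
    return float(np.mean(mses)), float(np.mean(maes))
```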

Training and model testing

The ANN model was first trained and then cross-validated using the five-fold approach in Fig. 6. The training and validation made use of thirty-six (3 \(\times \) 12) different datasets, based on the parameters in Table 3. The 3 \(\times \) 12 datasets comprised 101, 201, and 251 (I, V) samples, respectively.

Finally, the trained model was tested on all samples from the original datasets. Table 1 also shows the MSE and MAE for the developed ANN model for D1, D2, and D3. Figure 7 plots the current-voltage characteristics for external data that were used to validate the ML models further.

Figure 7
figure 7

Plots of additional original datasets for D1 at 5% and 20% GO doping levels, and D2 at 1% GO. These extra data were not used in the ML model development stages but for further validation.

Performance metrics

The performance of the ANN models was assessed by the aforementioned metrics: MSE and MAE. A low MSE and MAE indicate a highly accurate model, with a zero MSE signifying a perfect match between predicted and actual values. Mathematically,

$$\begin{aligned} \text {MSE} = \frac{1}{N}\sum _{i=1}^N(V_{a,i} - V_{p,i})^2\quad \text {and}\quad \text {MAE} = \frac{1}{N}\sum _{i=1}^N|V_{a,i} - V_{p,i}|, \end{aligned}$$
(5)

where N is the number of samples in the dataset, and \(V_{a,i}\) and \(V_{p,i}\) are the actual and predicted values, respectively. Table 1 displays the MSE and MAE results for the ANN models after 30 epochs, evaluated using five-fold cross-validation. Remarkably, all ANN models achieved an average MSE below 0.003, indicating accurate prediction of the target variables. This successful outcome demonstrates the models' ability to learn the values of \(\Phi \), n, and \(R_s\) while effectively capturing the target-variable trends. These findings underscore the impressive potential of ANN models to analyze and predict the internal parameters of an SPD, even without prior knowledge of the nonlinear thermionic emission (TE) expression for SBDs.
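Eq. (5) translates directly into code; a minimal sketch:

```python
import numpy as np

def mse(actual, predicted):
    """Mean square error, left-hand metric of Eq. (5)."""
    d = np.asarray(actual) - np.asarray(predicted)
    return float(np.mean(d ** 2))

def mae(actual, predicted):
    """Mean absolute error, right-hand metric of Eq. (5)."""
    d = np.asarray(actual) - np.asarray(predicted)
    return float(np.mean(np.abs(d)))
```

For example, actual values (1, 2, 3) against predictions (1, 2, 4) give an error vector (0, 0, -1), so both metrics evaluate to 1/3.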

Conclusions

ML models can be valuable tools that complement researchers, offering significant time savings in the device development cycle and adaptability to various fields. In this study, we successfully designed and developed ANN-based models to analyze and extract the internal parameters of a Schottky photodiode (SPD). These models were trained and evaluated on 36 SPD datasets, yielding remarkable results. With MSE and MAE scores below 0.003, the ANN models accurately learned the internal parameters, predicting the barrier height, series resistance, and ideality factor. It is worth noting that the ANN models also demonstrated their utility as a tool for post-publication validation of earlier results obtained with the laborious Cheung–Cheung method. Notably, the models estimated unknown photodiode light responses under different illuminations and voltage settings without overfitting, which underscores their reliability. The utility of ML models to researchers lies in reducing the time-consuming repetition of experiments, enabling the generation of reliable internal parameters from prior data. By streamlining analysis tasks, researchers can dedicate more attention to critical aspects of their investigations, thereby improving productivity.