Introduction

Although the theory of electron–phonon superconductivity due to Bardeen–Cooper–Schrieffer, Gor’kov, Eliashberg, Migdal, and others is well established, it has not historically aided in the discovery of new superconductors. The materials space to search for new superconductors is vast, and it is therefore desirable to find a practical way to use theory as a guide. Recent computational developments may allow a new approach to superconducting materials discovery based on ab initio and materials-genome-type methods1,2,3.

One approach to this problem, pioneered by McMillan4 and by Allen and Dynes5, is to search for a formula for Tc based on materials-specific parameters derived from the Eliashberg equations of superconductivity. These parameters, mostly moments of the electron–phonon spectral function α2F(ω), can be determined by experiment or, more recently, calculated within ab initio approaches. In principle, this also allows one to deduce how to optimize Tc by tuning one or more of these parameters.

The Allen–Dynes equation has played a crucial role in debates on how to achieve high-temperature superconductivity, used by theorists to predict Tc and by experimentalists to extract λ from measured Tc and ωD. Nevertheless, it is important to recall that the Allen–Dynes equation was derived from Eliashberg theory within an approximation that neglects the momentum dependence of the Eliashberg function. It is based on 217 Eliashberg solutions for three types of α2F(ω) shapes (those obtained from tunneling data on Hg and Pb, and those obtained for a single Einstein mode).

There have been several important advances in providing more detailed solutions to the Eliashberg equations since the work of Allen and Dynes. Combescot solved the Eliashberg equations on the weak-coupling side and obtained an expression for Tc that depends on \(\omega_{\mathrm{log}}\) and a shape-dependent integral6. Recently, Marsiglio et al. solved the Eliashberg equations numerically at small λ7,8, exhibiting some deviations of the theory from the BCS result in this limit, in particular the multiplicative correction \(\frac{1}{\sqrt{e}}\) to the BCS Tc9,10,11. And, of course, the full equations can be solved numerically for any coupling, including the momentum dependence of α2F if desired12,13. The validity of the theory as λ increases is a subtle question and has been the subject of a number of recent studies14,15,16,17,18,19.

In this paper, we solve the Eliashberg equations using different types of electron–phonon spectral functions, including multimodal Einstein-like spectra and a set of α2F obtained from first-principles calculations. We find that, while the Allen–Dynes formula accurately predicts the Eliashberg Tc for λ values near 1.6 (the coupling constant for Hg and Pb), it nevertheless deviates from the Eliashberg Tc when λ is significantly larger or smaller than 1.6 and when the shape of α2F(ω) differs from the simple unimodal Einstein model. This deficiency highlights the need to improve on Allen–Dynes to investigate the high-pressure, high-temperature hydrides of great current interest20.

In a previous paper, we used an analytical machine learning approach to try to improve on the Allen–Dynes formula, training and testing on a tiny database drawn from the Allen–Dynes table of 29 superconducting materials21. This proof-of-principle work showed that the SISSO framework, properly constrained by physical law, could substantially improve on the performance of the Allen–Dynes equation with a smaller number of parameters. Clearly, it is necessary to apply this approach to a more extensive and diverse database.

Here, we proceed more systematically and show how we can “teach the machine Eliashberg theory” by generating large databases of α2F functions, from both real materials and single- and multimodal artificial spectra, and learning Tc from solutions of the Eliashberg equations. We additionally include in our study α2F functions for superhydrides, extending training and testing to the higher-λ range. We show that the Allen–Dynes equation fails particularly badly in this region, since it was designed to fit materials with the ratio of the Allen–Dynes parameters \({\bar{\omega }}_{2}/{\omega }_{{{\mathrm{log}}}\,}\simeq 1\), a condition strongly violated in some of the higher-Tc materials. Here λ is the integral \(2\int\nolimits_{0}^{\infty }{\alpha }^{2}F(\omega )/\omega \ {\rm {d}}\omega\), the frequency \({\bar{\omega }}_{n}\) is the nth root of the nth moment of the normalized distribution \(g(\omega )=2{\alpha }^{2}F(\omega )/(\lambda \omega )\), and \({\omega }_{{{\mathrm{log}}}\,}\equiv \exp \left[\int\nolimits_{0}^{\infty }\ln \omega \ g(\omega )\ {\rm {d}}\omega \right]\).
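For concreteness, these three spectral moments can be computed directly from a tabulated α2F(ω); the short sketch below is a minimal transcription of the definitions above, assuming only a frequency grid that excludes ω = 0 (names and grids are illustrative).

```python
import numpy as np
from scipy.integrate import trapezoid

def spectral_moments(omega, a2F):
    """Return (lambda, omega_log, omega_bar_2) for a tabulated alpha^2 F(omega).

    omega : 1D array of frequencies (all > 0, to avoid the 1/omega singularity)
    a2F   : 1D array of alpha^2 F values on the same grid
    """
    lam = 2.0 * trapezoid(a2F / omega, omega)               # lambda = 2 int a2F/omega
    g = 2.0 * a2F / (lam * omega)                           # normalized weight g(omega)
    omega_log = np.exp(trapezoid(np.log(omega) * g, omega)) # log-averaged frequency
    omega_bar_2 = np.sqrt(trapezoid(omega**2 * g, omega))   # root of 2nd moment of g
    return lam, omega_log, omega_bar_2
```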

We begin by introducing the McMillan and Allen–Dynes equations, against which we will compare our results. McMillan4, in an attempt to improve on the BCS weak-coupling Tc, incorporated elements of Eliashberg theory22 into a phenomenological expression, relating Tc to physical parameters that could in principle be extracted from tunneling data23,

$${T}_{{{\mathrm{c}}}}\simeq \frac{{\omega }_{{{\mathrm{D}}}}}{1.45}\exp \left(-\frac{1.04(1+\lambda )}{\lambda -{\mu }^{* }(1+0.62\lambda )}\right),$$
(1)

where μ* is the Coulomb pseudopotential and ωD is the Debye frequency. Note that the McMillan formula predicts a saturation of Tc in the strong-coupling limit, λ → ∞, for fixed ωD.

Allen and Dynes5 showed that the true Eliashberg Tc did not obey such a bound in this limit but rather grew as \(\sqrt{\lambda }\). They proposed an alternative approximate fit to Eliashberg theory, based on data for the few low-Tc superconductors known in 1975:

$${T}_{{{\mathrm{c}}}}=\frac{{f}_{1}{f}_{2}{\omega }_{{{\mathrm{log}}}\,}}{1.20}\exp \left(-\frac{1.04(1+\lambda )}{\lambda -{\mu }^{* }(1+0.62\lambda )}\right),$$
(2)
$${f}_{1}={\left(1+{\left(\frac{\lambda }{2.46(1+3.8{\mu }^{* })}\right)}^{3/2}\right)}^{1/3},$$
(3)
$${f}_{2}=\left(1+\frac{{\lambda }^{2}(\frac{{\bar{\omega }}_{2}}{{\omega }_{{{\mathrm{log}}}\,}}-1)}{{\lambda }^{2}+{[1.82(1+6.3{\mu }^{* })(\frac{{\bar{\omega }}_{2}}{{\omega }_{{{\mathrm{log}}}\,}})]}^{2}}\right),$$
(4)

where f1 and f2 are factors depending on \(\lambda ,{\mu }^{* },{\omega }_{{{\mathrm{log}}}\,}\), and \({\bar{\omega }}_{2}\).
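In code, the McMillan and Allen–Dynes estimates of Eqs. (1)–(4) are short functions; the sketch below is a direct transcription, assuming λ > μ*(1 + 0.62λ) so the exponent is well defined, with ωD and ω_log carrying the temperature units of Tc.

```python
import numpy as np

def tc_mcmillan(lam, mu_star, omega_D):
    """McMillan Tc, Eq. (1); omega_D in kelvin gives Tc in kelvin."""
    return omega_D / 1.45 * np.exp(-1.04 * (1 + lam)
                                   / (lam - mu_star * (1 + 0.62 * lam)))

def tc_allen_dynes(lam, mu_star, omega_log, omega_bar_2):
    """Allen-Dynes Tc, Eqs. (2)-(4); omega_log in kelvin gives Tc in kelvin."""
    f1 = (1 + (lam / (2.46 * (1 + 3.8 * mu_star)))**1.5)**(1 / 3)
    r = omega_bar_2 / omega_log
    f2 = 1 + lam**2 * (r - 1) / (lam**2 + (1.82 * (1 + 6.3 * mu_star) * r)**2)
    return f1 * f2 * omega_log / 1.20 * np.exp(-1.04 * (1 + lam)
                                               / (lam - mu_star * (1 + 0.62 * lam)))
```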

Results and discussion

Figure 1 outlines our methods and computational workflow. We begin by collecting α2F(ω) spectral functions from ab initio calculations and augmenting the dataset with artificial spectral functions based on generated Gaussian functions. The Coulomb pseudopotential μ* is sampled as a free parameter and used, alongside the spectral functions, as an input to the Eliashberg equations. Eliashberg theory yields the superconducting gap function Δ, from which we extract \(T_{\mathrm{c}}^{\mathrm{E}}\). At the same time, we extract the quantities λ, \({\omega }_{{{\mathrm{log}}}\,}\), and \(\bar{\omega}_2\) from α2F. Next, we use machine learning techniques to learn the relationship between the four model inputs, or features, and the critical temperature from Eliashberg theory \({T}_{\mathrm{c}}^{\mathrm{E}}\). Finally, we compare the predictive models for Tc and discuss the feature-Tc relationships.
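To make the Eliashberg step concrete, the sketch below solves the linearized isotropic Eliashberg equations on the imaginary (Matsubara) axis, finding Tc as the temperature where the largest eigenvalue of the gap kernel reaches unity. It follows the standard eigenvalue formulation5; the fixed Matsubara truncation (which implicitly sets the μ* cutoff) and the bisection bounds are simplifying assumptions of ours, not the EPW implementation.

```python
import numpy as np
from scipy.integrate import trapezoid

def lam_of_nu(omega, a2F, nu):
    """lambda(i nu) = 2 int d(omega) omega a2F(omega) / (omega^2 + nu^2)."""
    return 2.0 * trapezoid(omega * a2F / (omega**2 + nu**2), omega)

def max_gap_eigenvalue(T, omega, a2F, mu_star, n_mats=256):
    """Largest eigenvalue of the linearized gap kernel at temperature T.

    omega, a2F, and T share the same energy units (k_B = hbar = 1);
    superconductivity survives while this eigenvalue exceeds 1.
    """
    n = np.arange(n_mats)
    # lambda(n - m) depends only on |n - m|: precompute all needed shifts
    lam_j = np.array([lam_of_nu(omega, a2F, 2.0 * np.pi * T * j)
                      for j in range(2 * n_mats)])
    N, M = np.meshgrid(n, n, indexing="ij")
    K = (lam_j[np.abs(N - M)] + lam_j[N + M + 1] - 2.0 * mu_star) / (2 * M + 1)
    # Mass renormalization: omega_n Z_n / (pi T) = (2n+1) + lam(0) + 2 sum_{j<=n} lam(j)
    Z = 1.0 + (lam_j[0] + 2.0 * np.cumsum(np.r_[0.0, lam_j[1:n_mats]])) / (2 * n + 1)
    return np.linalg.eigvals(K / Z[:, None]).real.max()

def tc_eliashberg(omega, a2F, mu_star, T_lo=1e-4, T_hi=1.0):
    """Bisect for Tc; T_lo and T_hi (same units as omega) must bracket it."""
    for _ in range(40):
        T_mid = 0.5 * (T_lo + T_hi)
        if max_gap_eigenvalue(T_mid, omega, a2F, mu_star) > 1.0:
            T_lo = T_mid      # still superconducting: Tc lies above T_mid
        else:
            T_hi = T_mid
    return 0.5 * (T_lo + T_hi)
```

For frequencies tabulated in meV, the returned Tc is in meV and can be converted to kelvin by dividing by kB ≈ 0.0862 meV/K.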

Fig. 1: Workflow for identifying new machine learning models for Tc from α2F(ω) spectra and derived quantities.
figure 1

The workflow is organized into four computational modules: data collection, preprocessing, machine learning, and application.

Computational details

We compile a set of 2874 electron–phonon spectral functions α2F(ω), summarized in Table 1. Of these, 13 are conventional phonon-mediated superconductors, for which we calculate α2F using the electron–phonon Wannier (EPW) package12,24 of the Quantum Espresso (QE) code25,26. An additional 42 (29 classic and 13 hydride superconductors) are obtained from the computational superconductivity literature. We augment the dataset by generating 2819 artificial multimodal α2F(ω) functions and calculating the corresponding Tc values with the EPW code. The superconducting transition temperatures are estimated both with the Allen–Dynes equation and by solving the isotropic Eliashberg equations. The raw data are available upon request.

Table 1 Summary of the datasets used for training and validation of the machine learning model.

The artificially generated α2F(ω) consist of three Gaussian peaks with randomly selected peak locations and heights,

$$\begin{array}{r}{\alpha }^{2}F(\omega )=\mathop{\sum }\limits_{i=1}^{3}\frac{{\lambda }_{i}\omega }{2}G(\omega -{\omega }_{i}),\end{array}$$
(5)

where G(ω) is a normalized Gaussian whose width is 1/8 of the peak frequency ωi.

The total λ is then the sum of the λi, which simplifies sampling of the space of spectral functions. The artificial trimodal α2F spectral functions resemble those of many realistic materials; see Fig. 2 for the example of LaAl2. The Allen–Dynes and Eliashberg Tc values for the hydrides are obtained from published work (see refs. in Table 1).
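A minimal generator for such spectra, following Eq. (5), might look like this; the frequency grid and peak parameters below are illustrative, and by construction the partial couplings λi sum to the total λ.

```python
import numpy as np

def multimodal_a2F(omega, peak_freqs, peak_lams):
    """Artificial alpha^2 F(omega) as in Eq. (5): Gaussian peaks at omega_i,
    each of width omega_i / 8, whose partial couplings lambda_i sum to lambda."""
    a2F = np.zeros_like(omega)
    for w_i, lam_i in zip(peak_freqs, peak_lams):
        sigma = w_i / 8.0
        G = np.exp(-0.5 * ((omega - w_i) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))
        a2F += 0.5 * lam_i * omega * G   # lambda_i * omega / 2 * G(omega - omega_i)
    return a2F

omega = np.linspace(0.1, 250.0, 5000)                              # meV (illustrative)
a2F = multimodal_a2F(omega, [20.0, 60.0, 150.0], [0.4, 0.5, 0.6])  # total lambda ~ 1.5
```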

Fig. 2: Artificial Gaussian α2F(ω) spectral functions.
figure 2

Comparison of a α2F(ω) for LaAl2 with b a trimodal model α2F(ω) illustrates that the model spectral functions can resemble real materials.

To ensure efficient sampling of the input spaces, we select values of λ and μ* with quasirandom Sobol sequences. As shown in Fig. 3, our uniform sampling scheme results in a set of artificially generated α2F corresponding to an approximately uniform distribution of Tc. Next, we remove artificial entries with Tc > 400 K to better reflect the distribution of realistic materials. While the histogram of μ* remains approximately uniform after this truncation, the histograms of λ, \({\omega }_{{{\mathrm{log}}}\,}\), and \({\bar{\omega }}_{2}\) become skewed towards lower values.
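The sampling step might be written as follows; the seven-dimensional box below (three peak positions, three partial couplings, and μ*) and its bounds are illustrative assumptions, not the exact ranges used in this work.

```python
from scipy.stats import qmc

# Low-discrepancy Sobol samples over (lambda_1..3, omega_1..3 [meV], mu*)
sampler = qmc.Sobol(d=7, scramble=True, seed=0)
unit = sampler.random_base2(m=12)        # 2**12 = 4096 points in the unit cube
lower = [0.1, 0.1, 0.1,   5.0,   5.0,   5.0, 0.10]
upper = [2.0, 2.0, 2.0, 250.0, 250.0, 250.0, 0.16]
samples = qmc.scale(unit, lower, upper)  # map to the physical parameter box
```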

Fig. 3: Histograms of input spaces of materials data and artificial Gaussian models.
figure 3

Compared to a the materials data, b the artificial Gaussian models exhibit superior coverage of the input spaces. We generated the artificial Gaussian models by sampling inputs uniformly with Sobol sequences and retaining entries with Tc ≤ 400 K.

Data

In the Allen–Dynes formula, the “arbitrarily chosen” shape-dependent factor f2 is based on numerical solutions using the spectral functions of Hg, Pb, and the Einstein model5. Because the number of α2F(ω) shapes is small, the Allen–Dynes Tc (\({T}_{\,{{\mbox{c}}}}^{{{\mbox{AD}}}\,}\)) can be expected to have significant errors in some instances; for reference, the root-mean-square error reported in the Allen–Dynes paper is 5.6%. Figure 4 illustrates such deviations for bimodal Gaussian spectral functions. So far, we have discussed the α2F(ω) shapes in an abstract sense because no single parameter uniquely determines their shape. Allen and Dynes proposed using the ratio \({\omega}_{{{\mathrm{log}}}} / {\bar{\omega}}_{2}\) as an indicator of the shape of α2F(ω). In Fig. 4, the ratio \({{T}_{{{\mathrm{c}}}}^{{{\mathrm{AD}}}}}/{{T}_{{{\mathrm{c}}}}^{{{\mathrm{E}}}}}\) is plotted against \({\omega}_{{{\mathrm{log}}}} / {\bar{\omega}}_{2}\) for λ = 0.6, 1, 2, 3, and 4. The results demonstrate that there can be significant differences between the Allen–Dynes \({T}_{{{\mathrm{c}}}}^{{{\mathrm{AD}}}}\) and the Eliashberg \({T}_{{{\mathrm{c}}}}^{{{\mathrm{E}}}}\) even for some simple cases. When the ratio \({\omega}_{{{\mathrm{log}}}} / {\bar{\omega}}_{2}\) is 1, the shape of α2F is that of the unimodal Einstein model, and the Allen–Dynes Tc accurately predicts the Eliashberg Tc regardless of the coupling strength. As the ratio \({\omega}_{{{\mathrm{log}}}} / {\bar{\omega}}_{2}\) decreases, i.e., as the shape of α2F acquires more structure, whether the Allen–Dynes formula can still reasonably predict the Eliashberg Tc depends on the electron–phonon coupling strength.

Fig. 4: Ratio of the Allen–Dynes and Eliashberg Tc for a bimodal Einstein-like model compared to data for hydrides.
figure 4

The hydride data are obtained from refs. 35,36,37. For the bimodal spectral functions, we select λ1 = λ2 for simplicity and vary the total λ from 1 to 4.

In this work, we train and test machine-learning models using the datasets listed in Table 1. Two sizes are reported for each non-Gaussian dataset: the number of unique materials and the total number of data points. We sample μ* from the interval [0.1, 0.16], which covers a wide range of possible μ* values5,27. The calculated, artificial Gaussian, and literature-derived α2F datasets are used for training all machine-learning models. We leave the hydride materials out of the training in order to validate the extrapolative capacity of each model.

Correction factors for Tc from symbolic regression

As in our previous symbolic regression effort21, we use the SISSO framework to generate millions of candidate expressions by recursively combining the input variables with mathematical operators such as addition and exponentiation. We perform symbolic regression twice, sequentially, to obtain two dimensionless prefactors of the McMillan exponential, yielding the machine-learned critical temperature

$${T}_{\,{{\mbox{c}}}\,}^{{{{\rm{ML}}}}}=\frac{{f}_{\omega }{f}_{\mu }\ {\omega }_{{{\mathrm{log}}}\,}}{1.20}\exp \left(-\frac{1.04(1+\lambda )}{\lambda -{\mu }^{* }(1+0.62\lambda )}\right).$$
(6)

We name the two learned prefactors a posteriori based on their functional forms and the mechanisms by which they reduce the error in predicting Tc. The first factor

$${f}_{\omega }=1.92\left(\frac{\lambda +\frac{{\omega }_{{{\mathrm{log}}}\,}}{{\bar{\omega }}_{2}}-\root 3 \of {{\mu }^{* }}}{\sqrt{\lambda }\exp (\frac{{\omega }_{{{\mathrm{log}}}\,}}{{\bar{\omega }}_{2}})}\right)-0.08$$
(7)

is obtained from the fit to the ratio \({T}_{{{\mathrm{c}}}}^{{{\mathrm{E}}}}/{T}_{{{\mathrm{c}}}}^{{{\mathrm{McMillan}}}}\) and eliminates the systematic underprediction of Tc at higher temperatures. Like the Allen–Dynes prefactor f2, fω includes the ratio \({\omega }_{{{\mathrm{log}}}}/{\bar{\omega }}_{2}\), modifying the prediction based on the shape of α2F(ω). Moreover, fω also scales with \(\sqrt{\lambda }\), like the Allen–Dynes prefactor f1. This is in agreement with the correct large-λ behavior of Eliashberg theory, unlike our earlier work21 and the modified Tc equation with linear correction proposed recently by Shipley et al.28. The manifestation of both behaviors in fω gives credence to our symbolic regression approach because it incorporates the primary effects of the Allen–Dynes equation with fewer parameters. Applying the correction \({T}_{{{\mathrm{c}}}}={f}_{\omega }{T}_{{{\mathrm{c}}}}^{{{\mathrm{McMillan}}}\,}\) achieves a percent RMSE of 15.2% across the materials (non-Gaussian model) data, compared to 48.6% when using the Allen–Dynes equation.

The second correction factor

$${f}_{\mu }=\frac{6.86\exp \left(\frac{-\lambda }{{\mu }^{* }}\right)}{\frac{1}{\lambda }-{\mu }^{* }-\frac{{\omega }_{{{\mathrm{log}}}\,}}{{\bar{\omega }}_{2}}}+1$$
(8)

is obtained from the fit to the ratio \({T}_{\,{{\mathrm{c}}}}^{{{\mathrm{E}}}}/({f}_{\omega }{T}_{{{\mathrm{c}}}}^{{{\mathrm{McMillan}}}})\); it corrects the residual error from the fω fit and thus cannot be used independently. Applying the combined correction \({T}_{{{\mathrm{c}}}}={f}_{\omega }{f}_{\mu }{T}_{{{\mathrm{c}}}}^{{{\mathrm{McMillan}}}}\) achieves a percent RMSE of 15.1% across the materials datasets, compared to 15.2% when using fω alone. The influence of fμ is most apparent when examining clusters of points corresponding to resampled μ* values for a single material, where the systematic error in \({T}_{{{\mathrm{c}}}}^{{{\mathrm{ML}}}}/{T}_{{{\mathrm{c}}}}^{{{\mathrm{E}}}}\) is reduced.

Note that fμ → 1 in both of the limits λ → 0 and λ → ∞, and in fact does not deviate from 1 by more than ~10% over the data set.
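Putting Eqs. (6)–(8) together, the machine-learned estimate is as simple to evaluate as the Allen–Dynes formula; the transcription below carries the units of ω_log.

```python
import numpy as np

def tc_ml(lam, mu_star, omega_log, omega_bar_2):
    """Machine-learned Tc of Eqs. (6)-(8); omega_log in kelvin gives Tc in kelvin."""
    r = omega_log / omega_bar_2
    f_omega = 1.92 * (lam + r - mu_star**(1 / 3)) / (np.sqrt(lam) * np.exp(r)) - 0.08
    f_mu = 6.86 * np.exp(-lam / mu_star) / (1 / lam - mu_star - r) + 1.0
    return (f_omega * f_mu * omega_log / 1.20
            * np.exp(-1.04 * (1 + lam) / (lam - mu_star * (1 + 0.62 * lam))))
```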

Figure 5 shows that, apart from the low-Tc non-hydride materials for which the difference is smaller than 0.1 K, the corrections fω and fμ dramatically improve predictions compared to using the Allen–Dynes equation. Since we excluded the hydrides from the training, these results successfully validate our data-driven symbolic regression approach by demonstrating the extrapolative capacity of the learned equations.

Fig. 5: Comparison of predictions using the Allen–Dynes equation and the new symbolic-regression corrections.
figure 5

The model \({T}_{{{\mathrm{c}}}}^{{{\mathrm{Model}}}}\) is plotted against the Eliashberg \({T}_{{{\mathrm{c}}}}^{{{\mathrm{E}}}}\) such that accurate predictions lie on the gray 1:1 line. a, b Non-hydride and c, d artificial Gaussian panels depict the training error while e, f hydride35,36,37 panels show extrapolative capacity. The non-hydride and artificial Gaussian panels are colored by the log-density of points. We report the root-mean-square error (RMSE), mean-absolute error (MAE), maximum residual, and minimum residual values in Kelvin. The maximum residual corresponds to the largest overprediction while the minimum residual corresponds to the largest underprediction. The two multiplicative factors obtained from symbolic regression improve the prediction compared with the two multiplicative factors of the Allen–Dynes formula, particularly for the higher Tc systems.

To further quantify the similarity between the existing Allen–Dynes prefactors and the machine-learned prefactors, we employ two statistical measures, the Spearman and distance correlations. The Spearman correlation measures the monotonicity of the relationship between the rankings of two variables. Like the Pearson correlation coefficient for linear correlation, the Spearman correlation varies between −1 and +1, where the extrema imply high correlation and zero implies no correlation. Unlike the Pearson correlation, the Spearman correlation does not assume normally distributed data. By construction, all four prefactors tend to unity for many materials, resulting in asymmetric distributions that are unsuitable for analysis with parametric measures like the Pearson correlation.

In addition to the Spearman correlation, we compute the distance correlation, another nonparametric measure of the dependence between two variables. The distance correlation is defined as the ratio of the distance covariance and the product of the distance standard deviations, where distance covariance is the weighted Euclidean distance between the joint characteristic function of the two variables and the product of their marginal characteristic functions. Unlike the Pearson and Spearman correlation coefficients, the distance correlation varies between 0 and 1, where 0 indicates that the variables are independent, measuring both linear and nonlinear association.
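Both measures are available off the shelf; the sketch below uses scipy and the third-party dcor package (our choice of implementation, not necessarily the one used here) on placeholder prefactor arrays.

```python
import numpy as np
from scipy.stats import spearmanr
import dcor  # third-party distance-correlation implementation (pip install dcor)

rng = np.random.default_rng(0)
f1_vals = 1.0 + rng.random(500)                  # placeholder prefactor samples
f_omega_vals = f1_vals**2 + 0.05 * rng.random(500)

rho, _ = spearmanr(f1_vals, f_omega_vals)              # rank-based, in [-1, 1]
dc = dcor.distance_correlation(f1_vals, f_omega_vals)  # in [0, 1]; 0 iff independent
```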

Table 2 shows a strong relationship between f1, f2, and fω according to both Spearman and distance correlation metrics, with values close to one. This numerical analysis reinforces the conclusion that fω reproduces characteristics of both f1 and f2, as illustrated earlier in the comparison of functional forms. On the other hand, both Spearman correlation and distance correlation measures indicate slightly weaker relationships between fμ and the other three prefactors. The relative independence of fμ compared to fω, f1, and f2 stems from the sequential nature of the fitting process.

Table 2 Correlations between Allen–Dynes and machine-learned prefactors.

Comparing predictive models for Tc

To compare existing equations for Tc with the corrections identified in this work, we benchmark the RMSE across the non-hydride materials, artificial Gaussians, and hydrides, as tabulated in Table 3. Additionally, we compute the %RMSE by normalizing each RMSE by the mean Tc of the corresponding dataset. To assess the behavior of each model with increasing λ, we plot \({T}_{{{\mathrm{c}}}}/{\omega }_{{{\mathrm{log}}}\,}\) for each model in Fig. 6.

Table 3 Comparison of model performance on materials and artificial Gaussian datasets.
Fig. 6: Dependence of Tc on λ for select predictive models.
figure 6

The McMillan and Xie et al. (2019) equations, which do not change with \({\omega }_{{{\mathrm{log}}}\,}/{\bar{\omega }}_{2}\), are depicted as dashed curves. The Allen–Dynes formula and the ANN, RF, and symbolic regression machine-learning corrections from this work are plotted as shaded regions bounded by the \({\bar{\omega }}_{2}/{\omega }_{{{\mathrm{log}}}\,}=1.1\) and \({\bar{\omega }}_{2}/{\omega }_{{{\mathrm{log}}}\,}=1.6\) curves. All models behave similarly for low to moderate values of λ. For larger values of λ, the ANN, RF, and symbolic regression corrections deviate significantly from the Allen–Dynes equation as well as from the previous symbolic regression equation. The RF regression exhibits discontinuities due to its piecewise-constant form.

As expected, the Allen–Dynes equation improves on the McMillan equation across all three groups. On the other hand, the equation identified by Xie et al.21 in an earlier symbolic regression work performs slightly worse on the low-Tc non-hydride dataset but achieves lower RMSE across the artificial Gaussian and hydride materials, despite being trained on a small set of 29 low-Tc materials.

Applying the new fω prefactor to the McMillan equation reduces %RMSE in non-hydride materials from 14.4% to 8.4%, in artificial Gaussian models from 45.1% to 9.2%, and in hydrides from 36.6% to 5.8%. Moreover, applying both fω and fμ results in a further, modest improvement to the RMSE. In Fig. 6, our machine-learned correction (blue) is nearly equal to the Allen–Dynes equation (gray) for values of λ up to 1 but rapidly increases at larger λ. Both bounds, for higher and lower values of \({\omega }_{{{\mathrm{log}}}\,}/{\bar{\omega }}_{2}\), exceed the bounded region of the Allen–Dynes equation, indicating that at least part of the new model’s success is due to an improvement in capturing the behavior of Tc with increasing λ.

We additionally fit a random forest (RF) model and an artificial neural network (ANN) model on the same training data to compare against our symbolic regression method. Hyperparameters for the RF and ANN models were selected using 10-fold leave-cluster-out cross-validation with the same clusters identified for symbolic regression. The model error, in turn, was estimated using nested cross-validation, with the inner loop performed using a conventional 5-fold cross-validation scheme. Production models used in Fig. 6 were fit with the selected hyperparameters on the entire training set.

The RF is an ensemble model composed of decision trees, each fit to a random subset of the data and queried for an independent prediction. Each decision tree uses a flow-chart-like series of decisions (branches) to yield predictions (leaves) and is optimized by varying decision thresholds. While individual decision trees are prone to overfitting, an RF produces robust predictions by averaging the predictions of its members. The optimized RF model, consisting of 100 decision trees with a maximum depth of eight splits per tree, achieved the lowest RMSE of all three models, with 4.7% RMSE on the testing set of hydride materials. This success may be attributed to both the flexibility of the method and its complexity relative to the other methods. With up to 128 nodes per tree, the RF evaluates tens of thousands of binary decisions per prediction. On the other hand, as illustrated in Fig. 6, the resulting output (green) is discontinuous. Furthermore, the RF cannot extrapolate outside the regions of input space covered by the training data, which results in constant-value outputs. This deficiency is evident in both the upper- and lower-bound curves above λ = 3.8, where the RF correction reduces to a simple rescaling of the McMillan curve.
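With scikit-learn, the reported architecture corresponds to a few lines; the placeholder data below stand in for the actual features and Eliashberg targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 4))  # placeholder features: lambda, mu*, omega_log, omega_bar_2
y = rng.random(1000)       # placeholder target, e.g., a correction to Tc^McMillan

# 100 trees, each limited to a depth of eight splits, as in the optimized model
rf = RandomForestRegressor(n_estimators=100, max_depth=8, random_state=0).fit(X, y)
```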

The ANN models in this work are feedforward neural networks, also known as multi-layer perceptrons, designed to learn highly nonlinear mappings from multiple inputs to a target output. The feedforward architecture consists of an input layer with one neuron per input, one or more hidden layers, and an output layer with one neuron per target. The value at each non-input neuron is a weighted linear sum of the values in the preceding layer, followed by a nonlinear activation function. The optimized ANN includes three hidden layers of forty neurons each, totaling 3521 trainable parameters (multiplicative weights and additive biases). Despite the increased model complexity, the ANN performs similarly to the symbolic regression model, with slightly lower training RMSE and slightly higher testing RMSE. With increasing λ, the ANN model yields similar values of Tc, as indicated by the overlap between the shaded regions of the symbolic regression model (blue) and the ANN (yellow).
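The corresponding ANN sketch, with the activation and optimizer left at common defaults (our assumptions, as the paper does not specify them):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 4))  # placeholder features: lambda, mu*, omega_log, omega_bar_2
y = rng.random(1000)       # placeholder target

# Three hidden layers of 40 neurons; with 4 inputs this gives
# (4*40 + 40) + 2*(40*40 + 40) + (40 + 1) = 3521 trainable parameters, as in the text
ann = MLPRegressor(hidden_layer_sizes=(40, 40, 40), max_iter=2000,
                   random_state=0).fit(X, y)
```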

For low to moderate values of λ, such as those originally studied by Allen and Dynes, all models behave similarly and the dimensionless corrections (f1, f2, fω, fμ, ANN, RF) are close to unity. However, as λ increases, the ANN, RF, and symbolic regression corrections deviate significantly from the Allen–Dynes equation as well as the previous symbolic regression equation21. The corrections introduced in this work successfully correct the systematic underprediction of Tc, with the symbolic regression solution offering simplicity and accuracy. Moreover, the monotonicity constraint in the symbolic regression search guarantees invertibility, allowing experimentalists to extract λ from measured Tc and the electron–phonon spectral function. This characteristic is not guaranteed for the RF and ANN models.

Summary

The present work demonstrates the application of symbolic regression to a curated dataset of α2F(ω) spectral functions, yielding an improved analytical correction to the McMillan equation for the critical temperature of a superconductor. We showed that the well-known Allen–Dynes equation, an early improvement based on fitting to a very limited set of spectral functions, exhibits systematic error when predicting the Eliashberg critical temperature of high-Tc hydrides, a flaw that stems from its original training set of low-Tc superconductors. The equation we obtain here by symbolic regression has the same form as the original Allen–Dynes equation, with exactly the same McMillan exponential factor, but has two prefactors that behave very differently from those employed by Allen and Dynes. They ensure that superconductors with spectral functions α2F(ω) of unusual shapes, such that \({\bar{\omega }}_{2}/{\omega }_{{{\mathrm{log}}}\,}\) differs significantly from 1, are adequately described; this subset of conventional superconductors includes the new hydride high-pressure superconductors. In addition, the machine-learned equation can be simplified by dropping one of the prefactors with negligible loss of accuracy. Since the machine-learned expression of Eqs. (6)–(8) extends the accuracy of the Allen–Dynes expression to high-temperature superconductors while maintaining the utility and simplicity of the original formula, we suggest that it should replace the Allen–Dynes formula for predictions of critical temperatures and estimations of λ from experimental data, particularly for higher-temperature superconductors.

Using a dataset of ab initio calculations alongside artificially generated spectral functions, we mitigated the small-data problem associated with previous symbolic-regression efforts. The dimensionless correction factor identified by symbolic regression reproduces the expected physical behavior with increasing λ and achieves lower prediction errors than the Allen–Dynes corrections, despite having similar model complexity. Finally, we compared our equation to models generated with two other machine-learning techniques, which achieve modest improvements in error at the cost of far greater complexity and a lack of invertibility. While the present work successfully learns the isotropic Eliashberg Tc, future extensions may incorporate additional data from fully anisotropic Eliashberg calculations and experimental measurements. Separately, extensions may approximate α2F-related quantities from less expensive density-functional-theory-based descriptors, such as the electronic density of states.

Methods

Calculating electron–phonon spectral functions with density functional theory

We calculate the electron–phonon spectral functions α2F(ω) for 13 compounds with density functional theory using the QE code25,26 and the EPW code12,24. We use optimized norm-conserving pseudopotentials29,30 and the PBE generalized-gradient exchange-correlation functional31. We sample the Brillouin zone using a 24 × 24 × 24 k-point mesh for the electron orbitals and a 6 × 6 × 6 q-point mesh for the phonons. To obtain the critical temperatures Tc of the compounds, we solve the isotropic Eliashberg equations with the EPW code.

Symbolic regression

We performed symbolic regression using the sure-independence-screening and sparsifying-operator (SISSO) framework32,33, generating millions of candidate expressions. Owing to memory constraints, the subspace of expressions was limited to those generated within four iterations. This limitation precludes the appearance of expressions as complex as the Allen–Dynes equation, motivating our search for a dimensionless correction to the McMillan equation rather than directly learning models for Tc.

The primitive quantities for generating expressions were the three dimensionless variables λ, μ*, and the ratio \({\omega }_{{{\mathrm{log}}}\,}/{\bar{\omega }}_{2}\). Candidates were generated using the operator set \(\{+,-,\times ,\exp ,\log ,\sqrt{\ },\sqrt[3]{\ },{(\cdot )}^{-1},{(\cdot )}^{2},{(\cdot )}^{3}\}\). During the SIS step, these expressions were ranked by their correlation to the ratio \({T}_{{{\mathrm{c}}}}^{{{\mathrm{E}}}}/{T}_{{{\mathrm{c}}}}^{{{\mathrm{McMillan}}}}\) rather than to \({T}_{{{\mathrm{c}}}}^{{{\mathrm{E}}}}\) itself, so as to identify dimensionless, multiplicative corrections to \({T}_{{{\mathrm{c}}}}^{{{\mathrm{McMillan}}}}\).
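Schematically, one expansion-and-screening round looks like the following toy sketch; the data are placeholders, and the real SISSO search also applies binary operators and several recursion levels.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = rng.uniform(0.3, 4.0, 1000)
mu = rng.uniform(0.10, 0.16, 1000)
r = rng.uniform(0.4, 1.0, 1000)      # omega_log / omega_bar_2
target = rng.random(1000)            # stands in for Tc^E / Tc^McMillan

# One round of unary-operator expansion over the primitive features
pool = {"lam": lam, "mu*": mu, "r": r}
for name, x in list(pool.items()):
    pool[f"sqrt({name})"] = np.sqrt(x)
    pool[f"cbrt({name})"] = np.cbrt(x)
    pool[f"exp({name})"] = np.exp(x)
    pool[f"log({name})"] = np.log(x)
    pool[f"({name})^-1"] = 1.0 / x

# SIS step: rank candidate expressions by |correlation| with the target ratio
scores = {k: abs(np.corrcoef(v, target)[0, 1]) for k, v in pool.items()}
top = sorted(scores, key=scores.get, reverse=True)[:5]
```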

To facilitate generalizability, we employ leave-cluster-out cross-validation during the generation of expressions, using k-means clustering with k = 10 on the combined set of 179 non-hydride and 2819 artificial-Gaussian entries. In each round of cross-validation, we generated candidate equations using a different subset of nine clusters and used the remaining cluster to evaluate performance via the root-mean-square error (RMSE). As such, each training sample was left out of training and used for testing during exactly one round. The top 10,000 models, ranked by RMSE across the training set, were returned from each round. Models that did not appear in all ten rounds, corresponding to those with poor performance in one or more clusters, were eliminated. Following the same principle, we ranked the remaining equations by their average RMSE across all ten rounds.
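The fold construction can be sketched as follows; the cluster count is from the text, while the feature matrix is a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((2998, 4))    # placeholder: 179 non-hydride + 2819 Gaussian entries

# k-means with k = 10 in feature space defines the leave-cluster-out folds
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
for held_out in range(10):
    train = X[labels != held_out]  # nine clusters: generate/rank candidate equations
    test = X[labels == held_out]   # one cluster: evaluate RMSE of each candidate
```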

We note that the sparsifying operator (SO) step of the SISSO framework offers increased model complexity, as we explored in our previous work, but is limited in functional form to linear combinations of expressions generated from the preceding step. The linear combination of expressions from the initial subspace, by extension, also excludes equations as complex as the Allen–Dynes correction. Therefore, we did not consider linear combinations of expressions, meaning the SO simply selected the first-ranked expression from the SIS step in each run.