New alternatives to the Lennard-Jones potential

We present a new method for approximating two-body interatomic potentials from existing ab initio data based on representing the unknown function as an analytic continued fraction. In this study, our method was first inspired by a representation of the unknown potential as a Dirichlet polynomial, i.e., the partial sum of some terms of a Dirichlet series. Our method allows for a close and computationally efficient approximation of the ab initio data for the noble gases Xenon (Xe), Krypton (Kr), Argon (Ar), and Neon (Ne), which are proportional to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r^{-6}$$\end{document}r-6 and to a very simple \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$depth=1$$\end{document}depth=1 truncated continued fraction with integer coefficients and depending on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{-r}$$\end{document}n-r only, where n is a natural number (with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=13$$\end{document}n=13 for Xe, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=16$$\end{document}n=16 for Kr, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=17$$\end{document}n=17 for Ar, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=27$$\end{document}n=27 for Neon). For Helium (He), the data is well approximated with a function having only one variable \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{-r}$$\end{document}n-r with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=31$$\end{document}n=31 and a truncated continued fraction with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$depth=2$$\end{document}depth=2 (i.e., the third convergent of the expansion). Also, for He, we have found an interesting \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$depth=0$$\end{document}depth=0 result, a Dirichlet polynomial of the form \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k_1 \, 6^{-r} + k_2 \, 48^{-r} + k_3 \, 72^{-r}$$\end{document}k16-r+k248-r+k372-r (with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k_1, k_2, k_3$$\end{document}k1,k2,k3 all integers), which provides a surprisingly good fit, not only in the attractive but also in the repulsive region. We also discuss lessons learned while facing the surprisingly challenging non-linear optimisation tasks in fitting these approximations and opportunities for parallelisation.

where r is the distance between the interacting atoms and n is the 'repulsion exponent' and historically 1 , by mathematical convenience at the time, it was set as n = 12 , m = 6 and the values of σ , ε are chosen accord- ing to the available experimental data.Clearly the potential has a single root ( V LJ (σ ) = 0 ) and a minimum at r min = 2 1/6 σ and V LJ (r min ) = −σ.

The Mie Potential and Kulakova's approximation with non-integers n and m
It is known that Lennard-Jones explored various values for the parameters n and m before arriving at the final form.In 2017, Lina Kulakova and her colleagues conducted an intriguing study in which they investigated the joint calibration of all parameters in the Lennard-Jones functional form, allowing for non-integer values of n and m 1 .They concluded that "the repulsion exponent n ≈ 6.5 provides an excellent fit for experimental data of liquid argon across a range of thermodynamic conditions, as well as for saturated argon vapor".However, when using the quantum simulation data of the Argon dimer made available by Arthur M. Halpern in 2010 4 , a good fit was not obtained with p = 12 .The data suggested that values of n ≈ 12.7 are "preferred for Argon gas, while experimental data support lower values".
It is worth noting that many decades before 2017, there was a similar proposal; an even more general form of the LJ potential was proposed by the German physicist Gustave Mie in 1903 2 : The potential has a root at r = σ and a minimum at r min given by where So for n = 12 and m = 6 , we have that the minimum is It is important to remark that this functional form of the Mie potential (given by Eq. ( 3)), which is frequently attributed to Ref. 2 , does not appear in that manuscript.In fact, it seems that a generalised form first appeared in a textbook 6 in 1939.We are indebted to R. Sadus, who communicated this fact to us.

Buckingham and other proposed potentials
In 1938, while studying the equation of state for gaseous helium, neon and argon, Richard Buckingham proposed a simplification of the Lennard-Jones potential 7 where A, B, and C are constants.It is important to note that this functional form has a caveat.As the interatomic distance r approaches zero, the first term tends to a constant value, while the second term diverges and becomes negative for small r, indicating an attractive force.Consequently, it loses its physical relevance for very close interatomic distances.This problem is not present in both the Mie and Lennard-Jones potentials.We highlight this fact because, throughout the 20th century, introducing problem domain (i.e., physical) information has been crucial in the proposal of several alternative functional forms.We will discuss this issue in the 'Introducing problem domain information' section.
Other recently proposed functional forms of interest have been extensively discussed in Ref. 3 , so we refer the reader to that paper for more information on these potentials.
In the same paper 3 , Deiters and Sadus introduced a general functional form for a potential called SAAPx, which requires fixing seven coefficients for Helium (He) and six coefficients for the other noble gases Xenon (Xe), Krypton (Kr), Argon (Ar), and Neon (Ne).

Analytic continued fractions and symbolic regression methods
We will start with a simple introduction to symbolic regression to understand how our proposal was data-driven.

An example of symbolic regression
To illustrate how symbolic regression works, let's assume we are given values of an unknown function f(r) on some points (i.e.no experimental error in this case) so we know that the values in the given set {(r, f (r))} are perfectly known.See, for instance, Table 1 as an illustrative example. (1) Current symbolic regression methods, such as the one implemented at the core of the TuringBot software, have demonstrated remarkable power (see the methods used in Ref. 8 and their results on a large variety of datasets).They are capable of "learning" from data using a number of built-in mathematical functions.
For this illustrative task, we have employed the TuringBot software (which implements a symbolic regression approach) to obtain the following function: which does not make any error in the training data and "predicts" that f 1 (10) = 259 , f 1 (11) = 399 and f 1 (12) = 597.
We have obtained a formula with no coefficients and no error in the training data.This may be appealing, but we may suspect that this formula may not be the "true unknown function" we are trying to approximate.To deal with this, TuringBot, like many other symbolic regression packages, allows you to "unselect" many mathematical functions used as "building blocks" provided as default.In fact, we could search for functions using "just" integer coefficients and only the basic arithmetic functions of addition, subtraction, multiplication and division.
In this case, we have been able to use symbolic regression solvers to obtain, for instance, a simple polynomial equation in u = r − 1 such as: which also perfectly fits the data and for which f g.t (10) = 256 , f g.t (11) = 386 and f g.t (12) = 562 , which, as perhaps expected, do not agree with those of Eq. (5).We can rewrite it as: for any integer r ≥ 1.
Without further addition of problem domain knowledge about the nature of the unknown function f(r), both Eqs. ( 5) and (7) can equally be the function (as well as infinitely many others that fit the training data).
We will return to this motivating example later.Still, at this point, we want to remark that we can think of Eq. ( 7) as an approximation using ratios of polynomials in r with integer coefficients.When searching for relatively simpler equations, it is frequently the case that a change of variables may help to reduce the complexity of the final model.This simple illustration paves the way for discussing the following topics since we propose a novel representation.

Analytic continued fraction regression
Since 2019 we have been championing a new approach for multivariate regression.It is based on representing the unknown target function as an analytic continued fraction.The resulting method, called Continued Fraction Regression (cfr), has been demonstrated to have competitive performance on a variety of regression problems 9,10 , including in materials science 11 and physics 12,13 .In Ref. 14 , using 352 datasets from real experiments in the physical and chemical sciences, CFR showed, employing leave-one-out cross-validation, that it was ranked first in 350 out of the 352 datasets (in training) in comparison with ten machine learning regression methods of the scikit-learn collection.In testing, CFR ranked first 192 times, i.e. more than all of the other ten algorithms combined.
In CFR, it is proposed that the target function of a multivariate regression problem can be represented as an analytic continued fraction.For a multivariate input where d is the number of variables, an output y ∈ R , a regression model is defined as an analytic continued function f : X → Y with Y ⊆ R .In Ref. 10 Eq. ( 8) was first proposed, as a first approach to developing the theory, to start studying the representation potential of this functional form: In Eq. ( 8) for a continued fraction with depth n, we have g i (x) ∈ R for all integers i such that 0 ≤ i ≤ n .Each g i : R n → R is associated with an array a i ∈ R n and a constant α i ∈ R .Analogously, each h i : R n → R is defined by an array b i ∈ R n and a constant β i ∈ R .We thus define: We note that the full representational power behind CFR is more general, and other functional forms for the functions g i and h i can be used.It has been a conscious design choice to start exploring the power of this representation by restricting these base functions to be linear.We refer to Ref. 9 to see how complex functions, like the Gamma Function, can be well approximated using these choices and how they perform on 94 real-world datasets of the Penn Machine Learning Database.

A Dirichlet-inspired representation
A general Dirichlet series is an infinite series of the form where a n and s are complex numbers and the set { n } is a strictly increasing sequence of non-negative real numbers that tend to infinity.When n = ln(n) we have the "ordinary" Dirichlet series.One of the most famous of them is the Riemann zeta function which has applications in physics, statistics and many branches of mathematics and is defined as where Re(s) > 1 and its analytic continuation elsewhere.
This has suggested a new representation; for a large value of an integer N, it may be possible to approximate the potential value between two molecules (labelled 1 and 2) at a distance r.We can write so the problem of finding the best approximation for V 1,2 has now reduced to the problem of finding the set {a 1 , a 2 , . . ., a N }.
In Supplementary Material's Appendix 1, we show how to use symbolic regression software to search for continued fraction approximations of an unknown potential using the Dirichlet representation.We illustrate the methods using Halpern's Argon dataset.

Introducing problem domain information
In this section, we show how we can get very good solutions for several Noble gases using the same dataset employed by Deiters and Sadus 3 .It is worth mentioning that all potential values V 1,2 (r) are dimensionless, and the variable r is measured in nanometers (nm) in this work.

Deiters and Sadus's SAAP two-body potential and the introduction of problem-domain information
In 2019, Deiters and Sadus presented a two-body potential for the noble gases Ne, Ar, Kr, and Xe, which is called SAAP, an acronym for 'Simplified Ab initio Atomic Potential (and a variant of it called SAAPx for Helium) 3 .They provided a set of rules originating from their physical understanding of the problem domain that can help design a useful functional form that fits experimental data for all these gases well.First of all, the asymptotic behaviours that are desired should be taken into consideration: r) should be approaching zero for large values of r as a function of r −6 (and have negative values).They validate their claim by saying that dispersion interactions dominate the potential; this means that the original Lennard-Jones potential had that specific asymptotic behaviour already "hardwired" in the functional form.• When r tends to zero, there is a repulsion effect (Pauli repulsion), and They also propose the following behaviour for the potential (our rephrasing): • Following the same Mie potential convention, let σ > 0 be the value satisfying V 1,2 (σ ) = 0 .Such a value is unique for all r > 0 and in addition dV /dr < 0 for all 0 < r ≤ σ (and it is called the "collision diameter"). (9) ∞ n=1 a n e − n s , Vol.:(0123456789)  16 , an expression of the form exp(r)/r was proposed for the repulsion.Then the proposed formula for SAAP is: where a 1 , . . ., a 4 < 0 and a 0 , a 5 > 0 .These six coefficients are then to be adjusted using the experimental data.
Remembering then the definition of a general Dirichlet series, it is then interesting to note that SAAP resembles a two-body potential of the form: New fits with continued fraction regression with asymptotic behaviour as r −6 Following Deiters and Sadus's approach of introducing problem domain information, we now propose to approximate V 1,2 (r) as: so, in this case, we would multiply the value of the observed values at any given r by r 6 to find truncated continued fraction approximations.Interestingly, for Xenon, Krypton, Argon and Neon, we obtained.

Neon
with σ ≈ 2.76803 and r min ≈ 3.08297 , with V 1,2 (r min ) ≈ −42.20354 .We highlight that all these formulas can be rearranged in the form V 1,2 (r) ≈ r −6 (a 0 + a 1 /(a 2 + n r )) that requires a single computation of an exponential (to the power of r only) and only have four free adjustable integer coefficients.We believe that these relatively simpler forms have the potential to lead towards more efficient molecular dynamics simulations.

SAAPx and our model for Helium
To fit the ab initio data from Helium, Deiters and Sadus proposed a modification of the SAAP potential and called it SAAPx.It needs an extra coefficient to be empirically fitted from the data ( a 6 ).Formally it is written as: 1 + r 6 . ( Clearly, SAAP x (r) , can be seen as just a generalisation of SAAP(r), so it is reasonable to state that Deiters and Sadus's proposal for these potentials is based on a functional form with seven adjustable parameters, with a 6 being ad hoc set to zero for all other noble gases that are not Helium.
In contrast, we continue with our investigation of the representational properties derived from our proposal of eq. ( 16).We will present two functions that we have found that fit the experimental data relatively well.with σ ≈ 2.64036 and r min ≈ 2.97924 , with V 1,2 (r min ) ≈ −11.01906.
We have found an approximation of the model in Eq. ( 23) as a simplified format as follows: with σ ≈ 2.65168 and r min ≈ 2.9572 , with V 1,2 (r min ) ≈ −11.03116.
Figure 1 shows the comparison of ab initio potential energy of He and approximation by Eq. ( 22) at interatomic separations in the repulsive region and Fig. 2 shows the comparison of ab initio, SAAP and approximation by Eq. (25) close to the attractive well.
Plots for all depth = 1 or depth = 2 (He) models We show the comparison of ab inito potential energies along with the corresponding models for He, Ne, Ar, Kr and Xe gases in Fig. 3.We have computed the relative error (RE) to assess the fitting of data points by respective models, and the results are summarised in Table 2 for a more accessible and concise presentation of our findings.

Discussion
It is perhaps proper to highlight again that the choice of a good representation governs the process of finding approximations of potentials and the many aspects involved in obtaining a good fit via a computer-based optimisation process.We thus consider that there is merit in continuing to investigate how to improve these fits, using these functional forms, perhaps with more powerful optimisation approaches than the ones we have used so far.For instance, in regards to Eq. ( 22), we have also found another similar equation with different values for the integers of the associated Dirichlet polynomial: with σ ≈ 2.6407 and r min ≈ 2.9706 , with V 1,2 (r min ) ≈ −11.01307.
Figure 4 shows the comparison of the approximations of Eqs. ( 22), ( 24) and ( 25), in the range of the repulsive region where the potential is positive.We can see the effect of the introduction of problem-domain knowledge.In the case of Eq. ( 24), the depth = 2 truncated continued fraction now has an asymptotic behaviour which is very different from data-driven generated equations Eqs. ( 24) and (25).However, in this range of values of r, for which ab initio data was used to fit parameters, the approximations given by the Dirichlet polynomials were very good.In fact, Eqs. ( 22) and (25) were in some practical sense "easier" to fit than the depth = 2 Eq. ( 24).We will return to this issue later when we discuss the optimisation lessons learned in the process.We should also note that in Fig. 2, we have plotted the results of four equations, and the results of SAAP (which, surprisingly, seems to be even better than the Helium-ad hoc potential SAAPx in this region, see Fig. 3 of Ref. 3 ), so it is clear that, near the minimum, these are also good approximations.
We show the comparison of the ab initio potential energy and approximation of the models for all gases in the range of r = {0.15,• • • , 0.4} in Fig. 5 in log scale.

Conclusions
This study emphasises the challenges in deriving analytical potentials from experimental data, especially when using machine learning approaches.This challenge is particularly significant in the context of accurately modelling two-body interaction potentials without making excessive assumptions.Problem-domain knowledge about asymptotic behaviour, together with a novel representation inspired by a Dirichlet series, has been an effective combined approach.
For the area of symbolic regression, the paper underscores the importance of pursuing model-independent mathematical methods that can learn the specific functional form of two-body interaction potentials directly from data.Such approaches are critical for improving data-driven model building and could be used as benchmarks for symbolic regression solvers.
The study leverages a variety of data sources, including ab initio data for noble gases such as Xenon (Xe), Krypton (Kr), Argon (Ar), Neon (Ne), and Helium (He).These sources also include publications by J"ager, Hellmann, Bich, and Vogel, as well as the comprehensive study by Deiters and Sadus in 2019.These data sources are invaluable for developing and testing new data-driven methods.
The research problem of deriving accurate analytical forms of interatomic potentials from data remains open and continues to be a topic of ongoing investigation.This work represents a step in that direction and highlights the need for further research in this area.The approach based on continued fraction regression seems promising as iteratively increasing depth will deliver increased fitting performance 13 .However, we have illustrated in this study how a depth = 1 truncated continued fraction with integer coefficients is already a good approximation for this case and that the final model requires a single exponent computation.
Our paper suggests that future research may inspire novel data-driven methods, potentially improving the approximation of two and three-body potentials using continued fraction regression, including the use of dynamic depth strategies 17 .It also underscores the importance of addressing the computational challenges associated with these methods, especially when dealing with small datasets and highly non-linear target functions.19), Kr (red solid squares) with Eq. ( 18), and Xe (blue solid diamonds) with Eq. ( 17) at interatomic separations close to the attractive well.

Figure 1 .Figure 2 .
Figure 1.Comparison of the ab initio potential energy of He (green solid stars) with our model's (the solid green line with Eq. (22)) calculations at interatomic separations in the repulsive region.

Figure 4 .Figure 5 .
Figure 4. Comparison of the ab initio potential energy of He (black solid circles) with our model's (the solid red for Eq.(22), solid blue for Eq.(24) and dashed magenta for Eq.(25) calculations st interatomic separations in the repulsive region where the potential takes positive values.

Table 1 .
An example of a hypothetical function f(r) to be learned from the existing data with r a positive integer 1 ≤ r ≤ 9.The unknown function in this case relates to Moser's circle problem, and further details are given in Supplementary Material's Appendix 3.This example is used here to illustrate the discussion on symbolic regression approaches.