Fast calculation of scattering patterns using hypergeometric function algorithms

The scattering of light, X-rays, electrons or neutrons by matter is used widespread for structural characterization from atomic to macroscopic length scales. With the advent of high-brilliance beam sources and the development fast, large area pixelated detectors, scattering patterns are now acquired at unprecedented frame rates and frame sizes. The slow analysis of these scattering patterns has evolved into a severe bottleneck retarding scientific insight. Here we introduce an algorithm based on the use of hypergeometric functions providing gains in computational speed of up to 105 compared to present numerical integration algorithms. Hypergeometric functions provide analytical descriptions of geometrical shapes, can be rapidly computed as series and asymptotic expansions, and can be efficiently implemented in GPUs. The algorithm provides the necessary computational speed to calculate scattering patterns on timescales required for real-time experiment feedback, the analysis of large volumes of scattering data, and for the generation of training data sets for machine learning.

www.nature.com/scientificreports/ To reduce computation time, whenever possible, 2D data sets are azimuthally averaged to obtain 1D data sets with the number N of data points reduced to √ N . Yet, for the analysis of large data sets or for the generation of synthetic data sets to train neural networks, current algorithms are prohibitively slow 12 . In addition, the analysis of the full 2D-scattering pattern is required for a large class of synthetic and biological materials consisting of anisotropic structures or thin films. The fast computation of scattering data for all these cases is beyond the scope of current algorithms and software packages.
Here, we present an algorithm that is based on the use of hypergeometric functions to rapidly compute 1Dand 2D-scattering data. Hypergeometric functions provide a simple mathematical description of geometrical objects, have analytical Fourier transforms, and can be rapidly computed via series and asymptotic expansions with recursive coefficients. Compared to numerical integration schemes we observe gains in computation speed of > 10 5 . The algorithm can be efficiently parallelized and implemented into GPUs for further acceleration. This enables the computation of 2D scattering patterns at > 1 fps even for current 4k pixel detectors.

Results
Calculation of scattering patterns. The calculation of scattering patterns on pixel detectors requires the computation of the scattered intensity I(q ij ) for each pixel. Modern 4k detectors have more than 16 million pixels. Figure 1 schematically illustrates an array of pixels (i, j) , for each of which the scattering intensity needs to be computed. Mathematically, the position (x ij , y ij ) of each pixel corresponds to certain components of the scattering vector q ij , which is related to experimental parameters including the wavelength of the incoming beam, the sample-detector distance d det , and the angle ϑ ij enclosed by the scattered beam and the incoming beam. The scattering vector q ij is given by The approximation on the right-hand side is the basis for small-angle scattering experiments. For materials consisting of or modelled with assemblies of geometrical objects, the calculation of the scattered intensity I(q ij ) proceeds via the calculation of their Fourier transform or scattering amplitude F(q ij ), the formfactor P(q ij ) , and the lattice factor Z(q ij ) describing their spatial assembly. F(q ij ) is obtained by integrating the density of the object, ρ(r), multiplied with the phase factor e iq ij r over the volume V of the object Subsequently, the formfactor P(q ij ) is obtained as the absolute square of the scattering amplitude F(q ij ) . In a third step, the formfactor is averaged over the size distribution h(R) and the orientational distribution h(ϑ, ϕ) of the ensemble of objects If the objects are characterized by more than one characteristic dimension, e.g. for anisometric objects such as ellipsoids, cylinders, or parallelepipeds, the size distribution for each additional dimension needs to be taken into account. For common particle shapes, the number of required numerical integrations to calculate P(q ij ) increases from spheres (1), biaxial ellipsoids (3), cubes (3), cylinders (3), disks (3), super-ellipsoids (4), triaxial ellipsoids (5), parallelepipeds (5), octahedrons (5) to super-balls (5)(6)(7). This makes calculations of scattering patterns for modern large area pixel detectors prohibitively slow. In addition, the three-dimensional assembly  www.nature.com/scientificreports/ of these objects represented by the lattice factor Z(q ij ) needs to be included, at least at the level of the decoupling approximation 13,14 .
where b is a sample specific contrast factor, ρ N the particle number density, and G(q) the Debye-Waller factor, requiring the calculation of a large number of Bragg-peak positions. The main challenges for the calculation and analysis of large 2D scattering patterns and derived 1D-data sets are therefore (i) the large number of numerical integrations to compute P(q ij ) , and (ii) the large computational effort to calculate Z(q ij ) . The aim of the presented algorithm is.
1. to develop a method to solve as many integrations as possible analytically to perform the computation of I(q ij ) on simple functions, and 2. to factorize P(q ij ) and Z(q ij ) into q-independent parts with coefficients c n which are the same for every pixel, and a remaining q-dependent part f n (q) which is simple and can be rapidly computed on every pixel by multiplication with the pre-calculated set c n , thereby in addition allowing an efficient implementation in parallel computing algorithms and GPUs.
We show that the use of hypergeometric functions provides the needed methodology. Hypergeometric functions are usually used to compute special functions and have so far only been considered for the calculation of the Fourier transform of specific polymer core/shell structures, because they provided a closed analytical solution for shells with algebraic density profiles 15 . The benefit in the calculation and use of their q-independent series coefficients for the rapid computation of scattering patterns has not yet been considered.
Algorithm. The corresponding algorithm is shown in Fig. 2. With the input parameters that describe the objects and their spatial assembly, the coefficients needed for the reciprocal space vectors q * hkl together with the q-independent coefficients c n are calculated. The pixel intensities I(q ij ) are calculated within the (i,j)-loop over all detector pixel rows and columns as in Fig. 1. For 1D-data the i-loop to calculate I(q i ) is divided into j blocks.
The algorithm requires the pre-calculation of the coefficients c n of the series and the asymptotic expansions, which is not necessary for numerical integration schemes. We find the computational cost for the pre-calculations to be very small. It is overcompensated by an acceleration factor of up to 5 . 10 5 to compute the scattering pattern in the subsequent (i,j)-loop.
Algorithm to compute 2D-scattering patterns. Using the provided input parameters, the algorithm pre-calculates the coefficients for the reciprocal space vectors q * hkl and the coefficients c n . Within the (i, j)-loop for all pixels the scattering vector q ij , and from this the scattering amplitude F(q ij ) and the formfactor P(q ij ) are calculated using the pre-calculated coefficients c n . Subsequently, the pre-calculated coefficients for the reciprocal space vectors are used to calculate the lattice factor Z(q ij ) to finally obtain the scattered intensity I(q ij ) . Since the calculations in the (i, j)-loop are mutually independent, they can be efficiently parallelized employing GPUs. The resulting scattering pattern can be convoluted with the beam or point spread function (PSF) to obtain J(q ij ) for direct comparison to experiments. For 1D-data the i-loop to calculate I(q i ) is divided into j blocks.
R is the size, r the radial distance, and d the dimensionality of the object.
Equation (5) applies to objects that are rotationally symmetric in three dimensions ( d = 3 , spheres), in two dimensions ( d = 2 , cylinders) and formally in one dimension ( d = 1 , platelets, lamellae). By rescaling, these cases can be extended to include biaxial and triaxial ellipsoids, octahedrons and superballs ( d = 3) , superellipsoids, dumbbells and lenses ( d = 2) , cubes and parallelepipeds ( d = 1, 2, 3) as well as polymer chains and flexible tubes ( d = 1 − 2) . Thus, Eq. (5) applies to large class of objects that are relevant for the modeling of materials structures. The integral can be extended over the volume of compound objects, such that a large class of further, more complex object structures can be analytically described.
Hypergeometric functions can be calculated via series and asymptotic expansions 16,17 . A detailed derivation of the scattering amplitudes and formfactors is outlined in the Supporting Information (SI Sect. 1). We here provide the two main results: (a) For small arguments z, the hypergeometric functions can be computed via a series expansion, which for the case of 0 F 1 (z) is given by (b) For large arguments z, the hypergeometric functions can be computed via an asymptotic expansion with ν = − d+2 2 + 1 2 . For the required numerical accuracy, it is sufficient to use just the first two terms of the expansion (7). When choosing common log-normal distributions, the averages over the terms of the series expansions z 2n (Eq. 6) and asymptotic expansions cos(z) z k (Eq. 7) are simple linear and trigonometric functions. These are summarized in the Supporting Information (SI Sect. 2). The series expansion Eq. (6) and the asymptotic expansion Eq. (7) overlap and are the basis for the calculation of the q-independent coefficients c n .
Coefficients and formfactors in the isotropic case. We here illustrate the derivation of the coefficients c n with the simple example of a spherical object. In the Supporting Information we provide the coefficients for a comprehensive collection of geometrical objects including biaxial ellipsoids, triaxial ellipsoids, cylinders with circular and elliptical cross-sections, disks, cubes, parallelepipeds, superballs, super-ellipsoids, dumbbells, lenses and excluded volume polymer chains (SI Sect. 4) for a broad range of applications.
For spheres ( d = 3 ), the formfactor is obtained from P(q) = F 2 (q) as In the Supporting Information (Eq. S.1.3.2) we show that for all dimensions d the double sum can be recast into a single sum which accelerates the computation significantly The average over the size distribution, characterized by a polydispersity parameter z, yields the series www.nature.com/scientificreports/ The coefficients c n can be efficiently calculated via recursion relations for the powers, factorials, Gamma functions, Pochhammer factorials and binomial coefficients, as summarized in the Supporting Information (SI Sect. 10). Because of the recursion relations, the calculation of the coefficients is fast, can be encoded with few lines of source code, and without the necessity to compute special functions such as Bessel and Gamma functions or Pochhammer factorials.
The series expansion (Eq. (6)) converges for values of qR < 1 − 10 (Regime I), depending on the polydispersity z. For qR > 1 (Regime II) we use the asymptotic expansion (Eq. (7)) for spheres ( d = 3 ) which is The averages of the trigonometric functions are given in terms of simple cosine and arctan functions and are all summarized in the Supporting Information (SI Sect. 2).
For qR ≫ 1 (Regime III) we only need to use the leading cosine term of the asymptotic expansion to derive the non-oscillating part of the asymptote, which is identical to the Porod-q −4 -asymptote These three regimes are also indicated in the schematic algorithm description (RI, RII, RIII) in Fig. 2. The consideration of size distributions has two advantages: (i) it provides a realistic description of materials structures, and (ii) at the same time mathematically leads to non-oscillating single-term asymptotes of the formfactor P q in the hiqh-q range (Regime III regime), which can represent a significant part of the scattering patterns recorded by large area detectors. The calculation of the Porod asymptotes of anisometric objects [18][19][20] has been rarely considered in literature, but is important for the algorithm and therefore outlined in the Supporting Information (SI Sect. 5).
The computed scattering patterns in Fig. 3 show the seamless overlap of the series expansions (Regime I), asymptotic expansions (Regime II) and the Porod-Regime (Regime III) such that the formfactors can be computed rapidly over the complete q-range. The q-values for the I-II and II-III transitions can be pre-calculated during the calculations of the series cofficients c n (see Supporting Information (SI Sect. 3).
Coefficients and formfactors in the anisotropic case. A further advantage of the rapid calculations using series and asymptotic expansions is in their application to compute the scattering patterns of oriented assemblies of isometric and anisometric objects. This requires orientational averaging, usually performed by two additional numerical integrations which is very time-consuming and therefore rarely done.
For the oriented case we need to specify the main axis L of the object with respect to a director D. The particles may have a deviation angle δ with respect to the director D, with a distribution function h(δ) . We obtain simple expressions for the phases qL, which can be directly implemented in the derived series and asymptotic expansions for rapid calculations. All details of the derivation are provided in the Supporting Information (SI Sect. 7).
As an illustration, we consider the case of an ensemble of cylindrical objects oriented along the x-axis with a deviation angle δ . For the phase qL we obtain the expression where χ is the azimuthal angle on a cone with opening angle δ . For the series and asymptotic expansions, we need the averaged phase term which is obtained as with The integrals H 2l,2n−2l need to be integrated numerically over the orientational distribution function h(δ) , but are q-independent and therefore part of the coefficients c n . The integral over 3cos 2 (δ)−1 2 directly provides the orientational order parameter S.
The phase term can be inserted into the cylinder formfactor series expansion as For rapid calculations the q-independent coefficients a n and b l,n are pre-calculated, such that the series can be quickly evaluated for each pixel ( q x , q y ) Ordered and periodic structures. Most synthetic and biological materials consist of particles or objects that are assembled with varying degrees of order. Therefore, it is important to include crystallographic calculations in the algorithm. We will show that also these contributions can be factorized into a q-independent part that can pre-calculated and provided to the (i,j)-loop, where it is multiplied with a q-dependent part. For 1D-, 2D-and 3D-assemblies the calculations require to evaluate single, double or triple sums over all non-zero (hkl) Miller indices, such that pre-calculation schemes are very effective.
For the calculations, we need to specify the (uvw)-direction within the unit cell, which is parallel to the probe beam direction, i.e.r uvw ||n . The unit cell is defined by the three edge lengths and the enclosed angles (a,b,c; α, β, γ). Then the direction vector is given as r uvw = ua A + vb A + wc A in the unit cell coordinate system A 21 . We need   (a A , b A , c A ) to the orthonormal Carthesian lab-base coordinate system E to obtain the set of base vectors (a E , b E , c E ) using a transformation matrix E. These base vectors are then rotated to align with the (uvw)-direction parallel to the probe beam, i.e. r uvw ||n , using a rotation matrix R to obtain the rotated base vectors (a Er , b Er , c Er ) . These vectors are subsequently transformed to the reciprocal space vectors (a * , b * , c * ) using the metric matrix G . These three transformations can be compactly expressed as such that in the (i,j)-loop and the (h,k,l)-sums the respective reciprocal space vectors q * hkl can be calculated from the precalculated reciprocal space vectors as q * hkl = ha * + kb * + lc * . The matrices M, R, G are derived and provided in the Supporting Information (SI Sect. 8). With the vector q * hkl and predefined peak shape parameters, the peak shape function L q, q * hkl can be calculated. Then the lattice factor of Eq. (4) can be computed by summing over all (hkl) sets of Miller indices where f hkl = 0 For macroscopically isotropic materials, where Debye-Scherrer rings are observed (Fig. 5c), the azimuthal average of Z q can be obtained in closed analytical form, with q-independent coefficients that can be precalculated as indicated in Fig. 2 and used for the rapid calculation of 1D-data 22 .

Grazing incidence scattering (GISAS).
Grazing-incidence small-angle scattering and diffraction (GISAS, GIXD) are scattering techniques using X-rays or neutrons to study nanostructured surfaces and thin films 23,24 . To extend the algorithm to include this important class of scattering experiments requires to include the Fresnel coefficients of transmission and reflection from the film surface and interfaces.
As an example, we consider a thin film containing an assembly of objects, whose scattered intensity I(q ij ) is given by Eq. (4). The scattered intensity recorded in a GISAS experiment can be calculated within the framework of the Distorted Wave Born Approximation (DWBA) 23 . Therefore, the GISAS scattering intensity is computed as a sum over the four DWBA-terms representing the scattering/reflection events of the object assembly and the film/substrate interface www.nature.com/scientificreports/ where the q n,ij are the scattering vectors of the four scattering/reflection events, and the T i,f and R i,f are the Fresnel transmission and reflection coefficients, respectively. In the Supporting Information (SI Sect. 9) we extend the calculation to include incident plane specular and diffuse scattering. In effect, we multiply the already calculated scattered intensity I q ij with the Fresnel transmission and reflection coefficients, of which T i and R i are q-independent and can be precalculated together with the other coefficients.
To demonstrate the potential of the method, we show in Fig. 5D the calculated GISAS-pattern of polydisperse spheres assembled in a BCC-lattice within a thin film on a substrate. As typical features we observe the Yoneda peak at the critical scattering vector q z,c = 0.23 nm −1 , and the incident plane diffuse and specular reflection. The peaks are calculated assuming the beam direction to be parallel to the (001)-direction resulting in a (hk0)-fiber pattern of the BCC-lattice, which is typically observed e.g. for nanoparticle assemblies 25 .

Discussion
For benchmarking we compare CPU computing times between the series algorithm and conventional numerical integration schemes. We computed scattering patterns for the most common geometrical particle shapes that are used for modeling materials structures: spheres, biaxial ellipsoids, triaxial ellipsoids, cylinders, disks, and cubes. For all calculations a typical q-range was chosen comprising the low-q Guinier and the high-q Porod regimes. Moderate axial ratios from 3 to 8 for anisometric particles, and moderate polydispersities of σ = 0.1 were chosen for a fair comparison of the methods. Whenever possible, trigonometric functions in the integrands of the numerical integrations were substituted by non-oscillatory linear functions to gain computational speed. All details of the computations are summarized in the Supporting Information (SI Sect. 11.1). The calculations were performed for a range of data points of 50-10 5 . The smaller numbers are typical for the analysis of 1D-data sets, and the larger numbers typical for 2D-data sets for pixel detectors. The calculations were done on a single CPU core. Figure 6 shows the CPU-times as a function of the number of data points for different geometrical objects and the two calculation schemes. We find that even in the case of simple spherical particles for a small number of data points the series expansions are faster by a factor of 4, and therefore already have a computational benefit. For a large number of data points, e.g. for N = 10 5 in the asymptotic t ∼ N 1 -region, the series expansions for spheres (19)  www.nature.com/scientificreports/ are faster by a factor of 40 (see SI Sect. 11). For anisometric objects, the series expansion algorithm is faster by factor of up to 7 . 10 1 for biaxial ellipsoids, 3 . 10 2 for triaxial ellipsoids, 2 . 10 5 for cylinders, and 5 . 10 5 for disks and cubes. For most common objects the series algorithm allows to compute 10 6 data points in < 500 ms on a single core CPU. Only for the case of monodisperse spheres with a simple analytical expression of the formfactor, the series algorithm is slightly slower, by ca. 20% (SI Sect. 11.3). The expansions and numerical integrations are performed with a relative precision of 10 −4 . In the Supporting Information (SI Sect. 11.2) we show that the expansions converge fast, such that reaching higher precisions require much smaller increases in CPU time compared to numerical integrations. We also considered the most challenging case of polydisperse superballs. For superballs with edges of equal lengths ( a = b = c ), where five numerical integrations are necessary, CPU times of ca. 5 min. for 400 data points have been reported with optimized numerical integration routines 26 . We considered the more general case of superballs with unequal edges ( a = b = c ) as shown in Fig. 6. Here, the pre-calculation of the coefficients is the rate-limiting step requiring 2.5 s, which already is > 100-times faster compared to numerical integration. If we extrapolate the reported CPU time to 10 5 data points, the gain in computational speed is > 10 7 .
The use of centro-symmetry of I(q ij ) (Friedel's law) and further symmetries of particles and lattices leads to a further at least two-fold reduction of CPU-time, enabling already > 1 fps calculation of 16 million data point 4k pixel detector scattering patterns with a single core CPUs as motivated in the Introductory Section.
The algorithm can be efficiently implemented into GPUs, because the calculations of the pixel scattering intensities I(q ij ) are mutually independent. We demonstrate that already simple consumer graphic cards can accelerate the algorithm by a further factor of > 50, enabling sub-second 1D-and 2D-fitting of very large detector array data as demonstrated in the Supporting Information (SI Sect. 12). As applications we demonstrate in the Supporting Information the simulation of large 2D small-angle X-ray (SAXS), small-angle neutron (SANS) and small-angle light scattering patterns, as well as selected area electron diffraction (SAED) patterns with 2k-or 4k-detectors (SI Sect. 12). We furthermore show GPU-accelerated 2D-fitting, and examples of simulated data sets for the training of neural networks.

Conclusions
We demonstrated an algorithm based on the use of hypergeometric functions that computes 1D-scattering data and 2D-scattering patterns of assemblies of geometrical objects up to a factor of > 10 5 faster than conventional numerical integration schemes. This acceleration is possible, because hypergeometric functions can be efficiently computed via series and asymptotic expansions, the expansion coefficients can be rapidly calculated via recursion relations and are q-independent. They are therefore the same for every pixel and can be pre-calculated and provided to the (i,j)-pixel calculation loop as described in the algorithm in Fig. 2. Over large q-ranges, only one For the series expansions we observe a low-N plateau due to the pre-calculation time for the q-independent coefficients. This computational cost is overcompensated by the subsequent much faster calculation of the scattering patterns. For large number of data points, we observe a gain of up to 5 . 10 5 . For the most common geometrical objects, the computation of one million data points in < 500 ms is possible on just a single core CPU. www.nature.com/scientificreports/ or two terms of the expansion are necessary to compute the scattering intensities with sufficient accuracy. The algorithm enables the fast calculation of scattering patterns of simple and complex objects with defined spatial and orientational distributions. Since the computations of the pixel scattering intensities are mutually independent, the calculation can be efficiently implemented into parallel algorithms for GPUs for further significant acceleration. The algorithm enables rapid calculation of large area 2D-scattering patterns and 1D-scattering data enabling high-throughput fitting of large 1D-and 2D-data sets, on-the-fly data analysis for steering scattering experiments, and fast training of neural networks. It thereby helps addressing the data analysis bottleneck for widespread application in the structural analysis of synthetic and biological materials using X-ray, neutron, light and electron scattering and diffraction experiments. The significant saving in computation time of factors of 10 5 -10 7 furthermore considerably reduces computer energy consumption relevant for green IT.

Methods
All mathematical and computational methods are described in the Supporting Information. We provide Mathematica and C++ source code for verification and description of the methods. We further provide the full C++ source code and a compiled executable standalone software. Under https:// github. com/ neutr on-simlab/ Cryst alSca tter.

Data availability
All code and results presented in this paper are available open-source and open-access in the associated GitHub repository under https:// github. com/ neutr on-simlab/ Cryst alSca tter.