Neural-network quantum states for ultra-cold Fermi gases

Ultra-cold Fermi gases display diverse quantum mechanical properties, including the transition from a fermionic superfluid BCS state to a bosonic superfluid BEC state, which can be probed experimentally with high precision. However, the theoretical description of these properties is challenging due to the onset of strong pairing correlations and the non-perturbative nature of the interaction among the constituent particles. This work introduces a novel Pfaffian-Jastrow neural-network quantum state that includes backflow transformation based on message-passing architecture to efficiently encode pairing, and other quantum mechanical correlations. Our approach offers substantial improvements over comparable ans\"atze constructed within the Slater-Jastrow framework and outperforms state-of-the-art diffusion Monte Carlo methods, as indicated by our lower ground-state energies. We observe the emergence of strong pairing correlations through the opposite-spin pair distribution functions. Moreover, we demonstrate that transfer learning stabilizes and accelerates the training of the neural-network wave function, enabling the exploration of the BCS-BEC crossover region near unitarity. Our findings suggest that neural-network quantum states provide a promising strategy for studying ultra-cold Fermi gases.


I. INTRODUCTION
The study of ultra-cold Fermi gases has received considerable experimental and theoretical attention in recent years due to their unique properties and potential applications in fields ranging from condensed matter physics to astrophysics.These systems can be created and manipulated in the laboratory with high precision, providing a versatile platform for investigating a wide variety of phenomena.By tuning the s-wave scattering length a via external magnetic fields near a Feshbach resonance, one can smoothly crossover from a fermionic superfluid BCS state (a < 0) of long-range Cooper pairs to a bosonic superfluid BEC state (a > 0) of tightly-bound, repulsive dimers.Given their diluteness, the behavior of these systems is mainly governed by a and the effective range of the potential r e , with natural units provided by the Fermi momentum k F and the Fermi gas energy per particle in the thermodynamic limit E F G = 3 5 2 2m k 2 F (see Ref. [1] and references therein).
The region between the BCS and BEC states, known as the "unitary limit," is particularly interesting as a diverges and r e approaches zero.The unitary Fermi gas (UFG) is a strongly-interacting system that exhibits surprisingly stable superfluid behavior.Studying the BCS-BEC crossover near the unitary limit can reveal critical aspects of the underlying mechanism behind superfluidity in fermionic matter.The UFG is also universal, meaning its properties are independent of the details of the two-body potential.This universality allows for robust comparisons and predictions between seemingly disparate quantum systems.For instance, the UFG is relevant for neutron stars, as they provide a means to study superfluid low-density neutron matter [2,3], whose prop-erties are crucial for the phenomenology of glitches [4] and the cooling of these stars via neutrino emission [5][6][7].
The onset of strong pairing correlations and the nonperturbative nature of the interaction makes the theoretical study of these systems particularly challenging for quantum many-body methods.Among them, quantum Monte Carlo (QMC) has proven to be exceptionally efficient in calculating various properties with high accuracy, including the energy [8], pairing gap [9], and other quantities related to the so-called contact parameter [10].Diffusion Monte Carlo (DMC), in particular, is an accurate tool for calculating the properties of quantum many-body systems [11].The fixed-node approximation typically employed in DMC calculations to control the fermion-sign problem provides a rigorous upper bound to the ground-state energy that agrees well with other methods and experiments [12,13].Moreover, unlike the Auxiliary Field Quantum Monte Carlo method, DMC can handle broad classes of local interactions, which provides exact results that are sign-problem free but is limited only to unpolarized systems with a purely attractive interaction [8].However, the fixed-node approximation limits the accuracy of DMC energies, which induces a residual dependence on the starting variational wave function.The latter has a critical role in DMC calculations of expectation values of operators that do not commute with the Hamiltonian, such as spatial and momentum distributions.The analytical form of the variational ansatz is usually tailored to specific problems of interest and biased by the physical intuition of the researchers.
In this work, we overcome these limitations by performing variational Monte Carlo (VMC) calculations of ultra-cold Fermi gases with neural-network quantum states (NQS) [14] that incorporate only the most essential symmetries and boundary conditions.After their initial application to quantum-chemistry problems [15,16], continuous-space NQS have been successfully employed to study quantum many-body systems in the presence of spatial periodicities, such as interacting quantum gases of bosons [17], the homogeneous electron gas [18,19], and dilute neutron matter [20].Recent works have also used NQS to solve the nuclear Schrödinger equation in both real space [21][22][23][24][25] and the occupation number formalism [26].When dealing with fermions, the antisymmetry is usually enforced using generalized Slater determinants, the expressivity of which can be augmented with either backflow transformations [27] or by adding "hidden" degrees of freedom [28].
Strong pairing correlations in fermionic systems motivate adopting an antisymmetrized wave function constructed from pairing orbitals rather than single-particle orbitals.One such construction, often called the geminal wave function [29,30], considers determinants of spin-singlet pairs, while other more general wave functions based on the Pfaffian [31][32][33][34], consider both singlet and triplet contributions.Pfaffian wave functions combined with neural-network Jastrow correlators [35,36] have successfully modeled lattice fermions, even revealing the existence of a quantum spin liquid phase in the J1-J2 models on two-dimensional lattices.
We propose a novel NQS that extends the conventional Pfaffian-Jastrow [32] ansatz by incorporating neural backflow transformations into a fully trainable pairing orbital.The backflow transformations are generated by a message-passing architecture recently introduced to model the homogeneous electron gas [37].In addition to being a significant departure from generalized Slater determinants, our Pfaffian-Jastrow NQS naturally encodes pairing in the singlet and triplet channels, without stipulating a particular form for the pairing orbital.In view of this, it is broadly applicable to other strongly-interacting systems with the same symmetries and boundary conditions.We demonstrate the representative power of our NQS by computing ground-state properties of ultra-cold Fermi gases in the BCS-BEC crossover.Our Pfaffian-Jastrow NQS outperforms Slater-Jastrow NQS by a large margin, even when generalized backflow transformations are included in the latter.Most notably, we find lower energies than those obtained with state-of-the-art DMC methods, which start from highly-accurate BCS-like trial wave functions.
The rest of the paper is organized as follows.In Section II, we introduce the Hamiltonian used to model ultra-cold atomic gases near the unitary limit and the many-body techniques used to solve the Schrödinger equation.In Section III, we compare the Pfaffian-Jastrow NQS with other NQS ansätze and state-of-the-art DMC results.Finally, in Section IV, we draw our conclusion and provide future perspectives of this work.

II. METHODS
As customary in QMC approaches, we simulate the infinite system using a finite number of fermions N in a cubic simulation cell with side length L, equipped with periodic boundary conditions (PBCs) in all d = 3 spatial dimensions.We use r i ∈ R d and s i ∈ {↑, ↓} to denote the positions and spin projections on the z-axis of the i-th particle, and the length L can be determined from the uniform density of the system N/L 3 = k 3 F /(3π 2 ).The dynamics of the unpolarized gas is governed by the non-relativistic Hamiltonian where the attractive two-body interaction acts only between opposite-spin pairs, making the interaction mainly in s-wave for small values of r e .In the above equations, ∇ 2 i is the Laplacian with respect to r i and r ij = r i − r j is the Euclidean distance between particles i and j.The Pöschl-Teller interaction potential of Eq. ( 2) provides an analytic solution of the twobody problem and has been employed in several previous QMC calculations [10,[38][39][40].The parameters v 0 and µ tune the scattering length a and effective range r e , respectively.In the unitary limit |a| → ∞, the zero-energy ground state between two particles corresponds to v 0 = 1 and r e = 2/µ.In order to analyze the crossover between the BCS and BEC phases, we will use different combinations of v 0 and µ that correspond to the same effective range.In addition, we will consider various values of µ with fixed v 0 = 1 to extrapolate the zero effective range behavior at unitarity.

A. Neural-network quantum states
We solve the Schrödinger equation associated with the Hamiltonian of Eq. (1) using two different families of NQS.All ansätze have the general form where the Jastrow correlator J(X) is symmetric under particle exchange and Φ(X) is antisymmetric.Here, we have introduced X = {x 1 , . . ., x N }, with x i = (r i , s i ), to represent the set of all single-particle positions and spins compactly.
In addition to the antisymmetry of fermionic wave functions, the periodic boundary conditions, and the translational symmetry (which will be discussed in Sec.II A 3), we also enforce the discrete parity and timereversal symmetries as prescribed in Ref. [24].More specifically, we carry out the VMC calculations for the unpolarized gas using Ψ P T (R, S) given by where n = N/2 and we have used the notation R = {r 1 , . . ., r N } and S = {s 1 , . . ., s N } for the set of all positions and spins, respectively.Enforcing these symmetries has been shown to accelerate the convergence of groundstate energies for both atomic nuclei [24] and dilute neutron matter [20].

Pfaffian-Jastrow
The antisymmetric part of the wave function employed in QMC studies of ultra-cold Fermi gases is typically constructed as an antisymmetrized product of BCS spinsinglet pairs [2,38,39,41,42].It goes by a variety of names, such as the geminal wave function [29,30], the singlet pairing wave function [32], and the (numberprojected) BCS wave function [42], just to name a few.Although geminal wave functions have demonstrated significant improvements over single-determinant wave functions of single-particle orbitals, the energy gains are typically smaller for partially spin-polarized systems [30], as contributions from the spin-triplet channel are missing.This naturally leads to the singlet-triplet-unpaired (STU) Pfaffian wave function [31,32], in which the pairing orbitals are explicitly decomposed into singlet and triplet channels.Then, the STU ansatz is expressed as the Pfaffian of a block matrix, with the singlet, triplet, and unpaired contributions partitioned into separate blocks.When the triplet blocks are zero, the STU wave function reduces to the geminal wave function.
Both the geminal and the STU wave functions rely on fixing the spin ordering of the interacting fermions.Consequently, they are not amenable to potentials that exchange spin, such as those used to model the interaction among nucleons [43].In neutron-matter calculations, for instance, the pairing orbital for the Pfaffian wave function can be taken as a product of a radial part and a spin-singlet part [33,34].The spin-triplet pairing has so far been neglected in neutron-matter calculations, but they can be treated similarly without requiring spin ordering.
To address the limitations of the works mentioned above, we take the most general form of the Pfaffian wave function [31,32] as the antisymmetric part of our ansatz where we assume the unpolarized case for this initial investigation.We do not keep the spins fixed, nor do we mandate a specific form for the pairing orbital φ(x i , x j ).Instead, we capitalize on the universal approximation property of feed-forward neural networks (FNN) [44] by defining the pairing orbital as where ν is a dense FNN.The above expression ensures that the Pfaffian is mathematically well-defined, as the matrix is skew-symmetric by construction φ(x i , x j ) = −φ(x j , x i ).Since ν takes all the degrees of freedom of a given pair of particles as input, including the spins, our pairing orbital has the capacity to discover the spinsinglet and spin-triplet correlations on its own.This design leaves our Pfaffian-Jastrow (PJ) ansatz agnostic to any particular form of the interaction and systematically improvable by simply increasing the size of ν.The input dimension of ν only depends on the spatial dimension d and not the total number of particles N , leading to an exceptionally scalable ansatz.Given the generality of our formulation, the Pfaffian ansatz calculation cannot be reduced to a determinant of singlet pairing orbitals as in the geminal wave function.Thus, the efficient computation of the Pfaffian is crucial to the scalability of our approach.To this aim, we implement the Pfaffian computation according to Ref. [45].
We further improve the nodal structure of our PJ ansatz through backflow (BF) transformations [46].To our knowledge, this is the first time neural BF transformations have been used in a Pfaffian wave function, although they have demonstrated their superiority over traditional BF transformations within the Slater-Jastrow formalism in numerous applications [15,16,27].We replace the original single-particle coordinates x i by new ones xi (X), such that correlations generated by the presence of all particles are incorporated into the pairing orbital.To ensure that the Pfaffian remains antisymmetric, the backflow transformation must be permutation equivariant with respect to the original x i , i.e. xi depends on x i and is invariant with respect to the set {x j } j =i .In Sec.II A 3, we discuss in detail how the backflow correlations are encoded via a permutationequivariant message-passing neural network.All calculations labeled as PJ-BF assume that we apply the transformation ν(x i , x j ) → ν( xi , xj ) to the FNN in Eq. ( 7).

Slater-Jastrow
For comparison, we will report results obtained using a Slater-Jastrow (SJ) ansatz, which amounts to taking the antisymmetric part of the wave function to be a Slater determinant of single-particle states In the fixed-node approximation, the single-particle states are the products of spin eigenstates with definite spin projection on the z-axis s α and plane wave (PW) orbitals with discrete momenta where χ α (s i ) = δ sα,si .Here, α = (k α , s α ) denotes the quantum numbers characterizing the state.We will label Slater-Jastrow NQS calculations using above plane wave orbitals as SJ-PW.As in the Pfaffian case, we improve the nodal structure of the above Slater determinant using backflow transformations generated by the message-passing neural network discussed in Sec.II A 3. We modify the spatial coordinates of Eq. ( 9) as where the complex backflow displacement u i (X) ∈ C d allows for changes in both the phases and amplitudes of the spatial part of the single-particle states.We also map the single-particle spinors onto the Bloch sphere as where θ i (X) ∈ R is the polar angle on the sphere.Both u i (X) and θ i (X) are permutation-equivariant functions of the original coordinates x i , the functional form of which will be discussed in Sec.II A 3. Appendix A includes a detailed explanation of Eq. (11).Slater-Jastrow NQS calculations using the backflow orbitals will be labeled as SJ-BF.

Message-Passing Neural Network
Implementing the aforementioned neural-network quantum states is possible using X as direct inputs to the appropriate FNNs and Deep-Sets [47].Still, it is advantageous to devise new inputs that already capture a large portion of the correlations.As in Ref. [37], we employ a permutation-equivariant message-passing neural network (MPNN) to iteratively build correlations into new onebody and two-body features from the original "visible" features.The visible features are chosen to be with the separation vectors r ij = r i − r j and distances r ij = r ij replaced by their L-periodic surrogates and the quantity s ij ≡ 2δ si,sj − 1 assigned a value of +1 for aligned spins and −1 for anti-aligned spins.Note that we have excluded explicit dependence on the particle positions r i in the visible one-body features, thereby enforcing translational invariance in the new features.Linear transformations are applied to and concatenated with each feature to obtain the initial hidden features The main purpose of the linear transformations is to preprocess the input data.Still, they also help simplify the implementation by keeping the dimension of the hidden features h ij constant for all t.In each iteration, t = 1, . . ., T of the MPNN, information is exchanged between the one-and two-body streams through a so-called "message" For a given particle i, relevant messages are collected and pooled together to destroy the ordering with respect to all other particles j = i, The pooling operation pool collapses the order of the elements in the set it acts upon and produces a vector with the same dimension as an individual element.Throughout this work, we use logsumexp-pooling, the smooth variation of max-pooling.
The pairwise messages m (t) ij and the implied particle messages m (t) i are then used to update the hidden features The functions M t , F t , and G t are all unique FNNs with the same output dimension as the linear preprocessors A and B. By incorporating concatenated skip connections to the visible features, we guarantee that the signal originating from the raw data remains discernible even as the MPNN depth T increases.Finally, we combine the resulting outputs h to feed into subsequent networks.The flow of information through the MPNN can be visualized in Fig. 1.Notice how the hidden features in a given layer depend on the hidden features of the previous layer and the original visible features.For all our NQS, we use a Jastrow correlator based on a Deep-Set [47] to enforce permutation invariance over the set of all pairwise features Here, ρ and ζ are FNNs, and the pooling operation is the same as in Eq. ( 19).While many Jastrow functions are typically designed to satisfy Kato's cusp condition [48] for specific systems, we take a different approach and allow our neural networks to learn the cusp fully.The short-range behavior of the ground state is particularly important for the UFG, so leaving our NQS completely unbiased serves as an important test for evaluating the overall capabilities of NQS.The Slater-Jastrow ansatz with plane wave orbitals (SJ-PW) does not require any additional neural networks beyond ρ and ζ, so it establishes a baseline for the number of trainable parameters in this work.On the other hand, the backflow variables u i and θ i for the Slater-Jastrow ansatz with backflow orbitals (SJ-BF) are the outputs of another Deep-Set (Re(u i ), Im(u i ), θ i ) = ρ bf pool {ζ bf (g ij ) | j = i} , (24) which is permutation invariant with respect to all j = i by construction.The size of ρ bf and ζ bf determines the number of extra variational parameters present in the SJ-BF ansatz compared to the SJ-PW ansatz.For the PJ ansatz, the pairing orbital ν in Eq. ( 7) simply takes g ij as input in place of (x i , x j ).Therefore, the number of additional variational parameters in the PJ ansatz relative to the SJ-PW ansatz is determined by the size of ν.
All of the feedforward neural networks mentioned throughout this section have at least two hidden layers with 16 nodes each.The activation function is GELU [49] and the weights/biases are initialized with glorot normal/zeros unless pretrained parameters are used.
It is worth highlighting that the individual feedforward neural networks within our NQS are solely dependent on the spatial dimension d and not the system size N .Therefore, even though this study focuses on benchmarking the N = 14 case, the trained NQS can be used as starting points for larger even-N , unpolarized systems without requiring any modifications to the network structure.This is an example of transfer learning, a powerful strategy that involves applying knowledge gained from solving one problem to another, often more challenging problem.

Variational Monte Carlo and Training
We train our NQS by minimizing the energy with respect to the variational parameters p.To compute the energy and its gradient ∇ p E using Monte Carlo integration, we sample positions R and spins S from |Ψ(R, S)| 2 in a way that preserves periodicity and total spin projection on the z-axis, as in Refs.[17,24].Since the ordering of the spins is not fixed, our ansätze can be immediately applied to any continuous-space Hamiltonian that exchange spin, such as Ref. [50].
A sophisticated optimization technique is critical for achieving an ansatz that is both compact and expressive.In this work, we employ the stochastic reconfiguration [51] (SR) algorithm with regularization based on the RMSprop method, introduced in Ref. [24].The parameters are updated as where η is a constant learning rate and G is the quantum geometric tensor [52].
Due to the strong and short-range nature of the interaction in Eq. ( 2), it is likely for the optimization process to get trapped in a local minimum when initialized with random parameters.To avoid this problem, we use transfer learning by pretraining the NQS on a softer interaction (µ = 5) before proceeding to harder ones (µ = 10,20,40).Not only does this approach improve the final converged energy, but the efficiency of the optimization process overall.The training for lower values of µ can handle a more aggressive learning rate δ and fewer samples.As a general guideline, we reduce δ by a factor of 10 and double the number of samples each time the value of µ is doubled.The number of optimization steps required for training ranges from O( 103 ) to O(10 4 ), depending on whether the neural quantum states (NQS) were pretrained or initialized with random parameters.

B. Diffusion Monte Carlo
The fixed-node DMC calculations are performed as described in Ref. [53].The initial state is prepared using VMC methods with a variational wave function with the same general form as Eq. ( 3).Note that, within the fixednode approximation, DMC provides a strict upperbound to the energy of the system.While DMC is a precise method, its accuracy relies on the choice of nodal surface and the quality of the preceding VMC calculation.The symmetric Jastrow factor is given by where n = N/2 and the unprimed and primed indicies denote the spin-up and spin-down particles, respectively.The parameters K and γ are adjusted so that u(d) = 0 and u (d) = 0, and µ J and d are variational parameters.Considering that the s-wave channel dominates the interaction, the antisymmetric part is given by the numberprojected BCS wave function with the pairing orbitals The parameters a(k 2 i ), b and d are obtained by minimizing the energy, and c is chosen so that the function β has zero slope at the origin.If we instead let β = 0 and restrict the sum in Eq. ( 30) to momentum states filled up to k F , the antisymmetric part is equivalent to the Slater determinant with single-particle plane waves as in Eqs.(8) and (9).Since this approach does not involve pairing, we will refer to the related DMC results as DMC-PW.Conversely, the approach that accounts for pairing will be identified as DMC-BCS.
It should be emphasized that the BCS wave function of Eq. ( 29) is a special case of the generalized Pfaffian of Eq. (6).In fact, it can be easily shown [32] that by only retaining the spin-singlet blocks, the calculation of the Pfaffian reduces to the determinant of spin-singlet block.

III. RESULTS
We first compare the performance of the various neural-network quantum states outlined in Sec.II A as the message-passing neural network (MPNN) depth T is varied.As shown in Fig. 2, the final converged energies per particle for the Slater-Jastrow ansatz with plane wave orbitals (SJ-PW) decreases monotonically towards the corresponding DMC-PW benchmark, with remarkable agreement at T = 5.This behavior echoes the findings of Ref. [54], and demonstrates the impact of the MPNN on the flexibility of our Jastrow.Incorporating backflow correlations into the Slater-Jastrow ansatz (SJ-BF) significantly improves results compared to the fixednode approach with PW, but more than half of the discrepancy between the two DMC energies remains.Due FIG. 3. Ground-state energies per particle as a function of the effective range.The DMC-BCS benchmark energies (blue circles) and the Pfaffian-Jastrow with backflow (PJ-BF) energies (orange triangles) are extrapolated to zero effective range using linear fits (dashed lines).The shaded regions are the error bands for the DMC-BCS and PJ energies.to the observed weak dependence on T , it is unlikely that further increasing T would yield a substantial improvement in energy.The SJ-BF ansatz may be able to achieve energies more similar to the DMC-BCS benchmark by increasing the width of the feedforward neural networks.Still, the associated computational expenses are expected to be prohibitively high.Therefore, we turn our attention to the Pfaffian-Jastrow (PJ) ansatz.Even with a single MPNN layer, the PJ ansatz easily outperforms DMC-BCS while also possessing fewer parameters (∼5600 v.s.∼6200) than the single-layer SJ-BF ansatz.
The overall dependence on the MPNN depth is weak, with T = 2 giving slightly lower energy and variance than T = 5.For the remainder of our analysis, we will use the PJ ansatz with T = 2, which contains about 8500 variational parameters.
As the unitary limit is characterized by a vanishing effective range, we study how the ground-state energy responds to changing k F r e in Fig. 3.The PJ ansatz gives energies ∼1-2% lower than DMC-BCS as the ef- fective range is decreased from k F r e = 0.4 to k F r e = 0.1.At k F r e = 0.05, our energy falls below the range of the DMC-BCS error band, suggesting our approach is likely to maintain its superior performance as r e is decreased further.To estimate the energy at zero effective range, we also perform simple linear fits -See Table I for the extrapolated values.Note that our results have been obtained by simulating a system of N = 14 particles for benchmark purposes.In order to obtain energies closer to the thermodynamic limit, further simulations with more particles will be needed [12,55].
In Fig. 4, we show the opposite-spin pair distribution functions at unitarity for µ = 5, 10, and 20.Notice how the peaks of the distributions at k F r = 0 grow roughly quadratically with µ, demonstrating the presence of strong pairing correlations as we approach the unitary limit µ → ∞.Clearly, the short-range character of the distributions are important to capture at unitarity, as they begin to converge around k F r 0.4.
Fig. 5 presents a complementary set of opposite-spin pair distribution functions in the crossover region with fixed effective range of k F r e = 0.2.When we lean towards the BCS phase 1/ak F = −0.5, the long-range tail of the density is enhanced compared to the unitary case 1/ak F = 0. On the other hand, the tail is diminished in the BEC phase 1/ak F = −0.5, suggesting the initiation of dimer formation.The differences in the peaks of the distributions are not as dramatic as in Fig. 4, but they are consistent with the expected behavior in the BCS and BEC regimes near unitarity.
Finally, we explore the BCS-BEC crossover region for a fixed effective range k F r e = 0.2 in Fig. 6.See Table II for the values of the interaction parameters v 0 and µ, as well as the corresponding DMC-BCS benchmarks and the PJ ansatz results.The cases closer to unitarity were used  to pretrain the cases further away.In the BCS regime, our PJ ansatz consistently yields energies ∼ 0.01E F G lower than those obtained from DMC-BCS, albeit with slightly inferior performance in the BEC regime.We attribute this effect to the need for increased flexibility in capturing the short-range behavior of pairs in the BEC regime.Simply increasing the size of feedforward neural network ν that defines the pairing orbital should alleviate this discrepancy.In any case, the PJ ansatz gives lower energies than DMC-BCS for all scattering lengths tested.

IV. CONCLUSIONS AND PERSPECTIVES
In this study, we propose a novel neural-network quantum state based on the Pfaffian-Jastrow (PJ) framework that utilizes a message-passing neural network (MPNN) to encode pairing and backflow (BF) correlations.We evaluate its performance against comparable Slater-Jastrow (SJ) ansätze with identical MPNN architectures.Our results indicate that increasing the depth of the MPNN systematically improves the performance of the SJ ansätze, but backflow correlations within the single-particle picture are still insufficient in capturing all pairing correlations.However, we demonstrate that a simple and compact PJ-BF ansatz surpasses the DMC-BCS benchmark with ease.Transfer learning has proven to be an essential tool in this work.It enables the realization of the unitary limit in a controlled manner, mitigating the risk of becoming trapped in local minima.It also allows for the efficient exploration of regions beyond unitarity, unlocking new avenues for studying the BCS-BEC crossover.Transfer learning will remain a crucial part of our training procedure as we move to larger systems.All unpolarized systems can be treated with a single architecture, while the N ± 1 systems can be treated by introducing one additional FNN to represent the unpaired single-particle orbital.This modification is straightforward to implement, making the calculation of the gap more accessible and enabling further advancements in our work.
Besides calculating the gap, our next steps include a direct comparison with the STU Pfaffian wave function of Ref. [32].We also plan to perform a more careful extrapolation to the r e → 0 limit since we have used relatively large values of k F r e for this initial investigation.However, more hyperparameter tuning will be needed, especially about the width of the hidden layers, since the smaller values of r e will require more flexibility.
Our Pfaffian-Jastrow-Backflow NQS displays immense potential in the study of ultra-cold Fermi gases.Unlike conventional methods, our PJ-BF ansatz is not subject to biases arising from physical intuition or a lack thereof, as it does not require specifying a particular form for the pairing orbitals.For this reason, it can be readily applied to other strongly-correlated systems, including molecules and other strongly-correlated quantum systems.In stark contrast to the commonly used geminal wave function, our ansatz does not rely on ordering the spin of the interacting fermions, and it is therefore amenable to Hamiltonians that exchange spin, such as those modeling nuclear dynamics.In this regard, we anticipate calculations of atomic nuclei and low-density isospin-asymmetric nucleonic matter and carry out detailed investigations on the nature of nuclear pairing [56].
When the stochastic reconfiguration algorithm and transfer learning techniques are combined with the enforcement of translational, parity, and time-reversal symmetries, highly non-perturbative correlations can be encoded in a small number of parameters by modern standards.This approach will pave the way for future developments in the study of many-body systems, as it offers a powerful tool for encoding correlations in a compact and computationally feasible manner.
Note Added: A work very recently appeared in preprint [57] introduces neural backflow transformations in a geminal wave function and studies the unitary Fermi gas.We leave systematic comparisons between the two approaches to future works while already observing that the Pfaffian wave function is a strict generalization of the Geminal state [32,58].
An appropriate spatial transformation is trivial.We simply define new parameters u i ∈ C d , called the backflow displacement, and shift the coordinates as r i = r i + u i .The backflow displacement is complex, allowing for changes in both the phases and amplitudes of the original plane wave orbitals.
For the spin part of the transformation, we look to spinors on the Bloch sphere for inspiration, In the above, we have introduced another backflow variable θ i ∈ R akin to the polar angle of a Bloch spinor, and we have excluded the relative phase in favor of a completely real-valued wave function.We also write the superposition in terms of |s i and the Pauli X-operator σ x i , which flips the spin of the i-th particle, rather than | ↑ and | ↓ .This way, it is obvious that |χ i = |s i when θ i = 0, as desired.The overlap of two spinors is given by (A7) In Eqs. ( 10) and ( 11), we use the notation u i (X) and θ i (X) to emphasize that the backflow "parameters" we define here are not variational parameters, but a function of all other particles.More specifically, they are permutation-equivariant functions of the original x i , whose functional forms depend on the outputs of the permutation-equivariant message-passing neural network (MPNN) described in Sec.II A 3.

FIG. 1 .
FIG. 1. Schematic representation of a message-passing neural network with T iterations.Dashed lines represent the concatenation operations, while solid lines represent the parameterized transformations (linear transformations and nonlinear feedforward neural networks).Messages, highlighted in pink, mediate the exchange of information between the one-and two-body streams, in blue.A yellow box indicates a single iteration of the network.

FIG. 2 .
FIG.2.Ground-state energies per particle as a function of the MPNN depth T for the SJ-PW (blue squares), SJ-BF (orange circles), and PJ-BF (green triangles) ansätze.The interaction parameters are set to v0 = 1 and µ = 5, corresponding to an effective range of rekF = 0.4.The DMC benchmark energies with and without pairing are displayed as solid and dashed lines, respectively.

1 )
* 0.405(1) * TABLE I. Energy per particle for various values of µ and the corresponding values of re.The values with asterisks ( * ) are extrapolations from the linear fits shown in Fig. 3.The parameter v0 = 1 is fixed.

FIG. 6 .
FIG. 6. Upper panel: Energy per particle in the BCS-BEC crossover region as a function of the scattering length a for a fixed effective range kF re = 0.2.Lower panel: Difference between Pfaffian-Jastrow with backflow (PJ-BF) and DMC-BCS benchmark energies.See Table II for the corresponding values of v0 and µ.

TABLE II .
Energies per particle and parameters for the twobody potential in Eq. (2) giving different scattering lengths with the same effective range kF re = 0.2.