Extracting high-order cosmological information in galaxy surveys with power spectra

The reconstruction method was proposed more than a decade ago to boost the signal of baryonic acoustic oscillations measured in galaxy redshift surveys, which is one of key probes for dark energy. After moving the observed overdensities in galaxy surveys back to their initial position, the reconstructed density field is closer to a linear Gaussian field, with higher-order information moved back into the power spectrum. We find that by jointly analysing power spectra measured from the pre- and post-reconstructed galaxy samples, higher-order information beyond the $2$-point power spectrum can be efficiently extracted, which generally yields an information gain upon the analysis using the pre- or post-reconstructed galaxy sample alone. This opens a window to easily use higher-order information when constraining cosmological models.

(P pre ) because the reconstruction restores the linear signal reduced by the non-linear evolution.
The higher order statistics such as B, the bispectrum, induced by the non-linear evolution are, in turn, reduced. Thus we can extract more cosmological information encoded in the linear density field from P post than P pre . On the other hand, the non-linear field contains information on small-scale clustering such as galaxy biases, which can provide better constraints on cosmological parameters by breaking the degeneracy between them. For the pre-reconstructed sample, this information can be extracted by combining the power spectrum with higher-order statistics. However, the power spectrum and higher order statistics such as the bispectrum are correlated, reducing our ability to estimate cosmological parameters. If we instead consider the post-reconstructed power spectrum, the covariance between the power spectrum and the bispectrum is reduced, and we can extract the information more efficiently. In this work, we show that the same improvement can be achieved by a joint analysis of P pre , P post and P cross (the cross-power spectrum between the pre-and post-reconstructed density fields). Due to the restored linear signal in the reconstructed density field, P post is decorrelated with P pre on small scales, which are dominated by the non-linear information. On these scales, the combination of P pre , P post and P cross has a similar ability to extract cosmological information as the combination of P post or P pre with the bispectrum because we are able to use the linear information in P post and higher-order information in P pre separately.
Let us rewrite the non-linear over-density field as δ = R + ∆, where R is the over-density field after reconstruction, which is closer to the linear field. It is then straightforward to express P pre , P cross in terms of P RR (= P post ), P ∆∆ (the power spectrum of ∆) and P ∆R (the cross-power spectrum between ∆ and R). Using perturbation theory 14 , we can show that, at the leading order, P ∆∆ contains the integrated contribution from the bispectrum of squeezed-limit triangles while P R∆ contains the integrated contribution from the trispectrum (T) of folded/squeeze-limit quadrilaterals (those quadrilaterals with zero area and two-by-two sides equal). In this fashion when combining P pre and P cross with P post , we are essentially adding in higher-order signal, thus naturally gaining information. Note that in order to match the information obtained by adding these two extra statistics, it is not enough to consider the bispectrum signal of the pre-reconstructed field, but both bispectrum and trispectrum signals. For this reason the information content of P pre + B is different from that contained in P pre + P cross + P post . However, it is important to note that higher-order information that reconstruction brings is only a part from the total contained in the full bispectrum and trispectrum data-vectors. This is why a full analysis using P + B + T will always provide more information. However, such an analysis is not very practical because of the size of the full data-vector and the computational time typically required to measure B and especially T directly.
In this paper we show that P pre + P cross + P post is an efficient alternative for extracting the relevant information from higher-order statistics for cosmological analyses.
To demonstrate, we perform an anisotropic Lagrangian reconstruction (see Methods for details) on each realisation of the MOLINO galaxy mocks 15 , which is a large suite of realistic galaxy mocks produced from the Quijote simulations 16 at z = 0. We then use these mocks to calculate the data covariance matrix and derivatives numerically for a Fisher matrix analysis 17 using the measured multipoles (up to = 4) of P pre , P post and P cross on the parameter set Θ ≡ Panel a in Figure 1 shows the measured power spectra monopole, and we see that P cross decreases dramatically with scale compared to P pre and P post . This indicates a decorrelation between P pre and P post below quasi-nonlinear scales (k 0.1 h Mpc −1 ), which is largely due to the difference in levels of nonlinearity in P pre and P post .
From the original data vector {P pre , P post , P cross }, we can construct their linear combinations, P R∆ , P ∆∆ , P δ∆ defined as Panel b in Figure 1 show these power spectra. As discussed above, these power spectra involving ∆ contain the information of part of the high-order statistics such as bispectrum and trispectrum.
The correlation matrix for the monopole of power spectrum and bispectrum (only the correlation with the squeezed-limit of B 0 is visualised for brevity) is shown in Figure 2. It is seen that P pre 0 highly correlates with B 0 , confirming that the bispectrum is induced by nonlinearities. In contrast, P post 0 weakly correlates with B 0 , or with P pre 0 and P cross 0 on nonlinear scales (e.g., at k 0.2 h Mpc −1 ). This, however, does not mean that P post is irrelevant to the bispectrumit actually is a mixture of P pre and certain integrated forms of the bispectrum and trispectrum information 9,14 . Therefore by combining P post with P pre and P cross , one can in principle decouple the leading contribution in the power spectrum, bispectrum and trispectrum. The integrated form of the bispectrum information dominates P ∆∆ , which strongly correlates with B 0 , as shown in Supplement Figure 2. The fact that P post barely correlates with B 0 implies that the information content in P post combined with B 0 may be similar to that in P post combined with P pre and P cross , which is confirmed to be the case by the results from the Fisher analysis presented below.
The cumulative signal-to-noise ratio (SNR) for power spectrum multipoles (up to = 4), bispectrum monopole, and various data combinations are shown in Supplement Figure 3, in which we can see that the joint 2-point statistics, P all , is measured with a greater SNR than that of P post or P + B 0 , which may mean that P all can be more informative than P post or P + B 0 for constraining cosmological parameters.
To confirm, we then project the information content in the observables onto cosmological parameters using a Fisher matrix approach. Contour plots for (logM 0 , σ 8 ) derived from different datasets with two choices of k max (the maximal k for the observables used in the analysis) are shown in Figure 3 (more complete contour plots are shown in Supplement . The smoothing scale is set to be S = 10 h −1 Mpc when performing the reconstruction. The degeneracies between parameters using P pre , P post and P cross are generally different, because P pre , P post and P cross differ to a large extent in terms of nonlinearity on small scales. This is easier to see in Supplement Figure 4, in which contours for the same parameters are shown for observables used in several k intervals. The contours derived from P pre and P post generally rotate as k increases because of the kick-in of nonlinear effects, which affects P pre and P post at different levels on the same scale. This significantly improves the constraint when these power spectra are combined, labelled as P all , which is tighter than that from the traditional joint power spectrum-bispectrum analysis (P pre + B 0 ). Interestingly, P all can even win against P post + B 0 in some cases, demonstrating the To further quantify our results, in Figure 4 we compare the square root of the Fisher matrix element for each parameter, with and without marginalising over others, derived from P all and P pre + B 0 , respectively, with two choices of k max .
For k max = 0.2 h Mpc −1 , we see that the Fisher information for each parameter (panel a: without marginalising over others) derived from P all is identical or even greater than that in P pre + B 0 . In other words, combining all power spectra we can exhaust the information in P pre +B 0 . After marginalising over other parameters, panel b shows that the uncertainty on each parameter gets redistributed due to the degeneracy. The ratios for the HOD parameters are all greater than unity especially for logM 0 and σ logM , demonstrating the power of our new method on constraining HOD parameters. The information content for cosmological parameters in P pre + B 0 is well recovered by using P all , although the recovery for M ν is relatively worse. The overall trend for the case of k max = 0.5 h Mpc −1 is similar, although the advantage of using P all over P pre +B 0 gets degraded to some extent. However, P all is still competitive: it almost fully recovers the information for the HOD parameters in P pre + B 0 with or without marginalisation, and largely wins against P pre + B 0 after marginalisation. Regarding the cosmological parameters, P all recovers all information in P pre + B 0 before the marginalisation, although the recovery is slightly worse for M ν . After marginalisation when the uncertainties are redistributed, the constraint from P all is generally worse than P pre + B 0 , especially for M ν .
The 68% CL constraints on each parameter fitting to various datasets are shown in Table 1. To quantify the information gain, we evaluate the Figure- where F denotes the Fisher matrix and N p is the total number of free parameters. For the ease of comparison, for cases with different k max , we normalise all the quantities using the corresponding one for P pre . As shown, for k max = 0.2 h Mpc −1 , (FoM) P all is greater than all others, namely, it is larger than (FoM) Ppre and (FoM) Ppost by a factor of 2.7 and 1.7, respectively, and it is even greater than (FoM) Ppost+B 0 by ∼ 13%. For k max = 0.5 h Mpc −1 , (FoM) P all is also more informative than (FoM) Ppre and (FoM) Ppost by a factor of 2.1 and 1.5, respectively, and is the same as (FoM) Ppre+B 0 , but is less than (FoM) Ppost+B 0 by ∼ 10% in this case.
To highlight the constraining power on cosmological parameters, we also list FoM cos , which is the FoM with all HOD parameters fixed. It shows a similar trend as FoM Θ : P all is the most informative data combination for k max = 0.2 h Mpc −1 , but it is outnumbered by P pre + B 0 and P post + B 0 by 13% and 30%, respectively, for the case of k max = 0.5 h Mpc −1 .
As demonstrated in this analysis, a joint analysis using P pre , P post and P cross is an efficient way to extract high-order information from galaxy catalogues, and in some cases, P all is more informative even than P post + B 0 , which is computationally much more expensive.
In this example, the k-binning for P and B are different, namely, ∆k(B) = 3k f ∼ 0.019 hMpc −1 ∼ 1.9 ∆k(P ) where k f denotes the fundamental k mode given the box size of the simulation. We have checked that using a finer k-binning for B only improves the constraints marginally 19 , namely, the FoM can only be raised by ∼ 10% when ∆k(B) is reduced from 3k f to k f , which is largely due to the strong mode-coupling in B as shown in Figure 2. Such a fine binning is not practical anyway as, for example, using ∆k(B) = k f up to k = 0.5 h Mpc −1 , we end up with more than 50, 000 data points to measure for B 0 .
Note that the MOLINO mock is produced at z = 0, where the nonlinear effects are the strongest. At higher redshifts, the density fields are more linear and Gaussian, thus we may expect less gain from our new method. This can be seen from panel a of Supplement Figure 9, in which the correlation between P pre and P post at various redshifts is shown. As expected, P pre and P post are more correlated at higher redshifts, e.g., the correlation approaches 0.95 at z = 5 around k ∼ 0.3 h Mpc −1 , which implies that almost no information gain can be obtained at such high redshifts. As argued previously, the decorrelation at lower z is due to the fact that P pre and P post contain different levels of nonlinearity, as illustrated in panel b, thus are complementary. Also, the Alcock-Paczyński (AP) effect 20 , which is a geometric distortion due to the discrepancy between the true cosmology and the fiducial one used to convert redshifts to distances, is irrelevant at z = 0 21 .
As studied 22 , the AP effect can make the small-scale bispectrum more informative for constraining the standard ruler than the power spectrum (P pre ), thus it is worth revisiting the case in which P post and P cross are added to the analysis.
To further demonstrate the efficacy of our new method, we perform another analysis at a higher redshift using P and B 0 from an independent set of mocks: a suite of 4000 high-resolution N -body mocks (512 3 particles in a box with 500 h −1 Mpc a side) produced at z = 1.02. This allows us to include the AP effect when performing the BAO and RSD analysis. This test confirms that P pre , P post and P cross are complementary for constraining cosmological parameters, and that P all contains almost all the information in P combined with B 0 , which is consistent with our findings from the MOLINO analysis (see Methods and Supplement Figures 12-15 for more details).

Stage-IV redshift surveys including the Dark Energy Spectroscopic Instrument (DESI) 23 ,
Euclid 24 and the Prime Focus Spectrograph (PFS) 25 will release galaxy maps over a wide range of redshifts with an exquisite precision. As long as the distribution of a tracer in a given redshift range is not too sparse, namely, the number density is not lower than 10 −4 h 3 Mpc −3 so that a reconstruction can be efficiently performed 26 , the novel method presented in this work can be directly applied to extract high order statistics for constraining cosmological parameters from 2-point measurements, which is computationally much more efficient to perform. Since the reconstruction will be performed anyway for most ongoing and forthcoming galaxy surveys to improve the BAO signal, our proposed analysis can be performed at almost no additional computational cost.
Additional work is required to build a link between cosmological parameters to the full shape of power spectra for a likelihood analysis, and this is challenging using perturbation-theory-based models on (quasi-) nonlinear scales, especially for the reconstructed power spectrum and the cross power spectrum. However, the model-free approaches including the simulation-based emulation 27-29 , can be used for performing the P all analysis down to nonlinear scales, in order to extract the cosmological information from the power spectra to the greatest extent.

The estimation of the covariance matrix and derivatives
The 15, 000 MOLINO mocks for the fiducial cosmology are designed for accurately estimating the covariance matrices of the galaxy clustering observables, including the power spectra and bispectra. In addition, a separate set of the MOLINO mocks are produced for estimating the derivatives with respect to cosmological parameters (including the HOD ones) using the finite difference method (see Supplement Information for details). For this purpose, 60, 000 galaxy mocks are constructed at 24 cosmologies that are slightly different from the fiducial one 15 .
Since the data covariance matrices and the derivatives are all evaluated numerically using mocks, it is important to ensure that the result derived from the Fisher matrix approach is robust against numerical issues, as argued in 1-3, 15 . We therefore perform numerical tests to check the dependence of our Fisher matrix calculation on the number of mocks, and find that the marginalised uncertainties of all the concerning parameters are well converged given the number of mocks available. The details are presented in the Supplement Information.

1.2
The 4000 high-resolution N -body mocks at z = 1.02 To confirm our findings from the MOLINO mocks, we perform an independent mock test on a suite of 4000 high-resolution N -body simulations with 512 3 dark matter particles in a L = 500 h −1 Mpc box at z = 1.02 11 . The fiducial cosmology used for this set of mocks is consistent with the Planck 2015 4 observations.

The COLA mocks at multiple redshifts
To investigate how the decorrelation between P pre and P post varies with redshifts, we perform Note that the information content in the reconstructed power spectrum is the same no matter whether the RSD is kept or not during the reconstruction process, and we have numerically confirmed this by performing the analysis with the isotropic reconstruction 13 , in which the RSD is removed using the fidicual f and b used for producing the mocks.

Measurement of the power spectrum multipoles
The multipoles (up to = 4) of both the pre-and post-reconstructed density fields are measured using an FFT-based estimator 9 implemented in N-body kit 10 . The shot-noise, which reflects the discreteness of the density field, is removed as a constant for the monopole of the auto-power.
The k-binning is ∆k = 0.01 hMpc −1 for both the MOLINO and N -body mocks.
Care needs to be taken when measuring the cross-power spectrum between the pre-and post-reconstructed density fields, since the raw measurement using the FFT-based estimator is contaminated by a scale-dependent shot-noise: on large scales, the post-reconstructed field resembles the unreconstructed one, making the cross-power spectrum essentially an auto-power, thus it is subject to a shot-noise component. On small scales, however, the shot-noise largely drops because the two fields effectively decorrelate.
To obtain a measured cross-power spectrum whose mean value reflects the true power spectrum in the data such that no subtraction of the noise component is required, we adopt the "half-sum (HS) half-difference (HD)" approach 11 . We start by randomly dividing the catalog into two halves, dubbed δ 1 and δ 2 , and the corresponding reconstructed density fields are R 1 and R 2 , respectively. Let and Then HS (R) contains both the signal and noise, but HD (R) only contains the noise. Hence the cross-power spectrum estimator is, The scatter ofP cross around the mean value allows for an estimation of the covariance matrix, which is a 4-point function 12  The noise is anisotropic, and thus it affects even for multipoles with = 0 13 .
Since a change in HOD parameters can result in a change in the number density of the galaxy sample and thus affect the shot-noise, the shot-noise can in principle be used to constrain the HOD parameters. We therefore perform an additional Fisher projection with the shot-noise kept in the spectra, and find that the constraints on HOD parameters can be improved in general, but the constraint on cosmological parameters is largely unchanged (see the P SN all column in Table 1).

Measurement of the bispectrum monopole
We measure the galaxy bispectrum monopole, B 0 , for all of the mock catalogs using the publicly available pySpectrum package 1, 15 . Galaxy positions are first interpolated onto a grid using a fourth-order interpolation scheme and then Fourier transformed to obtain δ(k). Afterwards B 0 is estimated using where δ D is the Dirac delta function, V B is the normalization factor proportional to the number of triplets that can be found in the k 1 , k 2 , k 3 triangle bin, and B SN 0 is the Poisson shot noise correction term. Triangle configurations are defined by k 1 , k 2 , k 3 , and for the MOLINO mocks, the width of the bins is ∆k = 3k f , where k f = 2π/(1000 h −1 Mpc), and for the N -body mocks, ∆k = 0.02 hMpc −1 .

An AP test performed on the MOLINO mocks
Although the AP effect plays no role for the MOLINO mock since it is produced at z = 0 21 , we perform a test by isotropically stretching the scales and angles using pairs of AP parameters calculated at a non-zero redshift. This gives us an idea about whether this artificial and exaggerated AP effect can change the main conclusion of this work that the cosmological information content in P all is almost the same as or more than that in P pre + B 0 . In practice, we use the (α || and α ⊥ ) pairs computed at z eff = 0.5 and 1.0 respectively to stretch the wave numbers along and across the line of sight directions, and repeat the analysis. As shown in the Supplement Table, this added 'artificial' AP effects can generally tighten the constraint, but the relative constraints from P all and P pre + B 0 are largely unchanged, meaning that the main conclusion of this paper remains the same if the AP effect is taken into account.

An AP test on the N -body mocks
We perform a Fisher matrix analysis 17 on the AP parameters using the 4000 N -body mocks.
Part of the observables (the power spectrum monopole) are shown in Supplement Figure 12.
From panel a we see that the amplitude of P cross decreases dramatically with scales, indicating a decorrelation between P pre and P post below quasi-nonlinear scales, which is confirmed by the correlation coefficient (the normalised covariance) plotted in panel b. This decorrelation, which is not caused by the shot noise given the negligible noise level in the mocks, is a clear evidence of the complementarity among the power spectra.
The cumulative signal-to-noise ratio (SNR) is shown in panel a of Supplement Figure 13, in which we see that P all is more informative than P pre , and that P + B 0 has slightly higher SNR on small scales.

Constraints on the isotropic BAO parameter
We first perform an AP test on the isotropic dilation parameter α iso , which is defined as the ratio of the true spherically-averaged scale of the standard ruler to the fiducial one. This dilation parameter depends on cosmological parameters, and can be constrained using the monopole of the power spectrum and bispectrum. The wavenumber k gets dilated by α iso due to the AP effect, thus the observables are, where T denotes the type of P 0 , namely, T = {pre, post, cross}, and the parameters A 0 and A B are used to parameterize the overall amplitudes of power spectrum monopole and bispectrum monopole, respectively. The free parameters are {α iso , ln A 0 , ln A B }. The derivative with respect to α iso is evaluated semi-analytically as Then the constraint on α iso is derived after marginalising over the amplitudes A 0 and A B , and it is shown in panel b of Supplement Figure 13. The FoM of α iso shows up step-like features due to the BAO feature, as previously discovered 22 , and P all offers the greatest FoM, until overtaken by P + B 0 at k max 0.37 hMpc −1 .

The anisotropic AP test
We use the first three even multipole moments to assemble the two-dimensional power spectrum, i.e., The bispectrum is similarly assembled using the first three even multipoles with m = 0 14 , which are the most informative ones 15 , i.e., The wave-number k i and the cosine of the angle to the line-of-sight µ i are stretched by two dilation parameters α ⊥ and α || due to the AP effect 16,17 , The power spectrum multipoles (the index for the type is omitted for brevity) and bispectrum monopole including the AP effect are respectively given as, The free parameters are {α ⊥ , α || , ln A , ln A B }, where A ( = 0, 2, 4) denotes the overall amplitudes of the power spectrum multipoles, and A B is the amplitude of the bispectrum monopole. The derivatives with respect to the parameters α ⊥ and α || are evaluated numerically by where O ∈ {P , B 0 } denotes the observables, and the step size ∆α i = 0.01. Then the constraints on α ⊥ and α || are derived after marginalising over the amplitudes A and A B .
The FoM for α ⊥ , α || is shown in panel c of Supplement Figure 13, and it shows a similar trend as FoM(α iso ). The contour plot for α ⊥ , α || with k max = 0.4 hMpc −1 is shown in Supplement Figure   14, further highlighting the strong constraining power of P all in comparison to that of P + B 0 .

A joint BAO and RSD analysis on the N -body mocks
In addition to α ⊥ , α || , we add one more parameter to the analysis, which is ∆v, the parameter describing the change of velocities along the line of sight. This parameter mimics the change of the linear growth rate on large scales, but it also changes the velocity of particles coherently on small scales. We compute the derivatives with respect to ∆v numerically.
The projection onto the parameters, shown in Supplement Figure 15, demonstrates the advantage of performing a joint analysis using P pre , P post and P cross . On large scales, P pre and P post are both determined by the linear density field, making the power spectra highly correlated. As shown in panels c 1 and c 5 , the contours derived from P pre and P post have similar orientations and we do not gain by combining them. For k > 0.15 h Mpc −1 , the correlation between P pre and P post decreases as the pre-reconstructed density field is dominated by the non-linear field while P post still retains the correlation with the linear density field. The contours shown in lines in panel c 7 , which are derived from power spectra in the k range of [0.2, 0.25] h Mpc −1 , are almost orthogonal to each other, making the constraint from the combined spectra, as illustrated in the shaded region, significantly tightened. On smaller scales, the post-reconstructed density field is also dominated by the non-linear field and the orientations of the contours are again aligned and the complementarity on smaller scales weakens. This shows that the level of nonlinearity in the power spectrum determines the degeneracies between parameters. Since P pre and P post are affected by different levels of nonlinearities on a given scale, which gives rise to different degeneracies, a joint analysis using both P pre and P post (and P cross ) can yield a better constraint by breaking the degeneracies.
Data Availability The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request.

The convergence of the Fisher matrix
The Fisher matrix is formed as follows, where D and C are the data vector storing the numerical derivatives, and the data covariance matrix, respectively, thus the convergence of F relies on the convergence of D and C −1 against the number of mocks. We therefore check the convergence of the marginalised uncertainties of cosmological parameters, as D and C −1 are estimated using different number of mocks.
Panel a in Supplement Figure 1 demonstrates the excellent convergence of F against C −1 (with D estimated using all the mocks), with the correction factors applied to remove the bias 18 .
We then check the convergence of F against D (with C −1 estimated using all the mocks). We pay particular attention to the Fisher element for the neutrino mass, because it is recommended to estimate using four pieces of the power spectrum (or bispectrum), shown in Eq. (16) 16 , instead of two for other parameters, thus F Mν Mν may be the most affected Fisher element by numerical issues, if there are any.
where M ν should be sufficiently small so that the finite difference approaches the derivative. In practice M ν is set to 0.1 eV for the Molino mocks. Note that the way the terms are assembled in Eq. (16) is to remove the second order terms in the Taylor series. However, it is not the only arrangement that is free from the second order terms so may not be optimal for removing the numerical noises. The most general form for ∂ S ∂Mν that is second-order terms free is, and λ is a free parameter. The scheme in Eq. (16) corresponds to λ = −1/12, and the three-term scheme shown in 16 corresponds to λ = 0. One can, in principle, tune λ to optimise the convergence of the Fisher matrix. The optimisation finds that λ ∼ −1/4, which effectively removes the S(M ν ) term. This means that M ν = 0.1 eV for P or B may cause numerical instabilities thus the final expression we use for ∂ S ∂Mν is, The convergence of σ θ against number of mocks is illustrated in panels b and c in Supplement   Figure 4. Contour plots derived from the MOLINO mock.
The 68% CL contour plots for (σ 8 , logM 0 ) and (σ 8 , σ logM ) derived from various combinations of spectra measured from the MOLINO mock in several k ranges.  The 68% CL contour plot for α ⊥ and α || derived from various data combinations with k max = 0.4 h Mpc −1 .