Measuring the expansion history of the Universe is one of the key goals in cosmology. The best current constraints come from measurements of the distance–redshift relation over a wide range of redshifts1. The baryon acoustic oscillation (BAO) feature in the clustering of galaxies is a standard ruler for robust distance measurements2,3. BAOs arise from the tight coupling of photons and baryons in the early Universe. Sound waves travel through this medium and give rise to a characteristic scale in the density perturbations, corresponding to the propagation distance of the waves before recombination. With large galaxy surveys, BAOs have been used to measure distances to per cent-level precision at various redshifts4,5,6,7,8.

In the aforementioned studies, only the spatial distributions of galaxies are used, and the shapes and orientations of galaxies are ignored. The intrinsic alignment (IA) of galaxies is usually treated as a contaminant in weak-lensing analysis9,10. However, as we will demonstrate using actual observations in this work, IA is actually a promising cosmological probe and contains valuable information. The galaxy–ellipticity (GI) and ellipticity–ellipticity (II) IAs were first detected in luminous red galaxies11,12,13. Studies showed that the IA of galaxies can be related to the gravitational tidal field using the linear alignment (LA) model14,15,16. According to the LA model, a BAO feature in both GI and II correlations shows up as a dip, rather than a peak17,18,19 as seen in the galaxy–galaxy (GG) correlations. Furthermore, the entire two-dimensional and anisotropic pattern of GI and II correlations may provide information additional to that of the BAOs. The results were tested and confirmed in N-body simulations20. Taruya and Okumura21 also performed a forecast of cosmological constraints for IA statistics using the LA model, and found that IA can provide a similar level of constraints on cosmological parameters to the galaxy spatial distributions.

In this Article, we report a measurement of the BAO feature using IA statistics, namely GI, and confirm that IA can provide additional information to GG correlations. In addition to reducing statistical uncertainties of the distance measurements, GI can also provide a test of systematics when compared with the BAO measurements from GG. Details of this analysis, including the datasets, measurements and modelling, are presented in Methods. Our fiducial results, as shown in the main text, are derived using data vectors in the range of 50–200 h−1 Mpc with a polynomial marginalized over. This range is chosen to ensure that the polynomial captures the entire shape and to avoid the turnover around 20 h−1 Mpc. For completeness, we also show results with other choices of the fitting range and with or without marginalizing over the polynomial in Supplementary Figs. 4 and 5 and Supplementary Table 1. We observe a 2–3σ BAO signal in all of these variations.

The measurements and modelling of the isotropic GI and GG correlation functions are shown in Fig. 1. The GI measurements show an apparent dip around the BAO scale at ~100 h−1 Mpc from both the pre- (panel a) and post-reconstructed samples (panel b). The fitting results are reasonable on all scales, namely, the reduced χ2 is 0.97 for pre-reconstruction and 0.87 for post-reconstruction. To show the significance of the BAO detection, we display the Δχ2 ≡ χ2 − χmin2 surfaces in Fig. 2 (see also Supplementary Fig. 1), where χmin2 is the χ2 for the best-fitting model. We compare Δχ2 for the no-wiggle model with the BAO model and find a 3σ detection of the BAO feature in both the pre- and post-reconstructed samples.

Fig. 1: Measurements and modelling of GI and GG correlation functions.

a, Pre-reconstruction GI correlations. b, Post-reconstruction GI correlations. c, Post-reconstruction GG correlations. d, Post-reconstruction combined modelling (GI multiplied by 4 for better illustration). Points and error bars show the mean and s.e.m. of clustering measurements. Errors are from the diagonal elements of the jackknife covariance matrices estimated using 400 subsamples. Lines and shading are the best-fit models and 68% confidence-level regions derived from the marginalized posterior distributions.

Fig. 2: Plot of Δχ2 versus α for the pre- (left) and post- (right) reconstruction GI correlations.

Orange lines show the Δχ2 for non-BAO models and blue lines those for BAO models.

The constraint on α, which represents the deviation from the fiducial cosmology (Methods), is \(1.05{0}_{-0.028}^{+0.030}\) and \(1.05{7}_{-0.036}^{+0.035}\) using the pre- and post-reconstructed samples, respectively. Both results are in good agreement (within 2σ) with the fiducial Planck18 (ref. 22) results (TT, TE, EE + lowE + lensing + BAO), which assume a Λ cold dark matter cosmology (where Λ is the cosmological constant). We find that the constraint is not improved after reconstruction for the GI correlations, possibly because we reconstruct only the density field and keep the shape field unchanged. In principle, the result may improve further if the shape field is also reconstructed, which is left for a future study.

The post-reconstructed GG (panel c) and combined (panel d) measurements and modelling are also shown in Fig. 1. The GG correlation alone measures α to be \(0.98{6}_{-0.013}^{+0.013}\) and the combined GG + GI derives the constraint to be \(0.99{7}_{-0.012}^{+0.012}\), demonstrating that the GI measurement gives rise to a ~10% improvement in terms of the uncertainty on α. More importantly, as we mentioned above, the next-generation surveys can tighten the GG BAO constraints by a factor of 2–3 (refs. 1,23), and systematic errors will become more and more important. If the GI measurements can be improved to the same level, comparisons between sub-per-cent (<0.5%) GG measurements and 1% GI measurements can provide a check of the systematic bias in the measurements. Our GG results are consistent with the results reported by the Baryon Oscillation Spectroscopic Survey (BOSS) team6 within the 68% confidence level uncertainty, although the numbers are slightly different due to a few effects, including fitting ranges, details of radial bins and error estimations.

In Fig. 3, we convert the constraints on α to distance DV/rd measured at redshift z = 0.57, with the fiducial values DV,fid(0.57) = 2,056.58 Mpc, rd,fid = 147.21 Mpc and DV,fid/rd,fid = 13.97 for Planck18 (ref. 22). The quantity DV/rd is measured to be \(14.6{7}_{-0.40}^{+0.42}\), \(14.7{7}_{-0.50}^{+0.49}\), \(13.7{7}_{-0.19}^{+0.18}\) and \(13.9{1}_{-0.17}^{+0.17}\) using the pre-reconstructed GI, post-reconstructed GI, GG and GI + GG, respectively. All these results are consistent with Planck18 within the 2σ level.

Fig. 3: Constraints of DV/rd from GG and GI correlation functions with NJK = 400.

A combined post-reconstructed GG + GI constraint is also provided. The central values are the medians, and the error bars are the 16th and 84th percentiles. The vertical orange line shows the fiducial Planck18 results.

In this work, we obtain a 2–3σ measurement of the BAO dip feature in GI correlations, although the distance constraints from GI are only about one-third as tight as those from GG and much weaker than predicted by Taruya and Okumura21 using the LA model. The reason may be that the galaxy–halo misalignment24 reduces the IA signals and weakens the BAO constraints, an effect that may not have been fully accounted for by Taruya and Okumura21. According to Okumura and Jing24, taking the misalignment into account can reduce the GI signals by a factor of two to three, which is consistent with our results. Moreover, since realistic mock catalogues for galaxy shapes are not yet available, the covariance matrices in this study are estimated using the jackknife resampling method. Employing more reliable error estimation techniques could potentially improve the accuracy of the results, and is left for a future study. Nevertheless, the results are already promising. With the next-generation spectroscopic and photometric surveys, including the Dark Energy Spectroscopic Instrument (DESI)23 and the Legacy Survey of Space and Time25, we will have larger galaxy samples and better shape measurements. We expect that the IA statistics can provide much tighter constraints on cosmology from BAO and other probes26,27.


Statistics of the IA

The shape of galaxies can be characterized by a two-component ellipticity, which is defined as follows:

$${\gamma }_{(+,\times )}=\frac{1-{q}^{2}}{1+{q}^{2}}(\cos (2\theta ),\sin (2\theta ))\,\,,$$

where q represents the minor-to-major axial ratio of the projected shape, and θ denotes the angle between the major axis projected onto the celestial sphere and the projected separation vector pointing towards a specific object.
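As a minimal sketch of equation (1), the two ellipticity components can be computed from the axis ratio and the relative angle. The function name `ellipticity` and the sample values are illustrative assumptions, not part of the analysis pipeline.

```python
import numpy as np

def ellipticity(q, theta):
    """Return (gamma_plus, gamma_cross) from equation (1).

    q     : minor-to-major axis ratio of the projected shape (0 <= q <= 1)
    theta : angle in radians between the projected major axis and the
            projected separation vector towards the other object
    """
    amplitude = (1.0 - q**2) / (1.0 + q**2)
    return amplitude * np.cos(2.0 * theta), amplitude * np.sin(2.0 * theta)

# Hypothetical example: a galaxy with q = 0.5 whose major axis points
# along the separation vector (theta = 0) or perpendicular to it.
gp_radial, gx_radial = ellipticity(0.5, 0.0)
gp_tangential, gx_tangential = ellipticity(0.5, np.pi / 2.0)
```

With this definition, a major axis pointing along the separation vector gives a positive γ+, a perpendicular one gives a negative γ+, and γ× vanishes in both cases.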

The GI correlations, denoted as the cross-correlation functions between density and ellipticity fields, can be expressed as15,18

$${\xi }_{{\mathrm{g}}i}({{{\mathbf{r}}}})=\langle [1+{\delta }_{\mathrm{g}}({{{{\mathbf{x}}}}}_{1})][1+{\delta }_{\mathrm{g}}({{{{\mathbf{x}}}}}_{2})]{\gamma }_{i}({{{{\mathbf{x}}}}}_{2})\rangle .$$

Here, δg is the overdensity of galaxies, r = x1 − x2 and i = {+, ×}.

In this work, we focus on the GI correlation ξg+ since the signal of ξg× vanishes due to parity considerations. It is worth noting that the IA statistics are anisotropic even in real space because projected galaxy shapes are used, and redshift space distortion28 can introduce additional anisotropies in ξg+(r). Therefore, we define the multipole moments of the correlation functions as29

$${X}_{\ell }(r)=\frac{2\ell +1}{2}\int\nolimits_{-1}^{1}{\mathrm{d}}\mu \,X({{{\mathbf{r}}}}){{{{\mathcal{P}}}}}_{\ell }(\mu ).$$

Here, X represents one of the correlation functions, \({{{{\mathcal{P}}}}}_{\ell }\) denotes the Legendre polynomials and μ corresponds to the directional cosine between r and the line-of-sight direction.
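The multipole projection in equation (3) can be sketched numerically with Gauss–Legendre quadrature. The helper `multipole` and the test function 1 − μ² are illustrative assumptions, chosen because this angular dependence is the one that appears in the LA prediction for ξg+.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def multipole(x_of_mu, ell, n_nodes=64):
    """Evaluate (2*ell + 1)/2 * int_{-1}^{1} dmu X(mu) P_ell(mu)."""
    mu, weights = leggauss(n_nodes)        # quadrature nodes and weights on [-1, 1]
    p_ell = Legendre.basis(ell)(mu)        # Legendre polynomial P_ell at the nodes
    return 0.5 * (2 * ell + 1) * np.sum(weights * x_of_mu(mu) * p_ell)

# Angular dependence at fixed r (assumed for illustration): X(mu) = 1 - mu^2
monopole = multipole(lambda mu: 1.0 - mu**2, 0)
quadrupole = multipole(lambda mu: 1.0 - mu**2, 2)
```

Because 1 − μ² = (2/3)P0 − (2/3)P2, the monopole evaluates to 2/3 and the quadrupole to −2/3.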

The LA model

On large scales, the LA model is frequently employed in studies of IAs14,15,18. This model assumes a linear relationship between the ellipticity fields of galaxies and halos and the gravitational tidal field:

$${\gamma }_{(+,\times) }({{{\mathbf{x}}}})=-\frac{{C}_{1}}{4\uppi G}\left({\nabla }_{x}^{2}-{\nabla }_{y}^{2},2{\nabla }_{x}{\nabla }_{y}\right){\varPsi }_{\mathrm{p}}({{{\mathbf{x}}}}),$$

where Ψp represents the gravitational potential, G denotes the gravitational constant and C1 characterizes the strength of IA. Although the observed ellipticity field is density weighted, namely [1 + δg(x)]γ(+,×)(x), the term δg(x)γ(+,×)(x) is subdominant on large scales18 because δg(x) ≪ 1, and it can be neglected at the BAO scale. In the Fourier space, equation (4) can be expressed as

$${\gamma }_{(+,\times) }({{{\mathbf{k}}}})=-{\widetilde{C}}_{1}\frac{\left({k}_{x}^{2}-{k}_{y}^{2},2{k}_{x}{k}_{y}\right)}{{k}^{2}}\delta ({{{\mathbf{k}}}}).$$

Here, \({\widetilde{C}}_{1}(a)\equiv {a}^{2}{C}_{1}\bar{\rho }(a)/\bar{D}(a)\), where \(\bar{\rho }\) represents the mean mass density of the Universe, \(\bar{D}(a)\propto D(a)/a\), and D(a) corresponds to the linear growth factor, with a denoting the scale factor.

Then, ξg+(r) can be represented by the matter power spectrum Pδδ18,30:

$${\xi }_{{\mathrm{g}}+}({{{\mathbf{r}}}})={\widetilde{C}}_{1}{b}_{\mathrm{g}}(1-{\mu }^{2})\int\nolimits_{0}^{\infty }\frac{{k}^{2}\,{\mathrm{d}}k}{2{\uppi }^{2}}{P}_{\delta \delta }(k\,){j}_{2}(kr),$$

where bg is the linear galaxy bias and j2 is the second-order spherical Bessel function.

The redshift space distortion effect28 can also be considered in ξg+(r) at large scales18. However, in this work, we do not consider the redshift space distortion effect and only focus on the monopole component of ξg+(r) given the sensitivity of current data:

$${\xi }_{{\mathrm{g}}+,0}(r)=\frac{2}{3}{\widetilde{C}}_{1}{b}_{\mathrm{g}}\int\nolimits_{0}^{\infty }\frac{{k}^{2}\,{\mathrm{d}}k}{2{\uppi }^{2}}{P}_{\delta \delta }(k\,){j}_{2}(kr).$$

We plan to measure the entire two-dimensional ξg+(r) with future large galaxy surveys, which may contain much more information. To test the LA model, Okumura et al.20 measured the IA statistics in N-body simulations and found that the results agree well with the predictions from the LA model on large scales. Thus, it is reasonable to use the above formula of ξg+,0(r) for BAO studies.
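The monopole integral above can be sketched with a direct trapezoidal integration over k. The power spectrum used here is a toy, exponentially damped function chosen only so that the integral converges on a finite grid; it is not a CLASS output, and the names `j2`, `xi_gplus_monopole` and `amp` (standing for the product of C̃1 and bg) are assumptions made for illustration.

```python
import numpy as np

def j2(x):
    """Second-order spherical Bessel function; series expansion for small x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    small = np.abs(x) < 0.1
    xs, xl = x[small], x[~small]
    out[small] = xs**2 / 15.0 * (1.0 - xs**2 / 14.0)
    out[~small] = (3.0 / xl**3 - 1.0 / xl) * np.sin(xl) - 3.0 * np.cos(xl) / xl**2
    return out

def xi_gplus_monopole(r, k, pk, amp=1.0):
    """(2/3) * amp * int k^2 dk / (2 pi^2) P(k) j2(k r), trapezoid rule in k."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    integrand = k**2 / (2.0 * np.pi**2) * pk * j2(np.outer(r, k))
    integral = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * np.diff(k), axis=1)
    return (2.0 / 3.0) * amp * integral

# Toy inputs (assumptions): a smooth, damped spectrum on a fine linear k grid
k = np.linspace(1e-4, 2.0, 20000)
pk_toy = k * np.exp(-(k / 0.2) ** 2)
r = np.array([50.0, 100.0, 150.0])
xi_toy = xi_gplus_monopole(r, k, pk_toy)
```

For a realistic linear power spectrum the integrand oscillates and decays slowly, so a dedicated Hankel-transform method would probably be preferable to this brute-force sum.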

Fitting the BAO scale

We fit the BAO features in GG correlations following the SDSS-III BOSS DR12 analysis6,31.

To model the BAO features in GI correlations, we adopt a methodology similar to that used in GG studies5,32. In spherically averaged two-point measurements, the BAO position is fixed by the sound horizon at the baryon-drag epoch rd and provides a measurement of4

$${D}_{V}(z)\equiv {\left[cz{(1+z)}^{2}{D}_{\mathrm{A}}{(z)}^{2}{H}^{-1}(z)\right]}^{1/3},$$

where DA(z) is the angular diameter distance and H(z) is the Hubble parameter. The correlation functions are measured under an assumed fiducial cosmological model to convert angles and redshifts into distances. The deviation of the fiducial cosmology from the true one can be measured by comparing the BAO scale in clustering measurements with its position in a template constructed using the fiducial cosmology. The deviation is characterized by

$$\alpha \equiv \frac{{D}_{V}(z){r}_{{{{\rm{d}}}},{{{\rm{fid}}}}}}{{D}_{V,{{{\rm{fid}}}}}(z){r}_{{{{\rm{d}}}}}},$$

where the subscripts ‘fid’ denote the quantities from the fiducial cosmology.
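The definitions of DV and α translate directly into a short calculation. The sketch below uses the fiducial values quoted in the main text (DV,fid = 2,056.58 Mpc and rd,fid = 147.21 Mpc at z = 0.57); the function names are illustrative assumptions.

```python
C_KM_S = 299792.458  # speed of light in km/s

def d_v(z, d_a, hubble):
    """Volume-averaged distance in Mpc, for D_A in Mpc and H in km/s/Mpc."""
    return (C_KM_S * z * (1.0 + z) ** 2 * d_a**2 / hubble) ** (1.0 / 3.0)

# Fiducial Planck18 values at z = 0.57, as quoted in the main text
DV_FID = 2056.58  # Mpc
RD_FID = 147.21   # Mpc

def dv_over_rd(alpha):
    """Convert alpha into D_V / r_d using the definition of alpha."""
    return alpha * DV_FID / RD_FID

dv_rd_pre = dv_over_rd(1.050)   # pre-reconstruction GI best value of alpha
dv_rd_post = dv_over_rd(1.057)  # post-reconstruction GI best value of alpha
```

Rounded to two decimals, these give 14.67 and 14.77, matching the GI values of DV/rd quoted in the main text.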

The template of ξg+,0 is generated using the linear power spectrum, Plin, from the CLASS code33. In GG BAO peak fitting, a linear power spectrum with damped BAO is usually used to account for the nonlinear effect,

$${P}_{{{{\rm{damp}}}}}(k)={P}_{{{{\rm{nw}}}}}(k)\left[1+\left(\frac{{P}_{{{{\rm{lin}}}}}(k)}{{P}_{{{{\rm{nw}}}}}(k)}-1\right){\mathrm{e}}^{-(1/2){k}^{2}{\varSigma }_{{{{\rm{nl}}}}}^{2}}\right],$$

where Pnw is the fitting formula of the no-wiggle power spectrum3 and Σnl is the damping scale. In this analysis, we set Σnl = 0 as our fiducial model for GI, and we also show the results with Σnl as a free parameter in Supplementary Table 1.
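The damped template can be written as a one-line function. This is a sketch with assumed argument names; the linear and no-wiggle spectra would come from CLASS and the fitting formula respectively, and are passed in here as plain arrays.

```python
import numpy as np

def p_damp(k, p_lin, p_nw, sigma_nl):
    """Power spectrum with the BAO wiggles damped by exp(-k^2 sigma_nl^2 / 2)."""
    damping = np.exp(-0.5 * k**2 * sigma_nl**2)
    return p_nw * (1.0 + (p_lin / p_nw - 1.0) * damping)

# Toy arrays (assumptions) to show the two limits of the damping
k = np.array([0.05, 0.10, 0.20])
p_nw = np.array([100.0, 80.0, 40.0])
p_lin = p_nw * np.array([1.05, 0.97, 1.02])

p_undamped = p_damp(k, p_lin, p_nw, sigma_nl=0.0)     # recovers p_lin
p_smooth = p_damp(k, p_lin, p_nw, sigma_nl=1000.0)    # recovers p_nw
```

Setting Σnl = 0, as in the fiducial GI model, leaves the linear spectrum unchanged, while a very large Σnl removes the BAO wiggles entirely.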

Using the template, our model for GI correlation is given by

$${\xi }_{{\mathrm{g}}+,0}(s)=B\int\nolimits_{0}^{\infty }\frac{{k}^{2}\,{\mathrm{d}}k}{2{\uppi }^{2}}{P}_{{{{\rm{lin}}}}}(k){j}_{2}(\alpha ks),$$

where s is the comoving distance in redshift space and B accounts for all the factors that only affect the amplitude of the correlation, such as IA strength, galaxy bias and shape responsivity (see equation (16)). As in the GG analysis, we add a further polynomial in our model to marginalize over the broad band shape:

$${\xi }_{{\mathrm{g}}+,0}^{{{{\rm{mod}}}}}(s)={\xi }_{{\mathrm{g}}+,0}(s)+\frac{a_1}{{s}^{2}}+\frac{a_2}{s}+a_3.$$
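A minimal sketch of the full model follows. Since α enters equation (11) only through j2(αks), the dilated model equals the amplitude B times the template evaluated at αs. The callable `template`, which stands for the α = 1 integral of equation (11) with B = 1, is an assumption made for illustration (for example, an interpolation of a precomputed curve).

```python
import numpy as np

def xi_model(s, alpha, amp_b, poly, template):
    """BAO template dilated by alpha plus the broadband polynomial.

    s        : separations in Mpc/h
    alpha    : dilation parameter
    amp_b    : overall amplitude B
    poly     : tuple (a1, a2, a3) of nuisance coefficients
    template : callable returning the B = 1, alpha = 1 template at a given scale
    """
    a1, a2, a3 = poly
    return amp_b * template(alpha * s) + a1 / s**2 + a2 / s + a3

# Hypothetical stand-in template, used only to exercise the function
toy_template = lambda x: 1.0 / x
s = np.array([100.0])
value = xi_model(s, alpha=1.05, amp_b=2.0, poly=(0.0, 0.0, 0.5), template=toy_template)
```

At fixed α the model is linear in B, a1, a2 and a3, which is one reason why the broadband terms are inexpensive to marginalize over.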

Thus, with the observed GI correlation ξg+,0obs(s) and the covariance matrix C, we can assume a likelihood function \({{{\mathcal{L}}}}\propto \exp (-{\chi }^{2}/2)\), with

$${\chi }^{2}=\frac{{N}_{{{{\rm{JK}}}}}-{N}_{{{{\rm{bin}}}}}-2}{{N}_{{{{\rm{JK}}}}}-1}\mathop{\sum}\limits_{i,j}\left[{\xi }_{i}{\,}^{{{{\rm{mod}}}}}-{\xi }_{i}{\,}^{{{{\rm{obs}}}}}\right]{{{C}}}_{ij}^{-1}\left[{\xi }_{j}{\,}^{{{{\rm{mod}}}}}-{\xi }_{j}{\,}^{{{{\rm{obs}}}}}\right],$$

where C−1 is the inverse of C, i, j indicate the data points at different radial bins, NJK and Nbin are the total numbers of subsamples and radial bins, respectively, and (NJK − Nbin − 2)/(NJK − 1) is the Hartlap correction factor34, which debiases the inverse of the estimated covariance matrix. The covariance matrices are estimated using the jackknife resampling from the observation data:

$${{{C}}}_{ij}=\frac{{N}_{{{{\rm{JK}}}}}-1}{{N}_{{{{\rm{JK}}}}}}\mathop{\sum }\limits_{n=1}^{{N}_{{{{\rm{JK}}}}}}\left({\xi }_{i}^{n}-{\bar{\xi }}_{i}\right)\left({\xi }_{j}^{n}-{\bar{\xi }}_{j}\right),$$

where \({\xi }_{i}^{n}\) is the measurement in the nth subsample at the ith radial bin and \({\bar{\xi }}_{i}\) is the mean jackknife correlation function at the ith radial bin. We use the Markov chain Monte Carlo sampler emcee35 to perform a maximum-likelihood analysis. NJK is chosen by gradually increasing it until the constraints on α are stable (Supplementary Table 1).
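The jackknife covariance and the Hartlap-corrected χ2 defined above can be sketched in a few lines of NumPy. The function names and the random mock data are assumptions for illustration; in the analysis, each row of `xi_jk` would be the correlation function measured in one jackknife subsample.

```python
import numpy as np

def jackknife_cov(xi_jk):
    """Jackknife covariance from an array of shape (N_JK, N_bin)."""
    xi_jk = np.asarray(xi_jk, dtype=float)
    n_jk = xi_jk.shape[0]
    diff = xi_jk - xi_jk.mean(axis=0)      # deviations from the jackknife mean
    return (n_jk - 1) / n_jk * diff.T @ diff

def chi2_hartlap(xi_mod, xi_obs, cov, n_jk):
    """Chi-squared including the Hartlap factor for the inverse covariance."""
    d = np.asarray(xi_mod, dtype=float) - np.asarray(xi_obs, dtype=float)
    n_bin = d.size
    hartlap = (n_jk - n_bin - 2) / (n_jk - 1)
    # Solve C y = d rather than forming the explicit inverse of C
    return hartlap * (d @ np.linalg.solve(cov, d))

# Mock data (assumption): 400 subsamples and 30 radial bins of Gaussian noise
rng = np.random.default_rng(0)
xi_jk = rng.normal(size=(400, 30))
cov = jackknife_cov(xi_jk)
chi2 = chi2_hartlap(np.zeros(30), xi_jk.mean(axis=0), cov, n_jk=400)
```

A log-likelihood of −χ2/2 built from these pieces is what a sampler such as emcee would evaluate at each step.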

Sample selection

We use the data from the BOSS CMASS DR12 sample6,36,37. The CMASS sample covers an effective area of 9,329 deg2 and provides spectra for over 0.8 million galaxies. Galaxies are selected with a number of magnitude and colour cuts to obtain an approximately constant stellar mass. We use the CMASSLOWZTOT Large-Scale Structure catalogue in BOSS DR12 and adopt a redshift cut of 0.43 < z < 0.70 to select the CMASS sample with an effective redshift zeff = 0.57.

Reconstruction methods can improve the significance of the detection of the BAO feature, and reduce the uncertainty in BAO scale measurements, by correcting for the density field smoothing effect associated with large-scale bulk flows38,39,40. We also use the post-reconstructed catalogues from BOSS DR12 and refer the reader to refs. 6,40 for details of the reconstruction methods.

To obtain high-quality images for the CMASS galaxies, we cross-match them with the DESI Legacy Imaging Surveys DR9 data41, which cover the full CMASS footprint and contain all the CMASS sources. The Legacy Surveys can reach an r-band point spread function depth fainter than 23.5 mag, which is two to three magnitudes deeper than the SDSS photometry survey used for target selection and is more than adequate to study the orientations of the massive CMASS galaxies. The Legacy Survey images are processed using Tractor42, a forward-modelling approach to perform source extraction on pixel-level data. We use shape_e1 and shape_e2 in the Legacy Surveys DR9 catalogues as the shape measurements for each CMASS galaxy. These two quantities are then converted to the ellipticity defined in equation (1). Following Okumura and Jing24, we assume that all the galaxies have q = 0, which is equivalent to assuming that a galaxy is a line along its major axis. This assumption only affects the amplitude of the GI correlations, and the measurements of the position angles are more accurate than those of the whole galaxy shapes.

The whole CMASS sample is used to trace the density field, while for the tracers of the ellipticity field we further select galaxies with Sérsic index43 n > 2, since only elliptical galaxies show strong shape alignments, and q < 0.8 for reliable position angle measurements. In principle, we should also exclude satellite galaxies. However, since selecting centrals in redshift space is arbitrary and most of the CMASS luminous red galaxies are already centrals, we do not consider the central–satellite separation. Selections using n and q remove nearly half (from 816,779 to 425,823) of the CMASS galaxies. We show the results using the whole sample in Supplementary Table 1 and confirm that the n and q selection improves the measurements and tightens the constraints.


To estimate the GI correlations, we generate two random samples Rs and R for the tracers of ellipticity and density fields respectively. Following Reid et al.37, redshifts are assigned to randoms to make sure that the galaxy and random catalogues have exactly the same redshift distribution. We adopt the generalized Landy–Szalay estimator44,45

$${\xi }_{{\mathrm{g}}+,0}(s)=\frac{{S}_{+}(D-R)}{{R}_{s}R},$$

where RsR is the normalized random–random pair count and S+D is the sum of the + component of ellipticity over all pairs:

$${S}_{+}D=\mathop{\sum}\limits_{i,j}\frac{{\gamma }_{+}(\,j| i)}{2{{{\mathcal{R}}}}},$$

where the ellipticity of the jth galaxy in the ellipticity tracers is defined relative to the direction to the ith galaxy in the density tracers, and \({{{\mathcal{R}}}}=1-\langle {\gamma }_{+}^{2}\rangle\) is the shape responsivity46. \({{{\mathcal{R}}}}\) = 0.5 under our q = 0 assumption. S+R is calculated in a similar way using the random catalogue.
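The pair sum and the responsivity can be illustrated as follows. The sketch assumes that ⟨γ+²⟩ is an average over uniformly distributed orientations, which reproduces the value of 0.5 quoted for q = 0; the function names and the sample pair values are hypothetical.

```python
import numpy as np

def gamma_plus(pa_major, pa_separation, q=0.0):
    """Plus component of the ellipticity of galaxy j relative to galaxy i.

    pa_major      : position angle of the major axis of galaxy j (radians)
    pa_separation : position angle of the projected direction from j to i
    q             : axis ratio; q = 0 treats the galaxy as a line
    """
    theta = pa_major - pa_separation
    return (1.0 - q**2) / (1.0 + q**2) * np.cos(2.0 * theta)

def responsivity(gp):
    """Shape responsivity R = 1 - <gamma_plus^2>."""
    return 1.0 - np.mean(np.asarray(gp) ** 2)

def s_plus_sum(gp_pairs, resp):
    """Sum of gamma_plus over all pairs in a bin, divided by 2R."""
    return np.sum(gp_pairs) / (2.0 * resp)

# Uniformly spaced orientations over half a turn (assumption)
angles = np.linspace(0.0, np.pi, 1000, endpoint=False)
resp_q0 = responsivity(gamma_plus(angles, 0.0))

# Hypothetical gamma_plus values for three pairs falling in one radial bin
s_plus_d = s_plus_sum(np.array([1.0, -1.0, 0.5]), resp_q0)
```

With R = 0.5 the factor 2R equals unity, so under the q = 0 assumption the pair sum reduces to a plain sum of cos(2θ).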

We also measure the GI correlation functions for the reconstructed catalogues. The ellipticities of galaxies are assumed unchanged in the reconstruction process. The estimator becomes

$${\xi }_{{\mathrm{g}}+,0}(s)=\frac{{S}_{+}(E-T)}{{R}_{s}R},$$

where E and T represent the reconstructed data and random sample, and R and Rs are the original random samples. In the above calculations, we adopt the Feldman–Kaiser–Peacock weights47 (wFKP) and weights for correcting the redshift failure (wzf), fibre collisions (wcp) and image systematics (wsys) for the density field tracers37: wtot = wFKPwsys(wcp + wzf − 1), while no weight is used for the ellipticity field tracers.
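The combination of weights for the density tracers can be written directly from the expression for wtot. This is a sketch with assumed function and argument names; the sample numbers are hypothetical.

```python
import numpy as np

def total_weight(w_fkp, w_sys, w_cp, w_zf):
    """w_tot = w_FKP * w_sys * (w_cp + w_zf - 1) for density-field tracers."""
    w_fkp, w_sys = np.asarray(w_fkp, dtype=float), np.asarray(w_sys, dtype=float)
    w_cp, w_zf = np.asarray(w_cp, dtype=float), np.asarray(w_zf, dtype=float)
    return w_fkp * w_sys * (w_cp + w_zf - 1.0)

# Hypothetical galaxies: one with default weights, one that is up-weighted
# for a fibre-collided neighbour (w_cp = 2)
w_tot = total_weight([1.0, 0.5], [1.0, 1.1], [1.0, 2.0], [1.0, 1.0])
```

In a pair count, each term involving a density tracer would be multiplied by its wtot, whereas the ellipticity tracers enter with unit weight.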

Measurements and modelling

We measure ξg+,0(s) for both pre- and post-reconstruction catalogues in 50 < s < 200 h−1 Mpc with a bin width of 5 h−1 Mpc. We calculate their covariance matrices using the jackknife resampling with NJK = 400, and model the GI correlation functions using equation (11) with a Planck18 (ref. 22) fiducial cosmology at z = 0.57. In Supplementary Table 1, we show that the pre- and post-reconstruction GI results are relatively stable if NJK ≥ 400, verifying that NJK = 400 is a reasonable choice. We measure and model the post-reconstruction isotropic GG correlation functions with the same radial bins and error estimation scheme (NJK = 400). We also model the GG and GI correlations jointly with a 60 × 60 covariance matrix that includes the GG–GI cross-covariance to obtain the combined results.
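The size of the data vectors follows from the binning. The short sketch below counts the radial bins and evaluates the Hartlap factor for the separate and joint fits; applying the same factor with Nbin = 60 to the joint fit is an assumption, since the text does not state it explicitly.

```python
import numpy as np

S_MIN, S_MAX, DS = 50.0, 200.0, 5.0   # fitting range and bin width in Mpc/h
N_JK = 400                            # number of jackknife subsamples

edges = np.arange(S_MIN, S_MAX + DS, DS)   # bin edges from 50 to 200
n_bin = len(edges) - 1                     # radial bins per correlation function
n_joint = 2 * n_bin                        # GG and GI data vectors concatenated

def hartlap(n_jk, n_data):
    """Hartlap factor (N_JK - N_data - 2) / (N_JK - 1)."""
    return (n_jk - n_data - 2) / (n_jk - 1)

h_single = hartlap(N_JK, n_bin)
h_joint = hartlap(N_JK, n_joint)
```

Each correlation function contributes 30 bins, so the joint GG + GI data vector has 60 elements, consistent with the 60 × 60 covariance matrix.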