## Main

We observed one transit of WASP-39b on 10 July 2022 with JWST’s Near InfraRed Spectrograph (NIRSpec)9,13, using the PRISM mode, as part of the JWST Transiting Exoplanet Community Early Release Science Program (ERS Program 1366) (PIs: Natalie Batalha, Jacob Bean, Kevin Stevenson)10,11. These observations cover the 0.5–5.5 µm wavelength range at a native resolving power of 20–300. WASP-39b was selected for this JWST-ERS Program because of previous space- and ground-based observations revealing strong alkali metal absorption and several prominent H2O bands4,6,14,15,16, suggesting a strong signal-to-noise ratio could be obtained with JWST. However, the limited wavelength range of existing transmission spectra (0.3–1.65 µm, combined with two wide photometric Spitzer channels at 3.6 and 4.5 µm) left several important questions unresolved. Previous estimates of WASP-39b’s atmospheric metallicity—a measure of the relative abundance of all gases heavier than hydrogen or helium—vary by four orders of magnitude6,16,17,18,19,20. Accurate determinations of metallicity can explain formation pathways and provide greater insight into the planet’s history21. The JWST NIRSpec PRISM observations we present here offer a more detailed view into WASP-39b’s atmospheric composition than has previously been possible (see ref. 21 for an initial infrared analysis of these data).

We obtained time-series spectroscopy over 8.23 h centred around the transit event to extract the wavelength-dependent absorption by the planet’s atmosphere—that is, the transmission spectrum, which probes the planet’s day-night terminator region near millibar pressures. We used NIRSpec PRISM in Bright Object Time Series (BOTS) mode. WASP-39 is a bright, nearby, relatively inactive22 G7 type star with an effective temperature of 5,400 K (ref. 8). WASP-39’s J-band magnitude of 10.66 puts it near PRISM’s saturation limit, which allows us to test the effects of saturation on the quality of the resulting science compared to past measurements (Methods).

In our baseline reduction using Fast InfraRed Exoplanet Fitting for Lightcurves (FIREFLy)23, we perform calibrations on the raw data using the jwst Python pipeline12 and then identify and correct for bad pixels and cosmic rays. We mitigate the 1/f noise9 at the group level rather than the integration level to ensure accurate slope fitting, which we find to be a crucial step for NIRSpec PRISM observations with few groups per integration.

We bin the resulting spectrophotometry in wavelength to create 207 variable-width spectral channels with roughly equal counts in each. Figure 1 shows the FIREFLy white and spectrophotometric light curves at this step in the top panel. Several absorption features are visible by-eye as darker horizontal stripes within the transit region in the two-dimensional (2D) light curve (Fig. 1), demonstrating the high quality of the raw spectrophotometry achieved by the PRISM observing mode.

To extract the atmosphere’s transmission spectrum, we fit the planet’s transit depth in each wavelength bin using a limb darkened transit light curve model using the Python-based Levenburg–Marquardt least-squares algorithm lmfit24. The light curves show a typical photometric scatter of 0.2–1.2% per integration (1.36 s each), and the typical transit depth uncertainties vary between 50 and 200 parts per million (ppm), which is in line with near-photon-limited precision (Methods). Although we successfully measure fluxes in the saturated regions (0.8–2.3 µm), because of the lower number of groups used per integration here (1–3) the measured count rates may be adversely affected. We do not find excess red noise in the saturated channels themselves, however, we notice large point-to-point scatter in the transit depths, which required wider wavelength binning to better match previous Hubble Space Telescope (HST) observations. Figure 2 highlights representative transit light curves spanning the entire wavelength range. These data are binned into wider wavelength channels than those used for the final transmission spectrum for ease of presentation. Light curve systematics have not been removed from these data, demonstrating the unprecedented stability and precision of the PRISM observing mode.

We also compared the results from the FIREFLy reduction to three other independent reductions that use different treatments for the saturated region of the detector, limb darkening and various detector systematics (Methods). All four reductions obtain consistent results. Figure 3 shows a comparison of the four reductions. The consistency provides confidence in the accuracy of derived atmospheric parameters, demonstrating that any residual systematics are minimal and do not strongly bias results for NIRSpec PRISM observations. The transmission spectrum also agrees well with previous measurements from ground-based telescopes15,16 as well as HST and Spitzer6 within error (Fig. 3), indicating that we can reliably recover a spectrum at these levels of saturation. These PRISM observations offer high-quality data from 0.5–5.5 µm, with minimal contributions from systematics and at precisions generally near the photon limit (Methods). Although recovery of the saturated region (0.9–1.5 µm) is possible, caution is warranted when interpreting this portion of the spectrum (Methods). Future PRISM observations of similarly bright targets should therefore carefully consider whether saturating the spectrum is an appropriate choice for a given planet, or whether building the wavelength coverage from several transits with different complementary modes is preferable.

The transmission spectrum of WASP-39b from the FIREFLy reduction is shown in Fig. 4. We select the FIREFLy reduction to be our baseline reduction, but comparable results are achieved with the three other reductions presented in this work (Methods). We interpret the spectrum with grids of one-dimensional (1D) radiative–convective–thermochemical equilibrium (RCTE) models (postprocessed with some more gases (Methods)), with a representative best-fitting model transmission spectrum shown in Fig. 4, along with opacity contributions from atoms, molecules and grey clouds. We detect the presence of H2O by means of four pronounced independent bands (33σ, 1–2.2 µm), a prominent CO2 feature at 4.3 µm (28σ), Na at 0.58 µm (19σ), a CO absorption band at 4.7 µm (7σ) and a grey cloud (21σ). We do not observe any significant CH4 absorption (expected at 3.3 µm), despite predictions of its presence for atmospheres at approximately solar metallicity and place a 3σ upper limit of 5 × 10−6 on the CH4 volume mixing ratio between 0.1 and 2 mbar. We also observe a relatively narrow absorption feature at 4.05 µm (roughy 2.7σ), which we attribute to SO2—a potential tracer for photochemistry25,26,27—after an extensive search across many possible opacity sources (Methods). Using a Bayesian approach described in the Methods section, we calculate that the volume mixing ratio of SO2 needed to explain this feature is 10−5. The potential SO2 feature is also observed at higher resolutions with JWST NIRSpec G395H (ref. 28), adding confidence that the feature first reported as an unknown absorber29 is a genuine feature of the planet’s atmosphere. With Na detected in the atmosphere, the alkali metal, K, is also expected at optical wavelengths14 although not detected. However, the resolution covering the narrow K absorption doublet in the optical is low, which may be preventing detection. This might also be because of detector saturation in the wavelength range where K absorption is expected. We also do not detect the presence of H2S in the atmosphere. We note that although the best-fitting models shown in Figs. 3 and 4 have some CH4, H2S and K signatures, these species are not favoured by the data to the level of a detection. We determine the single best-fitting atmospheric metallicity, C/O ratio and grey cloud opacity to be ten times solar, 0.7 and κcld = 10−2.07 cm2 g−1, respectively. A detailed discussion on these best-fitting parameters is presented in the Methods section.

JWST/NIRSpec PRISM’s power to constrain many chemical species in hot giant planet atmospheres provides new windows into their compositions and chemical processes, as we show here with WASP-39b. Using our model grids, we find that WASP-39b’s best-fitting atmospheric metallicity is roughly ten times solar. In the limit of equilibrium chemistry, our non-detection of CH4 at 3.4 µm paired with the prominence of the large CO2 feature at 4.4 µm are indicative of a super-solar atmospheric metallicity, as illustrated in Extended Data Fig. 9. This may point to WASP-39b’s puffy envelope bearing more compositional similarity to the similarly massed ice giants than the gas giants. Moreover, the probably detection of SO2, and its unexpectedly high estimated abundance, suggests that photochemical processes are pushing this species out of equilibrium. Photochemistry models show that sulfur compounds such as H2S efficiently photodissociate and recombine to form SO2 with roughly 1 ppm abundances and at 1–100 mbar pressures26—roughly the same pressure range probed by our transmission spectroscopy (Extended Data Fig. 10). The abundance measurement of SO2 can therefore serve as an important tracer of the thermochemical properties of highly irradiated stratospheres and the efficiency of photochemistry. Furthermore, our detection of a qualitatively significant wavelength dependence to the planet’s central transit time (Extended Data Fig. 3) suggests that these observations are sensitive to differences in the atmospheric composition at the planet’s leading and trailing hemispheres. The measured roughly 20 s amplitude of this effect is in line with model expectations30. This indicates that such observations will be informative in exploring the three-dimensional (3D) nature of hot Jupiter atmospheres, which may give a more holistic understanding of their heat redistribution and nightside chemistry.

## Methods

### Data reduction

One transit of WASP-39b was observed with the NIRSpec PRISM mode, with the 8.23-h observation roughly centred around the transit event. We used NIRSpec’s Bright Object Time Series (BOTS) mode with the NRSRAPID readout pattern, the S1600A1 slit (1.6" × 1.6" ) and the SUB512 subarray. Throughout the exposure, we recorded 21,500 integrations, each with five 0.28-s groups up the ramp. We achieved a duty cycle of 82%.

We extracted transmission spectra of WASP-39b using four different reductions with the FIREFLy, tshirt, Eureka!+ExoTEP and Tiberius pipelines. The results from all reductions are broadly consistent (Fig. 3 and Extended Data Fig. 1). We used the FIREFLy reduction as our baseline for comparison to models throughout this paper, however, equivalent overall results can be deduced from the other reductions. Some key attributes of the reductions are compared in Extended Data Table 2. All reductions correct for 1/f noise: correlated frequency-dependent read noise in the images caused by detector readout and current biases in the electronics31. We note that as the GAINSCALE step of the JWST pipeline applies a gain correction to the raw count rate files, the counts and count rates quoted herein are in units of electrons and electrons per second, respectively.

We find that recovery of the saturated region was possible by applying several custom steps described here. Without these steps, the heavily saturated region showed a large and unexpected point-to-point scatter of several thousand ppm in the transmission spectra. We note that there was limited on-sky NIRSpec calibration data available when the data were obtained and reduced, including an incomplete detector bias image whose values were all set to zero. We used a custom bias frame for this step (private communication, S. Birkmann). Although the transmission spectra longwards of about 2 µm could be extracted without the use of this calibration, we found that bias correction was critical to extract the spectrum in the saturated region.

In addition, to recover the saturated region it was necessary to perform a reference pixel correction something that was skipped by the default jwst pipeline for NIRSpec PRISM because no official reference pixels are present in the subarray (tshirt section below). All reductions also expand the saturation flags along entire columns and only use the groups before saturation for slope fitting in these regions. With these steps, the spectra broadly matched previous HST and Very Large Telescope (VLT) observations6, with improvement in the region with only one or two groups before saturation. We expect that as updated NIRSpec calibration data become available, the recovery of saturated regions in PRISM observations may become easier, however, we still suggest avoiding rapid saturation with less than two groups before saturation if possible, especially if that region of the spectrum is important to one’s science case.

#### FIREFLy

We performed custom calibrations on the uncalibrated data, including 1/f noise destriping9 at the group level, bad and hot pixel cleaning, cosmic ray removal and 5σ outlier rejection. Destriping the data also removed potential background in the 2D images, although none was apparent in the data. The jump-step and dark-current stages of the jwst pipeline12 (v.1.6.2) were skipped, and the top and bottom six pixels of the non-illuminated subarray were manually set to be reference pixels in the jwst pipeline reference pixel step. To obtain our final wavelength calibration, we extrapolated the STScI-provided in-flight instrumental wavelength calibration data product across the detector edge pixels that did not have an assigned wavelength. The calibration was derived using the ground-based wavelength solution. We performed tests to search for zero-point offsets in the calibration versus the planetary and stellar spectra and did not find any at the level of half a pixel width or greater.

JWST detectors integrate using a non-destructive up-the-ramp sampling technique in which the flux is measured in counts per second from fitting the ramp from the groups contained within each integration. Extended Data Fig. 2 shows the regions of the spectrum affected by saturation. Within a column where a pixel was marked as saturated by the pipeline in any given group, we used only the data from the preceding groups for ramp fitting and manually set an entire column of the detector as saturated if a pixel in that column was saturated. Because a small portion of the spectrum reaches our saturation threshold in the second group, this region of the spectrum only uses one group to derive a ‘ramp’. Although we were able to recover the spectra in this wavelength range by flagging and ignoring saturated pixels at the group level, we note that the data quality is lower in the saturated region than in the rest of the spectrum given the counts per second ramp was measured from fewer than the total five groups.

We measured the positional shift of the spectral trace across the detector throughout the time series using cross correlation and used them to shift-stabilize the images with flux-conserving interpolation. This procedure reduced the amplitude of position-dependent trends in the light curves. We optimized the width of our flux extraction aperture at each wavelength pixel and extracted the spectrophotometry. For each wavelength, we tested a wide range of aperture widths and determined the width that minimized the scatter of the photometry of the first 350 data points. We bin the cleaned spectrophotometry in wavelength to create 207 variable-width spectral channels with roughly 105 counts per second in each bin and widths ranging from 3.3 to 60 nm. Because we use fewer groups in the saturated detector columns, our bin widths are larger by a factor of a few in this region to account for the lower count rates per detector column.

Before fitting the transmission spectrum, we use a very wide, high-SNR white light channel (3–5.5 µm) to fit for the planet’s orbital parameters (listed in Extended Data Table 1). Restricting the wide bin to the reddest wavelengths minimizes the impact of limb darkening on the transit light curve and the resulting covariance with the orbital system parameters while ignoring the saturated region. We fit this white light curve using the Markov chain Monte Carlo sampler emcee32 within the least-squares minimization framework of lmfit. We use 1,000 steps and uniform priors with extremely wide bounds that encapsulate the limits of physicality to ensure that there is no bias introduced by the prior. Our fitting approach accounts for non-Gaussian degeneracies in the posterior distribution, thereby addressing the known linear correlation between impact parameter (b) and the scaled semimajor axis (a/R*)

We excluded the first 3,000 integrations as they showed a slight non-linear baseline flux trend, and integrations 20,750–20,758 because of a high-gain antenna move that was identified from outliers in the photometry that correlated with noticeable trace shifts in the x and y directions. To measure the transmission spectrum, we fit the light curve at each wavelength channel jointly with a transit model33 and a linear combination of systematics vectors composed of the measured spectral shifts in the x and y directions. At each channel, we fit the planet’s transit depth and the stellar limb darkening, while fixing the transit centre time T0, affect parameter b and normalized semimajor axis a/R* to the values determined in the white light curve fit. We also fix the orbital period to the published value of 4.0552941 days (ref. 34). With the orbital system parameters fixed, we find the posterior distribution is well-fit by a multivariate Gaussian distribution, and therefore use a Levenberg–Marquart least-squares minimization algorithm24 to efficiently determine the best-fit parameters. In each channel, we inflate the transit depth error bars in quadrature with the measured residual red noise in the photometry as measured by the binning technique35. Measured uncertainties on the transit depths vary from 50 to 200 ppm, with a median of 99 ppm (Extended Data Fig. 4). As the noise levels are very close to the limit with what is expected including only photon and read noise sources, tools such as PandExo36 should accurately predict what is achievable for other planets. We measure an increase in red noise for a few select spectral channels, but otherwise the light curves show no significant systematic errors, with some channels binning down to precision levels of a few ppm. We measure x and y jitter systematics at the roughly 100 ppm level. We see differences in the central transit time as a function of wavelength on the order of 10 s, which may be attributable to limb asymmetries in the atmospheric temperature and composition. We show these signatures in Extended Data Fig. 3. Notably, we see a significant timing structure in the 2–3 µm range, which may arise from limb asymmetries in temperature and/or cloud coverage at the altitude probed by the water vapour absorption feature at 2.7 µm (ref. 37). Further analysis of the spectrophotometry could be warranted to investigate limb asymmetries in more detail.

We fit the transit light curves using a quadratic function to model stellar limb darkening given as,

$$\frac{I\left(\mu \right)}{I\left(1\right)}=1-a\left(1-\mu \right)-b{\left(1-\mu \right)}^{2}$$
(1)

where I(1) is the intensity at the centre of the stellar disc, µ = cos(θ) where θ is the angle between the line of sight and the emergent intensity, and a and b are the limb darkening coefficients. We tested a four-parameter non-linear limb darkening function38 as well, which provided equivalent results. In practice, we first fit for both u+ = a + b and u = a − b for the quadratic law. When comparing the limb darkening coefficients to theoretical values, we find an offset between the theoretically derived values of u+ from the 3D stellar models from ref. 39 and the JWST values derived from the transit light curve fits (Extended Data Fig. 5). This offset suggests the limb of WASP-39A is brighter than the stellar models predict. We fit for this offset and find it to be −0.065 ± 0.022. As the wavelength-to-wavelength shape of u+ is well described by the model, we then apply this offset to the theoretical limb darkening coefficients and then subsequently fix u+ while allowing only u to be free (Extended Data Fig. 5). This procedure helps reduce degeneracies when fitting several limb darkening coefficients and increases the precision of the transmission spectrum, as the limb darkening is often not well constrained, particularly at long wavelengths where the limb darkening is weak39 (Extended Data Fig. 5). The main effect of fitting for limb darkening over fixing the coefficients to the 3D models is the transit depth level of the optical spectrum, which is lower with values fixed to the model. We compare the optical spectrum with fixed limb darkening to the HST data from ref. 6 in Extended Data Fig. 6, which was also fit with limb darkening fixed to the same model. Overall, we find good agreement between the two spectra. We note that the assumptions around limb darkening can affect the optical spectra continuum wthat particularly affects the interpreted levels of aerosol scattering: further investigations are warranted.

#### tshirt

We use the tshirt pipeline, for example, ref. 40, to extract an independent set of light curves and spectrum. We begin with the uncalibrated ‘uncal’ data product and apply a custom set of processing steps on stage 1 that build on the existing jwst stage 1 pipeline software v.1.6.0 with reference files CRDS (Calibration Reference Data System) jwst_0930.pmap. We use a custom bias file shared by the instrument team (S. Birkmann, private communication), which is the same file that was delivered to the JWST CRDS.

We attempt to minimize the biasing effects of count rate non-linearity by modifying the quality flags of pixels surpassing 90% of full-well depth at the group stage. To ensure that there are no systematic differences between pixels within the spectral trace and in the background region, we adjust the quality flags uniformly along the entire pixel column at each group for all integrations. We skip the ‘jump’ and ‘dark’ steps of stage 1.

The tshirt code includes a row, odd-even by amplifier correction to reduce 1/f noise. We first identify source pixels by choosing pixels with more than five data numbers per second (DN s−1) in the rate file and expanding this region out by 8 pixels. We then identify background pixels for 1/f corrections by choosing all non-source pixels and pipeline flagged non-‘DO NOT USE’ pixels. We loop through every group and subtract the median of odd (even) row background pixels from all odd (even) rows. We next find a column-by-column median of all background pixels to calculate a 1/f stripe correction and subtract this from each column.

After calculating rate files in DN s−1, we use tshirt to perform covariance-weight extraction of the spectrum31. We do a column-by-column linear background subtraction using pixels 0–7 and 25–32. We use a rectangular source extraction region centred on Y = 16 pixels with a width of 14 pixels. We assume the correlation between pixels to be 8% from previous studies of background pixels31. We use a spline with 30 knots to estimate a smooth spectrum of the star at the source pixels and identify bad pixels as ones that deviate by more than 50σ from the spline. Pixels that are more than 50σ or else marked as DO NOT USE are flagged and then the spatial profile is interpolated over those pixels. No corrections were made to the centroid or wavelength solution because of the exceptional pointing stability of the observatory41.

When fitting the light curves, we exclude all time samples between UT 2022-07-10T23:20:01 and 2022-07-10T23:21:08 to avoid the effects of the high-gain antenna move. We first fit the broadband light curve with all wavelengths. We assume zero eccentricity and the orbital parameters from ref. 34 for a/R and period. We try fitting the white light curve with eccentricity and argument of periastron set free and find that eccentricity is consistent with 0. We therefore assume zero eccentricity and a transit centre projected to the time of observations from a fit to the TESS data. We also assume an exponential temporal baseline in time to the data and a second-order polynomial trend in time. We fit the quadratic limb darkening parameters with uninformative priors42 and the exoplanet code43,44,45 with 3,000 burn-in steps and 3,000 sampling steps and two No U Turns Sampling chains46. We next binned the spectra into 116 bins, each 4 pixels wide. We fit all the individual spectroscopic channels with the orbital parameter fixed from the broadband light curve fit and only allowed the transit depth and limb darkening parameters to be free. Our resulting transit depth uncertainties ranged from 35 to 732 ppm, with a median of 90 ppm.

#### Eureka! and ExoTEP

We use the Eureka! pipeline47 for the data reduction steps of detector processing, data calibration and stellar spectrum extraction, and the ExoTEP pipeline48,49,50 to generate light curves in each wavelength bin and perform light curve fitting.

We start our data reduction using the uncalibrated uncal outputs of the jwst pipeline’s stage 0. From there, Eureka! acts as a wrapper for the first two stages of the jwst pipeline, v.1.6.0. We use the jwst pipeline to fit slopes to the ramp in each pixel and perform data calibration, and follow the default pipeline steps unless otherwise stated. We skip the jump detection step, meant to correct the ramps for discontinuities in the slopes of group count rates as a function of time. Owing to the small number of groups up the ramp, performing this step leads to a large fraction of the detector pixels being incorrectly flagged as outliers and we therefore rely on the time series outlier-clipping steps in the subsequent stages to correct for cosmic rays. A custom bias frame is used, rather than the default one available on CRDS at the time of reduction. We also expand the saturation flags in stage 1 to ignore saturated pixels more conservatively than allowed by the default jwst pipeline settings: in each group, we flag pixels as saturated if they reach roughly 85% of the full well in the median image across all integrations for that group and expand the saturation flag such that in a given detector column (constant wavelength) all pixels are marked as saturated if any one pixel in that column is flagged. This is implemented by inputting the indices of columns to mask on the basis of an inspection of the uncal data products, rather than an internal calculation of the full well percentage. We include a version of the row, odd-even by amplifier correction described above, using the top and bottom six rows. We further add a custom background correction at the group level before ramp fitting, and subtract from each column the median of the six pixels at the top and at the bottom of the detector, excluding outliers at more than the 3-σ level. We skip the ‘photom’ step in stage 2 of the STScI detector pipeline because absolute fluxes are not needed in our analysis. We also skip the ‘extract1d’ step as we perform custom spectral extraction using Eureka!.

For 1D spectral extraction, we trim the array to include only columns 14 to 495 in the dispersion direction, as NIRSpec’s throughput is negligible beyond this range. We then use the median detector frame to construct the weights used in the optimal extraction based on ref. 51. Pixels are masked if they have an marked data quality flag (that is, bad pixels that are flagged by the jwst pipeline as ‘DO NOT USE’ for various reasons) or if they are clipped by two iterations of 10-σ-clipping of the time series. We perform the optimal extraction over eight rows centred on the source position (corresponding to a spectral half-width aperture of 4 pixels). The source position is identified from the maximum of a Gaussian fitted to the summed spatial profile from all detector columns over the entire integration.

We use ExoTEP to generate median-normalized light curves at the native pixel resolution from each detector column, using the stellar spectra outputs from stage 3 of Eureka!. We then perform further clipping of outliers in time in the white and wavelength-dependent light curves by computing a running median with a window size of 20 and excluding 3σ outliers in several time series. This outlier-clipping was applied to the flux, source position and width in the cross-dispersion direction in each frame and spectrum shifts in the dispersion direction.

We jointly fit astrophysical and systematics model parameters to the white (0.5–5.5 µm) light curves and each of the wavelength-dependent light curves. Our astrophysical transit model is calculated using the batman package33. Using the white light curve, we fit for the two coefficients of a quadratic limb darkening law (equation (1)), WASP-39b’s impact parameter, scaled semimajor axis a/R*, time of transit centre and the planet-to-star radius ratio. In each of the wavelength channels we then fix the planet’s impact parameter, semimajor axis and transit time to the values derived from the white light curve and fit only for the planet-to-star radius ratio and the two quadratic limb darkening coefficients. For the systematics model, we assume a linear trend with time that can be different in each spectroscopic channel, and fit for its slope and y intercept. Last, we fit a single-point scatter to each light curve, which illustrates the level of scatter required for our joint model to reach a reduced chi-squared of 1. The fitted light curve scatter in both the white light curve and wavelength-dependent channels is within a few percent of the expectation from the high-frequency scatter in the raw light curves, which attests to the lack of systematics. We bin the final transmission spectrum (four points binned together throughout the spectrum) for visual comparison with the other reductions in Fig. 3.

#### Tiberius

The Tiberius pipeline builds on the LRG-BEASTS spectral reduction and analysis pipelines introduced in refs. 16,52,53. The Tiberius pipeline operates on the stage 1 JWST data products to obtain 1D stellar spectra through tracing of the stellar spectra, fitting and removal of the background noise and simple aperture photometry. We used the FIREFLy-processed stage 0 data.

Before tracing the spectra, we interpolate each column of the detector onto a finer grid, 10× the initial spatial resolution. This step improves the extraction of flux at the subpixel level, particularly where the edges of the photometric aperture bisect a pixel, and leads to a 14% reduction in the noise in the data. We also interpolate over the bad pixels using their nearest neighbouring pixels in x and y. We identify bad pixels by combining 5σ outlying pixels found by means ofrunning medians operating along the pixel rows with bad pixels identified by visual inspection. We trace the spectrum by fitting a Gaussian distribution at each column (in which a column refers to the cross-dispersion direction) to the stellar spectra. We then use a running median, calculated with a moving box with a width of five data points, to smooth the measured centres of the trace. We fit these smoothed centres with a fourth-order polynomial, removed five median absolute deviation outliers and refitted with a fourth-order polynomial.

To remove residual background flux not captured by the 1/f correction, we fit a linear polynomial along each column in the spatial direction. We mask the stellar spectrum, defined by an aperture with a full width of 4 pixels centred on the trace we found in the previous step, from this background fit. We also mask an extra 7 pixels on either side of this aperture so that the background fit is not affected by the wings of the stellar point spread function. This left us with 7 pixels at each edge of the detector (a total of 14 pixels) with which to estimate the background. We also clipped any pixels within the background that deviate by more than three standard deviations from the mean for that particular column and frame to avoid residual bad pixels and cosmic rays affecting our background estimation. We found that this extra background step led to a 3% improvement in the precision of the transmission spectrum.

The stellar spectra are then extracted by summing the flux within a 4-pixel-wide aperture following the removal of the background at each column. The background count level, as estimated by the JWST Exposure Time Calculator is on the order of a few counts per second, meaning the background is negligible. Further, because we perform 1/f subtraction, this faint background is subtracted column-by-column. We experimented with the choice of the aperture width, also running reductions with 8- and 16-pixel-wide apertures. The 8-pixel-wide aperture gave a median uncertainty 1% larger than a 4-pixel aperture and a 16-pixel aperture gave an uncertainty 15% larger than 4 pixels. This same change was reflected in the median root mean square of the residuals to the light curve fits. As the stellar point spread function is so narrow in PRISM data, we believe that the increase in noise with increasing aperture width is related to the increasing influence of photon noise, read noise and bad pixels where the stellar flux is lower. Following the extraction of the stellar spectra, we divide the measured count rates by a factor of ten to correct for our pixel oversampling, as described above.

To remove residual cosmic rays, we identify outliers in each stellar spectrum through comparison with the median stellar spectrum. We did this in three iterations, each of which involves making a median spectrum, identifying outliers (10, 9, 8 σ) and replacing pixels containing a cosmic ray with a linear interpolation between neighbouring pixels. We tested this interpolation against assigning the cosmic ray pixels zero weight and found that this led to a negligible difference in the transmission spectrum. To correct for shifts in the stellar spectra and align each spectrum in pixel space, we cross-correlate each stellar spectrum with the first spectrum of the observation and linearly resample each spectrum onto a common wavelength grid. We adopt the custom wavelength solution calculated by the tshirt pipeline, which uses the jwst pipeline to evaluate the wavelengths at pixel row 16 using the world coordinate system.

Our white light curves are created by summing over the full wavelength range between 0.518 and 5.348 µm. We make two sets of spectroscopic light curves: one set of 440 light curves at 1-pixel resolution and one set of 147 light curves at 3-pixel resolution. We mask integrations 20,751–20,765 because of a high-gain antenna move that leads to increased noise in the light curves. We also mask the first 2,000 integrations from our analysis because of a systematic ramp. This means our light curves each contained 19,486 data points.

To fit our light curves, we began by fitting the white light curve to determine the system parameters.

We fit for the following parameters: the scaled planetary radius (Rp/R*), the planet’s orbital inclination (i), the time of mid-transit (TC), the scaled separation (a/R*), the linear limb darkening coefficient (u1) and the parameters defining the systematics model. We fix the planet’s orbital period to 4.0552941d and eccentricity to 0 (ref. 34). For the remaining parameters, we use the values from ref. 34 as initial guesses.

For the analytic transit light curve model, we use batman33 with a quadratic limb darkening law. We use ExoTiC-LD54,55, with 3D stellar models39 to determine the appropriate limb darkening coefficients (LDs), adopting the stellar parameters (Teff = 5,512 ± 55 K, log g = 4.47 ± 0.03 cgs, [Fe/H] = 0.01 ± 0.09 dex) from ref. 34 and Gaia DR3 (refs. 56,57). For our final fits, we fix the quadratic coefficient, u2, to the values determined by ExoTiC-LD. However, we also run a set of fits with neither u1 nor u2 fixed and find this leads to a transmission spectrum that is qualitatively similar to the one in which LDs are fixed. For the systematics model, we sum the following three polynomials: quadratic in time, linear in x position of the star on the detector and linear in y position of the star on the detector. The final fit model, M, was of the form:

$$M(t)=T(t,p)\times ({\sum }_{i}({S}_{i}{({a}_{i},s)}^{ni}))$$
(2)

where t is time, p are the parameters of the transit model, T, a are the ancillary data and s are the parameters (polynomial coefficients) of the systematics model, S. The systematics model is the sum of the polynomials operating over each ancillary input, ai, with ni defining the order of the polynomial used for each input.

We fit our white light curve in three steps: a first fit to remove any 4σ outliers from the light curves, a second fit that is used to rescale the photometric uncertainties such that the best-fitting model gives χν2 = 1 and a third fit with the rescaled photometric uncertainties, from which our final parameter values and uncertainties are estimated. The parameter uncertainties were calculated as the standard deviation of the diagonal of the covariance matrix that was in turn calculated from the Jacobian returned by scipy.optimize.

Following the white light curve, we fit our spectroscopic, wavelength-binned, light curves. For these fits, we held a/R*, i and TC fixed to the values determined from the white light curve fit: 11.462 ± 0.014, 87.847 ± 0.015°, 2,459,770.835623 ± 0.000008 Barycentric Julian Date Dynamical Time (BJDTDB.). These values are different from the FIREFLy-reduced white light parameters, and these differences will be explored in greater detail in a future work. To zeroth order, offsets in orbital parameters result in simple vertical offsets in the resulting transmission spectrum. The remaining fit parameters were the same as for the white light curve fit. We perform the same iteration of fits using a Levenberg–Marquardt algorithm to determine Rp/Rs as a function of wavelength.

### Reduction comparison

Procedural differences exist across the four main reductions of the dataset, which may account for the subtle qualitative differences between the final reduced spectra. A careful investigation of these nuances is warranted and will be presented in a future paper. Extended Data Table 2 highlights some key procedural differences between the reductions. We note that, despite these differences, the resulting exoplanet spectra are qualitatively in excellent agreement with each other (Fig. 3) because of the stability of the data and the self-calibrating nature of the transit technique.

### Stellar activity

WASP-39b has a reported low activity level8, with a Ca II H and K stellar activity index of logRHK = −4.994 (ref. 4). NGTS and TESS photometric monitoring of WASP-39A is reported in ref. 22, which finds low modulations at the 0.06% level with no apparent star-spot crossings. With low stellar activity levels, the transit observations are unlikely to be affected by stellar activity.

### Forward model grids

We use four different 1D RCTE model grids to assess atmospheric properties such as detection of individual gases, metallicity, carbon-to-oxygen (C/O) elemental abundance ratio, and the presence/absence of clouds. The ScCHIMERA58,59, PICASO 3.0 (refs. 60,61,62,63), ATMO54,64,65 and PHOENIX66,67 models were used to generate these grids specifically for WASP-39b. Whereas the ATMO and the PHOENIX grids were used to fit the data with a reduced χ2 based grid search method, the PICASO 3.0 and ScCHIMERA grids were used in a grid retrieval framework using a nested sampler68,69. Within each nested sample likelihood calculation, the transmission spectra are generated on-the-fly by postprocessing the precomputed 1D RCTE model atmospheres. The SO2 volume mixing ratio and cloud properties are injected into spectrum during this postprocessed transmission calculation. Extended Data Fig. 7 shows best-fit models obtained by each of the four grids compared with the transmission spectrum obtained with the FIREFLy data reduction pipeline. ScCHIMERA, PICASO 3.0 and ATMO produce fits with reduced χ2 between 3.2 and 3.3, while the PHOENIX grid obtains a reduced χ2 of 4.3. The reduced χ2 is defined as the total χ2 calculated from all the data points divided by the total number of data points. Although PICASO 3.0, ScCHIMERA and ATMO predict the metallicity of the atmosphere to be about 10× solar, PHOENIX finds a best-fit metallicity to be a 100× solar that might be due to the larger grid spacing of the PHOENIX grid along both the cloud and metallicity dimensions. Although the models qualitatively match the data, the reduced χ2 obtained by the best-fitting models from these grids are also greater than three, which suggests that these are not fitting the data particularly well. These poor fits could arise for many reasons, such as the region of the data affected by saturation, the presence of disequilibrium chemistry in the atmosphere due to vertical mixing or photochemistry and the non-grey nature of scattering in the upper atmosphere. Extended Data Table 3 provides a summary of the best-fit atmospheric parameters obtained by the four different grids with different fitting methods (grid retrievals and grid search). To explore the effect of the saturated region on the best-fit parameters, we inflate the transit depth errors in the saturated regions (0.68–1.91 µm) by a factor of 1,000 and recompute the best-fit models using the grid retrieval framework with both the PICASO v.3.0 and ScCHIMERA grids. We find that this did not significantly change any of the best-fit parameters including the metallicity and the C/O ratio. Extended Data Table 3 lists the best-fit parameters obtained when the saturated region error bars were inflated by a factor of 1,000. We summarize the main results obtained by these 1D grids here and refer the reader to ref. 29 for detailed descriptions of each of these model grids.

### Detection significance of gases

We quantify the detection significance of each species through a Bayes factor analysis, for example ref. 70. To do so within the ScCHIMERA grid retrieval framework, we remove each gas during the transmission spectrum computation step (the 1D RCTE atmosphere models remain unchanged) one at a time and re-run the nested sampler. We compare the Bayesian evidence of each removed-gas run to that of grid retrieval with all the gases. There is no change in the number of parameters except the cloud and SO2 mixing ratio parameters. Extended Data Table 4 shows the result of this exercise summarized as the log-Bayes factor and a conversion to the detection significance: for example ref. 71.

We also quantify the detection significances of different gases following the procedure used in ref. 29. To calculate the detection significance of each gas, the best-fit transmission spectrum model from the PICASO 3.0 grid ([M/H] = +1.0, C/O = 0.68) is recalculated without that gas. The wavelength ranges in which the particular gas has the most prominent effect are first identified and then a residual spectrum is calculated by subtracting the model without the gas from the data. The residual spectra for H2O, CO2, CO, Na, SO2 and CH4 are shown in the six panels of Extended Data Fig. 8. We fit each of these residual spectra with two functions, a Gaussian–double Gaussian–Voigt function and a constant line. We use the Dynesty nested sampling routine68 to perform the fits and to determine the Bayesian evidence associated with each fit. The Bayes factor between the fits of the residual spectrum with the Gaussian–Voigt function and the constant line is then used to determine the detection significance of a gas. For example, for computing the detection significance of H2O, two adjacent H2O features between 1 and 2.2 µm are used. We note that H2O is expected to be the dominant opacity source in other wavelength ranges (for example, 2.2–3 µm) as well, so choosing two features for this analysis would produce a lower limit on the detection significance of H2O. The best-fit double Gaussian function to these features along with its 1σ and 2σ envelopes are shown with the red line and shaded regions in Extended Data Fig. 8 top-left panel. The same residual spectrum is also fitted with a straight line shown with blue colour in Extended Data Fig. 8. The logarithm of the Bayes factor between the two models is found to be lnB = 242, which shows that the model with H2O is significantly favoured over a model without any H2O. The detection significance of H2O corresponding to this Bayes factor is calculated using the prescription in ref. 71 and is found to be 22σ. The same methodology, but with a single Gaussian function, is also followed for CO2, CO, SO2, H2S and CH4 to get their detection significance summarized in Extended Data Table 4, in the last column. Our Gaussian residual fit significance for CO2 matches the initial analysis of the NIRSpec PRISM data presented in ref. 29.

As shown in Extended Data Table 4, the detection significance of all gases increases with the Bayes factor analysis technique relative to the Gaussian–Voigt function technique. This is notably also the case for SO2, lending confidence to the detection and identification of the molecule, as the feature is better fit by its respective opacity profile.

### Resolution bias and the detection significance of CO

The resolution-linked bias effect serves to dilute the measured amplitudes of planetary atmospheric features because of overlapping absorption lines in the stellar atmosphere. Although this effect is negligible for most stars earlier than M dwarfs, some stellar CO absorption is expected in WASP-39, meaning the measured planetary CO abundance may be biased. Following equation 4 in ref. 72 and using high-resolution (R of roughly 105) PHOENIX models of the planet and the star, we quantify an upper limit on the magnitude of this bias effect. We find that the planetary CO feature is biased by 30 to 40 ppm in the 4.5–5.1 µm region, leading to as much as a roughly 1 − σ underestimate of the planetary CO absorption strength, and subsequently a similar underestimate of its abundance. We note that this effect is potentially weakened by Doppler broadening of the molecular lines (which is unaccounted for by PHOENIX) because of stellar rotation, planetary orbital radial velocity and planetary winds. Future work, which may benefit from more detailed modelling and high-resolution observations of WASP-39’s CO band heads, will better quantify the magnitude of this dilution.

### Metallicity, C/O ratio and CH4 abundance

The best-fitting atmospheric metallicity for WASP-39b is found to be roughly ten times the solar metallicity using the model grids. The top panel in Extended Data Fig. 9 shows the observed transmission spectrum of the planet between 2.0 and 5.3 µm (in which variations due to metallicity are most prominent), along with several transmission spectrum models assuming different atmospheric metallicities ranging from subsolar values (for example, 0.3× solar) to super-solar values (for example, 100× solar). The bottom panel demonstrates the effect of different atmospheric C/O ratios at ten times solar metallicity on many transmission spectrum models along with the data. As the star WASP-39 has near-solar elemental abundances73, scaled solar abundances are a reasonable choice for this star. The CH4 feature between 3.1–4 and 2.2–2.5 µm is very prominent in subsolar and solar metallicity thermochemical equilibrium models shown in Extended Data Fig. 9. The absence of such a CH4 feature in the data is evident. This, combined with the large CO2 feature between 4.3 and 4.6 µm and measurable CO feature at 4.7 µm, led to a super-solar (10×) metallicity estimate for the planet. The C/O ratio of the RCTE models significantly affects the predicted gas abundances, and therefore the calculated transmission spectrum. Extended Data Fig. 9 bottom panel shows that for metal-rich atmospheres (for example, >10× solar) with C/O ratios lower than 0.7, the transmission spectrum is dominated by features of oxygen-bearing gases (H2O, CO2, CO): for example, refs. 65,74,75. But for higher C/O ratios (for example, 0.916), the transmission spectrum becomes CH4 dominated at wavelengths greater than 1.5 µm. We obtain an upper limit on the C/O ratio of WASP-39b at about 0.7. However, these interpretations are based on single-best fits from model grids assuming thermochemical equilibrium. Other chemical disequilibrium processes such as atmospheric mixing and high-energy stellar radiation-induced photochemistry can also potentially affect this interpretation. These disequilibrium chemistry effects require further exploration in the context of WASP-39b and will be discussed in future work (Welbanks et al. (in prep), Tsai et al. (submitted)).

The best-fitting metallicity models can be used to place an upper limit on the CH4 abundance, if the pressure ranges probed by the transmission spectrum are estimated. To estimate the pressure ranges probed by the data, we use the best-fit PICASO 3.0 model to calculate a pressure- and wavelength-dependent transmission contribution function of the atmosphere76. This contribution function for the best-fit 10× solar metallicity PICASO v.3.0 model is shown as a heat map in Extended Data Fig. 10. This shows that the data mostly probes pressure ranges between 0.1 and 2 mbar. We also computed contribution functions for models with solar metallicity and find that they probe similar pressure ranges as well. Extended Data Fig. 10 also shows the pressure dependent CH4 abundances in models with different metallicities presented in Extended Data Fig. 9 top panel. As only super-solar metallicity thermochemical equilibrium models are preferred by the data, the abundance profiles in Extended Data Fig. 10 help us in putting an upper limit of 5 × 10−6 on the CH4 volume mixing ratio between 0.1 and 2 mbar.

### Clouds

The observed spectrum shows slightly muted transit depths, across the entire wavelength range, compared with the depths expected from clear atmospheric models. This hints towards some extra opacity source in the atmosphere with weak wavelength dependence. Opacity sources such as clouds can mute the spectral features in a transmission spectrum2,4. We postprocess the transmission spectrum models with grey (that is, wavelength-independent) cloud opacities to check whether they are preferred over clear atmospheric models by the data. However, the treatment of clouds differ between the four 1D RCTE model grids. PICASO 3.0 and ScCHIMERA grids implemented the cloud opacities using the following equation,

$${\tau }_{i,{\rm{cld}}}={\kappa }_{{\rm{cld}}}\frac{\delta {P}_{i}}{g}$$
(3)

where τi,cld is the cloud optical depth of the ith atmospheric layer in the model with pressure width δPi and g represents the gravity of the planet. The best-fit value of the grey cloud opacity κcld = 10−2.07 cm2 g−1 is calculated in a Bayesian framework by postprocessing the RCTE model grid with this cloud opacity and comparing these postprocessed models with the data. The ATMO grid includes grey cloud decks at several pressures between 1 and 50 mbar, but with variable factors 0, 0.5, 1 and 5 governing cloud opacity with respect to H2’s scattering cross-section at 0.35 µm, where a factor 0 indicates a cloud-free model spectrum. The PHOENIX grid includes similar cloud decks but between 0.3 and 10 mbar with cloud optical depth enhancement factors (identically defined as the ATMO grid) 0 and 10. We find that the cloudy models better fit the data than clear models across all four model grids. The contribution of clouds in limiting the depths of the gaseous features across the entire wavelength range is also shown in Fig. 4 with the grey shaded region.

### 4 µm SO2 feature identification

None of the 1D RCTE models capture the 4 µm absorption feature seen in the data. We searched for several candidate gas species that could produce this feature if their abundances differ from the expected abundances from thermochemical equilibrium. The list of searched chemical species include C-bearing gases such as C2H2, CS, CS2, C2H6, C2H4, CH3, CH, C2, CH3Cl, CH3F, CN and CP. Various metal hydrides, bromides, flourides and chlorides such as LiH, AlH, FeH, CrH, BeH, TiH, CaH, HBr, LiCl, HCl, HF, AlCl, NaF and AlF were also searched as potential candidates to explain the feature. SO2, SO3, SO and SH are among the sulfur-based gases that were considered. Other species that were considered include gases such as PH3, H2S, HCN, N2O, GeH4, SiH4, SiO, AsH3, H2CO, H+3, OH+, KOH, Brα-H, AlO, CN, CP, CaF, H2O2, H3O+, HNO3, KF, MgO, PN, PO, PS, SiH, SiO2, SiS, TiO and VO.

Among all these gases, SO2 was the most promising candidate in terms of its spectral shape and chemical plausibility, although the expected chemical equilibrium abundance of SO2 is too low to produce the absorption signal seen in the data. However, previous work exploring photochemistry in exoplanetary atmospheres25,26 have shown that higher amounts of SO2 can be created in the upper atmospheres of irradiated planets through photochemical processes. Therefore, we postprocess the PICASO 3.0 and ScCHIMERA chemical equilibrium models with varying amounts of SO2 in a Bayesian framework to estimate the SO2 abundance required to explain the strength of the 4-µm feature. The required volume mixing ratio of SO2 was found to be roughly 10−5–10−6. Note that in obtaining this estimate we assumed that the SO2 volume mixing ratio does not vary with pressure for simplicity. In a photochemical scenario this assumption is probably not realistic, although the pressure range probed by SO2 is also limited. Whether photochemical models can produce this amount of SO2 in the atmospheric conditions of WASP-39b is a pressing question that the ERS team is now exploring (Welbanks et al. (in prep), Tsai et al. (submitted)). Whether this feature can be better explained by any other gaseous absorber is also at present under investigation by the ERS team.