## Main

We obtained a single-transit observation of WASP-39b using the NIRSpec11,12 G395H grating on 30–31 July 2022 between 21:45 and 06:21 UTC using the Bright Object Time Series mode. WASP-39b is a hot (Teq = 1,120 K), low-density giant planet with an extended atmosphere. Previous spectroscopic observations have shown prominent atmospheric absorption by Na, K and H2O (refs. 3,4,13,14,15), with tentative evidence of CO2 from infrared photometry4. Atmospheric models fitted to the spectrum have inferred metallicities (amount of heavy elements relative to the host star) from 0.003 to 300 times solar3,15,16,17,18,19,20, which makes it difficult to ascertain the formation pathway of the planet21,22. The host, WASP-39, is a G8-type star that shows little photometric variability23 and has nearly solar elemental abundance patterns24. The quiet host and extended planetary atmosphere make WASP-39b an ideal exoplanet for transmission spectroscopy25. The transmission spectrum of WASP-39b was observed as part of the JWST Transiting Exoplanet Community Director’s Discretionary Early Release Science (JTEC ERS) Program26,27 (ERS-1366; principal investigators Natalie M. Batalha, Jacob L. Bean and Kevin B. Stevenson), which uses four instrument configurations to test their capabilities and provide lessons learned for the community.

The NIRSpec G395H data were recorded with the 1.6″ × 1.6″ fixed slit aperture using the SUB2048 subarray and NRSRAPID readout pattern, with spectra dispersed across both the NRS1 and NRS2 detectors. Over the roughly 8-h duration of the observation, a total of 465 integrations were taken, centred around the 2.8-h transit. We obtained 70 groups per integration, resulting in an effective integration time of 63.14 s. During the observation, the telescope experienced a ‘tilt event’, a spontaneous and abrupt change in the position of one or more mirror segments, causing changes in the point spread function (PSF) and hence jumps in flux28. The tilt event occurred mid-transit, affecting approximately three integrations and resulted in a noticeable step in the flux time series, the size of which is dependent on wavelength (Fig. 1 and Methods). The tilt event also affects the PSF, with the full width at half maximum (FWHM) of the spectral trace showing a step-function-like shape (see Extended Data Figs. 2 and 3).

We produced several reductions of the observations using independent analysis pipelines (see Methods). For each reduction, we created broadband and spectroscopic light curves in the ranges 2.725–3.716 μm for NRS1 and 3.829–5.172 μm for NRS2 using 10-pixel-wide bins (≈0.007 μm, median resolution R ≈ 600), excluding the detector gap between 3.717–3.823 μm. The light curves show a settling ramp during the first ten integrations (≈631.4 s), with a linear slope across the entire observation for NRS1. We otherwise see no substantial systematic trends and achieve improvements in precision from raw uncorrected to fitted broadband light curves of 1.63 to 1.03 times photon noise for NRS1 and 1.95 to 1.31 times for NRS2. The flux jump caused by the mirror-tilt event could be corrected by detrending against the spectral trace x and y positional shifts, normalizing the light curves or fitting the light curves with a step function (see Methods). We produced several fits from each set of light curves, resulting in a total of 11 independently measured transmission spectra. Figure 1 demonstrates that our spectroscopic light curves achieve precisions close to photon noise, with a median precision of 1.46 times photon noise across the full wavelength range (see Extended Data Fig. 4).

We show transmission spectra from several combinations of independent reductions and light-curve-fitting routines in Fig. 2, along with the weighted average of all 11 transmission spectra with the unweighted mean uncertainty produced by our analyses (see Methods). We find that using different combinations of reduction and fitting methods results in consistent transmission spectra (see Methods and Extended Data Fig. 5). Although we see some artefacts at the edges of the detectors (see Fig. 3, bottom panel) that may be caused by uncharacterized systematics, these only affect a small number of wavelength bins. Our resulting averaged NIRSpec G395H spectrum shows increased absorption towards bluer wavelengths short of 3.7 μm and a prominent absorption feature between 4.2 and 4.5 μm, along with a smaller-amplitude absorption feature at 4.1 μm and a narrow feature around 4.56 μm.

We compared the weighted-average G395H transmission spectrum to three grids of 1D radiative–convective–thermochemical equilibrium (RCTE) atmosphere models of WASP-39b (see Methods and Extended Data Table 2), containing a total of 10,308 model spectra. The best-fit models from each grid provide a reduced chi-square per data point (χ2/N) of 1.08–1.20 for our 344-data-point transmission spectrum (Fig. 3). The increased absorption at blue wavelengths across NRS1 is consistent with absorption from H2O (at 21.5σ; see Methods), whereas the large bump in absorption between 4.2 and 4.5 μm (ref. 29) can be attributed to CO2 (28.5σ). H2O and CO2 are expected atmospheric constituents for near-solar atmospheric metallicities, with the CO2 abundance increasing nonlinearly with higher metallicity30. The spectral feature at 4.56 μm (3.3σ) is unidentified at present but does not correlate with any obvious detector artefacts and is reproduced by several independent analyses. The absorption feature at 4.1 μm is also not seen in the RCTE model grids. After an exhaustive search for possible opacity sources (S.-M. Tsai et al., manuscript in preparation), described in the corresponding NIRSpec PRISM analysis31, we interpret this feature as SO2 (4.8σ), as it is the best candidate at this wavelength.

Although SO2 would have volume mixing ratios (VMRs) of less than 10−10 throughout most of the observable atmosphere in thermochemical equilibrium, coupled photochemistry of H2S and H2O can produce SO2 on giant exoplanets, with the resulting SO2 mixing ratio expected to increase with increasing atmospheric metallicity32,33,34. We find that a VMR of approximately 10−6 of SO2 is required to fit the spectral feature at 4.1 μm in the transmission spectrum of WASP-39b, consistent with lower-resolution NIRSpec PRISM observations of this planet31 and previous photochemical modelling of super-solar metallicity giant exoplanets34,35. Figure 4 shows a breakdown of the contributing opacity sources for the lowest χ2/N best-fit model (PICASO 3.0) with VMR = 10−5.6 injected SO2. The inclusion of SO2 in the models results in an improved χ2/N and is detected at 4.8σ (see Methods), confirming its presence in the atmosphere of WASP-39b.

We also look for evidence of CH4, CO, H2S and OCS (carbonyl sulfide) because their near-solar chemical equilibrium abundances could result in a contribution to the spectrum. We see no evidence of CH4 in our spectrum between 3.0 and 3.6 μm (ref. 23), which is indicative of C/O < 1 (ref. 36) and/or photochemical destruction34,37. With regards to CO, H2S and OCS, we were unable to conclusively confirm their presence with these data. In particular, CO, H2O, OCS and our modelled cloud deck all have overlapping opacity, which creates a pseudo-continuum from 4.6 to 5.1 μm (see Figs. 3 and 4). Therefore, we were unable to unambiguously identify the individual contributions from CO and other molecules over this wavelength region at the resolution presented in this work.

Our models show an atmosphere enriched in heavy elements, with best-fit parameters ranging from 3 to 10 times solar metallicity, given the spacing of individual model grids (see Methods). The spectra also indicate C/O ratios ranging from sub-solar to solar depending on the grid used, informed by the relative strength of absorption from carbon-bearing molecules to oxygen-bearing molecules. The interpretation of the relatively high resolution and precision of the G395H spectrum seems to be sensitive to the treatment of aerosols in the model, with one grid preferring 3 times solar metallicity when using a wavelength-dependent cloud opacity and physically motivated vertical cloud distribution38 but 10 times solar metallicity when assuming a grey cloud. In general, forward model grids fit the main features of the data but do not place statistically significant constraints on many of the atmospheric parameters (see Methods). Future interpretation of the JTEC ERS WASP-39b data with Bayesian retrieval analyses will provide robust confidence intervals for these planetary properties and explore the degree to which these data are sensitive to modelling assumptions (for example, chemical equilibrium versus disequilibrium) and parameter degeneracies (for example, clouds versus atmospheric metallicity).

We are able to strongly rule out an absolute C/O ≥ 1 scenario (χ2/N ≥ 3.97), which has previously been proposed for gas-dominated accretion at wide orbital radii beyond the CO2 ice line at which the gas may be carbon-rich39. Our C/O upper limit, therefore, suggests that WASP-39b may have either formed at smaller orbital radii with gas-dominated accretion or that the accretion of solids enriched the atmosphere of WASP-39b with oxygen-bearing species2. The level of metal enrichment (3–10 times solar) is consistent with similar measurements of Jupiter and Saturn40,41, potentially suggesting core-accretion formation scenarios42, and is consistent with upper limits from interior-structure modelling43. These NIRSpec G395H transmission spectroscopy observations demonstrate the promise of robustly characterizing the atmospheric properties of exoplanets with JWST unburdened by substantial instrumental systematics, unravelling the nature and origins of exoplanetary systems.

## Methods

### Data reduction

We produced several analyses of stellar spectra from the Stage 1 2D spectral images produced using the default STScI JWST Calibration Pipeline44 (‘rateints’ files) and by means of customized runs of the STScI JWST Calibration Pipeline with user-defined inputs and processes for steps such as the ‘jump detection’ and ‘bias subtraction’ steps.

Each pipeline starts with the raw ‘uncal’ 2D images that contain group-level products. As we noticed that the default superbias images were of poor quality, we produced two customized runs of the JWST Calibration Pipeline, using either the default bias step or a customized version. The customized step built a pseudo-bias image by computing the median pixel value in the first group across all integrations and then subtracted the new bias image from all groups. We note that the poor quality of the default superbias images affects NRS1 more notably than NRS2, and this method could be revised once a better superbias is available.

Before ramp fitting, both our standard and custom bias step runs of the edited JWST Calibration Pipeline ‘destriped’ the group-level images to remove so-called ‘1/f noise’ (correlated noise arising from the electronics of the readout pattern, which appears as column striping in the subarray images11,12). Our group-level destriping step used a mask of the trace 15σ from the dispersion axis for all groups within an integration, ensuring that a consistent set of pixels is masked within a ramp. The median values of non-masked pixels in each column were then computed and subtracted for each group.

The results of our customized runs of the JWST Calibration Pipeline are a set of custom group-level destriped products and custom bias-subtracted group-level destriped products. In both cases, the ramp-jump detection threshold of the JWST Calibration Pipeline was set to 15σ (as opposed to the default of 4σ), as it produced the most consistent results at the integration level. In both custom runs of the JWST Calibration Pipeline, all other steps and inputs were left at the default values.

For all analyses, wavelength maps from the JWST Calibration Pipeline were used to produce wavelength solutions, verified against stellar absorption lines, for both detectors. The mid-integration times in BJDTDB were extracted from the image headers for use in producing light curves. None of our data-reduction pipelines performed a flat-field correction, as the available flat fields were of poor quality and unexpectedly removed portions of the spectral trace. In general, we found that 1/f noise can be corrected at either the group or integration levels to similar effect; however, correction at the group level with a repeated column-by-column cleaning step at the integration level probably results in cleaner 1D stellar spectra. This was particularly true for NRS2, owing to the limited number of columns in which the unilluminated region on the detector extends both above and below the spectral trace, as shown in Extended Data Fig. 1.

Below we detail each of the independent data-reduction pipelines used to produce the time series of stellar spectra from our G395H observations.

#### ExoTiC-JEDI pipeline

We used the Exoplanet Timeseries Characterisation - JWST Extraction and Diagnostics Investigator (ExoTiC-JEDI45) pipeline on our custom group-level destriped products, treating each detector separately. Using the data-quality flags produced by the JWST Calibration Pipeline, we replaced any pixels identified as bad, saturated, dead, hot, low quantum efficiency or no gain value with the median value of surrounding pixels. We also searched each integration for pixels that were spatial outliers from the median of the surrounding 20 pixels in the same row by 6σ (to remove permanently affected ‘bad’ pixels) or outliers from the median of that pixel in the surrounding ten integrations in time by 20σ (to identify high-energy short-term effects such as cosmic rays) and replaced the outliers with the median values. To obtain the trace position and FWHM, we fitted a Gaussian to each column of an integration, finding a median standard deviation of 0.7 pixels. A fourth-order polynomial was fitted through the trace centres and the widths, which were smoothed with a median filter, to obtain a simple aperture region. This region extended 5 times the FWHM of the spectral trace, above and below the centre, corresponding to a median aperture width of 7 pixels. To remove any remaining 1/f and background noise from each integration, we subtracted the median of the unilluminated region in each column by masking all pixels that were 5 pixels away from the aperture. For each integration, the counts in each row and column of the aperture region were summed using an intrapixel extraction, taking the relevant fractional flux of the pixels at the edge of the aperture and cross-correlated to produce x-pixel and y-pixel shifts for detrending (see Extended Data Fig. 2). On average, the x-pixel shift represents movement of 1 × 10−4 and 8 × 10−6 of a pixel for NRS1 and NRS2, respectively. The aperture column sums resulted in 1D stellar spectra with errors calculated from photon noise after converting from data numbers using the gain factor. This reduction is denoted hereafter as ExoTiC-JEDI [V1].

We produced further 1D stellar spectra from both the custom group-level destriped product and custom bias-subtracted group-level destriped products using the ExoTiC-JEDI pipeline as described above, but with further cleaning by repeating the spatial outliers step. The reduction with further cleaning using the custom group-level destriped products is hence denoted as ExoTiC-JEDI [V2] and the reduction with further cleaning using the custom bias-subtracted group-level destriped products is hence denoted as ExoTiC-JEDI [V3].

#### Tiberius pipeline

We used the Tiberius pipeline, which builds on the LRG-BEASTS spectral reduction and analysis pipelines15,46,47, on our custom group-level destriped products. For each detector, we created bad-pixel masks by manually identifying hot pixels in the data. We then combined them with pixels flagged as greater than 3σ above the defined background. Before identifying the spectral trace, we interpolated each column of the detectors onto a grid 10 times finer than the initial spatial resolution. This step reduces the noise in the extracted data by improving the extraction of flux at the sub-pixel level, particularly where the edges of the photometric aperture bisect a pixel. We also interpolated over the bad pixels using their nearest-neighbouring pixels in x and y.

We traced the spectra by fitting Gaussians at each column and used a running median, calculated with a moving box with a width of five data points, to smooth the measured centres of the trace. We fitted these smoothed centres with a fourth-order polynomial, removed points that deviated from the median by 3σ and refitted with a fourth-order polynomial. To remove any residual background flux not captured by the group-level destriping, we fitted a linear polynomial along each column, masking the stellar spectrum. This was defined by an aperture with a width of 4 pixels centred on the trace. We also masked an extra 7 pixels on either side of the aperture so that the background was not fitting the wings of the stellar PSF and we clipped any pixels in the background that deviated by more than 3σ from the mean for that particular column and frame. After removing the background in each column, the stellar spectra were then extracted by summing within a 4-pixel-wide aperture and correcting for pixel oversampling caused by the interpolation onto a finer grid, as described above. The uncertainties in the stellar spectra were calculated from the photon noise before background subtraction.

#### transitspectroscopy pipeline

We used the transitspectroscopy pipeline48 on the ‘rateints’ products of the JWST Calibration Pipeline, treating each detector separately. The trace position was found from the median integration by cross-correlating each column with a Gaussian function, removing outliers using a median filter with a 10-pixel-wide window and smoothing the trace with a spline. We removed 1/f noise from the ‘rateints’ products by masking all pixels within 10 pixels from the centre of the trace and calculating and removing the median value from all columns. We then used optimal extraction49 to obtain the 1D stellar spectra, with a 5-pixel-wide aperture above and below the trace. This allowed us to treat bad pixels and cosmic rays that had not been accounted for or masked in the ‘rateints’ products in an automated fashion. To monitor systematic trends in the observations, we also calculated the trace centre as described above and the FWHM for all integrations. The FWHM was calculated at each column and at each integration by first subtracting each column to half the maximum value in it, with a spline used to interpolate the profile. The roots of this profile were then found to estimate the FWHM.

#### Eureka! pipeline

We used two customized versions of the Eureka! pipeline50, which combines standard steps from the JWST Calibration Pipeline with an optimal extraction scheme to obtain the time series of stellar spectra.

The first Eureka! reduction used our custom group-level destriped products and applied Stages 2 and 3 of Eureka! Stage 2, a wrapper of the JWST Calibration Pipeline, followed the default settings up to the flat fielding and photometric calibration steps, which were both skipped. Stage 3 of Eureka! was then used to perform the background subtraction and extraction of the 1D stellar spectra. We started by correcting for the curvature of NIRSpec G395H spectra by shifting the detector columns by whole pixels, to bring the peak of the distribution of the counts in each column to the centre of our subarray. To ensure that this curvature correction was smooth, we computed the shifts in each column for each integration from the median integration frame in each segment and applied a running median to the shifts obtained for each column. The pixel shifts were applied with periodic boundary conditions, such that pixels shifted upwards from the top of the subarray appeared at the bottom after the correction, ensuring no pixels were lost. We applied a column-by-column background subtraction by fitting and subtracting a flat line to each column of the curvature-corrected data frames, obtained by fitting all pixels further than six pixels from the central row. We also performed a double iteration of outlier rejection in time with a threshold of 10σ, along with a 3σ spatial outlier-rejection routine, to ensure that bad pixels were not biasing our background correction. These outlier-rejection thresholds were selected to remove clear outliers in the data and provide a balance with the background subtraction step. We performed optimal extraction using an extraction profile defined from the median frame, the central nine rows of our subarray (four rows on either side of the central row). We also measured the vertical shift in pixels of the spectrum from one integration to the other using cross-correlation and the average PSF width at each integration, obtained by adding all columns together and fitting a Gaussian to the profile to estimate its width. This reduction is henceforth denoted as Eureka! [V1].

The second Eureka! reduction (Eureka! [V2]) used the ‘rateints’ outputs of the JWST Calibration Pipeline and applied Stage 2 of Eureka! as described above, with a modified version of Stage 3. In this reduction, we corrected the curvature of the trace using a spline and found the centre of the trace using the median of each column. We removed 1/f noise by subtracting the mean from each column, excluding the region 6 pixels away from the trace, sigma-clipping outliers at 3σ. We extracted the 1D stellar spectra using a 4-pixel-wide aperture on either side of the trace centre.

### Limb-darkening

Limb-darkening is a function of the physical structure of the star that results in variations in the specific intensity of the light from the centre of the star to the limb, such that the limb looks darker than the centre. This is because of the change in depth of the stellar atmosphere being observed. At the limb of the star, the region of the atmosphere being observed at slant geometry is at higher altitudes and lower density, and thus lower temperatures, compared with the deeper atmosphere observed at the centre of the star, at which hotter, denser layers are observed. The effect of limb-darkening is most prominent at shorter wavelengths, resulting in a more U-shaped light curve compared with the flat-bottomed light curves observed at longer wavelengths. To account for the effects of limb-darkening on the time-series light curves, we used analytical approximations for computing the ratio of the mean intensity to the central intensity of the star. The most commonly used limb-darkening laws for exoplanet transit light curves are the quadratic, square-root and nonlinear four-parameter laws51:

$$\frac{I(\mu )}{I(1)}=1-{u}_{1}\left(1-\mu \right)-{u}_{2}{\left(1-\mu \right)}^{2}$$

Square-root:

$$\frac{I(\mu )}{I(1)}=1-{s}_{1}\left(1-\mu \right)-{s}_{2}\left(1-\sqrt{\mu }\right)$$

Nonlinear four-parameter:

$$\frac{I(\mu )}{I(1)}=1-{c}_{1}(1-{\mu }^{0.5})-{c}_{2}(1-\mu )-{c}_{3}(1-{\mu }^{1.5})-{c}_{4}(1-{\mu }^{2})$$
(1)

in which I(1) is the specific intensity in the centre of the disk, u1, u2, s1, s2, c1, c2, c3 and c4 are the limb-darkening coefficients and μ = cos(γ), in which γ is the angle between the line of sight and the emergent intensity.

The limb-darkening coefficients depend on the particular stellar atmosphere and therefore vary from star to star. For consistency across all of the light-curve fitting, we used 3D stellar models52 for Teff = 5,512 K, log(g) = 4.47 cgs and Fe/H = 0.0, along with the instrument throughput to determine I and μ. As instrument commissioning showed that the throughput was higher than the pre-mission expectations53, a custom throughput was produced from the median of the measured ExoTiC-JEDI [V2] stellar spectra, divided by the stellar model and Gaussian smoothed.

For the limb-darkening coefficients that were held fixed, we used the values computed using the ExoTiC-LD54,55 package, which can compute the linear, quadratic and three-parameter and four-parameter nonlinear limb-darkening coefficients51,56. To compute and fit for the coefficients from the square-root law, we used previously outlined formalisms57,58. We highlight that we do not see any dependence in our transmission spectra on the limb-darkening procedure used across our independent reductions and analyses.

### Light-curve fitting

From the time series of extracted 1D stellar spectra, we created our broadband transit light curves by summing the flux over 2.725–3.716 μm for NRS1 and 3.829–5.172 μm for NRS2. For the spectroscopic light curves, we used a common 10-pixel binning scheme within these wavelength ranges to generate a total of 349 spectroscopic bins (146 for NRS1 and 203 for NRS2). We also tested wider and narrower binning schemes but found that 10-pixel-wide bins achieved the best compromise between the noise in the spectrum and showcasing the abilities of G395H across analyses. In our analyses, we treated the NRS1 and NRS2 light curves independently to account for differing systematics across the two detectors. To construct the full NIRSpec G395H transmission spectrum of WASP-39b, we fitted the NRS1 and NRS2 broadband and spectroscopic light curves using 11 independent light-curve-fitting codes, which are detailed below. When starting values were required, all analyses used the same system parameters37. In many of our analyses, we detrended the raw broadband and spectroscopic light curves using the time-dependent decorrelation parameters for the change in the FWHM of the spectral trace or the shift in x-pixel and y-pixel positions (Extended Data Fig. 2). We also used various approaches to account for the mirror-tilt event, which we found to have a smaller effect at longer wavelengths (Extended Data Fig. 3).

Using fitting pipeline 1, we measured a centre of transit time (T0) of T0 = 2,459,791.612039 ± 0.000017 BJDTDB and T0 = 2,459,791.6120689 ± 0.000021 BJDTDB computed from the NRS1 and NRS2 broadband light curves, respectively; most of the fitting pipelines obtained T0 within 1σ of the quoted uncertainty.

For each of our analyses, we computed the expected photon noise from the raw counts taking into account the instrument read noise (16.18 e on NRS1 and 17.75 e on NRS2), gain (1.42 for NRS1 and 1.62 for NRS2) and the background counts (which are found to be negligible after cleaning) and compared it to the final signal-to-noise ratio in our light curves (see Fig. 1). We also determine the level of white and red noise in our spectroscopic light curves by computing the Allan deviation59, which is used to measure the deviation from the expected photon noise by binning the data into successively smaller bins (that is, fewer data points per bin) and calculating the signal-to-noise ratio achieved60. Extended Data Fig. 4 shows the Allan deviation for three of the 11 reductions performed on the data (see the ExoTiC-ISM noise_calculator function54).

Although there is a general consensus across each of the data analyses, by comparing the results of each fitting pipeline, we were better able to evaluate the impact of different approaches to the data reduction, such as the removal of bad pixels. For future studies, we recommend the application of several pipelines that use differing analysis methods, such as the treatment of limb-darkening, systematic effects and noise removal. No single pipeline presented on its own can fully evaluate the measured impact of each effect, given the differing strategies, targets and potential for chance events such as a mirror tilt with each observation. In particular, attention should be paid to 1/f noise removal at the group versus integration levels for observations with fewer groups per integration than this study.

Below, we detail each of our 11 fitting pipelines and summarise them in Extended Data Table 1.

#### Fitting pipeline 1: ExoTiC-JEDI

We fitted the broadband and spectroscopic light curves produced from the ExoTIC-JEDI [V3] stellar spectra using the least-squares optimizer, scipy.optimize lm (ref. 61). We simultaneously fitted a batman transit model62 with a constant baseline and systematics models for data pre-tilt and post-tilt event, fixing the centre of transit time T0, the ratio of the semi-major axis to stellar radius a/R and the inclination i to the broadband light-curve best-fit values. The systematics models included a linear regression on x and y, for which x and y are the measured trace positions in the dispersion and cross-dispersion directions, respectively. We accounted for the tilt event by normalizing the light curve pre-tilt by the median pre-transit flux and normalizing the light curve post-tilt by the median post-transit flux. We discarded the first 15 integrations and the three integrations during the tilt event. Fourteen-pixel columns were discarded owing to outlier pixels directly on the trace. We fixed the limb-darkening coefficients to the four-parameter nonlinear law.

#### Fitting pipeline 2: Tiberius

We used the broadband light curves generated from the Tiberius stellar spectra and fitted for the ratio of the planet to stellar radii Rp/R, as well as i, T0, a/R, the quadratic law limb-darkening coefficient u1 and the systematics model parameters, the x-pixel and y-pixel shifts, FWHM and sky background, with the period P, the eccentricity e and u2 fixed. We used uniform priors for all the fitted parameters. Our analytic transit light-curve model was generated with batman. We fitted our broadband light curve with a transit + systematics model using a Gaussian process (GP)63,64, implemented through george65, and a Markov chain Monte Carlo method, implemented through emcee66. For our Tiberius spectroscopic light curves, we held a/R, i and T0 fixed to the values determined from the broadband light-curve fits and applied a systematics correction from the broadband light-curve fit to aid in fitting the mirror-tilt event. We fitted the spectroscopic light curves using a GP with an exponential squared kernel for the same systematics detrending parameters detailed above. We used a Gaussian prior for a/R and uniform priors for all other fitted parameters.

#### Fitting pipeline 3: Aesop

We used transit light curves from the ExoTiC-JEDI [V1] stellar spectra and fit the broadband and spectroscopic light curves using the least-squares minimizer LMFIT67. We fitted each light curve with a two-component function consisting of a transit model (generated using batman) multiplied by a systematics model. Our systematics model included the x-pixel and y-pixel positions on the detector (x, y, xy, x2 and y2). To capture the amplitude of the tilt event in our systematics model, we carried out piecewise linear regression on the out-of-transit baseline pre-tilt and post-tilt. We first fit the broadband light curve by fixing P and e and fitting for T0, a/R, i, Rp/R, stellar baseline flux and systematic trends using wide uniform priors. For the spectroscopic light curves, we fixed T0, a/R and i to the best-fit values from the broadband light curve and fit for Rp/R. We held the nonlinear limb-darkening coefficients fixed.

#### Fitting pipeline 4: transitspectroscopy

We fit the broadband and spectroscopic light curves produced from the transitspectroscopy stellar spectra, running juliet68 in parallel with the light-curve-fitting module of the transitspectroscopy pipeline48 with dynamic nested sampling through dynesty69 and analytical transit models computed using batman. We fit the broadband light curves for NRS1 and NRS2 individually, fixing P, e and ω and fitting for the impact parameter b, as well as T0, a/R, Rp/R, extra jitter and the mean out-of-transit flux. We also fitted two linear regressors, a simple slope and a ‘jump’ (a regressor with zeros before the tilt event and ones after the tilt event), scaled to fit the data. We fitted the square-root-law limb-darkening coefficients using the Kipping sampling scheme. We fitted the spectroscopic light curves at the native resolution of the instrument, fixing T0, a/R and b. We used the broadband light-curve systematics model for the spectroscopic light curve, with wide uniform priors for each wavelength bin, and set truncated normal priors for the square-root-law limb-darkening coefficients. We also fitted a jitter term added in quadrature to the error bars at each wavelength with a log-uniform prior between 10 and 1,000 ppm. We computed the mean of the limb-darkening coefficients by first computing the nonlinear coefficients from ATLAS models70 and passing them through the SPAM algorithm71. We binned the data into 10-pixel-wavelength bins after fitting the native-resolution light curves.

#### Fitting pipeline 5: ExoTEP

We fitted the transit light curves from the Eureka! [V1] stellar spectra using the ExoTEP analysis framework72,73,74,75. ExoTEP uses batman to generate analytical light-curve models, adds an analytical instrument systematics model along with a photometric scatter parameter and fits for the best-fit parameters and their uncertainties using emcee. Before fitting, we cleaned the light curves by running ten iterations of 5σ clipping using a running median of window length 20 on the flux, x-pixel and y-pixel shifts and the ‘ydriftwidth’ data product from Eureka! Stage 3 (the average spatial PSF width at each integration). Our systematics model consisted of a linear trend in time with a ‘jump’ (constant offset) after the tilt event. The ‘ydriftwidth’ was used before the fit to locate the tilt event. We used a running median of ‘ydriftwidth’ to search for the largest offset and flagged every data point after the tilt event so that they would receive a constant ‘jump’ offset in our systematics model. We also removed the first point of the tilt event in our fits, as it was not captured by the ‘jump’ model. We fitted the broadband light curves, fitting for Rp/R, photometric scatter, T0, b, a/R, the quadratic limb-darkening coefficients and the systematics model parameters (normalization constant, slope in time and constant ‘jump’ offset). We used uninformative flat priors on all the parameters. The orbital parameters were fixed to the best-fit broadband light curve values for the subsequent spectroscopic light-curve fits.

#### Fitting pipeline 6

We fitted transit light curves from the ExoTiC-JEDI [V1] stellar spectra using a custom lmfit light-curve-fitting code. The final systematic detrending model included a batman analytical transit model multiplied by a systematics model consisting of a linear stellar baseline term, a linear term for the x-pixel and y-pixel shifts and an exponential ramp function. The tilt event was accounted for by decorrelating the light curves with the y-pixel shifts, using a (1 + constant × y-shift) term with the constant fitted for in each light curve. For the broadband light-curve fits, we fixed P and fitted for T0, i, Rp/R, a/R, x-pixel and y-pixel shifts and the exponential ramp amplitude and timescale. We fixed the nonlinear limb-darkening coefficients. For the spectroscopic light-curve fits, we fixed the values of T0, i and a/R and the exponential ramp timescale to the broadband light-curve-fit values, and fitted for Rp/R, the x-pixel and y-pixel shifts and the ramp amplitude. Wide, uniform priors were used on all the fitting parameters in both the broadband and spectroscopic light-curve fits.

#### Fitting pipeline 7

We fitted transit light curves from the Eureka! [V2] stellar spectra, using PyLightcurve (ref. 75) to generate the transit model with emcee as the sampler. We calculated the nonlinear four-parameter limb-darkening coefficients using ExoTHETyS (ref. 76), which relies on PHOENIX 2012–2013 stellar models77,78, and fixed these in our fits to the precomputed theoretical values. Our full transit + systematics model included a transit model multiplied by a second-order polynomial in the time domain. We accounted for the tilt event by subtracting the mean of the last 30 integrations of the pre-transit data from the mean of the first 30 integrations of the post-transit data, to account for the jump in flux, shifting the post-transit light curve upwards by the jump value. We fitted for the systematics (the parameters of the second-order polynomial), Rp/R and T0. We used uniform priors for all the fitted parameters. We adopted the root mean square of the out-of-transit data as the error bars for the light-curve data points to account for the scatter in the data.

#### Fitting pipeline 8

We used the transit light curves generated from the ExoTiC-JEDI [V1] stellar spectra. We fit the broadband light curves with a batman transit model multiplied by a second-order systematics model as a function of x-pixel and y-pixel shifts. We fixed both of the quadratic limb-darkening coefficients for each wavelength bin. We fitted for Rp/R, i, T0 and a/R, using wide uninformed priors, and ran our fits using emcee. For the spectroscopic light-curve fits, we fixed i and a/R to the broadband light-curve best-fit values and fitted for an extra error term added in quadrature.

#### Fitting pipeline 9

We used the transit light curves from the ExoTiC-JEDI [V1] stellar spectra. We fixed both of the quadratic limb-darkening coefficients and fitted the light curves with a batman transit model multiplied by a systematics model of a second-order function of x-pixel and y-pixel shifts. We fixed the best-fit broadband light-curve values for T0, a/R and i for the spectroscopic light-curve fits and fitted for Rp/R using emcee for each 10-pixel bin, with the walkers initialized in a tight cluster around the best-fit solution from a Levenberg–Marquardt minimization. For both the broadband and spectroscopic light curves, we also fit for an extra per-point error term.

#### Fitting pipeline 10

We fitted the transit light curves from the ExoTiC-JEDI [V2] stellar spectra and performed our model fitting using automatic differentiation implemented with JAX (ref. 79). We used a GP systematics model with a time-dependent Matérn (ν= 3/2) kernel and a variable white-noise jitter term. The mean function consists of a linear trend in time plus a sigmoid function to account for the drop in measured flux that occurred mid-transit owing to the mirror-tilt event. For the transit model, we used the exoplanet package80, making use of previously developed light-curve models81,82. For the GP systematics component, a generalization of the algorithm used by the celerite package83 was adapted for JAX. We fixed both of the quadratic limb-darkening coefficients. For the initial broadband light-curve fit, both NRS1 and NRS2 were fitted simultaneously. All transit parameters were shared across both light curves, except for Rp/R, which was allowed to vary for NRS1 and NRS2 independently. We fitted for T0, the transit duration b and both Rp/R values. For the spectroscopic light-curve fits, all transit parameters were then fixed to the maximum-likelihood values determined from the broadband fit, except for Rp/R, which was allowed to vary for each wavelength bin. Uncertainties for the transit model parameters, including Rp/R, were assumed to be Gaussian and estimated using the Fisher information matrix at the location of the maximum-likelihood solution, which was computed exactly using the JAX automatic differentiation framework.

#### Fitting pipeline 11: Eureka!

We used transit light curves from the Eureka! [V2] time-series stellar spectra with the open-source Eureka! code to estimate the best-fit transit parameters and their uncertainties using a Markov chain Monte Carlo method fit to the data (implemented by emcee). A linear trend in time was used as a systematics model and a step function was used to account for the tilt event. We fixed a/R, i, T0 and the time of the tilt event to the best-fit values from the NRS1 broadband light curve, with the three integrations during the tilt event clipped from the light curve. We fitted for Rp/R, both quadratic limb-darkening coefficients, the linear time trend and the magnitude of the step from the tilt event, with uniform priors for both the magnitude of the step and the limb-darkening coefficients.

### Transmission spectral analysis

On the basis of the independent light-curve fits described above, we produced 11 transmission spectra from our NIRSpec G395H observations using several analyses and fitting methods. Extended Data Table 1 shows a breakdown of the different steps used in each fitting pipeline. In this work, three different 2D spectral image products were used, producing seven different 1D stellar spectra. Eleven different fitting pipelines using five different limb-darkening methods were then applied. Each of these fitting pipelines resulted in an independent analysis of the observations and 11 comparative transmission spectra. Extended Data Fig. 5 details comparative information for all 11 analyses to quantify their similarities and differences.

We computed the standard deviation of the 11 spectra in each wavelength bin and compared this to the mean uncertainty obtained in that bin. The average standard deviation in each bin across all fitting pipelines was 199 ppm, compared with an average uncertainty of 221 ppm (which ranged from 131 to 625 ppm across the bins). The computed standard deviation in each bin across all pipelines ranged from 85 to 1,040 ppm, with greater than 98% of the bins having a standard deviation lower than 500 ppm. We see an increase in scatter at longer wavelengths, with the structure of the scatter following closely with the measured stellar flux, for which throughput beyond 3.8 μm combines with decreasing stellar flux. The unweighted mean uncertainty of all 11 transmission spectra follows a similar structure to the standard deviation, with the uncertainty increasing at longer wavelengths. The uncertainties from each fitting pipeline are consistent to within 3σ of each other, with the uncertainty per bin typically overestimated compared with the mean uncertainty across all reductions.

From all 11 transmission spectra, we computed a weighted-average transmission spectrum using the transit depth values from all reductions in each bin weighted by 1/variance (1/σ2, in which σ is the uncertainty on the data point from each reduction). For this weighted-average transmission spectrum, the unweighted mean of the uncertainties in each bin was used to represent the error bar on each point. By using the weighted average of all 11 independently obtained transmission spectra, we therefore do not apply infinite weight to any one reduction in our interpretation of the atmosphere. Although the weighted average could be sensitive to any one spectrum with underestimated uncertainties, we find that our uncertainties are typically overestimated compared with the average. Similarly, we chose to use the mean rather than the median of the transmission spectral uncertainties, as this results in a more conservative estimate of the uncertainties in each bin. We find that all of the 11 transmission spectra are within 2.95σ of the weighted-average transmission spectrum without applying offsets.

We calculated normalized transmission spectrum residuals for each fitting pipeline by subtracting the weighted-average spectrum and dividing by the uncertainty in each bin. We generated histograms of the normalized transmission spectrum residuals and used the mean and standard deviation of the residuals to compute a normalized probability density function (PDF). We performed a Kolmogorov–Smirnov test on each of the normalized residuals and found that they are all approximately symmetric around their means, with normal distributions. This confirms that they are Gaussian in shape, with the null hypothesis that they are not Gaussian strongly rejected in the majority of cases (see Extended Data Fig. 5).

The PDFs of the residuals indicate three distinct clusters of computed spectra based on their deviations from the mean and their spreads. The first cluster is negatively offset by less than 200 ppm and corresponds to fitting pipelines that used extracted stellar spectra and that underwent further cleaning steps (for example, ExoTiC-JEDI [V3]). The second cluster is positively offset from the mean by about 120 ppm and contains most of the transmission spectra produced. We see no obvious trends in this group to any specific reduction or fitting process. The final cluster is centred around the mean but has a broad distribution, suggesting a larger scatter both above and below the average transmission spectrum. This is probably the result of uncleaned outliers or marginal offsets between NRS1 and NRS2. These transmission spectra demonstrate that the 11 independent fitting pipelines are able to accurately reproduce the same transmission spectral feature structures, further highlighting the minimized impact of systematics on the time-series light curves. We suspect that the minor differences resulting from different reduction products and fitting pipelines are linked to the superbias and treatment of 1/f noise. We anticipate that the impacts of these will be improved with new superbias images, expected to be released soon by STScI, and with more detailed investigation into the impact of 1/f noise at the group level beyond the scope of this work.

### Model comparison

To identify spectral absorption features, we compared the resulting weighted-average transmission spectrum of WASP-39b to several 1D RCTE atmosphere models from three independent model grids. Each forward model is computed on a set of common physical parameters (for example, metallicity, C/O ratio, internal temperature and heat redistribution), shown in Extended Data Table 2. Additionally, each model grid contains different prescriptions for treating certain physical effects (for example, scattering aerosols). Although each grid contains different opacity sources from varying line lists (see Extended Data Table 2), they each consider all of the main molecular and atomic species84. Each model transmission spectrum from the grids was binned to the same resolution as that of the observations to compute the χ2 per data point, with a wavelength-independent transit depth offset as the free parameter. In general, the forward model grids fit the main features of the data but are unable to place statistically significant constraints on many of the atmospheric parameters, owing to both the finite nature of the forward model grid spacing13 and the insensitivity of some of these parameters to the 3–5-μm transmission spectrum of WASP-39b (for example, >100 K differences in interior temperature provided nearly identical χ2/N).

#### ATMO

We used the ATMO RCTE grid85,86,87,88, which consists of model transmission spectra for different day–night energy redistribution factors, atmospheric metallicities, C/O ratios, haze factors and grey cloud factors with a range of line lists and pressure-broadening sources88. In total, there were 5,160 models. Within this grid, we find the best-fit model to have 3 times solar metallicity, with a C/O ratio of 0.35 and a grey cloud opacity 5 times the strength of H2 Rayleigh scattering at 350 nm and a χ2/N = 1.098 for N= 344 data points and only fitting for an absolute altitude change in y.

#### PHOENIX

We calculated a grid of transmission spectra using the PHOENIX atmosphere model89,90,91, varying the heat redistribution of the planet, atmospheric metallicity, C/O ratio, internal temperature, the presence of aerosols and the atmospheric chemistry (equilibrium or rainout). Opacities used include the BT2 H2O line list92, as well as HITRAN for 129 other main molecular absorbers93 and Kurucz and Bell data for atomic species94. The HITRAN line lists available in this version of PHOENIX are often complete only at room temperature, which may be the cause of the apparent shift in the CO2 spectral feature compared with the other grids that primarily use HITEMP and ExoMol lists. This shift is the cause of the difference in χ2 between PHOENIX and the other model grids. In total, there were 1,116 models. Within this grid, the best-fit model has 10 times solar metallicity, a C/O ratio of 0.3, an internal temperature of 400 K, rainout chemistry and a cloud deck top at 0.3 mbar. The best-fit model has a χ2/N = 1.203 for N= 344 data points.

#### PICASO 3.0 and Virga

We used the open-source radiative–convective equilibrium model PICASO 3.0 (refs. 95,96), which has its heritage in the Fortran-based EGP mode97,98, to compute a grid of 1D pressure–temperature models for WASP-39b. The opacity sources included in PICASO 3.0 are listed in Extended Data Table 2. Of the 29 molecular opacity sources included, the line lists of notable molecules used were: H2O (ref. 99), CO2 (ref. 100), CH4 (ref. 101) and CO (ref. 102). The parameters varied in this grid of models include the interior temperature of the planet (Tint), atmospheric metallicity, C/O ratio and the dayside-to-nightside heat redistribution factor (see Extended Data Table 2), with correlated-k opacities98,103. In total, there were 192 cloud-free models. We include the effect of clouds in two ways. First, we post-processed the pressure–temperature profile using the cloud model Virga95,104, which follows from previously developed methodologies38, in which we included three condensable species (MnS, Na2S and MgSiO3). Virga requires a vertical mixing parameter, Kzz (cm2 s−1), and a vertically constant sedimentation efficiency parameter, fsed. In general, fsed controls the vertical extent of the cloud opacity, with low values (fsed < 1) creating large, vertically extended cloud decks with small particle sizes. In total, there were 3,840 cloudy models. The best fit from our grid with Virga-computed clouds has 3 times solar metallicity, solar C/O (0.458) and fsed = 0.6, which results in a χ2/N = 1.084.

As well as the grid fit, we also use the PICASO framework to quantify the feature-detection significance. In this method, we are able to incorporate clouds on the fly using the fitting routine PyMultiNest105. We fit for each of the grid parameters using a nearest-neighbour technique and a radius scaling to account for the unknown reference pressure, giving five parameters in total. When fitting for clouds, we either fit for Kzz and fsed in the Virga framework (seven parameters in total) or we fit for the cloud-top pressure corresponding to a grey cloud deck with infinite opacity (six parameters in total). These results are described in the following section.

### Feature-detection significance

From the chemical equilibrium results of the single best-fit models, the molecules that could potentially contribute to the spectrum based on their abundances and 3–5-μm opacity sources are H2 and He (via continuum) and CO, H2O, H2S, CO2 and CH4. More minor sources of opacity with VMR abundances <1 ppm are molecules such as OCS and NH3. For example, removing H2S, NH3 and OCS from the single best-fit PICASO 3.0 model increases the chi-square value by less than 0.002. Therefore, we focus on computing the statistical significance of only H2O, SO2, CO2, CH4 and CO.

To quantify the statistical significance, we performed two different tests. First, we used a Gaussian residual fitting analysis, as used in other JTEC ERS analyses23,29,31. In this method, we subtracted the best-fit model without a specific opacity source from the weighted-average spectrum of WASP-39b, isolating the supposed spectral feature. We then fit a three-parameter or four-parameter Gaussian curve to the residual data using a nested sampling algorithm to calculate the Bayesian evidence106. For H2O and CO, the extra transit depth offset parameter for the Gaussian fit was necessary to account for local mismatch of the fit to the continuum, whereas only a mean, standard deviation and scale parameter were required for a residual fit to the other molecules. We then compared this to the Bayesian evidence of a flat line to find the Bayes factor between a model that fits the spectral feature versus a model that excludes the spectral feature. These fits are shown in Extended Data Fig. 6.

Although the Gaussian residual fitting method is useful for quantifying the presence of potentially unknown spectral features, it cannot robustly determine the source of any given opacity. We therefore used the Bayesian fitting routine from PyMultiNest in the PICASO 3.0 framework to refit the grid parameters, while excluding the opacity contribution from the species in question. Then, we compared the significance of the molecule through a Bayes factor analysis107. Those values are shown in Extended Data Table 3.

We find significant evidence (>3σ) for H2O, CO2 and SO2. In general, the two methods only agree well for molecules whose contribution has a Gaussian shape (that is, SO2 and CO2). For example, for CO2, we find decisive 28.5σ and 26.9σ detections for the Bayes factor and Gaussian analysis, respectively. Similarly, for H2O, we find 21.5σ and 16.5σ detections, respectively. The evidence for SO2 is less substantial, but both methods give significant detections of 4.8σ and 3.5σ, respectively. Although the Gaussian fitting method found a broad 1-μm-wide residual in the region of CO (that is, >4.5 μm), its shape was unlike that seen with the PRISM data31. CO remained undetected with the Bayesian fitting analysis and therefore we are unable to robustly confirm evidence of CO. Similarly, no evidence for CH4 was found23. Gaussian residual fitting in the region of CH4 absorption only found a very broad inverse Gaussian and so is not included in Extended Data Table 3.

### SO2 absorption

We performed an injection test with the PICASO best-fit model in the PyMultiNest fitting framework to determine the abundance of SO2 required to match the observations. We add SO2 opacity using the ExoMol line list108, without rerunning the RCTE model to self-consistently compute a new climate profile. Fitting for the cloud deck dynamically, without SO2, produces a single best estimate of 10 times solar metallicity, sub-solar C/O (0.229), resulting in a marginally worse χ2/N = 1.11. With SO2, the single best fit tends back to 3 times solar metallicity, solar C/O. This suggests that cloud treatment and the exclusion of spectrally active molecules have an effect on the resultant physical interpretation of bulk atmospheric parameters. Ultimately, if we fit for SO2 in our PyMultiNest framework with the Virga cloud treatment, we obtain 3 times solar metallicity, solar C/O, log SO2 = −5.6 ± 0.1 (SO2 = 2.5 ± 0.65 ppm) and χ2/N = 1.02, which is our single best-fit model (shown in Fig. 4). For context, an atmospheric metallicity of 3–10 times solar would provide a thermochemical equilibrium abundance of 72–240 ppm H2S, the presumed source for photochemically produced SO2 (ref. 36).

To confirm the plausibility of SO2 absorption to explain the 4.1-μm spectral feature, we also computed models with prescribed, vertically uniform SO2 VMRs of 0, 1, 5 and 10 ppm using the structure from the best-fit PHOENIX model (10 times solar metallicity, C/O = 0.3). We calculated ad hoc spectra using the gCMCRT radiative transfer code109 with the ExoMol SO2 line list108 (see Extended Data Fig. 7). Linearly interpolating the models with respect to the SO2 abundance and performing a Levenberg–Marquardt regression gave a best-fit value of 4.6 ± 0.67 ppm. Inserting this abundance of SO2 into the best-fit PHOENIX model improves the χ2/N from 1.2 to 1.08.

Future atmospheric retrievals can provide a more statistically robust measurement for the SO2 abundance and add extra information from the similar absorption seen in the PRISM transmission spectrum29,31.

### 4.56-μm feature

A 0.08-μm-wide bump in transit depth centred at 4.56 μm is not fit by any of the model grids. This feature, picked up by the resolution of G395H, is not clearly seen in other ERS observations of WASP-39b. Following the same Gaussian residual fitting procedure as described above, we found a feature significance of 3.3σ (see Extended Data Fig. 6). To identify possible opacity sources in the atmosphere of WASP-39b that might be the cause of this absorption, we compared the feature with CH4 (ref. 110), C2H2 (ref. 111), C2H4 (ref. 112), C2H6 (ref. 113), CO (ref. 114), CO2 (ref. 100), CS2 (ref. 113), CN (ref. 115), HCN (ref. 116), HCl (ref. 113), H2S (ref. 117), HF (ref. 118), H3+ (ref. 119), LiCl (ref. 115), NH3 (ref. 120), NO (ref. 114), NO2 (ref. 113), N2O (ref. 114), N2 (ref. 121), NaCl (ref. 122), OCS (ref. 113), PH3 (ref. 123), PN (ref. 124), PO (ref. 125), SH (ref. 126), SiS (ref. 127), SiH4 (ref. 128), SiO (ref. 129), the X–X state of SO (ref. 130), SO2 (ref. 108), SO3 (ref. 108) and isotopologues of H2O, CH4, CO2 and CO, but did not find a convincing candidate that showed opacity at the correct wavelength or the correct width. The narrowness of the feature suggests that it could be a very distinct Q-branch, in which the rotational quantum number in the ground state is the same as the rotational quantum number in the excited state. However, of the molecules we explored, there were no candidates with a distinct Q-branch at this wavelength whose P-branch and R-branch did not obstruct the neighbouring CO2 and continuum-like CO + H2O opacity.

We also note that many of these species lack high-temperature line-list data, making it difficult to definitively rule out such species. For example, OCS, SO and CS2 are available in HITRAN2020 (ref. 113) but not in ExoMol131. Furthermore, if photochemistry is important for WASP-39b, as indicated by the presence of SO2, then there may be many species out of equilibrium that may contribute to the transit spectrum, some of which do not have high-temperature opacity data at present (such as OCS, NH2 or HSO). Future observations over this wavelength region of this and other planets may confirm or refute the presence of this unknown absorber.