Chemical production is estimated to be the dominant atmospheric source of formic acid (HCOOH), with a substantial contribution ascribed to sunlight-induced degradation of volatile organic compounds (VOCs) emitted by plants6,8,9. Direct HCOOH emissions are thought to account for less than 15% of the total production6,8,9. The overall atmospheric lifetime of HCOOH is 2–4 days, owing to efficient wet and dry deposition in the atmospheric boundary layer6,7,10, but increases to about 25 days in cloud-free tropospheric conditions.

Here we use the global chemistry–climate model ECHAM5/MESSy11 (EMAC) to simulate atmospheric HCOOH abundance. The reference simulation (EMAC(base)) implements the chemical formation pathways that are usually accounted for8,9,12 (Methods). Using Infrared Atmospheric Sounding Interferometer (IASI)/Metop-A satellite column measurements13 to determine the HCOOH burden (Methods), EMAC(base) illustrates the issue (Fig. 1a, b): the model globally underpredicts the satellite columns by a factor of 2–5. Similar biases relative to ground-based Fourier transform infrared (FTIR) columns are observed at several latitudes (Extended Data Fig. 1). These persistent discrepancies point to substantial unidentified sources of atmospheric HCOOH.

Fig. 1: Formic acid abundance from satellite and model.
a–d, Total formic acid (HCOOH) column (colour scale) derived from IASI satellite observations (a), or simulated by the base version of the model (EMAC(base); b) or by the model that implements the multiphase production of HCOOH (c, EMAC(dioh); d, EMAC(diol)). The HCOOH columns are means over 2010–2012. e, f, Probability histograms of the HCOOH column bias between EMAC simulations and satellite data. For EMAC(base) versus IASI (purple; e, f), the mean column bias over 2010–2012 is −1.97 × 10¹⁵ molecules cm⁻², the median is −1.59 × 10¹⁵ molecules cm⁻² and the 1σ standard deviation is 1.64 × 10¹⁵ molecules cm⁻². For EMAC(dioh) versus IASI (blue; e), the mean is −0.88 × 10¹⁵ molecules cm⁻², the median is −0.66 × 10¹⁵ molecules cm⁻² and the 1σ standard deviation is 1.62 × 10¹⁵ molecules cm⁻². For EMAC(diol) versus IASI (green; f), the mean is 0.99 × 10¹⁵ molecules cm⁻², the median is 0.97 × 10¹⁵ molecules cm⁻² and the 1σ standard deviation is 2.16 × 10¹⁵ molecules cm⁻². A seasonal comparison is provided in Extended Data Figs. 3, 4.

Recent studies have proposed several missing sources to explain the model underprediction. These include locally enhanced emissions of HCOOH and its precursors, and updated or tentative chemical pathways that involve a broad range of precursors, primarily of biogenic origin6,9,12,14. To match the observed concentrations, the required increase in emissions of the known HCOOH precursors and/or HCOOH yields from hydrocarbon oxidation is inconsistent with our understanding of the reactive carbon budget7,8,15. Furthermore, such attempts do not account for the elevated HCOOH concentrations observed in free-tropospheric, low-VOC air masses13,16,17. Owing to a lack of supporting laboratory measurements, the proposed chemical pathways are often affected by large uncertainties or are speculative. Currently, no atmospheric model offers a consistent picture of tropospheric organic acids.

Here we present a large, ubiquitous chemical source of HCOOH from a multiphase pathway (Fig. 2). In cloud water, formaldehyde (HCHO)—the most abundant aldehyde in the atmosphere—is a known source of HCOOH in remote regions5,10,18, via rapid oxidation of its monohydrated form, methanediol (HOCH2OH). Nevertheless, most of the HCOOH produced in this manner is efficiently oxidized by OH in the aqueous phase before outgassing. As a result, the net contribution of in-cloud HCOOH formation is small18. Because most methanediol is assumed to instantaneously dehydrate to formaldehyde before it volatilizes, global models do not explicitly represent methanediol and instead account for direct aqueous-phase formation of HCOOH from formaldehyde19,20 (Fig. 2). Using experimental kinetic data21, we calculate that under typical warm cloud conditions (260–300 K) methanediol dehydration takes place on timescales of 100–900 s. This is longer than the timescales of cloud-droplet evaporation and aqueous-phase diffusion, which are shorter than 100 s and 0.1–0.01 s, respectively22,23. Moreover, methanediol transfer at the gas–liquid interface proceeds rapidly22. Therefore, the net flux is driven by the difference in chemical potential between the two phases. We provide evidence that methanediol reaction with OH in the gas phase quantitatively yields HCOOH under atmospheric conditions (Fig. 2). By conducting experiments with the atmospheric simulation chamber SAPHIR (Supplementary Information, section 1), we show that formaldehyde in aqueous solution is efficiently converted to gaseous methanediol immediately after injection, which quantitatively yields HCOOH on photo-oxidation (Fig. 3). This is supported by theoretical calculations (Supplementary Information, section 2). Hence, the competition between the gas- and aqueous-phase oxidation of methanediol determines the phase in which HCOOH is predominantly produced.

Fig. 2: Schematic of the multiphase production of formic acid.
The common assumption in global atmospheric chemistry models is illustrated in black: aqueous-phase methanediol (HOCH2OH) is neglected and aqueous-phase formic acid (HCOOH) is assumed to form directly from formaldehyde (HCHO) on reaction with OH. The implementation of HOCH2OH multiphase equilibria is illustrated in red: the explicit representation of the slow dehydration of aqueous-phase HOCH2OH, of its fast outgassing from cloud droplets and of its OH-initiated oxidation in the gas phase leads to a pervasive production of gaseous HCOOH. Under typical daytime conditions with average [OH](g) = 1 × 10⁶ molecules cm⁻³ and [OH](aq) = 1 × 10⁻¹³ mol l⁻¹, the lifetimes of HOCH2OH against OH are about 1 × 10⁵ s and 3 × 10⁴ s, respectively. Under typical midday conditions with [OH](g) = 5 × 10⁶ molecules cm⁻³, the gas-phase sink is five times stronger. Thus, gas-phase oxidation sustains the chemical gradient that drives HOCH2OH from the aqueous to the gas phase.
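The lifetimes quoted in this caption follow the pseudo-first-order relation τ = 1/(k[OH]). The sketch below is illustrative only: it back-calculates the effective rate coefficients implied by the quoted lifetimes (these implied values are derived here and are not taken from the source), then rescales the gas-phase lifetime to the midday OH level. It assumes the aqueous OH concentration stays at its daytime-average value.

```python
def lifetime_s(k, oh):
    """Pseudo-first-order lifetime (s) against OH: tau = 1 / (k * [OH])."""
    return 1.0 / (k * oh)

# Values quoted in the caption
OH_GAS_DAY = 1e6       # molecules cm^-3, daytime average
OH_GAS_MIDDAY = 5e6    # molecules cm^-3, midday
OH_AQ = 1e-13          # mol l^-1
TAU_GAS_DAY = 1e5      # s, gas-phase lifetime at daytime-average OH
TAU_AQ = 3e4           # s, aqueous-phase lifetime

# Effective rate coefficients implied by the quoted lifetimes
# (illustrative back-calculation, not values reported in the source)
k_gas = 1.0 / (TAU_GAS_DAY * OH_GAS_DAY)   # cm^3 molecule^-1 s^-1
k_aq = 1.0 / (TAU_AQ * OH_AQ)              # l mol^-1 s^-1

# Five times more OH gives a five times shorter gas-phase lifetime
tau_gas_midday = lifetime_s(k_gas, OH_GAS_MIDDAY)

print(f"implied k_gas = {k_gas:.2e} cm^3 molecule^-1 s^-1")
print(f"implied k_aq  = {k_aq:.2e} l mol^-1 s^-1")
print(f"midday gas-phase lifetime = {tau_gas_midday:.2e} s")
```

With these numbers the midday gas-phase lifetime drops to about 2 × 10⁴ s, shorter than the 3 × 10⁴ s aqueous-phase value.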

Fig. 3: Multiphase production of formic acid in the SAPHIR chamber.
The formaldehyde (HCHO) mixing ratio was measured (in parts per billion by volume, ppbv) by differential optical absorption spectroscopy (black), whereas the sum of HCHO and methanediol (HOCH2OH) was measured using the Hantzsch method. The difference between the Hantzsch and differential optical absorption spectroscopy signals enables visualization of HOCH2OH (blue). Formic acid (HCOOH) was monitored by using proton-transfer reaction time-of-flight mass spectrometry (red). The instrument uncertainties (shading) are 5% for HCHO, 12% for HOCH2OH and 20% for HCOOH. On injection of the formalin (stabilized formaldehyde) solution into the Teflon chamber, HOCH2OH immediately outgasses from the droplets. The chamber roof is initially closed (stage I). The gas-phase HCHO mixing ratio is initially very low, but increases to be as abundant as HOCH2OH just before the start of the photo-oxidation when the roof is opened (stage II). The decay of the HCHO and HOCH2OH signals is concurrent with an additional production of HCOOH. Finally, addition of carbon monoxide (CO) as an OH scavenger enabled quantification of the wall effects (stage III). Experimental details are provided in Supplementary Information, sections 1 and 4.

We implemented in EMAC the explicit kinetic model for the aqueous-phase transformations and bidirectional phase transfer of methanediol (Supplementary Information, section 3). The solubility of methanediol is not known at any temperature and estimates of it span two orders of magnitude at 298 K. We gauge the effect of this uncertainty on the results by performing the simulations EMAC(diol) and EMAC(dioh), which implement the multiphase chemistry of methanediol with Henry’s law constants (solubilities) for methanediol of around 10⁴ M atm⁻¹ and 10⁶ M atm⁻¹, respectively (Methods). At the temperatures prevailing inside the clouds, the kinetic barrier strongly limits the dehydration of methanediol, allowing large amounts to be produced and then outgassed. Over regions with high levels of gas-phase formaldehyde and in the presence of clouds, large methanediol fluxes to the gas phase are predicted (Extended Data Fig. 2). Eventually, rapid gas-phase oxidation of methanediol by OH forms HCOOH, resulting in a substantial increase in the predicted HCOOH columns, by a factor of 2–4 compared to EMAC(base) (Fig. 1, Extended Data Figs. 3, 4). Because cloud droplets may potentially form everywhere and formaldehyde is ubiquitous in the troposphere (Extended Data Fig. 5), the HCOOH enhancement occurs both in high-VOC concentration regions and in remote environments. The additional HCOOH production allows the model predictions to reach the measured HCOOH levels derived from IASI and to reduce the mean (±1σ) model-to-satellite biases from −1.97(±1.64) × 10¹⁵ molecules cm⁻² for EMAC(base) to −0.88(±1.62) × 10¹⁵ molecules cm⁻² for EMAC(dioh) and 0.99(±2.16) × 10¹⁵ molecules cm⁻² for EMAC(diol) (Fig. 1). Similar improvements are observed with respect to the FTIR data (Extended Data Fig. 1).

Although the multiphase mechanism fills the gap between model and measurements globally, the EMAC(dioh) and EMAC(diol) simulations overpredict the HCOOH columns over tropical forests and underpredict the columns over boreal forests. We ascribe these remaining discrepancies primarily to inaccuracies in the predicted formaldehyde distributions as compared to Ozone Monitoring Instrument (OMI)/Aura measurements (Extended Data Fig. 5). Regional underestimation (overestimation) of modelled formaldehyde translates through the multiphase conversion to underprediction (overprediction) of HCOOH (Extended Data Fig. 6). For instance, underestimated biomass-burning emissions of VOCs lead to an underpredicted abundance of formaldehyde, and hence of HCOOH, such as during the 2010 Russian wildfires (Extended Data Fig. 6a–d). Conversely, the too-high model temperatures over Amazonia during the dry season induce an excess in isoprene emissions, which results in too-high formaldehyde and HCOOH levels (Extended Data Fig. 6i–l). More realistic VOC emissions, and enhanced modelling of formaldehyde and its dependence on NOx, will eventually lead to further improvements in predicted HCOOH. Fast reactions of HCOOH with stabilized Criegee intermediates have recently been emphasized24,25. The overprediction of HCOOH over the tropical forests might be reduced if this additional sink were considered. Implementation of the photolysis of α-hydroperoxycarbonyls9,26 and the photo-oxidation of aromatics27, and of a temperature-dependent solubility for methanediol, would further improve the representation of HCOOH.

We present in Table 1 a revised atmospheric budget for HCOOH, which we compare to estimates from recent studies6,7,8,9 (the contribution of single chemical terms is provided in Extended Data Table 1). EMAC(dioh) and EMAC(diol) provide, respectively, lower and higher estimates of the extra HCOOH produced via the multiphase processing of formaldehyde. EMAC(diol) yields an increase by a factor of five of the total photochemical source predicted by EMAC(base) (190.9 Tg yr⁻¹ compared to 37.7 Tg yr⁻¹), and gas-phase oxidation of methanediol becomes the dominant contributor to atmospheric HCOOH (150.6 Tg yr⁻¹). Although EMAC(dioh) assumes that methanediol is 100 times more soluble (compared to EMAC(diol)), it still yields an increase by a factor of two in photochemical production (83.5 Tg yr⁻¹). This is in line with previous estimates of the missing HCOOH sources, which include, from source inversions, direct HCOOH emissions from vegetation or the OH-initiated oxidation of a short-lived, unidentified biogenic precursor7. The second largest source is VOC ozonolysis (about 31 Tg yr⁻¹); other sources are below 4 Tg yr⁻¹.
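As a quick arithmetic check of the enhancement factors stated above, the following sketch recomputes them from the production terms quoted in the text. The variable names are illustrative; only the four numbers come from the source.

```python
# Photochemical HCOOH production terms quoted in the text (Tg per year)
base_total = 37.7             # EMAC(base), total photochemical source
diol_total = 190.9            # EMAC(diol), total photochemical source
dioh_total = 83.5             # EMAC(dioh), total photochemical source
diol_methanediol_gas = 150.6  # EMAC(diol), gas-phase methanediol oxidation

# Enhancement of the photochemical source relative to the base simulation
diol_factor = diol_total / base_total
dioh_factor = dioh_total / base_total

# Share of the EMAC(diol) photochemical source due to methanediol oxidation
methanediol_share = diol_methanediol_gas / diol_total

print(f"EMAC(diol) / EMAC(base): {diol_factor:.2f}")
print(f"EMAC(dioh) / EMAC(base): {dioh_factor:.2f}")
print(f"methanediol share in EMAC(diol): {methanediol_share:.0%}")
```

The ratios come out close to five and two, and gas-phase methanediol oxidation accounts for roughly four-fifths of the EMAC(diol) photochemical source.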

Table 1 Atmospheric budget for formic acid

The extra HCOOH production leads to a more realistic prediction of atmospheric organic acids and substantially increases atmospheric acidity globally (Extended Data Fig. 7). Compared to EMAC(base), EMAC(dioh) and EMAC(diol) predict a decrease in the pH of clouds and rainwater in the tropics by as much as 0.2 and 0.3, respectively. The high moisture content, extended cloud cover and high temperatures that prevail in tropical and similar environments facilitate the production of HCOOH via formation and outgassing of the relevant gem-diol. Higher acidity is also predicted at Northern Hemisphere mid-latitudes in summertime, notably over boreal forests, consistent with previous predictions7.
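To put the pH changes in terms of hydrogen-ion concentration: because pH = −log10[H⁺], a decrease of ΔpH multiplies [H⁺] by 10^ΔpH. A minimal sketch of that conversion (the function name is illustrative):

```python
def h_ion_increase_factor(ph_decrease):
    """Factor by which [H+] rises when pH falls by `ph_decrease` units."""
    return 10.0 ** ph_decrease

# Maximum pH decreases quoted in the text for the tropics
for label, d_ph in (("EMAC(dioh)", 0.2), ("EMAC(diol)", 0.3)):
    factor = h_ion_increase_factor(d_ph)
    print(f"{label}: pH decrease of {d_ph} -> [H+] x {factor:.2f}")
```

A pH decrease of 0.2 corresponds to about 1.6 times more H⁺, and a decrease of 0.3 to about twice as much.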

The multiphase production of HCOOH affects predictions for formaldehyde and carbon monoxide (CO). Both gases are important for tropospheric ozone and radical cycles, and are usually the target of satellite-driven inversion modelling. EMAC(dioh) and EMAC(diol) predict decreases of up to 10% and 20%, respectively, in formaldehyde columns over tropical source regions during specific months (Extended Data Fig. 8). We anticipate that the estimates of regional hydrocarbon emissions based on formaldehyde source inversions will be improved once the multiphase mechanism is accounted for. The reduced formaldehyde concentrations result in lower modelled CO yield from methane oxidation, notably over remote areas, where methane oxidation is the main source of atmospheric CO (Extended Data Fig. 9). Globally, the average tropospheric CO yield from methane oxidation changes from 0.91 for EMAC(base) to 0.88 for EMAC(diol) and 0.90 for EMAC(dioh), in agreement with isotope-enabled inversion estimates28.

We have shown that a multiphase pathway involving aldehyde hydrates is decisive in predicting organic acid formation and atmospheric acidity. It could also be important in the presence of deliquescent aerosols and would explain the elevated HCOOH levels in cloud-free conditions29. Given the favourable hydration equilibrium constants for major C2–C3 carbonyls30, this pathway opens up avenues for more realistic representation of other abundant organic acids, and hence of cloud-droplet nucleation and cloud evolution. We expect the multiphase processing for glyoxal and methylglyoxal to be important for explaining the observed concentrations of oxalic and pyruvic acids4. Understanding these multiphase processes advances our knowledge of atmospheric reactive carbon oxidation chains and of chemistry–climate interactions.


Model setup and simulations

Simulations were performed with the ECHAM5/MESSy v2.53.0 model11 (EMAC) on the JURECA supercomputer31. A horizontal resolution of T63 (about 1.8° × 1.8°), with 31 vertical layers from the surface up to the lower stratosphere at 10 hPa, was applied. Chemical feedbacks are deactivated by using the quasi chemical transport mode32. Biomass-burning emissions are calculated with the Global Fire Assimilation System (GFAS) inventory33. The emission factors for organic compounds were taken from ref. 34, except the ones for aromatics, which were taken from refs. 35,36. Anthropogenic emissions of NOx and organic compounds were taken from ACCMIP37. The chosen gas-phase chemical mechanism includes a state-of-the-art representation of terpene and aromatics oxidation chemistry20. The EMAC cloud and precipitation parameterization follows ref. 38.

In the reference model simulation (EMAC(base)), HCOOH production proceeds through the ozonolysis of alkenes with terminal double bonds (simple alkenes and degradation products of isoprene and monoterpenes), alkyne oxidation, reaction of formaldehyde with the peroxy radical, oxidation of enols, and formation from vinyl alcohol39. Nonetheless, we exclude the OH-initiated oxidation of isoprene and monoterpenes, the corresponding mechanisms of which are still speculative6,8,40,41, as well as the reaction of methyl peroxy radical with OH, which was shown not to yield HCOOH42. A detailed description of the relevant chemical kinetics, budget terms and deposition parameters for each model simulation is provided in Supplementary Information, section 3a.

Two simulations with the explicit multiphase model for methanediol, EMAC(dioh) and EMAC(diol), are described in detail in Supplementary Information, section 3b. The simulations differ only by the value of the Henry’s law constant (solubility) of methanediol, for which no experimental measurements are available. Values of about 10⁴ M atm⁻¹ and 10⁶ M atm⁻¹ are used for EMAC(diol) and EMAC(dioh), respectively. These are possible values of the Henry’s law constant for methanediol, given the spread of estimates at 298 K by semi-empirical methods and the expected temperature dependence. However, higher values (around 10⁷ M atm⁻¹) cannot be excluded at typical temperatures of warm clouds (Supplementary Information, section 3b.iii).
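To give a feel for why the choice of Henry's law constant matters, the sketch below evaluates the standard equilibrium partitioning expression f_aq = HRTL / (1 + HRTL), where L is the dimensionless liquid water volume fraction of the cloud. This is only an equilibrium illustration and not the explicit kinetic phase-transfer scheme implemented in EMAC. The liquid water content (0.3 g m⁻³) and temperature (280 K) are assumed values chosen for illustration and do not come from the source.

```python
R_L_ATM = 0.082057  # gas constant, l atm mol^-1 K^-1

def aqueous_fraction(henry_m_atm, temp_k, lwc_g_m3):
    """Equilibrium fraction of a species residing in cloud water.

    henry_m_atm : Henry's law constant in M atm^-1
    temp_k      : temperature in K
    lwc_g_m3    : cloud liquid water content in g m^-3
    """
    # 1 g m^-3 of liquid water occupies 1e-6 m^3 per m^3 of air
    liquid_volume_fraction = lwc_g_m3 * 1e-6
    x = henry_m_atm * R_L_ATM * temp_k * liquid_volume_fraction
    return x / (1.0 + x)

# Assumed cloud conditions (hypothetical, for illustration only)
TEMP_K = 280.0
LWC_G_M3 = 0.3

for label, h in (("EMAC(diol), ~1e4", 1e4),
                 ("EMAC(dioh), ~1e6", 1e6),
                 ("upper bound, ~1e7", 1e7)):
    f_aq = aqueous_fraction(h, TEMP_K, LWC_G_M3)
    print(f"{label} M atm^-1: {f_aq:.1%} in the aqueous phase")
```

Under these assumed conditions, the lower constant leaves most methanediol in the gas phase at equilibrium, whereas the higher constants keep most of it dissolved, which is consistent with EMAC(diol) giving the larger gas-phase HCOOH source.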

For the comparison with IASI and OMI observations (Fig. 1, Extended Data Figs. 3–6), the HCOOH and formaldehyde volume mixing ratio profiles simulated by EMAC are sampled along the Sun-synchronous satellite Metop-A and Aura orbits, respectively, at the time and location of the IASI and OMI measurements, using the SORBIT submodel11. The sampled volume mixing ratios are then daily averaged and converted into HCOOH and formaldehyde columns.
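The conversion from a mixing-ratio profile to a total column can be sketched as follows. This is a generic hydrostatic integration, not the EMAC or SORBIT implementation: it assumes dry air, a constant gravitational acceleration and layer-mean mixing ratios defined between pressure interfaces. The function name and the example profile are hypothetical.

```python
N_A = 6.02214076e23   # Avogadro constant, mol^-1
G = 9.80665           # standard gravity, m s^-2
M_AIR = 28.9647e-3    # molar mass of dry air, kg mol^-1

def column_molec_cm2(vmr, p_interfaces_pa):
    """Total column (molecules cm^-2) from layer-mean mixing ratios.

    vmr             : volume mixing ratios (mol/mol), one per layer
    p_interfaces_pa : pressures (Pa) at the layer interfaces,
                      len(vmr) + 1 values in either vertical order
    """
    if len(p_interfaces_pa) != len(vmr) + 1:
        raise ValueError("need one more interface pressure than layers")
    total_m2 = 0.0
    for i, x in enumerate(vmr):
        dp = abs(p_interfaces_pa[i] - p_interfaces_pa[i + 1])
        # Hydrostatic balance: air molecules per m^2 in the layer
        air_molec_m2 = dp * N_A / (G * M_AIR)
        total_m2 += x * air_molec_m2
    return total_m2 * 1e-4  # m^-2 -> cm^-2

# Hypothetical example: 100 pptv of HCOOH, uniform over the whole atmosphere
column = column_molec_cm2([100e-12], [101325.0, 0.0])
print(f"column = {column:.2e} molecules cm^-2")
```

For this hypothetical uniform profile the result is of order 10¹⁵ molecules cm⁻², the same order of magnitude as the column biases discussed in the main text.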

Model sources of uncertainties, including the formation of a HCOOH·H2O complex with water vapour43, are discussed in Supplementary Information, section 5.

IASI column observations

IASI44 is a nadir-viewing Fourier transform spectrometer launched on board the Metop-A, -B and -C platforms in October 2006, September 2012 and November 2018, respectively. IASI measures in the thermal infrared, between 645 cm⁻¹ and 2,760 cm⁻¹. It records radiance from the Earth’s surface and the atmosphere, with an apodized spectral resolution of 0.5 cm⁻¹, spectrally sampled at 0.25 cm⁻¹. In the spectral range in which the HCOOH ν6 Q branch absorbs (about 1,105 cm⁻¹), IASI has a radiometric noise of around 0.15 K for a reference blackbody at 280 K. IASI provides near global coverage twice per day, with observations at around 09:30 am and 09:30 pm, local time. Here, the HCOOH columns are derived from IASI/Metop-A (covering 2010–2012). Only the morning satellite overpasses are used, because such observations have a higher measurement sensitivity13. For comparison with EMAC simulations, the 2010–2012 IASI data are daily averaged on the model spatial grid. On average, 17 satellite measurements per day (more than 18,000 over 2010–2012) are used per 1.8° × 1.8° model grid box at the Equator. This number increases with latitude and with the higher spatial sampling of IASI, owing to the satellite polar orbits.

Version 3 of the artificial neural network for IASI (ANNI) was applied to retrieve HCOOH abundances from the IASI measurements (see refs. 13,45 for a comprehensive description of the retrieval algorithm and the HCOOH product). The ANNI framework was specifically designed to provide a robust and unbiased retrieval of weakly absorbing trace gases such as HCOOH. The retrieval relies on a neural network to convert weak spectral signatures to a total column, accounting for the state of the surface and atmosphere at the time and place of the overpass of IASI. The vertical sensitivity of IASI to HCOOH peaks between 1 km and 6 km, gradually decreasing outside that range46. However, by assuming that HCOOH is distributed vertically according to a certain profile, the neural network is able to provide an estimate of the total column of HCOOH. Because the ANNI retrievals do not rely on a priori information, no averaging kernels are produced and the retrieved columns are meant to be used at face value for carrying out unbiased comparisons with model data (see ref. 13 and references therein for the rationale). Data filtering prevents retrieval over cloudy scenes and post-filtering discards scenes for which the sensitivity to HCOOH is too low for a meaningful retrieval.

The HCOOH product comes with its own pixel-dependent estimate of random uncertainties, calculated by propagating the uncertainties of each input variable of the neural network13. For a typical non-background HCOOH abundance ((0.3–2.0) × 10¹⁵ molecules cm⁻²), the relative uncertainty on an individual retrieved column ranges from 10% to 50%, with the highest uncertainties found for the low columns. This uncertainty increases for lower-background columns as the weaker HCOOH concentrations approach the IASI detection threshold. However, these random uncertainties become negligible for the column averages presented here, because of the total number of measurements per grid cell. With respect to systematic uncertainties, the main term is related to the assumption of a fixed HCOOH vertical profile. It is not possible to quantify this uncertainty on an individual-pixel basis, but it was estimated to not exceed 20% on average13. A comparison with independent HCOOH columns from ground-based FTIR measurements at various latitudes and environments confirmed the absence of any large systematic biases of the IASI data45. Although biases of around 20% cannot be excluded, in the context of this work, the accuracy of the IASI product is sufficient to demonstrate the initial model underprediction (EMAC(base)) of the HCOOH columns and the large improvements from the multiphase mechanism.
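The statement that random uncertainties become negligible after averaging can be illustrated with the usual 1/√N scaling of the uncertainty of a mean. The sketch assumes that the individual retrieval errors are independent and of equal size, which is a simplification; the systematic term of up to about 20% does not average down in this way. The sample sizes are the equatorial figures quoted in the Methods (17 per day, more than 18,000 over 2010–2012).

```python
import math

def rel_uncertainty_of_mean(rel_unc_single_pct, n):
    """Relative random uncertainty (%) of the mean of n independent
    measurements that each carry the same relative uncertainty."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return rel_unc_single_pct / math.sqrt(n)

# Worst-case single-pixel uncertainty quoted in the text
WORST_CASE_PCT = 50.0

daily = rel_uncertainty_of_mean(WORST_CASE_PCT, 17)
three_year = rel_uncertainty_of_mean(WORST_CASE_PCT, 18000)

print(f"daily mean (17 pixels):        {daily:.1f}%")
print(f"2010-2012 mean (18,000 pixels): {three_year:.2f}%")
```

Even with the worst-case 50% single-pixel uncertainty, the random component of a 2010–2012 grid-box mean falls well below 1% under these assumptions.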

Theoretical predictions

Quantum chemical calculations were performed at various levels of theory, up to CCSD(T)/CBS(DTQ)//IRCMax(CCSD(T)//M06-2X/aug-cc-pVQZ), and combined with E,J-μVTST multi-conformer microvariational transition-state calculations to obtain the gas-phase high-pressure-limit rate coefficients (Supplementary Information, section 2).