Ubiquitous atmospheric production of organic acids mediated by cloud droplets

Atmospheric acidity is increasingly determined by carbon dioxide and organic acids1–3. Among the latter, formic acid facilitates the nucleation of cloud droplets4 and contributes to the acidity of clouds and rainwater1,5. At present, chemistry–climate models greatly underestimate the atmospheric burden of formic acid, because key processes related to its sources and sinks remain poorly understood2,6–9. Here we present atmospheric chamber experiments that show that formaldehyde is efficiently converted to gaseous formic acid via a multiphase pathway that involves its hydrated form, methanediol. In warm cloud droplets, methanediol undergoes fast outgassing but slow dehydration. Using a chemistry–climate model, we estimate that the gas-phase oxidation of methanediol produces up to four times more formic acid than all other known chemical sources combined. Our findings reconcile model predictions and measurements of formic acid abundance. The additional formic acid burden increases atmospheric acidity by reducing the pH of clouds and rainwater by up to 0.3. The diol mechanism presented here probably applies to other aldehydes and may help to explain the high atmospheric levels of other organic acids that affect aerosol growth and cloud evolution.

HCOOH produced in this manner is efficiently oxidized by OH in the aqueous phase before outgassing. As a result, the net contribution of in-cloud HCOOH formation is small18. Because most methanediol is assumed to instantaneously dehydrate to formaldehyde before it volatilizes, global models do not explicitly represent methanediol and instead account for direct aqueous-phase formation of HCOOH from formaldehyde19,20 (Fig. 2). Using experimental kinetic data21, we calculate that under typical warm cloud conditions (260–300 K) methanediol dehydration takes place on timescales of 100–900 s. This is longer than the timescales of cloud-droplet evaporation and aqueous-phase diffusion, which are shorter than 100 s and 0.01–0.1 s, respectively22,23. Moreover, methanediol transfer at the gas–liquid interface proceeds rapidly22. Therefore, the net flux is driven by the difference in chemical potential between the two phases. We provide evidence that methanediol reaction with OH in the gas phase quantitatively yields HCOOH under atmospheric conditions (Fig. 2). By conducting experiments with the atmospheric simulation chamber SAPHIR (Supplementary Information, section 1), we show that formaldehyde in aqueous solution is efficiently converted to gaseous methanediol immediately after injection, which quantitatively yields HCOOH on photo-oxidation (Fig. 3). This is supported by theoretical calculations (Supplementary Information, section 2). Hence, the competition between the gas- and aqueous-phase oxidation of methanediol determines the phase in which HCOOH is predominantly produced.
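To make the timescale argument above concrete, the following minimal sketch estimates how much aqueous methanediol could dehydrate during the life of a droplet. It assumes simple first-order decay with an e-folding time equal to the quoted dehydration timescale and ignores re-hydration and outgassing; the function name and the choice of 100 s as droplet lifetime (the upper bound quoted in the text) are illustrative assumptions, not part of the model described here.

```python
import math


def fraction_dehydrated(residence_s: float, dehydration_timescale_s: float) -> float:
    """Fraction of methanediol dehydrated after residence_s seconds.

    Assumes first-order decay with e-folding time dehydration_timescale_s
    (an illustrative simplification; the reverse reaction is ignored).
    """
    return 1.0 - math.exp(-residence_s / dehydration_timescale_s)


# Upper bound on the cloud-droplet evaporation timescale quoted in the text.
DROPLET_LIFETIME_S = 100.0

# Range of dehydration timescales quoted for 260-300 K.
for tau_s in (100.0, 900.0):
    frac = fraction_dehydrated(DROPLET_LIFETIME_S, tau_s)
    print(f"dehydration timescale {tau_s:5.0f} s -> fraction dehydrated {frac:.2f}")
```

Under these assumptions, a substantial share of the methanediol survives until the droplet evaporates, even at the fast end of the quoted range and with the longest droplet lifetime.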
We implemented in EMAC the explicit kinetic model for the aqueous-phase transformations and bidirectional phase transfer of methanediol (Supplementary Information, section 3). The solubility of methanediol is not known at any temperature and estimates of it span two orders of magnitude at 298 K. We gauge the effect of this uncertainty on the results by performing the simulations EMAC(diol) and EMAC(dioh), which implement the multiphase chemistry of methanediol with Henry's law constants (solubilities) for methanediol of around 10⁴ M atm⁻¹ and 10⁶ M atm⁻¹, respectively (Methods). At the temperatures prevailing inside the clouds, the kinetic barrier strongly limits the dehydration of methanediol, allowing large amounts to be produced and then outgassed. Over regions with high levels of gas-phase formaldehyde and in the presence of clouds, large methanediol fluxes to the gas phase are predicted (Extended Data Fig. 2). Eventually, rapid gas-phase oxidation of methanediol by OH forms HCOOH, resulting in a substantial increase in the predicted HCOOH columns, by a factor of 2–4 compared to EMAC(base) (Fig. 1, Extended Data Figs. 3, 4). Because cloud droplets may potentially form everywhere and formaldehyde is ubiquitous in the troposphere (Extended Data Fig. 5), the HCOOH enhancement occurs both in high-VOC concentration regions and in remote environments.
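The sensitivity to the Henry's law constant can be illustrated with the standard equilibrium partitioning expression f_aq = H·R·T·L / (1 + H·R·T·L), where L is the liquid water volume fraction of the cloud. This is only a sketch: the model described above treats phase transfer kinetically rather than at equilibrium, and the temperature (280 K) and liquid water content (0.3 g m⁻³, so L = 3 × 10⁻⁷) are assumed values chosen for illustration.

```python
R_L_ATM_PER_MOL_K = 0.08206  # gas constant in L atm mol^-1 K^-1


def aqueous_fraction(henry_m_per_atm: float, temperature_k: float,
                     water_volume_fraction: float) -> float:
    """Equilibrium fraction of a species residing in cloud water.

    henry_m_per_atm: Henry's law constant in M atm^-1.
    water_volume_fraction: volume of liquid water per volume of air.
    """
    x = henry_m_per_atm * R_L_ATM_PER_MOL_K * temperature_k * water_volume_fraction
    return x / (1.0 + x)


# Assumed illustrative cloud conditions (not taken from the text).
TEMPERATURE_K = 280.0
WATER_VOLUME_FRACTION = 3e-7  # about 0.3 g of liquid water per m^3 of air

# Henry's law constants of the two simulations quoted in the text.
for label, henry in (("EMAC(diol)", 1e4), ("EMAC(dioh)", 1e6)):
    f_aq = aqueous_fraction(henry, TEMPERATURE_K, WATER_VOLUME_FRACTION)
    print(f"{label}: H = {henry:.0e} M atm^-1 -> aqueous fraction {f_aq:.2f}")
```

With these assumed conditions, the two solubility choices bracket the range from mostly gaseous to mostly dissolved methanediol, which is why they serve as bounding cases.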
Although the multiphase mechanism fills the gap between model and measurements globally, the EMAC(dioh) and EMAC(diol) simulations overpredict the HCOOH columns over tropical forests and underpredict the columns over boreal forests. We ascribe these remaining discrepancies primarily to inaccuracies in the predicted formaldehyde distributions as compared to Ozone Monitoring Instrument (OMI)/Aura measurements (Extended Data Fig. 5). Regional underestimation (overestimation) of modelled formaldehyde translates through the multiphase conversion to underprediction (overprediction) of HCOOH (Extended Data Fig. 6). For instance, underestimated biomass-burning emissions of VOCs lead to an underpredicted abundance of formaldehyde, and hence of HCOOH, such as during the 2010 Russian wildfires (Extended Data Fig. 6a–d). Conversely, the too-high model temperatures over Amazonia during the dry season induce an excess in isoprene emissions, which results in too-high formaldehyde and HCOOH levels (Extended Data Fig. 6i–l). More realistic VOC emissions, and enhanced modelling of formaldehyde and its dependence on NOₓ, will eventually lead to further improvements in predicted HCOOH. Fast reactions of HCOOH with stabilized Criegee intermediates have recently been emphasized24,25. The overprediction of HCOOH over the tropical forests might be reduced if this additional sink were considered. Implementation of α-hydroperoxycarbonyl photolysis9,26 and photo-oxidation of aromatics27, and of a temperature-dependent solubility for methanediol, would further improve the representation of HCOOH.
We present in Table 1 a revised atmospheric budget for HCOOH, which we compare to estimates from recent studies6–9 (the contribution of single chemical terms is provided in Extended Data Table 1). EMAC(dioh) and EMAC(diol) provide, respectively, lower and higher estimates of the extra HCOOH produced via the multiphase processing of formaldehyde. EMAC(diol) yields an increase by a factor of five of the total photochemical source predicted by EMAC(base) (190.9 Tg yr⁻¹ compared to 37.7 Tg yr⁻¹), and gas-phase oxidation of methanediol becomes the dominant contributor to atmospheric HCOOH (150.6 Tg yr⁻¹). Although EMAC(dioh) assumes that methanediol is 100 times more soluble (compared to EMAC(diol)), it still yields an increase by a factor of two in photochemical production (83.5 Tg yr⁻¹). This is in line with previous estimates of the missing HCOOH sources, which include, from source inversions, direct HCOOH emissions from vegetation or the OH-initiated oxidation of a short-lived, unidentified biogenic precursor7. The second largest source is VOC ozonolysis (about 31 Tg yr⁻¹); other sources are below 4 Tg yr⁻¹.
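The budget ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the source terms given in the text; the variable names are illustrative, and the "other sources" term is simply the difference between the total photochemical source and the methanediol term in the EMAC(diol) simulation.

```python
# Photochemical HCOOH source terms quoted in the text, in Tg per year.
SOURCE_BASE = 37.7
SOURCE_DIOL = 190.9
SOURCE_DIOH = 83.5
METHANEDIOL_OXIDATION_DIOL = 150.6

# Increase of the total photochemical source relative to the reference run.
increase_diol = SOURCE_DIOL / SOURCE_BASE
increase_dioh = SOURCE_DIOH / SOURCE_BASE

# Remaining chemical sources in EMAC(diol) and their ratio to the methanediol term.
other_sources_diol = SOURCE_DIOL - METHANEDIOL_OXIDATION_DIOL
methanediol_vs_others = METHANEDIOL_OXIDATION_DIOL / other_sources_diol

print(f"EMAC(diol) / EMAC(base): {increase_diol:.1f}")
print(f"EMAC(dioh) / EMAC(base): {increase_dioh:.1f}")
print(f"methanediol term / all other sources: {methanediol_vs_others:.1f}")
```

The last ratio is consistent with the statement that gas-phase oxidation of methanediol produces up to about four times more HCOOH than all other known chemical sources combined.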
The extra HCOOH production leads to a more realistic prediction of atmospheric organic acids and substantially increases atmospheric acidity globally (Extended Data Fig. 7). Compared to EMAC(base), EMAC(dioh) and EMAC(diol) predict a decrease in the pH of clouds and rainwater in the tropics by as much as 0.2 and 0.3, respectively. The high moisture content, extended cloud cover and high temperatures that prevail in tropical and similar environments facilitate the production of HCOOH via formation and outgassing of the relevant gem-diol. Higher acidity is also predicted at Northern Hemisphere mid-latitudes in summertime, notably over boreal forests, consistent with previous predictions7.
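Because pH is a logarithmic scale, the quoted pH decreases correspond to sizeable relative changes in hydrogen-ion concentration. As a brief illustration (the function name is an assumption for this sketch), the factor follows from [H⁺] = 10^(−pH):

```python
def hydrogen_ion_factor(ph_decrease: float) -> float:
    """Multiplicative increase in [H+] for a given decrease in pH."""
    return 10.0 ** ph_decrease


# Maximum pH decreases quoted in the text for the two simulations.
for label, delta_ph in (("EMAC(dioh)", 0.2), ("EMAC(diol)", 0.3)):
    factor = hydrogen_ion_factor(delta_ph)
    print(f"{label}: pH decrease of {delta_ph} -> [H+] higher by a factor of {factor:.2f}")
```

A pH decrease of 0.3 therefore corresponds to roughly a doubling of the hydrogen-ion concentration.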
The multiphase production of HCOOH affects predictions for formaldehyde and carbon monoxide (CO). Both gases are important for tropospheric ozone and radical cycles, and are usually the target of satellite-driven inversion modelling. EMAC(dioh) and EMAC(diol) predict decreases of up to 10% and 20%, respectively, in formaldehyde columns over tropical source regions during specific months (Extended Data Fig. 8). We anticipate that the estimates of regional hydrocarbon emissions based on formaldehyde source inversions will be improved once the multiphase mechanism is accounted for. The reduced formaldehyde concentrations result in lower modelled CO yield from methane oxidation, notably over remote areas, where methane oxidation is the main source of atmospheric CO (Extended Data Fig. 9). Globally, the average tropospheric CO yield from methane oxidation changes from 0.91 for EMAC(base) to 0.88 for EMAC(diol) and 0.90 for EMAC(dioh), in agreement with isotope-enabled inversion estimates28.

Fig. 2 | The common assumption in global atmospheric chemistry models is illustrated in black: aqueous-phase methanediol (HOCH₂OH) is neglected and aqueous-phase formic acid (HCOOH) is assumed to form directly from formaldehyde (HCHO) on reaction with OH. The implementation of HOCH₂OH multiphase equilibria is illustrated in red: the explicit representation of the slow dehydration of aqueous-phase HOCH₂OH, of its fast outgassing from cloud droplets and of its OH-initiated oxidation in the gas phase leads to a pervasive production of gaseous HCOOH. Under typical daytime conditions with average [OH](g) = 1 × 10⁶ molecules cm⁻³ and [OH](aq) = 1 × 10⁻¹³ mol l⁻¹, the lifetimes of HOCH₂OH against OH are about 1 × 10⁵ s and 3 × 10⁴ s, respectively. Under typical midday conditions with [OH](g) = 5 × 10⁶ molecules cm⁻³, the gas-phase sink is five times stronger. Thus, gas-phase oxidation sustains the chemical gradient that drives HOCH₂OH from the aqueous to the gas phase.

Fig. 3 | Multiphase production of formic acid in the SAPHIR chamber.
The formaldehyde (HCHO) mixing ratio was measured (in parts per billion by volume, ppbv) by differential optical absorption spectroscopy (black), whereas the sum of HCHO and methanediol (HOCH₂OH) was measured using the Hantzsch method. The difference between the Hantzsch and differential optical absorption spectroscopy signals enables visualization of HOCH₂OH (blue). Formic acid (HCOOH) was monitored by using proton-transfer reaction time-of-flight mass spectrometry (red). The instrument uncertainties (shading) are 5% for HCHO, 12% for HOCH₂OH and 20% for HCOOH. On injection of the formalin (stabilized formaldehyde) solution into the Teflon chamber, HOCH₂OH immediately outgasses from the droplets. The chamber roof is initially closed (stage I). The gas-phase HCHO mixing ratio is initially very low, but increases to be as abundant as HOCH₂OH just before the start of the photo-oxidation when the roof is opened (stage II). The decay of the HCHO and HOCH₂OH signals is concurrent with an additional production of HCOOH. Finally, addition of carbon monoxide (CO) as an OH scavenger enabled quantification of the wall effects (stage III). Experimental details are provided in Supplementary Information, sections 1 and 4.
Panel annotations: [OH](g) = 7 × 10⁶ molecules cm⁻³; [OH](g) = 0 molecules cm⁻³.
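The HOCH₂OH lifetimes against OH quoted in the caption of the multiphase scheme follow from τ = 1/(k[OH]). As a rough consistency check, the sketch below backs out the effective rate constants implied by the quoted lifetimes and OH concentrations, and then recomputes the gas-phase lifetime at the midday OH level. The rate constants derived here are implied values for illustration only, not the kinetic data used in the study.

```python
def lifetime_s(rate_constant: float, oh_concentration: float) -> float:
    """Pseudo-first-order lifetime against OH: tau = 1 / (k * [OH])."""
    return 1.0 / (rate_constant * oh_concentration)


def implied_rate_constant(lifetime: float, oh_concentration: float) -> float:
    """Rate constant implied by a quoted lifetime and OH concentration."""
    return 1.0 / (lifetime * oh_concentration)


# Gas phase: [OH] in molecules cm^-3, lifetime in s (values quoted in the caption).
k_gas = implied_rate_constant(1e5, 1e6)    # cm^3 molecule^-1 s^-1
# Aqueous phase: [OH] in mol l^-1, lifetime in s (values quoted in the caption).
k_aq = implied_rate_constant(3e4, 1e-13)   # l mol^-1 s^-1

# Midday gas-phase lifetime at [OH] = 5e6 molecules cm^-3.
tau_gas_midday = lifetime_s(k_gas, 5e6)

print(f"implied gas-phase k: {k_gas:.1e} cm^3 molecule^-1 s^-1")
print(f"implied aqueous-phase k: {k_aq:.1e} l mol^-1 s^-1")
print(f"midday gas-phase lifetime: {tau_gas_midday:.1e} s")
```

At the midday OH level the gas-phase lifetime shortens by the same factor of five as the OH increase, which is the sense in which the gas-phase sink is five times stronger.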
We have shown that a multiphase pathway involving aldehyde hydrates is decisive in predicting organic acid formation and atmospheric acidity. It could also be important in the presence of deliquescent aerosols and would explain the elevated HCOOH levels in cloud-free conditions29. Given the favourable hydration equilibrium constants for major C₂–C₃ carbonyls30, this pathway opens up avenues for more realistic representation of other abundant organic acids, and hence of cloud-droplet nucleation and cloud evolution. We expect the multiphase processing for glyoxal and methylglyoxal to be important for explaining the observed concentrations of oxalic and pyruvic acids4. Understanding these multiphase processes advances our knowledge of atmospheric reactive carbon oxidation chains and of chemistry–climate interactions.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03462-x.

Model setup and simulations
Simulations were performed with the ECHAM5/MESSy v2.53.0 model11 (EMAC) on the JURECA supercomputer31. A horizontal resolution of T63 (about 1.8° × 1.8°), with 31 vertical layers from the surface up to the lower stratosphere at 10 hPa, was applied. Chemical feedbacks are deactivated by using the quasi chemical transport mode32. Biomass-burning emissions are calculated with the Global Fire Assimilation System (GFAS) inventory33. The emission factors for organic compounds were taken from ref. 34, except the ones for aromatics, which were taken from refs. 35,36. Anthropogenic emissions of NOₓ and organic compounds were taken from ACCMIP37. The chosen gas-phase chemical mechanism includes a state-of-the-art representation of terpene and aromatics oxidation chemistry20. The EMAC cloud and precipitation parameterization follows ref. 38.
In the reference model simulation (EMAC(base)), HCOOH production proceeds through the ozonolysis of alkenes with terminal double bonds (simple alkenes and degradation products of isoprene and monoterpenes), alkyne oxidation, reaction of formaldehyde with the peroxy radical, oxidation of enols, and formation from vinyl alcohol39. Nonetheless, we exclude the OH-initiated oxidation of isoprene and monoterpenes, the corresponding mechanisms of which are still speculative6,8,40,41, as well as the reaction of methyl peroxy radical with OH, which was shown not to yield HCOOH42. A detailed description of the relevant chemical kinetics, budget terms and deposition parameters for each model simulation is provided in Supplementary Information, section 3a.
Two simulations with the explicit multiphase model for methanediol, EMAC(dioh) and EMAC(diol), are described in detail in Supplementary Information, section 3b. The simulations differ only by the value of the Henry's law constant (solubility) of methanediol, for which no experimental measurements are available. Values of about 10⁴ M atm⁻¹ and 10⁶ M atm⁻¹ are used for EMAC(diol) and EMAC(dioh), respectively. These are possible values of the Henry's law constant for methanediol, given the spread of estimates at 298 K by semi-empirical methods and the expected temperature dependence. However, higher values (around 10⁷ M atm⁻¹) cannot be excluded at typical temperatures of warm clouds (Supplementary Information, section 3b.iii).
For the comparison with IASI and OMI observations (Fig. 1, Extended Data Figs. 3–6), the HCOOH and formaldehyde volume mixing ratio profiles simulated by EMAC are sampled along the Sun-synchronous satellite Metop-A and Aura orbits, respectively, at the time and location of the IASI and OMI measurements, using the SORBIT submodel11. The sampled volume mixing ratios are then daily averaged and converted into HCOOH and formaldehyde columns. Model sources of uncertainties, including the formation of a HCOOH·H₂O complex with water vapour43, are discussed in Supplementary Information, section 5.

IASI column observations
Version 3 of the artificial neural network for IASI (ANNI) was applied to retrieve HCOOH abundances from the IASI measurements (see refs. 13,45 for a comprehensive description of the retrieval algorithm and the HCOOH product). The ANNI framework was specifically designed to provide a robust and unbiased retrieval of weakly absorbing trace gases such as HCOOH. The retrieval relies on a neural network to convert weak spectral signatures to a total column, accounting for the state of the surface and atmosphere at the time and place of the overpass of IASI. The vertical sensitivity of IASI to HCOOH peaks between 1 km and 6 km, gradually decreasing outside that range46. However, by assuming that HCOOH is distributed vertically according to a certain profile, the neural network is able to provide an estimate of the total column of HCOOH. Because the ANNI retrievals do not rely on a priori information, no averaging kernels are produced and the retrieved columns are meant to be used at face value for carrying out unbiased comparisons with model data (see ref. 13 and references therein for the rationale). Data filtering prevents retrieval over cloudy scenes and post-filtering discards scenes for which the sensitivity to HCOOH is too low for a meaningful retrieval.
The HCOOH product comes with its own pixel-dependent estimate of random uncertainties, calculated by propagating the uncertainties of each input variable of the neural network13. For a typical non-background HCOOH abundance ((0.3–2.0) × 10¹⁵ molecules cm⁻²), the relative uncertainty on an individual retrieved column ranges from 10% to 50%, with the highest uncertainties found for the low columns. This uncertainty increases for lower-background columns as the weaker HCOOH concentrations approach the IASI detection threshold. However, these random uncertainties become negligible for the column averages presented here, because of the large total number of measurements per grid cell. With respect to systematic uncertainties, the main term is related to the assumption of a fixed HCOOH vertical profile. It is not possible to quantify this uncertainty on an individual-pixel basis, but it was estimated to not exceed 20% on average13. A comparison with independent HCOOH columns from ground-based FTIR measurements at various latitudes and environments confirmed the absence of any large systematic biases of the IASI data45. Although biases of around 20% cannot be excluded, in the context of this work, the accuracy of the IASI product is sufficient to demonstrate the initial model underprediction (EMAC(base)) of the HCOOH columns and the large improvements from the multiphase mechanism.
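The statement that random uncertainties become negligible in the column averages reflects the usual 1/√N reduction of the uncertainty of a mean. The sketch below illustrates this under the assumption that the per-pixel errors are independent and of equal size; the sample counts are hypothetical, because the number of measurements per grid cell is not given in the text.

```python
import math


def uncertainty_of_mean(single_relative_uncertainty: float, n_measurements: int) -> float:
    """Relative random uncertainty of the mean of n independent measurements."""
    if n_measurements < 1:
        raise ValueError("n_measurements must be at least 1")
    return single_relative_uncertainty / math.sqrt(n_measurements)


# Per-pixel relative uncertainties quoted in the text (10% to 50%).
# The sample counts below are hypothetical, chosen only for illustration.
for single in (0.10, 0.50):
    for n in (100, 1000):
        mean_unc = uncertainty_of_mean(single, n)
        print(f"single-pixel {single:.0%}, N = {n:4d} -> mean uncertainty {mean_unc:.1%}")
```

Systematic terms, such as the fixed vertical profile assumption, do not average out in this way, which is why they are discussed separately.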

Data availability
The raw quantum chemical data are provided in Supplementary Information, section 10. Source data are provided with this paper.

Code availability
The Modular Earth Submodel System (MESSy) is continuously being developed and applied by a consortium of institutions. The usage of MESSy and access to the source code is licensed to all affiliates of institutions that are members of the MESSy Consortium. Institutions can become a member of the MESSy Consortium by signing the MESSy Memorandum of Understanding (more information at http://www.messy-interface.org). The modifications presented here were implemented on MESSy v2.53.0. The source code used to produce the results is archived at the Jülich Supercomputing Centre and can be made available to members of the MESSy community on request.

[...] were calculated between the daily mean FTIR and EMAC data, over the days with FTIR measurements available. The vertical sensitivity of the FTIR retrievals was accounted for by applying averaging kernels (except at Wollongong, where no averaging kernels were produced). Details on the ground-based FTIR retrievals are provided in Supplementary Information, section 6. m a.s.l., metres above sea level.

Fig. 7 | Effect of cloud processing on cloud and rainwater acidity.
a–h, pH difference of the large-scale clouds (a, e) and associated rain (b, f), and of the convective clouds (c, g) and associated rain (d, h), between the EMAC(diol) and EMAC(base) simulations. The pH differences are seasonal averages over June–August (a–d) and September–November (e–h) 2010–2012. The pH decrease is due to the additional production of formic acid (HCOOH) via the multiphase chemistry of methanediol implemented in EMAC(diol). The effect on cloud and rain pH of the EMAC(dioh) simulation is displayed in Supplementary Fig. 8 (Supplementary Information, section 8).
The effect on HCHO modelling of the EMAC(dioh) simulation is presented in Supplementary Fig. 9 (Supplementary Information, section 8).