## Introduction

The distribution of ecosystems across the globe is strongly regulated by the availability of water1. In the face of predicted future climate shifts, reconstructions of past climate and hydrology provide critical opportunities to evaluate the relationship between atmospheric CO2 and hydrological regimes during periods when Earth was significantly warmer than today2. At present, however, reconstructions of past water availability suffer from large uncertainties, especially in regions that are prone to large fluctuations in climate and hydrology in response to variations in pCO2 (e.g. Western North America).

A number of semi-quantitative geochemical methods have been used to infer paleoaridity. These include: (i) stable isotopes of fossil bones/teeth3,4, speleothems5, and soil carbonates6,7; (ii) the presence and/or provenance of dust in ice cores8; (iii) the carbon isotope composition of plants9,10; and (iv) the geochemical composition of paleosols11,12. While these methods have advanced our understanding of paleohydrological processes, they possess a number of significant mechanistic, temporal or spatial limitations. For example, dust in ice cores is strongly influenced by changes in wind strength and direction, in addition to aridity13. Ice cores also only record the last few million years of glacial conditions, placing important temporal constraints on paleoclimatic reconstructions. Conflict between paleosol and fossil bone carbonate δ18O records4,12 is likely due to the fact that the isotopic composition of ungulate tooth enamel can record the influence of a complex mixture of factors such as temperature, water availability and atmospheric circulation patterns14, as well as the ability of animals to migrate significant distances to continue residing in their ideal habitat15. Furthermore, soil carbonates only form in a discrete range of hydrologic conditions. A shift from a dry environment (characterized by soil carbonate formation) to wetter conditions is often marked by a reduction or complete absence of soil carbonate nodules.

In recent decades, there has been a dramatic upsurge in the use of molecules present in the epicuticular waxes coating leaves of terrestrial and aquatic higher plants (Fig. 1) as proxies for paleoclimatic and paleoenvironmental reconstructions16. The are many advantages to this biomarker-based approach. n-Alkyl lipids from higher plants are found in both terrestrial and marine sedimentary systems10,17,18,19,20, and are relatively resistant to shallow burial diagenesis21,22,23, enabling paleoenvironmental reconstruction over hundreds of millions of years of Earth history24,25. A number of methods have been used to infer paleoclimatic information from leaf wax biomarkers. The δ13C compositions of n-alkanes in sedimentary archives are commonly used to identify the proportion of C3 and C4 plants, and hence extrapolate changes in temperature and aridity26,27. Relative humidity and the availability of moisture are also thought to be important determinants of the relationship between the hydrogen isotope composition of precipitation and the δD signal recorded by leaf waxes28. The extent that leaf wax n-alkane δD values record this aridity/transpiration signal, however, remains controversial. Some studies of modern vegetation suggest that transpiration influences n-alkane δD values29, yet grasses grown under controlled greenhouse conditions show no n-alkane δD shifts in response to changes in relative humidity30. In addition, biochemical processes are shown to drive interspecies variation in n-alkane δD among modern plants31,32, suggesting that mechanisms controlling leaf wax lipid hydrogen isotope compositions are not yet fully constrained.

The molecular distribution of leaf wax n-alkanes, commonly described by indices such as the Average Chain Length (ACL), is shown to vary along a latitudinal gradient, a phenomenon previously interpreted to be driven by changes in temperature33,34,35. However, correlation does not necessarily imply causation. ACL is also known to be related to changes in the availability of moisture, with longer chain lengths prevalent in more arid environments34,36,37,38. The trend for plants to synthesize longer alkane homologues under conditions with lower humidity has even been documented during glacial periods, where aridity is synchronous with colder climates39.

There are several ways to express the moisture state of the atmosphere. Relative humidity (rH), for example, describes the ratio of the actual vapor pressure of the air, (e a ) and the saturation vapor pressure of air at a given temperature, e s (T a ), and is reported as a percentage as shown in Equation 140:

$${RH}=100\times \,\frac{{e}_{a}}{{e}_{s}({T}_{a})}$$
(1)

Relative humidity is therefore not an actual measurement of the quantity of water vapor in the atmosphere, but a simple ratio of two known values40,41. The actual water holding capacity of the atmosphere, however, doubles exponentially for every ~11 °C of temperature increase, meaning the same relative humidity over a range of temperatures can reflect very different atmospheric moisture contents41. In contrast, vapor pressure deficit (VPD) reports an absolute measure of the amount of atmospheric moisture relative to e s (T a ) which is a function of temperature. VPD is defined in Equation 240:

$${VPD}={e}_{s}({T}_{a})-{e}_{a}$$
(2)

VPD reflects the atmospheric moisture deficit, which controls the extent to which the atmosphere can extract moisture from land surfaces (and by extrapolation the evaporative demand on plants). If all other factors (e.g. windspeed, water availability) are constant, for a given VPD the rate of evaporation is constant regardless of air temperature40. A VPD of 2 kPa will therefore have the same impact on the rate of evaporation from plant leaves regardless of MAT - indeed, this is why VPD is argued to be a better determinant of plant water stress than rH40,41.

Just as plants respond to variation in the intra- vs. extracellular pressure in CO2, the gradient between atmospheric and intracellular or leaf surface concentration of specific molecules (including water), influences the flux of volatile and organic compounds from plant leaves42,43. In addition, gene regulation within plants is shown to be sensitive to specific biologic triggers related to water availability, not temperature44. Here, we show that ACL records changes in VPD at the ecosystem scale, based on the statistically significant relationship between ACL and mean annual VPD (VPDav) across a range of biomes in North and Central America. We then apply this transfer function to an organic molecular record from Miocene sediments in Central Spain that span the Middle Miocene Climatic Optimum (MMCO) and the Middle Miocene Climate Transition (MMCT). This new paleohydrological tool enables reconstruction of VPD during this global climate transition.

## Sampling and Statistical Analysis

### Modern calibration studies

We compiled a comprehensive dataset of 149 new and previously published soil n-alkane profiles from North and Central America (Fig. 2; Table S1). Soils span a range of Koppen climates, from Koppen number 12 to 4345. We obtained climate variables from the ‘PRISM’ database (PRISM Climate Group, 2010), and the WorldClim 2 database46 (Supplementary Information). Both of these databases have a spatial resolution of ~800 m to 1 km grid squares, and provide 30 year normals of climate parameters such as monthly average temperature, precipitation and vapor pressure deficit. Sample sites span a range of ecosystem types with MAT from −0.2 to 26.6 °C, while ACL values from our selected soils range from 28 to 33, spanning the range of values commonly reported in sedimentary lipid biomarker studies34,47. Importantly, we focus on the ACL recorded in soils and sediments, which integrate vegetation inputs totalling hundreds or thousands of years, providing a combined ecosystem-scale signal rather than that of any individual plant species. Mean annual VPD values for all modern sites range from 0.4 kPa to 1.3 kPa. All statistical analyses describing the relationships between our data are carried out using Minitab v. 17.

To establish the relationship between sedimentary leaf wax n-alkane Average Chain Length (ACL) values and annual average vapor pressure deficit (VPDav) we combined previously published soil n-alkane distribution profiles from North and Central America33,35,48, with new measurements of soils and sediments from western North America (SI Table 1). Where possible, we recalculated this index from peak area determinations provided in previous publications using Equation 3 (where C x refers to the peak area of the individual alkane), to ensure consistency in the alkane chain lengths used:

$${ACL}=\frac{{C}_{25}(25)+{C}_{27}(27)+{C}_{29}(29)+{C}_{31}(31)+{C}_{33}(33)+{C}_{35}(35)}{{C}_{25}+{C}_{27}+{C}_{29}+{C}_{31}+{C}_{33}+{C}_{35}}$$
(3)

We note the existence of a broad positive relationship between VPD and MAT in our sampling locations (Fig. S2). However, the presence of a genetic signalling pathway upregulating the production of leaf wax precursors in response to water deficit44, and the absence of a similar molecular response to increasing temperatures, leads us to conclude that it is VPD, rather than MAT, that is the dominant control on our ACL values.

## Results and Discussion

### Relationship between ACL and VPD in Modern Soils

We used ACL and climate information from modern soils that span North and Central America to quantify the relationship between chain length distribution and VPD (Supplementary Information Table S2). We performed regression analysis using VPD as the predictor variable and ACL as the response variable, and identified a statistically significant (p < 0.05) positive relationship (Fig. 3) between ACL and VPD, described by Equation 4:

$${\rm{ACL}}=26.78+5.670\,{\rm{VPD}}-2.162\,{{\rm{VPD}}}^{2}$$
(4)

Our data show that half of the variation in ACL can be explained by variation in VPD. The residuals plot (Fig. 3b) displays homoscedacity, indicating that errors associated with the model predictions are stochastic. The standard error associated with VPD values calculated using our regression model is ±0.1 kPa. This error was subsequently assumed for all VPD reconstructions from the Miocene sediments.

### Proposed mechanism driving relationship between ACL and VPD

There are two primary mechanisms that could account for the observed relationship between ACL and VPD. The ‘plant production’ mechanism requires that plants actively shift towards preferential production of the longer-chain n-alkane homologues via the acetogenic lipid pathway (thus increasing ACL) in response to external environmental or climatic drivers49. Although the overall composition of leaf wax is genetically regulated and highly responsive to developmental and environmental factors50, previous studies show conflicting results regarding whether n-alkane distribution patterns vary dynamically and systematically in response to environmental change. A number of authors suggest that ACL is primarily linked to temperature34,51, however there is no specific biologic mechanism linking temperature to the chain length of compounds that are produced. But water deficit and osmotic stress stimulate transcription of the CER6 gene, which is involved in the elongation of fatty acid acyl-CoAs longer than C2244,50. This means that, unlike for temperature, there is a specific biologic signalling system that relates leaf wax biosynthetic processes to water deficit. Further, examination of our temperature and VPD data for all sampling sites show that these are linearly related (Supplementary Information Fig. S2), thus apparent links between ACL and MAT are most likely dominantly driven by the moisture deficit effects, rather than temperature alone.

The relationship between ACL and VPD does not require the invocation of a plant physiological or biochemical mechanism, however, as it can be explained entirely through physical processes. n-Alkanes are known to be released from plants alongside other volatile compounds (e.g., terpenes, isoprene) and water52,53. When VPD increases, evaporation of these volatiles from leaves rises, as plants regulate stomatal and cuticular transpiration in response to changes in the moisture availability of the atmosphere43. Longer chain length homologues have higher molecular masses than shorter-chain n-alkanes (e.g., nonacosane = 408.799 g/mol, pentacosane = 352.691 g/mol and tricosane = 324.637 g/mol), and evaporate more slowly due to stronger inter-molecule bonds54. As a result, shorter homologues are likely to be preferentially lost when VPD increases, potentially contributing to the observed positive relationship between ACL and VPD.

Sedimentary n-alkanes record the combined effect of VPD on many individual plants over hundreds and/or thousands of years. Thus, while modern studies of individual plant n-alkane responses to environmental perturbations may be highly specific and display a wide scatter, sedimentary alkanes record long-term ecosystem-scale changes due to climatic and hydrologic shifts. Further, should plants that preferentially produce longer-chain alkanes (as a protection against increasing aridity) proliferate as an ecosystem becomes more moisture depleted, soils will incorporate this vegetation shift signal over long formation timescales. Thus, as we specifically focus on soils here in our modern calibration, we effectively encompass this potential ‘ecosystem-change’ driver of the observed relationship between ACL and VPR in our transfer function.

### Paleoaridity changes during the MMCO and MMCT in Central Spain

The Miocene was a time of rapid climate shifts. The Middle Miocene Climatic Optimum (MMCO, 15–17 Ma) was the warmest period of the Neogene with temperatures some 3–8 °C warmer than present55. In contrast, the subsequent Middle Miocene Climate Transition (MMCT, ~15–13.7 Ma) saw the widespread expansion of the East Antarctic Ice Sheet, and a shift towards much cooler conditions56. On a global scale, changes in atmospheric pCO2 during the Miocene are likely to have played a dominant role in climate dynamics57. At a regional level, however, spatial heterogeneity in the nature and magnitude of environmental change suggests that more localised mechanisms such as tectonic uplift, changes in regional geology, shifts in temperature gradients or variation in freshwater inputs also influence terrestrial temperature and hydrology across Europe58. The production of new high-resolution sedimentary sequences spanning the MMCO and the MMCT, and the development of new organic molecular tools for paleoclimate reconstruction, are critical steps in the evolution of our understanding of terrestrial environmental change during the mid and late Miocene.

The Armantes section is located in the Calatayud-Daroca basin of central Spain and contains expanded sections of fluvial and paleosol sequences spanning the MMCO59 and the cooling associated with the MMCT57,60,61. Sedimentation is continuous, with no observed hiatuses between 17 Ma and 12 Ma62. The section is up to 280 m thick, consisting of alternating red clay/silts and pink/white indurated silty limestones for much of its expanse59. The Calatayud-Daroca basin contains many fossil bearing sediments, making it a key location for understanding the response of ecosystems in Southwest Europe to Miocene climate shifts58. The age model for this section is based on well-established magnetostratigraphy59,62. A number of paleobiological studies establish that many parts of Spain experienced widespread cooling and drying during this time interval60,63, making it an ideal sequence to evaluate our new leaf wax biomarker-based paleoaridity proxy and expand existing paleohydrological reconstructions of the MMCO and MMCT in central Spain.

Sediments from the Armantes section record a positive shift of ~3 ACL units between 16.4 and 12.4 Ma. These sediments span > 200 m of stratigraphic section, and are unlikely to have experienced differential diagenesis between the top and bottom of the section following burial. Equally, despite sediments containing both paleosol and floodplain deposits, there is no indication that the observed changes in ACL are due to an increased contribution from wetland/aquatic plants. Indeed, aquatic plants frequently have distinctive n-alkane distribution patterns with carbon chain maxima at C23 or C2564, while all of the Armantes samples analysed here (with the exception of the sample dating from 16.4 Ma) have carbon chain maxima of C29 or C31. Rearranging Eq. 4 as shown in Eq. 5, we calculate reconstructed VPD values varying from 0.13 to 0.92 kPa during this interval, with higher values occurring at ~15.2 Ma during the MMCO, and again between 13.5 and 12.5 Ma, commencing during the MMCT and continuing into the late Miocene (Fig. 4).

$${\rm{VPD}}=1.3125-\sqrt{14.1208-0.4629\,{\rm{ACL}}}$$
(5)

Comparison of these data with previously existing paleoclimate reconstructions show good agreement, with relative decreases and increases in VPDav occurring broadly in synch with decreases in global air temperature61, atmospheric pCO257, and previous reconstructions of aridity in the region based on fossil tooth assemblages and stable isotope data60,65. We note, however, that the largest shift in reconstructed VPD occurs after 13 Ma, rather than during the MMCT interval (Fig. 4c). Reconstructions of mean annual precipitation in Europe suggest that rainfall declined throughout the late middle to late Miocene, and hence our increase in VPD (suggestive of a reduction in available moisture), is likely to reflect the broader aridification of Central and Southwest Europe previously identified between 13 and 11 Ma58. Meta-analysis of paleoprecipitation data from Europe suggests that such changes in rainfall are not directly correlated with global temperatures, but rather indicative of the influence of processes such as changes in ice-volume, shifts in hemispherical temperature gradients, and regional geography and topography66.

Analysis of modern sediment n-alkane profiles show that the distribution of homologues is strongly controlled by the moisture deficit of the environment. There are both biological and physical bases for this response, but most critically organic biomarkers in the sedimentary system record these processes allowing for quantitative reconstruction of paleo-VPD. This new organic molecular tool for paleohydrologic investigations provides new constraints on VPD in Central Spain during a critical climate transition, highlighting the applicability of this technique over geologic timescales.

### Analytical methods

#### Lipid extraction and quantification

We collected materials from fresh exposed paleosol and floodplain sediments from the Armantes section of the Calatayud-Daroca Basin in Central Spain deposited from the peak of MMCO warming (~16.5 Ma) to the MMCT transition period (~12.4 Ma). Modern surface soils collected as part of this study were collected from the top ~10 cm, with visible plant and root material removed. Soils and sediments were freeze dried as soon as possible after collection to minimise the potential for microbial alteration of sedimentary lipids67.

All surface soils and Miocene sediments analysed as part of this study were solvent extracted to obtain the aliphatic fraction containing n-alkanes. Approximately 150 g of dried sediment was extracted with soxhlet apparatus, using a 2:1 (v/v) mixture of dichloromethane and methanol. Extracts were concentrated under a stream of N2 gas, and then separated into compound fractions by silica gel chromatography using ashed Pasteur pipettes packed with activated silica gel (70–230 mesh). Aliphatic, aromatic and polar solutions were eluted by the sequential application of 2 ml hexane, 4 ml dichloromethane, and 4 ml of methanol. The hexane fraction was further purified using urea adduction. Adducted normal alkanes were extracted into hexane, concentrated under N2, and analyzed for molecular distributions on a Thermo-Scientific Trace GC Ultra equipped with a split-splitless injector and a flame ionisation detector, using a DB-5 column (60 m × 0.25 mm i.d., 0.25 µm) with helium as a carrier gas. Alkane peaks were identified by comparing retention times to those of a laboratory standard, and the Average Chain Length (ACL) was calculated using Eq. 3, and results are shown in Table S2.

### Data availability

All data generated or analysed during this study are included in this published article (and its Supplementary Information files).