Introduction

The aviation industry contributes to anthropogenic climate change through CO2 and non-CO2 emissions. Recent studies have underscored the significance of aviation non-CO2 emissions, which are now thought to be responsible for around two-thirds of the total warming from aviation1. Of primary importance is the release of nitrogen oxides (NOx)2,3,4,5,6, water vapour7,8 and aerosols9,10,11,12, and also the formation of contrails13,14,15,16,17,18. The EU Parliament recently adopted legislation (Directive 2023/958 of May 10, 2023 amending Directive 2003/87/EC) that aims to revise the EU Emissions Trading System (ETS) for aviation, inter alia requiring the European Commission to include aviation non-CO2 effects in a monitoring, reporting and verification (MRV) framework and, if deemed appropriate, expand the scope of the ETS to include aviation non-CO2 effects by the end of 2027. Such implementation requires the use of a climate metric, which relates non-CO2 emissions and effects to their consequences on the climate and/or on society19,20,21,22. To maintain compatibility with market-based or offsetting schemes such as the ETS, climate metrics are often used as exchange rates, expressing non-CO2 emissions on a common scale with CO2 emissions. This single-basket approach can simplify climate negotiations and the implementation of climate policies22.

However, establishing an adequate equivalence is not trivial and there is currently no consensus on which climate metric is most appropriate for aviation. In international climate policy, the most commonly used climate metric is the Global Warming Potential (GWP)19,23, although it has been heavily criticised, primarily due to its dependence on the time horizon20,24,25,26,27. The choice of climate metric for aviation climate policy is further complicated because aviation non-CO2 emissions and effects have highly varying atmospheric lifetimes and efficacies28, are dependent on the emission time, altitude and location29,30 and their impacts on the climate have a high degree of uncertainty1,31. Furthermore, since each climate metric uses a different climate indicator (e.g., stratospheric-adjusted radiative forcing or global mean near-surface temperature change)32 and calculation method, a climate metric can inherently and inadvertently place emphasis on certain aircraft design choices, emission species or effects. The choice of climate metric is thus an important consideration for all stakeholders to ensure that the implementation of climate policy results in the desired reduction of the aviation industry’s impact on climate and on society22,32.

In this paper, we explore the applicability of existing, physical climate metrics to the aviation industry. Specifically, we analyse the compatibility to aircraft design and aviation policy, methodological simplicity, neutrality and stability of the following conventional climate metrics: Radiative Forcing (RF) and the Radiative Forcing Index (RFI—relative RF)33,34, GWP19,23, Global Temperature Change Potential (GTP)35, Integrated GTP (iGTP)36 and Average Temperature Response (ATR)37. The performance of a recently proposed, unconventional method that relates the changes in emission rates of short-lived species to pulses of CO2, denoted GWP*38,39,40, is also evaluated. We further analyse derivatives of the GWP and GWP* that are weighted by the efficacy, which we denote the Efficacy-weighted Global Warming Potential (EGWP) and EGWP* respectively. We find that, compared to the dominant GWP, a more accurate assessment of existing as well as future aircraft powered by novel aviation fuels would be enabled through the introduction of the ATR or EGWP into climate policy. We recommend further research into the potential use of the EGWP and into new efficacy estimates for aviation non-CO2 emissions.

Results

Development of climate metric requirements

Climate metrics are used in both their absolute form and relative to CO2. In aviation, absolute climate metrics have two primary use cases: in trajectory optimisation, where aircraft are re-routed to avoid climate-sensitive regions41,42; and in aircraft design, where the climate metric can be part of the design trade-off process43. Relative metrics are primarily used at a policy level, notably to calculate CO2-equivalent emissions and multipliers in single-basket emissions trading schemes such as the ETS19,22.

From these use cases, we identify the following main requirements for climate metrics used for aviation. Aviation climate metrics shall:

  1. Neutrally represent the chosen climate indicator (REQ 1)32,44. Value judgements should be left to policymakers and should not be built into climate metrics. Therefore, a climate metric should not exhibit any inherent bias towards specific aircraft design changes.

  2. Be temporally stable (REQ 2)45. For aviation policy, it should be possible to use climate metrics to monitor how well the industry is performing through annual and quarterly reports. The results shown by a climate metric, and thus the emission offsetting cost, should not vary to the extent that policymakers cannot gauge the effectiveness of their policies, and airlines and other stakeholders cannot estimate their offsetting cost.

  3. Be compatible with existing climate policy (REQ 3)45. A new climate metric must still be able to perform the same functions in the current climate policy context.

  4. Be simple to understand and implement (REQ 4)20,21,45. Non-specialists should be able to understand how a climate metric is calculated and be able to correctly interpret what its results show.

In the following, we analyse each requirement individually and recommend the best-suited climate metric based on the results. We also analyse the impact of the time horizon on the results. We perform our analyses using the climate-chemistry response model AirClim46,47, which provides yearly global mean radiative forcing and temperature change values from spatially resolved aviation scenarios for CO2, water vapour, contrails and NOx-induced changes in ozone (short and long-term) and methane. For the purposes of analysing climate metrics with time horizons in the order of years, the responses of other very short-lived species such as aerosols are expected and assumed to be qualitatively the same as for contrails.

Note that in this paper, we add an A to denote an absolute climate metric (i.e., AGWP, AEGWP, AGTP, iAGTP) and use rATR to denote the relative ATR (to CO2). The GWP* and EGWP* do not have absolute and relative forms. Where necessary for clarification, we also use P-, F- and S- to denote climate metrics calculated using a pulse, fleet or total aviation industry emission scenario, respectively.

REQ 1: climate metric neutrality with respect to aviation emissions

To assess the neutrality of climate metrics for aircraft design, the peak and average total temperatures and climate metric values of potential future fleets are compared. A wide range of narrowbody fleets are generated using a Monte Carlo simulation of various high-level aircraft parameters, including the use of conventional as well as novel aviation fuels such as SAF and hydrogen. The fleets are analysed using the climate-chemistry response model AirClim46,47, as described in “Methods” (cf. Table 3). The neutrality of each climate metric is gauged by the frequency f of incorrect fleet pairs—defined here as when the signs of the differences in peak/average temperature (ΔT) and total climate metric value (CM) between any two fleets i and j do not match—compared to the total number of fleet pairs (cf. the method used by Grewe et al.48):

$$f(H)=\frac{1}{C(N,2)}\sum\limits_{\mathrm{i}=1}^{N}\sum\limits_{\mathrm{j}=1,\,\mathrm{j}\ne \mathrm{i}}^{N}\begin{cases}1 & \mathrm{if}\ (\mathrm{CM}_{H,\mathrm{j}}-\mathrm{CM}_{H,\mathrm{i}})\times (\Delta T_{\mathrm{j}}-\Delta T_{\mathrm{i}}) < 0\\ 0 & \mathrm{otherwise}\end{cases}$$
(1)

where N = 10,000 is the total number of fleets. Figure 1 shows the frequency of incorrect fleet pairs as a function of the time horizon H; example results for a single time horizon H = 100 are shown in Supplementary Fig. 1.
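The pairwise check behind Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `incorrect_pair_frequency` and the example values are hypothetical, and each unordered fleet pair is counted once so that the result lies between 0 and 1.

```python
from itertools import combinations


def incorrect_pair_frequency(cm_values, delta_t_values):
    """Fraction of unordered fleet pairs whose metric and temperature
    differences have opposite signs (hypothetical helper)."""
    if len(cm_values) != len(delta_t_values):
        raise ValueError("one metric value and one temperature per fleet")
    n = len(cm_values)
    if n < 2:
        raise ValueError("at least two fleets are needed to form a pair")

    incorrect = 0
    total = 0
    # Assumption: each unordered pair (i, j) with i < j is counted once.
    for i, j in combinations(range(n), 2):
        total += 1
        d_cm = cm_values[j] - cm_values[i]
        d_temp = delta_t_values[j] - delta_t_values[i]
        # A pair is "incorrect" when the metric ranks the two fleets
        # in the opposite order to the temperature change.
        if d_cm * d_temp < 0:
            incorrect += 1
    return incorrect / total


# Illustrative values only (not results from the paper).
example_cm = [1.0, 2.0, 3.0]
example_dt = [0.1, 0.3, 0.2]
example_f = incorrect_pair_frequency(example_cm, example_dt)
```

In this toy example only the pair formed by the second and third fleets is ranked inconsistently, so one of the three pairs is incorrect. For N = 10,000 fleets there are roughly 50 million pairs, so a vectorised implementation would likely be preferable in practice.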

Fig. 1: Comparison of the neutrality of different climate metrics to changes in aircraft design.

Shown is the frequency of incorrect fleet pairs, corresponding to those where the signs of the peak/average temperature change and climate metric change do not match, as a function of the time horizon for the peak temperature (a) and 20-, 50- and 100-year average temperature (b–d) climate objectives. Each climate metric is represented by a different combination of colour, line style and marker. Values are available between 5 and 100 years with a time horizon step of 5 years; however, markers are shown every 10 years for clarity.

In general, the similarity shown in Fig. 1 between the results for all climate metrics and climate objectives suggests that the peak temperature is also a good indicator for the average temperature and vice versa over a wide range of time horizons. The endpoint climate metrics F-RF and F-AGTP show a clear dependence on the time horizon and hence on the shape of the temporal emission profile (temporal evolution of yearly emissions per species). This demonstrates that a single value of radiative forcing or temperature at a time in the future is not a good indicator of the peak or average temperature. Since the radiative forcing from a number of aviation non-CO2 effects, such as the warming effect from the NOx-induced short-term increase in ozone (e.g., ref. 1), has dissipated at large time horizons, the F-RF in particular can have low or altogether no sensitivity to a number of aircraft and engine design changes, leading to a high rate of incorrect fleet pairs. The integrated climate metrics F-AGWP, F-AEGWP and F-iAGTP/F-ATR, in comparison, have a memory of these previous emissions and are less dependent on the temporal emission profile.

The F-AGWP and F-GWP* show largely linear responses for peak and average temperature, particularly for time horizons above 60 years, but in general have higher frequencies of incorrect fleet pairs than climate metrics based on temperature or using efficacy. The F-EGWP* has a similarly low dependence on the time horizon with a lower frequency of incorrect fleet pairs, demonstrating almost ideal behaviour in this context. Whilst the F-iAGTP/F-ATR has a clear minimum at 70 years for peak temperature and at 20, 50 and 100 years for the corresponding average temperatures, it generally has a frequency of incorrect fleet pairs of less than 2%. The F-AEGWP performs very similarly, surpassing 2% error frequency only for the 20-year average temperature, and has clear minima at slightly lower time horizons: 55, 15, 45 and 80 years, respectively.

REQ 2: temporal stability

The temporal stability of climate metrics is judged using CO2-eq trajectories for the full aviation industry. In this work, we use the CORSIA and FP2050 scenarios developed by Grewe et al.49 as examples. The CORSIA scenario assumes business as usual, but that CO2 emissions are offset beyond 2020; whereas the FP2050 scenario makes use of the Flightpath 2050 targets: 75% CO2 and 90% NOx reduction by 2050. Figure 2 shows the CO2-eq emissions calculated for both scenarios using each climate metric with a 100-year time horizon. Two elements of the responses are highlighted here.

Fig. 2: Comparison of the climate metric responses for full aviation emission scenarios.

Shown are the CO2-eq emissions calculated using each climate metric with a 100-year time horizon for the CORSIA (a) and FP2050 (b) scenarios. Each climate metric is represented by a different combination of colour, line style and marker for clarity. Also shown is the fuel use (red dashed line) and temperature (solid black line) response for each emission species calculated using AirClim for the CORSIA (c) and the FP2050 (d) scenarios. All values are calculated on a yearly basis—the markers for each species are differently spaced such that overlapping lines can more easily be identified.

First, the total CO2-eq values calculated using the endpoint S-RFI and S-GTP climate metrics are very similar, although as the emission rate (rate of change of yearly emissions over time) reduces around the year 2020 in the FP2050 scenario the results begin to drift apart. The RFI and GTP can thus be seen to be stable for the analysed full aviation emissions. However, both climate metrics can struggle to show qualitatively the same response for pulse and constant emissions (see e.g., ref. 50, their Fig. 3.3), depending on the chosen time horizon.

The S-GWP and S-iGTP/S-rATR show very similar responses; the S-EGWP also produces similar, albeit generally lower results. The similarity between the S-GWP and S-iGTP/S-rATR potentially allows for species-dependent conversion factors and reduces the political capital required to switch from the standard GWP to either the iGTP or rATR in climate policy. This is a somewhat surprising result, since the bases for the climate metric calculations differ, affecting the contributions of individual species to the total CO2-eq: The GWP is RF-based, whereas the ATR is temperature-based. As a result, the GWP emphasises contrail cirrus, and the ATR the warming effect of NOx-induced ozone (see Supplementary Fig. 2). Nevertheless, for full aviation scenarios assuming Jet-A1 fuel, the differing contributions seem to balance out. It is, however, likely that the introduction of novel propulsion technologies and fuels, which change the emission indices relative to one another, will result in a divergence of the total CO2-eq emissions calculated by both metrics. The rATR would then likely more closely match the EGWP than the GWP. Further research could analyse the response from different models and emission inventories to check the validity of conversion factors, in particular for novel aviation fuels.

A second noticeable element is the rapid deviation of the S-GWP* and S-EGWP* from the response shown by all other climate metrics in both scenarios. The GWP* calculation method uses an average of the previous 20 years of radiative forcing and is closely tied to the emission rate given by the scenario. Therefore, small changes in the emission rate, in the years 2020 and 2050 in the CORSIA scenario, result in large changes in the CO2-eq trajectory. A policymaker using the GWP* or EGWP* method to monitor CO2-eq emissions between 2030 and 2050 could incorrectly assume that the impact of aviation is reducing, when in actuality only the emission rate has decreased.

This instability is particularly problematic for the FP2050 scenario. Whilst the values from all other climate metrics largely correspond to the reducing fuel use, the S-GWP* and S-EGWP* show negative CO2-eq emissions between 2050 and 2080. Whilst this behaviour is useful for representing the temperature using a cumulative integral, negative CO2-eq values could easily be misinterpreted as a sign that aviation is causing an active cooling. The magnitude of the negative CO2-eq values is also disproportionately large compared to the shallow peak shown in the temperature.

REQ 3: compatibility with existing climate policy

To be compatible with existing climate policy means that a climate metric can be used in current climate frameworks and methods. These have generally been established on the basis of the GWP, which has become the most commonly used climate metric. There is thus a natural bias towards climate metrics that behave in a similar manner to the GWP. For aviation, this functionally means that any alternative to the GWP must be able to (1) calculate the temporal trajectories of CO2-eq emissions; and (2) calculate single values for fleets and individual flights, the latter of which is necessary for the introduction of aviation non-CO2 emissions into the ETS. The climate metrics RF, GWP, EGWP, GTP, iGTP and ATR are able to perform these functions. However, the GWP* and EGWP* struggle to provide a single value for an individual fleet or flight.

Rather than providing a single value for a given time horizon, the GWP* method provides a temporal trajectory for each emission species, as shown in Fig. 3 for a simple fleet temporal emission profile. If used as a climate metric, for example, to compare this fleet to another, there is no obvious point along the temporal trajectory to choose. Indeed, the choice of which point to use is itself a trade-off between different emission species. Therefore, whilst the GWP* method is useful for certain technical discussions, it should be seen as a model rather than a metric, as previously argued by Meinshausen and Nicholls45, and should not be viewed as a potential replacement for the GWP in aviation policy.

Fig. 3: CO2-eq emissions calculated using the GWP*100 method for an example fleet demonstrating the flow-based nature of the GWP*.

The inset figure shows the temporal emission profile; the main figure the CO2-eq response for each species (colours). For the comparison of fleets in this study (F-GWP*), the peak total value is used, in this example occurring in the year 2050. Note that the total value is dominated by the contrail impact and that different species have their peaks at later times. Therefore, the choice of which value to use is itself a trade-off between different emission species.

REQ 4: simplicity

The endpoint climate metrics RF and GTP are clearly the easiest to understand. It is straightforward to determine how these climate metrics behave for different time horizons, background emissions scenarios and fuel scenarios. Integrated climate metrics—GWP, EGWP, iGTP and ATR—are more complex and it can be difficult to ascertain the impacts of individual effects and species on the results. The least simple to understand and implement are the GWP* and EGWP*: Their behaviour can be puzzling even for simple temporal emission profiles and can show initially counter-intuitive results, such as the negative emissions in Fig. 2.

In comparison to temperature-based climate metrics (GTP, iGTP, ATR), climate metrics based on radiative forcing (RF, GWP) are easier to implement since they do not need a full climate or carbon-cycle model. The EGWP requires efficacy values and is thus more complex. However, the demand on computational time depends on which model is used to calculate the climate metric values. The GWP* and EGWP* are RF-based climate metrics, but they do use the AGWP for CO2, which would need to be defined as a standard value or calculated for a given scenario. In addition, these metrics require the temporal emission profile twenty years prior to any value, since the method uses a 20-year running average. This could potentially complicate the implementation of the GWP* method.

Choice of climate metric and time horizon

An overview of the performance of all analysed climate metrics is shown in Table 1. It is clear that the choice of climate metric must be the result of a trade-off. Based on our analysis and definition of requirements, the ATR and EGWP can be seen to perform best. Here, we inspect in more detail the advantages and disadvantages of these metrics in comparison to the GWP and investigate their dependence on the time horizon. We note that the ATR and iAGTP differ only in the division of the time horizon; in their relative forms, the rATR and iGTP are identical. However, the ATR is chosen rather than the iAGTP because the division by the time horizon improves the stability of the absolute climate metric responses.

Table 1 Overview of the performance of the analysed climate metrics with respect to each requirement

The ATR and EGWP perform similarly well in the pairwise fleet analysis (REQ 1) for the peak and average temperature climate indicators, as well as in the analysis of temporal stability (REQ 2). The rATR100 produces CO2-eq emissions that very closely match those calculated using the GWP100, potentially easing the introduction of the ATR in climate policy (REQ 3); introduction of the EGWP is even simpler since it is only a derivative of the GWP. Finally, although the concept of an average temperature change is simple to understand for non-specialists, it can be difficult to identify the impacts of specific effects on results. In comparison to the GWP, the EGWP and, in particular, the ATR as a temperature-based metric include more climatic processes, but thus also more assumptions and uncertainties. Implementation of both climate metrics can thus be seen as complex (REQ 4). Further work is required to better understand the benefits and potential downsides of the EGWP in other contexts. Further research into best estimates of the efficacy would also be beneficial.

The dependence of the GWP, EGWP and ATR on the time horizon is shown in Fig. 4 for three different emission scenarios. Individual overviews for each of these metrics and the GTP are provided in Supplementary Figs. 3–6. The inclusion of the efficacy in the EGWP is evident from the lower panel of the figure, affecting the relative importance of ozone and contrails in particular. The results of the RF-based EGWP now much more closely match those of the temperature-based ATR. Nevertheless, the total relative metric values and thus calculated CO2-eq emissions of the GWP, EGWP and ATR are very similar, especially for large time horizons. In general, the sensitivity of all three metrics to the time horizon, represented by the gradient of the relative metric values, decreases with increasing time horizon. Using a low time horizon, for example, 20 years, would require particular justification: Why was 20 years chosen rather than, say, 15, 25 or even 19 years? Instead, the responses suggest that larger time horizons, greater than around 70 years, are most suitable for integrated climate metrics. This is particularly true for the ATR, which requires larger time horizons to properly account for the delay in the temperature response of the atmosphere.

Fig. 4: Comparison of the GWP, EGWP and ATR responses for time horizons between 0 and 100 years.

Shown are the responses from the GWP (solid line), EGWP (dashed line) and rATR (dotted line), which in its relative form is equivalent to the iGTP. The top row shows the total metric value relative to CO2; the bottom row the responses calculated for each species (colours) relative to the total. Three temporal emission profiles are used, for which the fuel usage profiles are shown in the inset plots: a pulse emission (P-) in (a, d); a fleet emission (F-) in (b, e) and a 1% increasing emission (I-) in (c, f). Each response is shown for the Shared Socioeconomic Pathway SSP2-4.5 with margins (shading) for scenarios SSP1 to SSP5, which are used as the background emissions scenarios in AirClim.

Discussion

Our analyses demonstrate that the selection of a climate metric plays a crucial role in ensuring that implemented climate policies effectively reduce the aviation industry’s impact on the climate. In the fleet pairing analysis, we illustrate that climate metrics can have inherent trade-offs and favour certain aircraft designs over others. These inherent biases are undesirable since value judgements should be left to policy decision-making and not embedded into climate metrics.

The choice of climate metric is always the result of a trade-off. Due to the historical dominance of the GWP, there is a natural bias towards climate metrics that behave in a similar manner. However, our research clearly suggests that there are derivatives and alternatives that outperform the GWP for aviation. We require that a suitable climate metric displays neutrality with respect to different emission species; exhibits temporal stability; is compatible with existing climate policy; and is simple to understand and implement. These requirements are in line with those stated by others20,45. Based on these requirements, we identify the Efficacy-weighted Global Warming Potential (EGWP) and Average Temperature Response (ATR) as the most appropriate climate metrics for aircraft design and aviation policy. Both metrics are stable and can monitor the impact of the aviation industry using CO2-equivalents effectively. They also do not favour specific emission species for both peak and average temperature climate indicators across a wide range of time horizons and emission scenarios.

Whilst the ATR as a temperature-based climate metric has the potential to include more climatic processes and be more relevant for temperature-based targets than the GWP, the larger number of assumptions and uncertainties must also be considered. The EGWP may, therefore, be a useful compromise for policymakers, in that it can more accurately represent the climate impact of aviation whilst still using the GWP methodology. Further research is recommended into the advantages and potential disadvantages of using the EGWP. If the ATR were to be chosen, it would benefit from the close match of the total CO2-eq emissions calculated by the S-rATR100 and S-GWP100, despite the differences in contributions of individual species. The S-EGWP100, for its part, also produces similar results. However, it is likely that the total emissions calculated by the ATR and GWP will diverge with the introduction of novel propulsion technologies and fuels such as hydrogen since the relative contributions of the non-CO2 emissions will change.

Determining an appropriate time horizon for both the EGWP and ATR remains a challenge. The time horizon is a trade-off between incorporating the long-term response to an emission and ensuring the predictability and accuracy of a future emission scenario. We find that integrated climate metrics generally require larger time horizons to account for the atmospheric radiative forcing and temperature adjustment. If a short time horizon is chosen, policymakers must provide sufficient justification for the choice. Alternatively, values for different time horizons could be provided together, as proposed in ref. 27, although this complicates the calculation of CO2-eq emissions, for example in the upcoming ETS revision.

The accuracy of our results could be improved by using real aircraft designs: Since design parameters are chosen randomly within a given range, some fleets may not be physically feasible. However, since we used the same method of randomly choosing parameters, any additional incorrect fleet pairings caused by this limitation are assumed to cancel out and not impact our conclusions. Similarly, given the wide range of potential aircraft designs analysed in this study, it is unlikely that the choice of climate model, AirClim, has influenced the results significantly. Verification with another climate model may enhance our understanding of the results.

Ultimately, the most suitable climate metric and corresponding time horizon must be determined by policymakers depending on the policy and climate objective, the emission scenario, and whether a relative or absolute climate metric is required. Based on a general set of requirements suitable for policymaking, our findings endorse the use of the ATR and EGWP with a time horizon greater than 70 years for aircraft design and aviation policy to assess the long-term climate impact of aviation. However, the choice of climate metric does not have to be contentious or controversial: As our analysis and the numerous previous studies have demonstrated, tools exist with which the performance of any climate metric can be analysed and potential shortcomings and pitfalls identified, such that these can be addressed in climate policy.

Methods

Climate metric calculation methods

The calculation methods for all climate metrics are given in Table 2. These methods require a time series of radiative forcing (RF) and resulting temperature change ΔT. The calculation methods of the EGWP, ATR/iGTP and GWP*/EGWP* are described in more detail below. In this research, we use the climate-chemistry response model AirClim46,47 to calculate the RF and ΔT responses of individual aircraft fleets using data from the DLR WeCare project51, and of the full global fleet using scenarios developed by Grewe et al.49. These are described in more detail in subsequent sections. For ease of comparison, we use the Shared Socioeconomic Pathway SSP2-4.552 as the default background emissions scenario for our analyses, but vary between SSP1 to SSP5 in the multivariate fleet analysis. AirClim is an extension to the linear response model for CO2 developed by Sausen and Schumann53 and combines emission data with pre-calculated altitude- and latitude-dependent data obtained from steady-state simulations with the E39/CA54 climate-chemistry model (for ozone, methane, water vapour and contrails) and ECHAM4-CCMod55 (for contrail cirrus). It was chosen for this research due to its low computational cost and flexibility.

Table 2 Calculation methods for all climate metrics used in this research

EGWP—The Efficacy-weighted Global Warming Potential (EGWP) was developed as a derivative of the GWP. It aims to introduce the efficacy of non-CO2 emissions into the GWP method, such that the results obtained by the GWP more closely match those of temperature-based climate metrics. The EGWP for a single species i is then the GWP of that species multiplied by its efficacy ri, taken from ref. 28 (their Table 1). We note that this calculation method is still quite uncertain, in particular for contrail cirrus56, although it is not expected to affect the results of this study. Another potential approach to calculate the EGWP would be to use the Effective Radiative Forcing (ERF) and corresponding efficacies \(r'_{\mathrm{i}}\). Further work is required to analyse the performance of these two climate metrics and to develop better estimates of the efficacies for aviation emissions.
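The per-species weighting described above can be sketched as follows. This is a minimal illustration only: the function name `egwp` is hypothetical, and the numerical values are arbitrary placeholders, not the GWP values of this study or the efficacies of ref. 28.

```python
def egwp(gwp_by_species, efficacy_by_species):
    """Multiply the GWP of each species by its efficacy r_i
    (hypothetical helper illustrating the EGWP definition)."""
    missing = set(gwp_by_species) - set(efficacy_by_species)
    if missing:
        raise KeyError(f"no efficacy given for: {sorted(missing)}")
    return {
        species: gwp * efficacy_by_species[species]
        for species, gwp in gwp_by_species.items()
    }


# Placeholder numbers for illustration only (NOT values from ref. 28).
placeholder_gwp = {"species_a": 2.0, "species_b": 4.0}
placeholder_efficacy = {"species_a": 0.5, "species_b": 1.5}
placeholder_egwp = egwp(placeholder_gwp, placeholder_efficacy)
```

An efficacy below unity reduces the weight of a species relative to its GWP, and an efficacy above unity increases it; a species with an efficacy of exactly one is unchanged.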

ATR—The Average Temperature Response (ATR) was initially developed by Dallara et al.37 specifically for aircraft design. Initially, it included a weighting function and used an infinite time horizon H. However, the infinite time horizon in particular made it inappropriate for global fuel scenarios, for example. Since its inception, therefore, the ATR has been repurposed and is now generally used as the average temperature change over a given time horizon, as shown in Table 2. The weighting function is also no longer used. The relative ATR is denoted rATR in this research for clarity. Note that this definition of the ATR is related to the iAGTP by: \(\mathrm{ATR}_{H}=\mathrm{iAGTP}_{H}/H\).
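The relation between the ATR and the iAGTP, and the reason why their relative forms coincide, can be illustrated with a short sketch. It assumes a yearly temperature-change series and a simple rectangle-rule integration; both choices, the function names and the example series are assumptions made here for illustration and are not taken from Table 2.

```python
def iagtp(delta_t, dt=1.0):
    """Integrated temperature change over the horizon.
    Assumption: rectangle rule on values spaced dt years apart."""
    return sum(delta_t) * dt


def atr(delta_t, dt=1.0):
    """Average temperature change over the horizon H = len(delta_t) * dt,
    i.e. the integrated value divided by H."""
    horizon = len(delta_t) * dt
    return iagtp(delta_t, dt) / horizon


# Arbitrary illustrative temperature responses over a 4-year horizon.
species_response = [0.00, 0.10, 0.20, 0.30]
co2_response = [0.00, 0.05, 0.10, 0.15]

# The horizon H cancels in the ratio, so both relative forms agree.
relative_atr = atr(species_response) / atr(co2_response)
relative_igtp = iagtp(species_response) / iagtp(co2_response)
```

Dividing by H only rescales the absolute values, which is why the choice between the ATR and the iAGTP matters for absolute metrics but not for CO2-equivalent calculations.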

GWP*—Since emission rates are meaningless for NOx-induced aviation effects (O3, long-term CH4 reduction and the Primary Mode Ozone (PMO) effect) and for contrails, the GWP* methodology must be adapted to use radiative forcing. This equivalent calculation is proposed in the initial development of the GWP* by Allen et al.38 and is modified using the improvements suggested by Cain et al.39 and Smith et al.40 to obtain:

$$E_{\mathrm{CO}_{2}\text{-we}}(t)=g(s)\times \left[(1-s)\times \frac{\Delta \mathrm{RF}(t)}{\Delta t}\times \frac{H}{\mathrm{AGWP}_{H(\mathrm{CO}_{2})}}+s\times \frac{\overline{\mathrm{RF}}(t)}{\mathrm{AGWP}_{H(\mathrm{CO}_{2})}}\right]$$
(2)

where \(E_{\mathrm{CO}_{2}\text{-we}}(t)\) are the CO2-warming equivalent emissions as a function of time, ΔRF is the change in radiative forcing over the previous Δt = 20 years, \(\overline{\mathrm{RF}}\) is the running average of RF and \(\mathrm{AGWP}_{H(\mathrm{CO}_{2})}\) is the AGWP of a CO2 pulse at a time horizon of H years. The above equation differs from the one used by Lee et al.1 only by the multiplication by g(s), which was introduced in the same year by Smith et al.40 to improve consistency with the linear models used for climate metric calculations. In this research, we use s = 0.75 to be consistent with Smith et al.40. However, we note that this value was calculated for methane (CH4) and thus may not be optimal for other aviation non-CO2 emissions and effects.

The EGWP* is a climate metric developed as part of this research as a derivative of the GWP*. Similarly to the EGWP, it makes use of the efficacy \(r_{i}\), also taken from ref. 28, to more closely match the results obtained by temperature-based climate metrics. The GWP* methodology is adapted by replacing \(\mathrm{RF}_{i}\) with \(\mathrm{RF}_{i}\times r_{i}\).
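The substitution described above amounts to scaling each species' RF series by its efficacy before the GWP* calculation is applied. A minimal sketch is given below. The species names and efficacy values are placeholders for illustration only and are not the values from ref. 28.

```python
def efficacy_weighted_rf(rf_by_species, efficacy_by_species):
    """Scale every RF series by the efficacy of its species."""
    return {
        species: [efficacy_by_species[species] * value for value in series]
        for species, series in rf_by_species.items()
    }


# Placeholder inputs (hypothetical, not taken from ref. 28)
rf_example = {"species_a": [1.0, 2.0, 3.0], "species_b": [4.0, 4.0, 4.0]}
efficacy_example = {"species_a": 0.5, "species_b": 1.5}
weighted = efficacy_weighted_rf(rf_example, efficacy_example)
```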

The GWP* and EGWP* differ from the other climate metrics considered in this study in that they are flow-based climate metrics: the GWP* method does not provide a single value over a specific time horizon. Instead, it provides a CO2-eq value as a function of time, as shown in Fig. 3 (main text). To estimate the impact of a fleet or flight, a certain point along the temporal trajectory must be chosen. It can be argued that, for the analysis of the peak temperature, the peak CO2-eq value should be chosen. However, the time at which the peak occurs differs per species and can also differ per fleet, which raises the question of whether the climate metric values of each fleet represent the same quantity and are thus intercomparable. In the example shown in Fig. 3, the peak total CO2-eq value is dominated by the contrail impact, whereas all other emissions have their peaks at a later time. Nevertheless, since no other point could be identified as appropriate, in this research we use the time of the peak total CO2-eq value for fleet comparisons, which in Fig. 3 corresponds to the values in 2050.
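The selection rule described above can be sketched as follows: sum the per-species CO2-eq trajectories, locate the year of the peak total, and read off every species at that year. The function name, species names and numbers are hypothetical and serve only to illustrate the rule.

```python
def values_at_total_peak(years, co2eq_by_species):
    """Return the year of the peak total CO2-eq and each species' value then."""
    totals = [
        sum(series[k] for series in co2eq_by_species.values())
        for k in range(len(years))
    ]
    # Index of the first maximum of the summed trajectory
    peak_index = max(range(len(years)), key=totals.__getitem__)
    at_peak = {name: series[peak_index] for name, series in co2eq_by_species.items()}
    return years[peak_index], at_peak


# Hypothetical trajectories: contrails peak early, NOx effects peak later
years = [2040, 2050, 2060]
trajectories = {"contrails": [1.0, 5.0, 2.0], "nox": [1.0, 2.0, 3.0]}
peak_year, peak_values = values_at_total_peak(years, trajectories)
```

In this toy example the total peaks when the contrail term peaks, even though the NOx term is still rising, which mirrors the intercomparability concern raised in the text.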

Development of fuel scenarios

This research is based on the CORSIA and Flightpath 2050 (FP2050) fuel scenarios developed by Grewe et al.49. Since time horizons of up to 100 years are analysed, the scenarios needed to be extended. For this research, they have been extended until the year 2200, assuming a 0.5% annual growth rate after the year 2100. The scenarios are developed to test climate metrics and have not been evaluated for reliability and accuracy.
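The extension step can be illustrated with a short sketch. It assumes that the 0.5% annual growth rate is compounded year on year from the last available value, and that the scenario is stored as a mapping from year to fuel use. Both are assumptions made for this example, as are the function name and the input numbers.

```python
def extend_scenario(fuel_by_year, end_year=2200, growth_rate=0.005):
    """Extend a fuel scenario to end_year with constant compound growth."""
    extended = dict(fuel_by_year)  # leave the caller's data untouched
    last_year = max(extended)
    for year in range(last_year + 1, end_year + 1):
        # Each added year grows by growth_rate relative to the previous year
        extended[year] = extended[year - 1] * (1.0 + growth_rate)
    return extended


# Hypothetical scenario tail ending in 2100 (arbitrary units)
scenario = {2099: 90.0, 2100: 100.0}
extended = extend_scenario(scenario)
```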

Fleet pairing analysis

The fleets used in this research are theoretical and are characterised by a set of AirClim input parameters, sampled uniformly from the ranges shown in Table 3. The parameter ranges are based on expected technological pathways developed by Grewe et al.49, within the Clean Sky 2 Technology Evaluator57 and by the “Hydrogen-powered aviation” report by the Clean Hydrogen Joint Undertaking (2020, https://doi.org/10.2843/766989). The contrail distance modifier listed in Table 3 is an AirClim input that multiplies the total cruise distance over which contrails form. In this context, a factor below unity corresponds to aircraft flying further to avoid climate-sensitive regions and, therefore, contrail formation. As a result, a reduction in contrail distance is coupled with an increase in fuel burn. From ref. 58, this trade-off is estimated at a 1% increase in fuel burn for every 15% reduction in contrail distance, up to a contrail distance reduction of 60% (contrail distance modifier of 40%), which is approximately the end of the quasi-linear region of the Pareto fronts calculated. For fleets using fuels other than Jet-A1, the emissions parameters are further modified according to Table 4. Here, the contrail reduction is assumed to correspond to changes in the exhaust composition due to the use of different fuels. We note that the data in both tables are simplified and that comprehensive data are not yet available for different fuel types. However, since our objective is to provide a wide range of potential future fleets, this simplification is deemed appropriate and should not affect the results of this research.
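The coupling between contrail avoidance and fuel burn can be sketched as a simple linear rule: 1% additional fuel burn per 15% reduction in contrail distance, valid down to a contrail distance modifier of 0.4. The function name is hypothetical, restricting the modifier to the interval from 0.4 to 1.0 is an assumption of this sketch, and the sampling range used in the example is a placeholder rather than the range from Table 3.

```python
import random


def fuel_burn_factor(contrail_distance_modifier):
    """Return the fuel-burn multiplier implied by a contrail distance modifier."""
    if not 0.4 <= contrail_distance_modifier <= 1.0:
        raise ValueError("linear trade-off assumed only for modifiers in [0.4, 1.0]")
    reduction = 1.0 - contrail_distance_modifier
    # 15% less contrail distance costs 1% more fuel, i.e. a slope of 1/15
    return 1.0 + reduction / 15.0


# Uniform sampling of one design parameter (placeholder range, fixed seed)
rng = random.Random(0)
sampled_modifier = rng.uniform(0.4, 1.0)
sampled_factor = fuel_burn_factor(sampled_modifier)
```

At the lower limit of the quasi-linear region, a modifier of 0.4 corresponds to a 60% reduction in contrail distance and therefore a 4% increase in fuel burn.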

Table 3 Ranges of fleet design parameters for the fleet pairing analysis simulations
Table 4 Assumed change of in-flight emissions and emission-related effects for Sustainable Aviation Fuel (SAF) and hydrogen

For each fleet, a constant production rate is assumed over a production period of 30 years. Production is assumed to begin after 2030, approximately coinciding with the expected introduction of the next generation of single-aisle aircraft and new fuels such as hydrogen according to the analyses of Grewe et al.49. The exact year of introduction of new fleets is, however, not relevant to the outcome of this study and is thus varied. Each aircraft is further assumed to have a lifetime of 35 years with no hull losses. A single-aisle aircraft about the size of the Airbus A320 is chosen for reference. For simplicity, the fuel use of this fleet is taken to be 40% of Category 4 of the DLR WeCare project51, characterised by aircraft with seat numbers between 152 and 201. A total of 10,000 fleets are simulated using AirClim.
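The resulting in-service profile of a fleet can be sketched as follows. The sketch assumes yearly time steps, that an aircraft built in a given year enters service in that same year, and that it is retired after exactly 35 years of service. The function name, the normalised production rate of one aircraft per year and the start year 2031 are hypothetical choices for illustration.

```python
def aircraft_in_service(year, first_production_year, production_years=30,
                        lifetime=35, annual_production=1.0):
    """Return the number of aircraft in service in a given year."""
    elapsed = year - first_production_year
    # Aircraft built so far, capped at the end of the production run
    built = min(max(elapsed + 1, 0), production_years)
    # Aircraft already retired after reaching their full lifetime
    retired = min(max(elapsed - lifetime + 1, 0), production_years)
    return annual_production * (built - retired)


# Hypothetical fleet entering production in 2031
profile = {year: aircraft_in_service(year, 2031) for year in range(2030, 2097)}
```

Under these assumptions the fleet ramps up for 30 years, stays at full size for a further 5 years until the first aircraft reach 35 years of age, and then declines linearly as aircraft are retired.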