Diverging importance of drought stress for maize and winter wheat in Europe

Understanding the drivers of yield levels under climate change is required to support adaptation planning and respond to changing production risks. This study uses an ensemble of crop models applied on a spatial grid to quantify the contributions of various climatic drivers to past yield variability in grain maize and winter wheat of European cropping systems (1984–2009) and drivers of climate change impacts to 2050. Results reveal that for the current genotypes and mix of irrigated and rainfed production, climate change would lead to yield losses for grain maize and gains for winter wheat. Across Europe, on average heat stress does not increase for either crop in rainfed systems, while drought stress intensifies for maize only. In low-yielding years, drought stress persists as the main driver of losses for both crops, with elevated CO2 offering no yield benefit in these years.

Reviewer #1 (Remarks to the Author): The authors run two ensembles of models (6 maize models and 8 winter wheat models) to evaluate the effects of heat stress and drought stress (with/without CO2 effect) on crop yields in Europe. They considered three different RCPs, and presented their results for several major European countries, as well as for Europe as a whole. The results are not surprising; the authors found that maize yields were more impacted by drought stress than wheat, and that the positive effect of CO2 was stronger for wheat than for maize. These results were expected because wheat and maize are C3/winter and C4/summer crops, respectively. Several papers have already reported this type of result. The main interest of this new paper lies in the use of two ensembles of models at a relatively large scale (Europe).
The sizes of the model ensembles considered by the authors are relatively modest compared to the ensembles sizes used in other studies. Looking at the very large between-model variability reported by the authors, I am not sure that the ensemble sizes considered here are large enough to obtain robust conclusions. This should be discussed.
The authors estimated the correlations of their model outputs with FAO yield statistics (Figure 1) but this part seems to be disconnected from the other parts of the paper. Although I found the idea interesting, the flow of this section was not very good and the results were not clearly presented. Figure 4 could be improved; the y-axis labels are slightly inconsistent across panels a, b, and c. More importantly, the authors only showed the 50% probability intervals here; why show these narrow intervals and not 90% (or at least 80%) intervals?
In the material and method section, the authors mentioned that they computed two types of sensitivity indices. I think Eq.(2) is wrong (L144). The term E(Y/X-i) was mentioned in the text (L147) but not used in the equation. It is not explained how the indices were computed from the model outputs and how they were used to interpret the results.
Reviewer #2 (Remarks to the Author): This paper addresses the trajectory for crop yields for two of the world's major food crops in Europe as affected by climate change. The authors use ensembles of climate change models and crop models under 3 representative pathways to examine the likely impacts of various climate change scenarios to 2050 on yields and yield variability. The paper is well written and the research is well conceived and executed. The findings are novel and of wide interest to the scientific and general community. The results are credible although the use of fixed annual resets of soil water at sowing is somewhat problematical.
I have some comments and some suggested edits:
L12. This may seem pedantic to crop modelling insiders but the term "gridded crop models" is potentially confusing to a general scientific audience. The models are not gridded but are in fact single point models applied to a spatial grid with results interpolated between grid points. As this is the only instance in which the term is used perhaps it could be modified to say 'crop models applied to spatial grid'.
L18 I found the use of the notation 'e[CO2]' confusing. Based on this line I understand it to mean elevated concentration of CO2 but e in association with CO2 is usually taken to mean total GHG emissions expressed as CO2 equivalents (e.g. to include NOx) which makes no sense in the context of this paper. I suggest the 'e' be replaced with 'elevated' throughout the text.
Editorial Note: This manuscript has been previously reviewed at another journal that is not operating a transparent peer review scheme. This document only contains reviewer comments and rebuttal letters for versions considered at Nature Communications.
L36-42 This statement omits the work of Hochman et al. 2017 (GCB paper) who decomposed drivers of climate change (rainfall, temperature and [CO2]) systematically for wheat at a continental scale. The difference in findings between this study and the Australian one with respect to wheat should be considered in discussion of results.
L49-67 This begs the question: to what extent do the various crop models used capture these processes? A brief summary with reference to Tables S1 and S2 would help.
L73-78 This long sentence peters out into nonsense in the last line. Also, is reference 33 relevant here? Economic factors and spatial aggregation are quite different though both may contribute to the lower variability of actual yields.
L99-100 The term 'optimal temperature responses' and its differentiation from heat stress require explanation.
L100 Add 'of' before 'heat'.
L119 Add 'the' before 'mix'.
L137-138 The meaning of the text in brackets is not clear to me.
L217-221 I am inclined to accept this explanation. However, if this is the case why not show the relationship between wheat yield and the CO2 enrichment effect? This would be more convincing and probably present a richer story.
L234 Change 'exception' to 'except for'.
L274 What justification is there for only including the models that had significant correlations in the means? I suspect this biases the results.
L277 Change 'for only' to 'only for'.
L296 I think the authors mean 'yield losses' rather than 'yield levels'.
Methods: There were no line numbers on this document so I added them and used these to reference comments.
L1 Climate data are from the baseline period 1980-2010. However the baseline simulation was restricted to 1984-2009. This should be stated and justified: why not use all years?
L10 Why use only 2 GCMs for RCP 2.6?
L38-39 I take it from this statement that initial soil water and other soil parameters were reset annually rather than used in a continuous simulation. This needs to be justified as it has been shown to make a significant difference to results (e.g. Lilley and Kirkegaard 2016 in JXB).
L71-72 Given the subject matter of this paper why did you include models that could not simulate these interactions?
L100 Delete 'both'.
L104 Is this because response to CO2 fertilisation is more uncertain than response to high temperature and drought stress? Do you have references for this or is it based on your results?
Reviewer #3 (Remarks to the Author): What are the major claims of the paper?
This manuscript analyzes the key drivers of yield levels and variability under climate change using an ensemble of gridded grain maize and winter wheat crop models over Europe. The simulations show that climate change will lead to yield losses for grain maize and gains for winter wheat. Decreases in grain maize yield and increases in winter wheat yield were both primarily driven by increasing and decreasing water stress, respectively, followed by mean temperature. Heat stress emerged as a relatively weak cause of climate change induced losses. In low yielding years, the drivers of yield reductions are similar to all years, though intensified. However, unlike yields in all years, elevated CO2 did not offer any advantages in terms of mitigating losses.
Are they novel and will they be of interest to others in the community and the wider field?
As the authors note, statistical crop modeling has been used extensively to examine drivers of yield changes, and there are also a number of process-based modeling studies that explore facets of this broader question through sensitivity analyses and place-based studies where stresses are removed. That said, I find the use of gridded crop models over a large region with a methodical testing of yield reduction drivers compelling, and I am not aware of any similar studies. I believe this paper will be of particular interest to researchers in the crop modeling community, and more widely of interest to researchers exploring food security and climate impacts of agriculture.
Is the work convincing, and if not, what further evidence would be required to strengthen the conclusions?
Overall I find no critical flaws with the approach, but do highlight two issues that I believe should be addressed.
1. From my reading, one of the key contributions of your manuscript is identifying the prominent role of water stress in yield losses. This is counter to much of the statistical modeling work, where temperature is dominant (e.g., Schlenker and Roberts, 2009; Lobell et al., 2013). Yet, there isn't any discussion as to why this is. I would suggest pulling this theme out across the manuscript, expanding the motivation (lines 36-42), and incorporating this idea in the descriptions of Figures 1, 3, and 4, the discussion, and the conclusions. Also, isn't the impact of water stress strongly dependent on how it is parameterized in the crop model? Same with heat stress. How do we know that the threshold for heat stress damage isn't too high in the crop models, which is why it emerges in statistical analyses but not your study? More discussion of this would be helpful given the major claims of the paper.
2. There's a lack of significance testing throughout the manuscript. For example, "expected yield increases of 2-6% across RCPs when e[CO2] effects were included" (lines 127-128) and "additional drought limitation increased from 9% without consideration of [CO2] to 12% with its inclusion" (lines 202-203). That said, I don't think you need a p-value on all numbers; you could end up leaving those statements unchanged. But you should conduct significance testing on your key results and review the manuscript for any trivial changes that can be removed to make space for more important text.
Do you feel that the paper will influence thinking in the field?
I do believe that this paper will influence thinking in the crop modeling and climate impacts fields. It is an interesting application of models to push toward the attribution of yield losses to climatic drivers, which leverages the strengths of process-based modeling. As gridded crop models continue to improve, this type of assessment will only become more powerful, and there are aspects of this methodology that would be very interesting to apply at local scales, where model accuracy would be higher and the complexity of the response reduced.
Further questions and concerns about the paper
Overall the paper has a lot of scenarios, drivers, and figures to keep track of, and I found it difficult to read and follow. I give a few examples below, and offer suggestions for how to address them. Note my suggestions are not meant to be prescriptive as there are a variety of ways to fix each issue. After revisions, I would suggest that the authors give the draft to someone not involved with the manuscript to make sure that everything is clear and logical.
First, the results are more of a description of the figures, as opposed to pulling out the most important aspects (ideally with significance testing) that support the major claims of the paper. I've identified what I think those are above, but of course I defer to the authors to select and pull these out clearly, and then show how the results support them.
Lines 72 and 73: Can you push some of the supplemental figure references to the methods? Or at least restructure to lead with a figure in the manuscript? This will help highlight the most important figures and concepts and not immediately send the reader to nuance that only a fraction of readers will be interested in.
Water limitation, water stress, and drought stress seem to be used interchangeably in this manuscript. There are so many scenarios that it would be best to just pick one term to refer to this and use it consistently throughout.
Lines 123-126: "Most uncertainty in the projections for maize result from different GCMs or crop models (Fig. S9), with larger negative impacts projected using the HadGEM2-ES model arising from daily maximum temperatures that were 1.1, 1.5 and 1.7°C warmer MPI-ESM-MR for RCP 2.6, RCP 4.5, and RCP 8.5, respectively (Figs. S10 and S11)."
Figure 4 seems to give the Europe-wide drivers, and then you could launch into a discussion of the distribution of those drivers in space. Also, I think taking the figures in turn, instead of having readers try to simultaneously synthesize Figures 3 and 4, would improve the readability of results.
Figure 4: What's the difference between "% points" and "%" on the y-axes? If you have three panels in one figure that look similar, readers will expect continuity, but this is not the case. Panel a is a percentage change relative to the baseline and Panels b and c are differences? Is Panel b even needed given the effects are relatively small? Also I would suggest denoting changes consistently, either negative changes (as in Figures 2 and 3) or positive losses (as in Figure 4).
Supplemental Figures: Review for some of the same issues identified above in figures from the main manuscript. Figure S15 is particularly difficult to read. Is there some better way, a simpler figure or maybe a table?
Finally, as these edits will likely require some text I wanted to mention places where I believe you could reduce the word count. Throughout the manuscript, sharpening your focus around your major claims should help some. Also, the "Implications for adaptations" section could be condensed.
NCOMMS-18-08680-T Diverging importance of drought stress for maize and winter wheat in Europe

Response to reviewers' comments
We are grateful to the three anonymous reviewers who provided valuable comments and suggestions which have led us to revise and improve various sections of our manuscript. In addition to the main points raised by the reviewers, we also made minor editing changes through the manuscript to improve readability and clarity, and to try to shorten the text. Our responses are indicated in italics and blue on lines starting with an asterisk (*).

Reviewers' comments:
Reviewer #1 (Remarks to the Author): The authors run two ensembles of models (6 maize models and 8 winter wheat models) to evaluate the effects of heat stress and drought stress (with/without CO2 effect) on crop yields in Europe. They considered three different RCPs, and presented their results for several major European countries, as well as for the whole Europe. The results are not surprising; the authors found that maize yields were more impacted by drought stress than wheat, and that the positive effect of CO2 was stronger for wheat than for maize. These results were expected because wheat and maize are C3/winter and C4/summer crops, respectively. Several papers already reported this type of results. The main interest of this new paper lies in the use of two ensembles of models at a relatively large scale (Europe).
* We understand the reviewer did not find our work particularly novel. We would like to draw attention to some of our key findings, which we believe provide new insights and nuance for some important common conceptions, as mentioned by the reviewer, about what climate change in Europe would bring for C3 and C4 crops. Specifically, these include: (1) drought, not heat stress, drove yield losses in the worst years for both crops, (2) elevated CO2 did not mitigate yield losses in years/conditions with severe water limitation for maize and (3) the intensification of heat stress is fairly minor for the continued use of current varieties, as earlier maturity under climate change enables these varieties to escape heat stress. Additionally, a main methodological contribution of our study has been the use of process-based crop models for gaining insights into what drives crop response to warmer temperatures and elevated CO2. In revising our paper we have emphasized these contributions throughout the introduction and results section (having made the text less descriptive).
The sizes of the model ensembles considered by the authors are relatively modest compared to the ensembles sizes used in other studies. Looking at the very large between-model variability reported by the authors, I am not sure that the ensemble sizes considered here are large enough to obtain robust conclusions. This should be discussed.
* Several studies have shown that there is no significant improvement in ensemble skill when more than about 8-10 crop models are considered in a multi-crop ensemble [1][2][3][4]. Therefore, considering the high cost of running a crop model in a gridded framework, the cost-benefit ratio of the ensembles used in this study is close to the optimum. The limited value of adding "many" models to an ensemble when the models are not independent is discussed more conceptually for climate models by Tebaldi and Knutti 5 and Knutti, et al. 6. Additionally, a criterion for models to participate in the study was their ability to simulate heat stress effects and ideally also canopy temperature. We have added references to the crop modelling studies showing the number of ensemble members required to reduce uncertainty to the level of the experimental error in our revised manuscript as: "Here we used an optimally sized 1,2 multi-model ensemble of six grain maize models and eight winter wheat, hereafter maize and wheat respectively, to analyze the drivers of current (1984 to 2009) yield variability and projected (2040 to 2069) yield changes."
The authors estimated the correlations of their model outputs with FAO yield statistics (Figure 1) but this part seems to be disconnected from the other parts of the paper. Although I found the idea interesting, the flow of this section was not very good and the results were not clearly presented.
* We appreciate this feedback and agree that this section was not well connected to the subsequent analysis. In the revised paper we try to better connect the two aspects, as shown in the text below (last paragraph of main text). As no specific details were given about which aspects of the results were not clearly presented, we have not made any substantial changes to the presentation of the results.
"While these European patterns are informative, our analysis of baseline yield variability confirmed that adaptation planning must be conducted at the local level. The high degree of spatial variability in drivers and the number of models describing yield variability reinforces earlier findings of conducting adaptation planning at local scales with models that consider the most relevant factors 7 . The baseline analysis also provided a, albeit limited, degree of validation for our impact projections. Year-to-year maize yield variability was demonstrated as sensitive to drought stress, and this drought stress was projected to increase even after accounting for accelerated development with warmer mean temperatures. On the other hand, winter wheat yield variability was shown to be insensitive to drought and our model ensemble projected that yield limitation would not increase due to drought. We can have some confidence that for each crop, the drivers of yield change that emerged as important in the projections are built on models that had skill in explaining these drivers in the baseline. The important exception here is with wheat and drought, which emerged important in low yielding years, though our ensemble skill was found to decrease in many instances when drought effects were included. This leads to a final consideration on the possibility of weighting ensemble member projections based on performance in a historical period. We opted not to do this for reasons elaborated for climate model ensembles by Tebaldi and Knutti 5 . There is no good scientific basis to assume that models that capture past variability will best describe projected response, as relative importance of processes are expected to shift under new climatic conditions. Given the importance of understanding how crops will respond to climate change, continued work to evaluate the robustness of impact study results is needed." 
Figure 4 could be improved; the names of the y-axis are slightly inconsistent over a, b, and c. More importantly, the authors only showed the 50% probability intervals here; why showing these narrow intervals and not 90% (or at least 80%) intervals?
* Based on the reviewer's comment, we have now modified the y-axes in Figure 4, such that they are more consistent (previous panels a and b modified and combined). Further, based on the request of the third reviewer to include some statistical testing, Figure 4 now shows EU aggregate losses with error bars showing uncertainty across crop models and GCMs at the 10th and 90th percentiles. Note that in the original version, production-area-weighted averages (of the model medians of the losses) at national level were presented, with error bars indicating the spread across countries. We used the 25th and 75th percentiles because those error bars did not consider the production weights, only the values across countries, so they were perhaps not as misleading as perceived. We believe the new version of Figure 4 is clearer and more transparent.
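The two aggregation choices discussed in this response (production-area-weighted averaging over countries, and percentile error bars over ensemble members) can be illustrated with a minimal sketch. It is not the authors' code: the function names, array layout and numbers are hypothetical, and it assumes one loss value per ensemble member (crop model x GCM combination) and country.

```python
import numpy as np

def area_weighted_mean(national_values, production_area):
    """Average national values, weighting each country by its production area."""
    return float(np.average(national_values, weights=production_area))

def ensemble_interval(member_values, lower=10, upper=90):
    """Lower and upper percentiles across ensemble members (default 10th/90th)."""
    lo, hi = np.percentile(member_values, [lower, upper])
    return float(lo), float(hi)

# Hypothetical yield losses (%): rows = ensemble members, columns = countries.
losses = np.array([
    [12.0, 4.0, 8.0],
    [15.0, 6.0, 9.0],
    [10.0, 2.0, 7.0],
    [18.0, 5.0, 11.0],
])
# Hypothetical production areas (arbitrary units), one per country.
area = np.array([2.0, 5.0, 3.0])

# EU aggregate per ensemble member, then the spread across members.
eu_by_member = np.array([area_weighted_mean(row, area) for row in losses])
eu_median = float(np.median(eu_by_member))
eu_p10, eu_p90 = ensemble_interval(eu_by_member)
```

Aggregating to the EU level first and then taking percentiles across members describes model and GCM uncertainty, whereas percentiles taken across unweighted national values describe spread between countries; the two kinds of error bar answer different questions.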
In the material and method section, the authors mentioned that they computed two types of sensitivity indices. I think Eq.(2) is wrong (L144). The term E(Y/X-i) was mentioned in the text (L147) but not used in the equation. It is not explained how the indices were computed from the model outputs and how they were used to interpret the results.
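For orientation, the two index types at issue are conventionally defined as the main effect, Var(E(Y|X_i)) / Var(Y), and the total effect, E(Var(Y|X_-i)) / Var(Y), where X_-i denotes all factors except X_i. The sketch below shows one way such indices can be computed from model outputs. It is an illustration under stated assumptions, not the authors' corrected Eq. (2) or their implementation: it assumes a complete factorial design with equally weighted levels, and the function name, array shape and random data are hypothetical.

```python
import numpy as np

def sensitivity_indices(y):
    """Main and total effect indices for a full factorial output array.

    y has one axis per factor (e.g. crop model, GCM, CO2 setting) and every
    combination of levels is present and equally weighted (an assumption).
    Assumes the outputs are not all identical, so the total variance is > 0.
    """
    y = np.asarray(y, dtype=float)
    total_var = y.var()
    main, total = [], []
    for axis in range(y.ndim):
        others = tuple(a for a in range(y.ndim) if a != axis)
        # Main effect: variance of the conditional mean E(Y | X_i).
        main.append(y.mean(axis=others).var() / total_var)
        # Total effect: mean of the conditional variance Var(Y | X_-i).
        total.append(y.var(axis=axis).mean() / total_var)
    return np.array(main), np.array(total)

# Hypothetical yield changes: 6 crop models x 2 GCMs x 2 CO2 settings.
rng = np.random.default_rng(0)
yield_change = rng.normal(size=(6, 2, 2))
main_effect, total_effect = sensitivity_indices(yield_change)
```

A total effect that clearly exceeds the main effect for a factor indicates that the factor acts partly through interactions with the others, which is the kind of comparison between main and total effects that such indices support.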
* Thank you for pointing out the error in Eq. (2), which has now been corrected as: The indices were calculated in R according to the formula. These methods are described in the methods section, supplemented with a discussion of statistical testing. The text on the sensitivity indices reads: "A sensitivity analysis revealed that most uncertainty in maize projections resulted from different GCMs or crop models, whereas consideration of [CO2] fertilization effects had a very large influence on the magnitude and sign of the simulated impacts for wheat (Fig. S9). Due to our study design (CO2 effects confounded with RCPs), we have not isolated the uncertainty of model response to elevated [CO2], though comparison of the main and total effects for the crop models and CO2 terms suggests there is some degree of uncertainty in this across crop models."
Reviewer #2 (Remarks to the Author): This paper addresses the trajectory for crop yields for two of the world's major food crops in Europe as affected by climate change. The authors use ensembles of climate change models and crop models under 3 representative pathways to examine the likely impacts of various climate change scenarios to 2050 on yields and yield variability. The paper is well written and the research is well conceived and executed. The findings are novel and of wide interest to the scientific and general community. The results are credible although the use of fixed annual resets of soil water at sowing is somewhat problematical.
* We thank the reviewer for these comments. We also acknowledge the very valid point about resetting the soil water, which the revised paper now addresses in the methods section, as detailed in our response to this reviewer's specific comment on this topic below.
I have some comments and some suggested edits:
L12. This may seem pedantic to crop modelling insiders but the term "gridded crop models" is potentially confusing to a general scientific audience. The models are not gridded but are in fact single point models applied to a spatial grid with results interpolated between grid points. As this is the only instance in which the term is used perhaps it could be modified to say 'crop models applied to spatial grid'.
* changed as suggested
L18 I found the use of the notation 'e[CO2]' confusing. Based on this line I understand it to mean elevated concentration of CO2 but e in association with CO2 is usually taken to mean total GHG emissions expressed as CO2 equivalents (e.g. to include NOx) which makes no sense in the context of this paper. I suggest the 'e' be replaced with 'elevated' throughout the text.
L36-42 This statement omits the work of Hochman et al. 2017 (GCB paper) who decomposed drivers of climate change (rainfall, temperature and [CO2]) systematically for wheat at a continental scale. The difference in findings between this study and the Australian one with respect to wheat should be considered in discussion of results.
* We thank the reviewer for pointing out this paper and we now refer to the paper in the introduction (see below). However, the results of the two studies are difficult to compare directly, as the focus of the Hochman et al paper (16) is on trends in historical yield data (actual yields and simulated water limited yields) compared to trends in the rainfall and temperature data, whereas our study compares drivers (mean temperature effects, mainly accelerated development; drought; and heat stress) of yield changes from a baseline to a scenario climate (assumed to have no trend in the 30-year period). A further difference is that our study confounds effects of higher temperature with changed precipitation in the scenarios, while the Hochman study confounds effects of accelerated development and drought (associated with temperature response). The added reference to the paper reads: "Similarly a process-based model was combined with climate and yield trend analysis for wheat in Australia to estimate the relative contribution of climatic and technological changes in explaining past yield trends 16 . Nevertheless in both studies questions remain as to the crop level processes dominating these responses, as potentially confounding effects of higher temperature accelerating development and damaging reproductive organs were not explicitly controlled for, both of which are expected to be larger under drought stress conditions due to canopy heating 21 ."
L49-67 This begs the question: to what extent do the various crop models used capture these processes? A brief summary with reference to Tables S1 and S2 would help.

* We have now included a reference to the Tables and made explicit in our introduction that each of the models in the study considers these factors.
"The use of process-based crop models in this study considering each of these factors and their interactions allow accounting for compensation (accelerated development avoiding heat or drought stress) or reinforcement (drought stress leading to higher crop temperatures and greater heat stress) between mechanisms 32 "
L73-78 This long sentence peters out into nonsense in the last line. Also, is reference 33 relevant here? Economic factors and spatial aggregation are quite different though both may contribute to the lower variability of actual yields.

* This text and supplementary figures have been deleted
L99-100 The term 'optimal temperature responses' and its differentiation from heat stress require explanation.
* Sorry, this was meant to read "mean temperature responses". We have corrected the error in the text.
L274 what justification is there for only including the models that had significant correlations in the means? I suspect this biases the results.
* We agree that this would bias the results, but we have tried to indicate this transparently by indicating the number of models for each country with significant correlations (size of the symbols). We wanted to show this to indicate that for at least one model, good correlations are possible (wheat in Germany), indicating that climate variability is an important driver of yields, though most models fail to include the important processes (e.g., lodging, water logging, delayed harvesting, ground water contributions or diseases). On the other hand, maize yield variability in Spain and Portugal is likely driven by economic factors or irrigation water availability, as no model in the ensemble had significant correlations. In any case, in the revised manuscript we tried to better explain our purpose in including the analysis in Fig. 1 and better connect it to the climate change analysis in our discussion as: "While these European patterns are informative, our analysis of baseline yield variability confirmed that adaptation planning must be conducted at the local level. The high degree of spatial variability in drivers and the number of models describing yield variability reinforces earlier findings of conducting adaptation planning at local scales with models that consider the most relevant factors 7 . The baseline analysis also provided a, albeit limited, degree of validation for our impact projections. Year-to-year maize yield variability was demonstrated as sensitive to drought stress, and this drought stress was projected to increase even after accounting for accelerated crop development with warmer mean temperatures. 
On the other hand, winter wheat yield variability was shown to be insensitive to drought and our model ensemble projected that yield limitation would not increase due to drought. We can have some confidence that for each crop, the drivers of yield change that emerged as important in the projections are built on models that had skill in explaining these drivers in the baseline. The important exception here is with wheat and drought, which emerged important in low yielding years, though our ensemble skill was found to decrease in many instances when drought effects were included. This leads to a final consideration on the possibility of weighting ensemble member projections based on performance in a historical period. We opted not to do this for reasons elaborated for climate model ensembles 5,6 . There is no good scientific basis to assume that models that capture past variability will best describe projected response 5 , as relative importance of processes are expected to shift under new climatic conditions. Given the importance of understanding how crops will respond to climate change, continued work to evaluate the robustness of impact study results is needed."
L277 change 'for only' to 'only for'
* changed
L296 I think the authors mean 'yield losses' rather than 'yield levels'

* changed
Methods: There were no line numbers on this document so I added them and used these to reference comments.
L10 Why use only 2 GCMs for RCP 2.6?

* we have now added the explanation in the methods (line 10) as: "For RCP2.6, only 2 GCMs (HadGEM2-ES, MPI-ESM-MR) were available with all required input variables at the time the study was conducted."
L38-39 I take it from this statement that initial soil water and other soil parameters were reset annually rather than used in a continuous simulation. This needs to be justified as it has been shown to make a significant difference to results (e.g. Lilley and Kirkegaard 2016 in JXB).
* We agree that this is an important issue that has implications for simulations and we now treat this topic in the methods. We opted to reset the soil water each year as we suspected that additional (and substantial) uncertainty would be introduced by the differing methods, skill and assumptions required to run the models continuously. In reality, crops are not grown in continuous sequence, but in rotations with different crops, which vary considerably across Europe. Specifying such rotations was beyond the scope of our study and the skill of some of the selected models (which were selected for their strength in simulating heat and/or heat and drought interactions). While our study is very much a simplification of reality, we can have confidence that the differences between models are related to the processes considered and/or parameterization, and not to differences in available water caused by differences in simulating carry-over effects, which will differ based on selected rotations as well as model skill. We now provide this explanation in the manuscript: "While previous studies have demonstrated the uncertainty introduced to simulation results by resetting soil water 10 , we opted to reset to avoid uncertainty that would arise from differing methods, skill and assumptions required to run the models continuously over seasons. Further, it was beyond the scope and expertise of this study to specify crop rotation sequences across Europe under climate change."
L71-72 Given the subject matter of this paper why did you include models that could not simulate these interactions?
* Firstly, we acknowledge this is clearly a limitation of the study, but a limitation that we cannot easily address.
That said, we emphasize here (and now added to the methods) that these two models are only applied to wheat systems where CO2 effects on radiation use efficiency, which both models include, dominate. We included both SQ and S2 as they are widely applied and tested in European wheat systems and have recently undergone improvements in their treatment of heat stress and in the case of SQ in improving their simulations of canopy temperature (interaction of drought and heat stress). However, to address this concern, we tested the robustness of our findings at the EU level for aggregate yield changes (Fig. 2) and drivers (Fig 4) with and without including those two models and found that including them did not change our main findings or conclusions. Below we present the EU aggregate yield changes in wheat across crop models, GCMs and scenarios as: What are the major claims of the paper?
This manuscript analyzes the key drivers of yield levels and variability under climate change using an ensemble of gridded grain maize and winter wheat crop models over Europe. The simulations show that climate change will lead to yield losses for grain maize and gains for winter wheat. Decreases in grain maize yield and increases in winter wheat yield were both primarily driven by increasing and decreasing water stress, respectively, followed by mean temperature. Heat stress emerged as a relatively weak cause of climate change induced losses. In low yielding years, the drivers of yield reductions are similar to all years, though intensified. However, unlike yields in all years, elevated CO2 did not offer any advantages in terms of mitigating losses.
Are they novel and will they be of interest to others in the community and the wider field?
As the authors note, statistical crop modeling has been used extensively to examine drivers of yield changes, and there are also a number of process-based modeling studies that explore facets of this broader question through sensitivity analyses and place-based studies where stresses are removed. That said, I find the use of gridded crop models over a large region with a methodical testing of yield reduction drivers compelling, and I am not aware of any similar studies. I believe this paper will be of particular interest to researchers in the crop modeling community, and more widely of interest to researchers exploring food security and climate impacts on agriculture.

* Thank you for this comment.
Is the work convincing, and if not, what further evidence would be required to strengthen the conclusions?
Overall I find no critical flaws with the approach, but do highlight two issues that I believe should be addressed.
1. From my reading, one of the key contributions of your manuscript is identifying the prominent role of water stress in yield losses. This is counter to much of the statistical modeling work, where temperature is dominant (e.g., Schlenker and Roberts, 2009; Lobell et al., 2013). Yet, there isn't any discussion as to why this is. I would suggest pulling this theme out across the manuscript, expanding the motivation (lines 36-42), and incorporating this idea in the descriptions of Figures 1, 3, and 4, the discussion, and the conclusions. Also, isn't the impact of water stress strongly dependent on how it is parameterized in the crop model? Same with heat stress. How do we know that the threshold for heat stress damage isn't too high in the crop models, which is why it emerges in statistical analyses but not your study? More discussion of this would be helpful given the major claims of the paper.
* We appreciate this suggestion and have now expanded the introduction and motivation to address these studies, and return to it in the discussion. We agree that simulated drought and heat stress depend on the model parameterization, and we think the present ensemble is well suited for this task, as all models explicitly consider heat stress effects on reproductive growth and/or development, and the majority include interactions between crop temperature and water status. These models have been developed and tested in a number of studies related to the MACSUR and AgMIP projects in very recent years (e.g. 4,11-17), largely with the motivation to inform the risk of damages from climate change, as performed in the current study.
Finally, we do not think this study contradicts the findings of the statistical studies mentioned; rather, it adds important nuance to the crop-level processes driving the response to warmer temperatures, as was already suggested in the Lobell et al. 18 paper. Our introduction now reflects this as: "Observational studies have offered considerable insight into the importance of high temperatures compared to precipitation in driving negative yield trends 19,20 and non-linear yield responses 21,22. Subsequent study with a process-based crop model identified drought stress as the probable underlying mechanism of this high temperature response in maize in the US, as high temperatures drive non-linear increase in VPD, raising demand and concurrently depleting subsequent supply 18. Similarly a process based model was combined with climate and yield trend analysis for wheat in Australia to estimate the relative contribution of climatic and technological changes in explaining past yield trends 19. Nevertheless in both studies questions remain as to the crop level processes dominating these responses, as potentially confounding effects of higher temperature accelerating development and damaging reproductive organs were not explicitly controlled for, both of which are expected to be larger under drought stress conditions due to canopy heating 16."
2. There's a lack of significance testing throughout the manuscript. For example, "expected yield increases of 2-6% across RCPs when e[CO2] effects were included" (lines 127-128) and "additional drought limitation increased from 9% without consideration of [CO2] to 12% with its inclusion" (lines 202-203). That said, I don't think you need a p-value on all numbers; you could end up leaving those statements unchanged. But you should conduct significance testing on your key results and review the manuscript for any trivial changes that can be removed to make space for more important text.
* This is a very good point, but not easy to adequately address within the scope of the paper. That said, we have now added statistical testing to the paper, acknowledging (1) that it violates key assumptions about error terms being random and independent, and we additionally combine two members from non-independent ensembles (2)
Do you feel that the paper will influence thinking in the field?
I do believe that this paper will influence thinking in the crop modeling and climate impacts fields. It is an interesting application of models to push toward the attribution of yield losses to climatic drivers, which leverages the strengths of process-based modeling. As gridded crop models continue to improve, this type of assessment will only become more powerful, and there are aspects of this methodology that would be very interesting to apply at local scales, where model accuracy would be higher and the complexity of the response reduced.
Lines 72 and 73: Can you push some of the supplemental figure references to the methods? Or at least restructure to lead with a figure in the manuscript? This will help highlight the most important figures and concepts and not immediately send the reader to nuance that only a fraction of readers will be interested in.
Water limitation, water stress, and drought stress seem to be used interchangeably in this manuscript. There are so many scenarios that it would be best to just pick one term to refer to this and use it consistently throughout.
* Thanks for this pointer. Where the meaning is the same we have replaced water stress/water limitation with drought. We maintain "water-limitation" when the context is broad enough to include different degrees of water limitation (from none under perfect irrigation to full drought).
Lines 123-126: "Most uncertainty in the projections for maize result from different GCMs or crop models (Fig. S9), with larger negative impacts projected using the HadGEM2-ES model arising from daily maximum temperatures that were 1.1, 1.5 and 1.7°C warmer MPI-ESM-MR for RCP 2.6, RCP 4.5, and RCP 8.5, respectively (Figs. S10 and S11)."
* The sentence has been deleted.
Figure 1: Y-axis uses a "/", which suggests you're dividing. Use a comma or better yet make a second y-axis on the right. Where are the values for Spain and Portugal? Is the correlation negative for the UK (lines 109-110) and R2 positive (Figure 1b)? Maybe in that case you shouldn't plot the R2 and put a note in the caption.
* The y-axis label for Figure 1 has now been changed to " R2 (symbols)
* Yes, you are correct in that we consider two different reference points in this figure, and we now make an effort to distinguish the two reference points by using only two symbol shapes (circle versus cross; shape was previously redundant with colour). While the use of two references can be argued as making the figure more complicated, it actually provides the most useful summary information. We now try to clarify/describe this more in the results. In the text for Figure 3: "To understand how drivers yield changes under climate change, we decomposed yields at each of the national and EU level for rainfed systems into losses from potential levels due to: drought, heat stress, and the combination of drought plus heat for the baseline and three RCPs (Fig. 3). Additionally, for each of the three RCPs, changes in potential yield levels between the respective scenario and the baseline were examined to quantify the direct effects of warmer mean temperatures versus elevated [CO2]." and for Figure 4: "To summarize the drivers of EU aggregate rainfed yields changes, changes in potential yields relative to the baseline, as well as absolute shifts in the losses from drought and heat for each scenario from levels in the baseline are presented in Figure 4."
Consider switching the order of Figures 3 and 4. Figure 4 seems to give the Europe-wide drivers, and then you could launch into a discussion of the distribution of those drivers in space. Also, I think taking the figures in turn, instead of having readers try to simultaneously synthesize Figures 3 and 4, would improve the readability of results.
* We considered this, but opted to keep the current order, as Figure 3 helps to explain our method of decomposing the yield changes into different drivers, whereas Figure 4 summarizes how the drivers change. We hope that with the clearer descriptions and less descriptive text, the ordering will make more sense, but we remain open to changing it again if the reviewer strongly objects.
Figure 4: What's the difference between "% points" and "%" on the y-axes? If you have three panels in one figure that look similar, readers will expect continuity, but this is not the case. Panel a is a percentage change relative to the baseline and Panels b and c are differences? Is Panel b even needed given the effects are relatively small? Also I would suggest denoting changes consistently, either negative changes (as in Figures 2 and 3) or positive losses (as in Figure 4).
Figure S15 is particularly difficult to read. Is there some better way, a simpler figure or maybe a table?
* Some SI figures have been deleted, including S15.
Finally, as these edits will likely require some text, I wanted to mention places where I believe you could reduce the word count. Throughout the manuscript, sharpening your focus around your major claims should help some. Also, the "Implications for adaptations" section could be condensed.
* Thank you, we kept the word count very close to 3000 through removing descriptive aspects.
I appreciate that the authors have thoroughly and thoughtfully addressed all reviewers' comments.
The revised manuscript reads well and makes a contribution to knowledge about drivers of yield variability and yield levels of wheat and maize crops under climate change in Europe. In my opinion the manuscript is now ready for publication.
Reviewer #3 (Remarks to the Author): Thank you for your response to my review. I see no additional major scientific issues with the paper. You've also made improvements to the clarity of the manuscript; however, I feel it is still confusing at points. I have included some suggestions that I hope will be helpful.
Lines 10-11: Sentence difficult to follow. Maybe "Knowledge of climate change impacts on yield means and variability is required to support adaptation planning and respond to changing production risks."
Lines 85-87: The reference to Figure S5 is confusing. Figure 1 shows the relative unimportance of heat stress; Figure S5 just provides background information on irrigated and rainfed production. Add a reference to Figure 1, and maybe delete the reference to Figure S5.
Figure S6 and other supplemental figures: Why is it "optimal temperature effects" while in the main manuscript it's just "mean temperature effects"? More broadly, make sure language is consistent across the manuscript and supplement.
Figure 1: You make a differentiation between heat stress with air and canopy temperature but never discuss it in the main text. I would take it out (preferably, to simplify) or add a sentence explaining.
Figure 3: Fix references to crosses in caption. Also, I still feel that the multiple baselines make an already complicated figure more complex for no apparent benefit to the interpretation of results. I think readers will care about losses vs. baseline, not losses vs. future potential yield, especially given Figure 4 directly addresses changing impacts of drought relative to scenario potential. That said, I understand your perspective and am fine with leaving it as is. But maybe clarify in the caption that scenario potentials for each RCP are indicated by the black triangles: "For the other drivers, changes indicated by circles are relative to potential yields for that same scenario (black triangles). Drought - blue, heat stress - red and combined drought and heat stress - green." I think it would also be helpful to consistently use the language "relative to baseline potential" and "relative to scenario potential". Those are concise phrases that a reader can understand once and then apply to other figures.
Referring to this figure as "changing drought intensity" suggests you're looking at a change in drought itself. Also, is the use of "absolute" supposed to indicate change vs. baseline? I would define or drop this.
Figure S11: Is the bottom row supposed to show a percentage reduction in growing season length due to earlier maturation or a change in yield due to earlier maturation? Currently the caption reads as the latter, but lines 213-216 suggest you mean the former.
Supplement title: Does not match manuscript title.