Introduction

Climate change impact on fruit tree yields and farm economic wellbeing

Climate change can affect crops, with yields of many important crops projected to decline in the future1,2. Increases in temperature, in particular, can reduce yields of major crops worldwide3. Such climatic impacts can and will have a detrimental effect on food availability and its nutritional value4. By its nature, much agricultural climate change research is crop-, region- or country-specific. While there have been numerous investigations into the effects of climate change on various crops, these studies have tended to focus on wheat, rice, corn and soybean, primarily grown in Asia, Europe and North America3. By contrast, assessments of the vulnerability of fruit crops in the regions of interest here, namely North Africa and South America, are lacking. In particular, no information is available on the extent to which climate change factors affect not only fruit yields but also the overall financial well-being of farms.

In this paper we focus on the effects of climate change on two crops of important nutritional and monetary value in Chile and Tunisia: cherry and peach trees5,6. Both crops are sensitive to climate change damage, with reproductive organs being particularly vulnerable to climatic impacts, leading to a reduction in the quantity and quality of harvestable fruit7,8,9. Increases in winter temperatures can affect whether fruit trees meet their chill requirements, resulting in changes in bud, flower and fruit set10,11,12,13,14. Similarly, elevated temperatures during fruit set and development can alter fruit growth and maturation9,15,16. Combined with reduced water availability, high temperatures can affect both fruit yield and quality7,17,18. These effects can vary between fruit tree cultivars and species. Additionally, extreme events (hail, wind, frost) have been observed to affect the physical environment and cause fruit crop damage7,19,20,21. These climate events are region-specific and affect crop production to varying extents. While climate change impacts on fruit crop quality and yield can be estimated, evaluating climate change impacts on farm financial well-being is much more difficult and uncertain22. Yet the ability to predict how individual factors affect farm financial well-being is crucial for developing policy measures that target the factors with the highest monetary impact.

Use of a hybrid approach to predict farm financial wellbeing

In this paper, we introduce a novel hybrid approach that combines machine learning and generalized linear models to address the challenge of predicting farm financial well-being. Traditional economic models of climate change impact typically estimate the effects of climate change on crop yields using climate and crop simulation models, and then translate this information into likely farm financial performance. However, these analyses rest on assumptions that seldom take into account a combination of adaptive measures, socio-economic factors and other factors, such as regional differences23. One of the most widely used economic models for measuring the impacts of climate change on agriculture is the Ricardian approach, which focuses on land value and agricultural revenue24,25, with cross-sectional and panel regression analysis as the analytical tools of choice26. Whatever the approach and type of analysis, the omission of variables that may directly or indirectly affect crop or farm income and revenue makes assessments of the financial impact of climate change highly uncertain. In classical statistics, regression analysis can have predictive power. However, there are situations where regression analysis is not sufficient to handle the generated datasets or the specific questions to be answered, or where the assumed linear relationship between independent and dependent variables does not hold. This is especially the case when complex variable interactions are present in the dataset. This is where machine learning becomes a useful tool that complements traditional statistical analysis27,28,29.
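As an illustration of the Ricardian approach mentioned above, a commonly used cross-sectional specification (a generic textbook form, not a model estimated in this study) regresses land value or net revenue per hectare on linear and quadratic climate terms plus control variables:

```latex
V_i = \beta_0 + \beta_1 T_i + \beta_2 T_i^2 + \beta_3 P_i + \beta_4 P_i^2 + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + \varepsilon_i,
\qquad
\frac{\partial V_i}{\partial T_i} = \beta_1 + 2\beta_2 T_i
```

Here V_i is land value (or net revenue) for farm or district i, T_i and P_i are seasonal temperature and precipitation, and z_i collects soil and socio-economic controls; the symbols are illustrative. If a relevant factor is left out of z_i and is correlated with climate, the estimates of β_1 to β_4, and hence the marginal climate effect, are biased, which is the omitted-variable problem described above.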

Machine learning offers the ability to analyse large datasets and many variables simultaneously, reducing the chance that important variables are left out of the data analysis process. It is thus no surprise that machine learning has vast potential to analyse big data in agriculture30,31,32,33,34,35,36,37, especially when considered in combination with other research domains, such as climate change38,39,40,41. However, tackling agricultural problems is complex. For example, whether a new crop variety actually provides better yield and farm income under certain climatic conditions potentially depends not only on its genetic traits but also on many other factors, such as those related to biophysical and farm management issues42. This means that complex and deep interactions could exist in the datasets. Such data can quickly become difficult to analyze properly using classical statistical approaches. The resulting datasets, just for one farm, could encompass millions of data point combinations. Importantly, analysis of such data can reveal which variables, from the millions of possible combinations, are associated with and important for the outcome variable, in our case the financial well-being of a farm. This is where the power of machine learning can be exploited to its full potential36,43. By including not only biophysical variables such as microclimate effects, soil structure and quality, but also socio-economic variables, such as land use, urban-farm water accessibility, farm size, demographic data and access to markets, machine learning enables analysis at every step of the agricultural value chain32,44,45. 
Machine learning is thus useful not only for predicting ultimate outcome variables, such as the financial well-being of a farm, but also for assessing whether adaptive measures were effective in maintaining or increasing crop yield under certain climatic conditions, for estimating the relative importance of an intervention for a desired outcome and, more generally, for informing future predictions and strategies38,40,46,47,48. However, the interpretability of machine learning models, especially complex algorithms such as support vector machines, deep neural networks, random forests or boosted trees, can be limited. Although post hoc interpretability methods can approximate the functioning of such black-box models, there is no straightforward way of understanding and interpreting the exact processes leading to the outcome. This is potentially a major drawback for research questions that aim to deepen the understanding of the processes or factors associated with the desired outcome.

To overcome this problem, we introduce a hybrid method that combines generalized linear models with machine learning strategies such as cross-validation, boosting and group-variable selection. The output of this approach preserves interpretability, respects the group structure of the data and remains competitive with state-of-the-art machine learning algorithms. Detailed information about this strategy is given in the data analysis section of this paper. This hybrid model allowed us to address our research objectives effectively.
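The combination described above can be illustrated with a minimal sketch of component-wise gradient boosting for a logistic model in which each candidate base-learner is either a single variable or a predefined group of variables. The function names, the ridge penalty `lam`, the step length `nu` and the fixed number of iterations are assumptions for illustration; this is not the authors' implementation, and in practice the number of iterations would be tuned by cross-validation. Published sparse group boosting formulations also balance individual and group base-learners through their degrees of freedom; that step is omitted here, so this simplified version will tend to favour the larger group learners.

```python
import numpy as np


def ridge_fit(Xs, u, lam):
    """Closed-form ridge fit of the working response u on the columns in Xs."""
    k = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(k), Xs.T @ u)


def sparse_group_boost(X, y, groups, n_iter=300, nu=0.1, lam=1.0):
    """Component-wise gradient boosting of a logistic model (hypothetical helper).

    X: (n, p) array of (ideally standardized) predictors; y: 0/1 outcomes.
    groups: list of lists of column indices. Candidate base-learners are every
    single column plus every group; each iteration keeps the candidate whose
    ridge fit to the negative gradient has the smallest residual sum of squares.
    """
    n, p = X.shape
    learners = [[j] for j in range(p)] + [list(g) for g in groups]
    offset = np.log(y.mean() / (1 - y.mean()))  # log odds of the base rate
    f = np.full(n, offset)                       # current linear predictor
    beta = np.zeros(p)
    path = []
    for _ in range(n_iter):
        u = y - 1.0 / (1.0 + np.exp(-f))  # negative gradient of the binomial loss
        best = None
        for idx in learners:
            b = ridge_fit(X[:, idx], u, lam)
            rss = np.sum((u - X[:, idx] @ b) ** 2)
            if best is None or rss < best[0]:
                best = (rss, idx, b)
        _, idx, b = best
        f += nu * (X[:, idx] @ b)  # small step along the chosen learner
        beta[idx] += nu * b        # coefficients stay on the log-odds scale
        path.append(tuple(idx))
    return offset, beta, path
```

Because every update adds a scaled least-squares fit to the current log odds, the accumulated `beta` remains a set of coefficients on the log-odds scale, so `np.exp(beta)` can still be read as odds ratios. The `path` of selected learners gives a simple selection-frequency measure of variable importance.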

Research objectives

The primary objective of this paper is to assess the potential impact of climate change on the financial well-being of fruit farms. To achieve this, we relied on farmers' self-reports of their past experiences with climate change and examined whether these experiences had any bearing on the financial performance of their farms. The information was collected through face-to-face interviews. Because the farmers themselves provided the information for the subsequent data analysis, we are, in effect, reporting on farmers' perceived financial well-being. Perception in the context of this paper refers to how individual farmers interpret, assess and experience climate change information. Their perceptions may be influenced by sensory observations as well as by their memories, knowledge and expectations of climate change.

To address the complex nature of the datasets, which include many grouped and single independent variables, we employed a combination of classical statistical analysis and machine learning techniques. This approach allowed us to account for the high dimensionality of the data and to determine the relative importance and predictive power of both individual and grouped independent variables in relation to the outcome variable.

In this paper, we aim to answer three basic research questions based on farmer self-reporting. First, we investigate whether climate change has a discernible impact on how well fruit farmers are doing financially. Second, in cases where climate change is not important for the farm financial well-being, we explore what other factors may influence this outcome variable. And third, we examine the potential effects of factor interactions on farm financial well-being. By addressing these questions, we seek to enhance our understanding of the relationship between climate change and farm financial well-being.

Results

Climate change effects on farm financial well-being

First, we evaluated whether experiencing climate change had any impact on farm financial well-being. Decreasing rainfall and increasing temperatures were associated with reduced farm financial well-being (Fig. 1). In Chile and Tunisia combined, farmers who had experienced reduced rainfall were significantly less likely to do financially well than farmers who had not (OR = 0.635, p = 0.020). To a lesser extent, experiencing increasing temperatures in the two countries was also associated with a lower likelihood of doing financially well (OR = 0.751, p = 0.11), although this association did not reach significance at the conventional 5% level. Increasing drought frequency and extreme weather experiences had no significant impact on farm financial well-being in any of the regions studied. The effects of increasing temperatures or decreasing rainfall were more discernible in Chile than in Tunisia. Thus, negative experiences with certain climatic factors were in some cases associated with lower farm financial well-being, with the proviso that these effects may be country-specific.

Fig. 1: Effect of experiences with climate change and crop financial damage on financial wellbeing of a farm in Chile and Tunisia.

Confidence intervals of the odds ratios (OR), based on logistic regression (see Supplementary Tables 4 and 5 for more data).
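Because the results are reported as odds ratios from logistic regression, a small helper can make the arithmetic explicit: an odds ratio is the exponentiated coefficient, and a Wald-type confidence interval exponentiates the coefficient plus or minus 1.96 standard errors. The interval type and the standard error used below are assumptions (standard errors are not reported in this section), and the helper names are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald-type confidence interval for a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def prob_after_or(p_baseline, odds_ratio):
    """Probability obtained by applying an odds ratio to a baseline probability."""
    odds = odds_ratio * p_baseline / (1 - p_baseline)
    return odds / (1 + odds)


beta = math.log(0.635)                # coefficient behind the reported OR for reduced rainfall
print(odds_ratio_ci(beta, se=0.195))  # se is hypothetical, chosen only for illustration
print(prob_after_or(0.5, 0.635))      # baseline probability of 0.5 is also hypothetical
```

With a hypothetical baseline probability of 0.5, an odds ratio of 0.635 lowers the probability of doing financially well to roughly 0.39, which shows that an odds ratio is not a ratio of probabilities.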

Second, we investigated to what extent financial damage to crops caused by specific climate change impacts is associated with overall farm financial well-being. For high financial well-being, only income impacts associated with decreasing rainfall were significant, being associated with lower odds of high well-being (OR = 0.568, p = 0.002 for Chile and Tunisia combined; OR = 0.434, p < 0.001 for Chile). For low financial well-being, income impacts associated with higher temperatures (OR = 2.119, p = 0.021 for Chile) and with more frequent drought (OR = 2.457, p < 0.001 for Chile and Tunisia combined; OR = 2.623, p = 0.003 for Chile; OR = 2.385, p = 0.006 for Tunisia) were significantly associated with higher odds of low well-being. Decreasing rainfall, especially in Chile, appeared to be somewhat relevant for explaining low financial well-being. It is noteworthy that although experiencing drought was not significantly associated with low or high financial well-being, the financial impacts of drought tended to be significantly associated with farm financial well-being.

Variables important for farm financial wellbeing

The sparse group boosting (sgb) algorithm allowed the model to choose between individual and grouped independent variables for predictive modeling (Fig. 2). Arrow directions indicate the summed effect size (log odds) of all variables within a group on farm financial well-being, which can be read as the effect of a latent variable. For high financial well-being, upward-pointing arrows indicate that an overall increase in the values of the group's variables leads to an increased probability of high financial well-being, while downward-pointing arrows indicate a decreased probability of high well-being. Similarly, for low financial well-being, an upward-pointing arrow means that increases in the group's variable values increase the probability of low well-being. For example, higher social assets increase the probability of high farm financial well-being. Note that no arrows were added for non-ordinal variables or groups of such variables.
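The arrow rule can be sketched as follows, assuming that all variables in a group are ordinal and coded so that a one-unit increase is comparable across them: the group's coefficients (log odds) are summed and the sign of the sum sets the arrow. The helper name, variable names and coefficient values are hypothetical.

```python
def group_arrows(coefs, groups):
    """Sum log-odds coefficients within each group and map the sign to an arrow.

    coefs: dict variable -> coefficient (variables never selected may be absent).
    groups: dict group name -> list of variable names.
    Returns dict group name -> (arrow, summed log-odds effect).
    """
    arrows = {}
    for name, members in groups.items():
        total = sum(coefs.get(v, 0.0) for v in members)
        if total > 0:
            arrow = "up"
        elif total < 0:
            arrow = "down"
        else:
            arrow = "none"
        arrows[name] = (arrow, total)
    return arrows


# Hypothetical coefficients from a model for high financial well-being
coefs = {"use_info": 0.2, "trust_science": 0.3, "trust_religion": -0.1, "farm_debt": -0.4}
groups = {"social": ["use_info", "trust_science", "trust_religion"], "economic": ["farm_debt"]}
print(group_arrows(coefs, groups))
```

The same function applies to the low well-being model, where an upward arrow then means a higher probability of low well-being.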

Fig. 2: Most important variables contributing to farm financial well-being.

Sparse group boosting models for Chile and Tunisia, and for high and low financially performing farms, separately. Central Chile was associated with higher financial well-being than Southern Chile, and Northern Tunisia with slightly higher financial well-being than Central Tunisia.

Generally, variables not related to climate change factors were comparatively more important for predicting farm financial well-being. The most important predictors of high farm financial well-being common to both Chile and Tunisia are social assets (reliance on and use of information; trust in information sources, community, science or religion) and biophysical assets (farm size, water management systems used on the farm, diversity of crops grown), as well as one individual variable, years of owning the farm (Fig. 2). The latter two tend to have a negative effect on farm financial well-being. Natural assets (regional differences) are important predictors almost exclusively for Chile, where farms in Central Chile tend to exhibit higher financial well-being. Prior farm ownership and the human asset group (including education, age, gender and knowledge) are important factors only in Tunisia. The most important predictors of low farm financial well-being common to both Chile and Tunisia are regional differences and the income impact and economic asset groups; for example, increasing farm debt and reliance on orchard income increase the likelihood of low well-being. Several factors are associated with the likelihood of low well-being in Tunisia only: length of farm ownership, drought, the social and biophysical asset groups, and varieties grown. The latter three are associated with a reduced likelihood of low financial well-being. For Chile only, the important individual factors are use of a well and years of farm management. The more farms use wells, the less likely they are to exhibit low financial well-being, whereas the longer a farmer has managed the farm, the higher the likelihood of low financial well-being. However, these Chile-specific factors are of relatively low importance.

Note that some factors are important predictors of both high and low financial well-being, just with opposing effects. For example, increased well usage in Chile increases the likelihood of high well-being while decreasing the likelihood of low well-being. In Tunisia, prior family ownership decreases the likelihood of high well-being while increasing the likelihood of low well-being. The exception is biophysical assets, which decrease the odds of high well-being and also decrease the odds of low well-being, indicating that biophysical assets, such as adaptive measures, are useful only for helping farmers with low financial well-being.

Variable interactions affecting farm financial wellbeing

We examined whether interactions between independent variables change the model outcomes with respect to farm financial well-being (Fig. 3). Even though the model that included variable interactions was less predictive than the model including only additive effects (Table 1), the importance of each interaction still reveals interesting and important inter-dependencies in the datasets. One outcome is that the region variable seems to be less important when interactions are considered. Interactions within and between social and human assets seem to be relevant for farm financial well-being, especially those related to use of information and trust. Interactions that involve adaptive measures, current assessment of climate change and education are also of relative importance. Such interactions point to inter-dependencies between variables and to likely confounding and mediating effects of certain variables.
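Comparisons such as the one between additive and interaction models in Table 1 are usually based on out-of-sample performance. The sketch below assumes that predictive power is summarised by cross-validated area under the ROC curve (AUC), which this section does not state; it fits a plain logistic regression by Newton-Raphson on each training fold and scores the held-out fold. Interaction terms enter simply as extra product columns. The logistic fit stands in for the boosting models, and all function names are hypothetical.

```python
import numpy as np


def auc(y, score):
    """ROC AUC via the Mann-Whitney statistic; tied scores count one half."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson; an intercept column is prepended."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        w = p * (1 - p)
        hessian = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        beta += np.linalg.solve(hessian, Xd.T @ (y - p))
    return beta


def predict_proba(beta, X):
    """Predicted probabilities for new rows X given fitted coefficients."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))


def cv_auc(X, y, k=5, seed=0):
    """Mean held-out AUC over k random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta = fit_logistic(X[train], y[train])
        scores.append(auc(y[test], predict_proba(beta, X[test])))
    return float(np.mean(scores))
```

Calling `cv_auc(X, y)` and `cv_auc(np.column_stack([X, X[:, 0] * X[:, 1]]), y)` on the same data compares an additive model with one that adds a single two-way interaction; extra interaction columns can lower the held-out AUC when they mainly fit noise.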

Fig. 3: The most important interacting variables for farm financial wellbeing.

Component-wise boosting model for Chile and Tunisia combined.

Table 1 Predictive power for farm financial high and low well-being.

Figure 4 shows some noteworthy interactions that can affect farm financial well-being. Without the use of newspapers as a source of information, the probability of high well-being drops in Chile and Tunisia when temperature increases or precipitation decreases (Fig. 4, top left). However, when farmers used newspapers, financial well-being in Chile and Tunisia was not markedly reduced by increasing temperatures or decreasing precipitation (Fig. 4, bottom left). Indeed, use of newspapers increased the probability of high financial well-being irrespective of whether temperature increased or precipitation decreased: among newspaper users, no negative effect of reduced precipitation or increased temperature on doing financially well was apparent. A similar effect was observed for trust in industry (Fig. 4, top right and bottom right). Farmers, especially in Tunisia, who trusted industry as a source of information were more likely to do financially well than farmers who did not, regardless of whether they experienced a reduction in precipitation. However, the effect of increasing temperatures on high well-being seems to be unchanged by trust in industry in Tunisia, while in Chile, trust in industry, compared with no trust in industry, intensified the negative effect of temperature increases on farm financial well-being.
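The pattern in Fig. 4, where newspaper use offsets the drop associated with rising temperatures, is what a positive interaction term in a logistic model produces. The helper name and coefficients below are invented purely to illustrate the mechanism and are not estimates from this study.

```python
import math


def p_high(b0, b_temp, b_news, b_int, temp_up, news):
    """P(high well-being) from a logistic model with a temperature x newspaper interaction.

    temp_up and news are 0/1 indicators; all coefficients are on the log-odds scale.
    """
    eta = b0 + b_temp * temp_up + b_news * news + b_int * temp_up * news
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: warming lowers the log odds by 0.8 for non-users,
# and the interaction (+0.8) cancels that drop for newspaper users.
coef = dict(b0=0.0, b_temp=-0.8, b_news=0.9, b_int=0.8)
for news in (0, 1):
    for temp_up in (0, 1):
        print(f"news={news} temp_up={temp_up} p={p_high(**coef, temp_up=temp_up, news=news):.3f}")
```

With these made-up values the probability falls from about 0.50 to 0.31 with warming for non-users, but stays at about 0.71 in both temperature conditions for newspaper users.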

Fig. 4: Probability for high financial well-being of the farm.

Comparisons based on an interaction between country, climate change factors, use of newspapers and trust in industry.

The interaction between trust in media, use of industry information and farm financial well-being indicates that farmers, regardless of their country, who did not trust the media and did not use information from industry had the lowest probability of doing financially well (Fig. 5, top left). Farmers who trusted media sources but still did not use industry information performed substantially better financially. Farmers with the highest probability of doing financially well were those who trusted the media and used industry information, with trust acting synergistically with the use of information. The importance of trust for financial well-being can also be illustrated by the effects of trust in industry, experts and government: trust in industry acted synergistically with trust in experts (Fig. 5, top right), as did trust in government and trust in industry (Fig. 5, bottom left).

Fig. 5: Probability for high financial wellbeing of the farm.

Comparisons based on an interaction between country, trust, use of information sources and education.

In all cases, farmers who trusted industry, experts or the government were more likely to be financially well off than farmers who had no trust in their information sources. Other interactions, for example between education and use of media, also had a positive modifying effect on farm financial well-being in Chile but not in Tunisia: educated farmers who used media tended to be more likely to do well financially than farmers with low education (Fig. 5, bottom right).

Discussion

Previous studies have highlighted the detrimental effect of individual climate change factors on crop yields and farm income3,47,49,50,51,52,53. Our research adds to these findings by showing that climate change factors, when analyzed concurrently, affect fruit farm financial well-being to different extents. Whereas increasing drought and declining rainfall are likely to negatively affect fruit farm financial well-being, especially in Chile, extreme climatic events do not seem to play such a role. Thus, while farmers have discussed possible fruit damage due to frost or hail events54, such events do not appear to affect mid- to long-term farm income prospects. Indeed, fruit farmers are more likely to be concerned about drought (and consequently future water availability)54, consistent with our finding that increasing drought frequency had a negative effect on farm income and farm financial well-being.

Contrary to expectations, our analysis reveals that climate change is, compared with the other factors we investigated, not the most important factor for predicting fruit farm financial well-being. In Chile, farm location emerged as the strongest indicator of farm financial well-being, with farms in Central Chile doing better than farms in Southern Chile. In Tunisia, farms that had been in family possession for multiple generations did worse financially. Chile and Tunisia also shared a number of important predictors. In both countries, access to information and trust in information sources are more important than climate change in predicting farm financial well-being. These shared factors are useful for predicting both high- and low-performing farms: the better the access to information and the greater the trust in information sources, the better the farm's financial performance, and vice versa. On the other hand, climate change-related factors do play a more important role for farms not doing financially well.

As predictive factors differ between farms doing financially well and those experiencing financial hardship, policymakers or farmers need to employ different strategies depending on whether they wish to focus on maintaining or improving fruit farm financial performance. An argument can be made for focusing on factors important for improving farm financial well-being, as financially healthier farms are more likely to be resilient against climatic impacts48,55. Furthermore, synergistic effects and interactions between factors can affect their individual or combined importance for farm financial well-being. It is important to note that inter-dependencies between factors can motivate farmers to respond to climate change56. In this respect, the specificity of some factors implicated in fruit farm financial well-being argues for collecting extensive regional rather than country-wide datasets.

Although our findings indicate that climate change is currently not important for predicting fruit farm financial well-being, the situation may change in the future. This is suggested by the climate change trends analyzed in this paper: higher temperatures and less precipitation are associated with lower odds of good fruit farm financial performance. Temperature projections indicate continuing increases in winter night temperatures in Tunisia and Chile14,57. This will lead to winter chill deficits and potential problems with fruit tree phenology, necessitating changes in the types of fruit trees grown. Similarly, reduced precipitation and water availability in Tunisia have serious implications for the future of fruit trees in that country58. Much of the irrigation water for fruit trees comes from underground aquifers. If these are depleted or become saline, farmers in Tunisia may face major crop yield losses. In Chile, reduced water flow from the Andes, increasing urbanization and inappropriate crop use could create water distribution bottlenecks59,60. These latter predictions agree with farmers' own predictions for the future: most worry about the effects of drought and water availability (63 > 54). Ensuring general access to water, beyond relying solely on rainfall, is critical for adequate irrigation and for reducing drought exposure. Resolving these problems will require specific adaptive measures that reduce the future vulnerability of fruit farms to climate change, measures that farmers and governments need to be willing to pay for55,61. Policymakers must enact regulations to guarantee fair and sufficient access to, and distribution and use of, limited water resources among all stakeholders. 
Furthermore, policymakers should provide fruit farmers with effective, affordable, and accessible resources and tools to enhance farm adaptive capacity and reduce vulnerability to drought, such as sustainable irrigation systems, insurance schemes, crop alternatives, and farm management training54. When it comes to communication efforts to convince stakeholders to adopt the necessary protective measures, policymakers must keep in mind that climatic impacts may not be the primary risk to farm financial well-being. Indeed, owing to the current conflict in Ukraine and the aftermath of COVID-19, costs associated with adaptive measures are likely to become a dominant concern of many farmers around the world. It is also worth remembering that more media coverage does not necessarily influence farmer perception of climate change: we found no substantial association between use of, or trust in, media and farmer perceptions of climate change. Similarly, farmers who trust or use media as their source of news do not necessarily think that declining precipitation is bad for farm financial well-being.

To our knowledge, this is the first instance of using a hybrid modeling approach combining statistical models with machine learning techniques to analyze data in a much more complex and integrated manner. Using this approach, our aim was to improve predictability while maintaining interpretability. We found that statistical models, utilizing limited datasets that reflect the requirements of relevant theories, can be used to make adequate predictions about the relationships between climate change, intervening variables, and the outcome variable. However, we have also shown that by combining statistical models with specific machine learning methods, such as boosting and cross-validation, we were able to substantially improve the predictability of the (generalized linear) statistical model. This hybrid model can still be interpreted through variable importance and odds ratios, but classical inference based on F and t statistics is not valid for variables selected through a data-driven process62. Predictive modeling provided new insights into data relationships that can serve to generate and test new hypotheses by classical statistical means. Even though the random forest (a typical black box model) outperformed the sparse group boosting model, we believe that this improvement generally does not compensate for the loss of interpretability. With a similar analysis methodology, neural networks marginally outperformed logistic regression63. The predictive sparse group boosting and component-wise boosting models were ultimately chosen for the current data analysis. The former model provided evidence of regional or supra-regional variables that are important for predicting whether fruit farms will perform financially well. The latter model revealed that the interaction between various variables and farm financial well-being, as the outcome variable, is not a simple one-to-one relationship. 
Rather, certain variables, such as trust and use of specific information sources, appear to have a modulating effect on variables that may directly affect the outcome variable.
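The caveat that classical t and F inference is invalid after data-driven selection can be illustrated with a small simulation on pure-noise data: among many unrelated predictors, the one most correlated with the outcome is picked, and its naive t statistic is then compared with 1.96. The function name and simulation settings are hypothetical and unrelated to the study data.

```python
import numpy as np


def naive_selected_rejection_rate(n=100, p=50, reps=200, seed=1):
    """Share of pure-noise datasets in which the predictor most correlated with y
    (chosen from the data) has a naive |t| above 1.96 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)          # y is unrelated to every column of X
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        r_max = np.max(np.abs(r))           # data-driven selection step
        t = r_max * np.sqrt((n - 2) / (1 - r_max ** 2))  # slope t statistic
        hits += int(t > 1.96)
    return hits / reps
```

With 50 candidate predictors, the naive test declares the selected noise variable significant in the large majority of replicates, far above the nominal 5%, whereas with a single pre-specified predictor (`p=1`) the rate stays near 5%.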

Conclusions and future considerations

Our research underlines the usefulness of the hybrid analytical approach and highlights specific climate change factors that affect fruit farm financial well-being, while emphasizing the significance of other influential variables. Policymakers, stakeholders, and researchers can use these findings to develop targeted strategies and adaptive measures that support fruit farmers, reduce their vulnerability to climate change and enhance their financial stability.

Our experience with the hybrid model indicates that, especially when it is necessary to balance predictive improvements (usually requiring larger datasets) against loss of model interpretability, the research questions and the available modeling tools will dictate the extent and complexity of the data to be collected, whether the focus should be on regional or supra-regional datasets, and the type and depth of analysis that can be performed. Machine learning made it possible to include a broader range of independent variables, with substantially better prediction of farm financial well-being and clearer data presentation than traditional regression analysis offers. We believe that, through group-component-wise boosting of generalized linear models, our hybrid approach can generate useful predictions in high-dimensional settings while still preserving basic interpretability, such as variable importance and odds ratios. In this way, new hypotheses and models can be generated and then validated or rejected by future research. The key challenge for future studies will be to find the right balance between a theory-based approach, where a limited number of likely relevant variables are included in the survey design and resulting datasets, and a black-box approach that relies on deep mining of the largest possible number of data points.

Our results indicate that self-reported changes in temperature and precipitation within the last ten years generally reflect meteorological observations over the past 30 years. Farmers' perceptions and self-assessments are thus a valid tool for investigating the linkage between climate change and other factors, such as farmers' perception of financial well-being as an outcome variable, and ultimately allow investigation into the influence of perceived financial well-being on farmer behavior. It is, however, important to note that using perceived farm well-being is not the same as using actual financial performance data from farms or regions to assess impacts on farmer behaviour. Future research should consider collecting actual farm financial data and conducting comparative studies with self-assessment data collected from face-to-face interviews with farmers. The relatively small size of the resulting dataset, based on face-to-face interviews with 800 farmers, restricted subnational comparisons and increased the possibility of false selections due to the large number of influencing variables. However, the project size, the complexity of the survey and the length of the interview (ca. one hour) precluded a larger sample size and limited the number of variables and items that could be investigated.

Fruit farming is an important economic sector, particularly due to its high export potential. It is essential to develop policies that support fruit farmers in improving their financial well-being and achieving financial stability as the climate changes. Farmers' experiences with climate change are reflected in their perception of financial well-being, but factors other than climate change emerge as more important for farm financial well-being. Policymakers should thus prioritize supporting and strengthening farmers' financial well-being beyond climate change considerations. Addressing issues such as trust, information sharing and targeted communication can contribute to these goals.

Methods

General agricultural attributes of the study areas

According to the FAO statistical yearbook for 2022 (World Food and Agriculture: Statistical Yearbook 2022, Rome, doi.org/10.4060/cc2211en), the world value of primary agricultural production reached USD 2.7 trillion, of which fruits represented 17%64. More specifically, agriculture and related sectors in Chile represent 24.4% of total exports and 9% of total GDP, and employ around 10% of Chile's labor force65. In Tunisia, agriculture represents 12% of the country's GDP and employs 16% of the country's workforce66. It is, however, very difficult to obtain up-to-date and reliable information on the importance of cherry and peach crops for the economies of Chile and Tunisia. Chile's 2022 cherry production was estimated at 255 711 metric tons, ranking sixth in the world67. The majority of the production is exported to China, with a value of over USD 2 billion68. Tunisia's 2022 peach production was estimated at 123 000 metric tons, ranking 20th in the world69. The majority of exported production is destined for the Gulf states70.

Environmental attributes of the study areas

Four contrasting geographical and climatic regions were selected for the study, two in Tunisia and two in Chile. In Tunisia, these were the Mornag and Regueb peach-growing regions. In Chile, these were the Rengo and Chillán cherry-growing regions.

Tunisia

Mornag, Tunisia, hereafter referred to as Northern Tunisia, has an elevation of 110 meters and is located approximately 20 km east of the capital, Tunis. The region has a Mediterranean climate, with a rainy fall-winter season from October to March (ca. 400 mm) and a relatively dry spring and summer (ca. 130 mm). The coldest month is February, with average minimum and maximum temperatures of 5.5 °C and 16 °C, respectively. The warmest month is August, with average minimum and maximum temperatures of 22 °C and 34 °C, respectively.

Regueb, Tunisia, hereafter referred to as Central Tunisia, has an elevation of 160 meters and is located approximately 230 km south of Tunis. It is a semi-arid region characterized by low rainfall and high temperatures. Most of the rainfall occurs between October and the end of March (ca. 210 mm); spring and summer are dry (ca. 80 mm). The coldest month is January, with average minimum and maximum temperatures of 5 °C and 15 °C, respectively. The warmest month is July, with average minimum and maximum temperatures of 21.5 °C and 36 °C, respectively.

Chile

Rengo, Chile, hereafter referred to as Central Chile, has an elevation of 570 m and is located approximately 110 km south of Santiago de Chile. The region has a Mediterranean climate with cool, wet winters and hot, dry summers. Rainfall is concentrated in the winter months between May and September (ca. 500 mm); spring and summer tend to be dry (ca. 60 mm). The coldest month is July, with average minimum and maximum temperatures of 0 °C and 10 °C, respectively. The warmest month is January, with average minimum and maximum temperatures of 10 °C and 24 °C, respectively.

Chillán, Chile, hereafter referred to as Southern Chile, has an elevation of 120 to 150 meters and is located approximately 380 km south of Santiago de Chile. The region has a Mediterranean climate with relatively dry summers: most of the rainfall occurs in winter, between May and September (ca. 700 mm), while spring and summer rainfall is ca. 200 mm. July is the coldest month, with average minimum and maximum temperatures of 0.5 °C and 11 °C, respectively. The warmest month is January, with average minimum and maximum temperatures of 10.5 °C and 25 °C, respectively.

To place farmer perceptions in the context of climate change, we analysed regional climatic data for Chile and Tunisia over the last 30 years (see Supplementary Fig. 1). The changes in temperature and precipitation that farmers reported for the last 10 years generally reflect the meteorological observations over the past 30 years. Farmers’ perceptions are thus a valid tool for investigating the linkage between climate change and other factors, such as farmers’ perceptions of financial well-being.

Data collection: survey methodology and sampling

The data collection instrument used in this study was a face-to-face survey of cherry farmers in Chile and peach farmers in Tunisia. A total of 801 farmers were interviewed: 401 in Tunisia in the fall of 2018 and 400 in Chile in the spring of 2019.

Survey methodology

The questionnaire was prepared in English and translated into Tunisian Arabic and Chilean Spanish. The translations were back-translated into English to check for inconsistencies. The survey was pre-tested with 12 farmers in consultation with Qualitas AgroConsultores in Chile and Elka Consulting in Tunisia. Based on their feedback, and that of our research colleagues in Tunisia and Chile, some questions were removed and others reformulated. The same consultants carried out the face-to-face interviews. Farmers answered a combination of multiple-choice, open, Likert-scale and yes/no questions on climate change and climate impacts on their farms between 2009 and 2018, and on their past, present and planned adaptive measures. The relevant survey questions and analysed variables are presented in Supplementary Tables 1 and 2.

We analysed the threat to fruit farms from four climate change factors: temperature, precipitation, extreme weather and drought. Farmers were asked whether each of these factors had increased, decreased, stayed the same or become unpredictable over the past 10 years.

In addition to climate change, groups of farm-related variables or individual variables may, by themselves or in interaction with the climate threat, affect farm financial well-being.

  • Groups of variables. We have focused our analysis on groups of farm variables (assets) that may be important for farm financial well-being. These were:

    • natural (geographical regions)

    • human (education, age, gender, knowledge)

    • social (reliance on/use of information, trust in information sources, community, science or religion)

    • biophysical/manufactured (farm size, water management systems used on the farm, diversity of crops used, adaptive measures)

    • economic (farm debt, farm performance, reliance on orchard income)

    • climate experience

    • income damage

    The choice of the above variables was based on the five-resource/capital sustainability model, which addresses the concept of sustainable wealth creation71,72.

  • Individual variables. The grouped variables listed above were also assessed individually. In addition, we examined other variables that may, by themselves or in interaction with the climate threat, affect farm financial well-being.

  • Dependent variable. The question given to farmers that defines the dependent variable was: “When it comes to financial matters of your farm operation, how well is your farm doing?” The variable has three categories: doing well or very well; neither doing well nor not doing well (“neutral”); and not doing well or not well at all. Throughout the analysis, financial well-being is coded as two separate binary variables. The first, “high well-being”, compares farmers who are doing well or very well financially with farmers who are neutral or not doing well (reference category). The second, “low well-being”, compares farmers who are not doing well financially with farmers who are neutral, doing well or doing very well (reference category). This enabled us to distinguish the process leading to farmers not doing well from the process leading to farmers doing well, as neutral farmers are always part of the reference category.
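
A minimal Python sketch of this two-indicator coding follows (the study itself used R). The answer labels are hypothetical stand-ins for the survey's response options; what matters is that neutral answers are coded 0 on both indicators and therefore always fall in the reference category.

```python
# Hypothetical answer labels standing in for the survey's response options.
HIGH = {"very well", "well"}
NEUTRAL = {"neutral"}
LOW = {"not well", "not well at all"}


def code_wellbeing(answer):
    """Return (high_wellbeing, low_wellbeing) as 0/1 indicators.

    Neutral answers are 0 on both indicators, so they always belong to
    the reference category of each binary outcome.
    """
    a = answer.strip().lower()
    if a not in HIGH | NEUTRAL | LOW:
        raise ValueError(f"unexpected answer: {answer!r}")
    return int(a in HIGH), int(a in LOW)


answers = ["very well", "neutral", "not well at all", "well"]
coded = [code_wellbeing(a) for a in answers]
print(coded)  # [(1, 0), (0, 0), (0, 1), (1, 0)]
```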

Sampling

Lists of individual fruit farms in the regions of interest were obtained from the respective Ministries of Agriculture. Farms were randomly selected from these lists if they fulfilled the following criteria: farmers had to own the farm, manage and work on it, and derive over 70% of their income from farming. A total of 801 face-to-face interviews were subsequently conducted with farmers who fulfilled these criteria: 401 peach farmers in Tunisia (201 in the Mornag and 200 in the Regueb region) and 400 cherry farmers in Chile (200 each in the Rengo and Chillán regions). The interviews, each approximately one hour long, were carried out directly on the farms after harvest completion, in the fall of 2018 by Elka Consulting in Tunisia and in the spring of 2019 by Qualitas AgroConsultores in Chile. Guidance was sought from the Department of Communication and Media Research, University of Munich, about the participation of human subjects in survey research and subsequent data use. The farm data were collected according to the data collection procedures applicable in each country. Informed consent for data collection was provided by the survey participants. No personally identifiable data were collected, ensuring full anonymity. After compilation, the interview data were checked for errors and converted to Excel format for further analysis.

Data analysis strategy

Research question one: does climate change have an effect on how well the farm is doing financially?

We used a statistical approach to determine the effect of the independent variables on farm financial well-being. As the two outcome variables, “high well-being” and “low well-being”, are binary, we used logistic regression and analysed the odds ratios, with their p-values and confidence intervals, for adaptive measures and past experience with climate change.
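
For a single binary predictor, the odds ratio, Wald confidence interval and p-value reported by a logistic regression can be reproduced from a 2x2 table. The sketch below assumes that simple case; the counts and the predictor are hypothetical, and models with several predictors require an actual regression fit (in R, for example, `glm()` with `family = binomial`).

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959963984540054):
    """Odds ratio, Wald 95% CI and two-sided p-value for one binary predictor.

    a: predictor = 1, outcome = 1    b: predictor = 1, outcome = 0
    c: predictor = 0, outcome = 1    d: predictor = 0, outcome = 0

    With a single binary predictor, exp(beta) from a logistic regression
    equals this cross-product ratio, and the Wald standard error of beta is
    sqrt(1/a + 1/b + 1/c + 1/d).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    # Two-sided p-value for H0: beta = 0, i.e. P(|Z| > |log_or / se|)
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return math.exp(log_or), ci, p_value


# Hypothetical counts: reported increase in extreme weather (yes/no)
# cross-tabulated against "low well-being" (1/0).
or_, (lo, hi), p = odds_ratio_wald(a=60, b=90, c=40, d=110)
print(round(or_, 3), round(lo, 3), round(hi, 3), round(p, 4))
```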

Research question two: what factors, other than climate change, may be important for the financial well-being of a farm?

This research question poses a major challenge. The dataset contains many possible influencing variables; some may be relevant for the outcome variable, but others may not. Variables unrelated to the outcome create unnecessary background “noise”, because generalized linear models tend to over-adapt to the data in high-dimensional settings (overfitting). In the extreme case, where the number of independent variables exceeds the number of observations, standard linear models cannot be fitted at all. One solution is to perform variable selection and include only the selected variables in the model. Current practice is to base this selection on the literature and expert knowledge. Indeed, an implicit variable selection always takes place when deciding which data to collect. However, one may still end up with a large number of possible influencing variables. In this situation, statistics and machine learning can be combined to perform the variable selection. We used model-based boosting73, but other strategies, such as the Lasso74, can also be used. Model-based boosting improves a given model iteratively, at each step updating only the variable that improves the overall model the most. The process is stopped when a further update would no longer result in a “better” model.
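
The selection mechanism of model-based boosting can be illustrated with a minimal sketch of component-wise gradient boosting for a binary outcome. This is a Python illustration, not the R implementation used in the study; the step length `nu`, the fixed `mstop` and the simulated data are assumptions. At each iteration every variable is fitted separately to the negative gradient and only the best-fitting one is updated, so variables that are never chosen keep a coefficient of exactly zero.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def componentwise_boost(X, y, mstop=100, nu=0.1):
    """Component-wise gradient boosting with linear base learners, logistic loss.

    X: list of rows (lists of floats); y: list of 0/1 labels.
    Returns (offset, beta, selected): beta holds coefficients of the centred
    columns; selected lists the column chosen at each iteration.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    ybar = sum(y) / n
    offset = math.log(ybar / (1 - ybar))  # start from the marginal log-odds
    f = [offset] * n
    beta = [0.0] * p
    selected = []
    for _ in range(mstop):
        # Negative gradient of the binomial log-loss with respect to f
        u = [yi - sigmoid(fi) for yi, fi in zip(y, f)]
        best_j, best_b, best_rss = None, 0.0, float("inf")
        for j, x in enumerate(cols):
            sxx = sum(v * v for v in x)
            if sxx == 0:
                continue  # a constant column cannot explain anything
            b = sum(v * ui for v, ui in zip(x, u)) / sxx
            rss = sum((ui - b * v) ** 2 for v, ui in zip(x, u))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        if best_j is None:
            break
        # Update only the best-fitting variable, damped by the step length nu
        beta[best_j] += nu * best_b
        f = [fi + nu * best_b * v for fi, v in zip(f, cols[best_j])]
        selected.append(best_j)
    return offset, beta, selected


# Simulated data: only column 0 drives the outcome; columns 1 and 2 are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [1 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]
offset, beta, selected = componentwise_boost(X, y, mstop=50)
print("selected variables:", sorted(set(selected)))
print("coefficients:", [round(b, 3) for b in beta])
```

In practice `mstop` is not fixed in advance but chosen by cross-validation, which is what turns early stopping into a variable-selection and shrinkage device.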

Importantly, in some instances, grouped variables may be more important for the model than individual variables. We used sparse group boosting for this purpose75. In sparse group boosting, the model can decide between individual variables and groups of variables. New hypotheses can be generated about the association of selected variables or groups of variables and the farm’s financial well-being. Being able to differentiate between the importance of groups and individual variables may help in designing questionnaires because if individual variables are more important than the group, only the important individual variables need to be included in the questionnaire. This may greatly shorten the questionnaire without loss of information. Conversely, variable groups may provide information about variable interactions.
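
One selection step of a simplified, sparse-group-style boosting can be sketched as a competition between individual and group base learners fitted to the current negative gradient `u`. The published sparse group boosting balances the two kinds of learners through their degrees of freedom via a mixing parameter; to keep this sketch short we instead assume fixed, hypothetical ridge penalties (`lam_ind`, `lam_grp`), with a heavier penalty on groups. The group name and toy columns are likewise illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def ridge_rss(cols, u, lam):
    """Residual sum of squares of a ridge fit of u on the given centred columns."""
    A = [[dot(ci, cj) + (lam if i == j else 0.0) for j, cj in enumerate(cols)]
         for i, ci in enumerate(cols)]
    coef = solve(A, [dot(c, u) for c in cols])
    fitted = [sum(b * c[k] for b, c in zip(coef, cols)) for k in range(len(u))]
    return sum((ui - fi) ** 2 for ui, fi in zip(u, fitted))


def select_base_learner(columns, groups, u, lam_ind=1.0, lam_grp=4.0):
    """One boosting selection step: compare each variable with each group.

    Groups get a heavier (hypothetical) ridge penalty so that their extra
    flexibility does not automatically win the comparison.
    """
    candidates = {("variable", j): ([c], lam_ind) for j, c in enumerate(columns)}
    for name, members in groups.items():
        candidates[("group", name)] = ([columns[j] for j in members], lam_grp)
    return min(candidates, key=lambda k: ridge_rss(candidates[k][0], u, candidates[k][1]))


# Centred toy columns; "social" is a hypothetical group of two items.
x0 = [1, -1, 1, -1, 1, -1]
x1 = [1, 1, -1, -1, 0, 0]
x2 = [0, 0, 1, 1, -1, -1]
columns = [x0, x1, x2]
groups = {"social": [1, 2]}
print(select_base_learner(columns, groups, u=x0))     # ('variable', 0)
u_grp = [a + b for a, b in zip(x1, x2)]
print(select_base_learner(columns, groups, u=u_grp))  # ('group', 'social')
```

When the gradient is driven by one variable, the individual learner wins; when it is driven jointly by the group's members, the group learner wins despite its heavier penalty.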

Research question three: are observed effects on the financial well-being of a farm the result of moderating effects and/or more complex relationships between variables?

We analysed the (pairwise) interaction effects of all variables on the financial well-being of the farm. Interactions were evaluated with model-based boosting, allowing comparisons of their relative importance for the outcome variable. Note that a dataset with p variables contains p(p - 1)/2 possible pairwise interactions, which makes the noise problem even higher-dimensional. However, this brute-force method has the potential to identify important moderation or additive variable effects, and thus increase our understanding of the processes leading to the outcome.
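
The growth of the candidate set can be checked directly. This sketch assumes a hypothetical set of p = 40 variables and enumerates all unordered pairs with `itertools.combinations`.

```python
from itertools import combinations


def pairwise_interactions(names):
    """All unordered variable pairs: p * (p - 1) / 2 candidate interaction terms."""
    return list(combinations(names, 2))


variables = [f"x{i}" for i in range(1, 41)]  # hypothetical p = 40 variables
pairs = pairwise_interactions(variables)
print(len(pairs))                   # 780
print(len(variables) + len(pairs))  # 820 candidate terms in total
```

Adding all pairwise interactions thus multiplies the number of candidate terms roughly twentyfold in this example, which is why a selection procedure is needed before interpretation.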

Depending on the research question, the data analysis described above may still not be complex enough. In such situations, non-interpretable black-box machine learning models should be used. Comparing the predictive performance of these machine learning models with that of the interpretable hybrid and statistical models indicates how much analytical complexity is necessary. If the hybrid model outperforms the black-box model in predictive power (i.e. achieves a higher AUC), additional complexity is not necessary. If the converse is true, future research should aim to understand how these complexities can be explained, for example by highly nonlinear relationships or higher-order interactions.

Models used for data evaluation

Statistical models

We used generalized linear models76 to assess whether interventions had an impact on the outcome of interest. As the outcome variables were binary, logistic regression was used to provide odds ratios, the corresponding p-values, and confidence intervals.

Machine learning

We compared several popular machine learning models to ensure that the models used for our analysis were competitive in predictive performance. A list of all models used is given in Supplementary Table 3. In contrast to the model-based boosting models and logistic regression, these machine learning models do not allow insight into the data.

Hybrid statistical - machine learning-based predictive models

We used model-based boosting as a means to select variables for the predictive models. The number of boosting iterations was determined by 25-fold cross-validation on the training data. This hyper-parameter controls both effect penalization (smoothness) and regularization (variable selection)73. Variable selection was completed in fewer than 4000 iterations. The effect sizes, in our case the odds ratios, were shrunk through ridge regularization (the log odds ratios towards zero, i.e. the odds ratios towards one). This makes the results easier to interpret, since only the variables most important for the outcome need to be analysed, while irrelevant variables are not considered by the model. Since the influencing variables can be clustered into groups, as described in the contextual definitions, we used sparse group boosting75 as an extension of model-based boosting. This approach allows the resulting model and variables to be interpreted similarly to generalized linear models77. A possible alternative is to use the lasso and the sparse group lasso78.
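
The cross-validated choice of the number of boosting iterations can be sketched as follows: the training indices are split into 25 folds, each fold's held-out risk is recorded along the boosting path, and the iteration count with the lowest mean risk is kept. The fold assignment, seed and toy risk values are illustrative assumptions; in R, the mboost package provides `cvrisk()` for this step.

```python
import random


def kfold_indices(n, k=25, seed=1):
    """Randomly split observation indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def choose_mstop(risk_paths):
    """Pick the number of boosting iterations with the lowest mean held-out risk.

    risk_paths[f][m] is the out-of-fold risk of fold f after m + 1 iterations.
    Returns (mstop, mean_risk_per_iteration).
    """
    n_iter = len(risk_paths[0])
    mean_risk = [sum(path[m] for path in risk_paths) / len(risk_paths)
                 for m in range(n_iter)]
    best = min(range(n_iter), key=mean_risk.__getitem__)
    return best + 1, mean_risk


folds = kfold_indices(560, k=25)
print(len(folds))  # 25

# Toy risk paths for three folds: risk falls, then rises once the model overfits.
paths = [[0.70, 0.62, 0.58, 0.60],
         [0.72, 0.61, 0.57, 0.59],
         [0.69, 0.63, 0.60, 0.61]]
mstop, mean_risk = choose_mstop(paths)
print(mstop)  # 3
```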

Model evaluation

Seventy percent of the observations were randomly assigned to the training dataset and the remaining 30 percent to the test dataset for the final evaluation.
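
A random 70/30 split takes only a few lines. The seed is a hypothetical choice, and the split is applied to all 801 interviews purely for illustration; per-country splits would work the same way.

```python
import random


def train_test_split(n, train_frac=0.7, seed=2019):
    """Randomly assign indices 0..n-1 to training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * n)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


train, test = train_test_split(801)
print(len(train), len(test))  # 561 240
```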

Model evaluation was based on the area under the receiver operating characteristic (ROC) curve, evaluated on the test data. For the binary outcome variables, two performance metrics were evaluated at every probability threshold: first, the rate of correctly identified farms doing well financially, and second, the rate of correctly identified farms not doing well financially; together these yield the ROC curve. The area under the ROC curve (AUC) takes both rates into account by considering all possible probability thresholds of a prediction model. We also computed accuracy as an additional metric, i.e. the percentage of all farmers in the test dataset that a classification model identified correctly. Although, unlike the AUC, accuracy does not balance the true positive and true negative rates in unbalanced data, we report it because it is easy to interpret.
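
Both metrics can be computed from predicted probabilities in a few lines. This sketch uses the Mann-Whitney formulation of the AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and assumes a 0.5 threshold for accuracy; the labels and probabilities are hypothetical.

```python
def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count one half).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical test-set labels (1 = low well-being) and predicted probabilities
y_test = [1, 1, 0, 0, 1, 0]
p_test = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
print(round(auc(y_test, p_test), 3))       # 0.889
print(round(accuracy(y_test, p_test), 3))  # 0.667
```

The example shows why the two metrics can disagree: the ranking of cases is mostly correct (high AUC), yet a single fixed threshold still misclassifies two of six cases.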

All data analyses were performed in the statistical programming environment R; visualizations were created with the R package ggplot279.

Choice of predictive models for data evaluation

We compared different predictive models to determine which model has the best predictive power and should therefore be used for the data analysis (Table 1). Except for low financial well-being in Chile and Tunisia combined, the random forest (rf) tended to outperform all other models, both for Chile and Tunisia combined and for each country separately. An overview of ROC curves for selected models can be found in Extended Data Fig. 2. Boosted decision trees (gbm) performed similarly to sparse group boosting (sgb) and model-based boosting (mb). In all cases, neural networks (nn) performed worse than sgb and mb. The generalized linear model (glm), which included only experiences with climate change and its financial impact, had lower predictive performance than sgb and mb. However, when the glm was fitted with boosting (model-based boosting, mb) and included more variables related to farm vulnerability to climate change and geographical location, accuracy and AUC tended to improve compared with the glm alone. Including interactions between all independent variables (mb-int) did not improve the predictive performance of model-based boosting. These results imply that experiences with climate change and its financial impact alone, as in the glm, are not enough to explain either financial well-being variable; additional variables had to be considered. Compared with the interpretable models, models accounting for deep interactions and complex relationships, such as the random forest, could in some cases marginally improve accuracy and AUC when predicting high well-being, whereas for low well-being the simpler models seem to suffice. Since our investigation required data interpretation, sgb was chosen for subsequent data analysis.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.