Introduction

Extreme weather events, such as droughts, floods, and heatwaves, threaten food security from regional to global scales by sharply reducing the availability, affordability, and adequate utilization of food1,2. During 2003–2013, extreme weather events caused USD 30 billion in damage to agricultural productivity3. Crop production was the most affected, with yield reductions3,4,5,6,7 introducing price volatility in food systems1,8 and affecting food trade, farmers’ welfare, and economic development, especially in low-income or import-dependent countries1,9,10,11.

International crop trade can potentially alleviate the negative impacts of extreme weather events on food security by moving food commodities from surplus to deficit regions12. Currently, international trade in major food commodities accounts for 23% of the global food supply for humans. Wheat, a crop essential for people’s daily caloric and protein needs, was produced at a rate of 683 million tonnes annually, and 147 million tonnes was traded globally, accounting for 22% of the crop trade (in caloric content) in 200913,14,15. However, heavy reliance on imports from other countries or the global market may expose a country to yield and market variations outside its jurisdiction and consequently introduce additional risk to its food supply. For example, the 2010 heatwave in Russia triggered wheat export restrictions, which led to a wheat shortage and a price spike in the Middle East, where over one-third of the wheat supply comes from Russia, and potentially contributed to the destabilization of the region9,16. A simultaneous drop in the yields of major exporters may disrupt the global trade network and food supply, and synchronous yield fluctuations between trade partners may further exacerbate the problem. Therefore, the double-edged role of international trade in addressing the food security challenge is tied to patterns of extreme weather events and yield fluctuations; however, these associations remain poorly understood and require in-depth investigation17.

The occurrence and volume of trade between countries have often been investigated as outcomes of comparative advantages in producing food commodities (e.g., more efficient use of water and land resources), as well as of many socioeconomic factors such as geographical proximity, population, agricultural productivity, language, contiguity, level of economic development, and trade agreements18,19,20,21,22. Several recent studies have evaluated the impacts of climate factors19, but most focus on the average state of the climate, such as annual rainfall, annual evapotranspiration, and annual temperature19,23,24. Many studies have examined the impacts of natural disasters25,26, but these impacts are mostly quantified based on deaths and losses due to rare events such as floods or droughts27 and are not specific to the effects of extreme weather events on crop production28,29. Only a few studies have investigated the impacts of extreme weather stress and synchronous crop yield fluctuations30,31,32,33, and they focused on food price fluctuations or the trade volumes of individual countries rather than on changes in the bilateral trade network.

In addition, investigations of the drivers of trade links have been limited to statistical approaches that were not designed to handle complex network data or to derive data-driven relationships. Prior research on potential drivers used regression models that impose multiple restrictive assumptions on the shapes of relationships (e.g., linear, log-linear, or log-log) and on data distributions24,34,35,36,37,38. These regression approaches model the attributes of network vertices without considering the structure of the network. In contrast, network analysis accounts for both the vertices and the structure of the network and is therefore capable of handling complex relationships39,40. Only recently have statistical exponential random graph models (ERGMs) been used to investigate the relationships between international trade links (or volumes) and their potential drivers22,41,42. ERGMs have proven superior to traditional approaches in capturing zero-inflated trade volumes and complex trade patterns (e.g., third-party relations and clustering)22. However, these recent studies still impose parametric assumptions and are not flexible enough to model highly nonlinear relationships. Despite the success of machine learning approaches such as random forest (RF) in handling large volumes of complex data and deriving nonlinear relationships from data, such data-driven approaches have rarely been used in trade analysis43.

To address these knowledge gaps, we proposed network-based covariates for studying the international wheat trade network from 2005 to 2014 using modern statistical and machine learning models. In addition to commonly used geopolitical (e.g., contiguity) and economic (e.g., regional trade agreements) factors, we developed two network-based covariates to characterize extreme weather stress and yield synchrony between countries: the difference in extreme weather stress (DEWS) and the short-term synchrony (STS) of crop yield anomalies. The DEWS network, derived from principal components of weather stress indices (see Methods), quantifies how much stronger the weather stress in an exporting country is than the stress in an importing country. The STS network is represented by the correlation of crop yield fluctuations, where fluctuations are defined as deviations from the local trend in yield (see Methods). STS thus quantifies how strongly yield deviations in one country correspond to similar deviations in another. A positive STS indicates synchronous fluctuations, while a negative STS indicates asynchronous fluctuations. Using the two network-based covariates, we hypothesize that 1) a larger weather stress difference (i.e., higher DEWS) leads to higher import volumes and a higher likelihood of trade partnerships, and 2) countries with synchronized yield anomalies (i.e., positive STS) are less likely to trade or have lower trade volumes.

To accommodate the complexity and network structure of the data, we applied ERGM and RF to model trade linkages and volume between countries (see Methods section for details). The ERGM was selected as the most suitable statistical method, and RF, being one of the best-performing machine learning techniques, was selected as a nonparametric alternative for benchmarking the results. We compared the performance of ERGM and RF in cross-validation for different types of networks, which highlighted the strengths and weaknesses of these alternative methods.

The analysis reveals that partnerships in the current wheat trade network tend to form between countries with more synchronized yield fluctuations, exposing these countries to synchronized yield failure. In addition, our models show that countries with larger differences in extreme weather stress tend to have higher import volumes and more trade partnerships. Therefore, this study suggests the need to consider extreme weather stress and yield synchrony in the design of trade networks to improve the stability and fairness of the global food system.

Results

Extreme weather stress for wheat production

Cold and heat stress were identified as the major contributors to the variability of the extreme weather indices developed for each country’s wheat production. A total of 17 indices were used to quantify weather stresses (including heat stress, cold stress, flood, and drought) during the wheat growing period in 115 countries for 2005–2014 (see Methods and Supplementary Note 3). The first two principal components of the 17 indices, dominated by cold and heat stress, explain 65% and 22.7% of the variance of the weather index matrix, respectively (Supplementary Fig. 2).

The dominant principal components of the extreme weather indices are not significantly correlated with production levels across countries, while the heat stress indices are correlated with import dependency (Fig. 1; detailed results in Supplementary Table 3 and Supplementary Figs. 13–29). This suggests that the scale of a country’s wheat production was not necessarily affected by extreme weather stress in its wheat-producing region, but its dependency on wheat imports was associated with higher heat stress. Furthermore, of the 115 major wheat-producing countries in our study, 56 (mostly developing countries) have an import dependency ratio (IDR) greater than 50%, while only 27 are net exporters. Since around half of the countries are highly import-dependent for wheat, trade plays an important role in ensuring a stable wheat supply. The pairwise relationships between weather stress and other major characteristics of trade (such as the number of linkages and trade volume) show similar patterns: countries facing higher heat stress (or lower cold stress) tend to have fewer export partners, and countries with higher cold stress tend to have more import partners (Supplementary Figs. 8–11 and Supplementary Table 3).

Fig. 1: Relationship between import dependency ratio and weather stress.

Relationships between the 2005–2014 import dependency ratio (IDR; Eq. 1) and the derived principal components (PCs) representing weather stress: (a) cold stress, (b) heat stress. A higher positive IDR means higher import dependency, while a negative IDR means that a country is a net exporter. Each black point represents a country; point size corresponds to average wheat production during 2005–2014. Black lines represent the estimated linear relationships between weather stress and IDR (p-value = 0.460 for cold stress and 0.005 for heat stress), and gray shaded areas correspond to 95% confidence intervals.

Relationships between trade networks and extreme weather stress

Using the principal components of the extreme weather indices among the covariates, we modeled the bilateral wheat trade networks (one weighted by trade volume and one unweighted) separately with ERGM and RF. The two models identify similar general relationships between the trade networks and their potential drivers, but their performance differs. To evaluate each model, we conducted cross-validation. The results show that ERGM, with an error rate of 5.17%, was more accurate than RF in predicting trade presence/absence (i.e., the trade network without trade-volume weights), while RF was more accurate in predicting trade volume (Tables 1 and 2; see the distributions of cross-validated errors in Supplementary Figs. 5 and 6). Hence, throughout the rest of the paper, we report modeling results for trade linkages based on ERGM and for trade volumes based on RF.
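The presence/absence error rate can be read as the share of held-out country pairs (dyads) whose link status is mispredicted. The sketch below shows k-fold cross-validation over dyads under stated assumptions: random fold assignment and a pluggable `fit` function. Both are hypothetical, as is the `majority_fit` baseline. The study’s exact fold design is not given in this section and may differ, for example in how folds handle network dependence.

```python
import random
from typing import Callable, List, Sequence, Tuple

Dyad = Tuple[str, str]  # (exporter, importer)
Predictor = Callable[[Dyad], int]


def kfold_error_rate(
    dyads: Sequence[Dyad],
    labels: Sequence[int],  # 1 = trade link present, 0 = absent
    fit: Callable[[List[Dyad], List[int]], Predictor],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Fraction of held-out dyads whose link presence/absence is mispredicted."""
    order = list(range(len(dyads)))
    random.Random(seed).shuffle(order)  # random fold assignment (an assumption)
    folds = [order[f::k] for f in range(k)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        predict = fit([dyads[i] for i in train], [labels[i] for i in train])
        wrong += sum(predict(dyads[i]) != labels[i] for i in fold)
    return wrong / len(dyads)


def majority_fit(train_dyads: List[Dyad], train_labels: List[int]) -> Predictor:
    """Hypothetical baseline: always predict the most common training label."""
    guess = 1 if 2 * sum(train_labels) >= len(train_labels) else 0
    return lambda dyad: guess


countries = [f"c{i}" for i in range(5)]
dyads = [(a, b) for a in countries for b in countries if a != b]  # 20 directed pairs
labels = [0] * 5 + [1] * 15
print(kfold_error_rate(dyads, labels, majority_fit))  # 0.25
```

A fitted ERGM or RF would replace `majority_fit`. A trivial baseline like this is still useful, because a sparse or dense network can make a low error rate look better than it is.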

Table 1 Summaries of the models for 2005–2014 unweighted directed (trade presence) international wheat trade network.
Table 2 Summaries of the models for 2005–2014 weighted directed (trade volume) international wheat trade network.

Modeling results from both ERGM and RF show that country pairs with larger differences in extreme weather stress are more likely to be trade partners. The ERGM shows that more severe heat stress in the importing country than in the exporting country (i.e., DEWSheat < 0) corresponds to a higher likelihood of trade link formation. Conversely, trade partnerships are less likely if the exporting country experiences greater heat stress than the importer (i.e., DEWSheat > 0; Table 1 and Supplementary Fig. 12b). These model results align with the observed relationship between import dependency and heat stress (Fig. 1b).

Differences in both heat and cold stress between countries have significant relationships with trade volume. The RF shows that higher trade volumes overall correspond to higher heat stress in importing countries (i.e., DEWSheat < 0 compared with DEWSheat > 0), similar to the ERGM results; however, the relationship is not strictly linear, and trade volumes increase marginally for DEWSheat around zero (Fig. 2b). Higher trade volume is predicted when partners differ in cold stress (i.e., DEWScold ≠ 0; Fig. 2a); however, in contrast to heat stress, the two upper deciles of DEWScold are associated with higher trade volumes. In particular, the biggest spike in Fig. 2a is driven by Germany, a large exporter that may experience more severe cold stress than its trade partners (DEWScold > 0). Cases with DEWScold > 3000 are dominated by Japan, Mongolia, and South Korea in the exporter role; hence, the corresponding average trade volumes decline from the peak values (Fig. 2a). The RF rankings of variables by predictive importance are consistent between the weighted and unweighted trade networks. The economic covariates and the difference in production rank as the top predictors, followed by STS and DEWS. Among the other factors, common official language and contiguity rank eighth and ninth, and regional trade agreements (RTA) rank tenth (Supplementary Figs. 32 and 33). A permutation-based assessment of the statistical significance of the importance values identified all variables in the RFs as statistically significant; hence, for the unweighted directed trade network, the RF models retain one more variable (DEWScold) than the ERGMs.

Fig. 2: Random forest partial dependence plots for trade volume in 2005–2014.

The x-axes represent the considered covariates: (a) difference in extreme cold stress (DEWScold), (b) difference in extreme heat stress (DEWSheat), (c) short-term synchrony (STS), (d) distance, (e) contiguity, (f) common official language, (g) difference in production, (h) the General Agreement on Tariffs and Trade (GATT)/World Trade Organization (WTO), (i) Regional Trade Agreements (RTA), (j) gross domestic product (GDP) per capita, and (k) GDP. The inner tick marks on the x-axes represent deciles of the variables. The y-axis represents the marginal effect of the covariate on wheat trade volume. Blue bars show marginal effects of categorical covariates, while black lines show marginal effects of continuous covariates.
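Partial dependence curves such as those in Fig. 2 are usually computed by fixing one covariate at a value, leaving the other covariates at their observed values, and averaging the model’s predictions. The following model-agnostic sketch illustrates this. The `toy_model` function and data are hypothetical stand-ins for the fitted RF. Software used to draw Fig. 2 may also center the curve or evaluate it on a quantile grid, which this sketch does not do.

```python
from statistics import mean
from typing import Callable, List, Sequence


def partial_dependence(
    predict: Callable[[List[float]], float],
    rows: Sequence[Sequence[float]],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average model prediction with one covariate fixed at each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature] = value  # override one covariate; keep the others as observed
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


def toy_model(row: List[float]) -> float:
    """Hypothetical fitted model: volume = 2 * x0 + x1."""
    return 2 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
print(partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0]))  # [3.0, 5.0, 7.0]
```

For an additive toy model, the curve recovers the slope of the fixed covariate (2) shifted by the mean of the others. For an RF, the same averaging produces the spikes and plateaus described in the text.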

The role of yield synchrony

In addition to extreme weather stress, the STS of crop yield anomalies also shows a significant relationship with the wheat trade networks, both for the presence/absence of trade links and for trade volumes. Specifically, the ERGM for the unweighted network shows that STS is positively associated with the likelihood of trade partnerships (Table 1). In the weighted trade network, RF detects a nonlinear relationship in which trade volume increases at an accelerating rate with STS over the main body of the distribution (Fig. 2c). However, the first decile of STS, comprising the most asynchronous country pairs, also shows a spike in trade volume (Fig. 2c). This suggests that country pairs at both extremes, with strongly asynchronous (STS near –1) or strongly synchronous (STS near 1) yield fluctuations, tend to trade more.

The role of other factors

All our ERGM and RF models also include the following covariates that have been considered as important for the formation of trade linkages: population-weighted distance, contiguity, common official language, gross domestic product (GDP), GDP per capita, General Agreement on Tariffs and Trade (GATT) /World Trade Organization (WTO) memberships, RTA, and difference in production between countries. The modeling results further confirmed the important role of these factors. The ERGM results show that trade partnerships are more likely to occur between countries that are closer to each other, contiguous, or have a common official language (Table 1), which aligns well with the existing findings in the literature. The RF results show, similarly to ERGM, higher trade volumes for countries that are contiguous and have a common official language (Fig. 2e, f), and an overall negative relationship between trade volume and distance (Fig. 2d). However, RF was also able to model nonlinearity in the latter relationship, characterized by substantial spikes in trade volume around the deciles 3–4 and 9–10 of the population-weighted distance (Fig. 2d).

The inclusion or exclusion of these covariates in the ERGM and RF models does not affect the above results on the relationships between trade networks and extreme weather stress or yield synchrony, further confirming the robustness of the modeling results. For example, countries closer to each other tend to have more synchronized yields; however, the ERGM shows a significant positive association of trade partnerships with STS regardless of whether the distance variable is included (Supplementary Table 4). This test suggests that the positive relationship between trade networks and STS is not solely due to the association between STS and geographic proximity but could result from other factors not included in the models (e.g., cultivars, agricultural technology, and management practices).

The difference in production level between exporting and importing countries shows a significant positive association with the international wheat trade network (Fig. 2g and Supplementary Fig. 12a). The amount of production plays a critical role in a country’s decision to engage in international trade. The modeling results from both ERGM and RF show that the likelihood of a trade partnership is higher when country pairs have larger differences in their production levels. The ERGM shows a higher likelihood of trade partnership when production is higher in the exporting country (i.e., difference in production > 0) than when it is higher in the importing country (i.e., difference in production < 0).

In RF, a nonlinear relationship between difference in production and trade volume is observed (Fig. 2g). A substantial increase in trade volume is predicted when production is higher in the exporting country (i.e., difference in production > 0 compared with difference in production < 0), a pattern similar to the ERGM results. Cases with difference in production > 10⁹ kg (i.e., 1 million tonnes; upper decile) mostly involve the Russian Federation, the USA, and India as exporters. Specifically, the highest marginal increase in trade volume (Fig. 2g) is driven by the Russian Federation, the USA, France, and Canada, highlighting their major contribution to international wheat trade.

In addition to the above covariates, we tested several economic factors, including GDP, GDP per capita, GATT/WTO membership, and RTA. These variables are frequently used to account for multilateral resistance, total expenditure, and market size in international trade44,45. The ERGM results show that trade partnerships are more likely between countries with higher GDP or GDP per capita and between countries involved in regional trade agreements, while the likelihood of a trade partnership decreases when countries are GATT/WTO members. In the RF, GATT/WTO membership shows a similar marginal increase in trade volume for both members and non-members, indicating that imports and exports occur between countries irrespective of their trade membership. Including these factors in the models does not change the relationships between the trade network and the two main covariates (i.e., DEWS and STS).

Discussion

While few trade partnerships have been established with direct consideration of extreme weather stress and yield synchrony among countries, our investigation of historical records for 2005–2014 revealed significant relationships between these factors and the wheat trade network. These relationships have strong implications for the resilience of wheat trade networks and food supply for countries around the world.

The significant positive associations between trade linkages and synchronous yield fluctuations indicate that, in the current wheat trade network, trade partnerships tend to be established between countries with more synchronized yields. The current network is therefore potentially vulnerable to synchronous yield failure. A recent study projected that synchronous failure in wheat yield could lead to an approximately 33% reduction in wheat production46, which is equivalent to half of the projected wheat consumption by 205047. Hence, synchronous yield failure could dramatically disrupt wheat supply for individual countries and the world and consequently threaten food security, especially in low- and middle-income countries that depend on imports1,11.

A higher likelihood of trade partnerships and higher import volumes are associated with larger differences in extreme weather stress when importing countries have higher stress than their exporting partners. This finding indicates the role of international trade in alleviating long-term production stress caused by frequent extreme weather. Under the potentially higher weather stress due to future climate change, the wheat trade network is expected to change, and increases in import linkages are likely, particularly in tropical countries that are already import-dependent. Consequently, whether these countries can establish additional trade linkages in time to offset climate-change-induced stress on domestic production and stabilize their wheat supply is critical for their food security.

Our analysis demonstrates the need to consider extreme weather stress and yield synchrony when designing trade agreements and establishing trade partnerships. Countries have chosen their trade partners mainly based on economic benefits (e.g., duty- and quota-free access to the international market) or geopolitical relations, while few consider extreme weather stress and yield variation in potential partner countries. Although such partnerships are economically beneficial for the countries involved, they may make some countries highly dependent on wheat from a few countries or regions and potentially expose them to a higher risk of food insecurity when trade partners face synchronous yield losses. For instance, Ukraine’s substantial (~77%) wheat production losses in 2007 compared to 2006 forced its trade partners to buy wheat from other countries (e.g., Australia, France, the USA, Russia, Kazakhstan) and to relax import barriers to meet their domestic food demand48. Such rapid changes in trade partners within a year strain not only the major exporters, through a reduced stock-to-use ratio and doubled export volume in a short period, but also importers that have relied on supply from Ukraine or the global market, potentially contributing to spikes in wheat prices16,48. Importers in least-developed countries are particularly vulnerable to such rapid changes in the global market, as they usually have less political and economic capacity to establish new partnerships or to adapt to higher market prices. Therefore, the economic benefits and abundant food supply brought by trade partnerships can be compromised by extreme weather stress and synchronized yield failure.
To mitigate these potential negative outcomes, countries and multilateral trade agreements need to carefully consider their portfolios of trade partners by 1) identifying whether their trade partners have synchronized yield patterns and 2) determining the level of extreme weather stress in each partner and the impact of climate change. To ensure a more stable crop supply, a country or trade agreement should favor, in addition to economic benefits, trade partners with asynchronous yield patterns and less extreme weather stress. In addition, examining yield synchrony and extreme weather stress in existing partnerships will inform the design of mitigation measures, such as crop reserves, to minimize disruptions from synchronized yield failure.

We tested two models, ERGM and RF, for investigating trade linkages and trade volume. The goal of testing both models is to assess linear (via ERGM) and nonlinear (via RF) relationships between trade networks and their drivers. While the two modeling approaches provide similar results on the general relationships between the trade networks and the covariates, ERGM performed better for the unweighted network, and RF performed better for the network weighted by trade volume (Tables 1 and 2). The better cross-validation performance of ERGM for the unweighted network, but not for the weighted network, likely reflects the relative simplicity, and thus robustness, of the model in capturing presence/absence information, together with its lack of flexibility in modeling actual trade volumes. In contrast, RF automatically captures nonlinear relationships but tends to overfit the data; hence, its results can be less robust, as in the case of the unweighted network.

RF performed better than ERGM on the weighted network in cross-validation, which implies that the nonlinear relationships captured by RF were important for predicting trade volume. The difference in errors between the models is substantial; nevertheless, although RF has lower errors, it still needs improvement to provide robust predictions of trade volume (see Table 2 and Supplementary Figs. 5 and 6). The correlation between observed and predicted percentage contributions of each country to global export volume is 0.91 (Supplementary Fig. 34), but the correlation between observed and predicted trade volumes is weaker (Supplementary Fig. 35). Performance could potentially be improved by including other commonly used drivers in the modeling process, accounting for the impact of physical limits of crop production on trade volumes, and adjusting for imbalanced distributions (zero-inflated, in the case of a numeric response such as trade volume). Substantial progress has been made in RF classification for imbalanced datasets49,50,51,52, but methods for RF regression with zero-inflated data are still under development53. Even with the current limitations and uncertainties in ERGM and RF, the models establish a reliable general relationship between trade networks (with and without trade-volume weights) and both extreme weather stress and yield synchrony. Further research is needed to improve the modeling and prediction of trade volume in the context of climate change by incorporating the production and demand levels of both importing and exporting countries.
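The gap between the two correlations above (0.91 for export shares versus weaker agreement for pair-level volumes) arises because export shares aggregate over importers, so errors in which partner receives the wheat cancel out. The sketch below illustrates this with hypothetical data and helper names. It assumes shares are computed from each exporter’s total export volume and that an exporter missing on one side gets a share of zero.

```python
from math import sqrt
from typing import Dict, List, Tuple

Flow = Tuple[str, str, float]  # (exporter, importer, volume)


def export_shares(flows: List[Flow]) -> Dict[str, float]:
    """Each exporter's percentage of total export volume."""
    totals: Dict[str, float] = {}
    for exporter, _importer, volume in flows:
        totals[exporter] = totals.get(exporter, 0.0) + volume
    grand_total = sum(totals.values())
    return {c: 100.0 * v / grand_total for c, v in totals.items()}


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def share_correlation(observed: List[Flow], predicted: List[Flow]) -> float:
    obs, pred = export_shares(observed), export_shares(predicted)
    countries = sorted(set(obs) | set(pred))  # exporters missing on one side get share 0
    return pearson([obs.get(c, 0.0) for c in countries],
                   [pred.get(c, 0.0) for c in countries])


# Hypothetical: the prediction sends A's exports to the wrong importer,
# yet every exporter's total, and hence its share, is unchanged.
observed = [("A", "B", 60.0), ("A", "C", 20.0), ("C", "B", 20.0), ("B", "C", 0.0)]
predicted = [("A", "C", 80.0), ("C", "B", 20.0)]
print(share_correlation(observed, predicted))
```

Here the share correlation is 1 even though the pair-level volumes for A disagree. This is the same kind of gap reported between Supplementary Figs. 34 and 35.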

Conclusions

Our analysis suggests that two factors, the level of extreme weather stress and the synchrony of crop yield fluctuations, are substantially associated with the international wheat trade network. Country pairs with larger differences in heat stress are more likely to have trade connections and higher trade volumes. Meanwhile, in the current wheat trade network, trade partnerships are more likely to be established between countries with synchronized yield fluctuations. This represents a systemic risk in the current global wheat market, since synchronized yield failure can disrupt wheat supply and intensify food insecurity for both partner countries. Other fundamental drivers in the analysis (i.e., production level and economic and geopolitical factors) show significant relationships with trade links and volumes, further confirming their importance in the international trade network. Our results demonstrate the need to consider extreme weather stress and yield synchrony in the trade policy framework to improve the stability and fairness of the global food system.

Methods

Weather, wheat yield, and trade data

Major datasets used in this study include daily weather records54, annual wheat yield55, annual wheat production55, and bilateral trade volume15 for 2005–2014. To assess crop-specific weather conditions for countries around the world, we used global maps of crop calendars56 and harvested area57. Other datasets used in this study include the distance between countries58, their official languages59, contiguity58, gross domestic product (GDP)60, GDP per capita60, General Agreement on Tariffs and Trade (GATT)/World Trade Organization (WTO) memberships, and regional trade agreements (RTA)61,62 (Supplementary Table 1); these were included to capture factors considered important for trade connections. Countries with at least seven years of yield data in the analyzed period were selected for this study (115 countries; see Supplementary Note 1). All data used in this study are publicly available from the corresponding sources.

Import dependency ratio

The import dependency ratio (IDR) for each country was calculated as the ratio of net imports (i.e., imports − exports, in kg) to the sum of total crop production (kg) and net imports (kg):

$$IDR=\frac{net\,imports}{production+net\,imports}.$$
(1)
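As a worked example of Eq. (1), consider a country that produces 50 units, imports 60, and exports 10. Its net imports are 50, so IDR = 50/(50 + 50) = 0.5, exactly at the 50% threshold used in Results. A net exporter has negative net imports and therefore a negative IDR. The following minimal sketch uses a hypothetical function name, and the guard against non-positive supply is an addition for illustration.

```python
def import_dependency_ratio(imports_kg: float, exports_kg: float, production_kg: float) -> float:
    """IDR = net imports / (production + net imports), following Eq. (1)."""
    net_imports = imports_kg - exports_kg
    domestic_supply = production_kg + net_imports
    if domestic_supply <= 0:  # added guard: the ratio is undefined without positive supply
        raise ValueError("production + net imports must be positive")
    return net_imports / domestic_supply


print(import_dependency_ratio(60.0, 10.0, 50.0))   # 0.5: half of supply comes from net imports
print(import_dependency_ratio(5.0, 45.0, 100.0))   # about -0.667: a net exporter
```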

Network-based covariates for extreme weather stress

To assess extreme weather stress in wheat-producing countries and how it differs between countries, we developed the Difference in Extreme Weather Stress (DEWS) covariate in the four steps outlined below. DEWS was used to address the hypothesis that trade is more likely between a country with low weather stress (capable of producing high yields owing to good weather) and a country with high weather stress (willing to buy because of yield losses due to bad weather), i.e., in cases with a large DEWS magnitude. If both trade partners experience similarly good or bad weather for growing the crop, DEWS is close to zero.

First, we comprehensively characterized local agricultural weather conditions. We calculated 17 agricultural weather indices63 (Supplementary Table 2) for each spatial grid cell during the wheat growing season, as defined by the crop calendar, in each year from 2005 to 2014 (Supplementary Algorithm 1 and Supplementary Note 2). The weather indices represent extreme weather stress because they quantify temperature and precipitation conditions outside the optimal growing conditions (i.e., comfort zone) of wheat63. Second, we aggregated the weather indices over relevant areas: the gridded maps from the first step were aggregated for each country with spatial grid weights proportional to the wheat harvested area, so that grid cells without wheat production were disregarded (Supplementary Algorithm 1, Supplementary Note 2, and Supplementary Fig. 36). This resulted in 17 weather indices for 115 countries for 10 years. Third, we reduced the dimensionality of the weather data to provide a combined assessment of weather conditions and to avoid potential multicollinearity and overfitting in later modeling. In the principal component analysis (PCA) of the extreme weather indices (see Supplementary Note 4), the first two principal components (PCs) account for 87.7% of the variation and represent cold and heat extreme weather stress, respectively, in a country’s wheat-producing region (Supplementary Fig. 2). Fourth, we defined DEWS for cold stress (DEWScold) and for heat stress (DEWSheat) as the difference in the cold stress PC (or heat stress PC) between two countries, where DEWS < 0 indicates higher stress in the importing country and DEWS > 0 indicates higher stress in the exporting country.
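Steps two to four can be sketched with NumPy. This is a minimal illustration under several assumptions that are not taken from the paper: the function names are hypothetical; PCA is done by SVD on centered data, with standardization optional and off by default, since magnitudes such as DEWScold > 3000 in Results suggest scores on the original index scale; PC signs, which are arbitrary in SVD, are oriented by a hypothetical rule that makes the loadings sum to a positive number; and how yearly scores are combined per country is left to the caller.

```python
import numpy as np


def area_weighted_mean(index_grid, harvested_area) -> float:
    """Step 2: country-level index as a harvested-area-weighted mean over grid cells.
    Cells with zero wheat area get zero weight, i.e. they are ignored."""
    values = np.asarray(index_grid, dtype=float)
    weights = np.asarray(harvested_area, dtype=float)
    return float(np.sum(weights * values) / np.sum(weights))


def stress_pcs(indices, n_components=2, standardize=False):
    """Step 3: PCA of an (observations x indices) matrix via SVD.
    Returns the first n_components scores and their shares of total variance."""
    x = np.asarray(indices, dtype=float)
    x = x - x.mean(axis=0)                  # center each index
    if standardize:                         # optional: put indices on a common scale
        x = x / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # SVD signs are arbitrary; orient each component so its loadings sum to a
    # positive number (hypothetical convention for "larger score = more stress").
    signs = np.sign(vt[:n_components].sum(axis=1))
    signs[signs == 0] = 1.0
    scores = u[:, :n_components] * s[:n_components] * signs
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def dews(stress):
    """Step 4: D[i, j] = stress[i] - stress[j] for exporter i and importer j.
    D < 0: the importer is under more stress; D > 0: the exporter is."""
    s = np.asarray(stress, dtype=float)
    return s[:, None] - s[None, :]


rng = np.random.default_rng(0)
weather = rng.gamma(2.0, 100.0, size=(40, 17))   # synthetic: 40 observations x 17 indices
scores, explained = stress_pcs(weather)
print(dews(scores[:4, 0]).shape)                 # (4, 4): DEWS for 4 illustrative countries
```

The DEWS matrix is antisymmetric by construction, so the pair (i, j) and the reversed pair (j, i) carry the same information with opposite signs. This is why the direction of trade flow matters when reading its sign.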

Network-based covariate for yield synchrony

To assess associations between the synchrony of crop yield fluctuations and trade, we developed the short-term synchrony (STS) covariate, defined as the correlation of yield fluctuations between countries. The yield fluctuations for each country were obtained by removing the trend from its yield time series, where the trend was estimated using locally weighted regression (see Supplementary Note 5 and Supplementary Fig. 3).
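The following pure-Python sketch illustrates STS under stated assumptions. The trend is a single-pass LOWESS: a local linear fit with tricube weights and no robustness iterations. The smoothing span `frac=2/3` is hypothetical, because the paper’s setting is in Supplementary Note 5. Fluctuations are taken as additive residuals, and both series are assumed to cover the same years. A library smoother such as `lowess` in statsmodels could replace `lowess_trend`.

```python
from math import ceil, sqrt
from typing import List, Sequence


def lowess_trend(years: Sequence[float], y: Sequence[float], frac: float = 2 / 3) -> List[float]:
    """Single-pass LOWESS: weighted local linear fit at each year (tricube weights)."""
    k = max(2, ceil(frac * len(years)))     # neighbours that set the bandwidth
    trend = []
    for x0 in years:
        dist = [abs(x - x0) for x in years]
        h = sorted(dist)[k - 1] or 1.0      # bandwidth: distance to k-th nearest year
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, years)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, years))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, years, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        trend.append(ym + slope * (x0 - xm))
    return trend


def fluctuations(years, y, frac=2 / 3) -> List[float]:
    """Additive yield anomalies: observed yield minus local trend (an assumption)."""
    return [yi - ti for yi, ti in zip(y, lowess_trend(years, y, frac))]


def pearson(a: List[float], b: List[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((z - mb) ** 2 for z in b))


def sts(years, y1, y2, frac=2 / 3) -> float:
    """Short-term synchrony: correlation of the two countries' detrended yields."""
    return pearson(fluctuations(years, y1, frac), fluctuations(years, y2, frac))


years = list(range(2005, 2015))
shock = [0.3, -0.2, 0.1, -0.4, 0.5, -0.1, 0.2, -0.3, 0.0, 0.25]   # synthetic shared shocks
y_a = [3.0 + 0.05 * (t - 2005) + e for t, e in zip(years, shock)]
y_b = [2.0 + 0.02 * (t - 2005) + e for t, e in zip(years, shock)]
y_c = [2.0 + 0.02 * (t - 2005) - e for t, e in zip(years, shock)]
print(round(sts(years, y_a, y_b), 6), round(sts(years, y_a, y_c), 6))
```

A local linear smoother reproduces straight-line trends exactly, so different linear trends do not affect STS. The synthetic pairs therefore come out as fully synchronous (about 1) and fully asynchronous (about −1).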

Trade networks

Based on the bilateral wheat trade flows during 2005–2014, two types of trade networks were developed and examined in this study: unweighted directed and weighted directed networks. Both networks consist of nodes, representing countries, and edges, representing trade connections between countries. The edges are directed, indicating the direction of trade flow from one node to another. The unweighted directed network assigns no weights to the edges, whereas the weighted directed network assigns each edge a weight based on the trade volume (measured in kg). Between 2005 and 2014, the international trade network included 115 nodes (i.e., countries) and 5726 directed edges, giving an average in-degree and out-degree of about 50 (5726/115 ≈ 49.8) (Supplementary Figs. 7, 30, and 31).
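The two network representations can be sketched as adjacency matrices built from bilateral flow records. The country codes and volumes below are hypothetical, and summing repeated records for the same exporter–importer pair is an assumption about how multiple records would be combined. In any directed network the mean in-degree equals the mean out-degree, because every edge adds one to exactly one in-degree and one out-degree.

```python
import numpy as np

def build_networks(flows, countries):
    """Build adjacency matrices from (exporter, importer, volume_kg) records.

    Returns (A, W): A[i, j] = 1 if country i exports to country j
    (unweighted directed network); W[i, j] = total kg from i to j
    (weighted directed network)."""
    idx = {c: i for i, c in enumerate(countries)}
    W = np.zeros((len(countries), len(countries)))
    for exporter, importer, kg in flows:
        if exporter != importer:           # no self-loops in a trade network
            W[idx[exporter], idx[importer]] += kg  # repeated records summed (assumption)
    A = (W > 0).astype(int)
    return A, W

def average_degree(A):
    """Mean in-degree, equal to mean out-degree: edges / nodes."""
    return A.sum() / A.shape[0]

# Hypothetical flows in kg
flows = [("RUS", "EGY", 5.0e9), ("RUS", "TUR", 2.0e9), ("USA", "EGY", 1.0e9)]
A, W = build_networks(flows, ["EGY", "RUS", "TUR", "USA"])
print(A.sum(), average_degree(A))  # 3 edges, average degree 0.75
```

Applied to the study network, the same ratio gives 5726 / 115 ≈ 49.8.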

Exponential random graph model

The exponential random graph model (ERGM) was chosen as the primary statistical approach for network modeling. ERGM is an exponential-family probability model for an entire network Y, used to predict edges (or weights on edges) between nodes64:

$$\Pr_{\theta,\mathscr{Y}}(Y=y)=\frac{\exp\{\theta^{\top}g(y,X)\}}{\kappa(\theta,\mathscr{Y})},\quad y\in\mathscr{Y},$$
(2)

where \(\theta \in \mathbb{R}^{p}\) is the vector of model coefficients, \(g(y,X)\) is the vector of \(p\) statistics computed from the network \(y\) and the covariates \(X\), \(\kappa(\theta,\mathscr{Y})\) is the normalizing factor that makes the probabilities over \(\mathscr{Y}\) sum to one, and \(\mathscr{Y}\) is the set of all possible networks for the given number of nodes.
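To make Eq. (2) concrete, the sketch below enumerates every directed network on three nodes (2^6 = 64 networks) and computes exact ERGM probabilities for a toy model with two statistics: the number of edges and the sum of a dyadic covariate over the edges (analogous to the `edges` and `edgecov` terms of the R ergm package). The covariate values and coefficients are made up and are not estimates from this study. For this dyad-independent toy model, the log-odds of each individual edge equals θ_edges + θ_cov·x_ij, which the usage example checks. Exact enumeration is infeasible for a 115-country network, which is why ERGM software relies on MCMC-based estimation instead.

```python
import itertools
import math
import numpy as np

def ergm_probabilities(theta, x):
    """Exact ERGM distribution over all directed networks on x.shape[0] nodes.

    Toy statistics g(y, X) = (number of edges, sum of x[i, j] over edges i -> j)."""
    n = x.shape[0]
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    unnormalized = {}
    for ties in itertools.product((0, 1), repeat=len(dyads)):
        g = np.array([sum(ties),
                      sum(t * x[i, j] for t, (i, j) in zip(ties, dyads))])
        unnormalized[ties] = math.exp(theta @ g)
    kappa = sum(unnormalized.values())   # normalizing factor over all networks
    return dyads, {y: w / kappa for y, w in unnormalized.items()}

def edge_log_odds(probs, k):
    """Marginal log-odds that dyad k (index into dyads) is an edge."""
    p = sum(pr for y, pr in probs.items() if y[k] == 1)
    return math.log(p / (1.0 - p))

theta = np.array([-1.0, 0.7])   # hypothetical (edges, covariate) coefficients
x = np.array([[0.0, 1.0, -0.5], [0.2, 0.0, 0.8], [-1.0, 0.3, 0.0]])  # hypothetical
dyads, probs = ergm_probabilities(theta, x)
i, j = dyads[0]
print(edge_log_odds(probs, 0), theta[0] + theta[1] * x[i, j])  # both about -0.3
```

Once dyad-dependent terms (e.g., reciprocity) enter the model, this simple per-edge log-odds identity no longer holds, and coefficients must be read as conditional effects.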

In this study, Y is the random network of which the observed wheat trade network y is one realization, and the covariates X include DEWS, STS, distance between countries, contiguity, common official language, GDP, GDP per capita, RTA, GATT/WTO membership, and difference in production. The ERGM was fitted to both trade networks (i.e., weighted directed and unweighted directed). See Supplementary Note 6 for details on the model specification in R software. In addition to the ERGM defined above, several alternative ERGM specifications were tested (see Supplementary Notes 15 and 18).

The sign and magnitude of the ERGM coefficient for each covariate (Tables 1 and 2) indicate how that covariate is associated with the likelihood of link formation in the trade network22. A significant positive coefficient indicates that higher values of the covariate are associated with a higher likelihood of link formation, whereas a significant negative coefficient indicates a lower likelihood. Statistical significance of the coefficients was assessed using a t-test at the 0.05 significance level (p-value < 0.05).
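The reading of a single coefficient can be illustrated with a small helper. It assumes an unweighted (binary) ERGM, where exp(θ) is the multiplicative change in the conditional odds of a link per unit increase in the covariate, and it computes a two-sided p-value for θ̂/SE using the normal approximation to the t distribution, which is reasonable when the number of dyads is large. The estimate and standard error in the usage line are hypothetical, not values from Tables 1 and 2.

```python
import math

def summarize_coefficient(estimate, std_error, alpha=0.05):
    """Test statistic, two-sided p-value (normal approximation), and odds ratio
    for one coefficient of a binary ERGM (assumed model type)."""
    stat = estimate / std_error
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # = 2 * (1 - Phi(|stat|))
    return {
        "stat": stat,
        "p_value": p_value,
        "odds_ratio": math.exp(estimate),   # conditional odds multiplier per unit
        "significant": p_value < alpha,
        "direction": "higher" if estimate > 0 else "lower",  # link likelihood
    }

# Hypothetical coefficient: negative and significant
print(summarize_coefficient(-0.4, 0.1))
```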

Random forest model

The random forest (RF) model65 was used as an alternative, nonparametric method for modeling the two trade networks. Because RF is one of the best-performing machine learning techniques, it serves as a valuable benchmark for validating the results. Compared with other popular machine learning models, such as neural networks, RF has fewer tuning parameters. Important tuning parameters in RF include the number of covariates examined at each split and the minimal size of terminal nodes, both of which have sensible defaults66. In contrast, neural networks rely on many more hyperparameters, many of which have no default values and must be set by the user (for example, the number of layers and of units in each layer, dropout rates, and the size and number of filters in convolutional layers)67. We chose RF for its competitive performance, ease of specification, and rising popularity in agricultural studies68,69,70.
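To show what the two tuning parameters control, the sketch below implements a bare-bones random forest regressor in NumPy: `mtry` is the number of covariates examined at each split and `min_leaf` is a minimal terminal-node size. The defaults (mtry = max(1, floor(p/3)) and a minimum node size of 5) are analogous to those of R’s randomForest package for regression; whether the study used these exact values is an assumption, and a real analysis would use an established library rather than this toy.

```python
import numpy as np

def build_tree(X, y, rng, mtry, min_leaf):
    """Grow a regression tree; a leaf is a float, a split is a tuple."""
    if len(y) < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())
    best = None                                   # (sse, feature, threshold)
    # Only a random subset of mtry covariates is examined at this split
    for j in rng.choice(X.shape[1], size=mtry, replace=False):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue                          # enforce minimal leaf size
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:
        return float(y.mean())
    _, j, t = best
    left = X[:, j] <= t
    return (j, t, build_tree(X[left], y[left], rng, mtry, min_leaf),
            build_tree(X[~left], y[~left], rng, mtry, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_forest(X, y, n_trees=100, mtry=None, min_leaf=5, seed=0):
    rng = np.random.default_rng(seed)
    mtry = mtry or max(1, X.shape[1] // 3)        # p/3 default (assumption)
    trees = []
    for _ in range(n_trees):
        boot = rng.integers(0, len(y), size=len(y))   # bootstrap sample
        trees.append(build_tree(X[boot], y[boot], rng, mtry, min_leaf))
    return trees

def predict_forest(trees, X):
    """Average the predictions of all trees."""
    return np.array([np.mean([predict_tree(tree, x) for tree in trees]) for x in X])

# Synthetic data: only covariate 0 affects the response
rng = np.random.default_rng(1)
X = rng.uniform(size=(150, 3))
y = 3.0 * (X[:, 0] > 0.5) + rng.normal(scale=0.1, size=150)
trees = fit_forest(X, y, n_trees=30)
print(predict_forest(trees, np.array([[0.9, 0.5, 0.5], [0.1, 0.5, 0.5]])))
```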

Unlike ERGM, RF is a machine learning tool that learns the relationship between the covariates and the response directly from the data and can capture nonlinear and interaction effects of several covariates automatically. RF is an ensemble of regression trees, where each tree is trained on a bootstrap sample of the original data and only a random subset of covariates is assessed at each split (see details of the RF implementation in Supplementary Note 7). The relationships captured by RF are examined through partial dependence plots, which show the marginal change in the predicted response as one covariate varies, averaged over the observed values of all other covariates. In addition to the partial dependence plots, we assessed the importance of each covariate as the mean decrease in the predictive accuracy of the regression trees when the values of that covariate are randomly permuted66. We further ranked the covariates by importance and assessed their statistical significance (at the 0.05 level) using the Altmann permutation algorithm with 100 permutations71.
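Both interpretation tools can be sketched for any fitted model exposed as a `predict(X)` function. The sketch computes partial dependence on a grid of covariate values and permutation importance as the increase in mean squared error after shuffling one covariate; this is a simplified whole-sample version of the out-of-bag importance used in random forests. To keep the example self-contained, the usage lines replace a fitted forest with a hypothetical known function that ignores the third covariate. The Altmann significance test, which refits the model on permuted responses, is not shown.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Average prediction with covariate j fixed at each grid value,
    all other covariates kept at their observed values."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(predict(Xv).mean())
    return np.array(out)

def permutation_importance(predict, X, y, rng, n_repeats=10):
    """Mean increase in MSE when one covariate's values are shuffled."""
    base = np.mean((y - predict(X)) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break link between j and y
            importance[j] += np.mean((y - predict(Xp)) ** 2) - base
    return importance / n_repeats

def predict(Z):
    """Hypothetical stand-in for a fitted model; ignores covariate 2."""
    return 2.0 * Z[:, 0] + Z[:, 1]

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = predict(X)
grid = np.linspace(0.0, 1.0, 5)
print(partial_dependence(predict, X, 0, grid))
print(permutation_importance(predict, X, y, rng))
```

Covariate 2 receives zero importance because shuffling it leaves the predictions unchanged, while covariate 0, which has the largest effect, receives the highest importance.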

Model assessment

Validation and benchmarking are essential for evaluating a model’s performance and comparing it with competing alternatives. We evaluated model performance through cross-validation. In each cross-validation run, five years of the observed period were randomly selected to train the models, and the remaining five years were used to obtain predictions and calculate the prediction errors (see details in Supplementary Note 8).
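The year-based split can be sketched as follows: each repetition draws five of the ten study years without replacement for training and holds out the other five for prediction. The function name and seed are illustrative; Supplementary Note 8 describes the actual procedure.

```python
import numpy as np

YEARS = np.arange(2005, 2015)   # observed period, 10 years

def year_splits(n_splits=100, n_train=5, seed=0):
    """Yield (train_years, test_years) pairs for repeated random splits."""
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        train = np.sort(rng.choice(YEARS, size=n_train, replace=False))
        test = np.setdiff1d(YEARS, train)   # remaining years for prediction
        yield train, test

train, test = next(year_splits())
print(train, test)
```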

Two types of errors were calculated in the cross-validation. For the models of the unweighted network, we used misclassification error (MCE):

$$MCE=1-{n}^{-1}(TP+TN),$$
(3)

where n is the sample size, and TP and TN are the counts of true positive and true negative classifications (correctly predicted present and absent trade links), respectively.
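Eq. (3) translates directly into code. Here `observed` and `predicted` are 0/1 indicators of whether a trade link exists for each dyad; how continuous model outputs are converted to 0/1 predictions (for example, by a 0.5 threshold) is not specified here and is left to the caller.

```python
import numpy as np

def misclassification_error(observed, predicted):
    """MCE = 1 - (TP + TN) / n for 0/1 link indicators."""
    observed = np.asarray(observed, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = np.sum(observed & predicted)      # links correctly predicted present
    tn = np.sum(~observed & ~predicted)    # links correctly predicted absent
    return 1.0 - (tp + tn) / observed.size

print(misclassification_error([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 2 of 5 wrong
```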

For the models of the weighted network, we calculated mixed error (ME):

$$ME={n}^{-1}{\sum }_{i=1}^{n}\frac{|O{T}_{i}-P{T}_{i}|}{1+|O{T}_{i}|},$$
(4)

where \(OT_i\) is the observed trade volume and \(PT_i\) is the predicted trade volume for the \(i\)-th trade connection. The cross-validation was repeated 100 times, and both types of errors were calculated in each repetition.
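Eq. (4) can be computed as below. The denominator 1 + |OT_i| explains the name "mixed": for a zero observed flow the term reduces to the absolute error, while for large flows it approaches the relative error. The trade volumes in the example are hypothetical.

```python
import numpy as np

def mixed_error(observed, predicted):
    """ME = mean(|OT - PT| / (1 + |OT|)) over trade connections."""
    ot = np.asarray(observed, dtype=float)
    pt = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(ot - pt) / (1.0 + np.abs(ot))))

# Terms: 2/1 (absolute error at zero flow), 1/10 and 10/100 (about relative)
print(mixed_error([0, 9, 99], [2, 10, 89]))  # about 0.733
```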