Article | Published:

A dynamical systems approach to gross domestic product forecasting

Nature Physics volume 14, pages 861–865 (2018)

Abstract

Models developed for gross domestic product (GDP) growth forecasting tend to be extremely complex, relying on a large number of variables and parameters. Such complexity does not always benefit the accuracy of the forecast. Economic complexity constitutes a framework that builds on methods developed for the study of complex systems to construct approaches that are less demanding than standard macroeconomic ones in terms of data requirements, but whose accuracy remains to be systematically benchmarked. Here we develop a forecasting scheme whose accuracy exceeds that of the five-year forecast issued by the International Monetary Fund (IMF) by more than 25% on the available data. The model is based on effectively representing economic growth as a two-dimensional dynamical system, defined by GDP per capita and ‘fitness’, a variable computed using only publicly available product-level export data. We show that forecasting errors produced by the method are generally predictable and are also uncorrelated with IMF errors, suggesting that our method is extracting information that is complementary to standard approaches. We believe that our findings are of a very general nature and we plan to extend our validations to larger datasets in future work.

Main

In recent years a new approach to macroeconomic analysis and forecasting has been developed in the context of complex systems. This new framework, which goes under the name of economic complexity (EC), provides a concise description of global macroeconomic relations, technological trends and growth dynamics. In contrast with standard macroeconomics, EC is parsimonious in its data requirements, in terms of both quantity and diversity. EC leverages the ability of network algorithms to extract information from reliable and standardized datasets on global trade to build a metric of national industrial competitiveness, called economic fitness (EF). In this work we show how EF can be combined with techniques from dynamical systems to provide a novel approach to gross domestic product growth forecasting that is concise and reproducible, surpasses the accuracy of mainstream institutions such as the IMF, and provides a clearer understanding and interpretation of the dynamics of growth. Besides providing extensive validation, we show how this new approach is complementary to standard ones, and how its combination with mainstream models further improves accuracy. These results imply that further effort should be made to integrate EC into the standard framework, and the fact that such accuracy arises from the dynamics of low-dimensional systems can help us improve and rethink our vision of the forces that drive long-term growth.

Less data is more

Models developed for GDP growth forecasting are very complex. These typically make use of a large number of variables (up to hundreds of them), and for each of those there is at least one parameter to be fitted to data or assigned by assumption. Such variables range from socio-economic indicators (labour productivity, employment, schooling, population age, and so on) to financial indicators (fiscal policies, interest rates, public debt) and global trade indicators (raw material prices, trade openness, exchange rates), among others1,2,3. Having such complex models, with so many variables, gives the illusion of being able to capture all the components that drive economic growth and therefore of delivering the best possible forecast. However, this is in general not the case, for two reasons. First, it is in practice very hard to find the right ‘recipe’ to mix all these variables (that is, the right functional form and the right parameters) in order to obtain an accurate forecast of growth; there is scarcely a principle that can guide the choice of how to combine, for instance, schooling with raw material prices. Second, each additional input variable adds a dimension to the space over which one would like to fit a function (the model) that predicts economic growth, and the volume of that space grows exponentially with the number of dimensions4: this implies that our ability to systematically sample observations from this space is very limited.

In many cases, not only in economics, theoretical modelling and forecasting are only loosely related5. Most modelling efforts go in the direction of oversimplified representations that aim only at understanding the potential effects of a single variable, or of a limited set of them, in a controlled setting. Although these models may help us grasp many isolated implications, they are rarely used directly to predict the precise dynamics of complex systems such as, in the case of economics, countries’ growth or crises. On the other hand, the approaches explicitly developed for forecasting often depart from rigorous theoretical models, and are typically grounded in econometric or statistical techniques.

From a physicist’s perspective, such a situation can be mapped to the more general problem of forecasting the evolution of a dynamical system whose actual laws of motion we do not know. The state of the system, in the case of growth forecasting, is the GDP of the country, together with all the other socio-economic variables that are used in the model, and we suppose the dynamics to be governed by some kind of coupling among such variables. The only knowledge we have about the system is a collection of previously observed states, together with their evolution after a fixed time delay (possibly only along the GDP direction, if we do not know the evolution of the other indicators). In such a situation, to predict the evolution of a new and previously unseen state, the most basic approach would be to look for the most similar state whose evolution we know from past data (an analogue) and use that evolution as a prediction. This approach goes under the name of ‘method of analogues’, and has a long history of successful and failed applications6,7,8. It requires only a rule to choose the analogues and a procedure to extract information from them.
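The following Python sketch shows the method of analogues in its most basic form. It forecasts a new state by applying the displacement observed for its single nearest past state. The function name, the tuple-based data layout and the choice of Euclidean distance are illustrative assumptions. This is not the scheme used later in this paper, which samples many weighted analogues.

```python
import math

def nearest_analogue_forecast(history, state):
    """Basic method-of-analogues forecast using a single nearest neighbour.

    history: list of (past_state, displacement) pairs, where displacement is the
             observed change of that past state over the fixed time delay.
    state:   the new state to forecast, as a tuple of coordinates.
    Returns the predicted future state: the current state moved by the
    displacement of its closest analogue (Euclidean distance).
    """
    if not history:
        raise ValueError("no analogues available")
    _, best_disp = min(history, key=lambda pair: math.dist(pair[0], state))
    return tuple(s + d for s, d in zip(state, best_disp))


history = [((0.0, 0.0), (1.0, 1.0)),    # a past state and how it evolved
           ((5.0, 5.0), (-1.0, 0.0))]
print(nearest_analogue_forecast(history, (0.2, 0.1)))  # closest analogue: (0, 0)
```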

In light of basic considerations about dynamical systems, such as those discussed in ref. 4, it is easy to see that we can only hope that such an approach succeeds (that is, that we are able to identify ‘close enough’ analogues) if the effective dimensionality of the phase space of the dynamics we want to describe is very low: in other words, if the system is likely to be found in a small volume of the phase space, an attractor, so that we can effectively sample it. From this perspective, adding more variables to a forecasting problem is often detrimental to the actual quality of the prediction, as it diminishes our ability to find relevant analogues.
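A small numerical experiment makes the dimensionality argument concrete. The sketch rests on simplifying assumptions: states are uniformly distributed, the number of observations is fixed, and distance is Euclidean. Under these assumptions, the nearest available ‘analogue’ of a new state moves rapidly farther away as the number of dimensions grows.

```python
import math
import random

def mean_nearest_neighbour_distance(dim, n_points=500, n_queries=50, seed=0):
    """Average distance from a random query to its nearest 'analogue'.

    Observations and queries are drawn uniformly from the unit hypercube
    [0, 1]^dim. The number of observations is held fixed, so this distance
    grows quickly with dim: close analogues become scarce in high dimensions.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for _ in range(n_queries):
        query = [rng.random() for _ in range(dim)]
        total += min(math.dist(query, p) for p in points)
    return total / n_queries


for d in (2, 5, 20):
    print(d, round(mean_nearest_neighbour_distance(d), 3))
```

With the same 500 observations, the average nearest-neighbour distance grows by more than an order of magnitude between 2 and 20 dimensions.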

Economic growth as a low-dimensional dynamical system

In this work we demonstrate how the dimensionality problem can be solved in the context of GDP forecasting by completely rethinking the process of data selection. In particular we expand on a series of recent works in the field of economic complexity (EC)9,10,11,12,13,14,15, and bring this conceptual framework to the level of the state of the art of GDP forecasting, obtaining a substantial improvement in accuracy over the current International Monetary Fund (IMF) five-year forecasts3. Such results are obtained over the largest publicly available historical dataset released by the IMF, which covers three five-year windows, from 2008–2013 to 2010–2015 (details on the exact specification of the data used are reported in the Methods). We forecast the GDP per capita at purchasing power parity (PPP) in constant dollars.

The EF metric, or fitness, introduced in 2012 (ref. 10), provides an effective and extremely synthetic way of quantifying the competitiveness of a country’s economy, and does so by considering only export data. EF is not a simple statistic of exports: it extracts information on how complex the productive structure of a country is, accounting for the complexity of the produced goods in an algorithmic way (see Methods for details) and using revealed comparative advantage16 as a proxy for competitiveness. Export data at the national level is publicly available through the COMTRADE database, which provides bilateral declarations of export flows of single products, categorized according to the Harmonized System international classification and aggregated on a yearly basis, for more than 20 years. This data offers three main advantages.

The first is syntheticity. Exported products are in principle a very good proxy of competitiveness. To export a product a country has to compete in the global market, and being a relevant player there is a much stronger signal of competitiveness than serving internal demand. In other words, if an industry can compete only in its own country’s internal market, there is no explicit sign of global competitiveness that we can use to infer the presence of a competitive mix of endowments. Moreover, looking at product-level variables gives a perspective that is much closer to the actual ability of the country to produce wealth. Instead of inferring this ability through more indirect indicators (the endowments), looking at products provides direct information on how such endowments interact to produce material wealth.

The second is standardization. The product classification is global, and for most of the trade flows two declarations of the total values are available: one from the importer and one from the exporter. This allows a great reduction in fluctuations and increases data quality.

The third is homogeneity. Export data does not suffer from many of the most problematic limitations of other macroeconomic observables. Because it is collected at the international level, it is strongly homogeneous in units, gathering methodology, frequency and data availability, both geographically and through time. This allows for much easier and more transparent standardization and regularization procedures on raw data.

We leverage the advantages of export data in two ways: first we develop a state-of-the-art workflow for data sanitation and regularization (see Methods section); then on this high-quality dataset, we compute the countries’ fitness, which is a very effective scalar indicator that synthetically describes the industrial competitiveness of a country.
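Revealed comparative advantage, the proxy for competitiveness mentioned above, is commonly turned into a binary ‘competitive exporter’ flag using Balassa’s RCA index with a threshold of 1. The sketch below assumes exactly that. The procedure actually used in this work adds data sanitation and regularization steps described in the Supplementary Information, so read this only as an illustration of the RCA idea. The function name and the list-of-lists input are hypothetical.

```python
def binary_export_matrix(X, threshold=1.0):
    """Binary country-product matrix from export values via Balassa's RCA.

    X[c][p] is the export value of country c in product p. The RCA of c in p is
    c's share of product p in its own exports, divided by p's share of world
    exports. M[c][p] = 1 when RCA >= threshold (1 by the usual convention).
    """
    n_c, n_p = len(X), len(X[0])
    country_tot = [sum(row) for row in X]
    product_tot = [sum(X[c][p] for c in range(n_c)) for p in range(n_p)]
    world_tot = sum(country_tot)
    if min(country_tot) <= 0 or min(product_tot) <= 0:
        raise ValueError("drop countries/products with zero total exports first")
    M = []
    for c in range(n_c):
        row = []
        for p in range(n_p):
            rca = (X[c][p] / country_tot[c]) / (product_tot[p] / world_tot)
            row.append(1 if rca >= threshold else 0)
        M.append(row)
    return M


# Country 0 specializes in product 0, country 1 in product 1.
print(binary_export_matrix([[10, 0], [5, 5]]))
```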

As has been shown in previous works1,15, fitness and GDP per capita define a bidimensional space where the dynamics of countries exhibits high levels of regularity. Data sanitation greatly expands the volume of this space in which the dynamics is regular (see Supplementary Information). Given the low dimensionality of this space, and the absence of an effective theoretical framework to describe the dynamics of economic growth in the GDP–fitness plane, the situation is ideal for developing a forecasting scheme based on the method of analogues.

We produce probabilistic five-year forecasts by repeatedly sampling analogues in the GDPpc–fitness space with a Gaussian kernel, centred on the present state of a country (we call this approach SPSb, see Methods section for details). This results in a distribution of possible outcomes (Fig. 1a). To further refine our forecast, and take into consideration the strong self-correlation of GDP growth17, we combine the forecast obtained from the SPSb distribution with the forecast that would be obtained by assuming a growth exactly equal to that of the past five years, as shown in Fig. 1b (see Methods for details). The resulting distribution is the velocity-SPS forecast.

Fig. 1: Graphic scheme of the SPS method and its combination with the past growth.

a, The Bootstrap Selective Predictability Scheme. To predict the evolution of a point (red) in the GDPpc–fitness plane we perform a bootstrap of previously observed evolutions, weighted by the distance of the analogues’ starting points (black) from the country. b, In the velocity-SPS approach we perform a weighted average of the forecast given by SPSb (black arrow) and the forecast corresponding to perfectly autocorrelated dynamics (red arrow).

To backtest these approaches we build 482 GDP growth forecasts on three five-year windows: 2008–2013, 2009–2014 and 2010–2015. The forecasts are built with a rigorous out-of-sample approach: all the data sanitation is performed using data only up to the beginning of the forecasting window, and the analogues are sampled among transitions observed up to the beginning of the forecasting window. We forecast the growth of all the countries for which we have fitness and GDP data in the corresponding time range and at least one analogue within a radius of one σ (see definition in the Methods section). As a benchmark of the state of the art, we compare our results with the historical forecasts released by the IMF for exactly the same set of countries and time windows3. The quality of the IMF forecast is debated in the economic literature18,19, but it is nevertheless often used as a valid reference20. These forecasts are the most authoritative publicly available global historical forecast data on a five-year horizon, which motivates our choice to use them as benchmarks. The choice of a five-year time horizon is a trade-off between two needs. On the one hand, the horizon should reach the medium–long term, over which we expect the GDP–fitness dynamics to be meaningful and stronger than fluctuations. On the other, benchmark forecasts from the IMF must be available. The 482 data points that we use to test the forecasting accuracy correspond to the maximum possible intersection between the forecasts released by the IMF and the 169 countries for which reliable export data is available on COMTRADE. The results are summarized in Table 1. We consider two error metrics over the forecasted percentage compound annual growth rate (%CAGR) of the countries: the mean absolute error (MAE) and the root mean square error (RMSE). The error is defined as

$$E = \% {\mathrm{CAGR}} - \% {\mathrm{CAGR}}_{\mathrm{forecasted}}$$

An error of 1% (one percentage point) means, for example, that the realized compound annual growth rate was 3% while the forecast was 2%.
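The error definition and the two metrics can be written down directly. This Python sketch assumes the standard definition of the compound annual growth rate over a five-year window, %CAGR = 100[(GDPpc_end / GDPpc_start)^(1/5) − 1]. The function names are illustrative.

```python
import math

def cagr_percent(gdp_start, gdp_end, years=5):
    """Compound annual growth rate, in percent, between two GDPpc values."""
    return 100.0 * ((gdp_end / gdp_start) ** (1.0 / years) - 1.0)

def forecast_errors(realized, forecasted):
    """E = %CAGR - %CAGR_forecasted for each country (percentage points)."""
    return [r - f for r, f in zip(realized, forecasted)]

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# A country whose GDPpc grows 3% per year for five years, forecast at 2%:
realized = cagr_percent(10_000, 10_000 * 1.03 ** 5)
errs = forecast_errors([realized], [2.0])
print(round(realized, 6), round(errs[0], 6))
```

RMSE penalizes large misses more heavily than MAE, which is why both metrics are reported.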

Table 1 Summary of the results
Table 2 Top 20 MAE enhancements combining IMF forecasting with velocity-SPS

We consider three sets of countries (all, predictable and unpredictable). This split emphasizes that the fitness of a country is a strong predictor of our ability to forecast its growth, and that this general feature holds not only for SPS forecasts but for the IMF as well. The predictable and unpredictable classifications are based solely on the fitness of the country at the beginning of the forecasting window. A country c is classified as predictable if \(\log (F_c) > -1.5\) and unpredictable otherwise. This distinction is taken from ref. 15, where the predictable and unpredictable regimes were identified. We also show the performance of combining each of the two SPS methods presented here with the IMF forecasts. This gives two combined forecasts, IMF–SPSb and IMF–velocity-SPS, each defined as the simple average of the two forecasted %CAGR values. That is, we do not average the errors, but rather compute the error made by an averaged prediction.
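Averaging predictions is not the same as averaging errors. For any realized value y and two forecasts a and b, |(a + b)/2 − y| ≤ (|a − y| + |b − y|)/2, with strict inequality when the two errors have opposite signs. The MAE of the averaged forecast therefore never exceeds the mean of the two individual MAEs, although it need not beat the better of the two. The sketch below uses hypothetical numbers, not data from Table 1.

```python
def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def averaged_forecast(forecast_a, forecast_b):
    """Country-by-country simple average of two forecasted %CAGR values."""
    return [(a + b) / 2.0 for a, b in zip(forecast_a, forecast_b)]

def errors(realized, forecast):
    """E = %CAGR - %CAGR_forecasted for each country."""
    return [r - f for r, f in zip(realized, forecast)]


# Hypothetical %CAGR values for three countries (not data from this paper).
realized = [3.0, 1.0, 4.0]
imf_like = [5.0, 0.0, 4.5]
sps_like = [1.0, 2.5, 3.0]

combined = averaged_forecast(imf_like, sps_like)
print(mae(errors(realized, imf_like)),
      mae(errors(realized, sps_like)),
      mae(errors(realized, combined)))
```

Here the two forecasts err in opposite directions for every country, so their average is much closer to the realized values than either one alone.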

We now comment on the main findings reported in Table 1. On average SPSb performs as well as the IMF models, despite being much simpler and easier to interpret, being built out of only two variables. Velocity-SPS represents an improvement in terms of MAE over state-of-the-art IMF forecasts. All the methods perform significantly better in the predictable regime than in the unpredictable regime. Surprisingly, this holds much more strongly for the IMF forecasts, which are fitness-unaware. Velocity-SPS is by far the best approach to forecast the growth of low-fitness countries, with a striking improvement over IMF forecasts in terms of MAE. In general, the IMF and SPS approaches seem to provide information that is to some extent orthogonal, and can be fruitfully combined. Globally, the combination of velocity-SPS and IMF brings an improvement in MAE over IMF alone. Similar results hold in the predictable regime as well, but not in the unpredictable regime. There, possibly because of the very large IMF error, velocity-SPS alone is the best-performing strategy in terms of both MAE and RMSE. The combination of velocity-SPS and IMF could easily be improved by taking into account the confidence intervals of the IMF’s projections.

The SPS approach not only provides state-of-the-art GDP forecasting within a minimal framework, but is also able to provide a consistent estimate of the forecasting error for each specific forecast. To demonstrate the consistency of the error estimates, we first approximate all the forecast GDP probability distributions as Gaussian distributions, by simply computing the mean and standard deviation of the centres of mass of the bootstrapped samples (see Methods). We then standardize the corresponding realized growth \(G_{c,t}\) by computing

$$\tilde G_{c,t} = \frac{{E\left( {G_{c,t}} \right) - G_{c,t}}}{{\sigma _{c,t}}}$$

where \(E(G_{c,t})\) is the expected growth of country c over the window from t to t + Δt under the Gaussian approximation and \(\sigma_{c,t}\) is the standard deviation of that Gaussian. If the \(G_{c,t}\) are distributed as predicted by the SPS Gaussians, then we expect \(\tilde G_{c,t}\sim N(0,1)\). In Fig. 2a we compare N(0, 1) with the empirical distribution of \(\tilde G_{c,t}\), obtained again from strictly out-of-sample forecasts in the three five-year time windows discussed above. In Fig. 2b we show how the uncertainty varies with the fitness of the country: in general, uncertainty is lower for developed countries and larger for countries with low fitness. Interestingly, there seems to be an increase in uncertainty for countries close to the transition from the unpredictable to the predictable regime, at log(F) = −1.5. This is in line with findings of bifurcation-like behaviour in the dynamics of countries close to development barriers, which can make the difference between transitioning to a developed economy and remaining stuck in a ‘poverty trap’21.
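The calibration check described above can be sketched as follows. The residual function implements the standardization formula above. The synthetic data are hypothetical: because realized growth is drawn from the forecast Gaussians themselves, the forecasts are calibrated by construction. The check therefore only shows what a well-calibrated result looks like, namely standardized residuals with mean close to 0 and standard deviation close to 1.

```python
import random
import statistics

def standardized_residuals(expected, realized, sigma):
    """(E(G) - G) / sigma for each forecast."""
    return [(e - g) / s for e, g, s in zip(expected, realized, sigma)]


# Synthetic, hypothetical forecasts whose realized values are drawn from the
# forecast Gaussians themselves, so the forecasts are calibrated by construction.
rng = random.Random(42)
n = 20_000
expected = [rng.uniform(-2.0, 6.0) for _ in range(n)]
sigma = [rng.uniform(0.5, 3.0) for _ in range(n)]
realized = [rng.gauss(mu, s) for mu, s in zip(expected, sigma)]

z = standardized_residuals(expected, realized, sigma)
print(round(statistics.mean(z), 3), round(statistics.stdev(z), 3))
```

With real out-of-sample forecasts, a histogram of z compared with N(0, 1) plays the role of Fig. 2a.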

Fig. 2: Properties of the expected forecasting errors.

a, Empirical distribution of \(\tilde G_{c,t}\) for the SPSb method (yellow bars) compared with the expected N(0, 1) distribution (blue line). b, The expected forecasting error decreases as fitness increases. The forecasting error is calculated as the standard deviation of the average growth realized by the bootstrapped analogues. Dashed lines mark ±2σ confidence intervals.

With these new techniques we are able to easily interpret the situation of a large set of countries, providing sharp, quantitative answers to many heavily debated questions. In Fig. 3 we show the velocity-SPS forecasts for 2015–2020 for three emblematic countries: China, Brazil and Tanzania. In 2015 Brazil’s GDPpc was still slightly above China’s. By standard definitions both would be middle-income economies, in danger of becoming stuck in the so-called ‘middle income trap’22. However, even before running quantitative estimates, the situations of the two countries already appear radically different when projected onto the GDPpc–fitness plane. China seems a radical historical outlier, with two consecutive decades of steady growth; in fact, its trajectory closely resembles that of Japan after the Second World War. Brazil, by contrast, sits in the middle of a much more crowded part of the space. It is in that crowded region, where countries like Russia and South Africa are also currently placed, that the idea of the ‘middle income trap’ was defined. But by adding only one more dimension, the fitness, we are able to separate the scenarios of China and Brazil extremely well. Over the past 20 years, and even today, opinions on how long China’s fast growth would last have often been pessimistic and generally unclear. From Fig. 3 the situation is clear. China has no analogues in the plane, so this method is not optimal for a quantitative forecast. Nevertheless, its GDP per capita is still much lower than that of other countries with the same fitness. It should therefore be looking at several more years of fast growth, not only in terms of GDP but in industrial competitiveness as well. A country like Tanzania, instead, is in a much more uncertain situation. Its fitness has been fluctuating, and the uncertainty on where it will be in five years is much higher. That uncertainty is also reflected in the GDP growth estimate, which is in general less precise for low-fitness countries.

Fig. 3: Dynamics and forecasts in the GDP–fitness plane.

We show three highlighted examples of dynamics during 1995–2015 for China, Brazil and Tanzania (coloured lines) among all countries (grey lines). Coloured regions represent the probability distributions estimated with velocity-SPS for 2015–2020 evolution (red: high probability, blue: low probability).

In Table 2 we show the top 20 average improvements in terms of MAE of the combined model over the IMF forecasts, averaged over the three windows for which we have conducted our tests. Developing countries clearly predominate. This is in line with the idea that fitness describes structural industrial properties, which are the main drivers of growth in developing economies. On the other hand, fitness is less effective in capturing financial and monetary effects, which are better described by the IMF models. We believe that this complementarity is the key to understanding the improved accuracy of the combined SPS + IMF model.

Extensive tests of the statistical robustness of these results are presented in Supplementary Figs. 3 and 4.

Outlook

In recent years we have witnessed a strong debate, both within the field and in the mainstream media, about the ability of current macroeconomic models to forecast growth and crises. Many have pointed out the need for a fundamental rethinking of economic modelling in the direction of a more scientific and less dogmatic approach. Economic complexity has been one of the attempts in this direction. Our finding that these low-dimensional representations are so effective in forecasting growth could have a strong impact on the general thinking about the driving forces of economic growth and on our understanding of the main features of economic development.

An important side note is that the effectiveness of our methods indicates that development paths tend to be similar across countries, with no specific temporal or geographical limitations. This kind of ‘universal’ behaviour invites the exploration of the scale invariance of the growth mechanisms that can be described by EC, especially towards smaller regional scales. Regional trade data are at the moment largely unavailable, and in general not sufficiently standardized. Even so, we believe that encouraging results at the country level such as those presented here can motivate their collection in the future, at least in some areas of the world, such as the European Union. Some preliminary results show nontrivial behaviours at the level of single firms11,23. All these observations imply that a deeper and more solid connection between EC and the discipline of macroeconomics in general is the clear next step to pursue. As we have shown, even a simple average of the two kinds of forecast is already effective in terms of accuracy.

Many of the ideas and methods from EC are currently being used in the field by large macroeconomic institutions such as the World Bank1. Nevertheless, we believe that there is tremendous room for improvement through a more formal and refined integration of EC and dynamical systems into standard macroeconomic practice. The gains would come both in accuracy and, more importantly, in our understanding of growth and development. The implications could stretch a long way, from a more effective approach to data gathering (for example, at the regional level) to a whole new way of designing development policies.

Methods

The fitness–complexity algorithm

The fitness–complexity algorithm (FC) has been introduced and explored in a recent series of papers13,14,24. It allows one to define a measure of countries’ industrial fitness and products’ complexity. The fitness of a country is defined as the sum of the complexities of the products of which that country is a competitive exporter. The complexity of products is defined self-consistently as a nonlinear function of the fitness of the countries that are competitive exporters of that product. The idea is to bound the complexity of a product by the fitness of the least-fit economy that is able to be a competitive exporter of it. In formulas, the FC algorithm is defined iteratively as

$$\left\{ \begin{array}{l}\tilde F_c^{(N + 1)} = \mathop {\sum}\limits_p {\kern 1pt} M_{cp}Q_p^{(N)}\cr \tilde Q_p^{(N + 1)} = \frac{1}{{\mathop {\sum}\limits_c \frac{{M_{cp}}}{{F_c^{(N)}}}}}\end{array} \right.$$
(1)

with a normalization step after each iteration:

$$\left\{ \begin{array}{l} F_c^{(N)} = \frac{\tilde F_c^{(N)}}{\left\langle \tilde F^{(N)} \right\rangle _c} \cr Q_p^{(N)} = \frac{\tilde Q_p^{(N)}}{\left\langle \tilde Q^{(N)} \right\rangle _p} \end{array} \right.$$
(2)

where \(M_{cp}\) is a binary matrix whose element is 1 if country c is a competitive exporter of product p and 0 otherwise. A precise definition of this matrix is crucial, as it is the only input that we use for our measures of competitiveness and forecasts of growth. The procedure that we use to compute it from export data is detailed in Supplementary Fig. 1, and the impact of this procedure is shown in Supplementary Fig. 2 and Supplementary Table 1. The FC iterations lead to a unique fixed point that does not depend on the initial conditions \(F_c^{(0)}\) and \(Q_p^{(0)}\), and whose convergence properties are extensively discussed in ref. 25. Extensive discussions about the motivation for the introduction of the FC algorithm and the properties of the fitness and complexity measures can be found in ref. 14.
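Equations (1) and (2) can be transcribed directly into plain Python, as in the sketch below. It assumes a small dense 0/1 matrix, uniform initial conditions and a fixed number of iterations instead of a convergence test. A production implementation would use sparse matrices and handle countries whose fitness tends to zero. The function name is hypothetical.

```python
def fitness_complexity(M, n_iter=100):
    """Iterate the fitness-complexity map of Eqs. (1) and (2).

    M is a binary country x product matrix (list of lists of 0/1). Every
    country must export at least one product and every product must have at
    least one exporter, otherwise the updates divide by zero.
    Returns (F, Q), each normalized to mean 1.
    """
    n_c, n_p = len(M), len(M[0])
    if any(sum(row) == 0 for row in M) or any(
        sum(M[c][p] for c in range(n_c)) == 0 for p in range(n_p)
    ):
        raise ValueError("remove empty rows/columns of M first")
    F = [1.0] * n_c  # initial conditions F_c^(0)
    Q = [1.0] * n_p  # initial conditions Q_p^(0)
    for _ in range(n_iter):
        # Eq. (1): both updates use the previous normalized F and Q.
        F_t = [sum(M[c][p] * Q[p] for p in range(n_p)) for c in range(n_c)]
        Q_t = [1.0 / sum(M[c][p] / F[c] for c in range(n_c)) for p in range(n_p)]
        # Eq. (2): divide by the mean over countries and over products.
        mean_F, mean_Q = sum(F_t) / n_c, sum(Q_t) / n_p
        F = [f / mean_F for f in F_t]
        Q = [q / mean_Q for q in Q_t]
    return F, Q


# Nested example: country 0 exports everything, country 2 only product 0.
M = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]
F, Q = fitness_complexity(M)
print(F, Q)
```

On this nested matrix, the sketch ranks country 0 as the fittest and product 2, exported only by the fittest country, as the most complex, as the definitions intend.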

SPSb and velocity-SPS

In order to forecast annualized five-year GDPpc growth, we develop a statistical framework to select and weight analogues in the log(fitness)–log(GDPpc) plane (FG plane). We denote by \({\mathbf{x}}_{c,t}\) the position of country c in the FG plane at time t and by \(\delta {\mathbf{x}}_{c,t}\) the displacement vector of country c from time t to t + Δt in the same plane. Throughout this paper Δt = 5 years.

To forecast \(\delta {\mathbf{x}}_{\tilde c,t^ \ast }\), that is, the evolution of country \(\tilde c\) from time t* to t* + Δt, we take as analogues the available data points \({\mathbf{x}}_{c,t}\) in the FG plane whose five-year evolution is already known at time t*. These are all the \(\delta {\mathbf{x}}_{c,t}\) with t + Δt ≤ t*, that is, t ≤ t* − Δt. We build a probability distribution for \(\delta {\mathbf{x}}_{\tilde c,t^ \ast }\) in two steps:

  1.

    We sample the set of analogues with a probability distribution given by a Gaussian kernel centred on \({\mathbf{x}}_{\tilde c,t^ \ast }\); that is, an analogue is sampled with probability

    $$p\left( {{\mathbf{x}}_{c,t}|{\mathbf{x}}_{\tilde c,t^ \ast }} \right) = \frac{1}{{\sigma \sqrt {2\uppi } }}{\mathrm{e}}^{ - \frac{{\left| {{\mathbf{x}}_{c,t} - {\mathbf{x}}_{\tilde c,t^ \ast }} \right|^2}}{{2\sigma ^2}}}$$
    (3)

    where σ = 0.5. Our definition of analogues therefore depends only on proximity in the FG plane, regardless of time or other variables. We sample N analogues with replacement, where N is the number of available analogues.

  2.

    We sample 1,000 batches with the above procedure (bootstrap). The global distribution of the sampled displacements is our probabilistic forecast of \(\delta {\mathbf{x}}_{\tilde c,t^ \ast }\). The mode of the distribution is used as our forecast value, and the standard deviation as the uncertainty on the forecast.

This method is described in Fig. 1a and we call it the bootstrapped selective predictability scheme (SPSb).
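The two steps of SPSb can be sketched in Python as follows. The data layout (a list of (t, x, dx) transitions in the log-fitness, log-GDPpc plane) and the function name are hypothetical. The following elements come from the text: σ = 0.5, 1,000 batches, the restriction to transitions completed before the forecast date, and the requirement of at least one analogue within one σ. The Gaussian weights are normalized over the available analogues by `random.choices`. Following the Gaussian approximation used in the main text, the sketch summarizes the forecast by the mean and standard deviation of the batch centres of mass along the GDPpc axis, rather than by the mode of the pooled distribution.

```python
import math
import random
import statistics

def spsb_forecast(transitions, x_now, t_now, dt=5, sigma=0.5,
                  n_batches=1000, seed=0):
    """Bootstrapped analogue forecast of the log-GDPpc displacement.

    transitions: list of (t, x, dx), where x = (log fitness, log GDPpc) of some
                 country in year t and dx is its observed displacement from t
                 to t + dt.
    x_now:       current position of the country being forecast.
    t_now:       year at which the forecast is issued.
    Returns (mean, std) of the batch-averaged GDPpc displacements.
    """
    # Out-of-sample rule: only transitions completed by t_now are analogues.
    analogues = [(x, dx) for t, x, dx in transitions if t + dt <= t_now]
    # Require at least one analogue within one sigma of the current position.
    if not any(math.dist(x, x_now) <= sigma for x, _ in analogues):
        raise ValueError("no analogue within one sigma of the current state")
    # Gaussian kernel weights centred on x_now; random.choices normalizes them.
    weights = [math.exp(-math.dist(x, x_now) ** 2 / (2 * sigma ** 2))
               for x, _ in analogues]
    rng = random.Random(seed)
    n = len(analogues)
    centres = []
    for _ in range(n_batches):
        # One bootstrap batch: n analogues drawn with replacement.
        batch = rng.choices(analogues, weights=weights, k=n)
        # Centre of mass of the batch along the GDPpc axis (index 1).
        centres.append(statistics.mean(dx[1] for _, dx in batch))
    return statistics.mean(centres), statistics.pstdev(centres)


transitions = [
    (2000, (0.0, 0.0), (0.0, 0.05)),
    (2001, (0.1, 0.0), (0.0, 0.05)),
    (2000, (5.0, 5.0), (0.0, -0.05)),   # far away: negligible weight
    (2012, (0.0, 0.0), (0.0, 1.00)),    # ends in 2017: not yet observable
]
print(spsb_forecast(transitions, (0.0, 0.0), t_now=2015))
```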

GDP growth tends to be strongly self-correlated. To take this into account, we develop a different version of the SPSb approach that combines the bootstrapping of analogues with the recent GDP growth of the country. We name this approach velocity-SPS. To do so, we perform two forecasts: the SPSb forecast as described above, and a naive forecast in which we predict the country to grow exactly as much as it grew in the past five years. To combine the past velocity with the SPSb distribution we use the Gaussian approximation, which we have shown in Fig. 2 to hold for SPSb. For the velocity, we take a standard deviation \(\sigma_{\mathrm{vel}}\) equal to the standard deviation of all the past one-year velocities of the given country, thus quantifying the spread of its velocity distribution in the past. We use the one-year rather than the more intuitive five-year velocity to have more data and a better estimate of the standard deviation. The resulting distribution, the normalized product of the two Gaussians, is a normal distribution with mean μ and variance \(\sigma^2\) given by

$$\mu = \frac{{\frac{{\mu _{\mathrm{sps}}}}{{\sigma _{\mathrm{sps}}^2}} + \frac{{\mu _{\mathrm{vel}}}}{{\sigma _{\mathrm{vel}}^2}}}}{{\frac{1}{{\sigma _{\mathrm{sps}}^2}} + \frac{1}{{\sigma _{\mathrm{vel}}^2}}}}$$
(4)
$$\sigma ^2 = \left( {\frac{1}{{\sigma _{\mathrm{sps}}^2}} + \frac{1}{{\sigma _{\mathrm{vel}}^2}}} \right)^{ - 1}$$
(5)

These methods naturally provide probabilistic forecasts of GDPpc and fitness growth. For all the results presented in this paper, we considered the marginal distributions along the GDPpc axis. For comparison, the Supplementary Information reports, as an example, the accuracy that linear models relating the same variables to GDP growth would achieve; the results are shown in Supplementary Table 2.

Sources and specifications of GDP data

Throughout the analysis, as the GDP dimension in the GDP–fitness space, we use the gross domestic product per capita based on purchasing power parity in constant 2011 international dollars (GDP, PPP (constant 2011 international $)). These data were downloaded from the World Bank website on 17 June 2017. The same data are used as ground truth for all the out-of-sample error estimations. These choices are motivated by three reasons:

  • We use GDP PPP in constant dollars to remove growth terms due to inflation and other monetary effects.

  • In terms of smoothness and general data quality, the World Bank GDP data is the best publicly available global dataset that we could find.

  • We want to use a third party dataset as ground truth in order to have an unbiased estimate of IMF errors on the growth rates.

The IMF provides five-year historical forecasts in the World Economic Outlook (WEO) dataset3. Such forecasts are expressed for the GDP per capita PPP in current dollars. We applied the exchange rates provided by the World Bank to convert such forecasts into the same 2011 constant dollars. We downloaded the data from the IMF website in February 2018.

One important limitation of all these datasets, including the COMTRADE dataset, is that a truly strict out-of-sample error estimation is not completely possible, owing to ex-post revisions of the data. Citing comments written in the WEO files: ‘[WEO] Historical data are updated on a continual basis, as more information becomes available, and structural breaks in data are often adjusted to produce smooth series with the use of splicing and other techniques […] When errors are discovered, there is a concerted effort to correct them as appropriate and feasible.’ In our experience, significant corrections to trade data in COMTRADE usually occur within a year of their first release. We have no precise notion of how often and with what delays the World Bank corrects past GDP data.

These caveats are important, but because they apply to all data sources, including our IMF benchmark, we do not expect them to bias our conclusions in any particular direction.

Data availability

All the GDP data used in this work is publicly and freely available at https://data.worldbank.org/.

The COMTRADE dataset is available at https://comtrade.un.org/. Bulk downloads may require a paid subscription.

The economic fitness dataset is available at https://datacatalog.worldbank.org/dataset/economic-fitness. That version is similar to the one used in this work but not identical: it includes fewer countries and was cleaned with a simpler data-sanitation procedure. The most up-to-date version of the fitness dataset will be published together with a forthcoming methodological paper explaining in detail all the steps needed to compute fitness from the raw COMTRADE dataset. In the meantime, the dataset used in this work is available from the authors upon request; interested readers can contact the corresponding author via the e-mail address provided.
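For readers who want a rough approximation before the updated dataset is released, the standard fitness–complexity iteration of ref. 10, applied to a binary country–product matrix built with Balassa's revealed comparative advantage (ref. 16), can be sketched as follows. This is not the sanitized and regularized pipeline used in this work: the RCA threshold of 1, the fixed iteration count, the mean normalization and the function names are assumptions, and convergence properties of the iteration are discussed in ref. 25.

```python
from typing import Tuple

import numpy as np


def rca_matrix(exports: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Binary country-product matrix from Balassa's revealed comparative advantage.

    `exports[c, p]` is the export value of product p by country c.
    Assumes every country and every product has positive total exports.
    """
    country_share = exports / exports.sum(axis=1, keepdims=True)
    product_share = exports.sum(axis=0, keepdims=True) / exports.sum()
    return (country_share / product_share >= threshold).astype(float)


def fitness_complexity(M: np.ndarray, n_iter: int = 100) -> Tuple[np.ndarray, np.ndarray]:
    """Iterate the nonlinear fitness-complexity map on a binary matrix M.

    Assumes every row and every column of M contains at least one 1.
    """
    F = np.ones(M.shape[0])  # country fitness
    Q = np.ones(M.shape[1])  # product complexity
    for _ in range(n_iter):
        F_new = M @ Q                    # fitness: sum of complexities of exported products
        Q_new = 1.0 / (M.T @ (1.0 / F))  # complexity: dominated by its least-fit exporters
        F, Q = F_new / F_new.mean(), Q_new / Q_new.mean()  # normalize both to mean 1
    return F, Q
```

On a nested matrix such as `[[1, 1, 1], [1, 1, 0], [1, 0, 0]]`, the most diversified country gets the highest fitness and the product exported only by that country gets the highest complexity. Note that both updates use the previous iteration's values, as in the original formulation.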

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Cristelli, M. C. A., Tacchella, A., Cader, M. Z., Roster, K. I. & Pietronero, L. On the predictability of growth. Policy Research Working Paper WPS 8117 (The World Bank, 2017).

  2. Long-Term Projections of Asian GDP and Trade (Asian Development Bank, 2011).

  3. World Economic Outlook. Subdued Demand: Symptoms and Remedies (IMF, 2016).

  4. Cecconi, F., Cencini, M., Falcioni, M. & Vulpiani, A. Predicting the future from the past: An old problem from a modern perspective. Am. J. Phys. 80, 1001–1008 (2012).

  5. Little, D. Varieties of Social Explanation (Westview, Boulder, CO, 1991).

  6. Lorenz, E. N. Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci. 26, 636–646 (1969).

  7. Ye, H. et al. Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. Proc. Natl Acad. Sci. USA 112, E1569–E1576 (2015).

  8. Calude, C. S. & Longo, G. The deluge of spurious correlations in big data. Found. Sci. 22, 595–612 (2017).

  9. Hausmann, R. & Hidalgo, C. A. The network structure of economic output. J. Econ. Growth 16, 309–342 (2011).

  10. Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A. & Pietronero, L. A new metrics for countries’ fitness and products’ complexity. Sci. Rep. 2, 723 (2012).

  11. Di Clemente, R., Chiarotti, G. L., Cristelli, M., Tacchella, A. & Pietronero, L. Diversification versus specialization in complex ecosystems. PLoS One 9, e112525 (2014).

  12. Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A. & Pietronero, L. Economic complexity: conceptual grounding of a new metrics for global competitiveness. J. Econ. Dyn. Control 37, 1683–1691 (2013).

  13. Zaccaria, A., Cristelli, M., Tacchella, A. & Pietronero, L. How the taxonomy of products drives the economic development of countries. PLoS One 9, e113770 (2014).

  14. Cristelli, M., Gabrielli, A., Tacchella, A., Caldarelli, G. & Pietronero, L. Measuring the intangibles: A metrics for the economic complexity of countries and products. PLoS One 8, e70726 (2013).

  15. Cristelli, M., Tacchella, A. & Pietronero, L. The heterogeneous dynamics of economic complexity. PLoS One 10, e0117174 (2015).

  16. Balassa, B. Trade liberalisation and “revealed” comparative advantage. Manch. Sch. 33, 99–123 (1965).

  17. Pritchett, L. & Summers, L. H. Asiaphoria Meets Regression to the Mean. NBER Working Paper 20573 (National Bureau of Economic Research, 2014).

  18. Dreher, A., Marchesi, S. & Vreeland, J. R. The political economy of IMF forecasts. Public Choice 137, 145–171 (2008).

  19. Batchelor, R. How useful are the forecasts of intergovernmental agencies? The IMF and OECD versus the consensus. Appl. Econ. 33, 225–235 (2001).

  20. Frenkel, M., Rülke, J.-C. & Zimmermann, L. Do private sector forecasters chase after IMF or OECD forecasts? J. Macroecon. 37, 217–229 (2013).

  21. Pugliese, E., Chiarotti, G. L., Zaccaria, A. & Pietronero, L. Complex economies have a lateral escape from the poverty trap. PLoS One 12, e0168540 (2017).

  22. Solow, R. M. A contribution to the theory of economic growth. Q. J. Econ. 70, 65–94 (1956).

  23. Pugliese, E., Napolitano, L., Zaccaria, A. & Pietronero, L. Coherent diversification in corporate technological portfolios. Preprint at https://arXiv.org/abs/1707.02188 (2017).

  24. Battiston, F., Cristelli, M., Tacchella, A. & Pietronero, L. How metrics for economic complexity are affected by noise. Complex. Econ. 3, 1–22 (2014).

  25. Pugliese, E., Zaccaria, A. & Pietronero, L. On the convergence of the fitness–complexity algorithm. Eur. Phys. J. Spec. Top. 225, 1893–1911 (2016).


Acknowledgements

The authors wish to thank M. Cristelli for useful discussions and technical ideas about the methods proposed in this paper, and for his fundamental contributions to many of the foundations of the economic complexity framework. The authors wish to thank M. Cader (Lead of Country Analytics, IFC) for his support and for applying these methods to IFC country strategy. The authors wish to thank K. Roster for feedback on the SPS methods drawn from their use in World Bank Group country strategy instruments. This work has been partly funded by the Italian PNR project ‘CRISIS-Lab’. The findings, interpretations and conclusions expressed herein are those of the authors and do not necessarily reflect the view of the World Bank Group, its Board of Directors or the governments they represent.

Author information

Affiliations

  1. Institute for Complex Systems, CNR, Rome, Italy

    • A. Tacchella & L. Pietronero
  2. Country Analytics, International Finance Corporation, World Bank Group, Washington DC, USA

    • A. Tacchella & L. Pietronero
  3. La Sapienza University of Rome, Rome, Italy

    • D. Mazzilli & L. Pietronero

Contributions

All the authors contributed equally to the design of the methods. D.M. ran the computations and analysed the data. All authors contributed equally to the writing of the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to A. Tacchella.

Supplementary information

  1. Supplementary Information

    Supplementary Information, Supplementary Figures 1–4, Supplementary Tables 1–2, Supplementary References 1–13

About this article

DOI

https://doi.org/10.1038/s41567-018-0204-y