Reconciling contrasting views on economic complexity

Summarising the complexity of a country’s economy in a single number is the holy grail for scholars engaging in data-based economics. In a field where the Gross Domestic Product remains the preferred indicator for many, economic complexity measures, aiming at uncovering the productive knowledge of countries, have been stirring the pot in the past few years. The commonly used methodologies to measure economic complexity produce contrasting results, undermining their acceptance and applications. Here we show that these methodologies – apparently conflicting on fundamental aspects – can be reconciled by adopting a neat mathematical perspective based on linear-algebra tools within a bipartite-networks framework. The obtained results shed new light on the potential of economic complexity to trace and forecast countries’ innovation potential and to interpret the temporal dynamics of economic growth, possibly paving the way to a micro-foundation of the field.

The paper provides an interesting and novel insight into a series of previously proposed metrics for assessing, in a synthetic way, the capability stock of countries. It surely contributes to the previous literature in a constructive way, providing a general framework to better understand the differences among existing metrics and how they could potentially be improved, especially the Fitness.
The paper is well written; it is extremely neat and pleasant to read.
I have only two major remarks. i) Figure 2 and the related analysis in the text: unlike the first part, I find this analysis a bit weak, and the three regimes sound somewhat anecdotal. The authors should make this part more quantitative. As a simple (and minimal) step in this direction, they should smooth and aggregate the dynamics in their plane to show the emerging flow, as done in [1,2,3] for instance, and try to show the three regimes. The analysis would be even more robust and complete if the authors tried an analysis in the spirit of [4], where a simple Granger causality analysis shows that there are different regimes of interaction between two variables from the perspective of a dynamical system (the Granger test is just a suggestion; any other technique the authors consider appropriate to quantify the effects in this plane would do). We should expect X1 and X2 to be essentially unrelated, causally speaking, in the bottom-left corner; in the middle part they should be related, and perhaps one leads the other (this would be an interesting question to address); for the top-right part, I do not know what to expect a priori.
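The two analyses suggested in remark i) can be sketched as follows. This is a minimal NumPy illustration, not the procedure of [1,2,3] or [4]: `binned_flow` (a hypothetical helper) averages the one-step displacement of countries within the cells of a regular grid over the (X1, X2) plane, and `granger_f` (also hypothetical) computes the textbook F statistic comparing an autoregression of one coordinate with and without lags of the other. The data at the bottom are synthetic.

```python
import numpy as np


def binned_flow(x1, x2, n_bins=5):
    """Mean one-step displacement of trajectories, binned by starting cell.

    x1, x2: arrays of shape (n_countries, n_years) holding the two coordinates.
    Returns (u, v): mean displacement along x1 and x2 per cell (NaN if empty).
    """
    start1, start2 = x1[:, :-1].ravel(), x2[:, :-1].ravel()
    dx1, dx2 = np.diff(x1, axis=1).ravel(), np.diff(x2, axis=1).ravel()
    edges1 = np.linspace(start1.min(), start1.max(), n_bins + 1)
    edges2 = np.linspace(start2.min(), start2.max(), n_bins + 1)
    # digitize returns 1..n_bins+1; shift and clip so the maximum lands in the last bin
    i = np.clip(np.digitize(start1, edges1) - 1, 0, n_bins - 1)
    j = np.clip(np.digitize(start2, edges2) - 1, 0, n_bins - 1)
    u = np.full((n_bins, n_bins), np.nan)
    v = np.full((n_bins, n_bins), np.nan)
    for a in range(n_bins):
        for b in range(n_bins):
            cell = (i == a) & (j == b)
            if cell.any():
                u[a, b] = dx1[cell].mean()
                v[a, b] = dx2[cell].mean()
    return u, v


def granger_f(y, x, lag=1):
    """F statistic testing whether lags of x improve an OLS forecast of y
    beyond y's own lags (large values suggest that x 'Granger-causes' y)."""
    n = len(y)
    target = y[lag:]
    own = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    other = np.column_stack([x[lag - k:n - k] for k in range(1, lag + 1)])
    ones = np.ones((n - lag, 1))
    restricted = np.hstack([ones, own])
    full = np.hstack([ones, own, other])

    def rss(design):
        beta = np.linalg.lstsq(design, target, rcond=None)[0]
        return np.sum((target - design @ beta) ** 2)

    df_den = (n - lag) - full.shape[1]
    return ((rss(restricted) - rss(full)) / lag) / (rss(full) / df_den)


# Synthetic check: x drives y with a one-step delay
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)
print(granger_f(y, x), granger_f(x, y))  # the first value is much larger
```

In practice one would compute `granger_f` separately for the countries falling in each region of the plane and compare the statistics (or their p-values) across regions.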
ii) The authors often state that the high correlation of two metrics is implicitly a proof that they carry similar information. For instance, it is used to justify that the non-linear specification of the Fitness is likely not needed. While, generally speaking, I essentially agree with the content of the paper, I strongly disagree with the authors on this specific point. A priori, we do not know in which part of the variables the informative content is stored: two variables might be 99% correlated, yet the difference in their forecasting power can paradoxically be huge and entirely concentrated in the residual part. Consider two variables Z1 = aX + (1-a)Y1 and Z2 = aX + (1-a)Y2, where Y1 is orthogonal to Y2, Y2 is signal, and X and Y1 are noise. If a is close to 1, the two variables will be highly correlated, but only Z2 will be useful to forecast or explain something: its signal-to-noise ratio will surely be weak, but only Z2 carries signal, despite its high correlation with Z1. This is, for instance, the case of ECI and Fitness: they are correlated, but once they are used in practice to forecast or explain economic dynamics, their discrepancies distinguish the two variables much more than the correlation would suggest. I would suggest either downgrading this argument, which is leveraged several times, or, more involved, showing that the linearized and the non-linear Fitness have similar predictive/explanatory power (see for instance the last suggestion). With the correlation argument only, the conclusion cannot be drawn and the statement should remain descriptive. As a side comment, the authors should also show that the correlation results do not depend on the specific estimator: if they use Spearman's or Kendall's tau correlation, do the results change?
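The toy model in remark ii) is easy to simulate. In the sketch below (NumPy only; the mixing weight a = 0.9 and the sample size are arbitrary choices), Z1 and Z2 are highly correlated under the Pearson, Spearman and Kendall estimators alike (Kendall's tau lives on a different scale and comes out lower, near 0.9), while only Z2 correlates, weakly, with the signal, and the residual Z2 - Z1 carries most of the explanatory content. The rank-correlation helpers are hand-rolled to avoid extra dependencies; `scipy.stats.spearmanr` and `scipy.stats.kendalltau` are the usual library equivalents.

```python
import numpy as np


def ranks(a):
    """Ranks 0..n-1 (ties are not expected for continuous data)."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r


def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]


def spearman(a, b):
    return pearson(ranks(a), ranks(b))


def kendall_tau(a, b):
    """Kendall's tau-a from the signs of all pairwise differences (O(n^2) memory)."""
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    n = len(a)
    return (sa * sb).sum() / (n * (n - 1))


rng = np.random.default_rng(0)
n, a = 1000, 0.9
X = rng.normal(size=n)    # shared component (noise)
Y1 = rng.normal(size=n)   # noise, independent of Y2
Y2 = rng.normal(size=n)   # the only signal
target = Y2               # the quantity one would like to forecast/explain
Z1 = a * X + (1 - a) * Y1
Z2 = a * X + (1 - a) * Y2

for name, corr in [("Pearson", pearson), ("Spearman", spearman), ("Kendall", kendall_tau)]:
    print(name, "Z1 vs Z2:", round(corr(Z1, Z2), 3))
print("Z1 vs target:", round(pearson(Z1, target), 3))
print("Z2 vs target:", round(pearson(Z2, target), 3))
print("residual Z2 - Z1 vs target:", round(pearson(Z2 - Z1, target), 3))
```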
A minor comment concerns Figure 3, the one showing the three different barycenters. The fact that the GDP barycenter is still far from the one provided by the GENEPY index is likely a sign that the growth potential of Asian countries is still strong (even for China), as observed in [3].
As a general suggestion, likely for the next paper: it would be interesting to test the GENEPY index in a framework like the one proposed in [3], because this would be the true benchmark to see whether GENEPY carries more information than Fitness and ECI, and whether the linearized part of the Fitness, X1, really carries the same information as its non-linear counterpart.

Reviewers' comments:
Reviewer #1 (Remarks to the Author): The paper provides an interesting and potentially impactful contribution to the Economic Complexity (EC) literature. In particular, it introduces a unified linearized framework to reconcile the two most popular bipartite centrality measures used in EC, namely ECI/PCI and Fitness/Complexity. While I appreciate the mathematical formalism, which allows more rigorous formal interpretations of the metrics, I think the paper lacks economic interpretations and validations of the new metrics introduced.
The ECI and Fitness formulations were proposed by their authors on the basis of some heuristic reasoning, related to a tripartite countries-capabilities-products network and the 'Building Blocks' combinatorics. In this sense, even without any formal justification, there is an intuition about what the metrics are trying to capture. While in this work the formalism is certainly better defined, I think the intuitions are much harder to grasp, and there is no explicit validation suggesting that this metric is preferable for any economic task, other than the better formalism itself.
We thank the reviewer for the positive feedback about our work. Generally, we agree with the reviewer about the grounding reasoning of the present work. In fact, the aim of this work is not to present a new indicator, but rather to show how the two existing metrics, ECI and Fitness, can be joined in a unique, neat framework, thus providing a tool to exploit the potential (and the economic significance and validation) of both metrics in assessing the hidden capabilities of countries. It follows that our framework inherits the same intuitions/assumptions that both methodologies (first FC, then ECI) introduced about the way countries manage and distribute their capabilities in their export baskets, from which the complexity of products is determined. This is implicit in the way we compute the GENEPY indices for countries and products, which take as input the matrices from the linear mapping of the FC algorithm (i.e.,). Moreover, combining ECI and FC (as the result of the neat framework we present here, not as an "a priori" construction) resolves the old (and sometimes harsh) debate on which index to use, a debate that in our view has weakened the application of Economic Complexity in economic studies.
We have made this explicit by editing line 103 of the revised manuscript as follows: "…FC carries more information than MR. The grounding hypotheses about the hidden capabilities of countries (and about how these can be deduced from the export baskets of countries, upon which the EC algorithms are built) are preserved in our framework. From here on…" And at line 146: "…Eq (15), respectively. Since the GENEPY framework is grounded on both existing indicators of economic complexity (the FC and the MR algorithms), it inherits the intuitions and rationales upon which these two metrics are built: the capabilities of countries to export diversely complex goods are hidden within the bipartite network of countries and exports, under which they combine to maximize the complexity of the goods." As for validation, we embrace the idea of economic complexity as a driver of growth [a, d]. Any formal validation would require defining an objective function (e.g., the predictive power for economic growth) and verifying whether using the GENEPY as an additional independent variable allows one to get closer to the selected objective (e.g., to improve the predictive power). However, the validation problem is somewhat ill-posed: the standard use of GDP to measure growth, for example, is contradictory within a framework (the economic complexity one) that aims to overcome a simplistic, one-dimensional view of economic dynamics. One would be tempted to use the economic complexity measures as the sole representation of economic growth, but this would induce a logical loop, in which the validation is based on the very variable subject to validation. We therefore deliberately decided not to enter the validation arena in this work, trusting the validation efforts performed by others on the Fitness and ECI measures [a,b,e], and taking advantage of their results to also support our multidimensional metrics, whose components are in fact strictly related to Fitness and ECI.
Lastly, we would like to highlight once more that, from an economic point of view, this work could pave the way to a microeconomic foundation of the Economic Complexity field, owing to the similarities between the formalisms of the GENEPY and the EXPY [c]. Once again, we underline that this similarity is a result of the application of our framework, and not an "a priori" construction: in a sense, the economic concepts are self-emerging from the reformulation of the intuitions of MR and FC within a neater mathematical framework, with some significant variations with respect to the original economic complexity works. We argue that this aspect is fundamental for the improvement of data-science-based economics. Regarding this comment, some sentences have been added at lines 265 and following: "Moreover, in the FC algorithm the Quality of a product is mainly determined by the least fit country exporting it, a crucial property accomplished by the non-linearity of the FC approach. In our linear framework, this property is maintained through the term $k'_p = \sum_c M_{cp}/k_c$, occurring in $Q_p = Y_p/k'_p$. This term in fact represents the degree of a product corrected by how easily it is found within the network. Its inverse, $1/k'_p$, is an anti-centrality score for the product, determining how limited its presence is within the producers' baskets and thus suggesting the need for higher productive knowledge in its production process. Notice that, by substituting the incidence matrix M with the traded monetary values, the term $k'_p$ also recurs in the so-called EXPY rationale by Hausmann et al. [c]. Based on a decision-making model of firms' investment choices, the authors of [c] defined an index of the economic growth potential of countries, assessed through the required productive level of the exported products, i.e., EXPY. As we show (see Methods, Eq (17)), the equations to compute Xc in the GENEPY framework are similar to those defining the EXPY scores of countries [c].
Clearly, EXPY has been defined from a different deductive rationale, which considers trade as described by the weighted incidence matrix of the monetary fluxes (thus providing different input information) and embeds exogenous information such as the GDP per capita. Notwithstanding these differences, the formal similarity of GENEPY with EXPY is striking. This similarity is a result of the application of our framework, and not an "a priori" construction: in a sense, the economic concepts are self-emerging, with some significant variations with respect to the original EC framework we reconcile here [a,b]. In our view, this similarity represents a possible micro-economically sound basis for the economic complexity theory, toward which we direct future work."
[a] Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A., & Pietronero, L. (2012). A new metrics for countries' fitness and products' complexity. Scientific Reports, 2, 723.
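To make $k'_p$ and its inverse concrete, here is a small NumPy sketch on an invented four-country, five-product matrix. Only $k'_p = \sum_c M_{cp}/k_c$ and $Q_p = Y_p/k'_p$ come from the text above; the weights $W_{cp} = M_{cp}/(k_c k'_p)$, the proximity matrix $N = W W^T$ with zeroed diagonal, and the relation $Y = W^T X$ are our reading of the linearized FC mapping, stated here as assumptions rather than as the manuscript's code.

```python
import numpy as np

# Invented binary country-product matrix (rows: countries, columns: products)
M = np.array([
    [1, 1, 1, 1, 1],   # most diversified country
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],   # least diversified country
], dtype=float)

k_c = M.sum(axis=1)                        # diversification k_c
k_prime = (M / k_c[:, None]).sum(axis=0)   # k'_p = sum_c M_cp / k_c
anti_centrality = 1.0 / k_prime            # large for products found only in diversified baskets

# Assumed linearized mapping: W_cp = M_cp / (k_c k'_p), N = W W^T
W = M / (k_c[:, None] * k_prime[None, :])
N = W @ W.T
np.fill_diagonal(N, 0.0)                   # proximity-matrix reading: no self-loops

eigvals, eigvecs = np.linalg.eigh(N)       # eigenvalues in ascending order
X1 = eigvecs[:, -1] * np.sign(eigvecs[:, -1].sum())  # leading eigenvector, made positive
Y1 = W.T @ X1                              # product-side scores, Y = W^T X (assumed)
Q1 = Y1 / k_prime                          # Q_p = Y_p / k'_p

print(np.round(k_prime, 3))
print(np.round(anti_centrality, 3))
```

Products 3 and 4, exported only by the most diversified country, get the smallest $k'_p$ and hence the largest anti-centrality $1/k'_p$, while the ubiquitous product 0 gets the smallest.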
Therefore, I would suggest addressing these points to improve the paper: 1. line 24: to be more precise, MR was introduced in [12] without any reference to the algebraic formulation. The algebraic formulation was introduced in [A] and further expanded in [B], much earlier than [14] and [16]. Since [A, B] cast the MR in the same linear-algebra framework on which this paper also builds and expands, I think it would be fair to cite and contextualize these works.
We thank the reviewer for raising this point and for the useful readings suggested. We have changed line 24 as follows: "The equations defining the two averages are coupled to obtain the Economic Complexity Index, ECI, and the Product Complexity Index, PCI [11,12], which have been shown to be the result of a linear algebra exercise [A,B,11]."
2. line 74: it is very interesting to find such a high correlation; however, this raises a few questions/comments that can be addressed to better contextualize the results:
- It is important to notice that one of the main features of Fitness is that it is a sum (not an average) over the products. Here the authors divide the Fitness by k_c, thereby transforming it into an average. This step is non-trivial and can have a big impact on the rankings: the authors should comment on this. In what context, and why, is this desirable? Isn't diversification itself an important point?
We thank the reviewer for this point. Our answer concerns a possible misunderstanding: $X_{c,1}$ and $F_c/k_c$ carry approximately the same information (correlation coefficient larger than 0.98), while $X_{c,1}$ and $F_c$ do not. In fact, the sentence at line 74 refers to a figure in the SI included to "validate" the linearization procedure, where the comparison is performed between variables sharing the same meaning (i.e., $X_{c,1}$ and $F_c/k_c$ or, analogously, $X^*_{c,1}$ and $F_c$). We clarified this concept at line 75: "… See Methods, Eqs (11)-(13)). Surprisingly enough, comparing the terms $X_{c,1}$ and $F_c/k_c$, or $X^*_{c,1}$ and $F_c$, for the Fitness values (and, analogously, $Y_{p,1}$ and $Q_p k'_p$, or $Y_{p,1}/k'_p$ and $Q_p$, for the Quality values), this linearization preserves >98% of the information (independently of the correlation indicator chosen, Figure S2), thus questioning …" The rankings obtained with $X_{c,1}$ and $F_c$ are indeed different, and in this sense the reviewer has a good point in arguing that transforming the sum ($F_c$) into an average ($X_{c,1}$) might imply that diversification is lost from our metrics, with disputable economic implications. However, this is not the case: $X_{c,1}$ maintains a very high correlation with diversification (as Figure 1 shows, correlation coefficients >0.85 in both the Spearman and the Pearson formulations). This might appear surprising, because $X_{c,1}$ is obtained by dividing $F_c$ by $k_c$, and one could expect taking the ratio to kill the correlation; however, this would be the case only if $F_c$ and $k_c$ were perfectly linearly related, which of course is not the case. Since this relation is super-linear (because more sophisticated products are produced by countries with highly diversified baskets), $X_{c,1}$ preserves information on diversification. We have included some discussion of this very relevant issue after the comment added at line 146 of the revised manuscript: "…Eq (15)), respectively.
Since the GENEPY framework is grounded on both existing indicators of economic complexity (the FC and the MR algorithms), it inherits the intuitions and rationales upon which these two metrics are built: the capabilities of countries to export diversely complex goods are hidden within the bipartite network of countries and exports, under which they combine to maximize the complexity of the goods. Also, since $X_{c,1}$ maintains a very high correlation with $k_c$ (see SI, Figure S6), our framework preserves the information on diversification, which is relevant to understanding how export capabilities are exploited by countries."
- Are the deviations from this correlation informative of something? Which countries are the biggest outliers? Can this help explain the relation between Fitness and its linearized version?
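The comparison discussed in the response above, and the question about outliers, can be explored with the following sketch. It runs the standard non-linear Fitness-Complexity iteration (mean-normalized at each step; 50 iterations is an arbitrary choice), builds the leading eigenvector $X_{c,1}$ of an assumed linearized proximity matrix $N = W W^T$ with $W_{cp} = M_{cp}/(k_c k'_p)$ and zero diagonal, and reports Spearman correlations with $F_c/k_c$, $F_c$ and $k_c$, plus the countries with the largest rank disagreements. The matrix is synthetic, so the printed numbers say nothing about the trade data in the manuscript.

```python
import numpy as np


def fitness_complexity(M, n_iter=50):
    """Non-linear Fitness-Complexity iteration with mean normalization."""
    F = np.ones(M.shape[0])
    Q = np.ones(M.shape[1])
    for _ in range(n_iter):
        F_new = M @ Q                      # F_c = sum_p M_cp Q_p
        Q_new = 1.0 / (M.T @ (1.0 / F))    # 1/Q_p = sum_c M_cp / F_c
        F, Q = F_new / F_new.mean(), Q_new / Q_new.mean()
    return F, Q


def ranks(a):
    """Ranks 0..n-1, with tied values receiving their average rank."""
    r = np.empty(len(a))
    r[np.argsort(a, kind="mergesort")] = np.arange(len(a), dtype=float)
    _, inv = np.unique(a, return_inverse=True)
    return (np.bincount(inv, weights=r) / np.bincount(inv))[inv]


def spearman(a, b):
    return np.corrcoef(ranks(a), ranks(b))[0, 1]


# Synthetic, invented matrix with heterogeneous country and product propensities
rng = np.random.default_rng(1)
n_c, n_p = 40, 60
prob = rng.uniform(0.1, 0.9, n_c)[:, None] * rng.uniform(0.1, 0.9, n_p)[None, :]
M = (rng.random((n_c, n_p)) < prob).astype(float)
M[:, 0] = 1.0                                        # every country exports something
M[rng.integers(0, n_c, n_p), np.arange(n_p)] = 1.0   # every product has an exporter

F, Q = fitness_complexity(M)
k_c = M.sum(axis=1)
k_prime = (M / k_c[:, None]).sum(axis=0)
W = M / (k_c[:, None] * k_prime[None, :])            # assumed linearized weights
N = W @ W.T
np.fill_diagonal(N, 0.0)
X1 = np.linalg.eigh(N)[1][:, -1]
X1 = X1 * np.sign(X1.sum())

print("X_c1 vs F_c/k_c:", round(spearman(X1, F / k_c), 3))
print("X_c1 vs F_c:    ", round(spearman(X1, F), 3))
print("X_c1 vs k_c:    ", round(spearman(X1, k_c), 3))

# Largest rank disagreements between X_c1 and F_c/k_c (candidate outliers)
gap = ranks(X1) - ranks(F / k_c)
print([(int(c), round(float(gap[c]), 1)) for c in np.argsort(-np.abs(gap))[:5]])
```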
We thank the reviewer for raising this point. The mapping in Eqs (14) solves the linearized version of the FC algorithm in Eqs (13) by defining the symmetric matrices N and G (Eqs (15)-(16)). As we explain, the GENEPY arises from the eigenvectors of the matrix N, which is interpreted as a proximity matrix. This interpretation entails setting the elements of the matrix as
$N_{cc^*} = \sum_p \frac{M_{cp} M_{c^*p}}{k_c k_{c^*} (k'_p)^2}$ for $c \neq c^*$, and $N_{cc^*} = 0$ for $c = c^*$,
i.e., the diagonal values are set to zero (any other uniform value would lead to the same eigenvectors; we lean toward zero because, in network jargon, it deletes the unnecessary information about self-loops). If one left the diagonal values as they result from the linearization and mapping procedure, they would differ among countries, and this would corrupt the interpretation of N as a proximity matrix (each country has perfect similarity to itself and, therefore, all diagonal entries have to be equal). The mapping in Eqs (14) can also be used for $c = c^*$, i.e., when the matrix is not interpreted as a proximity matrix (see Methods, Eq (13) and SI, Figure S5). However, this would imply inflating the $X_c$ (or $X_{c,1}$) values for countries with large self-interactions, which, in our opinion, induces an undesired bias in the results. Analogously, a good proxy of the ECI values is obtained […]. In this case, the scatter of the plot is due to the differences in the matrices $N_A$ and $N_B$ (see Methods, Eq (9) and Eq (15), respectively).
- Is this correlation a specific feature of the countries-products bipartite network, or is it also found in other contexts? E.g., would such a high correlation be found in sparser/denser bipartite networks? In more/less nested networks? In random networks? In bipartite networks arising in other domains (ecology, biology, technology)? I don't expect the authors to systematically explore these questions, as that would be an entirely new paper.
However, I would suggest running some tests, at least on perturbations of the countries-products network, to provide a better understanding of the similarities and differences between Fitness and its linearized counterpart.
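The claim in the response above, that any uniform diagonal leaves the eigenvectors of N unchanged while the diagonal produced by the mapping does not, can be checked numerically. The sketch assumes the linearized weights $W_{cp} = M_{cp}/(k_c k'_p)$ (our reading, not code from the manuscript) and uses an invented random matrix; adding $c\,I$ to a symmetric matrix only shifts every eigenvalue by $c$.

```python
import numpy as np

# Invented random country-product matrix
rng = np.random.default_rng(4)
M = (rng.random((25, 40)) < 0.35).astype(float)
M[:, 0] = 1.0                                        # no empty rows
M[rng.integers(0, 25, 40), np.arange(40)] = 1.0      # no empty columns

k_c = M.sum(axis=1)
k_prime = (M / k_c[:, None]).sum(axis=0)             # k'_p = sum_c M_cp / k_c
W = M / (k_c[:, None] * k_prime[None, :])            # assumed linearized weights
N_full = W @ W.T                                     # diagonal as produced by the mapping


def leading(A):
    """Leading eigenvector of a symmetric matrix, sign fixed to a positive sum."""
    v = np.linalg.eigh(A)[1][:, -1]
    return v * np.sign(v.sum())


N_zero = N_full.copy()
np.fill_diagonal(N_zero, 0.0)
N_shift = N_zero + 3.0 * np.eye(len(k_c))            # any uniform diagonal value

v_zero, v_shift, v_full = leading(N_zero), leading(N_shift), leading(N_full)
print(np.allclose(v_zero, v_shift))                  # uniform diagonal: same eigenvector
print(np.corrcoef(v_zero, v_full)[0, 1])             # non-uniform diagonal: generally differs
# Self-interaction N_cc = sum_p M_cp / (k_c k'_p)^2; inspect how it relates to k_c
print(np.corrcoef(np.diag(N_full), k_c)[0, 1])
```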

- Fitness/Complexity is known to give a close-to-perfect nested ordering of the rows and columns of the bipartite matrices [C, D]. One way to look at this is to notice that it allows one to very efficiently 'pack' all the non-zero elements of such matrices into a 'triangle'. Is the 'packing' induced by this linearized version better than, equal to, or worse than that arising from Fitness? This could be another way to understand in which sense this linearized version approximates Fitness.
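One simple way to compare the 'packing' of the two orderings asked about here is the NODF nestedness index (Almeida-Neto et al., 2008), evaluated on the matrix with rows sorted by decreasing country score and columns by increasing product score. This is a stand-in chosen for brevity, not the nestedness-temperature comparison of [f]; `linear_scores` and `packed` are hypothetical helpers, `linear_scores` again assumes the weights $W_{cp} = M_{cp}/(k_c k'_p)$, and the test matrix is synthetic.

```python
import numpy as np


def nodf(A):
    """NODF nestedness of A in its current row/column order (0 to 100)."""
    def paired_scores(B):
        deg = B.sum(axis=1)
        scores = []
        for i in range(B.shape[0]):
            for j in range(i + 1, B.shape[0]):
                if deg[i] > deg[j] > 0:           # the upper row must be strictly fuller
                    scores.append(100.0 * (B[i] * B[j]).sum() / deg[j])
                else:
                    scores.append(0.0)
        return scores
    return float(np.mean(paired_scores(A) + paired_scores(A.T)))


def fitness_complexity(M, n_iter=50):
    F, Q = np.ones(M.shape[0]), np.ones(M.shape[1])
    for _ in range(n_iter):
        F_new, Q_new = M @ Q, 1.0 / (M.T @ (1.0 / F))
        F, Q = F_new / F_new.mean(), Q_new / Q_new.mean()
    return F, Q


def linear_scores(M):
    k_c = M.sum(axis=1)
    k_prime = (M / k_c[:, None]).sum(axis=0)
    W = M / (k_c[:, None] * k_prime[None, :])     # assumed linearized weights
    N = W @ W.T
    np.fill_diagonal(N, 0.0)
    X1 = np.linalg.eigh(N)[1][:, -1]
    X1 = X1 * np.sign(X1.sum())
    return X1, (W.T @ X1) / k_prime               # country scores, Q_p = Y_p / k'_p


def packed(M, row_score, col_score):
    """Fittest countries on top, least complex (most ubiquitous) products on the left."""
    return M[np.argsort(-row_score)][:, np.argsort(col_score)]


rng = np.random.default_rng(5)
prob = rng.uniform(0.1, 0.9, 30)[:, None] * rng.uniform(0.1, 0.9, 40)[None, :]
M = (rng.random((30, 40)) < prob).astype(float)
M[:, 0] = 1.0
M[rng.integers(0, 30, 40), np.arange(40)] = 1.0

F, Q = fitness_complexity(M)
X1, Q_lin = linear_scores(M)
print("NODF, FC ordering:        ", round(nodf(packed(M, F, Q)), 2))
print("NODF, linearized ordering:", round(nodf(packed(M, X1, Q_lin)), 2))
```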
Thanks for these comments. We are aware of the potential of the FC algorithm in minimizing the nestedness temperature of ecological networks [f], and in this field non-linearity has been shown to be an important feature of temperature-minimization algorithms [g]. As detailed in the Discussion, our work only pertains to the specific structure of the countries-products bipartite network, with no ambition to build an algorithm with more general 'data-packing' potential, where we indeed expect non-linearity to play a major role. To confirm this statement, we have tested the packing performance of the linearized form of the FC algorithm, similarly to the comparison shown in [f]. We exemplify the results through the analysis of the pollination networks provided by the Web of Life project (network IDs: M_PL_062 and M_PL_015). These networks describe the pollination interactions between plants and pollinators. As Figure 3 shows, the non-linear algorithm outperforms the linearized form in maximizing the nestedness of the matrices for the two pollination networks taken as examples. Instead, there are no significant differences between the non-linear and the linear algorithm in maximizing the data-packing of the trade matrix, confirming that the adequacy of the linear form only pertains to the countries-products bipartite network.
sentence. There are many other ways to project a bipartite network into a monopartite one that can be interpreted as a proximity network. Can the authors better clarify what they mean?
We thank the reviewer for this insight. To the best of our knowledge, the presented mathematical framework is the only one that allows one to merge the two analyses related to the countries-products bipartite network, i.e., measuring competitiveness and, simultaneously and with the same matrix, defining the similarities among countries. In our view, this is an important feature. To better clarify this point, we have modified the text at lines 82-84 as follows: "The use of the variables Xc and Yp allows one to gain neatness in the mathematics, also reflected by the fact that the matrices N and G can be considered as suitable proximity matrices containing information about the similarities among countries and products, respectively. This aspect ..."
4. lines 88-89: they are eigenvalues of different order, but also of different matrices. I might agree that it is pointless to compare the two metrics with respect to what they say about the topology of networks; however, these metrics are typically compared on how much they can say about the economic status of countries. In this context, it is perfectly fine to compare different topological properties in order to see which one carries more information about economies. I think the authors should better clarify this sentence.
We thank the reviewer for pointing this out. In this framework we show that these metrics arise from some structural features of the countries' proximity networks, $N_A$ and $N_B$, for MR and FC, respectively. Although the matrices $N_A$ and $N_B$ are different, as we show in Figure S3, there is an 88% correlation between the eigenvectors of the same order of the matrices $N_A$ and $N_B$. Therefore, even considering the matrix $N_B$ and its corresponding eigenvectors $X_{c,1}$ and $X_{c,2}$, the information these vectors bring about the structural properties of the similarities across countries is differently relevant. In this sense, we agree with the reviewer that the comparison should only be performed at the analytical level, not the economic one.
We have updated the text at line 89 and following as follows: "… notwithstanding the differences between the matrices N from which these metrics are recovered, the first eigenvectors $X_{c,1}$ of the two matrices carry similar information (see Figure S3), as do the second eigenvectors $X_{c,2}$ (this is also partially true for Yp, see SI, Figure S6). Therefore, the divergences between Fc and ECI (and the corresponding outcomes) shown in Figure S1 should mainly be attributed to the fact that eigenvectors of different order are considered in the two approaches. Hence, the two metrics bring …"
We thank the reviewer for pointing out these observations. We merge the answers to these two points because they are related.
The knee shape of the points in the $(X_{c,1}, X_{c,2})$ plane recurs in all the years of analysis, thus showing the existence of a functional relationship between the two eigenvectors. The reasons for the knee-like shape of this functional relationship are related to linear algebra and network science.
Let us define a functional relationship between $X_{c,1}$ and $X_{c,2}$ s.t. (iv) if any element of the eigenvector corresponding to the first (largest) eigenvalue $\lambda_1$ is zero, the same element is null also in the subsequent eigenvectors. In fact, the eigen-equation for the matrix N is $\lambda_1 X_{c^*,1} = \sum_c N_{c^*c} X_{c,1}$; because of condition (ii), it holds that $X_{c^*,1} = 0$ iff $N_{c^*c} = 0$ for all $c$, i.e., if the matrix has null elements along the column (or row) $c^*$. Interpreted through network-science lenses, this means that the node to which the null element of the eigenvector refers is disconnected from the network. Therefore, under the hypothesis that a functional relationship between the two eigenvectors exists as in Eq 1, it must give $X_{c,2} = 0$ when $X_{c,1} = 0$.
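The step from the eigen-equation to the disconnected node can be verified on a toy example. In this sketch (an invented random symmetric matrix, with node 3 disconnected by hand), every eigenvector with a non-zero eigenvalue has a zero entry on the isolated node, and the leading eigenvector is strictly positive on the remaining, fully connected nodes, as the Perron-Frobenius theorem guarantees for a non-negative irreducible block.

```python
import numpy as np

# Invented symmetric proximity matrix with zero diagonal
rng = np.random.default_rng(2)
A = rng.random((6, 6))
N = (A + A.T) / 2
np.fill_diagonal(N, 0.0)
N[3, :] = 0.0                      # node 3 is disconnected
N[:, 3] = 0.0

vals, vecs = np.linalg.eigh(N)
nonzero = np.abs(vals) > 1e-10
# lambda * X_c = sum_c' N_cc' X_c' = 0 on the isolated node, so X_c = 0 whenever lambda != 0
print(np.abs(vecs[3, nonzero]).max())
# The leading (Perron) eigenvector is strictly positive on the connected nodes
lead = vecs[:, -1] * np.sign(vecs[:, -1].sum())
print(np.round(lead, 3))
```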
We now proceed by exploring two cases of possible functional relationship for Eq 1. Low values of $X_{c,1}$ and high values of $X_{c,2}$, or vice versa: this situation identifies the presence of some "outliers" of the core and periphery components. These nodes connect the stronger and weaker components and play a role in bridging the gaps across the network. We identify these nodes as able to jump from one group to another.
We added some comments on this part at line 163 and following, and included this analysis in the SI, S1.1: "… One recognizes that the ensemble of the trajectories is also knee-shaped: in fact, in each year of analysis the positions of countries in the $(X_{c,1}, X_{c,2})$ plane take a knee-like shape, as shown in Figure 1 for the year 2017. The presence of this shape is related to linear algebra and network science (see SI, S1.1)." As regards the ability of $X_{c,2}$ to cluster countries, the authors of [m] have proved that ECI perfectly solves a spectral clustering exercise in a network, with the sign of the eigenvector discerning which cluster each node belongs to. The matrix we use to construct the GENEPY does not coincide with that of the Method of Reflections; however, the good correlation we have shown in Figure 1c suggests that the sign of $X_{c,2}$ substantially preserves the clustering information of ECI.
These are clearly due simply to the choice of how to build the similarity matrix (the method being otherwise identical to ECI): can we gain some interpretation of the validity of this choice?
As the reviewer pointed out, these comments and the above ones are related. In summary:
- In panel b, the outliers are related to the interpretation of the matrix as a proximity matrix, which implies setting the diagonal values to zero. More explanations on this issue are given in the response above (pages 4-5, Figure 2);
- In panel c, the scatter is due to the differences in the matrices from which the second eigenvector is computed. There is perfect coincidence between the term $X_{c,2}/\sqrt{k_c}$ and ECI when the second eigenvector is computed from the matrix $N_A$, while there is good (but not perfect) correlation between $X_{c,2}/\sqrt{k_c}$ and ECI, as shown in panel c, when the eigenvector is computed from the matrix $N_B$. As explained in the text (line 86), we find no definitive arguments to prefer $N_A$ or $N_B$ as the basis for our analysis, except for the fact that with $N_A$ the information carried by the first eigenvector is the same as diversification (in fact, $X_{c,1}$ equals $\sqrt{k_c}$).
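The 'perfect coincidence' for $N_A$ can be reproduced with a short sketch, assuming $N_A$ is the symmetrized Method-of-Reflections matrix $D^{1/2}\tilde{M}D^{-1/2}$, with $D = \mathrm{diag}(k_c)$, $\tilde{M}_{cc'} = \sum_p M_{cp}M_{c'p}/(k_c k_p)$ and the diagonal kept. Because $N_A$ and $\tilde{M}$ are similar matrices, every eigenvector of $\tilde{M}$ equals an eigenvector of $N_A$ divided by $\sqrt{k_c}$: the constant first eigenvector of $\tilde{M}$ corresponds to $X_{c,1} \propto \sqrt{k_c}$, and the standardized second one is ECI. The last lines compute the analogous quantity from an assumed $N_B$ (weights $W_{cp} = M_{cp}/(k_c k'_p)$, zero diagonal), which only correlates with ECI. The data are invented.

```python
import numpy as np

# Invented binary country-product matrix
rng = np.random.default_rng(3)
M = (rng.random((30, 50)) < 0.3).astype(float)
M[:, 0] = 1.0                                        # every country exports something
M[rng.integers(0, 30, 50), np.arange(50)] = 1.0      # every product has an exporter

k_c = M.sum(axis=1)                                  # diversification
k_p = M.sum(axis=0)                                  # ubiquity
sqrt_k = np.sqrt(k_c)


def standardize(v, ref):
    """Zero mean, unit variance, sign chosen to correlate positively with ref."""
    z = (v - v.mean()) / v.std()
    return z * np.sign(np.corrcoef(z, ref)[0, 1])


# Standard MR/ECI route: second eigenvector of the row-stochastic matrix M_tilde
M_tilde = (M / k_c[:, None]) @ (M / k_p[None, :]).T  # sum_p M_cp M_c'p / (k_c k_p)
vals, vecs = np.linalg.eig(M_tilde)
order = np.argsort(-vals.real)
eci = standardize(vecs[:, order[1]].real, k_c)

# Symmetric route (assumed form of N_A, diagonal kept): N_A = D^{1/2} M_tilde D^{-1/2}
N_A = sqrt_k[:, None] * M_tilde / sqrt_k[None, :]
vecs_A = np.linalg.eigh(N_A)[1]                      # ascending eigenvalues
X1_A, X2_A = vecs_A[:, -1], vecs_A[:, -2]
print(np.allclose(np.abs(X1_A), sqrt_k / np.linalg.norm(sqrt_k)))  # X_c1 proportional to sqrt(k_c)
print(np.allclose(standardize(X2_A / sqrt_k, k_c), eci))          # ECI from X_c2 / sqrt(k_c)

# Assumed N_B from the linearized FC weights: a correlation, not an identity
k_prime = (M / k_c[:, None]).sum(axis=0)
W = M / (k_c[:, None] * k_prime[None, :])
N_B = W @ W.T
np.fill_diagonal(N_B, 0.0)
X2_B = np.linalg.eigh(N_B)[1][:, -2]
print(np.corrcoef(standardize(X2_B / sqrt_k, k_c), eci)[0, 1])
```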
Also, we thank the reviewer for the useful suggestion of adding some comments about the outliers; we included them at line 138 of the revised manuscript, as described above in this reply letter (page 5).
10. lines 225-227: "As such… countries." I don't agree with this statement. All the economic complexity metrics are essentially more or less sophisticated ways of measuring topological features of the locations-activities networks (see minor comment 1 below). While mathematically these metrics can have better or worse properties, it is only by comparing them against other sources of information that one can understand their actual value. If we know these metrics, do we know something more about the state of the system? Are we able to make better predictions? Or to make them with fewer assumptions? With less or noisier data?
11. Following my previous comment: can the authors give some idea of how GENEPY compares with some standard macroeconomic features of countries (e.g., GDP)? Is there a reason to believe that GENEPY actually carries more information about these than Fitness or ECI?

We thank the reviewer for these useful insights about our work. In this work our aim is to show that the two most used measures of EC, Fitness and ECI, can be reconciled in a unique measure of complexity, the GENEPY.
With the sentence the reviewer is referring to, we highlight that the GENEPY embeds two variables, i.e., the two eigenvectors, that can be used to trace the growth trajectories of countries, as driven by economic complexity, in a 2D plane, without invoking other macroeconomic variables. The idea at the foundation of economic complexity is to find quantitative metrics that can complement the more standard ones in describing wealth and economic growth. In this sense, comparing these metrics with more standard ones (e.g., per capita GDP) can be a useful exercise, but it leaves much room for interpretation and discussion. Should one be happy to find a high correlation of economic complexity metrics with GDP pc (because this entails robustness of the methods, see [a,b,i]), or should a low correlation coefficient instead be seen as good news (because it would be taken as a clue that independent information has been added to the system)? Some of the angry debates that have characterized the economic complexity field in recent years have been based on these arguments: aiming to reconcile the contrasting views that emerged during these discussions, we deliberately decided to stay out of the arena.
We have edited lines 243-246 as follows: "...that most applications require. As such, the chance of maintaining the simplicity of a data-driven approach endows the GENEPY framework with the main founding reason for which economic complexity was born, i.e., to provide the ground for a more quantitative, data-driven approach to the assessment of the economic growth potential of countries as guided by knowledge [n]." Regarding this argument, we added some comments in the Discussion section.
[n] Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A., & Pietronero, L. (2013)
1. In the abstract the authors state that Economic Complexity is about the diversity of product exports. While this was true for the original works, nowadays the field has extended well beyond that, and the more general subject of EC is better described as the diversity of (human) activities. Especially in economics, EC has often been mistaken for a framework to study global trade, and I think it is important not to further fuel this misunderstanding.
Good observation. The aim of the sentence the reviewer is referring to ("…economic complexity metrics, based on the diversity and sophistication of the products countries export") was to sum up the core of the metrics in a few words, but we agree that nowadays this sentence could be misleading and shallow. We edited the abstract as follows: "…economic complexity metrics, aiming at uncovering the productive knowledge of countries".
2. line 179: does it really make sense to multiply an intensive complexity measure by the population of the country? What does that mean?

As the reviewer also pointed out in the previous comments, EC metrics (intensive complexity measures) are typically compared with standard (intensive) economic features such as the GDP per capita [a,b]. For the sake of comparison, since we adopted the methodology in [o], which computes the evolving barycenter of the world weighted by the GDP (absolute value), we defined the barycenter of the world according to the GENEPY of countries by multiplying the metric by the size of each country's population. We added a comment on this point at line 203:
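As an illustration of the barycenter construction described above, the following is a minimal Python sketch, not the implementation of [o]. It assumes that each country is represented by a single (latitude, longitude) point and that all weights are non-negative; `weighted_barycenter` is a hypothetical helper name, and the coordinates, scores and populations in the usage lines are made-up numbers. Points are averaged as 3D unit vectors so that longitudes wrapping around ±180° are handled correctly.

```python
import numpy as np

def weighted_barycenter(lat_deg, lon_deg, weights):
    """Weighted centre of mass of points on a sphere, returned as (lat, lon) in degrees.

    Points are mapped to 3D unit vectors, averaged with the given (non-negative)
    weights, and the mean vector is projected back onto the sphere.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    w = np.asarray(weights, dtype=float)
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=1)
    x, y, z = (w[:, None] * xyz).sum(axis=0) / w.sum()
    return np.degrees(np.arctan2(z, np.hypot(x, y))), np.degrees(np.arctan2(y, x))

# Hypothetical inputs: an intensive index becomes an extensive weight once it is
# multiplied by population, mirroring how absolute GDP is used as a weight.
lat = [40.0, 35.0, 50.0]
lon = [-100.0, 105.0, 10.0]
genepy = np.array([0.8, 0.6, 0.9])            # made-up intensive scores
population = np.array([330e6, 1400e6, 80e6])  # made-up populations
print(weighted_barycenter(lat, lon, genepy * population))
```

Tracking this point year by year with time-varying weights would yield a path directly comparable with the GDP-weighted one.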
"…for its population value in time, thus allowing for a fair comparison with the path followed by the GDP (in absolute value) in time."

To conclude, I have found this contribution very interesting, at least in principle. However, I think the manuscript in its present form does not allow one to evaluate whether this contribution adds substantially to the field, or whether it just provides a neat mathematical framework with no additional forecasting or explanatory power. My comments mostly aim at getting hints about this point (actually proving a better forecasting power would probably be beyond the scope of this paper). I don't really expect the authors to perform all the exercises that I propose; I just intend them as a stimulus for the authors to find ways to better show the strength of their contribution.

We thank the reviewer for these comments and insights, which we think have been properly addressed. As for the forecasting and explanatory power, we already commented on the intrinsic difficulty of finding an agreed choice of the variable(s) to be forecast or explained; in many cases the GDP pc is selected as the target, but this is partially counterintuitive within a framework that tries to innovate data-based economics from its very foundations. In our view, some of the acrimony in the debate that developed about this field in the scientific literature can be ascribed to the lack of a clearly defined target for these metrics. In this context, our approach does not only add neatness to the mathematical framework but, most importantly, lets economic insights naturally emerge from the mathematics.
Reviewer #2 (Remarks to the Author):
The paper provides an interesting and novel insight on a series of previously proposed metrics to assess in a synthetic way the capability stock of countries. It surely contributes to the previous literature in a constructive way providing a general framework to better understand the differences of existing metrics and how to, potentially, improve them, especially the Fitness.
The paper is well written, it is extremely neat and pleasant to read.
We are glad to read these positive comments and we thank this reviewer.
I would have only two major remarks. i) Figure 2 and the related analysis in the text: differently from the first part, I find the analysis a bit weak and the three regimes sound a bit anecdotal. The authors should make this part more quantitative. As a simple (and minimum) step in this direction, they should somehow smooth and aggregate the dynamics in their plane to show the emerging flow, as done in [1,2,3] for instance, and try to show the three regimes. The analysis would be even more robust and complete if the authors tried an analysis in the spirit of what was done in [4], where a simple Granger causality analysis shows that there are different regimes of interaction between two variables from the perspective of a dynamical system (the Granger test is just a suggestion; it can be any other technique the authors consider appropriate to quantify the effects in this plane). We should expect that in the bottom-left corner X1 and X2 are essentially unrelated (causality speaking); in the middle part they should be related, and perhaps one is leading the other (this would be an interesting question to address); in the top-right part, I do not know what to expect a priori.
We thank the reviewer for the useful suggestion, which allowed us to improve the manuscript. We agree on the need to reinforce the analysis of Figure 2 of the main text. To make the analysis more informative, we followed a path different from the one suggested by this reviewer. In fact, the two components X1 and X2 evolve simultaneously, both being determined by the way the trade matrix changes year by year. In this framework, it seems meaningless to investigate whether one component is driving the other, because both of them descend from the same information.
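To make the smoothing and aggregation step concrete, here is a minimal Python sketch of one possible implementation (our assumption, not the code behind the figures of the manuscript or of this reply). Each country's trajectory in the X1-X2 plane is summarised by the displacement between the centre of mass of its first and last `window` years; displacements are then averaged within the cells of a regular grid, giving a coarse flow field of the kind the reviewer refers to. The array layout, the function names and the grid size are hypothetical, and the usage lines run on synthetic random walks.

```python
import numpy as np

def window_displacements(traj, window=3):
    """Displacement of each country from its early to its late centre of mass.

    traj: array of shape (n_countries, n_years, 2) holding (X1, X2) per year.
    Returns (start, delta), each of shape (n_countries, 2).
    """
    traj = np.asarray(traj, dtype=float)
    start = traj[:, :window, :].mean(axis=1)  # centre of mass of the first `window` years
    end = traj[:, -window:, :].mean(axis=1)   # centre of mass of the last `window` years
    return start, end - start

def binned_flow(start, delta, bins=5):
    """Average displacement within each cell of a regular grid over the start points."""
    x_edges = np.linspace(start[:, 0].min(), start[:, 0].max(), bins + 1)
    y_edges = np.linspace(start[:, 1].min(), start[:, 1].max(), bins + 1)
    # np.digitize returns 1..bins for interior points; clip so the maximum lands in the last cell
    ix = np.clip(np.digitize(start[:, 0], x_edges) - 1, 0, bins - 1)
    iy = np.clip(np.digitize(start[:, 1], y_edges) - 1, 0, bins - 1)
    flow = np.full((bins, bins, 2), np.nan)   # empty cells stay NaN
    for i in range(bins):
        for j in range(bins):
            sel = (ix == i) & (iy == j)
            if sel.any():
                flow[i, j] = delta[sel].mean(axis=0)
    return x_edges, y_edges, flow

# Synthetic example: 154 random-walk "countries" observed over 23 years
rng = np.random.default_rng(0)
demo = np.cumsum(rng.normal(scale=0.1, size=(154, 23, 2)), axis=1)
start, delta = window_displacements(demo, window=3)
x_edges, y_edges, flow = binned_flow(start, delta, bins=4)
print(np.round(flow, 2))
```

Averaging within cells trades spatial resolution for robustness; with few countries per cell, a kernel-weighted average would be a smoother alternative.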
However, we welcome the suggestion of the reviewer to provide a more quantitative approach to our analysis. To this aim, we have analyzed the trend of the trajectories of all the countries for which continuous data in time are available (154 countries). Figure 1 of this reply shows the resulting dynamics. The arrows connect the center of mass of X1 and X2 during the first 3 years of analysis (1995-1998) to the center of mass during the last 3 years (2014-2017). In Figure 2 we show the aggregated dynamics of countries along the knee shape. [...] X1 and X2). Here, countries may lose ground in the plane of growth. Many factors may contribute to these downgrading dynamics. In fact, as described in the manuscript, the economic and financial crises are more likely to be the cause of these drops; also, the entry of new economies into the markets decreases the potential of economies to increase their economic complexity in time. We added some comments on this part from line 163 onward and included this analysis in the SI, S1.2:

"… One recognizes that the ensemble of the trajectories is also knee-shaped: in fact, in each year of analysis the positions of countries in the X1-X2 plane form a knee-like shape, as shown in Figure 1 for the year 2017. The presence of this shape is related to linear algebra and network science (see SI, S1.1). By analysing the aggregated displacements of countries in time from 1995 to 2017 (for details see SI, S1.2, Figure S8) it is possible to identify three regimes of growth in the graph:"

Line 168: "Impasse: the countries that lie within this area on average exhibit a horizontal displacement dynamic, within the borders delimited by low values of …"

Line 174: "Bounce: marked by the crossing of the zero value of the y-axis, this area defines the increment in quantity and quality of the exports. Here, the average dynamics of the countries is uplifting toward higher stages of growth.
Countries …"

Line 178: "Arena: … competitiveness, where the GENEPY index of some countries increases in time, while that of others follows a decreasing path. In fact, in this area countries aim at increasing the … higher scores in X2. However, the entry of new countries into the competitive market is likely to affect other countries' growth. This area …"

ii) The authors often state that the high correlation between two metrics is somehow implicitly a proof that they carry similar information. For instance, it is used to justify that the non-linear specification of the Fitness is likely not needed. While, generally speaking, I essentially agree with the content of the paper, I strongly disagree with the authors on this specific point. A priori, we do not know in which part of the variables the informative content is stored: two variables might be 99% correlated, but the difference in their forecasting power can paradoxically be huge and entirely concentrated in the residual part. Let us consider two variables Z1 = a X + (1-a) Y1 and Z2 = a X + (1-a) Y2, where Y1 is orthogonal to Y2, Y1 is signal, and X and Y2 are noise. If 'a' is close to 1, the two variables will be highly correlated, but only Z1 will be useful to forecast/explain something: the signal-to-noise ratio will surely be weak, but only Z1 will carry signal despite its high correlation with Z2. This is, for instance, the case of ECI and Fitness: they are correlated, but once used in practice to forecast or explain economic dynamics, the discrepancies distinguish the two variables much more than the correlation would suggest. I would suggest either toning down the argument, which is leveraged several times, or, more involved, showing that the linearized and the non-linear Fitness have similar predictive/explanatory power (see for instance the last suggestion). Given the correlation argument only, the conclusion cannot be drawn and the statement should be only descriptive.
As a side comment, the authors should also show that the correlation results do not depend on the specific estimator: if they use Spearman's or Kendall's Tau correlation, do the results change?
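A minimal Python sketch of the estimator check the reviewer asks for is given below; it computes Pearson's r, Spearman's rho (Pearson's r on ranks) and Kendall's tau-a with plain NumPy, so as not to depend on a particular statistics package. Ties are not averaged in the ranking, which is adequate for continuous data such as the toy model used in the usage lines; the weight a = 0.9 and the sample size are illustrative choices. Kendall's tau lives on a different scale from Pearson's r, so the three numbers are not expected to coincide even for the same data.

```python
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v):
    """Ranks 0..n-1 (ties are not averaged; adequate for continuous data)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(x, y):
    """Spearman's rho as Pearson's r computed on ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n (n - 1) / 2), via all pairs."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    # the sum over ordered pairs counts each unordered pair twice
    return float((dx * dy).sum() / (n * (n - 1)))

# Illustrative toy-model data
rng = np.random.default_rng(1)
a, n = 0.9, 200
x, y1, y2 = rng.standard_normal((3, n))
z1, z2 = a * x + (1 - a) * y1, a * x + (1 - a) * y2
print(pearson(z1, z2), spearman(z1, z2), kendall_tau_a(z1, z2))
```

The pairwise Kendall computation is O(n^2) in memory, which is fine for a few hundred countries.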
We thank the reviewer for this good observation. The example proposed above to demonstrate the weakness of our argument on correlation depicts a very peculiar situation: a reasonable one in theory, but not very likely to occur in the real world, where signal and noise are typically inextricably mixed together. However, we accept to follow this reviewer on his ground and better contextualize the example: in order to have a correlation coefficient of 0.99, the example specifies Z1 = 0.99 X + 0.01 Y1 and Z2 = 0.99 X + 0.01 Y2. Since Y1 is signal and X is noise, the predictive power of Z1 on Y1 (the "signal" in the reviewer's example) will be 0.01 when measured in terms of correlation. Of course, no one would be interested, in real-world applications, in a variable with a 1% correlation coefficient with another one: in fact, the correlation coefficient, as computed from finite-size samples, is itself a random variable, i.e., it is known with uncertainty. The probability distribution of the correlation coefficient R between two uncorrelated random variables follows a transformed Student's t-distribution with (n-2) degrees of freedom [a]. Another way to support the same reasoning is to consider that real data are affected by measurement and reporting errors: it is sufficient to perturb the values in the sample with a 1% multiplicative noise to obtain a drop to a 0.99 correlation between a measure and itself (for example, between the non-linear Fitness values computed from the uncorrupted and the corrupted dataset). Unfortunately, trade data carry errors that, due to the corrections and assumptions used to model costs, are typically even larger than 1% [b]. Also, data sanitation [c] introduces corrections that we expect to cause a drop in correlation larger than 1%.
Last but not least, the correlation coefficient between the linearized and the non-linear versions of the algorithm further increases to a stunning 0.999 if one avoids setting the diagonal values of the N matrix to zero (Figure 3 of this reply). In fact, the linearized algorithm in Eqs (13) is solved by computing the eigenvectors of the matrix N, Eqs (15), which provide the linear values of Fitness by means of the mapping in Eqs (14). Instead, as we explain, the GENEPY arises from the eigenvectors of a matrix interpreted as a proximity matrix, i.e., one whose diagonal values are set to zero. Also, interpreting the matrix N as a proximity matrix provides a very good correlation between the non-linearly and the linearly computed values, as we show here in Figure 4. Still, one should be aware that these correlation coefficients are the result of linearization plus self-loop elimination, not of linearization alone. On this issue, a comment has been added from line 75 onward in the revised manuscript:
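To show where self-loop elimination enters, here is a minimal Python sketch that builds a symmetric country-country matrix from a binary country-product matrix, optionally zeroes its diagonal, and extracts the leading eigenvector. The normalization below (each shared product down-weighted by its ubiquity) is a hypothetical choice made for illustration only and is not claimed to be the matrix N of Eqs (15); the function names and the random matrix in the usage lines are also ours.

```python
import numpy as np

def proximity(M, zero_diagonal=True):
    """Country-country co-occurrence matrix from a binary country-product matrix M.

    Hypothetical normalization (illustration only):
    N[c, c'] = sum_p M[c, p] * M[c', p] / k_p, with k_p the ubiquity of product p.
    """
    M = np.asarray(M, dtype=float)
    k_p = M.sum(axis=0)
    k_p[k_p == 0] = 1.0               # avoid division by zero for products nobody exports
    N = (M / k_p) @ M.T               # symmetric by construction
    if zero_diagonal:
        np.fill_diagonal(N, 0.0)      # drop self-loops, as in a proximity matrix
    return N

def leading_eigenvector(N):
    """Eigenvector of the largest eigenvalue of a symmetric matrix, with a fixed sign."""
    _vals, vecs = np.linalg.eigh(N)   # eigenvalues returned in ascending order
    v = vecs[:, -1]
    return v if v.sum() >= 0 else -v  # eigenvector sign is arbitrary; make it mostly positive

# Synthetic binary matrix: 40 countries x 120 products
rng = np.random.default_rng(2)
M = (rng.random((40, 120)) < 0.3).astype(int)
v_full = leading_eigenvector(proximity(M, zero_diagonal=False))
v_prox = leading_eigenvector(proximity(M, zero_diagonal=True))
print(np.corrcoef(v_full, v_prox)[0, 1])
```

Comparing `v_full` and `v_prox` isolates the effect of removing the diagonal, independently of any linearization.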

As for the suggestion to also use other measures of correlation or dependency, we carried out the analysis requested by the reviewer by computing the dependence between the non-linear and the linear FC
"… Surprisingly enough, comparing the linearized first-eigenvector terms with the Fitness values (and, analogously, with the Quality values), this linearization preserves >98% of the information (independently of the kind of correlation indicator chosen, Figure S2), thus questioning ..."

We point out that the caption of Fig. S2 contained a small oversight (see Figure 5 of this reply). In fact, as the figure shows, there are no significant differences between the non-linear and the linear algorithm in maximizing the data-packing of the trade matrix.

As regards the results we present in this work, we are aware that the ground for comparison among economic complexity metrics is their performance in predicting some econometric indicator of the growth performance of countries (such as the GDP per capita).
In this work our aim is to show that the two most used measures of EC, Fitness and ECI, can be reconciled in a unique measure of complexity, the GENEPY. In this context, our approach does not only add neatness to the mathematical framework but, most importantly, naturally lets economic insights emerge from the mathematics. The idea at the foundation of economic complexity is to find quantitative metrics that can complement the more standard ones in describing wealth and economic growth: in this sense, validating EC metrics against some economic indicator would induce a logical loop in the system, where the validation is based on the same variable subject to validation. This kind of comparison of EC metrics with more standard economic ones (e.g., GDP pc) can be a useful exercise, but it leaves much room for interpretation and discussion. We therefore deliberately decided not to enter the validation arena in this work, trusting the validation efforts performed by others on the Fitness and ECI measures [d,e,f] and taking advantage of their results to also support our multidimensional metric, whose components are in fact strictly related to Fitness and ECI.
A minor comment concerns Figure 3, the one with the three different barycenters. The fact that the GDP barycenter is still far from the one provided by the GENEPY index is likely a sign that the potential for economic growth of Asian countries is still strong (even for China), as observed in [3].

This is a very useful comment, many thanks. We have included this observation in the manuscript from line 207 onward:
"[...] poorly impacts the ability of countries to grow economically. The distance between the current positions of the barycenters of GDP and GENEPY may also indicate that Asian countries (China included) still have a strong potential for economic growth, as also stated in [3]."

As a general suggestion, likely for the next paper: it would be interesting to test the GENEPY index in a framework like the one proposed in [3], because this would be the true benchmark to see whether GENEPY carries more information than Fitness and ECI, and whether the linearized part of the Fitness, X1, really carries the same information as its non-linear counterpart.
We thank the reviewer for this suggestion.

Reviewer #1 (Remarks to the Author):
I want to acknowledge the authors' extensive and convincing work in improving the manuscript.
In my opinion the manuscript in its present form is well written, clear and precise in discussing the advancements it proposes. I think that it does represent an important and substantial contribution to the field and therefore deserves publication in Nature Communications.

Andrea Tacchella
Reviewer #2 (Remarks to the Author):
I thank the authors for addressing the two comments. Concerning the first comment on robustifying the trajectories part, I think that now the paper has been improved significantly and the three regimes they find are more convincing.
Concerning the second, I actually disagree both with their reply and, even more, with the new sentence they added: 'thus questioning the relevance of non-linearity to assess the Quality of goods and the Fitness of countries.' My disagreement is based on the fact that I believe there is a small flaw in the reasoning proposed as a reply to the toy model I suggested. Let's go back to the model: Z1 = a * X + (1-a) Y1, Z2 = a * X + (1-a) Y2, where, for the sake of simplicity, X, Y1 and Y2 have unit variance, X is orthogonal to Y1 and Y2, and Y1 and Y2 are also orthogonal. Let b = 1-a for the sake of notation. The first point I disagree on is that the correlation between Z1 and Z2 is not 'a' but rather a/sqrt(a^2 + b^2). Therefore a correlation of 0.98 is close to a scenario with a = 0.90 (the authors proved that for 200 countries a = 0.8 is significant, and a = 0.8 implies a correlation of 97+%). I would therefore say that they are instead proving my point: we are in a regime of correlation where 'a' is such that my example is statistically significant (or very close to being so).
The second point is that they have 200 countries and 20 years. This means that the t-stat they estimated refers to only one year, so the true significance over 20 years is, as a first approximation, the t-stat they estimate times sqrt(20). This means that a scenario with 'b' much lower than 0.10 would now be significant, again confirming my point: we can have high correlation, small 'b', and still discriminate between Z1 and Z2.
Third point: back to the proposed toy model, it is easy to realize that, assuming Y1 is the component useful for the forecast, the best strategy is to build the variable Z1 - Z2, which would dramatically magnify the signal-to-noise ratio if 'b' is small. This shows that the variables can be highly correlated while, from a signal-to-noise point of view, the best scenario lies in the difference of the variables, not in their common part.
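The reviewer's third point can be checked with a short simulation. The Python sketch below is a minimal illustration under the toy model's assumptions (independent standard-normal X, Y1 and Y2, with Y1 as the only signal; the seed, `a = 0.9` and `n = 2000` are illustrative choices). It compares the correlation with Y1 of Z1, Z2 and the difference Z1 - Z2. In this idealized setting the common component cancels exactly, Z1 - Z2 = b (Y1 - Y2), whose population correlation with Y1 is 1/sqrt(2) ≈ 0.71 whatever the value of 'a'; real variables would not, in general, share their common part with identical weights.

```python
import numpy as np

def toy_model(a, n, seed=0):
    """Draw Z1, Z2 and the signal Y1 from the reviewer's toy model (b = 1 - a)."""
    rng = np.random.default_rng(seed)
    b = 1.0 - a
    x, y1, y2 = rng.standard_normal((3, n))  # independent, zero mean, unit variance
    return a * x + b * y1, a * x + b * y2, y1

def corr(u, v):
    return float(np.corrcoef(u, v)[0, 1])

z1, z2, y1 = toy_model(a=0.9, n=2000)
print("corr(Z1, Z2)    =", round(corr(z1, z2), 3))       # population value a^2/(a^2+b^2), about 0.988
print("corr(Z1, Y1)    =", round(corr(z1, y1), 3))       # population value b/sqrt(a^2+b^2), about 0.110
print("corr(Z2, Y1)    =", round(corr(z2, y1), 3))       # population value 0
print("corr(Z1-Z2, Y1) =", round(corr(z1 - z2, y1), 3))  # population value 1/sqrt(2), about 0.707
```

The printed values are sample estimates; the comments give the corresponding population values under the stated assumptions.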
I really think that the paper is worth publishing, but the authors have to remove the sentences they added and, as said in my first review, they have to mitigate the implicit statement that 'since the linear Fitness and the non-linear Fitness are highly correlated, there is no evidence for going non-linear.' High correlation does not imply the same forecasting power, and we are in a regime where the proposed toy model shows that the component that is not common could be statistically significant, as discussed above. Actually, the authors themselves have shown that for a = 0.8 the results would be significant with only 1 year of observations.

Reviewer #1 (Remarks to the Author):
I want to acknowledge the authors' extensive and convincing work in improving the manuscript.
In my opinion the manuscript in its present form is well written, clear and precise in discussing the advancements it proposes. I think that it does represent an important and substantial contribution to the field and therefore deserves publication in Nature Communications.
We are grateful to read this positive comment and to have the approval for publication. We thank this Reviewer for the time dedicated to our work.

Reviewer #2 (Remarks to the Author):
I thank the authors for addressing the two comments. Concerning the first comment on robustifying the trajectories part, I think that now the paper has been improved significantly and the three regimes they find are more convincing.
Thanks a lot for your appreciation of our work and for the kind suggestions provided in the first and in this review round: we agree that the revised version we submitted after the first review round is more convincing than the original manuscript, in particular in identifying the three regimes. We are also confident that the additional changes we are now introducing, following this Reviewer's advice, further improve the quality of the manuscript.
Concerning the second, I actually disagree both with their reply and, even more, with the new sentence they added: 'thus questioning the relevance of non-linearity to assess the Quality of goods and the Fitness of countries.'

[...] "The fact of having found very similar results between the linear and the non-linear versions of the FC algorithm cannot be systematically generalized to other cases: in fact, some bipartite systems may require a genuinely non-linear approach to let their nested nature emerge. However, the good results obtained in this case suggest that there are also systems where non-linearity plays a minor role. We speculate that this might be related to the differences in the decision-making processes ruling these systems." We believe the new version of the manuscript preserves all of its value after these modifications, with the additional advantage that on this specific issue we leave the reader with all the needed quantitative information (scatter plot in Figure 1 and values of the correlation coefficients), while we leave out the qualitative reasoning.
Our reply to this Reviewer could even stop here, since we have strictly followed the Reviewer's comment (more precisely, we interpreted it in a wider sense than requested, by extending his request to other sentences) and "demined" our paper of possible misunderstandings deriving from this "correlation issue".
However, we believe scientific reviews, especially when written with a constructive purpose as in this case, also offer the ideal ground for discussing relevant scientific issues which, in some cases like the present one, even cross the borders of the specific paper under review. With this intention, we add below some further considerations on the points raised by the referee, in an attempt to make the most of the time and effort he has put into this review.
My disagreement is based on the fact that I believe there is a small flaw in the reasoning proposed as a reply to the toy model I suggested. Let's go back to the model: Z1 = a * X + (1-a) Y1, Z2 = a * X + (1-a) Y2, where, for the sake of simplicity, X, Y1 and Y2 have unit variance, X is orthogonal to Y1 and Y2, and Y1 and Y2 are also orthogonal. Let b = 1-a for the sake of notation.
Thanks for summarizing the model. For the benefit of the Editor, we recall that in the model X and Y2 are random noise, while Y1 is signal. The purpose of the model is to demonstrate that there might be cases in which two variables (here Z1 and Z2) are highly correlated (hence a is large), but the signal is embedded in only one of them (namely Z1, through the effect of Y1) and not in the other.
The first point I disagree on is that the correlation between Z1 and Z2 is not 'a' but rather a/sqrt(a^2 + b^2).
Thanks for this comment; there was indeed some confusion in our previous reply. We try to remedy it here with a more detailed description. For the variables involved we have E(X) = E(Y1) = E(Y2) = E(Z1) = E(Z2) = 0 and σ(X) = σ(Y1) = σ(Y2) = 1, where E() is the mean operator and σ is the standard deviation.
In our work we found that the linear and non-linear versions of FC have a correlation coefficient of 0.995 (average value over the available years). By setting ρ(Z1, Z2) = 0.995 we find a = 0.935. Under this scenario, the correlation between Z1 and Y1 is ρ(Z1, Y1) = 0.069.
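The values above follow from a short derivation under the stated assumptions of the toy model (zero means, unit variances, and mutually uncorrelated X, Y1 and Y2, with b = 1 - a); we sketch it here for completeness, in our own notation.

```latex
\begin{aligned}
Z_1 &= aX + bY_1, \qquad Z_2 = aX + bY_2, \qquad b = 1 - a,\\
\operatorname{Cov}(Z_1, Z_2) &= a^2, \qquad \operatorname{Var}(Z_1) = \operatorname{Var}(Z_2) = a^2 + b^2,\\
\rho(Z_1, Z_2) &= \frac{a^2}{a^2 + b^2}, \qquad
\rho(Z_1, Y_1) = \frac{\operatorname{Cov}(Z_1, Y_1)}{\sigma(Z_1)\,\sigma(Y_1)} = \frac{b}{\sqrt{a^2 + b^2}}.
\end{aligned}
```

With a = 0.935 these give ρ(Z1, Z2) ≈ 0.995 and ρ(Z1, Y1) ≈ 0.069. For comparison, the correlation of either variable with the common component X is ρ(Z1, X) = a/sqrt(a^2 + b^2).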
The average correlation coefficient between the linear and non-linear versions of the algorithm rises to 0.99985 (on average across the years) when the linear algorithm is implemented without setting the diagonal of the proximity matrix to zero (see Methods and Figure S2, Figure S5). If ρ(Z1, Z2) is set to this value, the toy model gives a = 0.988 and ρ(Z1, Y1) = 0.012.
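The mapping between 'a' and the two correlations can be checked numerically. The Python sketch below (the function names are ours, chosen for illustration) evaluates ρ(Z1, Z2) and ρ(Z1, Y1) for a given 'a', and inverts ρ(Z1, Z2) = a^2/(a^2 + (1 - a)^2) in closed form.

```python
import math

def toy_correlations(a):
    """Return (rho(Z1, Z2), rho(Z1, Y1)) for the toy model with b = 1 - a."""
    b = 1.0 - a
    s2 = a**2 + b**2                  # Var(Z1) = Var(Z2)
    return a**2 / s2, b / math.sqrt(s2)

def a_from_rho(rho):
    """Invert rho = a^2 / (a^2 + (1 - a)^2) for 0 < a < 1 (requires 0 < rho < 1)."""
    s = math.sqrt(rho / (1.0 - rho))  # s = a / (1 - a)
    return s / (1.0 + s)

for a in (0.935, 0.988):
    r12, r1y = toy_correlations(a)
    print(f"a = {a}: rho(Z1,Z2) = {r12:.5f}, rho(Z1,Y1) = {r1y:.3f}")
print(f"a for rho = 0.99985: {a_from_rho(0.99985):.3f}")
```

To the quoted precision this reproduces the pairs 0.995/0.069 and 0.99985/0.012. Near ρ = 1 the inversion is very sensitive: ρ = 0.995 maps to a ≈ 0.934, while ρ ≈ 0.9952 already maps to a ≈ 0.935.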
I would therefore say that they are instead proving my point: we are in a regime of correlation where 'a' is such that my example is statistically significant (or very close to being so).
A value ρ(Z1, Y1) = 0.069 is not significant at the 5% level (one-sided) with respect to a null hypothesis of no correlation between two variables, for samples of 200 observations. In fact, the critical value is 0.1166 [a, b]. The p-value corresponding to ρ = 0.069 is 0.17: in other words, there is a 17% probability of obtaining a value larger than 0.069 from pairs of samples of size 200 drawn from a bivariate distribution in which the two variables are uncorrelated [a, b].
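The critical value and p-value quoted above can be reproduced from the exact null distribution of Pearson's r for n bivariate-normal pairs, whose density is proportional to (1 - r^2)^((n - 4)/2). The Python sketch below integrates this density numerically with NumPy (a choice made to avoid depending on a statistics library; the function names and grid size are ours) and reports one-sided quantities, which is the convention matching the 0.1166 and 0.17 figures.

```python
import numpy as np

def null_tail_curve(n, num=400_001):
    """Upper-tail probabilities P(R >= r) of the sample Pearson r when the true correlation is 0.

    Uses the exact null density for n bivariate-normal pairs,
    f(r) proportional to (1 - r^2)^((n - 4) / 2), integrated with the trapezoid rule.
    """
    grid = np.linspace(-1.0, 1.0, num)
    dens = (1.0 - grid**2) ** ((n - 4) / 2.0)
    areas = 0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)
    tail = np.append(np.cumsum(areas[::-1])[::-1], 0.0)  # tail[i] = mass between grid[i] and 1
    return grid, tail / tail[0]                          # normalize by the total mass

def one_sided_p(r, n):
    grid, tail = null_tail_curve(n)
    return float(np.interp(r, grid, tail))

def one_sided_critical(n, alpha=0.05):
    grid, tail = null_tail_curve(n)
    return float(grid[np.argmax(tail <= alpha)])         # first r whose upper tail is <= alpha

print(f"one-sided 5% critical value (n = 200): {one_sided_critical(200):.4f}")
print(f"one-sided p-value for r = 0.069:       {one_sided_p(0.069, 200):.3f}")
```

The same functions give the corresponding figures for any other sample size, e.g. when pooling several years of data.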
Another way to visualize the same outcome is to numerically simulate the system of equations defining Z1 and Z2 (with a = 0.935) and plot the resulting relation between Y1 and Z1. Figure 1 reports the results for 20 pairs of samples of size 200, with the resulting regression lines in red.
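A minimal Python version of this simulation is sketched below. It assumes independent standard-normal X, Y1 and Y2, so that orthogonality holds in expectation rather than exactly in each finite sample; plotting is left out, and only the sample correlations and the OLS slopes of Y1 on Z1 are returned. The seed and the function names are illustrative.

```python
import numpy as np

def simulate_toy(a, n, rng):
    """One sample of size n from the toy model; returns (Z1, Z2, Y1)."""
    b = 1.0 - a
    x, y1, y2 = rng.standard_normal((3, n))   # independent unit-variance components
    return a * x + b * y1, a * x + b * y2, y1

def replicate(a=0.935, n=200, reps=20, seed=0):
    """Sample correlations and OLS slopes of Y1 on Z1 over `reps` independent samples."""
    rng = np.random.default_rng(seed)
    corrs, slopes = [], []
    for _ in range(reps):
        z1, _z2, y1 = simulate_toy(a, n, rng)
        corrs.append(np.corrcoef(z1, y1)[0, 1])
        slopes.append(np.polyfit(z1, y1, 1)[0])  # slope of the regression line of Y1 on Z1
    return np.array(corrs), np.array(slopes)

corrs, slopes = replicate()
print("correlations:", np.round(corrs, 3))
print("slopes:      ", np.round(slopes, 3))
```

Under these assumptions the population correlation is about 0.069 and the population slope b/(a^2 + b^2) is about 0.074, so individual samples scatter around values that are not distinguishable from zero at n = 200.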