Abstract
Economic growth is associated with the diversification of economic activities, which can be observed via the evolution of product export baskets. Exporting a new product is dependent on having, and acquiring, a specific set of capabilities, making the diversification process path-dependent. Taking an agnostic view on the identity of the capabilities, here we derive a probabilistic model for the directed dynamical process of capability accumulation and product diversification of countries. Using international trade data, we identify the set of pre-existing products, the product Ecosystem, that enables a product to be exported competitively. We construct a directed network of products, the Eco Space, where the edge weight corresponds to capability overlap. We uncover a modular structure, and show that low- and middle-income countries move from product communities dominated by small Ecosystem products to advanced (large Ecosystem) product clusters over time. Finally, we show that our network model is predictive of product appearances.
Introduction
There is strong evidence that as countries experience economic growth, they change what they do and undergo structural transformation via diversification of their economic activities^{1,2} by increasing the number of industries that they have comparative advantage in. Emergence of a particular industry in a country depends on the availability of different combinations of capabilities, including various factors like capital, labour, and productive knowledge^{3,4,5}. From this viewpoint, countries grow as they acquire productive knowledge and/or capabilities, and learn to combine these complementary capabilities in order to move into new economic activities. Hence, industrialisation is mostly a path-dependent process, whereby the appearance of new industries and economic activities is conditional on having or acquiring the relevant capabilities and know-how^{3,4,5,6,7,8}.
Drawing up an exhaustive list of capabilities and/or the productive knowledge required for an industry is challenging. For instance, for a country to develop the fresh-cut flower industry, it requires capabilities such as cold storage facilities, airports, irrigation systems, suitable climate, efficient customs and a good business environment, as well as knowledge embedded in its farmers, botanic experts, engineers, logistics specialists, marketing professionals, bureaucrats and business executives, to name but a few. This list is by no means exhaustive and the listed components might not be independent of each other. Since these capabilities are difficult to observe and measure, we do not try to uncover their identities; instead we seek to quantify their existence, drawing inspiration from biology and, in particular, the study of genetics. In genetics, observed phenotypes are the result of genotypes encoded in genetic material. Mendel, in his landmark study^{9}, recorded the phenotypes present in successive generations of peas without directly observing the underlying genes and DNA structure. Hence, valuable information can be gathered by observing the phenotypic traits of individuals when the underlying genetic structure is unknown. Furthermore, by observing which phenotypic traits often co-occur in individuals, or which traits often follow each other, we can uncover genetic relationships or distances^{10}. The genetic distance between phenotypes is relevant, for example, to inferring relationships between diseases^{11}.
Here we take an agnostic view on the identity of the capabilities and we derive a probabilistic model to describe the directed dynamic process of capability accumulation and product diversification of countries. We use the presence and appearance of industries in countries (phenotypes) to infer capability- and know-how-based (genotypic) relationships between industries. Using our genetics-inspired industry capability distance, and modelling industrial diversification as a process by which countries accumulate capabilities and move into new industries that share existing capabilities, we can predict the emergence of new industries.
A number of well-established models of economics can be interpreted from a genetic or phenotypic perspective. For instance, standard trade theories first proposed by Ricardo^{12}, and Heckscher and Ohlin^{13} take complementary approaches which can be thought of as phenotypic and genotypic stances, respectively, to explain trade patterns between countries. For example, a recent and celebrated version of the Ricardian model developed by Eaton and Kortum^{14} proposes that technological differences across countries, and the relative evolution of productivity across exports, determine the pattern of production in the world. These authors do not seek to uncover the causes behind the observed pattern, hence implicitly taking a phenotypic view of international trade. On the other hand, the Heckscher–Ohlin model ties trade patterns to factor differences between countries, and proposes that the relative abundance of factors (labour, capital etc.) shapes the production choices of a country. This model takes a genetic perspective, yet quickly becomes intractable for large numbers of factors and products, constraining detailed insights into diversification processes.
Turning to models of structural transformation and diversification, understanding these processes at a detailed level has been of keen interest for policymakers and practitioners. However, analytical intractability and measurement problems require economists to often focus on few core productive factors such as capital, labour, human capital and institutions^{15,16} and technological differences^{17,18,19}, usually taking a genetic perspective albeit with a limited number of factors. But these models struggle to adequately describe structural transformation at a disaggregate level. Here, we exploit the fact that we can observe and measure the phenotypes, namely the presence of industries in countries, and propose a phenotypic approach to modelling the process underlying structural transformation at a detailed level.
To date, two coupled but distinct modelling approaches have emerged aiming to describe the path-dependent process of diversification based on capabilities and productive knowledge using a phenotypic approach. The first is focused on empirically estimating the number of complementary capabilities, or complexity, needed to make a product (or present in a technology or place)^{4,20}. While a variety of approaches have been proposed, the foundational method to estimate product complexity^{4,21} uses information on which countries make what products to infer product capability requirements. This model assumes that complex products can only be made by countries which have many capabilities, and hence, also make many other products. It has been shown that the aggregate complexity level of a country is a strong predictor of its future income growth compared to standard variables often associated with country sophistication such as education and quality of government.
A second class of models seeks to map the path-dependent dynamical process by which countries move into new products^{3}. These are connected, both theoretically and methodologically, to the study of regional and urban industrial diversification^{5,7,8}, and are based on the assumption that countries will move into products similar to their current export (capability) basket. At the forefront of these models, the Product Space^{3} is a network of products with edges based on cross-sectional export data. Under the assumption that a product pair requiring similar capabilities will be co-exported by many countries, the (cross-sectional) co-export probability of any two products is assumed to be related to the capability overlap. The location of products made by a country in this network determines its future diversification potential. Countries with products in denser parts of the network have more options, while those on the periphery share capabilities with few other products. In related work, Zaccaria et al.^{22}, similarly inspired by a capability-based approach, created a taxonomy of products based on the excess conditional probability of producing a product in the presence of other products, also using cross-sectional data. By selecting the maximum among the excess probabilities, a product hierarchy tree is generated and used to model the dynamics of the product diversification of countries. The ability of these network models, and others like them, to generate detailed metrics related to diversification processes has propelled the field into development and industrial policymaking at the global, national and regional level^{2,23}.
Yet, these dual modelling approaches, capturing slightly different elements of the same underlying process, have not been unified to date. Importantly, these models are motivated via a capability-based narrative, as introduced above, but they are not underpinned by a mathematical model that explicitly takes capabilities into account. Additionally, they do not address the temporal aspect of the diversification process as a result of capability accumulation directly. Furthermore, they omit a large amount of available information on the patterns of diversification observed over the past couple of decades worldwide. Here we seek to develop a unified model, which is theoretically grounded in the path-dependent accumulation of capabilities and products, and utilises the available data for international export diversification.
Building on Hausmann and Hidalgo^{24}, who developed a capability-based Leontief-like production function, we propose a model to describe the pattern of product appearances within and across countries based on capability accumulation. Within this framework, a country will jump to a new product with probability decreasing in the number of missing capabilities to make the product. We infer the capabilities possessed by a country by looking at the capabilities of the products it currently produces. The ability of a country to diversify is, hence, dependent on its current product basket. Countries with many existing products will have few missing capabilities, and many options for diversification. Hence, the pattern of product appearances contains information about the underlying capability overlap between products. We derive an expression for the probability that a product (say product j) is already present given the subsequent appearance of product i, and use this to infer the extent of capability overlap between the product pair i and j. The Ecosystem of a product i is then the overlap of product i with all other products j. We empirically estimate this capability overlap using product presences and appearances in international export data from 1984 to 2016.
What does it mean for a product j to have a large value in the Ecosystem of product i? There are several implications that come directly from the model. First, it means that products i and j share capabilities and the extent of overlap between the capabilities is captured by the value of the Ecosystem entry. Secondly, product j often precedes the product i in appearances in the world (if the value of the Ecosystem entry is large), giving us a directional relationship. Third, countries that have j have a higher probability to jump to product i. This dynamic aspect of the Ecosystem as captured by this precedence relationship is one of the most important differences compared to the Product Space.
In order to explore path-dependent diversification processes, we construct a weighted directed network, the Eco Space. The direction of the edges connecting nodes (products) represents export precedence, and the edge weight is given by our estimate of capability overlap. We analyse a range of network characteristics, including node in-degree and out-degree. Nodes with high in-degree (equivalent to the size of the product Ecosystem) are typically complex products, requiring many inputs. Nodes with high out-degree, on the other hand, are typically less sophisticated products which contribute to the Ecosystems of many other products. We show that the majority of non-zero directed edges (over 80%) transition from low complexity to high complexity products as we would expect under a capability-accumulation model. We also compute the node betweenness centrality, a measure of the number of shortest paths that transition through a node. Such nodes exhibit both high in- and out-degree: they are transition products typically produced by low and middle income countries as they move into more sophisticated products.
We investigate the structure of this network, finding that it exhibits a modular structure composed of a number of well-defined product communities (clusters). These communities are composed of groups of products that share similar capabilities, and are detected via an algorithm based on random walker dynamics^{25}. In essence, if allowed to jump from node to node on a network with probability proportional to edge weight, a random walker will become trapped in regions of the network exhibiting high internal connectivity. Deploying this method, we identify five stable communities in the Ecosystem network. We explore the evolution of countries based on the location of product appearances in the network: countries tend to diversify along the arrow of development, starting in an origin community which is composed of high out-degree products, and eventually concentrating in a variety of distinct but interconnected destination communities composed of high in-degree products.
Finally, using an out-of-sample approach, we show that our model (empirically estimated from export data for the period 1984–2009) is informative in predicting the emergence of new products in the exports of countries for the period 2010–2016. We can interpret this result as suggesting that a country with an export basket proximate (in terms of capability gap) to a particular product is more likely to competitively export that product in the future. This model compares favourably with the Product Space^{3} in terms of the prediction of export appearances.
Results
Productive Ecosystems
In order to model the process of product diversification via capability accumulation, we build on Hausmann and Hidalgo^{24}. According to this Leontief-based model, products require a large number of capabilities in order to be made, and countries can only make a product if they possess all the required capabilities. We denote the vector of capabilities of a product i, p_{i} ∈ {0, 1}^{m} where m denotes the (unknown) number of capabilities and p_{ik} = 1 if product i requires capability k. Analogously, the capability vector c_{n} ∈ {0, 1}^{m} encodes the capabilities present in country n. Neither of the vectors p_{i} or c_{n} is directly observable, but both serve as intermediate inputs into our model.
In Hausmann and Hidalgo^{24}, the authors develop a model based on the capability endowments of countries and the capability requirements of products in order to explain cross-sectional patterns in the distribution of product presences across countries. This model is based on a Leontief-like production function whereby a product i is produced in country n if and only if country n has all of the capabilities required by product i. The number of capabilities that product i requires is \(\parallel {{\bf{p}}}_{{\bf{i}}}{\parallel }_{1}={{\bf{p}}}_{{\bf{i}}}^{T}\cdot {{\bf{p}}}_{{\bf{i}}}\) where ∥·∥_{1} denotes the 1-norm (which, for a binary vector, coincides with the inner product of the vector with itself) and ^{T} denotes the transpose of the vector. In the remainder of the paper, we only use this norm, so we will skip the subscript in the norm and the transpose sign when we are calculating inner products. Hence, country n produces product i if and only if c_{n} ⋅ p_{i} = ∥p_{i}∥. The model is solved assuming that a country has, and a product requires, each capability with a constant probability.
Focusing on modelling the temporal dynamics of diversification, and specifically product appearances, here we assume that country n will start making a product i at a future time t_{1}, which it does not currently make, with a probability that decreases with the number of capabilities that are not present in the country but required for product i (at some initial time t_{0}). Formally, if product i requires ∥p_{i}∥ capabilities, and country n has c_{n} ⋅ p_{i} of them, country n needs to acquire ∥p_{i}∥ − c_{n} ⋅ p_{i} capabilities in order to produce product i. We name this difference the capability gap between the capability vector of the country and capability requirement vector of the product. The probability that country n will start making product i decreases as the size of this gap increases. Following Hausmann and Hidalgo^{24}, we can assume that the probability of acquiring a capability is binomial with mean q. Hence,
$$P\left({J}_{n,i}^{{t}_{0}\to {t}_{1}}=1\right)\propto {q}^{\parallel {{\bf{p}}}_{{\bf{i}}}\parallel -{{\bf{c}}}_{{\bf{n}}}\cdot {{\bf{p}}}_{{\bf{i}}}},\qquad (1)$$
where \({J}_{n,i}^{{t}_{0}\to {t}_{1}}=1\) if product i, which was absent in time t_{0}, appears in country n at time t_{1}, and 0 otherwise (to minimize notational clutter, we will omit the time indices). Since 0 < q < 1, the probability of jump decreases with an increase in the capability gap.
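The Leontief condition and the capability gap can be illustrated with a short sketch. It assumes, as one reading of the text above, that the jump probability scales as q raised to the capability gap; the function names and the toy vectors are hypothetical, since the true capability vectors are unobservable.

```python
def capability_gap(country_caps, product_caps):
    """Capabilities required by the product that the country lacks: ||p|| - c.p."""
    required = sum(product_caps)
    held = sum(c * p for c, p in zip(country_caps, product_caps))
    return required - held


def can_produce(country_caps, product_caps):
    """Leontief-like condition: production is possible only if no capability is missing."""
    return capability_gap(country_caps, product_caps) == 0


def jump_probability(country_caps, product_caps, q):
    """Assumed form: q ** gap, up to a proportionality constant, with 0 < q < 1."""
    return q ** capability_gap(country_caps, product_caps)


# Toy binary vectors over m = 4 hypothetical capabilities.
country = [1, 1, 0, 1]
product = [1, 0, 1, 1]
gap = capability_gap(country, product)
prob = jump_probability(country, product, q=0.5)
```

Because q lies strictly between 0 and 1, each additional missing capability multiplies the jump probability by a factor smaller than one.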
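We show in the ‘Methods’ section that, if we assume that the probability of a country having each capability is w by making a mean-field assumption, we can express the capability overlap between i and j as
$${E}_{i,j}\equiv \log \left(\frac{P({M}_{n,j}=1| {J}_{n,i}=1)}{P({M}_{n,j}=1)}\right)\propto \parallel {{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}\parallel ,\qquad (2)$$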
where M_{n,i} = 1 if product i is present in country n at t_{0}, and E is the Ecosystem matrix. The overlap vector \({{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}\) is defined as \({p}_{ik}^{j}=1\) if both p_{ik} = 1 and p_{jk} = 1 and 0 otherwise. Therefore, the probability that the product j is already produced in a country, given the country started making the product i, increases with the overlap between the capability requirements of these two products, captured by \(\parallel {{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}\parallel\) up to a constant multiplicative factor.
We refer to the row vector \({{\bf{E}}}_{{\bf{i}}}={\{{E}_{i,j}\}}{\,}_{j = 1,...,n}\) as the Ecosystem of a product i. This captures the extent of capability overlap between product i and each product j, and is calculated based on the probability that product j was already present when product i appeared. In the ‘Methods’ section, we outline how we empirically estimate the Ecosystem matrix using product presences and appearances (based on revealed comparative advantage^{26}) in international trade data. Negative values, which correspond to a ratio P(M_{n,j} = 1∣J_{n,i} = 1)/P(M_{n,j} = 1) less than 1, are set to 0 in this matrix. In order to study long-term diversification trends, we create a single composite Ecosystem matrix \(\widehat{{\bf{E}}}\) using data from 1984 to 2009 for 751 4-digit SITC products (in our prediction exercise, we use an out-of-sample approach to predict product appearances for the period 2010–2016). A toy example illustrating the method is shown in Fig. 1.
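As a rough illustration of this estimation step, the sketch below computes a naive frequency-based version of the Ecosystem matrix from a presence matrix M and an appearance matrix J (countries in rows, products in columns). It assumes the entry is the logarithm of the ratio P(M_{n,j} = 1∣J_{n,i} = 1)/P(M_{n,j} = 1), floored at zero; the estimator actually used is described in the ‘Methods’ section and may differ, and the function name is hypothetical.

```python
import math


def ecosystem_matrix(M, J):
    """Naive estimate of E[i][j] = max(0, log(P(M_j = 1 | J_i = 1) / P(M_j = 1))).

    M[n][j] = 1 if product j is present in country n at t0.
    J[n][i] = 1 if product i appears in country n between t0 and t1.
    """
    n_countries = len(M)
    n_products = len(M[0])
    E = [[0.0] * n_products for _ in range(n_products)]
    for i in range(n_products):
        jumpers = [n for n in range(n_countries) if J[n][i] == 1]
        if not jumpers:
            continue  # no appearances of product i: row stays at zero
        for j in range(n_products):
            if j == i:
                continue
            p_j = sum(M[n][j] for n in range(n_countries)) / n_countries
            p_j_given_i = sum(M[n][j] for n in jumpers) / len(jumpers)
            if p_j > 0 and p_j_given_i > 0:
                # Ratios below 1 give negative logs, which are set to 0.
                E[i][j] = max(0.0, math.log(p_j_given_i / p_j))
    return E


# Toy data: 4 countries, 2 products. Product 0 is present in countries 0 and 1,
# and product 1 subsequently appears in exactly those two countries.
M = [[1, 0], [1, 0], [0, 0], [0, 0]]
J = [[0, 1], [0, 1], [0, 0], [0, 0]]
E = ecosystem_matrix(M, J)
```

In this toy case every country that jumped to product 1 already had product 0, while only half of all countries had it, so product 0 enters the Ecosystem of product 1 with weight log 2.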
As discussed above, a range of approaches have been proposed to quantify the complexity of a product. Under our capabilitybased model, products with large Ecosystems (that have many nonzero entries in their Ecosystem vector) share capabilities with many other products. These are complex products, likely requiring a wide range of distinct capabilities. Let \({\bf{X}}=\widehat{{\bf{E}}}\,> \, 0\), an indicator matrix for the positive entries of \(\widehat{{\bf{E}}}\). We define the product Ecosystem size of product i as the sum of row i of X, i.e., the number of nonzero Ecosystem products. We define the product Ecosystem input of product i as the sum of column i of X, i.e., the number of products for which product i is an Ecosystem product.
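These two definitions translate directly into row and column sums of the indicator matrix. A minimal sketch, using a hypothetical 3-product Ecosystem matrix:

```python
def ecosystem_size_and_input(E):
    """Return the indicator matrix X = (E > 0), row sums (size) and column sums (input)."""
    X = [[1 if value > 0 else 0 for value in row] for row in E]
    size = [sum(row) for row in X]            # sum of row i of X
    inp = [sum(col) for col in zip(*X)]       # sum of column i of X
    return X, size, inp


# Hypothetical matrix: product 2 relies on products 0 and 1; product 1 relies on product 0.
E_toy = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.7, 0.2, 0.0],
]
X, eco_size, eco_input = ecosystem_size_and_input(E_toy)
```

Here product 2 has the largest Ecosystem size, while product 0 has the largest Ecosystem input.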
Figure 2A presents a visual representation of the entries in matrix X, where blue dots in row i correspond to non-zero entries in the Ecosystem vector of product i. Products are sorted by Ecosystem size in the rows and Ecosystem input for the columns. The nested structure of the matrix shows that products with large Ecosystems rely on capabilities present in both small and large Ecosystem input products (top rows are densely filled), whereas small Ecosystem products rely on capabilities present in products common to many Ecosystems (bottom rows are filled only on the left-hand side). These patterns are consistent with the nested pattern observed in cross-sectional data for product presences by Bustos et al.^{21}.
Consistent with our capability-based model, Fig. 2B shows that almost no products have both a large Ecosystem size and input (i.e. the top right corner is empty). In other words, as we would expect, products requiring many capabilities with a large Ecosystem size are not simultaneously (input) Ecosystem products for many products. Machinery and transport equipment products (blue) tend not to be high input products, while most food products (yellow) have a small Ecosystem size.
Next, we explore how the Ecosystem input and Ecosystem size relate to product ubiquity, which is the number of countries that have comparative advantage in the product (above the threshold). According to Hidalgo and Hausmann^{4}, high product ubiquity is associated with low complexity products as these products can be made by many countries. Figure 2C shows the positive relationship between Ecosystem input and product ubiquity. Note that the denominator of the Ecosystem equation, Eq. (2), is effectively equal to the product ubiquity. For this reason one might be concerned that high ubiquity products may not be present in the Ecosystems of many products. Figure 2C shows that this is not the case. Figure 2D confirms the negative relationship between Ecosystem size and product ubiquity.
Table 1 shows the top 15 products in terms of Ecosystem size, and the top 15 products in terms of Ecosystem input. In the first case we observe a range of sophisticated products, including machinery and electrical appliances, vehicles and engines, and chemicals. In the second case we have less complex products, including raw textiles and fabrics, simple garments and basic chemicals. In Supplementary Note 1 of the Supplementary Information, we show that the overall pattern of entries in the Ecosystem matrix is robust to alternative Revealed Comparative Advantage (RCA) thresholds associated with product absences, presences and appearances, and to variation in the time period used.
The arrow of development
Countries diversify into new products that are similar (in terms of required capabilities) to what they currently produce. In order to model this process, we construct a network of products. Directed edges connect the products: there is an arrow from node j to node i if product j is in the Ecosystem of product i, which implies that j tends to be produced before i appears. The weight of the edge is an estimate of capability overlap between i and j as determined by the corresponding positive Ecosystem entry.
We can ask questions such as: do we observe clusters of products sharing many capabilities? Which products are most likely to be part of a development path? How do countries diversify in this network?
Formally, the Eco Space is a network with n nodes (or vertices). The structure of any network can be encoded by the adjacency matrix \({\bf{A}}\in {{\mathbb{R}}}^{n\times n}\) where entries A_{ij} correspond to the weight of the directed edge from node i to node j. In this case, \({{\bf{A}}={({\bf{X}}\circ \widehat{{\bf{E}}})}^{\text{T}}}\) is the adjacency matrix for the Eco Space where ^{T} denotes the transpose of the matrix and ∘ denotes the Hadamard product (element-wise multiplication) operator. Using this adjacency matrix we can compute a host of network metrics, including, for example, the in-degree d_{i} = ∑_{j} X_{ij} (equivalent to the Ecosystem size), and the in-strength, s_{i} = ∑_{j} A_{ji} for each node i. Note that the in-degree is exactly the Ecosystem size shown in Fig. 2. Similarly, the node out-degree is the Ecosystem input.
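The construction of the adjacency matrix and the degree measures can be sketched as follows. This is an illustrative implementation under the convention stated above (an edge runs from j to i when j is in the Ecosystem of i); the function names and the toy matrix are hypothetical.

```python
def eco_space_adjacency(E):
    """A = (X o E)^T: A[j][i] holds the positive Ecosystem entry E[i][j], else 0."""
    n = len(E)
    return [[E[i][j] if E[i][j] > 0 else 0.0 for i in range(n)] for j in range(n)]


def degree_metrics(A):
    """In-degree, out-degree and in-strength of every node of a weighted digraph."""
    n = len(A)
    in_degree = [sum(1 for j in range(n) if A[j][i] > 0) for i in range(n)]
    out_degree = [sum(1 for i in range(n) if A[j][i] > 0) for j in range(n)]
    in_strength = [sum(A[j][i] for j in range(n)) for i in range(n)]
    return in_degree, out_degree, in_strength


# Hypothetical 3-product Ecosystem matrix (rows: product i, columns: Ecosystem product j).
E_toy = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.7, 0.2, 0.0],
]
A = eco_space_adjacency(E_toy)
in_degree, out_degree, in_strength = degree_metrics(A)
```

With this convention the in-degree of a node reproduces its Ecosystem size and the out-degree reproduces its Ecosystem input.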
We construct a reduced version of this network, calculating the mean edge weight between products in 2-digit divisions (there are 63 2-digit product divisions). Figure 3A illustrates the directional relationship between divisions, showing mean edge weights over a threshold of 0.2 (this includes about the top 12.5% of edges). The software programme Gephi^{27} has been used to generate the network layout using the automated Force Atlas algorithm.
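One way to picture this aggregation is sketched below. It assumes that a product's division is given by the first two digits of its code, that the mean is taken over all ordered product pairs (including pairs with zero weight, excluding a product paired with itself), and that only means above the threshold are kept; these choices, the function name and the codes are illustrative assumptions rather than details confirmed by the text.

```python
from collections import defaultdict


def division_network(A, codes, threshold=0.2):
    """Mean directed edge weight between 2-digit divisions, keeping means above threshold."""
    divisions = [code[:2] for code in codes]
    total = defaultdict(float)
    count = defaultdict(int)
    n = len(A)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue  # skip self-pairs
            key = (divisions[a], divisions[b])
            total[key] += A[a][b]
            count[key] += 1
    means = {key: total[key] / count[key] for key in total}
    return {key: weight for key, weight in means.items() if weight > threshold}


# Hypothetical 4-digit codes: two products in division "01", one in division "78".
codes = ["0111", "0112", "7810"]
A_toy = [
    [0.0, 0.0, 0.6],
    [0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0],
]
reduced = division_network(A_toy, codes)
```

In the toy example only the edge from division "01" to division "78" survives, with a mean weight of 0.4.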
In order to probe the structure of this network, we search for clusters of nodes (communities) which exhibit high internal connectivity, but sparse connections between communities. Within this context, communities represent groups of products with shared capabilities. The presence of modular structure, whereby sparse connections lie between clusters, could prove an obstacle to diversification processes, as countries become trapped in a community. This type of network topology has been detected in a wide range of networks, particularly social and biological networks, and is often indicative of an underlying functional organisation^{28,29}.
There are a large variety of approaches to community detection, many based on comparison of the network structure to a statistical null model (i.e., the connectivity structure if edges were placed at random, see Fortunato^{28} for a review). Most traditional methods seek to find a single optimal partition, yet this approach often neglects the presence of modular structure at a range of scales (e.g. few large communities vs many small communities). Here we apply the Stability algorithm^{25}, which is based on the dynamics of a random walker on a network. In essence, a random walker jumps from node to node with probability proportional to edge weight. If the walker gets trapped in a region of high connectivity, the corresponding group of nodes corresponds to a tightly knit community. The longer the walker jumps, the larger the communities she finds. Hence, a time parameter enables us to control the scale (from many small communities to few larger communities) at which communities are uncovered. In the Supplementary Information, we describe the optimization process to find a node partition for which the algorithm is most robust.
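The random-walk intuition can be made concrete with a small sketch. This is not the Stability algorithm itself; it only shows how a walker whose jump probabilities are proportional to edge weight tends to remain inside a densely connected group. The treatment of nodes without outgoing edges (the walker stays put) and the function names are assumptions made for illustration.

```python
def transition_matrix(A):
    """Row-normalise the weighted adjacency matrix into jump probabilities."""
    n = len(A)
    P = []
    for i in range(n):
        out_strength = sum(A[i])
        if out_strength > 0:
            P.append([w / out_strength for w in A[i]])
        else:
            # Assumption: a node with no outgoing edges keeps the walker in place.
            P.append([1.0 if j == i else 0.0 for j in range(n)])
    return P


def retained_mass(A, community, start, steps):
    """Probability that a walker starting at `start` is inside `community` after `steps` jumps."""
    P = transition_matrix(A)
    n = len(A)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(dist[k] for k in community)


# Two tightly knit pairs {0, 1} and {2, 3}, joined by one weak edge from node 1 to node 2.
A_toy = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
stay = retained_mass(A_toy, {0, 1}, start=0, steps=2)
```

After two jumps roughly 91% of the probability mass is still inside the starting pair; increasing the number of steps lets more mass leak out, which is the sense in which the time parameter controls the scale of the communities.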
We apply this algorithm to our reduced two-digit network representation. In Fig. 3A, we observe clear groupings, with food, animals, crude materials and textiles dominating the yellow community on the right-hand side. As we move to the left we observe clusters of transportation equipment and machinery (green), and manufacturing (orange). In the center (purple) we have processed petroleum products, metals and chemicals/plastics. Moving to the far left (blue), we have sophisticated products such as medicines and pharmaceuticals, scientific equipment and electrical machinery. The inset shows a further reduced version of the network, where each node corresponds to a community. We clearly observe the arrow of development, as countries begin their development path in the yellow community, and progressively jump into new products located in the center and far left of the network layout. The blue and green (and to a lesser extent orange and purple) communities represent destination products typically produced in highly developed nations.
Figure 3B shows the relationship between mean Ecosystem input and mean Ecosystem size at the 2-digit division level (63 divisions), with points coloured by community assignment. In the inset, we show the same relationship aggregated to the community level. We confirm that the yellow community is dominated by high Ecosystem input but low Ecosystem size products. On the other hand, the blue and green communities are dominated by high Ecosystem size but low Ecosystem input products. Figure 3C, D shows the Ecosystem network at the 2-digit level with node sizes proportional to the mean Ecosystem size and mean Ecosystem input. We can clearly see that the large Ecosystem input products are located on the right and large Ecosystem size products are located on the left.
We can also extract information about intermediate steps. We compute the betweenness centrality of each node, a measure of the number of times a shortest path between any two nodes traverses the node. In Fig. 3E we visualise the mean betweenness for each 2-digit division. These transition products tend to have high in- and out-degree: they are stepping stones.
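For readers unfamiliar with the measure, the sketch below computes betweenness centrality on a directed graph using Brandes' accumulation scheme. It treats the graph as unweighted, which is a simplifying assumption: the text does not say how edge weights enter the shortest-path computation. The graph is given as a dictionary mapping each node to its successors, and every node must appear as a key.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness centrality for a directed, unweighted graph."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        order = []                                   # nodes in order of discovery
        preds = {v: [] for v in adj}                 # shortest-path predecessors
        sigma = {v: 0 for v in adj}                  # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                      # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:           # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                                 # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality


# Hypothetical development path: a -> b -> c, so b is the only stepping stone.
chain = {"a": ["b"], "b": ["c"], "c": []}
scores = betweenness(chain)
```

In the chain, only the middle node lies on a shortest path between two other nodes, so it is the only one with non-zero betweenness.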
An alternative widely-used metric to estimate the productive sophistication of a product, the product complexity index (PCI)^{4}, is derived from export data based on the hypothesis that rare or complex products are only made by few countries that possess many capabilities (and, therefore, produce many other products). As shown by Hausmann et al.^{2}, higher complexity products are mostly associated with (rare) highly diversified developed countries (who produce both common and rare products) while lower complexity products are produced in countries at all levels of development. Here, we investigate the relationship between the Ecosystem of a product and its PCI value. We would expect that products with a high PCI value, requiring many (and rare) capabilities, have a large Ecosystem size. On the other hand, products with a low PCI value, needing fewer more common capabilities, would be expected to have few Ecosystem products. Figure 4A shows the distribution of PCI within the Eco Space (e.g. the mean PCI of products within the 2-digit divisions). As confirmed in Fig. 4B, high PCI products coincide with those with a high Ecosystem size. On the other hand, Fig. 4C shows that low PCI products tend to be inputs to the Ecosystem of many products.
If capability accumulation underlies the development process, we expect countries to move from less complex products towards sophisticated products over time. Hence, we expect diversification from low complexity to high complexity products as countries upgrade their complexity level. We look at the directed edges between products of different complexity levels and ask, is it more likely that an edge connects a lower complexity node to a higher complexity node? In other words, are the input products within a product’s Ecosystem less complex than the product itself? Hence, we are interested to see whether the directionality of edges moves from lower to higher PCI products. For each node we show the relationship between its own PCI, and the mean PCI of its top x = 10 incoming neighbours (Ecosystem entries) in Fig. 4D. We observe that most products (83% of products) have a higher PCI than the mean of their top 10 Ecosystem products. Next, we compute the PCI of the product minus the mean PCI of its top x = 10 incoming neighbours. The histogram in Fig. 4E shows a clear bias towards positive values—the PCI of the product is higher than its incoming neighbours. Finally, by looking at the mean of this distribution across a range of x in Fig. 4F, we find, as expected, that the mean decreases as we increase the number of neighbours.
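The comparison between a product's PCI and that of its strongest Ecosystem products can be sketched as below. The function name and the toy numbers are hypothetical; the sketch assumes that the top x incoming neighbours are the x largest positive entries in the product's Ecosystem row.

```python
def pci_gap(ecosystem_row, pci, i, x=10):
    """PCI of product i minus the mean PCI of its top-x Ecosystem products.

    Returns None when product i has no positive Ecosystem entries.
    """
    positive = [j for j, weight in enumerate(ecosystem_row) if weight > 0]
    top = sorted(positive, key=lambda j: ecosystem_row[j], reverse=True)[:x]
    if not top:
        return None
    return pci[i] - sum(pci[j] for j in top) / len(top)


# Toy example: product 0 has three Ecosystem products with decreasing overlap.
row = [0.0, 0.9, 0.4, 0.1]
pci = [2.0, -1.0, 0.0, 1.0]
gap_top2 = pci_gap(row, pci, i=0, x=2)
gap_top3 = pci_gap(row, pci, i=0, x=3)
```

A positive value means the product is more complex than the products that typically precede it. In this toy case the gap shrinks as more (weaker) neighbours are included.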
How do these product attributes relate to the wealth of the countries who produce them? For each country, we compute the mean Ecosystem size and mean Ecosystem input level of the products it exports with RCA higher than the presence threshold. Figure 5A plots these values for each country, where the points are coloured according to GDP per capita. There is a clear negative relationship between the size and the input, with higher GDPpc countries—located in the lower right portion of the graph—exhibiting mainly high Ecosystem size/low Ecosystem input products. We label the outliers in the graph, which are mostly oil or natural gas rich countries. Figure 5B, C shows maps with countries shaded by mean Ecosystem size and Ecosystem input. Finally, Fig. 5F, G confirms that wealthier countries export products with high mean Ecosystem size and low mean Ecosystem input.
We can also examine the share of products in advanced communities (defined here as all communities except the yellow community), and the log-betweenness centrality of their products, for each country. Figure 5D, H confirm that wealthier countries tend to be concentrated in advanced communities. Products with high betweenness centrality can be seen as transition products, and would be expected to be produced by low-middle income countries as seen in Fig. 5E, I.
Next, we explore the evolution of these metrics over time (1984–2016), dividing countries into four equally sized income groups (by GDP per capita). Figure 5J–M shows that middle income countries increased their share of high Ecosystem size products (and of products in advanced communities), while both poor and middle income countries decreased their share of high Ecosystem input products and moved out of transition products with high betweenness centrality.
How do individual countries transform their export composition over the communities we identified from the Ecosystem network? Here, we explore in more detail the temporal evolution of the product basket of nations as they diversify into new products and move through the network over time. Using data for 2016, the central map in Fig. 6 shows countries shaded by the colour of the community that has the highest share among their products. We observe that a majority of countries currently export products concentrated within three communities: yellow (food, animals, crude materials and textiles), purple (petroleum products, metals and chemicals/plastics), and blue (medicines and pharmaceuticals, scientific equipment and electrical machinery). In Supplementary Note 5 of Supplementary Information, we show the relative share of each country’s export presences across each individual community. The inset next to the map shows the mean Ecosystem size vs. PRODY^{30}, which is calculated for each product as the RCA-weighted average of the GDPpc of the countries exporting it. The size of the points is the number of 2-digit products in the community. This plot confirms that the yellow and purple communities are dominated by products exported by low GDPpc nations, while the blue community is dominated by products exported by high GDPpc nations.
Over time, from 1984 to 2016, we see that many countries go through transformations by changing their share of products in different communities. For example, we observe a number of countries transitioning over this period from a concentration of products in the yellow community to those in the blue community. We can distinguish between those that transitioned early in the period, in the 1980s (Singapore, SGP), those that transitioned in the middle of the period, around the year 2000 (Hungary, HUN; Korea, KOR; Malaysia, MYS; and Mexico, MEX), and those that transitioned more recently (China, CHN). India (IND) appears to be on this path, with a future transition on the horizon. Norway (NOR) is dominated by products in the purple community, while Germany (DEU) and the USA are dominated by products in the blue community.
Predicting product appearances
Beyond analysing network properties and diversification paths, we wish to assess whether the model is informative in predicting the appearance of new products, or equivalently the export of new products with comparative advantage, for the set of all countries. For each product-country pair, this translates to estimating the exponent in Eq. (1), the gap between the capabilities required by the product and the capabilities held by the country. Our strategy is to infer the capabilities required for a product by looking at its maximum Ecosystem entry, which is an estimate of the maximum capability overlap with all other products. While we simply introduce our new metric here, a comprehensive derivation and explanation is provided in the ‘Methods’ section.
To predict the likelihood of an appearance of product i in country n, we estimate the capability gap in the exponent of Eq. (1) via
where \({{\mathcal{J}}}_{n}\) is the set of products present in country n. We call this metric the Ecosystem density. We complement our derivation in the ‘Methods’ section with a graphical explanation. In order to reduce noise, we take the mean value over the top k = 25 entries for each \(\mathop{\max }\nolimits_{j}{\hat{E}}_{i,j}\) and \(\mathop{\max }\nolimits_{j\in {{\mathcal{J}}}_{n}}{\hat{E}}_{i,j}\). The robustness of our results in terms of parameter k is given in Supplementary Note 3.3 of the Supplementary Information. We note that the Ecosystem encoded in matrix \(\hat{E}\) was constructed using data from 1984 to 2009. Product presences in Eq. (3) are measured in 2010, and we seek to predict appearances during the period 2010–2016.
We measure the predictive power of our variables using the area under the curve (AUC) of the receiver operating characteristic, which plots the rate of true positives of a continuous prediction criterion as a function of the rate of false positives. For a standard probit model, Table 2 shows that the Ecosystem density variable has predictive power for country-product appearances with AUC = 0.715 (column 1), increasing to AUC = 0.813 when country and product fixed effects are included (column 4).
We compare the ability of this metric to predict product appearances with the Product Space density^{3,31}, a predictive metric based on the structure of the Product Space (see ‘Methods’ for details). We find that our Ecosystem-based metric outperforms the Product Space density, which has AUC = 0.623 (column 5, no fixed effects). When both measures are included together, the sign of the Product Space density becomes negative after controlling for the Ecosystem density (column 6), but it recovers its positive sign when the fixed effects are included (column 7). An increase in predictive power is also evident in the pseudo-R^{2} measure, which rises from 1.5% with the Product Space density to 6.5% with the Ecosystem density.
Product appearances depend on two thresholds: one for product absences (τ_{0}) and one for product presences (τ_{1}); see Eqs. (6) and (7) in the ‘Methods’ section. The default values of these, discussed below, are τ_{0} = 0.1 and τ_{1} = 1. As we decrease τ_{0}, we have fewer absences (and hence fewer possible appearances). As we increase τ_{1}, we also have fewer appearances. To explore how the predictive ability of our model varies with these parameters, Fig. 7 shows a heat map of the AUC values for various combinations of τ_{0} and τ_{1}, corresponding to column 1 of Table 2. We observe that, for reasonable combinations of τ_{0} and τ_{1}, the baseline (no fixed effects) AUC scores are consistently close to 0.72.
In the Supplementary Information, we apply a number of tests to assess the robustness of our results:

We vary the number of products used in the computation of the maximum in the exponent of q that we use to create our density measure in Eq. (3), further explained in the ‘Methods’ section (Supplementary Fig. 10).

We split the countries into different categories such as high vs. low per capita GDP, high vs. low complexity and high vs. low export volume (Supplementary Table 1).

In order to test for redundancy in the product classification, we omit products from the same SITC division in the construction of the Ecosystem (columns 1 and 2 of Supplementary Table 2).

We split the products into different groups such as manufacturing vs. nonmanufacturing, high vs. low complexity, high vs. low ubiquity, and high vs. low export volume (columns 3–10 of Supplementary Table 2).

We modify the criteria for observing a jump of a country into a new product, in terms of the number of years of product absence followed by product presence required (Supplementary Tables 3–5).

We use an alternative measure of RCA, which compares a country’s per capita production levels in a product to the world’s overall per capita production of the product to reveal the comparative advantage (Supplementary Table 6).
Our results remain robust to these various tests.
Our regression results indicate that our Ecosystem measure captures path-dependent diversification patterns and surpasses the current best comparator, the Product Space, in its predictive ability. It is important to acknowledge that our results do not predict future jumps perfectly. Our Ecosystem density measure captures potential products which require few additional capabilities for countries to move into, but given the limited resources of countries to exploit these adjacent products, not all possible jumps are realised. As a consequence, we are trying to predict rare events: only 1831 jumps were observed out of 49,352 absences, a rate of about 3.7%. In addition, there are many other factors that prompt countries to begin production of new products for export, including path-defying factors^{32,33}, which are not captured by our model.
Discussion
Classical growth and trade theory has struggled to reconcile macro variables such as factor endowments with differences in the productive structure and know-how of nations. One approach would be to increase the number of factors measured and write down more detailed production functions to understand the dynamics. A complementary approach might take an agnostic stance towards the identity of the capabilities or factors but focus on the development paths associated with this deeply granular process. In this paper, we took the latter approach and, inspired by early approaches to the study of genetics, developed a model for product diversification based on capability accumulation.
We propose a new metric, the Ecosystem of a product, which contains information on other products sharing a high level of capability overlap. Empirically, this is the set of preexisting products that are typically necessary for a future appearance of that product. Given the temporal nature of this measure, we construct a directed network, the Eco Space, to describe probable development paths. Exploiting tools from network science, we identify product clusters and transition sectors governing dynamics on the network. Finally, we show that the model is a good predictor of export diversification, performing favourably compared to the well-known Product Space framework^{3}.
This work contributes both to the theoretical literature on the modelling of capabilities and knowledge accumulation and, more generally, to the understanding of the processes underlying economic growth. It is particularly relevant for the literature on economic complexity^{4}, and the ongoing search for empirical methods to quantify, measure and validate complexity^{3,20,34,35}. Similarly, it is embedded in the literature on path-dependent diversification^{3,5,7}, including regional dynamics and related varieties similarly derived from an evolutionary or capability-based perspective. Future work could include estimating this model for industry employment or establishment data, which provide additional information on domestic production (and by extension domestic and service capabilities) not contained in export data^{31,36}.
Our framework can also potentially be applied to a range of other settings where path-dependent diversification occurs. The first obvious extension is to the regional or urban setting, where firms/industries need specific locally available capabilities to flourish. This will result in a path-dependent process of diversification, which underlies some of the dynamics behind industrial cluster formation^{37} and urban agglomeration^{8}. In a similar vein, technology adoptions by countries^{38} also follow a path-dependent process: many technologies require other technologies to be present in advance in a country. Finally, in biology, from where we borrow the term ecosystem, organisms require the presence of other animals or plants to populate a location, and, hence, this mechanism also leads to path-dependent dynamics. This process is intimately linked to the nested structure emphasised in the ecology (and economics) literature^{21,39,40,41}.
As confidence in market efficiency has declined, particularly since the 2008–9 financial crisis, industrial policy has enjoyed somewhat of a global resurgence^{42}. Although there have been clear examples of path-defying changes^{32,33}, the metrics derived here aim to help countries or regions connect their current productive capabilities to future possibilities. In particular, we hope that the Ecosystem metric is helpful to policymakers seeking to analyse the preparedness of a nation or region to move into a new product, or trying to identify key transition sectors which could open up future opportunities. Additionally, policymakers can use this methodology retrospectively to identify market failures. This is possible by identifying products that had a high likelihood of appearance, but have not yet been observed. Factors that prevented the appearance of these products can then be investigated. While there are clear policy applications for our work, it is also prudent to highlight limitations of the model. Firstly, although an improvement on previous approaches, the predictive power of the model suffers from false positives, since many possible jumps are not realised due to external factors. Secondly, evolving production technologies impact the underlying capability requirements of many products, leading to an evolution of the Ecosystem matrix over time, albeit at a slow pace. Based on this, and the predictive power of our model over a five-year period, we suggest that this tool is most suited to deliver short to medium term policy insights. Overall, we believe this methodology will be a valuable asset for policymakers.
Methods
The model
Let M_{n,i} = 1 if product i is present in country n, and otherwise 0. Similarly, let J_{n,i} = 1 if product i appeared in country n, and otherwise 0.
For a product i and a country n, p_{i} ∈ {0, 1}^{m} is the capability requirement vector of product i, and c_{n} ∈ {0, 1}^{m} represents the capabilities present in country n, with m the number of capabilities. Following Hausmann and Hidalgo^{24}, country n makes product i if country n has all the capabilities necessary to make i. Formally:
\({M}_{n,i}=1\) if \(\parallel {{\bf{p}}}_{{\bf{i}}}{\parallel }_{1}={{\bf{p}}}_{{\bf{i}}}^{T}{{\bf{c}}}_{{\bf{n}}}\), and \({M}_{n,i}=0\) otherwise,
where ∥⋅∥_{1} denotes the 1-norm (for a binary vector, the number of non-zero entries) and ^{T} denotes the transpose. We drop the subscript of the norm and the transpose sign for notational brevity. We assume that the country will jump to the product upon acquisition of all missing capabilities required to make the product. Hence, the probability of a jump depends on the capability gap between the country and the product capability vectors:
We wish to quantify the likelihood of country n producing product i given that the country is already producing product j. We can split the capability vector of product i into two parts, one containing the capabilities that overlap with j and the other containing the non-overlapping capabilities. We write \({{\bf{p}}}_{{\bf{i}}}={{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}+{\overline{{\bf{p}}}}_{{\bf{i}}}^{{\bf{j}}}\), where

\({p}_{ik}^{j}=1\) if both p_{ik} = 1 and p_{jk} = 1 and 0 otherwise, and

\({\bar{p}}_{ik}^{j}=1\) if p_{ik} = 1 and p_{jk} = 0 and 0 otherwise.
Since country n is already making product j, it has all the necessary capabilities for it. Hence, the probability that country n starts making product i can be expressed as:
where q is the mean probability of acquiring a capability under a binomial model. We can apply Bayes’ Rule:
and take logarithms:
If we assume that the probability of a country having each capability is w, this expression becomes
Hence, the probability that the product j is already produced in a country, given the country started making the product i, increases with the overlap between the capability requirements of these two products, captured by \(\parallel {{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}\parallel\) up to a constant multiplicative factor.
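The chain of steps above can be made concrete with a short worked sketch. It is a reconstruction under simplifying assumptions, and not necessarily the exact intermediate expressions: the number of capabilities that country n lacks is replaced by its expected value, each capability is held independently with probability w, and the capabilities shared with product j are held with certainty once j is present.

```latex
\begin{align}
\log\frac{P(M_{n,j}=1\mid J_{n,i}=1)}{P(M_{n,j}=1)}
  &= \log P(J_{n,i}=1\mid M_{n,j}=1)-\log P(J_{n,i}=1)
     && \text{(Bayes' rule)} \\
  &\approx (1-w)\bigl(\lVert\mathbf{p}_i\rVert-\lVert\mathbf{p}_i^{j}\rVert\bigr)\log q
     \;-\;(1-w)\,\lVert\mathbf{p}_i\rVert\log q
     && \text{(expected capability gaps)} \\
  &= (1-w)\,\lVert\mathbf{p}_i^{j}\rVert\,\log\frac{1}{q}.
\end{align}
```

Because q < 1, the factor log(1/q) is positive, so under these assumptions the log ratio grows linearly with the overlap \(\parallel {{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}\parallel\), with (1 − w) log(1/q) playing the role of the constant multiplicative factor.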
Algorithm
We construct the Ecosystem matrix \(\hat{E}\) using export data classified according to the Standard International Trade Classification (SITC) revision 2 at the 4-digit level, beginning in 1984. The data, cleaned by Bustos and Yildirim, are available from the Harvard Dataverse at https://dataverse.harvard.edu/dataverse/atlas.
In order to estimate the matrices M and J, we measure product presences and appearances via international export competitiveness. In particular, we measure the intensity with which a country exports each product by computing its Revealed Comparative Advantage (RCA), first proposed by Balassa^{26}. The RCA that a country has in a product is defined as the ratio between the share of the product in the country’s export basket and the overall share of the product in the global export basket. Equivalently, we can also think of RCA as the share of the country in the product divided by the total share of the country in the world exports. A product is overrepresented in a country’s export basket if its RCA is above a threshold.
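Formally, if X_{n,i} is equal to the export of country n in product i, then the RCA of country n in product i is defined as:
\({{\rm{RCA}}}_{n,i}=\frac{{X}_{n,i}/{\sum }_{i^{\prime} }{X}_{n,i^{\prime} }}{{\sum }_{n^{\prime} }{X}_{n^{\prime} ,i}/{\sum }_{n^{\prime} ,i^{\prime} }{X}_{n^{\prime} ,i^{\prime} }}\)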
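The verbal definition of RCA given above (the share of the product in the country’s export basket divided by the share of the product in the global export basket) can be sketched as follows. The function name `rca_matrix` and the toy export values are illustrative assumptions; the sketch also assumes every country and every product has non-zero total exports.

```python
def rca_matrix(exports):
    """Balassa RCA for a table exports[n][i] (country n, product i)."""
    country_totals = [sum(row) for row in exports]        # total exports of each country
    product_totals = [sum(col) for col in zip(*exports)]  # world exports of each product
    world_total = sum(country_totals)

    rca = []
    for n, row in enumerate(exports):
        rca_row = []
        for i, value in enumerate(row):
            share_in_country_basket = value / country_totals[n]
            share_in_world_basket = product_totals[i] / world_total
            rca_row.append(share_in_country_basket / share_in_world_basket)
        rca.append(rca_row)
    return rca

# Toy example with made-up values: two countries, two products.
exports = [[80.0, 20.0],
           [20.0, 80.0]]
rca = rca_matrix(exports)
```

In this toy table each country is over-represented (RCA above 1) in the product that dominates its basket and under-represented in the other.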
An appearance of product i in country n is defined as:
Since we will aggregate all jumps for each country-product pair in the analysis below, we will drop the time indices in the J matrix.
Based on our definitions of jumps and presences, we compute the entries \({\hat{E}}_{i,j}\) as follows:
\({\hat{E}}_{i,j}=\log \frac{P({M}_{n,j}=1| {J}_{n,i}=1)}{P({M}_{n,j}=1)}\)
We will show how we build the P(M_{n,j} = 1∣J_{n,i} = 1) and P(M_{n,j} = 1) terms separately to create a single composite Ecosystem matrix (using data from 1984–2010).
Building the P(M_{n,j} = 1∣J_{n,i} = 1) term:

1. For each country and product pair, we calculate RCA values (top row in Fig. 8).
2. We designate a product absent if its RCA value is below τ_{0} (= 0.1 in Fig. 8) and present if its RCA value is above τ_{1} (= 1 in Fig. 8). If the RCA value is between these two thresholds, we designate the product undefined. If the country-product pair is missing for that year, we also designate it undefined (middle row in Fig. 8).
3. We collapse all consecutive absences, and absences interspersed with undefined values, to the latest absence (bottom row in Fig. 8).
4. We collapse all consecutive presences, and presences interspersed with undefined values, to the earliest presence (bottom row in Fig. 8).
5. After collapsing, we are guaranteed to have a single absence followed by at most a single presence. After the presence, however, another absence could occur. A jump occurs when a country transitions from an absence to a subsequent presence (bottom row in Fig. 8).
6. For a product i, we search for the set of countries \({{\mathcal{K}}}_{i}\) in which it appeared. For each of these countries, we detect which other products j were present in the jump start year. A product j was present in the start year if its RCA value was greater than τ_{1}.
7. For each i and j, we compute the total number of presences of product j (given an appearance of product i), and divide it by the number of appearance countries (i.e., the size of the set \({{\mathcal{K}}}_{i}\)).
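Steps 2 to 5 above can be sketched for a single country-product time series. This is a minimal illustration, not the authors' Stata/Matlab code: the function names are hypothetical, the RCA values are made up, and the sketch assumes that undefined years are simply skipped when collapsing runs, which is one reading of "interspersed with undefined values".

```python
def classify(rca, tau0=0.1, tau1=1.0):
    """Label one RCA value: 'A' absent, 'P' present, 'U' undefined (or missing)."""
    if rca is None:
        return "U"
    if rca < tau0:
        return "A"
    if rca > tau1:
        return "P"
    return "U"

def find_jumps(years, rca_series, tau0=0.1, tau1=1.0):
    """Return (latest_absence_year, earliest_presence_year) for each jump."""
    # Step 2: label every year, then skip undefined years (assumption).
    labelled = [(y, classify(r, tau0, tau1)) for y, r in zip(years, rca_series)]
    defined = [(y, s) for y, s in labelled if s != "U"]

    # Steps 3-4: collapse runs of equal labels, keeping the latest year of an
    # absence run and the earliest year of a presence run.
    collapsed = []
    for year, state in defined:
        if collapsed and collapsed[-1][1] == state:
            if state == "A":
                collapsed[-1] = (year, state)  # move forward to the latest absence
            # for 'P' the earliest presence is already stored
        else:
            collapsed.append((year, state))

    # Step 5: a jump is an absence immediately followed by a presence.
    return [
        (y0, y1)
        for (y0, s0), (y1, s1) in zip(collapsed, collapsed[1:])
        if s0 == "A" and s1 == "P"
    ]

# Toy series with made-up RCA values; None marks a missing year.
years = list(range(2000, 2008))
series = [0.02, 0.05, None, 0.04, 0.5, 1.3, 2.0, 0.01]
jumps = find_jumps(years, series)
```

In the toy series the absences of 2000 to 2003 collapse to 2003, the presences of 2005 and 2006 collapse to 2005, and the final absence in 2007 does not generate a jump because no presence follows it.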
Building the P(M_{n,j} = 1) term:

1. For each product j, we compute the total number of presences of product j (i.e., RCA_{n,j} > τ_{1}) across all countries and years.
2. We divide the total number of presences of product j across all years by the total number of countries (each country is counted once for each year it appears in the sample).
Finally, the Ecosystem is a log of the ratio of the P(M_{n,j} = 1∣J_{n,i} = 1) and P(M_{n,j} = 1) terms.
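Combining the two terms into one Ecosystem entry can be sketched as below. The function name, the flag-list inputs and the toy numbers are illustrative assumptions; returning minus infinity when product j was never present at a jump into i is also an assumption, since the treatment of empty cells is not described here.

```python
import math

def ecosystem_entry(present_at_jump, present_overall):
    """Log ratio of P(M_j = 1 | J_i = 1) to P(M_j = 1).

    present_at_jump -- 0/1 flags, one per country in which product i appeared,
                       marking whether product j was present in the jump start year
    present_overall -- 0/1 flags for product j over every country-year in the sample
    """
    p_conditional = sum(present_at_jump) / len(present_at_jump)
    p_baseline = sum(present_overall) / len(present_overall)
    if p_conditional == 0.0:
        return float("-inf")  # assumption: j was never present before a jump into i
    return math.log(p_conditional / p_baseline)

# Toy example with made-up flags: j present in 3 of 4 appearance countries,
# but in only 2 of 8 country-years overall.
entry = ecosystem_entry([1, 1, 1, 0], [1, 0, 0, 0, 1, 0, 0, 0])
```

A positive entry means product j is found more often than usual in countries that are about to start exporting product i.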
Notes:

Unless otherwise specified, we set standard values for parameters for absence and presence: τ_{0} = 0.1 and τ_{1} = 1.

Following Hausmann et al.^{2}, we restrict our sample to countries with population greater than 1.2 million and total exports of at least $1 billion in 2008. There are also countries with known data reporting issues that were removed by Hausmann et al.^{2}. The sample reduces to 125 countries for the Ecosystem matrix computation.

The full SITC Rev.2 has 786 4-digit products in 1984. We omit 6 products with one-digit code 9 (‘Commodities and transactions not classified elsewhere in the SITC’), and drop to 780 products. We then retain only products that constitute more than one millionth of world trade and have at least 5 million USD of exports in all 33 years, which reduces the number of products to 756. Eliminated products are very small in terms of export volume, and create spurious jumps.

The definition of RCA enables small countries to surpass the presence threshold easily. To minimize noise, we converted presences (RCA > 1) to undefined if the country’s exports of the product are less than $10 thousand or the country accounts for less than one ten-thousandth of exports of the product. Overall, in 33 years, we have 474,494 presences and this change affects 4801 of them (~1%).
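The noise filter in the last note can be sketched as a small function. The function name, the 'P'/'U' state labels and the example values are hypothetical, and the second criterion is read here as the country's share of world exports of the product.

```python
def filter_presence(state, export_value, world_export_of_product,
                    min_value=10_000.0, min_share=1e-4):
    """Downgrade a presence ('P') to undefined ('U') when the export flow is tiny.

    export_value            -- the country's exports of the product, in USD
    world_export_of_product -- world exports of the product, in USD
    """
    if state != "P":
        return state  # only presences are affected by this filter
    too_small = export_value < min_value
    too_marginal = export_value / world_export_of_product < min_share
    return "U" if (too_small or too_marginal) else "P"
```

For example, a presence backed by $5,000 of exports would become undefined on the first criterion, and one backed by $50,000 in a $1 billion world market would become undefined on the second.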
Predicting product appearances
A country n has capabilities c_{n}, and products \(j\in {{\mathcal{J}}}_{n}\). We want to compute the probability that country n will acquire the missing capabilities for the appearance of product i:
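We do not know which capabilities country n already has, but we can proxy for them by looking at the capabilities of products already present in the country:
\({{\bf{c}}}_{{\bf{n}}}={\mathbb{1}}\left({\sum }_{j\in {{\mathcal{J}}}_{n}}{{\bf{p}}}_{{\bf{j}}}\right)\)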
where the function \({\mathbb{1}}\) sets the entry of a vector to be 1 if the corresponding entry of the input vector is greater than or equal to 1, i.e., the entry k of c_{n} is 1 if at least one product present in country n requires capability k.
For each product i, we do not know the length of its capability vector, ∥p_{i}∥, but using the Ecosystem entries, we obtain estimates for the overlaps, \(\parallel {{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}\parallel\)’s. We will assume that these overlaps between the products are uniformly distributed. Under this assumption, we can estimate ∥p_{i}∥ as the maximum of the \(\parallel {{\bf{p}}}_{{\bf{i}}}^{{\bf{j}}}\parallel\)’s. Then the maximum likelihood estimator of the maximum statistic for a uniform distribution is:
where N_{p} is the number of products. Figure 9 depicts this process. Hence, we estimate the number of capabilities needed for i by computing its maximum overlap with all other products j.
Country n makes only a subset of these products and from this subset we can estimate the p_{i}.c_{n} term. The maximum likelihood estimator (up to the same multiplicative factor as above) for the overlap p_{i}.c_{n} is then:
This is the maximum overlap between product i and any product j which is present in country n.
Empirically, we estimate \(\parallel \widehat{{{\bf{p}}}_{{\bf{i}}}}\parallel\) as \(\mathop{\max }\nolimits_{j}{\hat{E}}_{i,j}\) and \(\widehat{{{\bf{p}}}_{{\bf{i}}}{\boldsymbol{.}}{{\bf{c}}}_{{\bf{n}}}}\) as \(\mathop{\max }\nolimits_{j\in {{\mathcal{J}}}_{n}}{\hat{E}}_{i,j}\). Therefore, we estimate the likelihood of an appearance of a product i in country n as
where q is the probability of acquiring a new capability, and \({{\mathcal{J}}}_{n}\) is the set of products present in country n. In order to reduce noise, we take the mean value over the top k = 25 entries for each \(\mathop{\max }\nolimits_{j}{\hat{E}}_{i,j}\) and \(\mathop{\max }\nolimits_{j\in {{\mathcal{J}}}_{n}}{\hat{E}}_{i,j}\). The robustness of our results in terms of parameter k is given in Supplementary Note 3.4 of the Supplementary Information.
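The two empirical estimates and the top-k averaging can be sketched as follows. This computes only the estimated capability gap (required capabilities minus held capabilities); it makes no claim about the sign convention or scaling of the published density measure. The function names and toy values are assumptions, as is the requirement that the country has at least one product present.

```python
def top_k_mean(values, k=25):
    """Mean of the k largest values (all of them if fewer than k are given)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)

def capability_gap(ecosystem_row, present_products, k=25):
    """Estimated gap between capabilities required by product i and those held by country n.

    ecosystem_row    -- Ecosystem entries E[i][j] for the fixed product i
    present_products -- indices j of the products present in country n (non-empty)
    """
    required = top_k_mean(ecosystem_row, k)                              # proxy for ||p_i||
    held = top_k_mean([ecosystem_row[j] for j in present_products], k)   # proxy for p_i . c_n
    return required - held

# Toy example with made-up entries and k = 2 instead of 25.
row = [0.1, 0.9, 0.7, 0.3, 0.5]
gap = capability_gap(row, present_products=[0, 3], k=2)
```

The gap is zero when the country already exports the products that overlap most with product i, and grows as its basket moves away from them.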
The product space
The Product Space^{3} is a network that was proposed to model the process of industrial diversification of nations. Similar to the Eco Space, nodes represent products, and edges are intended to capture capability overlap. The Product Space is built from a cross-section of data, as opposed to the time-series data required to build the Eco Space. The edge weight between two nodes is estimated using a measure of co-export, i.e., a pair of products is connected by an edge if they are exported by a similar set of countries. It has been shown that the Product Space is a good predictor of product appearances^{3,31}.
Hidalgo et al.^{3} define the Product Space as a matrix P such that
where M_{n,i} = 1 if country n makes product i, and 0 otherwise. The logic behind this approximation is that if a pair of products is co-exported by a large subset of countries, then these products must require a similar capability base.
Consequently, countries are expected to move into industries which are close or similar to activities they are already successful at. From a network perspective, this is equivalent to saying that the probability of a product appearance in the future is dependent on the RCA that the country currently enjoys in neighbouring products. Mathematically, we write the Product Space density of product i in country n as
where the matrix P represents the network proximity or adjacency matrix for the Product Space as defined above.
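For illustration, the sketch below assumes the standard form of the Product Space from Hidalgo et al.: proximity is the minimum of the two conditional co-export probabilities (co-exports divided by the larger of the two ubiquities), and density is the proximity-weighted share of neighbouring products that the country already exports. The function names and the toy presence matrix are hypothetical.

```python
def proximity(M):
    """P[i][j] = (# countries exporting both i and j) / max(ubiquity_i, ubiquity_j)."""
    n_products = len(M[0])
    ubiquity = [sum(row[i] for row in M) for i in range(n_products)]
    P = [[0.0] * n_products for _ in range(n_products)]
    for i in range(n_products):
        for j in range(n_products):
            if i == j:
                continue  # no self-loops
            co_exports = sum(row[i] * row[j] for row in M)
            denom = max(ubiquity[i], ubiquity[j])
            P[i][j] = co_exports / denom if denom else 0.0
    return P

def density(presence_row, P, i):
    """Proximity-weighted share of the neighbours of product i present in a country."""
    total = sum(P[i])
    if total == 0.0:
        return 0.0
    return sum(presence_row[j] * P[i][j] for j in range(len(presence_row))) / total

# Toy presence matrix with made-up values: 4 countries (rows), 3 products (columns).
M = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
P = proximity(M)
d = density([0, 1, 0], P, i=0)  # a country exporting only product 1
```

Unlike the Ecosystem matrix, this proximity matrix is symmetric, which is why the Product Space is undirected.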
Probit model
We perform a standard Probit regression for the probability of a product appearance of the form:
where the binary variable J_{n,i} is defined by Eq. (7), Φ is the standard normal cumulative distribution function, d^{E} corresponds to the Ecosystem density, d^{P} corresponds to the Product Space density, and γ_{i} and η_{n} are product and country fixed effects, respectively.
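How a probit specification of this kind maps the two densities to an appearance probability can be sketched as below. The linear index and the coefficient names (`alpha`, `beta_eco`, `beta_ps`) are assumptions made for illustration, and the numerical values are made up rather than estimated.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def appearance_probability(d_eco, d_ps, alpha, beta_eco, beta_ps,
                           product_fe=0.0, country_fe=0.0):
    """Probit probability of an appearance given the two density measures.

    alpha, beta_eco, beta_ps -- hypothetical intercept and slope coefficients
    product_fe, country_fe   -- optional product and country fixed effects
    """
    index = alpha + beta_eco * d_eco + beta_ps * d_ps + product_fe + country_fe
    return normal_cdf(index)

# Made-up coefficients: the probability rises with the Ecosystem density.
p_low = appearance_probability(0.0, 0.0, alpha=-2.0, beta_eco=1.0, beta_ps=0.5)
p_high = appearance_probability(1.0, 0.0, alpha=-2.0, beta_eco=1.0, beta_ps=0.5)
```

Setting `product_fe` and `country_fe` to zero corresponds to the specifications without fixed effects.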
We construct the Ecosystem for years 1984–2009, and use RCA values from the year 2010, to compute the density metrics for both the Eco Space and the Product Space. Our dependent variable is defined for appearances during the 6-year period 2010–2016. Note that we condition on the product being absent at the start of the period, i.e., we only include country-product pairs that were absent in 2010.
In order to quantify the predictive power of each density metric, and their combination, we compute the area under the curve (AUC) of the receiver operating characteristic (ROC). The ROC curve plots the rate of true positives of a continuous prediction criterion as a function of the rate of false positives. The AUC statistic is equivalent to the Mann–Whitney statistic (the probability that the prediction criterion ranks a randomly chosen positive ahead of a randomly chosen negative). By definition, a random prediction will find true positives and false positives at the same rate, and hence will result in AUC = 0.5, whereas AUC = 1 for a perfect prediction.
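The Mann–Whitney interpretation of the AUC can be illustrated directly by comparing every positive with every negative. This is a minimal quadratic-time sketch with made-up scores, where ties are counted as half a win; it assumes at least one positive and one negative observation.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked ahead of negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

# Toy example with made-up prediction scores and observed appearances.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
value = auc(scores, labels)
```

Three of the four positive-negative pairs are ordered correctly in the toy data, so the sketch returns 0.75; a criterion that ranks every appearance above every non-appearance returns 1.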
Data availability
All data that we use in this study are publicly available. The trade data are available from the Atlas of Economic Complexity Dataverse, http://dataverse.harvard.edu/dataverse/atlas. Country level indicators were obtained from the World Development Indicators database, http://datatopics.worldbank.org/world-development-indicators/. The shape files for the world maps were downloaded from https://thematicmapping.org/downloads/world_borders.php. The shape files were not altered, and were only used for mapping levels of several variables. The shape files are licensed under the Creative Commons Attribution-Share Alike License (https://creativecommons.org/licenses/by-sa/3.0/).
Code availability
The analysis in this study was done using Stata and Matlab. The Stata and Matlab code is available upon request. The network layout was generated using Gephi (https://gephi.org/). The communities in the network were identified using the Partition Stability algorithm (http://wwwf.imperial.ac.uk/~mpbara/Partition_Stability/).
References
 1.
Imbs, J. & Wacziarg, R. Stages of diversification. Am. Economic Rev. 93, 63–86 (2003).
 2.
Hausmann, R. et al. The Atlas of Economic Complexity: Mapping Paths to Prosperity (The MIT Press, 2014).
 3.
Hidalgo, C. A., Klinger, B., Barabási, A.L. & Hausmann, R. The product space conditions the development of nations. Science 317, 482–487 (2007).
 4.
Hidalgo, C. A. & Hausmann, R. The building blocks of economic complexity. Proc. Natl Acad. Sci. USA 106, 10570–10575 (2009).
 5.
Neffke, F. & Henning, M. Skill relatedness and firm diversification. Strategic Manag. J. 34, 297–316 (2013).
 6.
Nelson Richard, R. & Winter Sidney, G. An Evolutionary Theory of Economic Change (Harvard Business School Press, 1982).
 7.
Frenken, K., Van Oort, F. & Verburg, T. Related variety, unrelated variety and regional economic growth. Regional Stud. 41, 685–697 (2007).
 8.
Ellison, G., Glaeser, E. L. & Kerr, W. R. What causes industry agglomeration? Evidence from coagglomeration patterns. Am. Economic Rev. 100, 1195–1213 (2010).
 9.
Mendel, G. Versuche über plflanzenhybriden. In Verhandlungen des naturforschenden Vereines in Brünn, Bd. IV für das Jahr1865, 3–47 (Abhandlungen, 1866).
 10.
Hidalgo, C. A., Blumm, N., Barabási, A.L. & Christakis, N. A. A dynamic network approach for the study of human phenotypes. PLoS computational Biol. 5, e1000353 (2009).
 11.
Goh, K.I. et al. The human disease network. Proc. Natl Acad. Sci. USA 104, 8685–8690 (2007).
 12.
Ricardo, D. On the Principles of Political Economy and Taxation. (John Murray, 1817).
 13.
Heckscher, E. F. & Ohlin, B. G. HeckscherOhlin Trade Theory (The MIT Press, 1991).
 14.
Eaton, J. & Kortum, S. Technology, geography, and trade. Econometrica 70, 1741–1779 (2002).
 15.
Solow, R. M. A contribution to the theory of economic growth. Q. J. Econ. 70, 65–94 (1956).
 16.
Mankiw, N. G. Principles of Macroeconomics (Cengage Learning, 2014).
 17.
Romer, P. M. Endogenous technological change. J. Political Econ. 98, S71–S102 (1990).
 18.
Aghion, P. & Howitt, P. A model of growth through creative destruction. Econometrica 60, 323351 (1992).
 19.
Aghion, P., Howitt, P., BrantCollett, M. & GarcíaPeñalosa, C. Endogenous Growth Theory (The MIT Press, 1998).
 20.
Balland, P.A. & Rigby, D. The geography of complex knowledge. Economic Geogr. 93, 1–23 (2017).
 21.
Bustos, S., Gomez, C., Hausmann, R. & Hidalgo, C. A. The dynamics of nestedness predicts the evolution of industrial ecosystems. PLoS ONE 7, e49393 (2012).
 22.
Zaccaria, A., Cristelli, M., Tacchella, A. & Pietronero, L. How the taxonomy of products drives the economic development of countries. PLoS ONE 9, e113770 (2014).
 23.
Boschma, R. Constructing regional advantage and smart specialisation: comparison of two european policy concepts. Scienze Regionali (2014).
 24.
Hausmann, R. & Hidalgo, C. A. The network structure of economic output. J. Economic Growth 16, 309–342 (2011).
 25.
Delvenne, J.C., Yaliraki, S. N. & Barahona, M. Stability of graph communities across time scales. Proc. Natl Acad. Sci. USA 107, 12755–12760 (2010).
 26.
Balassa, B. Trade liberalisation and “revealed” comparative advantage. Manch. Sch. 33, 99–123 (1965).
 27.
Bastian, M., Heymann, S. & Jacomy, M. et al. Gephi: an open source software for exploring and manipulating networks. ICWSM 8, 361–362 (2009).
 28.
Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
 29.
Girvan, M. & Newman, M. E. Community structure in social and biological networks. Proc. Natl Acad. Sci. USA 99, 7821–7826 (2002).
 30.
Hausmann, R., Hwang, J. & Rodrik, D. What you export matters. J. Economic Growth 12, 1–25 (2007).
 31.
Hausmann, R., Hidalgo, C., Stock, D. P. & Yildirim, M. A. Implied comparative advantage. HKS Working Paper No. RWP14003 (2019).
 32.
Coniglio, N. D., Lagravinese, R., Vurchio, D. & Armenise, M. The pattern of structural change: testing the product space framework. Ind. Corp. Change 27, 763–785 (2018).
 33.
Coniglio, N. D., Vurchio, D., Cantore, N. & Clara, M. On the evolution of comparative advantage: pathdependent versus pathdefying changes (2018). Papers in Evolutionary Economic Geography (PEEG) 1818, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography.
 34.
Bettencourt, L. M., Samaniego, H. & Youn, H. Professional diversity and the productivity of cities. Sci. Rep. 4, 5393 (2014).
 35.
GomezLievano, A., PattersonLomba, O. & Hausmann, R. Explaining the prevalence, scaling and variance of urban phenomena. Nat. Hum. Behav. 1, 0012 (2017).
 36.
O’Clery, N., GomezLievano, A. & Lora, E. The path to labor formality: urban agglomeration and the emergence of complex industries. Tech. Rep., HKS Working Paper No. RFWP 78 (2016).
 37.
Porter, M. E. Clusters and the new economics of competition. Harv. Bus. Rev. 76, 77–90 (1998).
 38.
Comin, D. & Hobijn, B. An exploration of technology diffusion. Am. Economic Rev. 100, 2031–59 (2010).
 39.
Bascompte, J., Jordano, P., Melián, C. J. & Olesen, J. M. The nested assembly of plant–animal mutualistic networks. Proc. Natl Acad. Sci. USA 100, 9383–9387 (2003).
 40.
Saavedra, S., ReedTsochas, F. & Uzzi, B. A simple model of bipartite cooperation for ecological and organizational networks. Nature 457, 463 (2009).
 41.
Saavedra, S., Stouffer, D. B., Uzzi, B. & Bascompte, J. Strong contributors to network persistence are the most vulnerable to extinction. Nature 478, 233 (2011).
 42.
Stiglitz, J. E., Lin, J. Y. & Monga, C. The Rejuvenation of Industrial Policy (The World Bank, 2013).
Acknowledgements
We would like to thank the members of the Growth Lab, Center for International Development at Harvard University. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 661068. N.O’C. received support from the PEAK Urban programme, funded by UKRI’s Global Challenge Research Fund, Grant Ref: ES/P011055/.
Author information
Contributions
Conceived and designed the experiments: N.O’C., M.A.Y. and R.H. Performed the experiments: N.O’C. and M.A.Y. Analysed the results: N.O’C., M.A.Y. and R.H. Wrote the manuscript: N.O’C. and M.A.Y.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Nicola Coniglio, Michael Danziger and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
O’Clery, N., Yıldırım, M.A. & Hausmann, R. Productive Ecosystems and the arrow of development. Nat Commun 12, 1479 (2021). https://doi.org/10.1038/s41467-021-21689-0