Capability accumulation patterns across economic, innovation, and knowledge-production activities

The evolution of economic and innovation systems at the national scale is shaped by a complex dynamics related to the multi-layer network connecting countries to the activities in which they are proficient. Each layer represents a different domain, related to the production of knowledge and goods: scientific research, technology innovation, industrial production and trade. Nestedness, a footprint of a complex dynamics, emerges as a persistent feature across these multiple kinds of activities (i.e. network layers). We observe that, in the layers of innovation and trade, the competitiveness of countries correlates unambiguously with their diversification, while the science layer shows some peculiar features. The evolution of the scientific domain leads to an increasingly modular structure, in which the most developed countries become relatively less active in the less advanced scientific fields, where emerging countries acquire prominence. This observation is in line with a capability-based view of the evolution of economic systems, but with a slight twist. Indeed, while the accumulation of specific know-how and skills is a fundamental step towards development, resource constraints force countries to acquire competitiveness in the more complex research fields at the expense of more basic, albeit less visible (or more crowded) ones. This tendency towards a relatively specialized basket of capabilities leads to a trade-off between the need to diversify in order to evolve and the need to allocate resources efficiently. Collaborative patterns among developed countries reduce the necessity to be competitive in the less sophisticated research fields, freeing resources for the more complex ones.


Introduction
The global set of national economic systems can be represented as a multitude of interrelated layers, each referring to a characteristic domain of activity (e.g. trade, innovation, scientific research), only some of which are measurable. Indeed, the view of economic outcomes as the result of complex interactions between interconnected systems is a long-standing idea with deep roots, e.g. in evolutionary economics. One notable example is the notion of systems of innovation [1], defined as the network of private and public institutions operating within a territory which contribute, through their activities and mutual relations, to the discovery and spread of new technologies. In general, the innovation system considers universities, firms and the public sector among the relevant actors responsible for the interactions that shape innovation and knowledge transmission. Moreover, the concept has been adapted successfully over time to serve as a framework to describe and analyze National [2,3] as well as Regional systems [4]. A similar perspective is shared by the triple helix model [5] and its later refinements, which identify economic agents and their co-evolution as drivers of knowledge production and innovation in knowledge-based societies, while reducing the emphasis on the importance of borders and local specificity. As a result, understanding how the different layers evolve over time has become increasingly relevant in the economic literature, and the variety of empirical tools proposed to address it has grown accordingly.
The framework of Economic Complexity [6] makes it possible to study innovation systems by focusing on national competitiveness in the different activities. The interaction between science, technology and production is rich and displays a high degree of interconnection. Indeed, the strongest statistically significant signal of interaction between these layers suggests that technological breakthroughs drive the development of new products and science [7], although the sub-leading interactions are still significant.
At the same time, each layer displays nested and diversified patterns with specific features that are not shared by the other layers. These features are connected to the main drivers of competitiveness of each system and might not be general, although diversification arises in all the cases. Therefore, it is interesting to measure the structural evolution of each layer taken in isolation and compare their analogies and differences.
The scientific literature on biological and ecological systems provides a foundation for our analysis because mutualistic systems create patterns comparable to the ones found in economic systems. Furthermore, the nested patterns of ecological systems are related to their stability with respect to external or internal perturbations. Innovative and productive systems also display nested and modular patterns as emerging features. Therefore, exploiting the analogy by applying methods borrowed from the ecological domain could help shed light on interesting properties of human systems, possibly related to their stability and evolution.
The manuscript first introduces the materials and methods, describing the datasets and the mathematical background of the main tools considered. The results are then presented in detail and interpreted in the final discussion.

Materials and Methods
This section describes the databases considered in the analysis and the tools and methods implemented.

Databases
The analysis of the scientific layer is based on data that aggregates the scientific output of nations into scientific domains. The source of this data is the Open Academic Graph v2 (OAG) [8][9][10], which is a snapshot of the Microsoft database taken in late November 2018. OAG lists a large number of academic papers, reporting for each one information such as its citation count and the institute or university with which the authors are affiliated. The database covers most of the journals, conference proceedings, books and manuscripts published from the early 1800s up to the moment when the snapshot was taken. We consider publications starting from 1960 because for earlier years only a small number of publications are available, mostly concentrated in a small set of developed countries. In terms of geographical coverage, OAG accounts for most of the nations in the world and is one of the most complete and detailed datasets available. OAG presents a small bias towards developed and English-speaking nations [11], although this is a common characteristic of many other databases, e.g. Scopus [12,13].
The technological dataset RegPat [14], built by the OECD and updated yearly, accounts for the innovation taking place within a large set of countries as proxied by the patenting activity carried out by applicants and inventors located within their borders. This database does not cover developed nations uniformly and, since it focuses on patent applications submitted to the European Patent Office (EPO), it has a bias toward Europe. Nevertheless, the database makes up for this shortcoming thanks to an accurate geocoding of the patent documents it contains. Overall, it contains data about 200 nations and 649 4-digit technological codes of the CPC classification. It covers the period from 1978, when the EPO was first established, to the year of publication.
The COMTRADE [15] database collected by the UN, which reports the trade flows of physical goods between countries, forms the basis for the economic layer employed in our analysis. The database, as homogenized by [16], covers 169 nations and reports 1218 4-digit product codes of the HS-1992 classification.

Measures of competitiveness
Technological impact can be measured by the number of patents filed in each field of technology, and economic impact by the export flows in each product category. Scientific impact is instead usually based on citation counts, because citations are widely recognized as a proxy for the quality of the research performed by authors, institutions and, consequently, nations. However, due to the rich-get-richer mechanism, citation counts display very skewed distributions with fat tails and weak convergence to stationary measures [17][18][19]. To partly correct for skewness one can employ log-citation counts [11,20,21], computed for each geographical area, labelled by i, and each scientific field, labelled by α.
In the present manuscript we consider both citation counts and log-citation counts. As the results show, the latter yield more stable results with lower fluctuations and are easier to interpret.
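To illustrate why a logarithmic transformation dampens the effect of a few highly cited papers, the following minimal Python sketch aggregates raw and log-transformed citations per (area, field) pair. It assumes that the log-citation score is the sum of log(1 + c) over the papers of an area in a field; this specific functional form, the function name `aggregate_citations` and the toy data are illustrative assumptions and are not taken from the text above.

```python
import math
from collections import defaultdict


def aggregate_citations(papers):
    """Return (raw, log) citation totals keyed by (area, field).

    `papers` is an iterable of (area, field, citation_count) tuples.
    ASSUMPTION: the log-citation score is sum(log(1 + c)) over papers.
    """
    raw = defaultdict(float)
    log = defaultdict(float)
    for area, field, citations in papers:
        raw[(area, field)] += citations
        log[(area, field)] += math.log1p(citations)  # log(1 + c)
    return dict(raw), dict(log)


# Toy data (hypothetical): area "A" owns one extremely cited paper.
papers = [
    ("A", "physics", 0),
    ("A", "physics", 3),
    ("A", "physics", 1000),
    ("B", "physics", 30),
    ("B", "physics", 40),
]
raw, log = aggregate_citations(papers)

# Ratio between the two areas under each measure.
raw_ratio = raw[("A", "physics")] / raw[("B", "physics")]
log_ratio = log[("A", "physics")] / log[("B", "physics")]
```

In this toy case the single outlier dominates the raw totals, while the log-transformed totals of the two areas are much closer, which is the kind of reduced fluctuation the log-citation counts are meant to provide.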
Comparing countries based on their competitiveness requires identifying the domains actively pursued by each one. This can be achieved using the Revealed Comparative Advantage (RCA) indicator [22], commonly used in the literature as a measure of relative specialization. RCA is computed as the weight of an activity in a national basket of activities relative to the global weight of the same activity:
$$\mathrm{RCA}_{i,\alpha} = \frac{W_{i,\alpha} \big/ \sum_{\beta} W_{i,\beta}}{\sum_{j} W_{j,\alpha} \big/ \sum_{j,\beta} W_{j,\beta}},$$
where $W_{i,\alpha}$ indicates the extensive measure over which the RCA is computed, i.e. patent counts, export flows or log-citations, depending on the layer. RCA takes values on a continuum and can hence encode a great deal of information. However, for our purposes a more coarse-grained measure is more adequate. For this reason, we transform RCA into binary values flagging the activities in which each nation is more active than a given threshold. Following standard practice, we apply binary filtering by keeping only RCA values above $\mathrm{RCA}^* = 1$, thus constructing binary bipartite networks whose bi-adjacency matrix has elements $M_{i,\alpha} = \Theta(\mathrm{RCA}_{i,\alpha} - 1)$, where $\Theta(\cdot)$ is the step function.
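The construction of the binary network can be sketched in a few lines of Python. The snippet computes the RCA as the share of an activity in a national basket divided by the global share of that activity, and then applies the threshold at 1. The function names `rca_matrix` and `binarize` and the toy matrix are hypothetical, and the strict inequality at the threshold is an assumed convention for the step function.

```python
def rca_matrix(W):
    """RCA[i][a] = (W[i][a] / sum_b W[i][b]) / (sum_j W[j][a] / sum_jb W[j][b]).

    `W` is a list of rows (nations) over columns (activities).
    Entries in empty rows or columns are set to 0 by convention (assumption).
    """
    row_tot = [sum(row) for row in W]
    col_tot = [sum(col) for col in zip(*W)]
    total = sum(row_tot)
    rca = []
    for i, row in enumerate(W):
        out = []
        for a, w in enumerate(row):
            if row_tot[i] > 0 and col_tot[a] > 0:
                out.append((w / row_tot[i]) / (col_tot[a] / total))
            else:
                out.append(0.0)
        rca.append(out)
    return rca


def binarize(rca, threshold=1.0):
    """Keep a link only where RCA is strictly above the threshold."""
    return [[1 if x > threshold else 0 for x in row] for row in rca]


# Toy example (hypothetical values): two nations, two activities.
W = [[8, 2],
     [1, 9]]
rca = rca_matrix(W)
M = binarize(rca)
```

Here each nation is flagged as active only in the activity where its share exceeds the global share, so `M` is the 2 × 2 identity matrix.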

Nestedness
Nestedness is a property of systems consisting of actors with heterogeneous features that measures the extent to which shared features belong to both feature-rich and feature-poor actors. In the manuscript we estimate the nestedness of each network by computing two metrics widely used in the literature, the Temperature of Nestedness [23,24] and the Nestedness metric based on Overlap and Decreasing Fill (NODF) [25]. NODF estimates the nestedness by evaluating the overlap of each row and column with all the other rows and columns. Denoting by $M$ the bi-adjacency matrix of the network considered, with row degrees $k_i = \sum_\alpha M_{i,\alpha}$ and column degrees $k_\alpha = \sum_i M_{i,\alpha}$, NODF is computed as
$$\mathrm{NODF} = \frac{1}{N}\left[\sum_{i,j} \theta(k_i - k_j)\,\frac{C_{i,j}}{k_j} + \sum_{\alpha,\beta} \theta(k_\alpha - k_\beta)\,\frac{C_{\alpha,\beta}}{k_\beta}\right],$$
where $N$ is a suitable normalization, $C_{i,j} = \sum_\alpha M_{i,\alpha} M_{j,\alpha}$ is the number of co-occurrences between rows $i$ and $j$, and $C_{\alpha,\beta} = \sum_i M_{i,\alpha} M_{i,\beta}$ is the number of co-occurring elements between columns $\alpha$ and $\beta$. The function $\theta(x)$ is the Heaviside function, also known as the step function. The two terms inside the square brackets are proportional to the row NODF and the column NODF, respectively. In contrast, the Temperature of Nestedness is computed through a rather convoluted formula that evaluates the unexpectedness of absences (0) above and presences (1) below the isocline of perfect nestedness, weighted by their distance from it; the isocline is determined by the density of the adjacency matrix. We use the code made available by the Nestedness for Dummies (NeD) project 1 to compute the Nestedness Temperature.
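The pairwise-overlap logic behind NODF can be sketched as follows. The snippet assumes the common convention in which a pair of rows (or columns) contributes only when their degrees differ, the overlap is divided by the smaller degree, and the normalization is the total number of row pairs plus column pairs, so that the result lies between 0 and 1. These conventions and the function name `nodf` are assumptions made for illustration; the implementation actually used in the analysis may differ in such details.

```python
def nodf(M):
    """NODF of a binary bi-adjacency matrix given as a list of rows.

    ASSUMPTIONS: pairs with equal degree contribute zero, the overlap is
    divided by the smaller degree, and the result is normalized to [0, 1].
    """
    rows = [list(r) for r in M]
    cols = [list(c) for c in zip(*M)]

    def paired_overlap(vectors):
        total = 0.0
        for a in range(len(vectors)):
            for b in range(a + 1, len(vectors)):
                ka, kb = sum(vectors[a]), sum(vectors[b])
                if ka == kb or min(ka, kb) == 0:
                    continue  # decreasing-fill condition not satisfied
                shared = sum(x * y for x, y in zip(vectors[a], vectors[b]))
                total += shared / min(ka, kb)
        return total

    n_r, n_c = len(rows), len(cols)
    norm = (n_r * (n_r - 1) + n_c * (n_c - 1)) / 2
    if norm == 0:
        return 0.0  # a 1 x 1 matrix has no pairs to compare
    return (paired_overlap(rows) + paired_overlap(cols)) / norm


# A perfectly triangular matrix versus a purely block-diagonal one.
nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
```

Under these conventions the triangular matrix reaches the maximum value of 1, while the diagonal matrix, in which no row or column shares links with another, scores 0.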

Modularity
Modularity measures the quality of a partition of a network, i.e. of a given community structure. According to the modularity metric, the best community structure maximizes
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(\xi_i, \xi_j), \qquad (4)$$
where $\xi_i$ is the label of the partition to which node $i$ belongs, $A_{ij}$ is the adjacency matrix of the network, $k_i = \sum_j A_{ij}$ is the degree of node $i$, $2m = \sum_{i,j} A_{ij}$, and $\delta$ is the Kronecker delta. In this work we compute the modularity of the monopartite projections of the bipartite networks corresponding to the layers connecting nations to their activities. We focus mainly on the evolution of the modularity of each layer, where the elements of the adjacency matrix $A$ of equation (4) are related to $C_{ij}$. Moreover, we check that the modularity of the monopartite representation of each bipartite network, often considered in the literature on block nestedness, does not create meaningful partitions in terms of specialized blocks [26]. The value of the modularity of the best partitioning measures the inter-dependence between the modules found, thus providing an estimate of the strength and stability of the proposed communities.
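As a hedged illustration of the quantities involved, the sketch below builds a monopartite projection of a binary bipartite matrix and evaluates the standard Newman modularity of a given partition. It assumes that the projected adjacency matrix is simply the co-occurrence matrix with a zero diagonal, which is only one possible way of relating it to the co-occurrences; the function names and the toy matrix are hypothetical. It evaluates a fixed partition and does not search for the optimal one.

```python
def cooccurrence(M):
    """Row projection: A[i][j] = number of columns shared by rows i and j.

    ASSUMPTION: the diagonal is set to zero (no self-loops).
    """
    n = len(M)
    return [[0 if i == j else sum(x * y for x, y in zip(M[i], M[j]))
             for j in range(n)] for i in range(n)]


def modularity(A, labels):
    """Newman modularity Q of the partition `labels` on weighted graph A."""
    strength = [sum(row) for row in A]
    two_m = sum(strength)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(len(A)):
        for j in range(len(A)):
            if labels[i] == labels[j]:
                q += A[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Toy bipartite matrix (hypothetical) with two disjoint blocks of nations.
M = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
A = cooccurrence(M)
q_blocks = modularity(A, [0, 0, 1, 1])  # partition matching the blocks
q_single = modularity(A, [0, 0, 0, 0])  # everything in one community
```

The partition that follows the two blocks scores higher than the trivial single-community partition, whose modularity is zero by construction.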

Results
As explained in the Materials and Methods section, the competitiveness of each nation in the various production and innovation activities can be estimated by evaluating the RCA. The RCA of the economic layer is usually based on international trade data, which is considered a qualitatively good measure of competitiveness. In the technological layer the RCA is often estimated by aggregating patent production, while in the scientific layer this is done through the log-citation counts. Thus, the resulting RCA distributions differ across layers (see figure 1, right panel). For example, the profile of the RCA distribution of the scientific data peaks around 1, with a lower occurrence of low RCA values and a steeper power-law decay at large RCA values. Conversely, the economic data does not display a peak in the analyzed spectrum. For all layers, the power-law decay is a sign that the system is highly heterogeneous, since it cannot be easily replicated by the RCA distribution obtained from random matrices. Yet, the slope of the decay depends on the particular properties associated with the relative competitiveness of the nations [27]. Interestingly, the RCA distributions of the Technological and Scientific layers show a cross-over in slope around 1 (the global average), while the distribution of the RCA of international trade exports does not present a clear cross-over. The binary representation of the RCA intrinsically defines the bipartite networks displaying the competitiveness of the nations in the basket of activities characterizing each layer. A basic quantity of interest in this representation is the network density, defined as the fraction of observed links with respect to the maximum number, i.e. the number of links one would observe in the fully connected graph.
The time series of the density of the Scientific and Technological layers indicates that nations increase their diversification over time, as shown in the right panel of figure 1. This growth marks a second difference of Science with respect to the Economy layer, which features a much steadier evolution of diversification. Indeed, the density of the scientific environment grows almost linearly, and this growth is probably triggered by the exponential growth of the scientific corpus, with a doubling period of approximately a decade [28]. Such exponential growth is found not only at the global scale but also at the national level for most of the developed and developing countries (as shown in the inset of the right panel of figure 1). A similar pattern can be observed in the production of patents. It is not reproduced in the available Export data of the last decades, although it was detected during the economic boom around the sixties [29].
The most important and characteristic pattern emerging from the binary representation is the triangular shape of the matrix $M_{i,\alpha}$, visible when the rows and columns are properly ordered [6,30]. All the layers display the hierarchical structure shown in figure 2, where the top rows, corresponding to the most competitive nations, have a high diversification, while some activities are highly ubiquitous (leftmost columns) and others are performed only by the top nations (rightmost columns). This triangularity can be easily visualized when the matrices are properly ordered, and signals a high nestedness, though, unfortunately, a precise mathematical definition of nestedness is still lacking [31]. Among the innovation-related layers, science appears to be the least nested, or triangular, while the technological layer displays a relatively sharp boundary along the diagonal of the associated adjacency matrix, which clearly highlights its nested structure. This feature points to a structure of the scientific network that presents more intrinsic heterogeneity compared to the other layers. Such heterogeneity was considered the cause of the lower quality of the scientific layer in the context of Economic Complexity [13,32], a problem that was solved by the introduction of the more stable log-citation metrics [11]. Intriguingly, the network based on the log-citation metrics presents a hole in the top left corner, which is not observed in the other cases, suggesting that the top nations might not be very competitive in sectors with high ubiquity. Note that the absence of the top left links does not mean that the top nations are not producing science in the less complex scientific domains, but rather that the number of citations they receive in those fields is below their fair share, given the global average 2.
The lack of a precise definition of nestedness has led to the derivation of different, albeit slightly counter-intuitive, metrics. Indeed, depending on the feature considered as the representative characteristic of nestedness, various metrics can be defined to make the concept operational. In this work we consider the Temperature of Nestedness [23] (or, simply, Temperature) and the NODF [25], because both are connected to different, yet related, features characterizing the dynamics of the innovative layers. The main difference between the metrics is that Temperature estimates the nestedness from the unexpected presence or absence of links in the empirical bipartite network with respect to the perfectly nested case at fixed density 3. NODF, instead, estimates the nestedness by considering the degree of overlap that each row and column has with the others. Hence, an important difference between the two metrics is that the algorithm computing the Temperature requires the network to be re-ordered to achieve the most nested arrangement, while NODF is independent of the ordering. In the following we opt for the re-ordering given by the Fitness-Complexity ranks whenever we want to compute the Temperature, since the Fitness-Complexity algorithm has been shown to outperform other techniques [33] in approximating the maximally nested arrangement.
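The re-ordering step can be sketched as follows. The snippet assumes the commonly used form of the Fitness-Complexity iteration, in which the fitness of a nation is the sum of the complexities of its activities, the complexity of an activity is the inverse of the sum of the inverse fitnesses of the nations active in it, and both are normalized by their mean at each step. The number of iterations, the function names and the toy matrix are illustrative assumptions, and every row and column is assumed to contain at least one link.

```python
def fitness_complexity(M, n_iter=20):
    """Iterate the Fitness-Complexity map on a binary matrix M (list of rows).

    ASSUMPTIONS: mean normalization at each step, fixed iteration count,
    and no empty rows or columns in M.
    """
    n_r, n_c = len(M), len(M[0])
    F = [1.0] * n_r
    Q = [1.0] * n_c
    for _ in range(n_iter):
        F_new = [sum(M[i][a] * Q[a] for a in range(n_c)) for i in range(n_r)]
        Q_new = []
        for a in range(n_c):
            denom = 0.0
            for i in range(n_r):
                if M[i][a]:
                    if F[i] == 0.0:
                        denom = float("inf")  # a zero-fitness nation pins Q to 0
                        break
                    denom += 1.0 / F[i]
            Q_new.append(1.0 / denom if denom > 0 else 0.0)
        mean_F = sum(F_new) / n_r
        mean_Q = sum(Q_new) / n_c
        F = [f / mean_F for f in F_new]
        Q = [q / mean_Q for q in Q_new]
    return F, Q


def reorder(M, F, Q):
    """Rows by decreasing fitness, columns by increasing complexity."""
    row_order = sorted(range(len(M)), key=lambda i: -F[i])
    col_order = sorted(range(len(M[0])), key=lambda a: Q[a])
    return [[M[i][a] for a in col_order] for i in row_order]


# A nested matrix with shuffled rows and columns (hypothetical example).
shuffled = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 1]]
F, Q = fitness_complexity(shuffled)
ordered = reorder(shuffled, F, Q)
```

With this ordering the most diversified nation ends up in the top row and the most ubiquitous activity in the leftmost column, which recovers the triangular arrangement for this toy matrix.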
Both measures of nestedness capture some features of the dynamical evolution of the bipartite networks, and consequently of the innovation systems. For instance, by computing the overlap (co-occurrence) between nations, NODF can describe the ability of countries to follow the path of more developed nations, according to capability-based mechanisms. Indeed, NODF can be separated into its row and column components, making it possible to disentangle the contributions of row-wise and column-wise co-occurrences to the nestedness of the system. Temperature, on the other hand, evaluates how closely the empirical network matches a fully nested equilibrium of the environment, based on the stable state of a mutualistic system [27]. Therefore, in the innovation systems, the Temperature is high when low-performing nations are active in highly unexpected domains.
A problem encountered in the evaluation of the nestedness is that Temperature and NODF may depend on more basic topological properties of the networks that are not related to a particular visual pattern. For instance, the density of the network is the most important parameter in the estimation of the nestedness [31], so that comparing networks with different densities is problematic. Another typical source of bias in the comparison of the nestedness is given by the degree distributions (the distribution of the diversification of countries and of the ubiquity of activities), since their evolution is not random and presents high temporal persistence. The standard way to correct for this issue is to extract the statistical significance of the nestedness by comparing the empirical measures with those obtained from suitable random models able to represent the selected biases [34,35]. For instance, the Erdős-Rényi (ER) null model [36] draws a network ensemble constraining only the average density, while the Bipartite Configuration Model (BiCM) [29] also constrains the average network degrees.
Irrespective of the increasing trend of both the density and the nestedness measures over time, discounting only the density does not provide much information about the evolution of the nestedness in the innovation systems. Indeed, the ER ensemble of matrices is much less nested than the empirical matrices, as shown in the top panels of figure 3. According to the z-score values, which measure the distance of the empirical values from the ensemble averages in units of the ensemble standard deviation, in all layers the empirical nestedness is much higher than in the random case. Instead, the null ensemble obtained by constraining the degrees of the nodes in the network with the BiCM leads to a much lower significance of the features of the empirical network. Indeed, the value of the degrees is not random but has a persistent dynamics that affects the evolution of the nestedness, and this information must be accounted for in the null models. The information contained in the degrees is not usually taken into account in the biological literature on nestedness, where most metrics were originally developed [37] (see however [38,39]). For example, the perfect patterns for the Temperature are related to a single parameter only, the density. On the contrary, in the Economic literature, diversification (i.e. the node degree in the binary network) is the main feature against which the nestedness is studied [40] 4. However, the difference between the significance of Temperature and NODF is not strong and, as far as this analysis is concerned, the two nestedness measures correlate.
The nested pattern that emerges in innovation layers is usually related to a competitive dynamics. Collaborative systems, instead, can create a more clustered structure of the networks, with the growth of modules or communities [41]. In the biological realm, the modular organization of species interactions can increase the dynamical stability of the communities [42,43] against exogenous (external) perturbations 5. The combination of competitive and collaborative dynamics promotes the formation of local nested patterns, or in-block nestedness, indicating the natural aggregation of components in the network. Focusing on the modular evolution of communities of nations having more internal connections (resulting from co-occurrences of activities) than connections with other communities, all the layers tend to level out at a comparable value of the optimal modularity, as seen in the left panel of figure 4. The decrease of the value of the best modularity is typically associated with a decrease in the quality of the modular structures [42,44], due to a more uniform distribution of links within and across partitions. However, the effective number of modules, or communities, seems to converge to a number between 3 and 5, suggesting that the network is getting more and more clustered as a result of a more globalized and collaborative world. Interestingly, the best partitioning of the scientific activities indicates neither a clear organization nor a block-nested structure. Indeed, a meaningful partition is found only on the layer of nations, while the scientific fields are more homogeneously distributed; the emergent community structure therefore depends on how different classifications separate the fields of science. Following the same arguments discussed above regarding the nestedness, the modularity depends on the network's features, and the comparison of different networks is performed using the random models for bias removal.
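The significance test against a density-only null model can be sketched as below. The snippet draws an ensemble in which every link is present independently with probability equal to the empirical density, evaluates a nestedness metric on each sample, and reports the z-score of the empirical value. The sample size, the seed, the function names and the simplified row-only overlap metric `row_overlap` (a stand-in for a full nestedness metric) are assumptions made for illustration. A degree-constrained ensemble such as the BiCM would require solving for the link probabilities and is not covered here.

```python
import random
import statistics


def er_sample(n_rows, n_cols, density, rng):
    """Binary matrix with independent links of probability `density`."""
    return [[1 if rng.random() < density else 0 for _ in range(n_cols)]
            for _ in range(n_rows)]


def row_overlap(M):
    """Stand-in nestedness metric: mean decreasing-fill overlap of row pairs."""
    total, pairs = 0.0, 0
    for a in range(len(M)):
        for b in range(a + 1, len(M)):
            pairs += 1
            ka, kb = sum(M[a]), sum(M[b])
            if ka != kb and min(ka, kb) > 0:
                shared = sum(x * y for x, y in zip(M[a], M[b]))
                total += shared / min(ka, kb)
    return total / pairs if pairs else 0.0


def z_score(M, metric, n_samples=200, seed=0):
    """(empirical - ensemble mean) / ensemble standard deviation."""
    rng = random.Random(seed)
    n_rows, n_cols = len(M), len(M[0])
    density = sum(map(sum, M)) / (n_rows * n_cols)
    null = [metric(er_sample(n_rows, n_cols, density, rng))
            for _ in range(n_samples)]
    sigma = statistics.pstdev(null)
    if sigma == 0:
        return float("nan")  # degenerate ensemble, z-score undefined
    return (metric(M) - statistics.mean(null)) / sigma


# Perfectly triangular 6 x 6 matrix (hypothetical empirical network).
triangular = [[1 if a < 6 - i else 0 for a in range(6)] for i in range(6)]
z = z_score(triangular, row_overlap)
```

A positive z-score indicates that the empirical matrix is more nested than matrices of the same density with randomly placed links, which is the comparison performed with the ER ensemble.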
The right panel of figure 4 indicates that the modularity in science becomes more and more significant, so that the partitions are increasingly reliable. On the contrary, both the international trade export and the technological layers are closer to a random structure, indicating that the modules are less meaningful and can be partially ascribed to the diversification and ubiquity.

Discussion
In this Section we discuss possible interpretations of the results. The evolution of the scientific system highlights a constant growth of diversification: all countries are enriching their basket of active domains. Such growth has persisted since the end of the Second World War. The less competitive nations become able to actively progress in more sophisticated research areas following the leading countries, while the leaders develop new fields of research, increasing their diversification further. This picture is consistent with the standard narrative of the evolution of innovative systems, from the Economic to the Technological layers, where the evolution of competitiveness is related to the evolution of the capabilities brought by each single actor and nation [45][46][47]. The same conclusion can be drawn here because the most reliable patterns are obtained with the null model built by fixing the diversification and ubiquity to their empirical values (see figure 3). Instead, fixing only the global density generates an ensemble whose average behavior is very far from the evolution of the empirical patterns.
Furthermore, the nestedness of the network remains stable in the significant region for both the Temperature and the NODF estimations, as seen in the central panels of figure 3. However, the binary matrix representation of the network reveals a new pattern that is hidden when the citation counts metrics are used: two denser regions appear on the top right and bottom left sides, with a hole in the top left corner. The appearance of two regions in the country layer suggests the emergence of a modular organization of nations that cannot be described by the dynamics of diversification alone, but needs to account for a specialization mechanism. Thus, the region on top is populated by the most diversified and competitive nations, which are focusing on the most complex scientific domains and reducing their competitiveness in the less complex sectors. Instead, the less competitive nations are more likely to be competitive in less sophisticated domains and are less efficient in the more complex sectors, as indicated by the fact that, according to Temperature, the empirical network is significantly more nested than the random one.
A possible interpretation is that the evolution of the top nations leads them to divert resources away from the less complex fields in favor of the most complex domains, which causes the appearance of the hole in the top left corner that is not present in the standard narrative of diversification [45]. This hole is characterized by RCA values below the global average, although these values remain close to, but below, the threshold. Thus, in the scientific environment the driver of evolution is not simply the effort to achieve an increasing diversification of the basket of activities but, rather, a trade-off strategy between diversification and specialization in order to better allocate the available resources. This behavior is probably the trademark of the scientific layer, where funding is usually channelled toward the most complex and highly-cited domains, instead of being broadly distributed throughout the spectrum of scientific activities. For instance, in the production of physical goods there is an economic advantage in also producing the less complex artifacts, while this feature is less important in the scientific realm. At the same time, the scientific evolution of the less performing nations is driven by emulation of the dynamics of leading nations with an increase of diversification, as indicated by the high significance of NODF. However, the significance of the Temperature suggests that, although the less performing nations are able to be active in some sophisticated domains, this activity is lower than the random expectation.
The Technological layer yields the visually best nestedness, because the plot of the network's adjacency matrix presents a clear triangular shape. The most diversified nations lie at the top of the image with a roughly uniform basket of active sectors, ranging from the least to the most complex domains. The presence of a clear diversification among the nations suggests that the dynamics of the Technological environment follows the capability-based framework; for instance, NODF becomes more significant over time. Indeed, the NODF of rows (countries) follows that of the Scientific layer, and the difference between the two lies in the different activity layers. Furthermore, the significance of the Temperature follows the same trend, since the significant level of unexpectedness is comparable. Instead, the Technological environment is the least prone to create meaningful partitions, or modules, among the nations, and its dynamics is probably driven by the co-occurrence of knowledge and capabilities.
Finally, the Economic environment displays a more conservative evolution, with the network density and the effective number of active competitors in the system remaining roughly constant. Thus, the dynamics of the network is related more to an evolution of the nestedness pattern than to an evolution of its size. Indeed, the nestedness follows the dynamics depicted by the other Technological systems, highlighting the possible presence of