Scarcity of scale-free topology is universal across biochemical networks

Biochemical reactions underlie the functioning of all life. As in many biological and technological systems, the complex web of interactions among molecules within cells and ecosystems is challenging to quantify with simple mathematical objects. A large body of research has indicated that many real-world biological and technological systems, including biochemistry, can be described by power-law relationships between the numbers of nodes and edges, and are therefore often described as "scale-free". Recently, new statistical analyses have revealed that true scale-free networks are rare. We provide a first application of these methods to data sampled across two distinct levels of biological organization: individuals and ecosystems. We analyze a large ensemble of biochemical networks generated from data of 785 metagenomes and 1082 genomes (sampled from the three domains of life). The results confirm that only a few biochemical networks are anything more than super-weakly scale-free. Additionally, we test the distinguishability of individual and ecosystem-level biochemical networks, and show there is no sharp transition in the structure of biochemical networks moving from individuals to ecosystems. This result holds across different network projections. Our results indicate that while biochemical networks are not scale-free, they nonetheless exhibit common structure across different levels of organization, independent of the projection chosen, suggestive of shared organizing principles across all biochemical networks.

Statistical mechanics was developed in the nineteenth century for studying and predicting the behavior of systems with many components. It has been hugely successful in its application to those physical systems well-approximated by idealized models of non-interacting particles. However, real-world systems are often much more complex, leading to a realization over the last several decades that new statistical approaches are necessary to describe biological and technological systems. Among the most natural mathematical frameworks for developing the necessary formalism is network theory, which projects the complex set of interactions composing real systems onto an abstract graph representation 1–7. Such representations are powerful in their capacity to quantitatively describe the relationships between components of complex systems, and because they permit inferring function and dynamics from structure 8–12.
Network theory has been especially useful for studying metabolism. Metabolism consists of catalyzed reactions that transform matter along specific pathways, creating a complex web of interactions among the set of molecular species that collectively compose living things 13–17. It is the collective behavior of this system of reactions that must be understood in order to fully characterize living chemical processes: counting only individual components (molecules) is inadequate. The structure of how those components interact with one another (via reactions) matters; in fact, it is precisely what separates organized biological systems from messy chemical ones 18–20.
Within the formalism of network theory, one of the simplest ways to capture insights into the global structure of a network is to analyze the shape of its degree distribution. A huge volume of research into various complex biological, technological and social networks has therefore focused on identifying the scaling behavior of the corresponding degree distributions for network projections describing those systems. One of the most significant results emerging from these analyses is that many networks describing real-world systems exhibit ostensibly "scale-free" topology 21–25, characterized by a power-law degree distribution. The allure of scale-free networks lies partly in their association with simple generative mechanisms; whether biochemical networks are truly scale-free therefore bears on whether the same or different generative mechanisms could operate to organize biochemistry across multiple scales.

Results
We use the statistical methods developed by Broido and Clauset 30 in what follows. All identified biochemical reactions encoded in each genome and metagenome were used to construct eight distinct network representations. This resulted in 8656 total network projections across the 1082 genome-level biochemical datasets, and 6280 total network projections across the 785 metagenome-level biochemical datasets. Each representation can be viewed as a different coarse-graining of the underlying system of reactions, i.e. the underlying dataset (Fig. 1). We determine whether or not these datasets are scale-free, and analyze which aspects of the datasets, and of their diverse projections, tend to make them more or less scale-free. The alternative distributions that we compare to the power law are the exponential distribution, the log-normal distribution, the stretched exponential distribution, and the power-law distribution with a cutoff (see 29,30 for more details on these distributions).
We first classified each dataset in terms of how scale-free it is. Data are classified as: Super-Weak when, for ≥ 50% of network projections, no alternative distributions are favored over the power law; Weakest if, for ≥ 50% of network projections, a power law cannot be rejected (p ≥ 0.1); Weak if it meets the requirements for Weakest and there are ≥ 50 nodes in the distribution's tail (n tail ≥ 50); Strong if it meets the requirements of both Super-Weak and Weak, and the median scaling exponent satisfies 2 < α < 3; and Strongest if it meets the requirements for Strong for ≥ 90% of graphs, rather than ≥ 50%, and if for at least 95% of graphs none of the alternative distributions are favored over the power law.
Our results are consistent with nearly all biochemical networks, at either the individual or ecosystem level, being "super-weakly" scale-free (Fig. 2). While the power law fits better than the alternative models, it is not itself a good model. In the goodness-of-fit test, we find the majority of network representations across individual and ecosystem-level networks have p < 0.1: under the power-law hypothesis, there is less than a 10% chance of drawing data that agree with the model as poorly as ours, so the power law can be rejected. Additionally, when compared to other distributions through log-likelihood ratios, 99% of all datasets do not favor alternative heavy-tailed distributions over the power law for the majority of their network projections (Fig. 3, top row).
Where biochemical systems succeed and fail scale-free classifications. Goodness-of-fit p-value. The "weakest" requirement for a scale-free network introduced by Broido and Clauset stipulates at least 50% of a dataset's network-projections must have a power-law goodness-of-fit p ≥ 0.1 . For both individuals and ecosystems, only 6% of network-projections meet this requirement (Fig. 4, left column). This goodness-of-fit p-value requirement is the most restrictive of all scale-free requirements.
Tail size. Setting aside the fact that each subsequent scale-free requirement builds on the requirement(s) of the preceding one, we find 98% of individual networks and 99% of ecosystem networks do meet the requirement of n tail ≥ 50 for a scale-free degree distribution (Fig. 4, center column).
The power-law exponent, α. Only 50% of individual-level networks and 51% of ecosystem-level networks meet the requirement that 2 < α < 3 for their degree distribution. The goodness-of-fit p-value requirement and the requirement constraining values of α are the most restrictive when determining whether a biochemical network's degree distribution should be considered scale-free (Fig. 4, right column).
Meeting the threshold for scale-free classification is dependent on the network representation. We find the results of each requirement listed above for classifying topology as scale-free differ across the eight network projection types for each dataset. Unsurprisingly, for most requirements, there exists a minute difference between the values observed for the largest connected component and entire graph of a given network projection type (Fig. 3, right column). Depending on the measure, there is a noticeably larger difference between the major network projection types, e.g., between bipartite, unipartite-reactions, unipartite-compounds (where all substrates participating in the same reaction are connected), and unipartite-compounds (where substrates on the same side of a reaction are not connected) (Fig. 3, right column).
Comparing to alternative distributions. Over 99% of individual and ecosystem-level datasets have 6 projections which do not favor any other distribution over the power law (Fig. 3, top row, left column). No datasets have more than 6. The other two projections nearly always favor at least one other distribution over the power law: either the log-normal, exponential, stretched exponential, or power law with exponential cutoff (Fig. 3, top row, right column). Only 3 of the 6280 ecosystem-level network projections (across the 785 ecosystem-level datasets) do not favor at least one of the alternative distributions. Oftentimes all four are favored over the power-law distribution (Fig. S1, rows 3–4). These results are identical, within 95% confidence, for both individuals and ecosystems.
Goodness-of-fit p-value. Out of all datasets, 80% of individuals and 84% of ecosystems have only a single projection type with p ≥ 0.1 for a power-law fit to their degree distribution. This indicates the majority of datasets would still not meet the "weakest" requirement for scale-free even with a lowered threshold requiring 25% (2 networks) of a dataset's projections instead of 50% (4 networks) (Fig. 3, 2nd row, left column). The unipartite projection where substrates on the same side of a reaction are not connected (unipartite-subs_not_connected) was the most likely to satisfy p ≥ 0.1, although the difference between individuals and ecosystems is within the error. The unipartite-reaction projections were the least likely to satisfy p ≥ 0.1, which is consistent with the observation that these networks always favor an alternative distribution as a better fit to the data than the power law (Fig. 3, 2nd row, right column). As we initially reported, the majority of datasets do not meet the p-value threshold for being considered scale-free, although ecosystem-level datasets are more likely to meet the threshold.
Tail size. Out of all datasets, 98% of individuals and 96% of ecosystems meet n tail ≥ 50 for all projection types ( Fig. 3, 3rd row, left column). For 7 of the projection types, there is no difference between individuals and ecosystems, within 95% confidence (Fig. 3, 3rd row, right column).
The power-law exponent α. Out of all datasets, 95% of individuals and 97% of ecosystems meet 2 < α < 3 for 4 of 8 projection types (Fig. 3, bottom row, left column). The two types of unipartite-compound networks account for the majority of projections meeting the α-range requirement: if a dataset has at least 4 projection types meeting 2 < α < 3, two of them are typically unipartite-compound network projections (Fig. 3, bottom row, right column). The results are similar for both individuals and ecosystems.
Correlation of results between projections. Because 8 different network projections are derived from a single biochemical dataset, there is reason to expect the proportions of each projection type meeting any given scale-free criterion to be correlated. We therefore constructed a Pearson correlation matrix to test whether there are correlations between projections (Fig. S2). Unsurprisingly, we find that values from projections of a network's LCC and entire graph are highly correlated. All types of unipartite compound networks tend to be correlated. Values across many other projection types are barely correlated for the p-value and n tail criteria. Ecosystems tend to show more correlation, across all projection types, than individuals.

Distinguishing individuals and ecosystems based on their degree distributions. Multinomial regression. We used multinomial regression on network and degree distribution data from the above analyses to attempt to distinguish individuals from ecosystems. Most measures cannot reliably distinguish between these two levels of organization; only network size and network tail size distinguish the two levels better than chance. Using only network size, ecosystems could be correctly identified in test data 72.23% of the time, whereas individuals could be correctly identified 85.33% of the time (Fig. 5, left columns). When normalizing other measures to network size, the only one whose ability to distinguish individuals from ecosystems improved to better than chance was dexp (Fig. S3). This is a measure of which type of distribution is favored (or neither) in a log-likelihood ratio test between the power-law and exponential distributions.
Random forest. Random forest classifiers are a supervised machine learning technique that uses decision trees to make classifications. When using random forests to distinguish individuals and ecosystems based on network and distribution data, we find ecosystems can be correctly predicted 87.01% of the time, and individuals can be correctly predicted 95.82% of the time (out-of-bag, OOB, error rate of 7.91%). However, the size of the network and the size of the degree distribution tail are once again the best relative predictors. Without network size and tail size, the prediction accuracy drops to 79.27% for ecosystems and 94.81% for individuals (OOB error rate of 11.80%). When doing random forest classification by projection type, the prediction accuracies remain above 75% for ecosystems and 91% for individuals across all projections, which is better than the multinomial regression models even when information about network size is included (Fig. 5, left columns; Table S1). Mean degree was the best predictor across all network projection types.
(Figure 2 caption) The power-law distribution is not a "good" fit for most dataset network representations, and no networks meet the "Strongest" criteria defined by Broido and Clauset 30. Overlaid values show the percent of networks of each level which fall into each category, ±2SD. Right: the relationship between scale-freeness and largest network size across projections (n); all datasets containing networks larger than approximately 2100 nodes have degree distributions that rule out fitting well to a power law.
(Figure 5 caption) Each subplot shows the accuracy of using a particular network or statistical measure to predict whether that network data came from a biological individual or ecosystem. The top plots show prediction accuracy from multinomial regression across all network projection types; the bottom plots show prediction accuracy using random forest on each type of projection. The random forest classifier is much better at predicting individuals and ecosystems correctly from network data, even without direct access to network size; all random forest predictions have an accuracy of at least 75% across all projection types. Subplot measures are: power-law alpha value; log-likelihood results from power-law vs. exponential, vs. log-normal, vs. power-law with exponential cutoff, and vs. stretched exponential; the network mean degree; network node size; degree distribution tail size; network edge size; the p-value of the goodness-of-fit test for the power-law model; and the cutoff degree value for the network tail. These are the only predictors used in the random forest classifier. Prediction accuracy is random if ≤ 50%, Fair if > 50%, and Good if ≥ 75%.

Discussion
Our results indicate biochemical systems across individuals and ecosystems are, at best, only weakly scale-free. This is revealed by studying all possible projections of biochemical systems in tandem: only six of the eight network projection types analyzed favor power-law distributions over alternatives, and in all cases the power law is not itself a good fit to the data. Nonetheless, we can conclude individuals and ecosystems share qualitatively similar degree distribution characteristics, and while this is a very coarse-grained measure of network structure, it suggests the possibility of shared principles operating across levels of organization to structure biochemical systems. The random forest distinguishability analyses demonstrate how combining all the results of the scale-free analyses completed in this paper can predict, better than chance, whether data come from individuals or ecosystems. Individuals are perhaps more tightly constrained in their network structure, given that they can be predicted more accurately from simple network characteristics. Whether or not this structure is truly a universal property of life's chemical systems is more difficult to conclude. Based on the sample sizes, we are confident our results hold over the population of genomes and metagenomes in the JGI and PATRIC databases. However, the observed scaling is reflective of biology universally only if the databases are unbiased in sampling from all of biology on Earth, and this is impossible to know with certainty (see e.g. proposals of 'shadow life' and reports of missing biota 49,50). Nonetheless, the fact that multiple levels and multiple projections of biochemistry reveal common structure suggests universal principles may be within reach if cast within an ensemble theory of biochemical network organization (see also e.g. 41).
Achieving an ensemble theory for biochemistry will require approaches different from those applied to simpler physical systems, where statistics over individual components are sufficient to describe and predict behavior. Complex systems are complex precisely because they require additional information about the structure of interactions among their many components. This challenge is well known. However, identifying the most effective methods for projecting these high-dimensional structures onto simple mathematical objects, to enable their analysis and comparison, is among the most central problems of complexity science. In physics, coarse-graining procedures are well known; our understanding of complex systems is not yet advanced enough to have similarly useful tools at hand. A first challenge is to identify coarse-grained network representations, a choice which is subject to debate. Current literature cautions against the use of unipartite graphs, as they can lead to "wrong" interpretations of some system properties, including degree 34,51. We find instead that this conclusion is not so easy to arrive at. Whether the interpretation of a given representation is correct depends strongly on the characteristics of the degree distribution under consideration. As an example, all network projection types in our analysis, aside from unipartite reaction networks, favor power-law degree distributions over other heavy-tailed alternatives (Fig. 3, top row). For the power-law α, a similar proportion of networks with bipartite projections and ecosystem unipartite reaction projections have 2 < α < 3 (within 2SD). However, the proportions of networks within this α range differ when ecosystems, or any of the unipartite compound projections, are compared (Fig. 3, fourth row). Nearly all projections show different results for the scale-free p-value cutoff (Fig. 3, second row).
While previous work 32,33 has advocated for unipartite networks where all compounds that participate in a reaction are connected (called uni-compounds here), we find these overestimate the power-law goodness-of-fit p-values and the values of α when compared to reaction networks or bipartite networks (Fig. 3). The nuances of both similarity and difference in the structure of the same system across different projections can provide insights into the underlying system of interest, offering details that are inaccessible from any single projection. That is, regardless of whether or not a given projection is scale-free, all projections provide insight into the underlying system. In physics, we are accustomed to a unique coarse-grained descriptor capturing all relevant features. To understand complex interacting systems, such as the systems of reactions underlying all life on Earth, it may be the case that we should forgo the allure of simple, singular models with only one coarse-grained description. Instead, to characterize living processes, it may be time to adopt and develop theory for statistical analyses over many projections in tandem.

Materials and methods
Obtaining biological data. Bacteria and Archaea data were obtained through PATRIC 52. Starting with the 21,637 bacterial genomes available from the 2014 version of PATRIC, we created a parsed dataset by selecting, from each genus, the one representative genome containing the largest number of annotated ECs. Unique genera (genera represented by only a single genome) were also included in our parsed data. Uncultured/candidate organisms without genus-level nomenclature were left in the parsed dataset. This left us with 1152 parsed bacteria, from which we chose 361 at random for this analysis. Starting with the 845 archaeal genomes available from the 2014 version of PATRIC, we randomly chose 358 for this analysis. Enzyme Commission (EC) numbers associated with each genome were extracted from the ec_number column of each genome's .pathway.tab file.
Eukarya and Metagenome data were obtained through JGI IMG/m 53 . All 363 eukaryotic genomes available from JGI IMG/m as of Dec. 01, 2017 were used. Starting with the 5586 metagenomes available from JGI IMG/m as of June 20, 2017, 785 metagenomes were randomly chosen for this paper's analyses. Enzyme Commission (EC) numbers associated with each genome/metagenome were extracted from the list of Protein coding genes with enzymes, and metagenome EC numbers were obtained from the total category. All JGI IMG/m data used in this study were sequenced at JGI.
Because each EC number corresponds to a unique set of reactions that an enzyme catalyzes, the list of EC numbers associated with each genome and metagenome can be used to identify the reactions that are catalyzed by enzymes coded for in each genome/metagenome. We use the Kyoto Encyclopedia of Genes and Genomes (KEGG) ENZYME database to match EC numbers to reactions, and the KEGG REACTION database to identify the substrates and products of each reaction 46–48. This provides us with a list of all chemical reactions that a genome/metagenome's enzymes can catalyze.
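The EC-to-reaction join amounts to a simple lookup. A minimal sketch follows; the EC numbers, reaction IDs, and compound IDs below are illustrative stand-ins for entries in the KEGG ENZYME and REACTION databases, not live database queries:

```python
# Hypothetical mini-mappings standing in for KEGG ENZYME (EC -> reactions)
# and KEGG REACTION (reaction -> (substrates, products)).
ec_to_reactions = {
    "1.1.1.1": ["R00754"],
    "2.7.1.1": ["R00299", "R01786"],
}
reaction_to_compounds = {
    "R00754": ({"C00084", "C00004"}, {"C00469", "C00003"}),
    "R00299": ({"C00031", "C00002"}, {"C00092", "C00008"}),
    "R01786": ({"C00267", "C00002"}, {"C00668", "C00008"}),
}

def reactions_for_genome(ec_numbers):
    """All reactions catalyzable by the enzymes a genome encodes."""
    rxns = set()
    for ec in ec_numbers:
        rxns.update(ec_to_reactions.get(ec, []))
    return rxns
```

The resulting reaction set, joined against `reaction_to_compounds`, is the input from which the network projections below are built.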
Generating networks. Each genomic/metagenomic dataset is used to construct eight representations of biochemical reaction networks. We refer to each type of representation as a "network projection type" throughout the text:
1. Bipartite graph with reaction and compound nodes. A compound node C i is connected to a reaction node R j if it is involved in the reaction as a reactant or a product. Abbreviated in figures as bi-full.
2. Unipartite graph with reaction nodes, where reactions are connected through their shared compounds.
3. Unipartite graph with compound nodes, where all substrates participating in the same reaction are connected.
4. Unipartite graph with compound nodes, where substrates on the same side of a reaction are not connected (reactants are connected only to products).
There exists a version of each of these four network construction methods for the largest connected component (LCC), and for the entire graph, yielding a total of eight network projections for each dataset (Fig. 1). These network projection types are signified in the figures by appending -largest and -entire to the network projection abbreviations. Some datasets may yield identical networks for their LCC and entire graph, if there exists only a single connected component.
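As a minimal sketch of the projection rules, using toy reactions with hypothetical compound and reaction labels (the real pipeline builds these from KEGG reaction lists):

```python
from itertools import combinations

# Toy input: reaction ID -> (substrates, products); labels are illustrative.
reactions = {
    "R1": ({"A", "B"}, {"C"}),
    "R2": ({"C"}, {"D", "E"}),
}

def bipartite_edges(reactions):
    """Bipartite projection: a compound node is linked to a reaction node
    if the compound appears as a reactant or product of that reaction."""
    edges = set()
    for rxn, (subs, prods) in reactions.items():
        for c in subs | prods:
            edges.add((c, rxn))
    return edges

def unipartite_compound_edges(reactions, connect_same_side=True):
    """Unipartite compound projections. If connect_same_side is True, all
    compounds sharing a reaction are connected; if False, compounds on the
    same side of a reaction are NOT connected (reactant-product pairs only)."""
    edges = set()
    for subs, prods in reactions.values():
        if connect_same_side:
            pool = subs | prods
            edges |= {frozenset(p) for p in combinations(pool, 2)}
        else:
            edges |= {frozenset((s, p)) for s in subs for p in prods if s != p}
    return edges
```

For the toy data, the bipartite projection has 6 edges, the fully connected compound projection has 6, and the same-side-excluded variant has 4, illustrating how the same reactions yield structurally different graphs.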
Assessing the power-law fit on degree distributions. As defined in Clauset et al. 29, a quantity x obeys a power law if it is drawn from a probability distribution

p(x) ∝ x^(−α),

where α, the exponent/scaling parameter of the distribution, is a constant. In order to estimate α, we follow the methods described in Clauset et al. 29 and use an approximation of the discrete maximum likelihood estimator (MLE),

α̂ ≃ 1 + n [ Σ_{i=1}^{n} ln( x_i / (x_min − 1/2) ) ]^(−1),

where x_min is the lower bound of power-law behavior in our data, and x_i, i = 1, 2, ..., n, are the observed values x such that x_i ≥ x_min. The standard error of our calculated α̂ is given by

σ = (α̂ − 1)/√n + O(1/n),

where the higher-order correction is positive 29. Because many quantities only obey a power law for values greater than some x_min, the optimal x_min value must be calculated. The importance of choosing the correct value for x_min is discussed in detail in Clauset et al. 29: if it is chosen too low, data points which deviate from a power-law distribution are incorporated; if it is chosen too high, the sample size decreases. Both can reduce the accuracy of the MLE, but it is better to err too high than too low.
In order to determine x_min, we use the method first proposed by Clauset et al. 54 and elaborated on in Clauset et al. 29: we choose the value of x_min that makes the probability distributions of the measured data and the best-fit power-law model as similar as possible above x_min. The similarity between the distributions is quantified using the Kolmogorov–Smirnov (KS) statistic,

D = max_{x ≥ x_min} | S(x) − P(x) |,

where S(x) is the cumulative distribution function (CDF) of the data for the observations with value at least x_min, and P(x) is the CDF for the power-law model that best fits the data in the region x ≥ x_min. Our estimate of x_min is the one that minimizes D.
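The fitting procedure, a discrete MLE for α combined with a scan for the KS-minimizing x_min, can be sketched as follows. This is a simplified version: it approximates the model CDF with the continuous power-law form, whereas the actual analysis uses the Broido and Clauset code:

```python
import math

def alpha_mle(degrees, xmin):
    """Approximate discrete MLE for the scaling exponent (Clauset et al. 2009):
    alpha ~= 1 + n / sum_i ln(x_i / (xmin - 1/2)). Assumes degrees >= 1."""
    tail = [x for x in degrees if x >= xmin]
    s = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / s

def ks_distance(degrees, xmin, alpha):
    """KS statistic D between the empirical tail CDF and the model CDF
    (continuous power-law CDF used as an approximation)."""
    tail = sorted(x for x in degrees if x >= xmin)
    n = len(tail)
    return max(abs((i + 1) / n - (1.0 - (x / xmin) ** (1.0 - alpha)))
               for i, x in enumerate(tail))

def fit_xmin(degrees, min_tail=10):
    """Scan candidate xmin values; keep the one minimizing D."""
    best = None
    for xmin in sorted(set(degrees)):
        tail = [x for x in degrees if x >= xmin]
        if len(tail) < min_tail:  # too few points left to fit
            break
        a = alpha_mle(degrees, xmin)
        d = ks_distance(degrees, xmin, a)
        if best is None or d < best[0]:
            best = (d, xmin, a)
    return best  # (ks_distance, xmin, alpha)
```

Note the deliberate bias toward erring high discussed above: a larger x_min only shrinks the sample, while a smaller one contaminates the fit with non-power-law points.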
We used the GitHub repository made available by Broido and Clauset 30 to determine the optimal x_min of all our degree distributions, and to subsequently calculate the MLE in order to determine the scaling exponent α and its standard error σ 55.
A power law can always be fit to data, regardless of the true distribution from which the data are drawn, so we need to determine whether the power-law fit is a good match to the data. We do this by sampling many synthetic datasets from a true power-law distribution, recording their fluctuations from power-law form, and comparing these to the same measurement on the empirical data in question. If the empirical data have a form similar to the synthetic data drawn from a true power-law distribution, then the power-law fit is plausible. We use the KS statistic to measure the distance between distributions.
We use a goodness-of-fit test to generate a p-value which indicates the plausibility of a hypothesis. The p-value is defined as the fraction of the synthetic distances that are larger than the empirical distance. If p is large (close to 1), then the difference between the empirical data and the model can be attributed to statistical fluctuations alone; if it is small, the model is not a plausible fit to the data 29. We follow the methods in Clauset et al. 29, implemented with the GitHub package used in Broido and Clauset 30, to generate synthetic datasets and measure the distance between distributions. Following these methods, we chose to generate 1000 synthetic datasets in order to optimize the trade-off between an accurate estimation of the p-value and computational efficiency. If p is small enough (p < 0.1), the power law is ruled out: there is a probability of 1 in 10 or less that we would, by chance, get data that agree as poorly with the model as the data we have 29. However, measuring p ≥ 0.1 does not guarantee that the power law is the most likely distribution for the data. Other distributions may match equally well or better. Additionally, it is harder to rule out distributions when working with small sample sizes.
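A stripped-down version of this bootstrap goodness-of-fit test might look like the following. This is a sketch only: unlike the full recipe, it does not refit x_min and α on each synthetic sample, and it uses the continuous power-law form throughout:

```python
import random

def ks(tail, xmin, alpha):
    """KS distance between the empirical tail CDF and the fitted power law
    (continuous power-law CDF used as an approximation)."""
    tail = sorted(tail)
    n = len(tail)
    return max(abs((i + 1) / n - (1.0 - (x / xmin) ** (1.0 - alpha)))
               for i, x in enumerate(tail))

def draw_powerlaw(n, xmin, alpha, rng):
    """Inverse-transform sampling from the continuous power law above xmin."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

def gof_pvalue(tail, xmin, alpha, n_synth=1000, seed=0):
    """p-value = fraction of synthetic power-law samples whose KS distance
    exceeds the empirical one. NOTE: the published procedure refits xmin and
    alpha on every synthetic sample; this sketch skips the refit for brevity."""
    rng = random.Random(seed)
    d_emp = ks(tail, xmin, alpha)
    worse = sum(ks(draw_powerlaw(len(tail), xmin, alpha, rng), xmin, alpha) > d_emp
                for _ in range(n_synth))
    return worse / n_synth
```

Data wildly incompatible with the power law (e.g. a degenerate tail) drive the p-value to 0, which is exactly the p < 0.1 rejection regime described above.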
A better way to determine whether or not data are drawn from a power-law distribution is to compare the likelihood of the data under the power-law distribution directly to the likelihood under a competing distribution 29,56. We use the exponential, stretched exponential, log-normal, and power-law-with-cutoff distributions as four competing distributions to the power law. While we cannot compare the fit of the data against every possible distribution, comparing the power-law distribution to these four similarly shaped competing distributions helps ensure that our results are valid.
We use the log-likelihood ratio test 29,56 to compare the power-law distribution to other candidate distributions,

R = L_PL − L_Alt,

where L_PL and L_Alt are the log-likelihoods of the best fits for the power-law and alternative distributions, respectively. This can be rewritten as a summation over individual observations,

R = Σ_{i=1}^{n_tail} [ ℓ_i^(PL) − ℓ_i^(Alt) ],

where the log-likelihoods of single observed degree values under the power-law distribution, ℓ_i^(PL), and the alternative distribution, ℓ_i^(Alt), are summed over the number of model observations, n_tail. If R > 0, the power-law distribution is more likely; if R < 0, the competing candidate distribution is more likely; if R = 0, they are equally likely. Just like with the goodness-of-fit test, we need to make sure our result is statistically significant (p < 0.01). The methodology described here summarizes that introduced by Clauset et al. 29 and described again in Broido and Clauset 30; more details, such as the exact formulas for the alternative distributions and the derivation of the p-value for R, can be obtained therein.
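For a single alternative (the exponential), the ratio test can be sketched as follows, using continuous-distribution approximations; the published analysis uses the discrete forms and also computes a significance p-value for R, both omitted here:

```python
import math

def loglike_powerlaw(tail, xmin, alpha):
    """Log-likelihood under the continuous power law on [xmin, inf):
    pdf = (alpha - 1)/xmin * (x/xmin)^(-alpha)."""
    return sum(math.log((alpha - 1) / xmin) - alpha * math.log(x / xmin)
               for x in tail)

def loglike_exponential(tail, xmin, lam):
    """Log-likelihood under an exponential shifted to start at xmin:
    pdf = lam * exp(-lam * (x - xmin))."""
    return sum(math.log(lam) - lam * (x - xmin) for x in tail)

def llr(tail, xmin):
    """Log-likelihood ratio R = L_PL - L_Alt with per-model MLE fits.
    R > 0 favors the power law, R < 0 the exponential alternative.
    Assumes at least some tail values exceed xmin."""
    n = len(tail)
    alpha = 1 + n / sum(math.log(x / xmin) for x in tail)  # continuous MLE
    lam = n / sum(x - xmin for x in tail)                  # exponential MLE
    return (loglike_powerlaw(tail, xmin, alpha)
            - loglike_exponential(tail, xmin, lam))
```

Fed power-law-like data, R comes out positive; fed exponentially distributed data, it comes out negative, matching the sign convention stated above.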
Classifying network scaling. We classify each genomic/metagenomic dataset, as represented by the set of eight network projection types, as having some categorical degree of "scale-freeness", from "Super-Weak" to "Strongest". This classification scheme was introduced by Broido and Clauset 30 in order to compare many networks with different degrees of complexity, and the definitions below are taken from that work: • Super-Weak For at least 50% of graphs, none of the alternative distributions are favored over the power law.
The four remaining definitions are nested, and represent increasing levels of direct evidence that the degree structure of the network data set is scale free: • Weakest For at least 50% of graphs, the power-law hypothesis cannot be rejected ( p ≥ 0.1).
• Weak The requirements of the Weakest set, and there are at least 50 nodes in the distribution's tail ( n tail ≥ 50 ). • Strong The requirements of the Weak and Super-Weak sets, and that 2 < α < 3.
• Strongest The requirements of the Strong set for at least 90% of graphs, rather than 50%, and for at least 95% of graphs none of the alternative distributions are favored over the power-law.
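One simplified reading of these nested definitions, checking each criterion per projection (rather than using the median α for the Strong tier), can be written as a small function; the field names are illustrative:

```python
def classify_dataset(results):
    """Assign a Broido-and-Clauset-style scale-free category to one dataset,
    given per-projection fit results. Each result is a dict with keys:
    'p' (goodness-of-fit p-value), 'ntail' (tail size), 'alpha' (exponent),
    and 'alt_favored' (True if any alternative beats the power law)."""
    n = len(results)
    frac = lambda pred: sum(pred(r) for r in results) / n

    super_weak = frac(lambda r: not r["alt_favored"]) >= 0.5
    weakest = frac(lambda r: r["p"] >= 0.1) >= 0.5
    weak = frac(lambda r: r["p"] >= 0.1 and r["ntail"] >= 50) >= 0.5
    strong_pred = lambda r: (r["p"] >= 0.1 and r["ntail"] >= 50
                             and 2 < r["alpha"] < 3 and not r["alt_favored"])
    strong = frac(strong_pred) >= 0.5
    strongest = (frac(strong_pred) >= 0.9
                 and frac(lambda r: not r["alt_favored"]) >= 0.95)

    if strongest: return "Strongest"
    if strong: return "Strong"
    if weak: return "Weak"
    if weakest: return "Weakest"
    if super_weak: return "Super-Weak"
    return "Not scale-free"
```

A dataset whose eight projections all pass every criterion lands in "Strongest", while one whose projections merely avoid favoring alternatives (but fail the p ≥ 0.1 test) lands in "Super-Weak", mirroring the independence of the Super-Weak tier from the nested ones.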
Categorizing a network as "Super-Weak" is in effect saying that the network's degree distribution data is better modeled by a power-law fit than by the alternative distributions. This is independent of whether or not the power-law model is a good fit to the data, which is what the "Weakest" and "Weak" definitions emphasize. A network may be classified as "Super-Weak" without meeting any of the nested definitions' criteria. Similarly, a network may be classified as "Weak" without meeting the criteria in the "Super-Weak" definition. We believe this framework is an appropriate way to classify the degree distributions of biochemical networks, given that there are many different accepted ways to represent biochemical reactions as networks, and each has its pros and cons 32–34.
Standard error and correlation. The black error bars on each plot represent 2 standard deviations (2SD) around the sample proportion p (the height of the bar, which we also refer to as the mean). This is equivalent to 2 standard errors around the mean (2SEM), or a 95% confidence interval for the true population proportion p (true population mean). Standard deviation was calculated by treating each category as a binomial distribution, meaning the standard deviation is given by

σ = √( p(1 − p)/n ).

Although the errors for each plot's categories are calculated independently, there is covariance between many of them. This is especially true for the right column of Fig. 3, where all bars of a color total to a fixed number of datasets, with each dataset falling into one of the 8 network projection type bins. Because of this, we also calculated the correlations between each network projection type, across both individuals and ecosystems (Fig. S2). The correlation matrices were calculated by using the pandas method DataFrame.corr(method='pearson') on a matrix of binomially distributed True/False values representing whether each dataset passed or failed specific scale-free criteria for p-value, tail size, or power-law exponent value (α), for each network projection.
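In code, the error-bar calculation amounts to a one-line helper:

```python
import math

def proportion_with_2sd(successes, n):
    """Sample proportion with its 2SD (~95%) binomial error bar:
    sigma = sqrt(p * (1 - p) / n)."""
    p = successes / n
    sigma = math.sqrt(p * (1 - p) / n)
    return p, 2 * sigma
```

For example, 50 passing datasets out of 100 gives a proportion of 0.5 with a 2SD bar of 0.1.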
Classifying levels of biology using degree distribution data. We used two different statistical methods, multinomial regression and random forest classifiers, in conjunction with the scale-free classification scheme above in order to test if individuals and ecosystems were distinguishable based on their degree distribution characteristics.
Multinomial regression. For our multinomial regression, the response class is the biological level (individual or ecosystem), and a single network or statistical measure is the dependent variable. In order to control for overfitting, the training data was composed of an equal number of samples from each level. The number of networks used for training was chosen to be equal in size to 80% of all ecosystem projections, because there were fewer ecosystem datasets than individual datasets. This corresponded to 80% of 6280 networks (of all projection types), or 5024 networks. The model was tested on the 20% of the data it was not trained on. This process was repeated 100 times, and the average model error is reported in the results and Fig. 5, left columns. The multinom and predict functions from the R package nnet were used to perform the multinomial regression.
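The balanced train/test construction can be sketched as follows (a sketch in Python; the actual analysis used R's nnet on the real projection data):

```python
import random

def balanced_split(individuals, ecosystems, frac=0.8, rng=None):
    """Build a training set with an equal number of samples from each level,
    sized at `frac` of the smaller class (here: ecosystems). Everything not
    used for training goes to the test set."""
    rng = rng or random.Random(0)
    n_train = int(frac * len(ecosystems))
    per_class = n_train // 2
    ind, eco = individuals[:], ecosystems[:]
    rng.shuffle(ind)
    rng.shuffle(eco)
    train = ind[:per_class] + eco[:per_class]
    test = ind[per_class:] + eco[per_class:]
    return train, test
```

Repeating this split (here it would be 100 times), fitting on `train`, and scoring on `test` yields the averaged error reported above.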
Random forest classifiers. We used a random forest to attempt to classify networks as belonging to individuals or ecosystems. In the first scenario, we used 11 predictors: power-law alpha value (α); log-likelihood result from power-law vs. exponential (dexp); log-likelihood result from power-law vs. log-normal (dln); log-likelihood result from power-law vs. power-law with exponential cutoff (dplwc); log-likelihood result from power-law vs. stretched exponential (dstrexp); the network mean degree (⟨k⟩); network node size (n); degree distribution tail size (n tail); network edge size (n edges); the p-value of the goodness-of-fit test for the power-law model (p); and the cutoff degree value for the network tail (x min). In the second scenario, we repeated the random forest without the three predictors which can be directly used to quantify the size of a network (n, n tail, and n edges).
In the third scenario, we repeated the random forest without the three size predictors on each network projection type independently. For each scenario, we randomly split our data into two halves: one for training and one for testing (for the third scenario, each training and testing set is 1/8 as large as in the first two scenarios, since we run the classifier on each network projection type independently). In all scenarios, we use the randomForest function from the R package randomForest for classification. Three features were used to construct each tree (mtry = 3), which is ≈ √(n features), with 100 trees generated each time (enough for the out-of-bag, or OOB, estimate of the error rate to level off).
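To illustrate the mechanics (bootstrap sampling plus a random feature subset of size mtry per tree), here is a from-scratch sketch using depth-1 trees on synthetic two-feature data; the actual analysis used R's randomForest with full trees on the 11 predictors above:

```python
import math
import random

def gini(labels):
    """Gini impurity of a binary label list."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_stump(X, y, feat_idx):
    """Best single-feature threshold split (weighted Gini) over feat_idx.
    Returns (feature, threshold, left_label, right_label)."""
    best = None
    for j in feat_idx:
        for t in sorted({x[j] for x in X}):
            left = [y[i] for i, x in enumerate(X) if x[j] <= t]
            right = [y[i] for i, x in enumerate(X) if x[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t,
                        round(sum(left) / len(left)),    # majority label, left
                        round(sum(right) / len(right)))  # majority label, right
    return best[1:]

def fit_forest(X, y, n_trees=25, mtry=None, seed=0):
    """Forest of depth-1 trees: each is fit on a bootstrap sample using a
    random subset of mtry features (mtry ~ sqrt(n_features), as in the text)."""
    rng = random.Random(seed)
    k = len(X[0])
    mtry = mtry or max(1, round(math.sqrt(k)))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap
        bx, by = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(k), mtry)
        trees.append(best_stump(bx, by, feats))
    return trees

def predict(trees, x):
    """Majority vote over the stump forest."""
    votes = sum(pl if x[j] <= t else pr for j, t, pl, pr in trees)
    return int(votes * 2 >= len(trees))
```

With a feature that cleanly separates the two classes (standing in for network size, the dominant predictor reported above), the stump forest recovers the labels almost perfectly.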