Biconnectivity of the cellular metabolism: A cross-species study and its implication for human diseases

The maintenance of stability during perturbations is essential for living organisms, and cellular networks organize multiple pathways to enable elements to remain connected and communicate, even when some pathways are broken. Here, we evaluated the biconnectivity of the metabolic networks of 506 species in terms of the clustering coefficients and the largest biconnected components (LBCs), wherein a biconnected component (BC) indicates a set of nodes in which every pair is connected by more than one path. Via comparison with the rewired networks, we illustrated how biconnectivity in cellular metabolism is achieved on small and large scales. Defining the biconnectivity of individual metabolic compounds by counting the number of species in which the compound belonged to the LBC, we demonstrated that biconnectivity is significantly correlated with the evolutionary age and functional importance of a compound. The prevalence of diseases associated with each metabolic compound quantifies the compounds vulnerability, i.e., the likelihood that it will cause a metabolic disorder. Moreover, the vulnerability depends on both the biconnectivity and the lethality of the compound. This fact can be used in drug discovery and medical treatments.

Scientific RepoRts | 5:15567 | DOi: 10.1038/srep15567 small number of links. This finding suggests that SF networks can reliably execute their core functions, even with the loss of a significant fraction of links 29 .
Small perturbations are often more common than large ones. Cellular networks need to maintain their regularity during small but frequent perturbations; therefore, a single path between two elements may be insufficient for the robust maintenance of their normal states. If more than one disjoint path exists between two nodes, the nodes can remain connected and therefore function normally, even with some loss in one path. The impact of multiple pathways has recently been investigated in various contexts, including the epistatic interactions of different enzymes in yeast metabolism 30 , the pair-wise interactions of drugs 31 , and the synthetic lethality of bacterial metabolism 32,33 .
The reaction pathways of metabolism are organized for the survival and reproduction of a species 34 . If there are alternatives for many pathways, the robustness under perturbation is increased, thus endowing a species with a high degree of resilience. However, the establishment of backup pathways may have a biochemical and entropic cost; the establishment of a pathway can require biochemical resources, and it can take substantial time for a mutant equipped with a backup pathway of interest to appear. Therefore, the organization of alternative or backup pathways can indicate the ways in which living organisms remodel their cellular networks towards high resilience during evolution. In this work, we analyzed the structures of the metabolic networks of hundreds of living species to determine the organization of backup pathways. We consulted the BioCyc database version 13.1 8 to construct the bipartite metabolic networks of 506 species and to numerically determine their clustering coefficients and biconnected components (BCs). The clustering coefficient quantifies the abundance of diamond-shaped subgraphs that involve two distinct reactions that process two metabolites. In graph theory, two nodes are considered to be biconnected if there are two disjoint paths connecting them. A BC is a subset of nodes and links in which every pair of nodes is connected by two or more disjoint paths and therefore is expected to be robust against perturbations because most pairs of nodes would remain connected even if some pathways break.
We demonstrated that real networks have, compared with randomly wired networks that preserve degree distributions of both metabolites and reactions, greater clustering coefficient but smaller largest biconnected component (LBC), and indicating the scale-dependent organization of pathways for the maintenance of order. This feature is preserved even after coarse graining of nodes into modules. We subsequently defined the biconnectivity of individual compounds by counting the number of species in which the compound belonged to the LBCs. The establishment of a detour pathway is determined by the competition in terms of cost and benefit. Evolutionarily old compounds, identified here with those present in many species, were demonstrated to have high biconnectivity because they are associated with many backup pathways. Additionally, the important compounds, which may be lethal if their concentrations reach abnormal levels and which were identified with compounds located close to the biomass components, exhibited high biconnectivity, thus suggesting that abundant backup pathways are associated with compounds of functional importance. The biconnectivity of each compound is related to the maintainability of its normal state. Therefore, the likelihood of disease occurrence related to a compound is expected to depend on the compounds biconnectivity. By identifying human diseases, the pathogenesis of which could potentially involve the abnormality of each metabolic compound 8,24,25,[35][36][37] , we demonstrated that the prevalence of the associated diseases was negatively correlated with biconnectivity. Disease prevalence should also depend on the lethality of the compound, and we demonstrated that the correlation between the biconnectivity and the lethality of metabolic compounds facilitates an understanding of the disease prevalence patterns.
The work we present here is an extensive and quantitative analysis of pathway organization across hundreds of species. The obtained results offer insights into the evolutionary pressure imposed on the topology of metabolic networks that should be stable while interacting ceaselessly with fluctuating environments. We discuss the implications of our findings for human diseases and drug target searches.

Results
Biconnectivity of the metabolic network of each species. In the bipartite metabolic network for each species, a reaction is connected by undirected links to all its substrates and products [ Fig. 1(a-c)]. Although the reversibility of individual reactions may provide important information, some irreversible reactions may be forced in the reverse direction under certain physiological conditions of temperature or metabolite concentrations [38][39][40] . The numbers of metabolic reactions, N r , compounds, N c , and links, L, are not broadly distributed over species [ Fig. 1(d)]. In contrast, the number of reactions that process a compound is highly heterogeneous. The distribution of the degree k of a compound follows a power-law ( ) γ − p k k , with the exponent γ, which ranges between 2 and 3 [ Fig. 1(e)] 6 . In this study, we applied the compound-centric view in considering the biconnectivity of the metabolic networks, because the stable production and consumption of metabolites is the most important function of metabolism. Moreover, the presence of two distinct paths is more meaningful for a pair of compounds than for a pair of reactions. When two metabolites are connected by two pathways, the production of one metabolite from the other is conducted along two pathways and can occur even if one pathway is broken. However, two pathways that connect two reactions do not guarantee the stability of both reactions.    For comparison, we generated three ensembles of rewired networks; they are obtained by relocating of randomly selected links completely randomly ('rewired-all'), by changing the end reactions of the links to randomly selected reactions, which preserves the degree of every compound ('rewired-c'), and by exchanging the neighbor reactions of two metabolites via crosslinking, which preserves the degree distributions of both compounds and reactions ('rewired-cr').
The clustering coefficients, C, and the fractions of the LBC, m 2 , were plotted as a function of the mean degree = / k L N 2 for the real and the rewired networks of the 506 species in Fig. 3(a,b). On average, 75% of the nodes of the real networks belonged to the LBC, i.e., = . m 0 75 2 and = . C 9 5. The clustering coefficient was greater than any of the considered rewired networks, thus suggesting enhanced biconnectivity on a small scale. The fraction of the LBC was smaller than the rewired-all and the rewired-cr networks; however, it was larger than the rewired-c networks.
It has been shown 42,43 that m 2 depends on k and the large-k behavior of the degree distribution p(k). When k is not too small, the random SF networks have smaller LBCs than the Erdös-Rény (ER) networks, which are constructed by randomly assigning links and thus have narrow Poisson degree distributions. Therefore, the order of the m 2 s of the real, rewired-all, and rewired-c networks can be understood by noting that the rewired-all networks have narrower degree distributions for both reactions and compounds compared with the real networks, whereas the rewired-c networks have broader degree distributions for reactions than the real networks. It should be noted, however, that in large-scale failures in which a significant number of links are lost and k becomes quite small, hub nodes, the abundance of which is responsible for broad degree distributions, play a dominant role as seeds for building connected components and BCs, leading to larger LCCs and LBCs than without hubs 43,44 . Therefore, the abundance of hub nodes facilitates large-scale failure survival.
The topology of metabolic networks exhibits various features, such as modularity 19,20 , which may be responsible for the difference between the real networks and the rewired-cr networks. Cellular metabolism consists of biochemical pathways, such as glycolysis, the TCA cycle, and fatty acid pathways 45 , and the graph-theoretical approaches enable us to computationally identify the modules that exhibit high intraconnectivity and low interconnectivity 20,46 . In the application of the algorithm utilizing the maximization of modularity 47 , one can obtain the modules, which provide a coarse-grained view of the cellular metabolism in each species. We identified a larger number of modules N mod in the real networks than any ensemble of the rewired networks. These modules were non-uniform in size compared with the rewired networks [ Fig. 3(c)]. The networks of N mod nodes and L mod links, in which two nodes (modules) are connected if a link connects the nodes that belong to them in the original network, exhibit clustering coefficients and LBCs larger than the modular networks extracted from the ER networks [blue lines in Fig. 3(d,e)] for a given mean degree In the real networks, compared with the modular networks from the rewired-cr networks, the modular clustering coefficients were similar; however, the modular LBC, m 2,mod , was substantially larger in the rewired-cr networks than the real networks, similar to m 2 . Given the prominent modularity of metabolic networks 19,20 , the smaller values of m 2 , a measure of the long-range biconnectivity, of the real networks than the rewired-cr networks can be related to the larger number of modules and their low biconnectivity. The backup pathways are subsequently expected to be selectively established for pathways that require a high degree of stability.
Although the specific modules in a given network may differ depending on the algorithm used, we demonstrated that our results were not changed even when we used a different algorithm, such as the algorithm in 48 (Fig. S2). To further confirm the robustness of our results, we performed the same analysis after removal of the highly abundant compounds that are involved in numerous reactions, such as protons, water, adenosine triphosphate (ATP), and carbon dioxide 49 , in the metabolic networks of the studied species, and our results remained valid (Fig. S3). For example, even without currency metabolites 49 , the fraction of the LBC is only mildly changed ( = . ) m 0 73 2 . We also analyzed the well-curated metabolic reconstructions of two strains of Escherichia coli and Helicobacter pylori, Pseudomonas putida, Staphylococcus aureus, Methanosarcina barkeri, Mycobacterium tuberculosis, Saccharomyces cerevisiae, and Homo sapiens from the BiGG database 50 ; our results did not qualitatively change, although some trends were difficult to identify because of the small number of data points [ Fig. S4].
The evolution of species, including speciation, may have occurred along with the recruitment of new pathways, which enhanced stability, or the removal of those that were merely redundant, thus leading to the gradual variation of the biconnectivity of metabolism 1,2 . Using the National Center for Biotechnology Information (NCBI) taxonomy database for species classification according to morphological studies and molecular information 51,52 , we computed the network distance, d ij , which represents the length of the shortest path in the phylogenetic tree, between every pair of species i and j to define as the evolutionary distance. Considering the normalized difference of m 2 between two species i and j, defined as Compounds with a large f, and the enzymes that bind to them, may have been introduced to metabolism in the early stages of evolution and inherited by many contemporary species. Therefore, compound popularity can be a measure of evolutionary age. The popularity is broadly distributed, and follows a power-law distribution: ( ) . ± .  1 0 0 1. If a compound belongs to the LBC, it can maintain a connection to most core compounds; thus, its concentration can be readily stabilized, even when some pathways are inactive. We define the biconnectivity, b i , of a compound i, as the ratio of the number of species that have compound i in the LBC to the number of all species having it. The compound biconnectivity is expected to be related to the maintainability of the normal state against perturbations by utilizing alternative pathways. We demonstrated that among 6058 compounds 3071 compounds had b = 0, which indicates that they were not protected by backup pathways in their communication with the core compounds in any species. Many of these unprotected compounds were not isolated; they belong to the LCC. Defining the (single) connectivity g i as the ratio of the number of species that have a compound i in the LCC to the number of all species that have it, we demonstrated that 5097 compounds had g = 1 and 2545 compounds had g = 1 and b = 0. We hypothesize that the abnormal concentration of a compound with low biconnectivity is not sufficiently lethal to the cells overall function as to overcome the entropic and biological costs of establishing backup pathways. In contrast, 971 compounds, approximately 16% overall, belonged to the LBC in all species. Chlorides, 3-beta-hydroxysterols, brassinosteroids, trans-polyisoprenyls, amino acids, quinones, folates, and uridine diphosphate glucose were the first few types of compounds that exhibited high biconnectivity. However, organosulfur, antibiotics, and sugar-alcohols exhibited quite low biconnectivity. The biconnectivity distribution P(b) peaked both at b = 0 and b = 1, whereas the connectivity distribution P(g) peaked at g = 1 [ Fig. 4(a)].
We also evaluated the biconnectivity of each of the 924 pathways, as listed in the BioCyc database, via the mean biconnectivity of the involved compounds. When the analysis was restricted to the pathways consisting of more than 10 reactions, glycolysis had the largest biconnectivity b = 0.999, which indicates that the compounds involved in these pathways had that value of biconnectivity on average. Following the glycolysis pathways were L-histidine biosynthesis (b = 0.991), L-lysine biosynthesis (b = 0.989), adenosylcobalamin biosynthesis (b = 0.973), mixed acid fermentation (b = 0.962), and folate biosynthesis (b = 0.953). In contrast, nicotine degradation (b = 0.324) had the smallest biconnectivity, followed by the tRNA charging pathway (b = 0.382), the gibberellin inactivation pathway (b = 0.436), the phospholipase pathways (b = 0.524), and mycolate biosynthesis (b = 0.557).
Evolutionarily old compounds may have had more chances to acquire alternative pathways than young compounds over time, which effectively reduced the cost of establishing backup pathways. Relating the popularity f to the evolutionary age of each compound, one can expect a positive correlation between the popularity f and the biconnectivity b, which was confirmed by our analysis shown in Fig. 4(a). The correlation between the connectivity g and f was substantially weaker.
The metabolic compounds occupy different locations in each metabolic network and play different roles of varying importance in metabolism. On average, the compounds located close to other compounds may be lethal because their abnormality can quickly spread to other compounds. The deprivation of essential enzymes can halt the production of the biomass components, as well as cell growth 54 . For each compound i, we evaluated the average of the inverse of the network distance to the biomass components and denoted it by the closeness c i to represent its potential lethality. We used the union of the biomass components identified for several well-studied species 55 , which may deviate from the true list of biomass components of each species. In ref. 55, the biomass components of B. subtilis, E. coli, H. pylori, S. aureus, M. barkeri and S. cerevisiae have been experimentally determined. Here, we consider the lethality of a compound as the probability that local perturbation around the compound can threaten the most important function of the metabolic network that generates the biomass components; furthermore, the compounds' lethality may arise in a context other than their impact on biomass components. Figure 4(b) indicates that the biconnectivity b increases with the closeness c. This correlation implies that the metabolic networks have many loops around the biomass components, and the backup pathways are not equally distributed but concentrated around functionally important (lethal) compounds. Interestingly, we identified a crossover behavior b as a function of c: α b c , with α .  0 4 for small c and α .  1 7 for large c, which indicates that the evolutionary pressure for backup pathways is substantially stronger for the compounds with high lethality.

Implications of biconnectivity on the prevalence of human diseases. The malfunction of an
enzyme accompanied by perturbation in the concentrations of related compounds is not confined and may spread in metabolism, which could result in a disorder on a large scale. Backup pathways help suppress these cascading failures. Human diseases, particularly metabolic diseases, may arise from cellular-level failures; thus, the prevalence pattern of various diseases may reflect the organization of backup pathways.
Inherited disorders are associated with the mutations of specific gene(s) and have been archived, e.g., in the Online Mendelian Inheritance in Man (OMIM) database 35 . In cases in which a disease gene generates enzymatic proteins that catalyze certain reactions, the perturbations of the compounds processed by the reactions may be involved in the pathogenesis of the associated diseases 24,37 . We consulted the OMIM database 35 and the BioCyc database 8 to identify the diseases potentially associated with each metabolic compound [ Fig. 5(a)]. To assess the likelihood that a compound's fluctuation would eventually lead to a human disease, we computed the mean prevalence of the associated disease. Here, the prevalence is the fraction of the patients diagnosed with a given disease among all patients in the Medicare dataset 24,25,36,37 . For each compound i, we took the mean prevalence of the associated diseases as its vulnerability v i .
The human metabolic network has 1513 compounds that are processed by 1451 reactions, and only 754 compounds are associated with 477 diseases. The compounds that have multiple pathways to the core compounds are less likely to cause diseases than the compounds without multiple pathways. In Fig. 5(b), the disease compounds in the LCC but outside the LBC of the human metabolic network have an average 0 012, which is larger than = . v 0 0075 of the human LBC (P = 0.002). In contrast, the disease compounds located outside the LCC have = . v 0 0025, which is smaller than the compounds in the LCC because local fluctuations in the small components do not affect the LCC in which most elements reside.
An example of a disease that is associated with low-b and a high prevalence is depressive disorder with an accelerated response to antidepressant drug treatment, with = . v 0 00815. It is associated with peptidyl-proline, in which b = 0. Hyperproteinemia associated with angiotensinogen in which b = 0 has = . v 0 004. Renal carcinoma associated with 12 compounds that have = . b 0 21 has = . v 0 0038. Other examples include renal tubular dysgenesis (b = 0, = .
) v 0 0005 . Whether a disease compound belongs to the LBC and the LCC of the human metabolic network depends on the cross-species biconnectivity b. The disease compounds outside the human LBC have b = 0, with few exceptions [Fig. 5(c)]. Many disease compounds in the human LBC have b = 1, whereas some have b < 1. Diverse features of the human metabolic networks may contribute to preventing local perturbations from spreading, and the compound biconnectivity may provide the relevant information. The compounds with high biconnectivity are less vulnerable than the compounds that exhibit low biconnectivity. In Fig. 5(d 0 07 , thus implying that the compounds with high biconnectivity are stabilized better than the compounds with low biconnectivity, while they are all in the human LBC.
The vulnerability of a compound depends on its lethality and its capacity to maintain its normal state, which can be roughly represented as follows: The closeness c i of a compound can quantify its lethality, and the biconnectivity b i represents a measure of the maintainability of the normal state under perturbations. The vulnerability v is expected to increase with the closeness c if other conditions are identical. However, the lethality and the maintainability are not independent of each other; evolutionary pressure is imposed to enhance the maintainability of lethal compounds, as supported by the correlation between the closeness c and the biconnectivity b shown in Fig. 4(b). The correlation between b and c makes it difficult to identify their independent contributions to vulnerability, as indicated in the weak correlation between v and c in Fig. 5(d). Interestingly, for compounds outside the human LBC, a higher biconnectivity is associated with greater vulnerability These findings suggest that the network properties of the metabolic compounds in other species can be helpful for understanding human diseases. The cross-species features reflect the generic features of each compound identified in diverse metabolic networks that have evolved in different ways; thus, they carry information that could not be identified by only analyzing the human metabolic network. Our findings could be useful in the search for drug targets and medical treatments. For example, if we must choose one of two candidate target compounds to set up alternative pathways and thereby suppress the spread of perturbations initiated somewhere in metabolism, we can first select the compound with increased biconnectivity. We expect this approach to establish the additional pathway and make it stable relatively easily.

Discussion
Our study illustrates a novel use of the cross-species cellular network data that are rapidly accumulating in this post-genomic era. Metabolic pathway organization is essential to allow living organisms to be resilient to perturbations; thus, we performed an extensive quantitative analysis of the biconnectivity of the cellular metabolism in hundreds of species. As a result, we were able to understand how biconnectivity is achieved on small and large scales in real metabolic networks in the presence of other structural features, such as modularity and heterogeneous connectivity patterns. The idea that biconnectivity serves for resilience to perturbations is supported by the negative correlation between the prevalence of associated diseases and the biconnectivity of compounds. Furthermore, we demonstrated that the pattern of disease prevalence can be better understood by considering the dependence of the compound vulnerability on both the lethality and the biconnectivity, suggesting its potential use in drug design and medical treatments.
Our study does have several limitations. We used the biomass components known in the well-curated metabolic reconstructions 55,56 as the biomass components of the species studied in our work as long as they were present in the metabolic network of a given species, which must be curated by plugging the species-specific biomass components. The development of a biconnectivity measure that incorporates the direction of chemical reactions could better elucidate pathway organization in metabolism. Although we focused on individual compounds, the properties of pairs of compounds in terms of their multiple paths deserve investigation. The present work focused on the structure of metabolic networks; however, an understanding of the dynamics, e.g., flux-balance modeling, would allow us to identify different sets of pathways that are active in wild-type and perturbation conditions, which would thereby facilitate an understanding of homeostatic mechanisms in living organisms. It would also be interesting to investigate structural biconnectivity and dynamical flux-coupling relations. The idea that biconnected compounds are resilient to perturbations can be confirmed and better supported by in-depth studies regarding the stoichiometry of specific compounds.

Methods
Database. We used the BioCyc database version 13.1 8 , which included 506 species, to construct the bipartite metabolic networks of 453 bacteria, 34 archaea, and 19 eukarya. The latest version is 19.1, which covers 5700 metabolic pathways as of 2015. The same compounds in different compartments were considered to reflect different nodes. The transport reactions between different compartments were included.
Clustering coefficient of a bipartite network. A diamond-shaped subgraph (a quad) that involves two metabolic compounds and two reactions involves two disjoint pathways that connect the two compounds. For a bipartite metabolic network of the adjacency matrix A cr , we defined the clustering coefficient as follows: which is the ratio of the total number of quads that involve distinct pairs of compounds to the total number of reactions that involve pairs of compounds. If two compounds had k distinct reactions processing them, the number of quads would be k 2 . Therefore, C would be larger than 1 if all pairs of compounds had more than one reaction processing them. If all pairs of compounds were involved in only one reaction, C would be zero. This definition of the clustering coefficient is rather different from the previous definitions for bipartite networks 57-59 . Popularity, biconnectivity, and closeness. Let  be the set of the 506 species considered in this work. Using =  F 1 i or 0 to represent whether a species  has a compound i or not, respectively, and =  B 1 i or 0 to represent whether a species  has the compound i in the LBC or not, respectively, we can represent the popularity and the biconnectivity as