Articulation points in complex networks

An articulation point in a network is a node whose removal disconnects the network. Those nodes play key roles in ensuring connectivity of many real-world networks, from infrastructure networks to protein interaction networks and terrorist communication networks. Despite their fundamental importance, a general framework of studying articulation points in complex networks is lacking. Here we develop analytical tools to study key issues pertinent to articulation points, such as the expected number of them and the network vulnerability against their removal, in an arbitrary complex network. We find that a greedy articulation point removal process provides us a different perspective on the organizational principles of complex networks. Moreover, this process results in a rich phase diagram with two fundamentally different types of percolation transitions. Our results shed light on the design of more resilient infrastructure networks and the effective destruction of terrorist communication networks.

A fundamental challenge in studying complex networked systems is to reveal the interplay between network structure and function 1,2 . Here we tackle this challenge by investigating a classical notion in graph theory, that is, articulation points. A node in a network is an articulation point (AP) if its removal disconnects the network or increases the number of connected components in the network 3,4 (Fig. 1a). Those APs can be easily identified using a linear-time algorithm based on depth-first search 5 . It has been found that APs play important roles in ensuring the robustness and connectivity of many real-world networks. For example, in infrastructure networks such as air traffic networks or power grids, APs, if disrupted or attacked, pose serious risks to the infrastructure 6,7 . In wireless sensor networks, failures of APs will block data transmission from one network component to others 8 . In the yeast protein-protein interaction network, lethal mutations are enriched in the group of highly connected proteins that are APs 9 . Analysis of APs hence provides us a different angle to systematically investigate the structure and function of real-world networks.
Despite the importance of APs in ensuring the robustness and connectivity of many real-world networks, we still lack a deep understanding on the roles of APs in many complex networks. Can we design an AP-based attack strategy to more efficiently destroy malicious networks? Can we develop an AP-based network decomposition method to better reveal the organizing principles of complex networks? What happens if we keep removing APs from a random graph or a real network? Will there be a core left? If yes, what's the implication of such a core in terms of structural integrity and functionality of the network? How to quantify if a real network has overrepresented or underrepresented APs comparing to its randomized counterparts? In this article we offer an analytical framework to study those fundamental issues pertinent to APs in both real networks and random graphs, harvesting a series of interesting results.

Results
Articulation point-targeted attack. Representing natural vulnerabilities of a network, APs are potential targets of attack if one aims for immediate damage to a network. Note that the removal of an AP in a network may lead to the emergence of new APs in the remainder of the network, that is, new potential targets of attack (Fig. 1b). This fact inspires us to design a brute-force AP-targeted attack (APTA) strategy: iteratively remove the most destructive AP that will cause the most nodes disconnected from the giant connected component (GCC) of the current network. Given a limited 'budget' (that is, the number of nodes to be removed), this APTA strategy is very efficient in reducing the GCC, compared with strategies based on other node centrality measures, such as degree 10,11 and collective influence 12 . Indeed, we find that for a small fraction of removed nodes APTA leads to the fastest reduction of GCC for a wide range of real-world networks from technological to infrastructure, biological, communication, and social networks (Supplementary Note 1; Supplementary Fig. 1). Depending on the initial network structure, APTA would either completely decompose the network or result in a residual GCC that occupies a finite fraction of the network. This residual GCC is a biconnected component (or bicomponent), in which any two nodes are connected by at least two independent paths and hence no AP exists 5 . For accuracy, we will call it residual giant bicomponent (RGB) hereafter. This RGB naturally represents a core that maintains the structural integrity of the network.
Greedy articulation points removal. We also find that the identification and removal of APs provide us a new perspective on the organizational principles of complex networks. For example, in the terrorist communication network of the 9/11 attacks on U.S. (Fig. 1a), each AP member (shown in red) can be considered as a messenger of a particular subnetwork, because any information exchange between that subnetwork and the rest of the network passes through the AP 13 . All the APs and their associated subnetworks in the original network constitute the first layer of the terrorist network. After removing all the APs in the original network, the first layer is peeled off, new APs emerge and the second layer of the network is exposed. We can repeat this greedy APs removal (GAPR) process until there is no AP left in the network. Note that at each step we simultaneously remove all the APs present in the current network. Figure 1 illustrates this network decomposition process in the 9/11 terrorist communication network, which has 62 terrorists. We find that this network consists of three layers and an RGB of 26 nodes. (Note that the RGB associated with the GAPR process is similar to but not necessarily the same as that of the APTA process, see Supplementary Note 2; Supplementary Fig. 2.) Interestingly, a Greedy articulation points removal (GAPR)

Residual giant bicomponent (RGB)
Step-0 Step-1 Step-2 Step-3 b c d among those 26 RGB nodes, 16 of them are hijackers in the 9/11 terrorist attack, which in total has 19 hijackers. In a sense, this RGB serves as a core maintaining the functionality of this covert network, which has a particular goal-hijacking. Note that some of the hijackers in the RGB are not hubs (that is, highly connected nodes), but only have two or three neighbours in the network. Hence they cannot be easily identified through traditional network decomposition methods, for example, maximum clique 3,4 , k-core decomposition 14,15 and t-core decomposition [16][17][18] , which are designed to uncover or extract a dense core structure consisting of highly connected nodes (Supplementary Note 3; Supplementary Fig. 3). Interestingly, we find that the two RGBs associated with the APTA and GAPR processes significantly overlap for many realworld networks (Supplementary Note 2; Supplementary Fig. 2). Note that, compared with APTA, the GAPR process is deterministic and avoids the optimization of the damage caused by nodes removal, which make it analytically solvable. Hereafter we focus on the RGB obtained from the GAPR process.
Articulation points and residual giant bicomponent in real networks. The results presented in the previous subsections prompt us to study the fraction of APs (n AP : ¼ N AP /N) and the relative size of the RGB (n RGB : ¼ N RGB /N) in a wide range of realworld networks. Here N AP , N RGB and N represent the number of APs, the number of nodes in the RGB (obtained from the GAPR process), and the number of nodes in the whole network, respectively.
We find that many real networks have a non-ignorable fraction of APs and a rather small RGB (Fig. 2a). One may expect that infrastructure networks should have a relatively small fraction of APs and a large RGB, and hence are very robust against AP removal. Interestingly, this is not the case. The power grids in two regions of U.S. have almost the largest fraction (B24%) of APs among all the real networks analysed in this work. And they have almost no RGB. The road networks of three states in U.S. have almost 20% of APs, and a small RGB n RGB $ 0:08 ð Þ . These results suggest that infrastructure networks are apparently not optimized with respect to AP removal. Indeed, because of the high cost of adding new links (for example, connecting two power stations with high-voltage transition lines, or connecting two cities with a new highway), infrastructure networks typically lack a high redundancy, but are often optimized with respect to other criteria, such as social profitability. By contrast, among all the 28 food webs we analysed, 22 of them have no APs (and hence n RGB ¼ 1). In other words, those ecological networks tend to be biconnected and the extinction of one species will not disconnect the whole community. This high structural robustness could be because of evolutionary inter-species interactions across the whole community 19 .
More interestingly, we find that most of the real networks analysed here have either a very small RGB or a rather big one (see Fig. 2a, light magenta and turquoise regions). Later we will show that this phenomenon is related to a discontinuous phase transition associated with the GAPR process.
To identify the topological characteristics that determine these two quantities (n AP and n RGB ), we compare n AP (or n RGB ) of a given real network with that of its randomized counterpart. To this aim, we randomize each real network using a complete randomization procedure that turns the network into an Erd+ os-Rényi (ER) type of random network with the number of nodes  20 . We find that most of the completely randomized networks possess very different n AP (or n RGB ), comparing to their corresponding real networks (Fig. 2b,c). This indicates that complete randomization eliminates the topological characteristics that determine n AP and n RGB . By contrast, when we apply a degree-preserving randomization, which rewires the links among nodes, while keeping the degree k of each node unchanged, this procedure does not alter n AP and n RGB significantly (Fig. 2d,e). In other words, the characteristics of a network in terms of n AP and n RGB is largely encoded in its degree distribution P(k). Most of the real-world networks display slightly smaller n AP and bigger n RGB than their degree-preserving randomized counterparts. We attribute these differences to higher-order structure correlations, such as clustering 21 and degree assortativity 22 , which are eliminated in the degreepreserving randomization.
Analytical framework of the greedy articulation points removal process. The results of n AP and n RGB in real-world networks encourage us to analytically calculate n AP and n RGB for networks with prescribed degree distributions 23 . To achieve that, we analyse the GAPR process on infinitely large networks and explore in depth the effect of different degree distributions on n AP and n RGB . Consider the discrete-time dynamics of the deterministic GAPR process, which generates a series of snapshots for the remainder network with a clear temporal order {0, 1,..., t,...., T}. Here, T is the total number of GAPR steps, which is also the number of layers peeled off during the GAPR process. We denote the fraction of APs and the relative size of the GCC in the original network as n AP (0) and n GCC (0), respectively. Removal of the original APs leads to a new fraction of APs n AP (1) and a smaller GCC of relative size n GCC (1). We repeat this process and denote the fraction of APs and the relative size of the GCC of the network snapshot at time step t as n AP (t) and n GCC (t), respectively. At the end of the GAPR process, we have n AP (T) ¼ 0 and n GCC (T) ¼ n RGB . On the basis of the configuration model of uncorrelated random networks 23-25 , we can analytically calculate n AP (t) and n GCC (t) for networks with arbitrary degree distributions at any time step t. This enables us to further compute T and n RGB . See Methods section and Supplementary Note 4 for the details of our analytical framework of the GAPR process.
Articulation points in classical model networks. The analytical framework of the GAPR process enables us to calculate various quantities of interests. We first investigate the fraction of APs in the original network, that is, n AP ¼ n AP (0). We calculate n AP in two canonical model networks: (1) ER random networks with Poisson degree distributions P(k) ¼ e À c c k /k!, where c is the mean degree (hereafter we also use c to denote the mean degree of a general network); and (2) scale-free (SF) networks with powerlaw degree distributions P k ð Þ $ k À l , where l is often called the degree exponent (Fig. 3a,b). The fraction of APs is trivially zero in the two limits c-0 and c-N, and reaches its maximum at a particular mean degree c AP . For ER networks, we find that c AP ¼ 1.41868?, which is larger than c p ¼ 1, the critical point of ordinary percolation where the GCC emerges 1,2 .
The phenomenon that n AP displays a unimodal behaviour and the fact that c AP 4c p can be explained as follows. The process of increasing the mean degree c of an ER network can be considered as the process of randomly adding links into the network. When the mean degree c is very small (nearly zero), there are only isolated nodes and dimers (that is, components consisting of two nodes connected by one link), and thus n AP -0. With c gradually increasing but still smaller than c p , the network is full of finite connected components (FCCs), most of which are trees (Fig. 3c).
Hence, in the range of 0ococ p most of the nodes (except isolated nodes and leaf nodes) are APs, and adding more links to the network will increase the number of APs (Fig. 3c). When c4c p , the GCC develops and occupies a finite fraction of nodes in the network (highlighted in light blue in Fig. 3d-f). In this case, we can classify the links to be added to the network into two types: (I) links inside the GCC (yellow dashed lines); and (II) links that connect the GCC with an FCC or connect two FCCs (turquoise dashed lines). The probability that an added link is type-I (or type-II) as a function of the mean degree c is shown in Fig. 3a (light blue region). Adding type-I links to the network will never induce new APs, and may even convert the existing APs (see Fig. 3d,e, nodes in black boxes) back to normal nodes. By contrast, adding type-II links will never decrease the number of APs and could convert normal nodes (see Fig. 3d,e, nodes in orange boxes) to APs. The contributions of these two types of links to n AP compete with each other. At the initial stage of this range of c (c4c p ), since the GCC is still small, most of the added links are type-II (Fig. 3a, turquoise dashed line), and thus n AP continues to increase (Fig. 3a, red line). At certain point c AP , where the peak of n AP locates, the contribution of type-I links to n AP overwhelms that of type-II links, hence n AP begins to decrease. When the mean degree c is large enough, the network itself becomes a bicomponent without any AP.
The phenomena that c AP 4c p is even more prominent for SF networks generated by the static model 26 , where c p (l)o1 and c AP (l)41.41868?. This is because, for those SF networks, even though the GCC emerges at lower c p (l), its relative size is rather small at the initial stage of its emergence and the network is more fragmented in FCCs, which results in larger c AP (l) (Fig. 3b) (see Supplementary Note 5 for details).
Percolation transitions associated with greedy articulation points removal. We now systematically study the behaviours of n GCC (t) and n RGB , as functions of the mean degree c for infinitely large ER networks. To emphasize the c-dependence, hereafter we denote n GCC (t) and n RGB of ER networks as n GCC (t, c) and n RGB (c), respectively. To systematically characterize the percolation transitions, various quantities will be analysed, such as the critical mean degree, critical exponents, the jump size of the order parameter at criticality, and so on.
As shown in Fig. 4a (grey lines), after any finite steps of GAPR, the GCC always emerges in a continuous manner, suggesting a continuous phase transition. Hereafter we will call it GCC percolation transition. For t steps of GAPR, the GCC percolation transition displays a critical phenomenon: is the critical mean degree, that is, the percolation threshold, and b(t) is the critical exponent. We find that as t increases, c*(t) becomes larger and larger, but eventually converges to c*(N) ¼ c* ¼ 3.39807?. Note that, for any finite t, the critical exponent b(t) associated with the GCC percolation transition is the same: Fig. 4c, grey lines).
By contrast, if we allow for infinite steps of GAPR (that is, we stop the process only if there is no AP left), the size of the resulting GCC (that is, the RGB), denoted as n RGB (c), displays a remarkable discontinuous phase transition: n RGB (c) abruptly jumps from zero (when coc*) to a finite value at c*, and then increases with increasing c (see Fig. 4a black line). Hereafter we will call it RGB percolation transition. If we denote the jump size as D, we find that n RGB c ð Þ À D $ c À c Ã ð Þ b RGB with critical exponent b RGB ¼ 1/2 when (c À c*)-0 þ (Fig. 4c, black line), suggesting that the RGB percolation transition is actually a hybrid phase transition 14,27,28 . In other words, n RGB (c) has a jump at the critical point c* as a first-order phase transition but also has a critical singularity as a second-order phase transition. Interestingly, the GCC and RGB percolation transitions have completely different critical exponents associated with their critical singularities (Fig. 4c).
We also calculate the total number of GAPR steps T(c) needed to remove all APs of an infinitely large ER network of mean degree c (Fig. 4b). We find that T(c) is finite for coc*; diverges when (c À c*)-0 À ; and is infinite for any c4c*. The divergence of T(c) displays a scaling behaviour T c ð Þ $ c À c Ã j j À g À with critical exponent g À ¼ 1/2 when (c À c*)-0 À (Fig. 4d, magenta line).
The  26 to construct scale-free networks with asymptotically power-law degree distribution P k ð Þ $ k À l . Simulations are performed with network size N ¼ 10 6 and the results (symbols) are averaged over 128 realizations with error bars (s.e.m.) smaller than the symbols. Lines are our theoretical predictions. (c-f) Illustrations of articulation points (red nodes), type-I links (yellow dashed lines) and type-II links (turquoise dashed lines) in Erd+ os-Rényi random networks of different mean degrees. Note that adding a single type-II link at most convert two normal nodes to articulation points (orange boxes), while adding a single type-I link could convert much more articulation points back to normal nodes (black boxes). This explains why the peak of n AP emerges even though the probability of adding type-II links is still larger than that of adding type-I links. The largest connected component is highlighted in light blue in d-f. dynamics of the GAPR process. In particular, we can calculate n GCC (t, c), n AP (t, c), as well as a key quantity in the GAPR process, that is, the average number of newly induced APs per single AP removal: Z(t, c) ¼ n AP (t, c)/n AP (t À 1, c) for t40 and at different mean degrees c (Fig. 4e-g).
For c4c* (supercritical region), the fraction of APs exponentially decays as n AP t; c ð Þ$ exp À t=T c ð Þ À Á after an initial transient time (Fig. 4f, turquoise lines), whereT c ð Þ is the characteristic time scale. In this region, with increasing t, Z(t, c) quickly reaches an equilibrium value Z 1; c ð Þ¼exp À 1=T c ð Þ À Á , which is smaller than 1 (Fig. 4g, turquoise lines). Consequently, n GCC (t, c) converges to a finite value for t-N (Fig. 4e, turquoise lines), resulting in a finite n RGB . Since T(c) is infinite in this region, we can useT c ð Þ to characterize the relaxation behaviour of GAPR process. We find thatT c ð Þ increases as c decreases (Fig. 4b, turquoise line), and diverges asT c ð Þ $ c À c j * | À g þ with critical exponent g þ ¼ 1/2 when (c À c*)-0 þ (Fig. 4d, turquoise  line). Note that as c decreases and approaches c* from above, the equilibrium value Z(N, c) gradually approaches 1 (Fig. 4g).
When (c À c*)-0 þ (that is, right above the criticality), the fraction of APs decays in a power-law manner for large t, that is, n AP $ t À z with z ¼ 2 ( Fig. 4f and inset, black line), rendering Z(N, c) ¼ 1 (Fig. 4g, black line). Consequently, n GCC (t, c) converges to a finite value in the t-N limit (Fig. 4e, black lines), leading to a finite n RGB . The fact Z(N, c) ¼ 1 suggests that in average every removed AP will induce one new AP at the next time step, and hence the GAPR process will continue forever. This explains why T(c*) diverges. Note that the equilibrium value Z(N, c) ¼ 1 can only be reached when n AP (t, c) displays a powerlaw decay as t-N. This is because, as long as n AP (t, c) is finite at any finite t, the GAPR will gradually dilute the network, rendering a larger and larger value of Z(t, c) as t grows.
When (c À c*)-0 À (that is, right below the criticality), as well as in the entire subcritical region (coc*), after an initial decay, n AP (t, c) begins to exponentially grow with increasing t (Fig. 4f,  magenta lines). Consequently, Z(t, c) is initially o1, but then becomes drastically larger than 1 (Fig. 4g, magenta lines), which causes n GCC (t, c) quickly decays to zero (Fig. 4e, magenta lines), and hence T(c) is finite and the RGB dose not exist. The sudden collapse of the RGB upon an infinitesimal decrease in c suggests the discontinuous nature of the RGB percolation transition in ER networks. Note that at time step T, the network will break into pieces and there is no AP left. Hence in the last few GAPR steps the growth of n AP (t, c) will slow down and eventually decrease (Fig. 4f,g, tails in the magenta lines).
The nature of the RGB percolation transition in SF networks is qualitatively the same as that in ER networks. The transition from the non-RGB phase to the RGB phase is discontinuous (Fig. 5a). The critical point c*(l) increases with decreasing l. Also, the jump of the RGB relative size at criticality increases as l approaches 2 (Fig. 5a).
The c À l phase diagram of SF networks is shown in Fig. 5b. The whole diagram consists of three phases. For coc p (l) (grey region), there exists no GCC in the network, and hence no RGB. For c p (l)ococ*(l) (light blue region), even though the GCC may survive after certain finite steps of GAPR, the RGB still does not exist. Since in both regimes there is no RGB, we call them non-RGB-I phase and non-RGB-II phase, respectively. The transition between these two phases is the ordinary GCC percolation transition, which is continuous (thick dashed line). Note that, in non-RGB-II phase, the phase transition associated with the emergence the GCC after any finite t steps of GAPR is still continuous (thin dot-dashed lines). For c4c*(l) (light yellow region), the network suddenly has an RGB. This regime is referred to as the RGB phase. As we mentioned above, the transition between the non-RGB-II phase and the RGB phase is discontinuous (thick solid line). We have performed extensive numerical simulations to confirm our analytical results (Supplementary Note 6; Supplementary Figs 4-6).

Discussion
In this article, we systematically investigate AP-related issues in complex networks. Many interesting phenomena of APs are discovered and explained for the first time. On the empirical side, we proposed two AP-based applications: a network attack strategy (APTA) and a network decomposition method (GAPR). We found that, given a limited 'budget' (that is, the number of nodes to be removed), our APTA strategy is more efficient in reducing the GCC of the network than other existing strategies. In revealing the core-periphery structure of complex networks, our GAPR method is quite different from traditional network decomposition methods in the sense that our identified core may include low-degree nodes. Those sparsely connected nodes can be functionally very important, but they are always ignored in traditional decomposition methods. On the theoretical side, we proposed an analytical framework to calculate various AP-related properties, among which the emergence of the RGB as a discontinuous percolation transition is of great theoretical interest. This finding also provides a theoretical explanation of the empirical findings that most of the real-world networks have either a very small RGB or a rather big one.
Taken together, our results offer a different perspective on the organizational principles of complex networks, shed light on the design of more resilient infrastructure networks and more effective destructions of malicious networks, and open new avenues to deepening our understanding of complex networked systems. Since the identification of APs also helps us better solve other challenging problems, for example, the calculation of determinants of large matrices 44 , and the minimum vertex cover problem on large graphs (a classical NP-hard problem) 45 , we anticipate that our results on APs will trigger more research activities on those problems as well.

Methods
Theoretical analysis of greedy articulation points removal process. Our theoretical treatment of the GAPR process is based on the local tree approximation, which assumes in the thermodynamics limit (that is, network size N-N) there are no finite loops in a network and only infinite loops exist [23][24][25]28 . This approximation allows us to use the convenient techniques of random branching processes to solve the GAPR process on large uncorrelated random networks (Supplementary Note 4). Note that the local tree approximation is only exact for networks with finite second moment of the degree distribution. However, it has been demonstrated in various network problems that this approximation can obtain very accurate results even for networks with diverging second moment of the degree distribution 28 . Here we find that this local tree approximation works very well in analysis of the GAPR process ( Supplementary Notes 4 and 6). At each time step t during the GAPR process in a network G, we classify the remaining nodes into the following three categories or states: (1) a t -nodes: nodes in FCCs; (2) b t -nodes: nodes that are APs in the GCC; (3) g t -nodes: nodes that are not APs in the GCC. Note that if a node is a g t -node, it must be g t -node with tot. (The notations b t and g t here have totally different meanings from the critical exponents b and g mentioned in the main text.) According to the local tree approximation (Supplementary Note 4), the state of a randomly chosen node i can be determined by the states of its neighbours in G n i, that is, the induced subgraph of G with node i and all its links removed. In other words, in order to determine the state of a node, we need to know the states of its neighbours. Therefore, at each time step t, we need to know the probability that, following a randomly chosen link to one of its end nodes, this node belongs to any of the above categories after this link is removed. These probabilities are denoted as a t , b t , and g t , respectively. Note that for convenience sake here we use the same notation to denote both the state of a node and the probability of a node in that state. To be precise, hereafter when we consider the state of a neighbour of a given node i, we mean the state of the neighbour in the induced subgraph G n i.
The GAPR process can be fully characterized by the three sets of probabilities {a 0 , a 1 ,...}, {b 0 , b 1 ,...} and {g 0 , g 1 ,...}. Note that every node must belong to one of the three categories, which means the three sets of probabilities are not independent from each other. Specifically, at time step t, the probability g t can be derived by the other two sets of probabilities through the following normalization condition: Hereafter we focus on a t and b t only. We can calculate {a 0 , a 1 ,...} and {b 0 , b 1 ,...} in an iterative way. At first, we consider the initial time step t ¼ 0. The self-consistent equations for a 0 and b 0 are given by where Q(k) ¼ kP(k)/c is the degree distribution of the nodes that we arrive at by following a randomly chosen link (a.k.a. the excess degree distribution) 1,2 . We derive the above equations based on the following observations: (1) a 0 -node: its neighbours can only be a 0 -nodes; (2) b 0 -node: since it is an AP node, at least one of its neighbours is an a 0 -node. Moreover, since it belongs to the GCC, at lease one of its neighbours is not an a 0 -node. For the t-th GAPR time step (t40), we can compute a t and b t as follows: The derivations of equations (4 and 5) are based on the following observations: (1) a t -node: First, its neighbours can only be a t -nodes or b t -nodes with tot (because if one of its neighbours is a t -node with tot, this node will be an AP before time step t and hence would have already been removed; if one of its neighbour is b t -node, this node will belong to the GCC at time step t). Second, its neighbours can not be all b t -nodes with tot À 1. Otherwise this node will be a leaf node before time step t À 1, and will become an isolated node before the t-th time step. In this case, we can not reach this node through a randomly chosen link at time step t. (2) b t -node: First, its neighbours can not be a t -nodes with tot, otherwise this node would have been removed before time step t. Second, since it is an AP node, at least one of its neighbours is an a t -node. Finally, since this node belongs to the GCC, at least one of its neighbours is neither a t -node nor b t -node with tot.
Diagrammatic representations of these probabilities (a t , b t , g t ) and their relationship are shown in Supplementary Figs 7-9 (see Supplementary Note 4 for details).
By solving the above self-consistent equations, we can obtain {a 0 , a 1 ,...} and {b 0 , b 1 ,...}, which govern the whole process of GAPR. With these two sets of probabilities, we can compute any quantities of interest, such as the total number of GAPR steps, the fraction of APs, the relative size of the GCC and the RGB, and so on (Supplementary Note 4; Supplementary Figs 10-12).
Data availability. The data that support the findings of this study are available from the corresponding author on reasonable request.