Activity networks determine project performance

Projects are characterised by activity networks with a critical path, a sequence of activities from start to end that must be finished on time to complete the project on time. Watching over the critical path is the project manager’s strategy to ensure timely project completion. This intense focus on a single path contrasts with the broader, complex structure of the activity network, and is due to our poor understanding of how that structure influences the critical path. Here, we use a generative model and detailed data from 77 real-world projects (+ $10 bn total budget) to demonstrate how this network structure forces us to look beyond the critical path. We introduce a duplication-split model of project schedules that yields (i) identical power-law in- and out-degree distributions and (ii) a fraction of critical path activities that vanishes with increasing schedule size. These predictions are corroborated in real projects. We demonstrate that the incidence of delayed activities in real projects is consistent with the expectation from percolation theory in complex networks. We conclude that delay propagation in project schedules is a network property and is not confined to the critical path.


Results
Generative model of project schedules. A project schedule is generated using a standardised procedure. In that process, planners take into account the state of the art of contractors' operations. If specialisation occurs and the work of a former contractor doing activity A is now carried out by two contractors doing activities A1 and A2, then we would observe a change from A to A1 and A2 when comparing schedules before and after this specialisation.
The evolution of project schedules (or activity networks) in time can be seen as the outcome of a growth process, where a parent activity can be duplicated or split (Fig. 1A). Generic activities can be duplicated and broken into two smaller activities that run in parallel, both inheriting all predecessors and successors of the parent activity. Specialised activities can be split into two activities executed in sequence, such that one specialised contractor executes the first part and another the second. Starting from two activities executed in sequence (Fig. 1B), we can grow the network by a stochastic sequence of duplication and split events, with a probability of duplication q. For small q, activities will be mostly split, generating a mostly linear activity network. For large q, most activities are duplicated, leading to a network with numerous parallel paths (Fig. 1).
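The growth process described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `duplication_split`, the dictionary-of-sets representation, and the choice that the new activity in a split takes over the parent's successors are all assumptions made for the example.

```python
import random

def duplication_split(n, q, seed=None):
    """Grow an activity network with n activities (hypothetical helper).

    Returns (preds, succs): dicts mapping each activity id to the set of
    its predecessors / successors.
    """
    rng = random.Random(seed)
    # Seed network: two activities executed in sequence, 0 -> 1.
    preds = {0: set(), 1: {0}}
    succs = {0: {1}, 1: set()}
    while len(preds) < n:
        parent = rng.randrange(len(preds))  # uniform choice among current activities
        new = len(preds)
        if rng.random() < q:
            # Duplication: the copy runs in parallel with the parent and
            # inherits all of its predecessors and successors.
            preds[new] = set(preds[parent])
            succs[new] = set(succs[parent])
            for a in preds[new]:
                succs[a].add(new)
            for b in succs[new]:
                preds[b].add(new)
        else:
            # Split: parent and new activity run in sequence (parent -> new).
            # Assumption: the new activity takes over the parent's successors.
            succs[new] = succs[parent]
            preds[new] = {parent}
            for b in succs[new]:
                preds[b].discard(parent)
                preds[b].add(new)
            succs[parent] = {new}
    return preds, succs

# Usage: a mostly linear network (small q) and a highly parallel one (large q).
for q in (0.1, 0.9):
    preds, succs = duplication_split(500, q, seed=1)
    n_edges = sum(len(s) for s in succs.values())
    print(q, n_edges)
```

With q = 0 every event is a split and the result is a single chain; as q grows, duplications add parallel activities and the number of dependencies increases.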
Node duplication, also known as copying, has been studied in the context of web networks and protein interaction networks [10][11][12][13]. It has been shown that node duplication generates networks with a power-law probability distribution in the number of links associated with a node [10][11][12][13]. In the Methods section we demonstrate that this is indeed the case for our model of duplication-split activity networks, but with a twist. We can show that the distributions of the number k of predecessors and successors of an activity follow the same power law $p_k \sim k^{-1/q}$, where $p_k$ is the probability that an activity has k predecessors (or successors). Our calculations are validated by numerical simulations of the duplication-split model (Fig. 2).
Once we create activity networks, we can populate synthetic project schedules by assigning durations to each activity. We now have project schedules with a critical path, a sequence of activities from the start to the end of the project. As a consequence, delaying the finish of any activity in the critical path delays the project end date by the same amount.
Shrinkage of the critical path. The critical path is perceived as the centrepiece of project management due to its sensitivity to delays. Yet, a look at the synthetic activity networks in Fig. 1C-E made us question whether that critical-path-centric view is valid for modern projects, given that modern projects have complex structures with many parallel paths of work happening at the same time [6].
In cases where activity networks are quasi-linear, the critical path is indeed the dominant structural feature (Fig. 1C, q = 0.1). In contrast, in the q = 0.9 activity network we observe a large number of parallel paths with a similar number of activities (Fig. 1E). It is in these cases that the concept of the critical path may be less relevant for managing the delay risk of the project.
Following these qualitative observations, we show that the larger the project network, the smaller the relative size of the critical path. Furthermore, the larger the duplication probability q, the smaller the relative number of activities in the critical path, in agreement with the visual inspection of the q = 0.1 and q = 0.9 synthetic activity networks in Fig. 1C,E. We determine the number c of critical path activities in a network of n activities with duplication parameter q. We estimate that $c \sim n^{1-q}$ and therefore the fraction of activities in the critical path decreases as $c/n \sim n^{-q}$. Numerical simulations corroborate the $c/n \sim n^{-\alpha(q)}$ scaling, albeit with $\alpha(q) \le q$ (Fig. 3). We note that the duplication-split networks are not small-world networks [14,15]. In small-world networks the typical distances between nodes scale logarithmically with network size ($c \sim \ln n$) [15]. Duplication-split networks are a new class of networks with power-law degree distributions and power-law scaling of node distances with network size. In fact, these are fractal networks ($c \sim n^{1/D}$), with a fractal dimension $D = 1/(1 - \alpha(q))$.
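One way to extract the exponent α from pairs (n, c/n) is an ordinary least-squares fit in log-log space. The sketch below is only an illustration of that idea; the source does not state which fitting procedure was used, and the function names are hypothetical.

```python
import math

def fit_scaling_exponent(sizes, fractions):
    """Least-squares slope of log(c/n) against log(n); returns alpha in c/n ~ n^(-alpha)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(f) for f in fractions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

def fractal_dimension(alpha):
    """D = 1 / (1 - alpha), from c ~ n^(1/D) and c ~ n^(1 - alpha)."""
    return 1.0 / (1.0 - alpha)

# Usage on exact synthetic data with alpha = 0.3 (illustrative values only).
sizes = [100, 1000, 10000]
fractions = [n ** -0.3 for n in sizes]
alpha = fit_scaling_exponent(sizes, fractions)
print(alpha, fractal_dimension(alpha))
```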

Vanishing critical path in empirical activity networks.
To demonstrate that our observations are representative of the real-world challenge, we shift our focus to empirical data for 77 construction projects (total value + $10bn), with activity networks representing different stages of the project lifecycle, adding up to 323 project schedules. These activity networks vary in size, from 100 to 16,000 activities. The prediction from our synthetic schedule analysis, that the relative size of the critical path decreases as the number of activities increases, is confirmed in the empirical data.
First, we corroborate that the distributions of the number of predecessors (in-degree) and the number of successors (out-degree) of an activity are almost identical and follow a power-law decay (Fig. 4A). Assuming the power-law decay of the duplication-split model, we obtained a maximum likelihood estimate of q from the distribution of the number of predecessors and, independently, from the distribution of the number of successors. The duplication-split model predicts that the two should coincide. Indeed, the data for the construction projects fall at or in the vicinity of the equality line (Fig. 4B). Furthermore, the duplication index q of real projects is distributed between 0.1 and 0.5, with most values above 0.2 (Fig. 4C).
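A maximum likelihood estimate of q from a list of observed degrees can be sketched as follows. The exact estimator used for Fig. 4 is not given in the text, so this example makes its own assumptions: degrees follow a pure power law with exponent 1/q, truncated at a hypothetical cutoff `k_max`, and the likelihood is maximised by a simple grid search over q.

```python
import math
import random

def estimate_q(degrees, k_max=1000):
    """Grid-search MLE of q, assuming p_k proportional to k^(-1/q) for 1 <= k <= k_max."""
    degrees = [k for k in degrees if 1 <= k <= k_max]
    n = len(degrees)
    sum_log = sum(math.log(k) for k in degrees)
    best_q, best_ll = None, -math.inf
    for i in range(5, 96):  # candidate values q = 0.05, 0.06, ..., 0.95
        q = i / 100
        gamma = 1.0 / q
        # Normalisation constant of the truncated power law.
        norm = sum(k ** -gamma for k in range(1, k_max + 1))
        log_likelihood = -gamma * sum_log - n * math.log(norm)
        if log_likelihood > best_ll:
            best_q, best_ll = q, log_likelihood
    return best_q

# Usage on synthetic degrees drawn from a truncated power law with q = 0.4.
rng = random.Random(0)
support = range(1, 1001)
weights = [k ** -(1 / 0.4) for k in support]
sample = rng.choices(support, weights=weights, k=20000)
print(estimate_q(sample))
```

Running the same estimator separately on the in-degrees and the out-degrees of a schedule gives the two values that the model predicts should coincide.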
Second, we tested the $c/n \sim n^{-\alpha}$ scaling of the fraction of activities in the critical path. The fraction c/n of activities in the critical path of real activity networks decreases as the number of activities n increases (Fig. 4D, blue symbols). This decrease approximately follows the scaling $c/n \sim n^{-\alpha}$ with α = 0.79.
Network complexity drives delay risk. Now we switch our attention to delay propagation in activity networks. Exogenous delays such as extreme weather events, pandemics or financial crises can cause some activities to be delayed beyond their planned finish date. When activity delays exceed the spare time between activities (free floats) they propagate downstream triggering a delay cascade. We view activity delays exceeding the free floats as microscopic events and the delay cascades reaching the project end as the macroscopic behaviour. The microscopic events are quantified by the probability p that an activity dependency will transmit a delay. The macroscopic behaviour is quantified by the fraction of activities where the activity delay exceeds its total float. We call the latter the delay incidence.
If the critical path is a key delay risk factor, then the incidence of delay across activities should increase with increasing p × c, where c is the critical path size as denoted above. However, when we plot the delay incidence vs p × c we actually observe a negative, non-significant correlation (Fig. 5A, Pearson correlation coefficient = −0.1, significance = 0.7). Therefore, the delay risk is not determined by the critical path size. If the critical path vanishes for large projects, and we know that almost all complex projects are delayed, where does this risk come from? Having ruled out the standard hypothesis (critical path), we shift our focus to activities outside the critical path.
We use percolation theory as a framework to quantify the propensity of the project to exhibit a delay, driven by delays at the activity level [16][17][18]. Bond percolation indicates that when p exceeds a critical threshold $p_c$, delay cascades will take place with a finite probability. For directed networks with uncorrelated in-degrees and out-degrees, $p_c = 1/\langle k \rangle$ [18], where $\langle k \rangle$ is the average out-degree. Percolation theory predicts a phase transition from no macroscopic cascades when $p < p_c$ to a finite risk of macroscopic cascades for $p > p_c$.
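The threshold and a single delay cascade can be illustrated with a short sketch. It assumes a successor-list representation of the activity network and treats every dependency as transmitting a delay independently with probability p; the function names are hypothetical and the example is not taken from the source.

```python
import random
from collections import deque

def mean_out_degree(succs):
    """Average number of successors per activity, <k>."""
    return sum(len(s) for s in succs.values()) / len(succs)

def cascade_size(succs, source, p, rng):
    """Number of activities reached from `source` when each dependency
    independently transmits the delay with probability p (bond percolation)."""
    reached = {source}
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in succs[a]:
            if b not in reached and rng.random() < p:
                reached.add(b)
                queue.append(b)
    return len(reached)

# Usage on a small hand-made network: 0 -> {1, 2} -> 3.
succs = {0: [1, 2], 1: [3], 2: [3], 3: []}
p_c = 1.0 / mean_out_degree(succs)  # threshold for uncorrelated degrees
rng = random.Random(0)
print(p_c, cascade_size(succs, 0, 0.5, rng))
```

Repeating `cascade_size` over many random draws and sources gives an empirical picture of how often a local delay grows into a large cascade on either side of the threshold.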
This is exactly what we observe for real project networks (Fig. 5B), highlighting that project end overruns are indeed a property of the whole network. For $p < p_c$ the delay incidence is below 1%, so there is almost no risk of project delay. In contrast, for $p > p_c$ the delay incidence increases gradually, in some cases affecting 15% of the entire project. We note that for some projects with $p > p_c$ the delay incidence is below 1% and the confidence interval reaches zero (Fig. 5B, orange band, $p - p_c > 0$). This is expected from percolation theory: the occurrence of macroscopic events is probabilistic. What is different from zero is the probability that such macroscopic events occur.

Conclusions
We focus on activity networks that describe large-capital projects, showing that their broader structure contains information about their propensity for delays. Our first contribution is the introduction of the duplication-split model, and the finding that the duplication index q is a core feature of activity networks. Networks with small q are indicative of quasi-linear topologies, and are a good fit for critical path analysis. Large q indicates a complex project, where the critical path is relatively small and parallel paths tend to dominate the overall structure. We then use synthetic and empirical data to validate the output of the duplication-split model. Our second contribution is showing that the fraction of activities in the critical path decreases as $n^{-\alpha}$ and therefore the critical path vanishes in the limit of large activity networks. As a result, the critical path is of limited applicability when it comes to large and complex projects. Our third contribution is the application of percolation theory to go beyond the limitations of critical path analysis, showing that project end overruns are a network property.

Methods
Estimation of the degree distribution. Let $n_k(n)$ be the number of activities with k predecessors in an activity network of n activities. As new activities are added, $n_k(n)$ changes according to the equation

$$n_k(n+1) = n_k(n) + q\left[\frac{k-1}{n}\,n_{k-1}(n) - \frac{k}{n}\,n_k(n) + \frac{1}{n}\,n_k(n)\right] + (1-q)\,\delta_{k1}. \tag{1}$$

The first term inside […] corresponds to activities with k − 1 predecessors and the duplication of one predecessor with probability (k − 1)/n, moving them into the k-predecessors group. The second term inside […] is the same but for activities with k predecessors, moving them out of the k-predecessors group. The third term inside […] is the chance that an activity with k predecessors is duplicated, thus generating a new activity with k predecessors. Finally, the last term in (1) is the creation of a new activity with one predecessor following a splitting event, where $\delta_{k1} = 1$ if k = 1 and 0 otherwise (Kronecker delta symbol).
Assuming a steady-state solution $n_k(n) = n\,p_k$ we obtain

$$p_k = q\left[(k-1)\,p_{k-1} - k\,p_k + p_k\right] + (1-q)\,\delta_{k1}. \tag{2}$$

We can iterate this equation to obtain an expression for all k > 1 as a function of $p_1$. The same reasoning can be repeated using k as the number of activity successors. That is, the distributions of the number of predecessors (in-degree) and successors (out-degree) are identical in the n → ∞ limit.
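For readers who want the intermediate steps, the iteration of the steady-state relation can be written out as below. This is a worked sketch derived from the relation itself; the closed form in terms of Gamma functions is a reconstruction for illustration and is not quoted from the source.

```latex
% Steady-state relation for the degree distribution
p_k = q\left[(k-1)\,p_{k-1} - k\,p_k + p_k\right] + (1-q)\,\delta_{k1}

% k = 1: the bracket vanishes, so
p_1 = 1 - q

% k > 1: collect the p_k terms
p_k\left[1 + q(k-1)\right] = q(k-1)\,p_{k-1}
\quad\Longrightarrow\quad
p_k = \frac{k-1}{k-1+1/q}\,p_{k-1}

% Iterating down to p_1
p_k = p_1 \prod_{j=1}^{k-1} \frac{j}{j+1/q}
    = p_1\,\frac{\Gamma(k)\,\Gamma(1+1/q)}{\Gamma(k+1/q)}

% Large-k behaviour, using \Gamma(k)/\Gamma(k+a) \sim k^{-a}
p_k \sim k^{-1/q}
```

The last line recovers the power-law decay with exponent 1/q stated in the Results.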
Estimation of the critical path size. Consider a network schedule with n activities and c activities in the critical path. As new activities are added, the size of the critical path can increase if a task in the critical path is subject to the split rule. Since the split rule is executed with probability 1 − q at each activity addition and the probability that the activity selected for splitting is in the critical path is equal to c/n, then

$$\frac{dc}{dn} = (1-q)\,\frac{c}{n}. \tag{3}$$

Integrating this equation we obtain

$$c \sim n^{1-q}.$$

This result is an approximation. As the network grows there could be changes in which activities are in the critical path, making the critical path shorter. We conjecture the scaling $c/n \sim n^{-\alpha(q)}$, where $\alpha(q) \le q$.
Python code for the duplication-split model. (1) We generate an activity network by successive application of the duplication/split rules until we reach n activities. At each activity addition we select an activity with equal probability among all current activities in the network and execute the duplication rule with probability q, otherwise the split rule. (2) We assign a duration x to each activity from a distribution with probability density function f(x). Here we use an exponential distribution with mean 1 day. (3) We assume that all activity relations are of the standard Finish-Start type and that all activities with no predecessors start at day 0, and apply forward/backward passes [6,7] to determine the early and late start and end dates for all other activities. Average statistics and distributions are estimated from 100 simulations of these steps for each set of parameters (n, q).
Critical path. Once a schedule has been generated, we perform a second backward pass to calculate the total float of each activity. The total float is defined as the amount of time that the end date of an activity can be postponed without affecting the project end date [6,7]. The critical path is the set of activities with total float 0 and it will be denoted by C. The size of C, the number of activities in the critical path, is denoted by c.
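The forward and backward passes and the resulting total floats can be sketched for Finish-Start relations as follows. This is an illustrative implementation of the standard critical path method, not the authors' code; the function name `critical_path` and the input format (a duration dictionary plus an edge list) are assumptions made for the example.

```python
from collections import deque

def critical_path(durations, edges):
    """durations: dict activity -> duration; edges: list of (pred, succ) Finish-Start links.

    Returns (total_float, critical_set, project_end).
    """
    succs = {a: [] for a in durations}
    preds = {a: [] for a in durations}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    # Topological order (Kahn's algorithm).
    indegree = {a: len(preds[a]) for a in durations}
    queue = deque(a for a in durations if indegree[a] == 0)
    order = []
    while queue:
        a = queue.popleft()
        order.append(a)
        for b in succs[a]:
            indegree[b] -= 1
            if indegree[b] == 0:
                queue.append(b)
    # Forward pass: activities without predecessors start at day 0.
    early_finish = {}
    for a in order:
        early_start = max((early_finish[p] for p in preds[a]), default=0.0)
        early_finish[a] = early_start + durations[a]
    project_end = max(early_finish.values())
    # Backward pass: latest finish that does not postpone the project end.
    late_finish, total_float = {}, {}
    for a in reversed(order):
        late_finish[a] = min((late_finish[s] - durations[s] for s in succs[a]),
                             default=project_end)
        total_float[a] = late_finish[a] - early_finish[a]
    critical = {a for a in durations if abs(total_float[a]) < 1e-9}
    return total_float, critical, project_end

# Usage: A precedes B and C, which both precede D.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
total_float, critical, end = critical_path(durations, edges)
print(sorted(critical), total_float["B"], end)
```

In this small example the longer branch through C is critical, while B can slip by two days without moving the project end.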
Probability of delay transmission. We estimate the probability p that an activity dependency will transmit delays by looking at all completed activities, and computing the fraction of dependencies with slack time that is smaller than the delay at the parent activity, across all relations out-going from finished activities.
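As a small illustration of this estimate, the sketch below counts the dependencies whose slack is smaller than the delay of the parent activity. The input format, a list of (parent delay, slack) pairs in days, is a hypothetical simplification; how these quantities are extracted from a real schedule is not covered here.

```python
def transmission_probability(dependencies):
    """Fraction of dependencies whose slack is smaller than the parent's delay.

    dependencies: iterable of (parent_delay, slack) pairs, one per dependency
    going out of a finished activity (hypothetical input format).
    """
    deps = list(dependencies)
    if not deps:
        raise ValueError("no dependencies from finished activities")
    transmitted = sum(1 for delay, slack in deps if slack < delay)
    return transmitted / len(deps)

# Usage with four made-up dependencies.
records = [(5, 2), (0, 3), (1, 1), (4, 0)]
print(transmission_probability(records))
```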
Control parameter of the critical path method. The probability that there is at least one delay transmission in the critical path is $P(p,c) = 1 - (1-p)^c$. For small p it can be approximated by $P(p,c) \approx 1 - e^{-pc}$. This latter equation shows that the delay risk associated with the critical path should increase with increasing pc.
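The expression $1 - (1-p)^c$ and its small-p approximation $1 - e^{-pc}$ can be compared numerically. The sketch below uses made-up values of p and c purely for illustration; the function names are hypothetical.

```python
import math

def delay_risk_exact(p, c):
    """Probability that at least one of c independent dependencies transmits a delay."""
    return 1.0 - (1.0 - p) ** c

def delay_risk_approx(p, c):
    """Small-p approximation, a function of the product p * c only."""
    return 1.0 - math.exp(-p * c)

# Usage: the two expressions stay close for small p and grow with p * c.
for c in (10, 50, 100):
    print(c, delay_risk_exact(0.01, c), delay_risk_approx(0.01, c))
```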
Empirical data of construction projects. The dataset is composed of 77 construction projects, with multiple project schedules for each construction project, totalling 323 project schedules. The project schedules were generated by the project managers using an industry standard enterprise software package (Oracle Primavera P6).

Data availability and code availability
All the data necessary to support our conclusions are reported in the figures. Code for the duplication-split model is provided in the Methods. Raw data for construction projects have restricted access and can be provided upon consultation. Requests for data should be directed to the corresponding authors.