Introduction

Air transportation systems have traditionally been described as graphs whose vertices represent airports and whose edges represent direct flights during a fixed time period1,2. These graphs, called airport networks, have been studied at different geographical resolutions. Some studies are restricted to a single country (usually the U.S. (USAN)3,4,5,6, but also China7 or Europe8), while others cover the whole world (WAN)1,2. These networks show high heterogeneity in the distribution of connections per airport and in the traffic sustained by each connection. A nonlinear relation between the number of connections of an airport (topology) and its number of passengers (traffic) was observed in Ref. 1 and later used for modeling9. Furthermore, airport networks are structured in clusters of highly interconnected airports that reflect the geographical areas into which the traffic is naturally divided10. The dynamics of the connections and of the traffic levels have also been analyzed for the USAN4. All of these aspects of the graphs influence their capability to transport people, goods and even other, less desirable passengers. An example is the global-scale propagation of infectious diseases that occurs when infected persons travel across the network11,12,13,14,15. The modeling and forecasting of disease spreading patterns using air traffic data has been a notable success13,14,15. One may thus wonder whether this success can be extended to the propagation of other phenomena. In particular, we consider here flight delays and the way in which congestion can become a systemic risk.

According to the 2008 Report of the Congress Joint Economic Committee, flight delays have an economic impact in the U.S. equivalent to 40.7 billion dollars per year16, and a similar cost is expected in Europe17,18. The situation may become even grimmer in the next decade, since air traffic is expected to increase16,17,18,19. Delays damage airlines' balances through increased operating costs and deteriorate their image with customers20. Passengers suffer a loss of time, even more acute in the case of missed connections, that translates into decreased productivity and missed business opportunities or leisure activities. Additionally, attempts to recover delays lead to excess fuel consumption and larger CO2 emissions. As a consequence of this challenging situation, considerable effort has been invested in the area of Air Traffic Management to characterize the sources of initial (primary) delays21,22 and the way in which they may be transferred and amplified by subsequent operations, the so-called reactionary delays19,23,24,25,26,27. The concept of delay itself implies a time difference with respect to the baseline provided by a predefined schedule21,24. The propagation of delays thus corresponds to the spreading of a malfunction across the system, and the mechanisms responsible for it reflect the complexity of air traffic operations. Apart from the structure and dynamics of airport networks, other factors contributing to delay propagation are airport congestion25, plane rotation, and crew and passenger connection disruptions23,26,27. Airline schedules typically include a buffer time to deal with these issues. However, when this time is not enough, the departure of the next flight is delayed and can affect further operations in a cascade-like effect23. There have been several attempts to model delay spreading28,29,30,31,32,33.
These studies differ in the level of detail included, but in general they consider the effects of delays or disruptions in the operations of a few major airports (hubs). In this work, we instead take a network-wide perspective to analyze the performance of a transportation system. We define metrics that quantify how widely delays spread in the network. We then apply these metrics to a database with information on operations in the U.S. during 2010 and introduce a model that reproduces the delay propagation patterns observed in the data. The model also shows a notable capacity to evaluate the risk that system-wide congestion develops and to assess the resilience of daily schedules to service disruptions.

Results

Database

The data were downloaded from the web page of the Bureau of Transportation Statistics (BTS)35. In particular, we used the Airline On-Time Performance Data, which is built from flight statistics provided by air carriers that exceed one percent of the annual national revenue for domestic regular service. The database comprises 6,450,129 scheduled flights operated by 18 carriers connecting 305 different commercial airports. The total number of flights operated in the U.S. in 2010, not only those reporting on-time performance data, was 8,687,800, as reported in Ref. 36. Therefore, the database accounts for 74% of them. The information per flight includes real and scheduled departure and arrival times, origin and destination airports, an identification code (tail number) for each aircraft, the airline, etc. These data enable us to represent the U.S. airport network and, furthermore, to replicate the scheduled flights for every day of 2010. A detailed description can be found in Section 1 of the Supplementary Information. It is important to note that this schedule is reconstructed from real events, which on some occasions may differ from the schedule originally planned by the companies. If a flight is canceled, diverted or even rescheduled, the airline may introduce changes in the original schedule that cannot be traced back. However, given that these flights represent only a small fraction of the database (0.20% and 1.75% of all flights), one can expect these changes not to be of large magnitude.
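As a quick check of the coverage figure, the share of 2010 U.S. flights present in the on-time database follows directly from the two counts quoted above; the variable names are illustrative.

```python
# Counts quoted in the text (flights in 2010).
reported_flights = 6_450_129    # flights in the BTS on-time performance database
total_flights_2010 = 8_687_800  # all flights operated in the U.S. (Ref. 36)

coverage = reported_flights / total_flights_2010
print(f"coverage = {coverage:.1%}")  # → coverage = 74.2%
```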

Model

The modeling approach is agent-based at the level of aircraft and data-driven, in the sense that the daily schedules and the primary delays are taken directly from real records in the database. This level of realism is necessary to confront the model predictions with the real unfolding of delay events during each day. Concretely, the model dynamics simulates three main subprocesses: aircraft rotation, flight connectivity and airport congestion. The latter two are independent of each other and can be turned on or off to explore the relevance of each subprocess in leading to network-wide congestion. Aircraft rotation, on the other hand, is intrinsic to the schedule and cannot be suppressed.
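The aircraft-rotation mechanism, the one subprocess that cannot be switched off, can be illustrated with a minimal sketch of the cascade along a single aircraft's daily sequence of flights. This is not the authors' implementation (described in the Supplementary Information). It assumes times in minutes after midnight, no delay recovered in the air, and a hypothetical minimum ground time `min_turn` before the next departure. `Flight` and `propagate_rotation` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flight:
    sched_dep: int          # scheduled departure, minutes after midnight
    sched_arr: int          # scheduled arrival, minutes after midnight
    primary_delay: int = 0  # initial (primary) delay, minutes


def propagate_rotation(rotation: List[Flight], min_turn: int = 0) -> List[int]:
    """Departure delay of each flight along one aircraft's daily rotation.

    Assumptions (illustrative only): no delay is recovered in the air, and
    the aircraft needs `min_turn` minutes on the ground (hypothetical).
    """
    delays = []
    ready_at = None  # minute at which the aircraft can leave the gate again
    for f in rotation:
        earliest = f.sched_dep + f.primary_delay
        if ready_at is not None:
            # Reactionary delay: the flight cannot leave before its aircraft.
            earliest = max(earliest, ready_at)
        dep_delay = earliest - f.sched_dep
        delays.append(dep_delay)
        # No recovery in the air: arrival delay equals departure delay.
        ready_at = f.sched_arr + dep_delay + min_turn
    return delays


day = [Flight(480, 600, 90), Flight(630, 750), Flight(840, 960)]
print(propagate_rotation(day))  # → [90, 60, 0]
```

In this toy rotation, the 30-minute ground buffer before the second flight absorbs part of the 90-minute primary delay, and the 90-minute buffer before the third flight absorbs the remaining 60 minutes.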

The basic time unit of the simulations is one minute, and the state of every aircraft is tracked at this temporal resolution. We assume that flights are not able to recover delays in the air, so departure delays are equal to arrival delays at the destination. Throughout a day, each aircraft follows the sequence of flights given in the schedule, the so-called plane rotation. Each airport is assumed to have an hourly capacity proportional to its scheduled arrival rate, with a proportionality factor β; arrivals in excess of this capacity are delayed. Passengers (and crew) of incoming flights have a certain probability of connecting with other flights within a time window of 3 hours from the scheduled arrival. The probability of connection is proportional, with a factor α, to the flight connectivity levels provided by the BTS for each U.S. airport. A more precise description of the model is included in Section 2 of the Supplementary Information. The model thus has two free parameters: α, controlling passenger connectivity, and β, accounting for airport capacity. In the following section, we examine the effect of these parameters on the systemic spread of delays.
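The two switchable mechanisms can be sketched as follows. This is a hedged illustration, not the model of the Supplementary Information, and it rests on several assumptions of ours. Hourly capacity is taken as floor(β × scheduled arrivals), with a floor of one flight per hour. Arrivals are served first-come first-served, and a flight pushed to a later hour lands at the start of that hour. An outgoing flight within the 3-hour window waits for the incoming passengers or crew with probability min(1, α × connectivity). The function names are hypothetical.

```python
import math
import random


def apply_capacity(arrivals, sched_per_hour, beta=1.0):
    """Extra (congestion) delay per arrival at one airport.

    arrivals: list of (flight_id, expected_arrival_minute).
    sched_per_hour: dict hour -> number of scheduled arrivals in that hour.
    """
    pending = sorted(arrivals, key=lambda a: a[1])
    extra, queue, i = {}, [], 0
    hour = pending[0][1] // 60 if pending else 0
    while i < len(pending) or queue:
        # Enqueue every flight expected during (or before) this hour.
        while i < len(pending) and pending[i][1] // 60 <= hour:
            queue.append(pending[i])
            i += 1
        # Assumed capacity rule: beta times the scheduled rate, at least 1.
        cap = max(1, math.floor(beta * sched_per_hour.get(hour, 0)))
        served, queue = queue[:cap], queue[cap:]
        for fid, t in served:
            # Flights served in a later hour land at the start of that hour.
            extra[fid] = max(0, hour * 60 - t)
        hour += 1
    return extra


def connection_delays(sched_arr, arr_delay, departures, alpha, connectivity,
                      rng, window=180):
    """Departure delays induced by passengers/crew of one incoming flight.

    departures: list of (flight_id, sched_dep) leaving the same airport.
    """
    p = min(1.0, alpha * connectivity)  # assumed connection probability
    actual_arr = sched_arr + arr_delay
    induced = {}
    for fid, sched_dep in departures:
        in_window = sched_arr <= sched_dep <= sched_arr + window
        if in_window and rng.random() < p:
            wait = actual_arr - sched_dep
            if wait > 0:
                induced[fid] = wait
    return induced


print(apply_capacity([("A", 485), ("B", 490), ("C", 500)], {8: 2}))
# → {'A': 0, 'B': 0, 'C': 40}
print(connection_delays(600, 45, [("X", 620), ("Y", 700), ("Z", 900)],
                        alpha=1.0, connectivity=1.0, rng=random.Random(0)))
# → {'X': 25}
```

Passing an explicit `random.Random` instance keeps each simulated day reproducible, which matters when the same schedule is run many times to estimate probabilities of congestion.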

Data analysis and comparison with model predictions

Flight delays are defined as the difference between the real and the scheduled departure (arrival) times21,24. Most flights operated in 2010 were on time, some even ahead of schedule, but 37.5% of those reporting performance data arrived or departed late. Their delays do not show a characteristic value: the delay distribution displays a broad tail, as can be seen in Figure 1A. This implies that most late flights arrived just a few minutes behind schedule, while others were hours late. The shape of the distributions is similar for arrivals and departures. The planned buffer time on the ground for each aircraft should help absorb part of the delays, especially the mildest ones, as discussed next, thus altering the shape of the departure delay distribution. However, this factor does not substantially modify the characteristics of the distributions. Interestingly, the shape of the delay distribution does not change with the season either. Summer concentrates most of the annual traffic, so the total delay is higher, but when the distribution of delay per flight is considered, summer and winter behave similarly (see Figure 1B). The overall distributions of delays are thus quite robust. Small differences can be observed only when one focuses on particular airports. In Figure 1C, the departure delay distribution is plotted for the Atlanta, JFK New York and Honolulu airports. While the distributions in Atlanta and New York are similar, Honolulu airport shows a bias toward larger delays due to its isolation from the continent.
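The delay definition and the broad-tailed distributions of Figure 1 can be illustrated with a small sketch that computes per-flight delays (positive means late) and the complementary cumulative distribution P(delay ≥ x) of the late flights. The records are toy numbers, not values from the database, and the function names are illustrative.

```python
def flight_delays(records):
    """Delay = real time - scheduled time (minutes); positive means late."""
    return [real - sched for sched, real in records]


def ccdf(values):
    """Complementary cumulative distribution P(X >= x) at each distinct x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))
    return points


# Toy (scheduled, real) departure times in minutes after midnight.
records = [(600, 600), (600, 610), (600, 590), (600, 720)]
d = flight_delays(records)      # [0, 10, -10, 120]
late = [x for x in d if x > 0]
print(len(late) / len(d))       # → 0.5
print(ccdf(late))               # → [(10, 1.0), (120, 0.5)]
```

On real data, plotting the CCDF on logarithmic axes is a common way to expose a broad tail, since rare hours-long delays remain visible next to the many short ones.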

Figure 1

Characterization of flight delays in the U.S. during 2010.

(A) Distribution of the delay per flight for arrivals and departures. (B) Distribution of departure delays separating the flights according to the season: Summer and winter. (C) Delay distribution for flights departing from Atlanta Hartsfield-Jackson (ATL), New York John F. Kennedy (JFK) and Honolulu (HNL) airports.

The effect of the buffer time at airports in absorbing delays can be measured using the Turn Around Time (TAT), the time an aircraft spends on the ground from its arrival at the gate to its departure from it. This measure is associated with airport operational efficiency and is used to improve the planning of flight connectivity and the stability of aircraft rotation sequences34. We denote by ΔTAT the difference between the scheduled and the real TAT. A negative value of ΔTAT means that an aircraft stayed at the gate longer than expected, so fresh delay was introduced. Conversely, a positive ΔTAT shows that the operation was quicker than scheduled and that part of the delay was recovered. In Figure 2A, we depict ΔTAT for each flight during one day at the busiest airport of the network, Hartsfield-Jackson in Atlanta (ATL). That day, March 12, happened to be one of the worst in the database in terms of average flight delay. The abundance of positive values of ΔTAT is evidence of the capacity of the airport to recover delays. The distributions of ΔTAT for all operations in 2010, separated into positive and negative values, are displayed in Figure 2B. These distributions, like those of the delays, show long tails, which is a marker of the complex nature of delay spreading mechanisms.
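The ΔTAT definition can be made concrete with a short sketch; the function name is illustrative. Writing ΔTAT = (scheduled departure − scheduled arrival) − (real departure − real arrival) and regrouping the terms gives ΔTAT = arrival delay − departure delay. A positive value therefore means that the aircraft left the gate less late than it arrived.

```python
def delta_tat(sched_arr, sched_dep, real_arr, real_dep):
    """ΔTAT = scheduled TAT - real TAT (minutes).

    Negative: the aircraft stayed longer than planned (fresh delay).
    Positive: the turnaround was faster than planned (delay recovered).
    Equivalently, ΔTAT = arrival delay - departure delay.
    """
    return (sched_dep - sched_arr) - (real_dep - real_arr)


print(delta_tat(600, 645, 630, 660))  # → 15  (30 min late in, 15 min late out)
print(delta_tat(600, 645, 600, 670))  # → -25 (on time in, 25 min late out)
```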

Figure 2

In (A), difference between the scheduled and real Turn Around Time (ΔTAT) for operations at Atlanta airport (ATL) on March 12. In (B), distribution of ΔTAT per flight, separating positive and negative contributions.

So far the focus has been on individual flight delays. We now define a metric of congestion for the full network. To do so, the average delay of all delayed flights during the year, 29 minutes, is taken as the baseline. An airport is considered congested whenever the average delay of all its departing flights over a certain period of time exceeds 29 minutes. Additionally, a daily airport network is built from the flights of the day to assess whether congested airports are organized in connected clusters. Note that belonging to the same cluster is a measure of spatiotemporal correlation of congestion, not necessarily a sign of a cause-effect relation. We apply the same metric to the simulations in order to compare empirical and model results. Maps with the congested airports and the connections between them are shown for different days of the database in Figures 3A–3C. As can be seen, the scenario changes dramatically from day to day: on some days a large cluster emerges covering 1/3 of all airports, while on others only one or two airports cluster together. This is confirmed when the size of the largest connected cluster is plotted as a function of the day in Figure 3D. Strong variability is thus the main characteristic of the dynamics of the size of the largest congested cluster. The cumulative distribution of the cluster size is displayed in Figure 3E and appears compatible with an exponential decay: even if the fluctuations are large, there exists a well-defined characteristic cluster size. Given the cluster variability, an important question is whether the same airports are recurrently congested. In panel 3F, we calculate the Jaccard index to compare the sets of airports in the largest cluster on consecutive days or for the top 20 worst and best days. This index is 1 if the clusters are identical and 0 if they have no airports in common.
Interestingly, the index is relatively low for days with large clusters, which implies that the same airports are not consistently part of the cluster.
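The congestion metric can be sketched as follows. Airports are flagged when their mean departure delay strictly exceeds the 29-minute baseline. The largest cluster is taken as the largest connected component of the daily network restricted to congested airports, and two clusters are compared with the Jaccard index |A ∩ B| / |A ∪ B|. Treating flights as undirected links and scoring two empty sets as identical are assumptions of this sketch. The function names are hypothetical and the data are toy values.

```python
from collections import defaultdict

THRESHOLD_MIN = 29.0  # yearly mean delay of delayed flights, from the text


def congested_airports(departure_delays):
    """Airports whose mean departure delay strictly exceeds the threshold.

    departure_delays: dict airport -> list of departure delays (minutes).
    """
    return {a for a, ds in departure_delays.items()
            if ds and sum(ds) / len(ds) > THRESHOLD_MIN}


def largest_cluster(congested, flights):
    """Largest connected component among congested airports.

    flights: iterable of (origin, destination) pairs for the day,
    treated here as undirected links.
    """
    adj = defaultdict(set)
    for u, v in flights:
        if u in congested and v in congested:
            adj[u].add(v)
            adj[v].add(u)
    best, seen = set(), set()
    for start in congested:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(component) > len(best):
            best = component
    return best


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


delays = {"ATL": [40, 35], "JFK": [50], "ORD": [10, 20], "BOS": [60], "HNL": [45]}
day = [("ATL", "JFK"), ("JFK", "BOS"), ("ATL", "ORD"), ("HNL", "LAX")]
cluster = largest_cluster(congested_airports(delays), day)
print(sorted(cluster))                          # → ['ATL', 'BOS', 'JFK']
print(jaccard(cluster, {"JFK", "BOS", "ORD"}))  # → 0.5
```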

Figure 3

Clusters of congested airports.

Maps of the congested airports showing also connections between them for days with: (A) low, (B) intermediate and (C) high level of congestion. The airport color codes are: red, congested airport belonging to the largest cluster; orange, congested airport not belonging to the largest cluster; green, airport not congested. Links connecting airports in the largest cluster are in red. In (D) daily size of the largest cluster as a function of time. In (E) complementary cumulative distribution of the size of the largest cluster (semi-logarithmic scale). And in (F) Jaccard index comparing airports belonging to the largest clusters in consecutive days or consecutive ranking positions according to the top 20 days with largest or lowest average delay. The maps in the upper panels were generated using Basemap in Python. The geographical position data is provided by the open source Geometry Engine GEOS (http://trac.osgeo.org/geos, date of access: Jan 14, 2013).

To compare empirical results and model predictions regarding the evolution of the cluster of congested airports, we run the model with the airport capacity parameter fixed at β = 1 and fit the flight connectivity factor α to obtain a maximum cluster size similar to the one observed in the data. By fixing β = 1, we assume the same airport capacity as originally scheduled. The hour-by-hour evolution of the congested cluster size is shown in Figure 4 for March 12 and April 19. Similar plots for other days of the year are included in Section 3 of the Supplementary Information. Note that the fit of α is essential to reproduce the maximum of these curves; however, the entire evolution of the cluster size predicted by the model follows that of the real data strikingly well. In fact, almost 60% of the airports in the real cluster are correctly identified by the model, in the sense that they rank at the top when airports are ordered by their probability of congestion. Furthermore, with α fixed and no per-day fitting, the model can predict with 66% accuracy whether or not a day will develop a large congested cluster (see Supplementary Information, Section 3 for further details). The model also allows us to explore the contributions of the three main ingredients (plane rotation, flight connectivity and airport congestion) to delay propagation. From Figures 4B–D, we conclude that flight connectivity is the most important factor. One may still wonder whether the picture changes when the capacity of the airports is modified. In fact, the model exhibits weak sensitivity to variations in the β coefficient, as shown in Figure S13 of the Supplementary Information. Slightly increasing airport capacity does not ease the propagation of delays, since the main cause of the spreading, flight connections, is independent of it.
Conversely, a very strong decrease in airport capacity, around 50%, is needed to trigger new primary delays that later spread in a cascading effect. This might be the case under generalized severe weather conditions or labor conflicts.
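The text does not spell out how "top ranking" is defined when airports are ordered by their probability of congestion. One plausible reading, assumed here and not taken from the Supplementary Information, estimates each airport's probability as the fraction of simulation runs in which it belongs to the largest cluster. The k most probable airports are kept, with k equal to the size of the real cluster, and the sketch reports the share of the real cluster found among them. The function names and data are hypothetical.

```python
from collections import Counter


def congestion_probability(runs):
    """Fraction of simulation runs in which each airport is in the largest cluster."""
    counts = Counter(a for run in runs for a in run)
    return {a: c / len(runs) for a, c in counts.items()}


def fraction_identified(real_cluster, runs):
    """Share of the real cluster found among the k most probable airports,
    with k = size of the real cluster (assumed definition of 'top ranking')."""
    k = len(real_cluster)
    if k == 0:
        return 0.0
    prob = congestion_probability(runs)
    ranked = sorted(prob, key=lambda a: (-prob[a], a))  # ties broken alphabetically
    return len(set(ranked[:k]) & real_cluster) / k


runs = [{"ATL", "ORD", "DFW"}, {"ATL", "ORD"}, {"ATL", "DEN"}]
print(round(fraction_identified({"ATL", "ORD", "LGA"}, runs), 3))  # → 0.667
```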

Figure 4

Comparison between model and reality.

Evolution of the largest cluster per hour: (A) for the full model, (B) for the model with plane rotations only, (C) with plane rotations and passenger connections only, and (D) with plane rotations and airport congestion only. The selected days are April 19, the day with the lowest delay, and March 12, the day with the second largest delay.

The initial delays affect the outcome of the model. In the results of Figure 4, the primary delays for each aircraft are taken from the data as initial conditions for the model. By introducing different initial conditions, we can assess the resilience of a day's schedule to an increase in unexpected incidents. This question is explored in Figure 5, where a fraction of randomly selected flights are delayed. The size of the largest cluster is estimated as a function of the fraction of delayed flights and of the intensity of the initial delays. For simplicity, we set all the initial delays in the simulation equal to a fixed value (the delay intensity in Figure 5). The results are displayed for the schedules of two days, April 19 and March 12, which respectively show a very small and a very large cluster in the real data. In particular, the average flight delay on March 12 was the second largest in 2010. The congestion on the worst day of the year, October 27, can be attributed to extreme meteorological conditions37,38, while on March 12 no major external event was reported. Therefore, the network-wide propagation of delays on that day was likely caused and driven by internal mechanisms of the system. Comparing the curves for March 12 and April 19 in Figure 5, one notices that the surface representing the largest cluster size for March 12 is displaced toward smaller values of the initial delay intensity and of the fraction of flights with primary delay. This shows a higher susceptibility of that day's schedule to disruptive perturbations. Another interesting feature of Figure 5 is that, given enough primary delays, the curves show a non-negligible risk of systemic failure regardless of the schedule. The curves for different values of α also confirm the relevance of connections and crew rotations for the spreading of delays.
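The initial conditions used in this resilience test can be generated with a short sketch: a given fraction of randomly chosen flights receives the same fixed delay (the delay intensity), and every other flight starts on time. The function name is hypothetical. How the resulting dictionary would be fed to the simulation is not specified here, so the sweep over fractions and intensities is left to the reader's model code.

```python
import random


def random_initial_delays(flight_ids, fraction, intensity, seed=None):
    """Assign the same primary delay `intensity` (minutes) to a random
    `fraction` of the flights; all other flights start on time.

    flight_ids must be a sequence (e.g. a list) so it can be sampled.
    """
    rng = random.Random(seed)
    n = round(fraction * len(flight_ids))
    chosen = set(rng.sample(flight_ids, n))
    return {f: (intensity if f in chosen else 0) for f in flight_ids}


flights = [f"F{i}" for i in range(1000)]
init = random_initial_delays(flights, fraction=0.1, intensity=60, seed=42)
print(sum(1 for v in init.values() if v > 0))  # → 100
```

Repeating this for a grid of `fraction` and `intensity` values, with several seeds per grid point, yields the kind of surface shown in Figure 5.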

Figure 5

Assessment of the schedule resilience to develop large clusters.

In the plots, the size of the largest congested cluster is displayed as a function of the fraction of initially delayed flights and of the intensity of the initial delays, for a congested day, March 12, in (B) and (D), and for an uncongested day, April 19, in (A) and (C), for two values of the flight connectivity factor α. A fixed initial delay is assigned to randomly chosen flights.

The primary flight delays in a day of real operations are not necessarily located at random in the network. When their causes are bad weather or technical or labor issues, they tend to concentrate in a few airports. In Figure 6, this issue is explored by comparing the intra-day evolution of the cumulative size of the largest congested cluster when the initial delays are introduced in the model in two different ways. The first uses the primary delays given in the database. The second randomly shuffles the flights affected by the primary delays: the values of the real delays in the database are maintained, but they are assigned to flights selected at random. Comparing the curves for the two cases with the real data shows that random perturbations are far more effective at collapsing the system. While airports in general have some capacity to recover delays, a random selection of delayed flights affects a larger number of them and, in addition, places a heavier burden on smaller airports, which have less capacity to react. This result shows that the method followed for schedule evaluation in Figure 5 is conservative, in the sense that it considers the schedule under an unfavorable scenario for the distribution of primary delays.
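The shuffling procedure can be sketched as a random permutation of the observed primary-delay values over all flights of the day. The set of delay values is preserved exactly, while the flights that carry them are chosen at random. This is an illustration under that reading of the text, with a hypothetical function name and toy data.

```python
import random


def shuffle_primary_delays(primary, seed=None):
    """Reassign the observed primary delays to randomly chosen flights.

    primary: dict flight_id -> primary delay (minutes) from the data.
    The multiset of delay values is kept; only their flights change.
    """
    rng = random.Random(seed)
    flights = list(primary)
    values = list(primary.values())
    rng.shuffle(values)  # random permutation of the delay values
    return dict(zip(flights, values))


primary = {"F1": 0, "F2": 45, "F3": 0, "F4": 120, "F5": 0}
shuffled = shuffle_primary_delays(primary, seed=7)
print(sorted(shuffled.values()) == sorted(primary.values()))  # → True
```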

Figure 6

Time evolution of the cumulative size of the largest congested cluster for different initial delays of the flights: assigned as found in data or to randomly selected flights but keeping the same values as in the data.

Discussion

In summary, we have analyzed the spreading of delays in an air traffic network. Our results focus on the US airport network in 2010, but the concepts and techniques employed can be easily extended to the analysis of the performance of a generic transport system. We introduce a measure of the network-wide extent of delays by defining when an airport is considered congested and studying how congested airports form connected clusters in the network. In the data, the size of the largest congested cluster displays high variability from one day to the next. This feature is due to the restart that the system undergoes at the end of each day and points toward the relevance of the daily schedule in defining delay propagation patterns. In addition, we introduce a data-driven model able to reproduce the delay evolution observed in the data. The model includes three main mechanisms for spreading delays: plane rotation, flight connections of either passengers or crews, and airport congestion. The last two processes can be modulated at will to understand the role each of them plays in delay propagation. Our simulations show that passenger and crew connections are the most effective single mechanism for inducing network congestion. We show how the model can be used to assess the ability of a daily schedule to cope with an increase in the number of disruptive events and to study the relevance of primary delay localization for the evolution of congestion in the network. Furthermore, the model offers the possibility of evaluating the effects of interventions in the system before their real implementation.

Flight delays represent failures to meet constraints imposed by a daily schedule. Their propagation in the network is a paradigmatic example of the way in which a distributed transport system moves toward collapse. The framework developed in this work can thus be easily extended to systems whose dynamics are regulated by predefined schedules. Its translation to other airport networks is, of course, straightforward, and even though the modeling of other transportation systems may require some particular details, the metrics defined to measure network-wide congestion based on clustering are universally applicable.