Introduction

We seldom hear of a large airspace shutting down. One of the last known examples was triggered by the eruption of the Icelandic volcano, Eyjafjallajökull, in 2010, that led to the cancellation of at least 60% of daily European flights and lasted for five days1. Airspace disruptions of this type affect both, global economic activity and daily life of many people. According to the International Civil Aviation Organization (ICAO), in 2011, 2.9 billion people used the world airline network to realize business and tourism2. Understanding the dynamics and resilience of such a network to failures is a question of paramount relevance. However, this is not a trivial task. The World Airline Network (WAN) was not planned to be resilient at a global scale in the first place. Instead, it is a network designed to cope up with several economic, political and geographical interests. The resilience of such a network is therefore delicate to quantify and may give us a false sense of global connectivity.

Algorithms for analysis and design of complex networks have enabled us to quantify complexity and understand the rationale behind their structure and self-organization3,4,5,6,7,8. Exemplarily, studies on the resilience of various infrastructures, from water transport to the Internet, have provided insights into the evolution and resilience of such networks9,10,11,12,13. In particular, important work has been done in the domain of air transport networks carefully studying the structure14,15 including extraction of its multilevel modular structure16. Guimerà et al.17 analyzed the heterogeneous connection patterns among nodes with different fraction of connections within and outside of their communities that give rise to the dynamics within the air transportation network and other infrastructure networks. In a more recent approach, Cardillo et al. have studied the emerging features of the WAN as a result of the dynamics of cooperation between nodes in the network18,19. However, a worldwide view of the WAN is not comprehensive without exploring the aftermath of failures in light of such an organization.

Here, we extract the non-communal hierarchical structure of the WAN, analyze this network through a disruption approach and develop weighted measures to understand its fragility. We compare these measures to see how the topology of the network complements the empirical evidence and we reveal a completely new picture of the world airline network. We find that besides the strongly connected core, many important hubs are in fact at the centers of star-like structures which are at the “periphery”, or areas of low economic growth. Upon removing these hubs, the entire star loses its connection to the rest of the network. We show that this mechanism is responsible for the vulnerability of the network and indicate this by identifying different regimes of clustering within the network.

Results

Our analysis involves a dataset of the WAN, freely accessible at Openflights20 and passengers serviced at each airport during the year 2011. The WAN comprises N = 3237 nodes as airports and L = 18125 links as direct connections between any pair of airports. An important characterization of a network is its degree distribution which gives us the probability, P(k), of an airport having k connections. The WAN data reveals a scale-free behavior with an exponential cutoff,

with an exponent γ = 1.5 ± 0.1 (see Supporting Information ). The exponential function truncates the distribution around kx ≈ 180, which we confirm by analyzing the tail of the distribution in the cumulative complementary degree distribution. In reality, nodes of a physical infrastructure network always depict a truncated distribution as each node can only sustain up to a certain number of connections.

Each connection might offer multiple alternative flights accounting for its weight. The International Air Transport Association (IATA) assigns a distinct code to these alternatives. The average weighted degree, i.e. the average number of different flights offered from any airport, of the network is . Frankfurt airport that falls within the core has the maximum number of flights, 498 and the maximum number of connections, 255. Whereas, St. Petersburg airport, Tampa Bay, Florida, a star-like peripheral hub, has only 24 connections with 24 flights. The average path length, measured as the average number of minimum connections required from any airport to any other airport, is < l > = 4.05. The largest number of connections a passenger needs to travel between any pair of airports is 12. On average, 33% of the routes can be covered with at most three connections.

The structure of this network is naturally divided into continents. North America has the largest number of airports followed by Asia and Europe. The classical approach to shed light on the structure of networks is through community detection. We have measured the size of closely formed communities based on the number of flights that exist for a connection between any two airports. To do this, we have used the definition of modularity as introduced by Newman et al.21,22 using the algorithm developed by Blondel et al.23. Modularity of a partition is a value that measures the density of links within a community compared to the links that are holding the communities together. This analysis has revealed a total of 20 well connected communities, identifying economically agglomerated regions of the world, such as the Middle-East, South-East Asia, Alaska and Oceania. Our results are in agreement with the work of Guimerà et al.15 (see Supporting Information ). This confirms the strong political, geographical and social influence in the development of the air transport network. Nevertheless, the traditional analysis of the network structure fails to uncover the hierarchical structure of the WAN. Sales-Pardo et al.16 have extracted hierarchies within communities for various complex networks, including the WAN, wherein the modules of these communities correspond to different levels in the hierarchy. Their approach stems from community detection algorithms that form the basis for identifying cluster of nodes that have more internal links than external. We are interested in a network-wide extraction of hierarchies as a step towards objectively analyzing the vulnerability of the entire system.

We have identified airports and also commercial, cargo and private airstrips that may have been used for a flight recorded by IATA. This adds more weight to our analysis as the world is continuously evolving socially and politically and any tie that may have been formed due to a commercial interest may give us insight into the economic connectivity of specific places around the globe. For instance, Alaska has many airports mainly used to serve industries scattered around towns. Our analysis reveals that these airports are completely cut off from the contiguous United States except through major flights to and from Anchorage. This reveals a star-like structure completely different from the country-wide network inside, for example, the United States of America, Canada or Mexico.

The eclectic mix of star-like structures and a strongly connected core emerges as an interesting global hierarchical structure. In Fig. 1 we illustrate three distinct layers of the network - Core, Bridge and Periphery - based on the “tcore decomposition” of the network. If a connection fails, passengers are typically rerouted through another airport to their destination. This formation is called a triangle. To extract the aforementioned structure, we propose a decomposition method based on these triangles. Similar to the kcore method24, first we remove all nodes that are not part of any triangle. These nodes form the network periphery (briefly following the classification of Guimerà et al.17), as upon removing them a lot of nodes get completely disconnected from the main network. They also form the 0 – layer and what remains falls within the 1 – core with nodes being part of at least one triangle. In the next iteration, all nodes with at most one triangle are removed from the network together with their edges and they form the 1 – layer. As we continue to remove nodes that are part of 1, 2, 3, …t triangles, we uncover the bridge of the network with nodes laying at different tlayers. Note that removal of a node with t or fewer triangles is done recursively. If removal exposes a new node with now less than or equal to t triangles, it is removed in the current iteration as well. The algorithm stops when each node has been assigned a tlayer. The layer that remains at the end is the core of the network (see Supporting Video ). Each airport in the core is part of at least 387 triangles. For a list of these airports refer to Supporting Information . This indicates that if we remove a connection within the core, there will be numerous other ways to get to the destination. A random scale-free network is vulnerable to intentional attack on hubs and breaks down rapidly3,25,29,30,31,32. In the WAN, this is not the case. Removal of the core, that undeniably consists of many hubs, leads to a minor degradation in the connectivity (the average path length may increase but the world remains connected). In what follows, we focus our attention to the properties of the rest of the network.

Figure 1
figure 1

The world airline network divided into three parts.

The bottom layer is the Periphery with airports having a zero clustering coefficient. These airports reveal the peripheral world. The top layer is the Core with airports that form the nucleus of the t-core, t = 387. Nodes in the t-core are part of at least t triangles. This layer shows how well connected some of the major economic hubs of the world are. The intermediate layer is the Bridge with all the remaining airports that act as bridges to connect remote locations to global hubs. All maps are produced using Gephi28.

Connectivity

Upon removal of the core, most part of the network remains connected and only 8.5% of the airports fall out of the connected cluster. For designing a resilient airline network, analyzing its reaction to catastrophes is significant. We develop a model to understand how the global connectivity is affected due to cancellation of flights or shutdown of airports around the globe. To quantify the loss in connectivity, we measure the fraction of airports that are still part of the largest connected component, S(q) and observe it as a function of a) the fraction of airports being shut and b) the fraction of connections getting canceled. Note, that in our case S(0) = 1, as we start with merely one connected cluster.

We analyze the connectivity of the network by sequentially shutting down airports using two different strategies. In the first one, at each step, we remove airports with the highest degree. Figure 2 shows that upon removal of the highest degree airports, the network rapidly disintegrates into many small clusters and the size of the largest connected component drops significantly. When the most connected airports are not functional anymore, the long haul flights that give the network a small-world characteristic15 also break down, explaining the sudden drop in connectivity. These airports naturally fall in the core of the network. By contrast, upon removal of the lowest degree airports, the size of the largest cluster decays linearly. As shown in Fig. 1, this network is hierarchically structured with a well connected core and a tree-like structure at the periphery. No removal of low degree airports can affect the rest of the network because they are in the periphery. Thus, the core holds the network together while the leaves of the tree are pruned. The aforementioned analysis strengthens our argument of airports with no clustering being peripheral in the sense that they are at the extremes of the network.

Figure 2
figure 2

Drop in the size of the largest connected cluster of the WAN against removal of nodes.

High degree refers to a conventional targeted removal strategy wherein each subsequent step corresponds to removing upto a fraction, q, of nodes with the highest degree. In low degree strategy, each subsequent step corresponds to removing upto a fraction, q, of nodes with the lowest degree. The random removal strategy is an average over 500 statistically independent simulations.

Airports shut down only in extreme cases. Subsequently, we focus on the most common scenario where connections are canceled. As we will show, the picture is significantly different when connections fail instead of airports.

Passenger flux

In an effort to remove connections from the network, we need to define the relevance of the connections. Passenger flux seems ideal to describe the relevance of a connection. Data is available for the number of passengers being serviced at each airport in 2011 and we define the relevance of a connection as follows,

where pij is the passenger flux on the connection from i to j. neigh(i) is the set of all nodes directly reachable from i. gives the total number of flights available from airport i and pi is the number of passengers at airport i.

This division may overestimate the flux of passengers on a connection but keeps its relevance intact as the number of passengers taking a connection depends on further available connections from the destination airport. In addition, it is also a simple measure to identify destinations that are popular tourism and business spots. This method does not explicitly take into account the frequency of flights per day between two airports, but that has implicitly been accounted for by considering the number of passengers using that connection. We use the above measure following a basic ideology of the airline network, i.e. a flight is only as important as the number of passengers using it. This provides us with an empirical measure on the network. We constitute another measure based on connectivity to complement the empirical evidence from the data.

Degree of connectivity

The degree of connectivity, cij, characterizes each connection using the topological information about the network. The topological clustering coefficient26 of each node in the network which measures the degree to which neighbors of a node cluster together is defined as follows,

where ki is the number of direct connections of airport i and E, the number of connections that exist between the first neighbors of i. Each connection has an integer weight wij given by the number of flights. We define degree of connectivity as follows,

which is asymmetric in nature, i.e. cij is not necessarily equal to cji. Not all passengers fly back to the source airport or take the same connections on their return journey. If the clustering coefficient is very low, then a connection from this airport with very few flights will be extremely important due to very few alternate paths that passengers can take to their destination. No passenger wants to take five connections to a destination that is at a relatively short distance. Thence, we focus more on a removal strategy based on low degree of connectivity and show how seemingly irrelevant details of a complex network might add to its vulnerability. For an elaborate justification of this concept, we have included an example in Supporting Information .

Connections that serve the least number of passengers or have the lowest degree of connectivity are most often overlooked and likely to be in the periphery. An important long-range connection cannot affect the connectivity of the network because of a high degree of redundancy, for example, cancellation of the longest flight, which is between Newark Liberty International airport in New Jersey, USA and Changi Airport, Singapore, will not restrict the mobility of passengers as more than fifteen one-stop alternatives exist for this route. But many short-range connections within the periphery, such as from Anchorage, Alaska, to Honolulu, Hawaii (an important tourist hub), have only one option and such cancellations can result in stranded passengers since very few passengers would accept to take even two connections that take a much longer detour (about twice the flying time of the direct connection).

Following the above arguments and defining relevance of a connection based on empirical evidence and topological information, we remove a certain fraction of connections. Here, relevance of a connection directly translates into traffic or connectivity. A high traffic removal would affect the busiest connections. Whereas, a low traffic removal would focus on idle connections. Figure 3 shows that the world fragments into different parts upon removing a small fraction (20%) of idle connections while it is almost fully connected upon removing busy connections, using passenger flux as relevance. During an economic crisis or loss suffered by an airline company, the typical flights to be canceled are the ones that carry the smallest number of passengers. Cancellation of such flights, however, causes more damage to the global connectivity than flights that connect hubs. The analysis using degree of connectivity shows similar results. Moreover, the effect of an idle removal is magnified using passenger flux over degree of connectivity. Both these observations give enough topological and empirical evidence for the importance of the peripheral connections in the WAN. The periphery is huge and consists of many airports and connections between these airports. To understand what kind of connections lay in the periphery, we analyze the resilience using the most basic information of the connections - the number of flights associated with it.

Figure 3
figure 3

Drop in the size of the largest cluster of the WAN against removal of links.

Connections are ranked according to the number of passengers using it. In high traffic removal, each subsequent step corresponds to removal of all connections up to a fraction, q, with the highest number of passengers. In low traffic removal, each subsequent step corresponds to removal of all connections up to a fraction, q, with the lowest number of passengers. The random removal strategy is an average over 500 statistically independent simulations and each step removes a fraction, q, of connections chosen at random. After removing 40% of the busy connections, 72% of the network is still connected, shown in the top-right map. The bottom right map shows that after removing the same fraction of idle connections, the world disintegrates completely, revealing the vulnerable nature of the periphery of the network (22% connected). The black nodes are not part of the largest connected cluster. The remaining colors represent different continents and show the nodes that are part of the largest connected cluster. All maps are produced using Gephi28.

Flight model

Airports might offer more than one flight for a direct connection between them. We define for each connection an integer weight, wij, given by the number of alternate flights that are available on that connection. The data set does not give us precise information about the frequency of flights. Each alternate flight has a different IATA code, i.e. they are operated by either different companies or at different times. Passengers have the option to choose among the flights from source to destination based on their preferences (price, time of flight, reliability and quality-of-service, etc).

We study the connectivity of the network upon removing connections following the rank of their weight. We found that the WAN is quite resilient to breakdowns in frequent connections with multiple flights as it has a high degree of redundancy with many paths between any pair of airports with high degree (see Supporting Information ). However, the global connectivity of the network, upon removing rare connections with the least number of flights, is lower than any other removal strategy. Our common sense would mislead us to believe that disruption of peripheral connections would not lead to loss in global connectivity and therefore predestined to face economic cuts in case of a crisis. However, our result shows that the periphery is weakly connected in terms of possibles routes as well as the number of different flights between the same pair of airports and could render a large part of the world inaccessible. Refer to Supporting Information for a visual representation of how global connectivity evolves with the sequence of failures. In order to explain this phenomenon, we study the clustering properties of the WAN in depth.

Clustering

Among other reasons airports form connections based on physical proximity. Airports can be clustered differently. We have identified three regimes of clustering using the physical length of connections.

Firstly, we define the weighted clustering coefficient, Cw(i), as defined by Barrat et al.27

which is a measure of local cohesiveness of neighbors of an airport that takes into account the intensity of the connections given by its euclidean distance, dij, between airports i and j. si is the strength of an airport i defined as . ki is the number of connections from an airport (out-degree). Lastly, aij is either 0 or 1 depending on the absence or presence of a connection between airports i and j, respectively. In this way we consider the total relative weight of the closed triplets of any airport with respect to the strength of the airport. The topological clustering coefficient, Cc, is obtained by simply marking dij = 1 for all connections.

The following regimes of nodes can be distinguished,

  1. 1

    Peripheral: Nodes that have no connections between their neighbors, Cc(i) = Cw(i) = 0.

  2. 2

    If Cc(i) ≥ Cw(i), the interconnected triplets are formed by connections with large distances and hence are global.

  3. 3

    If Cw(i) > Cc(i), the interconnected triplets are formed by connections with short distances and hence are local.

Most airports in the first category have really few connections but some of them are busy airports having a significant number of connections to and from cut-off regions and hence we call them peripheral hubs. For instance, St. Petersburg Airport, Tampa Bay, Florida, is a huge tourist destination consisting of 24 connections and no clustering. In the remaining categories, nodes are distributed equally with low degree airports having a low number of passengers and high degree airports having a higher number of passengers. Examples of airports in the local category include Frankfurt and JFK, New York. Atlanta and Domodedovo, Moscow, fall in the global category. Each continent has at least 40% airports with local clustering pointing to a continental evolution. When the continents are connected to form a world airline network, there is no significant change in the fraction of nodes that form local clusters, indicating that the airline networks typically evolved at the level of continents.

In almost all cases of removal scenarios, airports with zero clustering are the ones that are mainly responsible for a drop in the resilience of the network, followed by locally clustered airports and then the global ones. For instance, Fig. 4 shows the drop in fraction of airports within each cluster regime upon removal of connections. Here, link relevance is defined using passenger flux. All airports that have a zero clustering coefficient are disconnected from the largest connected component first. Since these airports constitute the peripheral hubs of the network, they take down the extremities of the network with them (Fig. 1 shows a large fraction of airports situated in relatively inaccessible areas of the world). This explains a sudden drop in the connectivity of the network.

Figure 4
figure 4

Drop in fraction of airports within a cluster regime.

Fc(q) is the fraction of airports belonging to a cluster regime c after removal of a fraction of connections q. The first frame shows a drop in airports that belong to the largest connected component based on an low traffic removal. The subsequent frames show the same for high traffic removal and an average over 500 random removals. In all cases, peripheral hubs (Cc = Cw = 0) drop out of the largest connected component first.

Discussion

In summary, we found that even though the network has a core resilient structure, which guarantees intercontinental connections, most of the world is accessible through peripheral connections. Clustering plays a crucial role as most airports that have no alternative connections to their destinations are the ones that make the WAN most vulnerable. This is surprising since the traditional strategy of studying the community structure15 (see Supporting Information ) gives no evidence of such a hierarchical ordering in the network. In the past, researchers have shown an abrupt hub-decomposition of random scale-free networks3,25,29,30,31,32. We have uncovered a structure that does not fall into the above category, insofar as there seems to be a hybrid of two disparate network types that are known through literature. The global connections are economically and politically significant with the world moving towards a free society for travel and living, but the local connections serve the population of the region in extending tourism and business.

A possible reason for the existence of a strongly connected core and a weakly tied periphery is the necessity of airline companies to cope with the minimization of flying time and the maximization of profit. A flight is only profitable if there is a minimum number of passengers per flight. Flying time is ideally minimized with a fully connected network with every pair of airports directly connected. Yet, this is only economically reasonable between highly populated areas or regions of intensive economic activity, which are served by the strongly connected core. The scanty number of passengers traveling to and from remote regions only justifies the creation of a star-like network, with a peripheral hub in the center, as we found in the periphery of the WAN.

Future work could account for temporal evolution of the resilience of the network. With an improved model with accurate passenger count on each connection and the frequency of flights, we can extract the community structure based on influx of passengers to obtain an even more realistic assessment of resilience. It would also be interesting to study passenger flows in different locations of the world and reveal the travel patterns toward which the current generation is moving. This could be beneficial for airline companies not only to maximize their profit but also to redesign the network to make it more resilient.

Methods

We use a straightforward framework of resilience analysis, wherein, we remove nodes and links as two separate entities in mutually exclusive simulation environments, respectively. This emulates a sequence of failures in the system. The subsequent step involves studying the breakdown pattern of the giant component against such removals. For each environment we have three different scenarios of removals. We rank all the entities in the network based on their relevance, for example, using node weight or link weight. The first scenario is based on a conventional approach that starts the removal process from the most relevant entities down to the least. The second one consists of removals that follow a counter-intuitive strategy of kicking out the least important entities first. The third and the last scenario, involves a randomized approach for comparison purposes.

All of the above framework uses input data from the following two data sets.

WAN

The flight data (airports, airlines, routes and geo-locations) was obtained from OpenFlights20 as of May, 2013. This data contains some circular connections, i.e. a flight may go from A to B and not return directly to A. Instead, this flight follows a path from A to B to C and back to A. To simplify our analysis, we have made the adjacency matrices symmetric by replicating each unidirectional connection in the opposite direction. This is justified by the fact that only very small airline companies have circular connections and merely in remote parts of the world.

The network is generated using the x (longitude) and y (latitude) coordinates of each airport and embedding them in a two dimensional space using an equirectangular projection of earth. The links are placed between any pair of airports if there exists a direct flight between the two.

Passenger traffic

The passenger data was obtained from The World Bank data set sourced through Civil Aviation Statistics of the World and ICAO staff estimates for the year 2011. This data has passenger count for every country that is registered with scheduled air carriers of that country. Changes in air transport regulations in Europe have made it more difficult to classify traffic as scheduled or unscheduled. Thus recent increases shown for some European countries may be due to changes in the classification of air traffic rather than actual growth. We have divided the passenger data among airports of the countries based on their relative weighted degree using the following equation,

where pi gives the number of passengers serviced at any airport i from country c, pc is the passenger data of the country c and is the number of alternative connections from any airport i, i.e. the weighted degree. This approach gives us an indication of the relevance of connections originating from each airport (see Supporting Information ).