Introduction

Complex networks make it possible to study the underlying interaction structures of dynamical systems for which a detailed description of structure and dynamics may be impossible due to chaotic or otherwise complex behaviour [1,2,3]. In recent years, complex networks have also found fruitful application in climate science, where the reduction of an inaccessibly complex continuous system to a discrete complex network helps to find large-scale interaction structures that cannot be found with conventional methods. Climate networks have provided important insights into various questions in the climate sciences, ranging from the impact of the El Niño Southern Oscillation on global climate [4,5,6] to the dynamics of the Asian monsoons [7,8] and to ocean [9,10] and atmospheric dynamics [11,12]. Usually, these spatially embedded climate networks are constructed from nodes corresponding to (geographical) locations and links corresponding to statistical interdependence between climate time series observed at the locations of the nodes [4,8,13,14,15]. The strength of statistical dependence is often [9,6,10,15] quantified using Pearson correlation [16]. Such statistical interdependences between climatic variables can occur due to a common driver (e.g. solar forcing) or be the result of a physical connection, which can be direct (e.g. ocean flows) or more indirect (e.g. teleconnections [17]). In our model, we focus on direct local and causal connections, where statistical interdependences imply the existence of dynamical exchanges of energy and matter mediated by the underlying flow field. In the real-world application to the Pacific Ocean, we choose a region and time window such that common drivers and indirect connections are unlikely to play a dominant role.

While physical mechanisms were indicated in some studies [11,7,9], the dynamical processes behind the obtained networks and the relationship between the underlying flow's dynamics and the network measures have remained largely unclear. Donges et al. [9] observed in the climate network of global surface air temperature a striking resemblance between line-like structures of high betweenness and the locations of major ocean currents. They hypothesized that this “backbone of the climate network” resulted from atmospheric-oceanic coupling and the fact that “surface ocean currents play a major role in the energy and information transfer in the climate system”. In this letter, we propose a fundamental analytical model of transport mechanisms in synoptic currents, with which we test this hypothesis and bridge the gap in reasoning between the system's dynamics and network results.

Instead of reconstructing the network from observations as in previous climate network studies, we compute the network directly and analytically from the underlying flow field and thus connect flow dynamics to the topology of networks. In spirit, this bottom-up approach from a model to the network is similar to refs [18,19], but applies to very different systems. We provide a general framework that allows the topology of networks to be inferred from the velocity field in the fluid alone. In this way, the network approach can be used to compare the network imprint of the dynamics of the real-world climate system to that of a model system displaying an idealized flow with controlled dynamics.

In climate, advection and diffusion in atmospheric and oceanic flows are amongst the most important mechanisms by which dynamics are mediated. Temperature dispersion in such flows is governed by the advection diffusion equation (ADE). As an abstraction of local random fluctuations in the temperature field on such a flow, we compare the decay of temperature δ-peaks over time throughout the flow. We use them to define a cross-correlation analogue (CCA), given in equation (3), as the continuous normed scalar product between the temperature development of a tracer peak, evaluated at locations x1 and x2. We apply this method to derive networks i) analytically for a homogeneous and a circular flow, ii) from two more complex paradigmatic flows (using an appropriate approximation for the solution of the differential equation) and iii) from observed surface temperature data from the equatorial Pacific (using Pearson correlation) and investigate the relationships between the properties of the velocity field of the underlying flow and the observed network.

Results

Application to paradigmatic flows

We constructed flow networks analytically for homogeneous and circular flows and found that the links are longer in flow direction than perpendicular to it (see the supplementary material (SM)). For more complex flows a direct analytic derivation is no longer possible, so we have to use an approximate solution to the ADE. We assume that these stationary flows vary slowly over space and ignore all derivatives of the velocity field in the following (throughout this paper we use χ = 1). This is necessary to ensure the applicability of the approximation. In d dimensions this gives us the approximated temperature field of equation (1), where χ is the diffusivity of the fluid.

To evaluate the validity of this assumption for a given velocity field $\mathbf{v}(\mathbf{x})$, we compute a diagnostic residual R (equation (2)). R is zero for a perfect solution, which is the case for a homogeneous velocity field. If the maximum of R is small compared to the other terms in equation (2), the approximation is considered to be good. For the velocity functions used in this paper this is indeed the case. The spatial integration for the norm is a simple Gaussian integration (SM).
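The residual check described above can be illustrated numerically. The following is only a minimal one-dimensional sketch under stated assumptions, not the diagnostic used in the paper: it assumes that the approximation amounts to inserting the local velocity v(x) into the Gaussian solution for a uniform flow, and it takes the residual to be the ADE with the velocity-gradient term dropped, evaluated by central finite differences. The function names, the test velocity profiles and the step sizes are hypothetical.

```python
import numpy as np

def approx_temperature(x, t, velocity, x0=0.0, chi=1.0):
    """1D Gaussian ansatz with the local velocity v(x) inserted (assumed form)."""
    v = velocity(x)
    return np.exp(-(x - x0 - v * t) ** 2 / (4.0 * chi * t)) / np.sqrt(4.0 * np.pi * chi * t)

def ade_residual(x, t, velocity, chi=1.0, dx=1e-2, dt=1e-4):
    """Residual dT/dt + v dT/dx - chi d2T/dx2 from central finite differences."""
    T = lambda xx, tt: approx_temperature(xx, tt, velocity, chi=chi)
    dT_dt = (T(x, t + dt) - T(x, t - dt)) / (2.0 * dt)
    dT_dx = (T(x + dx, t) - T(x - dx, t)) / (2.0 * dx)
    d2T_dx2 = (T(x + dx, t) - 2.0 * T(x, t) + T(x - dx, t)) / dx ** 2
    return dT_dt + velocity(x) * dT_dx - chi * d2T_dx2

x = np.linspace(-10.0, 10.0, 401)

# Hypothetical test profiles: a uniform flow and a slowly varying one.
uniform = lambda xx: 0.5 + 0.0 * xx
slowly_varying = lambda xx: 0.5 + 0.05 * np.tanh(xx / 5.0)

res_uniform = np.max(np.abs(ade_residual(x, 2.0, uniform)))
res_varying = np.max(np.abs(ade_residual(x, 2.0, slowly_varying)))
peak = np.max(approx_temperature(x, 2.0, slowly_varying))
print(res_uniform, res_varying, peak)
```

For the uniform profile the residual vanishes up to discretization error, while for the slowly varying profile it is non-zero but remains small compared with the peak temperature, which is the sense in which the approximation is considered good.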

Then the correlation function takes the form (derivation in the Methods section):

where

and $t_l$ is defined in equation (9). We now compute the correlations on a grid and connect any pair of sites whose correlation is larger than α with a “link”. We determine the threshold α such that the link density ρ is constant, $\rho = L_{\mathrm{net}} / L_{\mathrm{full}}$, where $L_{\mathrm{net}}$ denotes the number of links in the flow network and $L_{\mathrm{full}}$ denotes the number of links in the fully connected graph with the same nodes. We choose a value for ρ such that the network has almost no isolated nodes and is sufficiently far from being fully connected. We find this to be the case for a large range of values of ρ and, out of those, we choose ρ = 0.2. Our results are robust for a large range of link densities (see section S3 in the SM).
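Choosing the threshold from a prescribed link density can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name and the random test matrix are hypothetical, and the comparison uses "greater than or equal to" so that exactly the requested number of links is kept when there are no ties.

```python
import numpy as np

def adjacency_for_density(corr, rho):
    """Pick alpha so that a fraction rho of all node pairs is linked.

    corr is assumed to be a symmetric (N, N) correlation matrix.
    """
    n = corr.shape[0]
    upper = np.triu_indices(n, k=1)        # each unordered pair once, no self-pairs
    pair_values = corr[upper]
    l_full = pair_values.size              # N (N - 1) / 2 links in the full graph
    l_net = int(round(rho * l_full))       # target number of links
    if l_net == 0:
        return np.inf, np.zeros((n, n), dtype=int)
    alpha = np.sort(pair_values)[-l_net]   # the l_net-th largest correlation
    adj = (corr >= alpha).astype(int)
    np.fill_diagonal(adj, 0)               # no self-links
    return alpha, adj

# Hypothetical symmetric test matrix standing in for the correlations.
rng = np.random.default_rng(0)
m = rng.random((30, 30))
corr = (m + m.T) / 2.0

alpha, adj = adjacency_for_density(corr, rho=0.2)
density = adj[np.triu_indices(30, k=1)].sum() / (30 * 29 / 2)
print(alpha, density)
```

Fixing ρ rather than α makes networks from different flows comparable, because they then contain the same number of links.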

We compute networks analytically for the homogeneous case (SM) and, using numerical integration, for two basic, low-gradient velocity fields: i) one composed of three narrow parallel flows with alternating directions and ii) one made up of two narrow flows intersecting in the middle. The resulting networks and underlying flows are illustrated in Fig. 1. Note that the image resolution is equal to the grid resolution in all network figures. In areas of the flow with a higher velocity, the resulting networks show a higher density and length of links than in slower regions. We analyze these networks using the network measures degree $k_i$ (equation (14)) and betweenness centrality $b_i$ (equation (15)) in order to find relationships between them and the underlying velocity field. The network measures are given in Figs. 2 and 3.

Figure 1

The correlation network (black) computed from the given velocity field (red arrows) for two flow fields: (a) counter-currents, (b) crossing currents. For better visibility, a low link density of 2 percent was chosen.

The networks display longer links in flow direction and a higher link density in regions with higher velocity.

Figure 2

Flow field and network measures for the counter-currents in Fig. 1(a).

(a) The normed degree relates to (b) the absolute value of the flow's local velocity; (c) the maxima of the normed betweenness are co-located with (d) the maxima of the absolute value of the gradient of the absolute current velocity. See equations (14) and (15) for definitions of the network measures.

Figure 3

Flows and network measures for the crossing currents; see the caption of Fig. 2.

Our main finding is that high absolute velocity coincides with high node degree. For low velocities, degree and flow speed are approximately proportional; for higher speeds the degree saturates due to the finite size of the grid (see Fig. S.2 in SM). High values of shortest path betweenness occur in the transition zones between regions of opposing flow direction (Fig. 2) or between regions of distinctly different flow velocities (Fig. 3). In both cases, the regions of highest betweenness outline the underlying velocity field. The position of the high betweenness zone depends on the value of the threshold (link density): a lower threshold increases the size of the well-connected region and pushes the transition zone further out.

Other network measures such as local clustering coefficient or local assortativity [3] yield structures similar to that of the node degree (results not shown).

Application to ocean data

In the next step, we compute correlation networks from sea surface temperatures (SST) in the tropical Pacific and compare them with the measured velocity field of the ocean currents (see data description in Methods). Flow velocity, gradient and the obtained network measures are given in Fig. 4. To suppress turbulent effects, we use only the longitudinal component of the gradient. As for the paradigmatic flows investigated earlier, we find a reasonable agreement between the absolute values of the velocity field and the degree in the correlation network (Fig. 4 (a) and (b), also Fig. S.3 in SM). Again, the degree is maximal where the current's velocity is maximal, and the betweenness shows large values in regions with large values of the absolute gradient of the longitudinal velocity field (Fig. 4 (c) and (d)), thereby confirming the results obtained for the paradigmatic flows.

Figure 4

Network measures of the correlation network of the equatorial counter-currents from 1997 daily anomaly SST data in comparison with flow velocity and gradient.

The region of highest degrees coincides with the region of highest flow velocity, while the regions of highest betweenness coincide with the highest velocity gradient.

Discussion

In this paper we have established a connection between data networks and the underlying physical system. The approach can easily be generalized beyond 2D static flows and to flow systems outside of climate science, as temperature can be replaced by any quantity described by the heat equation, such as density or chemical concentrations. In multivariate settings, reaction, advection and diffusion processes could be studied simultaneously. Given sufficient computing resources, non-stationary flows could be treated similarly, using a time offset for integration range and peak appearance, as the ADE can still be solved analytically for time-dependent velocity fields. This could give new insights into the dynamics of evolving flows, which would be highly valuable not only in the analysis of changing climates.

The line-like structures in the betweenness fields of global climate networks [9] were previously attributed to “information flow” in underlying ocean currents. We found that regions of high betweenness outline the flow rather than tracing it. Our results therefore suggest a correction to the former interpretation: high betweenness occurs in transition zones between regions of different magnitude or direction of the underlying velocity. This qualitative observation can be seen when comparing the betweenness with the absolute gradient. Physically, this could be due to the fact that advection dominates in fast-flowing regions, which results in a higher parallel but lower perpendicular link density compared to the stagnant case. At the same time, we observe a correlation between regions with a high node degree and high average current velocity. Considering the advective-diffusive nature of these surface currents, a physical explanation could be that a fast flow transports the signal farther.

We find that both the degree and the betweenness increase marginally along the flow direction. This can be understood as follows: signals from the slow-flowing region first travel by diffusion; once they reach the fast region, their main peak travels downstream (the trajectory approximately follows the red arrow in Fig. 5). As a result, points downstream in the fast-flowing area have connections even to points in the slow region upstream of them, which increases degree and betweenness there.

Figure 5

Schematic illustration of flow properties that result in distinctive network properties: while advection dominates the transport of temperature fluctuations in regions of fast propagation, localized diffusion dominates in stagnant regions.

Signals that leave the stagnant area by diffusion through the mixed region are subsequently transmitted along the flow. This leads to the asymmetry seen in the betweenness, where the betweenness values rise in flow direction.

In future research, such idealized case studies may be highly useful to study the influence of spatial embedding and to test hypotheses concerning the dynamics of observed correlation networks. Given sufficiently low-gradient flow data, this method can be used to construct correlation networks from observed oceanic or atmospheric flows.

We have shown how correlation networks can be constructed directly from flow fields and given an example of how to use these networks to interpret network measures. We thereby provide a foundation for climate network analysis and bridge the gap between the dynamics of underlying flows and climate network interpretation.

Methods

Definition of continuous cross-correlation analogue

For the model system we assume stationary two-dimensional flows in a square area of a boundaryless fluid of constant diffusivity χ, described by the velocity field $\mathbf{v}(\mathbf{x})$. The ADE states how the change of temperature over time is governed by the spatial temperature change and the velocity:

$$\frac{\partial T}{\partial t} = \chi \nabla^2 T - \nabla \cdot (\mathbf{v} T) \qquad (4)$$

and is obtained by inserting the advective and diffusive flux

$$\mathbf{j} = \mathbf{v} T - \chi \nabla T \qquad (5)$$

into the sourceless continuity equation for temperature

$$\frac{\partial T}{\partial t} + \nabla \cdot \mathbf{j} = 0. \qquad (6)$$

Here, $T(\mathbf{x}, t)$ is the value of the temperature at position $\mathbf{x}$ at time t. We use a δ-peak as a tracer of the flow, analogous to local temperature fluctuations. It is inserted at an arbitrary point $\mathbf{x}_0$ in the fluid as the initial condition. In other words, we solve the Cauchy problem of equation (4) with the initial condition

$$T(\mathbf{x}, 0) = \delta(\mathbf{x} - \mathbf{x}_0). \qquad (7)$$

Analogous to the commonly used Pearson correlation [16], we define the continuous cross-correlation analogue (CCA) as the normed scalar product of solutions of the Cauchy problem of the ADE at two points $\mathbf{x}_1$ and $\mathbf{x}_2$

The time lag $t_l$ is the difference in travel time of the peak from $\mathbf{x}_0$ to $\mathbf{x}_1$ and to $\mathbf{x}_2$ (the norm is defined in equation (S.1) of the SM),

where the travel time to a location $\mathbf{x}$ is the time when the temperature at $\mathbf{x}$ reaches its maximum, with the initial peak starting at $\mathbf{x}_0$. The scalar product is then defined as the integral over time and peak position $\mathbf{x}_0$. The time integration is analogous to the sum over time steps; the integration over the peak position is an integral over realizations of the peak, corresponding to the stochasticity of the time series, where peaks appear at random in arbitrary places. We therefore define the CCA in this context as:

where $\mathbf{x}_0$ is the position of the peak. The lower limit of the integration, $t_0$, is chosen small but non-zero (here $t_0 < 10^{-2}$), as the correlation function is not defined for t = 0. The upper limit $t_1$ is chosen such that all temperature profiles have decayed to a value very close to zero (here $t_1 = 5000$).

Network construction

We evaluate the CCA on a regular grid between all pairs of grid points. This provides the correlation matrix $C_{ij}$, from which the adjacency matrix A is constructed by choosing a fixed significance threshold α (see the Results section). This can be expressed with the Heaviside θ function and Kronecker δ as

$$A_{ij} = \theta(C_{ij} - \alpha) - \delta_{ij}. \qquad (11)$$

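For any given flow field, we first have to solve the ADE (equation (4)) and use the result to compute the correlation matrix using equation (10). The Cauchy problem of the ADE in d dimensions can be solved as

$$T(\mathbf{x}, t) = (4 \pi \chi t)^{-d/2} \exp\left( -\frac{|\mathbf{x} - \mathbf{x}_0 - \mathbf{v} t|^2}{4 \chi t} \right) \qquad (12)$$

in the homogeneous case with a constant velocity field $\mathbf{v}$ [20]. This solution takes its maximum value at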
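For a uniform velocity the peak solution is the standard Gaussian kernel of the heat equation, shifted by the advected distance. The sketch below assumes this textbook form and computes the time at which the temperature at a fixed location peaks. That time is one possible reading of the maximum referred to above, and is what the Methods use as travel time. It follows from setting the time derivative of ln T to zero, which gives the quadratic $v^2 t^2 + 2 d \chi t - r^2 = 0$ with $r = |\mathbf{x} - \mathbf{x}_0|$. Function names and test values are hypothetical.

```python
import numpy as np

def temperature(x, t, x0, v, chi=1.0):
    """Delta-peak solution of the ADE for a uniform velocity v in d = len(x) dimensions."""
    x, x0, v = (np.asarray(a, dtype=float) for a in (x, x0, v))
    d = x.size
    r2 = np.sum((x - x0 - v * t) ** 2)
    return (4.0 * np.pi * chi * t) ** (-d / 2.0) * np.exp(-r2 / (4.0 * chi * t))

def time_of_maximum(x, x0, v, chi=1.0):
    """Time at which the temperature at the fixed location x is largest."""
    x, x0, v = (np.asarray(a, dtype=float) for a in (x, x0, v))
    d = x.size
    r2 = np.sum((x - x0) ** 2)
    v2 = np.sum(v ** 2)
    if v2 == 0.0:
        return r2 / (2.0 * d * chi)        # pure diffusion limit
    # positive root of v^2 t^2 + 2 d chi t - r^2 = 0
    return (-d * chi + np.sqrt((d * chi) ** 2 + v2 * r2)) / v2

# Hypothetical example: peak released at the origin, observed at (3, 1).
t_star = time_of_maximum([3.0, 1.0], [0.0, 0.0], [0.5, 0.0])
t_diffusive = time_of_maximum([1.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(t_star, t_diffusive)
```

The difference of two such times, for the same release point and two observation points, corresponds to the time lag described in the definition of the CCA.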

Network measures

To analyze the networks, we used the basic network measures degree and betweenness [2], both normalized to account for grid size effects:

The degree $k_i$ of node i of a network with N nodes is given by the number of links attached to it,

$$k_i = \frac{1}{N - 1} \sum_{j} A_{ij}, \qquad (14)$$

and the shortest path betweenness $b_i$ of a node i is defined as the number of all shortest paths that go through it,

$$b_i = \sum_{j \neq k \neq i} \frac{n_{jk}(i)}{n_{jk}}, \qquad (15)$$

where $n_{jk}$ is the number of shortest paths connecting k and j and $n_{jk}(i)$ is the number of those paths that go through i.
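Both measures can be computed directly from the adjacency matrix. The following sketch assumes an undirected, unweighted network; it normalizes the degree by N - 1 (an assumption) and returns the betweenness as a raw count of shortest-path fractions, using Brandes' accumulation scheme rather than any method stated in the paper. Function names and the test graph are hypothetical.

```python
from collections import deque
import numpy as np

def normalized_degree(adj):
    """Fraction of the other N - 1 nodes that each node is linked to."""
    n = adj.shape[0]
    return adj.sum(axis=1) / (n - 1)

def shortest_path_betweenness(adj):
    """Sum over node pairs (j, k) of n_jk(i) / n_jk, via Brandes' algorithm."""
    n = adj.shape[0]
    neighbours = [np.flatnonzero(adj[i]) for i in range(n)]
    b = np.zeros(n)
    for s in range(n):
        sigma = np.zeros(n)          # number of shortest paths from s
        sigma[s] = 1.0
        dist = np.full(n, -1)
        dist[s] = 0
        preds = [[] for _ in range(n)]
        order = []
        queue = deque([s])
        while queue:                 # breadth-first search from s
            u = queue.popleft()
            order.append(u)
            for w in neighbours[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = np.zeros(n)
        for w in reversed(order):    # accumulate dependencies, farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    return b / 2.0                   # each unordered pair was visited from both ends

# Small check on the path graph 0-1-2-3-4.
path = np.zeros((5, 5), dtype=int)
for i in range(4):
    path[i, i + 1] = path[i + 1, i] = 1

k = normalized_degree(path)
b = shortest_path_betweenness(path)
print(k, b)
```

Dividing b by (N - 1)(N - 2) / 2, the number of node pairs that exclude node i, is one common way to obtain a normalized betweenness; whether this matches the normalization used for the figures is not stated in the text.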

Data

The daily anomaly SST data is based on the optimum interpolation data (OI.v2) as provided by NOAA/NCDC [21,22], and the monthly averaged current velocity data was provided by the OSCAR Project Office (Earth and Space Research, Seattle). We used data from the region 120°–160°W, 15°S–15°N and for the time period August 1996 to August 1997. The chosen year is neither an El Niño nor a La Niña year, and the results we present are largely robust against the choice of the particular year. The network is calculated by standard cross-correlation, allowing for a lag of up to one day.
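A lagged cross-correlation of this kind can be sketched as follows, assuming daily time series stored as an array of shape (nodes, days) and taking the largest Pearson coefficient over lags of -1, 0 and +1 day. Whether the maximum or the maximum absolute value was used is not stated, so this choice, the function names and the synthetic test data are assumptions.

```python
import numpy as np

def max_lagged_correlation(a, b, max_lag=1):
    """Largest Pearson correlation of a and b over lags -max_lag..max_lag (in samples)."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            x, y = a[:-lag], b[lag:]     # b lags behind a
        elif lag < 0:
            x, y = a[-lag:], b[:lag]     # a lags behind b
        else:
            x, y = a, b
        best = max(best, np.corrcoef(x, y)[0, 1])
    return best

def correlation_matrix(series, max_lag=1):
    """series has shape (n_nodes, n_times); returns the symmetric matrix C_ij."""
    n = series.shape[0]
    c = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            c[i, j] = c[j, i] = max_lagged_correlation(series[i], series[j], max_lag)
    return c

# Synthetic stand-in for anomaly data: b is a copy of a shifted by one day.
rng = np.random.default_rng(1)
base = rng.standard_normal(366)
a, b = base[:-1], base[1:]

c_lagged = max_lagged_correlation(a, b, max_lag=1)
c_unlagged = max_lagged_correlation(a, b, max_lag=0)
c_matrix = correlation_matrix(np.vstack([a, b, rng.standard_normal(365)]))
print(c_lagged, c_unlagged)
```

The resulting matrix can then be thresholded to a fixed link density in the same way as for the paradigmatic flows.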