A universal approach for drainage basins

Drainage basins are essential to Geohydrology and Biodiversity. Defining those regions in a simple, robust and efficient way is a constant challenge in Earth Science. Here, we introduce a model to delineate multiple drainage basins through an extension of the Invasion Percolation-Based Algorithm (IPBA). In order to prove the potential of our approach, we apply it to real and artificial datasets. We observe that the perimeter and area distributions of basins and anti-basins display long tails extending over several orders of magnitude and following approximately power-law behaviors. Moreover, the exponents of these power laws depend on spatial correlations and are invariant under the landscape orientation, not only for terrestrial, but lunar and martian landscapes. The terrestrial and martian results are statistically identical, which suggests that a hypothetical martian river would present similarity to the terrestrial rivers. Finally, we propose a theoretical value for the Hack’s exponent based on the fractal dimension of watersheds, γ = D/2. We measure γ = 0.54 ± 0.01 for Earth, which is close to our estimation of γ ≈ 0.55. Our study suggests that Hack’s law can have its origin purely in the maximum and minimum lines of the landscapes.

Drainage basins play a fundamental role in the hydrologic cycle, which make them essential to diversity and maintenance of Life on Earth 1 . The longstanding problem of characterising drainage basins has drawn much attention due to its importance in a variety of environmental issues, such as water management [2][3][4] , landslide and flood prevention [5][6][7][8][9] , and aquatic dead zones [10][11][12] . In this context, drainage basins, or simply basins, are all land areas sloping toward a single outlet, e.g. a river mouth or points of higher infiltration or evaporation rates. They are outlined by abstract boundary lines, called topographic divides or watersheds. The concept of watersheds appears in many other seemingly unrelated areas like percolation theory 13,14 , image segmentation and medicine [15][16][17][18] , and even international borders 19,20 .
Watersheds are recognized as fractals in several cases [21][22][23][24] , exhibiting self-similarity and a well defined fractal dimension. There are several objects that share the same fractal dimension of watersheds on uncorrelated random substrates (viz. D ≈ 1.22), such as optimal paths under strong disorder [25][26][27][28] , minimum spanning trees on random networks 29 , backbones of the optimal path crack [30][31][32] , and bridge bonds on ranked surfaces 33 . All these loopless paths belong to the same universality class as watersheds. Furthermore, it is also known that watersheds are Schramm-Loewner evolution curves 34 and that it is possible to define hydrological watersheds 35 , where the infiltration process in the soil is taken into account.
The availability of digital elevation models (DEMs) allowed the development of modern techniques to automatically delineate watersheds. Nowadays, these methods are used in the standard Geographic Information System (GIS) software and are well established as a fundamental tool in Geoscience 36 . The basic idea behind these methods is the calculation of the flow directions, which can be defined in several ways depending on the assumptions on the flow and number of neighbouring cells in the grid [37][38][39][40][41][42][43][44] . Simultaneously to the development of these methods, the concept of watersheds were also introduced in the context of contour delineation 45 and became a widely used tool for image processing. The key difference among these methods is that those adopted in image processing do not use channels to define watershed lines. Instead, they simply search the landscape for the highest lines that divide it. In this context, even the most modern computational models are not able to get all watersheds of the globe at once, they are usually limited to specific regions. As compared to all these methods, methods based (2019) 9:9845 | https://doi.org/10.1038/s41598-019-46165-0 www.nature.com/scientificreports www.nature.com/scientificreports/ on invasion percolation as ours do not need to explore the entire space but only a fractal subset of it. Therefore, our method becomes more and more efficient the larger the landscape.
Here, we propose a simple model to fully delineate multiple drainage basins for any given landscape based on the traditional Invasion Percolation (IP) model 46 , which is known to be a Self-Organized Criticality (SOC) model 47 , i.e. the IP model does not need a tuning parameter to evolve and identify the watershed. Our method allows us to study the morphological segmentation of global maxima and minima in image processing, even if the landscape stands for the gray scale of a brain Magnetic Resonance Imaging (MRI) or the heights of a celestial body with or without river channels. The novelty of our approach is to characterize all basins from a single height dataset through the definition of a reference (sea) level, i.e. our approach is free of parameter tuning.

The Model
In 2009, Fehr et al. [22][23][24] introduced a model, called Invasion Percolation-Based Algorithm (IPBA), in order to extract watersheds from landscapes. The IPBA was proposed for a regular square lattice of size L with fixed boundary conditions in the vertical direction and periodic boundary conditions in the horizontal direction, where the height of each site i was represented by h i . It was also defined that the upper and lower lines of the lattice represent the sinks of two basins, e.g. one at the North (N) and other at the South (S), respectively. In this context, the following rule for the identification of the basins was proposed: For each site i, one applies the IP model, defining that the basin (N or S) to which the site i belongs is the one that the IP invaded cluster reaches first. Thus, all sites of the lattice belong to one of the two basins and the interface line between them defines the watershed. To improve the computational performance of the task of finding the interface line, an efficient sweeping strategy was also introduced: (i) Initially, the sites are chosen along a straight line that connects the sinks. Therefore, when the IP processes from two neighbouring sites evolve to different sinks, a segment of the watershed lies between them.
(ii) From then on, the sites are chosen only in the neighbourhood of the already known watershed segments in order to reveal more segments of the watershed, resulting at the end in the complete watershed. Fehr et al. 22 also showed that the IPBA follows the same dynamics as the Vincent-Soille algorithm 15 . The Vincent-Soille algorithm has a direct interpretation, which is basically a flooding process, although it is computationally inefficient. The main advantage of the IPBA is its computational performance, the IPBA presents a sublinear complexity time since it only explores a fractal subset of space of fractal dimension 91/48 22 and this characteristic allows us to perform a global analysis to the drainage basins.
Our aim is to define a robust mathematical model for the delineation of multiple drainage basins through an extension of the IPBA. Suppose a regular rectangular lattice L x × L y , where the height of each site i is h i , analogous to the original model. We introduce a height threshold h * such that, if h i > h * , then the i th site belongs to a cluster, which we call height cluster, composed by all connected sites with height above that threshold. Otherwise, the i th site does not belong to any cluster. As explained in the following section, we adopted h * = 0 throughout this study, which for Earth corresponds to sea level. For this particular choice, the height clusters define continents and islands on Earth, as shown in Fig. 1A. Here, the sinks S k (k = 1, 2, …, N b ) are all the N b border sites of the height clusters, i.e. the sea shore on Earth. Consequently, we know a priori that they define N b drainage basins separated by several interface lines, but their specific sizes and shapes need to be determined. Similarly to the ideas proposed by Fehr et al. 22 , we define the following rule to identify basins present in the height clusters: The IP model is applied for each site i defining that the basin (S k ) at which the site i belongs is the one that the IP invaded www.nature.com/scientificreports www.nature.com/scientificreports/ cluster reaches first (see Fig. 1A). As depicted in Fig. 1B, the set of interface lines forms the watershed network that separates all basins in the height clusters. We also use a strategy analogous to the original IPBA to improve the performance of finding the watershed network. Here, the sweeping occurs in each basin S k as follows: (i) The sink S k defines the ends (initial and final segments) of its yet not identified watershed. (ii) For each basin, the sweeping occurs only at sites neighboring the already known watershed segments in order to reveal the missing ones. In other words, we scanned the sites along the watershed inner perimeter neighbourhood of each basin. This strategy drastically reduces the number of times that we need to apply the IP algorithm for the identification of the watershed network. (iii) Optionally, a simple burning algorithm can be applied to each basin in order to evaluate its area 47 .
Actually, we consider two versions of our algorithm along the study: A version with traditional periodic boundary conditions in horizontal direction and unconventional periodic boundary conditions in vertical direction for real landscapes, and another version with fixed boundary conditions on both directions for artificial landscapes. The unconventional periodic boundary conditions, adopted for real landscapes, are defined by imposing that each site in the top (bottom) row be neighbour of every other site in the top (bottom) row. These boundary conditions represent a mapping of a sphere into a lattice. They are needed because we performed all measures on the entire globe for real landscapes.
A natural extension of the watershed concept is the definition of its reciprocal line, called anti-watershed. The anti-watersheds are composed by lines of minimal heights, in contrast to the watersheds, defined in terms of lines of maximal heights. Therefore, if the basins can (grossly) be understood as "cavities" in a surface, then the anti-basins can also be understood as "humps" on that same (inverted) surface. We emphasize that anti-watersheds do not necessarily represent river channels, since a river channel (or a channel) is a "clearly defined watercourse ("natural or man-made channel through or along which water may flow") which periodically or continuously contains moving water", or is a "watercourse forming a connecting link between two water bodies", or even is the "deepest portion of a watercourse, in which the main stream flows" 48 , while a anti-watershed line may never contain any flowing water. Given an upside-down landscape, the same approach for watersheds can be considered to define the anti-watersheds. In this case, the watershed network represents the anti-watershed network in the original landscape. Furthermore, we can also define an anti-watershed network within a drainage basin (see Fig. 1C). Here, the lines of minimal heights represent the most deeper rivers and their tributaries in a lot of situations.
We highlight that our model just takes into account the drainage basins that flow out to the oceans, i.e. we are ignoring the endorheic drainage basins (or simply endorheic basins), which are those basins that flow out to any place other than the oceans, e.g. lakes or swamps, their amounts of water being balanced by infiltration or evaporation 49 . Nonetheless, we could also include such basins in our model introducing additional sinks ′ S k (where k = 1, 2, …, N e and N e is the number of endorheic basins), located at the points of higher evaporation rates, for example. In this context, the total number of drainage basins (N b + N e ) would depend on the spatial resolution of the data and the information available about the endorheic basins. Our code is available at https://github.com/ erneson.

Results
We applied the IPBA model to real and artificial landscapes in order to study the statistical properties of the drainage basins on Earth, Moon, and Mars. For real landscapes, we use three different DEMs throughout this study: the General Bathymetric Chart of the Oceans (GEBCO) 50 , the Lunar Orbiter Laser Altimeter (LOLA) 51 , and the Mars Orbiter Laser Altimeter (MOLA) 52 . Such datasets consist of the map of heights for the Earth, Moon and Mars, respectively. Moreover, we obtained the artificial landscapes through the fractional Brownian motion (fBm) 53 . We chose this method because we are interested in generating landscapes with tunable spatial long-range correlations since it is known that such correlations change the statistical properties of watersheds 24 . We show all landscapes in Figs 2A-C and 3A-D. We considered two scenarios for GEBCO, LOLA, and MOLA datasets: the original and the upside-down landscape orientations. We emphasize that we chose the GEBCO dataset to represent the terrestrial landscapes because it includes altimetric and bathymetric heights, which makes it more similar to LOLA and MOLA datasets. In other words, we used the GEBCO dataset in order to make a general and uniform comparison between Earth and two celestial bodies that do not have oceans (Moon and Mars). To perform a global analysis, the resolutions of the real datasets were decreased by a factor of r = 8, where r is the resolution factor. Thus, each tile of 8 × 8 sites was replaced by a single site with a value given by the mean of all 64 original sites. The sizes of the used lattices were 5400 × 2700 for GEBCO (original 43200 × 21600), 5760 × 2880 for LOLA (original 46080 × 23040), and 5760 × 2880 for MOLA (original 46080 × 23040). For the fBm landscapes, we considered only the original orientation due to the natural symmetry of the Gaussian distribution used in the Ffm. In this case, we averaged our simulations over 10 samples of lattices of 4096 × 4096 for the traditional range of the Hurst exponent H (0 ≤ H ≤ 1). The following subsections show the corresponding results.

Real landscapes.
Here, we show the main results of our approach applied to real landscapes. In Fig. 2D,G, we show the results of our model applied to the original and upside-down landscapes of the Earth. In Fig. 2E,H,F and I, we use the height threshold h * = 0 (hypothetical sea level) and perform the same analysis used for Earth's landscape to obtain the basins and anti-basins for the Moon and Mars. We note that the basins and anti-basins look similar in the terrestrial and martian landscapes, while in the lunar landscape, they are affected by the impact basins, i.e. craters originated from the impact of asteroids. This similarity is quantified here through the statistical distributions of perimeters and areas of the basins and anti-basins for Earth, Moon and Mars. We included the lunar analysis as a counterpoint to show that the observed similarity between the terrestrial and Martian results is indeed genuine. The log-log plots of all these distributions shown in Fig. 4A-B clearly indicate the presence of long tails extending over several orders of magnitude. Moreover, these tails approximately follow power-law www.nature.com/scientificreports www.nature.com/scientificreports/ behaviors, P(s) ∝ s α and P(A) ∝ A β , for the perimeters and areas, respectively. By performing Ordinary Least Square (OLS) fits 54 to the corresponding distributions, we obtained estimates for the power-law exponents α and β that are summarized in Table 1. Similar results where found using a Maximum Likelihood Estimator (MLE) 55 , as shown in the Supplementary Information (SI).  www.nature.com/scientificreports www.nature.com/scientificreports/ We found that the terrestrial and martian results are statistically identical, which suggests the surfaces of both planets underwent a similar formation history and that a hypothetical martian river would present some level of similarity to the terrestrial rivers, since both landscapes share the same statistics for watershed (maxima) and anti-watershed (minima) lines. There is already evidence that the Martian channels were formed by a series of fluvial and non-fluvial processes 56 . Furthermore, we found evidence that the obtained perimeter and area exponents are independent of the resolution factor r (See SI). It is known that the effects of finite size are attenuated as the system size increases. We also obtained similar results for other values of h * on Earth, Moon and Mars.
In Fig. 5, we show the Amazon basin and its associated anti-basins defined by our algorithm on GEBCO dataset at the original resolution. In this special case, we are removing all internal sinks from the South American continent, i.e. we are allowing the existence of sites with negative heights within South America. We emphasise that several rivers (the deeper ones) follow the anti-watershed lines. The black lines in Fig. 5 stand for the anti-watershed network, which shows an impressive similarity with the Amazon river network (See SI for further comparisons). This similarity allows us to obtain the length of the longest river in a basin by approximating it by the length of the longest path from the point of minimal height (the mouth of the river) of the largest tree on the anti-watershed network. We defined the height of each point of the anti-watershed lines as the mean of the heights of its neighbouring sites. Therefore, we are ignoring any issues related to the initiation point of river channels 57 .

Cases
Abbreviation α β www.nature.com/scientificreports www.nature.com/scientificreports/ As a perspective for future work, quantitative analyses can be performed comparing the anti-watershed and river networks, e.g. see the proposed method by Grieve et al. 58 .
We also performed the verification of Hack's law 59 for the entire planet. This scaling law establishes the relation between the areas (A) of the basins and the maximum lengths () of their rivers, i.e.
where γ is known as the Hack exponent. In Fig. 6, we show Hack's law for the original orientation of the terrestrial landscape considering only the basins with area greater than A * = 100 km 2 . We chose an area threshold A * to ensure that the analyzed basins were not too much affected by the resolution of the dataset. We found the Hack exponent γ = 0.54 ± 0.01 with coefficient of determination R 2 = 0.87, which is very close to our theoretical value of γ ≈ 0.55 for Earth.
Artificial landscapes. We applied the extension of the IPBA to artificial landscapes. In Fig. 3E   www.nature.com/scientificreports www.nature.com/scientificreports/ and 1.93 and between 2.39 and 1.59, respectively. In the uncorrelated case (H = −1), however, we obtained less than one order of magnitude for  and A precluding the same kind of analysis.
In Fig. 6, we show Hack's law obtained for H = 0.7 (a close value of H is usually obtained for real landscapes 24 ), considering only the basins with area above A * = 1024. Such result led us to the following conjecture: Let the basin area be A, the longest anti-watershed line be , and assuming that the anti-watershed lines are indeed watershed lines of the upside-down landscapes, it is known that ∝  L D , where L is the linear length of the system and D is the fractal dimension of the watershed lines 22 . Since A ∝ L 2 , we have:

Discussion
We proposed a general model to fully delineate multiple drainage basins for any given landscape of heights through an extension of the IPBA. The novelty of our approach is to characterise all basins from a single height dataset through the definition of a reference (sea) level. Such fact allows us to claim that our model is free of parameter tuning. In this way, we are able to delineate the basins through the definition of the watershed network (maximal lines of a landscape) as well as the anti-basins through the definition of the anti-watershed network (minimal lines of a landscape). In order to show that our algorithm was robust, we applied it to real and artificial landscapes. In both cases, we found that the perimeter and area distributions are ruled by power laws with exponents α and β, respectively. It was also shown that the terrestrial and martian results are statistically identical, which suggests that the surfaces of Earth and Moon have undergone similar formation processes and that a hypothetical martian river would present similarity to the terrestrial rivers, since both landscapes share the same statistics for watershed and anti-watershed networks. We also verified that, in the Amazon basin and its associated anti-basins defined by our approach, several rivers (the most deeper ones) rest on anti-watershed lines. Furthermore, we showed that the exponents α and β, for artificial landscapes, decrease systematically with the Hurst exponent H and that they are invariant under the inversion of real landscapes. Finally, we found a theoretical value for the Hack's exponent based on the fractal dimension of the watershed and anti-watershed lines, γ = D/2. We measured γ = 0.521 ± 0.002 for artificial landscapes with H = 0.7 and γ = 0.54 ± 0.01 for Earth, which agree within error bars with our estimation of γ ≈ 0.55 for real cases.

Methods
Real landscapes. We use three different DEM datasets throughout this study. The first dataset is the General Bathymetric Chart of the Oceans (GEBCO) 50 consisting of altimetric and baltimetric heights, i.e. the heights above and below the sea level, around the Earth globe. The resolution of this dataset is 30 arc−seconds (30/3600 decimaldegrees) in both coordinates, equivalent to a square lattice with edge length of 0.926 kilometers (km) at the Equator line. The other two are the Lunar Orbiter Laser Altimeter (LOLA) 51 and Mars Orbiter Laser Altimeter (MOLA) 52 consisting of the map of heights for the Moon and Mars, respectively. The LOLA resolution is about 0.118 km, while MOLA resolution is 0.463 km, both in relation to their corresponding "Equator". We show the three datasets in Fig. 2A-C. In addition, we point out that all datasets are freely available and are friendly readyto-use, i.e. all technical preprocessing steps were already performed by the GEBCO, LOLA, and MOLA research teams. We do not perform any additional preprocessing (such as any hydrologic correction or void filling), except the addition of a tiny noise in the heights (<0.001 m, much less than the precision of all datasets) in order to ensure that all values in real landscapes are different, defining a unique sequence if the height values are sorted.
We also emphasise that all three datasets are available in the Geographic Coordinate System (GCS), more precisely, in latitude and longitude grids in the image format TIFF, i.e. they are mapped on spheres, the terrestrial (with radius R earth = 6378.137 km), the lunar (with radius R moon = 1737.4 km), and the martian (with radius R mars = 3396.19 km). For GEBCO, the reference surface (the zero height) is defined by the terrestrial geoid. The geoid is the natural shape that a static fluid would present due to the gravitational potential of its celestial body 62 . On Earth, the oceans could be considered static and, consequently, they are well approximated by such a surface. For LOLA and MOLA, the geoid concept is generalised by the gravitational equipotential surface with the mean lunar and martian radius at the Equator, respectively, defining hypothetical sea levels 51,52 . We adopt the height threshold h * = 0 for all landscapes in order to make a general comparative analysis.
Here, we perform the calculation of the area of each site by the composition of two spherical triangles (the site areas for artificial landscapes have no unit of measure and are all unitary). The area of a spherical triangle with edges a, b and c is given by 63  www.nature.com/scientificreports www.nature.com/scientificreports/ where s = (a/R k + b/R k + c/R k )/2, s a = s − a/R k , s b = s − b/R k , and s c = s − c/R k . In this formalism, R k is the sphere radius, where, in our case, k = {earth, moon, mars}, and the edge lengths are calculated by the great circle (geodesic) distance between two points i and j on the sphere surface given by the Haversine formula 64  where Δφ = φ j − φ i and Δλ = λ j − λ i . The values of λ i (λ j ) and φ i (φ j ), measured in radians, are the longitude and latitude, respectively, of the point i (j). Therefore, we are able to define the site areas, and, consequently, obtain the total area of a given basin, since each basin is composed by a set of sites.
Artificial landscapes. We obtained artificial landscapes through the fractional Brownian motion (fBm) 53 in order to study the watershed and anti-watershed networks. One of the most established method to generate a fBm is the so-called Fourier filtering method (Ffm) 53 . The basic idea of the Ffm is to define random Fourier coefficients in the reciprocal space, distributed according to the following power-law spectral density: where f i is the frequency of the dimension i, d is the topological dimension, and w is the spectral exponent. Subsequently, the inverse Fourier transform is applied to generate a correlated distribution in the real space. In our case d = 2, the correlated distribution is a landscape. Each landscape is characterised by an exponent H, called Hurst exponent, related to the spectral exponent by w = 2H + d = 2H + 2. Four cases can be distinguished: (i) For H = −1, the uncorrelated landscape (see Fig. 3A). (ii) For 0 < H < 1/2, the landscape has a negative correlation, i.e. the increments are anticorrelated (see Fig. 3B). (iii) For H = 1/2, the landscape is correlated, but the increments are uncorrelated (see Fig. 3C), which is the case of the classical Brownian motion 53 . (iv) Finally, for 1/2 < H < 1, the landscape has a positive correlation, i.e. the increments are correlated (see Fig. 3D).