The Mississippi–Missouri River System (MMRS) is an invaluable resource of great biotic diversity, including freshwater fish. Its vast extent spans diverse habitat types operating under varying environmental conditions (such as climate, hydrological regime, primary productivity and human disturbance); these diverse habitats are connected to each other by one river network. An analysis that adequately captures major spatial biodiversity patterns in such a system is therefore noteworthy.

In recent years, the neutral theory of biodiversity6, with its minimal set of assumptions and parameters, has proven both influential7,8,9,10,11 and controversial12,13,14 as an explanation of biodiversity patterns. However, the theory has been tested mainly with ecosystems in two-dimensional landscapes or a mean-field context, to which spatial aspects contribute only weakly6,7,8,14,15,16,17. Only recently have the contributions of landscape spatial structure18, for example, to biodiversity patterns in river networks1, been investigated. Furthermore, implications of hydrological controls placed by river networks as ecological corridors have recently been explored19,20. Here we analyse a large database of fish diversity in the MMRS to compare empirical biodiversity patterns against those predicted by a neutral metacommunity model (see Methods). The data analysis provides significant insights in its own right, and the comparison with model results allows us to investigate the extent to which a neutral model captures observed patterns and extends inferences from the database.

In the following analysis, the 824 direct tributary areas comprising the MMRS are populated with occurrence data of 433 freshwater fish species from a database compiled by NatureServe21 (see Methods). Here, a direct tributary area (DTA) is a geographical region directly draining to a group of streams (that is, not including areas upstream of it); the DTAs correspond to the United States Geological Survey (USGS) HUC8-scale sub-basins as defined in US National Hydrography Database Plus22 (NHDPlus; see also Methods). Occurrence data and river network structure can be combined and analysed for several biodiversity patterns. We consider three patterns: first, the distribution of local species richness (LSR), or α diversity; second, species occupancies; and third, between-community (β) diversity. LSR is the number of species found in a randomly selected DTA. The occupancy of a species, in this case, is simply the number of DTAs in which that species is reported as present. To characterize β diversity, we consider the overall spatial decay of Jaccard’s similarity index7 (JSI). JSI of any pair of DTAs is defined as $S_{ij}/(S_i + S_j - S_{ij})$, where $S_{ij}$ is the number of species present in both DTAs i and j, and $S_i$ is the total number of species in DTA i. To achieve reliable statistics, we consider only topological distances (see Methods) for which more than 500 DTA pairs exist.
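As an illustration of the three patterns just defined, the following minimal Python sketch computes LSR, occupancy and the mean JSI per topological distance from a presence/absence table. The function names, the `pair_threshold` parameter and the toy data are assumptions made for this example only; they are not taken from the analysis itself.

```python
from itertools import combinations

def local_species_richness(presence):
    """Row sums: number of species present in each DTA."""
    return [sum(row) for row in presence]

def occupancy(presence):
    """Column sums: number of DTAs in which each species is present."""
    return [sum(col) for col in zip(*presence)]

def jaccard(row_i, row_j):
    """JSI = S_ij / (S_i + S_j - S_ij) for two 0/1 presence rows."""
    shared = sum(a * b for a, b in zip(row_i, row_j))
    union = sum(row_i) + sum(row_j) - shared
    return shared / union if union else 0.0

def jsi_by_distance(presence, dist, pair_threshold=500):
    """Mean JSI per topological distance, keeping only distances
    represented by more than `pair_threshold` DTA pairs."""
    groups = {}
    for i, j in combinations(range(len(presence)), 2):
        groups.setdefault(dist[i][j], []).append(jaccard(presence[i], presence[j]))
    return {d: sum(v) / len(v)
            for d, v in sorted(groups.items()) if len(v) > pair_threshold}

# Toy data (hypothetical): 3 DTAs in a chain, 4 species.
presence = [[1, 1, 0, 0],
            [1, 1, 1, 0],
            [1, 0, 0, 1]]
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]

print(local_species_richness(presence))
print(occupancy(presence))
print(jsi_by_distance(presence, dist, pair_threshold=0))
```

With only three DTAs the threshold is set to zero here; for the full data set the default of 500 mirrors the criterion stated in the text.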

The map of LSR is shown in Fig. 1a. The DTA with the maximum LSR is Pickwick Lake (156 species) at the borders of the states of Alabama, Mississippi and Tennessee (NHDPlus sub-basin number 06030005). The map can be divided roughly into the western, species-poor half and the eastern, species-rich half. The sharp decrease in the species richness occurs around the 100° W meridian, which is also known to be the location of sharp gradients of annual precipitation23 and runoff production24,25 (Fig. 1b). Although these gradients partly explain the arid climate and low fish diversity in the western half26,27, we argue that the western DTAs are low in fish diversity both because their climate is dry and because they are upstream portions of the river network (see Supplementary Information). If they were located downstream, they might receive enough water supply and have access to a larger species pool from their wetter upstream sub-networks to maintain high fish-habitat capacities and fish diversity.

Figure 1: Maps of freshwater fish diversity and AARP in the MMRS.

a, Local species richness (LSR), or α diversity, of the freshwater fish in each DTA (that is, at the USGS HUC8 scale; see the text) of the MMRS. b, The AARP of the MMRS.

Figure 2a shows the LSR as a function of the topological distance from the network outlet. The distance zero corresponds to Atchafalaya, Louisiana (NHDPlus sub-basin number 08080101). The LSR profile shows a significant increase in the downstream direction, except at the very end in Louisiana, where we suggest that the freshwater fish-habitat capacities are significantly reduced by salinity, co-occurrence/intrusion by some freshwater-tolerant estuarine or coastal fish species, human disturbance and pollution. The overall downstream increase in richness results from the converging character of the river network28 and is steepened by the dry–wet climatic gradient mentioned above (see Supplementary Information). Figure 2b presents the frequency distribution of LSR, whose two peaks at low and high values reflect the difference between the western and eastern halves of the MMRS.

Figure 2: Patterns of local species richness.

a, LSR profile as a function of the topological distance (see Methods) from the outlet located in Atchafalaya, Louisiana (NHDPlus sub-basin number 08080101). b, Frequency distribution of LSR. The squares (average values) with error bars (ranging from the 25th to the 75th quantile) and bar plots represent the empirical data, and the lines represent the average values of the model results.

The species occupancies are presented in Fig. 3 as a rank–occupancy curve, in which the fish species are ranked by their occupancies. The rank–occupancy curve (akin to the familiar rank–abundance curves6) yields a straight line on a semilogarithmic scale, a pattern reminiscent of the rank–abundance curves predicted by the neutral theory6,9,16. Figure 4 shows that the JSI decreases as the topological distance between DTA pairs under consideration increases, an expected trend for β diversity. However, the JSI does not vanish even for DTA pairs that are very far apart. Such long-distance similarity in species composition is probably maintained by species with extremely large occupancies, for example Ictalurus punctatus (channel catfish), Ameiurus melas (black bullhead) and Ameiurus natalis (yellow bullhead).

Figure 3: Rank–occupancy curve.

The squares (which are densely placed and appear as grey stripes) represent the data and the line represents the model result. Here, the occupancy of a freshwater fish species is simply the number of DTAs in which that species is reported as present. Note that the straight-line character in the semilogarithmic scale is shared by the familiar rank–abundance curves predicted by the neutral theory of biodiversity6,9,16.

Figure 4: The Jaccard’s similarity index (JSI) as a function of topological distance between DTA pairs.

The overall decay of JSI characterizes β diversity7; the squares with error bars represent the average values with the range between the 25th and 75th quantiles of the empirical data, and the line represents the average values of the model results. Note that the JSI does not vanish, even for widely separated DTA pairs.

As alluded to above, the neutral metacommunity model is a promising candidate for modelling the general spatial biodiversity patterns of the MMRS’s freshwater fish. Here we show that by implementing the neutral model in the MMRS and incorporating the effect of average annual runoff production (AARP) on fish-habitat capacities, we can effectively reproduce a wide spectrum of observed biodiversity patterns. For instance, in addition to the general trend and magnitude, the model also captures fine-structured fluctuations of the LSR profile (Fig. 2a). The fits to the LSR frequency distribution and β diversity pattern are also very good (Figs 2b and 4). The straight-line character of the rank–occupancy curves is evident for both the data and the model result (Fig. 3). Simultaneous fits of these diverse patterns (and others, such as species–area relationship) are a very stringent test for a model29, especially a model with only four parameters as in this case (see Supplementary Information). The model also permits additional inferences to be drawn. The parameters corresponding to the best fits imply that the spread of the average fish species is quite symmetrical; that is, significantly biased in neither the upstream nor the downstream direction (wu = 1; see Methods). The model results also suggest that, on average, most fish disperse locally (that is, to nearby DTAs) but a non-negligible fraction travel very long distances (see Supplementary Information).

Given the diverse environmental conditions covered by the MMRS, our demonstration that a simple neutral metacommunity model coupled with an appropriate habitat capacity distribution and dispersal kernel can simultaneously reproduce several major observed biodiversity patterns has far-reaching implications. These results suggest that only parameters characterizing average fish behaviour (as opposed to those characterizing the biological properties of every fish species in the system), together with habitat capacities and the connectivity structure of the network, are needed to obtain reasonably reliable predictions of large-scale biodiversity patterns. The neutral metacommunity model also provides a null model against which more biologically realistic models may be compared, and further developments in our understanding of riverine networks and fish movement will permit continued improvement in the agreement between model and data. Indeed, although this modelling approach has been shown here to be useful for investigating key spatial patterns, it is crucial to recognize that “neutral pattern does not imply neutral process”9. Different approaches will therefore be necessary for predicting transient dynamics of the system or for understanding patterns and dynamics of specific species.

Finally, because mobile fish in a river network differ drastically from sessile trees in a forest, it is remarkable that the neutral theory can reproduce key biodiversity patterns of both sets of organisms quite well. This suggests that patterns predicted by the neutral metacommunity model—with appropriate habitat capacity distribution and dispersal kernel—may be broadly applicable across diverse ecosystems. It also offers a general, parsimonious modelling approach that acts as a coherent framework for studying several large-scale spatial biodiversity patterns simultaneously. This framework permits direct linkages to be made from various environmental changes to biodiversity patterns. For example, changes in precipitation patterns, perhaps as a result of global climate change, can now be mapped to changes in habitat capacities in the model; changes in connectivity among local communities, for example flow rerouting or damming in the case of fish, can be characterized by modifying the dispersal kernel. These linkages in turn enable us to make reliable predictions of a comprehensive set of altered biodiversity patterns, with significant implications for conservation campaigns and large-scale resource management.

Methods Summary

The biogeographical data on fish used in the analysis were obtained from the NatureServe21 database of US freshwater fish distributions, which summarizes museum records, published literature and expert opinion about fish species distribution in the United States, except Alaska, and is tabulated at the USGS HUC8 scale22. Owing to the present lack of availability of data, the Canadian portions of the MMRS are not included in the analysis, but we do not expect this to affect the key results and conclusions reported here. The data were then analysed to produce spatial biodiversity patterns (see Methods).

Our model is of a structured metacommunity type. The neutral theory of biodiversity is implemented in the MMRS, using its network as the structure of the metacommunity. Each DTA is a local community in that metacommunity and has a different fish-habitat capacity, H, defined as the number of ‘fish units’ sustainable by resources in that particular DTA; a fish unit can be thought of as a subpopulation of fish of the same species. H is assumed to be proportional to the product of the DTA watershed area and AARP25, an indicator of the quantity of resources available for fish2,4. The model uses the topological, rather than euclidean, distances between DTAs because they are representative of how far fish travel. The model captures basic ecological processes: birth, death, dispersal, colonization and diversification. The simulations are run until the system reaches a steady state; the biodiversity patterns of interest are then determined and compared with the empirical patterns.

Online Methods

Model simulations

Every DTA is assumed to always be saturated at its capacity; that is, no available resources are left unexploited. At each time step, a fish unit, randomly selected from all fish units in the system, dies and the resources that previously sustained the unit are freed and available for sustaining a new fish unit. With probability ν, the diversification rate, the new unit will represent a new species (the diversification is a rate per birth and is due to speciation, to external introduction of non-native species, or to immigration (and reimmigration) of a new species from outside the MMRS); with probability 1 - ν, the new unit will belong to a species already existing in the system (the MMRS). In the latter case, the probability Pij that an empty unit in DTA i will be colonized by a species from DTA j is determined as follows (including the probability 1 - ν):

$$P_{ij} = (1 - \nu)\,\frac{K_{ij} H_j}{\sum_{k=1}^{N} K_{ik} H_k}$$

where $K_{ij}$ is the dispersal kernel (see below), $H_k$ is the habitat capacity of DTA k, and N is the total number of DTAs (here, N = 824). All the fish units in DTA j have the same probability of colonizing the empty unit in DTA i where the death took place. The reported model results are the average patterns after the system reaches a statistically steady state.
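The update rule described above can be sketched in Python as follows. This is an illustration only, written under two assumptions: the colonization probability is taken to be P_ij = (1 - ν) K_ij H_j / Σ_k K_ik H_k, and the parent unit is drawn from the community as it stood just before the death. All function and variable names are hypothetical.

```python
import random

def colonization_probabilities(i, K, H, nu):
    """P_ij for fixed i: (1 - nu) * K[i][j] * H[j] / sum_k K[i][k] * H[k]."""
    weights = [K[i][j] * H[j] for j in range(len(H))]
    total = sum(weights)
    return [(1.0 - nu) * w / total for w in weights]

def step(communities, K, nu, next_species_id, rng):
    """One death-and-replacement event.

    communities[i] lists the species id of every fish unit in DTA i, so
    len(communities[i]) is the habitat capacity H_i (always saturated).
    Returns the next unused species id.
    """
    H = [len(c) for c in communities]
    # Death: a unit chosen uniformly among all units in the system.
    i = rng.choices(range(len(H)), weights=H)[0]
    slot = rng.randrange(H[i])
    if rng.random() < nu:
        # Diversification: the replacement belongs to a new species.
        communities[i][slot] = next_species_id
        return next_species_id + 1
    # Colonization: source DTA j drawn in proportion to P_ij, then a
    # parent unit drawn uniformly within j (simplification: the parent is
    # drawn from the pre-death composition of the community).
    p = colonization_probabilities(i, K, H, nu)
    j = rng.choices(range(len(H)), weights=p)[0]
    communities[i][slot] = rng.choice(communities[j])
    return next_species_id

# Toy system (hypothetical): two DTAs with capacities 3 and 2.
rng = random.Random(0)
communities = [[0, 0, 0], [1, 1]]
K = [[0.9, 0.1], [0.1, 0.9]]
next_id = 2
for _ in range(100):
    next_id = step(communities, K, 0.05, next_id, rng)
print(communities)
```

Repeating `step` until summary statistics stop drifting gives a rough analogue of the steady state referred to in the text.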

A dispersal kernel determines how the fish units move within the river network. Here, it is assumed to take the form of a combination of back-to-back exponential and Cauchy distributions; note that a combination of several theoretical dispersal kernels has been used to achieve a good representation of real dispersal kernels30 (see also Supplementary Information). The dispersal kernel in this model can be expressed as

$$K_{ij} = C\left[a^{L_{ij}} + \frac{b^{2}}{L_{ij}^{2} + b^{2}}\right]$$

where $K_{ij}$ is the probability that a fish unit produced at DTA j arrives at DTA i after dispersal; C is the normalization constant; $L_{ij}$ is the effective distance, defined as $L_{ij} = N^{D}_{ij} + w_u N^{U}_{ij}$, where $N^{D}_{ij}$ and $N^{U}_{ij}$ are the numbers of downstream and upstream steps comprising the shortest path from DTA j to DTA i; $w_u$ is the weight factor modifying the upstream distance and thereby characterizing dispersal directionality ($w_u > 1$ implies downstream-biased dispersal); and a (less than 1) and b characterize the exponential and Cauchy decays, respectively. Here, C is determined numerically such that, for every DTA j, $\sum_i K_{ij} = 1$; that is, no fish can travel out of the network. At the upstream ends of the system this is obvious; it is also true at the downstream end of the system, namely the outlet to the Gulf of Mexico, a marine body that acts as a barrier to freshwater fish. Finally, the dispersal kernel of every species is assumed to be the same; this is perhaps a strong assumption because fish species obviously differ in their dispersal abilities. However, the ‘functional equivalence’ between species is a key way in which the neutral theory of biodiversity departs from classical ecological models. We assume the species equivalence to study just how good a fit the neutral metacommunity model can produce to our data in the absence of detailed, species-specific information.
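A minimal Python sketch of how such a kernel could be built is given below. It assumes the functional form K_ij = C[a^L_ij + b²/(L_ij² + b²)] and interprets the normalization as one constant per source DTA j, so that each column sums to one; both are assumptions of this example, as are the function name, the parameter values and the three-DTA toy network.

```python
def dispersal_kernel(n_down, n_up, a, b, w_u):
    """Return K, where K[i][j] is the probability that a fish unit
    produced at DTA j arrives at DTA i.

    n_down[i][j] and n_up[i][j] are the numbers of downstream and upstream
    steps on the shortest path from DTA j to DTA i.
    """
    n = len(n_down)
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            L = n_down[i][j] + w_u * n_up[i][j]          # effective distance
            raw[i][j] = a ** L + b ** 2 / (L ** 2 + b ** 2)
    # Normalise per source j so that sum_i K[i][j] = 1
    # (no fish leaves the network).
    col_sums = [sum(raw[i][j] for i in range(n)) for j in range(n)]
    return [[raw[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

# Toy network (hypothetical): DTAs 0 and 1 are headwaters draining into DTA 2.
n_down = [[0, 1, 0],
          [1, 0, 0],
          [1, 1, 0]]
n_up = [[0, 1, 1],
        [1, 0, 1],
        [0, 0, 0]]

K = dispersal_kernel(n_down, n_up, a=0.5, b=1.0, w_u=2.0)
for row in K:
    print([round(x, 3) for x in row])
```

With w_u = 2 in this toy case, a unit produced in headwater 0 is more likely to reach the outlet DTA 2 than a unit produced at the outlet is to reach headwater 0, which is the downstream bias described in the text.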

Average annual runoff production (AARP)

Runoff is the portion of precipitation that is drained by the river network. It depends on precipitation, evapotranspiration and infiltration. The map in Fig. 1b is estimated from the streamflow data of small tributaries collected from about 12,000 gauging stations averaged over the period 1951–80. For details see ref. 25.

Direct tributary area (DTA)

The DTAs in the present analysis correspond to the HUC8-scale sub-basins designated by the USGS. Details of how their boundaries are designated are given in ref. 22.

Habitat capacity, H

Habitat capacity of DTA i, Hi , is determined by

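$$H_i = C_H\,N\,\frac{\mathrm{AARP}_i\,\mathrm{WA}_i}{\sum_{k=1}^{N} \mathrm{AARP}_k\,\mathrm{WA}_k}$$

rounded to the nearest integer. WA denotes watershed area, N (which here is 824) the total number of DTAs, and $C_H$ the estimate (due to rounding) of average habitat capacity in a DTA.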
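As a worked illustration, the sketch below distributes habitat capacity in proportion to the product of AARP and watershed area, scaled so that the mean capacity is approximately C_H. The formula H_i = C_H N AARP_i WA_i / Σ_k AARP_k WA_k is assumed here, and the function name, units and numbers are hypothetical.

```python
def habitat_capacities(aarp, area, c_h):
    """H_i = round(c_h * N * aarp_i * area_i / sum_k(aarp_k * area_k)).

    aarp: average annual runoff production per DTA (e.g. mm per year)
    area: watershed area per DTA (e.g. square km)
    c_h:  target average habitat capacity per DTA, in fish units
    Note: Python's round() resolves exact .5 ties to the nearest even integer.
    """
    weights = [r * w for r, w in zip(aarp, area)]
    total = sum(weights)
    n = len(weights)
    return [round(c_h * n * w / total) for w in weights]

# Toy values (hypothetical): a dry upstream DTA and two wetter ones.
aarp = [100, 400, 500]
area = [1000, 1000, 2000]
H = habitat_capacities(aarp, area, c_h=100)
print(H, sum(H) / len(H))
```

Because of rounding, the realized mean capacity generally differs slightly from `c_h`, which is why the text describes C_H as an estimate of the average.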

Topological distance

The topological distance is a measure of distance along the network. An increment in the topological distance occurs when one travels along the network and crosses from one DTA to another. In the present case, one unit of topological distance corresponds to a distance in the range 100–200 km.
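One way to obtain such distances, sketched here in Python, is a breadth-first search over the DTA adjacency implied by the drainage network. The `downstream` mapping, the function name and the four-DTA toy network are assumptions of this example, not a description of the procedure actually used.

```python
from collections import deque

def topological_distances(downstream, source):
    """Number of DTA-to-DTA crossings from `source` to every other DTA.

    downstream maps each DTA id to the DTA it drains into
    (None for the outlet). Travel is allowed both up and down the network.
    """
    neighbours = {d: set() for d in downstream}
    for d, nxt in downstream.items():
        if nxt is not None:
            neighbours[d].add(nxt)
            neighbours[nxt].add(d)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nb in neighbours[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return dist

# Toy network (hypothetical): A and B drain into C, which drains into outlet D.
downstream = {"A": "C", "B": "C", "C": "D", "D": None}
print(topological_distances(downstream, "D"))
print(topological_distances(downstream, "A"))
```

Running the search once from every DTA would yield a full pairwise matrix with zeros on the diagonal.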

Notes on supplementary data

Two matrices summarizing the data used in the analysis are provided in the Supplementary Information. The first matrix, IndicatorMatrix.txt, reports the occurrence data21 of each of the 433 fish species in each of the 824 DTAs included in the analysis. Its first column lists the identification numbers of the HUC8 sub-basins22 (DTAs in the present analysis). The remaining 433 columns consist exclusively of zeros and ones, representing the absence and presence of each species, respectively. No species names are given; they are not necessary for the analysis. The sum along each row (433 elements) therefore gives the local species richness (LSR) of the corresponding DTA, and the sum along each column (824 elements) gives the occupancy of the corresponding species. The second matrix, TopologicalDistanceMatrix.txt, reports the topological distance between each pair of the 824 DTAs, which we derived from the data available from ref. 22. Its diagonal elements are zeros: each DTA is at zero distance from itself. The rows and columns of this matrix correspond to the DTA (that is, the HUC8 sub-basin) numbers in the first column of IndicatorMatrix.txt. The outlet corresponds to DTA number 08080101, which is the 364th row of IndicatorMatrix.txt. These two matrices can thus be combined to produce the profile of LSR as a function of topological distance from the outlet (Fig. 2a) and the pattern of Jaccard’s similarity index (JSI) as a function of topological distance between DTA pairs (Fig. 4).
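The combination of the two matrices into an LSR profile can be sketched as follows. The example works on small in-memory tables laid out as described above (identifier in the first column, then 0/1 entries); the function name and toy values are hypothetical, and parsing of the supplementary files is left out because their exact delimiters are not stated here.

```python
def lsr_profile(indicator, distances, outlet_index):
    """Mean LSR at each topological distance from the outlet.

    indicator:    rows of [dta_id, presence_1, ..., presence_S]
    distances:    square matrix in the same row order as `indicator`
    outlet_index: zero-based row of the outlet DTA
                  (the 364th row corresponds to index 363)
    """
    groups = {}
    for row, d in zip(indicator, distances[outlet_index]):
        lsr = sum(row[1:])                 # skip the identifier column
        groups.setdefault(d, []).append(lsr)
    return {d: sum(v) / len(v) for d, v in sorted(groups.items())}

# Toy tables (hypothetical ids and values): 3 DTAs, 3 species.
indicator = [["08080101", 1, 1, 0],
             ["00000002", 1, 0, 0],
             ["00000003", 1, 1, 1]]
distances = [[0, 1, 1],
             [1, 0, 2],
             [1, 2, 0]]
print(lsr_profile(indicator, distances, outlet_index=0))
```

Identifiers are kept as strings in this sketch so that leading zeros such as those in 08080101 are preserved.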