Predicting the impact and spread of biological invasions remains a key challenge for environmental management1,2. Characteristics of the receiving environment, native biota and the species themselves have all been implicated as determinants of species establishment, spread and impact3,4,5,6,7,8. Yet there is little agreement on the relative utility or generality of these predictors, despite the fact that they are used to inform our investments in preventing invasion and mitigating impacts. For example, to minimize impact from non-native species should we prioritize approaches that ban importation of species with certain traits, or guard habitats that seem particularly vulnerable?

In assessing invader impact and its key drivers, a quantifiable metric is required. The range size of a non-native species is among the most relevant because when multiplied by a species’ mean density and per capita effects, it determines the species’ total impact9. Range is often the easiest component of impact to measure, probably has the smallest estimation error (on the order of 10%)9 and can display over seven orders of magnitude variation among species. Furthermore, within a species’ native geographic distribution, range is responsive to species traits (body size, niche breadth) and environmental conditions10. For example, among marine invertebrates, species with more mobile or longer-lived dispersal stages have significantly larger native geographic ranges than species with non-feeding, less mobile larvae (e.g.,11,12,13) and body size is also associated with range size within some taxa14,15. Such patterns within native ranges form the basis for using these traits to predict the potential impact and spread of invasions.

Although species traits and environmental conditions are sometimes good predictors of the range sizes of native species, they may not work well in predicting the novel ranges of introduced species, where species have not yet reached geographic equilibrium. If the rate of post-introduction dispersal or population growth is low, time since introduction could have a strong influence on the distribution and abundance of these species16,17,18. Thus, although species’ traits and environmental attributes may ultimately predict the potential range occupied by species, non-native species may not have been present long enough to spread and occupy the total potential range, making the influences and relative importance of environmental and species traits harder to discern. As a result, species may appear to have weak habitat and environmental affinities if other processes, such as dispersal limitation, delay the occupation of suitable habitats in a significant portion of the potential range19. The confounding effects of time could in part be responsible for discord in the science and management communities about the relative importance of different factors driving invasion success20 and thus which management options deserve highest priority.

We investigated the ability of introduced species’ traits, environmental attributes of the recipient environment and time since first introduction globally to explain the non-native geographic ranges of marine invertebrate species. Vectors typically responsible for introducing marine invertebrate species are ships (ballast water, hull fouling), aquaculture imports (primarily oysters and associated hitchhiking species) and live trade (aquarium, live seafood, bait)21. Once in a new range a species can continue to spread secondarily through these same human-mediated vectors, as well as through natural dispersal, including passive transport of dispersive stages in the water column (e.g., larvae) and to a lesser degree through active locomotion (e.g., walking or swimming).

Using the Global Biodiversity Information Facility (GBIF), we developed a database of 138 species of coastal marine invertebrates that are non-native in Australia, New Zealand, or the United States, which are well-studied domains with some of the world’s richest data for marine invasions and environmental variables (Supplementary Table 1). To control for variation in data quality, we included in our database only marine invertebrates non-native to these countries that: 1) are present in benthic habitats; 2) are not restricted to brackish water; 3) are not part of known cryptic species complexes (which can enhance the probability of misidentification); 4) have sufficient life history information to contribute to our analyses; and 5) have known native ranges (i.e., not cryptogenic). For each species we logged all recorded locations in the non-native range, tabulated the length of its non-native distribution separately for each continental coastline and summed these values across coasts to calculate the total non-native range size to be used as our response variable. From the literature we extracted time since the first record of introduction anywhere in the world and life history attributes, including maximum body size, adult mobility (sessile or mobile), adult habitat (epifaunal or infaunal) and larval development type (planktonic or non-planktonic). These variables were selected because they are easy to measure, have widely available information and have previously been shown to either directly or indirectly influence dispersal and/or population growth rate. We extracted physical data on annual and seasonal averages and standard deviations in water temperature, salinity and currents for all the GPS points of occurrence in the non-native range from a 1-degree resolution climatology of oceanographic variables. These physical variables address whether certain abiotic/environmental properties are more associated with invasion, for example, whether warmer or saltier domains are more heavily invaded, i.e., promote broader ranges in invasive species. To aid visualization of our results, for each species we also recorded the latitudinal range it occupied along each coastline.

The sizes of species’ non-native ranges along coastlines varied widely, ranging between 50 km (our minimum resolution) and 23,000 km (Fig. 1). The single most important variable in predicting non-native range size was time since first recorded global introduction. Time explained 20% of variation in range size (Fig. 2) and was present in every top fitting model, hence its relative variable importance (RVI) of 1.0 (Table 1). The best-fitting model also contained maximum body size, habitat type and mean salinity and explained 29% of variation in range size. Among the top models, several other biological and physical variables, especially mean spring water temperature, contributed slightly to the overall fit (Table 1), but little relative to time since invasion (Supplementary Figure 1).

Table 1 Models explaining the coastal range (in km) occupied by non-native marine invertebrates.
Figure 1

Ranges, distribution and invasion age of species in our study.

Waterfall plot of the coastal ranges of non-native marine invertebrate species along the focal coastlines of this study shown as the latitudinal bands of species’ distributions. Some species may appear in more than one panel if present on more than one coast. Blue represents species whose first record of introduction anywhere in the world is before 1954 (i.e. the median time in our dataset, “Old”); red represents species whose first record of introduction anywhere in the world is between 1954–2012 (“Young”). New Zealand plot does not differentiate between east and west coasts. Figure was created in R 3.1.

Figure 2

Relationship between a species’ time since first record of introduction anywhere and the length of coastline it occupies in its non-native range.

Time since invasion is the single most influential variable on range, explaining 20% of the variability. Range = 37.8 × Years since first record of global invasion + 138 (R² = 0.20, P < 0.0001).

Time since first global record of introduction is a powerful explanatory variable for predicting the current extent of coastal non-native invertebrate species’ distributions. The strong, positive slope of the relationship, with the absence of an asymptote, suggests that many invaders are not yet at geographic equilibrium and signals their continuing spread. The slope of the relationship (37.8) indicates that non-native species expand on average ~400 km per decade. Although our results do not expose which mechanistic processes are driving expansion as a function of time, many influential ecological and evolutionary processes that affect spread rates of non-native species are a function of time, such as population growth rates, dispersal rates, competitive exclusion and selection (e.g.,22,23). Therefore, time may serve, at least in part, as a useful proxy variable that integrates and subsumes the influences of many processes that are harder to measure directly.
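The arithmetic behind the per-decade expansion figure can be checked with a short Python sketch. It uses only the slope (37.8 km per year) and intercept (138 km) reported for the regression in Figure 2; the function names are illustrative and not part of the original analysis. Because the fit explains about 20% of the variation, the output is a rough central tendency rather than a forecast for any single species.

```python
# Coefficients as reported for the linear fit in Figure 2.
SLOPE_KM_PER_YEAR = 37.8
INTERCEPT_KM = 138.0


def predicted_range_km(years_since_first_record):
    """Fitted non-native range (km) for a given time since first global record."""
    return SLOPE_KM_PER_YEAR * years_since_first_record + INTERCEPT_KM


def expansion_per_decade_km():
    """Average expansion implied by the slope over ten years."""
    return SLOPE_KM_PER_YEAR * 10


if __name__ == "__main__":
    print(f"Expansion per decade: {expansion_per_decade_km():.0f} km")
    for years in (10, 50, 100):
        print(f"{years:>3} years -> {predicted_range_km(years):.0f} km")
```

The slope alone gives 378 km per decade, which rounds to the ~400 km quoted in the text.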

Our study marks the first instance that the influence of time since global introduction has been considered comprehensively as a predictor of spatial extent of marine invasions and it is striking that effects of time since invasion emerge despite considering taxa from 10 phyla, spanning diverse life histories and ecological characteristics. The median year since the first global record of introduction in our dataset is 1954, reflecting a relatively recent influx of species, especially compared to the median time for plant invasions in terrestrial systems which is often 1900 or earlier (e.g.,16,24,25,26,27). Although introduction history in the ocean seems more recent, this largely reflects a limited historical baseline of species inventories compared to terrestrial biotas28,29—a notion supported by the high number of species designated as cryptogenic in marine environs30. Despite the older introductions, many terrestrial plant studies have similarly shown a significant positive influence of time on range size18,27. Other taxa with more active dispersal may achieve equilibrium distributions quickly, especially when confined to a small region, such as birds in New Zealand, which showed no correlation between invasion time and current range size31.

That non-native species are largely not at equilibrium is one explanation for why other environmental and biological trait variables did not hold much explanatory power. If a species is not occupying its full niche space, areas with suitable combinations of habitat and environmental characteristics may still have an absence of the species due to dispersal limitation. However, failure to detect an effect of invader traits or recipient environment on range size may also result from other explanations. Variables we did not include in our model may have large influences on some species’ ranges, especially biotic interactions such as competition, predation and mutualism. It is difficult, for example, to quantify recipient community diversity, or other proxies for the intensity of biotic interactions at all the invaded locations in our database and poor descriptions of native ranges for many invaders preclude using donor environment characteristics. Both of these aspects have been implicated in predicting invasion success32 and may account for some of the unexplained variance in our models.

Also, it is possible that no single variable is universally important in predicting species’ range sizes33,34 and so the failure of factors to contribute to range prediction occurs because species respond idiosyncratically to these factors. However, we view this as unlikely, given the first order importance of certain variables, such as salinity and temperature, in affecting organisms’ physiology, defining their ecological niches and thus helping to set species biogeographic ranges10,35,36. It also seems unlikely that the lack of environmental correlates of range size stems from a poor characterization of the range due to under-sampling, poor data quality, or slow time to publication. While some under-reporting of occurrence data undoubtedly exists, the geographic region and species selected for analyses are well studied, serving to constrain such biases and we did find a strong correlation of range with time since invasion. If lags exist in the discovery and reporting of invaders from their actual time of invasion, as long as these lags are random or unbiased across species, this effect should not affect the slope of the relationship between time since invasion and range size but would only shift the relationship to the right (lower the intercept of the relationship).

The lack of a standard influence of climate-related variables like temperature in our analysis is noteworthy, but is not inconsistent with reports that warming temperatures have facilitated invasions in some locations37 or resulted in range expansions of native species (e.g.,38,39). Even if a species’ range is expanded in one direction as a result of climate change, the total range size may still be a fraction of the potential range. Furthermore, we might expect climate change to have a minimal effect on total range size because expansions in one direction along a coastline may be balanced by contractions in another. In general, we urge caution in interpreting net climatic effects (or the lack thereof) for invasive species with potential ranges that are currently undersaturated.

In conclusion, the importance of the length of time since first global introduction implies that many invaders are still spreading, which increases the difficulty in detecting key trait and environmental controls of biological invasions from broad scale correlative analyses. Time since first global introduction is an appealing predictive variable because it is discrete and relatively easy to quantify. However, unlike habitat and trait variables, it does not help to build a preventative model of spread since it is a variable that is quantifiable only post-invasion. Because the footprint of existing non-native species is still expanding, any consideration of their potential impact (sensu 9) is likely underestimated. Our findings further suggest that the current best general strategy to preemptively protect areas from invasion is to identify vulnerable areas based on vector inputs as opposed to site and species characteristics.


Choosing species for inclusion

We focused on marine benthic invertebrate species that are non-native to either Australia, New Zealand, or the United States because these countries have the most complete data available on distributional range for non-native species. We accumulated target species by first compiling a comprehensive list of introduced species from previously assembled, publicly available national port surveys and non-native species databases in these countries. For Australia we used the Australian port surveys’ list of known exotic species in Australian waters (summarized in40) and the Australian National Consultative Committee on Introduced Marine Pest Emergencies (CCIMPE) trigger list species. For New Zealand we used the New Zealand Port Biological Baseline Survey list of introduced species. For the United States, sources for our initial species list were the United States Geological Survey Nonindigenous Aquatic Species database, the United States National Exotic Marine and Estuarine Species Information System and the United States NOAA Technical Memorandum NOS NCCOS 77.

We excluded most aquaculture species, as these species have been intentionally introduced and nurtured to thrive in their new environments and often specific steps are taken in association with their introduction to either prevent or promote spread. We also excluded brackish species and species known to comprise a cryptic species complex, i.e., more than one species (e.g. Namanereis littoralis).

Data collection

Literature search: Compilation of species and their traits

First, we checked species taxonomies against the World Register of Marine Species (WoRMS) and acquired the accepted taxonomic name and associated synonyms. Then, for each species, we collected life history trait and invasion history data. Relevant life history traits included species’ temperature and salinity tolerances, depth range, larval duration, development type (planktonic or non-planktonic), maximum body size, habitat use (infaunal or epifaunal) and mobility (mobile or sessile). We also categorized species by phylum. We began our data gathering by mining the above databases, plus previously compiled species fact sheets from Australian and New Zealand governments (NIWA pest sheets from G. Inglis), as well as fact sheets from the Bishop Museum and University of Hawaii. All information from fact sheets was double-checked with the referenced primary sources. We also extracted data from the following non-indigenous species online databases (searching for each species under all its synonyms): 1) the National Exotic Marine and Estuarine Species Information System (NEMESIS), 2) the National Introduced Marine Pest Information System (NIMPIS), 3) the Global Invasive Species Database, 4) the Exotics Guide: Non-native Marine Species of the North American Pacific Coast, 5) the Invasive Species Compendium, 6) the European Network on Invasive Alien Species (NOBANIS), 7) Delivering Alien Invasive Species Inventories for Europe (DAISIE), 8) the Marine Life Information Network (MarLIN) and 9) the United States Geological Survey Nonindigenous Aquatic Species (NAS).

To fill in missing data we then performed targeted literature searches of the gray literature and peer-reviewed journals in Web of Science and Google Scholar. Two life history traits—minimum duration of larval stage and development type—still had data missing after these literature searches. For these species we made reasonable assumptions for some life history traits based on species taxonomic group. For example, in the absence of a source, we assumed colonial ascidians to be lecithotrophic (yolk-feeding), with minimum larval durations of less than 24 hours. We kept data with added assumptions separate from data with concrete sources so that we could ascertain if inclusion of the assumptions differentially affected the model outcome. Several life history traits (e.g., larval temperature and salinity tolerance, larval duration, age and size at maturity, depth range) were still missing from more than 25% of the species and so they were no longer considered in the model construction. The final trait variables that had full representation in the dataset were: maximum body size, adult mobility (sessile or mobile), adult habitat (epifaunal or infaunal) and larval development type (planktonic or non-planktonic). Overall, the final dataset contained 138 species.

From the literature we also extracted for each species its time since the first record of introduction anywhere in the world.

Quantifying distribution data

We used species occurrence data from the Global Biodiversity Information Facility database (GBIF) to quantify species ranges. Specifically, we calculated the total length of coastline occupied by a species in its non-native range.

Ranges were determined for each non-native species by first downloading from GBIF exact latitudinal and longitudinal coordinates for each point of the species’ occurrence and then partitioning non-native from native occurrences using the previously described literature search. We then introduced the geographic coordinates for the non-native occurrences of each species into Google Earth and used the measuring tool to measure ranges (in km), spanning occurrences, along the coastline of each ocean basin in both northern and southern hemispheres. That is, separate range distributions were calculated for the east and west coastlines of the Atlantic and Pacific Oceans in both the Northern and Southern Hemispheres. Where multiple points of occurrence occurred on large islands (e.g. Tasmania), ranges were measured around the coastline in the same manner as continents; occurrences on oceanic archipelagos >1000 km from a continental coastline were treated as a separate distribution and ranges were measured as a straight line through the cluster of occurrences. All single, isolated points of occurrence were given a common distance measurement of 50 km. To prevent overestimation of ranges, if two neighboring points of occurrence on a coastline were >1500 km apart, then the range was not considered continuous through this stretch and the ranges on either side of those points were tabulated as disparate. The distances of ranges for all disparate segments along each coastline (and oceanic islands) were then summed across all coasts to give a total range size in km.
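The tabulation rules above (a 50 km allowance for isolated points, a break in the range wherever neighboring occurrences are more than 1500 km apart, and summation across coasts) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in the study, where ranges were measured by hand in Google Earth. It assumes each occurrence has already been reduced to a hypothetical along-coast position in km, and it treats a cluster of co-located points as a single isolated point.

```python
GAP_LIMIT_KM = 1500.0       # neighboring points farther apart than this split the range
ISOLATED_POINT_KM = 50.0    # common length assigned to a single, isolated occurrence


def split_segments(positions_km):
    """Group along-coast positions into continuous segments using the gap rule."""
    segments = []
    for p in sorted(positions_km):
        if segments and p - segments[-1][-1] <= GAP_LIMIT_KM:
            segments[-1].append(p)
        else:
            segments.append([p])
    return segments


def coastline_range_km(positions_km):
    """Sum the lengths of all disparate segments along one coastline."""
    total = 0.0
    for seg in split_segments(positions_km):
        span = seg[-1] - seg[0]
        # Assumption: a segment with zero span counts as one isolated point.
        total += span if span > 0 else ISOLATED_POINT_KM
    return total


def total_range_km(coastlines):
    """Sum ranges across coastlines; `coastlines` maps a label to positions in km."""
    return sum(coastline_range_km(p) for p in coastlines.values())


if __name__ == "__main__":
    # Hypothetical species: three linked points plus one distant outlier on one
    # coast, and a single record on another coast.
    example = {"coast_a": [0, 400, 900, 3000], "coast_b": [120]}
    print(total_range_km(example))
```

In the hypothetical example, the first coast contributes a 900 km segment plus 50 km for the outlier, and the second coast contributes 50 km.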

Any single, isolated points of occurrence in GBIF that had no corroborating documentation within the literature were ignored. (Such instances were few and their inclusion would have only added 50 km to the total calculated range). We also eliminated coordinates if they clearly exhibited human error, such as points with zero values for both latitude and longitude, or if coordinates were located inland (e.g. as is sometimes the case for museum specimens).

For 25 known non-native species without GBIF occurrences, we assigned coordinates to the range distributions extracted during our extensive literature search. For most of these species, exact latitudinal coordinates were given, but in a handful of cases only general descriptions were given (e.g. “coast of California”), in which case we assigned coordinates by recording a geographic coordinate every 200 km along the coastline of the reported locale (e.g., every 200 km along the coast of California). To test the concordance of the GBIF and literature-based data gathering approaches, we applied the literature-based method to ten randomly chosen species with available GBIF coordinates to calculate how similar the non-native ranges were between these two methods. Ranges from the two methods were generally congruent, although ranges based on GBIF occurrence data tended to be slightly smaller than the literature-extracted geographic ranges. We organized the range distribution data to allow for analysis of species collected with GBIF methodology both separately and combined with the species extracted with literature-based methods. Finding no substantial differences between the pooled data and the GBIF data alone, we used the pooled dataset for formal analyses.

Physical data

We collected seasonal and annual oceanographic data for each species in its non-native geographic ranges using the same coordinates of species occurrence points we extracted from GBIF (or the literature for those 25 species). We extracted oceanographic parameters (current speed, temperature and salinity) for each species occurrence point from the closest one-degree by one-degree boxes included in both the World Ocean Atlas 2009 and the Drifter Data Assembly Center. This is the highest resolution global data set with coastal coverage that includes ocean currents. We calculated the mean, standard deviation, minimum and maximum for each parameter across all occurrences for all seasons (defined as standard oceanographic seasons for northern and southern hemispheres), as well as for the year in total. An annual range in temperature and salinity was also calculated.


We used an information-theoretic model selection framework to test the relative influence of time since introduction, biological characteristics of species and physical habitat properties on non-native range distributions. Residuals of the data were nearly normal and so they did not require transformation. We fit multiple linear regression models to the measured cumulative range size of each species and selected the minimally adequate models with corrected Akaike’s Information Criterion (AICc). We estimated the standardized beta coefficients of all variables (including categorical ones) to enable comparisons of relative importance among the predictor variables.

Predictor variables included time since introduction, habitat (epifaunal vs infaunal), mobility (sessile vs mobile), maximum body size, development type (planktonic vs non-planktonic), mean current speed in spring, annual current variability, mean spring temperature, annual standard deviation in temperature, annual mean salinity and annual standard deviation in salinity. Minimum larval duration, a trait that was only available for a small subset of the species, was also explored but showed extremely little effect, so was removed to increase the overall degrees of freedom in the model by having a more complete dataset. Instead, life history attributes like development type were considered a categorical proxy for larval duration.

We pared down the large list of physical variables by generating a covariance matrix of seasonal and annual temperatures and then again for salinity. Many variables were highly correlated (R > 0.9); within each cluster of variables correlated at R > 0.7, we selected one representative variable for entry into the final model competitions. Mostly, this meant using yearly averages and standard deviations. The only exceptions were mean current speed and temperature, for which we used spring values since this is the peak period of spawning for many species with broadcast larvae41 and thus a time when larvae in the water column, and therefore the spread rates of populations, can be most influenced by these variables42.
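One way to carry out this kind of screening is sketched below in Python. The text does not specify how the representative variable in each cluster was chosen, so the greedy rule here (keep a variable only if its absolute Pearson correlation with every variable already kept is at most 0.7) is an assumption, and the variable names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_correlated(columns, threshold=0.7):
    """Greedily keep variables whose |R| with all kept variables is <= threshold.

    `columns` maps variable name -> list of values; earlier entries take
    priority, so preferred variables (e.g. annual means) should be listed first.
    """
    kept = []
    for name in columns:
        if all(abs(pearson_r(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-species summaries: the two temperature variables are nearly
# collinear, while salinity is only weakly related to either.
example = {
    "annual_mean_temp": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "summer_mean_temp": [13.1, 14.9, 17.2, 18.8, 21.1, 22.9],
    "annual_mean_salinity": [34.0, 31.0, 35.0, 30.0, 34.5, 31.5],
}
selected = prune_correlated(example)

if __name__ == "__main__":
    print(selected)
```

Because the rule is order-dependent, listing annual summaries first mirrors the stated preference for yearly averages and standard deviations.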

To identify a minimally adequate model, we first fit linear regressions with all possible combinations of independent variables. A null (intercept only) model was also compared. Next, we selected the model with the lowest AICc and calculated ΔAICc for each model, which is the difference between that model’s AICc and the AICc of the best model; lower ΔAICc values indicate higher support for a model. Because other models may be nearly as good (AICc nearly as low), we also calculated Akaike weights (wi) across the four best models for each number of variables up to seven43. Finally, for each variable included in the models we calculated its Relative Variable Importance (RVI), which is the sum of Akaike weights across all models that included that particular variable. Because of the large number of models, we conducted these RVI calculations on a subset composed of the best 40 models (lowest AICc). AICc is a small-sample correction of AIC that penalizes additional parameters more heavily. Although we were not overly concerned with small sample sizes, AICc converges on AIC as sample size increases and is considered the more conservative metric44.
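The quantities described above follow standard definitions, which the Python sketch below illustrates. The AICc function uses the common least-squares form, with k counting all estimated parameters; that parameter-counting convention, the function names and the AICc values in the example are assumptions for illustration, not values from Table 1.

```python
from math import exp, log


def aicc_least_squares(rss, n, k):
    """AICc for a least-squares fit: AIC plus the small-sample correction term.

    rss: residual sum of squares; n: sample size; k: number of estimated parameters.
    """
    aic = n * log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_table(models):
    """Return (variables, delta_aicc, weight) for each (variables, aicc) pair."""
    best = min(aicc for _, aicc in models)
    deltas = [aicc - best for _, aicc in models]
    rel_likelihood = [exp(-d / 2) for d in deltas]
    total = sum(rel_likelihood)
    return [(v, d, r / total) for (v, _), d, r in zip(models, deltas, rel_likelihood)]


def relative_variable_importance(table):
    """Sum Akaike weights over every model that contains each variable."""
    rvi = {}
    for variables, _, weight in table:
        for v in variables:
            rvi[v] = rvi.get(v, 0.0) + weight
    return rvi


# Hypothetical candidate set; the AICc values are made up for illustration.
candidates = [
    (("time", "body_size"), 100.0),
    (("time",), 101.0),
    (("time", "salinity"), 102.0),
]
table = akaike_table(candidates)
rvi = relative_variable_importance(table)

if __name__ == "__main__":
    for variables, delta, weight in table:
        print(variables, round(delta, 2), round(weight, 3))
    print(rvi)
```

A variable that appears in every model of the candidate set, as "time" does here, necessarily receives an RVI of 1.0, which is the situation reported for time since introduction.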

To determine the robustness of results we also analyzed data with conditional random forest45 and boosted regression trees46. Tree-based algorithms make different assumptions from linear regression, and consistency among the predictions of the different methods can be used to gauge confidence in results. Random forest and boosted regression trees were run using the party45 and dismo packages in R v. 3.1.1, respectively.

Consistent with linear regression, both conditional random forest and boosted regression trees found ‘time since introduction’ to be the overwhelmingly strongest predictor (Fig. S1, Supp Materials 3). It was considered over 7 times more important than any other variable by conditional forest and approximately 3 times more important by boosted regression trees. Both algorithms rank mean spring temperature and mean annual salinity the second and third most important variables respectively, but both with much lower importance than time since introduction.

Additional Information

How to cite this article: Byers, J. E. et al. Invasion Expansion: Time since introduction best predicts global ranges of marine invaders. Sci. Rep. 5, 12436; doi: 10.1038/srep12436 (2015).