Network analysis of ballast-mediated species transfer reveals important introduction and dispersal patterns in the Arctic

Rapid climate change has wide-ranging implications for the Arctic region, including sea ice loss, increased geopolitical attention, and expanding economic activity resulting in a dramatic increase in shipping activity. As a result, the risk of harmful non-native marine species being introduced into this critical region will increase unless policy and management steps are implemented in response. Using data about shipping, ecoregions, and environmental conditions, we leverage network analysis and data mining techniques to assess, visualize, and project ballast water-mediated species introductions into the Arctic and dispersal of non-native species within the Arctic. We first identify high-risk connections between the Arctic and non-Arctic ports that could be sources of non-native species over 15 years (1997–2012) and observe the emergence of shipping hubs in the Arctic where the cumulative risk of non-native species introduction is increasing. We then consider how environmental conditions can constrain this Arctic introduction network for species with different physiological limits, thus providing a tool that will allow decision-makers to evaluate the relative risk of different shipping routes. Next, we focus on within-Arctic ballast-mediated species dispersal where we use higher-order network analysis to identify critical shipping routes that may facilitate species dispersal within the Arctic. The risk assessment and projection framework we propose could inform risk-based assessment and management of ship-borne invasive species in the Arctic.


Results
Species introduction to the Arctic. In the global shipping network comprised of nodes (representing ports) and edges (representing direct shipping routes among ports), we first investigate species introduction pathways to the Arctic (pathways connecting Arctic ports to non-Arctic ports, illustrated in Fig. 1A as the purple lines connecting the green non-Arctic ports to orange Arctic ports, see methods for details). The introduction pathways form a network containing 310 Arctic ports (using the Arctic conservation area boundary defined by the Arctic Council), and 7187 non-Arctic ports that could be the direct source of invasive species. From 1997 to 2012, voyages in the LLI data form 3,902 active introduction pathways. We exclude pathways from the same or neighboring ecoregions (geographic regions with similar environmental conditions as defined by 26,27 ), because species could possibly disperse to these nearby locations naturally 26,27 . Removing pathways using this ecoregion criteria in effect reduces the probability of a species jumping between ports in the same or neighboring ecoregions to zero, and further reduces the number of introduction pathways from non-Arctic ports to Arctic ports we analyzed to 2,874.
We conduct a predictive analysis of ship-borne species invasion risks in the Arctic based on the evolution of shipping patterns observed from 1997 to 2012 (Fig. 6). We use a linear model for this task, given the sparsity of available data, because it requires the least number of variables and assumptions. With increased human activities in the Arctic, the properties of the species introduction network evolved over time (Fig. 1B, Supplementary Table S1) from 1997 to 2012. We use a simple linear model because we have no strong basis for expecting any particular alternative functional form, and the linear model makes minimal assumptions. Linear trend lines, which we do not convey as predictions, plus confidence intervals are projected for the next 15 years. In general, we observe an increase in shipping activities in the Arctic from 1997 to 2012, shown by the significant increase of number of voyages (+ 128/year, p < 0.05), the total dead weight tonnage (DWT, a crude estimate of propagule pressure) of the ships (+ 2.52 × 10 16 /year, p < 0.01), and the average capacity of ships (DWT per voyage, + 253/year, p < 0.05). On the other hand, the number of distinct introduction pathways only increased by 2% over the same period (p > 0.05). These observations indicate that the increased shipping activities in the Arctic are reflected in heavier shipping traffic per introduction pathway, as opposed to an increase in the number of introduction pathways.
We also simulated how the observed emergence of hubs influences the aggregated risks of invasion at ports (details in the Methods section). Counter-intuitively, the number of recipient ports per introduction pathways has been significantly decreasing (− 2.03/year, p < 0.01) despite increased shipping activity in the Arctic (Fig. 1B). As a result, a few ports have experienced a significant increase in introduction pathways (Murmansk's introduction pathways grew from 110 in 1997 to 158 in 2012). The evolution of the species introduction network shows that the shipping traffic was previously more evenly distributed amongst Arctic ports but is gradually being "rewired" to a few hub ports in the Arctic. From the networks perspective, this rewiring process is a real-world example of the network evolution following preferential attachment 32 .
Focusing on individual introduction pathways, we denote the relative risk of species introduction from a non-Arctic port i to Arctic port j as P i→j . Note that P i→j is not an absolute probability of species introduction, but a robust metric of the relative risk of species introduction determined by shipping frequency, ship size, ship type, trip duration, ballast water exchange amount and frequency (see "Methods" section; environmental similarities that influences risk of establishment will be discussed in-depth in the next section). In general, over the 15 years, we observe a significant increase in the average P i→j per introduction pathway (+ 9.99 × 10 4 /year, p < 0.01), a combined effect of the increase of shipping frequency, ship size, and the relatively stable number of introduction pathways (Fig. 1B).
We visualize the species introduction pathways in Fig. 2, with colors indicating relative risk of introduction P i→j across 15 years from 1997 to 2012. Many high-risk pathways (red paths in Fig. 2) originate from Northwestern Europe (e.g., Rotterdam, Hamburg, and Amsterdam) and point to Arctic ports of Narvik in Norway and Murmansk in Russia (Supplementary Table S2). For each port in the Arctic we aggregate the risk of introduction denoted by node sizes in Fig. 2. Ports associated with high-risk pathways (such as Murmansk) demonstrate high aggregated risks; however, even some ports such as Afognak and Kodiak (both in Alaska) associated with low-risk pathways demonstrate high aggregated risks because of the substantial number of different ports to which they are connected. Full data of species introduction pathways (per year and overall) are available online.
Risk of species establishment in the Arctic. We now consider the environmental constraints on species establishment, and how these will alter the topology of the species spread (introduction + establishment) network. Figure 2 shows that many spread pathways connect the Arctic to environmental dissimilar ports as distant as Australia, South America, and Africa. Many species are unlikely to survive translocation between distant ports and the Arctic because of the temperature and/or salinity differences between the regions. Similarly, species being transported from the freshwater Great Lakes ports would likely not establish in a marine Arctic port due to wide salinity differences. To present a more detailed view of establishment risk to the Arctic, we categorize species into six groups that reflect different environmental tolerances to temperature changes (∆t ≤ 2.9° and ∆t ≤ 9.7°) based on estimated long-term thermal tolerances of marine invertebrate taxa 33 and salinity ranges (∆s ≤ 0.2 ppt, ∆s ≤ 2 ppt, and ∆s ≤ 12 ppt) that reflect obligatory freshwater, obligatory marine, and euryhaline species, respectively 16 , and then filter the introduction pathways based on the ports environmental condition 24 . This sensitivity analysis illustrates which Arctic species introduction pathways are most likely to lead to species establishment for species in a given tolerance group (Fig. 3, Supplementary Table S3), and is meant as an intermediate approach between estimating establishment risk naively (i.e. assuming it corresponds directly to introduction risk without accounting for environmental constraints) and estimating establishment risk posed by a particular species with specific temperature and salinity tolerances. The latter requires detailed knowledge that www.nature.com/scientificreports/ is not currently available for most species. We also assume that temperature-salinity interactions, environmental plasticity or adaptive evolution are negligible because, even though there is evidence of the contrary (e.g. [34][35][36], these processes are currently outside of the scope of our model. We first looked at the introduction pathways connecting Arctic ports with non-Arctic ports that have almost identical environmental conditions ( Fig. 3 top left, ∆t ≤ 2.9°, ∆s ≤ 0.2 ppt). Although the number of introduction pathways available to species in this tolerance group are limited, these introduction pathways need particular attention from policy makers, because all species groups (including those most sensitive to environmental changes), will have the potential to establish in the target Arctic port once introduced. These pathways including those from Port Alfred (Canada) to Churchill (Canada), and from Seaham (UK) to Akranes (Iceland).
Our analysis can also inform decision-makers regarding the role of temperature and salinity in shaping the species introduction network. Churchill (Canada) has the highest combined risk of species introduction P j = 3.6% (see "Methods") for the most sensitive species group (∆t ≤ 2.9°, ∆s ≤ 0.2 ppt). When the temperature tolerance constraint is loosened to ∆t ≤ 9.7° and salinity tolerance held constant (∆s ≤ 0.2 ppt), Churchill (Canada) remains the most vulnerable port with P j unchanged (Fig. 3, top right). But when the salinity constraint is loosed to ∆s ≤ 2 ppt and temperature tolerance held constant (∆t ≤ 2.9°), Afognak (USA) and Dutch Harbor (USA) emerge as the more vulnerable ports (with P j = 5.6% and 4.7%, respectively), because species from ports in Northeastern Asia (notably Japan and Russia) can likely survive in the new environment (Fig. 3, mid left). In general, going from top left to bottom right on Fig. 3, we can learn which introduction pathways are open to species that are more resilient to temperature or salinity changes. Such information makes it possible to focus on the physiological limits of organisms when devising targeted management strategies for specific routes. Full data of species introduction routes and aggregated risks at ports under different environmental constraints are available online.

Ship-borne species dispersal within the Arctic: direct and indirect pathways.
Here we shift our attention from identifying the Arctic ports at highest risk of initial introduction and establishment to the relative risk of subsequent ship-driven dispersal among ports within the Arctic, referred to as dispersal pathways (orange pathways in Fig. 1A). We model species dispersal as a network: nodes represent Arctic ports, links that connect nodes represent the species dispersal pathways, and links' weights (strengths) represent the relative risks of pathways. From 1997 to 2012, voyages in the LLI data form 1,269 direct dispersal pathways within the Arctic (Fig. 4). Most high-risk connections (denoted with thick lines) exist between ports in Arctic Europe, including those connecting Murmansk in Russia and Tromso in Norway (top 10 risky pathways in Supplementary Table S2). The aggregated risks of dispersal (denoted with node sizes) at ports in Norway and Iceland are particularly high, due to the many strong dispersal pathways connecting to them. A projection of the species dispersal within the Arctic is shown in Fig. 1C. We observe a significant increase in the number of voyages, ship size, and the number of incoming pathways from the largest hub (+ 0.89/year, p < 0.04). Full data are available online in our public repository.
The analyses above focus on the direct dispersal pathways pointing to a ship's next destination from the ship's current port but ignore the fact that a ship's next destination can also depend on its previously visited ports. Ships' multi-step movement patterns (i.e., higher-order movement patterns) can help refine the modeling of indirect species dispersal through intermediate ports. In the illustrative example in Fig. 5A we consider six ships coming from Tromso or Narvik, going through Murmansk, and heading to Bodo or Nuuk. The conventional first-order i→j from non-Arctic port i to Arctic port j. Sizes of nodes indicate the risk of introduction to that port P j aggregated over all voyages into that port. Relative introduction risks and port introduction risks calculated using shipping data from 1997 to 2012. The black outline delineates the Arctic boundary. Figure  www.nature.com/scientificreports/ network analysis 16,37 simply counts the number of trips between port pairs to model direct species dispersal. We refer to this network as first-order network of species flow (SF-FON). In this case, the number of trips, and therefore edge weight or "risk", is three for all trips between ports including from Tromso to Murmansk, from Narvik to Murmansk, from Murmansk to Bodo, and from Murmansk to Nuuk (Fig. 5B, left). However, according to the ship trajectories in Fig. 5A, ships coming from Tromso through Murmansk are more likely to go to Bodo (edge weight two) than Nuuk (edge weight one), and ships coming from Narvik are more likely to go to Nuuk than Bodo. Studies show that ships do not always discharge all of their ballast at each port 38 . If the ballast water management at Murmansk is not 100% effective (i.e., ship did not fully discharge or treat ballast water at Murmansk), species from Tromso are more likely to reach Bodo than Nuuk through the intermediate port Murmansk, following the higher-order ship movement patterns. This important information on indirect species dispersal is completely hidden from SF-FON (Fig. 5B, left), but is obvious if a higher-order network representation 31 is used. Therefore, in this section, we use higher-order network of species-flow (SF-HON) 31 . Figure 5B, right shows an example. SF-HON splits the node Murmansk into two nodes "Murmansk given the last port being Tromso" (Murmansk|Tromso) and "Murmansk given the last port being Narvik" (Murmansk|Narvik), each node having differently weighted outgoing edges to Bodo and Nuuk. Decision-makers can derive this additional information on indirect species dispersal from SF-HON that they would miss if they relied merely on the SF-FON analysis. www.nature.com/scientificreports/ The example of considered SF-HON structure and dynamics outlined above is likely to be very important to species dispersal: ship movements demonstrate up to fifth order dependency which can only be effectively captured by higher-order network modeling 30,31 . Moreover, higher-order structures influence clustering, ranking, diffusion, network representation, and anomaly detection in networks 30,[39][40][41] . However, SF-HON structure and its implications have not been studied for Arctic shipping. Here we provide a systematic analysis of the Arctic species dispersal network using SF-HON and compare that to SF-FON. The details of SF-HON construction are provided in the Methods section and supplementary Fig S2).
We first visualize the SF-HON of Arctic species dispersal pathways in Fig. 5C, where nodes represent ports (with labels in the form of [CurrentPort] | [PreviousPorts]), edges (lines connecting nodes) represent species dispersal pathways between Arctic ports, and edge weights (line widths) represent species dispersal probabilities (only non-trivial pathways with P i→j ≥ 0.001 are shown). To highlight the connections among ports, instead of placing ports on a map, we use a layout that places nodes with stronger connections closer to each other 42 . Ports in the network are automatically grouped into five distinct clusters; connections within each individual cluster are denser and stronger than connections between different clusters. Although the algorithm behind the layout in Fig. 5C does not include any geographical information, ports still self-organize into clusters that are clearly separated by geographical regions: specifically, the Murmansk cluster (blue) has only a few weak connections to the Icelandic cluster (yellow), and the Alaskan cluster (red) has no significant dispersal pathways to other clusters. This provides opportunities for more effective species management: species control policies targeting the loose connections between clusters (such as those between Murmansk and Iceland clusters) can effectively prevent or slow species propagation from one cluster to another with management target at only a few voyage routes.
We next show how the dynamics of indirect species dispersal in the Arctic is more accurately reflected in SF-HON than in SF-FON. Species dispersal can be thought of as a random walk process on the network: starting from a port i in the Arctic, the species will randomly travel through one of the outgoing connections as its next step of dispersal, and the probability of choosing the connection is proportional to the relative strength of the introduction risk P i→j ; that is, P i→j = P i→j k P i→k . In the SF-FON model in Fig. 5B left, a random walker (representing a species) starting from Tromso may first disperse to Murmansk through shipping, then have 50/50 chances of dispersing directly from Murmansk to Bodo and Nuuk. In reality, before the species is established in Murmansk and has reached a sufficient propagule pressure to disperse from Murmansk, the species would have already been carried by the same ship to the next port (Bodo/Nuuk), causing indirect dispersal from Tromso to Bodo/Nuuk (assuming the ship's ballast water is not fully treated at Murmansk). This indirect species dispersal behavior is more accurately reflected when random walkers move along the SF-HON model (Fig. 5B right), which will have 2/3 and 1/3 chance dispersing from Tromso through Murmansk to Bodo and Nuuk, respectively. In other words, the species dispersal modeled by random walk on SF-HON is not completely random like on SF-FON but follows higher-order ship movement patterns.
We further show that the relative rankings of ports by species dispersal risk are different under SF-FON and SF-HON network representations. We approximated species dispersal dynamics using the well-established PageRank 43 "random walking with resets" algorithm. This is shown in Fig. 5C, where the node size indicates the port's risk of receiving indirect species. These species dispersal risks are fundamentally different from the aggregated risk of receiving species dispersal illustrated earlier in Fig. 4, which was computed on SF-FON and only aggregates risks of direct species dispersal pathways to a port, regardless of ships' higher-order movement patterns. A side-by-side comparison of port risks of direct dispersal on SF-FON and indirect dispersal on SF-HON is provided in Supplementary Table S4. Taking Reykjavik in Iceland as an example, the direct risk of receiving species dispersal for Reykjavik only ranks 14th among all Arctic ports, since the pathways connecting to Reykjavik are weak ( max i P i→Reykjavik = 0.14 compared to max i (P i→Murmansk ) = 0.70 ). However, Reykjavik ranks 3rd for receiving indirect species dispersal, and is the central port in the Icelandic cluster, therefore, species at other ports in the Icelandic cluster have a high probability of eventually flowing to the topologically highly connected port of Reykjavik.
The case for Reykjavik as a high-risk recipient port of indirect species dispersal is consistent with recent reports of several ship-borne non-native species establishing themselves in southwest Iceland 44 . Another example supporting the SF-HON species dispersal is the presence of at least 8 cryptogenic or non-native species in Dutch Harbor Alaska 45 , a port that ranks 8th in the risk of receiving indirect dispersal but only 24th in the risk of receiving direct dispersal. Unfortunately, more complete and rigorous testing of SF-FON and SF-HON risk rankings in Arctic ports is not possible given the absence of biodiversity surveys conducted in the same ways in multiple ports. Additionally, since new species may take years or decades to establish and become detectable 46 , current survey data may not present the true picture of recent introductions. Therefore, we recommend that, in devising management strategies, the direct dispersal-based ranking is suitable for targeted short-term policies focusing on direct introduction (i.e. species in a location which pose a known risk), and the indirect dispersalbased ranking is a better guideline for long-term prevention strategies.
Case study of higher-order species dispersal pathways. In this section we present a case study of species introduction pathways through first-order and higher-order species flow networks (using SF-HON). We focus on the port of Murmansk and compare the propagation pattern through the first-order node of Murmansk, the second-order node of Murmansk|Tromso, and Murmansk|Hammerfest. The first, second, and third step of propagation pathways for each port are visualized in Fig. 6. Two key patterns emerge. First, a first-order model for simulating the propagation pattern results in several potential vulnerable ports that need to be targeted for management, even at the first step of propagation (compare Fig. 6c1, a1, b1  www.nature.com/scientificreports/ propagation pathways can become even more complex as we consider second and third steps of species propagation. Considering the propagation through the second order ports, however, leaves far fewer targeted ports and highlights the few important pathways that should be targeted given each second-order node. As a result, using the first-order representation is not helpful when devising targeted control policies which require costly planning and resources. Second, each second-order port has a unique propagation pattern resulting in different vulnerable ports at each step. For example, knowing that species were introduced from Hammerfest through Murmansk, the control policies should target Varandey and Longyearbyen to prevent the second step propagation (Fig. 6a2), while for species that were introduced from Tromso through Murmansk the prevention policies should target Varandey, Dudinka, Advent Bay, Torshavan, and Reykjavik (Fig. 6b2). SF-HON provides a more fine-grained view of species introduction pathways based on previous destinations. Therefore, preliminary prevention strategies based on SF-HON can optimize the cost of management by accurate identification of the potential introduction pathways based on species origin.

Discussion
Key observations and applicability to management. Our work presents an assessment and projection of ballast mediated non-native species introduction and establishment tailored to inform arctic policy and management, and is the first attempt to integrate higher-order network approaches to analyze the within-Arctic dispersal risk. While future improvements in the available data sets and methods will build upon these results, the current analysis represents first guidance as to which ports might be prioritized in management efforts, given limited resources and the Arctic Council's goal of protecting the Arctic region from invasion 22 . The UN's International Maritime Organization's (IMO) 2004 International Convention for the Control and Management of Ships' Ballast Water and Sediments came into force in September 2017 and will provide a uniform international policy framework for reducing ballast-mediated introductions. Management mandated by the IMO convention could be augmented by geographically targeted efforts coordinated among Arctic Council member nations to protect the Arctic from threats revealed by our analyses.
Our analyses of the past 15 years' trend presents a first glimpse into the future risk posed to the pan-Arctic by ballast-mediated species introduction. Given current climate projections and the positive climate feedbacks on sea ice loss, our simple linear projection of current trends of shipping is no doubt wrong, and is more likely to be conservative than an overestimate. It indicates that shipping intensity (number of trips, average or aggregated ship capacities) and the average risk per introduction or diffusion pathway is likely to continue to increase. While www.nature.com/scientificreports/ the number of pathways may remain steady, the pathways are likely to be rerouted through Arctic shipping hubs such as Murmansk. Increased management at these emerging hubs could therefore have a disproportionately positive impact on risk of biological invasions across the entire Arctic. Our analysis of species introduction pathways highlights the potential for the introduction of highly tolerant species from distant locations normally thought of as being disconnected from the Arctic, including Australia, South America, and Africa. It also reveals high-risk introduction pathways densely distributed in Northwestern Europe (the Murmansk-Narvik region), an area that is environmentally similar to Arctic regions. We further demonstrated how the same routes pose different levels of establishment risk for species with narrow or broad environmental tolerance levels (Fig. 3). Specifically, there is spread risk from source ports in northeast Asia and Europe to neighboring Arctic ports from all but the most stenohaline species (e.g. Laurentian Great Lakes amphipod Gammarus fasciatus 47 ), and the most tolerant species are a risk to spread from these sources throughout the Arctic (e.g. the Red King Crab Paralithodes camtschaticus which has been stocked and subsequently established in the Barents Sea 48 ). This environmental risk analysis could be further refined for specific target species whose tolerances are well known (e.g. such as the eight species analyzed for habitat suitability in the Canadian Arctic by 49 ). Thus, our modeling framework can be used at the general level as we have done here or a species-specific level (if detailed knowledge exists of a species' environmental tolerances).
Our analysis of ballast-mediated species dispersal within the Arctic leverages the higher-order network to capture the influence on species dispersal of path-dependent ship movement patterns. The higher-order network analysis models indirect species dispersal, highlighting ports that might otherwise be ignored by management due to their weak connections to other ports, but which are still at risk through indirect dispersal in the long term. Our case study of species propagation in the Arctic ports highlights the advantages of using a higher-order network model for effective and accurate targeting of the potential vulnerable ports for early prevention strategies.
Overall, the current results could aid in the development of effective Arctic invasive species management policies. As summarized above, our risk assessment framework identifies shipping routes where ballast water management will have the greatest impact on the overall threat of a non-native species establishment. Given limited resources and the especially challenging Arctic environment, information like we provide can help guide the placement of resources for surveillance, prevention, and other species management strategies. Placebased strategies that decision-makers could consider include: (1) prioritizing locations for surveillance efforts, including new eDNA and other genetic-based methods, to inform early detection and rapid response efforts 50,51 ; (2) choosing the location for on-shore ballast water treatment facilities for high-risk regions or ports that are becoming hubs (e.g., Murmansk and Narvik where many high risk pathways connect) 52 ; (3) implementing strict management policies on the highest risk routes and ports, including routes through inter-cluster connections in the Arctic, and ports (e.g., Reykjavik) with a high potential of leading to further introduction and establishments; and (4) developing site-based and/or mobile equipment and protocols for rapid response control efforts when an incipient establishment is discovered.
Benefits of the higher-order network approach. Higher-order structures in networks influence clustering, ranking, anomaly detection, representation learning, and diffusion in networks 30,[39][40][41] , suggesting that these processes are important for understanding patterns of introduction risk within shipping networks. In an attempt to capture higher-order movements in their shipping networks, Keller et al. 24 assumed a fifth-order for all shipping paths while Seebens et al. 6 included the entire trajectory of ship movements in the species introduction risk. Although assigning a fixed order of dependecy for the shipping trajectories or exhaustively including all possible higher-order dependecies obtained from the entire shipping trajectories may fit well for the historically observed data, they can overcomplicate the ship movement models by forcing higher-order patterns when first-order is sufficient, and risk losing the generality towards the future by overfitting the history. To preserve generality, it is necessary to learn which higher-order patterns are most likely to represent introduction risk, and which occur at frequencies low enough to represent low risk. Xu et al. 30 proposed a higher-order network construction algorithm that only keeps higher-order structures when they are significantly different than that of lower-orders, but that algorithm treats all ships in the global shipping network equally and cannot be readily adapted to species introduction risk modeling. Saebi et al. 31 extended Xu et al. 's higher-order network algorithm 30 so that it respects the impact of ship type, ship size, trip duration and ballast water discharge patterns for more realistically modeling of species introduction. As a result, we apply the algorithm proposed in 31 to ensure that the networks presented here are a succinct reflection of the backbone of species introduction and dispersal pathways in the Arctic.
Opportunities for future improvement and application. We hope that our approach will be refined in the future in several directions. First, our framework can benefit from an improvement of the breadth of existing data set. For example, although the LLI data set we used is the most comprehensive data set available on global shipping, it has limited observations in the northern regions of Canada 53 . The NBIC ballast water data set, from which we parameterized the ballast water discharge model, is limited to the U.S. More comprehensive information about port types, port usage patterns and ships would allow for the implementation of machine learning approaches to modeling ballast water discharge and species introduction risks, producing more robust estimates of absolute risk.
Second, our framework can benefit from the real-time incorporation of new data that captures other aspects of rapidly changing conditions in the Arctic. Given melting Arctic sea ice and other warming temperatures, temperature and salinity in ports and other coastal environments may change rapidly, underscoring the need for the incorporation of more frequent and widespread environmental data. Future research could also couple www.nature.com/scientificreports/ ballast water discharge monitoring, species occurrence monitoring, and long-term monitoring of the introduction processes to complement existing data. Third, our framework may benefit from more detailed biological models. For example, some of the key biological relationships-the likelihood of species establishment as a function of the volume of ballast discharge or discharge frequency or concentration of organisms in discharge-could be improved with more empirical research, although the impact of such changes could be incremental and uncertain. Thus, given the nature of the fast-changing and non-comprehensive data, our analyses do not attempt to compute the absolute magnitude of introductions risks. Rather we use available data to estimate relative risks, which we believe are far more robust than any estimates of the absolute risk.
Finally, our risk assessment and prediction framework can be extended to include biofouling and integrated with Arctic biodiversity monitoring to enhance invasion prediction and mitigation. The absolute risk of establishment from biofouling has long been considered similar to, if not greater than, that from ballast discharges 54 , but is less studied and has not received as much policy attention as ballast-mediated introduction. However, as more data become available on biofouling patterns on different surfaces of different ship types, trip speed and detailed route trajectories (the satellite-based Automatic Identification System shows promise), our framework for assessing ballast-mediated introduction risk can be adapted to assess biofouling risk 31 . Our risk modeling framework can then be paired with species monitoring efforts, including newer methods based on environmental DNA 50 to prioritize high-risk ports and to test our model predictions of ship-mediated introduction through the Arctic in near real-time. This would enable increase accuracy of the models and, in turn, would facilitate more effective policy and enforcement.

Methods
Data sets and preprocessing. For ship movements, we utilized global ship movement data from the Lloyd's List Intelligence (LLI) for the years 1997, 1999, 2002, 2005, 2008, and 2012 (starting on May 1st of these years and ending on Apr 30th of the following years). This data set was organized by individual voyages, totaling 12,723,028 records across the six years. The data also included unique ship identifiers, ship type (150 categories), gross weight tonnage, dead weight tonnage, ship departure and arrival port and dates. Removing duplicate records yields 9,569,619 ship movements. The data was further subset to include only voyages to and between Arctic ports, herein defined as ports located within the boundary laid out by the Arctic Council's Conservation of Arctic Flora and Fauna working group 25 , yielding 48,364 voyages through 3902 introduction pathways from non-Arctic ports to Arctic ports, and 4715 voyages through 1269 dispersal pathways within the Arctic. We exclude pathways from the same or neighboring ecoregions further reducing the number of introduction pathways from non-Arctic ports to Arctic ports we analyzed to 2874. The ecoregions were defined based on Marine Ecoregion of the World (MEOW) 26 and Freshwater Ecoregion of the World (FEOW) 27 and attributed to a particular port based on its geographic coordinates.
For ballast water discharge, we used the records collected from the U.S. National Ballast Water Information Clearinghouse (NBIC) 25 for the years 2004 to 2016 for ships completing foreign and domestic voyages in Alaska, totaling 4926 records. This data set included information about vessel type (9 categories), ballast water discharge and gross weight tonnage. Records with missing information, zero discharges, or where recorded ballast water discharge exceeded recorded ballast water capacity were removed, leaving 1280 valid records. Since ships sailing to and through the Arctic face unique climate conditions and have distinct patterns (frequencies and amounts) of ballast water uptake/discharge, we subset the NBIC data to include only voyages to and between Arctic ports, and used that subset in our estimates of ballast water discharge patterns.
Calculation of introduction risks. The relative risk of non-native species introduction was first calculated for every ship movement in the Lloyd's data, then aggregated for every pathway. Inspired by the introduction risk equation from Seebens et al. 6 , for a ship s making the trip t from port i to port j, which took � (t) i→j days and discharged D (t) i→j ballast water, the relative risk of introduction for this trip is: based on the premise that species have a higher probability of being transported through the trip if there was larger amount of ballast water discharge, or if the trip was short increasing the probability of species survival in the ballast. The duration � (t) i→j of trip t was taken from the Lloyd's data, and the daily species mortality rate µ = 0.02 was chosen based on the work of Seebens et al. 6 . The species introduction potential per volume of discharge parameter λ was given as λ = 3.22 × 10 −6 based on Xu et al 11,16 , so that P (t) i→j is 0.8 when ballast discharge volume is 500,000m 3 and trip duration is zero. The volume of ballast water D (t) i→j translocated by trip t made by a ship of type k and gross weight tonnage GWT is estimated using a modified version of the approach of Seebens et al. 6 : where Zk is the fraction of non-zero releases for ship type k , and WGWT is the estimated discharge in metric tons for a ship with gross weight tonnage GWT . The 150 ship types in the LLI data were mapped to the 9 types present in the NBIC data and given ship type k from the 9 types in the NBIC data, the ballast water discharge frequency Z k is computed based on the NBIC data, yielding the mapping of k → Z k (Supplementary Table S5). We estimated WGWT , by removing the zero discharge records from the NBIC data, randomly splitting the data into training www.nature.com/scientificreports/ (70%) and testing (30%) data sets, and fitting a random forest regression to the training data in R. The random forest regression predicted WGWT as a function of ship type and gross weight tonnage and was validated using the testing set yielding an R 2 of 0.93 ( Supplementary Fig. S1).
We computed the introduction risk P (t) i→j based on ship size, ship type, and trip duration. Assuming P (t) i→j are independent, then the aggregated probability of introduction for non-native species for a pathway i→j is: and the aggregated probability of introduction for a given target port j is: To investigate species' sensitivity to environmental differences between their origin ports outside the Arctic and destination ports inside the Arctic we obtained annual average water temperature and salinity conditions for each port, from the Global Ports Database 24 wherever possible, supplemented by records of surface water conditions closest to ports' locations from the World Ocean Atlas 28 . We then filtered the introduction pathways into six groups that reflect different environmental tolerances to temperature changes (Δt ≤ 2.9° and Δt ≤ 9.7°) based on estimated long-term thermal tolerances of marine invertebrate taxa 16 and salinity changes (Δs ≤ 0.2 ppt, Δs ≤ 2 ppt, and Δs ≤ 12 ppt) reflecting freshwater stenohaline, marine stenohaline, and euryhaline species.
Simulating the topological evolution of pathways. We adapt the idea of the preferential attachment mechanism from the Barabási-Albert network 32 model based on the species introduction network in 2012, each pathway has a small probability of being rewired to a new Arctic port to simulate changes in shipping routes. We combine this projected topological evolution of pathways with the projected increase in introduction risk per pathway and compute the aggregated introduction and dispersal risks for Arctic ports in 2027.
Given the introduction or dispersal pathways in 2012, we repeat the following rewiring procedure: randomly choose a pathway, keep the strength of connection, and rewire the pathway to connect to a new target port (which cannot be the current source port); the probability of each port being chosen as the new target port is proportional to the number of incoming pathways. Therefore, the rewiring procedure is more likely to attach a pathway to a hub port that is already highly connected, resulting in a decrease of number of recipient ports. The rewiring process is repeated until the projected number of recipient ports has been reached (50 ports for introduction pathways in 2027, and 64 ports for dispersal pathways in 2027).
Higher-order network analysis. We used a higher-order network approach (SF-HON) to analyze the species dispersal within the Arctic. In the original work that proposed the higher-order network (HON) 30 , the algorithm constructs the network based on a sole source of data (e.g., ship trajectories), which cannot readily fit the needs of modeling species dispersal that is a function of multiple factors. In this work, we use SF-HON, the extension of 30 proposed by Saebi et al. 31 since it can take an arbitrary number of data sources as input, can have customized aggregation functions, and can have customized thresholds.
To clarify the SF-HON approach, we present a side-by-side comparison of the original algorithm and the extended algorithm used in this work, illustrated in Supplementary Fig. S2, to highlight the key differences in network construction. The original HON algorithm takes a single source of event sequence (illustrated as ship trajectories) as the input, whereas the species-flow higher-order network (SF-HON) not only takes ship trajectories but also ship types, trip durations and ballast water discharge for every ship movement as the input. The influence per trip is trivial in HON, in that every ship movement is treated as having the same weight and counted as "one trip" from source port to target port; in SF-HON however, the influence of every trip can be different due to variations in trip duration, ballast discharge, ship type and so on, therefore we compute the risk of introduction P (t) i→j separately for every trip. Aggregating the influence through a given pathway in SF-HON is accomplished simply by counting the number of trips observed through the pathway; in SF-HON we instead take the joint probability assuming different trips are independent. Finally, as the algorithm needs a parameter "minimum support" as the terminating condition for higher-order rule extraction, in SF-HON the minimum support is a positive integer, such that pathways with less than the specified trips through them will be discarded. In SF-HON the minimum support is extended to probabilities, and pathways with aggregated probability of species introduction less than the specified threshold will be discarded.
For further analysis of Arctic dispersal pathways, we apply the clustering algorithm developed by Rosvall et al. 55 to assign different colors to clusters in Fig. 5C. We use random walk with resets to calculate the ranking of ports based on species dispersal risk in Fig. 5C. While species dispersal dynamics can be approximated with random walks, realistically, we require the random walkers occasionally be relocated to a random Arctic port, so that (1) the random walkers will not be trapped in loops forever, and (2) random walkers have a small chance of following unobserved or new shipping paths. The idea of using "random walking with resets" to simulate multistep species dispersal is inspired by the well-established PageRank algorithm 43 , which was originally developed to model Web users' browsing behavior and rank Web pages' probabilities of being visited by a random user. Similarly, in the species dispersal context, a port's risk of receiving indirect species dispersal is the probability of being visited by a random walker flowing through the species dispersal network, denoted as the node size in Fig. 5C.