## Introduction

Based on the purported Quaternary geodynamic stability of the Sunda Shelf1,2,3,4,5, eustatic sea level fluctuations have generally been regarded as an important contributor to the recent Southeast Asia extraordinary biological diversification6,7,8,9. Divergence and speciation would increase during eustatic highstands and geographic dispersal during glacial sea level lowstands; this alternation would overall remodel the taxonomic composition of the regional biotas3,10,11,12,13. Under these circumstances, sea level oscillations would have acted as a species pump to increase the regional terrestrial biodiversity1,10,14,15,16. New findings17,18 have challenged the idea of the prevalence of eustatic controls and have demonstrated that the Sunda Shelf was subsiding throughout the Quaternary, being permanently subaerial before 400 ka. With Sundaland exposed for such a long period, the assumption of sea level dominance is called into question implying that the distribution of Sundaland biogeography either predates the Quaternary10,19,20 or is modulated by additional factors including variations of edaphic properties12,14,21, fragmentation of forested habitats4,8,21,22,23 or changes in paleoclimate conditions4,24,25.

In this study, we evaluate the contributions of overlooked surface processes on the last 500 kyr biodiversity dynamic of the region. Although investigations on the role that landscapes might have played in the biological evolution of Southeast Asia can be traced back to Darwin and Wallace1, limited work has been conducted on the potential roles of geomorphology in shaping the relationships observed in the present-day assembly of the region’s biota5,13,26,27.

Previous work28,29,30 has shown that changing landscape morphology is a major driver of species dispersal strategies as they track their optimum habitat31,32. Species running out of suitable habitats either become isolated, go extinct or are forced to coexist with others and adapt. Fragmentation of habitats could favour local endemism and higher speciation rates, both of which contribute positively to increase biodiversity28,33. Here we test the role of landscape dynamic in modifying connectivity and dispersal corridors across the exposed Sunda Shelf using a series of calibrated surface evolution simulations34 forced with eustatic, climatic and tectonic conditions. From changing physiographic characteristics, we then estimate the spatio-temporal permeability of the shelf to lowland evergreen rainforests species movement using a connectivity model35. Finally, we use a combination of statistical approaches to quantify preferential and persistent paleo-migration pathways across the shelf since the Late Pleistocene.

## Results

### Sensitivity of Sundaland flooding history to tectonic and sea level conditions

To evaluate paleo-stream paths and associated paleo-drainage basins evolution, we performed a series of landscape evolution simulations34 over the last one million years (Fig. 1a). Each simulation accounts for fluvial incision, sediment transport and deposition (in endorheic basins and in the marine realm) as well as hillslope processes (Methods, Fig. 1b). All scenarios are constrained by three external forcing mechanisms: (1) climate, as rainfalls, (2) eustatic sea level fluctuations, and (3) vertical tectonics. In this study, we tested five combinations of these forcing conditions (Supplementary Table 2). Two sea level curves were assessed (Supplementary Fig. 1c), the first three scenarios used the reconstruction from36 and the last two the sea level stack proposed by37. We explored three different tectonic regimes. In scenario 1, we assumed no vertical deformation; in 2, a uniform subsidence rate of −0.25 mm/yr17, and for the last 3 we imposed a tectonic map (i.e., vertical land motion, Supplementary Fig. 1b) built from available regional uplift and subsidence rates (Methods, Supplementary Table 1). Finally, in all cases except scenario 4 (where we assumed a uniform rainfall of 2 m/yr), we imposed derived precipitation estimates from the PaleoClim database38,39,40. Landscape evolution models were calibrated against paleo-rivers derived from phylogenetic studies9,13 and seismic surveys as well as flooding events retrieved from boreholes data41 in the Malay Basin (box A in Fig. 1b) and sediment accumulation across the Sunda Shelf41,42,43,44 (Methods, Supplementary Table 3). In summary, scenarios 1 and 2 are used to isolate the effects of tectonics, while scenario 4 is used as a control for rainfall regime. Scenarios 3 and 5 are used to evaluate the role of sea level on the flooding history of the shelf.

We analysed the flooding history of Sundaland for the five proposed scenarios (Methods - Supplementary Fig. 1c) and found that except for the case with no subsidence (scenario 1), the shelf was fully exposed prior to 400 ka (MIS 11) in agreement with previous work17,18. At low sea level periods, sediments bypass the shelf and accumulate in more distal offshore regions43 (Fig. 1b), while minimal shelf erosion is driven by river incision. Conversely, during marine transgressions, terrestrial sediments fluxes mostly rest on the shelf (Supplementary Fig. 1d). Over 1 Myr, our simulations predict up to 105 m of deposition (for scenario 1, Supplementary Table 3) in the Gulf of Thailand and around 300 m sediment accumulation on the continental slope facing the southeast South China Sea (scenarios 2, 3 and 5 - Fig. 1b).

After 400 ka, level of marine incursions across the shelf are variable between tested scenarios (Fig. 1c and Supplementary Fig. 1c), with incomplete (<80% of the shelf flooded) flooding events during sea level highstands. For variable tectonic subsidence and even when considering the sea level curve with the highest amplitudes37 (scenario 5), pervasive (more than half the surface of the shelf) flooding prevails for <20% of the time (corresponding to ~80 kyr since MIS 11, Fig. 1c).

We found a clear sensitivity of the flooding history to both tectonic and seal evel conditions. For all scenarios with imposed tectonic forcing (2–5), land bridges between Borneo and surrounding regions are disrupted on several occasions. Conversely, most of the shelf remains subaerial over the last million years, and the terrestrial connection from mainland Asia to Java persists up to the Holocene. Our simulations refine recent work on Sundaland subsidence17 and provide, for the first time, quantitative estimates of the timing and extent of the flooding episodes across the shelf (shaded grey areas in Fig. 2a). From Fig. 2b–d, compiled timing of the divergences for different species groups and between populations varied widely over the last 500 kyr and cannot be attributed to a single event or mechanism such as eustatic sea level fluctuations18. Instead, this suggests that the divergences that did happen during the Late Pleistocene were more likely caused by a combination of different drivers acting on species diversification over tens of thousands of years6. In the following sections, we investigate the role that physiographic and climatic changes might have played on this regional biological diversification and how it affected Late Pleistocene diversification and post-divergence flow from older speciation events.

### Drainage basin reorganisation and river capture

Although paleo-drainage systems have been invoked to explain the spatial distribution of Southeast Asia freshwater aquatic species5,9,13, quantifying how physiographic changes might have affected regional biota has never been done. Here our first step is to model the region’s geomorphological evolution by characterising the main catchments dissecting the shelf and detailing in space and time the major phases of drainage basin reorganisations (Fig. 3).

In all models accounting for subsidence (scenarios 2–5), we found that successive marine regressions and transgressions promote expansions and contractions of individual drainage basins. This is indicated by the robust statistical correlations between sea level fluctuations and catchments characteristics (Pearson’s coefficients in Fig. 3b and Supplementary Fig. 2). We also found a 30–50% decrease in drainage area for both the Siam and Johore rivers during the largest intermittent flooding episodes (Fig. 3b and Supplementary Fig. 2).

In our simulations, the Mekong and Singapore main drainage divides remain stable over time (Fig. 3a). When tectonic forcing is considered, the shelf experiences at least one phase of drainage basin capture of the Johore and Siam catchments (shaded blue areas in Figs. 2a and 3b, Supplementary Figs. 2 and 3). For the model presented in Fig. 3 (scenario 5), the Johore, Siam and West Borneo rivers undergo several phases of drainage reorganisations. First, at around 330 ka, the Johore River captures neighbouring headwaters from the Siam Basin. Following a phase of flooding around 240 ka, the newly formed large drainage basin splits into two distinct catchments which have similar characteristics as the ones before the capture (Fig. 3b). Since 200 ka, the Siam and West Borneo basins also experienced two captures lasting for 60 and 15 kyr, respectively. Finally, we notice a large capture of parts of the Siam headwaters by the East Sunda Basin over the last 30 kyr. This capture lasts until the complete flooding of the Sunda Shelf 10 kyr ago. The simulation from scenario 5, which includes tectonics, sea level and precipitation maps constrained with Earth data, reproduces the number of interpreted marine intrusions41 (Supplementary Table 2). In addition, the simulated cumulative deposit thicknesses from this scenario is in the range of the ones observed across the shelf (Supplementary Table 2). Finally, it also shows the maximum number of reorganisations between the main catchments, making it the best candidate to estimate geomorphological impacts on species migration (Fig. 2a).

These robust simulated geomorphic changes could be critical to some species. For example, lowland freshwater biotas6,9,13,26 (Supplementary Fig. 1a and Table 4) have the ability to migrate across the shelf from mainland Southeast Asia to both the Malay Peninsula, Sumatra, Borneo and Java based on simulated drainage basins evolution and river captures (Fig. 3a and Supplementary Fig. 2). By comparing the major physiographic changes from our simulations with divergence times from different groups of terrestrial organisms over the past 320 kyr (Fig. 2b–d), we found that many of the Johore, Siam and West Borneo reorganisations often predate measured divergence times and population splits for the considered time period.

### Geomorphically derived cost surfaces for lowland evergreen rainforests species

From the landscape evolution models, we assessed the roles of geomorphology on past movements across the exposed Sunda Shelf over the last million years. Models based on structural connectivity can be used to evaluate species dispersal, gene flow, and other ecological functions of a particular landscape35,45,46. Often, a focal-species approach is implemented and considers only one or few species and their specific dispersal thresholds to evaluate the connectivity of a larger suite of species based on different resistance values derived from chosen landscape features. For our large-scale, long-term analysis, this approach may require choosing a large number of focal species to represent diverse habitat types35,46. An alternative technique, which does not depend on specific biological or ecological species characteristics is employed in this study. It consists in generating corridors for structural connectivity using a species-agnostic approach45,47 focused on particular landscape geomorphological attributes (Methods).

Even though species-agnostic methods do not rely on any particular species, they are often used to predict functional connectivity across large regions for multispecies favouring a similar type of habitat46 (e.g., closed forests, edge habitats, open prairies). Some paleo-environmental reconstructions suggest lowland rainforests across Sundaland were extensive during sea level lowstands4,24 while others hypothesize that the Pleistocene land bridges were much drier and substantially covered by grasslands or savannah22,23. Conversely, even in the presence of a transequatorial savannah, studies have proposed that gallery forests along paleo-rivers running through the xeric exposed land could have acted as corridors for forest-dependent taxa5,8,27. As a result, our geomorphically derived cost surfaces have been designed to evaluate connectivity pathways for lowland evergreen rainforests species since these organisms would be impacted by barriers to migration created by flooding episodes during sea level highstands and would also be affected by changes associated to drainage basin reorganisation and river captures under the assumption of limited rainforest habitats. In addition and depending on species, rivers network could be viewed either as corridor48,49 or as barrier50,51 to migration. To account for these two cases, we defined different cost surfaces related to closeness to rivers when evaluating landscape connectivity across Sundaland.

At specific time steps deemed representative of the different stages of Sundaland basins physiography and for the terrestrial domain, we compute cost surfaces from specific geomorphic features derived from our landscape evolution models. As previously mentioned, these cost surfaces are designed to represent long-term suitability of the landscape to the movement of lowland evergreen rainforests species and combine the following three natural landscape features: (1) the normalized landscape elevational connectivity (LEC) metric33 explained in the Methods, (2) the distance to main river systems where rivers are used either as corridors (Fig. 4b) or barriers to species movement, and (3) the average local slope.

We analyse LEC33,52 distribution for the simulated terrestrial part of the landscapes at specific time intervals corresponding to different sea level positions (Fig. 4a). Higher LECs correspond to places where a pool of species adapted to a particular habitat will have the ability to move up and down their initial and preferred habitat elevation and to migrate across the landscape more easily than lower LEC regions. The distribution of normalised LECs is maximum for regions within the Sunda Shelf elevation range. Graphs, in Fig. 4a, indicate that high connectivities are distributed around two regions. During full exposure and partial flooding, we found that LEC peaks around mid-elevation (between 10 and 80 m above sea level—blue line close to zmean in Fig. 4a). Mid-elevations provide maximum surface area (peak in elevation kernel density—red line in Fig. 4a), and species can easily move up or down to accommodate terrestrial habitats waxing and waning. Not surprisingly, higher elevations are characterized by low landscape elevational connectivities (Fig. 4a) which promote isolation and increase endemism29. A second peak is located on the higher elevational range of the Sunda Shelf (zmax). This second peak is within the shelf elevation range but corresponds to regions outside the shelf and relates to flood plains and plateaus found in the northern part of the simulated domain (i.e., Chao Phraya flood plain as well as the Khorat Basin in Thailand and to a lesser extent the Tonle Sap Lake in Cambodia).

For landscape connectivity computation, we assumed that costs related to LEC but also distance to river and local slope are equally weighted with categorical values ranging from 0–20 (Fig. 4b). Our choice of resistance and weight values implies assumptions about species movement ability, and many examples can be found where a natural topographic feature may act as a barrier for one species but not for another. The resistance maps are then combined to produce regular cost surfaces grids (5 km side squared cells), ranked from high cost (>30) that are impermeable to movement (e.g., ocean, remote/close from river drainage network, steep local slope and low elevational connectivity) to low cost (<10) when permeable to movement (e.g. terrestrial, close/remote to rivers, low local slope, and high elevational connectivity).

### Landscape connectivity in the exposed Sunda Shelf

We used circuit theory35 to extract preferential pathways over time. The approach relies on the temporal evolution of the cost surfaces from the species-agnostic approach46 presented in the previous section. For our study, we connected the eight neighbouring cells as an average cost using the pairwise mode35. In each pair of points, one point is given a current source of 1 ampere and the other is connected to the ground35. Effective resistance, voltage and current are then calculated over the landscape between these points. The operation is repeated between all pairs, and the final map is produced by summing current values calculated between all pairs.

The resultant current density maps are a prediction of functional connectivity, whereby high values (high current flow) represent a high probability of use by random walkers (Supplementary Fig. 4). In the case where rivers network represents corridors to migration, high current flows are mostly associated with main river systems dissecting the Sunda Shelf around mean shelf elevation range when the shelf is exposed (Supplementary Fig. 4a, b). During partial or full marine transgression events, the remaining exposed shelf exhibits extensive high flow regions showing that the limited shelf areas in such cases would become preferential corridors for species dispersal (Supplementary Fig. 4c). Highly constrained regions (i.e., high current flow - Supplementary Figs. 4 and 5a) represent corridors where movement would be funnelled. In these corridors, even a small loss of area would disproportionately compromise connectivity and thus may jeopardize species migration35.

To normalise connectivity over time, we compute current flow z-scores (Methods, Fig. 5 and Supplementary Fig. 5b). During full shelf exposure and when considering rivers as corridors to migration, the region is characterised by narrow high z-scores pathways which follow the Johore and Siam main paleo-drainage systems (Fig. 5 at 250 ka - top panel). At 250 ka, these pathways form a continuous migratory corridor across Sundaland and result from the capture of the Siam Basin by the Johore River (Fig. 3a). When rivers are considered as barriers to species movement, Sundaland exhibits patches of high connectivity areas delimited by the major streams flowing across the Johore and Siam basins (Fig. 5 at 250 ka - bottom panel). During partially flooded periods, the shelf is characterised by extended regions of high connectivity limited by the Titiwangsa Mountains on the Malay Peninsula to the west and the shallow marine incursion to the east (Fig. 5 at 400 ka). High connectivity regions become even more restricted when the shelf is nearly fully submerged (Fig. 5 at 13 ka) and represent essential areas for species migration due to habitat reduction and lack of alternative pathways5,53. It is worth mentioning that grey regions in Fig. 5 are not areas that did not support gene flow (e.g. between Java and Borneo) but exhibit low resistance to movement for geomorphology-derived cost surfaces and therefore are not showing up as high current density areas. Our results show that physiographic changes exert a strong control on the location of high connectivity pathways, especially when the shelf is fully exposed. Rivers either trigger new dispersal routes (Fig. 5 top panel) or fragment the environment in multiple habitats (Fig. 5 bottom panel), both of which are critical to biodiversity evolution.

### Hotspot analysis of preferential migration pathways

To explore spatio-temporal migration paths, we measure the degree of spatial clustering in successive current flow fields during periods of marine regression (corresponding to at least 50% of shelf exposure—scenario 5 in Supplementary Fig. 1c). For each map, we compute the Getis-Ord Gi index showing statistically significant connectivity hotspots (Methods, Supplementary Fig. 5c). We then stack all time slices to evaluate persistent corridors over geological time scales (Fig. 6a and c). Hotspots cover about 24% of Sundaland in both cases, with normalized Gi above 0.75 making ~8% of the region (Fig. 6b and d). Each combined hotspots map predicts that hotspots are preferentially located across the paleo Johore (Gulf of Thailand) and Siam basins and are currently between 40 and 80 m below sea level (Fig. 6e). Cold spots (blue regions in Fig. 6a and c) correspond to areas where gene flow is not constrained; hence current density in these regions appears to be diffused. Geomorphology-controlled hotspot regions (Fig. 6a and c) highlight a network of preferential, well-connected biodiversity corridors that favour species migration between mainland Southeast Asia and Sumatra, Borneo and Java.

Our approach provides an innovative way of reframing the dynamics of species evolution on our planet, by comprehensively integrating several components of the Earth system (tectonics, climate, and surface processes) over geological time scales. We found that the fast pace of physiographic changes in the region has the ability to drive diversification processes by facilitating habitat fragmentation and promoting preferential pathways across the shelf. From a methodological standpoint, we show that extracting high connectivity areas from landscape evolution simulations provides an opportunity to evaluate how surface processes influence species movements and drive diversification. Conceptually, we demonstrate that accounting for these dynamics to delineate past biodiversity hotspots is critical to advance our comprehension of species migration both across Southeast Asia and globally and to further our understanding of the dynamics of life on Earth.

## Methods

### Surface processes model and forcing mechanisms

Landscape evolution over the last one million years interval is performed with the open-source modelling code Badlands34. It simulates the evolution of topography induced by sediment erosion, transport, and deposition (Fig. 1a). Amongst the different capabilities available in Badlands, we applied the fluvial incision and hillslope processes, which are described by geomorphic equations and explicitly solved using a finite volume discretisation. In this study, soil properties are assumed to be spatially and temporally uniform over the region, and we do not differentiate between regolith and bedrock. It is worth noting that the role of flexural responses induced by erosion and deposition is also not accounted for. Under these assumptions, the continuity of mass is governed by vertical land motion (U, uplift or subsidence in m/yr), long-term diffusive processes and detachment-limited fluvial runoff-based stream power law:

$$\frac{\partial z}{\partial t}=U+\kappa {\nabla }^{2}z+\epsilon {(PA)}^{m}\nabla {z}^{n}$$
(1)

where z is the surface elevation (m), t is the time (yr), κ is the diffusion coefficient for soil creep34 with different values for terrestrial and marine environments, ϵ is a dimensional coefficient of erodibility of the channel bed, m and n are dimensionless empirically constants, that are set to 0.5 and 1, respectively, and PA is a proxy for water discharge that numerically integrates the total area (A) and precipitation (P) from upstream regions34.

Both κ and ϵ depend on lithology, precipitation, and channel hydraulics and are scale dependent34. All our landscape evolution simulations are running over a triangular irregular network of ~18. e6 km2 with a resolution of ~5 km, and outputs are saved every 1000 yr.

The detachment-limited fluvial runoff-based stream power law is computed with a $${{{{{{{\mathcal{O}}}}}}}}(n)$$-efficient ordering approach54 based on a single-flow-direction approximation where water is routed down the path of the steepest descent. The flow routing algorithm and associated sediment transport from source to sink depend on surface morphology, and sediment deposition occurs under three circumstances: (1) existence of depressions or endorheic basins, (2) if local slope is less than the aggregational slope in land areas and (3) when sediments enter the marine realm34. Submerged sediments are then transported by diffusion processes defined with a constant marine diffusion coefficient34.

All landscape simulations are constrained with different forcing mechanisms, and five scenarios were tested (Supplementary Table 2).

First, we impose precipitation estimates from the PaleoClim database38,39,40. These estimates are products from paleoclimate simulations (coupled atmosphere-ocean general circulation model) downscaled at approximately the same resolution as our landscape model (~5 km at the equator). Annual averages of precipitation rates are then used to provide rainfall trends in our simulations based on the ten specific snapshots available (from the mid-Pliocene warm period to late Holocene and present day). Between two consecutive snapshots, we assume that precipitation remains constant for the considered time interval. For exposed regions that are considered flooded in the PaleoClim database, we define offshore precipitations using a nearest neighbour algorithm where closest precipitation estimates are averaged from PaleoClim inland regions. To evaluate the role of precipitation variability on landscape dynamics, we also run a uniform rainfall scenario (2 m/yr obtained by averaging the annual precipitation rates from the PaleoClim database).

Secondly, the models are forced with sea level fluctuations known to play a major role in the flooding history of the Sunda Shelf11,13,53. Two sea level curves are tested (Supplementary Fig. 1d). To account for the inherent uncertainty in reconstructed sea level variations, we chose a first curve37 obtained from a sea level stack constructed from five to seven individual reconstructions that agrees with isostatically adjusted coral-based sea level estimates at both 125 and 400 ka. The second one is taken from the global sea level curve reconstruction36 based on benthic oxygen isotope data and has been recently used to reconstruct the subsidence history of Sundaland17,18.

The last forcing considered in our study is the tectonic regime. First, we chose to explore a non-tectonic model based on the default assumption of stability for the shelf17. Secondly, we assumed a uniform subsidence rate of −0.25 mm/yr recently derived from a combination of geomorphological observations, coral reef growth numerical simulations and shallow seismic stratigraphy interpretations17. Then, to represent the regional variations in the tectonic regime, we have compiled and digitised a number of calibration points (Supplementary Fig. 1b and Table 1) that were used to produce a subsidence and uplift map by geo-referencing calibration points and available tectonic polygons, and by Gaussian-smoothing and normalising the uplift and subsidence rates between the calibrated range to avoid sharp transitions in regions without observations. The resulting map does not account for fine spatial scale tectonic features such as fault systems43,55 or orogenic and sedimentary related isostatic responses. It rather represents a regional vertical tectonic trend with an overall uplift of Wallacea and NW Borneo regions and long-wavelength subsidence of Sunda Shelf and Singapore Strait17.

### Landscape evolution model calibration

The landscape models start during the Calabrian in the Pleistocene Epoch, one million years before the present. At each time interval, the landscape evolves following Eq. (1) and the surface adjusts under the action of rivers and soil creep (Fig. 1a). In addition to surface changes, we extract morphometric characteristics such as drainage basins extents, river profiles lengths (Fig. 3 and Supplementary Fig. 2), distance between main rivers outlet (Supplementary Fig. 3) and tracks the cumulative erosion and deposition over time (Fig. 1b and Supplementary Fig. 1d).

For model calibration, we perform a series of steps consisting in adjusting the initial elevation and the erosion–deposition parameters (i.e., κ and ϵ in Eq. (1)) to match with different observations.

The initial paleo-surface is obtained by applying the uplift and subsidence rates backwards to calculate the total change in topography for the 1 Myr interval. Then, we test the simulated paleo-river drainages against results from a combination of phylogenic studies9,13 and paleo-river channels and valleys found from seismic and well surveys41,42,44. Iteratively, we modify our paleo-elevation to ensure those main river basins (e.g., Johore, Siam, Mekong, East Sunda) encapsulate the paleo-drainage maps reconstructed using lowland freshwater taxa described in13 (Supplementary Fig. 1a and Table 4) and that the major rivers follow paleo-rivers systems derived from both 2D and 3D seismic interpretations (Fig. 1b).

For surface processes parametrisation, we tested different ranges of diffusion and erodibility coefficients and compared the final sediment accumulation across the Sunda Shelf (Fig. 1b) using estimated deposit thicknesses41,42,43,44. The Sunda Shelf is predominantly experiencing deposition over the past 500 kyr and increases in deposition are positively correlated with periods of sea level rise (i.e., Pearson’s coefficients for correlation with sea level above 80%, Supplementary Fig. 1d). After exploring a range of values, we set κ values to 1. e−2 and 8. e−2 m2/yr for terrestrial and marine environments and ϵ between 2.5 and 7.5 e−8yr−1 for the different scenarios to fit with chosen surveys dataset (Supplementary Table 2 and 3).

Upon uniform subsidence case (−0.25mm/yr), flooding is limited, and the shelf only undergoes two full marine transgressions (>60% of the shelf flooded) around 125 ka and during the last 10–20 kyr (Supplementary Fig. 1c). Upon spatially variable tectonics (non-uniform subsidence), partial flooding events are more pervasive, with higher magnitudes and greater temporal durations. Due to the shallow and flat physiography of Sundaland, we also note that even small increases in sea level amplitudes (<10 m, as bore between our two sea level curves36,37) could trigger up to a 30% increase in shelf area inundation (for instance the red and blue curves in Supplementary Fig. 1c during MIS7, ~200 ka). In addition, we further tested our combination of forcing mechanisms, initial paleo-surface and model parameters by counting the number of flooding events simulated within the Malay Basin on the Sunda Shelf (box A in Fig. 1b). For both sea level curves and variable tectonic map, we recorded at least five marine incursion episodes, similar to the number of events found for the same area in41 based on interpreted facies characteristics site survey borehole data and 3D seismic sections analysis.

### Landscape elevational connectivity

The landscape elevational connectivity (LEC) is a measure of the energy required by a pool of adapted species to move outside its usual niche width, up and down their initial elevation range, to spread and colonise any other habitats33. This metric can predict local species richness (α diversity) obtained from full metacommunity models when applied to real landscapes33. In a recent study52, comparisons between generated Badlands model landscapes using both uniform and orographic precipitation conditions have shown similar results when the metric is applied over geological time scales.

Landscape elevational connectivity at node i (LECi) relies on the following set of equations33:

$$LE{C}_{i}= \, \mathop{\sum }\limits_{j=1}^{N}{C}_{ji}\\ -{{{{\mathrm{ln}}}}}\,{C}_{ji}= \, \frac{1}{2{\sigma }^{2}}\mathop{\min }\limits_{p\in \{j\to i\}}\mathop{\sum }\limits_{r=2}^{L}{({z}_{{k}_{r}}-{z}_{j})}^{2}$$
(2)

where Cji quantifies the closeness between sites j and i with respect to elevational connectivity and measures the cost for a given species adapted to cell j to move to cell i. This cost is a function of elevation and evaluates how often species adapted to the elevation of cell j have to travel outside their optimal species niche width (σ) to reach cell i52. p = [k1, k2, . . . , kL] (with k1 = j and kL = i) are the cells in the path p from j to i.

Solving Eq. (2) relies on Dijkstra’s algorithm accounting for diagonal connectivity between cells and implemented using the scikit-image library56. Even with a balanced parallel implementation, the LECi calculation is slow as a Dijkstra tree for all nodes must be created, and least-cost distances between each node and all others have to be calculated.

In this study, to decrease computation cost, we have modified the initial algorithm52 to limit the number of nodes that needs to be visited when calculating least-cost distances. We gave each cell an initial amount of energy that is consumed as the pool of species adapted to a specific cell is moving across the topography. The energy expenditure depends on (1) the differences in elevation and (2) the distances between neighbouring cells. Here, the initial amount of energy is set to 2000 and we weight the energy dissipation based on travelled distances by 0.4% assuming that species have the ability to move easily over long distances if they stay in their optimal niche range. This assumption is valid in our case as we are simulating the connectivity associated with a large area (~18. e6 km2) and over geological time scale.

### Connectivity mapping from current density modelling

By considering all possible pathways, circuit theory offers a better modelling alternative to least-cost path approaches as it captures all the dynamics at play in travel decisions based on provided resistance maps. We chose Circuitscape35 to model multiple pathways. The software uses random walk and electric current running through a circuit. Electric current runs across our cost surfaces between predefined source points. We position these points across Sundaland (approximately 250 km apart) chosen along the outer margin (100 m above sea-level) of the maximum fully submerged shelf coastline. Circuitscape functions as a graph, where each cell centre is a node connected to neighbouring nodes with links35. The graph is interpreted as a circuit, and links have cost values combining the laws of electricity to estimate species flow. Effective resistance, voltage and current is then calculated over the landscape between the prescribed points with Ohm’s and Kirchhoff’s laws35.

In Ohm’s law, voltage V applied to a resistor R gives the current I through I = V/R; as a result, a lower resistance (e.g., low geomorphic cost) in the landscape will correspond to higher flow of species. Kirchhoff’s law deals with effective resistance; when nodes are connected to several resistors the effective resistance will be the sum of the resistances: R = R1 + R2.

### Connectivity maps statistical analyses

To evaluate the relative contribution of each of the three geomorphological features, we compute current density maps for different levels of Sunda Shelf exposure using different combinations of resistance surfaces. We then tested each feature independently (e.g., only landscape elevational connectivity, only distance to rivers (with rivers as barriers and as corridors) and only local slope), then pairwise costs (e.g., landscape elevational connectivity and distance to rivers, landscape elevational connectivity and local slope, local slope and distance to rivers) and finally all costs combined (Supplementary Fig. 5a).

Qualitative comparisons of current density maps rely on visual interpretation and are often altered by the choice of colour scale used to distinguish regions of high connectivity47. To perform a better evaluation, we first express current flow values (c) as z-scores ($${z}_{sc}=(c-\overline{c})/sd$$) by subtracting the mean current value $$\overline{c}$$ and dividing it by the standard deviation sd. We then used three different thresholds to estimate regions that have positive mean values (i.e., zsc > 0) and values higher than one and two standard deviations (zsc > 1 and 2, respectively, Supplementary Fig. 1b). The approach provides a quantitative assessment of flow maps sensitivity to the chosen resistance maps.

To gain additional insights into the distribution of connectivity regions across the shelf, we also employed a local spatial autocorrelation indicator, namely the Getis-Ord Gi index57. This hotspot analysis method assesses spatial clustering of the obtained current density maps, and the resultant z-scores provide spatially and statistically significant high or low clustered areas. The approach consists in looking at each local current value relative to its neighbouring one. From this spatial analysis, we extract both statistically significant hot and cold spots for each combination of resistance surfaces (Supplementary Fig. 5c). To extract statistically significant and persistent biogeographic connectivity areas across the exposed Sunda Shelf, we then combine all hotspots together and define preferential migration pathways as regions having a positive Gi z-scores for all resistance surfaces combination.

We used the function zscore in the SciPy stats package to obtain the z-scores and the ESDA library for the Gi indicator computation.

### Modelling assumptions and limitations

There are a number of important caveats for interpreting our modelling results.

First, we made several assumptions related to our transient landscape evolution simulations. A single-flow direction algorithm54 was used to simulate temporal changes in river pathways. Recent work58 has shown that this algorithm might lead to numerical diffusion, fast degradation of knickpoints and underestimation of river captures particularly in flat regions. One way to address this would be to use a multiple flow direction method59 which allow for a better representation of flow distribution across the landscape. In this study, we also assumed a uniform and invariant soil erodibility coefficient for the entire domain and a detachment-limited erosion law. Even though the erodibility coefficient was calibrated independently for each simulation (Supplementary Table 3), soil cover and properties vary notably between Borneo, Sumatra, Java and the Malay Peninsula and soil conditions for the exposed sea floor would have changed significantly over successive flooding events12. Badlands software34 allows for multiple erodibility coefficients representing different soil compositions to be defined, and this functionality could be used to evaluate the impact of differential erosion on physiographic changes. Similarly, several transport-limited laws are also available and could be compared against our detachment-limited simulations.

A second set of simplifications lies in the climatic conditions (i.e., rainfall regimes) used to force our simulations. We relied on the PaleoClim database40 which contains nine high-resolution paleoclimate dataset38,39,40 corresponding to specific time periods (4.2–0.3 ka, 8.326–4.2 ka, 11.7–8.326 ka, 12.9–11.7 ka, 14.7–12.9 ka, 17.0–14.7 ka, ca. 130 ka, ca. 787 ka and 3.205 Ma). The climate simulations from which these time periods are extracted do not consider emerged Sunda Shelf for the oldest inter-glacial events which can result in incorrect climatic pattern60. From 0.3–17 ka, precipitation fields in PaleoClim are obtained from the TRaCE21ka transient simulations of the last 21 kyr run with the CCSM3 model40. Although Fordham et al.39 show that precipitation errors range from 10–200% in their modern experiment, the paleoclim dataset provides a statistical downscaling method that includes a bias correction (namely the Change-Factor method, in which the anomaly between the modern simulation and observations is removed from the paleoclimate experiment) allowing the use of the model for paleoclimate studies40. The very same technique is applied for 130 ka and 787 ka fields that were obtained with different GCMs (namely HadCM3 and CCSM2). Given the absence of a million-year long transient climate simulation, we oversimplified the climatic conditions by considering that precipitation distribution and intensity remain constant between two consecutive intervals, generating an artificial stepwise evolution of rainfall through time. To evaluate the sensitivity of physiographic responses on the Sunda Shelf to precipitation, we ran a model with uniform rainfall over 1 Myr (scenario 4). Despite changes in the timing and extent of basins reorganisation (Supplementary Fig. 2 and Fig. 3b), we found limited differences in terms of flooding history and erosion/deposition patterns when compared with scenario 5 (Supplementary Fig. 1c, d and Supplementary Table 2). Recent work60 suggests clear regional responses induced by the emerged Sunda Shelf with seasonal enhancement of moisture convergence and continental precipitation induced by thermal properties of the land surface. This could significantly impact our simulation results. However, and at the time of writing, more continuous high-resolution paleoclimatic simulations considering the shelf as an emerged continental platform were still unavailable. Using high-resolution multi-model outputs would allow to target the uncertainty on climatic maps4 and will surely represent a significant advance for future studies. One approach would have used the orographic rainfall capability61 available in Badlands. The method is better suited to run generic simulations but falls short when applied to real cases as it relies on imposing paleo-environmental boundary conditions (e.g., temporal changes in wind direction and speed, moisture stability frequency or depth of moist layer) difficult to obtain for Earth-like model applied over geological time scales.

Finally, our species-agnostic approach assumes an equally weighted cost between the three considered geomorphic features and does not account for additional factors (temperature, vegetation cover, solar radiation to cite a few), which are all important when assessing landscape connectivity for wildlife. Most importantly, we model connectivity at very large scales (5 km resolution). Often, species are highly influenced by microclimates and small-scale topography47. From our regional-scale simulations and hotspot analysis (Fig. 6), higher resolution models focusing on highly connected regions (across the Gulf of Thailand and Siam basin) could be applied to produce more detailed representations of species migration in the region. In addition, current flow field calculations from Circuitscape35 rely on randomly selecting nodes around the region of interest. For connectivity analysis, we used 33 terrestrial points located around the perimeter of the buffered Sundaland area (white contour line in Fig. 1b). Using a selection of nodes in a buffered region allows to reduce the bias in current density estimates46. However, bias might depend on the buffer size compared to the study area as well as the number of nodes selected46,47. Because of memory limitations and the great number of computed grids used to cover the past 500 kyr, we made a trade-off between buffer size and the number of selected points for pairwise calculations. Additional experiments could possibly be tested to evaluate bias in the proposed connectivity maps potentially using a tilling approach to reduce cell number45.