A spatial analysis of seagrass habitat and community diversity in the Great Barrier Reef World Heritage Area

The Great Barrier Reef World Heritage Area (GBRWHA) in north eastern Australia spans 2500 km of coastline and covers an area of ~ 350,000 km2. It includes one of the world’s largest seagrass resources. To provide a foundation to monitor, establish trends and manage the protection of seagrass meadows in the GBRWHA we quantified potential seagrass community extent using six random forest models that include environmental data and seagrass sampling history. We identified 88,331 km2 of potential seagrass habitat in intertidal and subtidal areas: 1111 km2 in estuaries, 16,276 km2 in coastal areas, and 70,934 km2 in reef areas. Thirty-six seagrass community types were defined by species assemblages within these habitat types using multivariate regression tree models. We show that the structure, location and distribution of the seagrass communities is the result of complex environmental interactions. These environmental conditions include depth, tidal exposure, latitude, current speed, benthic light, proportion of mud in the sediment, water type, water temperature, salinity, and wind speed. Our analysis will underpin spatial planning, can be used in the design of monitoring programs to represent the diversity of seagrass communities and will facilitate our understanding of environmental risk to these habitats.

Coastal marine habitats are some of the most at-risk ecosystems in the world 1 . Proximity to land-based anthropogenic activities exposes these habitats to threats from multiple stressors 2 . The scale and complexity of marine habitats and the high cost of sampling them means the data used to inform management is often less precise than for equivalent terrestrial systems 3 . This is compounded by significant gaps in the data available for important risks and threats, asymmetry in ecological connectivity, a lack of long-term historical data, enormous variations in scale and poorly documented temporal cycles of impacts and recovery [4][5][6] . It is difficult to detect even large changes in status and distribution for some coastal and marine habitats, with the current extent and consistency of spatial coverage of monitoring.
Understanding the factors that support the resilience of important coastal marine habitats at large regional scales is difficult. Challenges include describing diversity, distribution and connectivity within ecosystems, defining desired state and assessing ecosystem condition to understand long-term trends and in evaluating short-term impact-recovery cycles 7 . This is exacerbated in Australia's Great Barrier Reef World Heritage Area (GBRWHA) by the vaguely defined objective and high bar set by the reef management authority in 2015 to "maintain diversity of species and ecological habitats in at least a good condition and with a stable to improving trend" 8 and updated in 2018 to "[facilitate] adaptive management for the Reef that is effective, efficient and evolving" 9 . Compounding these issues in the GBRWHA is the difficulty of developing and implementing appropriate management frameworks to maintain resilience within multiple priority habitats within the time and investment constraints typically faced by marine management agencies.
Spatial data and visualization techniques are important in understanding and communicating options for managing large and complex coastal marine habitats. Habitat and community maps are a frequently used spatial tool that visualise and assist in evaluation of the association of species and communities of interest with key environmental drivers likely to affect those communities 10 . The ability of these maps to capture the diversity Results Seagrass habitat. Using random forest (RF) statistical models we identified approximately 88,321 km 2 of potential seagrass habitat (probability of seagrass present ≥ 0.2) in the GBRWHA (Fig. 1) as a function of 12 environmental variables. This includes 1111 km 2 of potential seagrass habitat in estuaries, 16,276 km 2 in coastal areas, and 70,934 km 2 in reef areas ( Table 1). The performance of RF models varied. Estuary subtidal and intertidal models were the least accurate (72 and 73% overall accuracy, respectively) and reef subtidal and intertidal models were the most accurate (81 and 84% overall accuracy, respectively) (see ESM Appendix S1 in "Supporting Information").
The importance of different environmental variables in predicting seagrass presence differed among the six RF models ( Table 2). In subtidal areas, depth was the most important environmental condition in estuaries and coasts, and the second most important after benthic light (PARb) in reef areas. The least important of the environmental conditions included for predicting seagrass habitat in subtidal coastal and reef areas was water type, and in estuaries dominant sediment type.
In intertidal areas of estuaries, relative tidal exposure was the most important environmental condition for predicting seagrass presence. In contrast, on reefs, tidal exposure was least important and water temperature was most important. For coastal intertidal areas, wind speed was most important, followed by water temperature, salinity, tidal exposure and benthic light ( Table 2).
The relationship between each environmental predictor and the probability of seagrass being present varied among the models (Fig. 2). In subtidal areas in estuaries, the probability of seagrass presence declines in the first 5 m to p < 0.2. In coasts, the probability of seagrass presence reduces over the first 10 m and then stabilises at p ~ 0.35, while in reefs the probability of seagrass increases between 0 and 40 m depth, then declines between 40 and 60 m to p ~ 0.45 (Fig. 2). In coastal and reef intertidal areas, there was a positive relationship between the proportion of mud in the sediment and probability of seagrass, while in coastal and reef subtidal areas, it was a slightly negative relationship (Fig. 2). In reef areas, there was a greater extent of potential seagrass habitat in www.nature.com/scientificreports/ subtidal than in intertidal areas, while in coastal areas the extent of potential seagrass was greater in the intertidal zone (Fig. 2). There were distinct environmental thresholds identified by some models. In reef areas, the long-term annual average temperature of ~ 27 °C was a threshold. Above that temperature, seagrass probability decreased in intertidal areas but increased in subtidal areas (Fig. 2). In both intertidal and subtidal coastal areas, the probability of seagrass increased with water temperature > 26 °C, then declined once waters were > 28 °C (Fig. 2). The probability of seagrass was always greatest where current speeds were lowest and salinity was > 34 PSU. Latitude had a strong effect on the probability of intertidal and subtidal estuarine seagrass, which was most likely to be present at latitudes > 18° S (Fig. 2).   Table 3), based on the results of Multivariate Regression Trees (MRTs). The importance of environmental conditions in structuring the location and spatial extent of these communities also was diverse. Important variables included depth, tidal exposure, latitude, current speed, benthic light, proportion of mud in the sediment, water type, water temperature, salinity and wind speed (Figs. 5, 6, 7). Estuaries have communities with the smallest spatial extent in the GBRWHA. Five estuarine communities had a predicted total area between 4 and 7 km 2 ( Table 3). Estuary communities were predicted by variations in relative tidal exposure, depth and latitude, but not the dominant sediment type (Fig. 5). Hinchinbrook Island in the central GBRWHA was identified as an area of high community diversity and a transition zone between communities for both intertidal and subtidal estuarine communities (Figs. 3 and 5e; Table 3).
Coastal communities occur in a highly dynamic zone between estuaries and reefs. Coastal seagrass communities were predicted by the greatest variety of environmental variables and were the only area where all 12 seagrass species were present (Fig. 6).
Reef communities have a distinct species composition. Species such as Thalassia hemprichii, Cymodocea rotundata and Syringodium isoetifolium often dominate intertidal and shallow subtidal reef communities, while species found in estuarine and coastal habitats (Enhalus acoroides and Zostera capricorni) were not present.
Estuary intertidal. Nine seagrass communities were predicted in the estuarine intertidal model (Figs. 3 and   5, Table 3). Z. capricorni, Halodule uninervis and Halophila ovalis were found in nearly all of these communities with varying frequencies of occurrence (Figs. 4a and 5). The most extensive estuarine intertidal community, EI1, was predicted to cover a total ~ 288 km 2 throughout the GBRWHA in areas associated with extremely infrequent (intertidal extent model (ITEM) = 0) and medium to high tidal exposure (ITEM = 4-9). The remaining estuarine intertidal communities were predicted to occur in distinct latitudinal bands. Four intertidal communities occurred where tidal exposure was very low (ITEM = 1): the Z. capricorni dominated community EI4 in the northern GBRWHA, the H. uninervis dominated community EI2 between Bingil Bay and Hinchinbrook      Table 3).
Coastal subtidal. The coastal subtidal model predicted communities separated by variations in current speed, depth, and the proportion of mud in the sediment (Fig. 6). Four communities were associated with very low current speeds (< 0.11 ms −1 ): the H. uninervis dominated community CS4 in areas where almost no mud (proportion mud < 0.005) is present in the sediment, and the more diverse communities CS5, CS6 and CS7 when some mud is present. Community CS5 is the largest of these low current communities (2938 km 2 ) and predicted www.nature.com/scientificreports/ at depths > 2 m MSL from the Whitsunday Islands north (Fig. 3), with ten species recorded. Communities CS6 and CS7 are predicted to occur throughout the GBRWHA at depths < 2.0 m: CS6 where the proportion of mud is low to moderate (0.005-0.38), and CS7 where the proportion of mud is > 0.38 and the frequency of H. uninervis and Halophila spinulosa is greater than in community CS6 (Fig. 6). Three coastal subtidal communities were predicted to occur throughout the GBRWHA where current speed was > 0.11 ms −1 . The predicted area of these communities was much larger than low current communities, and communities were associated with different depths. Community CS3 in shallow subtidal waters (< 1.6 m MSL) had a species mix similar to coastal intertidal communities. The large (4575 km 2 ) community CS2 at intermediate depths (1.6-12.6 m MSL) was dominated by H. uninervis and H. ovalis but with a much greater prevalence of typical subtidal species such as H. decipiens, H. spinulosa, and Cymodocea serrulata, and very little Z. capricorni. The deep subtidal community CS1 (> 12.6 m) had the largest predicted total area (7589 km 2 ) of all coastal communities. This community was dominated almost entirely by H. decipiens and H. spinulosa, and was one of the few seagrass communities where Halophila tricostata, an endemic seagrass species is present (Figs. 3, 4d and 6, Table 3).

Reef intertidal.
Intertidal reef communities were best predicted by a model that included benthic light, proportion mud, and wind speed (Fig. 7). Three reef intertidal communities were associated with light levels < 13.4 mol photons m −2 day −1 . T. hemprichii was the dominant species in all of these communities (Fig. 4e). H. ovalis occurred in greatest frequency in community RI1, predicted to be most prevalent on fringing reefs around the Palm Island Group in the central GBRWHA and as small patches on reefs north of there when some mud is present in the sediment (Figs. 3, 7, Table 3). The large intertidal communities RI2 (887 km 2 ) and RI3 (608 km 2 ) were associated with very low mud content: RI2 in the northern GBRWHA where wind speed was high (> 6.8 ms −1 ) and RI3 throughout the GBRWHA in calmer conditions (Fig. 7). Communities RI4 and RI5 were associated with high light levels (> 13.4 mol photons m −2 day −1 ; Fig. 7). Both communities were characterised by similar frequencies of the dominant species T. hemprichii, C. rotundata and H. uninervis, but variations in other species depended on the proportion of mud in the sediment with greater species diversity in community RI4 with the addition of mud. Communities RI4 and RI5 were predicted to occur as small patches on reef tops mostly confined to clusters of reefs near Cairns and Princess Charlotte Bay (Fig. 3, Table 3). www.nature.com/scientificreports/ Reef subtidal. The reef subtidal model predicted three reef communities separated by depth and water temperature (Fig. 7). Community RS3 was found at depths < 8 m MSL in the transition zone between intertidal and deep subtidal reef communities. This community was predicted to occur as narrow perimeter bands around reefs and islands throughout the GBRWHA, but particularly on reefs between the Palm Island Group and Bloomfield, and on nearshore reefs north of Princess Charlotte Bay (Figs. 3 and 7, Table 3). Species composition for RS3 was similar to the intertidal reef communities RI4 and RI5: C. rotundata, H. ovalis and H. uninervis frequently occur, but the dominant intertidal species T. hemprichii was replaced by C. serrulata and S. isoetifolium (Fig. 7). The two largest seagrass communities were associated with reef waters > 8 m MSL (Fig. 7). Both deep communities were dominated by a mix of Halophila species, but the frequency of each species varied with water temperature. Community RS1 (19,434 km 2 ) was predicted in warmer waters (> 27.3 °C) north of the Palm Island Group, was dominated by H. decipiens, and the relatively uncommon H. tricostata which was found in this community more often than in any other. The cooler-water subtidal community RS2 (49,052 km 2 ) was predicted south of the Palm Island Group and around a cool-water area in the Lizard Island region of the northern GBRWHA (Figs. 3 and 7). Community RS2 is characterised by a more even mix of Halophila species: H. decipiens, H. ovalis, and H. spinulosa are all common. The rarer species Halophila capricorni is found in this community more often than in any other (Figs. 4f and 7, Table 3).  Figure 5. Multivariate regression tree (MRT) and seagrass communities classified for estuaries using species presence/absence data for (a) subtidal sites and (b) intertidal sites. The number (n) below each community is the count of observations that fall into that community. The histogram shows the frequency of occurrence for each species in that community with the height of the bar representing the frequency that each species was observed in that assemblage. The coloured dots represent unique communities for coast intertidal (EI) 1-9, and coast subtidal (ES) 1-6. The CV Error is the cross-validated relative error.  www.nature.com/scientificreports/

Discussion
We present an approach to define seagrass habitat and communities and how they are distributed over large spatial scales. Our study area is vast and encompasses a multitude of changing physical and biological conditions as well as diverse seagrass species. Despite these challenges (and a dataset sourced from multiple studies that collected observations at different times and spatial scales), our approach provides a statistically valid and transferrable approach for one of the world's most complex seagrass systems. This approach could also be adapted for use at other locations to identify the seagrass community types that make up the seagrass biome, addressing the critical gaps in spatial knowledge needed for global seagrass management and protection 10,52,53 . Our seagrass community model provides a spatial tool needed for understanding seagrass community distribution; a critical pre-requisite for assessing connectivity, resilience, dispersal and restoration decisions 54 . The advantage of constrained clustering techniques, such as the MRTs we applied to define community types, is that each cluster defines an assemblage type, but additionally the environmental values define an associated habitat type for the assemblage. This allowed prediction of assemblage types where the set of environmental values were available but there was no seagrass data. Our analysis provides a basis for management authorities to identify likely seagrass communities within environmental management plans that are inadequately protected or exposed to environmental threats 10 .
Creating spatially expansive models at the scale of the GBRWHA constrained the analysis to the environmental data also available at that scale, so our models are unlikely to account for smaller-scale localised differences in seagrass communities and their drivers. For example, our community models demonstrated that very small shifts in depth and tidal exposure can lead to significant shifts in seagrass communities. Depth and tidal exposure were two of the highest resolution environmental data sets we used in our models, but many of the environmental predictors we used represent larger-scale spatial patterns modelled at a 1 km grid scale (e.g., benthic light, wind speed, current speed, water temperature, salinity). This means that variables such as modelled benthic light should be reliable at large-scale seasonal-average spatial patterns, but smaller-scale variations in benthic light that depend on small-scale variations in bathymetric depth and sediment distribution will not be accurately predicted.
Seagrass distribution and communities are shaped by multiple environmental complexities. Large spatial trends were present. Seagrass communities in the northern GBRWHA extend from the coast to the edge of the continental shelf, while in the inshore central region bands where no seagrass was present ran parallel to the coast, and in the south, there are large inter-reef areas with little or no likelihood of seagrasses. Our seagrass habitat model and community classification provide an important tool to make informed decisions at an appropriate scale during marine spatial planning, management, monitoring design, threat mitigation, and habitat restoration. Zoning in the Great Barrier Reef Marine Park (GBRMP) to protect biodiversity and regulate human activities has been in place since 1981 when the region became the world's first coral reef ecosystem to achieve world heritage status. The GBRMP was rezoned in 2004 55 and while that represented best practice at the time, rezoning identified only five seagrass bioregions where seagrass was a key element (http:// www. gbrmpa. gov. au/__ data/ assets/ pdf_ file/ 0011/ 17300/ nonre ef-biore gions-in-the-gbrmp-and-gbrwh. pdf). We now provide those previously missed details of the complexity of seagrass communities, particularly for coastal waters and estuaries, that can be used to inform future management of the GBRWHA. This allows a more nuanced approach to management as management responses and spatial planning decisions can be tailored to take into account the diversity of seagrass habitats.
Estuaries and rivers adjacent to the GBRWHA are small by international standards, but their flow and sediment load variability in a monsoon-influenced coastline makes them both key attributes of the GBRWHA and sources of environmental variability 56 . Our estuarine models highlight the paucity of environmental data for estuaries at this scale-our models were limited to just three environmental variables in estuaries but still predicted 15 communities, with latitude acting as a proxy for other complex environmental conditions. We identified substantial spatial complexity in community types. Some extend throughout the GBRWHA while some communities are small and localised. We focus on seagrass habitats, but these overlap spatially with other environmental values such as populations of sea turtles and dugong that suggest priority areas for management protection such as the Hinchinbrook Island region where extensive and diverse seagrass communities were predicted.
While we provide a framework to understand spatial patterns in seagrass communities it remains open to management authorities to evaluate a level of concern for protection. Some communities have distinct assemblages, while others are differentiated by only slight changes in relative occurrence of species but have been identified as distinct communities in our analysis because species and environmental features were different. Sensitivity of seagrass communities to environmental threats can also be ameliorated by resilience inferred by connectivity, which is not included in this model but likely to have an influence at scales of hundreds of kilometres 54,57 .
To design a marine protection system for all seagrass communities, these spatial complexities and differing sensitivities to environmental conditions would need to be adapted into a broader marine protection approach. We are now able to better evaluate environmental risk to seagrass habitats from natural processes and anthropogenic activity and to assess environmental threats that affect seagrass at a large scale including cyclones and floods 4,36,58 , climate change 59,60 , and more localised impacts such as coastal development, dredging, and oil spills 61,62 . Spatial assessments of cumulative anthropogenic risk to seagrasses in the GBRWHA found risks tend to accumulate where ports and coastal development pressures overlay with inputs from coastal catchment runoff 38 . Our community model provides a tool to identify communities that occur in these risk hotspots and may be vulnerable due to their lack of representation outside of high-risk areas.
Our classification of seagrass communities provides a spatially comprehensive tool that can be used to assess management actions for seagrass throughout the GBRWHA. The varied environmental conditions that determine www.nature.com/scientificreports/ seagrass community diversity demonstrate that reporting trends at large scales and with coarse partitions such as "coastal" fail to accurately account for changes at the more precise community level. Our research emphasizes that the GBRWHA seagrasses do not function as a single entity and reporting of trends will provide a more accurate picture of change if it is reported at the scale of the seagrass communities. We identify communities in locations where data is poor and could be improved by a more comprehensive monitoring program. Our method has potential global utility as an approach to create informative models based on data that is scalable and easily available as it requires only presence/absence data for seagrass species.

Methods
Study area. The Great Barrier Reef in tropical north-eastern Australia is one the world's most extensive coral reef structures, an environment home to a globally outstanding and biodiverse marine ecosystem. A Marine Park was proclaimed by the Australian Federal government in 1975 (Great Barrier Reef Marine Park Act 1975) and the region was inscribed as a World Heritage Area in 1981. Within the GBRWHA more than 2500 individual reefs and over 900 islands protect an extensive and mostly shallow inter-reef lagoon that extends across the continental shelf. The GBRWHA covers an area of around 350,000 km 2 in north-east Australia, including 2500 km of coastline and a shelf that extends up to 250 km offshore. Our study area covers the continental shelf region of the GBRWHA and extends into and includes adjacent estuaries along the mainland Australian coast (Fig. 8).
The large size of the GBRWHA means there is large variation in geography, topography, and environmental conditions Fig. 8; ESM Appendix S4 63,64 . The northern GBRWHA (< ~ 16° S) is characterised by a narrow shelf, shallow inter-reef waters (< 30 m), elongate reefs, warmer water temperatures, high benthic light, low current speed, and low salinity (ESM Appendix S4). The central GBRWHA (~ 16° and 20° S) is characterised by lower reef density, intermediate inter-reef depths (> 40 m), low current speed, low salinity and low wind speed (ESM Appendix S4). The southern GBRWHA (> ~ 20°S) is characterised by high reef density areas of deep water (down to 140 m) across a wide continental shelf, high salinity, high current speed, cooler water and lower mud content in the sediment ESM Appendix S4 63,64 . There are also major regional differences along the coast adjacent to the reef lagoon in land type, climate, and land use, e.g., tropical and subtropical, wet and dry tropics, pristine, sugar cane or cattle-dominated catchments 42,65 . Adding to this complexity is a coastal mountain range that in the northern GBRWHA runs close to the coast with mostly small watersheds and short rivers compared with the central and southern GBRWHA. The human population is concentrated in coastal communities of the central and southern coast. Threats and risk to coastal seagrass integrate these broad trends 38 . and (5) frequent (at least annual) and spatially intense mapping and monitoring in and adjacent to six Queensland ports, that in some cases extend back more than 25 years. The seagrass data set is publicly available as a single GIS file of 81,387 survey sites (https:// doi. org/ 10. 25909/ y1yk-9w85) 44 . The data includes presence/absence of seagrass (Fig. 8) Jacobs, 2006). The impact of a series of intense tropical cyclones with high rainfall and flooding that severely reduced seagrass presence and altered species composition along the southern two-thirds of the GBRWHA was quantified by ports long-term monitoring programs between 2009 and 2012 and inshore seagrass monitoring. Recovery has been variable among locations 4,35,58,66 . Previous seagrass community analysis demonstrates species assemblages during and after major disturbance events are disproportionally dominated by colonising species, leading to an overly simplistic community classification relative to the long-term seagrass diversity 7 . Because of this, ports longterm monitoring data was excluded from our analysis if overall seagrass condition was classed as poor or very poor in the annual report card for each of those locations [67][68][69][70][71][72] . This avoided a bias in the analysis from defining seagrass community types based on data that overwhelmingly represented a significant environmental impact, rather than average environmental conditions. This process was applied only to ports data because these were the only locations in the central and southern GBRWHA where sampling occurred during 2009-2012. Data was also restricted to the seagrass growing season (August-January; included approximately 80% of available site data) to reduce the likelihood of including times and sites in the analysis where seagrass was absent due to the seasonal and ephemeral nature of some species. This is particularly important for deep-water Halophila communities, which may be present only as a seed bank through the colder months of the year 37,73 . Models and environmental predictors. We fitted random forest and multivariate regression tree models to six subsets of the data: estuary intertidal, estuary subtidal, coast intertidal, coast subtidal, reef intertidal and reef subtidal, each resulting in a different model fit (ESM Appendix S4; Fig. S1a). This separation was used as it accounted for variation in availability of environmental data (e.g. lack of environmental data for estuaries), variation in seagrass sampling history and intensity (e.g. a gradient in sampling intensity that decreases with distance from the Australian mainland coast, and with depth), and well-established general differences in seagrass species distributions (e.g. intertidal versus subtidal species) 7,22 .
For each site we chose spatial data that either directly influenced the category for spatial analysis (e.g. exposure for intertidal habitat and depth for subtidal habitat) or was known from previous studies to influence seagrass distribution and/or community structure (e.g. benthic light). These data sets were used to quantify environmental conditions at each site, and to assign each site to one of the six models (estuarine intertidal, estuarine subtidal, coastal intertidal, coastal subtidal, reef intertidal, reef subtidal) ( Table 4; ESM Appendix S2).

Statistical analysis.
We conducted a two-step analysis to (1) define potential seagrass habitat, then (2) classify seagrass communities within that habitat. To define potential seagrass habitat, we used the machine learning technique random forest (RF) to examine the probability of seagrass occurrence irrespective of species. The RF method is a non-parametric tree-based analysis. It generates multiple classification or regression trees, each calibrated on a bootstrap sample of the original data using a subset of the predictor variables, with the model prediction calculated as the average value over the predictions of all the trees in the forest 76 . The accuracy of the RF model depends on the predictive power of each tree and the correlation between trees 76 .
Random forest models were implemented using the randomForest package 77 in R version 4.0.2 78 . For each RF model, seagrass presence/absence (1/0) data was randomly partitioned into training (80% of data set) and testing (remaining 20%) datasets (Table 5). For each model, we set the number of classification trees (ntree) to 500. The optimal number of predictor variables to be randomly selected at each node (mtry) was determined by tuning each model ( Table 5). The importance of predictor variables was assessed using the mean decrease in accuracy. Variables included in each model were plotted using the plotmo package 79 where, for each plot, the background variables are held fixed at their median values (calculated from the training data). Each model was validated using a confusion matrix derived from the independent validation (test) data, using the caret package in R 80 . A confusion matrix shows agreement and disagreement in a table format, with predicted values forming the matrix columns and observed values forming the rows. From this matrix we calculated the total accuracy (i.e., percentage of sites correctly classified) and accuracy for each class (present/absent).
To avoid the issue of multicollinearity of environmental variables in our models we calculated variance inflation factors (VIF) for all environmental variables. Highly correlated variables (VIF > 3) were removed prior to analysis following the conservative threshold recommended by Zuur et al. 81 : tidal range (collinear with water temperature) was not included in any model; apart from that, collinearity and the variables excluded differed among models. Variables available in the RF models were: The six RF models were used to generate rasters of seagrass predicted probability across the entire GBRWHA. We created this by predicting each model onto a raster stack of data corresponding to the same predictors included in each model using the raster package in R 82 . Raster data sets within each stack were predicted to the  www.nature.com/scientificreports/ 30 m resolution of the depth model 45 using the sf package in R 83 . In this analysis we defined potential seagrass habitat as regions where the RF models predicted a probability ≥ 0.2 rather than a more conservative ≥ 0.5 used previously 22 . This previous level of probability was chosen to identify seagrass distributions for management and zoning advice. In the present analysis it was important to choose a level of probability to be more inclusive and that ensured that no seagrass area was missed in an exercise designed to identify distinct community types. The ≥ 0.2 threshold appropriately captures the extent of seagrass for a communities analysis and avoids excluding areas where seagrass could occur. This threshold still has the effect of excluding from the analysis areas classed as unlikely seagrass habitat and where seagrass has never been previously recorded. Our second analysis defined seagrass communities within predicted potential seagrass habitat using multivariate regression trees (MRT) in the R package mvpart 84 (available in archive form on CRAN at https:// cran.r-proje ct. org). MRT are a constrained analysis that repeatedly splits the assembled data (in this case a matrix of presence/ absence data for each species as the response variable for each model) into groups that represent a distinct community composition defined by threshold values of associated environmental variables (De'ath 2002) 85 . Using species presence/absence from each site resulted in the community type being defined based on the frequency of occurrence of each species. For each MRT we used the same environmental predictors as for the RF models. We excluded sites from unlikely seagrass habitat to allow the six MRT models to identify patterns in seagrass species presence without being diluted by zeros due to seagrass absence ( Table 5). As the aim was to cluster the sites spatially, we did not include 'year' as a factor in the model. Instead, the intent was to categorise where each seagrass species is found, on average, through time.
We selected the best MRT for each habitat model using the cross-validated relative error (CVRE). The CVRE represents the capacity of the tree to predict community composition for new sites. Calculation of the CVRE is based on a repeated random sub-sampling cross-validation, where number of cross-validations can be specified and controls the proportional allocation of sites to training and testing (evaluation) sets and this is repeated ten times, where each time site data are randomly allocated to train and test groups. We designated 80% of our data for model training and 20% for testing. The CVRE is the average test error over the chosen number of crossvalidations. We repeated the cross-validation 100 times to stabilise variability in CVRE estimates due to the random cross-validation; the mvpart package then estimates the mean CVRE, where 0 indicates perfect prediction and ≥ 1 indicates no predictive power. The depth (number of splits) in the trees was selected by finding that depth that fitted the best predictive tree in the cross-validation.
All maps were created in ArcMap 10.8 (ESRI, Redlands, CA). The area of each seagrass probability level from the RF analysis, and each seagrass community from the MRT analysis, was determined by multiplying the pixel size (900 m 2 ) by the total number of pixels for each category of interest in each raster of the modelled predictions for seagrass probability and community type.

Data availability
The seagrass site data used in this analysis is available at: https:// doi. org/ 10. 25909/ y1yk-9w85. The predicted probability of seagrass presence across the Great Barrier Reef World Heritage Area and adjacent estuaries (random forest models) is available at: https:// doi. org/ 10. 26274/ J6B6-PH79. The predicted distribution of seagrass communities across the Great Barrier Reef World Heritage Area and adjacent estuaries (MRT models) is available at: https:// doi. org/ 10. 26274/ NRE6-YS16.