Background & Summary

Tidal marshes are vegetated wetlands formed by herbaceous and woody vascular plants that are present on many of the world’s depositional coastlines and are regularly inundated by tides1. While tidal marshes naturally change in extent, anthropogenic pressures (sometimes operating over thousands of years2) have greatly accelerated this change in recent decades, degrading their condition globally. Tidal marshes have received considerable attention recently as blue carbon ecosystems, one of a group of ecosystems that have the capacity to capture and store large amounts of soil organic carbon (SOC) over hundreds to thousands of years3. Alongside mangroves and seagrasses, they accumulate organic carbon most effectively in their soils where decomposition is slow due to anoxic waterlogged conditions4,5. Precise and consistent global-scale information on tidal marsh extent, distribution change, or other ecosystem functions is lacking, highlighting a critical research gap given their potential value for climate change mitigation6,7.

Assessments of tidal marsh change have found that previous decades were characterised by extensive losses, with marshes disappearing at a rate of 1–2% per year8, leading to a total loss of 67% of tidal marshes over recent centuries9. In the period 2000 to 2019, one study estimated a global tidal marsh loss rate of 0.28% per year10, while another suggested that marshes have actually marginally increased globally in extent, including vegetation expansion onto existing tidal flats11. A new 10 m resolution global map of tidal marsh extent estimates that the ecosystem occupies 52,880 km2 (95% confidence intervals: 32,000 to 59,800 km2)12, similar to previous estimates13. These ecosystems continue to be at risk due to direct anthropogenic impacts such as activities that lead to destruction, disturbance, or degradation, sea-level rise, and changes in climate14, which negatively impact their ability to retain their stored SOC or accumulate more SOC via carbon sequestration and sediment accretion15,16.

The quantification of organic carbon stocks in tidal marsh soils provides critical information to promote the protection, management, and restoration of these natural carbon sinks. Such information, and derived models, may support blue carbon assessments, and enable the incorporation of tidal marsh ecosystems into climate change mitigation and adaptation strategies and policies, including the Nationally Determined Contributions that form a core component of global climate actions. Previous global estimates have averaged values from a few select studies4,17, or relied on global datasets that are biased towards farmland soils10,18. There is a clear need for a centralised tidal marsh soil carbon dataset, and to this end the Coastal Carbon Research Coordination Network (CCRCN)19 has been collating and publishing core-level datasets. These data are mostly from the United States (U.S.) and have been used to model soil carbon of the Conterminous U.S. tidal marshes20. Here, we expand on these efforts by collating site- and core-level tidal marsh SOC data distributed globally.

We collected data from 99 tidal marsh SOC peer-reviewed and unpublished studies and reformatted the data into a common structure using the R computing environment21. Studies were initially identified through a search of the peer-reviewed literature, and data were extracted directly from papers, from data repositories, or through personal communication from authors (Fig. 1). The tidal Marsh Soil Organic Carbon (MarSOC) database22 contains 17,454 data points, each with geographic coordinates, collection year, soil depth, and site information (country, site name). The database includes data from 29 countries with extensive tidal marsh coverage, and over 40% of the data are soil samples deeper than 30 cm. Using these data and the data from the CCRCN19, we provide a first-order estimate for a globally representative SOC stock value for tidal marshes to 30 cm depth of 79.2 ± 38.1 Mg C ha−1 (median ± median absolute deviation; n = 26,349), and to 1 m depth of 231 ± 134 Mg C ha−1 (n = 39,126). Because marshes can be shallower or deeper than this, region-specific studies should develop their own stock estimates. However, using this value we can estimate an average of 1.22 ± 0.20 Pg C stored in tidal marshes in the upper metre of soil globally.

Fig. 1

Workflow of the literature search, abstract screening, and dataset generation process for the MarSOC dataset.

Generally, carbon content is quantified using an elemental analyzer, but these analyses can incur high costs, particularly in countries where laboratories with this specialised equipment are not easily accessible. Many studies therefore record only soil organic matter (SOM) content based on Loss On Ignition (LOI), and a number of equations have been developed to calculate soil organic carbon (SOC) content from SOM. For example, Craft and collaborators measured both SOM and SOC from marshes in North Carolina, U.S., and developed an equation for this relationship23, which has been used extensively by researchers globally to predict SOC from SOM in wetlands. However, for mangroves, this relationship can change according to the coastal environmental setting24, and several studies have generated their own site-specific equations25,26,27,28,29. For marshes in the continental U.S., Holmquist and collaborators20 developed their own equation using over 1,500 points from 6 studies. Ouyang and Lee30 developed a global conversion equation, but they used only a subset of points from each of 11 studies in 4 countries (n = 344). A more globally generalizable equation for tidal marshes is needed for large-scale analyses or as a starting point for new study sites. Within our database there are 17 studies with measurements of both SOM (via LOI) and SOC (via elemental analysis), allowing us to present this relationship. We therefore combined as many globally distributed data points as possible from our database (n = 1,470) and the CCRCN database (n = 3,604) to create a universal conversion equation that spans the diversity of marsh soil types (e.g., minerogenic and organogenic settings) reported in the current literature.

The MarSOC dataset22 described here can be used for new global or large-scale estimates of tidal marsh soil organic carbon, and also provides a foundation for additional data collection and collaboration to improve estimates of soil organic carbon in tidal marshes, especially from underrepresented areas. The dataset is released for noncommercial use only and is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). All publications that use this database are encouraged to appropriately cite the data and this paper.

Methods

Literature search

We compiled the MarSOC dataset from a systematic review of the literature. On 19 January 2022, we searched the title, abstract and keywords in both Scopus and the Web of Science (WoS) All Databases using a naive search string: ((“soil C” OR “soil carbon” OR “soil inorganic carbon” OR “soil organic carbon”) OR (“soil carbon sequestrat*” OR “soil carbon stabiliz*” OR “soil carbon stock*”)) AND (“tidal marsh*” OR “salt marsh*” OR “saltmarsh*”). The search identified 259 studies from Scopus and 331 from the WoS (Fig. 1).

We used the litsearchr R package31 to broaden our search terms using keyword co-occurrence networks31. All steps can be viewed in the published code with the dataset22. This resulted in our final search string: (“blue carbon” OR “carbon accumulation” OR “carbon cycle” OR “carbon dioxide” OR “carbon sequestration” OR “carbon stock*” OR “carbon stor*” OR “organic carbon” OR “organic matter” OR “soil carbon” OR “soil organic carbon” OR “soil organic matter” OR “soil respiration” OR “carbon content” OR “carbon dynamic*” OR “carbon pool*”) AND (“coastal marsh*” OR “coastal salt marsh*” OR “salt marsh*” OR “tidal marsh*” OR “tidal salt marsh*” OR “marsh ecosystem*” OR “marsh soil*” OR “saltmarsh*”).

On 28 January 2022, we searched both Scopus and the WoS All Databases using the final search string mentioned above within the University of Cambridge library account, which includes the following databases: Web of Science Core Collection, BIOSIS Previews, BIOSIS Citation Index, Current Contents Connect, MEDLINE, Zoological Record, Data Citation Index, KCI-Korean Journal Database, SciELO Citation Index, Russian Science Citation Index, and Derwent Innovations Index. This procedure aimed to ensure the inclusion of articles published in languages other than English. We retrieved 4,035 items from WoS and 2,428 from Scopus. We deduplicated the results, giving a total of 4,317 references (Fig. 1), which is tenfold higher than the original naive search.

Inclusion criteria

The initial and retained articles, with inclusion criteria and additional labels, can be found on our sysrev projects, an open and online tool to screen and label abstracts. In the first sysrev project, we screened the title and abstracts of the 4,317 references to identify those that mentioned soil organic matter or organic carbon in tidal marsh studies. We excluded studies that did not meet these criteria, and separated these into SOC measured in mudflats or seagrasses, other C cycling variables measured in tidal marshes, or studies generally not in tidal marshes or without mention of SOC data. Included studies were labelled as reviews (studies with a general scope, studies with potentially large datasets), modelling (studies with raw data that was used for modelling purposes in that study), and raw data (studies that may contain raw data). Studies could have two tags, such as review studies that included raw data. All studies labelled as “reviews” were retained for full-text assessment, from which we were able to include 9 datasets from tables or the supplementary material. Some of the studies labelled as “raw data” were easily identified as having extractable data (n = 23), such as published datasets (Fig. 1).

To reduce the number of studies requiring full-text screening, from the initial studies tagged “raw data” (n = 1,168), we focused on geographical locations from which we had few datasets (i.e., outside the U.S., U.K., China, and Australia). A second abstract screening with more specific labels was then conducted. We labelled abstracts to identify studies by continent, presence of SOC or SOM data, and inclusion of primary data. A total of 69 studies with primary data in data-poor regions were identified. From these, 21 datasets were extracted or provided by the lead authors on the corresponding papers (Fig. 1).

We searched the SEANOE, PANGAEA, CIFOR, and Marine Scotland Data repositories and found 5 additional studies that fit the inclusion criteria (Fig. 1). We also included data compiled previously for a separate project, which included 12 core-level and 10 site-level published studies. Correspondence with experts in the field led to the inclusion of 10 additional datasets from published studies and 2 from studies that are unpublished or in preparation (see Supplementary Information section I for corresponding sampling methodologies). Finally, data from 7 recent studies published beyond the search date of January 2022 were included. Datasets already held in the Coastal Carbon Research Coordination Network were not included, as our data compilation is intended to be complementary to that research database. The final extracted datasets were from 99 studies (Fig. 1).
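The study counts reported across the preceding paragraphs can be tallied to confirm that they add up to the stated 99 studies. The short Python sketch below does this; the grouping labels are descriptive names chosen for this illustration, and the assumption that these nine counts are the non-overlapping components of the total is ours.

```python
# Counts of included studies by source, as reported in the text.
# The labels are descriptive names chosen for this sketch only.
study_sources = {
    "datasets from studies labelled as reviews": 9,
    "raw-data studies with readily extractable data": 23,
    "data-poor regions (second abstract screening)": 21,
    "data repositories": 5,
    "previous compilation, core-level": 12,
    "previous compilation, site-level": 10,
    "expert correspondence, published": 10,
    "expert correspondence, unpublished or in preparation": 2,
    "published after the January 2022 search date": 7,
}

# Sum the components to compare against the reported total.
total_studies = sum(study_sources.values())
print(total_studies)
```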

Data acquisition

From the identified studies, when possible, we extracted data (SOM and/or SOC) from the publications’ tables, figures, or supplementary information. When not available, we contacted authors and asked them to contribute their datasets. We downloaded published datasets in repositories from their respective online sources. In total, we extracted data (from tables, supplementary material) from 22 studies, received data via email from 33 studies, and included 22 published datasets from a variety of general (Dataverse, DRYAD, FigShare, Mendeley Data), subject-specific (SEANOE, PANGAEA), and country-specific (Environmental Information Data Centre (EIDC), Marine Scotland Data, USGS) repositories. Finally, we appended data from 22 studies from a previous data compilation effort.

In total, we compiled data from 2,329 unique locations (Fig. 2). To be as comprehensive as possible, we included data recorded at the core-level (n = 72 studies25,26,27,28,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97), site-level (n = 22 studies7,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118), and from reviews (n = 5 studies119,120,121,122,123). This data identification is included in the Data_type column, while the unique ID for each core or plot sampled is reported in the Site_name column (Table 1).

Fig. 2

Sample locations coloured by data type (core-level purple, review turquoise, site-level yellow).

Table 1 Description of variables contained in the dataset.

For each data point (i.e., each row), the data include the upper and lower depth of the soil sample, with SOC percent and/or SOM percent (Fig. 3), alongside the method used to determine these values (elemental analyser, Loss-On-Ignition, etc.). Each data point in our dataset also includes geographical coordinates, with a corresponding accuracy flag. If available in the original datasets, dry bulk density (85% of the data) and nitrogen content (15% of the data) were also included. There is also information on the year the sample was collected and the site name and country where the sample was collected, with the name of any finer scale administrative unit if applicable.

Fig. 3

Distribution of data stored in this MarSOC database across all soil depths.

Additionally, the data collated here provide the opportunity to calculate an updated and more globally representative average value for the soil organic carbon stock to 1 m depth in tidal marshes. To do so, our database was used with the data from the CCRCN19 to maximise the number of points for this calculation (n = 38,945). Using the following equation (Eq. 1), we calculated soil organic carbon density for the subset of soil samples which recorded both a SOC content and measured dry bulk density value (Fig. 3d).

$$\mathrm{SOC\ density}\ [\mathrm{g\ cm^{-3}}] = \mathrm{dry\ bulk\ density}\ [\mathrm{g\ cm^{-3}}] \times \left(\mathrm{SOC}\ [\%] / 100\right)$$
(1)
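As an illustration of Eq. 1, the following Python sketch computes SOC density for a single sample. It is not the authors' R code; the function name and the example values (a dry bulk density of 0.8 g cm−3 and an SOC content of 5%) are hypothetical.

```python
def soc_density(dry_bulk_density_g_cm3, soc_percent):
    """Eq. 1: SOC density (g cm^-3) = dry bulk density (g cm^-3) * SOC (%) / 100."""
    if dry_bulk_density_g_cm3 < 0:
        raise ValueError("dry bulk density must be non-negative")
    if not 0 <= soc_percent <= 100:
        raise ValueError("SOC must be a percentage between 0 and 100")
    return dry_bulk_density_g_cm3 * soc_percent / 100.0


# Hypothetical sample: 0.8 g cm^-3 dry bulk density, 5 % SOC.
example_density = soc_density(0.8, 5.0)
print(example_density)
```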

We assigned each SOC density sample, according to its horizon midpoint, to one of the following soil layers: 0–15, 15–30, 30–50, and 50–100 cm (Figure S1). Using all of the measured SOC density values within each of these soil layers (that is, depth interval bins), we calculated the median SOC density value for each layer, along with its median absolute deviation. The median was chosen as opposed to the mean due to the skewness of the data (Fig. 3d). We then multiplied this value by the corresponding thickness of each layer, and by 100 to convert grams per square centimetre to megagrams per hectare, to get the median SOC stock for each layer (Eq. 2).

$$\mathrm{SOC\ stock}\ [\mathrm{Mg\ ha^{-1}}] = \mathrm{SOC\ density}\ [\mathrm{g\ cm^{-3}}] \times \mathrm{horizon\ thickness}\ [\mathrm{cm}] \times 100$$
(2)
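The layer-wise calculation described above can be sketched in Python as follows. This is a simplified illustration rather than the published R script. It makes three assumptions of our own: a midpoint that falls exactly on a layer boundary is assigned to the deeper layer, the median absolute deviation is unscaled, and the sample values are hypothetical.

```python
from statistics import median

# Soil layers (top, bottom) in cm, as listed in the text.
LAYERS_CM = [(0, 15), (15, 30), (30, 50), (50, 100)]


def layer_for(upper_cm, lower_cm):
    """Return the layer containing the horizon midpoint, or None if outside 0-100 cm."""
    midpoint = (upper_cm + lower_cm) / 2
    for top, bottom in LAYERS_CM:
        # Assumption: boundaries are top-inclusive, bottom-exclusive.
        if top <= midpoint < bottom:
            return (top, bottom)
    return None


def layer_stocks(samples):
    """samples: iterable of (upper_cm, lower_cm, soc_density_g_cm3).

    Returns {layer: (median stock, MAD of stock)} in Mg ha^-1, following Eq. 2.
    """
    by_layer = {layer: [] for layer in LAYERS_CM}
    for upper_cm, lower_cm, density in samples:
        layer = layer_for(upper_cm, lower_cm)
        if layer is not None:
            by_layer[layer].append(density)

    stocks = {}
    for (top, bottom), densities in by_layer.items():
        if not densities:
            continue
        med = median(densities)
        mad = median([abs(d - med) for d in densities])  # unscaled MAD
        thickness_cm = bottom - top
        # Factor of 100 converts g cm^-2 to Mg ha^-1.
        stocks[(top, bottom)] = (med * thickness_cm * 100, mad * thickness_cm * 100)
    return stocks


# Hypothetical samples: (upper depth, lower depth, SOC density).
samples = [(0, 10, 0.03), (5, 15, 0.05), (0, 15, 0.04), (15, 30, 0.02)]
stocks = layer_stocks(samples)
stock_to_30cm = stocks[(0, 15)][0] + stocks[(15, 30)][0]
print(stocks, stock_to_30cm)
```

Because every sample contributes to the median of its own layer, cores that do not reach the full depth can still be used without extrapolation. How the deviations of individual layers are combined into the deviation of the summed stock is not specified in the text, so this sketch reports them per layer only.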

We then summed the estimated stocks of the upper two layers to get the total stock to 30 cm depth, and those of all four layers to get the total stock to 1 m depth. The final estimated value of SOC stock to 30 cm was 79.2 ± 38.1 Mg ha−1 (n = 26,239). With an additional 7,204 points located between 30 cm and 50 cm and 5,502 points between 50 and 100 cm, we calculated the stock to 1 m in tidal marsh soils as 231 ± 134 Mg ha−1 (median ± median absolute deviation). By using SOC density values from each sample to estimate the density for their respective soil layer (i.e., 0–15, 15–30, 30–50, and 50–100 cm), all data points were used in the stock calculation without needing to extrapolate. To estimate global tidal marsh soil carbon storage, this stock value can be multiplied by the tidal marsh area estimate of 52,880 km2 (95% CI: 32,000 to 59,800 km2) from the recent globally consistent extent map12. This gives a global estimate of tidal marsh soil carbon of around 1.22 ± 0.20 Pg C in the top metre of soil, which is lower than previous estimates17. However, we acknowledge that this is a general estimate, and that a study using machine learning and environmental predictors to estimate SOC at a finer scale would give a more appropriate and accurate spatial representation of SOC stocks across the world’s coastal marshes. We also acknowledge that tidal marsh soils in different regions may be shallower or deeper than 1 m, so we recommend that regional studies develop their own carbon stock estimates.
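The unit conversion behind the global figure can be checked with a short Python sketch. It reproduces only the central estimate from the reported stock and area. The range computed from the area confidence interval is included for illustration and is not the ± 0.20 Pg C uncertainty reported above, whose derivation is not described here. The constant and function names are ours.

```python
HA_PER_KM2 = 100.0   # 1 km^2 = 100 ha
MG_PER_PG = 1e9      # 1 Pg = 10^15 g = 10^9 Mg


def global_stock_pg(stock_mg_per_ha, area_km2):
    """Scale a per-hectare stock (Mg C ha^-1) to a total stock (Pg C) over an area."""
    return stock_mg_per_ha * area_km2 * HA_PER_KM2 / MG_PER_PG


STOCK_TO_1M = 231.0  # Mg C ha^-1, median stock to 1 m reported in the text

central = global_stock_pg(STOCK_TO_1M, 52_880)    # mapped tidal marsh extent
low_area = global_stock_pg(STOCK_TO_1M, 32_000)   # lower bound of area 95% CI
high_area = global_stock_pg(STOCK_TO_1M, 59_800)  # upper bound of area 95% CI

print(round(central, 2), round(low_area, 2), round(high_area, 2))
```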

Global conversion factor

To create our conversion factor between SOC and SOM, we identified 17 studies in our dataset in which both SOM and SOC were measured. Although data from the CCRCN are not included in our final dataset, we did include all data with both SOM and SOC measurements from the CCRCN19 to create the conversion factor equation. Thus, we included 18 studies from the CCRCN124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141 and 17 studies from our dataset to investigate the SOM to SOC relationship (Fig. 4). A further 10 studies, in which the authors developed their own conversion factor to convert SOM to SOC (Fig. 5), were selected for comparison.

Fig. 4

Data points with both soil organic matter and soil organic carbon values, used to calculate the conversion equation for SOM to SOC (solid black line, with prediction intervals in grey). Data extracted from the Coastal Carbon Research Coordination Network (CCRCN)19 are shown in circles, and values from this dataset are shown in triangles.

Fig. 5

Soil organic matter to soil organic carbon conversion relationships developed by different sources, along with the region, site, or species zone from which these were developed (equations detailed in Table S1). Our conversion equation is a solid black line, with prediction intervals in grey.

To model SOC from SOM, we used the nls() and lmer() functions in R to fit linear and quadratic models with the intercept fixed at 0, and included the study ID as a random effect. Based on Akaike’s Information Criterion, which weighs model parsimony against explanatory power, the best-fitting model was a quadratic function with study ID as a random effect (Eq. 3; Fig. 4; R2 = 0.949, n = 5,074).

$$\mathrm{SOC}\ [\%] = \left(0.000683 \pm 0.00563\right) \times \mathrm{SOM}\ [\%]^{2} + \left(0.410 \pm 0\right) \times \mathrm{SOM}\ [\%]$$
(3)
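For readers who wish to apply Eq. 3, a minimal Python sketch of the conversion is given below. It uses only the central coefficient values from the equation and does not propagate their uncertainties. The function and constant names are ours, and the input check on the 0 to 100% range is an added assumption.

```python
QUADRATIC_COEF = 0.000683  # coefficient of SOM^2 in Eq. 3
LINEAR_COEF = 0.410        # coefficient of SOM in Eq. 3


def som_to_soc(som_percent):
    """Estimate SOC (%) from SOM (%) with the central coefficients of Eq. 3."""
    if not 0 <= som_percent <= 100:
        raise ValueError("SOM must be a percentage between 0 and 100")
    return QUADRATIC_COEF * som_percent ** 2 + LINEAR_COEF * som_percent


# Hypothetical example: a sample with 10 % SOM.
soc_at_10 = som_to_soc(10.0)
print(soc_at_10)
```

With the intercept fixed at 0, a sample with no organic matter is assigned no organic carbon, and the quadratic term lets the SOC fraction of SOM rise slightly in more organic soils.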

This can be compared to 16 studies from our literature search that used a variety of conversion factors (Table S1). We also fitted a quadratic model to each of the individual studies presented in Fig. 4, used to generate the general equation (Figure S2). We found that many of the study-specific quadratic equations were significantly different from the overall equation (Table S2), showing that the relationship between SOM and SOC is highly variable among studies. While site-specific conversion equations will always be desirable, our general model captures a range of coastal tidal marsh types distributed across climatic, oceanographic, and geomorphic gradients, with applications to regional or larger-scale studies. Our equation lies amongst the other conversion equations (Fig. 5), and estimates less organic carbon from organic matter than the commonly used Craft23 equation or the second equation presented in the Blue Carbon Initiative handbook142, which used data from Maine. Our dataset can be used to analyse the uncertainty in how these different equations affect the calculation of a C stock for soils. For example, the uncertainty may be different for varying levels of soil organic matter, or for marshes with different coastal geomorphologies or soil type, which may influence the relationship between SOM and SOC24,143. It can also be used to estimate soil carbon stocks in tidal marshes for varying soil depths and using different methods, such as extrapolating cores to 1 m or confining the analysis to the topsoil. Finally, the data can serve as a basis for future work integrating other soil variables, such as soil total inorganic carbon, particulate organic and inorganic carbon, as well as isotope measurements.
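As a rough illustration of a study-specific fit, the Python sketch below estimates a quadratic through the origin by ordinary least squares, solving the two normal equations directly. It is a simplified stand-in, not the authors' method: the published analysis used nls() and lmer() in R and included study ID as a random effect, which this sketch omits. The function name and the synthetic data are hypothetical.

```python
def fit_quadratic_through_origin(som_values, soc_values):
    """Least-squares fit of SOC = a * SOM^2 + b * SOM (intercept fixed at 0).

    Returns (a, b). Solves the 2x2 normal equations:
        a * sum(x^4) + b * sum(x^3) = sum(x^2 * y)
        a * sum(x^3) + b * sum(x^2) = sum(x * y)
    """
    if len(som_values) != len(soc_values) or len(som_values) < 2:
        raise ValueError("need at least two paired observations")
    s4 = sum(x ** 4 for x in som_values)
    s3 = sum(x ** 3 for x in som_values)
    s2 = sum(x ** 2 for x in som_values)
    sx2y = sum(x ** 2 * y for x, y in zip(som_values, soc_values))
    sxy = sum(x * y for x, y in zip(som_values, soc_values))
    determinant = s4 * s2 - s3 ** 2
    if determinant == 0:
        raise ValueError("SOM values do not identify both coefficients")
    a = (sx2y * s2 - s3 * sxy) / determinant
    b = (s4 * sxy - s3 * sx2y) / determinant
    return a, b


# Synthetic, noise-free data generated from known coefficients (hypothetical).
som = [5.0, 10.0, 20.0, 40.0, 80.0]
soc = [0.001 * x ** 2 + 0.4 * x for x in som]
a_hat, b_hat = fit_quadratic_through_origin(som, soc)
print(a_hat, b_hat)
```

Comparing coefficients fitted this way for individual studies gives a quick sense of how far a site departs from a general equation, although a formal comparison would need the mixed-effects framework described above.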

Data Records

The data and code used in the methods described above are archived in a Zenodo repository22. This is a static copy of the data peer-reviewed in 2023, which is a release from the dynamic GitHub repository https://github.com/Tania-Maxwell/MarSOC-Dataset. The data are currently being incorporated into the CCRCN Atlas.

The repository is formatted in the following structure:

  • Maxwell_MarSOC_dataset.csv: .csv file containing the final dataset. The data structure is described in the metadata file. It contains 17,454 records distributed amongst 29 countries.

  • Maxwell_MarSOC_dataset_metadata.csv: .csv file containing the main data file metadata (equivalent to Table 1).

  • data_paper/: folder containing the list of studies included in the dataset, as well as figures for this data paper (generated from the following R script: ‘reports/04_data_process/scripts/04_data-paper_data_clean.R’).

  • reports/01_litsearchr/: folder containing .bib files with references from the original naive search, a .Rmd document describing the litsearchr analysis using nodes to go from the naive search to the final search string, and the .bib files from this final search, which were then imported into sysrev for abstract screening.

  • reports/02_sysrev/: folder with .csv files exported from sysrev after abstract screening. These files contain the included studies with their various labels.

  • reports/03_data_format/: folder containing all original data, associated scripts, and exported data.

  • reports/04_data_process/: folder containing data processing scripts to bind and clean the exported data, as well as a script testing the different models for predicting soil organic carbon from organic matter and finalising the equation using all available data. A script testing and removing outliers is also included.
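As a starting point for working with the main data file, the Python sketch below counts records by the Data_type column described in the Methods. The file name and column name come from the text above. The category labels and site names in the in-memory stand-in are hypothetical placeholders, since the exact values stored in the file are not listed here.

```python
import csv
import io
from collections import Counter


def count_by_data_type(csv_file):
    """Count rows per Data_type value in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Data_type"] for row in reader)


# In-memory stand-in with hypothetical values, so the sketch runs without
# the real file. To use the published data instead, open the file:
#   with open("Maxwell_MarSOC_dataset.csv", newline="") as f:
#       counts = count_by_data_type(f)
stand_in = io.StringIO(
    "Site_name,Data_type\n"
    "site_a_core_1,Core-level\n"
    "site_a_core_2,Core-level\n"
    "site_b,Site-level\n"
)
counts = count_by_data_type(stand_in)
print(counts)
```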

Technical Validation

For consistency and to validate the inclusion criteria, the literature search and screening were conducted in a two-part process that included a repeated evaluation by different co-authors. All SOC and SOM values were extracted from numerical sources (tables, supplementary tables, or published datasets). The distribution of all quantitative variables was verified visually by two authors, and the following outliers were flagged: 1) SOC, SOM, and dry bulk density values greater than the sum of 2.2 times the interquartile range and the 95% quantile144 of this dataset combined with the CCRCN dataset, 2) SOM values greater than 100, and 3) SOC values greater than SOM values, which may have been due to incomplete removal of water prior to LOI or due to incomplete removal of carbonates prior to SOC measurements. These values were removed from all calculations but remain in the dataset with an outlier flag in the “Notes” column of the dataset. In total, less than 1% of the data were removed in this way. These operations and the distribution of all variables (Fig. 3) can be found in the script 02_outliers.R.
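The three outlier rules can be sketched in Python as follows. This is an illustration, not a port of 02_outliers.R. The quantile definition (linear interpolation between order statistics), the function names, and the flag wording are assumptions of ours, and the exact text written to the “Notes” column is not reproduced.

```python
def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    position = (len(sorted_values) - 1) * p
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def upper_threshold(values):
    """Rule 1 threshold: 95% quantile plus 2.2 times the interquartile range."""
    ordered = sorted(values)
    iqr = quantile(ordered, 0.75) - quantile(ordered, 0.25)
    return quantile(ordered, 0.95) + 2.2 * iqr


def flag_record(soc, som, soc_threshold, som_threshold):
    """Return a list of outlier flags for one record; None marks a missing value."""
    flags = []
    if soc is not None and soc > soc_threshold:
        flags.append("SOC above threshold")
    if som is not None and som > som_threshold:
        flags.append("SOM above threshold")
    if som is not None and som > 100:
        flags.append("SOM greater than 100")
    if soc is not None and som is not None and soc > som:
        flags.append("SOC greater than SOM")
    return flags


# Hypothetical values 1..101, chosen so the quantiles fall on data points.
example_values = list(range(1, 102))
example_threshold = upper_threshold(example_values)
example_flags = flag_record(soc=50.0, som=40.0, soc_threshold=60.0, som_threshold=90.0)
print(example_threshold, example_flags)
```

Flagging rather than deleting records, as done in the dataset, lets users apply their own criteria while still seeing which values were excluded from the calculations reported here.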

Usage Notes

This data descriptor manuscript and dataset were peer reviewed in 2023 based on a targeted search of the data available at the time. This compilation of 99 published and unpublished tidal marsh soil carbon datasets can be used to answer multiple research questions. First, the MarSOC dataset can be used to support large-scale models of soil carbon in tidal marshes and improve global estimates of carbon stored in these coastal ecosystems. Different drivers of soil carbon at the landscape-scale can be investigated, such as the influence of coastal geomorphology. In addition, our database serves as a baseline for targeted ecosystem design outcomes and restoration of degraded tidal marshes.