The ProkaBioDen database, a global database of benthic prokaryotic biomasses and densities in the marine realm

Benthic prokaryotes, i.e., Bacteria and Archaea, numerically dominate the marine benthos. They play major roles in element cycles and in heterotrophic, chemoautotrophic, and phototrophic carbon production. To understand how anthropogenic disturbances and climate change might affect these processes, better estimates of prokaryotic biomasses and densities are required. Hence, I developed the ProkaBioDen database, the largest open-access database of benthic prokaryotic biomasses and densities in marine surface sediments. In total, the database comprises 1,089 georeferenced benthic prokaryotic biomass records and 1,875 density records extracted from 85 and 112 studies, respectively. I identified all references following the procedures for systematic reviews and meta-analyses and report prokaryotic biomasses as g C cm−3 sediment, g C g−1 sediment, and g C m−2. Density records are presented as cell cm−3 sediment, cell g−1 sediment/sulfide/vent precipitate, and cell m−2. This database should serve as a reference to close sampling gaps in the future.

Depending on the environmental conditions, prokaryotes can be involved in the transfer of organic matter to higher trophic levels: In the oxygen minimum zone of the Arabian Sea (Indian Ocean), the transfer of labelled carbon taken up by prokaryotes to their metazoan meiobenthic and macrobenthic consumers is relatively inefficient 6. In comparison, for an intertidal area of the Scheldt estuary (North Sea), a model combined with a pulse-chase tracer experiment estimated that meiobenthos grazed 3% and macrobenthos consumed 24% of the prokaryotic carbon production 23. In the deep-sea sediments of the Fram Strait (N Atlantic) and of the Clarion-Clipperton Fracture Zone (equatorial Pacific), however, no direct transfer of labelled carbon from prokaryotes to metazoan meiobenthos or macrobenthos was detected 8,24.

Methods
In March and June 2020, I compiled the "ProkaBio" part of the "ProkaBioDen database" following the principles of the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses" (PRISMA) 40. In the first step, the so-called "Identification" step, I identified 1,553 peer-reviewed articles in the Web of Science using the keywords "microb* biomass benth*", "benthic prokaryotic biomass", "benth* bacteria* biomass marin*", and "Archaea biomass marin*". Additionally, I found 138 publications in other sources, such as the PANGAEA® Data Publisher (https://www.pangaea.de/) and peer-reviewed publications known to me. After removing duplicates, I screened the titles and abstracts of 1,299 studies (Table 1; Fig. 1a; "Screening" step) and excluded 967 studies that did not report prokaryotic biomasses. In step 3, the so-called "Eligibility" step, I excluded a further 249 studies in total because they did not present prokaryotic biomasses in the marine surface sediment in standardizable units, i.e., in g C cm−3 wet sediment, g C g−1 wet sediment, g C g−1 dry sediment, or g C m−2. Furthermore, several studies lacked detailed geographical information about the sampling stations or did not present primary research. Additional reasons for study exclusion were presenting prokaryotic biomasses for specific taxa instead […] Fig. 1a).
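The search keywords use `*` as a truncation wildcard. The sketch below approximates how such a keyword string selects a title or abstract during local re-screening, assuming (i) a trailing `*` matches zero or more further letters, (ii) space-separated terms are combined with an implicit AND, and (iii) single-quoted terms, as in "'benthic bacteria' abundance marin*", act as one phrase. The helpers `query_to_patterns` and `matches` are hypothetical; this is not the Web of Science search engine, which applies its own query processing.

```python
import re


def query_to_patterns(query: str) -> list[re.Pattern]:
    """Compile one case-insensitive regex per term or quoted phrase."""
    patterns = []
    # Each match is either a 'quoted phrase' (group 1) or a bare term (group 2)
    for phrase, word in re.findall(r"'([^']+)'|(\S+)", query):
        term = phrase or word
        # Escape regex metacharacters, then restore '*' as "zero or more letters"
        rx = re.escape(term).replace(r"\*", r"\w*")
        patterns.append(re.compile(rf"\b{rx}\b", re.IGNORECASE))
    return patterns


def matches(query: str, text: str) -> bool:
    """Implicit AND: every term or phrase must occur somewhere in the text."""
    return all(p.search(text) for p in query_to_patterns(query))


title = "Bacterial biomass in benthic marine sediments of the Arctic"
print(matches("benth* bacteria* biomass marin*", title))  # True
print(matches("Archaea biomass marin*", title))           # False
```

Treating a quoted pair as one regex means "benthic bacteria" only matches when the two words are adjacent, which is why a title like "Bacteria abundance in benthic sediments" would not be selected by the phrase query.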
In March and June 2020, I established the "ProkaDen" part of the database, which consists of records of total prokaryotic density as well as of the densities of Bacteria and of Archaea. Following the PRISMA approach 40, I searched the Web of Science using the keywords "marin* microb* abundance benth*", "'benthic bacteria' abundance marin*", "prokaryotic abundance marin*", "prokaryotic density marin*", "Archaea density abundance marin*", "Archaea density marin*", "Archaea abundance marin* benth*", "Crenarchaea density abundance marin*", "Crenarchaea density marin*", "Crenarchaea abundance marin* benth*", "Euryarchaea density abundance marin*", and "Euryarchaea abundance marin* benth*" and found 1,204 peer-reviewed articles (Fig. 1b). I added 171 further studies that I was aware of, which resulted in 1,104 studies after removing duplicates. In step 2 of the PRISMA approach ("Screening" step), I excluded 752 studies because they did not report benthic prokaryotic densities. In the "Eligibility" step, I omitted a further 239 studies because they did not present surface sediment prokaryotic densities or because they presented densities of only a subset of prokaryotic taxa instead of all prokaryotes. I also removed studies that showed prokaryotic densities in poor-quality figures impeding data extraction and studies that listed densities that could not be converted to the common density units cell cm−3 dry sediment, cell cm−3 wet sediment, cell g−1 dry sediment, cell g−1 dry sulfide, cell g−1 vent precipitate, cell g−1 wet sediment, or cell m−2. Finally, I excluded experimental and culture studies as well as publications that I could not access. In the last step, I included 112 studies in the global benthic prokaryotic density database, from which I extracted 1,875 georeferenced benthic prokaryotic density records (Table 1, Fig. 1b).

Table 2. References of biomass conversion factors to calculate prokaryotic biomass from prokaryotic densities measured with epifluorescence microscopy or with laser confocal scanning microscopy, from phospholipid-derived fatty acid (PLFA) concentrations, and from adenosine triphosphate (ATP) concentrations.
www.nature.com/scientificdata
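The density units accepted in the "Eligibility" step form a closed list, so extracted records can be checked against it automatically. The sketch below assumes unit strings were transcribed as free text; the normalization rules (typographic minus to ASCII hyphen, "cells" to "cell", whitespace collapsed) and the names `normalize_unit` and `has_accepted_unit` are hypothetical choices for illustration.

```python
import re

# The seven density units listed in the "Eligibility" step, in a normalized spelling
ACCEPTED_DENSITY_UNITS = {
    "cell cm-3 dry sediment",
    "cell cm-3 wet sediment",
    "cell g-1 dry sediment",
    "cell g-1 dry sulfide",
    "cell g-1 vent precipitate",
    "cell g-1 wet sediment",
    "cell m-2",
}


def normalize_unit(unit: str) -> str:
    """Bring a transcribed unit string into the spelling used above."""
    u = unit.lower().replace("\u2212", "-")   # typographic minus -> ASCII hyphen
    u = re.sub(r"\s*-\s*(\d)", r"-\1", u)     # "cm - 3" or "cm -3" -> "cm-3"
    u = re.sub(r"\bcells\b", "cell", u)       # plural -> singular
    return " ".join(u.split())                # collapse runs of whitespace


def has_accepted_unit(unit: str) -> bool:
    """True if the unit, once normalized, is one of the accepted density units."""
    return normalize_unit(unit) in ACCEPTED_DENSITY_UNITS


print(has_accepted_unit("cells cm \u22123 wet sediment"))  # True
print(has_accepted_unit("cells mL-1 pore water"))          # False
```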
In 51% of the prokaryotic biomass studies and 34% of the prokaryotic density studies, the authors of the original publications did not report exact geographical coordinates (latitude, longitude) of the sampling stations. In these cases, I approximated the sampling locations using Google Maps based on maps from the original publications and indicated this with the label "approximated location".
Prokaryotic biomasses were often not measured directly but derived from concentrations of extracted bacterial adenosine triphosphate (ATP) or phospholipid-derived fatty acids (PLFA), or from measured prokaryotic densities. The authors of the original publications then converted these data to prokaryotic biomasses using conversion factors (Table 2).
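For the density route, the conversion amounts to multiplying cell density by a carbon content per cell. The sketch below shows this arithmetic for a single record. The carbon content of 20 fg C cell−1 is an assumed, illustrative value, not one of the factors referenced in Table 2, and `biomass_from_density` is a hypothetical helper; conversions from ATP or PLFA concentrations broadly follow the same multiplicative pattern with their own factors.

```python
FG_TO_G = 1e-15  # 1 femtogram = 1e-15 g


def biomass_from_density(cells_per_cm3: float, fg_c_per_cell: float) -> float:
    """Prokaryotic biomass (g C cm-3) from density (cell cm-3).

    biomass = density x carbon content per cell, with the carbon content
    given in fg C cell-1 and converted to g C cell-1.
    """
    if cells_per_cm3 < 0 or fg_c_per_cell <= 0:
        raise ValueError("density must be >= 0 and carbon content > 0")
    return cells_per_cm3 * fg_c_per_cell * FG_TO_G


# Hypothetical input: 1e9 cells cm-3 and an assumed 20 fg C cell-1
print(biomass_from_density(1e9, 20.0))  # about 2e-05 g C cm-3
```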
When prokaryotic biomasses and densities were not reported in the text or in tables but only shown in figures, I extracted the data using ImageJ 41.
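Extracting values from a figure amounts to measuring the pixel coordinates of data points and mapping them onto the axis scale with two reference ticks of known value. The sketch below shows this generic two-point calibration for linear and logarithmic axes; it illustrates the technique and is not ImageJ's own implementation, and `pixel_to_value` and its parameters are hypothetical names.

```python
import math


def pixel_to_value(px: float, px1: float, v1: float, px2: float, v2: float,
                   log_axis: bool = False) -> float:
    """Map pixel coordinate px to a data value via two axis ticks (px1 -> v1, px2 -> v2)."""
    if px1 == px2:
        raise ValueError("reference ticks must have different pixel positions")
    if log_axis:
        # Interpolate in log10 space, e.g. for densities plotted on a log scale
        v1, v2 = math.log10(v1), math.log10(v2)
    value = v1 + (px - px1) * (v2 - v1) / (px2 - px1)
    return 10 ** value if log_axis else value


# y axis: tick "0" at pixel 400, tick "10" at pixel 100 (pixels grow downwards)
print(pixel_to_value(250, 400, 0.0, 100, 10.0))  # 5.0
# log axis: 1e8 at pixel 100, 1e10 at pixel 300
print(pixel_to_value(200, 100, 1e8, 300, 1e10, log_axis=True))
```

Because the mapping uses the reference ticks' own pixel positions, it handles y axes whose pixel coordinates increase downwards without any special case.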

Data records
The "ProkaBioDen database" is an open-access database in the Dryad Digital Repository and contains two .txt files, i.e., the List of studies for ProkaBio database and the List of studies for ProkaDen database, and two .xlsx files, i.e., the file ProkaBio database and the file ProkaDen database 38. The List of studies files report, in alphabetical order, all studies that I identified in the "Identification" step of the systematic review after eliminating duplicates (prokaryotic biomasses: 1,300 studies; prokaryotic densities: 1,104 studies). Each data entry in the "ProkaBioDen database" includes information about the region and the ocean where the samples were taken, the geographical location (latitude, longitude), the water depth (in m), and the depth range after Dunne et al. 42. Dunne et al. classified the ocean into near-shore areas from 0 to 50 m water depth, continental shelves from >50 to 200 m water depth, continental slopes from >200 to 2,000 m water depth, and continental rises and abyssal plains at >2,000 m water depth. The database includes biomass and density records for individual sediment layers, together with the thickness of each layer and its upper and lower boundaries when a sediment core was sliced horizontally, as well as biomass and density records for vertically integrated sediment profiles. Additionally, the database contains information about sediment type, median sediment grain size (µm), sediment density (g cm−3), and porosity, and whether prokaryotic densities were reported for total prokaryotes, Bacteria, or Archaea.
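The depth-range classes after Dunne et al. and the vertical integration of layer records can both be expressed in a few lines. The sketch below assumes contiguous, horizontally sliced layers with biomass given per cm3 of sediment; integration multiplies each layer's concentration by its thickness and by 10,000 cm2 per m2. Records reported per gram of sediment would first need converting with sediment density and porosity, which this sketch does not attempt. The function names and class labels are hypothetical.

```python
CM2_PER_M2 = 1e4  # 1 m^2 = 10,000 cm^2


def depth_range(water_depth_m: float) -> str:
    """Depth class after Dunne et al., using the boundaries given in the text."""
    if water_depth_m < 0:
        raise ValueError("water depth must be non-negative")
    if water_depth_m <= 50:
        return "near-shore"
    if water_depth_m <= 200:
        return "continental shelf"
    if water_depth_m <= 2000:
        return "continental slope"
    return "continental rise and abyssal plain"


def integrate_profile(layers: list[tuple[float, float, float]]) -> float:
    """Vertically integrate layer records to g C m-2.

    Each layer is (upper boundary cm, lower boundary cm, biomass g C cm-3).
    """
    total = 0.0
    for upper_cm, lower_cm, g_c_per_cm3 in layers:
        thickness_cm = lower_cm - upper_cm
        if thickness_cm <= 0:
            raise ValueError("lower boundary must lie below the upper boundary")
        # g C cm-3 x cm = g C cm-2; x 1e4 cm2 m-2 = g C m-2
        total += g_c_per_cm3 * thickness_cm * CM2_PER_M2
    return total


print(depth_range(35), "|", depth_range(4200))
# Hypothetical 0-1 cm and 1-3 cm layers
print(integrate_profile([(0, 1, 2e-5), (1, 3, 1e-5)]))  # about 0.4 g C m-2
```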

Technical Validation
In the database, 40% of the benthic prokaryotic biomass samples originated in the Mediterranean Sea, 34% in the Atlantic, and 11% in the Arctic Ocean (Fig. 2). Most benthic prokaryotic density samples were taken in the Mediterranean Sea (42%), the Atlantic (27%), and the Arctic Ocean (15%), and benthic Bacteria and Archaea densities were also mainly sampled in the Mediterranean Sea (Bacteria: 62%, Archaea: 65%) and the Atlantic Ocean (Bacteria: 15%, Archaea: 17%) (Fig. 3). Both benthic prokaryotic biomasses and densities were predominantly sampled in the northern hemisphere north of 1°N (biomass: 87%, density: 90%), whereas the southern hemisphere was strongly undersampled (Fig. 4 left panel and Fig. 5). Few samples were collected in the Indian Ocean (biomass: 7%, density: 1%) and the Southern Ocean (biomass: 2%, density: 1%). Hence, benthic prokaryote samples are biased towards the northern hemisphere and particularly towards the Mediterranean Sea and the North Atlantic.
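Regional and hemispheric shares like those above are proportions of record counts. The sketch below computes them for a list of records; the field names "ocean" and "latitude" and the four toy records are invented for illustration and do not come from the database, while the 1°N threshold mirrors the text.

```python
from collections import Counter


def region_shares(records: list[dict]) -> dict[str, float]:
    """Percentage of records per ocean region."""
    counts = Counter(r["ocean"] for r in records)
    return {ocean: 100 * n / len(records) for ocean, n in counts.items()}


def share_north_of(records: list[dict], min_lat: float = 1.0) -> float:
    """Percentage of records sampled north of min_lat (degrees north)."""
    north = sum(1 for r in records if r["latitude"] > min_lat)
    return 100 * north / len(records)


# Toy records for illustration only
toy = [
    {"ocean": "Mediterranean Sea", "latitude": 40.5},
    {"ocean": "Mediterranean Sea", "latitude": 35.2},
    {"ocean": "Atlantic Ocean", "latitude": -10.0},
    {"ocean": "Arctic Ocean", "latitude": 79.1},
]
print(region_shares(toy))   # {'Mediterranean Sea': 50.0, 'Atlantic Ocean': 25.0, 'Arctic Ocean': 25.0}
print(share_north_of(toy))  # 75.0
```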
Benthic prokaryotic biomasses were mostly quantified in near-shore areas at <50 m water depth (54% of all samples, Fig. 4 right panel), which cover only 2% of the global ocean floor 42. In comparison, only 15% of all