Global sea-surface iodide observations, 1967–2018

The marine iodine cycle has significant impacts on air quality and atmospheric chemistry. Specifically, the reaction of iodide with ozone in the top few micrometres of the surface ocean is an important sink for tropospheric ozone (a pollutant gas) and the dominant source of reactive iodine to the atmosphere. Sea surface iodide parameterisations are now being implemented in air quality models, but these are currently a major source of uncertainty. Relatively little observational data is available to estimate the global surface iodide concentrations, and this data has not hitherto been openly available in a collated, digital form. Here we present all available sea surface (<20 m depth) iodide observations. The dataset includes values digitised from published manuscripts, published and unpublished data supplied directly by the originators, and data obtained from repositories. It contains 1342 data points, and spans latitudes from 70°S to 68°N, representing all major basins. The data may be used to model sea surface iodide concentrations or as a reference for future observations.


Background & Summary
There has recently been a resurgence of interest in the marine iodine cycle, reflecting its involvement in a diverse range of processes, from influencing air quality (e.g. 1) to recording ocean deoxygenation in sediments (e.g. 2). Iodine is a redox-active element that is present in seawater in two main forms, iodide (I⁻) and iodate (IO₃⁻). Sea-to-air transfer is the dominant source of iodine to the atmosphere, where it is subject to atmospheric processing prior to deposition back to the sea or onto land. Iodine is an essential nutrient for many organisms, including humans; in humans, deficiency leads to goitre and cretinism, and is the leading cause of preventable mental retardation globally 3. Iodine radionuclides are also released to the oceans by anthropogenic activities, where they are subject to the same processes of biogeochemical cycling and volatilisation as the naturally occurring stable isotope 4. Despite the wide-ranging impacts of iodine biogeochemistry, the distribution of iodine species in the oceans remains relatively poorly understood. Here we present an updated compilation of all currently available sea surface iodide concentrations. The data set is specifically intended to inform studies of the sea-air exchange of iodine species, but may also be of use in improving understanding of the marine iodine cycle more generally.
The reaction of iodide with ozone at the surface of the ocean has been established as an important sink for ozone, thought to be responsible for around one third of the total ozone loss by dry deposition 5. The reaction liberates reactive iodine compounds to the atmosphere, which in turn contribute to further ozone removal processes. Gas phase reactions involving iodine are estimated to account for up to 15% of tropospheric ozone losses 6. To incorporate this chemistry, global and regional air quality and atmospheric chemistry models have begun to include predicted sea-surface iodide fields derived from parameterisations (e.g. 5,7-9). However, current sea surface iodide parameterisations are known to have biases 10, are subject to substantial uncertainty 6, and do not take advantage of recent and substantial increases in the number of available observations (e.g. 11).
The observational data underpinning iodide parameterisations is sparse, and has hitherto not been publicly available in a collated form. In many cases, iodide observations are not readily accessible in a digital form (i.e. are only presented in graphical format). To facilitate the development and validation of improved sea surface iodide parameterisations, we have compiled all available sea surface iodide observations. The dataset is an extended version of that used in our earlier publication 12 , in which we described the large scale sea surface iodide distribution and presented correlations between iodide and other oceanographic variables, but did not publish the observations themselves. The dataset we now present incorporates more than 400 new observations (see Fig. 1), including new, basin scale transects from the Indian Ocean (currently unpublished) and the tropical eastern Pacific 11 , both of which were previously undersampled 12 . This new extended dataset is freely available via the British Oceanographic Data Centre (BODC; http://doi.org/czhx) 13 .
We anticipate that the primary use of the dataset will be modelling of ozone deposition to the sea surface and/or associated trace gas emissions to the atmosphere. It has been used to generate new monthly parameterised sea-surface iodide fields (12 × 12 km resolution) using a machine learning approach, described in our accompanying partner publication 10 . The dataset may also be of interest in other areas of iodine research. In particular, improved understanding of the marine iodine cycle is needed to refine the use of iodine speciation as a paleo-oceanographic tracer of past ocean oxygenation (e.g. 2 ), and to better predict the impacts of iodine radionuclides released to the environment by anthropogenic activities (e.g. 4 ).

Methods
Data compilation. The data set includes iodide measurements made by a number of different research groups (Online-only Table 1). These were collated from the following sources:
A. Published manuscripts. Data was digitised from tables and graphics, either by hand or using the free online tool WebPlotDigitizer (https://automeris.io/WebPlotDigitizer).
B. Data originators. Data (both published and unpublished) was provided directly by the owners.
C. Data repositories. Data was obtained by request or on-demand download from hosting repositories (e.g. BODC, PANGAEA, the US JGOFS Data System).
www.nature.com/scientificdata
Following the approach adopted previously 12, 'surface' concentrations are considered to be those from depths of less than 20 m. As discussed in Chance et al. 12, the ocean is usually considered to be well mixed to this depth, and restricting 'surface data' to shallower depths would substantially reduce the number of observations included. We examined a subset of data (n = 93) where observations were available from multiple depths within the upper 20 m of the water column. While significant differences were occasionally found between individual pairs of samples collected from depths of ~1-2 m and ~10 m at a given station, concentrations were within 10 nM in almost 50% of pairs (49.5%), and within 26 nM in 80% of pairs. Statistical analysis (a paired Student's t-test) found no significant difference between samples from different depths within the upper 20 m. The exact depth of near-surface samples can itself have a high relative uncertainty, as factors such as sea swell can cause metre-scale fluctuations in the depth of, for example, a ship's seawater inlet. Furthermore, the exact depths of such inlets, or of the 'surface' sample bottle, were not always stated in the original data sources.
Therefore, we have not included depth as a parameter in our compiled data set and no distinction has been made between samples obtained using a CTD rosette fitted with Niskin bottles (or similar), a pumped underway seawater supply or a manual method (such as bucket sampling).
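The depth comparison described above can be sketched as follows. This is a minimal illustration, assuming matched lists of iodide concentrations (nM) from the shallow (~1-2 m) and deeper (~10 m) sample at each station; the function name and return structure are hypothetical, not the authors' code. It reports the share of pairs agreeing within the 10 nM and 26 nM tolerances quoted in the text, plus the paired Student's t statistic; a p-value can then be taken from a t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, which performs the whole paired test directly).

```python
import math
import statistics


def compare_depth_pairs(shallow_nM, deep_nM, tolerances_nM=(10.0, 26.0)):
    """Summarise paired iodide samples from ~1-2 m and ~10 m at the same stations.

    Returns the share of pairs agreeing within each tolerance and the paired
    Student's t statistic (degrees of freedom = n - 1).
    """
    if len(shallow_nM) != len(deep_nM) or len(shallow_nM) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [s - d for s, d in zip(shallow_nM, deep_nM)]
    n = len(diffs)
    # Share of pairs whose absolute difference lies within each tolerance
    within = {tol: sum(abs(x) <= tol for x in diffs) / n for tol in tolerances_nM}
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    if sd_d == 0:
        # Identical differences: t is 0 if they are all zero, otherwise unbounded
        t_stat = 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    else:
        # Paired t statistic: mean difference divided by its standard error
        t_stat = mean_d / (sd_d / math.sqrt(n))
    return {"n": n, "within": within, "t": t_stat, "df": n - 1}
```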
Each data set was entered onto an individual Excel spreadsheet in a standard format. Rarely, source values were below the limit of detection (LoD) for the method used. Where this was the case, we have used a substitute value of 0.75 × the estimated LoD, and the data point was flagged (column 'ErrorMethod'). No further processing has been applied to any of the data; for example, it has not been normalised to salinity. Required fields from the individual 'input' files were then collated into a single comma-separated value (.csv) file using open-source Python code, including the Pandas package 14.
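The LoD substitution and collation steps above can be sketched with pandas. This is not the archived code 25; it is an illustration under stated assumptions: apart from 'ErrorMethod', the column names in `REQUIRED` are hypothetical, and the numeric flag code passed to `apply_lod` is a placeholder (the actual codes are defined in Table 1). Per-data-set frames are assumed to have already been read from their spreadsheets.

```python
import pandas as pd

LOD_FACTOR = 0.75  # substitute value = 0.75 x estimated LoD, as described above

# Hypothetical output fields; only 'ErrorMethod' is named in the text
REQUIRED = ["Iodide", "Latitude", "Longitude", "Date", "Method", "ErrorMethod"]


def apply_lod(df: pd.DataFrame, lod_nM: float, flag_code: int) -> pd.DataFrame:
    """Replace iodide values below the LoD by 0.75 x LoD and flag those rows."""
    out = df.copy()
    below = out["Iodide"] < lod_nM
    out.loc[below, "Iodide"] = LOD_FACTOR * lod_nM
    out.loc[below, "ErrorMethod"] = flag_code  # placeholder code, see Table 1
    return out


def collate(frames) -> pd.DataFrame:
    """Stack per-data-set frames, keeping only the required fields.

    reindex() adds any field missing from a given data set as NaN.
    """
    return pd.concat([f.reindex(columns=REQUIRED) for f in frames], ignore_index=True)
```

Writing the result with `collated.to_csv(path, index=False, na_rep="NaN")` would render missing fields as "NaN", matching the convention described in the Usage Notes.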
A total of 1342 observations from 57 individual data sets have been collated (Online-only Table 1). This is an increase of 417 observations (45%) on the number included in our earlier compilation 12. Locations of individual data points are shown in Fig. 1, which highlights how the expanded dataset increases spatial coverage. The earliest observations were made in 1967 and the most recent in 2018. For some data points (n = 32), the date of sampling is not specified, as this was not given in the original publication. Ten of the input data sets are currently unpublished.

Additional fields. Each iodide observation is associated with the record fields listed in Table 1. In addition to spatial and temporal co-ordinates, the estimated uncertainty and analytical method used to generate the observations are provided.
Method. Analytical methods are summarised in Table 2. In the majority of cases (~53%), iodide was measured by cathodic stripping square wave voltammetry (CSSWV) according to the method of Campos 15. However, a range of other measurement techniques were also used. Iodide was sometimes measured as the difference between the total inorganic iodine (TII) concentration and the iodate concentration.
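When iodide is obtained by difference, its uncertainty depends on both underlying measurements. A minimal sketch, assuming the TII and iodate uncertainties are independent so that they combine in quadrature; the text does not state how uncertainties were propagated for these data sets, and the function name is hypothetical.

```python
import math


def iodide_by_difference(tii_nM, u_tii_nM, iodate_nM, u_iodate_nM):
    """Iodide = TII - iodate, with independent uncertainties added in quadrature."""
    iodide = tii_nM - iodate_nM
    # sqrt(u_tii^2 + u_iodate^2), assuming uncorrelated errors
    uncertainty = math.hypot(u_tii_nM, u_iodate_nM)
    return iodide, uncertainty
```

For example, a TII of 450 ± 15 nM and iodate of 350 ± 20 nM give iodide of 100 ± 25 nM, a 25% relative uncertainty even though each input is known to about 3-6%. This illustrates why difference-based iodide values can carry larger relative uncertainties than direct determinations at low iodide concentrations.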
Uncertainty. Measurements of iodide in seawater are subject to non-trivial analytical uncertainties, which should be considered when using the data set. An estimate of the uncertainty associated with each observation has been included, using either information provided by the data source where available, or comparison to other measurements using the same analytical method. The uncertainty estimates provided are typically derived from replicate analyses of the same sample, and so represent the precision of the measurements. As insufficient information was available to quantify the precision in the same way for all observations, the approach used to estimate the precision is also included (see Table 1). Relative uncertainty estimates for each analytical method, for typical ambient concentrations in a seawater matrix, are also provided in Table 2. The precision given for each data set is often 5% (Table 2), which reflects the stated repeatability of the CSSWV method 15 and of a number of other methods used. However, we note that repeat analyses of samples using this method can sometimes give lower precision (e.g. ~10%) 16. Considering all data points in our dataset, we find that ~75% have a precision of 10% or better, and ~51% have a precision of 5% or better. Such uncertainties are modest in comparison to the global scale variation in sea surface iodide concentrations (from less than 10 to more than 200 nM; Fig. 2).
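Shares such as the ~75% and ~51% quoted above can be recomputed from the per-point iodide and uncertainty fields. A minimal sketch, assuming both are supplied in nM as parallel sequences (missing values as NaN); the function name and thresholds are illustrative.

```python
import math


def share_with_precision(iodide_nM, uncertainty_nM, thresholds_pct=(5, 10)):
    """Fraction of observations whose relative uncertainty is at or below each threshold (%)."""
    rel_pct = [
        100.0 * u / c
        for c, u in zip(iodide_nM, uncertainty_nM)
        # Skip missing values and non-positive concentrations
        if not (math.isnan(c) or math.isnan(u)) and c > 0
    ]
    if not rel_pct:
        raise ValueError("no usable observations")
    return {t: sum(r <= t for r in rel_pct) / len(rel_pct) for t in thresholds_pct}
```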
As the uncertainty estimates provided are typically derived from replicate analyses of the same sample, they only estimate the short-term (days) to medium-term (approx. monthly) repeatability. A fuller consideration of the uncertainty should also include the longer-term (months to years) reproducibility and an estimate of any uncertainties arising from bias, and thus may result in a larger uncertainty value. These sources of uncertainty are as yet poorly documented for the determination of iodide in seawater. At least in the case of the most commonly used method (CSSWV), we believe the contribution of long-term reproducibility and bias to be small relative to the short-term precision. This is because the key sources of uncertainty (e.g. those associated with making standard additions and sample dilutions by pipette, or variation in mercury electrode drop size) operate over a short time scale. Within our own laboratory, we have been monitoring the long-term reproducibility of the CSSWV method using aliquots of a near-shore seawater sample, and estimate it at ~12% RSD over a period of 11 months (analysis by three operators using two different instruments; individual aliquots stored at −20 °C and defrosted within a few days of analysis), compared to an approximately monthly repeatability of 7-12% and a repeatability over a few days of 5-18%. Changes taking place during storage will also contribute to the overall uncertainty of reported observations; for samples stored frozen (−16 °C), average iodide recovery after one year was 95-96%, compared to an average standard deviation of 5-8% 15. In the majority of data sets we include, samples were stored frozen for less than one year prior to analysis, while others were either analysed immediately following collection or stored refrigerated for a shorter period. Therefore, we assume that storage artefacts were minimal. This view is supported by the oceanographic consistency found between stored and freshly analysed samples.
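The RSD figures above follow the usual definition: 100 × sample standard deviation / mean of repeated analyses. A minimal sketch, assuming the repeated results for one control sample are available as a list of concentrations (nM). Applied to analyses made within a few days it gives short-term repeatability; applied across months (different operators and instruments) it gives long-term reproducibility.

```python
import statistics


def rsd_percent(values_nM):
    """Relative standard deviation (%) of repeated analyses of the same sample."""
    if len(values_nM) < 2:
        raise ValueError("need at least two analyses")
    mean = statistics.mean(values_nM)
    # Sample (n - 1) standard deviation relative to the mean
    return 100.0 * statistics.stdev(values_nM) / mean
```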
Assessment of bias in determinations of iodide in seawater is hindered by the lack of a suitable reference material: many similar reference materials (e.g. for trace metals) are acidified, which is unsuitable for the preservation of iodine speciation. Inaccuracy in standard preparation will contribute to bias in the short term (all samples analysed using the same standard), but is likely to become a random error in the longer term (several standards used over time). In either case, this contribution should be small, as the uncertainty associated with preparing a typical analytical standard (e.g. a 10 μM standard) should be less than 1% in a competent lab with well-maintained and calibrated equipment (e.g. balance, pipette). Other contributions to bias, such as matrix effects, are minimised by the use of standard additions rather than external calibration in the CSSWV protocol. In the absence of an iodide reference material, Campos 15 tested the accuracy of the CSSWV method using solutions of known iodate concentration and a reduction step, and found it to be 99 ± 5.7% for 34 analyses. Given the current interest in marine iodide concentrations 2,10,11, we believe that an inter-laboratory calibration exercise, leading to the development of a saline iodide reference material with a consensus value, would be very timely. Such an exercise could follow the model of the recent GEOTRACES inter-calibration scheme (http://www.geotraces.org/Intercalibration).
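Standard additions avoid matrix effects because calibration is performed in the sample itself. The generic calculation is sketched below: fit a straight line of signal against added iodide, take the sample concentration as intercept / slope (the magnitude of the x-intercept), and correct for any dilution. This is a textbook illustration, not the specific Campos 15 protocol; the function name and the `dilution_factor` parameter are assumptions.

```python
def standard_addition(added_nM, signal, dilution_factor=1.0):
    """Sample concentration from a standard-addition series via least squares.

    added_nM: iodide added to the measured solution (nM); signal: peak response.
    dilution_factor: original sample volume ratio if the sample was diluted.
    """
    n = len(added_nM)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two matched additions")
    mean_x = sum(added_nM) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added_nM)
    if sxx == 0:
        raise ValueError("additions must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added_nM, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unspiked concentration = intercept / slope, scaled back to the original sample
    return dilution_factor * intercept / slope
```

For instance, additions of 0, 20, 40 and 60 nM giving signals of 60, 100, 140 and 180 imply 30 nM in the measured solution, or 60 nM in the original sample after a twofold dilution.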
Geographical categorisation. Data points are categorised as either 'coastal' or 'non-coastal'. Following the approach used in Chance et al. 12, this is determined by the designation of their static Longhurst biogeochemical province 17. In most cases, the Longhurst province was assigned automatically, according to the nearest whole degree of latitude and longitude. For a small number of samples collected very close to the coast, the province (and hence the coastal/non-coastal designation) was assigned manually; these samples are flagged (see Table 1). As in Chance et al. 12, a small number of samples collected near Bermuda were also categorised as 'coastal' despite being located in an open ocean province (North Atlantic Subtropical Gyre Province (West)), as they were collected from an inshore area 18. These samples are identified as such in the 'Locator Method' column.
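The categorisation rules above can be sketched as a lookup on whole-degree coordinates. This is a minimal illustration, assuming a hypothetical `province_grid` dictionary mapping (latitude, longitude) in whole degrees to Longhurst province codes and a set of codes treated as coastal; a real application would use a gridded Longhurst province product. The returned flag follows the 0/1/2 scheme in Table 1.

```python
import math


def nearest_degree(x):
    """Round to the nearest whole degree, halves rounded up (round() would round half to even)."""
    return math.floor(x + 0.5)


def classify_location(lat, lon, province_grid, coastal_provinces,
                      manual_province=None, known_inshore=False):
    """Return (province, coastal, locator_flag).

    province_grid: hypothetical {(lat_deg, lon_deg): province_code} lookup
    coastal_provinces: set of province codes treated as coastal
    """
    key = (nearest_degree(lat), nearest_degree(lon))
    province = province_grid.get(key)
    if province is None:
        # Flag 1: too close to the coast for the lookup; province assigned by hand
        if manual_province is None:
            raise KeyError(f"no province at {key}; assign one manually")
        return manual_province, True, 1
    if known_inshore and province not in coastal_provinces:
        # Flag 2: open-ocean province, but sample known to be inshore (e.g. Bermuda)
        return province, True, 2
    # Flag 0: coastal status follows the province designation
    return province, province in coastal_provinces, 0
```

For example, with a toy grid `{(32, -65): "NASW", (50, -5): "NECS"}` and coastal set `{"NECS"}`, a Bermuda inshore sample at 32.3°N, 64.8°W returns `("NASW", True, 2)` when `known_inshore=True`.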
Ancillary data. Note that original ancillary data such as temperature and salinity are not included, as these were not reliably available for all data sets. Instead, we recommend the use of climatological data (e.g. the World Ocean Database and World Ocean Atlas Series) selected according to user needs.

Data Records
The compiled dataset is hosted by BODC (https://doi.org/10.5285/7e77d6b9-83fb-41e0-e053-6c86abc069d0) 13, and is available as a single .csv file (plus a separate metadata file). It includes the fields listed in Table 1. It is anticipated that updated versions will be made available periodically, as new sea surface iodide observations become available. The current iteration is termed Version 1.0; future iterations will be named sequentially (i.e. Version 2.0, etc.). The lead authors would be very pleased to be contacted regarding new or omitted iodide observations for inclusion in future iterations of the dataset.

Fields listed in Table 1:
Iodide concentration, nmol L⁻¹.
Estimated uncertainty on iodide concentration, nmol L⁻¹.
ErrorMethod: indicates how the uncertainty was estimated:
1 Precision stated in paper or by source, based on replicate analyses of selected samples
2 Precision assumed to be the same as in similar work using the same method
3 Individual samples all analysed in replicate; uncertainty is range (n = 2) or sd (n ≥ 3)
4 Propagated analytical uncertainty for replicates of a given sample, where this is greater than uncertainty determined by (…)
Analytical method: see Table 2 for method codes.
Coastal: 0 = open ocean location; 1 = coastal location.
LocatorFlag: indicates how the coastal flag was assigned:
0 Location found by province picker; coastal status determined according to province
1 Location not found by province picker as too close to coast, so province manually assigned (province necessarily coastal)
2 Province is open ocean, but individual samples known to be coastal, e.g. Bermuda Inshore
Reference: publication in which the data set is described.

Technical Validation
Of the records included in our database, the majority (47/57) are described in peer-reviewed literature, and a further two are from PhD theses, so their quality has already been subject to scientific scrutiny. Unpublished data sets made use of well-established analytical techniques, including the use of calibration standards and replicate analyses. In addition, the majority of data points were described in our earlier peer-reviewed manuscript 12, and were shown to have a cohesive global distribution. The distribution of observations in the extended dataset continues to conform to this distribution (not shown), with concentrations remaining in the expected range (Fig. 2).

Table 2. Analytical methods and associated uncertainties.
A very small number of unusually high concentration points (19 with iodide levels higher than 400 nM) are present in the data set. These are not representative of the overall iodide distribution, all being above the 98th percentile and also defined as outliers under the Tukey definition 19. Where present, these extreme outlier values have been subject to rigorous scrutiny and are believed to be real.
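Tukey's rule flags values outside the fences Q1 − k·IQR and Q3 + k·IQR, with k = 1.5 for outliers (k = 3 is often used for "far out" values). A minimal sketch follows; the text does not state which k or which quartile convention was used, so both are assumptions here. Quartiles come from `statistics.quantiles` with its default ("exclusive") method, which can differ slightly from Tukey's original hinges.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```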
We have not evaluated the data set for systematic differences between measurement techniques, as method used and location (and hence iodide concentration) are not independent variables. In most cases, only a small number of geographically limited points are available for a given method (Table 2). As noted, more than half of the observations have been made using the same CSSWV technique. The remainder have been analysed using a wide range of other approaches, including, for some of the earliest datasets, labour-intensive 'wet chemical' procedures which have since been superseded. In particular, a large proportion of the Pacific measurements were made between 1968 and 1970 20,21 using a revised version of the Sugawara precipitation method 22. The scarcity of more modern data from the Pacific limits comparisons, but we note that the range of this early Pacific data (3-168 nM) falls within that of the global data set, with a well-defined latitudinal distribution consistent with that observed overall. Regional concentrations (e.g. high latitudes, north Pacific 23) are in agreement with those measured subsequently using different methods. Furthermore, the original data sources report vertical iodide profiles consistent in shape and magnitude with more recent measurements. Data obtained using the original, unmodified Sugawara method 24 (1955) is not included, as this method is known to have poor performance 22.
As described earlier, iodide observations are subject to non-negligible analytical uncertainty; we have reviewed the uncertainty estimation for each data set, and present this alongside the observations. As noted above, precision has usually been taken to represent method uncertainty. A variety of different methods have been used to estimate this, and so uncertainty magnitudes may not be directly comparable across all datasets.

Usage Notes
For computational convenience, iodide concentrations and associated uncertainties are provided to one decimal place (units are nM for both). However, this does not usually reflect the actual precision of the data points, which is typically a few percent of the reported concentration.
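For presentation, users may prefer to round each value to the decimal place implied by its uncertainty. A minimal sketch of one common convention (uncertainty rounded to one significant figure, value rounded to the same decimal place); this convention is an assumption, not part of the dataset.

```python
import math


def round_to_uncertainty(value, uncertainty, sig_figs=1):
    """Round the uncertainty to sig_figs significant figures and the value to match."""
    if math.isnan(uncertainty) or uncertainty <= 0:
        return value, uncertainty  # nothing sensible to round against
    # Decimal place of the leading significant digit of the uncertainty
    decimals = sig_figs - 1 - math.floor(math.log10(uncertainty))
    return round(value, decimals), round(uncertainty, decimals)
```

For example, 123.4 ± 6.2 nM becomes 123 ± 6 nM, and 123.4 ± 12.3 nM becomes 120 ± 10 nM.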
For the purposes of investigating large-scale trends and creating regional iodide parameterisations, it may be appropriate to exclude the very high outlier values noted in the preceding section. Similarly, a number of points are from relatively low-salinity estuarine areas (e.g. the Skagerrak), and so may not be representative of true marine trends in iodine speciation.
Missing fields are shown as not a number ("NaN") in the output data file.
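The notes above on missing values and outliers can be applied when reading the file. A minimal standard-library sketch, assuming the iodide column is named "Iodide" (a hypothetical name; check the header and metadata file of the archived .csv). Python's `float("NaN")` parses to a NaN value, which is then dropped; `pandas.read_csv` likewise treats "NaN" as missing by default. The 400 nM cut-off corresponds to the high values described in the Technical Validation section.

```python
import csv
import io
import math

OUTLIER_THRESHOLD_NM = 400.0  # level above which 19 outlying points are reported


def load_iodide(fileobj, column="Iodide"):
    """Read iodide values (nM) from the compiled .csv, dropping missing entries."""
    values = []
    for row in csv.DictReader(fileobj):
        raw = (row.get(column) or "").strip()
        if not raw:
            continue
        x = float(raw)  # float("NaN") parses to a NaN float
        if not math.isnan(x):
            values.append(x)
    return values


def drop_high_outliers(values, threshold=OUTLIER_THRESHOLD_NM):
    """Exclude very high values before large-scale trend analysis."""
    return [v for v in values if v <= threshold]


# Hypothetical three-row extract; column names are illustrative only
sample = io.StringIO("Iodide,Coastal\n120.5,0\nNaN,1\n450.0,1\n")
print(drop_high_outliers(load_iodide(sample)))
```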

Code availability
The Python code used to prepare the archived data, and to enable incorporation of any subsequent observational data files, has also been made permanently available (https://doi.org/10.5281/zenodo.3271678) 25 .