A database of chlorophyll a in Australian waters

Chlorophyll a is the most commonly used indicator of phytoplankton biomass in the marine environment. It is relatively simple and cost-effective to measure compared with phytoplankton abundance and is thus routinely included in many surveys. Here we collate 173,333 records of chlorophyll a collected since 1965 from Australian waters, gathered from researchers on regular coastal monitoring surveys and ocean voyages, into a single repository. This dataset includes the chlorophyll a values as measured from samples analysed using spectrophotometry, fluorometry and high performance liquid chromatography (HPLC). The Australian Chlorophyll a database is freely available through the Australian Ocean Data Network portal (https://portal.aodn.org.au/). These data can be used in isolation as an index of phytoplankton biomass or in combination with other data to provide insight into water quality, ecosystem state, and relationships with other trophic levels such as zooplankton or fish.

A full list of authors and their affiliations appears at the end of the paper.

Background & Summary
As the pigment chlorophyll a is present in all photosynthetic phytoplankton species 1 and is relatively easy and cheap to measure, it has become a standard proxy for estimating phytoplankton biomass 2. Samples require minimal processing and storage in the field and are not easily contaminated. Chlorophyll a is cheaper to process using spectrophotometry or fluorometry than estimating phytoplankton abundance or biomass using cell counts. Importantly, chlorophyll a measurements also account for the pico- and nanoplankton in the samples, which are substantially underestimated by phytoplankton analysts using light microscopy. These smaller size classes account for a significant fraction (commonly >70%) of total chlorophyll a biomass 3,4. However, whilst using chlorophyll a as an estimate of phytoplankton biomass is widespread, the relationship between the two variables is complex. Not only does the carbon to chlorophyll ratio of phytoplankton vary with species and morphological characteristics, but the chlorophyll a content of a phytoplankton cell per unit of organic matter also varies with light intensity, nutrient availability, temperature and cell age 5–8. Despite these complexities, chlorophyll a remains useful as a coarse proxy for phytoplankton biomass.
In Australian waters chlorophyll a concentrations are generally lowest in the tropical and subtropical oceanic regions (0.05-0.5 μg L−1) and higher in the Southern Ocean and temperate regions (up to 1.5 μg L−1) 2. In coastal zones, the chlorophyll a concentration can fluctuate greatly as phytoplankton blooms develop, peak and crash. The coastal station at Port Hacking, project number P782 in our database, is a good example where chlorophyll a concentrations typically vary between 0.1-8.0 μg L−1 over an annual cycle, with peaks sometimes up to 15 μg L−1 at 20-40 m depth coinciding with phytoplankton blooms 9. In inshore estuaries and bays, high chlorophyll a values can also indicate the system is eutrophic with elevated nutrient levels from terrestrial run off. Chlorophyll a is therefore used in several water quality monitoring programs across the country (e.g. project number P1072 Ecosystem Health Monitoring Program in Moreton Bay, Queensland, Australia, http://healthywaterways.org/initatives/monitoring). Concentrations of chlorophyll a also vary throughout the oceans with oceanographic features such as upwelling and fronts which drive nutrients towards surface layers and thus enhance chlorophyll a levels 10,11.
Here we collate all available chlorophyll a data from Australian waters, gathered from researchers, students, government bodies, state agencies, councils and databases, along with the associated metadata through the process as detailed in Fig. 1. The chlorophyll a values are as measured and no attempt has been made to synthesise the data across analysis methods. The Australian Chlorophyll a database is available through the Australian Ocean Data Network portal (AODN: https://portal.aodn.org.au/), the main repository for marine data in Australia. The Australian Chlorophyll a Database will be maintained and updated through the CSIRO data centre, with periodic updates sent to the AODN. A snapshot of the Australian Chlorophyll a Database at the time of this publication has been assigned a DOI and will be maintained in perpetuity by the AODN (Data Citation 1).

Methods
There are three standard methods for determining chlorophyll a concentrations in water samples: spectrophotometry, fluorometry and high performance liquid chromatography (HPLC). Spectrophotometric methods are described fully in Strickland and Parsons (1972) 12, fluorometry in Zeng (2015) 13, and HPLC in Shoaf (1978) 14. A comprehensive discussion of the details of each method and its merits can be found in Manotura et al. To measure chlorophyll a, a known volume of water sample is filtered through a glass fibre filter paper, typically 0.45-0.7 μm pore size, under a gentle vacuum. The volume filtered varies depending on the chlorophyll a concentration expected in the sample, with more water filtered at lower concentrations, but the volume should be sufficient to produce a green tinge on the filter paper. Chlorophyll a is extracted from the filter paper with an organic solvent (e.g. acetone). Concentrations are then derived using either a spectrophotometer, which records the light absorbance at particular wavelengths, or a fluorometer, which transmits an excitation beam of light in the blue range (440-460 nm) and detects the light fluoresced by chlorophyll a in the red wavelength range (650-700 nm). This fluorescence is directly proportional to the concentration of chlorophyll a. For HPLC, the filter paper is similarly extracted with an organic solvent; however, pigments are then separated by passing the extract through a chromatographic column and measured either spectrophotometrically or fluorometrically.
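The final arithmetic step shared by the extraction-based methods can be sketched as follows: once an instrument has given a chlorophyll a concentration for the solvent extract, it is scaled back to the volume of water that was filtered. This is a minimal sketch, not the procedure of any cited reference. The function name and parameters are hypothetical, it assumes all pigment on the filter is recovered in the extract, and the instrument-specific conversion from absorbance or fluorescence to an extract concentration is not shown.

```python
def water_concentration_ug_per_l(extract_conc_ug_per_l, extract_volume_ml, filtered_volume_ml):
    """Scale the chlorophyll a concentration in the extract to the water sample.

    Assumption: all pigment retained on the filter ends up in the solvent, so
    pigment mass = extract concentration * extract volume. Dividing that mass
    by the volume of water filtered gives the in-water concentration.
    """
    if extract_volume_ml <= 0 or filtered_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return extract_conc_ug_per_l * extract_volume_ml / filtered_volume_ml


# Illustrative numbers only: 10 mL of extract at 100 ug/L from 1000 mL of water.
example = water_concentration_ug_per_l(100.0, 10.0, 1000.0)
```

Filtering more water at low expected concentrations, as described above, raises the extract concentration for the same in-water value and so keeps the reading within the working range of the instrument.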
Although HPLC has become the accepted benchmark for the quantification of chlorophyll a, the volume of data collated in this database shows that spectrophotometric and fluorometric extraction methods are much more commonly used (Fig. 2). HPLC has the advantages of being more accurate and of quantifying the accessory pigments as well, but it requires specialised equipment and technical skills, which make it more expensive. Spectrophotometry and fluorometry are simpler and effective, but unlike HPLC they do not differentiate between chlorophyll functional types and accessory pigments. To improve the spectrophotometric and fluorometric methods, an acidifying step (e.g. addition of a small amount of hydrochloric acid) can be added after the extraction to reduce errors associated with chlorophyll degradation products 2.
All three methods require laboratory time and sample preparation, but in-water phytoplankton biomass can also be estimated using in-situ fluorometers. We have excluded such observations from the current dataset because chlorophyll a estimates from in-situ fluorometers are notoriously difficult to calibrate to an absolute standard. Although the accuracy of fluorometers is continually improving, they require regular calibration, including against other methods 13. The instruments are somewhat unstable, and measurements are influenced by other environmental factors, particularly coloured dissolved organic matter (CDOM) and diel, seasonal and regional effects, and would also require correction for these factors 13. The calibration routines must account for physical factors such as sensor drift, instrument design, biofouling etc. as well as the phytoplankton community composition and physiology in the sample environment, which may vary over space and time.
www.nature.com/sdata/ SCIENTIFIC DATA | 5:180018 | DOI: 10.1038/sdata.2018.18
Data collated for this database have come from many different sources, from long-term monitoring programs run by local governments concerned with water quality to ocean voyages on research vessels. Data have been standardised to μg L−1, and the collection and analysis methods have been included so that inter-project comparisons can be considered. We have collated data from researchers, local and state government agencies and regional databases, e.g. AESOP (The Australian-waters Earth Observation Phytoplankton-type products) database (http://aesop.csiro.au/). The database will be maintained by the CSIRO Data Centre and updates will be available periodically through the AODN.
Each data record is assigned to a project, with each project having a unique identification number, Pxxx. A project is defined as a set of data records that have been collected together, usually as a cruise or study, and have the same sampling and analysis methods and the same person analysing the samples. Metadata ascribed to a project relate to all data records within that project. Details to identify each project, along with their associated samples, time and space information (Table 1 (available online only), Fig. 3) allow users to select and download discrete datasets in their area of interest. While each sample within a project has a unique sample_id, there may be more than one chlorophyll a record per sample if multiple replicates or depths were sampled. The sample_id has not been changed from the original data set to maintain traceability. Therefore, P(project_id)_(sample_id) may be duplicated within projects, but each chlorophyll a record within that sample (taken at a different depth, for example) is given a unique record_id. Each data record has been quality controlled. Data with insufficient or unreliable metadata were removed. All depths, times and locations have been validated and are within the boundaries expected for each project.
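The relationship between projects, samples and records described above can be sketched in a few lines of Python. This is an illustration only: project_id, sample_id and record_id are the field names given in the text, while the depth and chlorophyll fields, the sample labels and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical records. Only project_id, sample_id and record_id are field
# names taken from the text; depth_m, chl_a and all values are made up.
records = [
    {"record_id": 1, "project_id": 599, "sample_id": "A1", "depth_m": 0, "chl_a": 0.42},
    {"record_id": 2, "project_id": 599, "sample_id": "A1", "depth_m": 20, "chl_a": 0.77},
    {"record_id": 3, "project_id": 599, "sample_id": "A2", "depth_m": 0, "chl_a": 0.35},
]


def group_by_sample(rows):
    """Map each P(project_id)_(sample_id) key to the record_ids it contains."""
    groups = defaultdict(list)
    for row in rows:
        key = f"P{row['project_id']}_{row['sample_id']}"
        groups[key].append(row["record_id"])
    return dict(groups)


samples = group_by_sample(records)
```

Here the key P599_A1 maps to two record_ids because the same sample was measured at two depths, which is why record_id rather than the sample key should be treated as the unique identifier of a measurement.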

Technical Validation
The database has been constructed to ensure data extraction is straightforward, although the user needs to be aware of two caveats. First, if chlorophyll b or other pigments are present, then fluorometry may underestimate or spectrophotometry overestimate the chlorophyll a concentration relative to HPLC 17–19. When an acidification step is included, the accuracy of chlorophyll a from spectrophotometric and fluorometric methods is improved as effects of chlorophyll degradation products are reduced 18. Without further pigment information, comparisons between methods need to be carefully considered. This database reports values as measured and does not attempt to compare values across methods, leaving this to the discretion of the user. Second, for the HPLC data we are reporting the sum of the chlorophyll a pigments including the divinyl chlorophyll a components. The user should thus be careful when comparing data across datasets where different analysis methods have been used. Metadata have been provided in as much detail as is available so the user is aware of methodological details specific to the project.
Chlorophyll a values can be reported in micrograms per litre (μg L−1), milligrams per cubic metre (mg m−3; 1 μg L−1 = 1 mg m−3) or as depth integrated values, i.e. per square metre (mg m−2). In this data set we have standardised to μg L−1. Where depth integrated values were given, the appropriate sample depth from the study was used to convert to μg L−1.
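The unit standardisation can be illustrated with a short sketch. The function name and unit labels are hypothetical, and the treatment of depth integrated values (dividing by the integration depth in metres to obtain a depth-averaged concentration) is an assumption about what using "the appropriate sample depth" means, not a description of the exact procedure applied to the database.

```python
def to_ug_per_l(value, unit, depth_m=None):
    """Convert a chlorophyll a value to micrograms per litre.

    unit is one of the hypothetical labels "ug/L", "mg/m3" or "mg/m2".
    """
    if unit in ("ug/L", "mg/m3"):
        # 1 mg m-3 equals 1 ug L-1, so the number is unchanged.
        return value
    if unit == "mg/m2":
        # Assumption: a depth integrated value divided by the integration
        # depth (m) gives a depth-averaged mg m-3, which equals ug L-1.
        if depth_m is None or depth_m <= 0:
            raise ValueError("a positive depth is required for mg/m2 values")
        return value / depth_m
    raise ValueError(f"unrecognised unit: {unit}")


# Illustrative numbers only: 50 mg m-2 integrated over 25 m.
averaged = to_ug_per_l(50.0, "mg/m2", depth_m=25.0)
```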
All times have been converted to Coordinated Universal Time, UTC. Dates with no time component remain as reported.
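A minimal sketch of this time handling is given below, using only the Python standard library. The function name, the input string formats and the fixed UTC offset are assumptions for illustration; the formats used in the source data and the database are not specified here.

```python
from datetime import datetime, timedelta, timezone


def normalise_timestamp(text, utc_offset_hours):
    """Convert a local 'YYYY-MM-DD HH:MM' string to UTC.

    Date-only strings ('YYYY-MM-DD') are returned unchanged, mirroring the
    rule that dates with no time component remain as reported.
    """
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return text  # no time component, so leave as reported
    except ValueError:
        pass
    local = datetime.strptime(text, "%Y-%m-%d %H:%M")
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")


# Illustrative only: a morning sample recorded at UTC+10 falls on the
# previous calendar day in UTC.
converted = normalise_timestamp("2015-03-01 08:30", 10)
```

One practical consequence is that a date-only record and a converted timestamp from the same local day can carry different calendar dates, which is worth bearing in mind when matching records by date.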
The value −999 has been assigned to values that were below detection limits. The detection limit has also been included, where known, in the sample_method field of the metadata table.
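Because −999 is a flag rather than a measurement, it should be excluded (or otherwise handled) before any statistics are calculated. The sketch below shows one simple option, dropping flagged values; the function names are hypothetical, and other treatments, such as substituting a fraction of the detection limit, may suit some analyses better.

```python
from statistics import mean

BELOW_DETECTION = -999  # flag value used in the database


def mean_detected(values):
    """Mean of the values that are not flagged as below detection.

    Returns None when every value is flagged or the input is empty.
    """
    detected = [v for v in values if v != BELOW_DETECTION]
    return mean(detected) if detected else None


def count_below_detection(values):
    """Number of values flagged as below the detection limit."""
    return sum(1 for v in values if v == BELOW_DETECTION)
```

Averaging the flagged values directly would give a large negative and meaningless result, so reporting the count of flagged records alongside the mean is a reasonable habit.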

Usage Notes
This dataset and its metadata have been made freely available through the AODN (Data Citation 1). The Australian Chlorophyll a database is complementary to the Australian Zooplankton Database 20 and the Australian Phytoplankton Database 21, both of which provide species-level data and are available through the AODN. Many projects in this data set have corresponding data in these species-level databases and can be matched to the project via Project_id and to individual samples via sample_id, or by using the time and date information. For example, project 599 has data on zooplankton and phytoplankton composition, included in the aforementioned databases, plus chlorophyll a data in the current data set. Because the three data sets were collected at the same locations and times as part of the Integrated Marine Observing System (IMOS) National Reference Stations (NRS), they can be analysed together to investigate relationships among different trophic levels. These combined data have been used in an analysis of climate-driven variability contrasting the 2010 El Niño with the 2011 La Niña 22. Further examples of using chlorophyll data in partnership with species-level phytoplankton and zooplankton data come from project 17 in the North West Cape, Western Australia 23,24.
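Matching chlorophyll a records to species-level records by project and sample identifiers can be sketched as a simple join. The identifiers project_id and sample_id follow the text; the row layout, the chl_a and taxon fields and all values are hypothetical and do not reflect the actual download format of any of the three databases.

```python
def match_samples(chl_rows, species_rows):
    """Pair chlorophyll rows with species rows sharing project_id and sample_id."""
    index = {}
    for row in species_rows:
        index.setdefault((row["project_id"], row["sample_id"]), []).append(row)
    matched = []
    for chl in chl_rows:
        for species in index.get((chl["project_id"], chl["sample_id"]), []):
            matched.append((chl, species))
    return matched


# Hypothetical rows for illustration only.
chl_rows = [
    {"project_id": 599, "sample_id": "S001", "chl_a": 0.8},
    {"project_id": 599, "sample_id": "S002", "chl_a": 1.1},
]
species_rows = [
    {"project_id": 599, "sample_id": "S001", "taxon": "hypothetical taxon A"},
    {"project_id": 17, "sample_id": "S002", "taxon": "hypothetical taxon B"},
]
pairs = match_samples(chl_rows, species_rows)
```

Because a sample may hold several chlorophyll a records (for example, one per depth), such a join is many-to-many, and each species row can be paired with more than one chlorophyll value.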
Projects 599, 1063, 1064, 1065, 1071, 1072, 1074, 1078 and 1129 are ongoing, and data will continue to be added to the Australian Chlorophyll a database; for further information, contact the data custodian as listed in the metadata. The most up-to-date version of P599, IMOS National Reference Stations, is available at: https://portal.aodn.org.au/search?uuid=f48531e2-f182-56ca-e043-08114f8c7f2e.