Measurement of Cyanobacterial Bloom Magnitude using Satellite Remote Sensing

Cyanobacterial harmful algal blooms (cyanoHABs) are a serious environmental, water quality and public health issue worldwide because of their ability to form dense biomass and produce toxins. Models and algorithms have been developed to detect and quantify cyanoHAB biomass using remotely sensed data, but not to quantify bloom magnitude, information that would guide water quality management decisions. We propose a method to quantify seasonal and annual cyanoHAB magnitude in lakes and reservoirs. The magnitude is the spatiotemporal mean of weekly or biweekly maximum cyanobacteria biomass for the season or year. CyanoHAB biomass is quantified using a standard reflectance spectral shape-based algorithm that uses data from the Medium Resolution Imaging Spectrometer (MERIS). We demonstrate the method by quantifying annual and seasonal cyanoHAB magnitude in Florida and Ohio (USA), respectively, during 2003–2011 and ranking the lakes based on median magnitude over the study period. The new method can be applied to Sentinel-3 Ocean Land Color Imager (OLCI) data for assessment of cyanoHABs and their change over time, even with issues such as variable data acquisition frequency or sensor calibration uncertainties between satellites. CyanoHAB magnitude can support monitoring and management decision-making for recreational and drinking water sources.


1. To examine cyanoHAB magnitudes in lakes that are known to have cyanoHAB-related water quality issues. These two states also have a significant number of lakes that are resolvable in MERIS/OLCI data. Many lakes in the Coastal Plains ecoregion, which includes Florida, are known to have cyanoHAB issues, with 34% of lakes known to be hypereutrophic by the NLA in 2007 30. The 2007 NLA also reported that 43% of lakes in Florida had microcystin present 31. Similarly, cyanoHABs are a common water quality issue in the Temperate Plains ecoregion, which includes western Ohio, where 45% of lakes are considered hypereutrophic 30. Approximately 32% of lakes in Ohio had microcystin present in 2007 31.
2. To consider results from lakes located in different geographic and climatic regimes. The climate in Florida is subtropical, with hot, humid, high-precipitation summers and mild, dry winters. In contrast, Ohio has a temperate climate with cold winters, hot and humid summers, and year-round moderate precipitation 32.
3. To assess the impacts of differences in data coverage in each location. MERIS full resolution (FR) data collection frequency prior to 2008 was inconsistent. The temporal frequency of MERIS FR data over Ohio is higher than that over Florida during this time period. The consideration of two states with different temporal data coverage will illustrate the effect of reduced data frequency on the bloom magnitude metrics.

Figure 2 shows the steps of the data analysis and workflow carried out in this study. Individual components of the data and methods are presented below.

Lake outline data. The lakes were screened for size using polygons of lakes and water bodies from the National Hydrography Dataset Plus version 2.0 (NHDPlusV2) lake polygons dataset 33, with the condition that each selected water body should be resolved by a satellite image with 300 × 300 m pixel resolution. Lakes and other water bodies were considered resolvable if they had the equivalent of three connected non-mixed water pixels (i.e. three pixels without any land) within the NHDPlusV2 dataset. Further, all selected water bodies were screened and filtered using U.S. EPA's 2012 NLA 34 site evaluation guidelines (U.S. EPA, 2011). Water bodies classified as intermittent or estuarine were excluded from further analysis based on NLA criteria (although some estuarine lakes in Florida were not identified and excluded, as discussed later). The final lake polygon layer included 135 lakes in Florida and 21 lakes in Ohio that would be resolved in FR MERIS/OLCI imagery. The surface area of resolvable lakes in Florida varied from 1.26 km² to 1427 km² with a median surface area of 5.31 km², whereas surface areas of Ohio lakes varied from 1.98 to 53 km² with a median of 8.9 km². In Florida, Center Lake and Lake Okeechobee were the smallest and largest lakes considered in this study, respectively. In Ohio, Evans Lake and Pymatuning Reservoir were the smallest and largest, respectively.
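The resolvability test described above can be illustrated with a small sketch. It assumes the lake has already been rasterized to a boolean grid in which True marks a pure (non-mixed) water pixel, and it assumes that "connected" means sharing an edge (4-connectivity); neither the grid representation nor the connectivity rule is specified in the text, and the function name `is_resolvable` is hypothetical.

```python
from collections import deque

def is_resolvable(water_mask, min_pixels=3):
    """Return True if the grid contains a group of at least `min_pixels`
    edge-connected water pixels (4-connectivity is an assumption)."""
    if not water_mask or not water_mask[0]:
        return False
    rows, cols = len(water_mask), len(water_mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not water_mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected group of water pixels.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and water_mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                return True
    return False
```

With 8-connectivity (diagonal neighbours allowed) more narrow water bodies would pass the screen, so the choice should follow whatever rule the lake polygon processing actually used.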
Cyanobacteria estimation algorithm. Cyanobacterial quantity was found through a combination of biomass estimation and cyanobacterial presence algorithms 11,28. The Cyanobacteria Index (CI) measures a proxy of Chl-a absorption and provides the cyanobacterial biomass 11,18,19,29,41. It is calculated with a spectral shape (SS) algorithm 28,42 and is presented as:

$$SS(\lambda) = \rho_s(\lambda) - \rho_s(\lambda^-) - \left\{\rho_s(\lambda^+) - \rho_s(\lambda^-)\right\} \times \frac{\lambda - \lambda^-}{\lambda^+ - \lambda^-} \qquad (1)$$

where ρ_s is the top-of-atmosphere reflectance corrected for Rayleigh radiance, λ is the central band, and λ⁺ and λ⁻ are the adjacent reference bands. The CI is calculated by centering the spectral shape at 681 nm and changing the sign of SS, that is, CI = −SS(681). The CI evaluates Chl-a absorption at 681 nm. At 681 nm, chlorophyll in eukaryotes typically fluoresces strongly, leading to increased apparent reflectance that obscures chlorophyll absorption. In cyanobacteria, however, chlorophyll fluorescence is much weaker 43, such that Chl-a absorption dominates the radiance signal from the water at 681 nm, causing the reflectance at 681 nm to decrease relative to 665 and 709 nm.
For more specific identification of cyanobacteria, an SS using 620, 665, and 681 nm was used to identify the presence of phycocyanin (PC), a characteristic pigment in this taxonomic division with identifying features in this spectral region 25,41,44. (Estimated PC concentration is not used because it has several issues: in particular, it is not a consistent estimator of cyanobacterial biomass, and the more usable PC concentration algorithms require robust atmospheric corrections 45, greatly limiting data availability.) In this case, a conditional negative SS(665) value indicates the presence of PC. Inclusion of 620 nm, which is the absorption peak of PC, reduces the false detection issue. In the case of cyanobacteria, SS(665) turns negative due to lower reflectance at the 620 nm band and is used as an exclusion criterion to select only cyanobacteria. This spectral shape condition has also been applied by 46 to separate cyanobacteria from other algal blooms in African lakes. The CI product, when SS(665) is negative, is termed CI-cyano and was used to estimate cyanobacteria biomass in this research.
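The two spectral shape tests can be sketched in a few lines. The sketch assumes the spectral shape is the reflectance at the central band minus a straight-line baseline drawn between the two reference bands, which is the usual form in the cited spectral shape literature; the function names, the choice to return 0.0 for non-detections, and the requirement that CI be positive are illustrative assumptions. The exclusion test follows the sign condition exactly as worded in the text (SS(665) negative), so the operational convention should be checked against the cited algorithm descriptions before reuse.

```python
def spectral_shape(rho_c, rho_minus, rho_plus, lam_c, lam_minus, lam_plus):
    """Reflectance at the central band minus a linear baseline
    interpolated between the two reference bands."""
    baseline = rho_minus + (rho_plus - rho_minus) * (lam_c - lam_minus) / (lam_plus - lam_minus)
    return rho_c - baseline

def ci_cyano(rho_620, rho_665, rho_681, rho_709):
    """CI = -SS(681) with 665 and 709 nm as reference bands; the value is
    kept only when SS(665), referenced to 620 and 681 nm, is negative
    (condition as stated in the text). Returns 0.0 otherwise (assumption)."""
    ci = -spectral_shape(rho_681, rho_665, rho_709, 681.0, 665.0, 709.0)
    ss_665 = spectral_shape(rho_665, rho_620, rho_681, 665.0, 620.0, 681.0)
    if ci > 0 and ss_665 < 0:
        return ci
    return 0.0
```

The inputs are Rayleigh-corrected reflectances for one pixel; a spectrally flat pixel, or one with a reflectance peak at 681 nm, yields no CI-cyano signal under these assumptions.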
For purposes of setting risk thresholds, we applied a relationship between CI and cyanobacterial cell concentration of 10⁸ cells mL⁻¹ per 1 unit of CI-cyano 11. While the relationship 11 was developed for Microcystis (so we term the value "Microcystis-equivalent cells"), it was validated by 15,41 for unspecified total cell concentrations of cyanobacteria in eight eastern U.S. states across New England (Connecticut, Massachusetts, Maine, New Hampshire, Rhode Island, and Vermont), Ohio, and Florida. A mean absolute percent error (MAPE) of 28.6% was reported between field-measured cyanobacteria biomass data (cells mL⁻¹) and satellite-derived cell biomass 15. This CI algorithm has also been confirmed for detecting cyanobacterial blooms and estimating biomass (cells mL⁻¹) in other areas 18,19.
Fourteen-day and seven-day maximum temporal CI-cyano composites were computed for the 2003-2007 and 2008-2011 MERIS FR time series data, respectively. Fourteen-day intervals were chosen for compositing the 2003-2007 time series in order to address the FR data gaps discussed earlier. Maximum temporal composites, that is, reporting of the maximum value retrieved during the 14-day or 7-day window, serve two purposes. First, many cyanoHAB species, such as Microcystis, Aphanizomenon, and Dolichospermum, have buoyancy control mechanisms and will typically float to the surface during the day when vertical water column mixing is weak 47. It is expected that over a 14-day or 7-day window, cyanobacteria would be near the surface on one or more days and so be captured in the satellite data 12,47. Second, compositing reduces the amount of missing data, particularly due to clouds.

Bloom magnitude was calculated as:

$$\text{Bloom magnitude} = \frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\sum_{p=1}^{P} CI_{\text{cyano}}(p,t,m) \qquad (2)$$

where the indices P and T represent, respectively, the number of valid pixels with detectable CI-cyano in a lake and the number of composite (time) sequences in each month (e.g. two composites in 2003 and four in 2011). Index M represents the number of months in a season/annual study period. Bloom magnitude was expressed in CI units, which are dimensionless (dl). As noted above, CI-cyano can be converted to Microcystis-equivalent cells by multiplying by the factor 10⁸ (cells mL⁻¹) 11,15,41 to provide a more intuitive biomass metric. In order to compare bloom magnitude across lakes with different surface areas, we normalize bloom magnitude by lake surface area as below:

$$\text{Area-normalized bloom magnitude}\ (\mathrm{km^{-2}}) = \frac{\text{Bloom magnitude (dl)}}{\text{Lake surface area}\ (\mathrm{km^{2}})} \qquad (3)$$

Lake surface area in Eq. (3), as detectable by satellite images, was estimated by identifying all pixels inside a lake polygon vector layer that were classified as water during 2008-2011. The number of water pixels was converted to area by multiplying it by the area of one MERIS FR pixel (0.09 km²).
Note that the estimated surface area may change over time due to seasonal precipitation and evapotranspiration. However, this satellite-adjusted surface area is a better representation of surface area than that available in lake databases, which often include dry lake beds and/or areas not covered by standing water. Henceforth, bloom metrics in Eqs. 2 and 3 are referred to as magnitude and area-normalized magnitude for brevity.
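A minimal sketch of the compositing and magnitude calculation follows. It assumes that magnitude is the sum of CI-cyano over the valid pixels of each composite, averaged over the composites in a month and then over the months in the period, as the index definitions for P, T, and M suggest. The nested-list data layout, the use of None for invalid pixels, the skipping of months without composites, and all function names are illustrative assumptions rather than the processing chain used in the study.

```python
def max_composite(scenes):
    """Per-pixel maximum over the scenes in one 7- or 14-day window.
    Each scene is a list of CI-cyano values; None marks an invalid pixel."""
    composite = []
    for pixel_values in zip(*scenes):
        valid = [v for v in pixel_values if v is not None]
        composite.append(max(valid) if valid else None)
    return composite

def bloom_magnitude(months):
    """months: list of months, each a list of composites, each a list of
    per-pixel CI-cyano values (None = invalid). Months with no composites
    are skipped (assumption)."""
    monthly_means = []
    for composites in months:
        if not composites:
            continue
        sums = [sum(v for v in comp if v is not None) for comp in composites]
        monthly_means.append(sum(sums) / len(sums))
    return sum(monthly_means) / len(monthly_means)

def area_normalized_magnitude(magnitude, n_water_pixels, pixel_area_km2=0.09):
    """Divide magnitude by the satellite-derived lake area in km^2."""
    return magnitude / (n_water_pixels * pixel_area_km2)
```

For example, a two-pixel lake with composites `[[0.1, 0.2], [0.3, None]]` in one month and `[[0.0, 0.4]]` in the next has monthly means of 0.3 and 0.4 and therefore a magnitude of 0.35 CI under these assumptions.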
Based on the World Health Organization's (WHO) cell abundance threshold 48, three magnitude classes were considered for categorizing lakes as Low (≤20,000 cells mL⁻¹), Moderate (20,000 < cells mL⁻¹ ≤ 100,000), and High (>100,000 cells mL⁻¹) exposure health risk. We estimated the area-normalized magnitudes that are equivalent to the WHO thresholds using the CI-cell abundance relation 11,15,41 and normalizing this cell-equivalent threshold by the pixel unit area (CI threshold/0.09 km²). These area-normalized CI thresholds are ≤0.0022 for WHO-low, 0.0022 to 0.0111 for moderate, and >0.0111 for high. We have also added a 'Very High' (V.High) category when estimated cyanobacteria concentration and area-normalized magnitude exceeded 1,000,000 cells mL⁻¹ and 0.111, respectively.

… 15,29. To address this issue, the lakes were ranked based on their seasonal or annual area-normalized magnitude, with rank 1 assigned to the lake with the greatest area-normalized magnitude in a specific year. When more than one lake had the same area-normalized magnitude level, the minimum possible rank was assigned to all lakes in the group. To summarize across years, each lake's median rank for the observational period was used. While ranks reduce the information on absolute bloom impact that is found in magnitude, they offer a key advantage over magnitude by allowing us to compare lakes between years, even when differences in data frequency bias the magnitudes. Ranks might still be region-specific, depending on differences in data coverage. For example, magnitude scores cannot be compared (ranked) between regions (states) having widely varying data sampling, because the lakes with sufficiently higher data frequency might appear to have more severe blooms than the lakes with reduced frequency. Lakes in regions (states) with similar data coverage can be compared with minimal bias.
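The threshold conversion and the tie-aware ranking can be checked with a short sketch. It uses only values stated in the text (10⁸ cells mL⁻¹ per CI unit, a 0.09 km² pixel, and the 20,000, 100,000, and 1,000,000 cells mL⁻¹ limits); the function names and the dictionary-based layout are illustrative assumptions.

```python
from statistics import median

CELLS_PER_CI = 1e8        # cells mL^-1 per unit of CI-cyano
PIXEL_AREA_KM2 = 0.09     # one MERIS FR pixel

def area_normalized_threshold(cells_per_ml):
    """Convert a cell-abundance threshold to area-normalized CI (km^-2)."""
    return cells_per_ml / CELLS_PER_CI / PIXEL_AREA_KM2

def who_class(area_norm_magnitude):
    """Assign the Low / Moderate / High / V.High category."""
    if area_norm_magnitude <= area_normalized_threshold(20_000):
        return "Low"
    if area_norm_magnitude <= area_normalized_threshold(100_000):
        return "Moderate"
    if area_norm_magnitude <= area_normalized_threshold(1_000_000):
        return "High"
    return "V.High"

def rank_lakes(magnitudes):
    """Rank 1 = greatest magnitude; tied lakes share the minimum rank."""
    ordered = sorted(magnitudes.values(), reverse=True)
    return {lake: ordered.index(value) + 1 for lake, value in magnitudes.items()}

def median_rank(yearly_ranks):
    """Summarize one lake's ranks across years by their median."""
    return median(yearly_ranks)
```

Here `area_normalized_threshold(20_000)` evaluates to about 0.0022 and `area_normalized_threshold(100_000)` to about 0.0111, matching the class limits quoted above.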

Ranking of lakes
We used a non-parametric statistic, the Theil-Sen slope estimator 49, for assessing trends in the ranks of cyanoHAB magnitude in a lake over time, with Kendall's τ 50 for the strength of the trend. The Theil-Sen slope is estimated as the median of the set of slopes in the ranked and paired data. The Theil-Sen estimator makes no assumptions about the error distribution and provides an unbiased estimator of trend 51. The Theil-Sen slope was expressed in units of ranks yr⁻¹ and interpreted as the number of ranks increased or decreased over time for the lake in question. A negative trend (toward the rank of 1) indicates that the lake is getting relatively worse. Kendall's τ, by determining the concordance between all pairs of two ranked variables, indicates the strength of a monotonic trend 51. Kendall's τ varies between −1 and +1, where a positive τ indicates that the ranks of both variables increase together, and a negative τ indicates that as the rank of one variable increases the other decreases. As we have a slope for direction, we report the absolute value of τ. A |τ| value of 0.2 to 0.5 indicates a moderate trend and a |τ| value > 0.5 indicates a strong trend.
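Both statistics are simple enough to sketch directly. The version below computes the Theil-Sen slope as the median of all pairwise slopes and Kendall's τ in its basic form without a tie correction (often called τ-a); the text does not say which τ variant was used, so that choice and the function names are assumptions.

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(years, ranks):
    """Median of the slopes between all pairs of (year, rank) points."""
    pairs = combinations(zip(years, ranks), 2)
    slopes = [(r2 - r1) / (y2 - y1) for (y1, r1), (y2, r2) in pairs if y2 != y1]
    return median(slopes)

def kendall_tau(x, y):
    """Kendall's tau without tie correction: (concordant - discordant)
    divided by the total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x2 - x1) * (y2 - y1)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs
```

For a lake ranked 10, 12, 11, 15, and 18 over 2003-2007, the slope is +2 ranks yr⁻¹ and τ is 0.8, which under the interpretation above is a strong trend away from rank 1 (relative improvement).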

Results
Annual area-normalized magnitude in Florida lakes. CyanoHAB magnitude is calculated as an annual mean of the 7- or 14-day maximum accumulation of CI in the lakes in Florida over the observation period. Therefore, large lakes are more likely to have a higher accumulation of biomass than lakes with smaller surface area (Fig. 3). Normalization of magnitude by lake surface area removes the influence of lake size from the metric and allows comparison of cyanobacteria area-normalized magnitude among lakes of different sizes. Results from 2011 are shown in Fig. 3, which highlights the impact of normalization of annual bloom magnitude using the lake surface area. Before normalization, Lake Apopka, Lake Okeechobee, Lake George, Hancock Lake, and Right Arm Lochloosa Lake ranked 1st to 5th, respectively, in 2011. Lake Apopka, which has the fourth largest surface area but year-round blooms, ranked first, whereas Lake Okeechobee, with the largest surface area but less frequent blooms, ranked second in terms of cyanoHAB magnitude in 2011 (Fig. 3A). When the mean annual biomass estimates were normalized by lake surface area, Hancock Lake, Cuthbert Lake, Thonotosassa Lake, Right Arm Lochloosa, and West Lake ranked in the top five positions, respectively (Fig. 3B). These results highlight that: 1) in 2011, while Lake Okeechobee had an intense bloom that covered only a portion of the lake, most of Hancock Lake was impacted; and 2) the average area of Hancock Lake was more severely affected by cyanobacteria than the average area in Lake Okeechobee.

Scientific Reports | (2019) 9:18310 | https://doi.org/10.1038/s41598-019-54453-y | www.nature.com/scientificreports
Area-normalized magnitude rankings for all lakes during the study period were analyzed to identify those lakes with the most severe cyanobacteria blooms needing attention (Fig. 4). Lakes are ordered by their median rank from 2003-2011 in ascending order. Hancock Lake, Lake Apopka, Lake Dora/Beauclair/Carlton, Cuthbert Lake, and West Lake are the top five lakes for annual area-normalized bloom magnitude in Florida (Fig. 4, Table 1). The three top-ranked Florida lakes (Hancock Lake, Lake Apopka, and Lake Dora/Beauclair/Carlton) exhibited little variation over time (Fig. 4). Because ranks are relative, when one lake moves to a lower rank, another lake must move to a higher rank. The changes are not evenly distributed. Six lakes showed large declines: Right Arm Lochloosa and Lake George declined at ~6 ranks yr⁻¹, and Clinch, Hamilton, Panasoffkee, and Buffum lakes declined at 3 to 5 ranks yr⁻¹. Of these lakes, the decline was highly consistent (τ > 0.5) for Lochloosa, Panasoffkee, George, and Buffum, and moderately consistent (τ = 0.3-0.5) for the others. In contrast, only three lakes (Seven Palms, Leonore, Konomac) had consistent (τ > 0.3) and large increases in rank (better), changing at +6-7 ranks yr⁻¹. Overall, area-normalized magnitude improved in Dias Lake, Monroe Lake, Deaton Lake, and Lake Griffin, which resulted in their lake ranks increasing at the rate of ~+2-3 ranks yr⁻¹. Several of the lakes at the southern tip of Florida (e.g., Cuthbert, West), while in the Everglades, are actually estuarine, with salinity influenced by Florida Bay. These are noted by asterisks (*) in the table.
In order to infer potential exposure risk of cyanoHABs in Florida lakes, we determined the number of lakes where cyanoHAB abundance exceeded the specific WHO risk thresholds of Low, Moderate, High, and V.High levels. A recreational Low WHO limit indicates lakes that are unlikely to have a management concern 48. Out of 135 lakes, 34-58 lakes (the range represents the number of lakes in a specific year) were assigned to the High category based on area-normalized magnitude over the study period (Fig. 5A). The year 2011 had the maximum number of lakes in the High category (n = 58, ~43% of all lakes in Florida). Similarly, the area-normalized magnitude for 66-90 lakes was in the Moderate category and for 1-11 lakes was in the Low category. In 2010, all lakes were in the Moderate and High categories. Area-normalized magnitude fell into the V.High range in 10 lakes (Right Arm Lochloosa Lake, Bear Lake, Parker Lake, Apopka Lake, Thonotosassa Lake, West Lake, Hancock Lake, Cuthbert Lake, Lake Griffin, and Lake Dora/Beauclair/Carlton, not in order) over the study period. In 2008 and 2011, eight (excluding Lake Griffin and Thonotosassa Lake) and nine (excluding Lake Griffin) of those 10 lakes, respectively, were categorized as V.High.
Without normalization by lake surface area, the median magnitude data in Florida were strongly skewed (min = 0.007 CI, max = 15.49 CI, median = 0.05 CI), with about 90% of lakes below an annual bloom magnitude of one CI. The top 10% (13 lakes) having the highest median bloom magnitude had bloom magnitudes …

… Table 2). These top three lakes maintained their ranks consistently and were in the WHO High category with area-normalized magnitude > 0.011, confirming severe cyanoHAB issues over time. Unlike the case for Florida, the variance in rank change for Ohio lakes during the study period is negligible, indicating that lakes maintained their area-normalized magnitude ranks consistently every year. However, there were substantial differences in cyanoHAB magnitude among different Ohio lakes. For instance, the median area-normalized magnitude in Grand Lake St. Marys (0.27 km⁻²) was ~12 times higher than that of 50% of the lakes in Ohio (Table 2). Out of 21 lakes in Ohio, there are three lakes (Ladue Reservoir, Clarence J. Brown Reservoir, and Evans Lake) where the relative rank decreased, or the area-normalized magnitude deteriorated, over time (slope = ~1-1.5 ranks yr⁻¹). These lakes showed relative decline against other lakes that showed relative improvement, especially East Fork Lake, Pymatuning Reservoir, Lake Milton, Bresler Upground Reservoir, and Michael J Kirwan Lake.
Based on the WHO recreational thresholds, 13 to 16 lakes (~62-76%) had an area-normalized magnitude in the High category (Fig. 5B). Area-normalized magnitude in Grand Lake St. Marys stayed in the V.High category throughout the study period. Similarly, the area-normalized magnitude in Buckeye Lake fell in the V.High category during 2003-2007 and in 2011. Bloom magnitude in Indian Lake reached the V.High threshold only in 2007. Median bloom magnitude data before normalization clearly highlighted four lakes: Grand Lake St. Marys (13.07 CI), Buckeye Lake (2.09 CI), Pymatuning Reservoir (1.64 CI), and Indian Lake (1.59 CI), where bloom magnitudes were one order of magnitude higher than the rest of the lakes.

Comparison of area-normalized magnitude in Florida and Ohio lakes.
In order to compare Florida and Ohio, the magnitudes from the two states were combined and lakes were then ranked. In the combined dataset, the seasonal area-normalized magnitude estimation was limited to the period from May 1 to October 31 of each year to exclude the snow/ice-covered winter months and to include area-normalized magnitude status during the typical cyanoHAB season. In addition, a comparison between the two states was only performed for 2008-2011 to avoid any positive bias in the MERIS FR data in Ohio relative to Florida prior to the year 2008.
Among all lakes from the two states, Grand Lake St. Marys (OH), Hancock Lake (FL), Apopka Lake (FL), Cuthbert Lake (FL), Lake Dora/Beauclair/Carlton (FL), West Lake (FL), Right Arm Lochloosa Lake (FL), Parker Lake (FL), Buckeye Lake (OH), and Thonotosassa Lake (FL) are the top ten lakes based on median area-normalized magnitude rank as observed from 2008 to 2011 (Fig. 8, Table 3). Based on seasonal magnitude (not normalized), Apopka Lake in Florida (15.75 CI), Grand Lake St. Marys in Ohio (12.78 CI), and Lake Okeechobee in Florida (12.71 CI) are the top three lakes in descending order (Fig. 9), reflecting the large-magnitude blooms that occurred in these large lakes.

Discussion
Evaluating severity each year and comparing lakes provides a potentially important resource for managers, including under such regulatory frameworks as the European Union Water Framework Directive 52 and the U.S. Clean Water Act 53,54 . In Lake Erie, an assessment of magnitude (over a 30-day period) from satellite proved essential to the development of nutrient target strategies 55 , and has also led to an annual forecast of bloom severity 45 . In comparison to satellite, traditional routine monitoring is difficult and expensive. Exceptionally strong programs such as Ohio EPA's drinking water program may collect a sample each week in a water body (although for toxin only) 56 . More common for a water quality monitoring program is monthly or quarterly sampling of water quality (including Chl-a), such as seen in Florida's several monitoring programs 57,58 .
The results presented here capture the relative severity of cyanobacterial blooms observed in state monitoring programs. In Ohio, the concentration of microcystin toxins is the most common water quality measurement. Three of the top four lakes (Grand Lake St. Marys, Buckeye, and LaDue) consistently reported the highest microcystin concentrations of the observed lakes when Ohio EPA started sampling in 2010, and these were well above the WHO recreational risk levels (10 µg L⁻¹) 15. These lakes are also currently listed as impaired due to algae and associated microcystin 56. The fourth, Indian Lake, does not have routine sampling.
In Florida, we verified the ranking using field Chl-a data for lakes found in both the Florida Water Atlas 58 and in the top 50 lakes from our satellite-based observations (as in Table 1). Similar to the annual area-normalized magnitude estimation, we calculated annual mean Chl-a concentration for the study years 2003-2011 by taking the mean of monthly mean Chl-a concentrations for all samples from a lake available in the database. To match area-normalized magnitude, we calculated the median of annual mean Chl-a concentrations over the study period. We further ranked the ten lakes based on median area-normalized magnitude and median of annual mean Chl-a concentration over the nine-year study period (Fig. 10), and found similar results across these lakes, from Lake Hancock (rank 1 in both) to Hamilton Lake (rank 10 among this set of lakes, and 49 in the larger satellite dataset).
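The Chl-a aggregation described above (monthly means, then annual means, then the median across years) can be sketched as follows. The input layout of (year, month, Chl-a) tuples and the function name are illustrative assumptions; the field database format is not described in the text.

```python
from collections import defaultdict
from statistics import mean, median

def median_annual_chla(samples):
    """samples: iterable of (year, month, chla) for one lake.
    Returns (median of annual means, {year: annual mean})."""
    by_month = defaultdict(list)
    for year, month, chla in samples:
        by_month[(year, month)].append(chla)
    # Average within each month first so heavily sampled months
    # do not dominate the annual mean.
    by_year = defaultdict(list)
    for (year, _month), values in by_month.items():
        by_year[year].append(mean(values))
    annual = {year: mean(values) for year, values in by_year.items()}
    return median(annual.values()), annual
```

Averaging monthly means rather than raw samples mirrors the month-then-year averaging used for magnitude, which keeps the two rankings comparable.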
Two studies with satellite-based methods examined Ohio and Florida lakes and reached similar conclusions about the most impacted lakes as found here, although they used different methods. Gorham et al. 27 used 10 years of MERIS data (2002-2011) to estimate PC with a semi-analytical model 25. Lakes were evaluated using the maximum PC concentration for the year at each pixel in each lake. This approach would rank a lake having a single day of high concentration as more severe than a lake with a slightly less severe long-duration bloom. Regardless, this approach also put Grand Lake St. Marys, Buckeye Lake, Indian Lake, and Seneca Lake as the top four, matching our result.
Clark et al. 15 focused on bloom frequency. They calculated cyanoHAB frequency for Ohio and Florida lakes as the fraction of total pixel observations where cyanoHAB abundance exceeded the WHO's threshold of 100,000 cells mL⁻¹, and then ranked those lakes by the overall frequency during the 2008-2011 study period. They concluded that Lake Apopka in Florida (cyanoHAB frequency = 99.1%) and Grand Lake St. Marys in Ohio (cyanoHAB frequency = 83.1%) had the highest cyanoHAB frequency during 2008-2011. Their top ten lakes from Florida based on cyanoHAB frequency 15 (in order: Apopka (1), Pierce, Dora, Marion, Howard, Parker, Hancock, Harris, Jesup, and Juliana (10)) are in the top 18 lakes in our study (Fig. 4). As expected, a comparison of Florida lake ranks based on frequency and area-normalized magnitude highlights differences in information between the methods. A lake with a persistent moderate bloom would rank higher in frequency than in magnitude; an example is Lake Pierce (rank 2 in frequency, 9 in magnitude). A lake with short intense blooms would rank higher in magnitude than in frequency. An example is Hancock Lake, which has intense annual blooms (Chl-a of 300-500 mg m⁻³) that only last for a few months 58, and ranks 1 in area-normalized magnitude and 7 in frequency. The difference between metrics can be more acute in large lakes, like Lake Okeechobee, which have blooms that are large in magnitude but do not cover most of the lake (Fig. 9). A single metric cannot highlight all aspects of cyanoHABs; complementary cyanoHAB metrics representing factors like bloom frequency, area, and area-normalized magnitude are needed.
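To make the contrast with magnitude concrete, the frequency metric as paraphrased above can be sketched as the fraction of valid pixel observations exceeding the cell threshold. This is only a reading of the description given here, not the cited study's implementation; the 10⁸ cells mL⁻¹ per CI conversion comes from the text, while the function name and the use of None for invalid observations are assumptions.

```python
def bloom_frequency(ci_observations, threshold_cells=100_000, cells_per_ci=1e8):
    """Fraction of valid pixel observations whose Microcystis-equivalent
    cell concentration exceeds the threshold. None marks an invalid
    observation; returns None when nothing valid was observed."""
    valid = [v for v in ci_observations if v is not None]
    if not valid:
        return None
    exceeding = sum(1 for v in valid if v * cells_per_ci > threshold_cells)
    return exceeding / len(valid)
```

A lake whose pixels sit just above the threshold all year would score near 1.0 here while contributing little to summed CI-cyano, which is the difference between the two metrics that the lake comparisons above illustrate.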
Wind-driven mixing in the water column can add uncertainty to the cyanoHAB magnitude estimates for species with buoyancy regulation, such as Microcystis aeruginosa and Aphanizomenon flos-aquae. Satellite algorithms, in bloom conditions, detect cyanobacteria concentration only near the surface. Previous studies have reported that wind stress can increase vertical mixing of cyanobacteria cells and thereby reduce the ability of the ocean or water color sensor to detect the majority of the biomass 12. Therefore, persistent high wind (>7.7 m s⁻¹ as observed in Lake Erie) 11, when combined with frequent cloud cover (such that only windy days during the bloom season are imaged), may occasionally lead to underestimation of cyanoHAB biomass or area-normalized magnitude. Cloud cover, sun glint, and the effectiveness of masking algorithms for other invalid pixels (e.g. mixed land and water at the shore, dry lake bed, algal mats, and vegetative areas) may add uncertainty to the satellite-based measurements. Cloud and glint impacts should be uniform through a region, but the other factors might bias specific lakes.

Another source of uncertainty could come from the use of the CI-cyanobacteria cell count relationship when the analysis is scaled up to CONUS lakes. In Lunetta et al. 41, a CI-cyanobacteria biomass relationship was demonstrated using field data collected from lakes in Ohio, Florida, and throughout New England. The same CI-cyano and cyanobacteria biomass relationship was revised and presented in 15 with more meaningful error estimates (MAPE = 28.6%). Error in manual cell enumeration is inversely proportional to the number of colonies counted 48. An error of 20-30% is expected and considered acceptable when at least 400 units or colonies are counted, although Chorus and Bartram 48 report that normally only 20-40 colonies may be present in 100 mL of sampled water from the field. Therefore, errors even higher than 20-30% from field cell density alone cannot be ruled out. Similarly, variability in the spatial distribution of biomass in a bloom can add up to two orders of magnitude difference in biomass, as observed by 59 in a cyanobacteria bloom in the Gulf of Finland. Spatiotemporal variability in biomass on a diurnal scale can add significant uncertainty as well 60. Therefore, after considering errors from multiple sources, an error or difference greater than 30% is expected when satellite estimates are compared with field-measured cell density from a point sample in a bloom event. Additionally, a CI-cyano to Chl-a relationship, established by Tomlinson et al. 16 for Florida lakes, could be used in future studies, although the CI-cyano to Chl-a relationship may require additional examination before it is applied to all CONUS lakes.
Lake size presents a potential limitation on decisions based on area-normalized magnitude. One such example is Lake Okeechobee in Florida, which is the largest freshwater lake in Florida and the 9th largest freshwater lake (by area) in the United States. Due to its size and societal importance (for water supply, tourism, and ecological impacts), cyanoHAB issues in this lake have been widely covered by the press and media, thereby creating cyanoHAB awareness at state and national levels. While Lake Okeechobee was ranked second in bloom magnitude (behind Apopka, a moderately large lake), based on the area-normalized magnitude, Lake Okeechobee was ranked 95th among the Florida lakes. This is because the bloom area is simply a small percentage of surface area in such a large lake. Therefore, for larger lakes, annual or seasonal bloom magnitude numbers should be used for lake water management and decision making related to water quality. In contrast, area normalization highlights cyanoHABs in most of the smaller lakes such as Hancock Lake, Lake Dora/Beauclair/Carlton, and many others, which may not get enough attention due to their size, although they equally bear the potential of causing an adverse effect on health and the environment.

[Figure caption, beginning missing] Lake labels include the state name the lake is associated with and the number inside brackets represents the median area-normalized magnitude rank over the same study period. Right panel: median area-normalized magnitude for the same lakes during the same study period provided for comparison. Gray-colored bars represent lakes from Ohio.

Figure 10. Relative comparison of lake ranks calculated from the annual area-normalized magnitude and measured annual mean Chl-a concentration. Numbers associated with the lake names in the x-axis tick label represent the median lake rank as in Table 1.
The proposed bloom magnitude metric provided a synoptic view of lakes by capturing spatiotemporal mean areal cyanobacteria biomass. Further normalization of bloom magnitude by lake surface area provided comparable and actionable information for water quality managers within states or other jurisdictional boundaries. The relative ranking of lakes allows the MERIS record (2003-2011) to be utilized within a state or a region, as the MERIS FR temporal frequency is expected to be similar. Ranks and nonparametric statistics provide robust parameters that do not depend on calibration accuracy and precise thresholds, unlike direct metrics such as bloom area, bloom frequency, or area-normalized magnitude. The rank-based metric has the additional power of allowing the use of multiple satellites without introducing biases between the different satellite data sets. For example, OLCI on Sentinel-3A and 3B may not currently match the MERIS calibration, as the OLCI calibration is still ongoing as of this writing, but the ranking of the lakes would eliminate the systematic bias in the data due to differences in calibration coefficients. Therefore, area-normalized magnitude ranks estimated from OLCI should be consistent with those from MERIS in each season, allowing identification of those lakes that are changing in bloom magnitude. The area-normalized magnitude should be inspected (together with sample frequency) to confirm that there is not a systemic change in all the lakes in a region over the time period of interest. This problem is less likely to occur if many lakes are considered in the analysis, or if they have fundamentally different environmental characteristics. These sensors cannot resolve all lakes of interest in a state. Narrow lakes and some rivers are a particular problem. For those small or narrow water bodies, Sentinel-2 may provide a solution, but that requires more research, as some key bands (620 and 681 nm) are not on that sensor.
The method presented in this study captures an assessment of cyanoHAB magnitude, which is the cyanobacteria biomass for the year or season. Normalization of bloom magnitude by lake surface area let us compare the cyanoHAB magnitude across lakes of varying size. Our approach of ranking the lakes by median area-normalized magnitude helped us to highlight the top lakes, which need immediate attention from water quality managers. Provided below are three advantages of the ranking:
1. This approach uses the power of ranked and non-parametric statistics in order to be able to use the MERIS time series (2003-2011) in a state or a localized region irrespective of bias in temporal coverage. However, if contemporaneous satellite data collection frequency is different between two areas, they cannot be compared side-by-side and lakes in those areas should be analyzed separately.
2. This approach would also allow the use of Sentinel-3 OLCI data along with the MERIS time series even though the sensor calibration coefficients of OLCI are still being refined and may not match those of MERIS. This would enable a comparison of cyanoHAB magnitude derived from OLCI with the historic cyanoHAB magnitude derived from MERIS.
3. As our method also included WHO recreational thresholds, the same information may also be used for categorizing which lakes need pressing attention for cyanoHAB management. A specific threshold representing exposure risk can be set and lakes above that threshold may be identified as a priority during the observational period.
No one metric can completely represent the attributes of cyanoHAB severity of interest to water quality managers. Area-normalized magnitude can provide awareness of smaller lakes that have significant blooms. However, the area-normalized magnitude can equate a short-lived, large, intense bloom with a long-lived, moderate bloom in any sized lake. Therefore, the reader is encouraged to compare other metrics such as temporal frequency 15 and bloom spatial extent 29 to address related questions. These methods complement each other and can provide a more complete picture of cyanoHABs on a regional or national scale. The Cyanobacteria Assessment Network (CyAN) project 13 provides the capability to scale this effort to CONUS fresh water lakes and water bodies. Our future work could focus on providing a comprehensive analysis of cyanoHAB magnitude in CONUS lakes and identifying lakes of concern.

Data availability
The satellite dataset used in the current study can be downloaded from https://oceancolor.gsfc.nasa.gov/ and datasets generated during this study are available from the corresponding author on request.