Main

Harmful algal blooms and their associated toxins pose a severe threat to health and water quality1,2. Microcystins, produced by cyanobacteria commonly known as blue-green algae, are one of the most frequently detected classes of such toxins globally3, with several hundred microcystin congeners currently known4. As a potent hepatotoxin, microcystin affects liver function by inhibiting protein phosphatases5 and acute concentrations can lead to fatalities in wild and domestic animals and, in rare cases, humans6,7. Microcystin can also impact animal and human health through chronic exposure8,9, including being classified as a potential carcinogen10. As an example of real-world impacts, around 500,000 people living near Lake Erie were instructed in 2014 to not drink tap water because microcystin concentrations exceeded safe limits in finished drinking water1.

Climate change is one of the greatest challenges to water quality and aquatic ecosystems11,12. Lakes, which hold 87% of Earth’s liquid surface freshwater13, are sentinels of climate change14 with lake summer surface temperatures expected to increase by 0.34 °C per decade on average15. This trend will have severe consequences for lakes, with impacts on critical abiotic and biotic processes such as mixing regimes, evaporation, lake ice phenology and the growth rates and composition of freshwater taxa11,12,14.

How temperature affects total microcystin concentrations within lakes, however, is still unclear. Generally, cyanobacteria have been identified as big winners of climate warming and are likely to increase globally in abundance and dominance, outcompeting other species due to boosted growth rates and intensified water stratification16,17,18,19,20. Larger blooms, however, do not necessarily contain higher microcystin concentrations, as not all taxa can produce this class of toxins21 and high genetic diversity leads to varying toxin-producing capabilities even within a species22. Warming has also been found to promote toxic strains over non-toxic strains23 but toxin cell quotas have been reported to decrease with increasing temperatures24,25. While laboratory studies have reported temperature optima for elevated microcystin concentrations26, field studies widely disagree on the direction of the temperature effect on microcystin concentrations, reporting negative, positive or negligible27,28,29 relationships in lakes. This complicates water quality management because rising temperatures may counteract the effect of nutrient management strategies30.

Here, we use 3,027 measurements from 2,804 lakes across the United States that were sampled in 2007, 2012 and 2017 as part of the National Lakes Assessment31 (NLA) to determine how temperature impacts microcystin occurrence (defined here as concentrations above the 0.1 μg l−1 detection limit of enzyme-linked immunosorbent assays (ELISA)32,33 used by the NLA) and concentrations across large geographic regions. We use a logistic model to represent occurrence and a log-normal model to represent concentrations above the analytical detection limit while controlling for pH, Secchi depth, the natural-log transformations of each of total nitrogen (TN), total phosphorus (TP), chlorophyll a (Chl-a), dissolved organic carbon (DOC) and lake depth (D) and area (A), all of which have been found to impact or correlate with microcystin concentrations in previous studies22,27,28,34,35. The two models are then combined into a zero-adjusted model27 that represents microcystin both below and above the detection limit (Methods).

The models are fitted in the framework of generalized additive models for location, scale and shape36 (GAMLSS) which allows us to identify potential nonlinear relationships and to model the full probability distribution of microcystin concentrations, making it possible to quantify the probability of exceeding any defined water quality threshold under specific environmental conditions (Methods).

We then use the combined model to assess the impact of future warming on the geographic distribution of areas with elevated probabilities of microcystin concentrations above critical water quality thresholds across the continental United States. To do so, we use projections from 15 general circulation models (GCMs) (Extended Data Table 1) participating in the Coupled Model Intercomparison Project Phase 6 (CMIP6) (ref. 37) under the ‘middle-of-the-road’ Shared Socioeconomic Pathway scenario SSP 2-4.5 (ref. 38) (Methods).

Exceedance probabilities of water quality thresholds

The World Health Organization (WHO) provisional guideline values for drinking water39 list concentrations of 0.3 and 1 μg l−1 for children and adults, respectively. The US Environmental Protection Agency (EPA) has set a microcystin threshold of 8 μg l−1 as the water quality criterion for recreational waters protective of human health while either swimming or taking part in recreational activities on the water40. During the three sampling years of the NLA, 18.0% and 9.9% of lakes investigated in this study had concentrations above the drinking water guideline for children and adults, respectively, and 1.3% had concentrations above the recreational water quality criterion (Fig. 1). The model developed here (Methods) was applied under the observed environmental conditions at each lake at the time of sampling and predicts 16.8–20.0% (99% interval; Methods) of lakes would have concentrations above the children’s drinking water guideline. The predicted intervals are 7.6–10.1% for the adult drinking water guideline and 1.1–2.3% for the recreational water criterion. The model is therefore highly effective at representing the probability of exceeding critical water quality standards under the range of conditions observed during the NLA.

Fig. 1: Microcystin concentrations across the United States.

Observed microcystin concentrations across 3,027 sampling points (2,804 lakes) over three survey years (2007, 2012 and 2017).

For individual lakes, the predicted probabilities range from 0 (Waldo Lake, Oregon) for exceeding the recreational water criterion to 0.91 for exceeding the children’s drinking water guideline (Roundup Lake, Nebraska) under the environmental conditions observed at the time of sampling. In regions such as the upper Midwest corn belt, the probability of exceeding the 0.3 μg l−1 water quality guideline for drinking water for children can be >0.50 on average across entire basins (Fig. 2a; five basins), >0.25 for exceeding the 1.0 μg l−1 water quality guideline for adults (Fig. 2c; 14 basins) and >0.10 for the 8 μg l−1 recreational water criterion (Fig. 2e; two basins). These results indicate that exceeding water quality thresholds in those areas is already common rather than being the exception. The probabilities show a strong geographic signature (Fig. 2) based on temperature, degree of eutrophication and other environmental conditions (Extended Data Fig. 1), which we discuss next.

Fig. 2: The upper Great Plains and the upper Midwest corn belt show the highest probabilities of exceeding microcystin thresholds.

The exceedance probabilities for each lake are calculated on the basis of the environmental conditions measured at the time of microcystin sampling. a–f, Exceedance probabilities are then averaged across lakes on the HUC6 basin scale (a,c,e) and HUC2 regional scale (b,d,f) for the thresholds 0.3 μg l−1 (a,b), 1 μg l−1 (c,d) and 8 μg l−1 (e,f).

Drivers of microcystin occurrence and exceedance probabilities

We find that temperature (T), TN, Chl-a and pH help explain both the occurrence (logistic model) of microcystin and median concentrations above the detection limit (log-normal model). In addition, lake area (A) and DOC have an effect on microcystin occurrence, thus indirectly also impacting microcystin concentration in the combined model, while lake depth (D) affects concentration above the detection limit. The partial effects of these covariates are shown in Fig. 3a–f for occurrence, in Fig. 3g–l for the median and spread (Methods) of the concentrations above the detection limit and in Fig. 4 for the probability of microcystin concentrations exceeding a specific threshold. Neither log TP, nor its interaction with log TN, showed a significant relationship to microcystin. Also, Secchi depth did not provide significant additional explanatory power.

Fig. 3: Partial effects of environmental parameters.

a–f, Partial effects of the environmental parameters on the log odds of microcystin being detected (that is, microcystin occurrence) for the logistic model. g–l, Partial effects of the environmental parameters on concentrations above the detection limit for the log-normal model; g–k show the partial effect on log(μ), where μ is the predicted median of the microcystin concentration distribution, while l shows the effect on log(σ), where σ is the predicted standard deviation of the log microcystin concentration above the detection limit (Methods). The ranges on the horizontal axes represent the ranges observed in the sampled data. The range for the log-normal model is smaller than for the logistic model because it only includes the range of observed values for those samples where the microcystin concentration was above the detection limit. The shaded areas represent the 95% confidence intervals.

Fig. 4: Exceedance probabilities under varying temperature, log TN, log area, log chlorophyll a, log DOC, log depth and pH.

For easier visual interpretation the plots are grouped into selected exceedance probability bins, although exceedance probabilities are continuous. a–g, For a given microcystin threshold on the vertical axis the colours represent the exceedance probability as a function of temperature (a), TN (b), area (c), chlorophyll a (d), DOC (e), depth (f) and pH (g). In each panel the other (log-transformed) environmental variables are held at their median (pH of 8.27, log chlorophyll a of 2.23, log TN of 6.53, temperature of 24.4 °C, log DOC of 1.77, log depth of 1.57 and log area of 3.55).

Temperature

We find that while the probability of microcystin occurrence increases monotonically with temperature (Fig. 3a), the median concentration above the detection limit peaks at 22 °C (Fig. 3g). Consequently, the probability of exceeding specific concentration thresholds is also highest for temperatures ranging from 20 to 25 °C (Fig. 4). This contrasts with previous studies based on a more limited set of observations that had found a negative or negligible effect of temperature27,28. Laboratory or single lake studies, however, have also shown an optimum temperature in the range of 20–25 °C for elevated microcystin concentrations26,41, supporting our findings.

The more flexible nonlinear approach applied here thus made it possible to observe the presence of a temperature optimum at the field scale across thousands of lakes. Because both the logistic and log-normal models include proxies for biomass (Chl-a), the higher probability of microcystin occurrence at higher temperatures is probably attributable to an increase in the dominance and relative abundance of toxic strains of cyanobacteria23, resulting from increased growth rates and water column stratification at higher temperatures.

The decreasing exceedance probabilities above 22 °C are consistent with studies reporting a reduction in toxin quota at higher temperatures and a decoupling between optimal growth rates and highest toxin concentration24,26. Physiological reasons for this pattern are not fully understood24,25. Another plausible reason for decreasing concentrations when temperature exceeds 22 °C is an increase in microcystin removal by bacteria that are capable of degrading microcystin42.

Other environmental drivers

As expected, the model developed here confirms that in-lake nitrogen has a clear positive effect on microcystin occurrence and exceedance probabilities (Figs. 3 and 4). This is in line with previously observed results indicating that TN is a dominant driver of microcystin27,28,34. This pattern may reflect that cells deal with excess N by shunting it out into N-rich metabolites26,43 such as microcystin. In addition, however, we also identified a direct impact of TN on the variability in microcystin concentrations above the detection limit (Fig. 3l), suggesting that the uncertainty in microcystin concentration increases with TN, that is, that the likelihood of microcystin concentration extremes scales directly with in-lake TN.

Consistent with earlier field and laboratory studies27,41, the probability of exceeding microcystin thresholds grows with increasing Chl-a and pH. Intense blooms can lead to depleted carbon dioxide concentrations, thereby increasing pH. Studies also suggest that depleted carbon dioxide concentrations favour cyanobacteria over other taxa25,44. For DOC, earlier studies report both negative and positive effects on microcystin occurrence or concentration27,34,45. Here, we find that DOC has a positive effect on microcystin occurrence, which increases the probability of microcystin concentrations exceeding water quality thresholds. The effect of DOC on microcystin may be the result of light limitation46 or the production of reactive oxygen species formed under ultraviolet radiation, which in turn has been shown to spur toxin production35. We also find an effect of lake area and depth on the probability of exceeding microcystin thresholds. Both lake area and lake depth increase the likelihood and strength of stratification in summer months47, which has been observed to promote cyanobacterial dominance over eukaryotic competitors48. Another way in which lake morphology may impact microcystin concentrations is through the coupling between lake water and lake sediments, where microcystin degradation has been found to take place49.

Compound effects in the presence of warming

We find that the impact of rising temperatures is greater under high TN concentrations (Fig. 5a) and the sensitivity to changes in TN concentrations is highest for temperatures between 20 and 25 °C (Fig. 5b). This suggests that the impact of warming can be supercharged by eutrophication. At the time of the NLA, regions with high TN concentrations (Extended Data Fig. 1a) did not coincide with temperature regions associated with the highest risk for elevated microcystin concentrations (Extended Data Fig. 1g) but this could change under future conditions. That the whole may be greater than the sum of its parts is in line with the ‘allied attack’ concept50, which suggests a potential reinforcing effect between eutrophication and rising temperatures; such reinforcement was previously observed to boost microcystin in a pond water experiment16. Here, we observe and quantify this effect across thousands of lakes (Fig. 5). Interestingly, such a compounding effect was not observed for cyanobacterial biovolume19.

Fig. 5: High in-lake TN concentrations amplify the effect of warming on the probability of exceeding microcystin water quality guidelines.

a, The absolute change in exceedance probabilities for microcystin being above the detection limit (occurrence) and other selected thresholds when changing temperatures under different in-lake nitrogen concentrations. All variables (other than temperature and log TN) are kept at their respective medians. b, The absolute change in exceedance probabilities when changing TN concentrations under different temperature conditions. All variables (other than temperature and TN concentration) are kept at their respective medians.

Geographic redistribution under rising temperatures

We find that the probability of exceeding water quality thresholds will increase under SSP 2-4.5, applied to 15 climate models participating in the CMIP6 intercomparison (Methods). Taking the water quality guideline for children as an example, the relative increase in exceedance probability will be >50% for 14 basins by the mid-century and 28 basins by the late-century relative to historical conditions (Fig. 6a,c,e). We project that about a quarter of basins will see increases of >25% for the mid- (66 basins) and late-century (78 basins). Even at the Hydrologic Unit Code (HUC) regional scale HUC2 (Fig. 6b,d,f), three regions will see increases of >25% by mid-century and two regions will see increases of >50% by late-century.

Fig. 6: Warming will change the exceedance probability of the children’s drinking water guideline (0.3 μg l−1), with relative increases exceeding 50% in some areas.

a–f, The relative increase in risk due to warming relative to the historic period (1950–1979), averaged across lakes within individual HUC6 basins (a,c,e) and HUC2 regions (b,d,f), for present (1990–2019) (a,b), projected mid-century (2030–2059) (c,d) and projected late-century (2070–2099) (e,f) summertime temperatures. All other variables are held constant at their values at the time of sampling. White areas represent basins with no sampled lakes.

Geographically, the regions with the greatest relative increase in exceedance probabilities are located in the north of the United States, where summer temperatures are projected to be close to the optimum for high microcystin concentrations (Fig. 7). For some regions, such as the Great Lakes, the exceedance probabilities are already high (Fig. 2) owing to relatively high TN concentrations. But even for northern regions that currently have relatively low exceedance probabilities (Fig. 2), a relative change in exceedance probabilities of up to 50% means that those regions are much more likely to experience hazardous microcystin concentrations in the future. Interestingly, even at present, exceedance probabilities are >10% higher than under historical conditions for more than a quarter of basins (Fig. 6a), indicating that the impact of regional shifts in temperature is already being felt.

Fig. 7: Areas with summertime temperatures associated with the highest risk of elevated microcystin concentrations are moving northward.

a–d, The absolute difference of summertime air temperatures to 22 °C, the approximate temperature with the highest risk, for the historic period (1950–1979) (a), present (1990–2019) (b), projected mid-century (2030–2059) (c) and projected late-century (2070–2099) (d).

Conversely, only one basin is expected to see a relative reduction in exceedance probabilities of >25% by the late-century, while fewer than a quarter of basins will experience a reduction of >10% by the mid-century and a third by the late-century (Fig. 6d,e,f). These areas coincide with regions where temperatures are expected to rise above the range associated with the highest probability of exceeding microcystin thresholds.

Similar patterns emerge for other microcystin thresholds, that is, 1 and 8 μg l−1 (Extended Data Figs. 2 and 3). Overall, the divergent responses in the north and south result from the northward migration of the region with summertime temperatures closest to the 22 °C optimum (Fig. 7). However, the divergent response is not symmetric and there are more regions experiencing increases than decreases (Extended Data Fig. 4). Under increasing temperatures, microcystin levels are also expected to reach detectable concentrations more often (Extended Data Fig. 5) with average relative increases in occurrence of 9.5% by the late-century across HUC2 regions.

Given that microcystin occurrence is defined here on the basis of the detection limit of 0.1 μg l−1, it is likely that smaller concentrations will become even more frequent. What impact an increasingly chronic microcystin exposure even under low concentrations will have on health and ecosystem functions is not well understood. In a recent study, it was shown that chronic exposure with an estimated daily intake of even only 0.15–0.27 μg (MC-LReq per day) led to detectable microcystin concentrations in human blood sera and was associated with signs of renal impairments9. Chronic exposure is especially dangerous for people with existing liver disorders such as non-alcoholic fatty liver disease8. Microcystin has also been reported to be transferred to higher trophic levels through the food chain51, to lead to growth inhibition of edible plants in bioassays and to be detected on plants grown for human consumption after crop spray irrigation52,53.

Compound events54, for example, events of synchronously high in-lake TN concentrations together with high-risk temperatures, will further increase the likelihood of microcystin extreme concentrations in the future (Fig. 5). High concentrations may occur under changing land use patterns or from agricultural runoff after weather events such as heavy rainfall. Hotter spring days or extended periods of temperatures around 22 °C in autumn may also increase the frequency and duration of high-concentration events. Although more data will be needed to assess seasonal differences in microcystin occurrence and concentration, summer-based monitoring programmes will need to extend to the spring and autumn months.

Conversely, preventing excess nutrients from reaching water bodies and reducing eutrophication is key to counteracting the effects of rising temperatures. Efforts to reduce inland water eutrophication over the past decades have focused mainly on phosphorus, due to the assumption that atmospheric N2-fixing cyanobacteria will buffer nitrogen levels in the water column55. However, N2-fixing is not sufficient to offset nitrogen loss due to processes such as denitrification56 and microcystin-producing taxa such as the genus Microcystis require external sources of nitrogen57 from runoff or atmospheric deposition. In some systems, nitrogen loading has been shown to selectively promote the abundance of toxic Microcystis strains, while orthophosphate has not58. Therefore, the results presented here add to the increasing evidence that an exclusive focus on phosphorus reductions alone will not be sufficient to mitigate dangers from nitrogen-rich cyanotoxins26.

Additional complexities and broader considerations

The ELISA used here (Methods) represents a congener-independent, robust and highly sensitive assessment of total microcystin concentrations59,60. Its main advantage relative to methods such as liquid chromatography-mass spectrometry is that it quantifies the total concentration across a very high fraction (>80%) of microcystin congeners60. A loss of microcystins during preparation and storage can in principle result in an underestimation of the true microcystin concentration61 with ELISA, while microcystin conjugates and byproducts could lead to overestimations of the toxicity of a sample62,63. Also, variable cross-reactivity of congeners can lead to either under- or overestimations of concentrations63. The uncertainty resulting from these factors, however, would be highly unlikely to change the primary conclusions of the current study (Methods).

Future large-scale monitoring using congener-specific approaches such as liquid chromatography-mass spectrometry could be used to answer additional questions about the impact of warming on specific strains and congeners, such as a shift in their diversity and relative abundance64. Routine monitoring of all known congeners is almost impossible, however, because new microcystin congeners continue to be discovered. Low levels of several congeners that individually fall under the detection limit could also lead to an underestimation of the total microcystin concentration. Therefore, monitoring using congener-specific methods could supplement but not replace assessments based on ELISA.

Additional factors such as the local cyanobacterial strain composition and genetic diversity, grazing by zooplankton, viral lysis or microbial interactions can also influence microcystin concentrations65,66 but were not considered here because such data are currently non-existent for the United States across large scales. The identified effects of temperature therefore represent the net effect of temperature across various biotic and physiological responses. While strain evolution could in principle impact the future trends explored here, an earlier study found that increased eutrophication and temperature did not favour strain evolution towards more toxin-producing strains in one of the most important microcystin producers Microcystis aeruginosa67. Future work exploring biotic factors could further deepen understanding of the processes that control microcystin and bloom dynamics.

In addition, future changes in temperature are also likely to affect other drivers included in the model and the future projections presented here do not account for correlations among changes to those variables (Extended Data Fig. 6). Given that temperature has been found to increase cyanobacterial biomass either directly or indirectly17,68, increases in temperature may also increase other predictors in the model, such as Chl-a. As such, microcystin concentrations would be expected to increase even more than presented here, making our conclusions a conservative estimate of future impact. The long-term coupling of environmental variables, however, may impact cyanobacterial taxa differently19,69, which makes future microcystin projection much more complex under climate change and highlights the need for more research focusing on the interplay of environmental variables.

Lastly, although this study focuses on lakes in the United States, microcystin is a global hazard65 and the results shown here indicate that microcystin occurrence is expected to increase with rising temperatures. Regions where summertime temperatures are near 22 °C, promoting exceedance of microcystin thresholds, will shift poleward not just for the United States but globally (Extended Data Fig. 7). Potential seasonal shifts in periods with elevated microcystin concentrations can also be expected. The threat to water quality posed by microcystin will therefore grow globally as warming continues, especially in eutrophic systems.

Methods

Data

We analysed lakes across the continental United States that were part of the EPA National Lakes Assessment (NLA) surveys conducted in 2007 (ref. 31), 2012 (ref. 70) and 2017 (ref. 71) with microcystin concentrations detected by ELISA.

ELISA is an antibody-based analytical method widely applied in water treatment for the screening of toxic cyanobacterial metabolites32,33,59,60. ELISA does not target individual microcystin congeners but rather measures the total concentration of microcystins (and nodularins) in a sample33,59,60 that share very similar toxicity and structure and are thus combined under the general term microcystins here. The ELISA used in the NLA surveys is calibrated against the microcystin-LR congener59, which also forms the basis of the WHO drinking water guidelines39. No statistically significant difference between concentrations could be identified when comparing ELISA to liquid chromatography/tandem mass spectrometry under various cell lysis techniques33. The EPA uses freeze/thaw cycles to lyse cyanobacterial cells and extract toxins; as such this method quantifies not only dissolved toxins but also microcystin within cells59. ELISA is the recommended method of the EPA to quantify total microcystins (and nodularins)59 and as such many water monitoring efforts and drinking water supplies use ELISA kits to measure microcystins within their systems. Therefore, ELISA measurements, and the microcystin concentrations predicted from them, are consistent with the methods applied by end users.

Lakes included in the NLA were selected by a stratified probabilistic sampling design to be representative of the >50,000 lakes in that project. Previous studies have analysed the 2007 and 2012 surveys27,28,34,45, while our study incorporates the 2017 data.

From these data, we selected the environmental variables Chl-a (μg l−1), surface water temperature T (°C), TN (μg l−1) and TP (μg l−1), DOC (mg l−1), pH, Secchi depth (m), lake depth D (m) and area A (ha) for consideration for inclusion in the model. A detailed description of the sampling procedures is available from the EPA and in the NLA references31,70,71.

Although cyanobacterial abundance is also measured as part of this effort, we used Chl-a as a proxy for biomass because Chl-a is more commonly measured and makes the model applicable to a wider range of water bodies beyond those examined in this study. Chl-a, TN, TP, DOC, A and D were all natural log-transformed. In total, 2,804 individual lakes had measurements of all the environmental variables considered here with paired microcystin measurements. As some lakes were sampled multiple times, the total number of observations was 3,027. In 1,092 of those observations microcystin was above the detection limit.

Statistical model

We used GAMLSS implemented via the R package gamlss36,72,73. This is a distribution-based approach to nonlinear regression in which not only the mean but all parameters of the conditional distribution, for example, the variance of the response variable, are related to environmental variables. In the simplest case, assuming the conditional distribution of the response is Gaussian, both the mean and the standard deviation are functions of covariates. Further, as all parameters of the distribution are modelled, the probability density function can be calculated for any combination of covariates. With this information, any quantile, or the probability of crossing a chosen response value (microcystin threshold), can be calculated.
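The Gaussian case described above can be illustrated with a minimal Python sketch. It is not the gamlss implementation used in this study: the single covariate x, the linear predictors and all coefficient values are hypothetical and serve only to show how a quantile or an exceedance probability follows once every distribution parameter is a function of the covariates.

```python
import math
from statistics import NormalDist


def conditional_normal(x, b_mean=(1.0, 0.5), b_logsd=(-0.5, 0.2)):
    """Conditional Normal distribution of the response given covariate x.

    Both the mean and the log standard deviation are linear in x.
    All coefficients are hypothetical and chosen only for illustration.
    """
    mean = b_mean[0] + b_mean[1] * x
    sd = math.exp(b_logsd[0] + b_logsd[1] * x)  # log link keeps sd positive
    return NormalDist(mu=mean, sigma=sd)


def exceedance_probability(x, threshold):
    """P(response > threshold | x) = 1 - CDF(threshold)."""
    return 1.0 - conditional_normal(x).cdf(threshold)


def conditional_quantile(x, q):
    """q-th quantile of the response given x."""
    return conditional_normal(x).inv_cdf(q)


for x in (0.0, 2.0, 4.0):
    d = conditional_normal(x)
    print(f"x={x}: mean={d.mean:.2f}, sd={d.stdev:.2f}, "
          f"q90={conditional_quantile(x, 0.9):.2f}, "
          f"P(y>3)={exceedance_probability(x, 3.0):.3f}")
```

Because the standard deviation also depends on x in this sketch, the upper quantiles move faster than the mean as x increases, which a mean-only regression could not represent.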

There is a detection limit of 0.1 μg l−1 for microcystin in these surveys, such that the observed microcystin concentration (MC) is not completely continuous. Because of that, we used a zero-adjusted model for M = MC − 0.1 that combines a logistic model (occurrence model) of microcystin detection or non-detection, with a continuous model to represent microcystin concentrations above the detection limit, that is, to model M > 0. For the continuous model, a log-normal distribution (LOGNO2) was selected as the optimal conditional distribution for M > 0 after extensive testing of >50 distributions currently available in GAMLSS74 based on fit and parsimony via the Bayesian information criterion75 (BIC). The log-normal distribution has two parameters, μ (the median of M) and σ (the standard deviation of log M). The log-normal probability density function of M > 0 is defined as:

$${f}_{M > 0}(m)=\frac{1}{{(2\pi )}^{1/2}\sigma m}\exp \left\{-\frac{1}{2{\sigma }^{2}}{[\mathrm{ln}(m)-\mathrm{ln}(\mu )]}^{2}\right\}$$
(1)

where m represents any specific value of the random variable M. It is interesting that the log-normal distribution, which has a compounding effect between predictor variables (effects combine multiplicatively), performed best for microcystin concentrations out of the many distributions available in GAMLSS including, for example, a zero truncated t-distribution with identity link that does not lead to a compounding effect. This indicates that compounding effects between environmental variables are likely in predicting microcystin concentrations above the detection limit.
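The multiplicative combination of effects under a log link can be made explicit with a short derivation. The two generic covariates x1 and x2 and the coefficients β0, β1 and β2 are placeholders for illustration, not the fitted terms of the final model:

$$\mathrm{ln}(\mu )={\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}\;\Rightarrow \;\mu ={e}^{{\beta }_{0}}\,{e}^{{\beta }_{1}{x}_{1}}\,{e}^{{\beta }_{2}{x}_{2}},\qquad \frac{\partial \mu }{\partial {x}_{1}}={\beta }_{1}\,{e}^{{\beta }_{0}}\,{e}^{{\beta }_{1}{x}_{1}}\,{e}^{{\beta }_{2}{x}_{2}}$$

The absolute change in the median per unit change in x1 therefore scales with the factor exp(β2 x2), whereas under an identity link, μ = β0 + β1 x1 + β2 x2, the same derivative would equal β1 regardless of x2.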

Combining the two parts, M follows a zero-adjusted log-normal distribution, and the mixed (discrete–continuous) probability function for the microcystin concentration MC is:

$${f}_{\mathrm{MC}}(\mathrm{mc})=\left\{\begin{array}{ll}{p}_{0} & \mathrm{if}\,\mathrm{mc}=0.1\\ \frac{(1-{p}_{0})}{{(2\pi )}^{1/2}\sigma (\mathrm{mc}-0.1)}\exp \left\{-\frac{1}{2{\sigma }^{2}}{[\mathrm{ln}(\mathrm{mc}-0.1)-\mathrm{ln}(\mu )]}^{2}\right\} & \mathrm{if}\,\mathrm{mc} > 0.1\end{array}\right.$$
(2)

where p0 is the probability that microcystin is below the detection limit from the logistic model and where mc represents any specific value of the random variable MC.
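As a check on the structure of equation (2), the following Python sketch evaluates the mixed probability function and verifies numerically that the continuous part integrates to 1 − p0. The overall non-detect fraction is taken from the counts reported in the Data section (1,092 detections in 3,027 observations); the values of μ and σ, the function names and the integration settings are assumptions made for illustration only.

```python
import math

DETECTION_LIMIT = 0.1  # μg/l, detection limit stated in the text

# Overall fraction of non-detects in the data set (1,092 of 3,027 detected).
P0_OVERALL = 1.0 - 1092 / 3027


def zero_adjusted_lognormal(mc, p0, mu, sigma):
    """Mixed probability function of equation (2).

    Returns the point mass p0 at mc == 0.1 (non-detect) and the scaled
    log-normal density of M = mc - 0.1 for mc > 0.1.
    """
    if mc < DETECTION_LIMIT:
        raise ValueError("values below the detection limit are not defined")
    if mc == DETECTION_LIMIT:
        return p0
    m = mc - DETECTION_LIMIT
    z = (math.log(m) - math.log(mu)) / sigma
    return (1.0 - p0) * math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * m)


def continuous_mass(p0, mu, sigma, n=20000, span=8.0):
    """Trapezoidal integral of the continuous part of equation (2).

    Integrates over u = ln(m) in ln(mu) +/- span * sigma, using
    f(m) dm = f(m) * m du.
    """
    lo = math.log(mu) - span * sigma
    hi = math.log(mu) + span * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        m = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * zero_adjusted_lognormal(m + DETECTION_LIMIT, p0, mu, sigma) * m * h
    return total


# mu and sigma below are hypothetical; only P0_OVERALL comes from the text.
mass = continuous_mass(P0_OVERALL, mu=0.5, sigma=1.2)
print(f"point mass: {P0_OVERALL:.4f}, continuous mass: {mass:.4f}, "
      f"total: {P0_OVERALL + mass:.4f}")
```

In the fitted model p0, μ and σ vary from lake to lake with the covariates, so a single set of values as used here corresponds to one set of environmental conditions.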

The selection of environmental covariates used to estimate μ, σ of the log-normal model, and p0 of the logistic model was based on minimizing the BIC in a stepwise selection process. Here, each environmental variable can be part of the distribution parameter model either as a penalized P-spline72 or a linear term, or not selected. BIC was also used locally for selecting the amount of smoothing, that is, the effective degrees of freedom used for smoothing, in the P-spline. The selection of environmental variables for the models was performed using the stepGAIC() and stepGAICAll.A() functions, respectively, from the R package gamlss72. We also checked for an interaction between TN and TP but this interaction did not improve the fit as assessed by the BIC.
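The role of the BIC penalty in this kind of selection can be sketched in a few lines of Python. This is not the stepwise procedure itself: the candidate names, log-likelihoods and degrees of freedom are hypothetical, and only the number of observations (3,027) is taken from the text. BIC is written here as the generalized Akaike information criterion with a penalty of ln(n) per effective degree of freedom.

```python
import math


def gaic(log_likelihood, edf, penalty):
    """Generalized AIC: -2 * logLik + penalty * effective degrees of freedom."""
    return -2.0 * log_likelihood + penalty * edf


def bic(log_likelihood, edf, n_obs):
    """BIC is the GAIC with penalty ln(n_obs)."""
    return gaic(log_likelihood, edf, math.log(n_obs))


n_obs = 3027  # total number of observations reported in the text

# Hypothetical candidates: the extra term raises the log-likelihood by 3 units
# at the cost of one additional degree of freedom.
candidates = {
    "base": (-1500.0, 6.0),
    "base + extra term": (-1497.0, 7.0),
}

bic_scores = {name: bic(ll, edf, n_obs) for name, (ll, edf) in candidates.items()}
aic_scores = {name: gaic(ll, edf, 2.0) for name, (ll, edf) in candidates.items()}

best_by_bic = min(bic_scores, key=bic_scores.get)
best_by_aic = min(aic_scores, key=aic_scores.get)
print("BIC prefers:", best_by_bic)
print("AIC prefers:", best_by_aic)
```

With about 3,000 observations the BIC penalty is roughly 8 per degree of freedom, compared with 2 for the AIC, so a term must improve the fit substantially before it is retained. This is the sense in which BIC-based selection favours parsimony.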

The final model (equation (2)) giving fMC (mc) for MC has the following parameters:

$$\begin{array}{ll}\mathrm{ln}\left(\frac{{p}_{0}}{(1-{p}_{0})}\right)=s(\mathrm{ln}({\rm{TN}}))-0.31\,\mathrm{ln}({\rm{Chl}{\hbox{-}}{\mathrm {a}}})+s(\mathrm{ln}(A))-0.42\,\mathrm{ln}({\rm{DOC}})\\ -0.03\,T-0.45\,{\mathrm{pH}}+6.89\\ \mathrm{ln}(\,\mu )=0.68\,\mathrm{ln}({\rm{TN}})+0.37{\mathrm{pH}}+s(T)\,+0.28 \,\mathrm{ln}({\rm{Chl}{\hbox{-}}{\mathrm {a}}})+0.34\,\mathrm{ln}(D)-9.81\\ \mathrm{ln}(\sigma )=0.08\,\mathrm{ln}({\rm{TN}})+0.06\end{array}$$
(3)

The partial effects for ln((1 − p0)/p0), ln(μ) and ln(σ) are displayed in Fig. 3. Note that because Fig. 3 shows the partial effects for ln((1 − p0)/p0) = −ln(p0/(1 − p0)), the signs of the terms for ln(p0/(1 − p0)) in equation (3) are reversed in Fig. 3; for example, −0.03 T in equation (3) becomes 0.03 T in Fig. 3.

Model validity was assessed using detrended quantile–quantile plots of the normalized quantile residuals76, which show a good fit for both the logistic and log-normal model (Extended Data Fig. 8). Concurvity between model predictors was not a concern in either model, as assessed via the R package gamlss.ggplots77 using the function gamlss.ggplots:::get_concurvity(). In addition, refitting the models using only linear predictors resulted in variance inflation factors that never exceeded five, indicating no issues of multicollinearity78 for those parameters. The partial effects plots, exceedance probability plots and detrended quantile–quantile plots were also created via the R package gamlss.ggplots77.

Once all parameters are estimated from the final model, the probability of crossing a specified microcystin threshold under given environmental conditions can be calculated as one minus the cumulative probability up to that threshold. In contrast to studies conducting multiple logistic regressions to determine exceedance probabilities, this model can thus be applied to any threshold without refitting a new model at each threshold.
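The calculation described above can be sketched as follows, assuming the zero-adjusted log-normal form of equation (2). The function name and the parameter values are hypothetical; in practice μ, σ and p0 would come from equation (3) for a given set of environmental conditions.

```python
import math

# Value at which equation (2) places the point mass (the detection limit).
DETECTION_LIMIT = 0.1


def exceedance_probability(threshold, mu, sigma, p0):
    """P(MC > threshold) = 1 - F(threshold) under equation (2).

    The function name is hypothetical; the CDF is the point mass p0 plus
    (1 - p0) times the log-normal CDF of the shifted concentration.
    """
    if threshold < DETECTION_LIMIT:
        return 1.0  # all probability lies at or above the detection limit
    if threshold == DETECTION_LIMIT:
        return 1.0 - p0  # only the point mass is not above the threshold
    z = (math.log(threshold - DETECTION_LIMIT) - math.log(mu)) / sigma
    lognormal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    cumulative = p0 + (1.0 - p0) * lognormal_cdf
    return 1.0 - cumulative


# One set of (assumed) parameters serves every threshold without refitting.
for threshold in (0.3, 1.0, 8.0):
    print(threshold, exceedance_probability(threshold, mu=1.0, sigma=1.0, p0=0.4))
```

Because the whole distribution is estimated, changing the threshold only changes the point at which the cumulative probability is evaluated.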

On the basis of the exceedance probabilities, we can create intervals for the percentage of samples in this study expected to cross a chosen threshold. For this, we treated each sample's exceedance probability as the success probability of a Bernoulli distribution79 (a biased coin flip) and drew from it to simulate whether that sample exceeds the threshold. We then summed the number of samples for which the threshold was exceeded and divided by the total number of samples. This gives a simulated percentage of how often samples are expected to exceed the threshold under equivalent environmental conditions. We repeated this 10,000 times, calculated the 0.5th and 99.5th percentiles of the resulting distribution and compared them with the observed data.
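The simulation steps above can be sketched as follows. This is an illustration under stated assumptions: the function name, the fixed seed and the nearest-rank percentile rule are choices made here and are not specified in the text, and the input probabilities are made up.

```python
import math
import random


def simulated_exceedance_interval(probabilities, n_sim=10_000,
                                  lower=0.5, upper=99.5, seed=0):
    """Simulate the percentage of samples exceeding a threshold.

    Each sample is one Bernoulli draw with its own exceedance probability.
    Returns the (lower, upper) percentiles of the simulated percentages,
    using the nearest-rank rule (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_samples = len(probabilities)
    percentages = []
    for _ in range(n_sim):
        # One biased coin flip per sample, then count the exceedances.
        n_exceed = sum(1 for p in probabilities if rng.random() < p)
        percentages.append(100.0 * n_exceed / n_samples)
    percentages.sort()

    def percentile(q):
        rank = min(n_sim, max(1, math.ceil(q / 100.0 * n_sim)))
        return percentages[rank - 1]

    return percentile(lower), percentile(upper)


# Hypothetical exceedance probabilities for five samples.
low, high = simulated_exceedance_interval([0.05, 0.20, 0.50, 0.10, 0.80],
                                          n_sim=1_000)
```

The observed percentage of exceedances can then be checked against the interval from `low` to `high`.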

Spatially averaged detection probabilities and exceedance probabilities for 0.3, 1 and 8 μg l−1 were calculated to find broad-scale patterns across the United States within HUC boundaries at the HUC2 and HUC6 level, as developed by the US Geological Survey80 (Fig. 2). All base maps in this study were created with the R package sf81.

Scenarios

We calculated how the risk of exceeding water quality guidelines changes under warming temperatures at each lake. To do so, we used projections from an ensemble of 15 GCMs participating in CMIP6 (ref. 37) (Extended Data Table 1) run for SSP 2-4.5 (ref. 38), which represents a ‘likely’ scenario for future climate regulation strategy given current policies. Data from the model projections were bias-corrected and spatially downscaled to 0.25° using the bias correction and spatial disaggregation82,83 method. For the bias correction process, the reference air temperature data in the historical period were collected from the Global Meteorological Forcing Dataset84.

The climate models above predict air temperature, whereas our model is based on water surface temperature. However, in temperate zones and on a monthly scale, water surface and air temperature track each other very closely, especially in the range 5–30 °C during summer months85,86.

Under this scenario, we investigated how exceedance probabilities have changed and will change under the expected average local air temperature change in the summer months (June–September) for the present day (1990–2019), the mid-century (2030–2059) and the late-century (2070–2099) relative to the historical period (1950–1979). The relative change in exceedance probability was calculated as the ratio of the exceedance probability during each of the three periods to the exceedance probability during the historical period. The relative changes were again spatially averaged to identify broad-scale patterns of shifts in occurrence and exceedance probabilities at the HUC2 and HUC6 level. The percentage relative changes in exceedance probability (Fig. 6 and Extended Data Figs. 2 and 3) are defined as:

$$100\left(\frac{P{({\rm{MC}} > {\rm{mc}})}_{S}-P{({\rm{MC}} > {\rm{mc}})}_{H}}{P{({\rm{MC}} > {\rm{mc}})}_{H}}\right) \%$$
(4)

where S represents one of the scenarios (present, mid-century or late-century) and H the historic reference.
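As a small worked illustration of equation (4), the percentage relative change can be computed from two exceedance probabilities. The function name and the input values are hypothetical and are not results from this study.

```python
def percent_relative_change(p_scenario, p_historical):
    """Percentage relative change in exceedance probability, equation (4).

    Equivalent to 100 * (ratio - 1), where ratio is the scenario
    probability divided by the historical probability.
    """
    if p_historical <= 0.0:
        raise ValueError("historical exceedance probability must be positive")
    return 100.0 * (p_scenario - p_historical) / p_historical


# Hypothetical values: exceedance probability rising from 0.10 to 0.15.
change = percent_relative_change(0.15, 0.10)
```

With these made-up inputs the ratio of the two probabilities is 1.5, which corresponds to an increase of about 50%.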

The change in exceedance probability that we report between two scenarios should be interpreted as the change between two snapshots, each at the average summer temperature of its respective time period.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.