Introduction

As predicted by the IPCC1 (2014), warming of the oceans continues and the frequency of extreme events is increasing2, and the most recent years (2015–2020) have been the warmest years in the oceans since 19553. The upper layers of the ocean are absorbing most of the atmospheric heat and shallow coastal ecosystems experience the highest (relative) impact of the warming. The most palpable consequences in these productive and diverse systems are seen as range shifts and local extinctions of many species and losses of ecosystem goods and services, following periods of abnormally warm water temperatures4. Such discrete, prolonged and anomalously warm events can be defined as marine heatwaves (MHWs) when compared to a local, 30-year climatological baseline5.

The disturbance of marine ecosystems caused by MHWs has been widely documented around the globe over the last two decades4,6,7,8,9,10. This has propelled a wide range of research investigating short-term extreme events, in contrast to the long-term gradual increases in average temperature11. Recently a framework for investigating the physical drivers, detection, predictability characteristics and various ecological impacts of MHWs has been proposed12. A common framework for defining MHWs is necessary in order to be able to analyse impacts and to compare trends across time and space, locally as well as globally.

While the effects of MHWs are increasingly being reported, there is a general lack of accurate temperature data to quantitatively link cause and effect (in time and space, including depth). Most studies on the impacts of MHWs rely solely on remotely-sensed sea surface temperature (SST) data. Remotely-sensed data have a great advantage in monitoring and analysing SSTs and MHWs over a large spatial extent13, but (1) cannot describe sub-surface temperatures, (2) are temporally limited since satellite SST records only begin in the 1980s and (3) show inaccuracies in coastal waters compared to in situ data14. These issues are of considerable importance in a shallow water body like the Baltic Sea with its large, highly complex coastal archipelago systems.

Most parts of the Baltic Sea have experienced a larger temperature increase over the last decades compared to global oceans15,16. This gives the Baltic Sea a time machine character of how global coastal systems might respond to a warming climate17. The warming trend is not limited to surface waters but is also observed in sub-surface layers thus affecting the entire water column and the seafloor. Due to the complex bathymetry and hydrology, the warming trend in deeper layers is heterogeneous as e.g. the layer between 35 and 75 m depth in the Bothnian Sea shows a similar signal as the Gulf of Finland at 15–35 m16. Some of the first studies on sub-surface MHWs in the global oceans indicate their independence from surface MHW18 and suggest the stability of the stratification and local mechanical forces (e.g. Ekman downwelling) causing the heat transport to deeper layers19,20. Key to such studies is the long-term monitoring of deeper layers, which is rare due to technical challenges. Most current data rely on CTD-casts by research vessels or by long-term monitoring stations, although increasingly sub-surface water conditions are monitored using Argo-floats (fleetmonitoring.euro-argo.eu) and permanent sensors at specific sites, e.g. mooring buoys.

The assessment of anomalies and extreme events like MHWs requires knowledge of a norm-state and defining its threshold. The World Meteorological Organization recommends using the 30-year time period from 1961–1990 as a climatological standard reference period (RP) for long-term climate variability21. However, since SST satellites only became available in the 1980s, a commonly used reference period is from 1983–20128,22. This time period, however, refers to a time when the global SST was already on a steep increase23 causing an elevated baseline that might underestimate the magnitude of current extreme events.

Coastal zones are often spatially heterogeneous and highly complex in bathymetry, creating a variety of microhabitats with individual temperature characteristics that are poorly captured by remote sensing. The commonly used satellite product NOAA Optimally Interpolated SST24 (OISST) has a spatial resolution of 0.25 × 0.25°, which corresponds to approximately 400 km2 in the Baltic Sea area. Although the spatial resolution has been greatly improved by the EU Copernicus Marine Service product (“Baltic Sea – Sea Surface Temperature Reprocessed”), with a 0.02 × 0.02° grid corresponding to an area of about 2.5 km2, it is still coarse for complex archipelago systems with a multitude of habitats. Satellite data have been useful for providing new insights into open water patterns and broad-scale ecological studies, but are only partly applicable on smaller scales and in particular habitats13. Moreover, satellite-derived SST appears to underestimate the intensity of MHWs14.
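The grid-cell areas quoted above can be checked with a short calculation. The following is a minimal sketch that assumes a spherical Earth with a mean radius of 6371 km and uses the latitude of Storfjärden given in the Methods (about 59.86°N); the function name is hypothetical and not part of any product documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def grid_cell_area_km2(resolution_deg, latitude_deg):
    """Approximate area of a square lat/lon grid cell centred at latitude_deg."""
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    height_km = resolution_deg * km_per_deg_lat
    # East-west extent shrinks with the cosine of the latitude.
    width_km = height_km * math.cos(math.radians(latitude_deg))
    return height_km * width_km


latitude = 59.86  # approximate latitude of Storfjärden (59°51'33"N)
print(f"0.25 deg cell: {grid_cell_area_km2(0.25, latitude):.0f} km2")
print(f"0.02 deg cell: {grid_cell_area_km2(0.02, latitude):.2f} km2")
```

At this latitude the results are close to the approximately 400 km2 and 2.5 km2 stated in the text; the exact values depend on the latitude chosen within the Baltic Sea.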

In this study we examine in situ temperature data from the surface (0–2 m) and sub-surface (30 m deep at the seafloor) layers from 1931 until 2020 collected at the monitoring site Storfjärden. Storfjärden is located in the channel east of Hanko Peninsula (south-west Finland; Supplementary Fig. 1) connecting the river mouth and the open Baltic Sea. It can therefore be classified as middle archipelago region with highly dynamic water masses influenced by riverine freshwater run-off and saline water from the open sea, and is prone to upwelling and highly susceptible to wind forcing25. The thermocline is typically at 5–10 m depth between January and August, and strongest in summer26. The aim of this study was to (1) compare historic with current climatological periods, (2) investigate historical extreme temperatures, (3) point out inaccuracies of remotely-sensed SST and (4) combine long-term with recent high-resolution temperature data to identify MHWs in sub-surface layers between 2016 and 2020. The remotely-sensed SST was retrieved from E.U. Copernicus Marine Service Information (CMEMS) from the Storfjärden pixel with a spatial resolution of 0.02 × 0.02° (see panel b of Supplementary Fig. 1 for pixel area). Our results highlight the urgent need for in situ temperature monitoring networks in coastal zones to understand the characteristics of MHWs and their subsequent ecological impacts.

We demonstrate an ongoing warming trend for the period 1927–2020 in the surface and bottom layer of the monitoring site Storfjärden, south-west Finland. Consequently, the ongoing warming caused the climatological baseline and its thresholds to be significantly warmer on almost every day of the year for the 30-year reference period of 1991–2020 compared to previous periods from 1931–1960 and 1961–1990, and in the surface and bottom layer, respectively. This results in a severe change of perspective when evaluating extreme temperatures based on different reference periods. Satellite-derived data sets are not able to demonstrate this shift in the baseline as they have only been initiated in the 1980s. Further, we demonstrate significant differences in the climatological baseline and its thresholds derived from remotely-sensed SST and in situ SST for the periods from 1982–2011. Satellites are not able to capture sub-surface conditions, where we detect 18 marine heatwaves on the seafloor in just four years of high-resolution recordings, which are important for investigating the effects of MHW on benthic ecosystems.

Results

Long-term trend

Figure 1 shows the annually averaged long-term water temperature in the surface and bottom layer and the percentage of extremely warm observations (RP31-20) at Storfjärden from 1927 until 2020. The temperature in the surface layer increased by 1.8 °C (linear trend = 0.019 °C per year; y-intercept = 6.5 °C) and in the bottom layer by 1.3 °C (linear trend = 0.014 °C per year; y-intercept = 3.7 °C). The warming appears to be accelerating, as the six warmest years ever recorded in the surface layer were after 2012 (Supplementary Table 1). Further, in a ranking of the 20 warmest years, only 5 years in the surface layer and 6 years in the bottom layer were before the year 2000. The number of warm extremes is increasing in both layers, peaking in 2020 with 58% of the measurements above normal in the surface and 71% in the bottom layer.
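The relation between the reported slope and the total warming can be illustrated with an ordinary least-squares fit. This is a minimal sketch on synthetic, perfectly linear annual means, not the Storfjärden data; treating the y-intercept as the fitted value in the first year (1927) is an assumption made only for this illustration.

```python
def linear_trend(years, temps):
    """Ordinary least-squares slope (°C per year) and intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic series: 6.5 °C in 1927 (assumed) warming by 0.019 °C per year.
years = list(range(1927, 2021))
temps = [6.5 + 0.019 * (year - 1927) for year in years]

slope, intercept = linear_trend(years, temps)
total_change = slope * (years[-1] - years[0])
print(f"slope: {slope:.3f} °C per year, change 1927-2020: {total_change:.1f} °C")
```

With 93 years between 1927 and 2020, a slope of 0.019 °C per year corresponds to about 1.8 °C in total, consistent with the surface-layer figure reported above.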

Fig. 1: Yearly average temperature and abnormally warm extremes at Storfjärden.

Yearly average temperature of Storfjärden from 1927 until 2020 in the surface layer (a, solid line) and bottom layer (b, dashed line). The stripes represent the occurrence of abnormally warm observations (RP31-20) as percent of all measurements per year in a. the surface and b. bottom layer. Greyed areas are years with less than 30 observations.

Comparison of reference periods

The long-term time series starting in August 1926 allowed defining multiple 30-year reference periods. A comparison of annual temperature distributions for RP31-60 and RP91-20 is shown in Fig. 2. The mean climatological baseline and its thresholds for the surface and bottom layer of RP91-20 are higher than of RP31-60 over the entire year. Averaged over the entire year the climatological baseline of RP91-20 has been elevated by 0.88 °C (sd = 0.29), the 90th percentile threshold by 0.996 °C (sd = 0.53) and the 10th percentile threshold by 0.78 °C (sd = 0.63) in the surface layer compared to RP31-60. Analogously for the bottom layer, the climatological baseline has been elevated by 0.83 °C (sd = 0.32), the 90th percentile threshold by 0.83 °C (sd = 0.46) and the 10th percentile by 0.63 °C (sd = 0.37). Minor time periods in each layer (30 days of the year in the surface layer spread across spring and autumn, 62 days of the year in the bottom layer spread across spring and winter) were not significantly different, but most days of the year had two significantly different probability density functions (see Supplementary Fig. 2 for a figure of Dmax and p-values resulting from Kolmogorov-Smirnov (KS)-test). This highlights the importance of declaring which reference period is used in an assessment of extreme events, as the resulting MHW-parameters will differ markedly.
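The day-by-day comparison of two reference periods can be sketched with a two-sample Kolmogorov-Smirnov test. The code below is a minimal illustration, not the authors' implementation: it computes Dmax from the empirical distribution functions and approximates the two-sided p-value with the asymptotic Kolmogorov series, which is an assumption (statistical packages may use exact methods for small samples). The synthetic samples of 30 values per period and day of year are hypothetical.

```python
import math
import random


def ks_two_sample(sample_a, sample_b):
    """Return (Dmax, approximate two-sided p-value) for two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d_max = 0.0
    # Walk through both sorted samples and track the largest ECDF distance.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution (assumption: adequate for a sketch).
    lam = math.sqrt(n * m / (n + m)) * d_max
    if lam < 0.1:
        return d_max, 1.0  # the series converges slowly here; p is ~1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d_max, min(1.0, max(0.0, 2.0 * series))


# Hypothetical temperatures for one day of year in two 30-year periods.
rng = random.Random(0)
early = [rng.gauss(4.0, 1.0) for _ in range(30)]
late = [rng.gauss(4.9, 1.0) for _ in range(30)]
d_max, p_value = ks_two_sample(early, late)
print(f"Dmax = {d_max:.2f}, p = {p_value:.3f}")
```

Repeating this for each day of the year gives one Dmax and one p-value per day, which is the kind of information summarised in the ribbon of Fig. 2.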

Fig. 2: Comparison of climatological baselines of reference periods 1931–1960 and 1991–2020.

Climatological baseline (solid lines) and 90th percentile (dashed lines) of RP31-60 (black lines) and RP91-20 (grey lines) of the surface (a) and bottom (b) layer derived from the interpolated long-term time series from Storfjärden. The bottom ribbon of each panel refers to the days of the year, where the two probability density functions of each day are significantly different (p < 0.05, light blue) or not different (p > 0.05, orange).

In the following assessments of this study (except in the comparison with remotely-sensed SST) we applied a 90-year reference period from 1931–2020 as this summarizes past colder and current warmer periods creating “conservative” climatological values.

Quantifying extreme observations

All vertical temperature profiles from Storfjärden, from 1931-01-01 until 2020-12-16, were evaluated against the 90-year reference period (RP31-20). “Near normal” observations, i.e. those between the 10th and 90th percentiles, were excluded for clarity. Until the 1990s, “abnormally warm” observations mostly made up less than 10% of all measurements, whereas “abnormally cold” observations increased to almost 25% of all measurements in the surface layer in the 1970s and to 22.8% in the bottom layer in the 1960s (Fig. 3). After the 1990s, “abnormally warm” observations steeply increased to about 25% of all observations, whereas “abnormally cold” observations declined to 3% of observations.
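The classification of single observations against percentile thresholds can be sketched as follows. This is a minimal example under stated assumptions: the percentile uses linear interpolation between order statistics, decades are formed as 1930–1939, 1940–1949 and so on, and the data structures (tuples of year, day of year and temperature; a dictionary of thresholds per day of year) are hypothetical.

```python
from collections import defaultdict


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def classify(temp, p10, p90):
    """Label one observation relative to the 10th and 90th percentiles."""
    if temp > p90:
        return "abnormally warm"
    if temp < p10:
        return "abnormally cold"
    return "near normal"


def extreme_share_by_decade(observations, thresholds):
    """observations: (year, doy, temp); thresholds: {doy: (p10, p90)}."""
    counts = defaultdict(lambda: {"warm": 0, "cold": 0, "n": 0})
    for year, doy, temp in observations:
        decade = year // 10 * 10  # assumption: decades start in years ending in 0
        label = classify(temp, *thresholds[doy])
        counts[decade]["n"] += 1
        if label == "abnormally warm":
            counts[decade]["warm"] += 1
        elif label == "abnormally cold":
            counts[decade]["cold"] += 1
    return {
        decade: {
            "warm_pct": 100.0 * c["warm"] / c["n"],
            "cold_pct": 100.0 * c["cold"] / c["n"],
            "n": c["n"],
        }
        for decade, c in sorted(counts.items())
    }
```

A usage example would pass the thresholds derived from the chosen reference period together with all observations, and read off the warm and cold percentages per decade.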

Fig. 3: Decadal temperature extremes according to 1931–2020 reference period.

Percent of extreme observations per decade in the surface and bottom layer of Storfjärden. Orange lines are of “abnormally warm” category exceeding the 90th percentile of RP31-20, whereas blue lines are of “abnormally cold” category negatively exceeding the 10th percentile. Dashed lines refer to bottom layer, solid lines refer to surface layer. n refers to the total number of measurements per decade.

Both surface and bottom layers demonstrated a steep increase in “abnormally warm” observations in all months of the year and a decline, with partial disappearance, of “abnormally cold” observations (Fig. 4). The summer months June, July and August had the smallest increase of “abnormally warm” observations in both layers, whereas the winter and spring months (November–May) had the largest increase. During the 2000s the month with the highest increase of “abnormally warm” observations was January for both layers. During the latest decade (2010s) most “abnormally warm” observations were in September for the surface layer and in December for the bottom layer. This analysis demonstrates that warm extremes presently dominate in the surface and bottom layer at Storfjärden, but it does not specify the duration of extreme events. In the following section we therefore introduce daily temperature data from the seafloor of Storfjärden to identify sub-surface MHWs.

Fig. 4: Decadal temperature extrema per month according to RP31-20.

Monthly percent of extreme observations per decade in the surface (a) and bottom (b) layer of Storfjärden. Orange columns are of “abnormally warm” category exceeding the 90th percentile of RP31-20, whereas blue columns are of “abnormally cold” category negatively exceeding the 10th percentile.

Sub-surface MHWs at Storfjärden analysed from high-resolution data between 2016 and 2020

Between the start of the recordings on 2016-08-02 and the latest available datapoint on 2020-12-09 (1591 days), measurements were recorded on a total of 1335 days. Of these, 432 days were part of a MHW, with a total cumulative intensity (above the 90th percentile) of 511.1 °C*days (Supplementary Table 2) relative to the 90-year reference period from 1931–2020. MHWs of moderate and strong categories22 were observed across all four seasons with varying durations and intensities (Fig. 5). By far the longest MHW started on 2019-12-17 and lasted until 2020-06-30 (with a 9-day gap in recordings at the beginning of April 2020 due to instrument maintenance), with a duration of 198 days, a cumulative intensity above the threshold of 252.2 °C*days and a maximum intensity of 6.9 °C above the threshold on 2020-06-22 (RP31-20: climatological mean 4.5 °C; threshold 7.9 °C). Abrupt changes in temperature, e.g., the rapid onset and decline of the MHW at the end of July 2018, are common for Storfjärden as this area is prone to upwelling. The highest daily average temperature, 17.8 °C, was recorded during the summer MHW of 2018, with a daily maximum of 21.29 °C on 2018-07-31 (RP31-20: climatological mean 6.6 °C; threshold 10 °C).
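The event metrics reported here (duration, maximum intensity and cumulative intensity above the threshold) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes a complete daily series without missing values, uses the common MHW convention of at least 5 consecutive days above the 90th percentile with gaps of 2 days or fewer joining neighbouring events, and counts only positive exceedances towards the cumulative intensity, which is an assumption about how gap days are treated.

```python
def detect_mhws(temps, thresh, min_duration=5, max_gap=2):
    """Detect heatwave events in equal-length daily lists temps and thresh."""
    above = [t > th for t, th in zip(temps, thresh)]

    # 1. Runs of consecutive days above the threshold lasting >= min_duration.
    runs = []
    start = None
    for i, flag in enumerate(above + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_duration:
                runs.append([start, i - 1])
            start = None

    # 2. Join runs separated by max_gap days or fewer.
    merged = []
    for first, last in runs:
        if merged and first - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = last
        else:
            merged.append([first, last])

    # 3. Summarise each event relative to the threshold.
    events = []
    for first, last in merged:
        exceedance = [temps[i] - thresh[i] for i in range(first, last + 1)]
        events.append({
            "start": first,
            "end": last,
            "duration": last - first + 1,
            "max_intensity": max(exceedance),
            "cumulative_intensity": sum(e for e in exceedance if e > 0),
        })
    return events


# Hypothetical 20-day series with a constant threshold of 8 °C.
temps = [6] * 3 + [9] * 5 + [7] * 2 + [10] * 6 + [6] * 4
print(detect_mhws(temps, [8] * len(temps)))
```

In the hypothetical series, two warm spells separated by a 2-day gap are joined into a single event, while a spell shorter than 5 days would not be counted at all.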

Fig. 5: Marine Heatwaves at the seafloor (30 m) of Storfjärden.

Marine Heatwaves between 2016 and 2020 at Storfjärden at 30 m depth according to RP31-20. Line types: solid, daily average temperature; (following line types in order from bottom to top for each panel:) dot-dash, mean climatology; dashed, threshold climatology; dotted, threshold_2x climatology. The coloured ribbon refers to the seasons: light blue, winter; green, spring; yellow, summer; light brown, autumn. MHWs of moderate intensity (Category I) are shown in light orange and strong MHWs (Category II) in dark orange.

Comparison of remotely-sensed and in situ surface long-term data

In general, remotely-sensed and in situ surface temperatures correlated positively for the tested period (1982-01-01 – 2019-08-31; R = 0.96; p < 0.01; RMSE = 1.85; Fig. 6). Despite the strongly correlating trend, care should be taken as single measurements deviated markedly from each other by up to 11.2 °C (2004-06-21; RS-SST 20.7 °C, in situ SST 9.5 °C).
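A high correlation does not rule out large individual deviations, which the following minimal sketch illustrates. The paired values are invented for illustration, except for the single pair reported in the text (in situ 9.5 °C, remotely-sensed 20.7 °C); the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root-mean-square error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def max_abs_deviation(x, y):
    """Largest absolute difference between paired values."""
    return max(abs(a - b) for a, b in zip(x, y))


# Invented pairs; only the third pair is taken from the text.
in_situ = [0.5, 3.0, 9.5, 15.0, 18.0, 12.0, 6.0]
satellite = [1.0, 3.5, 20.7, 15.5, 17.5, 12.5, 5.5]
print(f"R = {pearson_r(in_situ, satellite):.2f}")
print(f"RMSE = {rmse(in_situ, satellite):.2f} °C")
print(f"largest deviation = {max_abs_deviation(in_situ, satellite):.1f} °C")
```

Reporting the largest single deviation alongside R and RMSE makes it visible when an overall good agreement hides episodic mismatches.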

Fig. 6: Correlation of satellite-derived SST and in situ SST.

Correlation of in situ surface water temperatures with satellite-derived SST from Storfjärden for the period from 1982-01-01 – 2019-08-31. RMSE = 1.85.

The baseline climatology is essential for defining MHWs. Here we compared the probability density function of each day of the year (doy) for the exact 30-year period of 1982-01-01—2011-12-31 on the remotely-sensed and interpolated long-term SST data by applying a two-sided Kolmogorov-Smirnov test (see Supplementary Fig. 3 for a figure of Dmax and p values resulting from KS-test). Significant dissimilarities were identified on 193 days, particularly for autumn until the onset of spring (265th doy to 56th doy) and late spring until beginning of summer (120th doy to 169th doy) (Fig. 7). The climatological mean derived from the long-term dataset is lower from autumn until winter and higher during spring than the mean climatology derived from the remotely-sensed dataset.

Fig. 7: Comparison of climatological baselines of in situ and satellite-derived SST for 1982–2011.

Reference Periods from 1982-01-01 – 2011-12-31 derived from remotely-sensed (grey lines) and interpolated in situ (black lines) SST from Storfjärden. Solid lines refer to the mean climatological baseline and dashed lines to the 90th percentile threshold, respectively. The bottom ribbon refers to the days of the year, where the two probability density functions of each day are significantly different (p < 0.05, light blue) or not significantly different (p > 0.05, orange).

Discussion

In the last decades most parts of the oceans4 and lakes27 around the globe have experienced heatwaves, and their frequency, duration and intensity are projected to increase even under an optimistic scenario of 1.5 °C global warming28. The entire Baltic Sea has experienced an average temperature increase of 1.09 °C from 1982 to 2016 (derived from extrapolated vertical profiles from eight sub-basins), which is twice the increase in the upper 100 m of the Atlantic Ocean16. More specifically, for the monitoring site at Storfjärden, on the south-west coast of Finland, we demonstrate a warming of 1.8 °C at the surface (1.3 °C at the bottom) over the period from 1927 until 2020. A previous study on the same time series between 1927 and 2012 reported an increase of 1.0 °C at the surface (0.8 °C at the bottom)26. This means that the overall trend estimate has almost doubled when including the years after 2012. In the current study, we investigated the temperature data underlying the trend in more detail, with particular focus on the fluctuations, and analysed normal and extreme temperatures between 1931 and 2020. We analysed the occurrence of MHWs only from the recent (2016–2020) high-resolution dataset. For the 90-year long-term dataset, we only analysed the occurrence of extremes (which can be single data points), because no higher-resolution data are available from the early years to analyse MHWs as they are defined today, with a minimum of 5 consecutive days.

The daily probability density functions of the time periods 1931–1960 and the latest 1991–2020, from the surface and bottom layer of Storfjärden, are significantly different for most days of the year (Supplementary Fig. 2). This means that the climatological baseline and its thresholds from RP31-60 have been elevated by approximately 1 °C over the entire year in RP91-20. The climatological baseline is representative of 30 years of data and not only highlights the persistent warming trend over decades but also changes our perspective on extreme temperatures. When applying RP31-60 to the entire time series, 633 abnormally warm and 450 abnormally cold observations can be detected in the surface, and 648 and 397 in the bottom, respectively. When evaluating the time series against RP91-20, only 273 abnormally warm but 952 abnormally cold observations can be detected in the surface, and 268 and 897 in the bottom, respectively. This shift is schematically illustrated in Fig. 8 by comparing the yearly bottom average temperature at Storfjärden for 2019 (5.5 °C) and 2020 (7.06 °C) with the PDFs and respective thresholds of different reference periods. 2019 can be considered an extreme year compared to the reference periods 1931–1960 and 1961–1990, but is considered normal for the 90-year reference period from 1931–2020 and also for the latest reference period 1991–2020. The yearly average temperature of 2020 can be considered extreme compared to any reference period. These differences in classification, whether a temperature is “normal” or “extreme”, are of a theoretical, climatological nature and more research is required to understand their ecological relevance. Yet, for the evaluation of extreme events, the choice of reference period changes their estimated intensity, magnitude and duration due to the relative character of the PDF. 
Instead of evaluating temperature data in relation to one historic climatological period, they could also be evaluated by separate, moving climatological periods. Then an increasing trend and its extremes would shift at a similar pace29 or in other words: a moving reference period would assign the long-term warming to the climatological baseline30. Undoubtedly, the high frequency of extreme temperatures in the latest decades of this study is tightly related to long-term warming. Detrending or applying a moving reference period is yet of little ecological relevance, but mostly useful when investigating the temperature variability and forecasting. A moving reference period might be appropriate for organisms that adapt over short timescales27, but adaptation particularly for the upper thermal limit can be rather slow31. However, the specific baseline chosen depends on the question addressed and more research is required to elucidate which reference strategy to apply.
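The difference between a fixed and a moving reference period can be made concrete with a small sketch. It assumes a synthetic series that warms linearly by 0.02 °C per year and defines the moving baseline as the mean of the 30 years preceding each year; both the series and this particular definition of a moving baseline are assumptions for illustration only.

```python
def fixed_baseline_anomaly(series, year, ref_start, ref_end):
    """Anomaly of one year relative to the mean of a fixed reference period."""
    reference = [series[y] for y in range(ref_start, ref_end + 1)]
    return series[year] - sum(reference) / len(reference)


def moving_baseline_anomaly(series, year, window=30):
    """Anomaly of one year relative to the mean of the preceding `window` years."""
    reference = [series[y] for y in range(year - window, year)]
    return series[year] - sum(reference) / len(reference)


# Synthetic annual means with a steady warming of 0.02 °C per year.
series = {year: 0.02 * (year - 1931) for year in range(1931, 2021)}

for year in (1990, 2020):
    fixed = fixed_baseline_anomaly(series, year, 1931, 1960)
    moving = moving_baseline_anomaly(series, year)
    print(f"{year}: fixed {fixed:.2f} °C, moving {moving:.2f} °C")
```

Under steady warming the anomaly relative to the fixed baseline keeps growing, whereas the anomaly relative to the moving baseline stays constant, because the long-term warming is assigned to the baseline itself.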

Fig. 8: Changing the perspective.

Probability densities of four different reference periods (a Reference period 1931–1960, b 1961–1990, c 1991–2020, d 1931–2020) and their respective mean temperature (dashed line), lower (10th percentile, LT) and upper thresholds (90th percentile, UT) (solid lines). Yearly bottom average temperature of Storfjärden for 2019 (orange dot) and 2020 (brown dot).

Reference periods that span 30 years include interannual oscillations that drive climate variability. Nevertheless, particular MHW events can be directly attributed to these oscillations32. Of particular importance for the climate of the Baltic Sea are the North Atlantic Oscillation (NAO) and the Atlantic Multidecadal Oscillation (AMO). The winter NAO index is dominant in explaining the mean SST on an annual scale, whereas the AMO explains almost 60% of the variability on an interannual scale33. The 90-year reference period used in this study includes a warm phase of the AMO peaking in the 1930s, with an approximately even occurrence of cold and warm extremes (at around 12% of all observations during this decade, Fig. 3). Thereafter, a cold phase with a negative maximum occurred in the 1970s, which corresponded to the maximum of cold extremes in this study at about 25% (Fig. 3), followed by another warm phase starting in the 1990s, where we demonstrate the disappearance of cold extremes and the dominance of warm extremes at about 25% (Fig. 3). Despite this close link between temperature extremes and the phases of the AMO, the latest decades after the 2000s demonstrate unprecedentedly warm temperatures at an exceptional frequency. Kniebusch et al. (2019)33 suggest that global warming enhances the effect of the warm phase and propose a weakening once the AMO index enters its cold phase again. It can therefore be expected that the next phase will be a cold phase moderated by the ongoing global warming and therefore possibly warmer than the previous cold phase. A discussion of the contribution of the NAO to annual extremes, and the identification of regional drivers, would require additional investigations that are beyond the scope of this study. Further, on a monthly scale a shift occurred in the bottom layer from the 2000s to the 2010s: in the 2000s most abnormally warm observations were recorded from June to January, and in the following decade between November and May. 
Changes in the surface layer from the 2000s to the 2010s are less pronounced, except for a steep decrease in abnormally warm observations in January and a steep increase in September. The long-term temperature increase in the bottom and surface layer of Storfjärden and the tendency towards milder winters with reduced sea ice extent reported until 201226,34 have continued between 2010 and 2020. The ongoing trend and the high frequency of warm extremes emphasize their tight connection and suggest that current extremes will become the new normal.

Sub-surface water temperatures are essential for understanding biological and chemical processes on the seafloor and in coastal ecosystems in general, but current remote sensing techniques cannot resolve them and have limited capacity for capturing SST in spatially heterogeneous coastal areas. While there are large deviations between in situ and remotely-sensed SST also on an oceanic coast14, this deviation can be even more pronounced in the highly complex archipelago systems along the Baltic Sea coast. In this study we show that while in situ and remotely-sensed SST are highly correlated (R = 0.96; p < 0.01; RMSE = 1.85), the resulting long-term climatological values differ significantly. Deviations accumulate when considering the long-term period and occur, in our case, mostly during the warming and cooling periods of the year (spring and autumn). The impact of this mismatch is further enhanced when considering local ecosystem effects of episodic events such as MHWs. Despite the high overall correlation, deviations of up to 11.2 °C (2004-06-21; RS-SST 20.7 °C, in situ SST 9.5 °C) were recorded, which highlights the difficulty of assessing ecosystem consequences of episodic events13. Few locations have datasets spanning 30 years with daily measurements for defining climatological baselines, which in turn makes the use of satellite-derived climatological values necessary. Nevertheless, such mismatches in accuracy should be considered when evaluating extreme events by mixing in situ and remotely-sensed data. Importantly, however, it should be recognised that remotely-sensed estimates of SSTs are of very limited use for interpreting trends in sub-surface temperatures. The coastal zone can be divided into 5 temperature classes from innermost to outermost areas, with depth and exposure as the dominating explanatory variables25. 
The water mass at Storfjärden is highly dynamic: it is characterized by stratification between May and September causing a sharp vertical temperature gradient34, is prone to horizontal exchanges of water masses due to up- and downwelling35, and can thus experience large temperature changes over short times (2018-07-31; 13 °C in 24 h). In this study we used the climatological baseline derived from 90 years of monitoring and high-resolution data from an automatic logger to investigate recent occurrences of sub-surface MHWs. In only a few years of high-resolution monitoring, 18 MHWs have been detected across all seasons, ranging from short durations of 5 days up to 198 days, with intensities of moderate to strong categories22 and even peaking in all-time record temperatures during the July MHW in 2018. Summer MHWs have diverse effects on the seafloor and benthic communities36,37,38. For example, observations related to the two summer MHWs in 2018 on the southern coast of Finland highlight potential MHW impacts, both in terms of ecological effects and in terms of a less commonly considered effect on coastal greenhouse gas emissions. The summer MHWs in July 2018 (6-day duration; 21.29 °C maximum temperature 2018-07-31) and September 2018 (5-day duration; 16.75 °C maximum temperature 2018-09-11) were at or near the minimum duration of an MHW event and might have been caused by elevated air temperatures and easterly winds resulting in downwelling (Supplementary Fig. 6). Following the July 2018 MHW, diverse temperature-related signals in long-term projects and a large mortality (>50%) in Mytilus edulis beds were observed (Westerbom, personal communication), raising concerns for the fate of multiple other key habitats in the area, such as seagrass meadows. Humborg et al. (2019)39 recorded elevated greenhouse gas emissions during a research cruise directly after the September MHW close to Storfjärden, suggesting that a combination of a MHW and a storm event triggered the outgassing. 
Shallow areas in particular showed elevated CH4 emissions, due to organic-rich sediment and high temperatures. Consequently, it is implied that with ongoing warming and increased frequency of MHWs, organic-rich coastal sediments of the Baltic Sea could accelerate their outgassing of greenhouse gases39. The coincidence of the timing of the cruise and the combined MHW-storm event highlights the need to combine long-term observations, high-resolution in situ monitoring and event-based preparedness. Without the appropriate data to define the reference conditions and baselines, we are not able to assess climatological change or to predict subsequent ecosystem effects.

Materials & methods

Long-term time series

We make use of data collected at the monitoring site Storfjärden (59°51’33”N, 23°15’35”E), close to Tvärminne Zoological Station (University of Helsinki, Supplementary Fig. 1). The monitoring site has a depth of about 30 m and is located in the transition zone between the open Baltic Sea and the mouth of the Pohjanpitäjänlahti fjord-like estuary east of the Hanko Peninsula. The bight area is typical for the Finnish coast with a large number of skerries and islands creating a geomorphologically complex area. It is open towards the south and facing typically southwesterly winds making Storfjärden, located in one of the main channels, well connected to the open Baltic Sea.

The recording of water temperature and salinity started in August 1926, with manual sampling on the first, tenth and twenty-first day of the month, making it one of the oldest time series in the Baltic Sea next to the time series of Utö in the outermost, southwestern part of the Archipelago Sea40. Unlike the time series from Utö, the recordings at Storfjärden have been conducted regularly over the decades with only minor gaps, particularly between 1940 and 1942 during Soviet occupation. Over time the methods for measuring temperature at different depths have changed. Between 1926 and 1989 a water sampler with a reversing thermometer was used to record temperatures at 0 m, 5 m, 10 m, 20 m and 30 m depth (additionally 15 m starting in 1956). Starting in 1989 the frequency of measurements increased (from about every 10 days to about every 7 days, Supplementary Table 3) and a higher vertical resolution was achieved using a SIS CTD plus 100. From 2009 onwards more readings were obtained by different CTDs (RBR-Concerto, FSI NXIC CTD, Valeport miniCTD, CastAway CTD), partly replacing the water sampler. The Storfjärden monitoring site is one of the most intensely sampled sites in this area, and the dataset used in the current study is a compilation of data provided by the Finnish Meteorological Institute (FMI), the Tvärminne Zoological Station (TZS) and multiple projects that have obtained CTD readings at Storfjärden (Supplementary Table 3).

High-resolution seafloor temperature data

In August 2016 a sensor (YSI Exo2) for high-resolution temporal monitoring of multiple parameters 30 cm above the seafloor was deployed at the Storfjärden monitoring site (30 m depth). The sensor measures temperature, salinity, oxygen, pH, and turbidity every 30 min throughout the year (except when retrieved for maintenance).

Remotely-sensed dataset

The remotely-sensed SST data product “Baltic Sea – Sea Surface Temperature Reprocessed” (Product identifier: SST_BAL_SST_L4_REP_OBSERVATIONS_010_016; Location: https://resources.marine.copernicus.eu/product-detail/SST_BAL_SST_L4_REP_OBSERVATIONS_010_016/INFORMATION, accessed 2021-10-14), referred to as the L4 product, was derived from E.U. Copernicus Marine Service Information (CMEMS)41. The temporal coverage of this product ranges from 1982-01-01 until 2019-08-31, and its spatial resolution is 0.02° × 0.02°. The dataset used here contains the SST of the pixel closest to Storfjärden, which covers about 2.5 km2. The climatological mean and the 90th percentile for the period from 1982-01-01 until 2011-12-31 were estimated according to the definitions by Hobday et al. (2016)5.
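The stated pixel area can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km and takes the pixel latitude to be that of the monitoring site; both are simplifying assumptions rather than properties of the L4 product.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def pixel_area_km2(lat_deg, dlat_deg, dlon_deg):
    """Approximate area of a small latitude/longitude cell on a sphere."""
    north_south = EARTH_RADIUS_KM * math.radians(dlat_deg)
    # East-west extent shrinks with the cosine of latitude.
    east_west = EARTH_RADIUS_KM * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)
    return north_south * east_west

# Storfjärden latitude, 59°51'33" N, converted to decimal degrees.
site_lat = 59 + 51 / 60 + 33 / 3600
area = pixel_area_km2(site_lat, 0.02, 0.02)
print(f"{area:.2f} km2")
```

At this latitude a 0.02° cell is roughly twice as long north to south as it is east to west, which is why the area is well below the value one would obtain at the equator.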

Dataset quality and preparation

First, we performed a quality control of the long-term time series and the high-resolution data, removing values with markedly decreasing salinity towards the bottom (e.g., ~2 psu within the last 1 m), fluctuating or false depth signals, and obviously false outliers identified in visual inspections. After quality control, the time series data were averaged over depths between 0–2 m (surface) and below 29 m (bottom), respectively. This resulted in the long-term time series data containing 3657 unique surface datapoints (3653 for bottom) between 1926-08-23 and 2020-12-16 (~10.6% of all days within this time frame are represented). The high-resolution data were averaged across time to produce a single value for each day.
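The layer averaging can be illustrated with a minimal sketch. The tuple layout (date, depth in m, temperature in °C), the function name and the example values are hypothetical; only the depth limits (0–2 m and below 29 m) are taken from the text.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def layer_means(casts, surface_max_depth=2.0, bottom_min_depth=29.0):
    """Average temperatures per sampling date within the surface and bottom layers.

    casts: iterable of (date, depth_m, temperature_c) tuples (hypothetical layout).
    Returns two dicts mapping date -> mean temperature of that layer.
    """
    surface, bottom = defaultdict(list), defaultdict(list)
    for day, depth, temp in casts:
        if 0.0 <= depth <= surface_max_depth:
            surface[day].append(temp)
        elif depth > bottom_min_depth:
            bottom[day].append(temp)
        # Readings from intermediate depths are not used for either layer.
    return ({d: mean(v) for d, v in surface.items()},
            {d: mean(v) for d, v in bottom.items()})

# Synthetic example cast, for illustration only.
casts = [
    (date(2016, 8, 1), 0.0, 18.0),
    (date(2016, 8, 1), 2.0, 17.0),
    (date(2016, 8, 1), 10.0, 12.0),
    (date(2016, 8, 1), 29.5, 5.0),
    (date(2016, 8, 1), 30.0, 4.0),
]
surface, bottom = layer_means(casts)
```

The same grouping idea applies to the high-resolution record, where the key would again be the calendar day and all 30-min readings of that day would be averaged.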

In order to create a continuous daily record and remove the effect of different sampling frequencies over time, a daily record of temperatures at the surface and at the bottom was created by interpolating linearly between datapoints when the gap was less than 31 days. Linear interpolation is an accepted method of gap filling in time series, as temperatures typically do not change very abruptly21. While we apply this to a dynamic, coastal site, we feel the approach is justified for creating daily datapoints that are further averaged over at least 30 years for each day to describe overall climatic conditions. Five gaps larger than 31 days were identified. Three of these gaps were between December and January (1950 with gap = 33 days, 1962 with gap = 41 days, 1979 with gap = 41 days). These three gaps were also filled linearly, as winter temperatures typically vary very little and the gaps followed a prevailing pattern of warmer temperatures in December and colder temperatures in January. The two remaining gaps, between 1940 and 1942 and between July and September 1947, were not filled.
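A minimal sketch of gap-limited linear interpolation is given below. The function name and the dictionary layout are hypothetical, and the sketch applies only the general rule (gaps shorter than 31 days are filled); the manual filling of the three winter gaps is not reproduced.

```python
from datetime import date, timedelta

def interpolate_daily(obs, max_gap_days=31):
    """Linearly interpolate to daily values between consecutive observations.

    obs: dict mapping date -> temperature.
    Gaps of max_gap_days or more are left unfilled, so the returned dict
    simply has no entries for those days.
    """
    days = sorted(obs)
    daily = {d: obs[d] for d in days}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        if gap >= max_gap_days:
            continue  # gap too long: keep it missing
        for k in range(1, gap):
            frac = k / gap
            daily[start + timedelta(days=k)] = obs[start] + frac * (obs[end] - obs[start])
    return daily

# Synthetic observations: a 10-day gap (filled) and a 50-day gap (left open).
obs = {date(2000, 1, 1): 2.0, date(2000, 1, 11): 4.0, date(2000, 3, 1): 1.0}
daily = interpolate_daily(obs)
```

Leaving long gaps as missing entries, rather than as placeholder values, keeps them from silently entering any later day-of-the-year statistics.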

Reference periods, thresholds and analyses

A MHW is defined as an anomalously warm, prolonged and discrete event. This means that the water temperature has to exceed the 90th percentile of a preferably 30-year climatological period for at least 5 consecutive days5. MHWs can further be classified based on the multiples of difference between the mean climatological baseline and its corresponding 90th percentile22 (Category I “moderate” 1-2x, Category II “strong” 2-3x). A critical factor in this assessment is the time period it is being compared to. Historically, a 30-year period was used merely because only 30 years of data were available at that time21. The current general recommendation is to apply a 30-year span, preferably from 1961–1990, but modifications depending on data availability and application might also be used. Due to the availability of a particularly long in situ time series, we created five reference periods based on the linearly interpolated long-term time series (Table 1, schematic overview Supplementary Fig. 4).
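The duration criterion and the category multiples can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation used in the study: function names and example values are hypothetical, the threshold is passed in as a ready-made daily series, and the sketch does not join events separated by short gaps, which fuller implementations may do.

```python
def detect_events(temps, thresholds, min_duration=5):
    """Return (start, end) index pairs (end inclusive) of runs in which temps
    exceed thresholds for at least min_duration consecutive days.
    temps and thresholds are daily series of equal length."""
    events, start = [], None
    for i, (t, thr) in enumerate(zip(temps, thresholds)):
        if t > thr:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_duration:
                events.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(temps) - start >= min_duration:
        events.append((start, len(temps) - 1))
    return events

def intensity_multiple(temp, clim_mean, threshold):
    """Anomaly in multiples of (90th percentile - climatological mean)."""
    return (temp - clim_mean) / (threshold - clim_mean)

def category(multiple):
    """Map a multiple to the two categories named in the text."""
    if 1 <= multiple < 2:
        return "I (moderate)"
    if 2 <= multiple < 3:
        return "II (strong)"
    return None  # outside the two categories listed here

# Synthetic daily series: one 5-day exceedance and one 2-day exceedance.
temps = [15.0, 17.0, 17.0, 17.0, 17.0, 17.0, 15.0, 17.0, 17.0, 15.0]
thresholds = [16.0] * len(temps)
events = detect_events(temps, thresholds)
```

In this example only the first run qualifies, because the second one lasts two days. With a climatological mean of 14.0 and a threshold of 16.0, a temperature of 17.0 corresponds to a multiple of 1.5 and thus to Category I.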

Table 1 Time periods (starting 1st January and ending 31st December) used for climatological baselines and thresholds.

An overview of which reference period was applied where in this study, together with additional detailed information, can be found in the supplementary material (Supplementary Table 3).

For each day of the year, we derived a probability density function (PDF) of the temperature data for the respective reference period, together with the climatological mean and the thresholds given by the 10th and 90th percentiles5,14. Afterwards, the climatological mean and the thresholds were smoothed by a 31-day moving average with the respective day centred. In the analyses we compared the applied dataset to the thresholds derived from the reference period based on the linearly interpolated long-term time series. To qualify as a MHW, an extreme event needs to persist over at least 5 consecutive days5. We therefore calculated MHWs for the high-resolution and the remotely-sensed data, using the R package heatwaveR42 in RStudio. For the long-term time series, we did not consider this appropriate, as the interval between measurements was generally longer than 5 days43. Instead, we categorized and counted single measurements, in total or monthly per decade, as “abnormally warm” (above the 90th percentile), “abnormally cold” (below the 10th percentile) or “near normal” (between those thresholds). As the frequency of measurements has increased since the 1980s, the counts of extreme values were standardized to the total number of measurements and expressed as percentages.
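The threshold construction and the classification of single measurements can be sketched as below. Several details are assumptions made for illustration: the percentile interpolation rule (the "inclusive" method of the Python standard library), the wrap-around treatment of the moving average at the year ends, and all function names and example values. The sketch is not a re-implementation of heatwaveR.

```python
from collections import Counter
from statistics import quantiles

def p10_p90(values):
    """10th and 90th percentile of all reference-period values for one day of the year."""
    deciles = quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[8]

def smooth_circular(series, window=31):
    """Centred moving average over a day-of-the-year series.
    Wrapping around the year ends is an assumption of this sketch."""
    n, half = len(series), window // 2
    return [sum(series[(i + k) % n] for k in range(-half, half + 1)) / window
            for i in range(n)]

def classify(temp, p10, p90):
    """Label a single measurement relative to the smoothed thresholds."""
    if temp > p90:
        return "abnormally warm"
    if temp < p10:
        return "abnormally cold"
    return "near normal"

def percentages(labels):
    """Counts standardized to the total number of measurements, in percent."""
    return {k: 100.0 * v / len(labels) for k, v in Counter(labels).items()}

# Synthetic examples, for illustration only.
low, high = p10_p90([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
spike = [41.0] + [10.0] * 364           # a single outlying day of the year
smoothed = smooth_circular(spike)
shares = percentages([classify(t, 5.0, 18.0) for t in (20.0, 10.0, 12.0, 3.0)])
```

The smoothing spreads the single outlying value over the 31 days centred on it, which illustrates why day-to-day noise in the raw percentiles has little influence on the final thresholds.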

Long-term trends from 1927 until 2020 were estimated by linear regression (least squares) for the surface and the bottom layer separately. Years with fewer than 30 measurements were excluded.
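A least-squares trend with the exclusion rule can be sketched as follows. The text does not state whether the regression was fitted to annual means or to individual measurements; the sketch assumes annual means, and the function names and synthetic data are hypothetical.

```python
from statistics import mean

def annual_means(records, min_count=30):
    """records: iterable of (year, temperature) pairs.
    Years with fewer than min_count measurements are dropped."""
    by_year = {}
    for year, temp in records:
        by_year.setdefault(year, []).append(temp)
    return {y: mean(v) for y, v in sorted(by_year.items()) if len(v) >= min_count}

def ols_slope_intercept(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Synthetic records: three well-sampled years and one sparse year (dropped).
records = ([(2000, 8.0)] * 30 + [(2001, 8.5)] * 30 +
           [(2002, 9.0)] * 30 + [(2003, 20.0)] * 5)
means = annual_means(records)
slope, intercept = ols_slope_intercept(list(means), list(means.values()))
```

In this synthetic case the sparse year would have dominated the fit had it been retained, which illustrates the purpose of the exclusion rule.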

In order to compare any two reference periods, a two-sided Kolmogorov-Smirnov test (KS-test) was applied to their probability density functions at every day of the year, separately for the surface and bottom layer. The two-sided KS-test is a standard, non-parametric test of whether two samples come from the same underlying distribution; its null hypothesis is that they do. Consequently, a p value below 0.05 rejects the null hypothesis and indicates a significant difference between the probability density functions. As the number of measurements in the latest period between 1991 and 2020 is markedly higher (surface, n = 1470) than in previous reference periods (1931–1960, surface, n = 956; 1961–1990, surface, n = 1083), an additional KS-test was conducted to test whether the increased sampling frequency biased the results. For this purpose, a separate dataset was created for RP91-20 in which measurements causing a gap smaller than 10 days were omitted. This reduced the sampling frequency (from 1470 measurements to 901 per 30 years) to the level of previous reference periods. Thereafter, the gaps were linearly interpolated, and the temperature distributions at every day of the year were compared between the applied RP91-20 (n = 1470) and the RP91-20 with forced gaps (n = 901) using a KS-test. The test showed no significant difference (surface, lowest p = 0.586 at doy 171, overall mean p = 0.982; bottom, lowest p = 0.586 at doy 165, overall mean p = 0.977; Supplementary Fig. 5). This showed that the markedly increased sampling frequency during the latest 30-year period does not influence the resulting climatological baseline and its thresholds.
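The day-by-day comparison and the forced-gap thinning can be sketched as below, using `scipy.stats.ks_2samp` for the two-sample test. The array layout (years × days), the thinning rule (keep a measurement only if it lies at least 10 days after the previously kept one) and the synthetic data are assumptions made for illustration and may differ from the procedure actually applied.

```python
import numpy as np
from scipy.stats import ks_2samp

def thin(sample_days, min_gap=10):
    """Keep a measurement only if it lies at least min_gap days after the
    previously kept one (one possible reading of the forced-gap step).
    sample_days: day numbers counted from an arbitrary origin."""
    kept = []
    for day in sorted(sample_days):
        if not kept or day - kept[-1] >= min_gap:
            kept.append(day)
    return kept

def ks_by_day(period_a, period_b):
    """Two-sided KS-test for every day of the year.
    period_a, period_b: arrays of shape (n_years, n_days) of daily temperatures.
    Returns one p value per day of the year."""
    return np.array([ks_2samp(period_a[:, d], period_b[:, d]).pvalue
                     for d in range(period_a.shape[1])])

# Synthetic data, for illustration only: 30 years x 365 days.
rng = np.random.default_rng(0)
base = rng.normal(10.0, 1.0, size=(30, 365))
p_same = ks_by_day(base, base.copy())   # identical distributions
p_warm = ks_by_day(base, base + 3.0)    # clearly shifted distributions
kept_days = thin([0, 3, 10, 12, 25])
```

Identical inputs yield p values at or near 1 for every day, whereas the shifted series is rejected at the 0.05 level throughout, mirroring the two outcomes the test is used to distinguish here.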

Comparing in situ and remotely-sensed SST and reference periods

In situ SST data from Storfjärden are representative of an area classified as outer, exposed archipelago25. Here we investigated the correlation of in situ surface data from the long-term time series with remotely-sensed SST and compared the respective climatological means and thresholds of RP82-11 by applying a KS-test to their daily temperature distributions.