Introduction

The COVID-19 pandemic has posed enormous global challenges to the healthcare sector. To estimate the future need of personnel, equipment and hospital beds, reliable statistical analysis tools are required. Historic data is an important asset in figuring out how to best combine available time series data to gain predictive capability while reducing the influence of bias and other sources of prediction error and uncertainty. At the same time, statistical analysis of the historical epidemic evolution can provide indications for the success of medical treatments and vaccination programs. It also allows estimation of the accumulated number of infections. This number essentially determines the level of herd immunity, and thus received much attention in Sweden during the spring of 2020.

It is a difficult task to predict healthcare, and—of particular interest in the COVID-19 context—ICU demand. This is especially true in an early phase of an epidemic caused by a previously unknown pathogen, such as the SARS-CoV-2 virus that causes COVID-19. While it was possible to falsify early prediction models1 based on high sensitivity, it remains a largely open question how time series data could be analyzed to arrive at accurate and precise predictions, with practical use to healthcare planners.

We investigate if a particular simple class of time-invariant finite impulse response (FIR) models2—those with a delayed delta impulse response—is sufficient to model the relation between time series data. Particularly, our aim is to investigate whether the simple FIR model is sufficient for relating COVID-19 cases (detected infections), ICU admissions, and registered deaths in Sweden. We then demonstrate how such simple models can be used for reconstructing the epidemiological evolution during times of measurement uncertainty caused by limited test capacity, as well as prediction of ICU demand based on case data.

Methods

Data used

In this paper, the evolution of the pandemic is based on the following official and openly accessible time series reported by the Swedish Public Health Agency:

  • Cases: Daily RT-PCR-confirmed SARS-CoV-2 cases in Sweden. The date refers to the registration.

  • ICU admissions: Daily number of ICU admissions in Sweden for patients with COVID-19 at the given day.

  • Deaths: Daily number of deaths in Sweden for persons with a SARS-CoV-2 infection at the given day.

The data was downloaded from The Swedish Public Health Agency3 on 2 November 2021. Here we display (death) data until 10 October 2021 in order to avoid incomplete data due to late reporting. Due to insufficient testing we omit case data before 18 June 2020 in the model fitting, and represent it with dotted lines in the figures. During the first wave in March–May 2020, RT-PCR-testing was essentially focused to persons admitted to hospital and elderly care in Sweden due to limited testing capacity. In the first half of June, the Swedish government strongly advocated the testing of all persons with symptoms of COVID-19 and supplied financial assistance to the regions as of 11 June 2020. We assume that this had full effect on testing after a further week, which justifies the date given above.

In addition to the ordinary testing of persons with suspected COVID-19 infection, results for six randomized studies4 in 2020 and 2021 have been published by the Swedish Public Health Agency. They can be used to estimate the prevalence of COVID-19 in the population at the corresponding times. The studies conducted 24–28 August 2020 and 21-25 September 2020 did not provide any positive samples, while 23, 9, 24, and 43 positive cases where detected for 21–24 April 2020, 25–28 May 2020, 30 November–4 December 2020, and 12–16 April 2021, respectively. Test results were available for slightly less than 3 000 persons in the studies of 2020 and 4758 persons for the study in 2021. The limited sample size results in statistical uncertainty, indicated by the 68% confidence limits for the average assuming a Poisson distribution for the number of positively tested. Sampling bias might provide a reduced prevalence for the two latest studies according to the statistical analysis performed in the studies4. Here, we use the bare results based on the number of positive cases. For comparison, we also provide an estimate for the ICR based on sampling-bias corrected data.

While deaths and ICU admissions related to COVID-19 naturally also appear in the case data, their total number until 1 November 2021 sums up to only 1.3% and 0.7%, respectively, of the total cases. Regarding ICU and death data, one has to take into account that approximately 75% of ICU patients survive5, and former ICU patients contribute to the death toll with only 13%, as there are approximately twice as many deaths as ICU admissions. Furthermore, ICU admission data and death data relate to different age groups: While 69% of the ICU patients are younger than 70 years old, 69% of the deceased have reached at least the age of 80. The small overlap between the groups generating the cases, ICU admission, and deaths time series suggests that statistical correlations between the time series can be expected to reflect the links to their common cause: antecedent SARS-CoV-2 infection in the Swedish society. This motivates the FIR model discussed in “Finite impulse response models” with three independent filter functions for cases, ICU admission, and deaths.

Furthermore, data on antibody prevalence from blood donors and health center samples (unrelated to COVID-19-specific testing) have been aggregated and published6. Here we provide 95% confidence intervals for COVID-prevalence based on these data sources. We are aware, that neither the group of blood donors nor the group visiting healthcare centers for other reasons than COVID-19 are representative for the entire population. However, the fact that the prevalence of both groups are very similar, despite their very different characteristics, suggests that their prevalences of antibodies against Covid-19 are representative for the population.

Parameters used

In addition to the data on the COVID-19 evolution and prevalences provided by the Swedish Public Health Agency addressed in “Data used”, we apply three further parameters within this study:

Note that \(T_\text {interval}\) and \(p_\text {antibody}\) are used independently of each other in two different determinations of the ICR. Both ways provide essentially the same result, which stabilizes our results against systematic errors in these parameters. \(\tau _a\) only enters Eq. (22) and the time axis in Fig. 3.

Finite impulse response models

Finite impulse response (FIR) models are a class of linear filters. (The interested reader is referred to the textbook2 for a thorough mathematical introduction to FIR and related linear model structures.) They describe the outcome of a time-dependent observable (such as a death rate) by a sum of preceding data (here number of infections at earlier dates), which are weighted by a filter function. They are commonly used for analysing epidemiological problems, where the filter function could represent for example the serial interval distribution. For practical applications, the filter function is often not easy to obtain. Here we show that the assumption of a Dirac delta response, where the filter function has only two free parameters (delay and amplitude), allows for a consistent analysis of the COVID-19 evolution in Sweden.

A central entity for the evolution of an epidemic is the number of new infections \(\tilde{x}(t)\), occurring on day t. Let \(\tilde{p}_c(t,\tau )\) be the probability for an infection starting on day t to generate a reported positive RT-PCR test \(\tau\) days later. This results in the observation model

$$\begin{aligned} \tilde{y}_c(t)=\sum _{\tau =0 }^{\infty }\tilde{p}_c(t-\tau ,\tau )\tilde{x}(t-\tau ) + \tilde{e}_c(t), \end{aligned}$$
(1)

where \(\tilde{y}_c(t)\) denotes new cases on day t, and \(\tilde{e}_c\) is a zero-mean uncorrelated noise process representing statistical fluctuations associated with the probability \(\tilde{p}_c\). The model has finite impulse since \(p_c\) is identically zero for sufficiently large \(\tau\) (e.g. a human lifetime), and can be regarded practically as zero for \(\tau \gg 1\) week. The time index t has the unit of days. In (1) it represents that the detection probability distribution, defined through the dependence of the second time index \(\tau\), may itself vary over time.

The use of the tilde \(\sim\) in (1) is to distinguish unfiltered measurements. Historic observations \(\tilde{y}_c(t)\) exhibit a clear weekday pattern. For retrospective analysis it is therefore customary to apply a centered 7-day moving average filter,

$$\begin{aligned} y_c(t) = \frac{1}{7} \sum _{s=-3}^{3} \tilde{y}_c(t-s) , \end{aligned}$$
(2)

to compensate for such effects. Throughout the paper we will work with time series that have been subjected to filtering according to (2). We will drop the \(\sim\) notation but still write e.g. “cases” instead of “filtered cases” in favor of readability.

Within linear system theory, a model with the structure of Eq. (1) is referred to as a (stochastic) finite impulse response (FIR) model, implying (combined with Eq. (2)) that a \(y_c(t)\) can be described by a finite record of x(t).

Summation of \(p_c(t,\tau )\) over \(\tau\), yields the expected infection-to-case ratio (ICR):

$$\begin{aligned} b_c(t)=\sum _{\tau = 0}^{\infty } p_c(t,\tau ), \end{aligned}$$
(3)

defined as the probability that a person infected on day t will eventually become detected and registered as a case.

The central point of the manuscript is that we investigate the hypothesis that \(p_c(t,\tau )\) can be adequately modeled using the delta FIR model

$$\begin{aligned} p_c(t,\tau ) = b_c(t)\delta (\tau -\tau _c), \end{aligned}$$
(4)

where the discrete delta filter function given by

$$\begin{aligned} \delta (t)=\left\{ \begin{array}{ll} 1,&{}~t=0,\\ 0,&{}~\text {otherwise}, \end{array} \right. \end{aligned}$$
(5)

where \(\tau _c\) is the average delay between infection and case registration. Note that the model (4) is defined for the averaged quantities (2), where \(p_c(t,\tau )\) does not display the weekday fluctuations in t, that are likely in \(\tilde{p}_c(t,\tau )\). Assuming that the \(\tau\)-dependence of \(p_c(t,\tau )\) is reasonably well reproduced by its average \(\tau _c\) and standard deviation \(\sigma\), the simplified model (Eq. 4) is justified in the Appendix. This relies on the assumption that \(p_c(t,\tau )\) does not change on the scale \(\sigma\) in t and that the second derivative of x(t) is much smaller than \(x(t)/\sigma ^2\). For the special case of an exponential evolution for x(t), this provides an accuracy of better than 5% if \(\sigma\) is less than 46% of the doubling time, as has already been stated7.

Employing Eq. (4), the observation model Eq. (1) becomes

$$\begin{aligned} y_c(t) = b_c(t-\tau _c) x(t-\tau _c)+e_c(t). \end{aligned}$$
(6)

The model (6) asserts that the number of cases \(y_c(t)\), detected through RT-PCR testing on day t, only depends on the number of new infections \(x(t-\tau _c)\) that occurred \(\tau _c\) days earlier. Furthermore, the expected dependence is through a linear scaling factor, the ICR.

Relating the time series

We introduce analogous observation models for ICU admissions \(y_a\) and deaths \(y_d\):

$$\begin{aligned} y_a(t)&= b_a(t-\tau _a) x(t-\tau _a)+e_a(t), \end{aligned}$$
(7)
$$\begin{aligned} y_d(t)&= b_d(t-\tau _d) x(t-\tau _d)+e_d(t), \end{aligned}$$
(8)

where \(b_a(t)\) is the Infection ICU admission Ratio (IIAR) and \(b_d(t)\) the Infection Fatality Ratio (IFR), where the time dependence denotes the infection date.

The underlying infections x(t) are unknown, and cannot be estimated solely from the measurements \(y_c,y_a,y_d\), since an absolute reference frame against which to estimate the individual time-shifts and gain factors is not available. However, if we disregard the noise terms, we can relate the cases and ICU admissions time series through

$$\begin{aligned} y_a(t)=\frac{b_a(t-\tau _a)}{b_c(t-\tau _a)} y_c(t-\tau _{ac}) , \end{aligned}$$
(9)

where \(\tau _{ac}=\tau _a-\tau _c\) is the average delay between the registration as a case and the admission to ICU (not necessarily for the same person). Analogously, we have

$$\begin{aligned} y_d(t) = \frac{b_d(t-\tau _d)}{b_c(t-\tau _d)}y_c(t-\tau _{dc}) , \end{aligned}$$
(10)

with \(\tau _{dc}=\tau _d-\tau _c\) and

$$\begin{aligned} y_d(t) = \frac{b_d(t-\tau _d)}{b_a(t-\tau _d)}y_a(t-\tau _{da}) , \end{aligned}$$
(11)

with \(\tau _{da}=\tau _d-\tau _a\).

Equations (911) can be conveniently fitted to data. If the time-dependence of the b-coefficients is negligible, we can fit the ratio \(\lambda =b_a/b_c\) and \(\tau _{ac}\) from Eq. (9) by minimising the sum of squares

$$\begin{aligned} F^\text {LS}_{ac}(\lambda ,\tau _{ac})=\sum _t \left[ y_a(t)-\lambda y_c(t-\tau _{ac})\right] ^2, \end{aligned}$$
(12)

which results in the two fitting parameters \(\lambda ,\tau _{ac}\). In order to obtain robust estimates we alternatively minimise the modified sum of squares

$$\begin{aligned} F^\text {LSmod}_{ac}(\lambda ,\tau _{ac})=\sum _t \left[ \frac{y_a(t)}{\sqrt{\lambda }}-\sqrt{\lambda }y_c(t-\tau _{ac})\right] ^2\, . \end{aligned}$$
(13)

As a third option, we maximize the correlation coefficient

$$\begin{aligned} r(\tau _{ac})=\frac{\sum _t\left[ y_a(t)-\bar{y}_a\right] \left[ y_c(t-\tau _{ac})-\bar{y}_c\right] }{\sqrt{\sum _t \left[ y_a(t)-\bar{y}_a\right] ^2}\sqrt{\left[ y_c(t-\tau _{ac})-\bar{y}_c\right] ^2}} \end{aligned}$$
(14)

to obtain \(\tau _{ac}\) and use \(b_a/b_c=\bar{y}_a/\bar{y}_c\). In all three approaches, the times t are chosen such that reliable data for both \(y_c(t-\tau _{ac})\) and \(y_a(t)\) are available. Equations (10, 11) are treated in the same way, where the indices ac are replaced by dc and da, respectively, in the formulae above. Note, that each combination of indices applies a different time interval due to the reliability condition.

Calibrating against randomized RT-PCR test data

While Eqs. (911) establish relative relations between the time series \(y_c,y_a,y_d\), a “grounding point” is needed to obtain absolute values of the time shift and scaling parameters of the observation models Eqs. (68). Randomized RT-PCR studies provide such a grounding point, where we use the data discussed in “Data used”. From the number of positive RT-PCR test results in each study, we can estimate the prevalence \(N_\text {positive}(t)\) by multiplying with the population of Sweden and dividing by the number of tested persons.

The prevalence \(N_\text {positive}(t)\) depends on the probability \(p_\text {positive}(\tau )\) to have a positive test result \(\tau\) days after becoming infected:

$$\begin{aligned} \tilde{N}_\text {positive}(t)= \sum _\tau p_\text {positive}(\tau )\tilde{x}(t-\tau ). \end{aligned}$$
(15)

After time-averaging and using again the impulse FIR model this provides

$$\begin{aligned} N_\text {positive}(t)=T_\text {Interval}x(t-\tau _\text {positive}), \end{aligned}$$
(16)

where

$$\begin{aligned} T_\text {Interval}=\sum _\tau p_\text {positive}(\tau ) \end{aligned}$$
(17)

is the average time-interval over which a positive test result is expected and

$$\begin{aligned} \tau _\text {positive}=\frac{1}{T_\text {Interval}} \sum _\tau \tau p_\text {positive}(\tau ) \end{aligned}$$
(18)

is the average delay after the time of infection. From the data displayed in Figure 2 of Kuricka et al.8, we extract \(T_\text {Interval}=10.8\) days and \(\tau _\text {positive}=12\) days. Another study9 found \(T_\text {Interval}=9.5\) days. Motivated by these numbers we use \(T_\text {Interval}=10\pm 1\) days and make the simplifying assumption \(\tau _\text {positive}=\tau _c\). (The resulting value for \(\tau _c\) from Eq. (22) agrees well with \(\tau _\text {positive}=12\) days, extracted from8). Analogously to Eq. (9) we fit the relation

$$\begin{aligned} N_\text {positive}(t)=\frac{T_\text {Interval}}{b_j(t-\tau _\text {positive})} y_j(t-\tau _\text {positive}+\tau _j)\, \end{aligned}$$
(19)

where \(j=a,d\) refers to the data sets for ICU and deaths. We apply all three fitting routines, again neglecting the time-dependence of \(b_j(t)\)—however, \(\tau _\text {positive}=\tau _c\) is kept fixed. (As we only have 4 non-vanishing data points for \(N_\text {positive}\), the use of a second fit-parameter next to \(b_j\) could provide spurious results)

Results

Scaling of data

The symbols on the upper panel of Fig. 1 show the daily Swedish numbers of positively tested persons (named cases), persons admitted to intensive care units (named ICU), and deaths, see “Data used” for details. These data were averaged over a seven day period (lines) in order to avoid weekday fluctuations, resulting in the functions \(y_c(t)\) (Cases), \(y_a(t)\) (ICU admission), and \(y_d(t)\) (deaths), which are the main data-sets used throughout this article. Here the time t is chosen as the central day of the averaging period.

Figure 1
figure 1

Upper panel: Raw data (symbols) used in this study. (The numbers of ICU admissions and deaths have been multiplied by 100 to provide comparable numbers.) The lines show the respective averages over a 7 day period, where the central day of the interval was used in the abscissa. Lower panel: 7 day averaged data from upper panel scaled using the proposed methodology of “Methods”, with resulting scale factors provided in the figure legend. The ICU-data are shifted by 5 days to the left and the death-data are shifted by 13 days to the left, so that they essentially fall on one curve. Dotted lines indicate sections of data that are considered as incomplete. The black error bars provide scaled 68% (one normal standard deviation) confidence levels of randomized RT-PCR study data. All scaling factors and delays are taken from the LS values in Table 1.

As described in “Relating the time series”, we determined the ratios \(b_a/b_c\), \(b_d/b_c\), and \(b_d/b_a\), as well as the corresponding delays. The vaccination program against SARS-CoV-2 (starting in the last week of 2020 in Sweden) is likely to have effected \(b_c(t)\), \(b_d(t)\) and \(b_a(t)\) differently. In its first phase, the program targeted elderly in need of special care, who show the by far largest risk to die from COVID-19. Thus, we disregard death data after January 2021 for the model fitting. Subsequently, groups were vaccinated in order of decreasing age. The main part of the age group 65–70 years was vaccinated with the first dose around the first half of April 2021. As this group is highly over-represented in ICU care, we also disregard ICU admission data newer than April 2021 from the model fitting. As mentioned above, we additionally disregard the case data before 18 June 2020, when testing became available to all persons with symptoms in Sweden, see “Data used”. The results are given in Table 1. We find that all three approaches provide identical time delays, which we regard as particularly reliable. Also the fractions between the ratios agree fairly well with deviations far below 10 %. In the following we apply the values from the least squares (LS) method, see “Relating the time series”, but we note that none of our results depends on this choice of method. The relations \(b_a/b_c\cdot b_d/b_a=b_d/b_c\) and \(\tau _{ac}+\tau _{da}=\tau _{dc}\) hold only approximately as different time intervals are used in the fitting due to the exclusion of death data after 1 February 2021 and case data before 18 June 2020 addressed above.

Table 1 Results for different fitting procedures (LS least squares; LSmod modified least squares; Corr. coeff. correlation coefficient) detailed in “Relating the time series” for ratios and time delays.

Using these scaling factors \(b_a/b_c=1/193\) and \(b_d/b_c=1/68\) as well as the respective time delays, the lower panel of Fig. 1 shows that all three curves agree very well over the second wave of November 2020–January 2021. The ICU and death curve also show a similar behavior at the first wave March–May 2020, albeit the ratio between ICU admission and deaths appears to be slightly higher here. The case numbers are much lower due to the limited testing before mid-July. For the third wave March–May 2021, the ICU and case curve agree very well (the dip in cases around 1 April may be attributed to decreased testing around Easter). We also see that the death curve shows much lower values from around mid-January 2021, which coincides with the start of the vaccination program in Sweden at the end of 2020.

Comparing with randomized RT-PCR and antibody studies

We also fitted the data from the 6 randomized RT-PCR studies to the ICU and death data (here we omitted the sixth study, as the fatality was significantly reduced in 2021, most likely due to the vaccination program) and found almost identical results in both cases, see Table 1. Thus, the prevalence of RT-PCR-detectable SARS-CoV2 infections is about 16 times higher than the number of cases (as reconstructed by shifting and scaling the death and ICU data), see the lower panel of Fig. 1. As an infected person can be detected over an average time interval \(T_\text {interval}\), but is only registered once as a case with probability \(b_c\), this implies \(T_\text {Interval}/b_c\approx 15.6\pm 0.9\).See “Calibrating against randomized RT-PCR test data” for details. Here we used the average of all values provided in the two lowest rows of Table 1 and used the maximal deviation as an estimate for the error. The time interval, an infected person is tested positive seems to be less well known. Using \(T_\text {Interval}=10\pm 1\) days, see “See “Calibrating against randomized RT-PCR test data”, this provides the ICR based on the prevalence studies

$$\begin{aligned} b^\text {prevalence}_c\approx 0.64\pm 0.10 \end{aligned}$$
(20)

We note that the sampling-bias corrected data for the prevalence results in a slightly larger value \(\begin{aligned} b^\text {prevalence\_corrected}_c\approx 0.7 \pm 0.11 \end{aligned}\) with overlapping error marginal.

The lower panel of Fig. 1 shows, that the case data agree with the prevalences divided by 16 after testing become widely accessible around mid-June 2020. As both the scaled ICU and death data agree well with the prevalences at all times, we can take these curves for estimating the number of cases before mid-July, where the case data are not reliable due to limited testing.

Figure 2
figure 2

Accumulated number of cases at a given date, where we added the estimates based on scaling the death and ICU data. The dashed blue line provides the case data where we added an estimate for number of cases missed due to limited testing before mid-June. The symbols with error bars show the results of antibody tests performed for blood donors and blood samples from health centers6, where the fraction of positive tests was multiplied by Sweden’s population. Note that the number of persons with antibodies in March 2021 include detected results from vaccinations.

In Fig. 2 we find a clear plateau of total cases in July–November 2020, as there were few new cases in this period. The average plateau value (at 1. Sept) was 450,000 (reconstructed) cases, with an uncertainty (based on the ICU and death data) of about 55,000. Antibody tests for different groups provide estimates of 700,000 positive persons in Sweden both at the beginning and the end of the plateau, see Fig. 2. This results in \(b_c/p_\text {antibody}=0.64\pm 0.08\). The probability \(p_\text {antibody}\) to develop detectable antibodies has been found10,11 to be above 90%. As it should not exceed 100%, we assume \(p_\text {antibody}=0.95\pm 0.05\), and find

$$\begin{aligned} b^\text {antibody}_c\approx 0.61\pm 0.11 \end{aligned}$$
(21)

This agrees very well with the different estimate Eq. (20), albeit different statistical and systematic errors (in particular the values of \(T_\text {Interval}\) and \(p_\text {antibody}\)) enter both ways to calculate \(b_c\). Thus we consider \(b_c=0.63\) as a good estimate for the ICR with the awareness that a 10% error is not unlikely.

We note, that a relatively large number of persons with antibodies was found in the study at the beginning of March 2021. Here, one has to take into account that a part of the antibodies detected results from vaccinations. At this time about 700,000 persons had been vaccinated in Sweden and a majority of them should have developed antibodies, when the data was collected.

Reconstructing the IFR and the IIAR

Assuming the ICR \(b_c=0.63\) for the time after June 18\(\text {th}\), 2020, we can estimate the variables \(b_a(t)\) and \(b_d(t)\) from Eqs. (9, 10). At first we need the absolute delays. Here we rely on the data for ICU admission, which in average occurs about 11 days after the onset of symptoms according to the Swedish Intensive Care Registry12. Furthermore, it is known, that it takes about 6  days from the times of infection to develop symptoms13,14,15. Thus we use \(\tau _a=17\) days in the following. Based on the values on Table 1 we get

$$\begin{aligned} \tau _c=12 \,\,\text { days, }\tau _a=17 \,\,\text { days, and } \tau _d=25 \,\,\text { days.} \end{aligned}$$
(22)

In Fig. 3 we plot the time-dependence of the infection-to-ICU admission ratio (IIAR) \(b_a(t)\) by a green line on the basis of Eq. (9). Here we used 21 day averages to restrict fluctuations. We find that \(b_a(t)\approx 0.34\) % is close to its average, which confirms the quality of the scaling. The larger bump around late July occurs within a range with small numbers of ICU admissions (average of 1.6 in August 2020), so that statistical fluctuations cannot be excluded here.

Figure 3
figure 3

Estimated values for the IIAR \(b_a(t)\) (green) and the IFR \(b_d(t)\) (red, blue) based on Eqs. (9, 10, 11). We assume a constant ICR \(b_c=0.63\), which appears reliable for cases after 18 June (dotted curves indicate that earlier cases data are applied). The horizontal dashed lines provide the average values.

Similarly, we obtain the IFR \(b_d(t)\) from Eq. (10) as shown by the red line in Fig. 3. On average, we have 0.74% until end of 2020 (before vaccinations started to show effects), but there are pronounced changes over time. For infections in August and September of 2020, the IFR is much lower. For infections after 1 January 2021, we see a distinct decline in the IFR reaching values of 0.1% for infections around mid-June 2021. Afterwards the IFR predicted by our approach increased again. The same behavior for the IFR can be obtained from from Eq. (11) as shown by the blue line in Fig. 3, which now extrapolates to times before mid-June under the assumption, that the IIAR remained essentially constant in this period as well.

Discussion

Using openly accessible data from the Swedish Public Health Agency and the Swedish Intensive Care Registry, our approach provides estimates for the infection-to-fatality ratio (IFR), infection-to-case ratio (ICR), and infection-to-ICU admission ratio (IIAR). We find that data for daily cases, daily ICU admission, and daily deaths of individuals with confirmed infection fall on essentially a single curve based on a FIR model with a delta filter function and time-invariant fit parameters, see the lower panel of Fig. 1. There are only two major well-understood exceptions: (I) Cases exhibit a poor match before mid-June 2020, when free RT-PCR testing became broadly available in Sweden for all persons with symptoms. (II) There is a sharp relative decrease in deaths coinciding with the start of the national vaccination program around the turn of the year 2020–2021.

Based on these findings we demonstrate that the approach can be used to retrospectively estimate the cases time series prior to July 2020, that would have been observable with a broad RT-PCR testing program in place. Furthermore, by incorporating data from six randomized RT-PCR studies conducted by the Swedish Public Health Agency and data for the prevalence of antibodies, we obtain an absolute value for the infection-to-case ratio ICR of \(0.63 \pm 0.07\) for the time when testing is easily available to persons with symptoms in Sweden. This value is larger (but compatible within its error) than the value 0.56 obtained in a study for Iceland10. It means that approximately 37% of all infected remained undetected as cases.

The key underlying assumption is that the correlation of the data observed under the model originates in causal relations. Such assumption is present, and constitutes a limitation, in all dynamic modeling. The reason we found merit in the proposed model is that it (I) is supported by data, (II) has few parameters, thus reducing the concern of over-fitting, and (III) is consistent with the plausible mechanistic explanation presented in the manuscript.

Additionally, the model relies on the assumption that the confusion matrices (true and false positives and negatives) for involved time series did not change importantly over time. For example, we have assumed that the RT-PCR detection probability is similar across SARS-CoV-2 variants (but it is straightforward to replace this assumption with externally obtained information on how e.g. detection probabilities vary based on the exact testing protocol). This is why we have omitted data from episodes during which such variations are known to have been prominent: early in the pandemic, when there was a shortage of RT-PCR test availability; late in the pandemic, when the effect of vaccination started to manifest itself in the data. For the period in-between, the good correlation of data under the model suggests that the assumption is a valid one, or at least one that cannot be rejected by the considered data.

Regarding systematic errors, the ICR value obtained from our study is reduced if both the time interval for positive testing, \(T_\text {Interval}\), and probability to develop measurable antibodies after an infection, \(p_\text {antibody}\), turn out to be much less than 10 days and 0.95, respectively. On the other hand, larger values are less likely as \(p_\text {antibody}=0.95\) was chosen close to its absolute maximum value 1 in the analysis. However, a sampling bias or false-positive tests may allow for deviations, which we cannot judge here.

Figure 2 indicates that the total number of cases would have been around 820 000 at the end of 2020, if testing in the first half of 2020 had been as available as in the second half. With an ICR of 0.63, this provides about 1.3 million infected persons in Sweden in 2020. This corresponds to 13% of the population and is far below values required to reach herd immunity.

Based on the ICR of 0.63, we obtain the infection to ICU admission ratio IIAR of 0.34% and an average infection to fatality ratio IFR of 0.74 % (before the start of the vaccinations), see Fig. 3. Our value for the IFR is comparable to earlier studies16,17,18. Note that possible systematic errors in ICR, as addressed above, affect the IIAR and IFR proportionally. Our analysis shows that the IIAR is rather constant in time, at least for the time after august 2020. In contrast, the IFR varies much stronger over time. We attribute its decline starting for infections around 1 January 2021 to successful vaccination of elderly persons. They dominate the mortality in COVID-19, but are less relevant for ICU admission. The increase in the IFR after early July 2021 may have a variety of reasons. It may indicate an increasing number of breakthrough infections for the elderly around 6 months after their vaccination. However, another contributing factor could be the fact that around 1 July 2021 the SARS-CoV-2 vaccination became available to all adults in Sweden. In Sweden, the following vaccines are approved (approval date): Comirnaty by Pfizer/BionTech (21 December 2020), Spikevax by Moderna (January 6 2021), Vaxzevria by Astra Zenecas (29 January 2021), and Covid-19 Cavvine Janssen by Janssen (11 March 2021). Predominantly, Comirnaty has been used, making up for roughly 13 M of totally 19 M doses administered according to Swedish Public Health Agency statistics19 by 29 October 2021.

The appearance of novel variants of the virus in Sweden during late 2020 and 2021 complicates the interpretation of the results. The disease severity is known to vary depending on the variant20, but the time period during which new variants appeared overlaps with the time frame of the vaccination program, making it difficult to disentangle the effects of vaccination and new variants on the IFR. Relatedly, we lack explanation for the reduced IFR between mid-July and mid-September of 2020, as seen in Fig. 3. Such observations suggest that modeling for prevalence and healthcare demand estimation or prediction could be facilitated by national-level aggregation of epidemiological data. Particularly, accessible data on vaccination status, relative prevalence of identified virus variants, and reasons (suspected infection, randomized testing, etc.) could be directly incorporated into the proposed model with the objective of bias reduction.

The close correlation between cases, ICU admissions, and deaths—down to a linear scaling and time-shift once self-test had been made broadly available—warrants further investigation. It could be explained through (I) time-invariance of the ICR, the IIAR, the IFR, (the dip in IFR in summer is not visible in the lower panel of Fig. 1 due to the small numbers) and the temporal distributions describing the associated flows; (II) variations in the mentioned entities that have essentially cancelled each other throughout 2020; (III) external confounders providing this canceling effect. Case (I) would imply that the healthcare system’s ability to save COVID-19 patients and the impact from different virus mutations has not changed markedly. Case (II) would be surprising for the time from August 2020, where the ratio between cases and ICU admission is largely constant in time, see Fig. 3. Thus it is most likely that \(b_c(t)\) and \(b_a(t)\) are both constant in this range. However, for the first half of 2020, things are less obvious. The variations observed in the magenta curve may result from reductions in \(b_a(t)\) and \(b_d(t)\) (i.e. improvement in healthcare) occurring at different times. Case (III) would also be noteworthy since the groups of deceased and persons admitted to ICU care have marginal overlap with each other.

The studied time series alone are not sufficient to map out the causation of the observed correlations, thus distinguishing between (I) and combinations of (II) and (III). However, an understanding of the underlying mechanism could be obtained by retrospectively tracking individual traces connecting the considered time series (e.g. persons who have tested positive, been admitted to an ICU or died with COVID-19). Importantly, if the correlation can be understood and shown not to be coincidental, it could constitute the basis for an accurate 1–2 week predictor of the ICU demand.

Conclusion

The summarizing conclusion from our observations is that important insight into numerous aspects of an ongoing epidemic can be obtained by considering the scaling between different time series, where time shifts are crucial. This is based on an FIR model with a delta filter function, which is shown to work well. We demonstrated this by reconstructing the daily number of cases for the first half year of 2020 periods, where testing was limited in Sweden, and extracting time variations in the infection fatality ratio.