Introduction

The Japanese National Institute of Infectious Diseases identified the new P.1 SARS-CoV-2 variant from travelers returning from Amazonas State, Brazil, on 6-January-20211. P.1 was eventually reported in Manaus city (Amazonas state capital), on 11 January-20212. Later, it was identified in samples collected since 6-Dec-2020 from Manaus3. According to phylogenetic studies, P.1 likely emerged in the Amazonas state in early3 or late4 November 2020. This variant shares mutations with other variants of concern (VOCs) previously detected in the United Kingdom and South Africa (B.1.1.7 and B.1.351, respectively)2. Mutations of these two other variants are associated with greater transmissibility and immune evasion5,6, which confer them the status of variant of concern. However, information, data, and analyses on the epidemiology of P.1 are still incipient.

The Coronavirus disease 2019 (COVID-19) outbreak in Manaus (April–May 2020) was followed by a period of high but stable incidence, after which the proportion of individuals who were infected by the SARS-CoV2 virus may have reached 42%7 to 76%8 by November 2020. From December 2020 to February 2021 the city was devastated by a new outbreak that caused a collapse in the already fragile health system9, with shortages of oxygen supply10, while the frequency of P.1 increased sharply from 0% in November 2020 to 73% in January 20214. The pathogenicity of P.1 variant is still unknown, although recent studies point to increased viral load in individuals infected with the new variant4, suggesting it could be higher than the one from previous circulating strain. We analyzed Brazilian national health surveillance data on COVID-19 hospitalizations and the frequency of P.1 among sequences from residents of Manaus city using a model-based approach (an extended SEIR compartmental model—see Fig. 1) to estimate the relative transmissibility in comparison to the previous local variant(s), and relative force of reinfection of the P.1 variant, i.e. the ratio between the force of infection by P.1 on previously infected individuals (reinfections) and the force of infection by P.1 on susceptible ones (new infections). We estimate that the P.1 variant is about 2.6 times more transmissible than previous circulating variant(s), and 28% of Manaus cases in the period were due to reinfections.

Fig. 1: Diagram of the extended deterministic compartmental model (SEAIHRD).
figure 1

The model compartments and the respective connections between them are summarized in this diagram, and they are named as S: Susceptible, E: Exposed (pre-symptomatic), H: Hospitalized (severe infected individuals), I: Infected (symptomatic individuals, not hospitalized), A: Asymptomatic. D: Deceased, R: Recovered. Compartments are subdivided into three age categories, not represented here for simplicity. Compartments with subindex 1 represent the wild-type variant, subindex 2 refers to the VOC P.1. Continuous lines represent flux between each compartment; dashed lines, infection pathways. Small arrows indicate force of reinfection and transmissibility. λ = force of infection. β = relative transmission rate. pr = relative force of reinfection. γ = average time between being infectious and presenting symptoms. σ = proportion of severe cases that require hospitalization. α = proportion of asymptomatic cases. νs = average time between being infectious and recovering for severe cases. νi = average time between being infectious and recovering for mild/asymptomatic cases. μ = in-hospital mortality ratio.

Methods

In order to estimate key parameters of the variant of concern (VOC) P.1, we developed a model and fitted it to time-series data of number of hospitalized cases and frequency of the P.1 variation. The fitting approach used here can be applied to other regions where as soon as data on number of cases and frequency of a new variant are available. It primarily requires surveillance data to determine proper model initial conditions. In Brazil, these are the hospitalized cases data. Stratification by age allows the model to also consider the different death rates, asymptomatic and hospitalized proportions of each age class, important features for SARS-CoV-2. Contact levels between different age classes, which may vary from one place to another, can also be considered. For special cases in which information such as contact between ages classes and age distribution are not available (or even unnecessary for some other disease), the model can be easily simplified. In this sense, the method proposed here demands low-detailed data and relies on the structure of a simple compartmental model to measure quantities of interest, such as transmissibility and relative force of reinfection. More information about the methodology here applied is available in the Supplementary Information (SI) in sections 13.

Model

A deterministic compartmental model (Fig. 1) was developed to model the infection of Susceptible individuals moving to the Exposed (pre-symptomatic) compartment, which can progress to three alternative compartments: Hospitalized (severely ill), Infected (symptomatic but non-hospitalized), and Asymptomatic. Eventually, individuals move to Recovered or Deceased. Two variants are considered: 1-non-P.1 (“wild-type”) and 2-new/P.1. The latter is assumed to infect Recovered individuals previously infected by the wild-type, and no reinfections of wild-type due to waning immunity occur. Compartments were stratified into three age categories: young (<20 years old), adult (≥20 and <60 years old) and elderly (≥60 years old), with different rates for outcomes. The key parameters of relative transmissibility and relative force of reinfection were estimated by a maximum likelihood fitting to the weekly number of new hospitalizations and to genomic surveillance data. Three additional model parameters with unknown values were also estimated. The remaining parameters (24 out of 29) were fixed, using current values from the literature. Sensitivity to different pathogenicity of the P.1 variant was explored by repeating the fit assuming IHR as a free parameter (SA1). The sensitivity to the period analysed was also explored by another fit excluding the health system collapse period (SA2) (see below. Further model details about the model are available in the Supplementary Methods (Section 1 of the SI).

Dataset

We used the Brazilian epidemiological syndrome surveillance system for influenza, SIVEP-Gripe (https://opendatasus.saude.gov.br), to track COVID-19 hospitalized cases. All hospitalized patients with Severe Acute Respiratory Illness (SARI) are reported to SIVEP-Gripe with symptom onset date and SARS-CoV-2 test results. SIVEP-Gripe, due to its universal coverage and mandatory notification of SARI cases, has an homogeneous testing effort to diagnose SARS-CoV-2 infections, and is currently the best source for Brazilian data at the national level. As the data are publicly available by the Brazilian National Health System, no ethical approval was needed to perform the analysis, according the the National Ethical Commission (CONEP) of the National Health Council, Resolution Number 510 of April 7, 2016. Hospitalization data provides the most accurate basis to infer incidence in Manaus, because mild cases are vastly under-reported and testing capacity fluctuates, while mortality data are harder to relate to total number of cases, since the city’s health system endured a prolonged stress even before the collapse, with large variation in the in-hospital fatality rate over time11. Time-series of frequency of sequenced genomes identified as P.1 in Manaus were extracted from published datasets3,12. Because of the uncertainty of the mortality rate, hospitalization data are more reliable to monitor the Brazilian situation. However, hospitalization suffers an important dampening when the health system collapses, presenting a false spreading control precisely because of the overload on the health services, which cannot admit more patients than their capacity. We take this into account making a sensitivity analysis (SA2—see below). Likewise, when the health system is overwhelmed, the IHR can be higher (as it can be only a trait of the P.1 variant), leading to greater mortality rates than the parameters used in this work. For this reason, we also taken the differing IHR into account for P.1 using a sensitivity analysis (SA1—see below). Data and code are available in the following link: https://zenodo.org/record/559460013, and in the Github repository https://github.com/covid19br/model-P1-variant.

Nowcasting

Data for hospitalized COVID-19 cases among residents in Manaus from 01 November 2020 to 31 January 2021 was obtained from SIVEP-Gripe database of 15 February 2021. The hospitalized cases of the last 10 weeks in the time-series were nowcasted14 to correct for notification delay Data used in parameters estimation were collected from the SIVEP-Gripe In this system, reporting of cases can be delayed for several reasons, including the notification system itself and confirmation of RT-PCR test results. The nowcasting procedure estimates, based on the past delay distribution, the number of cases that already occurred but were not yet reported. A window of 10 weeks is the acting window on the series, since delays greater than this are rare.

Nowcasting requires a pair of dates: (i) onset date of the event and (ii) report date of the event. The delay distribution is modeled as being best described as a Poisson distribution for days since the onset date to the report date. We considered the first symptoms date as the onset date. For the report date, we used the latest between the test result date and the clinical classification date. The nowcasting algorithm were developed by ref. 14, and implemented in the NobBS (Nowcasting by Bayesian Smoothing) package in \({\mathsf{R}}\)15.

Model parametrization and initial condition estimation

The model requires appropriate mid-epidemic initial conditions in order to give relevant results. In the model, the number of new hospitalizations at a given time—hnew, is directly proportional to the number of exposed individuals at that time, therefore data was used to get an approximation of the number of exposed people. Also, to quantify the number of people belonging to the recovered class, prevalence was used. In Table 1 we present the parameters considered for the wild-variant are described below. The parameters for the P.1 variant are the same except for those considered in the model fitting and in the Sensitivity Analysis 1 (SA1). See more detailed for the analysis of the initial conditions in the Supplementary Methods (Section 3 of the SI).

Table 1 Epidemiological parameters.

Maximum likelihood estimation

Given the cumulative daily curves of hospitalization for wild-type variant (C1), and P.1 variant, (C2) we can obtain the daily variation of each curve (namely \({{\Delta }}{C}_{1}^{t}\) and \({{\Delta }}{C}_{2}^{t}\)). Those curves are summed up to give the total number of weekly new cases:

$${{\Delta }}{C}^{\tau }=\mathop{\sum }\limits_{i=1}^{7}({{\Delta }}{C}_{1}^{\tau -1+i}+{{\Delta }}{C}_{2}^{\tau -1+i})$$
(1)

where τ is a discrete index given in weeks.

To calculate the frequency of P.1 in a given time period T, we use the proportion of new cases in this period from the wild-type and P.1 variant as follows:

$${P}^{t^{\prime} }=\frac{\mathop{\sum }\limits_{i=1}^{T}{{\Delta }}{C}_{2}^{T-1+i}}{\mathop{\sum }\limits_{i=1}^{T}{{\Delta }}{C}_{1}^{T-1+i}+\mathop{\sum }\limits_{i=1}^{T}{{\Delta }}{C}_{2}^{T-1+i}}$$
(2)

where \(t^{\prime}\) is a discrete index given in T periods. The time period T depends on the dataset of genome sequences (weekly3 and monthly12).

Using the maximum likelihood method, we fitted the model by estimating five parameters, namely, the relative transmissibility (\({{\Delta }}\beta =\frac{{\beta }_{2}}{{\beta }_{1}}\)), the relative force of reinfection of P.1 (pr), initial total prevalence (ρ0 = [R/N]t=0), initial fraction of cases that were caused by the new variant (P0), and intrinsic growth rate of the wild-type variant (r). The initial fraction of P.1 cases (P0) accounts for the uncertainty in the time of emergence of the new variant: the simulation starts at beginning of November, so this initial value is below 1 individual, and only reaches this threshold by mid to late November, depending on the value of P0. The parameter r incorporates effects related to contact rates for the wild-type variant, such as non-pharmacological interventions relaxation, elections, and others; it affects the transmissibilities of both variants (β1 and β2) in the same way, and so is independent of Δβ.

Number of hospitalization cases were assumed to follow a Poisson distribution, with expected value given by Eq. (1). The recorded number of P.1 in genome samples was assumed to follow a binomial distribution with an expected value equal to the product of the total number of genome sequences sampled in each date and the proportion of P.1 cases (Eq. (2)). The log-likelihood function for the model fitting was then:

$${{{{{{{\mathcal{L}}}}}}}}=\mathop{\sum}\limits_{i}{{{{{{\mathrm{log}}}}}}}\,{{{{{{{\rm{Pois}}}}}}}}({x}^{i}| \lambda ={C}^{i})+\mathop{\sum}\limits_{j}{{{{{{\mathrm{log}}}}}}}\,{{{{{{{\rm{Bin}}}}}}}}({y}^{j}| N={n}^{j},\theta ({\pi }^{j})={P}^{j})\,,$$
(3)

where Pois is a Poisson distribution with parameter λ, xi is the number of recorded hospitalizations in week i, Bin is a Binomial distribution with parameters N (total number of trials) and πj (probability of success at each trial), nj is total number of sequences in clinical samples in week or day j, yj is the number of P.1 sequences in each of these samples, and θ(. ) is the logit function.

The model was then fitted by finding the values of the five above mentioned parameters that minimize the negative of the log-likelihood function (Eq. (3)), using the function mle2, from the R package bbmle16. To find starting values for the optimization performed by mle2 we calculated the log-likelihood function for one million combinations of parameters values in a regular reticulate within reasonable ranges. The 100 sets of parameters that were local minima (that is, with highest log-likelihood values) were used as starting values for the computational minimization.

The confidence intervals for the expected number of cases and frequency were estimated from 20,000 parametric bootstrap samples assuming that the estimated parameters follow a multivariate normal distribution. The parameters of these multivariate distributions were the estimated values and estimated variance-covariance matrix of the parameters. We then calculate the 2.5% and 97.5% quantiles of each parameter to obtain the confidence intervals of our estimates. We check the reliability of these estimates by verifying that the log-likelihood profiles satisfy required conditions for identifiability, as detailed in the Supplementary Methods (Section 4 of the SI and Fig. S1).

Sensitivity analysis

The model fitting assumed a constant infection hospitalization rate (IHR, parameter σ) for each age group over time for both variants. Recent evidence suggests that prior SARS-CoV-2 infection protects most individuals against reinfection17, so reinfections might have lower IHR. Because the pathogenicity of the P.1 variant is unknown, the model fitting was repeated assuming that the odds ratio of the IHR in each age class for P.1 infections compared to wild-type variant infections (SA1) is a free parameter. Moreover, as the collapse of Manaus health system hindered hospitalizations of new severe cases and may have affected case recording in surveillance databases, the model fitting was repeated considering only the period prior to the collapse (10 January 2021) (SA2). Sensitivity analysis, latin hypercube explorations and likelihood profiles characterization are important methods which can be applied in this kind of model, where fitting depends on a set parameters, which were obtained from the literature and one has no further information about confidence intervals.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Results

We present the fitted parameters on Table 2 and Fig. 2. The estimated transmissibility of P.1 was 2.6 (95% Confidence Interval (CI): 2.4–2.8) times higher compared to the wild-type variant, while the relative force of reinfection of the new variant was estimated to be 0.032 (CI: 0.026–0.040, Table 2). The fitted model also estimated that, at the time P.1 variant emerged, the prevalence of previous infection by the wild-type variant was 78% (CI: 73–83%), and that the number of cases by the wild-type variant were increasing with an estimated daily intrinsic growth rate of 0.029 days−1 (CI: 0.024–0.035 days−1). Given these parameter values, reinfections by P.1 accounted for 28% of the cases in Manaus from November 2020 through January 2021.

Table 2 Summary of the fitted parameters and respective confidence intervals considering the entire period, November 1, 2020 to January 31, 2021 maintaining the same pathogenicity of the previous variant.
Fig. 2: Hospitalization cases and frequency of the P.1 variant in Manaus city.
figure 2

a Weekly new hospitalized COVID-19 cases in Manaus city. Grey line represents the fitted values of total cases (all variants) by maximum likelihood estimation (MLE) of the parameters. Red and blue lines represent the predicted values of cases due to P.1 and wild-type variants, respectively. Black dots are nowcasted observed data of hospitalizations. Panels b, c show the fittings to the time-series frequency of P.1 on datasets provided by previous works, which were aggregated monthly13 and weekly3, respectively. The area around the lines indicate the 95% CI of the expected values. Dots and lines are the sample proportions of P.1 in sequenced genomes, and their 95% sample CI. The fitted values of the model parameters are presented in Table 2.

We also evaluated the impact of a distinct pathogenicity of the P.1 variant on our estimates by allowing the infection hospitalization rate (IHR) of the new variant to be estimated as a free parameter (see SA1 in Table 2). The relative transmissibility and prevalence did not differ statistically from the the previous estimates and thus are robust to relaxing the assumption about equal pathogenicity for both variants. Oddly enough, the data gives no support for a higher IHR of P.1. Moreover, the model fit to hospitalization data prior to the healthcare system collapse in the city of Manaus (11 January, 2021) estimated an even larger transmissibility (SA2 in Table 2). We present additional results in the Supplementary Methods (Section 5 of the SI and Fig. S2), with plots of the number of individuals over time in each compartment (shown in Fig. 1) for both variants and the cumulative curves for hospitalized individuals using the main fitting parameters.

Discussion

COVID-19 hospitalizations and frequency of the P.1 variant in clinical samples showed a sharp increase in Manaus, Brazil, starting December 2020. The fitted model describes this joint increase as the result of the emergence of P.1, estimated to be 2.6 times more transmissible than the wild-type variant. The spread of P.1 occurred despite a high estimated prevalence of infection by the wild-type virus both estimated (this present work) and empirically found7,8. The pathogenicity of P.1 is still unknown, but assuming hospitalization rates as a proxy for pathogenicity, the high P.1 transmissibility holds for different ranges of pathogenicity. Two recent studies analysed genomic data of SARS-CoV-2 from Manaus evaluating the transmissibility of the new variant3,4. Faria and collaborators integrated mortality and genomic data and, using a semi-mechanistic Bayesian model, estimated a transmissibility 1.4–2.2 times higher and 25–61% evasion of protective immunity related to the P.1 variant3. Naveca and collaborators estimated a 2.2 times higher effective reproduction number for the P.1 variant using phylogenetic methods, and suggested that P.1 is at least two times more transmissible than the parental lineage, assuming reinfections are rare4. The present work follows a different approach that can be defined as an epidemiological, model-based, and data-fitting approach, suitable for scenarios where only surveillance data are available, and applicable to other emerging variants throughout the world. Our epidemiological compartment model can provide point estimates of two key epidemiological parameters as soon as data on number of cases and frequency of a new variant are available. This is valuable for practical purposes, as estimates can be available timely for interventions and alerts. Also, we think we contribute for the field of mathematical models in epidemiology, showing a simple instance to link such class of models to data through maximum likelihood methods. Notably, all three different approaches estimated very high transmissibility of the P.1 variant.

Many knowledge gaps about the pandemic in the Amazonian region still remain. Population-based serological surveys are not available and thus prevalence was included in the fitted parameters. The analysed data overlapped with the period of the health system collapse. Aware that in-hospital fatality rates can quickly change when the health system is under stress11, we have chosen hospitalization data instead of mortality data (see subsection Dataset in Methods). Still, during the health system collapse many severe cases probably were not recorded in the system and remained unaccounted for. Our results were robust to removing this period in the sensitivity analysis (SA2). Even without P.1 emergence, the model estimates an increase in the number of cases (intrinsic growth rate parameter, see Table 2), which could be a consequence of loosening non-pharmacological interventions (NPIs)9, an effect of waning immunity, or both. Our model does not consider these effects explicitly, but by fitting the initial growth rate we indirectly account for their effects on the dynamics and on the estimation of the remaining parameters.

The impacts of a highly transmissible variant have already been highlighted by the spread of VOC B.1.1.7 in the UK, USA and Europe18. The variant B.1.1.7 has an upper-bound estimate for transmissibility of 2.35, which is smaller than our lower bound estimate for P.1. Although P.1 is highly transmissible according to our model fitting, it has rarely been found at high frequencies in other countries outside of Latin America (but see Canada cases19 and CDC analysis showing P.1 variant grows in frequency in the United States of Amerca until June, 202120). We believe that some factor may have contributed to the unsuccessfulness of P.1 variant across the world. First, we believe the warnings given by the Brazilian and Japanese authorities/researchers in January may have contained the spread of P.1, by travel ban increment to the restrictions already in place since 2019 for people who departed from Brazil or have stopped over in in the country before the P.1 emergence due to the uncontrolled epidemic situation in the country. Another important geographic aspect is that the main international airports linking Brazil to other countries are in the Southeast region (São Paulo and Rio de Janeiro), which reached high frequency of P.1 variant only after many travel bans were already in effect. Second, it should be expected that a variant of concern causing greater impact where it emerged and was naturally selected. Virus and the immune system are under a coevolutionary process which is usually composed by a geographic mosaic of coldspots and hotspots21,22. The previous variants circulating in Brazil, the environment and genetic conditions where P.1 emerged determines its success in Latin America, making this place a hotspot of the coevolution between the coronavirus and local populations. Such specific conditions might not be found in other places. Finally, it is also important to notice, for instance, that variants emerged in other continents (such as B.1.1.7 and B.1.351) have not caused the same damage in Brazil and other Latin-american countries as it caused in their place of emergence.

Notwithstanding the local emergence and likely greater concern for Latin America, P.1 has been raised to a Variant of Concern by experts of WHO and it remains in their official list up to today (together with B.1.1.7, B.1.351, and B.1.617) because of its potential related to the damage already caused and to the mutations it carries. Higher transmissibility of the P.1 variant raises strong concerns of swift upsurges in the number of cases once P.1 reaches community transmission. Although our estimate for the relative force of reinfection by the P.1 variant seems low, the impact is strong enough to drive, together with a high transmissibility, a large surge even in a population heavily affected by the wild-type variant. For instance, in Manaus, 28% of the new cases in the period considered were due to reinfections by P.1 in our estimations, which is in line with estimates from blood donors samples in Manaus city23. Relaxing the assumptiom of equal IHR for both variants highlighted that the IHR for P.1 may be lower, and then up to 40% of the new cases would be of reinfections (SA1). This lower value of IHR could be explained by the fact that more asymptomatic cases happen amongst reinfections compared to primary cases, as shown in the SIREN study24, and since our model does not consider differing proportions of asymptomatic cases between primary and reinfection cases, this led to overall reduction in IHR to the P.1 variant. However, in a scenario of low prevalence rate of infection by the wild-type variant, the high transmissibility is the most determinant parameter of the rapid increase in the number of cases and can lead to even sharper outbreaks. The P.1 variant has already been detected in at least 36 countries, with local transmission currently confirmed in five of them18. This points to the urgency of reinforcing measures to avoid a global spread of P.1, which include an agile global genomic surveillance network.

Further, to improve our ability to deal with the threat of P.1, it is urgent to study (i) the pathogenicity of the P.1 variant, since this trait, in association with high transmissibility, can drive even well-prepared health systems to collapse; (ii) the efficacy of current vaccines for P.1 variant infections; and iii) the main factors promoting the emergence of VOCs, specially the roles of previous high prevalence and of waning immunity.