Changes in notifiable infectious disease incidence in China during the COVID-19 pandemic

Nationwide nonpharmaceutical interventions (NPIs) have been effective at mitigating the spread of the novel coronavirus disease (COVID-19), but their broad impact on other diseases remains under-investigated. Here we report an ecological analysis comparing the incidence of 31 major notifiable infectious diseases in China in 2020 to the average level during 2014-2019, controlling for temporal phases defined by NPI intensity levels. Respiratory diseases and gastrointestinal or enteroviral diseases declined more than sexually transmitted or bloodborne diseases and vector-borne or zoonotic diseases. Early pandemic phases with more stringent NPIs were associated with greater reductions in disease incidence. Non-respiratory diseases, such as hand, foot and mouth disease, rebounded substantially towards the end of the year 2020 as the NPIs were relaxed. Statistical modeling analyses confirm that strong NPIs were associated with a broad mitigation effect on communicable diseases, but resurgence of non-respiratory diseases should be expected when the NPIs, especially restrictions of human movement and gathering, become less stringent.


Generalized Linear Models
In the modeling analysis, we assess the association of NPIs with disease incidences while accounting for changes in healthcare seeking behavior and autocorrelation among the case numbers. We use the monthly volume of outpatient visits as a surrogate for healthcare seeking behavior. Outpatient data of Hubei and Tibet are not available and therefore the two provinces are excluded from the GLM analysis. As outpatient data are available only at the monthly scale, the outcome of the GLM is the monthly case numbers, and we redefine the intervention phases as Phase I (Jan), Phase II (Feb and Mar), Phase III (Apr−Aug) and Phase IV (Sep−Dec), which is largely consistent with the previous definition. The modeling steps are conducted as follows for each specific notifiable infectious disease in South China and North China.
(1). We first use X13ARIMA-SEATS, a time series decomposition approach, to decompose the time series of monthly reported case numbers into a seasonal component and a nonseasonal component, and we refer to the latter as seasonalityremoved numbers of reported cases. 2 Briefly, SEATS (signal extraction in ARIMA time series) is a seasonal adjustment tool based on ARIMA model decomposition of time series data. 3 SEATS decomposes the linearized time series into trend, seasonal, transitory and irregular components, provides forecasts for these components together with associated standard errors, and assign deterministic effects to each component. A fundamental assumption made by SEATS is that the linearized time series (which is the log of monthly case number in our analysis) follows the ARIMA model is the time series, is the regression part with covariates , is the white noise with mean 0 and variance , and and : the non-seasonal and seasonal backshift operators, i.e., ( ) = , ( ) = ; ( ) = 1 − − ⋯ − : a non-seasonal autoregressive (AR) operator of order ; Φ( ) = 1 − − ⋯ − : a seasonal autoregressive operator of order ; (1 − ) and (1 − ) : non-seasonal and seasonal differencing operators of orders and , respectively; ( ) = 1 − − ⋯ − : a non-seasonal moving average (MA) of order ; Θ( ) = 1 − Θ − ⋯ − Θ : a seasonal moving average of order ; We perform the decomposition using the SEATS implemented in the R package "seasonal" (version 1.8.3). 4 For the regression part, we only consider the holiday effect of Chinese New Year generated by the "genhol" function. One nice feature of SEATS is that it automatically detects shifts in the mean level of the time series, suggesting that it can partially account for the impact of NPIs during 2020 when estimating seasonality. We use the default setting for the selection of the orders of all operators using the Bayesian Information Criterion (BIC) provided by "seasonal". Inference of model parameters is based on a combination of the iterative least square (IRLS) step for the regression parameters and maximum likelihood for other parameters, where the maximum likelihood estimation is based on the standard approach by Box and Jenkins (1976). 5 Two outputs are obtained from SEATS, the seasonality-removed monthly case numbers and the seasonal trend itself. The seasonal trend is then extrapolated to 2020 to obtain seasonality-removed case numbers during the pandemic.
(2). A generalized linear model is fitted to the seasonality-removed monthly case numbers, which we assume follow a quasi-Poisson distribution to account for overdispersion. The following covariates are adjusted in the model: (i) Three binary indicators for COVID-19 NPI phases during 2020: phase II for Feb−Mar 2020, Phase III for Apr−Aug 2020, and phase IV for Sep−Dec 2020. Other months from Jan 2014 to Jan 2020 serve as the reference when no NPIs was implemented.
(ii) Monthly volume of outpatient visits. Outpatient visit data in December of every year were not available and we imputed these numbers with the average between the adjacent November and January. (iii) Long-term effect as a function of year, which can be none, linear or quadratic selected by the quasi-Akaike information criterion (QAIC). The functional form with the smallest QAIC was selected (Supplementary Table 4). (iv) The product of the number of days in each month and the population size was used as the offset. Residuals of the fitted model were calculated. The GLM and residuals obtained are referred to as the stage-1 GLM and stage-1 residuals, respectively.
(3). Another GLM is fitted to the seasonality-removed case numbers, which is the same as the GLM above except that stage-1 residuals were included as a covariate, with a one-month lag, to account for autocorrelation in the outcome. 6 Residuals from this model are calculated and auto-correlation function is plotted to assess whether autocorrelation has been adequately removed (see supplementary Figs. 23−26). The GLM and residuals obtained are referred to as the stage-2 GLM and stage-2 residuals, respectively. A 95% confidence band for model-fitted cases counts is constructed by bootstrapping model coefficients from the asymptotic distribution (multivariate normal) of the coefficient estimates. (4). Finally, the seasonal component estimated by X13ARIMA-SEATS is multiplied (the seasonal trend is estimated at the log scale in step 1) back to the fitted stage-2 GLM to project counterfactual numbers of reported cases during 2020 had the COVID-19 NPIs been absent. As healthcare seeking behavior was also altered by the pandemic, we assume the outpatient visit data in 2020 were the same as in 2019 for the projection of 2020. The stage-1 residuals are also dropped from the covariates of the stage-2 GLM for projection. To construct s 95% confidence band for the projected trajectory, we sample both coefficient from their asymptotic distribution and residuals from the stage-1 residuals associated with months during 2014 to 2019. The residuals are sampled in blocks of 10 months to preserve the autocorrelation structure. Sampled residuals are then added to the model-projected values to form the confidence band.

Sensitivity Analysis
As an alternative adjustment for seasonal trends, we skip the SEATS decomposition step but incorporate harmonic functions into stage-1 GLM and fit the model to the original data to obtain seasonality-adjusted stage-1 residuals. We consider harmonic functions with annual or semiannual seasonal cycles, and the exact number of harmonics is chosen by QAIC for each pathogen and region. Then, a stage-2 GLM is fitted to the original case numbers, which is the same as the stage-1 GLM except that stage-1 residuals are included as a covariate, with a one-month lag, to account for autocorrelation.
We show the sensitivity results in Supplementary Figs. 18−21 and Supplementary Table 2, which are qualitatively similar to the primary analysis. Interestingly, SEATS allows slight variation of seasonality from year to year, in comparison to invariant seasonality enforced by harmonic functions, e.g., as seen by comparing the model-projected mumps epidemic curves during 2020 between Figure 2 (SEATS+GLM) and Supplementary Figure 18 (GLM with harmonic functions).
We also considered a single GLM including not only long-term trend and harmonic functions but also an autoregressive term of the case number from the previous month, log( ). 7 However, this model is unlikely to yield reliable results (not shown) mostly due to the high correlation between the autoregressive term and the temporal trend combining the long-term trend and harmonic functions (Supplementary Table 5).
In addition, we performed a sensitivity analysis using a single-stage GLM that only compares incidence in the four periods between 2014−2019 and 2020, without adjusting for monthly healthcare visits, yearly long-term trend, seasonality or autocorrelation (Supplementary Table 6). This unadjusted model detected fewer differences in respiratory diseases and Gastrointestinal or enteroviral disease than the adjusted model in Table 2 of the main text. represents one year starting from 2014 (inner) to 2020 (outer). Each sector represents one infectious disease. Diseases were grouped into four categories (C1−C4): respiratory diseases, gastrointestinal or enteroviral diseases, sexually transmitted or bloodborne diseases, and vector-borne or zoonotic diseases. The colors of the largest outer ring indicate the average annual incidence of each category over 2014−2020.  The black lines indicate the 95% CIs of the percent changes. The percent change was calculated by:

Supplementary
where ( ) indicates the average incidence during phase k over 2014 to 2019 and ( ) indicates the incidence specific to phase k of 2020 as shown in Table 1 in the main text. The black lines indicate the error bars. The error bars indicating 95% confidence intervals for the percent changes were calculated by the delta method.  Table 1 in the main text. The black lines indicate the error bars. The error bars indicating 95% confidence intervals for the percent changes were calculated by the delta method.  Table 1 in the main text. The black lines indicate the error bars. The error bars indicating 95% confidence intervals for the percent changes were calculated by the delta method.  Table 1 in the main text. The black lines indicate the error bars. The error bars indicating 95% confidence intervals for the percent changes were calculated by the delta method.  Table 2. Sensitivity analysis for model-estimated incidence rate ratio (IRR) of 26 selected infectious diseases. Generalized linear models (GLM) were used for estimating the IRRs measuring the association of NPIs with disease trends. The models were adjusted for monthly healthcare visits, seasonality and yearly long-term trend, and Seasonality was adjusted for by harmonic functions in the GLM instead of SEATS. IRR bellow 1 with P<0.05 indicates significant decline of incidence rate in year 2020 compared to year 2014-2019. All p-values are two-sided and not adjusted for multiple comparisons. * Data of Hubei and Tibet (both in South China) were not included in modelling due to unavailable healthcare visits data during the COVID-19 pandemic. † HFMD, hand, foot and mouth disease; AHC, acute hemorrhagic conjunctivitis; HFRS, hemorrhagic fever with renal syndrome. ‡ Phase II: Feb 2020 and Mar 2020; Phase III: Apr 2020 to Aug 2020; Phase IV: Sep 2020 to Dec 2020. The reference period is Jan 2014 to Jan 2020.

Supplementary
-: GLM was not fitted for disease with median monthly case number is less than 50. Statistically significant reductions (IRR<1) are displayed in bolded font.