Prediction of daily and cumulative cases for COVID-19 infection based on reproductive number (R0) in Karnataka: a data-driven analytics

This study aimed to estimate the reproductive number (R0) of the coronavirus in the present scenario and to predict the daily incidence and probable cumulative cases by 20 August 2020 for Karnataka state in India. The model used a serial interval with a gamma distribution and applied the ‘early R’ package to estimate R0 and the ‘projections’ package in R to simulate the probable cumulative epidemic trajectories and predict future daily incidence. The projections were obtained by fitting the existing daily incidence and the estimated R0 to a model based on the assumption that daily incidence follows a Poisson distribution. The maximum-likelihood (ML) value of R0 was 2.242 for the COVID-19 outbreak, as of June 2020. The median R0 with 95% CI, estimated by the bootstrap resampling method, was 2.242 (1.50–3.00). The expected number of new cases for the next 60 days would progressively increase, and the estimated cumulative cases would reach 27,238 (26,008–28,467) at the end of the 60th day. However, if the R0 value doubled, the estimated cumulative cases would increase up to 432,411 (400,929–463,893), and if R0 increased by 50%, the cases would increase up to 86,386 (80,910–91,861). The probable outbreak size and future daily cumulative incidence are largely dependent on the change in R0 values. Hence, it is vital to expedite hospital provisions, medical facility enhancement work, and the number of random tests for COVID-19 at a very rapid pace to prepare the state for exponential growth in the next 2 months.

Coronavirus disease (COVID-19), caused by a novel virus that originated in Wuhan, a city in the Hubei Province of China, at the end of 2019, has progressed rapidly to become a global epidemic. In February 2020, the World Health Organization (WHO) designated the disease as COVID-19 and subsequently declared it a global pandemic, as the disease has spread to nearly all the continents and the cases are rising at an exponential rate 1.
The present study aims to predict the spreading efficiency of COVID-19 in Karnataka, one of the 28 states of India. The state has a total population of 6.41 crore, which is ~4.7% of the overall population of India. The first case of COVID-19 in India was reported on January 30, 2020 in a couple who had a travel history to Dubai 2. In Karnataka, the first case was detected on March 09, 2020 3. On March 12, 2020 the World Health Organization (WHO) announced COVID-19 as a global pandemic to emphasize the rapid spread of the disease to multiple countries and continents 4.
As on June 23, 2020, the state's case fatality rate was 1.01%, which was significantly lower than the global average and that of other Indian states with a moderate number of cases 5. Karnataka can be considered a moderately affected state, with 9399 confirmed cases, 5730 recovered and 142 deaths 6. More than two-thirds of the cases in the state have emerged from migrant travelers from other states, mainly Maharashtra, Tamil Nadu, Delhi and Gujarat.
www.nature.com/scientificreports/
Restricted space and high population density of the country are the key factors influencing the transmissibility of COVID-19. Forecasting the probable number of cases is essential to create awareness and arrange effective disease control measures 7. There is a major threat associated with an increase in disease spread, as most of the population lives below the poverty line and the country does not have medical resources proportional to its population. The options to manage the disease are acquiring herd immunity and implementing lockdown or restricting population movement 8. In the present study, we have calculated the reproductive number (R0) of COVID-19 at an early stage of the viral disease outbreak, to predict the daily incidence and cumulative cases for the next sixty days (till August 20, 2020).

Methodology
Focus. A confirmed case of COVID-19 infection is defined as a patient with a history of acute respiratory illness whose collected specimens gave a positive result for viral infection. A suspected case is defined as a patient with symptoms of COVID-19 infection, but not confirmed by viral nucleic acid testing. The serial interval was taken as the time from onset of illness in a primary case (infector) to illness onset in a secondary case (infected) in a transmission chain 9. The serial interval can only be estimated by linking dates of onset for infector-infected data pairs, and these links are difficult to establish. R0 is defined as the expected number of secondary cases that one primary case will generate in a susceptible population 10.
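The definition of the serial interval above can be illustrated with a short Python sketch. It assumes a handful of infector-infected onset-date pairs; the dates and the names `pairs` and `serial_intervals` are hypothetical and are not taken from the study data.

```python
from datetime import date
from statistics import mean, stdev

# Hypothetical (infector onset, infected onset) pairs, for illustration only.
pairs = [
    (date(2020, 4, 14), date(2020, 4, 19)),
    (date(2020, 4, 15), date(2020, 4, 18)),
    (date(2020, 4, 20), date(2020, 4, 27)),
    (date(2020, 4, 22), date(2020, 4, 26)),
]


def serial_intervals(onset_pairs):
    """Days from illness onset in the infector to illness onset in the infected case."""
    return [(infected - infector).days for infector, infected in onset_pairs]


intervals = serial_intervals(pairs)
si_mean = mean(intervals)   # mean serial interval in days
si_sd = stdev(intervals)    # sample standard deviation in days
print(intervals, si_mean, round(si_sd, 2))
```

A mean and standard deviation obtained this way are the two quantities needed to parameterise a serial interval distribution; with few linked pairs, as the text notes, such estimates are unreliable.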
Data source. All the data were derived from the cloud sourced database published on the official website of the Ministry of Health and Family Welfare of India 11. The data used for model development covered April 14, 2020 to June 21, 2020. The initial data were not considered, as the serial interval did not reflect the average behavior needed to model the epidemic curve effectively, and the number of effective cases was very low because of the strict lockdown imposed in the state.

Model development and statistical analysis.
To estimate the contiguous transmissibility of COVID-19 in the state, the study employed the 'early R' statistical package to evaluate R0 in the early stage of the disease outbreak 12. R refers to the average number of secondary cases infected by one primary infected person during the infectious period. If R > 1, the number of cases increases; if R < 1, the number of infected cases reduces, and the disease will die out. R = 1 suggests that the infectious disease has become endemic within the community.
R is calculated from the probability of the spread of infection upon contact with an infected person, the number of contacts, and the duration over which the infected person can spread the infection. The mathematical model used to estimate R for COVID-19 in this study is represented below:

$$R = \frac{\beta}{\alpha}$$

Here, β refers to the probability of infection transmitted (transmission rate) multiplied by the contact levels and 1/α is the duration of infection transmitted.

Serial interval (SI) distribution data are needed to estimate R0, but there was inadequate information about the total number of cases to estimate the serial interval. The value of the serial interval (mean and standard deviation) was therefore fixed with a gamma distribution, as described earlier 13. The 'get_R' function was used to estimate the distribution of R0 with maximum-likelihood (ML). A bootstrap strategy with 500 resamples was adopted to get a larger set of likely R0 values. These R0 values were subsequently presented as a histogram, and the cumulative cases and interquartile range were calculated for these values.

The 'projections' package in R was used to simulate the probable epidemic trajectories and to predict future daily incidence and cumulative cases 14. The simulation and prediction were generated from the existing daily incidence, a serial interval distribution, and the estimated R0, using a model based on the assumption that daily incidence follows a Poisson distribution determined by daily infectiousness, which is denoted as

$$\lambda_t = \sum_{s=1}^{t-1} y_s \, w(t-s)$$

where w(t - s) is the vector of the probability mass function (PMF) of the serial interval distribution and y_s is the real-time incidence at time s 14,15. The study also computed the prediction of daily incidence and cumulative cases for the next 60 days using a bootstrap resampling method (500 times). All statistical analyses and the disease forecast model were developed using R version 3.6.3.
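The maximum-likelihood step can be sketched in Python as follows. This is a minimal illustration of the idea (a gamma-shaped serial interval, daily infectiousness as past incidence weighted by the serial interval PMF, and a Poisson likelihood evaluated over a grid of R values), not the code of the 'early R' package. The incidence series, the serial interval mean and standard deviation, and all function names are assumptions made for the example; the gamma density is simply evaluated at whole days and renormalised.

```python
import math


def gamma_si_pmf(mean, sd, max_days=30):
    """Approximate daily serial-interval PMF from a gamma density.

    Simplification: the density is evaluated at days 1..max_days and renormalised.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    density = []
    for day in range(1, max_days + 1):
        log_pdf = ((shape - 1) * math.log(day) - day / scale
                   - math.lgamma(shape) - shape * math.log(scale))
        density.append(math.exp(log_pdf))
    total = sum(density)
    return [d / total for d in density]  # element k is the weight for a lag of k + 1 days


def daily_infectiousness(incidence, si_pmf):
    """Past incidence weighted by the serial-interval PMF, for each day."""
    result = []
    for t in range(len(incidence)):
        total = 0.0
        for s in range(t):
            lag = t - s
            if lag <= len(si_pmf):
                total += incidence[s] * si_pmf[lag - 1]
        result.append(total)
    return result


def poisson_log_likelihood(r, incidence, infectiousness):
    """Log-likelihood of R when day-t incidence ~ Poisson(R * infectiousness[t])."""
    ll = 0.0
    for cases, lam in zip(incidence, infectiousness):
        if lam > 0:  # the first day has no history, so it is skipped
            mu = r * lam
            ll += cases * math.log(mu) - mu - math.lgamma(cases + 1)
    return ll


# Hypothetical inputs, for illustration only (not the Karnataka data).
toy_incidence = [2, 3, 3, 5, 6, 8, 9, 12, 15, 18, 22, 27]
si_pmf = gamma_si_pmf(mean=5.0, sd=3.0)  # assumed serial interval parameters
infectiousness = daily_infectiousness(toy_incidence, si_pmf)

r_grid = [i / 100 for i in range(1, 1001)]  # candidate R values 0.01 .. 10.00
r_ml = max(r_grid, key=lambda r: poisson_log_likelihood(r, toy_incidence, infectiousness))

# Closed-form check: this likelihood is maximised at total cases / total infectiousness.
r_closed = sum(toy_incidence[1:]) / sum(infectiousness[1:])
print(r_ml, round(r_closed, 3))
```

The grid search mirrors the idea of scanning candidate R values for the most likely one; the closed-form ratio is included only as a consistency check on the sketch.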

Results
The COVID-19 serial interval distribution is shown in Fig. 1A. Using the serial interval distribution, as described above, the maximum likelihood estimate (MLE) of R0 was 2.242 for the COVID-19 outbreak at the present stage in Karnataka (Fig. 1B). A bootstrap strategy was adopted to obtain 500 likely R0 values. The distribution of these R0 values is presented as a histogram (Fig. 1C). The estimated median with 95% confidence interval (CI) of R0 values was 2.242 (1.50-3.00), as on June 21, 2020.
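How a set of 500 likely R0 values can be turned into a median and a 95% interval is sketched below in Python. The sketch assumes the values are drawn in proportion to a Poisson likelihood over a grid of candidate R values; whether this matches the resampling actually used in the study is an assumption, and the totals `total_cases` and `total_infectiousness` are hypothetical numbers, not study results.

```python
import math
import random
from statistics import median, quantiles


def likelihood_weights(r_grid, total_cases, total_infectiousness):
    """Relative Poisson likelihood of each candidate R (up to a constant factor)."""
    log_lik = [total_cases * math.log(r) - r * total_infectiousness for r in r_grid]
    peak = max(log_lik)
    return [math.exp(v - peak) for v in log_lik]  # subtracting the peak avoids overflow


def summarise(samples):
    """Median and central 95% interval (2.5th and 97.5th percentiles)."""
    cuts = quantiles(samples, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return median(samples), cuts[0], cuts[-1]


# Hypothetical totals, for illustration only.
total_cases = 128
total_infectiousness = 40.0

r_grid = [i / 100 for i in range(1, 1001)]  # candidate R values 0.01 .. 10.00
weights = likelihood_weights(r_grid, total_cases, total_infectiousness)

rng = random.Random(2020)  # fixed seed so the sketch is repeatable
r_samples = rng.choices(r_grid, weights=weights, k=500)
r_median, r_low, r_high = summarise(r_samples)
print(r_median, r_low, r_high)
```

The same `summarise` step applies to any list of sampled R values, whichever sampling scheme produced them.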
The probable number of daily new cases for the next 60 days was calculated based on existing data (Fig. 2). The daily number, at seven days' interval (between June 22, 2020 and August 20, 2020) with 95% confidence interval (CI) of probable new cases at actual R0 (2.242) will be 255 (

Discussion
Using the current data and the proposed pandemic model, the current study provides an estimate of R0 for COVID-19 during the present stage of disease spread in Karnataka. The estimated R0 is about 2.242 (95% CI 1.50-3.00), which is similar to a set of previously published estimates, ranging from 2.286 (95% CI 1.4-3.9) to 3.58 (95% CI 2.89-4.39). It is comparable to the reported estimate from a recent study with a larger sample size, which suggested an R0 of 3.77 (95% CI 3.51-4.05) 4,16,17. The wide ranges and the differences in R0 values reported by different studies indicate that exact estimation of R0 is quite challenging, because it is difficult to calculate the exact total number of infected cases during an epidemic. The R0 value is usually affected by a set of factors such as analysis, environmental circumstances, modeling procedures and statistical caliber 18. In the present study, the accuracy of the estimated R0 is mainly dependent on the identification of all the infected cases in Karnataka. According to the report from the Ministry of Health and Family Welfare of India, all the suspected cases and those who had close contact with confirmed cases were tested for viral infection. Therefore, the percentage of unidentified cases is thought to be very low. In contrast, previous studies were mainly focused on the estimation of R0 in Wuhan, China 4,16. Moreover, in contrast to such studies, transmission from animals to individuals, as described for SARS-CoV, was absent among the much larger population of India 4,9. Therefore, the R0 estimated in the current study depicts only the human-to-human transmissibility of COVID-19 and not animal-to-human transmissibility. In Karnataka, community level transmission has already begun and cases could be rising in the next 2 months 19.
The current results suggest that R0 is low owing to the enforcement of strict quarantine and lockdown measures in Karnataka. The current estimations also denote the silent spread of the virus at an exponential rate. Most of the patients who are testing positive are asymptomatic. Such cases within the incubation period may also cause continuous spread of the novel virus, and this may partially explain why a low R0 was noted at the current stage 19.
The current study has also estimated the daily incidence, cumulative cases, and the probable size of the outbreak for the next sixty days. According to the analysis, the daily incidence and the magnitude of the outbreak are largely dependent on the value of R0. If the R0 value (2.242) is presumed to remain unaffected, the probable cumulative number of infected cases may reach 27,238 (26,008-28,467) at the 60th day, suggesting that more of the population would be infected in future. Karnataka has adopted strict measures to control the spread of infection by enforcing strict lockdown and quarantining the suspected cases. As a result, the transmissibility is expected to reduce in the coming days. However, relaxation of the lockdown, improper social distancing, and population mixing would restore the natural process of disease transmission, and infected cases might increase in future owing to a change in the R0 value. If the R0 (2.802) value increased by 50%, the infected cases will increase up to 86,386 (80,910-91,861), and if R0 (3.923) increased by 75%, the infected cases will rise up to 184,167 (172,572-195,762). Scenarios such as the opening of schools and colleges, migration of travelers from highly infected states, unhygienic conditions at markets and crowded places, an increase in religious congregations, and the opening of overseas travel may contribute to a rapid increase in the number of active cases. This may double R0 (4.484) relative to the actual R0 (2.242), and the probable cumulative infected cases will reach up to 432,411 (400,929-463,893) at the end of the 60th day (20 August, 2020). The present data-driven analytics is mainly aimed at predicting the probable epidemic size. It also highlights the importance of controlling the transmissibility among the population and of preparing the state by arranging all medical facilities and resources to manage the estimated exponential increase.
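The dependence of the projected outbreak size on R0 can be illustrated with a small Python sketch. It is a simplification under stated assumptions: instead of drawing Poisson samples as in the study's simulations, it tracks only the expected daily incidence, where each new day equals R0 times the past incidence weighted by the serial interval PMF. The incidence series `toy_incidence` and the weights `toy_si_pmf` are hypothetical; only the four R0 values are those quoted in the text, so the printed totals are not forecasts and will not match the reported figures.

```python
def project_expected(incidence, si_pmf, r, n_days):
    """Expected daily incidence for n_days ahead under a fixed R.

    si_pmf[k] is the serial-interval weight for a lag of k + 1 days.
    """
    series = [float(v) for v in incidence]
    for _ in range(n_days):
        t = len(series)
        infectiousness = sum(
            series[t - lag] * weight
            for lag, weight in enumerate(si_pmf, start=1)
            if t - lag >= 0
        )
        series.append(r * infectiousness)
    return series[len(incidence):]  # keep only the projected days


# Hypothetical inputs, for illustration only (not the Karnataka data).
toy_incidence = [2, 3, 3, 5, 6, 8, 9, 12, 15, 18, 22, 27]
toy_si_pmf = [0.10, 0.20, 0.30, 0.20, 0.10, 0.10]

r_values = [2.242, 2.802, 3.923, 4.484]  # R0 values quoted in the text
cumulative = {}
for r in r_values:
    projected = project_expected(toy_incidence, toy_si_pmf, r, n_days=60)
    cumulative[r] = sum(projected)
    print(f"R0 = {r}: expected cumulative new cases over 60 days = {cumulative[r]:.0f}")
```

Because each projected day feeds into the following ones, even a modest rise in R0 compounds over 60 days, which is why the cumulative totals diverge so sharply between scenarios.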
R0 is not an intrinsic characteristic value of the SARS-CoV pathogen, but it describes the transmissibility of that pathogen within a specific population and setting. Hence, R0 mainly depends on socio-demographic variables and the biology of the infectious agent. The serial interval indicates that COVID-19 infection leads to rapid cycles of transmission from one generation of cases to the next. The difference between these distributions suggests that using serial interval estimates from SARS data will result in overestimation of the COVID-19 basic reproduction number, and correct ascertainment of the dates of illness onset is critical to calculate the serial interval 20,21.
The current trend shows that there will be a geometric progression in the upcoming days due to relaxation of lockdown and the negligent attitude of citizens regarding the spread of infection, such as not using face masks, improper social distancing, mixing of more population, going to unhygienic market places, and not following the strict advice given by public health officials to consult doctors. Although the patient recovery rate is rising, the current trends indicate an unprecedented increase in the number of daily new cases and the death rate due to COVID-19 infection. Even though stringent control measures are being implemented by the Government, there is an increased chance of following the predicted geometric progression pattern, as the new cases identified in the state are mainly in healthcare workers, police, BMTC, KSRTC and railway staff, auto-rickshaw drivers, footpath vendors, delivery boys and salesmen. Such individuals may serve as super spreaders, and it is essential to identify such hospital-based outbreaks and community-based clusters through continuous testing. Hospital provisions (in both public and private sectors), medical facility enhancement work and the number of random tests for COVID-19 infection should be continued at a very rapid pace to prepare the state for managing the predicted exponential growth. Through such interventions and preparations, the Government of Karnataka is looking to flatten the pandemic curve.

Conclusion
At the present stage of infection in Karnataka, the estimated R0 with 95% CI for COVID-19 is about 2.242 (1.50-3.00). The future daily incidence, probable cumulative cases and outbreak size are mainly dependent on the value of R0. The relaxation in lockdown, the negligent attitude of people in maintaining social distancing, not taking precautionary measures and population mixing may accentuate the natural process of disease transmission. The number of active cases may double in the forthcoming days due to a change in the R0 value. The present findings highlight the importance of reducing transmissibility to control the probable outbreak size, as well as of enhancing hospital provisions and medical facilities (number of random tests) at a very rapid pace to prepare the state for managing the worst situation in the months of September and October.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.