A prognostic dynamic model applicable to infectious diseases providing easily visualized guides: a case study of COVID-19 in the UK

A reasonable prediction of infectious diseases’ transmission process under different disease control strategies is an important reference point for policy makers. Here we established a dynamic transmission model via Python and realized comprehensive regulation of disease control measures. We classified government interventions into three categories and introduced three parameters as descriptions for the key points in disease control, these being intraregional growth rate, interregional communication rate, and detection rate of infectors. Our simulation predicts the infection by COVID-19 in the UK would be out of control in 73 days without any interventions; at the same time, herd immunity acquisition will begin from the epicentre. After we introduced government interventions, a single intervention is effective in disease control but at huge expense, while combined interventions would be more efficient, among which, enhancing detection number is crucial in the control strategy for COVID-19. In addition, we calculated requirements for the most effective vaccination strategy based on infection numbers in a real situation. Our model was programmed with iterative algorithms, and visualized via cellular automata; it can be applied to similar epidemics in other regions if the basic parameters are inputted, and is able to synthetically mimic the effect of multiple factors in infectious disease control.


Results
Dynamic transmission model. In this article we summarised governmental interventions into three key strategies. First, the intraregional transmission probability has been lowered by protective measures or vaccinations, which aim to reduce the possibility of people contracting the disease 24,30 . Second, the mobility of the population has been reduced by government-level measures like city lock-down, border sealing, and compulsory stay-at-home policies 19 . Last but not least, healthcare system capacity has been enhanced to make sure as many patients as possible are quarantined and treated, while enhancement of detection capacity has aided early detection and immediate isolation 31 . We introduced interregional communication rate (c) to describe the coefficient of disease transmission between communities to take the impact of population mobility into consideration. Initial intraregional growth rate (m) was introduced to describe internal infection among communities, which indicates the influence of personal protection measures such as keeping a social distance and face covering. During the simulated transmission process, intraregional growth rate (m) changes continuously as it is affected by patient recovery and thus gain of immunity. Detection rate of infectors (k) was introduced to describe the possibility of an infector (including asymptomatic and pre-symptomatic individuals) being detected.
N: daily infection number; c: interregional communication coefficient (travel rate over 15 km²); m: self-growth rate (intraregional spreading coefficient); h: general percentage of hospitalization (including death); s: general percentage of self-healing; t_h: average latent period; t_s: average self-heal period; H: daily hospitalization number; S: daily self-healing number; p: population.
We use cellular automata as a platform of modelling; in cellular automata, cells are arranged as a matrix such as:
1 2 3
4 5 6
7 8 9
Each cell represents a region and people tend to migrate between two adjacent cells (details provided in Supplementary Information, Fig. S1). In the equations, N_5(t + 1) is the daily infection number on day t + 1 in cell 5, which equals the daily infection number on day t plus the effect of migration in and out, then multiplied by the intraregional spreading coefficient, minus the day's numbers going to hospital and self-healing. The percentage of people who migrate out of a square cell from one side is c; therefore, the percentage of people who migrate out of a whole square is 4c. Then we introduce controlled parameters to describe the situation after interventions (details in "Methods" section). To be realistic, controlled interregional communication rate (c_c) and detection rate of infectors (k) are steady-state values, while controlled intraregional growth rate (m_c) is an initial value varying with the immunity acquisition number.
www.nature.com/scientificreports/
m_c: initial value of controlled intraregional growth rate; c_c: controlled interregional communication rate; k: detection rate of infectors; p: regional population.
We also take advantage of the definition of m, which initially relies on R_0 and changes with the percentage of immunization, to calculate the number of vaccinations required in disease control.
Visualised transmission process of COVID-19 in the UK. Videos showing the prognostic transmission process on the UK and London maps were generated by Python 3 based on our model (details shown in "Methods" section and Supplementary Information). Here we present the infection curves (Fig. 1) and visualised transmission maps (Fig. 2) based on our simulation of the infection, which starts with the initial daily infection number of each region in the UK on 4th March 2020 32.
Due to the limitations of the iterative algorithm, the earliest date we can start simulating disease control interventions is one day after the first self-heal period. Therefore, in the first 15 days (a self-heal period) of the simulation, we use the initial parameters according to COVID-19 epidemiological characteristics, the initial intraregional growth rate as 0.4 8,33 , the interregional travel rate as 0.1 38 , and the detection rate of infectors as 0 ( Table 1, data resources presented in "Methods" section). Without any governmental intervention, the accumulated infection number in 100 days may exceed 70% of the UK population (Fig. 1A). Since the simulation was interrupted when the cumulative infection number exceeds the local population, COVID-19 transmission in the UK suspends on the 73rd day (Fig. 1A). The daily infection scale will exceed 14.8 million people on the 60th day of domestic infection ( Fig. 1B), which accounts for nearly a quarter of the UK population 34 . When we look at the regional infection curves, we can see that the epidemic in London develops differently from other regions in that the infection in the former increases sharply from the 30th to the 58th day, then decreases rapidly (Fig. 1C). This indicates that, without interventions, the infection will first be eliminated in London while the conditions are still getting worse in other regions.
From the visualized transmission maps we can also see that when the infection peak is reached, the COVID-19 outbreak will ameliorate in the epicentre while it is still developing in surrounding regions (Fig. 2A). As shown in the screenshots of the simulation videos (Supplementary video files), the epidemic spreads from the epicentre to the periphery and, surprisingly, leaves the centre clear of infection (Fig. 2A). Here we also simulated the COVID-19 transmission process in London based on the initial daily infection number of each borough on 11th March (Fig. 2B), which shows similar results 32. Although a second wave of infection may occur from the epicentre of the outbreak once again, the periphery may be hit more severely by the epidemic in a later period.
COVID-19 can be brought under control by a single intervention at the early stage, but at huge expense. We ran simulations to see how effective single interventions are in flattening the daily infection curves. When the controlled intraregional growth rate (m_c) was in the range of 0.05-0.4 (0.75 < R_0 < 6, m = R_0/15 days, details in the "Methods" section) 14,35, the daily infection curve became progressively flatter with reduced m_c while the period of the epidemic became longer. A controlled intraregional growth rate lower than 0.1 (R_0 < 1.5) was effective in controlling the propagation tendency, but a second infection wave was likely to occur. In addition, it was not possible to completely eliminate infection cases within a 100-day period by only controlling m_c, and the epidemic would thus last for a longer time (Fig. 3A). When the controlled travel parameter (c_c) was in the range of 0.0125-0.01, the overall infection trends were downward but it was not possible to control the daily infection scale to an acceptable level (Fig. 3B).
An increasing detection rate of infectors (k) was capable of controlling the infection scale stably and efficiently, as well as eliminating the infection within a short period (Fig. 3C). The daily infection number dropped as k was enhanced and, as shown in Fig. 3, controlling k brought better disease control results than controlling m_c and c_c. When k was as high as 0.175, it was possible to maintain the daily infection number curve at a flat level.
Combined interventions will significantly enhance disease control efficiency. As shown in Fig. 3, with achievable single interventions, it is hard to contain the peak daily infection number to acceptable levels. Therefore, as shown in Fig. 4, combined interventions were applied to search for optimum disease control strategies.
The average length of the hospital stay for COVID-19 patients is 7 days 36. The number of inpatient beds available for COVID-19 patients at the early stage of COVID-19 in the UK was around 6000-7000 32. The daily number of hospitalizations should therefore be kept at around 1,000, which is a premise for our optimum disease control strategies. As shown in mobility trend reports by Apple Maps, the general mobility in the UK was reduced by 20-50% during the COVID-19 lockdowns (January 2020-January 2021) 37. Hence, we suppose the interregional communication rate in the UK to be reduced from 0.1 to 0.08-0.05 38, indicating that 20-32% of people can travel beyond 4 km (2.5 miles) every day. We selected several representative conditions to quantify the UK tier system, which is based on mobility trends and four-tier alert levels: m_c = 0.17 (R_0 = 2.5), m_c = 0.23 (R_0 = 3.5) and m_c = 0.3 (R_0 = 4.5), with c_c in the range of 0.08-0.05 (Table 2) 32,35,37; then we ran simulations and found the corresponding detection rates. The disease control process with the three interventions applied together was simulated and presented as transmission curves (Fig. 4a, b, c). With regard to these different combinations of interventions, we can conclude that when tier 1, tier 2, and tier 3 lockdown measures are implemented, the detection rate should be kept at 16%, 11.5% and 4.5%, respectively, and as shown in Fig. 4d, e, f, the highest daily numbers to hospital (H) under these conditions do not exceed hospital capacity.
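The capacity premise and the mobility figures above are back-of-envelope arithmetic, which the following minimal sketch makes explicit. It assumes a steady state in which daily admissions equal occupied beds divided by the mean length of stay; the function names are ours, chosen for illustration, and are not taken from the authors' code.

```python
def daily_admission_capacity(beds: float, mean_stay_days: float) -> float:
    """Admissions per day that keep `beds` occupied at steady state (assumption)."""
    return beds / mean_stay_days


def share_leaving_cell(c: float) -> float:
    """Daily fraction of a square cell's population leaving through its four sides."""
    return 4 * c


if __name__ == "__main__":
    # Bed numbers and the 7-day stay are the values quoted in the text.
    for beds in (6000, 7000):
        print(beds, "beds:", round(daily_admission_capacity(beds, 7)), "admissions/day")
    # Controlled interregional communication rates quoted in the text.
    for c_c in (0.08, 0.05):
        print("c_c =", c_c, "->", round(100 * share_leaving_cell(c_c)), "% travelling per day")
```

With 6000-7000 beds and a 7-day stay this gives roughly 860-1,000 admissions per day, in line with the "around 1,000" premise, and c_c = 0.08-0.05 corresponds to 32-20% of people travelling.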

Application for calculating vaccination demand to end COVID-19.
A key current strategy to combat COVID-19 is the development of an effective vaccine. We thereby provided a method of calculating the required vaccination numbers based on this model. We started the simulation with the current infection number as an initial condition, and intraregional growth rate was influenced by vaccination and existing natural immunity. For example, in early January 2021, the average daily infection cases in the UK were around 35,000 with 3,300 being in London 39 (Table 2).

Discussion
As we have summarised, intervention measures have focused on three aspects: the intraregional spreading rate, detection rate of infectors and interregional communication rate. Long-duration intraregional lockdown effectively reduced the burden of the pandemic 19,20; however, without cutting off the source of infection, the epidemic will not be eliminated in 100 days, even if the R_0 number is very low (Fig. 3A). With respect to interregional communication rate, during the first-round lockdown and its associated impact on travel, driving was reduced by 46%, public transport by 62% and walking trips by 33% on lockdown days compared to normal days 37. However, the data show that the reduction started from 21st March 37, when the cases of infection had already spread all over the country; intraregional growth was therefore already occurring at this point 35. This also matches our simulation results and shows that it is difficult to control the infectious trend by simply reducing the travel parameter once the infection has spread to all regions. We then considered the rate of detection and quarantine for infectors. In our simulation, around one fifth of infectors must be detected and strictly isolated even if they are in the latent period of the disease or asymptomatic, and enhancing detection rate of infectors (k) is shown to be the most efficient intervention to bring the infection scale down as well as shorten the intervention period.
However, controlling all the single disease control parameters to ideal values is difficult in real conditions because of the special characteristics of COVID-19. Detection and isolation of early-stage and asymptomatic infectors is a big challenge for healthcare systems, and this was particularly the case with the immature detection technologies and limited resources in the first phase of the COVID-19 outbreak 14,15. Therefore, our findings support the conclusion that COVID-19 spread must be controlled by multiple combined strategies and as early as possible (Supplementary Information Fig. S2). The initial R_0 value was around 5.81 in the UK 9. To reduce the social burden as well as balance the needs of the economy and disease control, we believe that controlling R_0 within the range of 2.5-4.5 and reducing mobility by 20-32% (tier 1-3) is a reachable goal with proper control measures taken at the beginning of the period of interventions 17,32,35. To keep the peak daily number of hospitalizations within acceptable levels when R_0 is immediately controlled at 2.5 (tier 3), with intermediate travelling control policy, 5% of infectors must be detected and quarantined. When R_0 is controlled at 4.5 (tier 1), our recommendation is that the detection rate should be enhanced to at least 20%. Moreover, our simulation showed that the location at which the epidemic is most severe was where the epidemic first began to disappear. We therefore suggest that, instead of gathering all detection systems and resources in the main areas affected by the epidemic, distributing these resources to the peripheral regions will be a more efficient way to save resources and bring the epidemic under control.
We also provided a potential method of calculating required vaccination numbers based on the actual infection number. For example, our simulation shows that when the infection number is around 220,000 in the UK and 22,000 in London 32, the number of vaccinations should not be less than 45 million in the UK and 6.6 million in London.
Our study presents a few limitations due to model design as well as the nature of cellular automata. One such limitation is the inability of cellular automata to mimic long-distance migrations, such as trips by plane, during the early stage of the disease transmission, as chromatic values are only transmitted between two adjacent cells at a time (Supplementary Information Fig. S1). Another limitation is that the model cannot mimic changes in R_0 arising from viral mutation. Future optimizations of such modelling studies may focus on plugging in evolving parameters relating to variations in the virus over the longer term 40. It will also be interesting to introduce new parameters to quantify other critical factors affecting epidemic transmission from social, economic, environmental, demographic, climatological, and health risk angles 23,25,26,41.

Conclusion
This study is a prognostic analysis of infectious disease development based on an infectious-hospitalized-self-heal (IHS) mathematical model, with the first wave of COVID-19 in the UK as an example. The model is designed to match the epidemiological characteristics of infectious diseases with similarities to COVID-19, in particular ones with asymptomatic and pre-symptomatic infectivity. Through our Python implementation, we realized the systematic regulation of intraregional growth rate, interregional communication coefficient, and detection rate. The disease control effect is easy to evaluate by adjusting these parameters, and thus we can seek optimal solutions. In addition, we have found that to achieve better control effects in the mid-term of the epidemic, more attention should be paid to the areas surrounding the epicentre. Moreover, our model can also be applied to estimate vaccination demand based on realistic situations, to provide guidance for vaccine production.
This model can also be applied in the future to predict the spread of similar infectious diseases in different regions. It only requires specific disease parameters as input, such as incubation period, self-healing period and self-healing rate. This model makes it convenient to quickly find the optimal solutions for comprehensive interventions and take action, which can be helpful in future public health decision-making to reduce morbidity and mortality.

Assumptions.
1. The population is approximated to be constant and evenly distributed within each geographical region.
2. Death rate is counted as a part of the percentage of cases that are admitted to hospital.
3. Infected people are contagious constantly from the beginning to the end of the incubation period as well as during the illness stage.
4. All the population are at the same risk of infection.
5. All patients have contracted the virus through secondary infection; considering the high population mobility, primary infected patients, which represent a tiny percentage, are omitted.
Automata cell establishment. A cellular automaton is a dynamic system that is discrete in time and space; it consists of a regular grid of cells, with each one being in one of a finite number of states. In our model, the disease transmission was described as partial cellular interaction leading to global change. A geographical region was regarded as a two-dimensional network. To input this into cellular automata, each network was deemed a cell, while each cell stands for the location of a group of people. Pixels were downscaled to correspond to the area of the cells. Each cell was selected and separated according to the red, green, blue (RGB) value of the map (Supplementary Information Fig. S1). The red chromatic value of a pixel represents the severity of the epidemic in the corresponding region. The minimum value (r = 0) means no cases while the maximum value (r = 225) represents the population of the cell. The epidemic information and regional population of each cell were set initially 32,34, and the number of people who migrate each day depends on the interregional communication coefficient and the local population. A convolution kernel method was applied to calculate migrating cases. We suppose that infection starts from one cell in each region, which was represented as a red dot on the map. The location of the dot was randomly chosen, and pseudo-random number seeds were fixed. The epidemic is assumed not to transmit to non-populated areas outside the coast, where the infection number was forcibly set to zero.
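The migration step described above can be sketched as a four-neighbour stencil, which is equivalent to convolving the grid with a 3 × 3 kernel whose centre is 1 − 4c and whose edge-adjacent entries are c. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `migrate`, the optional `land` mask and the zero-padding at the grid border (so that cases crossing the border are dropped) are ours.

```python
import numpy as np


def migrate(cases, c, land=None):
    """Apply one day of migration to a 2-D grid of case counts.

    Each cell keeps a fraction (1 - 4c) of its cases and receives a
    fraction c from each of its four edge-adjacent neighbours.
    """
    cases = np.asarray(cases, dtype=float)
    # Zero padding: nothing flows in from outside the grid (assumption).
    padded = np.pad(cases, 1, mode="constant", constant_values=0.0)
    inflow = c * (
        padded[:-2, 1:-1]    # from the cell above
        + padded[2:, 1:-1]   # from the cell below
        + padded[1:-1, :-2]  # from the cell to the left
        + padded[1:-1, 2:]   # from the cell to the right
    )
    updated = (1.0 - 4.0 * c) * cases + inflow
    if land is not None:
        # Force non-populated (sea) cells to zero, as the text describes.
        updated = np.where(land, updated, 0.0)
    return updated


if __name__ == "__main__":
    grid = np.zeros((3, 3))
    grid[1, 1] = 100.0  # all cases start in the centre cell ("cell 5")
    print(migrate(grid, c=0.1))
```

For the 3 × 3 example with c = 0.1, the centre cell keeps 60% of its cases and each of cells 2, 4, 6 and 8 receives 10%, so the total is conserved as long as no cases cross the border or enter a masked cell.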
Dynamical equations. Infectors go to the hospital or self-heal only after the appearance of symptoms, and thus the precise number that go to hospital and self-heal depends on the infection number one period previously. The day's number to hospital in a cell depends on the daily infection number that occurred six days (the average latent period) previously in local and surrounding regions. Considering that some infectors will continually migrate between regions, the infectors who were infected 6 days previously and are currently in cell 5 can be divided into two parts: infectors who were infected in cell 5 and remained in cell 5 (local infectors who never migrate), and infectors who were infected in other cells and moved into cell 5 in the previous 6 days. Assume the number of infectors in cell 5 at the beginning is Y_5.
Therefore, replacing Y with the exact number of infectors gives the daily number to hospital of local infectors in cell 5 on day t.
Next, we consider the number of infectors who move into cell 5. Suppose the numbers of people in group Y in the cells adjacent to cell 5 are Y_2, Y_4, Y_6 and Y_8, which add up to Y_{2,4,6,8}. Since each of these cells has only one side in contact with cell 5, on the first day the number of people in group Y who move into cell 5 is Y_{2,4,6,8} · c. Meanwhile people also move out from cell 5, so on the second day the number of people in group Y who move into cell 5 is Y_{2,4,6,8} · c · (1 − 4c). The rest can be calculated in the same manner.
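Read literally, the day-by-day terms above form a geometric series with first term Y_{2,4,6,8} · c and ratio (1 − 4c). The sketch below adds these terms up and compares the result with the closed form of that series. It is our reading of the prose rather than the paper's own equation, and the function names are hypothetical.

```python
def total_moved_in(y_adjacent: float, c: float, days: int) -> float:
    """Sum the daily inflow terms Y * c * (1 - 4c)**i for i = 0 .. days - 1."""
    return sum(y_adjacent * c * (1 - 4 * c) ** i for i in range(days))


def total_moved_in_closed_form(y_adjacent: float, c: float, days: int) -> float:
    """Closed form of the same geometric series: Y * (1 - (1 - 4c)**days) / 4."""
    if c == 0:
        return 0.0  # no migration, so nobody moves in
    return y_adjacent * (1 - (1 - 4 * c) ** days) / 4


if __name__ == "__main__":
    # Illustrative numbers only: 400 infectors in the four adjacent cells,
    # c = 0.1, accumulated over the 6-day latent period used in the text.
    print(total_moved_in(400, 0.1, 6))
    print(total_moved_in_closed_form(400, 0.1, 6))
```

The closed form follows from the standard geometric sum, since 1 − (1 − 4c) = 4c cancels the factor c up to the constant 1/4.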
Hence, the total number of people in group Y who move into cell 5 on day t is obtained by summing these daily terms. Replacing Y with the exact number of infectors gives the daily number of infectors who move into hospital in cell 5 on day t. The daily increase in the number of hospitalizations in cell 5 on day t therefore combines the local and incoming contributions. In a similar way, the daily increase in the number that self-heal (S) is calculated.
Data sources. We used an initial spreading coefficient to explain the daily percentage increase; in the UK the initial R_0 value was 5.81 8,9, the infection period (including incubation period) was 15 days, and the incidence number doubled every 1.8-2.8 days (Table 1). So the value of the initial growth rate can be calculated as 0.4 (m = R_0/15 days).
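The conversion from R_0 to the daily growth rate can be checked numerically. In the minimal sketch below, `growth_rate` applies m = R_0/15 as stated in the text. `doubling_time_days` additionally assumes that daily cases multiply by (1 + m) each day, which is our interpretation of "daily percentage increase" and is not spelled out in the text.

```python
import math

INFECTIOUS_PERIOD_DAYS = 15  # infection period including incubation, from the text


def growth_rate(r0: float, period_days: float = INFECTIOUS_PERIOD_DAYS) -> float:
    """Daily intraregional growth rate m = R_0 / infection period."""
    return r0 / period_days


def doubling_time_days(m: float) -> float:
    """Doubling time if cases grow by a factor (1 + m) per day (assumption)."""
    return math.log(2) / math.log(1 + m)


if __name__ == "__main__":
    m = growth_rate(5.81)
    print("m =", round(m, 3))
    print("doubling time =", round(doubling_time_days(m), 2), "days")
    # R_0 values used for the tier scenarios in the Results section.
    for r0 in (2.5, 3.5, 4.5):
        print("R_0 =", r0, "-> m_c =", round(growth_rate(r0), 2))
```

Under that assumption, R_0 = 5.81 gives m of about 0.39 and a doubling time of roughly 2.1 days, which falls inside the reported 1.8-2.8 day range.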
We set 16,183 pixels for the area of the UK. Therefore, on the UK map, each pixel represents a 15 km² geographic area, and people who travel over a 4 km (2.5 mile) straight-line distance are considered as migrants. The percentage of people migrating between cells in the UK is around 40%, as roughly estimated based on available worldwide and domestic travel and transport statistics (Table 1) 38,42. Since there are four directions in which people in one square cell can migrate, this percentage is divided by 4, and the travel parameter was estimated to be 0.1, standing for 10% of the people in one cell migrating between two adjacent cells every day (Table 1). The general percentage of hospitalization means the possibility of infectors being admitted to hospital and thus strictly isolated. We consulted the cumulative death rate, which was estimated at 15.4%, and the number of beds occupied by confirmed COVID-19 patients according to the NHS statistics in July, which showed that 2000 beds were occupied by COVID-19 patients 43. Moreover, in early April the number of hospitalizations was estimated at around 7000-8000 44. Since the number of undetected patients is estimated to be more than two times that of confirmed patients (200% undetected cases), the percentage to hospital including death rate is 20% (Table 1) 45.
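The cell size and travel parameter quoted above follow from simple arithmetic, sketched below. The UK land area of about 242,495 km² is an assumption of ours (it is not given in the text), and the function names are illustrative only.

```python
import math

UK_AREA_KM2 = 242_495  # assumption: approximate UK area, not stated in the text
N_PIXELS = 16_183      # number of map pixels, from the text


def cell_area_km2(total_area_km2: float, n_cells: int) -> float:
    """Area represented by one pixel/cell."""
    return total_area_km2 / n_cells


def travel_parameter(daily_migrating_share: float, sides: int = 4) -> float:
    """Per-side migration rate c when migrants split evenly over the cell's sides."""
    return daily_migrating_share / sides


if __name__ == "__main__":
    area = cell_area_km2(UK_AREA_KM2, N_PIXELS)
    print("cell area  =", round(area, 1), "km^2")
    print("cell side  =", round(math.sqrt(area), 2), "km")
    print("c          =", travel_parameter(0.40))
```

A square cell of about 15 km² has a side of just under 4 km, which matches the 4 km straight-line threshold used to define a migrant, and a 40% daily migrating share split over four sides gives c = 0.1.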
Since the illness period is estimated at 15 days, the controlled spreading coefficient can be calculated as m = R_0/15 days, which is the average number of people who contract COVID-19 from one patient in one day during the patient's illness period 46.
The detection rate of infectors stands for the possibility of an infector being detected and isolated. For instance, if the healthcare system provides no detection service, the detection rate of infectors is equal to 0. If the healthcare system provides enough detection for all patients with severe symptoms and immediately isolates them, the detection rate of infectors is equal to the rate of occurrence of severe symptoms (13.8%) 46. If the healthcare system provides general, extensive and compulsory detection services for all citizens, the detection rate of infectors will be close to 1.
An approximate validation of the accuracy of the model was based on the early statistics from the UK government, although this was hard to do in practice because the real transmission dynamic and infection scale were difficult to determine at the early stage of the pandemic (Supplementary Table S1).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.