Emergence of universality in the transmission dynamics of COVID-19

The complexities involved in modelling the transmission dynamics of COVID-19 have been a roadblock to achieving predictability in the spread and containment of the disease. In addition to understanding the modes of transmission, the effectiveness of mitigation methods also needs to be built into any model used for making such predictions. We show that these complexities can be circumvented by appealing to scaling principles, which lead to the emergence of universality in the transmission dynamics of the disease. The ensuing data collapse renders the transmission dynamics largely independent of geopolitical variations, the effectiveness of various mitigation strategies, population demographics, etc. We propose a simple two-parameter model, the Blue Sky model, and show that one class of transmission dynamics can be explained by a solution that lives at the edge of a blue sky bifurcation. In addition, the data collapse leads to an enhanced degree of predictability in the disease spread at several geographical scales, which can also be realized in a model-independent manner, as we show using a deep neural network. The methodology adopted in this work can potentially be applied to the transmission of other infectious diseases, and new universality classes may be found. The predictability in transmission dynamics and the simplicity of our methodology can help in building policies for exit strategies and mitigation methods during a pandemic.

In this section we discuss some examples of data collapse seen in physical systems. The emergence of universality accompanied by a data collapse is quite common and has been observed for more than half a century. We begin our exposition with data collapse in the study of critical phenomena [3][4][5] in fluids. We keep the discussion at a representative level, as an introduction to universality and data collapse, rather than attempting an exhaustive survey, which is beyond the scope of this paper.
It is well known that if a gas is subjected to isothermal compression, then at a certain pressure it undergoes a phase transition to the liquid state while the pressure remains constant. Upon completion of the phase transition, the pressure rises very sharply on further compression while the volume changes only slightly. If the temperature is raised and the process repeated, the same phenomenon occurs, except that the coexistence region, the range of volumes over which the pressure remains constant, grows smaller and smaller as the temperature is raised. At a particular temperature, T = T_c, this region shrinks to a point, and at higher temperatures liquefaction does not occur however high the pressure. The temperature T_c is the critical temperature, and in its vicinity the fluid shows a large amount of fluctuation as it lies between a state of reasonable order, the liquid state, and a state of disorder, the gaseous state.
The fluctuations in density very close to the critical point mean that short-lived bubbles of the liquid state are produced in the gas. As the critical point is approached these fluctuations become long lived and infinitely long ranged, since one has liquid-vapour coexistence. The infinite lifetime of the fluctuations and the infinite range of the correlations mean that this phenomenon cannot be sensitive to which gas is being considered: whether it be carbon dioxide, oxygen, nitrogen, xenon or sulphur hexafluoride, they all behave in the same way. This is known as universality, and it means that if we draw the equation of state of any gas near the critical point with the pressure scaled by the critical pressure and the volume scaled by the critical volume, we get just one curve. The critical volumes, critical temperatures and critical pressures are all different for the different materials, but the scaled equation of state is the same, as can be seen from Figure 1. If P, V and T are the pressure, volume and temperature of a fluid obeying the van der Waals equation of state, (P + a/V^2)(V − b) = RT, with a and b being constants specific to the fluid, then

(π + 3/φ^2)(3φ − 1) = 8τ,

where π = P/P_c, φ = V/V_c and τ = T/T_c, with P_c, V_c and T_c being the pressure, volume and temperature at the critical point. The equation of state thus collapses to a single equation independent of the nature of the fluid. An identical situation holds for the dynamics, which probes the lifetime of the fluctuations. The experimental technique for probing the fluctuations directly is frequency-resolved light scattering. The measured quantity is the intensity I as a function of frequency ω, which at a given temperature has a Lorentzian shape. As the temperature is lowered towards the critical point, the peak (at ω = 0) grows while the width at half-maximum shrinks, the peak heading for an infinitely large value and the width for zero as the fluctuations become infinitely long lived at the critical point. Different materials show different Lorentzian distributions at different temperatures. If we denote the intensity at a particular frequency and a particular temperature T by I(ω, ∆T), with ∆T = T − T_c, then for different materials and different sets of ∆T one has different I(ω, ∆T) vs. ω distributions. However, if one plots I(ω, ∆T)/I(0, ∆T) vs. ω/ω_0, where ω_0 is the characteristic frequency for a given material and ∆T, then the data for hundreds of individual distributions collapse onto a single distribution. This phenomenon is called dynamic scaling [3][4][5] and is displayed in Figure 1.
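To make the dynamic-scaling collapse concrete, here is a minimal Python sketch (not part of the original analysis): it generates a few synthetic Lorentzian spectra with different peak intensities and characteristic frequencies, standing in for different materials and values of ∆T, and shows that plotting I(ω, ∆T)/I(0, ∆T) against ω/ω_0 maps them all onto the single curve 1/(1 + x^2). All numerical values are arbitrary placeholders.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic Lorentzian spectra: I(w) = I0 / (1 + (w / w0)^2), with the peak I0
# and the characteristic frequency w0 playing the role of material- and
# temperature-dependent quantities.  The values below are placeholders.
spectra = [
    {"I0": 1.0, "w0": 5.0},   # e.g. far from the critical point
    {"I0": 4.0, "w0": 1.5},   # closer to the critical point
    {"I0": 2.5, "w0": 0.8},   # a different material, closer still
]

w = np.linspace(0.0, 20.0, 400)

fig, (raw, scaled) = plt.subplots(1, 2, figsize=(9, 4))
for s in spectra:
    I = s["I0"] / (1.0 + (w / s["w0"]) ** 2)
    raw.plot(w, I)                          # distinct, non-overlapping curves
    scaled.plot(w / s["w0"], I / s["I0"])   # all collapse onto 1/(1 + x^2)

raw.set(xlabel="ω", ylabel="I(ω, ∆T)", title="Unscaled spectra")
scaled.set(xlabel="ω/ω_0", ylabel="I(ω, ∆T)/I(0, ∆T)", title="Data collapse")
plt.tight_layout()
plt.show()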
This phenomenon of universal data collapse is ubiquitous: it appears in various forms in network dynamics [6], neuronal networks [7], the brain [8,9], supercritical fluids [10], amorphous solids [11], Bose-Einstein condensates [12], the jamming transition [13], granular media [14,15], financial markets [16], metallic liquids [17], etc. Thus, in some sense, it is not surprising that this intriguing phenomenon is present in the transmission of infectious diseases as well.

[Figure 2 caption, partially recovered: "... Type I and Type II transmission dynamics; and the right panels show the dÑ(t̃)/dt̃ vs. t̃ plot for the same. We fit four models to the data: the logistic growth model, the generalized logistic model, the generalized Richards model and the Blue Sky model. While the fits to the data are the same for all the models for Type I transmission, where all the models go to the limit of being a logistic growth model, for Type II transmission the Blue Sky model can explain the data better than the other models."]

Comparisons between different models
In this section we compare some models that have been commonly used in epidemiology to quantify the transmission dynamics of diseases, including COVID-19 [18]. All the models described here are variants of the logistic growth model. The logistic growth model (LGM) is defined as

dN(t)/dt = β N(t) [1 − N(t)/κ].

We define t̃ = t/t_{1/2} and Ñ(t̃) = N(t)/N_max, and by setting β̃ = β t_{1/2} and κ̃ = κ/N_max we get the rescaled equation

dÑ(t̃)/dt̃ = β̃ Ñ(t̃) [1 − Ñ(t̃)/κ̃].

We see that β scales with the characteristic time and κ with the number of cases. This makes the dimensionless quantities β̃ and κ̃ independent of any characteristic timescale or size of the system; κ̃ = 1 in the LGM. The generalized logistic model (GLM) is given by

dN(t)/dt = β N(t)^π [1 − N(t)/κ].

A similar rescaling leads to the rescaled parameters β̃ = β t_{1/2} N_max^{π−1} and κ̃ = κ/N_max, so that the rescaled GLM is

dÑ(t̃)/dt̃ = β̃ Ñ(t̃)^π [1 − Ñ(t̃)/κ̃].

The generalized Richards model (GRM) is given by

dN(t)/dt = β N(t)^π [1 − (N(t)/κ)^α].

For π = 1 it reduces to the standard Richards model [19]. The scaling is similar to that of the GLM, with β̃ = β t_{1/2} N_max^{π−1} and κ̃ = κ/N_max; the rescaled GRM is

dÑ(t̃)/dt̃ = β̃ Ñ(t̃)^π [1 − (Ñ(t̃)/κ̃)^α].

In Figure 2 we show a comparison between all the models listed here and the Blue Sky model (BSM). The two upper panels are for transmission of Type I. The LGM fits these data quite well, and all the other models acquire parameter values that take them to the limit of being an LGM. The two lower panels are for transmission of Type II. Here we clearly see that the BSM fits the data much better than the other models. Especially in the tail of the distribution past the peak of the dÑ(t̃)/dt̃ vs. t̃ plot, where t̃ > t̃_peak, the BSM allows for larger dÑ(t̃)/dt̃ than the other models and is hence able to replicate transmission of Type II, where the spread of the disease lingers on.
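As an aid to intuition, the sketch below integrates the rescaled LGM, GLM and GRM numerically with SciPy so that their dÑ/dt̃ curves can be compared, as in Figure 2. The parameter values and the initial condition are illustrative assumptions, not fits to data, and the Blue Sky model is omitted here because its equations are introduced elsewhere in the paper.

import numpy as np
from scipy.integrate import solve_ivp

# Rescaled growth models from the text (all quantities dimensionless).
def lgm(t, n, beta):                   # dÑ/dt̃ = β̃ Ñ (1 − Ñ), since κ̃ = 1
    return beta * n * (1.0 - n)

def glm(t, n, beta, kappa, p):         # dÑ/dt̃ = β̃ Ñ^π (1 − Ñ/κ̃)
    return beta * n**p * (1.0 - n / kappa)

def grm(t, n, beta, kappa, p, alpha):  # dÑ/dt̃ = β̃ Ñ^π [1 − (Ñ/κ̃)^α]
    return beta * n**p * (1.0 - (n / kappa) ** alpha)

t_span = (0.0, 5.0)
t_eval = np.linspace(*t_span, 200)
n0 = [0.01]                            # small initial cumulative fraction (placeholder)

sol_lgm = solve_ivp(lgm, t_span, n0, t_eval=t_eval, args=(4.0,))
sol_glm = solve_ivp(glm, t_span, n0, t_eval=t_eval, args=(4.0, 1.0, 0.9))
sol_grm = solve_ivp(grm, t_span, n0, t_eval=t_eval, args=(4.0, 1.0, 0.9, 1.5))

# Epidemic curves dÑ/dt̃ can then be obtained for comparison, e.g.
# np.gradient(sol_grm.y[0], sol_grm.t).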

The deep neural network architecture
In our model-agnostic approach we used a deep neural network (DNN) [20,21] to fit the data. A DNN is a universal function approximator. The reason we chose a DNN over other machine learning frameworks (e.g. decision trees, random forests, SVMs) is the ease with which it can be used for regression on noisy data to extract the underlying function. We describe in brief the choice of the network architecture.
We started with a small DNN with 1 input node for t̃, 2 hidden layers, each with 4 nodes, and 1 output node for Ñ(t̃). All layers were fully connected, as shown in Figure 3. The activation function used was the sigmoid function. The solution of the LGM is a sigmoid function, and hence transmission of Type I can be fairly well explained with just one sigmoid function connecting the input t̃ to the output Ñ(t̃). This is not possible, though, for transmission of Type II, which shows clear deviations from the LGM. However, in the model-agnostic approach we did not want to introduce any model biases through assumptions about the functional form. Hence we began with the larger (and deeper) neural network described above, which has 33 parameters. We then scaled up the network to 3 hidden layers, each with 16 nodes, to check the stability of our approach. The regression with the DNN with 2 hidden layers of 4 nodes each gave almost the same fit as the one with 3 hidden layers of 16 nodes each, which has a total of 593 parameters. To keep our method as general as possible and not introduce model biases from the activation function, we decided to use the larger DNN with 3 hidden layers of 16 nodes each. In addition, we tested the DNN with other activation functions such as tanh and ReLU. For tanh a somewhat wider and deeper network was needed to fit the data, and with ReLU the network needed to be very large. Hence, we used the sigmoid activation function in the final DNN architecture.
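A minimal sketch of this architecture is given below, assuming a TensorFlow/Keras implementation (the framework is not specified in the text); the Adam optimizer and the sigmoid activation on the output node, natural since Ñ(t̃) lies in [0, 1], are our assumptions.

import tensorflow as tf

# Fully connected network: one input node (t̃), three hidden layers of 16
# sigmoid units each, and one output node (Ñ); 593 trainable parameters.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(16, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # assumed output activation
])
model.compile(optimizer="adam", loss="mse")           # mean squared error loss
model.summary()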
Even though the DNN we used was not very large, we used an early-stopping callback to halt training when the validation loss stopped decreasing, in order to avoid overtraining, and we chose the epoch at which the validation loss was lowest. The loss function we used was a simple mean squared error, which measures the L_2 distance of the fit from the data. The training time was a few seconds, with early stopping coming into effect at about 100-150 epochs. The trained DNN was then used for making predictions for the ongoing phases of the transmission.
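Continuing the sketch above, the training procedure with early stopping might look as follows; the synthetic data, the validation split and the patience value are placeholders for the actual rescaled case-count data and training settings.

import numpy as np
# `tf` and `model` are from the previous sketch.

# Placeholder (t̃, Ñ) training pairs: a noisy sigmoid standing in for rescaled
# cumulative case counts, with Ñ ≈ 0.5 at t̃ = 1 by the definition of t_{1/2}.
t_scaled = 2.0 * np.random.rand(200, 1)
n_scaled = np.clip(1.0 / (1.0 + np.exp(-5.0 * (t_scaled - 1.0)))
                   + 0.02 * np.random.randn(200, 1), 0.0, 1.0)

# Stop training when the validation loss stops improving and keep the best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True)

history = model.fit(t_scaled, n_scaled,
                    validation_split=0.2,
                    epochs=500,
                    verbose=0,
                    callbacks=[early_stop])

# The trained network can then be evaluated at later t̃ to extrapolate the
# ongoing phase, e.g. model.predict(np.array([[1.5]])).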