Change sign detection with differential MDL change statistics and its applications to COVID-19 pandemic analysis

We are concerned with the issue of detecting changes and their signs from a data stream. For example, when given time series of COVID-19 cases in a region, we may raise early warning signals of an epidemic by detecting signs of changes in the data. We propose a novel methodology to address this issue. The key idea is to employ a new information-theoretic notion, which we call the differential minimum description length change statistics (D-MDL), for measuring the scores of change sign. We first give a fundamental theory for D-MDL. We then demonstrate its effectiveness using synthetic datasets. We apply it to detecting early warning signals of the COVID-19 epidemic using time series of the cases for individual countries. We empirically demonstrate that D-MDL is able to raise early warning signals of events such as significant increase/decrease of cases. Remarkably, for about \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$64\%$$\end{document}64% of the events of significant increase of cases in studied countries, our method can detect warning signals as early as nearly six days on average before the events, buying considerably long time for making responses. We further relate the warning signals to the dynamics of the basic reproduction number R0 and the timing of social distancing. The results show that our method is a promising approach to the epidemic analysis from a data science viewpoint.


Motivation
We address the issue of detecting changes and their signs in a data stream. For example, when given time series of the number of COVID-19 cases in a country, we may expect to warn the beginning of an epidemic by detecting changes and their signs.
Although change detection (see e.g. [1,2,3]) is a classical issue, it has remained open how the sign of changes can be found. In principle the degree of change at a given time point has been evaluated in terms of the discrepancy measure (e.g.. the Kullback-Leibler (KL) divergence) between probability 1 distributions of data before and after that time point (see e.g. [1,4]). It is reasonable to think that the differentials of the KL divergence may be related to signs of change. This is because the first differential of the KL divergence is a velocity of change while its second differential is an acceleration of change.
The problem is here that in real cases, the KL-divergence and its differentials cannot be exactly calculated since the true distribution is unknown in advance. A question lies in how we can estimate the discrepancy measure and their differentials from data when the parameter values are unknown.
The purpose of this paper is to answer the above question from an information-theoretic viewpoint based on the minimum description length (MDL) principle [5] (see also [6,7] for its recent advances). The MDL principle gives a strategy for evaluating the goodness of a probabilistic model in terms of codelength required for encoding the data where a shorter codelength indicates a better model. We apply this principle to change detection where a longer codelength indicates a more significant change. Along this idea, we introduce the notion called the differential MDL change statistics (D-MDL) for the measure of change signs. We theoretically and empirically justify this notion, and then apply it to the COVID- 19 pandemic analysis using open datasets.

Related Work
There are plenty of work on change detection (see e.g. [1,2,3,4,8,9,10,11]). In many of them, the degree of change has been related to the discrepancy measure for two distributions before and after a time point, such as likelihood ratio, KL-divergence. However, there is no work on relating the differential information such as the velocity of the change to change sign detection.
Most of previous studies in change detection are concerned with detecting abrupt changes [3]. In the scenario of concept drift [12], the issues of detecting various types of changes, including incremental changes and gradual changes, have been addressed. How to find signs of changes has been addressed in the scenarios of volatility shift detection [13], gradual change detection [14] and clustering change detection [15,16,17]. However, the notion of differential information has never been related to change sign detection.
The MDL change statistics has been proposed as a test statistics in the hypothesis testing for change detection [14,18]. It is defined as the difference between the total codelength required for encoding data for the non-change case and that for the change case at a specific time point t. A number of data compression-based change statistics similar to it have also been proposed in data mining [19,20,21]. However, any differential variation of the compression-based change statistics has never been proposed.

Significance of This Paper
The significance of this paper is summarized as follows: (1) Proposal of D-MDL and its use for change sign detection. We introduce a novel notion of D-MDL as an approximation of KL-divergence of change and its differentials. We then propose practical algorithms for on-line detection of change signs on the basis of D-MDL.
(2) Theoretical and empirical justification of D-MDL. We theoretically justify D-MDL in the hypothesis testing of change detection. We consider the hypothesis tests which are equivalent with D-MDL scoring. We derive upper bounds on the error probabilities for these tests to show that they converge exponentially to zero as sample size increases. The bounds on the error probabilities are used to determine a threshold for raising an alarm with D-MDL. We also empirically justify D-MDL using synthetic datasets. We demonstrate that D-MDL outperforms existing change detection methods in terms of AUC for detecting the starting point of a gradual change.
(3) Applications to COVID-19 pandemic analysis. On the basis of the theoretical and empirical advantages of D-MDL, we apply it to the COVID-19 pandemic analysis. We are mainly concerned with how early we are able to detect signs of outbreaks or the contraction of the epidemic for individual countries. The results showed that for about 64% of outbreaks in studied countries, our method can detect signs as early as about six days on average before the outbreaks. Considering the rapid spread, six days can earn us considerably long time for making responses, e.g., implementing control measures [22,23,24]. The earned time is especially precious in the presence of a considerably long period of the incubation of the COVID-19 [25,26,27]. Moreover, we analyze relations between the change detection results and social distancing events. One of findings is that for individual countries, an average of about four changes/change signs detected before the implementation of social distancing correlates a significant decline from the peak of daily new cases by the end of April.
The change analysis is a pure data science methodology, which detects changes only using statistical models without using differential equations about the time evolution. Meanwhile, SIR (Susceptible Infected Recovered) model [28] is a typical simulation method which predicts the time evolution of infected population with physics model-based differential equations. Although the fitness of the SIR model or its variants to COVID-19 data was argued in e.g. [29,30], the complicated situation of COVID-19 due to virus mutations, international interactions, highly variable responses from authorities, etc. does not necessarily make any simulation model perfect. Therefore, the basic reproduction number R0 [31] (a term in epidemiology, representing the average number of people who will contract a contagious disease from one person with that disease) estimated from the SIR model may not be precise. We empirically demonstrate that as a byproduct, the dynamics of R0 can be monitored by our methodology which only requires the information of daily new cases. The data science approach then may form a complementary relation with the simulation approach and gives new insights into epidemic analysis.
The software for the experiments is available at https://github.com/IbarakikenYukishi/ differential-mdl-change-statistics. An online detection system is available at https: //ibarakikenyukishi.github.io/d-mdl-html/index.html The rest of this paper is organized as follows: Section 2 introduces D-MDL and gives a theory of its use in the context of change sign detection. Section 3 gives empirical justification of D-MDL using synthetic datasets. Section 4 gives applications of D-MDL to the COVID-19 pandemic analysis. Section 5 gives concluding remarks.

Definitions of Changes and their Symptoms
X be a domain where we assume that X is discrete without loss of generality. For a random variable x ∈ X , let p(x; θ) = p θ (x) be the probability mass function specified by a parameter θ. Supposing that θ changes over time. In the case when θ gradually changes over time, we define the signs of change as the starting point of that change.
Let us consider the discrete time t. Let θ t be the parameter value of θ at time t. Let D(p||q) denote the Kullback-Leibler (KL) divergence between two probability mass functions p and q: We define the 0th, 1st, 2nd change degrees at time t as t−1 = D(p θ t+1 ||p θ t ) − 2D(p θ t ||p θ t−1 ) + D(p θ t−1 ||p θ t−2 ). When the parameter sequence {θ t : t ∈ Z} is known, we can define the degree of changes at any given time point. We can think of Φ t as the degree of change of the parameter value itself at time t. We can think of Φ (1) t , Φ (2) t as the velocity of change and the acceleration of change of the parameter at time t, respectively. All of them quantify the signs of change. However, the parameter values are not known in advance for general cases. The problem is how we can define the degree of changes when the true distributions are unknown.

Differential MDL Change Statistics
In the case where the true parameter value is unknown, the MDL change statistics has been proposed to measure the change degree in [14,18] from a given data sequence. Below we denote x a , . . . , x b = x b a . In the case of a = 1, we may drop off a and write it as x b .
When the parameter θ is unknown, we may estimate it asθ using the maximum likelihood estimation method from a given sequence x n . I.e.,θ = argmax θ p(x n ; θ). Note that the maximum likelihood function max θ p(x n ;θ) does not form a probability distribution of x n because x n p(x n ;θ) > 1. Thus we construct a normalized maximum likelihood (NML) distribution [33] by max θ p(x n ; θ) y n max θ p(y n ; θ) and consider the logarithmic loss for x n relative to this distribution by (1) which we call the NML codelength, where C n is called the parametric complexity defined as It is known in [32] that equation (1) is the optimal codelength that achieves the Shtarkov's minimax regret in the case where the parameter value is unknown. It is known in [33] that under some regularity condition for the model class, C n is asymptotically expanded as follows: where I(θ) is the Fisher information matrix defined by I(θ) = lim n→∞ 1 n E θ [− ∂ 2 log p(X n ;θ) ∂θ∂θ ], d is the dimensionality of θ, and lim n→∞ o(1) = 0.
According to [14], the MDL change statistics at time point t is defined as follows: {(− log p(x t 1 ; θ 1 ) + log C t ) + (− log p(x n t+1 ; θ 2 ) + log C n−t )} , The MDL change statistics is the difference between that the NML codelength of a given data sequence for non-change and that for change at time t. It is a generalization of the likelihood ratio test [1,34].
If the parameters are known such that θ 0 = θ 1 = θ 2 , then under the independence assumption, ≈ n − t n D(p θ 2 ||p θ 1 ). The last equality holds by the law of large numbers when n is sufficiently large, under the assumption that the true distribution is p(x t 1 ; θ 1 )p(x n t+1 ; θ 2 ). Letting t = n/2 and p θ 2 = p θ t+1 , p θ 1 = p θ t , we have D(p θ t+1 ||p θ t ) This implies that the MDL change statistics in equation (4) is equivalent with the KL-divergence between two probability distributions in the case where their parameters are known in advance.
Therefore, by extending the change degrees Φ t , Φ t , . . . to the cases where the true parameters are unknown, we may consider the following statistics: the αth differential MDL change statistics, which we abbreviate as the αth D-MDL (α = 0, 1, 2, . . . ). The 0th D-MDL is the original MDL change statistics as in [14].
For example, let us consider the uni-variate Gaussian distribution: where x ∈ R and θ = (µ, σ). We assume |µ| < µ max and σ min < σ < σ max where µ max < ∞, 0 < σ min , σ max < ∞ are hyper parameters. The 0th D-MDL at time t is calculated as whereσ 0 ,σ 1 andσ 2 denote the maximum likelihood (ML) estimators calculated for x n 1 , x t 1 and x n t+1 , respectively. C n is the normalizer of the NML calculated as The 1st and 2nd D-MDL are calculated according to equation (5) and equation (6) With the MDL principle, the test statistics is given as follows: For an accuracy parameter > 0, where Ψ (0) t is the 0th D-MDL as in equation (4). H 1 is accepted if h 0 (x n ; t, ) > 0, otherwise H 0 is accepted. We call this test the 0th D-MDL test.
We define Type I error probability as the probability that the test accepts H 1 although H 0 is true (false alarm rate) while Type II error probability as the one that the test accepts H 0 although H 1 is true (overlooking rate). The following theorem justifies the use of the 0th D-MDL in change detection. Theorem 2.1 [14] Type I and II error probabilities for the 0th D-MDL test are upper bounded as follows: where C n is the parametric complexity as in equation (2) and This theorem shows that Type I and II error probabilities in equation (10) and equation (11) converge to zero exponentially in n as n increases for some appropriate . We see that the error exponents depend on the parametric complexities of the model class as well as the Bhattacharyya distance in equation (12) between the null and composite hypotheses. In this sense the 0th MDL test is effective in change point 6 detection.

The 1st D-MDL test
Next we give a hypothesis testing setting equivalent with the 1st D-MDL scoring. We consider the situation where a change point exists at time either t or t + 1. Let us consider the following hypotheses: The null hypothesis H 0 is that the change point is t while the composite one H 1 is that it is t + 1.
We consider the following test statistics: For an accuracy parameter > 0, which compares the NML codelength for H 0 with that for H 1 . We accept H 1 if h 1 (x n ; t, ) > 0, otherwise we accept H 0 . We call this test the 1st D-MDL test. We easily see where Ψ (1) t is the 1st D-MDL. This implies that the 1st D-MDL test is equivalent with testing whether the 1st D-MDL is larger than or not. Thus the basic performance of discriminability of the 1st D-MDL can be reduced to that of the 1st D-MDL test.
Let p t is the probability distribution at time t. Note that if t + 1 is the only change point, Ψ t+1 ≈ D(p t+1 ||p t ) and if t is the only change point, Ψ (0) t ≈ D(p t ||p t−1 ). Hence this test is also equivalent with comparison of the degree of change at time t + 1 and that at time t.
The following theorem shows the basic property of the 1st D-MDL test. Theorem 2.2 Type I and II error probabilities for the 1st D-MDL test are upper bounded as follows: where C n is the parametric complexity as in equation (2), d is the Bhattacharyya distance as in equation (12) and . (The proof is in Sec. 1 of the supplementary information.) This theorem shows that Type I and II error probabilities in equation (15) and equation (16) converge to zero exponentially in n as n increases where the error exponents are related to the parametric complexities for the hypotheses as well as the Bhattacharyya distance between the null and composite hypotheses. In this sense the 1st MDL test is effective. Type I error probability in equation (15) will be used for determining a threshold of the alarm.

The 2nd D-MDL test
Next we consider a hypothesis testing setting equivalent with the 2nd D-MDL scoring. Suppose that change points exists either at time t or at t − 1 and t + 1.
H 0 is the hypothesis that a change happens at time t while H 1 is the hypothesis that two changes happen at time t − 1 and t. In H 0 , t is a single change point while in H 1 , t is a transition point between two close change points. Thus this hypothesis testing evaluates whether time t is a change point or a transition point of close changes.
The test statistics is: For an accuracy parameter > 0, Under the assumption t ≈ 2h 2 (x n ; t, ) + 2 (18) This implies that the 2nd D-MDL test is equivalent with testing whether the 2nd D-MDL is larger than 2 or not. Thus the basic performance of discriminability of the 2nd D-MDL can be reduced to that of the 2nd D-MDL test.
The following theorem shows the basic property of the 2nd D-MDL test. Theorem 2.3 Type I and II error probabilities for the 2nd D-MDL test are upper bounded as follows: where C n is the parametric complexity as in equation (2), d is the Bhattacharyya distance as in equation (12) and . This theorem can be proven similarly with Theorem 2.2. This theorem shows that Type I and II error probabilities in equation (19) and equation (20) converge to zero exponentially in n as n increases where the error exponents are related the sum of parametric complexities for the hypotheses as well as the Bhattacharyya distance between the null and composite hypotheses. In this sense the 2nd D-MDL test is effective. Type I probability in equation (19) will be used for determining the threshold in Sec.2.5.

Sequential Change Sign Detection with D-MDL
In previous sections, we considered how to measure the change sign scores at a specific time point t. In order to detect change signs sequentially for the case where there exist multiple change points, we can conduct sequential change sign detection using D-MDL in a similar manner with [14]. We give two variants of the sequential algorithms. One is the sequential D-MDL algorithm with fixed windowing while the other is that with adaptive windowing. In the former, we prepare a local window of fixed size to calculate D-MDL at the center of the window. We then slide the window to obtain a sequence of D-MDL change scores as with [14] (see also [35] for local windowing). We raise an alarm when the score exceeds the predetermined threshold β. The algorithm is summarized as follows: Sequential D-MDL algorithm with fixed windowing Given: 2h: window size, T : data length, β: threshold parameter at t by sliding the window. Make an alarm if and only if Ψ Next we design the sequential D-MDL algorithm with adaptive windowing. In [36], the sequential algorithm with adaptive windowing (SCAW2) was proposed by combining the 0th D-MDL with ADWIN algorithm [9] (see also [37] for adaptive windowing) where the window grows until the MDL change statistics exceeds a threshold. The threshold is determined so that the total number of detected change points is finite. The sequential D-MDL algorithm can be obtained by replacing the 0th D-MDL with general D-MDL in SCAW2. It outputs the size of window whenever a change point is detected. Sequential D-MDL algorithm with adaptive windowing Given: T : data length,

Hierarchical Sequential D-MDL Algorithm
Practically, we combine the algorithm with adaptive windowing for the 0th D-MDL and the algorithms with fixed windowing for the 1st and 2nd D-MDL. We call this algorithm the hierarchical sequential D-MDL algorithm. It is designed as follows. We first output not only a 0th D-MDL score but also a window size with the 0th D-MDL with adaptive windowing and raise an alarm when the window shrinks, i.e., equation (21) is satisfied. We then output the 1st and 2nd D-MDL scores using the window produced by the 0th D-MDL and raise alarms when the 1st or 2nd D-MDL exceeds the threshold so as to expect the 1st and 2nd D-MDL to detect change signs before the window shrinkage. Note that the window shrinks only with the 0th D-MDL, but neither with the 1st nor 2nd D-MDL.
In this algorithm, the threshold is determined so that Type I error probability in equation (15) is less than the confidence parameter δ 1 . That is, from equation (15) and equation (3), letting We employ the righthand side of equation (22) as the threshold of an alert of the 1st D-MDL. The threshold (2) w for the 2nd D-MDL Ψ (2) t can also be derived similarly with the 1st one. Note that by equation (18), the threshold is 2 times the accuracy parameter for the hypothesis testing. Letting δ 2 be the confidence parameter, we have (2) w ≥ 2(d log(w/2) + log(1/δ 2 )).
(23) We employ the righthand side of equation (23) as the threshold of an alert of the 2nd D-MDL. In practice, δ 1 and δ 2 are estimated from data (see Sec. 4.2).

Datasets
To evaluate how well D-MDL performs for abrupt/gradual change detection, we consider two cases; multiple mean change detection and multiple variance one.
In the case of multiple mean change detection, we constructed datasets as follows: each datum was independently drawn from the Gaussian distribution N (µ t , 1) where the mean µ t abruptly/gradually changed over time according to the following rule: In the case of abrupt changes, where H(x) is the Heaviside step function that takes 1 if x > 0 otherwise 0. In the case of gradual changes, H is replaced with the following continuous function: In the case of multiple variance change detection, each datum was independently drawn from the Gaussian distribution N (0, σ 2 t ) where the variance σ 2 t abruptly/gradually changed over time according to the following rule: In the case of abrupt changes, In the case of gradual changes, H is replaced with S as with the multiple mean changes.
We define a sign of a gradual change as the starting point of that change. In all the datasets, change points for abrupt changes and change signs for gradual changes were set at nine points: t = 1000, 2000, . . . , 9000.

Evaluation Metric
For any change detection algorithm that outputs change scores for all time points, letting β be a threshold parameter, we convert change-point scores {s t } into binary alarms {a t } as follows: a t = 1 (s t > β), 0 (otherwise). By varying β, we evaluate the change detection algorithms in terms of benefit and false alarm rate defined as follows: Let T be a maximum tolerant delay of change detection. When the change truly starts from t * , we define benefit of an alarm at time t as (otherwise), where t * is a change point for abrupt change, while it is a sign for gradual change.
The total benefit of alarm sequence a n−1 0 is calculated as B(a n−1 The number of false alarms is calculated as where Θ(t) takes 1 if and only if t is true, otherwise 0. We evaluate the performance of any algorithm in terms of AUC (Area under curve) of the graph of the total benefit B/ sup β B, against the false alarm rate (FAR) N/ sup β N , with β varying.

Methods for Comparison
In order to conduct the sequential D-MDL algorithm, we employed the univariate Gaussian distribution whose probability density function is given by equation (7). We employed three sequential change detection methods for comparison: (1) Bayesian online change point detection (BOCPD) [11]: A retrospective Bayesian online change detection method. It originally calculates the posterior of run length. We modified it to compute a change score by taking the expectation of the reciprocal of run length with respect to the posterior.
We conducted the sequential D-MDL algorithms with fixed window size in order to investigate their most basic performance in terms of the AUC metric. The sequential D-MDL algorithm with adaptive windowing outputs the window size rather than the D-MDL values themselves, hence in order to evaluate the effectiveness of the magnitude of D-MDL, the sequential D-MDL with fixed windowing is a better target for the comparison. All of CF, BOCPD, and ADWIN2 had some parameters, which we determined from 5 sequences so that the AUC scores were made the largest.

Results
The performance comparison is summarized in Table 1. We see that both for the datasets, in the case of abrupt changes, the 0th D-MDL performs best, while in the case of gradual changes, the 1st D-MDL performs best and the 2nd D-MDL performs worse than the 1st but better than the 0th. That matches our intuition. Because the 0th D-MDL was designed so that it could detect abrupt changes while the 1st one was designed so that it could detect starting points of gradual changes. The purpose of our analysis is to demonstrate the importance of monitoring the dynamics of the epidemic through detecting the occurrence of drastic outbreaks and their signs. We define outbreak as a significant increase in the number of cases in a country.
We are mainly concerned with the following two problems: 1. How early are the outbreak signs detected prior to outbreaks? 2. How are the outbreaks/outbreak signs related to the social distancing events?
As a byproduct, the analysis of the dynamics of the basic reproduction number R0 [31] is conducted, which can serve as supplementary information to the particular value estimated from the SIR model [38].

Data Source
We employed the data provided by European Centre for Disease Prevention and Control (ECDC) via https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-dis We studied 37 countries with no less than 10,000 cumulative cases by Apr. 30 since some countries started to ease the social distancing around the date. More details can be found in Sec. 2.1 of the supplementary information.

Data Modeling
We studied two data models by considering the value of R0, which by definition is the product of transmissibility, the average contact rate between susceptible and infected individuals, and the duration of infectiousness [38]. At the initial phase of an epidemic, R0 is larger than one [31]. And the cumulative cases may grow exponentially [39,40,41,42]. We thus employed the Malthusian growth model [43] because it is widely used for characterizing the early phase of an epidemic [41,42]. In particular, the cumulative cases at time t, C(t), grows according to the following equation: where C(0) is the number of cases at the start of an epidemic, and r is the growth rate. In the experiments, we took the logarithm of C(t) to obtain the linear regression of the logarithm growth with respect to time as follows: log C(t) = rt + log C(0).
(25) We modeled the residual error of the linear regression using the univariate Gaussian. When a change is detected in the modeling of the residual error, we examine the increase/decrease in the coefficient of the linear regression, i.e., r. We expect to detect changes in the parameter of the exponential modeling to monitor the increase/decrease of R0 because R0 is proportional to r as derived from the SIR [40].
In later phases, the exponential growth pattern may not hold. For instance, when R0 < 1, daily new cases would continue to decline and cease to exist [31]. Considering the complicated real scenarios, epidemic models with certain assumptions on the growth rate or R0 may not fit an epidemic at a given time. Therefore, we employed the univariate Gaussian distribution as in equation (7) to directly model the number of daily new cases, without assuming any patterns of the growth. The change in the parameter of the Gaussian modeling may reveal the relation between one and R0, i.e., R0 > 1 when daily new cases increase significantly or R0 < 1 when daily new cases decrease significantly.
We conducted the hierarchical sequential D-MDL algorithm as in Sec. 2.6. The confidence parameter δ for the 0th D-MDL as in equation (21) was set to be 0.05. Those for the 1st and 2nd D-MDL, i.e. δ 1 , δ 2 as in equation (22), equation (23) were determined as follows: we calculated the D-MDL scores around the time when the initial warning was announced by an authority; we determined δ 1 , δ 2 so that the score was the threshold. For example, the initial warning for Japan was set on Feb. 27, when the government required closing elementary, junior high and high schools. If the resulting δ 1 , δ 2 was larger than 1, it was set to be 0.99 because of the concept of confidence parameter.

Results for South Korea
We present two representative case studies (the results for all the studied countries are included in Sec. 2.2 of the supplementary information), South Korea and Japan (next subsection). The date of the implementation of social distancing was considered as Feb. 25 from which many non-essential services were closed. We present results in Fig. 1 and Fig. 2 for the Gaussian modeling and the exponential modeling, respectively. Change scores were normalized into the range from 0 to 1.
With the Gaussian modeling, there were several alarms raised before the social distancing event. For each alarm raised by the 0th D-MDL, the interpretation can be that a statistically significant increase of cases occurs, with reference to the cases in Fig. 1(a). Hereafter, a change that is detected by the 0th D-MDL and that corresponds to the increase of cases is regarded as an outbreak, which instantiates our definition of outbreak. The outbreak detection is the classic change detection. We further relate it to R0. Around the dates of the alarms, R0 > 1 was considered since we can confirm that the new infections resulted from community transmission. Correspondingly, R0 was estimated at 1.5 by an epidemiological study [44]. When the 0th D-MDL raised an alarm, the window size shrank to zero. Before that, both the 1st and the 2nd D-MDL raised alarms, which are interpreted as the changes in the velocity and the acceleration of the increase of cases, respectively. We can conclude that the 1st and the 2nd D-MDL were able to detect the signs of the outbreak by examining the velocity and the acceleration of the spread. The sign detection is the new concept with which we propose to supplement the classic change detection.
The 0th D-MDL raised several alarms after the event, and the latest ones corresponded to decreases of cases. It is not difficult to tell that the corresponding R0 was less than one. We think that the social distancing played a critical role in containing the spread because it can suppress R0 through reducing the contact rate, which was supported by studies e.g. [44,45,46,47]. Both the 1st and the 2nd D-MDL again demonstrated the capability of early sign detection.
As for the exponential modeling, there was only one alarm raised by the 0th D-MDL and it was after the social distancing event. Although the alarm was raised on Apr. 2, the date as the change point was within the window as of Apr. 2, and was identified as Mar. 1. It turned out that the alarm corresponded to a decrease in the coefficient of the linear regression. As we can see the cumulative cases in Fig. 2(a), the sub-curve before Mar. 1 might have experienced an exponentially upward trend while the sub-curve after that became almost flatten. We can conclude that R0 declined from a value larger than one to a value considerably smaller than one on Mar. 1. The 1st D-MDL and the 2nd D-MDL did not raise alarms of signs, which may be because R0 changed slowly as shown by only one alarm raised by the 0th D-MDL.

Results for Japan
The results are presented in Fig. 3 and Fig. 4. The policies of social distancing were enacted on Apr. 7.
With the Gaussian modeling, the observations were similar to those for South Korea before the event. But after that, the 0th D-MDL raised no alarms, implying that R0 did not decrease to a value considerably smaller than one. With respect to the exponential modeling, there were no alarms raised by the 0th D-MDL, showing that there were no statistically significant changes in the value of R0. As a comparison, the Gaussian modeling was effective and efficient at estimating the relation between one and R0. The exponential modeling was able to monitor the change in the value of R0, but in a relatively slower manner. The two models form a complementary relation on monitoring the dynamics of R0. For instance, for Japan, the Gaussian modeling showed that the value of R0 reminded at a value larger than one, and the exponential modeling showed that its value did not significantly change. In terms of sign detection, the Gaussian modeling successfully detected signs while the exponential modeling did not, which is because the Gaussian modeling deals with daily new cases whose growth pattern would significantly change in the same direction (i.e., either increase or decrease) within a short period with either R0 > 1 or R0 < 1 while the exponential modeling deals with cumulative cases whose growth pattern only changes significantly when R0 changes significantly, and a significant change of R0 within a short period did not happen in South Korea and Japan.

Summarization of Results on Individual Countries
This section presents two observations. The first is how early the signs can be detected prior to outbreaks with the Gaussian modeling. For the countries studied, there were 106 outbreaks in total. The number of outbreaks whose signs were detected by either the 1st or the 2nd D-MDL is 68, representing a detection rate of 64%. For each outbreak whose signs were detected, we measured the time difference between the earliest sign alarm and the outbreak alarm. The time difference in terms of the number of days is 6.25 (mean) ± 6.04 (standard deviation). Considering the fast spread, six days can buy us considerably long time to prepare for an outbreak, and even to avoid a potential outbreak.
As a comparison, the 1st D-MDL detected signs for 65 outbreaks and the 2nd D-MDL detected signs for 27 outbreaks. The smaller number for the 2nd D-MDL might be because the 1st D-MDL is better at detecting starting points of gradual changes, and is consistent with results on the synthetic datasets as in Table 1. The number of days before which the 1st D-MDL detected signs was 6.35 ± 5.91, and the number for the 2nd D-MDL was 5.56 ± 6.50. Note that not all the outbreaks allowed for sign detection since the 1st D-MDL sign detection requires one more data point and the 2nd one requires two more data points in the window than the 0th D-MDL, respectively. The number of outbreaks allowing for a 1st D-DML sign is 88 while the number for a 2nd one is 81. Hence, it turns out that some outbreaks occurred too quickly before signs can be detected.
Second, we observed that on average, countries responding faster in terms of a smaller number of alarms before the social distancing event saw a quicker contraction of daily new cases. As of Apr. 30, the curve of daily new infections in many countries had been flatten, and even started to be downward. Therefore, alarms for declines in the number of cases from the global peak number were raised for ten countries including Austria, China, Germany, Iran, Italy, Netherlands, South Korea, Spain, Switzerland, Table 2: Summarization of statistics where ± connects mean and standard deviation.

Measurement
Value Total number of outbreaks 106 Number of outbreaks whose signs were detected by either the 1st or the 2nd D-MDL 68 Detection rate of either the 1st or the 2nd D-MDL 64% Number of days before an outbreak for the first sign of either the 1st or the 2nd D-MDL 6.25 ± 6.04 Total number of outbreaks that allowed for the 1st D-MDL sign detection 88 Total number of outbreaks that allowed for the 2nd D-MDL sign detection 81 Number of outbreaks whose signs were detected by the 1st D-MDL 65 Number of outbreaks whose signs were detected by the 2nd D-MDL and Turkey in alphabetical order. These countries are referred to as downward countries.
In total, the number of all kinds of alarms raised before the event for downward countries was 4.30 ± 2.79 while it was 5.96 ± 4.22 for other countries. Separately, the number of alarms raised by the 0th, the 1st, and the 2nd D-MDL for downward countries was 2.00 ± 1.55, 1.70 ± 1.42, and 0.60 ± 0.66, respectively, while it was 1.85 ± 1.41, 3.08 ± 2.34, and 1.04 ± 1.53, respectively, for other countries. Therefore, if the social distancing is a viable option, it is suggested that the action should better be taken before it is late, e.g., later than four alarms. We further measured that it took an average of 30 days to suppress the spread if prompt social distancing policies were enacted. By contrast, the average number of days from the event to Apr. 30 is nearly 37 for non-downward countries, which is considerably more than the time used for suppressing the spread in downward countries. Table 2 summarizes all the statistics, and please refer to Sec. 2.3 of the supplementary information for more detailed numbers.

Conclusion
This paper has proposed a novel methodology for detecting signs of changes from a data stream. The key idea is to use the differential MDL change statistics (D-MDL) as a sign score. This score can be thought of as a natural extension of the differentials of the Kullback-Leibler divergence for measuring the degree of changes to the case where the true mechanism for generating data is unknown. We have theoretically justified D-MDL using the hypothesis testing framework and have empirically justified the sequential D-MDL algorithm using the synthetic data. On the basis of the theory of D-MDL, we have applied it to the COVID-19 pandemic analysis. We have observed that the 0th D-MDL found change points related to outbreaks and that the 1st and 2nd D-MDL were able to detect their signs several days earlier than them. We have further related the change points to the dynamics of the basic reproduction number R0. We have also found that the countries with no more than five changes/change signs before the implementation of social distancing tended to experience the decrease in the number of cases considerably earlier. This analysis is a new promising approach to the pandemic analysis from the view of data science.
Future work includes studying how we can integrate the change analysis such as our methodology with the conventional simulation studies such as SIR model. It is expected that our data science approach has a complementary relation with the simulation approach and gives new insights into epidemiology.

Proof of Theorem 2.2
Let the maximum likelihood estimator of θ i beθ i (i = 0, 1, 2, 3). Let us define the event as Type I error probability is evaluated as follows: Let the true parameter be θ * 0 and θ * 1 . where we have used the following relations: Next we evaluate Type II error probability. Let us define the event as Let the true parameter be θ * 2 and θ * 3 . Then under the event (2), The Type II error probability is upper-bounded as follows: where d and p θ * 2 * p θ * 3 are defined as in Theorem 2.1 and 2.2. This completes the proof.

Details of Experiments
Here, we give more details about the experiments on the change/change sign detection for all the countries studied.

Data Information
We employed the data provided by European Centre for Disease Prevention and Control (ECDC) via https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-dis For information, there are 37 countries that had no less than 10,000 cases in total by Apr. 30 We collected the date on which the social distancing was implemented from the information listed in the IHME COVID-19 predictions via https://covid19.healthdata.org/united-kingdom. If a certain country is not listed in the website, we referred to the Wikipedia page for the COVID-19 pandemic of the country, e.g., the COVID-19 pandemic in South Korea https://en.wikipedia. Ecuador was excluded from the list above because the social distancing is introduced to be related to changes incurred by declines in the number of cases and there was a very large number of cases in the initial phase of the epidemic in Ecuador. The large number might be an outlier due to the data collection procedure, and would make any changes after that date downward changes. But we still studied the change/change sign detection for Ecuador.

Results for All the Studied Countries
This section presents the results for all the studied countries with both the Gaussian modeling and the exponential modeling. Since the interpretation for each country is similar and there are many countries, we omit the explanations. Please refer to the main content for the illustration of South Korea and Japan. One point worthy mentioning is that with the exponential modeling, the 1st and the 2nd D-MDL raised alarms for Pakistan (   Fig. 76 shows the histograms of the number of days before which the 1st D-MDL and the 2nd D-MDL detected the first signs of outbreaks, respectively. Fig. 77 shows the histograms of the numbers of alarms (all the 0th, the 1st and the 2nd) before the implementation of social distancing for downward countries and non-downward ones, respectively. Fig. 78(a) shows the number of days between the implementation of social distancing and the date of the first alarm raised by the 0th D-MDL for the decline in the number of cases for downward countries. Fig. 78(b) shows the number of days between the implementation of social distancing and Apr. 30 for non-downward countries.