Comparison of incubation period distribution of human infections with MERS-CoV in South Korea and Saudi Arabia

The incubation period is an important epidemiologic distribution: it is often incorporated in case definitions, used to determine appropriate quarantine periods, and used as an input to mathematical modeling studies. Middle East Respiratory Syndrome (MERS) is an emerging infectious disease in the Arabian Peninsula. There was a large outbreak of MERS in South Korea in 2015. We examined the incubation period distribution of MERS coronavirus infection for cases in South Korea and in Saudi Arabia. Using parametric and nonparametric methods, we estimated a mean incubation period of 6.9 days (95% credibility interval: 6.3–7.5) for cases in South Korea and 5.0 days (95% credibility interval: 4.0–6.6) among cases in Saudi Arabia. In a log-linear regression model, the mean incubation period was 1.42 times longer (95% credibility interval: 1.18–1.71) among cases in South Korea compared to Saudi Arabia. The variation that we identified in the incubation period distribution between locations could be associated with differences in ascertainment or reporting of exposure dates and illness onset dates, differences in the source or mode of infection, or environmental differences.

similar in South Korea and in Saudi Arabia. The risk of death was much higher in the cases from Saudi Arabia (16/34; 47%) compared to the cases from South Korea (26/115; 23%) (p = 0.01, chi-squared test). Figure 1A,B compares alternative parametric models with the non-parametric maximum likelihood estimator. Visual inspection of the parametric curves against the Turnbull estimate in Fig. 1A,B confirms that all of the two-parameter distributions provided reasonable fits, while the one-parameter exponential distribution was inferior. Among the cases in South Korea, the gamma and Weibull parametric models (Fig. 1C) had the best BIC values, with an estimated mean of 6.9 days (95% credibility interval: 6.3-7.5) (Table 2). Among cases from Saudi Arabia, the lognormal distribution (Fig. 1D) had the best BIC value, with an estimated mean of 5.0 days (95% credibility interval: 4.0-6.6) (Table 2). The other fitted two-parameter distributions had generally similar means, with 95th percentiles in the range 10-14 days and 99th percentiles in the range 14-22 days (Table 2). Except for the exponential distribution, the various two-parameter distributions had similar BIC values among the cases in each location.
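The mortality comparison above can be checked directly from the reported counts. The following is a minimal sketch of a Pearson chi-squared test on a 2x2 table, assuming a Yates continuity correction; the text does not state whether the authors applied one, and the function name `chi2_2x2` is introduced here for illustration only. With the correction, the p-value for these counts rounds to 0.01.

```python
import math

def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-squared test (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction (an assumption of this sketch)
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # Upper tail of a chi-squared variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Deaths and survivors: Saudi Arabia 16 of 34, South Korea 26 of 115
stat, p = chi2_2x2(16, 34 - 16, 26, 115 - 26)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```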
Since a lognormal distribution gave a good fit to the data in both locations, we pooled information from both locations and fitted a log-linear regression model to the data. Using that model, we found that the mean incubation period was 1.40 times longer (95% credibility interval: 1.15-1.71) among cases in South Korea compared to Saudi Arabia without adjustment, and the estimate was almost the same after adjustment for age and sex (Table 3).

Discussion
Using all available data for the recent outbreak of MERS-CoV infections in South Korea, and published data from cases in Saudi Arabia, we estimated that the mean incubation period was 6.9 days for cases in South Korea and 5.0 days for cases in Saudi Arabia. In various parametric models, the 95th percentiles were in the range 10-14 days, which is consistent with the currently used case definitions 6. While it is difficult to estimate the right-hand tail of the incubation period distribution from small samples, we estimated that the 99th percentile could be as long as 14-22 days, which indicates that long incubation periods are possible. In South Korea, one of the 186 cases was reported to have an incubation period of 21 days or longer, although it has been suggested that immunosuppression in that person could have delayed the onset of symptoms 15.
Our estimates for the incubation period distribution of MERS-CoV infections in Saudi Arabia are consistent with the previous estimates of Assiri et al. 8 for a hospital outbreak in the eastern province of Saudi Arabia, which were based on 23 cases and gave an estimated median incubation period of 5.2 days (95% confidence interval: 1.9-14.7 days). Our estimates for cases in South Korea are also close to other reports, with an estimated mean incubation period of 6.7 days (95% credibility interval: 6.1-7.3 days) 9 and a median of 6 days in one hospital 9,10.
We found a significant difference in mean incubation periods between the cases in South Korea and in Saudi Arabia (Table 3). This difference could be related to the transmission dynamics of MERS-CoV infection: the outbreak in South Korea involved only secondary cases and longer transmission chains 9, whereas a majority (74%) of the cases in Saudi Arabia included in this study came from the same hospital 8, where it has already been shown that the transmission chain was shorter, with multiple separate animal-to-human infections 16,17. Potential direct transmission could be related to a higher infecting dose and higher virulence of the strain, which could lead to a shorter incubation period 18. A recent study of MERS-CoV transmission during the outbreak in South Korea reported different estimates of the incubation period depending on the intensity of exposure and/or inoculation route 19. Indeed, the authors showed that the incubation period was significantly shorter among patients who were exposed to the index case in the same zone of the emergency room (median: 5 days; interquartile range (IQR): 4-8 days) compared with patients from different zones (median: 11 days; IQR: 6-12 days). These results strengthen the hypothesis that a higher infecting dose could have been transmitted by the index case, leading to a shorter incubation period compared with cases associated with "indirect" transmission, which may have been responsible for transmission in different zones of the emergency room. Further investigation on human-to-human and human-to-animal transmission dynamics would improve our understanding of the potential role of the exposure route on the incubation period. It is also possible that this difference is an artifact of different approaches to data collection or reporting in South Korea and in Saudi Arabia.
Our study had some limitations. First, we did not have access to original patient records: our data on MERS-CoV infections in South Korea were based on publicly available information, and we relied on published data for a relatively small number of cases in Saudi Arabia. There is a potential concern that symptoms and symptom onset dates might be reported differentially in the two locations. In the cases of MERS-CoV infection reported by Assiri et al. from Saudi Arabia, 20/23 cases had fever on the day of symptom onset 11. In the outbreak in South Korea, fever was part of the case definition, and symptom onset may reflect the date of onset of fever rather than other symptoms 20.
In conclusion, accurate and rapid estimates of the length of the incubation period are required during an outbreak to advise public health policy, to specify case definitions, and to facilitate robust mathematical modeling. In this paper, we estimated the length of the incubation period of MERS-CoV infections using two different datasets from Saudi Arabia and from South Korea and showed that the incubation period of MERS-CoV infections appeared to vary depending on the location of the outbreak.

Methods
Sources of data. For the outbreak in South Korea, we retrieved publicly available data from multiple sources, including the Korea Center for Disease Control and Prevention, the Korean Ministry of Health and Welfare, the World Health Organization, and local Korean news reports, to compile a line list of all confirmed cases that had been reported by 27 July 2015. We used the most up-to-date information from the official reports that were published daily during the outbreak by the Center for Disease Control and Prevention and the Ministry of Health and Welfare. The official reports included a brief description of each confirmed case, including demographic characteristics (e.g., age and sex), dates of exposure, date of symptom onset and outcome. Exposure was mostly recorded as an interval of 2 to 15 days during which transmission was thought to have occurred, rather than as an exact date of presumed transmission.
Table 3. Factors associated with the incubation period of MERS-CoV infection. Footnotes: (1) The coefficients (β) of the multiple linear regression were estimated using Markov Chain Monte Carlo (10,000 runs) with incubation period as the outcome variable and age, sex and location as predictors; 10,000 samples from the posterior distributions of the incubation periods T for each patient were used in the multiple regression model. (2) 10,000 samples of the incubation periods T for each patient were drawn using MCMC.
Scientific Reports | 6:35839 | DOI: 10.1038/srep35839
Information on cases of MERS-CoV infection in the Middle East was retrieved from four published studies that provided individual patient data from Saudi Arabia 11-14. We selected only the cases with available exposure information and collected data including demographic characteristics (e.g., age and sex), dates of exposure and onset of symptoms, geographical location of the exposure, and final outcome. For both locations, the day of symptom onset was defined as the day when clinical symptoms related to MERS-CoV infection first occurred, including non-specific symptoms such as fever, chills, shortness of breath, cough, sputum, sore throat, myalgia, diarrhea, nausea and vomiting 1,11-14,20,21.
Statistical analyses. The incubation period $T_k$ for each case $k$ is defined as $T_k = S_k - X_k$, where $S_k$ is the symptom onset time and $X_k$ the infection time. Infection events are rarely observed directly; they are usually interval-censored. If case $k$ reported that exposure to infection occurred between times $L_k$ and $U_k$, where $L_k \le X_k \le U_k$, the incubation time is therefore bounded by the interval $(S_k - U_k, S_k - L_k)$. These interval-censored data are a special type of survival data, and it is possible to "reverse" the time axis, considering $S_k$ as the origin and $X_k$ as the outcome time, if the density function for infection is uniform in chronologic time 22. This condition should be reasonable here in the setting of MERS-CoV infections, with each exposure interval being relatively short, and reversing the time axis allowed us to use standard approaches for interval-censored data. We added 0.5 to each upper bound and subtracted 0.5 from each lower bound to give appropriate intervals in continuous time and to account for uncertainty in the reported exposure times 23.
For example, an exposure that was reported as occurring between two and three days before illness onset would be written as an incubation period censored on the interval (1.5, 3.5) instead of (2, 3).
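The construction of the censoring interval described above can be sketched as follows. This is an illustration only: the function name `incubation_interval`, the use of integer day indices, and the clamping of the lower bound at zero are assumptions of this sketch and are not taken from the original analysis.

```python
def incubation_interval(onset_day, exposure_start, exposure_end, pad=0.5):
    """Return (lower, upper) bounds on the incubation period in days.

    The infection time X is only known to lie in [exposure_start, exposure_end],
    so T = S - X lies in (S - U, S - L). Each bound is widened by `pad` days
    to express the interval in continuous time.
    """
    if not exposure_start <= exposure_end <= onset_day:
        raise ValueError("expected exposure_start <= exposure_end <= onset_day")
    lower = (onset_day - exposure_end) - pad
    upper = (onset_day - exposure_start) + pad
    # Assumption of this sketch: an incubation period cannot be negative
    return max(lower, 0.0), upper

# Onset on day 10, exposure between day 7 and day 8
# (that is, three to two days before illness onset)
print(incubation_interval(10, 7, 8))  # (1.5, 3.5)
```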
To deal with interval-censored exposure data, the most basic approach is to impute each infection date as the midpoint of the exposure interval, but this tends to overestimate the incubation period, whose distribution is typically right-skewed 24. Non-parametric estimation of a distribution based on interval-censored data can be done with the generalized non-parametric maximum likelihood estimator developed by Turnbull 25. The incubation period can often be appropriately characterized by parametric distributions that have been used previously, such as the gamma 9,26, Weibull 9,27,28, lognormal 9,18, log-logistic, and exponential distributions. We fitted these five distributions and estimated the parameters of each distribution using Markov Chain Monte Carlo (MCMC) in a Bayesian framework. The incubation period distribution was first estimated from the interval-censored data, and the parametric models were then compared with one another (using the Bayesian Information Criterion) and with the Turnbull non-parametric estimate 25.
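To make the interval-censored likelihood and the BIC comparison concrete, the sketch below fits a lognormal and an exponential distribution to a handful of censoring intervals. It is not the authors' procedure: it swaps the Bayesian MCMC fit for a crude grid-search maximum likelihood fit, the intervals are hypothetical values invented for illustration (not the study data), and all function names are introduced here. Each case contributes the probability mass that the candidate distribution places on its interval, $F(b) - F(a)$.

```python
import math
from statistics import NormalDist

def lognormal_cdf(t, mu, sigma):
    """CDF of a lognormal distribution with log-scale parameters mu and sigma."""
    return NormalDist(mu, sigma).cdf(math.log(t)) if t > 0 else 0.0

def exponential_cdf(t, rate):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0

def log_likelihood(intervals, cdf, *params):
    """Sum over cases of log P(a < T < b) for interval-censored observations."""
    total = 0.0
    for a, b in intervals:
        prob = cdf(b, *params) - cdf(a, *params)
        total += math.log(max(prob, 1e-300))  # floor avoids log(0)
    return total

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * loglik

# Hypothetical censoring intervals in days (illustrative only)
intervals = [(1.5, 3.5), (3.5, 8.5), (4.5, 6.5), (2.5, 10.5),
             (5.5, 7.5), (6.5, 12.5), (3.5, 5.5), (7.5, 9.5)]
n = len(intervals)

# Crude grid search standing in for a proper optimizer or an MCMC sampler
grid_lognormal = [(m / 100, s / 100)
                  for m in range(100, 251, 5) for s in range(10, 151, 5)]
ll_ln, mu_hat, sigma_hat = max(
    (log_likelihood(intervals, lognormal_cdf, m, s), m, s) for m, s in grid_lognormal)
grid_rate = [r / 1000 for r in range(20, 1001, 5)]
ll_exp, rate_hat = max(
    (log_likelihood(intervals, exponential_cdf, r), r) for r in grid_rate)

bic_lognormal = bic(ll_ln, 2, n)
bic_exponential = bic(ll_exp, 1, n)
print(f"lognormal:   mu={mu_hat:.2f} sigma={sigma_hat:.2f} BIC={bic_lognormal:.1f}")
print(f"exponential: rate={rate_hat:.3f} BIC={bic_exponential:.1f}")
```

On these illustrative intervals the two-parameter lognormal attains the lower BIC despite its extra parameter, which mirrors the qualitative pattern reported for the one-parameter exponential model.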
To evaluate potential factors such as age, sex and geographic location that could be associated with the length of the incubation period, we used a linear regression model on the log of the incubation period (assuming that incubation periods generally followed lognormal distributions), which can also be referred to as a log-linear model. The multiple linear regression model used in this study is based on the following equation:
$$\log(\mathrm{IncP}_i) = \beta_0 + \beta_1\,\mathrm{Age}_i + \beta_2\,\mathrm{Sex}_i + \beta_3\,\mathrm{Location}_i + \varepsilon_i \qquad (1)$$
where $\mathrm{IncP}_i$ is the length of the incubation period for individual $i$, the $\beta_j$ are the regression coefficients, estimated with MCMC using flat priors, the explanatory variables are labeled directly in equation (1), and $\varepsilon_i$ is the disturbance term, independently and identically normally distributed with $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$ for all $i$. We used two different approaches to estimate model parameters, including an exact likelihood method and a resampling method 29.
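The link between a coefficient of the log-linear model and a statement such as "1.40 times longer" can be made explicit with a short derivation. It assumes, as in the model description above, normally distributed errors with a common variance $\sigma^2$; the symbol $\beta_{\mathrm{loc}}$ is a label introduced here for the coefficient of a 0/1 location indicator, and $\mu_i$ denotes the linear predictor for case $i$.

```latex
\begin{align*}
\log(\mathrm{IncP}_i) &\sim \mathcal{N}(\mu_i, \sigma^2)
  \;\Longrightarrow\;
  E[\mathrm{IncP}_i] = \exp\!\left(\mu_i + \tfrac{\sigma^2}{2}\right), \\[4pt]
\frac{E[\mathrm{IncP} \mid \text{location} = 1]}
     {E[\mathrm{IncP} \mid \text{location} = 0]}
  &= \frac{\exp\!\left(\mu_0 + \beta_{\mathrm{loc}} + \sigma^2/2\right)}
          {\exp\!\left(\mu_0 + \sigma^2/2\right)}
   = \exp(\beta_{\mathrm{loc}}), \\[4pt]
\exp(\beta_{\mathrm{loc}}) = 1.40
  &\;\Longleftrightarrow\;
  \beta_{\mathrm{loc}} = \ln(1.40) \approx 0.34 .
\end{align*}
```

Under these assumptions, the interval 1.15-1.71 on the ratio scale corresponds to roughly 0.14 to 0.54 on the coefficient scale, since $\ln(1.15) \approx 0.14$ and $\ln(1.71) \approx 0.54$.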
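Approach 1: exact likelihood approach. Equation (1) can be rewritten as
$$\varepsilon_i = \log(\mathrm{IncP}_i) - \beta_0 - \beta_1\,\mathrm{Age}_i - \beta_2\,\mathrm{Sex}_i - \beta_3\,\mathrm{Location}_i \qquad (2)$$
and consequently, using equation (2), we can define the following pdf of the normal distribution:
$$P(\varepsilon_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\varepsilon_i^2}{2\sigma^2}\right) \qquad (3)$$
We defined the probability $q_i$ over $[\mathrm{IncP}_i^{L}, \mathrm{IncP}_i^{U}]$, the range of the incubation period for case $i$, where $(k, \theta)$ is the pair of parameters of the gamma distribution.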
Approach 2: resampling approach. We defined another multiple linear regression model using incubation times resampled from the 10,000 posterior samples. In this approach, the probability P(ε i ) was similarly defined as in equation (3) and for each patient with interval-censored exposure data, we estimated 10,000 posterior samples for the incubation time using MCMC in order to simulate the incubation period distribution for each patient. We used the same likelihood as defined in equation (5) using the resampled incubation time for each patient.
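The idea behind the resampling approach can be illustrated with a simplified stand-in. In the sketch below, each case's incubation time is drawn repeatedly from a lognormal distribution truncated to that case's censoring interval, and the log-linear model is refitted on every draw. The sketch departs from the method described above in several labeled ways: the cases and the lognormal parameters `mu` and `sigma` are hypothetical, location is the only predictor (so the least-squares slope reduces to a difference in mean log incubation time), and the fit uses ordinary least squares rather than MCMC. Because the distribution parameters are held fixed, the spread of the ratios reflects only the uncertainty from censoring.

```python
import math
import random
from statistics import NormalDist, mean

def sample_truncated_lognormal(lower, upper, mu, sigma, rng):
    """Draw one time from lognormal(mu, sigma) restricted to (lower, upper)."""
    dist = NormalDist(mu, sigma)
    p_lo = dist.cdf(math.log(lower)) if lower > 0 else 0.0
    p_hi = dist.cdf(math.log(upper))
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain (0, 1)
    return math.exp(dist.inv_cdf(u))

# Hypothetical cases: (lower bound, upper bound, location indicator)
cases = [(5.5, 9.5, 1), (6.5, 8.5, 1), (7.5, 10.5, 1),
         (1.5, 3.5, 0), (2.5, 4.5, 0), (1.5, 4.5, 0)]
mu, sigma = 1.7, 0.5          # hypothetical lognormal parameters
rng = random.Random(2015)     # fixed seed for reproducibility
n_draws = 2000

ratios = []
for _ in range(n_draws):
    logs = {0: [], 1: []}
    for lower, upper, location in cases:
        t = sample_truncated_lognormal(lower, upper, mu, sigma, rng)
        logs[location].append(math.log(t))
    # With a single 0/1 predictor, the least-squares slope is the
    # difference between the two group means of log incubation time
    beta_location = mean(logs[1]) - mean(logs[0])
    ratios.append(math.exp(beta_location))

ratios.sort()
median_ratio = ratios[n_draws // 2]
lower_95 = ratios[int(0.025 * n_draws)]
upper_95 = ratios[int(0.975 * n_draws)]
print(f"ratio of incubation periods: {median_ratio:.2f} ({lower_95:.2f}-{upper_95:.2f})")
```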