The age distribution of mortality from novel coronavirus disease (COVID-19) suggests no large difference of susceptibility by age

Among Italy, Spain, and Japan, the age distributions of COVID-19 mortality show only small variation even though the number of deaths per country varies widely. To identify what determines this pattern, we constructed a mathematical model describing the transmission dynamics and natural history of COVID-19 and analyzed mortality data from Italy, Spain, and Japan. By fitting the model to reported data, we estimated the parameter that describes the age-dependency of susceptibility, accounting for changes in contact patterns during the COVID-19 epidemics and for the fraction of symptomatic infections. Our study revealed that if either the mortality rate or the fraction of symptomatic infections among all COVID-19 cases does not depend on age, then unrealistically different age-dependencies of susceptibility to COVID-19 among Italy, Japan, and Spain are required to explain their similar age distributions of mortality despite their different basic reproduction numbers (R0). Age-dependent susceptibility alone therefore cannot explain the robust age distribution of COVID-19 mortality in these three countries; instead, our results suggest that the age-dependencies of (i) the mortality rate and (ii) the fraction of symptomatic infections among all COVID-19 cases determine the age distribution of mortality from COVID-19.

Since its emergence, coronavirus disease 2019 (COVID-19) has caused a pandemic with a huge number of cases worldwide 1 . As of May 29, 2020, the number of confirmed cases per 100,000 population was 382.3 in Italy, 507.2 in Spain, and 13.2 in Japan 1 . Elderly individuals have been reported to account for a large proportion of fatal cases, producing marked heterogeneity in the age distribution of mortality [2][3][4] .
The expected number of deaths (hereafter referred to as mortality) is the product of the number of cases and the mortality rate among cases (hereafter referred to as the mortality rate). Two epidemiological factors may underlie the heterogeneity of mortality by age: (i) the age-dependency of susceptibility to infection, which shapes the heterogeneity in the number of cases, and (ii) the age-dependency of severity, which shapes the heterogeneity in the mortality rate, e.g., the rate at which infected individuals become symptomatic, severe, or fatal cases. For the first factor, higher susceptibility generates more infections and therefore more fatal cases. Heterogeneity in susceptibility by age has been suggested by analyses of epidemiological data from Wuhan, China 4-6 and from Iceland 7 . For the second factor, greater severity results in a higher mortality rate and hence more fatal cases. This mechanism is also plausible because older age and comorbidities, which become more common with age, have been reported as risk factors for severe COVID-19 [8][9][10][11][12][13] .
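The two factors enter mortality multiplicatively. As a minimal illustration (notation introduced only here: $C_n$ is the number of cases, $\mu_n$ the mortality rate, and $M_n$ the expected number of deaths in age group $n$), the share of deaths in age group $n$ is:

$$
M_n = C_n\,\mu_n, \qquad
p_n = \frac{M_n}{\sum_m M_m} = \frac{C_n\,\mu_n}{\sum_m C_m\,\mu_m}.
$$

If a larger epidemic scales the number of cases by a common factor, $C_n = A\,c_n$ with an age profile $c_n$ that does not change with $A$, then $A$ cancels and $p_n = c_n\mu_n / \sum_m c_m\mu_m$ depends only on the age profiles. This is an idealization: when susceptibility varies by age, the profile $c_n$ itself changes with the size of the epidemic.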
Although this has not yet been shown for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of COVID-19, age-dependent enhancement of severity has been suggested for SARS coronavirus by analyses of innate immune responses in the BALB/c mouse model [14][15][16] . Additionally, antibody-dependent enhancement (ADE) may contribute to the observed age-dependency of severity, as suggested in SARS and Middle East respiratory syndrome (MERS) cases [17][18][19][20][21][22] .
Interestingly, the age distribution of mortality from COVID-19 (the proportion of all deaths occurring in each age group) is similar among Italy, Japan, and Spain, even though the numbers of deaths differ greatly among them [23][24][25] (Fig. 1). In Italy, as of May 13, 2020, the reported numbers of deaths were 3 in 0-9 years old (yo), 0 in 10-19 yo, 11 in 20-29 yo, 58 in 30-39 yo, 257 in 40-49 yo, 1,051 in 50-59 yo, 3,107 in 60-69 yo, and 25,038 in 70+ yo. In Japan, as of May 7, 2020, they were 0 in 0-9 yo, 0 in 10-19 yo, 0 in 20-29 yo, 2 in 30-39 yo, 8 in 40-49 yo, 16 in 50-59 yo, 44 in 60-69 yo, and 330 in 70+ yo. In Spain, as of May 12, 2020, they were 2 in 0-9 yo, 5 in 10-19 yo, 23 in 20-29 yo, 61 in 30-39 yo, 198 in 40-49 yo, 607 in 50-59 yo, 1,669 in 60-69 yo, and 16,253 in 70+ yo. According to projections by the United Nations 26 , the 2020 population size (in millions) was 4.99 in 0-9 yo, 5.73 in 10-19 yo, 6.10 in 20-29 yo, 7.00 in 30-39 yo, 9.… in 40-49 yo, 7.05 in 50-59 yo, 5.34 in 60-69 yo, and 6.94 in 70+ yo. The large differences in the number of deaths among these countries suggest large differences in their basic reproduction numbers (R0). The similar age distributions therefore suggest that the age distribution of mortality from COVID-19 is largely independent of R0. From this independence, we expect the contribution of age-dependent susceptibility to the age distribution of mortality to be small. As we show in this paper, age-dependent severity acts proportionally on the number of deaths in each age group and therefore produces an age distribution of mortality that is robust to R0, whereas an age distribution of mortality shaped by age-dependent susceptibility depends strongly on R0 and varies with it.
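The age distributions compared above are simply each age group's share of a country's reported deaths. A short sketch computing them from the counts listed in this paragraph (the names `DEATHS` and `age_distribution` are illustrative, not from the original analysis):

```python
# Reported COVID-19 deaths by age group, as listed in the text.
AGE_GROUPS = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
DEATHS = {
    "Italy": [3, 0, 11, 58, 257, 1051, 3107, 25038],  # as of May 13, 2020
    "Japan": [0, 0, 0, 2, 8, 16, 44, 330],            # as of May 7, 2020
    "Spain": [2, 5, 23, 61, 198, 607, 1669, 16253],   # as of May 12, 2020
}


def age_distribution(counts):
    """Share of all deaths falling in each age group."""
    total = sum(counts)
    return [c / total for c in counts]


for country, counts in DEATHS.items():
    dist = age_distribution(counts)
    print(f"{country}: total={sum(counts)}, share 70+={dist[-1]:.3f}")
```

The totals (29,525, 400, and 18,818) match those quoted in the Discussion, and in all three countries more than four fifths of reported deaths fall in the 70+ group, even though the totals differ by almost two orders of magnitude.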
To understand why the age distribution of mortality is robust despite varied R0 values, we constructed a mathematical model describing the transmission dynamics of COVID-19 and analyzed the impact of age-dependent susceptibility on the age distribution of mortality. Because heterogeneity in social contacts by age may also contribute to the age distribution of mortality, our model accounted for age- and country-specific contact patterns and for the behavioral change outside the household during the outbreak. We also estimated and compared the age-dependency of susceptibility in Japan, Italy, and Spain to assess whether susceptibility differs among age groups.

Results
Our results show that variation of susceptibility among age groups, measured by the exponent parameter φ, can explain the age distribution of mortality by COVID-19 (Fig. 2a). However, the age distribution of mortality produced by age-dependent susceptibility is influenced by the value of R0 (Fig. 2b), so it cannot explain the similarity of the age distributions of mortality among Italy, Japan, and Spain. In contrast, if susceptibility is constant among age groups, R0 has only a small impact on the age distribution of mortality (Fig. 3).

Discussion
In the present study, we explored the role of susceptibility in explaining the age distribution of mortality by COVID-19. The age distributions of mortality from COVID-19 are quite similar among Italy, Japan, and Spain (Fig. 1): in pairwise comparisons, only the difference between Italy and Spain is significant (p < 0.05, Wilcoxon rank sum test with Bonferroni correction). In contrast, the numbers of deaths differ widely (29,525 in Italy, 400 in Japan, and 18,818 in Spain), and so do the R0 values: 2.4-3.3 for Italy 27,28 , 1.7 for Japan 29 , and 2.9 for Spain 30 . If the variation of mortality by age were determined only by the age-dependency of susceptibility, the age distribution of mortality would be affected by R0, as shown in Fig. 2b. However, the age distributions of mortality are similar among Italy, Japan, and Spain despite their quite different R0 values. Indeed, unrealistically different values of φ among these three countries are required to explain their age distributions of mortality in both settings: (i) mortality does not depend on age, and (ii) the fraction of infections that become symptomatic among all COVID-19 cases, f_s, does not depend on age.
Figure 3. Sensitivity of the age distribution of mortality to the transmission coefficient β, assuming that age-dependent mortality is proportional to cCFR per age group. All parameters except β were fixed at the values for Spain.
www.nature.com/scientificreports/
Although we cannot fully reject age-dependency of susceptibility, our results suggest that susceptibility does not vary greatly with age and that the age-dependency of severity contributes substantially to the observed age distribution of mortality. The estimates of φ obtained assuming an age-independent fraction of symptomatic infections were smaller than those obtained assuming age-independent mortality. This suggests that the age-dependency of the confirmed case fatality rate (cCFR), which can be biased by age-dependent differences in the fraction of symptomatic infections among all cases, partially explains the age distribution of mortality. Indeed, when we assumed that the fraction of symptomatic infections did not depend on age, the estimate of φ for Japan was close to zero in all scenarios for the fraction of symptomatic infections, meaning that susceptibility is constant among age groups (Fig. 5). Although the estimates of φ for Italy and Spain were not close to zero, this does not straightforwardly imply that susceptibility is age-dependent, because an alternative explanation remains: an age-dependent fraction of symptomatic infections, rather than susceptibility, could produce this age-dependency. Unfortunately, because detailed data on the age-dependent fraction of symptomatic infections and on the rate of diagnosis for COVID-19 are not yet available, we cannot conclude which factor (susceptibility or the fraction of symptomatic infections among all cases) produced the observed age-dependency.
Wu et al. 4 showed that susceptibility to symptomatic infection varies by age. Susceptibility to symptomatic infection can be expressed as the product of susceptibility to infection and the fraction of infections that become symptomatic. To understand susceptibility itself (i.e., without conditioning on symptom onset), estimates of the age-dependent fraction of symptomatic infections are required.
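The identifiability problem can be written out explicitly. Using notation introduced here only ($\sigma_n$ for susceptibility and $f_{s,n}$ for a hypothetical age-specific fraction of symptomatic infections in age group $n$), data on symptomatic infections constrain only the product, and any positive age profile $g_n$ (with $f_{s,n}/g_n \le 1$) can be moved between the two factors without changing it:

$$
\sigma^{\mathrm{sym}}_n = \sigma_n\, f_{s,n} = \left(\sigma_n\, g_n\right)\left(\frac{f_{s,n}}{g_n}\right).
$$

For example, susceptibility rising with age combined with a flat symptomatic fraction is indistinguishable, from symptomatic cases alone, from flat susceptibility combined with a symptomatic fraction that rises with age.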
Understanding the mechanism behind the age-dependency of COVID-19 mortality requires an accurate age-dependent mortality rate. The mortality data used in this study might not cover all deaths from COVID-19 infections. Estimating the age-dependent mortality rate requires an accurate estimate of the case fatality rate. However, the number of cases, the denominator of the case fatality rate, is difficult to estimate for COVID-19 because of changes in the testing rate [31][32][33] , changes in the case definition 34 , selection biases 35 , and the delay between symptom onset and death 12,36-38 , as has also been experienced in the surveillance of other emerging diseases 39,40 . To address this problem, active epidemiological surveillance, such as a large-scale cohort study including real-time detection of infections, should be considered. From a modelling perspective, the age-dependency of severity should be carefully taken into account when modelling COVID-19 mortality. In particular, previous mathematical models of ADE employed three types of assumptions 41 : increased susceptibility to infection 42,43 , increased transmissibility once infection has occurred 42,44,45 , and increased severity and/or mortality associated with infection 46 . Based on our results and on biological and epidemiological observations from past SARS and MERS cases, the "increasing severity" assumption should be taken into account when analyzing SARS-CoV-2 epidemics.
We modelled age-specific susceptibility as a power-law function because mortality by COVID-19 increases monotonically with age (Fig. 1). Power-law functions are widely used to model heterogeneity, e.g., heterogeneity in the risk of sexually transmitted infections 47 . Although this formulation covers a wide range of monotonic changes, our results might be biased if susceptibility changes non-monotonically with age.
The confidence interval for the estimate of φ widens as R0 increases (Fig. 5). To explain the left-skewed age distribution of mortality (i.e., deaths concentrated in older age groups) at high R0, a large φ is required, because a higher R0 reduces the heterogeneity of mortality by age (Fig. 2b) whereas a larger φ increases it (Fig. 2a). Because the age distribution of mortality becomes less sensitive to φ as φ grows (Fig. 2a), a wide range of large φ values fits the observed distribution about equally well when R0 is high, which widens the confidence intervals.
In conclusion, age-dependent susceptibility can hardly explain the robust age distribution of mortality from COVID-19; rather, our results suggest that the age-dependencies of the mortality rate and of the fraction of symptomatic infections among all COVID-19 cases determine the age distribution of mortality from COVID-19. Further investigation of the age-dependency of the fraction of infections that become symptomatic is required to understand the mechanism behind COVID-19 mortality.

Materials and methods
Data. We analyzed the numbers of deaths caused by COVID-19 reported in Italy on 13th May 2020, in Japan on 7th May 2020, and in Spain on 12th May 2020. The data were collected from public data sources in each country 23-25 .
Model. A simple SEIRD model taking into account mixing between age groups (model 1). To understand why the age distribution of mortality is robust despite varied R0, we employed a mathematical model describing the transmission of COVID-19. Because clinical observations suggest that both asymptomatic and symptomatic cases are infectious after the latent period 48,49 , we used a simple age-structured SEIRD (susceptible-exposed-infectious-recovered-dead) model, which can be written as

$$
\begin{aligned}
S_n' &= -\beta\sigma_n S_n \sum_m k_{n,m} I_m, && (1)\\
E_n' &= \beta\sigma_n S_n \sum_m k_{n,m} I_m - \varepsilon E_n, && (2)\\
I_n' &= \varepsilon E_n - (\gamma + \delta)\, I_n, && (3)\\
R_n' &= \gamma I_n, && (4)\\
D_n' &= \delta I_n, && (5)
\end{aligned}
$$

where $S_n$, $E_n$, $I_n$, $R_n$ and $D_n$ represent the proportions of susceptible, latent, infectious, recovered and dead individuals among the entire population, and the subscript $n$ denotes the age group. We stratified the population into eight age groups, $n = 1, \dots, 8$, for < 10 yo, 10-19 yo, 20-29 yo, 30-39 yo, 40-49 yo, 50-59 yo, 60-69 yo, and 70+ yo. β, $k_{n,m}$, ε, γ and δ represent the transmission coefficient, the element of the contact matrix between age groups $n$ and $m$, the progression rate from latent to infectious, the recovery rate, and the mortality rate from COVID-19 infection, respectively. $\sigma_n$ denotes the susceptibility of age group $n$. For simplicity, because the study period of the COVID-19 epidemics is short relative to the human lifespan, births and deaths from causes other than COVID-19 were ignored. To account for behavioral changes outside the household during the outbreak, $k_{n,m}$ is decomposed into a matrix for contacts within the household, $k_{\mathrm{in},n,m}$, and one for contacts outside the household, $k_{\mathrm{out},n,m}$:

$$
k_{n,m} = k_{\mathrm{in},n,m} + (1 - \alpha)\, k_{\mathrm{out},n,m}, \qquad (6)
$$

where α denotes the reduced fraction of contacts outside the household.
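A minimal numerical sketch of model 1 follows. It assumes forward-Euler integration with a fixed step (an adaptive ODE solver would be the more careful choice for actual fitting) and, in the usage block, a hypothetical two-group contact matrix and parameter values; the names `simulate_seird` and `shares` are illustrative, not from the original analysis.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def simulate_seird(
    beta: float,
    sigma: Sequence[float],
    k: Sequence[Sequence[float]],
    eps: float,
    gamma: float,
    delta: float,
    pop_frac: Sequence[float],
    seed_frac: Sequence[float],
    t_end: float = 730.0,
    dt: float = 0.05,
) -> Tuple[Vec, Vec, Vec, Vec, Vec]:
    """Forward-Euler integration of an age-structured SEIRD model (model 1 form).

    All compartments are proportions of the whole population; seed_frac is
    moved from S to I at t = 0.
    """
    n = len(pop_frac)
    S = [pop_frac[i] - seed_frac[i] for i in range(n)]
    E = [0.0] * n
    I = list(seed_frac)
    R = [0.0] * n
    D = [0.0] * n
    for _ in range(int(t_end / dt)):
        # New infections in group i: beta * sigma_i * S_i * sum_j k_ij * I_j
        inf = [beta * sigma[i] * S[i] * sum(k[i][j] * I[j] for j in range(n))
               for i in range(n)]
        for i in range(n):
            onset = eps * E[i]   # latent -> infectious
            rec = gamma * I[i]   # infectious -> recovered
            die = delta * I[i]   # infectious -> dead
            S[i] -= dt * inf[i]
            E[i] += dt * (inf[i] - onset)
            I[i] += dt * (onset - rec - die)
            R[i] += dt * rec
            D[i] += dt * die
    return S, E, I, R, D


def shares(x: Sequence[float]) -> Vec:
    """Age distribution: each group's share of the total."""
    total = sum(x)
    return [v / total for v in x]


if __name__ == "__main__":
    # Hypothetical two-group example; not the paper's contact matrices or data.
    S, E, I, R, D = simulate_seird(
        beta=0.5, sigma=[1.0, 2.0], k=[[1.0, 0.5], [0.5, 1.0]],
        eps=1 / 6.4, gamma=1 / 7, delta=0.001,
        pop_frac=[0.5, 0.5], seed_frac=[1e-4, 1e-4],
    )
    print("death shares:", shares(D))
    print("recovery shares:", shares(R))
```

Because deaths and recoveries both leave $I_n$ at rates that are the same in every group, the death shares coincide with the recovery shares when δ is common to all groups. With σ = [1, 2], the more susceptible group accounts for the larger share of deaths.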
We modelled age-specific susceptibility as

$$
\sigma_n = c\, n^{\varphi}, \qquad (7)
$$

where c is the susceptibility of age group 1 (a constant common to all age groups) and φ is the exponent parameter describing the variation of susceptibility among age groups. A larger φ means greater variation of susceptibility among age groups, and φ = 0 means that susceptibility is equal in all age groups.
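A sketch of the power-law susceptibility, assuming the form $\sigma_n = c\,n^{\varphi}$ implied by the definitions of c (the susceptibility of age group 1) and φ (the exponent); `power_law_susceptibility` is a hypothetical helper name.

```python
from typing import List


def power_law_susceptibility(c: float, phi: float, n_groups: int = 8) -> List[float]:
    """sigma_n = c * n**phi for n = 1..n_groups (group 1 is < 10 yo)."""
    return [c * n ** phi for n in range(1, n_groups + 1)]


# phi = 0 gives equal susceptibility; phi > 0 makes it increase with age group.
print(power_law_susceptibility(1.0, 0.0))
print(power_law_susceptibility(1.0, 0.5))
```

Only the ratio $\sigma_n/\sigma_1 = n^{\varphi}$ shapes the age pattern; c and β enter the force of infection only through their product, which is why both can be tuned together to reach a prescribed R0.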
SEIRD model taking into account mixing between age groups, asymptomatic/symptomatic infections, and age-dependency of mortality by COVID-19 (model 2). Model 1 does not explicitly classify cases as asymptomatic or symptomatic. If disease progression differs substantially between asymptomatic and symptomatic cases, estimates from model 1 can be biased. In addition, model 1 does not take the age-dependency of mortality by COVID-19 infection into account. Model 2 accounts for both the different disease progression in asymptomatic and symptomatic cases and the age-dependency of mortality by COVID-19 infection:

$$
\begin{aligned}
S_n' &= -\beta\sigma_n S_n \sum_m k_{n,m}\,(I_{s,m} + I_{a,m}), && (8)\\
E_n' &= \beta\sigma_n S_n \sum_m k_{n,m}\,(I_{s,m} + I_{a,m}) - \varepsilon E_n, && (9)\\
I_{s,n}' &= f_s\,\varepsilon E_n - (\gamma_s + \delta_n)\, I_{s,n}, && (10)\\
I_{a,n}' &= (1 - f_s)\,\varepsilon E_n - \gamma_a I_{a,n}, && (11)\\
R_n' &= \gamma_s I_{s,n} + \gamma_a I_{a,n}, && (12)\\
D_n' &= \delta_n I_{s,n}, && (13)
\end{aligned}
$$

where $I_{s,n}$ and $I_{a,n}$ represent the proportions of symptomatic and asymptomatic cases in age group $n$. The other compartments are the same as in model 1. $f_s$ represents the fraction of symptomatic infections among all COVID-19 cases, and $\delta_n$ represents the mortality rate from COVID-19 infection in age group $n$. $\gamma_s$ and $\gamma_a$ denote the recovery rates of symptomatic and asymptomatic cases. The other parameters are the same as in model 1.

Parameterizations.
We parameterized ε and γ using values from previous modelling studies 50,51 . The mean latent period ($1/\varepsilon$) was set to 6.4 days 48,50 , assuming that the latent period equals the incubation period, and the mean infectious period ($1/\gamma$) was set to 7 days 48,51 in model 1. In model 2, to allow for different infectious periods in symptomatic and asymptomatic infections, we set the mean infectious period to 9 days for asymptomatic cases ($1/\gamma_a$) 49 and 7 days for symptomatic cases ($1/\gamma_s$). Contact matrices for Italy, Japan, and Spain were taken from Prem et al. 52 . β and c were adjusted so that the basic reproduction number, R0, took prescribed values. R0 was calculated from the next-generation matrix 53,54 using each country's demographic data obtained from a public data source 26 .
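A sketch of the R0 calculation for model 1, assuming the next-generation matrix evaluated at the disease-free state with $S_n(0)$ equal to each group's population fraction, $K_{n,m} = \beta\sigma_n S_n(0)\,k_{n,m}/(\gamma+\delta)$, and R0 taken as its dominant eigenvalue (computed here by power iteration, which assumes a non-negative, irreducible, primitive matrix). Since R0 is linear in β, the β that yields a target R0 follows by rescaling. The function names and toy inputs are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def spectral_radius(K: Matrix, iters: int = 10_000, tol: float = 1e-12) -> float:
    """Dominant eigenvalue of a non-negative primitive matrix by power iteration."""
    n = len(K)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(w)  # entries are non-negative, so this is the max-norm
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol * max(1.0, new_lam):
            return new_lam
        lam = new_lam
    return lam


def next_generation_matrix(beta: float, sigma: Sequence[float], pop_frac: Sequence[float],
                           k: Sequence[Sequence[float]], gamma: float,
                           delta: float = 0.0) -> Matrix:
    # K[i][j]: infections in group i generated per infection in group j
    # over the mean infectious period 1 / (gamma + delta).
    n = len(pop_frac)
    return [[beta * sigma[i] * pop_frac[i] * k[i][j] / (gamma + delta) for j in range(n)]
            for i in range(n)]


def r0(beta, sigma, pop_frac, k, gamma, delta=0.0) -> float:
    return spectral_radius(next_generation_matrix(beta, sigma, pop_frac, k, gamma, delta))


def beta_for_target_r0(target, sigma, pop_frac, k, gamma, delta=0.0) -> float:
    # R0 scales linearly with beta, so rescale from its value at beta = 1.
    return target / r0(1.0, sigma, pop_frac, k, gamma, delta)


if __name__ == "__main__":
    # Hypothetical two-group inputs (not Prem et al. matrices).
    sigma, pop, k = [1.0, 1.0], [0.5, 0.5], [[1.0, 0.5], [0.5, 1.0]]
    beta = beta_for_target_r0(2.5, sigma, pop, k, gamma=1 / 7)
    print(beta, r0(beta, sigma, pop, k, gamma=1 / 7))
```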
For the mortality rate from COVID-19 infection, a reliable estimate of $\delta_n$ is difficult to obtain: because the fraction of symptomatic infections per age group is uncertain, $\delta_n$ cannot easily be estimated from the observed data, i.e., the confirmed case fatality rate in age group $n$ ($\mathrm{cCFR}_n$). We therefore employed two settings: (i) $\delta_n$ is constant among all age groups, i.e., $\delta_n = \delta$ for every age group $n$, as in model 1; or (ii) $\delta_n$ is calculated from $\mathrm{cCFR}_n$ assuming that the fraction of symptomatic infections among all COVID-19 cases ($f_s$) does not depend on age, as in model 2.
In the setting of model 1, the value of δ is not required to compute the age distribution of deaths once R0 is given. We obtained it from the proportions of recovered individuals per age group among all recovered individuals, $R_n(\infty)/\sum_m R_m(\infty)$, instead of $D_n(\infty)/\sum_m D_m(\infty)$. In our model, Eqs. (1)-(5), $R_n(\infty)/\sum_m R_m(\infty)$ is completely determined by R0 when all parameter values other than β and δ are fixed, and $D_n(\infty)/\sum_m D_m(\infty) = R_n(\infty)/\sum_m R_m(\infty)$ if $\delta_n = \delta > 0$ for all $n$. The proof can be found in the Supplemental text.
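The key step of this equality can be sketched directly, assuming $R_n(0) = D_n(0) = 0$ and the model 1 outflows from the infectious compartment, $R_n' = \gamma I_n$ and $D_n' = \delta I_n$, with a common rate δ > 0:

$$
D_n(\infty) = \int_0^\infty \delta\, I_n(t)\,dt = \frac{\delta}{\gamma}\int_0^\infty \gamma\, I_n(t)\,dt = \frac{\delta}{\gamma}\,R_n(\infty)
\;\;\Longrightarrow\;\;
\frac{D_n(\infty)}{\sum_m D_m(\infty)} = \frac{(\delta/\gamma)\,R_n(\infty)}{(\delta/\gamma)\sum_m R_m(\infty)} = \frac{R_n(\infty)}{\sum_m R_m(\infty)}.
$$

The common factor δ/γ cancels only because it is the same in every age group; with age-specific $\delta_n$ it no longer cancels, which is why model 2 requires values of $\delta_n$.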
The assumption of model 1, that $\delta_n$ is constant among all age groups, may be too strong for the COVID-19 epidemic. To account for the age-dependency of mortality by COVID-19, we calculated $\delta_n$ from $\mathrm{cCFR}_n$ assuming that $f_s$ does not depend on age. For model 2, we considered three scenarios, $f_s$ = 0.05, 0.25, and 0.5, and calculated $\delta_n$ for each country from its $\mathrm{cCFR}_n$ by solving $\mathrm{cCFR}_n = \delta_n/(\delta_n + \gamma_s)$.
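Solving $\mathrm{cCFR}_n = \delta_n/(\delta_n + \gamma_s)$ for $\delta_n$ gives $\delta_n = \gamma_s\,\mathrm{cCFR}_n/(1 - \mathrm{cCFR}_n)$. A small sketch, assuming $1/\gamma_s$ = 7 days as stated above; the cCFR values in the usage line are hypothetical, not the countries' data.

```python
def delta_from_ccfr(ccfr: float, gamma_s: float = 1.0 / 7.0) -> float:
    """Solve cCFR = delta / (delta + gamma_s) for the mortality rate delta."""
    if not 0.0 <= ccfr < 1.0:
        raise ValueError("cCFR must lie in [0, 1)")
    return gamma_s * ccfr / (1.0 - ccfr)


# Hypothetical age-specific cCFR values, for illustration only.
example_ccfr = [0.0, 0.001, 0.01, 0.05, 0.2]
print([round(delta_from_ccfr(x), 6) for x in example_ccfr])
```

The mapping is monotonic, so the ordering of $\mathrm{cCFR}_n$ across age groups carries over to $\delta_n$.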
Fitting. We calculated the proportion of deaths in age group $n$ among all deaths, $\bar{D}_n = D_n(\infty)/\sum_m D_m(\infty)$, and fitted these proportions to the observed data in each country. We solved model 1 (Eqs. (1)-(5)) and model 2 (Eqs. (8)-(13)) numerically and calculated $\bar{D}_n$ after enough time had elapsed for the epidemic to end. We estimated φ using a log-likelihood function describing multinomial sampling of deaths per age group,

$$
\ell(\varphi) = \sum_{n=1}^{8} d_n \ln \bar{D}_n(\varphi) + \text{const}, \qquad (14)
$$

where $d_n$ is the observed number of deaths in age group $n$. Maximum likelihood estimates of φ for a given R0 were obtained by maximizing Eq. (14), and profile likelihood-based confidence intervals were computed.
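A sketch of the fitting step, assuming the multinomial log-likelihood $\sum_n d_n \ln p_n$ (the multinomial coefficient does not depend on φ and is dropped), a grid search over φ, and the usual likelihood-ratio cutoff of 1.92 log-likelihood units (half the 95% quantile of a χ² distribution with one degree of freedom). In the study the predicted shares come from solving the epidemic model for each φ; here a hypothetical `toy_model` stands in so the example runs on its own, and the interval resolution is limited by the grid spacing.

```python
import math
from typing import Callable, List, Sequence, Tuple


def multinomial_loglik(observed: Sequence[int], predicted: Sequence[float]) -> float:
    """Sum of d_n * ln(p_n) over age groups, up to an additive constant."""
    ll = 0.0
    for d, p in zip(observed, predicted):
        if d == 0:
            continue  # a zero count contributes 0 * ln(p) = 0
        if p <= 0.0:
            return -math.inf  # deaths observed where the model predicts none
        ll += d * math.log(p)
    return ll


def profile_phi(
    model: Callable[[float], Sequence[float]],
    phis: Sequence[float],
    observed: Sequence[int],
    drop: float = 1.92,  # chi-square (1 df) 95% quantile divided by 2
) -> Tuple[float, Tuple[float, float]]:
    """Grid MLE of phi and the range of grid points within `drop` of the maximum."""
    lls: List[float] = [multinomial_loglik(observed, model(phi)) for phi in phis]
    best = max(range(len(phis)), key=lambda i: lls[i])
    inside = [phi for phi, ll in zip(phis, lls) if ll >= lls[best] - drop]
    return phis[best], (min(inside), max(inside))


def toy_model(phi: float) -> List[float]:
    # Hypothetical stand-in for the epidemic model: death shares proportional to n**phi.
    w = [n ** phi for n in range(1, 5)]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    grid = [0.25 * i for i in range(9)]  # phi = 0, 0.25, ..., 2
    phi_hat, ci = profile_phi(toy_model, grid, observed=[10, 20, 30, 40])
    print(phi_hat, ci)
```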

Data availability
All data collected and analyzed during this study are included in this published article and its Supplementary Information files.