Introduction

Kawasaki disease (KD) is an acute vasculitis of childhood which can be complicated by the development of coronary artery aneurysms (CAA)1. It is the most common cause of acquired heart disease in children in developed countries1, leading to important lifelong morbidity and mortality2. KD has been reported worldwide, with a marked geographic variation in incidence, and is characterized by a male predominance3,4. Despite extensive previous research, the etiology of the disease remains unknown. A genetic component is suggested from global epidemiological data, the distribution of the disease amongst ethnic groups, the clustering of cases within families and the increased risk of recurrence5,6,7,8,9,10. In addition, whole genome association studies have identified multiple genetic polymorphisms in inflammatory pathways which may increase the risk of KD11,12. Previous epidemiological studies have described associations between KD and various infectious diseases, pollen, pollution, early childhood environment and seasonality, albeit largely without considering the potential interaction between those factors13,14,15,16. Spatiotemporal clusters have been described in the distribution of KD and have been associated with local environmental factors and outbreaks of infectious diseases15,17,18,19. The prevailing consensus regarding the etiology of KD is that it occurs in individuals with a genetic and potential early childhood susceptibility, which predisposes to the development of a hyper-reactive immune response when exposed to an unidentified environmental or infectious trigger(s)20. In a recent studies, it was demonstrated that lower habitual exposure to environmental allergens and a preceding non-specific infectious exposure were associated with an increased risk of developing KD21. We herein present an integrative model that describes the variation in incidence of KD over time and space in Canada and its association with incidence of infections and allergic conditions, weather components, pollution, atmospheric biological particles and population composition.

Methods

KD data and identification of KD cases across Canada

Hospital admissions for Kawasaki disease were identified based on a primary or secondary discharge diagnosis of mucocutaneous lymph node syndrome (ICD-10-CA standard code M30.3). De-identified data from all hospital admissions (ages 0–18) from April 1, 2004 to March 31, 2017 were obtained from the Discharge Abstract Database (DAD) maintained by the Canadian Institute for Health Information (CIHI). Submission of administrative data to CIHI is legally mandated for all hospitals in Canada except for those in Quebec. In Canada, the entire population is covered by a single-payer universal health plan. A universal provincial health number was provided in an encrypted format for each case, which allowed us to exclude multiple admissions for a given patient. A previously validated algorithm was used to exclude all readmissions and transfers between hospitals22. For each KD case reported in DAD, the date of hospital admission and the forward sortation area (FSA)23 for the patient residence were provided. These data were previously used to determine and report the epidemiology of KD in Canada22.

Population at risk, ethnicity, age distribution, population density and rural effect

Information on population, ethnicity, medium household income and age distribution was obtained from the Canadian Census 200624,25,26, 201127 and 201628, and the National Household Survey29. Values from the 2006 census were used for 2004–2008, from the 2011 census/household survey for 2009–2014, and from the 2016 census for 2015–2017. FSAs without a Census population count were not considered in the analysis. Population density was calculated based on the land area file that contains the FSAs codes and sizes in square-km, derived from the FSA cartographic boundary file available through the University of Toronto Libraries30.

Ethnicity data in the Canadian Census is self-reported, based on the individual’s perception of their ethnic ancestry and respondents are allowed to select multiple ethnicity31. Given the strong predilection of KD for children of East Asian ethnicity, we considered the percentage of the population who self-reported Chinese, Japanese, Koreans and Taiwanese as one of their ethnicity. Finally, the percentages of the population in each FSA who live in postal codes classified as “rural” was calculated, with rural being defined as FSAs with a zero in the second digit.

Healthcare encounters associated with infectious diseases and allergies

Population-level incidences of health care encounters associated with specific infections or allergy-related illnesses were obtained from both DAD and the voluntary National Ambulatory Care Reporting System Metadata (NACRS), which is also maintained by CIHI. Participation in NACRS is determined at the provincial level and achieves complete capture of ambulatory care in participating provinces. As it was not mandatory for all provinces to submit data to NACRS, models containing healthcare encounters for infections or atopic diseases as covariates were adjusted for the percentage of mandatory submission of infection and atopy data for each respective region. Data were obtained for all healthcare encounters across Canada which were associated with an ICD-10 diagnostic code for selected infections and allergy-related illnesses (see Table 1) in hospital admissions, and outpatient and community-based clinic encounters. These data were used to calculate monthly counts for codes and groups of codes. Infectious diseases and organisms included in this analysis were selected based on their prevalence in Canadian children. Infectious diseases were grouped in 2 separate ways: first, according to the type of etiologic agent (viral, bacterial, fungal infections or infections of unknown origins) and the affected physiological system (upper respiratory tract, lower respiratory tract, gastrointestinal tract and skin) and second, by the specific pathogen involved in the healthcare encounter. This dual system allowed us to consider both the contribution of specific pathogens but also of ailments for which clinical care rarely needs the identification of the specific pathogen and as such the majority of such encounters remain of indeterminate cause (e.g. viral intestinal infections).

Table 1 Comparison of hierarchical models with different temporal structures, adjustment factors and covariates.

Weather and pollution data

Monthly weather data were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis ERA-Interim product. ERA-Interim uses an integrated forecasting system model and the 4D-variational data assimilation system32. We included variables that are known or suspected to directly impact human health, or influence disease vectors and/or the growth and transport of aeroallergens (see full list in Table 1). Air pollution data were obtained from the ECMWF Copernicus Atmospheric Monitoring Service (CAMS) Reanalysis [2018]33. The CAMS Reanalysis provides information on aerosol and reactive gases as total column values and are based on assimilation of satellite observations into a forecasting model. We included particulate matter smaller than 2.5 µm (PM2.5), nitrogen dioxide, ozone, sulfur dioxide and carbon monoxide from the CAMS reanalysis. Given their similar association with the incidence of KD, nitrogen dioxide and sulfur dioxide levels were combined into a single variable by summing up the two variables. The reanalysis data were obtained on a 1.25-degree longitude and 1.25-degree latitude grid, and values for grid cells with a center inside an FSA were aggregated to obtain data per FSA.

Atmospheric biological particles

Daily measures of major biological atmospheric particles are collected by Aerobiology Research Laboratories (Ottawa, Canada) via rotational impaction sampling methods. These data are representative of the amount of environmental allergens in the atmosphere. Data were obtained for the 10 cities representing the largest populations in Canada (Vancouver, Edmonton, Calgary, Regina, Winnipeg, London, Toronto, Ottawa, Moncton and St. Johns). Previous studies have shown that biological atmospheric particle concentration can be regionalized up to ranges of ~ 100 km from the sampling site (~ 30,000 km2) for analysis34. As such, for each FSA, atmospheric biological particle data were assigned from the closest city (i.e. the closest particle collection point). These data are only available during the growing season of the year. Pollen counts before and after the growing season were set to zero.

Data preparation for modelling

Due to the different scales and units of the covariates, all covariates were centered to mean zero and scaled to variance one before the analysis, except for wind direction which was not centered to conserve the wind direction.

Aggregation of spatial units

All data except for weather and pollution data was originally available on an FSA level. FSA boundaries were obtained from the 2011 Canadian Census35. FSAs designate a geographical area based on the first 3 digits of the postal code (equivalent to the US zip code) that corresponds to an average of ~ 20,000 individuals. In order to maintain sufficient case density for analysis, Northern regions, where population density was extremely low, were excluded. The remaining FSAs were combined using a data driven strategy.

To begin with, the FSA with the smallest population (R1B with 10 inhabitants) was selected. FSAs that share at least one common boundary with this initial FSA were identified (R1A and R3C). The neighbouring FSA with the smallest population (R3C with 1620 population) was then combined to the initial FSA, i.e. the spatial polygons of R1B and R3C were aggregated by dissolving their internal boundaries. We obtain a new map, for which we have one new area (R1B ∩ R3C) with a population of 1630. From this new map, we again selected the FSA with the smallest population (V4G). This FSA had 5 neighbours from which the one with the smallest population was selected for spatial aggregation. This procedure was repeated until FSAs were combined into 100 groups. The supplementary map (online auxiliary material) shows the combined regions as well as lists of all FSAs that were aggregated for each region. Spatial aggregation was performed in order to preserve the granularity of the data and achieve sufficient case density for modelling.

Bayesian disease model

A hierarchical Bayesian model framework is used to address the issues posed by the sparseness of the data and the spatial dependencies between the data36,37. The standardized incidence rate for KD in each area and month was modelled with the Besag-York-Mollie (BYM) model38, which includes spatial random effects that account for the spatial autocorrelation of the data. Temporal effects have been added to the model sequentially. We added a linear trend, dynamic non-linear trend39, specified through a random walk model of order 1, and a seasonal term. More details on the model parameterization can be found in the supplement.

To approximate posterior marginals, we used an integrated nested Laplace approximation (INLA) approach, which is a deterministic algorithm for Bayesian inference proposed by Rue et al.40 The computation was performed with the R-INLA interface41. To select the best temporal structure of the model and to select variables, we compared the Watanabe-Akaike information criterion (WAIC) of different models (see Table 1). The calculation of WAIC followed Gelman et al.42.

All statistical analyses and maps creation for this project were performed using R version 4.0.2 (https://www.R-project.org). Base map and data obtained from OpenStreetMap and OpenStreetMap Foundation (www.openstreetmap.org).

Results

KD incidence and spatial and temporal variation

From April 2004-March 2017, a total of 5,882 acute hospital admissions for KD were identified in the discharge abstract database maintained by the Canadian Institute of Health Information (CIHI) based on the ICD-10-CA standard code M30.3, of which 266 were excluded according to our previously designed algorithm to identify readmissions and between-hospital transfers (Fig. 1)22. The final number of acute admissions included in this study was 5,616, with an annual incidence of 22.3 per 100,000 children under the age of four, 7.2 per 100,000 children from 5 to 9 years of age and 0.8 per 100,000 children from 10 to 18 years of age. Spatiotemporal variations of KD were modelled with a Bayesian hierarchical model (see Expanded methods and Supplementary information for details), which includes covariates related to environmental exposure (list in Table 1). We compared several spatiotemporal models that accounted for the temporal structure differently, but were all based on the Besag-York-Mollie model to account for spatial autocorrelation38. The final multivariable model included a linear and non-linear time component, and covariates that improved the performance of the model.

Figure 1
figure 1

Identification of KD patients and theoretical framework of the spatiotemporal matrix for use for modelling. The spatial and temporal scales were held constant for incidence of KD and the distribution of all predictive features so that data points at the same position in the matrix are aligned with each other in time and space. The temporal (1 month) and spatial (group of ~ 10 FSAs) calipers were decided based on density distribution of the incidence of KD. The example below (red bar) shows the alignment of the incidence of KD and of a generic predicting feature at a specific time and location in our spatiotemporal matrix. KD Kawasaki disease, n number, FSA Forward Sortation Area.

Standardized incidence rates obtained from the mean of the posterior distribution of the temporal and spatial random effects are presented in Fig. 2. The model suggests that the risk for KD was highest between late 2010 and 2011, and reached its lowest point at the end of 2007 (Fig. 2A). As expected, a distinct seasonal distribution with higher risk for KD during the winter months and relatively lower risks at the end of the summer months was found (Fig. 2B). Spatial areas with increased risk for KD were identified in some areas around Toronto and Halifax (spatial standardized incidence rate of larger 1.5 and 1.8 respectively), with a posterior probability > 99% (Fig. 2C). Further hotspots were found in region of Northern Ontario and the city of Calgary (spatial standardized incidence rate > 1.3). A detailed map including posterior probabilities is provided in Appendix 2.

Figure 2
figure 2

Posterior mean value for the temporal and spatial effects for the risk of Kawasaki disease (KD) in Canada from April 2004 to March 2017 based on the Bayesian hierarchical model. Standardized incidence rate above 1 indicate increased risks, and below 1 decreased risks. (a) Monthly temporal unstructured and structured trends. An increasing trend is visible for the unstructured effect (solid line) across time. The structured effect (dashed line) shows nearly no fluctuation. (b) The seasonal effect illustrates a below average standardized incidence rate for KD in September and a high risk in the winter months. (c) Mean spatially autocorrelated random effect, expressed as area-specific standardized incidence rate. Areas with standardized incidence rate larger than 1 have an above average risk compared to the rest of Canada, while standardized incidence rate smaller than 1 indicate a below average risk. The red colours in the Greater Toronto Area and around Halifax for example indicate a 50 to 90% increase of risk. For better visibility of small regions, an interactive map can be found in the online appendix. The spatial units (of the analysis regions consisting of several forward sorting areas) are indicated with boundaries and grey filling. White regions were added to the plot for easier orientation. For uncertainty estimates, see online appendix.

Environmental factors and exposures associated with distribution of KD

For each of the covariates, a base model that included only spatial and temporal effects and adjusted for distribution of age and ethnicity in the population was compared with models that additionally included one of the covariates of interest based on model fit, measured by the Watanabe-Akaike information criteria (WAIC). The WAIC for the base model was 23,134.0, while the adjusted base model WAIC was 23,128.0 (see top of Table 1). Only variables that reduced the WAIC from the base model by more than 1, i.e. improved the model fit, were included in a multivariable model. In a second step, the deletion of each variable from this multivariable model was tested in a backward selection to identify and remove variables that did not contribute to the multivariable model fit (see Table 1 and Fig. 3).

Figure 3
figure 3

Association of factors with the standardized incidence rate for Kawasaki disease (KD). Distribution of the standardized incidence rate (line = median) based on the posteriors of the fixed effects (same as multivariable results in Table 1). The values represent the change in standardized incidence rates for a 1 standard deviation change in the predictor. A value below 1.0 (blue) indicates a negative association between the predictor and the standardized incidence rate for KD, and values above 1.0 (red) indicate a positive association. The background color of the labels indicates the type of predictor, with grey representing genetic and demographic factors, blue representing atmospheric pollution, red representing the incidence of healthcare encounters for various causes and green representing atmospheric concentration of biological particles. at.c atmospheric concentration, HE healthcare encounters.

Both adjustment factors, the percentage of population under 4 years of age and the percentage of population self-identifying as Taiwanese, Korean, Japanese or Chinese (‘East Asian’) were associated with increased risk for KD. In the final multivariable model, environmental factors contributing to an increased risk of KD included lower median income, lower proportion of an area classified as rural, increased atmospheric concentration of spores from Myxomycetes (slime molds), Deuteromycetes (fungi imperfecti) and algae concentrations, lower atmospheric concentration of Dothideomycetes spores and higher combined smog (NO2 + SO2) concentration in the atmosphere. Increased incidence of KD was also associated with higher incidence of healthcare encounters for atopic dermatitis, viral intestinal illnesses, bacterial pneumonia (associated with Mycoplasma, Klebsiella and possibly Haemophilus influenzae), and multisystem infections with a heterogeneous group of pathogens known to be potentially associated with hyperinflammatory or autoimmune syndromes (Campylobacter, cytomegalovirus, coronavirus, measles, Salmonella). Higher incidence of healthcare encounters for bacterial skin infections or fungal infection secondary to Candida were associated with reduced incidence of KD. The WAIC for the final multivariable model was 23,059.7.

Correlation between modelled and observed KD rates

The correlation between the multi-year mean of observed and modelled KD incidence across the regions was 0.98 (Fig. 4A). The correlation across time was 0.86 (Fig. 4B). The time series of the observed and modelled KD incidences averaged across Canada (Fig. 4C) shows that the model reproduced the temporal variation well. Adding environmental factors, aeroallergen exposure and infectious agents to the model improved the observed-to-modelled correlation across years and provinces compared to our model that only adjusted to the population numbers, age and ethnicity distribution (correlation of 0.56 vs. 0.47).

Figure 4
figure 4

Correlation between observed and modelled Kawasaki disease (KD) rates. Modelled rates have been obtained from the exponentiated risk ratios of the multivariable hierarchical Bayesian model posteriors, multiplied by the expected rate. (a) Multi-year mean of observed and modelled KD rates, where each dot represents 1 of the 100 regions (groups of forward sorting areas). (b) Regional mean of observed and modelled KD rates. The rates were aggregated across all of Canada, and each dot represents 1 month of the 13 years. (c) Time series of observed (black) vs modelled (red) KD rates averaged over Canada from April 2004 to March 2017.

Discussion

In this study, a mathematical model of the spatiotemporal distribution of KD in Canada over a 13-year period was derived. Factors found to be associated with the spatiotemporal incidence of KD included numerous environmental and infectious factors which were integrated into a comprehensive model for the distribution of KD. This study is significant both from the mathematical approach used to combine numerous dimensions of risk measured on different scales into a single model and by the high degree with which it could recapitulate the spatiotemporal distribution of KD. Such a model could eventually form the basis of a prediction and surveillance model for the future incidence of KD in a specific location. Careful examination of associated factors and how they contribute to the risk of KD can also substantially contribute to our understanding of the etiology and pathogenesis of KD (Fig. 5).

Figure 5
figure 5

Hypothetical framework of Kawasaki disease (KD) etiology and risk based on published evidence, our previously published environmental case control study21 and the evidence presented in our current spatiotemporal model of the spatiotemporal distribution of KD. Red indicates higher risk while green indicates lower risk.

Some well known features of the epidemiology of KD were readily apparent in the associations observed in this study, including an increased risk in children 0–4 years old (although up to 30% of cases occur after 5 years of age), presence of seasonal patterns14 and spatiotemporal clusters17,18,19, as well as a higher risk in people of East Asian ancestry11. Numerous additional environmental factors were also identified in this study, some of which had already been reported in previous studies. Without a known etiological agent, epidemiological patterns and associations can be used to extrapolate some aspects of the etiology and pathogenesis of the disease. Positing a framework where KD is an immunological reaction to a still unidentified trigger(s) in genetically and/or immunologically susceptible individuals, epidemiological patterns and associations can reflect the inherent susceptibility of the population, the local or global distribution of the trigger or can be reflecting the presence of factor(s) modulating the risk of developing KD when encountering said trigger(s). Some environmental factors identified in this study might fit those scenarios and are discussed below.

Beyond the genetic susceptibility to KD, which is evidenced by the strong association between Asian ancestry and various genetic polymorphisms and increased risk of KD11, other factors might increase the inherent susceptibility of a population to KD through early childhood experience. It has been proposed that KD partially develops through the hygiene hypothesis, where lack of early microbial experience and limited exposure to allergens may promote atopy and predisposes children to a hyper-reactive response to a normal trigger43. A study in Japan reported that children living in households with up to three persons had a 1.6 fold increased risk of developing KD compared to those from households with more than five persons16.

The seasonal pattern of KD noted in this study was concordant with previous findings22,44. Seasonal variations in KD have been the subject of multiple previous studies, and is one of the most consistently described epidemiological features14. Multiple environmental factors found to be associated with the incidence of KD in this study also have a seasonal distribution. Thus, these may contribute, in a multifactorial manner, to the seasonal distribution of KD.

Results from previous studies have shown that some environmental exposures might be associated with the risk of KD, either positively or negatively. Several studies have reported higher incidences of KD associated with urbanized settings which might reflect the presence of modulating factors16,21,45. In a previous case–control epidemiological study we showed that children living in areas with more trees, farms and closer proximity to a natural body of water were at lower risk of KD than children living in areas with reduced exposures21. In this study, some of the factors identified for KD are consistent with these previous findings, including the presence of spatial hotspots in cities, decreased risk in rural areas and in environment with denser vegetation and increased atmospheric concentration of some biological particles.

At the same time, activation of the inflammasome via various environmental exposures might be increasing the risk of KD46. One such example is pollution exposure. In this study, an association between higher atmospheric concentration of SO2 and NO2 and increased risk of KD was noted. A recent Japanese publication reported that a 1-μg/m3 increase in NO and SO2 were associated with a higher incidence of KD (3.94, 95% CI 0.04–7.98 and 3.60, 95% CI 1.12–6.14 respectively)47. Air pollution has also been shown to negatively affect the respiratory mucosa through an interaction with several defense mechanisms leading to increased mucous production, epithelial barrier dysfunction and attenuated cytokine response48,49,50. The association between lower household income and increasing KD risk found in this study could also be related to activation of the inflammasome. Children from lower socioeconomic areas are known to have generally greater exposure to pollution, dust, toxins and allergens, and are at higher risk of multiple atopic and inflammatory disease, such as asthma and KD because of those exposures16,51,52.

Beyond susceptibility, case distribution and modulation, environmental associations might point to potential targets for the trigger(s). Higher atmospheric levels of Deuteromycetes (fungi imperfecti) and Myxomycetes (slime molds), both spore-producing, were found to be associated with higher incidences of KD, as were greater atmospheric concentrations of algal spores. Many spore-producing fungi and some algae have allergenic properties, and are known to be capable of initiating an immune response in human in addition to activating the inflammasome53,54. In a mouse model mimicking KD, intraperitoneal injection of Candida albicans extract has been shown to induce coronary arteritis55. Fungi are also capable of producing MAMPs such as beta-glucans and chitin, which may stimulate an inappropriate immune response through pattern recognition receptors56.

There is also a longstanding recognition that a substantial number of patients with KD present either with a documented infection, a recent history of an infectious illness or with infectious disease symptoms57. Numerous previous studies have postulated that one or more infectious agents is/are the trigger for KD, but no consistent or definitive findings have been reported58. Both respiratory and gastrointestinal diseases have been reported with KD and, indeed, in a previous case–control study, children with KD were found to exhibit respiratory and gastrointestinal symptoms in a higher proportion than controls up to 8–31 days before the appearance of the first KD symptoms21.

In this study, we identified groups of healthcare encounters for infections that were associated with increased incidence of KD. One interesting finding is the association between viral intestinal infections and increased incidence of KD. In North America, the most common causes of viral gastroenteritis (i.e. rotavirus, adenovirus and norovirus infections) exhibit a winter peak; notably in our previous environmental case–control study, rotavirus vaccination was associated with a lower risk of KD21. However, an autumn or spring peak is seen in other parts of the world, consistent with the global epidemiology of KD59. The association between viral intestinal infections and increased incidence of KD possibly reflect the facilitation of the entry of the trigger into the blood stream and as such would represent a modulating factor as opposed to a potential trigger. This being said, in the absence of direct experimentation, it remains unknown whether spores or infections can actually trigger KD or whether they are additional modulators.

This study must be viewed in light of some limitations. Firstly, this is an observational epidemiological study and, as such, associations are ecologic in nature. Any causal inferences are implied. Secondly, some dimensions of environmental risk dimensions which have previously been linked to KD, such as the microbiome, could not be included in our population-level model. Third, we were limited regarding the granularity of our model, given that KD is a rare disease and cases are dispersed in space and time. Therefore, any smaller, unpredicted changes in factor levels displayed over days or hours would not be captured. In addition, Census data were only available every 5 years, as such we were not able to fully adjust for the temporal variation in ethnicity across Canada across our study period, but fluctuations were expected to be minor. All spatial effects were regionalized, which meant that local environmental factors could not be considered. Moreover, atmospheric biological data were only available for 10 stations situated across Canada, but were extrapolated to the nearest FSA region for each of the 100 FSA groups. This limitation may potentially impact the validity of our pollen findings, as pollen dispersal is likely a local phenomenon. While KD incidence varies strongly with age, we cannot exclude that some factors may be more relevant to certain age groups than others, although no evidence of such an effect has been previously reported. The lower number of patients with KD > 5 years of age precluded a sub-analysis stratified by age. Finally, it is important to note that our proposed theoretical framework is a hypothesis and should not be construed as being confirmed.

Conclusion

Our model of the spatiotemporal distribution of KD across Canada over a 13-year period was able to explain a substantial portion of the variation in the incidence of KD over that time period and identified multiple environmental factors that could be integrated into a theoretical etiological and pathophysiological framework for the spatiotemporal distribution of the incidence of KD. This framework includes a child’s susceptibility to the disease, the presence of modulating factors and the spatiotemporal distribution of the trigger(s) and/or of modulating factors. Findings from this study strengthen our understanding of the epidemiology of KD, which is likely affected by various factors reflecting the complex reality of KD.