Spatiotemporal clustering of cases of Kawasaki disease and associated coronary artery aneurysms in Canada

Detailed epidemiologic examination of the distribution of Kawasaki disease (KD) cases could help elucidate the etiology and pathogenesis of this puzzling condition. Location of residence at KD admission was obtained for patients diagnosed in Canada (excluding Quebec) between March 2004 and March 2015. We identified 4,839 patients, 164 of whom (3.4%) developed a coronary artery aneurysm (CAA). A spatiotemporal clustering analysis was performed to determine whether non-random clusters emerged in the distributions of KD and CAA cases. A high-incidence KD cluster occurred in Toronto, ON, between October 2004 and May 2005 (116 cases; relative risk (RR) = 3.43; p < 0.001). A cluster of increased CAA frequency emerged in Mississauga, ON, between April 2004 and September 2005 (17% of KD cases; RR = 4.86). High-incidence clusters also arose in British Columbia (November 2010 to March 2011) and Alberta (January 2010 to November 2012) for KD and CAA, respectively. In an exploratory comparison between the primary KD cluster and reference groups of varying spatial and temporal origin, the main cluster demonstrated higher frequencies of conjunctivitis, oral mucosa changes and treatment with antibiotics, suggesting a possible coincident infectious process. Further spatiotemporal evaluation of KD cases might help understand the probable multifactorial etiology.

Kawasaki disease (KD) is an acute, mucocutaneous illness of unknown etiology [1]. First reported in 1967 [2], KD primarily affects children younger than 5 years of age and is indicated by a constellation of clinical signs including vasculitis, which can lead to arterial complications [3], namely coronary artery aneurysms (CAA) [4]. Although the likelihood of CAA has been significantly reduced by prompt treatment with intravenous immunoglobulin [5], this complication remains the primary cause of morbidity and mortality associated with KD [6]. Given uncertainty regarding the etiology and specific treatment, efforts are ongoing to elucidate the genetic, environmental and infectious causes of KD and the optimal strategies to prevent complications [7][8][9].
While the genetic susceptibility to KD within certain population groups is well documented, with consistently higher incidence rates observed among those of East Asian descent [10], the possible environmental and infectious causes remain unclear. The documented seasonality of KD [11] and its frequent co-occurrence with infection [12,13] have led many to conclude that KD is associated with an environmental and/or infectious trigger inciting an extreme inflammatory response in genetically susceptible children [14]. Unfortunately, the trigger or triggers remain poorly understood, as researchers continue to debate whether the causal agent is infectious, environmental or some combination of the two. Furthermore, though previous research has shown the development of CAA in KD patients to be unlinked to the presence of an infection [15], the environmental causes of poor coronary artery outcomes have yet to be thoroughly investigated. Thus, we explored the spatiotemporal distribution of both KD cases and CAA complications in Canada between March 2004 and March 2015, with the goal of identifying non-random clusters that might help explain the association between occurrence and infectious and/or environmental phenomena.
Though some areas demonstrated an increased KD incidence during a particular time window (Table 1), others showed clusters of significantly reduced incidence of KD (as seen in Table 2).
In addition to the spatiotemporal clusters of KD, two purely temporal clusters (i.e. a national spatial window) emerged in the clustering analysis. A period of high KD incidence occurred between October 2010 and April 2011, when approximately 337 cases of KD (incidence = 11.9 per 100,000 children aged 19 years or younger, RR = 1.58, p ≤ 0.001; see Table 1) were recorded throughout the country.
The analysis also identified a low-incidence cluster between June 2007 and November 2007, when only 99 Canadian children were diagnosed with KD (incidence = 4.2 per 100,000 children aged 19 years or younger, RR = 0.53, p ≤ 0.001; see Table 2).
The spatiotemporal analysis also revealed two major clusters of increased CAA frequency in the study window (Fig. 4a). Due to the relatively small number of CAA cases during the study period, the statistical significance of these clusters could not be reliably assessed. For a complete summary of the identified CAA clusters, see Table 3.
Lastly, as most KD cases occur in young children, a sensitivity analysis was performed to determine whether restricting the study population to patients younger than 5 years of age would change the identified clusters. When the spatiotemporal analyses were repeated for the narrowed patient population, the emerging clusters remained unchanged for both high incidence of KD and high frequency of CAA.
Epidemiologic Comparison. The main KD cluster was compared with four reference groups of varying spatial and temporal origin (RG1, RG2, RG3 and RG4; detailed in the Methods section). As shown in Table 4, several differences were noted in the exploratory comparison of the main cluster with RG1, RG3 and RG4. When compared to these three reference groups, patients in the main cluster demonstrated higher frequencies of conjunctivitis (p = 0.04, p = 0.002, p = 0.08, respectively), oral mucosa changes (p < 0.001, p = 0.16, p < 0.001, respectively) and, to a lesser extent, antibiotic treatment (p = 0.11, p = 0.08, p = 0.11, respectively), all of which may be suggestive of an infectious process. The only reference group that did not follow these patterns was RG2 (i.e. the same spatial area, but in the year prior to the main cluster), which showed negligible differences in most of the investigated variables when compared to the main cluster.
In an attempt to explain the apparent infectious process, we searched reportable disease databases, made available by both Public Health Ontario [16] and the British Columbia Centre for Disease Control [17], for an infectious disease demonstrating increased incidence during the two major KD clusters. After thorough examination of the two databases, no increased prevalence of any single infectious disease was found to coincide with the two clusters of high KD incidence.

Discussion
The presence of space-time clusters noted in this study adds to the growing body of evidence associating the etiology and pathophysiology of KD with localized, infectious and/or environmental phenomena. First, it is worth noting that all four of the identified KD clusters occurred in and around the winter season. This result, in conjunction with the apparent seasonal pattern of KD presented in Fig. 1, is consistent with previous evidence of a possible KD etiologic agent present in increased concentrations during winter months in the Northern Hemisphere [11]. The geographic location of the two major KD clusters also offers potential insight regarding the condition's etiology. The presence of a primary cluster in Toronto, ON, for example, supports previously reported KD risk factors associated with living in an urban setting [14], and the presence of a localized infectious agent (which is further discussed in the subsequent section). Moreover, this cluster is also consistent with the previously reported increased genetic susceptibility in people of Asian ancestry, as the city's Asian population (representing approximately 38% of the population) is significantly larger than that of the rest of the country (only 15%, p ≤ 0.001) [18]. Lastly, the Port Renfrew, BC, cluster supports the reported link between an increased KD incidence and westerly winds over the Pacific Ocean, wind patterns that have been previously associated with elevated concentrations of fungal particles in the atmosphere [19,20]. The potential effect of these wind patterns was further supported by the presence of low-incidence clusters in the Prairies and in the Eastern provinces, areas far removed from westerly wind formations.
Though clusters of increased frequency were also noted in the spatiotemporal analysis of CAA, the underlying causes were less easily extracted from the results given the relatively low number of CAA cases. As reported previously in the literature, the number of CAA cases was likely underestimated because case identification was based on absolute internal lumen diameter rather than a relative index, such as coronary artery z-scores [21]. Unlike the KD incidence analysis, no consistent seasonal pattern appeared to emerge in the spatiotemporal analysis of the frequency of CAA. Given the association between KD seasonality and the increased prevalence of an infectious agent, the apparent lack of CAA seasonality supports the claim by McCrindle et al. that the development of CAA is not associated with the presence of a concomitant infection [15]. The degree of both spatial and temporal overlap between the major CAA frequency cluster (in Mississauga) and KD incidence cluster (in Toronto) also suggests a potential commonality between the causal factors. This overlap is not simply a consequence of the study design: because CAA frequency was defined on a per-KD-case basis, an increased KD incidence was not necessarily associated with an increased CAA frequency.
The results of the exploratory epidemiologic comparison offer important insight regarding the potential source of a localized KD trigger. The main KD cluster demonstrated important differences when compared to three of the four reference groups, including higher frequencies of conjunctivitis, oral mucosa changes and antibiotic treatment. Not all of these differences were statistically significant; however, they were consistent and relatively large across the three variables, even with the limited number of cases in each group.
Because conjunctivitis and oral mucosa changes are manifestations commonly associated with infection, and antibiotics are commonly used to treat infection, the demonstrated patterns suggest the potential presence of an infectious trigger in the main cluster. Moreover, as RG2 showed negligible differences when compared to the main cluster, it is possible that the infectious trigger was already present in the geographic area during the year prior to the main cluster, but perhaps at a lower concentration. While no single infectious disease was found to coincide with the major KD clusters in Ontario and British Columbia, the examination was limited to diseases included in each province's reportable disease database. Thus, we cannot discount the potential presence of an infectious trigger that is not routinely reported to public health authorities.
Due to the retrospective nature of the research study, the spatiotemporal analysis and epidemiologic comparison were limited to data available from the Canadian Institute for Health Information, The Hospital for Sick Children and the 2011 Canadian Census, all of which were assumed to be accurate and complete. The spatiotemporal analysis of CAA was limited by the relatively small number of cases occurring in Canada during the study window. Specifically, the small sample size made it difficult to scan for clusters of low CAA frequency, as this analysis would simply isolate the many regions in which a CAA complication did not occur in a given time window. The accuracy of the spatiotemporal analysis was also limited by the moderate resolution of the available geographic data, which identified a patient's place of residence as the centre point of their forward sortation area (FSA) at time of admission. Though FSA is often an effective proxy for location, specifically in densely populated areas, it offers suboptimal resolution for geographically large, sparsely populated areas.
The exploratory epidemiologic analysis was limited to clusters that coincided with a major KD surveillance study performed by The Hospital for Sick Children in Ontario between 1995 and 2006 [22]. Fortunately, the main KD cluster fell within the identified window, minimizing the effect of this limitation. Lastly, the results of the epidemiologic comparison should be interpreted with caution, given that the identified patterns emerged in a post hoc analysis. Future work could aim to strengthen the claim of a potential coincident infectious process in regions of high KD incidence by performing a prospective epidemiologic study.
To conclude, the etiology and pathophysiology of KD and its associated complications remain incompletely understood. To determine the potential environmental and infectious causes of this condition, we performed a spatiotemporal analysis of KD cases and associated CAA complications occurring in Canada between March 2004 and March 2015. While the results of the exploratory comparison suggest the potential presence of an infectious trigger in the main cluster, the exact source of the trigger remains unknown.

Data Sources. This study was approved by the Research Ethics Board (REB) at The Hospital for Sick Children, and study procedures were carried out following the approved protocol in accordance with the Tri-Council Policy Statement on Ethical Conduct for Research Involving Humans 2010 (TCPS 2) and Ontario and Canadian law. Under TCPS 2, a waiver of consent was granted by the REB for the following reasons: the study population was very large and patients would have been too difficult to locate. Descriptive data were extracted from a dataset originally collected by the Canadian Institute for Health Information for all KD patients diagnosed and hospitalized in Canada (not including Quebec) between March 2004 and March 2015. It is worth noting that this dataset is generally considered complete, and that the methods used in its collection have been thoroughly documented in the literature [23].
Data Pre-Processing. First, the at-risk population for KD was identified as the number of persons aged 19 years or younger living in a given area. Accordingly, a KD population file was generated containing the number of children living in each FSA at the time of the 2011 Canadian Census. The population at risk of developing CAA, on the other hand, was identified as the number of patients diagnosed with KD in a given area at a given point in time. Thus, a second, CAA population file was generated containing the number of patients diagnosed with KD in each FSA in a given month. Next, a case file was generated containing the diagnosis date and FSA of residence for each KD case, as well as a binary indicator of whether CAA developed. Patients were classified as being affected by coronary artery aneurysms if diagnostic code I25.4 (Coronary artery aneurysm or coronary arteriovenous fistula, acquired; clinical definition: abnormal balloon- or sac-like dilatation in the wall of coronary vessels most often due to coronary atherosclerosis or inflammatory disease such as Kawasaki disease) was associated with any of their hospital admissions.
Lastly, a geography file was generated containing the longitude and latitude coordinates at the center point of each FSA.
Spatiotemporal Clustering Analysis. Using the population, case and geography files generated in the pre-processing phase, discrete Poisson models were fitted and tested using the SaTScan software and Kulldorff's spatial scan statistic [26]. This statistic uses a cylindrical scanning window with a circular base representing a geographic area and a height corresponding to time. As the window is allowed to vary in terms of its height, diameter and spatial location, the statistic essentially visits each possible time period for each possible geographic location and size. From the vast number of generated cylinders spanning the entire study region (both in time and space), clusters of increased or decreased event incidence are identified and reported by the SaTScan software [27].
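As an illustration of how a single candidate cylinder could be scored under a discrete Poisson model, the following is a minimal Python sketch; it is not the SaTScan implementation used in this study. It relies on the standard Poisson log likelihood ratio for a scan window, the corresponding relative risk, and a rank-based Monte Carlo p-value. The function names and the example counts are hypothetical and chosen for illustration only, and the sketch assumes that the expected count lies strictly between zero and the total number of cases.

```python
import math


def _xlogy(x, y):
    """Return x * ln(y), treating the 0 * ln(0) case as 0."""
    return 0.0 if x == 0 else x * math.log(y)


def poisson_llr(cases_in, expected_in, total_cases):
    """Log likelihood ratio for one cylinder (assumes 0 < expected_in < total_cases).

    cases_in:    observed cases inside the cylinder (c)
    expected_in: cases expected inside under the null hypothesis (E),
                 scaled so that expectations sum to total_cases
    total_cases: all cases in the study region and period (C)
    """
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    return (_xlogy(cases_in, cases_in / expected_in)
            + _xlogy(cases_out, cases_out / expected_out))


def relative_risk(cases_in, expected_in, total_cases):
    """Risk inside the cylinder divided by the risk outside it."""
    cases_out = total_cases - cases_in
    if cases_out == 0:
        return math.inf
    expected_out = total_cases - expected_in
    return (cases_in / expected_in) / (cases_out / expected_out)


def monte_carlo_p_value(observed_llr, simulated_llrs):
    """Rank of the observed statistic among statistics from simulated data sets.

    In a full scan, each value would be the maximum log likelihood ratio
    over all cylinders for the real or a simulated data set.
    """
    rank = 1 + sum(1 for s in simulated_llrs if s >= observed_llr)
    return rank / (len(simulated_llrs) + 1)


# Hypothetical cylinder: 20 observed cases where 10 were expected, out of 100.
llr = poisson_llr(cases_in=20, expected_in=10.0, total_cases=100)
rr = relative_risk(cases_in=20, expected_in=10.0, total_cases=100)
```

When scanning for high-rate clusters only, a cylinder would be considered only if its observed count exceeds its expected count; for low-rate clusters the condition is reversed. With 999 simulated data sets, the smallest p-value this ranking can return is 1/1000.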
For the purposes of this study, the maximum spatial cluster size was set to 25% of the area of interest for both the KD and CAA analyses. KD clusters of both increased and decreased incidence were identified, whereas only clusters of high frequency were analyzed for CAA. The maximum temporal cluster sizes were set to 24 months for the KD analysis and 36 months for the CAA analysis. The restriction to clusters of increased frequency and the larger temporal cluster size in the CAA analysis were both due to the reduced number of CAA cases in the study region. Log likelihood ratio tests were performed to assess the significance of space-time clusters, with p-values being obtained from a 999-iteration Monte Carlo simulation. All clustering analyses were performed in SaTScan v9.4.
Epidemiologic Comparison. Once the spatiotemporal analysis had been completed, it was found that one of the major KD clusters occurred in Ontario between October 2004 and May 2005. Given the overlap between this cluster and the aforementioned Ontario KD surveillance data (available for cases between 1995 and 2006) [22], an exploratory epidemiologic analysis was subsequently performed on the identified cluster. Specifically, cases that occurred within the identified space-time window were designated as the main cluster, and were individually compared to the following groups of cases occurring outside of the cluster:
• RG1: Cases occurring within the same temporal window, but outside the spatial window,
• RG2: Cases occurring within the same spatial window, but within the equivalent temporal window a year prior to the identified cluster (i.e. October 2003 to May 2004),
• RG3: Cases occurring within the same spatial window, but within the equivalent temporal window a year after the identified cluster (i.e. October 2005 to May 2006), and
• RG4: A random subset of cases occurring within two years of the temporal window.
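The group definitions above can be sketched as a small classification rule. The following Python sketch is illustrative only and is not the code used in this study: the function names are hypothetical, diagnosis dates are assumed to be available as (year, month) pairs, and membership in the cluster's spatial window is assumed to be known for each case. RG4 is a random sample rather than a fixed window, so it is not assigned by this function.

```python
# Temporal window of the main cluster: October 2004 to May 2005.
MAIN_WINDOW = ((2004, 10), (2005, 5))


def shift_years(window, years):
    """Move a (start, end) window of (year, month) pairs by whole years."""
    (start_year, start_month), (end_year, end_month) = window
    return ((start_year + years, start_month), (end_year + years, end_month))


def in_window(year_month, window):
    """True if a (year, month) pair falls within the window, inclusive."""
    start, end = window
    return start <= year_month <= end


def assign_group(year_month, in_spatial_window):
    """Return 'main', 'RG1', 'RG2', 'RG3', or None for a single case."""
    if in_window(year_month, MAIN_WINDOW):
        return "main" if in_spatial_window else "RG1"
    if in_spatial_window and in_window(year_month, shift_years(MAIN_WINDOW, -1)):
        return "RG2"
    if in_spatial_window and in_window(year_month, shift_years(MAIN_WINDOW, 1)):
        return "RG3"
    return None


# Hypothetical case diagnosed in December 2003 inside the spatial window.
group = assign_group((2003, 12), in_spatial_window=True)
```

Representing dates as (year, month) tuples keeps the comparison simple, because Python compares tuples element by element, which matches chronological order.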
To assess the similarities and differences between the main cluster and the four reference groups, various demographic, clinical and treatment variables were examined. The mean and standard deviation were reported for continuous parameters, whereas categorical variables were presented in terms of count and percentage. With the goal of identifying important differences between the main cluster and the four reference groups, the significance of the difference from the main cluster was reported for each reference-group parameter. All analyses were performed using the R Project for Statistical Computing v3.4.2.
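As a hedged illustration of the descriptive summaries described above, the following Python sketch formats a continuous variable as mean ± standard deviation and a categorical variable as count and percentage. The analyses in this study were performed in R; the function names, the handling of missing values as None, and the example values here are assumptions made for illustration. No significance test is shown, since the specific tests are not named in this section.

```python
from statistics import mean, stdev


def summarize_continuous(values):
    """Format non-missing values as 'mean ± SD (N = n)'; needs at least 2 values."""
    observed = [v for v in values if v is not None]
    return f"{mean(observed):.0f} ± {stdev(observed):.0f} (N = {len(observed)})"


def summarize_categorical(flags):
    """Format non-missing True/False flags as 'count (percent%)'."""
    observed = [f for f in flags if f is not None]
    count = sum(1 for f in observed if f)
    return f"{count} ({100 * count / len(observed):.0f}%)"


# Hypothetical values for a handful of patients; None marks a missing value.
fever_days = summarize_continuous([5, 7, 6, None, 6])
conjunctivitis = summarize_categorical([True, False, True, None])
```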