Abstract
A diffusion process can be considered as the movement of linked events through space and time. Therefore, the spacetime locations of events are key to identifying any diffusion process. However, previous clustering analysis methods have focused only on spacetime proximity characteristics, neglecting the temporal lag of the movement of events. We argue that the temporal lag between events is key to understanding the process of diffusion movement. Using the temporal lag could help to clarify the types of close relationships. This study aims to develop a data exploration algorithm, namely the TrAcking Progression In Time And Space (TaPiTaS) algorithm, for understanding diffusion processes. Based on the spatial distance and temporal interval between cases, TaPiTaS detects subclusters (groups of events that have a high probability of sharing common sources), identifies progression links (the relationships between subclusters), and tracks progression chains (the connected components of subclusters). Dengue Fever case data were used as an illustrative case study. The location and temporal range of subclusters are presented, along with the progression links. The TaPiTaS algorithm contributes a more detailed and in-depth understanding of the development of progression chains, namely the geographic diffusion process.
Introduction
A geographic diffusion process is the evolution of spacetime clusters of entities. Geographic diffusion processes are a scientific field of research that focuses on the movement of events, goods, information, ideas, or people through space and time^{1,2}, that is, how do things spread from one place to another through time. In the literature, there are three types of diffusion: contagious, relocation, and hierarchical^{3,4,5}. Contagious diffusion is concerned with proximate contact and is highly influenced by the friction of distance. Relocation processes involve larger leaps in spatial distance. Hierarchical diffusion is influenced by inherent hierarchies of geographical space, such as demographic, socioeconomic, or the mobility structure of a region. While modeling geographic diffusion processes from the original event point locations, there are two critical points to consider: the occurrence of events, and the transmission of events.
The concept of event occurrence focuses solely on the spatial-temporal locations of events. To model the process of diffusion, we first need to know where and when events occurred. For example, the onset dates and the residential or working locations of the patients in a disease outbreak have to be recorded and analyzed in the model. Point pattern analysis methods are designed to describe the pattern of the locations of events, such as disease cases^{6,7,8}, accidents^{9}, crime locations^{10,11}, and disaster locations^{12,13}. Point pattern analysis can be classified into distance-based and density-based techniques^{1}. Distance-based techniques, such as nearest neighbor analysis, use information on the spacing of points to define a pattern^{14,15}. Density-based techniques, such as quadrat analysis and kernel density estimation, rely on various characteristics of the frequency distribution of the observed numbers of points in regularly defined subregions of the study area^{16,17}. In spatial epidemiology, kernel density estimation has been used to estimate the spatial distribution of potential risk factors^{7}. For example, Sabel et al.^{7} mapped the spatial distribution of relative risk based on patients’ residential locations, and the spatial-temporal trends of groups of patients based on their age groups. In summary, point pattern analysis detects spatial clustering^{18,19,20,21,22} and describes the spatial pattern of the occurrences of events.
On the other hand, the concept of the transmission of events focuses on movement through space and time. The diffusion of diseases has been studied for decades. Using Iceland as the study area, Cliff, Haggett and their team worked intensively in the 1960s on the spread of infectious disease in time and space within a closed island community^{3,23,24,25}. They attempted to link epidemic models with spatial theory and had some success in revealing the underlying mechanisms of the movement of disease through time and space. Aside from modeling diffusion from spacetime characteristics, recent studies have used graph theory and complex network analysis to explicitly model relationships between events^{26,27,28}. Transmission relationships were modeled at various network scales, including individual social networks^{29,30,31}, metapopulation and subpopulation networks^{32,33}, building networks^{34}, and city or country networks^{35,36}, by converting the objects of study into nodes and the contacts or interactions between them into links. Using complex network theories to analyze transmission relationships provides a clear topological structure of contacts, in terms of nodes and links, for revealing the process of complex interactions^{37,38}. These studies attempted to understand the process of exposure to disease through agent-based or equation-based simulation, or an integrated modeling approach. For example, Meloni et al.^{32} investigated infectious disease spread using a metapopulation system, a network composed of subpopulations. They modeled changes in human movement behavior in response to the status of disease at a location, and simulated the transmission of disease under these scenarios. They presented the concept of an invasion tree, which shows disease progression by defining a directional link from the origin subpopulation to the destination subpopulation of an infection process.
Disease diffusion in space and time has also been modelled with spatial dynamic models to understand spatial pattern formation^{39,40,41}. Spatial dynamic modeling is a mathematical approach that captures dynamic behaviors with patch-based spatial interaction models (e.g., cellular automata) to reveal population dynamic processes, such as disease spreading, predator-prey interaction, and the interaction between population density and the fitness of individuals^{42,43,44}. Therefore, we see that the major function of modeling the transmission between events is to understand and detect the movement process.
The temporal dimension is fundamental to understanding human activity^{45}. Taking the temporal dimension into consideration is crucial for investigating spacetime clustering and diffusion processes^{46}. Previous studies of spacetime point data analysis aimed to investigate two spatial-temporal phenomena: spacetime interactions and spacetime clustering. Tests of spacetime interaction determine whether a significant association exists between short distances in time and in space. For example, the Knox test uses a critical distance in space and in time to determine whether a pair of events is spatially and temporally close. If the distances in space and time are correlated, a spacetime interaction exists^{47}. In spatial epidemiology, these tests can determine whether epidemics have contagious characteristics^{48,49,50}. On the other hand, spacetime clustering focuses on detecting clusters of events that are close to each other in both the spatial and temporal dimensions. Spacetime clustering methods can be used to detect a statistically significant excess of events occurring within a limited spacetime continuum, which indicates where and when a situation becomes more serious. SaTScan, a spacetime scan statistic method, differs from spacetime interaction tests in that it can identify when and where clusters are, and it has been used to detect spacetime clusters^{51}. By considering the temporal dimension, not only the spatial location of the clusters but also their temporal periods can be revealed.
Diffusion processes emphasize the movement of events through space and time^{29}. However, neither spacetime interaction tests nor spacetime clustering methods are designed to capture the temporal differences of movements. Two events are considered related in the spacetime dimensions if they happen at the same place at the same time, i.e., within a small spatial range and a small temporal difference. However, because diffusion processes describe the spread of events through space and time, a temporal lag must exist between the source and target events, i.e., the second event should have occurred some time after the first event, and not too far from it. This is especially the case in disease diffusion processes, where transmissions may experience a temporal lag due to an incubation period, that is, the time between infection and disease emergence. Thus a temporal lag between the transmission pairs should be considered in the understanding of disease diffusion^{52}. To study disease diffusion processes, previous studies that used simulation approaches, including equation-based and agent-based modeling, considered the shift in the temporal dimension as a key aspect of the simulation models^{53,54,55,56}. In disease diffusion simulation models, such as the susceptible-exposed-infectious-recovered (SEIR) model, a patient is exposed after physical contact with an infectious patient, and then waits for several time steps (depending on the particular disease etiology) before becoming infectious^{53}. The temporal lag effect has been considered in previous simulated diffusion studies, but the purpose of these studies was to understand the outcomes of different policy scenarios. In other words, a simulation approach cannot be used for empirical data exploration purposes.
From a data exploration perspective, the purpose of which is to identify patterns within spacetime data, considering temporal lag can help clarify the relationships of spacetime proximate events, and capture the progression of diffusion events. This study aims to develop a novel algorithm that utilizes temporal lag properties for understanding diffusion processes. Previous studies in diffusion have tended to focus on visualizing the spreading processes, whereas here we propose to extend this work by modeling the process, specifically to identify the spacetime pattern and understand the structure and relationships between the clusters of events. First we describe the proposed algorithm, before we demonstrate its applicability using Dengue Fever data from Kaohsiung City, Taiwan.
Method
In this section, we present a novel algorithm, namely the TrAcking Progression in Time and Space (TaPiTaS) algorithm. We decomposed the diffusion process into subclusters and progression links between subclusters. The TaPiTaS algorithm uses the spatial and temporal distance between each pair of points to identify the most probable common origin and to detect subclusters. A subcluster is formed by a group of spatially and temporally close points that are probably related to one or several common origins. By common origin, we mean the source or in an epidemiological context, the original infective agent or individual that is common to all subsequent subclusters. Then, subclusters are connected by progression links according to the spatial and temporal relationships of these subclusters. Finally, progression chains that are formed by several linked subclusters could be revealed. The illustration of the subclusters is shown in Fig. 1, where the spatial dimensions are reduced to one dimension to show if the cases are close to each other.
The TaPiTaS algorithm is composed of three steps. The first step distinguishes the relationships of each pair of the spatially close events into two types: shifting link or neighboring pair. The second step focuses on identifying spacetime subclusters. The third step aims to construct the progressions between subclusters. The algorithm framework is shown in Fig. 2. We applied the TaPiTaS algorithm to individual cases of Dengue Fever from 1998 to 2015 in Taiwan.
Distinguishing shifting links and neighboring pairs
Because the spatial diffusion process describes the movement of events, it takes time to shift from one location to another. If two events happened in the same area at the same time, they can be considered a spacetime cluster of events, but not a spreading process. In point data analysis, the idea of the same area is captured by a spatial buffer zone, i.e., if one event happened within a distance buffer of another event, the two events are considered spatial neighbors. For the temporal dimension, the idea of the same time can likewise be captured by a time buffer, i.e., if one event happened within a time buffer of another event, they are temporal neighbors. If two events happened in the same area, and the second event happened shortly after the time buffer of the first event had elapsed, the second event can be considered the outcome of the first event, that is, the first event has shifted to the location of the second event. This situation captures the concept of temporal lag, which is defined as the temporal interval needed for the outcome event to appear, starting from the occurrence of the source event. However, if the second event happened a long time after the first event, the two events may be indirectly connected, but the relationship between them cannot be specified, and they are therefore considered not related.
The first step of the algorithm separates spatially neighboring events into three types of relationships (Fig. 3). For pairs of events that happened within a spatial buffer (D) of each other, we denote the pair as a neighboring pair if the temporal lag between the two events is shorter than or equal to a time-buffer (T1). We denote the pair as a shifting link if the time-lag between the two events is longer than the time-buffer but shorter than or equal to a time-threshold (T2). Pairs with a time-lag longer than the time-threshold are denoted as non-related and are not included in the subsequent procedures.
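The three-way classification described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `classify_pair` and the `(x, y, day)` tuple layout are assumptions, coordinates are assumed to be in a projected system measured in meters, and the default values of D, T1 and T2 are those used later in the case study.

```python
from math import hypot

def classify_pair(p, q, D=500.0, T1=12, T2=27):
    """Classify two events given as (x, y, day) tuples (hypothetical layout).

    Returns 'neighboring', 'shifting' or 'unrelated'.
    """
    (x1, y1, day1), (x2, y2, day2) = p, q
    # Pairs outside the spatial buffer D are never considered.
    if hypot(x2 - x1, y2 - y1) > D:
        return "unrelated"
    lag = abs(day2 - day1)
    if lag <= T1:
        return "neighboring"   # same place, same time
    if lag <= T2:
        return "shifting"      # same place, lagged in time
    return "unrelated"         # lag beyond the time-threshold


if __name__ == "__main__":
    first = (0.0, 0.0, 0)
    print(classify_pair(first, (100.0, 0.0, 5)))
    print(classify_pair(first, (100.0, 0.0, 20)))
    print(classify_pair(first, (100.0, 0.0, 40)))
```

For a shifting link, the earlier event of the pair would be treated as the source and the later event as the target.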
Detecting spacetime subclusters
To detect spacetime subclusters, the algorithm analyzes neighboring pairs and shifting links. A group of nodes that are probably related to one or several common origins is then determined from the shifting relationships. We define shifting links to capture the opportunity of moving from one event to a later event, and use them to measure the chance that a pair of nodes shares a strong common origin (or several common origins). The shifting relationship in time and space is defined by a spatial and a temporal weighting function. Spatial weights decrease with increasing distance between the two nodes, whereas the temporal weight rises after T1, reaches a peak at the midpoint of the range between T1 and T2, and decreases from the midpoint until T2. Therefore, the spatial weighting function is formulated as a distance-decay function with a threshold at D (Equation 2), and the temporal weighting function is formulated as a bell-shaped function whose mean is the midpoint between T1 and T2 and whose standard deviation is half of the range between T1 and T2 (Equation 3). The results of the spacetime weighting equations are illustrated in Fig. 4. The two weighting functions are normalized to range from zero (the lowest) to one (the highest). The shifting relationship between nodes is then defined as a combined weight (\({W}_{c}(i,j)\)) of the occurrence of the link, calculated from the spatial and temporal weights given by Equations 2 and 3, respectively. We adopted the concept of the Mantel Index, a commonly used spacetime correlation statistic that is based on a product of normalized spatial distances and time intervals^{47}. Thus, the combined weight (Equation 1) is the product of the spatial and temporal weights. A higher combined weight between two nodes indicates a stronger spacetime association between them.
where \({W}_{s}(i,j)\), \({W}_{t}(i,j)\), and \({W}_{c}(i,j)\) are the spatial, temporal, and combined weighting functions, respectively; \(d(i,j)\) and \(t(i,j)\) are the spatial distance and temporal distance between the pair of nodes of a shifting link; \(D\) is the spatial buffer; and T1 and T2 are the time-buffer and time-threshold, respectively.
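To make the weighting concrete, the following Python sketch computes a combined weight for one shifting link. The exact functional forms are assumptions chosen only to match the verbal description: a linear distance decay that reaches zero at D, and a Gaussian-shaped temporal weight with mean (T1 + T2)/2 and standard deviation (T2 − T1)/2, scaled so that its peak equals one. The function names are hypothetical.

```python
import math

def spatial_weight(d, D=500.0):
    """Assumed linear distance decay: 1 at d = 0, falling to 0 at d >= D."""
    return max(0.0, 1.0 - d / D)

def temporal_weight(t, T1=12.0, T2=27.0):
    """Assumed bell-shaped weight on the lag window (T1, T2], peak value 1.

    Mean is the midpoint of T1 and T2; standard deviation is half the range.
    """
    if t <= T1 or t > T2:
        return 0.0  # not a shifting link
    mu = (T1 + T2) / 2.0
    sigma = (T2 - T1) / 2.0
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2)

def combined_weight(d, t, D=500.0, T1=12.0, T2=27.0):
    """Combined weight as the product of the spatial and temporal weights."""
    return spatial_weight(d, D) * temporal_weight(t, T1, T2)


if __name__ == "__main__":
    # A link 250 m apart with a lag at the middle of the window.
    print(combined_weight(250.0, 19.5))
```

Under these assumed forms, the combined weight is largest for a pair at zero distance with a lag at the midpoint of the window, and it is zero for any pair at or beyond the spatial buffer.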
For each target case (j), we compare the combined weight of each of its incoming shifting links (\({W}_{c}(i,j)\)) with the total combined weight of all of its incoming shifting links \((\sum _{k\in I(j)}{W}_{c}(k,j))\). The higher the relative weight of a shifting link, the more likely it is that the target case was shifted from the source of that link.
\[RW(i,j)=\frac{{W}_{c}(i,j)}{\sum _{k\in I(j)}{W}_{c}(k,j)}\qquad (4)\]
where \(RW(i,j)\) is the relative weight of shifting, and \(I(j)\) is the set of incoming shifting links of j.
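As a small illustration, the relative weight of each shifting link can be computed by dividing its combined weight by the total combined weight of all links entering the same target. The sketch below assumes the combined weights are stored in a dictionary keyed by `(source, target)`; this data layout and the function name are assumptions made for the example.

```python
from collections import defaultdict

def relative_weights(combined):
    """Relative weight of shifting for every link.

    combined: dict mapping (source, target) -> combined weight W_c.
    Returns a dict mapping (source, target) -> W_c / sum of W_c into target.
    """
    totals = defaultdict(float)
    for (_, target), w in combined.items():
        totals[target] += w
    # Links into a target whose total weight is zero are dropped.
    return {
        (source, target): w / totals[target]
        for (source, target), w in combined.items()
        if totals[target] > 0
    }


if __name__ == "__main__":
    w_c = {("a", "c"): 0.6, ("b", "c"): 0.2, ("a", "d"): 0.5}
    for link, rw in relative_weights(w_c).items():
        print(link, round(rw, 3))
```

By construction, the relative weights of all incoming links of one target sum to one, so they can be read as the share of each candidate source.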
Using the relative weight of shifting, we calculate the single propensity (\({P}_{c}(s,(a,b))\)) of each neighboring pair (\(a,b\)) from each common source (\(s\in I(a)\cap I(b)\)) with Equation 5. The total common propensity (\({S}_{c}(a,b)\)) of each neighboring pair from all of their common sources (\(k\in I(a)\cap I(b)\)) is calculated by Equation 6. Thus, the higher the total common propensity, the more likely the neighboring pair are shifted from one of their common sources.
where, \({P}_{c}(s,(a,b))\) is the single propensity from one common source to a pair of nodes of a neighboring pair; \({S}_{c}(a,b)\) is the total common propensity of a pair of neighboring nodes.
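The propensity calculation can be sketched as follows. The sketch assumes, purely for illustration, that the single propensity of a common source s for the pair (a, b) is the product of the two relative weights RW(s, a) and RW(s, b); the total common propensity is then the sum of the single propensities over all common sources, as described above. The function name and the dictionary layout are hypothetical.

```python
def common_propensity(rw, a, b):
    """Single and total common propensity of a neighboring pair (a, b).

    rw: dict mapping (source, target) -> relative weight of shifting.
    Assumption: single propensity = RW(s, a) * RW(s, b).
    Returns (dict source -> single propensity, total common propensity).
    """
    sources_a = {s for (s, target) in rw if target == a}
    sources_b = {s for (s, target) in rw if target == b}
    single = {s: rw[(s, a)] * rw[(s, b)] for s in sources_a & sources_b}
    return single, sum(single.values())


if __name__ == "__main__":
    rw = {("s1", "a"): 0.75, ("s2", "a"): 0.25, ("s1", "b"): 1.0}
    single, total = common_propensity(rw, "a", "b")
    print(single, total)
```

The source with the largest single propensity is the most probable common origin of the pair, which is what the later progression step relies on.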
Based on the distribution of total common propensity values, a nonparametric bootstrap procedure is used to identify neighboring pairs whose relationships are significantly stronger than the others. A critical value (\({S}_{c}^{\prime }\)) is calculated from a bootstrapping process and used as the threshold to filter the neighboring pairs. Let N denote the number of neighboring pairs. The bootstrapping process randomly samples N neighboring pairs (\(a,b\)), and calculates and records the mean (\(mea{n}_{s}\)) of the N samples’ total common propensities (\({S}_{c}(a,b)\)). The resampling process is repeated M times, and the bootstrapped mean (\(mea{n}_{boot}\)) and standard deviation (\(s{d}_{boot}\)) of the M recorded means (\(mea{n}_{s}\)) are calculated to evaluate the critical value with Equation 7. A neighboring pair with a total common propensity \(({S}_{c}(a,b))\) higher than or equal to the critical value (\({S}_{c}^{\prime }\)) is defined as a cluster pair. The other neighboring pairs are excluded from the following procedures.
\[{S}_{c}^{\prime }=mea{n}_{boot}+1.28\times s{d}_{boot}\qquad (7)\]
where \({S}_{c}^{\prime }\) is the upper critical value from the bootstrapping process; \(mea{n}_{boot}\) and \(s{d}_{boot}\) are the mean and standard deviation of the distribution of the M resampled means of the total common propensity values (\(mea{n}_{s}\)). To calculate the upper bound of the two-tailed 80% interval of the distribution, the standard deviation is multiplied by 1.28 and added to the mean of the M resampled means.
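The bootstrap threshold can be sketched with the Python standard library. Two details are assumptions not stated in the text: the N pairs are resampled with replacement (the usual bootstrap convention), and the sample standard deviation is used. The default M = 99 follows the number of iterations reported in the case study; the function names and the fixed seed are hypothetical.

```python
import random
import statistics

def bootstrap_critical_value(propensities, M=99, z=1.28, seed=0):
    """Upper critical value: mean_boot + z * sd_boot of M resampled means.

    propensities: list of total common propensity values, one per
    neighboring pair. Resampling with replacement is an assumption.
    """
    rng = random.Random(seed)
    n = len(propensities)
    means = [
        statistics.fmean(rng.choices(propensities, k=n))
        for _ in range(M)
    ]
    return statistics.fmean(means) + z * statistics.stdev(means)

def cluster_pairs(pair_propensity, critical_value):
    """Keep the neighboring pairs at or above the critical value."""
    return [p for p, s in pair_propensity.items() if s >= critical_value]


if __name__ == "__main__":
    s_c = {("a", "b"): 0.9, ("b", "c"): 0.1, ("c", "d"): 0.2, ("d", "e"): 0.15}
    crit = bootstrap_critical_value(list(s_c.values()))
    print(crit, cluster_pairs(s_c, crit))
```

Because the threshold is built from the distribution of resampled means, it sits slightly above the overall mean propensity, so only pairs with clearly above-average common propensity survive the filter.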
After all of the neighboring pairs have been evaluated, we construct a network in which the nodes are events, and two nodes are connected if they form a cluster pair. We then search for the connected components of the network, that is, the subgroups of nodes in which each node has a connecting path to every other node of the same subgroup. Each connected component is identified as a spacetime subcluster. A subcluster is therefore composed of a group of events that are connected by a set of cluster pairs, indicating that they are more likely to have one or several common sources.
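Finding the subclusters amounts to a standard connected-components search on the cluster-pair network. The sketch below uses a depth-first traversal over an adjacency map; the function name and the input format (a list of event-identifier pairs) are assumptions for illustration.

```python
from collections import defaultdict

def find_subclusters(cluster_pairs):
    """Return the connected components of the cluster-pair network.

    cluster_pairs: iterable of (event_a, event_b) tuples.
    Each returned set of events is one spacetime subcluster.
    """
    adjacency = defaultdict(set)
    for a, b in cluster_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited event.
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


if __name__ == "__main__":
    print(find_subclusters([(1, 2), (2, 3), (4, 5)]))
```

Events that do not take part in any cluster pair never enter the network, so they do not belong to any subcluster.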
Constructing the progression chains
The progressions represent the connections between subclusters; they show how the subclusters influence one another and the direction of the diffusion process. Two scales of progression are considered: progression links and progression chains. After all cluster pairs are found, the most probable common origin of each cluster pair (the source(s) of the shifting links with the maximum \({P}_{c}(s,(a,b))\)) is revealed, and the corresponding shifting links are defined as common links. A progression link is constructed between two subclusters if at least one common link exists between them; it represents the progression from one subcluster to another. Subclusters that are connected with each other form a progression chain (Fig. 1b), whereas subclusters that are not linked with any other subcluster are called isolated subclusters.
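The construction of progression links and chains can be sketched as follows. The sketch assumes that each common link is given as a `(source_case, target_case)` pair and that a `membership` dictionary maps each case to the identifier of its subcluster; both structures, and the function name, are hypothetical. Progression links are counted per ordered pair of subclusters, and chains are obtained by merging linked subclusters with a small union-find.

```python
from collections import Counter

def build_progressions(common_links, membership):
    """Derive progression links, progression chains and isolated subclusters.

    common_links: iterable of (source_case, target_case).
    membership: dict mapping case -> subcluster id.
    """
    links = Counter()
    for source, target in common_links:
        a, b = membership.get(source), membership.get(target)
        # Only links between two different subclusters are progressions.
        if a is not None and b is not None and a != b:
            links[(a, b)] += 1

    # Union-find over subclusters, ignoring link direction.
    parent = {sc: sc for sc in set(membership.values())}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for sc in parent:
        groups.setdefault(find(sc), set()).add(sc)
    chains = [g for g in groups.values() if len(g) > 1]
    isolated = sorted(next(iter(g)) for g in groups.values() if len(g) == 1)
    return links, chains, isolated


if __name__ == "__main__":
    membership = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "D"}
    common = [(1, 3), (2, 4), (3, 5), (1, 2)]
    print(build_progressions(common, membership))
```

The count stored for each progression link corresponds to the number of links connecting the two subclusters, which is a natural quantity to map to line width when drawing the progressions.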
Application: the subclusters and progressions of Dengue Fever in Taiwan from 1998 to 2015
To test and demonstrate our TaPiTaS algorithm for understanding and visualizing the diffusion process, we used Dengue Fever data in Kaohsiung City, Taiwan from 1998 to 2015. Located in East Asia, Taiwan straddles tropical and subtropical zones. The tropical weather pattern of Kaohsiung City, hot temperature and high humidity in summer, provides suitable habitats for the vector of Dengue Fever (mainly the Aedes aegypti mosquitoes)^{57,58}. Moreover, Kaohsiung City, which has the largest harbor and an international airport in Taiwan, is a major EastAsian transport hub that has a high volume of travelers from SouthEast Asia. This increases the opportunity of Dengue Fever importation to Taiwan^{59,60}. The number of Dengue Fever cases in Kaohsiung City shows an annual cyclical pattern, which mainly starts in the early summer and ends in the late winter of the next year^{57}. Therefore, we have separated the data in this study to start from April 1st of each year to March 31st of the following year. The locations of past outbreaks are mostly concentrated in southern Taiwan (including Kaohsiung City) due to the temperature suitability for the mosquitoes breeding^{61,62}. Therefore, the aim of the case study is to detect subclusters from the annual Dengue Fever cases and identify processes between the subclusters.
Dengue Fever is a vector-borne disease with a human-mosquito-human transmission cycle. There is a temporal lag between the time a case is infected and the time when the case becomes infectious. The infectious period after the first symptoms appear is about 4–5 days; the extrinsic incubation period (EIP) in mosquitoes is about 8–12 days; and the intrinsic incubation period (IIP) in humans is about 4–10 days^{63} (Fig. 5). Therefore, the minimum temporal lag between a pair of related cases is 12 days (the first patient infects the mosquitoes on the first day, followed by 8 days of extrinsic incubation and 4 days of intrinsic incubation in the second patient), whereas the maximum time-lag is about 27 days (the first patient infects the mosquitoes on the 5th day, followed by 12 days of extrinsic and 10 days of intrinsic incubation).
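The arithmetic behind the two temporal parameters can be written out explicitly. The short sketch below reproduces the 12-day and 27-day bounds from the period ranges quoted above; the constant and function names are hypothetical, and the offsets follow the counting used in the text (0 days when the mosquito is infected on the first day, 5 days when it is infected on the 5th day).

```python
# Period ranges quoted in the text, as (shortest, longest) in days.
EIP = (8, 12)   # extrinsic incubation period in the mosquito
IIP = (4, 10)   # intrinsic incubation period in the human
# Delay between the first patient's onset and the mosquito's infection,
# following the text's counting: 0 for the first day, 5 for the 5th day.
BITE_OFFSET = (0, 5)

def lag_window(bite=BITE_OFFSET, eip=EIP, iip=IIP):
    """Return the (minimum, maximum) onset-to-onset lag in days."""
    shortest = bite[0] + eip[0] + iip[0]
    longest = bite[1] + eip[1] + iip[1]
    return shortest, longest


if __name__ == "__main__":
    T1, T2 = lag_window()
    print(T1, T2)  # 12 27
```

For another disease, replacing the three ranges with the relevant periods would give the corresponding lag window.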
Our data were provided by the Taiwan Centers for Disease Control (Taiwan CDC), which records the daily number of Dengue Fever cases in each basic statistical unit (BSU) in Kaohsiung City, separated into imported (to Taiwan) and local indigenous cases based on the epidemiological investigation records. The Kaohsiung City area includes the areas that, until 2010, were named Kaohsiung City and Kaohsiung County. Only the indigenous case data were used in this study, to eliminate external importation noise from the local diffusion process. On average across the whole of Taiwan, each BSU contains 400 people. There are three main parameters in our TaPiTaS algorithm: a spatial distance parameter and two temporal parameters. In this case study, the spatial neighboring parameter (D) was set to 500 meters following Hsu and Tsai^{64}. Based on the intrinsic and extrinsic incubation periods of Dengue Fever^{63}, the time-buffer (T1) was set to 12 days and the time-threshold (T2) was set to 27 days. Experimentally, the bootstrapping value converged after 99 iterations.
Descriptive statistics of events, pairs of close events, subclusters, and chains
Table 1 shows the descriptive statistics of the diffusion progression by year. The total number of cases varied between years: six of the 18 years had fewer than 100 cases, whereas in 2014 and 2015 the number of cases exceeded 10,000. The number of subclusters is related to the number of cases in the year, but is not always proportional to it. For example, 2010 had fewer cases than 2011, but more subclusters were detected. The subcluster size (SC size) and duration (SC duration) show the overall extent and the temporal continuity of subclusters. Subcluster size is measured by the number of cases in each subcluster, whereas duration is measured by the number of days between the first case and the last case in each subcluster. Regardless of the large differences in the total number of cases, the median and the median absolute deviation (MAD) of the subcluster sizes and durations are similar over the 18 years: subclusters consist of about 2 to 4 cases, and the median duration is 5 days (with a MAD of 4 days). The MAD^{65} describes the spread of our data better than the standard deviation.
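For reference, the median absolute deviation used in Table 1 can be computed as the median of the absolute deviations from the median. The sketch below shows the unscaled form; whether a scaling constant was applied in the original analysis is not stated, so none is assumed here, and the sample durations are made-up illustrative values.

```python
import statistics

def median_absolute_deviation(values):
    """Unscaled MAD: median of |x - median(values)|."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


if __name__ == "__main__":
    # Hypothetical subcluster durations in days, including one long outlier.
    durations = [1, 2, 3, 4, 100]
    print(statistics.median(durations), median_absolute_deviation(durations))
```

Because both the median and the MAD ignore the magnitude of extreme values, a single very long subcluster barely changes them, which is why they suit the skewed size and duration distributions described above.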
The number of isolated subclusters (isoSC), progression links (PL), chains, and the characteristics of progression chains including the sizes and duration are also shown in Table 1. The chain size is measured by using the number of subclusters in each chain to show the extent of chains in the year; the chain duration is measured by using the temporal difference between the first case and the last case in each chain to show the temporal continuity in each year. Similar to the subcluster sizes and durations, the chain sizes and durations are also similar through the 18 years regardless of the differences in terms of the total size of cases. The chain sizes consist of 2 to 3 subclusters and the duration median is one month (with 2 weeks MAD).
Visualizing the processes of diffusion
According to the descriptive statistics (Table 1), Kaohsiung City experienced its most severe epidemics of the past 70 years in three years: 2002, 2014, and 2015^{65,66}. The total numbers of confirmed Dengue Fever cases in these three years were 4,671, 15,011, and 19,520, respectively. The progression structures of large-scale Dengue transmission in these epidemic years can be distinctly illustrated and investigated quantitatively to demonstrate the functionality of our proposed algorithm. Therefore, the following discussion focuses on these three years. In 2002, our algorithm found 276 subclusters, 224 of which formed 39 progression chains. In 2014, 435 subclusters were detected, 336 of which were found in 56 progression chains. In 2015, 484 subclusters were detected, 368 of which formed 44 progression chains.
The spatial distribution of the cases, subclusters and progression links in 2002, 2014, and 2015 are shown in Fig. 6. The colors indicate the progression chains to which they belong. The subclusters are presented as standard ellipses using the XY coordinates of cases to determine the standard distances. The width of the progression links indicates the number of shifting links that connect the two subclusters. Cases were distributed throughout the city for the three years (Fig. 6a,d,g). In 2002, the algorithm found a significant spatial separation between the progression chains (Fig. 6a). Figure 6b shows that the progression chains were differentiated into ellipses by our algorithm based on the temporal differences between the cases. In order to show the evolution and strength between the subclusters, the progression links were mapped in Fig. 6c.
The spatial distributions of the progression chains in 2014 and 2015 differed from that of 2002, in that an extremely large subcluster appeared and covered the most highly populated area of the city (Fig. 6e,h). This is because the cases were not concentrated in a small area over a short temporal duration, but were distributed over a larger region within the spatial search radius. Unlike in 2002, some smaller progression chains overlapped with the large subcluster in 2014. However, most of the smaller overlapping subclusters were linked with another large subcluster in 2015. Comparing the 2014 results with the 2015 results, the 2015 analysis revealed more subclusters but fewer progression chains. The chain sizes in 2015 were larger than those in 2014, indicating that the subclusters in 2015 were more connected to one another. For example, in the southern part of Kaohsiung City (lower part of the map), five progression chains were detected in 2014, but only one progression chain was found in 2015.
Discussion
When exploring diffusion processes, especially disease diffusion, the actual interactions between events (cases) are normally not known, since information about the transmission process (subcluster evolution over time) between large numbers of cases is difficult to measure retrospectively. The location and time of each case are normally the only available data. To uncover diffusion processes from this type of data, we have developed the TaPiTaS algorithm for exploring and visualizing spacetime point data. TaPiTaS is a novel algorithm that utilizes the temporal lag within the diffusion process and the spatial distance between events to detect spatial-temporal subclusters and to uncover the development of progression chains. Recall that a subcluster represents events that have a high propensity to share common origins in the diffusion process, and that a progression chain represents linked subclusters. Thus TaPiTaS is an important conceptual contribution to the understanding of cluster structure and subcluster evolution.
We anticipate that this algorithm can be used to explore diffusion phenomena in which temporal lag is key to the movement of the diffusion process. Unlike previous cluster detection methods, which only included a time-window to capture temporal proximity, our algorithm adds a lagged time-window (between T1 and T2) to capture the shifts between events, and weights the temporal lags to search for subclusters and progressions. This enables us to distinguish the subclusters from the spatially and temporally clustered events.
To demonstrate the analysis process and the algorithm outputs, we presented a case study of the annual Dengue Fever diffusion process in Kaohsiung City, Taiwan, from 1998 to 2015. In the case study, despite the varying sizes of epidemics in different years, we detected subclusters of similar size, with a median of 2 to 4 cases. Most of the subclusters are composed of small numbers of cases because the algorithm distinguishes the relationships between spacetime proximate events using the lagged time-window, and because it filters neighboring pairs using a critical value to ensure that only the pairs of neighbors with a compellingly high total common propensity are used to detect subclusters. In other words, by connecting the events that happen in the same place at the same time (with buffer zones in space and time), our TaPiTaS algorithm aggregates cases into small groups, namely the subclusters, which could form part of a larger spacetime cluster. Turning to the progression chains, by connecting the subclusters according to the evolution of shifting links, TaPiTaS uncovers the movement progression in the spacetime dimensions, namely the processes within a spacetime cluster.
Our results differentiate the diffusion structures in time and space among the most severe epidemic years: 2002, 2014, and 2015. This implies that the Dengue Fever epidemics in these years in Kaohsiung could have been triggered by different sources of infection, different driving forces of transmission, and differences in the effectiveness of intervention measures. In 2002, the diffusion structure indicated that the epidemic circulated around the different district-level administrative areas. The district health authority is the basic operational unit for disease control and prevention. Thus, this implies that the peripheral areas may lack consistent intervention measures. The subclusters shown in green originated from box A (Fig. 6c), near the boundary of the Fengshan district, a frequent Dengue-epidemic region^{57,67}, and then spread southwest, northwest, and north. Fengshan district is a satellite city of the Kaohsiung metropolitan area that has a high population density, but its socioeconomic level is lower than that of the neighboring central business district (CBD) of Kaohsiung City. Recent studies have shown that areas with high urbanization levels, high population density, and low socioeconomic status have an increased risk of Dengue diffusion and tend to become sources of diffusion^{52}. Moreover, a group of small and less connected subclusters appears in the southern part of the map (Fig. 6b, box B), indicating that these areas may also be vulnerable to Dengue Fever and may require more attention in the future.
The epidemic progressions in 2014 and 2015 share some common diffusion structure characteristics. First, each of these two Dengue epidemics is composed of two major progression chains. One of the progression chains contains the largest subcluster, and the cases in this subcluster were mostly located in epidemic areas (Fig. 6e,h). Second, two smaller groups of progression chains appear in the southern part of Kaohsiung City. One group contains the orange progression chain in 2014 (Fig. 6e, boxC) and part of the purple progression chain in 2015 (Fig. 6h, boxE). The other contains a group of progression chains in 2014 (Fig. 6e, boxD) and a black progression chain in 2015 (Fig. 6h, boxF). These locations are approximately the same vulnerable areas as those identified in 2002 (Fig. 6b, boxB). Therefore, comparing these progression chains could reveal geographic epidemiological links among these three years. On the other hand, we also identified different progression directions in 2014 and 2015. In 2014, the progression links pointing northwest were stronger than those in other directions, and the epidemic subclusters mainly diffused northward and northwestward. In 2015, by contrast, some of the progression links pointing south were stronger, and some significant southward spreading of subclusters was also observed.
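One way to summarize such directional differences is to total link strength by compass direction. The sketch below assumes that each progression link is available as a source centroid, a target centroid, and a strength value, with projected coordinates (x pointing east, y pointing north); the function names and the eight-sector binning are illustrative choices, not part of TaPiTaS as published.

```python
import math

# Eight compass sectors, counter-clockwise starting from east.
COMPASS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def link_direction(src, dst):
    """Compass sector of the vector from centroid src to centroid dst."""
    angle = math.degrees(math.atan2(dst[1] - src[1], dst[0] - src[0])) % 360.0
    return COMPASS[int((angle + 22.5) // 45) % 8]


def strength_by_direction(links):
    """Sum link strength per compass sector.

    links: iterable of (src_xy, dst_xy, strength) tuples (assumed layout).
    """
    totals = {}
    for src, dst, strength in links:
        sector = link_direction(src, dst)
        totals[sector] = totals.get(sector, 0.0) + strength
    return totals
```

Comparing the resulting totals between years gives a compact numerical counterpart to the visual comparison of arrows on the maps.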
We illustrated the diffusion process using standard ellipses to represent subclusters and arrows between them to show the progression chains. Previous studies on spacetime clustering using the spacetime scan statistic (SaTScan) visualized the clustered area with a circle and described the clustering period of each cluster^{68,69}. This visualization method clearly shows the location and magnitude of the detected clusters, but it ignores the process of cluster development. Other spacetime clustering studies used kernel density techniques to visualize the clustered area in spacetime cube plots superimposed on a 2D X,Y map^{21,22}. Event temporal mobility was illustrated on the Z (third) dimension, so that event migration could be shown in a three-dimensional figure. However, a static 3D image cannot be interacted with, which makes cluster progression difficult to follow. Furthermore, density-based techniques assume simple temporal proximity relationships rather than calculating them from the data, whereas in our TaPiTaS algorithm the temporal lag between cases is explicitly included in the calculation procedure. In our study, using standard ellipses to visualize subclusters gives readers a better understanding of the shape and direction of subclusters, and using arrows to link the subclusters, labeled with the day of appearance of each subcluster, reveals the process of diffusion.
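As an illustration of how an ellipse can be derived from the cases of a subcluster, the following sketch computes a covariance-based one-standard-deviation ellipse. Several conventions for standard (deviational) ellipses exist, and the text does not state which one was used, so the scaling here is an assumption; the function name is hypothetical.

```python
import math


def standard_ellipse(points):
    """Covariance-based ellipse for a list of (x, y) points.

    Returns (center, semi_major, semi_minor, angle_deg), where angle_deg is
    the orientation of the major axis measured counter-clockwise from the
    x-axis. Semi-axes are one standard deviation long (assumed convention).
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n

    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    semi_major = math.sqrt(half_trace + root)
    semi_minor = math.sqrt(max(0.0, half_trace - root))

    # Orientation of the eigenvector with the larger eigenvalue.
    angle_deg = math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
    return (mx, my), semi_major, semi_minor, angle_deg
```

The center, axis lengths, and orientation returned here are the quantities needed to draw each subcluster, and the centers can serve as endpoints for the arrows that represent progression links.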
Profiling the diffusion structures of Dengue Fever epidemics can provide important clues for health authorities to implement spatially targeted intervention measures. By identifying progression links, our proposed algorithm reveals three key geospatial characteristics of Dengue diffusion: the source areas, the target areas, and the linkage between epidemic areas. The source areas of diffusion may represent areas that are suitable for the breeding of the disease vector. Eliminating mosquito habitats could benefit both the source area and its adjacent areas, producing transboundary externalities in disease control^{70,71}. The target areas could be regions with spatial risk factors for Dengue transmission. These risk factors include high urbanization levels, low socioeconomic status, favorable weather conditions, and previous epidemic records^{52,72,73}, and areas with these factors are expected to be more vulnerable to Dengue Fever transmission. The linkage between epidemic areas indicates spatial interaction or communication through human movement. It also implies the possible route by which the virus diffuses from one region to another. A longer progression link may lead to a large-scale epidemic through long-distance human movement^{50,66,74}.
Our TaPiTaS algorithm has several limitations. First, when a large number of events happen within a short spatial distance of each other over a short temporal period, as occurred in the case study in 2014 and 2015 with more than 15,000 cases per year, our algorithm could not fully capture the diffusion process. Instead of identifying separate subclusters, our algorithm grouped the events into one giant subcluster, which became a common source for the subclusters at different places. This situation requires further investigation to understand the mechanism of such extreme subclusters or progression chains. Second, no significance evaluation procedure is included in our TaPiTaS algorithm. Because clusters can arise by chance even under complete spatial randomness, the detected subclusters and progression links could be the result of a random spacetime distribution. To overcome this limitation, a Monte Carlo significance test could be included to evaluate the level of significance of each subcluster and progression link. Third, the underlying population distribution is not considered. Other methods, such as SaTScan, consider the distribution of the population at risk to capture the spatial inhomogeneity of the population, and perform a risk normalization procedure in the calculation. Risk normalization is not included in our current algorithm, but it could be done in the data preparation stage before the algorithm is run, or in the results evaluation stage using additional calculations afterwards. Fourth, spatial proximity is measured only by the straight-line distance between events, which simplifies the concept of distance. The measurement of distance is a key issue in most spatial analysis models and methods. Alternatives such as street network distance, time needed for movement, or cost of movement could be used to measure the spatial proximity between events. Fifth, we used two equations to calculate the spatial and temporal weights separately (Equations 2 and 3), and multiplied them to calculate the integrated weight (Equation 1) in the algorithm. These equations may not be suitable for all events of interest in different studies. Therefore, the parameters for weighting spatial and temporal distance, and for the combined weight of shifting links, could be modified according to the needs of the study subject.
Conclusion
Temporal lag is essential to the understanding of diffusion processes. We have proposed a novel algorithm, TaPiTaS, that uses the spatial distance and temporal interval between events to explore and visualize the diffusion process. This algorithm can be used to explore point data with timestamps, and it outputs subclusters, progression links, and progression chains to show the process of diffusion. As a method for spacetime point data exploration, TaPiTaS differentiates the relationships between events and detects the subclusters of events that are immediately proximate to each other. TaPiTaS then identifies the directional links between subclusters, which represent diffusion progressions. By additionally visualizing the detected subclusters and progression links, our TaPiTaS algorithm contributes a more detailed and in-depth understanding of the geographic diffusion process than currently exists. In summary, we propose our TaPiTaS algorithm as a tool for uncovering the evolution of spacetime clusters of entities.
Data availability
The data that support the findings of this study are available from the Center for Disease Control, Taiwan, under the Taiwan Open Government Data License, version 1.0 (http://data.gov.tw/license#eng). The datasets are available in the "Daily reported Dengue Fever cases since 1998" repository (dataset url: http://data.gov.tw/node/21025, direct download url: http://data.gov.tw/iisi/logaccess/61136?dataUrl=http://nidss.cdc.gov.tw/download/Dengue_Daily_EN.csv&ndctype=CSV&ndcnid=21025).
References
Haggett, P., Cliff, A. D. & Frey, A. E. Locational analysis in human geography. (Wiley, 1977).
Sabel, C. E., Pringle, D. & Schærstrom, A. Chapter 7: Infectious Disease Diffusion. In A Companion to Health and Medical Geography 111–132 (Wiley-Blackwell, 2010).
Cliff, A. D., Haggett, P. & Smallman-Raynor, M. Island Epidemics. (Oxford University Press, 2000).
Nekola, J. C. & White, P. S. The distance decay of similarity in biogeography and ecology. J. Biogeogr. 26, 867–878 (1999).
Meade, M. S. & Emch, M. Medical geography. (The Guilford Press, 2010).
Bithell, J. F. An application of density estimation to geographical epidemiology. Stat. Med. 9, 691–701 (1990).
Sabel, C. E., Gatrell, A. C., Löytönen, M., Maasilta, P. & Jokelainen, M. Modelling exposure opportunities: estimating relative risk for motor neurone disease in Finland. Soc. Sci. Med. 50, 1121–1137 (2000).
Porphyre, T. et al. Vulnerability of the British swine industry to classical swine fever. Sci. Rep. 7, 42992 (2017).
Xie, Z. & Yan, J. Kernel Density Estimation of traffic accidents in a network space. Comput. Environ. Urban Syst. 32, 396–406 (2008).
Gerber, M. S. Predicting crime using Twitter and kernel density estimation. Decis. Support Syst. 61, 115–125 (2014).
Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P. & Tita, G. E. Self-Exciting Point Process Modeling of Crime. J. Am. Stat. Assoc. 106, 100–108 (2011).
Woo, G. Kernel estimation methods for seismic hazard area source modeling. Bull. Seismol. Soc. Am. 86, 353–362 (1996).
Galindo, I., Romero, M. C., Sánchez, N. & Morales, J. M. Quantitative volcanic susceptibility analysis of Lanzarote and Chinijo Islands based on kernel density estimation via a linear diffusion process. Sci. Rep. 6, 27381 (2016).
Clark, P. J. & Evans, F. C. Distance to Nearest Neighbor as a Measure of Spatial Relationships in Populations. Ecology 35, 445–453 (1954).
Lee, J., Lay, J.G., Chin, W. C. B., Chi, Y.L. & Hsueh, Y.H. An Experiment to Model Spatial Diffusion Process with Nearest Neighbor Analysis and Regression Estimation. Int. J. Appl. Geospatial Res. 5, 1–15 (2014).
Hess, D., van Lieshout, M.C., Payne, B. & Stein, A. A review of spatiotemporal modelling of quadrat count data with application to striga occurrence in a pearl millet field. Int. J. Appl. Earth Obs. Geoinformation 3, 133–138 (2001).
Silverman, B. W. Density estimation for statistics and data analysis. (Chapman and Hall, 1986).
Openshaw, S., Charlton, M., Wymer, C. & Craft, A. A Mark 1 Geographical Analysis Machine for the automated analysis of point data sets. Int. J. Geogr. Inf. Syst. 1, 335–358 (1987).
Gatrell, A. C., Bailey, T. C., Diggle, P. J. & Rowlingson, B. S. Spatial Point Pattern Analysis and Its Application in Geographical Epidemiology. Trans. Inst. Br. Geogr. 21, 256–274 (1996).
Cuzick, J. & Edwards, R. Spatial Clustering for Inhomogeneous Populations. J. R. Stat. Soc. Ser. B Methodol. 52, 73–104 (1990).
Demšar, U. & Virrantaus, K. Space–time density of trajectories: exploring spatiotemporal patterns in movement data. Int. J. Geogr. Inf. Sci. 24, 1527–1542 (2010).
Lee, J., Gong, J. & Li, S. Exploring spatiotemporal clusters based on extended kernel estimation methods. Int. J. Geogr. Inf. Sci. 31, 1154–1177 (2017).
Cliff, A. D. & Haggett, P. Changes in the seasonal incidence of measles in Iceland, 1896–1974. J. Hyg. (Lond.) 85, 451–457 (1980).
Cliff, A. D., Haggett, P. & Graham, R. Reconstruction of diffusion processes at different geographical scales: the 1904 measles epidemic in northwest Iceland. J. Hist. Geogr. 9, 29–46 (1983).
Cliff, A. D., Haggett, P. & Ord, J. K. Forecasting epidemic pathways for measles in Iceland: the use of simultaneous equation and logit models. Ecol. Dis. 2, 377–396 (1983).
Keeling, M. J. & Eames, K. T. D. Networks and epidemic models. J. R. Soc. Interface 2, 295–307 (2005).
Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925–979 (2015).
Sun, Y., Ma, L., Zeng, A. & Wang, W.X. Spreading to localized targets in complex networks. Sci. Rep. 6, 38865 (2016).
Eubank, S. et al. Modelling disease outbreaks in realistic urban social networks. Nature 429, 180–184 (2004).
Ridenhour, B. J., Braun, A., Teyrasse, T. & Goldsman, D. Controlling the Spread of Disease in Schools. PLoS ONE 6, e29640 (2011).
Fournet, J. & Barrat, A. Epidemic risk from friendship network data: an equivalence with a nonuniform sampling of contact networks. Sci. Rep. 6, 24593 (2016).
Meloni, S. et al. Modeling human mobility responses to the large-scale spreading of infectious diseases. Sci. Rep. 1, 62 (2011).
Ryan, S. J., Jones, J. H. & Dobson, A. P. Interactions between Social Structure, Demography, and Transmission Determine Disease Persistence in Primates. PLoS ONE 8, e76863 (2013).
Wen, T.H. & Chin, W.C.B. Incorporation of Spatial Interactions in Location Networks to Identify Critical Geo-Referenced Routes for Assessing Disease Control Measures on a Large-Scale Campus. Int. J. Environ. Res. Public. Health 12, 4170–4184 (2015).
Chan, J., Holmes, A. & Rabadan, R. Network Analysis of Global Influenza Spread. PLoS Comput. Biol. 6, e1001005 (2010).
Gómez, J. M. & Verdú, M. Network theory may explain the vulnerability of medieval human settlements to the Black Death pandemic. Sci. Rep. 7, 43467 (2017).
Wang, Y. et al. Global analysis of an SIS model with an infective vector on complex networks. Nonlinear Anal. Real World Appl. 13, 543–557 (2012).
Wang, Y., Cao, J., Alofi, A., AL-Mazrooei, A. & Elaiw, A. Revisiting node-based SIR models in complex networks with degree correlations. Phys. Stat. Mech. Its Appl. 437, 75–88 (2015).
Sun, G.Q. Pattern formation of an epidemic model with diffusion. Nonlinear Dyn. 69, 1097–1104 (2012).
Li, L. Patch invasion in a spatial epidemic model. Appl. Math. Comput. 258, 342–349 (2015).
Sun, G.Q., Jusup, M., Jin, Z., Wang, Y. & Wang, Z. Pattern transitions in spatial epidemics: Mechanisms and emergent properties. Phys. Life Rev. 19, 43–73 (2016).
Sun, G.Q. Mathematical modeling of population dynamics with Allee effect. Nonlinear Dyn. 85, 1–12 (2016).
Sun, G.Q., Wu, Z.Y., Wang, Z. & Jin, Z. Influence of isolation degree of spatial patterns on persistence of populations. Nonlinear Dyn. 83, 811–819 (2016).
Sun, G.Q., Wang, C.H. & Wu, Z.Y. Pattern dynamics of a Gierer–Meinhardt model with spatial effects. Nonlinear Dyn. 88, 1385–1396 (2017).
Abler, R., Adams, J. S. & Gould, P. Spatial organization; the geographer’s view of the world. (Prentice-Hall, 1971).
Cohen, J. & Tita, G. Diffusion in Homicide: Exploring a General Method for Detecting Spatial Diffusion Processes. J. Quant. Criminol. 15, 451–493 (1999).
Mantel, N. The detection of disease clustering and a generalized regression approach. Cancer Res. 27, 209–220 (1967).
Knox, E. G. & Bartlett, M. S. The Detection of Space-Time Interactions. Appl. Stat. 13, 25–30 (1964).
Kulldorff, M. & Hjalmars, U. The Knox Method and Other Tests for Space-Time Interaction. Biometrics 55, 544–552 (1999).
Wen, T.H., Lin, M.H. & Fang, C.T. Population Movement and Vector-Borne Disease Transmission: Differentiating Spatial–Temporal Diffusion Patterns of Commuting and Noncommuting Dengue Cases. Ann. Assoc. Am. Geogr. 102, 1026–1037 (2012).
Kulldorff, M., Athas, W. F., Feuer, E. J., Miller, B. A. & Key, C. R. Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico. Am. J. Public Health 88, 1377–1380 (1998).
Wen, T.H., Tsai, C.T. & Chin, W.C.B. Evaluating the role of disease importation in the spatiotemporal transmission of indigenous dengue outbreak. Appl. Geogr. 76, 137–146 (2016).
Ciofi degli Atti, M. L. et al. Mitigation Measures for Pandemic Influenza in Italy: An Individual Based Model Considering Different Scenarios. PLoS ONE 3, e1790 (2008).
Balcan, D. et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl. Acad. Sci. USA. 106, 21484–9 (2009).
Sun, G.Q., Wang, S.L., Ren, Q., Jin, Z. & Wu, Y.P. Effects of time delay and space on herbivore dynamics: linking inducible defenses of plants to herbivore outbreak. Sci. Rep. 5, srep11246 (2015).
Li, L., Jin, Z. & Li, J. Periodic solutions in a herbivore-plant system with time delay and spatial diffusion. Appl. Math. Model. 40, 4765–4777 (2016).
Shang, C.S. et al. The Role of Imported Cases and Favorable Meteorological Conditions in the Onset of Dengue Epidemics. PLoS Negl. Trop. Dis. 4, e775 (2010).
Yu, H.L., Yang, S.J., Yen, H.J. & Christakos, G. A spatio-temporal climate-based model of early dengue fever warning in southern Taiwan. Stoch. Environ. Res. Risk Assess. 25, 485–494 (2011).
Shu, P.Y. et al. Molecular Characterization of Dengue Viruses Imported Into Taiwan during 2003–2007: Geographic Distribution and Genotype Shift. Am. J. Trop. Med. Hyg. 80, 1039–1046 (2009).
Huang, J.H. et al. Molecular Characterization and Phylogenetic Analysis of Dengue Viruses Imported into Taiwan during 2008–2010. Am. J. Trop. Med. Hyg. 87, 349–358 (2012).
Chang, S.F., Huang, J.H. & Shu, P.Y. Characteristics of dengue epidemics in Taiwan. J. Formos. Med. Assoc. 111, 297–9 (2012).
Yang, C.F., Hou, J.N., Chen, T.H. & Chen, W.J. Discriminable roles of Aedes aegypti and Aedes albopictus in establishment of dengue outbreaks in Taiwan. Acta Trop. 130, 17–23 (2014).
World Health Organization. Epidemiology, burden of disease and transmission. In Dengue: Guidelines for Diagnosis, Treatment, Prevention and Control 1–21 (World Health Organization, 2009).
Hsu, C.I. & Tsai, Y.C. An Energy Expenditure Approach for Estimating Walking Distance. Environ. Plan. B Plan. Des. 41, 289–306 (2014).
Leys, C., Ley, C., Klein, O., Bernard, P. & Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 49, 764–766 (2013).
Kan, C.C. et al. Two clustering diffusion patterns identified from the 2001–2003 dengue epidemic, Kaohsiung, Taiwan. Am. J. Trop. Med. Hyg. 79, 344–352 (2008).
Wen, T.H. et al. Spatial–temporal patterns of dengue in areas at risk of dengue hemorrhagic fever in Kaohsiung, Taiwan, 2002. Int. J. Infect. Dis. 14, e334–e343 (2010).
Schmidt, W.P. et al. Population Density, Water Supply, and the Risk of Dengue Fever in Vietnam: Cohort Study and Spatial Analysis. PLoS Med. 8, e1001082 (2011).
Souris, M. et al. Poultry Farm Vulnerability and Risk of Avian Influenza Re-Emergence in Thailand. Int. J. Environ. Res. Public. Health 11, 934–951 (2014).
Sharp, B. L. et al. Seven years of regional malaria control collaboration–Mozambique, South Africa, and Swaziland. Am. J. Trop. Med. Hyg. 76, 42–47 (2007).
Laxminarayan, R. Transboundary commons in infectious diseases. Oxf. Rev. Econ. Policy 32, 88–101 (2016).
Hsueh, Y.H., Lee, J. & Beltz, L. Spatiotemporal patterns of dengue fever cases in Kaoshiung City, Taiwan, 2003–2008. Appl. Geogr. 34, 587–594 (2012).
Khalid, B. & Ghaffar, A. Dengue transmission based on urban environmental gradients in different cities of Pakistan. Int. J. Biometeorol. 59, 267–283 (2015).
Stoddard, S. T. et al. The Role of Human Movement in the Transmission of Vector-Borne Pathogens. PLoS Negl. Trop. Dis. 3, e481 (2009).
Acknowledgements
The research was supported by grants from the Ministry of Science and Technology in Taiwan (MOST 104-2627-M-002-020; MOST 105-2627-M-002-018). The funder had no role in the study design, data collection and analysis or in the preparation of the manuscript.
Contributions
T.H.W. and I.H.W. conceived the experiments. W.C.B.C. and I.H.W. conducted the experiments. W.C.B.C. and T.H.W. analyzed the results. W.C.B.C., T.H.W., and C.E.S. wrote the paper. All authors reviewed the manuscript and approved the content.
Ethics declarations
Competing Interests
The authors declare that they have no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chin, W.C.B., Wen, T.H., Sabel, C.E. et al. A geocomputational algorithm for exploring the structure of diffusion progression in time and space. Sci Rep 7, 12565 (2017). https://doi.org/10.1038/s41598-017-12852-z