Spatiotemporal data analysis with chronological networks

The number of spatiotemporal data sets has increased rapidly in recent years, which demands robust and fast methods to extract information from this kind of data. Here, we propose a network-based model, called Chronnet, for spatiotemporal data analysis. The network construction process consists of dividing a geometric space into grid cells represented by nodes connected chronologically. Strong links in the network represent consecutive recurrent events between cells. The chronnet construction process is fast, making the model suitable for processing large data sets. Using artificial and real data sets, we show how chronnets can capture data properties beyond simple statistics, such as frequent patterns, spatial changes, outliers, and spatiotemporal clusters. We conclude that chronnets are a robust tool for the analysis of spatiotemporal data sets.

This is an important problem of high actuality. The paper is well written; the methods and the results are well explained.
But I have several questions which should be considered before I can make a final recommendation:
-It should be pointed out that the method is restricted to 2D data, but often one has 3D data in nature. Is it possible to extend it to 3D?
-The definition of an "event" is also not trivial and could lead to several artifacts.
-There could be (almost) parallel events at different spatial locations. How to treat them?
-Self-connections are allowed here. But they are typically excluded in networks.
-The pruning parameter seems to be somewhat artificial and could also lead to artifacts.
-There are several techniques in the complex network literature to identify the most influential nodes. They should be used for comparison here at least and may be applied.
-What do you really learn new about fires with your method?
In our detailed answer to reviewer #1, we make clear that all the experiments presented in this paper are new and were not presented in our previous work [5]. We believe that the extensive study on various artificial data sets and on the large-scale real-world data set provides ample evidence of the method's capability. We have also shown how the proposed method can be applied to four spatiotemporal data mining tasks [1]: frequent pattern mining, relationship mining, clustering, and outlier detection. Moreover, following the reviewer's suggestion, we added a new experiment to the revised version showing how the method applies to another machine learning and data mining task: change detection. We have done our best to address the reviewers' concerns and suggestions to improve our manuscript, and we hope the editor will reconsider our paper for publication.

Reviewer 1
We would like to thank reviewer #1 for his/her time, for carefully reviewing our article, and for the considerations that have surely improved the manuscript.
The claims in the paper need to be validated through rigorous experiments that ideally go beyond synthetic datasets and wild-fire analyses that were already presented in their previous work. Even for experiments done on the synthetic dataset, this reviewer couldn't find experiments supporting the capability of the network in the context of predictive learning, pattern mining and change detection.
We agree with the reviewer that additional experiments can better elucidate the method's capability. Therefore, in the revised version, we have added experimental analyses on four new spatiotemporal data sets generated by different models, including the Lorenz and Rössler models. The new study provides further evidence that chronnets can properly capture the spatiotemporal patterns of the data sets under study, for example, the double-wing structure of the chaotic Lorenz attractor (see Fig. 5 and the corresponding explanation).
Also following the reviewer's suggestion, we added another experiment to the manuscript to support the method's capability in a new spatiotemporal data mining task: change detection. Using the data generator proposed in the paper, we created an artificial data set composed of repeated consecutive events between cells, a behavior that our model captures. Spatiotemporal changes in the data set can be observed by tracking the temporal community in which the events occur. The experimental result is illustrated in the new Fig. 7. We hope this new experiment makes clear to the reviewer that our method can be used in most spatiotemporal data mining tasks, which makes it a versatile tool.
Our experimental results focus on presenting evidence that this model captures temporal phenomena previously observed in artificial and real data sets. We worked with both artificial data sets and a wildfire data set that were not used before. The artificial data sets were proposed in this paper, and the results have not been published in our previous work [5]. The same holds for the real data set: the wildfire data set used in the previous work focused on a specific region (the Amazon basin), whereas here we use global data, which changes the analysis and results completely. The larger area allows the method to capture not just the fire seasons in a specific region but also the dynamics on a global scale. The difference in data set sizes also demonstrates the method's capability of dealing with very large data sets. Thus, we reaffirm that none of the results presented in this paper were previously published.
The main goal of this paper is to present a network-based method that captures frequent consecutive events. Another goal is to enable the use of network science and graph mining tools to extract information from spatiotemporal data. The research areas mentioned by the reviewer involve different tasks with many works in the literature [2]. In our paper, we selected four tasks [1]: frequent pattern mining, relationship mining, outlier detection, and clustering. For these four tasks, we experimentally demonstrate how to use the model together with other graph mining tools to detect temporal patterns. However, our method is not limited to the analysis tools described in the manuscript. Virtually all previous methods [6] for predictive learning, pattern mining, and change detection in networks can be applied to this model, which is an advantage. For example, in the context of predictive learning, many methods for link prediction in networks [4] have been developed and can be applied or adapted to our model. Given the complexity of these tasks, we believe that exploring them in this paper would diverge from our focus, which is the proposal of the chronnet model. Applying other graph mining tools to our model opens many possibilities for future work, and we expect such tools to be combined with chronnets to extract information from different spatiotemporal data sets.
The paper has many presentation issues. The grammar and typos need to be thoroughly checked, and references should be consistent. For example, many references have all the authors listed while several of them have the authors in et al. format.
We reviewed the whole manuscript and did our best to remove typos. We also corrected the references. Please note that we used the recommended LaTeX bibliography style for the Nature journals, which automatically abbreviates author lists to "et al." according to the number of authors.

Reviewer 2
We would like to extend our gratitude to reviewer #2 for his/her time, for carefully reviewing our article, and for the considerations that have certainly improved our paper.
-It should be pointed out that the method is restricted to 2D data, but often one has 3D data in nature. Is it possible to extend it to 3D?
In this paper, we focus on spatiotemporal events, commonly defined as triples (x, y, t) representing the location and the time at which an observation occurs. So our method, as presented in the paper, already takes three dimensions into account (not only 2D). For simplicity, we focus on these three dimensions because they are the simplest and most common way to represent spatiotemporal events. In our method, x and y are divided into grid cells represented by nodes, and the temporal variable t defines the order in which the links are established. If the spatial grid is, for example, 10 × 10, it leads to a chronnet of 100 nodes. This grid transformation is simply a binning operation on the variables x and y to find the cell (and the node) where an event occurs. One possible way to extend it to a variable z (or more variables) is to apply the same binning operation to the new variable. Considering a 4D data set (x, y, z, t) and a 10 × 10 × 10 grid, this leads to a chronnet of 1000 nodes. In summary, more variables can be considered in the chronnet model by increasing the number of dimensions of the grid. Therefore, our method is not restricted to three dimensions and can process higher-dimensional data. We included this explanation in the manuscript.
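The binning step can be sketched as follows. This is a minimal illustration, not the manuscript's implementation: the helper name `cell_index`, the equal-width bins per axis, the known bounds, and the row-major flattening into a single node id are all assumptions. The same function handles (x, y) or (x, y, z, ...) points, so the number of nodes is simply the product of the bins per axis.

```python
from typing import Sequence


def cell_index(
    point: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    bins: Sequence[int],
) -> int:
    """Map a point (x, y, z, ...) to one node id on a regular grid.

    Hypothetical helper: assumes equal-width bins per axis and known bounds.
    """
    index = 0
    for value, lo, hi, n in zip(point, lower, upper, bins):
        # Bin number along this axis; values outside [lo, hi), including hi
        # itself, are clamped to the first or last bin.
        k = int((value - lo) / (hi - lo) * n)
        k = min(max(k, 0), n - 1)
        # Row-major flattening: treat the bin numbers as digits of a mixed-radix number.
        index = index * n + k
    return index


# A 10 x 10 grid gives 100 nodes; adding z with 10 bins gives 10 x 10 x 10 = 1000 nodes.
print(cell_index((0.05, 0.95), (0.0, 0.0), (1.0, 1.0), (10, 10)))  # 9
print(cell_index((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (10, 10, 10)))  # 999
```

Flattening to a single integer keeps the network construction independent of the number of spatial dimensions: only the grid definition changes.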
-The definition of an "event" is also not trivial and could lead to several artifacts.
We define an event as a spatiotemporal observation, i.e., a record in a spatiotemporal data set. Since these observations are application-dependent, they represent different occurrences in different data sets, but all events share a common characteristic: spatial and temporal values that define where and when they occur. Please note that this definition is widely used in spatiotemporal data analysis [1]. Our method transforms a spatiotemporal data set into a network, so it can be used as a general model to study spatiotemporal data sets. It neither introduces nor creates any record in the data set. If the data set contains incorrect records, this is a data collection issue that can be solved during the data set construction or mitigated in the pre-processing step; it is not a problem with the method itself.
-There could be (almost) parallel events at different spatial locations. How to treat them?
This is an important question, and we included an explanation in the revised version of the manuscript. For simplicity and without loss of generality, we consider in the manuscript that time is discrete, $\{t_1, t_2, \ldots, t_T\}$ with $t_1 \le t_2 \le \cdots \le t_T$, and that the time window is $h = 1$. Let $\{v_1, v_2, \ldots, v_r\}_t$ and $\{v_1, v_2, \ldots, v_s\}_{t+1}$ be the sets of $r$ and $s$ vertices whose cells have simultaneous events at times $t$ and $t+1$, respectively. In this case, a link is established for every combination of a vertex from the first set and a vertex from the second set.
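The rule for simultaneous events can be sketched in a few lines. This is a hedged illustration under the stated assumptions (integer time steps, $h = 1$), not the authors' code: the name `build_chronnet`, the (node, time_step) input format, counting several events in the same cell at the same step only once, and incrementing the link weight by 1 per consecutive pair are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, Iterable, Set, Tuple


def build_chronnet(events: Iterable[Tuple[int, int]]) -> Counter:
    """Build a directed, weighted chronnet from (node, time_step) events.

    Sketch assuming integer time steps and a time window h = 1, so events at
    step t are linked only to events at step t + 1.
    """
    # Nodes (grid cells) with at least one event at each time step;
    # several events in the same cell at the same step count once here.
    active: Dict[int, Set[int]] = defaultdict(set)
    for node, t in events:
        active[t].add(node)

    weights: Counter = Counter()
    for t, sources in active.items():
        targets = active.get(t + 1)
        if not targets:
            continue  # No events at t + 1, so no links start at step t.
        # (Almost) parallel events: link every active cell at t to every active cell at t + 1.
        for u, v in product(sources, targets):
            weights[(u, v)] += 1  # Recurrent consecutive events strengthen the link.
    return weights


# Cells 0 and 1 fire at t = 1, cells 2 and 3 at t = 2, cell 0 again at t = 3.
net = build_chronnet([(0, 1), (1, 1), (2, 2), (3, 2), (0, 3)])
print(sorted(net.items()))
# [((0, 2), 1), ((0, 3), 1), ((1, 2), 1), ((1, 3), 1), ((2, 0), 1), ((3, 0), 1)]
```

With r = 2 and s = 2 simultaneous events, the step from t = 1 to t = 2 produces r × s = 4 links. An event repeated in the same cell at t and t + 1 would yield a self-loop (u, u).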
-Self-connections are allowed here. But they are typically excluded in networks.
Self-connections represent consecutive events that occur in the same cell. This is additional information captured by the model that may or may not be used. If this information is not relevant to the user, self-connections can easily be excluded. Please note that typical network transformations and simplifications (e.g., removing nodes, self-loops, or link direction) may be applied if convenient or necessary. The model enables the use of any network science or graph mining tool to extract information from the spatiotemporal data set, which is an advantage of the model.
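As an illustration of such simplifications, the sketch below removes self-loops and, optionally, link direction from a weighted edge map of the form {(u, v): weight}. The helper `simplify` and its parameters are hypothetical, and summing the weights of (u, v) and (v, u) when direction is dropped is only one possible convention.

```python
from collections import Counter
from typing import Dict, Tuple


def simplify(
    weights: Dict[Tuple[int, int], int],
    drop_self_loops: bool = True,
    undirected: bool = False,
) -> Counter:
    """Return a simplified copy of a weighted edge map {(u, v): weight}.

    Hypothetical helper: when direction is dropped, the weights of (u, v) and
    (v, u) are summed, which is one possible convention among several.
    """
    simplified: Counter = Counter()
    for (u, v), w in weights.items():
        if drop_self_loops and u == v:
            continue  # Consecutive events in the same cell are discarded.
        if undirected:
            u, v = min(u, v), max(u, v)  # Store each unordered pair under one key.
        simplified[(u, v)] += w
    return simplified


edges = {(0, 0): 3, (0, 1): 2, (1, 0): 1, (1, 2): 4}
print(dict(simplify(edges)))                   # {(0, 1): 2, (1, 0): 1, (1, 2): 4}
print(dict(simplify(edges, undirected=True)))  # {(0, 1): 3, (1, 2): 4}
```

The original edge map is left untouched, so the self-loop information remains available if a later analysis needs it.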
-The pruning parameter seems to be somewhat artificial and could also lead to artifacts.
The pruning process works as a filter that removes weak links. It neither introduces nor transforms data; it only excludes consecutive events that are not relevant or do not occur frequently enough and can therefore be disregarded. This procedure is another tool of the model that lets the user focus on consecutive events that repeat more frequently (stronger links). The parameter depends on the application and may or may not be used, according to the data set being explored. The pruning procedure can also be interpreted as an outlier removal process, making the method more robust to erroneous measurements in the modeled data set. We added an explanation of the pruning process to the method description section.
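A minimal sketch of the pruning filter, assuming the chronnet is stored as a weighted edge map {(u, v): weight}; the function name `prune`, the parameter `min_weight`, and the inclusive threshold are assumptions for illustration. It shows that pruning only removes links and leaves the weights of the surviving links unchanged.

```python
from typing import Dict, Tuple


def prune(weights: Dict[Tuple[int, int], int], min_weight: int) -> Dict[Tuple[int, int], int]:
    """Keep only links whose weight is at least `min_weight`.

    Sketch: the inclusive threshold and the name `min_weight` are assumptions;
    the pruning only removes links and never alters the remaining weights.
    """
    return {edge: w for edge, w in weights.items() if w >= min_weight}


edges = {(0, 1): 5, (1, 2): 1, (2, 0): 3}
print(prune(edges, 3))  # {(0, 1): 5, (2, 0): 3}
```

With `min_weight=1` and positive integer weights, nothing is removed, which corresponds to not using the pruning step at all.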
-There are several techniques in the complex network literature to identify the most influential nodes. They should be used for comparison here at least and may be applied.
Following the reviewer's recommendation, we added examples to the revised version showing how to apply centrality measures to chronnets. It is important to mention that many centrality measures do not take link weights into account, and the weights are essential information in the model. We do not recommend simply disregarding the weights because, without them, the centrality measures would treat all links as equally influential. Therefore, we suggest two approaches: (1) use centrality measures for weighted graphs, or (2) prune low-weight (spurious) links and then apply centrality measures for unweighted networks. We added to the method section a description of how to apply centrality measures, and we added the new Fig. 5, which presents applications of well-known centrality measures to chronnets, either after pruning or taking the weights into account. We show that these measures can find influential nodes in chronnets. Centrality measures are one of the many network science tools that can be applied to our model. In this paper, we discuss only the most common network measures, but, as mentioned before, the model opens many possibilities to be explored in future work.
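The two approaches can be contrasted with the simplest centrality measure, in-degree. The sketch below is illustrative only: the helpers `in_strength` (weighted in-degree, approach 1) and `pruned_in_degree` (unweighted in-degree after pruning, approach 2) are hypothetical names, and the toy edge map is invented for the example rather than taken from the manuscript's experiments.

```python
from collections import Counter
from typing import Dict, Tuple

Edges = Dict[Tuple[int, int], int]


def in_strength(weights: Edges) -> Counter:
    """Weighted in-degree (strength): sum of the weights of incoming links."""
    strength: Counter = Counter()
    for (_, v), w in weights.items():
        strength[v] += w
    return strength


def pruned_in_degree(weights: Edges, min_weight: int) -> Counter:
    """Unweighted in-degree computed after dropping links lighter than min_weight."""
    degree: Counter = Counter()
    for (_, v), w in weights.items():
        if w >= min_weight:
            degree[v] += 1
    return degree


# Node 1 has three incoming links, but two of them are weak (weight 1).
edges = {(0, 1): 10, (2, 1): 1, (3, 1): 1, (0, 2): 6, (3, 2): 6}
print(pruned_in_degree(edges, 1)[1], pruned_in_degree(edges, 1)[2])  # 3 2  (weights ignored)
print(pruned_in_degree(edges, 5)[1], pruned_in_degree(edges, 5)[2])  # 1 2  (after pruning)
print(in_strength(edges)[1], in_strength(edges)[2])                  # 12 12 (weighted)
```

Ignoring the weights ranks node 1 first only because of its two spurious links; after pruning, node 2 ranks higher, and the weighted measure places both nodes on equal footing. This illustrates why the weights should not simply be discarded.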
-What do you really learn new about fires with your method?
We thank the reviewer for raising this question. As stated in the experimental settings, we use this data set to observe, via the proposed method, features that are already known. We apply our method to a real-world data set to present experimental evidence that it can extract spatiotemporal information previously reported in the literature. For example, the outlier events ("type" column) in the MCD14ML data set [3] were used to test whether they correspond to the expected high-degree nodes in the chronnet (Fig. 8). Furthermore, we describe the frequency of fire events, outlier fire detections, and the seasonal activity using a single chronnet. We chose this data set as a proof of concept for our analysis: it has temporal patterns and is considerably large. However, it could have been any other event-based spatiotemporal data set, which is an advantage of our method. The method opens the possibility of analyzing other applications and spatiotemporal domains, which we will consider in future work.