Real-time estimation of wildfire perimeters from curated crowdsourcing

Real-time information about the spatial extents of evolving natural disasters, such as wildfire or flood perimeters, can assist both emergency responders and the general public during an emergency. However, authoritative information sources can suffer from bottlenecks and delays, while user-generated social media data usually lacks the necessary structure and trustworthiness for reliable automated processing. This paper describes and evaluates an automated technique for real-time tracking of wildfire perimeters based on publicly available “curated” crowdsourced data about telephone calls to the emergency services. Our technique is based on established data mining tools, and can be adjusted using a small number of intuitive parameters. Experiments using data from the devastating Black Saturday wildfires (2009) in Victoria, Australia, demonstrate the potential for the technique to detect and track wildfire perimeters automatically, in real time, and with moderate accuracy. Accuracy can be further increased through combination with other authoritative demographic and environmental information, such as population density and dynamic wind fields. These results are also independently validated against data from the more recent 2014 Mickleham-Dalrymple wildfires.


B Calculating wind-biased outward buffers
In order to determine downwind objects, wind-biased outward buffers are calculated for previously estimated wildfire perimeters using Huygens' principle [4,5]. As illustrated in Fig. S5, an elliptical wavelet propagates from each vertex of the previously estimated wildfire perimeters. The envelope of all elliptical wavelets determines the vertex locations of the buffer. According to [4], the displacement $(x^d_i, y^d_i)$ from a vertex $v_i = (x_i, y_i)$ of the previously estimated wildfire perimeter to the corresponding vertex $v^b_i = (x^b_i, y^b_i)$ of the buffer is calculated by Eq. (1), where $x^s_i$ and $y^s_i$ are the angle differentials at $v_i$. When the vertices of the wildfire perimeter are in clockwise order, $x^s_i$ and $y^s_i$ are approximated from the perimeter vertices. The angle $\theta_i = \theta^w_i + \pi$ controls the orientation of the elliptical wavelet at $v_i$, where $\theta^w_i$ is the wind direction at $v_i$, i.e., the direction the wind blows from. $a_i$ and $b_i$ are half the minor axis and half the major axis of the elliptical wavelet at $v_i$, respectively, and $c_i$ is the distance from $v_i$ to the center of the elliptical wavelet at $v_i$. $a_i$, $b_i$, and $c_i$ are functions of the wind speed $U_i$ (km/h) at $v_i$.
Wind speed and direction data were used to calculate $\theta_i$, $a_i$, $b_i$, and $c_i$. Since the automated weather stations measure wind speed and direction every half an hour, near real-time wind speed and direction are utilized by our wildfire perimeter estimator. Following [5], $a_i$, $b_i$, and $c_i$ are calculated from the empirical coefficients $C_r$ and $C_u$ together with $L_i$ and $H_i$, the length-to-breadth ratio and the head-to-back ratio of the elliptical wavelet at $v_i$. $L_i$ and $H_i$ are computed as in [6]. As the calculations at each vertex are assumed independent of the others, the resultant wind-biased buffer may contain crossovers and self-merges, which are resolved in the same fashion as in [5].
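The geometry of the wavelets can be illustrated with a minimal sketch. It is not the displacement formula of Eq. (1): it samples points on each elliptical wavelet and takes their convex hull as a stand-in for the envelope, which is only reasonable for a convex perimeter. The function names, the uniform values of a, b, and c for all vertices, and the angle convention (clockwise from north, so the spread direction is (sin θ, cos θ)) are assumptions made for illustration.

```python
import math

def wavelet_points(vertex, a, b, c, theta, n=36):
    """Sample n points on the elliptical wavelet launched from one vertex.

    Assumption: theta is measured clockwise from north (+y). b is the
    semi-major axis (along the spread direction), a the semi-minor axis,
    and c the distance from the vertex to the ellipse center.
    """
    x, y = vertex
    ux, uy = math.sin(theta), math.cos(theta)  # along-wind unit vector
    px, py = uy, -ux                           # cross-wind unit vector
    cx, cy = x + c * ux, y + c * uy            # ellipse center
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        along, across = b * math.cos(t), a * math.sin(t)
        pts.append((cx + along * ux + across * px,
                    cy + along * uy + across * py))
    return pts

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def wind_biased_buffer(perimeter, wind_from, a, b, c, n=36):
    """Approximate the wavelet envelope by a convex hull (simplification)."""
    theta = wind_from + math.pi  # wavelets point downwind
    cloud = list(perimeter)
    for v in perimeter:
        cloud.extend(wavelet_points(v, a, b, c, theta, n))
    return convex_hull(cloud)

# Unit square, wind blowing from the north (wind_from = 0), illustrative sizes.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
hull = wind_biased_buffer(square, wind_from=0.0, a=0.5, b=1.0, c=0.6)
```

With wind from the north, the buffer extends b + c beyond the southern edge of the square but only b - c beyond the northern edge, which is the downwind bias the section describes. A faithful implementation would use per-vertex values of a, b, and c and resolve crossovers on the true envelope rather than a hull.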

C Design of experiment details
The behavior and performance of the basic algorithm (without using authoritative information) are tuned through six relatively intuitive parameters: $A_t$ (the minimum area of a detected fire); $\chi$ (the maximum length of any removable edge in the regular polygon representing the wildfire perimeter); $\tau$ (the size of the temporal "window" in which calls are processed); and minPts (the minimum number of calls), $\varepsilon_s$ (spatial neighborhood size), and $\varepsilon_t$ (temporal neighborhood size), which are used in the underlying ST-DBSCAN clustering algorithm. minPts can be set effectively by a simple heuristic to be the nearest integer ≥ 2 to ln(N) [7], where N is the number of calls in the temporal window. Thus the performance of the estimator is affected by the remaining five parameters.
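The minPts heuristic can be written in a few lines. This is a sketch of one reading of "nearest integer ≥ 2 to ln(N)", namely rounding ln(N) and clamping the result at 2; the function name and the handling of an empty window are assumptions.

```python
import math

def min_pts(n_calls):
    """Nearest integer to ln(N), but never below 2."""
    if n_calls < 1:
        return 2  # assumption: fall back to the lower bound for an empty window
    return max(2, round(math.log(n_calls)))
```

For example, a window holding 100 calls gives ln(100) ≈ 4.6 and hence minPts = 5, while a window with only three calls is clamped to 2.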

A full-factorial design of experiment was conducted to find the optimal parameterization and investigate the effect of the five factors on the performance of the estimator. Table S1 summarizes the notations and factorial values of the five factors in the design of experiment. The combinations of different factorial values cover an adequately large parameter space for exploring the effect of the factors on the performance of the estimator. The experiment was repeated under four settings: 1) without using authoritative information, 2) integrating wind field, 3) integrating population density, and 4) integrating both wind field and population density.
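A full-factorial design simply enumerates every combination of factor levels and repeats that grid under each setting. The sketch below shows the enumeration; the factor levels are hypothetical placeholders (the actual values are those in Table S1), and the names `FACTORS`, `SETTINGS`, and `full_factorial` are introduced only for illustration.

```python
from itertools import product

# Hypothetical levels for the five factors; NOT the values used in Table S1.
FACTORS = {
    "A_t": [1.0, 5.0],
    "chi": [2.0, 4.0, 8.0],
    "tau": [30, 60],
    "eps_s": [1.0, 2.0, 4.0],
    "eps_t": [10, 20],
}

# The four experimental settings described in the text.
SETTINGS = [
    {"wind": False, "population": False},
    {"wind": True, "population": False},
    {"wind": False, "population": True},
    {"wind": True, "population": True},
]

def full_factorial(factors):
    """Yield one dict per combination of factor levels."""
    names = list(factors)
    for combo in product(*(factors[name] for name in names)):
        yield dict(zip(names, combo))

# Every parameter combination is run under every setting.
runs = [(setting, params)
        for setting in SETTINGS
        for params in full_factorial(FACTORS)]
```

With these placeholder levels the grid has 2 × 3 × 2 × 3 × 2 = 72 combinations, giving 288 runs across the four settings; the run count grows multiplicatively with each added level.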

D Evaluation details
The experimental results were compared quantitatively with the ground truth wildfire progression as our response variable, measuring the closeness of our results to the true wildfire perimeter. Naively, we can apply two evaluation metrics commonly used in information retrieval: precision (positive predictive value) and recall (sensitivity) [8]. Precision can be computed as the number of wildfire perimeter estimates that spatially overlap a true wildfire divided by the total number of estimates. Recall is the number of wildfires detected by our estimator divided by the total number of wildfires. However, raw precision and recall ignore the exact shape and precise location of wildfires, tending to inflate the apparent level of accuracy. Instead, area-based precision and recall [9] are preferred for evaluating how closely the wildfire perimeter estimates track the true wildfire progression, and the performance of our estimator is therefore evaluated using them. Let $G(k)$ be the union of the perimeters of the $N_f(k)$ wildfires burning at the $k$th epoch of the estimator. While our estimator generated estimated wildfire perimeters every 10 minutes, the reconstructed ground truth perimeters of multiple fires are available at less frequent and irregular intervals. To minimize this systematic effect on the evaluation, the true wildfire perimeters at the $k$th epoch are approximated with the temporally nearest ground truth information. The true positive area at the $k$th epoch, $A_{TP}(k)$, is defined as the area of the intersection between the estimated and ground truth wildfire perimeters. The area-based precision $P_A(k)$ and recall $R_A(k)$ at the $k$th epoch are then defined as the ratio of $A_{TP}(k)$ to the total estimated area and to $\mathrm{Area}(G(k))$, respectively, when the corresponding denominator is nonzero, and 0 otherwise.
The overall area-based F1-score ($F1_A$) is used as the summary statistic of the overall performance of the estimator, computed as the harmonic mean of the average precision ($\bar{P}_A$) and average recall ($\bar{R}_A$) over all epochs: $F1_A = 2\,\bar{P}_A\,\bar{R}_A / (\bar{P}_A + \bar{R}_A)$.
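The evaluation can be sketched as follows, assuming for simplicity that each footprint is rasterized to a set of equal-area grid cells so that area is proportional to cell count. That rasterization, the helper `block`, and all function names are assumptions for illustration; an evaluation on vector perimeters would use polygon intersection instead.

```python
def block(x0, y0, x1, y1):
    """Hypothetical helper: grid cells covering [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def nearest_truth(epoch_time, truth_by_time):
    """Ground-truth footprint whose timestamp is closest to epoch_time."""
    nearest = min(truth_by_time, key=lambda ts: abs(ts - epoch_time))
    return truth_by_time[nearest]

def area_precision_recall(estimated, truth):
    """Area-based precision and recall for one epoch (0 if undefined)."""
    true_positive = len(estimated & truth)  # area of the intersection
    precision = true_positive / len(estimated) if estimated else 0.0
    recall = true_positive / len(truth) if truth else 0.0
    return precision, recall

def overall_f1(per_epoch):
    """Harmonic mean of the epoch-averaged precision and recall."""
    if not per_epoch:
        return 0.0
    mean_p = sum(p for p, _ in per_epoch) / len(per_epoch)
    mean_r = sum(r for _, r in per_epoch) / len(per_epoch)
    if mean_p + mean_r == 0:
        return 0.0
    return 2 * mean_p * mean_r / (mean_p + mean_r)

# Two illustrative epochs (times in minutes) against sparse ground truth.
truth_by_time = {0: block(0, 0, 4, 4), 60: block(2, 0, 6, 4)}
estimates = {10: block(0, 0, 2, 2), 50: block(0, 0, 4, 4)}
scores = [area_precision_recall(est, nearest_truth(t, truth_by_time))
          for t, est in sorted(estimates.items())]
f1 = overall_f1(scores)
```

Note that the averaging happens before the harmonic mean is taken, as the text specifies; averaging per-epoch F1-scores instead would generally give a different value.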

E Parameterization details
For the design of experiment that uses both the population density and wind field weights, the parameterization that achieves the highest overall area-based F1-score is selected as optimal. The notations and values of the factors, along with the empirical coefficients for the population density and wind field weights, are listed in Table S2.

Figure S1. An example of a CFA RSS feed for wildfire. The key attributes "guid", "Type", "Start Date/Time", and "georss:point" are highlighted. RSS feeds for wildfire are extracted by the attribute "Type". Status updates of existing incidents are removed by the attribute "guid". The attributes "georss:point" and "Start Date/Time" provide the spatiotemporal coordinates of the RSS feed.
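The filtering steps described in the caption of Figure S1 can be sketched on feed items that have already been parsed into dictionaries keyed by the four attribute names. The "Type" label, the timestamp format, and the function name below are hypothetical; only the "latitude longitude" ordering of "georss:point" follows the GeoRSS Simple encoding.

```python
from datetime import datetime

def extract_calls(items, fire_types, time_format="%d/%m/%Y %H:%M:%S"):
    """Keep wildfire items, drop status updates, return (guid, time, lat, lon)."""
    seen = set()
    calls = []
    for item in items:
        if item["Type"] not in fire_types:
            continue  # not a wildfire incident
        if item["guid"] in seen:
            continue  # status update of an incident already recorded
        seen.add(item["guid"])
        lat, lon = (float(v) for v in item["georss:point"].split())
        start = datetime.strptime(item["Start Date/Time"], time_format)
        calls.append((item["guid"], start, lat, lon))
    return calls

# Illustrative items only; the values are made up, not taken from the CFA feed.
items = [
    {"guid": "a1", "Type": "FIRE",
     "Start Date/Time": "07/02/2009 11:50:00", "georss:point": "-37.3 144.9"},
    {"guid": "a1", "Type": "FIRE",
     "Start Date/Time": "07/02/2009 11:50:00", "georss:point": "-37.3 144.9"},
    {"guid": "b2", "Type": "RESCUE",
     "Start Date/Time": "07/02/2009 12:05:00", "georss:point": "-37.8 145.0"},
]
calls = extract_calls(items, fire_types={"FIRE"})
```

Keeping the first occurrence of each "guid" preserves the original report time and location of an incident, which is what the clustering step needs.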